[Bug Report] RL env crashes with a PhysX error when the cup hits the table #1460

Open
cidxb opened this issue Nov 25, 2024 · 2 comments

cidxb commented Nov 25, 2024

This is the error log

0%|                                                                                                                                                                                         | 10/32000 [00:00<22:27, 23.74it/s]

2024-11-25 09:13:08 [11,526ms] [Error] [omni.physx.plugin] PhysX error: Synchronizing GPU Narrowphase failed! 700
, FILE /builds/omniverse/physics/physx/source/gpunarrowphase/src/PxgNarrowphaseCore.cpp, LINE 1259
2024-11-25 09:13:08 [11,526ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2024-11-25 09:13:08 [11,526ms] [Error] [omni.physx.plugin] PhysX error: Fetching GPU Narrowphase failed! 700
, FILE /builds/omniverse/physics/physx/source/gpunarrowphase/src/PxgNarrowphaseCore.cpp, LINE 1367
2024-11-25 09:13:08 [11,526ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2024-11-25 09:13:08 [11,526ms] [Error] [omni.physx.plugin] PhysX error: SynchronizeStreams cuEventRecord failed with error 700
, FILE /builds/omniverse/physics/physx/source/gpucommon/include/PxgCudaUtils.h, LINE 57
2024-11-25 09:13:08 [11,526ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2024-11-25 09:13:08 [11,526ms] [Error] [omni.physx.plugin] PhysX error: SynchronizeStreams cuStreamWaitEvent failed with error 700

========= Some repeated errors omitted here =========


2024-11-25 09:13:08 [11,738ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2024-11-25 09:13:08 [11,739ms] [Error] [omni.physx.tensors.plugin] CUDA error: an illegal memory access was encountered: ../../../extensions/runtime/source/omni.physx.tensors/plugins/gpu/GpuArticulationView.cpp: 650
2024-11-25 09:13:08 [11,739ms] [Error] [omni.physx.tensors.plugin] CUDA error: an illegal memory access was encountered: ../../../extensions/runtime/source/omni.physx.tensors/plugins/gpu/CudaKernels.cu: 382
2024-11-25 09:13:08 [11,739ms] [Error] [omni.physx.tensors.plugin] Failed to fetch DOF velocity attribute
  0%|                                                                                                                                                                                         | 19/32000 [00:00<26:31, 20.09it/s]
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/xxx/workspace/isaaclab_px/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/utils/hydra.py", line 91, in hydra_main
    func(env_cfg, agent_cfg, *args, **kwargs)
  File "/home/xxx/workspace/isaaclab_px/source/standalone/workflows/skrl/train.py", line 178, in main
    runner.run()
  File "/home/xxx/anaconda3/envs/RL/lib/python3.10/site-packages/skrl/utils/runner/torch/runner.py", line 376, in run
    self._trainer.train()
  File "/home/xxx/anaconda3/envs/RL/lib/python3.10/site-packages/skrl/trainers/torch/sequential.py", line 81, in train
    self.single_agent_train()
  File "/home/xxx/anaconda3/envs/RL/lib/python3.10/site-packages/skrl/trainers/torch/base.py", line 182, in single_agent_train
    next_states, rewards, terminated, truncated, infos = self.env.step(actions)
  File "/home/xxx/anaconda3/envs/RL/lib/python3.10/site-packages/skrl/envs/wrappers/torch/isaaclab_envs.py", line 63, in step
    self._observations, reward, terminated, truncated, self._info = self._env.step(actions)
  File "/home/xxx/anaconda3/envs/RL/lib/python3.10/site-packages/gymnasium/wrappers/order_enforcing.py", line 56, in step
    return self.env.step(action)
  File "/home/xxx/workspace/isaaclab_px/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/direct_rl_env.py", line 332, in step
    self.scene.update(dt=self.physics_dt)
  File "/home/xxx/workspace/isaaclab_px/source/extensions/omni.isaac.lab/omni/isaac/lab/scene/interactive_scene.py", line 374, in update
    articulation.update(dt)
  File "/home/xxx/workspace/isaaclab_px/source/extensions/omni.isaac.lab/omni/isaac/lab/assets/articulation/articulation.py", line 202, in update
    self._data.update(dt)
  File "/home/xxx/workspace/isaaclab_px/source/extensions/omni.isaac.lab/omni/isaac/lab/assets/articulation/articulation_data.py", line 78, in update
    self.joint_acc
  File "/home/xxx/workspace/isaaclab_px/source/extensions/omni.isaac.lab/omni/isaac/lab/assets/articulation/articulation_data.py", line 350, in joint_acc
    self._joint_acc.data = (self.joint_vel - self._previous_joint_vel) / time_elapsed
  File "/home/xxx/workspace/isaaclab_px/source/extensions/omni.isaac.lab/omni/isaac/lab/assets/articulation/articulation_data.py", line 340, in joint_vel
    self._joint_vel.data = self._root_physx_view.get_dof_velocities()
  File "/home/xxx/anaconda3/envs/RL/lib/python3.10/site-packages/isaacsim/extsPhysics/omni.physics.tensors/omni/physics/tensors/impl/api.py", line 446, in get_dof_velocities
    raise Exception("Failed to get DOF velocities from backend")
Exception: Failed to get DOF velocities from backend

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
2024-11-25 09:13:08 [11,774ms] [Error] [omni.physx.fabric.plugin] CUDA error: an illegal memory access was encountered: ../../../extensions/runtime/source/omni.physx.fabric/plugins/DirectGpuHelper.cpp: 328
2024-11-25 09:13:08 [11,774ms] [Error] [omni.physx.fabric.plugin] CUDA error: an illegal memory access was encountered: ../../../extensions/runtime/source/omni.physx.fabric/plugins/DirectGpuHelper.cpp: 331
2024-11-25 09:13:08 [11,774ms] [Error] [omni.physx.fabric.plugin] CUDA error: an illegal memory access was encountered: ../../../extensions/runtime/source/omni.physx.fabric/plugins/DirectGpuHelper.cpp: 334
2024-11-25 09:13:08 [11,774ms] [Error] [omni.physx.fabric.plugin] CUDA error: an illegal memory access was encountered: ../../../extensions/runtime/source/omni.physx.fabric/plugins/DirectGpuHelper.cpp: 337
2024-11-25 09:13:08 [11,774ms] [Error] [omni.physx.fabric.plugin] CUDA error: an illegal memory access was encountered: ../../../extensions/runtime/source/omni.physx.fabric/plugins/DirectGpuHelper.cpp: 340
2024-11-25 09:13:08 [11,775ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Core' for removal
2024-11-25 09:13:08 [11,778ms] [Warning] [omni.physx.plugin] PhysX warning: 

======================== The following warning repeats ========================

/builds/omniverse/physics/physx/source/gpucommon/src/PxgCudaMemoryAllocator.cpp, FILE /builds/omniverse/physics/physx/source/gpucommon/src/PxgCudaMemoryAllocator.cpp, LINE 167
2024-11-25 09:13:08 [11,778ms] [Warning] [omni.physx.plugin] PhysX warning: /builds/omniverse/physics/physx/source/gpucommon/src/PxgCudaMemoryAllocator.cpp, FILE /builds/omniverse/physics/physx/source/gpucommon/src/PxgCudaMemoryAllocator.cpp, LINE 167


2024-11-25 09:13:08 [11,784ms] [Error] [omni.physx.plugin] PhysX error: Failed to unload CUDA module data, returned 700., FILE /builds/omniverse/physics/physx/source/cudamanager/src/CudaContextManager.cpp, LINE 817
2024-11-25 09:13:08 [11,846ms] [Warning] [carb] Recursive unloadAllPlugins() detected!

System Info

  • Commit:
  • Isaac Sim Version: 4.2.02
  • OS: Ubuntu 22.04
  • GPU: RTX 4070
  • CUDA: 12.4
  • GPU Driver: 550.120

Additional context

The problem occurs when I test my reinforcement learning environment with the skrl-wrapped training script, in which I spawn a cup that falls onto the table. The cup is a RigidObjectCfg object, and the table is an AssetBaseCfg.

I have enabled streaming, so I can see that the crash happens at the moment the cup hits the table. Has anyone encountered this, or does anyone know how to fix it?
Thanks!

Checklist

  • [ x ] I have checked that there is no similar issue in the repo (required)
  • [ x ] I have checked that the issue is not in running Isaac Sim itself and is related to the repo

cidxb commented Nov 25, 2024

It seems to be a problem with the table. After I removed it, these errors disappeared. However, another script of mine that uses the same objects works fine: there, the cup simply drops onto the table.

@kellyguo11
Contributor

Hi there, unfortunately the error messages are generic GPU errors coming from the physics simulation. It'll be difficult to narrow down the issue without a repro script. In the meantime, you can try increasing the gpu_* buffer dimensions in PhysxCfg and see if that helps - https://isaac-sim.github.io/IsaacLab/main/source/api/lab/omni.isaac.lab.sim.html#omni.isaac.lab.sim.PhysxCfg
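
As a rough illustration of the buffer suggestion, here is a minimal sketch that doubles every integer `gpu_*` field on a config object. The helper name `scale_gpu_buffers` is hypothetical, not part of Isaac Lab. The field names in the demo (`gpu_max_rigid_contact_count`, `gpu_collision_stack_size`, `gpu_max_num_partitions`) are the ones I understand `PhysxCfg` to expose, so check them against the linked documentation. The demo uses a plain `SimpleNamespace` stand-in so it runs without Isaac Sim installed.

```python
from types import SimpleNamespace


def scale_gpu_buffers(physx_cfg, factor=2, skip=("gpu_max_num_partitions",)):
    """Multiply every integer `gpu_*` attribute of `physx_cfg` by `factor`, in place.

    `gpu_max_num_partitions` is skipped by default on the assumption that it is
    a solver setting rather than a buffer size.
    Returns {name: (old_value, new_value)} so the change can be logged.
    """
    changed = {}
    for name, value in vars(physx_cfg).items():
        if not name.startswith("gpu_") or name in skip:
            continue
        # Only scale plain integers; leave bools, floats, and None untouched.
        if isinstance(value, bool) or not isinstance(value, int):
            continue
        changed[name] = (value, value * factor)
    for name, (_, new_value) in changed.items():
        setattr(physx_cfg, name, new_value)
    return changed


# Stand-in for a PhysxCfg instance (values here are illustrative only).
demo_cfg = SimpleNamespace(
    gpu_max_rigid_contact_count=2**23,
    gpu_collision_stack_size=2**26,
    gpu_max_num_partitions=8,
    bounce_threshold_velocity=0.5,
)
changes = scale_gpu_buffers(demo_cfg, factor=2)
for field, (old, new) in changes.items():
    print(f"{field}: {old} -> {new}")
```

In an Isaac Lab training script this would presumably be called on the environment config before the environment is created, e.g. `scale_gpu_buffers(env_cfg.sim.physx)`, assuming the config exposes its `PhysxCfg` at that attribute path. If the crash persists after one or two doublings, the buffers are probably not the cause and a repro script is still the way forward.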
