Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues caused by not shutting down the contained entity #44

Closed
YuanYuYuan opened this issue Nov 25, 2024 · 2 comments · Fixed by ros2/rmw_zenoh#333
Closed

Issues caused by not shutting down the contained entity #44

YuanYuYuan opened this issue Nov 25, 2024 · 2 comments · Fixed by ros2/rmw_zenoh#333

Comments

@YuanYuYuan
Copy link
Collaborator

YuanYuYuan commented Nov 25, 2024

Several failures have been found since ros2#313 was introduced.

  1. iRobot test begins hanging
  2. github.com/ZettaScaleLabs/ros2-simple-performance begins hanging

The workaround is using this branch https://github.com/ZettaScaleLabs/rmw_zenoh/tree/dev/1.0.0-revert-not-shutdown-contained-entities with the buggy commit reverted

@YuanYuYuan
Copy link
Collaborator Author

YuanYuYuan commented Nov 28, 2024

Other known issues

  • rclcpp/test_allocator_memory_strategy (Timeout)

And several failures were found in test_communication.

Root cause

Taking the rclcpp/test_allocator_memory_strategy as the sample, we observe a segfault in this pattern: z_close rmw_context session -> ze_undecalre_publication_cache hangs.

The ideal path should bez_drop rmw_context session -> z_drop publication_cache -> z_drop PublisherData session -> z_close session (if PublisherData is the last holder.

  • This only happens on 1.0 since Session::close does not completely undeclare all the underlying zenoh entities in 0.11.
  • This only happens on PublicationCache since we forgot to "skip" the undeclaration if the session has been closed. Tracked issue: PublicationCache should skip undeclare after the session is closed eclipse-zenoh/zenoh#1621
  • Even if a segmentation fault does not occur, invoking any session-dependent API after the session has been closed is inappropriate.

Solution

According to ros2#313, rmw entities are possible to run after rmw context is shut down. Conceptually, we ensure the zenoh session stay alive as long as any rmw entities using the zenoh session. We can wrap the zenoh session with a std::shared_ptr. Here's the branch for the implementation. https://github.com/ZettaScaleLabs/rmw_zenoh/tree/wip/session-clone

Evaluation

Here's a detailed comparison. The change has been shown to fix many errors in dev/1.0.0.

ros2#276 (comment)

@YuanYuYuan
Copy link
Collaborator Author

Closed by ros2#333

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant