Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Populate error_msg for all action result messages (#4341) #4460

Open
wants to merge 48 commits into
base: main
Choose a base branch
from

Conversation

aosmw
Copy link
Contributor

@aosmw aosmw commented Jun 22, 2024


Basic Info

Info Please fill out this column
Ticket(s) this addresses (#4341)
Primary OS tested on Ubuntu
Robotic platform tested on none
Does this PR contain AI generated software? No

Description of contribution in a few bullet points

First pass at populating error_msg for all action result messages.

Description of documentation updates required from your changes

Adds some new error codes to FollowWaypoints + FollowGPSWaypoints

  • NO_WAYPOINTS_GIVEN=602
  • STOP_ON_MISSED_WAYPOINT=603

Future work that may be required in bullet points

For Maintainers:

  • Check that any new parameters added are updated in docs.nav2.org
  • Check that any significant change is added to the migration guide
  • Check that any new features OR changes to existing behaviors are reflected in the tuning guide
  • Check that any new functions have Doxygen added
  • Check that any new features have test coverage
  • Check that any new plugins is added to the plugins page
  • If BT Node, Additionally: add to BT's XML index of nodes for groot, BT package's readme table, and BT library lists

@aosmw aosmw marked this pull request as draft June 22, 2024 12:11
@aosmw aosmw force-pushed the feature/mw/4341_error_msg_2 branch 2 times, most recently from d3be8c2 to da5e15a Compare June 22, 2024 12:31
@SteveMacenski
Copy link
Member

Let me know if you have any questions - but seems like we're covering that in the issue thread instead

@aosmw aosmw force-pushed the feature/mw/4341_error_msg_2 branch 2 times, most recently from 756982f to cdccbdd Compare June 26, 2024 07:39
Copy link

codecov bot commented Jun 26, 2024

Codecov Report

Attention: Patch coverage is 64.64646% with 105 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...avigator/src/navigators/navigate_through_poses.cpp 30.76% 18 Missing ⚠️
...2_bt_navigator/src/navigators/navigate_to_pose.cpp 29.16% 17 Missing ⚠️
nav2_controller/src/controller_server.cpp 35.00% 13 Missing ⚠️
nav2_waypoint_follower/src/waypoint_follower.cpp 40.90% 13 Missing ⚠️
nav2_behaviors/plugins/spin.cpp 40.00% 9 Missing ⚠️
...clude/nav2_behavior_tree/bt_action_server_impl.hpp 75.00% 7 Missing ⚠️
nav2_behaviors/plugins/assisted_teleop.cpp 33.33% 6 Missing ⚠️
...nclude/nav2_behaviors/plugins/drive_on_heading.hpp 66.66% 3 Missing ⚠️
nav2_behaviors/plugins/back_up.cpp 57.14% 3 Missing ⚠️
...e/plugins/action/navigate_through_poses_action.cpp 33.33% 2 Missing ⚠️
... and 10 more
Files with missing lines Coverage Δ
...ee/include/nav2_behavior_tree/bt_action_server.hpp 83.33% <ø> (ø)
...ior_tree/plugins/action/assisted_teleop_action.hpp 100.00% <100.00%> (ø)
...v2_behavior_tree/plugins/action/back_up_action.hpp 100.00% <100.00%> (ø)
...ugins/action/compute_path_through_poses_action.hpp 100.00% <100.00%> (ø)
...ree/plugins/action/compute_path_to_pose_action.hpp 100.00% <100.00%> (ø)
...or_tree/plugins/action/drive_on_heading_action.hpp 100.00% <100.00%> (ø)
...ehavior_tree/plugins/action/follow_path_action.hpp 100.00% <100.00%> (ø)
...e/plugins/action/navigate_through_poses_action.hpp 100.00% <100.00%> (ø)
...or_tree/plugins/action/navigate_to_pose_action.hpp 100.00% <100.00%> (ø)
...ehavior_tree/plugins/action/smooth_path_action.hpp 100.00% <100.00%> (ø)
... and 32 more

... and 1 file with indirect coverage changes

@aosmw aosmw force-pushed the feature/mw/4341_error_msg_2 branch 2 times, most recently from bd49b55 to 54f0402 Compare June 27, 2024 04:51
Copy link
Member

@SteveMacenski SteveMacenski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay here - it is still draft so it was lower on my priority list while getting Jazzy out. It seems generally complete, but there seems to be a couple of things missing. When might this be ready for a full review?

See below for an initial review of things that jumped out

@aosmw
Copy link
Contributor Author

aosmw commented Jul 10, 2024

I can rip through these fixups today.
Its still on a feature branch for me, ie not yet tested in the wild, hence still in draft.

@aosmw aosmw force-pushed the feature/mw/4341_error_msg_2 branch 2 times, most recently from 2fdbaea to 19c5b10 Compare July 10, 2024 06:36
Copy link
Contributor

mergify bot commented Jul 10, 2024

@aosmw, your PR has failed to build. Please check CI outputs and resolve issues.
You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

@SteveMacenski
Copy link
Member

Builds seem to fail from your additions of the error messages:


In file included from /opt/overlay_ws/src/navigation2/nav2_docking/opennav_docking_bt/src/undock_robot.cpp:18:
/opt/overlay_ws/src/navigation2/nav2_docking/opennav_docking_bt/include/opennav_docking_bt/undock_robot.hpp: In static member function ‘static BT::PortsList opennav_docking_bt::UndockRobotAction::providedPorts()’:
/opt/overlay_ws/src/navigation2/nav2_docking/opennav_docking_bt/include/opennav_docking_bt/undock_robot.hpp:93:24: error: ‘error_msg’ was not declared in this scope; did you mean ‘error_t’?
   93 |         BT::OutputPort<error_msg>(
      |                        ^~~~~~~~~
      |                        error_t
/opt/overlay_ws/src/navigation2/nav2_docking/opennav_docking_bt/include/opennav_docking_bt/undock_robot.hpp:93:34: error: no matching function for call to ‘OutputPort<<expression error> >(const char [10], const char [14])’
   93 |         BT::OutputPort<error_msg>(
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~^
   94 |           "error_msg", "Error message"),
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/ros/rolling/include/behaviortree_cpp/tree_node.h:21,
                 from /opt/ros/rolling/include/behaviortree_cpp/leaf_node.h:17,
                 from /opt/ros/rolling/include/behaviortree_cpp/action_node.h:22,
                 from /opt/overlay_ws/install/nav2_behavior_tree/include/nav2_behavior_tree/bt_action_node.hpp:22,
                 from /opt/overlay_ws/src/navigation2/nav2_docking/opennav_docking_bt/include/opennav_docking_bt/undock_robot.hpp:24:
/opt/ros/rolling/include/behaviortree_cpp/basic_types.h:482:1: note: candidate: ‘template<class T> std::pair<std::__cxx11::basic_string<char>, BT::PortInfo> BT::OutputPort(StringView, StringView)’
  482 | OutputPort(StringView name, StringView description = {})
      | ^~~~~~~~~~
/opt/ros/rolling/include/behaviortree_cpp/basic_types.h:482:1: note:   template argument deduction/substitution failed:
/opt/overlay_ws/src/navigation2/nav2_docking/opennav_docking_bt/include/opennav_docking_bt/undock_robot.hpp:93:34: error: template argument 1 is invalid
   93 |         BT::OutputPort<error_msg>(
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~^
   94 |           "error_msg", "Error message"),
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/ros/rolling/include/behaviortree_cpp/basic_types.h:569:55: note: candidate: ‘template<class T> std::pair<std::__cxx11::basic_string<char>, BT::PortInfo> BT::OutputPort(StringView, StringView, StringView)’
  569 | [[nodiscard]] inline std::pair<std::string, PortInfo> OutputPort(StringView name,
      |                                                       ^~~~~~~~~~
/opt/ros/rolling/include/behaviortree_cpp/basic_types.h:569:55: note:   candidate expects 3 arguments, 2 provided
/opt/overlay_ws/src/navigation2/nav2_docking/opennav_docking_bt/include/opennav_docking_bt/undock_robot.hpp:82:30: error: cannot convert ‘<brace-enclosed initializer list>’ to ‘BT::PortsList’ {aka ‘std::unordered_map<std::__cxx11::basic_string<char>, BT::PortInfo>’}
   82 |     return providedBasicPorts(
      |            ~~~~~~~~~~~~~~~~~~^
   83 |       {
      |       ~                       
   84 |         BT::InputPort<std::string>(
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~
   85 |           "dock_type", "The dock plugin type, if not previous instance used for docking"),
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   86 |         BT::InputPort<float>(
      |         ~~~~~~~~~~~~~~~~~~~~~ 
   87 |           "max_undocking_time", 30.0, "Maximum time to get back to the staging pose"),
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   88 | 
      |                               
   89 |         BT::OutputPort<ActionResult::_success_type>(
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   90 |           "success", "If the action was successful"),
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   91 |         BT::OutputPort<ActionResult::_error_code_type>(
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   92 |           "error_code_id", "Error code"),
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   93 |         BT::OutputPort<error_msg>(
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~
   94 |           "error_msg", "Error message"),
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   95 |       });
      |       ~~                      
/opt/overlay_ws/install/nav2_behavior_tree/include/nav2_behavior_tree/bt_action_node.hpp:118:57: note:   initializing argument 1 of ‘static BT::PortsList nav2_behavior_tree::BtActionNode<ActionT>::providedBasicPorts(BT::PortsList) [with ActionT = nav2_msgs::action::UndockRobot; BT::PortsList = std::unordered_map<std::__cxx11::basic_string<char>, BT::PortInfo>]’
  118 |   static BT::PortsList providedBasicPorts(BT::PortsList addition)
      |                                           ~~~~~~~~~~~~~~^~~~~~~~
gmake[2]: *** [CMakeFiles/opennav_undock_action_bt_node.dir/build.make:76: CMakeFiles/opennav_undock_action_bt_node.dir/src/undock_robot.cpp.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:202: CMakeFiles/opennav_undock_action_bt_node.dir/all] Error 2
gmake: *** [Makefile:146: all] Error 2

Its still on a feature branch for me, ie not yet tested in the wild, hence still in draft.

Sounds good! In that case, I can give it a full review, I got through about ~half of it yesterday and I can do the other half right now.

@aosmw aosmw force-pushed the feature/mw/4341_error_msg_2 branch 2 times, most recently from 5ace62c to 135bbb5 Compare July 12, 2024 00:04
@aosmw aosmw force-pushed the feature/mw/4341_error_msg_2 branch 2 times, most recently from 5a1e7aa to 0461cfc Compare August 2, 2024 05:04
@aosmw
Copy link
Contributor Author

aosmw commented Aug 2, 2024

Status Report: I am still dog fooding this in my own backport to humble. I am successfully propagating useful error messages to our navigator action clients.

Tip for any zenoh users (zenoh-bridge-ros2dds 0.11.0-stable), default actions timeout after 5 seconds and you won't get a result, here is a snippet of zenoh config that may help.

        queries_timeout: {                                                                                                                                                                                                                                                                                               
          transient_local_subscribers: 3.0,                                                                                                                                                                                                                                                                              
          services: [                                                                                                                                                                                                                                                                                                    
            ".*=3.0",                                                                                                                                                                                                                                                                                                    
          ],                                                                                                                                                                                                                                                                                                             
          actions: {                                                                                                                                                                                                                                                                                                     
            send_goal: 1.0,                                                                                                                                                                                                                                                                                              
            cancel_goal: 1.0,                                                                                                                                                                                                                                                                                            
            get_result: [                                                                                                                                                                                                                                                                                                
              ".*/navigate_to_pose=3600",                                                                                                                                                                                                                                                                                
              ".*/navigate_through_poses=3600",                                                                                                                                                                                                                                                                          
              ".*/navigate_complete_coverage=3600",                                                                                                                                                                                                                                                                      
              ".*/follow_waypoints=3600",                                                                                                                                                                                                                                                                                
              ".*/follow_path=3600",                                                                                                                                                                                                                                                                                     
              ".*/compute_path_to_pose=30",                                                                                                                                                                                                                                                                              
              ".*/compute_path_through_poses=30",                                                                                                                                                                                                                                                                        
              ".*/compute_coverage_path=30",                                                                                                                                                                                                                                                                             
              ".*/smooth_path=15",                                                                                                                                                                                                                                                                                       
              ".*/backup=15",                                                                                                                                                                                                                                                                                            
              ".*/assisted_teleop=3600",                                                                                                                                                                                                                                                                                 
            ]                                                                                                                                                                                                                                                                                                            
          }  

I still need to hunt down the exception handling triggering [bt_navigator]: Action server failed while executing action callback: "basic_string: construction from null is not valid" showing up in test_waypoint_follower

    [tester_node-18] [INFO] [1722577025.528186847] [nav2_waypoint_tester]: Sending goal request...
    [waypoint_follower-13] [INFO] [1722577025.528650856] [waypoint_follower]: Received follow waypoint request with 1 waypoints.
    [bt_navigator-12] [INFO] [1722577025.529077965] [bt_navigator]: Begin navigating from current location (1.37, -0.48) to (100.00, 100.00)
    [planner_server-10] [WARN] [1722577025.529770042] [planner_server]: GridBasedplugin failed to plan from (1.37, -0.48) to (100.00, 100.00): "Goal Coordinates of(100.000000, 100.000000) was outside bounds"
    [planner_server-10] [WARN] [1722577025.529823778] [planner_server]: [compute_path_to_pose] [ActionServer] Aborting handle.
    [tester_node-18] [INFO] [1722577025.533000946] [nav2_waypoint_tester]: Goal accepted
    [tester_node-18] [INFO] [1722577025.533496865] [nav2_waypoint_tester]: Waiting for 'follow_waypoints' action to complete
    [tester_node-18] [INFO] [1722577025.536710890] [nav2_waypoint_tester]: Received amcl_pose
    [bt_navigator-12] [ERROR] [1722577025.559278904] [bt_navigator]: Goal failed
    [bt_navigator-12] [WARN] [1722577025.559374968] [bt_navigator]: [navigate_to_pose] [ActionServer] Aborting handle.
    [bt_navigator-12] [ERROR] [1722577025.559574641] [bt_navigator]: Action server failed while executing action callback: "basic_string: construction from null is not valid"
    [waypoint_follower-13] [INFO] [1722577025.978807964] [waypoint_follower]: Failed to process waypoint 0, moving to next.
    [waypoint_follower-13] [INFO] [1722577025.978852044] [waypoint_follower]: Completed all 1 waypoints requested.
    [tester_node-18] [INFO] [1722577025.981444559] [nav2_waypoint_tester]: Goal failed to process all waypoints, missed 1 wps.
    [tester_node-18] Traceback (most recent call last):
    [tester_node-18]   File "/opt/overlay_ws/src/navigation2/nav2_system_tests/src/waypoint_follower/tester.py", line 278, in <module>
    [tester_node-18]     main()
    [tester_node-18]   File "/opt/overlay_ws/src/navigation2/nav2_system_tests/src/waypoint_follower/tester.py", line 241, in main
    [tester_node-18]     test.action_result.missed_waypoints[0].error_code
    ```

@SteveMacenski
Copy link
Member

"basic_string: construction from null is not valid"

That's a classic decoding issue in the BT. I'm guessing you have the wrong type set for the input/output port. All the blackboard and port objects gets stringified to store, so its having a problem with that process. I'd look into the types and make sure that (A) they're types that have conversion functions implemented (or basic types) and (B) you're both setting the port correctly and setting its value to the right type in the BT node. I think we ran into an issue with a couple being wrong, so maybe you just missed one more.

Exciting update! Let me know how the dogfooding goes!

@aosmw aosmw force-pushed the feature/mw/4341_error_msg_2 branch from 0461cfc to bcb0638 Compare August 10, 2024 11:33
@aosmw
Copy link
Contributor Author

aosmw commented Aug 10, 2024

Finally found the "basic_string: construction from null is not valid" - details of how in the commit message bcb0638

@SteveMacenski
Copy link
Member

SteveMacenski commented Aug 15, 2024

Woof, I see that error, that's an annoying one to find. I'm glad you got that unblocked - copy paste errors can be the worst!

Let me know if I'm blocking you :-)

@aosmw
Copy link
Contributor Author

aosmw commented Aug 20, 2024

You are not blocking me. I am STILL hunting down what I think is a cancellation that gets reported by the navigator action server as an abort. The result error_code = 0 and error_msg is empty.

Use SFINAE to log get_error_details_if_availble() in ActionT::Result

Ideally all Action messages would have error_code and error_msg
fields in their Result.  BUT the tests use standard messages that
do not and will not.  i.e. test_msgs/action/Fibonacci.action

Some Action messages only have error_code and not error_msg

Signed-off-by: Mike Wake <[email protected]>
Key point is only use the internal_error_(code|msg)_
if the action result error_code is 0.

Signed-off-by: Mike Wake <[email protected]>
…on#4341)

Clear output path for ComputePathThroughPoses similar to
ComputePathToPose

Explicitly note NOT to clear error_code_id and error_msg
behavior tree output ports.  Otherwise we cannot read them
when the navigator reports errors.

Signed-off-by: Mike Wake <[email protected]>
…s-navigation#4341)

Make it consistent with NavigateThroughPoses.
This was introduced when changing logic to remove
internal exception.

Signed-off-by: Mike Wake <[email protected]>
…ion#4341)

Allow of reason for missed waypoint as error_msg.

Signed-off-by: Mike Wake <[email protected]>
Include error_msg in reason for missed waypoint

Signed-off-by: Mike Wake <[email protected]>
…#4341)

It was a red herring in search for error_msg clearing bug on abort

Signed-off-by: Mike Wake <[email protected]>
Use -R to invoke the regular expresion.

Signed-off-by: Mike Wake <[email protected]>
@aosmw aosmw force-pushed the feature/mw/4341_error_msg_2 branch from c3788b6 to f76ec0a Compare October 4, 2024 01:00
@aosmw
Copy link
Contributor Author

aosmw commented Oct 4, 2024

If you call Navigate To Pose in a full-simulation CI test from nav2 system tests, then check the populated error_code and error_msg within that action message, does that not test that capability? The high-level navigation action's error info is populated from the blackboard's, no?

At least for the highest of high level integration testing, you could hang that off of the navigate to pose and failure to navigate to pose actions in the nav2 system tests. That wouldn't be sufficient for full testing of this feature, but its a good smoke test of the full pipeline to include in addition to more tests at the module level. I would want that anyway to make sure 'it all works once put together' in a testable way to know if someone broke something subtle in 1, 2, 10 years.

That smoke test (if you also do one for navigate through poses) should cover the BT -> NavigateToPose/NavigateThroughPoses action population from that smoke test fully. So then its just a matter of (a) the BT nodes populating the BT blackboard, (b) the proper selection of the highest priority code/message pairing (which a simple unit test should be able to handle, no?), and (c) the Task Servers populating the Action for the BT nodes to receive.

For the module level tests, I'm sure it'll be a bit of a puzzle, but once you figure out a solution, it'll be copy+paste for all N instances needed since all the BT nodes and behavior tree testing resources are all very mirrored. Look at the tests within BT Navigator and BT Nodes itself might inspire some designs since we test other elements there.

I haven't had the head space to take this on so no updates of significance.

For what its worth I have rebased this feature on latest main again.

Any updates? This would be great to have in as a feature to discuss at ROSCon in our Nav2 birds of a feather session!
Are you going to be in attendance of ROSCon?

I would love to go to ROSCon, but alas its not going to happen this year.

@SteveMacenski
Copy link
Member

I completely understand that! Even if it takes a bit before its completed, it is nonetheless an extremely valuable update once its available 😄 I'd always rather slow and thoughtful over fast and rushed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants