Live stream using ZMQ doesn't work on Mac #34031

Closed
pbassut opened this issue Nov 15, 2024 · 9 comments · Fixed by commaai/msgq#634
@pbassut
Contributor

pbassut commented Nov 15, 2024

Describe the bug

On my C3X, I run:

cd /data/openpilot/cereal/messaging
./bridge &

Then I open cabana, pick ZMQ as the stream method, and point it to my C3X's IP. It hangs on the screen below:
image

The screenshot is not really mine. It's from a user on Discord but the issue is the same.

I know Mac isn't officially supported but wanted to flag this anyway.

Provide a route where the issue occurs

625ccd09daec723c/00000055--3b455c2343/0

openpilot version

0.9.7

Additional info

I have Ubuntu 20.04 on the same machine (dual boot), and it works fine from there, which suggests it's not a network issue or a problem with the C3X.

@pbassut pbassut added the bug label Nov 15, 2024
@deanlee
Contributor

deanlee commented Nov 15, 2024

From reviewing the source code, it seems that it should work on macOS. However, here are a few things to check and troubleshoot:

  1. Firewall Settings: Ensure that the macOS firewall is not blocking the outbound connection from the sub socket to the device (which uses port 63557 for CAN messages). If necessary, try temporarily disabling the firewall for testing, or allow outgoing traffic on this specific port.
  2. ZeroMQ Version Compatibility: Ensure that the Mac is using a compatible version of ZeroMQ with the device. Version differences can sometimes cause connectivity issues. Consider upgrading to the latest version of ZeroMQ to ensure better cross-platform compatibility.

If none of these steps resolve the issue, please let me know, and we can continue troubleshooting.
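The firewall check can be narrowed to one question: does a plain TCP connection from the Mac to the device port complete at all? The following is a minimal sketch of such a probe using POSIX sockets. The helper name `tcp_port_open` and the one-second default timeout are assumptions made for illustration; they are not part of msgq or cabana. It only tests TCP reachability, not the ZeroMQ handshake.

```cpp
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>
#include <cstdint>
#include <string>

// Hypothetical helper: returns true if a TCP connection to ipv4:port
// completes within timeout_ms. IPv4 dotted-quad addresses only.
bool tcp_port_open(const std::string &ipv4, uint16_t port, int timeout_ms = 1000) {
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  if (inet_pton(AF_INET, ipv4.c_str(), &addr.sin_addr) != 1) {
    return false;  // not a valid dotted-quad address
  }

  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return false;

  // Non-blocking connect, so an unreachable host cannot stall for the
  // operating system's default connect timeout.
  int fl = fcntl(fd, F_GETFL, 0);
  fcntl(fd, F_SETFL, fl | O_NONBLOCK);

  bool ok = false;
  int rc = connect(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr));
  if (rc == 0) {
    ok = true;  // connected immediately
  } else if (errno == EINPROGRESS) {
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLOUT;
    if (poll(&pfd, 1, timeout_ms) == 1) {
      // The socket became writable: SO_ERROR tells success from refusal.
      int err = 0;
      socklen_t len = sizeof(err);
      ok = getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0;
    }
  }
  close(fd);
  return ok;
}
```

Calling `tcp_port_open("<device_ip>", 63557)` from the Mac while the bridge is running on the device would separate a blocked or closed port from a problem higher up in the messaging layer.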

@royjr
Contributor

royjr commented Nov 16, 2024

  1. Confirmed firewall inactive on macOS 15.1.
Screenshot 2024-11-15 at 20 19 16
  2. Confirmed zeromq on latest version

brew info zeromq
==> zeromq: stable 4.3.5 (bottled), HEAD
High-performance, asynchronous messaging library
https://zeromq.org/
Installed
/opt/homebrew/Cellar/zeromq/4.3.5_1 (85 files, 6.2MB) *

@deanlee
Contributor

deanlee commented Nov 16, 2024

If there are no firewall or zmq version issues, we can start by adding some debug print statements in msgq/impl_zmq.cc to track the connection process. Replace the ZMQSubSocket::connect and ZMQSubSocket::receive methods with the following debug versions:

int ZMQSubSocket::connect(Context *context, std::string endpoint, std::string address, bool conflate, bool check_endpoint){
  sock = zmq_socket(context->getRawContext(), ZMQ_SUB);
  if (sock == NULL){
    return -1;
  }

  zmq_setsockopt(sock, ZMQ_SUBSCRIBE, "", 0);

  if (conflate){
    int arg = 1;
    zmq_setsockopt(sock, ZMQ_CONFLATE, &arg, sizeof(int));
  }

  int reconnect_ivl = 500;
  zmq_setsockopt(sock, ZMQ_RECONNECT_IVL_MAX, &reconnect_ivl, sizeof(reconnect_ivl));


  full_endpoint = "tcp://" + address + ":";
  if (check_endpoint){
    full_endpoint += std::to_string(get_port(endpoint));
  } else {
    full_endpoint += endpoint;
  }
  printf("Attempting to connect to zmq endpoint: %s\n", full_endpoint.c_str());
  return zmq_connect(sock, full_endpoint.c_str());
}


Message * ZMQSubSocket::receive(bool non_blocking){
  zmq_msg_t msg;
  // Keep the call outside assert() so it still runs when NDEBUG is defined
  int init_rc = zmq_msg_init(&msg);
  assert(init_rc == 0);

  int flags = non_blocking ? ZMQ_DONTWAIT : 0;
  int rc = zmq_msg_recv(&msg, sock, flags);
  Message *r = NULL;

  if (rc >= 0) {
    // Make a copy to ensure the data is aligned
    r = new ZMQMessage;
    r->init((char *)zmq_msg_data(&msg), zmq_msg_size(&msg));
  } else {
    int err = zmq_errno();
    if (non_blocking && err == EAGAIN) {
      printf("Non-blocking receive: No message available\n");
    } else {
      printf("Error receiving message: %d, %s\n", err, zmq_strerror(err));
    }
  }

  zmq_msg_close(&msg);
  return r;
}

Once you've added these debug prints, compile Cabana using scons and then run it with the following command to observe the output:
cabana --stream --zmq <device_ip>

This will help you track connection and message reception details. Let me know what output you get, and we can continue troubleshooting.

@royjr
Contributor

royjr commented Nov 16, 2024

On 3X..

comma@comma-bc60153f:/data/openpilot$ tmux kill-session -t comma; rm -f /tmp/safe_staging_overlay.lock;
comma@comma-bc60153f:/data/openpilot$ ./cereal/messaging/bridge

On macOS, add debug, scons and run..

Checking system → [✔]
Running command → tools/cabana/cabana --stream --zmq 172.16.0.20 │
─────────────────────────────────────────────────────────────────┘

Warning: Setting a new default format with a different version or profile after the global shared context is created may cause issues with context sharing.
Attempting to connect to zmq endpoint: tcp://172.16.0.20:60942
Non-blocking receive: No message available
Non-blocking receive: No message available
Non-blocking receive: No message available
2024-11-16 00:37:40.706 cabana[58889:5200031] +[IMKClient subclass]: chose IMKClient_Modern
2024-11-16 00:37:40.706 cabana[58889:5200031] +[IMKInputSession subclass]: chose IMKInputSession_Modern
Non-blocking receive: No message available
Non-blocking receive: No message available
Non-blocking receive: No message available

@deanlee
Contributor

deanlee commented Nov 16, 2024

Thanks for helping with the troubleshooting! It looks like the issue is caused by a port mismatch between the Mac and the device:

  • Attempting to connect to zmq endpoint: tcp://172.16.0.20:60942
  • Expected port on the device: 63557

It seems the static int get_port(std::string endpoint) function returns different port values on macOS and Linux, likely due to platform differences in how std::hash behaves.

I'll submit a PR to fix this.
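To illustrate why the hash matters: the C++ standard does not specify the values `std::hash<std::string>` returns, so different standard library implementations may map the same endpoint name to different ports. One way to avoid this is a hash defined purely by byte arithmetic, such as 64-bit FNV-1a. The sketch below assumes a hypothetical port range (`kStartPort`, `kMaxPort`) and a hypothetical function name `stable_port`; this thread does not confirm whether the linked PR takes this approach.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a. The result depends only on the input bytes, so every
// platform and standard library computes the same value.
uint64_t fnv1a_64(const std::string &s) {
  uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
  for (unsigned char c : s) {
    h ^= c;
    h *= 0x100000001b3ULL;  // FNV prime
  }
  return h;
}

// Hypothetical port range, chosen for illustration only.
constexpr int kStartPort = 8023;
constexpr int kMaxPort = 65535;

// Hypothetical replacement for a std::hash-based port lookup: maps an
// endpoint name to a port in [kStartPort, kMaxPort).
int stable_port(const std::string &endpoint) {
  return kStartPort + static_cast<int>(fnv1a_64(endpoint) % (kMaxPort - kStartPort));
}
```

With a function like this on both sides, `stable_port("can")` gives the same number on the device and on the Mac. Collisions between endpoint names remain possible with any hash-to-port scheme.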

@royjr
Contributor

royjr commented Nov 16, 2024

added this..

full_endpoint = "tcp://" + address + ":63557";

succeeded..

comma@comma-bc60153f:/data/openpilot$ ./cereal/messaging/bridge
socket [can] connected

but..

Non-blocking receive: No message available
Non-blocking receive: No message available

@deanlee
Contributor

deanlee commented Nov 16, 2024

This might be caused by the pandad on the device not sending CAN messages. You can simulate sending CAN messages on the device using replay (make sure to quit openpilot before running it):

replay --demo --no-vipc

Then, run the bridge, and run Cabana on your Mac to see if it receives the messages.

@royjr
Contributor

royjr commented Nov 16, 2024

Edit: it works!
Screenshot 2024-11-16 at 01 07 50

@pbassut
Contributor Author

pbassut commented Nov 16, 2024

@deanlee How did the mismatching port happen? Never mind, I just read your comment here:

It seems the static int get_port(std::string endpoint) function returns different port values on macOS and Linux, likely due to platform differences in how std::hash behaves.

I'm wondering: if that behaviour is unpredictable, perhaps ./bridge should advertise the port it's listening on when it runs on the device? Cabana could then let the user specify an ip:port combination to connect to. If both parties disclose the full address they're using to communicate, we would spot issues like this much faster.
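As a rough sketch of the ip:port idea, a client could accept either a bare host or a `host:port` string and fall back to the derived port only when none is given. The `Target` struct and `parse_target` function below are hypothetical names for illustration; cabana's actual command-line handling may differ, and IPv6 literals are not handled.

```cpp
#include <optional>
#include <string>

// Hypothetical result type: the host plus an optional explicit port.
struct Target {
  std::string host;
  std::optional<int> port;  // empty when the user gave only a host
};

// Splits "host" or "host:port". Returns std::nullopt when the input is
// empty or the port is not a number in 1..65535.
std::optional<Target> parse_target(const std::string &s) {
  if (s.empty()) return std::nullopt;

  auto colon = s.rfind(':');
  if (colon == std::string::npos) {
    return Target{s, std::nullopt};  // host only, caller derives the port
  }

  std::string host = s.substr(0, colon);
  std::string port_str = s.substr(colon + 1);
  if (host.empty() || port_str.empty() || port_str.size() > 5) {
    return std::nullopt;
  }

  int port = 0;
  for (char c : port_str) {
    if (c < '0' || c > '9') return std::nullopt;  // digits only
    port = port * 10 + (c - '0');
  }
  if (port < 1 || port > 65535) return std::nullopt;

  return Target{host, port};
}
```

Given `172.16.0.20:63557`, the caller would connect to that exact port. Given `172.16.0.20`, it would fall back to the derived port and could print the full address, so a mismatch shows up immediately in the output.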
