Monitor connection to Border Router (CON-1415) #1153

Open
jzwyssig opened this issue Nov 13, 2024 · 4 comments

Comments

@jzwyssig

jzwyssig commented Nov 13, 2024

The event callback in esp_matter::start provides useful information, such as whether the WiFi station has connected or disconnected. I'd like to achieve the same for Thread, with the goal of detecting when the Border Router is unavailable or has been disconnected. Typically, if there's no OTBR, the device changes its role. Unfortunately, this is not always the case. Sometimes I also see network data and address changes. Currently, I use the kThreadStateChange event to monitor changes:

    case chip::DeviceLayer::DeviceEventType::kThreadStateChange:
    {
        // Several flags can be set in the same event, so check each one independently.
        if (event->ThreadStateChange.RoleChanged) {
            ESP_LOGW(TAG, "Thread role changed");
        }
        if (event->ThreadStateChange.AddressChanged) {
            ESP_LOGW(TAG, "Thread address changed");
        }
        if (event->ThreadStateChange.NetDataChanged) {
            ESP_LOGW(TAG, "Thread network data changed");
        }
        if (event->ThreadStateChange.ChildNodesChanged) {
            ESP_LOGW(TAG, "Thread child nodes changed");
        }
        break;
    }

However, it sometimes takes over a minute before any event is triggered! Additionally, the events themselves are not very helpful. I would need to investigate specific APIs in OpenThread or connectedhomeip. Is there a straightforward solution to this issue?

@github-actions (bot) changed the title from "Monitor connection to Border Router" to "Monitor connection to Border Router (CON-1415)" on Nov 13, 2024
@wqx6
Contributor

wqx6 commented Nov 14, 2024

These events are posted to the Matter context from OpenThread's state-change callback, so there should not be a long delay (one minute is far too long for posting an event). Could you check whether the post succeeds in the state-change callback?

Note that the role change will not happen immediately after you shut down the Thread Border Router. It takes some time for a Thread end device to notice that its parent router has disappeared and then either select a new router or become a router/leader itself, because it first needs to detect whether there are other routers it can attach to.

If you want more information about Thread state changes, you can also try the OPENTHREAD_EVENT events in ESP-IDF.

@jzwyssig
Author

Thank you, @wqx6, for the quick response and helpful hints! The ESP OpenThread events are indeed useful. However, some of them are already propagated by the esp-matter and connectedhomeip event loops. I did some further testing, and here are my findings so far:

  • Roles are not reliable indicators of connection status in my case:
    If the TBR is not present, my device is promoted to a Router/Leader role. I prefer not to disable router functionality, as my device is always powered (24/7) and has no power constraints. Otherwise, I could just rely on the child role.

  • Behavior when disconnecting the Apple HomePod:
    After unplugging the HomePod, the only log entries I see are shown below. The first OpenThread event occurs after 200 seconds and is OPENTHREAD_EVENT_LOST_IP6. I could probably actively ping the TBR after such UDP messages to know if it is still reachable. However, there are no ESP OpenThread events for this.

    W(188816) OPENTHREAD:[W] Mle-----------: Failed to process UDP: Duplicated
    W(188816) OPENTHREAD:[W] Mle-----------: Failed to process UDP: Duplicated
    
  • Monitoring the connection to a TBR:
    I wrote a simple monitor function to check whether my device is connected to a TBR. Even with this polling mechanism, I only receive the disconnected event after ~200 seconds. Interestingly, when I plug the HomePod back in, it takes only ~20 seconds for the OPENTHREAD_EVENT_GOT_IP6 event to fire, and my monitoring function also starts returning true.

bool is_tbr_connected(otInstance *instance)
{
    otExternalRouteConfig route_config;
    otNetworkDataIterator iterator = OT_NETWORK_DATA_ITERATOR_INIT;
   
    // Iterate over all external routes
    while (otNetDataGetNextRoute(instance, &iterator, &route_config) == OT_ERROR_NONE)
    {
        if (route_config.mRloc16 != 0) {
            // Found a valid external route
            ESP_LOGI(TAG, "TBR detected: RLOC16 = 0x%04x, Prefix Length = %d", route_config.mRloc16, route_config.mPrefix.mLength);
            return true;
        }
    }
    return false;
}

My primary goal is to indicate whether my device has a connection to a TBR or not. I understand that Thread is a robust protocol, and the network organizes itself. If a TBR gets disconnected, another one might be promoted, and this process takes time. Currently, the OPENTHREAD_EVENT_LOST_IP6 and OPENTHREAD_EVENT_GOT_IP6 events provide the information I need, but the delay is too long for my requirements.

Any advice or suggestions on how to efficiently monitor the connection to a TBR would be greatly appreciated!

@wqx6
Contributor

wqx6 commented Nov 18, 2024

Hi, the Thread End Device cannot immediately detect if the Thread Border Router (BR) gets disconnected, as the BR cannot notify other devices before disconnecting. The leader synchronizes with each router every 90 seconds and updates the router information in the network data. Therefore, if the Border Router disconnects, it may take up to 90 seconds for the leader to recognize the disconnection. (If the Border Router is the leader, additional time is required for another node to become the leader.) Once the leader detects the disconnection, it will update the OMR prefix in the network data.

But you can get a callback when the network data is changed:

static void network_data_change_callback(otChangedFlags changed_flags, void* ctx)
{
    if (changed_flags & OT_CHANGED_THREAD_NETDATA) {
        otBorderRouterConfig br_config;
        otNetworkDataIterator iterator = OT_NETWORK_DATA_ITERATOR_INIT;
        if (OT_ERROR_NOT_FOUND == otNetDataGetNextOnMeshPrefix(esp_openthread_get_instance(), &iterator, &br_config)) {
            printf("No Thread BR\n");
        }
    }
} 
...
otSetStateChangedCallback(esp_openthread_get_instance(), network_data_change_callback, NULL);

@jzwyssig
Author

jzwyssig commented Nov 19, 2024

Thank you @wqx6, I used your code

static void matter_api_tbr_check_connection()
{
    static int8_t connected = -1; // -1: unknown, 0: disconnected, 1: connected
    otBorderRouterConfig br_config;
    otNetworkDataIterator iterator = OT_NETWORK_DATA_ITERATOR_INIT;
    otError error = otNetDataGetNextOnMeshPrefix(esp_openthread_get_instance(), &iterator, &br_config);
    if (error == OT_ERROR_NOT_FOUND && connected != 0)
    {
        connected = 0;
        ESP_LOGW(TAG, "Thread Border Router is disconnected");
    }
    else if (error == OT_ERROR_NONE && connected != 1)
    {
        connected = 1;
        ESP_LOGW(TAG, "Thread Border Router is connected");
    }
}

and integrated it into the event callback in esp_matter::start as follows. If no TBR is connected after boot, the netData change event is not fired, which is why I also call the check from the connectivity change event.

static void matter_api_event_cb(const ChipDeviceEvent *event, intptr_t arg)
{
     switch (event->Type)
     {
          case chip::DeviceLayer::DeviceEventType::kThreadConnectivityChange:
          {
              ESP_LOGI(TAG, "Thread connectivity changed");
              matter_api_tbr_check_connection();
              break;
          }
          case chip::DeviceLayer::DeviceEventType::kThreadStateChange:
          {        
              if (event->ThreadStateChange.NetDataChanged)
              {
                  ESP_LOGD(TAG, "Thread network data changed");
                  matter_api_tbr_check_connection();
              }
              break;
          }
     }
}
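
Because the check above is called from two different events, it has to report only real transitions. As a rough sketch, that edge-detection step can be pulled out into a small function with no OpenThread dependency, which makes it easy to test on a host. The names tbr_link_state_t and tbr_link_update are hypothetical and not part of esp-matter or OpenThread; prefix_found is assumed to be the result of comparing the otNetDataGetNextOnMeshPrefix return value with OT_ERROR_NONE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Tri-state link status: UNKNOWN until the first check has run. */
typedef enum {
    TBR_LINK_UNKNOWN = 0,
    TBR_LINK_DOWN,
    TBR_LINK_UP,
} tbr_link_state_t;

/*
 * Hypothetical helper (not an esp-matter or OpenThread API).
 * Feeds the result of one network-data check into the tracker and returns
 * true only when the state actually changed, so the caller logs or notifies
 * once per transition even if several events trigger a check.
 */
static bool tbr_link_update(tbr_link_state_t *state, bool prefix_found)
{
    assert(state != NULL);

    tbr_link_state_t next = prefix_found ? TBR_LINK_UP : TBR_LINK_DOWN;
    if (*state == next) {
        return false; /* no transition, nothing to report */
    }
    *state = next;
    return true;
}
```

Starting from TBR_LINK_UNKNOWN means the very first check always reports a transition, which matches the intent of initializing the flag to -1 in the function above.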

Do you know if the OPENTHREAD_EVENT_LOST_IP6 and OPENTHREAD_EVENT_GOT_IP6 events are always fired after the TBR is disconnected or connected? Or is it safer to use the netData change event?

Also, I found a similar question here. As initially suggested, it seems the best option might be to get the IP address of the TBR and ping it periodically. Do you have any suggestions on how to implement this?
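
One possible shape for the bookkeeping behind such a periodic ping is sketched below. It counts consecutive missed replies and only reports the TBR as unreachable once a limit is hit, so a single lost packet does not flip the state. How the probe is sent (for example an ICMPv6 echo) and how the TBR address is obtained are deliberately left out. The type and function names are hypothetical, and the miss limit and probe interval are assumptions to be tuned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tracker, not part of esp-matter, ESP-IDF or OpenThread. */
typedef struct {
    uint8_t miss_limit; /* consecutive failed probes before reporting "down" */
    uint8_t misses;     /* current run of failed probes */
    bool reachable;     /* last reported state */
} tbr_probe_tracker_t;

static void tbr_probe_tracker_init(tbr_probe_tracker_t *t, uint8_t miss_limit)
{
    assert(t != NULL && miss_limit > 0);
    t->miss_limit = miss_limit;
    t->misses = 0;
    t->reachable = false; /* assume unreachable until the first reply */
}

/* Record one probe result; returns true when the reachable state flips. */
static bool tbr_probe_tracker_record(tbr_probe_tracker_t *t, bool reply_received)
{
    assert(t != NULL);

    bool before = t->reachable;
    if (reply_received) {
        t->misses = 0;
        t->reachable = true;
    } else {
        if (t->misses < t->miss_limit) {
            t->misses++; /* saturate so the counter cannot overflow */
        }
        if (t->misses >= t->miss_limit) {
            t->reachable = false;
        }
    }
    return t->reachable != before;
}
```

With a probe every 5 seconds and a miss limit of 3, for instance, a disconnect would be reported roughly 15 seconds after the last successful reply, plus whatever timeout is used to wait for each reply.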
