fix: bind to a specific local IPv6 address during pair verify #272

roysjosh · 2022-12-20T20:50:06Z

This will prevent future traffic from picking the "best" address to communicate with an accessory. Current products tie sessions to the original IPv6 address and return a CoAP error code 4.04 when a new address attempts to continue the session.

It is likely that this could explode in fun, new ways: the DHCPv6 server returning a new address, the host selecting a new random SLAAC address, etc.

roysjosh · 2022-12-20T20:50:45Z

Cross reference aiocoap#298

roysjosh · 2022-12-20T20:51:28Z

I'm running this locally now. I've been seeing dozens of 4.04 errors a day. Let's see if this cuts that out.

codecov · 2022-12-20T20:52:38Z

Codecov Report

Base: 66.74% // Head: 66.59% // Decreases project coverage by -0.14% ⚠️

Coverage data is based on head (6901939) compared to base (4b02e0d).
Patch coverage: 17.39% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #272      +/-   ##
==========================================
- Coverage   66.74%   66.59%   -0.15%     
==========================================
  Files          71       71              
  Lines        6697     6718      +21     
==========================================
+ Hits         4470     4474       +4     
- Misses       2227     2244      +17

Flag	Coverage Δ
unittests	`66.59% <17.39%> (-0.15%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
aiohomekit/controller/coap/pairing.py	`31.85% <0.00%> (-0.48%)`	⬇️
aiohomekit/controller/coap/connection.py	`29.50% <19.04%> (-0.43%)`	⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

This will prevent future traffic from picking the "best" address to communicate with an accessory. Current products tie sessions to the original IPv6 address and return a CoAP error code 4.04 when a new address attempts to continue the session. It is likely that this could explode in fun, new ways: the DHCPv6 server returning a new address, the host selecting a new random SLAAC address, etc.

roysjosh · 2022-12-21T19:56:07Z

Wrap "get local IPv6 address" function in try/except. This prevents network errors from marking the device as not being provided by the integration.

This does seem to have eliminated frequent 404s from my environment.

A CoAP error code of 4.04 can indicate that the accessory has no record of the HAP session. This will certainly be the case if a device has lost power. Shutdown the session on our side so we can trigger a reconnect.

jfroy · 2022-12-26T18:42:33Z

This seems like a good fix, provided the library can react to local address changes. From the code, _get_local_ipv6_addr lets the kernel pick a source address and interface to connect to the accessory and the library snapshots that and assumes it will be good "forever". I wonder if there's a performant way to monitor that route and reset if it becomes invalid (e.g. SLACC reconfiguration, etc). Though in general with ULA addresses that should not happen.

I wonder if the logic could be made to prioritize ULAs or link-local versus say an ISP publicly routable address that is more subject to change (especially when using privacy-preserving randomized addresses).

Jc2k · 2022-12-26T19:37:52Z

The ip module has various attributes that could help with that: https://docs.python.org/3/library/ipaddress.html#ipaddress.IPv6Address.is_private

Jc2k · 2022-12-26T19:41:17Z

In terms of detecting failed connections, hoping we can get around to testing if the coap transport publishes a sane s# value in mdns. That number should go up if there is a state change that no one observed. So if we see it go up, we know we should restart the session.

Jc2k · 2022-12-26T19:54:49Z

It's nice that we get a new source address every time we pair verify.

My instinct is that it'd be nice if we could have the events service bind to * but set the source address on outgoing traffic. But I don't think that helps much in practice - if the local ip changes the device won't know it and if the remote ip changes we handle it through mdns. So let's not bother.

In terms of detecting our own ip changing, ideally I'd want to defer that to HA. Presumably they have to handle that to update their mdns record.

Jc2k

openthread/openthread#1520 is another case where openthread code had to be mindful of source address.

Jc2k · 2022-12-28T11:08:01Z

At the moment I'm tempted to merge the 4.04 handling and recovering only, as I'd like to try and understand the situations where the source address is wrong before deploying this work around. And I just don't think I'm going to get in front of a computer before the Jan release is out.

I'm struggling to think of scenarios where the source address flips if we have picked the thread ULA of the device as the destination address, the best one i came up with is if your HA loses its thread route but your home router doesn't. Default route would get it from HA to your main router, then your router delivers it to the thread router. But home routers don't generally accept thread routes out of the box, and would rely on your thread devices doing DHCP6, not just your thread router. That would be surprising as they are different L2 segments aren't they?

If we do go ahead with this work around I want to test what does happen if the ipv6 did address of HA did change - hopeful any pending recvs would get cancelled, and bubble up asyncio into our code?

Jc2k · 2022-12-28T14:33:42Z

2.4.2 handles Code.NOT_FOUND but doesn't contain the bind fix, going to put that into HA now. It also lowers the severity of a NOT_FOUND - something that can happen for valid reasons (like a session expiring while your border router is offline, battery changes, mains power cycling, etc) that we can automatically recover from shouldn't show up in the error log.

Jc2k · 2022-12-28T14:47:32Z

And PR to land it in HA - home-assistant/core#84700.

Hopefully I can find time with all my devices (test trashing the network etc ti test recovery) to test the full PR but can't say it's likely before mid jan.

roysjosh · 2023-02-07T14:54:10Z

I upgraded HA which wiped out this full by-hand change. I don't "see" the reconnects any more from the perspective of the frontend but they are still happening w/o the full fix.

Jc2k · 2023-02-07T14:58:33Z

Yep, still on my radar to loop back round to here and test out the full fix when I've got more time to try and break things.

roysjosh force-pushed the fix-404 branch from 8d7f224 to 08c9efd Compare December 21, 2022 19:51

roysjosh force-pushed the fix-404 branch from 08c9efd to 4b1bd35 Compare December 21, 2022 19:52

fix(coap): handle accessories dropping sessions

6901939

A CoAP error code of 4.04 can indicate that the accessory has no record of the HAP session. This will certainly be the case if a device has lost power. Shutdown the session on our side so we can trigger a reconnect.

jfroy mentioned this pull request Dec 26, 2022

More frequent disconnects of HomeKit Thread devices in 2022.12 (CoAP POST returned unexpected code) home-assistant/core#83739

Closed

Jc2k reviewed Dec 28, 2022

View reviewed changes

Jc2k pushed a commit that referenced this pull request Dec 28, 2022

Partial merge of #272 - handle not found error

9345d55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: bind to a specific local IPv6 address during pair verify #272

fix: bind to a specific local IPv6 address during pair verify #272

roysjosh commented Dec 20, 2022

roysjosh commented Dec 20, 2022

roysjosh commented Dec 20, 2022

codecov bot commented Dec 20, 2022 •

edited

Loading

roysjosh commented Dec 21, 2022

jfroy commented Dec 26, 2022

Jc2k commented Dec 26, 2022

Jc2k commented Dec 26, 2022

Jc2k commented Dec 26, 2022 •

edited

Loading

Jc2k left a comment

Jc2k commented Dec 28, 2022

Jc2k commented Dec 28, 2022

Jc2k commented Dec 28, 2022

roysjosh commented Feb 7, 2023

Jc2k commented Feb 7, 2023

fix: bind to a specific local IPv6 address during pair verify #272

Are you sure you want to change the base?

fix: bind to a specific local IPv6 address during pair verify #272

Conversation

roysjosh commented Dec 20, 2022

roysjosh commented Dec 20, 2022

roysjosh commented Dec 20, 2022

codecov bot commented Dec 20, 2022 • edited Loading

Codecov Report

roysjosh commented Dec 21, 2022

jfroy commented Dec 26, 2022

Jc2k commented Dec 26, 2022

Jc2k commented Dec 26, 2022

Jc2k commented Dec 26, 2022 • edited Loading

Jc2k left a comment

Choose a reason for hiding this comment

Jc2k commented Dec 28, 2022

Jc2k commented Dec 28, 2022

Jc2k commented Dec 28, 2022

roysjosh commented Feb 7, 2023

Jc2k commented Feb 7, 2023

codecov bot commented Dec 20, 2022 •

edited

Loading

Jc2k commented Dec 26, 2022 •

edited

Loading