Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: bind to a specific local IPv6 address during pair verify #272

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

roysjosh
Copy link
Contributor

This will prevent future traffic from picking the "best" address to communicate with an accessory. Current products tie sessions to the original IPv6 address and return a CoAP error code 4.04 when a new address attempts to continue the session.

It is likely that this could explode in fun, new ways: the DHCPv6 server returning a new address, the host selecting a new random SLAAC address, etc.

@roysjosh
Copy link
Contributor Author

Cross reference aiocoap#298

@roysjosh
Copy link
Contributor Author

I'm running this locally now. I've been seeing dozens of 4.04 errors a day. Let's see if this cuts that out.

@codecov
Copy link

codecov bot commented Dec 20, 2022

Codecov Report

Base: 66.74% // Head: 66.59% // Decreases project coverage by -0.14% ⚠️

Coverage data is based on head (6901939) compared to base (4b02e0d).
Patch coverage: 17.39% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #272      +/-   ##
==========================================
- Coverage   66.74%   66.59%   -0.15%     
==========================================
  Files          71       71              
  Lines        6697     6718      +21     
==========================================
+ Hits         4470     4474       +4     
- Misses       2227     2244      +17     
Flag Coverage Δ
unittests 66.59% <17.39%> (-0.15%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiohomekit/controller/coap/pairing.py 31.85% <0.00%> (-0.48%) ⬇️
aiohomekit/controller/coap/connection.py 29.50% <19.04%> (-0.43%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

This will prevent future traffic from picking the "best" address to
communicate with an accessory. Current products tie sessions to the
original IPv6 address and return a CoAP error code 4.04 when a new
address attempts to continue the session.

It is likely that this could explode in fun, new ways: the DHCPv6 server
returning a new address, the host selecting a new random SLAAC address,
etc.
@roysjosh
Copy link
Contributor Author

Wrap "get local IPv6 address" function in try/except. This prevents network errors from marking the device as not being provided by the integration.

This does seem to have eliminated frequent 404s from my environment.

A CoAP error code of 4.04 can indicate that the accessory has no record
of the HAP session. This will certainly be the case if a device has lost
power. Shutdown the session on our side so we can trigger a reconnect.
@jfroy
Copy link

jfroy commented Dec 26, 2022

This seems like a good fix, provided the library can react to local address changes. From the code, _get_local_ipv6_addr lets the kernel pick a source address and interface to connect to the accessory and the library snapshots that and assumes it will be good "forever". I wonder if there's a performant way to monitor that route and reset if it becomes invalid (e.g. SLACC reconfiguration, etc). Though in general with ULA addresses that should not happen.

I wonder if the logic could be made to prioritize ULAs or link-local versus say an ISP publicly routable address that is more subject to change (especially when using privacy-preserving randomized addresses).

@Jc2k
Copy link
Owner

Jc2k commented Dec 26, 2022

The ip module has various attributes that could help with that: https://docs.python.org/3/library/ipaddress.html#ipaddress.IPv6Address.is_private

@Jc2k
Copy link
Owner

Jc2k commented Dec 26, 2022

In terms of detecting failed connections, hoping we can get around to testing if the coap transport publishes a sane s# value in mdns. That number should go up if there is a state change that no one observed. So if we see it go up, we know we should restart the session.

@Jc2k
Copy link
Owner

Jc2k commented Dec 26, 2022

It's nice that we get a new source address every time we pair verify.

My instinct is that it'd be nice if we could have the events service bind to * but set the source address on outgoing traffic. But I don't think that helps much in practice - if the local ip changes the device won't know it and if the remote ip changes we handle it through mdns. So let's not bother.

In terms of detecting our own ip changing, ideally I'd want to defer that to HA. Presumably they have to handle that to update their mdns record.

Copy link
Owner

@Jc2k Jc2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openthread/openthread#1520 is another case where openthread code had to be mindful of source address.

@Jc2k
Copy link
Owner

Jc2k commented Dec 28, 2022

At the moment I'm tempted to merge the 4.04 handling and recovering only, as I'd like to try and understand the situations where the source address is wrong before deploying this work around. And I just don't think I'm going to get in front of a computer before the Jan release is out.

I'm struggling to think of scenarios where the source address flips if we have picked the thread ULA of the device as the destination address, the best one i came up with is if your HA loses its thread route but your home router doesn't. Default route would get it from HA to your main router, then your router delivers it to the thread router. But home routers don't generally accept thread routes out of the box, and would rely on your thread devices doing DHCP6, not just your thread router. That would be surprising as they are different L2 segments aren't they?

If we do go ahead with this work around I want to test what does happen if the ipv6 did address of HA did change - hopeful any pending recvs would get cancelled, and bubble up asyncio into our code?

Jc2k pushed a commit that referenced this pull request Dec 28, 2022
@Jc2k
Copy link
Owner

Jc2k commented Dec 28, 2022

2.4.2 handles Code.NOT_FOUND but doesn't contain the bind fix, going to put that into HA now. It also lowers the severity of a NOT_FOUND - something that can happen for valid reasons (like a session expiring while your border router is offline, battery changes, mains power cycling, etc) that we can automatically recover from shouldn't show up in the error log.

@Jc2k
Copy link
Owner

Jc2k commented Dec 28, 2022

And PR to land it in HA - home-assistant/core#84700.

Hopefully I can find time with all my devices (test trashing the network etc ti test recovery) to test the full PR but can't say it's likely before mid jan.

@roysjosh
Copy link
Contributor Author

roysjosh commented Feb 7, 2023

I upgraded HA which wiped out this full by-hand change. I don't "see" the reconnects any more from the perspective of the frontend but they are still happening w/o the full fix.

@Jc2k
Copy link
Owner

Jc2k commented Feb 7, 2023

Yep, still on my radar to loop back round to here and test out the full fix when I've got more time to try and break things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants