DebuggingThread
These notes assume you are debugging a Thread problem in the context of Home Assistant. They are focussed on HAP (HomeKit Accessory Protocol) but should be broadly applicable to any Thread-based protocol.
This guide is aimed at devices that you have already connected to Thread but that keep losing their connection.
- Are you using VLANs? Home Assistant must be on the same VLAN as your border routers, and all of the border routers must be on that VLAN. This is mandatory. (For Matter, the Matter container must also be on this VLAN.) If you need to use a phone as part of the setup for your device, it must also be on this VLAN.
- Are you using HAOS? You really should. Thread is not well supported on "stock" Linux.
  - Using NetworkManager? It doesn't support multiple border routers. Instead it will rotate which one it is using (potentially disrupting traffic). With more than 3 BRs, you'll see this at least once a minute. If your NetworkManager is older than the one in HAOS, it might have even worse IPv6 deficiencies.
  - Using stock Linux route advertisements? This can work, but not out of the box: you need to set a `sysctl` like `net.ipv6.conf.eth0.accept_ra_rt_info_max_plen` to 64.
  - Using `systemd-networkd`? We know it has behaviour that is not consistent with the kernel. It might work better than unpatched NetworkManager. Please let us know how testing that goes.
  - Not using the HAOS kernel patches? If you run OTBR or have IP forwarding configured, Linux might stop its routing logic from paying attention to neighbour discovery data. Why is that a problem? When a route is no longer valid, it can stay in the route table until its TTL (time to live) expires. That is normally 30 minutes. So your network might break for 30 minutes every time your BR changes its link-local IP address (you can't make those static). If the routing logic was consulting the neighbour cache, it would stop using that route in under a minute.
- Using Apple border routers?
  - Make sure you are running iOS 17 on all your Apple routers. They have TREL. TREL lets the BRs mesh over WiFi and ethernet as well as Thread. This basically solves mesh partitions.
  - Make sure your BRs are actually on the same network. One of mine learned about the non-IoT VLAN and kept switching between networks, causing carnage.
  - Using a SkyConnect that is connected to your Apple BRs means only erratic support for TREL.
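If you are on stock Linux, a small sketch like the one below can show whether the route advertisement setting from the checklist above is in place. It only reads values from `/proc/sys`. The helper name `ra_settings` and its optional second argument are made up for illustration, and `eth0` is just the example interface used above. It also prints `accept_ra`, on the assumption that forwarding is enabled on your host: in that case the kernel only accepts router advertisements when `accept_ra` is 2.

```shell
# Hypothetical helper: print the RA-related sysctl values for one interface.
# $1 = interface name, $2 = sysctl root (optional, defaults to /proc/sys).
ra_settings() {
    iface="$1"
    root="${2:-/proc/sys}"
    for key in accept_ra accept_ra_rt_info_max_plen; do
        file="$root/net/ipv6/conf/$iface/$key"
        if [ -r "$file" ]; then
            printf '%s=%s\n' "$key" "$(cat "$file")"
        else
            # Interface missing, or the kernel was built without this option.
            printf '%s=unavailable\n' "$key"
        fi
    done
}

# Example (run on the host, not inside the HA container):
#   ra_settings eth0
# To change the value until the next reboot (as root):
#   sysctl -w net.ipv6.conf.eth0.accept_ra_rt_info_max_plen=64
```

If `accept_ra_rt_info_max_plen` is still at the kernel default of 0, the route information sent by your border routers is being ignored.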
We'll need to be able to poke at your problematic devices. To do that we need an mDNS tool like this. We are interested in records in the `_hap._udp` namespace. All your border routers should be visible in `_meshcop._udp`. (For Matter, `_matter._tcp`.)
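As a rough sketch of what that looks like from a command line, assuming you have a Linux machine on the same VLAN with `avahi-browse` (from avahi-utils) installed: the helper below takes the parsable output of `avahi-browse -rtp` and lists each resolved host once. The name `resolved_hosts` is made up, and the field positions (hostname in field 7, address in field 8) are my understanding of the parsable format, so check them against your own output.

```shell
# Hypothetical helper: read `avahi-browse -rtp <type>` output on stdin and
# print "hostname address" once for every resolved ("=") record.
resolved_hosts() {
    awk -F';' '$1 == "=" { print $7, $8 }' | sort -u
}

# Examples (run on a Linux machine on the same VLAN):
#   avahi-browse -rtp _meshcop._udp | resolved_hosts   # border routers
#   avahi-browse -rtp _hap._udp     | resolved_hosts   # HAP over Thread devices
```

On a Mac, `dns-sd -B _meshcop._udp` gives a similar live view without installing anything. If a border router you own is missing from the `_meshcop._udp` list, sort that out before looking at individual devices.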
If you are running HA Core, you should have this covered. You should be able to SSH into your Docker host, run `docker ps` and see your Home Assistant container.
If you are running HAOS, it is a little trickier. You need to follow this guide to set up SSH access on port 22222.
When you've done this you should be able to use `ssh` on your Mac or Linux desktop (or PuTTY on Windows) to remotely connect to HAOS.
You need to get to the stage where you can run `docker ps` and see a list of containers running on your system.
Run `docker ps` to see a list of containers running on your system.
Use `docker exec -it NAME bash` to get a shell inside your Home Assistant container. On HAOS, replace `NAME` with `homeassistant`.
Any changes you make in this container will only persist until your next upgrade (changes in `/config` are permanent, of course).
You will need to install the tools below again every time you upgrade HA.
apk --no-cache add iproute2 tcpdump
Thread is designed for consumer networks and to be plug and play. This is in tension with most "advanced" users who have "Professional" or "Enterprise" grade home networks.
Your home router is basically not involved in this process. Imagine that your BRs were connected by ethernet. In the ideal case, you'd have a single "dumb" switch which all your BRs are connected to, and HA would also be directly connected to that switch. We might stretch that basic model and have multiple switches and WiFi hotspots, but at its core that is the environment Thread expects: a single flat network where everything is configured automatically over IPv6.
Border routers will self-configure. In general, they will define their own ULA network on top of whatever network you have. So your computer might pick a link-local address for itself (`fe80`), get an address from your home router and also get a third address from your border routers.
If you have public IPv6 your border routers might ask your router for a range of IPv6 addresses of their own over DHCPv6. When this happens your Thread devices will get public IPv6 addresses in that range. Depending on the firewall configuration of your router, you might find they are reachable from the internet. But they will almost certainly be able to make outbound connections for themselves.
If you don't have public IPv6, your devices might still be able to make outbound network connections for themselves using NAT64 (https://openthread.io/codelabs/openthread-border-router-nat64).
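To make NAT64 a little less abstract: a Thread device reaches an IPv4 host by sending to an IPv6 address that has the IPv4 address embedded in its last 32 bits. The sketch below builds such an address. It assumes the well-known prefix `64:ff9b::/96` from RFC 6052 purely for illustration; the prefix your border router actually advertises may well be different. The function name `nat64_addr` is made up.

```shell
# Hypothetical helper: embed an IPv4 address in a /96 NAT64 prefix.
# $1 = dotted IPv4 address
# $2 = prefix text ending in ":" (optional, defaults to the well-known prefix)
# The body runs in a subshell so the IFS change does not leak out.
nat64_addr() (
    prefix="${2:-64:ff9b::}"
    IFS=.
    set -- $1                      # split the IPv4 address into four octets
    printf '%s%02x%02x:%02x%02x\n' "$prefix" "$1" "$2" "$3" "$4"
)

# Example:
#   nat64_addr 192.0.2.1
```

If you see addresses of this shape in a packet capture, that is outbound IPv4 traffic from a Thread device going through the border router's NAT64.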
It's important to understand that the SkyConnect acts more like a Switch than a Router.
Use `ip -6 route` inside your HA container to see your route table.
Use `ip -6 neigh` inside your HA container to see your neighbour cache.
Use `ip -6 a` to see the IPv6 addresses assigned to your interfaces.
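One thing worth doing with these two outputs is comparing them, because a route whose next hop is dead in the neighbour cache is the "stale route" problem described earlier. The sketch below does this by hand, assuming you have saved both outputs to files. The name `stale_routes` is made up, and it relies on the neighbour state being the last word on each `ip -6 neigh` line, which is how iproute2 prints it as far as I know.

```shell
# Hypothetical helper: list routes whose "via" gateway is FAILED or
# INCOMPLETE in the neighbour cache.
# $1 = file with `ip -6 route` output, $2 = file with `ip -6 neigh` output.
stale_routes() {
    awk -v neigh_file="$2" '
        BEGIN {
            # Remember the state (last word) of every neighbour entry.
            while ((getline line < neigh_file) > 0) {
                n = split(line, f, " ")
                if (n > 0) state[f[1]] = f[n]
            }
        }
        {
            # For each route with a gateway, look the gateway up.
            for (i = 1; i < NF; i++) {
                if ($i == "via") {
                    gw = $(i + 1)
                    if (state[gw] == "FAILED" || state[gw] == "INCOMPLETE")
                        print $1, "via", gw, "(" state[gw] ")"
                }
            }
        }
    ' "$1"
}

# Example (inside the HA container):
#   ip -6 route > /tmp/route.txt
#   ip -6 neigh > /tmp/neigh.txt
#   stale_routes /tmp/route.txt /tmp/neigh.txt
```

Any line it prints is a route the kernel may still be using even though the border router behind it is no longer answering at that address.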