HOAS resolv.conf has wrong search path in homeassistant container #118

Open
ToxicFrog opened this issue Aug 24, 2023 · 10 comments
Labels
enhancement New feature or request

Comments

@ToxicFrog

Describe the issue you are experiencing

I've been having DNS problems with a newly set up HAOS install, where it couldn't resolve any local hostnames. The logs showed it trying to resolve names like timelapse.local.hass.io rather than just timelapse or using the DNS search path provided by DHCP.

ha dns info is fine and nslookup works as expected on both the host and in the hassio_dns container. However, looking inside the main homeassistant container we see:

$ docker exec -it homeassistant cat /etc/resolv.conf
search local.hass.io
nameserver 172.32.30.3

That is definitely not the correct DNS search path, and it doesn't match the one in the host system or the DNS container! This looks similar to home-assistant/operating-system#454, but that was fixed years ago.
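
To compare what the container's resolver sees against the host or the DNS container, the two directives in that file can be pulled out mechanically. This is a minimal Go sketch that assumes only the standard resolv.conf syntax; `parseResolvConf` is a hypothetical helper written for illustration, not part of Home Assistant.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseResolvConf extracts the search domains and nameservers from the text
// of a resolv.conf file. Hypothetical helper for illustration only.
func parseResolvConf(content string) (search, nameservers []string) {
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 {
			continue // blank line or directive without a value
		}
		switch fields[0] {
		case "search":
			// If several search lines exist, the last one wins.
			search = fields[1:]
		case "nameserver":
			nameservers = append(nameservers, fields[1])
		}
	}
	return search, nameservers
}

func main() {
	// Contents copied from the container output shown in this issue.
	conf := "search local.hass.io\nnameserver 172.32.30.3\n"
	search, ns := parseResolvConf(conf)
	fmt.Println("search:", search, "nameservers:", ns)
}
```

Running the same function over the host's resolv.conf and diffing the two `search` results makes the mismatch described here easy to see.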

Furthermore, I can't even fix this by editing /etc/resolv.conf in the container, because it gets overwritten every time HA restarts. As a result, HA is basically nonfunctional for me right now.

What operating system image do you use?

ova (for Virtual Machines)

What version of Home Assistant Operating System is installed?

10.5

Did you upgrade the Operating System?

No

Steps to reproduce the issue

  1. Install HAOS in an environment where DHCP provides a local DNS search path.
  2. Configure HA with a hostname that matches that search path. (I suspect this part isn't necessary and you can name it whatever you like.)
  3. Add an integration like MPD using an unqualified hostname.
  4. Observe as it fails to talk to the device. Check the DNS logs and see lots of NXDOMAIN for hostname.local.hass.io rather than whatever your local DNS search path is.

Anything in the Supervisor logs that might be useful for us?

Nope. I was hoping for a nice "overwriting DNS configuration in container" smoking gun or something.

Anything in the Host logs that might be useful for us?

No.

System information

System Information

version core-2023.8.3
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.11.4
os_name Linux
os_version 6.1.45
arch x86_64
timezone America/Toronto
config_dir /config
Home Assistant Cloud
logged_in false
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok
Home Assistant Supervisor
host_os Home Assistant OS 10.5
update_channel stable
supervisor_version supervisor-2023.08.1
agent_version 1.5.1
docker_version 23.0.6
disk_total 30.8 GB
disk_used 4.9 GB
healthy true
supported true
board ova
supervisor_api ok
version_api ok
installed_addons File editor (5.6.0), Whisper (1.0.0), Piper (1.3.2)
Dashboards
dashboards 1
resources 0
mode auto-gen
Recorder
oldest_recorder_run August 14, 2023 at 10:03 PM
current_recorder_run August 23, 2023 at 11:51 PM
estimated_db_size 15.50 MiB
database_engine sqlite
database_version 3.41.2

Additional information

No response

@ToxicFrog ToxicFrog added the bug Something isn't working label Aug 24, 2023
@ToxicFrog
Author

As a workaround, you can automate the fixing of resolv.conf on startup. Replace example.net with your real DNS search path.

There's probably a more elegant way to do this that dynamically fetches the correct search path when the container starts up, but I don't know what it is.

/config/tools/fix-dns

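#!/usr/bin/env bash

# sed -i doesn't work inside the container (most likely because
# /etc/resolv.conf is bind-mounted by Docker and cannot be replaced by
# rename), so rewrite it via a temp file and copy the contents back.
tmp="$(mktemp)"
sed -E 's,^search .*,search example.net,' /etc/resolv.conf > "$tmp"
cat "$tmp" > /etc/resolv.conf
rm -f "$tmp"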

/config/configuration.yaml

shell_command:
  fix_dns: 'bash /config/tools/fix-dns'

And then in Automations, create one with the trigger "HomeAssistant starts" and the action "call service shell_command.fix_dns".

@ToxicFrog
Author

Update: the above doesn't work as well as I might hope, because some integrations fire off before the fix script runs -- so e.g. if you have a cmus media sink, it'll try to connect to it before resolv.conf is repaired, and fail. Some of these, like MQTT, will retry and succeed, but cmus doesn't seem to.

@agners
Member

agners commented Aug 29, 2023

The containers in Home Assistant use the DNS plug-in, which in turn uses CoreDNS to resolve hostnames. I am transferring the issue to that plug-in.

@agners agners transferred this issue from home-assistant/operating-system Aug 29, 2023
@pvizeli
Member

pvizeli commented Aug 30, 2023

Hass.io is a closed system and container orchestrator. If you want to access an external system, use the fully qualified name. That is by design and not a bug.

@KevinCathcart

If you want to access an external system, use the fully qualified name. That is by design and not a bug.

It may not be a bug, but allowing this to work could be a desirable feature: it would remove a difference between core/docker installs and HAOS, it would resolve what looks like inconsistent behavior in HAOS, and it looks really easy to do and quite low-risk.

Currently, using bare hostnames for external devices via DHCP-provided DNS search paths works just fine with core and docker installs, but doesn't work with HAOS or Supervised installations. This adds undesirable friction for people who want to migrate to HAOS from core or container.

Furthermore, using raw hostnames for devices with HAOS sometimes seems to work, and sometimes doesn't. This is because it works for devices that support LLMNR, but not for others.

So how could this be enabled in a simple way with minimal risk? To find out, let's look at what happens if you try to resolve a single-label name (myname) from within core or an add-on running under Supervisor.

  1. Musl or glibc will notice it is a single-label name and will see the search path specified in /etc/resolv.conf.
  2. It will try to resolve myname.local.hass.io. via the DNS protocol, talking to CoreDNS.
    1. CoreDNS will notice the .local.hass.io suffix and will try to look this up as the name of a container. This will fail, returning NXDOMAIN.
  3. The libc will now try to resolve myname. via the DNS protocol, talking to CoreDNS.
    1. This time the mdns plugin will kick in, since this is a single-label name.
    2. It will ask the host's systemd-resolved to resolve myname.
    3. systemd-resolved on the host will determine the candidate protocols:
      1. LLMNR is a candidate because the name is single-label.
      2. mDNS is not a candidate.
      3. DNS will be considered a candidate because the name is single-label and a search list exists on the host.
    4. systemd-resolved will try the candidate protocols.
      1. LLMNR will only find the device if it supports LLMNR.
      2. DNS won't find it because systemd-resolved was passed myname. with the trailing dot, which disables using the search path. Further, systemd-resolved will refuse to send A or AAAA queries for myname. via the DNS protocol because the (strongly discouraged) ResolveUnicastSingleLabel setting is not enabled. If systemd-resolved had been passed myname (with no dot) instead, then it would have used the host's DHCP-derived suffix list.
    5. The mdns plugin will return whatever systemd-resolved returned, without further fallbacks.
  4. musl/glibc will accept the result from CoreDNS, since there is nothing left to try.
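
The libc side of that walkthrough (steps 1 to 3) can be sketched in Go. This is a simplified model of search-list expansion under the usual default of ndots = 1, not the actual musl or glibc code; `candidateNames` is a hypothetical function, and the ordering for multi-label names differs between libc implementations.

```go
package main

import (
	"fmt"
	"strings"
)

// candidateNames models, in simplified form, which fully qualified names a
// stub resolver tries for a given input name and resolv.conf search list.
// Hypothetical sketch: real resolvers differ in details for multi-label names.
func candidateNames(name string, search []string, ndots int) []string {
	// A trailing dot marks the name as absolute: the search list is skipped.
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	var out []string
	dots := strings.Count(name, ".")
	// Names with at least ndots dots are tried as-is first.
	if dots >= ndots {
		out = append(out, name+".")
	}
	for _, domain := range search {
		out = append(out, name+"."+strings.TrimSuffix(domain, ".")+".")
	}
	// Names with fewer dots fall back to the bare name after the search list.
	if dots < ndots {
		out = append(out, name+".")
	}
	return out
}

func main() {
	// Search list taken from the container's resolv.conf in this issue.
	fmt.Println(candidateNames("myname", []string{"local.hass.io"}, 1))
	// Prints: [myname.local.hass.io. myname.]
}
```

The second candidate, myname. with its trailing dot, is the query that reaches the mdns plugin in step 3.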

So if the mdns plugin had instead removed the "." suffix (with code like hostname = strings.TrimSuffix(hostname, ".")) before passing the name to the host's systemd-resolved, then the host's search suffixes would be available.
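
A minimal Go sketch of that idea follows. Both function names are hypothetical and this is not the plugin's actual code; it only illustrates stripping the root dot from a query name, and checking that the name is single-label, before it would be handed to systemd-resolved.

```go
package main

import (
	"fmt"
	"strings"
)

// nameForResolved strips the trailing root dot from a DNS query name.
// Hypothetical helper: the idea is that systemd-resolved then sees "myname"
// rather than "myname." and may apply the host's search list.
func nameForResolved(qname string) string {
	return strings.TrimSuffix(qname, ".")
}

// isSingleLabel reports whether the query name consists of exactly one label,
// which is the only case where search-list processing would come into play.
func isSingleLabel(qname string) bool {
	trimmed := strings.TrimSuffix(qname, ".")
	return trimmed != "" && !strings.Contains(trimmed, ".")
}

func main() {
	for _, q := range []string{"myname.", "printer.local."} {
		fmt.Printf("%q -> %q (single label: %t)\n", q, nameForResolved(q), isSingleLabel(q))
	}
}
```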

Why do I claim this is low risk? First of all, it cannot affect any single-label names for containers, as those will get tried first. This change also cannot affect any queries that the mdns plugin declines to process, so it is limited to mDNS names and single-label names. systemd-resolved treats names ending with a dot and those without identically, except for DNS search list processing, which only gets applied for single-label names.

@agners agners added enhancement New feature or request and removed bug Something isn't working labels Apr 5, 2024
@MassiPi

MassiPi commented Jul 23, 2024

Is this abandoned?
Honestly, this seems like total nonsense.
What is the advantage of forcing, IN MY LAN, a search domain I do not use?
And of ignoring DHCP option 15?

@tarocco

tarocco commented Aug 12, 2024

Still seeing this issue present in HAOS 12.4 (with Core 2024.7.0). Please advise.

@StudioEtrange

Same issue present in 2024.9.0
Still a problem

@aleks-mariusz

aleks-mariusz commented Oct 24, 2024

I've encountered this in one of the add-ons (so it happens inside the Docker container).

What is the advantage of forcing, IN MY LAN, a search domain I do not use?

^-- Exactly this. I don't understand the stubbornness behind this, and behind not allowing us to use our internal network's normal search-domain setting. Could someone please explain why this is a requirement?

I have also discovered another legitimate use case that justifies changing this behaviour: my IP camera's RTSP does not work when using the FQDN (it seems the address is too long, due to some unknown limit, and results in a 401 Unauthorized error; change it to an IP and the login works fine). But I don't want to hard-code IPs (it's a well-known anti-pattern).

So I created a CNAME under my search domain, but of course that doesn't work due to this bug. And yes, I understand it's a deliberate choice, but I would still categorize it as a bug and as the wrong decision to make on behalf of everyone, and we'll continue to consider it a bug unless someone can explain the rationale behind this choice.

@brianjmurrell

Hass.io is a closed system and container orchestrator. If you want to access an external system, use the fully qualified name. That is by design and not a bug.

I would submit that any device (closed system or not; think of any IoT device) in a network that ignores DHCP option 15 (the domain to search when looking up unqualified names) is broken by design. The "closed system" nature of HAOS does not absolve it from operating properly in a properly configured network.

There are tons of IoT devices, for example, that work properly with DHCP option 15. Why should HAOS not?

9 participants