The agentIP selection is wrong when multi devices have same IPv4/IPv6 address #61

Open
shuaishang opened this issue Feb 2, 2024 · 5 comments

Comments

@shuaishang
We used hsflowd in SONiC.

  • The agent is set to Loopback0
  • Loopback0 has both an IPv4 and an IPv6 address

Issue:
Per the hsflowd design ("EnumIPSelectionPriority"), the IPv4 address should have higher priority than the IPv6 address.

But hsflowd wrongly selected the IPv6 address:

root@MC-54:/# cat /etc/hsflowd.auto
# WARNING: Do not edit this file. It is generated automatically by hsflowd.
rev_start=2
hostname=MC-54
sampling=400
header=128
datagram=1400
polling=20
agentIP=fd00:0:201::5
agent=Loopback0
ds_index=1
collector=26.34.15.106/6343//
rev_end=2

The devices Loopback0, Loopback1001, and Loopback1002 belong to different VRFs, so they can have the same IPv4/IPv6 address:

root@MC-54:~# ip addr show dev Loopback0
35: Loopback0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 8a:5f:78:e1:3b:5d brd ff:ff:ff:ff:ff:ff
    inet 10.145.240.15/32 scope global Loopback0
       valid_lft forever preferred_lft forever
    inet6 fd00:0:201::5/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::885f:78ff:fee1:3b5d/64 scope link
       valid_lft forever preferred_lft forever
root@MC-54:~#
root@MC-54:~# ip addr show dev Loopback1001
212: Loopback1001: <BROADCAST,NOARP,UP,LOWER_UP> mtu 65536 qdisc noqueue master Vrf10002 state UNKNOWN group default qlen 1000
    link/ether 1a:4d:0d:10:8d:35 brd ff:ff:ff:ff:ff:ff
    inet 10.145.240.15/32 scope global Loopback1001
       valid_lft forever preferred_lft forever
    inet6 fd00:0:201::5/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::184d:dff:fe10:8d35/64 scope link
       valid_lft forever preferred_lft forever
root@MC-54:~#
root@MC-54:~# ip addr show dev Loopback1002
213: Loopback1002: <BROADCAST,NOARP,UP,LOWER_UP> mtu 65536 qdisc noqueue master Vrf10006 state UNKNOWN group default qlen 1000
    link/ether 02:1f:9f:c1:6d:25 brd ff:ff:ff:ff:ff:ff
    inet 10.145.240.15/32 scope global Loopback1002
       valid_lft forever preferred_lft forever
    inet6 fd00:0:201::5/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::1f:9fff:fec1:6d25/64 scope link
       valid_lft forever preferred_lft forever
root@MC-54:~#

But in the function "readInterfaces", the hash key of localIP/localIP6 is only the IPv4/IPv6 address, without the dev/ifname.

  // keep v4 and v6 separate to simplify HT logic
  UTHash *newLocalIP = UTHASH_NEW(HSPLocalIP, ipAddr.address.ip_v4, UTHASH_DFLT);
  UTHash *newLocalIP6 = UTHASH_NEW(HSPLocalIP, ipAddr.address.ip_v6, UTHASH_DFLT);

In our example, Loopback0, Loopback1001, and Loopback1002 all have the same IPv4 address "10.145.240.15/32".
But after "readInterfaces", "localIP" holds only one "10.145.240.15/32" entry, and it belongs to "Loopback1001".
As a result, the agent "Loopback0" can't select the correct agentIP.
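
To make the collision concrete, here is a minimal sketch of the difference between keying on the address alone and keying on (address, device). It does not use hsflowd's UTHash API: the LocalIP and LocalIPTable types and the put_local_ip and has_ip_on_dev helpers are hypothetical stand-ins, and the "last writer wins" replace behaviour is an assumption, so which device survives in the real table may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 16
#define IFNAME_LEN 16

/* Hypothetical stand-in for an HSPLocalIP record: one address, one device. */
typedef struct {
  uint32_t ip4;          /* IPv4 address, host byte order */
  char dev[IFNAME_LEN];  /* interface name */
} LocalIP;

typedef struct {
  LocalIP entries[MAX_ENTRIES];
  int count;
} LocalIPTable;

static void set_dev(LocalIP *e, const char *dev) {
  strncpy(e->dev, dev, IFNAME_LEN - 1);
  e->dev[IFNAME_LEN - 1] = '\0';
}

/* key_on_dev == 0: the key is the address only.
 * key_on_dev != 0: the key is (address, device). */
static void put_local_ip(LocalIPTable *t, uint32_t ip4, const char *dev,
                         int key_on_dev) {
  assert(t != NULL && dev != NULL);
  for (int i = 0; i < t->count; i++) {
    LocalIP *e = &t->entries[i];
    if (e->ip4 == ip4 && (!key_on_dev || strcmp(e->dev, dev) == 0)) {
      set_dev(e, dev);  /* same key: replace the existing entry (assumed) */
      return;
    }
  }
  if (t->count < MAX_ENTRIES) {
    LocalIP *e = &t->entries[t->count++];
    e->ip4 = ip4;
    set_dev(e, dev);
  }
}

/* Returns 1 if the table has an entry for this address on this device. */
static int has_ip_on_dev(const LocalIPTable *t, uint32_t ip4, const char *dev) {
  for (int i = 0; i < t->count; i++)
    if (t->entries[i].ip4 == ip4 && strcmp(t->entries[i].dev, dev) == 0)
      return 1;
  return 0;
}
```

With the address-only key, adding 10.145.240.15 for three devices leaves a single entry, so a later lookup for Loopback0 can fail. With the (address, device) key, all three entries are kept.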

@sflow
Owner

sflow commented Feb 2, 2024

At first I thought this sounded like it might be running a version of hsflowd older than 2.0.39-9, when a correction was made to the automatic agent-address selection priorities. But on closer inspection the same thing might happen even with the latest version. Why? Because 10.145.240.15 is an RFC1918 address (which could easily be non-unique across a large network with multiple LANs) while fd00:0:201::5/64 has scope "global" and is therefore preferred as a more-likely-to-be-unique ID for the switch.
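
For reference, the RFC 1918 test mentioned above can be sketched as a simple mask check. The function names here (ip4_from_octets, is_rfc1918) are illustrative and are not taken from the hsflowd source.

```c
#include <assert.h>
#include <stdint.h>

/* Build a host-byte-order IPv4 address from four octets. */
static uint32_t ip4_from_octets(unsigned a, unsigned b, unsigned c, unsigned d) {
  assert(a <= 255 && b <= 255 && c <= 255 && d <= 255);
  return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
}

/* Returns 1 if the address lies in one of the three RFC 1918 private ranges. */
static int is_rfc1918(uint32_t ip4) {
  return (ip4 & 0xFF000000u) == 0x0A000000u    /* 10.0.0.0/8 */
      || (ip4 & 0xFFF00000u) == 0xAC100000u    /* 172.16.0.0/12 */
      || (ip4 & 0xFFFF0000u) == 0xC0A80000u;   /* 192.168.0.0/16 */
}
```

Under this check, 10.145.240.15 falls in 10.0.0.0/8 and is therefore classed as private.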

In SONiC the fix is to just tell the switch what its agent address should be. It's a CLI option. That overrides the automatic selection. Will that work for you?

Note: another way to do this would be to put a thumb on the scale by adding something like:

agent.cidr=10.245.0.0/16

to the file /etc/hsflowd.conf inside the sflow container, which bumps up the priority of 10.245.0.0/16 addresses. But that seems awkward for SONiC. That approach only really makes sense when the hsflowd.conf config file is easily accessible and is being set by something like Puppet, Kubernetes or DNS-SD.
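
As a rough illustration of what a CIDR preference has to test, the sketch below checks whether an IPv4 address falls inside a prefix. It is not hsflowd's implementation; ip4_from_octets and ip4_in_cidr are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Build a host-byte-order IPv4 address from four octets. */
static uint32_t ip4_from_octets(unsigned a, unsigned b, unsigned c, unsigned d) {
  assert(a <= 255 && b <= 255 && c <= 255 && d <= 255);
  return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
}

/* Returns 1 if ip4 lies inside net/prefix_len. */
static int ip4_in_cidr(uint32_t ip4, uint32_t net, unsigned prefix_len) {
  assert(prefix_len <= 32);
  /* A zero-length prefix matches everything; avoid shifting by 32 bits. */
  uint32_t mask = (prefix_len == 0) ? 0u : (0xFFFFFFFFu << (32 - prefix_len));
  return (ip4 & mask) == (net & mask);
}
```

Note that the range has to cover the address you actually want preferred: under this check, the 10.145.240.15 address from this issue is not inside 10.245.0.0/16, so a range such as 10.145.0.0/16 would be needed in this particular setup.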

@shuaishang
Author

The issue is that automatic agent-address selection doesn't work when different interfaces have the same address, because the hash key of LocalIP does not include the interface name.
(It doesn't matter which IP has higher priority: IP4_RFC1918, IP6_GLOBAL, or CIDR.)

Configuring agentIP explicitly is OK; however, it would be a new feature for SONiC...

@sflow
Owner

sflow commented Apr 26, 2024

I see what you mean now. The setting "agent=Loopback0" is supposed to boost the chances of a Loopback0 address being chosen here:
https://github.com/sflow/host-sflow/blob/v2.0/src/Linux/hsflowconfig.c#L1117-L1122
but the HSPLocalIP object for "10.145.240.15" has only one dev, and that can end up being "Loopback1001" or "Loopback1002", so the priority boost does not happen.

This example is a little confusing because even if it worked correctly it might still have picked the global fd00:0:201::5 address, but I can see that there might be some scenario where the address is chosen wrongly because the boost is not applied.
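
The missed boost can be sketched roughly as follows. This is not the code at the link above: the Candidate type, the AGENT_DEV_BOOST value, the base priorities, and the function names are all invented for illustration, and no claim is made about how large the real boost is relative to the address-class priorities.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AGENT_DEV_BOOST 100  /* hypothetical value */

/* Hypothetical candidate: an address, the one device recorded for it,
 * and a base priority derived from its address class. */
typedef struct {
  const char *addr;
  const char *dev;
  int base_priority;
} Candidate;

/* Base priority plus a boost when the recorded device is the configured agent device. */
static int effective_priority(const Candidate *c, const char *agent_dev) {
  assert(c != NULL);
  int p = c->base_priority;
  if (agent_dev != NULL && c->dev != NULL && strcmp(c->dev, agent_dev) == 0)
    p += AGENT_DEV_BOOST;
  return p;
}

/* Pick the candidate with the highest effective priority (first one wins ties). */
static const Candidate *select_agent(const Candidate *cands, size_t n,
                                     const char *agent_dev) {
  const Candidate *best = NULL;
  for (size_t i = 0; i < n; i++) {
    if (best == NULL ||
        effective_priority(&cands[i], agent_dev) > effective_priority(best, agent_dev))
      best = &cands[i];
  }
  return best;
}
```

If the record for 10.145.240.15 carries dev "Loopback1001" instead of "Loopback0", the strcmp never matches for agent=Loopback0, so the address competes on its base priority alone and can lose to another candidate.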

@sflow
Owner

sflow commented Apr 29, 2024

I believe this is now addressed in the master branch. Let me know if you need a release to test.

@sflow
Owner

sflow commented May 30, 2024

There is now a release that has this fix: 2.1.03-1.
