ACLs
- TC Filtering
- Flower Classifier
- Flower Actions
- Matchall Classifier
- Filter Chains
- Shared Blocks
- ACLs Prior to 4.14
- Further Resources
The Linux TC subsystem takes care of policing, classifying, scheduling and shaping of forwarded traffic. The fundamental elements of the TC architecture are qdiscs, which are discussed in some detail on the Queues Management page. Closely related to qdiscs are filters.
Kernel Version | Features |
---|---|
4.11 | Matching on protocol (ethtype). Flower keys src_mac and dst_mac, src_ip and dst_ip (both IPv4 and IPv6), ip_proto ("tcp" and "udp"), src_port and dst_port. Actions drop and mirred egress redirect |
4.12 | Flower keys vlan_prio, vlan_id. Action vlan modify |
4.13 | Flower key tcp_flags. Action trap |
4.14 | Flower keys ip_ttl, ip_tos. Action goto chain |
4.15 | Action pass |
4.16 | Action mirred egress mirror |
5.3 | Flower key indev |
5.7 | Action skbedit priority, pedit TOS / traffic_class |
5.8 | Action pedit tcp / udp sport / dport |
5.9 | Action police |
5.13 | Action sample |
5.18 | Action pedit ip / ip6 src / dst |
6.5 | Flower key l2_miss |
6.6 | Flower port range matching |
Each TC filter has two main parts: a classifier and an action. The classifier describes a class of packets, depending on type of filter and its individual configuration. The action is what happens when a packet falls into the class described by the classifier, again depending on individual configuration.
When attached to a general classful qdisc, one possible action is to select a
certain qdisc class to enqueue the packet to (the class_id action). However,
mlxsw does not offload this action currently.
Besides the class_id action, there is a broad range of programmed and control
actions, some of which mlxsw may be able to offload. A qdisc specifically meant
for attaching and evaluating filters is clsact.
Note: The clsact qdisc was not available until kernel 4.14. See below for how
to configure ACLs on older kernels.
When added, the clsact qdisc allows attaching filters to egress and ingress of
a netdevice. The ingress filters are run just after the packet ingresses the
host. The egress filters run just before the packet is handed to the root qdisc
of the egress device.
mlxsw will offload filters if:
- The netdevice corresponds to a front panel port. That does NOT include uppers of a front-panel port netdevice, such as bridges, VLAN soft devices and others, only the front-panel port netdevices themselves.
- The qdisc that the filter is added to is a clsact qdisc.
The following example first adds the clsact qdisc and then attaches a filter
that drops all packets at the ingress of a netdevice:
# tc qdisc add dev swp6 clsact
# tc filter add dev swp6 ingress flower action drop
The flower keyword introduces the classifier. This example uses the flower
classifier, which allows matching on packet headers using symbolic names. In
this example the classifier did not get any arguments and will match on all
packets.
The action keyword specifies the action that should take place on matched
packets. drop means that the packet should be removed from the forwarding
path.
To see the list of inserted filters, run one of these two commands, depending on which direction you are interested in:
# tc filter show dev swp6 ingress
# tc filter show dev swp6 egress
E.g.:
# tc filter show dev swp6 ingress
filter protocol all pref 49152 flower chain 0
filter protocol all pref 49152 flower chain 0 handle 0x1
in_hw in_hw_count 2
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 10 sec used 0 sec
used_hw_stats immediate
The example output shows a number of attributes of the filter that are assigned implicitly. The following sections will go through the interesting ones and discuss them.
A filter can be deleted using a delete command:
# tc filter del dev swp6 ingress pref 49152
When a filter is offloaded, an in_hw flag is shown in the dump (like in the
example above). Offloaded filters affect packets in both the HW datapath and
the SW datapath. If it is desirable that the filter exists only in the HW, or
only in the SW datapath, the classifier should be passed either a skip_sw or
a skip_hw flag.
E.g. to insert a HW datapath-only filter:
# tc filter add dev swp6 ingress flower skip_sw action drop
Adding a SW datapath-only filter may make sense for classifiers or actions that are not supported by the device. See trapping for the details about how to get packets to the SW datapath.
In order to observe statistics related to packets, bytes transmitted, or
last time used, which are maintained on a per filter basis, add the -s flag
to the filter show command:
# tc -s filter show dev swp6 ingress
filter protocol all pref 49152 flower chain 0
filter protocol all pref 49152 flower chain 0 handle 0x1
in_hw in_hw_count 2
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 15 sec used 1 sec
Action statistics:
Sent 1456 bytes 18 pkt (dropped 18, overlimits 0 requeues 0)
Sent software 0 bytes 0 pkt
Sent hardware 1456 bytes 18 pkt
backlog 0b 0p requeues 0
used_hw_stats immediate
The individual statistics shown are:
- installed -- How long ago the filter was installed.
- used -- How long ago the filter last matched.
- sent software -- Number of bytes and packets matched in the SW datapath.
- sent hardware -- Likewise for the HW datapath.
- used_hw_stats -- Shows what type of statistics are used for this action. By default this is immediate, in which case the statistics are always up to date. It can be disabled instead.
Technically, statistics are reported per-action, not per-filter. In offloaded filters, mlxsw by default allocates one counter for the whole filter. Therefore, if multiple actions are attached to the same filter, they will have identical packets and bytes statistics.
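When these counters need to be consumed by a script, the "Sent hardware" line can be picked out of the dump. The following is a minimal sketch, not part of mlxsw or iproute2: the hw_stats function name is hypothetical, and the sample text is copied from the output shown above so that the snippet runs without a switch.

```shell
# Hypothetical helper: print "<bytes> <packets>" from the "Sent hardware"
# line of a `tc -s filter show` dump read on standard input.
hw_stats() {
    awk '/Sent hardware/ { print $3, $5 }'
}

# Sample lines copied from the dump above. On a live system one would
# instead run: tc -s filter show dev swp6 ingress | hw_stats
sample='  Sent software 0 bytes 0 pkt
  Sent hardware 1456 bytes 18 pkt'

printf '%s\n' "$sample" | hw_stats
```

If a filter carries several actions, the dump contains one such line per action, and with the shared counter described above they are expected to report the same values.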
The number of counters is limited and potentially lower than the number
of possible TC filters that can be programmed to the device. It is
possible to disable the allocation of the hardware counters using the
hw_stats action command line option during filter addition.
# tc filter add dev swp6 ingress flower skip_sw \
action drop hw_stats disabled
Disabling the per-flow counter only impacts the bytes and packets counters.
When disabled, they always report zeroes. The installed and used times are
still valid.
The default action when the hw_stats directive is not used is to allocate an
immediate counter. A way to request this behavior explicitly is to pass an
immediate type:
# tc filter add dev swp6 ingress flower skip_sw \
action drop hw_stats immediate
The current occupancy of counters in HW can be queried using "devlink-resource":
# devlink resource show $(devlink dev) | grep 'name flow'
name flow size 24576 occ 12 unit entry dpipe_tables none
# tc filter add dev swp7 egress flower action trap
# devlink resource show $(devlink dev) | grep 'name flow'
name flow size 24576 occ 14 unit entry dpipe_tables none
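To turn the occupancy line into a number of remaining counters, the size and occ fields can be subtracted. This is only a sketch: the flow_counters_free function name is hypothetical, and the input line is copied from the example output above rather than read from a device.

```shell
# Hypothetical helper: read `devlink resource show` output on standard input
# and print how many entries of the "flow" resource are still unused.
flow_counters_free() {
    awk '$1 == "name" && $2 == "flow" {
        for (i = 3; i < NF; i++) {
            if ($i == "size") size = $(i + 1)
            if ($i == "occ")  occ  = $(i + 1)
        }
        print size - occ
    }'
}

# Line copied from the example above. On a live system one would instead run:
#   devlink resource show $(devlink dev) | flow_counters_free
echo 'name flow size 24576 occ 14 unit entry dpipe_tables none' | flow_counters_free
```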
Preference is the filter attribute that determines the order in which the
filters are evaluated. Filters with lower preference are evaluated before
filters with higher preference. If preference is not specified on the command
line, the kernel assigns one, starting at 49152 and decreasing by one for each
filter inserted without explicitly specified preference. To specify the
preference, use the pref option:
# tc filter add dev swp6 ingress pref 123 prot ipv6 flower action drop
When several filters have the same preference, they are evaluated in the order of their addition.
To reduce the number of lookups, it is recommended to configure filters that share the same mask with the same preference. For example, if N flower filters that match on the destination IP address are configured with N different preferences, a packet can incur up to N lookups despite the fact that only a single filter can match. When all the filters are configured with the same preference, a packet will incur a single lookup.
Unless otherwise specified, the added filters match on packets regardless of
their EtherType. To match on packets with a specific EtherType, the filter
needs to be added to the filter tree dedicated to that protocol, through a
prot argument. E.g. to drop only IPv6 packets:
# tc filter add dev swp6 ingress prot ipv6 flower action drop
These protocol-specific filter trees exist independent of each other. Since flower does not have a way of matching on EtherType, there is no way to match on a packet from one protocol, and only if that match fails, proceed to a match on another protocol.
A protocol selector all can be used to explicitly select matching on packets
regardless of their EtherType:
# tc filter add dev swp6 ingress prot all flower action drop
Note: In the SW datapath, the indicated protocol matches on the outermost
EtherType. If the packet is VLAN tagged, the protocol value needs to be
802.1q, not ip, even if IP is what is inside the VLAN tag. Matching on the
inner IP is then done through the flower vlan_ethtype key. This is unlike the
HW datapath, where both protocol ip and protocol 802.1q would match.
One filter can perform several actions on the matched packets. For some of
the tc actions, such as tc-vlan, the default control action is pipe,
which means that when no control action is specified, listing the actions one
after another is all that needs to be done. For example, to change VLAN and
redirect the packet, one would do:
# tc filter add dev swp6 ingress flower \
action vlan modify id 85 \
action mirred egress redirect dev swp8
For other tc actions, such as tc-pedit, the default control action is pass,
and therefore in order to connect a few actions together, the pipe control
action needs to be specified between every two actions. For example, to set
both the source and destination IP of all packets sourced from swp6 and
destined to 223.0.2.2, one would do:
# tc filter add dev swp6 egress prot ip flower dst_ip 223.0.2.2 skip_hw \
action pedit ex munge ip src set 1.1.1.1 pipe \
action pedit ex munge ip dst set 8.8.8.8
In order to avoid relying on the default behavior of various tc actions, it is
recommended to always specify the pipe control action when the intention is
to stitch actions together.
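As an illustration of that recommendation, the sketch below builds the action part of a filter command with an explicit pipe between every two actions. The chain_actions helper is hypothetical and only prints text. The resulting command repeats the earlier VLAN rewrite and redirect example, this time with pipe spelled out.

```shell
# Hypothetical helper: join its arguments into "action A pipe action B ...",
# so that every action except the last ends with an explicit pipe.
chain_actions() {
    out=""
    for act in "$@"; do
        if [ -n "$out" ]; then
            out="$out pipe action $act"
        else
            out="action $act"
        fi
    done
    echo "$out"
}

# Dry run: print the command instead of executing it.
echo "tc filter add dev swp6 ingress flower $(chain_actions \
    'vlan modify id 85' \
    'mirred egress redirect dev swp8')"
```

The printed command can be reviewed first and then run as root on the switch.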
Flower is the major filter that mlxsw is capable of offloading, as long as the keys used for matching are supported. The list of supported keys is as follows:
- indev -- Match on the port that the packet ingressed through.
- l2_miss -- Match on layer 2 miss in the bridge driver's FDB / MDB.
- src_mac, dst_mac -- Match on the MAC address.
- vlan_ethtype, vlan_prio, vlan_id -- Match on 802.1Q header.
- src_ip, dst_ip -- Match on source resp. destination IPv4 or IPv6 address.
- ip_ttl -- Match on IPv4 TTL or IPv6 hop limit.
- ip_tos -- Match on IPv4 TOS or IPv6 traffic class.
- ip_proto -- Match on L4 protocol or IPv6 next header.
- src_port, dst_port -- Match on TCP or UDP ports, including range matching.
- tcp_flags -- Match on TCP flags.
The flower classifier allows matching on just part of the selected field. For example, to match just the DSCP part of the TOS field:
# tc filter add dev swp6 ingress prot ip \
flower ip_tos $((dscp << 2))/0xfc \
action drop
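The value/mask pair in the command above can be worked out for a concrete DSCP value. The following sketch assumes a hypothetical helper named dscp_to_tos_match. It shifts the DSCP into the upper six bits of the TOS byte and pairs it with the 0xfc mask, so that the two ECN bits are ignored.

```shell
# Hypothetical helper: print the "value/mask" argument for the ip_tos key
# that matches only the DSCP part of the TOS / traffic class byte.
dscp_to_tos_match() {
    dscp=$1
    # DSCP occupies bits 7..2 of the byte; 0xfc leaves the ECN bits unmatched.
    printf '0x%02x/0xfc\n' $((dscp << 2))
}

# Dry run for DSCP 46 (EF): print the command instead of executing it.
echo "tc filter add dev swp6 ingress prot ip flower ip_tos $(dscp_to_tos_match 46) action drop"
```

For DSCP 46 the helper prints 0xb8/0xfc, since 46 << 2 = 184 = 0xb8.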
For the IP addresses, flower supports the usual address/length notation:
# tc filter add dev swp6 ingress prot ip \
flower src_ip 192.0.2.16/28 \
action drop
The key indev matches packets that entered the switch through the indicated
netdevice. A filter is not offloaded unless the netdevice corresponds to a front
panel port.
# tc filter add dev swp7 egress flower indev swp6 action drop
The key l2_miss can be used to match on layer 2 miss in the bridge
driver's FDB / MDB. When 1, match on packets that encountered a layer 2
miss. When 0, match on packets that were forwarded using an FDB / MDB
entry. Note that broadcast packets do not encounter a miss since a
lookup is not performed for them. The key can be used to implement
non-DF (Designated Forwarder) filtering in EVPN multi-homing, as
explained here.
# tc filter add dev swp7 egress flower l2_miss 1 action drop
The flower key vlan_id matches on the VID in the 802.1q header:
# tc filter add dev swp1 ingress protocol 802.1q \
flower vlan_id 95 skip_sw action drop
Note: Packets arriving without 802.1q TCI, or ones which are only priority-tagged, are assigned a bridge PVID by the hardware. Thus, a flower match on a VID equal to PVID will match untagged packets as well.
The keys src_ip and dst_ip are used for matching on source resp. destination
address. Both IPv4 and IPv6 addresses are supported. The exact version depends
on the matched EtherType, which can be selected either by matching on
protocol, or by using the vlan_ethtype flower key.
For example:
# tc filter add dev swp1 ingress protocol 802.1q pref 10 \
flower skip_sw vlan_ethtype ipv4 dst_ip 192.0.2.16/28 \
action drop
# tc filter add dev swp1 ingress protocol ipv6 pref 10 \
flower skip_sw dst_ip fe01::3 \
action drop
The same holds for other L3 headers. For example ip_ttl is not available
unless the protocol is IP or IPv6, and ip_tos with IPv6 really matches on
traffic class.
Note that matching partial IP addresses is possible using the usual address/length notation. See above for more details.
The key ip_proto allows matching on the IPv4 L4 protocol and IPv6 next header.
It also enables further matching on L4-specific keys. E.g. matching on the keys
src_port and dst_port is not allowed unless there is also a match on
ip_proto tcp or udp. For example:
# tc filter add dev swp1 ingress protocol ipv6 pref 10 \
flower skip_sw ip_proto tcp dst_port 3333 \
action drop
Note that matching on ip_proto itself is not possible until the packet is
otherwise matched as IPv4 or IPv6, either through matching on
protocol, or by using the vlan_ethtype flower key.
It is possible to match on a range of source or destination ports by
specifying the value of the src_port and dst_port keys as a range.
For example:
# tc filter add dev swp1 ingress protocol ipv6 pref 10 \
flower skip_sw ip_proto tcp dst_port 3333-4444 \
action drop
Port range matching is implemented in the device using dedicated port range registers, which are limited in number. To overcome this limitation, the driver reuses a port range register across different filters if the filters match on the same range ({min, max}) and the same port type (source / destination).
The maximum number of port range registers as well as their current occupancy can be queried using "devlink-resource":
# devlink resource show $(devlink dev) | grep 'port_range'
name port_range_registers size 16 occ 2 unit entry dpipe_tables none
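The sharing rule described above can be used to estimate how many registers a planned set of filters would need. The sketch below is an approximation under that rule only: the count_port_range_registers helper and its "type min max" input format are hypothetical, and the driver's actual accounting is not consulted.

```shell
# Hypothetical helper: read one "<src|dst> <min> <max>" line per filter and
# print the number of distinct (port type, min, max) combinations, which is
# the number of registers needed if identical ranges are shared.
count_port_range_registers() {
    sort -u | wc -l | tr -d ' '
}

# Four filters, two of which use the same destination port range.
printf '%s\n' \
    'dst 3333 4444' \
    'dst 3333 4444' \
    'src 3333 4444' \
    'dst 1000 2000' | count_port_range_registers
```

The estimate can then be compared against the size reported for port_range_registers by devlink.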
Note: ip_proto match for IPv6 is not supported for the following next header
values: routing, fragment, destination, authentication, esp, mobility,
hop_by_hop, host_identity_protocol, shim6. This is due to the HW parser
architecture.
mlxsw can offload flower classifier with a number of actions.
The action drop causes matched packets to be removed from the pipeline.
The action trap removes matched packets from the HW pipeline and moves them to
the CPU, where the normal Linux SW datapath takes over. Such packets can be
observed through e.g. tcpdump or wireshark, unlike the normal HW-datapath
packets.
Note that in the SW datapath, the trap action drops the packet. Thus the
action likely does not make sense unless specified as skip_sw.
The trapped packets will appear to ingress through the netdevice that corresponds to the front panel port through which the packet entered the switch. I.e. even if the trap is on egress, the packet will appear on ingress again.
In the following example, UDP/IP packets with destination port of 1234 are trapped to slow path for further processing by SW-only U32 classifier:
# tc filter add dev swp1 ingress prot ip \
flower skip_sw ip_proto udp dst_port 1234 \
action trap
# tc filter add dev swp1 ingress prot ip \
u32 skip_hw ...
The action pass accepts matched packets for further HW pipeline forwarding.
Processing of more filters is thus avoided.
The mirred egress redirect action serves to redirect a packet to the egress of
a specified port. This action is not offloaded unless the following holds:
- The filter is attached to the ingress of a netdevice.
- The destination netdevice corresponds to a front panel port.
In the following example, packets that arrive to swp1 are forwarded to swp2:
# tc filter add dev swp1 ingress flower \
action mirred egress redirect dev swp2
The mirred egress mirror action causes packets to be copied to the egress of a
specified port. The Port Mirroring page discusses the mirred offload in more
detail.
The sample action samples packets according to a configured sampling
rate (i.e., 1 out of N packets). Sampled packets are forwarded by the
data path (software or hardware), but a copy can be sent to higher
layers (e.g., user space) for inspection. The Packet Sampling page
discusses sampling in more detail.
The action vlan modify allows changing of the VLAN ID:
# tc filter add dev swp1 ingress \
flower action vlan modify id 85
Note: Packets which arrive without 802.1q TCI, or which are only
priority-tagged, are assigned a bridge PVID by the hardware. Thus, a vlan modify
to a non-PVID tag apparently pushes a VLAN tag on such a packet, and
likewise vlan modify to a PVID tag pops it. That is unlike the software
pipeline, where vlan modify is only meaningful on packets which are already
802.1q-tagged.
The action goto chain invokes further filters at a specified chain. See Filter Chains for further details.
Action skbedit priority is offloaded to assign priority to a packet. See
ACL-Based Priority Assignment for more details.
The action pedit is offloaded to allow changing of some packet header fields.
The following pseudocode example gives an idea of the syntax:
# tc filter add dev swp6 ingress prot <prot> flower skip_sw \
action pedit ex munge <pedit-prot> <field> set <value> retain <mask>
Or, if the <mask> should cover the whole field:
# tc filter add dev swp6 ingress prot <prot> flower skip_sw \
action pedit ex munge <pedit-prot> <field> set <value>
Note that for purposes of protocol matching (<prot> above), IPv6 is called
ipv6, whereas for purposes of pedit (<pedit-prot> above) it is called ip6.
The following protocols and fields are offloaded:
- IPv4 and IPv6 fields tos resp. traffic_class. The supported masks are 0xff (for the whole TOS / traffic class field), 0xfc (for just the DSCP subfield) and 0x03 (for the ECN subfield). The DSCP rewrite is covered on the Quality of Service page.
  As an example, to remove ECN marking of an IPv4 packet without touching the rest of the TOS field:
  # tc filter add dev swp6 ingress prot ip flower skip_sw \
  action pedit ex munge ip dsfield set 0 retain 0x3
  To change IPv6 DSCP:
  # tc filter add dev swp6 ingress prot ipv6 flower skip_sw \
  action pedit ex munge ip6 traffic_class set $((dscp << 2)) retain 0xfc
- IPv4 and IPv6 src and dst fields on Spectrum-2 and above. Only full mask (e.g. 0xffffffff for IPv4 addresses) is supported.
  # tc filter add dev swp6 ingress prot ip flower skip_sw \
  action pedit ex munge ip src set 198.51.100.1
- TCP and UDP fields sport resp. dport on Spectrum-2 and above. Only full mask (0xffff) is supported.
  # tc filter add dev swp6 ingress prot ip flower skip_sw \
  action pedit ex munge udp sport set 1
The action police is offloaded to allow policing of ingress or egress
bandwidth. For example:
# tc filter add dev swp1 ingress prot ip pref 1 \
flower skip_sw src_ip 192.0.2.1 \
action police rate 1gbit burst 16k conform-exceed drop/ok
To query the number of packets that were dropped by the policer, run:
# tc -s filter show dev swp1 ingress prot ip pref 1
filter flower chain 0
filter flower chain 0 handle 0x1
eth_type ipv4
src_ip 192.0.2.1
skip_sw
in_hw in_hw_count 1
action order 1: police 0x1 rate 1Gbit burst 16250b mtu 2Kb action drop overhead 0b
ref 1 bind 1 installed 54 sec used 0 sec
Action statistics:
Sent 6670013310 bytes 828985 pkt (dropped 365018, overlimits 0 requeues 0)
Sent software 0 bytes 0 pkt
Sent hardware 6670013310 bytes 828985 pkt
backlog 0b 0p requeues 0
used_hw_stats immediate
In the above example, 365018 packets were dropped by the policer.
Conforming packets can be piped to subsequent actions using the pipe
action. The exceed action must always be set to drop. For example, in
order to mirror policed packets to a different port, run:
# tc filter add dev swp1 ingress prot ip \
flower skip_sw src_ip 192.0.2.1 \
action police rate 1gbit burst 16k conform-exceed drop/pipe \
action mirred egress mirror dev swp2
Policers are a global resource and they can be shared by multiple filters. To do so, assign an index to a policer and then re-use it when installing more filters:
# tc filter add dev swp1 ingress prot ip \
flower skip_sw src_ip 192.0.2.1 \
action police rate 1gbit burst 16k conform-exceed drop/ok index 10
# tc filter add dev swp2 ingress prot ip \
flower skip_sw src_ip 198.51.100.1 \
action police index 10
The maximum number of supported policers and their current usage can be
read via the single_rate_policers resource in devlink resource.
Example:
# devlink resource show pci/0000:06:00.0
pci/0000:06:00.0:
...
name global_policers size 2040 unit entry dpipe_tables none
resources:
name single_rate_policers size 1984 occ 0 unit entry dpipe_tables none
- Only the rate, burst and conform-exceed options are supported. The rest are ignored.
- While conforming packets can be piped to other actions, packets that exceed the policer's rate or burst size must be dropped.
- For optimal results, the configured burst size should be at least:
  min_burst = 0.4 * rate [bits]
  Where rate is the configured rate in kilobits per second. For example, if the configured rate is 5gbit, the minimum burst size should be:
  min_burst = 0.4 * 5000000 = 2000000 [bits] = 250000 [bytes] = 250 [kb]
The matchall classifier simply matches all packets. It is offloaded only in a few specific cases:
- No protocol matching is allowed.
- Only sample and mirred egress mirror actions are supported.
The Port Mirroring and Packet Sampling pages discuss the mirred and sample
offload, respectively.
Ingress matchall rules are executed in the device before ingress flower rules. Similarly, egress matchall rules are executed in the device after egress flower rules. This ordering is enforced by the driver:
# tc filter add dev swp1 ingress prot ip pref 2 \
flower skip_sw src_ip 192.0.2.1 \
action drop
# tc filter add dev swp1 ingress prot all pref 3 \
matchall skip_sw \
action sample rate 100 group 1 trunc 64
Error: Failed to add behind existing flower rules.
We have an error talking to the kernel
# tc filter add dev swp1 ingress prot all pref 1 \
matchall skip_sw \
action sample rate 100 group 1 trunc 64
TC filters are put together into chains by order of priority (pref). Each chain can be looked at as a table of classifier-action rules.
To insert a filter into a specific chain, one has to use the chain parameter:
# tc filter add dev swp1 ingress chain 100 flower action drop
In this example, we added a filter into chain 100. If the chain parameter is
omitted, the default chain 0 is assumed. Chain 0 is also the chain which is
always processed first. If other chains should be processed, the action
goto chain needs to be invoked.
# tc filter add dev swp1 ingress protocol ip \
flower skip_sw dst_ip 192.168.101.1 \
action goto chain 100
If a chain does not exist before a filter is added, it is implicitly created. Similarly, after the last filter is removed, implicitly created chains are destroyed. It is also possible to explicitly create and destroy chains:
# tc chain add dev swp1 ingress chain 11
# tc chain del dev swp1 ingress chain 11
If a chain contains filters when it is deleted, they are deleted as well. The delete command can be used for both implicitly and explicitly created chains.
To list existing chains, run:
# tc chain show dev swp1 ingress
chain parent ffff: chain 11
Note: There is a limit of 255 goto jumps that the HW can process for a single packet. If more goto jumps are configured, the packet gets dropped.
As a chain is created (whether the implicit chain 0 or any other), mlxsw needs to guess which keys the user will want to match on in the filters that will be on this chain. If the guess proves to be too narrow, insertion of certain filters might fail, depending on the order in which they are added. If the guess proves to be too broad, some TCAM space will be wasted, which impacts the number of filters that can be offloaded.
The user often knows in advance which keys they will want to use on a given chain. For example, they may only need matching on a destination IP address.
Chain templates allow the user to specify the shape that filters on this chain are going to have. mlxsw can then leverage this knowledge to configure the HW optimally to support the requested matching keys.
The template is configured during explicit chain creation, like this:
# tc chain add dev swp1 ingress proto ip chain 11 \
flower dst_ip 0.0.0.0/16
The template is then shown when listing chains:
# tc chain show dev swp1 ingress
chain parent ffff: flower chain 11
eth_type ipv4
dst_ip 0.0.0.0/16
Addition of filters that fit the template will be successful:
# tc filter add dev swp1 ingress proto ip chain 11 \
flower dst_ip 10.0.0.0/8 \
action drop
Addition of filters that do not fit the template will fail:
# tc filter add dev swp1 ingress proto ip chain 11 \
flower dst_ip 10.0.0.0/24 \
action drop
Error: cls_flower: Mask does not fit the template.
We have an error talking to the kernel, -1
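The two outcomes above are consistent with a rule where a filter's mask may only use bits that the template's mask also covers. For contiguous prefix masks that reduces to comparing prefix lengths, which the following sketch does. This rule is an assumption inferred from the examples, not a statement of the kernel's exact check, and prefix_fits_template is a hypothetical name.

```shell
# Hypothetical helper: succeed if a filter with the given dst_ip prefix length
# would fit a chain template with the given prefix length, assuming that the
# filter mask must be a subset of the template mask.
prefix_fits_template() {
    filter_len=$1
    template_len=$2
    [ "$filter_len" -le "$template_len" ]
}

# Template from the example: dst_ip 0.0.0.0/16
if prefix_fits_template 8 16; then echo "/8 fits the /16 template"; fi
if ! prefix_fits_template 24 16; then echo "/24 does not fit the /16 template"; fi
```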
By default, each qdisc has its own group of chains. This group of chains is
called a block. Therefore two clsact qdiscs, each on a different device, will
each have their own suite of filter chains, even if the filters themselves are
otherwise exactly the same. mlxsw currently does not attempt to deduplicate
such cases automatically. So not only is such a setup harder to configure, it
also wastes more TCAM resources, which may limit the scale of the solution.
Block sharing is a way to resolve the above issues. When creating a qdisc, it is possible to request a particular block that should be used for the ingress and egress chains:
# tc qdisc add dev swp1 ingress_block 22 egress_block 23 clsact
# tc qdisc add dev swp2 ingress_block 22 egress_block 23 clsact
These two commands added clsact qdiscs to two netdevices. The ingress_block
and egress_block options indicate which shared block should be used in the
respective direction. Since both qdiscs use the same numbers, the qdiscs end up
using identical filter sets. The numbers are arbitrary and it is up to the user
to keep track of which number corresponds to which block.
If you list the existing qdiscs, you see the block sharing info in the output:
# tc qdisc show dev swp1
qdisc clsact ffff: parent ffff:fff1 ingress_block 22 egress_block 23
# tc qdisc show dev swp2
qdisc clsact ffff: parent ffff:fff1 ingress_block 22 egress_block 23
To make it more visual, the situation looks like this:
  swp1 clsact qdisc            swp2 clsact qdisc
      (ing)(egr)                   (egr)(ing)
        |    |                       |    |
        |    +-----> block 23 <------+    |
        |              + chain 0          |
        |                + flower         |
        |                + ...            |
        |                                 |
        +----------> block 22 <-----------+
                       + chain 0
                         + ...
There is no limit on the number of qdiscs that can share the same block.
Once the qdisc block is shared, it is no longer possible to manipulate the filters using the qdisc handle. One has to rather use the block index as a handle:
# tc filter add block 22 flower action drop
In order to implement device-specific filters in shared blocks, the
indev flower key may be useful:
# tc filter add block 22 flower indev swp1 action drop
Another feature that uses shared blocks is qevents.
As above, a block attached to a qevent is implicitly created, and does not
disappear until it is not referenced anymore, whether by a clsact qdisc or by
a qevent.
Formally qevent blocks are simply shared blocks, and filters can be attached to
them in the same way as to any other block. A single shared block can even be
used by both a clsact instance and a qevent. However, such configurations are
unlikely to be really useful, because the set of filters permissible in both
positions is very limited.
On recent kernels, a single clsact qdisc holds both ingress and egress rules.
On kernels prior to 4.14, one would instead use an ingress qdisc for ingress
rules, and an arbitrary egress qdisc for egress rules. E.g.:
$ tc qdisc add dev swp1 handle ffff: ingress
$ tc qdisc add dev swp1 handle 1: root prio
And attaching the matchall classifier was done for ingress:
$ tc filter add dev swp1 parent ffff: \
matchall skip_sw \
action mirred egress mirror dev swp2
And for egress:
$ tc filter add dev swp1 parent 1: \
matchall skip_sw \
action mirred egress mirror dev swp2
- man tc
- man tc-flower
- man tc-actions
- QoS in Linux with TC and Filters by Phil Sutter (part of iproute documentation)
- Linux Traffic Control Classifier-Action Subsystem Architecture
- man tc-police
- man devlink-resource
- man tc-sample