[FROG] FRR "overlooks" creation of kernel routes (on linux) - how to debug?
Eugene Crosser
crosser at average.org
Fri Mar 25 18:04:08 UTC 2022
Hello Donald,
On 25/03/2022 18:28, Donald Sharp wrote:
> Why do we need to handle NLM_F_DUMP_INTR in normal operations? There is
> no dump commands being given except on startup
NLM_F_DUMP_INTR is not an error condition, it's a _normal_ condition
that happens in real life when underlying structures in the kernel (such
as the list of addresses or routes) change while the kernel is producing
the separate netlink messages that collectively form the response to a
dump request. In such a case, userspace is advised to consume and drop
any pending messages in its netlink socket and re-request the dump.
This situation is relatively easy to reproduce too, although it is not
common. It happens to us though, from time to time, as we have a very
dynamic virtual network.
When we encounter it, it manifests as an incomplete view of the
requested configuration (not all interfaces, not all addresses, etc.).
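
The drain-and-retry handling described above can be sketched roughly as
follows. This is only an illustration, not how zebra does it: the
function names and the `send_request` / `recv` callables are
hypothetical stand-ins for the real socket I/O, and NLMSG_ERROR handling
is left out. The constants are the values from linux/netlink.h.

```python
import struct

# struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid
NLMSG_HDR = struct.Struct("=IHHII")
NLMSG_DONE = 0x3
NLM_F_DUMP_INTR = 0x10


def parse_dump_chunk(buf):
    """Split one recv() buffer into payloads.

    Returns (payloads, done, interrupted).
    """
    payloads, done, interrupted = [], False, False
    offset = 0
    while offset + NLMSG_HDR.size <= len(buf):
        length, msg_type, flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        if length < NLMSG_HDR.size or offset + length > len(buf):
            raise ValueError("truncated netlink message")
        # The flag may sit on any message of the dump, including NLMSG_DONE.
        if flags & NLM_F_DUMP_INTR:
            interrupted = True
        if msg_type == NLMSG_DONE:
            done = True
            break
        payloads.append(buf[offset + NLMSG_HDR.size:offset + length])
        offset += (length + 3) & ~3  # NLMSG_ALIGN
    return payloads, done, interrupted


def dump_with_retry(send_request, recv, max_attempts=5):
    """Re-issue a dump until one full pass arrives without NLM_F_DUMP_INTR."""
    for _ in range(max_attempts):
        send_request()
        results, done, interrupted = [], False, False
        # Always read up to NLMSG_DONE, so that an interrupted dump is
        # fully consumed (and then dropped) before asking again.
        while not done:
            chunk = recv()
            if not chunk:
                raise ConnectionError("netlink socket returned no data")
            payloads, done, intr = parse_dump_chunk(chunk)
            results.extend(payloads)
            interrupted = interrupted or intr
        if not interrupted:
            return results
    raise RuntimeError("dump was interrupted on every attempt")
```

The important part is that the results of an interrupted pass are
thrown away as a whole; keeping them is what produces the incomplete
view.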
=======================
About the problem that prompted my message: in a test case, I have four
routes: two in their respective VRFs, and two that each point to a veth
leading to the respective VRF.
This is what I see in the kernel:
XXX.YYY.172.88 via 169.254.0.249 dev dwnsnk04050608 proto static
XXX.YYY.172.89 via 169.254.0.249 dev dwnsnk04050609 proto static
XXX.YYY.172.88 dev i04050608000000 proto static scope link #dvrf04050608
XXX.YYY.172.89 dev i04050609000000 proto static scope link #dvrf04050609
But FRR is missing one of the four routes:
ps402a-6# sh ip route XXX.YYY.172.88
Routing entry for XXX.YYY.172.88/32
Known via "kernel", distance 0, metric 0, best
Last update 00:18:01 ago
* 169.254.0.249, via dwnsnk04050608
ps402a-6# sh ip route XXX.YYY.172.89
Routing entry for 0.0.0.0/0
Known via "bgp", distance 20, metric 0, best
Last update 02:01:52 ago
* <** our default routes are listed here **>
ps402a-6# sh ip route vrf dvrf04050608 XXX.YYY.172.88
Routing entry for XXX.YYY.172.88/32
Known via "kernel", distance 0, metric 0, vrf dvrf04050608, best
Last update 00:18:12 ago
* directly connected, i04050608000000(vrf rtvrf04050608)
ps402a-6# sh ip route vrf dvrf04050609 XXX.YYY.172.89
Routing entry for XXX.YYY.172.89/32
Known via "kernel", distance 0, metric 0, vrf dvrf04050609, best
Last update 00:18:15 ago
* directly connected, i04050609000000(vrf rtvrf04050609)
In the debug log, I see that all NEWROUTE events were processed:
2022/03/25 17:27:29 ZEBRA: [SKNFJ-G938V] RTM_NEWROUTE ipv4 unicast proto static NS 0
2022/03/25 17:27:29 ZEBRA: [K2A4T-TS83H] RTM_NEWROUTE XXX.YYY.172.88/32 vrf dvrf04050608(1581) table_id: 800000001 metric: 0 Admin Distance: 0
2022/03/25 17:27:29 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=60, seq=395, pid=54641
2022/03/25 17:27:29 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWROUTE(24), len=60, seq=395, pid=54641
2022/03/25 17:27:29 ZEBRA: [SKNFJ-G938V] RTM_NEWROUTE ipv4 unicast proto static NS 0
2022/03/25 17:27:29 ZEBRA: [K2A4T-TS83H] RTM_NEWROUTE XXX.YYY.172.88/32 vrf default(0) table_id: 254 metric: 0 Admin Distance: 0
2022/03/25 17:27:29 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=52, seq=399, pid=54641
2022/03/25 17:27:29 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWROUTE(24), len=52, seq=399, pid=54641
2022/03/25 17:27:29 ZEBRA: [SKNFJ-G938V] RTM_NEWROUTE ipv4 unicast proto static NS 0
2022/03/25 17:27:29 ZEBRA: [K2A4T-TS83H] RTM_NEWROUTE XXX.YYY.172.89/32 vrf dvrf04050609(1584) table_id: 800000002 metric: 0 Admin Distance: 0
2022/03/25 17:27:29 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWROUTE(24), len=60, seq=401, pid=54641
2022/03/25 17:27:29 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWROUTE(24), len=60, seq=401, pid=54641
2022/03/25 17:27:29 ZEBRA: [SKNFJ-G938V] RTM_NEWROUTE ipv4 unicast proto static NS 0
2022/03/25 17:27:29 ZEBRA: [K2A4T-TS83H] RTM_NEWROUTE XXX.YYY.172.89/32 vrf default(0) table_id: 254 metric: 0 Admin Distance: 0
2022/03/25 17:27:29 ZEBRA: [YXCJP-0WZWV] netlink_nexthop_msg_encode: ID (4107): 169.254.0.249, via dwnsnk04050608(1582) vrf default(0)
2022/03/25 17:27:29 ZEBRA: [R43C6-KYHWT] netlink_nexthop_msg_encode: RTM_NEWNEXTHOP, id=4107
2022/03/25 17:27:29 ZEBRA: [HYEHE-CQZ9G] nl_batch_send: netlink-dp (NS 0), batch size=48, msg cnt=1
2022/03/25 17:27:29 ZEBRA: [YXCJP-0WZWV] netlink_nexthop_msg_encode: ID (4108): 169.254.0.249, via dwnsnk04050609(1585) vrf default(0)
2022/03/25 17:27:29 ZEBRA: [R43C6-KYHWT] netlink_nexthop_msg_encode: RTM_NEWNEXTHOP, id=4108
2022/03/25 17:27:29 ZEBRA: [HYEHE-CQZ9G] nl_batch_send: netlink-dp (NS 0), batch size=48, msg cnt=1
2022/03/25 17:27:29 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=1508, seq=0, pid=0
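
To check which NEWROUTE events zebra actually logged, the per-prefix
lines in the log excerpt above can be tallied with a small script. This
is a rough sketch that assumes the log format stays as shown; the
function name `logged_new_routes` is made up for illustration.
Whitespace is collapsed first so that mail-wrapped log lines still
match.

```python
import re

# Matches e.g. "RTM_NEWROUTE XXX.YYY.172.88/32 vrf default(0) table_id: 254"
ROUTE_RE = re.compile(
    r"RTM_NEWROUTE (?P<prefix>\S+/\d+) "
    r"vrf (?P<vrf>[^\s(]+)\((?P<vrf_id>\d+)\) "
    r"table_id: (?P<table>\d+)"
)


def logged_new_routes(log_text):
    """Return (prefix, vrf name, table id) for every logged RTM_NEWROUTE."""
    flat = re.sub(r"\s+", " ", log_text)  # undo hard line wrapping
    return [
        (m["prefix"], m["vrf"], int(m["table"]))
        for m in ROUTE_RE.finditer(flat)
    ]
```

Run over the excerpt, it should list both prefixes once in their VRF and
once in the default VRF, which can then be compared with what
"sh ip route" reports.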
Regards,
Eugene