[FROG] Zebra crashes / FRR with netns VRFs
Casey Deccio
casey at deccio.net
Wed Aug 20 13:14:53 UTC 2025
Hi all,
I'm using FRR with VRFs based on network namespaces, and the zebra process keeps crashing (log messages below). Here is my setup:
1. I start the daemons with the -w (--vrfwnetns) option, which tells FRR to use network namespaces as the VRF backend:
$ ps -ef | grep frr
root 65439 1 0 23:18 ? 00:00:00 /usr/lib/frr/watchfrr -d -w --log file:/tmp/frr.log --log-level=debug -F traditional zebra mgmtd ripd staticd
frr 68363 1 0 23:29 ? 00:00:00 /usr/lib/frr/zebra -d -w --log file:/tmp/frr.log --log-level=debug -F traditional -A 127.0.0.1 -s 90000000
frr 68368 1 0 23:29 ? 00:00:00 /usr/lib/frr/mgmtd -d -w --log file:/tmp/frr.log --log-level=debug -F traditional -A 127.0.0.1
frr 68370 1 0 23:29 ? 00:00:00 /usr/lib/frr/ripd -d -w --log file:/tmp/frr.log --log-level=debug -F traditional -A 127.0.0.1
frr 68373 1 0 23:29 ? 00:00:00 /usr/lib/frr/staticd -d -w --log file:/tmp/frr.log --log-level=debug -F traditional -A 127.0.0.1
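To double-check that each daemon really received -w, the ps output can be scanned for it. This is a minimal sketch; `daemons_missing_flag` is a hypothetical helper, not part of FRR. It only inspects `ps -ef` text and assumes the usual eight-column layout with the full command line in the last column:

```python
import shlex


def daemons_missing_flag(ps_output: str, flag: str = "-w") -> list[str]:
    """Return the names of FRR daemons in `ps -ef` output whose argv lacks `flag`."""
    missing = []
    for line in ps_output.splitlines():
        # UID PID PPID C STIME TTY TIME CMD: keep CMD (with its spaces) as one field.
        fields = line.split(None, 7)
        if len(fields) < 8 or "/usr/lib/frr/" not in fields[7]:
            continue  # not an FRR daemon (e.g. the "grep frr" line itself)
        argv = shlex.split(fields[7])
        name = argv[0].rsplit("/", 1)[-1]
        if flag not in argv[1:]:
            missing.append(name)
    return missing
```

On a live system the input could come from `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`; an empty list means every FRR daemon listed was started with -w.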
2. I create some network namespaces (e.g., with unshare --netns=/run/netns/h1 ...).
3. I confirm that FRR detected the namespaces:
$ sudo vtysh -c 'show vrf'
netns-based vrfs
vrf h1 id 6 netns /run/netns/h1
vrf h2 id 5 netns /run/netns/h2
vrf h3 id 4 netns /run/netns/h3
vrf h4 id 3 netns /run/netns/h4
vrf r1 id 2 netns /run/netns/r1
vrf r2 id 1 netns /run/netns/r2
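The same check can be automated by parsing the `show vrf` text. A minimal sketch, assuming the output format shown above and that each namespace is bind-mounted under /run/netns; `parse_show_vrf` and `missing_vrfs` are hypothetical helpers:

```python
import re

# Matches lines such as "vrf h1 id 6 netns /run/netns/h1".
VRF_LINE = re.compile(r"^vrf\s+(\S+)\s+id\s+(\d+)\s+netns\s+(\S+)\s*$")


def parse_show_vrf(output: str) -> dict[str, tuple[int, str]]:
    """Map VRF name -> (VRF id, netns path) from `show vrf` output."""
    vrfs = {}
    for line in output.splitlines():
        m = VRF_LINE.match(line.strip())
        if m:
            vrfs[m.group(1)] = (int(m.group(2)), m.group(3))
    return vrfs


def missing_vrfs(output: str, expected: list[str], netns_dir: str = "/run/netns") -> list[str]:
    """Return the expected names that are absent or bound to an unexpected netns path."""
    vrfs = parse_show_vrf(output)
    return [name for name in expected
            if vrfs.get(name, (None, None))[1] != f"{netns_dir}/{name}"]
```

The text could be obtained with `vtysh -c 'show vrf'`; an empty result from `missing_vrfs` means every expected namespace was picked up.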
4. I create RIP instances in two of the namespaces with the following:
sudo vtysh -c enable -c 'configure terminal' -c 'router rip vrf r1' -c 'redistribute connected' -c 'network r1-s1' -c 'network r1-r2' -c exit -c end -c end
Here r1-s1 and r1-r2 are the names of the interfaces in namespace r1. I do the same for r2, using its interfaces r2-s2 and r2-r1.
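Because the same vtysh invocation is repeated per VRF, it can be generated from a table. A minimal sketch, assuming the interface names that appear in the running configuration for r1 and r2; `rip_vtysh_argv`, `RIP_VRFS` and `configure_all` are hypothetical names, and the sketch ends with a single `end` rather than the doubled one in the command above:

```python
import shlex
import subprocess

# Assumed VRF -> interface mapping, taken from the running config.
RIP_VRFS = {"r1": ["r1-s1", "r1-r2"], "r2": ["r2-s2", "r2-r1"]}


def rip_vtysh_argv(vrf: str, interfaces: list[str]) -> list[str]:
    """Build a vtysh argv that enables RIP in `vrf` on the given interfaces."""
    cmds = ["enable", "configure terminal", f"router rip vrf {vrf}",
            "redistribute connected"]
    cmds += [f"network {ifname}" for ifname in interfaces]
    cmds += ["exit", "end"]
    argv = ["sudo", "vtysh"]
    for cmd in cmds:
        argv += ["-c", cmd]
    return argv


def configure_all(dry_run: bool = True) -> None:
    """Print (dry run) or execute the vtysh command for every VRF in RIP_VRFS."""
    for vrf, ifaces in RIP_VRFS.items():
        argv = rip_vtysh_argv(vrf, ifaces)
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)
```

With `dry_run=True` it only prints shell-quoted commands in the same form as the one above, which makes it easy to review before running anything.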
5. These show up in the running config:
$ sudo vtysh -c 'show ru'
Building configuration...
Current configuration:
!
frr version 10.3
frr defaults traditional
hostname debian
log syslog informational
service integrated-vtysh-config
!
router rip vrf r1
network r1-r2
network r1-s1
redistribute connected
exit
!
router rip vrf r2
network r2-r1
network r2-s2
redistribute connected
exit
!
end
6. Very soon afterward, I get the following error in the log:
2025/08/19 16:22:33 ZEBRA: [PE6Y7-KR1RK] Zebra received unknown command 37
ZEBRA: Received signal 11 at 1755642153 (si_addr 0x0); aborting...
ZEBRA: zlog_signal+0xc4 ffff857a7024 ffff8472cc80 /usr/lib/aarch64-linux-gnu/frr/libfrr.so.0 (mapped at 0xffff856d0000)
ZEBRA: ? ffff857e8db8 ffff8472cdb0 /usr/lib/aarch64-linux-gnu/frr/libfrr.so.0 (mapped at 0xffff856d0000)
ZEBRA: ---- signal ----
ZEBRA: ? ffff85984808 ffff8472cf00 linux-vdso.so.1 (mapped at 0xffff85984000)
ZEBRA: netlink_route_multipath_msg_encode+0xc4 aaaaabfffbcc ffff8472e160 /usr/lib/frr/zebra (mapped at 0xaaaaabf50000)
ZEBRA: netlink_batch_add_msg+0x54 aaaaabff2224 ffff8472e210 /usr/lib/frr/zebra (mapped at 0xaaaaabf50000)
ZEBRA: kernel_update_multi+0x254 aaaaabff2580 ffff8472e2d0 /usr/lib/frr/zebra (mapped at 0xaaaaabf50000)
ZEBRA: ? aaaaac01bce0 ffff8472e3a0 /usr/lib/frr/zebra (mapped at 0xaaaaabf50000)
ZEBRA: ? aaaaac015390 ffff8472e420 /usr/lib/frr/zebra (mapped at 0xaaaaabf50000)
ZEBRA: event_call+0x84 ffff857fb7c8 ffff8472e510 /usr/lib/aarch64-linux-gnu/frr/libfrr.so.0 (mapped at 0xffff856d0000)
ZEBRA: ? ffff8578b8cc ffff8472e5a0 /usr/lib/aarch64-linux-gnu/frr/libfrr.so.0 (mapped at 0xffff856d0000)
ZEBRA: ? ffff854e5f78 ffff8472e690 /lib/aarch64-linux-gnu/libc.so.6 (mapped at 0xffff85460000)
ZEBRA: in thread dplane_thread_loop scheduled from ../zebra/zebra_dplane.c:6586 dplane_provider_work_ready()
2025/08/19 16:22:33 MGMTD: [X3G8F-PM93W] BE-adapter: mgmt_msg_read: got EOF/disconnect
2025/08/19 16:22:33 WATCHFRR: [HD38Q-0HBRT][EC 268435457] zebra state -> down : read returned EOF
It happens in about 70% of my runs, but I can't pin down what triggers it. When it does, zebra exits and is not started again until watchfrr restarts it, which can take several minutes.
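For a bug report, the symbolized frames can be pulled out of the backtrace automatically. A minimal sketch, assuming the zlog backtrace layout shown above (`NAME: symbol addr frame object (mapped at base)`); `backtrace_frames` is a hypothetical helper, and frames printed as `?` are skipped because resolving them would need debug symbols:

```python
import re

# Matches lines such as
# "ZEBRA: netlink_batch_add_msg+0x54 aaaaabff2224 ffff8472e210 /usr/lib/frr/zebra (mapped at 0xaaaaabf50000)"
FRAME = re.compile(
    r"^(\w+): (\S+) [0-9a-f]+ [0-9a-f]+ (\S+) \(mapped at 0x[0-9a-f]+\)$"
)


def backtrace_frames(log_text: str) -> list[tuple[str, str]]:
    """Return (symbol+offset, object file) for each resolved backtrace frame."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME.match(line.strip())
        if m and m.group(2) != "?":
            frames.append((m.group(2), m.group(3)))
    return frames
```

On the log above this yields zlog_signal, netlink_route_multipath_msg_encode, netlink_batch_add_msg, kernel_update_multi and event_call, which suggests the crash occurs while the dataplane thread encodes a route update for netlink.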
7. My version and build options are as follows:
$ sudo /usr/lib/frr/mgmtd -v
mgmtd version 10.3
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
'--build=aarch64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/aarch64-linux-gnu' '--runstatedir=/run' '--disable-maintainer-mode' '--sbindir=/usr/lib/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/aarch64-linux-gnu/frr' '--with-moduledir=/usr/lib/aarch64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-rpki' '--enable-scripting' '--enable-pim6d' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=aarch64-linux-gnu' 'LIBS= -latomic' 'PYTHON=python3'
$ dpkg --list | grep frr
ii frr 10.3-3 arm64 FRRouting Internet routing protocol suite
ii frr-pythontools 10.3-3 all FRRouting Internet routing protocol suite (reload support)
Any thoughts or ideas would be appreciated.
Thanks,
Casey