[FROG] Zebra crashes / FRR with netns VRFs

Casey Deccio casey at deccio.net
Wed Aug 20 13:14:53 UTC 2025


Hi all,

I'm using FRR with VRFs based on network namespaces, and the zebra process keeps crashing (log messages below).  Here is my setup:

1. I start the daemons with the -w option:

$ ps -ef | grep frr
root       65439       1  0 23:18 ?        00:00:00 /usr/lib/frr/watchfrr -d -w --log file:/tmp/frr.log --log-level=debug -F traditional zebra mgmtd ripd staticd
frr        68363       1  0 23:29 ?        00:00:00 /usr/lib/frr/zebra -d -w --log file:/tmp/frr.log --log-level=debug -F traditional -A 127.0.0.1 -s 90000000
frr        68368       1  0 23:29 ?        00:00:00 /usr/lib/frr/mgmtd -d -w --log file:/tmp/frr.log --log-level=debug -F traditional -A 127.0.0.1
frr        68370       1  0 23:29 ?        00:00:00 /usr/lib/frr/ripd -d -w --log file:/tmp/frr.log --log-level=debug -F traditional -A 127.0.0.1
frr        68373       1  0 23:29 ?        00:00:00 /usr/lib/frr/staticd -d -w --log file:/tmp/frr.log --log-level=debug -F traditional -A 127.0.0.1


2. I create some network namespaces (e.g., with unshare --netns=/run/netns/h1 ...).


3. I check that the namespaces were detected by frr:

$ sudo vtysh -c 'show vrf'
netns-based vrfs
vrf h1 id 6 netns /run/netns/h1
vrf h2 id 5 netns /run/netns/h2
vrf h3 id 4 netns /run/netns/h3
vrf h4 id 3 netns /run/netns/h4
vrf r1 id 2 netns /run/netns/r1
vrf r2 id 1 netns /run/netns/r2


4. I create RIP instances in two of the namespaces with the following:

sudo vtysh -c enable -c 'configure terminal' -c 'router rip vrf r1' -c  'redistribute connected' -c  'network r1-s1' -c  'network r1-r2' -c exit -c end -c end

Here r1-s1 and r1-r2 are the names of the interfaces in namespace r1.  I do the same for r2 with its own interfaces.
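
For completeness, here is a minimal sketch of how the matching r2 command can be generated. The helper name rip_vrf_cmd is hypothetical (it is not part of FRR), and the r2 interface names are taken from the running config in step 5. It only prints the command and does not run it.

```shell
# Hypothetical helper (not part of FRR): print, but do not run, the vtysh
# command line used in step 4 for a given VRF.
# Usage: rip_vrf_cmd VRF IFACE...
rip_vrf_cmd() {
    vrf=$1
    shift
    cmd="sudo vtysh -c enable -c 'configure terminal' -c 'router rip vrf $vrf' -c 'redistribute connected'"
    # One 'network' statement per interface, in the order given.
    for ifname in "$@"; do
        cmd="$cmd -c 'network $ifname'"
    done
    printf '%s\n' "$cmd -c exit -c end -c end"
}

# Interface names as they appear for r2 in the running config.
rip_vrf_cmd r2 r2-s2 r2-r1
```

Printing the command first makes it easy to check that it has the same shape as the r1 command before actually running it.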

5. These show up in the running config:

$ sudo vtysh -c 'show ru'
Building configuration...

Current configuration:
!
frr version 10.3
frr defaults traditional
hostname debian
log syslog informational
service integrated-vtysh-config
!
router rip vrf r1
 network r1-r2
 network r1-s1
 redistribute connected
exit
!
router rip vrf r2
 network r2-r1
 network r2-s2
 redistribute connected
exit
!
end


6. Shortly afterwards, I get the following error in the log:

2025/08/19 16:22:33 ZEBRA: [PE6Y7-KR1RK] Zebra received unknown command 37
ZEBRA: Received signal 11 at 1755642153 (si_addr 0x0); aborting...
ZEBRA: zlog_signal+0xc4                   ffff857a7024     ffff8472cc80 /usr/lib/aarch64-linux-gnu/frr/libfrr.so.0 (mapped at 0xffff856d0000)
ZEBRA: ?                                  ffff857e8db8     ffff8472cdb0 /usr/lib/aarch64-linux-gnu/frr/libfrr.so.0 (mapped at 0xffff856d0000)
ZEBRA:     ---- signal ----
ZEBRA: ?                                  ffff85984808     ffff8472cf00 linux-vdso.so.1 (mapped at 0xffff85984000)
ZEBRA: netlink_route_multipath_msg_encode+0xc4     aaaaabfffbcc     ffff8472e160 /usr/lib/frr/zebra (mapped at 0xaaaaabf50000)
ZEBRA: netlink_batch_add_msg+0x54         aaaaabff2224     ffff8472e210 /usr/lib/frr/zebra (mapped at 0xaaaaabf50000)
ZEBRA: kernel_update_multi+0x254          aaaaabff2580     ffff8472e2d0 /usr/lib/frr/zebra (mapped at 0xaaaaabf50000)
ZEBRA: ?                                  aaaaac01bce0     ffff8472e3a0 /usr/lib/frr/zebra (mapped at 0xaaaaabf50000)
ZEBRA: ?                                  aaaaac015390     ffff8472e420 /usr/lib/frr/zebra (mapped at 0xaaaaabf50000)
ZEBRA: event_call+0x84                    ffff857fb7c8     ffff8472e510 /usr/lib/aarch64-linux-gnu/frr/libfrr.so.0 (mapped at 0xffff856d0000)
ZEBRA: ?                                  ffff8578b8cc     ffff8472e5a0 /usr/lib/aarch64-linux-gnu/frr/libfrr.so.0 (mapped at 0xffff856d0000)
ZEBRA: ?                                  ffff854e5f78     ffff8472e690 /lib/aarch64-linux-gnu/libc.so.6 (mapped at 0xffff85460000)
ZEBRA: in thread dplane_thread_loop scheduled from ../zebra/zebra_dplane.c:6586 dplane_provider_work_ready()
2025/08/19 16:22:33 MGMTD: [X3G8F-PM93W] BE-adapter: mgmt_msg_read: got EOF/disconnect
2025/08/19 16:22:33 WATCHFRR: [HD38Q-0HBRT][EC 268435457] zebra state -> down : read returned EOF

It happens about 70% of the time, but I can't pin down what is causing it.  When it happens, the zebra process exits, and it isn't started again until watchfrr restarts it, which can take several minutes.
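
To put a number on how often this happens, the crash reports in the log can be counted. This is a rough sketch: the helper name count_zebra_segv is hypothetical, and it assumes every crash logs the same "Received signal 11" line shown above.

```shell
# Hypothetical helper: count zebra SIGSEGV reports in a log file.
# Assumes each crash produces one "ZEBRA: Received signal 11" line,
# as in the excerpt above.
# Usage: count_zebra_segv LOGFILE
count_zebra_segv() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'ZEBRA: Received signal 11' "$1"
}

# Example, with the log file passed to the daemons via --log:
#   count_zebra_segv /tmp/frr.log
```

Comparing that count against the number of test runs gives a less anecdotal crash rate than my 70% estimate.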

7. My version and build options are as follows:

$ sudo /usr/lib/frr/mgmtd -v
mgmtd version 10.3
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
	'--build=aarch64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/aarch64-linux-gnu' '--runstatedir=/run' '--disable-maintainer-mode' '--sbindir=/usr/lib/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/aarch64-linux-gnu/frr' '--with-moduledir=/usr/lib/aarch64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-rpki' '--enable-scripting' '--enable-pim6d' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=aarch64-linux-gnu' 'LIBS= -latomic' 'PYTHON=python3'

$ dpkg --list | grep frr
ii  frr                                   10.3-3                          arm64        FRRouting Internet routing protocol suite
ii  frr-pythontools                       10.3-3                          all          FRRouting Internet routing protocol suite (reload support)

Any thoughts or ideas would be appreciated.

Thanks,
Casey




