[FROG] migration issues frr6 to frr7 on FreeBSD11

mike tancsa mike at sentex.net
Sat Jan 16 22:04:13 UTC 2021


Ran into a strange issue when I upgraded frr6 to frr7 on a FreeBSD 11
box.  We have a LOT of bgp peers terminated (650 -- not all established)
with a routing table of fewer than 500 prefixes.  Everything was working
just fine on 6, but after upgrading to 7, bgp would die with signal 6
(SIGABRT) a few minutes after startup. Not much in the logs.

Jan 16 12:25:33 kit-b zebra[88629]: [EC 4043309122] Client 'bgp'
encountered an error and is shutting down.
Jan 16 12:25:33 kit-b zebra[88629]: [EC 4043309122] Client 'vnc'
encountered an error and is shutting down.
Jan 16 12:25:33 kit-b zebra[88629]: release_daemon_table_chunks:
Released 0 table chunks
Jan 16 12:25:33 kit-b zebra[88629]: zebra/zebra_ptm.c:1348 failed to
find process pid registration
Jan 16 12:25:33 kit-b zebra[88629]: client 11 disconnected 66 bgp routes
removed from the rib
Jan 16 12:25:33 kit-b zebra[88629]: release_daemon_table_chunks:
Released 0 table chunks
Jan 16 12:25:33 kit-b zebra[88629]: client 24 disconnected 0 vnc routes
removed from the rib
Jan 16 12:25:33 kit-b zebra[88629]: [EC 100663303] kernel_rtm:
0.0.0.0/0: rtm_write() unexpectedly returned -4 for command RTM_DELETE

It would also generate a lot of kernel messages while the daemon was
running such as

Jan 16 16:00:59 kit-b kernel: sonewconn: pcb 0xfffff801e65123a0: Listen
queue overflow: 193 already in queue awaiting acceptance (622 occurrences)
Jan 16 16:01:59 kit-b kernel: sonewconn: pcb 0xfffff801e65123a0: Listen
queue overflow: 193 already in queue awaiting acceptance (557 occurrences)
Jan 16 16:02:59 kit-b kernel: sonewconn: pcb 0xfffff801e65123a0: Listen
queue overflow: 193 already in queue awaiting acceptance (622 occurrences)
Jan 16 16:03:59 kit-b kernel: sonewconn: pcb 0xfffff801e65123a0: Listen
queue overflow: 193 already in queue awaiting acceptance (556 occurrences)

that are not generated when running frr6.  The problem version was built
from ports (7.5_1). The only option I enabled was building vtysh.  In
case it was some memory issue, I tried a build with tcmalloc, but it
failed the same way.
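
For anyone who wants to compare overflow rates between an frr6 run and an
frr7 run, here is a rough sketch that tallies the sonewconn messages per
socket. It assumes the wrapped log lines have been re-joined into single
lines, and that a message without an "(N occurrences)" suffix counts as
one event. The function name tally_overflows is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the FreeBSD "sonewconn" kernel lines quoted above.
# Assumption: each log entry is on a single line (mail wrapping undone).
PATTERN = re.compile(
    r"sonewconn: pcb (?P<pcb>0x[0-9a-f]+): Listen queue overflow: "
    r"(?P<queued>\d+) already in queue awaiting acceptance"
    r"(?: \((?P<count>\d+) occurrences\))?"
)


def tally_overflows(lines):
    """Return {pcb address: total overflow events} for the given log lines."""
    totals = defaultdict(int)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            # Assumption: a line with no occurrence count is one event.
            totals[match.group("pcb")] += int(match.group("count") or 1)
    return dict(totals)


sample = [
    "Jan 16 16:00:59 kit-b kernel: sonewconn: pcb 0xfffff801e65123a0: "
    "Listen queue overflow: 193 already in queue awaiting acceptance "
    "(622 occurrences)",
    "Jan 16 16:01:59 kit-b kernel: sonewconn: pcb 0xfffff801e65123a0: "
    "Listen queue overflow: 193 already in queue awaiting acceptance "
    "(557 occurrences)",
]

print(tally_overflows(sample))
```

Feeding it the kernel lines from /var/log/messages for each run gives one
total per listening socket, which makes the two versions easy to compare
side by side.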

We use frr7 elsewhere with far fewer peers and all works just fine, but
those boxes are on RELENG_12.  As this is a production box, I can't do
much experimenting on it. Not sure how to recreate this in the lab easily.
Do these problems ring a bell with anyone? The box is pretty quiet. There
is no memory or CPU pressure on it.  It doesn't seem to be a
"thundering herd" problem, as I tried shutting down half the peers at
startup to no avail.

    ---Mike




