[FROG] FRR 7.2/Stable(latest) zebra core seen while handling signal in sigint()
hongal
hongal at gmail.com
Sun Nov 17 22:59:03 EST 2019
Hi FRRTeam,
I am seeing an assert as below, while I tried to play a bit with FRR
7.2/stable latest code.
code, with FPM enabled. Trigger for this is we tried to disable zebra .
This is highly reproducible if we enable FPM, with few 1-5routes/interface
configured.
Here is the backtrace for the zebra core: but let me know if you need more
info:
i1".
Core was generated by `/usr/lib/frr/zebra -d -A 127.0.0.1 -M snmp -M fpm'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f1456463428 in __GI_raise (sig=sig at entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f1457641700 (LWP 403))]
(gdb) bt
#0 0x00007f1456463428 in __GI_raise (sig=sig at entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007f145646502a in __GI_abort () at abort.c:89
#2 0x00007f1457193188 in core_handler (signo=11, siginfo=0x7ffcb511eab0,
context=0x7ffcb511e980) at lib/sigevent.c:228
#3 <signal handler called>
#4 0x00007f1453e942a5 in rnode_is_srcnode (rn=0x2607400) at
./lib/srcdest_table.h:85
#5 0x00007f1453e942d3 in srcdest_rnode_table (rn=0x2607400) at
./lib/srcdest_table.h:90
#6 0x00007f1453e94359 in rib_dest_table (dest=0x2436050) at zebra/rib.h:471
#7 0x00007f1453e94373 in rib_dest_vrf (dest=0x2436050) at zebra/rib.h:479
#8 0x00007f1453e948d1 in netlink_route_info_fill (ri=0x7ffcb511f000,
cmd=25, dest=0x2436050, re=0x0) at zebra/zebra_fpm_netlink.c:295
#9 0x00007f1453e953f5 in zfpm_netlink_encode_route (cmd=25,
dest=0x2436050, re=0x0, in_buf=0x2528f34 "$", in_buf_len=8188) at
zebra/zebra_fpm_netlink.c:575
#10 0x00007f1453e91c0e in zfpm_encode_route (dest=0x2436050, re=0x0,
in_buf=0x2528f34 "$", in_buf_len=8188, msg_type=0x7ffcb511f2a4) at
zebra/zebra_fpm.c:887
#11 0x00007f1453e91e10 in zfpm_build_route_updates () at
zebra/zebra_fpm.c:990
#12 0x00007f1453e92290 in zfpm_build_updates () at zebra/zebra_fpm.c:1151
#13 0x00007f1453e9237f in zfpm_write_cb (thread=0x7ffcb511f5c0) at
zebra/zebra_fpm.c:1189
#14 0x00007f14571a8240 in thread_call (thread=0x7ffcb511f5c0) at
lib/thread.c:1757
#15 0x00007f145715fd71 in frr_run (master=0x2351ac0) at lib/libfrr.c:1054
#16 0x00000000004262bf in main (argc=8, argv=0x7ffcb511f878) at
zebra/main.c:468
(gdb)
Here is the running config :
root at dcl4:~# vtysh-frr
Hello, this is FRRouting (version 7.2).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
dcl4# show running-config
Building configuration...
Current configuration:
!
frr version 7.2
frr defaults traditional
hostname dcl4
log file zebra.log
log timestamp precision 6
agentx
log file bgpd.log
log file staticd.log
log file bfdd.log
!
debug zebra dplane
debug zebra events
debug zebra fpm
debug zebra kernel
debug zebra rib
!
debug vrf
!
enable password zebra
password zebra
!
router-id 1.1.1.1
!
router bgp 6500
bgp router-id 1.1.1.1
neighbor 10.37.0.17 remote-as 6500
!
address-family ipv4 unicast
maximum-paths 1
exit-address-family
!
address-family ipv6 unicast
maximum-paths 1
exit-address-family
!
line vty
!
end
dcl4#
My summary analysis after core analysis and code walk is:
While we do handle zebra:main.c:sigint():
After vrf_terminate() is called, we are still holding routes in FPM
outbound queue.
And these routes need full context of route_node/route_info pointers, which
are
freed in 'vrf_terminate()' call above:
==========================================================
/* SIGINT handler. */
static void sigint(void)
{
struct vrf *vrf;
struct zebra_vrf *zvrf;
struct listnode *ln, *nn;
struct zserv *client;
static bool sigint_done;
if (sigint_done)
return;
sigint_done = true;
zlog_notice("Terminating on signal");
frr_early_fini();
zebra_dplane_pre_finish();
for (ALL_LIST_ELEMENTS(zrouter.client_list, ln, nn, client))
zserv_close_client(client);
zserv_close();
list_delete_all_node(zrouter.client_list);
zebra_ptm_finish();
if (retain_mode)
RB_FOREACH (vrf, vrf_name_head, &vrfs_by_name) {
zvrf = vrf->info;
if (zvrf)
SET_FLAG(zvrf->flags, ZEBRA_VRF_RETAIN);
}
if (zrouter.lsp_process_q)
work_queue_free_and_null(&zrouter.lsp_process_q);
vrf_terminate();
ns_walk_func(zebra_ns_early_shutdown);
zebra_ns_notify_close();
access_list_reset();
prefix_list_reset();
route_map_finish();
......
===================================
/*
* netlink_route_info_fill
*
* Fill out the route information object from the given route.
*
* Returns true on success and false on failure.
*/
static int netlink_route_info_fill(netlink_route_info_t *ri, int cmd,
rib_dest_t *dest, struct route_entry *re)
{
struct nexthop *nexthop;
struct zebra_vrf *zvrf;
memset(ri, 0, sizeof(*ri));
ri->prefix = rib_dest_prefix(dest);<------Here also core seen
sometimes
ri->af = rib_dest_af(dest);<--------------Core seen while looking
up info,
ri->nlmsg_type = cmd;
zvrf = rib_dest_vrf(dest);
if (zvrf)
==================Zebra Logs===================(may not match exact code,
but
this is a typical debug log pattern), I clearly see this assert is after
vrf's are freed.
=============================================
oot at dorado-colo-leaf4:~# thongal/tail-rs.sh dcl4
2019/11/17 17:57:25.411522 ZEBRA: [EC 100663304] ERROR: No such command on
config line 6: ptm-enable
2019/11/17 17:57:25.440060 ZEBRA: client 29 says hello and bids fair to
announce only bfd routes vrf=0
2019/11/17 17:57:25.468711 ZEBRA: client 34 says hello and bids fair to
announce only static routes vrf=0
2019/11/17 17:57:28.142658 ZEBRA: client 40 says hello and bids fair to
announce only bgp routes vrf=0
2019/11/17 17:57:42.226148 ZEBRA: if_zebra_speed_update: eth0.500 old
speed: 0 new speed: 10000
2019/11/17 17:57:44.910151 ZEBRA: if_zebra_speed_update: eth1.4092 old
speed: 0 new speed: 10000
2019/11/17 18:02:05.107662 ZEBRA: connection closed socket [40]
2019/11/17 18:02:05.107729 ZEBRA: [EC 4043309116] Client 'bgp' encountered
an error and is shutting down.
2019/11/17 18:02:05.107825 ZEBRA: Closing client 'bgp'
2019/11/17 18:02:05.107926 ZEBRA: release_daemon_table_chunks: Released 0
table chunks
2019/11/17 18:02:05.107966 ZEBRA: zebra/zebra_ptm.c:1441 failed to find
process pid registration
2019/11/17 18:02:05.108017 ZEBRA: client 40 disconnected. 0 bgp routes
removed from the rib
2019/11/17 18:02:05.504982 ZEBRA: Terminating on signal
2019/11/17 18:02:05.505038 ZEBRA: Zebra dataplane pre-fini called
2019/11/17 18:02:05.505170 ZEBRA: Closing client 'bfd'
2019/11/17 18:02:05.505249 ZEBRA: release_daemon_table_chunks: Released 0
table chunks
2019/11/17 18:02:05.505310 ZEBRA: client 29 disconnected. 0 bfd routes
removed from the rib
2019/11/17 18:02:05.505496 ZEBRA: Closing client 'static'
2019/11/17 18:02:05.505566 ZEBRA: release_daemon_table_chunks: Released 0
table chunks
2019/11/17 18:02:05.505657 ZEBRA: rib_delnode: 0:92.10.1.0/24: rn
0x1cbf3c0, re 0x1cbee40, removing
2019/11/17 18:02:05.505691 ZEBRA: rib_delnode: 0:93.10.1.0/24: rn
0x1cbf620, re 0x1cbf4c0, removing
2019/11/17 18:02:05.505711 ZEBRA: rib_delnode: 0:94.10.1.0/24: rn
0x1cbf880, re 0x1cbf720, removing
2019/11/17 18:02:05.505731 ZEBRA: client 34 disconnected. 3 static routes
removed from the rib
2019/11/17 18:02:05.505870 ZEBRA: vrf_terminate: Shutting down vrf subsystem
2019/11/17 18:02:05.505903 ZEBRA: VRF 0 is to be deleted.
2019/11/17 18:02:05.505930 ZEBRA: VRF 0 is to be disabled.
2019/11/17 18:02:05.505955 ZEBRA: VRF default id 0 is now inactive(deleting
routes, but not freeing)
2019/11/17 18:02:05.505988 ZEBRA: MESSAGE: ZEBRA_VRF_DELETE default
2019/11/17 18:02:05.506052 ZEBRA: VRF default id 0 deleted
2019/11/17 18:02:05.506095 ZEBRA: netlink_route_multipath(): RTM_DELROUTE
92.10.1.0/24 vrf 0(254)
2019/11/17 18:02:05.506126 ZEBRA: rib_unlink: 0:10.37.0.0/20: rn 0x1c9a9c0,
re 0x1ab02e0
2019/11/17 18:02:05.506156 ZEBRA: netlink_talk: netlink-dp (NS 0) type
RTM_DELROUTE(25), len=44 seq=8 flags 0x401
2019/11/17 18:02:05.506185 ZEBRA: rib_unlink: 0:92.10.1.0/24: rn 0x1cbf3c0,
re 0x1cbee40
2019/11/17 18:02:05.506212 ZEBRA: rib_unlink: 0:93.10.1.0/24: rn 0x1cbf620,
re 0x1cbf4c0
2019/11/17 18:02:05.506238 ZEBRA: netlink_route_multipath(): RTM_DELROUTE
93.10.1.0/24 vrf 0(254)
2019/11/17 18:02:05.506272 ZEBRA: rib_unlink: 0:94.10.1.0/24: rn 0x1cbf880,
re 0x1cbf720
2019/11/17 18:02:05.506304 ZEBRA: netlink_talk: netlink-dp (NS 0) type
RTM_DELROUTE(25), len=44 seq=9 flags 0x401
2019/11/17 18:02:05.506338 ZEBRA: rib_unlink: 0:172.179.1.0/24: rn
0x1cd8020, re 0x1cd7f40
2019/11/17 18:02:05.506366 ZEBRA: netlink_route_multipath(): RTM_DELROUTE
94.10.1.0/24 vrf 0(254)
2019/11/17 18:02:05.506407 ZEBRA: netlink_talk: netlink-dp (NS 0) type
RTM_DELROUTE(25), len=44 seq=10 flags 0x401
2019/11/17 18:02:05.506444 ZEBRA: rib_unlink: 0:10.37.0.0/20 (MRIB): rn
0x1cbed60, re 0x1cbec80
2019/11/17 18:02:05.506479 ZEBRA: rib_unlink: 0:172.179.1.0/24 (MRIB): rn
0x1cd8220, re 0x1cd8140
2019/11/17 18:02:05.506576 ZEBRA: Zebra dataplane fini called
2019/11/17 18:02:05.506637 ZEBRA: Zebra dataplane shutdown status check
called
2019/11/17 18:02:05.506667 ZEBRA: Failed to find route for ctx: no table
for afi 1, safi 1, vrf 0
2019/11/17 18:02:05.506706 ZEBRA: Failed to process dplane results: no
route for 0:10.37.0.0/20
2019/11/17 18:02:05.506744 ZEBRA: Failed to find route for ctx: no table
for afi 1, safi 1, vrf 0
2019/11/17 18:02:05.506791 ZEBRA: Failed to process dplane results: no
route for 0:92.10.1.0/24
2019/11/17 18:02:05.506822 ZEBRA: Failed to find route for ctx: no table
for afi 1, safi 1, vrf 0
2019/11/17 18:02:05.506848 ZEBRA: Failed to process dplane results: no
route for 0:93.10.1.0/24
2019/11/17 18:02:05.506877 ZEBRA: Failed to find route for ctx: no table
for afi 1, safi 1, vrf 0
2019/11/17 18:02:05.506905 ZEBRA: Failed to process dplane results: no
route for 0:94.10.1.0/24
2019/11/17 18:02:05.506935 ZEBRA: Failed to find route for ctx: no table
for afi 1, safi 1, vrf 0
2019/11/17 18:02:05.506964 ZEBRA: Failed to process dplane results: no
route for 0:172.179.1.0/24
ZEBRA: Received signal 11 at 1574042525 (si_addr 0x2bca8, PC
0x7f69611dd2a7); aborting...
Program counter: /usr/lib/frr/modules/zebra_fpm.so(+0x62a7)[0x7f69611dd2a7]
Backtrace for 19 stack frames:
/usr/lib/libfrr.so.0(zlog_backtrace_sigsafe+0x6d)[0x7f69644abd10]
/usr/lib/libfrr.so.0(zlog_signal+0x1a1)[0x7f69644abbee]
/usr/lib/libfrr.so.0(+0x8e10c)[0x7f69644dc10c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f6963b52390]
/usr/lib/frr/modules/zebra_fpm.so(+0x62a7)[0x7f69611dd2a7]
/usr/lib/frr/modules/zebra_fpm.so(+0x62d5)[0x7f69611dd2d5]
/usr/lib/frr/modules/zebra_fpm.so(+0x635b)[0x7f69611dd35b]
/usr/lib/frr/modules/zebra_fpm.so(+0x6375)[0x7f69611dd375]
/usr/lib/frr/modules/zebra_fpm.so(+0x68d3)[0x7f69611dd8d3]
/usr/lib/frr/modules/zebra_fpm.so(zfpm_netlink_encode_route+0x6d)[0x7f69611de3f7]
/usr/lib/frr/modules/zebra_fpm.so(+0x3c10)[0x7f69611dac10]
/usr/lib/frr/modules/zebra_fpm.so(+0x3e12)[0x7f69611dae12]
/usr/lib/frr/modules/zebra_fpm.so(+0x4292)[0x7f69611db292]
/usr/lib/frr/modules/zebra_fpm.so(+0x4381)[0x7f69611db381]
/usr/lib/libfrr.so.0(thread_call+0xb3)[0x7f69644f11e2]
/usr/lib/libfrr.so.0(frr_run+0x229)[0x7f69644a8d53]
/usr/lib/frr/zebra(main+0x37e)[0x4262b0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6963797830]
/usr/lib/frr/zebra(_start+0x29)[0x4129a9]
in thread zfpm_write_cb scheduled from zebra/zebra_fpm.c:482
2019/11/17 18:02:06.458539 ZEBRA: client 29 says hello and bids fair to
announce only bfd routes vrf=0
2019/11/17 18:02:06.488917 ZEBRA: client 34 says hello and bids fair to
announce only static routes vrf=0
2019/11/17 18:02:06.575787 ZEBRA: connection to the FPM has gone down:
closed socket in read
=============================================================
Any help appreciated, Do not see any bugs/issue filed on this yet on github.
Thanks
Thippanna
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.frrouting.org/pipermail/frog/attachments/20191117/69ccab05/attachment-0001.html>
More information about the frog
mailing list