Thippanna -- Yep looks like FPM doesn't like having it's plug pulled. I've submitted PR https://github.com/FRRouting/frr/pull/5361 which may fix your issue. Can you give it a try? I do not have a FPM setup at this moment to see if it properly stops the crash ( we may just be onto a different crash with this change ). In the meantime I'll try to figure out a FPM setup that I can reasonably test. donald On Sun, Nov 17, 2019 at 11:02 PM hongal <hongal@gmail.com> wrote:
Hi FRRTeam,
I am seeing an assert as below, while I tried to play a bit with FRR 7.2/stable latest code. code, with FPM enabled. Trigger for this is we tried to disable zebra .
This is highly reproducible if we enable FPM, with few 1-5routes/interface configured. Here is the backtrace for the zebra core: but let me know if you need more info:
i1". Core was generated by `/usr/lib/frr/zebra -d -A 127.0.0.1 -M snmp -M fpm'. Program terminated with signal SIGABRT, Aborted. #0 0x00007f1456463428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54 54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. [Current thread is 1 (Thread 0x7f1457641700 (LWP 403))] (gdb) bt #0 0x00007f1456463428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54 #1 0x00007f145646502a in __GI_abort () at abort.c:89 #2 0x00007f1457193188 in core_handler (signo=11, siginfo=0x7ffcb511eab0, context=0x7ffcb511e980) at lib/sigevent.c:228 #3 <signal handler called> #4 0x00007f1453e942a5 in rnode_is_srcnode (rn=0x2607400) at ./lib/srcdest_table.h:85 #5 0x00007f1453e942d3 in srcdest_rnode_table (rn=0x2607400) at ./lib/srcdest_table.h:90 #6 0x00007f1453e94359 in rib_dest_table (dest=0x2436050) at zebra/rib.h:471 #7 0x00007f1453e94373 in rib_dest_vrf (dest=0x2436050) at zebra/rib.h:479 #8 0x00007f1453e948d1 in netlink_route_info_fill (ri=0x7ffcb511f000, cmd=25, dest=0x2436050, re=0x0) at zebra/zebra_fpm_netlink.c:295 #9 0x00007f1453e953f5 in zfpm_netlink_encode_route (cmd=25, dest=0x2436050, re=0x0, in_buf=0x2528f34 "$", in_buf_len=8188) at zebra/zebra_fpm_netlink.c:575 #10 0x00007f1453e91c0e in zfpm_encode_route (dest=0x2436050, re=0x0, in_buf=0x2528f34 "$", in_buf_len=8188, msg_type=0x7ffcb511f2a4) at zebra/zebra_fpm.c:887 #11 0x00007f1453e91e10 in zfpm_build_route_updates () at zebra/zebra_fpm.c:990 #12 0x00007f1453e92290 in zfpm_build_updates () at zebra/zebra_fpm.c:1151 #13 0x00007f1453e9237f in zfpm_write_cb (thread=0x7ffcb511f5c0) at zebra/zebra_fpm.c:1189 #14 0x00007f14571a8240 in thread_call (thread=0x7ffcb511f5c0) at lib/thread.c:1757 #15 0x00007f145715fd71 in frr_run (master=0x2351ac0) at lib/libfrr.c:1054 #16 0x00000000004262bf in main (argc=8, argv=0x7ffcb511f878) at zebra/main.c:468 (gdb)
Here is the running config : root@dcl4:~# vtysh-frr
Hello, this is FRRouting (version 7.2). Copyright 1996-2005 Kunihiro Ishiguro, et al.
dcl4# show running-config Building configuration...
Current configuration: ! frr version 7.2 frr defaults traditional hostname dcl4 log file zebra.log log timestamp precision 6 agentx log file bgpd.log log file staticd.log log file bfdd.log ! debug zebra dplane debug zebra events debug zebra fpm debug zebra kernel debug zebra rib ! debug vrf ! enable password zebra password zebra ! router-id 1.1.1.1 ! router bgp 6500 bgp router-id 1.1.1.1 neighbor 10.37.0.17 remote-as 6500 ! address-family ipv4 unicast maximum-paths 1 exit-address-family ! address-family ipv6 unicast maximum-paths 1 exit-address-family ! line vty ! end dcl4#
My summary analysis after core analysis and code walk is: While we do handle zebra:main.c:sigint(): After vrf_terminate() is called, we are still holding routes in FPM outbound queue. And these routes need full context of route_node/route_info pointers, which are freed in 'vrf_terminate()' call above:
========================================================== /* SIGINT handler. */ static void sigint(void) { struct vrf *vrf; struct zebra_vrf *zvrf; struct listnode *ln, *nn; struct zserv *client; static bool sigint_done;
if (sigint_done) return;
sigint_done = true;
zlog_notice("Terminating on signal");
frr_early_fini();
zebra_dplane_pre_finish();
for (ALL_LIST_ELEMENTS(zrouter.client_list, ln, nn, client)) zserv_close_client(client);
zserv_close(); list_delete_all_node(zrouter.client_list);
zebra_ptm_finish();
if (retain_mode) RB_FOREACH (vrf, vrf_name_head, &vrfs_by_name) { zvrf = vrf->info; if (zvrf) SET_FLAG(zvrf->flags, ZEBRA_VRF_RETAIN); } if (zrouter.lsp_process_q) work_queue_free_and_null(&zrouter.lsp_process_q); vrf_terminate(); ns_walk_func(zebra_ns_early_shutdown); zebra_ns_notify_close();
access_list_reset(); prefix_list_reset(); route_map_finish(); ......
===================================
/* * netlink_route_info_fill * * Fill out the route information object from the given route. * * Returns true on success and false on failure. */ static int netlink_route_info_fill(netlink_route_info_t *ri, int cmd, rib_dest_t *dest, struct route_entry *re) { struct nexthop *nexthop; struct zebra_vrf *zvrf;
memset(ri, 0, sizeof(*ri));
ri->prefix = rib_dest_prefix(dest);<------Here also core seen sometimes ri->af = rib_dest_af(dest);<--------------Core seen while looking up info,
ri->nlmsg_type = cmd; zvrf = rib_dest_vrf(dest); if (zvrf)
==================Zebra Logs===================(may not match exact code, but this is a typical debug log pattern), I clearly see this assert is after vrf's are freed. ============================================= oot@dorado-colo-leaf4:~# thongal/tail-rs.sh dcl4 2019/11/17 17:57:25.411522 ZEBRA: [EC 100663304] ERROR: No such command on config line 6: ptm-enable 2019/11/17 17:57:25.440060 ZEBRA: client 29 says hello and bids fair to announce only bfd routes vrf=0 2019/11/17 17:57:25.468711 ZEBRA: client 34 says hello and bids fair to announce only static routes vrf=0 2019/11/17 17:57:28.142658 ZEBRA: client 40 says hello and bids fair to announce only bgp routes vrf=0 2019/11/17 17:57:42.226148 ZEBRA: if_zebra_speed_update: eth0.500 old speed: 0 new speed: 10000 2019/11/17 17:57:44.910151 ZEBRA: if_zebra_speed_update: eth1.4092 old speed: 0 new speed: 10000
2019/11/17 18:02:05.107662 ZEBRA: connection closed socket [40] 2019/11/17 18:02:05.107729 ZEBRA: [EC 4043309116] Client 'bgp' encountered an error and is shutting down. 2019/11/17 18:02:05.107825 ZEBRA: Closing client 'bgp' 2019/11/17 18:02:05.107926 ZEBRA: release_daemon_table_chunks: Released 0 table chunks 2019/11/17 18:02:05.107966 ZEBRA: zebra/zebra_ptm.c:1441 failed to find process pid registration 2019/11/17 18:02:05.108017 ZEBRA: client 40 disconnected. 0 bgp routes removed from the rib 2019/11/17 18:02:05.504982 ZEBRA: Terminating on signal 2019/11/17 18:02:05.505038 ZEBRA: Zebra dataplane pre-fini called 2019/11/17 18:02:05.505170 ZEBRA: Closing client 'bfd' 2019/11/17 18:02:05.505249 ZEBRA: release_daemon_table_chunks: Released 0 table chunks 2019/11/17 18:02:05.505310 ZEBRA: client 29 disconnected. 0 bfd routes removed from the rib 2019/11/17 18:02:05.505496 ZEBRA: Closing client 'static' 2019/11/17 18:02:05.505566 ZEBRA: release_daemon_table_chunks: Released 0 table chunks 2019/11/17 18:02:05.505657 ZEBRA: rib_delnode: 0:92.10.1.0/24: rn 0x1cbf3c0, re 0x1cbee40, removing 2019/11/17 18:02:05.505691 ZEBRA: rib_delnode: 0:93.10.1.0/24: rn 0x1cbf620, re 0x1cbf4c0, removing 2019/11/17 18:02:05.505711 ZEBRA: rib_delnode: 0:94.10.1.0/24: rn 0x1cbf880, re 0x1cbf720, removing 2019/11/17 18:02:05.505731 ZEBRA: client 34 disconnected. 3 static routes removed from the rib 2019/11/17 18:02:05.505870 ZEBRA: vrf_terminate: Shutting down vrf subsystem 2019/11/17 18:02:05.505903 ZEBRA: VRF 0 is to be deleted. 2019/11/17 18:02:05.505930 ZEBRA: VRF 0 is to be disabled. 2019/11/17 18:02:05.505955 ZEBRA: VRF default id 0 is now inactive(deleting routes, but not freeing) 2019/11/17 18:02:05.505988 ZEBRA: MESSAGE: ZEBRA_VRF_DELETE default 2019/11/17 18:02:05.506052 ZEBRA: VRF default id 0 deleted 2019/11/17 18:02:05.506095 ZEBRA: netlink_route_multipath(): RTM_DELROUTE 92.10.1.0/24 vrf 0(254) 2019/11/17 18:02:05.506126 ZEBRA: rib_unlink: 0:10.37.0.0/20: rn 0x1c9a9c0, re 0x1ab02e0 2019/11/17 18:02:05.506156 ZEBRA: netlink_talk: netlink-dp (NS 0) type RTM_DELROUTE(25), len=44 seq=8 flags 0x401 2019/11/17 18:02:05.506185 ZEBRA: rib_unlink: 0:92.10.1.0/24: rn 0x1cbf3c0, re 0x1cbee40 2019/11/17 18:02:05.506212 ZEBRA: rib_unlink: 0:93.10.1.0/24: rn 0x1cbf620, re 0x1cbf4c0 2019/11/17 18:02:05.506238 ZEBRA: netlink_route_multipath(): RTM_DELROUTE 93.10.1.0/24 vrf 0(254) 2019/11/17 18:02:05.506272 ZEBRA: rib_unlink: 0:94.10.1.0/24: rn 0x1cbf880, re 0x1cbf720 2019/11/17 18:02:05.506304 ZEBRA: netlink_talk: netlink-dp (NS 0) type RTM_DELROUTE(25), len=44 seq=9 flags 0x401 2019/11/17 18:02:05.506338 ZEBRA: rib_unlink: 0:172.179.1.0/24: rn 0x1cd8020, re 0x1cd7f40 2019/11/17 18:02:05.506366 ZEBRA: netlink_route_multipath(): RTM_DELROUTE 94.10.1.0/24 vrf 0(254) 2019/11/17 18:02:05.506407 ZEBRA: netlink_talk: netlink-dp (NS 0) type RTM_DELROUTE(25), len=44 seq=10 flags 0x401 2019/11/17 18:02:05.506444 ZEBRA: rib_unlink: 0:10.37.0.0/20 (MRIB): rn 0x1cbed60, re 0x1cbec80 2019/11/17 18:02:05.506479 ZEBRA: rib_unlink: 0:172.179.1.0/24 (MRIB): rn 0x1cd8220, re 0x1cd8140 2019/11/17 18:02:05.506576 ZEBRA: Zebra dataplane fini called 2019/11/17 18:02:05.506637 ZEBRA: Zebra dataplane shutdown status check called 2019/11/17 18:02:05.506667 ZEBRA: Failed to find route for ctx: no table for afi 1, safi 1, vrf 0 2019/11/17 18:02:05.506706 ZEBRA: Failed to process dplane results: no route for 0:10.37.0.0/20 2019/11/17 18:02:05.506744 ZEBRA: Failed to find route for ctx: no table for afi 1, safi 1, vrf 0 2019/11/17 18:02:05.506791 ZEBRA: Failed to process dplane results: no route for 0:92.10.1.0/24 2019/11/17 18:02:05.506822 ZEBRA: Failed to find route for ctx: no table for afi 1, safi 1, vrf 0 2019/11/17 18:02:05.506848 ZEBRA: Failed to process dplane results: no route for 0:93.10.1.0/24 2019/11/17 18:02:05.506877 ZEBRA: Failed to find route for ctx: no table for afi 1, safi 1, vrf 0 2019/11/17 18:02:05.506905 ZEBRA: Failed to process dplane results: no route for 0:94.10.1.0/24 2019/11/17 18:02:05.506935 ZEBRA: Failed to find route for ctx: no table for afi 1, safi 1, vrf 0 2019/11/17 18:02:05.506964 ZEBRA: Failed to process dplane results: no route for 0:172.179.1.0/24 ZEBRA: Received signal 11 at 1574042525 (si_addr 0x2bca8, PC 0x7f69611dd2a7); aborting... Program counter: /usr/lib/frr/modules/zebra_fpm.so(+0x62a7)[0x7f69611dd2a7] Backtrace for 19 stack frames: /usr/lib/libfrr.so.0(zlog_backtrace_sigsafe+0x6d)[0x7f69644abd10] /usr/lib/libfrr.so.0(zlog_signal+0x1a1)[0x7f69644abbee] /usr/lib/libfrr.so.0(+0x8e10c)[0x7f69644dc10c] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f6963b52390] /usr/lib/frr/modules/zebra_fpm.so(+0x62a7)[0x7f69611dd2a7] /usr/lib/frr/modules/zebra_fpm.so(+0x62d5)[0x7f69611dd2d5] /usr/lib/frr/modules/zebra_fpm.so(+0x635b)[0x7f69611dd35b] /usr/lib/frr/modules/zebra_fpm.so(+0x6375)[0x7f69611dd375] /usr/lib/frr/modules/zebra_fpm.so(+0x68d3)[0x7f69611dd8d3] /usr/lib/frr/modules/zebra_fpm.so(zfpm_netlink_encode_route+0x6d)[0x7f69611de3f7] /usr/lib/frr/modules/zebra_fpm.so(+0x3c10)[0x7f69611dac10] /usr/lib/frr/modules/zebra_fpm.so(+0x3e12)[0x7f69611dae12] /usr/lib/frr/modules/zebra_fpm.so(+0x4292)[0x7f69611db292] /usr/lib/frr/modules/zebra_fpm.so(+0x4381)[0x7f69611db381] /usr/lib/libfrr.so.0(thread_call+0xb3)[0x7f69644f11e2] /usr/lib/libfrr.so.0(frr_run+0x229)[0x7f69644a8d53] /usr/lib/frr/zebra(main+0x37e)[0x4262b0] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6963797830] /usr/lib/frr/zebra(_start+0x29)[0x4129a9] in thread zfpm_write_cb scheduled from zebra/zebra_fpm.c:482 2019/11/17 18:02:06.458539 ZEBRA: client 29 says hello and bids fair to announce only bfd routes vrf=0 2019/11/17 18:02:06.488917 ZEBRA: client 34 says hello and bids fair to announce only static routes vrf=0 2019/11/17 18:02:06.575787 ZEBRA: connection to the FPM has gone down: closed socket in read ============================================================= Any help appreciated, Do not see any bugs/issue filed on this yet on github.
Thanks Thippanna _______________________________________________ frog mailing list frog@lists.frrouting.org https://lists.frrouting.org/listinfo/frog