[FROG] FRR 7.2/Stable(latest) zebra core seen while handling signal in sigint()

Donald Sharp sharpd at cumulusnetworks.com
Mon Nov 18 14:41:53 EST 2019


Thippanna --

Yep looks like FPM doesn't like having it's plug pulled.  I've
submitted PR https://github.com/FRRouting/frr/pull/5361 which may fix
your issue.  Can you give it a try?  I do not have a FPM setup at this
moment to see if it properly stops the crash ( we may just be onto a
different crash with this change ).  In the meantime I'll try to
figure out a FPM setup that I can reasonably test.

donald

On Sun, Nov 17, 2019 at 11:02 PM hongal <hongal at gmail.com> wrote:
>
> Hi FRRTeam,
>
> I am seeing an assert as below, while I tried to play a bit with FRR 7.2/stable  latest code.
> code, with FPM enabled. Trigger for this is we tried to disable zebra .
>
> This is highly reproducible if we enable FPM, with few 1-5routes/interface configured.
> Here is the backtrace for the zebra core:  but let me know if you need more info:
>
> i1".
> Core was generated by `/usr/lib/frr/zebra -d -A 127.0.0.1 -M snmp -M fpm'.
> Program terminated with signal SIGABRT, Aborted.
> #0  0x00007f1456463428 in __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
> 54    ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> [Current thread is 1 (Thread 0x7f1457641700 (LWP 403))]
> (gdb) bt
> #0  0x00007f1456463428 in __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
> #1  0x00007f145646502a in __GI_abort () at abort.c:89
> #2  0x00007f1457193188 in core_handler (signo=11, siginfo=0x7ffcb511eab0, context=0x7ffcb511e980) at lib/sigevent.c:228
> #3  <signal handler called>
> #4  0x00007f1453e942a5 in rnode_is_srcnode (rn=0x2607400) at ./lib/srcdest_table.h:85
> #5  0x00007f1453e942d3 in srcdest_rnode_table (rn=0x2607400) at ./lib/srcdest_table.h:90
> #6  0x00007f1453e94359 in rib_dest_table (dest=0x2436050) at zebra/rib.h:471
> #7  0x00007f1453e94373 in rib_dest_vrf (dest=0x2436050) at zebra/rib.h:479
> #8  0x00007f1453e948d1 in netlink_route_info_fill (ri=0x7ffcb511f000, cmd=25, dest=0x2436050, re=0x0) at zebra/zebra_fpm_netlink.c:295
> #9  0x00007f1453e953f5 in zfpm_netlink_encode_route (cmd=25, dest=0x2436050, re=0x0, in_buf=0x2528f34 "$", in_buf_len=8188) at zebra/zebra_fpm_netlink.c:575
> #10 0x00007f1453e91c0e in zfpm_encode_route (dest=0x2436050, re=0x0, in_buf=0x2528f34 "$", in_buf_len=8188, msg_type=0x7ffcb511f2a4) at zebra/zebra_fpm.c:887
> #11 0x00007f1453e91e10 in zfpm_build_route_updates () at zebra/zebra_fpm.c:990
> #12 0x00007f1453e92290 in zfpm_build_updates () at zebra/zebra_fpm.c:1151
> #13 0x00007f1453e9237f in zfpm_write_cb (thread=0x7ffcb511f5c0) at zebra/zebra_fpm.c:1189
> #14 0x00007f14571a8240 in thread_call (thread=0x7ffcb511f5c0) at lib/thread.c:1757
> #15 0x00007f145715fd71 in frr_run (master=0x2351ac0) at lib/libfrr.c:1054
> #16 0x00000000004262bf in main (argc=8, argv=0x7ffcb511f878) at zebra/main.c:468
> (gdb)
>
>
>
> Here is the running config :
> root at dcl4:~# vtysh-frr
>
> Hello, this is FRRouting (version 7.2).
> Copyright 1996-2005 Kunihiro Ishiguro, et al.
>
> dcl4# show running-config
> Building configuration...
>
> Current configuration:
> !
> frr version 7.2
> frr defaults traditional
> hostname dcl4
> log file zebra.log
> log timestamp precision 6
> agentx
> log file bgpd.log
> log file staticd.log
> log file bfdd.log
> !
> debug zebra dplane
> debug zebra events
> debug zebra fpm
> debug zebra kernel
> debug zebra rib
> !
> debug vrf
> !
> enable password zebra
> password zebra
> !
> router-id 1.1.1.1
> !
> router bgp 6500
>  bgp router-id 1.1.1.1
>  neighbor 10.37.0.17 remote-as 6500
>  !
>  address-family ipv4 unicast
>   maximum-paths 1
>  exit-address-family
>  !
>  address-family ipv6 unicast
>   maximum-paths 1
>  exit-address-family
> !
> line vty
> !
> end
> dcl4#
>
>
> My summary analysis after core analysis and code walk is:
> While we do handle zebra:main.c:sigint():
> After vrf_terminate() is called, we  are still holding routes in FPM outbound queue.
> And these routes need full context of route_node/route_info pointers, which are
> freed in 'vrf_terminate()' call above:
>
>
> ==========================================================
> /* SIGINT handler. */
> static void sigint(void)
> {
>         struct vrf *vrf;
>         struct zebra_vrf *zvrf;
>         struct listnode *ln, *nn;
>         struct zserv *client;
>         static bool sigint_done;
>
>         if (sigint_done)
>                 return;
>
>         sigint_done = true;
>
>         zlog_notice("Terminating on signal");
>
>         frr_early_fini();
>
>         zebra_dplane_pre_finish();
>
>         for (ALL_LIST_ELEMENTS(zrouter.client_list, ln, nn, client))
>                 zserv_close_client(client);
>
>         zserv_close();
>         list_delete_all_node(zrouter.client_list);
>
>         zebra_ptm_finish();
>
>         if (retain_mode)
>                 RB_FOREACH (vrf, vrf_name_head, &vrfs_by_name) {
>                         zvrf = vrf->info;
>                         if (zvrf)
>                                 SET_FLAG(zvrf->flags, ZEBRA_VRF_RETAIN);
>                 }
>         if (zrouter.lsp_process_q)
>                 work_queue_free_and_null(&zrouter.lsp_process_q);
>         vrf_terminate();
>         ns_walk_func(zebra_ns_early_shutdown);
>         zebra_ns_notify_close();
>
>         access_list_reset();
>         prefix_list_reset();
>         route_map_finish();
> ......
>
>
> ===================================
>
> /*
>  * netlink_route_info_fill
>  *
>  * Fill out the route information object from the given route.
>  *
>  * Returns true on success and false on failure.
>  */
> static int netlink_route_info_fill(netlink_route_info_t *ri, int cmd,
>                                    rib_dest_t *dest, struct route_entry *re)
> {
>         struct nexthop *nexthop;
>         struct zebra_vrf *zvrf;
>
>         memset(ri, 0, sizeof(*ri));
>
>         ri->prefix = rib_dest_prefix(dest);<------Here also core seen sometimes
>         ri->af = rib_dest_af(dest);<--------------Core seen while looking up info,
>
>         ri->nlmsg_type = cmd;
>         zvrf = rib_dest_vrf(dest);
>         if (zvrf)
>
> ==================Zebra Logs===================(may not match exact code, but
> this is a typical debug log pattern), I clearly see this assert is after vrf's are freed.
> =============================================
> oot at dorado-colo-leaf4:~# thongal/tail-rs.sh dcl4
> 2019/11/17 17:57:25.411522 ZEBRA: [EC 100663304] ERROR: No such command on config line 6: ptm-enable
> 2019/11/17 17:57:25.440060 ZEBRA: client 29 says hello and bids fair to announce only bfd routes vrf=0
> 2019/11/17 17:57:25.468711 ZEBRA: client 34 says hello and bids fair to announce only static routes vrf=0
> 2019/11/17 17:57:28.142658 ZEBRA: client 40 says hello and bids fair to announce only bgp routes vrf=0
> 2019/11/17 17:57:42.226148 ZEBRA: if_zebra_speed_update: eth0.500 old speed: 0 new speed: 10000
> 2019/11/17 17:57:44.910151 ZEBRA: if_zebra_speed_update: eth1.4092 old speed: 0 new speed: 10000
>
>
> 2019/11/17 18:02:05.107662 ZEBRA: connection closed socket [40]
> 2019/11/17 18:02:05.107729 ZEBRA: [EC 4043309116] Client 'bgp' encountered an error and is shutting down.
> 2019/11/17 18:02:05.107825 ZEBRA: Closing client 'bgp'
> 2019/11/17 18:02:05.107926 ZEBRA: release_daemon_table_chunks: Released 0 table chunks
> 2019/11/17 18:02:05.107966 ZEBRA: zebra/zebra_ptm.c:1441 failed to find process pid registration
> 2019/11/17 18:02:05.108017 ZEBRA: client 40 disconnected. 0 bgp routes removed from the rib
> 2019/11/17 18:02:05.504982 ZEBRA: Terminating on signal
> 2019/11/17 18:02:05.505038 ZEBRA: Zebra dataplane pre-fini called
> 2019/11/17 18:02:05.505170 ZEBRA: Closing client 'bfd'
> 2019/11/17 18:02:05.505249 ZEBRA: release_daemon_table_chunks: Released 0 table chunks
> 2019/11/17 18:02:05.505310 ZEBRA: client 29 disconnected. 0 bfd routes removed from the rib
> 2019/11/17 18:02:05.505496 ZEBRA: Closing client 'static'
> 2019/11/17 18:02:05.505566 ZEBRA: release_daemon_table_chunks: Released 0 table chunks
> 2019/11/17 18:02:05.505657 ZEBRA: rib_delnode: 0:92.10.1.0/24: rn 0x1cbf3c0, re 0x1cbee40, removing
> 2019/11/17 18:02:05.505691 ZEBRA: rib_delnode: 0:93.10.1.0/24: rn 0x1cbf620, re 0x1cbf4c0, removing
> 2019/11/17 18:02:05.505711 ZEBRA: rib_delnode: 0:94.10.1.0/24: rn 0x1cbf880, re 0x1cbf720, removing
> 2019/11/17 18:02:05.505731 ZEBRA: client 34 disconnected. 3 static routes removed from the rib
> 2019/11/17 18:02:05.505870 ZEBRA: vrf_terminate: Shutting down vrf subsystem
> 2019/11/17 18:02:05.505903 ZEBRA: VRF 0 is to be deleted.
> 2019/11/17 18:02:05.505930 ZEBRA: VRF 0 is to be disabled.
> 2019/11/17 18:02:05.505955 ZEBRA: VRF default id 0 is now inactive(deleting routes, but not freeing)
> 2019/11/17 18:02:05.505988 ZEBRA: MESSAGE: ZEBRA_VRF_DELETE default
> 2019/11/17 18:02:05.506052 ZEBRA: VRF default id 0 deleted
> 2019/11/17 18:02:05.506095 ZEBRA: netlink_route_multipath(): RTM_DELROUTE 92.10.1.0/24 vrf 0(254)
> 2019/11/17 18:02:05.506126 ZEBRA: rib_unlink: 0:10.37.0.0/20: rn 0x1c9a9c0, re 0x1ab02e0
> 2019/11/17 18:02:05.506156 ZEBRA: netlink_talk: netlink-dp (NS 0) type RTM_DELROUTE(25), len=44 seq=8 flags 0x401
> 2019/11/17 18:02:05.506185 ZEBRA: rib_unlink: 0:92.10.1.0/24: rn 0x1cbf3c0, re 0x1cbee40
> 2019/11/17 18:02:05.506212 ZEBRA: rib_unlink: 0:93.10.1.0/24: rn 0x1cbf620, re 0x1cbf4c0
> 2019/11/17 18:02:05.506238 ZEBRA: netlink_route_multipath(): RTM_DELROUTE 93.10.1.0/24 vrf 0(254)
> 2019/11/17 18:02:05.506272 ZEBRA: rib_unlink: 0:94.10.1.0/24: rn 0x1cbf880, re 0x1cbf720
> 2019/11/17 18:02:05.506304 ZEBRA: netlink_talk: netlink-dp (NS 0) type RTM_DELROUTE(25), len=44 seq=9 flags 0x401
> 2019/11/17 18:02:05.506338 ZEBRA: rib_unlink: 0:172.179.1.0/24: rn 0x1cd8020, re 0x1cd7f40
> 2019/11/17 18:02:05.506366 ZEBRA: netlink_route_multipath(): RTM_DELROUTE 94.10.1.0/24 vrf 0(254)
> 2019/11/17 18:02:05.506407 ZEBRA: netlink_talk: netlink-dp (NS 0) type RTM_DELROUTE(25), len=44 seq=10 flags 0x401
> 2019/11/17 18:02:05.506444 ZEBRA: rib_unlink: 0:10.37.0.0/20 (MRIB): rn 0x1cbed60, re 0x1cbec80
> 2019/11/17 18:02:05.506479 ZEBRA: rib_unlink: 0:172.179.1.0/24 (MRIB): rn 0x1cd8220, re 0x1cd8140
> 2019/11/17 18:02:05.506576 ZEBRA: Zebra dataplane fini called
> 2019/11/17 18:02:05.506637 ZEBRA: Zebra dataplane shutdown status check called
> 2019/11/17 18:02:05.506667 ZEBRA: Failed to find route for ctx: no table for afi 1, safi 1, vrf 0
> 2019/11/17 18:02:05.506706 ZEBRA: Failed to process dplane results: no route for 0:10.37.0.0/20
> 2019/11/17 18:02:05.506744 ZEBRA: Failed to find route for ctx: no table for afi 1, safi 1, vrf 0
> 2019/11/17 18:02:05.506791 ZEBRA: Failed to process dplane results: no route for 0:92.10.1.0/24
> 2019/11/17 18:02:05.506822 ZEBRA: Failed to find route for ctx: no table for afi 1, safi 1, vrf 0
> 2019/11/17 18:02:05.506848 ZEBRA: Failed to process dplane results: no route for 0:93.10.1.0/24
> 2019/11/17 18:02:05.506877 ZEBRA: Failed to find route for ctx: no table for afi 1, safi 1, vrf 0
> 2019/11/17 18:02:05.506905 ZEBRA: Failed to process dplane results: no route for 0:94.10.1.0/24
> 2019/11/17 18:02:05.506935 ZEBRA: Failed to find route for ctx: no table for afi 1, safi 1, vrf 0
> 2019/11/17 18:02:05.506964 ZEBRA: Failed to process dplane results: no route for 0:172.179.1.0/24
> ZEBRA: Received signal 11 at 1574042525 (si_addr 0x2bca8, PC 0x7f69611dd2a7); aborting...
> Program counter: /usr/lib/frr/modules/zebra_fpm.so(+0x62a7)[0x7f69611dd2a7]
> Backtrace for 19 stack frames:
> /usr/lib/libfrr.so.0(zlog_backtrace_sigsafe+0x6d)[0x7f69644abd10]
> /usr/lib/libfrr.so.0(zlog_signal+0x1a1)[0x7f69644abbee]
> /usr/lib/libfrr.so.0(+0x8e10c)[0x7f69644dc10c]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f6963b52390]
> /usr/lib/frr/modules/zebra_fpm.so(+0x62a7)[0x7f69611dd2a7]
> /usr/lib/frr/modules/zebra_fpm.so(+0x62d5)[0x7f69611dd2d5]
> /usr/lib/frr/modules/zebra_fpm.so(+0x635b)[0x7f69611dd35b]
> /usr/lib/frr/modules/zebra_fpm.so(+0x6375)[0x7f69611dd375]
> /usr/lib/frr/modules/zebra_fpm.so(+0x68d3)[0x7f69611dd8d3]
> /usr/lib/frr/modules/zebra_fpm.so(zfpm_netlink_encode_route+0x6d)[0x7f69611de3f7]
> /usr/lib/frr/modules/zebra_fpm.so(+0x3c10)[0x7f69611dac10]
> /usr/lib/frr/modules/zebra_fpm.so(+0x3e12)[0x7f69611dae12]
> /usr/lib/frr/modules/zebra_fpm.so(+0x4292)[0x7f69611db292]
> /usr/lib/frr/modules/zebra_fpm.so(+0x4381)[0x7f69611db381]
> /usr/lib/libfrr.so.0(thread_call+0xb3)[0x7f69644f11e2]
> /usr/lib/libfrr.so.0(frr_run+0x229)[0x7f69644a8d53]
> /usr/lib/frr/zebra(main+0x37e)[0x4262b0]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6963797830]
> /usr/lib/frr/zebra(_start+0x29)[0x4129a9]
> in thread zfpm_write_cb scheduled from zebra/zebra_fpm.c:482
> 2019/11/17 18:02:06.458539 ZEBRA: client 29 says hello and bids fair to announce only bfd routes vrf=0
> 2019/11/17 18:02:06.488917 ZEBRA: client 34 says hello and bids fair to announce only static routes vrf=0
> 2019/11/17 18:02:06.575787 ZEBRA: connection to the FPM has gone down: closed socket in read
> =============================================================
> Any help appreciated, Do not see any bugs/issue filed on this yet on github.
>
> Thanks
> Thippanna
> _______________________________________________
> frog mailing list
> frog at lists.frrouting.org
> https://lists.frrouting.org/listinfo/frog



More information about the frog mailing list