Bgpd 5.0.1 gets killed by watchdog timeout
Hi!

I've been using the 5.0.1 deb package provided on GitHub, on Debian 9.5.

It looks like I'm having the same issue as
https://lists.frrouting.org/pipermail/frog/2017-May/000016.html

Also, it seems that systemd is enabled:
https://github.com/FRRouting/frr/blob/18d93bbb5a2d6acc791726ad6de7f11d6818d3...

and libsystemd is linked to bgpd:

$ ldd /usr/lib/frr/bgpd | grep system
libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x00007f6fc540c000)

Am I missing something?

Thanks!

-- François
Can we get the output of `vtysh -c "show ver"`

donald

On Fri, Sep 14, 2018 at 6:30 AM, François <francois.serman@corp.ovh.com> wrote:
> Am I missing something?
_______________________________________________ dev mailing list dev@lists.frrouting.org https://lists.frrouting.org/listinfo/dev
On Fri, Sep 14, 2018 at 07:25:08AM -0400, Donald Sharp wrote:
> Can we get the output of `vtysh -c "show ver"`
Thanks, looks like it's the output I was looking for.

# vtysh -c 'show vers'
FRRouting 5.0.1 (eva22).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux-gnu'
    '--prefix=/usr'
    '--includedir=${prefix}/include'
    '--mandir=${prefix}/share/man'
    '--infodir=${prefix}/share/info'
    '--sysconfdir=/etc'
    '--localstatedir=/var'
    '--disable-silent-rules'
    '--libexecdir=${prefix}/lib/frr'
    '--disable-maintainer-mode'
    '--disable-dependency-tracking'
    '--enable-exampledir=/usr/share/doc/frr/examples/'
    '--localstatedir=/var/run/frr'
    '--sbindir=/usr/lib/frr'
    '--sysconfdir=/etc/frr'
    '--disable-snmp'
    '--enable-ospfapi=yes'
    '--enable-multipath=256'
    '--enable-ldpd'
    '--disable-tcp-zebra'
    '--enable-fpm'
    '--enable-user=frr'
    '--enable-group=frr'
    '--enable-vty-group=frrvty'
    '--enable-configfile-mask=0640'
    '--enable-logfile-mask=0640'
    '--enable-werror'
    '--with-libpam'
    '--enable-systemd=yes'
    '--enable-poll=yes'
    '--enable-cumulus=no'
    '--enable-pimd'
    '--enable-dependency-tracking'
    '--enable-bgp-vnc=yes'
    '--disable-rpki'
    'CFLAGS=-g -O2 -fdebug-prefix-map=/home/ci/cibuild.5/debwork/frr-5.0.1=. -fstack-protector-strong -Wformat -Werror=format-security'
    'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
    'CXXFLAGS=-g -O2 -fdebug-prefix-map=/home/ci/cibuild.5/debwork/frr-5.0.1=. -fstack-protector-strong -Wformat -Werror=format-security'
    'FCFLAGS=-g -O2 -fdebug-prefix-map=/home/ci/cibuild.5/debwork/frr-5.0.1=. -fstack-protector-strong'
    'FFLAGS=-g -O2 -fdebug-prefix-map=/home/ci/cibuild.5/debwork/frr-5.0.1=. -fstack-protector-strong'
    'GCJFLAGS=-g -O2 -fdebug-prefix-map=/home/ci/cibuild.5/debwork/frr-5.0.1=. -fstack-protector-strong'
    'LDFLAGS=-Wl,-z,relro -Wl,-z,now'
    'OBJCFLAGS=-g -O2 -fdebug-prefix-map=/home/ci/cibuild.5/debwork/frr-5.0.1=. -fstack-protector-strong -Wformat -Werror=format-security'
    'OBJCXXFLAGS=-g -O2 -fdebug-prefix-map=/home/ci/cibuild.5/debwork/frr-5.0.1=. -fstack-protector-strong -Wformat -Werror=format-security'
    'build_alias=x86_64-linux-gnu'

but --enable-systemd=yes

-- François
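For readers who want to check a build option without scanning the whole "configured with:" string by eye, here is a minimal sketch. It assumes the string is a run of single-quoted shell words like the output above; the helper name `configure_flags` and the shortened sample string are made up for illustration and are not part of FRR.

```python
import shlex


def configure_flags(configured_with: str) -> dict:
    """Map each --flag in a 'configured with:' string to its value.

    Flags without '=value' map to True. Entries that do not start with
    '--' (for example CFLAGS=...) are skipped.
    """
    flags = {}
    for token in shlex.split(configured_with):
        if not token.startswith("--"):
            continue
        name, sep, value = token.partition("=")
        flags[name] = value if sep else True
    return flags


# Shortened, hypothetical sample in the same shape as the output above.
sample = (
    "'--prefix=/usr' '--enable-systemd=yes' '--enable-pimd' "
    "'--disable-rpki' 'CFLAGS=-g -O2'"
)
flags = configure_flags(sample)
print(flags.get("--enable-systemd"))
```

If a flag appears more than once in the string, this sketch keeps the last value it sees.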
Yes it does look that way. We'll need the output of `journalctl -f` during startup as well as any log files generated by FRR during this time.

donald

On Fri, Sep 14, 2018 at 8:02 AM, François <francois.serman@corp.ovh.com> wrote:
> but --enable-systemd=yes
On Fri, Sep 14, 2018 at 08:40:58AM -0400, Donald Sharp wrote:
> Yes it does look that way. We'll need the output of `journalctl -f` during startup as well as any log files generated by FRR during this time.
There's a lot of noise in those logs due to the network not being fully configured, but everything is available at http://paste.debian.net/hidden/2cf176ff/

There is "nothing" specific in the frr config. I didn't do any configuration for watchfrr (see below), and only bgpd is started in /etc/frr/daemons. There's one mention of watchfrr in /etc/frr/daemons.conf:

# The list of daemons to watch is automatically generated by the init script.
watchfrr_enable=yes
watchfrr_options=(-d -r /usr/sbin/servicebBfrrbBrestartbB%s -s /usr/sbin/servicebBfrrbBstartbB%s -k /usr/sbin/servicebBfrrbBstopbB%s -b bB)

Thanks for your help :)

-- François
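The odd-looking `bB` in `watchfrr_options` is not corruption. As far as I understand the watchfrr options, `-b bB` declares `bB` as a "blank string" that stands in for a space inside the `-r`, `-s` and `-k` command templates, and `%s` is replaced by the daemon name. The sketch below only illustrates that substitution; the function name `expand_command` and the order of the two replacements are assumptions, not watchfrr's actual code.

```python
def expand_command(template: str, daemon: str, blank: str = "bB") -> str:
    """Turn a watchfrr-style command template into a shell command line.

    Every occurrence of the blank string becomes a space, then '%s' is
    replaced by the daemon name.
    """
    spaced = template.replace(blank, " ")
    return spaced.replace("%s", daemon)


restart_cmd = expand_command("/usr/sbin/servicebBfrrbBrestartbB%s", "bgpd")
print(restart_cmd)
```

With the template from the config above and `bgpd` as the daemon, the result is `/usr/sbin/service frr restart bgpd`.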
Hi again,

do you have any hint on what could be wrong, or should I dig into the code?

Thanks :)

----8<---------------------------------------------------------------------------

On Fri, Sep 14, 2018 at 03:04:17PM +0200, François wrote:
> There's a lot of noise in those logs due to network not being totally configured, but everything is available at http://paste.debian.net/hidden/2cf176ff/
François -

I've recreated the issue and am currently scratching my head about what is going on. I'll continue debugging in the meantime.

donald

On Tue, Sep 18, 2018 at 1:05 PM, François <francois.serman@corp.ovh.com> wrote:
> do you have any hint on what could be wrong, or should I dig into the code?
So this appears to be a basic assumption in watchfrr.c that zebra is being started. Since it looks like you are going to run BGP as a Route Reflector, I would recommend adding `zebra=yes` to your `/etc/frr/daemons` file and making sure you create your bgp instance as a view to avoid passing data to zebra.

In the meantime we need to have a bit of a discussion about whether or not it makes sense to continue having watchfrr.c assume that zebra must be running.

donald

On Wed, Sep 19, 2018 at 12:48 PM, Donald Sharp <sharpd@cumulusnetworks.com> wrote:
> I've recreated the issue and am currently scratching my head about what is going on. I'll continue debugging in the meantime.
Hi Donald,

first, thank you for the heads up!

On Wed, Sep 19, 2018 at 02:10:12PM -0400, Donald Sharp wrote:
> So this appears to be a basic assumption in watchfrr.c that zebra is being started.

Great spot! I feel stupid, but I didn't figure it out.

> Since it looks like you are going to run BGP as a Route Reflector, I would recommend adding `zebra=yes` to your `/etc/frr/daemons` file and making sure you create your bgp instance as a view to avoid passing data to zebra.

Well, it's not exactly a RR, but at some point zebra could/should be turned on. I'm not familiar with the view thing, but I get the idea. I'll dig into that.

> In the meantime we need to have a bit of a discussion about whether or not it makes sense to continue having watchfrr.c assume that zebra must be running.

Well, would it be stupid to parse the daemons file and only check the daemons that have a "yes" value? If not, I would like to dig into it.

Thanks again!

-- François
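The idea of reading the daemons file and keeping only the entries set to "yes" can be sketched as follows. This is an illustration only: it assumes the file consists of simple `name=value` lines with `#` comments, and the function name `enabled_daemons` and the sample contents are hypothetical.

```python
def enabled_daemons(daemons_text: str) -> list:
    """Return the daemon names whose value is 'yes' in daemons-file text."""
    enabled = []
    for raw in daemons_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if value.lower() == "yes":
            enabled.append(name)
    return enabled


# Hypothetical file contents matching the setup described in this thread.
sample = """\
# only bgpd is started
zebra=no
bgpd=yes
ospfd=no
"""
watched = enabled_daemons(sample)
print(watched)
```

For this sample the list contains only `bgpd`, which is the situation discussed in this thread.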
The tools/frr script parses the /etc/frr/daemons file and generates the list of daemons to pass to watchfrr at invocation time. watchfrr.c receives this list (in your case, just bgpd), notices that zebra is not on it, and exits. Since stderr is apparently not being handled properly by systemd (not sure what is going on here yet), we missed the log message indicating what had gone wrong.

I just submitted a PR, https://github.com/FRRouting/frr/pull/3063, that allows us to capture these messages so that it will be obvious to the end user what needs to be done.

On Thu, Sep 20, 2018 at 11:03 AM, François <francois.serman@corp.ovh.com> wrote:
> Well, would that be stupid to parse the daemons file, and only check those who have a "yes" value? If so, I would like to dig into it.
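The behaviour described above (the watch list arrives without zebra, the process exits, and the explanation is lost) can be mimicked with a short sketch. It is not the logic of watchfrr.c or of the PR; the function name `missing_daemon_error` and the message wording are invented for illustration. The point is that the check yields a message the caller can send wherever it will actually be seen.

```python
import sys
from typing import Iterable, Optional


def missing_daemon_error(daemons: Iterable[str], required: str = "zebra") -> Optional[str]:
    """Return an explanatory message if `required` is absent, else None."""
    names = list(daemons)
    if required in names:
        return None
    return (
        f"{required} is not in the list of daemons to watch {names}; "
        f"add '{required}=yes' to /etc/frr/daemons"
    )


error = missing_daemon_error(["bgpd"])
if error is not None:
    # A real service would also log this somewhere persistent before exiting.
    print(error, file=sys.stderr)
```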
On Thu, Sep 20, 2018 at 01:04:02PM -0400, Donald Sharp wrote:
> The tools/frr script parses the /etc/frr/daemons file and generates the list of daemons to pass to watchfrr at invocation time. watchfrr.c receives this list (in your case, just bgpd), notices that zebra is not on it, and exits. Since stderr is apparently not being handled properly by systemd (not sure what is going on here yet), we missed the log message indicating what had gone wrong.
>
> I just submitted a PR, https://github.com/FRRouting/frr/pull/3063, that allows us to capture these messages so that it will be obvious to the end user what needs to be done.
OK, I'll try the new version. In the meantime, I've been looking at watchfrr and was expecting to find such a process, which I didn't.

-- François
participants (2)
- Donald Sharp
- François