Hi,
We have been plagued lately with an issue where bgpd terminates quite frequently out of the blue on one of our hosts. We weren't able to find anything in the logs, but eventually strace gave us some information:
[pid 13589] 10:04:47.905970 write(2</dev/null<char 1:3>>, "bgpd: lib/zlog_targets.c:127: zlog_fd: Assertion `iovpos == 0' failed.\n", 71) = -1 EBADF (Bad file descriptor)
Searching Google only turned up this:
https://git.edevau.net/Ede_Vau/frr/commit/db2baed166581081db692fab0214752dbb...
Looking at FRR's GitHub, it looks like this has been fixed? If that is the case, are there plans to release this fix, and if so, when could we expect to see a fixed version out?
For now we've increased the logging level to debug and are currently monitoring to see if that's a viable mitigation. Are there any known mitigations we can try (if this doesn't work out)?
We are running FRR 7.5.1 on CentOS 7.
Kind regards
--
Natterbox Limited
Registered address: No 1 Croydon, 12-16 Addiscombe Road, Croydon CR0 0XT, UK
Company number: 06968249 VAT number: 293 7488 48
This email and any files transmitted with it are confidential, intended solely for the use of the addressee and are not to be used, disseminated, forwarded, printed or copied by any other person. Any views or opinions are solely those of the author and do not necessarily represent those of Natterbox Limited unless specifically stated. If you have received this communication in error, please accept our apologies and promptly inform the sender by email or by telephoning the above number. Please also immediately delete this message and any attachments from your systems. Thank you. The files attached and/or any website linked to this email may contain viruses which could damage your computer, Natterbox Limited cannot accept liability for such damage.
That commit is in the 8.0 release.
donald
On Fri, Aug 13, 2021 at 6:49 AM Cloud Operations <operations@redmatter.com> wrote:
Hi,
We are plagued lately with an issue where bgpd terminates quite frequently out of the blue on one of our hosts. We weren't able to find anything in logs, but eventually strace gave us some information:
[pid 13589] 10:04:47.905970 write(2</dev/null<char 1:3>>, "bgpd: lib/zlog_targets.c:127: zlog_fd: Assertion `iovpos == 0' failed.\n", 71) = -1 EBADF (Bad file descriptor)
Searching Google only turned up this:
https://git.edevau.net/Ede_Vau/frr/commit/db2baed166581081db692fab0214752dbb...
Looking at FRR's GitHub, it looks like this has been fixed? If that is the case, are there plans to release this fix, and if so, when could we expect to see a fixed version out?
For now we've increased the logging level to debug and are currently monitoring to see if that's a viable mitigation. Are there any known mitigations we can try (if this doesn't work out)?
We are running FRR 7.5.1 on CentOS 7.
Kind regards
_______________________________________________ frog mailing list frog@lists.frrouting.org https://lists.frrouting.org/listinfo/frog
Well... the commit message says "haven't seen this in the wild", but I guess that wasn't an accurate assessment. I'll create a backport PR and we should ship a 7.5.2 (since this is core logging code and has now been shown to actually occur...)
(also: my apologies for introducing that bug ;)
-David
P.S.: as a temporary workaround, disabling "debug" statements should reduce the frequency of the crashes (as noted in the commit, this only happens with debug logs involved.)
On Fri, Aug 13, 2021 at 12:05:15PM -0400, Donald Sharp wrote:
On Fri, Aug 13, 2021 at 6:49 AM Cloud Operations <operations@redmatter.com> wrote:
[pid 13589] 10:04:47.905970 write(2</dev/null<char 1:3>>, "bgpd: lib/zlog_targets.c:127: zlog_fd: Assertion `iovpos == 0' failed.\n", 71) = -1 EBADF (Bad file descriptor)
Searching Google only turned up this:
https://git.edevau.net/Ede_Vau/frr/commit/db2baed166581081db692fab0214752dbb...
On Tue, Aug 17, 2021 at 05:30:59PM +0200, David Lamparter wrote:
Well... the commit message says "haven't seen this in the wild", but I guess that wasn't an accurate assessment. I'll create a backport PR and we should ship a 7.5.2 (since this is core logging code and has now been shown to actually occur...)
Actually... it's already backported (https://github.com/FRRouting/frr/pull/8579). We just need to spin a 7.5.2.
-David