Job (and Mikael),
Some more detail.
> Dear Lou,
> I have some follow-up questions:
> - When was this bug introduced?
The VNC code was submitted as a patch to Quagga in 2014, although it was authored somewhat earlier, at roughly the same time RFC 5566 was being worked on. It was included in the original FRR release.
The issue was related to a development attribute that was intended to be disabled in production use. Per RFC 2042, the VNC code used 255, the attribute type value reserved for development, for features that were never standardized. The intent was to disable this usage outside of development. Because 255 was treated as a known attribute, the parsing code (bgp_attr.c) tried to parse the attribute generated by the experiment and failed, since that attribute was in a format it did not recognize. This failure in turn triggered the standard attribute-parsing error behavior, which is governed by https://tools.ietf.org/html/rfc4271#section-6.3 and RFC 4271 page 74, event 28.
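To make the failure mode concrete, here is a minimal Python sketch (not FRR's bgp_attr.c, which is C) of how a receiver splits a path attribute into flags, type code and value per RFC 4271 section 4.3, and why treating type 255 as a *known* attribute matters. The `dev_parser_enabled` switch and the `HYPOTHETICAL_DEV_MAGIC` format check are illustrative assumptions standing in for the development code, not FRR's actual logic.

```python
import struct

# Path attribute flag bits (RFC 4271, section 4.3).
FLAG_OPTIONAL = 0x80
FLAG_TRANSITIVE = 0x40
FLAG_PARTIAL = 0x20
FLAG_EXTENDED_LENGTH = 0x10

# Attribute type code 255 is reserved for development (RFC 2042).
ATTR_DEVELOPMENT = 255

# HYPOTHETICAL: stand-in for whatever value layout the development code expected.
HYPOTHETICAL_DEV_MAGIC = b"\x00\x01"


def parse_attribute(buf: bytes, offset: int = 0):
    """Split one path attribute into (flags, type_code, value, next_offset)."""
    if len(buf) - offset < 3:
        raise ValueError("truncated attribute header")
    flags, type_code = buf[offset], buf[offset + 1]
    if flags & FLAG_EXTENDED_LENGTH:
        # Extended Length bit set: the length field is two octets.
        if len(buf) - offset < 4:
            raise ValueError("truncated extended length field")
        (length,) = struct.unpack_from("!H", buf, offset + 2)
        start = offset + 4
    else:
        length = buf[offset + 2]
        start = offset + 3
    end = start + length
    if end > len(buf):
        raise ValueError("attribute length runs past end of buffer")
    return flags, type_code, buf[start:end], end


def classify_attribute(flags: int, type_code: int, value: bytes,
                       dev_parser_enabled: bool) -> str:
    """Decide what the receiver does with an attribute it has no standard parser for."""
    if type_code == ATTR_DEVELOPMENT and dev_parser_enabled:
        # With the development code active, 255 counts as "known", so a value
        # in any other format is a parse failure rather than an unknown attribute.
        return "parsed" if value.startswith(HYPOTHETICAL_DEV_MAGIC) else "update-error"
    if flags & FLAG_OPTIONAL:
        # Unrecognized optional transitive: accept and pass on (Partial bit set).
        # Unrecognized optional non-transitive: quietly ignore.
        return "pass-on" if flags & FLAG_TRANSITIVE else "ignore"
    # An unrecognized well-known attribute is itself an UPDATE error.
    return "update-error"
```

With the development parser enabled, an experimental optional transitive attribute of type 255 yields `"update-error"`; with it disabled, the same bytes are simply passed on as an unrecognized optional transitive attribute.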
> - Why is the session flapping at all? Doesn’t RFC 7606 suggest handling such instances more gracefully, i.e. “treat-as-withdraw”, rather than destroying the world and killing the session? Or perhaps RFC 5512 section 6 is relevant too.
As Donald mentioned, FRR does not yet support RFC 7606, so it
behaves per RFC 4271 page 74, event 28 (UpdateMsgErr), which resets the session.
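The practical difference between the two behaviors can be sketched as a simplified Python model. `Peer` and `handle_malformed_attribute` are hypothetical names, and RFC 7606 actually assigns per-attribute handling (session reset, AFI/SAFI disable, treat-as-withdraw, attribute discard); this sketch applies only treat-as-withdraw, as an assumption, to contrast it with the RFC 4271 session reset.

```python
from dataclasses import dataclass, field
from typing import Optional

UPDATE_MESSAGE_ERROR = 3      # NOTIFICATION error code (RFC 4271, section 4.5)
OPTIONAL_ATTRIBUTE_ERROR = 9  # UPDATE Message Error subcode (RFC 4271, section 6.3)


@dataclass
class Peer:
    """HYPOTHETICAL minimal view of one BGP neighbor."""
    supports_rfc7606: bool
    established: bool = True
    adj_rib_in: set = field(default_factory=set)
    notification: Optional[tuple] = None


def handle_malformed_attribute(peer: Peer, nlri: list) -> None:
    """React to an UPDATE whose optional attribute failed to parse."""
    if peer.supports_rfc7606:
        # Treat-as-withdraw: routes carried in this UPDATE are treated as
        # withdrawn; the session and all other routes stay up.
        peer.adj_rib_in -= set(nlri)
    else:
        # RFC 4271 behavior: send NOTIFICATION (Update Message Error /
        # Optional Attribute Error), drop the session, flush the peer's routes.
        peer.notification = (UPDATE_MESSAGE_ERROR, OPTIONAL_ATTRIBUTE_ERROR)
        peer.established = False
        peer.adj_rib_in.clear()
```

Under the RFC 4271 model a single bad UPDATE removes every route learned from the neighbor, and the session flaps again each time the attribute is re-announced; under treat-as-withdraw only the prefixes in the offending UPDATE are affected.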
> - What timeline do you propose? Right now these Quagga deployments are obstructing legitimate research (the experiment isn’t about finding broken BGP implementations).
Thanks to the hard work of, notably, Donald and Martin, the fix has been merged and releases are being rolled out.
The short-term fix disables use of the development attribute type [1]; the long-term fix is to implement RFC 7606 [2].
Lou
[2] https://github.com/FRRouting/frr/issues/3583
Kind regards,
Job
On Tue, Jan 8, 2019 at 11:31 Lou Berger <lberger@labn.net> wrote:
To add some more detail here: the root cause of this issue was the
use of a BGP attribute reserved for development in the VNC code[1]. The
original intent was to disable use of this attribute by VNC[1] and FRR
in production, but this didn't happen. My apologies for this. A proper
fix has been submitted for all active releases and is undergoing
testing. For those who are interested, release specific PRs can be
found at [3].
Lou
[3] https://github.com/FRRouting/frr/pulls
On 1/7/2019 1:31 PM, Quentin Young wrote:
> Hello operators,
>
> This morning some users running FRR BGP noticed that their sessions were
> flapping. Investigation revealed that this was caused by an experiment being
> run by SwiNOG [0] which was triggering an undesired code path in FRR.
> Specifically, FRR uses attribute type 0xFF as the attribute code for VNC [1].
> This code was intended to be turned off by default, but our current published
> builds [2] have it turned on. Consequently, bgpd attempts to parse the received
> attribute as a VNC attribute and fails, triggering a session reset.
>
> We have a patch in testing now and expect to have new build artifacts published
> shortly. Additionally, we have contacted the experiment operators and requested
> a pause in the experiment while we handle this issue.
>
> Thank you to the operators that notified us this morning!
>
> - FRR maintainer team
>
> [0] http://lists.swinog.ch/public/swinog/2018-December/007110.html
> [1] http://docs.frrouting.org/en/latest/vnc.html
> [2] https://github.com/FRRouting/frr/releases
> _______________________________________________
> frog mailing list
> frog@lists.frrouting.org
> https://lists.frrouting.org/listinfo/frog