[FROG] Setting defaults for autodetected VRFs/VxLANs

Mon Jul 15 11:28:19 EDT 2019

Hi back!

This is a tricky subject.

> On 15 Jul 2019, at 15:04, Eugene Crosser <crosser at average.org> wrote:
> 
> Thanks for the response!
> 
> On 7/15/19 1:57 PM, Alexis Bauvin wrote:
> 
>>> My question is: should it be made possible (or maybe it is already
>>> possible?) to set default attributes for VRF/EVPNs that FRR autodetects?
>>> So that one could add something like this just once:
>> 
>> The issue with this is, with several VNIs provisioned in the VRF, how would
>> you know which one is to be the L3VNI? As far as the interface "topology" goes
>> in the Kernel, nothing differentiates a L2VNI from a L3VNI, except that a
>> L2VNI may eventually have other interfaces enslaved to its bridge to be useful.
>> But there can always be a moment where even a L2VNI only has a single
>> interface, the VXLAN one (e.g. during provisioning).
>> TL;DR: how do you reliably discriminate the proper VNI?
> 
> If there is exactly one VxLAN interface in the tree that grows from this
> VRF, then use its VNI. If there is none or more than one, then do not
> apply the default. Assuming that "simple" configurations will only have
> one VxLAN, and "complex" ones will require more sophisticated
> configuration anyway?

This kind of automagic configuration is quite dangerous. Say you have only one
interface, which then gets elected as the L3VNI. Would the configuration
disappear if another one is created?
If so, the config would flap, which would generate a potential route and/or
traffic flap, and that’s not desirable.
If not, then the behaviour would change depending on the configuration order. An
FRR restart would make it see both interfaces at startup, where it can’t decide
which one to elect.

In the simple case where it is guaranteed to work (a single interface in a VRF),
it could be a proper solution. However, this is for the FRR maintainers to
decide. Personally, I don’t find it robust enough.
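
To make the concern concrete, here is a minimal sketch of the proposed
heuristic. It is not FRR code: `elect_l3vni` and `elect_sticky` are
hypothetical names, and the VNI numbers are made up for illustration.

```python
from typing import List, Optional


def elect_l3vni(vnis_in_vrf: List[int]) -> Optional[int]:
    """Hypothetical rule: use the VNI only if the VRF has exactly one VXLAN."""
    return vnis_in_vrf[0] if len(vnis_in_vrf) == 1 else None


def elect_sticky(current: Optional[int], vnis_in_vrf: List[int]) -> Optional[int]:
    """Hypothetical variant: keep an already elected VNI while it still exists."""
    if current is not None and current in vnis_in_vrf:
        return current
    return elect_l3vni(vnis_in_vrf)


# Variant 1: re-evaluate on every change. The default appears, then disappears.
seen: List[int] = []
history = []
for vni in (5003, 5004):          # provisioning order (assumed VNIs)
    seen.append(vni)
    history.append(elect_l3vni(seen))
print(history)                    # [5003, None]

# Variant 2: sticky election. The result depends on what was seen first.
running = elect_sticky(elect_sticky(None, [5003]), [5003, 5004])
after_restart = elect_sticky(None, [5003, 5004])
print(running, after_restart)     # 5003 None
```

Either the default flaps when the second interface shows up, or the running
state and the state after a restart disagree.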

> And if the VRF interface was found "eligible for default config" at this
> stage, then also apply to it the "router bgp ... vrf DEFAULT" snippet.
> 
> Shouldn't this work?

I don’t quite get the "vrf DEFAULT" part, as the default one is configured
without specifying "vrf".

>>> On a related note, I understand that currently FRR can fetch FDB only
>>> from 'bridge' interfaces, but not directly from VxLAN interfaces.
>> 
>> Partially correct: only bridge interfaces have a FDB (Forwarding DataBase),
>> because a FDB is what makes a bridge a switch, and VXLAN interfaces don’t.
> 
> Umm?... I think they do? They are "logically" switches, aren't they?
> 
> $ bridge fdb show dev vx1
> 9e:ac:3b:97:76:7c vlan 1 offload master br1
> 9e:ac:3b:97:76:7c offload master br1
> ae:65:c5:d5:80:67 vlan 1 offload master br1
> ae:65:c5:d5:80:67 offload master br1
> 72:57:cc:e1:7c:bd vlan 1 offload master br1
> 72:57:cc:e1:7c:bd offload master br1
> 1e:28:11:86:71:81 vlan 1 master br1 permanent
> 1e:28:11:86:71:81 master br1 permanent
> 72:57:cc:e1:7c:bd dst 10.42.16.6 self offload
> 9e:ac:3b:97:76:7c dst 10.42.16.8 self offload
> ae:65:c5:d5:80:67 dst 10.42.16.6 self offload

No, the VXLAN interface is a port of the switch. The "dev xxx" argument of
`bridge` is used to filter the FDB for entries applying to this port. It is the
same as the "dev xxx" for the RIB:
root@box:~# ip route show dev br5003
default via 10.42.64.6 proto bgp metric 20 onlink
10.40.8.54/31 via 10.42.64.6 proto bgp metric 20 onlink
10.40.8.56/31 via 10.42.64.6 proto bgp metric 20 onlink
10.42.42.0/24 via 10.42.64.2 proto bgp metric 20 onlink
10.42.42.11 via 10.42.64.2 proto bgp metric 20 onlink

br5003 is not a router, it is a port for the router in this VRF. 

In fact, in `bridge`, "dev xxx" has the same behaviour as "brport xxx" (output
trimmed for your reading pleasure):
root@box:~# bridge fdb show dev vxlan5003
70:7d:b9:26:85:7d vlan 1 extern_learn master br5003
70:70:8b:f4:a7:8f vlan 1 extern_learn master br5003
70:7d:b9:26:85:7d dst 10.42.64.2 self extern_learn
70:70:8b:f4:a7:8f dst 10.42.64.6 self extern_learn
root@box:~# bridge fdb show brport vxlan5003
70:7d:b9:26:85:7d vlan 1 extern_learn master br5003
70:70:8b:f4:a7:8f vlan 1 extern_learn master br5003
70:7d:b9:26:85:7d dst 10.42.64.2 self extern_learn
70:70:8b:f4:a7:8f dst 10.42.64.6 self extern_learn

To see the FDB of a specific switch, the correct way is with the "br xxx" arg
(another bridge is used here as it has more stuff in it, with cropped output):
root@box:~# bridge fdb show br br-foo
62:21:8f:f7:66:8e dev vxlan-foo vlan 1 master br-foo permanent
00:00:00:00:00:00 dev vxlan-foo dst 10.42.43.2 self permanent
e6:e0:96:e3:22:b4 dev veth-foo vlan 1 master br-foo permanent
52:54:00:3d:15:51 dev vnet1 master br-foo
fe:54:00:3d:15:51 dev vnet1 vlan 1 master br-foo permanent
root@box:~# bridge fdb show br vxlan-foo
root@box:~#

As you can see, the vxlan interface "has" an empty FDB. "Has" is in quotes
because of the way iproute2 handles such parameters: it queries the kernel
through netlink with a "filter" netlink entry. The kernel compares the
filled-in fields (in this case the family, which is set to AF_BRIDGE, and the
master ifindex, which is the ifindex of vxlan-foo) and returns only the entries
that match. Since it does not look at the type of the interface, it just
returns an empty list rather than an actual error.
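
The filtering behaviour can be mimicked with a toy model. This is only a
sketch of the matching logic described here, not the kernel or iproute2
implementation: the `FDB` list and the `fdb_show` helper are made up for
illustration, and only a few of the "master" entries from the br-foo output
are modelled.

```python
from typing import Dict, List, Optional

# Simplified entries: MAC, the port they sit on, and the bridge owning the FDB.
FDB: List[Dict[str, str]] = [
    {"mac": "62:21:8f:f7:66:8e", "port": "vxlan-foo", "bridge": "br-foo"},
    {"mac": "e6:e0:96:e3:22:b4", "port": "veth-foo", "bridge": "br-foo"},
    {"mac": "52:54:00:3d:15:51", "port": "vnet1", "bridge": "br-foo"},
]


def fdb_show(br: Optional[str] = None,
             brport: Optional[str] = None) -> List[Dict[str, str]]:
    """Return entries matching every filled-in field; never raise an error."""
    return [
        entry for entry in FDB
        if (br is None or entry["bridge"] == br)
        and (brport is None or entry["port"] == brport)
    ]


print(len(fdb_show(br="br-foo")))        # 3: every entry of the switch
print(len(fdb_show(brport="vxlan-foo"))) # 1: entries on that port only
print(fdb_show(br="vxlan-foo"))          # []: nothing matches, and no error
```

The filter never checks whether the name given to `br` is actually a bridge,
so asking for the FDB of a port simply matches nothing.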

>> The FDB is needed because type 3 (VTEP) routes are installed in the FDB on
>> the port corresponding to the VXLAN interface. And in L3VNI mode, to be able
>> to route a packet, the data needed comes from three places:
>> - Next hop IP comes from the routing table
>> - Next hop MAC comes from the neighbor table
>> - VTEP IP comes from the FDB, where the router’s MAC (or the type 3 route) is
>>  installed
>> So sadly, the bridge is needed.
> 
> I am very probably missing something, but I do not see how the bridge's
> FDB can be useful when it is known to only bridge between two
> interfaces: VRF itself and VxLAN. MACs of all remote routes are (by
> definition) in the VxLAN's FDB. The only other FDB entry in the bridge
> will be the MAC of the VRF interface itself (or whatever enslaved
> interface is the "router"), and we know it anyway. I just cannot imagine
> how the switch (in this topology) can get _any_ MAC information that
> cannot be fetched from the VxLAN's FDB or from the other interfaces'
> configurations.

See above, the VXLAN interface has _no_ FDB. The bridge’s FDB is used to store
remote VTEP IPs, as well as the MACs of remote routers.
Everything you describe is correct, except it is stored in the bridge's FDB.
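
As a rough illustration of how the three tables chain together in L3VNI mode,
here is a small sketch. The route and the FDB entry reuse values from the
outputs earlier in this mail; the neighbor entry linking them is an assumption
made for the example, and `resolve` is a hypothetical helper, not anything FRR
or the kernel exposes.

```python
from typing import Dict, Tuple

# prefix -> next hop IP (routing table)
ROUTES: Dict[str, str] = {"10.42.42.0/24": "10.42.64.2"}
# next hop IP -> next hop MAC (neighbor table; assumed entry)
NEIGHBORS: Dict[str, str] = {"10.42.64.2": "70:7d:b9:26:85:7d"}
# router MAC -> remote VTEP IP (bridge FDB)
BRIDGE_FDB: Dict[str, str] = {"70:7d:b9:26:85:7d": "10.42.64.2"}


def resolve(prefix: str) -> Tuple[str, str, str]:
    """Chain the three lookups needed to route a packet over the L3VNI."""
    next_hop_ip = ROUTES[prefix]            # 1. routing table
    next_hop_mac = NEIGHBORS[next_hop_ip]   # 2. neighbor table
    vtep_ip = BRIDGE_FDB[next_hop_mac]      # 3. bridge FDB
    return next_hop_ip, next_hop_mac, vtep_ip


print(resolve("10.42.42.0/24"))
# ('10.42.64.2', '70:7d:b9:26:85:7d', '10.42.64.2')
```

Without the third table there is nothing that says which VTEP the
encapsulated packet has to be sent to.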

As for why, it's simple: reuse. All the functionality required for VXLAN as far
as L2 goes is already handled by a standard FDB (except that the Linux one is
a bit more full featured, and seeing a "MAC you can reach through this IP"
always made me smile), so the Linux bridge is a perfect candidate to avoid
reinventing the wheel.
MAC lookups, forwarding decisions, BUM replication, etc... were all already
implemented.

> I would really like to understand how it all fits together, if we are to
> run this thing in production…

To get a better understanding, I can’t recommend enough the excellent blog
posts from Vincent Bernat:
- VXLAN & Linux, that will help you understand the basics and how the various
  Linux tables interact
  (https://vincent.bernat.ch/en/blog/2017-vxlan-linux)
- VXLAN: BGP EVPN with Cumulus Quagga (or FRR), that will explain how FRR
  interacts with those various tables, how routes are installed and where FRR
  picks its information.
  (https://vincent.bernat.ch/en/blog/2017-vxlan-bgp-evpn)

As a third resource, there is the Cumulus documentation on RIOT, L3VNI and
Type-5 routes. It is worth the read, albeit a bit hard to follow because it is
very Cumulus-oriented:
https://docs.cumulusnetworks.com/display/DOCS/Ethernet+Virtual+Private+Network+-+EVPN

> Thanks again,
> 
> Eugene

Good luck!

Alexis