Hi back! This is a tricky subject.
On 15 Jul 2019, at 15:04, Eugene Crosser <crosser@average.org> wrote:
Thanks for the response!
On 7/15/19 1:57 PM, Alexis Bauvin wrote:
My question is: should it be made possible (or maybe it is already possible?) to set default attributes for VRF/EVPNs that FRR autodetects? So that one could add something like this just once:
The issue with this is: with several VNIs provisioned in the VRF, how would you know which one is to be the L3VNI? As far as the interface "topology" goes in the kernel, nothing differentiates an L2VNI from an L3VNI, except that an L2VNI needs other interfaces enslaved to its bridge to be useful. But there can always be a moment where even an L2VNI has only a single interface, the VXLAN one (e.g. during provisioning). TL;DR: how do you reliably pick the proper VNI?
If there is exactly one VxLAN interface in the tree that grows from this VRF, then use its VNI. If there is none, or more than one, then do not apply the default. The assumption is that "simple" configurations will only have one VxLAN, and "complex" ones will require more sophisticated configuration anyway?
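The proposed rule can be sketched as a tree walk. This is a minimal illustration, not FRR code: the `Iface` model, its fields, and the names `vrf-blue`, `br5003` and `vxlan5003` are hypothetical, and the kernel's master/slave relations are assumed to form a simple tree (VRF, then bridge, then VXLAN).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Iface:
    """Hypothetical, simplified view of a kernel interface and its slaves."""
    name: str
    kind: str                   # e.g. "vrf", "bridge", "vxlan"
    vni: Optional[int] = None   # only set for kind == "vxlan"
    slaves: list[Iface] = field(default_factory=list)


def vxlans_under(root: Iface) -> list[Iface]:
    """Walk the tree that grows from `root` and collect every VXLAN interface."""
    found: list[Iface] = []
    stack = [root]
    while stack:
        iface = stack.pop()
        if iface.kind == "vxlan":
            found.append(iface)
        stack.extend(iface.slaves)
    return found


def default_l3vni(vrf: Iface) -> Optional[int]:
    """Proposed rule: elect the VNI only if exactly one VXLAN hangs off the VRF."""
    candidates = vxlans_under(vrf)
    if len(candidates) == 1:
        return candidates[0].vni
    return None  # none or several: leave the VRF unconfigured


# Example: VRF -> bridge -> VXLAN, the "simple" case from the thread.
vx = Iface("vxlan5003", "vxlan", vni=5003)
vrf = Iface("vrf-blue", "vrf", slaves=[Iface("br5003", "bridge", slaves=[vx])])
print(default_l3vni(vrf))  # 5003
```

Returning `None` rather than guessing keeps the "do not apply the default" branch explicit.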
This kind of automagic configuration is quite dangerous. Say you have only one interface, which then gets elected as the L3VNI. Would the configuration disappear if another one is created? If so, the config would flap, potentially causing a route and/or traffic flap, which is not desirable. If not, the behaviour would depend on the configuration order. Also, after an FRR restart it would see both interfaces at startup and could not decide which one to elect. In the simple case where it is guaranteed to work (a single interface in a VRF), it could be a proper solution. However, this is for the FRR maintainers to decide. Personally, I don’t find it robust enough.
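The two failure modes described above can be shown with a small replay of interface creations. This is a hypothetical sketch: "stateless" re-applies the exactly-one rule after every event, while "sticky" keeps the first VNI it ever elected; neither is an actual FRR policy.

```python
from __future__ import annotations

from typing import Optional


def elect(vnis: list[int]) -> Optional[int]:
    """Stateless rule: exactly one candidate VNI -> elect it, else nothing."""
    return vnis[0] if len(vnis) == 1 else None


def replay(events: list[int]) -> tuple[list[Optional[int]], list[Optional[int]]]:
    """Replay VXLAN creations one by one and record the elected L3VNI after each
    event, for a stateless policy and for a sticky (first-elected-wins) one."""
    seen: list[int] = []
    stateless: list[Optional[int]] = []
    sticky: list[Optional[int]] = []
    kept: Optional[int] = None
    for vni in events:
        seen.append(vni)
        current = elect(seen)
        stateless.append(current)
        if kept is None:
            kept = current  # once elected, never re-evaluated
        sticky.append(kept)
    return stateless, sticky


print(replay([5003, 100]))  # stateless drops 5003 (flap); sticky keeps it
print(replay([100, 5003]))  # sticky now keeps 100: result depends on order
print(elect([5003, 100]))   # after a restart both exist at once: None
```

The stateless variant flaps as soon as a second VXLAN appears; the sticky variant avoids the flap but elects a different VNI depending on creation order, and a restart falls back to electing nothing.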
And if the VRF interface was found "eligible for default config" at this stage, then also apply the "router bgp ... vrf DEFAULT" snippet to it.
Shouldn't this work?
I don’t quite get the "vrf DEFAULT" part, as the default VRF is configured without specifying "vrf".
On a related note, I understand that FRR can currently fetch the FDB only from 'bridge' interfaces, and not directly from VxLAN interfaces.
Partially correct: only bridge interfaces have an FDB (Forwarding DataBase), because an FDB is what makes a bridge a switch, and VXLAN interfaces don’t have one.
Umm?... I think they do? They are "logically" switches, aren't they?
$ bridge fdb show dev vx1
9e:ac:3b:97:76:7c vlan 1 offload master br1
9e:ac:3b:97:76:7c offload master br1
ae:65:c5:d5:80:67 vlan 1 offload master br1
ae:65:c5:d5:80:67 offload master br1
72:57:cc:e1:7c:bd vlan 1 offload master br1
72:57:cc:e1:7c:bd offload master br1
1e:28:11:86:71:81 vlan 1 master br1 permanent
1e:28:11:86:71:81 master br1 permanent
72:57:cc:e1:7c:bd dst 10.42.16.6 self offload
9e:ac:3b:97:76:7c dst 10.42.16.8 self offload
ae:65:c5:d5:80:67 dst 10.42.16.6 self offload
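To make the two kinds of lines in this output easier to tell apart, here is a minimal parser sketch. It only groups lines by the tags iproute2 printed (`master <bridge>` versus the `self` flag, plus any `dst` address); it handles just the keywords that appear in this thread and is not a complete grammar of `bridge fdb show` output.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class FdbEntry(NamedTuple):
    mac: str
    vlan: Optional[int]
    master: Optional[str]  # bridge named by "master <br>", if present
    is_self: bool          # line carries the "self" flag
    dst: Optional[str]     # remote VTEP address from "dst <ip>", if present


# Keywords followed by a value; only those seen in this thread are handled.
_KEYED = {"vlan", "master", "dst", "dev"}


def parse_fdb_line(line: str) -> FdbEntry:
    """Parse one line of `bridge fdb show` text output (simplified sketch)."""
    tokens = line.split()
    fields: dict[str, str] = {}
    flags: set[str] = set()
    i = 1
    while i < len(tokens):
        if tokens[i] in _KEYED and i + 1 < len(tokens):
            fields[tokens[i]] = tokens[i + 1]
            i += 2
        else:
            flags.add(tokens[i])  # bare flags: self, offload, permanent, ...
            i += 1
    vlan = int(fields["vlan"]) if "vlan" in fields else None
    return FdbEntry(tokens[0], vlan, fields.get("master"), "self" in flags, fields.get("dst"))


sample = """\
9e:ac:3b:97:76:7c vlan 1 offload master br1
1e:28:11:86:71:81 vlan 1 master br1 permanent
72:57:cc:e1:7c:bd dst 10.42.16.6 self offload
9e:ac:3b:97:76:7c dst 10.42.16.8 self offload
"""
entries = [parse_fdb_line(l) for l in sample.splitlines() if l.strip()]
# Group by the tag iproute2 printed: "master br1" lines vs. "self" lines with a dst.
via_master = [e.mac for e in entries if e.master]
via_self = {e.mac: e.dst for e in entries if e.is_self and e.dst}
print(via_master)
print(via_self)
```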
No, the VXLAN interface is a port of the switch. The "dev xxx" argument of `bridge` is used to filter the FDB for entries that apply to this port. It is the same as "dev xxx" for the RIB:

root@box:~# ip route show dev br5003
default via 10.42.64.6 proto bgp metric 20 onlink
10.40.8.54/31 via 10.42.64.6 proto bgp metric 20 onlink
10.40.8.56/31 via 10.42.64.6 proto bgp metric 20 onlink
10.42.42.0/24 via 10.42.64.2 proto bgp metric 20 onlink
10.42.42.11 via 10.42.64.2 proto bgp metric 20 onlink

br5003 is not a router; it is a port of the router in this VRF. In fact, in `bridge`, "dev xxx" behaves the same as "brport xxx" (output trimmed for your reading pleasure):

root@box:~# bridge fdb show dev vxlan5003
70:7d:b9:26:85:7d vlan 1 extern_learn master br5003
70:70:8b:f4:a7:8f vlan 1 extern_learn master br5003
70:7d:b9:26:85:7d dst 10.42.64.2 self extern_learn
70:70:8b:f4:a7:8f dst 10.42.64.6 self extern_learn
root@box:~# bridge fdb show brport vxlan5003
70:7d:b9:26:85:7d vlan 1 extern_learn master br5003
70:70:8b:f4:a7:8f vlan 1 extern_learn master br5003
70:7d:b9:26:85:7d dst 10.42.64.2 self extern_learn
70:70:8b:f4:a7:8f dst 10.42.64.6 self extern_learn

To see the FDB of a specific switch, the correct way is the "br xxx" argument (another bridge is used here as it has more entries; output cropped):

root@box:~# bridge fdb show br br-foo
62:21:8f:f7:66:8e dev vxlan-foo vlan 1 master br-foo permanent
00:00:00:00:00:00 dev vxlan-foo dst 10.42.43.2 self permanent
e6:e0:96:e3:22:b4 dev veth-foo vlan 1 master br-foo permanent
52:54:00:3d:15:51 dev vnet1 master br-foo
fe:54:00:3d:15:51 dev vnet1 vlan 1 master br-foo permanent
root@box:~# bridge fdb show br vxlan-foo
root@box:~#

As you can see, the VXLAN interface "has" an empty FDB. "Has" is in quotes because of the way iproute2 handles such parameters: it queries the kernel over netlink with a "filter" entry.
The kernel compares the filled-in fields (in this case the family, which is set to AF_BRIDGE, and the master ifindex, which is the ifindex of vxlan-foo) and returns only the matching entries. Since it does not look at the interface type, it just returns an empty list rather than an actual error.
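The filtering behaviour described above can be mimicked with a toy model: a `br` filter keeps entries reported on ports enslaved to the named device, and a `brport` filter keeps entries on the named port. This is an illustrative sketch under those assumptions, not the kernel's netlink dump code; the device list mirrors the `br-foo` transcript.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Dev(NamedTuple):
    name: str
    master: Optional[str]  # upper device this interface is enslaved to
    is_bridge: bool


class Entry(NamedTuple):
    mac: str
    dev: str  # port the entry is reported on


def dump(devs: list[Dev], entries: list[Entry],
         br: Optional[str] = None, brport: Optional[str] = None) -> list[Entry]:
    """Toy model of the dump filter: fields left as None are not compared."""
    by_name = {d.name: d for d in devs}
    out = []
    for e in entries:
        d = by_name[e.dev]
        if brport is not None and d.name != brport:
            continue
        # "br X" keeps entries on ports enslaved to X (or on X itself if it is
        # a bridge). A non-bridge X is not rejected up front: it matches nothing.
        if br is not None and d.master != br and not (d.name == br and d.is_bridge):
            continue
        out.append(e)
    return out


devs = [Dev("br-foo", None, True), Dev("vxlan-foo", "br-foo", False),
        Dev("veth-foo", "br-foo", False), Dev("vnet1", "br-foo", False)]
entries = [Entry("62:21:8f:f7:66:8e", "vxlan-foo"), Entry("00:00:00:00:00:00", "vxlan-foo"),
           Entry("e6:e0:96:e3:22:b4", "veth-foo"), Entry("52:54:00:3d:15:51", "vnet1"),
           Entry("fe:54:00:3d:15:51", "vnet1")]
print(len(dump(devs, entries, br="br-foo")))  # 5
print(dump(devs, entries, br="vxlan-foo"))    # []
```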
The FDB is needed because type 3 (VTEP) routes are installed in the FDB on the port corresponding to the VXLAN interface. In L3VNI mode, to route a packet, the data needed comes from three places:
- Next-hop IP comes from the routing table
- Next-hop MAC comes from the neighbor table
- VTEP IP comes from the FDB, where the router’s MAC (or the type 3 route) is installed
So sadly, the bridge is needed.
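The three lookups can be chained as in the following sketch. The routes and the MAC-to-VTEP entries are taken from the br5003/vxlan5003 transcripts in this thread; the neighbor table (next-hop IP to router MAC) is an assumption made for the example, and plain dicts stand in for the kernel tables.

```python
from __future__ import annotations

import ipaddress
from typing import Optional

# Routing table of the VRF: (prefix, next-hop IP), from `ip route show dev br5003`.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "10.42.64.6"),
    (ipaddress.ip_network("10.42.42.0/24"), "10.42.64.2"),
]
# Neighbor table: next-hop IP -> router MAC (ASSUMED values for illustration).
NEIGHBORS = {
    "10.42.64.2": "70:7d:b9:26:85:7d",
    "10.42.64.6": "70:70:8b:f4:a7:8f",
}
# FDB: router MAC -> remote VTEP IP, from the `dst ... self` lines above.
FDB = {
    "70:7d:b9:26:85:7d": "10.42.64.2",
    "70:70:8b:f4:a7:8f": "10.42.64.6",
}


def resolve(dst: str) -> Optional[tuple[str, str, str]]:
    """Return (next-hop IP, next-hop MAC, VTEP IP), or None if any lookup fails."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, nh) for net, nh in ROUTES if addr in net]
    if not matches:
        return None
    # Longest-prefix match picks the most specific route.
    _, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    mac = NEIGHBORS.get(next_hop)   # step 2: neighbor table
    if mac is None:
        return None
    vtep = FDB.get(mac)             # step 3: FDB gives the VTEP to encapsulate to
    if vtep is None:
        return None
    return next_hop, mac, vtep


print(resolve("10.42.42.11"))
print(resolve("8.8.8.8"))
```

If any of the three tables lacks an entry, the packet cannot be encapsulated, which is why all three must be populated.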
I am very probably missing something, but I do not see how the bridge's FDB can be useful when it is known to bridge only two interfaces: the VRF itself and the VxLAN. MACs of all remote routes are (by definition) in the VxLAN's FDB. The only other FDB entry in the bridge will be the MAC of the VRF interface itself (or whatever enslaved interface is the "router"), and we know it anyway. I just cannot imagine how the switch (in this topology) can get _any_ MAC information that cannot be fetched from the VxLAN's FDB or from the other interfaces' configurations.
See above: the VXLAN interface has _no_ FDB. The bridge’s FDB is used to store remote VTEP IPs, as well as the MACs of remote routers. Everything you describe is correct, except that it is stored in the bridge's FDB. As for why, it's simple: reuse. All the functionality required for VXLAN as far as L2 goes is already handled by a standard FDB (except that the Linux one is a bit more full-featured, and seeing a "MAC you can reach through this IP" always made me smile), so the Linux bridge is a perfect candidate to avoid reinventing the wheel. MAC lookups, forwarding decisions, BUM replication, etc. were all already implemented.
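As a rough picture of the "already implemented" functionality, here is a textbook learning-switch decision plus head-end replication. It is a hypothetical sketch, not the Linux bridge or vxlan driver code: the dict shapes are assumptions, and the all-zeros MAC key stands for entries like `00:00:00:00:00:00 dev vxlan-foo dst 10.42.43.2` in the transcript above, which list the VTEPs that receive a copy of BUM traffic.

```python
from __future__ import annotations

ZERO_MAC = "00:00:00:00:00:00"


def bridge_forward(dst_mac: str, in_port: str, ports: list[str],
                   bridge_fdb: dict[str, str]) -> list[str]:
    """Known unicast goes to one port; unknown unicast, broadcast and multicast
    (BUM) are flooded to every port except the one the frame came in on."""
    is_group = int(dst_mac.split(":")[0], 16) & 1  # I/G bit: broadcast/multicast
    if not is_group and dst_mac in bridge_fdb:
        out = bridge_fdb[dst_mac]
        return [] if out == in_port else [out]
    return [p for p in ports if p != in_port]


def vxlan_replicate(dst_mac: str, vteps_by_mac: dict[str, list[str]]) -> list[str]:
    """On the VXLAN port: a known MAC goes to its VTEP(s); anything else is
    copied to every VTEP listed under the all-zeros MAC (head-end replication)."""
    if dst_mac in vteps_by_mac:
        return list(vteps_by_mac[dst_mac])
    return list(vteps_by_mac.get(ZERO_MAC, []))


ports = ["vxlan-foo", "veth-foo", "vnet1"]
fdb = {"52:54:00:3d:15:51": "vnet1"}
print(bridge_forward("52:54:00:3d:15:51", "veth-foo", ports, fdb))  # ['vnet1']
print(bridge_forward("ff:ff:ff:ff:ff:ff", "vnet1", ports, fdb))     # ['vxlan-foo', 'veth-foo']
print(vxlan_replicate("ff:ff:ff:ff:ff:ff", {ZERO_MAC: ["10.42.43.2"]}))  # ['10.42.43.2']
```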
I would really like to understand how it all fits together, if we are to run this thing in production…
To get a better understanding, I can’t recommend the excellent blog posts from Vincent Bernat enough:
- VXLAN & Linux, which will help you understand the basics and how the various Linux tables interact (https://vincent.bernat.ch/en/blog/2017-vxlan-linux)
- VXLAN: BGP EVPN with Cumulus Quagga (or FRR), which explains how FRR interacts with those tables, how routes are installed and where FRR picks up its information (https://vincent.bernat.ch/en/blog/2017-vxlan-bgp-evpn)
As a third resource, there is the Cumulus documentation on RIOT, L3VNI and Type-5 routes. It is worth reading, albeit a bit hard going because it is very Cumulus-oriented: https://docs.cumulusnetworks.com/display/DOCS/Ethernet+Virtual+Private+Netwo...
Thanks again,
Eugene
Good luck!
Alexis