[cmaster-next] EVPN - notes on the Cumulus implementation

Vivek Venkatraman vivek at cumulusnetworks.com
Mon Nov 28 22:50:12 EST 2016


Here is some information on the Cumulus implementation. We can discuss
further in tomorrow's meeting and/or subsequent ones.


Functional description:

The Cumulus implementation of EVPN is focused on EVPN as the control plane
for VxLAN, though it is certainly feasible to extend it to support other
encapsulations (e.g., MPLS). It is also clearly a work in progress. The
implementation is focused on Linux as the network OS; work will be needed
to extend it to FreeBSD or to provide an FPM-type interface.

The current state of the implementation (end November 2016) allows for MAC
learning and exchange but not for MAC+IP. The latter requires some
enhancements to the Linux kernel's "ARP proxy" (ARP suppression)
capabilities for VxLAN, and this is currently in the works.
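
For context, the kernel's VxLAN driver already exposes a basic ARP-proxy
knob through iproute2, on which this work builds; a minimal illustration
(the interface name, VNI and address are hypothetical):

# With "proxy" set, the vxlan device answers ARP requests from its own
# neighbor cache instead of flooding them into the tunnel
ip link add vxlan100 type vxlan id 100 local 10.0.0.1 dstport 4789 proxy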

A precursor to MAC exchange is the exchange of VxLAN topology. This will
build the mapping of VNIs to remote VTEPs and is required for handling BUM
traffic through ingress replication. The exchange of this topology happens
in BGP through the EVPN type-3 route. Currently, this is not configurable
(i.e., these routes will always get exchanged). Additional options need to
be added to disable this exchange (and hence, possibly, drop BUM traffic),
to attach to a multicast group, or to send BUM traffic to a service or
replicator node. Remote VTEPs learnt by BGP and passed to zebra result in
the "flood list" being set up, by installing the appropriate neighbor
entries in the MAC FDB.
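
To illustrate, the flood list for a VNI amounts to one all-zeros MAC FDB
entry per remote VTEP, roughly what one would install by hand with the
bridge tool (the interface name and VTEP addresses are hypothetical):

# One all-zeros entry per remote VTEP; BUM traffic on the VNI is then
# ingress-replicated to each listed destination
bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 192.0.2.2 self permanent
bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 192.0.2.3 self permanent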

MACs are exchanged using the EVPN type-2 route. Local MACs get associated
with the appropriate VNI. Remote MACs learnt by BGP and passed to zebra
are installed in the kernel MAC FDB.
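
A remote MAC installed by zebra is, in effect, a static FDB entry pointing
at the VTEP that owns the MAC, along the lines of the following (the MAC
and addresses are hypothetical):

# Unicast traffic to this MAC on the VNI is tunneled to the remote VTEP
bridge fdb add 00:00:5e:00:53:01 dev vxlan100 dst 192.0.2.2 self static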

Some of the other aspects still in the works are the BGP encapsulation
attribute (RFC 5512 - basic code is already there in a branch from Lou),
the PMSI attribute (RFC 6514) and the MAC mobility extended community
(RFC 7432).

All BGP protocol exchanges are as per RFC 7432 and
draft-ietf-bess-evpn-overlay.

Handling of access device dual homing (i.e., vPC, MLAG etc. in vendor
speak) is not currently intended to use the EVPN type-4 route or EVPN
type-1 route and related procedures. Instead, it will rely on the
corresponding VTEPs using an "anycast IP" as their tunnel endpoint. This
is in accordance with current industry practice (e.g., the Cisco and
Arista approach).
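
A minimal sketch of the anycast approach, assuming two MLAG peers (the
address and VNI below are hypothetical): both switches configure the same
loopback address and use it as the local VTEP IP, so remote VTEPs see the
pair as a single tunnel endpoint.

# On both MLAG peers: shared anycast VTEP address on the loopback
ip addr add 192.0.2.10/32 dev lo
# Use the anycast address as the local tunnel endpoint for the VNI
ip link add vxlan100 type vxlan id 100 local 192.0.2.10 dstport 4789 nolearning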

Work on L3 multi-tenancy (i.e., inter-VxLAN routing) is at an early stage.
This is where EVPN type-5 routes will be needed, along with the association
of tenants to corresponding L3 VRFs, support for the Router MAC extended
community, etc.


Provisioning/Management:

As with other vendor implementations, there is no explicit configuration of
a "MAC VRF" - the VNI defines the L2 domain and is the MAC VRF.

VNIs (VxLAN interfaces), their association with a bridge and the mapping
of access (VLAN) interfaces to the bridge are all done through existing
Linux interfaces, i.e., iproute2 or ifupdown2. There are no new Quagga
commands to create or update these.
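
For example, a VNI and its bridge/VLAN plumbing might be set up with
iproute2 alone, and Quagga simply learns the result via netlink. The
interface names, VNI and address below are hypothetical:

# Create the VxLAN interface (VNI 100) with the local VTEP IP
ip link add vxlan100 type vxlan id 100 local 10.0.0.1 dstport 4789 nolearning
# Create the bridge and the access (VLAN) interface
ip link add br100 type bridge
ip link add link swp1 name swp1.100 type vlan id 100
# Map the access VLAN to the VNI by placing both in the same bridge
ip link set vxlan100 master br100
ip link set swp1.100 master br100
ip link set vxlan100 up
ip link set swp1.100 up
ip link set br100 up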

The only essential pieces of Quagga configuration needed to activate EVPN
(for the current and near-term functionality) are:
a) enable EVPN - the current command for this is "advertise-vni"
b) activate the EVPN address family for the remote VTEPs (or the
appropriate BGP peers, if peering is hop-by-hop)

router bgp <as>
 address-family evpn
  advertise-vni
  neighbor <nbr> activate

With the above configuration, VNIs known to the system will get associated
with automatic RDs and RTs. The RDs are formed as "RouterID:VNI" and the
RTs (import and export) are formed as "AS:VNI". For example, with router
ID 10.1.1.1 and AS 65000, VNI 100 gets RD 10.1.1.1:100 and an import and
export RT of 65000:100.

Configuration options are provided to specify a different RD and/or RT
value for a VNI. Whichever parameter is not user-configured will be
auto-derived. The VNI will, of course, be "active" only when it is known
to the system (i.e., provisioned through the existing Linux interfaces).

router bgp <as>
 address-family evpn
  advertise-vni
  neighbor <nbr> activate
  vni <vni>
    rd <value>
    route-target import <value> ...
    route-target export <value>

The main vtysh commands currently available to manage/monitor EVPN
operation are:

show evpn vni
show evpn vni <vni> mac
show bgp evpn summary
show bgp evpn route
show bgp evpn route rd <rd>


Design description:

zebra provides the kernel interface (through netlink) and maintains the
VNI information and the MAC table per VNI. Whether these need to move into
an "l2d" should be considered later; this is best taken up, if at all, in
conjunction with a "vrfd" and an "ifd", so that zebra effectively becomes
an "rtmd".

The VNI hash table is populated with local VNIs, which are learnt through
existing netlink "link" notifications. The hash table also contains the
remote VTEPs, which are updated by BGP. Linux currently only supports an
IPv4 address as the VTEP IP, so while the code has some support for an
IPv6 VTEP address, it is incomplete.
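
The attributes zebra extracts from these "link" notifications are the same
ones iproute2 can display (the device name below is hypothetical):

# Shows the vxlan attributes (id/VNI, local VTEP IP, dstport) carried in
# the netlink link message for this device
ip -d link show dev vxlan100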

To be able to associate local MACs with the correct VNI (if they are
associated with one), zebra needs to understand bridges and bridge members
as well as the VLAN-to-VNI mapping. These are built by listening mostly to
existing netlink "link" notifications; however, some notifications related
to the AF_BRIDGE family also need to be processed. The "L2" information
for an interface is maintained in a structure linked off 'struct zebra_if'.

The VNI hash table contains the MAC table corresponding to the VNI. This
includes local MACs learnt from netlink "neighbor" notifications
(AF_BRIDGE) as well as remote MACs updated by BGP. The remote MACs result
in MAC FDB entries being installed in the kernel. When this is extended
for MAC+IP, the IP information will result in the creation of ARP
"suppression" entries in the kernel.
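
When debugging what zebra sees, the same netlink notifications can be
observed from a shell with iproute2 (shown purely as an aid; zebra itself
listens to these messages directly):

# Watch "link" notifications (VxLAN/bridge creation, bridge membership)
ip monitor link
# Watch AF_BRIDGE neighbor (FDB) notifications used for local MAC learning
bridge monitor fdb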

BGP also has a VNI hash table that it builds through interaction with
zebra. Currently, there is no per-VNI routing table; instead, EVPN routes
(both local and remote) are only maintained in the main routing table,
where they are organized using a hierarchical radix tree in the same
fashion as L3VPN routes and ENCAP routes. This is likely to need change in
order to handle various scenarios. The change envisioned is that local
EVPN routes will be present only in the per-VNI routing table, from where
they will be advertised to EVPN peers; remote (i.e., learnt) EVPN routes
will be present in the global EVPN routing table, from where they will be
"imported" into one or more VNI routing tables; the import can/will be
done as a separate thread.

Thanks,
Vivek