[cmaster-next] Fwd: EVPN - notes on the Cumulus implementation

Donald Sharp sharpd at cumulusnetworks.com
Tue Nov 29 11:46:18 EST 2016


Resend if missed


---------- Forwarded message ----------
From: Vivek Venkatraman <vivek at cumulusnetworks.com>
Date: Mon, Nov 28, 2016 at 10:50 PM
Subject: [cmaster-next] EVPN - notes on the Cumulus implementation
To: cmaster-next at lists.nox.tf


Here is some information on the Cumulus implementation. We can discuss
further in tomorrow's meeting and/or subsequent ones.


Functional description:

The Cumulus implementation of EVPN is focused on EVPN as the control
plane for VxLAN, though it is certainly feasible to extend it to
support other encapsulations - e.g., MPLS. It is also clearly a
work-in-progress. The implementation is focused on Linux as the
network OS; work will be needed to extend it to FreeBSD or to provide
an FPM-type interface.

The current state of the implementation (as of end November 2016)
allows for MAC learning and exchange, but not yet for MAC+IP. The
latter requires some enhancements to the Linux kernel's "ARP proxy"
(ARP suppression) capabilities for VxLAN, and this is currently in
the works.

A precursor to MAC exchange is the exchange of VxLAN topology. This
will build the mapping of VNIs to remote VTEPs and is required for
handling BUM traffic through ingress replication. The exchange of this
topology happens in BGP through the EVPN type-3 route. Currently, this
is not configurable (i.e., these routes will always get exchanged).
Additional options need to be added to disable this exchange (and
hence, possibly, drop BUM traffic), attach to a multicast group or
have the ability to go to a service or replicator node. Remote VTEPs
learnt by BGP and passed down to zebra result in the "flood list"
being set up, by installing the appropriate neighbor entry in the MAC
FDB.

MACs are exchanged using the EVPN type-2 route. Local MACs get
associated with the appropriate VNI. Remote MACs learnt by BGP and
passed down to zebra are installed in the kernel MAC FDB.
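
Again purely as an illustration (the MAC, device and VTEP address are
made up; zebra does the equivalent over netlink), a remote MAC learnt
via a type-2 route ends up as an FDB entry pointing at the
advertising VTEP:

  bridge fdb add 00:11:22:33:44:55 dev vxlan100 dst 10.0.0.2 self static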

Some of the other aspects still in the works are the BGP
encapsulation attribute (RFC 5512 - basic code is already there in a
branch from Lou), the PMSI attribute (RFC 6514) and the MAC mobility
extended community (RFC 7432).

All BGP protocol exchanges are as per RFC 7432 and draft-ietf-bess-evpn-overlay.

Handling of access device dual homing (i.e., vPC, MLAG, etc. in
vendor speak) is not currently intended to use the EVPN type-4 route
or EVPN type-1 route and the related procedures. Instead, it will
rely on the corresponding VTEPs using an "anycast IP" as their tunnel
end point, as sketched below. This is in accordance with current
industry practice (i.e., Cisco's and Arista's approach).
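
As a rough sketch of what this looks like on the Linux side
(interface name, VNI and address are hypothetical), both members of
the MLAG pair would create the VxLAN interface with the same shared
source address, so that remote VTEPs see a single logical end point:

  ip link add vxlan100 type vxlan id 100 local 10.0.0.100 dstport 4789 nolearning

where 10.0.0.100 is the anycast address configured on both members.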

Work on L3 multi-tenancy (i.e., inter-VxLAN routing) is at an early
stage. This is where EVPN type-5 routes will be needed, along with
associating tenants with their corresponding L3 VRFs, support for the
Router MAC extended community, etc.


Provisioning/Management:

As with other vendor implementations, there is no explicit
configuration of a "MAC VRF" - the VNI defines the L2 domain and is
the MAC VRF.

VNIs (VxLAN interfaces), their association with a bridge and the
mapping of access (VLAN) interfaces to the bridge are all done
through existing Linux interfaces, i.e., iproute2 or ifupdown2. There
are no new Quagga commands to create or update these.
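
For reference, a minimal provisioning sequence with iproute2 (names,
VNI and address are just examples) would look like:

  ip link add br0 type bridge
  ip link add vxlan100 type vxlan id 100 local 10.0.0.1 dstport 4789 nolearning
  ip link set vxlan100 master br0
  ip link set swp1 master br0

Quagga only reacts to the resulting netlink state; it does not create
or modify any of these interfaces.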

The only essential pieces of Quagga configuration needed to activate
EVPN (for the current and near-term functionality) are:
a) enable EVPN - the current command for this is "advertise-vni"
b) activate the EVPN address family for the remote VTEPs (or the
appropriate BGP peers, if peering is hop-by-hop)

router bgp <as>
 address-family evpn
  advertise-vni
  neighbor <nbr> activate

With the above configuration, VNIs known to the system will get
associated with automatic RDs and RTs. The RDs are formed as
"RouterID:VNI" and the RTs (import and export) are formed using
"AS:VNI".

Configuration options are provided to specify a different RD and/or
RT value for a VNI. Whichever parameter is not user-configured will
be auto-derived. The VNI will, of course, be "active" only when it is
known to the system (i.e., provisioned through the existing Linux
interfaces).

router bgp <as>
 address-family evpn
  advertise-vni
  neighbor <nbr> activate
  vni <vni>
    rd <value>
    route-target import <value> ...
    route-target export <value>

The main vtysh commands currently available to manage/monitor EVPN
operation are:

show evpn vni
show evpn vni <vni> mac
show bgp evpn summary
show bgp evpn route
show bgp evpn route rd <rd>


Design description:

zebra provides the kernel interface (through netlink) and maintains
the VNI information and the MAC table per VNI. Whether these need to
move into an "l2d" should be considered later; this is best taken up,
if at all, in conjunction with a "vrfd" and an "ifd", so that zebra
effectively becomes an "rtmd".

The VNI hash table is populated with local VNIs, which are learnt
through existing netlink "link" notifications. The hash table also
contains the remote VTEPs, which are updated by BGP. Linux currently
supports only an IPv4 address as the VTEP IP, so while the code has
some support for an IPv6 address as a VTEP, it is incomplete.
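
For a sense of what zebra extracts from those notifications, the same
attributes are visible with iproute2 (device name and values are
illustrative):

  ip -d link show vxlan100

which reports, among other things, "vxlan id 100 local 10.0.0.1
dstport 4789" - roughly the information that goes into the VNI hash
entry.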

To be able to associate local MACs with the correct VNI (if they are
associated with a VNI), zebra needs to understand bridges and bridge
members and the VLAN-to-VNI mapping. These are built mostly by
listening to existing netlink "link" notifications; however, some
notifications related to the AF_BRIDGE family also need to be
processed. The "L2" information for an interface is maintained in a
structure linked off 'struct zebra_if'.
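
The corresponding state can be inspected from the shell (again only
as an illustration of what the notifications carry):

  bridge link show
  bridge vlan show

which list bridge port membership and the per-port VLANs; combined
with each VxLAN interface's VNI, this yields the VLAN-to-VNI mapping
described above.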

The VNI hash table contains the MAC table corresponding to the VNI.
This includes local MACs learnt from netlink "neighbor" notifications
(AF_BRIDGE) as well as remote MACs updated by BGP. The remote MACs
result in MAC FDB entries being installed in the kernel. When this is
extended for MAC+IP, the IP information will result in creation of ARP
"suppression" entries in the kernel.

BGP also has a VNI hash table that it builds through interaction with
zebra. Currently, there is no per-VNI routing table. Instead, EVPN
routes (both local and remote) are maintained only in the main
routing table, where they are organized using a hierarchical radix
tree in the same fashion as L3VPN routes and ENCAP routes. This is
likely to need to change in order to handle various scenarios. The
change envisioned is that local EVPN routes will be present only in
the per-VNI routing table, from where they will be advertised to EVPN
peers; remote (i.e., learnt) EVPN routes will be present in the
global EVPN routing table, from where they will be "imported" into
one or more per-VNI routing tables. The import can/will be done as a
separate thread.

Thanks,
Vivek



_______________________________________________
cmaster-next mailing list
cmaster-next at lists.nox.tf
https://lists.nox.tf/listinfo/cmaster-next



