[dev] EVPN (and L3VPN) configuration

Vivek Venkatraman vivek at cumulusnetworks.com
Sun Jun 25 23:59:19 EDT 2017


Hi Lou, Philippe, All,

The PR that I submitted already addresses, for the most part, inter-subnet
routing (i.e., the bridge+router scenario) when employing asymmetric routing
(https://tools.ietf.org/html/draft-ietf-bess-evpn-inter-subnet-forwarding,
section 4). [I say "for the most part" because some additional changes are
needed to advertise the gateway MACIP in the case of a centralized gateway,
and a few other things.] Changes are, of course, needed for symmetric
routing (section 5 of the aforementioned draft). I'll describe both of these
below.

At the end, I'll propose some thoughts on extending this to other EVPN
encapsulations - specifically MPLS - to support traditional VPLS.

Asymmetric routing:

Here, we're dealing only with host routes and the ingress VTEP/NVE will
route to the virtual subnet where the destination is, so that the egress
VTEP/NVE only does bridging. In VxLAN terms, we're only dealing with L2
VNIs which need to be provisioned on all VTEPs. MACs are learnt against a
VLAN through kernel notifications, mapped to a VxLAN/VNI and advertised.
Likewise, neighbor entries (ARP/ND) are learnt on an SVI by listening to
kernel notifications; the mapping to the VxLAN/VNI is straightforward and
MACIP routes are originated using this L2 VNI. The logic on the receive
side is straightforward too: the received RTs map to the VNI, which maps to
the VLAN. MAC routes are installed into the FDB, while MACIP routes result
in neighbor entries being created on the SVI (corresponding to the VLAN).
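
For concreteness, the following is a minimal sketch (not part of the PR, and
with hypothetical interface names, VNI and addresses) of the kind of
kernel-side setup this assumes on a VTEP; zebra learns the resulting FDB and
neighbor (ARP/ND) entries via netlink notifications:

ip link add vni100 type vxlan id 100 local 192.0.2.1 dstport 4789 nolearning
ip link add br100 type bridge
ip link add link swp1 name swp1.100 type vlan id 100
ip link set vni100 master br100      # map the L2 VNI into the VLAN's bridge
ip link set swp1.100 master br100    # tenant-facing port for VLAN 100
ip addr add 10.1.1.1/24 dev br100    # SVI address (gateway case); ARP/ND learnt here
ip link set br100 up; ip link set vni100 up; ip link set swp1.100 up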

The above functionality is all present in the PR submitted. While the
target of the PR was just EVPN for L2 with ARP suppression, it can
accomplish routing too. Note that ARP suppression requires some additional
functionality in the Linux kernel which Cumulus Networks is working to get
into the upstream kernel.

The only additional provisioning we had planned to introduce was whether to
advertise our SVI MAC or not - needed only on gateway devices. This was to
be under "address-family l2vpn evpn":

router bgp <as>
  address-family l2vpn evpn
    advertise-default-gateway

However, I'll propose some changes to the provisioning at the end of this
note.

Symmetric routing:

Clearly, this is more scalable and brings in the "inter-connect subnet" (L3
VNI). It also introduces the ability to do prefix routing with EVPN type-5
routes.

The L3 VNI is a parameter per tenant - i.e., per L3 VRF. This is planned to
be the only required/mandatory configuration on top of what my PR
introduces. The tenant (L3 VRF) configuration already exists and the L3 VNI
was going to be added to it. The RD and RTs (for the tenant) could be
auto-derived from this L3 VNI, but could optionally be configured. The
planned configuration is/was:

router bgp <as> vrf <tenant VRF>
  <any existing configuration such as "redistribute connected" or "network">
  l2vpn evpn l3vni <vni>
  rd <RD>
  route-target <import | export | both> <RT>

If the community decision is to configure the RD and RT configs as
"vrf-policy" against the default VRF in BGP, the above will of course
change.
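
To make the model concrete, here is a minimal sketch (hypothetical names and
values, not a definitive implementation) of the kernel-side constructs that
the L3 VNI configuration assumes: a VRF device for the tenant, a dedicated
VxLAN device and bridge (SVI) for the L3 VNI placed in that VRF, and the
tenant's L2 VNI SVIs (e.g., br100 from the earlier sketch) also placed in the
VRF:

ip link add vrf-tenant1 type vrf table 1001
ip link set vrf-tenant1 up
ip link add vni4001 type vxlan id 4001 local 192.0.2.1 dstport 4789 nolearning
ip link add br4001 type bridge
ip link set vni4001 master br4001       # L3 VNI ("inter-connect subnet") device
ip link set br4001 master vrf-tenant1   # its SVI lives in the tenant VRF
ip link set br100 master vrf-tenant1    # tenant subnet SVIs live in the VRF too
ip link set br4001 up; ip link set vni4001 up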

The way symmetric routing operates is as follows. There is no change to the
advertisement or reception of MAC-only type-2 routes; these contain only the
L2 VNI. For MACIP type-2 routes, when the neighbor (ARP/ND) is
learnt by listening to a kernel notification, the SVI that the entry is
learnt on will be part of the tenant's VRF and that will provide the L3 VNI
and L3 RTs. The RouterMAC extended community has to be added and the MAC
will be derived from the interface corresponding to the L3 VNI (the
"inter-connect subnet" interface). On the receive side, if the route has 2
VNIs, the MAC and Neighbor entry will be installed against the L2 VNI (if
present locally) as before while the IP host route will be processed and
imported into any L3 VRFs (BGP's RIB) that match its RTs. There is some
special handling required because the next hop is the remote VTEP/NVE whose
MAC should be set up as the received Router MAC.

What the above shows is that there isn't an explicit hierarchy mapping a
tenant's L2 VNIs (subnets) to the tenant's L3 VRF, but it is present
implicitly (the SVIs corresponding to those subnets will be assigned to the
tenant's VRF).

For external routing, the plan is that by default, any routes in the L3 VRF
(in BGP's RIB) will be advertised to EVPN peers as type-5 routes. The
current thought is that this can be controlled using existing route-map
constructs (TBD). Internal (i.e., EVPN) routes are already present in the
L3 VRF (BGP's RIB) as mentioned above. Existing route-maps can be used to
control how these are advertised externally - currently using VRF-lite BGP
peerings, in future using L3VPN.
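
As an illustration only (the attach point is TBD as noted above, and the
names/prefixes below are hypothetical), the existing route-map/prefix-list
constructs would look something like this for selecting which tenant routes
become type-5 routes:

ip prefix-list TENANT1-EXT seq 5 permit 10.1.0.0/16 le 24
!
route-map TENANT1-TO-EVPN permit 10
 match ip address prefix-list TENANT1-EXT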

For inter-DC connectivity, EVPN single-hop or multi-hop peerings can be set
up between the border EVPN routers in each DC. If some/all tenants do
not need their L2 domain stretched across the DCs but only need L3
connectivity (i.e., subnets contained to one DC), only EVPN type-5 routes
need to be exchanged on the inter-DC peering. The current plan is to
implement an addition to route-map matching for that - "match evpn
route-type <type>".
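
To illustrate the intent (the "match evpn route-type" syntax is only proposed
here; the "prefix" keyword, peer name and ASN below are hypothetical, and
DC2-BORDER is assumed to be an already-configured EVPN peer or peer-group),
an inter-DC border router could then limit what it sends to the remote DC to
type-5 routes with an outbound route-map:

route-map INTER-DC-OUT permit 10
 match evpn route-type prefix
!
router bgp 65001
 address-family l2vpn evpn
  neighbor DC2-BORDER route-map INTER-DC-OUT out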


Extending/generalizing the provisioning for the non-VxLAN use case:

In the case of EVPN for VxLAN, a VLAN is mapped to a VxLAN (VNI) by the
operator, and whether there is a single broadcast domain per EVI or multiple
broadcast domains per EVI, the VNI is sufficient to identify the bridge
table, as per section 5.1.2 of
https://tools.ietf.org/html/draft-ietf-bess-evpn-overlay. This does lend
itself to a rather simplified configuration for VxLAN that would be a big
advantage to retain.

Whether EVPN should be used for VNIs or not (i.e., "advertise-all-vni"
under BGP L2VPN/EVPN address-family configuration in my PR) should move to
the entity (i.e., zebra) which creates/handles EVIs.

The term "vni" is specific to VxLAN and cannot be used for other EVPN
encapsulations. Our preference is for "evi", but it is up to the community to
decide whether "evi", "vsi" or something else is the most appropriate.

For VxLAN, it is convenient to refer to the EVI (Ethernet Virtual Instance)
by its VNI for the common case; for other cases, there is no such
well-known identifier and the EVI is likely to be identified by name (just
like an L3 VRF).

The proposed commands are as follows. These are initial thoughts subject to
more refinement - partly because the Linux kernel does not currently have a
forwarding model for L2oMPLS.

l2vpn evpn advertise-vni <all | list of VNIs>
-- The handler of this command will be "zebra" and it is in lieu of the
"advertise-all-vni" command as stated above.
-- This only applies if using EVPN for VxLAN

l2vpn evpn evi <name>
  encapsulation <vxlan | mpls>
  bridge-table <table | bridge-name>
  <any MPLS/label allocation parameters - if encap is mpls>
  <any VxLAN parameters - if encap is vxlan>
-- The above syntax/commands will be used to create EVIs for MPLS, and if
needed, for VxLAN.
-- The handler of these commands will be "zebra"

router bgp <as>
  l2vpn evpn { vni <vni> | evi <name> }
    rd <rd>
    route-target <import | export | both> <rt>
-- The above syntax/commands will be used to define the RD/RT parameters
for a VNI/EVI if the auto-derivation is not desired.
-- The handler of the above will clearly be "bgpd"
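
As a purely illustrative instantiation of the proposed syntax (all names and
values are hypothetical, and the commands themselves are still subject to the
refinement noted above), an MPLS-encapsulated EVI might be provisioned as:

l2vpn evpn evi blue
  encapsulation mpls
  bridge-table br-blue

router bgp 65001
  l2vpn evpn evi blue
    rd 192.0.2.1:100
    route-target both 65001:100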

The L3 VNI configuration - which is against a L3 VRF - is as proposed
earlier.

The "advertise-default-gateway" configuration for asymmetric routing can be
modified based on the final consensus on the above.


Vivek



On Tue, Jun 20, 2017 at 9:44 AM, Lou Berger <lberger at labn.net> wrote:

> see below.
>
>
> On 6/20/2017 11:42 AM, Philippe Guibert wrote:
> > Hi Lou, Vivek,
> >
> > inline my comments,
> >
> >
> > On Mon, Jun 19, 2017 at 4:51 PM, Lou Berger <lberger at labn.net> wrote:
> >>
> >> On 6/17/2017 1:25 AM, Vivek Venkatraman wrote:
> >>> Hi Lou and Philippe,
> >>>
> >>> Besides the provisioning angle, how these entities map to current data
> >>> structures and code flow also is a consideration, IMO. Everything
> >>> about the current VRF ('struct bgp' or 'struct zebra_vrf' or soon to
> >>> be introduced support for OSPF) pertains to a Layer-3 routing
> >>> instance. I'm not sure it is a good idea to either morph that
> >>> construct into a Layer-2 (bridging) instance too or try to envelope a
> >>> Layer-2 instance and a Layer-3 instance under something else (like
> >>> "network-instance").
> >> What are your thoughts on when we get to bridge+router capability?
> >> Allowing for this is part of what drove my comments.
> >>
> >> I have a fair bit of this in a prototype using rfapi and openflow, but
> >> not quete working or in sharable form.  (Just need to find a few days to
> >> code!)
> >>
> >>> My line of thinking after discussion with a few of my colleagues is as
> >>> follows:
> >>>
> >>> 1. We should keep Layer-3 instance and Layer-2 instance configuration
> >>> distinct, though there may be some common parameters that both have.
> >> Again, we can't forget about the real case of bridge routers and cause
> >> us problems down the road.
> > I am not sure about that.
> > Right now, I don't have the exact use case that shows that. I will
> > give more information later.
> > What I mean is that there may be some instances that can get both L3
> > and L2 information.
> > As far as I know, this is the case for RT2 entries with two labels inside.
> from the bgp distribution side, there is both a (L2VPN) mac and (L3VPN)
> CE-advertised/PE-learned route to distribute in this case.
>
> >
> >>> 2. While internal semantics indicate some parameters are "CE related"
> >>> and other parameters are "PE related", there are instances where it
> >>> can be argued either way. In any case, it would be easier for the
> >>> user/operator to have the configuration for one entity in one place,
> >>> as much as possible.
> >> I think your distinction is customer-facing (within VRF/VSI) and
> >> provider-facing (within core/provider transport), right?
> >>
> >>> 3. The reference to "VNI" directly and some aspects of the
> >>> configuration syntax in my PR make it too specific to EVPN for VxLAN.
> >>> This should be made more generic.
> >> Does VSI work for you?
> > Does VSI cover MPLS labels?
> I think of VSI = VRF for L2.  So labels may or may not be used based on config.
>
> Lou
>
> >
> >>> 4. There are certain aspects though about EVPN for VxLAN that allow
> >>> for auto-creation and auto-derivation for which allowance has to be
> >>> made. For e.g., I'm sure an operator wouldn't want to configure a
> >>> named-instance for each VNI on the system when the VNI itself can be
> >>> the key and can be learnt by the routing control plane from the kernel.
> >>>
> >>> With these in mind, I'll discuss a proposal with my colleagues before
> >>> bringing it here for further discussion next week.
> >>>
> >> Excellent!
> > ++1
> >
> > Philippe
> >
> >
> >
> >>>
> >>> On Thu, Jun 15, 2017 at 6:53 AM, Lou Berger <lberger at labn.net> wrote:
> >>>
> >>>     Philippe,
> >>>
> >>>     I agree with your analysis.  One additional point below.
> >>>
> >>>
> >>>     On 6/15/2017 9:18 AM, Philippe Guibert wrote:
> >>>     > Hi Vivek,
> >>>     >
> >>>     > I just saw Lou's reply message.
> >>>     > Initial agreement was using vrf-policy under bgp node.
> >>>     > If I understand correctly, there is kind of redundancy with EVPN
> >>>     > address-family vni configuration command used on CE side.
> >>>     >
> >>>     > Your remark is very interesting. I think it is worth looking at
> >>>     how to
> >>>     > fuse both configuration ways.
> >>>     > Please find below some remarks on what could be done.
> >>>     >
> >>>     >> In addition, there are commands to configure the RD and RTs if
> >>>     >> auto-derivation is not desired - for e.g., peering with
> >>>     third-party BGP
> >>>     >> system. The syntax for this is shown through an example
> >>>     configuration below
> >>>     >> (which also shows steps #1 and #2).
> >>>     >>
> >>>     >> router bgp 65001
> >>>     >>  address-family l2vpn evpn
> >>>     >>   neighbor SPINE activate
> >>>     >>   vni 10100
> >>>     >>    rd 1:10100
> >>>     >>    route-target import 1:10100
> >>>     >>   exit-vni
> >>>     >>   advertise-all-vni
> >>>     >>  exit-address-family
> >>>     >>
> >>>     >> I see it as being very useful to have all the configuration
> >>>     relevant to a
> >>>     >> VRF (or VNI) in one place.
> >>>     > Thanks for pointing that out. I agree with you.
> >>>     >
> >>>     >> One topic of discussion is regarding this optional VNI
> >>>     configuration.
> >>>     >> Instead of the keyword being "vni", should it be "vni-policy"
> >>>     to match with
> >>>     >> "vrf-policy"?
> >>>     > If we want to have a common vty node to enter the information, I
> >>>     would
> >>>     > like to draw your attention to the following:
> >>>     > - This vty node should be used for not only EVPN, but also VPNVx
> >>>     > address families.
> >>>     > The vni value used should be an attribute of that VPN object (since
> >>>     > vxlan does not apply to VPNVx afi/safi).
> >>>     > This vty node should be moved from evpn address-family to bgp node.
> >>>     >
> >>>     > - This vty node stands for a VRF that should be used.
> >>>     > It is true that in comparison to the route-map concept, the
> >>>     > "vrf-policy" wording could be improved.
> >>>     > But we have to distinguish the vtynode from VRF node that is used
> >>>     > outside of BGP node.
> >>>     >
> >>>     > Based on the vrf-policy wording, the subnode vty commands should
> >>>     > have the vni keyword added.
> >>>     This would also work nicely for the future case where the VNI is a
> >>>     bridge/router. It would just fall out by specifying a valid vrf name
> >>>     in the vrf-policy.  Although this does lead to the slightly ugly case
> >>>     of needing to rename the 'policy' when associating a running VNI/VSI
> >>>     (bridge) with a VRF.  Perhaps it makes sense to uncouple the binding
> >>>     of the "policy" from the VRF name and VSI/VNI ID.  e.g.,
> >>>
> >>>
> >>>      network-instance <node-name>
> >>>        !when associated with a named BGP VRF
> >>>         vrf <vrf-name>
> >>>        !when associated with a vni
> >>>         vni <vni-id>
> >>>         rd <value>
> >>>         rt (import|export|both) <value> [<value-list>]
> >>>         label <value>
> >>>         route-map <mapname>
> >>>         tunnel advertisement-method <encap-attribute|evpn>
> >>>         tunnel type (none|l2tpv3overip|gre|ipinip|vxlan|mpls|mplsovergre)
> >>>
> >>>
> >>>     The name network-instance comes from
> >>>         https://tools.ietf.org/html/draft-ietf-rtgwg-ni-model
> >>>
> >>>     An alternative is to not add vrf/vni above and do something like
> >>>      router bgp XXX vrf <vrf-name>*
> >>>        network-instance <node-name>
> >>>
> >>>     and under a new bgp vsi (I like the more generic name VSI over
> >>>     VNI) node
> >>>       vsi <vni-id>
> >>>        network-instance <node-name>
> >>>
> >>>     but then it's up to the config reader to notice when something is
> >>>     a bridge and/or router instance.  (So I prefer the first.)
> >>>
> >>>
> >>>     > Vivek, those changes also have a wider impact, I mean, at least
> >>>     > internally in the BGP daemon.
> >>>     > I list some of the changes that may be done, if we merge both vty
> >>>     > nodes into a single one.
> >>>     >
> >>>     > - The VRF configuration calls VNC code, while VNI configuration
> >>>     > calls bgp_evpn_vty.c code.
> >>>     > This should be put in a separate file. A registration mechanism
> >>>     > with EVPN and VNC could apply.
> >>>     Yes this would need to change.
> >>>
> >>>     > - Some other attributes of EVPN (RD and RT auto-derivation) could
> >>>     > be configurable within that new VRF instance.
> >>>     >
> >>>     > - Also, regarding the advertise-all-vni command, does that mean
> >>>     > that such VNI (VRF objects) should be instantiated too?
> >>>     > I mean, I am sorry, I did not attend the specific EVPN meeting
> >>>     > you led a few weeks ago. I know Lou was there.
> >>>
> >>>     > Perhaps you talked about the way to exchange VNI information
> >>>     > between EVPN and VNC?
> >>>     We did, but all future (non-blocking) stuff.  The sole blocking issue
> >>>     from my perspective is resolving this discussion.
> >>>
> >>>     Lou
> >>>
> >>>     > Regards,
> >>>     >
> >>>     > Philippe
> >>>     >
> >>>     >
> >>>     > [0]
> >>>     https://docs.google.com/document/d/1w_ie2tNXCgn0N3ZNFGYTK6lJkwMmk_XN5yz33MMNNqM/edit#
> >>>     >
> >>>     >
> >>>     >
> >>>     >
> >>>     >> I don't think "vni-policy", or for that matter, "vrf-policy" is
> >>>     >> the best choice, due to the following two main reasons:
> >>>     >>
> >>>     >> 1. A "policy" is a fairly familiar construct in routing
> >>>     parlance. It
> >>>     >> commonly refers to a set of rules or definitions that are
> >>>     generically
> >>>     >> specified and can then be applied to different "attach points".
> >>>     In FRR, a
> >>>     >> "route-map" would be a good example of such a policy. It may be
> >>>     misleading
> >>>     >> to call the specific configuration for a VNI or VRF as "policy",
> >>>     >> particularly when the VNI/VRF may later support import/export
> >>>     policies
> >>>     >> (route-maps).
> >>>     >>
> >>>     >> 2. The configuration syntax that emerged for VRFs after the
> >>>     last round of
> >>>     >> discussions separates out the "CE side" configuration (e.g., CE
> >>>     neighbors,
> >>>     >> redistribution etc.) from the "PE side" configuration (e.g., RD
> >>>     and RT
> >>>     >> configuration). From the user/operator's perspective, I do not
> >>>     see any value
> >>>     >> add in this separation, only a potential source of confusion
> >>>     since the same
> >>>     >> entity (VRF) needs to be configured in multiple places. It can
> >>>     also be
> >>>     >> debatable where a configuration lies. For e.g., should
> >>>     "vrf-import-policy"
> >>>     >> reside on the "PE side" as it deals with received L3VPN routes
> >>>     or on the "CE
> >>>     >> side" as it decides which routes to import into which VRF table?
> >>>     >>
> >>>     >> I see it as being very useful to have all the configuration
> >>>     relevant to a
> >>>     >> VRF (or VNI) in one place.
> >>>     >>
> >>>     >> The purpose of this email is to solicit wider feedback since
> >>>     only a few
> >>>     >> people participated in the earlier discussion and their
> >>>     positions are very
> >>>     >> likely unchanged. My suggestion would be to have the initial
> >>>     deliberations
> >>>     >> on the list and if that does not converge or indicates the need
> >>>     for a
> >>>     >> meeting, the maintainers will call one.
> >>>     >>
> >>>     >> Based on the consensus that emerges, I shall update my PR
> >>>     and/or introduce
> >>>     >> modifications in a subsequent PR - if needed.
> >>>     >>
> >>>     >> Vivek
> >>>     >>
> >>>     >>
> >>>     >> _______________________________________________
> >>>     >> dev mailing list
> >>>     >> dev at lists.frrouting.org
> >>>     >> https://lists.frrouting.org/listinfo/dev
> >>>     >>
> >>>
> >>>
>
>