[dev] EVPN (and L3VPN) configuration

Tue Jul 18 11:11:47 EDT 2017

Hi Vivek, Lou, all,

Thanks, Vivek, for taking the time to respond to my message.
I have tried to summarise and extract the discussion here, and to translate
it into vty configuration terms.
We can go over this document when you are available, ideally by the end of
July (next week, for instance).

I hope this will help.
https://docs.google.com/spreadsheets/d/1t608z3bIMZpHb4Juspp7FN2Ks-4nU_qMvIv4JIzje_Y/edit#gid=0

Please check that you have the necessary access rights to modify it.
Also, I plan to refine the document a bit more over the next few days.
Feel free to do the same.

More comments on the vty below ([Philippe2]).
Thanks,

Philippe

>Hi Vivek,
>
>Thanks for taking the time to respond while on vacation.
>
>In yesterday's meeting there was a request to summarize the various
>positions in this discussion (to ensure everyone understands the issue
>being discussed). When you can, could you summarize your proposed config
>syntax for the l3vpn, l2vpn and l2+l3vpn cases?
>
>Thanks
>Lou
>
>On July 12, 2017 6:05:04 AM Vivek Venkatraman <vivek at cumulusnetworks.com> wrote:
>> Hi Philippe,
>>
>> I am currently on vacation in India, hence the delay in responding to
>> your mail. Thank you for the in-depth review; please see my responses
>> inline. (For another couple of weeks, further responses from me are likely
>> to be delayed.)
>>
>>
>> On Mon, Jul 3, 2017 at 8:58 PM, Philippe Guibert <philippe.guibert at 6wind.com> wrote:
>>
>>     Hi Vivek,
>>
>>     The note you wrote is very interesting; it gathers a lot of very
>>     relevant information.
>>     You described how the symmetric and asymmetric cases of the draft
>>     work in FRRouting.
>>     In addition, you illustrated the new vty commands.
>>
>>     So, I have some comments on both points.
>>     - On the first point:
>>     Why do you restrict RT2 advertisements to the L2 label only?
>>
>>
>> A single label (or VNI) is all that is needed for EVPN-for-L2 (where the
>> gateway is a different device) or even when supporting routing/gateway
>> functionality (L2+L3) when operating in asymmetric mode. The second label
>> (or VNI) is needed only for routing/gateway functionality when operating
>> in symmetric mode.
>>
>> The implementation does not envision any restriction on a second label
>> (VNI). Rather, the plan is to support both modes of routing. For the
>> symmetric mode, in the case of VxLAN, the second label (VNI) is the "L3
>> VNI" and will be available through the configuration proposed in this
>> mail.
>>

[Philippe2] OK

>>     Also, I would like FRR not to be restricted to VNIs, since the draft
>>     theoretically supports network overlays other than VXLAN (NVGRE, MPLS).
>>
>>
>> I agree, which is why I proposed some configuration/syntax for MPLS
>> here. Note that the Linux kernel doesn't yet have a good model for
>> L2oMPLS.
>>

[Philippe2] An L3 VRF and L2 VRF model, while also handling MPLS or VXLAN
independently of the layer.

>>
>>
>>     - On the second point:
>>     I agree on the advertise-gateway issue.
>>     I am not totally convinced by enhancing "l2vpn evpn <l3vni>".
>>     I am thinking more of a generic MAC-VRF or IP-VRF context where we
>>     configure RD, RT, VNI, etc. In the vrf-policy case, MAC-VRF and IP-VRF
>>     would share the same vty node; only the layer command would distinguish
>>     them (layer 3 versus layer 2). I need to provide a more elaborate
>>     example for the RT2/RT5 case.
>>
>>
>> The "EVI" syntax I provided below was for the MAC-VRF. Given that an IP
>> VRF and a MAC VRF will be rather different (the former deals with routes
>> and next hops and will/can have OSPF/BGP neighbors etc., the latter deals
>> with MACs and possibly, ARP suppression), I feel the two should be kept
>> separate.

[Philippe2]
I think you are specifically talking about CE configuration, whereas I was
talking about configuring the PE (with vrf-policy).
Maybe the CE configuration can be kept as is (this will probably be
discussed through the spreadsheet).
In a previous mail, you compared vrf-policy to a kind of (route-map)
policy. If that is your view, then I think we agree on that topic: the
policy will apply to either a MAC-VRF or an IP-VRF.
Those VRFs will be separate, but the same vrf-policy node will be used for
both cases.

>> Also, since EVPN-for-VxLAN provides a simple way of "auto creating" the
>> MAC VRFs (the VNI is fundamentally the VRF delimiter), we should ensure
>> the operator is not forced into a lot of unnecessary configuration when 1
>> or 2 commands would do.
>>

[Philippe2]
I agree that auto-creation is enabled from the zebra side.
But I think this option could also be enabled in BGP in two places: the CE
side (the one you proposed) and vrf-policy.
For PE configuration, I would allow a vrf-policy to be configured with a
kind of "auto create" mode.

>> The CLI/UI I proposed in my mail was based on the above two principles.
>>
>>
>>
>>     More comments below [Philippe]
>>
>>
>>
>>     On Mon, Jun 26, 2017 at 5:59 AM, Vivek Venkatraman <vivek at cumulusnetworks.com> wrote:
>>
>>         Hi Lou, Philippe, All,
>>
>>         The PR that I submitted already addresses, for the most part,
>>         inter-subnet routing (i.e., the bridge+router scenario) when
>>         employing asymmetric routing
>>         (https://tools.ietf.org/html/draft-ietf-bess-evpn-inter-subnet-forwarding
>>         section 4). [I say "for the most part" because some additional
>>         changes are needed for advertisement of the gateway MACIP in the
>>         case of a centralized gateway, and a few other things.] Changes are
>>         of course needed for symmetric routing (section 5 of the
>>         aforementioned draft). I'll describe both of these below.
>>
>>         At the end, I'll propose some thoughts on extending this to
>>         other EVPN encapsulations (specifically MPLS) to support
>>         traditional VPLS.
>>
>>         Asymmetric routing:
>>
>>         Here, we're dealing only with host routes, and the ingress
>>         VTEP/NVE will route to the virtual subnet where the destination
>>         is, so that the egress VTEP/NVE only does bridging. In VxLAN
>>         terms, we're only dealing with L2 VNIs, which need to be
>>         provisioned on all VTEPs. MACs are learnt against a VLAN through
>>         kernel notifications, mapped to a VxLAN/VNI and advertised.
>>         Likewise, neighbor entries (ARP/ND) are learnt on an SVI by
>>         listening to kernel notifications; the mapping to the VxLAN/VNI
>>         is straightforward and MACIP routes are originated using this L2
>>         VNI. The logic on the receive side is straightforward too. The
>>         received RTs map to the VNI, which maps to the VLAN. MAC routes
>>         would be installed into the FDB, while MACIP routes would result
>>         in neighbor entries being created on the SVI (corresponding to
>>         the VLAN).
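The receive-side logic just described (received RTs map to an L2 VNI, the VNI maps to a VLAN, MAC routes go into the FDB and MACIP routes become neighbor entries on the SVI) can be modelled roughly as below. This is a minimal illustrative sketch, not zebra or bgpd code: the data classes, the receive_type2 helper and the "vlan<N>" SVI naming are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass
class Type2Route:
    mac: str
    route_targets: FrozenSet[str]
    ip: Optional[str] = None  # None means a MAC-only type-2 route


@dataclass
class L2Vni:
    vni: int
    vlan: int
    import_rts: FrozenSet[str]


@dataclass
class Tables:
    # (vlan, mac) -> remote VTEP IP, standing in for the bridge FDB
    fdb: Dict[Tuple[int, str], str] = field(default_factory=dict)
    # (svi name, ip) -> mac, standing in for neighbor (ARP/ND) entries
    neighbors: Dict[Tuple[str, str], str] = field(default_factory=dict)


def receive_type2(route: Type2Route, remote_vtep: str,
                  vnis: List[L2Vni], tables: Tables) -> None:
    """Asymmetric mode: received RTs -> L2 VNI -> VLAN, then install."""
    for l2vni in vnis:
        if not route.route_targets & l2vni.import_rts:
            continue  # RTs do not match this VNI
        if route.ip is None:
            # MAC-only route: install into the FDB for the VNI's VLAN
            tables.fdb[(l2vni.vlan, route.mac)] = remote_vtep
        else:
            # MACIP route: create a neighbor entry on the VLAN's SVI
            svi = f"vlan{l2vni.vlan}"  # hypothetical SVI naming scheme
            tables.neighbors[(svi, route.ip)] = route.mac
```

A route whose RTs match no local VNI is simply ignored here; real implementations also track sequence numbers, mobility and deletions, none of which is modelled.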
>>
>>         The above functionality is all present in the PR submitted.
>>         While the target of the PR was just EVPN for L2 with ARP
>>         suppression, it can accomplish routing too. Note that ARP
>>         suppression requires some additional functionality in the Linux
>>         kernel, which Cumulus Networks is working to get into the
>>         upstream kernel.
>>
>>         The only additional provisioning we had planned to introduce was
>>         whether to advertise our SVI MAC or not - needed only on gateway
>>         devices. This was to be under "address-family l2vpn evpn":
>>
>>         router bgp <as>
>>           address-family l2vpn evpn
>>             advertise-default-gateway
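As a rough illustration of what the advertise-default-gateway knob is meant to control, the sketch below assumes a gateway device that, for one VNI, knows its locally learnt host (IP, MAC) neighbors plus its own SVI MAC and IPs. The function name and its inputs are hypothetical; it only shows the SVI's own MACIP entries being added to the advertised set when the flag is enabled.

```python
from typing import List, Sequence, Tuple

MacIp = Tuple[str, str]  # (ip, mac)


def macip_to_advertise(host_neighbors: Sequence[MacIp],
                       svi_mac: str,
                       svi_ips: Sequence[str],
                       advertise_default_gateway: bool) -> List[MacIp]:
    """Return the MACIP entries a VTEP would originate for one VNI.

    Hypothetical model: host neighbors are always advertised; the
    gateway's own SVI MAC/IPs only when advertise-default-gateway is set,
    which is only needed on (centralized) gateway devices.
    """
    routes = list(host_neighbors)
    if advertise_default_gateway:
        routes.extend((ip, svi_mac) for ip in svi_ips)
    return routes
```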
>>
>>     [Philippe]
>>     Does it pick up the default gateway MAC address of the local VNI
>>     endpoint? This seems OK for local VNI endpoints.
>>
>>
>> Correct. Plus, this is needed only in the centralized gateway scenario,
>> not in a distributed gateway scenario (where every VTEP/NVE does L2+L3).
>>
>>
>>
>>
>>         However, I'll propose some changes to the provisioning at the
>>         end of this note.
>>
>>         Symmetric routing:
>>
>>         Clearly, this is more scalable and brings in the "inter-connect
>>         subnet" (L3 VNI). It also introduces the ability to do prefix
>>         routing with EVPN type-5 routes.
>>
>>
>>
>>
>>         The L3 VNI is a parameter per tenant - i.e., per L3 VRF. This is
>>         planned to be the only required/mandatory configuration on top of
>>         what my PR introduces. The tenant (L3 VRF) configuration already
>>         exists and the L3 VNI was going to be added to it. The RD and RTs
>>         (for the tenant) could be auto-derived from this L3 VNI, but could
>>         optionally be configured.
>>
>>         The planned configuration is/was:
>>
>>         router bgp <as> vrf <tenant VRF>
>>           <any existing configuration such as "redistribute connected" or "network">
>>           l2vpn evpn l3vni <vni>
>>           rd <RD>
>>           route-target <import | export | both> <RT>
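The "auto-derive unless configured" behaviour for the tenant RD and RTs can be sketched as follows. The derivation formats are assumptions made for illustration only: RTs are taken as "<AS>:<L3 VNI>" and the RD as "<router-id>:<local index>", a common convention but not something this thread specifies. TenantVrf and effective_rd_rt are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class TenantVrf:
    name: str
    l3vni: int
    rd: Optional[str] = None               # configured "rd <RD>", if any
    import_rts: Optional[Set[str]] = None  # configured import RTs, if any
    export_rts: Optional[Set[str]] = None  # configured export RTs, if any


def effective_rd_rt(vrf: TenantVrf, asn: int, router_id: str,
                    rd_index: int) -> Tuple[str, Set[str], Set[str]]:
    """Configured values win; otherwise derive from the L3 VNI.

    Assumed formats: RT = "<AS>:<L3 VNI>", RD = "<router-id>:<index>".
    """
    auto_rt = f"{asn}:{vrf.l3vni}"
    rd = vrf.rd if vrf.rd is not None else f"{router_id}:{rd_index}"
    imp = vrf.import_rts if vrf.import_rts is not None else {auto_rt}
    exp = vrf.export_rts if vrf.export_rts is not None else {auto_rt}
    return rd, imp, exp
```

Import and export RTs are overridden independently here, so configuring only "route-target import" leaves the export RT auto-derived; whether the real CLI should behave that way is a design choice still open in this thread.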
>>
>>
>>     [Philippe]
>>     I am not sure about the vty you propose.
>>     If I understand correctly, you propose using the l2vpn keyword
>>     directly under the router bgp node?
>>
>>     (config)# router bgp <> vrf <>
>>     (bgpd)# l2vpn evpn l3vni <vni>            <--- added command
>>     (bgpd)# rd <>                             <--- added command
>>     (bgpd)# route-target <>                   <--- added command
>>     (bgpd)# address-family l2vpn evpn
>>     (config-router-evpn)# vni <l2vni>
>>     (config-router-evpn)# ...
>>     (config-router-evpn)# exit-address-family
>>     (bgpd)# l2vpn evpn l3vni <l3vni>
>>
>>     If so:
>>     - The relationship between the MAC-VRF (l2vni) and the IP-VRF (the
>>       l2vpn evpn l3vni, configured with the RD) is established by the
>>       configuration, right?
>>
>>
>>
>> Yes, because the L3 VNI is an operator-configured entity. It is
>> theoretically possible to auto-generate it, though I don't think that is
>> well supported by the Linux kernel.
>>
>> Note that that line - "l2vpn evpn l3vni <vni>" - is the only "new"
>> command here. The RD and RT configuration is given to complete the
>> layer-3 configuration, but would also apply to L3VPN (subject to the
>> conclusion on "vrf-policy", as noted).
>>
>>
>>
>>         If the community decision is to configure the RD and RT configs
>>         as "vrf-policy" against the default VRF in BGP, the above will of
>>         course change.
>>
>>
>>
>>
>>         The way symmetric routing operates is as follows. There is no
>>         change to advertisement or reception of MAC-only type-2 routes;
>>         these will only contain the L2 VNI.
>>
>>
>>     [Philippe] Why restrict RT2 advertisements to the L2 VNI only?
>>     I should work out an example of how the vrf-policy configuration
>>     would permit sending RT2 with both labels.
>>
>>
>> I meant that for MAC-only routes, which don't have an IP address, only
>> the L2 VNI is relevant. If you have a use case where MAC-only routes also
>> need 2 labels (VNIs), can you explain it?

[Philippe2]
I would like to work out a configuration involving a second label.
This applies only to routing/gateway functionality when operating in
symmetric mode.

>>
>>
>>
>>
>>         For MACIP type-2 routes, when the neighbor (ARP/ND) is learnt by
>>         listening to a kernel notification, the SVI that the entry is
>>         learnt on will be part of the tenant's VRF, and that will provide
>>         the L3 VNI and L3 RTs. The RouterMAC extended community has to be
>>         added, and the MAC will be derived from the interface
>>         corresponding to the L3 VNI (the "inter-connect subnet"
>>         interface). On the receive side, if the route has 2 VNIs, the MAC
>>         and neighbor entry will be installed against the L2 VNI (if
>>         present locally) as before, while the IP host route will be
>>         processed and imported into any L3 VRFs (BGP's RIB) that match
>>         its RTs.
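The origination side of symmetric routing described above can be sketched as follows: a MACIP route always carries the L2 VNI, and only when the SVI belongs to a tenant VRF does it also get the L3 VNI as a second label, the L3 RTs and a RouterMAC extended community. This is a minimal model under assumptions; Svi, IpVrf, MacIpRoute and originate_macip are hypothetical and do not reflect bgpd's data structures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Svi:
    name: str
    l2vni: int
    l2_export_rts: Set[str]
    vrf: Optional[str] = None  # tenant VRF the SVI is assigned to, if any


@dataclass
class IpVrf:
    l3vni: int
    l3_export_rts: Set[str]
    router_mac: str  # MAC of the interface backing the L3 VNI


@dataclass
class MacIpRoute:
    mac: str
    ip: str
    labels: List[int]
    rts: Set[str]
    router_mac: Optional[str] = None  # RouterMAC extended community


def originate_macip(mac: str, ip: str, svi: Svi,
                    vrfs: Dict[str, IpVrf]) -> MacIpRoute:
    """Build a MACIP type-2 route for a neighbor learnt on `svi`."""
    route = MacIpRoute(mac, ip, [svi.l2vni], set(svi.l2_export_rts))
    vrf = vrfs.get(svi.vrf) if svi.vrf is not None else None
    if vrf is not None:
        # Symmetric mode: the SVI's tenant VRF supplies the second label
        # (L3 VNI), the L3 RTs and the RouterMAC extended community.
        route.labels.append(vrf.l3vni)
        route.rts |= vrf.l3_export_rts
        route.router_mac = vrf.router_mac
    return route
```

If the SVI is not in a tenant VRF (or the VRF has no L3 VNI configured), the route degrades to the single-label asymmetric form, which matches the point that the second label is driven purely by configuration.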
>>
>>
>>     [Philippe]
>>     Taking an extract from draft-ietf-bess-evpn-inter-subnet-forwarding:
>>     "
>>     While sending RT2 with L3VNI and L2VNI, you must ensure that the RTs
>>     refer to the MAC-VRF and IP-VRF (as per 5.1.1, control plane
>>     operation).
>>     "
>>     What if there is no matching L3 VRF locally? Do you drop the whole
>>     incoming entry?
>>     I think incoming RT2 messages should be checked against the RTs.
>>
>>
>> No, the RT2 wouldn't be dropped completely. My understanding is that the
>> RTs in the incoming RT2 must be matched against BOTH the MAC VRFs (VNIs
>> in the case of VxLAN) and the IP VRFs, and imported into either or both
>> as appropriate, IF the RT2 has 2 labels (VNIs).
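The import rule stated in this reply (match the RT2's RTs against both the MAC VRFs and the IP VRFs, consider IP VRFs only when a second label is present, and never drop the route just because no IP VRF matches) can be sketched as a small pure function. The function name and the name-to-RT-set maps are hypothetical simplifications.

```python
from typing import Dict, List, Set, Tuple


def import_targets(route_rts: Set[str], num_labels: int,
                   mac_vrfs: Dict[str, Set[str]],
                   ip_vrfs: Dict[str, Set[str]]) -> Tuple[List[str], List[str]]:
    """Return (MAC VRFs, IP VRFs) an incoming type-2 route is imported into.

    mac_vrfs / ip_vrfs map a VRF name to its import RTs. IP VRFs are only
    considered when the route carries a second label (L3 VNI). A route that
    matches no IP VRF is still imported into matching MAC VRFs, not dropped.
    """
    macs = sorted(n for n, rts in mac_vrfs.items() if route_rts & rts)
    ips: List[str] = []
    if num_labels == 2:
        ips = sorted(n for n, rts in ip_vrfs.items() if route_rts & rts)
    return macs, ips
```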
>>

[Philippe2] This can be discussed in a separate thread. For now, we want to
reach agreement on the vty.

>>
>>
>>
>>     For that, to differentiate the L3VNI from the L2VNI, I would add a
>>     per-"vrf-policy" attribute stating whether it is an IP-VRF or a
>>     MAC-VRF.
>>
>>     (vrf-policy)# layer layer_3 | layer_2
>>
>>     How would you do that filtering based on a CE configuration?
>>
>>
>> Did my response above answer this? If not, I need to understand the
>> question some more.
>>
>>

[Philippe2] The proposal is to add a configuration command that specifies
whether a VRF is a MAC-VRF or an IP-VRF.
As you noted, the configuration you proposed does not need it, but I
think this command would make the vrf-policy configuration clearer.

>>
>>
>>
>>
>>         There is some special handling required because the next hop is
>>         the remote VTEP/NVE, whose MAC should be set up as the received
>>         Router MAC.
>>
>>         What the above shows is that there isn't an explicit hierarchy
>>         of L2 VNIs (subnets) of a tenant to the tenant's L3 VRF...but it
>>         is present implicitly (the SVIs corresponding to those subnets
>>         will be assigned to the tenant's VRF).
>>
>>         For external routing, the plan is that by default, any routes in
>>         the L3 VRF (in BGP's RIB) will be advertised to EVPN peers as
>>         type-5 routes. The current thought is that this can be controlled
>>         using existing route-map constructs (TBD). Internal (i.e., EVPN)
>>         routes are already present in the L3 VRF (BGP's RIB), as mentioned
>>         above. Existing route-maps can be used to control how these are
>>         advertised externally - currently using VRF-lite BGP peerings, in
>>         future using L3VPN.
>>
>>         For inter-DC connectivity, EVPN single-hop or multi-hop peerings
>>         can be set up between the border EVPN routers in each DC. If
>>         some/all tenants do not need their L2 domain stretched across the
>>         DCs but only need L3 connectivity (i.e., subnets confined to one
>>         DC), only EVPN type-5 routes need to be exchanged on the inter-DC
>>         peering. The current plan is to implement an addition to
>>         route-map matching for that - "match evpn route-type <type>".
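To illustrate how a "match evpn route-type <type>" clause would restrict an inter-DC peering to type-5 routes, the sketch below uses a generic first-match evaluator with an implicit deny at the end, the usual route-map semantics. It is not FRR's route-map engine; the entry tuple format and both functions are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# One route-map entry: (action, EVPN route type to match, or None = any).
Entry = Tuple[str, Optional[int]]


def route_map_permits(route_type: int, entries: Sequence[Entry]) -> bool:
    """Evaluate entries in order; the first match decides."""
    for action, match_type in entries:
        if match_type is None or match_type == route_type:
            return action == "permit"
    return False  # no entry matched: implicit deny, as with route-maps


def filter_routes(route_types: Sequence[int],
                  entries: Sequence[Entry]) -> List[int]:
    """Keep only the routes (identified by EVPN route type) that pass."""
    return [t for t in route_types if route_map_permits(t, entries)]
```

With a single permit entry for type 5, every RT1/RT2/RT3 route falls through to the implicit deny; the alternative of denying only type 2 and permitting everything else needs an explicit catch-all permit entry.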
>>
>>
>>     [Philippe]
>>     Indeed, Route Type 5 can be used with or without Route Type 2.
>>     I understand you want to filter out Route Type 2 entries.
>>
>>     It is as if you want to pass only L3 VPN information.
>>     I would propose a route-map that matches L3 messages only (i.e., no
>>     RT1/RT2/RT3).
>>
>>
>> The above is an OPTIONAL configuration. If there is EVPN peering between
>> the DCs (and no other peering), by default, all routes would be
>> exchanged. In the scenario mentioned (and possibly others), there may be
>> a need to only exchange a particular type of EVPN route, in addition to
>> other filters (IP, AS-path etc. already exist; we are adding support for
>> MAC ACLs).
>>

[Philippe2] Agreed. So the configuration command would filter RT2, for
example.

>>
>>
>>     I have a subsidiary question.
>>     Suppose you have an MPLS-based framework and you want to use MPLS VPN
>>     to populate the L3VPN in BGP's RIB. Do you have a method to carry
>>     that L3 information in BGP MPLS VPN instead of using BGP EVPN RT5?
>>
>>
>> Yes, the way I envision it, there would be L3VPN peering (instead of
>> EVPN peering) outside of the DC. EVPN routes within the DC would get
>> installed in the VRF routing table, and L3VPN can pick these up and
>> advertise them (with any needed policy control). L3VPN routes from the
>> external side would again get installed in the VRF routing table, and
>> EVPN can pick these up and advertise them as RT5 within the DC. I haven't
>> worked out any details yet, though.
>>
[Philippe2] This can be discussed in a separate thread; I don't think it
impacts the vty.

>>
>>
>>
>>
>>         Extending/generalizing the provisioning for the non-VxLAN use case:
>>
>>
>>     [Philippe]
>>     As per draft-ietf-bess-evpn-inter-subnet-forwarding-03
>>     "The first BGP Extended Community identifies the tunnel
>>        type per section 4.5 of [TUNNEL-ENCAP]"
>>
>>     You may need an extra extended community (see RFC 5512) to define
>>     the desired encapsulation type: VXLAN or another encapsulation type.
>>
>>
>> The PR submitted already carries/exchanges the ENCAP extended community,
>> though it is filled as VxLAN. The config proposed in this mail can be
>> used to extend this to carry the desired encap.
>>

[Philippe2]
A proposal has been made in the spreadsheet.
Done for the PE; still to be done for the CE.

>>
>>
>>
>>
>>
>>         In the case of EVPN for VxLAN, a VLAN is mapped to a VxLAN (VNI)
>>         by the operator, and whether it is a single broadcast domain per
>>         EVI or multiple broadcast domains per VNI, the VNI is sufficient
>>         to identify the bridge table as per section 5.1.2 of
>>         https://tools.ietf.org/html/draft-ietf-bess-evpn-overlay. This
>>         does lend itself to a rather simplified configuration for VxLAN
>>         that would be a big advantage to retain.
>>
>>         Whether EVPN should be used for VNIs or not (i.e.,
>>         "advertise-all-vni" under the BGP L2VPN/EVPN address-family
>>         configuration in my PR) should move to the entity (i.e., zebra)
>>         that creates/handles EVIs.
>>
>>
>>     [Philippe]
>>     I understand you want a similar command in zebra.
>>     Nonetheless, I think bgpd should keep it too (for RT
>>     auto-derivation, but also to control zebra events).
>>
>>
>>
>>         The term "vni" is specific to VxLAN and cannot be used for other
>>         EVPN encapsulations. Our preference is for "evi", but it is up to
>>         the community to decide whether "evi", "vsi" or something else is
>>         the most appropriate.
>>
>>
>>
>>
>>         For VxLAN, it is convenient to refer to the EVI (EVPN Instance)
>>         by its VNI for the common case; for other cases, there is no such
>>         well-known identifier and the EVI is likely to be identified by
>>         name (just like an L3 VRF).
>>
>>
>>     [Philippe]
>>     As per draft-ietf-bess-evpn-inter-subnet-forwarding-03, 5.1.1
>>        - Label-1 = MPLS Label or VNID corresponding to MAC-VRF
>>        - Label-2 = MPLS Label or VNID corresponding to IP-VRF
>>
>>     It seems a VNI can apply to an IP-VRF too.
>>     I would propose adopting the draft's definition:
>>
>>     "Label" = "MPLS Label or VNID"
>>
>>
>> Hmm...are you saying to use "label" instead of "vni" in the
>> configuration commands?
>>

[Philippe2] I am less affirmative than before.
In vrf-policy mode, there is already a label keyword.
However, in global configuration mode, an additional label should be
configurable add-vrf

>>
>>
>>
>>
>>         The proposed commands are as follows. These are initial thoughts
>>         subject to more refinement - partly because the Linux kernel does
>>         not currently have a forwarding model for L2oMPLS.
>>
>>         l2vpn evpn advertise-vni <all | list of VNIs>
>>         -- The handler of this command will be "zebra" and it is in lieu
>>            of the "advertise-all-vni" command as stated above.
>>         -- This only applies if using EVPN for VxLAN
>>
>>         l2vpn evpn evi <name>
>>           encapsulation <vxlan | mpls>
>>           bridge-table <table | bridge-name>
>>           <any MPLS/label allocation parameters - if encap is mpls>
>>           <any VxLAN parameters - if encap is vxlan>
>>         -- The above syntax/commands will be used to create EVIs for
>>            MPLS, and if needed, for VxLAN.
>>         -- The handler of these commands will be "zebra"
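A possible in-memory model behind the proposed "l2vpn evpn evi <name>" block is sketched below: each EVI records its encapsulation and bridge table, and is keyed by its VNI when the encapsulation is VxLAN and by its name otherwise, as discussed above. The Evi class, its validation rules and the key property are hypothetical illustrations, not zebra structures.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Evi:
    name: str
    encapsulation: str          # "vxlan" or "mpls", as in the proposed syntax
    bridge_table: str
    vni: Optional[int] = None   # only meaningful for VxLAN

    def __post_init__(self) -> None:
        if self.encapsulation not in ("vxlan", "mpls"):
            raise ValueError(f"unsupported encapsulation: {self.encapsulation}")
        if self.encapsulation == "vxlan" and self.vni is None:
            raise ValueError("a VxLAN EVI needs a VNI")
        if self.encapsulation == "mpls" and self.vni is not None:
            raise ValueError("an MPLS EVI has no VNI")

    @property
    def key(self) -> Tuple[str, Union[int, str, None]]:
        """VxLAN EVIs are referred to by VNI, other EVIs by name."""
        if self.encapsulation == "vxlan":
            return ("vni", self.vni)
        return ("evi", self.name)
```

Keying by VNI for VxLAN keeps the simple "one or two commands" auto-creation path, while named EVIs leave room for MPLS without a well-known identifier.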
>>
>>
>>     [Philippe]
>>     Is bgpd the only daemon interested in getting the VNI information?
>>

[Philippe2]
Yes, zebra is the daemon that gathers that information.
I omitted it.

>>
>> No, zebra continues to be the entity interacting with the kernel, both
>> for learning all the L2 info (bridges, bridge ports, VLAN-VNI mappings,
>> MACs etc.) and neighbors, as well as for installing into the kernel.
>>
>> We have some nascent thoughts on splitting/reorganizing zebra further,
>> but nothing is planned in the near term, and it will certainly be
>> discussed in detail before anything is attempted.
>>
>>
>>
>>     Also, the current version of FRR deliberately gives EVPN access to
>>     VNIs only.
>>     That implies that the Ethernet NVO tunnel is neither MPLS nor NVGRE.
>>
>>     If yes, then there is no need to keep advertise-vni in bgpd.
>>     If no, then I would want to control the information on both sides.
>>
>>
>>         router bgp <as>
>>           l2vpn evpn { vni <vni> | evi <name> }
>>
>>     [Philippe]
>>     I have a configuration question: how would you emit RT2 with both
>>     the L2 and L3 labels?
>>     Could you please elaborate?
>>
>>
>> It is the presence of this configuration that will determine that RT2
>> should have a second label. In the case of VxLAN, the L3 VNI value would
>> be provided here; in the case of MPLS (or something else), the EVI would
>> have some appropriate configuration to generate this.
>>
>>
>>
>>
>>             rd <rd>
>>             route-target <import | export | both> <rt>
>>         -- The above syntax/commands will be used to define the RD/RT
>>            parameters for a VNI/EVI if auto-derivation is not desired.
>>         -- The handler of the above will clearly be "bgpd"
>>
>>         The L3 VNI configuration - which is against an L3 VRF - is as
>>         proposed earlier.
>>
>>         The "advertise-default-gateway" configuration for asymmetric
>>         routing can be modified based on the final consensus on the above.
>>
>>
>>     [Philippe] For the CE use case, this command is OK with me.
>>
>>     Thanks,
>>
>>     Philippe