<div dir="ltr">Hi Vivek, Lou, all,<br><br>Thanks, Vivek, for taking the time to respond to the message.<br>I tried to summarise the discussion here and translate it into vty configuration.<br>We can look at this document together when you are available, if possible by the end of July (next week, for instance).<br><br>I hope this will help.<br><a target="_blank" href="https://docs.google.com/spreadsheets/d/1t608z3bIMZpHb4Juspp7FN2Ks-4nU_qMvIv4JIzje_Y/edit#gid=0">https://docs.google.com/<wbr>spreadsheets/d/<wbr>1t608z3bIMZpHb4Juspp7FN2Ks-<wbr>4nU_qMvIv4JIzje_Y/edit#gid=0</a><br><br>Please check that you have the access rights needed to modify it.<br>Also, I plan to refine the document a bit more over the next few days.<br>Feel free to do the same.<br><br>More comments on vty below ([Philippe2])<br>Thanks,<br><br>Philippe<br><br>>Hi Vivek,<br>><br>>Thanks for taking the time to respond while on vacation.<br>><br>>In
yesterday's meeting there was a request to summarize the various
positions on this discussion. (To ensure all understand the issue being
discussed.) As such, when you can, can you summarize your proposed
config syntax for l3vpn, l2vpn and l2+l3vpn cases?<br>><br>>Thanks<br>>Lou<br>><br>>On July 12, 2017 6:05:04 AM Vivek Venkatraman <<a target="_blank" href="mailto:vivek@cumulusnetworks.com">vivek@cumulusnetworks.com</a>> wrote:<br>>> Hi Philippe,<br>>><br>>>
I am currently on vacation in India, hence the delay in responding to
your mail. Thank you for an in-depth review, please see inline. (For
another couple of weeks, additional responses from me are likely to be
delayed).<br>>><br>>><br>>> On Mon, Jul 3, 2017 at 8:58 PM, Philippe Guibert <<a target="_blank" href="mailto:philippe.guibert@6wind.com">philippe.guibert@6wind.com</a>> wrote:<br>>><br>>> Hi Vivek,<br>>><br>>> The note you made is very interesting; it gathers a lot of very relevant information.<br>>> You described how the symmetric and asymmetric cases of the draft work for FRRouting. <br>>> In addition to presenting this, you illustrated the new vty commands.<br>>><br>>> So, I made some comments on both points.<br>>> - on the first point, <br>>> I would like to know why you restrict RT2 to carrying only an L2 label.<br>>><br>>><br>>>
A single label (or VNI) is all that is needed for EVPN-for-L2 (where
the gateway is a different device) or even when supporting
routing/gateway functionality (L2+L3) when operating in asymmetric mode.
The second label (or VNI) is needed only for routing/gateway
functionality when operating in symmetric mode.<br>>><br>>>
There is no restriction envisioned in the implementation on a second
label (VNI). Rather, the plan is to support both modes of routing. For
the symmetric mode, in the case of VxLAN, the second label (VNI) is the
"L3 VNI" and will be available through the proposed configuration (in
this mail).<br>>><br><br>[Philippe2] OK<br><br><br>>>
Also, I would like FRR not to be restricted to VNI, since the draft
theoretically supports network overlays other than VXLAN (NVGRE, MPLS). <br>>><br>>><br>>>
I agree, which is why I proposed some configuration/syntax for MPLS
here. Note that the Linux kernel doesn't yet have a good model for
L2oMPLS.<br>>><br><br>[Philippe2] L3 VRF and L2 VRF model, but also handling MPLS or VXLAN, independently of the layer.<br> <br>>><br>>><br>>> - on the second point, <br>>> I agree on advertise-gateway issue. <br>>> I am not totally convinced with enhancing l2vpn evpn <l3vni><br>>>
I think more of having a generic MAC-VRF or IP-VRF context where we
configure RD, RT, VNI, etc...In the vrf-policy case, MAC-VRF and IP-VRF
would have the same vty node. Only the layer command would distinguish them (
layer 3 versus layer 2). I need to provide a more elaborate example for the
RT2/RT5 case.<br>>><br>>><br>>> The "EVI" syntax I
provided below was for the MAC-VRF. Given that an IP VRF and a MAC VRF
will be rather different (the former deals with routes and next hops and
will/can have OSPF/BGP neighbors etc., the latter deals with MACs and
possibly, ARP suppression), I feel the two should be kept separate. <br><br>[Philippe2]<br>I think you are specifically talking about CE configuration, whereas I was discussing PE configuration (with vrf-policy).<br>Maybe the CE configuration can be kept as is (this will probably be discussed through the spreadsheet).<br>In
a previous mail, you were comparing vrf-policy to a kind of (route-map)
policy. If this is your feeling, then I think we agree on that topic. I
mean that the policy will apply to either MAC-VRF or IP-VRF.<br>Those VRFs will be separate, but the vrf-policy node (currently vrf-policy) will be used for both cases.<br><br><br>>>Also,
since EVPN-for-VxLAN provides a simple way of "auto creating" the MAC
VRFs (the VNI is fundamentally the VRF delimiter), we should ensure the
operator is not forced into a lot of unnecessary configuration when 1 or
2 commands would do.<br>>><br><br>[Philippe2]<br>I agree that auto-creation is enabled from the zebra side.<br>But I think this option could also be enabled in BGP in two places: the CE side (the one you proposed) and vrf-policy.<br>For PE configuration, I would allow configuring a vrf-policy with a kind of "auto create" mode.<br> <br><br><br><br>>> The CLI/UI I proposed in my mail was based on the above two principles.<br>>> <br>>><br>>><br>>> More comments below [Philippe]<br>>><br>>><br>>><br>>> On Mon, Jun 26, 2017 at 5:59 AM, Vivek Venkatraman <<a target="_blank" href="mailto:vivek@cumulusnetworks.com">vivek@cumulusnetworks.com</a>> wrote:<br>>><br>>> Hi Lou, Philippe, All,<br>>><br>>>
The PR that I submitted already addresses for the most part
inter-subnet routing (i.e., bridge+router scenario) if employing
asymmetric routing (<a target="_blank" href="https://tools.ietf.org/html/draft-ietf-bess-evpn-inter-subnet-forwarding">https://tools.ietf.org/html/<wbr>draft-ietf-bess-evpn-inter-<wbr>subnet-forwarding</a>
section 4). [I say "for the most part" because some additional changes
are needed for advertisement of gateway MACIP in the case of centralized
gateway and a few other things.] Changes are of course needed for
symmetric routing (section 5 of aforementioned draft). I'll describe
both of these below. <br>>><br>>> At the end, I'll
propose some thoughts on extending this for other EVPN encapsulation -
specifically MPLS - to support traditional VPLS.<br>>><br>>> Asymmetric routing:<br>>><br>>>
Here, we're dealing only with host routes and the ingress VTEP/NVE will
route to the virtual subnet where the destination is, so that the
egress VTEP/NVE only does bridging. In VxLAN terms, we're only dealing
with L2 VNIs which need to be provisioned on all VTEPs. MACs are learnt
against a VLAN through kernel notifications, mapped to a VxLAN/VNI and
advertised. Likewise, neighbor entries (ARP/ND) are learnt on an SVI by
listening to kernel notifications; the mapping to the VxLAN/VNI is
straightforward and MACIP routes are originated using this L2 VNI. The
logic on the receive side is straightforward too. The received RTs map
to the VNI which maps to the VLAN. MAC routes would be installed into
the FDB while MACIP routes would result in neighbor entries being
created on the SVI (corresponding to the VLAN).<br>>><br>>>
The above functionality is all present in the PR submitted. While the
target of the PR was just EVPN for L2 with ARP suppression, it can
accomplish routing too. Note that ARP suppression requires some
additional functionality in the Linux kernel which Cumulus Networks is
working to get into the upstream kernel.<br>>><br>>>
The only additional provisioning we had planned to introduce was
whether to advertise our SVI MAC or not - needed only on gateway
devices. This was to be under "address-family l2vpn evpn":<br>>><br>>> router bgp <as><br>>> address-family l2vpn evpn<br>>> advertise-default-gateway<br>>><br>>> [Philippe] <br>>> It picks up default gateway MAC address of the local VNI endpoint ?<br>>> This seems ok for local VNI endpoints.<br>>><br>>><br>>>
Correct. Plus, this is needed only in the centralized gateway scenario,
not in a distributed gateway scenario (where every VTEP/NVE does
L2+L3).<br>>> <br>>><br>>> <br>>><br>>> However, I'll propose some changes to the provisioning at the end of this note.<br>>><br>>> Symmetric routing:<br>>><br>>>
Clearly, this is more scalable and brings in the "inter-connect subnet"
(L3 VNI). It also introduces the ability to do prefix routing with EVPN
type-5 routes.<br>>><br>>><br>>> <br>>><br>>>
The L3 VNI is a parameter per tenant - i.e., per L3 VRF. This is
planned to be the only required/mandatory configuration on top of what
my PR introduces. The tenant (L3 VRF) configuration already exists<br>>><br>>><br>>> <br>>><br>>>
and the L3 VNI was going to be added to it. The RD and RTs (for the
tenant) could be auto-derived from this L3 VNI, but could optionally be
configured.<br>>><br>>> The planned configuration is/was:<br>>><br>>> router bgp <as> vrf <tenant VRF><br>>> <any existing configuration such as "redistribute connected" or "network"><br>>> l2vpn evpn l3vni <vni><br>>> rd <RD><br>>> route-target <import | export | both> <RT><br>>><br>>><br>>> [Philippe]<br>>> I am not sure about the vty you propose.<br>>> If I understand well, you propose to use l2vpn keyword directly under router bgp node ?<br>>><br>>> (config)# router bgp <> vrf <><br>>> (bgpd)# l2vpn evpn l3vni <vni> <--- added command<br>>> (bgpd)# rd <> <wbr> <--- added command<br>>> (bgpd)# route-target <> <wbr> <--- added command<br>>> (bgpd)# address-family l2vpn evpn<br>>> (config-router-evpn)# vni <l2vni><br>>> (config-router-evpn)# ...<br>>> (config-router-evpn)# exit-address-family<br>>> (config-router-evpn)# l2vpn evpn l3vni <l3vni><br>>><br>>> If this is it<br>>>
- The relationship between MAC-VRF ( l2vni) and IP-VRF ( the l2vpn
evpn l3vni configured by RD) is done by the configuration. Right ?<br>>> <br>>><br>>><br>>>
Yes, because the L3 VNI is an operator configured entity. It is
theoretically possible to auto-generate it, though I don't think that is
well supported by the Linux kernel.<br>>><br>>> Note that
that line - "l2vpn evpn l3vni <vni>" - is the only "new" command
here. The RD and RT configuration is given to complete the layer-3
configuration but would apply for L3VPN also (subject to conclusion on
"vrf-policy" as noted).<br>>> <br>>><br>>><br>>>
If the community decision is to configure the RD and RT configs as
"vrf-policy" against the default VRF in BGP, the above will of course
change.<br>>><br>>><br>>><br>>><br>>>
The way symmetric routing operates is as follows. There is no change to
advertisement or reception of MAC-only type-2 routes, these will only
contain the L2 VNI.<br>>><br>>><br>>> [Philippe] Why restrict to sending RT2 L2 VNI only ?<br>>>
I should elaborate an example on how vrf-policy configuration would be
done so as to permit sending RT2 with both labels.<br>>><br>>><br>>>
I meant for MAC-only routes which don't have an IP address, only the L2
VNI is relevant. If you have a use case where MAC-only routes also need
2 labels (VNIs), can you explain that?<br><br>[Philippe2]<br>I would like to elaborate a configuration involving a second label.<br>So this is only for routing/gateway functionality when operating in symmetric mode.<br><br>>> <br>>><br>>> <br>>><br>>>
For MACIP type-2 routes, when the neighbor (ARP/ND) is learnt by
listening to a kernel notification, the SVI that the entry is learnt on
will be part of the tenant's VRF and that will provide the L3 VNI and L3
RTs. The RouterMAC extended community has to be added and the MAC will
be derived from the interface corresponding to the L3 VNI (the
"inter-connect subnet" interface). On the receive side, if the route has
2 VNIs, the MAC and Neighbor entry will be installed against the L2 VNI
(if present locally) as before while the IP host route will be
processed and imported into any L3 VRFs (BGP's RIB) that match its RTs.<br>>><br>>> <br>>> [Philippe] <br>>> by taking an extract of draft-ietf-bess-evpn-inter-<wbr>subnet-forwarding<br>>> "<br>>>
While sending RT2 with L3VNI and L2VNI, you must ensure that RTs refer
to MAC-VRF and IP-VRF ( as per 5.1.1 control plane operation).<br>>> "<br>>> What if there is no L3 VRF Matching locally ? Do you drop the whole incoming entry ?<br>>> I think there should be a control on incoming RT2 messages, against RTs.<br>>><br>>><br>>>
No, the RT2 wouldn't be dropped completely. My understanding is that
the RTs in the incoming RT2 must be matched against BOTH the MAC VRFs
(VNIs in the case of VxLAN) and the IP VRFs, and imported into either or
both as appropriate, IF the RT2 has 2 labels (VNIs).<br>>><br><br>[Philippe2] This can be discussed in a separate thread. We want to get an agreement on vty.<br><br>>> <br>>><br>>><br>>>
For that, to differentiate L3VNI from L2VNI, I would add an attribute
per "vrf-policy" mentioning that this is an IP-VRF or a MAC-VRF.<br>>><br>>> (vrf-policy)# layer layer_3 | layer_2<br>>><br>>> How would you do that filtering based on a CE configuration ?<br>>><br>>><br>>> Did my response above answer this? If not, I need to understand the question some more.<br>>><br>>> <br><br>[Philippe2] The proposal is to add an additional configuration that specifies if a VRF is MAC-VRF or IP-VRF.<br>As per your remark, on the proposed configuration, you don't need it. But I think on vrf-policy, this command could clarify.<br><br>>><br>>><br>>> <br>>><br>>>
There is some special handling required because the next hop is the
remote VTEP/NVE whose MAC should be set up as the received Router MAC.<br>>><br>>>
What the above shows is that there isn't an explicit hierarchy of L2
VNIs (subnets) of a tenant to the tenant's L3 VRF...but it is present
implicitly (the SVIs corresponding to those subnets will be assigned to
the tenant's VRF).<br>>><br>>> For external routing,
the plan is that by default, any routes in the L3 VRF (in BGP's RIB)
will be advertised to EVPN peers as type-5 routes. The current thought
is that this can be controlled using existing route-map constructs
(TBD). Internal (i.e., EVPN) routes are already present in the L3 VRF
(BGP's RIB) as mentioned above. Existing route-maps can be used to
control how these are advertised externally - currently using VRF-lite
BGP peerings, in future using L3VPN.<br>>><br>>> For
inter-DC connectivity, EVPN single-hop or multi-hop peerings can be
setup between the border EVPN routers in each DC. If some/all tenants do
not need their L2 domain stretched across the DCs but only need L3
connectivity (i.e., subnets contained to one DC), only EVPN type-5
routes need to be exchanged on the inter-DC peering. The current plan is
to implement an addition to route-map matching for that - "match evpn
route-type <type>".<br>>><br>>><br>>> [Philippe] <br>>> Indeed Route Type 5 can be used with or without Route Type 2. <br>>> I understand you want to filter out Route Type 2 entries.<br>>><br>>> It is as if you want to filter only L3 VPN information.<br>>> I would propose a route-map that filters on L3 messages only (no RT1/RT2/RT3 indeed).<br>>><br>>><br>>>
The above is an OPTIONAL configuration. If there is EVPN peering
between the DCs (and no other peering), by default, all routes would be
exchanged. In the scenario mentioned (and possibly others), there may be
a need to only exchange a particular type of EVPN route, in addition to
other filters (IP, AS-path etc. already exist, we are adding support
for MAC ACLs).<br>>><br><br>[Philippe2] Agreed. So the configuration command would filter RT2, for example.<br> <br>>><br>>><br>>> I have a subsidiary question. <br>>>
Suppose you have a MPLS based framework, and you want to use MPLSVPN to
populate the L3VPN of BGP's RIB. Do you have a method to carry that L3
information in BGP MPLSVPN instead of using BGP EVPN RT5 ?<br>>><br>>><br>>>
Yes, the way I envision is that there would be L3VPN peering (instead
of EVPN peering) outside of the DC. EVPN routes within the DC would get
installed in the VRF routing table and L3VPN can pick these up and
advertise (with any needed policy control). L3VPN routes from the
external side would again get installed in the VRF routing table and
EVPN can pick these up and advertise as RT5 within the DC. I haven't
worked out any details yet though.<br>>> <br>[Philippe2] This can be discussed in a separate thread; I don't think it impacts vty.<br><br>>><br>>><br>>> <br>>><br>>> Extending/generalizing the provisioning for the non-VxLAN use case:<br>>><br>>> <br>>> [Philippe] <br>>> As per draft-ietf-bess-evpn-inter-<wbr>subnet-forwarding-03<br>>> "The first BGP Extended Community identifies the tunnel<br>>> type per section 4.5 of [TUNNEL-ENCAP]"<br>>><br>>>
You may need an extra extended community (see RFC 5512) to define the
desired encapsulation type: VXLAN or another encapsulation type.<br>>><br>>><br>>>
The PR submitted already carries/exchanges the ENCAP extended community
though it is filled as VxLAN. The proposed config in this mail can be
used to extend this to carry the desired encap.<br>>> <br><br>[Philippe2]<br>A proposal is made on the spreadsheet.<br>Done for PE, to be done for CE.<br><br>>><br>>><br>>> <br>>><br>>><br>>>
In the case of EVPN for VxLAN, a VLAN is mapped to a VxLAN (VNI) by the
operator and whether it is a single broadcast domain per EVI or
multiple broadcast domains per VNI, the VNI is sufficient to identify
the bridge table as per section 5.1.2 of <a target="_blank" href="https://tools.ietf.org/html/draft-ietf-bess-evpn-overlay">https://tools.ietf.org/html/<wbr>draft-ietf-bess-evpn-overlay</a>. This does lend itself to a rather simplified configuration for VxLAN that would be a big advantage to retain.<br>>><br>>>
Whether EVPN should be used for VNIs or not (i.e., "advertise-all-vni"
under BGP L2VPN/EVPN address-family configuration in my PR) should move
to the entity (i.e., zebra) which creates/handles EVIs.<br>>><br>>><br>>> [Philippe] <br>>> I understand you want to have similar command to zebra.<br>>> Nonetheless, I think bgp should keep it too ( for RT auto derivation, but also to control zebra events)<br>>> <br>>><br>>><br>>>
The term "vni" is specific to VxLAN and cannot be used for other EVPN.
Our preference is for "evi" but it is up to the community to decide
whether "evi", "vsi" or something else is the most appropriate.<br>>><br>>> <br>>><br>>><br>>>
For VxLAN, it is convenient to refer to the EVI (Ethernet Virtual
Instance) by its VNI for the common case; for other cases, there is no
such well-known identifier and the EVI is likely to be identified by
name (just like a L3 VRF).<br>>><br>>><br>>> [Philippe] <br>>> As per draft-ietf-bess-evpn-inter-<wbr>subnet-forwarding-03, 5.1.1<br>>> - Label-1 = MPLS Label or VNID corresponding to MAC-VRF<br>>> - Label-2 = MPLS Label or VNID corresponding to IP-VRF<br>>><br>>> It seems VNI can apply to IP-VRF too.<br>>> I would propose to pick up the definition of the draft : <br>>><br>>> "Label " = "MPLS Label or VNID"<br>>><br>>><br>>> Hmm...are you saying to use "label" instead of "vni" in the configuration commands?<br>>> <br><br>[Philippe2] I am less affirmative than previously.<br>In vrf-policy mode, there is already a label keyword.<br>However, in global configuration mode, an additional label should be configurable add-vrf<br><br><br>>><br>>> <br>>><br>>><br>>>
The proposed commands are as follows. These are initial thoughts
subject to more refinement - partly because the Linux kernel does not
currently have a forwarding model for L2oMPLS.<br>>><br>>> l2vpn evpn advertise-vni <all | list of VNIs><br>>>
-- The handler of this command will be "zebra" and it is in lieu of the
"advertise-all-vni" command as stated above.<br>>> -- This only applies if using EVPN for VxLAN<br>>><br>>> l2vpn evpn evi <name><br>>><br>>> encapsulation <vxlan | mpls><br>>> bridge-table <table | bridge-name><br>>> <any MPLS/label allocation parameters - if encap is mpls><br>>> <any VxLAN parameters - if encap is vxlan><br>>> -- The above syntax/commands will be used to create EVIs for MPLS, and if needed, for VxLAN.<br>>> -- The handler of these commands will be "zebra"<br>>><br>>><br>>> [Philippe] <br>>> BGPd is the only daemon interested in getting the VNI information ?<br>>><br><br>[Philippe2]<br>Yes, zebra is the daemon that gathers that information.<br>I omitted it.<br><br>>><br>>>
No, zebra continues to be the entity interacting with the kernel, both
for learning all the L2 info (bridges, bridge ports, VLAN-VNI mappings,
MACs etc.) and neighbors as well as installing into the kernel.<br>>><br>>>
We have some nascent thoughts on splitting/reorganizing zebra further,
but nothing planned in the near term and will certainly be discussed in
detail before anything is attempted.<br>>><br>>> <br>>><br>>> Also, the current level of FRR deliberately gives EVPN access to VNI only.<br>>> That implies that Ethernet NVO tunnel is neither MPLS nor NVGRE.<br>>><br>>> If yes, then no need to keep advertise-vni on bgpd. <br>>> If no, then I would want to control the information on both sides.<br>>> <br>>><br>>> router bgp <as><br>>> l2vpn evpn { vni <vni> | evi <name> }<br>>><br>>> [Philippe] <br>>> I have a configuration issue, if you want to do RT2 emission with both L2 and L3 Label.<br>>> Could you please elaborate ?<br>>><br>>><br>>>
It is the presence of this configuration that will determine that RT2
should have a second label. In the case of VxLAN, the L3 VNI value would
be provided here, in the case of MPLS (or something else), the EVI
would have some appropriate configuration to generate this.<br>>> <br>>><br>>> <br>>><br>>> rd <rd><br>>> route-target <import | export | both> <rt><br>>>
-- The above syntax/commands will be used to define the RD/RT
parameters for a VNI/EVI if the auto-derivation is not desired.<br>>> -- The handler of the above will clearly be "bgpd"<br>>><br>>> The L3 VNI configuration - which is against a L3 VRF - is as proposed earlier.<br>>><br>>>
The "advertise-default-gateway" configuration for asymmetric routing
can be modified based on the final consensus on the above.<br>>><br>>><br>>> [Philippe] In the CE purpose, this command is ok for me.<br>>><br>>> Thanks,<br>>><br>>> Philippe</div>