Hi.
Could someone please help me? I am stuck at the moment and not making any progress.
The diagram below shows the lab I am testing. PE1, P1, P2 and PE2 are Junos devices. The leaf/spine switches run Cumulus on Mellanox hardware.
+-------+ +-------+
| | | |
|--------------------> P1 |<-----------------------------------------------> P2 |
| | | | |
| | | | |
| +-------+ +-------+
PODA | ^
+--------------------------------------------+ |
| +-------+ | |
| ^---------------> |<--------------^ | |
| | y.y.y.y | PE1 | | | |
| | |z.z.z.z| | | |
| | | | | | |
| | +-------+ | | PODB |
| | | | +--------------------------------------+
| | | | | | |
| | +-------+ +-------+ | | | +-------+ | |
| | | | | | | | | | | | |
| | | Spine1| | Spine2| | | | | Spine1| | |
| | | | | | | | + | | | | |
| | | | | | | | | | | | |
| | +-------+-- +-------+ | | | +-------+ | |
| | | \-- -- | | | | | | |
| | | \- ---/ | | | | | | |
| | | ---/ | | | | | | |
| | | ---/ \-- | | | | | | |
| | v <-/ \> v | | | v v |
| | +------------+ +-------+ | | | +-------+ +-------+ |
| | | | | | | | | | | | | |
| | | | |Leaf2 | | | | |Leaf1 | | PE2 | |
| v--| Leaf1 | | |---v | | | |<--------> | | |
| | b.b.b.b | |c.c.c.c| | | |x.x.x.x| a.a.a.a |g.g.g.g| |
| +------------+ +-------+ | | +-------+ +-------+ |
| <-- <- | | -> |
| \---- \-- | | --/ |
+--------------------------------------------+ -|-/ |
\--- \> Port2 --/ +--------------------------------------+
\---- +------------------------+</
\---- | | Port3
\-> | |
| Testing Device |
Port1| |
+------------------------+
Internally, both pods use an L3 architecture running eBGP over unnumbered interfaces and advertising connected routes. In PODA, PE1 is connected to both leaves over separate links with labeled-unicast enabled, using implicit-null.
The testing device is a Juniper SRX with each interface set up in its own virtual router, but all in the same subnet. On the switch side, the port facing the testing device is in a bridge, with a VNI configured and the loopback address as the local tunnel endpoint. I am also using an SVI on the same subnet.
Each port can reach its local SVI, and testing between Port1 and Port2 is successful via the uplinks to PE1, so label switching seems to be working correctly. There is also reachability between the loopbacks of all the leaves.
There seems to be a problem with the route-map since I switched to labeled-unicast routes after testing with OSPF and LDP. I still need to confirm that so I can test reachability over the leaf/spine network, which was working before.
I am running into an issue when testing between Port1/Port2 and Port3. All the routes appear to be present, but with the eBGP architecture the standard behaviour is to change the next hop on external routes, and that changes the remote VTEP as well. The MAC/IP from Port3 is advertised by PODB Leaf1 with x.x.x.x as the VTEP, but when I display it on PODA Leaf1 it is reported as y.y.y.y, where y.y.y.y is the address of PE1 on the link to this leaf.
root@poda-leaf1:~# net show evpn mac vni 1001
Number of MACs (local and remote) known for this VNI: 3
MAC Type Intf/Remote VTEP VLAN
54:4b:8c:51:1c:a9 local swp13 1001
54:4b:8c:51:1c:ad remote y.y.y.y
root@poda-leaf1:~# net show bgp evpn route vni 1001 mac 54:4b:8c:51:1c:ad
BGP routing table entry for [2]:[0]:[0]:[48]:[54:4b:8c:51:1c:ad]
Paths: (1 available, best #1)
Not advertised to any peer
Route [2]:[0]:[0]:[48]:[54:4b:8c:51:1c:ad] VNI 993
Imported from x.x.x.x:2:[2]:[0]:[0]:[48]:[54:4b:8c:51:1c:ad]
11111 65202
y.y.y.y from y.y.y.y (z.z.z.z)
Origin IGP, metric 200, localpref 100, valid, external, bestpath-from-AS 11111, best
Extended Community: RT:65202:1001 ET:8
AddPath ID: RX 0, TX 56
Last update: Wed Aug 1 11:54:59 2018
I also do not understand why the route above shows VNI 993; it does, however, seem to be imported correctly into VNI 1001.
root@poda-leaf1:~# net show bgp evpn route vni 1001 vtep y.y.y.y
BGP table version is 33, local router ID is b.b.b.b
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]
Network Next Hop Metric LocPrf Weight Path
*> [2]:[0]:[0]:[48]:[54:4b:8c:51:1c:ad]
y.y.y.y 200 0 11111 65202 i
*> [2]:[0]:[0]:[48]:[54:4b:8c:51:1c:ad]:[32]:[10.2.0.200]
y.y.y.y 200 0 11111 65202 i
*> [3]:[0]:[32]:[x.x.x.x]
y.y.y.y 200 0 11111 65202 i
Checking whether anything is present for the loopback of podb-leaf1:
root@poda-leaf1:~# net show bgp evpn route vni 1001 vtep x.x.x.x
BGP table version is 33, local router ID is b.b.b.b
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]
Network Next Hop Metric LocPrf Weight Path
Displayed 9 prefixes (0 paths)
Below is the same command on the originating leaf.
root@podB-leaf-01:~# net show bgp evpn route vni 1001 mac 54:4b:8c:51:1c:ad
BGP routing table entry for [2]:[0]:[0]:[48]:[54:4b:8c:51:1c:ad]
Paths: (1 available, best #1)
Not advertised to any peer
Route [2]:[0]:[0]:[48]:[54:4b:8c:51:1c:ad] VNI 1001
Local
x.x.x.x from 0.0.0.0 (x.x.x.x)
Origin IGP, localpref 100, weight 32768, valid, sourced, local, bestpath-from-AS Local, best
Extended Community: ET:8 RT:65202:1001
AddPath ID: RX 0, TX 63
Last update: Wed Aug 1 11:51:44 2018
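To spell out what I think is happening between the two outputs above, here is a toy Python model of the next-hop handling. It is only a sketch under my own assumptions, not FRR or Junos code: the class names, the `next_hop_unchanged` flag and the idea of treating AS 11111 as a single hop are all made up for the illustration. It assumes the receiving leaf takes the remote VTEP straight from the BGP next hop of the EVPN route.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EvpnRoute:
    mac: str
    next_hop: str      # assumed to be used as the remote VTEP by the receiver
    as_path: tuple


@dataclass(frozen=True)
class Speaker:
    name: str
    asn: int
    address: str
    next_hop_unchanged: bool = False   # hypothetical knob for this sketch


def advertise(route: EvpnRoute, speaker: Speaker) -> EvpnRoute:
    """Re-advertise a route over eBGP as seen by the receiving peer."""
    # Default eBGP behaviour: the advertising speaker sets itself as next hop.
    next_hop = route.next_hop if speaker.next_hop_unchanged else speaker.address
    return replace(route, next_hop=next_hop,
                   as_path=(speaker.asn,) + route.as_path)


# Route as originated by podb-leaf1 (AS 65202) with its own loopback as VTEP.
origin = EvpnRoute(mac="54:4b:8c:51:1c:ad", next_hop="x.x.x.x", as_path=(65202,))

# What poda-leaf1 receives from PE1 with default next-hop rewriting.
rewritten = advertise(origin, Speaker("PE1", 11111, "y.y.y.y"))

# What it would receive if the next hop were carried through untouched.
preserved = advertise(origin, Speaker("PE1", 11111, "y.y.y.y",
                                      next_hop_unchanged=True))

print(rewritten.next_hop, rewritten.as_path)
print(preserved.next_hop, preserved.as_path)
```

In this model the first case reproduces what poda-leaf1 shows (AS path 11111 65202 with y.y.y.y as the VTEP), while the second keeps x.x.x.x, which is what the originating leaf advertises.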
Mirroring (SPAN) the uplink port to another port on the same switch let me look at the data plane, and it confirms that the traffic is being sent to the wrong destination.
tcpdump on the uplink interface:
12:21:53.724226 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 134)
b.b.b.b.20496 > y.y.y.y.4789: [no cksum] VXLAN, flags [I] (0x08), vni 1001
IP (tos 0x0, ttl 64, id 45715, offset 0, flags [none], proto ICMP (1), length 84)
10.2.0.1 > 10.2.0.200: ICMP echo request, id 20365, seq 231, length 64
0x0000: 4500 0086 0000 4000 4011 a2ec 29c1 77ea E.....@.@...).w.
0x0010: d1cb 2404 5010 12b5 0072 0000 0800 0000 ..$.P....r......
0x0020: 0003 e900 544b 8c51 1cad 544b 8c51 1ca9 ....TK.Q..TK.Q..
0x0030: 0800 4500 0054 b293 0000 4001 b349 0a02 ..E..T....@..I..
0x0040: 0001 0a02 00c8 0800 ae01 4f8d 00e7 5b61 ..........O...[a
0x0050: f7a0 0009 bb7b 0809 0a0b 0c0d 0e0f 1011 .....{..........
0x0060: 1213 1415 1617 1819 1a1b 1c1d 1e1f 2021 ...............!
0x0070: 2223 2425 2627 2829 2a2b 2c2d 2e2f 3031 "#$%&'()*+,-./01
0x0080: 3233 3435 3637 234567
The capture shows a plain VXLAN packet rather than MPLS, because there is no static label entry for this destination; I am relying on labeled-unicast for label distribution.
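As a cross-check of the hex dump, here is a small Python sketch that decodes the UDP header, the VXLAN header and the inner Ethernet header by hand. The bytes are copied from offset 0x14 of the capture above (just after the 20-byte outer IP header); the function and variable names are my own, and the layout assumed is the standard 8-byte VXLAN header (flags, 3 reserved bytes, 24-bit VNI, 1 reserved byte).

```python
import struct

# Bytes from offset 0x14 of the capture: UDP header, VXLAN header,
# then the inner Ethernet header.
CAPTURE_HEX = (
    "5010 12b5 0072 0000 "                  # outer UDP header
    "0800 0000 0003 e900 "                  # VXLAN header
    "544b 8c51 1cad 544b 8c51 1ca9 0800"    # inner Ethernet header
)


def format_mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def parse_vxlan(payload: bytes) -> dict:
    """Decode UDP + VXLAN + inner Ethernet headers from raw bytes."""
    src_port, dst_port, udp_len, _cksum = struct.unpack("!HHHH", payload[:8])
    flags = payload[8]
    vni = int.from_bytes(payload[12:15], "big")   # 24-bit VNI field
    inner_dst = format_mac(payload[16:22])
    inner_src = format_mac(payload[22:28])
    (ethertype,) = struct.unpack("!H", payload[28:30])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "udp_len": udp_len,
        "flags": flags,
        "vni": vni,
        "inner_dst": inner_dst,
        "inner_src": inner_src,
        "ethertype": ethertype,
    }


decoded = parse_vxlan(bytes.fromhex(CAPTURE_HEX.replace(" ", "")))
for key, value in decoded.items():
    print(key, value)
```

The decoded fields agree with the tcpdump summary: UDP destination port 4789, VNI 1001, and an inner frame from 54:4b:8c:51:1c:a9 (the local MAC on swp13) to 54:4b:8c:51:1c:ad (the remote MAC behind Port3). So the encapsulation itself looks right and only the outer destination is wrong.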
I would appreciate any assistance, as I have spent a lot of hours on this deployment and keep failing to get the two pods to talk to each other. Could you also confirm whether this type of architecture is supposed to work? The config files are below.
poda-leaf1 interface file:
auto lo
iface lo inet loopback
address b.b.b.b/32
auto swp3
iface swp3
address y.y.y.z/31
mpls-enable yes
mtu 9178
auto bridge
iface bridge
bridge-ports swp13 vni1001
bridge-pvid 1
bridge-vids 1001
bridge-vlan-aware yes
auto vlan1001
iface vlan1001
#hwaddress 44:39:39:FF:40:94
vlan-id 1001
vlan-raw-device bridge
auto vni1001
iface vni1001
bridge-access 1001
bridge-arp-nd-suppress on
bridge-learning off
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
vxlan-id 1001
vxlan-local-tunnelip b.b.b.b
poda-leaf1 frr.conf:
router bgp 65200
bgp router-id b.b.b.b
coalesce-time 1000
bgp bestpath as-path multipath-relax
bgp bestpath compare-routerid
neighbor fabric peer-group
neighbor fabric remote-as external
neighbor fabric description Internal Fabric Network
neighbor fabric capability extended-nexthop
neighbor swp47 interface peer-group fabric
neighbor swp48 interface peer-group fabric
neighbor y.y.y.y remote-as 11111
neighbor y.y.y.y ebgp-multihop 3
!
address-family ipv4 unicast
network b.b.b.b/32
redistribute connected
no neighbor y.y.y.y activate
export vpn
exit-address-family
!
address-family ipv4 labeled-unicast
neighbor y.y.y.y activate
neighbor y.y.y.y route-map HigherMetric in
exit-address-family
!
address-family l2vpn evpn
neighbor fabric activate
neighbor y.y.y.y activate
neighbor y.y.y.y route-map HigherMetric in
advertise-all-vni
exit-address-family
!
route-map HigherMetric permit 10
set metric 200
!
ip route z.z.z.z/32 y.y.y.y
!
mpls label global-block 16 1000
mpls label bind b.b.b.b/32 implicit-null
mpls label bind c.c.c.c/32 102
mpls label bind x.x.x.x/32 103
podb-leaf1 interface file:
auto lo
iface lo inet loopback
address x.x.x.x/32
auto swp3
iface swp3
address a.a.a.b/31
mpls-enable yes
mtu 9178
auto bridge
iface bridge
bridge-ports swp5 swp13 vni1001
bridge-pvid 1
bridge-vids 1001
bridge-vlan-aware yes
auto vlan1001
iface vlan1001
vlan-id 1001
vlan-raw-device bridge
auto vni1001
iface vni1001
bridge-access 1001
bridge-arp-nd-suppress on
bridge-learning off
mstpctl-bpduguard yes
mstpctl-portbpdufilter yes
vxlan-id 1001
vxlan-local-tunnelip x.x.x.x
podb-leaf1 frr.conf:
router bgp 65202
bgp router-id x.x.x.x
coalesce-time 1000
bgp bestpath as-path multipath-relax
bgp bestpath compare-routerid
neighbor fabric peer-group
neighbor fabric remote-as external
neighbor fabric description Internal Fabric Network
neighbor fabric capability extended-nexthop
neighbor swp47 interface peer-group fabric
neighbor swp48 interface peer-group fabric
neighbor a.a.a.a remote-as 11111
neighbor a.a.a.a ebgp-multihop 3
!
address-family ipv4 unicast
network x.x.x.x/32
redistribute connected
no neighbor a.a.a.a activate
export vpn
exit-address-family
!
address-family ipv4 labeled-unicast
neighbor a.a.a.a activate
exit-address-family
!
address-family l2vpn evpn
neighbor fabric activate
neighbor a.a.a.a activate
advertise-all-vni
exit-address-family
!
ip route g.g.g.g/32 a.a.a.a
!
mpls label global-block 16 1000
mpls label bind b.b.b.b/32 301
mpls label bind c.c.c.c/32 302
mpls label bind x.x.x.x/32 implicit-null