Recursive lookup through BGP-LU route
Hi,

We're experimenting with BGP-LU and FRR in our lab. We have two host machines running FRR. Each is connected to two TOR switches through its eth0 and eth1 links (four TORs total). Those TORs are connected through another pair of switches. We have configured BGP-LU to distribute MPLS labels over eBGP throughout. At this point, connectivity between the loopback addresses on the two hosts works well.

Then, I created some namespaces on the hosts and gave them addresses. We use BGP to announce those addresses as /32 routes with the loopback address as the next hop. The route between namespaces on the two machines is resolved recursively, so it pushes the same MPLS label as the path to the other host's loopback. This all worked well to start with, and some iperf runs showed pretty good results. ECMP was working, since the bandwidth was higher than any single link could provide.

After some link state changes, we seemed to lose the connection. However, pings between the loopback addresses still worked. After some time, we noticed that the MPLS labels in the routes to the namespace addresses (/32s) were different from the label in the route to the loopback. Since the former routes are resolved recursively through the latter, the labels should always be the same. Could this be a bug in FRR? Shouldn't the routes to the namespaces be invalidated or updated as soon as the route they depend on changes? The traffic between namespaces is being dropped because the switch doesn't know about the label the host is pushing.

Any insight would be very helpful.

Thanks!
Carl Baldwin

Here are the /32 routes received. 10.112.128.1 is the loopback on the other host. The four routes are to four namespaces on the other host.

    lab1r2u05# show ip bgp neighbor 10.112.97.1 received-routes
    BGP table version is 0, local router ID is 10.112.128.2
    Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
    Origin codes: i - IGP, e - EGP, ? - incomplete

       Network          Next Hop            Metric LocPrf Weight Path
    *> 10.224.12.10/32  10.112.128.1                           0 4206900001 4206909998 i
    *> 10.224.12.15/32  10.112.128.1                           0 4206900001 4206909998 i
    *> 10.224.12.70/32  10.112.128.1                           0 4206900001 4206909998 i
    *> 10.224.12.75/32  10.112.128.1                           0 4206900001 4206909998 i

    Total number of prefixes 4

Below is the routing table as it looked when we lost connectivity. Notice that the MPLS label for the loopback route is 306592 via eth1, but the label for the four namespace addresses is 306576.

    root@lab1r2u05:~/ovs-droplets# ip route
    default via 10.112.2.132 dev eth2
    10.112.2.128/25 dev eth2 proto kernel scope link src 10.112.2.145
    10.112.128.1 encap mpls 306592 via 10.112.129.9 dev eth1 proto 186 metric 20
    10.112.129.8/30 dev eth1 proto kernel scope link src 10.112.129.10
    10.112.129.12/30 dev eth0 proto kernel scope link src 10.112.129.14
    10.224.12.10 encap mpls 306576 via 10.112.129.9 dev eth1 proto 186 metric 20
    10.224.12.15 encap mpls 306576 via 10.112.129.9 dev eth1 proto 186 metric 20
    10.224.12.70 encap mpls 306576 via 10.112.129.9 dev eth1 proto 186 metric 20
    10.224.12.75 encap mpls 306576 via 10.112.129.9 dev eth1 proto 186 metric 20
    10.224.12.20 dev br0 scope link
    10.224.12.25 dev br0 scope link
    10.224.12.80 dev br0 scope link
    10.224.12.85 dev br0 scope link
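The invariant Carl relies on is that a route resolved recursively through the loopback should carry exactly the same labelled nexthops as the loopback route itself. The following is a minimal Python sketch of that consistency check over saved `ip route` output. `mpls_nexthops` and `stale_dependents` are hypothetical helper names, not part of FRR or iproute2. The sketch assumes the caller already knows which /32s use the loopback as their BGP next hop, because that relationship is not visible in the kernel table. The samples are excerpts of the tables posted in this thread; the multipath sample uses spaces where iproute2 prints a tab before `nexthop`, and the parser treats any leading whitespace as a continuation line.

```python
import re

# Captures (label stack, gateway) from fragments such as
# "encap mpls 306592 via 10.112.129.9"; a label stack would appear as "a/b".
ENCAP_RE = re.compile(r"encap mpls (\S+) via (\S+)")


def mpls_nexthops(ip_route_output: str) -> dict[str, set[tuple[str, str]]]:
    """Map each destination to its set of (label stack, gateway) pairs.

    Handles the single-path form (label on the route line itself) and the
    multipath form, where indented "nexthop ..." lines follow the route line.
    """
    table: dict[str, set[tuple[str, str]]] = {}
    current = None
    for line in ip_route_output.splitlines():
        if not line.strip():
            continue
        if line[0].isspace():
            # Continuation of a multipath route: attach to the previous destination.
            if current is not None:
                table[current].update(ENCAP_RE.findall(line))
            continue
        current = line.split()[0]
        table[current] = set(ENCAP_RE.findall(line))
    return table


def stale_dependents(ip_route_output: str, loopback: str, dependents: list[str]) -> list[str]:
    """Return the dependents whose labelled nexthops differ from the loopback's.

    `dependents` are the prefixes whose BGP next hop is `loopback`; that
    relationship is assumed to be supplied by the caller.
    """
    table = mpls_nexthops(ip_route_output)
    expected = table.get(loopback, set())
    return [p for p in dependents if table.get(p, set()) != expected]


# Excerpt of the single-path table from lab1r2u05.
SINGLE_PATH = """\
default via 10.112.2.132 dev eth2
10.112.128.1 encap mpls 306592 via 10.112.129.9 dev eth1 proto 186 metric 20
10.224.12.10 encap mpls 306576 via 10.112.129.9 dev eth1 proto 186 metric 20
10.224.12.15 encap mpls 306576 via 10.112.129.9 dev eth1 proto 186 metric 20
"""

# Excerpt of the multipath table from lab1r1u05.
MULTIPATH = """\
10.112.128.2 proto 186 metric 20
    nexthop encap mpls 702241 via 10.112.129.1 dev eth1 weight 1
    nexthop encap mpls 564 via 10.112.129.5 dev eth0 weight 1
10.224.12.20 proto 186 metric 20
    nexthop encap mpls 702161 via 10.112.129.1 dev eth1 weight 1
    nexthop encap mpls 560 via 10.112.129.5 dev eth0 weight 1
"""

if __name__ == "__main__":
    print(stale_dependents(SINGLE_PATH, "10.112.128.1", ["10.224.12.10", "10.224.12.15"]))
    print(stale_dependents(MULTIPATH, "10.112.128.2", ["10.224.12.20"]))
```

Comparing (label, gateway) pairs rather than bare labels means that a label which merely moved to another uplink does not pass as a match. An empty result means every dependent route agrees with the loopback route.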
Can we get the output of `show ip route`, `show ip route 10.112.129.9`, `show mpls fec`, and `show mpls table`?

donald

On Mon, Jun 25, 2018 at 3:45 PM, Carl Baldwin <carl@ecbaldwin.net> wrote:
_______________________________________________ dev mailing list dev@lists.frrouting.org https://lists.frrouting.org/listinfo/dev
Hi Donald,

Thank you for your reply. I hope this isn't too jarring, but we had to reproduce the issue on a slightly different host connected to different TORs. The issue is the same, but the IPs are a bit different. Hopefully this is enough data. The network engineer I'm working with said that all he had to do to reproduce it was "disable interface on switch towards eth0".

Carl

First, the host routes are a bit different here:

    lab1r1u05# show ip bgp neighbor 10.112.97.1 received-routes
    BGP table version is 0, local router ID is 10.112.128.1
    Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
    Origin codes: i - IGP, e - EGP, ? - incomplete

       Network          Next Hop            Metric LocPrf Weight Path
    *> 10.224.12.20/32  10.112.128.2                           0 4206900001 4206909999 i
    *> 10.224.12.25/32  10.112.128.2                           0 4206900001 4206909999 i
    *> 10.224.12.80/32  10.112.128.2                           0 4206900001 4206909999 i
    *> 10.224.12.85/32  10.112.128.2                           0 4206900001 4206909999 i

    Total number of prefixes 4

This is `show ip route` when the issue occurs. Note the difference in MPLS labels between the route to `10.112.128.2/32` and the one to `10.224.12.25/32`.
    lab1r1u05# show ip route
    Codes: K - kernel route, C - connected, S - static, R - RIP,
           O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP,
           T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
           > - selected route, * - FIB route

    K>* 0.0.0.0/0 [0/0] via 10.112.2.4, eth2, 04:27:00
    C>* 10.112.2.0/25 is directly connected, eth2, 04:27:00
    B>* 10.112.97.1/32 [20/0] via 10.112.129.1, eth1, label 696097, 04:26:57
      *                       via 10.112.129.5, eth0, label 46, 04:26:57
    C>* 10.112.128.1/32 is directly connected, lo, 04:27:00
    B>* 10.112.128.2/32 [20/0] via 10.112.129.1, eth1, label 702241, 00:00:59
      *                        via 10.112.129.5, eth0, label 564, 00:00:59
    C>* 10.112.128.10/32 is directly connected, lo, 04:27:00
    C>* 10.112.128.100/32 is directly connected, lo, 04:27:00
    C>* 10.112.129.0/30 is directly connected, eth1, 04:27:00
    C>* 10.112.129.4/30 is directly connected, eth0, 04:27:00
    K>* 10.112.129.8/30 [0/0] via 10.112.129.1, eth1, 04:27:00
    K>* 10.224.12.10/32 [0/0] is directly connected, br0, 04:27:00
    K>* 10.224.12.15/32 [0/0] is directly connected, br0, 04:27:00
    B>  10.224.12.20/32 [20/0] via 10.112.128.2 (recursive), 04:26:51
      *                        via 10.112.129.1, eth1, label 702161, 04:26:51
      *                        via 10.112.129.5, eth0, label 560, 04:26:51
    B>  10.224.12.25/32 [20/0] via 10.112.128.2 (recursive), 04:26:51
      *                        via 10.112.129.1, eth1, label 702161, 04:26:51
      *                        via 10.112.129.5, eth0, label 560, 04:26:51
    K>* 10.224.12.70/32 [0/0] is directly connected, br0, 04:27:00
    K>* 10.224.12.75/32 [0/0] is directly connected, br0, 04:27:00
    B>  10.224.12.80/32 [20/0] via 10.112.128.2 (recursive), 04:26:51
      *                        via 10.112.129.1, eth1, label 702161, 04:26:51
      *                        via 10.112.129.5, eth0, label 560, 04:26:51
    B>  10.224.12.85/32 [20/0] via 10.112.128.2 (recursive), 04:26:51
      *                        via 10.112.129.1, eth1, label 702161, 04:26:51
      *                        via 10.112.129.5, eth0, label 560, 04:26:51
    C>* 100.64.0.0/24 is directly connected, br0, 04:27:00

Here are the routes to the two TORs that FRR is directly connected to:
    lab1r1u05# show ip route 10.112.129.1
    Routing entry for 10.112.129.0/30
      Known via "connected", distance 0, metric 0, best
      Last update 04:28:52 ago
      * directly connected, eth1

    lab1r1u05# show ip route 10.112.129.5
    Routing entry for 10.112.129.4/30
      Known via "connected", distance 0, metric 0, best
      Last update 04:28:55 ago
      * directly connected, eth0

The MPLS FEC:

    lab1r1u05# show mpls fec
    10.112.97.1/32
      Label: 4294836223
      Client list: bgp(fd 11)
    10.112.128.2/32
      Label: 4294836223
      Client list: bgp(fd 11)
    10.112.128.100/32
      Label: 4294836223
      Client list: bgp(fd 11)
    10.224.12.10/32
      Label: 4294836223
      Client list: bgp(fd 11)
    10.224.12.15/32
      Label: 4294836223
      Client list: bgp(fd 11)
    10.224.12.20/32
      Label: 4294836223
      Client list: bgp(fd 11)
    10.224.12.25/32
      Label: 4294836223
      Client list: bgp(fd 11)
    10.224.12.70/32
      Label: 4294836223
      Client list: bgp(fd 11)
    10.224.12.75/32
      Label: 4294836223
      Client list: bgp(fd 11)
    10.224.12.80/32
      Label: 4294836223
      Client list: bgp(fd 11)
    10.224.12.85/32
      Label: 4294836223
      Client list: bgp(fd 11)
    2604:a880:801:201::1/128
      Label: 4294836223
      Client list: bgp(fd 11)
    2604:a880:801:202::2/128
      Label: 4294836223
      Client list: bgp(fd 11)
    2604:a880:801:203::10/128
      Label: 4294836223
      Client list: bgp(fd 11)
    2604:a880:801:203::15/128
      Label: 4294836223
      Client list: bgp(fd 11)
    2604:a880:801:203::20/128
      Label: 4294836223
      Client list: bgp(fd 11)
    2604:a880:801:203::25/128
      Label: 4294836223
      Client list: bgp(fd 11)
    2604:a880:801:204::70/128
      Label: 4294836223
      Client list: bgp(fd 11)
    2604:a880:801:204::75/128
      Label: 4294836223
      Client list: bgp(fd 11)
    2604:a880:801:204::80/128
      Label: 4294836223
      Client list: bgp(fd 11)
    2604:a880:801:204::85/128
      Label: 4294836223
      Client list: bgp(fd 11)

...and the MPLS table (yes, it is empty):

    lab1r1u05# show mpls table
     Inbound                            Outbound
       Label     Type          Nexthop     Label
    --------  -------  ---------------  --------

On Tue, Jun 26, 2018 at 6:11 AM Donald Sharp <sharpd@cumulusnetworks.com> wrote:
And here is `ip route` after bringing the link to eth0 back up. It still lines up with the `show ip route` output I sent earlier.

Carl

    root@lab1r1u05:~/ovs-droplets# ip route
    default via 10.112.2.4 dev eth2
    10.112.2.0/25 dev eth2 proto kernel scope link src 10.112.2.15
    10.112.97.1 proto 186 metric 20
        nexthop encap mpls 696097 via 10.112.129.1 dev eth1 weight 1
        nexthop encap mpls 46 via 10.112.129.5 dev eth0 weight 1
    10.112.128.2 proto 186 metric 20
        nexthop encap mpls 702241 via 10.112.129.1 dev eth1 weight 1
        nexthop encap mpls 564 via 10.112.129.5 dev eth0 weight 1
    10.112.129.0/30 dev eth1 proto kernel scope link src 10.112.129.2
    10.112.129.4/30 dev eth0 proto kernel scope link src 10.112.129.6
    10.112.129.8/30 via 10.112.129.1 dev eth1
    10.224.12.10 dev br0 scope link
    10.224.12.15 dev br0 scope link
    10.224.12.20 proto 186 metric 20
        nexthop encap mpls 702161 via 10.112.129.1 dev eth1 weight 1
        nexthop encap mpls 560 via 10.112.129.5 dev eth0 weight 1
    10.224.12.25 proto 186 metric 20
        nexthop encap mpls 702161 via 10.112.129.1 dev eth1 weight 1
        nexthop encap mpls 560 via 10.112.129.5 dev eth0 weight 1
    10.224.12.70 dev br0 scope link
    10.224.12.75 dev br0 scope link
    10.224.12.80 proto 186 metric 20
        nexthop encap mpls 702161 via 10.112.129.1 dev eth1 weight 1
        nexthop encap mpls 560 via 10.112.129.5 dev eth0 weight 1
    10.224.12.85 proto 186 metric 20
        nexthop encap mpls 702161 via 10.112.129.1 dev eth1 weight 1
        nexthop encap mpls 560 via 10.112.129.5 dev eth0 weight 1
    100.64.0.0/24 dev br0 proto kernel scope link src 100.64.0.1

On Tue, Jun 26, 2018 at 6:11 AM Donald Sharp <sharpd@cumulusnetworks.com> wrote:
participants (2)
- Carl Baldwin
- Donald Sharp