[FROG] TCP retransmits

Tue Dec 31 05:02:06 EST 2019

On 30/12/2019 20:24, Travis Garrison wrote:
> We are getting sporadic speed issues and other weirdness after we installed our 2 new cores running FRR. The routers are HP DL380G8 servers with 64Gb of ram, dual Intel Xeon E5-2690 V2 3GHz 10 core CPUs. We disabled hyperthreading. The nics are Mellanox ConnectX3 Pros 40Gb using the native drivers from Debian 10. All updates and firmware have been applied. We show no errors in the frr.log or messages. We did notice that in a packet capture we are getting a lot of TCP retransmits and TCP out of orders on the internal side. Looks to be about 40 to 50% of our traffic is retransmits. Anyone here have any ideas?
>
> We have 2 10Gb internet feeds coming into 2 Mikrotik CRS326-24S+2Q+RM core switches. These switches are then connected with 40Gb DAC cables to the core routers on interface eno1. The switches are not connected together. The 2 core routers also have a 40Gb DAC cable connecting both together on interface eno1d1. Also we have 2 10Gb fiber feeds running to all the edge routers which are Mikrotik RB4011 or CCR1009s that are connected to the core switches. I can draw a diagram out but was unsure if I could attach it to this email or not. The core routers are running an MTU size of 2000 (with the exception of the ISP feed which is running 1500). We have verified that the edge routers and core switches are also running at 2000 MTU. The fiber network supports up to 8096 MTU. ISP1 is giving us full tables while ISP2 is currently giving us a default route only. They will be adding the capability of BGP and full routes in the next month. Through ISP1 we are also joining the local IXC and will be peering with Netflix. Do we need to redistribute the routes between the 2 cores?

Jump straight down to blackholes.

1) Are you sure that your wireshark traces are showing real retransmits, 
and not just flagging retransmits because the capture is seeing the same 
packet twice: once on the way in and once on the way out?

If you look at the packets, do they show the VLAN headers?   If you 
run wireshark on the base interface, the duplicates will definitely 
happen.

(I think it's a debian thing, but I'm

2) Provided ISP1 is giving you a reasonably full table, you will 
be sending very little traffic outbound to ISP2.   The more specific 
route wins.
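
To illustrate the point about the more specific route winning, here is 
a minimal Python sketch of longest-prefix matching. The prefixes and the 
"ISP1"/"ISP2" next-hop labels are made up for the example; this is not 
how FRR implements its lookup, just the selection rule.

```python
import ipaddress

def best_route(routes, destination):
    """Return the (network, next_hop) pair with the longest matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The longest prefix length is the most specific route.
    return max(matches, key=lambda m: m[0].prefixlen)

# Hypothetical table: specifics learned from ISP1, default only from ISP2.
routes = [
    ("0.0.0.0/0", "ISP2"),
    ("8.8.8.0/24", "ISP1"),
    ("1.1.1.0/24", "ISP1"),
]

print(best_route(routes, "8.8.8.8")[1])    # ISP1: covered by a specific route
print(best_route(routes, "192.0.2.1")[1])  # ISP2: only the default matches
```

With a near-full table from ISP1, almost every destination looks like 
the first case, which is why ISP2 sees so little outbound traffic.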

I'd get a full table from both ASAP.    And then remove 
`gateway 1x.x.58.113`; you don't want a default gateway to an ISP when 
you have full tables.

3) Blackholes.  I'm presuming that you have a block of IPv4 addresses, 
and OSPF through to the edge.

So, very counter-intuitively, you need to blackhole your whole IP space 
on your core routers.   The more specific routes from OSPF will win.

If there are any parts of your address space that don't route 
internally and somebody network scans you, then the packets will 
leave via the default route.   Your ISP will send them back again, and 
they bounce back and forth until the TTL expires.

Actually, this is it.

My line is:

ip route 185.224.188.0/22 Null0 200
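
A rough Python sketch of why that line helps. The /22 and the distance 
of 200 are from the line above; the OSPF /24, its distance of 110, the 
default route's distance and the next-hop names are assumptions for 
illustration only.

```python
import ipaddress

def lookup(routes, destination):
    """Pick a next hop: longest prefix first, then lowest administrative distance."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r[0])]
    if not candidates:
        return None
    # Distance only breaks ties between routes for the same prefix length.
    best = min(candidates, key=lambda r: (-ipaddress.ip_network(r[0]).prefixlen, r[1]))
    return best[2]

# (prefix, administrative distance, next hop); all but the /22 are hypothetical.
without_blackhole = [
    ("0.0.0.0/0", 1, "ISP"),
    ("185.224.188.0/24", 110, "edge-router"),  # assumed OSPF-learned subnet
]
with_blackhole = without_blackhole + [("185.224.188.0/22", 200, "Null0")]

# An address in a subnet that OSPF knows about still reaches the edge.
print(lookup(with_blackhole, "185.224.188.10"))     # edge-router
# An unused address in the block: back out to the ISP, or dropped locally.
print(lookup(without_blackhole, "185.224.191.5"))   # ISP
print(lookup(with_blackhole, "185.224.191.5"))      # Null0
```

The high distance matters only if another protocol ever offers the 
same /22; in that case the other route would be preferred over the 
blackhole.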

4) When all is sorted, consider setting up BFD on your ISP links. BGP 
takes forever to time out (although I'm not 100% convinced by FRR BFD 
at the moment).
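
For a sense of scale, a back-of-the-envelope comparison in Python. The 
numbers are assumptions, not values from your config: a 180 second BGP 
hold time and BFD at 300 ms with a multiplier of 3 are common defaults, 
but check what your routers and your ISPs actually negotiate.

```python
def bgp_detection_seconds(hold_time_s=180):
    """Worst-case time for BGP alone to declare a silent peer dead (assumed hold time)."""
    return hold_time_s

def bfd_detection_seconds(interval_ms=300, multiplier=3):
    """Approximate BFD detection time: interval times multiplier (assumed values)."""
    return interval_ms * multiplier / 1000

print(bgp_detection_seconds())  # 180
print(bfd_detection_seconds())  # 0.9
```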