regarding - ospf-loop issue with the given topology

Santosh P K sapk at vmware.com
Wed Nov 13 01:11:36 EST 2019


Hello Palpandi,
I have replied to your query on Slack. Donald also pointed out that FRR 4.0 has known issues around MaxAge handling, many of which have been addressed in master. Could you check whether your problem still occurs on master?

Thanks
Santosh P K

From: Palpandi Perumal <palp at pluribusnetworks.com>
Date: Tuesday, 12 November 2019 at 5:56 PM
To: <dev at lists.frrouting.org>
Subject: regarding - ospf-loop issue with the given topology

Hi All,
Please find the topology attached.

Whenever the node "uspine02" is rebooted, we intermittently end up in an OSPF loop.
We see an anomalous sequence of events, but we have not been able to identify what triggers it.
This is what the debug log shows once we are in the problematic state:

1. The rebooted us-spine2 receives the network-LSA and concludes that it originated the LSA itself. It therefore does not flood the LSA into the broadcast domain and instead sets MaxAge on the copy it has already installed.
2. Because of this, us-spine2 sends no LS ack, so gh-core1 puts the LSA back on its retransmit list and will process it again at the next 10s interval.
3. In the meantime, us-spine2 broadcasts the MaxAge LSA to ghcore to remove the route.
4. As soon as the route has been removed by us-spine2's MaxAge network-LSA, us-spine1 sends the proper network-LSA to ghcore1, which re-installs the route.
5. gh-core1 then processes its retransmit list again and sends the network-LSA to us-spine2, which once more treats it as self-originated and repeats the steps above.

This sequence is what left our switches in that state.
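
The self-originated test that us-spine2 applies in this sequence can be sketched as follows. This is only a minimal model of the rule in RFC 2328 section 13.4, not FRR's implementation: the names toy_lsa, toy_router and toy_is_self_originated are hypothetical, and addresses are treated as plain 32-bit integers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NETWORK_LSA 2 /* LS type 2 = network-LSA (RFC 2328) */

/* Hypothetical, simplified LSA header: only the fields the test needs. */
struct toy_lsa {
	uint8_t type;        /* LS type */
	uint32_t ls_id;      /* Link State ID */
	uint32_t adv_router; /* Advertising Router */
};

/* Hypothetical view of the local router. */
struct toy_router {
	uint32_t router_id;
	const uint32_t *if_addrs; /* the router's own interface addresses */
	size_t n_if_addrs;
};

/*
 * RFC 2328 section 13.4: an LSA is self-originated when
 *  (1) its Advertising Router equals our own Router ID, or
 *  (2) it is a network-LSA whose Link State ID is one of our own
 *      interface IP addresses.
 */
static bool toy_is_self_originated(const struct toy_router *r,
				   const struct toy_lsa *lsa)
{
	if (lsa->adv_router == r->router_id)
		return true;

	if (lsa->type == TOY_NETWORK_LSA) {
		for (size_t i = 0; i < r->n_if_addrs; i++)
			if (lsa->ls_id == r->if_addrs[i])
				return true;
	}

	return false;
}
```

Neither condition depends on whether the router still remembers originating the LSA, which is why a copy that survived the reboot on a neighbour can come back and still match.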

It is a timing problem, which is why we do not hit it consistently.
Root cause:
Before uspine2 rebooted, it would have been the DR and would have generated a network-LSA for this broadcast domain in area 204.
ghcore1 would have received that LSA, considered it valid, and installed it. ghcore1 then flooded the LSA into area 204 again; by that time uspine2 had come back up as BDR, received the same LSA, and treated it as self-originated.
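
RFC 2328 section 13.4 also describes what a router does with a received self-originated LSA that is newer than its own copy: normally it re-originates the LSA with a higher sequence number, but if the LSA is a network-LSA and the router is no longer Designated Router, it flushes the LSA by setting its LS age to MaxAge and reflooding it. A router that was DR before the reboot and comes back as BDR falls into the second case. The sketch below models only that decision; the names toy_net_lsa, toy_action and toy_handle_self_originated are hypothetical, and sequence-number wrap handling is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_AGE 3600 /* MaxAge in seconds (RFC 2328, Appendix B) */

enum toy_action {
	TOY_REORIGINATE, /* originate a new instance with a higher sequence */
	TOY_FLUSH,       /* age the LSA to MaxAge and reflood it */
};

/* Hypothetical, simplified network-LSA state. */
struct toy_net_lsa {
	uint16_t ls_age; /* LS age in seconds */
	int32_t ls_seq;  /* LS sequence number (signed 32-bit) */
};

/*
 * Decide how to react to a received self-originated network-LSA that
 * is newer than the local copy. Wrap at MaxSequenceNumber is NOT
 * handled in this sketch.
 */
static enum toy_action toy_handle_self_originated(struct toy_net_lsa *lsa,
						  bool still_dr)
{
	if (still_dr) {
		/* Still DR: advance one past the received sequence number. */
		lsa->ls_seq += 1;
		lsa->ls_age = 0;
		return TOY_REORIGINATE;
	}

	/* No longer DR: flush the stale LSA from the routing domain. */
	lsa->ls_age = TOY_MAX_AGE;
	return TOY_FLUSH;
}
```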

Potential fix, based on RFC 2328 section 13:
Before flooding the LSA to the broadcast domain, check whether the received LSA is self-originated.

diff -r dc50bb05b29e usr/src/cmd/FRRouting/frr-master/ospfd/ospf_flood.c
--- a/usr/src/cmd/FRRouting/frr-master/ospfd/ospf_flood.c Tue Nov 05 05:44:37 2019 -0800
+++ b/usr/src/cmd/FRRouting/frr-master/ospfd/ospf_flood.c Tue Nov 05 08:51:51 2019 -0800
@@ -925,6 +925,26 @@
 			old->retransmit_counter--;
 			ospf_lsdb_delete(&nbr->ls_rxmt, old);
 		}
+		/*
+		 * Please refer to section 13.1 of RFC 2328.
+		 * The flooding procedure does not apply to
+		 * self-originated LSAs. Unfortunately, the flood
+		 * caller ended up adding a self-originated LSA to
+		 * the retransmit list. So, before adding this LSA
+		 * to the retransmit list, check whether it is
+		 * self-originated. If it is, it should be removed
+		 * from the LSDB and should not be added to the
+		 * retransmit list.
+		 */
+		if (ospf_lsa_is_self_originated(nbr->oi->ospf, lsa)) {
+			if (IS_DEBUG_OSPF(lsa, LSA_FLOODING))
+				zlog_debug("self originated RXmtL(%lu)++,"
+					   " NBR(%s), LSA[%s]",
+					   ospf_ls_retransmit_count(nbr),
+					   inet_ntoa(nbr->router_id),
+					   dump_lsa_key(lsa));
+			return;
+		}


Thanks
Palpandi P



