regarding - ospf-loop issue with the given topology

Palpandi Perumal palp at pluribusnetworks.com
Thu Nov 14 01:43:31 EST 2019


Hi Santosh,
Thanks for the feedback and the timely response. We have made many
customisations to FRR 4.0, so integrating FRR 7.0 into our architecture is
not easy work.

For the above problem, we still don't know the trigger. Once we are in the
problematic state, the ospfd log shows:
    - One switch says the LSA was originated by itself, while the other
      switch keeps expecting an LS Ack for it.
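
To make the first half of that concrete, here is a minimal sketch of the
self-originated test as RFC 2328 section 13.4 describes it: an LSA counts as
self-originated when its Advertising Router equals the router's own Router
ID, or when it is a network-LSA whose Link State ID equals one of the
router's own interface addresses. The types and names below (struct lsa_hdr,
struct router, lsa_is_self_originated) are hypothetical and simplified; this
is not FRR's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types for illustration; not FRR's structures. */
enum { LSA_TYPE_ROUTER = 1, LSA_TYPE_NETWORK = 2 };

struct lsa_hdr {
	uint8_t type;
	uint32_t link_state_id; /* network-LSA: the DR's interface address */
	uint32_t adv_router;    /* Router ID of the originating router */
};

struct router {
	uint32_t router_id;
	const uint32_t *if_addrs; /* this router's own interface addresses */
	size_t n_if_addrs;
};

/*
 * Self-originated test after RFC 2328 section 13.4:
 *  1) Advertising Router equals our own Router ID, or
 *  2) the LSA is a network-LSA and its Link State ID equals one of
 *     our own interface addresses.
 */
static bool lsa_is_self_originated(const struct router *r,
				   const struct lsa_hdr *lsa)
{
	assert(r != NULL && lsa != NULL);

	if (lsa->adv_router == r->router_id)
		return true;

	if (lsa->type == LSA_TYPE_NETWORK) {
		for (size_t i = 0; i < r->n_if_addrs; i++)
			if (lsa->link_state_id == r->if_addrs[i])
				return true;
	}
	return false;
}
```

With a test like this, a network-LSA that a router generated as DR before a
reboot still matches after the reboot, which would be consistent with what we
see in the log.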

Based on this point in RFC 2328:

13 <https://tools.ietf.org/html/rfc2328#section-13>.  The Flooding Procedure

    Link State Update packets provide the mechanism for flooding LSAs.
    A Link State Update packet may contain several distinct LSAs, and
    floods each LSA one hop further from its point of origination.


We added a potential fix. Somehow this network-LSA gets added to the
retransmit list through the flooding procedure. So, before adding an LSA to
the retransmit list, we now check whether it is self-originated; if it is,
it need not be flooded to the broadcast domain (based on the above point).
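
The idea of the guard, stripped of the FRR specifics, is roughly the
following. This is a toy model under stated assumptions: struct lsa, struct
neighbor, rxmt_add and the fixed-size list are all made up for illustration,
self-origination is reduced to an Advertising Router comparison, and the
removal of an older copy that the real function performs first is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RXMT_MAX 8 /* arbitrary capacity for this toy model */

/* Hypothetical, simplified types for illustration; not FRR's structures. */
struct lsa {
	uint32_t id;
	uint32_t adv_router; /* Router ID of the originating router */
};

struct neighbor {
	uint32_t local_router_id;         /* our own Router ID */
	const struct lsa *rxmt[RXMT_MAX]; /* LSAs awaiting an LS Ack */
	size_t rxmt_count;
};

/* Simplification: only the Advertising Router comparison is modelled. */
static bool is_self_originated(const struct neighbor *nbr,
			       const struct lsa *lsa)
{
	return lsa->adv_router == nbr->local_router_id;
}

/*
 * Queue an LSA for retransmission to this neighbor.
 * Returns true if the LSA was queued, false if it was skipped.
 */
static bool rxmt_add(struct neighbor *nbr, const struct lsa *lsa)
{
	assert(nbr != NULL && lsa != NULL);

	/* The guard: a self-originated LSA is never queued. */
	if (is_self_originated(nbr, lsa))
		return false;

	if (nbr->rxmt_count == RXMT_MAX)
		return false; /* toy list is full */

	nbr->rxmt[nbr->rxmt_count++] = lsa;
	return true;
}
```

In this model, an LSA carrying our own Router ID leaves the retransmit list
untouched, while any other LSA is queued as before.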

After this fix, we haven't seen this problem.
Could you give us feedback on this fix?
Migrating from FRR 4.0 to FRR 7.0 would be tedious work for us.

Thanks
Palpandi P


On Wed, Nov 13, 2019 at 11:41 AM Santosh P K <sapk at vmware.com> wrote:

> Hello Palpandi,
>
>      I have replied to your query on slack. Also Donald pointed out that
> in FRR 4.0 there are issues around max-aging, and many of them are
> addressed in master. Could you see if your concerns are addressed in master?
>
>
>
> Thanks
>
> Santosh P K
>
>
>
> *From: *Palpandi Perumal <palp at pluribusnetworks.com>
> *Date: *Tuesday, 12 November 2019 at 5:56 PM
> *To: *<dev at lists.frrouting.org>
> *Subject: *regarding - ospf-loop issue with the given topology
>
>
>
> Hi All,
>
> Please find the topology attached.
>
>
>
> Whenever the node "uspine02" is rebooted, we intermittently end up in an
> OSPF loop.
>
> We see one anomalous sequence, but we have not been able to find what
> triggers it.
>
> Once we are in the problematic state, the debug log shows the following.
>
> The rebooted uspine02 decides that the received network-LSA was originated
> by itself, so it does not flood that LSA to the broadcast domain and
> instead sets the copy it has already installed to MaxAge. In this case
> uspine02 does not send an LS Ack, so ghcore1 puts the LSA back on its
> retransmit list and processes it at the next 10 s interval. In the
> meantime, uspine02 broadcasts the MaxAge LSA to ghcore1 to remove that
> route. Once the route has been removed by uspine02's MaxAge network-LSA,
> uspine01 immediately sends the proper network-LSA to ghcore1 to re-install
> the route. After that event, the retransmit list is processed again on
> ghcore1, which sends that network-LSA to uspine02; uspine02 again sees it
> as a self-originated LSA and repeats the steps above. This anomalous
> sequence is what put our switches into that state.
>
>
>
> It is a timing problem, which is why we do not hit it consistently.
>
> *Root cause:*
>
> Before uspine02 rebooted, it would have been the DR and would have
> generated a network-LSA for this broadcast domain in area 204.
>
> ghcore1 would have received that LSA, treated it as a valid LSA, and
> installed it. ghcore1 then flooded that LSA into area 204 again; by that
> time uspine02 would have re-established its adjacency with the BDR,
> received the same LSA, and treated it as a self-originated LSA.
>
>
>
> *Potential fix based on the RFC 2328 section 13:*
>
> Before flooding the LSA to the broadcast domain, check whether the
> received LSA is self-originated.
>
>
>
> diff -r dc50bb05b29e usr/src/cmd/FRRouting/frr-master/ospfd/ospf_flood.c
> --- a/usr/src/cmd/FRRouting/frr-master/ospfd/ospf_flood.c Tue Nov 05 05:44:37 2019 -0800
> +++ b/usr/src/cmd/FRRouting/frr-master/ospfd/ospf_flood.c Tue Nov 05 08:51:51 2019 -0800
> @@ -925,6 +925,26 @@
>  		old->retransmit_counter--;
>  		ospf_lsdb_delete(&nbr->ls_rxmt, old);
>  	}
> +	/*
> +	 * Please refer to section 13.1 in RFC 2328.
> +	 * The flooding procedure is not applicable to a
> +	 * self-originated LSA. Unfortunately, a self-originated
> +	 * LSA ended up being added to the retransmit list
> +	 * through the flood caller.
> +	 * While adding this LSA to the retransmit list, we
> +	 * need to check whether it is self-originated.
> +	 * If it is, it should be removed from the lsdb and
> +	 * should not be added to the retransmit list.
> +	 */
> +	if (ospf_lsa_is_self_originated(nbr->oi->ospf, lsa)) {
> +		if (IS_DEBUG_OSPF(lsa, LSA_FLOODING))
> +			zlog_debug("self originated RXmtL(%lu)++,"
> +				   " NBR(%s), LSA[%s]",
> +				   ospf_ls_retransmit_count(nbr),
> +				   inet_ntoa(nbr->router_id),
> +				   dump_lsa_key(lsa));
> +		return;
> +	}
>
>
>
>
>
> Thanks
>
> Palpandi P
>
>
>
>
>