I'll try to answer in-line. The questions were:
1. I see that there is no provision for an asynchronous response in dplane_thread_loop(). As soon as we complete sending routes (contexts), we start processing responses in zdplane_info.dg_results_cb(). By asynchronous response, we mean that once the provider's work function is called, it needs to wait for the reply; the reply will come later.
We don't want the dataplane pthread to "wait" - we don't want to block or stop doing work in that pthread. The expectation is that a provider plugin would use the existing event-delivery mechanisms in libfrr (see lib/thread.h) to schedule callbacks if the plugin needs to do asynchronous work. That's why the 'thread_master' is available to the plugins. It's also possible for a plugin to spawn and manage its own pthread - that's also supported. The plugin would need to specify that its in-bound and out-bound queues need to use a lock; there's an API for that too.
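For illustration, here's a rough sketch of that pattern (not a definitive implementation). It assumes the provider registered with the DPLANE_PROV_FLAG_THREADED flag so its queues are lock-protected, and that it saved its provider handle and a thread_master at startup. The names my_prov, my_master, and the my_* functions are placeholders, and exact signatures vary across FRR versions - check zebra/zebra_dplane.h and lib/thread.h:

#include "lib/thread.h"
#include "zebra/zebra_dplane.h"

/* Hypothetical plugin state: provider handle saved at registration,
 * and the thread_master used for the plugin's event callbacks.
 */
static struct zebra_dplane_provider *my_prov;
static struct thread_master *my_master;

/* Completion callback: runs later, once the external dataplane has
 * replied, and hands the finished context back.
 */
static int my_async_complete(struct thread *t)
{
	struct zebra_dplane_ctx *ctx = THREAD_ARG(t);

	/* Record the outcome and enqueue the context outbound */
	dplane_ctx_set_status(ctx, ZEBRA_DPLANE_REQUEST_SUCCESS);
	dplane_provider_enqueue_out_ctx(my_prov, ctx);

	/* Wake the dataplane pthread so it collects the outbound ctx;
	 * since this may run on another pthread, the queues must be
	 * registered as locked (DPLANE_PROV_FLAG_THREADED).
	 */
	dplane_provider_work_ready();

	return 0;
}

/* Provider work callback: dequeue contexts and issue requests
 * without blocking; completions arrive later as scheduled events.
 */
static int my_work_func(struct zebra_dplane_provider *prov)
{
	struct zebra_dplane_ctx *ctx;
	int counter, limit = dplane_provider_get_work_limit(prov);

	for (counter = 0; counter < limit; counter++) {
		ctx = dplane_provider_dequeue_in_ctx(prov);
		if (ctx == NULL)
			break;

		/* Send the request to the external dataplane here
		 * (not shown). For the sketch we schedule the
		 * completion immediately; a real plugin would invoke
		 * my_async_complete() from its reply handler instead.
		 */
		thread_add_event(my_master, my_async_complete, ctx, 0,
				 NULL);
	}

	return 0;
}

A plugin that manages its own pthread would typically create it with the helpers in lib/frr_pthread.h and use that pthread's thread_master as my_master above.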
2. dplane_provider_enqueue_to_zebra() is not called anywhere. Can we call this on receiving an asynchronous update from the provider, with the proper op-code? How do we get the ctx in this function? Will it be available in rib_dplane_q?
Yes, this exists so that a plugin can receive some information from ... somewhere, create a context, and enqueue it for processing in the main zebra context. There is a range of APIs to create and manipulate context objects - route and LSP contexts in particular at this time. And of course, if there's some context-related API that should be added, feel free to ask about it (or offer a PR). The background for this is that my company, at least, runs frr on a different platform (different hardware, different OS) than the actual packet-forwarding hardware. We run frr in cloud-hosted containers, but use whitebox switches to perform actual transit forwarding. We use a plugin to convey route (and LSP, etc.) changes towards the forwarding switches, and that plugin can also receive updates as the switches detect relevant events that affect routes.
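As a concrete (hypothetical) sketch of that receive path, assuming helper names like dplane_ctx_alloc() and dplane_ctx_set_op() - the exact context-manipulation APIs should be checked against zebra/zebra_dplane.h in your FRR version:

#include "lib/prefix.h"
#include "lib/vrf.h"
#include "zebra/zebra_dplane.h"

/* Hypothetical handler, invoked when the forwarding switch reports an
 * event that affects a route; the parameters are illustrative.
 */
static void my_handle_switch_event(const struct prefix *p,
				   vrf_id_t vrf_id, bool usable)
{
	struct zebra_dplane_ctx *ctx;

	/* Allocate a fresh context (assumed allocator name) */
	ctx = dplane_ctx_alloc();

	/* Mark it as a route notification, and record this plugin as
	 * the originating 'notif_provider' (see question 3 below);
	 * dplane_ctx_set_op() is an assumed setter.
	 */
	dplane_ctx_set_op(ctx, DPLANE_OP_ROUTE_NOTIFY);
	dplane_ctx_set_notif_provider(ctx,
				      dplane_provider_get_id(my_prov));

	/* ... populate the prefix, vrf, and nexthop status from the
	 * event (p, vrf_id, usable) via the ctx APIs ...
	 */

	/* Hand the context to the zebra main pthread: it is delivered
	 * through the results callback, queued on rib_dplane_q, and
	 * processed in zebra_rib.c.
	 */
	dplane_provider_enqueue_to_zebra(ctx);
}

So yes - a context enqueued this way ends up on rib_dplane_q and is processed like any other result in the zebra main pthread.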
3. I did not understand ctx->zd_notif_provider.
The notification path is entirely optional, and the default "kernel" plugin/provider does not use it (currently). The basic case is this:

1. Some part of the dataplane is external, running in a different context (different host, different OS) than zebra itself.
2. The external dataplane detects some change to a route - a nexthop transitions between usable/installed and unusable. That event is sent to a zebra plugin.
3. The plugin creates a notification context for the route, populates it with information about the current status, and enqueues it to zebra.
4. Zebra changes its internal data structures to reflect what has happened at the remote dataplane.
5a. It may be that we want the local kernel to reflect the same state as the remote dataplane. In that case, we trigger a kernel update based on the notification context object. In order to retain a single code path for the delivery and processing of the context objects, that context cycles through the dataplane subsystem again.
5b. The originating plugin can detect that the context does not need to be processed by it (it _is_ the 'notif_provider') - see the sketch below.
5c. The kernel plugin can perform an update based on the context.
5d. When the context returns to the zebra main pthread and the code in zebra_rib.c sees that context, it can detect that its internal data structures have already been updated, and it finishes with the context immediately.
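To make step 5b concrete, here is a minimal sketch of the originating plugin's work callback recognizing and passing through its own notifications. The accessor names (dplane_ctx_get_notif_provider(), dplane_provider_get_id(), the dequeue/enqueue calls) are taken from zebra/zebra_dplane.h as I recall them - verify against your version:

#include "zebra/zebra_dplane.h"

static int my_notif_aware_work_func(struct zebra_dplane_provider *prov)
{
	struct zebra_dplane_ctx *ctx;

	while ((ctx = dplane_provider_dequeue_in_ctx(prov)) != NULL) {
		if (dplane_ctx_get_op(ctx) == DPLANE_OP_ROUTE_NOTIFY &&
		    dplane_ctx_get_notif_provider(ctx) ==
			    dplane_provider_get_id(prov)) {
			/* This context originated here - it _is_ the
			 * notif_provider - so pass it along untouched
			 * for the kernel plugin and zebra to act on.
			 */
			dplane_provider_enqueue_out_ctx(prov, ctx);
			continue;
		}

		/* ... normal processing of other contexts ... */
		dplane_provider_enqueue_out_ctx(prov, ctx);
	}

	return 0;
}

Regards,
Mark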