flor design 0

I was approached on Twitter by Benjamin Dauvergne asking me about flor's design.

@jmettraux est-ce qu'on pourra avoir un jour un article de blog sur le design de ruote/flor ? :) Je n'ai jamais rien vu d'équivalent.

— Benjamin Dauvergne (@b_dauvergne) October 5, 2017

Could we have one day a blog post on the design of ruote/flor? :) I never saw anything equivalent.

Flor is a Ruby workflow engine. It is a new iteration of ruote which is now dead.

 

Ou des pointeurs sur l'inspiration, j'ai toujours du mal avec l'exécuteur d'expressions/message, on dirait un design tuplespace/whiteboard.

— Benjamin Dauvergne (@b_dauvergne) October 5, 2017

Or pointers on the inspiration, I always had troubles with the expressions/message executor, it looks like a tuplespace/whiteboard design.

(An entry point on tuple spaces: https://en.wikipedia.org/wiki/Tuple_space)

I pointed at flor's nascent documentation, but, well, that isn't a blog post, it's an index, not a gate.

 

Vu, mais j'ai toujours du mal avec les exécutions, les noeuds et les messages. Il faudrait un exemple concret du début à la fin.

— Benjamin Dauvergne (@b_dauvergne) October 16, 2017

Seen, but I still have trouble with the executions, the nodes and the messages. A concrete example, going from start to finish, would help.

 

Tel exécution lance tel message, tel fichier JSON est posé, etc.. :) Désolé si j'abuse un peu, mais ça parait assez magique.

— Benjamin Dauvergne (@b_dauvergne) October 16, 2017

This execution emits that message, this JSON file is produced, etc... :) Sorry if I push too far, but that looks quite magical.

 

ruote or flor

Flor builds on ruote. I only want to use/discuss ruote's design in order to explain the design and the design decisions around flor.

Flor may be thought of as a shrinking of ruote, a simplification. A workflow engine must be extremely robust, at least it must let executions be fixed with relative ease. That shrinking / simplification should go in that way.

executions, nodes, messages

Benjamin is mentioning among others, those three flor concepts, executions, nodes and messages. There is a flor glossary, it's currently incomplete. Those three concepts deserve some kind of explanation.

An execution is an instance of a flow. A flow is a [business] process definition. There can be many executions of a flow. You create an execution by handing a launch message to a flor scheduler. A launch message is an "execute" message. The two main messages in flor are "execute" messages and "receive" messages. The execute messages ensure the buildup of nodes, the receive messages accompany the termination of nodes.

Here is an example flow:

sequence
  task 'alice' 'stage a'
  task 'bob' 'stage b'

Executing it, in other words, creating an execution of it, starts with an "execute" message, with the given flow, translated into a[n abstract syntax] tree, being handed to the flor scheduler.

The scheduler will create a temporary component, an executor to run the execution. The executor will consider the initial execute message and create the corresponding node with a "nid" of "0" (root node). The node is a storage artefact, the executor then hands message and node to the "sequence" procedure, where the behaviour of a sequence is determined (execute children 0, wait for its receive message, execute children 1, wait..., when there are no more children, send receive message to parent node).

Execute messages build up the execution (seen by the addition of nodes), while receive message are symptoms of the execution building down (usually receive messages go hand in hand with the executor removing nodes from the execution).

I write that the executor is a temporary component because it only lives for the time of a session, when there are no more messages to proceed, the executor ends. When new messages show up, the scheduler will create a new executor who will pick up the execution where the previous executor left it (in the database).

The vanilla example of a "session" would be, following the flow above, the run from the sequence to its first child, the task emitted to the 'alice' tasker. When the "task" message has been handed to alice, there are no more messages to proceed, the session ends, the executor finishes.

a [hopefully] concrete example

Thanks to Asciinema and kinocompact, here is a run of the above flow. When the executions happen, there are summaries of the messages emitted to stdout, I hope it's not too confusing.

I extracted the messages for the three sessions fo the "rucho" execution. The "exe" lines are for "execute" messages. The "rec" lines are for "receive" messages. The "tas" lines are for "task" messages. Each of the below "ucho" lines are summaries of a single message. Note the node ids, 0, 00, 01, etc...

Here is the first session, where the execution is launched. It ends with 2 nodes (sequence 0 and alice tasker 0_0) and a task handed to alice. There are multiple "tas" messages since the executor discusses with the ganger to determine what tasker should receive this task labelled "alice".

I had better call those sessions "runs", as you can see in the logs, that's the word flor uses.


    /--- run starts Flor::UnitExecutor 70115976477080 shell-cli-20171021.0830.mechulorucho
    |   {:thread=>70115976196240}
    |   {:counters=>{}, :nodes=>0, :size=>-1}
   ucho 0 exe [sequence L2] [task,[[_att,[[_sqs,alpha,3]],3],[_att,[[_sqs,"s... m1s_" f.ret null vars:var0,_path,root,name
   ucho 0_0 exe [task L3] [_att,[[_sqs,alpha,3]],3],[_att,[[_sqs,"stage a"... m2s1r1>1 from 0 f.ret null
   ucho 0_0_0 exe [_att L3] [_sqs,alpha,3] m3s2r1>1 from 0_0 f.ret null
   ucho 0_0_0_0 exe [_sqs L3] alpha m4s3r1>1 from 0_0_0 f.ret alpha
   ucho 0_0_0 rec [_att L3] hp:_att m5s4r1>1 from 0_0_0_0 f.ret alpha
   ucho 0_0 rec [task L3] hp:task m6s5r1>1 from 0_0_0 f.ret alpha
   ucho 0_0_1 exe [_att L3] [_sqs,"stage a",3] m7s6r1>1 from 0_0 f.ret alpha
   ucho 0_0_1_0 exe [_sqs L3] "stage a" m8s7r1>1 from 0_0_1 f.ret "stage a"
   ucho 0_0_1 rec [_att L3] hp:_att m9s8r1>1 from 0_0_1_0 f.ret "stage a"
   ucho 0_0 rec [task L3] hp:task m10s9r1>1 from 0_0_1 f.ret "stage a"
   ucho 0_0 tas [task L3] hp:task m11s10r1>1 from 0_0 f.ret "stage a"
   ucho 0_0 tas [task L3] hp:task m12s10r1>1 from 0_0 f.ret "stage a"
    |   run ends Flor::Logger 70115974783520 shell-cli-20171021.0830.mechulorucho
    |   {:started=>"2017-10-21T08:30:48.174253Z", :took=>0.019922}
    |   {:thread=>70115976196240, :consumed=>12, :traps=>0}
    |   {:counters=>{"runs"=>1, "msgs"=>12, "omsgs"=>0}, :nodes=>2, :size=>465}
    \--- .

The executor ends, its run over, alice has its task. The execution (the sum of nodes) has been saved to the database.

As you've seen in the asciinema above, I then return this task to the execution (a "return" message placed in the flor scheduler database). The scheduler notices it, instantiates an executor, which removes the alice task node, sends a "rec" message to the sequence, which then sends an "exe" message to its 0_1 child, the bob task.


    /--- run starts Flor::UnitExecutor 70115975852240 shell-cli-20171021.0830.mechulorucho
    |   {:thread=>70115975803940}
    |   {:counters=>{"runs"=>1, "msgs"=>12, "omsgs"=>0}, :nodes=>2, :size=>465}
   ucho 0_0 ret [task L3] hp:task m13s_ f.ret "stage a"
   ucho 0_0 rec [task L3] hp:task m14s_r2>2 f.ret "stage a"
   ucho 0 rec [sequence L2] hp:sequence m15s14r2>2 from 0_0 f.ret "stage a" vars:var0,_path,root,name
   ucho 0_1 exe [task L4] [_att,[[_sqs,bravo,4]],4],[_att,[[_sqs,"stage b"... m16s15r2>2 from 0 f.ret "stage a"
   ucho 0_1_0 exe [_att L4] [_sqs,bravo,4] m17s16r2>2 from 0_1 f.ret "stage a"
   ucho 0_1_0_0 exe [_sqs L4] bravo m18s17r2>2 from 0_1_0 f.ret bravo
   ucho 0_1_0 rec [_att L4] hp:_att m19s18r2>2 from 0_1_0_0 f.ret bravo
   ucho 0_1 rec [task L4] hp:task m20s19r2>2 from 0_1_0 f.ret bravo
   ucho 0_1_1 exe [_att L4] [_sqs,"stage b",4] m21s20r2>2 from 0_1 f.ret bravo
   ucho 0_1_1_0 exe [_sqs L4] "stage b" m22s21r2>2 from 0_1_1 f.ret "stage b"
   ucho 0_1_1 rec [_att L4] hp:_att m23s22r2>2 from 0_1_1_0 f.ret "stage b"
   ucho 0_1 rec [task L4] hp:task m24s23r2>2 from 0_1_1 f.ret "stage b"
   ucho 0_1 tas [task L4] hp:task m25s24r2>2 from 0_1 f.ret "stage b"
   ucho 0_1 tas [task L4] hp:task m26s24r2>2 from 0_1 f.ret "stage b"
    |   run ends Flor::Logger 70115974783520 shell-cli-20171021.0830.mechulorucho
    |   {:started=>"2017-10-21T08:36:50.418712Z", :took=>0.012553}
    |   {:thread=>70115975803940, :consumed=>14, :traps=>0}
    |   {:counters=>{"runs"=>2, "msgs"=>26, "omsgs"=>0}, :nodes=>2, :size=>500}
    \--- .

The execution run ends as bob received a "tas" message. The execution has been saved to the database.

I then return the bob task to the execution with a "ret" message. The scheduler notices the message, instatiates an executor, which removes the bob task node and sends a "rec" message to the sequence 0. The sequence is supposed to send an execute message to its next child, but there are no more children, it's supposed to send a "rec" message to its parent node then. There are no parent node, the sequence being the root node. The execution ends.


    /--- run starts Flor::UnitExecutor 70115974805760 shell-cli-20171021.0830.mechulorucho
    |   {:thread=>70115974734660}
    |   {:counters=>{"runs"=>2, "msgs"=>26, "omsgs"=>0}, :nodes=>2, :size=>500}
   ucho 0_1 ret [task L4] hp:task m27s_ f.ret "stage b"
   ucho 0_1 rec [task L4] hp:task m28s_r3>3 f.ret "stage b"
   ucho 0 rec [sequence L2] hp:sequence m29s28r3>3 from 0_1 f.ret "stage b" vars:var0,_path,root,name
   ucho rec m30s29r3>3 from 0 f.ret "stage b"
   ucho ter m31s30r3>3 from 0 f.ret "stage b"
    |   run ends Flor::Logger 70115974783520 shell-cli-20171021.0830.mechulorucho
    |   {:started=>"2017-10-21T08:42:46.923942Z", :took=>0.414913}
    |   {:thread=>70115974734660, :consumed=>5, :traps=>0}
    |   {:counters=>{"runs"=>3, "msgs"=>31, "omsgs"=>0}, :nodes=>1, :size=>457}
    \--- .

Benjamin, further questions via Twitter, the flor mailing list or the flor chat are welcome. I'll try my best to write a response blog post if necessary.

Thanks for asking.