This tutorial is designed to introduce the concept of Kepler directors, outline the different directors that can be used, and present example workflows.
This tutorial assumes that you have basic Kepler knowledge and can use Kepler on the Gateway machine. Please see the 01. Introduction to Kepler - basics tutorial if you need more information.
A Kepler director is a specific type of Kepler actor which controls (or directs) the execution of a workflow. As discussed in the basic training, Kepler actors provide the functionality (what processing occurs in a workflow), whereas directors orchestrate and execute actors to run the whole workflow (when something happens). The actors take their execution instructions from the director.
Every workflow must have a director that controls the execution of the workflow using a particular model, or domain, of computation. For example, a workflow can be synchronous (i.e. with processing occurring one actor at a time in a pre-defined pattern), for which the SDF Director would be ideal. Alternatively, if a workflow requires actors to be executed in parallel (i.e. one or more actors running simultaneously), the PN Director would be used. The director for a workflow is chosen at design time, depending on the particular requirements of the workflow being created.
Kepler has a number of pre-defined directors, and the underlying Ptolemy II system has more (which are generally not exposed to Kepler users but can be used if required). This tutorial focuses on the three directors detailed in the table below, which cover the requirements of standard scientific workflows.
|Director|Description|
|SDF Director|Generally used to oversee fairly simple, sequential workflows.|
|PN Director|Designed for workflows that require the simultaneous execution of multiple actors.|
|DDF Director|Generally used for workflows that use looping, branching, etc. (but not simultaneous actor execution, for which the PN Director should be used).|
Deciding which director to use depends on what functionality is required for a given workflow. The next sections outline the specifics of the three directors we are focusing on, to give an idea of which one to choose for a given situation. Generally the choice depends on what control-flow/conditional structures the workflow requires, whether multiple actors need to run simultaneously, and what the performance requirements on the workflow are.
The Synchronous Dataflow (SDF) Director executes a single actor at a time with one thread of execution, scheduling the whole workflow execution at the very beginning (with no scope for changing it during execution). Because it precalculates the schedule for actor execution, it is very efficient and consumes few system resources; precalculation also allows a fixed memory size for the workflow execution and ensures the workflow does not deadlock. However, this depends on the flow of execution through actors being fixed and on data consumption and production being constant (i.e. actors always produce and/or consume the same amount and type of data).
Furthermore, the SDF domain cannot have feedback loops (where the output of an actor feeds back into an input of the same actor) as deadlock would occur. Where such loops are required, this can be addressed by adding delay actors which provide the first input to the feedback loop. The SDF Director is therefore best suited to simple tasks which are not conditional. It also requires all actors in the workflow to be connected (unconnected actors are possible, but require some modifications to the director parameters).
By default this director executes the workflow exactly once. This can be changed by altering the director parameters and setting the iterations parameter to the required value.
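The SDF idea of a schedule fixed before execution can be sketched as a toy Python model. This is illustrative only: the actor names, rates, and scheduler here are invented for the example and are not the Kepler or Ptolemy II API.

```python
# Illustrative model of SDF scheduling (not the Kepler/Ptolemy II API).
# Each actor produces/consumes a fixed number of tokens per firing, so the
# director can compute the complete firing order (and bounded buffer sizes)
# once, before any actor runs.

def sdf_schedule(actors, iterations=1):
    """Return a static firing order; rates are fixed, so this is
    computed once, before execution begins."""
    order = []
    for _ in range(iterations):
        order.extend(actors)           # one "basic iteration" of the graph
    return order

# A toy two-actor pipeline: ramp -> doubler, one token per firing.
buffer = []                            # channel between the two actors
results = []
state = {"n": 0}

def ramp():                            # produces exactly one token per firing
    buffer.append(state["n"])
    state["n"] += 1

def doubler():                         # consumes exactly one token per firing
    results.append(buffer.pop(0) * 2)

for fire in sdf_schedule([ramp, doubler], iterations=3):
    fire()

print(results)                         # [0, 2, 4]
```

Note that setting `iterations=3` here mirrors the director's iterations parameter: the same pre-computed basic iteration is simply repeated three times.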
The Dynamic Dataflow (DDF) Director is often used for workflows that require looping, branching, or other control structures, but that do not require parallel processing (in which case a PN Director should be used). It schedules execution iteratively by searching for ready actors, which means that parts of the workflow can be executed conditionally, repeatedly, etc. By default it uses a set of rules that determine how to execute actors in a "basic iteration". Unlike the SDF Director, which calculates before execution begins the order in which actors execute and how many times each is executed, the DDF Director determines which actors to run at runtime, and the amount of data consumed and produced by each actor can vary in each basic iteration. This makes for a very flexible workflow, and as such it is one of the most commonly used directors.
In the DDF domain each actor has a set of firing rules and can be executed if one of them is satisfied (i.e. it is possible to create a set of different execution triggers based on different inputs being available). It is generally used when conditional or flow-control constructs are required, such as if-else or while/do-while, but where there is no requirement for actors to be executed concurrently.
Under a Process Network (PN) director, every actor gets its own execution thread and the director does not pre-define the execution schedule (or firing schedule) of the actors (as is done in the SDF director). The workflow is driven by when data is available: tokens are created on output ports whenever input tokens are available and output can be calculated. Execution is only finished when there are no new data token sources anywhere in the workflow.
Because PN workflows are very loosely coupled, they are natural candidates for managing workflows that require parallel processing on distributed computing systems. However, the PN Director is potentially the most difficult and error-prone director to use, as it does not guarantee any synchronisation or determinism. It is also potentially one of the most powerful Kepler domains, as there are few restrictions, but it can be inefficient because the director must keep checking for actors that have sufficient data to execute.
The same execution process that gives the PN Director its flexibility can also lead to some unexpected results: workflows may refuse to automatically terminate because data is always generated and available to receiving actors (i.e. a constant actor will always produce data by default). If one actor fires at a much higher rate than another, a receiving actor's memory buffer may overflow, causing the workflow execution to fail.
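The PN model of one thread per actor, driven by data availability, can be sketched in Python with threads and a blocking queue. This is an illustrative model only (the sentinel-based shutdown is a simplification; as noted above, real PN workflows can be hard to terminate because termination must be detected across the whole network).

```python
# Illustrative model of PN execution (not the Kepler API): each actor runs
# in its own thread and blocks on its input channel until a token arrives.
import queue
import threading

chan = queue.Queue()                   # channel with a blocking read
results = []

def producer():
    for n in range(5):
        chan.put(n)                    # emits a token whenever one is ready
    chan.put(None)                     # sentinel: simplification standing in
                                       # for real PN termination detection

def consumer():
    while True:
        token = chan.get()             # blocks until a token is available
        if token is None:
            break
        results.append(token * token)

threads = [threading.Thread(target=producer),
           threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)                         # [0, 1, 4, 9, 16]
```

An unbounded `queue.Queue` also illustrates the buffer-overflow risk mentioned above: if the producer fired much faster than the consumer for long enough, the channel would grow without limit.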
The PN director is generally not necessary for scientific workflows so we have not provided an example of using this director. However, if you are interested in using PN or think you have a requirement for it please contact EUFORIA support for more information.