Flow DSL vs Graph

Semantics, representation, and originality boundaries for the project's Flow model.

Thesis

For agent orchestration, Flow should be the semantic source of truth, YAML and JSON should be the portable representation, and graph should be a derived view for visualization and execution inspection.

This is the key distinction:

Graph-first systems are excellent at showing topology.
Flow-first systems are better at defining composition semantics.

Agent orchestration usually fails on semantic boundaries, not on drawing nodes and edges.

Questions this document answers

Why is Flow a better canonical representation than graph for agent orchestration?
Why can Flow be expressed precisely in YAML and JSON?
What exactly is original about this repository's Flow model, and what is not?
What is the smallest useful semantic core for a Flow YAML DSL?

Executive summary

1. YAML and JSON can represent Flow precisely

They can, provided Flow is defined as a closed algebraic data type (ADT) with explicit variants and explicit composition rules.

Once the model is defined in terms of variants such as:

Agent
FlatMap
Zip / Broadcast / Race / AtLeast
Branch
RecoverWith / FallbackTo
Loop
Notify / Tap / Guard

then YAML and JSON are no longer vague configuration formats. They are serialization formats for the same syntax tree.
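As a rough illustration of what a closed ADT could look like in Python, the sketch below models four of the variants as frozen dataclasses joined into one union type. The class names, field names, and the `describe` helper are assumptions made for this example, not the repository's actual definitions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Agent:
    name: str  # leaf: invoke one named agent


@dataclass(frozen=True)
class FlatMap:
    first: Flow   # runs first
    second: Flow  # receives the output of `first`


@dataclass(frozen=True)
class Broadcast:
    branches: tuple[Flow, ...]  # every branch receives the same input


@dataclass(frozen=True)
class FallbackTo:
    source: Flow
    fallback: Flow  # tried with the original input if `source` fails


# The union is closed: a Flow is exactly one of these variants (hypothetical subset).
Flow = Union[Agent, FlatMap, Broadcast, FallbackTo]


def describe(flow: Flow) -> str:
    """Render a flow as a one-line expression by visiting each variant."""
    if isinstance(flow, Agent):
        return flow.name
    if isinstance(flow, FlatMap):
        return f"({describe(flow.first)} >> {describe(flow.second)})"
    if isinstance(flow, Broadcast):
        return "all(" + ", ".join(describe(b) for b in flow.branches) + ")"
    if isinstance(flow, FallbackTo):
        return f"({describe(flow.source)} or-else {describe(flow.fallback)})"
    raise TypeError(f"not a Flow variant: {flow!r}")
```

Because every consumer dispatches on the same finite set of variants, a serializer, a validator, and an interpreter can all be written as functions shaped like `describe`.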

2. Flow is a better semantic source of truth than graph

Flow is the better anchor not because it is more abstract, but because it makes the important parts explicit:

input and output types
composition rules
data propagation rules
failure propagation rules
recovery semantics
loop termination semantics
side-effect boundaries

Graph can carry some of this information, but usually only indirectly, through node metadata, edge annotations, and engine-specific conventions. That works for visualization, but it is a weak place to anchor semantics.

3. Graph is still essential

This is not an anti-graph argument.

Graph remains the best surface for:

editors
runtime inspection
Mermaid diagrams
trace playback
execution-plan visualization

The recommendation is not Flow instead of graph. It is Flow under graph.

Flow DSL / ADT      = semantic layer (source of truth)
YAML / JSON         = storage and transport layer
Graph / Mermaid DAG = derived projection layer
Interpreter         = execution semantics

Originality boundary

This section matters because originality claims are easy to overstate.

What this repository is not claiming

It is not claiming to have invented any of the following general ideas:

flow
workflow
dataflow
DAG orchestration
reactive pipelines

Those ideas all have substantial prior history.

What this repository is claiming

The originality claim is about a specific Flow model for agent orchestration.

The model combines these properties into one coherent semantic system:

Flow[I, O] as a typed ADT, not just a runtime graph
equivalent Python, YAML, and JSON representations over one semantic core
graph as a projection, not as the semantic authority
an actor-native interpreter that maps combinators onto the actor runtime
first-class primitives for recovery, quorum, loop, guard, and side-channel behavior

That is the appropriate scope for the originality claim.

Recommended wording

Use language like this:

This repository introduces an original Flow model for agent orchestration: a typed Flow[I, O] semantic core with equivalent Python, YAML, and JSON representations, where graph is a derived view rather than the source of truth.

Avoid language like this:

“we invented flow”
“we invented workflow orchestration”
“we are the first to represent orchestration without graphs”

Why Flow can be represented precisely in YAML and JSON

“Precisely represented” means more than “serializable somehow”. It means the language has a stable, reversible, and checkable semantic mapping.

1. The structure is closed and enumerable

The combinator set is finite. A parser can map each YAML or JSON fragment onto a known ADT variant instead of guessing behavior from free-form strings.

For example:

steps means sequential composition
all means shared-input broadcast parallelism
each means split-input distributed parallelism
race means the first flow to complete wins
at_least means quorum semantics
branch means typed or tagged dispatch
recover_with means recovery through another flow
fallback_to means retry with the original input on an alternate path
loop means explicit iterative control
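The keyword-to-variant mapping above can be sketched as a small parser over already-decoded YAML or JSON data (plain dicts and lists). The function name `parse_flow` and the tagged-tuple node shape are hypothetical stand-ins for real ADT variants, and only a subset of the keywords is handled.

```python
from typing import Any

# Keywords whose body is a list of subflows.
LIST_KEYWORDS = ("steps", "all", "each", "race")


def parse_flow(node: Any) -> tuple:
    """Map one decoded YAML/JSON fragment onto a tagged tuple."""
    if not isinstance(node, dict):
        raise ValueError(f"expected a mapping, got {type(node).__name__}")
    # One combinator keyword per node keeps the mapping unambiguous.
    if len(node) != 1:
        raise ValueError(f"expected a single combinator keyword, got {sorted(node)}")
    keyword, body = next(iter(node.items()))
    if keyword == "agent":
        return ("agent", body)
    if keyword in LIST_KEYWORDS:
        return (keyword, tuple(parse_flow(child) for child in body))
    if keyword == "fallback_to":
        return ("fallback_to", parse_flow(body["source"]), parse_flow(body["fallback"]))
    # Unknown keywords are rejected rather than interpreted.
    raise ValueError(f"unknown combinator: {keyword!r}")
```

The point of the sketch is the last line: a fragment that is not a known variant is an error, never a guess.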

2. Tree-shaped data matches composition naturally

Flow is fundamentally a tree of compositions.

YAML and JSON are already good at representing:

arrays for ordered steps
objects for named parameters
nested objects for subflows
dictionaries for branch mappings

Example:

steps:
  - agent: Search
  - all:
      - agent: Analyst
      - agent: Summarizer
  - fallback_to:
      source: {agent: Writer}
      fallback: {agent: BackupWriter}

This does not merely store topology. It stores ordered semantics: first sequence, then broadcast parallelism, then failure fallback.

3. The representation is statically checkable

Once Flow is an ADT, uploaded YAML can be validated before execution.

Examples:

flat_map(A, B): A.O == B.I
race(A, B): same input type, same output type
branch(source, mapping): branch keys must match the dispatch space of source.O
guard(A, check): check must return bool

Graph systems can add validation too, but they often add it as a separate layer on top of an existing graph model. Flow makes the validation rules native to the model.
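Two of the rules above can be sketched as plain functions over flow signatures. This is a minimal illustration that compares type names as strings; `Sig`, `FlowTypeError`, `check_flat_map`, and `check_race` are hypothetical names, and a real checker would presumably compare actual types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sig:
    """Input and output type names of a flow."""
    i: str
    o: str


class FlowTypeError(Exception):
    """Raised when a composition rule is violated."""


def check_flat_map(a: Sig, b: Sig) -> Sig:
    # flat_map(A, B): A.O == B.I
    if a.o != b.i:
        raise FlowTypeError(f"flat_map: output {a.o} does not match input {b.i}")
    return Sig(a.i, b.o)


def check_race(a: Sig, b: Sig) -> Sig:
    # race(A, B): same input type, same output type
    if a != b:
        raise FlowTypeError(f"race: {a} and {b} must have identical signatures")
    return a
```

Each check returns the signature of the composed flow, so checks can be nested the same way the flows are.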

4. The mappings can be reversible

The ideal relationship is:

YAML <-> Flow ADT <-> JSON

That only works if each layer carries the same semantics, rather than splitting meaning across:

YAML structure
graph annotations
runtime-only inference

If meaning is scattered across those layers, the representation is no longer precise. It is patched together.
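Reversibility can be checked mechanically. The sketch below round-trips a two-variant flow through JSON using only the standard library; the YAML side is omitted because a YAML parser is not part of the standard library. The names `to_adt` and `to_data` are assumptions for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str


@dataclass(frozen=True)
class Steps:
    items: tuple


def to_adt(data):
    """Decoded JSON/YAML data -> ADT value."""
    if "agent" in data:
        return Agent(data["agent"])
    if "steps" in data:
        return Steps(tuple(to_adt(d) for d in data["steps"]))
    raise ValueError(f"unknown node: {data!r}")


def to_data(flow):
    """ADT value -> plain data, the inverse of to_adt."""
    if isinstance(flow, Agent):
        return {"agent": flow.name}
    if isinstance(flow, Steps):
        return {"steps": [to_data(f) for f in flow.items]}
    raise TypeError(f"not a flow: {flow!r}")


text = '{"steps": [{"agent": "Search"}, {"agent": "Writer"}]}'
flow = to_adt(json.loads(text))
# Reversibility: serializing and parsing again yields the same ADT value.
assert to_adt(json.loads(json.dumps(to_data(flow)))) == flow
```

The round trip only holds because nothing about the flow lives outside the data structure itself.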

Why graph is a weak semantic source of truth

Graph is strong at expressing adjacency and dependencies.

It is weaker at expressing the semantics that matter most for agents:

Is a value broadcast to every branch, or split across branches?
Does failure trigger retry, fallback, recovery, or cancellation?
Is a side path fire-and-forget, a synchronous tap, or a post-condition guard?
Does a loop feed back the intermediate value, or return a final terminal value?
Is a branch keyed by type, tag, predicate, or string lookup?

Graph can represent these distinctions, but usually with extra node types, extra edge types, or engine-specific metadata. That makes graph an excellent execution view and a poor semantic center.
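The first question in the list can be made concrete. In the sketch below, a broadcast and a split project to exactly the same edges, yet they produce different results. The `edges` and `run` helpers are hypothetical and exist only for this comparison.

```python
from __future__ import annotations


def edges(kind: str, source: str, branches: list[str]) -> set[tuple[str, str]]:
    """Project a fan-out combinator to plain (from, to) edges; `kind` is lost on purpose."""
    return {(source, branch) for branch in branches}


def run(kind: str, value: list[int], branches: list) -> list:
    """Run the fan-out with explicit data semantics."""
    if kind == "all":   # broadcast: every branch sees the whole input
        return [f(value) for f in branches]
    if kind == "each":  # split: branch n sees only element n of the input
        return [f([v]) for f, v in zip(branches, value)]
    raise ValueError(f"unknown fan-out kind: {kind!r}")


# Same topology ...
assert edges("all", "In", ["A", "B"]) == edges("each", "In", ["A", "B"])
# ... different meaning.
assert run("all", [1, 2], [sum, sum]) != run("each", [1, 2], [sum, sum])
```

A graph can recover the distinction only by attaching `kind` as metadata, which is the engine-specific convention described above.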

Why agent orchestration needs Flow semantics more than classic DAGs do

1. Agents exchange typed messages, not just task completion signals

Classic DAGs often care about task dependencies. Agent systems care much more about the shape and meaning of the values flowing between stages.

Questions like these matter constantly:

What exactly does the upstream agent return?
Can the next agent consume it safely?
Should this result be routed to a specialized agent?
Should the same input be sent to several agents for ensemble or quorum behavior?

That makes agent orchestration look more like composable program construction than like generic job scheduling.

2. Failure is part of the primary semantics

In agent systems, these are normal paths, not edge cases:

model timeout
tool failure
structured output violation
guard rejection
partial quorum failure

That is why primitives such as recover_with, fallback_to, guard, and at_least need to be first-class.
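A minimal sketch of three of these primitives as wrappers around plain synchronous callables follows. The signatures and the `GuardRejected` exception are assumptions for this example; the real primitives operate on flows and an actor runtime rather than on bare functions.

```python
from typing import Callable, TypeVar

I = TypeVar("I")
O = TypeVar("O")


class GuardRejected(Exception):
    """Raised when a guard check fails."""


def fallback_to(source: Callable[[I], O], fallback: Callable[[I], O]) -> Callable[[I], O]:
    def run(value: I) -> O:
        try:
            return source(value)
        except Exception:
            return fallback(value)  # the alternate path sees the original input
    return run


def recover_with(source: Callable[[I], O], recovery: Callable[[Exception], O]) -> Callable[[I], O]:
    def run(value: I) -> O:
        try:
            return source(value)
        except Exception as exc:
            return recovery(exc)  # the recovery path sees the exception, not the input
    return run


def guard(source: Callable[[I], O], check: Callable[[O], bool]) -> Callable[[I], O]:
    def run(value: I) -> O:
        out = source(value)
        if not check(out):
            raise GuardRejected(f"guard rejected {out!r}")
        return out  # the value passes through unchanged on success
    return run
```

The difference between `fallback_to` and `recover_with` is only which value the second path receives, and that is precisely the kind of detail a bare "on failure" edge does not carry.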

3. Substitutability matters

Agent systems constantly need local replacement and local recomposition:

swap one agent implementation for another
package a subsequence into a reusable subflow
define a stable YAML skeleton and extend it in Python where callables are needed

This is more natural when the source representation is a program-like ADT instead of a graph editing surface.

Recommended layering

The clean architecture is four-layered.

1. Flow ADT

Defines:

the combinator set
type signatures
data propagation
failure propagation
loop termination

This is the semantic specification.

2. YAML / JSON

Handles:

human authoring
versioned storage
cross-process transport
cross-language protocols

This layer should not invent semantics. It should carry Flow.

3. Graph

Handles:

Mermaid export
visual editors
runtime views
trace expansion

Graph should be derived from Flow, not the other way around.
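Deriving the graph can be a pure function of the flow. The sketch below projects a decoded flow containing only `agent`, `steps`, and `all` nodes to Mermaid flowchart text. The function name `to_mermaid` and the generated node ids are assumptions, and the other combinators are left out for brevity.

```python
from __future__ import annotations

from itertools import count


def to_mermaid(flow: dict) -> str:
    """Project a decoded flow (agent / steps / all only) to Mermaid flowchart text."""
    lines = ["flowchart LR"]
    ids = count()

    def emit(node: dict, entry: list[str]) -> list[str]:
        """Emit nodes and edges for `node`; return the node ids later steps attach to."""
        if "agent" in node:
            nid = f"n{next(ids)}"
            lines.append(f'    {nid}["{node["agent"]}"]')
            lines.extend(f"    {src} --> {nid}" for src in entry)
            return [nid]
        if "steps" in node:
            for step in node["steps"]:
                entry = emit(step, entry)  # each step attaches to the previous exits
            return entry
        if "all" in node:
            exits: list[str] = []
            for branch in node["all"]:
                exits.extend(emit(branch, entry))  # every branch shares the same entry
            return exits
        raise ValueError(f"unsupported combinator in {node!r}")

    emit(flow, [])
    return "\n".join(lines)
```

The projection discards information by design: the output shows that two agents fan out from one upstream node, but not whether the input was broadcast or split.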

4. Interpreter

Handles execution:

actor topology
ask / race / zip / supervision
local agent invocation
A2A / gRPC / MCP dispatch

This lets one Flow program target multiple runtimes without changing its top-level semantics.
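To make the separation concrete, here is a toy interpreter that uses asyncio as a stand-in for the actor runtime. It is not the repository's interpreter: the `interpret` function, the `agents` registry, and the dict-based flow encoding are all assumptions, and only `agent`, `steps`, `all`, and `race` are covered.

```python
import asyncio


async def interpret(flow: dict, value, agents: dict):
    """Evaluate a decoded flow against a registry of async agent functions."""
    if "agent" in flow:
        return await agents[flow["agent"]](value)
    if "steps" in flow:
        for step in flow["steps"]:
            value = await interpret(step, value, agents)  # output feeds the next step
        return value
    if "all" in flow:
        # Broadcast: every branch gets the same input; results keep branch order.
        results = await asyncio.gather(*(interpret(b, value, agents) for b in flow["all"]))
        return list(results)
    if "race" in flow:
        tasks = [asyncio.create_task(interpret(b, value, agents)) for b in flow["race"]]
        done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()  # branches that did not finish first are cancelled
        return done.pop().result()
    raise ValueError(f"unknown combinator in {flow!r}")
```

A different runtime would replace the body of each branch (for example, dispatching `agent` over a remote protocol) while the flow document stays unchanged.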

Minimal Flow YAML semantic core

If the Flow DSL is meant to be usable in practice, its semantic core should stay small, but it must still cover the main orchestration needs.

Minimal primitive set

Category	Primitive	Purpose
leaf	`agent`	invoke one agent
sequential	`steps`	pass one stage's output into the next
parallel	`all`	broadcast the same input to multiple flows
parallel	`each`	distribute split inputs to multiple flows
branching	`branch`	route by type or tag
recovery	`fallback_to` / `recover_with`	define failure recovery paths
constraint	`guard`	enforce quality or contract checks
iteration	`loop`	continue until `Done`

Derived capabilities can sit on top of this set:

race
at_least
notify
tap

Minimal data rules

The primitives are not enough by themselves. Data propagation rules must also be explicit.

Primitive	Input rule	Output rule
`agent`	consumes one input `I`	produces one output `O`
`steps`	step `n` feeds step `n+1`	output of final step is the whole output
`all`	broadcasts the same input to every branch	aggregates into a tuple or list
`each`	splits a compound input across branches	aggregates into a tuple or list
`branch`	consumes upstream output and selects one branch	returns selected branch output
`fallback_to`	runs the source flow; on failure, runs the alternate flow with the original input	returns the output of whichever flow succeeds
`recover_with`	passes an exception value to a recovery flow	returns recovery output
`guard`	checks upstream output while preserving the value by default	passes through on success, raises on failure
`loop`	feeds `Continue` back into the next iteration, exits with `Done`	returns the `Done` payload
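The `loop` row can be sketched as follows. `Continue` and `Done` mirror the names used in the table, but their field layout, the `run_loop` function, and the iteration cap are assumptions added for this example.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class Continue(Generic[T]):
    value: T  # fed back as the next iteration's input


@dataclass(frozen=True)
class Done(Generic[U]):
    value: U  # returned as the loop's output


def run_loop(
    body: Callable[[T], Union[Continue[T], Done[U]]],
    seed: T,
    max_iterations: int = 100,
) -> U:
    """Iterate `body` until it returns Done; the cap is a safety assumption."""
    state = seed
    for _ in range(max_iterations):
        result = body(state)
        if isinstance(result, Done):
            return result.value
        state = result.value
    raise RuntimeError(f"loop did not reach Done within {max_iterations} iterations")
```

Termination is visible in the return type of `body`: the loop can only exit through `Done`, so there is no hidden exit condition for a runtime to infer.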

Minimal YAML shape

flow: AnswerPipeline
steps:
  - agent: Retriever
  - branch:
      source: {agent: Classifier}
      mapping:
        SimpleQuestion:
          agent: FastResponder
        ComplexQuestion:
          steps:
            - all:
                - agent: Researcher
                - agent: Analyst
            - agent: Writer
  - guard:
      source: {agent: QualityChecker}
      check: {agent: Acceptable}
  - fallback_to:
      source: {agent: Publisher}
      fallback: {agent: SafePublisher}

Even this small shape already covers sequential composition, branching, broadcast parallelism, guard checks, and fallback behavior.

Minimal static validation set

The first validator does not need to be huge. It does need to be strict where the semantics matter.

Adjacent entries in steps must type-check end-to-end: each step's output type must match the next step's input type.
Every all branch must accept the same input type.
each inputs must split cleanly across all child branches.
branch.mapping must be non-empty and must match the dispatch space of source.O.
fallback_to source and fallback flows must share input and output types.
recover_with must accept an exception-shaped input.
guard.check must return bool.
loop.body must return Continue[T] | Done[U].

With these rules in place, the YAML is no longer “just config”. It is a statically bounded orchestration language.
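A first validator along these lines might infer a signature for each node and fail on the first violated rule. The sketch below covers only the `steps`, `all`, and `fallback_to` rules; the `infer` function, the `sigs` registry of agent signatures, and the use of string type names are assumptions for illustration.

```python
def infer(flow: dict, sigs: dict) -> tuple:
    """Return the (input, output) type names of a decoded flow, or raise ValueError.

    `sigs` maps agent names to (input, output) type names.
    """
    if "agent" in flow:
        return sigs[flow["agent"]]
    if "steps" in flow:
        types = [infer(step, sigs) for step in flow["steps"]]
        if not types:
            raise ValueError("steps: must be non-empty")  # assumption, not a listed rule
        for (_, out), (inp, _) in zip(types, types[1:]):
            if out != inp:
                raise ValueError(f"steps: {out} does not feed {inp}")
        return (types[0][0], types[-1][1])
    if "all" in flow:
        types = [infer(branch, sigs) for branch in flow["all"]]
        if len({inp for inp, _ in types}) != 1:
            raise ValueError("all: branches must accept the same input type")
        # The aggregated output is modeled as a tuple of the branch outputs.
        return (types[0][0], tuple(out for _, out in types))
    if "fallback_to" in flow:
        body = flow["fallback_to"]
        source, fallback = infer(body["source"], sigs), infer(body["fallback"], sigs)
        if source != fallback:
            raise ValueError("fallback_to: source and fallback signatures differ")
        return source
    raise ValueError(f"unsupported combinator in {flow!r}")
```

Because `infer` runs on the decoded document alone, an uploaded YAML file can be rejected before any agent is invoked.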

Implications for this repository

The current repository direction is coherent.

The design already points toward:

Flow[I, O] as the ADT
Python, YAML, and JSON as equivalent representations
Mermaid graph as a visualization product
pre-execution type validation

That combination matters because it concentrates semantics in one place while still allowing portability, validation, and graph-based tooling.

Bottom line

If the question is:

Can Flow be represented precisely in JSON and YAML?

the answer is yes, and that is one reason it works well as a canonical orchestration IR.