Architecture Specification

Concord

A library of contracts where every action — deterministic or agentic — moves in agreement with policy, state, and audit. A durable runtime executes; Concord declares what the work means.

Version: 0.3 · functional core

Runtime: DBOS (default)

System of record: Postgres

Scope: Semantic primitive layer

Why: Problems Concord solves ↗

Positioning: What Concord is & isn't ↗

Blog: Three-part introduction ↗

Section 01 Executive summary

Concord is a library of contracts that turn any user request, system event, webhook, scheduled job, or agent action into a durable, inspectable workflow. The library declares what work means and what governance it requires; a durable runtime — DBOS by default — executes it.

Why Concord exists

Modern applications now coordinate agents, humans, tools, connectors, memory, approvals, artifacts, and durable workflows — but each piece usually has its own local model. Agent frameworks govern the agent loop. Workflow runtimes govern execution. Connector systems expose integrations. Policy engines decide rules. Observability tools record traces.

But no shared contract explains what an action means, who authorized it, what it touched, what it produced, and how it should be audited. Concord provides that missing semantic contract layer. For the full problem statement — ten concrete pains, one per chapter — see The problems Concord solves ↗.

Positioning

For a longer-form treatment of what Concord is and what it isn't — including a comparison across durable runtimes, agent frameworks, BPM platforms, DevOps systems, policy engines, data DAG orchestrators, observability tools, iPaaS, and memory systems — see What Concord is & isn't ↗.

Try a worked example

vendor-data-sync↗

scheduled · deterministic · retry taxonomy

hotel-booking↗

approval-gated · compensation · idempotency

revenue-investigation-swarm↗

agentic · swarm · governed tools

The core idea is simple:

Everything begins as ingress. Ingress becomes a command. A command is evaluated by policy. Policy produces a plan. A plan executes through primitives. Primitives mutate durable state. Every important transition is audited.

The core loop

flowchart LR
  A([Ingress]) --> B([Command])
  B --> C{{Policy}}
  C --> D([Plan])
  D --> E[Primitives]
  E --> F[(Durable state)]
  F --> G[/Audit/]
  style C fill:#F5E0D2,stroke:#D97757,stroke-width:1.5px
  style F fill:#FFFFFF,stroke:#141413,stroke-width:1.5px
  style G fill:#F1F2EC,stroke:#6B7B5A

The framework must support both:

Mode A

Deterministic workflows

Fixed steps, fixed state transitions, well-defined inputs and outputs.

Mode B

Agentic workflows

Dynamic tool selection, reasoning loops, external connectors, memory retrieval, human review, adaptive plans.

Framework assumes

Postgres is always the durable source of truth. Connectors are replaceable. Execution runtimes are replaceable. Agents are callers of the primitive layer, not owners of the system of record.

Section 02 Design philosophy

Concord is built around a few strong principles.

2.1 · Durable before executable

Do not execute meaningful work before recording intent.

Bad vs. good

flowchart LR
  subgraph Bad ["Bad"]
    direction LR
    a1[User request] --> a2[Execute function] --> a3[Maybe log result]
  end
  subgraph Good ["Good"]
    direction LR
    b1[User request] --> b2[Create command] --> b3[Create audit record] --> b4[Evaluate policy] --> b5[Execute]
  end
  style a1 fill:#FAF8F2,stroke:#B85556
  style a2 fill:#FAF8F2,stroke:#B85556
  style a3 fill:#FAF8F2,stroke:#B85556
  style b1 fill:#FAF8F2,stroke:#6B7B5A
  style b2 fill:#FAF8F2,stroke:#6B7B5A
  style b3 fill:#FAF8F2,stroke:#6B7B5A
  style b4 fill:#FAF8F2,stroke:#6B7B5A
  style b5 fill:#FAF8F2,stroke:#6B7B5A

This makes the system replayable, observable, cancellable, and auditable.
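Here is a minimal sketch of the "good" ordering. It assumes a hypothetical in-memory store standing in for Postgres and a boolean policy. The command and its audit record are persisted before policy runs, and policy runs before any execution. `InMemoryStore`, `submit`, and the callable arguments are illustrative names, not Concord APIs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable
from uuid import uuid4


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the Postgres system of record."""
    commands: dict[str, dict[str, Any]] = field(default_factory=dict)
    audit: list[dict[str, Any]] = field(default_factory=list)


def submit(
    store: InMemoryStore,
    command_type: str,
    payload: dict[str, Any],
    policy: Callable[[str, dict[str, Any]], bool],
    execute: Callable[[dict[str, Any]], Any],
) -> str:
    # 1. Record intent before anything else happens.
    command_id = f"cmd_{uuid4().hex}"
    store.commands[command_id] = {"type": command_type, "payload": payload, "status": "created"}
    # 2. Audit the creation, so a crash right here still leaves evidence.
    store.audit.append({"command_id": command_id, "event": "command.created"})
    # 3. Evaluate policy against the recorded command.
    if not policy(command_type, payload):
        store.commands[command_id]["status"] = "failed"
        store.audit.append({"command_id": command_id, "event": "policy.denied"})
        return command_id
    store.commands[command_id]["status"] = "validated"
    store.audit.append({"command_id": command_id, "event": "policy.allowed"})
    # 4. Only now execute, then persist the outcome.
    store.commands[command_id]["status"] = "running"
    store.commands[command_id]["result"] = execute(payload)
    store.commands[command_id]["status"] = "succeeded"
    store.audit.append({"command_id": command_id, "event": "command.succeeded"})
    return command_id
```

A denied command never reaches `execute`, yet it still leaves a command row and two audit entries behind.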

2.2 · Commands are the center of the system

A command is the durable representation of requested intent. Examples:

generate_report

publish_report

run_sql_validation

capture_user_preference

process_webhook_event

sync_connector_data

ask_human_approval

send_notification

call_agent_tool

Distinction

The command is not the same as the execution step. The command says what is requested. Tasks say how work is executed.

2.3 · State transitions are explicit

Every meaningful workflow state change should be validated and persisted.

created → validated → waiting_for_approval → approved → queued → running → succeeded

Principle

Never infer workflow state only from logs. Logs are evidence. State is the control surface.

2.4 · Execution is replaceable

Concord does not care whether work is executed by:

local Python function

background worker

Databricks job

serverless function

container job

external API

agent tool call

workflow engine

human operator

Execution is an adapter. The core framework only cares about: input, context, command, task run, result, side effects, state transition, audit.

2.5 · Agents are participants, not authorities

Agentic workflows can call tools, propose plans, request approvals, and write memories. But the framework owns:

policy

approval gates

state transitions

artifact writes

memory writes

audit

idempotency

connector permissions

Rule

An agent can propose. The framework decides what is allowed.
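A sketch of this rule, assuming a hypothetical per-role tool allowlist and a set of tools that always need approval. The agent produces only a `ToolProposal`, and the framework's `decide` function turns it into an outcome. Both data tables are illustrative placeholders for declared policy.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToolProposal:
    agent_role: str
    tool_name: str
    arguments: dict[str, Any]


@dataclass(frozen=True)
class Decision:
    outcome: str   # "allow" | "deny" | "require_approval"
    reason: str


# Hypothetical policy data; in Concord this would come from declared policy.
ALLOWED_TOOLS = {"analyst": {"run_sql_validation", "generate_report", "publish_report"}}
APPROVAL_REQUIRED = {"publish_report"}


def decide(proposal: ToolProposal) -> Decision:
    # The agent never grants itself permission; the framework looks it up.
    allowed = ALLOWED_TOOLS.get(proposal.agent_role, set())
    if proposal.tool_name not in allowed:
        return Decision("deny", f"{proposal.agent_role} may not call {proposal.tool_name}")
    if proposal.tool_name in APPROVAL_REQUIRED:
        return Decision("require_approval", f"{proposal.tool_name} needs human approval")
    return Decision("allow", "tool permitted for role")
```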

2.6 · Postgres is the system of record

All durable framework state lives in Postgres. Postgres stores:

commands

workflow state

task runs

queue items

approvals

events

memories

artifacts

connectors

audit logs

policies

plans

agent traces

idempotency keys

This keeps the architecture portable and inspectable.

2.7 · The primitive set should be small and stable

New capabilities should usually be modeled as combinations of primitives, not new primitives.

Example composition

flowchart LR
  X["'Run a weekly connector sync'
is NOT a new primitive"]
  X --> A[schedule ingress]
  A --> B[command]
  B --> C[policy]
  C --> D[async task]
  D --> E[connector call]
  E --> F[artifact / event writes]
  F --> G[audit]
  style X fill:#F5E0D2,stroke:#D97757
  style G fill:#F1F2EC,stroke:#6B7B5A

2.8 · Contracts and mechanics

Concord declares what work means; the durable runtime executes how it runs. Every operation that gets durable execution — commands, agent runs, swarms, connector calls, retries, schedules, cancellations, effects — carries a contract written in Concord and is executed by the runtime. The contract is the meaning: what the action is, what inputs it requires, what side effects it may produce, what error classes apply, what compensations exist, who may cancel it, what audit must be recorded. The mechanics are the execution: when it runs, how it retries, how it queues, how it sleeps, how it recovers.

This separation makes the runtime swappable. Concord's domain layer imports a DurableRuntime protocol (see §41), not the runtime implementation. The default adapter is DBOS; future Temporal or Restate adapters slot in without touching the domain. The contract is what survives across runtimes.
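The DurableRuntime protocol itself is specified in §41. For illustration only, the sketch below assumes a hypothetical three-method surface (`start`, `cancel`, `status`) and an in-memory fake adapter of the kind that is useful in unit tests. The method names and signatures are assumptions, not the §41 definition. The point is that `run_command` depends only on the protocol.

```python
from typing import Any, Callable, Protocol


class DurableRuntime(Protocol):
    """Hypothetical surface; the real protocol is defined in §41."""

    def start(self, workflow_id: str, fn: Callable[..., Any], *args: Any) -> None: ...
    def cancel(self, workflow_id: str) -> None: ...
    def status(self, workflow_id: str) -> str: ...


class InMemoryRuntime:
    """Fake adapter: runs workflows synchronously and tracks runtime status."""

    def __init__(self) -> None:
        self._status: dict[str, str] = {}
        self._results: dict[str, Any] = {}

    def start(self, workflow_id: str, fn: Callable[..., Any], *args: Any) -> None:
        if workflow_id in self._status:
            return  # starting the same workflow id twice is a no-op
        self._status[workflow_id] = "running"
        try:
            self._results[workflow_id] = fn(*args)
            self._status[workflow_id] = "completed"
        except Exception:
            self._status[workflow_id] = "errored"

    def cancel(self, workflow_id: str) -> None:
        if self._status.get(workflow_id) == "running":
            self._status[workflow_id] = "cancelled"

    def status(self, workflow_id: str) -> str:
        return self._status.get(workflow_id, "unknown")


def run_command(runtime: DurableRuntime, command_id: str, fn: Callable[..., Any]) -> str:
    # Domain code sees only the protocol, never a concrete adapter.
    workflow_id = f"wf_{command_id}"
    runtime.start(workflow_id, fn)
    return runtime.status(workflow_id)
```

The strings returned here are runtime statuses. The domain status of the command is tracked separately, as §12 describes.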

The rule

If a question is about meaning, it belongs in Concord. If it is about execution, it belongs in the runtime. When the line is unclear, the contract wins — write the meaning in Concord first, then describe how the runtime should honor it.

Section 03 The primitive model

Concord organizes work into ten primitive families.

Family 01

Ingress

How work enters the system.

Family 02

Intent

Capture what the system is being asked to do.

Family 03

Policy & planning

Decide allowed/denied and produce a plan.

Family 04

Execution

Actually perform the work.

Family 05

Coordination

Make async / distributed work safe.

Family 06

State lifecycle

Track progress with allowed transitions.

Family 07

Human judgment

Approvals, overrides, escalations.

Family 08

Knowledge & output

Memory, artifacts, lineage.

Family 09

Connectors

Adapt to outside systems.

Family 10

Observability & governance

Explain and secure the system.

Section 04 Primitive taxonomy

4.1 · Ingress primitives

Ingress is how work enters the system. Types:

user_request

api_request

external_webhook

scheduled_trigger

file_event

table_event

connector_event

agent_action

approval_callback

retry_event

system_event

Ingress should always produce exactly one of: a command, an event record, or a rejected request with audit.

Caution

Ingress should not perform expensive work directly.

4.2 · Intent primitives

Intent primitives capture what the system is being asked to do. Core object: Command. A command contains:

command_id

command_type

requested_by

source

payload

context

status

state

idempotency_key

created_at

updated_at

The command is the root object for most downstream records.

4.3 · Policy and planning primitives

Policy decides whether a command is allowed, denied, delayed, escalated, or routed. Policy outcomes:

allow

deny

require_approval

require_more_input

route_sync

route_async

redact

escalate
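These outcomes map naturally onto an enum. The evaluator below is only a sketch. Its rule order and inputs (`max_cost`, the `risk_level` values, the `contains_pii` flag) are hypothetical and chosen to show one command resolving to a single outcome. A permitted command resolves directly to a routing outcome, so `ALLOW` is left for policies that do not route.

```python
from dataclasses import dataclass
from enum import Enum


class PolicyOutcome(str, Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"
    REQUIRE_MORE_INPUT = "require_more_input"
    ROUTE_SYNC = "route_sync"
    ROUTE_ASYNC = "route_async"
    REDACT = "redact"
    ESCALATE = "escalate"


@dataclass
class PolicyInput:
    has_permission: bool
    missing_inputs: list[str]
    risk_level: str          # "low" | "medium" | "high"
    estimated_cost: float
    contains_pii: bool
    long_running: bool


def evaluate(p: PolicyInput, max_cost: float = 100.0) -> PolicyOutcome:
    # Hard stops first, then gates, then routing.
    if not p.has_permission:
        return PolicyOutcome.DENY
    if p.missing_inputs:
        return PolicyOutcome.REQUIRE_MORE_INPUT
    if p.estimated_cost > max_cost:
        return PolicyOutcome.ESCALATE
    if p.risk_level == "high":
        return PolicyOutcome.REQUIRE_APPROVAL
    if p.contains_pii:
        return PolicyOutcome.REDACT
    return PolicyOutcome.ROUTE_ASYNC if p.long_running else PolicyOutcome.ROUTE_SYNC
```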

Planning turns a policy-approved command into an execution plan. A plan contains:

plan_id

command_id

execution_mode

steps

required_approvals

expected_artifacts

memory_write_intents

notification_intents

created_at

4.4 · Execution primitives

Execution primitives actually do work. Types:

sync_function

async_task

background_worker

external_job

connector_call

agent_tool_call

human_task

Execution must be tracked through task runs. A task run contains:

task_run_id

command_id

task_type

function_name

status

attempt

input

result

error

claimed_by

lease_until

started_at

completed_at

4.5 · Coordination primitives

Coordination makes async and distributed work safe. Types:

queue

idempotency

deduplication

lease

lock

retry

backoff

rate_limit

dependency

fanout

join

These primitives prevent common production failures:

duplicate webhook processing
double-published reports
two workers processing the same task
infinite retry loops
unbounded parallelism
overloaded connectors

4.6 · State lifecycle primitives

Lifecycle primitives track progress. Canonical states:

created

validated

waiting_for_input

waiting_for_approval

approved

queued

running

blocked

succeeded

failed

cancelled

expired

compensating

compensated

State transitions should be checked against an allowed transition table.

4.7 · Human judgment primitives

These represent human decisions. Types:

approval_request

approval_decision

human_input_request

review_packet

override

escalation

Rule

Human approval should never be only a message in Slack or email. It should be a durable record in Postgres.

4.8 · Knowledge and output primitives

These represent durable outputs or reusable knowledge. Types:

memory

artifact

version

lineage

retrieval

summary

embedding

Memory is for future behavior; artifacts are the outputs of work. Artifact examples: a generated report, an exported CSV, a SQL query result, a dashboard draft, a connector sync snapshot, an agent answer, an approval packet. A captured preference, by contrast, is a memory.

4.9 · Connector primitives

Connectors adapt Concord to outside systems. A connector can represent:

Databricks

Slack

Gmail

Google Drive

GitHub

Salesforce

Snowflake

internal APIs

LLM provider

vector database

notification provider

A connector is not the workflow owner. It is a capability provider. Connector calls should be represented as task runs or tool calls with durable status.

4.10 · Observability and governance primitives

These explain and secure the system. Types:

audit_event

trace

metric

policy_decision

cost_record

permission_check

data_safety_check

secret_access_log

The audit log should answer: who requested, what was requested, what policy decision was made, what executed, what changed, what was produced, who approved, what connector was called, what failed.

Section 05 Canonical workflow shape

Every workflow follows this shape:

Canonical pipeline

flowchart TB
  I([Ingress]) --> C([Command])
  C --> X[Context]
  X --> P{{Policy}}
  P --> PL([Plan])
  PL --> E[Execution]
  E --> S[State transition]
  S --> SE[Side effects]
  SE --> O[Output]
  O --> M[(Memory)]
  M --> AU[/Audit/]
  style P fill:#F5E0D2,stroke:#D97757,stroke-width:1.5px
  style M fill:#F1F2EC,stroke:#6B7B5A
  style AU fill:#FFFFFF,stroke:#141413,stroke-width:1.5px

Optional branches:

Branching decisions

flowchart LR
  Q{"Needs immediate answer?"} -->|yes| SYNC[sync function]
  Q -->|no| LONG{"Long-running?"}
  LONG -->|yes| ASYNC["queue → async task → worker"]
  LONG -->|no| EXT{"External system?"}
  EXT -->|yes| CC[connector call]
  EXT -->|no| HJ{"Human judgment?"}
  HJ -->|yes| AP["approval request → decision → resume"]
  HJ -->|no| FR{"Future recall?"}
  FR -->|yes| MEM[memory write]
  FR -->|no| DR{"Durable result?"}
  DR -->|yes| ART[artifact write]
  DR -->|no| NOT{"Notify?"}
  NOT -->|yes| NOTIF[notification]
  NOT -->|no| FAIL{"Fails after side effects?"}
  FAIL -->|yes| COMP[compensation]
  FAIL -->|no| STOP{"Needs to stop?"}
  STOP -->|yes| CANCEL[cancellation]
  style Q fill:#F5E0D2,stroke:#D97757
  style LONG fill:#F5E0D2,stroke:#D97757
  style EXT fill:#F5E0D2,stroke:#D97757
  style HJ fill:#F5E0D2,stroke:#D97757
  style FR fill:#F5E0D2,stroke:#D97757
  style DR fill:#F5E0D2,stroke:#D97757
  style NOT fill:#F5E0D2,stroke:#D97757
  style FAIL fill:#F5E0D2,stroke:#D97757
  style STOP fill:#F5E0D2,stroke:#D97757

Section 06 High-level architecture

System overview

flowchart TB
  subgraph SOURCES ["Sources"]
    direction LR
    U[User] & A[Agent] & W[Webhook] & SC[Scheduler] & CL[Client]
  end
  SOURCES --> IA[Ingress Adapter]
  IA --> CS[Command Service]
  CS --> PE{{Policy Engine}}
  PE --> PL[Planner]
  PL --> SE[Sync Executor]
  PL --> AQ[Async Queue]
  SE --> FC[Function / Connector]
  AQ --> WK[Worker]
  FC --> SM[State Manager]
  WK --> SM
  SM --> PG[("Postgres
system of record")]
  PG --> AAM[Audit · Artifacts · Memory · Notifications]
  style PG fill:#FFFFFF,stroke:#141413,stroke-width:2px
  style PE fill:#F5E0D2,stroke:#D97757,stroke-width:1.5px
  style AAM fill:#F1F2EC,stroke:#6B7B5A

Section 07 Layered architecture

Layers (top-down)

flowchart TB
  API["API layer
receive · authenticate · normalize"]
  CMD["Command layer
create · validate · idempotency"]
  POL["Policy layer
permissions · risk · approval"]
  PLN["Planning layer
sync/async · gates · steps"]
  EXE["Execution layer
run · retry · persist"]
  STA["State layer
enforce transitions"]
  PER["Persistence layer
Postgres tables · tx · locks"]
  CON["Connector layer
normalize external APIs"]
  AGT["Agent layer
read context · propose · summarize"]
  API --> CMD --> POL --> PLN --> EXE --> STA --> PER
  CON -.-> EXE
  AGT -.-> CMD
  style POL fill:#F5E0D2,stroke:#D97757,stroke-width:1.5px
  style PER fill:#FFFFFF,stroke:#141413,stroke-width:1.5px
  style CON fill:#F1F2EC,stroke:#6B7B5A
  style AGT fill:#EFE6F0,stroke:#7A5560

7.1 · API layer

Receive requests, authenticate users, normalize ingress, create commands, and return an immediate response. It should not run expensive jobs, perform long syncs, write memory without policy, or skip command creation.

7.2 · Command layer

Create command, validate required inputs, apply idempotency, attach context, store payload, emit audit event.

7.3 · Policy layer

Check permissions, cost, risk, data safety, approval requirements, memory consent; decide route. Declarative where possible and executable where necessary.

7.4 · Planning layer

Choose sync vs async, insert approval gates, connector steps, memory writes, artifact writes, notification steps; create execution plan. For deterministic workflows, plans can be static. For agentic workflows, plans can be proposed dynamically and validated by policy.
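For the agentic case, "validated by policy" needs at least a structural check before any step runs. The sketch below uses a trimmed copy of `PlanStep` from §13.3 and a hypothetical set of permitted primitives. It rejects unknown primitives and dangling or cyclic dependencies, then returns an execution order computed with Kahn's topological sort.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PlanStep:
    step_id: str
    primitive: str
    function_name: str
    depends_on: list[str] = field(default_factory=list)


def validate_plan(steps: list[PlanStep], permitted: set[str]) -> list[str]:
    """Return step ids in a valid execution order, or raise ValueError."""
    ids = {s.step_id for s in steps}
    if len(ids) != len(steps):
        raise ValueError("duplicate step_id")
    for s in steps:
        if s.primitive not in permitted:
            raise ValueError(f"{s.step_id}: primitive {s.primitive!r} not permitted")
        for dep in s.depends_on:
            if dep not in ids:
                raise ValueError(f"{s.step_id}: unknown dependency {dep!r}")
    # Kahn's algorithm: repeatedly release steps whose dependencies are all done.
    remaining = {s.step_id: set(s.depends_on) for s in steps}
    dependents: dict[str, list[str]] = {s.step_id: [] for s in steps}
    for s in steps:
        for dep in set(s.depends_on):  # set() so duplicate deps are counted once
            dependents[dep].append(s.step_id)
    ready = deque(sid for sid, deps in remaining.items() if not deps)
    order: list[str] = []
    while ready:
        sid = ready.popleft()
        order.append(sid)
        for child in dependents[sid]:
            remaining[child].discard(sid)
            if not remaining[child]:
                ready.append(child)
    if len(order) != len(steps):
        raise ValueError("plan contains a dependency cycle")
    return order
```

A static deterministic plan passes the same check trivially. The check matters when the steps come from an agent.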

7.5 · Execution layer

Execute function, run task, call connector, invoke agent, call external job, persist result, handle retries. It should be side-effect-aware.

7.6 · State layer

Enforce valid transitions, persist state, prevent illegal transitions, record state history.

7.7 · Persistence layer

Postgres tables, transactions, row-level locking, idempotency constraints, query APIs, retention, archival.

7.8 · Connector layer

Normalize external APIs, handle auth, handle rate limits, return structured results, avoid leaking provider-specific details upward.

7.9 · Agent layer

Read context, retrieve memory, propose action, call permitted tools, request approval if needed, summarize result.

Rule

The agent layer should never bypass policy.

Section 08 Postgres-first persistence design

Postgres is the durable system of record. Use Postgres for:

commands

events

task queue

task runs

approvals

memory

artifacts

audit

connector registrations

policy decisions

execution plans

Guidance

Use object storage or external systems only for large blobs. Store references in Postgres.

Section 09 Core Postgres schema

The schema is compact, append-only where possible, and extensible through JSONB columns. Each command links to a durable runtime workflow via dbos_workflow_id: runtime execution status lives in the runtime adapter, while business status lives here.

Entity relationships

erDiagram
  commands ||--o{ domain_events : emits
  commands ||--o{ domain_effects : plans
  commands ||--o{ approvals : gates
  commands ||--o{ artifacts : produces
  commands ||--o{ memory_records : writes
  commands ||--o{ command_dependencies : has_parents
  commands ||--o{ agent_runs : runs
  agent_runs ||--o{ swarm_runs : participates_in
  agent_runs ||--o{ domain_events : records_steps
  connectors ||--o{ tools : exposes
  commands {
    UUID command_id PK
    TEXT command_type
    TEXT status
    TEXT cancellation_mode
    JSONB payload
    JSONB context
    JSONB result
    TEXT idempotency_key
    TEXT dbos_workflow_id
  }
  command_dependencies {
    UUID child_command_id FK
    UUID parent_command_id FK
    TEXT required_status
    BOOLEAN propagate_cancellation
  }
  domain_effects {
    UUID domain_effect_id PK
    UUID command_id FK
    TEXT effect_type
    TEXT idempotency_key
    TEXT status
    TEXT executed_by_dbos_workflow_id
  }
  domain_events {
    UUID event_id PK
    UUID command_id FK
    UUID agent_run_id FK
    UUID swarm_run_id FK
    TEXT purpose
    TEXT event_type
    JSONB payload
    INT step_index
    NUMERIC cost_units
  }
  approvals {
    UUID approval_id PK
    UUID command_id FK
    TEXT status
    TIMESTAMPTZ expires_at
  }
  memory_records {
    UUID memory_id PK
    TEXT subject_type
    TEXT subject_id
    TEXT memory_type
    DOUBLE confidence
  }
  artifacts {
    UUID artifact_id PK
    UUID command_id FK
    TEXT artifact_type
    INT version
  }
  agent_runs {
    UUID agent_run_id PK
    UUID command_id FK
    TEXT agent_role
    TEXT status
    INT spawn_depth
  }
  swarm_runs {
    UUID swarm_run_id PK
    UUID command_id FK
    TEXT status
    TEXT join_strategy
  }

Schema

-- Commands. Carries domain status; runtime status lives in DBOS.
CREATE TABLE IF NOT EXISTS commands (
  command_id UUID PRIMARY KEY,
  command_type TEXT NOT NULL,
  requested_by TEXT NOT NULL,
  ingress TEXT NOT NULL,

  payload JSONB NOT NULL DEFAULT '{}',
  context JSONB NOT NULL DEFAULT '{}',

  status TEXT NOT NULL,
  cancellation_mode TEXT NOT NULL DEFAULT 'graceful',  -- 'graceful' | 'compensate_then_stop'
  idempotency_key TEXT NULL UNIQUE,
  dbos_workflow_id TEXT NULL UNIQUE,

  result JSONB NULL,
  error TEXT NULL,

  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  completed_at TIMESTAMPTZ NULL
);

-- Fan-in dependencies. A child waits for all parents to reach required_status.
CREATE TABLE IF NOT EXISTS command_dependencies (
  child_command_id UUID NOT NULL REFERENCES commands(command_id),
  parent_command_id UUID NOT NULL REFERENCES commands(command_id),
  required_status TEXT NOT NULL DEFAULT 'succeeded',
  propagate_cancellation BOOLEAN NOT NULL DEFAULT true,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  PRIMARY KEY (child_command_id, parent_command_id)
);

CREATE INDEX IF NOT EXISTS idx_command_deps_parent ON command_dependencies(parent_command_id);

-- Domain effects. Inserted up front as 'planned' rows; DBOS steps then move each row through its statuses.
CREATE TABLE IF NOT EXISTS domain_effects (
  domain_effect_id UUID PRIMARY KEY,
  command_id UUID NOT NULL REFERENCES commands(command_id),

  effect_type TEXT NOT NULL,
  effect_payload JSONB NOT NULL DEFAULT '{}',
  idempotency_key TEXT NULL,

  status TEXT NOT NULL DEFAULT 'planned',   -- 'planned' | 'executing' | 'succeeded' | 'failed'
  executed_by_dbos_workflow_id TEXT NULL,
  result JSONB NULL,
  error TEXT NULL,

  -- Compensation tracking
  compensates_effect_id UUID NULL REFERENCES domain_effects(domain_effect_id),
  declared_compensation TEXT NULL,

  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  completed_at TIMESTAMPTZ NULL
);

CREATE INDEX IF NOT EXISTS idx_domain_effects_command ON domain_effects(command_id);
CREATE INDEX IF NOT EXISTS idx_domain_effects_status  ON domain_effects(status);
CREATE INDEX IF NOT EXISTS idx_domain_effects_idem    ON domain_effects(idempotency_key) WHERE idempotency_key IS NOT NULL;

-- Unified domain events. Replaces command_events, domain_audit_log, agent_steps.
-- Disambiguated by purpose.
-- agent_runs and swarm_runs are defined elsewhere and must be created before this table.
CREATE TABLE IF NOT EXISTS domain_events (
  event_id UUID PRIMARY KEY,
  command_id UUID REFERENCES commands(command_id),
  agent_run_id UUID NULL REFERENCES agent_runs(agent_run_id),
  swarm_run_id UUID NULL REFERENCES swarm_runs(swarm_run_id),

  purpose TEXT NOT NULL,        -- 'event' | 'audit' | 'agent_step'
  event_type TEXT NOT NULL,
  payload JSONB NOT NULL DEFAULT '{}',

  actor TEXT NOT NULL,
  trace_id TEXT NOT NULL,

  -- Agent-step extensions; null for non-agent rows
  step_index INT NULL,
  tool_name TEXT NULL,
  latency_ms INT NULL,
  token_usage JSONB NULL,
  cost_units NUMERIC NULL,

  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX IF NOT EXISTS idx_domain_events_command  ON domain_events(command_id);
CREATE INDEX IF NOT EXISTS idx_domain_events_purpose  ON domain_events(purpose);
CREATE INDEX IF NOT EXISTS idx_domain_events_agent    ON domain_events(agent_run_id) WHERE agent_run_id IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_domain_events_audit    ON domain_events(command_id, created_at) WHERE purpose = 'audit';

Section 10 Queue model

Queue execution is a runtime concern. Concord declares semantic queue names; the durable runtime adapter handles the mechanics — claim, lease, concurrency, retry, and rate limiting.

The standard queue names a Concord catalog uses:

connector_calls

agent_runs

swarm_children

notifications

scheduled_maintenance

high_risk_operations

primitiveflow_default

Queue choice is a planning outcome. Effect type maps to queue:

def choose_queue(effect: CoreEffect, context: dict) -> str:
    if effect.effect_type.startswith("connector."):
        return "connector_calls"
    if effect.effect_type.startswith("agent.run"):
        return "agent_runs"
    if effect.effect_type.startswith("agent.spawn"):
        return "swarm_children"
    if effect.effect_type.startswith("notification."):
        return "notifications"
    return "primitiveflow_default"

Queue registration, concurrency limits, partitioning, and per-queue rate limits are runtime configuration. The runtime adapter (default DBOSDurableRuntime) translates the queue name into its native primitive — a DBOS queue for the default adapter; a Temporal task queue or another broker for alternates.

No queue table here

Concord does not own a task_queue or worker_claims table. Runtime claim, lease, and worker assignment belong to the adapter and live in its tables, not Concord's. The domain projection of what work has been requested and what happened to it lives in domain_effects and domain_events.

Section 11 Idempotency model

Every externally triggered or user-submitted command should support idempotency. Examples:

webhook provider event id

user action id

file path + file version

scheduled job time window

approval decision id

external API request id

Rule

If idempotency_key exists, return the existing command instead of creating another one.

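Postgres enforces this with the UNIQUE constraint on commands.idempotency_key. NULL keys never conflict, because Postgres treats NULLs as distinct. An equivalent explicit partial index:

CREATE UNIQUE INDEX IF NOT EXISTS commands_idempotency_key_idx
ON commands (idempotency_key)
WHERE idempotency_key IS NOT NULL;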

Section 12 State machine

Concord enforces a domain state machine over the commands table. Runtime execution status (whether the underlying workflow is alive, recovering, or errored) lives in the runtime adapter and is joined to the command via commands.dbos_workflow_id. The two evolve independently: a runtime workflow can be "recovering" while the command is "running", and a runtime workflow can complete normally while the command sits in waiting_for_approval. The domain machine answers "what does this command mean right now?"; the runtime answers "is execution alive?"

The legal domain transitions are below. compensating and compensated are reachable via the compensate_then_stop cancellation mode (see §30).

Workflow state machine

stateDiagram-v2
  [*] --> created
  created --> validated
  created --> failed
  created --> cancelled
  validated --> waiting_for_input
  validated --> waiting_for_approval
  validated --> queued
  validated --> running
  validated --> failed
  validated --> cancelled
  waiting_for_input --> validated
  waiting_for_input --> cancelled
  waiting_for_input --> expired
  waiting_for_approval --> approved
  waiting_for_approval --> cancelled
  waiting_for_approval --> expired
  waiting_for_approval --> failed
  approved --> queued
  approved --> running
  approved --> cancelled
  queued --> running
  queued --> cancelled
  queued --> failed
  running --> succeeded
  running --> failed
  running --> cancelled
  running --> blocked
  blocked --> queued
  blocked --> running
  blocked --> failed
  blocked --> cancelled
  failed --> queued
  failed --> compensating
  failed --> cancelled
  compensating --> compensated
  compensating --> failed
  succeeded --> [*]
  cancelled --> [*]
  expired --> [*]
  compensated --> [*]

Python standard:

from enum import Enum


class WorkflowState(str, Enum):
    CREATED = "created"
    VALIDATED = "validated"
    WAITING_FOR_INPUT = "waiting_for_input"
    WAITING_FOR_APPROVAL = "waiting_for_approval"
    APPROVED = "approved"
    QUEUED = "queued"
    RUNNING = "running"
    BLOCKED = "blocked"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"
    EXPIRED = "expired"
    COMPENSATING = "compensating"
    COMPENSATED = "compensated"


ALLOWED_TRANSITIONS = {
    WorkflowState.CREATED: {
        WorkflowState.VALIDATED,
        WorkflowState.FAILED,
        WorkflowState.CANCELLED,
    },
    WorkflowState.VALIDATED: {
        WorkflowState.WAITING_FOR_INPUT,
        WorkflowState.WAITING_FOR_APPROVAL,
        WorkflowState.QUEUED,
        WorkflowState.RUNNING,
        WorkflowState.FAILED,
        WorkflowState.CANCELLED,
    },
    WorkflowState.WAITING_FOR_INPUT: {
        WorkflowState.VALIDATED,
        WorkflowState.CANCELLED,
        WorkflowState.EXPIRED,
    },
    WorkflowState.WAITING_FOR_APPROVAL: {
        WorkflowState.APPROVED,
        WorkflowState.CANCELLED,
        WorkflowState.EXPIRED,
        WorkflowState.FAILED,
    },
    WorkflowState.APPROVED: {
        WorkflowState.QUEUED,
        WorkflowState.RUNNING,
        WorkflowState.CANCELLED,
    },
    WorkflowState.QUEUED: {
        WorkflowState.RUNNING,
        WorkflowState.CANCELLED,
        WorkflowState.FAILED,
    },
    WorkflowState.RUNNING: {
        WorkflowState.SUCCEEDED,
        WorkflowState.FAILED,
        WorkflowState.CANCELLED,
        WorkflowState.BLOCKED,
    },
    WorkflowState.BLOCKED: {
        WorkflowState.QUEUED,
        WorkflowState.RUNNING,
        WorkflowState.FAILED,
        WorkflowState.CANCELLED,
    },
    WorkflowState.FAILED: {
        WorkflowState.QUEUED,
        WorkflowState.COMPENSATING,
        WorkflowState.CANCELLED,
    },
    WorkflowState.COMPENSATING: {
        WorkflowState.COMPENSATED,
        WorkflowState.FAILED,
    },
    WorkflowState.SUCCEEDED: set(),
    WorkflowState.CANCELLED: set(),
    WorkflowState.EXPIRED: set(),
    WorkflowState.COMPENSATED: set(),
}


def can_transition(from_state: WorkflowState, to_state: WorkflowState) -> bool:
    return to_state in ALLOWED_TRANSITIONS.get(from_state, set())
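`can_transition` is only a predicate. Callers still need one place that enforces it and records the history that the state layer (§7.6) is responsible for. The minimal sketch below assumes a hypothetical `IllegalTransition` exception and an in-memory history list in place of a state-history table. To stay self-contained, it takes the transition table as an argument and is exercised with a subset of `ALLOWED_TRANSITIONS` written as plain strings.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class IllegalTransition(Exception):
    pass


@dataclass
class CommandState:
    command_id: str
    state: str = "created"
    history: list[tuple[str, str, str]] = field(default_factory=list)  # (from, to, at)


def transition(cmd: CommandState, to_state: str, table: dict[str, set[str]]) -> None:
    # Reject anything the table does not list; never infer state from logs.
    allowed = table.get(cmd.state, set())
    if to_state not in allowed:
        raise IllegalTransition(f"{cmd.command_id}: {cmd.state} -> {to_state} not allowed")
    at = datetime.now(timezone.utc).isoformat()
    cmd.history.append((cmd.state, to_state, at))
    cmd.state = to_state


# Subset of ALLOWED_TRANSITIONS above, using the enum's string values.
TABLE = {
    "created": {"validated", "failed", "cancelled"},
    "validated": {"waiting_for_approval", "queued", "running", "failed", "cancelled"},
    "waiting_for_approval": {"approved", "cancelled", "expired", "failed"},
    "approved": {"queued", "running", "cancelled"},
    "running": {"succeeded", "failed", "cancelled", "blocked"},
    "succeeded": set(),
}
```

Against Postgres, the same check is usually paired with a conditional `UPDATE ... WHERE status = <expected>`, so that two concurrent writers cannot both apply a transition from the same starting state.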

Section 13 Logical domain objects

13.1 · Command

A command is durable intent.

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any
from uuid import uuid4


def now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()


def new_id(prefix: str) -> str:
    return f"{prefix}_{uuid4().hex}"


@dataclass
class Context:
    user_id: str
    workspace_id: str | None = None
    app_id: str | None = None
    run_as: str = "user"
    trace_id: str = field(default_factory=lambda: new_id("trace"))
    request_id: str = field(default_factory=lambda: new_id("req"))
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Command:
    command_id: str
    command_type: str
    task_name: str
    requested_by: str
    ingress_type: str
    payload: dict[str, Any]
    context: Context
    state: str = "created"
    status: str = "created"
    idempotency_key: str | None = None
    plan: dict[str, Any] | None = None
    result: dict[str, Any] | None = None
    error: str | None = None
    created_at: str = field(default_factory=now_iso)
    updated_at: str = field(default_factory=now_iso)

13.2 · TaskSpec

A task spec describes how a task maps to primitives.

from dataclasses import dataclass, field
from typing import Any


@dataclass
class TaskSpec:
    name: str
    description: str
    command_type: str
    function_name: str

    ingress_types: list[str] = field(default_factory=list)
    required_inputs: list[str] = field(default_factory=list)

    sync_allowed: bool = False
    async_required: bool = False
    approval_required: bool = False

    memory_write_possible: bool = False
    artifact_output_possible: bool = False
    notification_required: bool = False

    policy_checks: list[str] = field(default_factory=list)
    connector_requirements: list[str] = field(default_factory=list)

    risk_level: str = "low"
    max_attempts: int = 3
    metadata: dict[str, Any] = field(default_factory=dict)

13.3 · ExecutionPlan

A plan is a validated path from intent to execution.

from dataclasses import dataclass, field


@dataclass
class PlanStep:
    step_id: str
    primitive: str
    function_name: str
    depends_on: list[str] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


@dataclass
class ExecutionPlan:
    plan_id: str
    command_id: str
    execution_mode: str
    primitives: list[str]
    steps: list[PlanStep]

13.4 · CommandDependency

A command may declare zero or more parent commands it depends on. Resolution waits until every parent reaches its required_status (typically succeeded). The runtime adapter handles the waiting via a listener over command-completion events.

from dataclasses import dataclass


@dataclass(frozen=True)
class CommandDependency:
    child_command_id: str
    parent_command_id: str
    required_status: str = "succeeded"
    propagate_cancellation: bool = True

Cycles are rejected at insert time via a BFS over command_dependencies from child to ancestors. Cancellation cascades when propagate_cancellation = True and the required parent status becomes unreachable.
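The insert-time cycle check can be sketched as a breadth-first walk over ancestors. Suppose the existing edges are available as (child, parent) pairs. After inserting `child → parent`, the child's ancestors are the parent plus the parent's ancestors. The new edge therefore closes a loop exactly when `child` already appears among the parent's ancestors, or when the two are the same command. The function name is illustrative.

```python
from collections import deque


def would_create_cycle(edges: list[tuple[str, str]], child: str, parent: str) -> bool:
    """edges are existing (child_command_id, parent_command_id) pairs."""
    if child == parent:
        return True
    parents_of: dict[str, list[str]] = {}
    for c, p in edges:
        parents_of.setdefault(c, []).append(p)
    # BFS up from the proposed parent; reaching child means a loop.
    queue = deque([parent])
    seen = {parent}
    while queue:
        node = queue.popleft()
        for ancestor in parents_of.get(node, []):
            if ancestor == child:
                return True
            if ancestor not in seen:
                seen.add(ancestor)
                queue.append(ancestor)
    return False
```

In Postgres the same walk can be expressed as a `WITH RECURSIVE` query over `command_dependencies`, run in the same transaction as the insert.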

13.5 · CoreEffect

An effect is a side-effect plan — a description of what the workflow intends to do to the outside world (or to durable storage outside the workflow's own state). The functional core returns effects as part of CoreResult; they are persisted upfront as domain_effects rows in status planned and transitioned through executing → succeeded | failed as the runtime fires them.

from dataclasses import dataclass
from enum import StrEnum  # Python 3.11+
from typing import Any


class EffectStatus(StrEnum):
    PLANNED = "planned"
    EXECUTING = "executing"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass(frozen=True)
class CoreEffect:
    effect_type: str
    payload: dict[str, Any]
    idempotency_key: str | None = None
    declared_compensation: str | None = None    # name of the compensation operation, if any
    counter_effects: bool = False               # True if this effect only undoes a prior effect

The idempotency_key is the source of truth for effect-level idempotency, separate from command-level. External APIs that support idempotency keys consume this value.

Section 14 Primitive mapping standard

Every task should be mappable to primitives.

def map_task_to_primitives(task: TaskSpec) -> list[str]:
    primitives = [
        "ingress",
        "command",
        "context",
        "policy",
        "plan",
    ]

    if task.approval_required:
        primitives.append("human_approval")

    if task.async_required:
        primitives.extend(["queue", "async_task"])
    elif task.sync_allowed:
        primitives.append("sync_function")
    else:
        primitives.extend(["queue", "async_task"])

    if task.connector_requirements:
        primitives.append("connector_call")

    if task.artifact_output_possible:
        primitives.append("artifact_write")

    if task.memory_write_possible:
        primitives.append("memory_write")

    if task.notification_required:
        primitives.append("notification")

    primitives.extend(["state_transition", "audit"])

    return primitives

Example:

generate_report = TaskSpec(
    name="Generate Report",
    description="Generate a report asynchronously.",
    command_type="generate_report",
    function_name="generate_report",
    ingress_types=["user_request", "scheduled_trigger"],
    required_inputs=["report_type", "date_range"],
    async_required=True,
    artifact_output_possible=True,
    notification_required=True,
    policy_checks=["permission", "cost", "data_access"],
    risk_level="medium",
)

map_task_to_primitives(generate_report)

Expected output:

[
  "ingress",
  "command",
  "context",
  "policy",
  "plan",
  "queue",
  "async_task",
  "artifact_write",
  "notification",
  "state_transition",
  "audit"
]

Section 15 Deterministic workflows

A deterministic workflow has a known plan before execution.

Generate report · deterministic

flowchart LR
    A([Generate report]) --> B[validate input]
    B --> C[run report job]
    C --> D[create artifact]
    D --> E[notify user]
    style A fill:#F5E0D2,stroke:#D97757
    style E fill:#F1F2EC,stroke:#6B7B5A

Task spec:

generate_report_task = TaskSpec(
    name="Generate Report",
    description="Generate a report from a known report template.",
    command_type="generate_report",
    function_name="generate_report",
    ingress_types=["user_request", "scheduled_trigger"],
    required_inputs=["report_type", "date_range"],
    async_required=True,
    approval_required=False,
    artifact_output_possible=True,
    notification_required=True,
    policy_checks=["permission", "cost"],
    risk_level="medium",
)

Lifecycle:

created → validated → queued → running → succeeded

Note

The plan is created by rules, not by an agent.

Section 16 Agentic workflows

An agentic workflow can use dynamic reasoning, but it must still operate inside the primitive framework.

Agent loop

flowchart TB
    UR([User request]) --> CMD([Command])
    CMD --> RM[Retrieve memory / context]
    RM --> PROP[Agent proposes plan / tool call]
    PROP --> POL{{Policy validates}}
    POL --> EXEC[Execute approved tool]
    EXEC --> OBS[Observe result]
    OBS --> DEC{Agent decides next step}
    DEC -->|continue| PROP
    DEC -->|finish| OUT[Artifact · memory · audit]
    style POL fill:#F5E0D2,stroke:#D97757,stroke-width:1.5px
    style DEC fill:#F5E0D2,stroke:#D97757
    style OUT fill:#F1F2EC,stroke:#6B7B5A

Important

The agent proposes actions. The framework authorizes and records actions.

Agent tool calls should become commands or child task runs. Example: agent wants to send an email.

Agent · external email

sequenceDiagram
    autonumber
    participant A as Agent
    participant G as Primitive Gateway
    participant P as Policy
    participant H as Approver
    participant C as Email Connector
    A->>G: propose send_email
    G->>G: create command
    G->>P: evaluate policy
    P-->>G: require_approval (external)
    G->>H: approval request
    H-->>G: approved
    G->>C: send email
    C-->>G: ok
    G-->>A: result + audit

Section 17 Agent action protocol

Agents should not call connectors directly. They should call the primitive gateway.

Recommended protocol (request):

{
  "action_type": "tool_call",
  "tool_name": "send_notification",
  "payload": {
    "channel": "email",
    "recipient": "finance@example.com",
    "message": "The report is ready."
  },
  "reason": "The user asked me to notify the finance team.",
  "risk_level": "medium"
}

The framework responds:

{
  "decision": "require_approval",
  "command_id": "cmd_123",
  "approval_id": "appr_456",
  "message": "Human approval required before sending external email."
}

Effect

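Every agent action passes through a command, a policy decision, and an audit record before anything reaches the outside world. This keeps agents safe and auditable.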

Section 18 Connectors

A connector exposes capabilities to the framework. Examples:

postgres

databricks

github

slack

gmail

google_drive

salesforce

openai

anthropic

internal_http

A connector should declare: connector_id, connector_type, name, capabilities, auth mode, rate limits, risk level, input schemas, output schemas.

Separation

A connector function should not decide policy. It should execute after policy permits it.

Section 19 Connector interface standard

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class ConnectorCall:
    connector_name: str
    capability: str
    input: dict[str, Any]
    context: dict[str, Any]
    command_id: str | None = None


@dataclass
class ConnectorResult:
    ok: bool
    output: dict[str, Any]
    error: str | None = None
    metadata: dict[str, Any] | None = None


class Connector(ABC):
    name: str
    connector_type: str

    @abstractmethod
    def capabilities(self) -> list[str]:
        pass

    @abstractmethod
    def call(self, request: ConnectorCall) -> ConnectorResult:
        pass

Example connector:

class NotificationConnector(Connector):
    name = "notification"
    connector_type = "notification"

    def capabilities(self) -> list[str]:
        return ["send_app_notification", "send_email", "send_slack"]

    def call(self, request: ConnectorCall) -> ConnectorResult:
        if request.capability == "send_app_notification":
            return ConnectorResult(
                ok=True,
                output={
                    "sent": True,
                    "channel": "app",
                    "recipient": request.input["recipient"],
                },
            )

        return ConnectorResult(
            ok=False,
            output={},
            error=f"Unsupported capability: {request.capability}",
        )

Section 20 Tool registry

A tool is a capability exposed to deterministic flows or agents. Tool metadata: name, description, input_schema, output_schema, execution_mode, risk_level, requires_approval, connector, function_name.

Examples:

validate_sql

run_query

generate_report

publish_report

retrieve_memory

write_memory

send_notification

sync_github_issue

create_google_doc

start_databricks_job

Principle

Tools are not just Python functions. They are governed capabilities.

Section 21 Policy framework

Policies should be composable. Categories:

permission

cost

data_safety

external_sharing

destructive_action

memory_consent

connector_scope

agent_risk

rate_limit

approval_requirement

Policy function standard:

from dataclasses import dataclass, field
from typing import Any


@dataclass
class PolicyResult:
    decision: str
    reasons: list[str] = field(default_factory=list)
    required_approvals: list[str] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)


def external_sharing_policy(command: Command, task: TaskSpec, context: Context) -> PolicyResult:
    destination = command.payload.get("destination", "")

    if "@" in destination or destination.startswith("external:"):
        return PolicyResult(
            decision="require_approval",
            reasons=["External sharing requires approval."],
            required_approvals=["data_owner"],
        )

    return PolicyResult(decision="allow")
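Composability needs a rule for combining results. This sketch assumes a most-restrictive-wins rule, which Concord does not specify: a deny beats require_approval, and require_approval beats allow. Reasons and required approvals from every policy are kept for the audit record. The "deny" decision value and the `combine_policy_results` helper are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any

# Assumed precedence: a higher rank is more restrictive.
DECISION_RANK = {"allow": 0, "require_approval": 1, "deny": 2}


@dataclass
class PolicyResult:
    decision: str
    reasons: list[str] = field(default_factory=list)
    required_approvals: list[str] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)


def combine_policy_results(results: list[PolicyResult]) -> PolicyResult:
    """Fold several policy results into one, keeping the most restrictive decision."""
    combined = PolicyResult(decision="allow")
    for result in results:
        if DECISION_RANK[result.decision] > DECISION_RANK[combined.decision]:
            combined.decision = result.decision
        combined.reasons.extend(result.reasons)
        # Keep each approver role once, in first-seen order.
        for approver in result.required_approvals:
            if approver not in combined.required_approvals:
                combined.required_approvals.append(approver)
    return combined


merged = combine_policy_results([
    PolicyResult(decision="allow"),
    PolicyResult(
        decision="require_approval",
        reasons=["External sharing requires approval."],
        required_approvals=["data_owner"],
    ),
])
```

With this rule, a single policy can only make the outcome stricter. Adding a new policy category therefore never loosens an existing gate.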

Section 22 Approval architecture

Approvals are first-class. An approval request should include: approval_id, command_id, requested_by, approver, approval_type, request_payload, review_packet, status, expires_at, created_at, decided_at.

Approval statuses:

pending

approved

rejected

expired

cancelled

Approval flow

sequenceDiagram
    autonumber
    participant U as User / Agent
    participant F as Framework
    participant N as Notification
    participant A as Approver
    U->>F: command reaches approval gate
    F->>F: create approval request
    F->>N: send notification
    F->>F: state = waiting_for_approval
    A-->>F: approve / reject
    F->>F: record approval decision
    alt approved
        F-->>U: resume workflow
    else rejected / expired
        F-->>U: workflow fails / expires
    end

An approval request should include enough context for a human: what action is requested, who requested it, why it is needed, what data/artifact is affected, what policy triggered approval, what will happen after approval, risk level, expiration.

Section 23 Memory architecture

Memory is durable context that can shape future behavior. Memory should be scoped.

Scopes

user

team

organization

app

workflow

connector

dataset

project

Memory types

preference

instruction

constraint

fact

negative_preference

approval_preference

connector_preference

format_preference

Examples

User prefers concise executive summaries.
Finance reports should include YoY comparison.
Never send customer PII to external emails.
Use the analytics warehouse for ad hoc report queries.
Ask Alex before publishing monthly revenue reports.

Memory write rules

High-confidence, low-risk preference — write directly if consent exists.
Sensitive or broad memory — require approval.
Conflicting memory — mark old memory superseded or ask human.
Temporary memory — set expiration.

Memory retrieval should be explicit: by subject, by task type, by connector, by semantic query.

Backend

The semantics of memory (consent, scope, confidence, supersession, conflict resolution, retrieval contracts) live in Concord. The storage backend is a connector that implements a MemoryStore protocol. The default is Postgres, with optional pgvector for semantic search. Alternative backends are also connectors: PineconeMemoryStore, WeaviateMemoryStore, or in-house stores.

from typing import Protocol


class MemoryStore(Protocol):
    def insert(self, memory: Memory) -> None: ...
    def get(self, memory_id: str) -> Memory | None: ...
    def search(self, scope: MemoryScope, query: str | None, limit: int) -> list[Memory]: ...
    def supersede(self, old_id: str, new_id: str) -> None: ...
    def delete(self, memory_id: str) -> None: ...

Consent, policy, and audit hooks fire in the semantic layer regardless of backend. The backend itself never sees consent — it just stores and retrieves bytes addressed by scope.

Section 24 Artifact architecture

Artifacts are durable outputs: the things the workflow produces and the user reads. They are distinct from effects (§13.5), which are side-effect plans the workflow intends to perform on the outside world. An artifact is a report, a query result, or a draft document. An effect is a publish call, an email send, or a job enqueue. A workflow may produce both: an artifact (the report) and an effect (publishing it). The two live in different tables, artifacts and domain_effects, because they answer different questions. Artifacts record what the work produced. domain_effects record what the work did to the outside world.

Artifact types

report

file

table

query_result

dashboard

notebook

agent_answer

approval_packet

connector_snapshot

model_output

memory_summary

Artifacts should track: location, type, version, status, metadata, lineage, creator, command_id, created_at.

Artifact statuses

draft

created

validated

published

archived

deleted

failed

Note

Artifacts should be references, not necessarily blobs. Store large payloads elsewhere and keep pointers in Postgres.

Section 25 Audit architecture

Audit is append-only, and lives in a single domain_events table alongside business events and agent step traces. The purpose column ('event', 'audit', 'agent_step') disambiguates the role of each row. One table covers three needs: business event stream, compliance audit trail, and agent step history. Compliance queries filter by purpose = 'audit'; agent observability filters by purpose = 'agent_step' ordered by step_index; product event feeds filter by purpose = 'event'. Agent-step extension columns (step_index, tool_name, latency_ms, token_usage, cost_units) are nullable for non-agent rows.

Audit event examples

command_created

input_validated

policy_evaluated

plan_created

approval_requested

approval_resolved

task_enqueued

task_claimed

task_started

task_succeeded

task_failed

task_retried

connector_called

memory_written

artifact_created

notification_sent

state_transitioned

command_cancelled

compensation_started

compensation_completed

Audit events should contain: actor, command_id, trace_id, event_type, target_type, target_id, payload, timestamp.

Principle

Audit should never be the only place where state is stored. Audit explains state. State controls workflow.

Section 26 Deterministic task mapping

To map any deterministic task, answer:

What ingress created it?
What command does it represent?
What context applies?
What policy checks apply?
Is it sync or async?
Does it need approval?
Does it call a connector?
Does it create artifacts?
Does it write memory?
Does it notify someone?
What state lifecycle applies?
What audit events must exist?

Template:

task:
  name: Generate Report
  command_type: generate_report
  ingress_types:
    - user_request
    - scheduled_trigger
  required_inputs:
    - report_type
    - date_range
  execution:
    mode: async
    function_name: generate_report
  policies:
    - permission
    - cost
    - data_access
  approval:
    required: false
  outputs:
    artifacts:
      - report
    memory: []
    notifications:
      - app
  lifecycle:
    - created
    - validated
    - queued
    - running
    - succeeded

Section 27 Agentic task mapping

To map an agentic task, answer:

What user/system goal is the agent pursuing?
What tools may the agent call?
What memories may the agent read?
What memories may the agent write?
What tool calls require policy checks?
What actions require approval?
Can the agent create new commands?
Can the agent create artifacts?
What should be audited at each loop?
How does the loop terminate?

Template:

agent_workflow:
  name: Investigate Revenue Anomaly
  command_type: investigate_revenue_anomaly
  allowed_tools:
    - run_sql
    - retrieve_memory
    - create_report_draft
    - request_human_input
  forbidden_tools:
    - send_external_email
    - mutate_production_table
  memory:
    read_scopes:
      - user
      - team
      - project
    write_allowed: true
    write_requires_policy: true
  approval_gates:
    - publish_report
    - send_notification_external
  termination:
    conditions:
      - final_answer_created
      - human_cancelled
      - max_steps_reached
      - policy_denied

Section 28 Agent loop standard

from dataclasses import dataclass
from typing import Any


@dataclass
class AgentAction:
    action_type: str
    tool_name: str
    payload: dict[str, Any]
    reason: str
    risk_level: str = "low"


@dataclass
class AgentStepResult:
    decision: str
    command_id: str | None
    output: dict[str, Any]
    message: str


class PrimitiveGateway:
    def submit_agent_action(
        self,
        action: AgentAction,
        context: Context,
    ) -> AgentStepResult:
        """Convert an agent action into a governed command or task.

        The implementation should:
        1. Create command.
        2. Evaluate policy.
        3. Execute or request approval.
        4. Return structured result.
        """
        raise NotImplementedError

Effect

This prevents the agent from directly executing unsafe actions.

Section 29 Sync vs async standards

Sync

Fast, low-risk, bounded

operation is fast
operation is low risk
operation has bounded latency
operation can return inside the request budget
operation does not need retries beyond caller retry

Async

Slow, expensive, multi-step

may take longer than request budget
calls slow external APIs
creates durable artifacts
needs retries
has approval gates
is expensive
has multiple steps

Examples:

validate input — sync
preview SQL — sync if small
generate report — async
run a connector sync — async
send approved notification — async or sync depending on provider
write memory — sync if low risk, approval-gated if sensitive
agent tool call — governed sync or async depending on tool
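The criteria above reduce to a routing rule: run synchronously only when every sync condition holds, and fall back to async as soon as any async trigger appears. This sketch assumes the per-operation estimates and the request budget are known up front. The `OperationProfile` fields and the 2-second default budget are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OperationProfile:
    expected_latency_ms: int
    low_risk: bool
    needs_durable_retries: bool
    has_approval_gate: bool
    creates_durable_artifacts: bool
    step_count: int = 1


def choose_execution_mode(op: OperationProfile, request_budget_ms: int = 2000) -> str:
    """Return "sync" only when every sync criterion holds; otherwise "async"."""
    async_triggers = (
        op.expected_latency_ms > request_budget_ms,
        not op.low_risk,
        op.needs_durable_retries,
        op.has_approval_gate,
        op.creates_durable_artifacts,
        op.step_count > 1,
    )
    return "async" if any(async_triggers) else "sync"


validate_input = OperationProfile(50, True, False, False, False)
generate_report = OperationProfile(90_000, True, True, False, True, step_count=4)
```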

Section 30 Cancellation standard

Every long-running workflow should be cancellable. Each command carries a cancellation_mode column with one of two values.

30.1 · `graceful` (default)

The workflow finishes its current step and then exits cleanly. The runtime's hard workflow.cancel() is not called; instead, each step's prelude checks command.cancellation_requested and a typed Cancelled exception routes to the workflow's exit branch. Side effects already in flight inside the current step are allowed to complete; side effects beyond it are skipped.

Transitions: running → cancelling → cancelled.
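Here is a sketch of the prelude check. `Cancelled` and cancellation_requested come from the text above. The `load_command` lookup, the `CommandRow` shape, and the `history` list are hypothetical. The example shows where control leaves the workflow: the step already running finishes, and the next prelude raises.

```python
from dataclasses import dataclass, field
from typing import Callable


class Cancelled(Exception):
    """Raised by a step prelude when graceful cancellation has been requested."""


@dataclass
class CommandRow:
    command_id: str
    state: str = "running"
    cancellation_requested: bool = False
    history: list[str] = field(default_factory=lambda: ["running"])

    def set_state(self, state: str) -> None:
        self.state = state
        self.history.append(state)


def step_prelude(load_command: Callable[[str], CommandRow], command_id: str) -> None:
    # Checked before each step starts; a step already running is never interrupted.
    if load_command(command_id).cancellation_requested:
        raise Cancelled(command_id)


def run_workflow(
    command: CommandRow,
    steps: list[Callable[[], object]],
    load_command: Callable[[str], CommandRow],
) -> str:
    try:
        for step in steps:
            step_prelude(load_command, command.command_id)
            step()
        command.set_state("succeeded")
    except Cancelled:
        # Exit branch for graceful mode: running -> cancelling -> cancelled.
        command.set_state("cancelling")
        command.set_state("cancelled")
    return command.state


cmd = CommandRow("cmd_1")
ran: list[str] = []
steps = [
    lambda: ran.append("validate"),
    # Cancellation arrives while this step runs; the step still completes.
    lambda: (ran.append("generate"), setattr(cmd, "cancellation_requested", True)),
    lambda: ran.append("notify"),
]
final_state = run_workflow(cmd, steps, load_command=lambda _id: cmd)
# ran == ["validate", "generate"]; cmd.history == ["running", "cancelling", "cancelled"]
```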

30.2 · `compensate_then_stop`

The workflow stops issuing new operation steps and walks the effect chain in reverse. For every domain_effects row in status succeeded with a declared compensation, the compensation is enqueued (reverse declaration order). The chain runs as a runtime sub-workflow with its own audit trail under purpose='audit'.

Transitions: running → cancelling → compensating → compensated → cancelled.

Cancellation flow

flowchart TB
    R["User / system<br/>requests cancellation"] --> S{"State allows?"}
    S -->|no| KEEP[no-op]
    S -->|yes| M{"cancellation_mode"}
    M -->|graceful| G1[current step finishes]
    G1 --> G2[future steps skip via prelude check]
    G2 --> C1["command → cancelled"]
    M -->|compensate_then_stop| K1["command → compensating"]
    K1 --> K2[walk effects in reverse]
    K2 --> K3[enqueue each compensation]
    K3 --> K4["command → compensated"]
    K4 --> C1
    C1 --> AU[/"domain_events audit"/]
    style S fill:#F5E0D2,stroke:#D97757
    style M fill:#F5E0D2,stroke:#D97757
    style AU fill:#F1F2EC,stroke:#6B7B5A

Cancellation propagates: from a swarm to its agent runs (§64), from a parent command to dependent children where propagate_cancellation is true (§13.4), and from a child workflow upward when marked terminal-on-child-failure.

Caution

Running steps should periodically check the command's cancellation state before performing irreversible side effects. graceful only protects against new steps, not against side effects mid-flight inside the currently running one.

immediate (hard cancel without grace) and checkpoint_then_stop (run to next safe checkpoint) are reserved as future modes; both are deferred until a concrete use case forces them.

Section 31 Compensation standard

Definition

Compensation is not rollback. It is a forward action that counteracts a side effect.

Examples:

created draft — delete draft
published artifact — unpublish artifact
sent wrong notification — send correction
created external ticket — close ticket
granted access — revoke access

Compensation works in three layers: a declarative manifest at registration, a graph validator that runs at startup, and a drift detector that runs at execution time.

Layer 1

Manifest at registration

Every operation declares the side effects it produces. Every compensation declares what it counters and what (if any) new effects it itself produces.

Layer 2

Graph validator at startup

On catalog load, Concord builds the effect/compensation graph and validates acyclicity, depth ≤ N, pure-counter invariants, and runtime capability matches.

Layer 3

Runtime drift detector

Each compensation step is wrapped to record every effect actually emitted. Drift from the declared manifest writes a compensation_drift audit row and alerts on-call.

31.1 · Manifest

Operations and their compensations carry typed declarations. Most compensations are pure-counter: their only produced effects are inverses of the parent's. Complex compensations that themselves require further compensation are allowed but must declare the cascade explicitly.

from concord.effects import operation, compensation, ExternalCall


@operation(
    produces=[ExternalCall("hotel_booking.book")],
    requires_compensation=True,
)
def book_hotel(request: dict) -> dict: ...  # parameters illustrative


@compensation(
    of=book_hotel,
    produces=[ExternalCall("hotel_booking.cancel")],
    counter_effects=True,    # explicit: these are inverses, not new work
)
def cancel_hotel_reservation(booking: dict) -> dict: ...  # parameters illustrative

31.2 · Graph validator

At catalog load the validator rejects:

Operations marked requires_compensation=True with no registered compensation.
Compensation chains exceeding max_depth (default 2).
Cycles in the effect → compensation → counter-effect graph.
counter_effects=True declarations whose produced effects are not the inverses of the parent's.
Catalogs requiring SAGA_COMPENSATION_NATIVE when the active runtime adapter doesn't declare that capability (see §41).

These are registration-time errors. The app refuses to start until the catalog is internally consistent and runnable against the chosen runtime.

31.3 · Drift detector

At runtime, each compensation step is wrapped. Effects actually emitted are compared against the declared manifest; any drift writes an audit row.

@runtime.step(**compile_policy("compensation"))
def run_compensation_step(effect_id: str) -> dict:
    declared = load_manifest_for(effect_id)
    with concord.effects.intercept() as recorder:
        result = execute_compensation(effect_id)
    drift = recorder.emitted - declared.expected
    if drift:
        write_domain_event(
            event_type="compensation_drift",
            purpose="audit",
            payload={"effect_id": effect_id, "drift": list(drift)},
        )
    return result

The detector catches incomplete manifests, conditional side effects the declaration did not anticipate, and genuine implementation bugs. Together with the manifest and the validator, it keeps the compensation contract honest: every drift is recorded, named, and auditable.

When you need native saga support

The default DBOS adapter runs compensation chains as Concord-orchestrated sub-workflows. This is correct but weaker than native saga atomicity. If a compensation fails after exhausting its retries, the chain halts at that point, and a drift row records the partial completion. Compensation-heavy domains (financial transactions, multi-leg bookings, regulated workflows) should choose a runtime that declares SAGA_COMPENSATION_NATIVE (see §41). Temporal is the natural fit.

Section 32 Error taxonomy

Failures should be classified.

validation_error

policy_denied

approval_rejected

transient_connector_error

permanent_connector_error

rate_limited

timeout

cancelled

agent_failed

human_input_missing

unknown_error

This classification drives retry behavior.

Retryable

Transient by nature

transient_connector_error
rate_limited
timeout
temporary_database_error

Non-retryable

Logical failures

validation_error
policy_denied
approval_rejected
permission_denied
malformed_payload

Section 33 Retry and backoff

Retry mechanics belong to the runtime; the retry contract belongs to Concord. Concord declares, per operation, which error classes are retryable and how aggressively. The runtime adapter receives this as ordinary configuration.

The contract carries: attempt, max_attempts, run_after, last_error, error_class. Each operation has a registered RetryPolicy:

from dataclasses import dataclass, field

# ErrorClass: an enum of the §32 error classes.


@dataclass(frozen=True)
class RetryPolicy:
    operation: str
    retryable: frozenset[ErrorClass]
    max_attempts: int = 3
    backoff_seconds: list[int] = field(default_factory=lambda: [30, 120, 600])
    requires_idempotency_key: bool = False

A single compile step translates the contract into runtime step kwargs. This is the only sanctioned way a connector step gets its retry configuration; no ad-hoc retry numbers in decorators.

from concord.retry import compile_policy

@runtime.step(**compile_policy("hotel_booking.book"))
def book_hotel_step(request: dict) -> dict: ...  # parameters illustrative

Inside the step, the classifier from §32 runs before exceptions propagate. A validation_error raised inside book_hotel_step is converted to a non-retryable exception class regardless of max_attempts; a transient_connector_error raises a class the runtime knows to retry. Test the rule, not just the policy: a step that emits a non-retryable class must never retry, irrespective of decorator config.

A typical backoff schedule with max_attempts = 4 (one initial attempt plus one retry per backoff_seconds entry):

attempt 1 → retry after 30 seconds
attempt 2 → retry after 2 minutes
attempt 3 → retry after 10 minutes
attempt 4 → fail permanently

Caution

Retries should be idempotent. If a side effect may have happened, the retry should check before repeating it. Effect-level idempotency keys (on domain_effects) are the source of truth for external APIs that support them; the runtime's step-level idempotency only protects against re-execution within a workflow.

Section 34 Connector safety standards

Every connector call should record: connector name, capability, input hash or redacted input, output summary, status, latency, error, command_id, task_run_id, trace_id.

Rule

Never store raw secrets in connector configs.

Connector config should reference secrets, not contain them:

{
  "auth_mode": "oauth",
  "secret_ref": "vault://github/app/token",
  "scopes": ["repo:read", "issues:write"]
}

Section 35 Postgres transaction boundaries

Recommended transaction boundaries:

Tx 1

Command creation

Insert command · insert audit event · commit.

Tx 2

Policy and planning

Update command state · insert policy decisions · update plan · insert audit · commit.

Tx 3

Enqueue

Update command state to queued · insert task run · insert audit · commit.

Tx 4

Worker completion

Update task run result · update command result/state · insert artifacts/memories · insert audit · commit.

Caution

Avoid holding transactions while calling external APIs.
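The boundaries above can be sketched with Python's built-in sqlite3 module standing in for Postgres. The two-table layout is a simplified, hypothetical stand-in for the real schema. Each `with conn:` block commits on success and rolls back on an exception. The external call sits between transactions, never inside one. Tx 2 (policy and planning) is omitted for brevity.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE commands (command_id TEXT PRIMARY KEY, state TEXT NOT NULL);
    CREATE TABLE domain_events (
        event_id TEXT PRIMARY KEY, command_id TEXT, purpose TEXT,
        event_type TEXT, payload TEXT
    );
    """
)


def audit(command_id: str, event_type: str, payload: dict) -> None:
    # Runs inside the caller's transaction, so state and audit commit together.
    conn.execute(
        "INSERT INTO domain_events VALUES (?, ?, 'audit', ?, ?)",
        (str(uuid.uuid4()), command_id, event_type, json.dumps(payload)),
    )


command_id = "cmd_123"

# Tx 1: command creation.
with conn:
    conn.execute("INSERT INTO commands VALUES (?, 'created')", (command_id,))
    audit(command_id, "command_created", {})

# Tx 3: enqueue.
with conn:
    conn.execute("UPDATE commands SET state = 'queued' WHERE command_id = ?", (command_id,))
    audit(command_id, "task_enqueued", {})

# External work happens here, with no transaction open.
result = {"artifact_id": "art_456"}  # stand-in for a connector call

# Tx 4: worker completion.
with conn:
    conn.execute("UPDATE commands SET state = 'succeeded' WHERE command_id = ?", (command_id,))
    audit(command_id, "task_succeeded", result)
```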

Section 36 API standards

Submit command

POST /commands

Request:

{
  "task_name": "Generate Report",
  "payload": {
    "report_type": "monthly_revenue",
    "date_range": "2026-05"
  },
  "idempotency_key": "generate_report:monthly_revenue:2026-05"
}

Response:

{
  "command_id": "cmd_123",
  "state": "queued",
  "status": "queued",
  "trace_id": "trace_abc"
}

Get command

GET /commands/{command_id}

Response:

{
  "command_id": "cmd_123",
  "command_type": "generate_report",
  "state": "succeeded",
  "result": {
    "artifact_id": "art_456"
  }
}

Resolve approval

POST /approvals/{approval_id}/resolve

Request:

{
  "decision": "approved",
  "reason": "Looks good."
}

Agent action

POST /agent-actions

Request:

{
  "tool_name": "publish_report",
  "payload": {
    "artifact_id": "art_456",
    "destination": "external:finance@example.com"
  },
  "reason": "The user asked to publish the final report."
}

Response:

{
  "decision": "require_approval",
  "command_id": "cmd_789",
  "approval_id": "appr_123"
}

Section 37 Example · deterministic flow

Worked design

A full Concord design doc for a deterministic vendor-data sync (scheduled trigger, retry taxonomy, idempotent upsert, no human approvals) is available as a worked example: vendor-data-sync · Concord design ↗

User asks: Generate the May revenue report.

End-to-end · deterministic

sequenceDiagram
    autonumber
    participant U as User
    participant API as API
    participant POL as Policy
    participant Q as Queue
    participant W as Worker
    participant DB as Postgres
    U->>API: "Generate May revenue report"
    API->>DB: insert command (created)
    API->>POL: evaluate (permission, cost, data_access)
    POL-->>API: allow → async
    API->>DB: validated → queued
    API->>Q: enqueue task
    Q-->>W: claim task
    W->>DB: running
    W->>W: generate_report()
    W->>DB: artifact created
    W->>DB: succeeded
    W-->>U: app notification

State: created → validated → queued → running → succeeded

Section 38 Example · approval-gated flow

Worked design

A full Concord design doc for a hotel reservation flow (approval gate at >$500, vendor connector with idempotency, compensate_then_stop cancellation with vendor cancel as compensation) is available as a worked example: hotel-booking · Concord design ↗

User asks: Publish the May revenue report to finance@example.com.

End-to-end · approval-gated

sequenceDiagram
    autonumber
    participant U as User
    participant F as Framework
    participant POL as Policy
    participant A as Approver
    participant C as Connector
    participant DB as Postgres
    U->>F: publish_report
    F->>DB: command created
    F->>POL: permission + external_sharing + pii_check
    POL-->>F: require_approval (external destination)
    F->>DB: state = waiting_for_approval
    F->>A: approval request
    A-->>F: approved
    F->>DB: approved → queued → running
    F->>C: publish report
    C-->>F: ok
    F->>DB: artifact created · succeeded
    F-->>U: notification

State: created → validated → waiting_for_approval → approved → queued → running → succeeded

Section 39 Example · agentic flow

Worked design

A full Concord design doc for an agentic revenue-anomaly investigation (coordinator + optional analyst sub-agent, governed SQL tool, memory with consent gate, approval before external sharing, read-only) is available as a worked example: revenue-investigation-swarm · Concord design ↗

User asks: Investigate why revenue dropped last week and draft a summary.

Investigate revenue anomaly

flowchart TB
    U([User request]) --> CMD[command: investigate_revenue_drop]
    CMD --> POL{{Policy · permission · data_access · agent_tool_scope}}
    POL --> RM[Agent retrieves memory]
    RM --> SQL[run_sql via governed tool]
    SQL --> DRAFT[create draft artifact]
    DRAFT --> SUM[Agent summarizes result]
    SUM --> OUT[investigation artifact · memory candidate]
    subgraph ALLOWED ["Allowed tools"]
        direction LR
        T1[retrieve_memory] & T2[run_sql] & T3[create_artifact] & T4[ask_human_input]
    end
    subgraph FORBID ["Forbidden tools"]
        direction LR
        F1[publish_report] & F2[external_email] & F3[production_mutation]
    end
    SUM -.->|if external| EXT[propose send_email] --> POL2{{Policy requires approval}} --> FLOW[approval flow]
    style POL fill:#F5E0D2,stroke:#D97757,stroke-width:1.5px
    style POL2 fill:#F5E0D2,stroke:#D97757,stroke-width:1.5px
    style OUT fill:#F1F2EC,stroke:#6B7B5A
    style ALLOWED fill:#F1F2EC,stroke:#6B7B5A
    style FORBID fill:#FAEEEE,stroke:#B85556

State: created → validated → running → succeeded (or escalates to approval if external action proposed).

Section 40 Framework module structure

Recommended code layout:

concord/
  __init__.py

  core/
    models.py
    states.py
    errors.py
    ids.py

  persistence/
    postgres.py
    migrations/
      001_core.sql

  registry/
    tasks.py
    tools.py
    connectors.py
    policies.py

  engine/
    command_service.py
    policy_engine.py
    planner.py
    executor.py
    worker.py
    approvals.py
    memory.py
    artifacts.py
    audit.py

  connectors/
    base.py
    postgres.py
    http.py
    notification.py
    databricks.py
    github.py

  agents/
    gateway.py
    protocols.py
    memory_context.py

  api/
    routes.py
    schemas.py

  examples/
    deterministic_report.py
    approval_flow.py
    agentic_investigation.py

Section 41 Minimal service interfaces

CommandService

class CommandService:
    def submit(
        self,
        task_name: str,
        payload: dict,
        context: Context,
        idempotency_key: str | None = None,
    ) -> dict:
        raise NotImplementedError

    def get(self, command_id: str) -> dict:
        raise NotImplementedError

    def cancel(self, command_id: str, actor: str, reason: str | None = None) -> dict:
        raise NotImplementedError

PolicyEngine

class PolicyEngine:
    def evaluate(
        self,
        command: Command,
        task: TaskSpec,
        context: Context,
    ) -> PolicyResult:
        raise NotImplementedError

Planner

class Planner:
    def create_plan(
        self,
        command: Command,
        task: TaskSpec,
        policy_result: PolicyResult,
    ) -> ExecutionPlan:
        raise NotImplementedError

Executor

class Executor:
    def execute_sync(self, command: Command, plan: ExecutionPlan) -> dict:
        raise NotImplementedError

    def enqueue_async(self, command: Command, plan: ExecutionPlan) -> dict:
        raise NotImplementedError

Worker

class Worker:
    def claim_next(self) -> dict | None:
        raise NotImplementedError

    def run_once(self) -> dict | None:
        raise NotImplementedError

DurableRuntime

The durable runtime is a protocol, not an implementation. Concord's domain layer imports the protocol; the runtime is supplied at app startup. The default implementation wraps DBOS; future adapters wrap Temporal, Restate, or in-house equivalents. Each adapter publishes a capabilities set so the catalog can be validated against the runtime at registration time.

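from __future__ import annotations  # WorkflowSpec, WorkflowHandle, etc. are defined elsewhere

from enum import StrEnum  # Python 3.11+
from typing import ClassVar, Protocol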


class RuntimeCapability(StrEnum):
    DURABLE_WORKFLOWS = "durable_workflows"
    DURABLE_STEPS = "durable_steps"
    QUEUES = "queues"
    SCHEDULES = "schedules"
    SIGNALS = "signals"
    SUBWORKFLOWS = "subworkflows"
    EFFECT_INTERCEPTION = "effect_interception"
    SAGA_COMPENSATION_NATIVE = "saga_compensation_native"
    WORKFLOW_VERSIONING = "workflow_versioning"


class DurableRuntime(Protocol):
    capabilities: ClassVar[frozenset[RuntimeCapability]]

    def submit_workflow(self, spec: WorkflowSpec) -> WorkflowHandle: ...
    def wait_for_result(self, handle: WorkflowHandle) -> WorkflowResult: ...
    def cancel(self, handle: WorkflowHandle, mode: CancellationMode) -> None: ...
    def enqueue_step(self, queue_name: str, step_spec: StepSpec) -> StepHandle: ...
    def schedule(self, schedule_spec: ScheduleSpec) -> ScheduleHandle: ...
    def signal(self, handle: WorkflowHandle, signal_name: str, payload: dict) -> None: ...

Adapter capability matrix:

Capability	DBOS	Temporal	Notes
DURABLE_WORKFLOWS	✓	✓	Table stakes.
DURABLE_STEPS	✓	✓ (activities)
QUEUES	✓	✓ (task queues)
SCHEDULES	✓	✓ (cron schedules)
SIGNALS	partial	✓ (signals + queries)	DBOS approval-wait pattern; Temporal more general.
SUBWORKFLOWS	✓	✓ (child workflows)
EFFECT_INTERCEPTION	✓	✓	Concord wraps every step; adapter just needs hook points.
SAGA_COMPENSATION_NATIVE	✗	✓	DBOS runs chains as Concord-orchestrated sub-workflows; see §31.
WORKFLOW_VERSIONING	✗	✓	Adopt Temporal when versioning becomes load-bearing.

At catalog load, Concord checks the union of capabilities required by registered operations against the active adapter's capabilities. Missing capabilities surface as startup errors, never as runtime errors the first time an affected operation runs.

MemoryStore

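from __future__ import annotations  # Memory and MemoryScope: see §23 Memory architecture

from typing import Protocol


class MemoryStore(Protocol):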
    def insert(self, memory: Memory) -> None: ...
    def get(self, memory_id: str) -> Memory | None: ...
    def search(self, scope: MemoryScope, query: str | None, limit: int) -> list[Memory]: ...
    def supersede(self, old_id: str, new_id: str) -> None: ...
    def delete(self, memory_id: str) -> None: ...

Section 42 Extensibility model

The framework should be extensible through registries.

task registry

tool registry

policy registry

connector registry

memory extractor registry

artifact handler registry

notification handler registry

agent adapter registry

To add a new capability:

Register connector if needed.
Register tool/function.
Register task spec.
Register policies.
Define output artifacts/memory/notifications.
Add tests.

Effect

No core engine changes should be required for most new capabilities.
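Each registry in the list above can share one small shape. The sketch below is a hypothetical generic `Registry` (the name and methods are illustrative, not the framework's API). It rejects duplicate names at registration, so a collision surfaces when a capability is added rather than when it is first called.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Registry(Generic[T]):
    """Name -> entry map; one instance per registry kind (tasks, tools, ...)."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._entries: dict[str, T] = {}

    def register(self, name: str, entry: T) -> T:
        # Duplicate names fail loudly at registration, not at call time.
        if name in self._entries:
            raise ValueError(f"{self.kind} already registered: {name}")
        self._entries[name] = entry
        return entry

    def get(self, name: str) -> T:
        try:
            return self._entries[name]
        except KeyError:
            raise KeyError(f"unknown {self.kind}: {name}") from None

    def names(self) -> list[str]:
        return sorted(self._entries)
```

With one instance per kind (for example `Registry("tool")`, `Registry("task")`), the "add a new capability" steps become a sequence of `register` calls. The engine itself does not change.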

Section 43 Future connector model

A future connector should be able to expose capabilities like:

connector:
  name: github
  type: github
  capabilities:
    - search_issues
    - create_issue
    - comment_on_pr
  auth:
    mode: oauth
    scopes:
      - issues:read
      - issues:write
  rate_limits:
    requests_per_minute: 60

Tool:

tool:
  name: create_github_issue
  connector: github
  capability: create_issue
  execution_mode: async
  risk_level: medium
  requires_approval: false
  input_schema:
    type: object
    required:
      - repo
      - title
    properties:
      repo:
        type: string
      title:
        type: string
      body:
        type: string

The planner can route tool execution through the connector registry.
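Routing through the connector registry amounts to resolving a tool to a connector capability and rejecting mismatches before anything executes. The sketch below loads a subset of the YAML above as plain dicts (YAML parsing, auth, and rate limiting are omitted). `route_tool_call` is a hypothetical helper. It checks only the schema's `required` keys, not full JSON Schema.

```python
from typing import Any

# Subset of the YAML above, as plain dicts (YAML parsing omitted).
CONNECTORS: dict[str, dict[str, Any]] = {
    "github": {
        "type": "github",
        "capabilities": ["search_issues", "create_issue", "comment_on_pr"],
        "rate_limits": {"requests_per_minute": 60},
    },
}
TOOLS: dict[str, dict[str, Any]] = {
    "create_github_issue": {
        "connector": "github",
        "capability": "create_issue",
        "execution_mode": "async",
        "requires_approval": False,
        "input_schema": {"type": "object", "required": ["repo", "title"]},
    },
}


def route_tool_call(tool_name: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Resolve a tool to the connector capability that will run it."""
    tool = TOOLS[tool_name]
    connector = CONNECTORS.get(tool["connector"])
    if connector is None:
        raise LookupError(f"{tool_name}: unknown connector {tool['connector']}")
    if tool["capability"] not in connector["capabilities"]:
        raise LookupError(f"{tool_name}: {tool['connector']} has no {tool['capability']}")
    # Only the 'required' keys are checked; full JSON Schema validation is out of scope.
    missing = [k for k in tool["input_schema"].get("required", []) if k not in payload]
    if missing:
        raise ValueError(f"{tool_name}: missing required fields {missing}")
    return {
        "connector": tool["connector"],
        "capability": tool["capability"],
        "execution_mode": tool["execution_mode"],
        "requires_approval": tool["requires_approval"],
        "payload": payload,
    }
```

The returned route carries `execution_mode` and `requires_approval`, which the planner would need when choosing between sync execution, queueing, and an approval gate.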

Section 44 Future agent model

Agents should see a constrained tool catalog. For each task, define:

allowed_tools

forbidden_tools

memory_scopes

approval_gates

max_steps

max_cost

termination_conditions

Agent path through the system

flowchart LR
  AG[Agent] --> GW[Primitive gateway]
  GW --> CPP[command · policy · plan]
  CPP --> EX[execution]
  style AG fill:#EFE6F0,stroke:#7A5560
  style GW fill:#F5E0D2,stroke:#D97757
  style CPP fill:#F5E0D2,stroke:#D97757,stroke-width:1.5px
  style EX fill:#F1F2EC,stroke:#6B7B5A

Effect

This allows the agent framework to be replaced later without rewriting the workflow system.

Section 45 Security model

Security layers:

authentication

authorization

policy evaluation

connector scope

approval gates

audit

secret isolation

data safety checks

Rule

A connector credential does not imply a user may use the connector.

The framework must check:

who is requesting
what they are trying to do
what connector/tool is involved
what data will be accessed
what side effects may occur
whether approval is required

Section 46 Data safety model

Data safety policies should classify:

PII

credentials

customer confidential data

internal-only data

external-shareable data

regulated data

Data safety outcomes:

allow

redact

require_approval

deny

Agentic workflows must apply data safety checks before:

external tool call
external notification
memory write
artifact publication
connector sync
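The spec lists the classifications, outcomes, and checkpoints, but it does not say how outcomes combine when data carries several labels. The sketch below assumes a "strictest outcome wins" rule. It also assumes that unlisted pairs default to `require_approval`, with `external_shareable` defaulting to `allow`. The rule table, `Outcome`, and `check_data_safety` are all hypothetical.

```python
from enum import IntEnum


class Outcome(IntEnum):
    # Ordered least to most restrictive, so max() returns the strictest.
    ALLOW = 0
    REDACT = 1
    REQUIRE_APPROVAL = 2
    DENY = 3


# Hypothetical rules: (classification, checkpoint) -> outcome.
RULES: dict[tuple[str, str], Outcome] = {
    ("pii", "external_tool_call"): Outcome.REDACT,
    ("pii", "memory_write"): Outcome.REQUIRE_APPROVAL,
    ("credentials", "external_tool_call"): Outcome.DENY,
    ("credentials", "memory_write"): Outcome.DENY,
    ("customer_confidential", "external_notification"): Outcome.REQUIRE_APPROVAL,
    ("internal_only", "artifact_publication"): Outcome.REQUIRE_APPROVAL,
}


def check_data_safety(classifications: set[str], checkpoint: str) -> Outcome:
    if not classifications:
        # Unclassified data fails closed rather than defaulting to allow.
        raise ValueError("data must carry at least one classification")
    outcomes = []
    for label in classifications:
        default = Outcome.ALLOW if label == "external_shareable" else Outcome.REQUIRE_APPROVAL
        outcomes.append(RULES.get((label, checkpoint), default))
    return max(outcomes)
```

Encoding the outcomes as an ordered `IntEnum` keeps the combination rule to a single `max()`. A payload that mixes PII with credentials is denied, not merely redacted.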

Section 47 Testing standards

Test at five levels.

Level 1

Unit tests

State transitions · primitive mapping · policy decisions · planner outputs · idempotency behavior.

Level 2

Integration tests

Postgres persistence · queue claiming · worker retry · approval resume · connector call recording.

Level 3

Workflow tests

Generate report end-to-end · approval-gated publish · webhook deduplication · agent tool proposal with approval.

Level 4

Safety tests

Policy denial · external sharing approval · memory consent · forbidden tool calls · duplicate webhook event.

Level 5

Boundary discipline

Import boundary check: concord/core/ and concord/domain/ must not import a runtime package; only concord/runtime/<adapter>.py may. An AST scanner (concord_boundary_check.py) runs in CI and fails the build on violation.

The boundary check is the only "test" that runs on every commit before the suite — it's a 200-line AST scanner with no install footprint. Rules:

Path	Disallowed	Why
`concord/core/**`	`dbos`, `temporalio`	The functional core has no runtime knowledge.
`concord/domain/**`	`dbos`, `temporalio`	Domain speaks the protocol; never an implementation.
`concord/runtime/*.py` (≠ `dbos.py`)	`dbos`	Adapter isolation. One file binds the implementation.
`concord/runtime/*.py` (≠ `temporal.py`)	`temporalio`	Same rule shape; future adapter.
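The rules table above can be checked with the standard library `ast` module alone. The sketch below is one possible shape for such a scanner, not the contents of concord_boundary_check.py. The helper names (`disallowed_for`, `imported_roots`, `check_file`, `main`) are hypothetical. Relative imports are skipped because they cannot name a third-party runtime.

```python
import ast
from collections.abc import Iterator
from pathlib import Path, PurePosixPath

RUNTIME_PACKAGES = {"dbos", "temporalio"}
# Adapter file -> the one runtime package it may import.
ADAPTER_FILES = {"dbos.py": "dbos", "temporal.py": "temporalio"}


def disallowed_for(path: str) -> set[str]:
    """Runtime packages a repo-relative path may not import (the rules table)."""
    p = PurePosixPath(path)
    if p.parts[:2] in (("concord", "core"), ("concord", "domain")):
        return set(RUNTIME_PACKAGES)
    if p.parts[:2] == ("concord", "runtime") and len(p.parts) == 3 and p.suffix == ".py":
        return RUNTIME_PACKAGES - {ADAPTER_FILES.get(p.name, "")}
    return set()


def imported_roots(source: str) -> Iterator[tuple[str, int]]:
    """Yield (top-level package, line) for every absolute import, nested ones included."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                yield alias.name.split(".")[0], node.lineno
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            yield node.module.split(".")[0], node.lineno


def check_file(path: str, source: str) -> list[str]:
    banned = disallowed_for(path)
    return [
        f"{path}:{line}: imports {root}"
        for root, line in imported_roots(source)
        if root in banned
    ]


def main(repo_root: str = ".") -> int:
    """Scan concord/**/*.py; return a non-zero exit code on any violation."""
    base = Path(repo_root)
    violations: list[str] = []
    for file in sorted((base / "concord").rglob("*.py")):
        rel = file.relative_to(base).as_posix()
        violations += check_file(rel, file.read_text(encoding="utf-8"))
    for violation in violations:
        print(violation)
    return 1 if violations else 0
```

A CI step could pass `main()`'s return value to `sys.exit`. A static scan like this does not catch dynamic imports such as `importlib.import_module("dbos")`. Those would need a separate rule.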

Section 48 Operational dashboards

Minimum dashboards

commands by state

task runs by status

failed tasks

retry counts

approval backlog

old queued tasks

connector errors

policy denials

memory writes

artifact creation

audit volume

Operational alerts

queued task older than threshold

approval expired

connector error rate spike

task retry exhaustion

worker heartbeat missing

policy denial spike

Section 49 Development roadmap

Phases

flowchart LR
  P1["Phase 1<br/>Logical core"] --> P2["Phase 2<br/>Postgres backend"]
  P2 --> P3["Phase 3<br/>Worker"]
  P3 --> P4["Phase 4<br/>Approvals & notifications"]
  P4 --> P5["Phase 5<br/>Connectors"]
  P5 --> P6["Phase 6<br/>Agent gateway"]
  P6 --> P7["Phase 7<br/>Hardening"]
  style P1 fill:#FAF8F2,stroke:#141413
  style P2 fill:#FAF8F2,stroke:#141413
  style P3 fill:#F5E0D2,stroke:#D97757
  style P4 fill:#F5E0D2,stroke:#D97757
  style P5 fill:#F1F2EC,stroke:#6B7B5A
  style P6 fill:#EFE6F0,stroke:#7A5560
  style P7 fill:#F7F1E0,stroke:#B68A2E

Phase 1

Logical core

TaskSpec · Command · PolicyResult · ExecutionPlan · primitive mapper · state machine · audit model · in-memory store.

Phase 2

Postgres backend

Migrations · command / task queue / approval / audit / memory / artifact repositories.

Phase 3

Worker

Queue claiming · lease renewal · retry/backoff · task execution · result persistence · failure classification.

Phase 4

Approvals & notifications

Approval API · approval UI · notification connector · resume · expiration.

Phase 5

Connectors

Base interface · HTTP · notification · Databricks · GitHub · tool registry.

Phase 6

Agent gateway

Action protocol · tool allowlist · memory retrieval · policy-gated calls · trace events · approval-gated agent actions.

Phase 7

Hardening

Rate limits · cost accounting · data safety policies · compensation · cancellation · dashboards · retention.

Section 50 What this framework is not

Concord is not

a durable execution runtime (DBOS, Temporal, Cadence do that)
a distributed compute engine
a full BPMN engine
a replacement for Postgres
a replacement for connector APIs
a replacement for an LLM agent framework
a full data pipeline orchestrator
a framework that demands wholesale adoption

Concord is

a library of contracts (pip install concord)
a policy, approval, and state model
a connector and agent governance layer
an agent-safe tool gateway
a Postgres-backed system of record for domain meaning
a thin layer that delegates execution to a durable runtime

Section 51 Final vision

Concord should make every action in the system look like this:

Someone or something requested work. The request became a command. The command was evaluated by policy. The policy produced a plan. The plan executed through governed primitives. The workflow state changed explicitly. Outputs, memories, and artifacts were recorded. Every important thing was audited.

This gives you one coherent framework for:

ordinary deterministic workflows

long-running async jobs

external triggers

human approvals

memory capture

artifact creation

notifications

future connectors

agentic tool use

The key architectural decision:

Architectural truths

flowchart LR
  A["Postgres<br/>owns truth"] --- B["Framework<br/>owns primitives"]
  B --- C["Connectors<br/>own capabilities"]
  C --- D["Workers<br/>own execution"]
  D --- E["Agents<br/>propose actions"]
  E --- F["Policy<br/>decides what is allowed"]
  F --- G["Audit<br/>explains what happened"]
  style A fill:#FFFFFF,stroke:#141413,stroke-width:1.5px
  style B fill:#FAF8F2,stroke:#141413
  style C fill:#F1F2EC,stroke:#6B7B5A
  style D fill:#FAF8F2,stroke:#141413
  style E fill:#EFE6F0,stroke:#7A5560
  style F fill:#F5E0D2,stroke:#D97757,stroke-width:1.5px
  style G fill:#FAF8F2,stroke:#141413

Part II · Addendum

Multi-agent swarms, subagent spawning, agentic execution

A governed extension to the primitive layer. Agents propose actions; Concord governs, records, executes, and audits them.

Single-agent runs, parallel swarms, hierarchical delegation, and reviewer agents are all compositions over the same primitives — no new fundamentals required.

Sections: 52–73

Adds: 5 Postgres tables · 2 state machines · 5 execution modes

Stance: runtime-agnostic, Postgres-first

Section 52 Agentic design philosophy

Concord supports agentic execution as a governed extension of the primitive system. Agents propose; the framework decides.

Agents do not own execution. Agents propose actions. Concord governs, records, executes, and audits those actions.

52.1 · Agents are participants, not infrastructure

An agent is a participant that can interpret context, propose commands, call tools, delegate work, spawn subagents, produce artifacts, request memory reads/writes, ask for human approval, and evaluate or synthesize results. But every consequential action passes through the same primitive gateway used by deterministic workflows.

Agent → primitive gateway

flowchart LR
  AG[Agent proposes action] --> CMD([Concord command])
  CMD --> POL{{Policy check}}
  POL --> PL([Execution plan])
  PL --> EX[Sync / async execution]
  EX --> ST[State transition]
  ST --> OUT[Artifact · memory · notification · audit]
  style AG fill:#EFE6F0,stroke:#7A5560
  style POL fill:#F5E0D2,stroke:#D97757,stroke-width:1.5px
  style OUT fill:#F1F2EC,stroke:#6B7B5A

This is what makes deterministic and agentic workflows interoperable.

52.2 · Swarms are workflow structures, not special cases

A swarm is a coordinated group of agent runs serving a parent command or workflow. A swarm can be sequential, parallel, competitive, hierarchical, review-driven, or human-supervised — but it is still composed from the same primitives:

Command → Policy → Plan → AgentRun → TaskRun → Artifact → Memory → Audit

52.3 · Subagent spawning is a governed operation

An agent must not directly create uncontrolled child agents. Subagent spawning is itself a command — spawn_subagent — subject to:

permission checks

role checks

cost limits

tool scope limits

memory scope limits

connector scope limits

spawn depth limits

swarm size limits

human approval if required

52.4 · Postgres remains the system of record

All durable agentic state lives in Postgres: swarm runs, agent runs, invocations, steps, tool calls, delegated goals, parent-child relationships, artifacts, memory reads/writes, approvals, cancellations, audit.

Substitutability

The LLM/agent runtime can be replaced. The Postgres-backed execution record must remain stable.

Section 53 Core agentic concepts

Five core objects model agentic execution.

Concept

SwarmRun

A coordinated multi-agent execution that belongs to one parent command. Defines objective, participants, coordinator, join strategy, and hard limits.

Concept

AgentRun

One execution of one agent role: coordinator, planner, researcher, reviewer, worker, memory manager, domain expert, or connector-specific agent.

Concept

AgentInvocation

Records a parent agent spawning or delegating to a child agent — delegated goal, constraints, allowed tools, memory scope, budget, max steps, spawn depth.

Concept

AgentStep

One decision, action, or observation inside an agent run. The agentic equivalent of a trace event — but durable and queryable.

Concept

JoinStrategy

Defines how subagent outputs are combined: all_success, first_success, quorum, coordinator_synthesis, evaluator_selection, human_review, best_of_n, map_reduce, consensus, ranked_review.

An AgentRun may create

commands

task runs

subagent invocations

artifacts

memory candidates

approval requests

audit events

AgentStep action types

plan

reason

tool_call_proposed

tool_call_executed

command_created

artifact_read

artifact_written

memory_read

memory_write_proposed

approval_requested

subagent_spawned

child_result_observed

evaluation

synthesis

final_answer

error

Join strategies

all_success

first_success

quorum

coordinator_synthesis

evaluator_selection

human_review

best_of_n

map_reduce

consensus

ranked_review

Section 54 Agentic hierarchy

54.1 · Standard hierarchy

Command → swarm → agents → join → output

flowchart TB
  C([Command]) --> WR[WorkflowRun]
  WR --> SR[SwarmRun]
  SR --> COORD[AgentRun · coordinator]
  COORD --> I1[Invocation · researcher]
  COORD --> I2[Invocation · analyst]
  COORD --> I3[Invocation · reviewer]
  I1 --> AR1[AgentRun · researcher]
  I2 --> AR2[AgentRun · analyst]
  I3 --> AR3[AgentRun · reviewer]
  AR1 --> J[JoinResults]
  AR2 --> J
  AR3 --> J
  J --> ART[ArtifactWrite]
  ART --> POL{{Policy check}}
  POL --> HA[HumanApproval if needed]
  HA --> FIN([FinalOutput])
  style POL fill:#F5E0D2,stroke:#D97757,stroke-width:1.5px
  style FIN fill:#F1F2EC,stroke:#6B7B5A
  style COORD fill:#EFE6F0,stroke:#7A5560
  style AR1 fill:#EFE6F0,stroke:#7A5560
  style AR2 fill:#EFE6F0,stroke:#7A5560
  style AR3 fill:#EFE6F0,stroke:#7A5560

54.2 · Parent-child relationships

Every child agent should have:

parent agent run id

swarm run id

delegated goal

bounded role

tool scope

memory scope

connector scope

max step count

max runtime / timeout

cancellation parent

This enables recursive execution while preserving control.

Section 55 Swarm execution modes

Five recurring patterns. Choose by the shape of the work.

Mode 01

Sequential

Parent delegates to one child at a time. Use when each step depends on the previous one, when debuggability matters, or when cost and control outweigh latency.

Mode 02

Parallel

Parent spawns subagents concurrently. Use when work decomposes cleanly, when latency matters, or when multiple connectors can be explored at once.

Mode 03

Competitive

Multiple agents attempt the same task; an evaluator selects the best. Use when output quality matters or confidence comes from comparison.

Mode 04

Review-driven

Reviewer agents evaluate proposed outputs before publication. Use when outputs are externally visible, sensitive, or costly to get wrong.

Mode 05

Hierarchical

Agents can spawn subagents, within hard limits. Use when a coordinator cannot plan all subtasks upfront and decomposition must be flexible.

Sequential

flowchart LR
  COORD([Coordinator]) --> R[Researcher] --> A[Analyst] --> REV[Reviewer] --> FS([Final synthesis])
  style COORD fill:#EFE6F0,stroke:#7A5560
  style FS fill:#F1F2EC,stroke:#6B7B5A

Parallel

flowchart LR
  COORD([Coordinator]) --> RA[Researcher A]
  COORD --> RB[Researcher B]
  COORD --> A[Analyst]
  COORD --> REV[Reviewer]
  RA --> J([Join results])
  RB --> J
  A --> J
  REV --> J
  style COORD fill:#EFE6F0,stroke:#7A5560
  style J fill:#F1F2EC,stroke:#6B7B5A

Competitive

flowchart LR
  COORD([Coordinator]) --> S1[Solver A]
  COORD --> S2[Solver B]
  COORD --> S3[Solver C]
  S1 --> E{Evaluator}
  S2 --> E
  S3 --> E
  E --> BEST([Select best])
  style COORD fill:#EFE6F0,stroke:#7A5560
  style E fill:#F5E0D2,stroke:#D97757
  style BEST fill:#F1F2EC,stroke:#6B7B5A

Review-driven

flowchart LR
  W[Workers produce artifacts] --> REV[Reviewer evaluates]
  REV --> Q{"Risk remains?"}
  Q -->|yes| HUM[Human approval]
  Q -->|no| PUB([Publication])
  HUM --> PUB
  style Q fill:#F5E0D2,stroke:#D97757
  style PUB fill:#F1F2EC,stroke:#6B7B5A

Hierarchical

flowchart TB
  COORD([Coordinator]) --> DL[Domain Lead]
  COORD --> REV[Reviewer]
  DL --> SA[Subagent A]
  DL --> SB[Subagent B]
  style COORD fill:#EFE6F0,stroke:#7A5560
  style DL fill:#EFE6F0,stroke:#7A5560

Always enforce in hierarchical mode

max_depth

max_agents

max_steps_per_agent

max_total_steps

max_cost

allowed_roles

allowed_tools

allowed_connectors

Section 56 Mapping swarms to existing primitives

No new fundamental primitive is required. Swarms are compositions over existing primitives.

Agentic action	Concord mapping
Start swarm	Command → Policy → Plan → SwarmRun
Spawn subagent	Command → Policy → AgentInvocation → AgentRun
Agent tool call	Command → Policy → Sync/Async Function
Agent long-running tool	Command → Policy → Queue → Async Task
Agent asks human	Human Approval
Agent writes memory	Memory Write (candidate, policy-checked)
Agent creates output	Artifact Write
Agent notifies user	Notification
Agent stops work	Cancellation
Agent reverses side effect	Compensation
Agent result review	Evaluation / Quality Check
Agent trace	Audit / AgentStep

Section 57 Postgres schema additions

Five new tables reference the existing pf_commands, pf_task_runs, pf_artifacts, pf_memory, pf_approvals, and pf_audit_log tables.

Agentic entity relationships

erDiagram
  pf_commands ||--o{ swarm_runs : initiates
  swarm_runs ||--o{ agent_runs : contains
  agent_runs ||--o{ agent_invocations : parent_of
  agent_invocations ||--|| agent_runs : child_run
  agent_runs ||--o{ agent_steps : records
  agent_runs ||--o{ agent_messages : exchanges
  swarm_runs ||--o{ agent_messages : scoped_to
  swarm_runs {
    UUID swarm_run_id PK
    UUID command_id FK
    TEXT status
    TEXT execution_mode
    TEXT join_strategy
    INT max_agents
    INT max_depth
    INT max_total_steps
  }
  agent_runs {
    UUID agent_run_id PK
    UUID swarm_run_id FK
    UUID parent_agent_run_id FK
    TEXT agent_role
    TEXT status
    INT spawn_depth
    INT step_count
    INT max_steps
  }
  agent_invocations {
    UUID invocation_id PK
    UUID parent_agent_run_id FK
    UUID child_agent_run_id FK
    TEXT invocation_type
    TEXT delegated_goal
  }
  agent_steps {
    UUID agent_step_id PK
    UUID agent_run_id FK
    INT step_index
    TEXT action_type
    TEXT decision
    NUMERIC cost_units
  }
  agent_messages {
    UUID agent_message_id PK
    UUID agent_run_id FK
    TEXT message_role
    TEXT message_type
    TEXT content
  }

57.1 · `swarm_runs`

CREATE TABLE IF NOT EXISTS swarm_runs (
  swarm_run_id UUID PRIMARY KEY,
  command_id UUID NOT NULL,
  workflow_run_id UUID NULL,

  coordinator_agent_run_id UUID NULL,

  status TEXT NOT NULL,
  objective TEXT NOT NULL,

  execution_mode TEXT NOT NULL,
  join_strategy TEXT NOT NULL,

  max_agents INT NOT NULL DEFAULT 5,
  max_depth INT NOT NULL DEFAULT 2,
  max_total_steps INT NOT NULL DEFAULT 100,
  max_cost_units NUMERIC NULL,

  metadata JSONB NOT NULL DEFAULT '{}',

  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  started_at TIMESTAMPTZ NULL,
  completed_at TIMESTAMPTZ NULL,
  cancelled_at TIMESTAMPTZ NULL
);

Recommended statuses

created

running

joining

waiting_for_approval

succeeded

failed

cancelled

partially_succeeded

expired

57.2 · `agent_runs`

CREATE TABLE IF NOT EXISTS agent_runs (
  agent_run_id UUID PRIMARY KEY,

  command_id UUID NOT NULL,
  swarm_run_id UUID NULL,
  parent_agent_run_id UUID NULL,

  agent_name TEXT NOT NULL,
  agent_role TEXT NOT NULL,
  agent_version TEXT NULL,

  status TEXT NOT NULL,
  goal TEXT NOT NULL,

  allowed_tools JSONB NOT NULL DEFAULT '[]',
  allowed_connectors JSONB NOT NULL DEFAULT '[]',
  memory_scope JSONB NOT NULL DEFAULT '{}',
  context_scope JSONB NOT NULL DEFAULT '{}',

  max_steps INT NOT NULL DEFAULT 20,
  step_count INT NOT NULL DEFAULT 0,
  spawn_depth INT NOT NULL DEFAULT 0,

  model_config JSONB NOT NULL DEFAULT '{}',
  runtime_config JSONB NOT NULL DEFAULT '{}',
  metadata JSONB NOT NULL DEFAULT '{}',

  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  started_at TIMESTAMPTZ NULL,
  completed_at TIMESTAMPTZ NULL,
  cancelled_at TIMESTAMPTZ NULL
);

Recommended statuses

created

running

waiting_for_tool

waiting_for_child

waiting_for_approval

joining

succeeded

failed

cancelled

expired

57.3 · `agent_invocations`

CREATE TABLE IF NOT EXISTS agent_invocations (
  invocation_id UUID PRIMARY KEY,

  swarm_run_id UUID NOT NULL,
  parent_agent_run_id UUID NOT NULL,
  child_agent_run_id UUID NOT NULL,

  invocation_type TEXT NOT NULL,
  delegated_goal TEXT NOT NULL,

  constraints JSONB NOT NULL DEFAULT '{}',
  allowed_tools JSONB NOT NULL DEFAULT '[]',
  allowed_connectors JSONB NOT NULL DEFAULT '[]',
  memory_scope JSONB NOT NULL DEFAULT '{}',

  status TEXT NOT NULL DEFAULT 'created',

  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  completed_at TIMESTAMPTZ NULL
);

Invocation types

spawn

delegate

review

evaluate

synthesize

retry

fallback

57.4 · `agent_steps`

CREATE TABLE IF NOT EXISTS agent_steps (
  agent_step_id UUID PRIMARY KEY,

  agent_run_id UUID NOT NULL,
  command_id UUID NOT NULL,
  swarm_run_id UUID NULL,

  step_index INT NOT NULL,
  action_type TEXT NOT NULL,

  tool_name TEXT NULL,
  action_payload JSONB NOT NULL DEFAULT '{}',

  decision TEXT NOT NULL,
  observation JSONB NULL,
  output JSONB NULL,

  created_command_id UUID NULL,
  created_task_run_id UUID NULL,
  created_artifact_id UUID NULL,
  created_memory_id UUID NULL,
  created_approval_id UUID NULL,
  child_agent_run_id UUID NULL,

  latency_ms INT NULL,
  token_usage JSONB NULL,
  cost_units NUMERIC NULL,

  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

57.5 · `agent_messages`

Optional but useful for chat-style and collaborative agent systems.

CREATE TABLE IF NOT EXISTS agent_messages (
  agent_message_id UUID PRIMARY KEY,

  swarm_run_id UUID NULL,
  agent_run_id UUID NOT NULL,
  parent_agent_run_id UUID NULL,

  message_role TEXT NOT NULL,
  message_type TEXT NOT NULL,
  content TEXT NOT NULL,
  metadata JSONB NOT NULL DEFAULT '{}',

  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

Roles & types

Roles

message_role

system

user

agent

tool

reviewer

coordinator

human

Types

message_type

instruction

observation

request

response

critique

summary

handoff

57.6 · Recommended indexes

CREATE INDEX IF NOT EXISTS idx_swarm_runs_command_id ON swarm_runs(command_id);
CREATE INDEX IF NOT EXISTS idx_swarm_runs_status     ON swarm_runs(status);

CREATE INDEX IF NOT EXISTS idx_agent_runs_swarm_run_id        ON agent_runs(swarm_run_id);
CREATE INDEX IF NOT EXISTS idx_agent_runs_parent_agent_run_id ON agent_runs(parent_agent_run_id);
CREATE INDEX IF NOT EXISTS idx_agent_runs_status              ON agent_runs(status);

CREATE INDEX IF NOT EXISTS idx_agent_invocations_parent ON agent_invocations(parent_agent_run_id);
CREATE INDEX IF NOT EXISTS idx_agent_invocations_child  ON agent_invocations(child_agent_run_id);

CREATE INDEX IF NOT EXISTS idx_agent_steps_agent_run_id ON agent_steps(agent_run_id);
CREATE INDEX IF NOT EXISTS idx_agent_steps_command_id   ON agent_steps(command_id);
CREATE INDEX IF NOT EXISTS idx_agent_steps_action_type  ON agent_steps(action_type);

Section 58 Swarm & agent state transitions

58.1 · Swarm state transitions

SwarmRun lifecycle

stateDiagram-v2
  [*] --> created
  created --> running
  running --> joining
  running --> waiting_for_approval
  running --> failed
  running --> cancelled
  joining --> succeeded
  joining --> partially_succeeded
  joining --> failed
  waiting_for_approval --> running
  waiting_for_approval --> cancelled
  waiting_for_approval --> failed
  succeeded --> [*]
  failed --> [*]
  cancelled --> [*]
  partially_succeeded --> [*]
  expired --> [*]

Terminal states: succeeded, failed, cancelled, partially_succeeded, expired.

58.2 · Agent state transitions

AgentRun lifecycle

stateDiagram-v2
  [*] --> created
  created --> running
  running --> waiting_for_tool
  running --> waiting_for_child
  running --> waiting_for_approval
  running --> joining
  running --> succeeded
  running --> failed
  running --> cancelled
  waiting_for_tool --> running
  waiting_for_tool --> failed
  waiting_for_child --> running
  waiting_for_child --> failed
  waiting_for_approval --> running
  waiting_for_approval --> failed
  joining --> succeeded
  joining --> failed
  succeeded --> [*]
  failed --> [*]
  cancelled --> [*]
  expired --> [*]

Terminal states: succeeded, failed, cancelled, expired.
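The AgentRun diagram can be transcribed edge for edge into a transition table. Every status write can then be validated against it. The sketch below does exactly that. `AGENT_RUN_TRANSITIONS`, `IllegalTransition`, and `transition` are hypothetical names, and terminal states are derived as those with no outgoing edges.

```python
# Edges copied from the AgentRun lifecycle diagram above.
AGENT_RUN_TRANSITIONS: dict[str, frozenset[str]] = {
    "created": frozenset({"running"}),
    "running": frozenset({
        "waiting_for_tool", "waiting_for_child", "waiting_for_approval",
        "joining", "succeeded", "failed", "cancelled",
    }),
    "waiting_for_tool": frozenset({"running", "failed"}),
    "waiting_for_child": frozenset({"running", "failed"}),
    "waiting_for_approval": frozenset({"running", "failed"}),
    "joining": frozenset({"succeeded", "failed"}),
    "succeeded": frozenset(),
    "failed": frozenset(),
    "cancelled": frozenset(),
    "expired": frozenset(),
}

# Terminal = no outgoing edges.
TERMINAL = frozenset(s for s, nxt in AGENT_RUN_TRANSITIONS.items() if not nxt)


class IllegalTransition(ValueError):
    pass


def transition(current: str, target: str) -> str:
    """Return target if the diagram allows current -> target, else raise."""
    if target not in AGENT_RUN_TRANSITIONS.get(current, frozenset()):
        raise IllegalTransition(f"agent_run: {current} -> {target}")
    return target
```

In Postgres, the same check can also be applied at write time by guarding the UPDATE on the expected current status. A concurrent writer that already moved the row then causes the update to match zero rows instead of silently overwriting it.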

Section 59 Standards for subagent spawning

59.1 · Spawn request contract

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SpawnSubagentRequest:
    parent_agent_run_id: str
    swarm_run_id: str
    agent_name: str
    agent_role: str
    delegated_goal: str
    allowed_tools: list[str]
    allowed_connectors: list[str]
    memory_scope: dict[str, Any]
    context_scope: dict[str, Any]
    constraints: dict[str, Any] = field(default_factory=dict)
    max_steps: int = 10

59.2 · Spawn result contract

from dataclasses import dataclass, field


@dataclass
class SpawnSubagentResult:
    decision: str
    child_agent_run_id: str | None
    command_id: str | None
    reasons: list[str] = field(default_factory=list)

Allowed decisions:

allowed

denied

requires_approval

requires_more_input

59.3 · Spawn policy checks

max_swarm_size

max_spawn_depth

allowed_agent_role

allowed_tools_for_role

allowed_connectors_for_role

memory_scope_isolation

context_scope_isolation

cost_budget

step_budget

human_approval_requirement

external_side_effect_requirement

Spawn flow

sequenceDiagram
  autonumber
  participant P as Parent agent
  participant G as Primitive Gateway
  participant POL as Policy
  participant DB as Postgres
  P->>G: SpawnSubagentRequest
  G->>G: create command spawn_subagent
  G->>POL: evaluate spawn policy
  alt allowed
    POL-->>G: allowed
    G->>DB: insert agent_run · agent_invocation
    G-->>P: child_agent_run_id
  else denied / requires_approval
    POL-->>G: decision + reasons
    G-->>P: decision + reasons
  end

59.4 · Example spawn policy

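def evaluate_spawn_policy(request, swarm, parent_agent):
    """Example spawn policy.

    request is a SpawnSubagentRequest; swarm and parent_agent are dicts shaped
    like swarm_runs / agent_runs rows, plus fields not in those tables
    (current_agent_count, allowed_child_roles). All violations are collected.
    """
    reasons = []

    # Depth: the child runs one level below its parent.
    if parent_agent["spawn_depth"] + 1 > swarm["max_depth"]:
        reasons.append("Spawn depth exceeded.")

    # Size: one more agent must not exceed max_agents.
    if swarm["current_agent_count"] >= swarm["max_agents"]:
        reasons.append("Swarm agent limit exceeded.")

    # Roles: an empty allowed_child_roles list leaves roles unrestricted.
    allowed_roles = parent_agent.get("allowed_child_roles", [])
    if allowed_roles and request.agent_role not in allowed_roles:
        reasons.append(f"Role not allowed: {request.agent_role}")

    # Subset rules: a child never gains tools or connectors its parent lacks.
    parent_tools = set(parent_agent.get("allowed_tools", []))
    if not set(request.allowed_tools).issubset(parent_tools):
        reasons.append("Child requested tools outside parent scope.")

    parent_connectors = set(parent_agent.get("allowed_connectors", []))
    if not set(request.allowed_connectors).issubset(parent_connectors):
        reasons.append("Child requested connectors outside parent scope.")

    if reasons:
        return {"decision": "denied", "reasons": reasons}

    return {"decision": "allowed", "reasons": []}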

Section 60 Tool calls from agents

60.1 · Tool calls must become commands

Rule

An agent tool call should not directly execute the tool. It should create a command or child task run.

Agent wants to call run_sql
→ create command: run_sql
→ policy checks permission / data scope
→ execute sync or async
→ return observation to agent
→ write audit

60.2 · Tool call contract

from dataclasses import dataclass, field
from typing import Any


@dataclass
class AgentToolCallRequest:
    agent_run_id: str
    swarm_run_id: str | None
    tool_name: str
    payload: dict[str, Any]
    reason: str
    expected_output: str
    idempotency_key: str | None = None
    metadata: dict[str, Any] = field(default_factory=dict)

60.3 · Tool call routing

Routing

sequenceDiagram
  autonumber
  participant A as Agent
  participant G as Primitive Gateway
  participant POL as Policy
  participant E as Executor
  A->>G: AgentToolCallRequest
  G->>G: check allowed_tools
  G->>G: convert to command
  G->>POL: evaluate command policy
  POL-->>G: allow / deny / approve
  G->>E: execute sync or async
  E-->>G: result
  G->>G: persist · write AgentStep
  G-->>A: observation
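The routing sequence above can be sketched as a gateway method that never lets the agent call a tool directly. It checks the allowlist, wraps the call in a command, asks policy, executes, and records an AgentStep either way. In this sketch, `PrimitiveGateway`, the `policy` callable returning `"allow" | "deny" | "approve"`, and the in-memory `steps` list are hypothetical stand-ins for the real policy engine, executor, and `agent_steps` table. Only the sync path is shown.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class AgentToolCallRequest:
    # Same fields as the 60.2 contract, repeated so the sketch runs standalone.
    agent_run_id: str
    swarm_run_id: str | None
    tool_name: str
    payload: dict[str, Any]
    reason: str
    expected_output: str
    idempotency_key: str | None = None
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class PrimitiveGateway:
    allowed_tools: dict[str, set[str]]                      # agent_run_id -> tool names
    policy: Callable[[dict[str, Any]], str]                 # -> "allow" | "deny" | "approve"
    executors: dict[str, Callable[[dict[str, Any]], Any]]   # tool_name -> sync function
    steps: list[dict[str, Any]] = field(default_factory=list)  # stand-in for agent_steps

    def call_tool(self, req: AgentToolCallRequest) -> dict[str, Any]:
        # 1. Allowlist check happens before any command exists.
        if req.tool_name not in self.allowed_tools.get(req.agent_run_id, set()):
            return self._record(req, "denied", {"error": "tool not in allowed_tools"})
        # 2. The tool call becomes a command; the agent never runs the tool itself.
        command = {
            "command_id": str(uuid.uuid4()),
            "task_name": req.tool_name,
            "payload": req.payload,
            "idempotency_key": req.idempotency_key,
            "requested_by": req.agent_run_id,
        }
        decision = self.policy(command)
        if decision == "deny":
            return self._record(req, "denied", {"error": "policy denied"}, command)
        if decision == "approve":
            return self._record(req, "waiting_for_approval", None, command)
        # 3. Sync execution only; the async/queue branch is omitted.
        result = self.executors[req.tool_name](req.payload)
        return self._record(req, "executed", result, command)

    def _record(self, req, decision, observation, command=None) -> dict[str, Any]:
        step_index = sum(1 for s in self.steps if s["agent_run_id"] == req.agent_run_id)
        step = {
            "agent_step_id": str(uuid.uuid4()),
            "agent_run_id": req.agent_run_id,
            "step_index": step_index,
            "action_type": "tool_call_executed" if decision == "executed" else "tool_call_proposed",
            "tool_name": req.tool_name,
            "decision": decision,
            "observation": observation,
            "created_command_id": command["command_id"] if command else None,
        }
        self.steps.append(step)
        return step
```

Recording a step for denied calls, not only executed ones, is deliberate: a forbidden attempt stays visible in the audit trail.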

Section 61 Memory rules for swarms

Memory semantics — scope, consent, supersession, candidate writes, the MemoryStore protocol — are defined in §23 Memory architecture. This section adds only the swarm-specific deltas.

61.1 · Inheritance is subset, not union

Subset rule

Child agents do not automatically inherit parent memory access. The required invariant: child_memory_scope ⊆ parent_memory_scope. Spawn requests that violate this are rejected at policy time (see §59 Subagent spawning).

61.2 · Per-agent memory scope shape

Each AgentRun carries an explicit memory scope. The scope is bound at spawn time and immutable for the run.

{
  "subject_type": "user",
  "subject_id": "user_123",
  "allowed_memory_types": ["preference", "workflow_preference"],
  "allow_semantic_retrieval": true,
  "max_results": 10
}
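The invariant child_memory_scope ⊆ parent_memory_scope needs a concrete meaning for the scope shape above. The sketch below assumes one interpretation: the child must share the parent's subject, may use only a subset of the parent's memory types, may use semantic retrieval only if the parent can, and may request no more results. `memory_scope_is_subset` is a hypothetical helper, and a missing `max_results` is treated as zero.

```python
from typing import Any


def memory_scope_is_subset(child: dict[str, Any], parent: dict[str, Any]) -> bool:
    """True if the child scope grants nothing the parent scope lacks."""
    # Same subject: a child may not read memory about a different subject.
    if (child.get("subject_type"), child.get("subject_id")) != (
        parent.get("subject_type"),
        parent.get("subject_id"),
    ):
        return False
    # Memory types must be a subset of the parent's.
    if not set(child.get("allowed_memory_types", [])) <= set(parent.get("allowed_memory_types", [])):
        return False
    # Semantic retrieval only if the parent has it.
    if child.get("allow_semantic_retrieval", False) and not parent.get("allow_semantic_retrieval", False):
        return False
    # A missing max_results is treated as 0 (most restrictive).
    return child.get("max_results", 0) <= parent.get("max_results", 0)
```

A spawn policy could call this alongside the tool and connector subset checks and deny the spawn when it returns False.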

Section 62 Connector rules for swarms

Connectors should be scoped explicitly per agent.

{
  "allowed_connectors": [
    "postgres",
    "databricks",
    "github",
    "slack"
  ]
}

Subset rule

A child agent should never gain connector access that the parent does not have. Recommended: child_connector_scope ⊆ parent_connector_scope.

For future connectors, define

connector_name

allowed_operations

credential_scope

read_scope

write_scope

rate_limit

approval_required_for_write

audit_level

Section 63 Artifact rules for swarms

Artifact semantics — types, statuses, lifecycle, and the distinction from effects — are defined in §24 Artifact architecture. This section adds only the swarm-specific delta.

63.1 · Join through artifacts, not chat

A coordinator should consume child outputs as artifacts, not as ephemeral chat messages. Every subagent result that matters is committed as a row in pf_artifacts with a typed artifact_type; the coordinator's join strategy reads from there, not from agent message streams.

Coordinator joins artifacts

flowchart LR
  CA[Child agent output] --> A[Artifact]
  A --> COORD[Coordinator reads artifacts]
  COORD --> J[Join strategy]
  J --> FA([Final artifact])
  style COORD fill:#EFE6F0,stroke:#7A5560
  style FA fill:#F1F2EC,stroke:#6B7B5A

Typical swarm artifact types

research_summary

sql_result

analysis_result

review_report

risk_assessment

final_synthesis

Section 64 Cancellation semantics

Cancellation modes (graceful, compensate_then_stop) and per-command state transitions are defined in §30 Cancellation standard. This section adds only the swarm-specific delta: how cancellation cascades across a multi-agent execution.

64.1 · Cascade flow

Swarm cascade

flowchart TB
  CC[Cancel parent command] --> WR[cancel workflow run]
  WR --> SR[cancel swarm run]
  SR --> AR[cancel active agent runs]
  AR --> TR[cancel queued task runs]
  TR --> CI[cancel pending child invocations]
  CI --> AP[cancel pending approvals if appropriate]
  AP --> AU[/"audit all cancellations"/]
  style AU fill:#F1F2EC,stroke:#6B7B5A

The parent's cancellation_mode propagates: a graceful parent cancel triggers a graceful exit on each child agent run; a compensate_then_stop parent cancel triggers compensation in each child whose effects fired (reverse declaration order per child, then up the chain).

Recommended audit fields on cancel:

{
  "cancellation_mode": "graceful",
  "cancel_reason": "user_requested",
  "cancelled_by": "user_123"
}
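The cascade could be computed as a pure function over a tree of runs. The sketch below is a minimal illustration that assumes each run knows its kind, status, and children (the `RunNode` type is hypothetical). It walks breadth-first, so parent-level runs are cancelled before child-level runs, following the level-by-level order of the diagram. It emits one audit record per non-terminal run, copying the parent's cancellation_mode onto every descendant. Applying the records is left to a DBOS step.

```python
from collections import deque
from dataclasses import dataclass

TERMINAL = frozenset({"succeeded", "failed", "cancelled"})


@dataclass(frozen=True)
class RunNode:
    run_id: str
    kind: str                          # e.g. "workflow_run", "swarm_run", "agent_run", "task_run"
    status: str
    children: tuple["RunNode", ...] = ()


def cascade_cancel(root: RunNode, mode: str, reason: str, cancelled_by: str) -> list[dict]:
    """Return one audit record per run to cancel, parent level before child level.

    Pure: nothing is mutated. Runs already in a terminal status are skipped,
    but their children are still visited.
    """
    records: list[dict] = []
    pending = deque([root])
    while pending:
        run = pending.popleft()
        if run.status not in TERMINAL:
            records.append({
                "run_id": run.run_id,
                "kind": run.kind,
                "cancellation_mode": mode,      # propagated unchanged from the parent
                "cancel_reason": reason,
                "cancelled_by": cancelled_by,
            })
        pending.extend(run.children)
    return records
```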

Section 65 Compensation semantics

The compensation contract — the three-layer design (manifest, registration validator, runtime drift detector) — is defined in §31 Compensation standard. This section adds the swarm-specific delta: who runs the compensation chain and who proposes it.

Principle

Agents may propose compensation, but Concord executes compensation through governed commands. A compensation proposed by an agent goes through the same manifest validation and drift detection as one declared in the catalog at registration time.

65.1 · Typical swarm side-effect → compensation mapping

Side effect	Compensation
Created draft artifact	Mark artifact cancelled
Started external job	Cancel job
Sent notification	Send correction
Wrote staging table	Drop or mark stale
Created approval request	Expire approval
Wrote memory	Supersede or delete memory
Opened GitHub PR	Close PR or mark draft
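The table could be turned into a compensation plan as shown in the minimal sketch below. The snake_case effect and command names are hypothetical renderings of the table rows. The output follows the principle above: it is a list of *proposed* commands in reverse declaration order, each of which would still pass through policy, manifest validation, and drift detection before anything executes.

```python
# Hypothetical identifiers for the table rows above.
COMPENSATIONS = {
    "create_draft_artifact": "mark_artifact_cancelled",
    "start_external_job": "cancel_job",
    "send_notification": "send_correction",
    "write_staging_table": "drop_or_mark_stale",
    "create_approval_request": "expire_approval",
    "write_memory": "supersede_or_delete_memory",
    "open_github_pr": "close_pr_or_mark_draft",
}


def plan_compensation(fired_effects: list[dict]) -> list[dict]:
    """Return proposed compensation commands, newest effect first.

    Raises if a fired effect has no declared compensation, so the gap surfaces
    at planning time instead of halfway through a rollback.
    """
    plan = []
    for effect in reversed(fired_effects):
        command_type = COMPENSATIONS.get(effect["effect_type"])
        if command_type is None:
            raise ValueError(f"no compensation declared for {effect['effect_type']!r}")
        plan.append({"command_type": command_type, "compensates": effect["effect_id"]})
    return plan
```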

Section 66 Quality & evaluation

Swarms should support evaluation as a first-class step. Evaluation can be deterministic or agentic.

Type 01

Deterministic

SQL checks pass
row counts reconcile
JSON schema is valid
artifact exists
confidence score exceeds threshold
PII check passes
cost is below budget

Type 02

Agentic

reviewer agent critiques answer
evaluator agent ranks candidates
safety agent reviews external output
domain agent validates reasoning

Agentic evaluation must still write structured results.

{
  "evaluation_type": "reviewer_agent",
  "decision": "pass",
  "confidence": 0.91,
  "issues": [],
  "recommendation": "publish"
}
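Requiring structured results makes agentic and deterministic evaluations interchangeable downstream. The sketch below validates the shape above and folds named deterministic checks into the same shape. The decision vocabulary beyond "pass" and the recommendation values other than "publish" are assumptions.

```python
# Assumed vocabulary; only "pass" appears in the spec.
ALLOWED_DECISIONS = {"pass", "fail", "needs_review"}
REQUIRED_FIELDS = {"evaluation_type", "decision", "confidence", "issues", "recommendation"}


def validate_evaluation(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the result is acceptable."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - set(result))]
    if "decision" in result and result["decision"] not in ALLOWED_DECISIONS:
        problems.append(f"unknown decision: {result['decision']!r}")
    conf = result.get("confidence")
    if conf is not None and not (isinstance(conf, (int, float)) and 0.0 <= conf <= 1.0):
        problems.append("confidence must be a number in [0, 1]")
    if "issues" in result and not isinstance(result["issues"], list):
        problems.append("issues must be a list")
    return problems


def deterministic_evaluation(checks: dict[str, bool]) -> dict:
    """Fold named deterministic checks into the same structured shape."""
    failed = sorted(name for name, ok in checks.items() if not ok)
    return {
        "evaluation_type": "deterministic",
        "decision": "fail" if failed else "pass",
        "confidence": 1.0,                       # deterministic checks are binary
        "issues": failed,
        "recommendation": "revise" if failed else "publish",  # "revise" is assumed
    }
```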

Section 67 Standard YAML specification

A complete swarm declared in YAML:

name: revenue_report_swarm
objective: Generate and review a monthly revenue report.
execution_mode: parallel
join_strategy: coordinator_synthesis

limits:
  max_agents: 4
  max_depth: 1
  max_total_steps: 80
  max_cost_units: 50

coordinator:
  agent_name: revenue_report_coordinator
  agent_role: coordinator
  max_steps: 20
  allowed_tools:
    - spawn_subagent
    - read_artifact
    - create_artifact
    - request_approval
  allowed_connectors:
    - postgres
    - databricks

agents:
  - agent_name: revenue_researcher
    agent_role: researcher
    delegated_goal: Gather source data and assumptions.
    max_steps: 15
    allowed_tools:
      - run_sql
      - retrieve_memory
      - create_artifact
    allowed_connectors:
      - postgres
      - databricks

  - agent_name: revenue_analyst
    agent_role: analyst
    delegated_goal: Compute metrics and produce analysis.
    max_steps: 15
    allowed_tools:
      - run_sql
      - create_artifact
    allowed_connectors:
      - postgres
      - databricks

  - agent_name: revenue_reviewer
    agent_role: reviewer
    delegated_goal: Validate the final report and identify risks.
    max_steps: 10
    allowed_tools:
      - read_artifact
      - evaluate_output
      - request_human_input
    allowed_connectors:
      - postgres
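Some limits in the spec can be checked statically at registration time, before any agent runs. The sketch below takes the spec as an already-parsed dict, for example after loading the YAML with a YAML parser. It assumes the coordinator counts toward max_agents and that the per-agent max_steps budgets should fit inside max_total_steps. The function name is hypothetical.

```python
def check_swarm_limits(spec: dict) -> list[str]:
    """Static checks on a parsed swarm spec; an empty list means the limits are consistent."""
    limits = spec["limits"]
    agents = [spec["coordinator"], *spec.get("agents", [])]   # coordinator counted (assumption)
    errors = []
    if len(agents) > limits["max_agents"]:
        errors.append(f"{len(agents)} agents declared, max_agents is {limits['max_agents']}")
    planned_steps = sum(a["max_steps"] for a in agents)
    if planned_steps > limits["max_total_steps"]:
        errors.append(
            f"per-agent max_steps sum to {planned_steps}, "
            f"max_total_steps is {limits['max_total_steps']}"
        )
    if spec.get("agents") and limits["max_depth"] < 1:
        errors.append("subagents declared but max_depth < 1")
    return errors
```

For the spec above, four agents fit max_agents: 4, and 20 + 15 + 15 + 10 = 60 steps fit max_total_steps: 80.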

Section 68 Standard Python interfaces

The core service interfaces — CommandService, PolicyEngine, Planner, Executor, Worker, DurableRuntime, MemoryStore — live in §41 Minimal service interfaces. This section adds the three agent-and-swarm interfaces that aren't elsewhere.

68.1 · Agent runtime interface

from typing import Protocol, Any


class AgentRuntime(Protocol):
    def start_agent_run(
        self,
        agent_run_id: str,
        goal: str,
        context: dict[str, Any],
    ) -> dict[str, Any]: ...

    def resume_agent_run(
        self,
        agent_run_id: str,
        observation: dict[str, Any],
    ) -> dict[str, Any]: ...

    def cancel_agent_run(
        self,
        agent_run_id: str,
        reason: str,
    ) -> dict[str, Any]: ...

Adapter, not library binding

Concord does not depend on a specific LLM/agent library. LangGraph, custom loops, OpenAI Agents SDK, CrewAI, or other systems plug in as adapters behind this protocol — same pattern as DurableRuntime in §41.

68.2 · Swarm planner interface

class SwarmPlanner(Protocol):
    def create_swarm_plan(
        self,
        command: dict[str, Any],
        context: dict[str, Any],
    ) -> dict[str, Any]: ...

68.3 · Join strategy interface

class JoinStrategy(Protocol):
    def join(
        self,
        swarm_run: dict[str, Any],
        child_results: list[dict[str, Any]],
    ) -> dict[str, Any]: ...

Example join output:

{
  "decision": "succeeded",
  "summary": "All required agents completed.",
  "selected_artifact_id": "artifact_123",
  "confidence": 0.88,
  "requires_human_review": false
}
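One possible JoinStrategy that produces output in the shape above is sketched below. It satisfies the protocol structurally, without inheritance. The class name, the `required` and `artifact_id` fields on child results, the "weakest successful child" confidence rule, and the 0.8 review threshold are all assumptions for illustration.

```python
from typing import Any


class AllRequiredSucceeded:
    """Hypothetical JoinStrategy: succeed only when every required child succeeded."""

    def __init__(self, review_threshold: float = 0.8) -> None:
        self.review_threshold = review_threshold

    def join(
        self,
        swarm_run: dict[str, Any],
        child_results: list[dict[str, Any]],
    ) -> dict[str, Any]:
        succeeded = [r for r in child_results if r["status"] == "succeeded"]
        # Children are required unless they explicitly say otherwise (assumed field).
        missing = [
            r["agent_run_id"]
            for r in child_results
            if r.get("required", True) and r["status"] != "succeeded"
        ]
        if missing or not succeeded:
            return {
                "decision": "failed",
                "summary": "Missing required results: " + (", ".join(missing) or "none succeeded") + ".",
                "selected_artifact_id": None,
                "confidence": 0.0,
                "requires_human_review": True,
            }
        # Assumed rule: the swarm is only as confident as its weakest successful child.
        confidence = min(r["confidence"] for r in succeeded)
        best = max(succeeded, key=lambda r: r["confidence"])
        return {
            "decision": "succeeded",
            "summary": "All required agents completed.",
            "selected_artifact_id": best["artifact_id"],
            "confidence": confidence,
            "requires_human_review": confidence < self.review_threshold,
        }
```

Because child results carry artifact IDs rather than chat text, the join stays consistent with 63.1: the coordinator reads committed artifacts, not message streams.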

Section 69 How to map any agentic task

Use this 18-point checklist.

What is the parent command?
Is an agent needed, or is a deterministic function enough?
Is this single-agent or swarm?
What is the swarm objective?
Who is the coordinator?
What subagent roles are allowed?
Can subagents spawn further children?
What is the max depth?
What tools can each agent use?
What connectors can each agent use?
What memory can each agent read?
Can any agent write memory?
What artifacts should each agent produce?
How are outputs joined?
Does any step require human approval?
What are the cancellation rules?
What are the compensation rules?
What audit records are mandatory?

Section 70 Example · research and report swarm

70.1 · User request

Create a monthly revenue report, check it, and prepare it for external sharing.

70.2 · Primitive mapping

End-to-end

flowchart TB
  IN([User request]) --> CMD["Command<br/>generate_and_prepare_revenue_report"]
  CMD --> POL{{Policy}}
  POL --> PL([Plan])
  PL --> SR[SwarmRun]
  SR --> COORD[AgentRun · coordinator]
  COORD --> R[AgentRun · researcher]
  COORD --> A[AgentRun · analyst]
  COORD --> REV[AgentRun · reviewer]
  R --> J["JoinStrategy<br/>coordinator_synthesis"]
  A --> J
  REV --> J
  J --> ART["ArtifactWrite<br/>draft_report"]
  ART --> POL2{{Policy · external_sharing}}
  POL2 --> HA[HumanApproval]
  HA --> AT["AsyncTask<br/>publish_report"]
  AT --> NOT[Notification]
  NOT --> AUD[/Audit/]
  style POL fill:#F5E0D2,stroke:#D97757,stroke-width:1.5px
  style POL2 fill:#F5E0D2,stroke:#D97757,stroke-width:1.5px
  style COORD fill:#EFE6F0,stroke:#7A5560
  style R fill:#EFE6F0,stroke:#7A5560
  style A fill:#EFE6F0,stroke:#7A5560
  style REV fill:#EFE6F0,stroke:#7A5560
  style AUD fill:#F1F2EC,stroke:#6B7B5A

70.3 · State flow

command.created
→ command.validated
→ swarm.created
→ swarm.running
→ agent_runs.running
→ swarm.joining
→ artifact.created
→ command.waiting_for_approval
→ command.approved
→ command.queued
→ command.running
→ command.succeeded

Section 71 Operational guardrails

71.1 · Hard limits

Every swarm should have hard limits.

max_agents

max_depth

max_steps_per_agent

max_total_steps

max_runtime_seconds

max_cost_units

max_tool_calls

max_memory_reads

max_memory_writes

max_connector_calls

71.2 · Role-based tool permissions

{
  "researcher": ["run_sql", "retrieve_memory", "create_artifact"],
  "analyst":    ["run_sql", "create_artifact", "evaluate_output"],
  "reviewer":   ["read_artifact", "evaluate_output", "request_human_input"],
  "publisher":  ["publish_artifact", "request_approval"]
}
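A deny-by-default check against the role table above might look like the sketch below. It returns the `tool_call_allowed` / `tool_call_denied` agent step records named in 85.2, so every decision is traceable. The function name and payload fields are assumptions.

```python
ROLE_TOOLS: dict[str, frozenset[str]] = {
    "researcher": frozenset({"run_sql", "retrieve_memory", "create_artifact"}),
    "analyst": frozenset({"run_sql", "create_artifact", "evaluate_output"}),
    "reviewer": frozenset({"read_artifact", "evaluate_output", "request_human_input"}),
    "publisher": frozenset({"publish_artifact", "request_approval"}),
}


def check_tool_call(agent_run_id: str, role: str, tool: str) -> dict:
    """Return an agent step record; unknown roles and unlisted tools are denied."""
    allowed = tool in ROLE_TOOLS.get(role, frozenset())
    return {
        "event_type": "tool_call_allowed" if allowed else "tool_call_denied",
        "payload": {"agent_run_id": agent_run_id, "role": role, "tool": tool},
    }
```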

71.3 · No uncontrolled recursion

Every spawn must check

parent_depth + 1 ≤ max_depth · current_agent_count < max_agents · child_scope ⊆ parent_scope · child_tools ⊆ parent_tools · child_connectors ⊆ parent_connectors
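The five spawn checks translate directly into a pure function. In the sketch below, memory scope is simplified to a set of memory types, and the `Authority` type and parameter names are hypothetical. The function returns every violated invariant rather than stopping at the first one, so the policy_evaluated event can record the full reason for a denial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authority:
    """What an agent may touch; memory scope reduced to memory types (a simplification)."""
    memory_types: frozenset[str]
    tools: frozenset[str]
    connectors: frozenset[str]


def check_spawn(
    parent: Authority,
    child: Authority,
    *,
    parent_depth: int,
    current_agent_count: int,
    max_depth: int,
    max_agents: int,
) -> list[str]:
    """Return every violated spawn invariant; an empty list means the spawn may proceed."""
    violations = []
    if parent_depth + 1 > max_depth:
        violations.append("max_depth exceeded")
    if current_agent_count >= max_agents:
        violations.append("max_agents reached")
    for name in ("memory_types", "tools", "connectors"):
        extra = getattr(child, name) - getattr(parent, name)
        if extra:
            violations.append(f"child {name} not subset of parent: {sorted(extra)}")
    return violations
```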

71.4 · No ungoverned side effects

Agents may not directly perform external writes. External writes should be commands:

send_email

publish_report

write_table

create_github_pr

post_to_slack

update_memory

grant_permission

Each goes through: Command → Policy → Approval if needed → Execution → Audit.

Section 72 Recommended update to core architecture

Add this section to the main Concord architecture document:

Agentic execution is modeled as a governed extension of the primitive system. A single-agent workflow is one AgentRun attached to a command. A multi-agent workflow is a SwarmRun containing many AgentRuns. A subagent spawn is a governed command that creates an AgentInvocation and child AgentRun. Agent tool calls are commands or task runs. Agent outputs are artifacts. Agent memory writes are candidate memory records. Agent observations and decisions are AgentSteps.

Section 73 Final principle

The system should support future agent runtimes and connector ecosystems without changing the core architecture.

Do not

encode agent framework assumptions into the database
let agents bypass the primitive gateway
let child agents expand their own authority
treat chat messages as the only source of truth

Do persist

agent runs & steps
invocations & spawn decisions
artifacts & memory candidates
approvals & audit events

Concord should remain:

Postgres-first

Runtime-agnostic

Connector-extensible

Agent-compatible

Policy-governed

Audit-complete

Deterministic when possible

Agentic when useful

Part III · Addendum · Decision accepted

Functional core architecture on DBOS

Concord is built on DBOS + Postgres. DBOS is the durable execution runtime; Postgres is the system of record; Concord is the semantic and functional-core layer.

Supersedes earlier implementation ideas that proposed custom queue, retry, lease, schedule, and outbox machinery inside Concord.

Section 74 Decision accepted · Concord on DBOS

The goal is not to recreate DBOS inside Concord. The goal is to define a clean primitive vocabulary and functional decision layer that DBOS can execute durably.

Three-layer stack

flowchart TB
  PF["Concord<br/>semantic primitive layer<br/>functional decision core"]
  DBOSR["DBOS<br/>durable execution runtime<br/>workflows · steps · queues · schedules"]
  PG[("Postgres<br/>durable system of record<br/>Concord domain + DBOS runtime tables")]
  PF --> DBOSR --> PG
  style PF fill:#F5E0D2,stroke:#D97757,stroke-width:1.5px
  style DBOSR fill:#FAF8F2,stroke:#141413
  style PG fill:#FFFFFF,stroke:#141413,stroke-width:1.5px

74.1 · Core decision

Use DBOS for runtime mechanics. Use Concord for meaning, governance, and architecture.

74.2 · Ownership split

DBOS owns

Runtime mechanics

durable workflows & steps
workflow recovery
retries · queues · schedules
durable sleep
workflow IDs / idempotency
Postgres transaction tracking
concurrency & rate limits

Concord owns

Meaning & governance

command taxonomy
policy model
task classification & planning semantics
approval, memory, artifact models
connector contracts
agent & swarm ontology
domain audit events

Section 75 Why DBOS is the runtime

DBOS aligns with Concord because Concord is:

Postgres-first

functional-core oriented

app-local

connector-heavy

agent-compatible

domain-audit driven

DBOS workflows provide durable execution: after an interruption, a workflow resumes from its last completed step instead of re-running finished work. DBOS workflow IDs can act as idempotency keys. Workflows are expected to be deterministic; non-deterministic work (database access, third-party APIs, randomness, local time) belongs inside DBOS steps.

Principle

Primitives decide and describe. DBOS steps execute side effects. Postgres records the domain truth.

Section 76 Design philosophy after DBOS

76.1 · Concord becomes smaller

Concord should not be a workflow engine. It becomes a semantic workflow kernel:

Given command + context + current domain state,
derive policy decisions, plans, domain events, and DBOS execution requests.

The implementation should avoid building infrastructure DBOS already supplies.

76.2 · Functional core, DBOS shell

Layer · A

Functional core

pure command classification
pure policy evaluation
pure plan creation
pure state transition derivation
pure connector permission checks
pure agent / swarm planning

Layer · B

DBOS shell

durable workflows & steps
transaction steps
connector calls
LLM calls
queues · retries · schedules
sleep / wait / resume

The functional core is unit-testable without DBOS. The DBOS shell is integration-tested with Postgres and real or mocked connectors.

76.3 · Domain events are not DBOS internals

DBOS already has runtime execution state. Concord should maintain domain events, not duplicate DBOS runtime history.

Yes · domain

Business meaning

command_created

policy_evaluated

approval_requested

approval_granted

memory_candidate_created

artifact_created

connector_invocation_succeeded

booking_confirmed

No · DBOS

Runtime mechanics

worker lease acquired

generic retry attempt

queue poll started

workflow checkpoint

task heartbeat

Rule

DBOS owns runtime mechanics. Concord owns business meaning.

Section 77 What we remove from Concord

Earlier versions included concepts now delegated to DBOS.

77.1 · Remove custom durable queue

Do not build a custom queue table, queue claiming, worker lease system, polling loop, global concurrency controller, or rate limiter. Use DBOS queues.

Concord may still define semantic queue names:

connector_calls

agent_runs

swarm_children

notifications

scheduled_jobs

77.2 · Remove custom retry runner

Use DBOS step/workflow retry behavior. Concord can still declare domain-level retry metadata:

{
  "operation": "hotel_booking.search",
  "retry_class": "transient_connector_error",
  "max_attempts": 3,
  "requires_idempotency_key": true
}
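One way to keep retry metadata declarative while letting DBOS do the retrying is to translate it into step options. The sketch below assumes the DBOS Python SDK's `DBOS.step()` parameters `retries_allowed`, `max_attempts`, `interval_seconds`, and `backoff_rate`; check these against the DBOS version you run. The retry-class names and timing values are placeholders.

```python
# Hypothetical retry classes: Concord names them; timing values are placeholders.
RETRY_CLASSES: dict[str, dict[str, float]] = {
    "transient_connector_error": {"interval_seconds": 1.0, "backoff_rate": 2.0},
    "rate_limited": {"interval_seconds": 10.0, "backoff_rate": 2.0},
}


def dbos_step_retry_options(meta: dict) -> dict:
    """Translate Concord retry metadata into keyword arguments for @DBOS.step(...)."""
    if meta["max_attempts"] <= 1:
        return {"retries_allowed": False}
    timing = RETRY_CLASSES.get(meta["retry_class"])
    if timing is None:
        raise ValueError(f"unknown retry_class: {meta['retry_class']!r}")
    return {"retries_allowed": True, "max_attempts": meta["max_attempts"], **timing}
```

In the shell this would be used as `@DBOS.step(**dbos_step_retry_options(meta))`. Because a retried step repeats the external call, metadata marked `requires_idempotency_key` should also cause the shell to pass a domain idempotency key to the connector.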

77.3 · Remove custom lease and claim primitives

Drop claimed_by, lease_until, worker tick, manual claim_next_task. Where domain-level ownership matters (an approval assigned to an approver, an agent role assigned to a runtime adapter), model it semantically. Runtime ownership belongs to DBOS.

77.4 · Remove custom schedule runner

Use DBOS schedules. Concord can still declare schedule specs (cron + queue + command_type) and let DBOS execute them.

77.5 · Remove generic effect outbox

The earlier functional design proposed a generic effect_outbox. With DBOS, this is replaced by DBOS workflows, steps, and queues.

Distinction

DBOS queue/workflow/step = executable runtime mechanism. Concord domain_effect = semantic record that an action was requested or performed. Don't use the latter as an execution engine.

Section 78 What Concord keeps

Concord remains responsible for the conceptual structure of work.

Core primitives

Ingress

Command

Context

Policy

Plan

State

Approval

Memory

Artifact

Notification

Cancellation

Compensation

Audit

AgentRun

SwarmRun

ConnectorInvocation

DBOS executes them; DBOS does not define their meaning.

78.1 · Command model

Every consequential action still becomes a command. A command is the durable representation of user, system, connector, or agent intent.

search_hotels

rank_hotel_options

create_booking_draft

book_hotel

cancel_reservation

write_user_preference

spawn_subagent

run_connector_operation

create_artifact

request_approval

78.2 · Policy, artifact, memory models stay

DBOS does not know whether a hotel booking requires approval, whether a connector write is safe, whether memory needs consent, or whether a subagent may access a connector. Concord owns those decisions.

78.3 · Agent & swarm ontology stays

DBOS can run agent workflows durably, but Concord owns: AgentRun, SwarmRun, AgentInvocation, AgentStep, JoinStrategy, ToolScope, ConnectorScope, MemoryScope, SpawnPolicy.

Roles

The agent runtime is an adapter. DBOS is the durable executor. Concord is the governance and domain model.

Section 79 Mapping Concord to DBOS

Concord concept	DBOS implementation
Command submission	DBOS workflow start
Idempotency key	DBOS workflow ID
Sync primitive	Normal function (or DBOS step if side-effectful)
Async primitive	DBOS background or queued workflow
Queue	DBOS queue
Retry	DBOS step/workflow retry settings
Schedule	DBOS schedule
Durable timer	DBOS durable sleep
External side effect	DBOS step
Postgres write	DBOS datasource transaction
Agent run	DBOS workflow
Subagent spawn	DBOS child/queued workflow + Concord AgentInvocation
Swarm	DBOS parent workflow coordinating child workflows
Approval wait	DBOS workflow waits on durable approval state/event
Connector call	DBOS step calling connector adapter
Artifact write	DBOS transaction step
Memory write	DBOS transaction step after policy
Domain audit	DBOS transaction step writing audit table

Section 80 New layered architecture

80.1 · Package layout

concord/
  core/
    types.py
    commands.py
    context.py
    classify.py
    policy.py
    planning.py
    transitions.py
    reducers.py
    validation.py

  domain/
    approvals.py
    memory.py
    artifacts.py
    connectors.py
    agents.py
    swarms.py
    audit.py

  dbos_runtime/
    workflows.py
    steps.py
    queues.py
    schedules.py
    datasource.py
    approval_waits.py
    swarm_workflows.py

  adapters/
    connectors/
      hotel.py
      github.py
      slack.py
      databricks.py
    agents/
      base.py
      langgraph_adapter.py
      custom_agent_adapter.py
      openai_agents_adapter.py

  postgres/
    schema.sql
    repositories.py
    projections.py

80.2 · Layer ownership

Dependencies

flowchart TB
  CORE["core/<br/>pure Python · no DBOS · no DB · no LLMs"]
  DOMAIN["domain/<br/>data contracts · repository interfaces"]
  DBOSR["dbos_runtime/<br/>only layer importing DBOS"]
  ADAPTERS["adapters/<br/>connectors · agent runtimes"]
  PG["postgres/<br/>schema · repos · projections"]
  CORE --> DBOSR
  DOMAIN --> DBOSR
  DBOSR --> ADAPTERS
  DBOSR --> PG
  style CORE fill:#F5E0D2,stroke:#D97757,stroke-width:1.5px
  style DBOSR fill:#FAF8F2,stroke:#141413
  style PG fill:#FFFFFF,stroke:#141413
  style ADAPTERS fill:#EFE6F0,stroke:#7A5560
  style DOMAIN fill:#F1F2EC,stroke:#6B7B5A

Why

Keeps Concord portable. DBOS-specific code does not leak into the semantic layer.

Section 81 Functional core standard

81.1 · Core types

All pure Concord functions return values of this shape:

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class CoreEvent:
    event_type: str
    payload: dict[str, Any]


@dataclass(frozen=True)
class CoreEffect:
    effect_type: str
    payload: dict[str, Any]
    idempotency_key: str | None = None


@dataclass(frozen=True)
class CoreResult:
    status: str
    events: list[CoreEvent] = field(default_factory=list)
    effects: list[CoreEffect] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)

Important

CoreEffect is a semantic description, not an execution queue. DBOS interprets it through workflows and steps.

Core → shell

flowchart LR
  IN[command + context + state] --> CORE[Pure Concord function]
  CORE --> OUT["CoreResult<br/>status · events · effects"]
  OUT --> DBOS[DBOS step interprets]
  DBOS --> PG[(Postgres writes)]
  DBOS --> CON[Connector calls]
  DBOS --> Q[Queue enqueues]
  style CORE fill:#F5E0D2,stroke:#D97757,stroke-width:1.5px
  style DBOS fill:#FAF8F2,stroke:#141413
  style PG fill:#FFFFFF,stroke:#141413

81.2 · Example · functional policy

def evaluate_booking_policy(command: dict, context: dict, state: dict) -> CoreResult:
    payload = command["payload"]

    required = [
        "booking_draft_id",
        "hotel_name",
        "check_in_date",
        "check_out_date",
        "total_price",
        "currency",
        "cancellation_policy_summary",
    ]
    missing = [f for f in required if not payload.get(f)]

    if missing:
        return CoreResult(
            status="waiting_for_input",
            events=[CoreEvent("policy_evaluated", {"decision": "require_more_input", "missing": missing})],
            effects=[CoreEffect(
                "user.request_input",
                {"missing_fields": missing},
                idempotency_key=f"input:{command['command_id']}",
            )],
        )

    if not payload.get("user_explicitly_approved"):
        return CoreResult(
            status="waiting_for_approval",
            events=[CoreEvent("policy_evaluated", {"decision": "require_approval"})],
            effects=[CoreEffect(
                "approval.request",
                {
                    "approval_type": "hotel_booking",
                    "command_id": command["command_id"],
                    "approver": context["user_id"],
                    "approval_packet": payload,
                },
                idempotency_key=f"approval:{command['command_id']}",
            )],
        )

    return CoreResult(
        status="allowed",
        events=[CoreEvent("policy_evaluated", {"decision": "allow"})],
    )

This function does not write Postgres, call DBOS, call hotel APIs, send notifications, mutate objects, read time, or generate random IDs. It only describes intent.

Section 82 DBOS runtime standard

82.1 · Workflow as durable shell

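from dbos import DBOS


@DBOS.workflow()
def run_command(command_id: str) -> dict:
    # *_tx, *_step, and *_workflow helpers are DBOS transactions, steps, and
    # child workflows assumed to be defined in dbos_runtime/.
    command = load_command_tx(command_id)

    policy_result = evaluate_policy_step(command_id)
    persist_core_events_step(command_id, policy_result["events"])

    if policy_result["status"] == "waiting_for_input":
        request_input_step(command_id, policy_result["effects"])
        return {"status": "waiting_for_input"}

    if policy_result["status"] == "waiting_for_approval":
        request_approval_step(command_id, policy_result["effects"])
        # Assumed to return the resolved approval; a denied or expired
        # approval must end the command instead of falling through to planning.
        approval = wait_for_approval_workflow(command_id)
        if approval["status"] != "approved":
            result = {"status": "rejected", "approval_status": approval["status"]}
            finalize_command_step(command_id, result)
            return result

    plan = create_plan_step(command_id)
    result = execute_plan_workflow(command_id, plan)

    finalize_command_step(command_id, result)
    return result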

The workflow is intentionally thin: its body branches on recorded step results and calls steps, while the business logic lives in the pure core.

82.2 · Step as side-effect boundary

Every non-deterministic operation goes inside a DBOS step or DBOS datasource transaction:

read/write Postgres

call connector

call LLM

send notification

create approval

write memory

write artifact

read current time

generate external IDs

82.3 · Datasource transaction

Use DBOS datasource transactions for Postgres writes that must not re-execute after workflow replay: insert command, insert approval, insert artifact, insert memory, insert audit event, insert agent step, insert connector invocation, update domain projection.

Section 83 DBOS queues standard

Concord defines semantic queue names; DBOS manages the mechanics (concurrency, partitioning, rate limiting).

Recommended queues

concord_default

connector_calls

agent_runs

swarm_children

notifications

scheduled_maintenance

high_risk_operations

Queue choice is a planning outcome

def choose_queue(effect: CoreEffect, context: dict) -> str:
    if effect.effect_type.startswith("connector."):
        return "connector_calls"
    if effect.effect_type.startswith("agent.run"):
        return "agent_runs"
    if effect.effect_type.startswith("agent.spawn"):
        return "swarm_children"
    if effect.effect_type.startswith("notification."):
        return "notifications"
    return "concord_default"

Section 84 DBOS schedules standard

Concord does not build a scheduler. Schedule specs are domain configuration; DBOS owns execution and backfill.

schedules:
  - name: expire_pending_approvals
    command_type: expire_approvals
    cron: "*/5 * * * *"
    queue: scheduled_maintenance

  - name: sync_connector_metadata
    command_type: sync_connector_metadata
    cron: "0 * * * *"
    queue: scheduled_maintenance

  - name: memory_decay_review
    command_type: review_stale_memory
    cron: "0 3 * * *"
    queue: scheduled_maintenance
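Since schedule specs are domain configuration, they can be validated before being handed to DBOS. The minimal sketch below checks required keys, duplicate names, and known queue names from §83. It accepts only standard five-field cron strings, matching the specs above, and does not validate the syntax of individual fields. The function name is hypothetical.

```python
KNOWN_QUEUES = {
    "concord_default", "connector_calls", "agent_runs", "swarm_children",
    "notifications", "scheduled_maintenance", "high_risk_operations",
}


def validate_schedules(schedules: list[dict]) -> list[str]:
    """Return configuration errors; an empty list means the specs can be registered."""
    errors: list[str] = []
    seen: set[str] = set()
    for spec in schedules:
        name = spec.get("name", "<unnamed>")
        if name in seen:
            errors.append(f"{name}: duplicate schedule name")
        seen.add(name)
        for key in ("command_type", "cron", "queue"):
            if not spec.get(key):
                errors.append(f"{name}: missing {key}")
        cron = spec.get("cron", "")
        if cron and len(cron.split()) != 5:
            errors.append(f"{name}: cron must have 5 fields, got {len(cron.split())}")
        if spec.get("queue") and spec["queue"] not in KNOWN_QUEUES:
            errors.append(f"{name}: unknown queue {spec['queue']!r}")
    return errors
```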

Section 85 DBOS and agentic workflows

85.1 · AgentRun as DBOS workflow

Concord AgentRun row
→ DBOS workflow: run_agent(agent_run_id)
→ DBOS steps call agent runtime
→ agent proposes commands / tool calls
→ each proposed action re-enters Concord

Agents do not directly execute external side effects.

85.2 · Agent steps as domain records

Agent steps are written to Concord tables for product-level observability and replay:

agent_started

tool_call_proposed

tool_call_allowed

tool_call_denied

subagent_spawn_requested

artifact_created

memory_candidate_created

agent_completed

These are domain trace records, not DBOS runtime logs.

85.3 · Subagent spawn as DBOS child/queued workflow

Command: spawn_subagent
→ Policy: check role / tool / connector / memory / depth limits
→ DBOS workflow creates AgentInvocation
→ DBOS enqueues child AgentRun workflow

Do not

Let the agent runtime spawn unmanaged processes or coroutines.

85.4 · Swarm as parent DBOS workflow

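from dbos import DBOS, Queue

# Semantic queue from §83; DBOS manages its concurrency and rate limits.
swarm_children_queue = Queue("swarm_children")


@DBOS.workflow()
def run_swarm(swarm_run_id: str) -> dict:
    swarm = load_swarm_tx(swarm_run_id)

    child_specs = plan_swarm_children_step(swarm_run_id)

    # Enqueue from the workflow body, not from inside a step: DBOS checkpoints
    # each enqueue, so recovery re-attaches to the same child workflows.
    # Each child spec is assumed to carry the agent_run_id of its AgentRun row.
    handles = [
        swarm_children_queue.enqueue(run_agent, child["agent_run_id"])
        for child in child_specs
    ]
    results = [h.get_result() for h in handles]

    joined = join_swarm_results_step(swarm_run_id, results)
    persist_swarm_result_step(swarm_run_id, joined)

    return joined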

Section 86 Approval workflows on DBOS

Human approvals are domain state plus workflow waiting.

Approval under DBOS

sequenceDiagram
  autonumber
  participant W as DBOS workflow
  participant POL as Concord policy
  participant DB as Postgres
  participant N as Notification step
  participant U as User
  W->>POL: evaluate
  POL-->>W: approval.request effect
  W->>DB: create approval row (tx step)
  W->>N: notify approver
  W->>W: durable wait / poll approval state
  U->>DB: approve via UI/API → command: resolve_approval
  DB-->>W: approval_granted event
  W->>W: resume

Durability

The workflow must not rely only on in-memory callbacks. Approval state must be durable in Postgres.

Section 87 Connector execution on DBOS

87.1 · Connector calls are DBOS steps

Concord records the semantic invocation; DBOS executes the step durably.

hotel_inventory.search

hotel_booking.book

github.create_pr

slack.send_message

databricks.start_job

87.2 · Connector idempotency

connector: hotel_booking
operation: book_hotel
side_effect: true
idempotency_required: true
idempotency_key_template: "book_hotel:{booking_draft_id}"
approval_required: true

Don't assume

DBOS workflow IDs and step boundaries help prevent re-execution, but external APIs still need domain idempotency keys when they support them. DBOS alone does not make third-party side effects idempotent.
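The `idempotency_key_template` above could be rendered from the command payload with the standard library's format-string parser, as in the sketch below. It refuses to build a key when a referenced field is missing, because a key like `book_hotel:None` would collide across drafts. The function name is hypothetical. How the key reaches the third-party API (for example, a request header) depends on that API.

```python
import string


def render_idempotency_key(template: str, payload: dict) -> str:
    """Fill an idempotency_key_template from the payload, refusing missing fields."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    fields = [name for _, name, _, _ in string.Formatter().parse(template) if name]
    missing = [f for f in fields if payload.get(f) in (None, "")]
    if missing:
        raise ValueError(f"cannot build idempotency key, missing: {missing}")
    return template.format(**{f: payload[f] for f in fields})
```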

87.3 · Connector invocations table

CREATE TABLE IF NOT EXISTS connector_invocations (
  connector_invocation_id UUID PRIMARY KEY,
  command_id UUID NOT NULL,
  connector_name TEXT NOT NULL,
  operation TEXT NOT NULL,
  side_effect BOOLEAN NOT NULL DEFAULT false,
  idempotency_key TEXT NULL,
  status TEXT NOT NULL,
  request_payload JSONB NOT NULL DEFAULT '{}',
  response_payload JSONB NULL,
  error TEXT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  completed_at TIMESTAMPTZ NULL
);

This is a domain / audit record — not a queue.

Section 88 Postgres schema changes

88.1 · Keep these Concord domain tables

commands

command_events

approvals

memory_records

memory_candidates

artifacts

notifications

connector_invocations

agent_runs

swarm_runs

agent_invocations

agent_steps

domain_audit_log

88.2 · Remove these runtime tables

task_queue

task_leases

worker_claims

generic_effect_outbox

retry_queue

scheduler_jobs

manual_lock_table

Unless a table has product / domain meaning, DBOS should own its runtime equivalent.

88.3 · Updated `commands` table

CREATE TABLE IF NOT EXISTS commands (
  command_id UUID PRIMARY KEY,
  command_type TEXT NOT NULL,
  requested_by TEXT NOT NULL,
  ingress TEXT NOT NULL,

  payload JSONB NOT NULL DEFAULT '{}',
  context JSONB NOT NULL DEFAULT '{}',

  status TEXT NOT NULL,
  idempotency_key TEXT NULL UNIQUE,

  dbos_workflow_id TEXT NULL UNIQUE,

  result JSONB NULL,
  error TEXT NULL,

  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  completed_at TIMESTAMPTZ NULL
);

dbos_workflow_id links the Concord command to the DBOS execution.

88.4 · `command_events`

CREATE TABLE IF NOT EXISTS command_events (
  command_event_id UUID PRIMARY KEY,
  command_id UUID NOT NULL REFERENCES commands(command_id),

  event_type TEXT NOT NULL,
  event_payload JSONB NOT NULL DEFAULT '{}',

  actor TEXT NOT NULL,
  trace_id TEXT NOT NULL,

  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

These are domain events; the table does not replace DBOS runtime history.

88.5 · `domain_effects` (optional)

Only add if product visibility into planned effects is useful.

CREATE TABLE IF NOT EXISTS domain_effects (
  domain_effect_id UUID PRIMARY KEY,
  command_id UUID NOT NULL REFERENCES commands(command_id),

  effect_type TEXT NOT NULL,
  effect_payload JSONB NOT NULL DEFAULT '{}',
  idempotency_key TEXT NULL,

  status TEXT NOT NULL DEFAULT 'planned',

  executed_by_dbos_workflow_id TEXT NULL,
  result JSONB NULL,
  error TEXT NULL,

  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  completed_at TIMESTAMPTZ NULL
);

Do not

Use domain_effects as a runtime queue. DBOS owns runtime.

Section 89 Concord state after DBOS

Concord state is a domain projection, not a runtime scheduler.

DBOS

Execution status

Is the workflow executing, completed, errored, cancelled?

Concord

Business state

Is the booking waiting for approval, confirmed, cancelled, expired?

Examples of Concord state: command.status, approval.status, artifact.status, agent_run.status, swarm_run.status, reservation.status, memory.status.

Section 90 Deterministic workflow design rules

Because DBOS workflows must be deterministic, Concord enforces these rules.

Allowed

Inside workflow body

branch on workflow input
call DBOS steps
call DBOS transaction steps
call DBOS sleep
enqueue DBOS workflows
wait for DBOS workflow handles
call pure Concord functions with deterministic inputs

Forbidden

Directly in workflow body

database reads/writes outside transaction steps
HTTP / API calls
LLM calls
random number generation
current local time
non-deterministic iteration over unordered data
uncontrolled async races
agent runtime loops without step boundaries
connector calls

Put forbidden operations inside DBOS steps.

Section 91 Functional core design rules

91.1 · Pure functions do not import DBOS

# Good
def classify_command(command: dict, context: dict) -> CoreResult:
    ...

# Avoid
from dbos import DBOS

def classify_command(...):
    DBOS.logger.info(...)

91.2 · Core functions return values only

# Good
return CoreResult(
    status="waiting_for_approval",
    events=[...],
    effects=[...],
)

# Avoid
insert_approval(...)
send_email(...)
enqueue_worker(...)

91.3 · DBOS steps interpret core results

@DBOS.step()
def interpret_core_effect(command_id: str, effect: dict) -> dict:
    ...

Section 92 Minimal DBOS workflow patterns

92.1 · Submit command

from dbos import DBOS, SetWorkflowID


def submit_command(command_type: str, payload: dict, context: dict) -> str:
    command_id = create_command_tx(command_type, payload, context)

    with SetWorkflowID(f"command:{command_id}"):
        DBOS.start_workflow(run_command, command_id)

    return command_id

92.2 · Run command

@DBOS.workflow()
def run_command(command_id: str) -> dict:
    command = load_command_tx(command_id)

    classification = classify_command_step(command_id)
    persist_core_events_step(command_id, classification["events"])

    policy = evaluate_policy_step(command_id)
    persist_core_events_step(command_id, policy["events"])

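    if policy["status"] == "waiting_for_approval":
        create_approval_step(command_id, policy["effects"])
        notify_approver_step(command_id)
        # Assumes the wait returns the approver's decision. Anything other than
        # "approved" (rejected, request_changes, expired) must not fall through
        # to planning and connector execution.
        approval = wait_for_approval_workflow(command_id)
        if approval["decision"] != "approved":
            result = {"status": "not_approved", "approval": approval}
            finalize_command_step(command_id, result)
            return result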

    plan = create_plan_step(command_id)
    persist_core_events_step(command_id, plan["events"])

    result = execute_plan_workflow(command_id, plan)

    finalize_command_step(command_id, result)
    return result

92.3 · Execute connector

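@DBOS.step()
def execute_connector_step(command_id, connector_name, operation, payload):
    # Record the attempt first so the domain audit shows it even if the call fails.
    record_connector_invocation_started_tx(command_id, connector_name, operation, payload)

    connector = connector_registry.get(connector_name)
    # A step runs at least once: if the process dies before the step completes,
    # recovery re-runs it and the external call can repeat. The payload should
    # therefore carry an idempotency key that the connector honors.
    result = connector.invoke(operation, payload)

    record_connector_invocation_completed_tx(command_id, connector_name, operation, result)
    return result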

92.4 · Write artifact

@DBOS.step()
def write_artifact_step(command_id, artifact_type, payload):
    artifact = create_artifact_tx(
        command_id=command_id,
        artifact_type=artifact_type,
        payload=payload,
    )
    append_command_event_tx(
        command_id=command_id,
        event_type="artifact_created",
        payload={"artifact_id": artifact["artifact_id"]},
    )
    return artifact

Section 93 Example · hotel reservation under DBOS

End-to-end · search + book

sequenceDiagram
    autonumber
    participant U as User
    participant C as Concord command
    participant W as DBOS workflow
    participant POL as Policy
    participant S as DBOS step
    participant CON as Hotel connector
    participant DB as Postgres
    U->>C: hotel_reservation_assist
    C->>W: start run_command
    W->>POL: classify + evaluate
    POL-->>W: route_agent (or sync)
    W->>S: search step
    S->>CON: search hotels
    CON-->>S: results
    S->>DB: write search_results artifact
    W->>U: present ranked options
    U->>C: book_hotel(draft_id)
    W->>POL: evaluate booking policy
    POL-->>W: require_approval (price + external API)
    W->>DB: create approval row
    W->>U: approval-needed notification
    U-->>DB: approve
    W->>S: book step
    S->>CON: book(draft_id, idempotency_key)
    CON-->>S: confirmation
    S->>DB: write reservation artifact + audit
    W-->>U: confirmation notification

93.1 · Why DBOS matters here

Hotel booking has external API calls, payment-sensitive side effects, approval waits, idempotency needs, retryable connector failures, durable state requirements, notification side effects, and memory candidates. DBOS handles durable execution. Concord handles what the booking means.

Section 94 Agent swarms under DBOS

94.1 · Recommended representation

Concord domain object	DBOS implementation
SwarmRun	`run_swarm` DBOS workflow
AgentRun	`run_agent` DBOS workflow
AgentStep	domain trace row in Postgres
ToolCall	Concord command executed by DBOS

94.2 · Swarm flow

Swarm under DBOS

flowchart TB
    CMD[command: run_hotel_reservation_swarm] --> W[run_swarm DBOS workflow]
    W --> COORD[create coordinator AgentRun]
    COORD --> PLAN[plan child agents]
    PLAN --> E1[enqueue child run_agent · researcher]
    PLAN --> E2[enqueue child run_agent · analyst]
    PLAN --> E3[enqueue child run_agent · reviewer]
    E1 --> J[wait for results]
    E2 --> J
    E3 --> J
    J --> JOIN[join results]
    JOIN --> ART[recommendation artifact]
    ART --> APP{"external booking?"}
    APP -->|yes| HA[approval]
    APP -->|no| DONE[done]
    HA --> DONE
    style COORD fill:#EFE6F0,stroke:#7A5560
    style E1 fill:#EFE6F0,stroke:#7A5560
    style E2 fill:#EFE6F0,stroke:#7A5560
    style E3 fill:#EFE6F0,stroke:#7A5560
    style APP fill:#F5E0D2,stroke:#D97757
    style DONE fill:#F1F2EC,stroke:#6B7B5A

94.3 · Subagent spawning rule

Agent proposes spawn_subagent
→ Concord command
→ policy checks max depth, tools, connectors, memory scope
→ DBOS workflow creates AgentInvocation
→ DBOS enqueues child AgentRun workflow

Rule

Do not spawn unmanaged agent processes.
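The policy gate in 94.3 can be sketched as a pure check. This assumes the parent's agent role carries `max_depth` and allow-lists for subagent roles, tools, connectors, and memory scopes (the fields listed for an agent role in §107, Type 08); the dict shapes and the name `check_spawn_subagent` are hypothetical. The child may only receive a subset of its parent's scope, so scope can narrow but never widen as a swarm grows.

```python
def check_spawn_subagent(parent_role: dict, parent_depth: int, request: dict) -> list[str]:
    """Return policy violations for a spawn_subagent proposal; empty means allowed.

    parent_role (hypothetical shape):
        {"max_depth": int, "subagent_roles": [...], "tools": [...],
         "connectors": [...], "memory_scopes": [...]}
    request (hypothetical shape):
        {"role": str, "tools": [...], "connectors": [...], "memory_scopes": [...]}
    The child would run at parent_depth + 1.
    """
    violations = []
    if parent_depth + 1 > parent_role["max_depth"]:
        violations.append("max_depth_exceeded")
    if request["role"] not in parent_role["subagent_roles"]:
        violations.append(f"role_not_allowed:{request['role']}")
    # Each requested scope must be a subset of the parent's scope.
    for key in ("tools", "connectors", "memory_scopes"):
        extra = sorted(set(request.get(key, [])) - set(parent_role.get(key, [])))
        violations.extend(f"{key}_not_allowed:{item}" for item in extra)
    return violations
```

If the list is non-empty, the spawn_subagent command is denied and no child AgentRun workflow is enqueued.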

Section 95 Approval waiting pattern

A human approval is durable state plus workflow waiting. Recommended first implementation:

Workflow creates approval row
Workflow durably sleeps / polls approval status
Approval UI updates approval row
Workflow resumes and continues

This is simple, Postgres-native, and consistent with the architecture. If needed, it can later evolve into an event-driven pattern that uses DBOS workflow messaging (DBOS.send / DBOS.recv) instead of polling.
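A minimal sketch of the poll loop above. It assumes a `read_status` callable (in DBOS, a transaction or step that reads the approval row) and a `sleep` callable (in DBOS, `DBOS.sleep`, which is durable, so a recovered workflow does not restart the wait from zero). Both are injected so the loop can be tested without a database. The `pending` status, the poll interval, and the function name are hypothetical; the terminal values and the 30-minute default come from the `hotel_booking_approval` manifest in §107.3.

```python
from collections.abc import Callable

# Decision values from the hotel_booking_approval manifest.
TERMINAL = {"approved", "rejected", "request_changes"}


def wait_for_approval(
    approval_id: str,
    read_status: Callable[[str], str],
    sleep: Callable[[float], None],
    poll_seconds: float = 30.0,
    expires_after_seconds: float = 30 * 60,
) -> str:
    """Poll the approval row until it reaches a decision or expires.

    Elapsed time is counted as polls * poll_seconds instead of being read
    from the clock, which keeps the loop deterministic under replay.
    """
    waited = 0.0
    while True:
        status = read_status(approval_id)
        if status in TERMINAL:
            return status
        if waited >= expires_after_seconds:
            return "expired"
        sleep(poll_seconds)
        waited += poll_seconds
```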

Section 96 What to avoid

96.1 · Avoid building a second DBOS

custom workflow runner

custom worker pool

custom durable timers

custom queue leases

custom retry scheduler

custom backfill scheduler

custom workflow recovery

96.2 · Avoid hiding DBOS behind too much abstraction

Engineers should still be able to see: this is a DBOS workflow, this is a DBOS step, this is a DBOS transaction, this is a DBOS queue, this is a Concord command, this is a Concord artifact.

96.3 · Avoid making agents privileged

Bad

agent directly calls booking connector
agent directly writes memory
agent directly sends email
agent directly spawns process

Good

agent proposes command
Concord policy evaluates
DBOS executes through durable steps
Postgres records domain events

Section 97 Responsibilities matrix

Concern	Owner
Workflow durability	DBOS
Workflow recovery	DBOS
Step replay behavior	DBOS
Queue mechanics	DBOS
Queue concurrency / rate limit	DBOS
Scheduling	DBOS
Durable sleep	DBOS
Postgres transaction tracking	DBOS datasource
Command taxonomy	Concord
Domain state	Concord
Policy decisions	Concord
Approval semantics	Concord
Memory semantics	Concord
Artifact semantics	Concord
Agent / swarm ontology	Concord
Connector contracts	Concord
Domain audit	Concord
Third-party API implementation	Connector adapters
Agent reasoning loop	Agent runtime adapter

Section 98 Implementation phases

Phases

flowchart LR
    P1["Phase 1<br/>Functional core"] --> P2["Phase 2<br/>Postgres domain schema"]
    P2 --> P3["Phase 3<br/>DBOS runtime shell"]
    P3 --> P4["Phase 4<br/>Hotel reservation agent"]
    P4 --> P5["Phase 5<br/>Swarms & subagents"]
    style P1 fill:#F5E0D2,stroke:#D97757
    style P3 fill:#FAF8F2,stroke:#141413
    style P4 fill:#F1F2EC,stroke:#6B7B5A
    style P5 fill:#EFE6F0,stroke:#7A5560

Phase 1

Functional core

Pure modules: commands, classification, policy, planning, state transitions, effect descriptors, domain event descriptors. No DBOS imports.

Phase 2

Postgres domain schema

commands, command_events, approvals, artifacts, memory_records / memory_candidates, connector_invocations, agent_runs, swarm_runs, agent_invocations, agent_steps, domain_audit_log.

Phase 3

DBOS runtime shell

run_command, run_agent, run_swarm workflows; execute_connector_step, write_artifact_step, write_memory_step, request_approval_step, notify_step.

Phase 4

Hotel reservation agent

Hotel inventory / booking connectors; search, ranking, booking draft / approval / finalization commands; reservation artifact; travel preference memory.

Phase 5

Swarms & subagents

Swarm planning, spawn_subagent command, agent run workflow, agent step recording, join strategies, scope enforcement.

Section 99 Open design decisions

Keep these explicit:

How approval wait/resume is implemented in DBOS.
Whether to use a domain_effects table for product visibility.
How much DBOS metadata to mirror into Concord domain tables.
How agent runtime adapters report intermediate steps.
How connector idempotency keys are generated and enforced.
How memory consent is represented in the UI.
How cancellation maps from Concord domain state to DBOS workflow cancellation.
How to version command payload schemas and plan schemas.
How to expose workflow status to users without leaking DBOS internals.
How to manage DBOS queues across environments.

Section 100 Updated one-sentence architecture

Concord is a functional semantic layer that turns commands into governed domain decisions, plans, and effects; DBOS is the durable Postgres-backed runtime that executes those effects through workflows, steps, queues, schedules, and transactions.

Section 101 Updated rule of thumb

Concord

When asking …

What does this task mean?
Is it allowed?
Does it need approval?
Sync, agentic, or swarm?
What memory / artifacts / audit should exist?
What connector scopes are allowed?

DBOS

When asking …

How does this run durably?
How is it retried?
How is it queued?
How is it scheduled?
How does it recover?
How do transactions avoid re-execution?
How do workers execute it?

Section 102 Source notes

This addendum relies on DBOS's documented runtime model:

Section 103 Executive summary · Domain Registry

Concord needs a first-class Domain Registry — the capability control plane that is the source of truth for every governed capability contract in the system.

The registry answers what capabilities exist, which versions are active, who owns them, what they allow, what policies apply, what connectors they touch, what memory they read/write, what artifacts they produce, which agents can use them, which workflows can invoke them, which approvals are required, which DBOS workflows execute them.

skills

tools

connectors

command_types

policies

artifact_schemas

memory_schemas

agent_roles

swarm_templates

approval_types

evaluation_suites

workflow_types

Architectural shift

A "Skill Registry" is not a standalone module. It is one domain registry inside the larger Concord Domain Registry.

The core principle holds: capabilities can describe, propose, and guide. Only Concord commands can authorize. Only DBOS workflows and steps can execute. Only Postgres records the truth.

Section 104 Problem statement

Concord is evolving from a workflow primitive layer into a governed action contract system. The need first appears as a skill registry problem — which agent skills exist, which versions are published, which agents can use them, what tools/connectors/memory/artifacts they touch.

But skills are only one part of the capability graph. A skill often depends on many other registered things:

hotel_booking_skill
  uses tool: book_hotel
    creates command_type: book_hotel
      requires policy_bundle: hotel_booking_policy
      requires approval_type: hotel_booking_approval
      produces artifact_schema: reservation_confirmation
      invokes connector_operation: hotel_booking.book_hotel
      may write memory_schema: user.travel_preferences

Without a unified Domain Registry these relationships scatter across YAML, code, prompts, policy files, connector adapters, and agent runtime configuration — making it impossible to answer impact questions like "if we retire this connector operation, which tools break?" or "if we change this artifact schema, which commands produce it?".

Without a registry

Concord cannot fully deliver its core promise — a governed contract layer for deterministic and agentic work.

Section 105 Product decision

105.1 · New core module

Concord adds a first-class module concord.registry (or concord.domain_registry) alongside the existing core, runtime, agents, connectors, and Postgres modules.

concord/
  core/
    commands/  policies/  plans/  effects/  events/  state/
  runtime/
    dbos/  workflows/  steps/  queues/  schedules/
  registry/                          ← new module
    kernel/
    skills/  tools/  connectors/  command_types/  policies/
    artifacts/  memory/  agents/  swarms/  approvals/
    evaluations/  workflows/
  agents/
    runtime_adapters/  tool_gateway/  swarms/  subagents/
  connectors/
    base/  adapters/
  postgres/
    repositories/  migrations/  projections/
  api/
    admin/  runtime/  webhooks/

105.2 · Keep registries in Concord initially

The semantics of skills, tools, connectors, memory, artifacts, policy, approvals, agents, swarms, and command types are core to Concord's contract layer. Extracting a separate framework too early would risk producing a generic catalog that doesn't understand Concord primitives.

105.3 · Future extraction path

The generic mechanics may later become a reusable library concord-registry covering versioned records, lifecycle state machine, artifact references, immutable version checks, generic binding resolution, compatibility constraints, audit event helpers, deprecation, retirement, rollback. Concord retains the domain semantics — what a skill, tool, connector operation, command type, approval type, memory schema, or agent role means.

Section 106 Design philosophy

106.1 · The Domain Registry is the capability graph

The registry models Concord's capability graph and answers impact questions across it.

Capability graph shape

flowchart TB
    AR[AgentRole] --> SK[Skill]
    SK --> TO[Tool]
    TO --> CT[CommandType]
    CT --> PB[PolicyBundle]
    CT --> AT[ApprovalType]
    CT --> AS[ArtifactSchema]
    CT --> MS[MemorySchema]
    CT --> DW[DBOS Workflow]
    TO --> CO[ConnectorOperation]
    CO --> CN[Connector]
    style CT fill:#F5E0D2,stroke:#D97757,stroke-width:1.5px
    style AR fill:#EFE6F0,stroke:#7A5560
    style DW fill:#FFFFFF,stroke:#141413
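Impact questions are reverse traversals of this graph. A minimal in-memory sketch, assuming each edge is a `(source, relationship_type, target)` tuple with names written as `type:name`. In Postgres the edges would come from `registry_relationships`, whose source is a version id rather than a name; that flattening, and the function name `impacted_by`, are assumptions for illustration. The relationship types are the ones listed in §108.2, and the edges follow the hotel_booking chain in §104.

```python
from collections import defaultdict, deque

Edge = tuple[str, str, str]  # (source, relationship_type, target), names as "type:name"

# The hotel_booking chain from §104, flattened into edges.
HOTEL_EDGES: list[Edge] = [
    ("skill:hotel_booking", "uses_tool", "tool:book_hotel"),
    ("tool:book_hotel", "creates_command_type", "command_type:book_hotel"),
    ("command_type:book_hotel", "requires_policy", "policy_bundle:hotel_booking_policy"),
    ("command_type:book_hotel", "requires_approval_type", "approval_type:hotel_booking_approval"),
    ("command_type:book_hotel", "produces_artifact_schema", "artifact_schema:reservation_confirmation"),
    ("command_type:book_hotel", "invokes_connector_operation", "connector_operation:hotel_booking.book_hotel"),
    ("command_type:book_hotel", "writes_memory_schema", "memory_schema:user.travel_preferences"),
]


def impacted_by(target: str, edges: list[Edge]) -> list[str]:
    """Everything that directly or transitively depends on `target` (reverse BFS)."""
    dependents: dict[str, set[str]] = defaultdict(set)
    for source, _relationship, dest in edges:
        dependents[dest].add(source)
    seen: set[str] = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in dependents[node]:
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)
```

For "if we retire this connector operation, which tools break?", `impacted_by("connector_operation:hotel_booking.book_hotel", HOTEL_EDGES)` returns the command type, the tool, and the skill above it.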

106.2 · Registries are semantic, not runtime execution

The registry defines what capabilities mean; DBOS defines how workflows execute durably. As a registry entry, book_hotel declares that it is high-risk, requires approval, produces reservation_confirmation, invokes hotel_booking.book_hotel, and uses the idempotency key template book_hotel:{booking_draft_id}. As the runtime, DBOS executes the durable workflow with retries, recovery, and transaction tracking.

106.3 · Published versions are immutable

Rule

Every registry object that can influence behavior must be versioned. Published versions are immutable. Allowed: create / publish / deprecate / retire a version, change a binding to point to a new version, rollback a binding. Not allowed: mutate a published manifest in place, expand connector permissions in place, remove an approval requirement in place, change a schema in place.
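One way to enforce this rule is to store a checksum of a canonical serialization of the manifest and refuse in-place edits to anything that has left draft. A minimal sketch, assuming SHA-256 over JSON with sorted keys and compact separators; the specification does not fix the checksum algorithm, and `ImmutableVersionError`, `update_manifest`, and the draft-only edit policy are hypothetical.

```python
import hashlib
import json


class ImmutableVersionError(Exception):
    """Raised when code tries to mutate a non-draft registry version."""


def manifest_checksum(manifest: dict) -> str:
    # Canonical form: key order and whitespace do not affect the checksum.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


MUTABLE_STATUSES = {"draft"}  # assumption: only drafts may be edited in place


def update_manifest(version: dict, new_manifest: dict) -> dict:
    """Return an updated copy of a version row, or raise if it is not a draft."""
    if version["status"] not in MUTABLE_STATUSES:
        raise ImmutableVersionError(
            f"{version['version']} is {version['status']}; create a new version instead"
        )
    return {**version, "manifest": new_manifest, "checksum": manifest_checksum(new_manifest)}
```

Recomputing the checksum at resolution time and comparing it with the stored value also detects a published manifest that was altered outside the registry API.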

106.4 · Specific bindings beat global bindings

Resolution applies scope-aware precedence — more specific bindings override less specific ones.

user → team → tenant → workflow_type → command_type
     → agent_role → agent_name → app → environment → global
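A minimal sketch of precedence resolution, assuming each binding is a dict shaped like a `registry_bindings` row and that the caller context maps scope names to the caller's values (for example `{"tenant": "acme"}`). Rollout strategies and ties within one scope are not handled, scopes outside the precedence list (such as `connector`) are ignored, and `resolve_binding` is a hypothetical name.

```python
from typing import Optional

# Most specific first, per the precedence order above.
PRECEDENCE = [
    "user", "team", "tenant", "workflow_type", "command_type",
    "agent_role", "agent_name", "app", "environment", "global",
]
RANK = {scope: i for i, scope in enumerate(PRECEDENCE)}


def resolve_binding(bindings: list[dict], context: dict) -> Optional[dict]:
    """Return the most specific active binding that matches the caller context."""
    matches = [
        b for b in bindings
        if b.get("status", "active") == "active"
        and b["binding_scope"] in RANK
        and (
            b["binding_scope"] == "global"
            or context.get(b["binding_scope"]) == b["binding_target"]
        )
    ]
    if not matches:
        return None
    # Lower rank means more specific, so it wins.
    return min(matches, key=lambda b: RANK[b["binding_scope"]])
```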

106.5 · Runtime manifests are minimized

Agent runtimes, workflows, and tool gateways receive only the capabilities they are allowed to use. Draft versions, retired versions, other tenants' bindings, admin metadata, raw secrets, unbound skills, unallowed connectors, unallowed memory scopes — none of these leak into runtime manifests.

Section 107 Registry types

Twelve registry types, each a typed view over the registry kernel.

Type 01

Skill

Reusable governed capability package: instructions, allowed tools / connectors / memory scopes / artifact scopes, evals, approval requirements, agent/runtime compatibility.

Type 02

Tool

Executable internal capability: input/output schema, sync/async mode, side-effect classification, DBOS execution mode, idempotency, policy + artifact + command + connector mappings.

Type 03

Connector

External system + operations: credential scopes, read/write scopes, rate limits, idempotency support, compensation behavior, approval requirements, adapter compatibility.

Type 04

Command type

Canonical contract for a governed action: payload schema, default policy bundle, execution mode, approval behavior, artifacts, memory behavior, connector operations, DBOS workflow mapping, idempotency, risk level. Likely the most important registry.

Type 05

Policy

Named policy checks and bundles: policy input contract, decision outputs, risk classifications, applicability rules, bundles, external policy-engine references, versioning. The registry decides where and how policy applies; it does not have to implement every policy engine itself.

Type 06

Artifact schema

Contract for durable outputs: schema, visibility rules, lineage requirements, retention policy, external-sharing policy, version compatibility, rendering hints.

Type 07

Memory schema

Contract for durable preferences and reusable facts: subject type, visibility, consent requirements, confidence requirements, expiration, supersession rules, retrieval rules, write policy.

Type 08

Agent role

Capability boundary for an agent: allowed skills / tools / connectors / subagent roles, max steps, max depth, memory + artifact scope, approval behavior, runtime compatibility.

Type 09

Swarm template

Governed multi-agent execution pattern: coordinator role, child roles, join strategy, spawn limits, parallelism, required artifacts and evaluations, approval gates, memory + connector scope inheritance.

Type 10

Approval type

Contract for human authorization: required fields, approver resolution, expiration, risk level, UI schema, audit requirements, resume behavior, allowed decision values.

Type 11

Evaluation suite

Contract for quality and safety checks: eval input/output schema, blocking vs advisory, applicability, version, runtime adapter.

Type 12

Workflow type

Contract for a repeatable business / application workflow: default command sequence, allowed command types, default policies, default agent roles, default skills, approval gates, artifact expectations, DBOS workflow mapping.

107.1 · Skill manifest example

name: hotel_booking
version: 1.4.2
display_name: Hotel Booking Skill
runtime:
  min_concord_version: "0.3.0"
  compatible_agent_runtimes: [concord_agent_runtime, langgraph_adapter]
lifecycle:
  owner: travel-platform
  risk_level: high
capabilities:
  - type: connector_operation
    connector: hotel_booking
    operation: book_hotel
    side_effect: true
    requires_approval: true
  - type: artifact_write
    artifact_type: reservation_confirmation
  - type: memory_write
    memory_type: user.travel_preferences
    requires_consent: true
policy:
  required_checks: [permission, connector_scope, payment_token_required,
                    cancellation_policy_disclosed, approval_required_for_booking]
tools:
  - name: book_hotel
    command_type: book_hotel
    mode: approval_gated_async
evals:
  required: [booking_terms_present, cancellation_policy_present, approval_packet_complete]

107.2 · Command type manifest example

name: book_hotel
version: 1.0.0
risk_level: high
execution:
  mode: approval_gated_async
  dbos_workflow: run_command
payload_schema:
  type: object
  required: [booking_draft_id, payment_token_ref]
policies:
  required: [permission, connector_scope, payment_token_required, approval_required_for_booking]
approval:
  required: true
  approval_type: hotel_booking_approval
artifacts:
  produces: [reservation_confirmation]
connector_operations: [hotel_booking.book_hotel]
idempotency:
  required: true
  key_template: "book_hotel:{booking_draft_id}"
compensation:
  supported: true
  command_type: cancel_reservation
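The `key_template` can be rendered from the validated payload with `str.format`, failing loudly when a referenced field is missing so a command never runs with a partial key. A minimal sketch, assuming templates reference only top-level payload fields; `render_idempotency_key` is a hypothetical name.

```python
import string


def render_idempotency_key(key_template: str, payload: dict) -> str:
    """Fill a template like "book_hotel:{booking_draft_id}" from the payload."""
    # Formatter.parse yields (literal, field_name, format_spec, conversion);
    # field_name is None for a trailing literal segment.
    fields = [name for _, name, _, _ in string.Formatter().parse(key_template) if name]
    missing = [name for name in fields if payload.get(name) in (None, "")]
    if missing:
        raise ValueError(f"idempotency key fields missing from payload: {missing}")
    return key_template.format(**payload)
```

Because the same booking draft always yields the same key, a retried DBOS step sends the same key to the connector, and a connector that honors idempotency keys books at most once.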

107.3 · Approval type manifest example

name: hotel_booking_approval
version: 1.0.0
risk_level: high
required_fields:
  - hotel_name
  - check_in_date
  - check_out_date
  - guests
  - total_price
  - currency
  - cancellation_policy_summary
  - payment_method_summary
decision_values: [approved, rejected, request_changes]
expiration:
  default_minutes: 30
ui:
  template: hotel_booking_approval_card
resume:
  on_approved: continue_workflow
  on_rejected: fail_or_replan
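Before an approval row is created, the packet can be checked against its approval type. A minimal sketch, assuming the manifest above has been parsed into a dict and that the creation time is supplied by the caller (for example, returned by a DBOS step) rather than read from the clock. `build_approval_packet` and the packet keys are hypothetical.

```python
from datetime import datetime, timedelta


def build_approval_packet(approval_type: dict, fields: dict, created_at: datetime) -> dict:
    """Validate a packet against an approval type manifest and stamp its expiry."""
    missing = [name for name in approval_type["required_fields"] if name not in fields]
    if missing:
        raise ValueError(f"approval packet missing required fields: {missing}")
    minutes = approval_type["expiration"]["default_minutes"]
    return {
        "approval_type": approval_type["name"],
        "approval_type_version": approval_type["version"],
        "fields": fields,
        "allowed_decisions": list(approval_type["decision_values"]),
        # created_at is an input, so replay computes the same expiry.
        "expires_at": created_at + timedelta(minutes=minutes),
    }
```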

Section 108 Registry kernel

The Domain Registry is built on a shared kernel that provides generic mechanics. Each typed registry is a thin view over the kernel.

108.1 · Core kernel objects

Object	Carries
`RegistryEntity`	entity_id, entity_type, name, display_name, description, owner, domain, status
`RegistryVersion`	version_id, entity_id, version (major.minor.patch), status, manifest, checksum, created_by, approved_by, lifecycle timestamps
`RegistryArtifact`	artifact_id, version_id, artifact_type, artifact_uri, checksum, metadata
`RegistryBinding`	binding_id, version_id, binding_scope, binding_target, environment, status, rollout_strategy, rollout_config
`RegistryLifecycleEvent`	lifecycle_event_id, entity_id, version_id, event_type, actor, payload, trace_id
`RegistryRelationship`	source_version_id, relationship_type, target_entity_type, target_entity_name, target_version_constraint, metadata

108.2 · Relationship types

uses_tool

creates_command_type

requires_policy

requires_approval_type

produces_artifact_schema

reads_memory_schema

writes_memory_schema

invokes_connector_operation

allows_agent_role

includes_agent_role

requires_evaluation_suite

uses_workflow_type

Section 109 Data model

Hybrid schema — generic kernel tables, with optional typed domain tables (skill_capabilities, tool_contracts, connector_operations, etc.) when query patterns warrant. This avoids per-type lifecycle duplication while still allowing domain-specific validation.

109.1 · `registry_entities`

CREATE TABLE IF NOT EXISTS registry_entities (
  entity_id UUID PRIMARY KEY,
  entity_type TEXT NOT NULL,       -- 'skill', 'tool', 'connector', 'command_type',
                                   -- 'policy', 'artifact_schema', 'memory_schema',
                                   -- 'agent_role', 'swarm_template', 'approval_type',
                                   -- 'evaluation_suite', 'workflow_type'
  name TEXT NOT NULL,
  display_name TEXT NOT NULL,
  description TEXT NULL,
  owner TEXT NOT NULL,
  domain TEXT NULL,
  status TEXT NOT NULL DEFAULT 'active',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE(entity_type, name)
);

109.2 · `registry_versions`

CREATE TABLE IF NOT EXISTS registry_versions (
  version_id UUID PRIMARY KEY,
  entity_id UUID NOT NULL REFERENCES registry_entities(entity_id),
  version TEXT NOT NULL,
  version_major INT NOT NULL,
  version_minor INT NOT NULL,
  version_patch INT NOT NULL,
  status TEXT NOT NULL DEFAULT 'draft',  -- draft | validated | pending_approval |
                                         -- approved | published | deprecated |
                                         -- retired | rejected
  risk_level TEXT NOT NULL DEFAULT 'low',
  manifest JSONB NOT NULL DEFAULT '{}',
  checksum TEXT NOT NULL,
  created_by TEXT NOT NULL,
  approved_by TEXT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  validated_at TIMESTAMPTZ NULL,
  approved_at TIMESTAMPTZ NULL,
  published_at TIMESTAMPTZ NULL,
  deprecated_at TIMESTAMPTZ NULL,
  retired_at TIMESTAMPTZ NULL,
  UNIQUE(entity_id, version)
);

109.3 · `registry_artifacts` · `registry_bindings` · `registry_lifecycle_events`

CREATE TABLE IF NOT EXISTS registry_artifacts (
  registry_artifact_id UUID PRIMARY KEY,
  version_id UUID NOT NULL REFERENCES registry_versions(version_id),
  artifact_type TEXT NOT NULL,     -- manifest_yaml | manifest_json | skill_md |
                                   -- tool_schema | connector_schema | policy_bundle |
                                   -- prompt_template | eval_suite | example_bundle |
                                   -- package_archive
  artifact_uri TEXT NOT NULL,
  checksum TEXT NOT NULL,
  metadata JSONB NOT NULL DEFAULT '{}',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS registry_bindings (
  registry_binding_id UUID PRIMARY KEY,
  version_id UUID NOT NULL REFERENCES registry_versions(version_id),
  binding_scope TEXT NOT NULL,     -- global | environment | app | tenant | team |
                                   -- user | agent_role | agent_name | workflow_type |
                                   -- command_type | connector
  binding_target TEXT NOT NULL,
  environment TEXT NOT NULL DEFAULT 'prod',
  status TEXT NOT NULL DEFAULT 'active',
  rollout_strategy TEXT NOT NULL DEFAULT 'pinned',
  rollout_config JSONB NOT NULL DEFAULT '{}',
  created_by TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS registry_lifecycle_events (
  registry_lifecycle_event_id UUID PRIMARY KEY,
  entity_id UUID NOT NULL REFERENCES registry_entities(entity_id),
  version_id UUID NULL REFERENCES registry_versions(version_id),
  event_type TEXT NOT NULL,
  actor TEXT NOT NULL,
  payload JSONB NOT NULL DEFAULT '{}',
  trace_id TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

109.4 · `registry_relationships` · `registry_evaluations` · `registry_usage`

CREATE TABLE IF NOT EXISTS registry_relationships (
  registry_relationship_id UUID PRIMARY KEY,
  source_version_id UUID NOT NULL REFERENCES registry_versions(version_id),
  relationship_type TEXT NOT NULL,
  target_entity_type TEXT NOT NULL,
  target_entity_name TEXT NOT NULL,
  target_version_constraint TEXT NULL,
  required BOOLEAN NOT NULL DEFAULT true,
  metadata JSONB NOT NULL DEFAULT '{}',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS registry_evaluations (
  registry_evaluation_id UUID PRIMARY KEY,
  version_id UUID NOT NULL REFERENCES registry_versions(version_id),
  evaluation_type TEXT NOT NULL,
  status TEXT NOT NULL,
  result JSONB NOT NULL DEFAULT '{}',
  error TEXT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  completed_at TIMESTAMPTZ NULL
);

CREATE TABLE IF NOT EXISTS registry_usage (
  registry_usage_id UUID PRIMARY KEY,
  version_id UUID NOT NULL REFERENCES registry_versions(version_id),
  agent_run_id UUID NULL,
  swarm_run_id UUID NULL,
  command_id UUID NULL,
  workflow_run_id UUID NULL,
  usage_type TEXT NOT NULL,        -- resolved_for_agent_run | tool_enabled |
                                   -- command_proposed | connector_operation_enabled |
                                   -- memory_read_enabled | memory_write_proposed |
                                   -- artifact_created | subagent_spawn_enabled |
                                   -- policy_applied | approval_type_used
  payload JSONB NOT NULL DEFAULT '{}',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

Section 110 Versioning

All behavior-affecting registry objects use semantic versioning (MAJOR.MINOR.PATCH).

Patch

Doc / cosmetic

Documentation improvements, examples added, prompt wording changes that preserve behavior, bug fixes that preserve contract.

Minor

Additive

New optional capability, new optional field, new optional eval, backward-compatible schema expansion.

Major

Breaking

Changed input or output contract, removed capability, expanded connector scope, changed memory scope, changed approval behavior, changed artifact schema incompatibly, changed side-effect semantics.
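The three rules reduce to "the largest change class wins." A minimal sketch, assuming a validator has already classified each change into a label; the labels mirror the lists above but are hypothetical identifiers. Unclassified labels raise rather than defaulting to a patch, because an unknown change might be breaking.

```python
PATCH_CHANGES = {
    "docs_improved", "example_added", "prompt_wording_preserved_behavior",
    "contract_preserving_fix",
}
MINOR_CHANGES = {
    "optional_capability_added", "optional_field_added", "optional_eval_added",
    "schema_expanded_compatibly",
}
MAJOR_CHANGES = {
    "contract_changed", "capability_removed", "connector_scope_expanded",
    "memory_scope_changed", "approval_behavior_changed",
    "artifact_schema_incompatible", "side_effect_semantics_changed",
}


def required_bump(changes: set[str]) -> str:
    unknown = changes - PATCH_CHANGES - MINOR_CHANGES - MAJOR_CHANGES
    if unknown:
        # Refuse to guess: an unclassified change might be breaking.
        raise ValueError(f"unclassified changes: {sorted(unknown)}")
    if changes & MAJOR_CHANGES:
        return "major"
    if changes & MINOR_CHANGES:
        return "minor"
    return "patch"


def next_version(current: str, changes: set[str]) -> str:
    major, minor, patch = (int(part) for part in current.split("."))
    bump = required_bump(changes)
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```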

Section 111 Lifecycle

111.1 · Standard lifecycle

create registry version
→ validate manifest
→ validate relationships
→ validate compatibility
→ run evals
→ classify risk
→ request approval if needed
→ publish
→ bind to scopes
→ monitor usage
→ deprecate
→ retire

111.2 · State machine

draft

validated

pending_approval

approved

published

deprecated

retired

rejected
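The states are listed here but the edges are not. A minimal sketch of a transition table, with edges inferred from the lifecycle in 111.1 and the rules in 111.3 and 111.4: `validated → published` only for versions that need no approval, and `published → retired` directly for emergency retirement. The exact edge set is an assumption, not part of this specification.

```python
# Assumed edges; terminal states map to an empty set.
TRANSITIONS: dict[str, set[str]] = {
    "draft": {"validated", "rejected"},
    "validated": {"pending_approval", "published", "rejected"},
    "pending_approval": {"approved", "rejected"},
    "approved": {"published"},
    "published": {"deprecated", "retired"},
    "deprecated": {"retired"},
    "retired": set(),
    "rejected": set(),
}


def transition(current: str, target: str, *, high_risk: bool = False) -> str:
    """Validate one lifecycle move and return the new status."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    # 111.3: high-risk versions must pass through explicit approval.
    if high_risk and current == "validated" and target == "published":
        raise ValueError("high-risk version requires approval before publish")
    return target
```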

111.3 · High-risk lifecycle

High-risk objects require explicit approval before publication. High-risk changes include external write added · memory write added · payment operation added · database mutation added · subagent spawning added · connector scope expanded · approval requirement removed · artifact schema changed incompatibly · policy weakened.

111.4 · Emergency retirement

Triggered by

Security vulnerability · unsafe connector behavior · approval bypass · prompt-injection vulnerability · data leakage · memory corruption · payment-related defect.

Emergency retirement blocks new resolution, disables or migrates active bindings, notifies owners, writes lifecycle events, and creates an incident audit.

Section 112 Bindings & resolution

112.1 · Binding strategies

pinned

latest_patch

latest_minor

canary

percentage

tenant_allowlist

environment_specific

manual

Production default

pinned. Avoid binding production to an unbounded latest version: a behavior-changing manifest would then reach production silently, without a new binding or review.
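The bounded strategies resolve against published versions at runtime. A minimal sketch of `pinned`, `latest_patch`, and `latest_minor`, assuming the binding stores a base version such as `1.4.0`. It also assumes that only `published` versions are eligible for the floating strategies, while a pinned binding may still resolve to a `deprecated` version (deprecated usage is tracked in §117.1) but never to a retired one. `resolve_version` is a hypothetical name.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def resolve_version(strategy: str, base: str, versions: dict[str, str]) -> str:
    """Pick a concrete version; `versions` maps version string -> lifecycle status."""
    if strategy == "pinned":
        if versions.get(base) in ("published", "deprecated"):
            return base
        raise LookupError(f"pinned version {base} is not resolvable")
    # How many leading components must match the base version.
    width = {"latest_patch": 2, "latest_minor": 1}.get(strategy)
    if width is None:
        raise ValueError(f"strategy not handled in this sketch: {strategy}")
    base_parts = parse_version(base)
    candidates = [
        parse_version(v)
        for v, status in versions.items()
        if status == "published" and parse_version(v)[:width] == base_parts[:width]
    ]
    if not candidates:
        raise LookupError(f"no published version for {strategy} from {base}")
    # Compare numerically, so 1.4.10 outranks 1.4.3 (string order would not).
    return "{}.{}.{}".format(*max(candidates))
```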

112.2 · Resolution inputs

Resolution accepts entity_type, scope, target, environment, agent_role, agent_name, user_id, team_id, tenant_id, app_id, workflow_type, command_type, and context. It applies the precedence order from §106.4.

112.3 · Resolution output

Examples: runtime skill manifest · allowed tool list · connector operation contract · command type contract · policy bundle · approval packet schema · artifact schema · memory access contract · agent role manifest · swarm template manifest.

112.4 · Runtime manifest for an AgentRun

{
  "agent_run_id": "agent_run_123",
  "agent_role": "hotel_booking_agent",
  "skills": [
    {
      "name": "hotel_booking",
      "version": "1.4.2",
      "skill_version_id": "uuid",
      "instructions_ref": "artifact://...",
      "tools": [
        { "name": "create_booking_draft", "command_type": "create_booking_draft", "mode": "async" },
        { "name": "book_hotel", "command_type": "book_hotel", "mode": "approval_gated_async" }
      ],
      "memory_scope": {
        "read":  ["user.travel_preferences"],
        "write": ["user.travel_preferences"]
      },
      "artifact_scope": {
        "write": ["booking_draft", "reservation_confirmation"]
      }
    }
  ]
}

Section 113 DBOS lifecycle workflows

Registry lifecycle runs as DBOS workflows — durable, recoverable, idempotent.

Workflow

`publish_registry_version`

load version → validate manifest → validate relationships → validate compatibility → run eval suite → classify risk → request approval if needed → publish → write lifecycle events → notify owner.

Workflow

`bind_registry_version`

load version → check status is published → validate binding scope → check compatibility → write binding → write lifecycle event → invalidate resolution cache.

Workflow

`rollback_registry_binding`

load current binding → load target prior version → validate target is usable → update binding → write lifecycle event → notify affected owners.

Workflow

`retire_registry_version`

load version → find active bindings → disable or migrate bindings → mark retired → write lifecycle event → notify owners.

Workflow

`resolve_agent_runtime_manifest`

load active bindings → apply precedence → filter by policy → validate compatibility → resolve relationships → build minimized manifest → record usage.

Section 114 Policy & governance

The Domain Registry evaluates:

Can this agent use this skill?
Can this skill expose this tool?
Can this tool create this command type?
Can this command type invoke this connector operation?
Can this command type produce this artifact?
Can this skill read / write this memory?
Can this agent spawn this subagent?
Can this workflow use this swarm template?
Does this capability require approval?
Is this version deprecated or retired?
Is this version allowed in this environment?

Registry changes are commands

Registry mutations themselves go through the Concord command contract: create_registry_entity, create_registry_version, validate_registry_version, publish_registry_version, bind_registry_version, deprecate_registry_version, retire_registry_version, rollback_registry_binding. Policy applies, audit records, approval gates trigger.

Section 115 API surface

def create_registry_entity(
    entity_type: str, name: str, display_name: str,
    owner: str, description: str | None = None, domain: str | None = None,
) -> str: ...

def create_registry_version(
    entity_type: str, name: str, version: str,
    manifest: dict, artifact_refs: list[dict], created_by: str,
) -> str: ...

def validate_registry_version(version_id: str, actor: str) -> dict: ...

def publish_registry_version(version_id: str, actor: str) -> dict: ...

def bind_registry_version(
    version_id: str, binding_scope: str, binding_target: str,
    environment: str, actor: str,
    rollout_strategy: str = "pinned", rollout_config: dict | None = None,
) -> str: ...

def resolve_agent_runtime_manifest(
    agent_run_id: str, agent_name: str, agent_role: str,
    context: dict, environment: str,
) -> dict: ...

def resolve_command_type_contract(
    command_type: str, context: dict, environment: str,
) -> dict: ...

def can_use_capability(
    source_version_id: str, relationship_type: str,
    target_entity_type: str, target_entity_name: str, context: dict,
) -> bool: ...

def rollback_registry_binding(
    binding_id: str, target_version_id: str, actor: str, reason: str,
) -> dict: ...
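To make bind, resolve and rollback concrete, here is a minimal in-memory sketch. It is not the Postgres-backed implementation: `InMemoryRegistry`, the `"<entity_type>:<name>"` entity key and the scope precedence list are hypothetical, and the `actor` / `reason` arguments of the real `rollback_registry_binding` are omitted. Resolution walks scopes from most to least specific, so the same inputs always yield the same version, and a retired version refuses to resolve.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field

# Hypothetical precedence, most specific first; the spec does not fix this order.
SCOPE_PRECEDENCE = ["agent_run", "agent_role", "tenant", "global"]


@dataclass
class Binding:
    binding_id: str
    entity: str  # "<entity_type>:<name>"
    version_id: str
    binding_scope: str
    binding_target: str
    environment: str
    history: list[str] = field(default_factory=list)  # prior version_ids


class InMemoryRegistry:
    def __init__(self) -> None:
        self.bindings: dict[str, Binding] = {}
        self.retired: set[str] = set()
        self._ids = itertools.count(1)

    def bind(self, entity: str, version_id: str, binding_scope: str,
             binding_target: str, environment: str) -> str:
        binding_id = f"b{next(self._ids)}"
        self.bindings[binding_id] = Binding(binding_id, entity, version_id,
                                            binding_scope, binding_target,
                                            environment)
        return binding_id

    def resolve(self, entity: str, context: dict, environment: str) -> str:
        """Return the version bound at the most specific matching scope.

        Assumes at most one binding per (entity, scope, target, environment).
        """
        for scope in SCOPE_PRECEDENCE:
            target = "*" if scope == "global" else context.get(scope)
            for b in self.bindings.values():
                key = (b.entity, b.binding_scope, b.binding_target, b.environment)
                if key == (entity, scope, target, environment):
                    if b.version_id in self.retired:
                        raise LookupError(f"{b.version_id} is retired")
                    return b.version_id
        raise LookupError(f"no binding for {entity} in {environment}")

    def rollback(self, binding_id: str, target_version_id: str) -> dict:
        b = self.bindings[binding_id]
        b.history.append(b.version_id)
        previous, b.version_id = b.version_id, target_version_id
        return {"binding_id": binding_id, "from": previous, "to": target_version_id}
```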

Section 116 Admin UX

The admin UI should support listing entities by type, viewing and comparing versions, viewing manifests / relationships / declared capabilities / risk level / eval results, publishing, approving high-risk versions, binding to scopes, viewing active bindings, rolling back, deprecating, retiring, viewing lifecycle timelines, viewing usage, performing impact analysis, and previewing runtime manifests.

116.1 · Critical views

Domain registry dashboard

Entity detail

Version detail

Relationship graph

Binding detail

Risk review

Eval results

Usage graph

Runtime manifest preview

Impact analysis
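Impact analysis is essentially reverse reachability over the relationship graph: starting from the changed version, collect everything that directly or transitively depends on it. A minimal breadth-first sketch, assuming edges are given as hypothetical `(dependent, dependency)` string pairs rather than rows of `registry_relationships`:

```python
from collections import deque


def impact_analysis(edges: list[tuple[str, str]], changed: str) -> list[str]:
    """Return every node that directly or transitively depends on `changed`.

    `edges` are (dependent, dependency) pairs, e.g. ("skill:triage", "tool:search").
    """
    # Invert the edges so we can walk from a dependency up to its dependents.
    dependents: dict[str, list[str]] = {}
    for dependent, dependency in edges:
        dependents.setdefault(dependency, []).append(dependent)

    seen = {changed}  # seeding with `changed` also guards against cycles
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen - {changed})
```

Sorting the result keeps the preview stable across runs, which matters if the admin UI diffs impact reports.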

Section 117 Observability

117.1 · Tracked behaviors

Versions resolved per entity type, versions used per command type, skills resolved per agent role, tools enabled per skill, connector operations invoked per tool, policies applied per command type, artifacts produced per command type, memory writes by skill, deprecated version usage, retired version resolution attempts, rollback frequency.

117.2 · Metrics

registry_version_publish_count

registry_resolution_count

registry_resolution_latency_ms

registry_eval_failure_count

registry_binding_rollback_count

deprecated_version_usage_count

retired_version_block_count

high_risk_approval_count

impact_analysis_count
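A stdlib-only sketch of how a few of these metrics might be recorded. `RegistryMetrics` is a hypothetical stand-in; a production deployment would presumably export to a real metrics backend, which the spec does not name. The context manager always increments `registry_resolution_count` and records `registry_resolution_latency_ms`, even when resolution raises.

```python
from __future__ import annotations

import time
from collections import Counter, defaultdict
from contextlib import contextmanager


class RegistryMetrics:
    """In-memory counters and latency samples keyed by metric name and labels."""

    def __init__(self) -> None:
        self.counters: Counter[tuple[str, tuple]] = Counter()
        self.latencies_ms: defaultdict[str, list[float]] = defaultdict(list)

    def inc(self, name: str, **labels: str) -> None:
        # Sorted label tuples make the key independent of keyword order.
        self.counters[(name, tuple(sorted(labels.items())))] += 1

    @contextmanager
    def time_resolution(self, entity_type: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.latencies_ms["registry_resolution_latency_ms"].append(elapsed_ms)
            self.inc("registry_resolution_count", entity_type=entity_type)
```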

Section 118 Implementation plan

Phases

Phase 1 (Registry kernel) → Phase 2 (Core domain registries) → Phase 3 (Relationship graph) → Phase 4 (Evals & rollout) → Phase 5 (Swarm-aware registry) → Phase 6 (Optional extraction)

Phase 1

Registry kernel

Kernel tables, basic lifecycle, publish, bind, resolve, rollback. Entities supported: skill · tool · connector · command_type.

Phase 2

Core domain registries

Add: policy · artifact schema · memory schema · agent role · approval type. Add relationship validation, runtime manifest generation, capability filtering, risk classification, high-risk approval workflow.

Phase 3

Relationship graph

Add: registry_relationships, impact analysis, dependency resolution, version compatibility checks, graph visualization.

Phase 4

Evals & rollout

Add: registry_evaluations, eval suites, canary rollout, compatibility checks, deprecation warnings, usage tracking.

Phase 5

Swarm-aware registry

Add: swarm template + evaluation suite + workflow type registries, subagent skill constraints, parent-child scope inheritance, join-strategy compatibility.

Phase 6

Optional extraction

If the kernel stabilizes, extract concord-registry as a reusable library. Keep Concord-specific domain semantics in Concord.
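Phase 4's canary rollout has to stay compatible with the requirement that resolution is deterministic. One common way to get that is to hash a stable key into a bucket. The sketch below assumes a hypothetical canary configuration carried in `rollout_config` (a canary version id and a percentage) and keys the bucket on the binding target plus a subject id. It uses SHA-256 rather than Python's built-in `hash()`, because `hash()` on strings is randomized per process and would break determinism across workers.

```python
import hashlib


def choose_version(binding_target: str, subject_id: str,
                   stable_version_id: str, canary_version_id: str,
                   canary_percent: int) -> str:
    """Deterministically route a fixed share of subjects to the canary version."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be in [0, 100]")
    digest = hashlib.sha256(f"{binding_target}:{subject_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in [0, 100)
    return canary_version_id if bucket < canary_percent else stable_version_id
```

The same subject always lands on the same side, so an in-flight workflow does not flip versions between retries.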

Section 119 Acceptance criteria

119.1 · Functional · v1

Create registry entity.
Create registry version.
Validate manifest.
Publish version.
Bind version to scope.
Resolve version by scope.
Resolve skills / tools / connectors / command types for an AgentRun.
Prevent retired version from resolving.
Write lifecycle events.
Rollback binding.

119.2 · Technical · v1

All registry writes are audited.
Published versions are immutable.
Runtime manifests exclude disallowed capabilities.
DBOS lifecycle workflows run idempotently.
High-risk registry changes are blocked without approval.
Resolution is deterministic.
Postgres is the source of truth.
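"Published versions are immutable" can be checked on the application side by storing a canonical serialization plus a content digest at publish time. A minimal sketch; `PublishedVersion` and `canonical_json` are hypothetical, and database-side enforcement in Postgres is out of scope here. Callers only ever receive a fresh copy of the manifest, so mutating it cannot alter the published record, and `verify()` detects tampering with the stored body.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


def canonical_json(manifest: dict) -> str:
    # sort_keys and compact separators give one string per logical manifest.
    return json.dumps(manifest, sort_keys=True, separators=(",", ":"))


@dataclass(frozen=True)
class PublishedVersion:
    version_id: str
    manifest_json: str  # stored canonical form, never a live dict
    digest: str

    @classmethod
    def publish(cls, version_id: str, manifest: dict) -> PublishedVersion:
        body = canonical_json(manifest)
        return cls(version_id, body, hashlib.sha256(body.encode()).hexdigest())

    def manifest(self) -> dict:
        # A fresh copy on every call, so callers cannot mutate the record.
        return json.loads(self.manifest_json)

    def verify(self) -> bool:
        return hashlib.sha256(self.manifest_json.encode()).hexdigest() == self.digest
```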

Section 120 Open questions

How generic should the registry kernel be in v1?
Should typed domain tables exist immediately, or are manifests + relationships sufficient at first?
How strict should manifest validation be at first?
Should SKILL.md and other large artifacts live in Git, object storage, or Postgres?
Should high-risk approval be per version or per binding?
How should runtime manifest caching work?
How should registry impact analysis be visualized?
Should deprecated versions resolve in production?
How should active DBOS workflows behave if a registry version is retired mid-run?
Should command type contracts be required before any command can run?

Section 121 Final recommendation

Build a new core Concord module concord.registry — the Concord Domain Registry, not only a Skills Registry.

121.1 · The twelve registries

Skill

Tool

Connector

Command type

Policy

Artifact schema

Memory schema

Agent role

Swarm template

Approval type

Evaluation suite

Workflow type
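The twelve registries map naturally onto a closed set of entity types that `create_registry_entity` can validate against. In this sketch the snake_case string values (other than `command_type`) and the helper `parse_entity_type` are hypothetical; the phase numbers follow the implementation plan in Section 118.

```python
from enum import Enum


class EntityType(str, Enum):
    SKILL = "skill"
    TOOL = "tool"
    CONNECTOR = "connector"
    COMMAND_TYPE = "command_type"
    POLICY = "policy"
    ARTIFACT_SCHEMA = "artifact_schema"
    MEMORY_SCHEMA = "memory_schema"
    AGENT_ROLE = "agent_role"
    SWARM_TEMPLATE = "swarm_template"
    APPROVAL_TYPE = "approval_type"
    EVALUATION_SUITE = "evaluation_suite"
    WORKFLOW_TYPE = "workflow_type"


# Phase in which each registry first appears, per the implementation plan.
INTRODUCED_IN_PHASE = {
    EntityType.SKILL: 1, EntityType.TOOL: 1,
    EntityType.CONNECTOR: 1, EntityType.COMMAND_TYPE: 1,
    EntityType.POLICY: 2, EntityType.ARTIFACT_SCHEMA: 2,
    EntityType.MEMORY_SCHEMA: 2, EntityType.AGENT_ROLE: 2,
    EntityType.APPROVAL_TYPE: 2,
    EntityType.SWARM_TEMPLATE: 5, EntityType.EVALUATION_SUITE: 5,
    EntityType.WORKFLOW_TYPE: 5,
}


def parse_entity_type(raw: str, phase: int) -> EntityType:
    """Reject unknown types and types not yet supported in the deployed phase."""
    entity_type = EntityType(raw)  # raises ValueError for unknown strings
    introduced = INTRODUCED_IN_PHASE[entity_type]
    if introduced > phase:
        raise ValueError(f"{raw} is not supported before phase {introduced}")
    return entity_type
```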

121.2 · On a shared kernel

versioned records

lifecycle state machine

artifact references

immutable published versions

generic bindings

scope resolution

relationship graph

compatibility checks

audit event helpers

rollback

deprecation

retirement
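The lifecycle state machine in the shared kernel can be expressed as an explicit transition table, with every transition emitting a lifecycle event. The spec names the operations (validate, publish, deprecate, retire) but not the states, so the state names and `VersionLifecycle` below are hypothetical. Illegal transitions raise instead of silently succeeding.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical states; (current_state, action) -> next_state.
TRANSITIONS: dict[tuple[str, str], str] = {
    ("draft", "validate"): "validated",
    ("validated", "publish"): "published",
    ("published", "deprecate"): "deprecated",
    ("published", "retire"): "retired",
    ("deprecated", "retire"): "retired",
}


@dataclass
class VersionLifecycle:
    version_id: str
    state: str = "draft"
    events: list[dict] = field(default_factory=list)

    def apply(self, action: str, actor: str) -> str:
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} a {self.state} version")
        new_state = TRANSITIONS[key]
        # Each transition produces a lifecycle event for the audit trail.
        self.events.append({"version_id": self.version_id, "action": action,
                            "from": self.state, "to": new_state, "actor": actor})
        self.state = new_state
        return new_state

    @property
    def resolvable(self) -> bool:
        # Only published or deprecated versions resolve in this sketch; whether
        # deprecated ones should resolve in production is an open question.
        return self.state in {"published", "deprecated"}
```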

121.3 · The architecture

Concord Core defines the primitive model. Concord Registry defines capability contracts. DBOS Runtime executes them durably. Postgres records the truth.

Key framing

The Domain Registry is Concord's capability graph. Skills are one node type in that graph.
