Concord
A library of contracts where every action — deterministic or agentic — moves in agreement with policy, state, and audit. A durable runtime executes; Concord declares what the work means.
Section 01 Executive summary
Concord is a library of contracts that turn any user request, system event, webhook, scheduled job, or agent action into a durable, inspectable workflow. The library declares what work means and what governance it requires; a durable runtime — DBOS by default — executes it.
Modern applications now coordinate agents, humans, tools, connectors, memory, approvals, artifacts, and durable workflows — but each piece usually has its own local model. Agent frameworks govern the agent loop. Workflow runtimes govern execution. Connector systems expose integrations. Policy engines decide rules. Observability tools record traces.
But no shared contract explains what an action means, who authorized it, what it touched, what it produced, and how it should be audited. Concord provides that missing semantic contract layer. For the full problem statement — ten concrete pains, one per chapter — see The problems Concord solves ↗.
For a longer-form treatment of what Concord is and what it isn't — including a comparison across durable runtimes, agent frameworks, BPM platforms, DevOps systems, policy engines, data DAG orchestrators, observability tools, iPaaS, and memory systems — see What Concord is & isn't ↗.
The core idea is simple:
The framework must support both:
Deterministic workflows
Fixed steps, fixed state transitions, well-defined inputs and outputs.
Agentic workflows
Dynamic tool selection, reasoning loops, external connectors, memory retrieval, human review, adaptive plans.
Postgres is always the durable source of truth. Connectors are replaceable. Execution runtimes are replaceable. Agents are callers of the primitive layer, not owners of the system of record.
Section 02 Design philosophy
Concord is built around a few strong principles.
2.1 · Durable before executable
Do not execute meaningful work before recording intent.
This makes the system replayable, observable, cancellable, and auditable.
2.2 · Commands are the center of the system
A command is the durable representation of requested intent. Examples:
The command is not the same as the execution step. The command says what is requested. Tasks say how work is executed.
2.3 · State transitions are explicit
Every meaningful workflow state change should be validated and persisted.
created → validated → waiting_for_approval → approved → queued → running → succeeded
Never infer workflow state only from logs. Logs are evidence. State is the control surface.
2.4 · Execution is replaceable
Concord does not care whether work is executed by:
Execution is an adapter. The core framework only cares about: input, context, command, task run, result, side effects, state transition, audit.
2.5 · Agents are participants, not authorities
Agentic workflows can call tools, propose plans, request approvals, and write memories. But the framework owns:
An agent can propose. The framework decides what is allowed.
2.6 · Postgres is the system of record
All durable framework state lives in Postgres. Postgres stores:
This keeps the architecture portable and inspectable.
2.7 · The primitive set should be small and stable
New capabilities should usually be modeled as combinations of primitives, not new primitives.
Figure (flow diagram): a capability that is NOT a new primitive decomposes into existing ones: schedule ingress → command → policy → async task → connector call → artifact / event writes → audit.
2.8 · Contracts and mechanics
Concord declares what work means; the durable runtime executes how it runs. Every operation that gets durable execution — commands, agent runs, swarms, connector calls, retries, schedules, cancellations, effects — carries a contract written in Concord and is executed by the runtime. The contract is the meaning: what the action is, what inputs it requires, what side effects it may produce, what error classes apply, what compensations exist, who may cancel it, what audit must be recorded. The mechanics are the execution: when it runs, how it retries, how it queues, how it sleeps, how it recovers.
This separation makes the runtime swappable. Concord's domain layer imports a DurableRuntime protocol (see §41), not the runtime implementation. The default adapter is DBOS; future Temporal or Restate adapters slot in without touching the domain. The contract is what survives across runtimes.
If a question is about meaning, it belongs in Concord. If it is about execution, it belongs in the runtime. When the line is unclear, the contract wins — write the meaning in Concord first, then describe how the runtime should honor it.
Section 03 The primitive model
Concord organizes work into ten primitive families.
Ingress
How work enters the system.
Intent
Capture what the system is being asked to do.
Policy & planning
Decide allowed/denied and produce a plan.
Execution
Actually perform the work.
Coordination
Make async / distributed work safe.
State lifecycle
Track progress with allowed transitions.
Human judgment
Approvals, overrides, escalations.
Knowledge & output
Memory, artifacts, lineage.
Connectors
Adapt to outside systems.
Observability & governance
Explain and secure the system.
Section 04 Primitive taxonomy
4.1 · Ingress primitives
Ingress is how work enters the system. Types:
Ingress should always produce either: a command, an event record, or a rejected request with audit.
Ingress should not perform expensive work directly.
4.2 · Intent primitives
Intent primitives capture what the system is being asked to do. Core object: Command. A command contains:
The command is the root object for most downstream records.
4.3 · Policy and planning primitives
Policy decides whether a command is allowed, denied, delayed, escalated, or routed. Policy outcomes:
Planning turns a policy-approved command into an execution plan. A plan contains:
4.4 · Execution primitives
Execution primitives actually do work. Types:
Execution must be tracked through task runs. A task run contains:
4.5 · Coordination primitives
Coordination makes async and distributed work safe. Types:
These primitives prevent common production failures:
- duplicate webhook processing
- double-published reports
- two workers processing the same task
- infinite retry loops
- unbounded parallelism
- overloaded connectors
4.6 · State lifecycle primitives
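Lifecycle primitives track progress. Canonical states (as enumerated by WorkflowState in Section 12): created, validated, waiting_for_input, waiting_for_approval, approved, queued, running, blocked, succeeded, failed, cancelled, expired, compensating, compensated.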
State transitions should be checked against an allowed transition table.
4.7 · Human judgment primitives
These represent human decisions. Types:
Human approval should never be only a message in Slack or email. It should be a durable record in Postgres.
4.8 · Knowledge and output primitives
These represent durable outputs or reusable knowledge. Types:
Memory is for future behavior. Artifacts are outputs of work. Examples: generated report, exported CSV, SQL query result, dashboard draft, connector sync snapshot, agent answer, approval packet, preference memory.
4.9 · Connector primitives
Connectors adapt Concord to outside systems. A connector can represent:
A connector is not the workflow owner. It is a capability provider. Connector calls should be represented as task runs or tool calls with durable status.
4.10 · Observability and governance primitives
These explain and secure the system. Types:
The audit log should answer: who requested, what was requested, what policy decision was made, what executed, what changed, what was produced, who approved, what connector was called, what failed.
Section 05 Canonical workflow shape
Every workflow follows this shape:
Optional branches:
Section 06 High-level architecture
Figure (architecture diagram): Postgres, the system of record, feeds Audit · Artifacts · Memory · Notifications.
Section 07 Layered architecture
Figure (layer diagram): the layers run in order, API → Command → Policy → Planning → Execution → State → Persistence. The Connector layer feeds Execution and the Agent layer feeds Command (dotted edges).
- API layer: receive · authenticate · normalize
- Command layer: create · validate · idempotency
- Policy layer: permissions · risk · approval
- Planning layer: sync/async · gates · steps
- Execution layer: run · retry · persist
- State layer: enforce transitions
- Persistence layer: Postgres tables · tx · locks
- Connector layer: normalize external APIs
- Agent layer: read context · propose · summarize
7.1 · API layer
Receive requests, authenticate users, normalize ingress, create commands, return immediate response. Should not run expensive jobs, perform long syncs, write memory without policy, or skip command creation.
7.2 · Command layer
Create command, validate required inputs, apply idempotency, attach context, store payload, emit audit event.
7.3 · Policy layer
Check permissions, cost, risk, data safety, approval requirements, memory consent; decide route. Declarative where possible and executable where necessary.
7.4 · Planning layer
Choose sync vs async, insert approval gates, connector steps, memory writes, artifact writes, notification steps; create execution plan. For deterministic workflows, plans can be static. For agentic workflows, plans can be proposed dynamically and validated by policy.
7.5 · Execution layer
Execute function, run task, call connector, invoke agent, call external job, persist result, handle retries. Should be side-effect-aware.
7.6 · State layer
Enforce valid transitions, persist state, prevent illegal transitions, record state history.
7.7 · Persistence layer
Postgres tables, transactions, row-level locking, idempotency constraints, query APIs, retention, archival.
7.8 · Connector layer
Normalize external APIs, handle auth, handle rate limits, return structured results, avoid leaking provider-specific details upward.
7.9 · Agent layer
Read context, retrieve memory, propose action, call permitted tools, request approval if needed, summarize result.
The agent layer should never bypass policy.
Section 08 Postgres-first persistence design
Postgres is the durable system of record. Use Postgres for:
Use object storage or external systems only for large blobs. Store references in Postgres.
Section 09 Core Postgres schema
The schema is compact, append-only where it can be, and extensible through JSONB columns. Each command links to a durable runtime workflow via dbos_workflow_id; runtime execution status lives in the runtime adapter, business status lives here.
Entity relationships
Schema
-- Commands. Carries domain status; runtime status lives in DBOS.
CREATE TABLE IF NOT EXISTS commands (
    command_id UUID PRIMARY KEY,
    command_type TEXT NOT NULL,
    requested_by TEXT NOT NULL,
    ingress TEXT NOT NULL,
    payload JSONB NOT NULL DEFAULT '{}',
    context JSONB NOT NULL DEFAULT '{}',
    status TEXT NOT NULL,
    cancellation_mode TEXT NOT NULL DEFAULT 'graceful', -- 'graceful' | 'compensate_then_stop'
    idempotency_key TEXT NULL UNIQUE,
    dbos_workflow_id TEXT NULL UNIQUE,
    result JSONB NULL,
    error TEXT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    completed_at TIMESTAMPTZ NULL
);

-- Fan-in dependencies. A child waits for all parents to reach required_status.
CREATE TABLE IF NOT EXISTS command_dependencies (
    child_command_id UUID NOT NULL REFERENCES commands(command_id),
    parent_command_id UUID NOT NULL REFERENCES commands(command_id),
    required_status TEXT NOT NULL DEFAULT 'succeeded',
    propagate_cancellation BOOLEAN NOT NULL DEFAULT true,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (child_command_id, parent_command_id)
);
CREATE INDEX IF NOT EXISTS idx_command_deps_parent ON command_dependencies(parent_command_id);

-- Domain effects. Planned-upfront, transitioned through by DBOS steps.
CREATE TABLE IF NOT EXISTS domain_effects (
    domain_effect_id UUID PRIMARY KEY,
    command_id UUID NOT NULL REFERENCES commands(command_id),
    effect_type TEXT NOT NULL,
    effect_payload JSONB NOT NULL DEFAULT '{}',
    idempotency_key TEXT NULL,
    status TEXT NOT NULL DEFAULT 'planned', -- 'planned' | 'executing' | 'succeeded' | 'failed'
    executed_by_dbos_workflow_id TEXT NULL,
    result JSONB NULL,
    error TEXT NULL,
    -- Compensation tracking
    compensates_effect_id UUID NULL REFERENCES domain_effects(domain_effect_id),
    declared_compensation TEXT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    completed_at TIMESTAMPTZ NULL
);
CREATE INDEX IF NOT EXISTS idx_domain_effects_command ON domain_effects(command_id);
CREATE INDEX IF NOT EXISTS idx_domain_effects_status ON domain_effects(status);
CREATE INDEX IF NOT EXISTS idx_domain_effects_idem ON domain_effects(idempotency_key) WHERE idempotency_key IS NOT NULL;

-- Unified domain events. Replaces command_events, domain_audit_log, agent_steps.
-- Disambiguated by purpose.
CREATE TABLE IF NOT EXISTS domain_events (
    event_id UUID PRIMARY KEY,
    command_id UUID REFERENCES commands(command_id),
    agent_run_id UUID NULL REFERENCES agent_runs(agent_run_id),
    swarm_run_id UUID NULL REFERENCES swarm_runs(swarm_run_id),
    purpose TEXT NOT NULL, -- 'event' | 'audit' | 'agent_step'
    event_type TEXT NOT NULL,
    payload JSONB NOT NULL DEFAULT '{}',
    actor TEXT NOT NULL,
    trace_id TEXT NOT NULL,
    -- Agent-step extensions; null for non-agent rows
    step_index INT NULL,
    tool_name TEXT NULL,
    latency_ms INT NULL,
    token_usage JSONB NULL,
    cost_units NUMERIC NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_domain_events_command ON domain_events(command_id);
CREATE INDEX IF NOT EXISTS idx_domain_events_purpose ON domain_events(purpose);
CREATE INDEX IF NOT EXISTS idx_domain_events_agent ON domain_events(agent_run_id) WHERE agent_run_id IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_domain_events_audit ON domain_events(command_id, created_at) WHERE purpose = 'audit';
Section 10 Queue model
Queue execution is a runtime concern. Concord declares semantic queue names; the durable runtime adapter handles the mechanics — claim, lease, concurrency, retry, and rate limiting.
The standard queue names a Concord catalog uses:
Queue choice is a planning outcome. Effect type maps to queue:
def choose_queue(effect: CoreEffect, context: dict) -> str:
    if effect.effect_type.startswith("connector."):
        return "connector_calls"
    if effect.effect_type.startswith("agent.run"):
        return "agent_runs"
    if effect.effect_type.startswith("agent.spawn"):
        return "swarm_children"
    if effect.effect_type.startswith("notification."):
        return "notifications"
    return "primitiveflow_default"
Queue registration, concurrency limits, partitioning, and per-queue rate limits are runtime configuration. The runtime adapter (default DBOSDurableRuntime) translates the queue name into its native primitive — a DBOS queue for the default adapter; a Temporal task queue or another broker for alternates.
Concord does not own a task_queue or worker_claims table. Runtime claim, lease, and worker assignment belong to the adapter and live in its tables, not Concord's. The domain projection of what work has been requested and what happened to it lives in domain_effects and domain_events.
Section 11 Idempotency model
Every externally triggered or user-submitted command should support idempotency. Examples:
If a command with the same idempotency_key already exists, return that command instead of creating another one.
Postgres enforces this with a partial unique index on the commands table:
CREATE UNIQUE INDEX idx_commands_idempotency_key
    ON commands (idempotency_key)
    WHERE idempotency_key IS NOT NULL;
Section 12 State machine
Concord enforces a domain state machine over the commands table. Runtime execution status (whether the underlying workflow is alive, recovering, or errored) lives in the runtime adapter and is joined to the command via commands.dbos_workflow_id. The two evolve independently: a runtime workflow can be "recovering" while the command is "running"; a runtime workflow can complete normally while the command sits in waiting_for_approval. The domain machine answers "what does this command mean right now?"; the runtime answers "is execution alive?".
The legal domain transitions are below. compensating and compensated are reachable via the compensate_then_stop cancellation mode (see §30).
Python standard:
from enum import Enum


class WorkflowState(str, Enum):
    CREATED = "created"
    VALIDATED = "validated"
    WAITING_FOR_INPUT = "waiting_for_input"
    WAITING_FOR_APPROVAL = "waiting_for_approval"
    APPROVED = "approved"
    QUEUED = "queued"
    RUNNING = "running"
    BLOCKED = "blocked"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"
    EXPIRED = "expired"
    COMPENSATING = "compensating"
    COMPENSATED = "compensated"


ALLOWED_TRANSITIONS = {
    WorkflowState.CREATED: {
        WorkflowState.VALIDATED,
        WorkflowState.FAILED,
        WorkflowState.CANCELLED,
    },
    WorkflowState.VALIDATED: {
        WorkflowState.WAITING_FOR_INPUT,
        WorkflowState.WAITING_FOR_APPROVAL,
        WorkflowState.QUEUED,
        WorkflowState.RUNNING,
        WorkflowState.FAILED,
        WorkflowState.CANCELLED,
    },
    WorkflowState.WAITING_FOR_INPUT: {
        WorkflowState.VALIDATED,
        WorkflowState.CANCELLED,
        WorkflowState.EXPIRED,
    },
    WorkflowState.WAITING_FOR_APPROVAL: {
        WorkflowState.APPROVED,
        WorkflowState.CANCELLED,
        WorkflowState.EXPIRED,
        WorkflowState.FAILED,
    },
    WorkflowState.APPROVED: {
        WorkflowState.QUEUED,
        WorkflowState.RUNNING,
        WorkflowState.CANCELLED,
    },
    WorkflowState.QUEUED: {
        WorkflowState.RUNNING,
        WorkflowState.CANCELLED,
        WorkflowState.FAILED,
    },
    WorkflowState.RUNNING: {
        WorkflowState.SUCCEEDED,
        WorkflowState.FAILED,
        WorkflowState.CANCELLED,
        WorkflowState.BLOCKED,
    },
    WorkflowState.BLOCKED: {
        WorkflowState.QUEUED,
        WorkflowState.RUNNING,
        WorkflowState.FAILED,
        WorkflowState.CANCELLED,
    },
    WorkflowState.FAILED: {
        WorkflowState.QUEUED,
        WorkflowState.COMPENSATING,
        WorkflowState.CANCELLED,
    },
    WorkflowState.COMPENSATING: {
        WorkflowState.COMPENSATED,
        WorkflowState.FAILED,
    },
    # Terminal states: no outgoing transitions.
    WorkflowState.SUCCEEDED: set(),
    WorkflowState.CANCELLED: set(),
    WorkflowState.EXPIRED: set(),
    WorkflowState.COMPENSATED: set(),
}


def can_transition(from_state: WorkflowState, to_state: WorkflowState) -> bool:
    return to_state in ALLOWED_TRANSITIONS.get(from_state, set())
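One way such a table might be enforced at the point of change is sketched below. The snippet is self-contained, so it repeats a deliberately small subset of the table using plain strings; `CommandState` and `IllegalTransition` are hypothetical names, and a real implementation would persist the history row in the same transaction as the status update.

```python
from dataclasses import dataclass, field

# A small subset of the transition table above, keyed by state value.
ALLOWED = {
    "created": {"validated", "failed", "cancelled"},
    "validated": {"queued", "running", "failed", "cancelled"},
    "queued": {"running", "cancelled", "failed"},
    "running": {"succeeded", "failed", "cancelled", "blocked"},
    "succeeded": set(),
}


class IllegalTransition(Exception):
    """Raised when a requested state change is not in the table."""


@dataclass
class CommandState:
    state: str = "created"
    history: list[tuple[str, str]] = field(default_factory=list)

    def transition(self, to_state: str) -> None:
        """Validate the move, then record it; illegal moves change nothing."""
        if to_state not in ALLOWED.get(self.state, set()):
            raise IllegalTransition(f"{self.state} -> {to_state}")
        self.history.append((self.state, to_state))
        self.state = to_state


cmd = CommandState()
for step in ("validated", "queued", "running", "succeeded"):
    cmd.transition(step)
```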
Section 13 Logical domain objects
13.1 · Command
A command is durable intent.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any
from uuid import uuid4


def now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()


def new_id(prefix: str) -> str:
    return f"{prefix}_{uuid4().hex}"


@dataclass
class Context:
    user_id: str
    workspace_id: str | None = None
    app_id: str | None = None
    run_as: str = "user"
    trace_id: str = field(default_factory=lambda: new_id("trace"))
    request_id: str = field(default_factory=lambda: new_id("req"))
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Command:
    command_id: str
    command_type: str
    task_name: str
    requested_by: str
    ingress_type: str
    payload: dict[str, Any]
    context: Context
    state: str = "created"
    status: str = "created"
    idempotency_key: str | None = None
    plan: dict[str, Any] | None = None
    result: dict[str, Any] | None = None
    error: str | None = None
    created_at: str = field(default_factory=now_iso)
    updated_at: str = field(default_factory=now_iso)
13.2 · TaskSpec
A task spec describes how a task maps to primitives.
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TaskSpec:
    name: str
    description: str
    command_type: str
    function_name: str
    ingress_types: list[str] = field(default_factory=list)
    required_inputs: list[str] = field(default_factory=list)
    sync_allowed: bool = False
    async_required: bool = False
    approval_required: bool = False
    memory_write_possible: bool = False
    artifact_output_possible: bool = False
    notification_required: bool = False
    policy_checks: list[str] = field(default_factory=list)
    connector_requirements: list[str] = field(default_factory=list)
    risk_level: str = "low"
    max_attempts: int = 3
    metadata: dict[str, Any] = field(default_factory=dict)
13.3 · ExecutionPlan
A plan is a validated path from intent to execution.
from dataclasses import dataclass, field


@dataclass
class PlanStep:
    step_id: str
    primitive: str
    function_name: str
    depends_on: list[str] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


@dataclass
class ExecutionPlan:
    plan_id: str
    command_id: str
    execution_mode: str
    primitives: list[str]
    steps: list[PlanStep]
13.4 · CommandDependency
A command may declare zero or more parent commands it depends on. Resolution waits until every parent reaches its required_status (typically succeeded). The runtime adapter handles the waiting via a listener over command-completion events.
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandDependency:
    child_command_id: str
    parent_command_id: str
    required_status: str = "succeeded"
    propagate_cancellation: bool = True
Cycles are rejected at insert time via a BFS over command_dependencies from child to ancestors. Cancellation cascades when propagate_cancellation = True and the required parent status becomes unreachable.
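A minimal sketch of that insert-time check, assuming the dependency edges have already been loaded into a dictionary. It starts the breadth-first walk at the proposed parent and looks for the child among its ancestors, which is one of several equivalent ways to phrase the traversal; `would_create_cycle` is a hypothetical helper name.

```python
from collections import deque


def would_create_cycle(
    edges: dict[str, set[str]], child: str, parent: str
) -> bool:
    """Return True if adding child -> parent would close a dependency cycle.

    `edges` maps each command id to the set of parents it depends on.
    """
    if child == parent:
        return True
    # Walk upward from the proposed parent; reaching `child` means the child
    # is already an ancestor of the parent, so the new edge would form a loop.
    seen = {parent}
    queue = deque([parent])
    while queue:
        current = queue.popleft()
        for ancestor in edges.get(current, set()):
            if ancestor == child:
                return True
            if ancestor not in seen:
                seen.add(ancestor)
                queue.append(ancestor)
    return False


deps = {"c": {"b"}, "b": {"a"}}  # c waits on b, b waits on a
```

With these edges, making `a` depend on `c` is rejected, while adding a fresh command `d` that depends on `c` is accepted.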
13.5 · CoreEffect
An effect is a side-effect plan — a description of what the workflow intends to do to the outside world (or to durable storage outside the workflow's own state). The functional core returns effects as part of CoreResult; they are persisted upfront as domain_effects rows in status planned and transitioned through executing → succeeded | failed as the runtime fires them.
from dataclasses import dataclass
from enum import StrEnum  # Python 3.11+
from typing import Any


class EffectStatus(StrEnum):
    PLANNED = "planned"
    EXECUTING = "executing"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass(frozen=True)
class CoreEffect:
    effect_type: str
    payload: dict[str, Any]
    idempotency_key: str | None = None
    declared_compensation: str | None = None  # name of the compensation operation, if any
    counter_effects: bool = False  # True if this effect only undoes a prior effect
The idempotency_key is the source of truth for effect-level idempotency, separate from command-level. External APIs that support idempotency keys consume this value.
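The source does not say how an effect's idempotency_key is produced. One possible convention, shown purely as an assumption, is to derive it deterministically from the command id, effect type, and payload, so that a replayed plan yields the same key and an external API can deduplicate the call. `effect_idempotency_key` is a hypothetical helper.

```python
import hashlib
import json
from typing import Any


def effect_idempotency_key(
    command_id: str, effect_type: str, payload: dict[str, Any]
) -> str:
    """Derive a stable key: same command, type, and payload give the same key."""
    # sort_keys makes the JSON canonical so dict ordering does not change the hash.
    canonical = json.dumps(
        {"command_id": command_id, "effect_type": effect_type, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


key_a = effect_idempotency_key("cmd_1", "connector.email.send", {"to": "a", "cc": "b"})
key_b = effect_idempotency_key("cmd_1", "connector.email.send", {"cc": "b", "to": "a"})
```

The two keys are equal because only key order differs; this scheme assumes the payload is JSON-serializable.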
Section 14 Primitive mapping standard
Every task should be mappable to primitives.
def map_task_to_primitives(task: TaskSpec) -> list[str]:
    primitives = [
        "ingress",
        "command",
        "context",
        "policy",
        "plan",
    ]
    if task.approval_required:
        primitives.append("human_approval")
    if task.async_required:
        primitives.extend(["queue", "async_task"])
    elif task.sync_allowed:
        primitives.append("sync_function")
    else:
        # Neither flag set: default to the async path.
        primitives.extend(["queue", "async_task"])
    if task.connector_requirements:
        primitives.append("connector_call")
    if task.artifact_output_possible:
        primitives.append("artifact_write")
    if task.memory_write_possible:
        primitives.append("memory_write")
    if task.notification_required:
        primitives.append("notification")
    primitives.extend(["state_transition", "audit"])
    return primitives
Example:
generate_report = TaskSpec(
    name="Generate Report",
    description="Generate a report asynchronously.",
    command_type="generate_report",
    function_name="generate_report",
    ingress_types=["user_request", "scheduled_trigger"],
    required_inputs=["report_type", "date_range"],
    async_required=True,
    artifact_output_possible=True,
    notification_required=True,
    policy_checks=["permission", "cost", "data_access"],
    risk_level="medium",
)

map_task_to_primitives(generate_report)
Expected output:
[
    "ingress",
    "command",
    "context",
    "policy",
    "plan",
    "queue",
    "async_task",
    "artifact_write",
    "notification",
    "state_transition",
    "audit"
]
Section 15 Deterministic workflows
A deterministic workflow has a known plan before execution.
Task spec:
generate_report_task = TaskSpec(
    name="Generate Report",
    description="Generate a report from a known report template.",
    command_type="generate_report",
    function_name="generate_report",
    ingress_types=["user_request", "scheduled_trigger"],
    required_inputs=["report_type", "date_range"],
    async_required=True,
    approval_required=False,
    artifact_output_possible=True,
    notification_required=True,
    policy_checks=["permission", "cost"],
    risk_level="medium",
)
Lifecycle:
created → validated → queued → running → succeeded
The plan is created by rules, not by an agent.
Section 16 Agentic workflows
An agentic workflow can use dynamic reasoning, but it must still operate inside the primitive framework.
The agent proposes actions. The framework authorizes and records actions.
Agent tool calls should become commands or child task runs. Example: agent wants to send an email.
Section 17 Agent action protocol
Agents should not call connectors directly. They should call the primitive gateway.
Recommended protocol (request):
{
  "action_type": "tool_call",
  "tool_name": "send_notification",
  "payload": {
    "channel": "email",
    "recipient": "finance@example.com",
    "message": "The report is ready."
  },
  "reason": "The user asked me to notify the finance team.",
  "risk_level": "medium"
}
The framework responds:
{
  "decision": "require_approval",
  "command_id": "cmd_123",
  "approval_id": "appr_456",
  "message": "Human approval required before sending external email."
}
This keeps agents safe and auditable.
Section 18 Connectors
A connector exposes capabilities to the framework. Examples:
A connector should declare: connector_id, connector_type, name, capabilities, auth mode, rate limits, risk level, input schemas, output schemas.
A connector function should not decide policy. It should execute after policy permits it.
Section 19 Connector interface standard
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class ConnectorCall:
    connector_name: str
    capability: str
    input: dict[str, Any]
    context: dict[str, Any]
    command_id: str | None = None


@dataclass
class ConnectorResult:
    ok: bool
    output: dict[str, Any]
    error: str | None = None
    metadata: dict[str, Any] | None = None


class Connector(ABC):
    name: str
    connector_type: str

    @abstractmethod
    def capabilities(self) -> list[str]:
        pass

    @abstractmethod
    def call(self, request: ConnectorCall) -> ConnectorResult:
        pass
Example connector:
class NotificationConnector(Connector):
    name = "notification"
    connector_type = "notification"

    def capabilities(self) -> list[str]:
        return ["send_app_notification", "send_email", "send_slack"]

    def call(self, request: ConnectorCall) -> ConnectorResult:
        if request.capability == "send_app_notification":
            return ConnectorResult(
                ok=True,
                output={
                    "sent": True,
                    "channel": "app",
                    "recipient": request.input["recipient"],
                },
            )
        # Only the in-app capability is implemented in this example.
        return ConnectorResult(
            ok=False,
            output={},
            error=f"Unsupported capability: {request.capability}",
        )
Section 20 Tool registry
A tool is a capability exposed to deterministic flows or agents. Tool metadata: name, description, input_schema, output_schema, execution_mode, risk_level, requires_approval, connector, function_name.
Examples:
Tools are not just Python functions. They are governed capabilities.
Section 21 Policy framework
Policies should be composable. Categories:
Policy function standard:
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PolicyResult:
    decision: str
    reasons: list[str] = field(default_factory=list)
    required_approvals: list[str] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)


def external_sharing_policy(command: Command, task: TaskSpec, context: Context) -> PolicyResult:
    destination = command.payload.get("destination", "")
    if "@" in destination or destination.startswith("external:"):
        return PolicyResult(
            decision="require_approval",
            reasons=["External sharing requires approval."],
            required_approvals=["data_owner"],
        )
    return PolicyResult(decision="allow")
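Composability can be sketched as a merge over several policy results in which the strictest decision wins. The ordering allow < require_approval < deny and the `combine` helper are assumptions of this sketch; the source only shows the "allow" and "require_approval" decisions explicitly. `PolicyResult` is repeated in reduced form so the snippet runs on its own.

```python
from dataclasses import dataclass, field

# Stricter decisions win. The ordering is an assumption of this sketch.
SEVERITY = {"allow": 0, "require_approval": 1, "deny": 2}


@dataclass
class PolicyResult:
    decision: str
    reasons: list[str] = field(default_factory=list)
    required_approvals: list[str] = field(default_factory=list)


def combine(results: list[PolicyResult]) -> PolicyResult:
    """Merge results: keep the strictest decision and every reason and approver."""
    combined = PolicyResult(decision="allow")
    for result in results:
        if SEVERITY[result.decision] > SEVERITY[combined.decision]:
            combined.decision = result.decision
        combined.reasons.extend(result.reasons)
        for approver in result.required_approvals:
            if approver not in combined.required_approvals:
                combined.required_approvals.append(approver)
    return combined


merged = combine(
    [
        PolicyResult("allow"),
        PolicyResult("require_approval", ["External sharing."], ["data_owner"]),
        PolicyResult("require_approval", ["High cost."], ["data_owner", "finance"]),
    ]
)
```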
Section 22 Approval architecture
Approvals are first-class. An approval request should include: approval_id, command_id, requested_by, approver, approval_type, request_payload, review_packet, status, expires_at, created_at, decided_at.
Approval statuses:
An approval request should include enough context for a human: what action is requested, who requested it, why it is needed, what data/artifact is affected, what policy triggered approval, what will happen after approval, risk level, expiration.
Section 23 Memory architecture
Memory is durable context that can shape future behavior. Memory should be scoped.
Scopes
Memory types
Examples
- User prefers concise executive summaries.
- Finance reports should include YoY comparison.
- Never send customer PII to external emails.
- Use the analytics warehouse for ad hoc report queries.
- Ask Alex before publishing monthly revenue reports.
Memory write rules
- High-confidence, low-risk preference — write directly if consent exists.
- Sensitive or broad memory — require approval.
- Conflicting memory — mark old memory superseded or ask human.
- Temporary memory — set expiration.
Memory retrieval should be explicit: by subject, by task type, by connector, by semantic query.
Backend
The semantics of memory — consent, scope, confidence, supersession, conflict resolution, retrieval contracts — live in Concord. The storage backend is a connector that implements a MemoryStore protocol. The default is Postgres (with optional pgvector for semantic search); alternates are connectors: PineconeMemoryStore, WeaviateMemoryStore, in-house stores.
from typing import Protocol


class MemoryStore(Protocol):
    def insert(self, memory: Memory) -> None: ...
    def get(self, memory_id: str) -> Memory | None: ...
    def search(self, scope: MemoryScope, query: str | None, limit: int) -> list[Memory]: ...
    def supersede(self, old_id: str, new_id: str) -> None: ...
    def delete(self, memory_id: str) -> None: ...
Consent, policy, and audit hooks fire in the semantic layer regardless of backend. The backend itself never sees consent — it just stores and retrieves bytes addressed by scope.
Section 24 Artifact architecture
Artifacts are durable outputs: the things the workflow produces and the user reads. They are distinct from effects (§13.5, §24 below) which are side-effect plans the workflow intends to perform on the outside world. An artifact is a report, a query result, a draft document; an effect is a publish call, an email send, a job enqueue. A workflow may produce both: an artifact (the report) and an effect (publish it). The two live in different tables — artifacts versus domain_effects — because they answer different questions.
Artifact types
Artifacts should track: location, type, version, status, metadata, lineage, creator, command_id, created_at.
Artifact statuses
Artifacts should be references, not necessarily blobs. Store large payloads elsewhere and keep pointers in Postgres.
Section 25 Audit architecture
Audit is append-only, and lives in a single domain_events table alongside business events and agent step traces. The purpose column ('event', 'audit', 'agent_step') disambiguates the role of each row. One table covers three needs: business event stream, compliance audit trail, and agent step history. Compliance queries filter by purpose = 'audit'; agent observability filters by purpose = 'agent_step' ordered by step_index; product event feeds filter by purpose = 'event'. Agent-step extension columns (step_index, tool_name, latency_ms, token_usage, cost_units) are nullable for non-agent rows.
Audit event examples
Audit events should contain: actor, command_id, trace_id, event_type, target_type, target_id, payload, timestamp.
Audit should never be the only place where state is stored. Audit explains state. State controls workflow.
Section 26 Deterministic task mapping
To map any deterministic task, answer:
- What ingress created it?
- What command does it represent?
- What context applies?
- What policy checks apply?
- Is it sync or async?
- Does it need approval?
- Does it call a connector?
- Does it create artifacts?
- Does it write memory?
- Does it notify someone?
- What state lifecycle applies?
- What audit events must exist?
Template:
task:
  name: Generate Report
  command_type: generate_report
  ingress_types:
    - user_request
    - scheduled_trigger
  required_inputs:
    - report_type
    - date_range
  execution:
    mode: async
    function_name: generate_report
  policies:
    - permission
    - cost
    - data_access
  approval:
    required: false
  outputs:
    artifacts:
      - report
    memory: []
    notifications:
      - app
  lifecycle:
    - created
    - validated
    - queued
    - running
    - succeeded
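The lifecycle in the template can be enforced with a small transition check. This is a sketch under the assumption that only the happy path listed in the template exists; a real lifecycle would also need failure and cancellation branches. `can_transition` is a hypothetical helper name.

```python
# Happy-path lifecycle copied from the template above.
LIFECYCLE = ["created", "validated", "queued", "running", "succeeded"]


def can_transition(current: str, target: str) -> bool:
    """Allow only a move to the next declared lifecycle state."""
    if current not in LIFECYCLE or target not in LIFECYCLE:
        return False
    return LIFECYCLE.index(target) == LIFECYCLE.index(current) + 1
```

Skipping states (for example `created` straight to `running`) is rejected, which keeps every intermediate state visible to audit.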
Section 27 Agentic task mapping
To map an agentic task, answer:
- What user/system goal is the agent pursuing?
- What tools may the agent call?
- What memories may the agent read?
- What memories may the agent write?
- What tool calls require policy checks?
- What actions require approval?
- Can the agent create new commands?
- Can the agent create artifacts?
- What should be audited at each loop?
- How does the loop terminate?
Template:
agent_workflow:
  name: Investigate Revenue Anomaly
  command_type: investigate_revenue_anomaly
  allowed_tools:
    - run_sql
    - retrieve_memory
    - create_report_draft
    - request_human_input
  forbidden_tools:
    - send_external_email
    - mutate_production_table
  memory:
    read_scopes:
      - user
      - team
      - project
    write_allowed: true
    write_requires_policy: true
  approval_gates:
    - publish_report
    - send_notification_external
  termination:
    conditions:
      - final_answer_created
      - human_cancelled
      - max_steps_reached
      - policy_denied
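A sketch of how the template's tool lists and termination conditions might be evaluated at each loop iteration. The function names and the `allow` / `deny` decision strings are assumptions; `require_approval` matches the decision value used in the API examples. Unknown tools are denied by default, which is a design choice of this sketch rather than something the template states.

```python
from typing import Optional

ALLOWED_TOOLS = {"run_sql", "retrieve_memory", "create_report_draft", "request_human_input"}
FORBIDDEN_TOOLS = {"send_external_email", "mutate_production_table"}
APPROVAL_GATES = {"publish_report", "send_notification_external"}


def gate_tool_call(tool_name: str) -> str:
    """Return the decision for a proposed tool call."""
    if tool_name in FORBIDDEN_TOOLS:
        return "deny"
    if tool_name in APPROVAL_GATES:
        return "require_approval"
    if tool_name in ALLOWED_TOOLS:
        return "allow"
    return "deny"  # default deny: anything not listed is rejected


def termination_reason(
    step_count: int,
    max_steps: int,
    *,
    final_answer_created: bool = False,
    human_cancelled: bool = False,
    policy_denied: bool = False,
) -> Optional[str]:
    """Return the first matching termination condition, or None to keep looping."""
    if final_answer_created:
        return "final_answer_created"
    if human_cancelled:
        return "human_cancelled"
    if policy_denied:
        return "policy_denied"
    if step_count >= max_steps:
        return "max_steps_reached"
    return None
```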
Section 28 Agent loop standard
from dataclasses import dataclass
from typing import Any

@dataclass
class AgentAction:
    action_type: str
    tool_name: str
    payload: dict[str, Any]
    reason: str
    risk_level: str = "low"

@dataclass
class AgentStepResult:
    decision: str
    command_id: str | None
    output: dict[str, Any]
    message: str

class PrimitiveGateway:
    def submit_agent_action(
        self,
        action: AgentAction,
        context: Context,
    ) -> AgentStepResult:
        """Convert an agent action into a governed command or task.

        The implementation should:
        1. Create command.
        2. Evaluate policy.
        3. Execute or request approval.
        4. Return structured result.
        """
        raise NotImplementedError
This prevents the agent from directly executing unsafe actions.
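To make the four steps concrete, here is a toy in-memory gateway. It is a sketch, not the library's implementation: `InMemoryGateway`, the plain-dict context, the `forbidden_tools` key, and the risk rule are all assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AgentAction:
    action_type: str
    tool_name: str
    payload: dict[str, Any]
    reason: str
    risk_level: str = "low"


@dataclass
class AgentStepResult:
    decision: str
    command_id: Optional[str]
    output: dict[str, Any]
    message: str


class InMemoryGateway:
    """Toy gateway: records a command, applies a rule, never runs tools itself."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.commands: dict[str, AgentAction] = {}

    def submit_agent_action(self, action: AgentAction, context: dict[str, Any]) -> AgentStepResult:
        # 1. Create command.
        command_id = f"cmd_{next(self._ids)}"
        self.commands[command_id] = action
        # 2. Evaluate policy (hypothetical rule: explicit forbidden list in context).
        if action.tool_name in context.get("forbidden_tools", ()):
            return AgentStepResult("deny", command_id, {}, "tool is forbidden")
        # 3. Request approval for anything that is not low risk.
        if action.risk_level != "low":
            return AgentStepResult("require_approval", command_id, {}, "approval needed")
        # 4. Return a structured result; execution is left to the runtime.
        return AgentStepResult("allow", command_id, {"queued": True}, "accepted")


gateway = InMemoryGateway()
result = gateway.submit_agent_action(
    AgentAction("tool_call", "publish_report", {"artifact_id": "art_456"},
                "The user asked to publish the final report.", risk_level="high"),
    context={},
)
```

The agent only ever receives an `AgentStepResult`; even a denied action leaves a command record behind for audit.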
Section 29 Sync vs async standards
Use sync when the operation is fast, low-risk, and bounded:
- operation is fast
- operation is low risk
- operation has bounded latency
- operation can return inside the request budget
- operation does not need retries beyond caller retry
Use async when the operation is slow, expensive, or multi-step:
- may take longer than request budget
- calls slow external APIs
- creates durable artifacts
- needs retries
- has approval gates
- is expensive
- has multiple steps
Examples:
- validate input — sync
- preview SQL — sync if small
- generate report — async
- sync connector — async
- send approved notification — async or sync depending on provider
- write memory — sync if low risk, approval-gated if sensitive
- agent tool call — governed sync or async depending on tool
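The criteria above can be folded into a single decision helper. This is a sketch: `OperationProfile`, `choose_mode`, and the two-second default request budget are assumptions, not part of Concord.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationProfile:
    expected_seconds: float
    needs_retries: bool = False
    has_approval_gate: bool = False
    creates_durable_artifacts: bool = False
    step_count: int = 1


def choose_mode(op: OperationProfile, request_budget_seconds: float = 2.0) -> str:
    """Return 'async' if any async criterion applies, otherwise 'sync'."""
    needs_async = (
        op.expected_seconds > request_budget_seconds
        or op.needs_retries
        or op.has_approval_gate
        or op.creates_durable_artifacts
        or op.step_count > 1
    )
    return "async" if needs_async else "sync"
```

For example, input validation (`OperationProfile(expected_seconds=0.05)`) stays sync, while report generation that creates a durable artifact goes async.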
Section 30 Cancellation standard
Every long-running workflow should be cancellable. Each command carries a cancellation_mode column with one of two values.
30.1 · graceful (default)
The workflow finishes its current step and then exits cleanly. The runtime's hard workflow.cancel() is not called; instead, each step's prelude checks command.cancellation_requested and a typed Cancelled exception routes to the workflow's exit branch. Side effects already in flight inside the current step are allowed to complete; side effects beyond it are skipped.
Transitions: running → cancelling → cancelled.
30.2 · compensate_then_stop
The workflow stops issuing new operation steps and walks the effect chain in reverse. For every domain_effects row in status succeeded with a declared compensation, the compensation is enqueued (reverse declaration order). The chain runs as a runtime sub-workflow with its own audit trail under purpose='audit'.
Transitions: running → cancelling → compensating → compensated → cancelled.
Cancellation flow (diagram): a cancellation request is first checked against the command state ("State allows?"). If the state does not allow it, the request is a no-op. Otherwise the path depends on cancellation_mode. With graceful, the current step finishes, future steps skip via the prelude check, and the command moves to cancelled. With compensate_then_stop, the command moves to compensating, the effects are walked in reverse, each compensation is enqueued, and the command moves to compensated and then to cancelled. Reaching cancelled writes a domain_events audit row.
Cancellation propagates: from a swarm to its agent runs (§64), from a parent command to dependent children where propagate_cancellation is true (§13.4), and from a child workflow upward when marked terminal-on-child-failure.
Running steps should periodically check the command's cancellation state before performing irreversible side effects. graceful only protects against new steps, not against side effects mid-flight inside the currently running one.
immediate (hard cancel without grace) and checkpoint_then_stop (run to next safe checkpoint) are reserved as future modes; both are deferred until a concrete use case forces them.
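A minimal sketch of the graceful prelude check from §30.1. The `Command` class and `run_workflow` function are simplified stand-ins for illustration; only `cancellation_requested` and the `Cancelled` exception name come from the text.

```python
class Cancelled(Exception):
    """Typed signal that routes the workflow to its exit branch."""


class Command:
    def __init__(self) -> None:
        self.cancellation_requested = False
        self.state = "running"


def run_workflow(command: Command, steps: list) -> list:
    """Run (name, callable) steps, checking for cancellation before each one."""
    completed = []
    try:
        for name, step in steps:
            if command.cancellation_requested:   # prelude check
                command.state = "cancelling"
                raise Cancelled(name)
            step()                               # the current step always finishes
            completed.append(name)
        command.state = "succeeded"
    except Cancelled:
        command.state = "cancelled"
    return completed


command = Command()


def fetch() -> None:
    # Simulates a cancellation request arriving while this step is running.
    command.cancellation_requested = True


def publish() -> None:
    raise AssertionError("must be skipped after cancellation")


completed = run_workflow(command, [("fetch", fetch), ("publish", publish)])
```

The in-flight step (`fetch`) completes and the next step is skipped, which is exactly the limitation noted above: graceful mode does not interrupt side effects inside the running step.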
Section 31 Compensation standard
Compensation is not rollback. It is a forward action that counteracts a side effect.
Examples:
- created draft — delete draft
- published artifact — unpublish artifact
- sent wrong notification — send correction
- created external ticket — close ticket
- granted access — revoke access
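The reverse walk used by compensate_then_stop (§30.2) can be sketched as a pure planning function. The row shape below is an assumed simplification of `domain_effects`, and `plan_compensations` is a hypothetical name.

```python
# Effects in declaration order; only succeeded effects with a declared
# compensation are countered.
effects = [
    {"effect_id": "e1", "name": "created_draft", "status": "succeeded", "compensation": "delete_draft"},
    {"effect_id": "e2", "name": "published_artifact", "status": "succeeded", "compensation": "unpublish_artifact"},
    {"effect_id": "e3", "name": "sent_notification", "status": "failed", "compensation": "send_correction"},
    {"effect_id": "e4", "name": "logged_metric", "status": "succeeded", "compensation": None},
]


def plan_compensations(rows: list) -> list:
    """Return compensations to enqueue, in reverse declaration order."""
    return [
        row["compensation"]
        for row in reversed(rows)
        if row["status"] == "succeeded" and row["compensation"]
    ]


plan = plan_compensations(effects)
```

Here the plan is `unpublish_artifact` followed by `delete_draft`: the failed notification and the effect without a declared compensation are skipped.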
Compensation works in three layers: a declarative manifest at registration, a graph validator that runs at startup, and a drift detector that runs at execution time.
Manifest at registration
Every operation declares the side effects it produces. Every compensation declares what it counters and what (if any) new effects it itself produces.
Graph validator at startup
On catalog load, Concord builds the effect/compensation graph and validates acyclicity, depth ≤ N, pure-counter invariants, and runtime capability matches.
Runtime drift detector
Each compensation step is wrapped to record every effect actually emitted. Drift from the declared manifest writes a compensation_drift audit row and alerts on-call.
31.1 · Manifest
Operations and their compensations carry typed declarations. Most compensations are pure-counter: their only produced effects are inverses of the parent's. Complex compensations that themselves require further compensation are allowed but must declare the cascade explicitly.
from concord.effects import operation, compensation, ExternalCall

@operation(
    produces=[ExternalCall("hotel_booking.book")],
    requires_compensation=True,
)
def book_hotel(...): ...

@compensation(
    of=book_hotel,
    produces=[ExternalCall("hotel_booking.cancel")],
    counter_effects=True,  # explicit: these are inverses, not new work
)
def cancel_hotel_reservation(...): ...
31.2 · Graph validator
At catalog load the validator rejects:
- Operations marked requires_compensation=True with no registered compensation.
- Compensation chains exceeding max_depth (default 2).
- Cycles in the effect → compensation → counter-effect graph.
- counter_effects=True declarations whose produced effects are not the inverses of the parent's.
- Catalogs requiring SAGA_COMPENSATION_NATIVE when the active runtime adapter doesn't declare that capability (see §41).
These are registration-time errors. The app refuses to start until the catalog is internally consistent and runnable against the chosen runtime.
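Three of the checks above (missing compensation, depth limit, cycles) can be sketched over a plain mapping. The data model here is an assumption: `compensation_of` maps an operation name to the name of its compensation, which may itself appear as a key when it needs further compensation. The pure-counter and runtime-capability checks are omitted.

```python
def validate_catalog(
    requires_compensation: set[str],
    compensation_of: dict[str, str],
    max_depth: int = 2,
) -> list[str]:
    """Return a list of registration-time errors; empty means the catalog passes."""
    errors = []
    for op in sorted(requires_compensation):
        if op not in compensation_of:
            errors.append(f"{op}: requires compensation but none is registered")
    for op in sorted(compensation_of):
        seen = [op]
        current = op
        while current in compensation_of:
            current = compensation_of[current]
            if current in seen:
                errors.append(f"{op}: cycle in compensation chain")
                break
            seen.append(current)
        else:
            # Loop ended without finding a cycle: check the chain length.
            depth = len(seen) - 1
            if depth > max_depth:
                errors.append(f"{op}: chain depth {depth} exceeds max_depth {max_depth}")
    return errors
```

An application would refuse to start when the returned list is non-empty, for example `validate_catalog({"book_hotel"}, {})` reports the missing compensation.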
31.3 · Drift detector
At runtime, each compensation step is wrapped. Effects actually emitted are compared against the declared manifest; any drift writes an audit row.
@runtime.step(**compile_policy("compensation"))
def run_compensation_step(effect_id: str) -> dict:
    declared = load_manifest_for(effect_id)
    with concord.effects.intercept() as recorder:
        result = execute_compensation(effect_id)
    drift = recorder.emitted - declared.expected
    if drift:
        write_domain_event(
            event_type="compensation_drift",
            purpose="audit",
            payload={"effect_id": effect_id, "drift": list(drift)},
        )
    return result
The detector is what catches incomplete manifests, conditional side effects the declaration didn't anticipate, and genuine implementation bugs. Together with the manifest and the validator, it is good enough to call a compensation contract honest — every drift is recorded, named, and auditable.
The default DBOS adapter runs compensation chains as Concord-orchestrated sub-workflows. This is correct but weaker than native saga atomicity: if a compensation fails after its retries exhaust, the chain halts at that point and a drift row records the partial completion. Compensation-heavy domains (financial transactions, multi-leg bookings, regulated workflows) should choose a runtime that declares SAGA_COMPENSATION_NATIVE (see §41) — Temporal is the natural fit.
Section 32 Error taxonomy
Failures should be classified.
This classification drives retry behavior.
Transient by nature
- transient_connector_error
- rate_limited
- timeout
- temporary_database_error
Logical failures
- validation_error
- policy_denied
- approval_rejected
- permission_denied
- malformed_payload
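A sketch of the classification as a lookup. §33 treats transient_connector_error as retryable and validation_error as non-retryable; extending that to the whole of each group is an assumption of this sketch, as is the `is_retryable` name.

```python
# Error classes copied from the two groups above.
RETRYABLE = frozenset({
    "transient_connector_error",
    "rate_limited",
    "timeout",
    "temporary_database_error",
})
NON_RETRYABLE = frozenset({
    "validation_error",
    "policy_denied",
    "approval_rejected",
    "permission_denied",
    "malformed_payload",
})


def is_retryable(error_class: str) -> bool:
    """Classify an error class; unknown classes are rejected rather than guessed."""
    if error_class in RETRYABLE:
        return True
    if error_class in NON_RETRYABLE:
        return False
    raise ValueError(f"unclassified error class: {error_class}")
```

Raising on unknown classes forces every new failure mode to be classified explicitly before it can influence retry behavior.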
Section 33 Retry and backoff
Retry mechanics belong to the runtime; the retry contract belongs to Concord. Concord declares, per operation, which error classes are retryable and how aggressively. The runtime adapter receives this as ordinary configuration.
The contract carries: attempt, max_attempts, run_after, last_error, error_class. Each operation has a registered RetryPolicy:
from dataclasses import dataclass, field

@dataclass(frozen=True)
class RetryPolicy:
    operation: str
    retryable: frozenset[ErrorClass]
    max_attempts: int = 3
    backoff_seconds: list[int] = field(default_factory=lambda: [30, 120, 600])
    requires_idempotency_key: bool = False
A single compile step translates the contract into runtime step kwargs. This is the only sanctioned way a connector step gets its retry configuration; no ad-hoc retry numbers in decorators.
from concord.retry import compile_policy
@runtime.step(**compile_policy("hotel_booking.book"))
def book_hotel_step(...): ...
Inside the step, the classifier from §32 runs before exceptions propagate. A validation_error raised inside book_hotel_step is converted to a non-retryable exception class regardless of max_attempts; a transient_connector_error raises a class the runtime knows to retry. Test the rule, not just the policy: a step that emits a non-retryable class must never retry, irrespective of decorator config.
A typical backoff schedule:
attempt 1 → retry after 30 seconds
attempt 2 → retry after 2 minutes
attempt 3 → retry after 10 minutes
attempt 4 → fail permanently
Retries should be idempotent. If a side effect may have happened, the retry should check before repeating it. Effect-level idempotency keys (on domain_effects) are the source of truth for external APIs that support them; the runtime's step-level idempotency only protects against re-execution within a workflow.
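The schedule above can be turned into a `run_after` computation. This is a sketch assuming 1-based attempt numbers and the 30 / 120 / 600 second schedule; `next_run_after` is a hypothetical helper, and returning `None` stands for "fail permanently".

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

BACKOFF_SECONDS = [30, 120, 600]


def next_run_after(attempt: int, now: datetime) -> Optional[datetime]:
    """Return when the failed attempt should be retried, or None if exhausted."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if attempt > len(BACKOFF_SECONDS):
        return None  # schedule exhausted: fail permanently
    return now + timedelta(seconds=BACKOFF_SECONDS[attempt - 1])


now = datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc)
first_retry = next_run_after(1, now)
```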
Section 34 Connector safety standards
Every connector call should record: connector name, capability, input hash or redacted input, output summary, status, latency, error, command_id, task_run_id, trace_id.
Never store raw secrets in connector configs.
Connector config should reference secrets, not contain them:
{
"auth_mode": "oauth",
"secret_ref": "vault://github/app/token",
"scopes": ["repo:read", "issues:write"]
}
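A registration-time check can reject configs that embed secrets directly. This is a sketch: the list of suspect key names is an assumption and would need tuning for real connectors; `find_raw_secrets` is a hypothetical helper.

```python
# Key names that suggest a raw secret value (assumed list, not exhaustive).
SUSPECT_KEYS = {"password", "token", "api_key", "client_secret", "secret"}


def find_raw_secrets(config: dict, path: str = "") -> list:
    """Return dotted paths of keys that look like inline secrets."""
    hits = []
    for key, value in config.items():
        here = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            hits.extend(find_raw_secrets(value, here))
        elif key.lower() in SUSPECT_KEYS:
            hits.append(here)
    return hits


good = {
    "auth_mode": "oauth",
    "secret_ref": "vault://github/app/token",
    "scopes": ["repo:read", "issues:write"],
}
bad = {"auth_mode": "basic", "auth": {"token": "abc123"}}
```

`secret_ref` passes because it holds a pointer, while `auth.token` in the second config would be flagged.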
Section 35 Postgres transaction boundaries
Recommended transaction boundaries:
Command creation
Insert command · insert audit event · commit.
Policy and planning
Update command state · insert policy decisions · update plan · insert audit · commit.
Enqueue
Update command state to queued · insert task run · insert audit · commit.
Worker completion
Update task run result · update command result/state · insert artifacts/memories · insert audit · commit.
Avoid holding transactions while calling external APIs.
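The boundary rule can be sketched as follows, with `sqlite3` standing in for Postgres and a two-table schema assumed for brevity. The external call happens with no transaction open; the state change and its audit row then commit together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE commands (command_id TEXT PRIMARY KEY, state TEXT NOT NULL);
    CREATE TABLE audit (
        id INTEGER PRIMARY KEY,
        command_id TEXT NOT NULL,
        event_type TEXT NOT NULL
    );
    """
)


def create_command(command_id: str) -> None:
    # One transaction: insert command, insert audit event, commit.
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO commands VALUES (?, 'created')", (command_id,))
        conn.execute(
            "INSERT INTO audit (command_id, event_type) VALUES (?, 'command_created')",
            (command_id,),
        )


def call_external_api() -> dict:
    return {"ok": True}  # stand-in for a slow connector call


def complete_command(command_id: str) -> dict:
    result = call_external_api()  # no transaction is held during the call
    with conn:
        conn.execute(
            "UPDATE commands SET state = 'succeeded' WHERE command_id = ?",
            (command_id,),
        )
        conn.execute(
            "INSERT INTO audit (command_id, event_type) VALUES (?, 'command_succeeded')",
            (command_id,),
        )
    return result


create_command("cmd_1")
complete_command("cmd_1")
```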
Section 36 API standards
Submit command
POST /commands
Request:
{
"task_name": "Generate Report",
"payload": {
"report_type": "monthly_revenue",
"date_range": "2026-05"
},
"idempotency_key": "generate_report:monthly_revenue:2026-05"
}
Response:
{
"command_id": "cmd_123",
"state": "queued",
"status": "queued",
"trace_id": "trace_abc"
}
Get command
GET /commands/{command_id}
Response:
{
"command_id": "cmd_123",
"command_type": "generate_report",
"state": "succeeded",
"result": {
"artifact_id": "art_456"
}
}
Resolve approval
POST /approvals/{approval_id}/resolve
Request:
{
"decision": "approved",
"reason": "Looks good."
}
Agent action
POST /agent-actions
Request:
{
"tool_name": "publish_report",
"payload": {
"artifact_id": "art_456",
"destination": "external:finance@example.com"
},
"reason": "The user asked to publish the final report."
}
Response:
{
"decision": "require_approval",
"command_id": "cmd_789",
"approval_id": "appr_123"
}
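One way the `idempotency_key` on POST /commands might behave is sketched below. The dedup semantics (a repeated key returns the original command instead of creating a new one) are an assumption, as are the `InMemoryCommandStore` class and the id format.

```python
import itertools
from typing import Optional


class InMemoryCommandStore:
    """Toy stand-in for the command table, keyed by idempotency key."""

    def __init__(self) -> None:
        self._by_key: dict = {}
        self._ids = itertools.count(123)

    def submit(self, task_name: str, payload: dict, idempotency_key: Optional[str] = None) -> dict:
        # A repeated key returns the earlier response unchanged.
        if idempotency_key is not None and idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        response = {"command_id": f"cmd_{next(self._ids)}", "state": "queued"}
        if idempotency_key is not None:
            self._by_key[idempotency_key] = response
        return response


store = InMemoryCommandStore()
key = "generate_report:monthly_revenue:2026-05"
first = store.submit("Generate Report", {"report_type": "monthly_revenue"}, key)
second = store.submit("Generate Report", {"report_type": "monthly_revenue"}, key)
```

Both calls resolve to the same `command_id`, so a client retry after a network timeout does not generate the report twice.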
Section 37 Example · deterministic flow
A full Concord design doc for a deterministic vendor-data sync (scheduled trigger, retry taxonomy, idempotent upsert, no human approvals) is available as a worked example: vendor-data-sync · Concord design ↗
User asks: Generate the May revenue report.
State: created → validated → queued → running → succeeded
Section 38 Example · approval-gated flow
A full Concord design doc for a hotel reservation flow (approval gate at >$500, vendor connector with idempotency, compensate_then_stop cancellation with vendor cancel as compensation) is available as a worked example: hotel-booking · Concord design ↗
User asks: Publish the May revenue report to finance@example.com.
State: created → validated → waiting_for_approval → approved → queued → running → succeeded
Section 39 Example · agentic flow
A full Concord design doc for an agentic revenue-anomaly investigation (coordinator + optional analyst sub-agent, governed SQL tool, memory with consent gate, approval before external sharing, read-only) is available as a worked example: revenue-investigation-swarm · Concord design ↗
User asks: Investigate why revenue dropped last week and draft a summary.
State: created → validated → running → succeeded (or escalates to approval if external action proposed).
Section 40 Framework module structure
Recommended code layout:
concord/
  __init__.py
  core/
    models.py
    states.py
    errors.py
    ids.py
  persistence/
    postgres.py
    migrations/
      001_core.sql
  registry/
    tasks.py
    tools.py
    connectors.py
    policies.py
  engine/
    command_service.py
    policy_engine.py
    planner.py
    executor.py
    worker.py
    approvals.py
    memory.py
    artifacts.py
    audit.py
  connectors/
    base.py
    postgres.py
    http.py
    notification.py
    databricks.py
    github.py
  agents/
    gateway.py
    protocols.py
    memory_context.py
  api/
    routes.py
    schemas.py
  examples/
    deterministic_report.py
    approval_flow.py
    agentic_investigation.py
Section 41 Minimal service interfaces
CommandService
class CommandService:
    def submit(
        self,
        task_name: str,
        payload: dict,
        context: Context,
        idempotency_key: str | None = None,
    ) -> dict:
        raise NotImplementedError

    def get(self, command_id: str) -> dict:
        raise NotImplementedError

    def cancel(self, command_id: str, actor: str, reason: str | None = None) -> dict:
        raise NotImplementedError
PolicyEngine
class PolicyEngine:
    def evaluate(
        self,
        command: Command,
        task: TaskSpec,
        context: Context,
    ) -> PolicyResult:
        raise NotImplementedError
Planner
class Planner:
    def create_plan(
        self,
        command: Command,
        task: TaskSpec,
        policy_result: PolicyResult,
    ) -> ExecutionPlan:
        raise NotImplementedError
Executor
class Executor:
    def execute_sync(self, command: Command, plan: ExecutionPlan) -> dict:
        raise NotImplementedError

    def enqueue_async(self, command: Command, plan: ExecutionPlan) -> dict:
        raise NotImplementedError
Worker
class Worker:
    def claim_next(self) -> dict | None:
        raise NotImplementedError

    def run_once(self) -> dict | None:
        raise NotImplementedError
DurableRuntime
The durable runtime is a protocol, not an implementation. Concord's domain layer imports the protocol; the runtime is supplied at app startup. The default implementation wraps DBOS; future adapters wrap Temporal, Restate, or in-house equivalents. Each adapter publishes a capabilities set so the catalog can be validated against the runtime at registration time.
from enum import StrEnum
from typing import ClassVar, Protocol

class RuntimeCapability(StrEnum):
    DURABLE_WORKFLOWS = "durable_workflows"
    DURABLE_STEPS = "durable_steps"
    QUEUES = "queues"
    SCHEDULES = "schedules"
    SIGNALS = "signals"
    SUBWORKFLOWS = "subworkflows"
    EFFECT_INTERCEPTION = "effect_interception"
    SAGA_COMPENSATION_NATIVE = "saga_compensation_native"
    WORKFLOW_VERSIONING = "workflow_versioning"

class DurableRuntime(Protocol):
    capabilities: ClassVar[frozenset[RuntimeCapability]]

    def submit_workflow(self, spec: WorkflowSpec) -> WorkflowHandle: ...
    def wait_for_result(self, handle: WorkflowHandle) -> WorkflowResult: ...
    def cancel(self, handle: WorkflowHandle, mode: CancellationMode) -> None: ...
    def enqueue_step(self, queue_name: str, step_spec: StepSpec) -> StepHandle: ...
    def schedule(self, schedule_spec: ScheduleSpec) -> ScheduleHandle: ...
    def signal(self, handle: WorkflowHandle, signal_name: str, payload: dict) -> None: ...
Adapter capability matrix:
| Capability | DBOS | Temporal | Notes |
|---|---|---|---|
| DURABLE_WORKFLOWS | ✓ | ✓ | Table stakes. |
| DURABLE_STEPS | ✓ | ✓ (activities) | |
| QUEUES | ✓ | ✓ (task queues) | |
| SCHEDULES | ✓ | ✓ (cron schedules) | |
| SIGNALS | partial | ✓ (signals + queries) | DBOS approval-wait pattern; Temporal more general. |
| SUBWORKFLOWS | ✓ | ✓ (child workflows) | |
| EFFECT_INTERCEPTION | ✓ | ✓ | Concord wraps every step; adapter just needs hook points. |
| SAGA_COMPENSATION_NATIVE | ✗ | ✓ | DBOS runs chains as Concord-orchestrated sub-workflows; see §31. |
| WORKFLOW_VERSIONING | ✗ | ✓ | Adopt Temporal when versioning becomes load-bearing. |
At catalog load, Concord checks the union of capabilities required by registered operations against the active adapter's capabilities. Missing capabilities surface as startup errors — never as first-failure runtime errors.
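The startup check reduces to a set difference. This is a sketch: the capability enum is a reduced copy of the one above (written with a `str` mixin instead of `StrEnum` so it runs on older Python versions), and `missing_capabilities`, the operation names, and the adapter set are illustrative assumptions.

```python
from enum import Enum


class RuntimeCapability(str, Enum):
    DURABLE_WORKFLOWS = "durable_workflows"
    QUEUES = "queues"
    SUBWORKFLOWS = "subworkflows"
    SAGA_COMPENSATION_NATIVE = "saga_compensation_native"


def missing_capabilities(required_by_operation: dict, adapter_capabilities: frozenset) -> set:
    """Union every operation's requirements and subtract what the adapter offers."""
    required = set()
    for capabilities in required_by_operation.values():
        required |= set(capabilities)
    return required - adapter_capabilities


# Hypothetical adapter that lacks native saga compensation.
adapter = frozenset({
    RuntimeCapability.DURABLE_WORKFLOWS,
    RuntimeCapability.QUEUES,
    RuntimeCapability.SUBWORKFLOWS,
})
catalog = {
    "generate_report": {RuntimeCapability.DURABLE_WORKFLOWS, RuntimeCapability.QUEUES},
    "book_trip": {RuntimeCapability.SAGA_COMPENSATION_NATIVE},
}
missing = missing_capabilities(catalog, adapter)
```

A non-empty result would be raised as a startup error, naming the capabilities the chosen adapter cannot provide.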
MemoryStore
class MemoryStore(Protocol):
    def insert(self, memory: Memory) -> None: ...
    def get(self, memory_id: str) -> Memory | None: ...
    def search(self, scope: MemoryScope, query: str | None, limit: int) -> list[Memory]: ...
    def supersede(self, old_id: str, new_id: str) -> None: ...
    def delete(self, memory_id: str) -> None: ...
Section 42 Extensibility model
The framework should be extensible through registries.
To add a new capability:
- Register connector if needed.
- Register tool/function.
- Register task spec.
- Register policies.
- Define output artifacts/memory/notifications.
- Add tests.
No core engine changes should be required for most new capabilities.
Section 43 Future connector model
A future connector should be able to expose capabilities like:
connector:
  name: github
  type: github
  capabilities:
    - search_issues
    - create_issue
    - comment_on_pr
  auth:
    mode: oauth
    scopes:
      - issues:read
      - issues:write
  rate_limits:
    requests_per_minute: 60
Tool:
tool:
  name: create_github_issue
  connector: github
  capability: create_issue
  execution_mode: async
  risk_level: medium
  requires_approval: false
  input_schema:
    type: object
    required:
      - repo
      - title
    properties:
      repo:
        type: string
      title:
        type: string
      body:
        type: string
The planner can route tool execution through the connector registry.
Section 44 Future agent model
Agents should see a constrained tool catalog. For each task, define:
This allows future replacement of the agent framework without rewriting the workflow system.
Section 45 Security model
Security layers:
A connector credential does not imply a user may use the connector.
The framework must check:
- who is requesting
- what they are trying to do
- what connector/tool is involved
- what data will be accessed
- what side effects may occur
- whether approval is required
Section 46 Data safety model
Data safety policies should classify:
Data safety outcomes:
Agentic workflows must apply data safety checks before:
- external tool call
- external notification
- memory write
- artifact publication
- connector sync
Section 47 Testing standards
Test at four levels.
Unit tests
State transitions · primitive mapping · policy decisions · planner outputs · idempotency behavior.
Integration tests
Postgres persistence · queue claiming · worker retry · approval resume · connector call recording.
Workflow tests
Generate report end-to-end · approval-gated publish · webhook deduplication · agent tool proposal with approval.
Safety tests
Policy denial · external sharing approval · memory consent · forbidden tool calls · duplicate webhook event.
Boundary discipline
Import-time boundary check: concord/core/ and concord/domain/ must not import the runtime; only concord/runtime/<adapter>.py may. An AST scanner (concord_boundary_check.py) runs in CI and fails the build on violation.
The boundary check is the only "test" that runs on every commit before the suite — it's a 200-line AST scanner with no install footprint. Rules:
| Path | Disallowed | Why |
|---|---|---|
| concord/core/** | dbos, temporalio | The functional core has no runtime knowledge. |
| concord/domain/** | dbos, temporalio | Domain speaks the protocol; never an implementation. |
| concord/runtime/*.py (≠ dbos.py) | dbos | Adapter isolation. One file binds the implementation. |
| concord/runtime/*.py (≠ temporal.py) | temporalio | Same rule shape; future adapter. |
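The core of such a scanner fits in a few lines with the standard `ast` module. This sketch checks one source string against a disallowed set; walking files and applying the per-path rules from the table is left out, and `forbidden_imports` is a hypothetical name rather than the actual contents of concord_boundary_check.py.

```python
import ast

DISALLOWED = {"dbos", "temporalio"}


def forbidden_imports(source: str, disallowed: set = DISALLOWED) -> list:
    """Return imported module names whose top-level package is disallowed."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports have no module name
        else:
            continue
        for name in names:
            if name.split(".")[0] in disallowed:
                found.append(name)
    return found


sample = "import json\nimport dbos\nfrom temporalio.client import Client\n"
violations = forbidden_imports(sample)
```

Because the scanner only parses source text, it never imports the code under test and needs nothing beyond the standard library.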
Section 48 Operational dashboards
Minimum dashboards
Operational alerts
Section 49 Development roadmap
Roadmap (diagram): Phase 1 Logical core → Phase 2 Postgres backend → Phase 3 Worker → Phase 4 Approvals & notifications → Phase 5 Connectors → Phase 6 Agent gateway → Phase 7 Hardening.
Logical core
TaskSpec · Command · PolicyResult · ExecutionPlan · primitive mapper · state machine · audit model · in-memory store.
Postgres backend
Migrations · command / task queue / approval / audit / memory / artifact repositories.
Worker
Queue claiming · lease renewal · retry/backoff · task execution · result persistence · failure classification.
Approvals & notifications
Approval API · approval UI · notification connector · resume · expiration.
Connectors
Base interface · HTTP · notification · Databricks · GitHub · tool registry.
Agent gateway
Action protocol · tool allowlist · memory retrieval · policy-gated calls · trace events · approval-gated agent actions.
Hardening
Rate limits · cost accounting · data safety policies · compensation · cancellation · dashboards · retention.
Section 50 What this framework is not
A longer-form treatment of Concord's positioning — including a comparison across nine adjacent categories (durable runtimes, agent frameworks, BPM platforms, DevOps systems, policy engines, data DAG orchestrators, observability tools, iPaaS / connector platforms, memory systems) — lives at What Concord is & isn't ↗.
Concord is not
- a durable execution runtime (DBOS, Temporal, Cadence do that)
- a distributed compute engine
- a full BPMN engine
- a replacement for Postgres
- a replacement for connector APIs
- a replacement for an LLM agent framework
- a full data pipeline orchestrator
- a framework that demands wholesale adoption
Concord is
- a library of contracts (pip install concord)
- a policy, approval, and state model
- a connector and agent governance layer
- an agent-safe tool gateway
- a Postgres-backed system of record for domain meaning
- a thin layer that delegates execution to a durable runtime
Section 51 Final vision
Concord should make every action in the system look like this:
This gives you one coherent framework for:
The key architectural decision:
… owns truth · Framework owns primitives · Connectors own capabilities · Workers own execution · Agents propose actions · Policy decides what is allowed · Audit explains what happened
Multi-agent swarms, subagent spawning, agentic execution
A governed extension to the primitive layer. Agents propose actions; Concord governs, records, executes, and audits them.
Single-agent runs, parallel swarms, hierarchical delegation, and reviewer agents are all compositions over the same primitives — no new fundamentals required.
Section 52 Agentic design philosophy
Concord supports agentic execution as a governed extension of the primitive system. Agents propose; the framework decides.
52.1 · Agents are participants, not infrastructure
An agent is a participant that can interpret context, propose commands, call tools, delegate work, spawn subagents, produce artifacts, request memory reads/writes, ask for human approval, and evaluate or synthesize results. But every consequential action passes through the same primitive gateway used by deterministic workflows.
This is what makes deterministic and agentic workflows interoperable.
52.2 · Swarms are workflow structures, not special cases
A swarm is a coordinated group of agent runs serving a parent command or workflow. A swarm can be sequential, parallel, competitive, hierarchical, review-driven, or human-supervised — but it is still composed from the same primitives:
Command → Policy → Plan → AgentRun → TaskRun → Artifact → Memory → Audit
52.3 · Subagent spawning is a governed operation
An agent must not directly create uncontrolled child agents. Subagent spawning is itself a command — spawn_subagent — subject to:
52.4 · Postgres remains the system of record
All durable agentic state lives in Postgres: swarm runs, agent runs, invocations, steps, tool calls, delegated goals, parent-child relationships, artifacts, memory reads/writes, approvals, cancellations, audit.
The LLM/agent runtime can be replaced. The Postgres-backed execution record must remain stable.
Section 53 Core agentic concepts
Five core objects model agentic execution.
SwarmRun
A coordinated multi-agent execution that belongs to one parent command. Defines objective, participants, coordinator, join strategy, and hard limits.
AgentRun
One execution of one agent role: coordinator, planner, researcher, reviewer, worker, memory manager, domain expert, or connector-specific agent.
AgentInvocation
Records a parent agent spawning or delegating to a child agent — delegated goal, constraints, allowed tools, memory scope, budget, max steps, spawn depth.
AgentStep
One decision, action, or observation inside an agent run. The agentic equivalent of a trace event — but durable and queryable.
JoinStrategy
Defines how subagent outputs are combined: all_success, first_success, quorum, coordinator_synthesis, evaluator_selection, human_review, best_of_n, map_reduce, consensus, ranked_review.
An AgentRun may create
AgentStep action types
Join strategies
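Three of the simpler strategies can be sketched as one function. The semantics are inferred from the strategy names and are assumptions, as are the result-dict shape and the `join_results` name; strategies that need an evaluator, coordinator, or human are not covered.

```python
def join_results(strategy: str, results: list, quorum: int = 2) -> dict:
    """Combine subagent results under all_success, first_success, or quorum."""
    successes = [r for r in results if r["status"] == "succeeded"]
    if strategy == "all_success":
        joined = bool(results) and len(successes) == len(results)
    elif strategy == "first_success":
        joined = bool(successes)
    elif strategy == "quorum":
        joined = len(successes) >= quorum
    else:
        raise ValueError(f"unsupported join strategy: {strategy}")
    return {"joined": joined, "outputs": [r["output"] for r in successes]}


results = [
    {"agent": "researcher_a", "status": "succeeded", "output": "finding A"},
    {"agent": "researcher_b", "status": "failed", "output": None},
    {"agent": "researcher_c", "status": "succeeded", "output": "finding C"},
]
```

With these three results, `quorum` (2 of 3) and `first_success` join, while `all_success` does not.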
Section 54 Agentic hierarchy
54.1 · Standard hierarchy
54.2 · Parent-child relationships
Every child agent should have:
This enables recursive execution while preserving control.
Section 55 Swarm execution modes
Five recurring patterns. Choose by the shape of the work.
Sequential
Parent delegates to one child at a time. Use when each step depends on the previous, when debuggability matters, when cost and control beat latency.
Parallel
Parent spawns subagents concurrently. Use when work decomposes cleanly, latency matters, multiple connectors can be explored at once.
Competitive
Multiple agents attempt the same task; an evaluator selects the best. Use when output quality matters or confidence comes from comparison.
Review-driven
Reviewer agents evaluate proposed outputs before publication. Use when outputs are externally visible, sensitive, or costly to get wrong.
Hierarchical
Agents can spawn subagents, within hard limits. Use when a coordinator cannot plan all subtasks upfront and decomposition must be flexible.
Always enforce in hierarchical mode
Section 56 Mapping swarms to existing primitives
No new fundamental primitive is required. Swarms are compositions over existing primitives.
| Agentic action | Concord mapping |
|---|---|
| Start swarm | Command → Policy → Plan → SwarmRun |
| Spawn subagent | Command → Policy → AgentInvocation → AgentRun |
| Agent tool call | Command → Policy → Sync/Async Function |
| Agent long-running tool | Command → Policy → Queue → Async Task |
| Agent asks human | Human Approval |
| Agent writes memory | Memory Write (candidate, policy-checked) |
| Agent creates output | Artifact Write |
| Agent notifies user | Notification |
| Agent stops work | Cancellation |
| Agent reverses side effect | Compensation |
| Agent result review | Evaluation / Quality Check |
| Agent trace | Audit / AgentStep |
Section 57 Postgres schema additions
Five new tables reference the existing pf_commands, pf_task_runs, pf_artifacts, pf_memory, pf_approvals, and pf_audit_log tables.
57.1 · swarm_runs
CREATE TABLE IF NOT EXISTS swarm_runs (
swarm_run_id UUID PRIMARY KEY,
command_id UUID NOT NULL,
workflow_run_id UUID NULL,
coordinator_agent_run_id UUID NULL,
status TEXT NOT NULL,
objective TEXT NOT NULL,
execution_mode TEXT NOT NULL,
join_strategy TEXT NOT NULL,
max_agents INT NOT NULL DEFAULT 5,
max_depth INT NOT NULL DEFAULT 2,
max_total_steps INT NOT NULL DEFAULT 100,
max_cost_units NUMERIC NULL,
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
started_at TIMESTAMPTZ NULL,
completed_at TIMESTAMPTZ NULL,
cancelled_at TIMESTAMPTZ NULL
);
Recommended statuses
57.2 · agent_runs
CREATE TABLE IF NOT EXISTS agent_runs (
agent_run_id UUID PRIMARY KEY,
command_id UUID NOT NULL,
swarm_run_id UUID NULL,
parent_agent_run_id UUID NULL,
agent_name TEXT NOT NULL,
agent_role TEXT NOT NULL,
agent_version TEXT NULL,
status TEXT NOT NULL,
goal TEXT NOT NULL,
allowed_tools JSONB NOT NULL DEFAULT '[]',
allowed_connectors JSONB NOT NULL DEFAULT '[]',
memory_scope JSONB NOT NULL DEFAULT '{}',
context_scope JSONB NOT NULL DEFAULT '{}',
max_steps INT NOT NULL DEFAULT 20,
step_count INT NOT NULL DEFAULT 0,
spawn_depth INT NOT NULL DEFAULT 0,
model_config JSONB NOT NULL DEFAULT '{}',
runtime_config JSONB NOT NULL DEFAULT '{}',
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
started_at TIMESTAMPTZ NULL,
completed_at TIMESTAMPTZ NULL,
cancelled_at TIMESTAMPTZ NULL
);
Recommended statuses
57.3 · agent_invocations
CREATE TABLE IF NOT EXISTS agent_invocations (
invocation_id UUID PRIMARY KEY,
swarm_run_id UUID NOT NULL,
parent_agent_run_id UUID NOT NULL,
child_agent_run_id UUID NOT NULL,
invocation_type TEXT NOT NULL,
delegated_goal TEXT NOT NULL,
constraints JSONB NOT NULL DEFAULT '{}',
allowed_tools JSONB NOT NULL DEFAULT '[]',
allowed_connectors JSONB NOT NULL DEFAULT '[]',
memory_scope JSONB NOT NULL DEFAULT '{}',
status TEXT NOT NULL DEFAULT 'created',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
completed_at TIMESTAMPTZ NULL
);
Invocation types
57.4 · agent_steps
CREATE TABLE IF NOT EXISTS agent_steps (
agent_step_id UUID PRIMARY KEY,
agent_run_id UUID NOT NULL,
command_id UUID NOT NULL,
swarm_run_id UUID NULL,
step_index INT NOT NULL,
action_type TEXT NOT NULL,
tool_name TEXT NULL,
action_payload JSONB NOT NULL DEFAULT '{}',
decision TEXT NOT NULL,
observation JSONB NULL,
output JSONB NULL,
created_command_id UUID NULL,
created_task_run_id UUID NULL,
created_artifact_id UUID NULL,
created_memory_id UUID NULL,
created_approval_id UUID NULL,
child_agent_run_id UUID NULL,
latency_ms INT NULL,
token_usage JSONB NULL,
cost_units NUMERIC NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
57.5 · agent_messages
Optional but useful for chat-style and collaborative agent systems.
CREATE TABLE IF NOT EXISTS agent_messages (
agent_message_id UUID PRIMARY KEY,
swarm_run_id UUID NULL,
agent_run_id UUID NOT NULL,
parent_agent_run_id UUID NULL,
message_role TEXT NOT NULL,
message_type TEXT NOT NULL,
content TEXT NOT NULL,
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Roles & types
message_role
message_type
57.6 · Recommended indexes
CREATE INDEX IF NOT EXISTS idx_swarm_runs_command_id ON swarm_runs(command_id);
CREATE INDEX IF NOT EXISTS idx_swarm_runs_status ON swarm_runs(status);
CREATE INDEX IF NOT EXISTS idx_agent_runs_swarm_run_id ON agent_runs(swarm_run_id);
CREATE INDEX IF NOT EXISTS idx_agent_runs_parent_agent_run_id ON agent_runs(parent_agent_run_id);
CREATE INDEX IF NOT EXISTS idx_agent_runs_status ON agent_runs(status);
CREATE INDEX IF NOT EXISTS idx_agent_invocations_parent ON agent_invocations(parent_agent_run_id);
CREATE INDEX IF NOT EXISTS idx_agent_invocations_child ON agent_invocations(child_agent_run_id);
CREATE INDEX IF NOT EXISTS idx_agent_steps_agent_run_id ON agent_steps(agent_run_id);
CREATE INDEX IF NOT EXISTS idx_agent_steps_command_id ON agent_steps(command_id);
CREATE INDEX IF NOT EXISTS idx_agent_steps_action_type ON agent_steps(action_type);
Section 58 Swarm & agent state transitions
58.1 · Swarm state transitions
Terminal states: succeeded, failed, cancelled, partially_succeeded, expired.
58.2 · Agent state transitions
Terminal states: succeeded, failed, cancelled, expired.
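The terminal-state lists above imply one simple invariant: once a run reaches a terminal state, no further transition is accepted. A minimal sketch of that guard follows. The function name `can_transition` is hypothetical, and because the non-terminal states are not listed in this section, the guard only checks the terminal sets.

```python
# Terminal states as listed in 58.1 and 58.2.
SWARM_TERMINAL = frozenset(
    {"succeeded", "failed", "cancelled", "partially_succeeded", "expired"}
)
AGENT_TERMINAL = frozenset({"succeeded", "failed", "cancelled", "expired"})


def can_transition(current: str, terminal_states: frozenset) -> bool:
    """Hypothetical guard: a run in a terminal state accepts no new transition."""
    return current not in terminal_states
```

Note that `partially_succeeded` is terminal for swarms only, so the same status string can be final for a SwarmRun and meaningless for an AgentRun.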
Section 59 Standards for subagent spawning
59.1 · Spawn request contract
from dataclasses import dataclass, field
from typing import Any

@dataclass
class SpawnSubagentRequest:
    parent_agent_run_id: str
    swarm_run_id: str
    agent_name: str
    agent_role: str
    delegated_goal: str
    allowed_tools: list[str]
    allowed_connectors: list[str]
    memory_scope: dict[str, Any]
    context_scope: dict[str, Any]
    constraints: dict[str, Any] = field(default_factory=dict)
    max_steps: int = 10
59.2 · Spawn result contract
from dataclasses import dataclass, field

@dataclass
class SpawnSubagentResult:
    decision: str
    child_agent_run_id: str | None
    command_id: str | None
    reasons: list[str] = field(default_factory=list)
Allowed decisions:
59.3 · Spawn policy checks
59.4 · Example spawn policy
def evaluate_spawn_policy(request, swarm, parent_agent):
    # Collect every violation so the caller sees all reasons at once.
    reasons = []

    # Depth and population limits come from the swarm.
    if parent_agent["spawn_depth"] + 1 > swarm["max_depth"]:
        reasons.append("Spawn depth exceeded.")
    if swarm["current_agent_count"] >= swarm["max_agents"]:
        reasons.append("Swarm agent limit exceeded.")

    # An empty allowed_child_roles list means no role restriction.
    allowed_roles = parent_agent.get("allowed_child_roles", [])
    if allowed_roles and request.agent_role not in allowed_roles:
        reasons.append(f"Role not allowed: {request.agent_role}")

    # A child may only narrow the parent's tool and connector scope.
    parent_tools = set(parent_agent.get("allowed_tools", []))
    if not set(request.allowed_tools).issubset(parent_tools):
        reasons.append("Child requested tools outside parent scope.")
    parent_connectors = set(parent_agent.get("allowed_connectors", []))
    if not set(request.allowed_connectors).issubset(parent_connectors):
        reasons.append("Child requested connectors outside parent scope.")

    if reasons:
        return {"decision": "denied", "reasons": reasons}
    return {"decision": "allowed", "reasons": []}
Section 60 Tool calls from agents
60.1 · Tool calls must become commands
An agent tool call should not directly execute the tool. It should create a command or child task run.
Agent wants to call run_sql
→ create command: run_sql
→ policy checks permission / data scope
→ execute sync or async
→ return observation to agent
→ write audit
60.2 · Tool call contract
from dataclasses import dataclass, field
from typing import Any

@dataclass
class AgentToolCallRequest:
    agent_run_id: str
    swarm_run_id: str | None
    tool_name: str
    payload: dict[str, Any]
    reason: str
    expected_output: str
    idempotency_key: str | None = None
    metadata: dict[str, Any] = field(default_factory=dict)
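To illustrate the rule in 60.1, the sketch below turns a tool call request into a command description instead of executing the tool. It is an assumption-laden example, not Concord's API: `tool_call_to_command`, the `"agent"` ingress label, the `agent_run:` prefix, and the hash-based fallback key are all hypothetical. The output keys mirror the columns of the commands table.

```python
import hashlib
import json
from types import SimpleNamespace
from typing import Any


def tool_call_to_command(request: Any) -> dict[str, Any]:
    """Describe a tool call as a command dict; nothing is executed here."""
    key = request.idempotency_key
    if key is None:
        # Assumption: derive a key from run, tool, and payload so that a
        # retried identical call maps to the same command.
        canonical = json.dumps(request.payload, sort_keys=True)
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
        key = f"tool:{request.agent_run_id}:{request.tool_name}:{digest}"
    return {
        "command_type": request.tool_name,
        "requested_by": f"agent_run:{request.agent_run_id}",
        "ingress": "agent",  # hypothetical ingress label
        "payload": request.payload,
        "context": {
            "swarm_run_id": request.swarm_run_id,
            "reason": request.reason,
            "expected_output": request.expected_output,
        },
        "idempotency_key": key,
    }


# Stand-in object with the same fields as AgentToolCallRequest.
request = SimpleNamespace(
    agent_run_id="run_1",
    swarm_run_id=None,
    tool_name="run_sql",
    payload={"sql": "SELECT 1"},
    reason="Check connectivity.",
    expected_output="One row.",
    idempotency_key=None,
    metadata={},
)
command = tool_call_to_command(request)
```

The derived key treats two identical calls from the same run as one command. If an agent may legitimately repeat a call, the caller should supply an explicit `idempotency_key` instead.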
60.3 · Tool call routing
Section 61 Memory rules for swarms
Memory semantics — scope, consent, supersession, candidate writes, the MemoryStore protocol — are defined in §23 Memory architecture. This section adds only the swarm-specific deltas.
61.1 · Inheritance is subset, not union
Child agents do not automatically inherit parent memory access. The required invariant: child_memory_scope ⊆ parent_memory_scope. Spawn requests that violate this are rejected at policy time (see §59 Subagent spawning).
61.2 · Per-agent memory scope shape
Each AgentRun carries an explicit memory scope. The scope is bound at spawn time and immutable for the run.
{
"subject_type": "user",
"subject_id": "user_123",
"allowed_memory_types": ["preference", "workflow_preference"],
"allow_semantic_retrieval": true,
"max_results": 10
}
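The invariant child_memory_scope ⊆ parent_memory_scope needs a concrete meaning for a scope shaped like the one above. One possible reading, offered as a sketch: same subject, no extra memory types, no semantic retrieval unless the parent has it, and no larger result limit. The function name and the per-field rules are assumptions, not part of the specification.

```python
from typing import Any


def memory_scope_is_subset(child: dict[str, Any], parent: dict[str, Any]) -> bool:
    """Return True if the child scope grants nothing the parent scope lacks."""
    same_subject = (
        child.get("subject_type") == parent.get("subject_type")
        and child.get("subject_id") == parent.get("subject_id")
    )
    types_ok = set(child.get("allowed_memory_types", [])) <= set(
        parent.get("allowed_memory_types", [])
    )
    # A child may use semantic retrieval only if the parent may.
    semantic_ok = (not child.get("allow_semantic_retrieval", False)) or bool(
        parent.get("allow_semantic_retrieval", False)
    )
    results_ok = child.get("max_results", 0) <= parent.get("max_results", 0)
    return same_subject and types_ok and semantic_ok and results_ok
```

A spawn policy could call this check alongside the tool and connector subset checks and deny the spawn when it returns False.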
Section 62 Connector rules for swarms
Connectors should be scoped explicitly per agent.
{
"allowed_connectors": [
"postgres",
"databricks",
"github",
"slack"
]
}
A child agent should never gain connector access that the parent does not have. Recommended: child_connector_scope ⊆ parent_connector_scope.
For future connectors, define
Section 63 Artifact rules for swarms
Artifact semantics — types, statuses, lifecycle, and the distinction from effects — are defined in §24 Artifact architecture. This section adds only the swarm-specific delta.
63.1 · Join through artifacts, not chat
A coordinator should consume child outputs as artifacts, not as ephemeral chat messages. Every subagent result that matters is committed as a row in artifacts with a typed artifact_type; the coordinator's join strategy reads from there, not from agent message streams.
Typical swarm artifact types
Section 64 Cancellation semantics
Cancellation modes (graceful, compensate_then_stop) and per-command state transitions are defined in §30 Cancellation standard. This section adds only the swarm-specific delta: how cancellation cascades across a multi-agent execution.
64.1 · Cascade flow
The parent's cancellation_mode propagates: a graceful parent cancel triggers a graceful exit on each child agent run; a compensate_then_stop parent cancel triggers compensation in each child whose effects fired (reverse declaration order per child, then up the chain).
Recommended audit fields on cancel:
{
"cancellation_mode": "graceful",
"cancel_reason": "user_requested",
"cancelled_by": "user_123"
}
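The cascade can be sketched as a walk over the agent-run tree that emits one cancel record per run, each inheriting the parent's mode. This is an illustration under assumptions: the `children` mapping, the function name, and the parent-first visiting order are hypothetical, and the record fields reuse the audit fields above.

```python
from typing import Any


def cascade_cancel(
    root_run_id: str,
    children: dict[str, list[str]],
    cancellation_mode: str,
    cancel_reason: str,
    cancelled_by: str,
) -> list[dict[str, Any]]:
    """Return one cancel record per agent run, parent first, then descendants."""
    records: list[dict[str, Any]] = []
    stack = [root_run_id]
    seen: set[str] = set()
    while stack:
        run_id = stack.pop()
        if run_id in seen:  # guard against malformed (cyclic) parent links
            continue
        seen.add(run_id)
        records.append({
            "agent_run_id": run_id,
            "cancellation_mode": cancellation_mode,
            "cancel_reason": cancel_reason,
            "cancelled_by": cancelled_by,
        })
        # Reverse so children are visited in declaration order.
        stack.extend(reversed(children.get(run_id, [])))
    return records
```

Each record would then be submitted as its own governed cancel command rather than applied directly, so the cascade stays auditable.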
Section 65 Compensation semantics
The compensation contract — the three-layer design (manifest, registration validator, runtime drift detector) — is defined in §31 Compensation standard. This section adds the swarm-specific delta: who runs the compensation chain and who proposes it.
Agents may propose compensation, but Concord executes compensation through governed commands. A compensation proposed by an agent goes through the same manifest validation and drift detection as one declared in the catalog at registration time.
65.1 · Typical swarm side-effect → compensation mapping
| Side effect | Compensation |
|---|---|
| Created draft artifact | Mark artifact cancelled |
| Started external job | Cancel job |
| Sent notification | Send correction |
| Wrote staging table | Drop or mark stale |
| Created approval request | Expire approval |
| Wrote memory | Supersede or delete memory |
| Opened GitHub PR | Close PR or mark draft |
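The table above can be read as a lookup from fired side effect to compensating command. The sketch below encodes it that way and returns compensations most recent first, in line with the reverse-order rule described for cancellation. All effect-type and command names are hypothetical keys invented for this example.

```python
# Hypothetical effect-type keys for the rows of the table above.
COMPENSATIONS = {
    "artifact.draft_created": "artifact.mark_cancelled",
    "external_job.started": "external_job.cancel",
    "notification.sent": "notification.send_correction",
    "staging_table.written": "staging_table.drop_or_mark_stale",
    "approval.requested": "approval.expire",
    "memory.written": "memory.supersede_or_delete",
    "github.pr_opened": "github.close_pr_or_mark_draft",
}


def plan_compensation(fired_effects: list[str]) -> list[str]:
    """Propose compensation commands for fired effects, most recent first."""
    unknown = [e for e in fired_effects if e not in COMPENSATIONS]
    if unknown:
        raise ValueError(f"No compensation declared for: {unknown}")
    return [COMPENSATIONS[e] for e in reversed(fired_effects)]
```

The function only proposes commands. Each proposal would still pass through manifest validation and drift detection before anything runs.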
Section 66 Quality & evaluation
Swarms should support evaluation as a first-class step. Evaluation can be deterministic or agentic.
Deterministic
- SQL validates
- row counts reconcile
- JSON schema is valid
- artifact exists
- confidence score exceeds threshold
- PII check passes
- cost is below budget
Agentic
- reviewer agent critiques answer
- evaluator agent ranks candidates
- safety agent reviews external output
- domain agent validates reasoning
Agentic evaluation must still write structured results.
{
"evaluation_type": "reviewer_agent",
"decision": "pass",
"confidence": 0.91,
"issues": [],
"recommendation": "publish"
}
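A deterministic evaluator can emit the same structured shape as the agentic example above. The following is a minimal sketch: the function name, the `"deterministic"` type label, the `"fail"` and `"revise"` values, and the default threshold are assumptions chosen for illustration.

```python
from typing import Any


def evaluate_deterministic(
    checks: dict[str, bool], confidence: float, threshold: float = 0.8
) -> dict[str, Any]:
    """Combine named boolean checks and a confidence score into one result."""
    issues = [name for name, passed in checks.items() if not passed]
    if confidence < threshold:
        issues.append("confidence_below_threshold")
    passed = not issues
    return {
        "evaluation_type": "deterministic",
        "decision": "pass" if passed else "fail",
        "confidence": confidence,
        "issues": issues,
        "recommendation": "publish" if passed else "revise",
    }
```

Because both evaluator kinds return the same keys, a join strategy or approval policy can consume them without knowing which kind produced the result.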
Section 67 Standard YAML specification
A complete swarm declared in YAML:
name: revenue_report_swarm
objective: Generate and review a monthly revenue report.
execution_mode: parallel
join_strategy: coordinator_synthesis
limits:
  max_agents: 4
  max_depth: 1
  max_total_steps: 80
  max_cost_units: 50
coordinator:
  agent_name: revenue_report_coordinator
  agent_role: coordinator
  max_steps: 20
  allowed_tools:
    - spawn_subagent
    - read_artifact
    - create_artifact
    - request_approval
  allowed_connectors:
    - postgres
    - databricks
agents:
  - agent_name: revenue_researcher
    agent_role: researcher
    delegated_goal: Gather source data and assumptions.
    max_steps: 15
    allowed_tools:
      - run_sql
      - retrieve_memory
      - create_artifact
    allowed_connectors:
      - postgres
      - databricks
  - agent_name: revenue_analyst
    agent_role: analyst
    delegated_goal: Compute metrics and produce analysis.
    max_steps: 15
    allowed_tools:
      - run_sql
      - create_artifact
    allowed_connectors:
      - postgres
      - databricks
  - agent_name: revenue_reviewer
    agent_role: reviewer
    delegated_goal: Validate the final report and identify risks.
    max_steps: 10
    allowed_tools:
      - read_artifact
      - evaluate_output
      - request_human_input
    allowed_connectors:
      - postgres
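A declared swarm can be checked statically before it is registered. The sketch below validates a parsed specification (a plain dict, as a YAML loader would produce) against its own limits. It rests on assumptions: that the coordinator counts toward `max_agents`, that the sum of per-agent `max_steps` should fit within `max_total_steps`, and that declared agents must stay within the coordinator's connector scope. The function name is hypothetical.

```python
from typing import Any


def validate_swarm_spec(spec: dict[str, Any]) -> list[str]:
    """Return a list of human-readable violations; empty means the spec passes."""
    errors: list[str] = []
    limits = spec["limits"]
    coordinator = spec["coordinator"]
    agents = spec.get("agents", [])

    # Assumption: the coordinator counts toward max_agents.
    if 1 + len(agents) > limits["max_agents"]:
        errors.append("max_agents exceeded")

    # Conservative check: declared step budgets must fit the swarm budget.
    total_steps = coordinator["max_steps"] + sum(a["max_steps"] for a in agents)
    if total_steps > limits["max_total_steps"]:
        errors.append("max_total_steps exceeded")

    coordinator_connectors = set(coordinator.get("allowed_connectors", []))
    for agent in agents:
        extra = set(agent.get("allowed_connectors", [])) - coordinator_connectors
        if extra:
            errors.append(
                f"{agent['agent_name']}: connectors outside coordinator scope: {sorted(extra)}"
            )
    return errors
```

For the revenue_report_swarm above, this check passes: four agents against a limit of four, 60 declared steps against a budget of 80, and every agent's connectors within the coordinator's.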
Section 68 Standard Python interfaces
The core service interfaces — CommandService, PolicyEngine, Planner, Executor, Worker, DurableRuntime, MemoryStore — live in §41 Minimal service interfaces. This section adds the three agent-and-swarm interfaces that aren't elsewhere.
68.1 · Agent runtime interface
from typing import Any, Protocol

class AgentRuntime(Protocol):
    def start_agent_run(
        self,
        agent_run_id: str,
        goal: str,
        context: dict[str, Any],
    ) -> dict[str, Any]: ...

    def resume_agent_run(
        self,
        agent_run_id: str,
        observation: dict[str, Any],
    ) -> dict[str, Any]: ...

    def cancel_agent_run(
        self,
        agent_run_id: str,
        reason: str,
    ) -> dict[str, Any]: ...
Concord does not depend on a specific LLM/agent library. LangGraph, custom loops, OpenAI Agents SDK, CrewAI, or other systems plug in as adapters behind this protocol — same pattern as DurableRuntime in §41.
68.2 · Swarm planner interface
from typing import Any, Protocol

class SwarmPlanner(Protocol):
    def create_swarm_plan(
        self,
        command: dict[str, Any],
        context: dict[str, Any],
    ) -> dict[str, Any]: ...
68.3 · Join strategy interface
from typing import Any, Protocol

class JoinStrategy(Protocol):
    def join(
        self,
        swarm_run: dict[str, Any],
        child_results: list[dict[str, Any]],
    ) -> dict[str, Any]: ...
Example join output:
{
"decision": "succeeded",
"summary": "All required agents completed.",
"selected_artifact_id": "artifact_123",
"confidence": 0.88,
"requires_human_review": false
}
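As an illustration of the JoinStrategy protocol, here is a sketch of one simple strategy that succeeds only when every child succeeded. The class name and the child result fields (`agent_run_id`, `status`, `confidence`) are assumptions; the output keys follow the example join output above.

```python
from typing import Any


class AllRequiredSucceeded:
    """Hypothetical JoinStrategy: succeed only if every child succeeded."""

    def join(
        self,
        swarm_run: dict[str, Any],
        child_results: list[dict[str, Any]],
    ) -> dict[str, Any]:
        failed = [
            r["agent_run_id"] for r in child_results if r.get("status") != "succeeded"
        ]
        if not child_results or len(failed) == len(child_results):
            decision = "failed"
        elif failed:
            decision = "partially_succeeded"
        else:
            decision = "succeeded"
        confidences = [r["confidence"] for r in child_results if "confidence" in r]
        if decision == "succeeded":
            summary = "All required agents completed."
        else:
            summary = f"Agents not succeeded: {failed}"
        return {
            "decision": decision,
            "summary": summary,
            "selected_artifact_id": None,  # this strategy does not pick an artifact
            # Take the weakest child confidence as the joined confidence.
            "confidence": min(confidences) if confidences else None,
            "requires_human_review": decision != "succeeded",
        }
```

The class satisfies the protocol structurally, so it needs no explicit base class. A coordinator_synthesis strategy would instead read child artifacts and select or merge one.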
Section 69 How to map any agentic task
Use this 18-point checklist.
- What is the parent command?
- Is an agent needed, or is a deterministic function enough?
- Is this single-agent or swarm?
- What is the swarm objective?
- Who is the coordinator?
- What subagent roles are allowed?
- Can subagents spawn further children?
- What is the max depth?
- What tools can each agent use?
- What connectors can each agent use?
- What memory can each agent read?
- Can any agent write memory?
- What artifacts should each agent produce?
- How are outputs joined?
- Does any step require human approval?
- What are the cancellation rules?
- What are the compensation rules?
- What audit records are mandatory?
Section 70 Example · research and report swarm
70.1 · User request
Create a monthly revenue report, check it, and prepare it for external sharing.
70.2 · Primitive mapping
Command · generate_and_prepare_revenue_report
→ Policy
→ Plan
→ SwarmRun
→ AgentRun · coordinator
→ AgentRun · researcher / AgentRun · analyst / AgentRun · reviewer (each spawned by the coordinator)
→ JoinStrategy · coordinator_synthesis
→ ArtifactWrite · draft_report
→ Policy · external_sharing
→ HumanApproval
→ AsyncTask · publish_report
→ Notification
→ Audit
70.3 · State flow
command.created
→ command.validated
→ swarm.created
→ swarm.running
→ agent_runs.running
→ swarm.joining
→ artifact.created
→ command.waiting_for_approval
→ command.approved
→ command.queued
→ command.running
→ command.succeeded
Section 71 Operational guardrails
71.1 · Hard limits
Every swarm should have hard limits.
71.2 · Role-based tool permissions
{
"researcher": ["run_sql", "retrieve_memory", "create_artifact"],
"analyst": ["run_sql", "create_artifact", "evaluate_output"],
"reviewer": ["read_artifact", "evaluate_output", "request_human_input"],
"publisher": ["publish_artifact", "request_approval"]
}
71.3 · No uncontrolled recursion
parent_depth + 1 ≤ max_depth ·
current_agent_count < max_agents ·
child_scope ⊆ parent_scope ·
child_tools ⊆ parent_tools ·
child_connectors ⊆ parent_connectors
71.4 · No ungoverned side effects
Agents may not directly perform external writes. External writes should be commands:
Each goes through: Command → Policy → Approval if needed → Execution → Audit.
Section 72 Recommended update to core architecture
Add this section to the main Concord architecture document:
Section 73 Final principle
The system should support future agent runtimes and connector ecosystems without changing the core architecture.
Do not
- encode agent framework assumptions into the database
- let agents bypass the primitive gateway
- let child agents expand their own authority
- treat chat messages as the only source of truth
Do persist
- agent runs & steps
- invocations & spawn decisions
- artifacts & memory candidates
- approvals & audit events
Concord should remain:
Functional core architecture on DBOS
Concord is built on DBOS + Postgres. DBOS is the durable execution runtime; Postgres is the system of record; Concord is the semantic and functional-core layer.
This decision supersedes earlier implementation ideas that proposed custom queue, retry, lease, schedule, and outbox machinery inside Concord.
Section 74 Decision accepted · Concord on DBOS
The goal is not to recreate DBOS inside Concord. The goal is to define a clean primitive vocabulary and functional decision layer that DBOS can execute durably.
Concord: semantic primitive layer · functional decision core
→ DBOS: durable execution runtime (workflows · steps · queues · schedules)
→ Postgres: durable system of record (Concord domain + DBOS runtime tables)
74.1 · Core decision
74.2 · Ownership split
Runtime mechanics
- durable workflows & steps
- workflow recovery
- retries · queues · schedules
- durable sleep
- workflow IDs / idempotency
- Postgres transaction tracking
- concurrency & rate limits
Meaning & governance
- command taxonomy
- policy model
- task classification & planning semantics
- approval, memory, artifact models
- connector contracts
- agent & swarm ontology
- domain audit events
Section 75 Why DBOS is the runtime
DBOS aligns with Concord because Concord is:
DBOS workflows provide durable execution and recover from completed steps after interruption. DBOS workflow IDs can act as idempotency keys. Workflows are expected to be deterministic; non-deterministic work (database access, third-party APIs, randomness, local time) belongs inside DBOS steps.
Primitives decide and describe. DBOS steps execute side effects. Postgres records the domain truth.
Section 76 Design philosophy after DBOS
76.1 · Concord becomes smaller
Concord should not be a workflow engine. It becomes a semantic workflow kernel:
Given command + context + current domain state,
derive policy decisions, plans, domain events, and DBOS execution requests.
The implementation should avoid building infrastructure DBOS already supplies.
76.2 · Functional core, DBOS shell
Functional core
- pure command classification
- pure policy evaluation
- pure plan creation
- pure state transition derivation
- pure connector permission checks
- pure agent / swarm planning
DBOS shell
- durable workflows & steps
- transaction steps
- connector calls
- LLM calls
- queues · retries · schedules
- sleep / wait / resume
The functional core is unit-testable without DBOS. The DBOS shell is integration-tested with Postgres and real or mocked connectors.
76.3 · Domain events are not DBOS internals
DBOS already has runtime execution state. Concord should maintain domain events, not duplicate DBOS runtime history.
Business meaning
Runtime mechanics
DBOS owns runtime mechanics. Concord owns business meaning.
Section 77 What we remove from Concord
Earlier versions included concepts now delegated to DBOS.
77.1 · Remove custom durable queue
Do not build a custom queue table, queue claiming, worker lease system, polling loop, global concurrency controller, or rate limiter. Use DBOS queues.
Concord may still define semantic queue names:
77.2 · Remove custom retry runner
Use DBOS step/workflow retry behavior. Concord can still declare domain-level retry metadata:
{
"operation": "hotel_booking.search",
"retry_class": "transient_connector_error",
"max_attempts": 3,
"requires_idempotency_key": true
}
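Domain-level retry metadata can be translated into runtime retry settings by a small pure function. The sketch below returns keyword arguments intended for a step decorator. The names `retries_allowed` and `max_attempts` follow the DBOS Python step options as understood at the time of writing and should be verified against the DBOS version in use. `RETRYABLE_CLASSES` and the function name are hypothetical.

```python
from typing import Any

# Hypothetical set of retry classes that are safe to retry automatically.
RETRYABLE_CLASSES = frozenset({"transient_connector_error"})


def step_retry_settings(retry_meta: dict[str, Any]) -> dict[str, Any]:
    """Translate Concord retry metadata into keyword arguments for a step."""
    if retry_meta.get("retry_class") not in RETRYABLE_CLASSES:
        # Unknown or non-transient classes are never retried automatically.
        return {"retries_allowed": False}
    return {
        "retries_allowed": True,
        "max_attempts": int(retry_meta.get("max_attempts", 1)),
    }
```

Keeping this mapping pure means the retry policy stays in Concord's declarative metadata while the retry loop itself stays in the runtime.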
77.3 · Remove custom lease and claim primitives
Drop claimed_by, lease_until, worker tick, manual claim_next_task. Where domain-level ownership matters (an approval assigned to an approver, an agent role assigned to a runtime adapter), model it semantically. Runtime ownership belongs to DBOS.
77.4 · Remove custom schedule runner
Use DBOS schedules. Concord can still declare schedule specs (cron + queue + command_type) and let DBOS execute them.
77.5 · Remove generic effect outbox
The earlier functional design proposed a generic effect_outbox. With DBOS, this is replaced by DBOS workflows, steps, and queues.
DBOS queue/workflow/step = executable runtime mechanism. Concord domain_effect = semantic record that an action was requested or performed. Don't use the latter as an execution engine.
Section 78 What Concord keeps
Concord remains responsible for the conceptual structure of work.
Core primitives
DBOS executes them; DBOS does not define their meaning.
78.1 · Command model
Every consequential action still becomes a command. A command is the durable representation of user, system, connector, or agent intent.
78.2 · Policy, artifact, memory models stay
DBOS does not know whether a hotel booking requires approval, whether a connector write is safe, whether memory needs consent, or whether a subagent may access a connector. Concord owns those decisions.
78.3 · Agent & swarm ontology stays
DBOS can run agent workflows durably, but Concord owns: AgentRun, SwarmRun, AgentInvocation, AgentStep, JoinStrategy, ToolScope, ConnectorScope, MemoryScope, SpawnPolicy.
The agent runtime is an adapter. DBOS is the durable executor. Concord is the governance and domain model.
Section 79 Mapping Concord to DBOS
| Concord concept | DBOS implementation |
|---|---|
| Command submission | DBOS workflow start |
| Idempotency key | DBOS workflow ID |
| Sync primitive | Normal function (or DBOS step if side-effectful) |
| Async primitive | DBOS background or queued workflow |
| Queue | DBOS queue |
| Retry | DBOS step/workflow retry settings |
| Schedule | DBOS schedule |
| Durable timer | DBOS durable sleep |
| External side effect | DBOS step |
| Postgres write | DBOS datasource transaction |
| Agent run | DBOS workflow |
| Subagent spawn | DBOS child/queued workflow + Concord AgentInvocation |
| Swarm | DBOS parent workflow coordinating child workflows |
| Approval wait | DBOS workflow waits on durable approval state/event |
| Connector call | DBOS step calling connector adapter |
| Artifact write | DBOS transaction step |
| Memory write | DBOS transaction step after policy |
| Domain audit | DBOS transaction step writing audit table |
Section 80 New layered architecture
80.1 · Package layout
concord/
  core/
    types.py
    commands.py
    context.py
    classify.py
    policy.py
    planning.py
    transitions.py
    reducers.py
    validation.py
  domain/
    approvals.py
    memory.py
    artifacts.py
    connectors.py
    agents.py
    swarms.py
    audit.py
  dbos_runtime/
    workflows.py
    steps.py
    queues.py
    schedules.py
    datasource.py
    approval_waits.py
    swarm_workflows.py
  adapters/
    connectors/
      hotel.py
      github.py
      slack.py
      databricks.py
    agents/
      base.py
      langgraph_adapter.py
      custom_agent_adapter.py
      openai_agents_adapter.py
  postgres/
    schema.sql
    repositories.py
    projections.py
80.2 · Layer ownership
core/ · pure Python · no DBOS · no DB · no LLMs
domain/ · data contracts · repository interfaces
dbos_runtime/ · only layer importing DBOS
adapters/ · connectors · agent runtimes
postgres/ · schema · repos · projections
Diagram edges: core/ → dbos_runtime/ · domain/ → dbos_runtime/ · dbos_runtime/ → adapters/ · dbos_runtime/ → postgres/
This layering keeps Concord portable: DBOS-specific code does not leak into the semantic layer.
Section 81 Functional core standard
81.1 · Core types
All pure Concord functions return values of this shape:
from dataclasses import dataclass, field
from typing import Any

@dataclass(frozen=True)
class CoreEvent:
    event_type: str
    payload: dict[str, Any]

@dataclass(frozen=True)
class CoreEffect:
    effect_type: str
    payload: dict[str, Any]
    idempotency_key: str | None = None

@dataclass(frozen=True)
class CoreResult:
    status: str
    events: list[CoreEvent] = field(default_factory=list)
    effects: list[CoreEffect] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)
CoreEffect is a semantic description, not an execution queue. DBOS interprets it through workflows and steps.
Core function output (status · events · effects)
→ DBOS step interprets
→ Postgres writes · Connector calls · Queue enqueues
81.2 · Example · functional policy
def evaluate_booking_policy(command: dict, context: dict, state: dict) -> CoreResult:
    payload = command["payload"]
    required = [
        "booking_draft_id",
        "hotel_name",
        "check_in_date",
        "check_out_date",
        "total_price",
        "currency",
        "cancellation_policy_summary",
    ]
    missing = [f for f in required if not payload.get(f)]
    if missing:
        return CoreResult(
            status="waiting_for_input",
            events=[CoreEvent("policy_evaluated", {"decision": "require_more_input", "missing": missing})],
            effects=[CoreEffect(
                "user.request_input",
                {"missing_fields": missing},
                idempotency_key=f"input:{command['command_id']}",
            )],
        )
    if not payload.get("user_explicitly_approved"):
        return CoreResult(
            status="waiting_for_approval",
            events=[CoreEvent("policy_evaluated", {"decision": "require_approval"})],
            effects=[CoreEffect(
                "approval.request",
                {
                    "approval_type": "hotel_booking",
                    "command_id": command["command_id"],
                    "approver": context["user_id"],
                    "approval_packet": payload,
                },
                idempotency_key=f"approval:{command['command_id']}",
            )],
        )
    return CoreResult(
        status="allowed",
        events=[CoreEvent("policy_evaluated", {"decision": "allow"})],
    )
This function does not write Postgres, call DBOS, call hotel APIs, send notifications, mutate objects, read time, or generate random IDs. It only describes intent.
Section 82 DBOS runtime standard
82.1 · Workflow as durable shell
from dbos import DBOS

@DBOS.workflow()
def run_command(command_id: str) -> dict:
    command = load_command_tx(command_id)
    policy_result = evaluate_policy_step(command_id)
    persist_core_events_step(command_id, policy_result["events"])
    if policy_result["status"] == "waiting_for_input":
        request_input_step(command_id, policy_result["effects"])
        return {"status": "waiting_for_input"}
    if policy_result["status"] == "waiting_for_approval":
        request_approval_step(command_id, policy_result["effects"])
        # Assumes the wait returns only once the approval is granted;
        # a rejected or expired approval should end the workflow here.
        wait_for_approval_workflow(command_id)
    plan = create_plan_step(command_id)
    result = execute_plan_workflow(command_id, plan)
    finalize_command_step(command_id, result)
    return result
The workflow is intentionally thin: its body branches on inputs and step results and calls steps, while the business logic lives in the pure core.
82.2 · Step as side-effect boundary
Every non-deterministic operation goes inside a DBOS step or DBOS datasource transaction:
82.3 · Datasource transaction
Use DBOS datasource transactions for Postgres writes that must not re-execute after workflow replay: insert command, insert approval, insert artifact, insert memory, insert audit event, insert agent step, insert connector invocation, update domain projection.
Section 83 DBOS queues standard
Concord defines semantic queue names; DBOS manages the mechanics (concurrency, partitioning, rate limiting).
Recommended queues
Queue choice is a planning outcome
def choose_queue(effect: CoreEffect, context: dict) -> str:
    if effect.effect_type.startswith("connector."):
        return "connector_calls"
    if effect.effect_type.startswith("agent.run"):
        return "agent_runs"
    if effect.effect_type.startswith("agent.spawn"):
        return "swarm_children"
    if effect.effect_type.startswith("notification."):
        return "notifications"
    return "concord_default"
Section 84 DBOS schedules standard
Concord does not build a scheduler. Schedule specs are domain configuration; DBOS owns execution and backfill.
schedules:
  - name: expire_pending_approvals
    command_type: expire_approvals
    cron: "*/5 * * * *"
    queue: scheduled_maintenance
  - name: sync_connector_metadata
    command_type: sync_connector_metadata
    cron: "0 * * * *"
    queue: scheduled_maintenance
  - name: memory_decay_review
    command_type: review_stale_memory
    cron: "0 3 * * *"
    queue: scheduled_maintenance
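When a schedule fires, the runtime only needs a command to submit. The sketch below builds that command from one schedule spec and the scheduled tick time, deriving the idempotency key from the tick so a re-delivered tick does not create a second command. The function name, the `"schedule"` ingress label, and the key format are assumptions.

```python
from datetime import datetime, timezone
from typing import Any


def command_from_schedule(spec: dict[str, Any], scheduled_time: datetime) -> dict[str, Any]:
    """Build a command dict for one firing of a schedule spec."""
    # Normalize to UTC so the same tick always renders the same key.
    tick = scheduled_time.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return {
        "command_type": spec["command_type"],
        "requested_by": f"schedule:{spec['name']}",
        "ingress": "schedule",  # hypothetical ingress label
        "payload": {"scheduled_time": tick},
        "queue": spec["queue"],
        "idempotency_key": f"schedule:{spec['name']}:{tick}",
    }
```

Pass a timezone-aware `scheduled_time`; a naive datetime would be interpreted as local time by `astimezone`.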
Section 85 DBOS and agentic workflows
85.1 · AgentRun as DBOS workflow
Concord AgentRun row
→ DBOS workflow: run_agent(agent_run_id)
→ DBOS steps call agent runtime
→ agent proposes commands / tool calls
→ each proposed action re-enters Concord
Agents do not directly execute external side effects.
85.2 · Agent steps as domain records
Agent steps are written to Concord tables for product-level observability and replay:
These are domain trace records, not DBOS runtime logs.
85.3 · Subagent spawn as DBOS child/queued workflow
Command: spawn_subagent
→ Policy: check role / tool / connector / memory / depth limits
→ DBOS workflow creates AgentInvocation
→ DBOS enqueues child AgentRun workflow
Do not let the agent runtime spawn unmanaged processes or coroutines.
85.4 · Swarm as parent DBOS workflow
@DBOS.workflow()
def run_swarm(swarm_run_id: str) -> dict:
    swarm = load_swarm_tx(swarm_run_id)
    child_specs = plan_swarm_children_step(swarm_run_id)
    # Enqueue every child first so they can run concurrently,
    # then wait for each handle in turn.
    handles = [enqueue_agent_run_step(child) for child in child_specs]
    results = [h.get_result() for h in handles]
    joined = join_swarm_results_step(swarm_run_id, results)
    persist_swarm_result_step(swarm_run_id, joined)
    return joined
Section 86 Approval workflows on DBOS
Human approvals are domain state plus workflow waiting.
The workflow must not rely only on in-memory callbacks. Approval state must be durable in Postgres.
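One way to wait without in-memory callbacks is to poll the durable approval state between durable sleeps. The sketch below shows that loop with both dependencies injected, so it runs without any runtime. In a DBOS workflow, `load_status` would be a step or transaction reading the approval row and `durable_sleep` would be the runtime's durable sleep. The status strings, parameter names, and defaults are assumptions.

```python
from typing import Callable

PENDING = "pending"  # hypothetical approval status value


def wait_for_approval(
    load_status: Callable[[], str],
    durable_sleep: Callable[[float], None],
    poll_seconds: float = 60.0,
    max_polls: int = 1440,
) -> str:
    """Poll durable approval state until it leaves 'pending' or polls run out."""
    for _ in range(max_polls):
        status = load_status()
        if status != PENDING:
            return status
        durable_sleep(poll_seconds)
    # Budget exhausted without a decision: report the approval as expired.
    return "expired"
```

Because the approval row in Postgres is the source of truth, a workflow that restarts mid-wait simply reads the row again and continues.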
Section 87 Connector execution on DBOS
87.1 · Connector calls are DBOS steps
Concord records the semantic invocation; DBOS executes the step durably.
87.2 · Connector idempotency
connector: hotel_booking
operation: book_hotel
side_effect: true
idempotency_required: true
idempotency_key_template: "book_hotel:{booking_draft_id}"
approval_required: true
DBOS workflow IDs and step boundaries help prevent re-execution, but external APIs still need domain idempotency keys when they support them. DBOS alone does not make third-party side effects idempotent.
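The `idempotency_key_template` above can be rendered from the command payload with standard string formatting. This sketch refuses to build a key when a referenced field is missing, since a key with a blank component could collide across bookings. The function name is hypothetical.

```python
import string
from typing import Any


def render_idempotency_key(template: str, payload: dict[str, Any]) -> str:
    """Fill a template such as 'book_hotel:{booking_draft_id}' from the payload."""
    # Collect the field names the template refers to.
    fields = [name for _, name, _, _ in string.Formatter().parse(template) if name]
    missing = [name for name in fields if payload.get(name) in (None, "")]
    if missing:
        raise ValueError(f"Cannot build idempotency key; missing fields: {missing}")
    return template.format(**{name: payload[name] for name in fields})
```

The rendered key would be stored in `connector_invocations.idempotency_key` and passed to the external API when that API supports idempotency keys.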
87.3 · Connector invocations table
CREATE TABLE IF NOT EXISTS connector_invocations (
connector_invocation_id UUID PRIMARY KEY,
command_id UUID NOT NULL,
connector_name TEXT NOT NULL,
operation TEXT NOT NULL,
side_effect BOOLEAN NOT NULL DEFAULT false,
idempotency_key TEXT NULL,
status TEXT NOT NULL,
request_payload JSONB NOT NULL DEFAULT '{}',
response_payload JSONB NULL,
error TEXT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
completed_at TIMESTAMPTZ NULL
);
This is a domain / audit record — not a queue.
Section 88 Postgres schema changes
88.1 · Keep these Concord domain tables
88.2 · Remove these runtime tables
Unless a table has product / domain meaning, DBOS should own its runtime equivalent.
88.3 · Updated commands table
CREATE TABLE IF NOT EXISTS commands (
command_id UUID PRIMARY KEY,
command_type TEXT NOT NULL,
requested_by TEXT NOT NULL,
ingress TEXT NOT NULL,
payload JSONB NOT NULL DEFAULT '{}',
context JSONB NOT NULL DEFAULT '{}',
status TEXT NOT NULL,
idempotency_key TEXT NULL UNIQUE,
dbos_workflow_id TEXT NULL UNIQUE,
result JSONB NULL,
error TEXT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
completed_at TIMESTAMPTZ NULL
);
dbos_workflow_id links the Concord command to the DBOS execution.
88.4 · command_events
CREATE TABLE IF NOT EXISTS command_events (
command_event_id UUID PRIMARY KEY,
command_id UUID NOT NULL REFERENCES commands(command_id),
event_type TEXT NOT NULL,
event_payload JSONB NOT NULL DEFAULT '{}',
actor TEXT NOT NULL,
trace_id TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Domain events; does not replace DBOS runtime history.
88.5 · domain_effects (optional)
Only add if product visibility into planned effects is useful.
CREATE TABLE IF NOT EXISTS domain_effects (
domain_effect_id UUID PRIMARY KEY,
command_id UUID NOT NULL REFERENCES commands(command_id),
effect_type TEXT NOT NULL,
effect_payload JSONB NOT NULL DEFAULT '{}',
idempotency_key TEXT NULL,
status TEXT NOT NULL DEFAULT 'planned',
executed_by_dbos_workflow_id TEXT NULL,
result JSONB NULL,
error TEXT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
completed_at TIMESTAMPTZ NULL
);
Do not use domain_effects as a runtime queue. DBOS owns runtime.
Section 89 Concord state after DBOS
Concord state is a domain projection, not a runtime scheduler.
Execution status
Is the workflow executing, completed, errored, cancelled?
Business state
Is the booking waiting for approval, confirmed, cancelled, expired?
Examples of Concord state: command.status, approval.status, artifact.status, agent_run.status, swarm_run.status, reservation.status, memory.status.
Section 90 Deterministic workflow design rules
Because DBOS workflows must be deterministic, Concord enforces these rules.
Allowed inside workflow body
- branch on workflow input
- call DBOS steps
- call DBOS transaction steps
- call DBOS sleep
- enqueue DBOS workflows
- wait for DBOS workflow handles
- call pure Concord functions with deterministic inputs
Forbidden directly in workflow body
- database reads/writes outside transaction steps
- HTTP / API calls
- LLM calls
- random number generation
- current local time
- non-deterministic iteration over unordered data
- uncontrolled async races
- agent runtime loops without step boundaries
- connector calls
Put forbidden operations inside DBOS steps.
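The reason for these rules is replay: after an interruption, the workflow body runs again and recorded step outputs are reused rather than recomputed. The toy simulation below illustrates the idea with a random value wrapped in a step. `ReplayLog` is a stand-in written for this example, not DBOS or its API.

```python
import random
from typing import Any, Callable


class ReplayLog:
    """Toy stand-in for a durable runtime's step log (not DBOS itself)."""

    def __init__(self) -> None:
        self.outputs: list[Any] = []
        self.cursor = 0

    def step(self, fn: Callable[[], Any]) -> Any:
        if self.cursor < len(self.outputs):
            value = self.outputs[self.cursor]  # replay: reuse recorded output
        else:
            value = fn()  # first run: execute and record
            self.outputs.append(value)
        self.cursor += 1
        return value

    def restart(self) -> None:
        """Simulate recovery: the workflow body runs again from the top."""
        self.cursor = 0


def workflow(log: ReplayLog) -> str:
    # The random call lives inside a step, so the branch below is stable.
    roll = log.step(lambda: random.randint(1, 6))
    return "high" if roll > 3 else "low"
```

If `random.randint` were called directly in `workflow`, a replay could take the other branch and call a different sequence of steps than the one recorded.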
Section 91 Functional core design rules
91.1 · Pure functions do not import DBOS
# Good
def classify_command(command: dict, context: dict) -> CoreResult:
    ...

# Avoid
from dbos import DBOS

def classify_command(...):
    DBOS.logger.info(...)
91.2 · Core functions return values only
# Good
return CoreResult(
    status="waiting_for_approval",
    events=[...],
    effects=[...],
)

# Avoid
insert_approval(...)
send_email(...)
enqueue_worker(...)
91.3 · DBOS steps interpret core results
@DBOS.step()
def interpret_core_effect(command_id: str, effect: dict) -> dict:
    ...
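The body of such an interpreter can be a plain dispatch from `effect_type` to a handler. The sketch below shows that dispatch without DBOS so it can be tested in isolation; in the runtime layer the returned function would be wrapped in a step. `make_interpreter` and the handler signature are assumptions.

```python
from typing import Any, Callable

Handler = Callable[[str, dict[str, Any]], dict[str, Any]]


def make_interpreter(handlers: dict[str, Handler]) -> Handler:
    """Build a function that routes one effect dict to its registered handler."""

    def interpret(command_id: str, effect: dict[str, Any]) -> dict[str, Any]:
        effect_type = effect["effect_type"]
        handler = handlers.get(effect_type)
        if handler is None:
            # Fail loudly: an unhandled effect must not be silently dropped.
            raise ValueError(f"No handler registered for effect type {effect_type!r}")
        return handler(command_id, effect["payload"])

    return interpret
```

Registering handlers by exact effect type keeps the core's vocabulary (`approval.request`, `user.request_input`) as the only coupling between the pure layer and the shell.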
Section 92 Minimal DBOS workflow patterns
92.1 · Submit command
from dbos import DBOS, SetWorkflowID

def submit_command(command_type: str, payload: dict, context: dict) -> str:
    command_id = create_command_tx(command_type, payload, context)
    # The workflow ID doubles as the idempotency key for this command.
    with SetWorkflowID(f"command:{command_id}"):
        DBOS.start_workflow(run_command, command_id)
    return command_id
92.2 · Run command
@DBOS.workflow()
def run_command(command_id: str) -> dict:
    command = load_command_tx(command_id)
    classification = classify_command_step(command_id)
    persist_core_events_step(command_id, classification["events"])
    policy = evaluate_policy_step(command_id)
    persist_core_events_step(command_id, policy["events"])
    if policy["status"] == "waiting_for_approval":
        create_approval_step(command_id, policy["effects"])
        notify_approver_step(command_id)
        # Assumes the wait returns only once the approval is granted;
        # a rejected or expired approval should end the workflow here.
        wait_for_approval_workflow(command_id)
    plan = create_plan_step(command_id)
    persist_core_events_step(command_id, plan["events"])
    result = execute_plan_workflow(command_id, plan)
    finalize_command_step(command_id, result)
    return result
92.3 · Execute connector
@DBOS.step()
def execute_connector_step(command_id, connector_name, operation, payload):
    record_connector_invocation_started_tx(command_id, connector_name, operation, payload)
    connector = connector_registry.get(connector_name)
    result = connector.invoke(operation, payload)
    record_connector_invocation_completed_tx(command_id, connector_name, operation, result)
    return result
92.4 · Write artifact
@DBOS.step()
def write_artifact_step(command_id, artifact_type, payload):
    artifact = create_artifact_tx(
        command_id=command_id,
        artifact_type=artifact_type,
        payload=payload,
    )
    append_command_event_tx(
        command_id=command_id,
        event_type="artifact_created",
        payload={"artifact_id": artifact["artifact_id"]},
    )
    return artifact
Section 93 Example · hotel reservation under DBOS
93.1 · Why DBOS matters here
Hotel booking has external API calls, payment-sensitive side effects, approval waits, idempotency needs, retryable connector failures, durable state requirements, notification side effects, and memory candidates. DBOS handles durable execution. Concord handles what the booking means.
Section 94 Agent swarms under DBOS
94.1 · Recommended representation
| Concord domain object | DBOS implementation |
|---|---|
| SwarmRun | run_swarm DBOS workflow |
| AgentRun | run_agent DBOS workflow |
| AgentStep | domain trace row in Postgres |
| ToolCall | Concord command executed by DBOS |
94.2 · Swarm flow
94.3 · Subagent spawning rule
Agent proposes spawn_subagent
→ Concord command
→ policy checks max depth, tools, connectors, memory scope
→ DBOS workflow creates AgentInvocation
→ DBOS enqueues child AgentRun workflow
Do not spawn unmanaged agent processes.
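The policy check in the spawning rule could look roughly like the sketch below. It assumes a child may only request a subset of its parent's tools, connectors, and memory scopes; `AgentScope`, `check_spawn`, and the violation codes are hypothetical names, not part of Concord's defined API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentScope:
    depth: int
    max_depth: int
    tools: frozenset
    connectors: frozenset
    memory_scopes: frozenset


def check_spawn(parent: AgentScope, tools: set, connectors: set, memory_scopes: set) -> list:
    """Return a list of violation codes; an empty list means the spawn is allowed."""
    violations = []
    if parent.depth + 1 > parent.max_depth:
        violations.append("max_depth_exceeded")
    # Assumption: a child never receives capabilities the parent lacks.
    if not tools <= parent.tools:
        violations.append("tools_outside_parent_scope")
    if not connectors <= parent.connectors:
        violations.append("connectors_outside_parent_scope")
    if not memory_scopes <= parent.memory_scopes:
        violations.append("memory_outside_parent_scope")
    return violations


parent = AgentScope(
    depth=1,
    max_depth=2,
    tools=frozenset({"search_hotels", "create_booking_draft"}),
    connectors=frozenset({"hotel_inventory"}),
    memory_scopes=frozenset({"user.travel_preferences"}),
)
allowed = check_spawn(parent, {"search_hotels"}, {"hotel_inventory"}, set())
denied = check_spawn(parent, {"book_hotel"}, {"hotel_booking"}, set())
```

Only when the list is empty would the DBOS workflow go on to create the AgentInvocation and enqueue the child AgentRun.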
Section 95 Approval waiting pattern
A human approval is durable state plus workflow waiting. Recommended first implementation:
Workflow creates approval row
Workflow durably sleeps / polls approval status
Approval UI updates approval row
Workflow resumes and continues
This is simple, Postgres-native, and consistent with the architecture. It can evolve into a more event-driven DBOS communication pattern later if needed.
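A sketch of the sleep-and-poll loop, written against injected callables so it runs without a database. In the real workflow, `read_status` would be a DBOS step or transaction that reads the approval row and `sleep` would be DBOS's durable sleep. The `pending` and `expired` status values, the polling interval, and the poll limit are assumptions for illustration.

```python
import time
from typing import Callable


def wait_for_approval(
    read_status: Callable[[], str],
    sleep: Callable[[float], None] = time.sleep,
    poll_seconds: float = 30.0,
    max_polls: int = 60,
) -> str:
    """Poll the approval status until it leaves 'pending' or the poll budget runs out."""
    for _ in range(max_polls):
        status = read_status()
        if status != "pending":
            return status  # e.g. "approved" or "rejected", written by the approval UI
        sleep(poll_seconds)
    return "expired"


# Usage with fakes: two pending reads, then the approver acts.
statuses = iter(["pending", "pending", "approved"])
slept = []
decision = wait_for_approval(lambda: next(statuses), sleep=slept.append, poll_seconds=5)
```

Bounding the loop keeps an unanswered approval from holding a workflow open indefinitely.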
Section 96 What to avoid
96.1 · Avoid building a second DBOS
96.2 · Avoid hiding DBOS behind too much abstraction
Engineers should still be able to see: this is a DBOS workflow, this is a DBOS step, this is a DBOS transaction, this is a DBOS queue, this is a Concord command, this is a Concord artifact.
96.3 · Avoid making agents privileged
Bad
- agent directly calls booking connector
- agent directly writes memory
- agent directly sends email
- agent directly spawns process
Good
- agent proposes command
- Concord policy evaluates
- DBOS executes through durable steps
- Postgres records domain events
Section 97 Responsibilities matrix
| Concern | Owner |
|---|---|
| Workflow durability | DBOS |
| Workflow recovery | DBOS |
| Step replay behavior | DBOS |
| Queue mechanics | DBOS |
| Queue concurrency / rate limit | DBOS |
| Scheduling | DBOS |
| Durable sleep | DBOS |
| Postgres transaction tracking | DBOS datasource |
| Command taxonomy | Concord |
| Domain state | Concord |
| Policy decisions | Concord |
| Approval semantics | Concord |
| Memory semantics | Concord |
| Artifact semantics | Concord |
| Agent / swarm ontology | Concord |
| Connector contracts | Concord |
| Domain audit | Concord |
| Third-party API implementation | Connector adapters |
| Agent reasoning loop | Agent runtime adapter |
Section 98 Implementation phases
Phase 1 · Functional core → Phase 2 · Postgres domain schema → Phase 3 · DBOS runtime shell → Phase 4 · Hotel reservation agent → Phase 5 · Swarms & subagents
Functional core
Pure modules: commands, classification, policy, planning, state transitions, effect descriptors, domain event descriptors. No DBOS imports.
Postgres domain schema
commands, command_events, approvals, artifacts, memory_records / memory_candidates, connector_invocations, agent_runs, swarm_runs, agent_invocations, agent_steps, domain_audit_log.
DBOS runtime shell
run_command, run_agent, run_swarm workflows; execute_connector_step, write_artifact_step, write_memory_step, request_approval_step, notify_step.
Hotel reservation agent
Hotel inventory / booking connectors; search, ranking, booking draft / approval / finalization commands; reservation artifact; travel preference memory.
Swarms & subagents
Swarm planning, spawn_subagent command, agent run workflow, agent step recording, join strategies, scope enforcement.
Section 99 Open design decisions
Keep these explicit:
- How approval wait/resume is implemented in DBOS.
- Whether to use a domain_effects table for product visibility.
- How much DBOS metadata to mirror into Concord domain tables.
- How agent runtime adapters report intermediate steps.
- How connector idempotency keys are generated and enforced.
- How memory consent is represented in the UI.
- How cancellation maps from Concord domain state to DBOS workflow cancellation.
- How to version command payload schemas and plan schemas.
- How to expose workflow status to users without leaking DBOS internals.
- How to manage DBOS queues across environments.
Section 100 Updated one-sentence architecture
Section 101 Updated rule of thumb
When asking … (think Concord)
- What does this task mean?
- Is it allowed?
- Does it need approval?
- Sync, agentic, or swarm?
- What memory / artifacts / audit should exist?
- What connector scopes are allowed?
When asking … (think DBOS)
- How does this run durably?
- How is it retried?
- How is it queued?
- How is it scheduled?
- How does it recover?
- How do transactions avoid re-execution?
- How do workers execute it?
Section 102 Source notes
This addendum relies on DBOS's documented runtime model:
Section 103 Executive summary · Domain Registry
Concord needs a first-class Domain Registry — the capability control plane that is the source of truth for every governed capability contract in the system.
The registry answers: what capabilities exist, which versions are active, who owns them, what they allow, what policies apply, what connectors they touch, what memory they read or write, what artifacts they produce, which agents can use them, which workflows can invoke them, which approvals are required, and which DBOS workflows execute them.
A "Skill Registry" is not a standalone module. It is one domain registry inside the larger Concord Domain Registry.
The core principle holds: capabilities can describe, propose, and guide. Only Concord commands can authorize. Only DBOS workflows and steps can execute. Only Postgres records the truth.
Section 104 Problem statement
Concord is evolving from a workflow primitive layer into a governed action contract system. The need first appears as a skill registry problem — which agent skills exist, which versions are published, which agents can use them, what tools/connectors/memory/artifacts they touch.
But skills are only one part of the capability graph. A skill often depends on many other registered things:
hotel_booking_skill
uses tool: book_hotel
creates command_type: book_hotel
requires policy_bundle: hotel_booking_policy
requires approval_type: hotel_booking_approval
produces artifact_schema: reservation_confirmation
invokes connector_operation: hotel_booking.book_hotel
may write memory_schema: user.travel_preferences
Without a unified Domain Registry these relationships scatter across YAML, code, prompts, policy files, connector adapters, and agent runtime configuration — making it impossible to answer impact questions like "if we retire this connector operation, which tools break?" or "if we change this artifact schema, which commands produce it?".
Without a unified registry, Concord cannot fully deliver its core promise: a governed contract layer for deterministic and agentic work.
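An impact question such as "if we retire this connector operation, which tools break?" is a reverse traversal of the capability graph. The sketch below illustrates this with edges loosely based on the hotel_booking_skill example. The edge direction (skill to tool to command type to connector operation), the `type:name` node labels, and the function name `impacted_by` are assumptions made for illustration.

```python
from collections import defaultdict, deque

# (source, relationship_type, target): the source depends on the target.
EDGES = [
    ("skill:hotel_booking_skill", "uses", "tool:book_hotel"),
    ("tool:book_hotel", "creates", "command_type:book_hotel"),
    ("command_type:book_hotel", "invokes", "connector_operation:hotel_booking.book_hotel"),
    ("command_type:book_hotel", "produces", "artifact_schema:reservation_confirmation"),
]


def impacted_by(target: str, edges: list) -> set:
    """Return every node that directly or transitively depends on `target`."""
    dependents = defaultdict(set)
    for source, _relationship, dest in edges:
        dependents[dest].add(source)

    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for source in dependents[node]:
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return seen


broken = impacted_by("connector_operation:hotel_booking.book_hotel", EDGES)
```

With relationships stored in one place, the same traversal answers the artifact-schema question by starting from `artifact_schema:reservation_confirmation`.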
Section 105 Product decision
105.1 · New core module
Concord adds a first-class module concord.registry (or concord.domain_registry) alongside the existing core, runtime, agents, connectors, and Postgres modules.
concord/
  core/
    commands/ policies/ plans/ effects/ events/ state/
  runtime/
    dbos/ workflows/ steps/ queues/ schedules/
  registry/ ← new module
    kernel/
    skills/ tools/ connectors/ command_types/ policies/
    artifacts/ memory/ agents/ swarms/ approvals/
    evaluations/ workflows/
  agents/
    runtime_adapters/ tool_gateway/ swarms/ subagents/
  connectors/
    base/ adapters/
  postgres/
    repositories/ migrations/ projections/
  api/
    admin/ runtime/ webhooks/
105.2 · Keep registries in Concord initially
The semantics of skills, tools, connectors, memory, artifacts, policy, approvals, agents, swarms, and command types are core to Concord's contract layer. A separate framework too early would risk becoming a generic catalog that doesn't understand Concord primitives.
105.3 · Future extraction path
The generic mechanics may later become a reusable library concord-registry covering versioned records, lifecycle state machine, artifact references, immutable version checks, generic binding resolution, compatibility constraints, audit event helpers, deprecation, retirement, rollback. Concord retains the domain semantics — what a skill, tool, connector operation, command type, approval type, memory schema, or agent role means.
Section 106 Design philosophy
106.1 · The Domain Registry is the capability graph
The registry models Concord's capability graph and answers impact questions across it.
106.2 · Registries are semantic, not runtime execution
The registry defines what capabilities mean. DBOS defines how workflows execute durably. For example, the registry entry for book_hotel declares that the command is high-risk, requires approval, produces reservation_confirmation, invokes hotel_booking.book_hotel, and uses the idempotency key book_hotel:{booking_draft_id}. DBOS, as the runtime, executes the durable workflow with retries, recovery, and transaction tracking.
106.3 · Published versions are immutable
Every registry object that can influence behavior must be versioned. Published versions are immutable. Allowed: create / publish / deprecate / retire a version, change a binding to point to a new version, rollback a binding. Not allowed: mutate a published manifest in place, expand connector permissions in place, remove an approval requirement in place, change a schema in place.
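One way to enforce immutability is to checksum a canonical form of the manifest and refuse in-place edits once a version has left draft. This is a sketch under assumptions: the canonicalization (sorted-key JSON), the choice of SHA-256, and the rule that only `draft` versions are editable are illustrative, and `update_manifest` is a hypothetical helper.

```python
import hashlib
import json


class ImmutableVersionError(Exception):
    """Raised when code tries to change a manifest that is no longer a draft."""


def manifest_checksum(manifest: dict) -> str:
    # Sorted keys and fixed separators give the same bytes for equal manifests.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def update_manifest(version: dict, new_manifest: dict) -> dict:
    """Return an updated copy of a draft version; reject any other status."""
    if version["status"] != "draft":
        raise ImmutableVersionError(
            f"version {version['version']} is {version['status']}; create a new version instead"
        )
    return {**version, "manifest": new_manifest, "checksum": manifest_checksum(new_manifest)}


draft = {"version": "1.4.2", "status": "draft", "manifest": {}, "checksum": manifest_checksum({})}
edited = update_manifest(draft, {"risk_level": "high", "name": "hotel_booking"})
published = {**edited, "status": "published"}
```

Storing the checksum alongside the manifest also lets a resolver detect a row that was modified outside the registry API.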
106.4 · Specific bindings beat global bindings
Resolution applies scope-aware precedence — more specific bindings override less specific ones.
user → team → tenant → workflow_type → command_type
→ agent_role → agent_name → app → environment → global
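The precedence order can be applied as a simple ranking over the bindings that match the current context. The sketch below assumes that context keys are named after the scopes, that a `global` binding matches every context, and that only `active` bindings participate; `resolve_binding` is a hypothetical name.

```python
from typing import Optional

# Most specific first, as listed above.
PRECEDENCE = [
    "user", "team", "tenant", "workflow_type", "command_type",
    "agent_role", "agent_name", "app", "environment", "global",
]


def resolve_binding(bindings: list, context: dict) -> Optional[dict]:
    """Pick the matching active binding with the most specific scope."""
    candidates = []
    for binding in bindings:
        if binding["status"] != "active":
            continue
        scope = binding["binding_scope"]
        matches = scope == "global" or (
            scope in context and context[scope] == binding["binding_target"]
        )
        if matches:
            candidates.append(binding)
    if not candidates:
        return None
    return min(candidates, key=lambda b: PRECEDENCE.index(b["binding_scope"]))


bindings = [
    {"binding_scope": "global", "binding_target": "*", "status": "active", "version": "1.3.0"},
    {"binding_scope": "tenant", "binding_target": "acme", "status": "active", "version": "1.4.2"},
    {"binding_scope": "user", "binding_target": "u_1", "status": "disabled", "version": "2.0.0"},
]
chosen = resolve_binding(bindings, {"user": "u_1", "tenant": "acme"})
```

Here the tenant binding wins over the global one, and the disabled user binding is ignored even though it is more specific.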
106.5 · Runtime manifests are minimized
Agent runtimes, workflows, and tool gateways receive only the capabilities they are allowed to use. Draft versions, retired versions, other tenants' bindings, admin metadata, raw secrets, unbound skills, unallowed connectors, unallowed memory scopes — none of these leak into runtime manifests.
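Minimization can be sketched as an allow-list copy: rather than deleting sensitive fields from a full record, the builder copies only the fields a runtime needs. The record shape, the `tenant_id` and `admin_notes` fields, and the rule that only `published` versions are included are assumptions for illustration.

```python
def build_runtime_manifest(candidates: list, tenant_id: str, allowed_tools: set) -> list:
    """Return minimized skill entries for one tenant and one set of allowed tools."""
    manifest = []
    for skill in candidates:
        if skill["status"] != "published":
            continue  # drafts and retired versions never reach a runtime
        if skill["tenant_id"] != tenant_id:
            continue  # other tenants' bindings stay invisible
        # Copy an explicit allow-list of fields; everything else is dropped.
        manifest.append({
            "name": skill["name"],
            "version": skill["version"],
            "tools": [tool for tool in skill["tools"] if tool in allowed_tools],
        })
    return manifest


candidates = [
    {"name": "hotel_booking", "version": "1.4.2", "status": "published", "tenant_id": "acme",
     "tools": ["create_booking_draft", "book_hotel"], "admin_notes": "internal"},
    {"name": "hotel_booking", "version": "1.5.0", "status": "draft", "tenant_id": "acme",
     "tools": ["book_hotel"], "admin_notes": "wip"},
    {"name": "flight_booking", "version": "0.9.0", "status": "published", "tenant_id": "other",
     "tools": ["book_flight"], "admin_notes": ""},
]
runtime_manifest = build_runtime_manifest(candidates, "acme", {"create_booking_draft"})
```

An allow-list fails closed: a field added to the registry record later stays out of runtime manifests until someone explicitly includes it.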
Section 107 Registry types
Twelve registry types, each a typed view over the registry kernel.
Skill
Reusable governed capability package: instructions, allowed tools / connectors / memory scopes / artifact scopes, evals, approval requirements, agent/runtime compatibility.
Tool
Executable internal capability: input/output schema, sync/async mode, side-effect classification, DBOS execution mode, idempotency, policy + artifact + command + connector mappings.
Connector
External system + operations: credential scopes, read/write scopes, rate limits, idempotency support, compensation behavior, approval requirements, adapter compatibility.
Command type
Canonical contract for a governed action: payload schema, default policy bundle, execution mode, approval behavior, artifacts, memory behavior, connector operations, DBOS workflow mapping, idempotency, risk level. Likely the most important registry.
Policy
Named policy checks and bundles: policy input contract, decision outputs, risk classifications, applicability rules, bundles, external policy-engine references, versioning. The registry decides where and how policy applies; it does not have to implement every engine.
Artifact schema
Contract for durable outputs: schema, visibility rules, lineage requirements, retention policy, external-sharing policy, version compatibility, rendering hints.
Memory schema
Contract for durable preferences and reusable facts: subject type, visibility, consent requirements, confidence requirements, expiration, supersession rules, retrieval rules, write policy.
Agent role
Capability boundary for an agent: allowed skills / tools / connectors / subagent roles, max steps, max depth, memory + artifact scope, approval behavior, runtime compatibility.
Swarm template
Governed multi-agent execution pattern: coordinator role, child roles, join strategy, spawn limits, parallelism, required artifacts and evaluations, approval gates, memory + connector scope inheritance.
Approval type
Contract for human authorization: required fields, approver resolution, expiration, risk level, UI schema, audit requirements, resume behavior, allowed decision values.
Evaluation suite
Contract for quality and safety checks: eval input/output schema, blocking vs advisory, applicability, version, runtime adapter.
Workflow type
Contract for a repeatable business / application workflow: default command sequence, allowed command types, default policies, default agent roles, default skills, approval gates, artifact expectations, DBOS workflow mapping.
107.1 · Skill manifest example
name: hotel_booking
version: 1.4.2
display_name: Hotel Booking Skill
runtime:
  min_concord_version: "0.3.0"
  compatible_agent_runtimes: [concord_agent_runtime, langgraph_adapter]
lifecycle:
  owner: travel-platform
  risk_level: high
capabilities:
  - type: connector_operation
    connector: hotel_booking
    operation: book_hotel
    side_effect: true
    requires_approval: true
  - type: artifact_write
    artifact_type: reservation_confirmation
  - type: memory_write
    memory_type: user.travel_preferences
    requires_consent: true
policy:
  required_checks: [permission, connector_scope, payment_token_required,
                    cancellation_policy_disclosed, approval_required_for_booking]
tools:
  - name: book_hotel
    command_type: book_hotel
    mode: approval_gated_async
evals:
  required: [booking_terms_present, cancellation_policy_present, approval_packet_complete]
107.2 · Command type manifest example
name: book_hotel
version: 1.0.0
risk_level: high
execution:
  mode: approval_gated_async
  dbos_workflow: run_command
payload_schema:
  type: object
  required: [booking_draft_id, payment_token_ref]
policies:
  required: [permission, connector_scope, payment_token_required, approval_required_for_booking]
approval:
  required: true
  approval_type: hotel_booking_approval
artifacts:
  produces: [reservation_confirmation]
connector_operations: [hotel_booking.book_hotel]
idempotency:
  required: true
  key_template: "book_hotel:{booking_draft_id}"
compensation:
  supported: true
  command_type: cancel_reservation
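The `key_template` above can be turned into a concrete idempotency key from the command payload. The manifest does not say which template language is intended; the sketch below assumes Python `str.format` placeholders, and `render_idempotency_key` is a hypothetical helper.

```python
def render_idempotency_key(key_template: str, payload: dict) -> str:
    """Fill the template from the payload; fail loudly if a referenced field is missing."""
    try:
        return key_template.format(**payload)
    except KeyError as missing:
        raise ValueError(
            f"payload is missing field {missing} required by the key template"
        ) from None


key = render_idempotency_key(
    "book_hotel:{booking_draft_id}",
    {"booking_draft_id": "draft_42", "payment_token_ref": "tok_ref_1"},
)
```

Because the key depends only on `booking_draft_id`, retrying the same draft yields the same key, which is what lets a connector or the command layer reject a duplicate booking.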
107.3 · Approval type manifest example
name: hotel_booking_approval
version: 1.0.0
risk_level: high
required_fields:
  - hotel_name
  - check_in_date
  - check_out_date
  - guests
  - total_price
  - currency
  - cancellation_policy_summary
  - payment_method_summary
decision_values: [approved, rejected, request_changes]
expiration:
  default_minutes: 30
ui:
  template: hotel_booking_approval_card
resume:
  on_approved: continue_workflow
  on_rejected: fail_or_replan
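An approval type like this can be checked mechanically before an approval packet is shown to a human. The sketch below copies `required_fields` and `decision_values` from the manifest; the helper names and the rule that empty strings count as missing are assumptions.

```python
REQUIRED_FIELDS = [
    "hotel_name", "check_in_date", "check_out_date", "guests",
    "total_price", "currency", "cancellation_policy_summary", "payment_method_summary",
]
DECISION_VALUES = {"approved", "rejected", "request_changes"}


def missing_fields(packet: dict) -> list:
    """List required fields that are absent or empty in the approval packet."""
    return [name for name in REQUIRED_FIELDS if packet.get(name) in (None, "")]


def validate_decision(decision: str) -> str:
    """Accept only the decision values the approval type declares."""
    if decision not in DECISION_VALUES:
        raise ValueError(f"unsupported decision: {decision!r}")
    return decision


packet = {
    "hotel_name": "Example Hotel",
    "check_in_date": "2025-06-01",
    "check_out_date": "2025-06-03",
    "guests": 2,
    "total_price": "420.00",
    "currency": "EUR",
    "cancellation_policy_summary": "",
}
gaps = missing_fields(packet)
```

An incomplete packet can then be rejected before the approval row is created, instead of surfacing a half-filled card to the approver.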
Section 108 Registry kernel
The Domain Registry is built on a shared kernel that provides generic mechanics. Each typed registry is a thin view over the kernel.
108.1 · Core kernel objects
| Object | Carries |
|---|---|
| RegistryEntity | entity_id, entity_type, name, display_name, description, owner, domain, status |
| RegistryVersion | version_id, entity_id, version (major.minor.patch), status, manifest, checksum, created_by, approved_by, lifecycle timestamps |
| RegistryArtifact | artifact_id, version_id, artifact_type, artifact_uri, checksum, metadata |
| RegistryBinding | binding_id, version_id, binding_scope, binding_target, environment, status, rollout_strategy, rollout_config |
| RegistryLifecycleEvent | lifecycle_event_id, entity_id, version_id, event_type, actor, payload, trace_id |
| RegistryRelationship | source_version_id, relationship_type, target_entity_type, target_entity_name, target_version_constraint, metadata |
108.2 · Relationship types
Section 109 Data model
Hybrid schema — generic kernel tables, with optional typed domain tables (skill_capabilities, tool_contracts, connector_operations, etc.) when query patterns warrant. This avoids per-type lifecycle duplication while still allowing domain-specific validation.
109.1 · registry_entities
CREATE TABLE IF NOT EXISTS registry_entities (
    entity_id    UUID PRIMARY KEY,
    entity_type  TEXT NOT NULL,  -- 'skill', 'tool', 'connector', 'command_type',
                                 -- 'policy', 'artifact_schema', 'memory_schema',
                                 -- 'agent_role', 'swarm_template', 'approval_type',
                                 -- 'evaluation_suite', 'workflow_type'
    name         TEXT NOT NULL,
    display_name TEXT NOT NULL,
    description  TEXT NULL,
    owner        TEXT NOT NULL,
    domain       TEXT NULL,
    status       TEXT NOT NULL DEFAULT 'active',
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(entity_type, name)
);
109.2 · registry_versions
CREATE TABLE IF NOT EXISTS registry_versions (
    version_id    UUID PRIMARY KEY,
    entity_id     UUID NOT NULL REFERENCES registry_entities(entity_id),
    version       TEXT NOT NULL,
    version_major INT NOT NULL,
    version_minor INT NOT NULL,
    version_patch INT NOT NULL,
    status        TEXT NOT NULL DEFAULT 'draft',  -- draft | validated | pending_approval |
                                                  -- approved | published | deprecated |
                                                  -- retired | rejected
    risk_level    TEXT NOT NULL DEFAULT 'low',
    manifest      JSONB NOT NULL DEFAULT '{}',
    checksum      TEXT NOT NULL,
    created_by    TEXT NOT NULL,
    approved_by   TEXT NULL,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    validated_at  TIMESTAMPTZ NULL,
    approved_at   TIMESTAMPTZ NULL,
    published_at  TIMESTAMPTZ NULL,
    deprecated_at TIMESTAMPTZ NULL,
    retired_at    TIMESTAMPTZ NULL,
    UNIQUE(entity_id, version)
);
109.3 · registry_artifacts · registry_bindings · registry_lifecycle_events
CREATE TABLE IF NOT EXISTS registry_artifacts (
    registry_artifact_id UUID PRIMARY KEY,
    version_id    UUID NOT NULL REFERENCES registry_versions(version_id),
    artifact_type TEXT NOT NULL,  -- manifest_yaml | manifest_json | skill_md |
                                  -- tool_schema | connector_schema | policy_bundle |
                                  -- prompt_template | eval_suite | example_bundle |
                                  -- package_archive
    artifact_uri  TEXT NOT NULL,
    checksum      TEXT NOT NULL,
    metadata      JSONB NOT NULL DEFAULT '{}',
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS registry_bindings (
    registry_binding_id UUID PRIMARY KEY,
    version_id       UUID NOT NULL REFERENCES registry_versions(version_id),
    binding_scope    TEXT NOT NULL,  -- global | environment | app | tenant | team |
                                     -- user | agent_role | agent_name | workflow_type |
                                     -- command_type | connector
    binding_target   TEXT NOT NULL,
    environment      TEXT NOT NULL DEFAULT 'prod',
    status           TEXT NOT NULL DEFAULT 'active',
    rollout_strategy TEXT NOT NULL DEFAULT 'pinned',
    rollout_config   JSONB NOT NULL DEFAULT '{}',
    created_by       TEXT NOT NULL,
    created_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at       TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS registry_lifecycle_events (
    registry_lifecycle_event_id UUID PRIMARY KEY,
    entity_id  UUID NOT NULL REFERENCES registry_entities(entity_id),
    version_id UUID NULL REFERENCES registry_versions(version_id),
    event_type TEXT NOT NULL,
    actor      TEXT NOT NULL,
    payload    JSONB NOT NULL DEFAULT '{}',
    trace_id   TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
109.4 · registry_relationships · registry_evaluations · registry_usage
CREATE TABLE IF NOT EXISTS registry_relationships (
    registry_relationship_id UUID PRIMARY KEY,
    source_version_id         UUID NOT NULL REFERENCES registry_versions(version_id),
    relationship_type         TEXT NOT NULL,
    target_entity_type        TEXT NOT NULL,
    target_entity_name        TEXT NOT NULL,
    target_version_constraint TEXT NULL,
    required                  BOOLEAN NOT NULL DEFAULT true,
    metadata                  JSONB NOT NULL DEFAULT '{}',
    created_at                TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS registry_evaluations (
    registry_evaluation_id UUID PRIMARY KEY,
    version_id      UUID NOT NULL REFERENCES registry_versions(version_id),
    evaluation_type TEXT NOT NULL,
    status          TEXT NOT NULL,
    result          JSONB NOT NULL DEFAULT '{}',
    error           TEXT NULL,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    completed_at    TIMESTAMPTZ NULL
);

CREATE TABLE IF NOT EXISTS registry_usage (
    registry_usage_id UUID PRIMARY KEY,
    version_id      UUID NOT NULL REFERENCES registry_versions(version_id),
    agent_run_id    UUID NULL,
    swarm_run_id    UUID NULL,
    command_id      UUID NULL,
    workflow_run_id UUID NULL,
    usage_type      TEXT NOT NULL,  -- resolved_for_agent_run | tool_enabled |
                                    -- command_proposed | connector_operation_enabled |
                                    -- memory_read_enabled | memory_write_proposed |
                                    -- artifact_created | subagent_spawn_enabled |
                                    -- policy_applied | approval_type_used
    payload         JSONB NOT NULL DEFAULT '{}',
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);
Section 110 Versioning
All behavior-affecting registry objects use semantic versioning (MAJOR.MINOR.PATCH).
PATCH · Doc / cosmetic
Documentation improvements, examples added, prompt wording changes that preserve behavior, bug fixes that preserve contract.
MINOR · Additive
New optional capability, new optional field, new optional eval, backward-compatible schema expansion.
MAJOR · Breaking
Changed input or output contract, removed capability, expanded connector scope, changed memory scope, changed approval behavior, changed artifact schema incompatibly, changed side-effect semantics.
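Under semantic versioning, doc or cosmetic changes map to a PATCH bump, additive changes to MINOR, and breaking changes to MAJOR. The sketch below derives the next version from a set of change kinds; the change-kind labels are hypothetical identifiers chosen to mirror the categories above, not a defined Concord vocabulary.

```python
BREAKING = {
    "changed_contract", "removed_capability", "expanded_connector_scope",
    "changed_memory_scope", "changed_approval_behavior",
    "incompatible_artifact_schema", "changed_side_effect_semantics",
}
ADDITIVE = {"new_optional_capability", "new_optional_field", "new_optional_eval"}


def required_bump(change_kinds: set) -> str:
    """The most severe change in the set decides the bump."""
    if change_kinds & BREAKING:
        return "major"
    if change_kinds & ADDITIVE:
        return "minor"
    return "patch"


def next_version(current: str, bump: str) -> str:
    major, minor, patch = (int(part) for part in current.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


bumped = next_version("1.4.2", required_bump({"new_optional_field", "expanded_connector_scope"}))
```

Treating an expanded connector scope as breaking means a permission increase can never ride along in a minor release.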
Section 111 Lifecycle
111.1 · Standard lifecycle
create registry version
→ validate manifest
→ validate relationships
→ validate compatibility
→ run evals
→ classify risk
→ request approval if needed
→ publish
→ bind to scopes
→ monitor usage
→ deprecate
→ retire
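The lifecycle can be enforced with an explicit transition table over the version statuses used in the data model (draft, validated, pending_approval, approved, published, deprecated, retired, rejected). The specific transitions below are an assumption made for illustration, including the direct published-to-retired edge for emergencies; `transition` is a hypothetical helper.

```python
TRANSITIONS = {
    "draft": {"validated", "rejected"},
    "validated": {"pending_approval", "approved"},  # low-risk versions may skip the wait
    "pending_approval": {"approved", "rejected"},
    "approved": {"published"},
    "published": {"deprecated", "retired"},  # direct retirement covers emergencies
    "deprecated": {"retired"},
    "retired": set(),
    "rejected": set(),
}


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the move is not in the table."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


status = "draft"
for step in ["validated", "pending_approval", "approved", "published", "deprecated", "retired"]:
    status = transition(status, step)
```

Keeping the table as data makes it straightforward to validate every lifecycle write against it and to record each accepted move as a lifecycle event.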
111.2 · State machine
111.3 · High-risk lifecycle
High-risk objects require explicit approval before publication. High-risk changes include external write added · memory write added · payment operation added · database mutation added · subagent spawning added · connector scope expanded · approval requirement removed · artifact schema changed incompatibly · policy weakened.
111.4 · Emergency retirement
Triggers: security vulnerability · unsafe connector behavior · approval bypass · prompt-injection vulnerability · data leakage · memory corruption · payment-related defect.
Emergency retirement blocks new resolution, disables or migrates active bindings, notifies owners, writes lifecycle events, and creates an incident audit.
Section 112 Bindings & resolution
112.1 · Binding strategies
Prefer pinned bindings. Avoid production bindings to an unbounded latest: silently updating a behavior-changing manifest is a class of incident no team wants.
112.2 · Resolution inputs
Resolution accepts entity_type, scope, target, environment, agent_role, agent_name, user_id, team_id, tenant_id, app_id, workflow_type, command_type, and context. It applies the precedence order from §106.4.
112.3 · Resolution output
Examples: runtime skill manifest · allowed tool list · connector operation contract · command type contract · policy bundle · approval packet schema · artifact schema · memory access contract · agent role manifest · swarm template manifest.
112.4 · Runtime manifest for an AgentRun
{
  "agent_run_id": "agent_run_123",
  "agent_role": "hotel_booking_agent",
  "skills": [
    {
      "name": "hotel_booking",
      "version": "1.4.2",
      "skill_version_id": "uuid",
      "instructions_ref": "artifact://...",
      "tools": [
        { "name": "create_booking_draft", "command_type": "create_booking_draft", "mode": "async" },
        { "name": "book_hotel", "command_type": "book_hotel", "mode": "approval_gated_async" }
      ],
      "memory_scope": {
        "read": ["user.travel_preferences"],
        "write": ["user.travel_preferences"]
      },
      "artifact_scope": {
        "write": ["booking_draft", "reservation_confirmation"]
      }
    }
  ]
}
Section 113 DBOS lifecycle workflows
Registry lifecycle runs as DBOS workflows — durable, recoverable, idempotent.
publish_registry_version
load version → validate manifest → validate relationships → validate compatibility → run eval suite → classify risk → request approval if needed → publish → write lifecycle events → notify owner.
bind_registry_version
load version → check status is published → validate binding scope → check compatibility → write binding → write lifecycle event → invalidate resolution cache.
rollback_registry_binding
load current binding → load target prior version → validate target is usable → update binding → write lifecycle event → notify affected owners.
retire_registry_version
load version → find active bindings → disable or migrate bindings → mark retired → write lifecycle event → notify owners.
resolve_agent_runtime_manifest
load active bindings → apply precedence → filter by policy → validate compatibility → resolve relationships → build minimized manifest → record usage.
Section 114 Policy & governance
The Domain Registry evaluates:
- Can this agent use this skill?
- Can this skill expose this tool?
- Can this tool create this command type?
- Can this command type invoke this connector operation?
- Can this command type produce this artifact?
- Can this skill read / write this memory?
- Can this agent spawn this subagent?
- Can this workflow use this swarm template?
- Does this capability require approval?
- Is this version deprecated or retired?
- Is this version allowed in this environment?
Registry mutations themselves go through the Concord command contract: create_registry_entity, create_registry_version, validate_registry_version, publish_registry_version, bind_registry_version, deprecate_registry_version, retire_registry_version, rollback_registry_binding. Policy applies, audit records, approval gates trigger.
Section 115 API surface
def create_registry_entity(
    entity_type: str, name: str, display_name: str,
    owner: str, description: str | None = None, domain: str | None = None,
) -> str: ...

def create_registry_version(
    entity_type: str, name: str, version: str,
    manifest: dict, artifact_refs: list[dict], created_by: str,
) -> str: ...

def validate_registry_version(version_id: str, actor: str) -> dict: ...

def publish_registry_version(version_id: str, actor: str) -> dict: ...

def bind_registry_version(
    version_id: str, binding_scope: str, binding_target: str,
    environment: str, actor: str,
    rollout_strategy: str = "pinned", rollout_config: dict | None = None,
) -> str: ...

def resolve_agent_runtime_manifest(
    agent_run_id: str, agent_name: str, agent_role: str,
    context: dict, environment: str,
) -> dict: ...

def resolve_command_type_contract(
    command_type: str, context: dict, environment: str,
) -> dict: ...

def can_use_capability(
    source_version_id: str, relationship_type: str,
    target_entity_type: str, target_entity_name: str, context: dict,
) -> bool: ...

def rollback_registry_binding(
    binding_id: str, target_version_id: str, actor: str, reason: str,
) -> dict: ...
Section 116 Admin UX
The admin UI should support listing entities by type, viewing and comparing versions, viewing manifests / relationships / declared capabilities / risk level / eval results, publishing, approving high-risk versions, binding to scopes, viewing active bindings, rolling back, deprecating, retiring, viewing lifecycle timelines, viewing usage, performing impact analysis, and previewing runtime manifests.
116.1 · Critical views
Section 117 Observability
117.1 · Tracked behaviors
Versions resolved per entity type, versions used per command type, skills resolved per agent role, tools enabled per skill, connector operations invoked per tool, policies applied per command type, artifacts produced per command type, memory writes by skill, deprecated version usage, retired version resolution attempts, rollback frequency.
117.2 · Metrics
Section 118 Implementation plan
Phase 1 · Registry kernel → Phase 2 · Core domain registries → Phase 3 · Relationship graph → Phase 4 · Evals & rollout → Phase 5 · Swarm-aware registry → Phase 6 · Optional extraction
Registry kernel
Kernel tables, basic lifecycle, publish, bind, resolve, rollback. Entities supported: skill · tool · connector · command_type.
Core domain registries
Add: policy · artifact schema · memory schema · agent role · approval type. Add relationship validation, runtime manifest generation, capability filtering, risk classification, high-risk approval workflow.
Relationship graph
Add: registry_relationships, impact analysis, dependency resolution, version compatibility checks, graph visualization.
Evals & rollout
Add: registry_evaluations, eval suites, canary rollout, compatibility checks, deprecation warnings, usage tracking.
Swarm-aware registry
Add: swarm template + evaluation suite + workflow type registries, subagent skill constraints, parent-child scope inheritance, join-strategy compatibility.
Optional extraction
If the kernel stabilizes, extract concord-registry as a reusable library. Keep Concord-specific domain semantics in Concord.
Section 119 Acceptance criteria
119.1 · Functional · v1
- Create registry entity.
- Create registry version.
- Validate manifest.
- Publish version.
- Bind version to scope.
- Resolve version by scope.
- Resolve skills / tools / connectors / command types for an AgentRun.
- Prevent retired version from resolving.
- Write lifecycle events.
- Rollback binding.
119.2 · Technical · v1
- All registry writes are audited.
- Published versions are immutable.
- Runtime manifests exclude disallowed capabilities.
- DBOS lifecycle workflows run idempotently.
- High-risk registry changes are blocked without approval.
- Resolution is deterministic.
- Postgres is the source of truth.
Section 120 Open questions
- How generic should the registry kernel be in v1?
- Should typed domain tables exist immediately, or are manifests + relationships sufficient at first?
- How strict should manifest validation be at first?
- Should SKILL.md and other large artifacts live in Git, object storage, or Postgres?
- Should high-risk approval be per version or per binding?
- How should runtime manifest caching work?
- How should registry impact analysis be visualized?
- Should deprecated versions resolve in production?
- How should active DBOS workflows behave if a registry version is retired mid-run?
- Should command type contracts be required before any command can run?
Section 121 Final recommendation
Build a new core Concord module concord.registry — the Concord Domain Registry, not only a Skills Registry.
121.1 · The twelve registries
121.2 · On a shared kernel
121.3 · The architecture
The Domain Registry is Concord's capability graph. Skills are one node type in that graph.