Research Project · Case Study

Exploring reliability for multi-agent AI

A research project investigating how to validate agent handoffs, replay from failure checkpoints, and prevent race conditions in concurrent agent pipelines — built as a working proof of concept.

[Dashboard preview — agentsentinelai.com/dashboard: 3 active runs · 94% success rate · 1 open incident · 284 ms avg latency. Recent pipelines: my_pipeline (success, 2 min ago), lead_qualifier (blocked, 12 min ago), content_drafter (success, 28 min ago), data_enricher (running, just now).]

Research Findings

What existing tools can't do.

Monitoring detects. Alerting notifies. Tracing records. Circuit breakers isolate. None of it validates what one agent hands to the next, recovers from a specific failure point, or prevents concurrent agents from overwriting shared state. This project explores what a purpose-built reliability layer would look like.

🔗
Observability

Tracing

Every agent step recorded — status, latency, inputs, outputs. See the full pipeline in one view. Equivalent to structured logging, but built for agent handoffs.

Logging does this too, but unstructured and without the pipeline view.

📋
Enforcement · Can't do with logging

Contract + Replay

Define the schema each agent expects. Sentinel validates every handoff — bad output is blocked before the next agent runs. An incident is created, a checkpoint saved. Fix the output, replay from there.

Logging records the failure after the fact. Sentinel prevents it and gives you a recovery path.

🔒
Coordination · Can't do with logging

Shared State

Safe concurrent writes for parallel agents. If two agents write to the same key simultaneously, Sentinel retries and merges — no silent overwrites, no data loss.

Logging can't prevent a race condition, only record that it happened.
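A compare-and-set loop with merge-on-conflict is one way such writes can work. This standalone sketch (the `SharedState` class and its merge rule are illustrative, not Sentinel's actual internals) shows the idea: each write re-reads the current value, merges its updates in, and retries if another agent got there first.

```python
import threading

class SharedState:
    """Illustrative versioned key-value store: writes retry and merge on conflict."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}      # key -> dict value, so merges are well-defined
        self._versions = {}  # key -> monotonically increasing version

    def read(self, key):
        with self._lock:
            return self._data.get(key, {}), self._versions.get(key, 0)

    def _try_write(self, key, value, expected_version):
        with self._lock:
            if self._versions.get(key, 0) != expected_version:
                return False  # another agent wrote first; caller must retry
            self._data[key] = value
            self._versions[key] = expected_version + 1
            return True

    def write_merged(self, key, updates):
        """Retry until our updates land, merged with whatever is already there."""
        while True:
            current, version = self.read(key)
            merged = {**current, **updates}
            if self._try_write(key, merged, version):
                return merged

state = SharedState()

def agent(updates):
    state.write_merged("lead", updates)

# Two agents write to the same key concurrently; neither write is lost.
t1 = threading.Thread(target=agent, args=({"email": "a@example.com"},))
t2 = threading.Thread(target=agent, args=({"score": 87},))
t1.start(); t2.start(); t1.join(); t2.join()
print(state.read("lead")[0])  # both fields survive: email and score are present
```

The version check is what turns a silent last-writer-wins overwrite into a detected conflict that can be merged instead.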

How it works

Under the hood

Sentinel instruments at the Python function level — context managers, class-level monkey-patching, and inline validation. No daemons, no sidecars, no config files.

01

sentinel.workflow() opens a run

Tracing

Assigns a run_id, POSTs to the ingestion API. The context manager auto-closes the run as success, blocked, or failed when the block exits — no try/finally needed.
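The auto-close behaviour can be sketched with `contextlib` (the `Run` class, statuses, and exception type here are hypothetical stand-ins, not Sentinel's real implementation):

```python
import uuid
from contextlib import contextmanager

class ContractViolationError(Exception):
    """Hypothetical: raised when a handoff fails validation."""

class Run:
    def __init__(self, name):
        self.run_id = str(uuid.uuid4())
        self.name = name
        self.status = "running"

@contextmanager
def workflow(name):
    run = Run(name)  # the real system would also POST to the ingestion API here
    try:
        yield run
        run.status = "success"        # block exited cleanly
    except ContractViolationError:
        run.status = "blocked"        # a handoff was rejected
        raise
    except Exception:
        run.status = "failed"         # any other error
        raise

with workflow("travel-planner") as run:
    pass
print(run.status)  # success
```

Because the context manager owns the exit path, the caller never writes try/finally: the run's final status falls out of how the block terminated.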

02

run.step() wraps each agent

Tracing

Records step name, type (llm_call / tool_call), input, output, and wall-clock latency. Steps nest under the run automatically — no manual wiring.

03

patch_openai_async() hooks at the class level

Auto-instrument

Patches AsyncCompletions.create on the OpenAI SDK class itself — so every AsyncOpenAI client created anywhere, including inside third-party libraries, becomes a traced step under your active run.
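Class-level patching can be sketched as follows, using a stand-in `Completions` class rather than the real OpenAI SDK so the snippet is self-contained; the tracing store is a plain list standing in for "steps under the active run":

```python
import functools
import time

class Completions:
    """Stand-in for the SDK class whose method gets patched."""
    def create(self, prompt):
        return f"response to: {prompt}"

TRACE = []  # stand-in for steps recorded under the active run

def patch_completions():
    original = Completions.create

    @functools.wraps(original)
    def traced(self, prompt):
        start = time.perf_counter()
        result = original(self, prompt)
        TRACE.append({"step": "llm_call", "input": prompt,
                      "latency_ms": (time.perf_counter() - start) * 1000})
        return result

    Completions.create = traced  # patch the class, not an instance

patch_completions()
client = Completions()   # instances created after patching are traced too,
client.create("plan a trip")  # including ones built inside other libraries
print(TRACE[0]["step"])  # llm_call
```

Patching the class attribute is the key move: every instance resolves `create` through the class, so third-party code that constructs its own client still goes through the traced wrapper.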

04

handoff() validates at the boundary

Contracts

Before the next agent runs, Sentinel checks the payload against the registered contract. Wrong type, missing field, or out-of-range value raises ContractViolationError — the downstream agent never executes.
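One way such boundary validation could look, with the contract registry and `handoff` function as illustrative sketches rather than Sentinel's actual API surface:

```python
class ContractViolationError(Exception):
    pass

CONTRACTS = {}  # downstream agent name -> {field: (expected type, optional check)}

def register_contract(agent, schema):
    CONTRACTS[agent] = schema

def handoff(agent, payload):
    """Validate a payload against the downstream agent's contract before it runs."""
    for field, (ftype, check) in CONTRACTS[agent].items():
        if field not in payload:
            raise ContractViolationError(f"missing field: {field}")
        if not isinstance(payload[field], ftype):
            raise ContractViolationError(f"{field}: expected {ftype.__name__}")
        if check is not None and not check(payload[field]):
            raise ContractViolationError(f"{field}: out of range")
    return payload

register_contract("researcher", {
    "query": (str, None),
    "max_results": (int, lambda n: 1 <= n <= 100),
})

handoff("researcher", {"query": "flights to Lisbon", "max_results": 10})  # passes
try:
    handoff("researcher", {"query": "flights", "max_results": 500})       # blocked
except ContractViolationError as e:
    print(e)  # max_results: out of range
```

The point of raising before the call, rather than logging after it, is that the downstream agent never consumes the malformed payload at all.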

05

Checkpoint saved — replay when ready

Replay

A checkpoint is saved at every handoff. Fix the payload, call replay() — Sentinel fast-forwards past the checkpoint, reusing outputs from all prior steps.
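The fast-forward behaviour can be modelled as a step cache keyed by step name; this is a hypothetical model of the mechanism, not the real code:

```python
CALLS = []  # which steps actually executed, for illustration

def checkpointed_step(name, fn, checkpoint):
    """Run fn only if the checkpoint holds no output for this step."""
    if name in checkpoint:
        return checkpoint[name]   # fast-forward: reuse the saved output
    result = fn()
    CALLS.append(name)            # record only work that completed
    checkpoint[name] = result     # checkpoint saved at the handoff boundary
    return result

def run_pipeline(checkpoint, researcher_fn):
    plan = checkpointed_step("planner", lambda: "3-day itinerary", checkpoint)
    return checkpointed_step("researcher", lambda: researcher_fn(plan), checkpoint)

checkpoint = {}
try:
    run_pipeline(checkpoint, lambda plan: 1 / 0)   # researcher step fails
except ZeroDivisionError:
    pass

# Fix the failing step, then replay: planner's output comes from the checkpoint.
result = run_pipeline(checkpoint, lambda plan: f"hotels for {plan}")
print(result)  # hotels for 3-day itinerary
print(CALLS)   # ['planner', 'researcher']; planner executed only once
```

Because every completed step's output lives in the checkpoint, a replay only re-executes work downstream of the fix.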

import sentinel

sentinel.init(api_key="sk_live_...")

with sentinel.workflow("travel-planner") as run:
    with run.step("planner", step_type="llm_call") as step:
        step.set_input({"query": query})
        result = planner_agent(query)
        step.set_output(result)

    with run.step("researcher", step_type="tool_call") as step:
        step.set_input(result)
        data = researcher_agent(result)
        step.set_output({"findings": data})

Live Prototype

Explore the working prototype

The concepts in this research are implemented as a working proof of concept — every step traced, every failure caught, full replay capability. Open to explore.

Integrations

Works with your stack

Sentinel wraps your existing framework — no rewrites, no lock-in. If it makes LLM calls or coordinates agents, it works.

OpenAI · LLM
Anthropic · LLM
Gemini · LLM
Llama · LLM
LangChain · Framework
LlamaIndex · Framework
AutoGen · Framework
CrewAI · Framework
LangGraph · Framework
Any HTTP API · Custom

Don't see your stack? Sentinel instruments at the Python function level — if you can call it, Sentinel can watch it.
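Function-level instrumentation for an arbitrary callable can be sketched as a plain decorator; this is a generic illustration, not Sentinel's API, and the `STEPS` list stands in for the trace backend:

```python
import functools
import time

STEPS = []  # stand-in for the trace backend

def watch(step_type="tool_call"):
    """Wrap any callable so each invocation is recorded as a traced step."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                status = "success"
                return result
            except Exception:
                status = "failed"
                raise
            finally:
                STEPS.append({"name": fn.__name__, "type": step_type,
                              "status": status,
                              "latency_ms": (time.perf_counter() - start) * 1000})
        return wrapper
    return decorator

@watch()
def fetch_weather(city):
    return {"city": city, "temp_c": 21}

fetch_weather("Lisbon")
print(STEPS[0]["name"])  # fetch_weather
```

Because the wrapper only needs a callable, the same pattern covers framework internals, custom tools, and plain HTTP helpers alike.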

Collaborate

Interested in this research?

Reach out to discuss the ideas, explore a collaboration, or ask questions about the prototype.