Exploring reliability for multi-agent AI
A research project investigating how to validate agent handoffs, replay from failure checkpoints, and prevent race conditions in concurrent agent pipelines — built as a working proof of concept.
Research Findings
What existing tools can't do.
Monitoring detects. Alerting notifies. Tracing records. Circuit breakers cut off failing calls. None of it validates what one agent hands to the next, recovers from a specific failure point, or prevents concurrent agents from overwriting shared state. This project explores what a purpose-built reliability layer would look like.
Tracing
Every agent step recorded — status, latency, inputs, outputs. See the full pipeline in one view. Equivalent to structured logging, but built for agent handoffs.
Logging does this too, but unstructured and without the pipeline view.
Contract + Replay
Define the schema each agent expects. Sentinel validates every handoff — bad output is blocked before the next agent runs. An incident is created, a checkpoint saved. Fix the output, replay from there.
Logging records the failure after the fact. Sentinel prevents it and gives you a recovery path.
Shared State
Safe concurrent writes for parallel agents. If two agents write to the same key simultaneously, Sentinel retries and merges — no silent overwrites, no data loss.
Logging can't prevent a race condition, only record that it happened.
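The retry-and-merge idea can be sketched as optimistic concurrency: each write carries the version it read, and a write against a stale version is merged rather than allowed to overwrite. This is a minimal illustrative sketch of the mechanism, not Sentinel's actual implementation; the `SharedState` class and `merge` callback are hypothetical names.

```python
import threading

class SharedState:
    """Minimal sketch of versioned shared state with retry-and-merge.
    Illustrative only -- not Sentinel's actual implementation."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (version, value)

    def read(self, key):
        with self._lock:
            return self._data.get(key, (0, None))  # (version, value)

    def write(self, key, value, expected_version, merge=None):
        """Compare-and-swap: commit only if the version is unchanged.
        On a version conflict, merge the two values instead of silently
        overwriting the other agent's write."""
        with self._lock:
            current_version, current = self._data.get(key, (0, None))
            if current_version != expected_version and merge is not None:
                value = merge(current, value)  # reconcile concurrent writes
            self._data[key] = (current_version + 1, value)
            return current_version + 1
```

Two agents that read version 0 and both write to `"results"` would, under this scheme, end up with a merged value instead of one write clobbering the other.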
How it works
Under the hood
Sentinel instruments at the Python function level — context managers, class-level monkey-patching, and inline validation. No daemons, no sidecars, no config files.
sentinel.workflow() opens a run
Tracing: Assigns a run_id, POSTs to the ingestion API. The context manager auto-closes the run as success, blocked, or failed when the block exits — no try/finally needed.
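The auto-closing behavior can be sketched with a plain `contextmanager`: the run's final status is decided by how the block exits. This is a hypothetical sketch of the shape of `sentinel.workflow()`, with the POST to the ingestion API stubbed out; `ContractViolation` stands in for Sentinel's contract error.

```python
import uuid
from contextlib import contextmanager

class ContractViolation(Exception):
    """Stand-in for Sentinel's ContractViolationError (illustrative)."""

@contextmanager
def workflow(name):
    """Sketch of a run-scoped context manager: assigns a run_id and
    records a final status with no try/finally at the call site."""
    run = {"run_id": str(uuid.uuid4()), "name": name, "status": "running"}
    # a real implementation would POST the new run to the ingestion API here
    try:
        yield run
        run["status"] = "success"   # block exited cleanly
    except ContractViolation:
        run["status"] = "blocked"   # a handoff failed validation
        raise
    except Exception:
        run["status"] = "failed"    # any other error
        raise
```

A clean exit closes the run as `success`; a contract violation closes it as `blocked`; anything else closes it as `failed`.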
run.step() wraps each agent
Tracing: Records step name, type (llm_call / tool_call), input, output, and wall-clock latency. Steps nest under the run automatically — no manual wiring.
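The per-step recording can be sketched the same way: a context manager that timestamps on entry, measures latency on exit, and appends itself to the enclosing run. The `step` function and record shape here are assumptions for illustration, not Sentinel's real API.

```python
import time
from contextlib import contextmanager

@contextmanager
def step(run, name, step_type="tool_call"):
    """Sketch of per-step recording: name, type, and wall-clock latency,
    nested under the run automatically. Illustrative only."""
    record = {"name": name, "type": step_type, "status": "running"}
    start = time.perf_counter()
    try:
        yield record
        record["status"] = "success"
    finally:
        record["latency_ms"] = (time.perf_counter() - start) * 1000
        run.setdefault("steps", []).append(record)  # auto-nest under the run
```

Because the record is appended in `finally`, latency is captured even when the step raises.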
patch_openai_async() hooks at the class level
Auto-instrument: Patches AsyncCompletions.create on the OpenAI SDK class itself — so every AsyncOpenAI client created anywhere, including inside third-party libraries, becomes a traced step under your active run.
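Class-level monkey-patching works because every instance looks its methods up on the class: wrap the method once on the class, and every client created afterward, anywhere in the process, goes through the wrapper. A minimal sketch of the technique, using a hypothetical `Client` class in place of the OpenAI SDK:

```python
import functools

class Client:
    """Stand-in for an SDK class like AsyncCompletions (illustrative)."""
    def create(self, prompt):
        return f"completion for {prompt!r}"

TRACE = []  # stand-in for the active run's step list

def patch_class(cls, method_name):
    """Sketch of class-level monkey-patching: wrap the method on the
    class itself, so every instance -- even ones created inside
    third-party libraries -- is traced."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def traced(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        TRACE.append({"method": method_name, "args": args, "output": result})
        return result

    setattr(cls, method_name, traced)

patch_class(Client, "create")
```

No instance needs to be touched: a `Client()` constructed by library code you never see still resolves `create` to the traced wrapper.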
handoff() validates at the boundary
Contracts: Before the next agent runs, Sentinel checks the payload against the registered contract. Wrong type, missing field, or out-of-range value raises ContractViolationError — the downstream agent never executes.
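Boundary validation reduces to checking the payload against a declared schema before the downstream agent ever sees it. A minimal sketch, where the contract format (field name mapped to an expected type plus an optional range predicate) is an assumption for illustration:

```python
class ContractViolationError(Exception):
    """Raised when a handoff payload breaks the registered contract."""

def handoff(payload, contract):
    """Sketch of handoff validation: check required fields, types, and
    ranges before the next agent runs. Contract format is illustrative."""
    for field, (expected_type, check) in contract.items():
        if field not in payload:
            raise ContractViolationError(f"missing field: {field}")
        value = payload[field]
        if not isinstance(value, expected_type):
            raise ContractViolationError(
                f"{field}: expected {expected_type.__name__}")
        if check is not None and not check(value):
            raise ContractViolationError(f"{field}: out of range")
    return payload  # only a valid payload reaches the next agent
```

A payload with `score=1.7` against a contract requiring a float in `[0, 1]` raises before the downstream agent executes, rather than poisoning the next step.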
Checkpoint saved — replay when ready
Replay: A checkpoint is saved at every handoff. Fix the payload, call replay() — Sentinel fast-forwards past the checkpoint, reusing outputs from all prior steps.
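Fast-forwarding can be sketched as: walk the pipeline in order, and for any step whose output is already in the checkpoint, reuse it instead of re-executing. The `run_pipeline` function and checkpoint shape here are hypothetical, for illustration only:

```python
def run_pipeline(steps, checkpoint=None):
    """Sketch of checkpoint-based replay: outputs recorded before the
    checkpoint are reused; only later steps execute. Illustrative only."""
    checkpoint = checkpoint or {}
    outputs, value = {}, None
    for name, fn in steps:
        if name in checkpoint:
            value = checkpoint[name]  # fast-forward: reuse the prior output
        else:
            value = fn(value)         # execute the step normally
        outputs[name] = value
    return outputs
```

Replaying with a checkpoint containing the (fixed) output of an early step re-runs only the steps after it — expensive upstream work is never repeated.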
Live Prototype
Explore the working prototype
The concepts in this research are implemented as a working proof of concept — every step traced, every failure caught, full replay capability. Open to explore.
Integrations
Works with your stack
Sentinel wraps your existing framework — no rewrites, no lock-in. If it makes LLM calls or coordinates agents, it works.
Don't see your stack? Sentinel instruments at the Python function level — if you can call it, Sentinel can watch it.
Collaborate
Interested in this research?
Reach out to discuss the ideas, explore a collaboration, or ask questions about the prototype.