Four Subscriptions, One Plan: Routing Coding Agents Across Anthropic, OpenAI, Google, and Z.ai

· Lee
#VERMITTLER #FORGE #DSG #MAELSTROM

I pay for four AI subscriptions, top tier on each. Claude Max. ChatGPT Pro. Google AI Ultra. GLM Coding Plan Max. The base subscriptions alone come to around $800 per month; in April I spent another $2,000 on extra Claude usage on top of that, in a month where Claude Code alone went through more than eleven billion tokens. On any given evening, all four CLIs are running against the same monorepo, working through the same plan, opening pull requests against the same main branch. They do not step on each other, they do not exhaust each other's quotas, and every dispatch produces a cryptographically signed receipt I can replay months later.

This is the workflow. It is the second piece of a story I started in the previous post about constitutional security. There I argued that safety is a property of infrastructure, not of the model. Here is what that infrastructure actually looks like when I point it at myself.

The Problem: Vendor Roulette

The naive way to use multiple AI subscriptions is to alt-tab. You ask Claude to plan something. You paste the plan into ChatGPT. Codex writes the code. You drop the diff back into Claude for review. Half your context is lost in every transition. You have no audit trail. You have no idea which model wrote which line. You discover, three weeks later, that two of them edited the same file and the second one silently undid the first.

The slightly less naive way is to write API integrations against the underlying model providers. Now you have a real pipeline, but you have also stepped outside the terms of service of every subscription you pay for. Pro plans, Max plans, coding plans — they are subscriptions to the CLI tool, not generic API quotas you can route through your own gateway. You can use the API, but then you pay API prices on top of your subscription, and you lose the prompt engineering, caching strategy, and tool definitions that the vendors have spent millions optimizing.

I wanted a third option: orchestrate the real CLIs, route prompts through them as the user, and treat the whole thing as a regulated transport layer.

That third option is VERMITTLER.

The Mediator Pattern

VERMITTLER (German for "mediator" or "broker") is a Rust port of an earlier TypeScript gateway I wrote inside the GEIST platform. Its job description fits in one line, taken verbatim from the crate manifest:

Routes CLI prompts across registered agent providers (Claude Code, Codex, Gemini, OpenCode, Cursor, Copilot).

Each provider is a separate crate with the same trait surface: vermittler-claude, vermittler-codex, vermittler-gemini, vermittler-cursor, vermittler-opencode, vermittler-copilot. They all implement ICLIAgentProvider. The gateway holds a registry, dispatches a CLIPromptInput, and returns either a CLIPromptOutput or an async stream of CLIStreamEvents. There is also vermittler-mlx for local models — Qwen3-Coder, Gemma — which lets me route cheap tasks to my own M2 instead of paying for them.
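To make the trait-plus-registry shape concrete, here is a minimal synchronous sketch. The type names come from the text above, but every field, the `DispatchError` type, the `Gateway` struct, and the `EchoProvider` stand-in are assumptions made for illustration; the real crates are async and certainly carry more than this.

```rust
use std::collections::BTreeMap;

// Hypothetical shapes: the post names these types but does not show their fields.
#[derive(Debug, Clone)]
pub struct CLIPromptInput {
    pub provider: String,
    pub prompt: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CLIPromptOutput {
    pub text: String,
}

#[derive(Debug, PartialEq)]
pub enum DispatchError {
    UnknownProvider(String),
}

// The trait name follows the post; the two methods are assumed.
pub trait ICLIAgentProvider {
    fn id(&self) -> &str;
    fn dispatch(&self, input: &CLIPromptInput) -> Result<CLIPromptOutput, DispatchError>;
}

// Hypothetical gateway: a registry keyed by provider id.
#[derive(Default)]
pub struct Gateway {
    providers: BTreeMap<String, Box<dyn ICLIAgentProvider>>,
}

impl Gateway {
    pub fn register(&mut self, provider: Box<dyn ICLIAgentProvider>) {
        self.providers.insert(provider.id().to_string(), provider);
    }

    // Look up the provider named in the input and hand the prompt to it.
    pub fn dispatch(&self, input: &CLIPromptInput) -> Result<CLIPromptOutput, DispatchError> {
        let provider = self
            .providers
            .get(&input.provider)
            .ok_or_else(|| DispatchError::UnknownProvider(input.provider.clone()))?;
        provider.dispatch(input)
    }
}

// Stand-in adapter for illustration; a real adapter would spawn the vendor CLI.
pub struct EchoProvider;

impl ICLIAgentProvider for EchoProvider {
    fn id(&self) -> &str {
        "echo"
    }

    fn dispatch(&self, input: &CLIPromptInput) -> Result<CLIPromptOutput, DispatchError> {
        Ok(CLIPromptOutput { text: format!("echo: {}", input.prompt) })
    }
}
```

Because every adapter sits behind the same trait object, adding a vendor means adding a crate and one `register` call; the dispatch path itself does not change.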

The interesting decision is that VERMITTLER does not call any provider's API. It spawns the user's subprocess. The description on vermittler-claude is explicit:

Claude Code CLI adapter. Spawns the user's claude subprocess for TOS-safe agent dispatch.

This buys three properties at once. Quota flows through the right account — my subscription, not a wallet attached to an API key. The CLI tool keeps its own caching, system prompts, and tool definitions intact. And the dispatch is terms-of-service-clean because, from the vendor's perspective, the user is just using the CLI; VERMITTLER is just a fancy wrapper around popen.
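A rough sketch of what "spawn the user's subprocess" means in practice, using only the standard library and plain pipes for brevity. The function name and parameters are hypothetical, and the arguments a real adapter passes to each CLI are not shown in this post, so they are left to the caller.

```rust
use std::path::Path;
use std::process::{Command, Stdio};

/// Builds (but does not run) the subprocess invocation for one dispatch.
/// `binary` and `args` are placeholders: each real adapter knows its own CLI.
pub fn build_cli_command(binary: &str, args: &[&str], worktree: &Path) -> Command {
    let mut cmd = Command::new(binary);
    cmd.args(args)
        // Run inside the worktree the task was assigned to.
        .current_dir(worktree)
        // Capture all three streams so the gateway can feed and read the CLI.
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped());
    // The environment is inherited untouched, so the CLI finds the user's
    // existing login and no API key is injected by the wrapper.
    cmd
}
```

Calling `.spawn()` on the returned `Command` starts the CLI as the current user; the notable part of the design is what the wrapper leaves alone, namely the credentials and the vendor's own prompt and tool setup.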

The catch is that the spawned subprocess needs to think it's running in a real terminal, or every interactive CLI in the world will refuse to render properly. So VERMITTLER hands each provider a PTY allocated by a separate crate called dsg-pty — a portable_pty wrapper that bridges blocking PTY reads to a tokio stream. isatty() returns true, fancy progress bars render, color codes survive, and the subprocess never knows it's being orchestrated.

The Plan: FORGE

VERMITTLER routes prompts. It does not decide what those prompts are. That is FORGE's job.

FORGE is a workflow plugin (it has packagings for Claude Code, Cursor, and Gemini CLI — same skills, three plugin layouts) plus a deterministic Rust core called forge-core. The flow has three primary commands: brainstorm, plan, execute. They produce, respectively, a brainstorm markdown file, a plan markdown file with structured tasks, and a directory of git worktrees where the actual work happens.

The plan format is just markdown, parsed deterministically. Here is a fragment from a real recent session in my repo:

### T1: Crate Refactor: WASM-Compatible VortexManifest
- **Status**: done
- **Category**: implementation
- **Depends on**: none
- **Files**: `crates/vortex-manifest/Cargo.toml`, ...

### T2: Maelstrom: Native Resonance Inference
- **Status**: done
- **Category**: implementation
- **Depends on**: T1
- **Files**: `crates/maelstrom/src/kernel/reiz_model.rs`, ...

## Execution Batches

| Batch | Tasks | Strategy   | Notes                          |
|-------|-------|------------|--------------------------------|
| 1     | T1    | sequential | Core Struct Migration          |
| 2     | T2,T3 | parallel   | Kernel & Worker Finalization   |
| 3     | T4,T5 | sequential | UI Polish & Final Validation   |

The markdown is what I write, but it is not what FORGE stores. The Rust core parses it into typed structs (ForgePlan, ForgeTask, ForgeBatch), validates the dependency DAG, and produces a ForgeValidationReport that downstream tools can act on. If the markdown is malformed, the plan is rejected. If the DAG has a cycle or a missing dependency, the plan is rejected. If two tasks in the same parallel batch touch the same file, the plan is also rejected.

That last one is the part that makes parallel execution safe.
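The first two rejection rules (missing dependency, cycle) can be sketched in a few lines. This is an illustration under assumptions, not forge-core's code: the `DagIssue` enum and `validate_dag` function are hypothetical names, and the plan is reduced to a map from task id to the ids it depends on.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq)]
pub enum DagIssue {
    MissingDependency { task: String, missing: String },
    // Tasks that can never become ready: on a cycle, or downstream of one.
    Cycle { involved: Vec<String> },
}

/// `deps` maps each task id to the ids it depends on.
pub fn validate_dag(deps: &BTreeMap<String, Vec<String>>) -> Vec<DagIssue> {
    let mut issues = Vec::new();

    // Pass 1: every dependency must name a task that exists in the plan.
    for (task, task_deps) in deps {
        for dep in task_deps {
            if !deps.contains_key(dep) {
                issues.push(DagIssue::MissingDependency {
                    task: task.clone(),
                    missing: dep.clone(),
                });
            }
        }
    }

    // Pass 2: Kahn-style peeling. Repeatedly mark tasks whose dependencies are
    // all done. Missing dependencies were reported above and are ignored here.
    let mut done: BTreeSet<&str> = BTreeSet::new();
    loop {
        let ready: Vec<&str> = deps
            .iter()
            .filter(|(task, _)| !done.contains(task.as_str()))
            .filter(|(_, ds)| {
                ds.iter().all(|d| done.contains(d.as_str()) || !deps.contains_key(d))
            })
            .map(|(task, _)| task.as_str())
            .collect();
        if ready.is_empty() {
            break;
        }
        done.extend(ready);
    }

    // Whatever could not be peeled is stuck behind a cycle.
    let stuck: Vec<String> = deps
        .keys()
        .filter(|t| !done.contains(t.as_str()))
        .cloned()
        .collect();
    if !stuck.is_empty() {
        issues.push(DagIssue::Cycle { involved: stuck });
    }
    issues
}
```

An empty result means the dependency graph is a DAG and every `Depends on` entry resolves; anything else is grounds to reject the plan before an agent is dispatched.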

FORGE Describes Itself

There is one more layer behind those typed structs, and it is the part I am most proud of. The markdown plan I write at the top of every session is not the canonical representation of a FORGE plan. It is a rendered view of a canonical representation that lives in a file format I designed for this purpose, called FSF — Flux Shape Format. Every FORGE data structure has its own .fsf file. Here is forge-plan.fsf in full:

shape ForgeTask {
  id: str,
  title: str,
  status: str,
  category: str,
  dependsOn: arr[0..128]<str>,
  files: arr[0..256]<str>,
  description: str
}

shape ForgeBatch {
  id: str,
  taskIds: arr[0..128]<str>,
  strategy: str,
  worktreeRef: str
}

shape ForgePlan {
  sessionId: str,
  title: str,
  tasks: arr[0..512]<ForgeTask>,
  batches: arr[0..256]<ForgeBatch>
}

FSF is a tiny schema DSL. Shapes can nest. Arrays have explicit bounds. Optional fields are written with a trailing question mark, as in str?. The parser (flux-shape-format) compiles each .fsf file into Beschau bytecode, the same codec the rest of the platform uses for content-addressed validation. The canonical value (a parsed plan, a worktree manifest, a validation report) is serialized as FSV, a binary format that requires its FSF schema to decode. Markdown is generated from the FSV value, never the other way around.

The boundaries between formats are themselves typed. From forge-schemas/src/lib.rs:

pub const FSF_SCHEMA_BOUNDARY: ForgeArtifactBoundary = ForgeArtifactBoundary {
    format: ForgeArtifactFormat::FsfSchema,
    extension: ".fsf",
    canonical: true,
    binary: false,
    generated: false,
    requires_fsf_schema: false,
};

pub const FSV_BINARY_VALUE_BOUNDARY: ForgeArtifactBoundary = ForgeArtifactBoundary {
    format: ForgeArtifactFormat::FsvBinaryValue,
    extension: ".fsv",
    canonical: true,
    binary: true,
    generated: false,
    requires_fsf_schema: true,
};

pub const MARKDOWN_RENDERED_BOUNDARY: ForgeArtifactBoundary = ForgeArtifactBoundary {
    format: ForgeArtifactFormat::MarkdownRendered,
    extension: ".md",
    canonical: false,
    binary: false,
    generated: true,
    requires_fsf_schema: false,
};

Six artifact formats, each declared as a compile-time constant with a fixed extension, fixed canonicity status, and a fixed answer to "do you need an FSF schema to interpret this?" Markdown is canonical: false, generated: true. FSV is canonical: true, requires_fsf_schema: true. JSON exists, but only for JsonCompatibility — a non-canonical boundary that lets old tooling read FORGE data. There is exactly one source of truth, and the type system knows which one it is.
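For readers who want to see how such constants might be consumed, here is a small sketch. The struct fields mirror the constants quoted above; the enum is cut down to the three formats shown, and `KNOWN_BOUNDARIES`, `boundary_for_path`, and `is_source_of_truth` are hypothetical helpers, not part of forge-schemas as far as this post shows.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ForgeArtifactFormat {
    FsfSchema,
    FsvBinaryValue,
    MarkdownRendered,
    // The post mentions six formats; only the three quoted above are sketched.
}

#[derive(Debug, Clone, Copy)]
pub struct ForgeArtifactBoundary {
    pub format: ForgeArtifactFormat,
    pub extension: &'static str,
    pub canonical: bool,
    pub binary: bool,
    pub generated: bool,
    pub requires_fsf_schema: bool,
}

// Hypothetical lookup table built from the three constants shown in the post.
pub const KNOWN_BOUNDARIES: &[ForgeArtifactBoundary] = &[
    ForgeArtifactBoundary {
        format: ForgeArtifactFormat::FsfSchema,
        extension: ".fsf",
        canonical: true,
        binary: false,
        generated: false,
        requires_fsf_schema: false,
    },
    ForgeArtifactBoundary {
        format: ForgeArtifactFormat::FsvBinaryValue,
        extension: ".fsv",
        canonical: true,
        binary: true,
        generated: false,
        requires_fsf_schema: true,
    },
    ForgeArtifactBoundary {
        format: ForgeArtifactFormat::MarkdownRendered,
        extension: ".md",
        canonical: false,
        binary: false,
        generated: true,
        requires_fsf_schema: false,
    },
];

/// Classify a path by its extension; unknown extensions yield `None`.
pub fn boundary_for_path(path: &str) -> Option<&'static ForgeArtifactBoundary> {
    KNOWN_BOUNDARIES.iter().find(|b| path.ends_with(b.extension))
}

/// True only for artifacts declared canonical.
pub fn is_source_of_truth(path: &str) -> bool {
    boundary_for_path(path).map_or(false, |b| b.canonical)
}
```

A tool that is about to overwrite a file could call `is_source_of_truth` first and refuse to treat a rendered `.md` as authoritative.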

The point of all this is simple: FORGE describes itself in the same schema system it compiles plans against. There is no "schema for the schema language" written in some other format. FSF schemas declare FORGE's data; FSF schemas are themselves parsed by a Rust crate; the Rust crate produces typed values that round-trip through FSV; and FSV is content-addressed through BLAKE3, so any change to a plan produces a new hash, which feeds the audit chain described later in this post. The whole loop is closed.

I did not start by planning to build a schema language. I started by wanting FORGE plans to be diffable in git, debuggable by eye, and impossible to corrupt silently. By the time I had those three properties, I had FSF.

Where Parallel Goes Wrong

The most expensive mistake you can make with parallel coding agents is letting two of them edit the same file. Most of the time the agents are smart enough to handle merge conflicts at the end. The problem is when they're not, and you spend two hours debugging code where each line is locally correct but the file as a whole no longer compiles, because half of it assumes one refactor and the other half assumes another.

The fix is structural: detect the conflict in the plan, before any agent runs. From forge-core/src/conflicts.rs:

pub fn validate_parallel_file_conflicts(plan: &ForgePlan) -> Vec<FileConflictIssue> {
    // ...
    for batch in &plan.batches {
        if batch.strategy != ForgeBatchStrategy::Parallel || batch.task_ids.len() < 2 {
            continue;
        }
        let mut owners_by_file: BTreeMap<String, Vec<String>> = BTreeMap::new();
        for task_id in &batch.task_ids {
            if let Some(files) = files_by_task.get(task_id.as_str()) {
                for file in files {
                    owners_by_file.entry((*file).to_string()).or_default().push(task_id.clone());
                }
            }
        }
        issues.extend(/* every file with more than one owner becomes an issue */);
    }
    issues
}

Ninety-five lines. Every task in a plan declares the files it intends to modify. For every batch marked parallel, the validator collects the per-file ownership map and emits an issue for every file owned by more than one task. If any issues exist, forge execute refuses to run. The plan must be rewritten — usually by moving the conflicting tasks to a sequential batch, or by splitting them so each agent owns a disjoint slice of the file system.
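Since the excerpt above elides its setup and its final step, here is a self-contained sketch of the same check that compiles on its own. The struct definitions are pared down to the fields the check needs, `FileConflictIssue`'s fields are assumed, and paths are compared as exact strings with no normalization, which the real validator may well handle differently.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ForgeBatchStrategy {
    Sequential,
    Parallel,
}

// Pared-down stand-ins for the real types.
pub struct ForgeTask {
    pub id: String,
    pub files: Vec<String>,
}

pub struct ForgeBatch {
    pub task_ids: Vec<String>,
    pub strategy: ForgeBatchStrategy,
}

pub struct ForgePlan {
    pub tasks: Vec<ForgeTask>,
    pub batches: Vec<ForgeBatch>,
}

// Field names are assumed; the post only shows the type name.
#[derive(Debug, PartialEq)]
pub struct FileConflictIssue {
    pub file: String,
    pub task_ids: Vec<String>,
}

pub fn validate_parallel_file_conflicts(plan: &ForgePlan) -> Vec<FileConflictIssue> {
    // Index declared files by task id once, up front.
    let files_by_task: BTreeMap<&str, &[String]> = plan
        .tasks
        .iter()
        .map(|t| (t.id.as_str(), t.files.as_slice()))
        .collect();

    let mut issues = Vec::new();
    for batch in &plan.batches {
        // Only parallel batches with at least two tasks can conflict.
        if batch.strategy != ForgeBatchStrategy::Parallel || batch.task_ids.len() < 2 {
            continue;
        }
        // A set per file, so a task listing a file twice is not a conflict.
        let mut owners_by_file: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
        for task_id in &batch.task_ids {
            if let Some(files) = files_by_task.get(task_id.as_str()) {
                for file in files.iter() {
                    owners_by_file
                        .entry(file.as_str())
                        .or_default()
                        .insert(task_id.as_str());
                }
            }
        }
        // Every file with more than one distinct owner becomes an issue.
        for (file, owners) in owners_by_file {
            if owners.len() > 1 {
                issues.push(FileConflictIssue {
                    file: file.to_string(),
                    task_ids: owners.into_iter().map(String::from).collect(),
                });
            }
        }
    }
    issues
}
```

The ordered maps keep the output deterministic: the same plan always yields the same issues in the same order, which matters if the report is later hashed or diffed.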

This is the same fail-secure pattern from the Constitutional Security post — any deny at any layer stops processing, no exceptions. The difference is the layer. The firewall in GEIST protects an AI agent from executing unauthorized tool calls. The conflict validator in FORGE protects a developer from executing parallelism that would corrupt the file system.

Once a plan is validated, FORGE creates one git worktree per batch — physical filesystem isolation — and dispatches the implementer agents through VERMITTLER. Each agent sees its own worktree, edits only the files declared in its tasks, and produces a pull request. Batches run in dependency order. Inside a batch, parallel tasks run concurrently against distinct worktrees that cannot, by construction, intersect.

Quota as a State Machine

Routing across four subscriptions is easy until you hit a rate limit mid-batch. VERMITTLER's solution is vermittler-quota, a plan-quota state machine. The relevant types:

pub const DEFAULT_PLAN_CATALOG: &[(&str, &str, &[&str])] = &[
    ("claude-max",       "Claude Max ($200)",     &["claude-opus-4-7", "claude-sonnet-4-6", "claude-haiku-4-5"]),
    ("chatgpt-plus",     "ChatGPT Plus",          &["codex-gpt-5.5", "codex-gpt-4-1"]),
    ("zai-coding-plan",  "Z.ai Coding Plan",      &["opencode-glm-5.1", "opencode-glm-4.5"]),
    ("gemini-pro",       "Gemini Pro",            &["gemini-2.5-flash", "gemini-3-pro"]),
];

pub enum QuotaDecision {
    NoPlan,
    Unlimited { plan_id: String },
    Ok        { plan_id: String, ratio: f64, remaining: u64 },
    Warn      { plan_id: String, ratio: f64, remaining: u64, threshold: f64 },
    Blocked   { plan_id: String, ratio: f64, cap: u64, used: u64 },
}

Every dispatch checks the plan-quota state machine before it goes out. Ok proceeds. Warn (default threshold: 80% of cap) emits telemetry and proceeds. Blocked is a hard refusal — the gateway returns an error before the subprocess is ever spawned. Counters auto-reset on the UTC month boundary. Soft-warn transitions emit PlanQuotaUpdate telemetry events so a dashboard can show me when I'm about to burn through Claude Max two weeks early.
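The decision step itself is small enough to sketch. The enum below repeats the one quoted above; `PlanUsage` and `decide` are hypothetical names, and what the cap actually counts (tokens, dispatches, or something else) is not stated in this post, so the sketch treats it as an opaque unit.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum QuotaDecision {
    NoPlan,
    Unlimited { plan_id: String },
    Ok { plan_id: String, ratio: f64, remaining: u64 },
    Warn { plan_id: String, ratio: f64, remaining: u64, threshold: f64 },
    Blocked { plan_id: String, ratio: f64, cap: u64, used: u64 },
}

/// Hypothetical per-plan counter. `cap: None` models an uncapped plan.
pub struct PlanUsage {
    pub plan_id: String,
    pub cap: Option<u64>,
    pub used: u64,
}

/// Pure decision function: no I/O, no mutation, safe to call before every dispatch.
pub fn decide(usage: Option<&PlanUsage>, warn_threshold: f64) -> QuotaDecision {
    let Some(usage) = usage else {
        return QuotaDecision::NoPlan;
    };
    let plan_id = usage.plan_id.clone();
    let Some(cap) = usage.cap else {
        return QuotaDecision::Unlimited { plan_id };
    };
    // Guard the division: a zero cap counts as fully used.
    let ratio = if cap == 0 { 1.0 } else { usage.used as f64 / cap as f64 };
    if usage.used >= cap {
        return QuotaDecision::Blocked { plan_id, ratio, cap, used: usage.used };
    }
    let remaining = cap - usage.used;
    if ratio >= warn_threshold {
        QuotaDecision::Warn { plan_id, ratio, remaining, threshold: warn_threshold }
    } else {
        QuotaDecision::Ok { plan_id, ratio, remaining }
    }
}
```

Keeping the decision pure means the gateway can evaluate it before spawning anything; only `Blocked` short-circuits the dispatch, while `Warn` is an observation that travels alongside a dispatch that still goes out.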

The state machine is concurrent (DashMap + atomic threshold) because four CLIs hitting the gateway simultaneously is normal traffic. It is in-memory because persistence happens in a separate module (vermittler-quota::persist) that snapshots to disk and pushes telemetry to a separate billing endpoint. The split exists so a broken persistence layer cannot break dispatch — same fail-secure discipline.

Signed Receipts for Every Dispatch

The audit layer is where the constitutional substrate from the previous post shows up directly. Every dispatch through VERMITTLER produces a SignedAuditRecord, written through MaelstromCasAuditSink. From the audit module docstring:

[MaelstromCasAuditSink] is the production sink: signs each record with Ed25519 (key supplied at construction time), serializes the [SignedAuditRecord] as canonical JSON, writes the bytes to a maelstrom [MaelstromFs] (which content-addresses by BLAKE3), and walks an in-memory chain head so subsequent records carry prev_record_hash_hex.

The hash inputs are deliberate: input_hash_hex is the BLAKE3 of the canonical JSON of the inbound CLIPromptInput, and output_hash_hex is the same for the outbound CLIPromptOutput. Each record carries the hash of its predecessor, forming a chain. If anyone — including me — modifies a historical record, the chain breaks and every subsequent record fails to verify.

This is the same primitive set as MAELSTROM's transport layer: Ed25519, BLAKE3, content-addressable storage. Same crates, different surface. The audit module also handles failure correctly: if the CAS write fails, the in-memory chain head is not updated, so the next record retries against the same predecessor instead of forking the chain.
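The chaining and the write-failure rule can be illustrated without any cryptography crates. Important caveats: this sketch substitutes the standard library's `DefaultHasher` for BLAKE3 (it is not cryptographic), omits the Ed25519 signature entirely, and uses a made-up `|`-joined encoding instead of canonical JSON. `AuditRecord`, `Chain`, and `verify` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in digest, NOT cryptographic. The real sink uses BLAKE3.
fn digest_hex(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

#[derive(Debug, Clone)]
pub struct AuditRecord {
    pub input_hash_hex: String,
    pub output_hash_hex: String,
    pub prev_record_hash_hex: Option<String>,
}

impl AuditRecord {
    // Made-up encoding for the sketch; the post uses canonical JSON.
    fn canonical_bytes(&self) -> Vec<u8> {
        format!(
            "{}|{}|{}",
            self.input_hash_hex,
            self.output_hash_hex,
            self.prev_record_hash_hex.as_deref().unwrap_or("")
        )
        .into_bytes()
    }

    pub fn hash_hex(&self) -> String {
        digest_hex(&self.canonical_bytes())
    }
}

/// `write` stands in for the content-addressed store.
pub struct Chain<W: FnMut(&AuditRecord) -> Result<(), String>> {
    head: Option<String>,
    write: W,
}

impl<W: FnMut(&AuditRecord) -> Result<(), String>> Chain<W> {
    pub fn new(write: W) -> Self {
        Self { head: None, write }
    }

    pub fn append(&mut self, input: &[u8], output: &[u8]) -> Result<AuditRecord, String> {
        let record = AuditRecord {
            input_hash_hex: digest_hex(input),
            output_hash_hex: digest_hex(output),
            prev_record_hash_hex: self.head.clone(),
        };
        // If the write fails we return here and the head stays where it was,
        // so a retry links to the same predecessor instead of forking.
        (self.write)(&record)?;
        self.head = Some(record.hash_hex());
        Ok(record)
    }
}

/// Walks the records in order and checks every back-link.
pub fn verify(records: &[AuditRecord]) -> bool {
    let mut expected: Option<String> = None;
    for record in records {
        if record.prev_record_hash_hex != expected {
            return false;
        }
        expected = Some(record.hash_hex());
    }
    true
}
```

Note what the back-links alone do not catch: tampering with the newest record, which has no successor yet. In the system described above that gap is covered by the per-record signature, which this sketch leaves out.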

The practical effect is that I can, months later, ask: which agent edited this file? Which model wrote that function? What was the exact prompt? Was the dispatch within quota at the time? Did I authorize this capability? The answers exist, signed, and replayable.

Routing Decisions

The actual routing logic is small — almost embarrassingly so. FORGE's config file looks like this:

{
  "LINEAR_API_KEY": "${tresor:forge/linear-api-key}",
  "NOTION_TOKEN":   "${tresor:forge/notion-token}",
  "FORGE_MAX_BATCH_SIZE":   4,
  "FORGE_DEFAULT_MODEL":    "gemini-3-flash-preview",
  "FORGE_AUTO_RESEARCH":    false
}

FORGE_MAX_BATCH_SIZE: 4 is the parallelism ceiling. FORGE_DEFAULT_MODEL is the fallback. Routing happens per-stage, declared in each agent's .flux file. A real example from the security auditor:

# model:            anthropic/claude-opus-4-7
# fallback_model:   codex/gpt-5.5
# fallback_model_2: google/gemini-3.1-pro-preview
# rationale: cross-family security review. Opus 4.7 primary for highest-
# stakes work — secret leaks and capability misdeclarations have permanent
# consequences. Codex/GPT-5.5 (ChatGPT Pro) FB1 catches what same-family
# review misses; cross-training-source pattern matching is the value-add
# for security-specific work. Gemini-3.1-pro-preview FB2 third perspective.

The pattern repeats per agent. Brainstorm-heavy work goes to Opus or Gemini 3.1 Pro for divergent thinking; planning goes to Gemini Flash because plans are dense, structured, and cheap to regenerate; bulk implementation goes to GLM-5.1 (Z.ai) for parallel-batch tasks where I want volume; mechanical edits drop to local Qwen3-Coder on my M2 to avoid cloud spend entirely. Reviewers run on a different vendor family than the implementer — the most useful catches in practice come from a model that did not write the code, trained on a different corpus, looking at the same diff. Konzil (council vote) is reserved for plans that touch more than three packages or rewrite a public API; six models vote, ballots are Ed25519-signed, the audit chain is attached to the plan.
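A sketch of how a per-stage fallback chain like the one above might be resolved. Everything here is an assumption for illustration: `StageRouting` and `pick_model` are hypothetical, the availability check is passed in as a closure (in practice it could consult the quota state machine), and falling through to the configured default when the whole chain is unavailable is my reading of "FORGE_DEFAULT_MODEL is the fallback".

```rust
/// Hypothetical parsed form of the `model` / `fallback_model*` header lines.
pub struct StageRouting<'a> {
    pub model: &'a str,
    pub fallbacks: &'a [&'a str],
}

/// Returns the first model in the chain that is currently available,
/// or `default_model` if none is.
pub fn pick_model<'a>(
    routing: &StageRouting<'a>,
    default_model: &'a str,
    is_available: impl Fn(&str) -> bool,
) -> &'a str {
    std::iter::once(routing.model)
        .chain(routing.fallbacks.iter().copied())
        .find(|m| is_available(*m))
        .unwrap_or(default_model)
}
```

With the security auditor's chain, a blocked Anthropic plan would resolve to `codex/gpt-5.5`, and only a fully exhausted chain would land on the default model.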

The MODEL_CATALOG entries record cost-per-million-tokens for each provider — except where they don't:

'codex/gpt-5.5': {
  name: 'GPT-5.5 (via codex CLI)',
  cost: cost(0, 0, 0, 0), // covered by ChatGPT Pro subscription
  ...
},

That zero is the proof that the CLI-spawn-not-API approach actually works at the accounting layer. Costs are recorded as zero in the catalog because the billing happens through the subscription, not out of my pocket per token.

The secrets in FORGE's config reference ${tresor:forge/...} — TRESOR is the secrets store, resolved at runtime. None of the keys land on disk.
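The `${tresor:...}` placeholder syntax is simple enough to sketch. The two functions below are hypothetical, and the `lookup` closure stands in for whatever interface TRESOR really exposes, which this post does not describe.

```rust
/// Returns the secret path if `value` is exactly one `${tresor:<path>}` reference.
pub fn parse_tresor_ref(value: &str) -> Option<&str> {
    value
        .strip_prefix("${tresor:")?
        .strip_suffix('}')
        .filter(|path| !path.is_empty())
}

/// Resolves a config value at runtime. Plain values pass through unchanged;
/// references are looked up and fail loudly if the secret is missing.
pub fn resolve(value: &str, lookup: impl Fn(&str) -> Option<String>) -> Result<String, String> {
    match parse_tresor_ref(value) {
        None => Ok(value.to_string()),
        Some(path) => lookup(path).ok_or_else(|| format!("secret not found: {path}")),
    }
}
```

Because the config file only ever contains the reference, it can be committed and diffed like any other file; the secret value exists only in the resolving process's memory.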

The Meta-Point

The architectural story is that vendor diversity is a constitutional question, not a cost-optimization one. If your workflow depends on a single AI provider, you are one terms-of-service change, one outage, one rate-limit drop away from being unable to ship. The fix is not to negotiate harder with the vendor. The fix is to build the routing infrastructure that lets you treat any provider as fungible at the dispatch layer while keeping their CLI tool intact at the execution layer.

VERMITTLER is that routing layer. FORGE is the planner that feeds it. The audit chain is the receipt that proves the whole thing actually happened the way I claim.

Four subscriptions, one plan, one DAG, no merge conflicts, one signed history. Built in free time, by a single person, in Rust, because the alternative was to keep alt-tabbing forever.

The Honest Version

That is the architectural story. The honest story is that I would have preferred to ship all of this on one provider.

The numbers from earlier this year: 9.6 billion tokens through Claude Code between January and mid-April, an API-equivalent value of $4,822 according to Anthropic's own metering, against the roughly $700 I had paid in subscription fees over the same period. April alone, as noted at the top of this post, passed eleven billion tokens, with another $2,000 in extra usage on top of the Max subscription. By any reasonable measure of "customer who should be on a phone call with enterprise sales," I qualify.

I sent the email. Subject line: "Solo developer, $4,800+ API-equivalent usage on Max — need a conversation." Two clear questions: is there an Enterprise plan that makes sense for a one-person company with this usage pattern, and is there someone on the safety side who would want to look at the constitutional security framework I had built around their CLI. I attached the usage report. I followed up. Nothing.

So the workaround came first. Before VERMITTLER, before the audit chain, before the quota state machine, there was a more embarrassing artifact called DSG — a full GPU-accelerated terminal emulator I wrote so I could run Claude Code inside it, intercept its stdout and stderr, and parse the tool calls and responses out of the rendered terminal frames in order to route the output into my own pipelines. A terminal emulator. To bridge a subscription to a workflow. The Rust crate is real, it works, and it would not exist if a single config flag had let me point Claude Code at a callback. The DSG codename is Datensichtgerät — "data display device," a 1970s East-German term for a terminal. It is the most on-the-nose name for a workaround I have ever shipped.

The infrastructure built outward from there. If I have to parse a terminal to get programmatic access to a subscription model, I might as well do the same for the other three vendors and have a clean abstraction. VERMITTLER is at one level a constitutional security argument about vendor lock-in, and at another level the engineering equivalent of "fine, I'll do it myself." The result is genuinely better than what I would have built on a single provider — multi-vendor routing produces measurably better code than any one model, because cross-family review catches what same-family review misses; the quota state machine surfaces problems no single-vendor dashboard would; the audit chain works across providers in a way no vendor-specific telemetry could match. But better-than-counterfactual is not the same as planned.

The right way to read both posts together is: this is what a single developer builds when the alternative is waiting for a callback that does not come. The code is good because it had to be. The architecture is documented because nobody else would have documented it. The patterns transfer because they were never specific to any one vendor in the first place.

Coming Soon: Open Source

The first piece going public is @ekelhaft-tools/vermittler-provider — an opencode plugin that registers VERMITTLER as a custom AI provider. opencode integrates models through the AI SDK, so providers expose an OpenAI-compatible HTTP endpoint that the SDK hits with standard chat-completion requests. The plugin uses that same shape but intercepts the fetch call, translates each request into a CLIPromptInput, and dispatches it through vermittler-napi's VermittlerGateway to the matching CLI adapter. The CLIStreamEvent stream coming back is translated into OpenAI-compatible SSE chunks the AI SDK already knows how to render.
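The last translation step, from stream events to OpenAI-style SSE chunks, looks roughly like this. It is written in Rust to match the rest of the post, although the plugin itself is presumably TypeScript. The `CLIStreamEvent` variants are assumptions (the post does not list them), and the chunk carries only a minimal subset of fields; a real client may expect more, such as `id` and `model`.

```rust
// Hypothetical variants: the post names the type but not its cases.
pub enum CLIStreamEvent {
    TextDelta(String),
    Done,
}

// Minimal JSON string escaping, enough for text deltas.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders one event as a server-sent-events frame: a `data:` line plus a blank line.
pub fn to_sse(event: &CLIStreamEvent) -> String {
    match event {
        CLIStreamEvent::TextDelta(text) => format!(
            "data: {{\"object\":\"chat.completion.chunk\",\"choices\":[{{\"index\":0,\"delta\":{{\"content\":\"{}\"}}}}]}}\n\n",
            json_escape(text)
        ),
        // Conventional end-of-stream marker for OpenAI-style streams.
        CLIStreamEvent::Done => "data: [DONE]\n\n".to_string(),
    }
}
```

The escaping matters more than it looks: model output routinely contains quotes and newlines, and an unescaped newline inside a `data:` line would split the frame.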

What the user sees is a model picker that lists everything in one place:

claude/opus-4-7            codex/gpt-5.5              gemini/3.1-pro-preview
claude/sonnet-4-6          codex/gpt-5-codex          gemini/3-flash-preview
claude/haiku-4-5           cursor/agent               gemini/3.1-flash-lite
opencode/glm-5.1           copilot/agent              mlx/qwen3-coder-30b
                                                      mlx/qwen3-4b
                                                      mlx/gemma-4-e4b

The same plan can now run any model from any vendor against the same monorepo, and the only thing that changes between runs is which subscription pays the bill. opencode/glm-5.1 is recursive — useful for sandboxed sub-sessions where I want an opencode call from inside an opencode call. The MLX adapters are local; they cost nothing and never leave the M2.

The circle closes neatly enough that I should name it. opencode is — by community consensus — the third-party harness whose rapid growth is what pushed the major subscription providers to start binding their plans to their own first-party CLIs in the first place. The gating exists because opencode existed. VERMITTLER exists because the gating exists. And now VERMITTLER ships first as an opencode plugin. I did not plan it this way, but I am not unhappy about it.

The rest of the stack follows over the next month: FORGE itself, the audit module, the quota state machine, the conflict validator, and the FSF/FSV schema toolkit — all moving to a permissive open-source license. After that comes KATASTER, a developer-portal catalog that does what Backstage promises but as a Rust TUI: single binary, no Postgres, three-method plugin trait, parallel snapshots with isolated failures. The pieces that touch the constitutional substrate (MAELSTROM, the Schloss security stack) follow on a longer timeline as the dual-license commercial layer for the customer-facing apps stabilizes.

The reason for shipping is straightforward: this stack was built by one person against the public surface of every major AI coding tool, and it works. There is no reason for the next person who hits the same wall to rebuild it from scratch. If the four providers themselves want to ship a competing gateway, good — that means the abstraction is real.

Watch github.com/ekelhaft-tools over the next month.


Lee builds ekelhaft.tools — an ecosystem of DJ tools, AI agent orchestration, and streaming infrastructure. VERMITTLER, FORGE, MAELSTROM, and the constitutional security framework described here are part of this ecosystem. All project names are German words chosen for their industrial energy and Berghain aesthetic.