How we're building the first agentic marketing agency (and why you should too).
Our internal playbook for turning agents into a productized service — pricing, guardrails, kill-switches, the works.
Nobody hands you the playbook for this. There is no prior art worth copying, because the category is eight months old and half the people claiming to operate in it are running glorified prompt chains and calling it an agency. We learned most of what follows by shipping something real, watching it fail in specific and instructive ways, and then writing down what we actually did.
This is that document, made public. Not a pitch. A schematic.
I. The Pricing Decision (The Hardest One)
We went back and forth on pricing longer than we went back and forth on anything else. The options, in the order we ruled them out:
Per asset was first. The logic was intuitive — you produce a thing, you charge for the thing. The problem is that it turns every output into a unit, and once you are in a unit negotiation you are an agency in the old sense: defending your rate per blog post, per ad variant, per deliverable. The client's incentive becomes to cap the units. Your incentive becomes to multiply them. Nobody wins.
Per agent-hour was the Silicon Valley play and it is a trap. It sounds modern because it has the word "agent" in it, but it is just the billable hour with a different noun. You are still selling time. The client is still asking: is this time well spent? You are still unable to say yes with any certainty, because the value of an agent's hour depends entirely on what it is doing with that hour.
Pure performance commission was the one that looked best in our decks and would have destroyed us operationally. The idea: take a percentage of revenue lift, full stop. Clean. Aligned. The reality: you cannot run an agency on a commission structure where you might do six months of work before seeing a dollar. You cannot invest in infrastructure — which takes time to show returns — when your cash flow is a function of Q4 campaigns for a client who may churn in Q2.
What we landed on: a monthly subscription for the system — a flat fee that covers the swarm, the infrastructure, the ongoing management — plus a performance kicker tied to one agreed-upon metric. Not five metrics. One. The metric has to be something we can actually influence, measured over a window long enough to see compounding effects, and agreed upon in writing before any work begins. The kicker is a bonus, not the base. The base has to cover operations.
The refund clause is not optional in our model. If the agreed metric does not move within the contracted period, a portion of the monthly fee comes back. This clause costs us nothing when the work is good. It costs us appropriately when it is not. It also, crucially, changes how clients relate to the agreement — they are not buying a service, they are entering a performance partnership.
II. The Brief That the Swarm Can Actually Read
The single biggest failure mode in early agentic engagements is a brief that was written for humans. A human can read a brief that says "capture our playful but authoritative voice" and translate it — imperfectly, over time — into something useful. An agent cannot. An agent needs that sentence to be a policy: here are three approved phrases, here are three phrases we do not use, here is the register we aim for, here is a document that exemplifies it.
The brief we produce at the start of every engagement has five sections, and we do not move past week one until all five are signed off by a human on the client's side.
The first is the objective: one sentence, one metric, one target. Not "grow awareness and drive conversions." Growth and conversion are different goals that require different swarm configurations. Pick one for this engagement.
The second is the audience definition: specific enough that the Recon agent can watch for signals, specific enough that the Studio agents can write to a real person. "B2B SaaS CMOs at Series B companies with 50–200 employees who are currently evaluating their agency stack" is a brief. "Marketing leaders" is a noise floor.
The third is the brand policy: what we can say, what we cannot say, what requires human escalation. This document starts small and grows. Every Critic rejection that reveals an edge case becomes a new policy line. By month three it is the most valuable document in the engagement.
The fourth is the kill-switch definition: under what conditions does a human have to intervene before the swarm acts? Large budget commitments above a threshold. Statements on politically sensitive topics. Anything involving a named competitor. These are not suggestions. They are hard stops, wired into the swarm's decision tree.
The fifth is the escalation path: when the swarm encounters something genuinely ambiguous, who does it contact, on what channel, and how fast do they need to respond before the swarm makes its own call?
A brief that has all five of these is a brief a swarm can act on. Everything else is aspiration.
III. Kill-Switches and the Guardrail Architecture
The kill-switch conversation is the one most agency founders want to skip because it sounds like you are planning for failure. We frame it differently: the kill-switch is what makes it safe to go fast.
We maintain four levels of agent authority, configured per engagement:
- Level 1 — autonomous: the agent acts, logs, and reports. No human approval required. Covers: content scheduling, performance reporting, competitor monitoring, copy drafts below a word-count threshold.
- Level 2 — approve-before-ship: the agent produces the output, a human reviews and approves before it goes live. Covers: paid campaign launches, landing page copy, any external-facing statement about a named third party.
- Level 3 — escalate-and-pause: the agent flags the situation, pauses all related work, and waits for a human decision before doing anything. Covers: budget decisions above a threshold, situations that touch legal risk, any output the Critic scores as genuinely uncertain.
- Level 4 — global kill: one button, one click, every agent in the swarm stops. Available to the client and to us. Reason required afterward. Logs preserved forever.
The Critic agent, which most people treat as a quality filter, is actually the primary guardrail mechanism. It runs before any Level 1 output becomes a Level 1 output — which is to say, it sits between the work and the world for everything that ships without human approval. The Critic's brief is fixed, not learned: check against brand policy, check against factual accuracy, check against the legal conservatism we agreed on, run the gut-check. If it fails any of these, the output goes back. Not forward.
We built underneath all of this a substrate we don't yet name — a shared cognition layer that holds the engagement's accumulated policies, brand decisions, and Critic rulings in one place, queryable by every agent in the swarm. That is what makes the guardrail architecture work over time rather than just at launch.
IV. The Three-Week Ship Schedule
Week 1: Brief and policy. We do not deploy anything. We write. The brief, the five-section document above. The policy engine. The kill-switch thresholds. The brand voice document, built by showing the Studio agents three exemplary pieces and three anti-examples and iterating until the output is indistinguishable from the exemplars. At the end of week one, a human on the client side signs off on everything. No exceptions.
Week 2: Sandbox. The swarm goes live against real data but a capped budget — typically ten percent of the intended live spend. Every output that would ship in production ships in the sandbox to a staging environment or a review queue. The client reviews. We review. The Critic rejects approximately thirty percent of what the Studio swarm produces in week two, and that is correct — week two is the Critic learning the engagement, not the Studio underperforming. By end of week two, the rejection rate is down to twelve to fifteen percent, which is where we want it for production.
Week 3: Live. Full budget. Real channels. The number starts moving. We watch the telemetry. The client watches the dashboard. The Critic keeps rejecting what it should reject. The Growth agent starts running its first optimization cycles. At the end of week three we have a two-week-old swarm that has already accumulated more brand-specific policy data than most traditional agencies accumulate in a year.
V. The Unglamorous Part
The unglamorous part is maintenance. The brief needs updating when strategy shifts. The policy engine needs updating when the Critic catches something it should have caught but didn't. The kill-switch thresholds need recalibrating when the client's risk tolerance changes. None of this is interesting to talk about. All of it is where the compounding value actually lives.
The agencies that will be standing in five years are not the ones with the best swarm architecture at launch. They are the ones with the most disciplined policy maintenance practice — the ones that treat every Critic rejection as a data point, every client escalation as a policy gap, every near-miss as an opportunity to make the brief more precise.
The playbook is not exciting. Neither is compound interest, until it is.
§ — the founding swarm · operational since MMXXVI