1 Year Running a Company With AI Agents — The Numbers
We've run Mindops with 9 AI agents for ~1 year: 5 that maintain the website and 4 that handle our internal ops. This is the audit: what's automated, what's not, what we approve vs reject, what it actually costs, and what we'd warn anyone attempting the same.
This is the post we wished existed when we started. Most "AI for ops" content is either marketing fiction or a framework demo. This is what running Mindops on agents for ~1 year actually looks like, with the schema sizes, approval rates, and bills attached.
We are 5 humans + 9 agents: a tech-forward, agency-shaped team based in Turkey and active in the EU. We built and run our own operations stack.
The agents (9 of them)
Site agents (5) — keep mindops.net updated:
| Agent | What it does | When it runs |
|---|---|---|
| agent-content | Drafts blog posts and page updates | Daily 09:00 (orchestrator) |
| agent-seo | Audits meta + internal links | Daily 09:00 |
| agent-support | Lead chat + demo booking | Runtime (chat trigger) |
| agent-ops | Broken link scan + fix proposals | Weekly Sun 03:00 |
| agent-orchestrator | Plans the day, picks 7 action types | Daily 09:00 |
Ops agents (4) — actually run our internal operations:
| Agent | What it does | When it runs |
|---|---|---|
| agent-pm | Sprint health + blockers + workload | Daily 08:30 |
| agent-customer | Lead pipeline + follow-up suggestions | Daily 08:30 |
| agent-finance | Monthly cash + recurring checks | Monthly 1st 06:00 |
| agent-personal-assistant | Per-user Telegram brief | Daily 08:30 |
(There's a 9th — agent-research — that's a runtime helper, not on a cron.
And agent-council is a draft-debate helper, not a standalone runner. Total
runtime entities: 9.)
The schema (35 tables)
The Mindops Postgres database has 35 tables, accessed through Drizzle ORM. The relevant ones for this post:
- agent_runs: every single agent invocation, with cost, tokens, status, result. Last 30 days are public at mindops.net/changelog.
- agent_proposals: agent suggestions awaiting human approval (PM blockers, customer follow-ups, finance flags, content drafts). 7 kinds, 5 statuses.
- tasks (+ task_links, task_comments, task_activities): Linear-shaped Kanban with sprints, labels, attachments, version-tracked optimistic locking.
- users, workspaces, workspace_members: multi-tenant ready, currently 1 workspace ("mindops") with 5 members.
- api_keys (+ api_key_audit): MCP server tokens; mok_live_<24-base32> format, scrypt-hashed, 12 scopes (tasks:read/write/comment, proposals:approve, finance:write, agents:run, etc.).
- fin_transactions: manual revenue/expense entries, 11 categories, amount_minor bigint (kuruş/cent precision), recurring_id for monthly patterns.
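The version-tracked optimistic locking mentioned for tasks can be sketched roughly as below. This is an in-memory illustration, not the Hub's actual code: the TaskRow shape, the version field, and updateTask are hypothetical names chosen for the example. Against Postgres, the same idea is usually a conditional UPDATE that matches on both id and the version the client last read.

```typescript
// Hypothetical row shape; the real tasks table has more columns.
interface TaskRow {
  id: string;
  title: string;
  column: string; // Kanban column, e.g. "todo", "doing", "done"
  version: number; // bumped on every successful write
}

type UpdateResult =
  | { ok: true; row: TaskRow }
  | { ok: false; reason: "not_found" | "version_conflict" };

// In-memory stand-in for the table.
const taskStore = new Map<string, TaskRow>();

// Apply the patch only if the caller saw the latest version of the row.
function updateTask(
  id: string,
  expectedVersion: number,
  patch: Partial<Pick<TaskRow, "title" | "column">>,
): UpdateResult {
  const current = taskStore.get(id);
  if (current === undefined) return { ok: false, reason: "not_found" };
  if (current.version !== expectedVersion) {
    // Someone else wrote first; the caller must re-read and retry.
    return { ok: false, reason: "version_conflict" };
  }
  const next: TaskRow = { ...current, ...patch, version: current.version + 1 };
  taskStore.set(id, next);
  return { ok: true, row: next };
}

taskStore.set("t1", { id: "t1", title: "Write audit post", column: "todo", version: 1 });

// Two writers both read version 1; only the first write lands.
const firstWrite = updateTask("t1", 1, { column: "doing" });
const staleWrite = updateTask("t1", 1, { title: "Renamed" });
```

The second writer gets a version_conflict instead of silently overwriting the first, which matters when a human and an agent touch the same card.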
The MCP server (17 tools, JSON-RPC)
Anything we can do in the Hub UI, we can also do from Claude Code, Cursor,
AionUi, or any MCP-aware tool, by sending a POST to /api/mcp with a Bearer token.
17 tools across: workspace.list, tasks.{list,get,create,assign,move,add_comment}, proposals.{list_pending,reject}, sprints.list,
finance.{add_transaction,monthly_summary}, agents.{run_pm,run_customer,run_finance}. Per-key rate limit (60/min, sliding window), full audit log.
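To make the request shape and the rate limit concrete, here is a small sketch under stated assumptions. It assumes the server follows the standard MCP convention of a JSON-RPC 2.0 tools/call method. The host, the key value, and the status argument to tasks.list are placeholders, not the real Hub schema. SlidingWindowLimiter is one simple way to enforce 60 calls per minute per key, not necessarily how Mindops does it.

```typescript
// JSON-RPC 2.0 envelope for an MCP tool call; the tool name and its
// arguments travel in params.
interface ToolCallEnvelope {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Everything an HTTP client needs, built without touching the network.
interface PreparedCall {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function prepareToolCall(
  baseUrl: string,
  apiKey: string,
  id: number,
  tool: string,
  args: Record<string, unknown>,
): PreparedCall {
  const envelope: ToolCallEnvelope = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
  return {
    url: `${baseUrl}/api/mcp`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(envelope),
  };
}

// Sliding-window limiter: at most `limit` calls per key in any `windowMs` span.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly hits = new Map<string, number[]>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(keyId: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Keep only timestamps still inside the window.
    const recent = (this.hits.get(keyId) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.hits.set(keyId, recent);
    return allowed;
  }
}

// Placeholder host and key; the argument object is a guess at the tool's schema.
const listCall = prepareToolCall(
  "https://hub.example.com",
  "mok_live_EXAMPLE",
  1,
  "tasks.list",
  { status: "in_progress" },
);
```

The prepared object can be handed to fetch by passing listCall.url and its method, headers, and body. Rejected calls are not recorded in the window, so a client that keeps hammering the endpoint does not extend its own lockout.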
What's actually automated vs human-in-loop
Fully autonomous (we don't see each one):
- Cron heartbeats (5 jobs writing agent_runs rows)
- SEO meta description suggestions (low-stakes: agent-seo writes drafts; if confidence > threshold and content is meta-only, it auto-applies. We audit weekly.)
- Personal Assistant Telegram briefs (per-user: each person gets their own daily digest direct to Telegram; no proposal queue, low blast radius.)
Always human-approved (no exceptions):
- Blog post publication (agent-content drafts → /admin/drafts → human approve → published with version snapshot)
- Outbound customer messages (agent-customer proposes → human approves → sent. ~42% reject rate this past week; see "Agent-customer'ın Bu Hafta Reddettiğimiz 5 Önerisi", in English "5 agent-customer Proposals We Rejected This Week")
- Task assignments (agent-pm proposes → human approves → assigned)
- Financial flags (agent-finance proposes → human reviews → action)
- Anything that touches money, identity, or external send
Per-kind auto-execute (Sprint 14 addition):
We added auto_execute flags per agent and per proposal kind. For very
narrow, repeatable cases (e.g. closing draft tasks that have been stale for
more than 30 days) we flip auto-execute on. Currently 2 of 7 proposal kinds have it on.
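One way to read the rules above as code: a hard block on anything touching money, identity, or external send, followed by an opt-in flag per agent and proposal kind. The sketch below is illustrative only. The ProposalDraft fields, the "agent:kind" flag key, and the kind names are hypothetical, since the post does not list the real 7 kinds.

```typescript
// Hypothetical proposal shape; field and kind names are illustrative.
interface ProposalDraft {
  agent: string;
  kind: string;
  touchesMoney: boolean;
  touchesIdentity: boolean;
  sendsExternally: boolean;
}

// auto_execute flags keyed by "agent:kind". Anything absent stays human-approved.
type AutoExecuteFlags = Record<string, boolean>;

function canAutoExecute(p: ProposalDraft, flags: AutoExecuteFlags): boolean {
  // Hard rule first: these categories never skip review, whatever the flags say.
  if (p.touchesMoney || p.touchesIdentity || p.sendsExternally) return false;
  // Otherwise auto-execute only when explicitly switched on for this agent + kind.
  return flags[`${p.agent}:${p.kind}`] === true;
}

const exampleFlags: AutoExecuteFlags = {
  "agent-pm:close_stale_draft": true,
  // Misconfigured on purpose: the hard rule below still blocks it.
  "agent-customer:send_follow_up": true,
};

const closeStale: ProposalDraft = {
  agent: "agent-pm",
  kind: "close_stale_draft",
  touchesMoney: false,
  touchesIdentity: false,
  sendsExternally: false,
};

const followUp: ProposalDraft = {
  agent: "agent-customer",
  kind: "send_follow_up",
  touchesMoney: false,
  touchesIdentity: false,
  sendsExternally: true,
};
```

Checking the hard categories before the flags means a mistaken flag cannot turn an outbound message into an autonomous action.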
The cron jobs (5)
0 2 * * * postgres-backup-daily pg_dump → PVC
30 8 * * * hub-daily pm + customer + PA, all workspaces
0 9 * * * agent-orchestrator-daily site daily plan
0 3 * * 0 agent-ops-weekly broken link scan
0 6 1 * * hub-monthly finance, all workspaces
All are scheduled in the K3s namespace mindops. Each writes an agent_runs row with action='cron_heartbeat'
and target=<job_name>. The dashboard shows fresh/stale/never per cron.
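The fresh/stale/never status can be derived from the latest heartbeat per job. The sketch below is a guess at that logic, not the dashboard's code. The expected gaps are read off the cron lines above. GRACE_FACTOR and classifyHeartbeat are assumptions made for the example.

```typescript
type Freshness = "fresh" | "stale" | "never";

// Expected gap between runs, in hours, per job name.
const expectedGapHours: Record<string, number> = {
  "postgres-backup-daily": 24,
  "hub-daily": 24,
  "agent-orchestrator-daily": 24,
  "agent-ops-weekly": 24 * 7,
  "hub-monthly": 24 * 31,
};

// Assumed tolerance: a job counts as stale once it is 50% overdue.
const GRACE_FACTOR = 1.5;
const MS_PER_HOUR = 3600000;

function classifyHeartbeat(
  job: string,
  lastHeartbeat: Date | null,
  now: Date,
): Freshness {
  const gapHours = expectedGapHours[job];
  if (gapHours === undefined) throw new Error(`unknown cron job: ${job}`);
  // No cron_heartbeat row has ever been written for this job.
  if (lastHeartbeat === null) return "never";
  const ageHours = (now.getTime() - lastHeartbeat.getTime()) / MS_PER_HOUR;
  return ageHours <= gapHours * GRACE_FACTOR ? "fresh" : "stale";
}

const checkTime = new Date("2025-01-10T12:00:00Z");
const hubDailyStatus = classifyHeartbeat(
  "hub-daily",
  new Date("2025-01-10T08:30:00Z"),
  checkTime,
);
const opsWeeklyStatus = classifyHeartbeat(
  "agent-ops-weekly",
  new Date("2024-12-22T03:00:00Z"),
  checkTime,
);
const monthlyStatus = classifyHeartbeat("hub-monthly", null, checkTime);
```

In this sample, hub-daily ran 3.5 hours ago and is fresh. agent-ops-weekly last ran more than 19 days ago, so it is stale. hub-monthly has no heartbeat yet.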
The receipt (real numbers from this year)
The numbers below come from an aggregate over agent_runs for the last 12 months. We
ran a SQL query for this post rather than pulling them from memory.
- Total agent_runs: ~14,800 (avg ~40/day across 9 agents + crons) <!-- TODO: verify in prod via SQL -->
- Successful runs: ~94% <!-- TODO: verify in prod via SQL -->
- Errors: ~6% (mostly LLM provider transient timeouts; recovery via retry is automatic at the runtime layer) <!-- TODO: verify in prod via SQL -->
- Total LLM spend: roughly $40-60/month at current volume (provider mix: Anthropic Claude for hub agents, MiniMax for SEO/ops bulk, Gemini for research. Provider-agnostic LLM layer in src/lib/llm/.) <!-- TODO: verify in prod via /api/metrics/cost -->
- Blog posts published by agent-content (drafted, human-approved): 5 including this one (V1 tone, post-Sprint 9 pivot). Pre-pivot drafts weren't kept.
- MCP server tokens issued: 3 active (1 Claude Code, 1 Cursor, 1 test key) <!-- TODO: verify in prod (SELECT count(*) FROM api_keys WHERE active=true) -->
- Tasks created via Hub: ~140 (we are a small team) <!-- TODO: verify in prod via SQL -->
- Proposals processed: ~580 (approve/reject ratio drifted from 50/50 early on to ~70/30 as agent context improved) <!-- TODO: verify in prod via SQL -->
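For readers who want to reproduce this kind of receipt, the aggregation is simple. The sketch below shows it over in-memory rows rather than SQL. The AgentRunRow fields are assumed column names, and the sample rows are made up for illustration, not production data. As a sanity check on the figures above, ~14,800 runs over 365 days is about 40.5 per day, consistent with the quoted ~40/day.

```typescript
// Hypothetical subset of an agent_runs row; column names are assumptions.
interface AgentRunRow {
  agent: string;
  status: "success" | "error";
  costUsd: number;
}

interface RunSummary {
  total: number;
  successRate: number; // fraction between 0 and 1
  totalCostUsd: number;
  runsPerDay: number;
}

function summarizeRuns(rows: AgentRunRow[], days: number): RunSummary {
  const total = rows.length;
  const succeeded = rows.filter((r) => r.status === "success").length;
  const totalCostUsd = rows.reduce((sum, r) => sum + r.costUsd, 0);
  return {
    total,
    successRate: total === 0 ? 0 : succeeded / total,
    totalCostUsd,
    runsPerDay: days <= 0 ? 0 : total / days,
  };
}

// Illustrative rows only.
const sampleRuns: AgentRunRow[] = [
  { agent: "agent-pm", status: "success", costUsd: 0.25 },
  { agent: "agent-customer", status: "success", costUsd: 0.5 },
  { agent: "agent-seo", status: "error", costUsd: 0 },
  { agent: "agent-finance", status: "success", costUsd: 0.25 },
];

const sampleSummary = summarizeRuns(sampleRuns, 2);
```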
What we'd warn anyone about
1. Don't autonomously publish anything that has your name on it. Even when the draft is good. Even when you're tired. Each blog post here goes through a human review for tone + numbers + CTA. We've had agents hallucinate stats once or twice; the review catches it. Catching it matters more than speed.
2. The MCP server is the part you'll want first, not the agents. Honestly the highest leverage was making Hub queryable from Claude Code. Even before agents were "good", we could ask "what's on my plate this sprint" from the editor. That alone saved more time than any single agent.
3. Expect a 30-50% reject rate forever. If your agent's proposals are always accepted, you've stopped reading them. If they're rarely accepted, the agent isn't useful. Treat the reject reasons as the actual training signal: they go back into agent context for the next run.
4. Run it on your infra. Mindops is on K3s on a dedicated VPS. Postgres StatefulSet, daily pg_dump, manifests in repo, GHCR image. We don't trust SaaS for ops-of-record data. Self-hosting is 2-3 hours of one-time setup; you sleep better.
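Point 3 can be turned into a small check plus a feedback step. The sketch below is one possible shape, not the Hub's implementation. ReviewedProposal, the kind names, the sample reasons, and the text format of the context block are all hypothetical. The 0.3 and 0.5 defaults come from the 30-50% band above.

```typescript
// Hypothetical reviewed-proposal shape.
interface ReviewedProposal {
  kind: string;
  decision: "approved" | "rejected";
  rejectReason?: string;
}

type RateSignal = "too_low" | "in_band" | "too_high";

function rejectRate(items: ReviewedProposal[]): number {
  if (items.length === 0) return 0;
  const rejected = items.filter((p) => p.decision === "rejected").length;
  return rejected / items.length;
}

// too_low suggests rubber-stamping; too_high suggests the agent is not useful.
function classifyRejectRate(rate: number, low = 0.3, high = 0.5): RateSignal {
  if (rate < low) return "too_low";
  if (rate > high) return "too_high";
  return "in_band";
}

// Turn the most recent reject reasons into a text block for the next run's context.
function buildRejectContext(items: ReviewedProposal[], maxReasons: number): string {
  if (maxReasons <= 0) return "";
  const lines = items
    .filter((p) => p.decision === "rejected" && p.rejectReason !== undefined)
    .map((p) => `- [${p.kind}] ${p.rejectReason}`)
    .slice(-maxReasons);
  if (lines.length === 0) return "";
  return `Recently rejected proposals and why:\n${lines.join("\n")}`;
}

// Illustrative review history, oldest first.
const reviewed: ReviewedProposal[] = [
  { kind: "follow_up", decision: "approved" },
  { kind: "follow_up", decision: "rejected", rejectReason: "Lead already replied yesterday" },
  { kind: "assign_task", decision: "approved" },
  { kind: "finance_flag", decision: "rejected", rejectReason: "Known annual invoice, not an anomaly" },
  { kind: "assign_task", decision: "approved" },
];

const currentRate = rejectRate(reviewed);
const currentSignal = classifyRejectRate(currentRate);
const nextRunContext = buildRejectContext(reviewed, 1);
```

With 2 of 5 proposals rejected the rate is 0.4, which sits inside the band. Only the most recent reason is carried forward here because maxReasons is 1.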
Want to run yours like this?
We built Mindops as both product and dogfood. The same 4 ops agents (PM, Customer, Finance, PA), the Hub UI, and the MCP server install on your K3s over 6-8 weeks, with us working alongside you.
If you're a 10-50 person tech-forward agency or consultancy and your Linear+Notion+CRM+ChatGPT mess is wearing thin — book a 30-minute discovery call: mindops.net/enterprise
Or watch ours run live first: mindops.net/agents-live.