Argus
Argus tracks incidents, changes, on-call, and postmortems, with AI classification, similar-incident search, postmortem drafting, and plain-English queries. Every entity carries an audit trail.
I noticed checkout queue depth climbing past 4k. Opened INC-2104 — classified P1.
Two from Q4 ring the same bell — INC-1987 and INC-2042. Same service, same hour, same shape.
joined the watch.
I closed INC-2099. Memory pressure resolved after the GC cycle completed. 19m to resolution.
Argus · needs your nod
I'd like to throttle Stripe webhook retries to 30/min. That should clear the backpressure without dropping callbacks.
I'm not a chatbot bolted on the side. I work inside the operations workflow: classifying, drafting, searching, and surfacing the past incidents that ring the same bell. Every call I make comes with a confidence level and the sources behind it.
The moment an alert lands, I read the title, the description, and the systems involved. I label it P1–P4 and tag the surface affected — before anyone gets paged.
"Checkout 503s climbing" → P1 · payments-api · customer-facing
Vector embeddings on every postmortem and known error. When a new alert lands, the runbook from three months ago is one keystroke away — not buried six clicks deep.
3 similar: INC-1284, INC-998, INC-742 → known-error KE-12 · workaround attached
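For readers curious how embedding-based lookup works in general, here is a minimal sketch that ranks stored incident vectors by cosine similarity to a new alert. It assumes every postmortem has already been turned into a vector; the toy 3-dimensional vectors, the `top_k_similar` helper, and the brute-force ranking are hypothetical illustrations, not Argus's implementation.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(
    query: list[float], corpus: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Hypothetical helper: score every stored incident against the query, highest first."""
    scored = [(incident_id, cosine(query, vec)) for incident_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors for illustration only; a real system would get these from an embedding model.
corpus = {
    "INC-1284": [0.9, 0.1, 0.0],
    "INC-998": [0.8, 0.2, 0.1],
    "INC-742": [0.7, 0.3, 0.0],
    "INC-500": [0.0, 0.1, 0.9],
}
new_alert = [1.0, 0.1, 0.0]

for incident_id, score in top_k_similar(new_alert, corpus):
    print(f"{incident_id}: {score:.3f}")
```

Brute-force scoring touches every stored vector per query, which is fine for a modest archive; larger archives usually move to an approximate nearest-neighbour index.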
Once a thread closes, I pull the timeline from comments, alerts, and the audit trail into a draft. It's yours to revise, so nobody starts from a blank page.
Timeline · Root cause · Contributing factors · Action items → draft ready in 8s · review and publish
"All P1s in payments last quarter." "What ran on the gateway during the 3am page?" I read it. No query DSL, no filter pyramid.
"show me on-call escalations that paged twice last week" → 4 results · grouped by team · with timeline
The legacy ITSM stack is an artifact of pre-AI workflows. I'm built for the way IT teams actually work in 2026: keyboard-first, with an audit trail built in and an agent on the watch from day one.
I adapt to your operating model — pick the shape that matches your team.
Platform teams catch regressions early, route incidents to the right owner, and ship postmortems without leaving their stack.
SREs track MTTR, error budgets, and on-call load — and let AI surface lessons their team already learned last quarter.
Platform teams centralize incident response across every service tier, with audit trails their compliance team can read.
Compliance gets a real audit trail; on-call gets a calm console; engineering ships postmortems regulators will actually read.
Operations teams run incidents with isolation, retention, and an audit log on every entity, without spinning up a separate compliance tool.
Detect outages early, coordinate response across merchandising and engineering, and protect peak-traffic windows.
No. Argus works inside the operations workflow: it classifies severity, surfaces similar past incidents, drafts postmortems, and presents every suggestion for operator approval before anything commits. Humans stay in the decision loop.
Heavy legacy ITSM suites, lightweight ticketing tools you have outgrown, and the home-grown incident workflows every team starts with. Argus brings ITIL-aligned process discipline without the weight.
Self-hosted deployment is available on Enterprise today; managed cloud is the default. Argus is built for teams that need control over data residency, identity, and operational history.
AI output is a suggestion, never a side effect. Severity classification, similar-incident matches, and postmortem drafts are presented to the operator for review. We will publish accuracy benchmarks alongside the public release.
No. Tenant data is never used to train models. Your operational data stays under your control, and AI-assisted suggestions are handled under zero-retention commitments where applicable.
No. Argus discovers assets, services, and relationships incrementally. Start with one incident; add structure as you go.
Yes. Argus uses Pydantic AI under the hood. Swap providers (OpenAI, Anthropic, self-hosted models, or your enterprise gateway) from the service-config screen in the admin panel, with no code changes.
No fixed date. The waitlist is for early access and design partners. We will write when there is something to share, not before.
Per-seat, monthly, no surprise overages. Start free, upgrade when your team grows.
Custom
Self-hosted deployment, regulated industries, or a tailored contract? Let's talk.
Early access for IT teams, SREs, and incident responders who want an agent on the watch — not another tracker. No countdowns, no spam. I'll write when there's something to share.
Curious where I reach? Security posture, API & integrations, blog.