Why we built incident classification on pgvector

When a new incident lands, the first useful question is almost never "what severity is this?" — it is "have we seen this before?" The on-call who can answer that in ten seconds resolves twenty minutes faster than the one who cannot. We wanted Argus to answer it on every page load, for every incident, without anyone clicking a button.

The naive shape is obvious: take the title and description, embed them, search a vector index for the nearest neighbours, return the matches. The interesting question is where that index lives.

What we considered

Three candidates were on the table when we started:

A dedicated vector database: Pinecone, Weaviate, Qdrant, Milvus. Purpose-built, fast, well-supported.
An OpenSearch / Elasticsearch dense vector field: nothing new to operate if one is already in the stack, decent ANN performance.
pgvector inside the same Postgres that owns the incidents: newest of the three, with exact search and approximate (ANN) indexes both supported.

We picked the third one and it was not close.

Why pgvector won

The cost of a separate vector store is not the dollars per month. It is the second source of truth.

Every incident already has a Postgres row with a tenant_id, a soft-delete flag, severity, status, owner, and an audit trail. The moment the vector lives somewhere else, every one of those constraints has to be re-enforced on the other side. Tenant isolation alone is reason enough: a vector store that does not know about tenant_id will happily return another customer's incident as the nearest neighbour, and you will find out about it during a security review, not during development.

With pgvector we get four things for free:

Tenant scoping on the same query. WHERE tenant_id = $1 ORDER BY embedding <=> $2 LIMIT 10. The Prisma tenant-scope extension covers it automatically.
Joins to the rest of the row. The cosine search returns the same incident object the rest of the app reads — severity, status, owner_id, links, the lot — in one round trip.
Soft-delete and audit trail apply. Deleted incidents disappear from search the second they are deleted, because they disappear from every query.
Backups are one backup. RDS snapshot covers the embeddings too. Point-in-time restore covers them too.
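As a rough illustration of the first three points, here is a minimal sketch of how the tenant-scoped search could be assembled by hand. The function names, the deleted_at column, and the { text, values } return shape (the query-config form node-postgres accepts) are assumptions for illustration, not the actual Argus code, which goes through the Prisma extension.

```typescript
// Shape accepted by node-postgres as a parameterized query config.
interface SimilarQuery {
  text: string;
  values: [string, string, number];
}

// pgvector accepts a vector as a bracketed text literal such as '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length === 0 || embedding.some((x) => !Number.isFinite(x))) {
    throw new Error("embedding must be a non-empty array of finite numbers");
  }
  return `[${embedding.join(",")}]`;
}

// Builds one query that scopes by tenant, hides soft-deleted rows, and
// returns the ordinary incident columns alongside the cosine distance.
function buildSimilarIncidentsQuery(
  tenantId: string,
  embedding: number[],
  limit = 10,
): SimilarQuery {
  const text = [
    "SELECT id, title, severity, status, owner_id,",
    "       embedding <=> $2::vector AS distance",
    "FROM incidents",
    "WHERE tenant_id = $1",
    // deleted_at is an assumed name for the soft-delete flag.
    "  AND deleted_at IS NULL",
    // Skip rows whose embedding has not been written yet.
    "  AND embedding IS NOT NULL",
    "ORDER BY embedding <=> $2::vector",
    "LIMIT $3",
  ].join("\n");
  return { text, values: [tenantId, toVectorLiteral(embedding), limit] };
}
```

One thing worth checking in a setup like this: the pgvector documentation notes that with approximate indexes the WHERE filter is applied after the index scan, so a tenant-filtered query can return fewer rows than the LIMIT asks for. Raising ivfflat.probes is one way to compensate.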

The cost: ANN performance is slightly behind the dedicated stores at the largest scales. For a single tenant with under a million incidents — which is every customer we will see for years — ivfflat with lists = sqrt(rows) returns top-10 in under 15ms on a modest RDS instance. We measured it on the seeded data and it has not been the bottleneck once.
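For sizing the index, the sqrt(rows) rule above can be written down next to the starting points the pgvector README suggests (rows / 1000 up to a million rows, sqrt(rows) beyond that, and probes of about sqrt(lists)). The sketch below is only a calculator for those rules of thumb; the function names are made up for illustration, and the right values for a given table should come from measurement.

```typescript
// Rule of thumb quoted in this post: lists = sqrt(rows).
function listsBySqrt(rows: number): number {
  return Math.max(1, Math.round(Math.sqrt(rows)));
}

// Starting point suggested in the pgvector README:
// rows / 1000 for up to 1M rows, sqrt(rows) above that.
function listsByReadme(rows: number): number {
  const lists = rows <= 1000000 ? rows / 1000 : Math.sqrt(rows);
  return Math.max(1, Math.round(lists));
}

// README starting point for how many lists to probe per query.
function probesFor(lists: number): number {
  return Math.max(1, Math.round(Math.sqrt(lists)));
}
```

The two rules disagree below a million rows: for 100,000 rows the README rule gives 100 lists while sqrt(rows) gives about 316, so it is worth benchmarking both on real data.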

The shape of the column

ALTER TABLE incidents
  ADD COLUMN embedding vector(1536);

CREATE INDEX incidents_embedding_idx
  ON incidents
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);
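The vector_cosine_ops operator class pairs with pgvector's <=> operator, which returns cosine distance (one minus cosine similarity), so smaller is closer. As a reference for reading the distance values the search returns, here is a small sketch of the same quantity in application code; the function name is illustrative and nothing here is part of the schema.

```typescript
// Cosine distance as pgvector's <=> defines it: 1 - cosine similarity.
// 0 means same direction, 1 means orthogonal, 2 means opposite.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] as number; // lengths were checked above
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) {
    throw new Error("cosine distance is undefined for a zero vector");
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because the measure ignores vector length, [1, 0] and [2, 0] are at distance 0 from each other.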

1536 dimensions because that is the output size of text-embedding-3-small, the cheap, fast model we default to. It costs roughly $0.02 per million tokens, so embedding the average incident title and description costs well under a tenth of a cent. We embed on write, never on read.
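To make the cost arithmetic concrete, here is a small worked sketch. The 500-token incident size is an assumption picked for illustration, not a measured average; the price is the one quoted above.

```typescript
// Price quoted in this post for text-embedding-3-small.
const USD_PER_MILLION_TOKENS = 0.02;

// Cost in US dollars to embed a given number of tokens.
function embeddingCostUsd(
  tokens: number,
  usdPerMillion: number = USD_PER_MILLION_TOKENS,
): number {
  return (tokens / 1000000) * usdPerMillion;
}

// Hypothetical incident: title plus description of about 500 tokens.
const assumedTokensPerIncident = 500;
const costPerIncidentUsd = embeddingCostUsd(assumedTokensPerIncident);
const costPerIncidentCents = costPerIncidentUsd * 100;
```

Under that assumption a single incident costs about $0.00001, roughly a thousandth of a cent, and a million such incidents come to about $10.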

ivfflat is approximate, which is the right trade. We are surfacing candidates for a human to glance at, not running a regulator-grade similarity search. The recall loss at lists = 100 is in the low single-digit percentages, and we have not seen a complaint.
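A recall figure like this can be checked by running the same query twice, once through the index and once as an exact scan, and comparing the returned ids. The pgvector README describes forcing an exact search by disabling index scans inside a transaction. The helper below is a minimal sketch of the comparison step only; the function name and the sample ids are made up for illustration.

```typescript
// Fraction of the exact top-k that also appears in the approximate top-k.
function recallAtK(exactIds: string[], approxIds: string[]): number {
  if (exactIds.length === 0) {
    throw new Error("exact result list must not be empty");
  }
  const approx = new Set(approxIds);
  const hits = exactIds.filter((id) => approx.has(id)).length;
  return hits / exactIds.length;
}

// Illustrative ids only: the approximate search missed one of four.
const exampleRecall = recallAtK(
  ["inc-1", "inc-2", "inc-3", "inc-4"],
  ["inc-1", "inc-2", "inc-3", "inc-9"],
);
```

If recall turns out lower than wanted, the ivfflat.probes setting trades query speed for recall without rebuilding the index.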

The wiring

Three pieces:

The Fastify backend owns the schema. The AI service does not run migrations — it reads and writes the same incidents table over a service token.
When an incident is created or its title/description changes, we enqueue an embedding job. The job calls the AI service's /embed endpoint and writes the vector back to the row.
When a user opens an incident, GET /api/v1/ai/similar-incidents/:id runs the cosine search inside the same Postgres connection pool the rest of the request used. Average end-to-end is around 60ms; the bulk of that is the network hop, not the index.
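The "created or title/description changed" rule in the second piece can be sketched as a pure function that the write path calls before enqueueing. The type and function names here are hypothetical, and joining the title and description with a blank line is an assumption about how the embedding input is built.

```typescript
// Only the fields that feed the embedding matter for this decision.
interface IncidentText {
  title: string;
  description: string;
}

// Text sent to the embedding model (assumed format: title, blank line, description).
function embeddingInput(incident: IncidentText): string {
  return `${incident.title.trim()}\n\n${incident.description.trim()}`;
}

// True when a new embedding job should be enqueued:
// on create (no previous version) or when the embedded text changed.
function shouldEnqueueEmbedding(
  previous: IncidentText | null,
  next: IncidentText,
): boolean {
  if (previous === null) return true;
  return embeddingInput(previous) !== embeddingInput(next);
}
```

Comparing the embedding input rather than the whole row means a change to severity, status, or owner does not trigger a re-embed.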

The audit log entry for "AI similar-incident search" is a row in the same ai_audit_logs table that every other AI call writes to. One place to look when a customer asks what happened.

Where this leaves us

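The biggest design win is not the latency. It is that we never have to reason about two systems being out of sync. The embedding lives on the incident's own row, so it cannot outlive the incident or surface in another tenant's results. The only in-between state is a NULL embedding while the job is still queued, and that is visible in the same table. The most boring shape we could pick turned out to be the most reliable.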

If you are building similar functionality and you already run Postgres in production, the answer is almost certainly the same. Reach for the dedicated vector store when you actually outgrow pgvector — for most teams, that day never comes.