PostgreSQL
PostgreSQL is the production standard for relational data in modern web applications. After 35 years of development, it has grown far beyond a traditional RDBMS: native JSON/JSONB support for document-style storage, full-text search without an external engine, geospatial capabilities via PostGIS, and AI-ready vector search through pgvector. This combination means many applications that might reach for MongoDB, Elasticsearch, or a dedicated vector database can stay on a single Postgres instance without sacrificing capability.
Axevate uses PostgreSQL across AI pipelines (pgvector for RAG), eCommerce backends, and SaaS applications. Managed Postgres has also reached a level of maturity — via Supabase, Neon, AWS RDS, and Google Cloud SQL — where operational complexity is no longer a barrier. Our experience spans schema design for performance, migration strategies at scale, connection pooling under load, and pgvector index tuning for production RAG systems.
1. Core Query Engine & Indexing
PostgreSQL's query planner is one of the most sophisticated in any database. It evaluates dozens of execution plans for each query, estimating costs based on table statistics, index availability, and join ordering. Understanding how the planner works — and how to guide it when it makes wrong choices — is essential for consistent production performance. Run EXPLAIN (ANALYZE, BUFFERS) on slow queries to see the actual execution plan, row estimates, and where time is spent.
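A minimal sketch of this workflow (table and column names are illustrative, not from a real schema):

```sql
-- Show the actual plan, timing, and buffer usage for a slow query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.total
FROM orders o
WHERE o.customer_id = 42
  AND o.created_at >= now() - interval '30 days'
ORDER BY o.created_at DESC
LIMIT 20;
-- What to look for in the output:
--   * Seq Scan on a large table -> likely missing index
--   * estimated rows far from actual rows -> stale statistics, run ANALYZE
--   * high "Buffers: read" counts -> cold cache or oversized scans
```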
Index selection is the highest-leverage performance decision. B-tree indexes (the default) cover equality, range, and sorting operations. GIN indexes are appropriate for JSONB, full-text search, and array containment — critical for JSON-heavy schemas. BRIN indexes are extremely compact and appropriate for monotonically-growing data like timestamps in append-only tables. Partial indexes (WHERE clause on CREATE INDEX) cover specific query patterns at a fraction of the storage and maintenance cost of full indexes.
Partial indexes and expression indexes are underused. A partial index on (status) WHERE status = 'pending' covers queue-style patterns with near-zero storage overhead. An expression index on LOWER(email) makes case-insensitive lookups instant. These pay large dividends in applications with predictable access patterns — which includes most production web applications.
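Both patterns are one-liners; these examples assume hypothetical `jobs` and `users` tables:

```sql
-- Partial index: covers only 'pending' rows, ideal for queue polling,
-- with near-zero overhead on the rest of the table.
CREATE INDEX idx_jobs_pending ON jobs (created_at)
WHERE status = 'pending';

-- Expression index: makes case-insensitive email lookups index-backed.
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- The query must use the same expression for the index to apply:
SELECT * FROM users WHERE LOWER(email) = LOWER('Alice@Example.com');
```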
2. JSONB: Document Storage in Postgres
JSONB stores JSON as a binary decomposed representation that supports GIN indexing, containment queries (@>), key existence checks (?), and path-based access. This makes Postgres capable of many document-database workloads without a separate system. A GIN index on a JSONB column enables fast queries like WHERE metadata @> '{"status": "active"}' — equivalent to a MongoDB query against an indexed field.
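In SQL terms (the `events` table here is illustrative):

```sql
-- GIN index over the whole JSONB document.
CREATE INDEX idx_events_metadata ON events USING GIN (metadata);

-- Containment query served by the GIN index:
SELECT id FROM events
WHERE metadata @> '{"status": "active"}';

-- jsonb_path_ops is a smaller, faster GIN variant that supports
-- only the @> operator — a good fit when containment is all you need:
CREATE INDEX idx_events_metadata_path
  ON events USING GIN (metadata jsonb_path_ops);
```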
The tradeoff is schema discipline. JSONB gives flexibility but loses the type safety, constraint enforcement, and query planner statistics of typed columns. Our pattern: use typed columns for fields you filter or sort on frequently (status, user_id, created_at), and JSONB for extensible metadata that varies by record and is rarely queried directly. This gives you predictable performance on hot paths and flexibility on variable-schema data.
JSONB functions are powerful but syntax-heavy. The ->> operator extracts a field as text; -> extracts as JSON; #>> and #> handle nested paths. jsonb_set() handles in-place updates, and jsonb_agg() aggregates rows into JSON arrays. For applications that heavily manipulate nested JSON structures, consider whether the data model should be normalized into relational tables instead.
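The operators side by side, against a hypothetical `products` table:

```sql
-- ->> returns text, -> returns jsonb; #>> / #> take a path.
SELECT
  metadata ->> 'status'               AS status_text,
  metadata -> 'dimensions'            AS dims_json,
  metadata #>> '{dimensions,height}'  AS height_text
FROM products;

-- jsonb_set writes a nested key (last arg: create it if missing):
UPDATE products
SET metadata = jsonb_set(metadata, '{dimensions,height}', '30', true)
WHERE id = 1;

-- jsonb_agg collapses rows into a single JSON array:
SELECT jsonb_agg(jsonb_build_object('id', id, 'name', name))
FROM products;
```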
3. pgvector: AI & RAG Embeddings
pgvector is the PostgreSQL extension for storing and querying vector embeddings — enabling semantic search and retrieval-augmented generation (RAG) directly in your existing Postgres database. Install via CREATE EXTENSION vector; store embeddings as vector(1536) columns (for OpenAI text-embedding-3-small); query with <-> (L2 distance), <=> (cosine distance), or <#> (negative inner product). For most RAG applications, cosine distance on normalized embeddings gives the best semantic similarity results.
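End to end, with an illustrative `documents` table:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id        bigserial PRIMARY KEY,
  content   text NOT NULL,
  embedding vector(1536)  -- matches text-embedding-3-small
);

-- Nearest neighbors by cosine distance (smaller = more similar).
-- $1 is the query embedding supplied by the application.
SELECT id, content, embedding <=> $1 AS distance
FROM documents
ORDER BY embedding <=> $1
LIMIT 5;
```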
pgvector supports two index types: IVFFlat and HNSW. IVFFlat divides the vector space into lists and searches only a subset — lower memory, faster builds, lower recall. HNSW (Hierarchical Navigable Small World) builds a layered graph structure — higher memory usage and slower builds, but faster queries and better recall. For production RAG systems where query latency matters, HNSW is generally the right choice. Set hnsw.ef_search higher (200+) at query time for better recall at modest throughput cost.
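Building the HNSW index and tuning recall, continuing the illustrative `documents` schema:

```sql
-- HNSW index using the cosine operator class. m and ef_construction
-- trade build time and memory for graph quality (and thus recall).
CREATE INDEX idx_documents_embedding
  ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- Raise ef_search for the current session when recall matters more
-- than raw throughput (default is 40):
SET hnsw.ef_search = 200;
```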
pgvector's major advantage is consolidation: your embeddings, metadata, and relational data live in one place. You can filter by metadata before or after vector search (hybrid queries), join to other tables, and apply your existing backup, replication, and access control infrastructure. The tradeoff vs. dedicated vector databases (Pinecone, Qdrant, Weaviate) is that pgvector's indexing performance under very high write throughput or at 100M+ vector scale is less proven. For most production RAG applications — documents, support articles, product catalogs — pgvector is the right call.
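A hybrid query is just ordinary SQL — metadata filters, joins, and vector ranking in one statement (schema is illustrative):

```sql
-- Filter by relational columns first, then rank survivors by similarity.
SELECT d.id, d.content, d.embedding <=> $1 AS distance
FROM documents d
JOIN tenants t ON t.id = d.tenant_id
WHERE t.plan = 'enterprise'
  AND d.created_at >= now() - interval '90 days'
ORDER BY d.embedding <=> $1
LIMIT 10;
```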
4. Connection Pooling & PgBouncer
PostgreSQL creates a process per connection, which means each connection has memory overhead (~5-10MB) and OS process cost. A Next.js application deployed on Vercel with 100 concurrent serverless function instances will attempt 100 simultaneous database connections — saturating the default max_connections = 100 limit before any other client connects, and exhausting connections entirely under load.
PgBouncer is the standard connection pooler for PostgreSQL. It sits between your application and Postgres, maintaining a small pool of long-lived Postgres connections and queuing application requests. In transaction pooling mode (the highest-throughput setting), a connection is held only for the duration of a transaction — allowing thousands of application connections to share dozens of Postgres connections. The caveat: transaction pooling breaks session-scoped features — most notably prepared statements (supported natively only in PgBouncer 1.21+) and session-level advisory locks — so verify your ORM or query library is compatible.
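A minimal pgbouncer.ini sketch for transaction pooling — hostnames, pool sizes, and paths are placeholders to adapt:

```ini
[databases]
appdb = host=10.0.1.5 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction     ; hold a server connection only per-transaction
default_pool_size = 20      ; Postgres connections per user/database pair
max_client_conn = 2000      ; application-side connections allowed to queue
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
```

With this shape, two thousand application connections share twenty real Postgres connections; the rest queue briefly instead of failing.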
Managed Postgres services handle pooling differently. Supabase includes a PgBouncer-based pooler on port 6543 (vs. direct connections on 5432). Neon uses a serverless proxy that handles connection pooling transparently. AWS RDS Proxy provides PgBouncer-equivalent pooling for RDS instances. For serverless deployments (Vercel, AWS Lambda), always route through a pooler — direct connections to Postgres from serverless functions will exhaust your connection limit under real load.
5. Migrations & Schema Management
Schema migrations are the most operationally dangerous part of PostgreSQL at scale. Certain DDL operations take heavyweight table locks: adding a column with a volatile DEFAULT (any DEFAULT before Postgres 11) rewrites the table, dropping a column takes a brief exclusive lock that queues behind long-running queries, and a plain CREATE INDEX blocks writes for its full duration. In a high-traffic application, these can cause visible downtime.
Patterns for zero-downtime migrations: add columns as nullable first (no lock), backfill in batches, then add constraints; use CREATE INDEX CONCURRENTLY to build indexes without locking; use RENAME COLUMN only during a maintenance window or with a multi-step deploy (add new column → dual-write → backfill → switch reads → drop old). Tools like Flyway, Liquibase, and Sqitch provide migration versioning and audit trails. Prisma Migrate and Drizzle Kit generate and apply migrations from schema definitions.
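The expand-backfill-constrain pattern in SQL, against a hypothetical `orders` table:

```sql
-- Step 1: nullable column, no default — metadata-only change, no rewrite.
ALTER TABLE orders ADD COLUMN region text;

-- Step 2: backfill in small batches to avoid long row locks
-- (run repeatedly from a job until no rows remain):
UPDATE orders SET region = 'unknown'
WHERE id IN (
  SELECT id FROM orders WHERE region IS NULL LIMIT 10000
);

-- Step 3: build any new index without blocking writes:
CREATE INDEX CONCURRENTLY idx_orders_region ON orders (region);

-- Step 4: add the constraint in two phases. NOT VALID skips the
-- full-table scan under lock; VALIDATE scans without blocking writes.
ALTER TABLE orders
  ADD CONSTRAINT orders_region_not_null CHECK (region IS NOT NULL) NOT VALID;
ALTER TABLE orders VALIDATE CONSTRAINT orders_region_not_null;
```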
Migration testing against production-scale copies is undervalued. Many teams test migrations against a small development database where they complete in milliseconds, then discover they run for 2 hours against 500M rows in production. Run migration time estimates against a production-size copy before scheduling a deploy. For very large tables, pg_repack can rebuild a table with new constraints without the long-running lock.
How We Use It in Practice
Real architectural problems across industries — and how we approach them.
Row-Level Security + pgvector: Tenant-Isolated RAG Without a Separate Vector DB
A B2B SaaS platform serving 200+ enterprise tenants needed each tenant's document knowledge base to be semantically searchable — and completely invisible to other tenants. The initial design called for a dedicated Pinecone index per tenant, which was cost-prohibitive at $70/month per tenant and operationally unmanageable. A shared index with application-layer filtering was architecturally risky.
Our approach
Single PostgreSQL database with pgvector and PostgreSQL Row-Level Security. Each embedding row carries a tenant_id column; an RLS policy restricts all SELECT operations to the authenticated tenant's ID, enforced at the database level regardless of application query. HNSW index on the embedding column with m=16, ef_construction=64. At query time, the connection is opened with SET app.current_tenant = '{id}' and the RLS policy activates automatically — no application-layer WHERE clause required, no risk of a missing filter leaking cross-tenant data. Index recall tested monthly against a held-out ground truth set; no tenant data leakage has been detected across 18 months of production operation.
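The core of this pattern can be sketched in a few statements — table and setting names here are illustrative, not the client's actual schema:

```sql
ALTER TABLE document_embeddings ENABLE ROW LEVEL SECURITY;
ALTER TABLE document_embeddings FORCE ROW LEVEL SECURITY;  -- applies to table owner too

-- Every query sees only the current tenant's rows, enforced in the engine:
CREATE POLICY tenant_isolation ON document_embeddings
  USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- At request time the application scopes its transaction to one tenant:
BEGIN;
SET LOCAL app.current_tenant = '00000000-0000-0000-0000-000000000001';
SELECT id, content FROM document_embeddings
ORDER BY embedding <=> $1 LIMIT 5;   -- no WHERE tenant_id needed
COMMIT;
```

SET LOCAL scopes the setting to the transaction, so pooled connections cannot leak one tenant's context into the next request.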
Serverless Connection Exhaustion: Next.js on Vercel + Postgres + PgBouncer
A fashion retailer running Next.js on Vercel launched a flash sale that drove 2,000 concurrent users. Within minutes, their Postgres database began refusing connections — max_connections = 100 was exhausted by Vercel's serverless function instances each holding their own direct connections. Orders were failing; the error was connection refused, not application logic.
Our approach
PgBouncer in transaction pooling mode deployed on a small EC2 instance in the same VPC as RDS. All Next.js API routes and Server Components connect through the pooler endpoint. Prisma configured with connection_limit=1 per serverless instance (the correct setting for PgBouncer transaction pooling, which manages the pool server-side). Server-side Postgres max_connections reduced to 50 — smaller pool, but fully utilized. At the next flash sale, the application served 4,000 concurrent users with 47 active Postgres connections and zero connection errors. Neon's serverless proxy was evaluated as an alternative — simpler to operate, recommended for greenfield projects.
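The Prisma side of a setup like this is mostly connection-string configuration — hostnames and credentials below are placeholders. Prisma's `pgbouncer=true` flag disables prepared statements, which transaction pooling requires; migrations go through a separate direct URL (`directUrl` in the Prisma schema) because they need a real session:

```shell
# .env — pooled endpoint for runtime queries
DATABASE_URL="postgresql://app:secret@pgbouncer.internal:6432/appdb?pgbouncer=true&connection_limit=1"
# Direct endpoint for prisma migrate, which bypasses the pooler
DIRECT_URL="postgresql://app:secret@db.internal:5432/appdb"
```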
Zero-Downtime Schema Migration on a 300M-Row Contract Table
A legal document platform needed to add a full-text search tsvector column and a GIN index to a contracts table with 300 million rows and continuous 24/7 writes. A naive ALTER TABLE would lock the table for hours. Running CREATE INDEX without CONCURRENTLY would block all reads. The team had no maintenance window — the platform operates globally across time zones.
Our approach
Multi-phase migration without a maintenance window: first added the tsvector column as nullable with no default (zero lock, instant); then populated it in batches of 50,000 rows using a background job that respected rate limits and paused during peak traffic windows; added a trigger to keep tsvector current on new inserts and updates during the backfill period; ran CREATE INDEX CONCURRENTLY to build the GIN index without blocking reads (took 4.5 hours on the live table); finally added NOT NULL constraint after confirming 100% population. Total process took 6 days of staged work. Zero application downtime, zero failed queries attributed to the migration.
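The trigger and concurrent index phases of a migration like this look roughly as follows (table and column names are illustrative):

```sql
-- Phase 1: nullable column, instant metadata-only change.
ALTER TABLE contracts ADD COLUMN search_tsv tsvector;

-- Phase 3: keep the column current during and after the backfill.
CREATE FUNCTION contracts_tsv_update() RETURNS trigger AS $$
BEGIN
  NEW.search_tsv := to_tsvector('english', coalesce(NEW.body, ''));
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_contracts_tsv
  BEFORE INSERT OR UPDATE OF body ON contracts
  FOR EACH ROW EXECUTE FUNCTION contracts_tsv_update();

-- Phase 4: build the GIN index without blocking reads or writes.
CREATE INDEX CONCURRENTLY idx_contracts_tsv
  ON contracts USING GIN (search_tsv);
```

On Postgres 12+, a `GENERATED ALWAYS AS (...) STORED` column is an alternative to the trigger, though adding one to an existing 300M-row table forces a rewrite — the trigger avoids that.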
FAQ
PostgreSQL or MongoDB?
PostgreSQL for most web applications. Its JSONB support covers the flexibility use cases that made MongoDB attractive, while adding ACID transactions, JOINs, constraints, and a mature ecosystem. Choose MongoDB when you need horizontal write scaling across shards from day one, or when your data model is genuinely document-oriented with no relational patterns. For AI applications requiring vector search, pgvector makes Postgres a strong choice that avoids adding a separate vector database.
pgvector or a dedicated vector database?
pgvector for most RAG applications. Storing embeddings in your existing Postgres instance eliminates operational complexity, enables hybrid queries (filter by metadata + vector similarity in one query), and works with your existing backup and access control infrastructure. Dedicated vector databases (Pinecone, Qdrant, Weaviate) are appropriate for very high write throughput (millions of vectors/day), specialized ANN algorithms, or multi-tenancy at extreme scale. For document Q&A, product search, and similar applications, pgvector with HNSW indexing is production-proven and sufficient.
Which managed Postgres provider?
Supabase for new projects wanting a full backend stack (auth, storage, real-time, edge functions, built-in pooling). Neon for serverless/branching workflows — database branching is uniquely useful for preview environments. AWS RDS or Aurora PostgreSQL for teams already in AWS with compliance requirements. Google Cloud SQL for GCP environments. All of these handle automated backups, failover, and patching. We typically avoid self-managed Postgres in production unless there's a specific compliance reason.
How do you run schema migrations safely in production?
Never run blocking DDL on production during peak traffic. Use CREATE INDEX CONCURRENTLY for new indexes (no table lock). Add columns as nullable first — adding NOT NULL with DEFAULT requires a full table rewrite in older Postgres versions (fixed in Postgres 11+, but verify your version). Test migration timing against a production-size data copy before scheduling. For large table modifications, pg_repack can rewrite tables without long-running locks. Use a migration framework (Flyway, Liquibase, Prisma Migrate) to version and audit changes.