Most developers reach for specialized vector databases like Pinecone or Milvus when building RAG applications. This is often an architectural mistake. By using pgvector, you keep your embeddings alongside your relational data, eliminating the need for complex synchronization pipelines and maintaining ACID compliance for your vector indices.

Why pgvector instead of a dedicated vector DB

The primary advantage of pgvector is the reduction of operational overhead. When you use a standalone vector database, you manage two separate systems: one for metadata (Postgres) and one for embeddings (e.g., Weaviate). This introduces the 'dual-write' problem: a write that succeeds in one system and fails in the other leaves the two stores inconsistent, and you need extra machinery to detect and repair the drift.

With pgvector, you combine relational filters and vector similarity search in a single query. This allows for precise filtering, such as finding the most similar documents for a specific user_id only, without retrieving thousands of candidates from a vector store and filtering them in application code.
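A minimal sketch of such a filtered query, built in Python. It assumes a documents table with a hypothetical user_id column (the schema later in this article does not define one) and a driver that uses %s placeholders, as psycopg does. The helper names are illustrative, not part of any library.

```python
from typing import Sequence, Tuple


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format floats in pgvector's text input form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_filtered_search(
    user_id: str, query_embedding: Sequence[float], k: int = 5
) -> Tuple[str, tuple]:
    """Return (sql, params) for a similarity search restricted to one user.

    Assumes a hypothetical `user_id` column on `documents`.
    """
    sql = (
        "SELECT id, content "
        "FROM documents "
        "WHERE user_id = %s "                 # relational filter
        "ORDER BY embedding <=> %s::vector "  # cosine distance, nearest first
        "LIMIT %s"
    )
    params = (user_id, to_vector_literal(query_embedding), k)
    return sql, params


sql, params = build_filtered_search("user-42", [0.12, 0.05, 0.33])
print(sql)
print(params)
```

Passing the filter value and the vector as parameters keeps both out of string interpolation. One caveat worth knowing: with an approximate index, pgvector applies the filter after the index scan, so a very selective filter can return fewer than k rows.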

Installation and Setup

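pgvector is a PostgreSQL extension. If you are using a managed service like AWS RDS or Supabase, it is usually available already and only needs to be enabled. For a local installation on Linux, you can build it from source (this requires the PostgreSQL development headers for your server version):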

git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
make install  # may need sudo

Once installed, enable the extension in your database:

CREATE EXTENSION vector;

Defining the Vector Schema

The vector type requires a fixed dimension. If you are using text-embedding-3-small from OpenAI, the dimension is 1536. For all-MiniLM-L6-v2, it is 384.

CREATE TABLE documents (
    id uuid PRIMARY KEY,
    content text,
    embedding vector(1536)
);

Distance Metrics and Querying

pgvector supports three primary distance metrics. Choosing the wrong one will result in poor search quality:

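L2 distance (Euclidean): Measured via the <-> operator. Straight-line distance between vectors, sensitive to magnitude as well as direction.
Inner product: Measured via the <#> operator. It returns the negative inner product, so smaller values still mean "more similar" under ORDER BY. For vectors normalized to unit length it ranks results the same way as cosine distance and is cheaper to compute.
Cosine distance: Measured via the <=> operator. The common default for text embeddings, as it measures the angle between vectors regardless of magnitude.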
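To make the three operators concrete, here is a plain-Python sketch of what each one computes. It is a reference for intuition only, not how pgvector implements them, and the function names are illustrative.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """What <-> computes: Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """What <#> computes: the inner product with its sign flipped."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """What <=> computes: 1 minus cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 0.0]
b = [0.0, 1.0]  # orthogonal to a
c = [3.0, 0.0]  # same direction as a, larger magnitude

print(l2_distance(a, c))             # 2.0: magnitude matters
print(cosine_distance(a, c))         # 0.0: same direction, magnitude ignored
print(negative_inner_product(a, c))  # -3.0: larger magnitude scores "closer"
print(cosine_distance(a, b))         # 1.0: orthogonal vectors
```

The a and c pair shows why the choice matters: the vectors are identical under cosine distance, two units apart under L2, and the longer vector is favored by inner product.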

To find the top 5 most similar documents to a query embedding:

SELECT content FROM documents
ORDER BY embedding <=> '[0.12, 0.05, ...]'
LIMIT 5;

Indexing for Production Performance

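Without an index, pgvector performs an exact search that compares the query against every row. That gives perfect recall, but latency grows linearly with table size and usually becomes a problem somewhere in the tens to hundreds of thousands of rows, depending on dimension and hardware. For larger tables, pgvector provides two approximate index types, IVFFlat and HNSW, which trade some recall for speed.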

IVFFlat (Inverted File Flat)

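IVFFlat divides vectors into clusters (lists) and, at query time, searches only the clusters closest to the query. It builds faster and uses less memory than HNSW, but it requires a lists parameter: the pgvector documentation suggests rows / 1000 for up to 1M rows and sqrt(rows) beyond that. Create the index after the table holds representative data, since the clusters are computed at build time, and rebuild it periodically as the data grows or its distribution shifts.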
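As a rough sketch, the starting points suggested in the pgvector documentation (rows / 1000 for up to 1M rows, sqrt(rows) above that, and sqrt(lists) for the query-time probes setting) can be turned into a small helper. The function name is illustrative, and the values are starting points to benchmark against your own recall targets.

```python
import math


def suggest_ivfflat_params(row_count: int) -> dict:
    """Starting values for IVFFlat, following the pgvector documentation's heuristics."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = round(math.sqrt(row_count))
    # More probes means better recall and slower queries.
    probes = max(1, round(math.sqrt(lists)))
    return {"lists": lists, "probes": probes}


for rows in (50_000, 1_000_000, 4_000_000):
    params = suggest_ivfflat_params(rows)
    print(f"-- {rows} rows")
    print(
        "CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) "
        f"WITH (lists = {params['lists']});"
    )
    print(f"SET ivfflat.probes = {params['probes']};")
```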

HNSW (Hierarchical Navigable Small World)

HNSW is the usual default for production vector search. It builds a multi-layered graph that allows for roughly logarithmic search time. It is slower to build and uses more memory than IVFFlat, but it offers a better speed-recall tradeoff, and because it has no training step it can be created on an empty table.

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
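An index is only used when its operator class matches the operator in the query's ORDER BY. The sketch below maps each operator to its operator class and generates the matching statement. The helper name is illustrative, and m = 16 and ef_construction = 64 are pgvector's documented defaults, written out here so they are easy to tune.

```python
# Operator used in ORDER BY -> operator class the index must be built with.
OPCLASS_FOR_OPERATOR = {
    "<->": "vector_l2_ops",
    "<#>": "vector_ip_ops",
    "<=>": "vector_cosine_ops",
}


def hnsw_index_ddl(
    table: str, column: str, operator: str, m: int = 16, ef_construction: int = 64
) -> str:
    """Build a CREATE INDEX statement for an HNSW index.

    Identifiers are interpolated directly, so pass only trusted names.
    """
    opclass = OPCLASS_FOR_OPERATOR[operator]
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} {opclass}) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


print(hnsw_index_ddl("documents", "embedding", "<=>"))
```

Higher m and ef_construction values generally improve recall at the cost of build time and index size. At query time, the hnsw.ef_search setting plays the analogous role.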

Integrating with AI Agents

Implementing a vector store is only half the battle. The other half is the retrieval loop. Most developers use LangChain or LlamaIndex, but for those building sovereign agents, direct SQL control is preferable.
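A minimal sketch of the retrieval step with direct SQL. It assumes any DB-API style cursor that accepts %s placeholders (psycopg does), and it leaves computing the query embedding to the caller. FakeCursor is a stand-in so the example runs without a database, and all names here are hypothetical.

```python
from typing import Sequence

SEARCH_SQL = (
    "SELECT content FROM documents "
    "ORDER BY embedding <=> %s::vector "
    "LIMIT %s"
)


def retrieve_context(cursor, query_embedding: Sequence[float], k: int = 5) -> str:
    """Fetch the k nearest documents and join them into one prompt context string."""
    literal = "[" + ",".join(str(float(x)) for x in query_embedding) + "]"
    cursor.execute(SEARCH_SQL, (literal, k))
    rows = cursor.fetchall()
    return "\n\n".join(row[0] for row in rows)


class FakeCursor:
    """Stand-in for a real DB-API cursor; records the query and returns canned rows."""

    def __init__(self, rows):
        self.rows = rows
        self.executed = None

    def execute(self, sql, params):
        self.executed = (sql, params)

    def fetchall(self):
        return self.rows


cursor = FakeCursor([("pgvector adds a vector type.",), ("HNSW is a graph index.",)])
print(retrieve_context(cursor, [0.12, 0.05], k=2))
```

In a real agent, the returned string would be placed into the model prompt ahead of the user's question. Swapping FakeCursor for a live cursor is the only change the function needs.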

If you are building a custom agent to manage your Postgres migrations or query optimization, you need a tool that doesn't abstract away the shell. While tools like GitHub Copilot or Cursor provide excellent inline completions, they often lack the deep terminal integration needed for database administration.

This is where AZMX AI fits in. Unlike Electron-based wrappers, AZMX is a 7 MB native app that gives you a real PTY terminal alongside a CodeMirror 6 editor. When implementing pgvector, you can run your psql migrations in the integrated terminal and use an approval-gated agent to write the complex SQL queries for your HNSW indices, ensuring no destructive DROP TABLE commands are executed without your explicit consent. You can store your vector schema definitions and indexing strategies in an AZMX.md file for project-wide memory, allowing the agent to remember your specific dimension sizes and distance metrics across sessions.

Common Pitfalls

Dimension Mismatch: Trying to insert a 768-dim vector into a 1536-dim column will throw a hard error. Always validate embedding lengths in your application layer.
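Memory Exhaustion: HNSW indexes are stored on disk like any other Postgres index, but queries are fast only while the graph stays cached in memory (shared_buffers plus the OS page cache). Once the index outgrows available memory, searches start hitting disk and latency climbs sharply. Compare the index size against available memory, and raise maintenance_work_mem so index builds can fit the graph in memory.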
Normalization: If you use inner product search, ensure your vectors are normalized to unit length; otherwise, longer vectors will dominate the results regardless of semantic similarity.
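The dimension and normalization checks above can be handled in one application-layer step before insertion. This is a sketch under stated assumptions: the function name is hypothetical, and EXPECTED_DIM matches the vector(1536) column defined earlier.

```python
import math
from typing import List, Sequence

EXPECTED_DIM = 1536  # must match the column definition, e.g. vector(1536)


def prepare_embedding(
    embedding: Sequence[float],
    expected_dim: int = EXPECTED_DIM,
    normalize: bool = True,
) -> List[float]:
    """Validate the dimension and optionally scale the vector to unit length."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(embedding)}"
        )
    values = [float(x) for x in embedding]
    if not normalize:
        return values
    norm = math.sqrt(sum(x * x for x in values))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in values]


# Small dimension used here only to keep the demonstration readable.
unit = prepare_embedding([3.0, 4.0], expected_dim=2)
print(unit)  # [0.6, 0.8]

try:
    prepare_embedding([0.1] * 768)
except ValueError as exc:
    print(exc)  # expected 1536 dimensions, got 768
```

Failing early in the application gives a clearer error than a rejected INSERT, and normalizing at write time means inner product and cosine distance produce the same ranking.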

Conclusion

pgvector transforms PostgreSQL from a relational store into a hybrid search engine. By avoiding the complexity of a dedicated vector database and leveraging HNSW indices, you can build scalable RAG systems with minimal architectural friction. For developers managing these systems, using a lightweight, native environment like AZMX AI ensures that your AI assistance stays close to the metal and under your direct control.

Vector Search with pgvector