Vector Search Tutorial¶
Version: 1.0
Created: 2025-12-01
Target Audience: Developers building AI/ML applications
Prerequisites: Basic SQL knowledge, understanding of embeddings
Estimated Time: 45-60 minutes
Table of Contents¶
- Introduction
- Vector Basics
- Creating Vector Tables
- Generating Embeddings
- HNSW Indexing
- Similarity Search
- Product Quantization
- Real-World Examples
- Performance Tuning
- Monitoring
1. Introduction¶
What is Vector Search?¶
Vector search (also known as semantic search or similarity search) enables AI applications to find similar items based on meaning rather than exact keyword matches. Instead of searching for exact text, vector search compares mathematical representations (embeddings) of content to find semantically similar items.
Traditional Keyword Search vs Vector Search:
Query: "automobile repair"
Keyword Search Results:
✓ "automobile repair shop"
✗ "car maintenance service" (missed - different words)
✗ "vehicle fixing center" (missed - different words)
Vector Search Results:
✓ "automobile repair shop" (exact match)
✓ "car maintenance service" (semantically similar)
✓ "vehicle fixing center" (semantically similar)
Use Cases¶
| Use Case | Description | Typical Dimensions |
|---|---|---|
| Semantic Search | Find documents by meaning, not keywords | 384-768 |
| Recommendations | Suggest similar products, articles, or content | 128-512 |
| RAG (Retrieval Augmented Generation) | Retrieve relevant context for LLM prompts | 384-1536 |
| Image Similarity | Find visually similar images | 512-2048 |
| Anomaly Detection | Identify outliers in time-series or sensor data | 64-256 |
| Duplicate Detection | Find near-duplicate content | 256-768 |
How HeliosDB-Lite Implements It¶
HeliosDB-Lite provides native vector search capabilities with:
┌─────────────────────────────────────────────────────────────┐
│ HeliosDB-Lite Vector Search │
├─────────────────────────────────────────────────────────────┤
│ SQL Interface: VECTOR type, distance operators, indexes │
├─────────────────────────────────────────────────────────────┤
│ HNSW Algorithm: Graph-based approximate nearest neighbors │
├─────────────────────────────────────────────────────────────┤
│ SIMD Acceleration: AVX2/NEON for 2-6x faster distances │
├─────────────────────────────────────────────────────────────┤
│ Product Quantization: 384x memory compression │
├─────────────────────────────────────────────────────────────┤
│ RocksDB Storage: Persistent, crash-safe index storage │
└─────────────────────────────────────────────────────────────┘
Key Benefits:
- Embedded (no separate server required)
- Sub-millisecond query latency for millions of vectors
- SIMD-accelerated distance calculations
- Memory-efficient with Product Quantization
- Compatible with pgvector syntax
2. Vector Basics¶
Understanding Embeddings¶
Embeddings are mathematical representations of content (text, images, audio) as fixed-length arrays of floating-point numbers. Each dimension captures semantic information.
Text: "The cat sat on the mat"
↓ (embedding model)
Vector: [0.12, -0.45, 0.78, 0.34, -0.12, ...] (384 dimensions)
Text: "A feline rested on the rug"
↓ (embedding model)
Vector: [0.15, -0.42, 0.81, 0.36, -0.09, ...] (similar values!)
Properties of Good Embeddings:
- Similar content → similar vectors (small distance)
- Different content → different vectors (large distance)
- Fixed dimensionality (all vectors same length)
- Typically normalized (unit length) for cosine similarity
Dimension Considerations¶
Choosing the right dimension depends on your use case and embedding model:
| Model | Dimension | Use Case | Speed | Accuracy |
|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | Fast semantic search | Fastest | Good |
| all-mpnet-base-v2 | 768 | Balanced performance | Fast | Better |
| text-embedding-ada-002 (OpenAI) | 1536 | High accuracy | Medium | Best |
| ResNet-50 (images) | 2048 | Image features | Medium | Excellent |
Rules of Thumb:
- Higher dimensions: Better accuracy, more memory, slower queries
- Lower dimensions: Faster queries, less memory, may lose nuance
- Common choices: 384 (fast), 768 (balanced), 1536 (accurate)
Distance Metrics¶
HeliosDB-Lite supports three distance metrics to measure vector similarity:
L2 Distance (Euclidean)¶
Measures straight-line distance in N-dimensional space.
Formula: distance = √(Σ(a[i] - b[i])²)
Example:
Vector A: [1.0, 0.0, 0.0]
Vector B: [0.0, 1.0, 0.0]
L2 Distance: √((1-0)² + (0-1)² + (0-0)²) = √2 ≈ 1.414
Use when: General-purpose similarity, raw feature vectors
Operator: <->
Range: [0, ∞) (0 = identical, larger = more different)
Cosine Similarity (Cosine Distance)¶
Measures angle between vectors (ignores magnitude).
Formula: distance = 1 - (A·B / (||A|| × ||B||))
Example:
Vector A: [1.0, 0.0, 0.0] (normalized)
Vector B: [0.707, 0.707, 0.0] (normalized)
Cosine Distance: 1 - 0.707 ≈ 0.293
Use when: Text embeddings, normalized vectors
Operator: <=>
Range: [0, 2] (0 = identical, 2 = opposite direction)
Why Cosine for Text?
- Text embeddings are typically normalized (unit length)
- Cosine ignores magnitude, focuses on semantic direction
- Most NLP models optimize for cosine similarity
Inner Product (Dot Product)¶
Measures alignment between vectors (considers both angle and magnitude).
Formula: distance = -(A·B) = -Σ(a[i] × b[i])
(negative because smaller = better in HeliosDB-Lite)
Example:
Vector A: [1.0, 2.0, 3.0]
Vector B: [4.0, 5.0, 6.0]
Inner Product Distance: -(4 + 10 + 18) = -32
Use when: Recommendation systems, non-normalized vectors
Operator: <#>
Range: (-∞, 0] (more negative = more similar)
Comparison Table:
| Metric | When to Use | Normalized? | SQL Operator |
|---|---|---|---|
| L2 | General similarity, image features | No | <-> |
| Cosine | Text embeddings, semantic search | Yes (recommended) | <=> |
| Inner Product | Recommendations, scoring | No | <#> |
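The three worked examples above can be checked directly with NumPy (an illustration of the metrics themselves, not of HeliosDB-Lite internals):

```python
import numpy as np

# L2 (Euclidean) distance between the example vectors
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
l2 = np.linalg.norm(a - b)
print(round(l2, 3))  # 1.414

# Cosine distance: 1 - cos(angle between vectors)
c = np.array([0.707, 0.707, 0.0])
cosine = 1.0 - np.dot(a, c) / (np.linalg.norm(a) * np.linalg.norm(c))
print(round(cosine, 3))  # 0.293

# Inner-product "distance": negated dot product (smaller = more similar)
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
ip = -np.dot(u, v)
print(ip)  # -32.0
```

The SQL operators `<->`, `<=>`, and `<#>` compute exactly these quantities, so you can use a snippet like this to sanity-check query results.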
3. Creating Vector Tables¶
VECTOR Data Type Syntax¶
The VECTOR(n) data type stores fixed-dimension floating-point arrays.
-- Basic syntax
VECTOR(dimension)
-- Examples
VECTOR(3) -- 3-dimensional vector
VECTOR(384) -- Common for MiniLM embeddings
VECTOR(768) -- Common for BERT embeddings
VECTOR(1536) -- OpenAI ada-002 embeddings
Example Table Creation¶
Example 1: Simple Document Embeddings¶
-- Semantic search for documents
CREATE TABLE documents (
id INT PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
embedding VECTOR(384) -- all-MiniLM-L6-v2 embeddings
);
Example 2: Multi-Column Vectors¶
-- Product catalog with multiple embedding types
CREATE TABLE products (
product_id INT PRIMARY KEY,
name TEXT NOT NULL,
description TEXT,
price DECIMAL(10, 2),
category TEXT,
-- Text embedding for name + description
text_embedding VECTOR(768),
-- Image embedding for product photos
image_embedding VECTOR(512),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Example 3: User Preferences for Recommendations¶
-- User preference vectors for collaborative filtering
CREATE TABLE user_profiles (
user_id INT PRIMARY KEY,
username TEXT NOT NULL,
preference_vector VECTOR(128),
last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Item vectors for recommendation matching
CREATE TABLE items (
item_id INT PRIMARY KEY,
title TEXT NOT NULL,
item_vector VECTOR(128),
popularity_score FLOAT DEFAULT 0.0
);
Example 4: Time-Series Anomaly Detection¶
-- Sensor readings with feature vectors
CREATE TABLE sensor_readings (
reading_id INT PRIMARY KEY,
sensor_id TEXT NOT NULL,
timestamp TIMESTAMP NOT NULL,
raw_values TEXT, -- JSON or serialized data
feature_vector VECTOR(64), -- Extracted features
is_anomaly BOOLEAN DEFAULT FALSE
);
Example 5: Image Similarity Search¶
-- Image database with visual features
CREATE TABLE images (
image_id INT PRIMARY KEY,
filename TEXT NOT NULL,
upload_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
tags TEXT, -- Comma-separated tags
-- ResNet-50 feature vector
features VECTOR(2048),
-- Metadata
width INT,
height INT,
file_size INT
);
Table Design Best Practices¶
- Choose dimension based on your embedding model
- Add metadata columns for filtering
- Use appropriate primary keys
- Consider NULL handling
4. Generating Embeddings¶
Using OpenAI API (Python)¶
The OpenAI API provides high-quality text embeddings via text-embedding-ada-002 (1536 dimensions).
Installation¶
pip install openai
Basic Usage¶
from openai import OpenAI

# Configure API key (openai>=1.0 client interface)
client = OpenAI(api_key="your-api-key-here")

def get_embedding(text, model="text-embedding-ada-002"):
    """Generate embedding for a single text string."""
    response = client.embeddings.create(
        input=text,
        model=model
    )
    return response.data[0].embedding
# Example
text = "The quick brown fox jumps over the lazy dog"
embedding = get_embedding(text)
print(f"Dimension: {len(embedding)}") # 1536
print(f"First 5 values: {embedding[:5]}")
# Output: [0.0234, -0.0156, 0.0789, -0.0412, 0.0567]
Batch Processing¶
def get_embeddings_batch(texts, model="text-embedding-ada-002"):
    """Generate embeddings for multiple texts (up to 2048 inputs per request)."""
    response = client.embeddings.create(
        input=texts,
        model=model
    )
    return [item.embedding for item in response.data]
# Process multiple documents
documents = [
"First document text",
"Second document text",
"Third document text"
]
embeddings = get_embeddings_batch(documents)
print(f"Generated {len(embeddings)} embeddings")
Storing in HeliosDB-Lite¶
import heliosdb # Hypothetical Python client
# Connect to database
db = heliosdb.connect("mydb.db")
# Generate embedding
text = "Sample document content"
embedding = get_embedding(text)
# Convert to SQL array format
embedding_str = "[" + ",".join(map(str, embedding)) + "]"
# Insert into database
db.execute("""
INSERT INTO documents (id, content, embedding)
VALUES (?, ?, ?)
""", (1, text, embedding_str))
db.commit()
Using sentence-transformers (Python)¶
The sentence-transformers library provides efficient local embedding generation (no API calls).
Installation¶
pip install sentence-transformers
Basic Usage¶
from sentence_transformers import SentenceTransformer
# Load model (downloads on first use, cached afterward)
model = SentenceTransformer('all-MiniLM-L6-v2')
# Generate single embedding
text = "This is a sample sentence"
embedding = model.encode(text)
print(f"Dimension: {len(embedding)}") # 384
print(f"Type: {type(embedding)}") # numpy.ndarray
print(f"First 5 values: {embedding[:5]}")
# Output: [0.0234, -0.0156, 0.0789, -0.0412, 0.0567]
Batch Processing (Efficient)¶
# Process multiple texts in parallel
texts = [
"First sentence",
"Second sentence",
"Third sentence"
]
# Batch encoding (much faster than loop)
embeddings = model.encode(texts, batch_size=32)
print(f"Shape: {embeddings.shape}") # (3, 384)
# Iterate over results
for i, emb in enumerate(embeddings):
print(f"Text {i}: dimension={len(emb)}")
Complete Example: Indexing Documents¶
from sentence_transformers import SentenceTransformer
import sqlite3 # or heliosdb client
# Initialize model
model = SentenceTransformer('all-MiniLM-L6-v2')
# Sample documents
documents = [
{"id": 1, "text": "Python is a programming language"},
{"id": 2, "text": "Machine learning uses algorithms"},
{"id": 3, "text": "Databases store structured data"},
{"id": 4, "text": "Neural networks process information"},
]
# Generate embeddings
texts = [doc["text"] for doc in documents]
embeddings = model.encode(texts, show_progress_bar=True)
# Connect to HeliosDB-Lite
conn = sqlite3.connect("mydb.db") # Use HeliosDB-Lite client in production
cursor = conn.cursor()
# Create table
cursor.execute("""
CREATE TABLE IF NOT EXISTS documents (
id INT PRIMARY KEY,
text TEXT,
embedding VECTOR(384)
)
""")
# Insert documents with embeddings
for doc, emb in zip(documents, embeddings):
emb_str = "[" + ",".join(map(str, emb)) + "]"
cursor.execute("""
INSERT INTO documents (id, text, embedding)
VALUES (?, ?, ?)
""", (doc["id"], doc["text"], emb_str))
conn.commit()
print(f"Indexed {len(documents)} documents")
Popular Models Comparison¶
| Model | Dimensions | Speed | Quality | Use Case |
|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | Fastest | Good | General semantic search |
| all-mpnet-base-v2 | 768 | Fast | Better | Balanced quality/speed |
| all-MiniLM-L12-v2 | 384 | Fast | Good | Slightly better than L6 |
| multi-qa-mpnet-base-dot-v1 | 768 | Medium | Best | Q&A and retrieval |
| paraphrase-multilingual-mpnet-base-v2 | 768 | Medium | Best | Multilingual support |
Custom Model Fine-Tuning¶
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader
# Load base model
model = SentenceTransformer('all-MiniLM-L6-v2')
# Create training data (text pairs with similarity scores)
train_examples = [
InputExample(texts=["query 1", "relevant doc 1"], label=1.0),
InputExample(texts=["query 1", "irrelevant doc"], label=0.0),
# Add more examples...
]
# Create data loader
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
# Define loss function
train_loss = losses.CosineSimilarityLoss(model)
# Fine-tune
model.fit(
train_objectives=[(train_dataloader, train_loss)],
epochs=1,
warmup_steps=100
)
# Save fine-tuned model
model.save("./my-finetuned-model")
# Use fine-tuned model
custom_model = SentenceTransformer("./my-finetuned-model")
Using Other Embedding Providers¶
Cohere API¶
import cohere
co = cohere.Client("your-api-key")
response = co.embed(
texts=["Text to embed"],
model="embed-english-v3.0"
)
embeddings = response.embeddings # List of vectors
Hugging Face Transformers¶
from transformers import AutoTokenizer, AutoModel
import torch
# Load model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
# Generate embedding
text = "Sample text"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
outputs = model(**inputs)
# Use [CLS] token embedding or mean pooling
embedding = outputs.last_hidden_state[:, 0, :].numpy()[0]
print(f"Dimension: {len(embedding)}") # 768 for BERT-base
Best Practices for Embedding Generation¶
- Batch processing for efficiency
- Normalize vectors for cosine similarity
- Cache embeddings to avoid recomputation
- Handle long texts appropriately:

import numpy as np

# Truncate or chunk long documents
max_length = 512  # character budget (rough proxy for the model's token limit)

# Option 1: Truncate
text = text[:max_length]

# Option 2: Chunk and average
chunks = [text[i:i+max_length] for i in range(0, len(text), max_length)]
chunk_embeddings = model.encode(chunks)
avg_embedding = np.mean(chunk_embeddings, axis=0)
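For the normalization point above, a minimal NumPy sketch: for unit-length vectors, cosine distance equals half the squared L2 distance, so either operator ranks the same results identically.

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length (direction is preserved)."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

unit = normalize(np.array([3.0, 4.0]))  # -> [0.6, 0.8]

# For unit vectors: cosine distance == squared L2 distance / 2
a = normalize(np.array([1.0, 2.0]))
b = normalize(np.array([2.0, 1.0]))
cos_dist = 1.0 - np.dot(a, b)
l2_sq_half = np.linalg.norm(a - b) ** 2 / 2
print(round(float(cos_dist), 6), round(float(l2_sq_half), 6))  # 0.2 0.2
```

With sentence-transformers, `model.encode(texts, normalize_embeddings=True)` returns already-normalized vectors, which avoids a separate normalization pass.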
5. HNSW Indexing¶
What is HNSW?¶
HNSW (Hierarchical Navigable Small World) is a graph-based algorithm for approximate nearest neighbor search. It constructs a multi-layer graph where each layer contains progressively fewer nodes, enabling fast logarithmic-time searches.
How HNSW Works¶
Layer 2: A ←─────→ D (Entry point, sparse)
↓ ↓
Layer 1: A ←→ B ←→ D ←→ E (Medium density)
↓ ↓ ↓ ↓
Layer 0: A ←→ B ←→ C ←→ D ←→ E ←→ F (All nodes, fully connected)
Search Process:
1. Start at entry point in top layer (Layer 2: node A)
2. Greedily navigate to closest node in current layer (A → D)
3. Drop down to the next layer and continue the greedy search (D → E)
4. Repeat until bottom layer (Layer 0)
5. Perform local search in bottom layer for k neighbors
Key Properties:
- Approximate: Finds ~95-99% of true nearest neighbors (configurable)
- Fast: O(log N) query time vs O(N) for brute force
- Scalable: Handles millions to billions of vectors
- Memory-efficient: ~50-100 bytes per vector
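The greedy navigation step at the heart of HNSW can be sketched on a single layer. This toy example (the graph and vectors are invented for illustration; real HNSW repeats this across layers and keeps an `ef`-sized candidate list at the bottom) shows the "move to the closest neighbor until stuck" loop:

```python
import numpy as np

# Layer 0 from the diagram above, as vectors plus adjacency lists
vectors = {
    "A": np.array([0.0, 0.0]),
    "B": np.array([1.0, 0.0]),
    "C": np.array([2.0, 0.0]),
    "D": np.array([3.0, 0.0]),
    "E": np.array([4.0, 0.0]),
}
graph = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D"],
}

def greedy_search(query, entry="A"):
    """Greedily walk the graph toward the query vector."""
    current = entry
    while True:
        dist = np.linalg.norm(vectors[current] - query)
        # Candidate: the neighbor closest to the query
        best = min(graph[current],
                   key=lambda n: np.linalg.norm(vectors[n] - query))
        if np.linalg.norm(vectors[best] - query) >= dist:
            return current  # local minimum: no neighbor is closer
        current = best

print(greedy_search(np.array([3.9, 0.0])))  # E
```

Each hop halves-or-better the remaining distance on a well-built graph, which is where the O(log N) query time comes from.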
Creating HNSW Indexes¶
Basic Index Creation¶
-- Create HNSW index on vector column
CREATE INDEX embedding_idx ON documents USING hnsw (embedding);
With Custom Table¶
-- Full example
CREATE TABLE articles (
id INT PRIMARY KEY,
title TEXT,
content TEXT,
embedding VECTOR(768)
);
-- Create HNSW index
CREATE INDEX articles_embedding_idx ON articles USING hnsw (embedding);
Multiple Indexes¶
-- Table with multiple vector columns
CREATE TABLE products (
product_id INT PRIMARY KEY,
text_embedding VECTOR(768),
image_embedding VECTOR(512)
);
-- Create separate indexes for each vector column
CREATE INDEX products_text_idx ON products USING hnsw (text_embedding);
CREATE INDEX products_image_idx ON products USING hnsw (image_embedding);
Index Parameters¶
HNSW indexes have two critical parameters that control the accuracy/speed tradeoff:
Parameter: m (Max Connections)¶
Number of bidirectional links per node in the graph.
-- Default: m = 16
CREATE INDEX idx ON docs USING hnsw (embedding);
-- Custom m parameter (if supported in future versions)
-- CREATE INDEX idx ON docs USING hnsw (embedding) WITH (m = 32);
Rules of Thumb:
- m = 16: Default, good balance (recommended)
- m = 8: Faster build, less memory, lower accuracy
- m = 32: Slower build, more memory, higher accuracy
- m = 64: Very accurate but memory-intensive
Memory Impact:
- Memory per vector ≈ m × 8 bytes (for pointers/IDs)
- m=16: ~128 bytes per vector
- m=32: ~256 bytes per vector
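The figures above translate into a quick back-of-the-envelope estimator. This is a sketch built only from the rules of thumb in this section (real implementations add per-layer overhead, so treat the result as a floor):

```python
def hnsw_memory_estimate(n_vectors, dim, m=16, bytes_per_link=8):
    """Rough HNSW memory floor: raw float32 vectors plus
    ~m links of ~8 bytes each per node (per the rules above)."""
    vector_bytes = n_vectors * dim * 4            # float32 storage
    graph_bytes = n_vectors * m * bytes_per_link  # link lists
    return vector_bytes + graph_bytes

# 1M 384-dim vectors with the default m=16:
total = hnsw_memory_estimate(1_000_000, 384)
print(f"{total / 2**30:.2f} GiB")  # 1.55 GiB
```

Doubling m to 32 adds only ~128 MB here; for most workloads the raw vectors, not the graph links, dominate memory.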
Parameter: ef_construction (Construction Search Depth)¶
Size of dynamic candidate list during index construction.
-- Default: ef_construction = 200
CREATE INDEX idx ON docs USING hnsw (embedding);
-- Custom ef_construction (if supported in future versions)
-- CREATE INDEX idx ON docs USING hnsw (embedding) WITH (ef_construction = 400);
Rules of Thumb:
- ef_construction = 100: Fast build, lower quality graph
- ef_construction = 200: Default, good balance (recommended)
- ef_construction = 400: Slower build, higher quality graph
- ef_construction = 800: Very slow but maximum quality
Build Time Impact:
- Higher ef_construction = slower index builds (linear increase)
- But better recall at query time
Recommendations by Dataset Size¶
| Dataset Size | m | ef_construction | Expected Recall@10 |
|---|---|---|---|
| <10K vectors | 16 | 200 | >99% |
| 10K-100K | 16 | 200 | >98% |
| 100K-1M | 16 | 200 | >95% |
| 1M-10M | 32 | 400 | >95% |
| >10M | 32-64 | 400-800 | >93% |
When to Create Indexes¶
Strategy 1: Index Before Insertion (Empty Table)
-- Create index first, then insert data
CREATE TABLE docs (id INT, vec VECTOR(384));
CREATE INDEX docs_idx ON docs USING hnsw (vec);
-- Insert data (index updates automatically)
INSERT INTO docs VALUES (1, '[0.1, 0.2, ...]');
INSERT INTO docs VALUES (2, '[0.3, 0.4, ...]');
Pros: Incremental index updates
Cons: Slower individual inserts
Strategy 2: Bulk Load Then Index
-- Insert data first
CREATE TABLE docs (id INT, vec VECTOR(384));
-- Bulk insert (fast, no index overhead)
INSERT INTO docs VALUES (1, '[0.1, 0.2, ...]');
INSERT INTO docs VALUES (2, '[0.3, 0.4, ...]');
-- ... (thousands or millions of rows)
-- Create index after all data loaded
CREATE INDEX docs_idx ON docs USING hnsw (vec);
Pros: Faster bulk loading
Cons: Index build at end can be slow for large datasets
Recommendation: Use Strategy 2 for initial bulk loads, Strategy 1 for incremental updates.
Index Build Time Estimates¶
| Dataset Size | Dimension | Expected Build Time |
|---|---|---|
| 1,000 vectors | 384 | <1 second |
| 10,000 vectors | 384 | 1-3 seconds |
| 100,000 vectors | 384 | 10-30 seconds |
| 1,000,000 vectors | 384 | 2-5 minutes |
| 10,000,000 vectors | 768 | 30-60 minutes |
Times on commodity hardware (4-core CPU, 16GB RAM)
Verifying Index Creation¶
-- Check indexes on a table
\d documents
-- Expected output:
-- Indexes:
-- "documents_embedding_idx" hnsw (embedding)
6. Similarity Search¶
KNN Queries with <-> Operator (L2 Distance)¶
The <-> operator performs L2 (Euclidean) distance calculation.
Basic KNN Query¶
-- Find 5 nearest neighbors using L2 distance
SELECT id, title, embedding <-> '[0.1, 0.2, 0.3, ...]' AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, 0.3, ...]'
LIMIT 5;
Output:
id | title | distance
----|---------------------|----------
42 | "Machine Learning" | 0.234
17 | "Neural Networks" | 0.456
89 | "Deep Learning" | 0.678
23 | "AI Algorithms" | 0.789
56 | "Data Science" | 0.901
With Parameterized Queries¶
# Python example with parameter binding
query_vector = model.encode("machine learning tutorial")
query_str = "[" + ",".join(map(str, query_vector)) + "]"
results = db.execute("""
SELECT id, title, embedding <-> ? AS distance
FROM documents
ORDER BY embedding <-> ?
LIMIT 10
""", (query_str, query_str))
Practical Example: Document Search¶
-- Setup
CREATE TABLE articles (
id INT PRIMARY KEY,
title TEXT,
author TEXT,
content TEXT,
embedding VECTOR(384)
);
CREATE INDEX articles_idx ON articles USING hnsw (embedding);
-- Insert sample data (embeddings pre-generated)
INSERT INTO articles VALUES
(1, 'Intro to Python', 'Alice', '...', '[0.12, -0.34, ...]'),
(2, 'Advanced ML', 'Bob', '...', '[0.45, 0.67, ...]'),
(3, 'Database Design', 'Carol', '...', '[-0.23, 0.56, ...]');
-- Search: Find articles similar to "python programming"
-- (query_embedding generated from "python programming")
SELECT
id,
title,
author,
embedding <-> '[0.15, -0.30, ...]' AS distance
FROM articles
ORDER BY embedding <-> '[0.15, -0.30, ...]'
LIMIT 5;
Cosine Similarity with <=> Operator¶
The <=> operator calculates cosine distance (1 - cosine similarity).
Basic Cosine Search¶
-- Find 10 most similar documents using cosine distance
SELECT id, content, embedding <=> '[0.1, 0.2, ...]' AS cosine_distance
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;
Why Cosine for Text?
- Ignores vector magnitude (length), focuses on direction
- Most text embedding models produce normalized vectors
- Better semantic similarity for text than L2 distance
Converting Distance to Similarity Score¶
-- Cosine similarity = 1 - cosine distance
SELECT
id,
title,
1 - (embedding <=> '[0.1, 0.2, ...]') AS similarity_score
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;
Output:
id | title | similarity_score
----|---------------------|------------------
42 | "Machine Learning" | 0.987 (very similar)
17 | "Neural Networks" | 0.923
89 | "Deep Learning" | 0.876
23 | "AI Algorithms" | 0.834
56 | "Data Science" | 0.801
Practical Example: Semantic Q&A¶
-- FAQ database with questions and answers
CREATE TABLE faqs (
faq_id INT PRIMARY KEY,
question TEXT,
answer TEXT,
question_embedding VECTOR(384)
);
CREATE INDEX faqs_idx ON faqs USING hnsw (question_embedding);
-- Find most relevant FAQ for user query
-- Query: "How do I reset my password?"
-- (query_embedding pre-generated)
SELECT
question,
answer,
(1 - (question_embedding <=> '[0.23, -0.45, ...]')) AS relevance
FROM faqs
ORDER BY question_embedding <=> '[0.23, -0.45, ...]'
LIMIT 3;
Output:
question | answer | relevance
------------------------------|----------------------------|----------
"Password reset instructions" | "Go to Settings > Reset" | 0.95
"Forgot password recovery" | "Click 'Forgot Password'" | 0.89
"Change account credentials" | "Visit Account Settings" | 0.76
Inner Product with <#> Operator¶
The <#> operator calculates negative inner product (negative because smaller = better).
Basic Inner Product Search¶
-- Find top matches using inner product
SELECT id, name, embedding <#> '[1.0, 0.5, 0.3, ...]' AS score
FROM items
ORDER BY embedding <#> '[1.0, 0.5, 0.3, ...]'
LIMIT 10;
When to Use Inner Product:
- Recommendation systems (user-item matching)
- Non-normalized vectors
- When magnitude matters (popularity, confidence scores)
Practical Example: Product Recommendations¶
-- User preference vectors
CREATE TABLE users (
user_id INT PRIMARY KEY,
username TEXT,
preference_vector VECTOR(128)
);
-- Product feature vectors
CREATE TABLE products (
product_id INT PRIMARY KEY,
name TEXT,
price DECIMAL(10, 2),
product_vector VECTOR(128)
);
CREATE INDEX products_idx ON products USING hnsw (product_vector);
-- Recommend products for user 42
-- (combining user preference with product features)
SELECT
p.product_id,
p.name,
p.price,
p.product_vector <#> u.preference_vector AS match_score
FROM products p, users u
WHERE u.user_id = 42
ORDER BY p.product_vector <#> u.preference_vector
LIMIT 20;
Combining with WHERE Filters¶
Combine vector similarity with traditional SQL filters for powerful hybrid search.
Filter Before Vector Search¶
-- Find similar documents in specific category
SELECT id, title, embedding <=> '[0.1, ...]' AS distance
FROM documents
WHERE category = 'technology' -- Pre-filter
AND publish_date > '2024-01-01'
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 10;
Multi-Condition Filtering¶
-- Complex filtering + vector search
SELECT
id,
title,
author,
price,
embedding <-> '[0.2, ...]' AS distance
FROM books
WHERE category IN ('science', 'technology')
AND price < 50.00
AND rating >= 4.0
AND in_stock = TRUE
ORDER BY embedding <-> '[0.2, ...]'
LIMIT 5;
Distance Threshold Filtering¶
-- Only return results within distance threshold
SELECT id, title, embedding <=> '[0.1, ...]' AS distance
FROM documents
WHERE (embedding <=> '[0.1, ...]') < 0.5 -- Similarity threshold
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 100;
Practical Example: Location + Semantic Search¶
CREATE TABLE restaurants (
restaurant_id INT PRIMARY KEY,
name TEXT,
cuisine TEXT,
latitude FLOAT,
longitude FLOAT,
description_embedding VECTOR(384)
);
-- Find restaurants matching "cozy italian place"
-- within 5 miles of user location (40.7128, -74.0060)
SELECT
name,
cuisine,
description_embedding <=> '[0.12, -0.34, ...]' AS relevance,
SQRT(POW(latitude - 40.7128, 2) + POW(longitude - (-74.0060), 2)) * 69 AS miles
FROM restaurants
WHERE cuisine = 'Italian'
AND (SQRT(POW(latitude - 40.7128, 2) + POW(longitude - (-74.0060), 2)) * 69) < 5
ORDER BY description_embedding <=> '[0.12, -0.34, ...]'
LIMIT 10;
Advanced Search Patterns¶
Hybrid Search: Keyword + Vector¶
-- Combine full-text search with vector similarity
SELECT
id,
title,
embedding <=> '[0.1, ...]' AS vector_score,
CASE
WHEN title LIKE '%machine learning%' THEN 0.5
ELSE 1.0
END AS keyword_penalty
FROM documents
WHERE content LIKE '%neural network%' -- Keyword filter
ORDER BY (embedding <=> '[0.1, ...]') * keyword_penalty
LIMIT 10;
Multi-Vector Search¶
-- Search across multiple vector types
SELECT
product_id,
name,
text_embedding <=> '[0.1, ...]' AS text_score,
image_embedding <-> '[0.2, ...]' AS image_score,
(text_embedding <=> '[0.1, ...]') +
(image_embedding <-> '[0.2, ...]') AS combined_score
FROM products
ORDER BY combined_score
LIMIT 10;
Weighted Multi-Vector Search¶
-- Weight different embedding types
SELECT
product_id,
name,
(0.7 * (text_embedding <=> '[0.1, ...]')) +
(0.3 * (image_embedding <-> '[0.2, ...]')) AS weighted_score
FROM products
ORDER BY weighted_score
LIMIT 10;
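One caveat with weighted multi-vector queries like the one above: cosine distances live in [0, 2] while L2 distances are unbounded, so fixed weights can let one metric dominate. A common refinement (an application-side pattern, not a HeliosDB-Lite feature) is to min-max normalize each metric's scores over the candidate set before weighting:

```python
import numpy as np

def minmax(scores):
    """Rescale scores to [0, 1] so metrics with different natural
    ranges (cosine in [0, 2], L2 in [0, inf)) become comparable."""
    lo, hi = np.min(scores), np.max(scores)
    return np.zeros_like(scores) if hi == lo else (scores - lo) / (hi - lo)

# Hypothetical per-candidate scores fetched from two vector columns
text_scores = np.array([0.12, 0.40, 0.95])   # cosine distances
image_scores = np.array([3.1, 0.8, 5.6])     # L2 distances

combined = 0.7 * minmax(text_scores) + 0.3 * minmax(image_scores)
best = int(np.argmin(combined))  # smaller = more similar throughout
print(best)  # 0
```

In practice you would over-fetch (say, the top 100 by each metric), normalize and combine client-side, then keep the top k.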
7. Product Quantization¶
What is Product Quantization?¶
Product Quantization (PQ) is a compression technique that reduces vector storage by 8-384x with minimal accuracy loss (<5% typically).
How PQ Works¶
Original Vector (768 dimensions):
[0.123, -0.456, 0.789, ..., 0.234] → 768 × 4 bytes = 3,072 bytes
Product Quantization:
1. Split into M=8 sub-vectors (96 dimensions each)
2. Map each sub-vector to nearest centroid (1 byte code)
3. Store 8 codes instead of 768 floats
Quantized: [42, 17, 89, 123, 5, 67, 201, 34] → 8 bytes
Compression: 3,072 bytes → 8 bytes = 384x reduction
Key Concepts:
- M (sub-quantizers): Number of chunks (typical: 8-64)
- K (centroids): Codebook size per chunk (typical: 256 for 1-byte codes)
- Codebook: Learned lookup table mapping codes to vectors
- ADC (Asymmetric Distance Computation): Fast approximate distance
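The split/assign/reconstruct cycle can be demonstrated end to end in a few dozen lines of NumPy. This is a toy sketch, not HeliosDB-Lite's implementation: K is 16 instead of 256 to keep training fast, and the k-means trainer is deliberately naive.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_codebooks(data, M, K, iters=10):
    """Train one K-centroid codebook per sub-vector chunk
    (toy k-means; assumes dimension is divisible by M)."""
    d = data.shape[1] // M
    books = []
    for m in range(M):
        sub = data[:, m*d:(m+1)*d]
        cents = sub[rng.choice(len(sub), K, replace=False)]
        for _ in range(iters):
            # Assign each sub-vector to its nearest centroid, then update.
            idx = np.argmin(
                ((sub[:, None, :] - cents[None]) ** 2).sum(-1), axis=1)
            for k in range(K):
                if np.any(idx == k):
                    cents[k] = sub[idx == k].mean(axis=0)
        books.append(cents)
    return books

def encode(v, books):
    """Map a vector to M one-byte codes (nearest centroid per chunk)."""
    d = len(v) // len(books)
    return np.array([
        np.argmin(((books[m] - v[m*d:(m+1)*d]) ** 2).sum(-1))
        for m in range(len(books))], dtype=np.uint8)

def decode(codes, books):
    """Approximate (lossy) reconstruction from codes."""
    return np.concatenate([books[m][c] for m, c in enumerate(codes)])

data = rng.normal(size=(1000, 32)).astype(np.float32)
books = train_codebooks(data, M=8, K=16)
codes = encode(data[0], books)
approx = decode(codes, books)
print(codes.nbytes, "bytes vs", data[0].nbytes)  # 8 bytes vs 128
```

ADC builds on the same codebooks: distances from a query to each centroid are precomputed once per query, after which each stored vector's distance is just M table lookups plus a sum.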
Memory Optimization Benefits¶
| Vector Dim | Original Size | PQ Size (M=8) | Compression | 1M Vectors |
|---|---|---|---|---|
| 384 | 1,536 bytes | 8 bytes | 192x | 1.5 GB → 8 MB |
| 768 | 3,072 bytes | 8 bytes | 384x | 3 GB → 8 MB |
| 1536 | 6,144 bytes | 8 bytes | 768x | 6 GB → 8 MB |
When to Use PQ:
- Large datasets (>100K vectors) with memory constraints
- Edge deployments (IoT, mobile)
- Cost optimization (smaller instances)
- Can tolerate 2-5% accuracy loss
Creating PQ Indexes¶
Basic PQ Index¶
-- Create HNSW index with Product Quantization
CREATE INDEX products_idx
ON products(embedding)
USING hnsw
WITH (quantization='product');
With Custom Sub-Quantizers¶
-- Control compression ratio via sub-quantizers
-- More sub-quantizers = better accuracy, less compression
-- High compression (M=8, dimension must be divisible by 8)
CREATE INDEX idx_compress
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_subquantizers=8);
-- Balanced (M=16, dimension must be divisible by 16)
CREATE INDEX idx_balanced
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_subquantizers=16);
-- High accuracy (M=32, dimension must be divisible by 32)
CREATE INDEX idx_accurate
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_subquantizers=32);
With Custom Centroids¶
-- Control codebook size (K parameter)
-- More centroids = better accuracy, larger codebook
-- Standard (K=256, 1-byte codes)
CREATE INDEX idx
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_centroids=256);
-- Smaller codebook (K=128, faster but less accurate)
CREATE INDEX idx
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_centroids=128);
PQ Configuration Guidelines¶
Choosing M (Sub-Quantizers)¶
| Dimension | Recommended M | Compression | Accuracy Loss |
|---|---|---|---|
| 128 | 8 | 64x | <3% |
| 256 | 8 | 128x | <3% |
| 384 | 8 | 192x | <5% |
| 768 | 8-16 | 384x-192x | <5% |
| 1536 | 16-32 | 384x-192x | <5% |
Rules:
- Dimension must be divisible by M
- Start with M=8 for most cases
- Increase M if accuracy is critical
- Lower M for maximum compression
Training Requirements¶
PQ requires training data to build codebooks:
-- Minimum vectors needed for training
-- Rule of thumb: 10,000+ vectors for quality codebooks
-- Small dataset (<10K): PQ may not be worth it
-- Medium dataset (10K-100K): PQ beneficial, use M=8
-- Large dataset (>100K): PQ highly recommended, M=8-16
Trade-offs¶
| Aspect | Without PQ | With PQ (M=8, K=256) |
|---|---|---|
| Memory | 3,072 bytes (768-dim) | 8 bytes |
| Compression | 1x | 384x |
| Query Speed | Baseline | 1.2-1.5x faster (less memory bandwidth) |
| Accuracy | 100% | 95-98% (Recall@10) |
| Index Build | Faster | Slower (codebook training) |
| Disk Storage | Larger | Much smaller |
Practical Example: Large-Scale Document Search¶
-- Scenario: 10 million documents, 768-dim embeddings
-- Without PQ: 10M × 3KB = 30 GB RAM
-- With PQ (M=8): 10M × 8 bytes = 80 MB RAM (375x reduction)
CREATE TABLE documents (
doc_id INT PRIMARY KEY,
title TEXT,
content TEXT,
embedding VECTOR(768)
);
-- Create PQ-compressed HNSW index
CREATE INDEX docs_pq_idx
ON documents(embedding)
USING hnsw
WITH (
quantization='product',
pq_subquantizers=8,
pq_centroids=256
);
-- Queries work identically (compression is transparent)
SELECT
doc_id,
title,
embedding <=> '[0.1, 0.2, ...]' AS distance
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;
Monitoring PQ Performance¶
-- Check index configuration and memory usage
SELECT
index_name,
dimensions,
quantization_type,
memory_bytes,
compression_ratio
FROM pg_vector_index_stats()
WHERE index_name = 'docs_pq_idx';
Expected Output:
index_name | dimensions | quantization_type | memory_bytes | compression_ratio
--------------|------------|------------------|--------------|------------------
docs_pq_idx | 768 | product | 83886080 | 384.0
8. Real-World Examples¶
Example 1: Semantic Document Search¶
Build a semantic search engine for a knowledge base with 100K articles.
Schema Design¶
CREATE TABLE knowledge_base (
article_id INT PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
category TEXT,
author TEXT,
publish_date DATE,
view_count INT DEFAULT 0,
-- 384-dim embedding from all-MiniLM-L6-v2
title_embedding VECTOR(384),
content_embedding VECTOR(768) -- all-mpnet-base-v2 for content
);
-- Create indexes
CREATE INDEX kb_title_idx ON knowledge_base USING hnsw (title_embedding);
CREATE INDEX kb_content_idx ON knowledge_base USING hnsw (content_embedding);
-- Traditional indexes for filtering
CREATE INDEX kb_category_idx ON knowledge_base(category);
CREATE INDEX kb_date_idx ON knowledge_base(publish_date);
Indexing Pipeline (Python)¶
from sentence_transformers import SentenceTransformer
import heliosdb
# Load models
title_model = SentenceTransformer('all-MiniLM-L6-v2')
content_model = SentenceTransformer('all-mpnet-base-v2')
# Connect to database
db = heliosdb.connect("knowledge.db")
# Sample articles
articles = [
{
"id": 1,
"title": "Introduction to Machine Learning",
"content": "Machine learning is a subset of artificial intelligence...",
"category": "AI",
"author": "Alice Smith",
"publish_date": "2024-01-15"
},
# ... more articles
]
# Generate embeddings in batches
batch_size = 32
for i in range(0, len(articles), batch_size):
batch = articles[i:i+batch_size]
# Generate embeddings
titles = [a["title"] for a in batch]
contents = [a["content"][:512] for a in batch] # Truncate long content
title_embeddings = title_model.encode(titles)
content_embeddings = content_model.encode(contents)
# Insert into database
for article, title_emb, content_emb in zip(batch, title_embeddings, content_embeddings):
title_str = "[" + ",".join(map(str, title_emb)) + "]"
content_str = "[" + ",".join(map(str, content_emb)) + "]"
db.execute("""
INSERT INTO knowledge_base
(article_id, title, content, category, author, publish_date,
title_embedding, content_embedding)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
""", (
article["id"], article["title"], article["content"],
article["category"], article["author"], article["publish_date"],
title_str, content_str
))
db.commit()
print(f"Indexed {min(i+batch_size, len(articles))} / {len(articles)} articles")
Search Implementation¶
def search_knowledge_base(query, category=None, top_k=10):
"""
Semantic search with optional category filter.
"""
# Generate query embedding
query_embedding = content_model.encode(query)
query_str = "[" + ",".join(map(str, query_embedding)) + "]"
# Build SQL query
sql = """
SELECT
article_id,
title,
category,
author,
publish_date,
content_embedding <=> ? AS relevance_score
FROM knowledge_base
"""
params = [query_str]
# Add category filter if specified
if category:
sql += " WHERE category = ?"
params.append(category)
sql += """
ORDER BY content_embedding <=> ?
LIMIT ?
"""
params.extend([query_str, top_k])
# Execute query
results = db.execute(sql, params).fetchall()
return [
{
"article_id": r[0],
"title": r[1],
"category": r[2],
"author": r[3],
"publish_date": r[4],
"relevance": 1 - r[5] # Convert distance to similarity
}
for r in results
]
# Example usage
results = search_knowledge_base(
query="How does neural network training work?",
category="AI",
top_k=5
)
for r in results:
print(f"{r['title']} (relevance: {r['relevance']:.3f})")
Output:
Introduction to Neural Networks (relevance: 0.923)
Backpropagation Algorithm Explained (relevance: 0.887)
Deep Learning Training Techniques (relevance: 0.845)
Gradient Descent Optimization (relevance: 0.812)
Neural Network Architectures (relevance: 0.798)
Example 2: Product Recommendations¶
Build a recommendation engine for e-commerce using collaborative filtering.
Schema Design¶
-- User preference vectors (learned from purchase history)
CREATE TABLE user_preferences (
user_id INT PRIMARY KEY,
username TEXT,
preference_vector VECTOR(128),
last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Product feature vectors
CREATE TABLE products (
product_id INT PRIMARY KEY,
name TEXT NOT NULL,
description TEXT,
category TEXT,
price DECIMAL(10, 2),
in_stock BOOLEAN DEFAULT TRUE,
popularity_score FLOAT DEFAULT 0.0,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-- Combined feature vector (category + attributes + user preferences)
feature_vector VECTOR(128)
);
-- Create HNSW index for fast recommendations
CREATE INDEX products_feature_idx
ON products(feature_vector)
USING hnsw
WITH (quantization='product', pq_subquantizers=8);
-- Purchase history for training
CREATE TABLE purchases (
purchase_id INT PRIMARY KEY,
user_id INT,
product_id INT,
purchase_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
rating INT, -- 1-5 stars
FOREIGN KEY (user_id) REFERENCES user_preferences(user_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
Recommendation Query¶
-- Get personalized recommendations for user 42
-- Combine vector similarity with business rules
SELECT
p.product_id,
p.name,
p.price,
p.category,
p.popularity_score,
-- Vector similarity score
p.feature_vector <#> u.preference_vector AS match_score,
-- Combined ranking score (lower is better, since <#> returns a
-- distance-style score, so the bonuses are subtracted)
(
-- 70% vector similarity
(p.feature_vector <#> u.preference_vector) * 0.7 -
-- 20% popularity
(p.popularity_score * 0.2) -
-- 10% recency bonus (newly added products)
CASE
WHEN p.created_at > CURRENT_DATE - INTERVAL '7 days' THEN 0.1
ELSE 0.0
END
) AS final_score
FROM products p, user_preferences u
WHERE u.user_id = 42
AND p.in_stock = TRUE
AND p.price > 0
-- Exclude already purchased items
AND p.product_id NOT IN (
SELECT product_id
FROM purchases
WHERE user_id = 42
)
ORDER BY final_score
LIMIT 20;
Python Implementation¶
def get_recommendations(user_id, max_price=None, category=None, limit=20):
"""
Get personalized product recommendations.
"""
sql = """
SELECT
p.product_id,
p.name,
p.price,
p.category,
p.feature_vector <#> u.preference_vector AS score
FROM products p, user_preferences u
WHERE u.user_id = ?
AND p.in_stock = TRUE
AND p.product_id NOT IN (
SELECT product_id FROM purchases WHERE user_id = ?
)
"""
params = [user_id, user_id]
if max_price:
sql += " AND p.price <= ?"
params.append(max_price)
if category:
sql += " AND p.category = ?"
params.append(category)
sql += " ORDER BY score LIMIT ?"
params.append(limit)
results = db.execute(sql, params).fetchall()
return [
{
"product_id": r[0],
"name": r[1],
"price": float(r[2]),
"category": r[3],
"score": float(r[4])
}
for r in results
]
# Example usage
recommendations = get_recommendations(
user_id=42,
max_price=100.00,
category="Electronics",
limit=10
)
for rec in recommendations:
print(f"{rec['name']} - ${rec['price']:.2f} (score: {rec['score']:.3f})")
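The schema comment above says preference vectors are "learned from purchase history"; one simple way to bootstrap them is a rating-weighted mean of the purchased products' feature vectors. A minimal pure-Python sketch (the helper name and approach are illustrative, not part of HeliosDB-Lite):

```python
def build_preference_vector(product_vectors, ratings):
    """Rating-weighted average of purchased product feature vectors, L2-normalized."""
    total = sum(ratings)
    dims = len(product_vectors[0])
    pref = [
        sum(vec[d] * rating for vec, rating in zip(product_vectors, ratings)) / total
        for d in range(dims)
    ]
    norm = sum(x * x for x in pref) ** 0.5
    return [x / norm for x in pref] if norm > 0 else pref

# Example: a 5-star purchase pulls the preference vector toward that product
pref = build_preference_vector([[1.0, 0.0], [0.0, 1.0]], ratings=[5, 1])
```

The resulting vector would be serialized to the `[x, y, ...]` string format and written to `user_preferences.preference_vector`, typically on a periodic batch schedule.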
Example 3: RAG Pipeline for LLM¶
Build a Retrieval Augmented Generation system for answering questions with LLM + context.
Schema Design¶
-- Document chunks for retrieval
CREATE TABLE document_chunks (
chunk_id INT PRIMARY KEY,
document_id INT,
document_title TEXT,
chunk_text TEXT NOT NULL,
chunk_index INT, -- Position in document
-- Embedding for semantic retrieval
embedding VECTOR(1536), -- OpenAI ada-002
-- Metadata for filtering
source TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Create HNSW index
CREATE INDEX chunks_embedding_idx
ON document_chunks(embedding)
USING hnsw;
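The `chunk_text` column assumes documents were already split into pieces; a minimal word-based chunker with overlap (sizes are illustrative; production pipelines often split on tokens or sentences instead):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping word-based chunks for retrieval."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk; the `chunk_index` column stores each chunk's position so neighboring context can be fetched later.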
RAG Implementation (Python)¶
import openai
from typing import List, Dict
# Initialize OpenAI
openai.api_key = "your-api-key"
def embed_query(query: str) -> List[float]:
"""Generate embedding for query."""
response = openai.embeddings.create(
input=query,
model="text-embedding-ada-002"
)
return response.data[0].embedding
def retrieve_context(query: str, top_k: int = 5) -> List[Dict]:
"""Retrieve relevant document chunks."""
# Generate query embedding
query_emb = embed_query(query)
query_str = "[" + ",".join(map(str, query_emb)) + "]"
# Search for similar chunks
results = db.execute("""
SELECT
chunk_id,
document_title,
chunk_text,
embedding <=> ? AS relevance
FROM document_chunks
ORDER BY embedding <=> ?
LIMIT ?
""", (query_str, query_str, top_k)).fetchall()
return [
{
"chunk_id": r[0],
"title": r[1],
"text": r[2],
"relevance": 1 - r[3]
}
for r in results
]
def generate_answer(query: str, context_chunks: List[Dict]) -> str:
"""Generate answer using LLM with retrieved context."""
# Build context string
context = "\n\n".join([
f"[Document: {c['title']}]\n{c['text']}"
for c in context_chunks
])
# Create prompt
prompt = f"""Answer the question based on the following context:
Context:
{context}
Question: {query}
Answer:"""
# Generate answer
response = openai.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are a helpful assistant that answers questions based on provided context."},
{"role": "user", "content": prompt}
],
temperature=0.7,
max_tokens=500
)
return response.choices[0].message.content
def rag_query(query: str, top_k: int = 5) -> Dict:
"""Complete RAG pipeline: retrieve + generate."""
# Retrieve relevant context
context_chunks = retrieve_context(query, top_k)
# Generate answer
answer = generate_answer(query, context_chunks)
return {
"query": query,
"answer": answer,
"sources": [
{"title": c["title"], "relevance": c["relevance"]}
for c in context_chunks
]
}
# Example usage
result = rag_query("How does HNSW indexing work?")
print(f"Query: {result['query']}")
print(f"Answer: {result['answer']}")
print("\nSources:")
for source in result['sources']:
print(f" - {source['title']} (relevance: {source['relevance']:.3f})")
9. Performance Tuning¶
Index Optimization¶
Build-Time vs Query-Time Tradeoffs¶
-- Fast build, lower recall (~90%)
CREATE INDEX fast_idx ON docs(embedding)
USING hnsw; -- Default: m=16, ef_construction=200
-- Slow build, higher recall (~98%)
-- (If future version supports parameters)
-- CREATE INDEX accurate_idx ON docs(embedding)
-- USING hnsw WITH (m=32, ef_construction=400);
When to Rebuild Indexes¶
Rebuild indexes when:
1. Dataset grows >2x since index creation
2. Significant data distribution changes
3. Query accuracy degrades over time
-- Rebuild index
DROP INDEX docs_embedding_idx;
CREATE INDEX docs_embedding_idx ON docs(embedding) USING hnsw;
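The growth-based rebuild rule above can be automated if you record the vector count when the index is built. A minimal sketch (the helper and its bookkeeping are illustrative, not a HeliosDB-Lite feature):

```python
def should_rebuild(count_at_build, current_count, growth_factor=2.0):
    """Flag an index for rebuild once the dataset has grown past the threshold."""
    if count_at_build == 0:
        return current_count > 0
    return current_count / count_at_build > growth_factor
```

The current count can come from `pg_vector_index_stats()`; the count at build time has to be recorded by your own tooling when the index is created.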
Batch Operations¶
Efficient Bulk Insertion¶
# BAD: Individual inserts (slow)
for doc in documents:
db.execute("INSERT INTO docs VALUES (?, ?)", (doc.id, doc.embedding))
db.commit() # Commit each row
# GOOD: Batch inserts
db.execute("BEGIN TRANSACTION")
for doc in documents:
db.execute("INSERT INTO docs VALUES (?, ?)", (doc.id, doc.embedding))
db.execute("COMMIT") # Commit once
# BETTER: Prepared statement with executemany
values = [(doc.id, doc.embedding_str) for doc in documents]
db.executemany("INSERT INTO docs VALUES (?, ?)", values)
db.commit()
Optimal Batch Sizes¶
| Operation | Recommended Batch Size | Reason |
|---|---|---|
| Embedding generation | 32-64 | GPU memory limits |
| Database inserts | 1000-10000 | Transaction overhead |
| Vector searches | 1-10 | Limited benefit from query-level parallelism |
Query Optimization¶
Use LIMIT Always¶
-- BAD: No limit (retrieves all results)
SELECT * FROM docs
ORDER BY embedding <=> '[0.1, ...]';
-- GOOD: Limit results
SELECT * FROM docs
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 10;
Pre-Filter with WHERE¶
-- BAD: Filter after vector search, then post-filter by category in the application
SELECT * FROM docs
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 10;
-- GOOD: Filter before vector search
SELECT * FROM docs
WHERE category = 'technology'
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 10;
Avoid Computing Distance Twice¶
-- BAD: Compute distance twice
SELECT
id,
title,
embedding <=> '[0.1, ...]' AS score
FROM docs
ORDER BY embedding <=> '[0.1, ...]' -- Computed again
LIMIT 10;
-- GOOD: Reference distance column
SELECT
id,
title,
embedding <=> '[0.1, ...]' AS score
FROM docs
ORDER BY score -- Use computed column
LIMIT 10;
Memory Management¶
Monitor Memory Usage¶
-- Check index memory consumption
SELECT
index_name,
dimensions,
vector_count,
memory_bytes / 1024 / 1024 AS memory_mb
FROM pg_vector_index_stats();
Memory-Saving Strategies¶
- Use Product Quantization
- Lower dimensions
- Partition large tables
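Of these, lowering dimensions can be done without retraining an embedding model. A minimal sketch using a seeded Gaussian random projection (pure Python and illustrative; in practice PCA or a model with a smaller output dimension usually gives better recall):

```python
import random

def random_projection(vector, target_dims, seed=42):
    """Reduce dimensionality with a seeded Gaussian random projection.

    Relative distances are approximately preserved (Johnson-Lindenstrauss);
    the same seed must be used for every vector so results stay comparable.
    """
    rng = random.Random(seed)
    scale = 1.0 / (target_dims ** 0.5)
    return [
        scale * sum(rng.gauss(0, 1) * x for x in vector)
        for _ in range(target_dims)
    ]
```

Projecting 768-dim embeddings down to 128 dims cuts raw vector storage by 6x at the cost of some recall; measure recall on a held-out query set before committing.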
Hardware Optimization¶
CPU Considerations¶
- SIMD Support: Ensure AVX2 (x86) or NEON (ARM) is enabled
- Core Count: More cores = faster parallel searches
  - 4 cores: Good for small datasets
  - 8+ cores: Better for concurrent queries
Memory Considerations¶
| Dataset Size | Minimum RAM | Recommended RAM |
|---|---|---|
| 10K vectors | 512 MB | 1 GB |
| 100K vectors | 2 GB | 4 GB |
| 1M vectors | 8 GB | 16 GB |
| 10M vectors | 32 GB | 64 GB |
Storage Considerations¶
- SSD vs HDD: SSD strongly recommended
  - SSD: <10ms latency for index loads
  - HDD: 100-1000ms latency (100x slower)
- Disk Space:
  - Without PQ: 4 bytes per dimension (~3 KB per 768-dim vector)
  - With PQ (M=8): ~8 bytes per vector, plus a small codebook
10. Monitoring¶
pg_vector_index_stats Function¶
Query index statistics and health metrics.
Basic Usage¶
SELECT * FROM pg_vector_index_stats();
Output Columns:
- index_name: Name of the index
- table_name: Table the index belongs to
- dimensions: Vector dimensionality
- vector_count: Number of vectors indexed
- memory_bytes: Memory consumed by index
- quantization_type: Compression type (none, product)
- distance_metric: Distance function used
Example Output¶
index_name | table_name | dimensions | vector_count | memory_bytes | quantization_type | distance_metric
--------------------|------------|------------|--------------|--------------|-------------------|----------------
docs_embedding_idx | documents | 384 | 50000 | 76800000 | none | cosine
products_vec_idx | products | 768 | 100000 | 800000 | product | l2
Monitoring Queries¶
Index Size by Table¶
SELECT
table_name,
index_name,
dimensions,
vector_count,
memory_bytes / 1024 / 1024 AS memory_mb,
(memory_bytes / NULLIF(vector_count, 0)) AS bytes_per_vector
FROM pg_vector_index_stats()
ORDER BY memory_bytes DESC;
Compression Effectiveness¶
SELECT
index_name,
quantization_type,
dimensions,
memory_bytes / 1024 / 1024 AS memory_mb,
-- Calculate compression ratio
CASE
WHEN quantization_type = 'product' THEN
(dimensions * 4.0) / 8.0 -- 4 bytes per float -> 8 bytes PQ
ELSE
1.0
END AS compression_ratio
FROM pg_vector_index_stats();
Identify Large Indexes¶
-- Find indexes using >1 GB
SELECT
index_name,
table_name,
memory_bytes / 1024 / 1024 / 1024 AS memory_gb
FROM pg_vector_index_stats()
WHERE memory_bytes > 1073741824 -- 1 GB
ORDER BY memory_bytes DESC;
Performance Metrics¶
Query Latency Tracking¶
import time
def measure_query_latency(query_embedding, top_k=10, iterations=100):
"""Measure average query latency."""
query_str = "[" + ",".join(map(str, query_embedding)) + "]"
latencies = []
for _ in range(iterations):
start = time.perf_counter()
results = db.execute("""
SELECT id, embedding <-> ? AS dist
FROM documents
ORDER BY embedding <-> ?
LIMIT ?
""", (query_str, query_str, top_k)).fetchall()
end = time.perf_counter()
latencies.append((end - start) * 1000) # Convert to ms
avg_latency = sum(latencies) / len(latencies)
ranked = sorted(latencies)
p50 = ranked[len(ranked) // 2]
p99 = ranked[int(len(ranked) * 0.99)]
return {
"avg_ms": avg_latency,
"p50_ms": p50,
"p99_ms": p99,
"min_ms": min(latencies),
"max_ms": max(latencies)
}
# Example usage
metrics = measure_query_latency(query_embedding, top_k=10)
print(f"Average latency: {metrics['avg_ms']:.2f} ms")
print(f"P50 latency: {metrics['p50_ms']:.2f} ms")
print(f"P99 latency: {metrics['p99_ms']:.2f} ms")
Throughput Measurement¶
import concurrent.futures
import time
def measure_throughput(query_embeddings, top_k=10, workers=4):
"""Measure queries per second with parallel execution."""
# Assumes the db connection may be shared across threads;
# otherwise open one connection per worker
def run_query(query_emb):
query_str = "[" + ",".join(map(str, query_emb)) + "]"
results = db.execute("""
SELECT id FROM documents
ORDER BY embedding <-> ?
LIMIT ?
""", (query_str, top_k)).fetchall()
return len(results)
start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
results = list(executor.map(run_query, query_embeddings))
end = time.perf_counter()
elapsed = end - start
qps = len(query_embeddings) / elapsed
return {
"total_queries": len(query_embeddings),
"elapsed_seconds": elapsed,
"queries_per_second": qps,
"workers": workers
}
# Example usage
throughput = measure_throughput(query_embeddings, workers=8)
print(f"Throughput: {throughput['queries_per_second']:.1f} QPS")
Health Checks¶
Index Integrity Check¶
-- Verify index exists and has expected vectors
SELECT
index_name,
vector_count,
CASE
WHEN vector_count > 0 THEN 'OK'
ELSE 'EMPTY'
END AS status
FROM pg_vector_index_stats();
Dimension Mismatch Detection¶
-- Check for dimension inconsistencies
-- (This would be a custom query checking actual data)
SELECT
'documents' AS table_name,
COUNT(*) AS total_rows,
COUNT(embedding) AS non_null_embeddings
FROM documents;
Alerting Thresholds¶
Recommended alert thresholds:
| Metric | Warning | Critical |
|---|---|---|
| Query latency (p99) | >50ms | >200ms |
| Index memory usage | >80% available | >95% available |
| Query error rate | >1% | >5% |
| QPS throughput | <100 QPS | <10 QPS |
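These thresholds can be wired into a simple check that runs alongside the latency and throughput measurements above. A minimal sketch (metric key names are illustrative; latency is in ms, memory usage and error rate are fractions):

```python
def check_alerts(metrics):
    """Map measured metrics to alert levels using the documented thresholds."""
    alerts = {}
    p99 = metrics.get("p99_latency_ms")
    if p99 is not None:
        alerts["p99_latency"] = ("critical" if p99 > 200
                                 else "warning" if p99 > 50 else "ok")
    mem = metrics.get("memory_used_fraction")
    if mem is not None:
        alerts["memory"] = ("critical" if mem > 0.95
                            else "warning" if mem > 0.80 else "ok")
    err = metrics.get("error_rate")
    if err is not None:
        alerts["errors"] = ("critical" if err > 0.05
                            else "warning" if err > 0.01 else "ok")
    return alerts
```

Running this on each scrape of `pg_vector_index_stats()` plus your latency metrics gives a cheap health signal to feed into whatever alerting system you already use.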
Summary¶
This tutorial covered:
- Introduction: What vector search is and why it matters
- Vector Basics: Embeddings, dimensions, and distance metrics
- Creating Tables: VECTOR type and schema design
- Generating Embeddings: OpenAI, sentence-transformers, and batch processing
- HNSW Indexing: Graph-based approximate search with configurable parameters
- Similarity Search: KNN queries with L2, cosine, and inner product operators
- Product Quantization: 8-384x memory compression for large-scale deployments
- Real-World Examples: Document search, recommendations, and RAG pipelines
- Performance Tuning: Index optimization, batching, and hardware considerations
- Monitoring: pg_vector_index_stats, latency tracking, and health checks
Next Steps¶
- Try the examples: Implement a semantic search system with your own data
- Experiment with parameters: Test different dimensions and distance metrics
- Optimize for your use case: Profile queries and tune indexes
- Explore advanced features: Multi-modal search, hybrid ranking, custom embeddings
Troubleshooting¶
Common Issues:
1. Dimension mismatch errors
   - Ensure embedding dimension matches the VECTOR(n) declaration
   - Verify all embeddings come from the same model
2. Slow queries
   - Add HNSW index if missing
   - Use LIMIT to cap results
   - Consider Product Quantization for large datasets
3. Memory issues
   - Enable Product Quantization (384x compression)
   - Use smaller embedding dimensions
   - Partition large tables
4. Low accuracy
   - Increase ef_construction (if configurable)
   - Use higher m parameter (more connections)
   - Disable PQ or increase sub-quantizers
Document Version: 1.0 Last Updated: 2025-12-01 Feedback: Please report issues or suggestions via GitHub issues