Vector Search Tutorial

Version: 1.0
Created: 2025-12-01
Target Audience: Developers building AI/ML applications
Prerequisites: Basic SQL knowledge, understanding of embeddings
Estimated Time: 45-60 minutes


Table of Contents

  1. Introduction
  2. Vector Basics
  3. Creating Vector Tables
  4. Generating Embeddings
  5. HNSW Indexing
  6. Similarity Search
  7. Product Quantization
  8. Real-World Examples
  9. Performance Tuning
  10. Monitoring

1. Introduction

Vector search (also known as semantic search or similarity search) enables AI applications to find similar items based on meaning rather than exact keyword matches. Instead of searching for exact text, vector search compares mathematical representations (embeddings) of content to find semantically similar items.

Traditional Keyword Search vs Vector Search:

Query: "automobile repair"

Keyword Search Results:
✓ "automobile repair shop"
✗ "car maintenance service"  (missed - different words)
✗ "vehicle fixing center"    (missed - different words)

Vector Search Results:
✓ "automobile repair shop"   (exact match)
✓ "car maintenance service"  (semantically similar)
✓ "vehicle fixing center"    (semantically similar)

Use Cases

| Use Case | Description | Typical Dimensions |
|----------|-------------|--------------------|
| Semantic Search | Find documents by meaning, not keywords | 384-768 |
| Recommendations | Suggest similar products, articles, or content | 128-512 |
| RAG (Retrieval Augmented Generation) | Retrieve relevant context for LLM prompts | 384-1536 |
| Image Similarity | Find visually similar images | 512-2048 |
| Anomaly Detection | Identify outliers in time-series or sensor data | 64-256 |
| Duplicate Detection | Find near-duplicate content | 256-768 |

How HeliosDB-Lite Implements It

HeliosDB-Lite provides native vector search capabilities with:

┌─────────────────────────────────────────────────────────────┐
│                 HeliosDB-Lite Vector Search                 │
├─────────────────────────────────────────────────────────────┤
│  SQL Interface: VECTOR type, distance operators, indexes    │
├─────────────────────────────────────────────────────────────┤
│  HNSW Algorithm: Graph-based approximate nearest neighbors  │
├─────────────────────────────────────────────────────────────┤
│  SIMD Acceleration: AVX2/NEON for 2-6x faster distances     │
├─────────────────────────────────────────────────────────────┤
│  Product Quantization: 384x memory compression             │
├─────────────────────────────────────────────────────────────┤
│  RocksDB Storage: Persistent, crash-safe index storage      │
└─────────────────────────────────────────────────────────────┘

Key Benefits:

- Embedded (no separate server required)
- Sub-millisecond query latency for millions of vectors
- SIMD-accelerated distance calculations
- Memory-efficient with Product Quantization
- Compatible with pgvector syntax


2. Vector Basics

Understanding Embeddings

Embeddings are mathematical representations of content (text, images, audio) as fixed-length arrays of floating-point numbers. Each dimension captures semantic information.

Text: "The cat sat on the mat"
       ↓ (embedding model)
Vector: [0.12, -0.45, 0.78, 0.34, -0.12, ...]  (384 dimensions)

Text: "A feline rested on the rug"
       ↓ (embedding model)
Vector: [0.15, -0.42, 0.81, 0.36, -0.09, ...]  (similar values!)

Properties of Good Embeddings:

- Similar content → similar vectors (small distance)
- Different content → different vectors (large distance)
- Fixed dimensionality (all vectors same length)
- Typically normalized (unit length) for cosine similarity

Dimension Considerations

Choosing the right dimension depends on your use case and embedding model:

| Model | Dimension | Use Case | Speed | Accuracy |
|-------|-----------|----------|-------|----------|
| all-MiniLM-L6-v2 | 384 | Fast semantic search | Fastest | Good |
| all-mpnet-base-v2 | 768 | Balanced performance | Fast | Better |
| text-embedding-ada-002 (OpenAI) | 1536 | High accuracy | Medium | Best |
| ResNet-50 (images) | 2048 | Image features | Medium | Excellent |

Rules of Thumb:

- Higher dimensions: Better accuracy, more memory, slower queries
- Lower dimensions: Faster queries, less memory, may lose nuance
- Common choices: 384 (fast), 768 (balanced), 1536 (accurate)
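A quick back-of-envelope check of the memory side of these trade-offs: raw float32 storage grows linearly with dimension (index overhead, covered in later sections, comes on top).

```python
# Raw float32 storage: vectors x dimensions x 4 bytes per component.
# (HNSW/PQ index overhead is extra; see the later sections.)
def raw_vector_bytes(n_vectors, dim):
    return n_vectors * dim * 4

# 1M vectors at the three common dimensions:
for dim in (384, 768, 1536):
    print(f"{dim}-dim: {raw_vector_bytes(1_000_000, dim) / 1e9:.3f} GB")
# 384-dim: 1.536 GB, 768-dim: 3.072 GB, 1536-dim: 6.144 GB
```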

Distance Metrics

HeliosDB-Lite supports three distance metrics to measure vector similarity:

L2 Distance (Euclidean)

Measures straight-line distance in N-dimensional space.

Formula: distance = √(Σ(a[i] - b[i])²)

Example:
Vector A: [1.0, 0.0, 0.0]
Vector B: [0.0, 1.0, 0.0]
L2 Distance: √((1-0)² + (0-1)² + (0-0)²) = √2 ≈ 1.414

Use when: General-purpose similarity, raw feature vectors
Operator: <->
Range: [0, ∞)  (0 = identical, larger = more different)

Cosine Similarity (Cosine Distance)

Measures angle between vectors (ignores magnitude).

Formula: distance = 1 - (A·B / (||A|| × ||B||))

Example:
Vector A: [1.0, 0.0, 0.0]  (normalized)
Vector B: [0.707, 0.707, 0.0]  (normalized)
Cosine Distance: 1 - 0.707 ≈ 0.293

Use when: Text embeddings, normalized vectors
Operator: <=>
Range: [0, 2]  (0 = identical, 2 = opposite direction)

Why Cosine for Text?

- Text embeddings are typically normalized (unit length)
- Cosine ignores magnitude, focuses on semantic direction
- Most NLP models optimize for cosine similarity

Inner Product (Dot Product)

Measures alignment between vectors (considers both angle and magnitude).

Formula: distance = -(A·B) = -Σ(a[i] × b[i])
         (negative because smaller = better in HeliosDB-Lite)

Example:
Vector A: [1.0, 2.0, 3.0]
Vector B: [4.0, 5.0, 6.0]
Inner Product Distance: -(4 + 10 + 18) = -32

Use when: Recommendation systems, non-normalized vectors
Operator: <#>
Range: (-∞, 0]  (more negative = more similar)

Comparison Table:

| Metric | When to Use | Normalized? | SQL Operator |
|--------|-------------|-------------|--------------|
| L2 | General similarity, image features | No | <-> |
| Cosine | Text embeddings, semantic search | Yes (recommended) | <=> |
| Inner Product | Recommendations, scoring | No | <#> |
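All three metrics are easy to sanity-check in plain Python; the numbers below reproduce the worked examples from each metric's description.

```python
import math

def l2_distance(a, b):
    # Straight-line distance: sqrt of summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 - (A.B / (||A|| * ||B||))
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (na * nb)

def inner_product_distance(a, b):
    # Negated dot product, so smaller = more similar
    return -sum(x * y for x, y in zip(a, b))

print(round(l2_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]), 3))          # 1.414
print(round(cosine_distance([1.0, 0.0, 0.0], [0.707, 0.707, 0.0]), 3))  # 0.293
print(inner_product_distance([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))         # -32.0
```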

3. Creating Vector Tables

VECTOR Data Type Syntax

The VECTOR(n) data type stores fixed-dimension floating-point arrays.

-- Basic syntax
VECTOR(dimension)

-- Examples
VECTOR(3)     -- 3-dimensional vector
VECTOR(384)   -- Common for MiniLM embeddings
VECTOR(768)   -- Common for BERT embeddings
VECTOR(1536)  -- OpenAI ada-002 embeddings

Example Table Creation

Example 1: Simple Document Embeddings

-- Semantic search for documents
CREATE TABLE documents (
    id INT PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    embedding VECTOR(384)  -- all-MiniLM-L6-v2 embeddings
);

Example 2: Multi-Column Vectors

-- Product catalog with multiple embedding types
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    price DECIMAL(10, 2),
    category TEXT,

    -- Text embedding for name + description
    text_embedding VECTOR(768),

    -- Image embedding for product photos
    image_embedding VECTOR(512),

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Example 3: User Preferences for Recommendations

-- User preference vectors for collaborative filtering
CREATE TABLE user_profiles (
    user_id INT PRIMARY KEY,
    username TEXT NOT NULL,
    preference_vector VECTOR(128),
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Item vectors for recommendation matching
CREATE TABLE items (
    item_id INT PRIMARY KEY,
    title TEXT NOT NULL,
    item_vector VECTOR(128),
    popularity_score FLOAT DEFAULT 0.0
);

Example 4: Time-Series Anomaly Detection

-- Sensor readings with feature vectors
CREATE TABLE sensor_readings (
    reading_id INT PRIMARY KEY,
    sensor_id TEXT NOT NULL,
    timestamp TIMESTAMP NOT NULL,
    raw_values TEXT,  -- JSON or serialized data
    feature_vector VECTOR(64),  -- Extracted features
    is_anomaly BOOLEAN DEFAULT FALSE
);
-- Image database with visual features
CREATE TABLE images (
    image_id INT PRIMARY KEY,
    filename TEXT NOT NULL,
    upload_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    tags TEXT,  -- Comma-separated tags

    -- ResNet-50 feature vector
    features VECTOR(2048),

    -- Metadata
    width INT,
    height INT,
    file_size INT
);

Table Design Best Practices

  1. Choose dimension based on your embedding model

    -- Match the model output dimension exactly
    -- sentence-transformers/all-MiniLM-L6-v2 → 384
    CREATE TABLE docs (id INT, vec VECTOR(384));
    

  2. Add metadata columns for filtering

    -- Combine vector search with WHERE filters
    CREATE TABLE articles (
        id INT PRIMARY KEY,
        category TEXT,        -- For filtering by category
        publish_date DATE,    -- For filtering by date
        author TEXT,          -- For filtering by author
        embedding VECTOR(768)
    );
    

  3. Use appropriate primary keys

    -- Auto-incrementing ID for new data
    CREATE TABLE logs (
        id SERIAL PRIMARY KEY,
        log_text TEXT,
        embedding VECTOR(384)
    );
    

  4. Consider NULL handling

    -- Allow NULL if embeddings are generated asynchronously
    CREATE TABLE pending_docs (
        id INT PRIMARY KEY,
        content TEXT NOT NULL,
        embedding VECTOR(384) NULL  -- Generated after insertion
    );
    


4. Generating Embeddings

Using OpenAI API (Python)

The OpenAI API provides high-quality text embeddings via text-embedding-ada-002 (1536 dimensions).

Installation

pip install openai

Basic Usage

import openai  # note: the legacy pre-1.0 SDK; openai>=1.0 uses OpenAI().embeddings.create

# Configure API key
openai.api_key = "your-api-key-here"

def get_embedding(text, model="text-embedding-ada-002"):
    """Generate embedding for a single text string."""
    response = openai.Embedding.create(
        input=text,
        model=model
    )
    return response['data'][0]['embedding']

# Example
text = "The quick brown fox jumps over the lazy dog"
embedding = get_embedding(text)

print(f"Dimension: {len(embedding)}")  # 1536
print(f"First 5 values: {embedding[:5]}")
# Output: [0.0234, -0.0156, 0.0789, -0.0412, 0.0567]

Batch Processing

def get_embeddings_batch(texts, model="text-embedding-ada-002"):
    """Generate embeddings for multiple texts (up to 2048 per request)."""
    response = openai.Embedding.create(
        input=texts,
        model=model
    )
    return [item['embedding'] for item in response['data']]

# Process multiple documents
documents = [
    "First document text",
    "Second document text",
    "Third document text"
]

embeddings = get_embeddings_batch(documents)
print(f"Generated {len(embeddings)} embeddings")

Storing in HeliosDB-Lite

import heliosdb  # Hypothetical Python client

# Connect to database
db = heliosdb.connect("mydb.db")

# Generate embedding
text = "Sample document content"
embedding = get_embedding(text)

# Convert to SQL array format
embedding_str = "[" + ",".join(map(str, embedding)) + "]"

# Insert into database
db.execute("""
    INSERT INTO documents (id, content, embedding)
    VALUES (?, ?, ?)
""", (1, text, embedding_str))

db.commit()
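The bracket-literal conversion above recurs in every insert snippet in this tutorial. A pair of tiny helpers (hypothetical names, assuming the pgvector-style '[x,y,z]' text format shown above) keeps it in one place:

```python
def to_vector_literal(values):
    """Format a sequence of floats as a pgvector-style text literal."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

def from_vector_literal(text):
    """Parse a '[x,y,z]' literal back into a list of floats."""
    return [float(x) for x in text.strip("[]").split(",")]

lit = to_vector_literal([0.1, 0.2, 0.3])
print(lit)                       # [0.1,0.2,0.3]
print(from_vector_literal(lit))  # [0.1, 0.2, 0.3]
```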

Using sentence-transformers (Python)

The sentence-transformers library provides efficient local embedding generation (no API calls).

Installation

pip install sentence-transformers

Basic Usage

from sentence_transformers import SentenceTransformer

# Load model (downloads on first use, cached afterward)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate single embedding
text = "This is a sample sentence"
embedding = model.encode(text)

print(f"Dimension: {len(embedding)}")  # 384
print(f"Type: {type(embedding)}")      # numpy.ndarray
print(f"First 5 values: {embedding[:5]}")
# Output: [0.0234, -0.0156, 0.0789, -0.0412, 0.0567]

Batch Processing (Efficient)

# Process multiple texts in parallel
texts = [
    "First sentence",
    "Second sentence",
    "Third sentence"
]

# Batch encoding (much faster than loop)
embeddings = model.encode(texts, batch_size=32)

print(f"Shape: {embeddings.shape}")  # (3, 384)

# Iterate over results
for i, emb in enumerate(embeddings):
    print(f"Text {i}: dimension={len(emb)}")

Complete Example: Indexing Documents

from sentence_transformers import SentenceTransformer
import sqlite3  # or heliosdb client

# Initialize model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sample documents
documents = [
    {"id": 1, "text": "Python is a programming language"},
    {"id": 2, "text": "Machine learning uses algorithms"},
    {"id": 3, "text": "Databases store structured data"},
    {"id": 4, "text": "Neural networks process information"},
]

# Generate embeddings
texts = [doc["text"] for doc in documents]
embeddings = model.encode(texts, show_progress_bar=True)

# Connect to HeliosDB-Lite
conn = sqlite3.connect("mydb.db")  # Use HeliosDB-Lite client in production
cursor = conn.cursor()

# Create table
cursor.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id INT PRIMARY KEY,
        text TEXT,
        embedding VECTOR(384)
    )
""")

# Insert documents with embeddings
for doc, emb in zip(documents, embeddings):
    emb_str = "[" + ",".join(map(str, emb)) + "]"
    cursor.execute("""
        INSERT INTO documents (id, text, embedding)
        VALUES (?, ?, ?)
    """, (doc["id"], doc["text"], emb_str))

conn.commit()
print(f"Indexed {len(documents)} documents")

Model Selection

| Model | Dimensions | Speed | Quality | Use Case |
|-------|------------|-------|---------|----------|
| all-MiniLM-L6-v2 | 384 | Fastest | Good | General semantic search |
| all-mpnet-base-v2 | 768 | Fast | Better | Balanced quality/speed |
| all-MiniLM-L12-v2 | 384 | Fast | Good | Slightly better than L6 |
| multi-qa-mpnet-base-dot-v1 | 768 | Medium | Best | Q&A and retrieval |
| paraphrase-multilingual-mpnet-base-v2 | 768 | Medium | Best | Multilingual support |

Custom Model Fine-Tuning

from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Load base model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Create training data (text pairs with similarity scores)
train_examples = [
    InputExample(texts=["query 1", "relevant doc 1"], label=1.0),
    InputExample(texts=["query 1", "irrelevant doc"], label=0.0),
    # Add more examples...
]

# Create data loader
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Define loss function
train_loss = losses.CosineSimilarityLoss(model)

# Fine-tune
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100
)

# Save fine-tuned model
model.save("./my-finetuned-model")

# Use fine-tuned model
custom_model = SentenceTransformer("./my-finetuned-model")

Using Other Embedding Providers

Cohere API

import cohere

co = cohere.Client("your-api-key")

response = co.embed(
    texts=["Text to embed"],
    model="embed-english-v3.0"
)

embeddings = response.embeddings  # List of vectors

Hugging Face Transformers

from transformers import AutoTokenizer, AutoModel
import torch

# Load model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Generate embedding
text = "Sample text"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)
    # Use [CLS] token embedding or mean pooling
    embedding = outputs.last_hidden_state[:, 0, :].numpy()[0]

print(f"Dimension: {len(embedding)}")  # 768 for BERT-base

Best Practices for Embedding Generation

  1. Batch processing for efficiency

    # Good: Process in batches
    embeddings = model.encode(texts, batch_size=32)
    
    # Bad: One at a time
    embeddings = [model.encode(text) for text in texts]
    

  2. Normalize vectors for cosine similarity

    import numpy as np
    
    # Normalize to unit length
    embedding = model.encode(text)
    embedding = embedding / np.linalg.norm(embedding)
    

  3. Cache embeddings to avoid recomputation

    # Store embeddings in database, don't regenerate
    # Check if embedding exists before generating
    

  4. Handle long texts appropriately

    # Truncate or chunk long documents
    max_length = 512  # model limit is in tokens; the character slicing
                      # below is only a rough proxy for that limit
    
    # Option 1: Truncate
    text = text[:max_length]
    
    # Option 2: Chunk and average
    chunks = [text[i:i+max_length] for i in range(0, len(text), max_length)]
    chunk_embeddings = model.encode(chunks)
    avg_embedding = np.mean(chunk_embeddings, axis=0)
    


5. HNSW Indexing

What is HNSW?

HNSW (Hierarchical Navigable Small World) is a graph-based algorithm for approximate nearest neighbor search. It constructs a multi-layer graph where each layer contains progressively fewer nodes, enabling fast logarithmic-time searches.

How HNSW Works

Layer 2:  A ←─────→ D          (Entry point, sparse)
          ↓         ↓
Layer 1:  A ←→ B ←→ D ←→ E      (Medium density)
          ↓    ↓    ↓    ↓
Layer 0:  A ←→ B ←→ C ←→ D ←→ E ←→ F  (All nodes, densest links)

Search Process:
1. Start at entry point in top layer (Layer 2: node A)
2. Greedily navigate to closest node in current layer (A → D)
3. Drop down to next layer, continue greedy search (D → E)
4. Repeat until bottom layer (Layer 0)
5. Perform local search in bottom layer for k neighbors

Key Properties:

- Approximate: Finds ~95-99% of true nearest neighbors (configurable)
- Fast: O(log N) query time vs O(N) for brute force
- Scalable: Handles millions to billions of vectors
- Memory-efficient: ~50-100 bytes per vector
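The greedy navigation described in the steps above can be sketched in a few lines. This toy version (single layer, plain L2, hand-built graph) is illustrative only, not HeliosDB-Lite's implementation:

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search(graph, vectors, entry, query):
    """Walk the graph, moving to whichever neighbor is closest to the
    query, until no neighbor improves on the current node."""
    current = entry
    while True:
        best = min(graph[current], key=lambda n: l2(vectors[n], query))
        if l2(vectors[best], query) >= l2(vectors[current], query):
            return current
        current = best

# Toy layer: six 1-D points on a line, each linked to its neighbors
vectors = {n: [float(n)] for n in range(6)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(greedy_search(graph, vectors, entry=0, query=[4.2]))  # 4
```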

Creating HNSW Indexes

Basic Index Creation

-- Create HNSW index on vector column
CREATE INDEX embedding_idx ON documents USING hnsw (embedding);

With Custom Table

-- Full example
CREATE TABLE articles (
    id INT PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding VECTOR(768)
);

-- Create HNSW index
CREATE INDEX articles_embedding_idx ON articles USING hnsw (embedding);

Multiple Indexes

-- Table with multiple vector columns
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    text_embedding VECTOR(768),
    image_embedding VECTOR(512)
);

-- Create separate indexes for each vector column
CREATE INDEX products_text_idx ON products USING hnsw (text_embedding);
CREATE INDEX products_image_idx ON products USING hnsw (image_embedding);

Index Parameters

HNSW indexes have two critical parameters that control the accuracy/speed tradeoff:

Parameter: m (Max Connections)

Number of bidirectional links per node in the graph.

-- Default: m = 16
CREATE INDEX idx ON docs USING hnsw (embedding);

-- Custom m parameter (if supported in future versions)
-- CREATE INDEX idx ON docs USING hnsw (embedding) WITH (m = 32);

Rules of Thumb:

- m = 16: Default, good balance (recommended)
- m = 8: Faster build, less memory, lower accuracy
- m = 32: Slower build, more memory, higher accuracy
- m = 64: Very accurate but memory-intensive

Memory Impact:

- Memory per vector ≈ m × 8 bytes (for pointers/IDs)
- m=16: ~128 bytes per vector
- m=32: ~256 bytes per vector

Parameter: ef_construction (Construction Search Depth)

Size of dynamic candidate list during index construction.

-- Default: ef_construction = 200
CREATE INDEX idx ON docs USING hnsw (embedding);

-- Custom ef_construction (if supported in future versions)
-- CREATE INDEX idx ON docs USING hnsw (embedding) WITH (ef_construction = 400);

Rules of Thumb:

- ef_construction = 100: Fast build, lower quality graph
- ef_construction = 200: Default, good balance (recommended)
- ef_construction = 400: Slower build, higher quality graph
- ef_construction = 800: Very slow but maximum quality

Build Time Impact:

- Higher ef_construction = slower index builds (linear increase)
- But better recall at query time

Recommendations by Dataset Size

| Dataset Size | m | ef_construction | Expected Recall@10 |
|--------------|---|-----------------|--------------------|
| <10K vectors | 16 | 200 | >99% |
| 10K-100K | 16 | 200 | >98% |
| 100K-1M | 16 | 200 | >95% |
| 1M-10M | 32 | 400 | >95% |
| >10M | 32-64 | 400-800 | >93% |
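To budget memory for these settings, combine raw vector storage with the ~m × 8 bytes of link overhead noted earlier. This is a rough estimate under those rules of thumb, not an exact accounting of HeliosDB-Lite's internals:

```python
# Rough index sizing: 4 bytes per float32 component for the raw vectors,
# plus ~m x 8 bytes of graph link overhead per vector.
def hnsw_memory_bytes(n_vectors, dim, m=16):
    vectors = n_vectors * dim * 4   # float32 storage
    links = n_vectors * m * 8       # graph link overhead
    return vectors + links

# 1M x 384-dim vectors with the default m=16:
print(hnsw_memory_bytes(1_000_000, 384) / 1e9)  # ~1.66 GB
```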

When to Create Indexes

Strategy 1: Index Before Insertion (Empty Table)

-- Create index first, then insert data
CREATE TABLE docs (id INT, vec VECTOR(384));
CREATE INDEX docs_idx ON docs USING hnsw (vec);

-- Insert data (index updates automatically)
INSERT INTO docs VALUES (1, '[0.1, 0.2, ...]');
INSERT INTO docs VALUES (2, '[0.3, 0.4, ...]');

Pros: Incremental index updates
Cons: Slower individual inserts

Strategy 2: Bulk Load Then Index

-- Insert data first
CREATE TABLE docs (id INT, vec VECTOR(384));

-- Bulk insert (fast, no index overhead)
INSERT INTO docs VALUES (1, '[0.1, 0.2, ...]');
INSERT INTO docs VALUES (2, '[0.3, 0.4, ...]');
-- ... (thousands or millions of rows)

-- Create index after all data loaded
CREATE INDEX docs_idx ON docs USING hnsw (vec);

Pros: Faster bulk loading
Cons: Index build at end can be slow for large datasets

Recommendation: Use Strategy 2 for initial bulk loads, Strategy 1 for incremental updates.

Index Build Time Estimates

| Dataset Size | Dimension | Expected Build Time |
|--------------|-----------|---------------------|
| 1,000 vectors | 384 | <1 second |
| 10,000 vectors | 384 | 1-3 seconds |
| 100,000 vectors | 384 | 10-30 seconds |
| 1,000,000 vectors | 384 | 2-5 minutes |
| 10,000,000 vectors | 768 | 30-60 minutes |

Times on commodity hardware (4-core CPU, 16GB RAM)

Verifying Index Creation

-- Check indexes on a table
\d documents

-- Expected output:
-- Indexes:
--   "documents_embedding_idx" hnsw (embedding)

6. Similarity Search

KNN Queries with <-> Operator (L2 Distance)

The <-> operator performs L2 (Euclidean) distance calculation.

Basic KNN Query

-- Find 5 nearest neighbors using L2 distance
SELECT id, title, embedding <-> '[0.1, 0.2, 0.3, ...]' AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, 0.3, ...]'
LIMIT 5;

Output:

id  | title                | distance
----|---------------------|----------
42  | "Machine Learning"  | 0.234
17  | "Neural Networks"   | 0.456
89  | "Deep Learning"     | 0.678
23  | "AI Algorithms"     | 0.789
56  | "Data Science"      | 0.901

With Parameterized Queries

# Python example with parameter binding
query_vector = model.encode("machine learning tutorial")
query_str = "[" + ",".join(map(str, query_vector)) + "]"

results = db.execute("""
    SELECT id, title, embedding <-> ? AS distance
    FROM documents
    ORDER BY embedding <-> ?
    LIMIT 10
""", (query_str, query_str))

Complete Example: Article Search

-- Setup
CREATE TABLE articles (
    id INT PRIMARY KEY,
    title TEXT,
    author TEXT,
    content TEXT,
    embedding VECTOR(384)
);

CREATE INDEX articles_idx ON articles USING hnsw (embedding);

-- Insert sample data (embeddings pre-generated)
INSERT INTO articles VALUES
    (1, 'Intro to Python', 'Alice', '...', '[0.12, -0.34, ...]'),
    (2, 'Advanced ML', 'Bob', '...', '[0.45, 0.67, ...]'),
    (3, 'Database Design', 'Carol', '...', '[-0.23, 0.56, ...]');

-- Search: Find articles similar to "python programming"
-- (query_embedding generated from "python programming")
SELECT
    id,
    title,
    author,
    embedding <-> '[0.15, -0.30, ...]' AS distance
FROM articles
ORDER BY embedding <-> '[0.15, -0.30, ...]'
LIMIT 5;

Cosine Similarity with <=> Operator

The <=> operator calculates cosine distance (1 - cosine similarity).

-- Find 10 most similar documents using cosine distance
SELECT id, content, embedding <=> '[0.1, 0.2, ...]' AS cosine_distance
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;

Why Cosine for Text?

- Ignores vector magnitude (length), focuses on direction
- Most text embedding models produce normalized vectors
- Better semantic similarity for text than L2 distance

Converting Distance to Similarity Score

-- Cosine similarity = 1 - cosine distance
SELECT
    id,
    title,
    1 - (embedding <=> '[0.1, 0.2, ...]') AS similarity_score
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;

Output:

id  | title                | similarity_score
----|---------------------|------------------
42  | "Machine Learning"  | 0.987  (very similar)
17  | "Neural Networks"   | 0.923
89  | "Deep Learning"     | 0.876
23  | "AI Algorithms"     | 0.834
56  | "Data Science"      | 0.801

Practical Example: Semantic Q&A

-- FAQ database with questions and answers
CREATE TABLE faqs (
    faq_id INT PRIMARY KEY,
    question TEXT,
    answer TEXT,
    question_embedding VECTOR(384)
);

CREATE INDEX faqs_idx ON faqs USING hnsw (question_embedding);

-- Find most relevant FAQ for user query
-- Query: "How do I reset my password?"
-- (query_embedding pre-generated)

SELECT
    question,
    answer,
    (1 - (question_embedding <=> '[0.23, -0.45, ...]')) AS relevance
FROM faqs
ORDER BY question_embedding <=> '[0.23, -0.45, ...]'
LIMIT 3;

Output:

question                      | answer                     | relevance
------------------------------|----------------------------|----------
"Password reset instructions" | "Go to Settings > Reset"   | 0.95
"Forgot password recovery"    | "Click 'Forgot Password'"  | 0.89
"Change account credentials"  | "Visit Account Settings"   | 0.76

Inner Product with <#> Operator

The <#> operator calculates negative inner product (negative because smaller = better).

-- Find top matches using inner product
SELECT id, name, embedding <#> '[1.0, 0.5, 0.3, ...]' AS score
FROM items
ORDER BY embedding <#> '[1.0, 0.5, 0.3, ...]'
LIMIT 10;

When to Use Inner Product:

- Recommendation systems (user-item matching)
- Non-normalized vectors
- When magnitude matters (popularity, confidence scores)

Practical Example: Product Recommendations

-- User preference vectors
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    username TEXT,
    preference_vector VECTOR(128)
);

-- Product feature vectors
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name TEXT,
    price DECIMAL(10, 2),
    product_vector VECTOR(128)
);

CREATE INDEX products_idx ON products USING hnsw (product_vector);

-- Recommend products for user 42
-- (combining user preference with product features)
SELECT
    p.product_id,
    p.name,
    p.price,
    p.product_vector <#> u.preference_vector AS match_score
FROM products p, users u
WHERE u.user_id = 42
ORDER BY p.product_vector <#> u.preference_vector
LIMIT 20;

Combining with WHERE Filters

Combine vector similarity with traditional SQL filters for powerful hybrid search.

-- Find similar documents in specific category
SELECT id, title, embedding <=> '[0.1, ...]' AS distance
FROM documents
WHERE category = 'technology'  -- Pre-filter
  AND publish_date > '2024-01-01'
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 10;

Multi-Condition Filtering

-- Complex filtering + vector search
SELECT
    id,
    title,
    author,
    price,
    embedding <-> '[0.2, ...]' AS distance
FROM books
WHERE category IN ('science', 'technology')
  AND price < 50.00
  AND rating >= 4.0
  AND in_stock = TRUE
ORDER BY embedding <-> '[0.2, ...]'
LIMIT 5;

Distance Threshold Filtering

-- Only return results within distance threshold
SELECT id, title, embedding <=> '[0.1, ...]' AS distance
FROM documents
WHERE (embedding <=> '[0.1, ...]') < 0.5  -- Similarity threshold
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 100;

Practical Example: Location-Filtered Search

-- Restaurant search combining a location filter with description similarity
CREATE TABLE restaurants (
    restaurant_id INT PRIMARY KEY,
    name TEXT,
    cuisine TEXT,
    latitude FLOAT,
    longitude FLOAT,
    description_embedding VECTOR(384)
);

-- Find restaurants matching "cozy italian place"
-- within 5 miles of user location (40.7128, -74.0060)
SELECT
    name,
    cuisine,
    description_embedding <=> '[0.12, -0.34, ...]' AS relevance,
    -- crude flat-earth distance (~69 miles per degree; OK for small radii)
    SQRT(POW(latitude - 40.7128, 2) + POW(longitude - (-74.0060), 2)) * 69 AS miles
FROM restaurants
WHERE cuisine = 'Italian'
  AND (SQRT(POW(latitude - 40.7128, 2) + POW(longitude - (-74.0060), 2)) * 69) < 5
ORDER BY description_embedding <=> '[0.12, -0.34, ...]'
LIMIT 10;

Advanced Search Patterns

Hybrid Search: Keyword + Vector

-- Combine full-text search with vector similarity
SELECT
    id,
    title,
    embedding <=> '[0.1, ...]' AS vector_score,
    CASE
        WHEN title LIKE '%machine learning%' THEN 0.5
        ELSE 1.0
    END AS keyword_penalty
FROM documents
WHERE content LIKE '%neural network%'  -- Keyword filter
ORDER BY (embedding <=> '[0.1, ...]') * keyword_penalty
LIMIT 10;

Multi-Vector Search

-- Search across multiple vector types
SELECT
    product_id,
    name,
    text_embedding <=> '[0.1, ...]' AS text_score,
    image_embedding <-> '[0.2, ...]' AS image_score,
    (text_embedding <=> '[0.1, ...]') +
    (image_embedding <-> '[0.2, ...]') AS combined_score
FROM products
ORDER BY combined_score
LIMIT 10;

Weighted Multi-Vector Search

-- Weight different embedding types
SELECT
    product_id,
    name,
    (0.7 * (text_embedding <=> '[0.1, ...]')) +
    (0.3 * (image_embedding <-> '[0.2, ...]')) AS weighted_score
FROM products
ORDER BY weighted_score
LIMIT 10;
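One caveat when mixing operators like this: cosine distances live in [0, 2] while L2 distances are unbounded, so raw sums favor whichever metric has the larger scale. A common fix (an application-side technique, not a HeliosDB-Lite feature) is to min-max normalize each score list before weighting:

```python
# Min-max normalize each score list to [0, 1], then apply the 0.7/0.3 weights.
def min_max(scores):
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

text_scores = [0.12, 0.40, 0.25]   # cosine distances, bounded [0, 2]
image_scores = [3.1, 1.2, 2.0]     # L2 distances, unbounded

combined = [0.7 * t + 0.3 * i
            for t, i in zip(min_max(text_scores), min_max(image_scores))]
# Smallest combined distance = best match
print(min(range(len(combined)), key=combined.__getitem__))  # 0
```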

7. Product Quantization

What is Product Quantization?

Product Quantization (PQ) is a compression technique that reduces vector storage by two to three orders of magnitude (e.g., 384x for 768-dim vectors) with minimal accuracy loss (typically <5%).

How PQ Works

Original Vector (768 dimensions):
[0.123, -0.456, 0.789, ..., 0.234]  → 768 × 4 bytes = 3,072 bytes

Product Quantization:
1. Split into M=8 sub-vectors (96 dimensions each)
2. Map each sub-vector to nearest centroid (1 byte code)
3. Store 8 codes instead of 768 floats

Quantized: [42, 17, 89, 123, 5, 67, 201, 34]  → 8 bytes

Compression: 3,072 bytes → 8 bytes = 384x reduction

Key Concepts:

- M (sub-quantizers): Number of chunks (typical: 8-64)
- K (centroids): Codebook size per chunk (typical: 256 for 1-byte codes)
- Codebook: Learned lookup table mapping codes to vectors
- ADC (Asymmetric Distance Computation): Fast approximate distance
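The encode/decode round trip from the steps above can be sketched with hand-built codebooks. Real systems learn the codebooks with k-means over training vectors; this toy skips training and uses a tiny M=2, K=2 setup:

```python
def pq_encode(vec, codebooks):
    """Split vec into M sub-vectors; store each as the index of its
    nearest codebook centroid (by squared L2 distance)."""
    m = len(codebooks)
    sub_len = len(vec) // m
    codes = []
    for i, book in enumerate(codebooks):
        sub = vec[i * sub_len:(i + 1) * sub_len]
        codes.append(min(range(len(book)),
                         key=lambda k: sum((a - b) ** 2
                                           for a, b in zip(sub, book[k]))))
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector by concatenating centroids."""
    out = []
    for code, book in zip(codes, codebooks):
        out.extend(book[code])
    return out

# 4-dim vector, M=2 sub-quantizers, K=2 centroids each
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # codebook for dims 0-1
    [[0.0, 1.0], [1.0, 0.0]],   # codebook for dims 2-3
]
print(pq_encode([0.9, 1.1, 0.1, 0.8], codebooks))  # [1, 0]
```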

Memory Optimization Benefits

| Vector Dim | Original Size | PQ Size (M=8) | Compression | 1M Vectors |
|------------|---------------|---------------|-------------|------------|
| 384 | 1,536 bytes | 8 bytes | 192x | 1.5 GB → 8 MB |
| 768 | 3,072 bytes | 8 bytes | 384x | 3 GB → 8 MB |
| 1536 | 6,144 bytes | 8 bytes | 768x | 6 GB → 8 MB |

When to Use PQ:

- Large datasets (>100K vectors) with memory constraints
- Edge deployments (IoT, mobile)
- Cost optimization (smaller instances)
- Can tolerate 2-5% accuracy loss

Creating PQ Indexes

Basic PQ Index

-- Create HNSW index with Product Quantization
CREATE INDEX products_idx
ON products(embedding)
USING hnsw
WITH (quantization='product');

With Custom Sub-Quantizers

-- Control compression ratio via sub-quantizers
-- More sub-quantizers = better accuracy, less compression

-- High compression (M=8, dimension must be divisible by 8)
CREATE INDEX idx_compress
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_subquantizers=8);

-- Balanced (M=16, dimension must be divisible by 16)
CREATE INDEX idx_balanced
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_subquantizers=16);

-- High accuracy (M=32, dimension must be divisible by 32)
CREATE INDEX idx_accurate
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_subquantizers=32);

With Custom Centroids

-- Control codebook size (K parameter)
-- More centroids = better accuracy, larger codebook

-- Standard (K=256, 1-byte codes)
CREATE INDEX idx
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_centroids=256);

-- Smaller codebook (K=128, faster but less accurate)
CREATE INDEX idx
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_centroids=128);

PQ Configuration Guidelines

Choosing M (Sub-Quantizers)

| Dimension | Recommended M | Compression | Accuracy Loss |
|-----------|---------------|-------------|---------------|
| 128 | 8 | 64x | <3% |
| 256 | 8 | 128x | <3% |
| 384 | 8 | 192x | <5% |
| 768 | 8-16 | 384x-192x | <5% |
| 1536 | 16-32 | 384x-192x | <5% |

Rules:

- Dimension must be divisible by M
- Start with M=8 for most cases
- Increase M if accuracy is critical
- Lower M for maximum compression
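The compression figures in the table follow directly from the definition: with K=256, each sub-vector collapses to a 1-byte code, so the compressed size is M bytes.

```python
def pq_compression(dim, m):
    """Compression ratio for float32 vectors stored as M 1-byte PQ codes."""
    assert dim % m == 0, "dimension must be divisible by M"
    return (dim * 4) / m

print(pq_compression(768, 8))    # 384.0
print(pq_compression(1536, 16))  # 384.0
```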

Training Requirements

PQ requires training data to build codebooks:

-- Minimum vectors needed for training
-- Rule of thumb: 10,000+ vectors for quality codebooks

-- Small dataset (<10K): PQ may not be worth it
-- Medium dataset (10K-100K): PQ beneficial, use M=8
-- Large dataset (>100K): PQ highly recommended, M=8-16

Trade-offs

| Aspect | Without PQ | With PQ (M=8, K=256) |
|--------|------------|----------------------|
| Memory | 3,072 bytes (768-dim) | 8 bytes |
| Compression | 1x | 384x |
| Query Speed | Baseline | 1.2-1.5x faster (less memory bandwidth) |
| Accuracy | 100% | 95-98% (Recall@10) |
| Index Build | Faster | Slower (codebook training) |
| Disk Storage | Larger | Much smaller |

Complete PQ Example

-- Scenario: 10 million documents, 768-dim embeddings
-- Without PQ: 10M × 3KB = 30 GB RAM
-- With PQ (M=8): 10M × 8 bytes = 80 MB RAM (375x reduction)

CREATE TABLE documents (
    doc_id INT PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding VECTOR(768)
);

-- Create PQ-compressed HNSW index
CREATE INDEX docs_pq_idx
ON documents(embedding)
USING hnsw
WITH (
    quantization='product',
    pq_subquantizers=8,
    pq_centroids=256
);

-- Queries work identically (compression is transparent)
SELECT
    doc_id,
    title,
    embedding <=> '[0.1, 0.2, ...]' AS distance
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;

Monitoring PQ Performance

-- Check index configuration and memory usage
SELECT
    index_name,
    dimensions,
    quantization_type,
    memory_bytes,
    compression_ratio
FROM pg_vector_index_stats()
WHERE index_name = 'docs_pq_idx';

Expected Output:

index_name    | dimensions | quantization_type | memory_bytes | compression_ratio
--------------|------------|------------------|--------------|------------------
docs_pq_idx   | 768        | product          | 83886080     | 384.0
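The Recall@10 figures quoted for PQ can be verified on your own data by running each benchmark query twice — once as an exact scan and once through the PQ index — and comparing the returned IDs. A minimal sketch of the metric itself:

```python
def recall_at_k(exact_ids, approx_ids, k=10):
    """Fraction of the true top-k neighbors that the compressed index returned.

    exact_ids:  IDs from a brute-force (unindexed) top-k scan.
    approx_ids: IDs from the PQ-compressed HNSW index for the same query.
    """
    exact, approx = set(exact_ids[:k]), set(approx_ids[:k])
    return len(exact & approx) / k

# Example: the PQ index returned 9 of the 10 true neighbors.
exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 42]
print(recall_at_k(exact, approx))  # 0.9
```

Averaging this over a few hundred representative queries gives a realistic accuracy estimate for your chosen M and K.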


8. Real-World Examples

Example 1: Semantic Document Search

Build a semantic search engine for a knowledge base with 100K articles.

Schema Design

CREATE TABLE knowledge_base (
    article_id INT PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    category TEXT,
    author TEXT,
    publish_date DATE,
    view_count INT DEFAULT 0,

    -- 384-dim embedding from all-MiniLM-L6-v2
    title_embedding VECTOR(384),
    content_embedding VECTOR(768)  -- all-mpnet-base-v2 for content
);

-- Create indexes
CREATE INDEX kb_title_idx ON knowledge_base USING hnsw (title_embedding);
CREATE INDEX kb_content_idx ON knowledge_base USING hnsw (content_embedding);

-- Traditional indexes for filtering
CREATE INDEX kb_category_idx ON knowledge_base(category);
CREATE INDEX kb_date_idx ON knowledge_base(publish_date);

Indexing Pipeline (Python)

from sentence_transformers import SentenceTransformer
import heliosdb

# Load models
title_model = SentenceTransformer('all-MiniLM-L6-v2')
content_model = SentenceTransformer('all-mpnet-base-v2')

# Connect to database
db = heliosdb.connect("knowledge.db")

# Sample articles
articles = [
    {
        "id": 1,
        "title": "Introduction to Machine Learning",
        "content": "Machine learning is a subset of artificial intelligence...",
        "category": "AI",
        "author": "Alice Smith",
        "publish_date": "2024-01-15"
    },
    # ... more articles
]

# Generate embeddings in batches
batch_size = 32
for i in range(0, len(articles), batch_size):
    batch = articles[i:i+batch_size]

    # Generate embeddings
    titles = [a["title"] for a in batch]
    contents = [a["content"][:512] for a in batch]  # Truncate long content

    title_embeddings = title_model.encode(titles)
    content_embeddings = content_model.encode(contents)

    # Insert into database
    for article, title_emb, content_emb in zip(batch, title_embeddings, content_embeddings):
        title_str = "[" + ",".join(map(str, title_emb)) + "]"
        content_str = "[" + ",".join(map(str, content_emb)) + "]"

        db.execute("""
            INSERT INTO knowledge_base
            (article_id, title, content, category, author, publish_date,
             title_embedding, content_embedding)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?)
        """, (
            article["id"], article["title"], article["content"],
            article["category"], article["author"], article["publish_date"],
            title_str, content_str
        ))

    db.commit()
    print(f"Indexed {min(i+batch_size, len(articles))} / {len(articles)} articles")

Search Implementation

def search_knowledge_base(query, category=None, top_k=10):
    """
    Semantic search with optional category filter.
    """
    # Generate query embedding
    query_embedding = content_model.encode(query)
    query_str = "[" + ",".join(map(str, query_embedding)) + "]"

    # Build SQL query
    sql = """
        SELECT
            article_id,
            title,
            category,
            author,
            publish_date,
            content_embedding <=> ? AS relevance_score
        FROM knowledge_base
    """

    params = [query_str]

    # Add category filter if specified
    if category:
        sql += " WHERE category = ?"
        params.append(category)

    sql += """
        ORDER BY content_embedding <=> ?
        LIMIT ?
    """
    params.extend([query_str, top_k])

    # Execute query
    results = db.execute(sql, params).fetchall()

    return [
        {
            "article_id": r[0],
            "title": r[1],
            "category": r[2],
            "author": r[3],
            "publish_date": r[4],
            "relevance": 1 - r[5]  # Convert distance to similarity
        }
        for r in results
    ]

# Example usage
results = search_knowledge_base(
    query="How does neural network training work?",
    category="AI",
    top_k=5
)

for r in results:
    print(f"{r['title']} (relevance: {r['relevance']:.3f})")

Output:

Introduction to Neural Networks (relevance: 0.923)
Backpropagation Algorithm Explained (relevance: 0.887)
Deep Learning Training Techniques (relevance: 0.845)
Gradient Descent Optimization (relevance: 0.812)
Neural Network Architectures (relevance: 0.798)

Example 2: Product Recommendations

Build a recommendation engine for e-commerce using collaborative filtering.

Schema Design

-- User preference vectors (learned from purchase history)
CREATE TABLE user_preferences (
    user_id INT PRIMARY KEY,
    username TEXT,
    preference_vector VECTOR(128),
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Product feature vectors
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    category TEXT,
    price DECIMAL(10, 2),
    in_stock BOOLEAN DEFAULT TRUE,
    popularity_score FLOAT DEFAULT 0.0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- Combined feature vector (category + attributes + user preferences)
    feature_vector VECTOR(128)
);

-- Create HNSW index for fast recommendations
CREATE INDEX products_feature_idx
ON products(feature_vector)
USING hnsw
WITH (quantization='product', pq_subquantizers=8);

-- Purchase history for training
CREATE TABLE purchases (
    purchase_id INT PRIMARY KEY,
    user_id INT,
    product_id INT,
    purchase_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    rating INT,  -- 1-5 stars
    FOREIGN KEY (user_id) REFERENCES user_preferences(user_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

Recommendation Query

-- Get personalized recommendations for user 42
-- Combine vector similarity with business rules

SELECT
    p.product_id,
    p.name,
    p.price,
    p.category,
    p.popularity_score,

    -- Vector similarity score
    p.feature_vector <#> u.preference_vector AS match_score,

    -- Combined ranking score
    (
        -- 70% vector similarity
        (p.feature_vector <#> u.preference_vector) * 0.7 +

        -- 20% popularity
        (p.popularity_score * 0.2) +

        -- 10% recency bonus (newly added products)
        CASE
            WHEN p.created_at > CURRENT_DATE - INTERVAL '7 days' THEN 0.1
            ELSE 0.0
        END
    ) AS final_score

FROM products p, user_preferences u
WHERE u.user_id = 42
  AND p.in_stock = TRUE
  AND p.price > 0

  -- Exclude already purchased items
  AND p.product_id NOT IN (
      SELECT product_id
      FROM purchases
      WHERE user_id = 42
  )

ORDER BY final_score DESC
LIMIT 20;

Python Implementation

def get_recommendations(user_id, max_price=None, category=None, limit=20):
    """
    Get personalized product recommendations.
    """
    sql = """
        SELECT
            p.product_id,
            p.name,
            p.price,
            p.category,
            p.feature_vector <#> u.preference_vector AS score
        FROM products p, user_preferences u
        WHERE u.user_id = ?
          AND p.in_stock = TRUE
          AND p.product_id NOT IN (
              SELECT product_id FROM purchases WHERE user_id = ?
          )
    """

    params = [user_id, user_id]

    if max_price is not None:
        sql += " AND p.price <= ?"
        params.append(max_price)

    if category:
        sql += " AND p.category = ?"
        params.append(category)

    sql += " ORDER BY score DESC LIMIT ?"
    params.append(limit)

    results = db.execute(sql, params).fetchall()

    return [
        {
            "product_id": r[0],
            "name": r[1],
            "price": float(r[2]),
            "category": r[3],
            "score": float(r[4])
        }
        for r in results
    ]

# Example usage
recommendations = get_recommendations(
    user_id=42,
    max_price=100.00,
    category="Electronics",
    limit=10
)

for rec in recommendations:
    print(f"{rec['name']} - ${rec['price']:.2f} (score: {rec['score']:.3f})")
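The schema describes `preference_vector` as learned from purchase history. One simple, assumed approach is a rating-weighted mean of the purchased products' feature vectors; a sketch of that aggregation (`rating_weighted_mean` is an illustrative name):

```python
def rating_weighted_mean(feature_vectors, ratings):
    """Combine purchased products' feature vectors into one preference
    vector, weighting each product by its 1-5 star rating."""
    dim = len(feature_vectors[0])
    total = sum(ratings)
    pref = [0.0] * dim
    for vec, rating in zip(feature_vectors, ratings):
        for i, x in enumerate(vec):
            pref[i] += x * (rating / total)
    return pref

# Two purchases: a 5-star and a 1-star product.
vectors = [[1.0, 0.0], [0.0, 1.0]]
ratings = [5, 1]
print(rating_weighted_mean(vectors, ratings))  # ≈ [0.833, 0.167]
```

The result would be written back to `user_preferences.preference_vector` (and `last_updated`) after each purchase or on a periodic batch job.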

Example 3: RAG Pipeline for LLM

Build a Retrieval Augmented Generation system for answering questions with LLM + context.

Schema Design

-- Document chunks for retrieval
CREATE TABLE document_chunks (
    chunk_id INT PRIMARY KEY,
    document_id INT,
    document_title TEXT,
    chunk_text TEXT NOT NULL,
    chunk_index INT,  -- Position in document

    -- Embedding for semantic retrieval
    embedding VECTOR(1536),  -- OpenAI ada-002

    -- Metadata for filtering
    source TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create HNSW index
CREATE INDEX chunks_embedding_idx
ON document_chunks(embedding)
USING hnsw;
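The schema assumes documents have already been split into chunks. A minimal character-based splitter with overlap (a real pipeline would more likely split on tokens or sentence boundaries):

```python
def chunk_document(text: str, chunk_size: int = 1000, overlap: int = 200):
    """Split a document into overlapping character chunks.

    Overlap keeps context that straddles a chunk boundary retrievable
    from either neighboring chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

chunks = chunk_document("a" * 2500, chunk_size=1000, overlap=200)
print(len(chunks))  # 4 (starts at offsets 0, 800, 1600, 2400)
```

Each chunk would then be embedded and inserted with its `chunk_index` so answers can cite their position in the source document.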

RAG Implementation (Python)

import openai
from typing import List, Dict

# Initialize OpenAI
openai.api_key = "your-api-key"

def embed_query(query: str) -> List[float]:
    """Generate embedding for query."""
    response = openai.Embedding.create(
        input=query,
        model="text-embedding-ada-002"
    )
    return response['data'][0]['embedding']

def retrieve_context(query: str, top_k: int = 5) -> List[Dict]:
    """Retrieve relevant document chunks."""
    # Generate query embedding
    query_emb = embed_query(query)
    query_str = "[" + ",".join(map(str, query_emb)) + "]"

    # Search for similar chunks
    results = db.execute("""
        SELECT
            chunk_id,
            document_title,
            chunk_text,
            embedding <=> ? AS relevance
        FROM document_chunks
        ORDER BY embedding <=> ?
        LIMIT ?
    """, (query_str, query_str, top_k)).fetchall()

    return [
        {
            "chunk_id": r[0],
            "title": r[1],
            "text": r[2],
            "relevance": 1 - r[3]
        }
        for r in results
    ]

def generate_answer(query: str, context_chunks: List[Dict]) -> str:
    """Generate answer using LLM with retrieved context."""
    # Build context string
    context = "\n\n".join([
        f"[Document: {c['title']}]\n{c['text']}"
        for c in context_chunks
    ])

    # Create prompt
    prompt = f"""Answer the question based on the following context:

Context:
{context}

Question: {query}

Answer:"""

    # Generate answer
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that answers questions based on provided context."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_tokens=500
    )

    return response['choices'][0]['message']['content']

def rag_query(query: str, top_k: int = 5) -> Dict:
    """Complete RAG pipeline: retrieve + generate."""
    # Retrieve relevant context
    context_chunks = retrieve_context(query, top_k)

    # Generate answer
    answer = generate_answer(query, context_chunks)

    return {
        "query": query,
        "answer": answer,
        "sources": [
            {"title": c["title"], "relevance": c["relevance"]}
            for c in context_chunks
        ]
    }

# Example usage
result = rag_query("How does HNSW indexing work?")

print(f"Query: {result['query']}")
print(f"Answer: {result['answer']}")
print("\nSources:")
for source in result['sources']:
    print(f"  - {source['title']} (relevance: {source['relevance']:.3f})")

9. Performance Tuning

Index Optimization

Build-Time vs Query-Time Tradeoffs

-- Fast build, lower recall (~90%)
CREATE INDEX fast_idx ON docs(embedding)
USING hnsw;  -- Default: m=16, ef_construction=200

-- Slow build, higher recall (~98%)
-- (If future version supports parameters)
-- CREATE INDEX accurate_idx ON docs(embedding)
-- USING hnsw WITH (m=32, ef_construction=400);

When to Rebuild Indexes

Rebuild indexes when:

  1. Dataset grows >2x since index creation
  2. Significant data distribution changes occur
  3. Query accuracy degrades over time

-- Rebuild index
DROP INDEX docs_embedding_idx;
CREATE INDEX docs_embedding_idx ON docs(embedding) USING hnsw;
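The growth rule can be automated by recording `vector_count` when the index is built and comparing it against the current `pg_vector_index_stats()` value. A sketch, assuming the `db` connection object used elsewhere in this tutorial (`maybe_rebuild` is an illustrative name):

```python
def maybe_rebuild(db, index_name, table, column, baseline_count):
    """Drop and recreate an HNSW index when the dataset has more than
    doubled since the recorded baseline (rule 1 above).

    Returns the vector count to record as the new baseline.
    """
    row = db.execute(
        "SELECT vector_count FROM pg_vector_index_stats() WHERE index_name = ?",
        (index_name,),
    ).fetchone()
    current = row[0]
    if current > 2 * baseline_count:
        db.execute(f"DROP INDEX {index_name}")
        db.execute(f"CREATE INDEX {index_name} ON {table}({column}) USING hnsw")
        return current  # new baseline for the next check
    return baseline_count
```

After each bulk load, call `maybe_rebuild(db, 'docs_embedding_idx', 'docs', 'embedding', baseline)` and persist the returned value as the new baseline.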

Batch Operations

Efficient Bulk Insertion

# BAD: Individual inserts (slow)
for doc in documents:
    db.execute("INSERT INTO docs VALUES (?, ?)", (doc.id, doc.embedding))
    db.commit()  # Commit each row

# GOOD: Batch inserts
db.execute("BEGIN TRANSACTION")
for doc in documents:
    db.execute("INSERT INTO docs VALUES (?, ?)", (doc.id, doc.embedding))
db.execute("COMMIT")  # Commit once

# BETTER: Prepared statement with executemany
values = [(doc.id, doc.embedding_str) for doc in documents]
db.executemany("INSERT INTO docs VALUES (?, ?)", values)
db.commit()

Optimal Batch Sizes

Operation            | Recommended Batch Size | Reason
---------------------|------------------------|---------------------------
Embedding generation | 32-64                  | GPU memory limits
Database inserts     | 1000-10000             | Transaction overhead
Vector searches      | 1-10                   | Limited query parallelism
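A small helper makes it easy to apply a different batch size at each stage (`batched` is an illustrative name; Python 3.12 ships an equivalent `itertools.batched`):

```python
def batched(items, batch_size):
    """Yield successive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Embed in GPU-friendly batches; inserts would use larger slices.
texts = [f"doc {i}" for i in range(100)]
embed_batches = list(batched(texts, 32))
print([len(b) for b in embed_batches])  # [32, 32, 32, 4]
```

The same helper works for the insert stage with `batch_size=1000` or more, committing once per slice.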

Query Optimization

Use LIMIT Always

-- BAD: No limit (retrieves all results)
SELECT * FROM docs
ORDER BY embedding <=> '[0.1, ...]';

-- GOOD: Limit results
SELECT * FROM docs
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 10;

Pre-Filter with WHERE

-- BAD: Filter after vector search
SELECT * FROM docs
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 10;
-- Then filter by category in the application (wasteful)

-- GOOD: Filter before vector search
SELECT * FROM docs
WHERE category = 'technology'
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 10;

Avoid Computing Distance Twice

-- BAD: Compute distance twice
SELECT
    id,
    title,
    embedding <=> '[0.1, ...]' AS score
FROM docs
ORDER BY embedding <=> '[0.1, ...]'  -- Computed again
LIMIT 10;

-- GOOD: Reference distance column
SELECT
    id,
    title,
    embedding <=> '[0.1, ...]' AS score
FROM docs
ORDER BY score  -- Use computed column
LIMIT 10;

Memory Management

Monitor Memory Usage

-- Check index memory consumption
SELECT
    index_name,
    dimensions,
    vector_count,
    memory_bytes / 1024 / 1024 AS memory_mb
FROM pg_vector_index_stats();

Memory-Saving Strategies

  1. Use Product Quantization

    CREATE INDEX idx ON docs(embedding)
    USING hnsw WITH (quantization='product');
    

  2. Lower dimensions

    # Use smaller embedding model
    model = SentenceTransformer('all-MiniLM-L6-v2')  # 384-dim
    # Instead of 'all-mpnet-base-v2' (768-dim)
    

  3. Partition large tables

    -- Split by date for time-series data
    CREATE TABLE docs_2024_q1 (embedding VECTOR(384), ...);
    CREATE TABLE docs_2024_q2 (embedding VECTOR(384), ...);
    

Hardware Optimization

CPU Considerations

  • SIMD Support: Ensure AVX2 (x86) or NEON (ARM) support is enabled

    # Check CPU features
    lscpu | grep -i avx2   # x86_64
    lscpu | grep -i asimd  # ARM64 (NEON is listed as "asimd")
    

  • Core Count: More cores = faster parallel searches
    • 4 cores: Good for small datasets
    • 8+ cores: Better for concurrent queries

Memory Considerations

Dataset Size | Minimum RAM | Recommended RAM
-------------|-------------|----------------
10K vectors  | 512 MB      | 1 GB
100K vectors | 2 GB        | 4 GB
1M vectors   | 8 GB        | 16 GB
10M vectors  | 32 GB       | 64 GB

Storage Considerations

  • SSD vs HDD: SSD strongly recommended
    • SSD: <10ms latency for index loads
    • HDD: 100-1000ms latency (100x slower)

  • Disk Space:
    • Without PQ: 4 bytes per dimension (~3 KB per 768-dim vector)
    • With PQ: M bytes per vector (e.g., 8 bytes with M=8), plus graph overhead
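Capacity planning for both RAM and disk follows directly from dimension and quantization settings. A rough estimator, assuming 4-byte floats without PQ and one byte per sub-quantizer code with PQ (HNSW graph links add overhead not counted here):

```python
def estimate_vector_storage(n_vectors, dimension, pq_subquantizers=None):
    """Rough bytes needed for raw vector data (graph links excluded).

    Without PQ each vector costs 4 bytes per dimension; with PQ it
    costs one byte per sub-quantizer code.
    """
    per_vector = pq_subquantizers if pq_subquantizers else dimension * 4
    return n_vectors * per_vector

# 10M 768-dim vectors: ~30 GB raw, ~80 MB as M=8 PQ codes.
print(estimate_vector_storage(10_000_000, 768) / 1024**3)     # ≈ 28.6 (GiB)
print(estimate_vector_storage(10_000_000, 768, 8) / 1024**2)  # ≈ 76.3 (MiB)
```

Compare the estimate against the RAM table above when sizing hardware.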

10. Monitoring

pg_vector_index_stats Function

Query index statistics and health metrics.

Basic Usage

-- Get all vector index stats
SELECT * FROM pg_vector_index_stats();

Output Columns:

  • index_name: Name of the index
  • table_name: Table the index belongs to
  • dimensions: Vector dimensionality
  • vector_count: Number of vectors indexed
  • memory_bytes: Memory consumed by index
  • quantization_type: Compression type (none, product)
  • distance_metric: Distance function used

Example Output

index_name          | table_name | dimensions | vector_count | memory_bytes | quantization_type | distance_metric
--------------------|------------|------------|--------------|--------------|-------------------|----------------
docs_embedding_idx  | documents  | 384        | 50000        | 8388608      | none              | cosine
products_vec_idx    | products   | 768        | 100000       | 8388608      | product           | l2

Monitoring Queries

Index Size by Table

SELECT
    table_name,
    index_name,
    dimensions,
    vector_count,
    memory_bytes / 1024 / 1024 AS memory_mb,
    (memory_bytes / NULLIF(vector_count, 0)) AS bytes_per_vector
FROM pg_vector_index_stats()
ORDER BY memory_bytes DESC;

Compression Effectiveness

SELECT
    index_name,
    quantization_type,
    dimensions,
    memory_bytes / 1024 / 1024 AS memory_mb,

    -- Compression ratio (assumes M=8 sub-quantizers, i.e. 8-byte codes)
    CASE
        WHEN quantization_type = 'product' THEN
            (dimensions * 4.0) / 8.0  -- 4 bytes per float -> 8 bytes PQ
        ELSE
            1.0
    END AS compression_ratio
FROM pg_vector_index_stats();

Identify Large Indexes

-- Find indexes using >1 GB
SELECT
    index_name,
    table_name,
    memory_bytes / 1024 / 1024 / 1024 AS memory_gb
FROM pg_vector_index_stats()
WHERE memory_bytes > 1073741824  -- 1 GB
ORDER BY memory_bytes DESC;

Performance Metrics

Query Latency Tracking

import time

def measure_query_latency(query_embedding, top_k=10, iterations=100):
    """Measure average query latency."""
    query_str = "[" + ",".join(map(str, query_embedding)) + "]"

    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()

        results = db.execute("""
            SELECT id, embedding <-> ? AS dist
            FROM documents
            ORDER BY embedding <-> ?
            LIMIT ?
        """, (query_str, query_str, top_k)).fetchall()

        end = time.perf_counter()
        latencies.append((end - start) * 1000)  # Convert to ms

    avg_latency = sum(latencies) / len(latencies)
    p50 = sorted(latencies)[len(latencies) // 2]
    p99 = sorted(latencies)[int(len(latencies) * 0.99)]

    return {
        "avg_ms": avg_latency,
        "p50_ms": p50,
        "p99_ms": p99,
        "min_ms": min(latencies),
        "max_ms": max(latencies)
    }

# Example usage
metrics = measure_query_latency(query_embedding, top_k=10)
print(f"Average latency: {metrics['avg_ms']:.2f} ms")
print(f"P50 latency: {metrics['p50_ms']:.2f} ms")
print(f"P99 latency: {metrics['p99_ms']:.2f} ms")

Throughput Measurement

import concurrent.futures
import time

def measure_throughput(query_embeddings, top_k=10, workers=4):
    """Measure queries per second with parallel execution."""

    def run_query(query_emb):
        query_str = "[" + ",".join(map(str, query_emb)) + "]"
        results = db.execute("""
            SELECT id FROM documents
            ORDER BY embedding <-> ?
            LIMIT ?
        """, (query_str, top_k)).fetchall()
        return len(results)

    start = time.perf_counter()

    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(run_query, query_embeddings))

    end = time.perf_counter()
    elapsed = end - start
    qps = len(query_embeddings) / elapsed

    return {
        "total_queries": len(query_embeddings),
        "elapsed_seconds": elapsed,
        "queries_per_second": qps,
        "workers": workers
    }

# Example usage
throughput = measure_throughput(query_embeddings, workers=8)
print(f"Throughput: {throughput['queries_per_second']:.1f} QPS")

Health Checks

Index Integrity Check

-- Verify index exists and has expected vectors
SELECT
    index_name,
    vector_count,
    CASE
        WHEN vector_count > 0 THEN 'OK'
        ELSE 'EMPTY'
    END AS status
FROM pg_vector_index_stats();

Dimension Mismatch Detection

-- Check for dimension inconsistencies
-- (This would be a custom query checking actual data)
SELECT
    'documents' AS table_name,
    COUNT(*) AS total_rows,
    COUNT(embedding) AS non_null_embeddings
FROM documents;

Alerting Thresholds

Recommended alert thresholds:

Metric              | Warning        | Critical
--------------------|----------------|----------------
Query latency (p99) | >50ms          | >200ms
Index memory usage  | >80% available | >95% available
Query error rate    | >1%            | >5%
QPS throughput      | <100 QPS       | <10 QPS
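These thresholds can be wired into a periodic check — the metric names and structure below are illustrative, not a HeliosDB-Lite API:

```python
# Alert thresholds from the table above: higher_is_bad marks metrics
# that alert when they rise (latency, memory, errors); throughput
# alerts when it falls.
THRESHOLDS = {
    "p99_latency_ms":  {"warning": 50,  "critical": 200, "higher_is_bad": True},
    "memory_used_pct": {"warning": 80,  "critical": 95,  "higher_is_bad": True},
    "error_rate_pct":  {"warning": 1,   "critical": 5,   "higher_is_bad": True},
    "throughput_qps":  {"warning": 100, "critical": 10,  "higher_is_bad": False},
}

def check_metrics(metrics):
    """Return {metric: 'ok' | 'warning' | 'critical'} per measurement."""
    status = {}
    for name, value in metrics.items():
        t = THRESHOLDS[name]
        if t["higher_is_bad"]:
            crit, warn = value > t["critical"], value > t["warning"]
        else:
            crit, warn = value < t["critical"], value < t["warning"]
        status[name] = "critical" if crit else "warning" if warn else "ok"
    return status

print(check_metrics({"p99_latency_ms": 75, "throughput_qps": 250}))
# {'p99_latency_ms': 'warning', 'throughput_qps': 'ok'}
```

Run it on the latency and throughput numbers collected by the measurement functions above and page on any 'critical' result.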

Summary

This tutorial covered:

  1. Introduction: What vector search is and why it matters
  2. Vector Basics: Embeddings, dimensions, and distance metrics
  3. Creating Tables: VECTOR type and schema design
  4. Generating Embeddings: OpenAI, sentence-transformers, and batch processing
  5. HNSW Indexing: Graph-based approximate search with configurable parameters
  6. Similarity Search: KNN queries with L2, cosine, and inner product operators
  7. Product Quantization: 8-384x memory compression for large-scale deployments
  8. Real-World Examples: Document search, recommendations, and RAG pipelines
  9. Performance Tuning: Index optimization, batching, and hardware considerations
  10. Monitoring: pg_vector_index_stats, latency tracking, and health checks

Next Steps

  • Try the examples: Implement a semantic search system with your own data
  • Experiment with parameters: Test different dimensions and distance metrics
  • Optimize for your use case: Profile queries and tune indexes
  • Explore advanced features: Multi-modal search, hybrid ranking, custom embeddings

Additional Resources

Troubleshooting

Common Issues:

  1. Dimension mismatch errors
     • Ensure embedding dimension matches the VECTOR(n) declaration
     • Verify all embeddings come from the same model

  2. Slow queries
     • Add an HNSW index if missing
     • Use LIMIT to cap results
     • Consider Product Quantization for large datasets

  3. Memory issues
     • Enable Product Quantization (up to 384x compression)
     • Use smaller embedding dimensions
     • Partition large tables

  4. Low accuracy
     • Increase ef_construction (if configurable)
     • Use a higher m parameter (more connections)
     • Disable PQ or increase sub-quantizers

Document Version: 1.0 Last Updated: 2025-12-01 Feedback: Please report issues or suggestions via GitHub issues