Vector Search Tutorial

Version: 1.0
Created: 2025-12-01
Target Audience: Developers building AI/ML applications
Prerequisites: Basic SQL knowledge, understanding of embeddings
Estimated Time: 45-60 minutes


Table of Contents

  1. Introduction
  2. Vector Basics
  3. Creating Vector Tables
  4. Generating Embeddings
  5. HNSW Indexing
  6. Similarity Search
  7. Product Quantization
  8. Real-World Examples
  9. Performance Tuning
  10. Monitoring

1. Introduction

Vector search (also known as semantic search or similarity search) enables AI applications to find similar items based on meaning rather than exact keyword matches. Instead of searching for exact text, vector search compares mathematical representations (embeddings) of content to find semantically similar items.

Traditional Keyword Search vs Vector Search:

Query: "automobile repair"

Keyword Search Results:
✓ "automobile repair shop"
✗ "car maintenance service"  (missed - different words)
✗ "vehicle fixing center"    (missed - different words)

Vector Search Results:
✓ "automobile repair shop"   (exact match)
✓ "car maintenance service"  (semantically similar)
✓ "vehicle fixing center"    (semantically similar)

Use Cases

| Use Case | Description | Typical Dimensions |
|----------|-------------|--------------------|
| Semantic Search | Find documents by meaning, not keywords | 384-768 |
| Recommendations | Suggest similar products, articles, or content | 128-512 |
| RAG (Retrieval Augmented Generation) | Retrieve relevant context for LLM prompts | 384-1536 |
| Image Similarity | Find visually similar images | 512-2048 |
| Anomaly Detection | Identify outliers in time-series or sensor data | 64-256 |
| Duplicate Detection | Find near-duplicate content | 256-768 |

How HeliosDB-Lite Implements It

HeliosDB-Lite provides native vector search capabilities with:

┌─────────────────────────────────────────────────────────────┐
│                 HeliosDB-Lite Vector Search                 │
├─────────────────────────────────────────────────────────────┤
│  SQL Interface: VECTOR type, distance operators, indexes    │
├─────────────────────────────────────────────────────────────┤
│  HNSW Algorithm: Graph-based approximate nearest neighbors  │
├─────────────────────────────────────────────────────────────┤
│  SIMD Acceleration: AVX2/NEON for 2-6x faster distances     │
├─────────────────────────────────────────────────────────────┤
│  Product Quantization: 384x memory compression             │
├─────────────────────────────────────────────────────────────┤
│  RocksDB Storage: Persistent, crash-safe index storage      │
└─────────────────────────────────────────────────────────────┘

Key Benefits:

- Embedded (no separate server required)
- Sub-millisecond query latency for millions of vectors
- SIMD-accelerated distance calculations
- Memory-efficient with Product Quantization
- Compatible with pgvector syntax


2. Vector Basics

Understanding Embeddings

Embeddings are mathematical representations of content (text, images, audio) as fixed-length arrays of floating-point numbers. Each dimension captures semantic information.

Text: "The cat sat on the mat"
       ↓ (embedding model)
Vector: [0.12, -0.45, 0.78, 0.34, -0.12, ...]  (384 dimensions)

Text: "A feline rested on the rug"
       ↓ (embedding model)
Vector: [0.15, -0.42, 0.81, 0.36, -0.09, ...]  (similar values!)

Properties of Good Embeddings:

- Similar content → similar vectors (small distance)
- Different content → different vectors (large distance)
- Fixed dimensionality (all vectors same length)
- Typically normalized (unit length) for cosine similarity

Dimension Considerations

Choosing the right dimension depends on your use case and embedding model:

| Model | Dimension | Use Case | Speed | Accuracy |
|-------|-----------|----------|-------|----------|
| all-MiniLM-L6-v2 | 384 | Fast semantic search | Fastest | Good |
| all-mpnet-base-v2 | 768 | Balanced performance | Fast | Better |
| text-embedding-ada-002 (OpenAI) | 1536 | High accuracy | Medium | Best |
| ResNet-50 (images) | 2048 | Image features | Medium | Excellent |

Rules of Thumb:

- Higher dimensions: Better accuracy, more memory, slower queries
- Lower dimensions: Faster queries, less memory, may lose nuance
- Common choices: 384 (fast), 768 (balanced), 1536 (accurate)
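A quick back-of-envelope check of the memory side of these trade-offs: raw float32 storage grows linearly with dimension (index overhead, covered in later sections, comes on top).

```python
# Raw float32 storage: vectors x dimensions x 4 bytes per component.
# (HNSW/PQ index overhead is extra; see the later sections.)
def raw_vector_bytes(n_vectors, dim):
    return n_vectors * dim * 4

# 1M vectors at the three common dimensions:
for dim in (384, 768, 1536):
    print(f"{dim}-dim: {raw_vector_bytes(1_000_000, dim) / 1e9:.3f} GB")
# 384-dim: 1.536 GB, 768-dim: 3.072 GB, 1536-dim: 6.144 GB
```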

Distance Metrics

HeliosDB-Lite supports three distance metrics to measure vector similarity:

L2 Distance (Euclidean)

Measures straight-line distance in N-dimensional space.

Formula: distance = √(Σ(a[i] - b[i])²)

Example:
Vector A: [1.0, 0.0, 0.0]
Vector B: [0.0, 1.0, 0.0]
L2 Distance: √((1-0)² + (0-1)² + (0-0)²) = √2 ≈ 1.414

Use when: General-purpose similarity, raw feature vectors
Operator: <->
Range: [0, ∞)  (0 = identical, larger = more different)

Cosine Similarity (Cosine Distance)

Measures angle between vectors (ignores magnitude).

Formula: distance = 1 - (A·B / (||A|| × ||B||))

Example:
Vector A: [1.0, 0.0, 0.0]  (normalized)
Vector B: [0.707, 0.707, 0.0]  (normalized)
Cosine Distance: 1 - 0.707 ≈ 0.293

Use when: Text embeddings, normalized vectors
Operator: <=>
Range: [0, 2]  (0 = identical, 2 = opposite direction)

Why Cosine for Text?

- Text embeddings are typically normalized (unit length)
- Cosine ignores magnitude, focuses on semantic direction
- Most NLP models optimize for cosine similarity

Inner Product (Dot Product)

Measures alignment between vectors (considers both angle and magnitude).

Formula: distance = -(A·B) = -Σ(a[i] × b[i])
         (negative because smaller = better in HeliosDB-Lite)

Example:
Vector A: [1.0, 2.0, 3.0]
Vector B: [4.0, 5.0, 6.0]
Inner Product Distance: -(4 + 10 + 18) = -32

Use when: Recommendation systems, non-normalized vectors
Operator: <#>
Range: (-∞, 0]  (more negative = more similar)

Comparison Table:

| Metric | When to Use | Normalized? | SQL Operator |
|--------|-------------|-------------|--------------|
| L2 | General similarity, image features | No | <-> |
| Cosine | Text embeddings, semantic search | Yes (recommended) | <=> |
| Inner Product | Recommendations, scoring | No | <#> |
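All three metrics are easy to sanity-check in plain Python; the numbers below reproduce the worked examples from each metric's description.

```python
import math

def l2_distance(a, b):
    # Straight-line distance: sqrt of summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 - (A.B / (||A|| * ||B||))
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (na * nb)

def inner_product_distance(a, b):
    # Negated dot product, so smaller = more similar
    return -sum(x * y for x, y in zip(a, b))

print(round(l2_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]), 3))          # 1.414
print(round(cosine_distance([1.0, 0.0, 0.0], [0.707, 0.707, 0.0]), 3))  # 0.293
print(inner_product_distance([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))         # -32.0
```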

3. Creating Vector Tables

VECTOR Data Type Syntax

The VECTOR(n) data type stores fixed-dimension floating-point arrays.

-- Basic syntax
VECTOR(dimension)

-- Examples
VECTOR(3)     -- 3-dimensional vector
VECTOR(384)   -- Common for MiniLM embeddings
VECTOR(768)   -- Common for BERT embeddings
VECTOR(1536)  -- OpenAI ada-002 embeddings

Example Table Creation

Example 1: Simple Document Embeddings

-- Semantic search for documents
CREATE TABLE documents (
    id INT PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    embedding VECTOR(384)  -- all-MiniLM-L6-v2 embeddings
);

Example 2: Multi-Column Vectors

-- Product catalog with multiple embedding types
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    price DECIMAL(10, 2),
    category TEXT,

    -- Text embedding for name + description
    text_embedding VECTOR(768),

    -- Image embedding for product photos
    image_embedding VECTOR(512),

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Example 3: User Preferences for Recommendations

-- User preference vectors for collaborative filtering
CREATE TABLE user_profiles (
    user_id INT PRIMARY KEY,
    username TEXT NOT NULL,
    preference_vector VECTOR(128),
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Item vectors for recommendation matching
CREATE TABLE items (
    item_id INT PRIMARY KEY,
    title TEXT NOT NULL,
    item_vector VECTOR(128),
    popularity_score FLOAT DEFAULT 0.0
);

Example 4: Time-Series Anomaly Detection

-- Sensor readings with feature vectors
CREATE TABLE sensor_readings (
    reading_id INT PRIMARY KEY,
    sensor_id TEXT NOT NULL,
    timestamp TIMESTAMP NOT NULL,
    raw_values TEXT,  -- JSON or serialized data
    feature_vector VECTOR(64),  -- Extracted features
    is_anomaly BOOLEAN DEFAULT FALSE
);
-- Image database with visual features
CREATE TABLE images (
    image_id INT PRIMARY KEY,
    filename TEXT NOT NULL,
    upload_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    tags TEXT,  -- Comma-separated tags

    -- ResNet-50 feature vector
    features VECTOR(2048),

    -- Metadata
    width INT,
    height INT,
    file_size INT
);

Table Design Best Practices

  1. Choose dimension based on your embedding model

    -- Match the model output dimension exactly
    -- sentence-transformers/all-MiniLM-L6-v2 → 384
    CREATE TABLE docs (id INT, vec VECTOR(384));
    

  2. Add metadata columns for filtering

    -- Combine vector search with WHERE filters
    CREATE TABLE articles (
        id INT PRIMARY KEY,
        category TEXT,        -- For filtering by category
        publish_date DATE,    -- For filtering by date
        author TEXT,          -- For filtering by author
        embedding VECTOR(768)
    );
    

  3. Use appropriate primary keys

    -- Auto-incrementing ID for new data
    CREATE TABLE logs (
        id SERIAL PRIMARY KEY,
        log_text TEXT,
        embedding VECTOR(384)
    );
    

  4. Consider NULL handling

    -- Allow NULL if embeddings are generated asynchronously
    CREATE TABLE pending_docs (
        id INT PRIMARY KEY,
        content TEXT NOT NULL,
        embedding VECTOR(384) NULL  -- Generated after insertion
    );
    


4. Generating Embeddings

Using OpenAI API (Python)

The OpenAI API provides high-quality text embeddings via text-embedding-ada-002 (1536 dimensions).

Installation

pip install openai

Basic Usage

import openai  # note: the legacy pre-1.0 SDK; openai>=1.0 uses OpenAI().embeddings.create

# Configure API key
openai.api_key = "your-api-key-here"

def get_embedding(text, model="text-embedding-ada-002"):
    """Generate embedding for a single text string."""
    response = openai.Embedding.create(
        input=text,
        model=model
    )
    return response['data'][0]['embedding']

# Example
text = "The quick brown fox jumps over the lazy dog"
embedding = get_embedding(text)

print(f"Dimension: {len(embedding)}")  # 1536
print(f"First 5 values: {embedding[:5]}")
# Output: [0.0234, -0.0156, 0.0789, -0.0412, 0.0567]

Batch Processing

def get_embeddings_batch(texts, model="text-embedding-ada-002"):
    """Generate embeddings for multiple texts (up to 2048 per request)."""
    response = openai.Embedding.create(
        input=texts,
        model=model
    )
    return [item['embedding'] for item in response['data']]

# Process multiple documents
documents = [
    "First document text",
    "Second document text",
    "Third document text"
]

embeddings = get_embeddings_batch(documents)
print(f"Generated {len(embeddings)} embeddings")

Storing in HeliosDB-Lite

import heliosdb  # Hypothetical Python client

# Connect to database
db = heliosdb.connect("mydb.db")

# Generate embedding
text = "Sample document content"
embedding = get_embedding(text)

# Convert to SQL array format
embedding_str = "[" + ",".join(map(str, embedding)) + "]"

# Insert into database
db.execute("""
    INSERT INTO documents (id, content, embedding)
    VALUES (?, ?, ?)
""", (1, text, embedding_str))

db.commit()
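The bracket-literal conversion above recurs in every insert snippet in this tutorial. A pair of tiny helpers (hypothetical names, assuming the pgvector-style '[x,y,z]' text format shown above) keeps it in one place:

```python
def to_vector_literal(values):
    """Format a sequence of floats as a pgvector-style text literal."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

def from_vector_literal(text):
    """Parse a '[x,y,z]' literal back into a list of floats."""
    return [float(x) for x in text.strip("[]").split(",")]

lit = to_vector_literal([0.1, 0.2, 0.3])
print(lit)                       # [0.1,0.2,0.3]
print(from_vector_literal(lit))  # [0.1, 0.2, 0.3]
```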

Using sentence-transformers (Python)

The sentence-transformers library provides efficient local embedding generation (no API calls).

Installation

pip install sentence-transformers

Basic Usage

from sentence_transformers import SentenceTransformer

# Load model (downloads on first use, cached afterward)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate single embedding
text = "This is a sample sentence"
embedding = model.encode(text)

print(f"Dimension: {len(embedding)}")  # 384
print(f"Type: {type(embedding)}")      # numpy.ndarray
print(f"First 5 values: {embedding[:5]}")
# Output: [0.0234, -0.0156, 0.0789, -0.0412, 0.0567]

Batch Processing (Efficient)

# Process multiple texts in parallel
texts = [
    "First sentence",
    "Second sentence",
    "Third sentence"
]

# Batch encoding (much faster than loop)
embeddings = model.encode(texts, batch_size=32)

print(f"Shape: {embeddings.shape}")  # (3, 384)

# Iterate over results
for i, emb in enumerate(embeddings):
    print(f"Text {i}: dimension={len(emb)}")

Complete Example: Indexing Documents

from sentence_transformers import SentenceTransformer
import sqlite3  # or heliosdb client

# Initialize model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sample documents
documents = [
    {"id": 1, "text": "Python is a programming language"},
    {"id": 2, "text": "Machine learning uses algorithms"},
    {"id": 3, "text": "Databases store structured data"},
    {"id": 4, "text": "Neural networks process information"},
]

# Generate embeddings
texts = [doc["text"] for doc in documents]
embeddings = model.encode(texts, show_progress_bar=True)

# Connect to HeliosDB-Lite
conn = sqlite3.connect("mydb.db")  # Use HeliosDB-Lite client in production
cursor = conn.cursor()

# Create table
cursor.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id INT PRIMARY KEY,
        text TEXT,
        embedding VECTOR(384)
    )
""")

# Insert documents with embeddings
for doc, emb in zip(documents, embeddings):
    emb_str = "[" + ",".join(map(str, emb)) + "]"
    cursor.execute("""
        INSERT INTO documents (id, text, embedding)
        VALUES (?, ?, ?)
    """, (doc["id"], doc["text"], emb_str))

conn.commit()
print(f"Indexed {len(documents)} documents")

Model Selection

| Model | Dimensions | Speed | Quality | Use Case |
|-------|------------|-------|---------|----------|
| all-MiniLM-L6-v2 | 384 | Fastest | Good | General semantic search |
| all-mpnet-base-v2 | 768 | Fast | Better | Balanced quality/speed |
| all-MiniLM-L12-v2 | 384 | Fast | Good | Slightly better than L6 |
| multi-qa-mpnet-base-dot-v1 | 768 | Medium | Best | Q&A and retrieval |
| paraphrase-multilingual-mpnet-base-v2 | 768 | Medium | Best | Multilingual support |

Custom Model Fine-Tuning

from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Load base model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Create training data (text pairs with similarity scores)
train_examples = [
    InputExample(texts=["query 1", "relevant doc 1"], label=1.0),
    InputExample(texts=["query 1", "irrelevant doc"], label=0.0),
    # Add more examples...
]

# Create data loader
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Define loss function
train_loss = losses.CosineSimilarityLoss(model)

# Fine-tune
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100
)

# Save fine-tuned model
model.save("./my-finetuned-model")

# Use fine-tuned model
custom_model = SentenceTransformer("./my-finetuned-model")

Using Other Embedding Providers

Cohere API

import cohere

co = cohere.Client("your-api-key")

response = co.embed(
    texts=["Text to embed"],
    model="embed-english-v3.0"
)

embeddings = response.embeddings  # List of vectors

Hugging Face Transformers

from transformers import AutoTokenizer, AutoModel
import torch

# Load model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Generate embedding
text = "Sample text"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)
    # Use [CLS] token embedding or mean pooling
    embedding = outputs.last_hidden_state[:, 0, :].numpy()[0]

print(f"Dimension: {len(embedding)}")  # 768 for BERT-base

Best Practices for Embedding Generation

  1. Batch processing for efficiency

    # Good: Process in batches
    embeddings = model.encode(texts, batch_size=32)
    
    # Bad: One at a time
    embeddings = [model.encode(text) for text in texts]
    

  2. Normalize vectors for cosine similarity

    import numpy as np
    
    # Normalize to unit length
    embedding = model.encode(text)
    embedding = embedding / np.linalg.norm(embedding)
    

  3. Cache embeddings to avoid recomputation

    # Store embeddings in database, don't regenerate
    # Check if embedding exists before generating
    

  4. Handle long texts appropriately

    # Truncate or chunk long documents
    max_length = 512  # model limit is in tokens; the character slicing
                      # below is only a rough proxy for that limit
    
    # Option 1: Truncate
    text = text[:max_length]
    
    # Option 2: Chunk and average
    chunks = [text[i:i+max_length] for i in range(0, len(text), max_length)]
    chunk_embeddings = model.encode(chunks)
    avg_embedding = np.mean(chunk_embeddings, axis=0)
    


5. HNSW Indexing

What is HNSW?

HNSW (Hierarchical Navigable Small World) is a graph-based algorithm for approximate nearest neighbor search. It constructs a multi-layer graph where each layer contains progressively fewer nodes, enabling fast logarithmic-time searches.

How HNSW Works

Layer 2:  A ←─────→ D          (Entry point, sparse)
          ↓         ↓
Layer 1:  A ←→ B ←→ D ←→ E      (Medium density)
          ↓    ↓    ↓    ↓
Layer 0:  A ←→ B ←→ C ←→ D ←→ E ←→ F  (All nodes, densest links)

Search Process:
1. Start at entry point in top layer (Layer 2: node A)
2. Greedily navigate to closest node in current layer (A → D)
3. Drop down to next layer, continue greedy search (D → E)
4. Repeat until bottom layer (Layer 0)
5. Perform local search in bottom layer for k neighbors

Key Properties:

- Approximate: Finds ~95-99% of true nearest neighbors (configurable)
- Fast: O(log N) query time vs O(N) for brute force
- Scalable: Handles millions to billions of vectors
- Memory-efficient: ~50-100 bytes per vector
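The greedy navigation described in the steps above can be sketched in a few lines. This toy version (single layer, plain L2, hand-built graph) is illustrative only, not HeliosDB-Lite's implementation:

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search(graph, vectors, entry, query):
    """Walk the graph, moving to whichever neighbor is closest to the
    query, until no neighbor improves on the current node."""
    current = entry
    while True:
        best = min(graph[current], key=lambda n: l2(vectors[n], query))
        if l2(vectors[best], query) >= l2(vectors[current], query):
            return current
        current = best

# Toy layer: six 1-D points on a line, each linked to its neighbors
vectors = {n: [float(n)] for n in range(6)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(greedy_search(graph, vectors, entry=0, query=[4.2]))  # 4
```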

Creating HNSW Indexes

Basic Index Creation

-- Create HNSW index on vector column
CREATE INDEX embedding_idx ON documents USING hnsw (embedding);

With Custom Table

-- Full example
CREATE TABLE articles (
    id INT PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding VECTOR(768)
);

-- Create HNSW index
CREATE INDEX articles_embedding_idx ON articles USING hnsw (embedding);

Multiple Indexes

-- Table with multiple vector columns
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    text_embedding VECTOR(768),
    image_embedding VECTOR(512)
);

-- Create separate indexes for each vector column
CREATE INDEX products_text_idx ON products USING hnsw (text_embedding);
CREATE INDEX products_image_idx ON products USING hnsw (image_embedding);

Index Parameters

HNSW indexes have two critical parameters that control the accuracy/speed tradeoff:

Parameter: m (Max Connections)

Number of bidirectional links per node in the graph.

-- Default: m = 16
CREATE INDEX idx ON docs USING hnsw (embedding);

-- Custom m parameter (if supported in future versions)
-- CREATE INDEX idx ON docs USING hnsw (embedding) WITH (m = 32);

Rules of Thumb:

- m = 16: Default, good balance (recommended)
- m = 8: Faster build, less memory, lower accuracy
- m = 32: Slower build, more memory, higher accuracy
- m = 64: Very accurate but memory-intensive

Memory Impact:

- Memory per vector ≈ m × 8 bytes (for pointers/IDs)
- m=16: ~128 bytes per vector
- m=32: ~256 bytes per vector

Parameter: ef_construction (Construction Search Depth)

Size of dynamic candidate list during index construction.

-- Default: ef_construction = 200
CREATE INDEX idx ON docs USING hnsw (embedding);

-- Custom ef_construction (if supported in future versions)
-- CREATE INDEX idx ON docs USING hnsw (embedding) WITH (ef_construction = 400);

Rules of Thumb:

- ef_construction = 100: Fast build, lower quality graph
- ef_construction = 200: Default, good balance (recommended)
- ef_construction = 400: Slower build, higher quality graph
- ef_construction = 800: Very slow but maximum quality

Build Time Impact:

- Higher ef_construction = slower index builds (linear increase)
- But better recall at query time

Recommendations by Dataset Size

| Dataset Size | m | ef_construction | Expected Recall@10 |
|--------------|---|-----------------|--------------------|
| <10K vectors | 16 | 200 | >99% |
| 10K-100K | 16 | 200 | >98% |
| 100K-1M | 16 | 200 | >95% |
| 1M-10M | 32 | 400 | >95% |
| >10M | 32-64 | 400-800 | >93% |
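To budget memory for these settings, combine raw vector storage with the ~m × 8 bytes of link overhead noted earlier. This is a rough estimate under those rules of thumb, not an exact accounting of HeliosDB-Lite's internals:

```python
# Rough index sizing: 4 bytes per float32 component for the raw vectors,
# plus ~m x 8 bytes of graph link overhead per vector.
def hnsw_memory_bytes(n_vectors, dim, m=16):
    vectors = n_vectors * dim * 4   # float32 storage
    links = n_vectors * m * 8       # graph link overhead
    return vectors + links

# 1M x 384-dim vectors with the default m=16:
print(hnsw_memory_bytes(1_000_000, 384) / 1e9)  # ~1.66 GB
```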

When to Create Indexes

Strategy 1: Index Before Insertion (Empty Table)

-- Create index first, then insert data
CREATE TABLE docs (id INT, vec VECTOR(384));
CREATE INDEX docs_idx ON docs USING hnsw (vec);

-- Insert data (index updates automatically)
INSERT INTO docs VALUES (1, '[0.1, 0.2, ...]');
INSERT INTO docs VALUES (2, '[0.3, 0.4, ...]');

Pros: Incremental index updates
Cons: Slower individual inserts

Strategy 2: Bulk Load Then Index

-- Insert data first
CREATE TABLE docs (id INT, vec VECTOR(384));

-- Bulk insert (fast, no index overhead)
INSERT INTO docs VALUES (1, '[0.1, 0.2, ...]');
INSERT INTO docs VALUES (2, '[0.3, 0.4, ...]');
-- ... (thousands or millions of rows)

-- Create index after all data loaded
CREATE INDEX docs_idx ON docs USING hnsw (vec);

Pros: Faster bulk loading
Cons: Index build at end can be slow for large datasets

Recommendation: Use Strategy 2 for initial bulk loads, Strategy 1 for incremental updates.

Index Build Time Estimates

| Dataset Size | Dimension | Expected Build Time |
|--------------|-----------|---------------------|
| 1,000 vectors | 384 | <1 second |
| 10,000 vectors | 384 | 1-3 seconds |
| 100,000 vectors | 384 | 10-30 seconds |
| 1,000,000 vectors | 384 | 2-5 minutes |
| 10,000,000 vectors | 768 | 30-60 minutes |

Times on commodity hardware (4-core CPU, 16GB RAM)

Verifying Index Creation

-- Check indexes on a table
\d documents

-- Expected output:
-- Indexes:
--   "documents_embedding_idx" hnsw (embedding)

6. Similarity Search

KNN Queries with <-> Operator (L2 Distance)

The <-> operator performs L2 (Euclidean) distance calculation.

Basic KNN Query

-- Find 5 nearest neighbors using L2 distance
SELECT id, title, embedding <-> '[0.1, 0.2, 0.3, ...]' AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, 0.3, ...]'
LIMIT 5;

Output:

id  | title                | distance
----|---------------------|----------
42  | "Machine Learning"  | 0.234
17  | "Neural Networks"   | 0.456
89  | "Deep Learning"     | 0.678
23  | "AI Algorithms"     | 0.789
56  | "Data Science"      | 0.901

With Parameterized Queries

# Python example with parameter binding
query_vector = model.encode("machine learning tutorial")
query_str = "[" + ",".join(map(str, query_vector)) + "]"

results = db.execute("""
    SELECT id, title, embedding <-> ? AS distance
    FROM documents
    ORDER BY embedding <-> ?
    LIMIT 10
""", (query_str, query_str))

Complete Example: Article Search

-- Setup
CREATE TABLE articles (
    id INT PRIMARY KEY,
    title TEXT,
    author TEXT,
    content TEXT,
    embedding VECTOR(384)
);

CREATE INDEX articles_idx ON articles USING hnsw (embedding);

-- Insert sample data (embeddings pre-generated)
INSERT INTO articles VALUES
    (1, 'Intro to Python', 'Alice', '...', '[0.12, -0.34, ...]'),
    (2, 'Advanced ML', 'Bob', '...', '[0.45, 0.67, ...]'),
    (3, 'Database Design', 'Carol', '...', '[-0.23, 0.56, ...]');

-- Search: Find articles similar to "python programming"
-- (query_embedding generated from "python programming")
SELECT
    id,
    title,
    author,
    embedding <-> '[0.15, -0.30, ...]' AS distance
FROM articles
ORDER BY embedding <-> '[0.15, -0.30, ...]'
LIMIT 5;

Cosine Similarity with <=> Operator

The <=> operator calculates cosine distance (1 - cosine similarity).

-- Find 10 most similar documents using cosine distance
SELECT id, content, embedding <=> '[0.1, 0.2, ...]' AS cosine_distance
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;

Why Cosine for Text?

- Ignores vector magnitude (length), focuses on direction
- Most text embedding models produce normalized vectors
- Better semantic similarity for text than L2 distance

Converting Distance to Similarity Score

-- Cosine similarity = 1 - cosine distance
SELECT
    id,
    title,
    1 - (embedding <=> '[0.1, 0.2, ...]') AS similarity_score
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;

Output:

id  | title                | similarity_score
----|---------------------|------------------
42  | "Machine Learning"  | 0.987  (very similar)
17  | "Neural Networks"   | 0.923
89  | "Deep Learning"     | 0.876
23  | "AI Algorithms"     | 0.834
56  | "Data Science"      | 0.801

Practical Example: Semantic Q&A

-- FAQ database with questions and answers
CREATE TABLE faqs (
    faq_id INT PRIMARY KEY,
    question TEXT,
    answer TEXT,
    question_embedding VECTOR(384)
);

CREATE INDEX faqs_idx ON faqs USING hnsw (question_embedding);

-- Find most relevant FAQ for user query
-- Query: "How do I reset my password?"
-- (query_embedding pre-generated)

SELECT
    question,
    answer,
    (1 - (question_embedding <=> '[0.23, -0.45, ...]')) AS relevance
FROM faqs
ORDER BY question_embedding <=> '[0.23, -0.45, ...]'
LIMIT 3;

Output:

question                      | answer                     | relevance
------------------------------|----------------------------|----------
"Password reset instructions" | "Go to Settings > Reset"   | 0.95
"Forgot password recovery"    | "Click 'Forgot Password'"  | 0.89
"Change account credentials"  | "Visit Account Settings"   | 0.76

Inner Product with <#> Operator

The <#> operator calculates negative inner product (negative because smaller = better).

-- Find top matches using inner product
SELECT id, name, embedding <#> '[1.0, 0.5, 0.3, ...]' AS score
FROM items
ORDER BY embedding <#> '[1.0, 0.5, 0.3, ...]'
LIMIT 10;

When to Use Inner Product:

- Recommendation systems (user-item matching)
- Non-normalized vectors
- When magnitude matters (popularity, confidence scores)

Practical Example: Product Recommendations

-- User preference vectors
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    username TEXT,
    preference_vector VECTOR(128)
);

-- Product feature vectors
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name TEXT,
    price DECIMAL(10, 2),
    product_vector VECTOR(128)
);

CREATE INDEX products_idx ON products USING hnsw (product_vector);

-- Recommend products for user 42
-- (combining user preference with product features)
SELECT
    p.product_id,
    p.name,
    p.price,
    p.product_vector <#> u.preference_vector AS match_score
FROM products p, users u
WHERE u.user_id = 42
ORDER BY p.product_vector <#> u.preference_vector
LIMIT 20;

Combining with WHERE Filters

Combine vector similarity with traditional SQL filters for powerful hybrid search.

-- Find similar documents in specific category
SELECT id, title, embedding <=> '[0.1, ...]' AS distance
FROM documents
WHERE category = 'technology'  -- Pre-filter
  AND publish_date > '2024-01-01'
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 10;

Multi-Condition Filtering

-- Complex filtering + vector search
SELECT
    id,
    title,
    author,
    price,
    embedding <-> '[0.2, ...]' AS distance
FROM books
WHERE category IN ('science', 'technology')
  AND price < 50.00
  AND rating >= 4.0
  AND in_stock = TRUE
ORDER BY embedding <-> '[0.2, ...]'
LIMIT 5;

Distance Threshold Filtering

-- Only return results within distance threshold
SELECT id, title, embedding <=> '[0.1, ...]' AS distance
FROM documents
WHERE (embedding <=> '[0.1, ...]') < 0.5  -- Similarity threshold
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 100;

Practical Example: Location-Filtered Search

-- Restaurant search combining a location filter with description similarity
CREATE TABLE restaurants (
    restaurant_id INT PRIMARY KEY,
    name TEXT,
    cuisine TEXT,
    latitude FLOAT,
    longitude FLOAT,
    description_embedding VECTOR(384)
);

-- Find restaurants matching "cozy italian place"
-- within 5 miles of user location (40.7128, -74.0060)
SELECT
    name,
    cuisine,
    description_embedding <=> '[0.12, -0.34, ...]' AS relevance,
    -- crude flat-earth distance (~69 miles per degree; OK for small radii)
    SQRT(POW(latitude - 40.7128, 2) + POW(longitude - (-74.0060), 2)) * 69 AS miles
FROM restaurants
WHERE cuisine = 'Italian'
  AND (SQRT(POW(latitude - 40.7128, 2) + POW(longitude - (-74.0060), 2)) * 69) < 5
ORDER BY description_embedding <=> '[0.12, -0.34, ...]'
LIMIT 10;

Advanced Search Patterns

Hybrid Search: Keyword + Vector

-- Combine full-text search with vector similarity
SELECT
    id,
    title,
    embedding <=> '[0.1, ...]' AS vector_score,
    CASE
        WHEN title LIKE '%machine learning%' THEN 0.5
        ELSE 1.0
    END AS keyword_penalty
FROM documents
WHERE content LIKE '%neural network%'  -- Keyword filter
ORDER BY (embedding <=> '[0.1, ...]') * keyword_penalty
LIMIT 10;

Multi-Vector Search

-- Search across multiple vector types
SELECT
    product_id,
    name,
    text_embedding <=> '[0.1, ...]' AS text_score,
    image_embedding <-> '[0.2, ...]' AS image_score,
    (text_embedding <=> '[0.1, ...]') +
    (image_embedding <-> '[0.2, ...]') AS combined_score
FROM products
ORDER BY combined_score
LIMIT 10;

Weighted Multi-Vector Search

-- Weight different embedding types
SELECT
    product_id,
    name,
    (0.7 * (text_embedding <=> '[0.1, ...]')) +
    (0.3 * (image_embedding <-> '[0.2, ...]')) AS weighted_score
FROM products
ORDER BY weighted_score
LIMIT 10;
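One caveat when mixing operators like this: cosine distances live in [0, 2] while L2 distances are unbounded, so raw sums favor whichever metric has the larger scale. A common fix (an application-side technique, not a HeliosDB-Lite feature) is to min-max normalize each score list before weighting:

```python
# Min-max normalize each score list to [0, 1], then apply the 0.7/0.3 weights.
def min_max(scores):
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

text_scores = [0.12, 0.40, 0.25]   # cosine distances, bounded [0, 2]
image_scores = [3.1, 1.2, 2.0]     # L2 distances, unbounded

combined = [0.7 * t + 0.3 * i
            for t, i in zip(min_max(text_scores), min_max(image_scores))]
# Smallest combined distance = best match
print(min(range(len(combined)), key=combined.__getitem__))  # 0
```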

7. Product Quantization

What is Product Quantization?

Product Quantization (PQ) is a compression technique that reduces vector storage by two to three orders of magnitude (e.g., 384x for 768-dim vectors) with minimal accuracy loss (typically <5%).

How PQ Works

Original Vector (768 dimensions):
[0.123, -0.456, 0.789, ..., 0.234]  → 768 × 4 bytes = 3,072 bytes

Product Quantization:
1. Split into M=8 sub-vectors (96 dimensions each)
2. Map each sub-vector to nearest centroid (1 byte code)
3. Store 8 codes instead of 768 floats

Quantized: [42, 17, 89, 123, 5, 67, 201, 34]  → 8 bytes

Compression: 3,072 bytes → 8 bytes = 384x reduction

Key Concepts:

- M (sub-quantizers): Number of chunks (typical: 8-64)
- K (centroids): Codebook size per chunk (typical: 256 for 1-byte codes)
- Codebook: Learned lookup table mapping codes to vectors
- ADC (Asymmetric Distance Computation): Fast approximate distance
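The encode/decode round trip from the steps above can be sketched with hand-built codebooks. Real systems learn the codebooks with k-means over training vectors; this toy skips training and uses a tiny M=2, K=2 setup:

```python
def pq_encode(vec, codebooks):
    """Split vec into M sub-vectors; store each as the index of its
    nearest codebook centroid (by squared L2 distance)."""
    m = len(codebooks)
    sub_len = len(vec) // m
    codes = []
    for i, book in enumerate(codebooks):
        sub = vec[i * sub_len:(i + 1) * sub_len]
        codes.append(min(range(len(book)),
                         key=lambda k: sum((a - b) ** 2
                                           for a, b in zip(sub, book[k]))))
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector by concatenating centroids."""
    out = []
    for code, book in zip(codes, codebooks):
        out.extend(book[code])
    return out

# 4-dim vector, M=2 sub-quantizers, K=2 centroids each
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # codebook for dims 0-1
    [[0.0, 1.0], [1.0, 0.0]],   # codebook for dims 2-3
]
print(pq_encode([0.9, 1.1, 0.1, 0.8], codebooks))  # [1, 0]
```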

Memory Optimization Benefits

| Vector Dim | Original Size | PQ Size (M=8) | Compression | 1M Vectors |
|------------|---------------|---------------|-------------|------------|
| 384 | 1,536 bytes | 8 bytes | 192x | 1.5 GB → 8 MB |
| 768 | 3,072 bytes | 8 bytes | 384x | 3 GB → 8 MB |
| 1536 | 6,144 bytes | 8 bytes | 768x | 6 GB → 8 MB |

When to Use PQ:

- Large datasets (>100K vectors) with memory constraints
- Edge deployments (IoT, mobile)
- Cost optimization (smaller instances)
- Can tolerate 2-5% accuracy loss

Creating PQ Indexes

Basic PQ Index

-- Create HNSW index with Product Quantization
CREATE INDEX products_idx
ON products(embedding)
USING hnsw
WITH (quantization='product');

With Custom Sub-Quantizers

-- Control compression ratio via sub-quantizers
-- More sub-quantizers = better accuracy, less compression

-- High compression (M=8, dimension must be divisible by 8)
CREATE INDEX idx_compress
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_subquantizers=8);

-- Balanced (M=16, dimension must be divisible by 16)
CREATE INDEX idx_balanced
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_subquantizers=16);

-- High accuracy (M=32, dimension must be divisible by 32)
CREATE INDEX idx_accurate
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_subquantizers=32);

With Custom Centroids

-- Control codebook size (K parameter)
-- More centroids = better accuracy, larger codebook

-- Standard (K=256, 1-byte codes)
CREATE INDEX idx
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_centroids=256);

-- Smaller codebook (K=128, faster but less accurate)
CREATE INDEX idx
ON docs(embedding)
USING hnsw
WITH (quantization='product', pq_centroids=128);

PQ Configuration Guidelines

Choosing M (Sub-Quantizers)

| Dimension | Recommended M | Compression | Accuracy Loss |
|-----------|---------------|-------------|---------------|
| 128 | 8 | 64x | <3% |
| 256 | 8 | 128x | <3% |
| 384 | 8 | 192x | <5% |
| 768 | 8-16 | 384x-192x | <5% |
| 1536 | 16-32 | 384x-192x | <5% |

Rules:

- Dimension must be divisible by M
- Start with M=8 for most cases
- Increase M if accuracy is critical
- Lower M for maximum compression
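The compression figures in the table follow directly from the definition: with K=256, each sub-vector collapses to a 1-byte code, so the compressed size is M bytes.

```python
def pq_compression(dim, m):
    """Compression ratio for float32 vectors stored as M 1-byte PQ codes."""
    assert dim % m == 0, "dimension must be divisible by M"
    return (dim * 4) / m

print(pq_compression(768, 8))    # 384.0
print(pq_compression(1536, 16))  # 384.0
```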

Training Requirements

PQ requires training data to build codebooks:

-- Minimum vectors needed for training
-- Rule of thumb: 10,000+ vectors for quality codebooks

-- Small dataset (<10K): PQ may not be worth it
-- Medium dataset (10K-100K): PQ beneficial, use M=8
-- Large dataset (>100K): PQ highly recommended, M=8-16

Trade-offs

| Aspect | Without PQ | With PQ (M=8, K=256) |
|--------|------------|----------------------|
| Memory | 3,072 bytes (768-dim) | 8 bytes |
| Compression | 1x | 384x |
| Query Speed | Baseline | 1.2-1.5x faster (less memory bandwidth) |
| Accuracy | 100% | 95-98% (Recall@10) |
| Index Build | Faster | Slower (codebook training) |
| Disk Storage | Larger | Much smaller |

Complete PQ Example

-- Scenario: 10 million documents, 768-dim embeddings
-- Without PQ: 10M × 3KB = 30 GB RAM
-- With PQ (M=8): 10M × 8 bytes = 80 MB RAM (375x reduction)

CREATE TABLE documents (
    doc_id INT PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding VECTOR(768)
);

-- Create PQ-compressed HNSW index
CREATE INDEX docs_pq_idx
ON documents(embedding)
USING hnsw
WITH (
    quantization='product',
    pq_subquantizers=8,
    pq_centroids=256
);

-- Queries work identically (compression is transparent)
SELECT
    doc_id,
    title,
    embedding <=> '[0.1, 0.2, ...]' AS distance
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;

Monitoring PQ Performance

-- Check index configuration and memory usage
SELECT
    index_name,
    dimensions,
    quantization_type,
    memory_bytes,
    compression_ratio
FROM pg_vector_index_stats()
WHERE index_name = 'docs_pq_idx';

Expected Output:

index_name    | dimensions | quantization_type | memory_bytes | compression_ratio
--------------|------------|------------------|--------------|------------------
docs_pq_idx   | 768        | product          | 83886080     | 384.0
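The Recall@10 figures quoted for PQ can be verified on your own data by running each benchmark query twice — once as an exact scan and once through the PQ index — and comparing the returned IDs. A minimal sketch of the metric itself:

```python
def recall_at_k(exact_ids, approx_ids, k=10):
    """Fraction of the true top-k neighbors that the compressed index returned.

    exact_ids:  IDs from a brute-force (unindexed) top-k scan.
    approx_ids: IDs from the PQ-compressed HNSW index for the same query.
    """
    exact, approx = set(exact_ids[:k]), set(approx_ids[:k])
    return len(exact & approx) / k

# Example: the PQ index returned 9 of the 10 true neighbors.
exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 42]
print(recall_at_k(exact, approx))  # 0.9
```

Averaging this over a few hundred representative queries gives a realistic accuracy estimate for your chosen M and K.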


8. Real-World Examples

Example 1: Semantic Document Search

Build a semantic search engine for a knowledge base with 100K articles.

Schema Design

CREATE TABLE knowledge_base (
    article_id INT PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    category TEXT,
    author TEXT,
    publish_date DATE,
    view_count INT DEFAULT 0,

    -- 384-dim embedding from all-MiniLM-L6-v2
    title_embedding VECTOR(384),
    content_embedding VECTOR(768)  -- all-mpnet-base-v2 for content
);

-- Create indexes
CREATE INDEX kb_title_idx ON knowledge_base USING hnsw (title_embedding);
CREATE INDEX kb_content_idx ON knowledge_base USING hnsw (content_embedding);

-- Traditional indexes for filtering
CREATE INDEX kb_category_idx ON knowledge_base(category);
CREATE INDEX kb_date_idx ON knowledge_base(publish_date);

Indexing Pipeline (Python)

from sentence_transformers import SentenceTransformer
import heliosdb

# Load models
title_model = SentenceTransformer('all-MiniLM-L6-v2')
content_model = SentenceTransformer('all-mpnet-base-v2')

# Connect to database
db = heliosdb.connect("knowledge.db")

# Sample articles
articles = [
    {
        "id": 1,
        "title": "Introduction to Machine Learning",
        "content": "Machine learning is a subset of artificial intelligence...",
        "category": "AI",
        "author": "Alice Smith",
        "publish_date": "2024-01-15"
    },
    # ... more articles
]

# Generate embeddings in batches
batch_size = 32
for i in range(0, len(articles), batch_size):
    batch = articles[i:i+batch_size]

    # Generate embeddings
    titles = [a["title"] for a in batch]
    contents = [a["content"][:512] for a in batch]  # Truncate long content

    title_embeddings = title_model.encode(titles)
    content_embeddings = content_model.encode(contents)

    # Insert into database
    for article, title_emb, content_emb in zip(batch, title_embeddings, content_embeddings):
        title_str = "[" + ",".join(map(str, title_emb)) + "]"
        content_str = "[" + ",".join(map(str, content_emb)) + "]"

        db.execute("""
            INSERT INTO knowledge_base
            (article_id, title, content, category, author, publish_date,
             title_embedding, content_embedding)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?)
        """, (
            article["id"], article["title"], article["content"],
            article["category"], article["author"], article["publish_date"],
            title_str, content_str
        ))

    db.commit()
    print(f"Indexed {min(i+batch_size, len(articles))} / {len(articles)} articles")

Search Implementation

def search_knowledge_base(query, category=None, top_k=10):
    """
    Semantic search with optional category filter.
    """
    # Generate query embedding
    query_embedding = content_model.encode(query)
    query_str = "[" + ",".join(map(str, query_embedding)) + "]"

    # Build SQL query
    sql = """
        SELECT
            article_id,
            title,
            category,
            author,
            publish_date,
            content_embedding <=> ? AS relevance_score
        FROM knowledge_base
    """

    params = [query_str]

    # Add category filter if specified
    if category:
        sql += " WHERE category = ?"
        params.append(category)

    sql += """
        ORDER BY content_embedding <=> ?
        LIMIT ?
    """
    params.extend([query_str, top_k])

    # Execute query
    results = db.execute(sql, params).fetchall()

    return [
        {
            "article_id": r[0],
            "title": r[1],
            "category": r[2],
            "author": r[3],
            "publish_date": r[4],
            "relevance": 1 - r[5]  # Convert distance to similarity
        }
        for r in results
    ]

# Example usage
results = search_knowledge_base(
    query="How does neural network training work?",
    category="AI",
    top_k=5
)

for r in results:
    print(f"{r['title']} (relevance: {r['relevance']:.3f})")

Output:

Introduction to Neural Networks (relevance: 0.923)
Backpropagation Algorithm Explained (relevance: 0.887)
Deep Learning Training Techniques (relevance: 0.845)
Gradient Descent Optimization (relevance: 0.812)
Neural Network Architectures (relevance: 0.798)

Example 2: Product Recommendations

Build a recommendation engine for e-commerce using collaborative filtering.

Schema Design

-- User preference vectors (learned from purchase history)
CREATE TABLE user_preferences (
    user_id INT PRIMARY KEY,
    username TEXT,
    preference_vector VECTOR(128),
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Product feature vectors
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    category TEXT,
    price DECIMAL(10, 2),
    in_stock BOOLEAN DEFAULT TRUE,
    popularity_score FLOAT DEFAULT 0.0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- Combined feature vector (category + attributes + user preferences)
    feature_vector VECTOR(128)
);

-- Create HNSW index for fast recommendations
CREATE INDEX products_feature_idx
ON products(feature_vector)
USING hnsw
WITH (quantization='product', pq_subquantizers=8);

-- Purchase history for training
CREATE TABLE purchases (
    purchase_id INT PRIMARY KEY,
    user_id INT,
    product_id INT,
    purchase_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    rating INT,  -- 1-5 stars
    FOREIGN KEY (user_id) REFERENCES user_preferences(user_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

Recommendation Query

-- Get personalized recommendations for user 42
-- Combine vector similarity with business rules

SELECT
    p.product_id,
    p.name,
    p.price,
    p.category,
    p.popularity_score,

    -- Vector similarity score
    p.feature_vector <#> u.preference_vector AS match_score,

    -- Combined ranking score
    (
        -- 70% vector similarity
        (p.feature_vector <#> u.preference_vector) * 0.7 +

        -- 20% popularity
        (p.popularity_score * 0.2) +

        -- 10% recency bonus (newly added products)
        CASE
            WHEN p.created_at > CURRENT_DATE - INTERVAL '7 days' THEN 0.1
            ELSE 0.0
        END
    ) AS final_score

FROM products p, user_preferences u
WHERE u.user_id = 42
  AND p.in_stock = TRUE
  AND p.price > 0

  -- Exclude already purchased items
  AND p.product_id NOT IN (
      SELECT product_id
      FROM purchases
      WHERE user_id = 42
  )

ORDER BY final_score DESC
LIMIT 20;

Python Implementation

def get_recommendations(user_id, max_price=None, category=None, limit=20):
    """
    Get personalized product recommendations.
    """
    sql = """
        SELECT
            p.product_id,
            p.name,
            p.price,
            p.category,
            p.feature_vector <#> u.preference_vector AS score
        FROM products p, user_preferences u
        WHERE u.user_id = ?
          AND p.in_stock = TRUE
          AND p.product_id NOT IN (
              SELECT product_id FROM purchases WHERE user_id = ?
          )
    """

    params = [user_id, user_id]

    if max_price is not None:
        sql += " AND p.price <= ?"
        params.append(max_price)

    if category:
        sql += " AND p.category = ?"
        params.append(category)

    sql += " ORDER BY score DESC LIMIT ?"
    params.append(limit)

    results = db.execute(sql, params).fetchall()

    return [
        {
            "product_id": r[0],
            "name": r[1],
            "price": float(r[2]),
            "category": r[3],
            "score": float(r[4])
        }
        for r in results
    ]

# Example usage
recommendations = get_recommendations(
    user_id=42,
    max_price=100.00,
    category="Electronics",
    limit=10
)

for rec in recommendations:
    print(f"{rec['name']} - ${rec['price']:.2f} (score: {rec['score']:.3f})")
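The schema describes `preference_vector` as learned from purchase history. One simple, assumed approach is a rating-weighted mean of the purchased products' feature vectors; a sketch of that aggregation (`rating_weighted_mean` is an illustrative name):

```python
def rating_weighted_mean(feature_vectors, ratings):
    """Combine purchased products' feature vectors into one preference
    vector, weighting each product by its 1-5 star rating."""
    dim = len(feature_vectors[0])
    total = sum(ratings)
    pref = [0.0] * dim
    for vec, rating in zip(feature_vectors, ratings):
        for i, x in enumerate(vec):
            pref[i] += x * (rating / total)
    return pref

# Two purchases: a 5-star and a 1-star product.
vectors = [[1.0, 0.0], [0.0, 1.0]]
ratings = [5, 1]
print(rating_weighted_mean(vectors, ratings))  # ≈ [0.833, 0.167]
```

The result would be written back to `user_preferences.preference_vector` (and `last_updated`) after each purchase or on a periodic batch job.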

Example 3: RAG Pipeline for LLM

Build a Retrieval Augmented Generation system for answering questions with LLM + context.

Schema Design

-- Document chunks for retrieval
CREATE TABLE document_chunks (
    chunk_id INT PRIMARY KEY,
    document_id INT,
    document_title TEXT,
    chunk_text TEXT NOT NULL,
    chunk_index INT,  -- Position in document

    -- Embedding for semantic retrieval
    embedding VECTOR(1536),  -- OpenAI ada-002

    -- Metadata for filtering
    source TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create HNSW index
CREATE INDEX chunks_embedding_idx
ON document_chunks(embedding)
USING hnsw;
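The schema assumes documents have already been split into chunks. A minimal character-based splitter with overlap (a real pipeline would more likely split on tokens or sentence boundaries):

```python
def chunk_document(text: str, chunk_size: int = 1000, overlap: int = 200):
    """Split a document into overlapping character chunks.

    Overlap keeps context that straddles a chunk boundary retrievable
    from either neighboring chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

chunks = chunk_document("a" * 2500, chunk_size=1000, overlap=200)
print(len(chunks))  # 4 (starts at offsets 0, 800, 1600, 2400)
```

Each chunk would then be embedded and inserted with its `chunk_index` so answers can cite their position in the source document.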

RAG Implementation (Python)

import openai
from typing import List, Dict

# Initialize OpenAI
openai.api_key = "your-api-key"

def embed_query(query: str) -> List[float]:
    """Generate embedding for query."""
    response = openai.Embedding.create(
        input=query,
        model="text-embedding-ada-002"
    )
    return response['data'][0]['embedding']

def retrieve_context(query: str, top_k: int = 5) -> List[Dict]:
    """Retrieve relevant document chunks."""
    # Generate query embedding
    query_emb = embed_query(query)
    query_str = "[" + ",".join(map(str, query_emb)) + "]"

    # Search for similar chunks
    results = db.execute("""
        SELECT
            chunk_id,
            document_title,
            chunk_text,
            embedding <=> ? AS relevance
        FROM document_chunks
        ORDER BY embedding <=> ?
        LIMIT ?
    """, (query_str, query_str, top_k)).fetchall()

    return [
        {
            "chunk_id": r[0],
            "title": r[1],
            "text": r[2],
            "relevance": 1 - r[3]
        }
        for r in results
    ]

def generate_answer(query: str, context_chunks: List[Dict]) -> str:
    """Generate answer using LLM with retrieved context."""
    # Build context string
    context = "\n\n".join([
        f"[Document: {c['title']}]\n{c['text']}"
        for c in context_chunks
    ])

    # Create prompt
    prompt = f"""Answer the question based on the following context:

Context:
{context}

Question: {query}

Answer:"""

    # Generate answer
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that answers questions based on provided context."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_tokens=500
    )

    return response['choices'][0]['message']['content']

def rag_query(query: str, top_k: int = 5) -> Dict:
    """Complete RAG pipeline: retrieve + generate."""
    # Retrieve relevant context
    context_chunks = retrieve_context(query, top_k)

    # Generate answer
    answer = generate_answer(query, context_chunks)

    return {
        "query": query,
        "answer": answer,
        "sources": [
            {"title": c["title"], "relevance": c["relevance"]}
            for c in context_chunks
        ]
    }

# Example usage
result = rag_query("How does HNSW indexing work?")

print(f"Query: {result['query']}")
print(f"Answer: {result['answer']}")
print("\nSources:")
for source in result['sources']:
    print(f"  - {source['title']} (relevance: {source['relevance']:.3f})")

9. Performance Tuning

Index Optimization

Build-Time vs Query-Time Tradeoffs

-- Fast build, lower recall (~90%)
CREATE INDEX fast_idx ON docs(embedding)
USING hnsw;  -- Default: m=16, ef_construction=200

-- Slow build, higher recall (~98%)
-- (If future version supports parameters)
-- CREATE INDEX accurate_idx ON docs(embedding)
-- USING hnsw WITH (m=32, ef_construction=400);

When to Rebuild Indexes

Rebuild indexes when:

  1. Dataset grows >2x since index creation
  2. Significant data distribution changes occur
  3. Query accuracy degrades over time

-- Rebuild index
DROP INDEX docs_embedding_idx;
CREATE INDEX docs_embedding_idx ON docs(embedding) USING hnsw;
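The growth rule can be automated by recording `vector_count` when the index is built and comparing it against the current `pg_vector_index_stats()` value. A sketch, assuming the `db` connection object used elsewhere in this tutorial (`maybe_rebuild` is an illustrative name):

```python
def maybe_rebuild(db, index_name, table, column, baseline_count):
    """Drop and recreate an HNSW index when the dataset has more than
    doubled since the recorded baseline (rule 1 above).

    Returns the vector count to record as the new baseline.
    """
    row = db.execute(
        "SELECT vector_count FROM pg_vector_index_stats() WHERE index_name = ?",
        (index_name,),
    ).fetchone()
    current = row[0]
    if current > 2 * baseline_count:
        db.execute(f"DROP INDEX {index_name}")
        db.execute(f"CREATE INDEX {index_name} ON {table}({column}) USING hnsw")
        return current  # new baseline for the next check
    return baseline_count
```

After each bulk load, call `maybe_rebuild(db, 'docs_embedding_idx', 'docs', 'embedding', baseline)` and persist the returned value as the new baseline.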

Batch Operations

Efficient Bulk Insertion

# BAD: Individual inserts (slow)
for doc in documents:
    db.execute("INSERT INTO docs VALUES (?, ?)", (doc.id, doc.embedding))
    db.commit()  # Commit each row

# GOOD: Batch inserts
db.execute("BEGIN TRANSACTION")
for doc in documents:
    db.execute("INSERT INTO docs VALUES (?, ?)", (doc.id, doc.embedding))
db.execute("COMMIT")  # Commit once

# BETTER: Prepared statement with executemany
values = [(doc.id, doc.embedding_str) for doc in documents]
db.executemany("INSERT INTO docs VALUES (?, ?)", values)
db.commit()

Optimal Batch Sizes

Operation            | Recommended Batch Size | Reason
---------------------|------------------------|---------------------------
Embedding generation | 32-64                  | GPU memory limits
Database inserts     | 1000-10000             | Transaction overhead
Vector searches      | 1-10                   | Limited query parallelism
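A small helper makes it easy to apply a different batch size at each stage (`batched` is an illustrative name; Python 3.12 ships an equivalent `itertools.batched`):

```python
def batched(items, batch_size):
    """Yield successive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Embed in GPU-friendly batches; inserts would use larger slices.
texts = [f"doc {i}" for i in range(100)]
embed_batches = list(batched(texts, 32))
print([len(b) for b in embed_batches])  # [32, 32, 32, 4]
```

The same helper works for the insert stage with `batch_size=1000` or more, committing once per slice.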

Query Optimization

Use LIMIT Always

-- BAD: No limit (retrieves all results)
SELECT * FROM docs
ORDER BY embedding <=> '[0.1, ...]';

-- GOOD: Limit results
SELECT * FROM docs
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 10;

Pre-Filter with WHERE

-- BAD: Filter after vector search
SELECT * FROM docs
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 10;
-- Then filter by category in the application (wasteful)

-- GOOD: Filter before vector search
SELECT * FROM docs
WHERE category = 'technology'
ORDER BY embedding <=> '[0.1, ...]'
LIMIT 10;

Avoid Computing Distance Twice

-- BAD: Compute distance twice
SELECT
    id,
    title,
    embedding <=> '[0.1, ...]' AS score
FROM docs
ORDER BY embedding <=> '[0.1, ...]'  -- Computed again
LIMIT 10;

-- GOOD: Reference distance column
SELECT
    id,
    title,
    embedding <=> '[0.1, ...]' AS score
FROM docs
ORDER BY score  -- Use computed column
LIMIT 10;

Memory Management

Monitor Memory Usage

-- Check index memory consumption
SELECT
    index_name,
    dimensions,
    vector_count,
    memory_bytes / 1024 / 1024 AS memory_mb
FROM pg_vector_index_stats();

Memory-Saving Strategies

  1. Use Product Quantization

    CREATE INDEX idx ON docs(embedding)
    USING hnsw WITH (quantization='product');
    

  2. Lower dimensions

    # Use smaller embedding model
    model = SentenceTransformer('all-MiniLM-L6-v2')  # 384-dim
    # Instead of 'all-mpnet-base-v2' (768-dim)
    

  3. Partition large tables

    -- Split by date for time-series data
    CREATE TABLE docs_2024_q1 (embedding VECTOR(384), ...);
    CREATE TABLE docs_2024_q2 (embedding VECTOR(384), ...);
    

Hardware Optimization

CPU Considerations

  • SIMD Support: Ensure AVX2 (x86) or NEON (ARM) support is enabled

    # Check CPU features
    lscpu | grep -i avx2   # x86_64
    lscpu | grep -i asimd  # ARM64 (NEON is listed as "asimd")
    

  • Core Count: More cores = faster parallel searches
    • 4 cores: Good for small datasets
    • 8+ cores: Better for concurrent queries

Memory Considerations

Dataset Size | Minimum RAM | Recommended RAM
-------------|-------------|----------------
10K vectors  | 512 MB      | 1 GB
100K vectors | 2 GB        | 4 GB
1M vectors   | 8 GB        | 16 GB
10M vectors  | 32 GB       | 64 GB

Storage Considerations

  • SSD vs HDD: SSD strongly recommended
    • SSD: <10ms latency for index loads
    • HDD: 100-1000ms latency (100x slower)

  • Disk Space:
    • Without PQ: 4 bytes per dimension (~3 KB per 768-dim vector)
    • With PQ: M bytes per vector (e.g., 8 bytes with M=8), plus graph overhead
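Capacity planning for both RAM and disk follows directly from dimension and quantization settings. A rough estimator, assuming 4-byte floats without PQ and one byte per sub-quantizer code with PQ (HNSW graph links add overhead not counted here):

```python
def estimate_vector_storage(n_vectors, dimension, pq_subquantizers=None):
    """Rough bytes needed for raw vector data (graph links excluded).

    Without PQ each vector costs 4 bytes per dimension; with PQ it
    costs one byte per sub-quantizer code.
    """
    per_vector = pq_subquantizers if pq_subquantizers else dimension * 4
    return n_vectors * per_vector

# 10M 768-dim vectors: ~30 GB raw, ~80 MB as M=8 PQ codes.
print(estimate_vector_storage(10_000_000, 768) / 1024**3)     # ≈ 28.6 (GiB)
print(estimate_vector_storage(10_000_000, 768, 8) / 1024**2)  # ≈ 76.3 (MiB)
```

Compare the estimate against the RAM table above when sizing hardware.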

10. Monitoring

pg_vector_index_stats Function

Query index statistics and health metrics.

Basic Usage

-- Get all vector index stats
SELECT * FROM pg_vector_index_stats();

Output Columns:

  • index_name: Name of the index
  • table_name: Table the index belongs to
  • dimensions: Vector dimensionality
  • vector_count: Number of vectors indexed
  • memory_bytes: Memory consumed by index
  • quantization_type: Compression type (none, product)
  • distance_metric: Distance function used

Example Output

index_name          | table_name | dimensions | vector_count | memory_bytes | quantization_type | distance_metric
--------------------|------------|------------|--------------|--------------|-------------------|----------------
docs_embedding_idx  | documents  | 384        | 50000        | 8388608      | none              | cosine
products_vec_idx    | products   | 768        | 100000       | 8388608      | product           | l2

Monitoring Queries

Index Size by Table

SELECT
    table_name,
    index_name,
    dimensions,
    vector_count,
    memory_bytes / 1024 / 1024 AS memory_mb,
    (memory_bytes / NULLIF(vector_count, 0)) AS bytes_per_vector
FROM pg_vector_index_stats()
ORDER BY memory_bytes DESC;

Compression Effectiveness

SELECT
    index_name,
    quantization_type,
    dimensions,
    memory_bytes / 1024 / 1024 AS memory_mb,

    -- Compression ratio (assumes M=8 sub-quantizers, i.e. 8-byte codes)
    CASE
        WHEN quantization_type = 'product' THEN
            (dimensions * 4.0) / 8.0  -- 4 bytes per float -> 8 bytes PQ
        ELSE
            1.0
    END AS compression_ratio
FROM pg_vector_index_stats();

Identify Large Indexes

-- Find indexes using >1 GB
SELECT
    index_name,
    table_name,
    memory_bytes / 1024 / 1024 / 1024 AS memory_gb
FROM pg_vector_index_stats()
WHERE memory_bytes > 1073741824  -- 1 GB
ORDER BY memory_bytes DESC;

Performance Metrics

Query Latency Tracking

import time

def measure_query_latency(query_embedding, top_k=10, iterations=100):
    """Measure average query latency."""
    query_str = "[" + ",".join(map(str, query_embedding)) + "]"

    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()

        results = db.execute("""
            SELECT id, embedding <-> ? AS dist
            FROM documents
            ORDER BY embedding <-> ?
            LIMIT ?
        """, (query_str, query_str, top_k)).fetchall()

        end = time.perf_counter()
        latencies.append((end - start) * 1000)  # Convert to ms

    avg_latency = sum(latencies) / len(latencies)
    p50 = sorted(latencies)[len(latencies) // 2]
    p99 = sorted(latencies)[int(len(latencies) * 0.99)]

    return {
        "avg_ms": avg_latency,
        "p50_ms": p50,
        "p99_ms": p99,
        "min_ms": min(latencies),
        "max_ms": max(latencies)
    }

# Example usage
metrics = measure_query_latency(query_embedding, top_k=10)
print(f"Average latency: {metrics['avg_ms']:.2f} ms")
print(f"P50 latency: {metrics['p50_ms']:.2f} ms")
print(f"P99 latency: {metrics['p99_ms']:.2f} ms")

Throughput Measurement

import concurrent.futures
import time

def measure_throughput(query_embeddings, top_k=10, workers=4):
    """Measure queries per second with parallel execution."""

    def run_query(query_emb):
        query_str = "[" + ",".join(map(str, query_emb)) + "]"
        results = db.execute("""
            SELECT id FROM documents
            ORDER BY embedding <-> ?
            LIMIT ?
        """, (query_str, top_k)).fetchall()
        return len(results)

    start = time.perf_counter()

    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(run_query, query_embeddings))

    end = time.perf_counter()
    elapsed = end - start
    qps = len(query_embeddings) / elapsed

    return {
        "total_queries": len(query_embeddings),
        "elapsed_seconds": elapsed,
        "queries_per_second": qps,
        "workers": workers
    }

# Example usage
throughput = measure_throughput(query_embeddings, workers=8)
print(f"Throughput: {throughput['queries_per_second']:.1f} QPS")

Health Checks

Index Integrity Check

-- Verify index exists and has expected vectors
SELECT
    index_name,
    vector_count,
    CASE
        WHEN vector_count > 0 THEN 'OK'
        ELSE 'EMPTY'
    END AS status
FROM pg_vector_index_stats();

Dimension Mismatch Detection

-- Check for dimension inconsistencies
-- (This would be a custom query checking actual data)
SELECT
    'documents' AS table_name,
    COUNT(*) AS total_rows,
    COUNT(embedding) AS non_null_embeddings
FROM documents;

Alerting Thresholds

Recommended alert thresholds:

Metric              | Warning        | Critical
--------------------|----------------|----------------
Query latency (p99) | >50ms          | >200ms
Index memory usage  | >80% available | >95% available
Query error rate    | >1%            | >5%
QPS throughput      | <100 QPS       | <10 QPS
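These thresholds can be wired into a periodic check — the metric names and structure below are illustrative, not a HeliosDB-Lite API:

```python
# Alert thresholds from the table above: higher_is_bad marks metrics
# that alert when they rise (latency, memory, errors); throughput
# alerts when it falls.
THRESHOLDS = {
    "p99_latency_ms":  {"warning": 50,  "critical": 200, "higher_is_bad": True},
    "memory_used_pct": {"warning": 80,  "critical": 95,  "higher_is_bad": True},
    "error_rate_pct":  {"warning": 1,   "critical": 5,   "higher_is_bad": True},
    "throughput_qps":  {"warning": 100, "critical": 10,  "higher_is_bad": False},
}

def check_metrics(metrics):
    """Return {metric: 'ok' | 'warning' | 'critical'} per measurement."""
    status = {}
    for name, value in metrics.items():
        t = THRESHOLDS[name]
        if t["higher_is_bad"]:
            crit, warn = value > t["critical"], value > t["warning"]
        else:
            crit, warn = value < t["critical"], value < t["warning"]
        status[name] = "critical" if crit else "warning" if warn else "ok"
    return status

print(check_metrics({"p99_latency_ms": 75, "throughput_qps": 250}))
# {'p99_latency_ms': 'warning', 'throughput_qps': 'ok'}
```

Run it on the latency and throughput numbers collected by the measurement functions above and page on any 'critical' result.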

Summary

This tutorial covered:

  1. Introduction: What vector search is and why it matters
  2. Vector Basics: Embeddings, dimensions, and distance metrics
  3. Creating Tables: VECTOR type and schema design
  4. Generating Embeddings: OpenAI, sentence-transformers, and batch processing
  5. HNSW Indexing: Graph-based approximate search with configurable parameters
  6. Similarity Search: KNN queries with L2, cosine, and inner product operators
  7. Product Quantization: 8-384x memory compression for large-scale deployments
  8. Real-World Examples: Document search, recommendations, and RAG pipelines
  9. Performance Tuning: Index optimization, batching, and hardware considerations
  10. Monitoring: pg_vector_index_stats, latency tracking, and health checks

Next Steps

  • Try the examples: Implement a semantic search system with your own data
  • Experiment with parameters: Test different dimensions and distance metrics
  • Optimize for your use case: Profile queries and tune indexes
  • Explore advanced features: Multi-modal search, hybrid ranking, custom embeddings

Additional Resources

Troubleshooting

Common Issues:

  1. Dimension mismatch errors
     • Ensure embedding dimension matches the VECTOR(n) declaration
     • Verify all embeddings come from the same model

  2. Slow queries
     • Add an HNSW index if missing
     • Use LIMIT to cap results
     • Consider Product Quantization for large datasets

  3. Memory issues
     • Enable Product Quantization (up to 384x compression)
     • Use smaller embedding dimensions
     • Partition large tables

  4. Low accuracy
     • Increase ef_construction (if configurable)
     • Use a higher m parameter (more connections)
     • Disable PQ or increase sub-quantizers

Document Version: 1.0 Last Updated: 2025-12-01 Feedback: Please report issues or suggestions via GitHub issues