Skip to content

Self-Maintaining Filter Index (SMFI) User Guide

Quick Start

SMFI is enabled by default and requires no configuration. Your queries automatically benefit from storage-level filtering.

-- These queries automatically use SMFI
SELECT * FROM orders WHERE customer_id = 123;        -- Uses Bloom filter
SELECT * FROM events WHERE timestamp > '2025-01-01'; -- Uses Zone map
SELECT * FROM products WHERE price BETWEEN 10 AND 50; -- Uses Zone map

What SMFI Does For You

1. Faster Queries (Automatic)

Every query with a WHERE clause is automatically optimized:

Query Type Optimization Typical Speedup
Equality (=) Bloom filter 10-20x
Range (>, <, BETWEEN) Zone map 5-10x
Combined Both 20-50x

2. Zero Configuration

Unlike traditional indexes: - No CREATE INDEX required - No REINDEX maintenance - No memory tuning needed

3. Zero Memory Overhead

All filter structures live on disk: - Loaded only when needed - Shared across all connections - Automatically cleaned up

Verifying SMFI Is Working

Check System Status

-- Overall SMFI status
SELECT * FROM pg_smfi_status();

-- Per-table statistics
SELECT table_name, bloom_checks, bloom_hits, blocks_pruned
FROM pg_smfi_table_stats();

Use EXPLAIN

EXPLAIN SELECT * FROM orders WHERE customer_id = 123;

Look for these indicators:

→ Bloom Filter Check on orders.customer_id
→ Zone Map Prune: 847/1000 blocks skipped
→ Parallel SIMD Filter: 8 workers

Check Query Patterns

-- See what patterns SMFI has learned
SELECT table_name, column_name, query_frequency, selectivity
FROM pg_speculative_filters()
ORDER BY query_frequency DESC;

Common Settings

Adjusting CPU Usage

By default, background maintenance uses max 15% CPU:

-- Allow more CPU for faster consolidation
SET smfi_max_cpu_percent = 25;

-- Reduce impact on production workload
SET smfi_max_cpu_percent = 10;

Controlling Parallel Execution

-- Disable parallel filtering (single-threaded)
SET smfi_parallel_enabled = off;

-- Set maximum workers
SET smfi_max_workers = 4;

-- Minimum rows before parallelizing
SET smfi_parallel_threshold = 50000;

Tuning Speculative Filters

-- Create filters faster (lower threshold)
SET smfi_min_query_frequency = 5;

-- Keep unused filters longer
SET smfi_drop_after_days = 14;

-- Allow more auto-created filters per table
SET smfi_max_filters_per_table = 20;

Bulk Load Best Practices

Automatic Optimization (v3.4+)

SMFI automatically detects bulk operations and suspends tracking to avoid overhead:

-- Bulk INSERT (100+ rows) - SMFI auto-suspends
INSERT INTO large_table VALUES
  (1, 'data1'),
  (2, 'data2'),
  -- ... 100+ rows ...
  (1000, 'data1000');
-- SMFI automatically resumes and schedules rebuild

-- INSERT ... SELECT - SMFI auto-suspends (when implemented)
INSERT INTO large_table SELECT * FROM staging;

-- COPY FROM - SMFI auto-suspends (when implemented)
COPY large_table FROM '/data/import.csv';

The system: 1. Detects bulk operations (≥10,000 rows) 2. Suspends per-row tracking for the affected table 3. Counts rows affected during suspension 4. Automatically resumes tracking when operation completes 5. Schedules filter rebuild if rows were modified

Manual Control (Optional)

For very large imports, you can still manually control tracking:

-- 1. Disable SMFI tracking globally
SET smfi_tracking_enabled = off;

-- 2. Perform bulk load
COPY large_table FROM '/data/import.csv';

-- 3. Re-enable tracking
SET smfi_tracking_enabled = on;

-- 4. Rebuild filters for the table
CALL smfi_rebuild_table('large_table');

Bulk Load Threshold

The auto-suspension threshold is 10,000 rows by default. This is hot-reloadable - no restart required:

-- Check current threshold
SHOW smfi_bulk_load_threshold;

-- Increase threshold (larger batches before auto-suspend)
SET smfi_bulk_load_threshold = 50000;

-- Decrease threshold (more aggressive suspension)
SET smfi_bulk_load_threshold = 1000;

-- Reset to default
RESET smfi_bulk_load_threshold;

Operations below this threshold use normal per-row tracking (~150ns overhead per row).

Performance Monitoring

Key Metrics

-- Bloom filter effectiveness
SELECT
  table_name,
  bloom_checks,
  bloom_hits,
  ROUND(100.0 * bloom_hits / NULLIF(bloom_checks, 0), 2) AS hit_rate_pct
FROM pg_smfi_table_stats()
WHERE bloom_checks > 0;
-- Zone map effectiveness
SELECT
  table_name,
  total_blocks,
  blocks_pruned,
  ROUND(100.0 * blocks_pruned / NULLIF(total_blocks, 0), 2) AS prune_rate_pct
FROM pg_smfi_table_stats()
WHERE total_blocks > 0;

Identifying Optimization Opportunities

-- Tables that might benefit from speculative filters
SELECT
  table_name,
  column_name,
  pattern_type,
  query_frequency,
  selectivity
FROM pg_speculative_filters()
WHERE selectivity < 0.01  -- Less than 1% selectivity
ORDER BY query_frequency DESC
LIMIT 10;

When to Manually Rebuild

SMFI self-maintains, but manual rebuild helps after: - Large bulk imports - Significant data changes - Table truncation

-- Rebuild single table
CALL smfi_rebuild_table('orders');

-- Rebuild all tables (use sparingly)
CALL smfi_rebuild_all();

Troubleshooting

"Zone Map Prune: 0 blocks skipped"

Cause: Data not ordered by filter column.

Solution: Zone maps work best with ordered data (timestamps, sequences):

-- For time-series data, insert in order
INSERT INTO events (timestamp, data)
VALUES (NOW(), '...');  -- Timestamps naturally ordered

-- For random data, zone maps less effective (this is expected)

"Bloom Filter Check: not used"

Cause: Column not tracked or low cardinality.

Check:

SELECT * FROM pg_smfi_table_stats() WHERE table_name = 'your_table';

Solution:

-- Force rebuild
CALL smfi_rebuild_table('your_table');

High CPU from Consolidation

Cause: Too many pending deltas, aggressive consolidation.

Solution:

-- Reduce consolidation frequency
SET smfi_max_cpu_percent = 10;
SET smfi_delta_threshold = 5000;
SET smfi_time_threshold = 600;

Queries Still Slow

Check if SMFI is being used:

EXPLAIN SELECT * FROM slow_table WHERE column = 'value';

If not using SMFI: 1. Verify SMFI is enabled: SHOW smfi_enabled; 2. Check table stats: SELECT * FROM pg_smfi_table_stats(); 3. Rebuild if needed: CALL smfi_rebuild_table('slow_table');

Settings Reference

Setting Default Description
smfi_enabled on Enable/disable SMFI system
smfi_tracking_enabled on Track DML for filter updates
smfi_parallel_enabled on Enable parallel filtering
smfi_max_cpu_percent 15 Max CPU for consolidation
smfi_delta_threshold 1000 Deltas before consolidation
smfi_time_threshold 300 Seconds before consolidation
smfi_min_query_frequency 10 Min queries for auto-filter
smfi_drop_after_days 7 Days before dropping unused filter
smfi_max_filters_per_table 10 Max auto-filters per table
smfi_parallel_threshold 10000 Min rows for parallel
smfi_max_workers CPU cores Max parallel workers
smfi_bloom_fpr 0.01 Bloom false positive rate

See Also