Self-Maintaining Filter Index (SMFI) User Guide¶
Quick Start¶
SMFI is enabled by default and requires no configuration. Your queries automatically benefit from storage-level filtering.
-- These queries automatically use SMFI
SELECT * FROM orders WHERE customer_id = 123; -- Uses Bloom filter
SELECT * FROM events WHERE timestamp > '2025-01-01'; -- Uses Zone map
SELECT * FROM products WHERE price BETWEEN 10 AND 50; -- Uses Zone map
What SMFI Does For You¶
1. Faster Queries (Automatic)¶
Every query with a WHERE clause is automatically optimized:
| Query Type | Optimization | Typical Speedup |
|---|---|---|
Equality (=) |
Bloom filter | 10-20x |
Range (>, <, BETWEEN) |
Zone map | 5-10x |
| Combined | Both | 20-50x |
2. Zero Configuration¶
Unlike traditional indexes:
- No CREATE INDEX required
- No REINDEX maintenance
- No memory tuning needed
3. Zero Memory Overhead¶
All filter structures live on disk: - Loaded only when needed - Shared across all connections - Automatically cleaned up
Verifying SMFI Is Working¶
Check System Status¶
-- Overall SMFI status
SELECT * FROM pg_smfi_status();
-- Per-table statistics
SELECT table_name, bloom_checks, bloom_hits, blocks_pruned
FROM pg_smfi_table_stats();
Use EXPLAIN¶
Look for these indicators:
→ Bloom Filter Check on orders.customer_id
→ Zone Map Prune: 847/1000 blocks skipped
→ Parallel SIMD Filter: 8 workers
Check Query Patterns¶
-- See what patterns SMFI has learned
SELECT table_name, column_name, query_frequency, selectivity
FROM pg_speculative_filters()
ORDER BY query_frequency DESC;
Common Settings¶
Adjusting CPU Usage¶
By default, background maintenance uses max 15% CPU:
-- Allow more CPU for faster consolidation
SET smfi_max_cpu_percent = 25;
-- Reduce impact on production workload
SET smfi_max_cpu_percent = 10;
Controlling Parallel Execution¶
-- Disable parallel filtering (single-threaded)
SET smfi_parallel_enabled = off;
-- Set maximum workers
SET smfi_max_workers = 4;
-- Minimum rows before parallelizing
SET smfi_parallel_threshold = 50000;
Tuning Speculative Filters¶
-- Create filters faster (lower threshold)
SET smfi_min_query_frequency = 5;
-- Keep unused filters longer
SET smfi_drop_after_days = 14;
-- Allow more auto-created filters per table
SET smfi_max_filters_per_table = 20;
Bulk Load Best Practices¶
Automatic Optimization (v3.4+)¶
SMFI automatically detects bulk operations and suspends tracking to avoid overhead:
-- Bulk INSERT (100+ rows) - SMFI auto-suspends
INSERT INTO large_table VALUES
(1, 'data1'),
(2, 'data2'),
-- ... 100+ rows ...
(1000, 'data1000');
-- SMFI automatically resumes and schedules rebuild
-- INSERT ... SELECT - SMFI auto-suspends (when implemented)
INSERT INTO large_table SELECT * FROM staging;
-- COPY FROM - SMFI auto-suspends (when implemented)
COPY large_table FROM '/data/import.csv';
The system: 1. Detects bulk operations (≥10,000 rows) 2. Suspends per-row tracking for the affected table 3. Counts rows affected during suspension 4. Automatically resumes tracking when operation completes 5. Schedules filter rebuild if rows were modified
Manual Control (Optional)¶
For very large imports, you can still manually control tracking:
-- 1. Disable SMFI tracking globally
SET smfi_tracking_enabled = off;
-- 2. Perform bulk load
COPY large_table FROM '/data/import.csv';
-- 3. Re-enable tracking
SET smfi_tracking_enabled = on;
-- 4. Rebuild filters for the table
CALL smfi_rebuild_table('large_table');
Bulk Load Threshold¶
The auto-suspension threshold is 10,000 rows by default. This is hot-reloadable - no restart required:
-- Check current threshold
SHOW smfi_bulk_load_threshold;
-- Increase threshold (larger batches before auto-suspend)
SET smfi_bulk_load_threshold = 50000;
-- Decrease threshold (more aggressive suspension)
SET smfi_bulk_load_threshold = 1000;
-- Reset to default
RESET smfi_bulk_load_threshold;
Operations below this threshold use normal per-row tracking (~150ns overhead per row).
Performance Monitoring¶
Key Metrics¶
-- Bloom filter effectiveness
SELECT
table_name,
bloom_checks,
bloom_hits,
ROUND(100.0 * bloom_hits / NULLIF(bloom_checks, 0), 2) AS hit_rate_pct
FROM pg_smfi_table_stats()
WHERE bloom_checks > 0;
-- Zone map effectiveness
SELECT
table_name,
total_blocks,
blocks_pruned,
ROUND(100.0 * blocks_pruned / NULLIF(total_blocks, 0), 2) AS prune_rate_pct
FROM pg_smfi_table_stats()
WHERE total_blocks > 0;
Identifying Optimization Opportunities¶
-- Tables that might benefit from speculative filters
SELECT
table_name,
column_name,
pattern_type,
query_frequency,
selectivity
FROM pg_speculative_filters()
WHERE selectivity < 0.01 -- Less than 1% selectivity
ORDER BY query_frequency DESC
LIMIT 10;
When to Manually Rebuild¶
SMFI self-maintains, but manual rebuild helps after: - Large bulk imports - Significant data changes - Table truncation
-- Rebuild single table
CALL smfi_rebuild_table('orders');
-- Rebuild all tables (use sparingly)
CALL smfi_rebuild_all();
Troubleshooting¶
"Zone Map Prune: 0 blocks skipped"¶
Cause: Data not ordered by filter column.
Solution: Zone maps work best with ordered data (timestamps, sequences):
-- For time-series data, insert in order
INSERT INTO events (timestamp, data)
VALUES (NOW(), '...'); -- Timestamps naturally ordered
-- For random data, zone maps less effective (this is expected)
"Bloom Filter Check: not used"¶
Cause: Column not tracked or low cardinality.
Check:
Solution:
High CPU from Consolidation¶
Cause: Too many pending deltas, aggressive consolidation.
Solution:
-- Reduce consolidation frequency
SET smfi_max_cpu_percent = 10;
SET smfi_delta_threshold = 5000;
SET smfi_time_threshold = 600;
Queries Still Slow¶
Check if SMFI is being used:
If not using SMFI:
1. Verify SMFI is enabled: SHOW smfi_enabled;
2. Check table stats: SELECT * FROM pg_smfi_table_stats();
3. Rebuild if needed: CALL smfi_rebuild_table('slow_table');
Settings Reference¶
| Setting | Default | Description |
|---|---|---|
smfi_enabled |
on |
Enable/disable SMFI system |
smfi_tracking_enabled |
on |
Track DML for filter updates |
smfi_parallel_enabled |
on |
Enable parallel filtering |
smfi_max_cpu_percent |
15 |
Max CPU for consolidation |
smfi_delta_threshold |
1000 |
Deltas before consolidation |
smfi_time_threshold |
300 |
Seconds before consolidation |
smfi_min_query_frequency |
10 |
Min queries for auto-filter |
smfi_drop_after_days |
7 |
Days before dropping unused filter |
smfi_max_filters_per_table |
10 |
Max auto-filters per table |
smfi_parallel_threshold |
10000 |
Min rows for parallel |
smfi_max_workers |
CPU cores |
Max parallel workers |
smfi_bloom_fpr |
0.01 |
Bloom false positive rate |
See Also¶
- SMFI Concept Guide - Architecture details
- SQL Settings Reference - All settings
- SMFI Scalability Report - Benchmarks