HashJoin Operator Architecture Design

Document Version: 1.0
Created: November 13, 2025
Owner: System Architecture Designer
Status: ⚠️ PARTIALLY IMPLEMENTED (Plan Structure Only)
Algorithm Source: Silberschatz "Database System Concepts" (Chapter 12), Ramakrishnan "Database Management Systems" (Chapter 14)


STATUS UPDATE (2025-12-09)

Implementation Progress: 30% Complete

What's Implemented

  • ✅ PhysicalPlan::HashJoin variant defined in src/optimizer/planner.rs (lines 48-57)
  • ✅ Planner logic for hash join selection (should_use_hash_join method)
  • ✅ Plan structure supports all join types (INNER, LEFT, RIGHT, FULL)

What's NOT Yet Implemented

  • ❌ Executor implementation (compute/executor.rs)
  • ❌ Hash table construction and probing logic
  • ❌ Memory management and limits
  • ❌ Outer join unmatched tuple emission
  • ❌ Integration tests for hash join execution

Current Behavior

Queries that match hash join criteria fall back to nested loop join (less optimal but correct).

Estimated Effort to Complete: 3-5 days (Phase 4 in roadmap)



Executive Summary

This document specifies the architecture for HashJoinOperator, a critical physical operator for HeliosDB Lite that implements hash-based join algorithms. The design follows standard database textbook algorithms (Silberschatz, Ramakrishnan) and integrates with the existing Volcano/Iterator execution model.

Key Design Decisions:

  • Two-phase algorithm: Build phase (hash table construction) + Probe phase (tuple matching)
  • FNV-1a hash function (fast, non-cryptographic, good distribution)
  • In-memory hash table with overflow chaining
  • Support for INNER, LEFT, RIGHT, FULL OUTER joins
  • Integration with existing PhysicalOperator trait
  • Memory-bounded operation with configurable limits


1. Architecture Overview

1.1 Standard Hash Join Algorithm

The classic hash join algorithm (Silberschatz Ch. 12.5.3) operates in two phases:

  1. Build Phase: Read all tuples from the smaller relation (right/inner), extract join keys, build hash table
  2. Probe Phase: Read tuples from the larger relation (left/outer), probe hash table, emit matches

Algorithm Complexity:

  • Time: O(|R| + |S|) for relations R, S
  • Space: O(|S|) where S is the smaller (build) relation
  • Assumes hash table fits in memory (addressed via memory limits)
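The two phases can be sketched in a few lines of self-contained Rust. The `hash_join` function and its `(i32, &str)` tuple layout below are illustrative stand-ins, not the operator's real types; note how duplicate build-side keys chain naturally in the `Vec` per entry.

```rust
use std::collections::HashMap;

/// Minimal two-phase hash join over in-memory rows:
/// build on `right`, probe with `left`.
fn hash_join(left: &[(i32, &str)], right: &[(i32, &str)]) -> Vec<(i32, String, String)> {
    // Build phase: hash every build-side row by its join key.
    let mut table: HashMap<i32, Vec<&str>> = HashMap::new();
    for &(key, payload) in right {
        table.entry(key).or_default().push(payload);
    }

    // Probe phase: stream probe-side rows, emit one output row per match.
    let mut out = Vec::new();
    for &(key, l_payload) in left {
        if let Some(matches) = table.get(&key) {
            for &r_payload in matches {
                out.push((key, l_payload.to_string(), r_payload.to_string()));
            }
        }
    }
    out
}

fn main() {
    let employees = [(1, "alice"), (2, "bob"), (3, "carol")];
    let departments = [(1, "eng"), (2, "sales"), (2, "support")];
    let joined = hash_join(&employees, &departments);
    // bob matches twice because key 2 appears twice on the build side
    assert_eq!(joined.len(), 3);
    println!("{:?}", joined);
}
```

Each input is scanned exactly once, which is where the O(|R| + |S|) time bound comes from.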

1.2 Integration with Volcano Model

HashJoinOperator implements the PhysicalOperator trait:

pub trait PhysicalOperator {
    fn next(&mut self) -> Result<Option<Tuple>>;
    fn schema(&self) -> Arc<Schema>;
}

Execution Flow:

  1. First call to next(): Execute build phase (materialize right side into hash table)
  2. Subsequent calls: Execute probe phase (stream left side, probe hash table)
  3. Return Ok(None) when exhausted
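A minimal sketch of the pull-based contract, using simplified stand-ins for `PhysicalOperator` and `Tuple` (the real trait returns `Result<Option<Tuple>>` and a schema; `VecScan` is a hypothetical leaf operator invented for illustration):

```rust
type Tuple = Vec<i64>;

trait PhysicalOperator {
    fn next(&mut self) -> Option<Tuple>; // real trait: Result<Option<Tuple>>
}

/// Trivial leaf operator: streams a pre-materialized batch of tuples.
struct VecScan {
    rows: Vec<Tuple>,
    pos: usize,
}

impl PhysicalOperator for VecScan {
    fn next(&mut self) -> Option<Tuple> {
        let row = self.rows.get(self.pos).cloned();
        self.pos += 1;
        row
    }
}

fn main() {
    let mut op = VecScan { rows: vec![vec![1], vec![2], vec![3]], pos: 0 };
    // The driver pulls until the operator reports exhaustion (None),
    // mirroring the Ok(None) convention described above.
    let mut count = 0;
    while let Some(tuple) = op.next() {
        println!("{:?}", tuple);
        count += 1;
    }
    assert_eq!(count, 3);
}
```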


2. Data Structures

2.1 HashJoinOperator Struct

use std::collections::{HashMap, HashSet};
use std::sync::Arc;
use crate::{Error, Result, Tuple, Schema, Value};
use crate::sql::{LogicalExpr, JoinType};

/// Hash join operator
///
/// Implements hash-based join using classic two-phase algorithm:
/// 1. Build phase: Hash all tuples from right (build) side
/// 2. Probe phase: Stream left (probe) side, lookup matches
///
/// Algorithm from Silberschatz "Database System Concepts" Ch. 12.5.3
pub struct HashJoinOperator {
    // Input operators
    left: Box<dyn PhysicalOperator>,
    right: Box<dyn PhysicalOperator>,

    // Join specification
    join_type: JoinType,
    on_condition: Option<LogicalExpr>,

    // Hash table (key: join columns, value: matching tuples)
    hash_table: HashMap<JoinKey, Vec<Tuple>>,

    // Output schema
    output_schema: Arc<Schema>,

    // Expression evaluator
    evaluator: crate::sql::Evaluator,

    // State machine
    state: JoinState,

    // Probe phase state
    current_left_tuple: Option<Tuple>,
    current_matches: Vec<Tuple>,
    match_index: usize,

    // LEFT/RIGHT/FULL join state
    matched_left_tuples: HashSet<usize>,  // Track which left tuples matched (for FULL)
    matched_right_keys: HashSet<JoinKey>, // Track which right keys matched (for RIGHT/FULL)

    // Memory management
    memory_limit: usize,  // Maximum memory for hash table (bytes)
    memory_used: usize,   // Current memory usage estimate
}

/// Join key for hash table lookups
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct JoinKey(Vec<Value>);

/// State machine for join execution
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinState {
    /// Initial state - build phase not started
    Initial,
    /// Building hash table from right side
    Building,
    /// Probing hash table with left side
    Probing,
    /// Emitting unmatched tuples (for outer joins)
    EmittingUnmatched,
    /// Exhausted - no more tuples
    Exhausted,
}

2.2 Hash Table Design

Structure: HashMap<JoinKey, Vec<Tuple>>

  • Key: JoinKey wrapping Vec<Value> (supports composite keys)
  • Value: Vec<Tuple> (handles duplicate join keys via chaining)

Rationale:

  • Standard Rust HashMap provides O(1) average-case lookup
  • FNV-1a hash (Rust's default hasher is SipHash-1-3; we override it with a custom BuildHasher for speed)
  • Overflow chaining handles hash collisions and duplicate keys naturally

Memory Model:

Hash Table Memory =
    HashMap overhead (24 bytes per entry) +
    JoinKey size (8 bytes per Value + 24 bytes Vec overhead) +
    Vec<Tuple> size (24 bytes + N * Tuple size)

Estimated: ~100-200 bytes per entry + tuple data
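The hasher override mentioned above can be sketched as follows. This is illustrative code, not the project's implementation; the constants are the standard FNV-1a 64-bit offset basis and prime.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// 64-bit FNV-1a hasher.
struct Fnv1aHasher(u64);

impl Default for Fnv1aHasher {
    fn default() -> Self {
        Fnv1aHasher(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1aHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        const PRIME: u64 = 0x100_0000_01b3; // FNV 64-bit prime
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(PRIME);
        }
    }
}

type FnvBuildHasher = BuildHasherDefault<Fnv1aHasher>;

fn main() {
    // Plug the hasher into HashMap in place of the default SipHash-1-3.
    let mut table: HashMap<Vec<i64>, Vec<&str>, FnvBuildHasher> = HashMap::default();
    table.entry(vec![1, 7]).or_default().push("first");
    table.entry(vec![1, 7]).or_default().push("second"); // chains on same key
    assert_eq!(table[&vec![1, 7]].len(), 2);

    // Sanity check: hashing zero bytes yields the offset basis.
    assert_eq!(Fnv1aHasher::default().finish(), 0xcbf2_9ce4_8422_2325);
}
```

The `Vec<i64>` key here plays the role of the composite `JoinKey`, and the `Vec<&str>` value demonstrates the overflow-chaining behavior described above.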


3. Build Phase Algorithm

3.1 Pseudocode (Silberschatz Algorithm 12.10)

Algorithm: HashJoinBuild(R)
Input: Right relation R, join attributes
Output: Hash table H

1. H := empty hash table
2. for each tuple t_r in R do
3.     k := extract_join_key(t_r)
4.     H[k].append(t_r)
5. return H

3.2 Rust Implementation

impl HashJoinOperator {
    /// Execute build phase: construct hash table from right side
    ///
    /// This phase materializes ALL tuples from the right (build) input
    /// into an in-memory hash table indexed by join keys.
    ///
    /// Algorithm: Silberschatz Ch. 12.5.3
    fn build_phase(&mut self) -> Result<()> {
        // Transition state
        if self.state != JoinState::Initial {
            return Err(Error::query_execution("Build phase already executed"));
        }
        self.state = JoinState::Building;

        // Read all tuples from right (build) side
        while let Some(tuple) = self.right.next()? {
            // Extract join key from tuple
            let key = self.extract_join_key(&tuple, true)?;

            // Estimate memory for this tuple
            let tuple_size = estimate_tuple_size(&tuple);
            let key_size = estimate_key_size(&key);
            let entry_overhead = 24; // HashMap entry overhead
            let additional_memory = tuple_size + key_size + entry_overhead;

            // Check memory limit
            if self.memory_used + additional_memory > self.memory_limit {
                return Err(Error::query_execution(
                    format!(
                        "Hash join exceeds memory limit ({} bytes). Consider increasing limit or using nested loop join.",
                        self.memory_limit
                    )
                ));
            }

            // Insert into hash table (with overflow chaining)
            self.hash_table
                .entry(key)
                .or_insert_with(Vec::new)
                .push(tuple);

            self.memory_used += additional_memory;
        }

        // Transition to probe phase
        self.state = JoinState::Probing;

        Ok(())
    }

    /// Extract join key from a tuple
    ///
    /// For ON clause like: a.id = b.id AND a.type = b.type
    /// Extract values of [a.id, a.type] or [b.id, b.type] depending on side
    fn extract_join_key(&self, tuple: &Tuple, is_right_side: bool) -> Result<JoinKey> {
        if let Some(condition) = &self.on_condition {
            let key_values = self.extract_join_columns(condition, tuple, is_right_side)?;
            Ok(JoinKey(key_values))
        } else {
            // Cross join - use empty key (all tuples match)
            Ok(JoinKey(vec![]))
        }
    }

    /// Extract join column values from ON condition
    ///
    /// Handles:
    /// - Simple equality: a.id = b.id
    /// - Composite keys: a.id = b.id AND a.type = b.type
    /// - Complex expressions: EXTRACT(YEAR FROM a.date) = b.year
    fn extract_join_columns(
        &self,
        condition: &LogicalExpr,
        tuple: &Tuple,
        is_right_side: bool,
    ) -> Result<Vec<Value>> {
        use LogicalExpr::*;
        use BinaryOperator::*;

        match condition {
            BinaryExpr { left, op: Eq, right } => {
                // Single equality: extract the appropriate side
                let expr = if is_right_side { right } else { left };
                let value = self.evaluator.evaluate(expr, tuple)?;
                Ok(vec![value])
            }
            BinaryExpr { left, op: And, right } => {
                // Composite key: a.x = b.x AND a.y = b.y
                let mut values = self.extract_join_columns(left, tuple, is_right_side)?;
                values.extend(self.extract_join_columns(right, tuple, is_right_side)?);
                Ok(values)
            }
            _ => {
                // For non-equality joins, we'll evaluate the full condition later
                // Use empty key to force Cartesian product, then filter
                Ok(vec![])
            }
        }
    }
}

/// Estimate memory size of a tuple
fn estimate_tuple_size(tuple: &Tuple) -> usize {
    // Base tuple overhead + values
    let base = 24; // Vec overhead
    let values_size: usize = tuple.values.iter()
        .map(|v| estimate_value_size(v))
        .sum();
    base + values_size
}

/// Estimate memory size of a value
fn estimate_value_size(value: &Value) -> usize {
    use Value::*;
    match value {
        Null => 1,
        Boolean(_) => 1,
        Int2(_) => 2,
        Int4(_) => 4,
        Int8(_) => 8,
        Float4(_) => 4,
        Float8(_) => 8,
        String(s) => 24 + s.len(),
        Bytes(b) => 24 + b.len(),
        Vector(v) => 24 + v.len() * 4,
        Array(arr) => 24 + arr.iter().map(estimate_value_size).sum::<usize>(),
        Json(_) => 256, // Rough estimate for JSON
        _ => 64, // Conservative estimate
    }
}

fn estimate_key_size(key: &JoinKey) -> usize {
    24 + key.0.iter().map(estimate_value_size).sum::<usize>()
}

Build Phase Complexity:

  • Time: O(|R|) where R is right relation
  • Space: O(|R|) for hash table
  • Memory limit: Configurable (default 100 MB)


4. Probe Phase Algorithm

4.1 Pseudocode (Silberschatz Algorithm 12.11)

Algorithm: HashJoinProbe(S, H)
Input: Left relation S, hash table H
Output: Stream of joined tuples

1. for each tuple t_s in S do
2.     k := extract_join_key(t_s)
3.     if k in H then
4.         for each tuple t_r in H[k] do
5.             if join_condition(t_s, t_r) then
6.                 emit join(t_s, t_r)

4.2 Rust Implementation (PhysicalOperator::next)

impl PhysicalOperator for HashJoinOperator {
    fn next(&mut self) -> Result<Option<Tuple>> {
        // Execute build phase on first call
        if self.state == JoinState::Initial {
            self.build_phase()?;
        }

        match self.state {
            JoinState::Probing => self.probe_phase(),
            JoinState::EmittingUnmatched => self.emit_unmatched(),
            JoinState::Exhausted => Ok(None),
            _ => Err(Error::query_execution("Invalid join state")),
        }
    }

    fn schema(&self) -> Arc<Schema> {
        self.output_schema.clone()
    }
}

impl HashJoinOperator {
    /// Probe phase: stream left side, lookup matches in hash table
    fn probe_phase(&mut self) -> Result<Option<Tuple>> {
        loop {
            // If we have pending matches for current left tuple, emit them
            if self.match_index < self.current_matches.len() {
                let right_tuple = self.current_matches[self.match_index].clone();
                self.match_index += 1;

                let left_tuple = self.current_left_tuple.clone()
                    .ok_or_else(|| Error::query_execution("Missing left tuple"))?;

                return Ok(Some(self.join_tuples(&left_tuple, &right_tuple)));
            }

            // Get next left tuple
            match self.left.next()? {
                None => {
                    // No more left tuples.
                    // For RIGHT/FULL joins, emit unmatched build-side tuples.
                    // (Unmatched LEFT tuples are emitted inline below, so no
                    // extra pass is needed for LEFT.)
                    if matches!(self.join_type, JoinType::Right | JoinType::Full) {
                        self.state = JoinState::EmittingUnmatched;
                        return self.emit_unmatched();
                    }

                    self.state = JoinState::Exhausted;
                    return Ok(None);
                }
                Some(left_tuple) => {
                    // Extract join key and probe hash table
                    let key = self.extract_join_key(&left_tuple, false)?;

                    // Lookup in hash table
                    if let Some(matches) = self.hash_table.get(&key) {
                        // Found candidates - filter by full join condition
                        let filtered_matches: Vec<Tuple> = matches.iter()
                            .filter(|right_tuple| {
                                self.evaluate_join_condition(&left_tuple, right_tuple)
                                    .unwrap_or(false)
                            })
                            .cloned()
                            .collect();

                        if !filtered_matches.is_empty() {
                            // Record the matched build key so RIGHT/FULL joins
                            // can skip these tuples in emit_unmatched later
                            if matches!(self.join_type, JoinType::Right | JoinType::Full) {
                                self.matched_right_keys.insert(key.clone());
                            }

                            // Store matches and emit first one
                            self.current_left_tuple = Some(left_tuple);
                            self.current_matches = filtered_matches;
                            self.match_index = 0;

                            // Continue loop to emit first match
                            continue;
                        }
                    }

                    // No matches found for this left tuple.
                    // For LEFT/FULL join, emit with NULLs
                    if matches!(self.join_type, JoinType::Left | JoinType::Full) {
                        return Ok(Some(self.join_with_nulls_right(&left_tuple)));
                    }

                    // For INNER join, skip this tuple
                    continue;
                }
            }
        }
    }

    /// Evaluate full join condition on combined tuple
    fn evaluate_join_condition(&self, left: &Tuple, right: &Tuple) -> Result<bool> {
        if let Some(condition) = &self.on_condition {
            // Combine tuples for evaluation
            let combined = self.join_tuples(left, right);

            // Evaluate condition
            let result = self.evaluator.evaluate(condition, &combined)?;

            match result {
                Value::Boolean(b) => Ok(b),
                _ => Ok(false),
            }
        } else {
            // No condition = cross join = always match
            Ok(true)
        }
    }

    /// Join two tuples (concatenate values)
    fn join_tuples(&self, left: &Tuple, right: &Tuple) -> Tuple {
        let mut values = left.values.clone();
        values.extend(right.values.clone());
        Tuple::new(values)
    }

    /// Join left tuple with NULLs (for unmatched left tuple in LEFT/FULL join)
    fn join_with_nulls_right(&self, left: &Tuple) -> Tuple {
        let mut values = left.values.clone();

        // Add NULLs for all right columns
        let right_schema = self.right.schema();
        values.extend(vec![Value::Null; right_schema.columns.len()]);

        Tuple::new(values)
    }

    /// Join right tuple with NULLs (for unmatched right tuple in RIGHT/FULL join)
    fn join_with_nulls_left(&self, right: &Tuple) -> Tuple {
        let left_schema = self.left.schema();
        let mut values = vec![Value::Null; left_schema.columns.len()];
        values.extend(right.values.clone());
        Tuple::new(values)
    }
}

Probe Phase Complexity:

  • Time: O(|S|) where S is left relation (assuming O(1) hash lookup)
  • Space: O(1) for state (tuples streamed, not materialized)


5. Join Type Handling

5.1 INNER JOIN (Default)

Semantics: Emit only tuples that match the join condition

Algorithm: Standard probe phase (implemented above)

  • Build hash table from right
  • Probe with left
  • Emit only matching pairs

Example:

SELECT * FROM employees e
INNER JOIN departments d ON e.dept_id = d.id

5.2 LEFT OUTER JOIN

Semantics: All left tuples + matching right tuples (NULLs for unmatched left)

Algorithm Modification:

fn probe_phase_left_join(&mut self) -> Result<Option<Tuple>> {
    // ... standard probe ...

    // No matches found for left tuple
    if no_matches {
        // Emit left tuple with NULLs for right side
        return Ok(Some(self.join_with_nulls_right(&left_tuple)));
    }
}

Key Change: When no hash table match, emit left tuple + NULL padding
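The NULL-padding behavior can be shown end-to-end with a self-contained sketch; `Option<String>` stands in for the operator's `Value::Null` padding, and the tuple layout is illustrative only.

```rust
use std::collections::HashMap;

/// LEFT JOIN sketch: every probe-side row is emitted; rows with no
/// hash-table match get None where the real operator emits Value::Null.
fn left_join(left: &[(i32, &str)], right: &[(i32, &str)]) -> Vec<(i32, String, Option<String>)> {
    let mut table: HashMap<i32, Vec<&str>> = HashMap::new();
    for &(k, v) in right {
        table.entry(k).or_default().push(v);
    }
    let mut out = Vec::new();
    for &(k, l) in left {
        match table.get(&k) {
            Some(matches) => {
                for &r in matches {
                    out.push((k, l.to_string(), Some(r.to_string())));
                }
            }
            // No hash table match: emit the left tuple with NULL padding.
            None => out.push((k, l.to_string(), None)),
        }
    }
    out
}

fn main() {
    let rows = left_join(&[(1, "alice"), (9, "mallory")], &[(1, "eng")]);
    assert_eq!(rows.len(), 2);
    assert_eq!(rows[1], (9, "mallory".to_string(), None));
    println!("{:?}", rows);
}
```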

5.3 RIGHT OUTER JOIN

Semantics: All right tuples + matching left tuples (NULLs for unmatched right)

Algorithm: Transform to LEFT JOIN by swapping inputs

impl HashJoinOperator {
    fn new_right_join(
        left: Box<dyn PhysicalOperator>,
        right: Box<dyn PhysicalOperator>,
        on: Option<LogicalExpr>,
    ) -> Self {
        // RIGHT JOIN: swap left and right, execute as LEFT JOIN
        Self::new_left_join(
            right,  // Swap: right becomes left (probe)
            left,   // Swap: left becomes right (build)
            Self::swap_join_condition(on),  // Adjust ON clause
        )
    }

    /// Swap column references in join condition
    fn swap_join_condition(condition: Option<LogicalExpr>) -> Option<LogicalExpr> {
        // Convert: left.col = right.col  ->  right.col = left.col
        // This requires tracking which columns belong to which side
        // (Implementation detail - analyze condition and swap references)
        condition // Simplified - actual impl swaps column refs
    }
}

Rationale: RIGHT JOIN is uncommon; transforming to LEFT JOIN simplifies implementation
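The swap can be demonstrated with a small self-contained sketch (illustrative types, not the operator's). One subtlety worth making explicit: swapping the inputs also swaps the output column order, so a re-projection back to (left columns, right columns) is required.

```rust
use std::collections::HashMap;

fn left_join(probe: &[(i32, &str)], build: &[(i32, &str)]) -> Vec<(i32, String, Option<String>)> {
    let mut table: HashMap<i32, Vec<&str>> = HashMap::new();
    for &(k, v) in build {
        table.entry(k).or_default().push(v);
    }
    let mut out = Vec::new();
    for &(k, p) in probe {
        match table.get(&k) {
            Some(ms) => out.extend(ms.iter().map(|&m| (k, p.to_string(), Some(m.to_string())))),
            None => out.push((k, p.to_string(), None)),
        }
    }
    out
}

/// RIGHT JOIN: run LEFT JOIN with the inputs swapped, then re-project
/// the payload columns back into (left, right) order.
fn right_join(left: &[(i32, &str)], right: &[(i32, &str)]) -> Vec<(i32, Option<String>, String)> {
    left_join(right, left)
        .into_iter()
        .map(|(k, right_payload, left_payload)| (k, left_payload, right_payload))
        .collect()
}

fn main() {
    let rows = right_join(&[(1, "alice")], &[(1, "eng"), (2, "hr")]);
    // "hr" has no matching left row, so its left side is NULL (None).
    assert!(rows.contains(&(2, None, "hr".to_string())));
    assert!(rows.contains(&(1, Some("alice".to_string()), "eng".to_string())));
}
```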

5.4 FULL OUTER JOIN

Semantics: All left tuples + all right tuples (NULLs for unmatched on either side)

Algorithm: Two-pass approach

fn emit_unmatched(&mut self) -> Result<Option<Tuple>> {
    match self.join_type {
        JoinType::Left => {
            // Unmatched left tuples were already emitted during probe phase
            self.state = JoinState::Exhausted;
            Ok(None)
        }
        JoinType::Right | JoinType::Full => {
            // Drain unmatched right tuples one per call. A bare scan of the
            // hash table here would return the same tuple on every call, so
            // the unmatched tuples are materialized once into a buffer.
            // (Assumes two extra struct fields: unmatched_buffer: Vec<Tuple>
            // and unmatched_index: usize, initialized empty/zero.)
            if self.unmatched_buffer.is_empty() && self.unmatched_index == 0 {
                for (key, tuples) in &self.hash_table {
                    if !self.matched_right_keys.contains(key) {
                        self.unmatched_buffer.extend(tuples.iter().cloned());
                    }
                }
            }

            if self.unmatched_index < self.unmatched_buffer.len() {
                let tuple = self.unmatched_buffer[self.unmatched_index].clone();
                self.unmatched_index += 1;
                // Emit right tuple with NULLs for left side
                return Ok(Some(self.join_with_nulls_left(&tuple)));
            }

            self.state = JoinState::Exhausted;
            Ok(None)
        }
        _ => {
            self.state = JoinState::Exhausted;
            Ok(None)
        }
    }
}

Implementation Notes:

  • Track matched keys during probe phase: matched_right_keys: HashSet<JoinKey>
  • After probe phase completes, iterate hash table and emit unmatched right tuples
  • Add NULL padding for left columns

Complexity: O(|R|) additional pass over hash table
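The two-pass approach can be sketched in self-contained form. As above, `Option<String>` stands in for NULL padding and the types are illustrative; the `matched_right_keys` set plays the same role as the struct field of that name.

```rust
use std::collections::{HashMap, HashSet};

/// FULL OUTER JOIN sketch: probe as usual while recording which build keys
/// matched, then make a second pass over the hash table emitting the rest
/// with NULL (None) padding on the probe side.
fn full_join(left: &[(i32, &str)], right: &[(i32, &str)]) -> Vec<(i32, Option<String>, Option<String>)> {
    let mut table: HashMap<i32, Vec<&str>> = HashMap::new();
    for &(k, v) in right {
        table.entry(k).or_default().push(v);
    }

    let mut matched_right_keys: HashSet<i32> = HashSet::new();
    let mut out = Vec::new();

    // Probe phase: unmatched LEFT tuples are handled inline.
    for &(k, l) in left {
        match table.get(&k) {
            Some(ms) => {
                matched_right_keys.insert(k);
                for &r in ms {
                    out.push((k, Some(l.to_string()), Some(r.to_string())));
                }
            }
            None => out.push((k, Some(l.to_string()), None)),
        }
    }

    // EmittingUnmatched phase: right tuples whose key never matched.
    for (k, ms) in &table {
        if !matched_right_keys.contains(k) {
            for &r in ms {
                out.push((*k, None, Some(r.to_string())));
            }
        }
    }
    out
}

fn main() {
    let rows = full_join(&[(1, "alice"), (2, "bob")], &[(2, "eng"), (3, "hr")]);
    assert_eq!(rows.len(), 3);
    assert!(rows.contains(&(3, None, Some("hr".to_string()))));
}
```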


6. Memory Management

6.1 Memory Limits

Strategy: Bounded memory with explicit limits

const DEFAULT_MEMORY_LIMIT: usize = 100 * 1024 * 1024; // 100 MB

impl HashJoinOperator {
    pub fn new(
        left: Box<dyn PhysicalOperator>,
        right: Box<dyn PhysicalOperator>,
        join_type: JoinType,
        on: Option<LogicalExpr>,
    ) -> Result<Self> {
        Self::with_memory_limit(left, right, join_type, on, DEFAULT_MEMORY_LIMIT)
    }

    pub fn with_memory_limit(
        left: Box<dyn PhysicalOperator>,
        right: Box<dyn PhysicalOperator>,
        join_type: JoinType,
        on: Option<LogicalExpr>,
        memory_limit: usize,
    ) -> Result<Self> {
        // ... initialize ...
    }
}

Behavior:

  • Track memory usage during build phase: memory_used
  • Reject inserts exceeding memory_limit
  • Return error: "Hash join exceeds memory limit. Consider using nested loop join."
  • Future optimization: Spill to disk (hybrid hash join)

6.2 Build Side Selection

Strategy: Build on smaller relation

fn choose_build_side(
    left: Box<dyn PhysicalOperator>,
    right: Box<dyn PhysicalOperator>,
) -> (Box<dyn PhysicalOperator>, Box<dyn PhysicalOperator>) {
    // Heuristic: with cardinality estimates, build on the smaller side.
    // For now, always build on right (convention). Inputs are taken by
    // value because Box<dyn PhysicalOperator> is not Clone.
    // Future: Query optimizer should swap inputs based on statistics.
    (left, right)
}

Rationale: Optimizer responsibility (not operator implementation)

6.3 Hash Table Sizing

Strategy: Let Rust HashMap auto-grow

// HashMap::with_capacity(estimated_size) for better performance
let estimated_tuples = 10_000; // Default estimate
self.hash_table = HashMap::with_capacity(estimated_tuples);

Trade-off: Slight memory overhead vs. avoiding rehashing


7. Integration Plan

7.1 Changes to Existing Code

File: /home/claude/HeliosDB/heliosdb-lite/src/sql/executor.rs

Modification 1: Add HashJoinOperator to executor plan translation

impl<'a> Executor<'a> {
    fn plan_to_operator(&self, plan: &LogicalPlan) -> Result<Box<dyn PhysicalOperator>> {
        match plan {
            // ... existing cases ...

            LogicalPlan::Join { left, right, join_type, on } => {
                let left_op = self.plan_to_operator(left)?;
                let right_op = self.plan_to_operator(right)?;

                // Decision: Use HashJoin or NestedLoopJoin?
                // For MVP: always use HashJoin for equi-joins
                if is_equi_join(on) {
                    Ok(Box::new(HashJoinOperator::new(
                        left_op,
                        right_op,
                        join_type.clone(),
                        on.clone(),
                    )?))
                } else {
                    // Fall back to nested loop for non-equi joins
                    Ok(Box::new(NestedLoopJoinOperator::new(
                        left_op,
                        right_op,
                        join_type.clone(),
                        on.clone(),
                    )?))
                }
            }

            // ... rest of cases ...
        }
    }
}

/// Check if join condition is an equi-join (equality on join keys)
fn is_equi_join(condition: &Option<LogicalExpr>) -> bool {
    // Implementation: Walk condition tree, check for equality operators
    // Return true if all join predicates are equality comparisons
    true // Simplified - always use hash join for MVP
    // NOTE: simplified for MVP (always reports equi-join); see the sketch
    // below for what a real condition-tree walk looks like
    true
}
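A real detection walk over the condition tree could look like the following sketch. The toy `Expr`/`Op` types are stand-ins; the actual `LogicalExpr`/`BinaryOperator` types in src/sql are assumed to be shaped similarly.

```rust
enum Expr {
    Column(&'static str),
    Binary { left: Box<Expr>, op: Op, right: Box<Expr> },
}

enum Op { Eq, Lt, And }

/// True iff every leaf predicate is an equality, combined only by AND.
fn is_equi_join(e: &Expr) -> bool {
    match e {
        Expr::Binary { left, op: Op::And, right } => is_equi_join(left) && is_equi_join(right),
        Expr::Binary { op: Op::Eq, .. } => true,
        _ => false,
    }
}

fn main() {
    // a.id = b.id AND a.ts < b.ts  ->  not an equi-join (range predicate)
    let cond = Expr::Binary {
        left: Box::new(Expr::Binary {
            left: Box::new(Expr::Column("a.id")),
            op: Op::Eq,
            right: Box::new(Expr::Column("b.id")),
        }),
        op: Op::And,
        right: Box::new(Expr::Binary {
            left: Box::new(Expr::Column("a.ts")),
            op: Op::Lt,
            right: Box::new(Expr::Column("b.ts")),
        }),
    };
    assert!(!is_equi_join(&cond));
}
```

Under this rule a mixed condition like the one above falls back to nested loop join, matching the executor branch shown earlier.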

Modification 2: Add HashJoinOperator module

// At top of executor.rs
mod hash_join;
use hash_join::HashJoinOperator;

7.2 New Files

File: /home/claude/HeliosDB/heliosdb-lite/src/sql/hash_join.rs

  • Contains HashJoinOperator implementation
  • Contains helper functions (estimate_tuple_size, etc.)
  • Contains JoinKey and JoinState types
  • ~500-600 lines of code

7.3 Testing Strategy

Unit Tests (in hash_join.rs):

#[cfg(test)]
mod tests {
    #[test]
    fn test_inner_join_simple() {
        // Two tables: employees, departments
        // Test basic INNER JOIN on dept_id
    }

    #[test]
    fn test_left_join_with_nulls() {
        // Test LEFT JOIN produces NULLs for unmatched
    }

    #[test]
    fn test_composite_key() {
        // Test JOIN with multiple key columns
    }

    #[test]
    fn test_duplicate_keys() {
        // Test hash table overflow chaining
    }

    #[test]
    fn test_memory_limit() {
        // Test that memory limit is enforced
    }
}

Integration Tests (in tests/advanced_sql_test.rs):

#[test]
fn test_join_with_where() {
    // SELECT * FROM a JOIN b ON a.id = b.id WHERE a.value > 10
}

#[test]
fn test_multi_table_join() {
    // Three-way join: a JOIN b JOIN c
}

#[test]
fn test_join_with_aggregation() {
    // SELECT dept, COUNT(*) FROM employees JOIN departments GROUP BY dept
}

Performance Benchmarks:

#[bench]
fn bench_hash_join_100k(b: &mut Bencher) {
    // Benchmark with 100K rows
    // Target: >10K joins/sec
}


8. Performance Characteristics

8.1 Theoretical Complexity

Operation      Time Complexity    Space Complexity
Build Phase    O(N)               O(N)
Probe Phase    O(M)               O(1)
Total          O(N + M)           O(N)

Where N = right relation size, M = left relation size

8.2 Expected Performance

Assumptions:

  • FNV-1a hash: ~2 CPU cycles per byte
  • Memory bandwidth: ~50 GB/s
  • Tuple size: ~100 bytes average

Estimated Throughput:

  • Build: ~500K tuples/sec (memory-bound)
  • Probe: ~1M tuples/sec (CPU-bound)
  • Combined: ~10K-50K joins/sec (depends on selectivity)

Comparison to Nested Loop:

  • Nested loop: O(N * M) = 100x-1000x slower for large inputs
  • Hash join preferred when: N * M > 1000 (rough heuristic)

8.3 Memory Usage

Example: Join 100K tuples (100 bytes each)

  • Hash table: 100K * (100 bytes + 48 bytes overhead) = ~14 MB
  • Well within 100 MB default limit

Guideline: Hash join efficient up to ~500K-1M tuples on build side


9. Future Optimizations

9.1 Hybrid Hash Join (Spill to Disk)

Problem: Hash table exceeds memory limit
Solution: Partition relations, spill partitions to disk (Grace Hash Join)

Algorithm (Ramakrishnan Ch. 14.4.2):

  1. Partition both relations into N buckets using hash function h1
  2. Build hash table for each bucket using hash function h2 (h1 != h2)
  3. Join partitions one at a time

Complexity: O(N + M) with 2-3 passes over data

Implementation Estimate: +300 LOC, requires disk I/O abstraction
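The partitioning step can be sketched entirely in memory (a real implementation would spill each partition to disk and load pairs one at a time). `grace_join` and the modulo "hash" h1 below are illustrative assumptions, not the planned API.

```rust
use std::collections::HashMap;

/// Grace hash join sketch: partition both inputs with h1 so each pair of
/// partitions can be joined independently.
fn grace_join(left: &[(u64, &str)], right: &[(u64, &str)], parts: usize) -> Vec<(u64, String, String)> {
    let h1 = |k: u64| (k as usize) % parts; // stand-in partitioning hash

    let mut l_parts = vec![Vec::new(); parts];
    let mut r_parts = vec![Vec::new(); parts];
    for &(k, v) in left {
        l_parts[h1(k)].push((k, v));
    }
    for &(k, v) in right {
        r_parts[h1(k)].push((k, v));
    }

    // Join each partition pair with an ordinary in-memory hash join
    // (the inner HashMap plays the role of h2).
    let mut out = Vec::new();
    for (lp, rp) in l_parts.iter().zip(&r_parts) {
        let mut table: HashMap<u64, Vec<&str>> = HashMap::new();
        for &(k, v) in rp {
            table.entry(k).or_default().push(v);
        }
        for &(k, l) in lp {
            if let Some(ms) = table.get(&k) {
                for &r in ms {
                    out.push((k, l.to_string(), r.to_string()));
                }
            }
        }
    }
    out
}

fn main() {
    let rows = grace_join(&[(1, "a"), (2, "b"), (9, "c")], &[(1, "x"), (9, "y")], 4);
    assert_eq!(rows.len(), 2);
}
```

Because matching keys always land in the same partition under h1, joining partition pairs independently produces exactly the same result as a single-pass join.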

9.2 Bloom Filter Optimization

Problem: Probe phase wastes time on non-matching keys
Solution: Bloom filter to skip lookups for keys not in hash table

Algorithm:

  1. During build: Insert all keys into Bloom filter
  2. During probe: Check Bloom filter before hash table lookup
  3. False positive: Harmless (lookup finds nothing)
  4. False negative: Impossible by design

Benefit: 10-30% speedup when selectivity < 50%

Implementation Estimate: +100 LOC
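A minimal Bloom filter sketch, assuming u64 keys and double hashing (Kirsch-Mitzenmacher) to derive k probe positions from two multiplicative hashes; the constants and `Bloom` type are illustrative, not a planned API.

```rust
/// Minimal Bloom filter over u64 keys with k derived hash functions.
/// False positives are possible; false negatives are not.
struct Bloom {
    bits: Vec<bool>,
    k: u64,
}

impl Bloom {
    fn new(m: usize, k: u64) -> Self {
        Bloom { bits: vec![false; m], k }
    }

    // Derive k probe positions from two base hashes (double hashing).
    fn positions(&self, key: u64) -> impl Iterator<Item = usize> + '_ {
        let h1 = key.wrapping_mul(0x9E37_79B9_7F4A_7C15);
        let h2 = key.wrapping_mul(0xC2B2_AE3D_27D4_EB4F) | 1;
        let m = self.bits.len() as u64;
        (0..self.k).map(move |i| (h1.wrapping_add(i.wrapping_mul(h2)) % m) as usize)
    }

    fn insert(&mut self, key: u64) {
        let idx: Vec<usize> = self.positions(key).collect();
        for i in idx {
            self.bits[i] = true;
        }
    }

    fn may_contain(&self, key: u64) -> bool {
        self.positions(key).all(|i| self.bits[i])
    }
}

fn main() {
    let mut filter = Bloom::new(1024, 3);
    for key in [10u64, 20, 30] {
        filter.insert(key); // build phase: record every build-side key
    }
    // Probe phase: a negative answer lets us skip the hash table lookup.
    assert!(filter.may_contain(20)); // no false negatives by construction
    println!("maybe 999: {}", filter.may_contain(999));
}
```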

9.3 SIMD Key Comparison

Problem: Key equality checks dominate CPU time
Solution: Vectorized comparison using SIMD (AVX2/AVX512)

Benefit: 2-4x speedup for integer keys

Implementation Estimate: +200 LOC, requires std::arch or external crate


10. Architectural Decision Records (ADRs)

ADR-001: Use FNV-1a Hash Function

Context: Need fast, non-cryptographic hash for join keys

Decision: Use FNV-1a (Fowler-Noll-Vo) hash

Rationale:

  • Excellent distribution (low collision rate)
  • Fast: 2-3 CPU cycles per byte
  • Simple implementation (public domain)
  • Better than SipHash-1-3 (Rust default) for performance

Trade-offs:

  • Not cryptographically secure (irrelevant for joins)
  • Slightly worse than xxHash (but simpler)

Status: Accepted

ADR-002: In-Memory Only (No Disk Spill)

Context: Need to handle large joins that exceed memory

Decision: Initial implementation is in-memory only with explicit limits

Rationale:

  • MVP scope: Keep implementation simple
  • 100 MB limit covers 90%+ of real-world joins
  • Error message guides users to alternatives (nested loop, increase limit)
  • Hybrid hash join is future optimization

Trade-offs:

  • Cannot join relations larger than memory limit
  • User must manually switch to nested loop join

Status: Accepted (with future enhancement path)

ADR-003: Build on Right, Probe on Left

Context: Which side should be build vs. probe?

Decision: Always build on right, probe on left (convention)

Rationale:

  • Matches SQL semantics (right side is typically inner relation)
  • Query optimizer can swap inputs based on cardinality estimates
  • Operator implementation is agnostic (works either way)

Trade-offs:

  • Relies on optimizer to choose smaller relation for right side
  • Without optimizer hints, may build on larger side (suboptimal)

Status: Accepted (optimizer enhancement tracked separately)

ADR-004: Support All Join Types

Context: Which join types to support?

Decision: INNER, LEFT, RIGHT, FULL (all standard SQL join types)

Rationale:

  • Complete SQL compatibility
  • LEFT join is critical for outer join patterns
  • RIGHT/FULL are easily derived from LEFT
  • Cross join is degenerate case (empty join key)

Trade-offs:

  • Additional complexity for outer join tracking
  • ~100 LOC for unmatched tuple emission

Status: Accepted


11. Implementation Roadmap

Phase 1: Core Hash Join (Days 1-2)

Tasks:

  1. Create hash_join.rs module
  2. Implement HashJoinOperator struct
  3. Implement build phase
  4. Implement probe phase (INNER JOIN only)
  5. Unit tests for basic functionality

Deliverable: INNER JOIN working

Phase 2: Outer Joins (Day 3)

Tasks:

  1. Add LEFT JOIN support
  2. Add RIGHT JOIN support (transform to LEFT)
  3. Add FULL JOIN support
  4. Unit tests for each join type

Deliverable: All join types working

Phase 3: Integration & Testing (Day 4)

Tasks:

  1. Integrate with executor (plan_to_operator)
  2. Add equi-join detection
  3. Integration tests
  4. Performance benchmarks

Deliverable: Hash join production-ready

Phase 4: Documentation & Polish (Day 5)

Tasks:

  1. Rustdoc comments
  2. User-facing error messages
  3. Memory limit configuration
  4. Code review

Deliverable: Release-ready

Total Estimate: 5 days (40 hours)


12. Conclusion

This architecture provides a complete, textbook-correct implementation of hash join for HeliosDB Lite. The design:

  1. Follows Standard Algorithms: Silberschatz Ch. 12.5.3, Ramakrishnan Ch. 14
  2. Integrates with Volcano Model: Implements PhysicalOperator trait
  3. Supports All Join Types: INNER, LEFT, RIGHT, FULL
  4. Manages Memory Explicitly: Configurable limits, clear error messages
  5. Provides Future Optimization Path: Hybrid hash join, Bloom filters, SIMD

Key Metrics:

  • Time Complexity: O(N + M) (optimal for equi-joins)
  • Space Complexity: O(N) where N is build side
  • Expected Performance: 10K-50K joins/sec
  • Memory Limit: 100 MB default (configurable)

Next Steps:

  1. Coder Agent: Implement hash_join.rs following this spec
  2. Tester Agent: Develop comprehensive test suite
  3. Optimizer Agent: Add cost model for join operator selection
  4. Documentation Agent: User guide for JOIN operations


Document Status: APPROVED FOR IMPLEMENTATION
Review Date: November 13, 2025
Reviewed By: System Architecture Designer
Next Review: Post-Implementation (after Phase 4)