Skip to content

Branch Storage Architecture - Part 1 of 3

Architecture Design

Navigation: Index | Part 1 | Part 2 →


HeliosDB Lite - Branch Storage Architecture

Copy-on-Write Branch Storage Backend Design

Version: 2.2 Created: November 18, 2025 Status: Architecture Design Document Target: HeliosDB Lite v0.2.0 Scope: Database branching storage backend with efficient copy-on-write


Executive Summary

This document defines the storage architecture for HeliosDB Lite's database branching feature, implementing efficient copy-on-write (CoW) semantics with minimal storage overhead. The design leverages RocksDB's MVCC capabilities and introduces a hierarchical branch metadata system that enables instant branch creation, efficient storage sharing, and conflict-aware merging.

Key Design Principles: 1. Zero-Copy Branch Creation - Branches are created instantly via metadata only 2. Shared Storage - Unchanged data is shared between branches (CoW) 3. MVCC Integration - Branches integrate seamlessly with existing snapshot isolation 4. Efficient Garbage Collection - Automatic cleanup of unreferenced versions 5. Conflict Detection - Three-way merge with configurable resolution strategies

Performance Targets: - Branch Creation: <10ms regardless of database size - Branch Read Overhead: <5% vs. non-branched reads - Storage Overhead: <2% per branch for metadata - Merge Performance: Linear in changed data (not total data)


Table of Contents

  1. Architecture Overview
  2. Branch Metadata Schema
  3. Copy-on-Write Mechanism
  4. Key Space Design
  5. Data Structures
  6. API Design
  7. Branch Isolation
  8. Merge Strategies
  9. Garbage Collection
  10. MVCC Integration
  11. Performance Considerations
  12. Testing Strategy

1. Architecture Overview

1.1 High-Level Design

┌─────────────────────────────────────────────────────────────────┐
│                     Branch Storage Architecture                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                  Branch Manager                           │   │
│  │  • Branch metadata CRUD                                   │   │
│  │  • Branch lineage tracking                                │   │
│  │  • Merge coordination                                     │   │
│  └──────────────────────┬───────────────────────────────────┘   │
│                         │                                         │
│  ┌──────────────────────▼───────────────────────────────────┐   │
│  │              Copy-on-Write Layer                          │   │
│  │  • Versioned key resolution                               │   │
│  │  • Branch-aware reads/writes                              │   │
│  │  • Version visibility rules                               │   │
│  └──────────────────────┬───────────────────────────────────┘   │
│                         │                                         │
│  ┌──────────────────────▼───────────────────────────────────┐   │
│  │                MVCC Storage Layer                         │   │
│  │  • Snapshot isolation (existing)                          │   │
│  │  • Multi-version tuples                                   │   │
│  │  • Timestamp-based visibility                             │   │
│  └──────────────────────┬───────────────────────────────────┘   │
│                         │                                         │
│  ┌──────────────────────▼───────────────────────────────────┐   │
│  │                  RocksDB Backend                          │   │
│  │  • Key-value storage                                      │   │
│  │  • SST files & memtables                                  │   │
│  │  • Compaction & compression                               │   │
│  └───────────────────────────────────────────────────────────┘   │
│                                                                   │
└─────────────────────────────────────────────────────────────────┘

1.2 Branch Hierarchy Model

Branches form a directed acyclic graph (DAG) where: - Root Branch (main): The default branch, created automatically - Child Branches: Created from a parent branch at a specific point in time - Orphan Branches: Branches whose parent has been dropped (still valid)

Example Branch DAG:

    main (root)
     ├─────┐
     │     │
    dev   staging@T100
     │     │
     │     └─── feature-x@T150
     └─── hotfix@T200

1.3 Copy-on-Write Principles

  1. Branch Creation: Copy metadata only, no data duplication
  2. First Write: Data is copied on first modification in a branch
  3. Read Priority: Reads check branch-specific versions first, then parent
  4. Storage Sharing: Unchanged data remains shared across branches
  5. Isolation: Each branch maintains independent version history

2. Branch Metadata Schema

2.1 Branch Metadata Structure

Each branch has comprehensive metadata stored in RocksDB:

Key:   branch:meta:<branch_name>
Value: BranchMetadata (bincode serialized)

BranchMetadata {
    name: String,
    branch_id: BranchId,                  // Unique identifier
    parent_id: Option<BranchId>,          // None for root branch
    created_at: Timestamp,                // Creation timestamp
    created_from_snapshot: SnapshotId,    // Parent snapshot at branch point
    state: BranchState,                   // Active, Dropped, Merged
    merge_base: Option<SnapshotId>,       // Last merge point (for merge tracking)
    metadata: BranchMetadataExtras,       // Options, stats, etc.
}

2.2 Branch State Transitions

     ┌─────────┐
     │  Create │
     └────┬────┘
    ┌──────────┐      Merge      ┌────────┐
    │  Active  ├─────────────────▶│ Merged │
    └─────┬────┘                  └────────┘
          │ Drop
     ┌─────────┐
     │ Dropped │
     └─────────┘

States:
- Active:  Branch is active, can be read/written
- Merged:  Branch has been merged into target (read-only)
- Dropped: Branch has been dropped (GC eligible)

2.3 Branch Registry

Global branch registry for efficient lookup:

Key:   branch:registry
Value: BranchRegistry (bincode serialized)

BranchRegistry {
    branches: HashMap<BranchId, BranchName>,
    main_branch: BranchId,
    next_branch_id: u64,
}

2.4 Branch Lineage Index

Parent-child relationship index for GC and merge:

Key:   branch:children:<parent_branch_id>
Value: Vec<BranchId> (bincode serialized)

Enables:
- Finding all children of a branch
- Preventing parent drop while children exist
- Efficient lineage traversal

3. Copy-on-Write Mechanism

3.1 Key Versioning Strategy

Each data key is versioned with branch and timestamp information:

Physical Key Format:
┌──────────┬─────────┬────────────┬───────────┬───────────┐
│  Prefix  │ BranchID│  UserKey   │ Timestamp │  Version  │
│  (data:) │  (u64)  │ (variable) │   (u64)   │   (u32)   │
└──────────┴─────────┴────────────┴───────────┴───────────┘

Example:
data:0000000001:users:123:0000000100:00000001
     │         │         │         │
     │         │         │         └─ Version counter
     │         │         └─────────── Timestamp
     │         └───────────────────── User key (table:row_id)
     └─────────────────────────────── Branch ID (1 = main)

Benefits:
- Efficient range scans per branch
- Natural clustering of branch data
- Simple visibility checks

3.2 Write Path - Copy-on-Write

fn write_to_branch(
    branch_id: BranchId,
    user_key: &str,
    value: &[u8],
    timestamp: Timestamp,
) -> Result<()> {
    // 1. Check if key exists in current branch
    let existing = get_latest_version_in_branch(branch_id, user_key)?;

    if existing.is_some() {
        // Key already modified in this branch - overwrite
        let physical_key = encode_key(branch_id, user_key, timestamp);
        db.put(physical_key, value)?;
    } else {
        // 2. Check parent branches for existing version
        let parent_version = get_from_parent_chain(branch_id, user_key)?;

        // 3. Copy-on-write: Create new version in current branch
        let physical_key = encode_key(branch_id, user_key, timestamp);
        db.put(physical_key, value)?;

        // Parent version remains unchanged (shared storage)
    }

    Ok(())
}

3.3 Read Path - Branch Chain Resolution

fn read_from_branch(
    branch_id: BranchId,
    user_key: &str,
    snapshot: SnapshotId,
) -> Result<Option<Value>> {
    // 1. Try current branch first
    if let Some(value) = get_visible_version_in_branch(
        branch_id,
        user_key,
        snapshot
    )? {
        return Ok(Some(value));
    }

    // 2. Walk up parent chain
    let mut current_branch = branch_id;
    while let Some(parent) = get_parent_branch(current_branch)? {
        // Get parent snapshot at branch point
        let branch_meta = get_branch_metadata(current_branch)?;
        let parent_snapshot = branch_meta.created_from_snapshot;

        // Read from parent at its snapshot
        if let Some(value) = get_visible_version_in_branch(
            parent.branch_id,
            user_key,
            parent_snapshot
        )? {
            return Ok(Some(value));
        }

        current_branch = parent.branch_id;
    }

    // 3. Not found in any branch
    Ok(None)
}

3.4 Storage Deduplication

Unchanged data is automatically shared:

Before Branch:
main: key1 -> v1, key2 -> v2, key3 -> v3

After CREATE BRANCH dev FROM main:
main:  key1 -> v1, key2 -> v2, key3 -> v3
dev:   (empty - all reads delegate to main)

After UPDATE key2 in dev:
main:  key1 -> v1, key2 -> v2, key3 -> v3
dev:   key2 -> v2'  (only modified data stored)
       key1, key3 still read from main

Storage Overhead: 1 value + metadata (~100 bytes)

4. Key Space Design

4.1 Key Prefixes

All keys use prefixed namespaces for organization:

// Branch metadata
const BRANCH_META_PREFIX: &[u8] = b"branch:meta:";
const BRANCH_REGISTRY: &[u8] = b"branch:registry";
const BRANCH_CHILDREN_PREFIX: &[u8] = b"branch:children:";

// Branch data (versioned)
const BRANCH_DATA_PREFIX: &[u8] = b"data:";

// Branch catalog (table schemas per branch)
const BRANCH_CATALOG_PREFIX: &[u8] = b"branch:catalog:";

// Branch counters (row IDs per branch)
const BRANCH_COUNTER_PREFIX: &[u8] = b"branch:counter:";

// GC tracking
const GC_REFS_PREFIX: &[u8] = b"gc:refs:";
const GC_PENDING_PREFIX: &[u8] = b"gc:pending:";

4.2 Key Encoding Functions

/// Encode a branch data key
fn encode_branch_data_key(
    branch_id: BranchId,
    user_key: &str,
    timestamp: Timestamp,
) -> Vec<u8> {
    let mut key = Vec::new();
    key.extend_from_slice(BRANCH_DATA_PREFIX);
    key.extend_from_slice(&branch_id.to_be_bytes());
    key.push(b':');
    key.extend_from_slice(user_key.as_bytes());
    key.push(b':');
    key.extend_from_slice(&timestamp.to_be_bytes());
    key
}

/// Decode a branch data key
fn decode_branch_data_key(key: &[u8]) -> (BranchId, String, Timestamp) {
    // Parse components from key
    // Returns (branch_id, user_key, timestamp)
}

/// Encode a branch metadata key
fn encode_branch_meta_key(branch_name: &str) -> Vec<u8> {
    let mut key = Vec::new();
    key.extend_from_slice(BRANCH_META_PREFIX);
    key.extend_from_slice(branch_name.as_bytes());
    key
}

4.3 Key Ordering Properties

RocksDB's lexicographic key ordering provides natural clustering:

Sorted Key Space:
┌─────────────────────────────────────────────────────────┐
│ branch:children:1 -> [2, 3]                             │
│ branch:children:2 -> [4]                                │
│ branch:meta:dev -> BranchMetadata { ... }              │
│ branch:meta:main -> BranchMetadata { ... }             │
│ branch:registry -> BranchRegistry { ... }              │
│ data:0000000001:table1:row1:0000000100 -> value        │ ← main branch
│ data:0000000001:table1:row2:0000000105 -> value        │
│ data:0000000002:table1:row1:0000000120 -> value'       │ ← dev branch
│ data:0000000002:table1:row3:0000000125 -> value        │
└─────────────────────────────────────────────────────────┘

Benefits:
- Sequential scans within a branch
- Efficient prefix searches
- Natural compaction grouping

5. Data Structures

5.1 Core Rust Structures

/// Branch identifier (globally unique)
pub type BranchId = u64;

/// Branch metadata
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct BranchMetadata {
    /// Branch name (user-visible)
    pub name: String,

    /// Unique branch ID
    pub branch_id: BranchId,

    /// Parent branch ID (None for root)
    pub parent_id: Option<BranchId>,

    /// Creation timestamp
    pub created_at: Timestamp,

    /// Snapshot at branch point
    pub created_from_snapshot: SnapshotId,

    /// Current state
    pub state: BranchState,

    /// Last merge base (for three-way merge)
    pub merge_base: Option<SnapshotId>,

    /// Additional metadata
    pub options: BranchOptions,

    /// Statistics
    pub stats: BranchStats,
}

/// Branch state
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum BranchState {
    /// Branch is active
    Active,

    /// Branch has been merged
    Merged {
        /// Target branch
        into_branch: BranchId,
        /// Merge timestamp
        at_timestamp: Timestamp,
    },

    /// Branch has been dropped
    Dropped {
        /// Drop timestamp
        at_timestamp: Timestamp,
    },
}

/// Branch creation/merge options
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
pub struct BranchOptions {
    /// Replication factor (for distributed mode)
    pub replication_factor: Option<usize>,

    /// Region hint (for distributed mode)
    pub region: Option<String>,

    /// Custom metadata
    pub metadata: HashMap<String, String>,
}

/// Branch statistics
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
pub struct BranchStats {
    /// Number of modified keys in this branch
    pub modified_keys: u64,

    /// Approximate storage size (bytes)
    pub storage_bytes: u64,

    /// Number of commits in this branch
    pub commit_count: u64,

    /// Last activity timestamp
    pub last_modified: Timestamp,
}

/// Branch registry
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct BranchRegistry {
    /// Map of branch ID -> branch name
    pub branches: HashMap<BranchId, String>,

    /// Main branch ID
    pub main_branch: BranchId,

    /// Next available branch ID
    pub next_branch_id: u64,
}

impl BranchRegistry {
    /// Get next branch ID (auto-increment)
    pub fn next_id(&mut self) -> BranchId {
        let id = self.next_branch_id;
        self.next_branch_id += 1;
        id
    }
}

5.2 Branch Manager Structure

/// Branch manager - coordinates all branch operations
pub struct BranchManager {
    /// Storage engine reference
    storage: Arc<StorageEngine>,

    /// Branch registry (cached)
    registry: Arc<RwLock<BranchRegistry>>,

    /// Branch metadata cache
    metadata_cache: Arc<RwLock<HashMap<BranchId, BranchMetadata>>>,
}

impl BranchManager {
    /// Create new branch manager
    pub fn new(storage: Arc<StorageEngine>) -> Result<Self> {
        let registry = Self::load_or_create_registry(&storage)?;
        Ok(Self {
            storage,
            registry: Arc::new(RwLock::new(registry)),
            metadata_cache: Arc::new(RwLock::new(HashMap::new())),
        })
    }

    /// Create a new branch
    pub fn create_branch(
        &self,
        name: &str,
        parent: Option<&str>,
        as_of: AsOfClause,
        options: BranchOptions,
    ) -> Result<BranchId>;

    /// Drop a branch
    pub fn drop_branch(&self, name: &str, if_exists: bool) -> Result<()>;

    /// Merge branches
    pub fn merge_branch(
        &self,
        source: &str,
        target: &str,
        options: MergeOptions,
    ) -> Result<MergeResult>;

    /// Get branch metadata
    pub fn get_branch(&self, name: &str) -> Result<BranchMetadata>;

    /// List all branches
    pub fn list_branches(&self) -> Result<Vec<BranchMetadata>>;

    /// Get current branch (from context)
    pub fn current_branch(&self) -> Result<BranchId>;
}

5.3 Branch Transaction Structure

/// Branch-aware transaction
pub struct BranchTransaction {
    /// Underlying MVCC transaction
    tx: Transaction,

    /// Branch ID
    branch_id: BranchId,

    /// Branch metadata (cached)
    branch_meta: BranchMetadata,

    /// Parent chain (cached for reads)
    parent_chain: Vec<(BranchId, SnapshotId)>,
}

impl BranchTransaction {
    /// Read from branch with parent fallback
    pub fn get(&self, key: &Key) -> Result<Option<Vec<u8>>>;

    /// Write to branch (copy-on-write)
    pub fn put(&mut self, key: Key, value: Vec<u8>) -> Result<()>;

    /// Delete from branch
    pub fn delete(&mut self, key: Key) -> Result<()>;

    /// Commit transaction
    pub fn commit(self) -> Result<()>;

    /// Rollback transaction
    pub fn rollback(self) -> Result<()>;
}

6. API Design

6.1 Branch Management API

```rust /// Branch creation pub fn create_branch( storage: &StorageEngine, branch_name: &str, parent: Option<&str>, as_of: AsOfClause, options: BranchOptions, ) -> Result { let branch_mgr = BranchManager::new(Arc::clone(&storage.inner))?;

// 1. Validate branch name
if branch_mgr.get_branch(branch_name).is_ok() {
    return Err(Error::branch_exists(branch_name));
}

// 2. Resolve parent branch
let parent_id = if let Some(parent_name) = parent {
    Some(branch_mgr.get_branch(parent_name)?.branch_id)
} else {
    // None = CURRENT branch from context
    Some(branch_mgr.current_branch()?)
};

// 3. Resolve AS OF snapshot
let snapshot_id = resolve_as_of_clause(&as_of, storage)?;

// 4. Allocate branch ID
let mut registry = branch_mgr.registry.write();
let branch_id = registry.next_id();

// 5. Create metadata
let metadata = BranchMetadata {
    name: branch_name.to_string(),
    branch_id,
    parent_id,
    created_at: storage.current_timestamp(),
    created_from_snapshot: snapshot_id,
    state: BranchState::Active,
    merge_base: None,
    options,
    stats: BranchStats::default(),
};

// 6. Persist metadata
let key = encode_branch_meta_key(branch_name);
let value = bincode::serialize(&metadata)?;
storage.put(&key, &value)?;

// 7. Update registry
registry.branches.insert(branch_id, branch_name.to_string());
let registry_value = bincode::serialize(&*registry)?;
storage.put(BRANCH_REGISTRY, &registry_value)?;

// 8. Update parent's children list
if let Some(parent_id) = parent_id {
    add_child_branch(storage, parent_id, branch_id)?;
}

Ok(branch_id)

}

/// Branch deletion pub fn drop_branch( storage: &StorageEngine, branch_name: &str, if_exists: bool, ) -> Result<()> { let branch_mgr = BranchManager::new(Arc::clone(&storage.inner))?;

// 1. Get branch metadata
let metadata = match branch_mgr.get_branch(branch_name) {
    Ok(meta) => meta,
    Err(_) if if_exists => return Ok(()),
    Err(e) => return Err(e),
};

// 2. Prevent dropping main branch
if metadata.branch_id == branch_mgr.registry.read().main_branch {
    return Err(Error::cannot_drop_main_branch());
}

// 3. Check for child branches
let children = get_child_branches(storage, metadata.branch_id)?;
if !children.is_empty() {
    return Err(Error::branch_has_children(
        branch_name,
        children.len(),
    ));
}

// 4. Mark as dropped (soft delete)
let mut updated_meta = metadata.clone();
updated_meta.state = BranchState::Dropped {
    at_timestamp: storage.current_timestamp(),
};

let key = encode_branch_meta_key(branch_name);
let value = bincode::serialize(&updated_meta)?;
storage.put(&key, &value)?;

// 5. Remove from registry
let mut registry = branch_mgr.registry.write();
registry.branches.remove(&metadata.branch_id);
let registry_value = bincode::serialize(&*registry)?;
storage.put(BRANCH_REGISTRY, &registry_value)?;

// 6. Schedule GC for branch data
schedule_branch_gc(storage, metadata.branch_id)?;

Ok(())

}

/// Branch merging pub fn merge_branch( storage: &StorageEngine, source_name: &str, target_name: &str, options: MergeOptions, ) -> Result { let branch_mgr = BranchManager::new(Arc::clone(&storage.inner))?;

// 1. Get source and target metadata
let source = branch_mgr.get_branch(source_name)?;
let target = branch_mgr.get_branch(target_name)?;