Skip to content

Business Continuity Plan

Overview

This Business Continuity Plan (BCP) ensures HeliosDB-Lite operations can continue during and after disruptive events, protecting business functions, stakeholders, and reputation.

Scope

This plan covers: - Development and engineering operations - Customer support services - Infrastructure and operations - Corporate functions

Business Impact Analysis

Critical Business Functions

Function RTO RPO Impact of Disruption
Production database service 5 min 1 min Customer data unavailable
Customer support 4 hours N/A Support tickets delayed
Development 24 hours N/A Release schedule impacted
Sales/Marketing 48 hours N/A Revenue pipeline impacted

Dependency Matrix

┌─────────────────────────────────────────────────────────────────┐
│                    Critical Dependencies                        │
├─────────────────────────────────────────────────────────────────┤
│  Database Service                                               │
│    ├── Cloud Infrastructure (AWS/GCP)                          │
│    ├── DNS Services                                             │
│    ├── Certificate Authority                                    │
│    └── Monitoring Systems                                       │
│                                                                 │
│  Development                                                    │
│    ├── GitHub                                                   │
│    ├── CI/CD Pipeline                                           │
│    └── Development Environments                                 │
│                                                                 │
│  Support                                                        │
│    ├── Ticketing System                                         │
│    ├── Communication Tools                                      │
│    └── Documentation                                            │
└─────────────────────────────────────────────────────────────────┘

Continuity Strategies

Strategy 1: Geographic Redundancy

  • Primary: US-East region
  • Secondary: US-West region
  • Tertiary: EU region (for EU customers)

Strategy 2: Remote Work Capability

All team members equipped for full remote work: - Laptop with development environment - VPN access to all systems - Communication tools (Slack, Zoom) - Documentation access

Strategy 3: Supplier Diversification

Service Primary Backup
Cloud hosting AWS GCP
DNS Route53 Cloudflare
Email Google Workspace Backup SMTP
Communication Slack Discord

Activation Procedures

Activation Criteria

Event Activation Level Authority
Single component failure None Automated
Service degradation Level 1 Operations
Partial outage Level 2 VP Engineering
Full outage Level 3 Executive team
Regional disaster Level 4 CEO

Activation Process

Event Detected
Assess Impact ──▶ Minor? ──▶ Normal Incident Response
     ▼ Major
Activate BCP Team
Determine Level
Execute Procedures
Monitor & Adjust
Recovery & Lessons Learned

Response Procedures

Level 1: Service Degradation

Duration: Up to 4 hours

  1. Activate on-call team
  2. Implement workarounds
  3. Communicate with affected customers
  4. Restore normal operations
  5. Document incident

Level 2: Partial Outage

Duration: 4-24 hours

  1. Activate BCP team
  2. Failover to redundant systems
  3. Customer communication (status page)
  4. Coordinate with affected teams
  5. Regular status updates
  6. Recovery planning

Level 3: Full Outage

Duration: 24+ hours

  1. Executive notification
  2. Full DR activation
  3. Customer communication (direct)
  4. Media/PR coordination
  5. Extended team mobilization
  6. Daily status calls

Level 4: Regional Disaster

Duration: Extended

  1. All-hands notification
  2. Employee safety verification
  3. Alternate site activation
  4. Business function prioritization
  5. Extended operation mode
  6. Recovery planning

Communication Plan

Internal Communication

Audience Channel Frequency Owner
BCP Team Slack #incident Real-time IC
Engineering Email + Slack Hourly VP Eng
All Staff Email Daily HR
Executives Phone/Slack As needed CEO

External Communication

Audience Channel Frequency Owner
Affected customers Email Immediate Support
All customers Status page Real-time Ops
Partners Email Daily BD
Media Press release As needed PR

Communication Templates

Customer Notification:

Subject: [Status Update] HeliosDB Service

Current Status: [Investigating/Identified/Resolved]

We are currently experiencing [brief description].

Impact: [What customers may experience]

Actions: [What we are doing]

ETA: [Expected resolution time]

Updates: status.heliosdb.io

We apologize for any inconvenience.

Team Responsibilities

BCP Team Structure

Role Responsibilities Primary Backup
Incident Commander Overall coordination VP Ops Director Eng
Technical Lead Technical decisions CTO Sr. Engineer
Communications Internal/external comms VP Marketing PR Manager
Customer Success Customer communication VP CS CS Manager
HR/Safety Employee welfare HR Director HR Manager

Contact Information

Maintained in secure, offline document available to all BCP team members.

Recovery Procedures

Service Recovery

  1. Assessment: Evaluate damage and requirements
  2. Prioritization: Critical functions first
  3. Restoration: Systematic service restoration
  4. Verification: Testing and validation
  5. Return to Normal: Full operations resume

Data Recovery

See: DISASTER_RECOVERY.md

Facility Recovery

  1. Assess facility status
  2. Activate alternate site if needed
  3. Coordinate equipment/supplies
  4. Resume operations
  5. Plan permanent recovery

Testing & Maintenance

Testing Schedule

Test Type Frequency Participants Duration
Tabletop exercise Quarterly BCP team 2 hours
Communication test Monthly All staff 30 min
Technical DR drill Monthly Engineering 4 hours
Full simulation Annually All teams 1 day

Plan Maintenance

Activity Frequency Owner
Contact list update Monthly HR
Procedure review Quarterly Operations
Full plan review Annually Executive team
Post-incident update After each incident IC

Training

  • Annual BCP awareness training for all staff
  • Quarterly deep-dive for BCP team
  • New hire orientation includes BCP overview

Appendices

Appendix A: Emergency Contacts

[Maintained separately in secure document]

Appendix B: Vendor Contacts

Vendor Service Support Contact Account ID
AWS Infrastructure aws.amazon.com/support [ID]
Cloudflare CDN/DNS cloudflare.com/support [ID]
GitHub Source control github.com/support [ID]
PagerDuty Alerting pagerduty.com/support [ID]

Appendix C: Checklist

Initial Response: - [ ] Incident confirmed - [ ] BCP team notified - [ ] Impact assessed - [ ] Level determined - [ ] Procedures initiated

During Incident: - [ ] Regular status updates - [ ] Customer communication - [ ] Resource coordination - [ ] Documentation maintained

Recovery: - [ ] Services restored - [ ] Verification complete - [ ] Stakeholders notified - [ ] Normal operations resumed

Post-Incident: - [ ] Lessons learned meeting - [ ] Plan updates identified - [ ] Documentation updated - [ ] Training needs assessed