Disaster Recovery: The Chain of Recovery
In z0, disaster recovery is not just about restoring one record; it’s about reconstructing the entire platform state from immutable facts. This process follows a deterministic Chain of Recovery.
Architecture Overview
All facts are persisted to an R2 Master Log. When a Durable Object (DO) is lost or corrupted, it can rehydrate its state by replaying its history.
The recovery process is orchestrated by the System Ledger, which acts as the index for the entire platform.
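To make "replaying its history" concrete, here is a minimal sketch of folding an ordered list of facts into a state object. The `Fact` shape, its `seq` field, and the `replayFacts` name are assumptions made for illustration; the actual z0 fact schema is not shown here.

```typescript
// Hypothetical fact shape; the real z0 schema may differ.
interface Fact {
  seq: number;              // assumed position in the entity's history
  type: "set" | "delete";   // assumed fact kinds, for illustration only
  key: string;
  value?: unknown;
}

type State = Record<string, unknown>;

// Fold the facts, in sequence order, into a fresh state object.
function replayFacts(facts: Fact[]): State {
  // Sort a copy so the caller's array is left untouched.
  const ordered = [...facts].sort((a, b) => a.seq - b.seq);
  const state: State = {};
  for (const fact of ordered) {
    if (fact.type === "set") {
      state[fact.key] = fact.value;
    } else {
      delete state[fact.key];
    }
  }
  return state;
}
```

Because the function sorts by `seq` before applying anything, the order in which batch files happen to be read does not affect the result in this sketch.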
The Recovery Chain

    graph TD
      R2[(R2 Master Log)] --> SL[System Ledger]
      SL -- "1. Recover Tenants" --> T[Tenant Registry]
      SL -- "2. Recover Entities" --> E[Entity Registry]
      E -- "3. Trigger Hydration" --> DO1[Entity DO 1]
      E -- "4. Trigger Hydration" --> DO2[Entity DO 2]
      DO1 -- "Replay Facts" --> S1[Restored State 1]
      DO2 -- "Replay Facts" --> S2[Restored State 2]

Step 1: Recover the Root (System Ledger)
The SystemLedger is the first DO to be recovered. Since its identity is always system, we can bootstrap it manually or via the CLI.

    z0 hydrate system

Once hydrated, the System Ledger contains a complete list of:
- All tenants created on the platform.
- All Entity IDs and their corresponding Ledger Types.
Step 2: Discovery & Orchestration
With the System Ledger restored, we can query it to discover which entities need recovery.

    const system = new SystemLedger(ctx, env);
    const entities = system.getEntities(); // Returns [{entity_id, entity_type, tenant_id}, ...]

The platform (or a recovery script) then iterates through this list to trigger the hydration of each individual DO.
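A recovery script that walks the entity list might look roughly like the sketch below. It assumes the record shape returned by `getEntities()` and takes a caller-supplied `hydrate` callback, since how a single DO is actually asked to hydrate is not specified here. `toBatches`, `hydrateAll`, and the `batchSize` parameter are hypothetical names.

```typescript
// Record shape as returned by getEntities().
interface EntityRecord {
  entity_id: string;
  entity_type: string;
  tenant_id: string;
}

// Hypothetical callback that triggers hydration of one entity DO.
type HydrateFn = (entity: EntityRecord) => Promise<void>;

// Split a list into fixed-size batches so only `size` hydrations are in flight at once.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Trigger hydration batch by batch. Failures are collected rather than
// aborting the run, so one broken entity does not block the others.
async function hydrateAll(
  entities: EntityRecord[],
  hydrate: HydrateFn,
  batchSize = 10,
): Promise<string[]> {
  const failed: string[] = [];
  for (const batch of toBatches(entities, batchSize)) {
    await Promise.all(
      batch.map(async (entity) => {
        try {
          await hydrate(entity);
        } catch {
          failed.push(entity.entity_id);
        }
      }),
    );
  }
  return failed;
}
```

The returned list of failed entity IDs can be fed back into `hydrateAll` for a retry pass. Batching is one simple way to bound concurrency; it is a design choice of this sketch, not something the platform prescribes.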
Step 3: Individual DO Hydration
Each Entity DO performs a Chunked Hydration:
- Index: It lists all batch files in R2 under its ID: master_log/${id}/.
- Ingestion: It processes these chunks via Durable Object Alarms (to avoid execution limits).
- Direct SQL: Facts are inserted directly into the DO’s local SQLite storage.
- Reconstruction: Once all facts are ingested, the DO runs reconstructState() to rebuild its cached views and configurations.
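The four steps above can be sketched as a small state machine that does a bounded amount of work per alarm. This is an in-memory model, not the actual z0 implementation: `HydrationProgress`, `HydrationIO`, `hydrationTick`, and `chunksPerTick` are hypothetical names, and the R2 read and SQLite insert are replaced by plain callbacks so the logic can run anywhere.

```typescript
// Hypothetical progress record. In a real DO this would need to be persisted
// so that it survives between alarm invocations.
interface HydrationProgress {
  pendingKeys: string[];   // batch files still to ingest, e.g. keys under master_log/${id}/
  ingestedFacts: number;
  done: boolean;
}

// Hypothetical hooks standing in for the R2 read and the local SQLite insert.
interface HydrationIO {
  readChunk(key: string): string[];     // facts contained in one batch file
  insertFacts(facts: string[]): void;   // write facts to local storage
  reconstructState(): void;             // rebuild cached views once all facts are in
}

// Process at most `chunksPerTick` batch files.
// Returns true when another alarm should be scheduled.
function hydrationTick(
  progress: HydrationProgress,
  io: HydrationIO,
  chunksPerTick = 2,
): boolean {
  if (progress.done) return false;

  // Take the next few files off the front of the queue.
  const keys = progress.pendingKeys.splice(0, chunksPerTick);
  for (const key of keys) {
    const facts = io.readChunk(key);
    io.insertFacts(facts);
    progress.ingestedFacts += facts.length;
  }

  if (progress.pendingKeys.length === 0) {
    // Everything is ingested: rebuild derived state exactly once.
    io.reconstructState();
    progress.done = true;
    return false;
  }
  return true;
}
```

In a Durable Object, the `alarm()` handler would presumably call something like `hydrationTick` and, while it returns true, re-arm itself with `ctx.storage.setAlarm(...)`. Keeping the per-alarm work small is what lets recovery stay within execution limits.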
Benefits
- No Data Loss: Every state change is a fact in R2.
- Highly Parallel: Individual DOs hydrate independently.
- Throttled: Alarms ensure recovery doesn’t overwhelm the system or exceed Cloudflare limits.
- Deterministic: Replaying the same facts in the same order always produces the same final state.