
Disaster Recovery: The Chain of Recovery

In z0, disaster recovery is not just about restoring one record; it’s about reconstructing the entire platform state from immutable facts. This process follows a deterministic Chain of Recovery.

All facts are persisted to an R2 Master Log. When a Durable Object (DO) is lost or corrupted, it can rehydrate its state by replaying its own facts from that log.
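As a rough illustration of what replaying history means, the sketch below folds a list of facts into a state object. The Fact shape (id, timestamp, key, value) and the helper names applyFact, compareFacts, and replay are assumptions made for this example; the actual z0 fact schema is not shown on this page.

```typescript
// Hypothetical fact shape for illustration; the real z0 fact schema is not shown here.
interface Fact {
  id: string;
  timestamp: number;
  key: string;
  value: string | null; // null means "remove this key"
}

type State = Record<string, string>;

// Pure function: it never mutates its input, so a replay has no hidden side effects.
function applyFact(state: State, fact: Fact): State {
  const next: State = { ...state };
  if (fact.value === null) {
    delete next[fact.key];
  } else {
    next[fact.key] = fact.value;
  }
  return next;
}

// Total order over facts: timestamp first, then id as a tie-breaker.
function compareFacts(a: Fact, b: Fact): number {
  if (a.timestamp !== b.timestamp) return a.timestamp - b.timestamp;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// Rebuild state from scratch by applying every fact in a fixed order.
function replay(facts: readonly Fact[]): State {
  let state: State = {};
  for (const fact of [...facts].sort(compareFacts)) {
    state = applyFact(state, fact);
  }
  return state;
}
```

Because the facts are sorted into a total order before being applied, the result does not depend on the order in which they were read back from storage.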

The recovery process is orchestrated by the System Ledger, which acts as the index for the entire platform.

graph TD
R2[(R2 Master Log)] --> SL[System Ledger]
SL -- "1. Recover Tenants" --> T[Tenant Registry]
SL -- "2. Recover Entities" --> E[Entity Registry]
E -- "3. Trigger Hydration" --> DO1[Entity DO 1]
E -- "4. Trigger Hydration" --> DO2[Entity DO 2]
DO1 -- "Replay Facts" --> S1[Restored State 1]
DO2 -- "Replay Facts" --> S2[Restored State 2]

The System Ledger is the first DO to be recovered. Because its identity is always system, it can be bootstrapped manually or via the CLI:

z0 hydrate system

Once hydrated, the System Ledger contains a complete list of:

  • All tenants created on the platform.
  • All Entity IDs and their corresponding Ledger Types.

With the System Ledger restored, we can query it to discover which entities need recovery.

const system = new SystemLedger(ctx, env);
const entities = system.getEntities(); // Returns [{entity_id, entity_type, tenant_id}, ...]

The platform (or a recovery script) then iterates through this list to trigger the hydration of each individual DO.
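A recovery script along these lines might look like the sketch below. It assumes the entity list has the shape shown in the comment above ({entity_id, entity_type, tenant_id}); the hydrateOne callback, the concurrency parameter, and the report structure are hypothetical and stand in for however the platform actually triggers a DO's hydration.

```typescript
// Shape taken from the getEntities() comment above.
interface EntityRef {
  entity_id: string;
  entity_type: string;
  tenant_id: string;
}

// Hypothetical callback: whatever call actually triggers hydration of one DO.
type HydrateFn = (entity: EntityRef) => Promise<void>;

interface HydrationReport {
  succeeded: string[];
  failed: { entity_id: string; error: string }[];
}

// Trigger hydration for every entity, with at most `concurrency` calls in flight.
async function hydrateAll(
  entities: readonly EntityRef[],
  hydrateOne: HydrateFn,
  concurrency = 4,
): Promise<HydrationReport> {
  const report: HydrationReport = { succeeded: [], failed: [] };
  let next = 0; // index of the next entity to claim, shared by all workers

  async function worker(): Promise<void> {
    while (next < entities.length) {
      const entity = entities[next++];
      try {
        await hydrateOne(entity);
        report.succeeded.push(entity.entity_id);
      } catch (err) {
        // One failing entity should not stop the rest of the recovery.
        const message = err instanceof Error ? err.message : String(err);
        report.failed.push({ entity_id: entity.entity_id, error: message });
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, entities.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return report;
}
```

Collecting failures instead of throwing lets the script finish the pass and retry only the entities listed in report.failed.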

Each Entity DO performs a Chunked Hydration:

  1. Index: It lists all batch files stored in R2 under its own prefix, master_log/${id}/.
  2. Ingestion: It processes these chunks via Durable Object Alarms (to avoid execution limits).
  3. Direct SQL: Facts are inserted directly into the DO’s local SQLite storage.
  4. Reconstruction: Once all facts are ingested, the DO runs reconstructState() to rebuild its cached views and configurations.
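The four steps above can be sketched as a small state machine. This is an in-memory simulation, not the z0 implementation: a Map stands in for the R2 bucket, another Map stands in for the DO's SQLite table, and each call to onAlarm() plays the role of one Durable Object alarm firing. The class name ChunkedHydrator, the batch file names, and the assumption that file names sort in ingestion order are all hypothetical; only the master_log/${id}/ prefix and the reconstructState() name come from the text.

```typescript
// Hypothetical fact shape; only the id matters for this sketch.
interface Fact {
  id: string;
  payload: string;
}

// In-memory stand-in for the R2 bucket: object key -> facts in that batch file.
type FakeBucket = Map<string, Fact[]>;

class ChunkedHydrator {
  private bucket: FakeBucket;
  private entityId: string;
  private pendingKeys: string[] = [];
  private cursor = 0;
  // Stand-in for the DO's local SQLite table, keyed by fact id.
  readonly facts = new Map<string, Fact>();
  reconstructed = false;

  constructor(bucket: FakeBucket, entityId: string) {
    this.bucket = bucket;
    this.entityId = entityId;
  }

  // Step 1 (Index): list batch files under this entity's prefix.
  start(): void {
    const prefix = `master_log/${this.entityId}/`;
    // Assumes batch file names sort lexicographically in ingestion order.
    this.pendingKeys = Array.from(this.bucket.keys())
      .filter((key) => key.startsWith(prefix))
      .sort();
    this.cursor = 0;
    this.reconstructed = false;
  }

  // Steps 2-4: each alarm ingests one chunk.
  // Returns true when another alarm should be scheduled.
  onAlarm(): boolean {
    if (this.cursor < this.pendingKeys.length) {
      const key = this.pendingKeys[this.cursor];
      for (const fact of this.bucket.get(key) ?? []) {
        // Keyed insert: re-ingesting a chunk after a retry is harmless.
        this.facts.set(fact.id, fact);
      }
      this.cursor += 1;
    }
    if (this.cursor < this.pendingKeys.length) return true;
    this.reconstructState();
    return false;
  }

  // Step 4 (Reconstruction): placeholder for rebuilding cached views.
  private reconstructState(): void {
    this.reconstructed = true;
  }
}
```

Processing one chunk per alarm keeps each invocation short, and keying inserts by fact id means a chunk that is processed twice does not duplicate facts.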
This design gives recovery the following properties:

  • No Data Loss: Every state change is a fact in R2.
  • Highly Parallel: Individual DOs hydrate independently.
  • Throttled: Alarms ensure recovery doesn’t overwhelm the system or exceed Cloudflare limits.
  • Deterministic: Replaying the same facts in the same order always produces the same final state.