D1 (Distributed SQLite)
How z0 uses Cloudflare D1 for queryable projections and cross-entity reporting.
Prerequisites: PRINCIPLES.md, PRIMITIVES.md, durable-objects.md
Overview
D1 is Cloudflare’s distributed SQLite database. In z0, D1 serves as a queryable projection of data whose source of truth lives in Durable Objects.
| Role | Durable Objects | D1 |
|---|---|---|
| Source of truth | Yes | No |
| Write path | Direct | Via replication |
| Query pattern | Single-entity by ID | Cross-entity, aggregations |
| Consistency | Strong | Eventually consistent |
| Use case | Ledger operations, cached state | Reporting, search, dashboards |
Key Insight: Durable Objects answer “what is the current state of entity X?” D1 answers “which entities match these criteria?” and “what happened across all entities?”
Architectural Role
Source of Truth: Durable Objects
Per Principle 7 (Derived State Is Disposable), D1 is not authoritative. The ledger in Durable Objects is the source of truth:
```
Write Path:
  API → Worker → Durable Object (ledger) → Queue → D1 (projection)
                        ↑
                  Source of truth

Read Path (single entity):
  API → Worker → Durable Object → Response

Read Path (cross-entity):
  API → Worker → D1 → Response
```

When D1 Data Is Lost
D1 data can be rebuilt from Durable Object ledgers. This is inconvenient but not catastrophic:
- Replay Facts from DO ledgers to D1
- Queries return stale data during rebuild
- No business data is lost (it lives in DOs)
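The rebuild can be sketched as a cursor-driven replay loop. This is an illustrative sketch, not z0's actual API: the `Fact` shape, `listFacts` pager, and `insertBatch` writer are hypothetical names standing in for the DO ledger reader and the D1 batch inserter.

```typescript
// Hypothetical Fact shape; z0's real Facts carry more fields.
interface Fact {
  id: string;
  type: string;
  timestamp: number;
}

// Replays facts from a DO ledger in order. Because the D1 side uses
// INSERT OR IGNORE keyed on fact id, re-running the replay is safe.
async function replayLedger(
  listFacts: (cursor?: string) => Promise<{ facts: Fact[]; cursor?: string }>,
  insertBatch: (facts: Fact[]) => Promise<void>,
): Promise<number> {
  let cursor: string | undefined;
  let total = 0;
  do {
    const page = await listFacts(cursor);
    if (page.facts.length > 0) {
      await insertBatch(page.facts);
    }
    total += page.facts.length;
    cursor = page.cursor; // undefined cursor signals the last page
  } while (cursor !== undefined);
  return total;
}
```

Because every page write is idempotent, a replay that crashes halfway can simply be restarted from the beginning.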
Replication Pattern
Facts flow from Durable Objects to D1 asynchronously:

```
DO Ledger Write
      │
      ▼
Queue (buffered)
      │
      ▼
D1 Batch Insert
      │
      ▼
Available for Query
```

Replication Guarantees
| Property | Guarantee |
|---|---|
| Ordering | Per-entity FIFO (facts from same DO arrive in order) |
| Delivery | At-least-once (idempotent writes required) |
| Latency | Typically < 1s, target < 10s |
| Durability | Fact is durable in DO before queue acknowledgment |
Idempotent Writes
D1 writes use INSERT OR IGNORE with fact_id as the primary key:

```sql
INSERT OR IGNORE INTO facts (id, type, timestamp, ...)
VALUES (?, ?, ?, ...);
```

Duplicate deliveries are safely ignored.
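The effect is easy to see in miniature. This in-memory sketch (not D1 code) models INSERT OR IGNORE as a keyed map: the first delivery inserts, every redelivery is a no-op, so at-least-once delivery degrades gracefully to exactly-once state.

```typescript
// Models INSERT OR IGNORE semantics: insertion keyed on the fact id.
function insertOrIgnore(
  table: Map<string, unknown>,
  id: string,
  row: unknown,
): boolean {
  if (table.has(id)) return false; // duplicate delivery: ignored
  table.set(id, row);
  return true; // first delivery: inserted
}
```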
Handling Replication Lag
D1 replicas are eventually consistent. For recent data:
| Need | Pattern |
|---|---|
| User’s own recent actions | Query DO directly |
| Dashboard showing “last 5 minutes” | Show staleness indicator |
| Financial reports | Wait for replication or query DO |
| Cross-entity search | Accept eventual consistency |
Pattern: Freshness check
```ts
// Check if D1 has caught up to a known Fact
async function ensureFresh(factId: string): Promise<boolean> {
  const inD1 = await d1
    .prepare('SELECT 1 FROM facts WHERE id = ?')
    .bind(factId)
    .first();
  return inD1 !== null;
}
```

Pattern: Staleness indicator
```ts
// Surface data freshness to users
interface QueryResult<T> {
  data: T;
  freshness: {
    source: 'D1' | 'DO';
    maxAge: number;  // ms since last known replication
    stale: boolean;  // true if maxAge > threshold
  };
}
```

When to fall back to DO queries:
- User just performed an action and expects to see it immediately
- Financial calculations where correctness outweighs latency
- Audit scenarios requiring absolute consistency
When to accept D1 staleness:
- Cross-entity search results (user expects eventual consistency)
- Historical reports (data is old anyway)
- Dashboards with explicit “updated X seconds ago” indicators
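The two lists above amount to a routing decision. A minimal sketch of that router, with an illustrative `ReadIntent` enumeration that is not part of z0's API:

```typescript
// Hypothetical read intents, mirroring the fallback lists above.
type ReadIntent =
  | 'own-recent-action'
  | 'financial'
  | 'audit'
  | 'search'
  | 'historical-report'
  | 'dashboard';

// Routes strong-consistency intents to the DO, everything else to D1.
function readSource(intent: ReadIntent): 'DO' | 'D1' {
  switch (intent) {
    case 'own-recent-action': // user expects read-your-writes
    case 'financial':         // correctness outweighs latency
    case 'audit':             // absolute consistency required
      return 'DO';
    default:                  // search, reports, dashboards tolerate staleness
      return 'D1';
  }
}
```

Keeping the routing in one function makes the consistency policy auditable instead of scattered across handlers.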
Read Replicas and Consistency
Global Read Replicas
D1 automatically replicates data to edge locations. Reads are served from the nearest replica.

```
Primary (Write) ──► Replica (US-East)
      │
      ├───────────► Replica (EU-West)
      │
      └───────────► Replica (Asia-Pacific)
```

Consistency Model
| Operation | Consistency |
|---|---|
| Write | Strongly consistent (primary) |
| Read (default) | Eventually consistent (nearest replica) |
| Read (with flag) | Strongly consistent (primary, higher latency) |
When to Use Strong Reads
```ts
// Default: eventually consistent, fast
const results = await db.prepare("SELECT ...").all();

// Strong consistency: use when correctness > latency
const strongResults = await db
  .prepare("SELECT ...")
  .all({ consistency: "strong" });
```

Use strong reads for:
- Financial reconciliation
- Billing calculations
- Audit queries
Use eventual consistency for:
- Dashboards
- Search
- Analytics
Schema Patterns
Facts Table
The core projection of the Fact ledger:

```sql
CREATE TABLE facts (
  id TEXT PRIMARY KEY,
  type TEXT NOT NULL,
  subtype TEXT,
  timestamp INTEGER NOT NULL,  -- Unix epoch ms

  tenant_id TEXT NOT NULL,

  -- Entity links (nullable)
  source_id TEXT,
  entity_id TEXT,
  asset_id TEXT,
  campaign_id TEXT,
  contact_id TEXT,
  deal_id TEXT,
  tool_id TEXT,
  user_id TEXT,

  -- Economics
  from_entity TEXT,
  to_entity TEXT,
  amount INTEGER,  -- Stored as cents/smallest unit
  currency TEXT,

  -- Audit
  config_id TEXT,
  config_version INTEGER,

  -- Payload (JSON)
  data TEXT,

  -- Indexing
  created_at INTEGER DEFAULT (strftime('%s', 'now') * 1000)
);

-- Query patterns
CREATE INDEX idx_facts_tenant_type ON facts(tenant_id, type);
CREATE INDEX idx_facts_tenant_timestamp ON facts(tenant_id, timestamp);
CREATE INDEX idx_facts_asset ON facts(asset_id) WHERE asset_id IS NOT NULL;
CREATE INDEX idx_facts_contact ON facts(contact_id) WHERE contact_id IS NOT NULL;
CREATE INDEX idx_facts_deal ON facts(deal_id) WHERE deal_id IS NOT NULL;
```

Entities Table
Projection of current Entity state:

```sql
CREATE TABLE entities (
  id TEXT PRIMARY KEY,
  type TEXT NOT NULL,
  subtype TEXT,

  name TEXT,
  identifier TEXT,

  tenant_id TEXT,
  parent_id TEXT,
  owner_id TEXT,

  status TEXT NOT NULL DEFAULT 'active',

  created_at INTEGER NOT NULL,
  updated_at INTEGER NOT NULL,

  metadata TEXT  -- JSON
);

CREATE INDEX idx_entities_tenant_type ON entities(tenant_id, type);
CREATE INDEX idx_entities_parent ON entities(parent_id) WHERE parent_id IS NOT NULL;
CREATE INDEX idx_entities_identifier ON entities(identifier) WHERE identifier IS NOT NULL;
```

Configs Table
Projection of Config versions (current and historical):

```sql
CREATE TABLE configs (
  id TEXT NOT NULL,
  version INTEGER NOT NULL,
  type TEXT NOT NULL,

  applies_to TEXT NOT NULL,
  scope TEXT NOT NULL,
  tenant_id TEXT,

  effective_at INTEGER NOT NULL,
  superseded_at INTEGER,  -- NULL = current

  settings TEXT NOT NULL,  -- JSON

  PRIMARY KEY (id, version)
);

CREATE INDEX idx_configs_current ON configs(applies_to, type) WHERE superseded_at IS NULL;
CREATE INDEX idx_configs_tenant ON configs(tenant_id, type);
```

Query Patterns
Cross-Entity Aggregations
```sql
-- Total charges by tenant for a time period
SELECT
  tenant_id,
  SUM(amount) AS total_cents,
  currency,
  COUNT(*) AS charge_count
FROM facts
WHERE type = 'charge'
  AND timestamp BETWEEN ? AND ?
GROUP BY tenant_id, currency;
```

Attribution Queries
```sql
-- All touches for a contact
SELECT
  f.id, f.type, f.subtype, f.timestamp,
  f.asset_id, e.name AS asset_name
FROM facts f
LEFT JOIN entities e ON f.asset_id = e.id
WHERE f.contact_id = ?
ORDER BY f.timestamp;
```

Config History
```sql
-- Full history of a pricing config
SELECT version, effective_at, superseded_at, settings
FROM configs
WHERE id = ?
ORDER BY version;

-- Config at a point in time
SELECT settings
FROM configs
WHERE id = ?
  AND effective_at <= ?
  AND (superseded_at IS NULL OR superseded_at > ?)
LIMIT 1;
```

Search
```sql
-- Find entities by identifier pattern
SELECT id, type, name, identifier
FROM entities
WHERE tenant_id = ?
  AND identifier LIKE ?
  AND status = 'active';
```

Constraints and Limits
Database Limits
| Limit | Value |
|---|---|
| Max database size | 10 GB |
| Max row size | 1 MB |
| Max query result | 10 MB |
| Max concurrent connections | 100 |
| Max rows per query | Unlimited (but paginate large results) |
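Pagination over the facts table is easiest with a keyset cursor rather than OFFSET, which rescans skipped rows. A sketch under two assumptions: the helper name `nextPageQuery` is ours, and D1's SQLite is recent enough for row-value comparisons (SQLite ≥ 3.15).

```typescript
// Builds a keyset-paginated query over facts, ordered by (timestamp, id).
// The (timestamp, id) pair is a stable cursor because id breaks ties.
function nextPageQuery(
  tenantId: string,
  after?: { timestamp: number; id: string },
  limit = 1000,
): { sql: string; params: (string | number)[] } {
  const sql = after
    ? `SELECT id, type, timestamp FROM facts
       WHERE tenant_id = ? AND (timestamp, id) > (?, ?)
       ORDER BY timestamp, id LIMIT ?`
    : `SELECT id, type, timestamp FROM facts
       WHERE tenant_id = ?
       ORDER BY timestamp, id LIMIT ?`;
  const params = after
    ? [tenantId, after.timestamp, after.id, limit]
    : [tenantId, limit];
  return { sql, params };
}
```

The last row of each page becomes the `after` cursor for the next call, so each page stays comfortably under the 10 MB result cap.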
Performance Characteristics
| Operation | Typical Latency |
|---|---|
| Point query (by PK) | 1-5 ms |
| Index scan | 5-50 ms |
| Full table scan | Avoid |
| Write (insert) | 5-20 ms |
| Batch write (100 rows) | 20-100 ms |
SQLite Considerations
D1 is SQLite under the hood. Respect SQLite patterns:
Do:
- Use INTEGER for timestamps (epoch ms)
- Store money as INTEGER (cents)
- Use JSON for flexible data fields
- Create targeted indexes
- Use EXPLAIN QUERY PLAN to verify index usage
Don’t:
- Use DECIMAL/NUMERIC for money (use INTEGER cents)
- Create indexes on JSON fields (extract to columns if needed)
- Rely on concurrent write performance (single writer)
- Use complex JOINs across large tables (denormalize)
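The integer-cents rule keeps money arithmetic exact and pushes formatting to the display boundary. A minimal sketch (helper names are ours, not z0's):

```typescript
// Sums amounts kept as INTEGER cents; rejects non-integer inputs so float
// drift can't sneak in upstream of the ledger.
function addCents(...amounts: number[]): number {
  return amounts.reduce((sum, a) => {
    if (!Number.isInteger(a)) throw new Error('amounts must be integer cents');
    return sum + a;
  }, 0);
}

// Converts cents to a display string only at the presentation boundary.
function formatCents(cents: number, currency = 'USD'): string {
  return new Intl.NumberFormat('en-US', { style: 'currency', currency })
    .format(cents / 100);
}
```

Contrast with floats: `0.1 + 0.2 !== 0.3` in IEEE 754, while `10 + 20 === 30` in cents, always.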
Multi-Tenant Isolation
Tenant Scoping
Every query must include tenant_id:
```sql
-- CORRECT: scoped to tenant
SELECT * FROM facts
WHERE tenant_id = ? AND type = 'charge';

-- WRONG: scans all tenants
SELECT * FROM facts WHERE type = 'charge';
```

Row-Level Security Pattern
For applications requiring strict isolation:

```ts
function scopedQuery(db: D1Database, tenantId: string, sql: string, params: any[]) {
  // Verify tenant_id appears in WHERE clause
  if (!sql.includes('tenant_id')) {
    throw new Error('Query must include tenant_id scope');
  }
  return db.prepare(sql).bind(tenantId, ...params).all();
}
```

Database Per Tenant
For strict isolation requirements, z0 supports database-per-tenant:
```
Shared D1 (default):
  └── All tenants in single database, scoped by tenant_id

Isolated D1 (optional):
  ├── d1_tenant_abc123
  ├── d1_tenant_def456
  └── d1_tenant_ghi789
```

Trade-offs:
- Shared: Simpler operations, cross-tenant queries possible
- Isolated: Stronger isolation, no cross-tenant query risk, higher operational overhead
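Under either layout, tenant-to-database resolution can live in one helper. A sketch with assumed names: the `DB` shared binding, the `d1_tenant_<id>` per-tenant binding convention from the tree above, and the `isolated` set are all illustrative, not z0's actual configuration.

```typescript
// Resolves which D1 binding serves a tenant: a dedicated database when the
// tenant is in the isolated set, otherwise the shared tenant_id-scoped one.
function resolveDatabase<T>(
  env: Record<string, T>,
  tenantId: string,
  isolated: Set<string>,
): T {
  return isolated.has(tenantId)
    ? env[`d1_tenant_${tenantId}`] // isolated: dedicated database binding
    : env['DB'];                   // shared: scope every query by tenant_id
}
```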
Backup and Recovery
Built-in Point-in-Time Recovery
D1 provides automatic backups with point-in-time recovery:
```sh
# Inspect available restore points (bookmarks)
wrangler d1 time-travel info <database>

# Restore to a point in time
wrangler d1 time-travel restore <database> --timestamp <iso-timestamp>
```

Export
Section titled “Export”# Export full databasewrangler d1 export <database> --output backup.sqlRecovery from DO Ledgers
If D1 is corrupted or needs a full rebuild:
- Create fresh D1 database
- Deploy replication worker pointing to new database
- Trigger replay from all DO ledgers
- Switch read traffic when caught up
This follows Principle 7: D1 is disposable, DOs are truth.
Observability
Replication Health
Monitor these metrics (see observability.md):
```
z0_replication_lag_ms          // Time from DO write to D1 availability
z0_replication_queue_depth     // Pending facts in queue
z0_replication_failures_total  // Failed D1 writes
```

Query Performance
Section titled “Query Performance”z0_d1_query_duration_ms{query_type}z0_d1_rows_scanned_totalz0_d1_rows_returned_totalAlert Thresholds
| Metric | Warning | Critical |
|---|---|---|
| Replication lag | > 10s | > 60s |
| Query latency p99 | > 500ms | > 2s |
| Queue depth | > 10,000 | > 100,000 |
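As a worked example for one row of the table, replication lag maps to severities like this (the `Severity` type is illustrative; thresholds mirror the table above):

```typescript
type Severity = 'ok' | 'warning' | 'critical';

// Classifies replication lag against the alert thresholds:
// > 60s critical, > 10s warning, otherwise ok.
function replicationLagSeverity(lagMs: number): Severity {
  if (lagMs > 60_000) return 'critical';
  if (lagMs > 10_000) return 'warning';
  return 'ok';
}
```

Checking critical before warning matters: the ranges overlap, so the stricter threshold must win.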
Summary
| Question | Answer |
|---|---|
| What is D1? | Distributed SQLite at the edge |
| Is D1 source of truth? | No. Durable Objects are. |
| What is D1 used for? | Cross-entity queries, reporting, search |
| Can D1 data be rebuilt? | Yes, from DO ledgers |
| What’s the consistency model? | Eventually consistent (strong reads available) |
| How does data get to D1? | Async replication via Queue |
D1 enables the query patterns that Durable Objects cannot efficiently support, while maintaining the architectural principle that the ledger (in DOs) is the source of truth.