
R2 Object Storage

S3-compatible object storage with zero egress fees. Where z0 stores large blobs.

Prerequisites: PRINCIPLES.md, PRIMITIVES.md


R2 is Cloudflare’s S3-compatible object storage. Key characteristics:

| Property | Value |
|---|---|
| API Compatibility | S3 (most operations) |
| Egress Fees | Zero |
| Storage Class | Single (no tiers to manage) |
| Consistency | Strong read-after-write |
| Max Object Size | 5 TB |
| Multipart Upload | Yes (required for objects > 100 MB) |

Key Insight: R2’s zero egress fees change the economics of object storage. Large payloads that would be expensive to serve from traditional cloud storage become cost-effective.


R2 serves three distinct purposes in z0:

| Use Case | What's Stored | Access Pattern | Retention |
|---|---|---|---|
| Context Store | Call recordings, transcripts, documents | Read by reference from Facts | Days to months |
| Fact Archives | Cold storage for old Facts | Bulk export, compliance queries | Years |
| Backup/Export | Tenant data exports, system backups | On-demand retrieval | Per policy |

The Context Store holds large payloads associated with invocations and outcomes.

Stored Objects:

  • Call recordings (audio files from Twilio)
  • Transcripts (text output from speech-to-text)
  • Documents (uploaded files, generated reports)
  • AI context (conversation history, embeddings)

Naming Convention:

```
{tenant_id}/context/{entity_type}/{entity_id}/{object_type}/{timestamp}_{id}.{ext}
```

Examples:

```
tenant_abc/context/invocation/inv_123/recording/2025-01-15_rec_456.mp3
tenant_abc/context/invocation/inv_123/transcript/2025-01-15_tr_789.json
tenant_abc/context/asset/asset_001/document/2025-01-15_doc_012.pdf
```
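The convention can be captured in a small key builder. The function name and parameters here are illustrative, not part of z0's API:

```typescript
// Build an R2 object key following the z0 context naming convention:
// {tenant_id}/context/{entity_type}/{entity_id}/{object_type}/{timestamp}_{id}.{ext}
function contextKey(
  tenantId: string,
  entityType: string,
  entityId: string,
  objectType: string,
  date: string, // e.g. "2025-01-15"
  objectId: string,
  ext: string
): string {
  return `${tenantId}/context/${entityType}/${entityId}/${objectType}/${date}_${objectId}.${ext}`;
}

// contextKey("tenant_abc", "invocation", "inv_123", "recording", "2025-01-15", "rec_456", "mp3")
// → "tenant_abc/context/invocation/inv_123/recording/2025-01-15_rec_456.mp3"
```

Centralizing the template in one function keeps keys consistent across Workers and makes the lifecycle rules (which match on prefixes) reliable.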

Fact Reference Pattern:

Facts don’t store large payloads inline. Instead, they reference R2 objects:

```
Fact {
  id: "inv_123",
  type: "invocation",
  subtype: "inbound_call",
  data: {
    duration_seconds: 342,
    recording_url: "r2://tenant_abc/context/invocation/inv_123/recording/2025-01-15_rec_456.mp3",
    transcript_url: "r2://tenant_abc/context/invocation/inv_123/transcript/2025-01-15_tr_789.json"
  }
}
```

Why not inline? Facts are immutable and replicated. Embedding large blobs would bloat the ledger, slow replication, and waste storage on data that rarely needs the same durability guarantees as economic Facts.

Fact Archives provide cold storage for Facts that are past their hot query window.

When Facts Move to Archive:

  • Facts older than retention window (default: 90 days)
  • Tenant explicitly requests archival
  • System migration/consolidation

Archive Format:

```
{tenant_id}/archive/facts/{year}/{month}/{batch_id}.parquet
```

Example:

```
tenant_abc/archive/facts/2024/06/batch_001.parquet
```

Why Parquet? It is a compressed, columnar format optimized for analytical queries, and it can be queried in place with tools like DuckDB, with no extraction step required.

Archive Lifecycle:

  1. Facts accumulate in D1 (hot storage)
  2. Archival Workflow identifies Facts past retention window
  3. Facts exported to Parquet, uploaded to R2
  4. Lifecycle Fact written recording the archival (see below)
  5. Facts removed from D1 (ledger reference in DO remains)

Important: The lifecycle Fact must be written BEFORE removing Facts from D1. This ensures the archival operation is auditable even if the removal fails partway through.
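That ordering constraint can be made explicit in the workflow code. This is a sketch with the step functions injected so the ordering guarantee is visible and testable; the names and shapes are assumptions, not z0's actual APIs:

```typescript
// One archival step: acts on a batch of Facts identified by batchId.
type Step = (batchId: string) => Promise<void>;

// Archive a batch in the only safe order:
// export → lifecycle Fact → delete from D1.
async function archiveBatch(
  batchId: string,
  exportToParquet: Step,    // upload the Facts to R2 as Parquet
  writeLifecycleFact: Step, // record the "facts_archived" Fact
  deleteFromD1: Step        // remove the hot copies
): Promise<void> {
  // 1. The archive must exist before anything is removed.
  await exportToParquet(batchId);
  // 2. The audit record is written BEFORE deletion, so a partial failure
  //    in step 3 still leaves a lifecycle Fact pointing at the archive.
  await writeLifecycleFact(batchId);
  // 3. Only now remove the hot copies. A crash here is recoverable:
  //    the data survives in R2 and the lifecycle Fact explains where.
  await deleteFromD1(batchId);
}
```

Sequencing the awaits (rather than running the steps concurrently) is what enforces the invariant.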

```
Fact {
  type: "lifecycle",
  subtype: "facts_archived",
  entity_id: "tenant_abc",
  data: {
    archive_url: "r2://tenant_abc/archive/facts/2024/06/batch_001.parquet",
    fact_count: 145678,
    date_range: { from: "2024-06-01", to: "2024-06-30" },
    checksum: "sha256:abc123..."
  }
}
```

Invariant: Archived Facts are never deleted. The archive is append-only, just like the ledger itself.

Backup/Export covers on-demand exports and system backups.

Export Types:

  • Tenant data export (GDPR, offboarding)
  • System backup snapshots
  • Migration artifacts

Naming Convention:

```
{tenant_id}/export/{export_type}/{timestamp}_{export_id}.{ext}
_system/backup/{backup_type}/{timestamp}_{backup_id}.{ext}
```

Examples:

```
tenant_abc/export/full/2025-01-15_exp_001.zip
_system/backup/d1/2025-01-15_bak_001.sqlite
```

R2 supports presigned URLs for secure, time-limited access without exposing credentials.

| Scenario | URL Type | Expiry |
|---|---|---|
| Playback recording in UI | GET | 15 minutes |
| Download transcript | GET | 1 hour |
| Upload document | PUT | 15 minutes |
| Export download | GET | 24 hours |

```typescript
// Worker generates a presigned URL.
// Note: R2 bucket bindings do not expose a signing method; presigned URLs
// are generated through R2's S3-compatible API (e.g. with aws4fetch or an
// S3 SDK). `createSignedUrl` below stands in for that signing step.
async function getRecordingUrl(tenant_id: string, recording_path: string): Promise<string> {
  // Verify the caller has access to this tenant
  // ...
  const url = await env.R2_BUCKET.createSignedUrl(recording_path, {
    action: 'get',
    expiresIn: 900 // 15 minutes
  });
  return url;
}
// The client receives the presigned URL and fetches directly from R2:
// no large-file download traffic flows through Workers.
```

Security Notes:

  • Never expose R2 bucket credentials to clients
  • Always validate tenant access before generating presigned URLs
  • Use shortest reasonable expiry times
  • Log presigned URL generation (who, what, when)
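The tenant-access check can be as simple as a prefix test on the object path, since every key begins with the tenant ID. This helper is a minimal sketch of that rule (the name is illustrative):

```typescript
// A presigned URL should only ever be issued for objects under the
// requesting tenant's own prefix.
function pathBelongsToTenant(tenantId: string, objectPath: string): boolean {
  // The trailing slash prevents "tenant_ab" from matching "tenant_abc/..."
  return objectPath.startsWith(`${tenantId}/`);
}
```

Real deployments would layer user-level authorization on top, but this check alone blocks cross-tenant path injection.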

For auditability, presigned URL generation can be tracked:

```
Fact {
  type: "invocation",
  subtype: "presigned_url_generated",
  tenant_id: "tenant_abc",
  user_id: "user_123",
  data: {
    object_path: "tenant_abc/context/invocation/inv_123/recording/...",
    action: "get",
    expires_in_seconds: 900,
    purpose: "recording_playback"
  }
}
```

R2 supports object lifecycle rules for automatic management.

| Bucket/Prefix | Rule | Action |
|---|---|---|
| `*/context/invocation/*` | Age > 90 days | Delete (unless archived) |
| `*/context/asset/*` | Age > 365 days | Delete (unless flagged) |
| `*/archive/*` | Never | No automatic deletion |
| `*/export/*` | Age > 30 days | Delete |
| `_system/backup/*` | Keep last 30 | Delete oldest beyond 30 |

Tenants can request extended retention via billing Config:

```
Config {
  type: "billing",
  applies_to: "tenant_abc",
  settings: {
    recording_retention_days: 365, // Override default 90
    export_retention_days: 90      // Override default 30
  }
}
```
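Resolving the effective retention window is then a matter of preferring the Config override over the default. A sketch, using the field names from the Config above (the function itself is illustrative):

```typescript
// Default retention windows, per the lifecycle table above.
const DEFAULT_RETENTION_DAYS = { recording: 90, export: 30 } as const;

// Shape of the billing Config's settings block (optional overrides).
interface BillingSettings {
  recording_retention_days?: number;
  export_retention_days?: number;
}

// Tenant override wins when present; otherwise fall back to the default.
function effectiveRetentionDays(
  kind: "recording" | "export",
  settings?: BillingSettings
): number {
  if (kind === "recording") {
    return settings?.recording_retention_days ?? DEFAULT_RETENTION_DAYS.recording;
  }
  return settings?.export_retention_days ?? DEFAULT_RETENTION_DAYS.export;
}
```

The `??` fallback matters here: it treats only a missing override as "use the default", so a tenant could in principle configure a shorter window too.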

Lifecycle Workflow:

  1. Lifecycle Worker runs daily
  2. Queries objects approaching expiry
  3. Checks for tenant retention overrides
  4. Checks for hold flags (legal, compliance)
  5. Deletes eligible objects
  6. Records lifecycle Fact

```
Fact {
  type: "lifecycle",
  subtype: "objects_deleted",
  entity_id: "tenant_abc",
  data: {
    prefix: "tenant_abc/context/invocation/",
    objects_deleted: 1234,
    bytes_freed: 5678901234,
    policy: "default_90_day"
  }
}
```
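The sweep's decision logic (steps 2-5 above) can be sketched as a pure function over object metadata; the shapes here are assumptions for illustration, not z0's actual types:

```typescript
// Metadata the sweep needs per object. retentionDays is assumed to already
// reflect any tenant override from the billing Config.
interface StoredObject {
  key: string;
  ageDays: number;
  retentionDays: number;
  legalHold: boolean; // legal/compliance hold flag
}

// Return the keys that are eligible for deletion: past their retention
// window and not under any hold.
function selectDeletable(objects: StoredObject[]): string[] {
  return objects
    .filter(o => !o.legalHold && o.ageDays > o.retentionDays)
    .map(o => o.key);
}
```

Keeping this step pure (no I/O) makes the policy easy to test; the Worker then performs the actual deletions and writes the lifecycle Fact.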

| Data Type | Storage | Rationale |
|---|---|---|
| Facts (hot) | DO ledger + D1 | Queryable, fast, need cross-entity joins |
| Facts (cold) | R2 archive | Bulk storage, rare access, compliance |
| Configs | DO + D1 | Versioned, queryable, frequently accessed |
| Entities | DO + D1 | Queryable, relationships matter |
| Cached State | DO memory/SQLite | Hot path, single-entity access |
| Recordings | R2 | Large blobs, served directly to clients |
| Transcripts | R2 | Large text, referenced from Facts |
| Documents | R2 | Variable size, user-uploaded content |
| Exports | R2 | Large, temporary, downloadable |
```
Is it a blob > 1 KB?
├── Yes → R2 (reference from Facts)
└── No → Is it queryable across entities?
    ├── Yes → D1
    └── No → Is it per-entity state?
        ├── Yes → DO (ledger or cache)
        └── No → Probably doesn't need storage
```
| Threshold | Guidance |
|---|---|
| < 1 KB | Can be in Fact.data if truly needed |
| 1 KB - 100 KB | R2; consider inline for critical data |
| 100 KB - 100 MB | R2, standard upload |
| > 100 MB | R2, multipart upload required |
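The size thresholds above reduce to a small routing function. This is a sketch; the type and function names are illustrative:

```typescript
type StorageDecision = "inline_fact_data" | "r2_standard" | "r2_multipart";

// Route a payload by size, following the document's thresholds.
// Note: sub-1 KB payloads go inline only "if truly needed"; this function
// just reports where the thresholds point.
function storageForSize(bytes: number): StorageDecision {
  const KB = 1024;
  const MB = 1024 * KB;
  if (bytes < 1 * KB) return "inline_fact_data";
  if (bytes <= 100 * MB) return "r2_standard";
  return "r2_multipart"; // multipart upload required beyond 100 MB
}
```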

```typescript
// Bind the R2 bucket in wrangler.toml:
// [[r2_buckets]]
// binding = "CONTEXT_BUCKET"
// bucket_name = "z0-context"
export default {
  async fetch(request, env) {
    // Read object
    const object = await env.CONTEXT_BUCKET.get("tenant_abc/context/...");
    if (!object) return new Response("Not found", { status: 404 });
    // Stream the body back; fall back to a generic type if none was stored
    return new Response(object.body, {
      headers: {
        "Content-Type": object.httpMetadata?.contentType ?? "application/octet-stream"
      }
    });
  }
};
```

```typescript
export class InvocationLedger extends DurableObject {
  async storeRecording(recording: ArrayBuffer, metadata: RecordingMetadata) {
    // Derive the key components (previously undefined in this example)
    const timestamp = new Date().toISOString().slice(0, 10); // e.g. "2025-01-15"
    const recordingId = crypto.randomUUID();
    const path = `${this.tenant_id}/context/invocation/${this.id}/recording/${timestamp}_${recordingId}.mp3`;
    await this.env.CONTEXT_BUCKET.put(path, recording, {
      httpMetadata: { contentType: "audio/mpeg" },
      customMetadata: {
        tenant_id: this.tenant_id,
        invocation_id: this.id,
        duration_seconds: metadata.duration.toString()
      }
    });
    return `r2://${path}`;
  }
}
```

```
z0_r2_operations_total{operation, bucket, tenant_id}
z0_r2_bytes_written_total{bucket, tenant_id}
z0_r2_bytes_read_total{bucket, tenant_id}
z0_r2_presigned_urls_generated_total{action, purpose, tenant_id}
z0_r2_lifecycle_deletions_total{policy, tenant_id}
```

| Condition | Severity | Action |
|---|---|---|
| Write failure rate > 1% | Critical | Page on-call |
| Storage growth > 20%/day | Warning | Review lifecycle policies |
| Presigned URL generation spike | Warning | Check for abuse |
| Archive job failure | Critical | Manual intervention required |

R2 pricing (as of 2025):

| Component | Cost | Notes |
|---|---|---|
| Storage | $0.015/GB/month | First 10 GB free |
| Class A ops (write) | $4.50/million | PUT, POST, LIST |
| Class B ops (read) | $0.36/million | GET, HEAD |
| Egress | $0.00 | Zero, always |
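Under the rates above, a monthly estimate is straightforward arithmetic. This sketch applies the 10 GB free storage tier and ignores any per-operation free tiers for simplicity; the function is illustrative, not a billing implementation:

```typescript
// Estimate monthly R2 cost in USD from the pricing table (rates as of 2025).
function estimateMonthlyCostUsd(
  storageGb: number,
  classAOps: number,
  classBOps: number
): number {
  const storage = Math.max(0, storageGb - 10) * 0.015; // first 10 GB free
  const classA = (classAOps / 1_000_000) * 4.5;        // $4.50 per million
  const classB = (classBOps / 1_000_000) * 0.36;       // $0.36 per million
  return storage + classA + classB;
}
```

For the usage in the cost Fact below (1000 GB, 50k Class A, 500k Class B), this works out to roughly $15.26: storage dominates, and egress contributes nothing.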

z0 Cost Allocation:

  • Context storage costs allocated to tenant via metadata
  • Archive storage allocated to tenant
  • System backups allocated to platform

Cost Tracking Pattern:

```
Fact {
  type: "cost",
  subtype: "storage",
  tool_id: "tool_r2",
  tenant_id: "tenant_abc",
  amount: 15.00,
  currency: "USD",
  data: {
    period: "2025-01",
    storage_gb: 1000,
    class_a_ops: 50000,
    class_b_ops: 500000
  }
}
```

| Question | Answer |
|---|---|
| What is R2? | S3-compatible object storage, zero egress |
| What does z0 store in R2? | Recordings, transcripts, documents, archives, exports |
| How are objects referenced? | `r2://` URLs in Fact.data fields |
| How do users access objects? | Presigned URLs (never direct credentials) |
| When are objects deleted? | Lifecycle policies, tenant-configurable retention |
| R2 vs D1 vs DO? | R2 for blobs, D1 for queries, DO for per-entity state |

R2 extends z0’s storage capabilities to handle large payloads without compromising the ledger model. Facts remain lean and queryable; large context lives in R2, referenced by URL.