
R2 Object Storage

S3-compatible object storage with zero egress fees. Where z0 stores large blobs.

Prerequisites: PRINCIPLES.md, PRIMITIVES.md


R2 is Cloudflare’s S3-compatible object storage. Key characteristics:

| Property | Value |
|---|---|
| API Compatibility | S3 (most operations) |
| Egress Fees | Zero |
| Storage Class | Single (no tiers to manage) |
| Consistency | Strong read-after-write |
| Max Object Size | 5 TB |
| Multipart Upload | Yes (required for objects > 100 MB) |

Key Insight: R2’s zero egress fees change the economics of object storage. Large payloads that would be expensive to serve from traditional cloud storage become cost-effective.


R2 serves three distinct purposes in z0:

| Use Case | What's Stored | Access Pattern | Retention |
|---|---|---|---|
| Context Store | Call recordings, transcripts, documents | Read by reference from Facts | Days to months |
| Fact Archives | Cold storage for old Facts | Bulk export, compliance queries | Years |
| Backup/Export | Tenant data exports, system backups | On-demand retrieval | Per policy |

The Context Store holds large payloads associated with invocations and outcomes.

Stored Objects:

  • Call recordings (audio files from Twilio)
  • Transcripts (text output from speech-to-text)
  • Documents (uploaded files, generated reports)
  • AI context (conversation history, embeddings)

Naming Convention:

```
{tenant_id}/context/{entity_type}/{entity_id}/{object_type}/{timestamp}_{id}.{ext}
```

Examples:

```
tenant_abc/context/invocation/inv_123/recording/2025-01-15_rec_456.mp3
tenant_abc/context/invocation/inv_123/transcript/2025-01-15_tr_789.json
tenant_abc/context/asset/asset_001/document/2025-01-15_doc_012.pdf
```
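The convention can be captured in a small key builder. The function name and parameters here are illustrative, not part of z0's API:

```typescript
// Build an R2 object key following the z0 context naming convention:
// {tenant_id}/context/{entity_type}/{entity_id}/{object_type}/{timestamp}_{id}.{ext}
function contextKey(
  tenantId: string,
  entityType: string,
  entityId: string,
  objectType: string,
  date: string, // e.g. "2025-01-15"
  objectId: string,
  ext: string
): string {
  return `${tenantId}/context/${entityType}/${entityId}/${objectType}/${date}_${objectId}.${ext}`;
}

// contextKey("tenant_abc", "invocation", "inv_123", "recording", "2025-01-15", "rec_456", "mp3")
// → "tenant_abc/context/invocation/inv_123/recording/2025-01-15_rec_456.mp3"
```

Centralizing the template in one function keeps keys consistent across Workers and makes the lifecycle rules (which match on prefixes) reliable.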

Fact Reference Pattern:

Facts don’t store large payloads inline. Instead, they reference R2 objects:

```
Fact {
  id: "inv_123",
  type: "invocation",
  subtype: "inbound_call",
  data: {
    duration_seconds: 342,
    recording_url: "r2://tenant_abc/context/invocation/inv_123/recording/2025-01-15_rec_456.mp3",
    transcript_url: "r2://tenant_abc/context/invocation/inv_123/transcript/2025-01-15_tr_789.json"
  }
}
```

Why not inline? Facts are immutable and replicated. Embedding large blobs would bloat the ledger, slow replication, and waste storage on data that rarely needs the same durability guarantees as economic Facts.

Fact Archives provide cold storage for Facts that are past their hot query window.

When Facts Move to Archive:

  • Facts older than retention window (default: 90 days)
  • Tenant explicitly requests archival
  • System migration/consolidation

Archive Format:

```
{tenant_id}/archive/facts/{year}/{month}/{batch_id}.parquet
```

Example:

```
tenant_abc/archive/facts/2024/06/batch_001.parquet
```

Why Parquet? It is a compressed, columnar format optimized for analytical queries, and it can be queried in place with tools like DuckDB, with no extraction step required.

Archive Lifecycle:

  1. Facts accumulate in D1 (hot storage)
  2. Archival Workflow identifies Facts past retention window
  3. Facts exported to Parquet, uploaded to R2
  4. Lifecycle Fact written recording the archival (see below)
  5. Facts removed from D1 (ledger reference in DO remains)

Important: The lifecycle Fact must be written BEFORE removing Facts from D1. This ensures the archival operation is auditable even if the removal fails partway through.
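That ordering constraint can be made explicit in the workflow code. This is a sketch with the step functions injected so the ordering guarantee is visible and testable; the names and shapes are assumptions, not z0's actual APIs:

```typescript
// One archival step: acts on a batch of Facts identified by batchId.
type Step = (batchId: string) => Promise<void>;

// Archive a batch in the only safe order:
// export → lifecycle Fact → delete from D1.
async function archiveBatch(
  batchId: string,
  exportToParquet: Step,    // upload the Facts to R2 as Parquet
  writeLifecycleFact: Step, // record the "facts_archived" Fact
  deleteFromD1: Step        // remove the hot copies
): Promise<void> {
  // 1. The archive must exist before anything is removed.
  await exportToParquet(batchId);
  // 2. The audit record is written BEFORE deletion, so a partial failure
  //    in step 3 still leaves a lifecycle Fact pointing at the archive.
  await writeLifecycleFact(batchId);
  // 3. Only now remove the hot copies. A crash here is recoverable:
  //    the data survives in R2 and the lifecycle Fact explains where.
  await deleteFromD1(batchId);
}
```

Sequencing the awaits (rather than running the steps concurrently) is what enforces the invariant.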

```
Fact {
  type: "lifecycle",
  subtype: "facts_archived",
  entity_id: "tenant_abc",
  data: {
    archive_url: "r2://tenant_abc/archive/facts/2024/06/batch_001.parquet",
    fact_count: 145678,
    date_range: { from: "2024-06-01", to: "2024-06-30" },
    checksum: "sha256:abc123..."
  }
}
```

Invariant: Archived Facts are never deleted. The archive is append-only, just like the ledger itself.

Backup/Export covers on-demand exports and system backups.

Export Types:

  • Tenant data export (GDPR, offboarding)
  • System backup snapshots
  • Migration artifacts

Naming Convention:

```
{tenant_id}/export/{export_type}/{timestamp}_{export_id}.{ext}
_system/backup/{backup_type}/{timestamp}_{backup_id}.{ext}
```

Examples:

```
tenant_abc/export/full/2025-01-15_exp_001.zip
_system/backup/d1/2025-01-15_bak_001.sqlite
```

R2 supports presigned URLs for secure, time-limited access without exposing credentials.

| Scenario | URL Type | Expiry |
|---|---|---|
| Playback recording in UI | GET | 15 minutes |
| Download transcript | GET | 1 hour |
| Upload document | PUT | 15 minutes |
| Export download | GET | 24 hours |

```typescript
// Worker generates a presigned URL.
// Note: R2 bucket bindings do not expose a signing method; presigned URLs
// are generated through R2's S3-compatible API (e.g. with aws4fetch or an
// S3 SDK). `createSignedUrl` below stands in for that signing step.
async function getRecordingUrl(tenant_id: string, recording_path: string): Promise<string> {
  // Verify the caller has access to this tenant
  // ...
  const url = await env.R2_BUCKET.createSignedUrl(recording_path, {
    action: 'get',
    expiresIn: 900 // 15 minutes
  });
  return url;
}
// The client receives the presigned URL and fetches directly from R2:
// no large-file download traffic flows through Workers.
```

Security Notes:

  • Never expose R2 bucket credentials to clients
  • Always validate tenant access before generating presigned URLs
  • Use shortest reasonable expiry times
  • Log presigned URL generation (who, what, when)
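The tenant-access check can be as simple as a prefix test on the object path, since every key begins with the tenant ID. This helper is a minimal sketch of that rule (the name is illustrative):

```typescript
// A presigned URL should only ever be issued for objects under the
// requesting tenant's own prefix.
function pathBelongsToTenant(tenantId: string, objectPath: string): boolean {
  // The trailing slash prevents "tenant_ab" from matching "tenant_abc/..."
  return objectPath.startsWith(`${tenantId}/`);
}
```

Real deployments would layer user-level authorization on top, but this check alone blocks cross-tenant path injection.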

For auditability, presigned URL generation can be tracked:

```
Fact {
  type: "invocation",
  subtype: "presigned_url_generated",
  tenant_id: "tenant_abc",
  user_id: "user_123",
  data: {
    object_path: "tenant_abc/context/invocation/inv_123/recording/...",
    action: "get",
    expires_in_seconds: 900,
    purpose: "recording_playback"
  }
}
```

R2 supports object lifecycle rules for automatic management.

| Bucket/Prefix | Rule | Action |
|---|---|---|
| `*/context/invocation/*` | Age > 90 days | Delete (unless archived) |
| `*/context/asset/*` | Age > 365 days | Delete (unless flagged) |
| `*/archive/*` | Never | No automatic deletion |
| `*/export/*` | Age > 30 days | Delete |
| `_system/backup/*` | Keep last 30 | Delete oldest beyond 30 |

Tenants can request extended retention via billing Config:

```
Config {
  type: "billing",
  applies_to: "tenant_abc",
  settings: {
    recording_retention_days: 365, // Override default 90
    export_retention_days: 90      // Override default 30
  }
}
```
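Resolving the effective retention window is then a matter of preferring the Config override over the default. A sketch, using the field names from the Config above (the function itself is illustrative):

```typescript
// Default retention windows, per the lifecycle table above.
const DEFAULT_RETENTION_DAYS = { recording: 90, export: 30 } as const;

// Shape of the billing Config's settings block (optional overrides).
interface BillingSettings {
  recording_retention_days?: number;
  export_retention_days?: number;
}

// Tenant override wins when present; otherwise fall back to the default.
function effectiveRetentionDays(
  kind: "recording" | "export",
  settings?: BillingSettings
): number {
  if (kind === "recording") {
    return settings?.recording_retention_days ?? DEFAULT_RETENTION_DAYS.recording;
  }
  return settings?.export_retention_days ?? DEFAULT_RETENTION_DAYS.export;
}
```

The `??` fallback matters here: it treats only a missing override as "use the default", so a tenant could in principle configure a shorter window too.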

Lifecycle Workflow:

  1. Lifecycle Worker runs daily
  2. Queries objects approaching expiry
  3. Checks for tenant retention overrides
  4. Checks for hold flags (legal, compliance)
  5. Deletes eligible objects
  6. Records lifecycle Fact

```
Fact {
  type: "lifecycle",
  subtype: "objects_deleted",
  entity_id: "tenant_abc",
  data: {
    prefix: "tenant_abc/context/invocation/",
    objects_deleted: 1234,
    bytes_freed: 5678901234,
    policy: "default_90_day"
  }
}
```
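The sweep's decision logic (steps 2-5 above) can be sketched as a pure function over object metadata; the shapes here are assumptions for illustration, not z0's actual types:

```typescript
// Metadata the sweep needs per object. retentionDays is assumed to already
// reflect any tenant override from the billing Config.
interface StoredObject {
  key: string;
  ageDays: number;
  retentionDays: number;
  legalHold: boolean; // legal/compliance hold flag
}

// Return the keys that are eligible for deletion: past their retention
// window and not under any hold.
function selectDeletable(objects: StoredObject[]): string[] {
  return objects
    .filter(o => !o.legalHold && o.ageDays > o.retentionDays)
    .map(o => o.key);
}
```

Keeping this step pure (no I/O) makes the policy easy to test; the Worker then performs the actual deletions and writes the lifecycle Fact.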

| Data Type | Storage | Rationale |
|---|---|---|
| Facts (hot) | DO ledger + D1 | Queryable, fast, need cross-entity joins |
| Facts (cold) | R2 archive | Bulk storage, rare access, compliance |
| Configs | DO + D1 | Versioned, queryable, frequently accessed |
| Entities | DO + D1 | Queryable, relationships matter |
| Cached State | DO memory/SQLite | Hot path, single-entity access |
| Recordings | R2 | Large blobs, served directly to clients |
| Transcripts | R2 | Large text, referenced from Facts |
| Documents | R2 | Variable size, user-uploaded content |
| Exports | R2 | Large, temporary, downloadable |
```
Is it a blob > 1 KB?
├── Yes → R2 (reference from Facts)
└── No → Is it queryable across entities?
    ├── Yes → D1
    └── No → Is it per-entity state?
        ├── Yes → DO (ledger or cache)
        └── No → Probably doesn't need storage
```
| Threshold | Guidance |
|---|---|
| < 1 KB | Can be in Fact.data if truly needed |
| 1 KB - 100 KB | R2; consider inline for critical data |
| 100 KB - 100 MB | R2, standard upload |
| > 100 MB | R2, multipart upload required |
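The size thresholds above reduce to a small routing function. This is a sketch; the type and function names are illustrative:

```typescript
type StorageDecision = "inline_fact_data" | "r2_standard" | "r2_multipart";

// Route a payload by size, following the document's thresholds.
// Note: sub-1 KB payloads go inline only "if truly needed"; this function
// just reports where the thresholds point.
function storageForSize(bytes: number): StorageDecision {
  const KB = 1024;
  const MB = 1024 * KB;
  if (bytes < 1 * KB) return "inline_fact_data";
  if (bytes <= 100 * MB) return "r2_standard";
  return "r2_multipart"; // multipart upload required beyond 100 MB
}
```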

```typescript
// Bind the R2 bucket in wrangler.toml:
// [[r2_buckets]]
// binding = "CONTEXT_BUCKET"
// bucket_name = "z0-context"
export default {
  async fetch(request, env) {
    // Read object
    const object = await env.CONTEXT_BUCKET.get("tenant_abc/context/...");
    if (!object) return new Response("Not found", { status: 404 });
    // Stream the body back; fall back to a generic type if none was stored
    return new Response(object.body, {
      headers: {
        "Content-Type": object.httpMetadata?.contentType ?? "application/octet-stream"
      }
    });
  }
};
```

```typescript
export class InvocationLedger extends DurableObject {
  async storeRecording(recording: ArrayBuffer, metadata: RecordingMetadata) {
    // Derive the key components (previously undefined in this example)
    const timestamp = new Date().toISOString().slice(0, 10); // e.g. "2025-01-15"
    const recordingId = crypto.randomUUID();
    const path = `${this.tenant_id}/context/invocation/${this.id}/recording/${timestamp}_${recordingId}.mp3`;
    await this.env.CONTEXT_BUCKET.put(path, recording, {
      httpMetadata: { contentType: "audio/mpeg" },
      customMetadata: {
        tenant_id: this.tenant_id,
        invocation_id: this.id,
        duration_seconds: metadata.duration.toString()
      }
    });
    return `r2://${path}`;
  }
}
```

```
z0_r2_operations_total{operation, bucket, tenant_id}
z0_r2_bytes_written_total{bucket, tenant_id}
z0_r2_bytes_read_total{bucket, tenant_id}
z0_r2_presigned_urls_generated_total{action, purpose, tenant_id}
z0_r2_lifecycle_deletions_total{policy, tenant_id}
```

| Condition | Severity | Action |
|---|---|---|
| Write failure rate > 1% | Critical | Page on-call |
| Storage growth > 20%/day | Warning | Review lifecycle policies |
| Presigned URL generation spike | Warning | Check for abuse |
| Archive job failure | Critical | Manual intervention required |

R2 pricing (as of 2025):

| Component | Cost | Notes |
|---|---|---|
| Storage | $0.015/GB/month | First 10 GB free |
| Class A ops (write) | $4.50/million | PUT, POST, LIST |
| Class B ops (read) | $0.36/million | GET, HEAD |
| Egress | $0.00 | Zero, always |
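Under the rates above, a monthly estimate is straightforward arithmetic. This sketch applies the 10 GB free storage tier and ignores any per-operation free tiers for simplicity; the function is illustrative, not a billing implementation:

```typescript
// Estimate monthly R2 cost in USD from the pricing table (rates as of 2025).
function estimateMonthlyCostUsd(
  storageGb: number,
  classAOps: number,
  classBOps: number
): number {
  const storage = Math.max(0, storageGb - 10) * 0.015; // first 10 GB free
  const classA = (classAOps / 1_000_000) * 4.5;        // $4.50 per million
  const classB = (classBOps / 1_000_000) * 0.36;       // $0.36 per million
  return storage + classA + classB;
}
```

For the usage in the cost Fact below (1000 GB, 50k Class A, 500k Class B), this works out to roughly $15.26: storage dominates, and egress contributes nothing.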

z0 Cost Allocation:

  • Context storage costs allocated to tenant via metadata
  • Archive storage allocated to tenant
  • System backups allocated to platform

Cost Tracking Pattern:

```
Fact {
  type: "cost",
  subtype: "storage",
  tool_id: "tool_r2",
  tenant_id: "tenant_abc",
  amount: 15.00,
  currency: "USD",
  data: {
    period: "2025-01",
    storage_gb: 1000,
    class_a_ops: 50000,
    class_b_ops: 500000
  }
}
```

| Question | Answer |
|---|---|
| What is R2? | S3-compatible object storage, zero egress |
| What does z0 store in R2? | Recordings, transcripts, documents, archives, exports |
| How are objects referenced? | `r2://` URLs in Fact.data fields |
| How do users access objects? | Presigned URLs (never direct credentials) |
| When are objects deleted? | Lifecycle policies, tenant-configurable retention |
| R2 vs D1 vs DO? | R2 for blobs, D1 for queries, DO for per-entity state |

R2 extends z0’s storage capabilities to handle large payloads without compromising the ledger model. Facts remain lean and queryable; large context lives in R2, referenced by URL.