Cloudflare Queues

Message queues for reliable async processing.

Prerequisites: PRINCIPLES.md, PRIMITIVES.md

Overview

Cloudflare Queues provide durable message queues for asynchronous workloads. In z0, Queues decouple the hot path (routing decisions, Fact writes) from background processing (webhooks, projections, batch operations).

Characteristic	Value
Delivery	At-least-once
Ordering	FIFO per producer (not across producers or retries)
Max message size	128 KB
Max batch size	100 messages or 256 KB
Retention	4 days (unacknowledged)
Visibility timeout	Configurable (default 30s)

Ordering semantics: Messages from the same producer (e.g., same DO) are delivered in order. Messages from different producers or messages that retry may be delivered out of order. Design handlers to be order-independent or use sequence numbers.

Key Insight: Queues are for work that can tolerate delay and retry. If it must happen immediately and exactly once, use a Durable Object. If it can happen eventually with retries, use a Queue.

How z0 Uses Queues

1. Webhook Delivery

When a webhook Config triggers, the delivery is queued—not sent inline.

Fact(webhook_triggered) → Queue → Worker → External endpoint
                                     ↓
                           Fact(webhook_sent) or retry

Why Queue: External endpoints are unreliable. Queuing isolates their latency and failures from the hot path.

Queue: z0-webhooks

Message schema:

{
  webhook_config_id: string,
  webhook_config_version: integer,
  fact_id: string,
  fact_type: string,
  tenant_id: string,
  payload: {},           // Pre-computed payload
  attempt: integer,      // Retry count
  first_attempted_at: timestamp
}

2. D1 Projection Updates

Facts are written to DO ledgers first (source of truth), then projected to D1 for queryability.

DO Ledger → Queue → D1 Writer → D1

Why Queue: D1 writes are slower than DO writes. Queuing allows the hot path to complete while D1 catches up.

Queue: z0-projections

Message schema:

{
  fact_id: string,
  fact_type: string,
  tenant_id: string,
  entity_id: string,
  timestamp: timestamp,
  payload: {}            // Full Fact for D1 insert
}

3. Async Fact Processing

Some Facts trigger downstream work that doesn’t need to block the original write:

outcome → Evaluate qualification rules, trigger charges/payouts
charge → Update budget state projections
contact_merged → Update canonical contact cache

Fact written → Queue → Processor → Derived Facts/State

Queue: z0-fact-processing

Message schema:

{
  fact_id: string,
  fact_type: string,
  subtype: string,
  tenant_id: string,
  processing_type: string,  // qualification, budget_update, merge_resolution
  payload: {}
}

Traceability requirement: Webhook Facts must include trace_id to link back to the triggering Fact. See PRIMITIVES.md Fact Traceability Invariants.

4. Batch Operations

Bulk imports, exports, and reconciliation run through Queues to avoid blocking.

API request → Queue → Batch Worker → Many writes

Queue: z0-batch

Message schema:

{
  batch_id: string,
  operation: string,      // import, export, reconcile
  tenant_id: string,
  items: [],              // Work items (or reference to R2 manifest)
  offset: integer,
  total: integer
}

Retry Policies

Standard Retry Configuration

Queue	Max Retries	Initial Delay	Max Delay	Backoff
z0-webhooks	5	10s	5m	Exponential
z0-projections	10	1s	1m	Exponential
z0-fact-processing	5	5s	2m	Exponential
z0-batch	3	30s	5m	Exponential

Retry Decision Logic

On message failure:
  if attempt < max_retries:
    delay = min(initial_delay * 2^attempt, max_delay)
    retry after delay
  else:
    send to dead letter queue
    emit metric: z0_queue_dlq_total{queue, reason}
    if critical: alert

Idempotency Requirements

At-least-once delivery means handlers MUST be idempotent:

Operation	Idempotency Strategy
Webhook delivery	Idempotency key in payload; recipient dedupes
D1 projection	Upsert by fact_id; duplicate = no-op
Fact processing	Check if derived Fact exists before creating
Batch operations	Track progress in batch state; skip completed items

Dead Letter Queues

Messages that exhaust retries go to dead letter queues (DLQs) for inspection and manual intervention.

DLQ Configuration

Primary Queue	Dead Letter Queue	Retention
z0-webhooks	z0-webhooks-dlq	14 days
z0-projections	z0-projections-dlq	14 days
z0-fact-processing	z0-fact-processing-dlq	14 days
z0-batch	z0-batch-dlq	14 days

DLQ Handling

1. Message lands in DLQ
2. Alert fires: z0_queue_dlq_total > 0
3. Operator inspects message in DLQ dashboard
4. Decision:
   a. Fix and replay → Move back to primary queue
   b. Skip → Delete from DLQ, record Fact(reconciliation)
   c. Escalate → Manual intervention required

DLQ Message Enrichment

When moving to DLQ, add failure context:

{
  original_message: {...},
  failure: {
    reason: string,
    last_error: string,
    attempts: integer,
    first_attempted_at: timestamp,
    last_attempted_at: timestamp
  }
}

Batching and Throughput

Consumer Batching

Queue consumers receive messages in batches for efficiency:

Queue	Max Batch Size	Max Wait Time
z0-webhooks	10	5s
z0-projections	100	1s
z0-fact-processing	50	2s
z0-batch	10	10s

Batch processing pattern:

export default {
  async queue(batch, env) {
    // Process all messages in batch
    const results = await Promise.allSettled(
      batch.messages.map(msg => processMessage(msg, env))
    );

    // Ack successful, retry failed
    for (let i = 0; i < results.length; i++) {
      if (results[i].status === 'fulfilled') {
        batch.messages[i].ack();
      } else {
        batch.messages[i].retry();
      }
    }
  }
};

Throughput Considerations

Metric	Limit	Mitigation
Messages per second	400/queue	Multiple queues, partition by tenant
Consumers per queue	20	Sufficient for most workloads
Message size	128 KB	Use R2 for large payloads, queue references

Large payload pattern:

1. Write payload to R2: r2://z0-payloads/{message_id}
2. Queue message with reference: { payload_ref: "r2://..." }
3. Consumer fetches from R2, processes, deletes

When to Use What

Decision Matrix

Need	Use	Why
Immediate, exactly-once	Durable Object	Single-threaded, transactional
Async, at-least-once	Queue	Decouples, retries, batches
Multi-step with state	Workflow	Durable execution, checkpointing
Fire and forget	Queue	Simplest async pattern
Long-running (>30s)	Workflow	Survives Worker timeout
External system integration	Queue	Isolates external failures
Real-time requirement	Durable Object	Lowest latency

Common Patterns

Webhook delivery: Queue

External system reliability varies
Retries needed
Latency not critical

Routing decision: Durable Object

Must be fast (RTB)
Needs cached state
Single point of consistency

Fact → D1 projection: Queue

D1 slower than DO
Eventual consistency acceptable
Batching improves throughput

CRM sync (multi-step): Workflow

Multiple API calls
Needs to survive failures
State across steps

Budget check: Durable Object

Must be real-time
Affects routing eligibility
Authoritative state

Observability

Metrics

z0_queue_messages_sent_total{queue}
z0_queue_messages_received_total{queue}
z0_queue_messages_acked_total{queue}
z0_queue_messages_retried_total{queue}
z0_queue_dlq_total{queue, reason}
z0_queue_processing_duration_ms{queue}
z0_queue_batch_size{queue}
z0_queue_depth{queue}

Alert Thresholds

Metric	Warning	Critical
DLQ depth	> 0	> 100
Queue depth	> 10,000	> 100,000
Processing latency p99	> 10s	> 60s
Retry rate	> 5%	> 20%

Dashboard Panels

┌─────────────────────────────────────────────────────────────┐
│ Queue Health                                     [Last 1h]  │
├─────────────────────────────────────────────────────────────┤
│ Queue          │ Depth  │ Rate   │ Retry% │ DLQ   │ Lag    │
│ z0-webhooks    │ 45     │ 120/s  │ 2.1%   │ 0     │ 1.2s   │
│ z0-projections │ 1,234  │ 890/s  │ 0.3%   │ 0     │ 0.8s   │
│ z0-fact-proc   │ 89     │ 456/s  │ 1.5%   │ 2     │ 2.1s   │
│ z0-batch       │ 12     │ 5/s    │ 0%     │ 0     │ 4.5s   │
└─────────────────────────────────────────────────────────────┘

Configuration

wrangler.toml

[[queues.producers]]
queue = "z0-webhooks"
binding = "WEBHOOK_QUEUE"

[[queues.consumers]]
queue = "z0-webhooks"
max_batch_size = 10
max_batch_timeout = 5
max_retries = 5
dead_letter_queue = "z0-webhooks-dlq"

[[queues.producers]]
queue = "z0-projections"
binding = "PROJECTION_QUEUE"

[[queues.consumers]]
queue = "z0-projections"
max_batch_size = 100
max_batch_timeout = 1
max_retries = 10
dead_letter_queue = "z0-projections-dlq"

Queue Naming Convention

z0-{purpose}[-{tenant}][-dlq]

Examples:
z0-webhooks          // Platform webhook queue
z0-projections       // D1 projection queue
z0-webhooks-dlq      // Dead letter queue
z0-tenant-abc-batch  // Per-tenant batch queue (if sharding)

Anti-Patterns

Do Not

Anti-Pattern	Problem	Instead
Queue for routing decisions	Too slow for RTB	Use Durable Object
Assume ordering	Queues don’t guarantee order	Design for out-of-order
Large messages (>128KB)	Will be rejected	Use R2 + reference
Blocking on queue in hot path	Adds latency	Fire and forget
Ignoring DLQ	Silent failures	Alert and handle
No idempotency	Duplicate processing	Design idempotent handlers

Common Mistakes

Mistake: Waiting for queue acknowledgment in API response

// BAD: Adds queue latency to API response
await queue.send(message);
await waitForProcessing(message.id);
return { status: 'processed' };

// GOOD: Fire and forget, return immediately
await queue.send(message);
return { status: 'queued', message_id: message.id };

Mistake: Assuming message order

// BAD: Relies on order
messages.forEach(msg => applyDelta(msg.delta));

// GOOD: Handles out-of-order
messages.forEach(msg => upsertByTimestamp(msg));

Summary

Queue	Purpose	Retry	Batch	DLQ
z0-webhooks	External notifications	5x	10	Yes
z0-projections	DO → D1 replication	10x	100	Yes
z0-fact-processing	Derived Fact generation	5x	50	Yes
z0-batch	Bulk operations	3x	10	Yes

Queues isolate background work from the hot path. Use them for anything that can tolerate delay and benefits from retry semantics. Use Durable Objects for real-time, exactly-once operations. Use Workflows for multi-step processes that need durable execution.