Skip to content

Cloudflare Queues

Message queues for reliable async processing.

Prerequisites: PRINCIPLES.md, PRIMITIVES.md


Cloudflare Queues provide durable message queues for asynchronous workloads. In z0, Queues decouple the hot path (routing decisions, Fact writes) from background processing (webhooks, projections, batch operations).

CharacteristicValue
DeliveryAt-least-once
OrderingFIFO per producer (not across producers or retries)
Max message size128 KB
Max batch size100 messages or 256 KB
Retention4 days (unacknowledged)
Visibility timeoutConfigurable (default 30s)

Ordering semantics: Messages from the same producer (e.g., same DO) are delivered in order. Messages from different producers or messages that retry may be delivered out of order. Design handlers to be order-independent or use sequence numbers.

Key Insight: Queues are for work that can tolerate delay and retry. If it must happen immediately and exactly once, use a Durable Object. If it can happen eventually with retries, use a Queue.


When a webhook Config triggers, the delivery is queued—not sent inline.

Fact(webhook_triggered) → Queue → Worker → External endpoint
Fact(webhook_sent) or retry

Why Queue: External endpoints are unreliable. Queuing isolates their latency and failures from the hot path.

Queue: z0-webhooks

Message schema:

{
webhook_config_id: string,
webhook_config_version: integer,
fact_id: string,
fact_type: string,
tenant_id: string,
payload: {}, // Pre-computed payload
attempt: integer, // Retry count
first_attempted_at: timestamp
}

Facts are written to DO ledgers first (source of truth), then projected to D1 for queryability.

DO Ledger → Queue → D1 Writer → D1

Why Queue: D1 writes are slower than DO writes. Queuing allows the hot path to complete while D1 catches up.

Queue: z0-projections

Message schema:

{
fact_id: string,
fact_type: string,
tenant_id: string,
entity_id: string,
timestamp: timestamp,
payload: {} // Full Fact for D1 insert
}

Some Facts trigger downstream work that doesn’t need to block the original write:

  • outcome → Evaluate qualification rules, trigger charges/payouts
  • charge → Update budget state projections
  • contact_merged → Update canonical contact cache
Fact written → Queue → Processor → Derived Facts/State

Queue: z0-fact-processing

Message schema:

{
fact_id: string,
fact_type: string,
subtype: string,
tenant_id: string,
processing_type: string, // qualification, budget_update, merge_resolution
payload: {}
}

Traceability requirement: Webhook Facts must include trace_id to link back to the triggering Fact. See PRIMITIVES.md Fact Traceability Invariants.

Bulk imports, exports, and reconciliation run through Queues to avoid blocking.

API request → Queue → Batch Worker → Many writes

Queue: z0-batch

Message schema:

{
batch_id: string,
operation: string, // import, export, reconcile
tenant_id: string,
items: [], // Work items (or reference to R2 manifest)
offset: integer,
total: integer
}

QueueMax RetriesInitial DelayMax DelayBackoff
z0-webhooks510s5mExponential
z0-projections101s1mExponential
z0-fact-processing55s2mExponential
z0-batch330s5mExponential
On message failure:
if attempt < max_retries:
delay = min(initial_delay * 2^attempt, max_delay)
retry after delay
else:
send to dead letter queue
emit metric: z0_queue_dlq_total{queue, reason}
if critical: alert

At-least-once delivery means handlers MUST be idempotent:

OperationIdempotency Strategy
Webhook deliveryIdempotency key in payload; recipient dedupes
D1 projectionUpsert by fact_id; duplicate = no-op
Fact processingCheck if derived Fact exists before creating
Batch operationsTrack progress in batch state; skip completed items

Messages that exhaust retries go to dead letter queues (DLQs) for inspection and manual intervention.

Primary QueueDead Letter QueueRetention
z0-webhooksz0-webhooks-dlq14 days
z0-projectionsz0-projections-dlq14 days
z0-fact-processingz0-fact-processing-dlq14 days
z0-batchz0-batch-dlq14 days
1. Message lands in DLQ
2. Alert fires: z0_queue_dlq_total > 0
3. Operator inspects message in DLQ dashboard
4. Decision:
a. Fix and replay → Move back to primary queue
b. Skip → Delete from DLQ, record Fact(reconciliation)
c. Escalate → Manual intervention required

When moving to DLQ, add failure context:

{
original_message: {...},
failure: {
reason: string,
last_error: string,
attempts: integer,
first_attempted_at: timestamp,
last_attempted_at: timestamp
}
}

Queue consumers receive messages in batches for efficiency:

QueueMax Batch SizeMax Wait Time
z0-webhooks105s
z0-projections1001s
z0-fact-processing502s
z0-batch1010s

Batch processing pattern:

export default {
async queue(batch, env) {
// Process all messages in batch
const results = await Promise.allSettled(
batch.messages.map(msg => processMessage(msg, env))
);
// Ack successful, retry failed
for (let i = 0; i < results.length; i++) {
if (results[i].status === 'fulfilled') {
batch.messages[i].ack();
} else {
batch.messages[i].retry();
}
}
}
};
MetricLimitMitigation
Messages per second400/queueMultiple queues, partition by tenant
Consumers per queue20Sufficient for most workloads
Message size128 KBUse R2 for large payloads, queue references

Large payload pattern:

1. Write payload to R2: r2://z0-payloads/{message_id}
2. Queue message with reference: { payload_ref: "r2://..." }
3. Consumer fetches from R2, processes, deletes

NeedUseWhy
Immediate, exactly-onceDurable ObjectSingle-threaded, transactional
Async, at-least-onceQueueDecouples, retries, batches
Multi-step with stateWorkflowDurable execution, checkpointing
Fire and forgetQueueSimplest async pattern
Long-running (>30s)WorkflowSurvives Worker timeout
External system integrationQueueIsolates external failures
Real-time requirementDurable ObjectLowest latency

Webhook delivery: Queue

  • External system reliability varies
  • Retries needed
  • Latency not critical

Routing decision: Durable Object

  • Must be fast (RTB)
  • Needs cached state
  • Single point of consistency

Fact → D1 projection: Queue

  • D1 slower than DO
  • Eventual consistency acceptable
  • Batching improves throughput

CRM sync (multi-step): Workflow

  • Multiple API calls
  • Needs to survive failures
  • State across steps

Budget check: Durable Object

  • Must be real-time
  • Affects routing eligibility
  • Authoritative state

z0_queue_messages_sent_total{queue}
z0_queue_messages_received_total{queue}
z0_queue_messages_acked_total{queue}
z0_queue_messages_retried_total{queue}
z0_queue_dlq_total{queue, reason}
z0_queue_processing_duration_ms{queue}
z0_queue_batch_size{queue}
z0_queue_depth{queue}
MetricWarningCritical
DLQ depth> 0> 100
Queue depth> 10,000> 100,000
Processing latency p99> 10s> 60s
Retry rate> 5%> 20%
┌─────────────────────────────────────────────────────────────┐
│ Queue Health [Last 1h] │
├─────────────────────────────────────────────────────────────┤
│ Queue │ Depth │ Rate │ Retry% │ DLQ │ Lag │
│ z0-webhooks │ 45 │ 120/s │ 2.1% │ 0 │ 1.2s │
│ z0-projections │ 1,234 │ 890/s │ 0.3% │ 0 │ 0.8s │
│ z0-fact-proc │ 89 │ 456/s │ 1.5% │ 2 │ 2.1s │
│ z0-batch │ 12 │ 5/s │ 0% │ 0 │ 4.5s │
└─────────────────────────────────────────────────────────────┘

[[queues.producers]]
queue = "z0-webhooks"
binding = "WEBHOOK_QUEUE"
[[queues.consumers]]
queue = "z0-webhooks"
max_batch_size = 10
max_batch_timeout = 5
max_retries = 5
dead_letter_queue = "z0-webhooks-dlq"
[[queues.producers]]
queue = "z0-projections"
binding = "PROJECTION_QUEUE"
[[queues.consumers]]
queue = "z0-projections"
max_batch_size = 100
max_batch_timeout = 1
max_retries = 10
dead_letter_queue = "z0-projections-dlq"
z0-{purpose}[-{tenant}][-dlq]
Examples:
z0-webhooks // Platform webhook queue
z0-projections // D1 projection queue
z0-webhooks-dlq // Dead letter queue
z0-tenant-abc-batch // Per-tenant batch queue (if sharding)

Anti-PatternProblemInstead
Queue for routing decisionsToo slow for RTBUse Durable Object
Assume orderingQueues don’t guarantee orderDesign for out-of-order
Large messages (>128KB)Will be rejectedUse R2 + reference
Blocking on queue in hot pathAdds latencyFire and forget
Ignoring DLQSilent failuresAlert and handle
No idempotencyDuplicate processingDesign idempotent handlers

Mistake: Waiting for queue acknowledgment in API response

// BAD: Adds queue latency to API response
await queue.send(message);
await waitForProcessing(message.id);
return { status: 'processed' };
// GOOD: Fire and forget, return immediately
await queue.send(message);
return { status: 'queued', message_id: message.id };

Mistake: Assuming message order

// BAD: Relies on order
messages.forEach(msg => applyDelta(msg.delta));
// GOOD: Handles out-of-order
messages.forEach(msg => upsertByTimestamp(msg));

QueuePurposeRetryBatchDLQ
z0-webhooksExternal notifications5x10Yes
z0-projectionsDO → D1 replication10x100Yes
z0-fact-processingDerived Fact generation5x50Yes
z0-batchBulk operations3x10Yes

Queues isolate background work from the hot path. Use them for anything that can tolerate delay and benefits from retry semantics. Use Durable Objects for real-time, exactly-once operations. Use Workflows for multi-step processes that need durable execution.