Cloudflare Queues
Message queues for reliable async processing.
Prerequisites: PRINCIPLES.md, PRIMITIVES.md
Overview
Section titled “Overview”Cloudflare Queues provide durable message queues for asynchronous workloads. In z0, Queues decouple the hot path (routing decisions, Fact writes) from background processing (webhooks, projections, batch operations).
| Characteristic | Value |
|---|---|
| Delivery | At-least-once |
| Ordering | FIFO per producer (not across producers or retries) |
| Max message size | 128 KB |
| Max batch size | 100 messages or 256 KB |
| Retention | 4 days (unacknowledged) |
| Visibility timeout | Configurable (default 30s) |
Ordering semantics: Messages from the same producer (e.g., same DO) are delivered in order. Messages from different producers or messages that retry may be delivered out of order. Design handlers to be order-independent or use sequence numbers.
Key Insight: Queues are for work that can tolerate delay and retry. If it must happen immediately and exactly once, use a Durable Object. If it can happen eventually with retries, use a Queue.
How z0 Uses Queues
Section titled “How z0 Uses Queues”1. Webhook Delivery
Section titled “1. Webhook Delivery”When a webhook Config triggers, the delivery is queued—not sent inline.
Fact(webhook_triggered) → Queue → Worker → External endpoint ↓ Fact(webhook_sent) or retryWhy Queue: External endpoints are unreliable. Queuing isolates their latency and failures from the hot path.
Queue: z0-webhooks
Message schema:
{ webhook_config_id: string, webhook_config_version: integer, fact_id: string, fact_type: string, tenant_id: string, payload: {}, // Pre-computed payload attempt: integer, // Retry count first_attempted_at: timestamp}2. D1 Projection Updates
Section titled “2. D1 Projection Updates”Facts are written to DO ledgers first (source of truth), then projected to D1 for queryability.
DO Ledger → Queue → D1 Writer → D1Why Queue: D1 writes are slower than DO writes. Queuing allows the hot path to complete while D1 catches up.
Queue: z0-projections
Message schema:
{ fact_id: string, fact_type: string, tenant_id: string, entity_id: string, timestamp: timestamp, payload: {} // Full Fact for D1 insert}3. Async Fact Processing
Section titled “3. Async Fact Processing”Some Facts trigger downstream work that doesn’t need to block the original write:
- outcome → Evaluate qualification rules, trigger charges/payouts
- charge → Update budget state projections
- contact_merged → Update canonical contact cache
Fact written → Queue → Processor → Derived Facts/StateQueue: z0-fact-processing
Message schema:
{ fact_id: string, fact_type: string, subtype: string, tenant_id: string, processing_type: string, // qualification, budget_update, merge_resolution payload: {}}Traceability requirement: Webhook Facts must include trace_id to link back to the triggering Fact. See PRIMITIVES.md Fact Traceability Invariants.
4. Batch Operations
Section titled “4. Batch Operations”Bulk imports, exports, and reconciliation run through Queues to avoid blocking.
API request → Queue → Batch Worker → Many writesQueue: z0-batch
Message schema:
{ batch_id: string, operation: string, // import, export, reconcile tenant_id: string, items: [], // Work items (or reference to R2 manifest) offset: integer, total: integer}Retry Policies
Section titled “Retry Policies”Standard Retry Configuration
Section titled “Standard Retry Configuration”| Queue | Max Retries | Initial Delay | Max Delay | Backoff |
|---|---|---|---|---|
| z0-webhooks | 5 | 10s | 5m | Exponential |
| z0-projections | 10 | 1s | 1m | Exponential |
| z0-fact-processing | 5 | 5s | 2m | Exponential |
| z0-batch | 3 | 30s | 5m | Exponential |
Retry Decision Logic
Section titled “Retry Decision Logic”On message failure: if attempt < max_retries: delay = min(initial_delay * 2^attempt, max_delay) retry after delay else: send to dead letter queue emit metric: z0_queue_dlq_total{queue, reason} if critical: alertIdempotency Requirements
Section titled “Idempotency Requirements”At-least-once delivery means handlers MUST be idempotent:
| Operation | Idempotency Strategy |
|---|---|
| Webhook delivery | Idempotency key in payload; recipient dedupes |
| D1 projection | Upsert by fact_id; duplicate = no-op |
| Fact processing | Check if derived Fact exists before creating |
| Batch operations | Track progress in batch state; skip completed items |
Dead Letter Queues
Section titled “Dead Letter Queues”Messages that exhaust retries go to dead letter queues (DLQs) for inspection and manual intervention.
DLQ Configuration
Section titled “DLQ Configuration”| Primary Queue | Dead Letter Queue | Retention |
|---|---|---|
| z0-webhooks | z0-webhooks-dlq | 14 days |
| z0-projections | z0-projections-dlq | 14 days |
| z0-fact-processing | z0-fact-processing-dlq | 14 days |
| z0-batch | z0-batch-dlq | 14 days |
DLQ Handling
Section titled “DLQ Handling”1. Message lands in DLQ2. Alert fires: z0_queue_dlq_total > 03. Operator inspects message in DLQ dashboard4. Decision: a. Fix and replay → Move back to primary queue b. Skip → Delete from DLQ, record Fact(reconciliation) c. Escalate → Manual intervention requiredDLQ Message Enrichment
Section titled “DLQ Message Enrichment”When moving to DLQ, add failure context:
{ original_message: {...}, failure: { reason: string, last_error: string, attempts: integer, first_attempted_at: timestamp, last_attempted_at: timestamp }}Batching and Throughput
Section titled “Batching and Throughput”Consumer Batching
Section titled “Consumer Batching”Queue consumers receive messages in batches for efficiency:
| Queue | Max Batch Size | Max Wait Time |
|---|---|---|
| z0-webhooks | 10 | 5s |
| z0-projections | 100 | 1s |
| z0-fact-processing | 50 | 2s |
| z0-batch | 10 | 10s |
Batch processing pattern:
export default { async queue(batch, env) { // Process all messages in batch const results = await Promise.allSettled( batch.messages.map(msg => processMessage(msg, env)) );
// Ack successful, retry failed for (let i = 0; i < results.length; i++) { if (results[i].status === 'fulfilled') { batch.messages[i].ack(); } else { batch.messages[i].retry(); } } }};Throughput Considerations
Section titled “Throughput Considerations”| Metric | Limit | Mitigation |
|---|---|---|
| Messages per second | 400/queue | Multiple queues, partition by tenant |
| Consumers per queue | 20 | Sufficient for most workloads |
| Message size | 128 KB | Use R2 for large payloads, queue references |
Large payload pattern:
1. Write payload to R2: r2://z0-payloads/{message_id}2. Queue message with reference: { payload_ref: "r2://..." }3. Consumer fetches from R2, processes, deletesWhen to Use What
Section titled “When to Use What”Decision Matrix
Section titled “Decision Matrix”| Need | Use | Why |
|---|---|---|
| Immediate, exactly-once | Durable Object | Single-threaded, transactional |
| Async, at-least-once | Queue | Decouples, retries, batches |
| Multi-step with state | Workflow | Durable execution, checkpointing |
| Fire and forget | Queue | Simplest async pattern |
| Long-running (>30s) | Workflow | Survives Worker timeout |
| External system integration | Queue | Isolates external failures |
| Real-time requirement | Durable Object | Lowest latency |
Common Patterns
Section titled “Common Patterns”Webhook delivery: Queue
- External system reliability varies
- Retries needed
- Latency not critical
Routing decision: Durable Object
- Must be fast (RTB)
- Needs cached state
- Single point of consistency
Fact → D1 projection: Queue
- D1 slower than DO
- Eventual consistency acceptable
- Batching improves throughput
CRM sync (multi-step): Workflow
- Multiple API calls
- Needs to survive failures
- State across steps
Budget check: Durable Object
- Must be real-time
- Affects routing eligibility
- Authoritative state
Observability
Section titled “Observability”Metrics
Section titled “Metrics”z0_queue_messages_sent_total{queue}z0_queue_messages_received_total{queue}z0_queue_messages_acked_total{queue}z0_queue_messages_retried_total{queue}z0_queue_dlq_total{queue, reason}z0_queue_processing_duration_ms{queue}z0_queue_batch_size{queue}z0_queue_depth{queue}Alert Thresholds
Section titled “Alert Thresholds”| Metric | Warning | Critical |
|---|---|---|
| DLQ depth | > 0 | > 100 |
| Queue depth | > 10,000 | > 100,000 |
| Processing latency p99 | > 10s | > 60s |
| Retry rate | > 5% | > 20% |
Dashboard Panels
Section titled “Dashboard Panels”┌─────────────────────────────────────────────────────────────┐│ Queue Health [Last 1h] │├─────────────────────────────────────────────────────────────┤│ Queue │ Depth │ Rate │ Retry% │ DLQ │ Lag ││ z0-webhooks │ 45 │ 120/s │ 2.1% │ 0 │ 1.2s ││ z0-projections │ 1,234 │ 890/s │ 0.3% │ 0 │ 0.8s ││ z0-fact-proc │ 89 │ 456/s │ 1.5% │ 2 │ 2.1s ││ z0-batch │ 12 │ 5/s │ 0% │ 0 │ 4.5s │└─────────────────────────────────────────────────────────────┘Configuration
Section titled “Configuration”wrangler.toml
Section titled “wrangler.toml”[[queues.producers]]queue = "z0-webhooks"binding = "WEBHOOK_QUEUE"
[[queues.consumers]]queue = "z0-webhooks"max_batch_size = 10max_batch_timeout = 5max_retries = 5dead_letter_queue = "z0-webhooks-dlq"
[[queues.producers]]queue = "z0-projections"binding = "PROJECTION_QUEUE"
[[queues.consumers]]queue = "z0-projections"max_batch_size = 100max_batch_timeout = 1max_retries = 10dead_letter_queue = "z0-projections-dlq"Queue Naming Convention
Section titled “Queue Naming Convention”z0-{purpose}[-{tenant}][-dlq]
Examples:z0-webhooks // Platform webhook queuez0-projections // D1 projection queuez0-webhooks-dlq // Dead letter queuez0-tenant-abc-batch // Per-tenant batch queue (if sharding)Anti-Patterns
Section titled “Anti-Patterns”Do Not
Section titled “Do Not”| Anti-Pattern | Problem | Instead |
|---|---|---|
| Queue for routing decisions | Too slow for RTB | Use Durable Object |
| Assume ordering | Queues don’t guarantee order | Design for out-of-order |
| Large messages (>128KB) | Will be rejected | Use R2 + reference |
| Blocking on queue in hot path | Adds latency | Fire and forget |
| Ignoring DLQ | Silent failures | Alert and handle |
| No idempotency | Duplicate processing | Design idempotent handlers |
Common Mistakes
Section titled “Common Mistakes”Mistake: Waiting for queue acknowledgment in API response
// BAD: Adds queue latency to API responseawait queue.send(message);await waitForProcessing(message.id);return { status: 'processed' };
// GOOD: Fire and forget, return immediatelyawait queue.send(message);return { status: 'queued', message_id: message.id };Mistake: Assuming message order
// BAD: Relies on ordermessages.forEach(msg => applyDelta(msg.delta));
// GOOD: Handles out-of-ordermessages.forEach(msg => upsertByTimestamp(msg));Summary
Section titled “Summary”| Queue | Purpose | Retry | Batch | DLQ |
|---|---|---|---|---|
| z0-webhooks | External notifications | 5x | 10 | Yes |
| z0-projections | DO → D1 replication | 10x | 100 | Yes |
| z0-fact-processing | Derived Fact generation | 5x | 50 | Yes |
| z0-batch | Bulk operations | 3x | 10 | Yes |
Queues isolate background work from the hot path. Use them for anything that can tolerate delay and benefits from retry semantics. Use Durable Objects for real-time, exactly-once operations. Use Workflows for multi-step processes that need durable execution.