Event-driven architecture is one of those topics where the conference talks make it sound inevitable and the production experience makes you question your life choices. I’ve built event-driven systems that beautifully decoupled complex domains, and I’ve built ones that turned simple operations into untraceable chains of events that nobody could debug. The difference wasn’t the technology. It was knowing when the pattern fit the problem — and critically — when it didn’t. Here’s what you need to know before you put events at the center of your system.
“Data on the outside is immutable. The only thing you can do with data on the outside is interpret it.” — Pat Helland
Pat’s insight underpins all event-driven thinking: events are facts about things that happened. They’re immutable. Subscribers interpret them however they need to. That’s the power — and the complexity.

When to Use Events vs. Request-Response

This is the most important decision in event-driven architecture, and teams get it wrong constantly. Here’s the mental model:
| Signal | Use Request-Response | Use Events |
|---|---|---|
| Caller needs the result to proceed | ✅ Get a balance before authorizing a payment | |
| Caller doesn’t care about the outcome | | ✅ Send a notification after an approval |
| Work must happen before the response | ✅ Validate permissions, check constraints | |
| Work can happen later | | ✅ Update analytics, sync to a third-party system |
| Failure must be immediately visible | ✅ Show “insufficient funds” to the user | |
| Failure can be retried in background | | ✅ Retry sending an email, retry syncing a record |
| One consumer | ✅ Simple function call | |
| Multiple consumers with different concerns | | ✅ Approval triggers accounting, reimbursement, notifications, audit |
The rule: synchronous for reads and validations, asynchronous for side effects and notifications. The power of events shows up when a single action triggers multiple downstream effects. An expense approval, for example, might need to update the accounting ledger, queue a reimbursement, notify the employee, and log an audit trail. Without events, the approval endpoint directly calls four different services and becomes a fragile coordination point. With events, it publishes one expense.approved event, and each downstream system subscribes independently.
If adding a new side effect requires modifying the original endpoint, you need events. If the side effect is a core part of the operation (not optional, not independent), keep it synchronous. The test is: “Would the operation be considered complete without this step?”
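As a minimal in-process sketch of that fan-out, here is the approval flow using Node’s built-in EventEmitter. The payload fields and subscriber names are illustrative, not from a real codebase:

```typescript
import { EventEmitter } from "node:events";

// Illustrative payload shape; field names are assumptions.
interface ExpenseApprovedEvent {
  expenseId: string;
  amount: number;
}

const bus = new EventEmitter();
const effects: string[] = [];

// Each downstream concern subscribes independently. Adding a fifth
// subscriber never touches the approval code below.
bus.on("expense.approved", (e: ExpenseApprovedEvent) => effects.push(`ledger:${e.expenseId}`));
bus.on("expense.approved", (e: ExpenseApprovedEvent) => effects.push(`reimburse:${e.expenseId}`));
bus.on("expense.approved", (e: ExpenseApprovedEvent) => effects.push(`notify:${e.expenseId}`));
bus.on("expense.approved", (e: ExpenseApprovedEvent) => effects.push(`audit:${e.expenseId}`));

// The endpoint publishes one event instead of calling four services.
function approveExpense(expenseId: string, amount: number): void {
  // ...record the approval itself synchronously first...
  bus.emit("expense.approved", { expenseId, amount });
}

approveExpense("exp-42", 125.5);
```

In production the bus would be a broker rather than an in-process emitter, but the shape of the decoupling is the same: the publisher knows nothing about its consumers.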

Message Broker Comparison

Choosing a broker is a consequential decision. Here’s an honest comparison from production experience:
| Feature | Redis Pub/Sub | BullMQ (Redis) | Amazon SQS | Apache Kafka |
|---|---|---|---|---|
| Delivery guarantee | At-most-once (fire & forget) | At-least-once (with retries) | At-least-once | At-least-once / exactly-once |
| Persistence | None — miss it, lose it | Redis-backed job storage | Up to 14 days retention | Configurable (days to forever) |
| Ordering | Per-channel | Per-queue (FIFO) | FIFO queues available | Per-partition |
| Complexity | Low | Medium | Medium | High |
| Dead letter queue | DIY | Built-in | Built-in | DIY or Kafka Streams |
| Best for | Real-time notifications, cache invalidation | Background jobs, retries, scheduled work | Serverless / AWS-native stacks | Event streaming, log aggregation, replay |
My default stack: BullMQ for background jobs and task queues, SQS for cross-service communication, Redis Pub/Sub only for real-time UI updates where message loss is acceptable (typing indicators, presence).
Redis Pub/Sub has no persistence. If a subscriber is offline when a message is published, that message is gone forever. I’ve seen teams use Redis Pub/Sub for critical business events and lose data during routine deployments. Use BullMQ or SQS for anything that can’t be lost.
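To make concrete what “at-least-once with retries” plus a dead-letter queue buys you, here is a hand-rolled sketch in plain TypeScript. The names processWithRetries and deadLetters are illustrative; BullMQ and SQS give you this machinery for free:

```typescript
type Handler<T> = (msg: T) => Promise<void>;

// Messages that exhausted their retries, parked for manual inspection.
const deadLetters: unknown[] = [];

async function processWithRetries<T>(
  msg: T,
  handler: Handler<T>,
  maxAttempts = 3,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await handler(msg);
      return true; // processed; at-least-once means this may run more than once
    } catch {
      // swallow and retry; a real queue would also apply backoff here
    }
  }
  deadLetters.push(msg); // retries exhausted: dead-letter instead of losing it
  return false;
}
```

A handler that keeps throwing lands in deadLetters rather than being silently dropped, which is exactly the guarantee Redis Pub/Sub cannot give you.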

CQRS: Simpler Than It Sounds

Command Query Responsibility Segregation sounds academic, but the core idea is practical: separate the models for reading data and writing data. Think of it like a restaurant. The kitchen (write side) has a workflow optimized for preparing food — recipes, ingredient lists, timing. The menu (read side) is optimized for answering questions — what’s available, what does it cost, what’s popular. They’re both about the same food, but they’re structured completely differently because they serve different purposes.
| Aspect | Write Model (Commands) | Read Model (Queries) |
|---|---|---|
| Optimized for | Enforcing business rules, maintaining consistency | Answering questions quickly |
| Shape | Normalized, domain-driven | Denormalized, UI-driven |
| Validation | Heavy — permissions, constraints, invariants | Light — just fetch and return |
| Storage | Primary database, transactional | Can be a separate DB, materialized view, or cache |
| Example | `ApproveExpense(expenseId, approverId)` | `GetExpenseSummaryByMonth(orgId, month)` |
You don’t need to go full CQRS with separate databases to benefit from the principle. Even separating your DTOs into CreateExpenseDto (command) and ExpenseResponseDto (query) is a form of CQRS — and often all you need.
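In TypeScript, that lightweight form of CQRS is just two shapes and a mapper. The field names here are illustrative, not from a real schema:

```typescript
// Command side: only the fields a caller is allowed to set.
interface CreateExpenseDto {
  amount: number;
  currency: string;
  description: string;
}

// Query side: denormalized for the UI, including derived display fields.
interface ExpenseResponseDto {
  id: string;
  amount: number;
  currency: string;
  description: string;
  status: "pending" | "approved" | "rejected";
  displayAmount: string; // precomputed so the client never formats money
}

// Hypothetical mapper from the stored record to the read shape.
function toResponseDto(
  record: Omit<ExpenseResponseDto, "displayAmount">,
): ExpenseResponseDto {
  return {
    ...record,
    displayAmount: `${record.amount.toFixed(2)} ${record.currency}`,
  };
}
```

The write shape stays strict and minimal; the read shape is free to grow whatever the UI needs, and neither change forces the other to move.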

Eventual Consistency: The Trade-Off You Must Explain

Event-driven systems are eventually consistent. When a manager approves an expense, the approval is recorded immediately, but the reimbursement, the accounting entry, and the analytics update happen asynchronously. For a brief window, different parts of the system disagree about the world’s state. This is fine — as long as everyone knows it. The problem isn’t eventual consistency itself. It’s when the team builds a UI that assumes immediate consistency.
| Action Type | Consistency Strategy | Why |
|---|---|---|
| Low-stakes (mark notification read) | Optimistic update — change UI immediately | Users expect instant response; if it fails, retry silently |
| Medium-stakes (update a profile) | Optimistic + background sync | Show the change immediately, confirm async |
| High-stakes (financial transaction) | Show “processing” state, poll or push for confirmation | Users expect payments to take a moment; accuracy matters more than speed |
The rule: optimistic updates for low-stakes actions, explicit processing states for high-stakes actions. Users expect a payment to take a moment. They don’t expect marking a notification as read to take a moment.
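Here is a minimal sketch of the low-stakes branch, an optimistic update with rollback. The state shape and apiCall are stand-ins, not a real client library:

```typescript
type Notification = { id: string; read: boolean };

// Flip the flag in the UI immediately; revert if the server rejects it.
async function markRead(
  state: Notification[],
  id: string,
  apiCall: () => Promise<void>,
): Promise<Notification[]> {
  const optimistic = state.map((n) => (n.id === id ? { ...n, read: true } : n));
  try {
    await apiCall(); // confirm in the background
    return optimistic;
  } catch {
    return state; // revert to the previous state (or retry silently)
  }
}
```

For the high-stakes row you would do the opposite: return a “processing” status right away and only flip the UI when a confirmation event arrives.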

Idempotency: The Non-Negotiable

In event-driven systems, messages can arrive more than once. Network hiccups, consumer restarts, retry logic — all cause duplicate delivery. Every consumer must be idempotent: processing the same message twice produces the same result as processing it once.
async function handleExpenseApproved(event: ExpenseApprovedEvent) {
  // Fast path: skip events we have already handled
  const existing = await db.reimbursement.findUnique({
    where: { expenseId: event.expenseId },
  });
  if (existing) return; // Already processed — skip

  try {
    await db.reimbursement.create({
      data: {
        expenseId: event.expenseId,
        amount: event.amount,
        idempotencyKey: `reimburse-${event.expenseId}`,
      },
    });
  } catch (err) {
    // Last line of defense: a unique constraint on expenseId catches the
    // race where two consumers pass the check above at the same time.
    // (Prisma reports unique-constraint violations as error code P2002.)
    if (err instanceof Prisma.PrismaClientKnownRequestError && err.code === "P2002") {
      return; // duplicate delivery lost the race; safe to ignore
    }
    throw err;
  }
}
The pattern: check before you act, use a unique constraint as the last line of defense. The database unique index catches the race condition that the application-level check might miss.
Event-driven architecture isn’t inherently better or worse than request-response. It’s a tool that trades simplicity for decoupling. Use it where the decoupling has clear value — cross-team boundaries, unreliable downstream systems, work that doesn’t need to happen synchronously. Keep it out of places where a simple function call would do.
Event-driven systems reward disciplined thinking about data flow, failure modes, and consistency trade-offs. Start small — a single background job queue for slow operations. Graduate to pub/sub when you have genuine multi-consumer needs. Reach for event sourcing only when an immutable audit trail is a regulatory requirement. The best event-driven architectures aren’t the most sophisticated — they’re the ones where every event earns its place.