Event-driven architecture is one of those topics where the conference talks make it sound inevitable and the production experience makes you question your life choices. I’ve built event-driven systems that beautifully decoupled complex domains, and I’ve built ones that turned simple operations into untraceable chains of events that nobody could debug.
The difference wasn’t the technology. It was knowing when the pattern fit the problem and, critically, when it didn’t. Here’s what you need to know before you put events at the center of your system.
“Data on the outside is immutable. The only thing you can do with data on the outside is interpret it.” — Pat Helland
Pat’s insight underpins all event-driven thinking: events are facts about things that happened. They’re immutable. Subscribers interpret them however they need to. That’s the power — and the complexity.
When to Use Events vs. Request-Response
This is the most important decision in event-driven architecture, and teams get it wrong constantly. Here’s the mental model:
| Signal | Use Request-Response | Use Events |
|---|---|---|
| Caller needs the result to proceed | ✅ Get a balance before authorizing a payment | |
| Caller doesn’t care about the outcome | | ✅ Send a notification after an approval |
| Work must happen before the response | ✅ Validate permissions, check constraints | |
| Work can happen later | | ✅ Update analytics, sync to a third-party system |
| Failure must be immediately visible | ✅ Show “insufficient funds” to the user | |
| Failure can be retried in background | | ✅ Retry sending an email, retry syncing a record |
| One consumer | ✅ Simple function call | |
| Multiple consumers with different concerns | | ✅ Approval triggers accounting, reimbursement, notifications, audit |
The rule: synchronous for reads and validations, asynchronous for side effects and notifications.
The power of events shows up when a single action triggers multiple downstream effects. An expense approval, for example, might need to update the accounting ledger, queue a reimbursement, notify the employee, and log an audit trail. Without events, the approval endpoint directly calls four different services and becomes a fragile coordination point. With events, it publishes one expense.approved event, and each downstream system subscribes independently.
If adding a new side effect requires modifying the original endpoint, you need events. If the side effect is a core part of the operation (not optional, not independent), keep it synchronous. The test is: “Would the operation be considered complete without this step?”
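The fan-out described above can be sketched with an in-process event bus. In production this would be a real broker, but the shape is identical; the names here (`bus`, `approveExpense`, the subscriber list) are illustrative, not a prescribed API.

```typescript
import { EventEmitter } from "node:events";

interface ExpenseApprovedEvent {
  expenseId: string;
  amount: number;
}

// In-process stand-in for a real broker (SQS, a BullMQ worker, etc.)
const bus = new EventEmitter();
const effects: string[] = [];

// Each downstream concern subscribes independently; the approval
// endpoint never knows these exist.
bus.on("expense.approved", (e: ExpenseApprovedEvent) => effects.push(`ledger:${e.expenseId}`));
bus.on("expense.approved", (e: ExpenseApprovedEvent) => effects.push(`reimburse:${e.amount}`));
bus.on("expense.approved", (e: ExpenseApprovedEvent) => effects.push(`notify:${e.expenseId}`));
bus.on("expense.approved", (e: ExpenseApprovedEvent) => effects.push(`audit:${e.expenseId}`));

// The approval endpoint publishes one fact and is done.
function approveExpense(expenseId: string, amount: number): void {
  // ...validate and persist the approval synchronously first...
  bus.emit("expense.approved", { expenseId, amount });
}

approveExpense("exp_42", 125);
```

Adding a fifth side effect is one more `bus.on(...)` call; the endpoint itself stays untouched.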
Message Broker Comparison
Choosing a broker is a consequential decision. Here’s an honest comparison from production experience:
| Feature | Redis Pub/Sub | BullMQ (Redis) | Amazon SQS | Apache Kafka |
|---|---|---|---|---|
| Delivery guarantee | At-most-once (fire & forget) | At-least-once (with retries) | At-least-once | At-least-once / exactly-once |
| Persistence | None — miss it, lose it | Redis-backed job storage | Up to 14-day retention | Configurable (days to forever) |
| Ordering | Per-channel | Per-queue (FIFO) | FIFO queues available | Per-partition |
| Complexity | Low | Medium | Medium | High |
| Dead letter queue | DIY | Built-in | Built-in | DIY or Kafka Streams |
| Best for | Real-time notifications, cache invalidation | Background jobs, retries, scheduled work | Serverless / AWS-native stacks | Event streaming, log aggregation, replay |
My default stack: BullMQ for background jobs and task queues, SQS for cross-service communication, Redis Pub/Sub only for real-time UI updates where message loss is acceptable (typing indicators, presence).
Redis Pub/Sub has no persistence. If a subscriber is offline when a message is published, that message is gone forever. I’ve seen teams use Redis Pub/Sub for critical business events and lose data during routine deployments. Use BullMQ or SQS for anything that can’t be lost.
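What BullMQ and SQS buy you over bare Pub/Sub is exactly this retry-then-dead-letter behavior. Here is a minimal sketch of that loop in plain TypeScript, with illustrative names (`runWithRetries`, `deadLetter`); a real broker would additionally wait between attempts with backoff.

```typescript
const MAX_ATTEMPTS = 3;
const deadLetter: { job: string; error: string }[] = [];

// At-least-once processing: retry on failure, park the job in a
// dead-letter store once retries are exhausted.
function runWithRetries(job: string, handler: (job: string) => void): boolean {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      handler(job);
      return true; // success, stop retrying
    } catch (err) {
      if (attempt === MAX_ATTEMPTS) {
        deadLetter.push({ job, error: String(err) });
      }
      // A real broker would pause here (e.g. exponential backoff)
    }
  }
  return false;
}

// A flaky handler: fails twice, then succeeds on the third attempt.
let calls = 0;
const succeeded = runWithRetries("send-email", () => {
  if (++calls < 3) throw new Error("SMTP timeout");
});

// A handler that always fails ends up in the dead-letter store.
const failed = runWithRetries("sync-crm", () => {
  throw new Error("upstream 500");
});
```

The dead-letter store is what turns "the job failed at 3 a.m." into something you can inspect and replay, rather than data that silently vanished.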
CQRS: Simpler Than It Sounds
Command Query Responsibility Segregation sounds academic, but the core idea is practical: separate the models for reading data and writing data.
Think of it like a restaurant. The kitchen (write side) has a workflow optimized for preparing food — recipes, ingredient lists, timing. The menu (read side) is optimized for answering questions — what’s available, what does it cost, what’s popular. They’re both about the same food, but they’re structured completely differently because they serve different purposes.
| Aspect | Write Model (Commands) | Read Model (Queries) |
|---|---|---|
| Optimized for | Enforcing business rules, maintaining consistency | Answering questions quickly |
| Shape | Normalized, domain-driven | Denormalized, UI-driven |
| Validation | Heavy — permissions, constraints, invariants | Light — just fetch and return |
| Storage | Primary database, transactional | Can be a separate DB, materialized view, or cache |
| Example | ApproveExpense(expenseId, approverId) | GetExpenseSummaryByMonth(orgId, month) |
You don’t need to go full CQRS with separate databases to benefit from the principle. Even separating your DTOs into CreateExpenseDto (command) and ExpenseResponseDto (query) is a form of CQRS — and often all you need.
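A sketch of that lightweight split follows. The field names are illustrative, but the asymmetry is the point: the command carries only what the business rule needs, while the response is pre-shaped for the screen.

```typescript
// Command side: minimal input, validated heavily before writing.
interface CreateExpenseDto {
  amount: number;
  currency: string;
  categoryId: string;
}

// Query side: denormalized and UI-driven; joins are already resolved.
interface ExpenseResponseDto {
  id: string;
  amount: number;
  currency: string;
  categoryName: string; // resolved from categoryId
  status: "pending" | "approved" | "rejected";
}

// Read-model mapper: the shape follows the screen, not the schema.
function toResponseDto(row: {
  id: string;
  amount: number;
  currency: string;
  category: { name: string };
  status: "pending" | "approved" | "rejected";
}): ExpenseResponseDto {
  return {
    id: row.id,
    amount: row.amount,
    currency: row.currency,
    categoryName: row.category.name,
    status: row.status,
  };
}

const dto = toResponseDto({
  id: "exp_1",
  amount: 50,
  currency: "USD",
  category: { name: "Travel" },
  status: "approved",
});
```

Notice that `categoryId` never appears in the response: the read model answers the UI's question directly instead of making the client re-join.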
Eventual Consistency: The Trade-Off You Must Explain
Event-driven systems are eventually consistent. When a manager approves an expense, the approval is recorded immediately, but the reimbursement, the accounting entry, and the analytics update happen asynchronously. For a brief window, different parts of the system disagree about the world’s state.
This is fine — as long as everyone knows it. The problem isn’t eventual consistency itself. It’s when the team builds a UI that assumes immediate consistency.
| Action Type | Consistency Strategy | Why |
|---|---|---|
| Low-stakes (mark notification read) | Optimistic update — change UI immediately | Users expect instant response; if it fails, retry silently |
| Medium-stakes (update a profile) | Optimistic + background sync | Show the change immediately, confirm async |
| High-stakes (financial transaction) | Show “processing” state, poll or push for confirmation | Users expect payments to take a moment; accuracy matters more than speed |
The rule: optimistic updates for low-stakes actions, explicit processing states for high-stakes actions. Users expect a payment to take a moment. They don’t expect marking a notification as read to take a moment.
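The low-stakes row from the table can be sketched as an optimistic update with a silent rollback; `markRead` and the injected `save` function are illustrative names, not a specific framework API.

```typescript
interface Notification {
  id: string;
  read: boolean;
}

// Optimistic update: flip the UI state immediately, roll back only if
// the server rejects the change. A real app would also retry silently.
async function markRead(
  n: Notification,
  save: (id: string) => Promise<boolean>,
): Promise<boolean> {
  const previous = n.read;
  n.read = true; // user sees the change instantly
  const ok = await save(n.id);
  if (!ok) n.read = previous; // silent rollback on failure
  return ok;
}
```

For the high-stakes row, invert this: keep the UI in an explicit "processing" state and only render success once the confirmation event arrives.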
Idempotency: The Non-Negotiable
In event-driven systems, messages can arrive more than once. Network hiccups, consumer restarts, retry logic — all cause duplicate delivery. Every consumer must be idempotent: processing the same message twice produces the same result as processing it once.
```typescript
async function handleExpenseApproved(event: ExpenseApprovedEvent) {
  // First line of defense: application-level check
  const existing = await db.reimbursement.findUnique({
    where: { expenseId: event.expenseId },
  });
  if (existing) return; // already processed, skip

  try {
    await db.reimbursement.create({
      data: {
        expenseId: event.expenseId,
        amount: event.amount,
        idempotencyKey: `reimburse-${event.expenseId}`,
      },
    });
  } catch (err: any) {
    // Second line of defense: a unique index on expenseId rejects the
    // duplicate a concurrent consumer slipped past the check above
    // (P2002 is Prisma's unique-constraint violation code)
    if (err?.code === "P2002") return;
    throw err;
  }
}
```
The pattern: check before you act, use a unique constraint as the last line of defense. The database unique index catches the race condition that the application-level check might miss.
Event-driven architecture isn’t inherently better or worse than request-response. It’s a tool that trades simplicity for decoupling. Use it where the decoupling has clear value — cross-team boundaries, unreliable downstream systems, work that doesn’t need to happen synchronously. Keep it out of places where a simple function call would do.
Event-driven systems reward disciplined thinking about data flow, failure modes, and consistency trade-offs. Start small — a single background job queue for slow operations. Graduate to pub/sub when you have genuine multi-consumer needs. Reach for event sourcing only when an immutable audit trail is a regulatory requirement. The best event-driven architectures aren’t the most sophisticated — they’re the ones where every event earns its place.