In tropical forests, there’s a tree called the strangler fig. It starts life as a seed dropped by a bird onto the branch of a host tree. It sends roots down to the ground, wraps around the host, and over years — sometimes decades — gradually replaces it. The host tree decays and eventually disappears, leaving the fig standing in its place. From the outside, the tree looks the same the entire time. It’s only when you cut it open that you see the old wood is gone.
That’s exactly how you should migrate a legacy system. Not with a dramatic rewrite. Not with a “big bang” launch weekend. Slowly, patiently, one piece at a time — until one day the old system is gone and nobody noticed the transition.
“When you do a big-bang rewrite, the only thing you’re guaranteed of is a big bang.” — Martin Fowler
Fowler coined the strangler fig pattern for software for a reason. He’d seen the same story play out at organization after organization. The rewrite is always “six months, tops.” It’s never six months.
Why Big-Bang Rewrites Fail
Before we get to the incremental approach, let me tell you what almost always gets proposed instead: “Let’s just rebuild it from scratch.”
The pitch is compelling. The legacy system is a tangled mess — business logic baked into route handlers, no test coverage, a database schema that grew organically into something resembling a plate of spaghetti. The engineering team is frustrated. “Six months and we’ll have a clean, modern system.”
I’ve heard this pitch on multiple projects. I’ve watched a rewrite attempt run for 18 months, consume a team of 8, and get cancelled at 60% completion because the original system kept evolving during the rewrite. The rewrite was chasing a moving target, and it never caught up.
Big-bang rewrites fail for predictable reasons:
| Failure Mode | Why It Happens | How Often |
|---|---|---|
| Moving target | Business doesn’t stop — new features keep getting added to the old system while you rewrite | Almost always |
| Hidden knowledge | That weird `if` on line 347 handles an undocumented edge case. You won’t discover it until production. | Very common |
| Motivation collapse | Month 8, nothing shipped. The team is demoralized. Leadership is asking hard questions. | Common |
| Testing gap | The old system has been tested by years of real traffic. The new system has your unit tests. The gap is enormous. | Always |
| Two-system maintenance | Someone proposes “let’s ship what we have” — now you maintain both systems indefinitely | More often than you’d think |
If someone proposes a big-bang rewrite of a system actively serving production traffic, your default answer should be “no.” The burden of proof is on the rewrite to explain why incremental migration won’t work — not the other way around.
The Strangler Fig Approach
The pattern has three phases, and each one is designed to be safe and reversible:
Phase 1: The Proxy. Place a routing layer in front of the legacy system. It can direct traffic to either the old system or the new one, endpoint by endpoint. From the client’s perspective, nothing changes — same URLs, same request shapes, same authentication. The proxy is invisible.
Phase 2: Migrate Incrementally. Move one feature at a time to the new system. Start with the lowest-risk, highest-signal endpoint. Route its traffic through the proxy to the modern implementation. Leave everything else pointing at the legacy system.
Phase 3: Decommission. When all traffic flows through the new system and the legacy system serves zero requests, keep it running as a rollback target for a comfortable period. Then turn it off.
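To make Phase 1 concrete, here’s a minimal sketch of the kind of routing table a proxy might use. The endpoint prefixes, backend URLs, and the `backend_for` helper are illustrative assumptions, not details from any real system. Migration is then just moving entries from `"legacy"` to `"modern"`, one at a time:

```python
# Map each endpoint prefix to the system that currently owns it.
# (All names and URLs here are hypothetical examples.)
ROUTES = {
    "/api/auth": "modern",    # migrated first: low risk, high signal
    "/api/orders": "legacy",  # not migrated yet
    "/api/reports": "legacy",
}

BACKENDS = {
    "legacy": "http://legacy.internal:8080",
    "modern": "http://modern.internal:9090",
}

def backend_for(path: str) -> str:
    """Pick the backend for a request path; unknown paths default to legacy."""
    for prefix, system in ROUTES.items():
        if path.startswith(prefix):
            return BACKENDS[system]
    return BACKENDS["legacy"]
```

Defaulting unknown paths to the legacy system is the safe choice during migration: anything you forgot to map keeps working exactly as before.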
Controlling the Migration with Feature Flags
The real power comes from gradual rollout. Don’t flip an endpoint from legacy to modern all at once. Use feature flags to shift traffic gradually:
| Week | Traffic to Modern System | What You’re Watching |
|---|---|---|
| 1 | Internal team only | Error rates, response times, data consistency |
| 2 | 5% of users (smallest accounts) | Edge cases, query performance at small scale |
| 3 | 25% of users | Database load, cache hit rates |
| 4 | 50% of users | Aggregate metrics vs. legacy baseline |
| 5 | 90% of users | Long-tail edge cases |
| 6 | 100% | Monitoring period, legacy route removed |
At any point, if you see errors or performance degradation, flip the flag back. The rollback is instantaneous. This is the psychological advantage of the strangler fig — every step is reversible, so the team moves with confidence instead of anxiety.
Start with your internal team, then your smallest or newest users. They have the least data, the fewest edge cases, and the highest tolerance for issues. Migrate the largest, most complex accounts last — by then you’ve shaken out most bugs.
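One common way to implement this kind of percentage rollout is deterministic user bucketing: hash each user ID to a stable number from 0 to 99 and compare it against the current rollout percentage. The sketch below invents the `bucket` and `use_modern` names for illustration; it is one simple way to do this, not a prescribed implementation:

```python
import hashlib

def bucket(user_id: str) -> int:
    """Deterministically map a user to a number in 0-99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100

def use_modern(user_id: str, rollout_percent: int, internal_users: set) -> bool:
    """Decide whether this request goes to the modern system."""
    if user_id in internal_users:
        return True  # internal team is always on the modern path (week 1)
    return bucket(user_id) < rollout_percent
```

Because the hash is deterministic, a user who lands on the modern system stays there as the percentage ramps up, and rolling back is simply lowering `rollout_percent`, with no deploy required.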
Data Migration: The Hardest Part
Code migration is straightforward compared to data migration. Code is stateless — deploy a new version, roll back if needed. Data is stateful — once you’ve written to a new schema, rolling back means migrating data backwards.
The proven approach is dual-write with reconciliation:
- The new system writes to the new database (source of truth going forward)
- It also writes to the legacy database (keeps the legacy system consistent during transition)
- A background job migrates historical data in batches
- A reconciliation job compares data between old and new, flagging discrepancies
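The dual-write step above can be sketched as follows, assuming the new database is the source of truth and legacy writes are best-effort. The `new_db` and `legacy_db` objects stand in for real database clients, and the failure policy shown (log and continue on a legacy write failure) is one common choice, not the only one:

```python
import logging

log = logging.getLogger("migration")

def save_order(order: dict, new_db, legacy_db) -> None:
    # New database is the source of truth: a failure here is a real failure
    # and should propagate to the caller.
    new_db.insert("orders", order)
    try:
        # Best-effort mirror write keeps the legacy system consistent
        # for anything still reading from it during the transition.
        legacy_db.insert("orders", order)
    except Exception:
        # Don't fail the request; the reconciliation job will catch the gap.
        log.exception("legacy mirror write failed for order %s", order.get("id"))
```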
The reconciliation job is where you’ll discover every undocumented assumption in the legacy system — timezone conversions, currency rounding differences, nullable fields handled inconsistently. Budget twice as long as you think for data migration. In one project I worked on, the code migration took 3 months. The data migration and reconciliation took 4.
Start the reconciliation job early — ideally in the first month. If you wait until you think the data migration is “done,” you’ll discover mapping bugs much later when they’re harder and more expensive to fix.
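A reconciliation job can start out very simple: compare records by ID, field by field, and collect discrepancies for a human to investigate. The function below is a minimal illustration with invented names; a production job would work in batches and tolerate in-flight writes:

```python
def reconcile(legacy_rows: dict, modern_rows: dict, fields: list) -> list:
    """Compare records keyed by ID; return (id, reason) discrepancies."""
    issues = []
    for rid, old in legacy_rows.items():
        new = modern_rows.get(rid)
        if new is None:
            issues.append((rid, "missing in modern system"))
            continue
        for f in fields:
            if old.get(f) != new.get(f):
                issues.append((rid, f"field {f!r}: {old.get(f)!r} != {new.get(f)!r}"))
    # Records that only exist in the modern system are also suspicious.
    for rid in modern_rows.keys() - legacy_rows.keys():
        issues.append((rid, "missing in legacy system"))
    return issues
```

Even a crude comparison like this, run early, surfaces the timezone and rounding mismatches while the mapping code is still fresh in everyone’s head.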
The Emotional Side: Convincing Stakeholders
The hardest part of a strangler fig migration isn’t technical. It’s political.
Product managers want features, not infrastructure work. Leadership wants to know why engineers are spending months on something that doesn’t add new capabilities. The legacy system “works fine” — from the outside.
Here’s the framing that works:
| Don’t Say | Say Instead |
|---|---|
| “We need to rewrite the system” | “We’re incrementally modernizing — new features ship alongside migration work” |
| “The codebase is terrible” | “Every new feature takes 3x longer than it should. 40% of last quarter was spent on workarounds.” |
| “Trust us, it’ll be worth it” | “Here’s a dashboard showing migration progress — X% of traffic is on the modern system this week” |
| “We need a migration quarter” | “Each sprint delivers customer value AND migration progress” |
Ship something to the modern system within the first two weeks — even if it’s a simple, low-risk endpoint. Demonstrating that the approach works builds confidence faster than any slide deck.
A Typical Timeline
Every migration is different, but here’s a realistic shape:
| Phase | Duration | What Happens |
|---|---|---|
| Setup | 2 weeks | Proxy layer, feature flags, CI/CD for the new service |
| First endpoint | 2 weeks | Migrate something low-risk, high-signal (like authentication) |
| Core CRUD | 2 months | Migrate the 5 most-used API modules |
| Complex logic | 3 months | Payment processing, reporting, integrations |
| Data migration | 2 months | Historical data sync, reconciliation, validation |
| Long tail | 1 month | Edge cases, legacy-only features, cleanup |
| Decommission | 2 weeks | Remove legacy routes, archive old code |
The legacy system should run in production for a month or more after you think you’re “done” — serving zero traffic but available as a rollback. Decommission it when you’re confident the new system handles every edge case.
The strangler fig is patient. It doesn’t rush. It wraps around the host tree slowly, replacing one piece at a time, until one day the old tree is gone and nobody noticed the transition. That’s the migration you want — one where the biggest compliment is “wait, that’s done already?”