Monolith vs. Microservices: How We Made the Decision
Our team's actual decision-making process for whether to break up a Rails monolith. Spoiler: we didn't go full microservices.

The microservices conversation started because our deploys were taking 45 minutes. That's how long it took for our CI pipeline to run tests, build the artifact, and push to production. Forty-five minutes. For a single service. Every time someone merged a PR.
We have about 45 developers across 6 teams, all committing to the same Rails monolith. The monolith handles authentication, order processing, inventory management, notifications, and a few other things. It started as a small app years ago and grew the way monoliths always grow: features got added, teams got bigger, and eventually what was a clean little codebase became this massive thing that nobody fully understands anymore.
The deploy time was the most visible symptom, but it wasn't the only one. Teams were stepping on each other's database migrations. A change to the Order model would break a test in the Notification module that nobody on the Orders team knew about. Code reviews took forever because the PR touched files from multiple domains and it wasn't clear who should approve what.
Something needed to change. The question was: what kind of change?
The Obvious Answer (That We Didn't Take)
The obvious answer, the one everyone suggests on Hacker News, is microservices. Break the monolith into independent services. Each team owns their service, deploys independently, picks their own tech stack. Clean boundaries. Independent scaling. Total autonomy.
It sounds great. And for some organizations, it works well. Netflix famously runs hundreds of microservices. Amazon does too. But both of those companies have thousands of engineers and dedicated platform teams whose entire job is maintaining the infrastructure that makes microservices possible.
We have 45 engineers and zero platform team members. That detail turned out to be pretty important.
What Microservices Actually Require
Before I talk about what we decided, I want to lay out what adopting microservices would have actually meant for us. Not in theory. In practice. Because I think there's a gap between the conference talk version of microservices and the reality on the ground.
Every function call becomes a network call. In our monolith, when the Orders module needs to check inventory, it calls Inventory.check_stock(product_id). That takes microseconds and never fails due to network issues. With separate services, that becomes an HTTP request or a gRPC call. Now you need to handle timeouts. What if the Inventory service is slow? What if it's down? Do you retry? How many times? With what backoff strategy?
For every inter-service call, you need error handling that didn't exist before. Circuit breakers that detect when a downstream service is struggling and stop sending it traffic. Retry logic with exponential backoff so you don't make a struggling service worse. Timeout configuration that's tuned correctly: too short and you get false failures, too long and one slow service cascades latency to everything upstream.
None of this code existed in our monolith. None of it would need to exist if the services stayed in the same process. It's pure overhead created by the architectural boundary.
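To make that concrete, here's roughly what the inventory check turns into once Inventory sits behind HTTP. This is a minimal sketch, not code we run: the service URL, the InventoryUnavailable error, and the retry limits are assumptions, and a real client would still need circuit breaking on top.
require "net/http"
require "json"

# Hypothetical client for a separate Inventory service. URL, error class,
# and retry limits are illustrative, not from our codebase.
class InventoryClient
  INVENTORY_URL = ENV.fetch("INVENTORY_URL", "http://inventory.internal")
  MAX_RETRIES = 2
  InventoryUnavailable = Class.new(StandardError)

  def check_stock(product_id)
    attempts = 0
    begin
      attempts += 1
      uri = URI("#{INVENTORY_URL}/stock/#{product_id}")
      Net::HTTP.start(uri.host, uri.port, open_timeout: 1, read_timeout: 2) do |http|
        response = http.get(uri.path)
        raise InventoryUnavailable, response.code unless response.is_a?(Net::HTTPSuccess)
        JSON.parse(response.body).fetch("available")
      end
    rescue Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNREFUSED, InventoryUnavailable
      raise if attempts > MAX_RETRIES
      sleep(0.1 * 2**attempts) # exponential backoff before retrying
      retry
    end
  end
end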
Database transactions become distributed problems. Our order creation flow currently looks something like this in simplified form:
ActiveRecord::Base.transaction do
  order = Order.create!(params)
  InventoryItem.decrement!(product_id, quantity)
  Payment.charge!(order.total, customer.payment_method)
  Notification.schedule!(customer, :order_confirmation)
end
If anything fails, the whole transaction rolls back. The inventory doesn't get decremented without a payment. The payment doesn't go through without an order. ACID guarantees. Simple.
With separate services, there's no shared database and no cross-service transaction. You'd need to implement the Saga pattern: a sequence of steps where each service performs its local transaction and publishes an event. If a later step fails, you have to reverse the earlier steps by sending compensating events.
Order service creates the order → publishes OrderCreated event → Inventory service decrements stock → publishes StockReserved event → Payment service charges the card → if payment fails, publishes PaymentFailed event → Inventory service receives that event and restores the stock → Order service receives that event and marks the order as failed.
That's the happy path of the failure path. What if the Payment service crashes after charging the card but before publishing the event? What if the message broker loses a message? What if the compensation event fails?
I'm not saying these problems are unsolvable. They're not. People solve them every day. But the complexity jump from "wrap it in a transaction" to "implement distributed sagas with compensating actions" is enormous. And the failure modes are subtle: you might not discover them until months into production when a specific combination of timing and failures produces an inconsistent state.
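For a sense of what even one compensation step involves, here's a sketch of an Inventory-side handler for PaymentFailed. The event payload, the restore! method, the ProcessedEvent dedup record, and the publish helper are all assumptions for illustration.
# Hypothetical consumer inside the Inventory service. Event shape, queue
# plumbing, and InventoryItem.restore! are illustrative only.
class PaymentFailedHandler
  def handle(event)
    # Handlers must be idempotent: the broker may deliver the same event twice.
    return if ProcessedEvent.exists?(event_id: event.fetch("id"))

    # Compensating action: undo the earlier StockReserved step.
    InventoryItem.restore!(event.fetch("product_id"), event.fetch("quantity"))
    ProcessedEvent.create!(event_id: event.fetch("id"))

    # publish stands in for whatever wrapper talks to the broker.
    publish("StockRestored", order_id: event.fetch("order_id"))
  end
end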
Debugging requires distributed tracing. In the monolith, an error produces a stack trace. You can see exactly which line of code failed, what called it, and what the parameters were. You add a breakpoint, reproduce the issue, and step through the code.
With microservices, a user-facing error might originate in any of the services involved in the request. The error in the API gateway might be caused by a timeout from the Auth service, which was caused by slow queries in the User service, which was caused by a missing database index. Following that chain requires distributed tracing: correlation IDs in every HTTP header, trace collection infrastructure (OpenTelemetry), and a trace visualization tool (Jaeger, Zipkin, Datadog).
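For a sense of the plumbing, the bare minimum is a correlation ID that every service reads from incoming requests, attaches to its logs, and forwards on every outgoing call. A minimal Rack-style sketch, assuming a hand-rolled header rather than full OpenTelemetry context propagation:
require "securerandom"

# Minimal correlation-ID middleware, illustrative only. Real tracing
# (OpenTelemetry) propagates span context and timing, not just an ID.
class CorrelationId
  HEADER = "HTTP_X_CORRELATION_ID"

  def initialize(app)
    @app = app
  end

  def call(env)
    env[HEADER] ||= SecureRandom.uuid
    # Downstream code has to copy env[HEADER] onto every outgoing HTTP
    # call and every log line, or the chain breaks at that hop.
    status, headers, body = @app.call(env)
    headers["X-Correlation-ID"] = env[HEADER]
    [status, headers, body]
  end
end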
Setting up and maintaining that infrastructure is a full project by itself. And without it, debugging cross-service issues is largely guesswork.
The Modular Monolith Option
There's a middle ground that doesn't get as much hype. Instead of splitting into separate services, you can enforce boundaries within the monolith itself. Each domain (Orders, Inventory, Payments, Notifications) becomes a module with an explicit public API. Modules can't directly access each other's database tables; they go through the module's API.
The appeal is that you get organizational boundaries without the operational complexity of distributed systems. Teams can work independently on their modules. The module APIs serve as contracts, similar to service APIs. But you still have a single deployment, a single database, and the ability to use transactions across modules when needed.
We explored this option using Rails engines. Each domain becomes a gem/engine with its own models, routes, and tests. The engine defines a public interface (a set of service objects or query objects) and other engines can only interact through that interface.
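To give a flavor of what that interface looks like, here's the shape we sketched for the Inventory engine. Module layout and method names here are illustrative rather than our exact code.
# engines/inventory/lib/inventory.rb -- illustrative shape, not our exact code
module Inventory
  # The only entry points other engines are allowed to call.
  class << self
    def check_stock(product_id)
      AvailabilityQuery.new(product_id).call
    end

    def reserve!(product_id, quantity)
      ReserveStock.new(product_id, quantity).call
    end
  end
end

# Callers elsewhere in the monolith use Inventory.check_stock(id);
# touching Inventory's models or tables directly is off limits.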
The approach is far from perfect. Enforcing the boundaries requires discipline, because Ruby doesn't have the same module access controls as, say, Java packages. You can technically reach across boundaries, and developers under deadline pressure sometimes do. We'd need custom linting rules or CI checks to catch violations.
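The check doesn't have to be sophisticated. Something along these lines in CI would catch the most common violation, which is one engine referencing another engine's namespaced internals instead of its public entry points. The engine names and the "no Foreign::Constant references" convention are assumptions for this sketch.
#!/usr/bin/env ruby
# Rough boundary check for CI. Flags code in one engine that reaches into
# another engine's namespaced constants instead of its public API.
ENGINES = { "orders" => "Orders", "inventory" => "Inventory",
            "payments" => "Payments", "notifications" => "Notifications" }

violations = []

ENGINES.each do |dir, namespace|
  foreign = ENGINES.values - [namespace]
  Dir.glob("engines/#{dir}/**/*.rb").each do |file|
    File.readlines(file).each_with_index do |line, idx|
      foreign.each do |other|
        # Inventory::InventoryItem inside the orders engine is a violation;
        # the module-level API (Inventory.check_stock) is allowed.
        violations << "#{file}:#{idx + 1} reaches into #{other}" if line.match?(/\b#{other}::/)
      end
    end
  end
end

abort(violations.join("\n")) unless violations.empty?
puts "No boundary violations found."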
But compared to the microservices approach, the operational overhead is close to zero. Same deployment pipeline. Same database. Same debugging workflow. The complexity increase is primarily in code organization, not infrastructure.
What We Actually Chose
We went with a hybrid approach, and I'll be honest: it felt like a compromise at the time. Some people on the team wanted full microservices. Others wanted to just fix the monolith's test suite and call it done.
Here's what we decided:
The tightly coupled domains (Orders, Inventory, Payments) stay in the monolith. We're refactoring them into a modular monolith with strict internal boundaries. These domains have strong transactional requirements and tight coupling that would make them painful to separate. The cost of implementing distributed sagas and dealing with eventual consistency between orders and payments outweighed the benefits of deployment independence.
We're pulling Notifications and Search out into separate services. The reasoning: notifications are loosely coupled by nature. An order being placed triggers a notification, but the notification doesn't need to be in the same transaction. A message queue between them handles the asynchronous communication naturally. Search is also a good candidate because it has different scaling characteristics โ it's read-heavy and benefits from specialized infrastructure (Elasticsearch) that doesn't fit neatly into the Rails monolith.
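The monolith's side of that is small: publish an event once the order transaction has committed, and let the Notifications service consume it whenever it gets to it. A sketch, where EventBus stands in for whatever client talks to the broker (a hypothetical wrapper, not a specific library we use):
# Inside the monolith. EventBus, the event name, and the payload are
# illustrative; the point is the publish happens after commit, outside
# the order transaction.
class Order < ApplicationRecord
  after_commit :publish_placed_event, on: :create

  private

  def publish_placed_event
    EventBus.publish(
      "order.placed",
      order_id: id,
      customer_id: customer_id,
      total: total
    )
  end
end

# The Notifications service subscribes to "order.placed" and sends the
# confirmation on its own schedule. A delayed email is tolerable; an order
# out of sync with its payment is not.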
For the CI pipeline problem, we reorganized the test suite so each module's tests can run independently. If your PR only touches the Notification engine, only the Notification tests run. This brought our average CI time down from 45 minutes to about 12 minutes for most PRs. Full suite still runs on merges to main, but the fast feedback loop on feature branches made a big difference to how productive people felt.
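The selection logic itself is small. Roughly, a script diffs the branch against main and maps changed paths to engine test suites; the paths and the fall-back-to-everything rule below are illustrative, not our exact pipeline.
#!/usr/bin/env ruby
# Sketch of the CI test-selection step. Paths and the "anything outside an
# engine runs the full suite" rule are illustrative.
changed = `git diff --name-only origin/main...HEAD`.split("\n")

engines = changed.map { |path| path[%r{\Aengines/([^/]+)/}, 1] }

if engines.empty? || engines.any?(&:nil?)
  # A change outside any engine (Gemfile, shared config) runs everything.
  system("bundle exec rspec") || exit(1)
else
  engines.uniq.each do |engine|
    system("bundle exec rspec engines/#{engine}/spec") || exit(1)
  end
end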
What I'd Do Differently
Looking back, I think we spent too long evaluating the full microservices option. We did architecture spikes, proof of concepts, cost analyses. A month of engineering time went into exploring an approach we ended up mostly not taking. If I did this again, I'd start with the modular monolith refactoring and see how far that got us before even considering extraction.
I'd also invest more upfront in the boundary enforcement. Our module boundaries are currently conventions backed by code review. That's fragile. Adding automated checks (linting rules that flag cross-module database queries, CI gates that detect boundary violations) would have been worth doing from day one. We're adding them now, but some violations have already crept in and we have to clean them up.
The two services we did extract are working fine. Notifications runs independently and deploys about twice a day. Search is in Elasticsearch and handles our query load without any issues. Neither extraction was as scary as I expected, partly because we picked the easiest candidates first.
I don't know if we'll extract more services in the future. Probably, eventually, as the team grows and the remaining domains develop different scaling needs. But I've become more skeptical of microservices as a default architecture than I was before this project. The operational complexity is real and it's ongoing. It's not a one-time migration cost โ it's a permanent increase in the surface area of things that can go wrong.
For most teams of our size, I think the modular monolith is the right starting point. Extract services when you have a specific, concrete reason to, not because it sounds like the modern thing to do. Our two extractions were justified. Six would not have been.
Written by
Anurag Sinha
Developer who writes about the stuff I actually use day-to-day. If I got something wrong, let me know.