The microservices gold rush is over.
Teams that chased the pattern from 2019 through 2022 are now managing systems that take three engineers to debug a single failed transaction, require five teams to coordinate a two-line configuration change, and go down in four places when one database has a bad morning.
The original promise was real: split your application into independent services so each piece can be built, deployed, and scaled without touching anything else. When it works, it is genuinely powerful. When the boundaries are wrong, you do not get the benefits of microservices. You get all of the cost.
In 2026, the most important architectural question in Java shops is no longer “how many services should we build?” It is “do these boundaries actually make sense?”
| “The right architecture is the one that matches the actual structure of your organization and your problem. Everything else is decoration.” |
The Myth That Started Most of the Trouble
The assumption underneath most over-engineered microservice systems is this: smaller services scale better.
On the surface, the logic sounds reasonable. Less code, fewer responsibilities, simpler deployments. In practice, splitting a system into many small pieces does not make each piece faster. It adds a cost every time those pieces need to communicate. And in any real application, the pieces communicate constantly.
Here is what that looks like in a Java e-commerce system.
A user searches for running shoes. Fifty results come back from the Catalogue service. Each result needs a current price check from the Discount service. The Catalogue service, using Spring Cloud OpenFeign, makes 50 individual HTTP calls, one per product, before it can return the page.
Underneath this, Kubernetes is running, Docker containers are optimized, auto-scaling is configured. The page is still slow. The bottleneck is not computing power. It is the time spent on 50 separate “please respond to me” round-trips across a network. Each one is fast in isolation, 5 to 10 milliseconds. Fifty of them in sequence adds up to a visible delay on every single page load.
| Plain English: What is a network round-trip? |
| Every time one service asks another for information, it sends a request and waits for a reply. That waiting time is called a round-trip. On a local network it might be 2ms. Multiply that by 50 calls and you have added 100ms to every page load before any business logic runs. Users feel this. |
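To make the shape of that problem concrete, here is a minimal sketch of the fan-out in Spring Cloud OpenFeign. The DiscountClient interface, the Price type, and the endpoint paths are hypothetical, not code from any real system; the point is the one-call-per-product pattern versus its usual first fix, a batch endpoint.

```java
import java.util.List;
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestParam;

// Hypothetical Feign client for the Discount service
@FeignClient(name = "discount-service")
interface DiscountClient {

    // One network round-trip per product: called in a loop over 50 search
    // results, this is 50 sequential HTTP calls before the page can render.
    @GetMapping("/discounts/{productId}")
    Price priceFor(@PathVariable("productId") String productId);

    // The usual first fix: price the whole result page in a single round-trip.
    @GetMapping("/discounts")
    List<Price> pricesFor(@RequestParam("productIds") List<String> productIds);
}

// Hypothetical response type
record Price(String productId, long pennies) {}
```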
The Distributed Monolith: The Worst of Both Worlds
The e-commerce example above has a name: a Distributed Monolith. The services are physically separated, running in different containers on different infrastructure. But they cannot function independently. The Catalogue service is useless without the Discount service. If Discount goes down, the product page breaks. The system behaves like a single application, but with all the operational overhead of a distributed one.
This is the failure mode that is not discussed enough, because it does not look like a failure from the outside. The architecture diagram has all the right boxes and arrows. The Kubernetes cluster is running. Teams feel like they did the modern thing.
The tell is this: if two services must be updated together every time a feature changes, they are not two services. They are one service distributed across two repositories.
| “The question is not how small can this be. The question is: if this service went down for four hours, what else would break?” |
If the answer is “everything,” the boundary is wrong. Real service independence means a service can go down, recover, and catch up without any other part of the system losing data or failing its users.
Three Ways Java Teams Are Breaking Their Own Systems
1. Everything talks synchronously
Synchronous communication means Service A sends a request to Service B and waits for a reply before doing anything else.
When everything is healthy, this works fine. When Service B is slow or briefly unavailable, Service A is stuck waiting. Every new request to Service A backs up behind the previous one. If Service C also depends on Service B, it backs up too. The failure moves outward until the whole system is unresponsive.
This is how a slow email verification service takes down an order confirmation flow. The Order Service waits for a 200 OK from the Email Service before confirming the order. The Email Service is under load. Orders queue up. Users see errors. Nobody touched the Order Service.
| Plain English: What is synchronous vs. asynchronous? |
| Synchronous is like a phone call. You wait on the line until the other person answers and responds before you do anything else. Asynchronous is like sending a text. You send the message and continue with your day. The reply comes when it comes. In software, asynchronous communication between services means neither side has to wait on the other to keep working. |
The fix is to shift operations that do not need an immediate response to asynchronous messaging. Apache Kafka is the standard tool for this in Java ecosystems. Instead of Service A waiting for Service B, Service A drops a message into a Kafka topic and moves on. Service B picks up the message when it is ready.
There is a specific pattern that makes this hand-off reliable: the Transactional Outbox.
| Plain English: The Transactional Outbox Pattern |
| When your Order Service saves an order to the database, it also writes a small note to a special ‘outbox’ table in the same save operation. A background process reads that outbox table and publishes the message to Kafka. Because the order and the note are saved together in a single database transaction, if the application crashes mid-process, the note survives. The message still gets sent. No orders fall silently into a gap between ‘saved to database’ and ‘sent to Kafka.’ |
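As a rough sketch, here is the shape of that pattern in a Spring service. The repositories, the OutboxMessage entity, and its findUnpublished/markPublished methods are hypothetical stand-ins; many teams replace the polling relay with a change-data-capture tool such as Debezium.

```java
import java.util.List;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
class OrderService {
    private final OrderRepository orders;  // hypothetical JPA repository
    private final OutboxRepository outbox; // hypothetical JPA repository

    OrderService(OrderRepository orders, OutboxRepository outbox) {
        this.orders = orders;
        this.outbox = outbox;
    }

    // The order and the outbox row commit in ONE database transaction.
    // If the process crashes after the commit, the row survives and the
    // message still gets published when the relay next runs.
    @Transactional
    public void placeOrder(Order order) {
        orders.save(order);
        outbox.save(new OutboxMessage("order-events", order.toEventJson()));
    }
}

@Service
class OutboxRelay {
    private final OutboxRepository outbox;
    private final KafkaTemplate<String, String> kafka;

    OutboxRelay(OutboxRepository outbox, KafkaTemplate<String, String> kafka) {
        this.outbox = outbox;
        this.kafka = kafka;
    }

    // Background publisher (requires @EnableScheduling on the application):
    // reads pending rows and forwards them to Kafka.
    @Scheduled(fixedDelay = 1000)
    public void publishPending() {
        List<OutboxMessage> pending = outbox.findUnpublished(); // hypothetical query
        for (OutboxMessage msg : pending) {
            // Wait for the broker ack before marking the row published
            // (send() returns a CompletableFuture in Spring Kafka 3.x)
            kafka.send(msg.getTopic(), msg.getPayload()).join();
            outbox.markPublished(msg); // hypothetical update
        }
    }
}
```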
2. The database is doing work nobody is watching
Spring Data JPA is the most widely used database tool in the Java ecosystem. It generates SQL queries from your code automatically, which saves enormous amounts of development time. It also generates queries you did not intend, at volumes you did not anticipate, if you stop paying attention to what it produces.
The most common problem is the N+1 query.
| Plain English: What is an N+1 query? |
| Imagine you ask a library assistant for a list of 100 books. That is 1 request. Then, for each book, you walk back to the desk and ask separately who the author is. That is 100 more requests. Total: 101 trips to the desk instead of 1. In software, this happens when JPA loads a list of records (say, 100 orders), then makes a separate database call for each record to load the related data (the customer details for each order). One request to your application produces 101 database queries. At low traffic, this is invisible. At scale, it saturates the database connection pool and slows everything that touches the database. |
The fix is to tell JPA to load related data in the same query using a JOIN, or to use batch loading. Both are straightforward once you know the problem exists. The challenge is that the problem is invisible unless you are watching the queries.
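As a minimal illustration, assuming hypothetical Order and Customer entities, this is what the fix looks like in a Spring Data JPA repository. JOIN FETCH pulls the related rows in the same query; @EntityGraph or Hibernate's @BatchSize are common alternatives.

```java
import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;

interface OrderRepository extends JpaRepository<Order, Long> {

    // N+1: findAll() followed by order.getCustomer() in a loop issues
    // one query for the orders, then one more query per order.

    // The fix: load the orders AND their customers in a single JOIN query.
    @Query("select o from Order o join fetch o.customer")
    List<Order> findAllWithCustomers();
}
```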
The rule is simple: every significant query your application runs in production should be reviewed. Enable SQL logging during development. Use tools like P6Spy or Hibernate’s built-in logging to see the actual SQL being sent to the database. If you see the same query repeated once per record in a result set, you have an N+1 problem. Fix it before it reaches production.
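In a Spring Boot service, that development-time visibility can be as little as two settings (property names current in recent Spring Boot and Hibernate versions; verify against yours):

```properties
# Development only: log every SQL statement Hibernate executes
logging.level.org.hibernate.SQL=DEBUG
spring.jpa.properties.hibernate.format_sql=true
```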
3. When something breaks, nobody knows where
A request to a single-application system touches one codebase. When it fails, you look at one log file.
A request to a distributed system might touch eight services before something goes wrong. Which service failed? At what point in the chain? Was it slow, or did it return an error? Which downstream service caused the problem?
Without the right tooling, the answers to these questions require manually correlating timestamps across eight separate log files. This takes hours. In a production incident, hours are expensive.
| Plain English: What is distributed tracing? |
| Every request that enters the system gets a unique tracking number (called a Trace ID) that travels with it through every service it visits. When something fails, you look up that Trace ID and see the complete picture: every service the request touched, how long each step took, and exactly where it broke. It works like a package tracking number, except for your API calls. |
The standard for implementing this in 2026 is OpenTelemetry. It is vendor-neutral, widely supported in the Java ecosystem, and integrates with Jaeger, Grafana, Datadog, and most observability platforms.
The non-negotiable rule: if you cannot trace a request through your entire system on your local machine before you deploy, you cannot operate it in production. Observability is an engineering requirement. It should be built before the first service ships, not retrofitted six months later when something breaks in a way nobody understands.
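In practice, most Java teams get tracing from the OpenTelemetry Java agent with no code changes at all. Where a custom span is worth adding by hand, the manual API is small. A sketch, assuming a hypothetical checkout flow:

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

class CheckoutService {
    // The instrumentation scope name is a free-form identifier
    private final Tracer tracer = GlobalOpenTelemetry.getTracer("com.example.checkout");

    void confirmOrder(String orderId) {
        Span span = tracer.spanBuilder("confirm-order").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            span.setAttribute("order.id", orderId);
            // ... business logic; downstream HTTP and Kafka calls made while
            // this span is current join the same trace via context propagation
        } catch (RuntimeException e) {
            span.recordException(e);
            throw e;
        } finally {
            span.end();
        }
    }
}
```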
The Decision Framework: When a Separate Service Is Justified
Before splitting anything into its own service, answer these four questions honestly.
| Service Boundary Checklist |
| • Can this component be deployed without coordinating with any other team or codebase?
• Can this component fail completely without taking anything else with it?
• Does this component have a genuinely different scaling requirement than the rest of the system?
• Does a separate team own this, with no shared development dependencies? |
If three or four answers are yes, a separate service is appropriate. If two or more answers are no, the service boundary is premature. Build a well-isolated module within your existing codebase instead, and revisit the question when the conditions change.
The Modular Monolith: The Most Underrated Architecture in Java
The industry spent five years treating “monolith” as an insult. In 2026, the teams shipping fastest are building Modular Monoliths, and they are outpacing their microservice-heavy counterparts on delivery speed and system stability.
| Plain English: What is a Modular Monolith? |
| A single application, but with strict internal walls between business domains. The billing code cannot reach directly into inventory code. The order management module cannot call the user management module through a back door. Each module owns its own data, its own logic, and its own interface with the outside world. The boundaries are enforced in code. It deploys as one unit, so there are no network calls between modules, no distributed transaction problems, and no distributed tracing needed just to understand what a single user action did. |
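Those walls can be enforced by a failing test rather than a code-review convention. A minimal sketch using ArchUnit, with hypothetical com.example.shop billing and inventory packages (Spring Modulith offers a similar, more integrated verification):

```java
import com.tngtech.archunit.core.domain.JavaClasses;
import com.tngtech.archunit.core.importer.ClassFileImporter;
import com.tngtech.archunit.lang.ArchRule;
import org.junit.jupiter.api.Test;

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;

class ModuleBoundaryTest {

    private final JavaClasses classes =
        new ClassFileImporter().importPackages("com.example.shop");

    // Fails the build if billing code reaches into inventory internals
    @Test
    void billingDoesNotDependOnInventoryInternals() {
        ArchRule rule = noClasses()
            .that().resideInAPackage("..billing..")
            .should().dependOnClassesThat().resideInAPackage("..inventory.internal..");
        rule.check(classes);
    }
}
```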
When a module grows to the point where it genuinely needs to scale independently or be owned by a fully autonomous team, extracting it into a real service is straightforward, because the boundary was already clean and well-defined.
A Modular Monolith is not a compromise or a step backward. It is the responsible default for any system that has not yet proven it needs the operational complexity of distributed services. The operational complexity of microservices is a cost you should pay only when the benefit justifies it.
| “A well-structured Modular Monolith will beat a poorly partitioned microservice system in delivery speed, incident response time, and developer experience. Almost every time.” |
Building Resilience Into What You Already Have
Distributed systems have failures. Services go down. Networks slow. Third-party APIs miss their response time commitments. The goal is not to eliminate failures. It is to make sure individual failures do not become system-wide outages.
Resilience4j is the standard Java library for this. It gives you three core tools.
Circuit Breaker
When a downstream service starts failing repeatedly, the circuit breaker stops sending it requests for a set period. Instead of continuously hammering a struggling service and making the failure worse, the system gives it time to recover. Requests during the recovery window get a fallback response.
| Plain English: Circuit Breaker |
| Like a fuse box in your house. When a circuit is overloaded, the fuse trips and cuts the power to that circuit before the wiring catches fire. You fix the problem, reset the fuse, power comes back on. A circuit breaker in software works the same way. When a service is failing, you stop sending it traffic temporarily, let it recover, then gradually let traffic flow again. |
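A minimal Resilience4j sketch of that behaviour, assuming a hypothetical DiscountClient and Price type; the thresholds are illustrative, not recommendations:

```java
import io.github.resilience4j.circuitbreaker.CallNotPermittedException;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;

import java.time.Duration;
import java.util.function.Supplier;

class PricingService {
    private final DiscountClient discountClient; // hypothetical downstream client

    private final CircuitBreaker breaker = CircuitBreaker.of(
        "discount-service",
        CircuitBreakerConfig.custom()
            .failureRateThreshold(50)                        // open after 50% of recent calls fail
            .waitDurationInOpenState(Duration.ofSeconds(30)) // recovery window before retrying
            .build());

    PricingService(DiscountClient discountClient) {
        this.discountClient = discountClient;
    }

    Price priceFor(String productId) {
        Supplier<Price> guarded =
            CircuitBreaker.decorateSupplier(breaker, () -> discountClient.priceFor(productId));
        try {
            return guarded.get();
        } catch (CallNotPermittedException e) {
            return Price.listPrice(productId); // fallback while the breaker is open
        }
    }
}
```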
Retry with Backoff
Many failures are transient. A service might briefly be unreachable and recover within two seconds. A retry mechanism automatically re-attempts the request a defined number of times, with a pause between each attempt. This handles momentary blips without surfacing an error to the user. Growing the pause with each attempt (exponential backoff) prevents the retrying system from overwhelming the recovering service with requests.
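Resilience4j covers this too. A sketch, assuming a hypothetical EmailClient and Receipt type:

```java
import io.github.resilience4j.core.IntervalFunction;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;

import java.util.function.Supplier;

class VerificationSender {
    private final EmailClient emailClient; // hypothetical downstream client

    private final Retry retry = Retry.of("email-service",
        RetryConfig.custom()
            .maxAttempts(3) // the original call plus two retries
            .intervalFunction(
                IntervalFunction.ofExponentialBackoff(200, 2.0)) // 200ms, then 400ms
            .build());

    VerificationSender(EmailClient emailClient) {
        this.emailClient = emailClient;
    }

    Receipt send(String address) {
        Supplier<Receipt> withRetry =
            Retry.decorateSupplier(retry, () -> emailClient.sendVerification(address));
        return withRetry.get(); // throws only after all attempts are exhausted
    }
}
```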
Fallback
A fallback defines what the system does when a service is genuinely unavailable. In a user registration flow, if the email verification service is down, a fallback might be: complete the registration in the database, queue the verification email in Kafka for when the service recovers, and return a success response to the user. The user is not blocked. The email goes when the system is healthy again.
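That exact flow, sketched with hypothetical UserRepository, EmailClient, and RegistrationResult types alongside Spring’s KafkaTemplate:

```java
import org.springframework.kafka.core.KafkaTemplate;

class RegistrationService {
    private final UserRepository users;    // hypothetical repository
    private final EmailClient emailClient; // hypothetical downstream client
    private final KafkaTemplate<String, String> kafka;

    RegistrationService(UserRepository users, EmailClient emailClient,
                        KafkaTemplate<String, String> kafka) {
        this.users = users;
        this.emailClient = emailClient;
        this.kafka = kafka;
    }

    RegistrationResult register(User user) {
        users.save(user); // registration always completes
        try {
            emailClient.sendVerification(user.getEmail()); // happy path
        } catch (Exception e) {
            // Email service is down: queue the work for when it recovers
            kafka.send("pending-verification-emails", user.getEmail());
        }
        return RegistrationResult.success(); // the user is never blocked
    }
}
```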
These three patterns together represent the minimum viable safety net for any distributed system. None of them are optional once services depend on each other.
The Architecture Audit: Six Questions for Your Next Design Review
| Run this before your next technical design session |
| • Are service boundaries drawn at domain lines, or at ‘it felt too big’ lines?
• Are services communicating synchronously for operations that do not need an immediate response?
• Are you monitoring the actual SQL that JPA generates in production?
• Can you trace a single user request across every service it touches, in under two minutes?
• Do you have circuit breakers on every external service dependency?
• Could your system be reorganized into a well-structured Modular Monolith without losing any meaningful technical capability? |
If more than two answers are uncomfortable, the architecture review is overdue. These are not edge-case concerns. Each one represents a category of production incident that is entirely preventable with the right design decision made earlier.
The Principle That Settles Most Architecture Debates
Microservices are a solution to organizational and scaling problems that have already materialized. They are a destination, not a starting point.
The right size for a service is the smallest unit that can be genuinely deployed, owned, and operated independently by a team, for a clear purpose, without negotiating with anyone else. If that definition does not describe what you are building, the service is too small.
Every architectural decision has a carrying cost: the operational complexity you take on and maintain indefinitely. That cost is only worth paying when the capability you gain cannot be achieved any other way.
Build for the actual problem in front of you. The architecture should serve the business, not validate a technical preference.
Pravin Durai