Microservices - asking the right questions

by Raghu Rajagopalan

In my previous post, I talked about microservices trade-offs and how you need to be aware of them to navigate the waters. I also touched upon how we went about it successfully. In this post, I'll go over some of the questions to pose to yourself/your team from an engineering/infrastructure standpoint as you embark on your microservices journey.

Questions

It helps to try to answer some of the questions below. Not everything needs to be addressed right at the outset, but it helps to have these on your radar as you build your system.

How do you plan to:
Development
  1. Document a service’s API signature?

  2. Will you provide a client library, or expect consumers to write their own?

  3. Authenticate requests to a service?

  4. How are services (and clients) versioned? What happens if you need to change an API signature?

  5. Figure out how to address a service, given that an unknown number of service instances may be running at any given point?

  6. How do you plan on enforcing good development practices for each new service that comes up? At a minimum, you need all of them to handle diagnostics, tracing, logging, and authentication and authorization the same way. Will you use:

    1. Shared libraries?

    2. Project templates?

Testing
  1. How do you plan to test a service?

    1. In isolation?

    2. In an integrated manner?

      1. How will you generate test data?

      2. How will you clean up test environment/data between runs?

Diagnostics/Observability
  1. Aggregate logs from all service instances and see them centrally?

  2. Trace a particular call end to end through the system?

Fault Tolerance
  1. Deal with retried messages?

  2. Retry and back off when a service is down temporarily?

  3. Deal with a failing service so that non-critical failures don’t cascade?

Deployment
  1. Orchestrate deployment of all your services? How do you verify that the entire deployment went fine?

  2. Release features in stages - especially when a feature spans multiple services?

  3. Promote a release through different environments?

Operations
  1. Configure a service, update its configuration at run time, and have the change propagate to all running instances?

  2. Monitor services - track service latency, response times, and health?

Prioritize, prioritize, prioritize

Now, the list above is daunting at best. I've also never seen anyone have all the ingredients in place on day one. Nor does it make sense to: if you try to get all of these in place first, you'll probably end up in analysis paralysis.

However, good teams are aware of these concerns and either have them on their roadmap or make a conscious decision about why something doesn't make sense to do (usually supported by data).

As with anything else, you will need to prioritize. Some of the questions above are must-haves, in the sense that not addressing them has a near-term cost, and that cost keeps growing as you progress. A few examples:

  • If you don't do automated deployments from day one, it's likely that your system will only get harder to deploy later.

  • Similarly, if you don’t have some rudimentary way to centralize logs, you lose time debugging and tracing through services when there are issues.

Other concerns, on the other hand, can be deferred at the outset - you have the freedom to take them up after you reach an inflection point. For instance, service tracing can be added fairly transparently, without affecting the services themselves.

My opinion

No silver bullet

In essence, there aren't any right answers that fit every situation - you're going to have to make lots and lots of judgment calls, and with each of them you'll need to weigh the consequences.

With that out of the way, let’s see some of the options.

Development
  1. Document a service’s API signature?

    1. OpenAPI Spec (aka Swagger). As the number of services grows, having a well-documented, rigorous API helps. Integrating Swagger UI goes a long way in making service APIs accessible.

  2. Will you provide a client library, or expect consumers to write their own?

    1. Generate clients for your primary implementation language. Again, Swagger shines here: it lets you generate API clients in a host of languages, and this is something you should probably integrate into your service's build step.

  3. Authenticate requests to a service?

    1. If you need A&A, JWT tokens is a good way to go stateless.

  4. How are services (and clients) versioned? What happens if you need to change an API signature?

    1. Until you release, everyone tracks the latest version. Post release, version the API if it's public facing and you can't force clients to update in lockstep. Have a clear policy on how long you'll support older versions, since maintaining them is an additional burden.

  5. Figure out how to address a service, given that an unknown number of service instances may be running at any given point?

    1. A DNS-based addressing scheme is well understood and works quite well. If you're using a platform like Service Fabric, discovery is built in; in Kubernetes, DNS-based discovery works out of the box. If you're rolling your own, take a look at Netflix Eureka. IMO, though, DNS-based solutions are just a whole lot simpler.

  6. How do you plan on enforcing good development practices for each new service that comes up? At a minimum, you need all of them to handle diagnostics, tracing, logging, and authentication and authorization the same way. Will you use:

    1. Shared libraries?

    2. Project templates?

      1. Both. Logging, A&A, tracing, and monitoring hived off into shared libraries, plus a good exemplar service implementation, go a long way toward making sure that new teams produce well-behaved services.
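To make the stateless JWT option (point 3 above) concrete, here's a minimal validation sketch using the jjwt library. The `JwtAuthenticator` class and the shared `signingKey` are illustrative names, not an established API; in practice this check would live in a request filter shipped via the shared libraries just mentioned.

```java
import io.jsonwebtoken.Claims;
import io.jsonwebtoken.Jwts;

// Hypothetical helper: every service validates incoming bearer tokens
// locally against a shared key, so no session state is needed anywhere.
public class JwtAuthenticator {

    private final byte[] signingKey; // shared secret (or swap in an RSA public key)

    public JwtAuthenticator(byte[] signingKey) {
        this.signingKey = signingKey;
    }

    // Returns the token's claims (subject, roles, expiry, ...), or throws
    // a JwtException if the token is expired, malformed, or forged.
    public Claims authenticate(String bearerToken) {
        return Jwts.parser()
                .setSigningKey(signingKey)
                .parseClaimsJws(bearerToken)
                .getBody();
    }
}
```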

Testing
  1. How do you plan to test a service?

    1. In isolation

      1. Unit tests should of course use mocks, but in some cases you need more involved tests that exercise a couple of services together; for those, consider an HTTP mocking server like WireMock (sketched after this list).

    2. Integration - work with deployed instances of collaborators?

      1. How will you generate test data?

      2. How will you clean up test environment/data between runs?

        1. The easiest option here is some sort of embedded database, along with mocks for the non-essential services. It's usually a non-trivial effort, but it's important to have automated testing so you know that things work end to end.
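For the isolation case, a WireMock stub looks roughly like the sketch below; the `/inventory/sku-42` endpoint and the port are made up for illustration:

```java
import com.github.tomakehurst.wiremock.WireMockServer;

import static com.github.tomakehurst.wiremock.client.WireMock.*;

public class InventoryStubExample {

    public static void main(String[] args) {
        // Stand-in for the real inventory service on a local port.
        WireMockServer inventoryStub = new WireMockServer(8089);
        inventoryStub.start();
        configureFor("localhost", 8089);

        // Canned response for the endpoint the service under test calls.
        stubFor(get(urlEqualTo("/inventory/sku-42"))
                .willReturn(aResponse()
                        .withStatus(200)
                        .withHeader("Content-Type", "application/json")
                        .withBody("{\"sku\":\"sku-42\",\"inStock\":7}")));

        // ... point the service under test at http://localhost:8089 and exercise it ...

        inventoryStub.stop();
    }
}
```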

Diagnostics/Observability
  1. Aggregate logs from all service instances and see them centrally? The ELK stack (Elasticsearch, Logstash, Kibana).

  2. Trace a particular call end to end through the system? OpenZipkin.
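ELK and OpenZipkin do the heavy lifting here, but the underlying idea is worth seeing: attach a correlation ID to every request and emit it on every log line, so the aggregated logs can be filtered per call. A hand-rolled sketch using SLF4J's MDC (the `X-Correlation-Id` header name is a common convention, not a standard):

```java
import java.io.IOException;
import java.util.UUID;

import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;

import org.slf4j.MDC;

// Servlet filter that propagates (or mints) a correlation ID and exposes it
// to the logging framework, so every log line carries the request's ID.
public class CorrelationIdFilter implements Filter {

    private static final String HEADER = "X-Correlation-Id";

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String id = ((HttpServletRequest) req).getHeader(HEADER);
        if (id == null || id.isEmpty()) {
            id = UUID.randomUUID().toString();
        }
        MDC.put("correlationId", id);      // log pattern references %X{correlationId}
        try {
            chain.doFilter(req, res);
        } finally {
            MDC.remove("correlationId");   // don't leak IDs across pooled threads
        }
    }

    @Override public void init(FilterConfig cfg) {}
    @Override public void destroy() {}
}
```

Outbound calls then need to forward the same header so the trail survives across hops - which is precisely the bookkeeping that Zipkin-style tracers automate for you.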

Fault Tolerance
  1. Deal with retried messages?

  2. Retry and back off when a service is down temporarily?

    1. Implement some sort of interceptor that retries a fixed number of times with incremental backoff (a sketch follows this list).

  3. Deal with a failing service so that non-critical failures don’t cascade?

    1. A circuit breaker is the standard pattern here - see Netflix Hystrix, for example.
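A sketch of the retry interceptor from point 2; the names and limits are illustrative, not a library API. Note that retrying safely presupposes point 1: handlers have to tolerate seeing the same message more than once.

```java
import java.util.concurrent.Callable;

public final class Retry {

    // Invokes the call, retrying up to maxAttempts times (assumed >= 1) with
    // incrementally longer waits between attempts: delay, 2*delay, 3*delay, ...
    public static <T> T withBackoff(Callable<T> call, int maxAttempts, long initialDelayMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(initialDelayMs * attempt); // incremental backoff
                }
            }
        }
        throw last; // all attempts failed; surface the last error to the caller
    }
}
```

For point 3, a circuit breaker wraps the same call site: after repeated failures it fails fast instead of calling the ailing service, then probes again later. Libraries like Hystrix implement this well, so it's rarely worth hand-rolling.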

Deployment
  1. Orchestrate deployment of all your services? How do you verify that the entire deployment went fine?

    1. Octopus Deploy, followed by triggering a set of system tests.

  2. Release features in stages - especially when a feature spans multiple services?

    1. Feature toggles/feature switches. Depending on your platform, this might require some work. Also, see how this relates to your branching strategy; I discussed this some time ago in Feature branches vs Feature Flags. (A minimal toggle sketch follows this list.)

  3. Promote a release through different environments?

    1. Octopus worked well for this.
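Feature toggles don't have to be elaborate to start with. A configuration-backed check like this sketch (all names hypothetical) is often enough; the important part is that every service involved in the feature reads the same flag source, so the feature can be lit up or rolled back everywhere at once:

```java
import java.util.Map;

// Minimal feature-flag lookup backed by configuration,
// e.g. a map loaded from the environment or a config service.
public class FeatureFlags {

    private final Map<String, String> config;

    public FeatureFlags(Map<String, String> config) {
        this.config = config;
    }

    public boolean isEnabled(String feature) {
        return Boolean.parseBoolean(config.getOrDefault("feature." + feature, "false"));
    }
}

// At the call site:
//   if (flags.isEnabled("new-pricing")) { /* new path */ } else { /* old path */ }
```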

Operations
  1. Configure a service, update its configuration at run time, and have the change propagate to all running instances?

    1. Depends on your platform. Spring Cloud Config does this. On Azure Service Fabric, rolling deployments take care of it, but that means you have to redeploy even for config-only changes.

  2. Monitor services - track service latency, response times, and health?

    1. Have any good recommendations? I don't know enough about this. The last time around, we pulled elapsed times from our commands and events and pushed them to ELK, so we had some visibility into how long our services were taking to process messages. I'm sure there are plenty of better ways of doing this.
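That home-grown approach - timing each command and shipping the number to ELK - can be as simple as a wrapper like this sketch (the handler and field names are illustrative); the structured log line is what Logstash parses and Kibana charts:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class TimedHandler {

    private static final Logger log = LoggerFactory.getLogger(TimedHandler.class);

    // Runs the handler and emits a structured latency line for the log pipeline.
    public static void handle(String commandName, Runnable handler) {
        long start = System.nanoTime();
        boolean ok = false;
        try {
            handler.run();
            ok = true;
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // key=value pairs are easy to parse out in Logstash
            log.info("command={} success={} elapsed_ms={}", commandName, ok, elapsedMs);
        }
    }
}
```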

A caveat, though: success with microservices depends heavily on understanding the domain and breaking it down into services whose boundaries align well with it. If you miss the bus on that count, getting the technical details right isn't going to help much (though it's probably easier to dig yourself out of that hole with a refactoring exercise).

Summary

There are a lot of things to take care of when you're developing microservices - some of these are good development practices anyway, whereas others are specific to a microservices architecture (or significantly more complex in a microservices world).

It helps to make conscious decisions about how sophisticated you want your infrastructure to be. Be pragmatic: build a good foundation with enough infrastructure to get you off the ground initially, and refine as you go along. This is especially important because, as the system grows, you're banking on having lots of teams working on services in parallel.

Good engineering is all about making the right trade-offs, after all. Keep shipping!