Managing Microservices with Kubernetes and Istio

11.10.2018

In a distributed microservice architecture, the network of services becomes harder to understand and manage as it grows in size and complexity. Monitoring, A/B testing, canary releases, access control and end-to-end authentication are often operational requirements. The term service mesh describes a network of microservices and the interactions between them.

This article gives a general overview of what a service mesh is and how it can be implemented. It then shows how to manage traffic, inject faults and monitor services with Istio and Kubernetes, using a simple example application.

Service Mesh Overview

A service mesh is a communication layer that rides on top of request/response unlocking some patterns essential for healthy microservices:

  • Zero-trust security that doesn’t assume a trusted perimeter.
  • Tracing that shows you how and why every microservice talked to another microservice.
  • Fault injection and tolerance that lets you experimentally verify the resilience of your application.
  • Advanced routing that lets you do things like A/B testing, rapid versioning and deployment and request shadowing.

This communication layer can live in different locations:

  • In a Library that your microservices applications import and use.
  • In a Node Agent or daemon that services all of the containers on a particular node/machine.
  • In a Sidecar container that runs alongside your application container.

This definition is taken from the SaaS platform Aspen Mesh.

Library

Service Mesh implemented using Libraries

In the library approach, each microservice application includes a library that implements the service mesh features (Hystrix and Ribbon are examples).

An advantage of this approach is that resource allocation for the work performed on behalf of the microservice is handled by the OS, since the code actually runs inside the microservice. Another advantage is that it doesn’t require much cooperation from the underlying infrastructure, i.e. the container runner does not need to be aware that you are running a Hystrix-enhanced microservice.

A major disadvantage is that the library has to be ported to every language the microservices are written in, which means replicating the same behaviour over and over again.

Node Agent

Service Mesh implemented with Node Agents

In the node agent model, a separate agent runs on every node and serves all the microservice tenants on that node. This works similarly to Kubernetes’ default kube-proxy, which serves all pods on a node.

As a result, this approach can serve heterogeneous applications written in different languages, and sharing one agent per node uses resources efficiently.

Contrary to the library approach, this deployment requires some cooperation from the infrastructure: applications need to delegate their network calls to the agent.

Sidecar

Service Mesh implemented with Sidecars

In a sidecar deployment, an adjacent container (the “sidecar”) is deployed next to every application container and handles all network traffic in and out of the application. This is the model used by Istio with Envoy Proxy. The approach is a compromise between the two previously discussed approaches. For instance, you can deploy a sidecar service mesh without having to run a new agent on every node (so you don’t need infrastructure-wide cooperation to deploy that shared agent), but you’ll be running multiple copies of an identical sidecar.

The slightly higher overhead and resource consumption compared to the node agent approach is offset by two benefits: app-to-sidecar communication is easier to secure than app-to-agent communication, and the mesh can be adopted gradually in an existing cluster without central coordination.

Istio Architecture and Features

Istio provides the following core features across a network of services.

Traffic Management
Fine-grained control of traffic behaviour with routing rules, retries, failovers, and fault injection. Service-level properties like circuit breakers, timeouts, and retries can be configured, which makes it possible to set up important tasks like A/B testing, canary rollouts, and staged rollouts with percentage-based traffic splits.

Security
Secure service-to-service communication in a cluster with strong identity-based authentication and authorization.

Observability
Automatic metrics, logs, and traces for all traffic within a cluster, including cluster ingress and egress.

Platform Support
Istio currently supports deployment on Kubernetes, Consul and services running on individual virtual machines.
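
To make the service-level properties listed under Traffic Management more concrete, the following is a minimal sketch of a virtual service that sets a request timeout and a retry policy. The service name my-service and the chosen values are hypothetical and only serve as an illustration; the field names follow the networking.istio.io/v1alpha3 API used throughout this article.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service          # hypothetical service name, not part of Bookinfo
spec:
  hosts:
  - my-service
  http:
  - route:
    - destination:
        host: my-service
    timeout: 10s            # overall deadline for a request (illustrative value)
    retries:
      attempts: 3           # retry a failed request up to three times
      perTryTimeout: 2s     # deadline for each individual attempt
```

Because the policy lives in the mesh configuration, it can be changed without touching or redeploying the application code.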

Architecture

An Istio service mesh is logically split into a data plane and a control plane.

  • The data plane is composed of a set of intelligent proxies (Envoy Proxy) deployed as sidecars. These proxies mediate and control all network communication between microservices along with Mixer, a general-purpose policy and telemetry hub.
  • The control plane manages and configures the proxies to route traffic, enforce policies and collect telemetry.

Istio Architecture Overview

Pilot provides service discovery for the Envoy sidecars. It converts high-level routing rules into Envoy-specific configurations and propagates them to the sidecars.

Citadel provides service-to-service and end-user authentication.

Mixer is responsible for providing policy controls and telemetry collection.

Sample application running on Kubernetes with Istio

To get some hands-on experience with Istio, the sample Bookinfo application running on Kubernetes was used to try some of the traffic management and fault injection features. But first Istio needed to be downloaded:

curl -L https://git.io/getLatestIstio | sh -
cd istio-1.0.2
export PATH=$PWD/bin:$PATH

and installed on Kubernetes (for simplicity, without mutual TLS authentication):

kubectl apply -f install/kubernetes/helm/istio/templates/crds.yaml
kubectl apply -f install/kubernetes/istio-demo.yaml

This will install a number of services and pods in a new namespace called “istio-system”.

The Bookinfo application is broken into four separate microservices:

  • The productpage microservice calls the details and reviews microservices to populate the page.
  • The details microservice contains book information.
  • The reviews microservice contains book reviews. It also calls the ratings microservice. There are three versions: v1 doesn’t call the ratings service, v2 calls it and displays black stars, and v3 calls it and displays red stars.
  • The ratings microservice contains book ranking information that accompanies a book review.

Bookinfo Application Architecture Overview

Next, the application containers are brought up:

# Label the default namespace with istio-injection=enabled to allow automatic sidecar injection
kubectl label namespace default istio-injection=enabled
# Deploy the application
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml

# Define destination rules so that Istio can route to the available versions
kubectl apply -f samples/bookinfo/networking/destination-rule-all.yaml
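
The destination rules are what give the version names used in the routing examples their meaning. As a sketch of what such a rule looks like for the reviews service, assuming the pods of each version carry a matching version label (as in the Bookinfo sample):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  # Each subset maps a name to the pods selected by the given labels.
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3
```

A virtual service can then refer to a subset by name (for example subset: v1) instead of addressing pods directly.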

When pointing the browser to the application URL, the application’s main page can be seen, and refreshing it reveals the different versions of the reviews service.

Main view of Bookinfo application without ratings (v1), with black star ratings (v2) and red star ratings (v3)
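
Reaching the application from a browser requires the product page to be exposed outside the cluster, a step that is not shown above. One way to do this, assuming the default Istio ingress gateway is used as in the Bookinfo sample, is a Gateway together with a virtual service bound to it. The names bookinfo-gateway and bookinfo as well as the port 9080 are taken from that sample and are assumptions here:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: bookinfo-gateway
spec:
  selector:
    istio: ingressgateway   # use Istio's default ingress gateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: bookinfo
spec:
  hosts:
  - "*"
  gateways:
  - bookinfo-gateway        # only applies to traffic entering through the gateway
  http:
  - match:
    - uri:
        exact: /productpage
    - uri:
        exact: /login
    - uri:
        exact: /logout
    route:
    - destination:
        host: productpage
        port:
          number: 9080      # assumed productpage service port
```

The application URL is then the address of the ingress gateway followed by /productpage.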

Configuring Request Routing

To route to one version only, apply a virtual service that sets the default version for a microservice:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
        subset: v1

To apply such a virtual service for each of the Bookinfo services, run the following command:

kubectl apply -f samples/bookinfo/networking/virtual-service-all-v1.yaml

Now the Bookinfo application displays reviews without star ratings, since all traffic is routed to version v1 of each service and reviews v1 does not call the ratings service.

Next, the route configuration can be changed so that all traffic from a specific user is routed to a specific version.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1

This works because the productpage service adds a custom end-user header to all outbound HTTP requests to the reviews service.

When logging in as user “jason”, we see the star ratings next to each review.

Main view shows black stars for logged in user “jason”

To summarize, we configured Istio to route 100% of the traffic to version v1 of the Bookinfo services and then set a rule to selectively send traffic to v2 of the reviews service based on a custom end-user header.

Fault Injection

To test microservices for resiliency, delays and aborts can be injected to simulate a faulty or overloaded service. In our case we inject a 7s delay into requests to the ratings service for the user “jason”.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    fault:
      delay:
        percent: 100
        fixedDelay: 7s
    route:
    - destination:
        host: ratings
        subset: v1
  - route:
    - destination:
        host: ratings
        subset: v1

We expect the home page to load without errors after about 7 seconds. Instead, the reviews section displays an error message.

This is caused by the productpage application’s own failure handling, which is still needed despite Istio’s out-of-the-box failure recovery: productpage times out prematurely and reports an error.

In this case the fault injection helped us to reveal the anomaly without affecting end users.
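
Aborts can be injected in the same way as delays. The following sketch, which is an illustrative variant and not part of the experiment described above, makes every request from user “jason” to the ratings service fail immediately with an HTTP 500 response:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    fault:
      abort:
        percent: 100        # affect all matching requests
        httpStatus: 500     # status code returned instead of calling the service
    route:
    - destination:
        host: ratings
        subset: v1
  # All other users are routed to ratings v1 without any injected fault.
  - route:
    - destination:
        host: ratings
        subset: v1
```

This can be used to check how callers behave when the ratings service is unavailable rather than merely slow.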

Traffic Shifting

After fixing the bug and deploying the new version of the application, we want to shift the traffic to the fixed version. At first we shift 50% of the traffic to v3 of the reviews service.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 50
    - destination:
        host: reviews
        subset: v3
      weight: 50

After a while we may assume the service is stable and can route 100% of the traffic to v3.
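
A sketch of that final step, assuming the same reviews virtual service is simply replaced by one with a single destination:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    # With a single destination no weight is needed; it receives all traffic.
    - destination:
        host: reviews
        subset: v3
```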

Collecting and Visualizing Metrics

To collect telemetry data, Istio’s Mixer component provides several adapters that interface with a variety of infrastructure backends, such as Prometheus, Datadog and Fluentd.

To visualize the data, an addon called Servicegraph can be used to show how the services are connected.

Servicegraph view of the Bookinfo application showing the services’ interdependencies

Grafana visualizes the metrics in several views and dashboards.

A Grafana dashboard that visualizes request latency and failure rate

Security

Istio supports various means to authenticate services and end users, authorization (role-based access control) to control access to services in a service mesh, and auditing tools. A discussion of these topics would go beyond the scope of this article; please refer to the official documentation.

Conclusion

To wrap things up, this article explained what a service mesh is and how it can be implemented. It also showed how to install Istio on Kubernetes and deploy a sample application. The Bookinfo sample application demonstrated how to route traffic to different versions based on a user property and on weights. Additionally, we learned how to inject faults to uncover potential flaws in the microservice interaction in order to increase system resiliency. The collection and visualization of metrics and the security features were briefly introduced as well.

As the next step, we want to deploy Istio with a real-world application to see how it works in practice and whether it can live up to the expectations we have gathered while writing this article.

Credits for cover image go to: AMIS TECHNOLOGY BLOG