Introduction to Distributed Tracing

In this episode we look at distributed tracing: what it is, what problems it solves, how to emit traces, and the platform architecture needed to collect them.

Example microservice architecture

First of all, we need an example application. In this demo, I have a few microservices that work together to form a video catalog.

A simple Web UI: videos-web

Consider videos-web.
It's an HTML application that lists a bunch of playlists with videos in them.

+------------+
| videos-web |
|            |
+------------+

A simple API: playlists-api

For videos-web to get any content, it needs to make a call to playlists-api.

+------------+     +---------------+
| videos-web +---->+ playlists-api |
|            |     |               |
+------------+     +---------------+

Playlists consist of data like title, description etc., and a list of videos.
playlists-api stores its playlists in a database.

+------------+     +---------------+    +--------------+
| videos-web +---->+ playlists-api +--->+ playlists-db |
|            |     |               |    |              |
+------------+     +---------------+    +--------------+

A little complexity

Each playlist contains only a list of video IDs.
A playlist does not have the full metadata of each video.

Example playlist:

{
  "id" : "playlist-01",
  "title": "Cool playlist",
  "videos" : [ "video-1", "video-x" , "video-b"]
}

Note that `videos` above is only a list of video IDs.

Videos have their own title and description and other metadata.

To get this data, we need a videos-api.
This videos-api has its own database too.

+------------+       +-----------+
| videos-api +------>+ videos-db |
|            |       |           |
+------------+       +-----------+

For the playlists-api to load all the video data, it needs to call videos-api for each video ID it has.

Traffic flow

A single `GET` request to the `playlists-api` will get all the playlists from its database with a single DB call.

For every playlist and every video in each list, a separate `GET` call will be made to the videos-api, which will retrieve the video metadata from its database.

This results in a large fan-out of network calls between playlists-api and videos-api, and many calls to the videos-api database.
This is intentional to demonstrate a busy network.
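
To make the fan-out concrete, here is a minimal sketch that simulates the traffic flow described above and counts the calls. It is not the demo's actual code: the in-memory dictionaries stand in for the two databases, and the function names, the second playlist and the video titles are hypothetical.

```python
# In-memory stand-ins for playlists-db and videos-db (hypothetical data).
PLAYLISTS_DB = [
    {"id": "playlist-01", "title": "Cool playlist",
     "videos": ["video-1", "video-x", "video-b"]},
    {"id": "playlist-02", "title": "Another playlist",
     "videos": ["video-1", "video-b"]},
]
VIDEOS_DB = {
    "video-1": {"id": "video-1", "title": "First video"},
    "video-x": {"id": "video-x", "title": "Video X"},
    "video-b": {"id": "video-b", "title": "Video B"},
}

# Call counters, one per downstream dependency.
calls = {"playlists-db": 0, "videos-api": 0, "videos-db": 0}


def videos_api_get(video_id):
    """Stand-in for one GET to videos-api, which does one videos-db lookup."""
    calls["videos-api"] += 1
    calls["videos-db"] += 1
    return VIDEOS_DB[video_id]


def playlists_api_get():
    """Stand-in for one GET to playlists-api."""
    calls["playlists-db"] += 1  # a single DB call loads every playlist
    result = []
    for playlist in PLAYLISTS_DB:
        # One videos-api call per video ID, repeated IDs included.
        videos = [videos_api_get(video_id) for video_id in playlist["videos"]]
        result.append({**playlist, "videos": videos})
    return result


playlists = playlists_api_get()
print(calls)  # {'playlists-db': 1, 'videos-api': 5, 'videos-db': 5}
```

With two playlists holding five video IDs in total, one incoming request turns into five calls to videos-api and five lookups in its database. This is the kind of traffic that is hard to follow from logs alone.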

Full application architecture


+------------+     +---------------+    +--------------+
| videos-web +---->+ playlists-api +--->+ playlists-db |
|            |     |               |    |    [redis]   |
+------------+     +-----+---------+    +--------------+
                         |
                         v
                   +-----+------+       +-----------+
                   | videos-api +------>+ videos-db |
                   |            |       |  [redis]  |
                   +------------+       +-----------+

Run the apps: Docker

There is a `docker-compose.yaml` in this directory.
Change your terminal to this folder and run:

cd tracing

docker-compose build

docker-compose up

You can access the app on http://localhost.
You should now see the complete architecture in the browser.

Traces

To see our traces, we can access the Jaeger UI on http://localhost:16686.
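
As a rough mental model of what a trace for one request to playlists-api might look like, the sketch below builds a toy span tree. It is a simplified illustration, not the Jaeger or OpenTracing API: the `Span` class, the `child_of` helper and the operation names are hypothetical, and which services actually emit spans depends on how the demo is instrumented.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Toy span: every span in a trace shares trace_id and points at its parent."""
    trace_id: str
    service: str
    operation: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])


def child_of(parent, service, operation):
    """Create a span in the same trace, one level below `parent`."""
    return Span(parent.trace_id, service, operation, parent_id=parent.span_id)


# Root span: the incoming GET handled by playlists-api (assumed entry point).
root = Span(uuid.uuid4().hex, "playlists-api", "GET playlists")
spans = [root, child_of(root, "playlists-db", "read playlists")]

# One videos-api span per video ID, each with its own database span.
for video_id in ["video-1", "video-x", "video-b"]:
    video = child_of(root, "videos-api", f"GET {video_id}")
    spans += [video, child_of(video, "videos-db", f"read {video_id}")]


def print_tree(parent_id=None, depth=0):
    """Print spans as an indented tree, following parent_id links."""
    for span in spans:
        if span.parent_id == parent_id:
            print("  " * depth + f"{span.service}: {span.operation}")
            print_tree(span.span_id, depth + 1)


print_tree()
```

The shared trace ID is what lets a tracing backend group spans from different services into one request, and the parent links are what let it draw them as a nested timeline.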