🎬 Linux Network Monitoring
💡 Preface
This module is part of a course on DevOps.
Check out the course introduction for more information.
This module is part of Chapter 3.
This module is based on my long experience looking after servers: tuning performance, monitoring, and diagnosing issues.
This is not your average Linux network monitoring guide.
Although we'll be covering theory, the objective is not to bombard the viewer with too much detail.
We'll cover the theory conceptually, and then use practical examples to show you real-world concepts in action.
This guide features basic as well as some deeper, more advanced topics, tools and techniques for dealing with network observability and monitoring.
Although it starts with the basics, this guide touches on all the components that I still use in day-to-day modern DevOps & Cloud engineering.
It will be important to pay attention, as the details in this module form the foundation of monitoring HTTP web services & microservices, especially when:
- One service or server cannot talk to another service or server
- Troubleshooting connection errors
- Understanding latency
- Understanding basic network bottlenecks
Network usage - How the Network works
In our chapter on Operating Systems, we covered the basics of system resources that the operating system manages, including the Network which is used by processes to communicate with one another.
Processes can communicate with one another on the same server or across servers, if the network allows it.
Unlike CPU, memory & disk, the network has a few components to understand, as each one of the components can cause connection failures, errors, delays and bottlenecks.
In the first module, we looked at a high-level overview of networking. And in the diagram below, I highlight some of the components we will cover in this module.
As an engineer you want to have an understanding of how servers/processes talk to one another over the network. This happens through a network connection.
There are two types of connection (which are referred to as "network protocols"), called TCP and UDP.
Each protocol varies slightly in the way connections are established.
The box on the left is our source server which runs a process and makes a network connection to the box on the right, which is another server with a process on it.
This could be a Web browser (left box) opening a web page (Github.com) which connects to a Web server (right box) on the internet somewhere hosted by Github.
It could be a web client talking to a server, two applications talking to one another, two microservices, a service talking to a database, or technically anything talking to something else over a network or internet connection.
```
connection from server-a to 143.0.1.2:443
or https://143.0.1.2

       --- 144.0.1.2 -------------- 143.0.1.2:443 -\
    ---/   public IP                public IP       ---\
 ---/                                                   ---\
--/                                                        --
+----------------------+                    +----------------------+
|      private IP      |                    |      private IP      |
|    10.0.0.4:1024     |                    |    10.0.0.4:443      |
|         port         |                    |         port         |
|       server-a       |                    |       server-b       |
+----------------------+                    +----------------------+
```
Important Network components
Let's keep referring to the diagram above, and talk about each network component in this diagram to understand network connectivity.
IP addresses
We covered in previous modules that IP addresses are identifiers for servers on a network, and a server must have an IP address in order to belong to a network.
An IP address can be either public or private.
Generally speaking, servers always have a private IP address when belonging to a network. A server may or may NOT have a public IP address depending on network setup and configuration.
Private IP address
For example, in the diagram, `server-a` has a private IP address `10.0.0.4`. In our module on virtualization we learned that a DHCP service provides us with that private IP when our server joins the network. Virtualization software as well as Cloud providers will generally handle this IP address assignment for you.
It's also important to note that every server will have the IP `127.0.0.1`, which we also call "localhost". If a server refers to this IP, it is technically referring to itself.
For example, opening a Web browser and going to the address `http://127.0.0.1` will go to itself, so we would need a Web server running on our server that we can reach via that IP address.
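As a quick sanity check, here's a sketch of what this looks like when nothing is listening locally (exact wording varies by curl version):
```
# No web server is running locally, so the connection is refused
curl http://127.0.0.1
# curl: (7) Failed to connect to 127.0.0.1 port 80: Connection refused
```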
Public IP address
Whether `server-a` has a public address depends on network configuration.
Generally speaking, the Operating system on a server is configured to have what's called a "gateway" IP address so it knows where to send all outbound network traffic.
On home networks, this "gateway" address would usually be your home router address.
Virtualization software and Cloud providers will also generally handle this for you and you don't need to worry about setting up gateways.
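If you're curious, here's one way to see the configured gateway on a Linux server (the gateway IP and interface name below are just example values):
```
# Show the default gateway the OS sends outbound traffic to
ip route show default
# default via 192.168.1.1 dev eth0
```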
In our module on Virtualization and Servers, our software used `10.0.0.0` as the gateway address; traffic sent to that gateway goes to the Host machine, and from there out via the router on the home network.
So basically that "gateway" is the gateway to the public internet: traffic goes to your router, and the router gets a Public IP address from your Internet service provider.
That's why when you reboot your home router, your Public IP address may change.
A similar architecture is generally followed in company and office networks. Your computer in the office will route outbound traffic to a network device or router, which will have a Public IP address provided by the company's ISP. This is all similar to what is shown in the diagram above.
In the cloud, servers would generally have a Public IP address you can visibly see in the cloud provider web interface.
So each server could have its own Public IP address. Cloud providers also allow you to remove the Public IP address, which renders the server completely private and inaccessible from public networks.
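A quick way to check what Public IP address your outbound traffic appears to come from is to ask an external service to echo it back; ifconfig.me is one of several such third-party services:
```
# Ask an external service what public IP it sees our traffic coming from
curl https://ifconfig.me
```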
It's important to know that Public IP addresses are used for both inbound and outbound traffic.
So network requests can go from `server-a` to the router, and out via the router's Public IP address.
`server-b`, or any destination that receives requests from `server-a`, will see that they originate from the Public IP address we have for `server-a`.
If `server-b` needs to respond to that request, it may just respond to `server-a` over the same connection.
Because the illustration shows a network request from left to right, it's important to know that requests can also go from right to left.
However, to do this, `server-a` needs to listen on a port and have a process running that can accept requests. The router device on the left also needs a "port forwarding" rule to tell the router which Private IP address to send all traffic that arrives on a given port.
Therefore a server and its router need to be configured in order to allow network requests to flow all the way through.
Server VS Client
In a previous chapter we talked about "servers" and what a server is.
This is not to be confused with the concept of servers and clients in networking.
In networking terminology, the client refers to the server or process that is initiating the network request.
In networking terminology, the server refers to the server or process that is receiving the network request.
Ports & Connections
In order for a client and server to talk, a network connection must be made.
The client will need a private IP as we've mentioned earlier and it will also need a source port for the connection. This is so that the reply can find its way back to the client. The source port is generally assigned by the client's Operating System.
Source ports are limited and each Operating System can have different limits for the number of source ports it can allow. This means that we could have port exhaustion if a client tries to create too many connections.
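On Linux you can see the range the kernel assigns source (ephemeral) ports from; as a sketch, the values shown below are common defaults and may differ on your system:
```
# Show the ephemeral (source) port range used for outbound connections
sysctl net.ipv4.ip_local_port_range
# net.ipv4.ip_local_port_range = 32768	60999
```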
Once the client has a source IP and Port, it establishes a connection to the destination IP address. Now, there are some technical nuances to establishing network connections and there is more to it, however I'll be keeping this brief and simple.
In my opinion, a simplified understanding is always a better place to start instead of drowning in the depth of theory and details.
When it comes to monitoring, this will give us everything we need to form a solid fundamental understanding for troubleshooting systems.
Now this connection attempt from the client will end up at a destination server which would have a process running on it and listening on a port. This process could be a web server or application.
To accept a connection, a process must "listen" on a port
That port must also be open on the server, meaning no firewall or anti-virus should be blocking that port
If there is a network device, proxy, load balancer or router in front as per our diagram, that device needs a port forwarding rule to send traffic to that destination server on a given port.
Please make an important note here: when you see "Connection Refused", it generally means there is no process or application listening on the destination port you are trying to reach. This is a common error that's often misinterpreted by developers and engineers.
Another important error is "Connection Timeout". If you see this error, or a network request simply hanging, it generally means that the port you are trying to reach is being blocked by something. This could be a cloud security rule, a firewall, or a network device like a router.
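As a sketch of how these two errors look when probing a port with `nc` (a tool we cover below; `some-blocked-host` is a hypothetical host behind a firewall):
```
# Nothing listening on the port -> fails immediately with "Connection refused"
nc -zv localhost 12345

# Port blocked by a firewall -> the request hangs; -w 5 gives up after 5 seconds
nc -zv -w 5 some-blocked-host 12345
```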
Network Bandwidth
Once a connection is established between client and server, then the client can start sending network requests and the server can respond with network responses. These requests and responses often contain data.
This can be a client web browser getting HTML and web page content from a server, it could be a web browser client calling an API service for data, it could be two microservices communicating with one another.
These network requests and responses may sometimes contain large datasets. These requests and responses generally take up what's called network bandwidth.
Network bandwidth is limited by network speeds, which can involve the client and server network interfaces (or network cards), network cables, devices and ISP speeds.
Bandwidth is often monitored and measured in bytes per sec, megabytes per sec, gigabytes per sec, etc.
If you are using the cloud, the server will have limits and specifications on its network bandwidth capabilities.
You shouldn't have to worry about this most of the time, unless you are working for a company that deals with large data transmission, where services that talk to each other exchange large messages. Most servers in the cloud can handle pretty large amounts of bandwidth, and latency will likely be your biggest challenge.
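If you do want to eyeball bandwidth usage on a Linux server, the kernel keeps per-interface byte counters; a sketch, where `eth0` is an example interface name:
```
# Show received/transmitted byte and packet counters for an interface
ip -s link show eth0
```
Sampling these counters twice and dividing the difference by the interval gives a rough bytes-per-second figure.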
Latency
Latency is the time taken for the client to receive a response from the server.
Latency is generally measured in milliseconds (ms) or seconds (s).
In my experience, the main way I monitor latency is generally through proxy or web server logs. Web logs will generally list each network request and details about the request such as latency so we know how long it took for the server to respond to the client.
This helps us understand timeouts and long-running requests which could impact customers.
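You can also measure latency from the client side; as a sketch, curl can report how long a request took end to end (the URL here is just an example):
```
# Print the total time taken for the request, in seconds
curl -o /dev/null -s -w 'total: %{time_total}s\n' https://github.com
```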
In our Chapter on Web servers we will take a look at web server monitoring which covers this topic in further detail.
To summarise, generally when a server is slow and not responding to requests from clients in a timely manner, we can get evidence of this by looking at web server logs.
Network devices in front of web servers, like load balancers, proxies or what we call "ingresses", may also provide web traffic logs which can indicate the latency of requests as well.
This is a key focus point when monitoring HTTP web server traffic performance and stability.
Network Protocols
There are two major network protocols you will come across in the field. The two are `TCP` and `UDP`.
So when we talked about network connections and how the client and server establish them, I mentioned there are some nuances to how a connection is established.
`TCP` is the main network protocol used by the Web because it's designed to be reliable.
Networks are flaky, meaning a network packet is never guaranteed to arrive from client to server.
To make the network more reliable, TCP involves a handshake and a few network requests back and forth between client and server to establish a connection.
This network connection handshake in TCP is designed to help ensure connections are established even when networks are flaky in nature.
This comes at a performance cost, therefore there is another protocol called `UDP`, which is more of a "send and forget" type of network request, where a client sends a request, waits for a response, and will simply retry if it does not get one.
A client may throw an error after a few retries when a UDP request fails.
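As a sketch using the netcat tool we'll meet shortly (flag behaviour can vary between netcat variants):
```
# Terminal 1: a UDP "server" listening on port 12345 (-u = UDP)
nc -u -lk 12345

# Terminal 2: send one UDP message; -w 1 gives up after 1 second of inactivity
echo "hello over UDP" | nc -u -w 1 localhost 12345
```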
Making a Server
We have learned that a server is the receiver of network traffic.
A server has to run a process that listens on a port so it is ready to receive traffic.
If this server needs to be accessible on a private network only, we just need to ensure we can access it from another server on the same network and no firewall is blocking the port. Sometimes firewall software on a server can block ports. Linux has a basic network firewall and Windows has a firewall too.
Remember what I said: if we see "Connection refused" when trying to connect to our server, it means nothing is listening on the port we are trying to reach.
If we see "Connection timeout", it's generally a firewall blocking the request, so it hangs and times out.
It's important to learn how to validate if a server port is open.
To do this, let's use a tool called netcat or `nc`:
```
nc -zv localhost 12345
nc: connect to localhost (127.0.0.1) port 12345 (tcp) failed: Connection refused
```
- The `-z` flag can be used to tell `nc` to report open ports, rather than initiate a connection
- The `-v` flag tells `nc` to turn on verbose output so we can get more helpful information about whether the port is open or not
Let's use `nc` to start a server.
We can create a script to start our server if we wanted:
```
echo "Starting TCP server on port 12345..."
nc -lk 12345
```
- The `-l` flag tells netcat to listen on a given port for incoming connections
- The `-k` flag tells netcat to listen for another connection once it has received one. Without `-k`, netcat will close once it receives one connection
We can leave our server running in a terminal and open another to test the port.
This helps us during monitoring to test whether a server is listening on a port or not:
```
nc -zv localhost 12345
Connection to localhost (127.0.0.1) 12345 port [tcp/*] succeeded!
```
Making a client
Now that we have a server, we can make a client that sends requests to the server.
The whole point of this exercise is to learn how client and server connectivity works, so that we can learn how to monitor these requests and connections.
We can use the `nc` command to troubleshoot and test connectivity between clients and servers.
We will also learn about a couple of other important and popular network commands.
We can also use the `nc` command to create network requests.
Send one request to our server but hold the connection:
echo "Sending one message and holding connection!" | nc localhost 12345
The above is a basic TCP client.
Another network protocol, which is built on top of TCP, is `HTTP`.
HTTP is used in Web communications and we'll learn more about the Web in a future Chapter. This is how we create an HTTP connection and send an HTTP message:
```
curl -X POST http://localhost:12345 -d "test"
```
We'll get into more variations of what the client can do. But before we do, we need to learn a couple of popular network tools, and we can monitor the network at the same time.
Network Connection Monitoring Tools
To monitor and further understand TCP connections, we can use two popular tools:
- `ss` - Another utility to investigate sockets. `ss` is used to dump socket statistics. It allows showing information similar to netstat, and it can display more TCP and state information than other tools.
- `netstat` - Print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships.
We can monitor current TCP connections to see our server listening, and also see the current in-progress open and established TCP connections between client and server.
ss:
```
ss -a | grep 12345
```
netstat:
```
netstat -a | grep 12345
```
This will help us understand the connection lifecycle as well. We can view the connection state; in this case it's `LISTEN` and `ESTABLISHED`.
We've learned now that `LISTEN` is generally for servers listening on a port for inbound connections. `ESTABLISHED` is the state when a connection is open and the client and server can send and receive messages to each other.
In our example client and server, we are keeping the connection open so we can observe it in our command line tools above.
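`ss` can also filter by state directly, which is handy when the output gets busy; a sketch for our example port:
```
# Show only established TCP connections involving port 12345
ss -tn state established '( dport = :12345 or sport = :12345 )'
```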
TCP socket states represent the various stages a TCP connection goes through during its lifecycle.
Interestingly, if we close our client by pressing `CTRL+C` and quickly use `ss` or `netstat` to check the connection, we will notice the `ESTABLISHED` state has gone and we now see a `TIME_WAIT` state.
This is when a connection has been used by client and server and is about to be closed. The connection is now in a "recycling" state where the operating system will get to re-use that port. Generally a connection will spend roughly 60 seconds in `TIME_WAIT` before the operating system is able to re-use it.
The `TIME_WAIT` behaviour can be tuned in Linux; if we require a lot more connections quickly, we may allow `TIME_WAIT` sockets to be re-used sooner.
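On Linux the 60-second `TIME_WAIT` length itself is built into the kernel, but a related sysctl lets the OS re-use `TIME_WAIT` sockets for new outbound connections; a sketch (change this only after understanding the trade-offs):
```
# Check whether TIME_WAIT sockets may be re-used for outbound connections
sysctl net.ipv4.tcp_tw_reuse

# Enable re-use (as root)
sudo sysctl -w net.ipv4.tcp_tw_reuse=1
```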
We can run the following loop that keeps sending messages every 5 seconds, closing the connection after each message is sent. This allows us to observe `TIME_WAIT` sockets:
echo "Sending messages one after another!"
while true; do
echo "Sending message and closing connection!" | nc -q 5 localhost 12345
done
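While the loop runs, we can count the `TIME_WAIT` sockets piling up; a sketch using ss state filtering:
```
# Count sockets currently in TIME_WAIT involving our test port
ss -tan state time-wait '( dport = :12345 or sport = :12345 )' | wc -l
```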
Here are the TCP socket states in order:
- `CLOSED`: The initial state. No connection exists.
- `LISTEN`: The server is waiting for incoming connection requests.
- `SYN_SENT`: The client has sent a connection request (SYN) and is waiting for a matching connection request (SYN-ACK) from the server.
- `SYN_RECEIVED`: The server has received the client's connection request (SYN) and sent a connection acknowledgment (SYN-ACK), waiting for the final acknowledgment (ACK) from the client.
- `ESTABLISHED`: The connection is open, and data can be sent and received between the client and server.
- `FIN_WAIT_1`: The client or server has initiated the connection termination and is waiting for the other side to acknowledge (FIN).
- `FIN_WAIT_2`: The side that initiated the termination has received the acknowledgment (ACK) of its FIN and is waiting for the other side to send its FIN.
- `CLOSE_WAIT`: The side that received the first FIN is waiting to send its own FIN.
- `CLOSING`: Both sides have sent FINs, but neither has received the final acknowledgment (ACK).
- `LAST_ACK`: The side that sent the first FIN is waiting for the final acknowledgment (ACK) of its FIN.
- `TIME_WAIT`: The side that sent the final acknowledgment (ACK) is waiting for a period of time to ensure the other side received it.
- `CLOSED`: The connection is fully terminated, and no further communication is possible.
Network Traffic Monitoring
For all the above, we monitored network connections.
But once a connection is established the client and server can go back and forth and send and receive messages to one another.
To monitor this network traffic, I often rely on logging.
Traffic will flow from client to server, and the applications we run on servers generally have capability to log traffic requests.
All popular web servers have this feature that you can configure.
Once configured, the server process will write incoming traffic logs to a file, where each line in the file represents a request.
It is generally in a format similar to this, with spaces or commas separating each field:
```
<date> <client-IP> <request-info> <status> <time-taken-in-milliseconds> <etc>
```
With these logs, we can monitor where traffic is coming from, how long each request took to complete, and whether it was a success or failure.
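As a sketch, assuming the hypothetical space-delimited format above where time-taken is the 5th field and the log file is called access.log, we could surface the slowest requests with standard shell tools:
```
# Show the 10 slowest requests by the time-taken field
awk '{print $5, $0}' access.log | sort -rn | head -10
```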
This will be of great help when we start troubleshooting slow requests, which can result in systems slowing down.
We'll be covering web server logs and log monitoring in more detail in our chapter on Web servers.