
An Introduction to eBPF


[Image: eBPF website. Caption: Looking forward to the documentary!]

What eBPF stands for

eBPF stands for extended Berkeley Packet Filter, or extended BPF. BPF was first introduced to Linux in 1997, where the tcpdump utility used it to capture packets for tracing. It evolved into extended BPF in 2014, gaining significant new functionality such as a revised BPF instruction set, the addition of maps, and more.

Why eBPF

Historically, to extend the Linux kernel's behaviour, one would either:

  • propose changes directly to the Linux kernel, which could take quite a while for the community to evaluate and accept; or
  • write kernel modules, which can be loaded and unloaded.

In both approaches, safety remains a concern. If the new kernel code is unsafe, it can corrupt the entire kernel.

eBPF brings a new approach to the table: it runs sandboxed programs in the kernel that can be dynamically loaded and unloaded at runtime. The kernel's verifier checks each program before it is loaded, which is what keeps the sandbox safe. In other words, it is a safer and more efficient way to extend kernel functionality, without the need to restart existing processes.

eBPF improves the extensibility of the kernel and sparks more innovation at the kernel level. The technology can be used for a variety of applications in observability, tracing, networking, security, and more.

Example: run an eBPF program with BCC

In this example, you will run an eBPF program with BCC to see how eBPF can be used to monitor TCP traffic. The example is fairly simple and you will not need to write any eBPF code yourself. It serves as a good starting point for subsequent learning and exploration.

So what is BCC? BCC (BPF Compiler Collection) is a toolkit or framework that offers an easy way to write eBPF programs: the kernel-side code is written in restricted C, while the user-space side that loads and drives it is written in Python or Lua. While it is not necessarily the recommended approach for developing production eBPF programs, it is a low-barrier way to get hands-on and learn about eBPF.
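You do not need this for the rest of the example, but to give a sense of what a BCC program looks like, here is a minimal sketch. The names PROGRAM, hello and run_hello are made up for illustration; the BCC calls (BPF, get_syscall_fnname, attach_kprobe, trace_print) come from BCC's Python bindings. It assumes BCC is installed and that the script runs with root privileges.

```python
# A minimal BCC sketch: kernel-side C embedded in a user-space Python script.
# PROGRAM, hello and run_hello are illustrative names, not part of BCC.

PROGRAM = r"""
int hello(void *ctx) {
    bpf_trace_printk("Hello from eBPF!\n");
    return 0;
}
"""


def run_hello():
    """Attach the hello function to the execve syscall and stream its output."""
    # Imported lazily so this file can be inspected on a machine without BCC.
    from bcc import BPF

    b = BPF(text=PROGRAM)                     # compile the C code to eBPF
    syscall = b.get_syscall_fnname("execve")  # architecture-specific symbol name
    b.attach_kprobe(event=syscall, fn_name="hello")
    b.trace_print()                           # blocks; press Ctrl-C to stop
```

Calling run_hello() as root should print one trace line each time a process on the machine calls execve, which is the event-driven behaviour the tools below rely on.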

To start, first install BCC on your system.

BCC comes with many ready-made tools. When installed from the Debian or Ubuntu packages, these tools carry a -bpfcc suffix (for example, tcpconnect-bpfcc). Here's a diagram of the BCC tracing tools:

[Diagram: BCC tracing tools]

source: https://github.com/iovisor/bcc/tree/master/images

For this example, run the tcpconnect tool:

sudo tcpconnect-bpfcc    # BCC tools typically need root privileges

You should see an empty table with the following headers:

PID    COMM         IP SADDR            DADDR            DPORT

eBPF programs are event-driven. The tcpconnect tool prints one table entry for every active TCP connection (that is, a connection initiated from this machine with connect()), so the next step is to start some TCP connections.

Open a new terminal and run the following:

curl httpbin.org/get    # HTTP GET request to httpbin.org
ssh 192.168.3.2         # SSH connection to any random IP

In the first terminal where tcpconnect is running, you should see two entries:

PID    COMM         IP SADDR            DADDR            DPORT
11035  curl         4  10.188.0.2       54.83.187.171    80  
11043  ssh          4  10.188.0.2       192.168.3.2      22
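If you would rather not reach out to external hosts, a connection over the loopback interface should trigger the same kind of event. The sketch below uses only Python's standard socket module; make_local_connection is a made-up helper name, and the exact row tcpconnect prints (for instance the COMM value) depends on your system.

```python
import socket


def make_local_connection():
    """Open one TCP connection to a throwaway listener on 127.0.0.1."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        # create_connection() calls connect(), the event tcpconnect traces.
        with socket.create_connection(("127.0.0.1", port), timeout=2) as client:
            conn, _ = server.accept()
            conn.close()
            return client.getpeername()


dest = make_local_connection()
print(f"connected to {dest[0]}:{dest[1]}")
```

Running this in the second terminal should add a row whose DADDR is 127.0.0.1 and whose DPORT matches the port printed by the script.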

With tcpconnect, you can spot unexpected outbound connections, which makes it a useful addition to infrastructure monitoring.
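As a rough sketch of what looking for unexpected connections could mean in practice, the script below parses rows in the format shown above and flags any destination port that is not on an allowlist. The Connection type, the function names and the ALLOWED_PORTS set are all hypothetical, and the parser assumes the default six-column output with no spaces in the COMM field.

```python
from typing import NamedTuple, Optional


class Connection(NamedTuple):
    pid: int
    comm: str
    ip_version: int
    saddr: str
    daddr: str
    dport: int


def parse_line(line: str) -> Optional[Connection]:
    """Parse one tcpconnect row; return None for the header or malformed lines."""
    fields = line.split()
    if len(fields) != 6 or not fields[0].isdigit():
        return None
    pid, comm, ip, saddr, daddr, dport = fields
    return Connection(int(pid), comm, int(ip), saddr, daddr, int(dport))


ALLOWED_PORTS = {80, 443}  # hypothetical allowlist; adjust to your environment


def unexpected(lines, allowed_ports=ALLOWED_PORTS):
    """Return the connections whose destination port is not on the allowlist."""
    parsed = (parse_line(line) for line in lines)
    return [c for c in parsed if c is not None and c.dport not in allowed_ports]


# Sample rows copied from the tcpconnect output above.
sample = """\
PID    COMM         IP SADDR            DADDR            DPORT
11035  curl         4  10.188.0.2       54.83.187.171    80
11043  ssh          4  10.188.0.2       192.168.3.2      22
""".splitlines()

flagged = unexpected(sample)
for c in flagged:
    print(f"unexpected: {c.comm} (pid {c.pid}) -> {c.daddr}:{c.dport}")
```

With this allowlist, only the ssh connection to port 22 is flagged. The same functions could be fed lines read from the tool's live output instead of the sample text.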

For more examples of the tcpconnect tool, see the tool's example file in the BCC repository.

Final Words

eBPF is a large and interesting topic, and this post only scratches the surface, so here are some more resources.

I found the book Learning eBPF by Liz Rice exceptionally well written. Its chapters are complemented with examples that are easy to follow. This is a superb resource for developers who want to start programming in eBPF.

The BCC repository mentioned earlier also includes many tools and examples to play with.

Lastly, on the offensive side, there are also DEF CON talks on eBPF, such as Evil eBPF.

Happy exploring!