eBPF: Observability with Zero Code Instrumentation [Video]

Current observability practice relies largely on manual instrumentation, which requires adding code at relevant points in the user’s business logic to generate telemetry data. This can become burdensome and create a barrier to entry for many teams wishing to implement observability in their environment, especially in Kubernetes environments and microservices architectures.

eBPF is an exciting technology for Linux kernel-level instrumentation that holds the promise of no-code instrumentation and easier observability into Kubernetes environments (alongside other benefits for networking and security).

Open Networking for Network Switches – How the Open-Source DENT Project Levels the Playing Field

The promise of an open-source networking operating system (NOS) is enticing. Compared to legacy networking (Cisco, Arista, Juniper), which is proprietary, expensive, and complex to operate, the open networking model is disaggregated, easy to automate, and provides major cost reductions.

An open-source NOS could give segments like data centers, retail, remote offices, and campuses an alternative that significantly reduces the cost of goods and services (COGS), shortens integration time, widens access to hardware, and works with existing Linux toolchains, for example by using the Ethernet switch device driver model Switchdev as infrastructure with value-added apps on top.

How to Trace Linux System Calls in Production (Without Breaking Performance)

If you need to dynamically trace Linux process system calls, you might first consider strace. strace is simple to use and works well for issues such as "Why can't the software run on this machine?" However, if you're running a trace in a production environment, strace is NOT a good choice. It introduces a substantial amount of overhead. According to a performance test conducted by Arnaldo Carvalho de Melo, a senior software engineer at Red Hat, the process traced using strace ran 173 times slower, which is disastrous for a production environment.

So are there any tools that excel at tracing system calls in a production environment? The answer is YES. This blog post introduces perf and traceloop, two commonly used command-line tools, to help you trace system calls in a production environment.

Why We Disable Linux’s THP Feature for Databases

Linux's memory management system is transparent to the user. However, if you're not familiar with how it works, you might run into unexpected performance issues. That's especially true for sophisticated software like databases. When databases run on Linux, even small system variations might impact performance.

After an in-depth investigation, we found that Transparent Huge Page (THP), a Linux memory management feature, often slows down database performance. In this post, I'll describe how THP causes performance to fluctuate, the typical symptoms, and our recommended solutions.
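
As a small illustration of where to start such an investigation, the sketch below reads the current THP mode on a Linux host. It assumes the standard sysfs file /sys/kernel/mm/transparent_hugepage/enabled, which lists the available modes with the active one in square brackets (for example "always [madvise] never"). The function names are hypothetical and chosen for this example only; this is not taken from the post itself.

```python
import re
from pathlib import Path
from typing import Optional

# Standard sysfs location of the THP setting on Linux.
THP_ENABLED_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> Optional[str]:
    """Return the active mode, i.e. the word inside square brackets."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_thp_mode(path: Path = THP_ENABLED_PATH) -> Optional[str]:
    """Read the THP mode, or return None if the file cannot be read
    (for example on a non-Linux host or a kernel built without THP)."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    mode = read_thp_mode()
    print(f"THP mode: {mode if mode else 'unavailable'}")
```

A result of "always" means the kernel may back any anonymous memory with huge pages, "madvise" limits this to regions that request it, and "never" disables the feature.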

Why We Switched from bcc-tools to libbpf-tools for Linux BPF Performance Analysis

Distributed clusters might encounter performance problems or unpredictable failures, especially when they are running in the cloud. Of all the kinds of failures, kernel failures may be the most difficult to analyze and simulate.

A practical solution is Berkeley Packet Filter (BPF), a highly flexible, efficient virtual machine that runs in the Linux kernel. It allows bytecode to be safely executed at various hooks, which exist in a variety of Linux kernel subsystems. BPF is mainly used for networking, tracing, and security.