Using a Possibility Tree for Fast String Parsing

The Raygun data processing pipeline is kept pretty busy, handling over 90 million crash reports daily. To reduce resource usage and avoid costly scaling, we need it to be as efficient as possible.

One of the many operations during processing is parsing the user-agent string (UA string) wherever one is present. From the UA string, we determine the operating system, browser, and device, along with their versions (throughout this post, I’ll sometimes refer to all three of these as “products”). This information is used for indexing so that Raygun users can filter their crash reports by these dimensions. We've gone through several rounds of performance optimization over the years, and during one of them, the user-agent parser stood out as the slowest component.
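To make the goal concrete, here is a rough sketch of the kind of result a UA parser is expected to produce. The `Product` and `UserAgentInfo` types are hypothetical names made up for illustration, not Raygun's actual model, and the values are filled in by hand rather than computed by a parser.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Product:
    """One detected product (hypothetical type, for illustration only)."""
    name: str
    version: Optional[str] = None


@dataclass(frozen=True)
class UserAgentInfo:
    """The three "products" extracted from a UA string (assumed shape)."""
    operating_system: Optional[Product] = None
    browser: Optional[Product] = None
    device: Optional[Product] = None


# A typical desktop Chrome UA string.
ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

# The target shape for that string, written out by hand. Any product may be
# missing: a desktop UA string like this one names no specific device.
expected = UserAgentInfo(
    operating_system=Product("Windows NT", "10.0"),
    browser=Product("Chrome", "120.0.0.0"),
    device=None,
)
```

Each of the three fields is what would be indexed, so a missing product is represented explicitly as `None` rather than as an empty string.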
