production | The Blog Pros

May 18, 2022

Debugging the Java Message Service (JMS) API Using Lightrun

The Java Message Service API (JMS) was developed by Sun Microsystems in the days of Java EE. The JMS API provides us with simple messaging abstractions, including message producer, message consumer, and so on. Messaging APIs let us place a message on a “queue” and consume messages placed into said queue. This is immensely useful for high-throughput systems — instead of wasting user time by performing a slow operation in real time, an enterprise application can send a message. This non-blocking approach enables extremely high throughput while maintaining reliability at scale.

The message carries a transactional context which provides some guarantees on deliverability and reliability. As a result, we can post a message in a method and then just return, which provides similar guarantees to the ones we have when writing to an ACID database.
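To make the "post a message and just return" idea concrete, here is a minimal sketch that uses an in-memory `BlockingQueue` from the JDK as a stand-in for a queue. It is not the JMS API and it carries none of the transactional guarantees described above. The class name `QueueSketch`, the `POISON` sentinel, and the `processed:` prefix are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// In-memory stand-in for a message queue. This is NOT JMS: no broker,
// no persistence, no transactions. It only shows the producer/consumer shape.
class QueueSketch {
    // Hypothetical sentinel that tells the consumer to stop.
    static final String POISON = "__STOP__";

    // "Producer" side: enqueue the work and return immediately.
    static void submit(BlockingQueue<String> queue, String message) {
        queue.add(message); // does not block on an unbounded LinkedBlockingQueue
    }

    // "Consumer" side: handle messages in arrival order until the sentinel shows up.
    static List<String> consumeAll(BlockingQueue<String> queue) {
        List<String> handled = new ArrayList<>();
        try {
            while (true) {
                String message = queue.take(); // blocks until a message is available
                if (POISON.equals(message)) {
                    return handled;
                }
                handled.add("processed:" + message); // placeholder for the slow operation
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("consumer interrupted", e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> results = new ArrayList<>();

        // The slow work happens on a separate thread, away from the caller.
        Thread consumer = new Thread(() -> results.addAll(consumeAll(queue)));
        consumer.start();

        submit(queue, "order-1");
        submit(queue, "order-2");
        submit(queue, POISON);

        consumer.join(); // join() makes the consumer's writes visible here
        System.out.println(results);
    }
}
```

The caller only pays for the enqueue; the slow work runs on the consumer thread. A real JMS provider adds what this sketch lacks: delivery guarantees that survive a crash.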

April 4, 2022

Debugging JAXB Production Issues

Java Architecture for XML Binding (AKA JAXB API) is a popular API for marshaling XML data. It's a framework for mapping between XML documents and Java POJOs (Plain Old Java Objects, AKA regular Java classes) almost seamlessly. The API is very easy to use and many frameworks leverage it to provide their XML support. JAXB 2.0 has gained popularity both in desktop applications (Java SE) and in application server code (Spring Boot, Java EE/Jakarta EE, MicroProfile, etc.).

JAXB requires a runtime library but doesn't require static analysis, XML schema, or anything like that. While the schema isn't required, it's still the basis of a cool JAXB feature: the ability to generate Java sources from source schema!

March 21, 2022

Debugging Java Equals and Hashcode Performance in Production

I wrote a lot about the performance metrics of the equals method and hash code in this article. There are many nuances that can lead to performance problems in those methods. The problem is that some of those things can be well hidden.

To summarize the core problem: the hashCode method is central to the Java Collections API, and specifically to the performance of hash tables (such as the hash-based implementations of the Map interface). The same is true of the equals method. If we have anything more complex than a string object or a primitive, the overhead can quickly grow.
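As a rough illustration of where that overhead comes from, here is a minimal sketch of a composite map key. Every `HashMap` lookup calls `hashCode` and usually `equals` on the key, so this version computes the hash once in the constructor (safe only because the fields are immutable) and compares the cheap fields first. The names `HashSketch`, `Key`, `region`, and `id` are hypothetical, and this is one possible approach rather than a general recommendation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class HashSketch {
    // Hypothetical composite key: immutable, so its hash can be computed once.
    static final class Key {
        private final String region;
        private final long id;
        private final int hash; // cached so repeated map lookups don't recompute it

        Key(String region, long id) {
            this.region = Objects.requireNonNull(region);
            this.id = id;
            this.hash = Objects.hash(region, id);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;               // identity check is the cheapest
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            // Compare the primitives before the more expensive String comparison.
            return hash == other.hash && id == other.id && region.equals(other.region);
        }

        @Override
        public int hashCode() {
            return hash;
        }
    }

    // A lookup with a fresh but equal key exercises both hashCode and equals.
    static String lookup() {
        Map<Key, String> customers = new HashMap<>();
        customers.put(new Key("eu", 42L), "Alice");
        return customers.get(new Key("eu", 42L));
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // prints "Alice"
    }
}
```

Caching the hash stops being valid as soon as a field can change after construction, so the fields here are final.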

March 17, 2022

Debugging Race Conditions in Production

Race conditions can occur when a multithreaded application accesses a shared resource from more than one thread. Unless we have guards in place, the result might depend on which thread "got there first". This is especially problematic when the state is changed externally.

A race can cause more than just incorrect behavior. It can enable a security vulnerability when the resource in question can be corrupted in the right way. A good example of race condition vulnerabilities is mangling memory. Let's say we have an admin user name that is restricted and privileged. You can't change your user name to admin because of validation. But you can change it to anything else...
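The simplest form of the race described above can be sketched with a shared counter. In the unguarded version, `counter[0]++` is a read-modify-write that two threads can interleave, so updates may be lost; the guarded version uses `AtomicInteger` as the guard. The class and method names are hypothetical, and whether the unguarded run actually loses updates varies from run to run, which is exactly what makes such bugs hard to reproduce.

```java
import java.util.concurrent.atomic.AtomicInteger;

class RaceSketch {
    // Runs the same task on several threads and waits for all of them to finish.
    static void runConcurrently(int threads, Runnable task) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(task);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    // No guard: increments from different threads can overwrite each other.
    static int unguardedCount(int threads, int perThread) {
        int[] counter = new int[1];
        runConcurrently(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                counter[0]++; // read, add, write: not atomic
            }
        });
        return counter[0];
    }

    // Guarded: each increment is a single atomic operation.
    static int guardedCount(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        runConcurrently(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                counter.incrementAndGet();
            }
        });
        return counter.get();
    }

    public static void main(String[] args) {
        int threads = 8;
        int perThread = 10_000;
        System.out.println("expected:  " + (threads * perThread));
        System.out.println("unguarded: " + unguardedCount(threads, perThread)); // may be lower
        System.out.println("guarded:   " + guardedCount(threads, perThread));
    }
}
```

The guarded count always matches the expected total. The unguarded count can come out lower, and by a different amount each time.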

March 14, 2022

How to Effectively Bridge the DevOps – R&D Gap Without Sacrificing Reliability

DevOps culture revolutionized our industry. Continuous Delivery and Continuous Integration made Six Sigma reliability commonplace. Twenty years ago we would kick the production servers and listen to the hard drive spin; that was observability. Today’s DevOps teams deploy monitoring tools that provide development teams with deep insight into the production environment.

“O brave new world That has such people in’t!” – William Shakespeare

December 19, 2021

Extending Apache SkyWalking With Non-Breaking Breakpoints

Non-breaking breakpoints are breakpoints specifically designed for live production environments. With non-breaking breakpoints, reproducing production bugs locally or in staging is conveniently replaced with capturing them directly in production.

Like regular breakpoints, non-breaking breakpoints can be:

November 11, 2021

Production Horrors – Customer Miscommunication Leads to Ongoing Nightmare

This is a bit of a different story in the series. When I came up with the concept for production horrors, my thoughts were mostly about a single day or a single event that made our production fail. Naturally, our minds gravitate towards crashes or issues like the recent Facebook outage. But last time around, I gave the example of problematic caching that led to a billing problem…

This time the production horror is of a different kind. It started well before the product reached production and in a different era: a time before Ajax, when the web was still in request-response mode and IE 6 was state of the art (truly a horror story). I was approached about consulting for a major bank that was running a huge project to modernize its trading infrastructure.

November 9, 2021

Kubernetes Logging in Production

Historically, in monolithic architectures, logs were stored directly on bare metal or virtual machines. They never left the machine disk and the operations team would check each one for logs as needed.

This worked on long-lived machines, but machines in the cloud are ephemeral. As more companies run their services on containers and orchestrate deployments with Kubernetes, logs can no longer be stored on machines, and implementing a log management strategy is of the utmost importance.

June 16, 2021 (updated June 18, 2021)

A First Glimpse of Production Constraints for Developers

In most organizations, developers are not allowed to access the production environment for stability, security, or regulatory reasons. Restricting access to production is good practice (and is enforced by many frameworks, such as COBIT or ITIL), but a major drawback is the mental distance it creates between developers and the real world. Likewise, monitoring is usually managed only by operators, and very little feedback reaches developers except when they have to fix application bugs (ASAP, of course). As a result, most developers have very little idea of what a real production environment looks like and, more importantly, of the non-functional requirements that production-proof code must meet.

Involving developers in resolving production issues is a good thing for two main reasons:

February 17, 2021

Writing Better Production Readiness Checklists

When we think of reliability tools, we may overlook the humble checklist. While tools like SLOs represent the cutting edge of SRE, checklists have been recommended in many industries such as surgery and aviation for almost a century. But checklists owe this long and widespread adoption to their usefulness.

Checklists can help limit errors when deploying code to production. In this blog post, we’ll cover:

August 9, 2020

The Challenges of Adopting K8s for Production and Tips to Avoid Them

Since its discreet debut in 2000 with the jail command introduced by FreeBSD, container technology has moved firmly to the center stage of modern software delivery. Kubernetes is the de facto standard today for container orchestration and reputedly the best in the containerization space. And the timing is right for the platform, as Gartner has projected that by 2023, over 70% of global enterprises will be running two or more containerized applications, up by 20% over last year.

Yet, Kubernetes remains complex to manage at enterprise scale, where workloads are heavy, and SLA compliance is critical. Even when Kubernetes is running smoothly in the test environment, running it in production needs to be approached with care to avoid pitfalls.

April 23, 2020

CUBA: Getting Ready for Production

“It works on my local machine!” Nowadays it sounds like a meme, but the problem of development environment vs. production environment still exists. As a developer, you should always keep in mind that your application will start working in the production environment one day. In this article, we will talk about some CUBA-specific things that will help you avoid problems when your application goes to production.

Coding Guidelines

Prefer Services

Almost every CUBA application implements some business logic algorithms. The best practice here is to implement all business logic in CUBA Services. All other classes (screen controllers, application listeners, etc.) should delegate business logic execution to services. This approach has the following advantages:

February 12, 2019

The Path to Production: How And Where to Segregate Test Environments

Bringing a new tool into an organization is no small task. Adopting a CI/CD tool, or any other tool, should follow a period of research, analysis, and alignment within your organization.

In my last post, I explained how the precursor to any successful tool adoption is about people: alignment on purpose, getting some “before” metrics to support your assessment, and setting expectations appropriately. I recommend you revisit that post to best prepare your team before you enact any tool change.