7 Common Kubernetes Backup and Recovery Mistakes

As enterprises move Kubernetes into production and increase the number of Kubernetes clusters and applications in use, they need to deliver the same “enterprise-level” services as for other production applications. Implementing Kubernetes backup is critical to protect your applications in the event of an accident, system failure, or deliberate attack. You need an effective and appropriate backup strategy—in addition to whatever built-in resiliency and data protection features your applications may have. 

There are several use cases that your Kubernetes backup and recovery strategy should satisfy:

Delete Multiple Resources and Resource Groups in Azure With Tags

You might have noticed that resources comprising some Azure services such as Azure Kubernetes Service (AKS) span multiple resource groups by default. In some cases, you might intentionally want to segregate resources such as disks and network interfaces from VMs by placing them in different resource groups for better management. A common problem arising from the resource spread is that you might find it challenging to delete multiple resources and resource groups to entirely remove a service from a subscription.

We can solve the problem by using resource tags to associate resources and resource groups with a service. Tags are key-value pairs that can be applied to your Azure resources, resource groups, and subscriptions. Of course, you can use tags for many other purposes apart from resource management. The Azure docs website has a detailed guide on the various resource naming and tagging strategies and patterns.
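As a minimal sketch of this tag-based cleanup, assuming the Azure SDK for Python and placeholder subscription and tag values, the snippet below deletes every resource group carrying a given tag (deleting a group also removes all the resources inside it):

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.resource import ResourceManagementClient

    # Hypothetical values; substitute your own subscription ID and tag.
    SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
    TAG_NAME, TAG_VALUE = "service", "my-aks-demo"

    client = ResourceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    # Find every resource group carrying the tag and delete it, together
    # with everything it contains.
    tag_filter = f"tagName eq '{TAG_NAME}' and tagValue eq '{TAG_VALUE}'"
    for group in client.resource_groups.list(filter=tag_filter):
        print(f"Deleting resource group {group.name} ...")
        client.resource_groups.begin_delete(group.name).wait()

The same filter syntax also works for listing individual tagged resources via client.resources.list() if you only want to remove specific resources rather than whole groups.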

On Git and Cognitive Load

Any developer working in a modern engineering organization is likely working in Git. Git, written by Linus Torvalds in 2005, has been practically the ubiquitous tool for source control management since its creation. Git gained widespread adoption across engineering organizations as the successor to existing source control management systems such as Apache Subversion (SVN) and Concurrent Versions System (CVS), and this was mostly a byproduct of the timing.

Git predates the cloud and today's improved network connectivity, so at the time the best solution for managing source control at scale was to decentralize and lean on local dev environments to power engineering organizations. Fast-forward 17 years and this is no longer the case: the cloud is the de facto default, network and internet connectivity are blazing fast, and that changes everything.

How To Set Up a Scalable and Highly-Available GraphQL API in Minutes

A modern GraphQL API layer for cloud-native applications needs to possess two characteristics: horizontal scalability and high availability. 

Horizontal scalability adds more machines to your API infrastructure, whereas vertical scalability adds more CPUs, RAM, and other resources to an existing machine that runs the API layer. While vertical scalability works to a certain extent, the horizontally scalable API layer can scale beyond the capacity of a single machine. 

6 New Features in Data Grid 8.3 Release

Red Hat Data Grid is a distributed, cloud-based datastore offering very fast response times as an in-memory database. The latest version features cross-site replication with more observability, two new types of SQL cache stores for scaling applications with large datasets, improved security, support for Helm charts, and a better command-line interface (CLI).

This article is an overview of new features and enhancements in this latest version of Red Hat Data Grid.

AWS Lambda Aliases: A Practical Approach

Lambda functions are a fundamental component of the AWS serverless model. They provide a simple, cost-effective, and easily scalable programming model based on FaaS (functions as a service).

Lambda ARNs

Lambda functions can be referenced by their ARN (Amazon Resource Name). For example, the ARN to reference a 'helloworld' function in the 'us-east-2' region in account '3445435' would be:
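    arn:aws:lambda:us-east-2:3445435:function:helloworld

This is the standard unqualified Lambda ARN; appending a version number or an alias name (for example, ':prod') produces a qualified ARN that points at that specific version or alias.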

How To Build a Self-Serve Data Architecture for Presto Across Clouds

This article highlights the synergy between two widely adopted open-source projects, Alluxio and Presto, and demonstrates how together they deliver a self-serve data architecture across clouds.

What Makes an Architecture Self-Serve?

Condition 1: Evolution of the Data Platform Does Not Require Changes

All data platforms evolve over time, whether through the addition of a new data store, a new compute engine, or a new team that needs to access shared data. In any case, a data platform is self-serve if it does not require changes to accommodate that evolution.

Next-Gen Data Pipes With Spark, Kafka, and K8s: Part 2

Introduction 

In our previous article, we discussed two emerging options for building new-age data pipes using stream processing. One option leverages Apache Spark for stream processing, and the other makes use of a Kafka-Kubernetes combination on any cloud platform for distributed computing. The first approach is reasonably popular, and a lot has already been written about it. However, the second option is catching up in the market, as it is far less complex to set up and easier to maintain. Also, data on the cloud is a natural outcome of the technological drivers prevailing in the market. So, this article will focus on the second approach and how it can be implemented in different cloud environments.

Kafka-K8s Streaming Approach in Cloud

In this approach, if the number of partitions in the Kafka topic matches the number of pod replicas in the Kubernetes cluster, the pods together form a consumer group and deliver all the advantages of distributed computing. This can be depicted with the following equation:
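    Number of Kafka topic partitions = Number of consumer pod replicas

With that balance, each pod consumes exactly one partition, so the consumer group spreads the processing load evenly across the cluster.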

Introducing KoolKits: OSS Debugging Toolkits for Kubernetes

KoolKits (Kubernetes toolkits) are highly-opinionated, language-specific, batteries-included debug container images for Kubernetes. In practice, they’re what you would’ve installed on your production pods if you were stuck during a tough debug session in an unfamiliar shell.

To briefly give some background, note that these container images are intended for use with the new kubectl debug feature, which spins up ephemeral containers for interactive troubleshooting. A KoolKit will be pulled by kubectl debug, spun up as a container in your pod, and given access to the same process namespace as your original container.

The Cloud Challenge: Choice Paralysis and the Bad Strategy of “On-Premising” the Cloud

The cloud is vast. It is natural that we look at the cloud and understand it through the narrow lens of our previous experiences. This can translate into solutions that over-rely on one area of the cloud while underutilizing others. Innovative and robust solutions often require the use of the full spectrum.

Most companies are migrating to the cloud because they want to unlock new business opportunities, but many of them stumble because they continue to build solutions that are only suitable for on-premises. Imagine your IT workloads are on servers that sit in the basement of your corporate office. Would moving these servers to the first floor of that office open any new opportunities for your business? Of course not. Lifting and shifting your servers to the cloud might save you money, but it certainly won’t take you any further. The first and most important thing to remember about the cloud is that the cloud is not a place, it is a model. Building for the cloud requires a mindset change, not a location change. 

Remote Debugging Cloud Foundry Apps

Context

Debugging Java applications in an IDE like IntelliJ IDEA is straightforward, but debugging a remotely running app takes slightly more effort. While remote debugging an app in production is generally not a good idea, the ability to remote debug apps in lower environments, such as integration testing, can be useful.

Cloud Foundry is a platform that allows you to deploy and run your workloads easily and intuitively. IntelliJ IDEA is a popular IDE, especially for Java developers.
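As a rough sketch of what this involves (the article's Cloud Foundry-specific steps may differ), the remote JVM has to be started with the JDWP debug agent enabled, typically via an option such as -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005 on Java 9 or later, after which the IDE attaches to that port with a remote debug run configuration.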

When NOT To Use Apache Kafka

Apache Kafka is the de facto standard for event streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When do I NOT use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How do I qualify Kafka out as not the right tool for the job? 

This blog post explores the DOs and DON'Ts. Separate sections explain when to use Kafka, when NOT to use Kafka, and when to MAYBE use Kafka.

Cross-Region Lambda Invocation in AWS

AWS Lambda makes it easy to build highly available serverless applications quickly. However, setting up resources in multiple regions makes the applications harder to manage, since cross-region access is limited by design in AWS for security and performance reasons. Fortunately, AWS makes it easy to access resources across regions if you intend to do so.

In this example, I'll show you how to invoke a Lambda in one region from a Lambda in another region.
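As a minimal sketch of the cross-region call, assuming hypothetical region and function names, the calling Lambda simply creates a client pinned to the remote region and invokes the target by name; the caller's execution role also needs lambda:InvokeFunction permission on the target function:

    import json
    import boto3

    # Hypothetical target; replace with your own function name and region.
    TARGET_REGION = "us-west-2"
    TARGET_FUNCTION = "helloworld"

    def lambda_handler(event, context):
        # Client pinned to the remote region instead of the caller's own region.
        remote_lambda = boto3.client("lambda", region_name=TARGET_REGION)

        # Synchronous invocation; the response payload arrives as a streaming body.
        response = remote_lambda.invoke(
            FunctionName=TARGET_FUNCTION,
            InvocationType="RequestResponse",
            Payload=json.dumps({"source": "cross-region caller"}),
        )
        return json.loads(response["Payload"].read())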

Using a Custom CockroachDB Image With Docker and Kubernetes

Motivation

Cockroach Labs ships new images when a new maintenance release is available, typically on a monthly basis. CockroachDB does not rely on the base OS image for any third-party libraries except for the geospatial and Kerberos packages. That said, OS images may be vulnerable to security exposures depending on their age. Given that CockroachDB is written in Go, replacing the OS can be a trivial task.

Originally, the CockroachDB image shipped with a Debian base image. CRL then switched to UBI to accommodate a wider scope of use cases, including Red Hat OpenShift. UBI images are shipped regularly with CVE patches, but given the nature of security vulnerabilities, even the latest and greatest images may still carry unpatched CVEs. With that preamble, we're going to cover how to replace the base image for CockroachDB and use it in Kubernetes.

Kubernetes Hardening Tutorial Part 2: Network

In the first part of this tutorial, we discussed how to enhance your Pod security in your K8s cluster. If you haven't read it yet, here's the link.

Today, we will walk you through networking-related security issues in a Kubernetes cluster and how to address them. After reading this tutorial, you will be able to:

Expert: Bestpal (Chatting) Application Using Huawei CloudDB, Auth Service, Cloud Functions, and Push Kit

Introduction

In this article, we will learn how to build a chat between two people so they can share text messages with each other. The application needs instant messaging: once a user sends a message to a friend through the application, the friend receives a push notification right away. The essence of an instant messaging app is being available and reacting to ongoing actions. Also, push notifications can be an excellent promotional tool to inform users about updates and new functionality.

The Future Trends Driving Open-Source Database Programs

Introduction

There are now thousands of options for deciding what open source project to choose for in-house development or what project to join as a contributor.

According to the open-source Wikipedia page, with “more than 180,000 open-source projects available and more than 1,400 unique licenses, the complexity of deciding how to manage open-source use within ‘closed-source’ commercial enterprises has dramatically increased.”

There are infinite combinations for what an open-source tech stack might look like for a given project. However, the petabytes of data these apps produce will end up in one place: databases. Specifically, open-source databases, along with the tools and middleware to optimize and access that data.

So, if you want to back a winner in choosing a project to contribute to or build upon, open-source database programs are a good bet. 

What is driving the shift from proprietary, vendor-driven innovation to the open-source model for data management? We outline the top trends shaping the future of open-source database programs. And if you have not yet joined an open-source project, we highlight a few example projects where you can help lead the way.

Cloud Computing and SaaS

The one-two punch of open-source code repositories and cloud computing has permanently disrupted tech innovation. ‘Data is the new oil’ has collided with ‘software is eating the world’ to create the SaaS industry. And it’s all in the cloud. Developers can scale services to fit their needs, customize applications, and access cloud services from anywhere, on any device with an internet connection. For users, SaaS software comes already developed ‘out of the box’ and automatically paired with a database, with the expectation that any action or query performed in the software will get an instant response, whether on desktop or mobile.

In a sense, this all works like magic to users but to developers, it’s a daily challenge to come up with new computing models and code to keep it all running. Open-source communities have become the engine that drives this innovation.

Under this relatively new computing model, developers can get applications to market quickly, without heavy investments in infrastructure costs and maintenance. This was simply not possible 20 years ago. It’s turning virtually every company into a software-driven company. The cloud, paired with open-source databases, gives developers access to the innovative storage and access technologies available to the Amazons and Googles of the world, beyond what proprietary vendors can possibly do to keep up. In its latest quarter, reported in September 2021, Oracle’s proprietary database license revenue was down 8% year over year, but its cloud business was up 40%.

Take traditional banks with their proprietary databases and code. Suddenly they find themselves competing for share of wallet with challenger banks like Chime. Chime exists only in the cloud (it has no branches) and presents itself as a mobile-first banking SaaS app. It has built a billion-dollar company using a cloud-first, open-source strategy that established banks are scrambling to equal.

Hybrid Cloud

Hybrid cloud is where open-source databases will shine. As the name implies, hybrid clouds use a combination of on-premises, private cloud, and third-party cloud services with orchestration between the platforms, and common applications work across the models. This configuration has grown recently to support enterprises’ need to keep certain workloads on-premises while enjoying the benefits the cloud has to offer.

Open source gives these enterprises a common set of tools, adapted to all environments even when the underlying databases differ, for use cases such as disaster recovery and workload balancing.

Big Data 

The rapidly increasing volume and complexity of data are due to growing mobile traffic, cloud-computing traffic, and the proliferation of new technologies including IoT and AI. According to Research and Markets’ 2020 report on data usage, over 2.5 quintillion (that’s 10^18) bytes of data are generated every day. Keeping up requires constant innovation in data storage and retrieval. Open source is showing itself to be the best approach to innovation in data science, along with hardware optimization and coding efficiency.

Data is the most crucial reason open-source database projects will remain among the most popular (second only to operating systems). It keeps piling up, and new applications like IoT and social media produce vast amounts of it daily, in high volume, at high velocity, and in highly variable formats.

There need to be ways to analyze all this data, at scale. This is where open-source databases are outperforming traditional ones. Even though the dynamics of open source and community-developed projects have changed in recent years, communal development is still the best way to promote innovation in data access for 99% of the applications out there.

Database Agnostic Tools and Middleware

The new reality is a technology universe where firms take a more distributed approach to database services. The need is to combine multiple database instances from various database vendors into hybrids that can be hosted on-prem, in the cloud, or both, simplified by a standard set of tools and middleware to access the data.

Whether the database is SQL or NoSQL, there is a commonality underpinning modern relational databases in tables, rows, and columns. This evergreen structure allows new access tools to emerge that unite data distributed across hybrid infrastructures. For example, open-source solutions such as ShardingSphere provide SQL-agnostic tools to query and retrieve data across distributed data stores.

Call it a fear of commitment on the part of customers, but the trend is that customers don’t want to be locked into a single massive vendor like Microsoft or IBM anymore.

Source Code Access

Vendor lock-in was considered a good thing 30 years ago. You had one ‘neck to choke’ if something went wrong, and the vendor had a staff of engineers on hand to patch and update continually. You also paid a fortune for the licenses and a hefty annual fee for support and upgrades. A single Oracle-based app in the enterprise can have a lifetime cost running into the millions of dollars, just for the database software and maintenance, not including development.

Vendor lock-in is no longer cool in the era of agile development and ‘break things fast’ code sprints. The key to open source is the ‘source’ part. Having access to communally curated source code allows developers to make changes based on their own needs and priorities, not those of the software vendor.

Open Source Communities

Access to the source code is not just valuable to a single developer or company. It’s valuable to the entire ecosystem of open-source community members. It’s a virtuous cycle: the more people contribute to making the software stable and useful, the more people join the project, and so on.

Much like how a good blog post or a tweet spreads virally, great open-source software leverages network effects. It is the community that is the source of promotion for that virality. Dozens of contributor communities and thousands of developers worldwide are happily iterating open-source code for database projects built natively as distributed systems from the ground up. 

If worked on by a diverse community, this approach leads to more stable software than a single team of developers hacking away at bugs in a proprietary system. Examples of forward-thinking open-source database communities include Apache ShardingSphere, CockroachDB, YugabyteDB, and ClickHouse. For example, ShardingSphere has over 450 contributors to its open-source codebase in Asia alone and is spreading rapidly around the world. Significant growth in contribution and adoption of database software is certain in the coming years as companies demand ever greater access, speed, and control of their growing data streams.

Momentum is on the side of open-source database projects. There are rich communities of open-source contributors who have worked out the major flaws of the 0.x versions and who create libraries, repositories, documentation, and even YouTube videos to ‘pay it forward’ for new contributors. A 2020 survey by O’Reilly Media and IBM polled 3,400 developers and tech managers. The survey reported:


  • 94% of respondents rated open-source software as equal to or better than proprietary software.
  • 70% of respondents preferred open-source cloud providers.
  • 65% of respondents agreed that contributing to open-source projects results in better professional opportunities.

The Virtual Software Catalog

Open-source apps are easy to find. Do a Google search for something like ‘open-source data backup and restore’ and the top result, ‘The Top 17 Free and Open Source Backup Solutions’, lists at least 17 of them, and there are probably many more.

As of Jan 2020, GitHub reports having 40+ million users and 190+ million repositories (including 28 million public repositories). You can search GitHub in many ways; for example, GitHub supports advanced search in certain fields, like repository title, description, and README. If you want to find a cool repository to learn database stuff, you can search like this: in:name database.

You are more likely to find a relevant solution ready to modify in the open-source software universe than in a proprietary product suite. 

CIOs and Security

More than two-thirds of CIOs are concerned about losing their freedom to cloud providers. This concern has become another main driver of open-source database adoption.

In the age of ransomware, data security is a matter of survival for enterprises. Open-source technology enables organizations to take complete control over their security needs by providing full access to source code and configuration to extend the software however they like.

There is certainly a counter-argument about the security of open source, but rapid adoption by enterprises seems to be settling the argument in open source’s favor. No company will remain untouched by the power of open-source database progress.

Emerging Examples

As this multiverse progresses, we will see a tapestry of emerging technologies become the top choices for open-source database programs.

As an example, the emerging open-source ecosystem provided by Apache ShardingSphere, with its technology-agnostic, plug-and-play distributed database modules, makes adoption easy for developers, whether as contributors or in the enterprise.

Kubernetes (the cloud container orchestration technology originally developed by Google and now open source) is another key platform of choice for open-source database deployments that programmers can benefit from.

In the next five years, open-source development will be driven by necessity. Industry trends predict that open source won’t be optional; it will be the standard. As a result, corporations will have to embrace it to stay relevant.