The Future Trends Driving Open-Source Database Programs

Introduction

There are now thousands of options when deciding which open-source project to adopt for in-house development or to join as a contributor.

According to the Wikipedia page on open source, with “more than 180,000 open-source projects available and more than 1,400 unique licenses, the complexity of deciding how to manage open-source use within ‘closed-source’ commercial enterprises has dramatically increased.”

There are countless combinations for what an open-source tech stack might look like for a given project. However, the petabytes of data these apps produce all end up in one place: databases. More specifically, they end up in open-source databases, supported by tools and middleware that optimize and access that data.

So, if you want to back a winner in choosing a project to contribute to or build upon, open-source database programs are a good bet.

What is driving the shift from proprietary, vendor-driven innovation to the open-source model for data management? We outline the top trends shaping the future of open-source database programs. And if you have not yet joined an open-source project, we highlight a few example projects where you can help lead the way.

Cloud Computing and SaaS

The one-two punch of open-source code repositories and cloud computing has permanently disrupted tech innovation. ‘Data is the new oil’ has collided with ‘software is eating the world’ to create the SaaS industry. And it’s all in the cloud. Developers can scale services to fit their needs, customize applications, and access cloud services from anywhere, on any device with an internet connection. For users, SaaS software arrives already developed, ‘out of the box’, and automatically paired with a database. The expectation is that any action or query performed in the software will get an instant response, whether on desktop or mobile.

In a sense, this all works like magic for users, but for developers it’s a daily challenge to come up with new computing models and code to keep it all running. Open-source communities have become the engine that drives this innovation.

Under this relatively new computing model, developers can get applications to market quickly, without heavy investments in infrastructure and maintenance. This was simply not possible 20 years ago. It’s turning virtually every company into a software-driven company. The cloud, paired with open-source databases, gives developers access to the same innovative storage and access technologies available to the Amazons and Googles of the world, at a pace proprietary vendors struggle to match. In its latest quarter, reported in September 2021, Oracle’s proprietary database license revenue was down 8% year over year, but its cloud business was up 40%.

Take traditional banks with their proprietary databases and code. Suddenly they find themselves competing for share of wallet with challenger banks like Chime. Chime exists only in the cloud (it has no branches) and presents itself as a mobile-first banking SaaS app. It has built a billion-dollar company using a cloud-first, open-source strategy that established banks are scrambling to match.

Hybrid Cloud

Hybrid cloud is where open-source databases will shine. As the name implies, hybrid clouds use a combination of on-premises, private cloud, and third-party cloud services, with orchestration between the platforms. Common applications work across all of them. This configuration has grown recently because enterprises need to keep certain workloads on-premises while still enjoying the benefits the cloud has to offer.

Open source gives these enterprises a common set of tools, even if the underlying databases differ, that is adapted to every environment for use cases such as disaster recovery and workload balancing.

Big Data

The rapidly increasing volume and complexity of data are due to growing mobile traffic, cloud-computing traffic, and the proliferation of new technologies including IoT and AI. According to Research and Markets’ 2020 report on data usage, over 2.5 quintillion (that’s 2.5 × 10^18) bytes of data are generated every day. Keeping up requires constant innovation in data storage and retrieval. Open source is showing itself to be the best approach to innovation in data science, along with hardware optimization and coding efficiency.
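To put the daily figure quoted above into more familiar units, here is a short arithmetic sketch. It assumes decimal (SI) units, where one exabyte is 10^18 bytes and one terabyte is 10^12 bytes, and it takes the 2.5 quintillion bytes per day figure at face value.

```python
# Convert the quoted daily data volume into more familiar units.
BYTES_PER_DAY = 25 * 10**17        # 2.5 quintillion bytes, i.e. 2.5 x 10^18
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 seconds

EXABYTE = 10**18                   # decimal (SI) units, an assumption
TERABYTE = 10**12

exabytes_per_day = BYTES_PER_DAY / EXABYTE
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / TERABYTE

print(f"{exabytes_per_day:.1f} EB per day")         # 2.5 EB per day
print(f"{terabytes_per_second:.1f} TB per second")  # 28.9 TB per second
```

In other words, the quoted volume works out to roughly 29 terabytes of new data every second.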

Data is the most crucial reason open-source database projects will remain among the most popular (second only to operating systems). It keeps piling up: new applications like IoT and social media produce vast amounts of it daily, in high volume, at high velocity, and in highly variable formats.

There need to be ways to analyze all this data at scale. This is where open-source databases are outperforming traditional ones. Even though the dynamics of open-source and community-developed projects have changed in recent years, communal development is still the best way to promote innovation in data access for the vast majority of applications out there.

Database Agnostic Tools and Middleware

The new reality is a technology universe where firms take a more distributed approach to database services. They need to run multiple database instances from various database vendors, combined into hybrids that can be hosted on-premises, in the cloud, or both, and simplified by a standard set of tools and middleware for accessing the data.

Whether the database is SQL or NoSQL, much of the data can still be presented through the familiar relational structure of tables, rows, and columns. This evergreen structure allows new access tools to emerge that unite data distributed across hybrid infrastructures. For example, open-source solutions such as ShardingSphere provide database-agnostic SQL tools to query and retrieve data across distributed data stores.
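As a rough illustration of what such middleware does, the sketch below routes rows to several databases by key and merges results when a query spans all of them. It is a toy built on Python's standard sqlite3 module, not ShardingSphere's API: the ToyShardRouter class, the orders table, and the modulo routing rule are all hypothetical choices made for this example.

```python
import sqlite3


class ToyShardRouter:
    """Routes rows to one of several SQLite databases and merges query results."""

    def __init__(self, shard_count):
        # Each in-memory connection stands in for a separate database server.
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for conn in self.shards:
            conn.execute(
                "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount INTEGER)"
            )

    def _shard_for(self, order_id):
        # Modulo sharding: the key alone decides which database holds the row.
        return self.shards[order_id % len(self.shards)]

    def insert_order(self, order_id, amount):
        self._shard_for(order_id).execute(
            "INSERT INTO orders (order_id, amount) VALUES (?, ?)",
            (order_id, amount),
        )

    def find_order(self, order_id):
        # A lookup by shard key touches exactly one database.
        return self._shard_for(order_id).execute(
            "SELECT order_id, amount FROM orders WHERE order_id = ?",
            (order_id,),
        ).fetchone()

    def total_amount(self):
        # An aggregate fans out to every shard, then merges the partial sums.
        partials = [
            conn.execute("SELECT COALESCE(SUM(amount), 0) FROM orders").fetchone()[0]
            for conn in self.shards
        ]
        return sum(partials)


router = ToyShardRouter(shard_count=3)
for order_id, amount in [(1, 10), (2, 25), (3, 40), (4, 5)]:
    router.insert_order(order_id, amount)

print(router.find_order(3))    # (3, 40), fetched from a single shard
print(router.total_amount())   # 80, merged across all shards
```

The caller never needs to know which database holds a given row. That is the property that lets one set of tools sit in front of data spread across on-premises and cloud stores.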

Call it a fear of commitment on the part of customers, but the trend is that customers don’t want to be locked into a single massive vendor like Microsoft or IBM anymore.

Source Code Access

Vendor lock-in was considered a good thing 30 years ago. You had one ‘neck to choke’ if something went wrong, and the vendor had a staff of engineers on hand to patch and update continually. You also paid a fortune for the licenses and a hefty annual fee to support and upgrade the software. A single Oracle-based app in the enterprise can have a lifetime cost running into the millions of dollars, just for the database software and maintenance, not including development.

Vendor lock-in is no longer cool in the era of agile development and ‘move fast and break things’ code sprints. The key to open source is the ‘source’ part. Having access to communally curated source code allows developers to make changes based on their own needs and priorities, not those of the software vendor.

Open Source Communities

Access to the source code is not just valuable to a single developer or company. It’s valuable to the entire ecosystem of open-source community members. It’s a virtuous cycle where the more people contribute to make the software stable and useful, the more people will join a project, and so on.

Much like how a good blog post or a tweet spreads virally, great open-source software leverages network effects. It is the community that is the source of promotion for that virality. Dozens of contributor communities and thousands of developers worldwide are happily iterating open-source code for database projects built natively as distributed systems from the ground up.

When a diverse community works on the code, this approach leads to more stable software than a single team of developers hacking away at bugs in a proprietary system. Examples of forward-thinking open-source database communities include Apache ShardingSphere, CockroachDB, YugabyteDB, and ClickHouse. ShardingSphere, for instance, has over 450 contributors to its open-source codebase in Asia alone and is spreading rapidly around the world. Significant growth in contribution and adoption for database software is likely in the coming years as companies demand ever greater access, speed, and control of their growing data streams.

Momentum is on the side of open-source database projects. There are rich communities of open-source contributors who have worked out the major flaws of 0.x versions and who create libraries, repositories, documentation, and even YouTube videos to ‘pay it forward’ for new contributors. A 2020 survey by O’Reilly Media and IBM polled 3,400 developers and tech managers. The survey reported:

94% of respondents rated open-source software equal to or better than proprietary software.
70% of respondents preferred open-source cloud providers.
65% of respondents agreed that contributing to open-source projects results in better professional opportunities.

The Virtual Software Catalog

Open-source apps are easy to find. Do a Google search for something like ‘open-source data back up and restore’ and the top result, ‘The Top 17 Free and Open Source Backup Solutions’, shows there are at least 17 of these, and probably many more.

As of January 2020, GitHub reported having 40+ million users and 190+ million repositories (including 28 million public repositories). You can search GitHub in many ways. For example, GitHub supports advanced search restricted to certain fields, such as the repository name, description, and README. If you want to find some cool repositories for learning about databases, you can search like this: database in:name.
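The same kind of search can be scripted. The sketch below only builds the request URL for GitHub's repository search endpoint (api.github.com/search/repositories) using the in: qualifier; it does not send the request. The helper name build_repo_search_url and the default choice of sorting by stars are assumptions made for this example.

```python
from urllib.parse import urlencode


def build_repo_search_url(keyword, field="name", sort="stars"):
    """Return a GitHub REST API URL that searches repositories by keyword in one field."""
    # The "in:" qualifier limits matching to a field such as name, description, or readme.
    query = f"{keyword} in:{field}"
    params = urlencode({"q": query, "sort": sort, "order": "desc"})
    return f"https://api.github.com/search/repositories?{params}"


url = build_repo_search_url("database")
print(url)
# https://api.github.com/search/repositories?q=database+in%3Aname&sort=stars&order=desc
```

Passing field="description" or field="readme" searches those fields instead. Fetching the URL with any HTTP client returns the matching repositories as JSON, subject to GitHub's rate limits.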

You are more likely to find a relevant solution ready to modify in the open-source software universe than in a proprietary product suite.

CIOs and Security

More than two-thirds of CIOs are concerned about losing their freedom to cloud providers. This concern has become another main driver of open-source database adoption.

In the age of ransomware, data security is a matter of survival for enterprises. Open-source technology enables organizations to take complete control over their security needs by providing full access to source code and configuration to extend the software however they like.

There is certainly a counter-argument about the security of open source, but rapid adoption by enterprises seems to be settling the argument in its favor. No company will remain untouched by the power of open-source database progress.

Emerging Examples

As this multiverse progresses, we will experience a tapestry of emerging technologies that will become the top choice for open-source database programs.

As an example, the emerging open-source ecosystem around Apache ShardingSphere offers technology-agnostic, plug-and-play distributed database modules that are easy to use and adopt, whether developers come to them as contributors or inside the enterprise.

Kubernetes, the container orchestration technology originally developed by Google and now open source, is another platform of choice for open-source database deployments that programmers can benefit from.

In the next five years, open-source development will be driven by necessity. Industry trends suggest that open source won’t be just an option; it will be the standard. As a result, corporations will have to embrace it to stay relevant.