Is Your Data Infrastructure Stifling Innovation?

There are myriad reasons why an estimated 90% of startups fail. You need a great idea (and not just one great idea); you need inspiration, funding, smart people — and a fair amount of luck. Miss any one of these factors, and failure might be a foregone conclusion.

For young companies or small teams that build applications, data can be another stumbling block. The databases they rely on have historically stymied innovation by being complex and costly to spin up, manage, and maintain. Proofs of concept — ideas with the potential to turn into something big — can die before even being tested due to a lack of funding or database capacity.

Taking Your Database Beyond a Single Kubernetes Cluster

Global applications need a data layer that is as distributed as the users they serve. Apache Cassandra has risen to this challenge, handling data needs for the likes of Apple, Netflix, and Sony. Traditionally, managing the data layer for a distributed application required dedicated teams to deploy and operate thousands of nodes, both on-premises and in the cloud.

To alleviate much of the load felt by DevOps teams, we evolved a number of these practices and patterns in K8ssandra, leveraging the common control plane afforded by Kubernetes (K8s). There has been a catch, though: running a database (or indeed any application) across multiple regions or K8s clusters is tricky without proper care and planning up front.

Getting Started With ScyllaDB Cloud Using Node.js (Part 1)

In this article, we will review the basics of ScyllaDB, then create and deploy a cluster on AWS using Scylla Cloud.

What’s a CRUD App?

CRUD stands for Create, Read, Update, and Delete. In this article, we will build a simple application that will connect to our database and do just that using Node.js and Scylla Cloud.
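
To make the four operations concrete, here is a minimal sketch in Node.js. It is not the application the article builds: it uses an in-memory Map as a stand-in for a database table, and the names `UserStore`, `rows`, and `nextId` are hypothetical, chosen only for illustration. A real application would replace the Map calls with queries sent through a database driver.

```javascript
// Hypothetical in-memory stand-in for a database table.
// A real CRUD app would send queries through a database driver instead.
class UserStore {
  constructor() {
    this.rows = new Map(); // id -> row object
    this.nextId = 1;       // simple auto-incrementing key (assumption)
  }

  // Create: insert a new row and return it with its generated id.
  create(data) {
    const id = this.nextId++;
    const row = { id, ...data };
    this.rows.set(id, row);
    return row;
  }

  // Read: return the row for this id, or null if it does not exist.
  read(id) {
    return this.rows.get(id) ?? null;
  }

  // Update: merge changes into an existing row; the id itself never changes.
  update(id, changes) {
    const existing = this.rows.get(id);
    if (!existing) return null;
    const updated = { ...existing, ...changes, id };
    this.rows.set(id, updated);
    return updated;
  }

  // Delete: remove the row; returns true if something was removed.
  delete(id) {
    return this.rows.delete(id);
  }
}

// Usage: one pass through all four operations.
const store = new UserStore();
const alice = store.create({ name: "Alice" });
store.update(alice.id, { name: "Alice B." });
console.log(store.read(alice.id)); // the updated row
store.delete(alice.id);
console.log(store.read(alice.id)); // null, because the row is gone
```

Keeping the four operations behind one small interface like this means the storage behind it can later be swapped for a real database without changing the calling code.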

How Databases Have Changed

To learn about the current and future state of databases, we spoke with and received insights from 19 IT professionals. We asked, "How have databases changed in the past year or two?" Here’s what they shared with us:

Cloud

  • The biggest trend is a massive transition to fully managed database services in the cloud. This shift gives developers the ability to work with data to support both real-time transactional apps and deep analytics, by using a single platform that minimizes data movement and allows them to extract value faster.
  • 1) It used to be all about cost reduction; today the bigger motivation is becoming more real-time as a company: making decisions in real time to detect fraud, assess risk, and optimize inventory. Companies are trying to build next-gen apps that provide a more contextual UX or improve business processes.

    2) Also, the ability to do advanced analytics like AI/ML.

    3) Companies don't just use data for internal purposes; they want to build data-intensive applications and make them available to users: personalization, offers, real-time risk engines, fraud detection, recommendations, predictive maintenance. The goal is to go beyond managing the business to improving UX/CX to drive revenue and reduce cost, while continuing to be performant and scale. Different types of data require different types of data models.

    4) A cloud delivery model that can deploy the data platform and databases across hybrid, multi-cloud, and on-prem environments, and port workloads between deployment environments.

Choice

  • The most substantial change we’ve seen in the past couple of years is the explosion of choices available through mainstream and specialty cloud vendors. Companies like Snowflake are capturing a lot of enterprises looking for help managing their data warehouses, while major vendors like Azure and Google Cloud are capitalizing on popular products like MySQL and Postgres by offering them as managed services.

DBaaS

  • Not a lot in the databases themselves. Most of the activity has been on the NoSQL side. We are seeing more comprehensive and better support from the cloud vendors, e.g., AWS support for managed SQL Server. DBaaS has growth potential, and it could give customers who won’t spend on a top-notch DBA access to a database.
  • 1) The acceleration of as-a-service as a delivery model for testing and production. 2) New kinds of databases are evolving, with sets of tools provided by MongoDB, Redis, Neo, and partners. 3) What’s happening as a category with graph and time series. 4) The move to the cloud and containerization includes fluidity across different platforms and playing well with different technological evolutions, such as Spark and HDFS running on their own rather than Hadoop.

Fit

  • Some of the hype has died down and people are more pragmatic about using the right tools to solve their problems. Customers are excited about particular solutions to the problems they are trying to solve rather than focusing on the most recent solution to be rolled out. People are savvier and more pragmatic about what tools are good for.
  • There’s been a real shift towards matching the right database tool to the right database job, and the number of databases that teams use is dramatically increasing. Relatedly, databases are also more and more niche (e.g., time series databases, CockroachDB, etc.).

Other

  • The emergence of databases and technology to deal with unstructured data. Traditionally the database world managed structured data with a relational database. Another change is databases opening themselves up to tools like Python and R for data science and machine learning. Combining data science tools with databases has been a big theme we have seen.
  • In the past few years, we have seen major adoption of geospatial data management. Almost every database vendor (IBM, Oracle, and MS) has support for spatial data. NoSQL (or Document) databases are seeing an increase in adoption too, to handle lots of those pictures/photos that we (mobile users) share online!
  • As storage speed and capacity of SSD drives have increased, it’s opened a lot of doors to concentrate on the data and what to do with it. Because data is growing at such a rapid pace, databases are seen as more than tools — they are strategic elements in managing change and growth.
  • How to handle hybrid data. Modern use cases like customer journey and hyper-personalization have become important. For all of these use cases, you need behavioral, social, and transactional data. How you integrate this data to solve specific business problems and come up with good recommendation engines is the key. Maniacal focus on solving the hybrid data problem, where different data from different locations must be combined and compute separated from storage for performance while working at elastic scale. Ease of use is going to be a huge focus in the future but it’s not there yet.
  • The realization that there’s a new set of database requirements in the SQL relational model with the release of Google Cloud Spanner and CockroachDB.
  • 1) A lot of the NoSQL databases started offering SQL because that’s what people want: you don’t have to learn the nuances of new query languages. 2) Beyond the niche products giving you features provided by the traditional players, there are specialty products; for example, AWS came out with a blockchain database.
  • There is so much data generated that being able to consume and query it has become the main asset over the past year or so, as results need to get closer and closer to real time. Edge storage has really become a thing, and there have been some impacts in the open source community.
  • Earlier, databases were used for more transactional workloads. MongoDB used transactions so that a mobile application could update a bunch of records atomically: all of them or none at all. Neo4j used transactions so that you could accurately update a set of graph edges. In the last two years, we are seeing more people use a database solely for fast analytics. Elasticsearch is trying to move over from log analytics to search analytics. We're pushing ahead with a strong focus on event analytics.
  • Databases are being made easier to work with today. When developers don’t have to worry hugely about schemas, scaling, and performance, they can focus on what they do best: writing great code! Nowadays, the leading-edge databases and data grids are also self-managing and self-healing, and can scale elastically in response to the demands of the business.
  • Ability to handle streaming data and the democratization of the location of data. Now people have sensor and mobile data and can look at streaming data over time. There is a move from training ML algorithms to running them against the data and writing to tables immediately.
  • 1) More connectivity to external forces, tables that sit on top of an object store like S3. 2) Expanded capabilities to enable customers to process data. 3) Processing data within the database. 4) On the DataDevOps side, self-tuning automation with fewer DBAs needed, with self-patching and upgrades. More databases being used rather than companies just selecting one.

Here are the contributors of insight, knowledge, and experience: