Databases For Front-End Developers: The Concepts Under The Hood (Part 2)

In Part 1, The Rise Of Serverless Databases, of the “Databases For Front-End Developers” series, we talked about the hurdles and traps of scaling and maintaining your databases. We went from simpler and specialized alternatives like Content Management Systems and spreadsheets to self-hosted databases and, finally, to Serverless Databases.

Today, we go deeper into the rabbit hole. We will explore concepts to equip you to have your own opinions about which kinds of databases suit your specific needs. And this is important to stress up front: there is no right answer. Each database carries its own set of tradeoffs and advantages. If something looks like a “one-size-fits-all” solution, be careful: you might be missing something!

Anatomy Of A Database

Before we begin, it’s important to highlight that what we loosely refer to as “databases” are actually “Database Management Systems (DBMS).” A DBMS is a piece of software that enables the user to more ergonomically write, read, delete, or update information in a given set of data. For this series, we will focus mainly on Relational and Non-Relational DBMSs. There are many other types, all categorized by their data structures, but relational and non-relational are the most common for web development by any measure.

Both Relational (R) and Non-Relational (NR) DBMS have different terms for the parts that compose them. Such components are almost interchangeable in definition, and that’s why you can commonly hear a developer referring to a Document (NR term) as “Table,” which is its Relational equivalent structure. Don’t be afraid of confusing them; they appear often enough for this cognitive overload to disappear quickly with usage. Additionally, once you get more familiar with the differences of each data structure, you will realize they probably shouldn’t be used interchangeably because there are differences among them. But for now, and for the sake of simplicity, let’s focus on the similarities:

  • Schema
    It’s the description of the database (like a blueprint), describing the landscape of the database in a supported language. This is required by relational databases, and though not required by non-relational DBMS, many interfaces offer a possibility to define it too.
  • Tables (R/NR), Collections (NR)
    It’s the logical data structure, unsorted. A relatable example of this would be a spreadsheet table or a group (collection) of JSON objects.
  • Databases (R/NR)
    It’s the logical grouping of data. You group your tables in databases (user database, invoices databases) and your documents in indexes based on how you intend to query them.

Keys And Columns

To use a more familiar example, let’s take a JSON object as an example:

{
  "name": "Atila"
  "role": "DX Engineer at Xata"
}

Given that JSON, a column would be represented by the key-value pair (column name has value “Atila”), and the key follows the same meaning as the JSON spec, key name will access the value “Atila.”

A table has columns, and each column’s name determines the key with which you can access a value in a record.

In addition to the above definition, there are special kinds of keys. Such keys play a special role in the schema of your database and how you will interact with your data. Define these wisely: any minimal change to them can be considered a breaking change to your data layer.

  • Unique Keys (UK)
    The keys that are unique between records on a table. They also accept null.
  • Primary Keys (PK)
    It’s a special kind of Unique Key. There can only be one Primary Key per table, and it can never be null (Primary Keys are always considered required). Primary Keys are also indexed by default, helping with queries that want to filter on the value.
  • Foreign Keys (FK)
    When there is a relation between tables, the keys are represented as foreign keys at the extraneous table.

Example of Foreign Key: if each author in an authors table has posts (and posts is another table), post.title will be a FK at authors. And the junction between authors and posts is called a relation.

Mapping Your DBMS

Once you look at the different data structures and choose your DBMS, you are ready to draw the first connection from your data layer to your application layer. And suddenly, you noticed it isn’t quite straightforward to bring data from the database to your client-side (or even to your server-side API in some cases).

Here comes ORM (Object Relational Mapping) and ODM (Object Document Mapping) to assist your developer experience. Prisma is probably the most widely used ORM at the moment, while Mongoose has the largest ODM user base. It’s important to note they are not a requirement for connecting to databases. Still, if you keep an eye on how they build your queries (some specific cases can present performance issues on the account of the abstraction), they tend to make your life easier and fetching or writing data much more ergonomic.

When it comes to serverless databases, the need for them becomes a bit more questionable. And this is because many of these databases provide the users with an officially supported Software Development Kit (SDK). Your mileage will vary depending on the SDK, but they tend to have a big feature overlap with ORMs and ODMs, especially on databases that will keep the data layer behind an API (Xata, for example). This way, you won’t need to worry about translating your queries, and you can demand equivalent ergonomics between the SDK and an ORM.

Common Concepts

In this part of our series, we are learning what to look for when choosing the stack for our data layers. It is essential to understand common concepts around maintaining and choosing a DBMS (from here on out, we’re back into referring to them as “databases” to remain consistent with the rest of the world).

The next sections will not go so deep that you jump out and find a job as Database Administrator (DBA), but hopefully, it will offer you enough ammo to engage in conversation with experts and identify the best solution for your use cases. These concepts are common for every kind of data layer, from a spreadsheet to a self-hosted database and even to a serverless database. What will vary is how each solution will balance the variables intertwined in these paradigms.

As Thanos (from Marvel: Infinity Wars) would say: “perfectly balanced, like all things should be.”

The most important concepts, to begin with, are Consistency, Availability, and Partition Tolerance. They’re better understood if presented together because the balance between them will guide how predictable your data is in different contexts.

CAP Theorem

This theorem describes the relationship between 3 components in a distributed system: Consistency, Availability, and Partition tolerance (C.A.P). The overall conclusion is easy to summarize: any system is only able to contemplate 2 of these components at the same time. Though just a simple sentence, this idea requires a bit of unpacking.

Consistency (C)

Within the CAP Theorem constraints, “consistency” refers directly to the data. When different clients make the same request, they will get the same response. When a written request is accepted and confirmed, all users will have access to this updated information at the same time.

Availability (A)

Every request will receive a response with data. No errors. However, this comes with no promises on whether the data is up to date or not. Depending on whether this is paired with consistency (C) or partition tolerance (P), you will get different behavior.

Partition Tolerance (P)

A “partition” is when the connection between two nodes in a system is broken. The CAP Theorem essentially describes how a system will tolerate the partition: enforcing availability (AP system) or enforcing consistency (CP system). Regardless of how small a system is, partitions can always happen. Therefore there is no such thing as a CA system.

To provide a more graphical example of each system, we can consider:

  • AP System
    Another node has a copy of the data. The user will get information back but without promises of it being 100% correct (up-to-date). It’s commonly implemented with DynamoDB, Cassandra, and so on.
  • CP System
    To return an error and accept the transaction is impossible. It’s commonly implemented with traditional sharded DBs, Citus, for example.

Though popular and very often referred to, the CAP Theorem can be considered incomplete because it doesn’t consider situations beyond the network partition. Especially today, with high-availability content delivery networks (CDN), and extreme connectivity, it’s crucial to take into consideration latency and deeper aspects of consistency (linearizability and serializability).

Consistency Confusion

It’s funny how “consistency” is a term that switches meanings based on the context. So, in the CAP Theorem, as we just saw, it’s all about data and how reliably a user will get the same response to the same request regardless of which partition they reach. Once we take a step away from our own system, a new perspective makes us ask: “how consistently will we handle concurrent operations?”. And just like that, the CAP Theorem does not sufficiently describe the intricacies of operating at scale.

There is a third meaning of “consistency,” which we will save for later; it’s part of yet another acronym (ACID). If the last paragraph left you scratching your head, I can’t recommend enough “Inconsistent Thoughts on Database Consistency” by Alex DeBrie.

PACELC Theorem

The first three letters are the same as CAP, just reordered. The “consistency” in the PACELC Theorem goes deeper than in the CAP Theorem. It follows the Consistency Model, which determines the contract between a data store and a system. To make things less complicated, consider the PACELC an extension of the CAP Theorem.

Beyond asking the developer to strategize for the event of a network partition, the PACELC also considers what happens when the system has no partitions (healthy network).

ELC stands for Else: Latency or Consistency?

  • Latency: Can you accept an occasional stale response for the sake of performance?
  • Consistency:
    • Linearizability: Will you accept a higher latency to sync data in all nodes of a network before considering the transaction done?
    • Serializability: In the event of concurrent transactions on the same data, will you handle them in parallel or in a queue?

Because of this, I consider the PACELC carries a better mental model for this new era of Serverless Databases. When healthy, it’s not a scalar classification “AP” or “CP”; it accepts a spectrum because latency can be high or not, and consistency can have different levels as well.

Once we start talking about the types of databases and their data structures and which guarantees each can give, we will also talk about a bit of system architecture and how your architecture can reduce the tradeoffs when even by enforcing consistency, you can strive for a low latency scenario.

See You Next Time

With that, I believe we are ready to start narrowing down our discussions on the differences between each database type. Moreover, it’s possible to discuss and level expectations on each solution taken and analyze architectures individually. From now on, we will focus on Relational and Non-Relational databases.

In a few days, on our season finale, we will cover the differences between NoSQL and SQL: what guarantees to expect, what that means to your data, and how that affects development workflow. Then we will be ready to jump right into Serverless Databases and what to expect going forward. I can’t wait!

As usual, feel free to reach out to me with feedback, questions, and/or requests for the next part.

Stargate and Cassandra 4.0 – Better Together

Six years in the making, Apache Cassandra has reached its 4.0 GA release. This is a big milestone for one of the most important open source projects, and a significant step forward in the world of NoSQL and Fast Data.

Highlights in 4.0

This new release brings a lot of exciting features to Apache Cassandra. A few highlights include:

Open-Source SPL Helps Java Handle Files of Open Formats: TXT, CSV, JSON, XML, and XLS

It is common to process data files of open formats like TXT, CSV, JSON, XML, and XLS in Java applications. Hardcoding in Java is extremely complicated, so we often turn to certain ready-for-use open-source packages, but each package has its weaknesses.

Parsing Library

This type of class library enables to read an external file in Java as the latter’s internal object through a simpler coding process than hardcoded data retrieval. Common products include:

Using PHP Headers When Serving JSON Data

Web browsers and other similar applications rely on headers to understand the content being served to them by a web server. While modern browsers will try to “guess” what format content is in and intelligently format it, it’s still good practice to use headers to ensure your application’s output is handled correctly. When sending a JSON Payload, the header is set using the following PHP code:

 
header('Content-type: application/json');


A Case for Databases on Kubernetes from a Former Skeptic

Kubernetes is everywhere. Transactional apps, video streaming services, and machine learning workloads are finding a home on this ever-growing platform. But what about databases? If you had asked me this question five years ago, the answer would have been a resounding “No!” — based on my experience in development and operations. In the following years, as more resources emerged for stateful applications, my answer would have changed to “Maybe,” but always with a qualifier: “It’s fine for development or test environments…” or “If the rest of your tooling is Kubernetes-based, and you have extensive experience…”

But how about today? Should you run a database on Kubernetes? With complex operations and the requirements of persistent, consistent data, let’s retrace the stages in the journey to my current answer: “In a cloud-native environment? Yes!

MongoDB to Couchbase, Part 4: Data Modeling

To Embed or Not to Embed. That is the Question.  - Hamlet

Data modeling is a well-defined and mature field in relational database systems. The model provides a consistent framework for application developers to add and manipulate the data. Everything is great until you need to change. This lack of schema flexibility was a key trigger for NoSQL database systems.  As we've learned before, both MongoDB and Couchbase are JSON-based document databases. JSON gives the developers schema flexibility: the indexes, collections, query engine provide access paths to this data. The developer uses MQL in MongoDB and N1QL in Couchbase to query this data.  Let's compare the modeling methods in Couchbase. Note: This article is short because the modeling options are similar. That's a good thing. Some differences in modeling options, access methods, and optimizations are highlighted below.

MongoDB to Couchbase (Part 3): Data Types

 In the article on MongoDB and Couchbase database objects, we saw the mapping from databases to buckets, collections to collections, documents to documents, and field to fields. The data itself is stored within these fields. The {"field": "value"} is commonly called key-value pairs. Every key has a value and the value is the data. This value self describes the data under  BSON or JSON definition in MongoDB and Couchbase.

MongoDB uses BSON, Couchbase uses JSON -- both are document-oriented and JSON-based. Couchbase uses JSON exactly, MongoDB uses extended JSON called BSON, binary JSON.

Read SAP Tables With RFC_READ_TABLE in Mule 4 Using SAP Connector

In this article, we are going to see how to make an RFC (Remote Function Call) to SAP and what different options are available to get your desired data. We'll also learn how to make use of various operators like AND, OR, IN, and many more, and what the default structure of RFC_READ_TABLE input looks like.

Whenever you are working with SAP, you will surely go with RFC calls and a lot of developers struggle to write XML queries for RFC.

Using SingleStoreDB as a JSON Document Database

Abstract

Continuing our series on the multi-model capabilities of SingleStoreDB, we'll discuss SingleStoreDB's support for JSON data in this article.

We'll build a small inventory system to model an online store that sells various electronic equipment. This example is derived from an excellent tutorial available on DigitalOcean. We'll apply that tutorial to SingleStoreDB, and we'll see that it is effortless to store, retrieve and query JSON data using SingleStoreDB. We'll also build a quick visual front-end to our inventory system using Laravel and PHP.

MongoDB to Couchbase for Developers, Part 1: Architecture

Introduction

With this article on MongoDB to Couchbase comparison, I'm starting a new, detailed series for MongoDB developers to learn Couchbase by comparison. Two years ago, I wrote a single high-level comparative article on MongoDB and Couchbase.  I recommend you read that for a bird's view comparison. Both MongoDB and Couchbase have done multiple major releases since then, and developers need a deeper treatment of topics. As developers know, the devil is in the details :-) 

This series will be similar to the series for Oracle developers to learn Couchbase by comparison, but for MongoDB developers.  

Amazon Dynamo DB Connector Operations Walkthrough in Mule 4, Part 1

Amazon Dynamo DB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. Dynamo DB lets you offload the administrative burdens of operating and scaling a distributed database so that you don't have to worry about hardware provisioning, setup and configuration, replication, software patching, or cluster scaling.

The following provides a quick example of how to use the “Amazon Dynamo DB Connector Operations – Mule 4” using Anypoint Studio. While there is a lot of documentation available on this connector usage, none could be found that specifically shows how to structure the JSON requests, required for multiple connector operations.

How To Build Web Service Using Spring Boot 2.x

Architecture Contains

  • MVC Architecture
  • JWT Based Authentication
  • Spring Data (JPA)
  • Application User Password Encryption
  • DB password Encryption.
  • SQL Server
  • Slf4j
  • Swagger For API Doc

Repository Contains

  • Application Source code
  • SQL script of Data Base along with key data
  • DB.txt file contains the DB config details.
  • Postman JSON script to test all web services.

Steps to Run Applications

  • Install JDK 11 or the latest version.
  • Clone the Project repository into local.
  • Git Url: https://github.com/VishnuViswam/sample-web-service.git
  • Install SQL server 2012.
  • Create application DB and user
  • Insert the DB key data.
  • Add the decoding key of the database password into the system variables. It is present in the DB.txt file.
  • Sometimes we may need to restart the windows to pick up the updated system variables.
  • Run the project source code.
  • To call the web services, import provided postman JSON scripts into the postman client application.

About Project Configurations

Web-Service Declaration

Each Web-services of the application will be declared in the controller layer.

Example

Java
 
@RequestMapping("/api/v1/user")
@RestController
@Validated
public class UserController {

    private static final Logger logger = LoggerFactory.getLogger(UserController.class);

    @Autowired
    private GeneralServices generalServices;

    @Autowired
    private UserService userService;

    /**
     * Web service to create new user
     *
     * @param httpServletRequest
     * @param user
     * @return
     */
    @PostMapping(consumes = MediaType.APPLICATION_JSON_VALUE, produces = MediaType.APPLICATION_JSON_VALUE)
    public ResponseEntity<Object> createUser(HttpServletRequest httpServletRequest,
                                             @Valid @RequestBody UserCreateModel user) {
        logger.debug("<--- Service to save new user request : received --->");
        ApiSuccessResponse apiResponse = userService.createUser(user, generalServices.getApiRequestedUserId(httpServletRequest));
        logger.debug("<--- Service to save new user response : given --->");
        return ResponseEntity.status(HttpStatus.CREATED).body(apiResponse);

    }

}

  • @RequestMapping("/api/v1/user") annotation is used to mention the category of web service.
  • @RestController annotation will configure the class to receive the rest-full web service call.
  • @PostMapping() annotation will decide the HTTP request type.
  • consumes & consumes tags will decide the content type of the HTTP request and response.

From this "controller layer," API requests will be taken to the service layer. All business logic will be handled here, then it will talk with the database using JPA.

Introduction to Couchbase for Oracle Developers and Experts: Part 6: Indexing

Here are the previous articles comparing architecture, database objects, data types, data modeling, and statements and features of Oracle with Couchbase. This post will focus on indexing.  

"Use the Index, Luke!"  -- Source 

Overview



Oracle

Couchbase

Index Documentation

Index Documentation

Types of Indexes: 

Introduction to Couchbase for Oracle Developers and Experts: Part 5: Statements and Features

SQL is the only 22nd century tool available in 21st century

Here are the previous articles comparing architecture, database objects, data types, and data modeling of Oracle with Couchbase. This will focus on SQL support.  

Oracle was the first and staunch supporter of SQL. Oracle's SQL implementation beat IBM to market by two years.  That changed the fortune of one or two people. :-) All of the modern relational databases implement SQL. So much so, the relational databases are sometimes called SQL databases, much to the chagrin of C. J. Date.  Nations are known by their languages... English, French, and American(!). It's not a stretch for a class of database systems to be known by their languages as well.  SQL has been so effective, many big data and non-relational systems have picked up SQL as the lingua franca. SQL is here to stay, even for NoSQL systems. 

Introduction to Couchbase for Oracle Developers and Experts: Part 4: Data Modeling

There are three things important in the database world: Performance, Performance, and Performance.  Bruce Lindsay

Here’s Part 1, Part 2, and Part 3 of this series.

Let me start with a real-world effect of right modeling on the application performance.  Here's the excerpt from a talk by Amadeus engineers on their customer experience management application (traveler loyalty app) which they migrated from an enterprise RDBMS to Couchbase.  

Introduction to Couchbase for Oracle Developers and Experts: Part 3: Data Types

Article Image

As part of data remodeling, while moving from the relational model to the JSON model, you’ll have to consider the data type mapping. In Oracle, you’ll have to create and declare the types of each column explicitly before you load the data or write queries. In Couchbase, you simply conform to JSON syntax and the data type interpretation is automatic and implicit. Here’s the overview of mappings, conversion, and arithmetic on these data types.

Here’s Part 1 and Part 2 of this series. 

Snapshot - Data Remodeling

ReModeling: From Relational to Document Model


Oracle Couchbase
Model Relational, Object-relational JSON model with N1QL (SQL for JSON)
Data Types A long list of data types https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/datatypes.html
String Data types CHAR, VARCHAR, VARCHAR2, NCHAR, NVARCHAR, LOONGVARCHAR String, up to 20MB in length. All of the string data in JSON is Unicode.
Date and Time Data Types DATE, TIME, TIMESTAMP, all these with TIMEZONE, INTERVAL Date, Time, Timestamp, all these with timezones should be represented in ISO 8601 format and saved as strings. N1QL has extensive functions for formatting, extraction, conversion, and arithmetic.