Configure Single Sign-On for CockroachDB Dedicated With Okta

Motivation

CockroachDB Dedicated is a fully managed, reserved CockroachDB cluster, ideal as a cloud database. We are frequently asked how to set up SSO for individual CockroachDB Dedicated clusters, and we have a detailed tutorial that walks you through the process with a local, self-hosted cluster.

What was less clear is that you can use the same steps to set up SSO with Dedicated: as that detailed document shows, CockroachDB Dedicated supports OIDC authentication. Today, we're going to walk through how to leverage OIDC specifically with the Dedicated offering and Okta.
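As a rough sketch of what the Okta wiring looks like, the DB Console OIDC settings below are real CockroachDB cluster settings, but every value shown (the connection URL, the Okta org URL, the client ID and secret, and the cluster hostname) is a placeholder you'd replace with your own:

cockroach sql --url "postgresql://<user>@<dedicated-host>:26257/defaultdb?sslmode=verify-full" -e "
SET CLUSTER SETTING server.oidc_authentication.provider_url = 'https://<your-org>.okta.com';
SET CLUSTER SETTING server.oidc_authentication.client_id = '<okta-client-id>';
SET CLUSTER SETTING server.oidc_authentication.client_secret = '<okta-client-secret>';
SET CLUSTER SETTING server.oidc_authentication.redirect_url = 'https://<dedicated-host>:8080/oidc/v1/callback';
SET CLUSTER SETTING server.oidc_authentication.enabled = true;
"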

How to Secure a Previously Insecure Cluster in Production

Cockroach Labs does not recommend running an insecure cluster in production. Only a few additional steps are needed to secure an instance, so why skip them? Convenience, you say. That shortcut can hurt you down the line, but fret not: this article will demonstrate how to fix it. We are going to follow the standard insecure cluster start-up procedure. Once complete, we're going to flip to the documentation for a secure cluster and turn each node back on with security enabled. Here's a handy video of the procedure in action:

Step-by-step instructions are below:
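To give a flavor of what those steps involve, here is a minimal sketch of flipping a node from insecure to secure mode; the paths and addresses are illustrative, and in a multi-node cluster you'd repeat the node-certificate and restart steps on each node:

# The insecure way (what we're about to fix).
cockroach start-single-node --insecure --listen-addr=localhost:26257

# Create a CA, a node certificate, and a root client certificate.
mkdir certs my-safe-directory
cockroach cert create-ca --certs-dir=certs --ca-key=my-safe-directory/ca.key
cockroach cert create-node localhost $(hostname) --certs-dir=certs --ca-key=my-safe-directory/ca.key
cockroach cert create-client root --certs-dir=certs --ca-key=my-safe-directory/ca.key

# Restart with certificates instead of --insecure.
cockroach start-single-node --certs-dir=certs --listen-addr=localhost:26257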

MuleSoft: Connect PostgreSQL Database and Call PostgreSQL Function

What Is PostgreSQL?

PostgreSQL is an open source object-relational database system that uses and extends the SQL language to store and scale complicated data workloads.

A prerequisite for this walkthrough is to have the Database connector added in Anypoint Studio; you can add connectors from the Add modules option. You will also need an ElephantSQL account, as we are going to use a PostgreSQL database hosted on the ElephantSQL platform. You can use any other PostgreSQL service provider or a self-managed PostgreSQL server.
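For context on the PostgreSQL side, here is a hedged sketch of creating and calling a function like the one the Mule flow will invoke; the connection URL and the function itself are made up for illustration:

psql "postgres://user:password@your-instance.elephantsql.com/dbname" <<'EOF'
-- A trivial function standing in for whatever your flow calls.
CREATE OR REPLACE FUNCTION get_greeting(name text) RETURNS text AS $$
  SELECT 'Hello, ' || name;
$$ LANGUAGE sql;

-- Call it the same way the Database connector operation would.
SELECT get_greeting('Mule');
EOF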

Simplifying CockroachDB Kerberos Architecture With a Load Balancer

Today, I'm going to try to simplify our architecture, or at least the management of Kerberos artifacts as they relate to CockroachDB, by introducing a load balancer. With a load balancer in place, we can hide the CockroachDB cluster topology from Kerberos and ease the management of Kerberos keytabs as well as service principal names.
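To make that concrete, here is a minimal sketch of the keytab side, assuming an MIT Kerberos KDC, a hypothetical realm EXAMPLE.COM, and a load balancer at lb.example.com; clients only ever request a ticket for the load balancer's service principal, regardless of how many nodes sit behind it:

# One service principal for the load balancer's hostname, not one per CockroachDB node.
kadmin.local -q "addprinc -randkey postgres/lb.example.com@EXAMPLE.COM"

# Export it to the keytab every node will share.
kadmin.local -q "ktadd -k /keytabs/crdb.keytab postgres/lb.example.com@EXAMPLE.COM"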

Articles Covering CockroachDB and Kerberos

I find the topic of Kerberos very interesting, and my colleagues commonly turn to me for help with this complex topic. I am by no means a Kerberos expert; I am, however, familiar enough with it to be dangerous. That said, I've written multiple articles on the topic, which you may find below:

Hacking PostgreSQL Internals to Deliver Push Notifications

PostgreSQL announced its latest version (PostgreSQL 14) on September 30th, which includes a bunch of features like the pipeline API, gathering statistics on replication slots, query parallelism improvements, and so on.

The origin of PostgreSQL can be traced back to 1986, and it has been in active development for the past 30 years. Tons of companies of every type and size have trusted Postgres over the years, and their tagline, “The world's most advanced open source relational database,” is hardly an overstatement.

CockroachDB With Django and MIT Kerberos

Today, I'm going to talk about using Django with a kerberized CockroachDB cluster and what that entails. This is not uncommon in production use cases, and expecting enterprise-grade access from development frameworks is table stakes for some of our customers.
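As a taste of what "kerberized" means for the app tier, here is a hedged sketch: obtain a ticket, then verify GSSAPI connectivity with psql before pointing Django at the cluster. The principal, hostname, and database are assumptions; Django's psycopg2 driver picks up the same ticket cache.

kinit carl@EXAMPLE.COM

# krbsrvname=postgres matches CockroachDB's GSSAPI service name.
psql "postgresql://carl@lb.example.com:26257/defaultdb?sslmode=verify-full&krbsrvname=postgres" -c "SELECT 1;"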

Articles Covering CockroachDB and Kerberos

I find the topic of Kerberos very interesting, and my colleagues commonly turn to me for help with this complex topic. I am by no means a Kerberos expert; I am, however, familiar enough with it to be dangerous. That said, I've written multiple articles on the topic, which you may find below:

Import Data Into CockroachDB With Kerberos Authentication

Articles Covering CockroachDB and Kerberos

I find the topic of Kerberos very interesting, and my colleagues commonly turn to me for help with this complex topic. I am by no means a Kerberos expert; I am, however, familiar enough with it to be dangerous. That said, I've written multiple articles on the topic, which you may find below:

I was recently asked by a customer whether GSSAPI gets in the way of doing a table import in CockroachDB. The short answer is that it shouldn't, as GSSAPI is abstracted away from any bulk I/O operations. I've previously written articles on doing an import into CockroachDB, here and here, and I encourage you to review those articles. So today, we're going to focus specifically on the import with Kerberos.
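As a minimal sketch of that combination (authenticate with Kerberos, then run a plain IMPORT), every name below is illustrative: the principal, host, table, and CSV URL.

kinit carl@EXAMPLE.COM

# The import itself is unchanged; GSSAPI only governs the client connection.
# Assumes the customers table already exists.
psql "postgresql://carl@crdb.example.com:26257/defaultdb?sslmode=verify-full&krbsrvname=postgres" \
  -c "IMPORT INTO customers (id, name) CSV DATA ('https://example.com/customers.csv');"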

Data Federation With CockroachDB and Presto

Motivation

A customer inquired whether data federation is possible natively in CockroachDB. Unfortunately, CockroachDB does not support features like foreign data wrappers. A quick search returned a slew of possibilities, and Presto, being a prominent choice, sparked my interest.

High-Level Steps

  • Install Presto
  • Configure the PostgreSQL catalog
  • Configure the TPCH catalog
  • Verify
  • Wrap up

Step-by-Step Instructions

Install Presto

I'm using a Mac, and luckily there's a Homebrew package available.
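Here is a hedged sketch of the install plus the PostgreSQL catalog that points Presto at CockroachDB, which speaks the Postgres wire protocol; the catalog path varies by install method, and the connection details are assumptions:

brew install presto

# Catalog file; adjust the path to your Presto installation's etc/catalog directory.
cat > etc/catalog/postgresql.properties <<'EOF'
connector.name=postgresql
connection-url=jdbc:postgresql://localhost:26257/defaultdb
connection-user=root
EOF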

Postgres UNNEST Cheat Sheet for Bulk Operations

Postgres is normally very fast, but it can become slow (or even fail completely) if you have too many parameters in your queries. When it comes to operating on data in bulk, UNNEST is the only way to achieve fast, reliable queries. This post has examples of using UNNEST for all types of bulk transactions.

All the examples in this article assume a database schema that looks like:
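The schema itself isn't reproduced in this excerpt, so as a hypothetical stand-in, the sketch below uses a simple users table and shows the core trick: one INSERT whose parameters are a handful of arrays rather than n values per row. The connection string is an assumption.

psql "$DATABASE_URL" <<'EOF'
-- Hypothetical stand-in schema.
CREATE TABLE IF NOT EXISTS users (email text PRIMARY KEY, favorite_color text);

-- Bulk insert via UNNEST: two array parameters, however many rows.
INSERT INTO users (email, favorite_color)
SELECT * FROM UNNEST(
  ARRAY['a@example.com', 'b@example.com'],
  ARRAY['red', 'blue']
);
EOF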

Exploring CockroachDB With Flyway Schema Migration Tool

Today, I am going to quickly introduce you to Flyway and some of the new capabilities in CockroachDB that leverage schema migrations. This is by no means a deep dive on Flyway; for that, I highly recommend you get familiar with Flyway's documentation. With that, let's dive in.

I will continue to use a docker-compose environment for the following tutorial, as it fits nicely with the iterative model of development and deployment that schema migration tools encourage. We will need a recent CockroachDB image. My current folder tree looks like so:
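The folder tree isn't shown in this excerpt; as a stand-in, here is a minimal hedged sketch of a first migration and a Flyway run against a local CockroachDB node (the URL, credentials, and table are assumptions):

mkdir -p sql
cat > sql/V1__create_accounts.sql <<'EOF'
CREATE TABLE accounts (id INT PRIMARY KEY, balance DECIMAL);
EOF

# Flyway talks to CockroachDB through the PostgreSQL JDBC driver.
flyway -url="jdbc:postgresql://localhost:26257/defaultdb?sslmode=disable" \
       -user=root -locations=filesystem:./sql migrate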

Querying SQL Databases With PySpark

SQL is a powerful language that provides a deep understanding of what can and cannot be done with data. SQL excels at bringing order to disorganized, large data sets and helps you discover how distinct data sets are related. Spark is an open-source analytics engine for processing large amounts of data (what you might call "big data").

It allows us to maximize distributed computing when carrying out time-intensive operations on lots of data, or even when building ML models. PySpark is a Python application programming interface that allows us to use Apache Spark in Python. Querying SQL databases with PySpark thus lets us take advantage of Spark's implicit data parallelism and fault tolerance from a Python interface. This gives us the ability to process large quantities of data quickly.
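For a concrete flavor, here is a hedged sketch of a PySpark JDBC read submitted from the shell; the database URL, credentials, and table are assumptions, and you'd match the driver package version to your Spark version:

cat > query.py <<'EOF'
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pg-query").getOrCreate()

# Read a table over JDBC; Spark parallelizes the downstream work.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://localhost:5432/mydb")
      .option("dbtable", "public.users")
      .option("user", "postgres")
      .option("password", "secret")
      .load())

df.groupBy("favorite_color").count().show()
spark.stop()
EOF

spark-submit --packages org.postgresql:postgresql:42.5.4 query.py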

How to Develop Your Distributed SQL Statement in Apache ShardingSphere

In the previous articles “An Introduction to DistSQL” and “Integrating SCTL Into DistSQL’s RAL— Making Apache ShardingSphere Perfect for Database Management”, the Apache ShardingSphere committers shared the motivations behind the development of DistSQL, explained its syntax system, and impressively showcased how you can use just one SQL statement to create a sharding table.

Today, to help you gain a better understanding of DistSQL and develop your own DistSQL syntax, our community author analyzes the design and development process of DistSQL and showcases how you can implement a brand-new DistSQL grammar across the four stages of the development life cycle: demand analysis, design, development, and testing.
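For reference, this is the shape of the "one SQL statement" in question: a DistSQL sharding rule issued to ShardingSphere-Proxy through a MySQL client. This is a hedged sketch; the host, port, credentials, and resource names are assumptions, and the exact grammar varies by ShardingSphere version.

mysql -h 127.0.0.1 -P 3307 -u root -p <<'EOF'
CREATE SHARDING TABLE RULE t_order (
  RESOURCES(ds_0, ds_1),
  SHARDING_COLUMN=order_id,
  TYPE(NAME=hash_mod, PROPERTIES("sharding-count"=4))
);
EOF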

Import Data From Hadoop Into CockroachDB

CockroachDB can natively import data from HTTP endpoints, object storage with respective APIs, and local/NFS mounts. The full list of supported schemes can be found here.

It does not support the HDFS file scheme, so we're left to our wild imagination to find alternatives. As previously discussed, the Hadoop community is working on Hadoop Ozone, a native scalable object store with S3 API compatibility. For reference, here's my article demonstrating CockroachDB and Ozone integration. The limitation here is that you need to run Hadoop 3 to get access to it.

What if you're on Hadoop 2? There are several choices I can think of off the top of my head. One approach is to expose WebHDFS and IMPORT using an HTTP endpoint. The second option is to leverage the previously discussed Minio to expose HDFS via HTTP or S3. Today, we're going to look at both approaches.

My setup consists of a single-node, pseudo-distributed Hadoop cluster with Apache Hadoop 2.10.0 running inside a VM provisioned by Vagrant. Minio runs as a service inside the VM, and CockroachDB runs inside a Docker container on my host machine.
  • Information on CockroachDB can be found here.
  • Information on Hadoop Ozone can be found here.
  • Information on Minio can be found here.
  1. Upload a file to HDFS.

I have a CSV file I created with my favorite data generator tool, Mockaroo.

curl "https://api.mockaroo.com/api/38ef0ea0?count=1000&key=a2efab40" > "part5.csv"
hdfs dfs -mkdir /data
hdfs dfs -chmod -R 777 /data
hdfs dfs -put part5.csv /data
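The next step in the article is the import itself; as a hedged sketch over WebHDFS, assuming the Hadoop 2 default namenode web port of 50070 and a column list matching whatever your Mockaroo schema generated:

# Table columns are illustrative; match them to the generated CSV.
cockroach sql --insecure -e "
CREATE TABLE part5 (id INT, first_name STRING, last_name STRING, email STRING);
IMPORT INTO part5 CSV DATA ('http://localhost:50070/webhdfs/v1/data/part5.csv?op=OPEN') WITH skip='1';
"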


Inspecting Joins in PostgreSQL

Introduction

Relational databases distribute their data across many tables by normalization or according to business entities. This makes maintaining a growing database schema easier. Real-world queries often span multiple tables, and hence joining these tables is inevitable.

PostgreSQL uses many algorithms to join tables. In this article, we will see how joins work behind the scenes from the planner's perspective and understand how to optimize them.
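As a quick illustration with a made-up two-table schema, EXPLAIN ANALYZE reveals which join algorithm the planner picked, and the enable_* settings let you compare alternatives:

psql mydb <<'EOF'
-- Which join algorithm did the planner choose?
EXPLAIN ANALYZE
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id;

-- Disable hash joins for this session to compare merge/nested-loop plans.
SET enable_hashjoin = off;
EOF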

CockroachDB With Kerberos and Docker Compose

Articles Covering CockroachDB and Kerberos

I find the topic of Kerberos very interesting, and my colleagues commonly turn to me for help with this complex topic. I am by no means a Kerberos expert; I am, however, familiar enough with it to be dangerous. That said, I've written multiple articles on the topic, which you may find below:

As our customers increase their footprint and consider production use cases, I'm being asked to walk through the typical steps of enabling Kerberos auth for CockroachDB. As this process is pretty heavy-handed, a few of us at Cockroach sought out a repeatable process for getting an on-demand environment up quickly and efficiently. The CockroachDB source code is a good starting point for learning the inner workings of CRDB. I knew there were compose recipes available to test the Kerberos integration, but typically they are written for Go-language tests. We decided to introduce our own docker-compose with nothing but a Kerberos realm, an instance of CockroachDB, and a Postgres client to connect with. The last part is only necessary for a bit longer, as we're actively working on building GSSAPI support into the cockroach CLI.
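As a rough sketch of the shape of that compose file (the image names, realm, and volumes are all assumptions, and the real recipe also has to wire up certificates and krb5.conf):

cat > docker-compose.yml <<'EOF'
version: '3'
services:
  kdc:
    build: ./kdc                     # hypothetical image running an MIT Kerberos KDC for EXAMPLE.COM
    volumes:
      - keytab:/keytabs              # exports the CockroachDB service keytab
  roach:
    image: cockroachdb/cockroach:latest
    command: start-single-node --certs-dir=/certs
    environment:
      - KRB5_KTNAME=/keytabs/crdb.keytab
    volumes:
      - keytab:/keytabs
    depends_on: [kdc]
  client:
    image: postgres:latest           # used only for its GSSAPI-enabled psql
    depends_on: [roach]
volumes:
  keytab:
EOF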

CockroachDB CDC Using Minio as Cloud Storage Sink – Part 3

This is the third in a series of tutorials on CockroachDB and Docker Compose. Today, we're going to explore the CDC capability in CockroachDB Enterprise Edition using the Minio object store as a sink. To achieve this, we're going to reuse the compose file from the first two tutorials and finally bring the series to a close. Without further ado, let's get started.

You can find the first post here and the second post here.
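The heart of the tutorial is a changefeed pointed at Minio. Here is a hedged sketch: the bucket, credentials, table, and endpoint are assumptions, an enterprise license is required for changefeeds, and older CockroachDB versions want an experimental-s3:// scheme instead of s3://.

cockroach sql --insecure -e "
SET CLUSTER SETTING kv.rangefeed.enabled = true;
CREATE CHANGEFEED FOR TABLE office_dogs
  INTO 's3://crdb-sink?AWS_ACCESS_KEY_ID=minioadmin&AWS_SECRET_ACCESS_KEY=minioadmin&AWS_ENDPOINT=http://minio:9000'
  WITH updated;
"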

Running CockroachDB With Docker Compose and Minio – Part 2

CockroachDB, Docker Compose, and Minio

This is my second post on creating a multi-service architecture with docker-compose. We're building a microservice architecture with CockroachDB writing changes in real time to an S3 bucket in JSON format. The S3 bucket is served by a service called Minio, which can act as an S3 appliance on-premise or serve as a local gateway to your cloud storage.

You can find the first post here.
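If you just want to kick Minio's tires outside of compose, here is a hedged standalone version; the credentials and bucket name are the usual illustrative defaults:

docker run -d --name minio -p 9000:9000 \
  -e MINIO_ACCESS_KEY=minioadmin -e MINIO_SECRET_KEY=minioadmin \
  minio/minio server /data

# Create a bucket for CockroachDB to write into.
docker run --rm --network container:minio minio/mc \
  sh -c "mc alias set local http://localhost:9000 minioadmin minioadmin && mc mb local/crdb-sink"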