Using Azure Load Balancer With CockroachDB

Motivation

The purpose of this tutorial is to provide step-by-step instructions for getting an Azure Load Balancer up quickly. Our docs do a great job covering the CockroachDB portion, but the granular steps for standing up ALB are missing. Since this is my first foray into managed load balancers, I decided to do the hard work.

High-Level Steps

  • Provision a cluster in Azure
  • Provision a load balancer
  • Test connectivity
  • Clean up

Step by Step Instructions

This article assumes you've set up a Resource Group and an associated Virtual Network in your Azure subscription. This document will walk you through setting up a CockroachDB cluster. With these prerequisites in place, we can continue with setting up a load balancer.
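
To make the load balancer steps concrete, here is a minimal sketch using the Azure CLI. The resource group, VNet, and resource names (crdb-rg, crdb-vnet, crdb-lb, and so on) are placeholders for whatever you created above, and the SKU and probe choices are assumptions you should adapt:

# Create the load balancer with a frontend IP and backend pool (names are hypothetical)
az network lb create --resource-group crdb-rg --name crdb-lb --sku Standard \
  --vnet-name crdb-vnet --subnet crdb-subnet \
  --frontend-ip-name crdb-frontend --backend-pool-name crdb-backend

# Health probe against CockroachDB's SQL port
az network lb probe create --resource-group crdb-rg --lb-name crdb-lb \
  --name crdb-probe --protocol tcp --port 26257

# Forward SQL traffic to the backend pool
az network lb rule create --resource-group crdb-rg --lb-name crdb-lb \
  --name crdb-rule --protocol Tcp --frontend-port 26257 --backend-port 26257 \
  --frontend-ip-name crdb-frontend --backend-pool-name crdb-backend \
  --probe-name crdb-probe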

Configure Single Sign-On for CockroachDB Dedicated With Google OAuth

Motivation

CockroachDB Dedicated is a fully managed, reserved CockroachDB cluster, ideal for a cloud database. We are frequently asked how to set up SSO for individual CockroachDB Dedicated clusters, and we have a detailed tutorial that walks you through the process with a local, self-hosted cluster. What was less clear is that the same steps apply to Dedicated: as this detailed document shows, CockroachDB Dedicated supports OIDC authentication. Today, we're going to provide details on how to leverage OIDC specifically with the Dedicated offering.

High-Level Steps

  • Provision Dedicated cluster
  • Configure OAuth Client ID
  • Configure CockroachDB with the OAuth details
  • Verify

Step by Step Instructions

Provision Dedicated Cluster

Follow this tutorial to set up a Dedicated cluster.
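
As a preview of the later steps, the wiring on the CockroachDB side boils down to a handful of cluster settings. This is a sketch only: the client ID and secret come from the OAuth client you will create, and the cluster host and callback URL are placeholders:

# Connect to the Dedicated cluster and enable OIDC (values are placeholders)
cockroach sql --url "postgresql://admin@<cluster-host>:26257/defaultdb?sslmode=verify-full" <<'SQL'
SET CLUSTER SETTING server.oidc_authentication.provider_url = 'https://accounts.google.com';
SET CLUSTER SETTING server.oidc_authentication.client_id = '<your-client-id>';
SET CLUSTER SETTING server.oidc_authentication.client_secret = '<your-client-secret>';
SET CLUSTER SETTING server.oidc_authentication.redirect_url = 'https://<db-console-host>:8080/oidc/v1/callback';
SET CLUSTER SETTING server.oidc_authentication.enabled = true;
SQL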

Configure Single Sign-On for CockroachDB Dedicated With Okta

Motivation

CockroachDB Dedicated is a fully managed, reserved CockroachDB cluster, ideal for a cloud database. We are frequently asked how to set up SSO for individual CockroachDB Dedicated clusters, and we have a detailed tutorial that walks you through the process with a local, self-hosted cluster.

What was less clear is that the same steps apply to Dedicated: as this detailed document shows, CockroachDB Dedicated supports OIDC authentication. Today, we're going to provide details on how to leverage OIDC specifically with the Dedicated offering and Okta as the identity provider.
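
The CockroachDB side looks much like the Google OAuth case; chiefly the provider URL changes to your Okta org. A sketch, with the Okta domain and credentials as placeholders:

# Point OIDC at Okta (values are placeholders)
cockroach sql --url "postgresql://admin@<cluster-host>:26257/defaultdb?sslmode=verify-full" <<'SQL'
SET CLUSTER SETTING server.oidc_authentication.provider_url = 'https://<your-org>.okta.com';
SET CLUSTER SETTING server.oidc_authentication.client_id = '<okta-client-id>';
SET CLUSTER SETTING server.oidc_authentication.client_secret = '<okta-client-secret>';
SET CLUSTER SETTING server.oidc_authentication.enabled = true;
SQL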

How to Secure a Previously Insecure Cluster in Production

Cockroach Labs does not recommend running an insecure cluster in production. Only a few additional steps are necessary to secure an instance, so why skip them? Convenience, you say. That convenience can hurt you down the line, but fret not: this article will demonstrate how to fix it. We are going to follow the standard insecure cluster start-up procedure; once complete, we're going to flip to the documentation for a secure cluster and bring each node back up with security enabled. Here's a handy video of the procedure in action:

Step by step instructions are below:
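
In shorthand, the flip amounts to generating certificates and restarting each node without the --insecure flag. A condensed sketch, where the hostnames and directories are assumptions (see the secure-cluster docs for the full procedure):

# Create a CA plus node and root client certificates
cockroach cert create-ca --certs-dir=certs --ca-key=my-safe-directory/ca.key
cockroach cert create-node node1.example.com localhost --certs-dir=certs --ca-key=my-safe-directory/ca.key
cockroach cert create-client root --certs-dir=certs --ca-key=my-safe-directory/ca.key

# Restart each node with certificates instead of --insecure
cockroach start --certs-dir=certs --advertise-addr=node1.example.com \
  --join=node1.example.com,node2.example.com,node3.example.com --background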

Simplifying CockroachDB Kerberos Architecture With a Load Balancer

Today, I'm going to try to simplify our architecture, or at least the management of Kerberos artifacts as they relate to CockroachDB, by introducing a load balancer. With a load balancer in place, we can hide the CockroachDB cluster topology from Kerberos and ease the management of Kerberos keytabs as well as service principal names (SPNs).
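
Concretely, instead of maintaining a service principal and keytab per node, you can create a single principal for the load balancer's hostname. A sketch with a hypothetical realm and LB host:

# One SPN for the load balancer instead of one per CockroachDB node
kadmin.local -q "addprinc -randkey postgres/crdb-lb.example.com@EXAMPLE.COM"
kadmin.local -q "ktadd -k /keytabs/crdb.keytab postgres/crdb-lb.example.com@EXAMPLE.COM"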

Articles Covering CockroachDB and Kerberos

I find the topic of Kerberos very interesting, and my colleagues commonly turn to me for help with this complex topic. I am by no means a Kerberos expert; I am, however, familiar enough with it to be dangerous. That said, I've written multiple articles on the topic, which you may find below:

CockroachDB With Django and MIT Kerberos

Today, I'm going to talk about using Django with a kerberized CockroachDB and what that entails. This is not an uncommon production use case, and enterprise-grade access from development frameworks is table stakes for some of our customers.
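
Before wiring up Django itself, it helps to confirm that GSSAPI authentication works from the shell. A quick sanity check, with the hostname, realm, and user as placeholders (psql is used here because it speaks GSSAPI out of the box):

# Obtain a ticket, then connect via GSSAPI; krbsrvname matches the SPN's service name
kinit django_user@EXAMPLE.COM
psql "postgresql://django_user@crdb.example.com:26257/defaultdb?sslmode=require&krbsrvname=postgres"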

Articles Covering CockroachDB and Kerberos

I find the topic of Kerberos very interesting, and my colleagues commonly turn to me for help with this complex topic. I am by no means a Kerberos expert; I am, however, familiar enough with it to be dangerous. That said, I've written multiple articles on the topic, which you may find below:

Import Data Into CockroachDB With Kerberos Authentication

Articles Covering CockroachDB and Kerberos

I find the topic of Kerberos very interesting, and my colleagues commonly turn to me for help with this complex topic. I am by no means a Kerberos expert; I am, however, familiar enough with it to be dangerous. That said, I've written multiple articles on the topic, which you may find below:

I was recently asked by a customer whether GSSAPI gets in the way of doing a table import in CockroachDB. The short answer is that it shouldn't, as GSSAPI is abstracted away from bulk I/O operations. I've previously written articles on importing data into CockroachDB, here and here, and encourage you to review them. So today, we're going to focus specifically on running an import with Kerberos.
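
In other words, once you're authenticated, the IMPORT statement itself is unchanged. A sketch, where the table, columns, and URL are hypothetical:

# Authenticate via Kerberos, then run a plain IMPORT; GSSAPI plays no part in the bulk I/O itself
kinit carl@EXAMPLE.COM
psql "postgresql://carl@crdb.example.com:26257/defaultdb?sslmode=require&krbsrvname=postgres" \
  -c "IMPORT INTO employees (id, name) CSV DATA ('https://example.com/employees.csv');"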

Data Federation With CockroachDB and Presto

Motivation

A customer inquired whether data federation is possible natively in CockroachDB. Unfortunately, CockroachDB does not support features like foreign data wrappers. A quick search returned a slew of possibilities, and Presto, being a prominent choice, sparked my interest.

High-Level Steps

  • Install Presto
  • Configure the PostgreSQL catalog
  • Configure the TPCH catalog
  • Verify
  • Wrap up

Step by Step Instructions

Install Presto

I'm using a Mac, and luckily there's a Homebrew package available.
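
Once Presto is installed, a catalog is just a properties file. Since CockroachDB speaks the PostgreSQL wire protocol, the PostgreSQL connector works against it. A sketch, where the catalog directory, database, and credentials are assumptions for your install:

# Register CockroachDB as a Presto catalog via the PostgreSQL connector
cat <<'EOF' > /usr/local/etc/presto/catalog/cockroachdb.properties
connector.name=postgresql
connection-url=jdbc:postgresql://localhost:26257/defaultdb
connection-user=root
connection-password=
EOF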

Exploring CockroachDB With Flyway Schema Migration Tool

Today, I am going to quickly introduce you to Flyway and some of the new capabilities in CockroachDB that enable schema migrations. This is by no means a deep dive on Flyway; for that, I highly recommend getting familiar with Flyway's documentation. With that, let's dive in.
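
As a taste of what's ahead, a first migration is just a versioned SQL file plus one flyway command. A sketch, with the JDBC URL, credentials, and table as placeholders:

# A first versioned migration
mkdir -p sql
cat <<'EOF' > sql/V1__create_accounts.sql
CREATE TABLE accounts (id INT PRIMARY KEY, balance DECIMAL);
EOF

# Apply it against the cluster
flyway -url="jdbc:postgresql://localhost:26257/defaultdb?sslmode=disable" \
  -user=root -locations=filesystem:./sql migrate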

I will continue to use a docker-compose environment for the following tutorial as it fits nicely with the iterative model of development and deployment with schema migration tools. We will need a recent CockroachDB image. My current folder tree looks like so:

Exploring CockroachDB with ipython-sql and Jupyter Notebook

Today, I will demonstrate how ipython-sql can be leveraged to query CockroachDB. This will require a secure instance of CockroachDB, for reasons I will explain below.

Running a secure docker-compose instance of CRDB is beyond the scope of this tutorial. Instead, I will publish everything you need to get through the tutorial in my repo, including the Jupyter Notebook. You may also use CRDB docs to stand up a secure instance and change the URL in the notebook to follow along.

This post dives deeper into the Python ecosystem and builds on my previous Python post. Instead of relying on pandas alone, we're going to use a popular SQL extension called ipython-sql, a.k.a. SQL magic, to execute SQL queries against CRDB.


As stated earlier, we need a secure instance of CockroachDB. In fact, from this point forward, I will attempt to write posts only with secure clusters, as that's the recommended approach. ipython-sql uses SQLAlchemy underneath, and it expects database URLs in the format postgresql://username:password@hostname:port/dbname. CockroachDB does not support password fields with insecure clusters, as passwords alone will not protect your data.
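
For reference, the setup is only a pip install away. A sketch of getting started, with the connection values as placeholders for your secure cluster:

# Install the SQL magic and a PostgreSQL driver, then start the notebook
pip install ipython-sql sqlalchemy psycopg2-binary
jupyter notebook
# Inside a notebook cell:  %load_ext sql
#                          %sql postgresql://user:password@localhost:26257/defaultdb?sslmode=require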

Import Data From Hadoop Into CockroachDB

CockroachDB can natively import data from HTTP endpoints, object storage with respective APIs, and local/NFS mounts. The full list of supported schemes can be found here.

It does not support the HDFS scheme, however, so we're left to our imagination to find alternatives. As previously discussed, the Hadoop community is working on Hadoop Ozone, a natively scalable object store with S3 API compatibility; for reference, here's my article demonstrating CockroachDB and Ozone integration. The limitation is that you need to run Hadoop 3 to get access to it.

What if you're on Hadoop 2? There are several choices off the top of my head. One approach is to expose WebHDFS and IMPORT using an HTTP endpoint. The second is to leverage the previously discussed Minio to expose HDFS via HTTP or S3. Today, we're going to look at both approaches.

My setup consists of a single-node pseudo-distributed Hadoop cluster with Apache Hadoop 2.10.0 running inside a VM provisioned by Vagrant. Minio runs as a service inside the VM and CockroachDB is running inside a docker container on my host machine.
  • Information on CockroachDB can be found here.
  • Information on Hadoop Ozone can be found here.
  • Information on Minio can be found here.
  1. Upload a file to HDFS.

I have a CSV file I created with my favorite data generator tool, Mockaroo.

curl "https://api.mockaroo.com/api/38ef0ea0?count=1000&key=a2efab40" > "part5.csv"
hdfs dfs -mkdir /data
hdfs dfs -chmod -R 777 /data
hdfs dfs -put part5.csv /data
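
With the file in place, the WebHDFS approach is a plain HTTP IMPORT pointed at the name node's WebHDFS endpoint. A sketch, where the table, columns, and name node host are placeholders for whatever your Mockaroo schema generated:

# Hadoop 2's WebHDFS serves the file over HTTP on the name node's web port (50070)
cockroach sql --insecure -e "CREATE TABLE part5 (id INT PRIMARY KEY, name STRING);"
cockroach sql --insecure -e "IMPORT INTO part5 (id, name) CSV DATA \
  ('http://namenode:50070/webhdfs/v1/data/part5.csv?op=OPEN') WITH skip = '1';"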


CockroachDB With Kerberos and Docker Compose

Articles covering CockroachDB and Kerberos

I find the topic of Kerberos very interesting, and my colleagues commonly turn to me for help with this complex topic. I am by no means a Kerberos expert; I am, however, familiar enough with it to be dangerous. That said, I've written multiple articles on the topic, which you may find below:

As our customers increase their footprint and consider production use cases, I'm often asked to walk through the typical steps of enabling Kerberos auth for Cockroach. As this process is pretty involved, a few of us at Cockroach sought out a repeatable way to get an on-demand environment quickly and efficiently. The CockroachDB source code is a good starting point for learning the inner workings of CRDB, and I knew there were compose recipes available with Kerberos to test the integration, but they are typically written for Go language tests. We decided to introduce our own Docker Compose environment with nothing but a Kerberos realm, an instance of CockroachDB, and a Postgres client to connect with. The last part is only necessary for a bit longer, as we're actively working on building GSSAPI support into the cockroach CLI.
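
The shape of that compose file is roughly three services: a KDC, a CockroachDB node, and a psql client. The sketch below is structural only; the kdc build context and its bootstrap are stand-ins for the recipes in the repo:

cat <<'EOF' > docker-compose.yml
version: '3'
services:
  kdc:                      # MIT Kerberos KDC, built from a local Dockerfile
    build: ./kdc
  roach:                    # CockroachDB node configured for GSSAPI
    image: cockroachdb/cockroach:latest
    command: start-single-node --certs-dir=/certs
    depends_on: [kdc]
  client:                   # psql client, since the cockroach CLI lacks GSSAPI today
    image: postgres:latest
    entrypoint: ["sleep", "infinity"]
    depends_on: [roach]
EOF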

CockroachDB CDC Using Minio as Cloud Storage Sink – Part 3

This is the third in a series of tutorials on CockroachDB and Docker Compose. Today, we're going to explore the CDC capability in CockroachDB Enterprise Edition using the Minio object store as a sink. To achieve this, we're going to reuse the compose file from the first two tutorials and finally bring this series to a close. Without further ado, let's get started.

You can find the first post here and the second post here.
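
For a sense of where we're headed, the end state is a changefeed pointed at Minio's S3 endpoint. A sketch, where the bucket, credentials, and table are placeholders matching the compose setup:

# Changefeeds require rangefeeds; the sink URL carries Minio's endpoint and keys
cockroach sql --insecure -e "SET CLUSTER SETTING kv.rangefeed.enabled = true;"
cockroach sql --insecure -e "CREATE CHANGEFEED FOR TABLE office_dogs \
  INTO 'experimental-s3://cdc-bucket?AWS_ACCESS_KEY_ID=minio&AWS_SECRET_ACCESS_KEY=minio123&AWS_ENDPOINT=http://minio:9000' \
  WITH updated;"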

Running CockroachDB With Docker Compose and Minio – Part 2

CockroachDB, Docker Compose, and Minio

This is my second post on creating a multi-service architecture with docker-compose. We're building a microservice architecture with CockroachDB writing changes in real time to an S3 bucket in JSON format. The S3 bucket is served by Minio, which can act as an on-premises S3 appliance or as a local gateway to your cloud storage.

You can find the first post here.
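
The Minio piece of that compose file is small. A sketch of the service definition, appended here for brevity under the services: key; the keys are throwaway dev values and the data path is an assumption:

# Minio service for the docker-compose stack
cat <<'EOF' >> docker-compose.yml
  minio:
    image: minio/minio:latest
    command: server /data
    environment:
      MINIO_ACCESS_KEY: minio
      MINIO_SECRET_KEY: minio123
    ports:
      - "9000:9000"
EOF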

Running CockroachDB With Docker Compose – Part 1

Over the next few weeks, I will be publishing a series of tutorials on CockroachDB and various third-party tools to demonstrate how easy it is to integrate CockroachDB into your daily work. Today, we're covering docker-compose and a single-node Cockroach cluster, which will be the foundation for the next blog post.

CockroachDB and Docker Compose

This is the first in a series of tutorials on CockroachDB and Docker Compose.
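
As a preview, the single-node compose file is only a few lines. A sketch, assuming the current cockroachdb/cockroach image and an insecure, dev-only node:

cat <<'EOF' > docker-compose.yml
version: '3'
services:
  crdb:
    image: cockroachdb/cockroach:latest
    command: start-single-node --insecure
    ports:
      - "26257:26257"   # SQL
      - "8080:8080"     # DB Console
EOF
docker-compose up -d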

Configuring CockroachDB With Active Directory

Today, I'm going to cover CockroachDB's Active Directory integration. Under the covers, Cockroach utilizes GSSAPI. Today, Cockroach only supports user mapping; it does not support syncing users from an AD OU to a Cockroach role.

My lab environment consists of an AD controller running Windows Server 2016 as a VirtualBox VM and a Vagrant VM with CentOS 7 hosting CockroachDB. The VMs share a host-only network. This was critical in my setup so that my Cockroach node could reach AD on port 88, the Kerberos port.
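
For context, the user-mapping model means you create the CockroachDB user yourself and let the HBA configuration hand authentication off to GSSAPI. A sketch, with a hypothetical AD account:

# Route authentication through GSSAPI, stripping the realm from principals
cockroach sql --certs-dir=certs -e \
  "SET CLUSTER SETTING server.host_based_authentication.configuration = 'host all all all gss include_realm=0';"

# The AD account svc_carl@EXAMPLE.COM then maps to this SQL user
cockroach sql --certs-dir=certs -e "CREATE USER svc_carl;"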

Loading Thousands of Tables in Parallel With Ray Into CockroachDB Because Why Not?

I came across an interesting scenario while working with one of our customers. They are using a common data integration tool to load hundreds of tables into CockroachDB simultaneously, and they reported that their loads fail intermittently with an unrecognized error. As a debugging exercise, I set out to write a script to import data from an HTTP endpoint into CRDB in parallel.

Disclosure: I do not claim to be an expert in CRDB, Python, or anything else for that matter. This is an exercise in answering a "why not?" question more than anything educational. I wrote a Python script to execute an import job, and I need to make sure it executes in parallel to achieve the concurrency scenario I originally set out to test.
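
The post's script uses Ray, but the shape of the test is easy to convey in shell: fire many IMPORT statements concurrently and see what breaks. A rough shell analogue of that idea, not the Ray implementation itself, with the table naming and endpoint as hypotheticals:

# Launch N imports in parallel against the cluster, then wait for all of them
for i in $(seq 1 100); do
  cockroach sql --insecure -e \
    "CREATE TABLE IF NOT EXISTS t_${i} (id INT PRIMARY KEY, name STRING);
     IMPORT INTO t_${i} (id, name) CSV DATA ('https://example.com/data_${i}.csv');" &
done
wait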