Geek Reading – Cloud, SQL, NoSQL, HTML5
I have talked about human filters and my plan for digital curation. These items are the fruits of those ideas, the items I deemed worthy from my Google Reader feeds. They are a combination of tech business news, development news, and programming tools and techniques.
- Real-Time Ad Impression Bids Using DynamoDB (Amazon Web Services Blog)
- The mother of all M&A rumors: AT&T, Verizon to jointly buy Vodafone (GigaOM)
- Is this the future of memory? A Hybrid Memory Cube spec makes its debut. (GigaOM)
- Dew Drop – April 2, 2013 (#1,518) (Alvin Ashcraft's Morning Dew)
- Rosetta Stone acquires Livemocha for $8.5m to move its language learning platform into the cloud (The Next Web)
- Double Shot #1098 (A Fresh Cup)
- Extending git (Atlassian Blogs)
- A Thorough Introduction To Backbone.Marionette (Part 2) (Smashing Magazine Feed)
- 60 Problem Solving Strategies (Javalobby – The heart of the Java developer community)
- Why asm.js is a big deal for game developers (HTML5 Zone)
- Implementing DAL in Play 2.x (Scala), Slick, ScalaTest (Javalobby – The heart of the Java developer community)
- “It’s Open Source, So the Source is, You Know, Open.” (Javalobby – The heart of the Java developer community)
- How to Design a Good, Regular API (Javalobby – The heart of the Java developer community)
- Scalding: Finding K Nearest Neighbors for Fun and Profit (Javalobby – The heart of the Java developer community)
- The Daily Six Pack: April 2, 2013 (Dirk Strauss)
- Usually When Developers Are Mean, It Is About Power (Agile Zone – Software Methodologies for Development Managers)
- Do Predictive Modelers Need to Know Math? (Data Mining and Predictive Analytics)
- Heroku Forces Customer Upgrade To Fix Critical PostgreSQL Security Hole (TechCrunch)
- DYNAMO (Lambda the Ultimate – Programming Languages Weblog)
- FitNesse your ScalaTest with custom Scala DSL (Java Code Geeks)
- LinkBench: A database benchmark for the social graph (Facebook Engineering's Facebook Notes)
- Khan Academy Checkbook Scaling to 6 Million Users a Month on GAE (High Scalability)
- Famo.us, The Framework For Fast And Beautiful HTML5 Apps, Will Be Free Thanks To “Huge Hardware Vendor Interest” (TechCrunch)
- Why We Need Lambda Expressions in Java – Part 2 (Javalobby – The heart of the Java developer community)
I hope you enjoy today’s items, and please participate in the discussions on those sites.
CockroachDB TIL: Volume 7
This is my series of articles covering short "today I learned" topics as I work with CockroachDB. Read the previous installments:
Topic 1: Show Last Query Statistics
CockroachDB has a very user-friendly and helpful UI called DB Console. I like to refer to it when I debug query performance. There is a very useful Statements page that shows a statement overview, explain plans, and execution statistics.
PostgreSQL EXPLAIN – What Are the Query Costs?
Understanding the Postgres EXPLAIN Cost
EXPLAIN is very useful for understanding the performance of a Postgres query. It returns the execution plan generated by the PostgreSQL query planner for a given statement. The EXPLAIN command shows whether the tables referenced in a statement will be read using an index scan or a sequential scan. When reviewing the output of the EXPLAIN command, you'll notice the cost statistics, so it's natural to wonder what they mean, how they're calculated, and how they're used. In short, the PostgreSQL query planner estimates how much time the query will take (in an arbitrary unit), with both a startup cost and a total cost for each operation. More on that later. When it has multiple options for executing a query, it uses these costs to choose the cheapest, and therefore hopefully fastest, option.
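For example, on a hypothetical table, the two numbers after cost= are the startup cost and the total cost; the figures below are illustrative, not taken from a real plan:

```sql
-- Hypothetical table; your row counts and costs will differ.
EXPLAIN SELECT * FROM users;
--                        QUERY PLAN
-- ---------------------------------------------------------
--  Seq Scan on users  (cost=0.00..18.50 rows=850 width=36)
```

Here 0.00 is the cost before the first row can be returned (a sequential scan can start immediately), and 18.50 is the estimated cost to return all rows.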
What Unit Are the Costs In?
The costs are in an arbitrary unit. A common misunderstanding is that they are in milliseconds or some other unit of time, but that's not the case. The cost units are anchored (by default) to a single sequential page read costing 1.0 units (seq_page_cost). Each row processed adds 0.01 (cpu_tuple_cost), and each non-sequential page read adds 4.0 (random_page_cost). There are many more constants like this, all of which are configurable. That last one, random_page_cost, is a particularly common candidate for tuning, at least on modern hardware. We'll look into that more in a bit.
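The arithmetic behind these constants is simple enough to reproduce by hand. The sketch below uses PostgreSQL's real default constants, but the table size is a made-up example:

```python
# Back-of-envelope reproduction of the planner's sequential scan cost.
# The constants are PostgreSQL's documented defaults; the table below
# (100 pages, 5,000 rows) is hypothetical.
SEQ_PAGE_COST = 1.0    # default seq_page_cost
CPU_TUPLE_COST = 0.01  # default cpu_tuple_cost

def seq_scan_cost(pages: int, rows: int) -> float:
    """Total cost of a plain sequential scan: read every page
    sequentially, then pay the per-row CPU cost for every tuple."""
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST

print(seq_scan_cost(100, 5_000))  # 100 * 1.0 + 5000 * 0.01 = 150.0
```

The real planner adds further terms (operator costs, qual evaluation, and so on), but this is the core of how a Seq Scan's total cost is assembled.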
How to Set Up and Run PostgreSQL Change Data Capture
The architecture of modern web applications consists of several software components such as dashboards, analytics, databases, data lakes, caches, search, etc.
The database is usually the core part of any application. Real-time data updates keep disparate data systems in continuous sync and let them respond quickly to new information. So how do you keep your application ecosystem in sync? How do these other components learn about changes in the database? Change Data Capture (CDC) refers to any solution that identifies new or changed data.
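To make "any solution that identifies new or changed data" concrete, here is a toy sketch that diffs two snapshots of a table. Production CDC systems read the write-ahead log (e.g. via logical decoding) rather than comparing snapshots; every name here is made up for the illustration:

```python
# Naive CDC: diff two {primary_key: row} snapshots of a table to find
# what was inserted, updated, or deleted between them.
def capture_changes(old_snapshot: dict, new_snapshot: dict) -> dict:
    inserted = {k: v for k, v in new_snapshot.items() if k not in old_snapshot}
    deleted = {k: v for k, v in old_snapshot.items() if k not in new_snapshot}
    updated = {k: v for k, v in new_snapshot.items()
               if k in old_snapshot and old_snapshot[k] != v}
    return {"inserted": inserted, "updated": updated, "deleted": deleted}

before = {1: "alice", 2: "bob"}
after = {1: "alice", 2: "bobby", 3: "carol"}
print(capture_changes(before, after))
```

The downstream consumers (caches, search indexes, data lakes) would apply these change events instead of re-reading the whole table.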
Using PGBouncer With CockroachDB Serverless
Given that CockroachDB scales with vCPU, there's a hard limit to how many active connections we can support per vCPU before a serious problem arises. PGBouncer stretches the limits a bit, making it a cost-effective option. In serverless architectures there is no client-side connection pooling, and middleware like PGBouncer can alleviate the problem of connection storms. Please see my previous articles on the topic for more details.
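As a point of reference, a minimal pgbouncer.ini for transaction pooling looks roughly like this; the host line is a placeholder for your own cluster's connection parameters:

```ini
[databases]
; Placeholder connection string; substitute your cluster's host and database.
defaultdb = host=<your-serverless-host> port=26257 dbname=defaultdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_file = /etc/pgbouncer/userlist.txt
; Transaction pooling lets many client connections share a few server connections.
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```

With pool_mode = transaction, a server connection is only held for the duration of a transaction, which is what blunts connection storms.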
Previous Articles
- Using PGBouncer with CockroachDB
- Using PGBouncer with Cockroach Cloud Free Tier
- Exploring PGBouncer auth_type(s) with CockroachDB
Motivation
We've covered how to deploy PGBouncer with a self-hosted CockroachDB cluster. Today, I'm going to demonstrate how to run PGBouncer along with the Cockroach Cloud free-forever tier database. The overall concepts are identical, but we will highlight some of the major differences in deploying PGBouncer with a cloud product.
CockroachDB TIL: Volume 6
This is my series of articles covering short "Today I learned" topics as I work with CockroachDB.
Topic 1: Cockroach Init Container
Use an init container to initialize CockroachDB and exit; you no longer need to explicitly run init when you bring up CockroachDB.
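One way to sketch this in Kubernetes is a one-shot Job that runs cockroach init against the first pod; the names, image tag, and host below are placeholders to adapt to your manifest:

```yaml
# Hypothetical one-shot initializer Job for an insecure demo cluster.
apiVersion: batch/v1
kind: Job
metadata:
  name: cockroachdb-init
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: cluster-init
          image: cockroachdb/cockroach:latest
          command:
            - "/cockroach/cockroach"
            - "init"
            - "--insecure"
            - "--host=cockroachdb-0.cockroachdb"
```

Once the Job completes, the cluster is initialized and the nodes join on their own.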
Postgres Connection Pooling and Proxies
One essential concept that every backend engineer should know is connection pooling. This technique can improve the performance of an application by reducing the number of open connections to a database. Another related term is "proxies," which help us implement connection pools.
In this article, we'll discuss connection pooling, implementing it in Postgres, and how proxies fit in. We'll do this while also examining some platform-specific considerations.
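Before diving in, a toy pool makes the core idea tangible: a fixed number of connections is created up front and handed out to callers, instead of opening a new connection per request. The factory below is a stand-in for a real driver call such as psycopg2.connect():

```python
# Minimal connection pool sketch backed by a FIFO queue.
import queue

class ConnectionPool:
    def __init__(self, factory, size: int):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())  # open all connections up front

    def acquire(self):
        return self._pool.get()        # blocks when the pool is exhausted

    def release(self, conn):
        self._pool.put(conn)           # return the connection for reuse

pool = ConnectionPool(factory=lambda: object(), size=2)
c1 = pool.acquire()
c2 = pool.acquire()
pool.release(c1)
c3 = pool.acquire()                    # reuses c1 rather than opening a new one
print(c3 is c1)  # True
```

Real pools add health checks, timeouts, and thread safety beyond what Queue gives you, but the acquire/release lifecycle is the same.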
Hooks: The Secret Feature Powering the Postgres Ecosystem
What do developers mean when they say Postgres is "extensible"? They're referring to low-level APIs that can change the core functionality of the database. In this post, we will explore a secret (meaning undocumented) feature called hooks, which allows developers not only to add features to Postgres, but to modify the way that queries are executed and data is parsed.
Hooks are used by some of the most popular projects in the Postgres ecosystem, including Timescale, pg_stat_statements, and Supabase. To give you ideas of what can be built, we will see how each of these projects uses specific hooks. At the end of this post, we provide a Makefile, some C code, and compilation instructions to get started customizing Postgres on your own.
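To preview the pattern, here is a sketch of the canonical hook shape (the same one pg_stat_statements uses): save the previous hook, install your own, and always chain onward. It must be built against the PostgreSQL server headers with PGXS, so it will not compile standalone:

```c
/* Sketch: wrap ExecutorStart to log every query before it runs. */
#include "postgres.h"
#include "executor/executor.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

static ExecutorStart_hook_type prev_ExecutorStart = NULL;

static void
log_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    elog(LOG, "about to execute: %s", queryDesc->sourceText);

    /* Always chain to the previous hook, or the standard executor. */
    if (prev_ExecutorStart)
        prev_ExecutorStart(queryDesc, eflags);
    else
        standard_ExecutorStart(queryDesc, eflags);
}

void
_PG_init(void)
{
    prev_ExecutorStart = ExecutorStart_hook;
    ExecutorStart_hook = log_ExecutorStart;
}
```

Loading the compiled library via shared_preload_libraries activates the hook for every backend.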
DNS Gotchas With CockroachDB and GSS-API
We just pushed a new release of CockroachDB, and Postgres also had a recent release with a vulnerability impacting GSS. I figured it was as good a time as any to update my repos with the latest versions of Postgres and Cockroach, and thereby test that everything works. I discovered an issue that is easily fixed but changes the behavior of the cockroach and psql clients.
High-Level Steps
- Start a three-node CockroachDB cluster in Docker with GSSAPI.
- Demonstrate the problem scenario.
- Verify.
Step-by-Step Instructions
Start a Cluster
This tutorial requires nothing beyond what was covered in my previous tutorials. Feel free to set up a stand-alone environment to follow along, or use my docker-compose environment.
PostgreSQL Rewrite Rules
PostgreSQL supports the concept of rewrite rules, which allow you to change how queries are executed.
In fact, rewrite rules are how views are implemented in PostgreSQL. When you access a view, your query is actually rewritten according to the rule that was created when you created the view. A view is basically a rewrite rule, as we're about to see.
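To see the mechanism outside of views, here is a small rule on hypothetical tables: every INSERT into orders is additionally rewritten to log the new row's id.

```sql
-- Hypothetical tables for illustration.
CREATE TABLE orders (id int PRIMARY KEY, status text);
CREATE TABLE orders_audit (id int, logged_at timestamptz DEFAULT now());

CREATE RULE orders_audit_rule AS
    ON INSERT TO orders
    DO ALSO INSERT INTO orders_audit (id) VALUES (NEW.id);

-- INSERT INTO orders VALUES (1, 'new');  -- also writes a row to orders_audit
```

DO ALSO appends the extra statement at rewrite time; DO INSTEAD, which views rely on, replaces the original query entirely.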
Configure SSO for CockroachDB Dedicated With Microsoft Identity Platform and OpenID
Motivation
CockroachDB Dedicated is a fully managed, reserved CockroachDB cluster ideal for a cloud database. We frequently get asked how to set up SSO for individual CockroachDB Dedicated clusters, and we have a detailed tutorial that walks you through it with a local, self-hosted cluster. What was unclear is that you can use the same steps with Dedicated. Based on this detailed document, CockroachDB Dedicated supports OIDC authentication for the DB Console today. In a future release, we are going to bring OIDC integration across the stack. Today, we're going to provide details on how to leverage OIDC specifically with the CockroachDB Dedicated DB Console and Microsoft using the OpenID Connect protocol.
Previous Articles on OpenID Connect
High-Level Steps
- Provision a dedicated cluster
- Configure Microsoft identity platform integration
- Configure CockroachDB with the OpenID details
- Verify
Step-by-Step Instructions
Provision a Dedicated Cluster
Follow this tutorial to set up a Dedicated cluster.
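Configuring CockroachDB with the OpenID details (the third step above) typically comes down to a handful of cluster settings. A sketch with placeholder values follows; the setting names are CockroachDB's OIDC settings, and everything in angle brackets comes from your Microsoft identity platform app registration and your cluster's DB Console URL:

```sql
SET CLUSTER SETTING server.oidc_authentication.provider_url = 'https://login.microsoftonline.com/<tenant-id>/v2.0';
SET CLUSTER SETTING server.oidc_authentication.client_id = '<application-client-id>';
SET CLUSTER SETTING server.oidc_authentication.client_secret = '<client-secret>';
SET CLUSTER SETTING server.oidc_authentication.redirect_url = 'https://<cluster-host>:8080/oidc/v1/callback';
SET CLUSTER SETTING server.oidc_authentication.enabled = true;
```

Once enabled, the DB Console login page offers the OIDC sign-in option alongside password login.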
Using a Custom CockroachDB Image With Docker and Kubernetes
Motivation
Cockroach Labs ships new images when a new maintenance release is available, typically on a monthly basis. CockroachDB does not rely on the base OS image for any third-party libraries except for the geospatial and Kerberos packages. That said, OS images may be vulnerable to security exposures depending on their age. Given that CockroachDB is written in Go, replacing the OS can be a trivial task.
Originally, the CockroachDB image shipped with a Debian base image. CRL then switched to UBI to accommodate a wider scope of use cases, including Red Hat OpenShift. UBI images are shipped regularly with CVE patches, but given the nature of security vulnerabilities, even the latest and greatest images may still carry unpatched CVEs. With that preamble, we're going to cover how to replace the base image for CockroachDB and use it in Kubernetes.
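The rebase itself can be sketched as a two-stage Dockerfile: copy the cockroach binary out of the official image onto a fresh UBI base. The tags below are examples; pin the versions you actually run, and note that the geospatial libraries would also need copying if you use spatial features:

```dockerfile
FROM cockroachdb/cockroach:latest AS upstream

FROM registry.access.redhat.com/ubi9/ubi-minimal:latest
COPY --from=upstream /cockroach/cockroach /cockroach/cockroach
ENV PATH=/cockroach:$PATH
EXPOSE 26257 8080
ENTRYPOINT ["/cockroach/cockroach"]
```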
Using CockroachDB Workloads With Kerberos
GSSAPI authentication is becoming increasingly popular as CockroachDB starts to make inroads into Fortune 2000 customer bases and the financial services segment. That said, ecosystem coverage for GSS needs to improve to reach parity with other authN methods. Today, we are providing a workaround and a look at the future. By the way, do you realize this is my 15th article on Kerberos and CockroachDB?
Articles Covering CockroachDB and Kerberos
I find the topic of Kerberos very interesting, and my colleagues commonly come to me for help with this complex topic. I am by no means a Kerberos expert, but I am familiar enough with it to be dangerous. That said, I've written multiple articles on the topic, which you may find below:
CockroachDB With GSSAPI Deployed via Systemd
Articles Covering Topics on CockroachDB and Kerberos
I find the topic of Kerberos very interesting, and my colleagues commonly come to me for help with this complex topic. I am by no means a Kerberos expert, but I am familiar enough with it to be dangerous. That said, I've written multiple articles on the topic, which you may find below:
- CockroachDB With MIT Kerberos
- CockroachDB With Active Directory
- CockroachDB With MIT Kerberos and Docker Compose
- Executing CockroachDB table import via GSSAPI
- CockroachDB With SQLAlchemy and MIT Kerberos
- CockroachDB With MIT Kerberos Cert User Authentication
- CockroachDB with Django and MIT Kerberos
- CockroachDB With Kerberos and Custom Service Principal Name (SPN)
- Simplifying CockroachDB Kerberos Architecture With a Load Balancer
- CockroachDB With MIT Kerberos Using a Native Client
- CockroachDB With Mixed Kerberos and Certificates Authentication
Motivation
Systemd has become a standard approach for managing Linux services. We have documentation for deploying CockroachDB via systemd; however, we do not have documented steps for deploying CockroachDB with GSSAPI and systemd. This tutorial attempts to bridge that gap.
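As a starting point, a unit file for this setup can be sketched as below. The paths, node names, and keytab location are placeholders to adapt; the key GSSAPI-specific line is the KRB5_KTNAME environment variable, which points the MIT Kerberos libraries at the service keytab:

```ini
[Unit]
Description=CockroachDB node
After=network.target

[Service]
Type=notify
User=cockroach
; Placeholder keytab path; point this at your real service keytab.
Environment=KRB5_KTNAME=/var/lib/cockroach/cockroach.keytab
ExecStart=/usr/local/bin/cockroach start --certs-dir=/var/lib/cockroach/certs --advertise-addr=%H --join=node1,node2,node3
Restart=always

[Install]
WantedBy=multi-user.target
```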
Automating SQL User Generation and Password Rotation With CockroachDB
Motivation
As with most of my tutorials, topic ideas come from user inquiries. I see this question come up quite often, and we don't have a documented approach to bridge the gap today. Cockroach Labs engineering is hard at work building an API that will make this point moot, but until then, this can be a viable alternative. It is primarily directed at our cloud offering, where we rely on password authentication today. There are also cases where password authentication serves other purposes, and we need ways to automate the provisioning of passwords beyond the ALTER USER username WITH PASSWORD 'password'; command.
You can take this approach and incorporate it into your CI/CD pipelines to onboard new users and manage their passwords in the absence of certificate-based authentication and its associated revocation mechanisms, or directory services and their password management capabilities.
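The rotation step of such a pipeline can be sketched as follows: generate a strong random password and build the corresponding ALTER USER statement. In a real pipeline you would execute the statement through a driver against the cluster and store the secret in a vault; both of those parts are omitted here, and the username is a made-up example:

```python
# Generate a cryptographically random password and the SQL to apply it.
import secrets
import string

ALPHABET = string.ascii_letters + string.digits

def rotate_password(username: str, length: int = 24) -> tuple:
    password = "".join(secrets.choice(ALPHABET) for _ in range(length))
    # String literals are single-quoted in SQL; identifiers may be double-quoted.
    stmt = f"ALTER USER \"{username}\" WITH PASSWORD '{password}';"
    return password, stmt

pwd, stmt = rotate_password("app_user")
print(stmt)
```

Scheduling this on a timer (cron, CI job) and pushing the new secret to consumers completes the rotation loop.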
Using Azure Load Balancer With CockroachDB
Motivation
The purpose of this tutorial is to provide step-by-step instructions for getting an Azure Load Balancer up quickly. Our docs do a great job of covering the CockroachDB portion, but the granular steps to get ALB up and running are missing. Since this is my first foray into managed load balancers, I decided to do the hard work.
High-Level Steps
- Provision a cluster in Azure
- Provision a load balancer
- Test connectivity
- Clean up
Step-by-Step Instructions
This article assumes you've set up a Resource Group and an associated Virtual Network in your Azure subscription. This document will walk you through setting up a CockroachDB cluster. With these prerequisites in place, we can continue with setting up a load balancer.
Configure Single Sign-On for CockroachDB Dedicated With Google OAuth
Motivation
CockroachDB Dedicated is a fully managed, reserved CockroachDB cluster ideal for a cloud database. We frequently get asked how to set up SSO for individual CockroachDB Dedicated clusters, and we have a detailed tutorial that walks you through it with a local, self-hosted cluster. What was unclear is that you can use the same steps with Dedicated. Based on this detailed document, CockroachDB Dedicated supports OIDC authentication. Today, we're going to provide details on how to leverage OIDC specifically with the Dedicated offering.
High-Level Steps
- Provision Dedicated cluster
- Configure OAuth Client ID
- Configure CockroachDB with the OAuth details
- Verify
Step-by-Step Instructions
Provision Dedicated Cluster
Follow this tutorial to set up a Dedicated cluster.