Yitaek Hwang | The Blog Pros

October 19, 2023

The State of Kubernetes: Self-Managed vs. Managed Platforms

This is an article from DZone's 2023 Kubernetes in the Enterprise Trend Report.

For more:

Read the Report

Kubernetes celebrates its ninth year since the initial release this year, a significant milestone for a project that has revolutionized the container orchestration space. During the time span, Kubernetes has become the de facto standard for managing containers at scale. Its influence can be found far and wide, evident from various architectural and infrastructure design patterns for many cloud-native applications.

April 4, 2023

Processing Time Series Data With QuestDB and Apache Kafka

Apache Kafka is a battle-tested distributed stream-processing platform popular in the financial industry to handle mission-critical transactional workloads. Kafka’s ability to handle large volumes of real-time market data makes it a core infrastructure component for trading, risk management, and fraud detection. Financial institutions use Kafka to stream data from market data feeds, transaction data, and other external sources to drive decisions.

A common data pipeline to ingest and store financial data involves publishing real-time data to Kafka and utilizing Kafka Connect to stream that to databases. For example, the market data team may continuously update real-time quotes for a security to Kafka, and the trading team may consume that data to make buy/sell orders. Processed market data and orders may then be saved to a time series database for further analysis.

February 20, 2023

Source Code Management for GitOps and CI/CD

This is an article from DZone's 2023 DevOps Trend Report.

For more:

Read the Report

GitOps has taken the DevOps world by storm since Weaveworks introduced the concept back in 2017. The idea is simple: use Git as the single source of truth to declaratively store and manage every component for a successful application deployment. This can include infrastructure-as-code (e.g., Terraform, etc.), policy documents (e.g., Open Policy Agent, Kyverno), configuration files, and more. Changes to these components are captured by Git commits and trigger deployments via CI/CD tools to reflect the desired state in Git.

February 7, 2023

Data Integration Strategies for Time Series Databases

As digital transformation reaches more industries, the number of data points generated is growing exponentially. As such, data integration strategies to collect such large volumes of data from different sources in varying formats and structures are now a primary concern for data engineering teams. Traditional approaches to data integration, which have largely focused on curating highly structured data into data warehouses, struggle to deal with the volume and heterogeneity of new data sets.

Time series data present an additional layer of complexity. By nature, the value of each time series data point diminishes over time as the granularity of the data loses relevance as it gets stale. So it is crucial for teams to carefully plan data integration strategies into time series databases (TSDBs) to ensure that the analysis reflects the trends and situation in near real-time.

In this article, we’ll examine some of the most popular approaches to data integration for TSDBs:

January 10, 2023

Change Data Capture With QuestDB and Debezium

Modern data architecture has largely shifted away from the ETL (Extract-Transform-Load) paradigm to the ELT (Extract-Load-Transform) paradigm where raw data is first loaded into a data lake before transformations are applied (e.g., aggregations, joins) for further analysis. Traditional ETL pipelines were hard to maintain and relatively inflexible with changing business needs. As new cloud technologies promised cheaper storage and better scalability, data pipelines could move away from pre-built extractions and batch uploads to a more streaming architecture.

Change data capture (CDC) fits nicely into this paradigm shift where changes to data from one source can be streamed to other destinations. As the name implies, CDC tracks changes in data (usually a database) and provides plugins to act on those changes. For event-driven architectures, CDC is especially useful as a consistent data delivery mechanism between service boundaries (e.g., Outbox Pattern). In a complex microservice environment, CDC helps to simplify data delivery logic by offloading the burden to the CDC systems.

November 18, 2022

A Deep Dive Into Distributed Tracing

This is an article from DZone's 2022 Performance and Site Reliability Trend Report.

For more:

Read the Report

Distributed tracing, as the name suggests, is a method of tracking requests as it flows through distributed applications. Along with logs and metrics, distributed tracing makes up the three pillars of observability. While all three signals are important to determine the health of the overall system, distributed tracing has seen significant growth and adoption in recent years.

October 22, 2022

Advanced Guide to Helm Charts for Package Management in Kubernetes

This is an article from DZone's 2022 Kubernetes in the Enterprise Trend Report.

For more:

Read the Report

Helm is undoubtedly one of the most popular and successful open-source projects in the Kubernetes ecosystem. Since it was introduced at the inaugural KubeCon in 2015, Helm's usage has steadily grown to solidify its status as the de facto package manager for Kubernetes. It graduated from the Cloud Native Computing Foundation in 2020, and in the CNCF Survey 2020, more than 60% of the survey respondents reported Helm as their preferred method for packaging Kubernetes applications. This rates significantly higher than Kustomize and other managed Kubernetes offerings.

March 26, 2022

Zero to Hero on Kubernetes With Devtron

One of the hot keywords in the DevOps space is AppOps. As the DevOps ecosystem matures, the focus is shifting from automation and continuous delivery to enriching the developer experience. AppOps takes an app-centric approach to enable developers with self-service tools to develop, deploy, and operate applications on modern, cloud-native platforms. While we have seen a proliferation of great open-source tools to achieve parts of this goal in recent years, creating a seamless experience that spans over CI/CD, security, cost management, and observability remains a challenging task.

Devtron is an open-source tool that pulls together a number of popular components such as ArgoCD, Clair, external secrets, and minio to bootstrap a fully managed application delivery platform on Kubernetes. Underneath the hood, it leverages GitOps principles to create sample CI/CD pipelines, integrated with security scanning and observability tools via a slick application dashboard. For teams looking to adopt Kubernetes at scale, Devtron offers a quick way to provide developers and platform teams a way to onboard their applications onto Kubernetes without having to fumble with various YAML files and piecing together complex tools.

January 28, 2022

Kubernetes Security Guide: High-Level K8s Hardening Guide

This is an article from DZone's 2021 Kubernetes and the Enterprise Trend Report.

For more:

Read the Report

As more organizations have begun to embrace cloud-native technologies, Kubernetes adoption has become the industry standard for container orchestration. This shift toward Kubernetes has largely automated and simplified the deployment, scaling, and management of containerized applications, providing numerous benefits over legacy management protocols for traditional monolithic systems. However, securely managing Kubernetes at scale comes with a unique set of challenges, including hardening the cluster, securing the supply chain, and detecting threats at runtime.

July 21, 2021

Comparing InfluxDB, TimescaleDB, and QuestDB Time Series Databases

We're living in the golden age of databases, as money flows into the industry at historical rates (e.g., Snowflake, MongoDB, Cockroach Labs, Neo4j). If the debate between relational vs. non-relational or online analytical processing (OLAP) vs. online transaction processing (OLTP) ruled the past decade, a new type of database has been steadily growing in popularity. According to DB-Engines, an initiative to collect and present information on database management systems, time series databases are the fastest growing sector since 2020:

Why Use a Time Series Database?

Time series databases (TSDB) are databases optimized to ingest, process, and store timestamped data. Such data may include metrics from servers and applications, readings from IoT sensors, user interaction on a website or an app, or trading activity on financial markets.

April 5, 2021

Running QuestDB on GKE Autopilot

Recently, I’ve been experimenting with QuestDB as the primary time-series database to stream and analyze IoT/financial data:

While I was able to validate the power of QuestDB in storing massive amounts of data and querying them quickly in those two projects, I was mostly running them on my laptop via Docker. In order to scale my experiments, I wanted to create a more production-ready setup, including monitoring and disaster recovery on Kubernetes. So in this guide, we’ll walk through setting up QuestDB on GKE with Prometheus and Velero.

February 20, 2021

Real-time Crypto Tracker with Kafka and QuestDB

“Bitcoin soars past $50,000 for the first time.” — CNN

“Tesla invests $1.5 billion in bitcoin, will start accepting it as payment.” — Washington Post

Not a day goes by without some crypto news stealing the headlines these days. From institutional support of Bitcoin to central banks around the world exploring some form of digital currency, interest in cryptocurrency has never been higher. This is also reflected in the daily exchange volume:

As someone interested in the future of DeFi (decentralized finance), I wanted to better track the price of different cryptocurrencies and store them into a time-series database for further analysis. I found an interesting talk by Ludvig Sandman and Bruce Zulu at Kafka Summit London 2019, 'Using Kafka Streams to Analyze Live Trading Activity for Crypto Exchanges,' so I decided to leverage Kafka and modify it for my own use. In this tutorial, we will use Python to send real-time cryptocurrency metrics into Kafka topics, store these records in QuestDB, and perform moving average calculations on this time-series data with NumPy.

February 19, 2021

Stream Heart Rate Data into QuestDB via Google IoT Core

Background

Thanks to the growing popularity of fitness trackers and smartwatches, more people are tracking their biometrics data closely and integrating IoT into their everyday lives. In my search for a DIY heart rate tracker, I found an excellent walkthrough from Brandon Freitag and Gabe Weiss, using Google Cloud services to stream data from a Raspberry Pi with a heart rate sensor to BigQuery via IoT Core and Cloud Dataflow.

Although Cloud Dataflow supports streaming inserts to BigQuery, I wanted to take this opportunity to try out a new time-series database I came across called QuestDB. QuestDB is a fast open-source time-series database with Postgres and Influx line protocol compatibility. The live demo on the website queries the NYC taxi rides dataset with over 1.6 billion rows in milliseconds, so I was excited to give this database a try. To round out the end-to-end demo, I used Grafana to pull and visualize data from QuestDB.