Ingesting Data Into OpenSearch Using Apache Kafka and Go

Scalable data ingestion is a key aspect of a large-scale distributed search and analytics engine like OpenSearch. One way to build a real-time data ingestion pipeline is to use Apache Kafka, an open-source event streaming platform that handles high-volume (and high-velocity) data and integrates with a variety of sources, including relational and NoSQL databases. A canonical use case is the real-time synchronization of data from heterogeneous source systems, which ensures that OpenSearch indexes stay fresh and can be used for analytics or consumed by downstream applications via dashboards and visualizations.

This blog post will cover how to create a data pipeline in which data written to Apache Kafka is ingested into OpenSearch. We will be using Amazon OpenSearch Serverless and Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless. Kafka Connect is a great fit for such requirements: it provides sink connectors for OpenSearch as well as Elasticsearch (which can be used if you opt for the Elasticsearch OSS engine with Amazon OpenSearch Service). Sometimes, though, specific requirements or constraints warrant the use of a custom solution.
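To illustrate the shape of such a custom solution, here is a minimal sketch of a Go program that consumes records from a Kafka topic and indexes each one into OpenSearch. It is not the full implementation from this post: it assumes the segmentio/kafka-go and opensearch-go client libraries, uses placeholder broker/endpoint/topic/index names, and omits the IAM (SASL) and SigV4 authentication that MSK Serverless and OpenSearch Serverless respectively require.

```go
package main

import (
	"context"
	"log"
	"strings"

	"github.com/opensearch-project/opensearch-go/v2"
	"github.com/opensearch-project/opensearch-go/v2/opensearchapi"
	"github.com/segmentio/kafka-go"
)

func main() {
	ctx := context.Background()

	// OpenSearch client; a plain endpoint is used here for brevity.
	// OpenSearch Serverless would additionally require SigV4 request signing.
	osClient, err := opensearch.NewClient(opensearch.Config{
		Addresses: []string{"https://localhost:9200"}, // placeholder endpoint
	})
	if err != nil {
		log.Fatal(err)
	}

	// Kafka consumer; MSK Serverless would additionally require IAM (SASL) auth.
	reader := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"}, // placeholder broker
		Topic:   "movies",                   // placeholder topic
		GroupID: "opensearch-ingest",
	})
	defer reader.Close()

	for {
		// Block until the next record arrives on the topic.
		msg, err := reader.ReadMessage(ctx)
		if err != nil {
			log.Fatal(err)
		}

		// Index the record value (assumed to be a JSON document) into OpenSearch.
		req := opensearchapi.IndexRequest{
			Index: "movies", // placeholder index name
			Body:  strings.NewReader(string(msg.Value)),
		}
		res, err := req.Do(ctx, osClient)
		if err != nil {
			log.Printf("failed to index record at offset %d: %v", msg.Offset, err)
			continue
		}
		res.Body.Close()
		log.Printf("indexed record from offset %d", msg.Offset)
	}
}
```

The consumer-loop structure is the essence of the approach: read a record, hand its payload to the OpenSearch index API, and move on, with the consumer group handling offset tracking.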
