Monitoring and the ELK Stack

Any application monitoring solution should maintain an open design, build upon proven technologies, be accessible, and have a low learning curve. The end goal is simple: provide teams with the ability to identify issues or unexpected behavior within minutes, if not seconds. The ELK Stack meets these expectations and more. In this Refcard, you'll cover the basic components of the ELK Stack, how it maps to a log analysis workflow, and step-by-step instructions for installation, configuration, and reporting.

Elasticsearch vs. CloudSearch: AWS Cloud

Today, more than 100 billion searches are conducted every month on the Google search engine alone. Search engine users conduct searches for several reasons, including the foundational conversion of information into action. An action could be deciding to purchase, consuming information for decision-making, or seeking a better understanding of an issue or topic, among others. Search engines put information at our fingertips whenever we need it.

In this era of big data, search solutions are useful not only for popular search engines like Google, Yahoo, and Bing but also for enterprises for monitoring and managing the growing volumes of data in their databases to enhance operational efficiency. The enterprise search industry has grown remarkably and is expected to be worth $8.90 billion by 2024.

Stopping Cybersecurity Threats: Why Databases Matter

From intrusion detection to threat analysis to endpoint security, the effectiveness of cybersecurity efforts often boils down to how much data can be processed in real-time with the most advanced algorithms and models.

Many factors are involved in stopping cybersecurity threats effectively. However, the databases responsible for processing the billions or trillions of events per day (from millions of endpoints) play a particularly crucial role. High throughput and low latency directly correlate with better insights as well as more threats discovered and mitigated in near real-time. Data-intensive cybersecurity systems are incredibly complex: many span 4+ data centers with database clusters exceeding 1000 nodes and petabytes of heterogeneous data under active management.

How to Implement Typeahead Search with Elasticsearch

Today, search is an important function in enterprise applications. End users are accustomed to the experience of Google Search and expect application search to provide a similar experience.

This requires us to design and implement a search engine alongside the golden source (RDBMS/NoSQL). There are many search engines available in the market today, such as Elasticsearch, Apache Solr, and Azure Cognitive Search. These provide a better search experience and features like typeahead, fuzzy search, boosting of search results based on relevancy, and similarity search.
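
As a rough illustration of the typeahead feature mentioned above, the following Python sketch builds the JSON body of an Elasticsearch `match_phrase_prefix` query, which is one of several ways to match a partially typed phrase. The field name `title`, the default `size`, and the helper `build_typeahead_query` are hypothetical, chosen only for this example. The body would still need to be sent to a cluster's `_search` endpoint with a client of your choice.

```python
import json


def build_typeahead_query(field, typed_text, size=5):
    """Build a search body for a simple typeahead lookup.

    The query matches documents whose `field` contains the typed words
    in order, treating the last word as a prefix.
    `field` and `size` are illustrative assumptions, not values from the article.
    """
    return {
        "size": size,  # cap the number of suggestions returned
        "query": {
            "match_phrase_prefix": {
                field: {"query": typed_text}
            }
        },
    }


if __name__ == "__main__":
    # Print the body that would be sent for the partially typed text "elast".
    print(json.dumps(build_typeahead_query("title", "elast"), indent=2))
```

Building the body as a plain dictionary keeps the sketch independent of any particular client library. For heavier typeahead workloads, other approaches may be a better fit than a query-time prefix match.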

Apache Kafka in Cybersecurity for Threat Intelligence

Apache Kafka became the de facto standard for processing data in motion across enterprises and industries. Cybersecurity is a key success factor across all use cases. Kafka is not just used as a backbone and source of truth for data. It also monitors, correlates, and proactively acts on events from various real-time and batch data sources to detect anomalies and respond to incidents. This blog series explores use cases and architectures for Kafka in the cybersecurity space, including situational awareness, threat intelligence, forensics, air-gapped and zero trust environments, and SIEM/SOAR modernization. This post is part three: Cyber Threat Intelligence.

Blog Series: Apache Kafka for Cybersecurity

This blog series explores why security features such as RBAC, encryption, and audit logs are only the foundation of a secure event streaming infrastructure. Learn about use cases, architectures, and reference deployments for Kafka in the cybersecurity space:

Top 10 June ’21 Big Data Articles to Read Now

Introduction

Big Data has now been adopted by a lot of businesses. Its popularity and use are expanding globally. How awesome would it be to find top trending Big Data articles in one place so that you can always stay up to date with the latest trends in technology? We dug into Google Analytics to find the top 10 most popular Big Data articles in June. Let's get started!

10. Kafka Administration and Monitoring UI Tools

Kafka is used for streaming data and much more! This article covers Kafka basics, Kafka administration, Kafka Manager, and monitoring tools.

Introduction to Spring Data Elasticsearch 4.1

Preface

I was recently playing around with Spring Data Elasticsearch (the Spring Data project for Elasticsearch) and came across several issues. One of these was a lack of up-to-date articles. This led me to share my experience using the latest Elasticsearch 7 and Spring Data Elasticsearch 4.1. I hope that my advice can help others gain insight into the tool and how to effectively use it for a variety of reasons.

In this Article, You Will Learn

  • What Elasticsearch and Spring Data Elasticsearch are.
  • Basic Elasticsearch cluster setup via Docker (including management and monitoring tool).
  • How to configure Spring Data Elasticsearch in a project.
  • How to use Spring Data Elasticsearch to upload and access Elasticsearch data.
  • What some pitfalls related to Elasticsearch usage are.

First, I will briefly explain the purpose of Elasticsearch.

ElasticPress.io Service Considers Next Move after Elasticsearch Abandons Open Source Licensing

Elastic, maker of the search and analytics engine Elasticsearch, has re-licensed its core product so that it is no longer open source. The company is moving new versions of both Kibana and Elasticsearch from the Apache 2.0 license to dual licensing under the Server Side Public License (SSPL) and the Elastic License, which do not meet the Open Source Definition.

In a post titled “Amazon: NOT OK – why we had to change Elastic licensing,” Elastic blames Amazon for the license change:

Our license change is aimed at preventing companies from taking our Elasticsearch and Kibana products and providing them directly as a service without collaborating with us.

Our license change comes after years of what we believe to be Amazon/AWS misleading and confusing the community – enough is enough.

Elastic claims AWS’s behavior has “forced” the company to abandon its open source licensing, citing examples of what they perceive to be “ethically challenged behavior.” In 2019, Amazon created an Open Distro for Elasticsearch, and Elastic claims they used code copied by a third party from their commercial code, further dividing the community.

As a result of the license change, Amazon announced its intention to officially fork Elasticsearch and Kibana, with plans to roll the forks into its Open Distro distributions:

Our forks of Elasticsearch and Kibana will be based on the latest ALv2-licensed codebases, version 7.10. We will publish new GitHub repositories in the next few weeks. In time, both will be included in the existing Open Distro distributions, replacing the ALv2 builds provided by Elastic. We’re in this for the long haul, and will work in a way that fosters healthy and sustainable open source practices—including implementing shared project governance with a community of contributors.

The Open Source Initiative (OSI) reacted to the news of the license change, calling the SSPL a “fauxpen” source license:

Fauxpen source licenses allow a user to view the source code but do not allow other highly important rights protected by the Open Source Definition, such as the right to make use of the program for any field of endeavor. By design, and as explained by the most recent adopter, Elastic, in a post it unironically titled “Doubling Down on Open,” Elastic says that it now can “restrict cloud service providers from offering our software as a service” in violation of OSD6. Elastic didn’t double down, it threw its cards in.

Elastic’s license changes may affect a few companies in the WordPress ecosystem that are redistributing Elasticsearch as a commercial offering. 10up, creators of ElasticPress, by far the most popular Elasticsearch plugin for WordPress, also runs the ElasticPress.io SaaS platform. More than 6,000 sites are using the open source plugin, but the company said these users will not be affected.

“No matter what this won’t affect the EP plugin,” 10up vice president of engineering Taylor Lovett said. “I would say the news is definitely discouraging and not a great look for Elastic.”

10up launched ElasticPress.io in 2017 and Lovett says it has become “an active part of the business with a plethora of customers,” and continues to grow. The company is currently seeking legal advice on how Elasticsearch’s licensing change will affect the ElasticPress.io service. Since previous versions of Elasticsearch remain open source, the company has time to figure out a new way forward.

“Right now we really don’t know what’s going to happen,” Lovett said. “There is no rush for us to upgrade ElasticPress.io to Elasticsearch 7.11+ so we have plenty of time to decide how to address the issue.”

Lovett confirmed that 10up is considering using the Amazon fork as an option but has not made a decision on the matter yet.

“I will say this does affect the end user in a way that they may end up having to choose between different flavors of Elasticsearch,” Lovett said.

“For example, you may need to decide if you want the official Elastic distribution or if you want to go with AWS’s fork.”

Unfortunately, for businesses that built services on top of redistributing the previously open source Elasticsearch, Elastic’s creators have gone back on the promise they made in 2018 to never change the license of any of the Apache 2.0 code of Elasticsearch, Kibana, Beats, and Logstash projects. As a consequence, Amazon has emerged as the one to drive the truly open source option for Elasticsearch and Kibana for the future.

“Elastic’s relicensing is not evidence of any failure of the open source licensing model or a gap in open source licenses,” the OSI board of directors stated in a recent post on the matter. “It is simply that Elastic’s current business model is inconsistent with what open source licenses are designed to do. Its current business desires are what proprietary licenses (which includes source available) are designed for.”

How to Optimize AWS Observability Tools

Amazon Web Services (AWS) is a powerhouse cloud computing service that gives companies compute capabilities on demand. It enables developers to quickly create serverless functions, which deliver new features to consumers without the time and cost of scaling up infrastructure. The downside to this speed is that tracking and observing the health of these functions can be difficult, especially when running microservices. AWS provides several tools to help developers understand their system’s health and is in the process of delivering new tools as well.

Observability With AWS CloudWatch

CloudWatch is AWS’s monitoring and insight service. Developers use CloudWatch to collect logs from compute functions and track performance information for many AWS services. From this data, CloudWatch can generate insights, on which developers can build alarms or further insights. By combining these tools, developers can assemble AWS observability tooling that meets their needs.

Elasticsearch New Features: 2020 in Review

What a year 2020 has been! Social distancing and a lot of very weird situations. For some, it was a year full of difficulties, and hopefully a lot of growth and some good things too.

It has definitely been an interesting year for Elasticsearch. Many things happened, new features were added, and the product evolved significantly. We wanted to recap and share highlights of new features and usage recommendations. This post covers what we consider to be the big changes and important steps forward, based on our experience of actively working with hundreds of customers on Elasticsearch clusters of all shapes and sizes, from full-text search to log analytics and anomaly detection.

Which AWS Storage Solution Is Right for Your Elasticsearch Cluster?

Amazon Web Services (AWS) is one of the most competent cloud service providers around right now. It offers a number of different kinds of storage. It provides low-cost data storage with high durability and high availability.

This article will help you understand the different storage services and features available in the AWS Cloud and how to select the right storage type for your ELK stack.

How to Keep Elasticsearch in Sync with Relational Databases?

This article was published in Java Advent Calendar on December 6, 2020

Many businesses are looking to take advantage of Elasticsearch’s powerful search capabilities by using it in close relationship with existing relational databases. In this context, it’s not rare to use Elasticsearch as a caching layer. At this point, a basic and important need arises: keeping Elasticsearch synchronized with the database.
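
One common building block for this kind of synchronization is turning database rows into a request body for Elasticsearch's `_bulk` API. The Python sketch below shows that step only, and it is not necessarily the approach the article takes. The helper name `rows_to_bulk_payload`, the index name, and the `id` column are assumptions made for illustration. Fetching the rows and sending the payload are left out.

```python
import json


def rows_to_bulk_payload(index_name, rows, id_column="id"):
    """Convert database rows (as dicts) into an NDJSON body for the _bulk API.

    Each row becomes two lines: an action line naming the target index and
    document id, then the document itself. Reusing the primary key as the
    document id means re-sending a row overwrites the earlier copy instead
    of creating a duplicate.
    """
    lines = []
    for row in rows:
        action = {"index": {"_index": index_name, "_id": str(row[id_column])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical rows, as they might come back from a SELECT.
    sample_rows = [{"id": 1, "name": "lamp"}, {"id": 2, "name": "desk"}]
    print(rows_to_bulk_payload("products", sample_rows), end="")
```

This sketch only covers inserts and updates. Rows deleted from the database would need separate handling, for example `delete` actions in the same bulk body.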

Deploy Quarkus Faster in the Cloud with Platform.sh. Part 4: Hibernate Search With Elasticsearch

Quarkus is, in its own words, a Cloud Native, (Linux) Container First framework for writing Java applications. It has become popular because of its amazingly fast boot time and incredibly low RSS memory. In this series of articles about Quarkus, we'll teach you how to deploy Quarkus with Hibernate Search with Elasticsearch.

Full-text searching provides the capability to identify natural-language documents that satisfy a query, and optionally to sort them by relevance to the query. Hibernate Search automatically extracts data from Hibernate ORM entities to push it to local Apache Lucene indexes or remote Elasticsearch indexes.

Elasticsearch Index v7.6

Elasticsearch, which is based on Lucene, is a distributed document store. It is a highly effective way of indexing your information for correlation and quick querying for analysis. In this blog, I will walk you through the steps required to create an index, search it, and visualize the results.

What Is an Index?

In the context of Elasticsearch (ES), an index is a collection of documents.
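
To make that definition a little more concrete, here is a toy in-memory model in Python. It is only a sketch for intuition, not how Elasticsearch is implemented: the class `ToyIndex` and its methods are invented for this example. It stores documents by id and keeps a simple word-to-ids lookup, loosely echoing the inverted index idea that Lucene-based engines rely on.

```python
from collections import defaultdict


class ToyIndex:
    """A toy stand-in for an index: documents keyed by id, plus a lookup
    from each lowercase word to the ids of the documents containing it.

    Simplification: re-adding an existing id does not remove its old words.
    """

    def __init__(self):
        self.documents = {}
        self.postings = defaultdict(set)

    def add(self, doc_id, document):
        """Store a document (a dict) and register every word in its values."""
        self.documents[doc_id] = document
        for value in document.values():
            for word in str(value).lower().split():
                self.postings[word].add(doc_id)

    def search(self, word):
        """Return the documents containing `word`, ordered by id."""
        ids = sorted(self.postings.get(word.lower(), set()))
        return [self.documents[i] for i in ids]


if __name__ == "__main__":
    index = ToyIndex()
    index.add(1, {"title": "Kafka monitoring"})
    index.add(2, {"title": "Elasticsearch monitoring"})
    print(index.search("monitoring"))
```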

8 Open Source Projects You Need to See!

The year 2020 has already started, and with it, the huge number of languages/frameworks/tools that we developers have to know, learn, or just 'take a look at' only increases. In this short article, I present 8 open source projects that appear to be promising for the year. Many of these projects are already in use today (some even on a large scale); others are coming into focus just this year, either through community adoption or relevance in the current context of software development.

Frontend

React — Gatsby

Gatsby is an open source SSG (Static Site Generator) based on React that aims to make development easier and more efficient. Gatsby is a framework that brings together the main features of React and several other modern tools in the same package, facilitating the creation of fast and powerful websites and web applications.

What CDOs and CAOs Struggle With Most

Our team recently attended the Chief Data & Analytics Officers (CDAO) conference in Boston and used the opportunity to conduct an informal poll. The conference was packed with C-suite executives trying to wrangle big data at companies like Tesla, Lionsgate, AMD, Capital One, and Ford. We asked everyone about their analytics challenges. There were two standout issues that we kept hearing about again and again.

1. Their data scientists get bogged down with data access challenges

A recent study showed that data preparation and data engineering tasks represent over 80% of the time consumed in most AI and Machine Learning projects.