solr | The Blog Pros

October 10, 2022

Designing Search (part 3): Keeping on track

Part 1

In the previous post we looked at techniques to help us create and articulate more effective queries. From auto-complete for lookup tasks to auto-suggest for exploratory search, these simple techniques can often make the difference between success and failure.

But occasionally things do go wrong. Sometimes our information journey is more complex than we’d anticipated, and we find ourselves straying off the ideal course. Worse still, in our determination to pursue our original goal, we may overlook other, more productive directions, leaving us endlessly finessing a flawed strategy. Sometimes we are in too deep to turn around and start again.

October 14, 2020

Why Disintegration of Apache Zookeeper From Kafka Is in the Pipeline

The main objective of this article is to explain why the dependency between Apache Zookeeper and Kafka is being removed, which is an upcoming project from the Apache Software Foundation. The proposed architecture also aims to make Kafka completely independent, so that it delivers on its own all the functionality it currently offers together with Zookeeper.

Article Structure

This article has been segmented into 4 parts.

April 17, 2020

In Search of Quality: QA Must Be Engaged in Search Engine Development

If you’re reading this, you’re likely already well aware of the value of watertight QA practice and have a good understanding of what it entails. Yet, there is possibly a team delivering business-critical software at your organization that has thus far escaped the forensic focus of your testing. You need to talk to them, and this blog is a primer to help you do just that.

So, if you want to do one thing today to increase the measurable impact of QA at your organization, do this: find out which team is developing your organization’s search technologies and ask them how they’re testing them. There’s a fair chance that it’s a third-party search specialist, working with their own set of cutting-edge tools. In this case, verifying the quality of their testing practices becomes even more pertinent.

December 11, 2019

Hadoop Ecosystem: Hadoop Tools for Crunching Big Data

Hadoop Ecosystem

In this blog, let's understand the Hadoop Ecosystem. It is an essential topic to understand before you start working with Hadoop. This Hadoop ecosystem blog will familiarize you with the Big Data frameworks used across the industry, which are required for a Hadoop certification.

The Hadoop Ecosystem is neither a programming language nor a service; it is a platform or framework which solves big data problems. You can consider it as a suite that encompasses a number of services (ingesting, storing, analyzing, and maintaining) inside it. Let us discuss and get a brief idea about how the services work individually and in collaboration.

October 10, 2019

Read-Only Collections in Solr

An actual image of a "read-only" collection.

Have you ever wondered how to avoid accidental, or deliberate, modification of collection data? Of course, we could restrict access as one possible solution, but that is not always an option. In today's blog post, we will look into how to easily protect your collection from accidental modifications by setting it to read-only.
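As a rough sketch of what setting a collection to read-only can look like, the snippet below builds a Collections API request URL. It assumes Solr 8.1 or later, where the `MODIFYCOLLECTION` action accepts a `readOnly` property. The host, port, and collection name are placeholders chosen for illustration and do not come from the original post.

```python
from urllib.parse import urlencode

# Placeholder values for illustration only (assumptions, not from the post).
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "example_collection"


def read_only_url(base, collection, read_only=True):
    """Build a Collections API URL that sets the readOnly property."""
    params = {
        "action": "MODIFYCOLLECTION",
        "collection": collection,
        "readOnly": "true" if read_only else "false",
    }
    return f"{base}/admin/collections?{urlencode(params)}"


# Prints the URL you would request (e.g. with curl) to make the collection read-only.
print(read_only_url(SOLR_BASE, COLLECTION))
```

The function only constructs the URL, so it can be inspected before anything is sent to a running Solr instance.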


Default Behavior

When we create a collection, either via the API or using the script, it is created in all-access mode: you can both read data from it and write data to it. Let's create a collection using the following command:

August 26, 2019

Using Apache Solr in Production

Solr is a search engine built on top of Apache Lucene. Apache Lucene uses an inverted index to store documents (data) and gives you search and indexing functionality via a Java API. However, to use features like full-text search, you would need to write code in Java.
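To make the idea of an inverted index concrete, here is a toy sketch in Python. It is not how Lucene implements its index (Lucene adds text analysis, positions, scoring, and on-disk segment files); it only illustrates the core mapping from each term to the set of documents that contain it. The sample documents and function names are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased, whitespace-separated term to the ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Made-up sample documents.
docs = {
    1: "Solr is built on Lucene",
    2: "Lucene uses an inverted index",
    3: "Kafka and Zookeeper",
}
index = build_inverted_index(docs)
print(sorted(search(index, "lucene")))           # [1, 2]
print(sorted(search(index, "inverted lucene")))  # [2]
```

Looking up a term is a single dictionary access, which is why this structure suits full-text search better than scanning every document.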

Solr builds on Lucene's search capabilities. It offers more functionality and is designed for scalability. Solr comes loaded with features like pagination, sorting, faceting, auto-suggest, spell check, etc. Solr also uses a trie structure for numeric and date data types; for example, there is a normal int field and another tint field, which represents the trie int field.
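As an illustration of how pagination, sorting, and faceting are typically requested, the sketch below assembles query parameters for Solr's standard `/select` handler using the common `q`, `start`, `rows`, `sort`, `facet`, and `facet.field` parameters. The helper name and the field names (`title`, `price`, `category`) are hypothetical and depend entirely on your schema.

```python
from urllib.parse import urlencode


def select_params(query, page, page_size, sort=None, facet_fields=()):
    """Build Solr query parameters; page is zero-based."""
    params = [
        ("q", query),
        ("start", str(page * page_size)),  # offset of the first result
        ("rows", str(page_size)),          # number of results per page
    ]
    if sort:
        params.append(("sort", sort))
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return params


# Hypothetical field names; adjust them to your own schema.
params = select_params(
    "title:solr", page=2, page_size=10,
    sort="price asc", facet_fields=["category"],
)
print("/select?" + urlencode(params))
```

A list of pairs is used instead of a dictionary because `facet.field` can legitimately appear more than once in a request.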