How to Best Fit Filtering into Vector Similarity Search

Attribute filtering, or simply "filtering," is a basic function that users of vector databases expect. However, implementing this seemingly simple function well is surprisingly complex.

Suppose Steve sees a photograph of a fashion blogger on a social media platform. He would like to find a similar jean jacket on an online shopping platform that supports image similarity search. After he uploads the image, the platform shows him a plethora of similar jean jackets. However, he only wears Levi’s, so the results of the image similarity search need to be filtered by brand. The question is when to apply the filter: before or after approximate nearest neighbor search (ANNS)?
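The difference between the two orderings can be illustrated with a minimal sketch. It is not taken from the article: the catalog, the two-dimensional embeddings, and the function names are all hypothetical, and an exact brute-force search stands in for a real ANNS index.

```python
import math

# Hypothetical catalog of (item_id, brand, embedding). All values are made up.
CATALOG = [
    ("a", "Levi's", (0.9, 0.1)),
    ("b", "Other", (1.0, 0.0)),
    ("c", "Other", (0.95, 0.05)),
    ("d", "Levi's", (0.2, 0.8)),
]


def top_k(items, query, k):
    """Return the k items closest to the query (exact search, standing in for ANNS)."""
    return sorted(items, key=lambda item: math.dist(item[2], query))[:k]


def pre_filter_search(query, brand, k):
    """Apply the brand filter first, then search only the matching items."""
    candidates = [item for item in CATALOG if item[1] == brand]
    return [item[0] for item in top_k(candidates, query, k)]


def post_filter_search(query, brand, k):
    """Search the whole catalog first, then drop results of other brands."""
    nearest = top_k(CATALOG, query, k)
    return [item[0] for item in nearest if item[1] == brand]


query = (1.0, 0.0)
pre = pre_filter_search(query, "Levi's", 2)
post = post_filter_search(query, "Levi's", 2)
print(pre)   # two Levi's items, even though one is far from the query
print(post)  # may hold fewer than k items, because filtering happens last
```

In this toy data, filtering first always yields k results but has to scan or index the filtered subset, while filtering last can leave fewer than k results when the nearest neighbors belong to other brands.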

Power Up Your Rails Apps With a NewSQL Database

If you are a Ruby on Rails developer, I think you'll really enjoy this article. It aims to help you get started with TiDB, an open-source NewSQL database, and use it to power up your Rails applications.

Use TiDB to Build Up Your Ruby on Rails Applications

TiDB is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.

Easy Local Development with TiDB

When you develop an application, you begin by coding and testing in your local environment. Many applications interface with a database, so in this early stage, you might use SQLite rather than the database brand used in production. This is an issue, however, because ideally, you want to develop the application with the production database in mind.

When you use a distributed system, setting up, starting, and stopping the components it needs can become error-prone and time-consuming.

Best Practices for TiDB Load Balancing

Load balancing distributes connections from applications to TiDB Server instances. This helps to distribute the load over multiple machines and, depending on the load balancing option, can automatically reroute connections if a TiDB instance becomes unavailable.
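As an illustration of the idea, here is a minimal sketch of a round-robin balancer that skips instances marked as unavailable. It is an assumption-laden toy, not one of the load balancing options the article goes on to describe: the class, its methods, and the host names are hypothetical (4000 is TiDB's default port).

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy balancer: hands out instances in turn and skips unhealthy ones."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._healthy = set(self._instances)
        self._ring = cycle(self._instances)

    def mark_down(self, instance):
        # A real balancer would learn this from health checks.
        self._healthy.discard(instance)

    def mark_up(self, instance):
        if instance in self._instances:
            self._healthy.add(instance)

    def next_instance(self):
        # Try each instance at most once per call.
        for _ in range(len(self._instances)):
            candidate = next(self._ring)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy instance available")


# Hypothetical TiDB Server addresses.
balancer = RoundRobinBalancer(["tidb-1:4000", "tidb-2:4000", "tidb-3:4000"])
first_round = [balancer.next_instance() for _ in range(3)]

balancer.mark_down("tidb-2:4000")
after_failure = [balancer.next_instance() for _ in range(4)]
print(first_round)
print(after_failure)
```

New connections are spread evenly while all instances are healthy, and they are rerouted to the remaining instances once one is marked down.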

Load Balancing Types

There are many different ways to implement a load balancer. This section describes the most common types.

TiFS: A TiKV-Based, Partition Tolerant, Strictly Consistent File System

TiKV is a distributed key-value storage engine, featuring strong consistency and partition tolerance. It can act either as the storage engine for TiDB or as an independent transactional key-value database. Do you know what else it is capable of?

At TiDB Hackathon 2020, our team built a TiKV-based distributed POSIX file system, TiFS, which inherits the powerful features of TiKV and also taps into TiKV's possibilities beyond data storage.

Why Is Fuzzy Matching Software a Key for Deduplication?

Identifying golden and unique records across or within datasets is crucial to prevent identity theft, meet compliance regulations, and improve customer acquisition. Banks, government organizations, healthcare providers, and marketing companies all require matching algorithms to identify and deduplicate redundant entries to enrich their master database.

Fuzzy matching is a well-known set of algorithms for measuring the distance between two similar entities. But certain limitations hinder its effectiveness in quickly finding matches across larger, disparate datasets.
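One common distance of this kind is the Levenshtein edit distance. The sketch below is a plain dynamic-programming version chosen for illustration; the article does not say which algorithms it covers, and the `similarity` helper is a hypothetical normalization.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or substitutions
    needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical score in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))
print(similarity("Jon Smith", "John Smith"))
```

Comparing every record against every other record this way costs time proportional to the square of the number of records, which is one reason such matching slows down on large datasets.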

What Is Mobile Backend As A Service (MBaaS)?

Introduction

In the software as a service (SaaS) landscape, there are many variations of "____ as a service." In this article, we will explain what mobile backend as a service, or MBaaS, means.

Mobile backend as a service (MBaaS) is an online service designed to be an all-in-one solution for backend app development. This typically includes data and database management, API management, security, and push notifications.

Add Databases to Your Spring Cleaning List

Every time you delete or update a row in your database, the old records are secretly still hiding in the background and taking up space on your hard drive.

A VACUUM process is like emptying the recycling bin on your laptop. It clears up space, reduces indexing time, and keeps your database squeaky clean.
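The effect can be observed with a small experiment. The teaser does not say which database the article targets, so this sketch assumes SQLite, simply because it ships with Python and has a `VACUUM` command; the table name, row count, and function name are hypothetical.

```python
import os
import sqlite3
import tempfile


def vacuum_demo(rows: int = 5000):
    """Return the database file size when full, after DELETE, and after VACUUM."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        # isolation_level=None puts the connection in autocommit mode,
        # so VACUUM never runs inside an open transaction.
        conn = sqlite3.connect(path, isolation_level=None)
        conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
        conn.execute("BEGIN")
        conn.executemany(
            "INSERT INTO events (payload) VALUES (?)",
            (("x" * 200,) for _ in range(rows)),
        )
        conn.execute("COMMIT")
        full = os.path.getsize(path)

        # Deleting rows frees pages inside the file but does not shrink it.
        conn.execute("DELETE FROM events")
        after_delete = os.path.getsize(path)

        # VACUUM rebuilds the file and hands the free pages back to the OS.
        conn.execute("VACUUM")
        after_vacuum = os.path.getsize(path)
        conn.close()
        return full, after_delete, after_vacuum
    finally:
        os.remove(path)


full, after_delete, after_vacuum = vacuum_demo()
print(full, after_delete, after_vacuum)
```

Other databases implement the same idea differently, so treat the numbers as a demonstration of the principle rather than a benchmark.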

Cloud Applications Require a Distributed Database

We are well past the experimental stage with the cloud. It’s become mission-critical, and we have entered a stage where our applications and services need to take advantage of the globally distributed nature of the cloud and deliver on the expectations of our consumers.

Legacy relational databases are simply not built for the cloud. They are difficult to scale in this environment, and keeping them up is costly. NoSQL stores were built to address legacy limitations; however, they fall short when it comes to providing consistent transactions. They are casually consistent. Some of the most successful global organizations have purpose-built databases that achieve the reliability of the relational store with the benefits of scale and global coverage that come with the cloud. These databases are a new breed called Distributed SQL.

mysqldump Best Practices (Part 1): MySQL Prerequisites

mysqldump is a client utility that is used to perform logical backups of the MySQL database. This popular migration tool is useful for various use cases of MySQL such as:

  • Backup and restore of databases.
  • Migrating data from one server to another.
  • Migrating data across different managed MySQL service providers.
  • Migrating data between different versions of MySQL.

mysqldump works by reading the source database objects and generating a set of SQL statements that are stored in a dump file. Replaying these statements on the destination database server reconstructs the original data. Because this approach reads the whole database and then essentially rebuilds it, both dump and restore are time-consuming operations for a large database. The process can become cumbersome if you encounter errors during either dump or restore, because you then have to fix the issues and re-run the operations. This is why it's important to plan well before you start a dump and restore.

MongoDB Design: Tips and Tricks

MongoDB is a popular database that works without imposing any kind of schema. The data is stored in a JSON-like format and can contain different kinds of structures. For example, in the same collection we can have the next two documents:
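As a hedged illustration of what such documents could look like, here are two made-up documents written as plain Python dictionaries. They are not the article's own example; every field name and value is hypothetical, and no MongoDB driver is involved.

```python
import json

# Two hypothetical documents that could sit in the same collection
# even though they share almost no fields.
collection = [
    {"_id": 1, "name": "Alice", "email": "alice@example.com"},
    {"_id": 2, "title": "Quarterly report", "tags": ["finance", "q3"], "pages": 12},
]

for document in collection:
    print(json.dumps(document))

# The only field the two documents have in common is the identifier.
shared_fields = set(collection[0]) & set(collection[1])
print(shared_fields)
```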

To get the best out of MongoDB, you have to understand and follow some basic database design principles. Before getting to some design tips, we have to first understand how MongoDB structures the data.

Setting Up a CrateDB Cluster With Kubernetes to Store and Query Machine Data

Because of its horizontally scalable shared-nothing architecture, the CrateDB open source database is well-suited for working with Kubernetes. Setting up a CrateDB cluster with Kubernetes can be done in just a few steps, and scaling up and down is straightforward – making the cluster particularly flexible. This step-by-step tutorial will show you how to get CrateDB and Kubernetes working together.

CrateDB is used for real-time machine data processing, monitoring, and analytics. The open source database is suited for applications with high volumes of machine data (like anomaly detection), log data (like ecommerce), network data (like capacity planning), and IoT/IIoT data (like smart manufacturing, smart home products, and fitness gear). However, this database is probably not what you want to use if you require strong (ACID) transactional consistency or highly normalized schemas with many tables and joins.

How to View MongoDB Collections as Diagrams

MongoDB doesn’t need a big introduction. It’s one of the fastest-growing databases in the market, and for a good reason. MongoDB has a unique approach to working with data by focusing on flexibility.

Offering Flexibility

Compared to a relational database like MySQL that uses well-defined tables to store data, MongoDB offers more flexibility by storing the data in JSON-like objects. The objects are then stored in collections. Two objects from the same collection can have different data-fields. For example, we can have the next two objects in the same collection:

Top 4 Database Design Tools

Good database design will significantly decrease maintenance work and minimize the chances of errors in a project. As every project has different requirements, finding the right tool for it can be a difficult task.

This article compares 4 of the best database design tools. The comparison was made with 4 main points in focus:

Apache Cassandra

Distributed non-relational database Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors and is used at some of the most well-known, global organizations. This Refcard covers data modeling, Cassandra architecture, replication strategies, querying and indexing, libraries across eight languages, and more.

The Best Way to Host MySQL on Azure Cloud

Are you looking to get started with the world’s most popular open-source database and wondering how you should set up your MySQL hosting? So many default to Amazon RDS when MySQL performs exceptionally well on Azure Cloud. While Microsoft Azure does offer a managed solution, Azure Database, the solution has some major limitations you should know about before migrating your MySQL deployments. In this post, we outline the best way to host MySQL on Azure, including managed solutions, instance types, high availability replication, backup, and disk types to use to optimize your cloud database performance.


MySQL DBaaS vs. Self-Managed MySQL

The first thing to consider when weighing between self-management and a MySQL Database-as-a-Service (DBaaS) solution is what internal resources you have available. If you’re reading this, you likely already know the magnitude of operational tasks associated with maintaining a production deployment, but for a quick recap, there’s provisioning, de-provisioning, master-slave configurations, backups, scaling, upgrades, log rotations, OS patching, and monitoring to name a few.

Synchronizing Data in SQL Server

In my previous article, Synchronizing MS SQL Server Databases, we looked at why change synchronization is important and how to synchronize SQL Server databases by using the Schema Compare tool. Feel free to check it out before reading further if you want to know more about the topic.

In this article, I’ll give an example of how to synchronize SQL Server data changes between servers with the help of dbForge Data Compare for SQL Server.