In recent years, the issue of IP protection has come under the limelight as people's awareness of IP infringement is ever-increasing. Most notably, the multi-national technology giant Apple Inc. has been actively
filing lawsuits against various companies for IP infringement, including trademark, patent, and design infringement. Apart from those most notorious cases, Apple Inc. also
disputed a trademark application by Woolworths Limited, an Australian supermarket chain, on the grounds of trademark infringement in 2009. Apple. Inc argued that the logo of the Australian brand, a stylized "w", resembles their own logo of an apple. Therefore, Apple Inc. took objection to the range of products, including electronic devices, that Woolworths applied to sell with the logo. The story ends with Woolworths amending its logo and Apple withdrawing its opposition.
With the ever-increasing awareness of brand culture, companies are keeping a closer eye on any threats that will harm their intellectual properties (IP) rights. IP infringement includes:
-
Copyright infringement
- Patent infringement
- Trademark infringement
- Design infringement
- Cybersquatting
The aforementioned dispute between Apple and Woolworths is mainly over trademark infringement, precisely the similarity between the trademark images of the two entities. To refrain from becoming another Woolworths, an exhaustive trademark similarity search is a crucial step for applicants both prior to the filing as well as during the review of trademark applications. The most common resort is through a search on the
United States Patent and Trademark Office (USPTO) database that contains all of the active and inactive trademark registrations and applications. Despite the not-so-charming UI, this search process is also deeply
flawed by its text-based nature as it relies on words and Trademark Design codes (which are hand-annotated labels of design features) to search for images.
This article thereby intends to showcase how to build an efficient image-based trademark similarity search system using
Milvus, an open-source vector database.
A Vector Similarity Search System for Trademarks
To build a vector similarity search system for trademarks, you need to go through the following steps:
- Prepare a massive dataset of logos. Likely, the system can use a dataset like this.
- Train an image feature extraction model using the dataset and data-driven models or AI algorithms.
- Convert logos into vectors using the trained model or algorithm in Step 2.
- Store the vectors and conduct vector similarity searches in Milvus, the open-source vector database.
To accelerate the process of feature extraction, you can deploy the trained AI model on multiple servers. However, you do not have to worry about data inconsistency if you use distributed service to process data because you can use
Flask to ensure synchronized data processing.
When the system is built, your user only needs to upload an image of a logo, and then the system converts this new image into a new vector using the same AI model you trained. The system searches for similar vectors to the new vector in the Milvus database and returns the corresponding vector IDs. Ultimately, your user will be able to see all the results of similar logos to the one he or she has uploaded. The following screenshot is a demonstration of a vector similarity search system for trademarks. As you can see, the user uploaded the logo, the swoosh, of the sportswear brand Nike. The system returns all images that are similar to this logo.
In the following sections, let's take a closer look at the two major steps in building a vector similarity search system for trademarks: using AI models for image feature extraction, and using Milvus for vector similarity search. In our case, we used VGG16, a convolutional neural network (CNN), to extract image features and convert them into embedding vectors.
Using VGG16 For Image Feature Extraction
![Article Image]()
To help you quickly get started with Milvus, the open-source vector database, we released another affiliated open-source project,
Milvus Bootcamp on GitHub. The Milvus Bootcamp not only provides scripts and data for benchmark tests, but also includes projects that use Milvus to build some MVPs (minimum viable products), such as a reverse image search system, a video analysis system, a QA chatbot, or a recommender system. You can learn how to apply vector similarity search in a world full of unstructured data and get some hands-on experience in Milvus Bootcamp.
We provide both front-end and back-end services for the projects in Milvus Bootcamp. However, we have recently made the decision to change the adopted web framework from Flask to FastAPI. This article aims to explain our motivation behind such a change in the adopted web framework for Milvus Bootcamp by clarifying why we chose FastAPI over Flask.
Web Frameworks for Python
A web framework refers to a collection of packages or modules. It is a set of software architecture for web development that allows you to write web applications or services and saves you the trouble of handling low-level details such as protocols, sockets, or process/thread management. Using a web framework can significantly reduce the workload of developing web applications as you can simply "plugin" your code into the framework, with no extra attention needed when dealing with data caching, database access, and data security verification. For more information about what a web framework for Python is, see
Web Frameworks.
There are various types of Python web frameworks. The mainstream ones include Django, Flask, Tornado, and FastAPI.
Flask is a lightweight micro-framework designed for Python, with a simple and easy-to-use core that allows you to develop your own web applications. In addition, the Flask core is also extensible. Therefore, Flask supports an on-demand extension of different functions to meet your personalized needs during web application development. This is to say, with a library of various plug-ins in Flask, you can develop powerful websites.
Flask has the following characteristics:
- Flask is a microframework that does not rely on other specific tools or components of third-party libraries to provide shared functionalities. Flask does not have a database abstraction layer and does not require form validation. However, Flask is highly extensible and supports adding application functionality in a way similar to implementations within Flask itself. Relevant extensions include object-relational mappers, form validation, upload processing, open authentication technologies, and some common tools designed for web frameworks.
- Flask is a web application framework based on WSGI (Web Server Gateway Interface). WSGI is a simple interface connecting a web server with a web application or framework defined for the Python language.
- Flask includes two core function libraries, Werkzeug and Jinja2. Werkzeug is a WSGI toolkit that implements request, response objects, and practical functions, which allows you to build web frameworks on top of it. Jinja2 is a popular full-featured templating engine for Python. It has full support for Unicode, with an optional but widely-adopted integrated sandbox execution environment.
FastAPI is a modern Python web application framework that has the same level of high performance as Go and NodeJS. The core of FastAPI is based on
Starlette and
Pydantic. Starlette is a lightweight
ASGI (Asynchronous Server Gateway Interface) framework toolkit for building high-performance
Asyncio services. Pydantic is a library that defines data validation, serialization, and documentation based on Python-type hints.
FastAPI has the following characteristics:
Sound is an information-dense data type. Although it may feel antiquated in the era of video content, audio remains a primary information source for many people. Despite long-term decline in listeners, 83% of Americans ages 12 or older listened to terrestrial (AM/FM) radio in a given week in 2020 (down from 89% in 2019). Conversely, online audio has seen a steady rise in listeners over the past two decades, with 62% of Americans reportedly listening to some form of it on a weekly basis according to the same Pew Research Center study.
As a wave, sound includes four properties: frequency, amplitude, waveform, and duration. In musical terminology, these are called pitch, dynamics, tone, and duration. Sounds also help humans and other animals perceive and understand our environment, providing context clues for the location and movement of objects in our surroundings.
With 71% of Americans getting their news recommendations from social platforms, personalized content has quickly become how new media is discovered. Whether people are searching for specific topics or interacting with recommended content, everything users see is optimized by algorithms to improve click-through rates (CTR), engagement, and relevance. Sohu is a NASDAQ-listed Chinese online media, video, search, and gaming group. It leveraged Milvus, an open-source vector database built by Zilliz, to build a semantic vector search engine inside its news app. This article explains how the company used user profiles to fine-tune personalized content recommendations over time, improving user experience and engagement.
Recommending Content Using Semantic Vector Search
Sohu News user profiles are built from browsing history and adjusted as users search for, and interact with, news content. Sohu’s recommender system uses semantic vector search to find relevant news articles. The system works by identifying a set of tags that are expected to be of interest to each user based on browsing history. It then quickly searches for relevant articles and sorts the results by popularity (measured by average CTR), before serving them to users.
Milvus is an ongoing open-source software (OSS) project focused on building the world's fastest and most reliable vector database. New features inside Milvus v1.1.0 are the first of many updates to come, thanks to long-term support from the open-source community and sponsorship from Zilliz. This blog article covers the new features, improvements, and bug fixes included with Milvus v1.1.0.
New Features
Like any OSS project, Milvus is a perpetual work in progress. We strive to listen to our users and the open-source community to prioritize the features that matter most. The latest update, Milvus v1.1.0, offers the following new features:
Milvus is an open-source vector database designed to manage massive million, billion, or even trillion vector datasets. Milvus has broad applications spanning new drug discovery, computer vision, autonomous driving, recommendation engines, chatbots, and much more.
In March 2021, Zilliz, the company behind Milvus released the platform's first long-term support version — Milvus v1.0. After months of extensive testing, a stable, production-ready version of the world's most popular vector database is ready for prime time. This blog article covers some Milvus fundamentals as well as key features of v1.0.
Cybersecurity remains a persistent threat to both individuals and businesses, with data privacy concerns increasing for 86% of companies in 2020 and just 23% of consumers believing their personal data is very secure. As malware becomes steadily more omnipresent and sophisticated, a proactive approach to threat detection has become essential. Trend Micro is a global leader in hybrid cloud security, network defense, small business security, and endpoint security. To protect Android devices from viruses, the company built Trend Micro Mobile Security — a mobile app that compares APKs (Android Application Package) from the Google Play Store to a database of known malware. The virus detection system works as follows:
- External APKs (Android application package) from the Google Play Store are crawled.
- Known malware is converted into vectors and stored in Milvus.
- New APKs are also converted into vectors, then compared to the malware database using similarity search.
- If an APK vector is similar to any of the malware vectors, the app provides users with detailed information about the virus and its threat level.
To work, the system has to perform a highly efficient similarity search on massive vector datasets in real-time. Initially, Trend Micro used MySQL. However, as its business expanded so did the number of APKs with nefarious code stored in its database. The company’s algorithm team began searching for alternative vector similarity search solutions after quickly outgrowing MySQL.
Building machine learning (ML) applications is a complex and iterative process. As more companies realize the untapped potential of unstructured data, the demand for AI-powered data processing and analytics will continue to rise. Without effective machine learning operations, or MLOps, most ML application investments will wither on the vine. Research has found that as little as 5% of the AI adoptions companies plan to deploy actually reach deployment. Many organizations incur "model debt," where changes in market conditions, and failure to adapt to them, result in unrealized investments in models that linger unrefreshed (or worse, never get deployed at all).
This article explains MLOps, a systemic approach to AI model life cycle management, and how the open-source vector data management platform Milvus can be used to operationalize AI at scale.
Background
A recommendation system (RS) can identify user preferences based on their historical data and suggest products or items to them accordingly. Companies will enjoy considerable economic benefits from a well-designed recommendation system.
There are three elements in a complete set of recommendation systems: user model, object model, and the core element—recommendation algorithm. Currently, established algorithms include collaborative filtering, implicit semantic modeling, graph-based modeling, combined recommendation, and more. In this article, we will provide some brief instructions on how to use Milvus to build a graph-based recommendation system.