Run Hundreds of Experiments with OpenCV and Hydra

Feature Matching Problem

Image matching is an important task in computer vision. Real-world objects may be captured in different photos from any angle, under any lighting conditions, and may be partially occluded. But when images contain the same objects, they must be categorized accordingly. For this purpose, computer vision gives us invariant feature extractors that help match objects across different images.

Detectors, Descriptors, and Matchers

Image matching is a three-step algorithm: detect keypoints, describe them with feature vectors, and match those descriptors between images. Fortunately, all three steps are covered by the OpenCV library.
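
For instance, here is a minimal sketch of the three steps using OpenCV's ORB detector/descriptor and a brute-force matcher (the image file names are placeholders):

```python
import cv2

# Load two images of the same scene (file names are hypothetical)
img1 = cv2.imread("scene_a.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene_b.jpg", cv2.IMREAD_GRAYSCALE)

# Steps 1-2: detect keypoints and compute descriptors in one call
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Step 3: match the binary descriptors with Hamming distance
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} matches; best distance: {matches[0].distance}")
```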

Build a Plagiarism Checker Using Machine Learning

Plagiarism is rampant on the internet and in the classroom. With so much content out there, it’s sometimes hard to know when something has been plagiarized. Authors writing blog posts may want to check if someone has stolen their work and posted it elsewhere. Teachers may want to check students’ papers against other scholarly articles for copied work. News outlets may want to check if a content farm has stolen their news articles and claimed the content as its own.

So, how do we guard against plagiarism? Wouldn’t it be nice if we could have software do the heavy lifting for us? Using machine learning, we can build our own plagiarism checker that searches a vast database for stolen content. In this article, we’ll do exactly that.
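
One common approach, and not necessarily the exact one this article takes, is to vectorize documents with TF-IDF and rank candidates by cosine similarity; here is a minimal scikit-learn sketch with placeholder texts:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder corpus of previously published documents
corpus = ["original blog post text ...", "another article ..."]
candidate = "text suspected of being copied ..."

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(corpus + [candidate])

# Compare the candidate (last row) against every corpus document;
# scores near 1.0 suggest heavily overlapping text
scores = cosine_similarity(matrix[-1], matrix[:-1])
print(scores)
```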

Visualize Airflow Workflows Without Airflow

Apache Airflow has gained a lot of traction in the data processing world. It is a Python-based orchestration tool. When I say "Python-based", I mean it is not just that the application was developed in Python: directed acyclic graphs (DAGs), Airflow's term for workflows, are also written in Python. In other words, workflows are code. Many popular workflow tools, like Informatica and Talend, have visual editors that let developers lay out a workflow visually. Because Airflow workflows are Python code, we can only visualize a workflow after uploading it. While this is acceptable, in some cases it becomes problematic because Airflow refuses to load the workflow due to errors. Additionally, during development it is difficult to visualize all the connections described in the Python code.

While looking for a way to visualize workflows, I came across the Sankey diagram. Better still, I also came across a gist where the Python code has been conveniently packaged into a function. All I had to do was download the gist and include it in my program.
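
The gist itself isn't reproduced here, but to give a rough idea of the result, here is a hedged sketch of a tiny DAG rendered as a Sankey diagram with Plotly (the task names and edges are invented):

```python
import plotly.graph_objects as go

# Invented DAG: extract -> transform -> {validate, load}
labels = ["extract", "transform", "validate", "load"]
fig = go.Figure(go.Sankey(
    node=dict(label=labels, pad=20),
    link=dict(
        source=[0, 1, 1],  # edge origins, as indices into labels
        target=[1, 2, 3],  # edge destinations
        value=[1, 1, 1],   # equal weights; only the topology matters here
    ),
))
fig.show()
```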

Learning Python By Example: Bagels

Intro

I’m improving my Python skills by coding through The Big Book of Small Python Projects by Al Sweigart. I’ve written a few Python scripts in the past, but I never went very deep, and there were big gaps of time between uses.

In 1981 style, I’m manually typing in the code from the book. In 2021 style, I’m blogging about each program as I go to help me reinforce the learning even more and hopefully support others on the same journey.

What Is Data Engineering? Skills and Tools Required

In the last decade, as most organizations embraced digital transformation, data scientists and data engineers have developed into two separate roles, though with obvious overlap. A business constantly generates data from its people and products. Every event is a snapshot of company functions (and dysfunctions) such as revenue, losses, third-party partnerships, and goods received. But if that data isn't explored, no insights are gained. The purpose of data engineering is to support this process and make the data workable for its consumers. In this article, we’ll explore the definition of data engineering, data engineering skills, what data engineers do and their responsibilities, and the future of data engineering.

Data Engineering: What Is It?

In the world of data, a data scientist is only as good as the data they have access to. Most companies store their data in a variety of formats, across databases and text files. This is where data engineering comes in. Put simply, data engineering means organizing and designing data, and it is done by data engineers. They build data pipelines that transform the data, organize it, and make it useful. Data engineering is just as important as data science. However, data engineering requires knowing how to derive value from data, as well as the practical engineering skills to move data from point A to point B without corruption.

Processing 3D Data Using Python Multiprocessing Library

Today we’ll cover tools that are very handy for working with large amounts of data. I'm not going to repeat the general information that can be found in manuals, but instead share some little tricks I’ve discovered, such as using tqdm with multiprocessing's imap, working with archives in parallel, plotting and processing 3D data, and how to search for a similar object within object meshes if you have a point cloud.

So why should we resort to parallel computing? Nowadays, if you work with any kind of data, you might face problems related to "big data". Whenever the data doesn’t fit in RAM, we need to process it piece by piece. Fortunately, modern programming languages allow us to spawn multiple processes (or even threads) that take full advantage of multi-core processors. (NB: that doesn’t mean single-core processors cannot handle multiprocessing. Here’s the Stack Overflow thread on that topic.)
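
As a taste of the tqdm-with-imap trick mentioned above, here is a minimal sketch (the workload function is a stand-in for real per-chunk processing):

```python
from multiprocessing import Pool

from tqdm import tqdm

def process_chunk(x):
    # Stand-in for real work on one piece of the data
    return x * x

if __name__ == "__main__":
    data = range(10_000)
    with Pool() as pool:
        # imap yields results lazily, so tqdm can show live progress
        results = list(tqdm(pool.imap(process_chunk, data, chunksize=64),
                            total=len(data)))
```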

How To Use CockroachDB With Your Django Application

This tutorial is intended as a quick ramp-up on CockroachDB with Django. If you're searching for a proper Django tutorial, this is not it. At the time of writing, the django-cockroachdb library is available in two versions (2 and 3). This tutorial covers version 3 and is inspired by the DigitalOcean tutorial on using Django with PostgreSQL. I am going to highlight the steps where this tutorial differs from the original; for everything else, we will assume the original is followed as is.

I originally wrote this post two years ago and have since updated it: CockroachDB RBAC is no longer an enterprise-only feature, so we can skip that step. I'm also including steps to launch a Docker instance to make this more portable and comprehensive.
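
For reference, swapping the original tutorial's PostgreSQL backend for CockroachDB mostly comes down to the ENGINE entry in settings.py; a hedged sketch, with database name, user, and host as placeholders:

```python
# settings.py: a hedged sketch; NAME, USER, and HOST are placeholders
DATABASES = {
    "default": {
        "ENGINE": "django_cockroachdb",  # provided by the django-cockroachdb library
        "NAME": "mydb",
        "USER": "myuser",
        "PASSWORD": "",
        "HOST": "localhost",
        "PORT": "26257",  # CockroachDB's default SQL port
    }
}
```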

Talk Python To Me Podcast Founder Michael Kennedy Interview

Introduction

Michael Kennedy is a successful entrepreneur and software development professional. He created and hosts the weekly podcast, Talk Python To Me, which focuses on Python and related software development topics. He also founded and is the chief author for Talk Python Training, an online Python training program. We had the opportunity to interview Michael about his programming experiences, and we have included the entire transcript below. Hope you enjoy it!

The Interview

Evrone: We at Evrone have done custom development for many years, and we try to organize professional communities in Russia, like a community for Ruby developers, for Python developers. Your podcasts are famous in Russia. So, first of all, from the Russian Python community, thank you for your hard work.

Async Support in Django

Hello, my dear readers! Yes, this article is about the web framework for perfectionists with deadlines, and about Django’s lack of async support. It’s more like an enhancement proposal (less formal than it could be) or an RFC. So, if you like that sort of thing, you might be interested.

The Django Foundation has also considered the issue of adding async support. Their discussions resulted in DEP-09, which describes the current approximate roadmap. I have even discovered that my post doesn’t contradict it; it's just that DEP-09 has very little information about async-native support, which is considered the last stage that still needs to be reached. This reminds me of the meme about how to draw an owl: first you draw two circles, then you finish the rest.
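
For context, the first fruits of that roadmap have since landed: Django 3.1 introduced natively async views. A minimal sketch of that API (the view body is a stand-in for real non-blocking I/O):

```python
import asyncio

from django.http import JsonResponse

# Django 3.1+ detects coroutine functions and runs them on the async path
async def healthcheck(request):
    await asyncio.sleep(0.1)  # stand-in for awaiting a real async resource
    return JsonResponse({"status": "ok"})
```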

The 5 Best SQL Adapters for Your Python Project

Choose the Best SQL Adapter for Your Python Project

Introduction

This article will explain what a database connector is and cover the pros and cons of some popular Python SQL connectors.

What is a Database Connector?

A database connector is a driver that works like an adapter, connecting a software interface to a specific database vendor's implementation.
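
Whichever adapter you pick, most Python connectors implement the same DB-API 2.0 interface (connect, cursor, execute, fetch), so the code looks broadly alike; a minimal sketch using the standard library's sqlite3 connector:

```python
import sqlite3

# Connect through the driver; other adapters expose the same shape
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO users (name) VALUES (?)", ("Ada",))
conn.commit()

cur.execute("SELECT id, name FROM users")
print(cur.fetchall())  # [(1, 'Ada')]
conn.close()
```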

A Guide to Web Scraping in Python Using BeautifulSoup

Today we’ll discuss how to use the BeautifulSoup library to extract content from an HTML page. After extraction, we’ll convert it to a Python list or dictionary!

What Is Web Scraping, and Why Do I Need It?

The simple answer is this: not every website has an API to fetch content. You might want to get recipes from your favorite cooking website or photos from a travel blog. Without an API, extracting the HTML, or scraping, might be the only way to get that content. I’m going to show you how to do this in Python.
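
As a preview, here is a minimal sketch that scrapes headings from a page into a Python list (the URL and CSS class are placeholders, not a real site's markup):

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL; swap in the page you want to scrape
html = requests.get("https://example.com/recipes").text
soup = BeautifulSoup(html, "html.parser")

# Collect every recipe title into a plain Python list
titles = [h2.get_text(strip=True)
          for h2 in soup.find_all("h2", class_="recipe-title")]
print(titles)
```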

Boosted Embeddings with CatBoost

Introduction

When working with a large amount of data, it becomes necessary to compress the feature space into vectors. A familiar example is text embeddings, which are an integral part of almost any NLP model creation process. Unfortunately, it is not always possible to use neural networks with this type of data; the reason may be, for example, slow fitting or inference.

I want to suggest an interesting way to use gradient boosting that few people know about.
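
As a baseline for that idea (the article may go further), one plain way to combine embeddings with gradient boosting is to treat each embedding dimension as a numeric feature; a minimal CatBoost sketch with synthetic data:

```python
import numpy as np
from catboost import CatBoostClassifier

# Synthetic stand-ins: 1,000 pre-computed 300-d text embeddings, binary labels
X = np.random.rand(1000, 300)
y = np.random.randint(0, 2, size=1000)

# Each embedding dimension becomes an ordinary numeric feature for the booster
model = CatBoostClassifier(iterations=200, verbose=False)
model.fit(X, y)
print(model.predict_proba(X[:5]))
```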

How to Set Up PostgreSQL High Availability With Patroni

PostgreSQL is an open-source, versatile database system, and one of the most popular in the world. However, it does not have built-in features for high availability.

Enter Patroni. Patroni is a cluster manager tool used for customizing and automating the deployment and maintenance of high-availability PostgreSQL clusters. It is written in Python and uses etcd, Consul, or ZooKeeper as a distributed configuration store for maximum accessibility. In addition, Patroni is capable of handling database replication, backup, and restoration configurations.

Setting Up a Modern Web Test Automation Framework with Selenium and Python

Nowadays, a dominant trend is to ship big releases, even on an infrequent cadence, without sacrificing product quality. Every new deployment that introduces new features and bug fixes needs in-depth end-to-end testing to ensure the deployment succeeds. Small products or projects can be covered by manual testing, but feature-rich applications definitely require test automation to provide maximum coverage in minimum time. This can be achieved with Selenium and any robust programming language; in this post, we will use Selenium with Python.

Selenium WebDriver is a web framework that permits the automation of web-based applications on various supported browsers like Chrome, Firefox, and Safari. A Selenium test automation framework enables you to define step-by-step interactions with a web application and add assertions to uncover as many bugs as possible.
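
A minimal sketch of what such a test might look like (it assumes chromedriver is on your PATH; the page and assertion are placeholders):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumes chromedriver is on PATH
try:
    # Step-by-step interaction: open a page and locate an element
    driver.get("https://example.com")
    heading = driver.find_element(By.TAG_NAME, "h1")
    # Assertion: a failed check surfaces a regression
    assert "Example" in heading.text
finally:
    driver.quit()
```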

How to Scrape Target Store Location Data From Target.com Using Python?

Web data scraping is a quicker, better-organized way of getting store location details from a website than spending time collecting the information manually. This tutorial blog covers scraping the store locations and contact data available on Target.com, one of the biggest discount retailers in the USA.
For this tutorial, our Target store locator will scrape the information for Target store locations by the provided zip code.
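
Store locators like this are typically backed by a JSON endpoint keyed on a zip code; the general pattern looks like the sketch below (the endpoint, parameters, and response fields are placeholders, not Target's actual API):

```python
import requests

# Placeholder endpoint and parameters; Target's real store-locator API
# is not documented here, so this only shows the general request shape
API_URL = "https://example.com/store-locator"
params = {"zip": "10001", "radius": 25}

response = requests.get(API_URL, params=params, timeout=10)
response.raise_for_status()

for store in response.json().get("stores", []):
    print(store.get("name"), store.get("address"))
```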

We can scrape the following data fields: