Taras Baranyuk | The Blog Pros

February 12, 2023

ClickHouse: Windows Functions From Scratch

ClickHouse is a highly scalable, column-oriented, relational database management system optimized for analytical workloads. It is an open-source product originally developed at Yandex, a search engine company. One of its key features is support for advanced analytical functions, including window functions.

Window functions first appeared in commercial databases in the late 1990s, were standardized in SQL:2003, and have since become a standard feature in many relational databases, including ClickHouse. Today, window functions are an indispensable tool for data analysts and developers and are widely used in many industries.
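
As a rough illustration of what a window function does, here is a minimal sketch that computes a per-region running total. It uses Python's built-in sqlite3 module as a stand-in engine (an assumption made so the example runs without a ClickHouse server; it needs SQLite 3.25 or newer), and the sales table and its columns are hypothetical. The SUM(...) OVER (PARTITION BY ... ORDER BY ...) form is standard SQL, so a similar query should carry over to ClickHouse, though that is not verified here.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for ClickHouse (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per region and day.
conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", 1, 10),
        ("east", 2, 20),
        ("east", 3, 5),
        ("west", 1, 7),
        ("west", 2, 3),
    ],
)

# The window function keeps every input row and adds a value computed
# over the rows of the same region, ordered by day, up to the current row.
running_totals = conn.execute(
    """
    SELECT region,
           day,
           amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total
    FROM sales
    ORDER BY region, day
    """
).fetchall()

for row in running_totals:
    print(row)
```

Unlike GROUP BY, the query returns one output row per input row; the aggregate is attached to each row instead of collapsing the group.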

February 10, 2023

ChatGPT for Newbies in Data Science

ChatGPT is a cutting-edge artificial intelligence model developed by OpenAI, designed to generate human-like text based on the input provided. The model is trained on a massive text dataset, giving it extensive knowledge of the patterns and relationships in language. With its ability to understand and generate text, ChatGPT can perform a wide range of Natural Language Processing (NLP) tasks, such as language translation, question answering, and text generation.

One of the best-known uses of ChatGPT is generating realistic chatbot conversations. Many companies and organizations use chatbots to interact with customers, providing quick and accurate responses to common questions. Another example is language translation, where ChatGPT can automatically translate text from one language to another, making communication easier and more accessible.

July 8, 2022

Common Table Expression in ClickHouse

It is convenient to use CTE in the following cases:

The data can be fetched with a single query, and the result fits in memory
The result of that query is needed more than once
The query is recursive

A bonus would be the improved readability of your SQL query.
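
The cases above can be sketched with a small example. This is only an illustration: it runs against Python's built-in sqlite3 module as a stand-in engine (an assumption, so that no ClickHouse server is needed), the events table is hypothetical, and ClickHouse's own CTE syntax and recursive-query support may differ depending on the version.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for ClickHouse (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table of per-user payments.
conn.execute("CREATE TABLE events (user_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 10), (1, 30), (2, 5), (3, 15)],
)

# Reuse: the CTE user_totals is defined once and referenced twice,
# in the outer FROM clause and again in the subquery computing the average.
above_average = conn.execute(
    """
    WITH user_totals AS (
        SELECT user_id, SUM(amount) AS total
        FROM events
        GROUP BY user_id
    )
    SELECT user_id, total
    FROM user_totals
    WHERE total > (SELECT AVG(total) FROM user_totals)
    ORDER BY user_id
    """
).fetchall()

# Recursion: the CTE refers to itself to generate the numbers 1 through 5.
numbers = conn.execute(
    """
    WITH RECURSIVE counter(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM counter WHERE n < 5
    )
    SELECT n FROM counter
    """
).fetchall()

print(above_average)
print(numbers)
```

Naming the intermediate result user_totals is also what gives the readability bonus: the outer query reads as a filter over a named set instead of a nested subquery repeated twice.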

August 25, 2021

Boosted Embeddings with Catboost

Introduction

When working with a large amount of data, it becomes necessary to compress the feature space into vectors. An example is text embeddings, which are an integral part of almost any NLP model creation process. Unfortunately, it is not always possible to use neural networks to work with this type of data; the reason may be, for example, slow training or inference.

I want to suggest an interesting way to use gradient boosting that few people know about.

May 21, 2021

Mobile Apps Dataset

Introduction

My main job is related to mobile advertising, and from time to time, I have to work with mobile application datasets.

I decided to make some of the data publicly available for those who want to practice building models or get an idea of some of the data that can be collected from open sources. I believe that open-source datasets are always useful as they allow you to learn and grow. Collecting data is often a difficult and dreary job, and not everyone has the ability to do it.