Data Lakes, Warehouses and Lakehouses. Which is Best?

Twenty years ago, your data warehouse probably wouldn’t have been voted hottest technology on the block. These bastions of the office basement were long associated with siloed data workflows, on-premises computing clusters, and a limited set of business-related tasks (e.g., processing payroll and storing internal documents).

Now, with the rise of data-driven analytics, cross-functional data teams, and most importantly, the cloud, the phrase “cloud data warehouse” is nearly synonymous with agility and innovation.

When NOT To Use Apache Kafka

Apache Kafka is the de facto standard for event streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When do I NOT use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How do I qualify Kafka out as not the right tool for the job? 

This blog post explores the DOs and DON'Ts. Separate sections explain when to use Kafka, when NOT to use Kafka, and when to MAYBE use Kafka.

Redshift vs. Snowflake: The Definitive Guide

What Is Snowflake?

At its core, Snowflake is a data platform. It's not tied to any single cloud service, which means it can run on any of the major cloud providers: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). As a SaaS (Software-as-a-Service) solution, it helps organizations consolidate data from different sources into a central repository for analytics and Business Intelligence use cases.

Once data is loaded into Snowflake, data scientists, engineers, and analysts can use business logic to transform and model that data in a way that makes sense for their company. With Snowflake, users can easily query data using simple SQL. The results are then used to power reports and dashboards so business stakeholders can make key decisions based on relevant insights.

Databricks vs Snowflake: The Definitive Guide

There is a lot of discussion about whether Snowflake or Databricks is the better modern cloud solution for analytics. However, the two solutions were purpose-built to handle different tasks, so they should not be compared on an “apples to apples” basis.

With that in mind, I’ll do my best to break down some of the core differences between the two and share the pros and cons of each as impartially as possible. Before diving into the weeds of Snowflake and Databricks, though, it is important to understand the overall ecosystem.

Azure Synapse vs Snowflake: The Definitive Guide

With the world on pace to reach 175 zettabytes of data by 2025, it’s no wonder that organizations are placing such a high emphasis on building out their technology stacks. Now more than ever, companies need a way to collect and consolidate data into a single platform to derive insights quickly.

This is one of the core reasons that Snowflake and Azure Synapse Analytics have become so popular. However, Synapse and Snowflake are different solutions, and both should be analyzed through an unbiased lens. With that in mind, here are some of the core differences between Snowflake and Synapse, along with the pros and cons of each.

How to Generate Customer Success Analytics in Snowflake

As the distinction between data professionals and non-data professionals narrows, technology that bridges the gap between the two becomes crucial. The benefits of interacting with a data warehouse, especially with large amounts of data, are unquestionable, but for someone on the periphery of the core technology team who might not be very technical, it is not always practical to write SQL queries on the fly.

This poses a problem, especially when departments such as sales, customer success, account management, etc., want the robust insights that could come from the vast amount of data that a company is storing, but they don’t necessarily know how to quickly gather these insights. 

What is Data Lineage and How Can It Ensure Data Quality?

Introduction

Are you spending too much time tracking down bugs for your C-level dashboards? Are different teams struggling to align on what data is needed throughout the organization? Or are you struggling with getting a handle on what the impact of a potential migration could be?

Data lineage could be the answer you need for data quality issues. By improving data traceability and visibility, a data lineage system can improve data quality across your whole data stack and simplify the task of communicating about the data that your organization depends on.
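To make the migration question concrete, here is a minimal sketch of lineage as a directed graph, assuming each asset simply maps to the assets that read from it. The table and dashboard names are hypothetical, and a real lineage system would build this graph from query logs or pipeline metadata rather than a hand-written dictionary.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets that read from it.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.churn"],
    "marts.revenue": ["dashboard.c_level"],
    "marts.churn": [],
    "dashboard.c_level": [],
}


def downstream(graph, source):
    """Return every asset reachable from `source`, in breadth-first order."""
    seen = {source}
    queue = deque([source])
    order = []
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Everything that could break if staging.orders changes during a migration.
impacted = downstream(LINEAGE, "staging.orders")
```

Walking the same graph in the opposite direction (from a dashboard back to its sources) is what helps when tracking down where a bad number came from.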

How to Migrate Your Data From Redshift to Snowflake

For decades, data warehousing solutions have been the backbone of enterprise reporting and business intelligence. But, in recent years, cloud-based data warehouses like Amazon Redshift and Snowflake have become extremely popular. So, why would someone want to migrate from one cloud-based data warehouse to another?

The answer is simple: more scale and flexibility. With Snowflake, users can quickly scale storage and compute resources independently, with nodes added automatically. Using the VARIANT data type, Snowflake also supports storing richer data such as objects, arrays, and JSON data. Debugging Redshift is also not always straightforward, as Redshift users know. Sometimes the desire to migrate goes beyond feature differences. Maybe your team just knows how to work with Snowflake better than Redshift, or perhaps your organization wants to standardize on one particular technology.

What Is OLAP in Data Warehouse?

We are now living in a world that’s driven by data, where huge amounts of information are being gathered and stored daily. The more data that an organization generates, the more crucial it is to have the ability to access and analyze it effectively.

Unfortunately, data analysis is considered to be a weak link for many companies today. This is primarily because they select the wrong type of data storage system and then perform analytics ineffectively.

Cloud Data Warehouse Comparison: Redshift vs. BigQuery vs. Azure vs. Snowflake for Real-Time Workloads

Data helps companies take the guesswork out of decision-making. Teams can use data-driven evidence to decide which products to build, which features to add, and which growth initiatives to pursue. And, such insights-driven businesses grow at an annual rate of over 30%.

But there’s a difference between being merely data-aware and being insights-driven. Discovering insights requires finding a way to analyze data in near real-time, which is where cloud data warehouses play a vital role. As scalable repositories of data, warehouses allow businesses to find insights by storing and analyzing huge amounts of structured and semi-structured data.

Utilizing BigQuery as A Data Warehouse in A Distributed Application

Introduction

Data plays an integral part in any organization. Given the data-driven nature of modern organizations, almost all business and technology decisions are based on the available data. Let's assume that we have an application distributed across multiple servers in different regions of a cloud service provider, and we need to store that application data in a centralized location. The obvious solution would be to use some type of database. However, traditional databases are ill-suited to handling extremely large datasets and lack the features that support data analysis. In that kind of situation, we need a proper data warehousing solution like Google BigQuery.

What is Google BigQuery?

BigQuery is an enterprise-grade, fully managed data warehousing solution that is a part of the Google Cloud Platform. It is designed to store and query massive data sets while enabling users to manage data via the BigQuery data manipulation language (DML) based on the standard SQL dialect.

Your Ultimate Guide to Redshift ETL: Best Practices, Advanced Tips, and Resources

Introduction

Amazon Redshift makes it easier to uncover transformative insights from big data. Analytical queries that once took hours can now run in seconds. Redshift allows businesses to make data-driven decisions faster, which in turn unlocks greater growth and success.

For a CTO, full-stack engineer, or systems architect, the question isn’t so much what is possible with Amazon Redshift, but how? How do you ensure optimal, consistent runtimes on analytical queries and reports? And how do you do that without taxing precious engineering time and resources?

Goodbye XML, Hello SQL! ClickHouse User Management Goes Pro

Access control is one of the essential features of database management. Starting in late 2019, ClickHouse contributor Vitaly Baranov began to introduce robust, full-featured Role-Based Access Control (RBAC). As a result of this work, which included a huge number of tests implemented by the Altinity QA team, ClickHouse can now rightfully boast enterprise-level access control. Best of all, the commands are all in SQL.

User management is the front gate of RBAC. It controls access to ClickHouse itself. This article digs into new commands like CREATE USER that allow you to create, change, and delete users conveniently. We’ll focus on ways to control authentication for single ClickHouse servers. 
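As a rough illustration of what SQL-based user management looks like, the sketch below builds a CREATE USER statement in Python. The helper name, user name, and password are hypothetical; the `IDENTIFIED WITH sha256_hash BY` form is one of the authentication options ClickHouse accepts, and it is used here on the assumption that you would rather not send a plaintext password in the statement.

```python
import hashlib


def create_user_statement(name: str, password: str) -> str:
    """Build a ClickHouse CREATE USER statement that carries only a SHA-256 hash.

    `name` is interpolated into SQL, so anything that is not a plain
    identifier is rejected instead of being escaped.
    """
    if not name.isidentifier():
        raise ValueError(f"unsafe user name: {name!r}")
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return (
        f"CREATE USER IF NOT EXISTS {name} "
        f"IDENTIFIED WITH sha256_hash BY '{digest}'"
    )


# Hypothetical user; the resulting string can be run with any ClickHouse client.
statement = create_user_statement("analyst", "s3cret")
```

The statement still has to be executed against a server by a user with sufficient privileges; this helper only assembles the text.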

Using PostgreSQL as a Data Warehouse

At Narrator, we support many data warehouses, including Postgres. Though it was designed for production systems, with a little tweaking Postgres can work extremely well as a data warehouse.  

For those that want to cut to the chase, here's the tl;dr

Data Mining: Use Cases, Benefits, and Tools

In the last decade, advances in processing power and speed have allowed us to move from tedious and time-consuming manual practices to fast and easy automated data analysis. The more complex the data sets collected, the greater the potential to uncover relevant information. Retailers, banks, manufacturers, healthcare companies, etc., are using data mining to uncover the relationships between everything from price optimization, promotions, and demographics to how economics, risk, competition, and online presence affect their business models, revenues, operations, and customer relationships. Today, data scientists have become indispensable to organizations around the world as companies seek to achieve bigger goals than ever before with data science. In this article, you will learn about the main use cases of data mining and how it has opened up a world of possibilities for businesses.

Today, organizations have access to more data than ever before. However, the sheer volume of structured and unstructured data can make it extremely difficult to make sense of that data and turn it into improvements across the organization.

5 Customer Data Integration Best Practices

Over the last few years, you have probably heard the terms "data integration" and "data management" dozens of times. Your business may already invest in these practices, but are you benefitting from this data gathering?

Too often, companies hire specialists, collect data from many sources and analyze it for no clear purpose. And without a clear purpose, all your efforts are in vain. You can take in more customer information than all your competitors and still fail to make practical use of it.