How Much Is One Terabyte of Data?

Featured Imgs 26

It seems that a one-mile distance isn’t very long and that a cubic mile isn’t that big if compared with the size of the earth. You may be surprised if I tell you the entire world’s population can fit into a cubic mile of space. The statement is not from me; Hendrik Willem van Loon, a Dutch-American writer, once wrote this in his book.

Teradata is a famous data warehouse product. Over 30 years ago, such a brand name aimed to impress people with its ability to handle massive amounts of data. Today, TB is already the smallest unit many database vendors use when talking about the amount of data they can handle. And PB, even ZB, is often used. It seems that TB is not a big unit, and hundreds of terabytes of data, even a petabyte of data, is not intimidating at all. 

What You Possibly Don’t Know About Columnar Storage

Featured Imgs 23

Columnar storage is a commonly used storage technique. Often, it implies high performance and has basically become a standard configuration for today’s analytical databases.

The basic principle of columnar storage is reducing the amount of data retrieved from the hard disk.  A data table can have a lot of columns, but the computation may use only a very small number of them. With columnar storage, useless columns do not need to be retrieved, while with row-wise storage, all columns need to be scanned. When the retrieved columns only take up a very small part of the total, columnar storage has a big advantage in terms of IO time, and computation seems to get much faster.

Simple SQL Statements Only Exist in Coursebooks and Training Courses

Featured Imgs 23

The sample SQL statements in coursebooks are usually simple and easy to understand. They even read like English sentences, giving the impression that SQL is rather simple and easy to learn.

Actually, such a SQL statement consisting of only a few lines of code can only be found in coursebooks and training courses. In real-world businesses, the amount of SQL code is measured by KB instead of the number of lines. One SQL statement having several hundred lines of code and N layers of nested subqueries often reaches 3KB to 5KB in size. Such SQL statements are not easy to learn at all but rather a nightmare even to professional programmers.