Technically Speaking, What Is Data Governance?

The term data governance has been around for decades, but only in the last few years have we begun to redefine what it means outside the world of regulatory compliance and to establish data standards. This rapid evolution can be attributed to businesses looking to leverage massive amounts of data for analytics across the enterprise while navigating the increasingly rugged terrain of worldwide regulatory requirements.

Data governance is a critical data management mechanism, and most businesses today have a governance program in place. However, according to a recent Gartner survey, “more than 87 percent of organizations are classified as having low business intelligence (BI) and analytics maturity,” which highlights how organizations struggle to develop governance strategies that do more than ensure regulatory compliance.

Tom’s Tech Notes: The Big Concerns With Big Data [Podcast]

Welcome to our latest episode of Tom's Tech Notes! This week, we'll hear advice from 11 industry experts about their biggest concerns with the modern Big Data ecosystem. From poor governance to bad data quality to the removal of human beings from decision-making, check out what Tom's sources have to say about Big Data.

As a primer and reminder from our initial post, these podcasts are compiled from conversations our analyst Tom Smith has had with experts from around the world as part of his work on our research guides.

Automated Machine Learning: Is It the Holy Grail?

Machine learning is in the ascendancy. When it comes to pattern recognition in particular, it is the method of choice. Tangible examples of its applications include fraud detection, image recognition, predictive maintenance, and train-delay prediction systems. In day-to-day machine learning (ML) work, and in the quest to deploy the knowledge gained, we typically encounter three main problems (though not only these).

Data Quality — Data from multiple sources across multiple time frames can be difficult to collate into clean and coherent data sets that will yield the maximum benefit from machine learning. Typical issues include missing data, inconsistent data values, autocorrelation, and so forth.
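To make the data-quality issues above concrete, here is a minimal sketch in Python with pandas. The column names and values are hypothetical, and mean imputation is just one possible remedy for missing readings; it shows how missing data and inconsistent value labels from merged sources might be detected and repaired before training.

```python
import pandas as pd
import numpy as np

# Hypothetical sensor readings collated from two sources
df = pd.DataFrame({
    "sensor_id": ["A1", "A1", "B2", "B2"],
    "reading":   [10.5, np.nan, 9.8, 9.8],
    "unit":      ["C", "C", "celsius", "C"],  # inconsistent labels
})

# 1. Missing data: count NaNs per column
missing = df.isna().sum()

# 2. Inconsistent values: normalize unit labels to one canonical form
df["unit"] = df["unit"].replace({"celsius": "C"})

# 3. One possible fix: fill missing readings with the per-sensor mean
df["reading"] = df.groupby("sensor_id")["reading"].transform(
    lambda s: s.fillna(s.mean())
)
```

In practice these checks would run as part of a data-preparation pipeline, and the choice of imputation strategy depends on the downstream model.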

Identifying Data Warehouse Quality Issues During Staging and Loads to the DWH

This is the fourth blog in a series on Identifying Data Integrity Issues at Every DWH Phase.

Before looking into data quality problems during data staging, we need to know how the ETL system handles data rejections, substitutions, cleansing, and enrichment. To ensure success in testing data quality, include as many data scenarios as possible. Typically, data quality rules are defined during design. For example:
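As a rough illustration of such design-time rules, the sketch below (in Python with pandas, using invented table and column names) applies three common rule types during staging: rejecting rows with a missing key, rejecting rows with an out-of-range value, and substituting a default for an unknown dimension key.

```python
import pandas as pd

# Hypothetical staged order records awaiting load into the DWH
staged = pd.DataFrame({
    "order_id":    [1001, 1002, 1003, None],
    "customer_id": ["C01", "C02", "C99", "C01"],
    "amount":      [250.0, -5.0, 120.0, 80.0],
})

known_customers = {"C01", "C02"}  # dimension keys already in the DWH

# Rule 1: order_id must be present (reject otherwise)
rejects = staged[staged["order_id"].isna()]

# Rule 2: amount must be non-negative (reject otherwise)
rejects = pd.concat([rejects, staged[staged["amount"] < 0]])

# Rule 3: customer_id must match a dimension key (substitute a default)
clean = staged.drop(rejects.index)
clean.loc[~clean["customer_id"].isin(known_customers), "customer_id"] = "UNKNOWN"
```

Testing would then verify that each rejected, substituted, cleansed, or enriched row ends up where the design says it should, across as many of these scenarios as possible.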