AI and BI Projects Get Bogged Down With Data Preparation Tasks

IBM reports that data quality challenges are a top reason why organizations reassess, or abandon, artificial intelligence (AI) and business intelligence (BI) projects.

Arvind Krishna, IBM’s senior vice president of cloud and cognitive software, stated in a recent interview with the Wall Street Journal, “about 80% of the work with an AI project is collecting and preparing data. Some companies are not prepared for the cost and work associated with that going in. And you say: ‘Hey, wait a moment, where’s the AI? I’m not getting the benefit.’ And you kind of bail on it.” [1]

Data Quality Testing Skills Needed For Data Integration Projects

The impulse to cut project costs is often strong, especially in the final delivery phase of data integration and data migration projects. A common mistake at this late phase is to delegate testing responsibilities to staff with limited business knowledge and data testing skills.

Data integrations are at the core of data warehousing, data migration, data synchronization, and data consolidation projects. 

The Process of ETL Testing: How it Maintains Data Integrity and Consistency

First, what is ETL? The acronym stands for Extract-Transform-Load. In large organizations, data is first extracted from the source systems, then transformed into the data types the target requires, and finally loaded into a separate repository such as a data warehouse. This process should be tested thoroughly to make sure that the data in the warehouse is managed properly.
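The three steps can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the source rows, the orders table, its columns, and the function names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source rows, as a source system might export them (all strings).
SOURCE_ROWS = [
    {"order_id": "1001", "amount": "19.99", "country": "us"},
    {"order_id": "1002", "amount": "5.00", "country": "De"},
    {"order_id": "1003", "amount": "42.50", "country": "US"},
]


def extract():
    """Extract: read raw records from the source (here, an in-memory list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: cast strings to the target data types and normalize codes."""
    return [
        (
            int(r["order_id"]),                 # string -> integer key
            round(float(r["amount"]) * 100),    # decimal string -> integer cents
            r["country"].upper(),               # normalize country code casing
        )
        for r in rows
    ]


def load(conn, records):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, amount_cents INTEGER, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl():
    """Run extract, transform, and load against an in-memory stand-in warehouse."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    return conn
```

Keeping each step in its own function is a design choice that makes the later testing easier, because each step can be checked in isolation.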

What Does ETL Testing Refer To?

It is a procedure that verifies the extraction of data for further transformation, validates the data during the transformation stages, and confirms that the data is loaded correctly into the target.
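A rough sketch of what such checks might look like follows. The reconcile function, the orders table, and the specific rules (non-negative amounts, upper-case country codes, amounts stored as integer cents) are hypothetical assumptions for illustration, not a standard test suite.

```python
import sqlite3


def reconcile(source_rows, target_conn, table="orders"):
    """Compare source rows with the loaded target table.

    Returns a list of failure messages; an empty list means all checks passed.
    The table name is assumed to be a trusted, hard-coded identifier.
    """
    failures = []

    # 1. Extraction check: every source key must arrive in the target.
    source_ids = {int(r["order_id"]) for r in source_rows}
    target_ids = {
        row[0] for row in target_conn.execute(f"SELECT order_id FROM {table}")
    }
    missing = source_ids - target_ids
    if missing:
        failures.append(f"missing in target: {sorted(missing)}")

    # 2. Transformation check: loaded values must obey the (assumed) target rules.
    bad = target_conn.execute(
        f"SELECT COUNT(*) FROM {table} "
        "WHERE amount_cents < 0 OR country <> UPPER(country)"
    ).fetchone()[0]
    if bad:
        failures.append(f"{bad} row(s) violate transformation rules")

    # 3. Load check: the control total must match between source and target.
    source_total = sum(round(float(r["amount"]) * 100) for r in source_rows)
    target_total = target_conn.execute(
        f"SELECT COALESCE(SUM(amount_cents), 0) FROM {table}"
    ).fetchone()[0]
    if source_total != target_total:
        failures.append(
            f"amount total mismatch: source={source_total}, target={target_total}"
        )

    return failures
```

Returning a list of messages instead of raising on the first failure lets a single run report every discrepancy, which is usually more useful when reconciling a large load.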

BI Testing: Identifying Quality Issues During the DWH Design Phase

Decisions in today's organizations have become increasingly data-driven and real-time, so the systems that support business decisions must be of exceptional quality. People sometimes confuse testing data warehouses that produce business intelligence (BI) reports with backend or database testing or with testing the BI reports themselves. Data warehouse testing is much more complex and diverse. Nearly everything in BI applications involves the data that "drives" intelligent decision making.

Data integrity can be compromised at all DWH/BI phases: when data is created, integrated, moved, or transformed. However, testing of data warehouses is usually deferred until late in the cycle. If testing is shortchanged (e.g., due to schedule overruns or limited resource availability), there's a high risk that critical data integrity issues may slip through the verification efforts. Even if thorough testing is performed, it's difficult and costly to address any data integrity issues exposed by this late-cycle testing. At this phase, the cause of the error can be anything from a data quality issue stemming from when the data enters the data warehouse, to a data processing issue caused by a malfunction of the business logic along the layers of the data warehouse and its BI components. Tracing an error back to its root cause is painstaking and tedious, and it often consumes considerable resources.
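One way to make root causes easier to locate is to attach a check to each layer, so that a failure names the layer where it first appeared. The sketch below is only an illustration under assumed names: the ingest, staging, and mart layers, their rules, and the run_with_stage_checks helper are all hypothetical.

```python
def run_with_stage_checks(rows, stages):
    """Run each (name, transform, check) stage in order.

    Stops at the first stage whose check fails for any row.
    Returns (rows, None) on success, or (rows_at_failure, stage_name) on failure.
    """
    for name, transform, check in stages:
        rows = transform(rows)
        if not all(check(r) for r in rows):
            return rows, name
    return rows, None


# Hypothetical warehouse layers, each with a transformation and a row-level check.
STAGES = [
    (
        "ingest",  # data enters the warehouse unchanged; amount must be present
        lambda rows: rows,
        lambda r: r.get("amount") not in (None, ""),
    ),
    (
        "staging",  # cast amount to a number; it must not be negative
        lambda rows: [{**r, "amount": float(r["amount"])} for r in rows],
        lambda r: r["amount"] >= 0,
    ),
    (
        "mart",  # business logic derives integer cents for reporting
        lambda rows: [{**r, "amount_cents": round(r["amount"] * 100)} for r in rows],
        lambda r: isinstance(r["amount_cents"], int),
    ),
]
```

For example, calling run_with_stage_checks([{"amount": ""}], STAGES) reports "ingest" as the failing layer, which points to a data quality issue at entry instead of a fault in the later business logic.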