Comparing and Assuring Data with iData

Introduction to Comparison and Assurance

Prerequisites

We start this topic with a Project created

Within the project we have created AdventureWorks_Old as a Data Source and Data Entities have been defined.


Video


Introduction

In this short article we will be:

  1. Adding a New Datasource and entities to an existing project
  1. Creating a new stage Comparison and Assurance stage
  1. Run a comparison against a table in two databases, and apply a transform to accept some of the differences discovered.


Steps

We have previously covered topics on Projects, Data Sources, Data Entities in our iData videos. We recommend that you view these as we will be using some of the functionality already set up in this article.


Today we are going to be looking at the Comparison and Assurance.

Running comparisons between two sets of data can be really useful when migrating data from one system to another. It can identify when records have been dropped, or incorrectly modified and show where the problem occurred.


Logging In

From the welcome screen press the Login button taking us to the Login screen.

We enter our iData credentials and press Login.


Select our Adventures Project

Creating a Data Source

If we open the Data Sources tile, we can see that we have a data source defined here, one is AdventureWorks_Old.

We will create a data source with our new data in it that we wish to compare to.

To do this select New Data Source

Select MSSQL, 


Add the data source name AdventureWorks_New, Server Name localhost\SQLEXPRESS, Database Name AdventureWorks_New  and click on Save.


Creating Entities

Now we can add the entities for the new source. 

Click on the Action button, Create Entities.


Click / Select All, then remove BuildVersion and ErrorLog

Click Next and then Save.

Note 
For detailed information, please see: Introduction to Data Sources and Data Entities


Creating the Stage

We are going to compare the two databases, but looking at the Product Description table.

Navigating back to the Project page [We can do this from the Project Location Ribbon]:



Click on Data Processing Stages


Then Click on New Stage  or 

The stage type we want now is Comparison and Assurance.

Confirm by clicking on the 'Setup processing stage' button.

Add a Name of Compare

We can see the entities in the Sources and Destinations are the same source. We will Filter Sources as AdventureWorks_Old and Destination as AdventureWorksNew.

Now we can select the ProductDescription from the Sources and Map it by Selecting ProductDescription from Destination by dragging the obeject into the Mappings section.


Save the Stage.

If the now run the stage, we can see the results of the comparison.

Running the Stage

Select the new stage's Action button, and click on Run.

From here click on Run.


Wait until the View Report button is enabled, then click View Report.


When the SalesLT_ProductDescription status shows a tick, select the Reports tab.


First of all we can see that we have a total of 762 rows, but a compared row count of 761. So we have a different row count.

We can also see we have zero matched rows. At the bottom of the Summary report, the Description fails on all rows. And the other fields match for all but one row, which accounts for our missing row.

If we look at the Detail tab we can see that some of the data has compared.


The first thing to notice is the first line has a symbol telling us there is no source row. This identifies our row count difference.

The third line shows full stop has been added, and capitalisation has changes.

Finally we can see fifth row that a semicolon has been replaced with a comma.

We might have expected these changes in the description, so we can edit the comparison transform to accommodate for that.

Writing Transformation Rules to Account For Known Changes

Lets go back to the Stage definition, and click on Transforms.

We can do this quickly by selecting the stage name from the Project Breadcrumb ribbon at the top of the screen:


Click on the Transforms tile



Then click on SalesLT_ProductDescription.js to opening the transform up in the built-in editor, to make these changes.


We can change the Description comparison rules on line 23 by updating it and adding an extra two line for the other changes.


First off we can let the comparison ignore case change in the source.

Secondly we can account to the change in the semi colon to a comma.

And finally to handle the extra full stop we the following rule.


  1.    stream.Make($Description).Lowercase();
  2.    stream.Make($Description).Replace(";",",");
  3.    stream.If($Description).EndsWith(".").Make().RemoveLast(1);

Like this.

This won’t change our data just allow the report to pass the differences.

Now if we save the transform, and go back to our Stage, we can rerun it and see how that has reflected in the results.

Rerunning the Stage

Click on iData Browser Tab / Navigate to the Compare in the breadcrumb ribbon


Select Run Stage / Run / View Reports then select the SalesLT_ProductDescription report.


Taking a look at the summary we can see we now have a percentage of matching rows and descriptions. 


Looking at the Details, we are not reporting on the case, semi colon nor the added punctuation.


Instead we can see differences in test in the description field we may wish to investigate.


iData provides many more transformation rules permitting a great deal of flexibility in accepting comparison changes and enabling better in the quality assurance.


    • Related Articles

    • Introduction To Profiling

      Prerequisites We start this topic with a Project created, a Data Source and Data Entities have been defined. In this Topic we have AcmeData database defined as the Data Source and all tables imported as Data Entities.  Video Introduction In this ...
    • Introduction to Similar Records

      Prerequisites We start this topic with a Project created, a Data Source and Data Entities have been defined. We will be using the AcmeData project we created in previous articles. Video Introduction In this short article we will be: Creating a ...
    • How Can iData Be Used To Provide Quality Assurance In A Data Migration Project?

      Problem Data migration projects are often protracted and complex affairs. Comprehensive quality assurance activities are an essential part of such projects, in order to prove that the output is acceptable to all stake holders, and is thus fit for its ...
    • Introduction to Data Validation

      Prerequisites We start this topic with a Project created, a Data Source and Data Entities have been defined. We have created and run a Profile against all entities in the AcmeData database. Video Introduction In this short article will be: Creating a ...
    • Introduction to Obfuscation

      Prerequisites We start this topic with a Project created, a Data Source and Data Entities have been defined. Video Introduction In this short article we will be: Creating a new Validate and Clean stage Editing transformation rules to: Replace data ...