Introduction to Data Validation with iData 2

Introduction to Data Validation

Prerequisites

We start this topic with a Project created, a Data Source and Data Entities have been defined.

We have created and run a Profile against all entities in the AcmeData database.

Video


Introduction

In this short article will be:

  1. Creating a Validation Stage
  1. Looking at adding simple validation rules.
  1. Reviewing the output reports on the application of the validation rules.

Steps

In our previous articles we have set up an iData project and defined data sources and entities, and we have run a profile on the data.

Today we are going to be looking at the validation of data using iData.


Logging In

From the welcome screen press the Login button taking us to the Login screen.

We enter our iData credentials and press Login.


From the Projects page, select and existing the project.


Create a New Stage

To Create the New Data Stage, Select Data Stages


Click on New Stage, and then ‘Validate & Clean’:

Press ‘Setup processing stage’.

Setting up Schema Mapping

We can now see the stage definition page for a Validation and Clean stage.

This is similar to the profiling set up, we get a choice of data entities to use as part of this stage, but in this case we are either going to be looking at the data or potentially cleansing it and validating it, so I need a source and destination for the data that is going to be processed. I have entities I can read on the left and entities I can write to on the right.

We need to map the two together. I can do this manually by dragging the two together into the Mappings area or let iData do the hard work and map those entries together.


Pressing Automap does this for us.

We could also do the same at column level if any of the columns do not actually match using the ‘Column Mapping’ button. But as we are going from the same source and destination, everything is going to line up.

We name the stage:

Then press save.



Here is our new stage in the stages list.

Select the new Clean and Validate stage from the screen


Defining Validation Rules

The stage we have created provides a starting point to perform some cleansing of validation, but we need to define the rules that we are going to apply to validate the data or to those cleansing operations. 

The way we do that is by editing the transformations for the stage by selecting Transforms.

Here we can see the data entities that make up this stage. We can edit any of these to provide some transformation or some cleansing to that table.

To Edit the Client table. Click on the Client, which will open up a new tab in the browser with the transformation in the editor.

At the top we have the Command button to assist in using the editor, the Save button to commit our changes to iData, and the Close button, which will close the tab.

Using the integrated editor we can apply validation rules or cleansing commands to our data in order to improve the data quality. The editor is user friendly, and will auto complete commands as we type.

If you are familiar with Javascript, you will see that the transform follows the Javascript syntax.

iData completes a line for each column in the data entity for our reference. We also have a reference or alias to the column defined using a preceding $.

The transformation rules are contained within the function section of the transform.

We can just make simple adjustments to apply some validation rules. 

For example we will update the Age rule to validate that everybody in the database is at least 18 year old.

Updating line 24, to change the age.


If we Start Typing 

stream.If('Age').LessThan(18).Fail('Too Young'); 

over the line 

// stream.Make('Age').Lowercase(); 


And as you can see we are being prompted for the options that we can apply to this column as I go along, so it’s all very straight forward.


This will report if a client has an age of under 18 with a comment of ‘Too Young’.

Save this by pressing ‘Save’, and Close, then we can run our stage from the main iData browser tab.

Running the Stage

Click on the stage name in the Project Breadcrumb ribbon in the upper left corner of the screen.

 

Click on Run stage

And press Run to start the stage job.

When the job is initiated and the View Reports button is enabled, clicking the View Report button  will take us to the output for the stage.

Selecting the Client report and scrolling to the bottom, we can see that we have some rows reported as ‘Too Young’.

Viewing the Results

We can view the details of these by clicking on the Details tab.

Scrolling to the right, we can see both the records identified by the validation rule, and the message output by our transform command.

These were the entries that were flagged by the profiler as not having a full stop in them.

This is the simplest example of a validation rule, but we can build up more sophisticated rules to make sure that your data is valid. iData also has lots of built in facilities that help you validate your data.

For example, to apply more rules on the email address.

Navigate the stage from the project breadcrumb ribbon: 

Click on Transforms, and then on Client once more.

iData understands email addresses and has built in facilities to validate them. We can tag this as an email and then add a simple validation rule that says if the email address is not valid, fail that as well. 

To simplify the report we will comment our the Age validation rule from before, on line 24.

Verifying and Email Address

We can tag this as an email and then add a simple validation rule that says if the email address is not valid, we will fail that as well. To apply the rule on line 28 for the email address make the changes like this:


  1. stream.Make($Email).TagAsEmail();

Then add a line following to state 

  1. stream.If('Email').NotValid().Fail('Bad Email');

I.E.


Press Save, and return to the main iData browser tab.

Select the Stage again from the Project ribbon and rerun the stage as before.

 

[Run Stage -> Press Run]

Press on View Reports and Open the Client Report.

The summary screen shows us that we have rows with Bad Email addresses

And by pressing the Details tab, shows us what they are:


Checking an eMail Domain

We can enhance the script further by enabling a domain check on the email address. 

To do this we jump to the Client editor browser tab we left open before and update line 28 by adding EnableDomainCheck to the end of the line.

Press Save.

Now we rerun the Stage as before.

Select the Stage again from the Project ribbon and rerun the stage as before.

 

Run Stage -> Press Run

Press on View Reports and Open the Client Report.


And if we look at our client report, we now have more email addresses flagged as being a Bad email. 

Clicking on the Details tab we see the rows we found before, we can see row where the email has a valid format but does not have a valid domain, as there is no email server attached to it, so we can flag this as being invalid.



    • Related Articles

    • Introduction to Data Sources and Data Entities

      Prerequisites User access to iData Single, empty project Access to the local server data ‘localhost\SQLEXPRESS’ database ‘AcmeData’ Video Introduction In this short article will be: Using an existing project Adding a data source for a SQL Server ...
    • Introduction to Synthetic Data Generation

      Prerequisites We start this topic with a Project created, a Data Source and Data Entities have been defined. This will require the AcmeData source to be defined. Video Introduction In this short article we will be: Creating a new Data Generation ...
    • Introduction To Profiling

      Prerequisites We start this topic with a Project created, a Data Source and Data Entities have been defined. In this Topic we have AcmeData database defined as the Data Source and all tables imported as Data Entities.  Video Introduction In this ...
    • Introduction to Similar Records

      Prerequisites We start this topic with a Project created, a Data Source and Data Entities have been defined. We will be using the AcmeData project we created in previous articles. Video Introduction In this short article we will be: Creating a ...
    • Introduction to Cleansing Data From Profile Information

      Prerequisites We start this topic with a Project created, a Data Source and Data Entities have been defined. We have created and run a Profile against all entities in the AcmeData database. We have created a Profile stage and run this for the ...