We start this topic with a Project created, a Data Source and Data Entities have been defined.
We have created and run a Profile against all entities in the AcmeData database.
In this short article will be:
In our previous articles we have set up an iData project and defined data sources and entities, and we have run a profile on the data.
Today we are going to be looking at the validation of data using iData.
From the welcome screen press the Login button taking us to the Login screen.
We enter our iData credentials and press Login.
From the Projects page, select and existing the project.
Click on New Stage, and then ‘Validate & Clean’:
Press ‘Setup processing stage’.
We can now see the stage definition page for a Validation and Clean stage.
This is similar to the profiling set up, we get a choice of data entities to use as part of this stage, but in this case we are either going to be looking at the data or potentially cleansing it and validating it, so I need a source and destination for the data that is going to be processed. I have entities I can read on the left and entities I can write to on the right.
We need to map the two together. I can do this manually by dragging the two together into the Mappings area or let iData do the hard work and map those entries together.
Pressing Automap does this for us.
We could also do the same at column level if any of the columns do not actually match using the ‘Column Mapping’ button. But as we are going from the same source and destination, everything is going to line up.
We name the stage:
Then press save.
Here is our new stage in the stages list.
Select the new Clean and Validate stage from the screen
The stage we have created provides a starting point to perform some cleansing of validation, but we need to define the rules that we are going to apply to validate the data or to those cleansing operations.
The way we do that is by editing the transformations for the stage by selecting Transforms.
Here we can see the data entities that make up this stage. We can edit any of these to provide some transformation or some cleansing to that table.
To Edit the Client table. Click on the Client, which will open up a new tab in the browser with the transformation in the editor.
At the top we have the Command button to assist in using the editor, the Save button to commit our changes to iData, and the Close button, which will close the tab.
Using the integrated editor we can apply validation rules or cleansing commands to our data in order to improve the data quality. The editor is user friendly, and will auto complete commands as we type.
If you are familiar with Javascript, you will see that the transform follows the Javascript syntax.
iData completes a line for each column in the data entity for our reference. We also have a reference or alias to the column defined using a preceding $.
The transformation rules are contained within the function section of the transform.
We can just make simple adjustments to apply some validation rules.
For example we will update the Age rule to validate that everybody in the database is at least 18 year old.
Updating line 24, to change the age.
stream.If('Age').LessThan(18).Fail('Too Young');
over the line
// stream.Make('Age').Lowercase();
And as you can see we are being prompted for the options that we can apply to this column as I go along, so it’s all very straight forward.
This will report if a client has an age of under 18 with a comment of ‘Too Young’.
Save this by pressing ‘Save’, and Close, then we can run our stage from the main iData browser tab.
Click on Run stage
And press Run to start the stage job.
When the job is initiated and the View Reports button is enabled, clicking the View Report button will take us to the output for the stage.
Selecting the Client report and scrolling to the bottom, we can see that we have some rows reported as ‘Too Young’.
Scrolling to the right, we can see both the records identified by the validation rule, and the message output by our transform command.
These were the entries that were flagged by the profiler as not having a full stop in them.
This is the simplest example of a validation rule, but we can build up more sophisticated rules to make sure that your data is valid. iData also has lots of built in facilities that help you validate your data.
For example, to apply more rules on the email address.
Navigate the stage from the project breadcrumb ribbon:
Click on Transforms, and then on Client once more.
iData understands email addresses and has built in facilities to validate them. We can tag this as an email and then add a simple validation rule that says if the email address is not valid, fail that as well.
To simplify the report we will comment our the Age validation rule from before, on line 24.
We can tag this as an email and then add a simple validation rule that says if the email address is not valid, we will fail that as well. To apply the rule on line 28 for the email address make the changes like this:
Then add a line following to state
I.E.
Press Save, and return to the main iData browser tab.
Select the Stage again from the Project ribbon and rerun the stage as before.
[Run Stage -> Press Run]
Press on View Reports and Open the Client Report.
The summary screen shows us that we have rows with Bad Email addresses
And by pressing the Details tab, shows us what they are:
To do this we jump to the Client editor browser tab we left open before and update line 28 by adding EnableDomainCheck to the end of the line.
Press Save.
Now we rerun the Stage as before.
Select the Stage again from the Project ribbon and rerun the stage as before.
Run Stage -> Press Run
Press on View Reports and Open the Client Report.
And if we look at our client report, we now have more email addresses flagged as being a Bad email.
Clicking on the Details tab we see the rows we found before, we can see row where the email has a valid format but does not have a valid domain, as there is no email server attached to it, so we can flag this as being invalid.