We start this topic with a Project already created and a Data Source and Data Entities already defined.
In this short article we will be:
Creating a new Validate and Clean stage
Editing transformation rules to:
We recommend that you cover the previous topics before reading this article.
Today we are going to be looking at obfuscating and masking data.
In previous articles we have covered topics such as Project Creation, Data Sources and Data Entities. We recommend that you cover these topics as we will be using the functionality demonstrated within them in this article.
In this article we will be using a project and data source we have created in a previous topic. To start off we need to log on and select the project.
From the welcome screen, press the Login button, which takes us to the Login screen.
We enter our iData credentials and press Login.
Open the AcmeData project.
Click on Data Processing Stages and then New Stage.
Data Masking and Obfuscation use the stage type of Validate and Clean. Select the Validate and Clean Stage tile.
Then press ‘Setup new processing stage’.
Give the stage a name.
We will be looking at Orders and Clients, so we will just drag these into the mapping.
If we had different source and destination tables, we could map their fields by clicking on the ‘Column Mapping’ button.
For example
Click on Save.
We can now see our new stage listed.
We shall now update the Client table to add some Obfuscation transformations.
Click on dbo_Client.js to open the Editor.
iData has a large range of transformations to make obfuscation easy. For example, we can mask a field: we give it a mask character and tell it where we want masking to start. Let's take a look at this on line 31.
This will mask everything from the 5th character onwards.
If we use a negative value it masks from the end of the field.
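The positional behaviour described above can be illustrated outside iData with a minimal plain JavaScript sketch. `maskFrom` is a hypothetical helper written for this article and is not iData's own transformation syntax. It assumes that a positive start is a 1-based character position and that a negative start means "mask the last N characters".

```javascript
// Hypothetical helper, not part of iData: masks part of a string.
// start > 0: 1-based position of the first character to mask.
// start < 0: mask the last |start| characters instead.
// start = 0: nothing is masked.
function maskFrom(value, start, maskChar = "*") {
  const text = String(value);
  // Number of leading characters that stay visible.
  const keep = start > 0
    ? Math.min(start - 1, text.length)
    : Math.max(text.length + start, 0);
  return text.slice(0, keep) + maskChar.repeat(text.length - keep);
}

console.log(maskFrom("4929123456781234", 5));  // 4929************
console.log(maskFrom("4929123456781234", -4)); // 492912345678****
```

The sample value is made up for illustration. The only point is that the sign of the start value decides which end of the field is hidden.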
We could also use MaskExcept, so in this example we can mask all but the last four digits of our account number.
Like this:
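The same idea can be sketched in plain JavaScript, independently of iData. `maskExceptLast` is a hypothetical helper written for this article, not iData's MaskExcept transformation, and the account number is invented.

```javascript
// Hypothetical helper, not iData's MaskExcept: hides everything
// except the last `visible` characters of a value.
function maskExceptLast(value, visible, maskChar = "*") {
  const text = String(value);
  // Number of leading characters to replace with the mask character.
  const masked = Math.max(text.length - visible, 0);
  return maskChar.repeat(masked) + text.slice(masked);
}

console.log(maskExceptLast("12345678", 4)); // ****5678
```

Keeping only the trailing digits is a common choice for account numbers because it lets a reader confirm which account is meant without exposing the full value.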
We can also take the comments and apply an Obfuscate to them, which will hide the values. Like this:
When we run this we get a new copy of the data with those values obfuscated.
To do this press Save.
The summary shows we can now see changes to Age, Comments and PAN.
Taking a look at the details, we can see some different values. PAN has been changed, Age has been updated, and we have changed the content of the comments.
To demonstrate more obfuscation functions, we’re going to make some more changes to the Client transformation.
Click on the browser tab for the Client Transformation editor.
Updating the transform.
We also have an additional command at the end, which takes the tagged fields and randomly generates personal details that are interrelated, so we should see that the email address uses components of the first name and surname.
Press Save to save the transform, and we can run the stage again.
As before, select the iData main browser tab, select Stages, and to run the stage click on the Action and Run.
Click on Run again.
Once again the initiator will kick off the worker processes against each entity. Once it's started we can click on the View Report button.
The status of each entity transformation will change to a tick. When the Client has been completed, we can click the Reports tab and select dbo_Client.
We can see we have generated synthetic firstname and surname for each of the clients and generated an email address that uses those values.
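To make the idea of interrelated synthetic fields concrete, here is a minimal plain JavaScript sketch. It is not how iData implements its generator: the name lists, the domain list, and the `generatePerson` function are all assumptions made for illustration.

```javascript
// Illustrative only: these lists and function names are assumptions,
// not iData's data or API.
const SAMPLE_FIRST_NAMES = ["Sophia", "Liam", "Amara", "Noah"];
const SAMPLE_SURNAMES = ["Amin", "Walker", "Chen", "Okafor"];
const SAMPLE_DOMAINS = ["example.com", "example.org"];

// Picks one element using a random source that returns a number in [0, 1).
function pick(list, rng) {
  return list[Math.floor(rng() * list.length)];
}

// Generates a first name and surname, then derives the email address
// from them so the three fields stay consistent with each other.
function generatePerson(rng = Math.random) {
  const firstName = pick(SAMPLE_FIRST_NAMES, rng);
  const surname = pick(SAMPLE_SURNAMES, rng);
  const email = `${firstName}.${surname}@${pick(SAMPLE_DOMAINS, rng)}`;
  return { firstName, surname, email };
}

console.log(generatePerson());
```

The email is built from the generated name rather than drawn independently, which is what keeps the fields believable as a set.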
In the previous example we generated an email address in our client table. Our orders table contains an email address as well. If we apply some obfuscation there, we would want it to match up with the entries we have in our client table.
Select the Obfuscate stage from the Project ribbon and click on Transforms. Select the dbo_Orders.js transform to edit it.
We update line 25 to tag the email address and add the command at the end to generate tagged data, like this:
Saving the transform, we can run this again from the iData main browser tab as before.
Looking at our orders table, let's pick out client ID 345297, which has been obfuscated to Holy Walker.
When we then navigate to our client table, sort by ID, go to page 20 and find client ID 345297, we can see that the email address is different.
This is not what we want. This happened because iData uses the primary key of the table as the key for its obfuscation. In the orders table the primary key is the order ID, whereas the client table has a primary key of ClientID, so we end up with different results.
If we make a small adjustment to our transform we can bring things into line.
Save the transform and rerun the stage from the iData main tab once more.
We open the Orders detail report.
We then open the Client detail report and sort to find the client with ID 345297.
Now both the client and the order show Sophia.Amin@aol.com.
These entries match only because both are now generated from the same key. None of the original data was used to generate these values; they are entirely synthetic.
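The effect of the key can be illustrated with a minimal plain JavaScript sketch. It assumes a generator that derives its output purely from a hash of the key. That is one plausible reading of the behaviour described here, not a description of iData's internals. The name lists, the `syntheticEmail` function, and the order ID 9001 are all hypothetical.

```javascript
// Illustrative only: lists, function names and the order ID are assumptions.
const KEYED_FIRST_NAMES = ["Sophia", "Liam", "Amara", "Noah"];
const KEYED_SURNAMES = ["Amin", "Walker", "Chen", "Okafor"];

// 32-bit FNV-1a hash: the same input always gives the same number.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Derives a synthetic email from the key alone, never from real data.
function syntheticEmail(key) {
  const h = fnv1a(String(key));
  const first = KEYED_FIRST_NAMES[h % KEYED_FIRST_NAMES.length];
  const last = KEYED_SURNAMES[Math.floor(h / KEYED_FIRST_NAMES.length) % KEYED_SURNAMES.length];
  return `${first}.${last}@example.com`;
}

const order = { OrderID: 9001, ClientID: 345297 };
const client = { ClientID: 345297 };

// Keyed on each table's own primary key: the results will usually differ.
console.log(syntheticEmail(order.OrderID), syntheticEmail(client.ClientID));

// Keyed on ClientID in both tables: the results always match.
console.log(syntheticEmail(order.ClientID) === syntheticEmail(client.ClientID)); // true
```

Because the output depends only on the key, choosing the same key in both tables is enough to keep related rows consistent.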