In this short article we will be generating:
- Random words
- Random names
- Random dates
- Random email addresses
- Values from a pre-defined list
- All combinations of data
In previous articles we have covered topics such as Project Creation, Data Sources, Data Entities, Data Stages and writing transforms. We recommend that you view these first, as this article uses the functionality they demonstrate.
Today we are going to be looking at Synthetic Data Generation.
From the welcome screen, press the Login button, which takes us to the Login screen.
We enter our iData credentials and press Login.
From the project screen select our AcmeData project.
The first step is to create a new stage.
Click on Data Processing Stages and then New Stage.
Select a Data Generation Stage.
Then press ‘Setup new processing stage’.
Give the stage a name.
Select All Data Entities from the left.
And press Save.
We can now see our Data Generation stage in the list of stages.
Click on the Actions button next to the Data Generation stage, and select Run.
Then press run again and wait until the View Reports button is enabled.
Now select View Reports and click on the Reports tab, then click on Company, and click on the Details tab.
As we can see, the column values are not particularly exciting, but we are providing a default value for every single column. We can update the transforms to make these fields more realistic.
Click on the stage name in the Project ribbon, click on Transforms and select the Company transform.
We now have a new browser tab containing our default transform rules for the Company entity.
We can update this to pick some defaults for Department. We can also tag fields such as email, phone number, company name and address, and iData will generate more realistic values for them.
Press Save.
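To make the idea of picking a Department from a pre-defined list concrete, here is a minimal sketch in plain JavaScript. It is not the iData transform API. The helper names `makeRandom` and `pickFrom` and the department names are assumptions made for illustration only. A small seeded generator is used so that repeated runs give the same rows.

```javascript
// Illustrative only: plain JavaScript, not the iData transform API.
// Small seeded generator (a linear congruential generator) so runs are repeatable.
function makeRandom(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Picks one value from a pre-defined list.
function pickFrom(list, random) {
  return list[Math.floor(random() * list.length)];
}

// Hypothetical department names, chosen for the example.
const departments = ["Sales", "Finance", "Engineering", "Support"];
const random = makeRandom(42);

// Build five example rows, each with a department taken from the list.
const rows = Array.from({ length: 5 }, (_, i) => ({
  id: i + 1,
  department: pickFrom(departments, random),
}));
console.log(rows);
```

Every generated row is guaranteed to hold one of the listed values, which is the property that matters when a column must stay within a known set.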
We can perform similar changes to the other entities.
Switch to the main iData browser tab and select the Client transform.
We update its contents with similar changes.
We press Save to commit the changes.
Switch to the main iData browser tab and select the Addresses transform.
We update the transformation rules in the same way.
Press Save, and close.
We can rerun the stage to see what the results look like.
Click on the stage name in the Project Ribbon, click Run Stage, press Run again and wait until the View Reports button is enabled.
Now select View Reports and click on the Reports tab, then click on Company, and press the Details tab.
When we look at the generated data now, it looks a lot more interesting. We have realistic-looking company names and email addresses.
Click on the Data Generation Report on the Project Ribbon, click on Client, and then on the Details tab.
We see realistic client data, with generated email addresses that match the generated names.
The same is true of the addresses data.
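What it means for an email address to be linked to a generated name can be sketched as follows. This is plain JavaScript, not iData's own implementation. The function `emailFromName`, the sample name and the `example.com` domain are assumptions made for illustration only.

```javascript
// Illustrative only: plain JavaScript, not the iData transform API.
// Builds an email address from an already generated name so the two fields agree.
function emailFromName(fullName, domain) {
  const localPart = fullName
    .trim()
    .toLowerCase()
    .split(/\s+/) // split on runs of whitespace
    .map((part) => part.replace(/[^a-z0-9]/g, "")) // drop punctuation such as apostrophes
    .filter((part) => part.length > 0)
    .join(".");
  return `${localPart}@${domain}`;
}

// Hypothetical client record; the name and domain are made up for the example.
const client = { name: "Mary O'Neill" };
client.email = emailFromName(client.name, "example.com");
console.log(client.email); // mary.oneill@example.com
```

Deriving the address from the name, rather than generating the two independently, is what keeps each record internally consistent.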
iData can create a spread of different values across a record set. Let’s use this orders table as an example.
Click on the Editor browser tab for dbo_Orders. We’ll make a few small changes to the original script; the interesting area is the section at the bottom, after we define the Test Data Set ‘Shirt’.
What we are asking for is output that covers every combination of colour and size. This can be used to ensure that we have full coverage of a particular set of values for testing purposes in our test environment.
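The effect of asking for all combinations can be sketched in plain JavaScript as a Cartesian product. This is not the iData transform syntax. The function `allCombinations` and the colour and size values are assumptions made for illustration only, since the article does not list the actual values used.

```javascript
// Illustrative only: plain JavaScript, not the iData transform syntax.
// Returns every combination of one value per field (a Cartesian product).
function allCombinations(fields) {
  return Object.entries(fields).reduce(
    (rows, [name, values]) =>
      rows.flatMap((row) => values.map((value) => ({ ...row, [name]: value }))),
    [{}]
  );
}

// Hypothetical values; the article does not list the actual colours and sizes.
const shirt = allCombinations({
  colour: ["Red", "Blue", "White"],
  size: ["S", "M", "L", "XL"],
});

console.log(shirt.length); // 12
console.log(shirt[0]); // { colour: 'Red', size: 'S' }
```

With three colours and four sizes the result has 3 × 4 = 12 rows, so every pairing appears exactly once.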
We are using a data set definition here, so it is important to comment out the ‘Default’ definition in line 23.
Also note that the OrderID, ClientID, Email and OrderDate are outside of the test data set definition, as these will apply to any test data sets we create subsequently in the transformation.
We can run this to see the results of the Orders we have defined.
Press Save, then select the Main iData browser tab. Click on the Stage name in the Project Ribbon, press Run Stage, then Run, and wait until the View Reports button is enabled. Select the Report tab, and select Orders.
We can see that we have a combination of shirt sizes and colours.
By using an additional Test Data Set definition in our Transformation, we can add another product to the orders.
Click on the Editor browser tab for the dbo_Orders.js transformation.
We add a section for the Trousers product line.
Press Save.
Return to the Main iData browser tab to rerun the stage. Click on the Stage name in the Project Ribbon, press Run Stage, then Run, and wait until the View Reports button is enabled. Select the Report tab, and select dbo_Orders.
We can see in the report that we have two products and that their values have not been mixed: each order only has colours and sizes that make sense for its product.
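Why separate test data sets keep the products from mixing can be sketched as follows. This is plain JavaScript, not the iData transform syntax. The `productLines` object, the `combinationsFor` helper and all colour and size values are assumptions made for illustration only.

```javascript
// Illustrative only: plain JavaScript, not the iData transform syntax.
// Each product line carries its own value lists, so combinations never cross products.
const productLines = {
  // Hypothetical values chosen for the example.
  Shirt: { colour: ["Red", "Blue"], size: ["S", "M", "L"] },
  Trousers: { colour: ["Black", "Navy"], size: ["30", "32", "34"] },
};

// All colour/size pairs for a single product.
function combinationsFor(product, { colour, size }) {
  return colour.flatMap((c) => size.map((s) => ({ product, colour: c, size: s })));
}

// Combine within each product first, then concatenate the results.
const orders = Object.entries(productLines).flatMap(([product, values]) =>
  combinationsFor(product, values)
);

console.log(orders.length); // 12 (6 shirt rows + 6 trouser rows)
// No shirt row ever receives a trouser size:
console.log(orders.some((o) => o.product === "Shirt" && o.size === "32")); // false
```

Because the combinations are built per product and only then joined together, a shirt can never pick up a trouser size or colour.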
By default iData will create 50 rows of data, which is configurable in the main entity job script. Many other generation options are available, such as limiting the output to only the test combinations listed in the transforms, or looking up values to populate from an existing table.
These facilities are documented in the user manual.