It is often necessary to add additional custom business rules above and beyond the basic validation provided by the data model.
They are defined in pipelines, inside a Validate section, inside of the LoadTable section.
Each validation command begins with the prefix “Test”.
Common aspects of all validation commands.
All validation commands share the following properties.
Property | Description |
---|---|
Tag | Appears as menu item in the issues summary to the data uploader (to group issues in a meaningful way). |
Impact | The impact of failed validation. Possible values: |
- Warning, raises a warning but no change made to data | |
- Error_RemoveValue, value replaced by NULL. | |
- Error_RemoveRow, record not uploaded. | |
- Error_RemoveColumn, column removed. | |
- Error_RemoveTable, entire table not uploaded |
Summary of validation commands
Command | Description |
---|---|
TestNotEmpty | Tests for the existence of required values, regardless of whether a field is defined as required in the data model or not. |
TestColumnsExist | Verifies that one or more columns exists in a table. |
TestValueInRange | Checks that a value is above, below or within a certain range of limits. |
TestRow | (! We are having trouble with this command. It may be changed in near future) Checks that a row is valid based on a user-defined code fragment in c# that returns true or false. |
TestPattern | Verifies that a value matches a regular expression pattern. There are many resources on regular expressions. Here is a tutorial with examples. |
Example of validation commands
Example data with some bad or questionable data
We would like:
- Warn if “Confirmed (Males)” is empty or filled with space characters (but continue with upload)
- Do not load records if the date is earlier than 2018-12-01 or greater than 2019-12-31 (inclusive).
Example pipeline with some validation tests added.
The result of TestNotEmpty detects the missing value and raises a warning displayed as “Missing values” (the tag of the command). The detail of the issue is shown if you put your mouse over the “i” icon in the data row.
The result of TestValueInRange detects an invalid date, and because the impact is defined as “Error_RemoveRow”, the row is considered invalid. It appears under the tag “Invalid date” as defined in the pipeline.
What does the Data tab look like? It shows an accounting of the invalid rows but does not reflect the warnings in the data, which have no impact on the upload. With the “Failed tests” menu item selected: