Download all tutorial files

How can we apply validation rules to the data?

It is possible to define custom business rules in a pipeline. But before getting to custom business rules, you should know that the data model automatically provides some basic validation – without any extra work.

The data model will automatically enforce:

- Data types such as integers, dates, booleans, etc.

- Lengths of fields

- Required fields

- Duplicate business primary keys (as in previous tutorial)

Model validation is “strict” but has a minimum “impact”. If the model expects an integer and a non-integer value is provided, that value is not allowed to be uploaded. It is strictly enforced.

However, the impact of the issue only affects the value; it is replaced by a NULL. The record containing the invalid value is still permitted to be uploaded (unless it’s a required value too), as well as the other records in the table.

In contrast, when you define custom business rules (future tutorial), the impact is configurable. An invalid value can block a whole record or table from being uploaded. Or an issue can simply be a “warning” and have no impact at all on the data.

Let’s experience model validation by introducing some errors into our data.

First, a reminder of the data model:

image-20201217145207926

Some issues:

image-20201217145231217

Issues Tab

Uploading and previewing this data causes the Issues tab to be displayed. Issues are classified by their impact, in this case Invalid Rows or Invalid Values.

image-20201217145243234

Clicking on the first issue, “ISO3: Text maxlength….” Shows where the issue occurs in the uploaded data. The field(s) containing the issue are shown in the left-most position of the table. Putting your mouse over the “i” icon reveals more detail:

image-20201217145254304

Because the ISO3 code is marked as required in the data model, the invalid value causes the entire row to be invalid.

Like-wise for the missing Date value

image-20201217145303202

On the other hand, the issues with “7 cases” and “33.57” (which are not integers), are simply invalid values, because this column is not a required column. The values are replaced by NULL values only.

Data Tab

While the issues tab focuses on the issues, the Data tab focuses on what will actually be uploaded. The Data tab will not show every issue, but does provide accounting of how many rows will not be loaded due to errors.

Next to “SHANOLA_CASE_COUNTS”, there is a green and red badge, both with the number 2.

image-20201217145321834

The green badge indicates how many rows will be uploaded. The red badge indicates how many rows will not be uploaded.

In this case the 2 “Not Loading” rows are due to “Failed tests”. Clicking on “Failed Tests” displays these rows, in a format similar to what was displayed on the Issues tab.

image-20201217145329405

Clicking on “Updated” displays the data updates that will happen.

image-20201217145335722

The invalid values were replaced by NULL values. The NULL values replace the pre-existing values in the database (33 and 7).