Download all tutorial files

Introduction

With the term “Global Transformation” we are referring to the process of tranforming/reshaping the data right after they are being pulled from their original source. This kind of transformation might happen in multiple tables and might involve processes like data cleansing like removing blanks, merge or split tables and more.

In order to perform a Global Transformation, we need to include the Transform tag after the Extract and before the Load like in the example below.

<Extract>
    <GetExcel TableName="*" FindStartingRow="iso3, suspected, confirmed" WorksheetNameToColumn="Year" />
</Extract>
<Transform>
	<CleanTable RemoveBlankRows="true" TrimValues="true" RemoveBlankColumns="false" />
    <MergeTables TableNames="*" MergedTableName="data" SkipTableNames="Intro" />
</Transform>

The code above is part of the example in the files under the folder T5_Global_Transormations. Inside the Transform section we’ve chosen to remove all the blank rows and merge all the tables (TableNames=”*”) within that excel file but skip the table named as “Intro”.

Note: In tutorial 6, we’ll explain the “local” Transformations where the data manipulation happens only to the table about to be loaded in the mart.

Example of Global Transformation

In the folder T5_Global_Transormations (you may download this by clicking the link at the top of this page) there will be three files, model, Pipeline and Shanola Cases by year. The scope of this example is to demonstrate how the Global Transformation works in order to load the all the tables in the Shanola cases by year file, using the provided model.

Provided Dataset

The dataset is about the yearly data of Shanola Cases. The aim is to load at once the two tables, 2019 and 2018. In order to achieve this we need to merge the two tables into one, before they’ll be loaded to the mart.

Pipeline

This is the pipeline that needs to be created in order to achieve what is described above. The way the Transform works is described in the beginning of the article.

<XmartPipeline IsStructure="false">
  <Extract>
    <GetExcel TableName="*" FindStartingRow="iso3, suspected, confirmed" WorksheetNameToColumn="Year" />
  </Extract>
  <Transform>
    <CleanTable RemoveBlankRows="true" TrimValues="true" RemoveBlankColumns="false" />
    <MergeTables TableNames="*" MergedTableName="data" SkipTableNames="Intro" />
  </Transform>
  <Load>
    <LoadTable SourceTable="data" TargetTable="SHANOLA_CASE_COUNTS_T5" LoadStrategy="MERGE">
      <ColumnMappings>
        <ColumnMapping Source="ISO3" Target="ISO3" />
        <ColumnMapping Source="Date" Target="REPORT_DATE" />
        <ColumnMapping Source="Suspected (Males)" Target="SUSPECTED_MALES" />
        <ColumnMapping Source="Suspected (Females)" Target="SUSPECTED_FEMALES" />
        <ColumnMapping Source="Confirmed (Males)" Target="CONFIRMED_MALES" />
        <ColumnMapping Source="Confirmed (Females)" Target="CONFIRMED_FEMALES" />
      </ColumnMappings>
    </LoadTable>
  </Load>
</XmartPipeline>

Let’s break down the MergeTables:

  • TableNames=”*” : All the tables in the source excel file will be pulled
  • MergedTableName=”data”: The new in-memory table that will be produced after the merging, we name it as “data”.
  • SkipTableNames=”Intro”: We point out which tables to be ignored, in this case the one called “Intro”