Datascope is a filter defined in a LoadTable section and applied to both the source data (in your file) and the target data (in the DB). It defines the range or “scope” of the data to be modified. It prevents data outside of this range from being modified.

It can be used with either REPLACE or MERGE.

Here is an example.

<Load>
    <LoadTable SourceTable="Data" TargetTable="FACT_DEMAND" LoadStrategy="REPLACE">
      <DataScope Operator="AND">
        <Terms>
          <Term Operator="OR" Field="SOURCE_FILE_FK" Rule="LikeExactly">
            <Value>UNDP</Value>
          </Term>
        </Terms>
      </DataScope>
      <Transform >
        <AddColumn Name="SOURCE_CODE" FillWith="UNDP" />

A DataScope filter is a list of filter “terms”. Most people only need a single term. If you have multiple terms, they are “AND”ed by default (that’s Operator=”AND” on the DataScope line below). It’s like a search form in a user interface – each part of the search form is ANDed.

Inside of each term supports 1 or multiple values. Multiple values inside of a term are “OR”ed by default (that’s Operator=”OR” on the Term line below). This is like selecting multiple options in a dropdown on a search form – each item (like multiple countries or years) are ORed.

If you want multiple values to be ORed, list multiple values. This will filter on data where SOURCE_FILE_FK is either UNDP, UNICEF or GAVI.

<Load>
    <LoadTable SourceTable="Data" TargetTable="FACT_DEMAND" LoadStrategy="REPLACE">
      <DataScope Operator="AND">
        <Terms>
          <Term Operator="OR" Field="SOURCE_FILE_FK" Rule="LikeExactly">
            <Value>UNDP</Value>
            <Value>UNICEF</Value>
            <Value>GAVI</Value>
          </Term>
        </Terms>
      </DataScope>
      <Transform >

Records that are outside of Datascope are highlighted in the data preview as “Outside of DataScope”

image-20210209063351391

It is possible to use batch input variables in DataScope filters. This way filters can be customized per origin for the same pipeline. Since batch input variables can be set by prompting the user at the time of upload, the person uploading can also set the filter at the time of upload. In the following example a batch input variable “${SurveyCode}” is used to filter a SURVEY_FK value.

 <Load>
    <LoadTable SourceTable="interviews" TargetTable="RAW_2019_INTERVIEWS" LoadStrategy="REPLACE">
      <DataScope Operator="AND">
        <Terms>
          <Term Operator="OR" Field="SURVEY_FK" Rule="LikeExactly">
            <Value>${SurveyCode}</Value>
          </Term>
        </Terms>
      </DataScope>