How to use the Microsoft Graph API to load data from an Excel Online protected file in xMart

https://docs.microsoft.com/en-us/graph/api/resources/excel?view=graph-rest-1.0

Step 1: Generate ClientID and allow it to get your files

Documentation: https://docs.microsoft.com/en-us/graph/auth-v2-service

In Azure Portal, register a new app with Native client authorization (it should have be allowed to request the scope:offline_access). Retrieve its client id, we will refer to it as YOUR_CLIENT_ID in the next steps.

https://portal.azure.com/#blade/Microsoft_AAD_IAM/ActiveDirectoryMenuBlade/RegisteredApps

Step 2: Retrieve the file info

Retrieve the file id FILE_ID and site id SITE_ID from the Excel Online file you want to access. Usually you can get those infos from the URL when the file is open in your browser or from the Graph Explorer

💡 Tip: use the proposed https://graph.microsoft.com/v1.0/sites?search=YOUR_SITE_NAME to find the id info from the JSON response. Once your have this id aka SITE_ID, explore the site via https://graph.microsoft.com/v1.0/sites/${SITE_ID}/lists or https://graph.microsoft.com/v1.0/sites/${SITE_ID}/drives

Step 2: Set up the mart connection

Code popup helper way

If you don’t want to use the cliend_id 99e547ef-258a-4db0-ad11-9466a3d1f4eb, please follow the manual steps way below, else follow those steps:

  1. Click the Add button to create a new connection under the WebService Bearer provider (referred to “MySharepointConn” for this example)
  2. Make sure you are logged in with the account authorized on the target Sharepoint file.
  3. Click on <button>Open the authorization_code helper popup</button>, and accept the permission to access your files request popup.
  4. Insert the “scope” JSON property as displayed ("scope":"offline_access Sites.Read.All Files.ReadWrite.All",). image
  5. Click on Current Azure AD button and set Token URL endpoint to https://login.microsoftonline.com/f610c0b7-bd24-4b39-810b-3dc280afb590/oauth2/v2.0/token
  6. Click Save button to save the connection.

Manual steps way

We’ll use the code flow using the native client process to get the authorization_code. This will be a one-time manual step.

Replace the YOUR_CLIENT_ID with the one get from Step 1 in the following URL and open it from a browser

https://login.microsoftonline.com/f610c0b7-bd24-4b39-810b-3dc280afb590/oauth2/v2.0/authorize?client_id=YOUR_CLIENT_ID&response_type=code&redirect_uri=https://login.microsoftonline.com/common/oauth2/nativeclient&response_mode=query&state=12345&scope=offline_access%20Sites.Read.All%20Files.ReadWrite.All

You should get redirected to a blank page, but in the new URL, there is a code parameter to extract manually. Beware to not include some extra parameters at the end of the URL!

image

Once you get this code (referred as CODE_EXTRACTED_FROM_URL), in xMart, create a new WebService Bearer Token **connection:

{
  "grant_type":"authorization_code", "client_id":"YOUR_CLIENT_ID","scope":"offline_access Sites.Read.All Files.ReadWrite.All","redirect_uri":"https://login.microsoftonline.com/common/oauth2/nativeclient","code": "CODE_EXTRACTED_FROM_URL"}

The authorization_code is only valid once, that’s why we provide the scope:offline_access. It allows to retrieve a refresh_token, that is then stored by xMart.

Note: Once a refresh_token is retrieved from the Token Endpoint, the xMart connection is automatically updated to store the refresh_token and remove the used code so the connection will still be valid after the first request (usually the refresh_token is valid 6 months).

{"grant_type":"refresh_token", ..., "refresh_token":"OAQABAAAAAAAm-..." }

Step 3: Edit xMart pipeline

Excel API response looks like:

{
    "@odata.context": "https://graph.microsoft.com/v1.0/$metadata#workbookRange",
    "@odata.type": "#microsoft.graph.workbookRange",
    "@odata.id": "/sites('XXX')/drive/items('YYY')/workbook/worksheets(%27%7B00000000-0001-0000-0000-000000000000%7D%27)/range(address=%27A1:E3%27)",
    "address": "Sheet1!A1:E3",
    "addressLocal": "Sheet1!A1:E3",
    "columnCount": 5,
    "cellCount": 15,
    "columnHidden": false,
    "rowHidden": false,
    "numberFormat": [
        [
            "General",
            "General",
            "General",
            "General",
            "General"
        ],
        [
            "m/d/yyyy",
            "m/d/yyyy",
            "General",
            "General",
            "General"
        ],
        [
            "m/d/yyyy",
            "General",
            "General",
            "General",
            "General"
        ]
    ],
    "columnIndex": 0,
    "text": [
        [
            "a better col1",
            "col2",
            "col3",
            "col4",
            "comments maybe?"
        ],
        [
            "1/1/2020",
            "1/3/2238",
            "123456",
            "1321",
            "none"
        ],
        [
            "1/3/2238",
            "03.03.2020",
            "12",
            "1",
            "no comment"
        ]
    ],
    "formulas": [
        [
            "a better col1",
            "col2",
            "col3",
            "col4",
            "comments maybe?"
        ],
        [
            43831,
            123456,
            123456,
            1321,
            "none"
        ],
        [
            123456,
            "03.03.2020",
            12,
            1,
            "no comment"
        ]
    ],
    "formulasLocal": [
        [
            "a better col1",
            "col2",
            "col3",
            "col4",
            "comments maybe?"
        ],
        [
            43831,
            123456,
            123456,
            1321,
            "none"
        ],
        [
            123456,
            "03.03.2020",
            12,
            1,
            "no comment"
        ]
    ],
    "formulasR1C1": [
        [
            "a better col1",
            "col2",
            "col3",
            "col4",
            "comments maybe?"
        ],
        [
            43831,
            123456,
            123456,
            1321,
            "none"
        ],
        [
            123456,
            "03.03.2020",
            12,
            1,
            "no comment"
        ]
    ],
    "hidden": false,
    "rowCount": 3,
    "rowIndex": 0,
    "valueTypes": [
        [
            "String",
            "String",
            "String",
            "String",
            "String"
        ],
        [
            "Double",
            "Double",
            "Double",
            "Double",
            "String"
        ],
        [
            "Double",
            "String",
            "Double",
            "Double",
            "String"
        ]
    ],
    "values": [
        [
            "a better col1",
            "col2",
            "col3",
            "col4",
            "comments maybe?"
        ],
        [
            43831,
            123456,
            123456,
            1321,
            "none"
        ],
        [
            123456,
            "03.03.2020",
            12,
            1,
            "no comment"
        ]
    ]
}

Create or edit your pipeline https://extranet.who.int/xmart4/YOUR_MART/pipelines/PIPELINE_CODE

We can define some origin values to store the multiple parameters in the pipeline Context section.

<XmartPipeline>
  <Context>
    <Inputs>
      <Add Key="SITE_ID" Type="text"/>
      <Add Key="FILE_ID" Type="text" />
      <Add Key="RANGE" Type="text" />
      <Add Key="SHEETNAME" Type="text" />
      </Inputs>
    <OriginValues>
      <Origin Code="EXCEL_ONLINE">
        <Add Key="SITE_ID" Value="TO_RETRIEVE_FROM_STEP_2" />
        <Add Key="FILE_ID" Value="TO_RETRIEVE_FROM_STEP_2" />
        <Add Key="RANGE" Value="A1:E3" />
        <Add Key="SHEETNAME" Value="Sheet1" />
      </Origin>
    </OriginValues>
  </Context>

We want to extract the text property (formatted version of values for dates) of the JSON response, which is formatted as an array of arrays, so we’ll use the PathHeader attribute of the GetJson command that expect this particular format.

The PathHeader element should return an array of strings that will be used as Header row. The Path element should return an array of arrays for the data.

Here the header row is the first array element of the text property, so the data are the following items of the array.

  </Context>
  <Extract>
    <GetWebService Url="https://graph.microsoft.com/v1.0/sites/${SITE_ID}/drive/items/${FILE_ID}/workbook/worksheets/${SHEETNAME}/range(address='${RANGE}')" ConnectionName="excel_online" TableName="data">
      	<GetJson>
          <PathHeader>$.text[0][*]</PathHeader>
          <Path>$.text[1:]</Path>
      </GetJson>
      </GetWebService>
  </Extract>
  <Load>

When the PathHeader element is defined, xMart will assume the data is in array format like shown above, rather than a regular JSON object with named properties like {id: 5}.

In the Url attribute, the & characters should be xml-escaped by using &amp; instead, else a parse error like '=' is an unexpected token. The expected token is ';' will be thrown.

Finish your pipeline by loading stage table data in your target table using the regular <LoadTable SourceTable="data" TargetTable="...">.

You’re done!

Useful resources

Graph API

https://docs.microsoft.com/en-us/graph/api/worksheet-get?view=graph-rest-1.0&tabs=http https://developer.microsoft.com/en-us/graph/graph-explorer

GetWebService command

https://extranet.who.int/xmart4/pipeline-script-guide#GetWebService

GetJson command

https://extranet.who.int/xmart4/pipeline-script-guide#GetJson