How to use the Microsoft Graph API to load data from an Excel Online protected file in xMart
https://docs.microsoft.com/en-us/graph/api/resources/excel?view=graph-rest-1.0
Step 1: Generate ClientID and allow it to get your files
Documentation: https://docs.microsoft.com/en-us/graph/auth-v2-service
In Azure Portal, register a new app with Native client authorization (it should have be allowed to request the scope:offline_access
).
Retrieve its client id, we will refer to it as YOUR_CLIENT_ID
in the next steps.
https://portal.azure.com/#blade/Microsoft_AAD_IAM/ActiveDirectoryMenuBlade/RegisteredApps
Step 2: Retrieve the file info
Retrieve the file id FILE_ID
and site id SITE_ID
from the Excel Online file you want to access.
Usually you can get those infos from the URL when the file is open in your browser or from the Graph Explorer
💡 Tip: use the proposed https://graph.microsoft.com/v1.0/sites?search=YOUR_SITE_NAME to find the id
info from the JSON response.
Once your have this id
aka SITE_ID
, explore the site via https://graph.microsoft.com/v1.0/sites/${SITE_ID}/lists or https://graph.microsoft.com/v1.0/sites/${SITE_ID}/drives
Step 2: Set up the mart connection
Code popup helper way
If you don’t want to use the cliend_id 99e547ef-258a-4db0-ad11-9466a3d1f4eb
, please follow the manual steps way below, else follow those steps:
- Click the Add button to create a new connection under the WebService Bearer provider (referred to “MySharepointConn” for this example)
- Make sure you are logged in with the account authorized on the target Sharepoint file.
- Click on
<button>Open the authorization_code helper popup</button>
, and accept the permission to access your files request popup. - Insert the “scope” JSON property as displayed (
"scope":"offline_access Sites.Read.All Files.ReadWrite.All",
). - Click on Current Azure AD button and set Token URL endpoint to
https://login.microsoftonline.com/f610c0b7-bd24-4b39-810b-3dc280afb590/oauth2/v2.0/token
- Click Save button to save the connection.
Manual steps way
We’ll use the code flow using the native client process to get the authorization_code. This will be a one-time manual step.
Replace the YOUR_CLIENT_ID
with the one get from Step 1 in the following URL and open it from a browser
https://login.microsoftonline.com/f610c0b7-bd24-4b39-810b-3dc280afb590/oauth2/v2.0/authorize?client_id=YOUR_CLIENT_ID&response_type=code&redirect_uri=https://login.microsoftonline.com/common/oauth2/nativeclient&response_mode=query&state=12345&scope=offline_access%20Sites.Read.All%20Files.ReadWrite.All
You should get redirected to a blank page, but in the new URL, there is a code
parameter to extract manually. Beware to not include some extra parameters at the end of the URL!
Once you get this code (referred as CODE_EXTRACTED_FROM_URL
), in xMart, create a new WebService Bearer Token **connection:
{
"grant_type":"authorization_code", "client_id":"YOUR_CLIENT_ID","scope":"offline_access Sites.Read.All Files.ReadWrite.All","redirect_uri":"https://login.microsoftonline.com/common/oauth2/nativeclient","code": "CODE_EXTRACTED_FROM_URL"}
The authorization_code is only valid once, that’s why we provide the scope:offline_access
. It allows to retrieve a refresh_token, that is then stored by xMart.
Note: Once a refresh_token is retrieved from the Token Endpoint, the xMart connection is automatically updated to store the refresh_token
and remove the used code
so the connection will still be valid after the first request (usually the refresh_token is valid 6 months).
{"grant_type":"refresh_token", ..., "refresh_token":"OAQABAAAAAAAm-..." }
Step 3: Edit xMart pipeline
Excel API response looks like:
{
"@odata.context": "https://graph.microsoft.com/v1.0/$metadata#workbookRange",
"@odata.type": "#microsoft.graph.workbookRange",
"@odata.id": "/sites('XXX')/drive/items('YYY')/workbook/worksheets(%27%7B00000000-0001-0000-0000-000000000000%7D%27)/range(address=%27A1:E3%27)",
"address": "Sheet1!A1:E3",
"addressLocal": "Sheet1!A1:E3",
"columnCount": 5,
"cellCount": 15,
"columnHidden": false,
"rowHidden": false,
"numberFormat": [
[
"General",
"General",
"General",
"General",
"General"
],
[
"m/d/yyyy",
"m/d/yyyy",
"General",
"General",
"General"
],
[
"m/d/yyyy",
"General",
"General",
"General",
"General"
]
],
"columnIndex": 0,
"text": [
[
"a better col1",
"col2",
"col3",
"col4",
"comments maybe?"
],
[
"1/1/2020",
"1/3/2238",
"123456",
"1321",
"none"
],
[
"1/3/2238",
"03.03.2020",
"12",
"1",
"no comment"
]
],
"formulas": [
[
"a better col1",
"col2",
"col3",
"col4",
"comments maybe?"
],
[
43831,
123456,
123456,
1321,
"none"
],
[
123456,
"03.03.2020",
12,
1,
"no comment"
]
],
"formulasLocal": [
[
"a better col1",
"col2",
"col3",
"col4",
"comments maybe?"
],
[
43831,
123456,
123456,
1321,
"none"
],
[
123456,
"03.03.2020",
12,
1,
"no comment"
]
],
"formulasR1C1": [
[
"a better col1",
"col2",
"col3",
"col4",
"comments maybe?"
],
[
43831,
123456,
123456,
1321,
"none"
],
[
123456,
"03.03.2020",
12,
1,
"no comment"
]
],
"hidden": false,
"rowCount": 3,
"rowIndex": 0,
"valueTypes": [
[
"String",
"String",
"String",
"String",
"String"
],
[
"Double",
"Double",
"Double",
"Double",
"String"
],
[
"Double",
"String",
"Double",
"Double",
"String"
]
],
"values": [
[
"a better col1",
"col2",
"col3",
"col4",
"comments maybe?"
],
[
43831,
123456,
123456,
1321,
"none"
],
[
123456,
"03.03.2020",
12,
1,
"no comment"
]
]
}
Create or edit your pipeline https://extranet.who.int/xmart4/YOUR_MART/pipelines/PIPELINE_CODE
We can define some origin values to store the multiple parameters in the pipeline Context section.
<XmartPipeline>
<Context>
<Inputs>
<Add Key="SITE_ID" Type="text"/>
<Add Key="FILE_ID" Type="text" />
<Add Key="RANGE" Type="text" />
<Add Key="SHEETNAME" Type="text" />
</Inputs>
<OriginValues>
<Origin Code="EXCEL_ONLINE">
<Add Key="SITE_ID" Value="TO_RETRIEVE_FROM_STEP_2" />
<Add Key="FILE_ID" Value="TO_RETRIEVE_FROM_STEP_2" />
<Add Key="RANGE" Value="A1:E3" />
<Add Key="SHEETNAME" Value="Sheet1" />
</Origin>
</OriginValues>
</Context>
We want to extract the text
property (formatted version of values
for dates) of the JSON response, which is formatted as an array of arrays, so we’ll use the PathHeader
attribute of the GetJson
command that expect this particular format.
The PathHeader element should return an array of strings that will be used as Header row. The Path element should return an array of arrays for the data.
Here the header row is the first array element of the text property, so the data are the following items of the array.
</Context>
<Extract>
<GetWebService Url="https://graph.microsoft.com/v1.0/sites/${SITE_ID}/drive/items/${FILE_ID}/workbook/worksheets/${SHEETNAME}/range(address='${RANGE}')" ConnectionName="excel_online" TableName="data">
<GetJson>
<PathHeader>$.text[0][*]</PathHeader>
<Path>$.text[1:]</Path>
</GetJson>
</GetWebService>
</Extract>
<Load>
When the PathHeader
element is defined, xMart will assume the data is in array format like shown above, rather than a regular JSON object with named properties like {id: 5}
.
In the Url
attribute, the &
characters should be xml-escaped by using &
instead, else a parse error like '=' is an unexpected token. The expected token is ';'
will be thrown.
Finish your pipeline by loading stage table data
in your target table using the regular <LoadTable SourceTable="data" TargetTable="...">
.
You’re done!
Useful resources
Graph API
https://docs.microsoft.com/en-us/graph/api/worksheet-get?view=graph-rest-1.0&tabs=http https://developer.microsoft.com/en-us/graph/graph-explorer
GetWebService command
https://extranet.who.int/xmart4/pipeline-script-guide#GetWebService
GetJson command
https://extranet.who.int/xmart4/pipeline-script-guide#GetJson