<img alt="" src="https://secure.hims1nice.com/151009.png" style="display:none;">
" style="background-color: #2c3e50;">
 

Blogs

Consuming XML web service data in Power BI

 

Power BI

 

Data wrangling is one of my favorite parts of working in data and analytics. Evan Gordon suggested I create a “weird data wrangling” series on working with various data formats in Power BI. This is a great idea! In this blog post, I will demonstrate how to use a web service that outputs towboat lockages in an XML format. We will break down the URL to use parameters for river and lock codes. I will also demonstrate the tweak needed so the web service can refresh in the Power BI service. This is an “intermediate-plus” post, so a basic knowledge of working with Power BI Desktop is assumed.

I previously worked for a company that operates towboats and barges on the US inland waterways. Locks and dams account for elevation changes in the waterway and can be major sources of delays when several towboats and barges queue up. I have included a picture of a towboat approaching a lock below. The barges are known as a “tow” and groups of barges will be “cut” (create smaller groups) to pass through the lock chamber then the tow will be put back together once the towboat comes through.

XML BLog1

(Photo credit: US Army Corps of Engineers)

 

Lockages can be a costly source of delay when they have large queues, so it is best to know when this is happening. There is no need to hurry up and wait, burning unnecessary fuel if there are several boats queued up already. The US Army Corps of Engineers owns and operates these locks and dams and maintains a public website, Corpslocks (https://corpslocks.usace.army.mil/, click on “Data Web Services”) with several data web services. One of these is the Lockqueues report, which contains the last 24 hours of lockages. This data is very valuable in knowing the conditions at locks throughout the inland waterways. In this blog post, we will walk through the steps to create a Power BI report that retrieves the data and can be refreshed once published to the Power BI service.

Download, transform and import the lock_codes.csv file

To prepare for this, we will need a list of all locks available in the service. This is found in a .csv file on the Data Web Services page. To get this, click the “Export to Spreadsheet” link below the table of locks.

XML BLog2

Import this into Power BI (Get data -> Text/CSV and select “Transform Data”) and examine the data in Power Query Editor. One issue is that “Lock No” comes in as a number data type. We need to change that to text since we need the leading zeros when we submit our web service request. This table will serve as our universe of locks from which to retrieve lockage data by passing each river and lock code to a custom Power Query function we will create later.

XML BLog3

When you do this, you will see a prompt asking you to either replace the current step or insert a new one. We will select the “Replace current” option.

XML BLog4

One last step before we start building the function is that we need to remove a dummy record. Filter the River Code column and uncheck the “00” value and click OK.

XML BLog5

Create query parameters for river and lock

Now, we need to define the river and lock parameters to pass values to the web service. Click on “Manage Parameters” in Power Query Editor and create 2 new parameters as defined below:

Name

Type

Current Value

pRiver

Text

GI

pLock

Text

01

 

The dialog should resemble the following image.

XML BLog6

Click OK. Now we will define the web service as a new data source.

Set up web data source to connect to the service

Click New Source -> Web.

XML BLog7

In the “From Web” dialog, select the “Advanced” option and we will break the following URL into four parts, substituting parameters where necessary:

https://corpslocks.usace.army.mil/lpwb/xml.lockqueue?in_river=GI&in_lock=01

 

Click “Add part” until there are 4 URL parts, then select parameter (highlighted) to select pRiver and pLock. Each part will be concatenated with the next to make up a full string (the URL to call). This will be dynamic, and we will substitute values for pRiver and pLock from the .csv file we imported earlier. The dialog should resemble the following image:

XML BLog8

Click OK and Power Query Editor will attempt to render a preview of the data in the service. Click on “Table 1” under Suggested Tables. Unfortunately, this is a temporary disappointment since the table only contains 3 datetime fields, a text field, and a number field. We are expecting a lot more information from the web service (e.g., vessel number, name, direction, number of barges, etc.).

XML BLog9

Click on the Web View tab (enable the preview if prompted) and now we see what we were expecting, lockage data in XML format. Click OK to import the data.

XML BLog10

Advanced transformation steps

When we do this, we get the imported data from Table 1 (the disappointing preview). We can see what happened in the Query Settings (Applied Steps), Power BI tried to interpret the data as HTML and not as XML.

XML BLog11

No problem! Delete the “Changed Type” and the “Extracted Table from Html” steps to preview the data at its source (select the “Source” step in the Query Settings dialog under Applied Steps). Now we can see why Power BI thought it was HTML (look at the first line before the row XML begins). The Source step uses the Web.BrowserContents function, but it will be best if we are working with binary data retrieved using the Web.Contents function rather than HTML. To make this change, open Advanced Editor and rename the Web.BrowserContents call to Web.Contents. The step’s code should read as follows:

XML BLog12

Click Done. Now we see there is a binary object and a new tab named “Binary Tools.” Click on that tab and notice the “Convert” group and select “Open As – Xml Tables.”

XML BLog13

Click on the Table cell to preview the data…this is what we were expecting!

XML BLog14

Now we need to expand the table, but first, remove the “Name” column as we do not need it. Then click the button to the right of “Table” in the column header to expand the table. Uncheck the “Use original column names as prefix” option as shown below.

XML BLog15

Click OK and we will see data on individual vessel lockages for Port Allen Lock (lock 01) on the Gulf Intracoastal Waterway (river GI). Note that each column type is the dangerous “ABC 123” (any) type.

XML BLog16

Change the types as shown in the following table (right-click the column, select “Change Type” and select the appropriate type):

Column Name

Data Type

VESSEL_NAME

text

VESSEL_NO

text

DIRECTION

text

NUM_BARGES

int

ARRIVAL_DATE

datetime

TIMEZONE

text

MMSI

int

SOL_DATE

datetime

END_OF_LOCKAGE

datetime

The table has the non-descript name of “Table 1.” Rename this to LockQueueReport.

XML BLog17

Apply the RelativePath parameter to the Web.Contents function

Now we need to make a small change that will enable this data source to refresh when published to the Power BI Service. Chris Webb’s BI Blog has a great blog post that helped me figure this out. The entire post is worth a read, but the short story is that the service needs a non-dynamic URL (i.e., no parameter substitution, etc.) to evaluate against when it attempts to retrieve data. To that end, we will utilize the base URL of the service with the “RelativePath” option of the Web.Contents function. Open Advanced Editor and make the highlighted change to the Source step:

XML BLog15

What we have built so far is great for getting data for one lock on one river but remember that we created the parameters and can call them from a table of rivers and locks. We need to convert this table into a function.

Convert the transformation into a custom Power Query function

Right-click on the LockQueueReport table and select “Create Function.”

XML BLog19

Give it a name, “fnGetLockages” has a nice ring to it, and click OK.

XML BLog20

Now, we have a function we can call for any number of locks we would like. You will see a new folder created with all the objects necessary for the function (the river and lock parameters, the function, and the query upon which the function is based).

XML BLog21

Invoke the fnGetLockages function

We will create a copy of the lock_codes table to use in invoking the new function (right-click lock_codes and select “Duplicate”) and rename this to LockageData. We do not need the River Name and Lock Name fields in this table, so remove them. We will store river and lock names once in the lock_codes dimension table. To get the lockages we will add a column by calling the custom function. The column type will be a table, so we will have to expand the values once the function is called.

Calling the function for all 190 locks would take some time, so for purposes of this demo, we will filter to include only the 19 locks on the Ohio River (River Code = ‘OH’). Click on the LockageData query, activate the “Add Column” tab, and select “Invoke Custom Function.” Select the fnGetLockages function from the drop-down and accept the new column name of fnGetLockages. Select River Code and Lock No fields for the parameter values of pRiver and pLock, respectively, and click OK. Set the privacy levels for this data set to public if you are prompted.

XML BLog22

A new column, fnGetLockages, will appear with a set of records for each lock. This needs to be expanded. Click the in the column header to expand the table. Note that there are 2 options, expand and aggregate. Select aggregate to see the options available. Rather than import individual records, this option would compute aggregations based on columns (e.g., sum of the num_barges field, count vessel_name values, etc.). Select the expand option, uncheck the “Use original column name as prefix” option, and click OK.

XML BLog23

After expanding the table, the column data types will need to be changed from “Any” (ABC 123) to the proper types. Before closing and applying the data transformation, we should disable load for the LockQueueReport table. Right-click on the table and uncheck the “Enable Load” option. Click “Continue” in the Possible Data Loss Warning dialog, there should be no visuals that depend on this table. The LockQueueReport exists in Power Query Editor as the definition of the fnGetLockages function. You can change the code of the function (data transformation) by modifying the steps in the LockQueueReport table. The only tables we want to load in the report are lock_codes and LockageData.

XML BLog24

Click “Close and Apply” to load the data in the report.

Model relationships between the tables

In order to relate the tables, we will need to create a composite key by concatenating the River Code and Lock No values together as a new “RiverLock” field. Right-click lock_codes and select “New Column” and enter the following text in the Formula bar:

RiverLock = lock_codes[River Code] & lock_codes[Lock No]

 

Similarly, we need this column in the LockageData table so follow the same steps with the code below:

RiverLock = LockageData[River Code] & LockageData[Lock No]

 

Proceed to the modeling tab and drag one of the RiverLock columns onto the same-named column in the other table. This will create a one-to-many relationship from lock_codes to LockageData as shown below.

XML BLog25

This is a rich dataset, and many useful calculations can be produced. As an example, if the “SOL_DATE” (start of lockage datetime) is null, that means the boat has arrived but not yet started its lockage. A simple visualization of locks and distinct counts of MMSI numbers (unique AIS numbers for vessels) broken out by direction is presented below.

XML BLog26 class="photo"

Note that the page has filters to only show records where SOL_DATE is (Blank). Other calculations of interest are time differences between ARRIVAL_DATE and SOL_DATE to compute average delay time across several dimensions (date/time, direction, river, lock, etc.).

Publish the report

The next steps include publishing the report to the Power BI service and configuring a refresh. I have a workspace called “Towboats” where I will publish my report (named “Ohio River Lockqueues”). To publish the report, in Power BI desktop, click the publish button and select the desired workspace.

XML BLog27

Connect the dataset to the Power BI gateway

Now that the report is published, we need to check the settings of the datasource. To do that, go to the service (http://app.powerbi.com) and navigate to the workspace where the report was published. Then, click on the ellipsis next to the dataset and select “Settings.”

XML BLog28

Expand the gateway connection section and examine the data sources included in the dataset and their connections (currently not configured correctly):

XML BLog29

We need to create two entries, one for the Corpslocks web service and another for the lock_codes.csv file. The easiest way to do this is to follow the “Add to gateway” links. My completed entries look as follows, first lock_codes.csv:

XML BLog30

then the Corpslocks Web Service:

XML BLog31

Return to the dataset, click settings, and expand the gateway selection. Then, select the newly created gateway entries and click apply.

XML BLog32

Schedule a refresh and test the connection

Expand the Scheduled Refresh section and check the option to keep your data up to date. Then select an appropriate frequency and click apply. My settings are below:

XML BLog33

A good practice is to perform a manual refresh to test the connection configuration.

XML BLog34

Summary

I hope this blog post is useful and highlights Power Query’s capabilities to extract XML-formatted data from a web service, as well as how to parameterize the function. Note that similar steps could be applied to HTML, JSON, and other types of formats. Special thanks to Chris Webb’s blog for the notes on the RelativePath parameter in Web.Contents.


AAG LogoKiZAN is a Microsoft National Solutions Provider with numerous gold and silver Microsoft competencies, including gold data analytics. Our primary offices are located in Louisville, KY, and Cincinnati, OH, with additional sales offices located in Tennessee, Indiana, Michigan, Pennsylvania, Florida, North Carolina, South Carolina, and Georgia.

Posted by Jimmy Dobbins

Senior Consultant - Data Scientist / Data Architect at KiZAN Technologies

Website

Topics: Power BI