API Pagination in Alteryx

One really cool thing that I learnt a few weeks ago is pagination. This is especially useful when you get data through an API using Alteryx. Pagination is the process of fetching data and looping this over until everything that you need has been acquired. This is especially true when partial data is returned in a response and you need pagination to get the rest of the data.

Because of this loop, we will be using an iterative macro. There are 3 different types of macro’s: standard, batch and iterative. Iterative is used when you don’t know how many loops you need to make (infinite amount of times). Batch macro’s are when you need to finite loops to get the data.

When you are acquiring data through an API, sometimes you only receive partial data and you don’t know how much more there is. You know that you need to use pagination when you are given another key in the API. This new key will allow you to make further requests from the API. You want to keep going until there is no more keys given to you as this means that all the data has been collected.

An iterative macro workflow should look something like this:

Notice how the input and outputs look a little different from the normal input and output. This suggests that this workflow is a macro.

Initially, you would need a text input. This text input should have the URL needed to make the request on the API. You then need to use the download tool to download the URL and used the JSON parse tool to split the data I wanted.

I then wanted to further split the data into 3 parts.

  • I wanted all the rows with the actual data that I wanted.
  • I wanted the headers
  • I wanted the next key to get the next set of data

 

1: Getting the data

I noticed that all the rows with the data I wanted contained “datatable.data”. Therefore, I wanted to filter out all the rows that contained this. I then used text-to-columns tool to parse the data and then used the crosstab tool to get it all in the format that I wanted. I then output the data.

2: Getting the headers

I filtered out any rows that included “columns” as this contained all the header names that corresponded with my data. I parsed it out and used crosstab to get it into the format that I wanted. To join these headers with my data I used the dynamic rename tool to get these header names into the headers.

3: Getting the next key

To get the row with the key inside it, I filtered out the downloaded data using the following formula:

a) “[JSON_Name] = “meta.next_cursor_id”

AND

b) NOT IsEmpty([JSON_ValueString])”

“a)” part is about identifying the data with the key inside it.

“b)” part is about ensuring that no other data gets in. If data does get in, then it will come out of the “False” node of the filter. However, if there is no data and it is just the key then we want to process then we want it.

I then used the select tool and the formula tool to re-create the URL to make another request from the API.

My formula tool looked like this:

“”https://www.quandl.com/api/v3/datatables/WIKI/PRICES.json?ticker=DATA,googl,aapl,amazn,FB,msft&&api_key=uynvrZVQY-pNZg-_1xeU&” + “qopts.cursor_id=” + [JSON_ValueString]”

I then turned this into an output.

Turning it into a macro

At this point I had 2 outputs. The first one, which came from my first stream (the one dealing with the data) and the second output, which contained the new key.

I turned the data output into a standard macro output, but I changed the output with the new key into an iterative macro. Make sure you name your outputs so that you know which one to turn into an iterative macro (the one containing the key).

To tell Alteryx that you want an iterative macro as supposed to a standard one, you need to click on the whitespace in the workflow. You should see the configuration tab like this:

 

You need to select the workflow tab and change the “macro” setting from standard to iterative.

You then need to select View at the top and click interface designer.

You then need to configure your settings to tell Alteryx your “iteration input” and your “iteration output”.

This is where naming your macro’s will become useful.

Then you’re all set!