Week 2 at Data School I got my first taste of Alteryx, an incredibly powerful data preparation tool. It’s truly amazing how much you can do with the software; there are so many tools to choose from!
Our main focus this week was learning how to download data from an API and then parse and transform the JSON data into something Tableau friendly. I was assigned the New York Times Most Popular Articles API. I was keen to find out how much Trump is dominating our news. However, before I could do any analysis in Tableau, I had to use Alteryx to download and transform the API data. The data from the API comes in JSON form and before any manipulation looks like pages and pages of this:
As you can see, this is not particularly useful. I quickly moved into Alteryx to start creating the workflow.
- Import: Imported an Excel file that contains a list of all the different NYT sections.
- Formula: The Formula tool contains the URL for my API. The URL only looks at one section at a time. Therefore, I used the [Section] field from my Excel import to make sure data was downloaded for all sections.
- Download: This tool downloaded all the data from the API into Alteryx in JSON form.
- Select: Selected only the fields that contain data from my API and discarded the rest.
- JSON Parse: This tool configured the downloaded data and output values into a single string field called [JSON Name].
- Text to Columns: Split the field [JSON Name] into three separate columns.
- Filter: Filtered out rows that don’t contain results.
- Select: Selected only the columns that I want in my final dataset.
- Auto Field: Automatically changed the field type of strings to a date, number or whatever the relevant field type should be.
- Export: Created a Tableau Data Extract (.tde) with all my data.
Now, all my data was in Tableau and it was easy to see what portion of the top articles in each section is Trump-related.