Alteryx: It all starts with Input Data

by Anna Noble

Every workflow needs a data source.  Enter the Input Data tool – arguably the most important tool in Alteryx!

Access to different data sources

The Input Data tool allows you to load in a range of data files, including Excel, CSV, XML & JSON. You can also connect to various databases and servers if you have large amounts of data to deal with.

Dealing with big data

You might want to limit the number of records you pull into the workflow so you can test or trial the process quickly. Alteryx lets you do this by setting a record limit in the tool’s configuration.
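
In Alteryx the record limit is just a configuration setting, so no code is needed. For readers who think in code, here is a rough Python analogue of the idea, not how Alteryx implements it. The function name `read_limited` and the sample rows are made up for illustration.

```python
import csv
import io
from itertools import islice


def read_limited(handle, record_limit):
    """Return at most record_limit data rows as dicts; the header row is not counted."""
    reader = csv.DictReader(handle)
    # islice stops reading as soon as the limit is reached,
    # so the rest of a large file is never parsed.
    return list(islice(reader, record_limit))


# Made-up sample data standing in for a large file.
sample = io.StringIO("id,value\n1,A\n2,B\n3,C\n")
rows = read_limited(sample, 2)
print(rows)
```

The point is the same as in the workflow: test on a small slice first, then lift the limit once the process works.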

Multiple files? Add a wildcard to a file name

Alteryx uses an asterisk (*) as a wildcard, as many programming languages and command-line shells do. If you put a wildcard at the end of a file name, you can pull in all the files whose names start the same way, provided they share the same format. For example, I have downloaded fire incident data for West Midlands Fire Service, where there is a file for each year. I’ve saved the files in the format ‘WMFS Incidents’ followed by the year the data was collected. So I can put the wildcard at the end of ‘WMFS Incidents’ and Alteryx will automatically pull all the files together and stack them into one single data set. Pretty neat, right?
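
If it helps to see the wildcard idea outside Alteryx, here is a minimal Python sketch of the same pattern. It is only an analogue, not what Alteryx does internally. The `.csv` extension, the column names, the years and the helper `read_matching` are all assumptions made for the example; only the ‘WMFS Incidents’ naming comes from my files.

```python
import csv
import glob
import os
import tempfile


def read_matching(pattern):
    """Stack the rows of every CSV matching pattern, tagging each row with its source file."""
    combined = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = os.path.basename(path)
                combined.append(row)
    return combined


with tempfile.TemporaryDirectory() as folder:
    # Create two tiny made-up yearly files so the example runs on its own.
    for year in (2014, 2015):
        path = os.path.join(folder, f"WMFS Incidents {year}.csv")
        with open(path, "w", newline="") as handle:
            handle.write(f"incident_id,year\n1,{year}\n2,{year}\n")

    # The asterisk matches whatever follows 'WMFS Incidents' in the file name.
    pattern = os.path.join(glob.escape(folder), "WMFS Incidents*.csv")
    all_rows = read_matching(pattern)

print(len(all_rows))
```

Note that the rows are stacked one file after another, which only works cleanly when every file has the same columns.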

Start Data Import on Line…

Here you can specify which line of the file the import should start from. I’ve found this particularly useful when downloading open data CSV or Excel files. Often they come with titles or extra information in the first few rows of data.
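
As a rough code analogue of starting the import on a later line, the Python sketch below skips the leading lines before parsing. The function name `read_from_line` and the sample file contents are invented for the example, and Alteryx may handle edge cases differently.

```python
import csv
import io
from itertools import islice


def read_from_line(handle, start_line):
    """Skip the lines before start_line (1-based), then parse the rest as CSV."""
    remaining = islice(handle, start_line - 1, None)
    return list(csv.DictReader(remaining))


# Made-up file: two title lines and a blank line sit above the real header.
raw = io.StringIO(
    "Some report title\n"
    "Downloaded from a data portal\n"
    "\n"
    "id,value\n"
    "1,10\n"
    "2,20\n"
)
records = read_from_line(raw, 4)
print(records)
```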

 

What’s in the first line of data?

In a similar vein, it’s worth checking whether you need to untick the ‘First Row Contains Field Names’ option, which by default treats the first row of data as the headers. Sometimes the first row is simply values in your data set and you will need to set up new headers.
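
To show what the choice means in practice, here is a small Python sketch of both behaviours. It is an illustration only: the helper `read_rows`, the flag name and the placeholder names `Field_1`, `Field_2` are assumptions for the example, not a description of Alteryx internals.

```python
import csv
import io


def read_rows(handle, first_row_has_names=True):
    """Parse CSV rows, either taking names from the first row or generating placeholders."""
    rows = list(csv.reader(handle))
    if not rows:
        return []
    if first_row_has_names:
        names, data = rows[0], rows[1:]
    else:
        # No header row: every row is data, so invent placeholder field names.
        names = [f"Field_{i}" for i in range(1, len(rows[0]) + 1)]
        data = rows
    return [dict(zip(names, row)) for row in data]


# Made-up file with no header row.
headerless = io.StringIO("1,10\n2,20\n")
parsed = read_rows(headerless, first_row_has_names=False)
print(parsed)
```

With the flag left on, the first row `1,10` would have been swallowed as headers and only one record would remain, which is exactly the mistake worth checking for.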

Reflections on my Alteryx experience so far

Throughout Alteryx week, this flexibility has been one of my favourite things. It’s so simple to go back and change a configuration at any point in the workflow, so you can use trial and error to figure out what you need in the configuration with just a click of the mouse.
