Are you finding messy data after importing a CSV file into Alteryx? Keep these three tips in mind and avoid disappointments
CSVs rock. I really like the format. Possible because during my years working in newsrooms, I ended up in front of PDFs full of data too many times, so it became a moment of glory in my life. Yeah, we are in the 21st century and governments and organisations still publishing their data in PDFs, making it a nightmare to analyse…
But this post is not to complain about the use of PDFs as a format to publish data. It is to talk about CSV and Alteryx.
During this week we worked with CSVs and Alteryx a couple of times. Once you discover the workflow it is great, however there are few tricky points that can make you crazy. So, if you are starting to use Alteryx like me, there are three important things to have in mind:
FIRST: Check the delimiter
CSV is the abbreviation for Comma-Separated Values. Despite the name, sometimes CSV files use another type of delimiter. And what is the problem with this? Maybe you will have the same problem we had when opening a CSV file today: It didn’t open in a proper way. Instead of seeing a bunch of rows, we had only one!
But don’t panic. It happens because by default Alteryx considers that CSV files have only commas as delimiters.
The only thing you need to do is to tell “Hey, Alteryx, don’t worry about commas, get any kind of delimiters you find”. Translating this phrase to Alteryx language, put “\0” instead of “,” inside the delimiters option.
Where is this option? It is the fifth row when you click on the “Input Data” option.
SECOND: Check if you are importing the first row as the field name
Once we are talking about the “Input Data” options, another thing you may like to do is uncheck the box “First Row Contains Field Names” (except in those situations you are pretty sure that your first row contains field names).
It is common to CSVs to come with introductory rows that are not part of the dataset. Sometimes it is an information about the methodology used to find the data. Other times it has the name of the institution responsible for the research… For this reason, it is better if you import this data as a row instead of putting it as a header. There is a very handy tool in Alteryx that you can use to get rid of this unnecessary rows that I will explain on my third point.
THIRD: Clean the unnecessary rows with “Sample”
There is a Preparation Tool in Alteryx that makes it easier to clean the unnecessary rows that you usually find in CSVs. That is the “Sample” tool.
Basically, it will give you the opportunity to define which are the rows that interest you, allowing you either to select which of them to keep or which to skip. Say goodbye to that terrible row with the source of the data and keep things sample!