Automatically dropping columns and rows with a selected number of missing values in Alteryx

by Abiramm Thavajothy

Normally, when thinking about which columns and rows to drop from our data, we start off by dropping all the columns and rows that are 100% missing. But what happens if we wanted to get rid of columns where over 95% are missing etc? This blog aims to show how you can dynamically remove columns and rows based on how much data is missing.

Removing columns

To do this we will take advantage of the field summary tool;

Field Summary Tool

1)      Feed whatever data you are checking through the field summary tool (in my example it is global HDI data)

2)      The field summary tool has a column that shows us, for all the columns, what is the percentage of data that is missing

3)      We can use a filter to remove all the rows (column names where any specified % of data is missing) e.g. 5%

4)      Now we go back to the original data and transpose it so we get all the column headers as rows (the true anchor of this is what we will use in the join in the next step)

5)      Next we can join the two tables we have made together using the column name as the join clause (this will mean that all the columns we want to drop will get added to the join)

6)      Add a cross-tab after the left or right anchor (whichever one has data) with the group by variable (country in my case), and voila, we should have an appropriately structured table where no column has more than 5% missing data.

Removing rows

Same idea but this time we take advantage of null values and the summarize tool.

1)      Use the same transposed table that we had in step 5 (after the join), and make a new branch with the summarize tool. In the summarize tool we can group by ‘country’ and count the number of nulls that each country has

2)      Same as before we can filter out the number of nulls, we want to have at most

3)      Then we can do the join in the same was as before

4)      Again, transpose the left/right anchor of the join tool and again, we’ve got rid of rows with any number of nulls based on selection.

Avatar

Abiramm Thavajothy

Fri 29 Jul 2022

Thu 28 Jul 2022

Wed 27 Jul 2022

Tue 26 Jul 2022