Hello future DS20s, the Sunday a week ago I was sitting in front of my computer, probably like some of you right now, browsing the Data School’s web page, looking at those smiley faces, while…not quite sure what to expect next! Here you may find your answers.
Week1 started with Alteryx, one of our very first challenges was a data preparation task. Given the dataset below:
How do we turn it into:
Before we start, here are a few things we would want to do to our dataset:
1. Fixed the ‘DATE’ column for whether it is 2018 or 2019;
2. Split the players’ names and their scores in the last three columns;
3. Create the ‘Home or Away’ column based on the information we have in the column ‘OPPONENT’;
4. Split the column ‘RESULT’ into three where each of them will record ‘Win or Lose’, ‘Overtime’ and ‘Score’;
5. Separate ‘Spurs Score’ and ‘Opponent Score’ into two columns.
OK, now let’s start.
– First, grab the ‘Input Data’ tool from our tool bar, hit on the ‘Run’ button to import our spreadsheet.
– Connect the ‘Filter’ tool to remove empty rows. Select ‘DATE’ from basic filter, and choose ‘is not null’.
– Connect the ‘Formula’ tool to the ‘T’ end of the ‘Filter’ to return the data when the statement made was true. Within the ‘Formula’ we select the ‘DATE’ column to update, if the current month is greater or equal to 10, then return the previous year, else, return the current year.
– Next, connect the ‘Text to Column’ tool to split the ‘HI POINTS’ into two columns which records the players’ names and their points. We write ‘\s’ below ‘Delimiters’ to indicate that we would like to split at the space in-between name and point.
– Then we do the same for ‘HI REBOUNDS’ and ‘HI ASSISTS’.
– Before we move on, let’s clean up our data using the tool ‘Select’. We untick the boxes to remove the columns we don’t need, and rename the six new columns we have just created.
– Next, connect the ‘Formula’ tool to create another four columns.
First, ‘add new column’ and name it ‘Home or Away’, if the first character from the left under the ‘OPPONENT’ is ‘@’, then the game was played away, else, at home.
Similarly, name a new column ‘Win or Lose’, copy the first character from the left from the ‘RESULT’ column.
Next, name a new column ‘Overtime?’, if the first two characters from the right under the ‘RESULT’ is ‘OT’, then record ‘Overtime’, else, just ‘Time’.
Lastly, name a new column ‘Score’, if the first two characters from the right under the ‘RESULT’ is ‘OT’, then copy ‘RESULT’, starting after the first character and stop before the space, else, copy ‘RESULT’ from the right without the last character.
– Next, connect the ‘Text to Column’ tool to split the ‘Score’ into two columns. We write ‘-’ below ‘Delimiters’ to indicate that we would like to split at the dash.
– Next, connect the ‘Formula’ tool to create another two columns. First, name a new column ‘Spurs Score’, if ‘Win or Lose’ is W, then copy ‘Score1’, else, copy ‘Score2’. Then we do the same for ‘Opponent Score’. Add ‘trim’ to delete empty space.
– To clean up our final dataset, connect the ‘Select’ tool, untick the boxes that we want to remove.
– Finally, connect the ‘Output’ tool and select your output file type.
And don’t forget to save your workflow!