The first week has passed, and it has been intense, to say the least. Most of the week was spent on Alteryx, a piece of software that was completely new to us. What really surprised me about Alteryx was how effective it is for preparing and manipulating data. This became apparent when fellow Data Schoolers presented their own Alteryx use cases from their placements: you could clearly see that they had solved underlying business problems by building Alteryx workflows.
On Friday, we were tasked with improving our initial applications by supplementing our data. My original visualisation was about US evictions, and my project focused on the eviction rate. Initially, I had made a broad statement that ethnicity influenced eviction levels, but I had no real data to support it. So I looked for data broken down by ethnicity, and this is what I found.
Image 1. Educational Attainment by Ethnicity
From this data set, I needed the percent estimate of educational attainment for each ethnicity. My first problem was that the download came as multiple Excel files that looked like this.
Image 2. Excel File of Educational Attainment Data
Eventually, I realised that the values were identified by unique codes that matched specific entries in the original table, and that I could use Alteryx to join them.
Image 3. Alteryx Workflow for Educational Attainment Data
- The first thing I did was bring both data sets into Alteryx. I could then pivot the data so that the unique codes and their values sat in separate columns.
- This allowed me to join on the unique codes, which gave me the corresponding values.
- However, some of the values were still concatenated into a single field, so I used the Text to Columns tool to split them apart.
- From there, I could start identifying which parts of the data were numerical values, which were titles, and which were other important text such as ethnicity.
- I also noticed that the race subcategories were in the same column as educational attainment. I wanted to separate them so that Tableau could interpret the data more easily.
- Because the race subcategory rows returned nulls, I filtered on those null values, which let me narrow the data down to three very simple columns.
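For readers more at home in code than in Alteryx, the workflow above can be sketched in Python with pandas. This is a rough equivalent under stated assumptions, not the workflow I actually built: `melt` stands in for the pivot, `merge` for the Join tool, `str.split` for Text to Columns, and a `notna` filter for the null filter. The column codes (`C02_001` and so on), the `!!` label delimiter, and the assumption that the third label part is the ethnicity and the fourth is the attainment level are all hypothetical, as is the helper name `tidy_attainment`.

```python
import pandas as pd


def tidy_attainment(data_wide: pd.DataFrame, lookup: pd.DataFrame, delimiter: str = "!!") -> pd.DataFrame:
    """Reshape a wide, coded table into ethnicity / attainment / percent columns.

    Assumes (hypothetically) that `lookup` has `code` and `label` columns and that each
    label splits into parts where part 2 is the ethnicity and part 3 the attainment level.
    """
    # 1. Pivot: one row per code/value pair instead of one column per code.
    long = data_wide.melt(var_name="code", value_name="percent")

    # 2. Join on the shared code to attach each value's descriptive label.
    joined = long.merge(lookup, on="code", how="inner")

    # 3. Text to Columns: split the concatenated label on the delimiter.
    #    reindex guarantees four part columns exist even if every label is shorter.
    parts = joined["label"].str.split(delimiter, expand=True).reindex(columns=range(4))
    joined["ethnicity"] = parts[2]
    joined["attainment"] = parts[3]

    # 4. Race header rows have no attainment part, so it comes back null; filter them out.
    tidy = joined[joined["attainment"].notna()]
    return tidy[["ethnicity", "attainment", "percent"]].reset_index(drop=True)


if __name__ == "__main__":
    # Toy stand-ins for the two downloaded files (placeholder values, not real data).
    data_wide = pd.DataFrame({"C02_001": [1.0], "C02_002": [2.0], "C02_003": [3.0], "C02_004": [4.0]})
    lookup = pd.DataFrame({
        "code": ["C02_001", "C02_002", "C02_003", "C02_004"],
        "label": [
            "Percent!!Race!!White alone",
            "Percent!!Race!!White alone!!High school graduate or higher",
            "Percent!!Race!!Black alone",
            "Percent!!Race!!Black alone!!High school graduate or higher",
        ],
    })
    print(tidy_attainment(data_wide, lookup))
```

With the toy inputs, the two race header rows drop out and two rows remain, one per ethnicity, each with a single attainment level and its percentage.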
With the data now in a format that Tableau could easily read, I created a simple butterfly chart.
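A butterfly chart places two sets of bars back to back around a shared axis. One common way to get that effect, sketched here as an assumption rather than as how this particular chart was built, is to negate the values for the category drawn on the left so its bars extend below zero. The category names and numbers below are hypothetical placeholders, and `butterfly_values` is an illustrative helper name.

```python
from typing import List, Tuple

# (ethnicity, attainment category, percent)
Row = Tuple[str, str, float]


def butterfly_values(rows: List[Row], left_category: str) -> List[Row]:
    """Negate one category's percentages so its bars extend to the left of zero."""
    signed: List[Row] = []
    for ethnicity, category, percent in rows:
        # Bars for the left-hand category become negative; all others stay positive.
        value = -percent if category == left_category else percent
        signed.append((ethnicity, category, value))
    return signed


if __name__ == "__main__":
    # Toy placeholder rows (NOT census figures); category names are hypothetical.
    rows = [
        ("Group A", "Less than high school", 10.0),
        ("Group A", "Bachelor's or higher", 30.0),
    ]
    for row in butterfly_values(rows, left_category="Less than high school"):
        print(row)
```

An alternative in Tableau that avoids negative numbers is to build two bar charts side by side and reverse the axis on the left-hand one.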
Image 6. Butterfly Chart Showing Educational Attainment
As you can see, the data shows that the Black and Hispanic populations have lower rates of educational attainment. I can now better answer my initial question: does ethnicity influence eviction levels?
Looking at this map, you can see eviction hot spots on both the east and west sides of the US. My data showed that the hot spots on the east side generally have larger Black populations, while those on the west side have larger Hispanic populations.
Although it's difficult to establish a true correlation with this data set, it does support a better-informed assumption: areas with high eviction levels tend to have larger populations of ethnic groups with lower educational attainment. This would make sense, as these populations are at greater risk of unemployment, more likely to rent their homes, and therefore more likely to be evicted.