This week's project was really nice: I had the chance to learn more about Alteryx and its tools. I had to prepare a one-hour training module in Alteryx on “CREATING DISTRIBUTIONS”.

So… I started researching this topic and ended up structuring the training session in 3 parts:

1. **Data investigation tools – FIRST INVESTIGATION**
2. **Predictive Grouping – CREATE CLUSTERS** (Benedetta already explained this in this blog post)
3. **Data investigation tools – INVESTIGATE YOUR RESULTS**

But what are these data investigation tools? Actually, I had never heard of them before this week, and this is one of the reasons I really love Alteryx: every time I need to do something really specific, I end up discovering new tools and features, and it is amazing!

These tools are **based on R** and they are really **useful for a first data investigation**. **Before a process or analysis takes place, users need to understand the details of their data.**

Let’s look into some of these tools together:

**Association Analysis:** this tool generates a correlation matrix report (one static and one interactive), determining which fields in a database have a bivariate association with one another. It is possible to select a target field (if there is a particular field of interest). I would suggest first investigating a wider range of fields to get a general overview of the correlations. In the example below, I just deselected Latitude and Longitude and used the Pearson product-moment correlation:

Here is the interactive report:

If you click on one of the small squares, the scatter plot on the right side of the report updates, as below:

**Red means strong positive linear correlation** (y tends to increase as x increases), while **blue means strong negative linear correlation** (y tends to decrease as x increases).
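If you are curious about what sits behind each square of the matrix, here is a minimal pure-Python sketch of a Pearson correlation matrix. It is only an illustration, not what the tool runs internally (the tool is R-based); the function names and the three sample columns are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def correlation_matrix(columns):
    """Map every (field_a, field_b) pair in a dict of columns to its Pearson r."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical columns, just to show the shape of the result
data = {
    "field_a": [1, 2, 3, 4],
    "field_b": [2, 4, 6, 8],  # grows with field_a, so r is +1 ("red")
    "field_c": [4, 3, 2, 1],  # shrinks as field_a grows, so r is -1 ("blue")
}
matrix = correlation_matrix(data)
print(round(matrix[("field_a", "field_b")], 3), round(matrix[("field_a", "field_c")], 3))
# prints 1.0 -1.0
```

The diagonal is always 1 (every field correlates perfectly with itself), which is why the diagonal of the report is not very interesting.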

**Distribution Analysis:** this tool fits one or more distributions to the input data and compares them based on statistical significance (p-values). I selected the Mean_Age field and all the available distributions, to see whether at least one of them fits it:

Below is the report:

If we look at the **Chi-Square test**, we can deduce that the Normal, Lognormal and Gamma distributions fit the data reasonably well.
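To make the Chi-Square idea a bit more concrete, here is a minimal sketch for a single candidate (a Normal distribution). It assumes the Normal is fitted with the sample mean and standard deviation and that the data are split into 5 equal-width bins; both are my own assumptions, and I don't know the exact binning the Alteryx tool uses. The function names and sample data are hypothetical.

```python
import math
import statistics

def normal_cdf(x, mu, sigma):
    """CDF of a Normal(mu, sigma) distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def chi_square_normal(values, bins=5):
    """Chi-square goodness-of-fit statistic of `values` against a fitted Normal.

    The Normal uses the sample mean and standard deviation; the data range is
    split into `bins` equal-width bins, and the two outer bins are treated as
    open-ended so the expected probabilities sum to 1.
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    observed = [0] * bins
    for v in values:
        observed[min(int((v - lo) / width), bins - 1)] += 1  # max value goes in last bin
    n = len(values)
    stat = 0.0
    for i in range(bins):
        p_left = 0.0 if i == 0 else normal_cdf(lo + i * width, mu, sigma)
        p_right = 1.0 if i == bins - 1 else normal_cdf(lo + (i + 1) * width, mu, sigma)
        expected = n * (p_right - p_left)
        stat += (observed[i] - expected) ** 2 / expected
    return stat

bell_shaped = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]
skewed = [0] * 19 + [100]
print(chi_square_normal(bell_shaped))  # small: close to a Normal
print(chi_square_normal(skewed))       # large: far from a Normal
```

A smaller statistic means observed and expected counts are closer. Turning it into a p-value needs the chi-square distribution with `bins - 1 - 2` degrees of freedom (two parameters were estimated), for example with `scipy.stats.chi2.sf`; a large p-value means the test gives no evidence against that distribution.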

**Histogram:** this tool shows the empirical distribution of a single numeric field by showing the frequencies of records falling in a set of continuous value ranges. I selected the field Sum_Residents, broke the histogram into 10 bins (it is also possible to leave it on “auto”) and plotted a smoothed density curve:

Below is the view:
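The binning step behind the bars can be sketched in a few lines of Python. This is a minimal illustration assuming equal-width bins between the minimum and maximum value; the function name and sample values are hypothetical, and the smoothed density curve is left out.

```python
def histogram(values, bins=10):
    """Count how many values fall in each of `bins` equal-width ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value lands exactly on the upper edge; keep it in the last bin
        counts[min(int((v - lo) / width), bins - 1)] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

edges, counts = histogram(list(range(100)), bins=10)
print(counts)
# prints [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```

With 10 bins you get 11 edges; each bar's height is the count between two consecutive edges.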

All this information can be used to perform predictive analysis, such as predictive grouping (clustering). Below is my workflow:

Below are the settings of the K-Centroids Cluster Analysis tool (I selected all the fields apart from Latitude and Longitude and divided the data into 4 clusters; do not forget to check the report and to use the Append Cluster tool):
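For intuition about what "divide into 4 clusters" means, here is a minimal sketch of plain k-means (Lloyd's algorithm) in Python. It is not the R code the tool runs: I swapped in a deterministic farthest-point initialization so the example is reproducible, and the real tool's initialization may differ. The function names and the 2-D sample points are hypothetical. Since k-means relies on distances, fields on very different scales should be standardized first.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_point_init(points, k):
    """Deterministic start: repeatedly pick the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the old centroid if a cluster empties
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return labels, centroids

# Four hypothetical, well-separated groups of three points each
points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0),
          (0, 10), (1, 10), (0, 11), (10, 10), (11, 10), (10, 11)]
labels, centroids = kmeans(points, k=4)
print(labels)
```

The `labels` list plays the role of the Cluster field that the Append Cluster tool adds to each record.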

**After a process or analysis takes place, users need to investigate the results**. I found some of the data investigation tools very useful for this step.

As input data, I used the output of the previous predictive grouping workflow. I used a Select tool to convert the Cluster field into a string (it was a double before). Why? So that the data investigation tools can use it as a categorical field!

**Plot of Means:** this tool takes a numeric or binary categorical field as a response field, along with a categorical field, and plots the mean of the response field for each category of the categorical field. I selected Mean_Age as the response field and Cluster as the categorical field to group by:

You can see how age varies across the 4 clusters:
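The number behind each point of the plot is simply a group-by mean. Here is a minimal Python sketch of that calculation; the function name and the rows (including the ages) are hypothetical, and Cluster is kept as a string, as after the Select tool step.

```python
from collections import defaultdict

def means_by_category(records, response, category):
    """Mean of `response` for each distinct value of `category`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        key = record[category]
        totals[key] += record[response]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

# Hypothetical rows; Cluster is a string, so it works as a category
rows = [
    {"Cluster": "1", "Mean_Age": 30.0},
    {"Cluster": "1", "Mean_Age": 34.0},
    {"Cluster": "2", "Mean_Age": 45.0},
    {"Cluster": "2", "Mean_Age": 41.0},
]
result = means_by_category(rows, response="Mean_Age", category="Cluster")
print(result)
# prints {'1': 32.0, '2': 43.0}
```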

**Frequency Table:** this tool helps users understand the contents of their data. For each selected field, it produces a frequency table: a summary with the count and percentage of records for each value in that field. In my case, I wanted to look at the Cluster field:

Below is the interactive report:
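Under the hood, a frequency table is just counts plus percentages. A minimal Python sketch, with a hypothetical function name and sample values, sorted with the most frequent value first (the report's own sort order may differ):

```python
from collections import Counter

def frequency_table(values):
    """(value, count, percent) rows, most frequent value first."""
    counts = Counter(values)
    total = len(values)
    return [(value, n, 100 * n / total) for value, n in counts.most_common()]

for value, n, pct in frequency_table(["1", "2", "2", "3"]):
    print(value, n, f"{pct:.1f}%")
# prints:
# 2 2 50.0%
# 1 1 25.0%
# 3 1 25.0%
```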

I hope you enjoyed it.

A presto! (See you soon!)