An Introduction to Converting SQL Code to Alteryx &Tableau Prep

by Jonathan Allenby

This guide will serve as an introduction to taking a chunk of SQL code, or a Stored Procedure, and recreating the logic and analysis within an Alteryx or Tableau Prep Workflow. It will not be a deep-dive into learning SQL or SQL logic, but will provide a good foundation for how to tackle this kind of challenge.

A basic SQL statement is built uaing of an ordered subset of the following key words:

  • SELECT
    • TOP
    • AS
  • INTO
  • FROM
  • JOIN
    • ON
  • WHERE
  • GROUP BY
  • HAVING
  • ORDER BY

Most SQL statements will not include all of these, and there are a few other key words but understanding this subset should cover you in 95% of situations.

Re-Ordering the Statements

Despite SQL statements requiring a strict ordering of the above key words, they aren’t actually processed in this order. This is a big reason why it can be confusing to start trying to translate this into data prep software like Alteryx and Tableau Prep.

Here is how you can re-order the statements to fit the actual order of logic in a way that fits with other software.

1. FROM

The FROM clause defines the first input to be used in the process.

In Alteryx: Input Tool
In Tableau Prep: Input Step

2. JOIN

The JOIN clause defines any additional data sources required in order to produce either the fields defined in the SELECT statement or the number of rows in the output. In SQL, you will usually come across one of 5 join types:

  1. JOIN: An inner join.
    • In Alteryx: the J output of a Join tool.
    • In Tableau Prep: a Join step with just the inner area selected.
  2. RIGHT/LEFT JOIN: A right or left outer join.
    • In Alteryx: the L/R & J outputs of a Join tool unioned together.
    • In Tableau Prep: a Join step with the entire left/right circle selected.
  3. FULL JOIN: A full outer join.
    • In Alteryx: the L, R, & J outputs of a Join tool all unioned together.
    • In Tableau Prep: a Join step with the entire venn diagram selected.
  4. CROSS JOIN: the Cartesian Product of two data sets, i.e. all rows in Table01 joined to all rows in Table02.
    • In Alteryx: the Append tool.
    • In Tableau Prep: create a calculated field for both datasets containing a constant value; then use a Join step that uses this dummy field as the join.

ON

Join conditions are defined by the ‘ON’ keyword. In Tableau Prep and SQL it is possible to use conditional joins, e.g. joining on FieldX > FieldY. However, in Alteryx you can only create ‘equals’ joins. In order to replicate these joins you will may need to first join (if there are other non-conditional join clauses) or append the data from Table02 onto Table01 and then filter based on the join condition, e.g. FieldX > FieldY

3. WHERE

Removes all rows after the FROM & JOIN that do not fulfil the WHERE condition.

In Alteryx: The Filter Tool.
In Tableau Prep: Any of the filtering methods.

4. SELECT (everything apart from TOP)

The SELECT clause defines the fields that will be in the final output. This can include pre-existing fields from any of the tables defined in the FROM and JOIN clauses, but may also include calculations to create new fields as well as aggregation functions (COUNT, MAX, MIN, SUM, AVG).

If any of the aggregation functions are used, the SQL statement will usually include the GROUP BY clause to define the fields the aggregation calculations are being calculated for.

TOP

The TOP keyword defines a number or percentage or rows to return. This keyword gets applied after nearly all other steps as the order of the results is important.

AS

The AS keyword may get used during this clause for one of three purposes:

  1. Renaming a field.
  2. Assigning a field name to a calculation, e.g. ( Profit / Sales ) AS “Profit Ratio”.
  3. Assigning a field name to an aggregation, e.g. SUM( Sales ) AS “Total Sales”.

In Alteryx:

  • Choosing/Renaming fields: the Select tool.
  • Calculating fields: the Formula tool.
  • Aggregating fields: the non-“Group By” settings in the Aggregation tool.

In Tableau Prep:

  • Choosing/Renaming fields: any step by clicking on field names or deleting fields.
  • Calculating fields: any step using any of the methods of creating a new calculated field.
  • Aggregating fields: the “Aggregated Fields” side of the Aggregation step.

6. GROUP BY

The GROUP BY clause usually only gets included if an aggregation was defined in the SELECT clause. The GROUP BY clause defines the fields that the aggregations are calculated for. For example, SELECT SUM( Sales ), Region, ... GROUP BY Region will sum up the Sales for each Region.

In Alteryx: The “Group By” setting in the Aggregation Tool.
In Tableau Prep: The “Grouped Fields” side of the Aggregation Step.

7. HAVING

The HAVING clause acts as a filter after the aggregation has been performed, e.g. HAVING SUM( Sales ) > 500. This differs from the WHERE clause which filters at the row level before the aggregation.

In Alteryx: A Filter tool (after an Aggregation tool).
In Tableau Prep: Any of the filtering methods (after an Aggregation step).

8. ORDER BY

ORDER BY defines which fields and order to sorts the results after all previous clauses have been completed.

In Alteryx: the Sort tool.
In Tableau Prep: as of 2020.1.4 (29/04/2020) Tableau Prep cannot sort the output. This is because Prep’s primary purpose is to prepare data for Tableau Desktop in which the order of rows is not important.

8. SELECT TOP

As mentioned in the SELECT section of this post, TOP gets applied after all other clauses apart from INTO. This is because the order of the rows is important as it only returns the defined number or percentage of rows from the top of the output. For example, SELECT TOP (1) Region… ORDER BY SUM( Sales ) DESC will return just the region with the top sales; SELECT TOP (1) Region … ORDER BY SUM( Sales ) DESC will return just the region with the lowest sales; and SELECT TOP (1) Region without an ORDER BY clause will simply return whichever region happens to be at the top of the data set.

In Alteryx: The Sample tool or the Select Records tool.
In Tableau Prep: Very difficult to near impossible to achieve. 

9. INTO

The INTO clause defines the table that is going to contain the data selected and prepared by all the previous clauses, either for analysis or to use within other SQL blocks. For complex SQL procedures, the INTO clause can be used to break up the procedure into granular manageable chunks.

In Alteryx: The Output tool if it is the final output.
In Tableau Prep: The Output step if it is the final output.

Bringing it all together

The guidance above should help guide you when learning how to convert SQL to Alteryx or Tableau Prep, and possible vice versa too! The galleries below show how an SQL statement would look when written in Alteryx and Tableau Prep, as well as their relevant outputs to show that all three accomplish the same task.

Below you can see a single process but written in SQL, Alteryx, and Tableau Prep.

And here you can see the outputs of all three to prove they all accomplish the same task.

SQL Output
Alteryx Output
Prep Output

Avatar

Jonathan Allenby

Dashboard Week Day 2: The Perilous Life of Being an Avenger

2 mins read

Thu 25 Apr 2019

Dashboard Week - Day 1: NHS Trust Expenditure

2 mins read

Tue 23 Apr 2019

Tableau Prep: Using "Group & Replace" to catch & fix spelling errors.

2 mins read

Tue 26 Mar 2019

Jon's DS13 Week 7 Recap

2 mins read

Mon 25 Mar 2019