Today’s task was to download New York snow plow data but it was us that got plowed.

 

We were advised to download the data and upload it to Exasol, a columnar storage database which is super fast in Tableau for large data sets. Given that the plow data set is over 250M records this seemed like a smart solution.

The problem was that the data on New York City streets was as a spatial file, great for tableau to plot streets on a map, but unable to be uploaded to Alteryx.

Our solution was to upload the plow data to exasol, access it in Tableau and join it to the spatial data there. This was not a good solution. Joining a local file to an Exasol database file is essentially as slow as having those 253M records locally.

My next solution was to upload an edited version of the spatial file, excluding the spatial elements and just including the zip codes. This worked relatively quickly and joined very nicely in Tableau, but was Tableau aggregated the zip codes into large areas, slightly smaller than Boroughs, which eliminated some of the finer details of the street by street plowing data.

The last option we came up with was to access the Exasol data in Tableau, create a datasource filter (filtering down to one or two days) and then to create an extract.

 

Given the fact that the datasource only takes about 4-5 minutes to load in Alteryx, we probably could have done the entire data preparation in Alteryx, and filtered the data down to our desired date range relatively easily.

In summary, Andy 1: DS11 0: Exasol 0