Data Gym - keeping in shape: shapefiles and Tableau

by Adi McCrea

Support for spatial files is quite a recent addition to Tableau. It’s also an absolutely marvellous one.

You can read, in depth, how to work with them in Tableau here.

In today’s post I’m only going to focus on the actual architecture of the shapefile.

 

Quick read:

 

  • The file with the .shp extension is the one you’ll add to Tableau using the ‘connect to spatial file’ option – filename.shp
  • Don’t delete any of the other associated files – filename.dbf, filename.prj, filename.shx

 

The end 🙂

 

 

Got some extra time?

Here’s a bit of background on spatial data and the shapefile.

 

Common types of spatial data

 

  • points – individual features, each located by a single x, y coordinate pair (x is longitude, y is latitude). This could, for example, be the location of an individual house
  • lines – typically used to illustrate linear features with some kind of implied flow. This could be streams or road centre-lines
  • polygons – these represent areas. This could be administrative areas (census or voting districts) or natural spaces (parkland, lakes, large rivers). We create one by ‘closing off’ a line, snapping its last vertex to its first
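
To make the three types concrete, here is a minimal sketch of how they can be written down as plain coordinate lists. The coordinates and the helper names (is_closed, close_off) are made up for illustration; they are not part of Tableau or of any shapefile library.

```python
# Hypothetical coordinates, purely for illustration (x = longitude, y = latitude).
point = (-0.1276, 51.5072)                                  # one house
line = [(-0.13, 51.50), (-0.12, 51.51), (-0.11, 51.52)]     # a stream centre-line


def is_closed(vertices):
    """A ring is closed when it has at least 4 vertices and ends where it starts."""
    return len(vertices) >= 4 and vertices[0] == vertices[-1]


def close_off(vertices):
    """Return a copy of the line with the first vertex repeated at the end."""
    vertices = list(vertices)
    if vertices[0] != vertices[-1]:
        vertices.append(vertices[0])
    return vertices


polygon = close_off(line)   # the same vertices, now enclosing an area
```

The only difference between the line and the polygon here is that last repeated vertex, which is what ‘closing off’ means in practice.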

Spatial datasets contain fields and rows just like any other dataset. The key thing with spatial is that we also have a geometry field where the shape and location of each feature is stored. To aid efficiency, the dataset is split across three mandatory files that are all referenced when we load the .shp file. That’s why it is very important to keep all the spatial files together, not just the .shp.
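
If you want to check a folder before connecting, something like the sketch below would do it. It assumes the companion files share the .shp file’s name and use lower-case extensions; missing_companions is a hypothetical helper name, not a Tableau feature.

```python
from pathlib import Path

# The three mandatory parts of a shapefile.
REQUIRED = (".shp", ".shx", ".dbf")


def missing_companions(shp_path):
    """List the mandatory files that are absent next to the given .shp path."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED if not base.with_suffix(ext).exists()]
```

For example, missing_companions("data/filename.shp") returns an empty list when all three files sit side by side in the data folder.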

 

What do you get when you do spatial?

 

  • filename.shp – this contains the actual geometry of your data so it is responsible for drawing the shape of the points, lines or polygons in your dataset
  • filename.shx – this is an index file. It records where each shape starts in the .shp and how long it is, so software can jump straight to a given feature rather than reading the whole file from the top
  • filename.dbf – this is a dBASE-format table that contains object IDs and object attributes, one row per shape. The software looks to this file to read the attribute information associated with each geometry feature. Don’t get rid of this one. It’s possible to open and edit the data in a spreadsheet tool such as Excel. However, save any changes as a different file. The rows must stay in the same order, and the same number, as the shapes in the .shp; if you alter the original .dbf and that link breaks, you won’t see what you expect!
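
For the curious, here is a rough sketch of what sits inside the .shp and .shx, using only Python’s standard library. The byte layout follows the published Esri shapefile format; the function names are my own, and only the most common 2D shape types are named.

```python
import struct

# Codes for the most common 2D shape types.
SHAPE_TYPES = {0: "null", 1: "point", 3: "polyline", 5: "polygon", 8: "multipoint"}


def read_shp_header(data):
    """Decode the fixed 100-byte header found at the start of a .shp or .shx file."""
    (file_code,) = struct.unpack(">i", data[0:4])          # big-endian, always 9994
    if file_code != 9994:
        raise ValueError("this does not look like a shapefile")
    (length_words,) = struct.unpack(">i", data[24:28])      # length in 16-bit words
    version, shape_type = struct.unpack("<ii", data[28:36])  # little-endian
    bbox = struct.unpack("<4d", data[36:68])                # xmin, ymin, xmax, ymax
    return {
        "bytes": length_words * 2,
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, shape_type),
        "bbox": bbox,
    }


def read_shx_records(data):
    """Return one (byte offset, byte length) pair per shape listed in the .shx."""
    body = data[100:]                                       # skip the header
    usable = len(body) - len(body) % 8                      # each entry is 8 bytes
    return [
        tuple(2 * words for words in struct.unpack(">ii", body[i:i + 8]))
        for i in range(0, usable, 8)
    ]
```

You would pass in the raw bytes, for example read_shp_header(open("filename.shp", "rb").read()). Each pair from read_shx_records says where one shape’s record begins in the .shp and how long its content is, which is exactly the shortcut the index exists to provide.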

 

There are some other, non-essential, file extensions within the spatial family. You may or may not see them. If they’re there, keep them together with the above essentials. If not, don’t worry!

 

  • filename.prj – this is a small text file describing the coordinate system and projection that the data uses. Projections are a bit of a specialist topic within spatial so I won’t cover them here. You can find out more about map projections and coordinate systems here. For our purposes, remember that Tableau works with the WGS 84 coordinate system
  • you may also see .sbn, .cpg, .sbx, .xml
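
Because the .prj is plain text, you can peek at it yourself. The sketch below is a rough check, not a proper parser: it assumes the file holds a WKT string in the usual style, and looks_like_wgs84 is a hypothetical helper name.

```python
def looks_like_wgs84(prj_text):
    """Rough check: does this .prj text describe unprojected (geographic) WGS 84?"""
    text = prj_text.strip().upper()
    # Geographic systems start with GEOGCS; projected ones start with PROJCS.
    return text.startswith("GEOGCS") and "WGS_1984" in text


# The WKT string Esri tools typically write for WGS 84.
esri_wgs84 = (
    'GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",'
    'SPHEROID["WGS_1984",6378137.0,298.257223563]],'
    'PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]'
)
```

Here looks_like_wgs84(esri_wgs84) returns True, while a projected definition that starts with PROJCS would return False.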

 

 

Spatial can be a bit of a head trip. Nudge me on Twitter @AdiBop_ if you need a bit of help!