Tableau Tip #4 - Serving up some rules to optimise your Tableau Server

by Ravi Mistry

We’ve had the legend that is Jonathan MacDonald teaching us Tableau Server this week, and the fact is, it’s been super cool. I have no knowledge of anything server related, so having JMac here, as well as *the Guy* for Tableau Server, David Spezia, coming in to run a master-class, has been invaluable. The master-class included a number of ‘rules of thumb’, and today Alexandra and I have put together a few tips about this product for your perusal.

Sasha (Alexandra) has covered how best to configure the server for optimal performance depending on your usage, and I’m going to run through a few ideas to keep in mind when purchasing hardware for the server, if you’re starting from scratch.

First of all, if you are installing a cluster of servers, ensure that all the machines are running the same architecture: either all 32-bit or all 64-bit. Mixing architectures can cause communication issues between the configured machines once the software is installed – it’s simply inadvisable to ask different types of systems to communicate with each other. Something will break.

Hardware-wise, you should look to provision:

  • 8GB of RAM per core of your server
  • 500MB to 2GB of hard disk space per user (depending on how heavily you use extracts)
  • Disk read and write speeds greater than 400MB per second, up to about 750MB/s (beyond that it gets prohibitively expensive!), ideally on solid-state drives
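Since these sizing rules are simple arithmetic, here’s a minimal Python sketch of them. The function names are my own invention, and the numbers are just the rules of thumb above, not official Tableau figures:

```python
def recommended_ram_gb(cores: int) -> int:
    """8GB of RAM per core of the server."""
    return cores * 8

def recommended_disk_gb(users: int, heavy_extracts: bool) -> float:
    """500MB of disk per user for light extract use, up to 2GB for heavy use."""
    per_user_gb = 2.0 if heavy_extracts else 0.5
    return users * per_user_gb

# Example: a 16-core server serving 200 extract-heavy users
print(recommended_ram_gb(16))          # 128 (GB of RAM)
print(recommended_disk_gb(200, True))  # 400.0 (GB of disk)
```

Plug in your own core and user counts to get a starting point before you price up hardware.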

There are 13 components within Tableau Server; a number of these are configurable, and there are some rules of thumb you should follow when distributing the load across your server.

The gateway is a key part of the server; it must exist, as it is the port through which traffic comes in and out. If you’re setting up a server to run for a long period of time, you should look to include several gateways – this ensures traffic is distributed, especially given the round-robin structure of the gateway module. If many users will access the server at once and you want to make sure it doesn’t go down, an external load balancer may be recommended. This acts almost as a traffic light, ensuring that no one server is overwhelmed. It can also double up as a proxy, but that’s for another post (or an alternate blog!)
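The round-robin pattern mentioned above can be sketched in a few lines of Python. This is purely an illustration of how requests get spread evenly across gateways – the gateway names are hypothetical, and this is not Tableau’s actual implementation:

```python
from itertools import cycle

# Hypothetical gateway names for illustration only.
gateways = ["gateway-1", "gateway-2", "gateway-3"]
next_gateway = cycle(gateways)

# Ten incoming requests are handed out to the gateways in turn.
assignments = [next(next_gateway) for _ in range(10)]
print(assignments)
# Requests rotate: gateway-1, gateway-2, gateway-3, gateway-1, ...
```

No single gateway ends up with more than its fair share of requests, which is exactly why adding gateways spreads the traffic around.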

VizQL is a massive CPU hog – you should look to have:

  • 1 VizQL process per 25–50 concurrent users
  • But no more than 8 VizQL processes running on one machine
  • VizQL is also dependent on having an Application Server, and there should be one per VizQL process. One Data Engine can serve 2 VizQL processes, but if your workload is likely to include heavy data extracts, you should move to a 1:1 relationship to optimize performance.
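The VizQL rules above can also be expressed as a quick calculation. This is a sketch under the assumptions stated in the bullets – I’ve picked 40 users per process as a mid-range value from the 25–50 guideline, and the function name is my own:

```python
import math

def vizql_processes(concurrent_users: int, users_per_process: int = 40) -> int:
    """One VizQL process per 25-50 concurrent users (40 assumed here),
    capped at 8 processes on a single machine."""
    needed = math.ceil(concurrent_users / users_per_process)
    return min(8, max(1, needed))

print(vizql_processes(100))  # 3
print(vizql_processes(500))  # 8 -- capped; spread across more machines
```

Past the cap of 8, the answer isn’t more processes on the same box – it’s another machine in the cluster.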

Like VizQL, the backgrounder takes up a lot of CPU, so it should generally sit on its own machine where possible. A good rule of thumb for the number of backgrounder processes is the number of cores on the server divided by 2.
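That cores-divided-by-two rule is trivial but easy to mis-apply on small boxes, so here it is as a one-liner sketch (the floor to a minimum of one process is my own assumption):

```python
def backgrounder_processes(cores: int) -> int:
    """Rule of thumb from the post: cores / 2, with at least one process."""
    return max(1, cores // 2)

print(backgrounder_processes(8))   # 4
print(backgrounder_processes(16))  # 8
```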

The cluster controller and the co-ordination service are linked; they are the failsafes within the server, and they work together in the event of an emergency. The Cluster Controller acts as a smoke alarm, monitoring all the processes in a cluster, whilst the co-ordination service points at the machine which will solve the problem – it essentially rings 999 or 911 to call the fire brigade to put the fire out.

For every Application Server there should also be a Search & Browse process, as this allows the server environment to be queried, filtering the data or content on the server and returning what is requested. Wherever Search & Browse runs, there should also be a repository. The repository is the PostgreSQL part of the server, and it has both an active and a passive role: one part reads, writes and executes actions such as maintaining the file structure, whilst the other logs every query in case the first fails. Because the repository stores the server’s data, it naturally has to sit on the machine handling a query in order to log that the query occurred.

I understand that this has been a long and fairly complex blog post, but I hope it helps you gain some insight into best practices for setting up your server hardware and configuring it for optimal performance.