Explaining with: Power BI Boxplots without Custom Visuals

by Robbin Vernooij

As my colleague Jenny Martin was reworking some of our Stats101 content for the training curriculum, we ran into an interesting challenge of trying to create boxplots in different kinds of visualisation tools.

Turned out there is no native boxplot visual in Power BI so we spend some time trying to create them without having to fall back on custom visuals.

The elements of a boxplot are relatively easy to calculate using DAX functions so we took the challenge head on and came up with the following:


Not too shabby, so let's start walking through the steps we took to create this visual and along the way focus on explaining what boxplots do.

Hopefully the first of many in the new series I'm trying to work on: 'Explaining with'


We'll start by going over the datapoints we're looking to investigate.

We're using a sample superstore dataset with transactional data at the order id and item id level. Each record has a profit attached to it and we're interested in the distribution of data for each item sale across the different categories of items we sell.


By default, the scatter plot visual does not allow for 3 discrete categories to break up the visual, hence we have to use three separate scatter plots. In order to align the y-axis for both, we need a measure that finds the maximum and minimum value across all three Categories.

We create two measures using the CALCULATE() function to remove the filters only from the Category field, so it can re-evaluate the minimum and maximum later on when we user other filters/slicers. They've been added at constant lines in the chart above to better visually represent them for this demonstration. They can be added as functions on the y-axis instead if desired.

max profit all categories = 
CALCULATE(
    MAX(Orders[Profit])
    ,REMOVEFILTERS(Orders[Category])
)

min profit all categories = 
CALCULATE(
    MIN(Orders[Profit])
    ,REMOVEFILTERS(Orders[Category])
)

measures

We could have created a histogram instead to see the distribution but we're equally interested in the middle value (median), what the spread is of 50% of the middle part of the distribution and what we could consider outliers.

Therefore, I've decided to leave the individual profit values at as circle marks, but spread out randomly using the RAND() function. The overlapping circles visually enhance the density of marks near the median.


Up next are the main elements of the boxplots, the median, 25th/75th percentiles or 1st and 3rd quantiles, and error bars.

As we are interested in the row level information, we create the three measures to that outline the 25th percentile (bottom of the boxplot, where the bottom 25% of the data falls below), median (line dividing the boxplot marking, identical to PERCENTILE.INC([values], 0.5) ) and 75th percentile (top of the box plot, where the top 25% of the data sits above).

Median Profit = 
MEDIAN(Orders[Profit])

25th Percentile of Profit = 
PERCENTILE.INC(Orders[Profit],0.25)

75th Percentile of Profit = 
PERCENTILE.INC(Orders[Profit],0.75)

measures

Typically this should be enough to plot constant lines to give us a nice division of the data, however, one of the major advantages of a boxplot is to see the size of the box, also referred to as Inter Quartile Range (IQR, the spread of 50% of your inner most data points).

In order to see this value we need to subtract the 25th from the 75th percentile.

Interquartile Range = 
[75th Percentile of Profit] - [25th Percentile of Profit]

measure

This will now allow us to build the visual using a line and stacked column chart is displayed in the embedded report below.


  • add the Category to the x-axis
  • add the Median Profit to the line y-axis
  • format the visual > line > 0 pixels (px)
  • format the visual > markers > turn on > shape - > ~15 px
  • format the visual > markers > colors > dark color of choice
  • place the 25th Percentile of Profit on the Column y-axis
  • Add the Interquartile Range below it on the Column y-axis
  • (see embedded report below) Format the Column series colors;
    • 25th Percentile of Profit > 100% transparency
    • Interquartile Range > 20% transparency and color of choice

These form the basis of our box plots!


For the the error bars we use a common definition that would show us if values are outside above or below 1.5 times the interquartile range from the 25th or 75th percentile (bottom and top of the box).

Typically, these error bars would show up to the full extend of the 1.5 times IF there are values that are beyond the bars. If this is not the case, most boxplots would extend the error bars to the maximum value found in the data. Hence we have to build in an IF statement to do this for us dynamically.

Lower Whisker = 
IF(
    MIN(Orders[Profit])<[25th Percentile of Profit]-1.5*[Interquartile Range]
    ,[25th Percentile of Profit]-1.5*[Interquartile Range]
    ,MIN(Orders[Profit])
)

Upper Whisker = 
IF(
    MAX(Orders[Profit])>[75th Percentile of Profit]+1.5*[Interquartile Range]
    ,[75th Percentile of Profit]+1.5*[Interquartile Range]
    ,MAX(Orders[Profit])
)

measures

For the error bars we go to the Visualisation Pane > add further analysis > error bars.

Lastly, we'd want to synchronise the visualisation the same as we synchronized the scatter plots. Format your visual > y-axis > range > fx for min and max (lower and upper whisker respectively).

This will now allow us to align the scatter plots behind the boxplot, add some bookmarks if you'd like to show and hide the marks. Not going over bookmarks in detail this time around, but feel free to suggest it as a topic through social media.