Building Bullet Graphs and Waterfall Charts with Bokeh

Source: PbPython.com

Introduction

In my last article, I presented a flowchart that can be useful for those trying
to select the appropriate Python library for a visualization task.
Based on some comments from that article, I decided to use Bokeh to create
waterfall charts and bullet graphs. The rest of this article shows
how to use Bokeh to create these unique and useful visualizations.

Bullet Graphs

In the comments section of the last article, Bryan Van de Ven provided example code for
creating a bullet graph in Bokeh. For those of you who do not know Bryan, he
is a Senior Software Engineer at Anaconda and is one of the creators of Bokeh. It is
safe to say he knows that library well, so I figured I should listen to his comments!

I took his example, expanded it a bit, and included it here so it can be
compared with the matplotlib process. While building these examples,
I learned a lot more about how to use Bokeh, and I hope this article will help others
do the same. I’ll be honest: I think the resulting code
is simpler to understand than the matplotlib bullet graph example.

I have posted this notebook on GitHub, so feel free to download it and use it to
follow along. Unfortunately, the Bokeh charts do not render on GitHub, but if you
wish to use this example, it should run on your system as long as the dependencies
are installed.

First, let’s do the imports and enable Bokeh’s output to display in our notebook:

from bokeh.io import show, output_notebook
from bokeh.palettes import PuBu4
from bokeh.plotting import figure
from bokeh.models import Label

output_notebook()

For this example, we’ll populate the data with Python lists. We could modify this
to fit in a pandas dataframe, but we will stick with simple Python data types for
this example:

data = [("John Smith", 105, 120),
        ("Jane Jones", 99, 110),
        ("Fred Flintstone", 109, 125),
        ("Barney Rubble", 135, 123),
        ("Mr T", 45, 105)]

limits = [0, 20, 60, 100, 160]
labels = ["Poor", "OK", "Good", "Excellent"]
cats = [x[0] for x in data]

The code is a pretty straightforward definition of the data. The one tricky portion
is building out a list of categories in the cats variable that will go on the y-axis.
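
If you would rather not write one list comprehension per column, the tuples can be transposed in a single step. This is only a small sketch in plain Python; the perf and comp names match the lists the article builds later with separate comprehensions:

```python
# Same (name, actual, target) tuples as in the article.
data = [("John Smith", 105, 120),
        ("Jane Jones", 99, 110),
        ("Fred Flintstone", 109, 125),
        ("Barney Rubble", 135, 123),
        ("Mr T", 45, 105)]

# zip(*data) transposes the rows into three parallel columns.
cats, perf, comp = (list(col) for col in zip(*data))
```

The result is the same three lists, so which style reads better is mostly a matter of taste.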

The next step is to create the Bokeh figure and set a couple of options related
to the way the x-axis and grid lines are displayed. As mentioned above, we use
the cats variable to define all the categories in the y_range:

# Recent Bokeh releases (3.x) use height/width; older ones used plot_height/plot_width
p = figure(title="Sales Rep Performance", height=350, width=800, y_range=cats)
p.x_range.range_padding = 0
p.grid.grid_line_color = None
p.xaxis[0].ticker.num_minor_ticks = 0

The next section will create the colored range bars using Bokeh’s hbar.
To make this work, we need to define the left and right range
of each bar along with the color. We can use Python’s zip function to
create the data structure we need:

list(zip(limits[:-1], limits[1:], PuBu4[::-1]))
[(0, 20, '#f1eef6'),
 (20, 60, '#bdc9e1'),
 (60, 100, '#74a9cf'),
 (100, 160, '#0570b0')]
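
One thing to keep in mind is that zip stops at the shortest input, so a missing color would silently drop a range bar. A minimal sketch of a guard follows; make_bands is a hypothetical helper of my own (not part of Bokeh), and the hex values are copied from the output above:

```python
def make_bands(limits, colors):
    """Pair consecutive limits with a color as (left, right, color) tuples."""
    # N limits describe N - 1 ranges, so we need exactly N - 1 colors.
    if len(colors) != len(limits) - 1:
        raise ValueError("need exactly one color per range")
    return list(zip(limits[:-1], limits[1:], colors))

limits = [0, 20, 60, 100, 160]
# Hex values taken from the zip output shown in the article (PuBu4 reversed).
colors = ['#f1eef6', '#bdc9e1', '#74a9cf', '#0570b0']
bands = make_bands(limits, colors)
```

Each tuple in bands can then be unpacked into left, right and color in the same loop the article uses.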

Here’s how to pull it all together to create the color ranges:

for left, right, color in zip(limits[:-1], limits[1:], PuBu4[::-1]):
    p.hbar(y=cats, left=left, right=right, height=0.8, color=color)

[Figure: Bokeh bullet graph start]

We use a similar process to add a black bar for each performance measure:

perf = [x[1] for x in data]
p.hbar(y=cats, left=0, right=perf, height=0.3, color="black")

The final marker we need to add is a segment that shows the target value:

comp = [x[2] for x in data]
p.segment(x0=comp, y0=[(x, -0.5) for x in cats], x1=comp,
          y1=[(x, 0.5) for x in cats], color="white", line_width=2)

[Figure: Bokeh bullet graph with performance bars and target markers]
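
The y0 and y1 values above are (category, offset) tuples, which is how Bokeh lets a glyph be nudged away from the center of a categorical position. As far as I understand it, each category is one unit tall, so offsets of -0.5 and 0.5 span the full row. The sketch below only builds the coordinate lists; target_marker_coords and its half_height parameter are hypothetical names used for illustration:

```python
def target_marker_coords(cats, targets, half_height=0.5):
    """Return (x0, y0, x1, y1) lists for one vertical tick per category."""
    # Each y value is a (category, offset) tuple around the category center.
    y0 = [(cat, -half_height) for cat in cats]
    y1 = [(cat, half_height) for cat in cats]
    # A vertical tick has the same x at both ends.
    return list(targets), y0, list(targets), y1

cats = ["John Smith", "Jane Jones"]
targets = [120, 110]
# Shorter ticks that match the 0.8 height of the range bars.
x0, y0, x1, y1 = target_marker_coords(cats, targets, half_height=0.4)
```

The four lists could then be passed to p.segment as x0, y0, x1 and y1.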

The final step is to add the labels for each range. We can use zip
to create the label structures we need and then add each label to the layout:

for start, label in zip(limits[:-1], labels):
    p.add_layout(Label(x=start, y=0, text=label, text_font_size="10pt",
                       text_color='black', y_offset=5, x_offset=15))

[Figure: Final bullet graph]
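
The limits and labels lists also work as a lookup table if you want to know which range a given value lands in. Here is a rough sketch using the standard library's bisect module; the rating function is hypothetical, and I am assuming that a value sitting exactly on a limit belongs to the higher range and that out-of-range values are clamped to the first or last label:

```python
from bisect import bisect_right

limits = [0, 20, 60, 100, 160]
labels = ["Poor", "OK", "Good", "Excellent"]

def rating(value):
    """Return the label of the range that contains value."""
    # bisect_right counts how many limits are <= value.
    idx = bisect_right(limits, value) - 1
    # Clamp so values below 0 or at/above 160 still get a label.
    idx = max(0, min(idx, len(labels) - 1))
    return labels[idx]

for name, actual in [("John Smith", 105), ("Jane Jones", 99), ("Mr T", 45)]:
    print(name, rating(actual))
```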

I think this solution is simpler to follow than the matplotlib example. Let’s
see if the same is true for the waterfall chart.

Waterfall Chart

I decided to take Bryan’s comments as an opportunity to create a waterfall chart in
Bokeh and see how hard (or easy) it is to do. He suggested the candlestick chart
as a good place to start, and I used it as the basis for this solution.
All of the code is in a notebook that is available here.

Let’s start with the Bokeh and pandas imports and enabling the notebook output:

from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource, LabelSet
from bokeh.models.formatters import NumeralTickFormatter
import pandas as pd

output_notebook()

For this solution, I’m going to create a pandas dataframe and use Bokeh’s
ColumnDataSource to make the code a little simpler. This has the added benefit of
making the code easy to adapt to read from an Excel file instead of the manually
created dataframe.

Feel free to refer to this cheatsheet if you need some help
understanding how to create the dataframe as shown below:

# Create the initial dataframe
index = ['sales', 'returns', 'credit fees', 'rebates', 'late charges', 'shipping']
data = {'amount': [350000, -30000, -7500, -25000, 95000, -7000]}
df = pd.DataFrame(data=data, index=index)

# Determine the total net value by adding the start and all additional transactions
net = df['amount'].sum()

              amount
sales         350000
returns       -30000
credit fees    -7500
rebates       -25000
late charges   95000
shipping       -7000

The final waterfall code is going to require us to define several additional
attributes for each segment including:

  • starting position
  • bar color
  • label position
  • label text

By adding these to a single dataframe, we can use Bokeh’s built-in capabilities to
simplify the final code.

For the next step, we’ll add the running total, segment start location and the position
of the label:

df['running_total'] = df['amount'].cumsum()
df['y_start'] = df['running_total'] - df['amount']

# Where do we want to place the label?
df['label_pos'] = df['running_total']
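
If the cumsum arithmetic is not obvious, the same numbers can be checked without pandas. This is just a sanity-check sketch using itertools.accumulate on the amounts from the dataframe above; each bar starts where the previous running total ended:

```python
from itertools import accumulate

amounts = [350000, -30000, -7500, -25000, 95000, -7000]

# Running total after each transaction (what cumsum produces).
running_total = list(accumulate(amounts))

# Each bar starts at the running total minus its own amount,
# which is the running total of the previous row (0 for the first row).
y_start = [total - amt for total, amt in zip(running_total, amounts)]

print(running_total)
print(y_start)
```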

Next, we add a row at the bottom of the dataframe that contains the net value:

df_net = pd.DataFrame.from_records([(net, net, 0, net)],
                                   columns=['amount', 'running_total', 'y_start', 'label_pos'],
                                   index=["net"])
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df, df_net])

For this particular waterfall, I would like to show the negative values in a different color
and place their labels below the bar. I also want the label text formatted with thousands
separators. Let’s add columns to the dataframe with the values:

df['color'] = 'grey'
df.loc[df.amount < 0, 'color'] = 'red'
df.loc[df.amount < 0, 'label_pos'] = df.label_pos - 10000
df["bar_label"] = df["amount"].map('{:,.0f}'.format)
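
To make the three rules explicit, here is the same logic for a single row written as a plain function. It is only an illustrative sketch; style_bar and its offset parameter are hypothetical names, with the default offset matching the 10000 used above:

```python
def style_bar(amount, running_total, offset=10000):
    """Return (color, label_pos, bar_label) for one waterfall bar."""
    negative = amount < 0
    # Negative amounts are drawn in red, everything else in grey.
    color = 'red' if negative else 'grey'
    # Labels for negative bars are shifted down by a fixed offset.
    label_pos = running_total - offset if negative else running_total
    # '{:,.0f}' formats with thousands separators and no decimals.
    bar_label = '{:,.0f}'.format(amount)
    return color, label_pos, bar_label

print(style_bar(-30000, 320000))
print(style_bar(95000, 382500))
```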

Here’s the final dataframe containing all the data we need. It did take some manipulation
of the data to get to this state but it is fairly standard pandas code and is
easy to debug if something goes awry.

              amount  running_total  y_start  label_pos color bar_label
sales         350000         350000        0     350000  grey   350,000
returns       -30000         320000   350000     310000   red   -30,000
credit fees    -7500         312500   320000     302500   red    -7,500
rebates       -25000         287500   312500     277500   red   -25,000
late charges   95000         382500   287500     382500  grey    95,000
shipping       -7000         375500   382500     365500   red    -7,000
net           375500         375500        0     375500  grey   375,500

Creating the actual plot is fairly standard Bokeh code since the dataframe has
all the values we need:

TOOLS = "box_zoom,reset,save"
source = ColumnDataSource(df)
# Recent Bokeh releases (3.x) use width; older ones used plot_width
p = figure(tools=TOOLS, x_range=list(df.index), y_range=(0, net + 40000),
           width=800, title="Sales Waterfall")
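
The y_range above works for this data because the net value plus 40000 is higher than the tallest running total. With other inputs, a large positive entry followed by large negative ones could push a bar above that limit. A hedged alternative is to base the range on the peak of the running total instead; waterfall_y_range and its padding parameter are hypothetical names, not anything Bokeh provides:

```python
from itertools import accumulate

def waterfall_y_range(amounts, padding=40000):
    """Return (bottom, top) so every bar and its label fits on the chart."""
    totals = list(accumulate(amounts))
    # Include 0 because the first bar and the net bar start at zero.
    top = max(totals + [0]) + padding
    bottom = min(totals + [0])
    # Leave room for labels below the axis if the total ever goes negative.
    if bottom < 0:
        bottom -= padding
    return bottom, top

amounts = [350000, -30000, -7500, -25000, 95000, -7000]
y_range = waterfall_y_range(amounts)
```

The resulting tuple could be passed straight to figure as y_range.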

By building the ColumnDataSource from our dataframe, we let Bokeh take care of creating
all segments and labels without any explicit looping:

p.segment(x0='index', y0='y_start', x1="index", y1='running_total',
          source=source, color="color", line_width=55)

We will do some minor formatting to add labels and format the y-axis nicely:

p.grid.grid_line_alpha = 0.3
p.yaxis[0].formatter = NumeralTickFormatter(format="($ 0 a)")
p.xaxis.axis_label = "Transactions"

The final step is to add all the labels onto the bars using the LabelSet:

labels = LabelSet(x='index', y='label_pos', text='bar_label',
                  text_font_size="8pt", level='glyph',
                  x_offset=-20, y_offset=0, source=source)
p.add_layout(labels)

Here’s the final chart:

[Figure: Final Waterfall Graph]

Once again, I think the final solution is simpler than the matplotlib code and the
resulting output looks pleasing. You also have the added bonus that the charts are
interactive and could be enhanced even more by using the Bokeh server
(see my Australian Wine Ratings article for an example). The code should also
be straightforward to modify for your specific datasets.

Summary

I appreciate that Bryan took the time to offer the suggestion to build these plots
in Bokeh. This exercise highlighted for me that Bokeh is very capable of building custom
charts that I would normally create with matplotlib. I will continue to evaluate options
and post updates here as I learn more. Feel free to comment below if you find this useful.