Easily Level Up your EDA with Jupyter Notebook Widgets

Richard Mei
Aug 23, 2020


Are you tired of coding up a graph in your Jupyter Notebooks and finding yourself needing to copy and paste or edit a small bit of the code to make it work for another variable? Maybe you have a function that takes in multiple variables and you want to try multiple different inputs very quickly.

Well, even if you can’t relate to those problems, I’m sure you can find some use for Jupyter Widgets! This is a library that makes your notebooks interactive by adding controls inside your Jupyter Notebook, such as sliders, checkboxes, text boxes, and even drop down menus. That is only a small part of what this library can do, but these are the capabilities I’m going to explore today!

Getting Started

To start off, install the library with either pip or conda:

pip install ipywidgets
conda install -c conda-forge ipywidgets

After doing that, it’s time to import the library. For today I’ll only be going through the basics of the “interact” function.

from ipywidgets import interact

The function is very simple: it takes a function as its first parameter, followed by one keyword argument for each parameter that function needs. The data type you pass for each argument decides which type of interactive control you get (e.g. a slider or a drop down), so choose the data type to match the widget you want.

As a very quick example, we can have a simple function called summation that takes 2 variables and adds them up. We want to look at different summations of numbers, so we use a slider for the first variable x, ranging from 0 to 10, and a drop down menu for the second variable y, with values 0, 5 and 10.

def summation(x, y):
    return x + y

# x gets a slider from 0 to 10; y gets a drop down with 0, 5 and 10
interact(summation, x=(0, 10), y=[0, 5, 10])

This code will produce an interface inside Jupyter with a slider for x and a drop down menu for y.

The menu for y lists all the possible values from the given list. Inside the notebook you should be able to move the slider left and right as well as pick from the drop down, and see an output that is the sum of x and y.
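To make the link between data type and widget a bit more concrete, here is a minimal sketch that asks ipywidgets which control it would build for a few common argument types. It uses “interactive”, the sibling of “interact” that returns the widget container instead of displaying it. The function name echo and the example values are my own placeholders, not part of the library.

```python
from ipywidgets import interactive


def echo(a, b, c, d, e):
    """Hypothetical helper that simply returns its inputs."""
    return a, b, c, d, e


# interactive() builds the same controls as interact(), but returns the
# container so the generated widgets can be inspected.
panel = interactive(
    echo,
    a=(0, 10),          # tuple of ints
    b=(0.0, 1.0, 0.1),  # tuple of floats: (min, max, step)
    c=[0, 5, 10],       # list of options
    d=True,             # bool
    e="avocado",        # str
)

# The last child is the output area; the others are the controls,
# in the same order as the function's parameters.
widget_types = [type(w).__name__ for w in panel.children[:-1]]
print(widget_types)
```

Running this should show one widget class per argument, which is a quick way to check what a given value will turn into before wiring it to a real plot.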

Interactive EDA

Now how can we use this to do EDA? There isn’t one simple way for everything, but I’ll show an example using avocado prices from a data set I found on Kaggle.

This data set has weekly prices of small, large, and extra large Hass avocados recorded from different regions throughout the U.S. I used Pandas to clean up and hold the data as well as to make the graphs. I end up with a DataFrame called data_partial, with columns that include Date, region, type, AveragePrice and Total Volume, followed by some code for graphing:

def plot_line_data(region, kind, y):
    selection = data_partial[
        (data_partial['region'] == region) & (data_partial['type'] == kind)
    ]
    selection.plot('Date', y, kind='line')

a = interact(plot_line_data,
             region=data_partial.region.unique(),
             kind=['conventional', 'organic'],
             y=['AveragePrice', 'Total Volume'])

The function “plot_line_data” has a variable called selection that holds only the rows matching the criteria we want to look at. In this example I am only taking the first 1000 entries of the data set to make a time series graph. Inside the interact function, region is a “list” of the different regions inside the region column, which means it gets a drop down widget. We also have more drop downs: “kind” of avocado as conventional or organic, and “y” as our dependent variable of average price or total volume.
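The selection step is just a boolean mask in Pandas, and it can be tried on its own without any widgets. Below is a small sketch under a few assumptions: the tiny data_partial here is a made-up stand-in with invented values (not the real Kaggle data), and select_rows is a hypothetical helper name I am using for illustration.

```python
import pandas as pd

# Toy stand-in for data_partial; every value here is made up for illustration.
data_partial = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2015-01-04", "2015-01-11", "2015-01-04", "2015-01-11"]
    ),
    "region": ["Albany", "Albany", "Boston", "Boston"],
    "type": ["conventional", "organic", "conventional", "conventional"],
    "AveragePrice": [1.22, 1.79, 1.02, 1.10],
})


def select_rows(df, region, kind):
    """Return the rows for one region and one avocado type, sorted by date."""
    mask = (df["region"] == region) & (df["type"] == kind)
    return df[mask].sort_values("Date")


selection = select_rows(data_partial, "Boston", "conventional")
print(selection)
```

Keeping the filter in its own function like this makes it easy to check that the right rows come back before handing the result to a plot.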

Conclusion

The advantage of this is obvious! I can change the region to any of the 50+ region values and even change which variable I want to look at, all without touching the code. Depending on what I wanted to look at, I could have changed my selection to look only at certain size avocados or even graphed two different sizes into one graph. This is certainly a lot easier than editing the code by hand every time I want a different region or a different variable.
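As a rough idea of how graphing two sizes into one graph could look, here is a sketch that draws one line per requested column on shared axes. The DataFrame, the column names small and large, and the function plot_sizes are all hypothetical placeholders with made-up numbers, so swap in whatever your cleaned data actually uses.

```python
import matplotlib
matplotlib.use("Agg")  # lets the sketch run outside a notebook; drop this line in Jupyter
import pandas as pd

# Made-up weekly volumes for two avocado sizes (illustrative values only).
sizes = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-04", "2015-01-11", "2015-01-18"]),
    "small": [100, 120, 90],
    "large": [300, 280, 310],
})


def plot_sizes(df, columns):
    """Draw one line per requested column on a single set of axes."""
    return df.plot(x="Date", y=list(columns), kind="line")


ax = plot_sizes(sizes, ["small", "large"])
n_lines = len(ax.get_lines())
print(n_lines)
```

Passing a list of columns as y is what puts both series on the same graph, so the same function works for one size or several.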

I hope this post helps you level up your EDA! As I said before, this is only a little glimpse of what this library has to offer, and you should definitely check out the official documentation for more!
