Exploring Hass Avocado Prices

Richard Mei
4 min read · Nov 2, 2020

Which U.S. state has the lowest avocado prices?

Motivation

In this day and age, who doesn’t like avocado? Probably only those who are allergic, but for everyone else there’s always the cost problem. Whether you are an individual or a small business, where can you find the cheapest avocados?

Another student and I collaborated on a time series analysis to answer this question.

Data

We got our data from a Kaggle dataset posted two years ago by Justin Kiggins. He has this amazing quote that really highlights his motivation:

“It is a well known fact that Millenials LOVE Avocado Toast. It’s also a well known fact that all Millenials live in their parents’ basements. Clearly, they aren’t buying home[s] because they are buying too much Avocado Toast! But maybe there’s hope… if a Millenial could find a city with cheap avocados, they could live out the Millenial American Dream”

As he says in his post, he compiled the information from the official Hass Avocado Board website and put it into a CSV file. The data set has about 18,000 rows and 100 columns. Each row records the average price for one date, avocado type, and region, along with the total volume, the number of small, large, and extra-large bags sold, and the volume sold under each price look-up (PLU) code: ‘4046’, ‘4225’, and ‘4770’, which represent the sizes of avocados. So there are three sizes of avocados and two types, conventional and organic. Since our goal was to find the average price for a single avocado, we dropped the bag-size columns and ended up with just the type, average price, total volume, and region.
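The column trimming described above might look roughly like the sketch below. The column names are assumptions based on the public Kaggle file, the `trim_columns` helper is hypothetical, and the two sample rows are made-up placeholder values rather than real data. The date column is kept here as well, on the assumption that it is needed for the time series work later on.

```python
import pandas as pd

# Column names assumed from the public Kaggle CSV; adjust if your copy differs.
BAG_COLUMNS = ["Total Bags", "Small Bags", "Large Bags", "XLarge Bags"]
KEEP_COLUMNS = ["Date", "type", "AveragePrice", "Total Volume", "region"]


def trim_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the bag-size columns and keep only the fields used in the analysis."""
    trimmed = df.drop(columns=BAG_COLUMNS, errors="ignore")
    trimmed = trimmed[KEEP_COLUMNS].copy()
    trimmed["Date"] = pd.to_datetime(trimmed["Date"])
    # Sort chronologically so each region/type forms an ordered series.
    return trimmed.sort_values("Date").reset_index(drop=True)


# Made-up placeholder rows standing in for the real file (not actual prices).
sample = pd.DataFrame(
    {
        "Date": ["2015-01-11", "2015-01-04"],
        "AveragePrice": [1.10, 1.05],
        "Total Volume": [1000.0, 1200.0],
        "Total Bags": [300.0, 350.0],
        "Small Bags": [200.0, 250.0],
        "Large Bags": [90.0, 90.0],
        "XLarge Bags": [10.0, 10.0],
        "type": ["conventional", "conventional"],
        "region": ["California", "California"],
    }
)

trimmed = trim_columns(sample)
print(trimmed)
```

With the real file, `sample` would be replaced by something like `pd.read_csv(...)` pointed at wherever the CSV was saved.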

Our data runs from January 2015 to March 2018. After cleaning, our regions came down to 27 unique states, each with 169 data points. Our EDA showed no obvious pattern of prices constantly increasing, so some other factor may have been driving the prices.

We also looked at the volume sold per region and saw that California sold an overwhelming amount compared to the other regions. Next came Texas and then New York. The lowest volumes were in Idaho and Kentucky.

Modeling

We started our modeling by testing for stationarity with the statsmodels library. Using the Augmented Dickey-Fuller test, we saw that our regions were stationary. We wanted to use ARIMA and SARIMA models, so we looked at the autocorrelation function (ACF) and partial autocorrelation function (PACF) to come up with a list of terms to try. We then looped over that list to try all possible combinations and ended up with models that had root mean square errors (RMSE) of about $0.78 to $0.86 for both conventional and organic avocados. These were not very good results, considering that avocado prices ranged from about $1 to $3.
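The workflow above can be sketched as follows. This is a minimal illustration, not our exact code: the helper names (`rmse`, `candidate_orders`, `is_stationary`, `grid_search`), the 0.05 significance level, and the example term lists are all assumptions. It covers plain ARIMA only, and the statsmodels imports are placed inside the functions so the pure-Python helpers work on their own.

```python
import itertools
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    actual, predicted = list(actual), list(predicted)
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def candidate_orders(p_values, d_values, q_values):
    """Every (p, d, q) combination from the terms suggested by the ACF/PACF plots."""
    return list(itertools.product(p_values, d_values, q_values))


def is_stationary(series, alpha=0.05):
    """Augmented Dickey-Fuller test: a p-value below alpha rejects a unit root."""
    from statsmodels.tsa.stattools import adfuller  # imported lazily

    p_value = adfuller(series)[1]
    return p_value < alpha


def grid_search(train, test, orders):
    """Fit one ARIMA per order and return the order with the lowest test RMSE."""
    from statsmodels.tsa.arima.model import ARIMA  # imported lazily

    best_order, best_score = None, math.inf
    for order in orders:
        try:
            fitted = ARIMA(train, order=order).fit()
        except Exception:
            # Some orders fail to fit; skip them and keep searching.
            continue
        score = rmse(test, fitted.forecast(steps=len(test)))
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score


# Example term lists (assumed values, not the ones from our plots).
orders = candidate_orders([0, 1, 2], [0, 1], [0, 1, 2])
print(len(orders), "candidate orders")
```

For scale, an RMSE of $0.80 on prices between $1 and $3 means a typical forecast misses by a large fraction of the price itself, which is why we kept looking for a better model.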

We then turned to Facebook Prophet, an open source forecasting library from Facebook. This was simple to do, since the pre-processing only required us to load our data into a Pandas dataframe. After that, it was just a matter of creating model objects and using the built-in forecast functions. We ended up with an RMSE of ~$0.20 for organic avocados and ~$0.16 for conventional ones, so we went on to model different states using Prophet. Since we were under a time crunch, we ended up only targeting the states with the highest (California), lowest (Georgia), and median (Indiana) volumes.
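A rough sketch of that Prophet workflow is below. Prophet expects a dataframe with a `ds` (date) column and a `y` (value) column. The helper names, the weekly frequency, the default model settings, and the placeholder sample rows are assumptions for illustration and not our exact setup. The package is imported as `prophet` here; older releases were published as `fbprophet`.

```python
import pandas as pd


def to_prophet_frame(df, date_col, value_col):
    """Rename the date/value columns to the ds/y names Prophet expects."""
    out = df[[date_col, value_col]].rename(columns={date_col: "ds", value_col: "y"})
    out["ds"] = pd.to_datetime(out["ds"])
    return out.sort_values("ds").reset_index(drop=True)


def forecast_prices(history, periods, freq="W"):
    """Fit a default Prophet model and forecast `periods` steps past the history."""
    from prophet import Prophet  # older releases: from fbprophet import Prophet

    model = Prophet()
    model.fit(history)
    future = model.make_future_dataframe(periods=periods, freq=freq)
    forecast = model.predict(future)
    # yhat is the point forecast; the other two columns bound its uncertainty interval.
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]


# Made-up placeholder rows (not real prices), deliberately out of order.
sample = pd.DataFrame(
    {"Date": ["2015-01-11", "2015-01-04"], "AveragePrice": [1.10, 1.05]}
)
history = to_prophet_frame(sample, "Date", "AveragePrice")
print(history)
```

In practice, `history` would hold one state and one avocado type at a time, and the held-out tail of the series would be compared against `yhat` to get the RMSE.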

Conclusion

We saw that in Indiana, organic prices weren’t always higher than conventional prices, which is what we’d usually expect. This lets us infer that low purchase volumes make prices fluctuate more and make them harder to predict accurately. California, which had the highest volume sold, was always more expensive for conventional avocados, and Georgia had prices that were seasonal. Since we only modeled these three states, the next step would be to cover all of the states we have data on. We would also want to collect data on more states to get a more accurate read on the U.S. To answer the question at hand, I would pick a state with a volume that lies in the middle of the range. From our analysis, we would end up picking Georgia.
