
Time_Series_Walmart_Sales_Forecasting

Time Series Analysis: Walmart Sales Forecasting

Overview

This project forecasts retail sales with time series models. We analyze Walmart's hierarchical sales data to predict future weekly sales trends using ARIMA/SARIMA and Gaussian Processes (GPs). We then evaluate the models on accuracy, robustness, and computational efficiency.

Dataset

Our dataset is hierarchical daily sales data from Walmart, covering Jan 28th, 2011 to Jun 18th, 2016. It records sales of products in three categories (Food, Hobby, Household) sold in three states (California, Texas, Wisconsin). For prediction, we aggregate the daily sales into weekly totals for each category in each state.
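Below is a minimal sketch of the daily-to-weekly aggregation. It assumes each record is a (date, state, category, units) tuple and that weeks are consecutive 7-day blocks counted from the first day of the data. The function name `aggregate_weekly` and the toy records are hypothetical. The project's actual preprocessing code and week boundaries are not shown in this README.

```python
from collections import defaultdict
from datetime import date


def aggregate_weekly(rows, start):
    """Sum daily sales into 7-day buckets per (state, category).

    Hypothetical helper. rows: iterable of (day, state, category, units)
    tuples, where day is a datetime.date. Week 0 begins on `start`, and
    week k covers days start + 7k through start + 7k + 6.
    """
    weekly = defaultdict(float)
    for day, state, category, units in rows:
        week = (day - start).days // 7  # integer index of the 7-day block
        weekly[(state, category, week)] += units
    return dict(weekly)


# Toy records (not real data) to show how days fall into weeks.
rows = [
    (date(2011, 1, 28), "California", "Food", 100.0),  # day 0 -> week 0
    (date(2011, 2, 3), "California", "Food", 50.0),    # day 6 -> week 0
    (date(2011, 2, 4), "California", "Food", 70.0),    # day 7 -> week 1
    (date(2011, 1, 29), "Texas", "Hobby", 20.0),       # day 1 -> week 0
]
weekly = aggregate_weekly(rows, start=date(2011, 1, 28))
print(weekly[("California", "Food", 0)])  # 150.0
```

With pandas, the same step would typically be a `groupby` over state and category followed by a weekly `resample(...).sum()`. The stdlib version above just makes the bucketing explicit.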

Methodology

We use drill-down analysis to examine the data at a more granular level. For each category in each state, we compare the performance of the ARIMA/SARIMA and GP models on held-out test data using RMSE, and select the model with the lower error.
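The per-series model selection can be sketched as follows. It assumes test-set forecasts have already been produced for each model and each (category, state) series. The names `rmse` and `select_models`, the dictionary layout, and the numbers in the example are hypothetical illustrations, not the project's code.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def select_models(test_actuals, forecasts):
    """Pick the lowest-RMSE model for every (category, state) series.

    test_actuals: {(category, state): [weekly sales, ...]}
    forecasts:    {(category, state): {model_name: [predicted sales, ...]}}
    Returns {(category, state): (best_model_name, best_rmse)}.
    """
    best = {}
    for key, actual in test_actuals.items():
        scores = {name: rmse(actual, pred) for name, pred in forecasts[key].items()}
        name = min(scores, key=scores.get)  # model with the smallest test RMSE
        best[key] = (name, scores[name])
    return best


# Hypothetical test-set values for a single drill-down series.
test_actuals = {("Food", "Texas"): [100.0, 110.0, 120.0]}
forecasts = {
    ("Food", "Texas"): {
        "SARIMA": [102.0, 108.0, 121.0],
        "GP": [95.0, 115.0, 130.0],
    }
}
best = select_models(test_actuals, forecasts)
print(best[("Food", "Texas")][0])  # SARIMA
```

Running the same loop over all nine category/state pairs yields one winning model per series. This is the drill-down comparison described above.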

Data Preprocessing

Model Selection

Why ARIMA/SARIMA?

Why Gaussian Processes (GP)?

Results

We use normalized RMSE to evaluate model performance, as follows:

The MAPE (mean absolute percentage error) is 2.31% for ARIMA/SARIMA and 5.08% for GPs.
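For reference, both metrics can be computed as shown below. The README does not say how RMSE is normalized. This sketch assumes division by the mean of the actual values, which is only one common convention; others divide by the range or the standard deviation. The values in the example are made up.

```python
import math


def nrmse(actual, predicted):
    """RMSE divided by the mean of the actual values.

    Assumed normalization; the project's exact choice is not stated.
    """
    n = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    return rmse / (sum(actual) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent: 100/n * sum(|a - p| / |a|)."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


actual = [100.0, 200.0]
predicted = [98.0, 206.0]
print(mape(actual, predicted))   # ≈ 2.5 (errors of 2% and 3%)
print(nrmse(actual, predicted))  # sqrt(20) / 150, roughly 0.0298
```

MAPE is scale-free, so it is convenient for comparing series of different sizes. However, it is undefined for weeks with zero sales, which weekly aggregation makes less likely.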

Here is the prediction plot for ARIMA/SARIMA and GP:

Conclusion

The project confirms the efficiency of ARIMA/SARIMA models for time series forecasting of retail sales. GPs offer flexibility, but their higher computational cost and more complex hyperparameter tuning make ARIMA/SARIMA the more practical choice for this application. Future work will integrate external data such as holiday calendars to account for sales variability on special days. We also plan to explore other robust forecasting models, such as LSTMs.
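The following minimal pure-Python sketch of exact GP regression shows where the GP's computational cost comes from. It assumes a zero-mean GP with a squared-exponential (RBF) kernel and fixed, hypothetical hyperparameters (`length_scale`, `variance`, `noise`). The README does not state which kernel or library the project used. The Cholesky factorization of the n × n kernel matrix takes O(n³) time and O(n²) memory. Tuning hyperparameters, for example by maximizing the marginal likelihood, repeats that factorization at every step.

```python
import math


def rbf(x1, x2, length_scale, variance):
    """Squared-exponential (RBF) kernel between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular L with a = L L^T; O(n^3) for an n x n matrix."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - s)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


def cholesky_solve(lower, b):
    """Solve (L L^T) x = b by forward, then backward substitution; O(n^2)."""
    n = len(lower)
    z = [0.0] * n
    for i in range(n):  # forward: L z = b
        z[i] = (b[i] - sum(lower[i][k] * z[k] for k in range(i))) / lower[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward: L^T x = z
        x[i] = (z[i] - sum(lower[k][i] * x[k] for k in range(i + 1, n))) / lower[i][i]
    return x


def gp_posterior_mean(x_train, y_train, x_test, length_scale=2.0, variance=1.0, noise=1e-4):
    """Zero-mean GP posterior mean: k(x*, X) (K + noise * I)^-1 y."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j], length_scale, variance) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = cholesky_solve(cholesky(K), y_train)  # the O(n^3) step
    return [sum(rbf(xs, xt, length_scale, variance) * a for xt, a in zip(x_train, alpha))
            for xs in x_test]


# Toy example: ten "weeks" of a smooth synthetic signal (not project data).
weeks = [float(t) for t in range(10)]
sales = [math.sin(0.5 * t) for t in weeks]
pred = gp_posterior_mean(weeks, sales, [4.5])
print(pred)
```

The printed prediction should land close to sin(2.25) ≈ 0.778. Doubling the number of weeks roughly multiplies the factorization cost by eight, which is why exact GPs become expensive on long sales histories.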