Regression Models for Sales Forecasting: A Simple Introduction
The problem
Imagine a company wants to predict next month’s sales. It has information such as:
- Advertising budget
- Number of potential customers (leads)
- Product price
- Time of the year (seasonality)
The question is simple:
How are these variables related to future sales?
One of the simplest and most useful tools to answer this is linear regression. Although there are more advanced machine learning models, linear regression is still one of the most widely used because it’s easy to understand, explain, and evaluate.
What is linear regression?
Linear regression models sales as a constant plus a weighted sum of the input variables, and estimates those weights from historical data.
A simplified example looks like this:
Sales = Constant + Advertising Effect + Leads Effect − Price Effect
Each variable has a coefficient, which tells us how much sales are expected to change when that variable changes while everything else stays the same.
The model chooses the coefficients that minimize the sum of squared differences between its predictions and the real sales values, a method known as ordinary least squares.
The difference between the predicted value and the real value is called a residual or prediction error.
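As a rough illustration of the fitting step described above, the sketch below fits the simplified sales equation by least squares using NumPy. The monthly figures are made up for this example and the variable names are assumptions, not data from a real company.

```python
import numpy as np

# Hypothetical monthly data (made-up numbers, for illustration only).
advertising = np.array([10.0, 12.0, 15.0, 11.0, 18.0, 20.0, 16.0, 22.0])
leads = np.array([200.0, 220.0, 260.0, 210.0, 300.0, 310.0, 280.0, 330.0])
price = np.array([50.0, 50.0, 48.0, 52.0, 47.0, 46.0, 49.0, 45.0])
sales = np.array([120.0, 131.0, 155.0, 118.0, 176.0, 188.0, 160.0, 201.0])

# Design matrix: a column of ones for the constant, then one column per variable.
X = np.column_stack([np.ones_like(sales), advertising, leads, price])

# Ordinary least squares: coefficients that minimize the sum of squared residuals.
coefficients, *_ = np.linalg.lstsq(X, sales, rcond=None)

predicted = X @ coefficients
residuals = sales - predicted  # real value minus predicted value

print("coefficients (constant, advertising, leads, price):", coefficients)
print("residuals:", residuals)
```

Because the model includes a constant, the residuals of a least-squares fit sum to zero on the data used for fitting. That is one reason in-sample residuals alone say little about how the model will behave on future months.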
The assumptions behind linear regression
Like every statistical model, linear regression works best when a few assumptions are approximately true.
1. The relationship is roughly linear
The model assumes that a one-unit change in a variable shifts expected sales by the same amount, no matter how large that variable already is.
In reality, this isn’t always true.
For example, doubling an advertising budget usually doesn’t double sales because advertising tends to have diminishing returns.
If the relationship isn’t approximately linear, the model may consistently overestimate or underestimate future sales.
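One way to see this effect is to fit a straight line to data that has diminishing returns and look at the sign of the residuals. The sketch below assumes, purely for illustration, that sales grow with the logarithm of the advertising budget. The log transform is one common remedy, not the only one.

```python
import numpy as np

# Hypothetical data with diminishing returns: each extra unit of budget adds less.
advertising = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0])
sales = 50.0 + 30.0 * np.log(advertising)  # assumed shape, for illustration only

def fit_residuals(feature, target):
    """Fit target = a + b * feature by least squares and return the residuals."""
    X = np.column_stack([np.ones_like(feature), feature])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ coef

raw_residuals = fit_residuals(advertising, sales)
log_residuals = fit_residuals(np.log(advertising), sales)

print("residuals using raw budget:", np.round(raw_residuals, 1))
print("residuals using log budget:", np.round(log_residuals, 1))
```

With the raw budget, the line overestimates sales at the smallest and largest budgets and underestimates them in between. After the log transform the relationship is linear again and the residuals vanish in this noise-free example.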
2. The prediction errors have similar variability
Ideally, prediction errors should have about the same amount of variability across all observations.
Sales data often violate this assumption.
Months with very high sales usually show much larger fluctuations than months with low sales.
When this happens, the coefficient estimates and predictions may still be useful, but the standard errors, confidence intervals, and significance tests used to judge each variable become unreliable.
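A simple, informal check is to compare the spread of the residuals in low-sales and high-sales months. The sketch below uses simulated data in which the noise is assumed to grow with the number of leads. The data and the 50/50 split are illustrative choices, not a formal statistical test.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: noise is assumed to grow in proportion to the number of leads.
leads = np.linspace(100.0, 1000.0, 60)
sales = 2.0 * leads + rng.normal(0.0, 1.0, size=60) * 0.1 * leads

# Fit sales = constant + coefficient * leads by least squares.
X = np.column_stack([np.ones_like(leads), leads])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
predicted = X @ coef
residuals = sales - predicted

# Split the months into a low-prediction half and a high-prediction half.
order = np.argsort(predicted)
low_spread = residuals[order[:30]].std()
high_spread = residuals[order[30:]].std()

print(f"residual spread, low-sales months:  {low_spread:.1f}")
print(f"residual spread, high-sales months: {high_spread:.1f}")
```

A clearly larger spread in the high-sales half suggests the errors do not have similar variability. Modeling the logarithm of sales is one commonly used response.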
3. Prediction errors should be independent
Sales are collected over time, so this month’s performance is often related to last month’s.
If the model doesn’t capture these time patterns, the remaining prediction errors will also show patterns.
Ideally, after fitting the model, the residuals should show no trend, no seasonality, and no correlation from one month to the next.
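One quick way to look for leftover time patterns is the lag-1 autocorrelation of the residuals, which is the correlation between each residual and the previous one. The sketch below is a minimal version with made-up data in which a yearly cycle is deliberately left out of the model. The helper name `lag1_autocorrelation` is hypothetical.

```python
import numpy as np

def lag1_autocorrelation(residuals):
    """Correlation between each residual and the one from the previous period."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    return float(np.sum(r[1:] * r[:-1]) / np.sum(r * r))

# Hypothetical monthly data with a yearly cycle that the model below ignores.
months = np.arange(36)
advertising = 10.0 + (months % 5)
seasonal_effect = 20.0 * np.sin(2.0 * np.pi * months / 12.0)
sales = 100.0 + 5.0 * advertising + seasonal_effect

# Fit sales on advertising only, leaving the seasonality unexplained.
X = np.column_stack([np.ones(len(months)), advertising])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
residuals = sales - X @ coef

print(f"lag-1 autocorrelation: {lag1_autocorrelation(residuals):.2f}")
```

A value near zero is what independent errors would produce. A value well above zero, as in this example, indicates that the model is missing a time pattern such as seasonality.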
Multicollinearity
Sometimes several variables contain almost the same information.
For example:
- TV advertising
- Online advertising
- Total marketing budget
These variables usually increase and decrease together, and if the total budget is simply the sum of the other two, it adds no new information at all.
This creates multicollinearity, making it difficult for the model to separate the individual contribution of each variable.
As a result:
- Coefficients become unstable.
- Small changes in the data can produce very different coefficient values.
- Predictions may remain good, but interpreting the coefficients becomes much harder.
This problem is extremely common in business datasets.
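A common diagnostic for this problem is the variance inflation factor (VIF). It regresses each variable on all the others and reports 1 / (1 − R²) for that regression. The sketch below computes it with NumPy on simulated budgets. The data, the small "other spend" term, and the function name are assumptions made for the example.

```python
import numpy as np

def variance_inflation_factors(features):
    """features: dict of name -> 1-D array. Returns dict of name -> VIF."""
    names = list(features)
    vifs = {}
    for name in names:
        target = features[name]
        others = [features[other] for other in names if other != name]
        X = np.column_stack([np.ones_like(target)] + others)
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        residuals = target - X @ coef
        r_squared = 1.0 - residuals.var() / target.var()
        vifs[name] = 1.0 / (1.0 - r_squared)
    return vifs

rng = np.random.default_rng(0)
tv = rng.uniform(10.0, 50.0, 40)
online = rng.uniform(5.0, 30.0, 40)
other_spend = rng.uniform(0.0, 2.0, 40)  # small extra spend, so the sum is not exact
features = {
    "tv": tv,
    "online": online,
    "total_marketing": tv + online + other_spend,
    "price": rng.uniform(40.0, 60.0, 40),
}

for name, vif in variance_inflation_factors(features).items():
    print(f"{name:>16}: VIF = {vif:.1f}")
```

A VIF close to 1 means a variable carries information the others do not. As a rule of thumb, values above roughly 5 to 10 are often treated as a warning sign, although the exact threshold is a matter of convention.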
Regularization: making regression more reliable
A common solution is called regularization.
Instead of allowing coefficients to become arbitrarily large, the model adds a penalty on coefficient size to the error it minimizes, which keeps them under control.
Although this introduces a little bias, it often produces much better predictions on new data.
The two most popular regularization methods are Ridge Regression and Lasso Regression.
Ridge Regression
Ridge penalizes the sum of the squared coefficients. This shrinks every coefficient toward zero but keeps every variable in the model.
You can think of Ridge as saying:
“Every variable matters a little, but none of them should dominate the model.”
Ridge is usually preferred when we believe that all predictors contain useful information.
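The shrinking behavior can be sketched with the closed-form Ridge solution on standardized variables. The example below is a minimal NumPy version with simulated data. The penalty is called `alpha` here by assumption (some texts call it lambda), and the constant term is left unpenalized by centering the data first.

```python
import numpy as np

def ridge_coefficients(X, y, alpha):
    """Closed-form Ridge on standardized features; the constant is not penalized."""
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    y_centered = y - y.mean()
    penalty = alpha * np.eye(X_std.shape[1])
    return np.linalg.solve(X_std.T @ X_std + penalty, X_std.T @ y_centered)

rng = np.random.default_rng(0)
n_months = 50
tv = rng.normal(30.0, 5.0, n_months)
online = tv + rng.normal(0.0, 1.0, n_months)  # strongly correlated with tv
price = rng.normal(50.0, 3.0, n_months)
sales = 100.0 + 2.0 * tv + 1.0 * online - 1.5 * price + rng.normal(0.0, 5.0, n_months)

X = np.column_stack([tv, online, price])
alphas = [0.0, 1.0, 10.0, 100.0]
results = {alpha: ridge_coefficients(X, sales, alpha) for alpha in alphas}

for alpha, coefs in results.items():
    print(f"alpha={alpha:>5}: {np.round(coefs, 2)}")
```

With `alpha=0.0` this reduces to ordinary least squares. As `alpha` grows, the overall size of the coefficient vector shrinks, yet no coefficient is set exactly to zero.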
Lasso Regression
Lasso penalizes the sum of the absolute values of the coefficients. It also shrinks coefficients, but it goes one step further.
Some coefficients become exactly zero, meaning those variables are completely removed from the model.
Because of this, Lasso automatically performs feature selection.
It’s especially useful when we believe that only a small number of variables truly influence sales.
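To illustrate how coefficients can become exactly zero, the sketch below implements Lasso with a basic coordinate descent loop in NumPy. It is a teaching sketch under stated assumptions, not a production solver. The data are simulated so that only the first two of five variables affect sales, and the objective is assumed to be the mean squared error divided by two plus `alpha` times the sum of absolute coefficients.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0 inside the band."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coefficients(X, y, alpha, n_iterations=500):
    """Coordinate descent for Lasso on standardized features and centered target."""
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    y_centered = y - y.mean()
    n_samples, n_features = X_std.shape
    beta = np.zeros(n_features)
    for _ in range(n_iterations):
        for j in range(n_features):
            # Residual with feature j's current contribution added back in.
            partial_residual = y_centered - X_std @ beta + X_std[:, j] * beta[j]
            rho = X_std[:, j] @ partial_residual / n_samples
            beta[j] = soft_threshold(rho, alpha)
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Assumed ground truth: only the first two variables influence sales.
sales = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.5, 200)

beta = lasso_coefficients(X, sales, alpha=0.5)
print("Lasso coefficients:", np.round(beta, 2))
print("variables kept:", [j for j in range(5) if beta[j] != 0])
```

The soft-threshold step is what produces exact zeros: any variable whose correlation with the remaining error falls below `alpha` is dropped. The kept coefficients are also shrunk, which is the bias that regularization introduces.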
Choosing the best model
Both Ridge and Lasso include a parameter that controls how strong the penalty is.
If the penalty is too small:
- The model may overfit the training data.
If the penalty is too large:
- The model may become too simple and ignore important relationships.
To find the best value, analysts use cross-validation.
The idea is straightforward:
- Train the model using part of the data.
- Test it on the part that was held out.
- Repeat the process several times, for each candidate penalty value.
- Keep the penalty value with the lowest average error on the held-out data.
For forecasting, it’s important to always train on past data and evaluate on future data.
Otherwise, the model would accidentally learn information from the future, producing unrealistically optimistic results.
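A time-ordered version of this procedure can be sketched as an expanding window: each fold trains on all months before a cutoff and tests on the months right after it. The code below is a minimal NumPy sketch with simulated data. The helper names, the candidate penalty values, and the fold sizes are all assumptions chosen for illustration.

```python
import numpy as np

def fit_ridge(X_train, y_train, alpha):
    """Fit Ridge on standardized training data; return a prediction function."""
    mean, std = X_train.mean(axis=0), X_train.std(axis=0)
    y_mean = y_train.mean()
    Z = (X_train - mean) / std
    beta = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ (y_train - y_mean))
    # New data is scaled with training statistics only, to avoid leaking the future.
    return lambda X_new: y_mean + ((X_new - mean) / std) @ beta

def expanding_window_splits(n_samples, n_folds, test_size):
    """Yield (train_idx, test_idx) pairs where training always precedes testing."""
    for fold in range(n_folds):
        test_start = n_samples - (n_folds - fold) * test_size
        yield np.arange(test_start), np.arange(test_start, test_start + test_size)

rng = np.random.default_rng(0)
n_months = 48
X = rng.normal(size=(n_months, 3))
y = 100.0 + 5.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0.0, 2.0, n_months)

candidate_alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
mean_errors = {}
for alpha in candidate_alphas:
    fold_errors = []
    for train_idx, test_idx in expanding_window_splits(n_months, n_folds=4, test_size=6):
        predict = fit_ridge(X[train_idx], y[train_idx], alpha)
        fold_errors.append(np.mean(np.abs(y[test_idx] - predict(X[test_idx]))))
    mean_errors[alpha] = float(np.mean(fold_errors))

best_alpha = min(mean_errors, key=mean_errors.get)
print("mean absolute error per alpha:", mean_errors)
print("selected alpha:", best_alpha)
```

Libraries such as scikit-learn provide a similar scheme through `TimeSeriesSplit`. The hand-written version here is only meant to make the ordering of training and test months explicit.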
The bias-variance tradeoff
Every predictive model balances two types of errors.
Bias happens when the model is too simple and misses real patterns.
Variance happens when the model is too sensitive to the particular training data, fitting noise as if it were a real pattern, and fails to generalize to new observations.
Regularization accepts a small increase in bias in exchange for a reduction in variance. With a well-chosen penalty, the reduction in variance outweighs the added bias.
In real-world forecasting problems, this often leads to more accurate predictions on new data.
Evaluating the model
Many beginners focus only on R², the share of the variation in sales that the model explains, which is usually computed on the training data.
However, a model with an excellent R² can still perform poorly when predicting future sales.
Instead, forecasting models should always be evaluated using data they have never seen.
Three common metrics are:
MAE (Mean Absolute Error)
The average size of the prediction errors, ignoring whether they are too high or too low.
Easy to understand because it’s measured in the same units as sales.
RMSE (Root Mean Squared Error)
Similar to MAE, but errors are squared before averaging, so large mistakes are penalized more heavily.
Useful when large forecasting errors are especially costly.
MAPE (Mean Absolute Percentage Error)
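Measures prediction error as a percentage of actual sales.
This makes it easy to compare forecasting performance across products or markets of different sizes, but it is undefined when actual sales are zero and unstable when they are close to zero.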
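The three metrics above can be written in a few lines. The sketch below uses four made-up months of actual and predicted sales, chosen so the arithmetic is easy to check by hand.

```python
import numpy as np

def mae(actual, predicted):
    """Mean Absolute Error, in the same units as sales."""
    return float(np.mean(np.abs(actual - predicted)))

def rmse(actual, predicted):
    """Root Mean Squared Error; squaring gives large errors extra weight."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mape(actual, predicted):
    """Mean Absolute Percentage Error; undefined if any actual value is zero."""
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)

# Hypothetical held-out months (made-up numbers).
actual = np.array([100.0, 200.0, 400.0, 500.0])
predicted = np.array([110.0, 190.0, 400.0, 460.0])

print(f"MAE:  {mae(actual, predicted):.2f}")    # MAE:  15.00
print(f"RMSE: {rmse(actual, predicted):.2f}")   # RMSE: 21.21
print(f"MAPE: {mape(actual, predicted):.2f}%")  # MAPE: 5.75%
```

The absolute errors are 10, 10, 0, and 40. The single large miss of 40 pulls RMSE (about 21.21) well above MAE (15), which is the behavior that makes RMSE useful when large errors are costly.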
Final thoughts
Linear regression remains one of the most important tools in business analytics.
It provides a simple way to understand how business variables influence sales while serving as the foundation for many more advanced machine learning methods.
Checking the model’s assumptions, recognizing problems like multicollinearity, and evaluating predictions on future data are all essential steps in building reliable forecasting models.
Methods such as Ridge and Lasso improve regression by reducing overfitting and producing models that generalize better to new situations.
No forecasting model can predict the future perfectly.
The real goal is to make better business decisions by reducing uncertainty as much as possible.