 # A Guide to Different Evaluation Metrics for Time Series Forecasting Models – Analytics India Magazine

Measuring the performance of any machine learning model is very important, not only from a technical point of view but also from a business perspective. When business decisions depend on the insights generated by forecasting models, knowing how accurate those models are becomes vital. Different evaluation metrics are used in machine learning depending on the model and the results it generates, and time series forecasting has its own set of metrics. In this post, we will discuss the evaluation metrics used to measure the performance of a time series forecasting model, along with their importance and applicability.
Let’s start the discussion by understanding why measuring the performance of a time series forecasting model is necessary.
What distinguishes forecasting from other modelling tasks is that the future is wholly unknown and can only be estimated from what has already occurred. The performance of a time series forecasting model is defined by its ability to predict the future. This often comes at the expense of being able to explain why a particular prediction was made, to provide confidence intervals, or to gain a deeper understanding of the problem’s underlying causes.
Time series performance measures provide a summary of the forecast model’s skill and capability in making forecasts. There are numerous performance metrics to pick from, and knowing which metric to use and how to interpret the results can be difficult.
Moving further, we will see different performance measures that can be applied to evaluate the forecasting model under different circumstances.
Evaluation Metrics to Measure Performance
Now, let us have a look at the popular evaluation metrics used to measure the performance of a time-series forecasting model.
The stationary R-squared is used in time series forecasting as a measure that compares the stationary part of the model to a simple mean model. It is defined as

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

where $SS_{res}$ denotes the sum of squared residuals (the differences between the actual and predicted values) and $SS_{tot}$ denotes the sum of squared deviations of the dependent variable from its sample mean. It denotes the proportion of the dependent variable’s variance that can be explained by the model. A high $R^2$ value shows that the model’s predictions track the true values closely, whereas a low $R^2$ value suggests that the two are not strongly related.
The most important thing to remember about R-squared is that it does not indicate whether or not the model is capable of making accurate future predictions. It shows whether or not the model is a good fit for the observed values, and how good that fit is. A high $R^2$ indicates that the observed and predicted values have a strong association.
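As a minimal sketch, assuming the standard definition of R-squared as one minus the ratio of the residual sum of squares to the total sum of squares, the calculation can be written in plain Python. The function name `r_squared` and the sample values are illustrative only.

```python
def r_squared(actual, predicted):
    """Return 1 - SS_res / SS_tot for two equal-length sequences."""
    mean_actual = sum(actual) / len(actual)
    # Sum of squared residuals: actual minus predicted.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Sum of squared deviations of the actual values from their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Illustrative values, not taken from any real dataset.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.5]

print(r_squared(actual, predicted))
# Predicting the sample mean everywhere is the "simple mean model" baseline.
print(r_squared(actual, [6.0, 6.0, 6.0, 6.0]))
```

A model that always predicts the sample mean scores zero, which is why R-squared is read as an improvement over that baseline rather than as a guarantee of future accuracy.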
The Mean Absolute Error (MAE) is defined as the average of the absolute differences between the forecasted and true values:

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - x_i\rvert$$

where $y_i$ is the predicted value, $x_i$ is the actual value, and $n$ is the total number of values in the test set.
The MAE shows us how much error we should expect from the forecast on average, expressed in the original units of the forecasted values.
The lower the MAE value, the better the model; a value of zero indicates that the forecast is error-free. In other words, the model with the lowest MAE is deemed superior when comparing several models.
However, because MAE does not reveal the proportional scale of the error, it can be difficult to distinguish between large and small errors. It can be combined with other measures to see whether some errors are much larger than others (see Root Mean Square Error below). Furthermore, MAE might obscure issues related to low data volume; for more information, see the weighted metrics at the end of this article.
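A minimal from-scratch sketch of the MAE calculation described above is shown here. The function name `mae` and the sample demand figures are assumptions made for illustration.

```python
def mae(actual, predicted):
    """Mean absolute error: average of |predicted - actual|."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative values: the absolute errors are 10, 5, 5 and 10 units.
actual = [100, 120, 130, 90]
predicted = [110, 115, 125, 100]

print(mae(actual, predicted))  # 7.5, in the same units as the data
```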
The Mean Absolute Percentage Error (MAPE) is the average of the absolute differences between the forecasted and true values, each divided by the true value:

$$MAPE = \frac{1}{n}\sum_{t=1}^{n}\left\lvert\frac{A_t - F_t}{A_t}\right\rvert$$

where $F_t$ is the forecasted value, $A_t$ is the true value, and $n$ is the total number of values in the test set. The result is often multiplied by 100 and reported as a percentage.
Because the true value appears in the denominator, MAPE works best with data that is free of zeros and extreme values: it is undefined when a true value is zero and becomes extremely large when a true value is very small.
The lower the MAPE, the better the model. MAPE, like MAE, understates the impact of large but rare errors caused by extreme values.
Mean Square Error can be used to address this issue. MAPE may also obscure issues related to low data volume; for more information, see the weighted metrics at the end of this article.
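The sensitivity of MAPE to small or zero true values can be seen in a short sketch. It assumes MAPE is returned as a fraction (multiply by 100 for a percentage) and that a zero true value should raise an error rather than return infinity. The name `mape` and the sample values are illustrative.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned as a fraction."""
    if any(a == 0 for a in actual):
        # The true value sits in the denominator, so zero is not allowed.
        raise ValueError("MAPE is undefined when a true value is zero")
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return sum(ratios) / len(ratios)


# Moderate true values: every percentage error stays small.
print(mape([100, 200, 50, 400], [110, 180, 60, 400]))

# A tiny true value: an absolute error of 2 becomes a 200% error
# for that point and dominates the average.
print(mape([1, 100], [3, 100]))
```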
The Mean Square Error (MSE) is defined as the average of the squared errors. It is widely used to evaluate the quality of a forecasting model or predictor, and it reflects both variance (the spread of the predicted values) and bias (the distance of the predicted values from the true values):

$$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - y'_i)^2$$

where $y'_i$ denotes the predicted value, $y_i$ denotes the actual value, and $n$ is the total number of values in the test set. MSE is never negative, and lower values are preferable. Because of the square term in the formula, this measure penalizes large errors or outliers more than small errors.
The closer MSE is to zero, the better. While it overcomes the extreme-value and zero problems of MAE and MAPE, its heavy penalty on outliers can be a drawback in some instances. When dealing with low data volume, this statistic may hide issues; to address this, see Weighted Absolute Percentage Error and Weighted Mean Absolute Percentage Error.
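To illustrate how squaring penalizes large errors, the sketch below compares two hypothetical forecasts with the same total absolute error: one spreads the error evenly, the other concentrates it in a single point. The names `mse`, `mae`, `steady` and `spiky` are illustrative.

```python
def mse(actual, predicted):
    """Mean square error: average of the squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error, included only for comparison."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [10.0, 10.0, 10.0, 10.0]
steady = [12.0, 12.0, 12.0, 12.0]  # four errors of 2
spiky = [10.0, 10.0, 10.0, 18.0]   # one error of 8

print(mae(actual, steady), mae(actual, spiky))  # 2.0 2.0
print(mse(actual, steady), mse(actual, spiky))  # 4.0 16.0
```

Both forecasts have the same MAE, but the MSE of the forecast with one large miss is four times higher.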
The Root Mean Square Error (RMSE) is defined as the square root of the mean square error and is an extension of MSE:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - y'_i)^2}$$

where $y'_i$ denotes the predicted value, $y_i$ denotes the actual value, and $n$ is the total number of values in the test set. This statistic, like MSE, penalizes larger errors more heavily.
RMSE is likewise never negative, with lower values indicating better performance. An advantage of RMSE is that it is expressed in the same unit as the forecasted value, which makes it easier to interpret than MSE.
The RMSE can also be compared to the MAE to see whether there are any large but uncommon errors in the forecast. The wider the gap between RMSE and MAE, the more erratic the error size. Like the previous metrics, RMSE can mask issues with low data volume.
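The comparison between RMSE and MAE can be sketched as follows, using hypothetical forecasts chosen so that the arithmetic is easy to follow. The helper names and sample values are illustrative.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


actual = [10.0, 10.0, 10.0, 10.0]
steady = [12.0, 12.0, 12.0, 12.0]  # errors all the same size
spiky = [10.0, 10.0, 10.0, 18.0]   # one large, rare error

# Gap between RMSE and MAE for each forecast.
print(rmse(actual, steady) - mae(actual, steady))  # 0.0
print(rmse(actual, spiky) - mae(actual, spiky))    # 2.0
```

When every error has the same magnitude, RMSE equals MAE; the gap opens only when error sizes vary.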
The Normalized Root Mean Square Error (NRMSE) is an extension of RMSE, obtained by dividing the RMSE by a normalizing value. The two most common normalizers are the mean of the actual values and their range (the difference between the maximum and minimum values):

$$NRMSE = \frac{RMSE}{y_{max} - y_{min}} \quad \text{or} \quad NRMSE = \frac{RMSE}{\bar{y}}$$

where $y_{max}$ is the largest true value, $y_{min}$ is the smallest true value, and $\bar{y}$ is the mean of the true values.
NRMSE is frequently used to compare datasets or forecasting models with different scales (units sold and gross revenue, for example). The smaller the value, the better the model’s performance. When working with small amounts of data, this metric can be misleading. However, Weighted Absolute Percentage Error and Weighted Mean Absolute Percentage Error can help.
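The following sketch shows both normalizations and why NRMSE allows series on different scales to be compared. The `normalizer` parameter and the sample series are assumptions for illustration; the revenue series is simply the units series multiplied by a hypothetical price of 1000.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def nrmse(actual, predicted, normalizer="range"):
    """RMSE divided by the range or the mean of the actual values."""
    if normalizer == "range":
        scale = max(actual) - min(actual)
    elif normalizer == "mean":
        scale = sum(actual) / len(actual)
    else:
        raise ValueError("normalizer must be 'range' or 'mean'")
    if scale == 0:
        raise ValueError("cannot normalize: the scale is zero")
    return rmse(actual, predicted) / scale


units = [10, 20, 30, 40]
units_pred = [12, 18, 33, 37]
revenue = [u * 1000 for u in units]
revenue_pred = [u * 1000 for u in units_pred]

# RMSE differs by a factor of 1000, NRMSE is (numerically) the same.
print(rmse(units, units_pred), rmse(revenue, revenue_pred))
print(nrmse(units, units_pred), nrmse(revenue, revenue_pred))
print(nrmse(units, units_pred, normalizer="mean"))
```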
WMAPE (sometimes called wMAPE) is an abbreviation for Weighted Mean Absolute Percentage Error. It is a measure of a forecasting method’s prediction accuracy. It is a variant of MAPE in which errors are weighted by the actual values (e.g. in the case of sales forecasting, errors are weighted by sales volume):

$$WMAPE = \frac{\sum_{t=1}^{n}\lvert A_t - F_t\rvert}{\sum_{t=1}^{n}\lvert A_t\rvert}$$

where $A$ is the vector of actual data and $F$ is the forecast. This metric has an advantage over MAPE in that it avoids the ‘infinite error’ problem: the errors are divided by the sum of the actual values, so a single zero in the actual data does not cause a division by zero.
The lower the WMAPE, the better the model’s performance. When evaluating forecasting models, this metric is useful for low-volume data where each observation has a different priority. Observations with a higher priority carry a higher weight, so the WMAPE increases as the error in high-priority forecast values grows.
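A minimal sketch of WMAPE is given below, assuming the common definition in which the sum of absolute errors is divided by the sum of absolute actual values, so that each error is implicitly weighted by its actual volume. The function name and sample values are illustrative.

```python
def wmape(actual, forecast):
    """Sum of absolute errors divided by the sum of absolute actuals."""
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("WMAPE is undefined when all actual values are zero")
    total_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return total_error / total_actual


# Illustrative low-volume series containing a zero. Plain MAPE would
# divide by zero on the first point; WMAPE stays finite.
actual = [0, 10, 90]
forecast = [5, 10, 85]

print(wmape(actual, forecast))
```

The result is a fraction; multiply by 100 to report it as a percentage.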
Let’s have a quick summary of all the above-mentioned measures.
Through this post, we have seen the different performance evaluation metrics used in time series forecasting in different scenarios. Most of the above-mentioned measures are available directly in the sklearn.metrics module, or can be implemented from scratch with the NumPy and math modules.
Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, model building.
Copyright Analytics India Magazine Pvt Ltd
