For some time now, SAP have been clear about their intention to develop forecast automation through machine learning. With the 1811 release of SAP Integrated Business Planning (IBP), SAP are really starting on their machine learning journey: leveraging content from their Predictive Analytics Library (PAL), they have introduced Gradient Boosting of Decision Trees (GBDT) to the set of statistical forecasting algorithms available with the IBP for demand license.
This potent and effective machine learning technique is already used extensively in fields such as web page ranking for search engines, spam filtering for email, viewing recommendations for online streaming services and even the analysis of results from the Large Hadron Collider at CERN near Geneva. In this blog I will explore what GBDT is, and how its introduction to IBP for demand can help us to improve forecast accuracy by better predicting future behaviour.
GBDT has been called the “Swiss Army knife of machine learning” due to its tremendous flexibility, not least its ability to model heterogeneous data sets and complex non-linear relationships, and so make solid predictions for otherwise hard-to-fit data. As the name suggests, it brings together two ideas, decision trees and boosting (a so-called “ensemble learning” technique that combines many models into one), drawing on the strengths of each to give improved predictive performance. Let us take a look at each in turn.
Decision Trees - are among the most widely used predictive models, and are the building blocks of ensemble methods such as GBDT. Each consists of a series of ordered conditional control statements and their possible outcomes, represented by a tree-like graph such as this:
A decision tree predicts an outcome/value by working down from the top of the tree: at each decision point it compares one of the regressors against a threshold value and follows the matching branch, until it reaches a terminal node (leaf) that holds the estimate. When the tree is built, the regressor and threshold used at each decision point are chosen to best separate the data, so the most important regressors tend to appear nearest the top. However, in isolation the predictions made by a single decision tree, whilst typically better than a random/naïve method, are often relatively inaccurate. This is where the second element of GBDT comes in: Gradient Boosting.
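To make these mechanics concrete, here is a minimal sketch of a regression tree in plain Python. It is not SAP's or PAL's implementation: `build_tree` and `predict` are hypothetical helpers, and the greedy squared-error split criterion is a common textbook choice assumed here for illustration.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, depth):
    """Greedy regression tree: each decision point uses the (regressor, threshold)
    pair that most reduces squared error; each leaf predicts the mean target."""
    if depth == 0 or len(set(targets)) == 1:
        return {"leaf": fmean(targets)}
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between two observed values
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    if best is None:  # no regressor varies, so no split is possible
        return {"leaf": fmean(targets)}
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in li], [targets[i] for i in li], depth - 1),
        "right": build_tree([rows[i] for i in ri], [targets[i] for i in ri], depth - 1),
    }


def predict(tree, row):
    """Walk from the top of the tree to a leaf and return its estimate."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


# Hypothetical data: one regressor (weekly temperature) and weekly sales.
rows = [[10.0], [12.0], [25.0], [28.0]]
sales = [100.0, 110.0, 200.0, 210.0]
tree = build_tree(rows, sales, depth=1)
print(tree["threshold"], predict(tree, [11.0]), predict(tree, [30.0]))  # 18.5 105.0 205.0
```

With a depth of 1 the tree has a single decision point (often called a "stump"); larger depths simply repeat the same split search inside each branch.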
Boosting - in statistical terms, is a method of stage-wise, additive modelling designed to provide better predictive performance by combining multiple simple models. In this case, Gradient Boosting combines multiple decision trees, each of which is regarded as a “weak learner” in itself, adding them one at a time so that each new tree reduces the errors left by the trees before it, to create a “stronger”, more complex predictor. A robust predictive model is therefore built by the linear addition of multiple decision trees, i.e. it is “Boosted”.
It can be likened to the way in which thin threads, when bound together, form a much stronger rope.
In simplistic terms, the GBDT algorithm fits each new decision tree not to the original sales values but to the errors (residuals) of the model built so far, so that each iteration seeks to improve the results by reducing the error of the previous model. (For the usual squared-error measure these residuals are the negative gradient of the error, hence the name “gradient” boosting.) In this way, starting from a simple model, the algorithm “learns” with each iteration and develops a more intricate model which can handle otherwise hard-to-fit data.
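The loop described above can be sketched as follows, assuming squared-error loss and single-split trees on one input. `fit_stump` and `gradient_boost` are hypothetical names for this illustration, not IBP or PAL functions.

```python
from statistics import fmean


def fit_stump(xs, residuals):
    """Depth-1 regression tree on one input, chosen by least squared error."""
    best = None
    points = sorted(set(xs))
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = fmean(left), fmean(right)
        cost = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or cost < best[0]:
            best = (cost, t, lm, rm)
    if best is None:  # only one distinct input value: nothing to split on
        return lambda x: 0.0
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs, ys, n_trees, learning_rate):
    """Stage-wise additive model: start from the mean, then add scaled stumps."""
    base = fmean(ys)
    stumps = []

    def model(x):
        return base + sum(learning_rate * s(x) for s in stumps)

    for _ in range(n_trees):
        # Each new tree is fitted to the errors the current model still makes.
        residuals = [y - model(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return model


xs = list(range(10))
ys = [0.0] * 5 + [10.0] * 5  # a simple step change in demand
one_tree = gradient_boost(xs, ys, n_trees=1, learning_rate=0.1)
fifty_trees = gradient_boost(xs, ys, n_trees=50, learning_rate=0.1)
print(one_tree(0))               # 4.5
print(round(fifty_trees(0), 2))  # 0.03
```

A single tree only moves the first period a small step from the average (5.0) towards its actual value of 0; fifty trees, each correcting what the earlier ones missed, bring it very close.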
One potential drawback of the GBDT model when left unchecked is a risk of overfitting. Due to the continuous iterative nature of the attempts to correct and reduce the errors, there is a genuine risk that the algorithm will begin finding patterns where there are none and fitting random “white noise” which will naturally be present in any data series. This risk can be moderated via the use of various parameters which act together to fine tune the model. This process is known as “regularisation”. We can look at these tuning parameters and how they work by delving into the configuration of the GBDT model in IBP for Demand.
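The overfitting risk is easy to reproduce. The sketch below (hypothetical helpers again, assuming squared-error loss and single-split trees) boosts on pure random noise, where there is no real pattern to find, yet the error on the training history keeps falling as trees are added. This is the "finding patterns where there are none" behaviour, and it is why fit on the history alone is a poor guide without the regularisation parameters.

```python
import random
from statistics import fmean


def fit_stump(xs, residuals):
    """Depth-1 regression tree on one input, chosen by least squared error."""
    best = None
    points = sorted(set(xs))
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = fmean(left), fmean(right)
        cost = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or cost < best[0]:
            best = (cost, t, lm, rm)
    if best is None:
        return lambda x: 0.0
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def training_error_path(xs, ys, n_trees, learning_rate):
    """Mean squared training error recorded after each boosting round."""
    preds = [fmean(ys)] * len(ys)
    path = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
        path.append(fmean((y - p) ** 2 for y, p in zip(ys, preds)))
    return path


rng = random.Random(42)
xs = [rng.uniform(0, 1) for _ in range(40)]
ys = [rng.gauss(0, 1) for _ in range(40)]  # pure noise: unrelated to xs
path = training_error_path(xs, ys, n_trees=200, learning_rate=0.3)
print(f"after 1 tree: {path[0]:.3f}, after 200 trees: {path[-1]:.3f}")
```

Every drop in training error here is the model memorising noise; on new periods it would not carry over.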
As with all other statistical algorithms available in IBP for Demand (see our previous Olivehorse blog for more details on these), we configure the GBDT using the “Manage Forecast Models” App. Here we see a screenshot of the “Forecasting Steps” tab, in which we define the tuning parameters:
Configuring the "Tuning" Parameters of GBDT
As with any model, the history and forecast horizons must first be defined; here we then specify the key figures for the input (sales history) and for the output of the statistical result. Thereafter the individual parameters must be maintained, so let us consider each one in turn:
Maximum Number of Trees - overfitting is a significant risk if too many decision trees are used in the algorithm. Using this parameter we can define a maximum threshold for the total number of trees considered in the calculation.
Maximum Tree Depth - here we limit the DEPTH (number of levels) of each tree, again to reduce the risk of overfitting. Each additional level can double the number of leaves (terminal nodes in the tree), so a tree with L levels has at most 2^L leaves.
This has an impact on both the accuracy of the model and the performance burden of the calculation. If too many levels are permitted, the model will very likely begin to learn relationships which are specific to particular data samples. We use this in conjunction with the parameter above to “tune” the model and find a suitable combination of number and depth of trees. Much of the available literature suggests that more trees of lesser depth will in general give the most robust predictive results. In any event, a maximum depth of no more than 8 is recommended, and a depth of 4-6 is more commonly ample.
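As a rough worked example of why depth matters: a tree of depth L has at most 2^L leaves, and a model of M trees therefore has at most M times that many. For the depths mentioned above, and an illustrative (assumed) setting of M = 100 trees:

```latex
\begin{aligned}
N_{\text{leaves}}(L) &\le 2^{L}: \qquad 2^{4} = 16,\quad 2^{6} = 64,\quad 2^{8} = 256 \\
N_{\text{leaves, total}} &\le M \cdot 2^{L}: \qquad 100 \times 2^{6} = 6400,\quad 100 \times 2^{8} = 25600
\end{aligned}
```

Each leaf is a separate value fitted to the history, so raising the depth multiplies both the model's capacity to memorise noise and the calculation effort.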
Learning Rate - this parameter controls the weight given to each decision tree, thus slowing down the rate of learning in the GBDT model; in this context it is also known as “shrinkage”. The contribution of each new tree is multiplied by a factor (the specified learning rate) when it is added to the model. Lower learning rates will produce more robust models which are less prone to overfitting, but will tend to require a higher number of iterations (and thus trees) to reach an optimal result.
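In the usual textbook formulation (assumed here; IBP's documentation may describe it differently), each round updates the model as F_m(x) = F_(m-1)(x) + η·h_m(x), where h_m is the new tree and η is the learning rate. Under the idealised assumption that every tree fits the remaining error exactly, a fraction (1 - η) of the error survives each round. The hypothetical helper below uses that assumption to estimate how many trees are needed to remove 99% of the error.

```python
import math


def rounds_to_shrink(learning_rate, remaining_fraction):
    """Trees needed before (1 - learning_rate) ** m <= remaining_fraction.

    Idealised: assumes every tree fits the current residual exactly, so each
    round removes a share equal to the learning rate of the error that is left.
    """
    if not 0 < learning_rate < 1:
        raise ValueError("learning_rate must be between 0 and 1")
    return math.ceil(math.log(remaining_fraction) / math.log(1 - learning_rate))


for rate in (0.5, 0.3, 0.1, 0.05):
    # Trees needed to remove 99% of the original error.
    print(rate, rounds_to_shrink(rate, 0.01))
# 0.5 7
# 0.3 13
# 0.1 44
# 0.05 90
```

Halving the learning rate from 0.1 to 0.05 roughly doubles the number of trees required, which is why lowering the learning rate usually goes hand in hand with raising the maximum number of trees.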
It is therefore clear that these 3 parameters work together as levers with which to tune the model, making it very dynamic and flexible, but more complex than some of the simpler models available. The reward for that extra complexity, of course, is the quality of the predictive outcome.
Independent Variables - the GBDT model also allows the addition of Independent Variables, which are then considered in the regression analysis to explain variations in the sales history data. These independent variables must be set up as key figures and modelled in the planning area. To be used, both the past AND future values of each variable must be known (or estimated) and maintained in the relevant key figure(s). In our example we have used the 2 independent variables already created for the Multiple Linear Regression: Temperature and House Price Index (other examples might include variables such as retail price, marketing spend, etc.).
The GBDT model will then analyse the historical values to establish any correlation between the sales history and these independent variables, using this to predict future demand based on the impact of the projected values of those same variables.
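The data requirement can be sketched as follows: every independent variable needs a value for each period of both the history and the forecast horizon, with the historical rows used to learn the relationship and the future rows used to predict. The function name, dictionary layout and figures are hypothetical; in IBP these values live in key figures in the planning area.

```python
def build_feature_rows(n_history, horizon, regressors):
    """Split independent-variable values into training rows and forecast rows.

    regressors maps a variable name to its values for every period,
    history first and then the forecast horizon.
    """
    needed = n_history + horizon
    for name, values in regressors.items():
        # Past AND future values must be maintained for every period.
        if len(values) < needed or any(v is None for v in values[:needed]):
            raise ValueError(f"{name}: values needed for all {needed} periods")
    names = sorted(regressors)
    rows = [[regressors[name][t] for name in names] for t in range(needed)]
    return names, rows[:n_history], rows[n_history:]


# Hypothetical monthly values: 4 months of history plus a 2-month horizon.
regressors = {
    "temperature": [4.0, 6.5, 9.0, 12.0, 15.5, 18.0],
    "house_price_index": [101.2, 101.8, 102.1, 102.9, 103.4, 103.8],
}
names, history_rows, future_rows = build_feature_rows(4, 2, regressors)
print(names)           # ['house_price_index', 'temperature']
print(future_rows[0])  # [103.4, 15.5]
```

A missing future value is rejected outright, since the model has nothing to base the forecast for that period on.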
System-Generated Features - IBP also offers a number of time-based, system-generated features which can be fed into the algorithm. These features allow the model to capture seasonality, helping to explain the variation in sales history and enhance the predictive results. They can also be used on their own, to develop the GBDT model in the absence of any independent variables (regressors).
As seen in the screenshot, the available options are Month, Quarter and Day. As you would expect, the Month and Quarter features identify any regular seasonal pattern in the sales history, so that it can be reflected in the future estimates for the same month or quarter. Where the Month option is selected, the Quarter option adds no further benefit, since every month belongs to exactly one quarter, and so is not required. Selecting the Day feature will help GBDT to predict results for each weekday by capturing variations by day within each week (for example, whether sales at the weekend are higher or lower).
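Calendar features of this kind are simple to derive from each period's date, as in the sketch below. The helper name is hypothetical, and IBP derives its own features internally. The sketch also shows why Quarter adds nothing once Month is selected: the quarter is a fixed function of the month, so any grouping by quarter is already available to the trees through the month feature.

```python
from datetime import date


def calendar_features(period_start):
    """Month, quarter and weekday features for one period's start date."""
    month = period_start.month
    return {
        "month": month,                     # 1 to 12
        "quarter": (month - 1) // 3 + 1,    # 1 to 4, fully determined by month
        "weekday": period_start.weekday(),  # 0 = Monday ... 6 = Sunday
    }


print(calendar_features(date(2018, 11, 30)))
# {'month': 11, 'quarter': 4, 'weekday': 4}  (30 November 2018 was a Friday)
```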
GBDT - the Pros and Cons
Disadvantages - all statistical forecasting algorithms have their strengths and weaknesses, and GBDT is no exception. We have already explored one of the drawbacks, namely the danger of overfitting, but we have seen how this risk can be mitigated by the use of the available tuning parameters in IBP for demand.
Further to this, the model does not perform well on trending data series, as it cannot extrapolate such patterns: its predictions are built from values learned from the history, so a trend continuing beyond the range of past sales will not be projected forward. A test for trend would therefore be advisable before applying GBDT. This is of course readily available within the “Best Fit” functionality of IBP for demand, and also through the Time-Series Analysis functionality, which can be used to identify trends and other patterns in the data.
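The extrapolation limit follows from how trees predict: every split threshold lies within the range of the history, so any period beyond the last observed one falls into the same leaves as that last period and receives exactly the same prediction. The sketch below (hypothetical helpers, assuming squared-error loss, single-split trees and a perfectly linear series) shows the flat forecast, together with a simple least-squares slope as one possible check for trend. It is an illustration only, not the test used by IBP's Best Fit or Time-Series Analysis.

```python
from statistics import fmean


def fit_stump(xs, residuals):
    """Depth-1 regression tree on one input, chosen by least squared error."""
    best = None
    points = sorted(set(xs))
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = fmean(left), fmean(right)
        cost = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or cost < best[0]:
            best = (cost, t, lm, rm)
    if best is None:
        return lambda x: 0.0
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, ys, n_trees=100, learning_rate=0.3):
    """Gradient boosting with stumps; returns a prediction function."""
    base = fmean(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + sum(learning_rate * s(x) for s in stumps)


def trend_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = fmean(xs), fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


periods = list(range(1, 25))                # 24 months of history
sales = [100.0 + 5.0 * t for t in periods]  # steady upward trend of 5 per month
model = boost(periods, sales)
print(trend_slope(periods, sales))          # 5.0
print(model(24) == model(30) == model(36))  # True: the forecast is flat
```

The trend continues upwards by 5 units per month, but every future month receives the month-24 value, so a strongly trending series is better left to a method that models trend explicitly.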
Moreover, as we alluded to earlier in this blog, whilst the inclusion of independent variables is a potent predictive tool, it is also contingent upon the availability of reliable future values for those variables. Similarly, if the expected future values of those variables are vastly different from those seen in the past, the quality of the prediction will also suffer.
A final point worth considering, whilst not related to the quality of results, is the runtime of this algorithm compared with others. The time taken to run the GBDT algorithm increases dramatically as the number and depth of the decision trees are raised. Again, however, this can be controlled via the model parameters.
Advantages - in spite of these drawbacks, the positives of this powerful machine learning algorithm are plentiful and significant. The key advantages of the Gradient Boosting Decision Tree algorithm are:
- It can automatically determine the significance/weighting of each of the variables being modelled
- It requires no input transformation
- It is able to handle multiple variables
- It can automatically handle the dependencies between predictors
- It can model complex, non-linear relationships between variables
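- It is relatively robust to outliers in the input variables (although extreme values in the sales history itself can still distort the fit)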
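The "no input transformation" point can be checked with a small sketch. Because a tree only compares each input against thresholds, any order-preserving transformation of an input (here, taking the logarithm of a hypothetical price regressor) leaves the chosen split, and so the fitted values, unchanged. The helper and figures are illustrative assumptions. The same argument does not extend to the sales values being predicted, which is why outliers in the history itself still matter.

```python
import math
from statistics import fmean


def stump_fitted_values(xs, ys):
    """Fit a depth-1 regression tree and return its predictions at the training points."""
    best = None
    points = sorted(set(xs))
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = fmean(left), fmean(right)
        cost = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or cost < best[0]:
            best = (cost, t, lm, rm)
    _, t, lm, rm = best
    return [lm if x <= t else rm for x in xs]


price = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]        # hypothetical retail prices
sales = [50.0, 48.0, 47.0, 30.0, 28.0, 27.0]
raw = stump_fitted_values(price, sales)
logged = stump_fitted_values([math.log(p) for p in price], sales)
print(raw == logged)  # True: same split, same fitted values
```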
Conclusion - The proof of the pudding is in the eating
Just as with the other statistical forecasting algorithms available in IBP for demand, once the model is configured it could not be simpler to run the algorithm and generate a forecast from the Excel UI planning views. As covered in previous Olivehorse blogs, we can run the model interactively on any selection, at any level of granularity, to review the results, and as ever we have the option to run a simulation. In each case we have full visibility of the final result of the algorithm and of the forecast model fit errors associated with each run.
Indicative results on sample data sets are incredibly encouraging, showing significant improvements in model fit error, which is, after all, the ultimate objective of Gradient Boosting.
Even on datasets where existing “Best Fit” results had returned relatively strong results (MAPE of <20%), initial iterations of the GBDT brought this error down into single digits. Through further manipulation and “tweaking” of the number and depth of decision trees we were able to reduce this figure still further.
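For reference, MAPE (mean absolute percentage error) is the average of the absolute forecast errors expressed as a percentage of the actual values. A minimal sketch follows; skipping periods with zero actuals is an assumption made here to avoid division by zero, and IBP's own handling of such periods may differ.

```python
def mape(actuals, forecasts):
    """Mean Absolute Percentage Error, in percent.

    Periods with an actual of zero are skipped (an assumption of this sketch).
    """
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual is zero")
    return 100 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


print(mape([100, 200], [110, 180]))  # 10.0
```

A MAPE "in single digits" therefore means the forecast was, on average, less than 10% away from actual sales.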
The improvements on harder-to-fit data series are more impressive still, with a significant reduction in error value in almost all applicable cases.
So, in conclusion, there are clear indications that this latest addition to SAP IBP for demand is a significant step forward in the endeavour to leverage Machine Learning capability to further enhance and automate the demand management process within SAP IBP. Here at Olivehorse we have our own fully built and integrated SAP IBP system, now configured with this very latest in statistical forecast methods. Should you be interested in seeing how this or any other feature can help your organisation, please contact us today to enquire about our free IBP taster session service.
Senior Consultant - Olivehorse Consulting