In this blog post, we’ll present an end-to-end implementation method for predictive analytics in SAP Analytics Cloud.
The most recognized method to implement predictive analytics projects is the cross-industry standard process for data mining, also known as CRISP-DM. You can read more about this method at http://s-prs.co/v577109. This process consists of six major phases as described in the figure below:
In the next subsections, we’ll walk you through the step-by-step process and the detailed tasks required as part of each of the six project phases.
The first phase of the predictive analytics project consists of determining the business expectations and the related implications; that is, what does the business owner want to accomplish? You need to spend the required time analyzing in depth the business objectives and what the business considers as project success criteria. This information will form the cornerstone of your project plan.
The business has high-level expectations that you must analyze and translate into a predictive analytics approach. Here are some examples of the high-level business requirements:
The business objectives will be your North Star to propose a fit-for-purpose predictive analytics approach and a project plan to achieve these objectives on time. In certain cases, the business requirements can’t be addressed with predictive analytics, and you’ll have to make this clear from day one.
You should not rush or neglect the business understanding step. If you do, you might be wasting a great deal of time and effort producing the right answer to the wrong question.
You need to take time to fully assess the situation, and ask yourself and answer questions like these:
You must uncover, at the beginning of the project, any crucial factors that can influence its outcome, so that you avoid losing time and energy during subsequent steps.
Based on the collected information, you can determine the predictive analytics goals. Business goals state the project objectives from a business perspective; you need to refine them into more technical, predictive goals. These predictive goals also need to come with success criteria. As an example, the success criterion for a forecasting project might be to divide the current forecasting error by a factor of two.
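To make such a success criterion measurable, you can express it as a simple calculation. The following sketch uses a mean absolute percentage error (MAPE) and made-up figures; the indicator choice and the numbers are illustrative assumptions, not output from smart predict:

```python
# Hypothetical sketch: quantifying a "divide the forecasting error by two"
# success criterion using MAPE. All figures are invented for illustration.

def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)]
    return 100.0 * sum(errors) / len(errors)

actuals = [100, 120, 90, 110]
current_forecast = [80, 100, 110, 130]   # existing forecasting process
new_forecast = [95, 115, 95, 105]        # candidate predictive model

current_mape = mape(actuals, current_forecast)
new_mape = mape(actuals, new_forecast)

# Success criterion: the new error must be at most half of the current one
meets_criterion = new_mape <= current_mape / 2
```

Expressing the criterion as code (or as a story calculation) removes ambiguity about what "dividing the error by two" means when the project is assessed later.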
Finally, you must produce a project plan that recaps the business goals as well as the predictive analytics goals. This plan should describe all the steps you’ll take to achieve the project.
Let’s move on to our next phase. During the data understanding phase, you’ll proceed through the following steps, which we’ll discuss in the next sections:
The data collection step consists of acquiring or accessing the data identified as part of the project plan. Depending on the data location, you might want to access and review the data either in SAP Analytics Cloud or directly in the data platform you’re relying on. One option consists of acquiring the data into SAP Analytics Cloud (or accessing it live if the data is stored in an on-premise SAP HANA system) and reviewing it there. While SAP Analytics Cloud can’t be considered an advanced data exploration and data quality solution, it provides some basic features that can help you during the data understanding phase.
On the other hand, most data platforms, such as SAP HANA, offer data exploration functionalities to help data engineers during the data understanding phase. Based on your project plan and initial assessment, multiple data sources might be required to serve your predictive analytics use case, and you’ll need to combine these. During this step, you also need to think about the target data model in SAP Analytics Cloud. Will you use datasets or planning models (the latter only applicable to time series forecasting scenarios)? This modeling choice determines the possibilities you’ll have when using this data for predictive analytics and the way you’ll report on the predictions. It’s good practice to document your findings and choices in a data collection report.
The report should answer the following questions:
In the data description step, you start digging deeper to examine the various properties of the data to answer the following questions:
Based on this analysis, it’s recommended to create another short report that contains a description of the data and to document your findings. While this might sound tedious, it’s easy to lose track of the project context after a brief period.
During the data exploration step, you’ll explore the data using queries and visualizations. Your analysis should answer the following questions:
Datasets in SAP Analytics Cloud offer basic data exploration features. For example, the first figure below shows the data distribution of a measure, and the second one shows unique values for a dimension.
Use your curiosity to dig deeper into the data, and you’ll likely make additional findings during this step. Similar to the previous steps, it’s recommended that you document your findings in a data exploration report.
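Outside SAP Analytics Cloud, the same kind of exploration can be scripted. The sketch below uses pandas on a made-up sales extract to reproduce the two views mentioned above (the distribution of a measure and the unique values of a dimension); the data and column names are hypothetical:

```python
import pandas as pd

# Illustrative sketch of the exploration a dataset offers in
# SAP Analytics Cloud, done here in pandas on invented data.
df = pd.DataFrame({
    "region": ["EMEA", "EMEA", "APJ", "NA", "APJ", "EMEA"],
    "revenue": [120.0, 95.0, 80.0, 150.0, 82.0, 101.0],
})

# Distribution of a measure (count, mean, min, max, quartiles),
# comparable to the histogram shown for a dataset measure
revenue_summary = df["revenue"].describe()

# Unique values of a dimension and how often each occurs
region_counts = df["region"].value_counts()
```

Queries like these answer the exploration questions directly and their output can be pasted into the data exploration report.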
In the data quality step, you should answer the following questions to check the data quality:
The data quality step will result in you creating a data quality report, detailing your findings, and, if any data quality problems exist, listing practical solutions.
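A data quality check like this can be partly automated. The following sketch, using pandas on a hypothetical extract, counts missing values, duplicate records, and implausible values, which are typical inputs to a data quality report:

```python
import pandas as pd

# Minimal sketch of automated data quality checks on invented data.
df = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "quantity": [5, None, None, -1, 3],
})

missing_per_column = df.isna().sum()                   # missing values
duplicate_rows = int(df.duplicated().sum())            # exact duplicate records
negative_quantities = int((df["quantity"] < 0).sum())  # implausible values

# Findings, ready to be summarized in the data quality report
quality_report = {
    "missing_quantity": int(missing_per_column["quantity"]),
    "duplicate_rows": duplicate_rows,
    "negative_quantities": negative_quantities,
}
```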
You should have a much deeper understanding of your data after this phase. This will help you prepare the data models that best serve your predictive analytics needs.
There is a famous joke that says predictive analytics is 80% about preparing data and 20% complaining about preparing the data. While the reality might sometimes be less harsh, the condition of the prepared data does impact the accuracy of the predictive results. This is sometimes summarized by predictive analytics practitioners as “garbage in, garbage out.”
During the data preparation phase, you’ll follow these steps, which we’ll discuss in the next sections:
The data selection step focuses on selecting the data used during the predictive modeling phase. The selection criteria should include the relevance to the predictive analytics goals, as well as quality and technical constraints such as limits on data volume or data types. As an example, if you want to forecast the next 12 months ahead for a given indicator, it’s ideal to select five or six years of the past evolution of this indicator, at the monthly level. Note that the data selection should consider dataset columns as well as observations. At the end of this step, you should be able to list the data you would keep or exclude and the rationale for these decisions.
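As a sketch of this selection logic, the following pandas example keeps only the relevant columns and the last six years (72 months) of a monthly indicator ahead of a 12-month forecast; the data frame and column names are made up for illustration:

```python
import pandas as pd

# Hypothetical extract: 10 years of monthly history plus an
# irrelevant free-text column.
dates = pd.date_range("2010-01-01", periods=120, freq="MS")
df = pd.DataFrame({
    "month": dates,
    "revenue": range(120),
    "free_text_comment": ["n/a"] * 120,  # excluded: not relevant to the goal
})

selected = (
    df[["month", "revenue"]]   # column selection
    .tail(72)                  # observation selection: last six years
    .reset_index(drop=True)
)
```

The excluded column and the cutoff would both be recorded, with their rationale, in the selection documentation.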
During the data cleaning step, you’ll raise the level of data quality. There are many different actions you can take here. As we previously mentioned, if time series are incomplete, you should remove as many breaks as possible by filling in the intermediate missing data points, assuming they correspond to null values. In addition, dimension members frequently have similar names due to typos or incorrect master data management; you need to align these variants to the standard value. Note that smart predict can deal with missing values as part of classification and regression modeling, so you don’t need to fill in missing values artificially for these scenarios. The output of this step is a data cleaning report that lists the decisions and actions taken to address the data quality problems detected during the data quality check step of the data understanding phase.
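When such cleaning happens outside SAP Analytics Cloud, it can look like the following pandas sketch, which closes a gap in a monthly time series (assuming the missing month corresponds to a null value) and aligns near-duplicate dimension members to a standard value; the data is invented for illustration:

```python
import pandas as pd

# A monthly series where March is missing entirely
raw = pd.DataFrame({
    "month": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-04-01"]),
    "revenue": [100.0, 90.0, 110.0],
})

complete = (
    raw.set_index("month")
       .asfreq("MS")              # reinstate every month; March appears as NaN
       .fillna({"revenue": 0.0})  # assumption: a missing month means zero
       .reset_index()
)

# Aligning near-duplicate dimension members to the standard value
members = pd.Series(["Germany", "Germny", "germany", "France"])
standardized = members.replace({"Germny": "Germany", "germany": "Germany"})
```

Both decisions (missing means zero; which spelling is the standard) belong in the data cleaning report.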
The data enrichment step includes data preparation operations such as the production of derived variables or transformed values for existing variables. Derived variables are new variables that are built off one or more existing variables of the dataset. The calculation of derived variables and the transformation of existing variable values will need to be documented for traceability and further reference. Note that SAP Analytics Cloud does offer features to clean and enrich data in datasets and planning models.
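As an illustration of derived variables, the following pandas sketch builds three new variables off existing ones; the data and variable names are hypothetical:

```python
import pandas as pd

# Invented order data with two existing variables and a date
df = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-15", "2024-06-30"]),
    "revenue": [200.0, 50.0],
    "quantity": [4, 2],
})

df["order_month"] = df["order_date"].dt.month      # derived from a date
df["unit_price"] = df["revenue"] / df["quantity"]  # derived ratio
df["is_high_value"] = df["revenue"] > 100.0        # derived flag
```

Each derivation would be documented for traceability, as the text recommends.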
In the data integration step, information from multiple data sources is combined to create new records or values. In most cases, the various data sources will be merged into one through joining. Merged data can also include ad hoc aggregations. SAP Analytics Cloud doesn’t offer specific features to handle data integration, so the data integration step must be handled outside of the solution, for instance, by using other SAP solutions such as SAP HANA, SAP Business Warehouse (SAP BW), or SAP Datasphere. Standard data types and data formats are natively handled by SAP Analytics Cloud.
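When the integration happens outside SAP Analytics Cloud, it often boils down to joins and aggregations like the following pandas sketch; the two sources and their columns are hypothetical:

```python
import pandas as pd

# Two invented sources: transactional sales and product master data
sales = pd.DataFrame({
    "product_id": [1, 1, 2],
    "revenue": [100.0, 50.0, 80.0],
})
products = pd.DataFrame({
    "product_id": [1, 2],
    "category": ["Hardware", "Software"],
})

# Join the sources, then aggregate to the level needed downstream
merged = sales.merge(products, on="product_id", how="left")
by_category = merged.groupby("category", as_index=False)["revenue"].sum()
```

The resulting table is what you would then load into an SAP Analytics Cloud dataset or planning model.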
At the end of the data preparation phase, you finalize your data models, alongside a description of the data preparation steps you took to create them.
You’ve finalized your data models and are ready to use them to create your predictive scenarios and predictive models. The predictive modeling phase will consist of four major steps, which we’ll discuss in the following sections:
The first step is to select the right predictive scenario type. In smart predict, you can create three types of predictive scenarios: classification, regression, and time series forecasting. While these three types cover a broad range of business questions addressable by predictive analytics, they don’t allow you to, for instance, answer clustering or product recommendation questions.
Here are the definitions of the three types of predictive scenarios:
Based on your earlier examination of the business question, you should already know at this stage which type of predictive scenario you’ll select.
Before you start creating the predictive models, you must define how you’ll assess the model accuracy and the relevance of the predictions to the business needs. In most cases, you can rely on the performance indicators that smart predict provides to evaluate the predictive models. These indicators are computed by internally holding out a subset of the actual data and using the predictive model to predict this held-out subset. You might also want to evaluate the performance of the predictive model in real-world conditions to estimate the performance you could expect. For instance, it’s common for customers to evaluate a predictive model by predicting data for periods where they already know the actual values. Another common option is to let some time pass, gather more data points, and compare the predictions to what actually happened.
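The real-world evaluation described above can be sketched as a simple holdout comparison. In the example below, the last four known periods are held back, a naive forecast stands in for the predictive model (this is not how smart predict forecasts), and the error is measured against the known actuals:

```python
# Hedged sketch of evaluating a forecast "in real-world conditions":
# hold back the last known periods, forecast them, compare to actuals.
# The naive seasonal forecast is purely illustrative.

history = [100, 120, 90, 110, 105, 125, 95, 115]  # 8 periods of actuals
holdout = 4

train, test = history[:-holdout], history[-holdout:]

# Naive stand-in model: repeat the last `holdout` observed values
forecast = train[-holdout:]

# Mean absolute error on the held-out periods
mae = sum(abs(a - f) for a, f in zip(test, forecast)) / holdout
```

The same split logic applies whatever model produces the forecast; only the `forecast` line would change.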
During this step, you first create a predictive scenario that will contain your different predictive models. You can create different predictive models on top of the prepared datasets or planning models, and you’ll experiment with different predictive model settings. You’ll have to iterate between creating predictive models and further refining the base data until you converge on acceptable predictive models. Once this is done, you need to carefully preserve the selected predictive models and their settings, and then describe the logic of your successful experiments. While smart predict stores every predictive model for you in a single predictive scenario, you still need to create proper documentation because you might lose track of the predictive model creation history over time.
Assessing the predictive models means confronting your domain knowledge, the predictive analytics success criteria you defined earlier, and your plan for comparing and evaluating predictive model experiments. You must summarize the different results obtained, compare the respective qualities of the generated predictive models, and rank them in relation to each other. For this, you can use smart predict’s standard performance indicators; you can also use ad hoc performance indicators you create in stories.
In the predictive model evaluation, you’ll focus on three major steps, which we’ll discuss in the following sections:
Evaluating your results and the performance of predictive models is typically done by the prediction creators using the following standard performance indicators provided by smart predict:
The performance indicator you choose to evaluate the predictive models should be the one that most closely matches the business objectives defined at the beginning of the project. Customers often create their own performance indicators in the context of stories to evaluate prediction performance. The rationale is to create a performance indicator that matches the way the business expresses its needs and evaluates predictions. If your time and budget constraints allow, you can also use your predictive model to deliver predictions and compare them to what happens later.
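Such a business-oriented indicator can be very simple. The following sketch computes a hypothetical "hit rate": the share of products whose prediction lands within 10% of the actual value. The tolerance and all figures are illustrative assumptions, not smart predict indicators:

```python
# Hypothetical custom indicator, phrased the way a business might in a story:
# the share of products predicted within 10% of the actual value.

actuals =     {"P1": 100.0, "P2": 200.0, "P3": 50.0, "P4": 80.0}
predictions = {"P1": 105.0, "P2": 150.0, "P3": 52.0, "P4": 95.0}

within_tolerance = [
    abs(predictions[p] - actuals[p]) / actuals[p] <= 0.10
    for p in actuals
]
hit_rate = 100.0 * sum(within_tolerance) / len(within_tolerance)
```

An indicator expressed in the business's own terms ("we hit the target for half our products") is often easier to act on than an abstract statistical score.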
Despite all the excellent work that you’ve been doing through the previous phases, the performance of your predictive models may not be satisfactory enough. If this is the case, consider the following possibilities:
The output from this evaluation step is an assessment of the results with respect to the business goal and success criteria. You summarize the assessment results regarding the business success criteria and provide a final statement as to whether the predictive analytics project meets the initial business objectives.
You need to conduct a thorough review of the predictive analytics project to determine if there is any crucial factor or task that has somehow been overlooked. Summarize the process and highlight activities that have been missed and/or should be repeated. This review also covers quality assurance issues, so, for example, consider these questions: Did you correctly build the predictive model? Did you only use variables that are allowed for use and that are available for future analysis? This step is needed to build additional confidence and trust in the steps you took.
Finally, you’ll need to determine the next steps according to the results assessment and the process review. You’ll need to decide whether to finish the predictive part of the project and move on to deployment, if that is appropriate, or whether you need to initiate further iterations and set up additional tasks. This step includes analyzing the remaining resources and budget, which influence these decisions. You’ll need to describe the decisions about how to proceed, along with the rationale.
Predictions can be used in different ways in SAP Analytics Cloud. The main goal is to provide predictions to prediction consumers so that they can use them in their business context and make decisions based on these predictions. Predictions can be both integrated into stories and exported to external systems, which we’ll discuss next.
The most straightforward place to consume predictions is in stories. Predictions can be exported into datasets (from any predictive scenario type) or planning model versions (from time series forecasting specifically). Story designers can build stories off datasets or planning model versions. At the end of the day, prediction consumers can see actuals and predictions side by side in stories, as well as budget and forecast figures in the context of financial planning. They can make decisions based on present and future information. As we previously mentioned, prediction consumers aren’t experts in predictive analytics, but they know their business intimately. It’s key that prediction creators and story designers deliver the predictions with enough business context and in a straightforward way so that prediction consumers can understand and use them easily. Smart predict provides several functionalities for prediction consumers to gain trust in the predictions.
In some cases, it’s not enough to consume the predictions only in the context of SAP Analytics Cloud, so you’ll need to export predictions out of SAP Analytics Cloud.
Several features make prediction export possible, with different degrees of automation:
Editor’s note: This post has been adapted from a section of the book SAP Analytics Cloud: Predictive Analytics by Antoine Chabert and David Serre.