Let's walk through the steps of creating an automated regression model with SAP Predictive Analytics.
We’ll use two datasets (one for training, one for application) related to predicting the energy use of household appliances (https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction), provided by the UCI Machine Learning Repository.
The goal of this regression problem is to predict the energy use of appliances in a low-energy building. The following is the corresponding list of variables:
- date: Timestamp in the format year-month-day hour:minute:second
- Appliances (the output variable): Energy use in Wh
- Lights: Energy use of light fixtures in the house in Wh
- T1: Temperature in kitchen area, in Celsius
- RH_1: Humidity in kitchen area, in %
- T2: Temperature in living room area, in Celsius
- RH_2: Humidity in living room area, in %
- T3: Temperature in laundry room area, in Celsius
- RH_3: Humidity in laundry room area, in %
- T4: Temperature in office room, in Celsius
- RH_4: Humidity in office room, in %
- T5: Temperature in bathroom, in Celsius
- RH_5: Humidity in bathroom, in %
- T6: Temperature outside the building (north side), in Celsius
- RH_6: Humidity outside the building (north side), in %
- T7: Temperature in ironing room, in Celsius
- RH_7: Humidity in ironing room, in %
- T8: Temperature in teenager room 2, in Celsius
- RH_8: Humidity in teenager room 2, in %
- T9: Temperature in parents’ room, in Celsius
- RH_9: Humidity in parents’ room, in %
- T_out: Temperature outside (from Chièvres weather station), in Celsius
- Press_mm_hg: Pressure (from Chièvres weather station), in mm Hg
- RH_out: Humidity outside (from Chièvres weather station), in %
- Windspeed: Wind speed (from Chièvres weather station), in m/s
- Visibility: Visibility (from Chièvres weather station), in km
- Tdewpoint: Dew point temperature (from Chièvres weather station), in Celsius
- WeekStatus: Whether the day is a weekday or a weekend day
- Day_of_week: The day in the week from Monday to Sunday
More details on the dataset’s structure and its variables can be found directly at the UCI dataset website.
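Before loading the data into a project, it can help to confirm that the file header matches the variable list above. The following is a minimal sketch using only the Python standard library. It assumes that the column headers in your CSV file are spelled like the variable names in the list (compared case-insensitively); the helper names `expected_columns` and `missing_columns` are hypothetical and are not part of any SAP tool.

```python
import csv
import io


def expected_columns():
    """Build the 29 column names described in the variable list."""
    cols = ["date", "Appliances", "lights"]
    # T1..T9 and RH_1..RH_9: one temperature/humidity pair per location.
    for i in range(1, 10):
        cols += [f"T{i}", f"RH_{i}"]
    cols += ["T_out", "Press_mm_hg", "RH_out", "Windspeed",
             "Visibility", "Tdewpoint", "WeekStatus", "Day_of_week"]
    return cols


def missing_columns(csv_text):
    """Return the expected columns that are absent from the CSV header.

    The comparison ignores case and surrounding whitespace, because the
    exact header spelling in the file is an assumption.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    present = {name.strip().lower() for name in header}
    return [c for c in expected_columns() if c.lower() not in present]
```

To check a real file, you could read its first line and pass it to `missing_columns`; an empty list means every variable from the list was found in the header.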
Before creating your automated regression model, you must have performed the basic steps of creating a Predictive Factory project that points to a dataset folder and to a model folder, as shown in the figure below.
Next, open the Models tab of your project. Click on the Add Model button and select the entry Regression, as shown in the following figure. The New Model page will open.
The process for creating a regression model is not significantly different from creating a classification model. To summarize, you’ll set the following parameters:
- Name and Business Question: The name of the model and the business question that the model will help resolve. For example, you could enter “Appliance Energy Prediction” and “Predict the energy use of the appliances in a low-energy building” in the respective fields.
- Input Data Set: Use the dataset training.csv.
- Target Variable (under Variable Roles): Choose a continuous variable as your target; for our example, select the variable Appliances from the list of values.
- Edit Variable Metadata: Set the variable types as shown in the following figure; in particular, change the variable type of the variable lights from Ordinal to Continuous.
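The settings above can be summarized in one place, for example to document a project or review a configuration before training. The following is only an illustrative sketch: the `RegressionModelSettings` class and its `problems` method are hypothetical and do not correspond to an SAP Predictive Analytics API. The values are the ones used in this example.

```python
from dataclasses import dataclass, field


@dataclass
class RegressionModelSettings:
    """Hypothetical record of the New Model page settings (not an SAP API)."""
    name: str
    business_question: str
    input_dataset: str
    target_variable: str
    # Variable name -> variable type chosen under Edit Variable Metadata.
    metadata_overrides: dict = field(default_factory=dict)

    def problems(self):
        """Return a list of human-readable problems with the settings."""
        issues = []
        if not self.name:
            issues.append("model name is empty")
        if not self.target_variable:
            issues.append("no target variable selected")
        # A regression target must be continuous; an override to any
        # other type would contradict that.
        target_type = self.metadata_overrides.get(self.target_variable,
                                                  "Continuous")
        if target_type != "Continuous":
            issues.append("regression target must be continuous")
        return issues


settings = RegressionModelSettings(
    name="Appliance Energy Prediction",
    business_question=("Predict the energy use of the appliances "
                       "in a low-energy building"),
    input_dataset="training.csv",
    target_variable="Appliances",
    metadata_overrides={"lights": "Continuous"},  # changed from Ordinal
)
```

Calling `settings.problems()` returns an empty list for the configuration used here.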
Once you’ve configured these parameters, click Save and then Train. The resulting regression model has a Predictive Power of 66.39% and a Prediction Confidence of 97.95%, as shown in the final figure. The model is therefore of good quality, even though it does not explain all of the variability in the target variable, and it is robust because its Prediction Confidence is greater than 95%.
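The robustness rule used here (a Prediction Confidence above 95%) is simple enough to express as a check. The sketch below only encodes that threshold as stated in this post. It does not reproduce how SAP Predictive Analytics computes either indicator, and the function names are hypothetical.

```python
# Threshold in percent, as stated in the text: a Prediction Confidence
# above 95% indicates a robust model.
ROBUSTNESS_THRESHOLD = 95.0


def is_robust(prediction_confidence_pct):
    """Return True when the Prediction Confidence exceeds the threshold."""
    return prediction_confidence_pct > ROBUSTNESS_THRESHOLD


def summarize(predictive_power_pct, prediction_confidence_pct):
    """Build a one-line summary of the two quality indicators."""
    verdict = "robust" if is_robust(prediction_confidence_pct) else "not robust"
    return (f"Predictive Power {predictive_power_pct}%, "
            f"Prediction Confidence {prediction_confidence_pct}% ({verdict})")
```

For the model trained in this example, `is_robust(97.95)` returns `True`.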
We chose to create this automated regression model using the Predictive Factory, but the same model could also be created in Automated Analytics. Let us know how you create, interpret, and improve your automated regression models.
Editor’s note: This post has been adapted from a section of the book SAP Predictive Analytics: The Comprehensive Guide by Antoine Chabert, Andreas Forster, Laurent Tessier, and Pierpaolo Vezzosi.
*The full citation for the dataset above is as follows:
Luis M. Candanedo, Veronique Feldheim, and Dominique Deramaix, “Data driven prediction models of energy use of appliances in a low-energy house,” Energy and Buildings, Volume 140, 1 April 2017, Pages 81–97, ISSN 0378-7788, https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction.