In this post, we’ll discuss how you can use Docker with Python and R operators to build pipelines in the Modeler application of SAP Data Intelligence.
In the Modeler application, Python and R operators run in containers built from Docker images. The Modeler application includes several predelivered Python3 and R Client operators that you can use in your pipelines and graphs. Both types of operators are easy to use and configure.
In the Python3 operator, you can write your own scripts, and predefined API objects are available to the functions in those scripts. The Python3 operator from SAP Data Intelligence 3.1 is compatible with Python 3.6. Like the Python3 operator, the R Client operator is a predelivered operator in the Modeler application. You can run your own Rserve scripts through the R Client operator, which also supports predefined API objects. This operator is compatible with R 3.3.3 and 3.5.1.
In both cases, your script will need to import libraries, including the base libraries of Python and R as well as additional ones, to execute successfully. Through the Modeler application, Docker provides a predefined runtime environment in which these libraries can be installed and then used by the scripts in a pipeline. Let’s explore a simple use case that tags a Docker image for the Python3 operator to see how Docker is used in the SAP Data Intelligence Modeler application.
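A script that imports a library missing from its runtime fails at import time, which is why the runtime environment matters. As a minimal sketch before the use case, the following reports which libraries cannot be imported in the current runtime. The helper name `missing_libraries` and the made-up module name are hypothetical; the sketch relies only on `importlib.util.find_spec` from the Python standard library.

```python
import importlib.util


def missing_libraries(names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is a made-up module
# that is assumed not to exist in any runtime.
print(missing_libraries(["json", "no_such_library_for_demo"]))
```

Run inside an operator, a check like this could be pointed at `"pandas"` to confirm that the container actually provides it.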
In our use case, we assume that our Python script needs the pandas library, which is imported during script execution. We’ll create a simple Dockerfile to build a Docker image that installs the pandas library. Our Docker image will be based on the parent image $com.sap.sles.base (declared with the FROM instruction), which provides the python36 library, and we’ll install pandas on top of it using the package manager pip. We’ll add the tag pandalib to our Docker image.
Note: The pandas library is an open-source library for data analysis and manipulation, built on top of the Python language. It can import data from various sources such as JSON, SQL, CSV, and Microsoft Excel, and it supports data manipulation such as cleaning, merging, and joining data sets.
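To make the note concrete, here is a small sketch of the kind of work pandas handles: reading CSV and JSON, dropping incomplete rows, and merging two data sets. The sample data and column names are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data; the column names are made up for illustration.
orders_csv = io.StringIO(
    "order_id,customer_id,amount\n"
    "1,10,25.0\n"
    "2,11,\n"  # this order has no amount
    "3,10,40.0\n"
)
customers_json = io.StringIO(
    '[{"customer_id": 10, "name": "Ada"}, {"customer_id": 11, "name": "Grace"}]'
)

orders = pd.read_csv(orders_csv)
customers = pd.read_json(customers_json)

# Cleaning: drop rows where the amount is missing.
clean_orders = orders.dropna(subset=["amount"])

# Merging: attach the customer name to each remaining order.
merged = clean_orders.merge(customers, on="customer_id", how="left")

# Total amount per customer name.
totals = merged.groupby("name")["amount"].sum()
print(totals)
```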
First, create and save the Dockerfile, then build it to generate the Docker image. After the Docker image is created, you’ll see a success message in the top-right corner of the Modeler page, as shown below.
Now that our Docker image is ready, we can focus on the integration between the Python3 operator and the Docker image. The Python3 operator has no input or output ports by default, so you have the flexibility to define them as needed. Since our Python script needs the pandas library to run successfully, we need to map the Docker image to the Python script. To do so, we’ll group the Python script with the pandalib tag, which belongs to the Docker image loaded with the python36 and pandas libraries. Follow these steps:
- Open the Modeler application.
- Open any existing pipeline where the Python3 operator must be added or create a blank graph or pipeline. Then, drag the Python3 Operator onto the design area. We used a new blank graph, shown in the figure below, to illustrate the steps.
- Right-click on the Python3 Operator and select the Group option, as shown here.
- You’ll find a Group box surrounding the operator, as shown in the figure below (1). Select the Group box around the operator and open the Configuration pane (2) on the right side of the window to add the tag pandalib (3) for integration.
After grouping with the pandalib tag, the integration between the Docker image and your Python script is complete. Your Python3 operator is now ready to be executed in a pipeline, running on the Docker image in which the pandas library is already installed.
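With the group tagged, a script in the Python3 operator can import pandas directly. The following is a minimal sketch, not a definitive implementation: it assumes you have added one input port named `input` and one output port named `output`, both carrying strings, and it uses the operator's `api.set_port_callback` and `api.send` calls. The `_StubApi` class is a hypothetical stand-in so the sketch also runs outside the Modeler, where no `api` object is injected.

```python
import io

import pandas as pd  # importable because the group carries the pandalib tag

# Inside the Modeler, the Python3 operator injects a global `api` object.
# Outside it, fall back to a hypothetical stub that records what happens.
try:
    api
except NameError:
    class _StubApi:
        def __init__(self):
            self.callbacks = {}
            self.sent = []

        def set_port_callback(self, port, callback):
            self.callbacks[port] = callback

        def send(self, port, data):
            self.sent.append((port, data))

    api = _StubApi()


def summarize(csv_text):
    """Parse CSV text with pandas and return the column means as JSON."""
    frame = pd.read_csv(io.StringIO(csv_text))
    return frame.mean(numeric_only=True).to_json()


def on_input(data):
    # Assumed port names: "input" and "output" (define them on the operator).
    api.send("output", summarize(data))


api.set_port_callback("input", on_input)
```

If the operator is not grouped with a tag whose image contains pandas, the `import pandas` line is where a script like this would fail.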
Note: The R Client operator works in the same way as the Python3 operator. You can group the R Client operator and map the corresponding tag of the Docker image to integrate it.
Editor’s note: This post has been adapted from a section of the book SAP Data Intelligence: The Comprehensive Guide by Dharma Teja Atluri, Devraj Bardhan, Santanu Ghosh, Snehasish Ghosh, and Arindom Saha.