Let’s explore the architecture of SAP Data Intelligence.
The figure below shows an overview of the different components made available as part of SAP Data Intelligence, both from an ecosystem point of view and from a services point of view.
To start, let’s quickly look at the ecosystem view of SAP Data Intelligence in terms of different types of connectivity. The core components of SAP Data Intelligence from this perspective include the following.
Your company may have data lakes and many systems that require always-on integration across complex data types, with data transformations happening on the fly. These integrations may require on-premise legacy processing engines; open-source engines (such as R, Python, and TensorFlow); and cloud processing engines. SAP Data Intelligence supports these requirements by connecting to SAP and non-SAP source systems through ABAP integration (spanning SAP S/4HANA, SAP BW/4HANA, and SAP Business Warehouse [SAP BW]); SAP cloud data integration (including SAP Fieldglass, SAP Customer Experience, SAP SuccessFactors, and SAP Concur); standard connectors for open and native protocols (streaming sources such as the Internet of Things [IoT], cloud storage, Hadoop and the Hadoop Distributed File System [HDFS], Representational State Transfer [REST] APIs, databases, and public clouds); and SAP Business Technology Platform (SAP BTP) connectors (SAP API Business Hub, the Cloud Integration capability for process integration, and Open Connectors). This connectivity functionality lets you create, update, delete, and monitor connections.
SAP Data Intelligence offers the Kafka Producer and Kafka Consumer operators, along with storage layer and stream processing services, to help you set up a messaging system. With a Kafka service in place alongside SAP Data Intelligence, you can establish connectivity to it using Connection Management. Once the connection has been created, you can use the producer and consumer operators in the SAP Data Intelligence Modeler to orchestrate the data flow.
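To make the producer and consumer roles concrete, here is a minimal Python sketch of the messaging pattern. It is only an illustration: the in-memory queue stands in for a Kafka topic, and the names produce, consume, and topic are hypothetical. It does not use the Kafka operators or any SAP Data Intelligence API.

```python
import json
import queue

# Stand-in for a Kafka topic: an in-memory FIFO queue.
# Assumption for illustration only; this is not a Kafka client.
topic = queue.Queue()


def produce(topic, records):
    """Serialize each record to JSON bytes and publish it to the topic."""
    for record in records:
        topic.put(json.dumps(record).encode("utf-8"))


def consume(topic):
    """Read every pending message from the topic and decode it."""
    messages = []
    while not topic.empty():
        messages.append(json.loads(topic.get().decode("utf-8")))
    return messages


# The producer side publishes two sensor readings (made-up sample data).
produce(topic, [{"sensor": "s1", "temp": 21.5}, {"sensor": "s2", "temp": 19.0}])

# The consumer side receives them in the order they were published.
received = consume(topic)
print(received)
```

In a real landscape, the queue would be replaced by a topic on the Kafka service registered in Connection Management, and the two functions by the Kafka Producer and Kafka Consumer operators in a graph.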
SAP Data Intelligence, as a comprehensive data management tool, provides a unified view of the data across the enterprise by allowing you to streamline the integration and processing of all kinds of enterprise data across fragmented landscapes. This solution allows you to design powerful data pipelines that use open-source and reusable on-premise and cloud-based processing engines with more than 250 operators. Finally, SAP Data Intelligence helps with the centralized management of distributed data by providing visibility into all data sources and pipelines across your landscape, with centralized management features to monitor various pipelines for ongoing status reporting and for proactive job management. SAP Data Intelligence can also monitor machine learning-based pipelines both from a performance and from a metrics point of view to help you decide on the right model and help trigger training for a model based on certain metric thresholds.
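The idea of triggering training when metrics cross certain thresholds can be sketched in a few lines of Python. This is a hedged illustration, not the product's monitoring logic: the function name, the metric names, and the threshold values are all assumptions made for the example.

```python
def metrics_below_threshold(metrics, thresholds):
    """Return the names of metrics that fall below their minimum threshold.

    A metric that is missing from `metrics` is treated as 0.0, so it
    counts as a violation (an assumption made for this sketch).
    """
    return sorted(
        name
        for name, minimum in thresholds.items()
        if metrics.get(name, 0.0) < minimum
    )


# Hypothetical metrics reported by a deployed model.
latest_metrics = {"accuracy": 0.87, "f1": 0.91}

# Hypothetical minimum acceptable values.
minimums = {"accuracy": 0.90, "f1": 0.85}

violations = metrics_below_threshold(latest_metrics, minimums)
needs_retraining = bool(violations)
print(violations, needs_retraining)
```

Here accuracy has dropped below its minimum while f1 has not, so the check flags the model for retraining and reports which metric caused it.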
With the Metadata Explorer, users discover and profile the data being consumed to identify potential anomalies as well as identify other recurring business rules required to ensure high-quality data is passed downstream. Users can use spreadsheet-based user interface (UI) functionality, along with metadata crawlers, to explore, classify, and label data assets across your connected landscape. Beyond data discovery and profiling, data lineage features help users review the data transformation history and related metadata to understand how, where, and why data has been altered.
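As a rough illustration of what profiling a dataset involves, the following Python sketch computes a few basic statistics per column: row count, missing values, distinct values, and observed types. The function name and the sample rows are hypothetical, and the sketch does not reflect how the Metadata Explorer works internally.

```python
from collections import Counter


def profile_column(values):
    """Compute simple profiling statistics for one column of values."""
    # Treat None and empty strings as missing (an assumption of this sketch).
    non_null = [v for v in values if v is not None and v != ""]
    types = Counter(type(v).__name__ for v in non_null)
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "types": dict(types),
    }


# Made-up sample data with a missing value and a mistyped entry.
rows = [
    {"country": "DE", "revenue": 120.0},
    {"country": "US", "revenue": None},
    {"country": "DE", "revenue": "n/a"},
    {"country": "", "revenue": 75.5},
]

profile = {col: profile_column([row[col] for row in rows]) for col in rows[0]}
print(profile)
```

A mixed type count such as the one in the revenue column is the kind of anomaly that profiling surfaces and that a data quality rule could then enforce.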
Now that we’ve set the context of the different possibilities, let’s go one level deeper to look at the core components within SAP Data Intelligence, as shown in this figure, and their contributions to different scenarios.
A core component of the data integration and orchestration block in SAP Data Intelligence is pipeline modeling. This functionality allows you to build modular data pipelines that connect SAP and non-SAP source systems, using different operators to extract, transform, and enrich the data according to your business rules so that the final data for reporting or for machine learning scenarios has all the required dimensions. You can scale pipelines up and down according to requirements and processing load. You build these pipelines in the Modeler application in SAP Data Intelligence, which helps you create data pipelines or graphs with runtime/design-time environments as needed. You can access the Modeler through the SAP Data Intelligence launchpad.
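The modular nature of a pipeline, where each operator does one job and passes its output to the next, can be sketched in plain Python. This is a conceptual illustration under stated assumptions: the function names, the sample records, and the region lookup are hypothetical, and the code does not use the Modeler or its operator API.

```python
from functools import reduce


def extract():
    """Source operator: return raw records (made-up sample data)."""
    return [
        {"order_id": 1, "amount": "100.50", "country": "de"},
        {"order_id": 2, "amount": "20", "country": "us"},
    ]


def transform(records):
    """Transform operator: cast amounts to float and normalize country codes."""
    return [
        {**r, "amount": float(r["amount"]), "country": r["country"].upper()}
        for r in records
    ]


def enrich(records):
    """Enrich operator: add a region dimension from a hypothetical lookup."""
    regions = {"DE": "EMEA", "US": "AMER"}
    return [{**r, "region": regions.get(r["country"], "UNKNOWN")} for r in records]


def run_graph(source, operators):
    """Run the source, then feed its output through each operator in order."""
    return reduce(lambda data, op: op(data), operators, source())


result = run_graph(extract, [transform, enrich])
print(result)
```

Because each step is a separate function with a single input and output, steps can be reordered, replaced, or reused across pipelines, which is the same property that makes operator-based graphs modular.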
In addition to the metadata governance, pipeline, and connectivity aspects of the platform, SAP Data Intelligence has another key core component in the form of machine learning content, which comprises the ML Scenario Manager, the MLOps cockpit, and integration with JupyterLab. The ML Scenario Manager organizes data science-related artifacts from the initial pipeline creation for data consumption all the way to applying the AI/machine learning models and enabling monitoring, all through a single cockpit. You can even integrate Python- or R-based notebooks within pipelines for both training and deployment. SAP Data Intelligence also has AutoML functionality, which automates machine learning workflows so that data scientists can select the data and the expected output without having to worry about which models to select and fine-tune.
Note: SAP has an add-on called contextual AI, which adds transparency into the different stages of machine learning pipelines (data, training, and inference), thereby addressing the trust gap that often exists between such machine learning systems and their users. Contextual AI does not refer to a specific algorithm or machine learning method; instead, it takes a human-centric approach to AI.
For more information about this add-on, along with steps for deployment, visit https://github.com/SAP/contextual-ai.
Editor’s note: This post has been adapted from a section of the book SAP Data Intelligence: The Comprehensive Guide by Dharma Teja Atluri, Devraj Bardhan, Santanu Ghosh, Snehasish Ghosh, and Arindom Saha.