Most organizations employing SAP’s business warehousing solutions utilize flat files (such as the .CSV format) for uploading a variety of transactional business data into their systems.
Typically, the loaded data is further processed, cleansed, and consumed in downstream systems with business logic.
The transactional data files are often prepared by functional business users, and human errors within the files can lead to erroneous data in the system. One common type of error concerns special characters, which typically end up in description fields through word processing errors or copy-paste operations. The data load process rejects the special characters while processing the remainder of the data, causing the data request to appear as erroneous in the persistent staging area (PSA) layer or to fail to process altogether.
This blog post explores a solution to address special characters by utilizing a combination of existing toolsets, and a bit of customization.
SAP Business Warehouse (SAP BW) offers an out-of-the-box solution for handling special characters using the standard transaction code RSKC. It allows users to maintain all special characters that are allowed within SAP BW objects (e.g. advanced DataStore Objects [DSOs], InfoCubes, etc.). This blog post addresses a different challenge: processing flat files that contain special characters before the data is loaded into the SAP BW system.
The main challenge is exploring and identifying a solution that meets the following criteria:
Fortunately, SAP provides a rich framework of toolsets within the SAP BW system that can be leveraged to build a robust solution for a seamless user experience. This toolset includes InfoPackages with ABAP routines, and transformations with START/END routines where ABAP code can be deployed to access most objects within the system (InfoObjects, standard and custom tables, etc.).
A coded character set assigns a unique number to each character, and a character encoding defines how those numbers are mapped to bytes for manipulation in a computer. There are many character encodings, for example, UTF-8, UTF-16, and UTF-32. In the SAP world, a character encoding is identified by a code page, which is a four-digit number. Some examples are listed in the table below. Algorithms associated with the code page are used to interpret incoming datasets in the flat file.
The figure below shows the code page setting options in the InfoPackage, which offers two main options to process flat files.
The first is to load text-type files from the application server. You’ll need to configure a routine to process the flat files. To do this in SAP BW, refer to the first figure below. To do this in SAP BW/4HANA, refer to the second figure.
Next, specify a flat file name (which requires a separate program for further processing).
Alternatively, you can load text-type files from a local workstation (which also requires a separate program for further processing), but this option does not present any challenge in processing special characters.
In order to configure the InfoPackage, follow these steps:
When an incoming flat file contains a character that cannot be interpreted using the algorithms of the configured code page, the system substitutes a replacement character, which is “#” by default.
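The effect can be reproduced outside SAP. The following is a minimal Python sketch (not SAP code) that assumes a hypothetical semicolon-separated record in which the byte 0xE9 encodes “é” in Latin-1 but is invalid where it stands in UTF-8. The function name and sample data are illustrative only. It shows the same bytes decoding cleanly under one encoding and picking up a “#” under another.

```python
REPLACEMENT = "#"  # mirrors the default replacement character described above


def decode_with_replacement(raw, encoding):
    """Decode bytes, turning every undecodable sequence into REPLACEMENT."""
    # errors="replace" makes Python substitute U+FFFD for each undecodable
    # sequence; that marker is then swapped for the chosen character.
    return raw.decode(encoding, errors="replace").replace("\ufffd", REPLACEMENT)


# Hypothetical record: 0xE9 is 'é' in Latin-1 but not valid UTF-8 here.
raw_line = b"1001;Caf\xe9 Ltd;250.00"

as_latin1 = decode_with_replacement(raw_line, "latin-1")
as_utf8 = decode_with_replacement(raw_line, "utf-8")

print(as_latin1)  # 1001;Café Ltd;250.00
print(as_utf8)    # 1001;Caf# Ltd;250.00
```

The rest of the record is unaffected in both cases; only the uninterpretable byte is substituted.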
For a binary number consisting of multiple bytes (e.g. a 32-bit unsigned integer value), the byte order, or endianness, can take one of two forms. The first is little endian, in which the least-significant byte is encoded first and the remaining bytes are encoded in increasing order of significance. The second is big endian, in which the most-significant byte is encoded first and the remaining bytes are encoded in decreasing order of significance.
For binary and text mode files, the endian setting is “default”; for legacy mode files, either little endian or big endian can be configured.
In order to perform operations such as functional validations on the application server, the flat file needs to be processed using the following syntax:
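To make the two byte orders concrete, here is a small Python sketch (an illustration outside SAP, with an arbitrarily chosen sample value) that packs the same 32-bit unsigned integer both ways and then reads the little-endian bytes back with the wrong byte order.

```python
import struct

value = 0x12345678  # arbitrary 32-bit unsigned integer used for illustration

little = struct.pack("<I", value)  # least-significant byte first
big = struct.pack(">I", value)     # most-significant byte first

print(little.hex())  # 78563412
print(big.hex())     # 12345678

# Interpreting little-endian bytes as big endian yields a different number.
misread = struct.unpack(">I", little)[0]
print(hex(misread))  # 0x78563412
```

This is why the endian setting has to match the way the file was written: the bytes are identical, but the value read from them is not.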
OPEN DATASET <P_FILENAME> FOR INPUT IN <MODE> ENCODING DEFAULT IGNORING CONVERSION ERRORS.
*Perform File Operations such as reading, writing etc.
CLOSE DATASET <P_FILENAME>.
Where:
Each mode variant can be qualified by IGNORING CONVERSION ERRORS to suppress conversion errors at runtime while reading from or writing to the flat file. Without this addition, a character that cannot be converted while reading or writing raises an exception of the class CX_SY_CONVERSION_CODEPAGE. With it, the exception is suppressed and the character is replaced by the replacement character.
The solution described above addresses the challenge of processing special characters through InfoPackages to the PSA layer. These special characters can be processed into subsequent layers, but might cause display issues in downstream layers (e.g. BEx, SAP Business Planning and Consolidation, or Microsoft Power BI). The next (but optional) challenge is to replace them with business-defined characters.
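The difference between strict and error-tolerant reading can be sketched in Python as a rough analogy. This is not ABAP and not how SAP implements the addition; the sample bytes and the `read_lines` helper are hypothetical. Strict decoding raises an exception at the first undecodable byte, while the tolerant mode substitutes a marker character and carries on.

```python
import io

# Hypothetical file content: the first line holds a byte that is invalid in UTF-8.
raw = b"1001;Caf\xe9 Ltd\n1002;Plain text\n"


def read_lines(data, ignore_errors):
    """Read all lines from in-memory bytes, optionally tolerating bad bytes."""
    errors = "replace" if ignore_errors else "strict"
    with io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", errors=errors) as fh:
        return [line.rstrip("\n") for line in fh]


try:
    read_lines(raw, ignore_errors=False)
    strict_failed = False
except UnicodeDecodeError:
    # Comparable in spirit to an unsuppressed conversion exception.
    strict_failed = True

tolerant = read_lines(raw, ignore_errors=True)

print(strict_failed)  # True
print(len(tolerant))  # 2
```

In the tolerant case both records are returned, so one bad character no longer blocks the rest of the file.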
This post explores a simpler way of replacing special characters. Follow these steps.
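As a conceptual illustration of the replacement idea only, the Python sketch below applies a mapping table and a fallback character. It is not the SAP BW procedure used in this post, and the mapping, the fallback, and the function name are hypothetical. In practice the rules would be defined by the business.

```python
import re

# Hypothetical business-defined replacements for common word-processor characters.
REPLACEMENTS = {
    "\u2013": "-",   # en dash
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}
TRANSLATION = str.maketrans(REPLACEMENTS)
FALLBACK = "?"  # hypothetical fallback for characters without an explicit rule


def replace_special(text):
    """Apply the mapping, then replace anything outside printable ASCII."""
    mapped = text.translate(TRANSLATION)
    return re.sub(r"[^\x20-\x7e]", FALLBACK, mapped)


print(replace_special("Widget\u2122 \u2013 \u201cdeluxe\u201d"))  # Widget? - "deluxe"
```

Keeping the rules in a table rather than in code makes it easier for business users to review and extend them.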
Using the workaround explored above, let’s see how this affects the data load. We configured an InfoPackage in our example with these criteria:
From there, we processed a flat file containing special characters (as shown in the first figure we looked at) through the InfoPackage. After successful processing, the data appears in the PSA layer with special characters shown highlighted.
This data was further successfully loaded into the next layer (i.e. advanced DSO) with special characters shown highlighted.
Organizations employing SAP Business Warehouse or SAP BW/4HANA utilize flat files for uploading a variety of transactional business data into their systems. These transactional data sets may contain special characters, which are rejected when processed through standard data load processes or cause the load to fail altogether. In this post, we presented a solution to help overcome this issue.
Thank you to Rajesh Hemnani for his assistance with this post.