https://raw.githubusercontent.com/ajmaradiaga/feeds/main/scmt/topics/Big-Data-blog-posts.xml SAP Community - Big Data 2024-05-20T11:10:20.396937+00:00 python-feedgen Big Data blog posts in SAP Community https://community.sap.com/t5/technology-blogs-by-members/what-is-sap-data-intelligence-definition-and-benefits/ba-p/13558446 What is SAP Data Intelligence? - Definition and Benefits 2022-11-04T10:14:03+01:00 vanmonteiro https://community.sap.com/t5/user/viewprofilepage/user-id/40452 To share some of the knowledge I acquired while studying SAP Data Intelligence, I decided to create this post. I hope it helps, and I also welcome comments that add more information.<BR /> <BR /> <HR /><BR /> <BR /> <STRONG>Getting started with SAP Data Intelligence</STRONG><BR /> <BR /> SAP Data Intelligence improves the efficiency of corporate data management and is presented by SAP as a single solution to innovate with data, providing data-driven innovation both in the cloud and on premise.<BR /> <BR /> It aims to transform scattered, distributed data into vital insights that deliver innovation at scale. It is a data management solution that connects, discovers, enriches, and orchestrates disjointed data assets into actionable business insights at enterprise scale. It enables the creation of data warehouses from heterogeneous enterprise data, manages IoT data flows, and facilitates scalable machine learning.<BR /> <BR /> <STRONG>SAP Data Intelligence</STRONG> enables you to leverage enterprise applications to become an intelligent enterprise and provides a unified way to manage, integrate, and process all your company's data.<BR /> <BR /> <STRONG>SAP Data Intelligence Cloud</STRONG> is the cloud equivalent of the on-premises SAP Data Intelligence offering. This cloud-based service aligns with the roles and responsibilities of data engineers, data stewards, data scientists, and data architects.
Essentially, any user can make the most of data spread across distributed hybrid landscapes, help create data warehouses from mixed data, and simplify the management of data streams.<BR /> <UL><BR /> <LI>SAP Data Intelligence is available under a BYOL (bring your own license) model, where it can be deployed on-premise in your own data center, on any hyperscaler public cloud (AWS, Google, Microsoft), or in a private cloud.</LI><BR /> </UL><BR /> <STRONG>Connectivity between SAP Data Intelligence and on-premise systems can be established via:</STRONG><BR /> <UL><BR /> <LI>Cloud Connector</LI><BR /> <LI>Site-to-site VPN</LI><BR /> </UL><BR /> &nbsp;<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/10/Image-85.png" /></P><BR /> <BR /> <H5 id="toc-hId-1349476285">(Image source: SAP® Data Intelligence, published by SAP)</H5><BR /> &nbsp;<BR /> <BR /> <STRONG>Intelligently Process Data</STRONG><BR /> <BR /> Connect data assets and transform them into business insights with features that help you:<BR /> <UL><BR /> <LI>Discover and connect to any data, anywhere, any way you need.</LI><BR /> <LI>Integrate and orchestrate massive data volumes and streams at scale.</LI><BR /> <LI>Streamline, operationalize, and govern machine-learning-driven innovation.</LI><BR /> </UL><BR /> <STRONG>Build Trust in Your Data</STRONG><BR /> <BR /> Build trust in your data through:<BR /> <UL><BR /> <LI>Discovery and cataloging of distributed metadata, enabling a searchable data fabric.</LI><BR /> <LI>Profiling, preparing, and building a business glossary and business rules for your data.</LI><BR /> <LI>Continuous monitoring to ensure robust data quality.</LI><BR /> </UL><BR /> <STRONG>Perform Hybrid Data Management</STRONG><BR /> <BR /> With hybrid management, you can:<BR /> <UL><BR /> <LI>Centrally manage and process data wherever it resides.</LI><BR /> <LI>Automate and reuse on-premise and cloud processing engines.</LI><BR /> <LI>Manage complex data types across distributed environments.</LI><BR /> </UL><BR /> &nbsp;<BR /> <BR /> <STRONG>Benefits to Your Business</STRONG><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/10/Image-86.png" /></P><BR /> <BR /> <H5 id="toc-hId-1152962780">(Image source: SAP® Data Intelligence, published by SAP)</H5><BR /> &nbsp;<BR /> <BR /> <STRONG>Architecture View</STRONG><BR /> <BR /> The figure below shows an architecture overview of the different components made available as part of SAP Data Intelligence.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/10/Image-87.png" /></P><BR /> <BR /> <H5 id="toc-hId-956449275">(Image source: SAP® Data Intelligence, published by SAP)</H5><BR /> &nbsp;<BR /> <BR /> <STRONG>SAP Data Intelligence&nbsp;Technical Details</STRONG><BR /> <UL><BR /> <LI>Deployment types: On-premise, Software as a Service (SaaS), Cloud, or Web-Based</LI><BR /> <LI>Mobile application: No</LI><BR /> <LI>Supported countries: Global</LI><BR /> </UL><BR /> &nbsp;<BR /> <BR /> <HR /><BR /> <BR /> Thanks for reading. Please comment with your views; I believe in regular feedback so that I can refine these blogs for you all.
2022-11-04T10:14:03+01:00 https://community.sap.com/t5/technology-blogs-by-sap/knowledge-graph-with-job-recommendation/ba-p/13568551 Knowledge Graph with Job Recommendation 2022-11-21T09:57:52+01:00 tracy_cuixin https://community.sap.com/t5/user/viewprofilepage/user-id/42607 <STRONG>Applying the Knowledge Graph Concept to Improve Job Recommendations</STRONG><BR /> <BR /> With the rapid development of business, many companies are dealing with larger volumes of data that contain increasingly complicated relationships, and knowledge graphs (graph technology) are referenced more and more as a way to handle this situation. Therefore, in this blog, we want to introduce the basic concept of a knowledge graph along with some application examples.<BR /> <BR /> The concept of a “knowledge graph” has been used in the literature for a long time. In 2012, Google gave the term its modern introduction together with its application to Google Search [1]. Today, the concept is being applied to more and more business cases.<BR /> <BR /> But what is a knowledge graph?<BR /> <BR /> A knowledge graph is a database that uses a graph-structured data model to integrate not only the data itself but also the relationships between the data (see Figure 1).<BR /> <BR /> <IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/Picture1-29.png" height="124" width="391" /><BR /> <P style="text-align: center">Figure 1: Knowledge graph structure</P><BR /> As an example, let’s consider the sentence: Data Scientist is a position at the company SAP. As we said earlier, if we want to represent the data graphically, we use nodes and relationships. So, if you look at the simple graph in Figure 2, it is a graph database showing some facts from the sentence above. For example, between the node ‘Job’ and the node ‘Company’, there’s a relationship called ‘from’. That represents the fact we mentioned earlier and explains how a graph database connects everything together in one graph.<BR /> <BR /> <IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/Picture2-22.png" height="110" width="412" /><BR /> <P style="text-align: center">Figure 2: Knowledge graph structure with example</P><BR /> At SAP, we apply the concept of knowledge graphs to deliver business value. For example, the&nbsp;<A href="https://www.sap.com/about/company/innovation/icn.html" target="_blank" rel="noopener noreferrer">SAP Innovation Center Network</A> on the West Coast is working on a project to improve the job search results in SAP SuccessFactors. The team is building a knowledge graph that relates job ads, candidate profiles, information about required qualifications such as skills and education, as well as details about the candidates such as degrees, years of experience, and more. By using a knowledge graph, we can support job matching so that candidates have a better experience when applying to SAP.<BR /> <BR /> <STRONG>Why We Need Knowledge Graphs</STRONG><BR /> <BR /> Some of you might be curious why knowledge graphs would be helpful in a recommender system. The power of knowledge graphs becomes clear once we start thinking about connections. When a model contains many inter-relations between properties, a knowledge graph allows us to model those relationships with a very clear structure. For example, Figure 3 shows four kinds of nodes: job, skills, company and education.
The relationships between these nodes provide us with abundant information from the graph.<BR /> <BR /> With such a graph structure, we have multiple new ways to improve job recommendations. For example, if a candidate applies for the Data Scientist position at SAP, he or she may also be interested in the Data Engineer job, as these two positions are offered by the same company, have similar educational requirements, and share some qualification requirements. Even though modeling this with data stored in a table structure is possible, it is more difficult; modeling such a structure in a graph database is easier to accomplish.<BR /> <BR /> <IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/Picture3-17.png" /><BR /> <P style="text-align: center">Figure 3: Connections in Knowledge Graph</P><BR /> <STRONG>Components of Knowledge Graphs</STRONG><BR /> <BR /> Now let’s talk about the components of the knowledge graph we created. We would like to extract key information from the job posting data as well as the resume data and put it into the knowledge graph. This way it can be used as properties of the jobs or candidates when we make recommendations.<BR /> <UL><BR /> <LI><STRONG><EM>Extracting the facts from unstructured documents</EM></STRONG></LI><BR /> </UL><BR /> A job posting can contain different kinds of information, for example education requirements, work experience requirements, and skills requirements. We use natural language processing (NLP) techniques to extract the information and input it into our graph database. For more complex cases, other deep-learning models can be used, for example, question answering.<BR /> <UL><BR /> <LI><STRONG><EM>Assimilation of facts</EM></STRONG></LI><BR /> </UL><BR /> When a candidate receives a job recommendation, they also get suggested skills related to the job. We found that there can be redundant information within these suggested skills. To provide a better match between job and candidate, our knowledge graph can also find further relationships between skills and help remove this kind of redundant information.<BR /> <BR /> For example, a candidate exploring software engineer jobs may get a job suggesting that the skills Java and object-oriented programming are both relevant. For the candidate, this may be redundant information, because object-oriented programming is a hypernym of Java: the term ‘object-oriented programming’ has a broader meaning under which more specific terms like Java fall. In this case, we might want to remove one of them from the recommendation.<BR /> <BR /> We want to capture the relationship between ‘Java’ and ‘object-oriented programming’ and add it to our graph database, just as Figure 4 shows. This helps us recognize such relationships and remove the redundant information, which provides a better job matching experience for our candidates.<BR /> <BR /> <IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/Picture6-11.png" height="42" width="477" /><BR /> <P style="text-align: center">Figure 4: Hypernym Relationship</P><BR /> <STRONG>Methods to Capture the Relationship</STRONG><BR /> <BR /> One of the methods we use to capture the relationships between skills is Hearst patterns.
These are patterns, based on semantic information, that we use to extract hierarchical hypernym relations [2].<BR /> <BR /> Let me give you an example:<BR /> <BR /> Figure 5 shows the following half-sentence: Programming languages such as Python. Based on the phrase ‘such as’, we can see that ‘programming languages’ is the hypernym of the word ‘Python’. This is how we extract hypernym relationships from our job posting data.<BR /> <BR /> <IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/Picture7-9.png" height="67" width="459" /><BR /> <P style="text-align: center">Figure 5: Hearst Pattern example</P><BR /> Another method we use to extract relationships between skills is&nbsp;<A href="http://globalwordnet.org/resources/wordnets-in-the-world/" target="_blank" rel="noopener nofollow noreferrer">WordNet</A>&nbsp;[3]. It’s a database that links words through semantic relations, including synonyms, hyponyms, and hypernyms, in more than 200 languages. You can consider WordNet a combination and extension of a dictionary: when you look up a word, it can provide you with the hypernyms of that word.<BR /> <BR /> <STRONG>Knowledge Graph Example Query</STRONG><BR /> <BR /> Now that you have an idea of what a knowledge graph is and what it can be used for, here are some examples queried from the graph. The graph is based on the job recommender system we mentioned previously, and Figures 6 and 7 show examples of the skills required by software engineers and data scientists.<BR /> <BR /> I hope you like this blog. If you find it interesting, please feel free to share your thoughts or questions in a comment, and if you wish to see similar content, you can find related posts by clicking on the tags assigned to this post. Thank you for reading.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/JOBBBB.png" /></P><BR /> <P style="text-align: center">Figure 6: Skills required by software engineers</P><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/jobbbbb.png" /></P><BR /> <P style="text-align: center">Figure 7: Skills required by data scientists</P><BR /> <STRONG>REFERENCES</STRONG><BR /> <BR /> [1] Amit Singhal. 2012. Introducing the Knowledge Graph: things, not strings.&nbsp;<A href="https://www.blog.google/products/search/introducing-knowledge-graph-things-not/" target="_blank" rel="noopener nofollow noreferrer">Google Blog</A>.<BR /> <BR /> [2] Indurkhya, N., Damerau, F. (2010). Handbook of Natural Language Processing. Chapman &amp; Hall/CRC. p. 594.<BR /> <BR /> [3] “WordNets in the World”. Global WordNet Association. Retrieved 19 January 2020.<BR /> <BR /> &nbsp; 2022-11-21T09:57:52+01:00 https://community.sap.com/t5/technology-blogs-by-sap/hands-on-tutorial-leverage-automl-in-sap-hana-cloud-with-the-predictive/ba-p/13556400 Hands-On Tutorial: Leverage AutoML in SAP HANA Cloud with the Predictive Analysis Library 2022-11-23T07:21:12+01:00 YannickSchaper https://community.sap.com/t5/user/viewprofilepage/user-id/45487 SAP HANA Cloud has recently been enriched with a new Automated Machine Learning (AutoML) approach. AutoML can be helpful for many different reasons, for example, to give a data scientist a head start in quickly finding a first machine learning model.
Also, it is a great starting point to see what is possible with the data and whether it is worth investing more time in the use case.<BR /> <BR /> But isn’t there already an automated machine learning approach in SAP HANA Cloud?<BR /> <BR /> Yes, the <A href="https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.06/en-US/hana_ml.algorithms.apl.html" target="_blank" rel="noopener noreferrer">Automated Predictive Library (APL)</A> is a proven and trusted approach in SAP HANA Cloud with proprietary content. Further, the APL adds very powerful feature engineering into the process before creating a machine learning model. If you are curious to give it a try, have a look at the following <A href="https://blogs.sap.com/2020/07/27/hands-on-tutorial-automated-predictive-apl-in-sap-hana-cloud/" target="_blank" rel="noopener noreferrer">Hands-On tutorial</A> by my colleague <SPAN class="mention-scrubbed">andreas.forster</SPAN>.<BR /> <BR /> The <A href="https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.06/en-US/hana_ml.algorithms.pal.html" target="_blank" rel="noopener noreferrer">Predictive Analysis Library (PAL)</A> provides the data scientist with a huge variety of expert algorithms to choose from. Now, PAL also provides algorithm pipelining capabilities and an AutoML approach on top, targeting classification, regression, and time series scenarios. The new framework allows expert data scientists to build composite pipeline models of multiple PAL algorithms and, with the aid of the AutoML engine, benefit from an automated selection of pipeline functions covering data preprocessing, comparison of multiple algorithms, hyper-parameter search, and optimal parameter value selection. Thus, expert data scientists gain a tremendous productivity uplift, deriving better PAL models in less time.<BR /> <BR /> Let’s take a look at a concrete example to see what is possible through this new approach in the PAL. The challenge will be to predict whether a transaction is fraudulent or not. Such use cases are often quite challenging due to imbalanced data and require <A href="https://blogs.sap.com/2022/08/22/data-yoga-it-is-all-about-finding-the-right-balance/" target="_blank" rel="noopener noreferrer">different techniques</A> before implementing a machine learning model.<BR /> <BR /> What will you learn in this Hands-On tutorial?<BR /> <OL><BR /> <LI>Access data from your local Python environment directly in SAP HANA Cloud.</LI><BR /> <LI>Leverage the native automated machine learning capability in SAP HANA Cloud.</LI><BR /> </OL><BR /> What are the requirements?<BR /> <OL><BR /> <LI>Please have your favorite Python editor ready. I used a Jupyter Notebook with Python Version 3.6.12.</LI><BR /> <LI>Your SAP HANA Cloud instance must have at least 3 CPUs, and the script server must be enabled.</LI><BR /> <LI>Download the Python script and the data from the following <A href="https://github.com/SAP-samples/hana-ml-samples/tree/main/Python-API/usecase-examples/fraud-detection" target="_blank" rel="nofollow noopener noreferrer">GitHub repository</A>.</LI><BR /> </OL><BR /> Let’s jump right in.
In your Python editor, install and import the following library:<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/1-78.png" /></P><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/2-46.png" /></P><BR /> The <A href="https://pypi.org/project/hana-ml/" target="_blank" rel="nofollow noopener noreferrer">hana_ml library</A> enables you to connect directly to your HANA. To leverage its full potential, make sure that your user has the following roles and privileges assigned:<BR /> <OL><BR /> <LI>AFL__SYS_AFL_AFLPAL_EXECUTE</LI><BR /> <LI>AFL__SYS_AFL_APL_AREA_EXECUTE</LI><BR /> <LI>WORKLOAD_ADMIN</LI><BR /> </OL><BR /> Set your HANA host, port, user, and password, and set encrypt to true:<BR /> <BR /> <STRONG><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/3-40.png" /></STRONG><BR /> <BR /> Execute the following command to connect to your HANA:<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/4-32.png" /></P><BR /> You can hide your login credentials through the <A href="https://help.sap.com/docs/SAP_HANA_PLATFORM/b3ee5778bc2e4a089d3299b82ec762a7/dd95ac9dbb571014a7d7f0234d762fdb.html?version=2.0.05" target="_blank" rel="noopener noreferrer">Secure User Store</A> of the HANA client so that they are not visible in clear text. In your command prompt, execute the following script:<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/5-24.png" /></P><BR /> Then, back in your Python editor, you can use the HANA key to connect:<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/6-21.png" /></P><BR /> Now, upload a local dataset and push it directly into HANA. Make sure you change the path to your local directory.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/7-26.png" /></P><BR /> Before you bring your local dataset into HANA, apply some transformations: change the column names to upper case and add a unique transaction ID to the data.
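<BR /> For readers who prefer text over screenshots, here is a minimal sketch of the connection and upload steps just described. It is illustrative only: the hostname, credentials, secure-store key, file path, and table name are placeholder assumptions and not values from the original tutorial.<BR /> <PRE>
import pandas as pd
from hana_ml import dataframe

# Connect to SAP HANA Cloud (placeholder host and credentials; encryption enabled)
conn = dataframe.ConnectionContext(
    address='your-hana-hostname',
    port=443,
    user='your-user',
    password='your-password',
    encrypt='true'
)
# Alternatively, connect via a Secure User Store key created with hdbuserstore:
# conn = dataframe.ConnectionContext(userkey='MYHANAKEY')

# Load the local CSV (illustrative path), upper-case the column names,
# and add a unique transaction ID as described above
df = pd.read_csv('fraud_data.csv')
df.columns = [col.upper() for col in df.columns]
df.insert(0, 'TRANSACTION_ID', list(range(1, len(df) + 1)))

# Push the pandas DataFrame into a HANA table and keep a reference to it
df_remote = dataframe.create_dataframe_from_pandas(
    connection_context=conn,
    pandas_df=df,
    table_name='FRAUD_DATA',
    force=True
)
print(df_remote.head(5).collect())
</PRE><BR />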
This ID will later be used as a key in our machine learning algorithms, which run directly in HANA.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/8-23.png" /></P><BR /> Next, create a HANA dataframe and point it to the table with the uploaded data.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/9-37.png" /></P><BR /> If your data already exists in HANA, you can create a <A href="https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.06/en-US/hana_ml.dataframe.html" target="_blank" rel="noopener noreferrer">HANA data frame</A> through the sql or table function, e.g.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/10-15.png" /></P><BR /> Next, check your data and convert the following variables accordingly.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/11-16.png" /></P><BR /> Verify the conversion and take a look at a short description of the data. Note that the target variable is called Fraud. In addition, there are eight predictors capturing different information about a transaction.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/12-15.png" /></P><BR /> Next, split the data into a training and testing set.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/13-14.png" /></P><BR /> Check the size of the training and testing datasets.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/14-13.png" /></P><BR /> Import the following dependencies for the AutomaticClassification.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/15-12.png" /></P><BR /> Further, you can manage the workload in HANA by creating <A href="https://help.sap.com/docs/SAP_HANA_PLATFORM/6b94445c94ae495c83a19646e7c3fd56/5066181717df4110931271d1efd84cbc.html" target="_blank" rel="noopener noreferrer">workload classes</A>. Please execute the following SQL script to set the workload class, which will be used in the <A href="https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.06/en-US/hana_ml.algorithms.pal_algorithm.html#module-hana_ml.algorithms.pal.auto_ml" target="_blank" rel="noopener noreferrer">AutomaticClassification</A>.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/16-10.png" /></P><BR /> The AutoML approach automatically executes data processing, model fitting, comparison, and optimization. First, create an AutoML classifier object “auto_c” in the following cell.
It is helpful to review and set respective <A href="https://help.sap.com/docs/HANA_CLOUD_DATABASE/319d36de4fd64ac3afbf91b1fb3ce8de/d1a54bc3a4664937b77e85a865754b17.html?locale=en-US" target="_blank" rel="noopener noreferrer">AutoML configuration parameters</A>.<BR /> <UL><BR /> <LI>The defined scenario will run two iterations of pipeline optimization. The total number of pipelines which will be evaluated is equal to population_size + generations × offspring_size. Hence, in this case this amounts to 15 pipelines.</LI><BR /> <LI>With elite_number, you specify how many of the best pipelines you want to compare.</LI><BR /> <LI>Setting random_seed = 1234 helps to get reproducible AutoML runs.</LI><BR /> </UL><BR /> In addition, you could set the maximum runtime for individual pipeline evaluations with the parameter max_eval_time_mins, or use the early_stop parameter to stop the AutoML run if there is no improvement for the set number of generations. Further, you can set specific performance measures for the optimization with the scoring parameter.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/17-10.png" /></P><BR /> A default set of AutoML classification operators and parameters is provided as the global config-dict, which can be adjusted to the needs of the targeted AutoML scenario. You can use methods like update_config_dict, delete_config_dict, and display_config_dict to update the scenario definition. Let’s reinitialize the AutoML operators and their parameters.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/18-5.png" /></P><BR /> You can see all the available settings when you display the configuration file.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/19-5.png" /></P><BR /> Let’s adjust some of the settings to narrow the search space. As the resampling method, choose SMOTETomek, since the data is imbalanced.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/20-4.png" /></P><BR /> Exclude the Transformer methods.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/21-3.png" /></P><BR /> As machine learning algorithms, keep the Hybrid Gradient Boosting Tree and Multi Logistic Regression.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/22-4.png" /></P><BR /> Let’s set some parameters for the optimization of the algorithms.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/23-4.png" /></P><BR /> Review the complete AutoML configuration for the classification.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/24-3.png" /></P><BR /> Next, fit the AutoML scenario on the training data. This may take a couple of minutes.
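<BR /> As a rough, text-based orientation to the screenshots above, the classifier setup and fit might be sketched as follows. The generation, population, and offspring values mirror the 15-pipeline example discussed above; the elite_number value, the workload class name, the progress indicator ID, the variable name df_train for the training dataframe from the split step, and the key and label column names are illustrative assumptions, not values taken from the original post.<BR /> <PRE>
from hana_ml.algorithms.pal.auto_ml import AutomaticClassification

# Two generations: population_size + generations x offspring_size = 5 + 2*5 = 15 pipelines
auto_c = AutomaticClassification(
    generations=2,
    population_size=5,
    offspring_size=5,
    elite_number=5,                               # illustrative value
    random_seed=1234,
    progress_indicator_id='AutoML-Fraud-Demo'     # illustrative ID
)

# Assign the workload class created earlier (the name here is an assumption)
auto_c.enable_workload_class(workload_class_name='PAL_AUTOML_WORKLOAD')

# The search space can be narrowed via the config dict, as described above,
# using update_config_dict / delete_config_dict / display_config_dict

# Fit on the training HANA dataframe; key and label columns follow the example data
auto_c.fit(data=df_train, key='TRANSACTION_ID', label='FRAUD')
</PRE><BR />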
If it takes too long, exclude SMOTETomek from the resampler methods in the config.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/25-3.png" /></P><BR /> You can monitor the pipeline progress through the execution logs.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/26-2.png" /></P><BR /> Now, evaluate the best model on the testing data.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/27-2.png" /></P><BR /> Then, you can create predictions with your machine learning model.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/28-2.png" /></P><BR /> Of course, you can also save the best model in HANA. To do so, create a <A href="https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.06/en-US/hana_ml.model_storage.html" target="_blank" rel="noopener noreferrer">Model Storage</A>.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/29-2.png" /></P><BR /> Save the model through the following command.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/11/30-1.png" /></P><BR /> I hope this blog post helped you to get started with your own SAP machine learning use cases, and I encourage you to try it yourself. If you want to try out more notebooks, have a look at the following <A href="https://github.com/SAP-samples/hana-ml-samples/tree/main/Python-API/pal/notebooks" target="_blank" rel="nofollow noopener noreferrer">GitHub repository</A>.<BR /> <BR /> I want to thank <SPAN class="mention-scrubbed">andreas.forster</SPAN>, <SPAN class="mention-scrubbed">christoph.morgen</SPAN> and <SPAN class="mention-scrubbed">raymond.yao</SPAN> for their support while writing this Hands-On tutorial.<BR /> <BR /> Cheers!<BR /> <BR /> Yannick Schaper 2022-11-23T07:21:12+01:00 https://community.sap.com/t5/technology-blogs-by-sap/sap-business-technology-platform-as-part-of-a-platform-supporting-a-data/ba-p/13549813 SAP Business Technology Platform as part of a platform supporting a Data Mesh 2022-12-15T19:40:49+01:00 andreas_engel https://community.sap.com/t5/user/viewprofilepage/user-id/301006 <H1 id="toc-hId-832255427"><STRONG>Why read this blog</STRONG></H1><BR /> In his <A href="https://blogs.sap.com/2022/06/04/more-than-just-a-hype-data-mesh-as-a-new-approach-to-increase-agility-in-value-creation-from-data/" target="_blank" rel="noopener noreferrer">blog series</A> on Data Mesh, Wolfgang Epting gives a comprehensive and detailed introduction to Data Mesh and to the components of the SAP Business Technology Platform (SAP BTP) that can be leveraged as a foundation to implement a Data Mesh in an organization.<BR /> <BR /> In this blog I want to add an additional angle to the discussion. While the phrase “The winner takes it all” might be true in many areas of life, it need not be the case in the context of a Data Mesh project. The question of which technology to leverage to implement a Data Mesh is not an either-or question.
SAP BTP is well suited to be leveraged as part of a ‘bigger’ platform supporting a Data Mesh implementation.<BR /> <BR /> Concerning wording:<BR /> <BR /> I use the term BTP-based data products for data products hosted on SAP Business Technology Platform (as described in Wolfgang’s <A href="https://blogs.sap.com/2022/06/04/more-than-just-a-hype-data-mesh-as-a-new-approach-to-increase-agility-in-value-creation-from-data/" target="_blank" rel="noopener noreferrer">blog series</A>).<BR /> <H1 id="toc-hId-635741922"><STRONG>Data Mesh leveraging SAP and Non-SAP Technology</STRONG></H1><BR /> Looking at your data from an (SAP-biased) bird’s-eye view, there are domains residing mainly in SAP systems and others in non-SAP systems. When starting your Data Mesh project with data products in the non-SAP area, non-SAP technology could be the best choice to build your initial data products. In that case, your Data Mesh initiative is initially based on non-SAP technology.<BR /> <BR /> As soon as you start to include data products and domains residing in SAP systems (S/4HANA or ECC or ERP, and SAP Data Warehouses), you should consider leveraging SAP technology. Here is why:<BR /> <UL><BR /> <LI>Not only do SAP systems contain high-value data, but they also contain domains and domain knowledge. For example, SAP S/4HANA contains business and entity data models for customer data in a company.</LI><BR /> <LI>Out-of-the-box data products are delivered as content packages (in SAP Analytics Cloud and SAP Data Warehouse Cloud; see Wolfgang’s <A href="https://blogs.sap.com/2022/06/04/more-than-just-a-hype-data-mesh-as-a-new-approach-to-increase-agility-in-value-creation-from-data/" target="_blank" rel="noopener noreferrer">blog series</A>), which can be tailored to your needs.</LI><BR /> <LI>SAP BTP offers supreme out-of-the-box integrations to SAP systems, enabling easy re-use of the semantic models and domain data models as well as virtual real-time access to the actual data.</LI><BR /> </UL><BR /> Your Data Mesh will be built on SAP and non-SAP technology. The combination of both technologies will serve your needs best.<BR /> <BR /> [Remark: As explained in Wolfgang’s <A href="https://blogs.sap.com/2022/06/04/more-than-just-a-hype-data-mesh-as-a-new-approach-to-increase-agility-in-value-creation-from-data/" target="_blank" rel="noopener noreferrer">blog series</A>, SAP BTP can be leveraged for non-SAP data as well, especially to build data products containing SAP and non-SAP data. If most of your data products rely on data in SAP systems, leveraging only SAP BTP for your Data Mesh initiative is an option you should consider.]<BR /> <BR /> Here is how SAP BTP fits into a Data Mesh platform combining SAP BTP and non-SAP technology. Let me go through the four pillars of Data Mesh to explain how the pieces fit together.<BR /> <BR /> &nbsp;<BR /> <H2 id="toc-hId-568311136"><STRONG>Domain-oriented decentralized data ownership and architecture</STRONG></H2><BR /> Domain Ownership is a key principle of a Data Mesh. To view SAP S/4HANA or ECC or ERP and SAP Data Warehouse systems only as data providers is too short-sighted. These systems contain domains and domain knowledge. The users of these systems are domain experts, both on the IT and on the business side. SAP systems serve as a trusted source of information on customers, products, equipment, and many more domains.<BR /> <BR /> Therefore, all the domain knowledge within your company's SAP ecosystem (IT and LOB) can be leveraged by including SAP experts in your domain teams.
Technically, the well-established SAP data and semantic models can, and should, be reused for building data products.<BR /> <BR /> &nbsp;<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/12/DomainandDaaP.png" /></P><BR /> <P class="image_caption" style="text-align: center;font-style: italic">Figure 1: SAP Business Technology Platform as part of a 'bigger' Data Mesh: Domain Ownership and Data as a Product</P><BR /> <BR /> <H2 id="toc-hId-371797631"><STRONG>Data as a Product</STRONG></H2><BR /> SAP S/4HANA or ECC or ERP and SAP Business Warehouse systems are a great source for building source-oriented data products. Data products for data in SAP systems are best built (see above) on SAP BTP.<BR /> <BR /> SAP Data Catalog entries can be published to and integrated with an enterprise data catalog leveraged in a Data Mesh via APIs. BTP-based data products can easily be consumed via standard APIs within a Data Mesh.<BR /> <H2 id="toc-hId-175284126"><STRONG>Self-serve data infrastructure as a platform</STRONG></H2><BR /> Provisioning the underlying platform as a self-serve data infrastructure is another key pillar of a Data Mesh initiative. The SAP Business Technology Platform does not offer Infrastructure as a Service, but all services can be provisioned either via automation based on scripts or in a self-serve manner. The infrastructure and resources needed for the services are provisioned alongside the service.<BR /> <BR /> Scripts for provisioning the SAP BTP services can be included in an overarching approach to providing a self-serve data infrastructure. For example, domain teams can leverage the BTP Cockpit to instantiate the SAP BTP services needed to build and deliver data products.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/12/InfrastructureandFederatedGov.png" /></P><BR /> <P class="image_caption" style="text-align: center;font-style: italic">Figure 2: SAP Business Technology Platform as part of a 'bigger' Data Mesh: Infrastructure and Federated Governance</P><BR /> &nbsp;<BR /> <H2 id="toc-hId--21229379"><STRONG>Federated computational governance</STRONG></H2><BR /> To enable the interoperability of self-sufficient data products, there is a need for federated computational governance. Rules defined for the Data Mesh can be federated to the BTP-based data products. Computational execution of the federated rules is supported by SAP BTP services.<BR /> <H1 id="toc-hId--346825603"><STRONG>Conclusion</STRONG></H1><BR /> SAP BTP fits well as part of a platform supporting a Data Mesh. As always, detailed questions must be clarified based on your specific requirements.
The necessary integration options are available.<BR /> <H1 id="toc-hId--543339108"><STRONG>Next steps</STRONG></H1><BR /> If you have not already done so, you can start with Wolfgang’s <A href="https://blogs.sap.com/2022/06/04/more-than-just-a-hype-data-mesh-as-a-new-approach-to-increase-agility-in-value-creation-from-data/" target="_blank" rel="noopener noreferrer">blog series</A> on Data Mesh to understand the SAP technology that can be leveraged as the basis for your Data Mesh initiative.<BR /> <BR /> Especially when data products based on data from SAP systems are on the roadmap, you should consider SAP Business Technology Platform to build and operationalize these data products.<BR /> <BR /> I am very interested in your comments; please leave them below.<BR /> <BR /> <STRONG>If you have any further questions</STRONG>&nbsp;about SAP Business Technology Platform or specific elements of our platform, leave a question in&nbsp;<A href="https://answers.sap.com/questions/ask.html?primaryTagId=8077228b-f0b1-4176-ad1b-61a78d61a847" target="_blank" rel="noopener noreferrer">SAP Community Q&amp;A</A>&nbsp;or visit our&nbsp;<A href="https://community.sap.com/topics/business-technology-platform" target="_blank">SAP Community topic page</A>.<BR /> <BR /> Further information:<BR /> <UL><BR /> <LI>Blogs:<BR /> <UL><BR /> <LI><A href="https://blogs.sap.com/2022/06/04/more-than-just-a-hype-data-mesh-as-a-new-approach-to-increase-agility-in-value-creation-from-data/" target="_blank" rel="noopener noreferrer">More than just a hype: Data Mesh as a new approach to increase agility in value creation from data</A></LI><BR /> <LI><A href="https://blogs.sap.com/2022/09/27/data-mesh-with-sap-business-technology-platform-part-1-sap-data-warehouse-cloud/" target="_blank" rel="noopener noreferrer">Data Mesh with SAP Business Technology Platform Part 1 – SAP Data Warehouse Cloud</A></LI><BR /> <LI><A href="https://blogs.sap.com/2022/11/09/data-mesh-with-sap-business-technology-platform-part-2-sap-hana-cloud/" target="_blank" rel="noopener noreferrer">Data Mesh with SAP Business Technology Platform Part 2 – SAP HANA Cloud</A></LI><BR /> <LI><A href="https://blogs.sap.com/2022/12/09/data-mesh-with-sap-business-technology-platform-part-3-sap-master-data-governance/" target="_blank" rel="noopener noreferrer">Data Mesh with SAP Business Technology Platform Part 3 – SAP Master Data Governance</A></LI><BR /> <LI><A href="https://blogs.sap.com/2022/12/20/data-mesh-with-sap-business-technology-platform-part-4-sap-data-intelligence-cloud/" target="_blank" rel="noopener noreferrer">Data Mesh with SAP Business Technology Platform Part 4 – SAP Data Intelligence Cloud</A></LI><BR /> <LI><A href="https://blogs.sap.com/2023/01/23/data-mesh-with-sap-business-technology-platform-part-5-sap-analytics-cloud/" target="_blank" rel="noopener noreferrer">Data Mesh with SAP Business Technology Platform Part 5 – SAP Analytics Cloud</A></LI><BR /> <LI><A href="https://blogs.sap.com/2023/03/29/data-mesh-with-sap-datasphere/" target="_blank" rel="noopener noreferrer">Data Mesh with SAP Business Technology Platform Part 6 – SAP Datasphere</A></LI><BR /> </UL><BR /> </LI><BR /> <LI>SAP &amp; PwC Study:<BR /> <UL><BR /> <LI><A href="https://pages.pwc.de/data-mesh-und-sap" target="_blank" rel="noopener nofollow noreferrer">Data Mesh and SAP – How and why you should
mesh your data using the SAP Business Technology Platform</A></LI><BR /> </UL><BR /> </LI><BR /> <LI>Data Mesh Architecture by INNOQ:<BR /> <UL><BR /> <LI><A href="https://www.datamesh-architecture.com/tech-stacks/sap" target="_blank" rel="noopener nofollow noreferrer">Data Mesh Architecture by INNOQ – SAP Tech Stack for Data Mesh</A></LI><BR /> </UL><BR /> </LI><BR /> </UL> 2022-12-15T19:40:49+01:00 https://community.sap.com/t5/technology-blogs-by-sap/the-49ers-turn-to-real-time-data-to-deliver-superior-home-game-experiences/ba-p/13571508 The 49ers Turn to Real-time Data to Deliver Superior Home Game Experiences for 70k+ Fans 2023-01-18T11:05:41+01:00 timoelliott https://community.sap.com/t5/user/viewprofilepage/user-id/1235 <EM>The Executive Huddle enables critical calls on game day to improve the stadium experience</EM><BR /> <BR /> Twenty-first-century sports, from neighborhood little leagues to iconic franchises, are driven by data. Another side to the sports universe that is being radically reinvented with data is the game day fan experience.<BR /> <BR /> What to do about an endless line, a clogged sink, and soggy French fries? Typically, sports fans are out of luck. Best-case scenario: they are left to fill out a post-game customer experience survey and hope for better luck next time.<BR /> <BR /> For the legendary <A href="https://www.49ers.com" target="_blank" rel="nofollow noopener noreferrer">San Francisco 49ers</A>, hoping for better luck next time wasn't good enough for the 70,000+ loyal fans who arrive at their stadium for ten home games a season. The 49ers wanted to translate the data gathered throughout Levi's Stadium into meaningful, measurable, real-time improvements to the fan experience.<BR /> <H3 id="toc-hId-1092950191"><STRONG>An Epic View From the 50-Yard Line</STRONG></H3><BR /> Not willing to wait days for customer feedback and then weeks to implement changes, the 49ers were looking for a real-time solution that would collect customer data from all over the stadium—from the parking lot to concession stands to retail shops and restrooms—and leverage it for immediate, actionable insights. Enter <A href="https://www.sap.com/products/business-technology-platform.html" target="_blank" rel="noopener noreferrer">SAP Business Technology Platform</A>, including SAP Analytics Cloud and SAP HANA Cloud, reimagined for the 49ers as the Executive Huddle.<BR /> <BR /> Now the 49ers' Business Intelligence team suits up for home games, which they survey from their ultramodern skybox control room, perched high above the 50-yard line. While guests enjoy the game in the front of the suite, the BI team is focused on the Executive Huddle in the back, where dashboards provide visualized, real-time data and analytics from concessions, retail, entry gates, and parking, along with fan feedback from 100 HappyOrNot terminals. Long line at concession stand 115? The customer service team sends in reinforcements to open another register. Clogged sink in the men's restroom in section 110?
A maintenance crew is on the way.<BR /> <BR /> With the Executive Huddle, customer data is deployed on game day to maximum effect, ensuring a seamless experience for the 49ers' legions of fans, and the difference is measurable.<BR /> <BR /> The San Francisco 49ers' implementation of Executive Huddle has resulted in the following:<BR /> <UL><BR /> <LI>Fan satisfaction increases by up to 30 index points</LI><BR /> <LI>Optimization of the fan experience from parking lots to security gates</LI><BR /> <LI>New opportunity areas in retail and concessions</LI><BR /> <LI>Improved customer service leading to maximized revenues</LI><BR /> <LI>Identification and resolution of game day issues before they become widespread or public</LI><BR /> <LI>Centralized fan feedback for rapid issue triage and resolution</LI><BR /> <LI>Rapid notification of VIP arrivals via ticket scans</LI><BR /> <LI>Retail sales data to identify what's selling and what's not</LI><BR /> <LI>Ability to compare data in real-time across multiple sources</LI><BR /> <LI>Enhanced staff productivity, from executives to janitors, by pinpointing where problems arise</LI><BR /> </UL><BR /> Tune In<BR /> <BR /> In this episode of Better Together: Customer Conversations, we sit down with <A href="https://www.linkedin.com/in/noele-crooks-41510028/" target="_blank" rel="nofollow noopener noreferrer">Noele Crooks</A>, Director, Business Intelligence &amp; CRM, San Francisco 49ers, to explore why the <A href="https://www.49ers.com" target="_blank" rel="nofollow noopener noreferrer">49ers</A> leverage SAP BTP technologies to optimize their fans' game day experience.<BR /> <UL><BR /> <LI><STRONG>Thought Leadership Podcast</STRONG>: <A href="https://www.linkedin.com/in/tamaramccleary/" target="_blank" rel="nofollow noopener noreferrer">Tamara McCleary</A>, CEO of Thulium, talks with Noele Crooks about why the 49ers are using technology to make a real-time difference for fans.</LI><BR /> <LI><STRONG>Practitioners' Video:</STRONG> <A href="https://www.linkedin.com/in/daniel-lahl-4a077a3/" target="_blank" rel="nofollow noopener noreferrer">Dan Lahl</A>, Global Vice President of SAP Marketing and Solutions, sits down with Noele to explore the technology behind the Executive Huddle and consider other applications for this powerful tool—both for the 49ers and for other organizations.</LI><BR /> </UL><BR /> Interested in more stories about using data insights to improve customer experiences? Check out:<BR /> <UL><BR /> <LI><A href="https://www.sap.com/assetdetail/2019/06/4e2ef2df-517d-0010-87a3-c30de2ffd8ff.html" target="_blank" rel="noopener noreferrer">AG Real Estate</A> - Revitalizing the shopping center experience with augmented analytics</LI><BR /> <LI><A href="https://blogs.sap.com/2023/01/04/kaust-transforms-campus-to-smart-city-with-sap-business-technology-platform/" target="_blank" rel="noopener noreferrer">King Abdullah University of Science and Technology</A> (KAUST) - improving the user experience of living on the campus for residents and guests, starting with all things related to dining on the campus</LI><BR /> </UL><BR /> Visit <A href="https://www.sap.com/programs/btp-better-together-customer-series.html" target="_blank" rel="noopener noreferrer">sap.com/btp</A> to find all our Better Together: Customer Conversations episodes.<BR /> <BR /> I'm always interested in hearing your input; let me know which topics you would like us to discuss further. 
Please <A href="mailto:michelle.hickey@sap.com" target="_blank" rel="nofollow noopener noreferrer">contact us </A>if you would like to be a guest on a future episode. 2023-01-18T11:05:41+01:00 https://community.sap.com/t5/technology-blogs-by-sap/san-francisco-49ers-optimize-fan-experience-using-the-executive-huddle/ba-p/13571510 San Francisco 49ers Optimize Fan Experience Using the Executive Huddle 2023-01-18T11:08:47+01:00 timoelliott https://community.sap.com/t5/user/viewprofilepage/user-id/1235 <EM>Tackling customer concerns in real-time to elevate game-day fan satisfaction</EM><BR /> <BR /> At home in Levi's Stadium, where over 70K fans gather on game day in the heart of Silicon Valley, innovation is a natural fit for the <A href="https://www.49ers.com" target="_blank" rel="nofollow noopener noreferrer">San Francisco 49ers</A>. However, the 49ers' Business Intelligence and CRM teams wanted to go the extra mile to ensure a superior experience for their loyal fans. The 49ers looked to SAP to help translate the game-day customer data into real-time, actionable insights.<BR /> <BR /> This episode of the Better Together series features the San Francisco 49ers, who leveraged the Executive Huddle—an advanced platform enabled by the <A href="https://www.sap.com/products/business-technology-platform.html" target="_blank" rel="noopener noreferrer">SAP Business Technology Platform</A>—to transform how the 49ers approach the fan experience at Levi's Stadium.<BR /> <BR /> We spoke to <A href="https://www.linkedin.com/in/noele-crooks-41510028/" target="_blank" rel="nofollow noopener noreferrer">Noele Crooks</A>, Director, Business Intelligence &amp; CRM, San Francisco 49ers.<BR /> <UL><BR /> <LI><STRONG>Thought Leadership</STRONG> <STRONG>Podcast</STRONG>: Noele Crooks sits down with <A href="https://www.linkedin.com/in/tamaramccleary/" target="_blank" rel="nofollow noopener noreferrer">Tamara McCleary</A>, CEO of Thulium, to talk about why it was necessary to optimize the fan experience and how the 49ers are now leveraging the platform to meet their sustainability goals.</LI><BR /> <LI><STRONG>Practitioners' Video:</STRONG> <A href="https://www.linkedin.com/in/daniel-lahl-4a077a3/" target="_blank" rel="nofollow noopener noreferrer">Dan Lahl</A>, Global Vice President of SAP Marketing and Solutions, sits down with Noele to explore the technology behind the Executive Huddle and consider other applications for this powerful tool.</LI><BR /> </UL><BR /> Here are some key insights from these conversations:<BR /> <BR /> <STRONG>Set Clear Goals</STRONG><BR /> <BR /> Digital transformation can be overwhelming, especially when organizations try to overhaul every aspect of operations in one fell swoop. The San Francisco 49ers found success by focusing first on one clearly defined goal. "We knew we wanted to improve the fan experience," Noele Crooks told us. "That helped us start ideating." The results speak for themselves. "Since we've launched the Huddle," Crooks explained, "our fan satisfaction has gone up 30 index points… we're solving issues in real-time, and our fans are happier."<BR /> <BR /> <STRONG>Put Data to Work with Real-Time Visualizations</STRONG><BR /> <BR /> The 49ers' Business Intelligence team had plenty of game day data to work with—from parking lots to concession stands to HappyOrNot fan feedback stations. But with only ten home games a year, the team needed to use that data in real time. "Just imagine looking through a spreadsheet or a data dump," Noele said.
"It's impossible to decide quickly or even understand what you're looking at."<BR /> <BR /> That's where SAP BTP comes in. The solution has ten different data integrations feeding into SAP HANA Cloud. SAP Analytics Cloud enables the visualization of that data in the Executive Huddle dashboards, which update every one to five minutes on game day, allowing Crooks and her team to respond in real time. For instance, the ticket scanning API integration will quickly notify them of VIP arrivals and ensures these special guests receive additional personalized services throughout their game day experience."It's great to have all the data," Crooks told us, "but being able to visualize it in real-time is what the true impact and innovation are."<BR /> <BR /> <STRONG>Keep Innovating</STRONG><BR /> <BR /> Now that Noele Crooks and her team have accomplished their original goal of optimizing fan experience on game day, they are looking for new digital frontiers to explore. In the near term, the 49ers are tackling sustainability—using the dashboards to monitor water and gas consumption in real-time—and social media—thinking about how fans' social media posts can become another data point to help improve the fan experience.<BR /> <BR /> "You can always do more once you have all your use cases," Crooks said. And that ongoing process of innovation, Crooks reminds us, is "the fun part."<BR /> <BR /> Check out these other stories involving data insights for improved experiences:<BR /> <UL><BR /> <LI><A href="https://www.sap.com/assetdetail/2019/06/4e2ef2df-517d-0010-87a3-c30de2ffd8ff.html" target="_blank" rel="noopener noreferrer">AG Real Estate</A> - Revitalizing the shopping center experience with augmented analytics</LI><BR /> <LI><A href="https://blogs.sap.com/2023/01/04/kaust-transforms-campus-to-smart-city-with-sap-business-technology-platform/" target="_blank" rel="noopener noreferrer">King Abdullah University of Science and Technology</A> (KAUST) - improving the user experience of living on the campus for residents and guests, starting with all things related to dining on the campus</LI><BR /> </UL><BR /> Visit <A href="https://www.sap.com/programs/btp-better-together-customer-series.html" target="_blank" rel="noopener noreferrer">sap.com/BTP</A> to find the whole Better Together: Customer Conversations series and sign up to receive upcoming episodes and a year-end series recap delivered to your e-mail.<BR /> <BR /> Let us know if there are topics you'd like to learn more about or any questions you would like us to explore. 
Please <A href="mailto:michelle.hickey@sap.com" target="_blank" rel="nofollow noopener noreferrer">contact us</A>&nbsp;if you would like to be a guest on a future episode.<BR /> <BR /> &nbsp; 2023-01-18T11:08:47+01:00 https://community.sap.com/t5/technology-blogs-by-sap/top-3-reasons-to-attend-sap-data-unleashed/ba-p/13564569 Top 3 Reasons to Attend SAP Data Unleashed 2023-02-07T22:43:10+01:00 SavannahVoll https://community.sap.com/t5/user/viewprofilepage/user-id/13466 <P style="overflow: hidden;margin-bottom: 0px"><A href="https://webinars.sap.com/sapdataunleashed/en/home?campaigncode=CRM-PM23-DAN-1961688&amp;source=blogs-glo-top-reasons" target="_blank" rel="noopener noreferrer"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/02/SAPDU_Banners_1920x580_1.18-1.png" /></A></P><BR /> <P style="overflow: hidden;margin-bottom: 0px"><A href="https://webinars.sap.com/sapdataunleashed/en/home?campaigncode=CRM-PM23-DAN-1961688&amp;source=blogs-glo-top-reasons" target="_blank" rel="noopener noreferrer">SAP Data Unleashed</A> is just around the corner, officially kicking off on March 8, 2023, at 8:00am PST. This free, one-hour, virtual data summit includes significant announcements, product demos, and a two-hour live Q&amp;A. Attendees will be the first to learn about the latest SAP advancements in data and analytics, hearing directly from some of the world’s best-run companies and technology providers. The event features speakers from SAP’s executive leadership team, including:</P><BR /> <UL><BR /> <LI>Julia White, Chief Marketing and Solutions Officer and Member of the Executive Board, SAP</LI><BR /> <LI>Juergen Mueller, Chief Technology Officer and Member of the Executive Board, SAP</LI><BR /> </UL><BR /> Will you join us?<BR /> <BR /> Here are the top 3 reasons to attend <A href="https://webinars.sap.com/sapdataunleashed/en/home?campaigncode=CRM-PM23-DAN-1961688&amp;source=blogs-glo-top-reasons" target="_blank" rel="noopener noreferrer">SAP Data Unleashed</A>.<BR /> <H2 id="toc-hId-963033511"><B>1. Understand SAP’s vision for data and analytics</B></H2><BR /> Economic and supply chain challenges continue to test the endurance of businesses today, driving leadership to look for new ways to use data to understand their customers, develop better products, and streamline operations. But in the end, it isn’t about gathering the most data, it’s about using your data to its full advantage.<BR /> <BR /> Many organizations have lost the valuable context within their data by moving it to serve a new technology versus focusing on where the value is – within the data itself. In this event, we’ll share our vision to simplify your data landscape and harmonize mission-critical data across the enterprise.<BR /> <H2 id="toc-hId-766520006"><STRONG>2. Discover an open and interoperable SAP</STRONG></H2><BR /> We understand the drive to innovate. We also understand why so many innovative ideas fail. As data projects become increasingly interdisciplinary, organizations must be able to connect cross-functional data siloed in individual applications.<BR /> <BR /> It’s time to leave disconnected systems in the past and embrace an open and interoperable architecture. Organizations of all sizes and industries are using SAP solutions to simplify their data landscapes and maximize the value of their data.<BR /> <BR /> Register for the event to gain inspiration from your peers and spark new ideas for your own team. Find out how you can connect hybrid multi-cloud environments and leverage your data’s full potential.<BR /> <H2 id="toc-hId-570006501"><STRONG>3. Be the first to hear SAP's data announcements</STRONG></H2><BR /> At <A href="https://webinars.sap.com/sapdataunleashed/en/home?campaigncode=CRM-PM23-DAN-1961688&amp;source=blogs-glo-top-reasons" target="_blank" rel="noopener noreferrer">SAP Data Unleashed</A>, executives from both SAP and leading data vendors will take the virtual stage to announce our newest innovations and partnerships that will unleash the power of business data and transform decision making. This innovation and collaboration will make it easier for businesses to access context-rich data to shape their strategies for the future.<BR /> <BR /> By attending this event, you’ll be among the first to learn about the new capabilities of <A href="https://www.sap.com/products/technology-platform/analytics.html" target="_blank" rel="noopener noreferrer">SAP’s data and analytics solutions</A> and understand how they can help you achieve your 2023 business goals.<BR /> <H2 id="toc-hId-373492996"><B>Register for SAP Data Unleashed</B></H2><BR /> Join us on March 8th for this exclusive, virtual, one-hour summit. You won’t want to miss this opportunity to hear about the latest SAP advancements, see new capabilities in action, and understand the value customers gain when they unleash the power of their business data.<BR /> <BR /> Don’t delay, <A href="https://webinars.sap.com/sapdataunleashed/en/home?campaigncode=CRM-PM23-DAN-1961688&amp;source=blogs-glo-top-reasons" target="_blank" rel="noopener noreferrer">register for SAP Data Unleashed</A>.<BR /> <BR /> &nbsp; 2023-02-07T22:43:10+01:00 https://community.sap.com/t5/technology-blogs-by-members/personal-perspective-why-technology-leaders-need-a-data-modernization/ba-p/13557431 Personal perspective: Why technology leaders need a Data Modernization strategy, and how to not miss 2023-02-24T20:03:56+01:00 MilesHanley https://community.sap.com/t5/user/viewprofilepage/user-id/124899 If you are a technology leader, you know Data Modernization is top of mind in the C-suite. Having been fortunate to support some of the world's largest enterprises as an expert in Cloud technologies, I have found this is because data-driven organizations make smarter decisions faster.<BR /> <BR /> Data Modernization is the process of updating data management practices and technologies. This is a significant organizational maturation process that may include moving to the Cloud, using more efficient databases, and adopting new analytics tools.
Executives that I’ve spoken with are concerned with data modernization because it underpins essential capabilities to remain competitive and meet the ever changing demands of the digital age:</SPAN><BR /> <UL><BR /> <LI style="font-weight: 400"><SPAN style="font-weight: 400">Insightful decision-making: Accurate, meaningful and timely data precedes decisions to mitigate risks, identify new opportunities and find efficiencies</SPAN></LI><BR /> <LI style="font-weight: 400"><SPAN style="font-weight: 400">Improve operational efficiency: Through automation and AI/ML,&nbsp; reducing redundant processes that were previously not possible as a result of inflexible legacy systems, save time and money</SPAN></LI><BR /> <LI style="font-weight: 400"><SPAN style="font-weight: 400">Increase business flexibility: Adopt new business models and innovation with open and scalable technologies to better serve customers in a rapidly evolving digital landscape</SPAN></LI><BR /> <LI style="font-weight: 400"><SPAN style="font-weight: 400">Enhance customer experiences: Connected understanding of customers, their needs and preferences provide better customer experiences, personalize services, and improve customer retention</SPAN></LI><BR /> <LI style="font-weight: 400"><SPAN style="font-weight: 400">Harden security: Modern data technologies with built-in security features and regulation compliance along with updated data governance policies better protect sensitive data which is important to avoid embarrassing breaches and regulatory fines</SPAN></LI><BR /> <LI style="font-weight: 400"><SPAN style="font-weight: 400">Increase scalability: Scale operations quickly with the tools and resources required to meet the needs of the business</SPAN></LI><BR /> </UL><BR /> <SPAN style="font-weight: 400">The reality in many boardrooms is that it’s not “if” data modernization, but how to hit the mark on this investment. Exploring the challenges, steps, and best practices when adopting and executing an Enterprise Data Modernization strategy organizations are better positioned to execute.</SPAN><BR /> <BR /> <SPAN style="font-weight: 400">Virtually all organizations today have data collection capabilities that far outpace their means to connect, interpret and act on data. Swaths of data are readily available if not for non cross compatible tools and technologies. Thus it is imperative to become a data driven business and that long term IT investments and architecture are built on open, secure and scalable technologies. </SPAN><SPAN style="font-weight: 400">So how does this translate into meaningful action for technology leaders?&nbsp;</SPAN><BR /> <BR /> <SPAN style="font-weight: 400">Assess the current state and set the vision</SPAN><BR /> <UL><BR /> <LI style="font-weight: 400"><SPAN style="font-weight: 400">A detailed mapping of existing data infrastructure, applications, and processes highlights gaps and inefficiencies. The stated vision is less concerned with how these are to be addressed, but rather establishes the why and the overarching business objectives, data needs, and technology requirements that will support the organization's future state</SPAN></LI><BR /> </UL><BR /> <SPAN style="font-weight: 400">Rely on deep technical expertise to develop a roadmap</SPAN><BR /> <UL><BR /> <LI style="font-weight: 400"><SPAN style="font-weight: 400">From the current state assessment, organizations are better equipped to curate a roadmap outlining the steps needed to fulfill the vision. 
The big blocks on the roadmap include implementing new data technologies, upgrading legacy systems, and enhancing data governance practices</SPAN></LI><BR /> <LI style="font-weight: 400"><SPAN style="font-weight: 400">Legacy systems can prove to be a major hurdle on the path to data modernization. Incompatibility of legacy systems with new technologies requires significant resources to update or migrate and may take a phased approach to minimize business disruption</SPAN></LI><BR /> </UL><BR /> <SPAN style="font-weight: 400">Build for the needs of the business</SPAN><BR /> <UL><BR /> <LI style="font-weight: 400"><SPAN style="font-weight: 400">Think big. Start small. Start with a manageable data project core to your business and do not let fear of having “perfect” data stop you. Identify the sources of data required to support the business need, which may include multiple systems and sources</SPAN></LI><BR /> <LI style="font-weight: 400"><SPAN style="font-weight: 400">Use data to make informed decisions, and do not fall into the trap of paralysis by analysis. Define metrics that matter and are central to your business. There is no need to start from zero. Accelerators and prepackaged reports are commonplace among modern data technologies</SPAN></LI><BR /> </UL><BR /> <SPAN style="font-weight: 400">Implement and upgrade data technologies with security in mind</SPAN><BR /> <UL><BR /> <LI style="font-weight: 400"><SPAN style="font-weight: 400">Modern data technologies, including cloud-based data lakes, data warehouses, and data management tools, should be evaluated based on their ability to support the organization's data needs, foster innovation and align with the overall stated vision</SPAN></LI><BR /> <LI style="font-weight: 400"><SPAN style="font-weight: 400">Data governance practices ensure that data is managed effectively and securely. This may involve establishing data quality standards, implementing data security measures, and ensuring compliance with data privacy regulations</SPAN></LI><BR /> </UL><BR /> <SPAN style="font-weight: 400">Lead with conviction&nbsp;</SPAN><BR /> <UL><BR /> <LI style="font-weight: 400"><SPAN style="font-weight: 400">Technological change precedes organizational cultural change. Build a data-driven culture by emphasizing data to inform decisions and training people to do so. Often ask “What does the data show?” and challenge conclusions in this way. Investments in data access, analytical systems and dashboards empower employees. Over the course of the data modernization journey, accurate, meaningful and timely data will increasingly answer this question.</SPAN></LI><BR /> <LI style="font-weight: 400"><SPAN style="font-weight: 400">The C-Suite must model this. Leaders should welcome the use of data to challenge decisions and bust confirmation bias. Leadership here helps the organization adopt the data-driven mindset</SPAN></LI><BR /> </UL><BR /> <SPAN style="font-weight: 400">Modernizing a data strategy is a comprehensive and structured methodology that involves defining the vision, assessing the current state, developing a roadmap, updating systems and implementing new technologies, establishing data governance, and building a data-driven culture. 
Iterative reflection of these steps throughout the process is required for organizations to better compete and meet the expectations of the digital age, and ensure the execution of the strategy hits the mark.</SPAN> 2023-02-24T20:03:56+01:00 https://community.sap.com/t5/technology-blogs-by-sap/unified-analytics-with-sap-datasphere-databricks-lakehouse-platform-data/ba-p/13555907 Unified Analytics with SAP Datasphere & Databricks Lakehouse Platform- Data Federation Scenarios 2023-03-10T15:56:11+01:00 Vivek-RR https://community.sap.com/t5/user/viewprofilepage/user-id/143545 <H1 id="toc-hId-833060718">Introduction</H1><BR /> With the announcements in the<A href="https://news.sap.com/2023/03/sap-datasphere-power-of-business-data/" target="_blank" rel="noopener noreferrer"> SAP Data Unleashed</A>, SAP introduced the successor of SAP Data Warehouse Cloud, <STRONG><B>SAP Datasphere</B></STRONG> – a powerful Business Technology Platform data service that addresses the Data to value strategy of every Enterprise organization to deliver seamless access to business data. &nbsp;Our Datasphere has been enriched with new features thereby delivering a unified service for data integration, cataloging, semantic modeling, data warehousing, and virtualizing workloads across SAP and non-SAP data.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic1-6.png" height="241" width="564" /></P><BR /> I am sure there will be lots of blogs that will be published soon discussing the latest offerings, roadmaps, and features. However, this blog focuses on the latest announcement related to open data partners and I am going to start by focusing on Databricks.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic2-5.png" height="224" width="524" /></P><BR /> With the rise of the Lakehouse platform that combines both Data Warehouses &amp; Data Lakes, there has been a trend with SAP customers exploring Unified Analytics Platforms or say unified environments that address different perspectives of data management, governance, development, and finally deployments of analytic workloads based on diverse data sources and formats. Previously, there is a need to replicate the data completely out of SAP environments for customers to adopt the lakehouse platform. But the current partnership with Databricks will focus to simplify and integrate the hybrid landscapes efficiently.<BR /> <BR /> There was an article posted by <A href="https://www.theregister.com/2023/03/08/sap_datasphere/" target="_blank" rel="nofollow noopener noreferrer">The Register</A> regarding the SAP Datasphere&nbsp; and it exactly resonated with the SAP messaging<BR /> <BR /> &nbsp;<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/PressRelease.png" /></P><BR /> &nbsp;<BR /> <BR /> <STRONG><B>Please note that what I am about to discuss further is the data federation scenarios between SAP Datasphere and Databricks that works as of today</B></STRONG>. 
<STRONG><B>There will be additional ways of integrating with Databricks in the <A href="https://news.sap.com/2023/03/sap-datasphere-business-data-fabric/" target="_blank" rel="noopener noreferrer">future</A>.</B></STRONG><BR /> <H2 id="toc-hId-765629932"><STRONG><B>Brief Introduction to the Lakehouse Platform</B></STRONG></H2><BR /> In simple terms, a lakehouse is a Data Management architecture that enables users to perform diverse workloads such as BI, SQL Analytics, Data Science &amp; Machine Learning on a unified platform. And by utilizing a combined data management platform such as lakehouse has the following benefits<BR /> <OL><BR /> <LI>Enables Direct Data Access across SAP and Non-SAP sources</LI><BR /> <LI>Simplifies data governance</LI><BR /> <LI>Removes Data Redundancy</LI><BR /> <LI>Direct Connectivity to BI Tools</LI><BR /> <LI>Simplified security with a single source of truth</LI><BR /> <LI>Support for open source &amp; commercial tooling</LI><BR /> <LI>Operating on multi-cloud environments and many more.</LI><BR /> </OL><BR /> And Databricks is one such Lakehouse platform that takes a unified approach by integrating disparate workloads to execute data engineering, Analytics, Data Science &amp; Machine Learning use cases. And as mentioned on their <A href="https://www.databricks.com/product/data-lakehouse" target="_blank" rel="nofollow noopener noreferrer">site</A>, the platform is simple, open &amp; multi-cloud. We will discuss the Lakehouse platform features and capabilities in future blogs but as mentioned before we are going to focus on a data federation scenario to access data from Databricks SQL into SAP Datasphere.<BR /> <BR /> <STRONG>Consider a scenario where the data from a non-SAP source is continuously ingested into cloud object storage say AWS S3. Note that Databricks has an <A href="https://docs.databricks.com/ingestion/auto-loader/index.html" target="_blank" rel="nofollow noopener noreferrer">autoloader</A> feature to efficiently process data files from different cloud storages as they arrive and ingest them into Lakehouse seamlessly. 
Then we utilize the <A href="https://docs.databricks.com/workflows/delta-live-tables/index.html" target="_blank" rel="nofollow noopener noreferrer">delta live table</A> framework for building data pipelines and storing the transformed data in&nbsp;<A href="https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.databricks.com%2Fdelta%2Findex.html&amp;data=05%7C01%7Cv.rr%40sap.com%7C2118d74bda7a4a6c5b4e08db21fbef44%7C42f7676cf455423c82f6dc2d99791af7%7C0%7C0%7C638141138200467635%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=dgoglym6W6gc8yAilgaVRZdIAWGjTjNs0d3j4mXmi0w%3D&amp;reserved=0" target="_blank" rel="nofollow noopener noreferrer">Delta format</A> on&nbsp;cloud storage, which can subsequently be accessed by Databricks SQL(DB SQL).</STRONG><BR /> <BR /> As referred to in the integration scenario below, SAP Datasphere will connect to Databricks SQL with the existing data federation capabilities and users can blend the data with SAP sources for reporting/BI workloads based on SAP Analytics Cloud(SAC).<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Dbricks_mar13.png" /></P><BR /> <P style="overflow: hidden;margin-bottom: 0px"><SPAN style="font-size: 1rem">Assuming you process the incoming data and persist as tables in Databricks SQL, you will then perform the following steps to establish connectivity to SAP Datasphere</SPAN></P><BR /> <BR /> <H2 id="toc-hId-569116427"><STRONG>Data Federation with Databricks SQL</STRONG></H2><BR /> <H3 id="toc-hId-501685641"><SPAN style="font-size: 16.38px">Prerequisites</SPAN></H3><BR /> <OL><BR /> <LI>You have access to SAP Datasphere with authorization to create a new Data provisioning agent</LI><BR /> <LI>Access to Virtual Machine or On-Premise system where you install Data provisioning agent.</LI><BR /> <LI>You have access to Databricks Clusters as well as SQL warehouse.</LI><BR /> <LI>You have downloaded the <A href="https://www.databricks.com/spark/jdbc-drivers-download" target="_blank" rel="nofollow noopener noreferrer">JDBC driver</A> from Databricks website.</LI><BR /> </OL><BR /> <H2 id="toc-hId-176089417"><STRONG>Databricks Cluster &amp; SQL Access</STRONG></H2><BR /> As mentioned in the previous section, I assume you are already working on Databricks topics. If you are interested in getting access , you can <A href="https://www.databricks.com/try-databricks#account" target="_blank" rel="nofollow noopener noreferrer">sign up</A> for the free trial for 14 days. Or you can sign up from the hyperscaler marketplace such as AWS marketplace for the same.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic5-4.png" height="239" width="562" /></P><BR /> Once you login to your account, you will notice the Unified environment for different workspaces<BR /> <BR /> Data Science and Engineering<BR /> <BR /> Machine Learning<BR /> <BR /> SQL<BR /> <BR /> Navigate to workspace “Data Science and Engineering” and select the compute which you have been using for data transformation. Just to note that when we build Delta Live tables pipelines, it uses its own compute to run pipelines. 
We will discuss that in the later blogs.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic6-7.png" height="217" width="570" /></P><BR /> Select the all-purpose compute to get additional information.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic7-3.png" /></P><BR /> Navigate to the “Advanced options” to get JDBC connection details<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic8-3.png" height="302" width="544" /></P><BR /> You will find the connection details here. We need to tweak it a bit prior to connecting with generic JDBC connectors.&nbsp; Also, we will be connecting using personal access tokens and not user credentials.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic9-3.png" height="303" width="555" /></P><BR /> &nbsp;<BR /> <BR /> To generate the personal access token, you can generate it from the user settings and save the token for later use.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic10-2.png" /></P><BR /> Here is the modified JDBC URL that we will be using in SAP Datasphere connection. We need to add <STRONG>“IgnoreTransactions=1”</STRONG> to ignore transaction related operations. In your case, you will be just copying the URL from advanced options and add the parameter IgnoreTransactions as shown below<BR /> <PRE class="language-python"><CODE>jdbc:databricks://dbc-xxxxxxxxxxx.cloud.databricks.com:443/default;IgnoreTransactions=1;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/XXXXXXXXXXXXXXXXX268/XX-XX-56fird2a;AuthMech=3;UID=token;PWD=&lt;Personal Access Token&gt;<BR /> </CODE></PRE><BR /> We can now navigate to Databricks SQL to see if we have the necessary schemas and tables to access. The tables were created under the schema “dltcheck”.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic11-1.png" /></P><BR /> &nbsp;<BR /> <H2 id="toc-hId--20424088"><STRONG>Data Provisioning Agent(DP Agent) Installation Instructions</STRONG></H2><BR /> I am not going the discuss the DP agent installation in detail as there are numerous articles posted on the same topic. And will be addressing the specific changes that need to be done for Databricks access.<BR /> <BR /> In order to establish secure connectivity between SAP Datasphere and Databricks SQL, the Data Provisioning Agent(DP agent) has to be installed on a virtual machine and configured. For DP agent installation, you can refer to the following <A href="https://help.sap.com/docs/SAP_DATASPHERE/9f804b8efa8043539289f42f372c4862/8f6185069a51404ebf23c684fee8cf39.html" target="_blank" rel="noopener noreferrer">document</A>. To connect the DP agent to the SAP HANA Cloud tenant of SAP Datasphere, please follow the steps mentioned in this <A href="https://help.sap.com/docs/SAP_DATASPHERE/9f804b8efa8043539289f42f372c4862/e87952d7c656477cb5558e5c2f44ae9c.html" target="_blank" rel="noopener noreferrer">document</A>. 
Assuming all the steps were completed, the status of the DP agent will be displayed as “Connected”. In my case, the DP agent is DBRICKSNEW.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic12-1.png" height="354" width="536" /></P><BR /> &nbsp;<BR /> <H2 id="toc-hId--216937593"><STRONG><B>Configuration Adjustments on the Virtual Machine</B></STRONG></H2><BR /> Navigate to the folder &lt;dpagent&gt;/camel/lib and copy the jar file that is downloaded from the <A href="https://www.databricks.com/spark/jdbc-drivers-download" target="_blank" rel="nofollow noopener noreferrer">Databricks site</A>. Also, extract the jar files in the same location.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic13-2.png" /></P><BR /> Then navigate to the &lt;dpagent&gt;/camel folder and adjust the properties in configfile-jdbc.properties.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic15.png" height="105" width="586" /></P><BR /> Change the delimident value to BACKTICK.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic16.png" height="84" width="345" /></P><BR /> Save and restart the DP agent. Log in to SAP Datasphere to check the status of the DP agent. It should display the status as "Connected". Edit the DP agent and select the Agent Adapter “CamelJdbcAdapter”.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic17.png" height="279" width="251" /></P><BR /> Now we should be able to connect to Databricks SQL from SAP Datasphere.<BR /> <H2 id="toc-hId--413451098">Establishing Connection from SAP Datasphere</H2><BR /> Navigate to your space and create a new connection for the “Generic JDBC” connector.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic18.png" /></P><BR /> Provide any Business Name and the following details:<BR /> <PRE class="language-abap"><CODE>Class : com.databricks.client.jdbc.Driver<BR /> <BR /> JDBC URL : jdbc:databricks://&lt;your databricks account&gt;.cloud.databricks.com:443/default;IgnoreTransactions=1;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/XXXXXXXXXXXXXXXXX268/XX-XX-XXXXXX;AuthMech=3;UID=token;PWD=&lt;Personal Access Token&gt;<BR /> <BR /> </CODE></PRE><BR /> <STRONG>Please copy the JDBC URL as is from the Databricks cluster's advanced options</STRONG> and just <STRONG>add</STRONG> the parameter “<STRONG>IgnoreTransactions=1</STRONG>”. 
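<BR /> Before validating the connection in SAP Datasphere, it can be worth a quick sanity check that the hostname, httpPath, and personal access token are correct. One optional way to do this is with the databricks-sql-connector Python package (an assumption here; it is not required for the Generic JDBC setup) from any machine that can reach the workspace:<BR /> <PRE class="language-python"><CODE># Optional smoke test using the databricks-sql-connector package
# (pip install databricks-sql-connector). The values below are placeholders
# copied from the cluster's Advanced options / JDBC URL.
from databricks import sql

with sql.connect(
    server_hostname="dbc-xxxxxxxxxxx.cloud.databricks.com",
    http_path="sql/protocolv1/o/XXXXXXXXXXXXXXXXX268/XX-XX-56fird2a",
    access_token="&lt;Personal Access Token&gt;",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SHOW TABLES IN dltcheck")   # schema created earlier in this blog
        for table in cursor.fetchall():
            print(table)</CODE></PRE><BR /> If this fails, the problem lies with the endpoint or token rather than with the DP agent or the Generic JDBC connection. Coming back to the IgnoreTransactions parameter:<BR /> 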
Setting this property to 1 ignores any transaction-related operations and returns successfully.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic20.png" /></P><BR /> Provide your Databricks user account credentials or token credentials with user as token and select the Data provisioning agent that you just activated.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic21.png" height="302" width="503" /></P><BR /> With the connection details and configurations done properly, validation should be successful.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic22.png" /></P><BR /> &nbsp;<BR /> <BR /> To make sure we have access to data, let’s use the Data Builder to build an analytical model that could be consumed in SAP Analytics Cloud. Navigate to Data Builder and create a new Graphical View<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic23.png" /></P><BR /> Navigate to Sources-&gt;Connections-&gt;Databricks("Your Business Name For Generic JDBC Connection")-&gt;"Databricks Schema". You will find the list of tables under the schema "dltcheck"<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic24.png" height="326" width="563" /></P><BR /> &nbsp;<BR /> <BR /> Here is the Databricks SQL Access for the same schema "dltcheck"<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic25.png" height="209" width="558" /></P><BR /> Select the data with which you wanted to build the Analytical model. In my case it is bucketraw1,&nbsp; added some calculated columns, aggregated the necessary data, and exposed the relevant columns as the Analytical model “Confirmed_cases”. And the data preview of the model shows the corresponding records too.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic26.png" height="257" width="507" /></P><BR /> This model could be consumed in SAP Analytics Cloud for reporting.<BR /> <H2 id="toc-hId--609964603">Troubleshooting Errors</H2><BR /> 1. If you do not see the connection validated, then there are two options to troubleshoot. Either we can use the generic log files from DP agent to identify the issue or add the log parameters in JDBC URL and collect those specific errors. 
If you face validation issues, then add these parameters in the jdbc url.<BR /> <BR /> &nbsp;<BR /> <PRE class="language-abap"><CODE>jdbc:databricks://&lt;youraccount&gt;.cloud.databricks.com:443/default;LogLevel=5;LogPath= &lt;foldername&gt;;IgnoreTransactions=1;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/&lt;SQLPath&gt;;AuthMech=3;UID=token;PWD=&lt;token&gt;</CODE></PRE><BR /> Log Level – 5&nbsp;&nbsp; which means enable logging on the DEBUG level, which logs detailed information that is&nbsp; useful for debugging the connector<BR /> <BR /> Logpath – This will be available in the following path /usr/sap/&lt;dpagentfolder&gt;/<BR /> <BR /> The log path can be found on the linux VM and it will generate the specific log files<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic27.png" height="414" width="493" /></P><BR /> 2. If you see some errors related to mismatched input, then please adjust the camel jdbc properties as mentioned in DPAgent Installation Instructions. The “DELIMIDENT” value should be set&nbsp; to “BACKTICK”.<BR /> <PRE class="language-abap"><CODE>2022-12-19 16:31:57,552 [ERROR] [f6053d30-7fcd-436f-bed8-bf4d1358847788523] DefaultErrorHandler | CamelLogger.log [] - Failed delivery for (MessageId: 2567788B8CF359D-0000000000000000 on ExchangeId: 2567788B8CF359D-0000000000000000). Exhausted after delivery attempt: 1 caught: java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '"FUNCTION_ALIAS_PREFIX_0"' expecting {&lt;EOF&gt;, ';'}(line 1, pos 33)<BR /> <BR /> == SQL ==<BR /> <BR /> SELECT SUM("confirmed_cases") AS "FUNCTION_ALIAS_PREFIX_0", SUM("deaths") AS "FUNCTION_ALIAS_PREFIX_1", "covidsummarygcs$VT"."county_name" FROM "default"."covidsummarygcs" "covidsummarygcs$VT" GROUP BY "covidsummarygcs$VT"."county_name"</CODE></PRE><BR /> &nbsp;<BR /> <BR /> 3. If you see the communication link failure, then the IgnoreTransactions parameter has not been set in your JDBC URL.<BR /> <P style="overflow: hidden;margin-bottom: 0px">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/Pic28.png" /></P><BR /> As mentioned before, the data federation is enabled through JDBC connector as of now . But things will change with additional connections in the future. Hopefully this helped you in understanding the Data Federation capabilities with Databricks. Please do share your feedback. In case of connection issues, feel free to comment, and will try to help you out.<BR /> <BR /> &nbsp; 2023-03-10T15:56:11+01:00 https://community.sap.com/t5/technology-blogs-by-sap/model-compression-without-compromising-predictive-accuracy-in-sap-hana-pal/ba-p/13564339 Model Compression without Compromising Predictive Accuracy in SAP HANA PAL 2023-03-30T11:29:26+02:00 xinchen https://community.sap.com/t5/user/viewprofilepage/user-id/712820 <H1 id="toc-hId-833948777">1. Introduction</H1><P style=" text-align : justify; ">The recent success of applying state-of-the-art AI algorithms on tasks with modern big data has raised concerns on their efficiency. 
For instance, ensemble methods like random forest typically require numerous sub-learners to achieve favorable predictive performance, which results in an increasing demand for model storage space <SPAN>as the dataset size increases.</SPAN></P><P style=" text-align : justify; "><SPAN>Such tree-based models may reach GB level, making them difficult to deploy on resource-constrained devices or proving costly in the cloud. Furthermore, a larger model usually leads to higher inference time and energy consumption, which are unacceptable in many real-world applications.</SPAN></P><P style=" text-align : justify; "><SPAN>To address this issue, several efficient methods have been proposed. These approaches aim to make machine learning model inference faster (time-efficient), use fewer computational resources (computationally efficient), require less memory (memory-efficient), and take less disk space (storage-efficient). Among these methods, model compression is one of the most popular. It reduces the size of large models without compromising predictive accuracy.</SPAN><BR /><BR />In SAP HANA <STRONG>Predictive Analysis Library</STRONG> (PAL), we have also introduced lossy model compression techniques in several popular methods including Support Vector Machine (<STRONG>SVM</STRONG>), Random Decision Trees (<STRONG>RDT</STRONG>) and Hybrid Gradient Boosting Tree (<STRONG>HGBT</STRONG>). <SPAN>These techniques minimize the loss of predictive accuracy and do not cause any delay in inference.&nbsp;</SPAN>In the following sections, we will dive deep into the model compression methods applied in SAP HANA PAL. At the end of the blog post, some useful terms and links are listed.</P><P style=" text-align : justify; ">&nbsp;</P><H1 id="toc-hId-637435272">2. Algorithms</H1><P style=" text-align : justify; "><STRONG>Tree-based Algorithms – RDT/ HGBT</STRONG><BR />For these tree-based algorithms, the trees are large, independent and identically distributed random entities that contain many additional characteristics and parameters given the training data. <SPAN>Therefore, the ensemble model is abundant in redundancy, which allows us to infer its probability structure and construct an efficient encoding scheme.</SPAN></P><P style=" text-align : justify; ">The compression methodology we applied in RDT/ HGBT focuses on the following attributes:</P><UL style=" text-align : justify; "><LI><STRONG>Tree structure</STRONG>: Each tree’s structure is represented as a Zaks sequence, and then we apply a simple LZ-based encoder after all sequences are concatenated.</LI><LI><STRONG>The split of the nodes</STRONG>&nbsp;(Variable Name, Split Value): Usually, each node is defined by a variable name and a corresponding selected split value. As a result of the recursive construction of the tree, the method assumes that the probabilistic model for a node only depends on its depth and parents. Then, based on probabilistic modeling, models (Variable Name Models and Split Value Models) are clustered via Bregman divergence. The data which corresponds to a model can then be compressed with a Huffman code according to the center probability.</LI><LI><STRONG>The values of the leaves</STRONG>: Also, the method uses a simplified model in which the distribution of the fits in a leaf relies on its depth and parent’s variable name. In the case of classification problems, the usage of entropy coding is suitable as the fits are categorical. 
However, in the regression problems, quantization techniques like Lloyd-max are required to take over continuous set of values. &nbsp;Such quantization results generally lead to very regularized distortion which could be handled well by setting the distortion level to achieve favorable compression rate.</LI></UL><P style=" text-align : justify; ">Moreover, this lossless compression scheme allows make predictions from the compressed model. The results of compression rate of RDT and HGBT with various number of trees are shown in the following two figures. The dataset (768 rows, 9 columns) used in the example is from Kaggle and the original data comes from National Institute of Diabetes and Digestive and Kidney Diseases.</P><P style=" text-align : justify; "><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/RDT-rate.png" border="0" /></P><P style=" text-align : justify; "><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/03/hgbt_rate.png" border="0" /></P><P style=" text-align : justify; "><STRONG>SVM</STRONG><BR /><BR />The aim of SVM is to find a set of support vectors to distinguish the classes and maximize the margin. Sometimes, the number of support vectors is very huge. In particular, categorical data need to be mapped to be continuous through one-hot encoding technique, which required more space for model storage in JSON/PMML format.</P><P style=" text-align : justify; ">Hence, we applied Lloyd-Max quantization and Huffman coding to compress support vectors into a string in SVM. The scheme also enables predictions from the compressed format. In addition, in one-hot encoding, each categorical value is represented only by a value which can significantly reduce the use of memory.<BR />&nbsp;</P><P style=" text-align : justify; ">&nbsp;</P><H1 id="toc-hId-440921767">3. Terms</H1><P style=" text-align : justify; "><STRONG>Compression Rate</STRONG><BR />Compression Rate = Compressed Size / Uncompressed Size<BR /><BR /><STRONG>Quantization</STRONG><BR />Quantization is the process of mapping continuous infinite values to a smaller set of discrete finite values. One popular quantizer is Lloyd-Max algorithm which designs non-uniform quantizers optimized according to the prevailing pdf of the input data.<BR /><BR /><STRONG>Entropy Coding</STRONG><BR />Entropy is the smallest number of bits needed to represent a symbol.&nbsp; Hence, entropy coding techniques are lossless coding methods which could approach such entropy limit. One common form is Huffman coding which uses a discrete number of bits for each symbol. Another is arithmetic coding, which outputs a bit sequence representing a point inside an interval. The interval is built recursively by the probabilities of the encoded symbols.<BR /><BR /><STRONG>Tree-based Model Compression</STRONG><BR />There are many research directions of model compression of tree ensembles. One popular line focus on pruning techniques such as removing redundant components (features and trees), selecting optimal rule subsets, and choosing an optimal subset of trees. However, such pruning schemes are lossy and no guarantees on the compression rate. Another line is to train an artificial neural network to mimic the functionality of tree ensembles which is faster but both lossy and irreversible.</P><P style=" text-align : justify; ">&nbsp;</P><H1 id="toc-hId-244408262">4. 
Summary</H1><P style=" text-align : justify; ">In this blog post, we introduce the model compression methodology used in RDT, HGBT and SVM in SAP HANA PAL which could offer lossless model compression without compromising predictive performance. Hope you enjoyed reading this blog post!</P><H3 id="toc-hId-306060195">&nbsp;</H3><H3 id="toc-hId-109546690">Other Useful Links:</H3><OL><LI><A href="https://pypi.org/project/hana-ml/" target="_blank" rel="noopener nofollow noreferrer">hana-ml</A> on Pypi.</LI><LI>Python Machine Learning Client for SAP HANA (hana-ml)&nbsp;<A href="https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/2024_1_QRC/en-US/hana_ml.html#" target="_self" rel="noopener noreferrer">Documentation</A></LI><LI>R Machine Learning Client for SAP HANA (hana.ml.r) <A href="https://help.sap.com/doc/b64d3cac2f0b42be9ca9fc662715f36b/2024_1_QRC/en-US/index.html" target="_self" rel="noopener noreferrer">Documentation</A></LI><LI>SAP HANA Predictive Analysis Library (PAL)&nbsp;<A href="https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-predictive-analysis-library/sap-hana-cloud-sap-hana-database-predictive-analysis-library-pal-sap-hana-cloud-sap-hana-database-predictive-analysis-library-pal-c9eeed7" target="_self" rel="noopener noreferrer">Documentation</A></LI><LI>Other blog posts on HANA Machine Learning:</LI></OL><UL class="lia-list-style-type-circle"><LI><A href="https://community.sap.com/t5/technology-blogs-by-sap/fairness-in-machine-learning-a-new-feature-in-sap-hana-cloud-pal/ba-p/13580185" target="_self">Fairness in Machine Learning - A New Feature in SAP HANA Cloud PAL</A></LI><LI><A href="https://blogs.sap.com/2020/12/18/identification-of-seasonality-in-time-series-with-python-machine-learning-client-for-sap-hana/" target="_blank" rel="noopener noreferrer">Identification of Seasonality in Time Series </A></LI><LI><A href="https://blogs.sap.com/2020/12/11/outlier-detection-using-statistical-tests-in-python-machine-learning-client-for-sap-hana/" target="_blank" rel="noopener noreferrer">Outlier Detection using Statistical Tests</A></LI><LI><A href="https://blogs.sap.com/2020/12/16/outlier-detection-by-clustering/" target="_blank" rel="noopener noreferrer">Outlier Detection by Clustering</A></LI><LI><A href="https://blogs.sap.com/2020/12/21/anomaly-detection-in-time-series-using-seasonal-decomposition-in-python-machine-learning-client-for-sap-hana/" target="_blank" rel="noopener noreferrer">Anomaly Detection in Time-Series using Seasonal Decomposition</A></LI><LI><A href="https://blogs.sap.com/2020/12/29/outlier-detection-with-one-class-classification-using-python-machine-learning-client-for-sap-hana/" target="_blank" rel="noopener noreferrer">Outlier Detection with One-class Classification</A></LI><LI><A href="https://blogs.sap.com/2020/12/31/learning-from-labeled-anomalies-for-efficient-anomaly-detection-using-python-machine-learning-client-for-sap-hana/" target="_blank" rel="noopener noreferrer">Learning from Labeled Anomalies for Efficient Anomaly Detection</A></LI><LI><A href="https://blogs.sap.com/2020/12/17/import-multiple-excel-files-into-a-single-sap-hana-table/" target="_blank" rel="noopener noreferrer">Import multiple excel files into a single SAP HANA table</A></LI><LI><A href="https://blogs.sap.com/2020/12/16/copd-study-explanation-and-interpretability-with-python-machine-learning-client-for-sap-hana/" target="_blank" rel="noopener noreferrer">COPD study, explanation and interpretability</A></LI><LI><A 
href="https://blogs.sap.com/wp-admin/post.php?post=1262813&amp;action=edit" target="_blank" rel="noopener noreferrer">Model Storage with Python Machine Learning Client for SAP HANA</A></LI><LI><A href="https://blogs.sap.com/2023/03/29/ways-to-accelerate-the-training-process-of-gbdt-models-in-hgbt/?source=email-global-notification-bp-new-in-tag-followed" target="_blank" rel="noopener noreferrer">Ways to Accelerate the Training Process of GBDT Models in HGBT</A></LI><LI><A href="https://community.sap.com/t5/technology-blogs-by-sap/global-explanation-capabilities-in-sap-hana-machine-learning/ba-p/13620594" target="_self">Global Explanation Capabilities in SAP HANA Machine Learning</A>&nbsp;&nbsp;</LI></UL> 2023-03-30T11:29:26+02:00 https://community.sap.com/t5/technology-blogs-by-sap/automatic-outlier-detection-for-time-series-in-sap-hana/ba-p/13559856 Automatic Outlier Detection for Time Series in SAP HANA 2023-06-29T11:06:04+02:00 yangzhi https://community.sap.com/t5/user/viewprofilepage/user-id/861208 In time series, an outlier is a data point that is different from the general behavior of remaining data points. In Predictive Analysis Library (PAL) of SAP HANA, we have automatic outlier detection for time series. You can find more details in <A href="https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-predictive-analysis-library/outlier-detection?version=2023_2_QRC" target="_blank" rel="noopener noreferrer">Outlier Detection for Time Series in PAL</A>.<BR /> <BR /> In PAL, the outlier detection procedure is divided into two steps. In step 1, we get the residual from the original series. In step 2, we detect the outliers from the residual. <STRONG>In step 1, we have an automatic method.</STRONG><BR /> <BR /> In this blog post, you will learn:<BR /> <UL><BR /> <LI>introduction of outlier in time series and automatic detection method in PAL</LI><BR /> <LI>some use cases of automatic outlier detection</LI><BR /> </UL><BR /> To make it easy to test the performance of automatic outlier detection, we use Z1 score method (the default method) in step 2 and set the parameter THRESHOLD = 3 (the default value). This parameter value is based on 3 sigma rule.<BR /> <BR /> To make it easy to read and show the results, we call the PAL procedure in Jupyter Notebook. For calling the PAL procedure and plotting the results, we need some functions. We put the functions in the Appendix.<BR /> <H1 id="toc-hId-833179075">Introduction</H1><BR /> In time series, outliers can have many causes, such as data entry error, experimental error, sampling error and natural outlier. Outliers have a huge impact on the result of data analysis, such as the seasonality test. Outlier detection is an important data preprocessing for time series analysis.<BR /> <BR /> In this algorithm, the outlier detection procedure is divided into two steps. In step 1, we get the residual from the original series. In step 2, we detect the outliers from the residual. We focus on the automatic method of step 1 in this blog.<BR /> <BR /> In step 1, we have an automatic method. For the automatic method, we combine seasonal decomposition, linear regression, median filter and super smoother. 
The processes in the automatic method are shown in the picture below.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/auto_procedure.png" /></P><BR /> <P class="image_caption" style="text-align: center;font-style: italic">Processes of Automatic Method</P><BR /> In the output of this algorithm, we have a result table and a statistic table. In the result table, the residual, outlier score and outlier label are included. In the statistic table, some information about the time series and the outlier detection method is included. For the automatic method, the final smoothing method in step 1 is shown in the statistic table.<BR /> <H1 id="toc-hId-636665570">Test Cases</H1><BR /> To call the PAL procedure with python, we need to import some python packages.<BR /> <PRE class="language-python"><CODE>import numpy as np
from datetime import datetime
from pandas import read_table
import matplotlib.pyplot as plt
import pandas as pd</CODE></PRE><BR /> <H2 id="toc-hId-569234784">case 1: smooth data without seasonality</H2><BR /> <H3 id="smooth-data-without-seasonality" id="toc-hId-501803998">data</H3><BR /> The data is from <A href="https://github.com/ocefpaf/python4oceanographers/blob/master/content/downloads/notebooks/data/spikey_v.dat" target="_blank" rel="nofollow noopener noreferrer">spikey_v.dat</A>.&nbsp; It is smooth, but without seasonality. The plot of the data is shown with the code below.<BR /> <PRE class="language-python"><CODE>str_path = 'your data path'
cols = ['j', 'u', 'v', 'temp', 'sal', 'y', 'mn', 'd', 'h', 'mi']

df = read_table(str_path+'spikey_v.dat', delim_whitespace=True, names=cols)

df.index = [datetime(*x) for x in zip(df['y'], df['mn'], df['d'], df['h'], df['mi'])]
df = df.drop(['y', 'mn', 'd', 'h', 'mi'], axis=1)
data = np.full(len(df), np.nan)
for i in range(len(df)):
    data[i] = df['u'][i]
plt.plot(data)
plt.grid()</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/spikey_v_data-1.png" /></P><BR /> <BR /> <H3 id="automatic-outlier-detection" id="toc-hId-305290493">automatic outlier detection</H3><BR /> We call the procedure _SYS_AFL.PAL_OUTLIER_DETECTION_FOR_TIME_SERIES to detect outliers automatically. 
The python code and results are as follows.<BR /> <PRE class="language-python"><CODE>Outlier_parameters = {<BR /> "AUTO": 1,<BR /> }<BR /> df = pd.concat([pd.DataFrame({"ID":list(range(len(data)))}),pd.DataFrame(data)],axis=1)<BR /> dfOutlierResults, dfStats, dfMetri = OutlierDetectionForTimeSeries(df,Outlier_parameters,cc)<BR /> dfOutlierResults</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/spikey_v_output_table.png" /></P><BR /> <BR /> <PRE class="language-python"><CODE>dfStats</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/spikey_v_statistic_table-1.png" /></P><BR /> The results are plotted as below.<BR /> <PRE class="language-python"><CODE>outlier_result_plot(dfOutlierResults)</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/spikey_v_output_plots-1.png" /></P><BR /> <BR /> <PRE class="language-python"><CODE>outlier_plot(dfOutlierResults)</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/spikey_v_outlier-1.png" /></P><BR /> <P style="overflow: hidden;margin-bottom: 0px">From the statistic table, we can see that the final smoothing method is median filter, as the time series is quite smooth. From the above results, we find that PAL miss detecting an outlier. This is because there are two big outliers here and the standard deviation becomes large. From the plots of residual and outlier score, we can find four outliers very clearly. We can adjust the threshold or choose other methods in step 2 to find all outliers.</P><BR /> <BR /> <H2 id="toc-hId--20305731">case 2: non-smooth data without seasonality</H2><BR /> <H3 id="toc-hId--87736517">data</H3><BR /> The data is from R package "fpp2". It is the gold daily price data from January 1st 1985 to March 31th 1989. The data is neither smooth nor seasonal. There are some missing values. The missing values are imputed by linear interpolation. The plot of the data is shown with the code below.<BR /> <PRE class="language-python"><CODE>str_path = 'your data path'<BR /> df = pd.read_csv(str_path+'daily_csv_no_missing_value.csv')<BR /> num = len(df)<BR /> data = np.full(num,np.nan)<BR /> for i in range(num):<BR /> data[i] = df['Price'][i]<BR /> plt.plot(data)<BR /> plt.grid()</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/gold_price_data.png" /></P><BR /> <BR /> <H3 id="automatic-outlier-detection" id="toc-hId--284250022">automatic outlier detection</H3><BR /> We call the procedure _SYS_AFL.PAL_OUTLIER_DETECTION_FOR_TIME_SERIES to detect outliers automatically. 
The python code and results are as follows.<BR /> <PRE class="language-python"><CODE>Outlier_parameters = {<BR /> "AUTO": 1,<BR /> }<BR /> df = pd.concat([pd.DataFrame({"ID":list(range(len(data)))}),pd.DataFrame(data)],axis=1)<BR /> dfOutlierResults, dfStats, dfMetri = OutlierDetectionForTimeSeries(df,Outlier_parameters,cc)<BR /> dfOutlierResults</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/gold_price_output_table.png" /></P><BR /> <BR /> <PRE class="language-python"><CODE>dfStats</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/gold_price_statistic_table.png" /></P><BR /> The results are plotted as below.<BR /> <PRE class="language-python"><CODE>outlier_result_plot(dfOutlierResults)</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/gold_price_output_plots.png" /></P><BR /> <BR /> <PRE class="language-python"><CODE>outlier_plot(dfOutlierResults)</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/gold_price_outlier.png" /></P><BR /> From the statistic table, we can see that the final smoothing method is super smoother, as the time series is not so smooth. From the above results, we find that PAL detect the outlier at t = 769 successfully and also consider some other points as outliers. From the plots of residual and outlier score, we can find that the outlier at t = 769 is very obvious. We can adjust the threshold or choose other methods in step 2 to only detect the outlier at t = 769.<BR /> <H2 id="toc-hId--609846246">case 3: smooth data with seasonality</H2><BR /> <H3 id="toc-hId--677277032">data</H3><BR /> The data is synthetic. It is seasonal with period = 40. There are four obvious outliers in the time series. The time series is as below.<BR /> <PRE class="language-python"><CODE>import math<BR /> random_seed = 3<BR /> np.random.seed(random_seed)<BR /> cols = 200 # length of time series<BR /> cycle = 40<BR /> outlier_idx = [30, 45, 73, 126, 159, 173]<BR /> timestamp = np.full(cols,np.nan,dtype = int)<BR /> for i in range(cols):<BR /> timestamp[i] = i<BR /> seasonal = np.full(cols,np.nan,dtype = float)<BR /> for i in range(cols):<BR /> seasonal[i] = math.sin(2*math.pi/cycle*timestamp[i])<BR /> const = np.full(cols,2,dtype = float)<BR /> noise = np.full(cols,0,dtype = float)<BR /> for i in range(cols):<BR /> noise[i] = 0.2*(np.random.rand()-0.5)<BR /> trend = 0.01*timestamp<BR /> outlier = np.full(cols,0,dtype = float)<BR /> for i in outlier_idx:<BR /> outlier[i] = 4*(np.random.rand()-0.5)<BR /> data = seasonal + const + noise + trend + outlier<BR /> plt.plot(data)<BR /> plt.grid()</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/seasonal2_data.png" /></P><BR /> <BR /> <H3 id="automatic-outlier-detection" id="toc-hId--949021906">automatic outlier detection</H3><BR /> We call the procedure _SYS_AFL.PAL_OUTLIER_DETECTION_FOR_TIME_SERIES to detect outliers automatically. 
The Python code and results are as follows.<BR /> <PRE class="language-python"><CODE>Outlier_parameters = {<BR />     "AUTO": 1,<BR /> }<BR /> df = pd.concat([pd.DataFrame({"ID":list(range(len(data)))}),pd.DataFrame(data)],axis=1)<BR /> dfOutlierResults, dfStats, dfMetri = OutlierDetectionForTimeSeries(df,Outlier_parameters,cc)<BR /> dfOutlierResults</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/seasonal2_output_table.png" /></P><BR /> <BR /> <PRE class="language-python"><CODE>dfStats</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/seasonal2_statistic_table.png" /></P><BR /> The results are plotted below.<BR /> <PRE class="language-python"><CODE>outlier_result_plot(dfOutlierResults)</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/seasonal2_output_plots.png" /></P><BR /> <BR /> <PRE class="language-python"><CODE>outlier_plot(dfOutlierResults)</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/seasonal2_outlier.png" /></P><BR /> From the statistic table, we can see that the final smoothing method is the median filter combined with seasonal decomposition followed by the super smoother, as the time series is quite smooth and seasonal. From the above results, we can see that the four obvious outliers are detected by PAL.<BR /> <H2 id="toc-hId--852132404">case 4: non-smooth data with seasonality</H2><BR /> <H3 id="toc-hId--1342048916">data</H3><BR /> The data is monthly ice cream interest data with a period of 12. You can find the data in <A href="https://github.com/ritvikmath/Time-Series-Analysis/blob/master/ice_cream_interest.csv" target="_blank" rel="nofollow noopener noreferrer">ice_cream_interest.csv</A>. There are two obvious outliers in the time series. The plot of the time series is shown below.<BR /> <PRE class="language-python"><CODE>str_path = 'your data path'<BR /> df = pd.read_csv(str_path+'ice_cream_interest.csv')<BR /> data = np.full(len(df),np.nan,dtype = float)<BR /> for i in range(len(df)):<BR />     data[i] = df['interest'][i]<BR /> plt.plot(data)<BR /> plt.grid()</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/ice_cream_data.png" /></P><BR /> <BR /> <H3 id="automatic-outlier-detection" id="toc-hId--1538562421">automatic outlier detection</H3><BR /> We call the procedure _SYS_AFL.PAL_OUTLIER_DETECTION_FOR_TIME_SERIES to detect outliers automatically. 
The Python code and results are as follows.<BR /> <PRE class="language-python"><CODE>Outlier_parameters = {<BR />     "AUTO": 1,<BR /> }<BR /> df = pd.concat([pd.DataFrame({"ID":list(range(len(data)))}),pd.DataFrame(data)],axis=1)<BR /> dfOutlierResults, dfStats, dfMetri = OutlierDetectionForTimeSeries(df,Outlier_parameters,cc)<BR /> dfOutlierResults</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/ice_cream_output_table.png" /></P><BR /> <BR /> <PRE class="language-python"><CODE>dfStats</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/ice_cream_statistic_table.png" /></P><BR /> The results are plotted below.<BR /> <PRE class="language-python"><CODE>outlier_result_plot(dfOutlierResults)</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/ice_cream_output_plots.png" /></P><BR /> <BR /> <PRE class="language-python"><CODE>outlier_plot(dfOutlierResults)</CODE></PRE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/06/ice_cream_outlier.png" /></P><BR /> From the statistic table, we can see that the final smoothing method is seasonal decomposition followed by the super smoother, as the time series is seasonal but not so smooth. From the above results, we can see that the two obvious outliers are detected by PAL.<BR /> <H1 id="toc-hId--1148269912">Conclusions</H1><BR /> In this blog post, we describe what an outlier in a time series is and present an automatic outlier detection method for time series in PAL. We also provide several examples that show how to call the automatic outlier detection procedure and what the detection results look like. From the above results, we can see that the automatic method detects outliers in quite different kinds of time series. We hope you enjoyed reading this blog!<BR /> <BR /> The method will also be included in hana-ml. 
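Since the automatic mode is only planned for hana-ml releases after 2023 Q3 (see the links below), the following is a minimal sketch of how the detection could be triggered through the hana-ml class OutlierDetectionTS instead of the hand-written SQL wrapper in the appendix. The table name MY_TS_TABLE, the column names ID and RAW_DATA, and the use of default constructor settings are assumptions for illustration only; please check the hana-ml documentation for the exact signature of your release.<BR /> <PRE class="language-python"><CODE>from hana_ml import dataframe<BR /> from hana_ml.algorithms.pal.tsa.outlier_detection import OutlierDetectionTS<BR /> <BR /> # Connection details are placeholders, as in the appendix below.<BR /> cc = dataframe.ConnectionContext('host', 'port', 'username', 'password')<BR /> <BR /> # Assumed: a HANA table holding the series with an integer key column ID and a value column RAW_DATA.<BR /> df_remote = cc.table('MY_TS_TABLE')<BR /> <BR /> od = OutlierDetectionTS()   # default settings; thresholds and smoothing options can be passed to the constructor<BR /> res = od.fit_predict(data=df_remote, key='ID', endog='RAW_DATA')<BR /> <BR /> # Expected result columns, matching the PAL result table used above:<BR /> # TIMESTAMP, RAW_DATA, RESIDUAL, OUTLIER_SCORE, IS_OUTLIER<BR /> print(res.collect().head())</CODE></PRE>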
If you want to learn more about the automatic outlier detection method for time series in SAP HANA Predictive Analysis Library (PAL) and hana-ml, please refer to the following links:<BR /> <BR /> <A href="https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-predictive-analysis-library/outlier-detection?version=2023_2_QRC" target="_blank" rel="noopener noreferrer">Outlier Detection for Time Series in PAL</A><BR /> <BR /> <A href="https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.07/en-US/pal/algorithms/hana_ml.algorithms.pal.tsa.outlier_detection.OutlierDetectionTS.html#hana_ml.algorithms.pal.tsa.outlier_detection.OutlierDetectionTS" target="_blank" rel="noopener noreferrer">Outlier Detection for Time Series in hana-ml (automatic method will be included after 2023 Q3)</A><BR /> <BR /> &nbsp;<BR /> <H3 id="toc-hId--1931589431">Other Useful Links:</H3><BR /> <A href="https://blogs.sap.com/2020/12/11/outlier-detection-using-statistical-tests-in-python-machine-learning-client-for-sap-hana/" target="_blank" rel="noopener noreferrer">Outlier Detection using Statistical Tests in Python Machine Learning Client for SAP HANA</A><BR /> <BR /> <A href="https://blogs.sap.com/2020/12/16/outlier-detection-by-clustering/" target="_blank" rel="noopener noreferrer">Outlier Detection by Clustering using Python Machine Learning Client for SAP HANA</A><BR /> <BR /> <A href="https://blogs.sap.com/2020/12/21/anomaly-detection-in-time-series-using-seasonal-decomposition-in-python-machine-learning-client-for-sap-hana/" target="_blank" rel="noopener noreferrer">Anomaly Detection in Time-Series using Seasonal Decomposition in Python Machine Learning Client for SAP HANA</A><BR /> <BR /> <A href="https://blogs.sap.com/2020/12/29/outlier-detection-with-one-class-classification-using-python-machine-learning-client-for-sap-hana/" target="_blank" rel="noopener noreferrer">Outlier Detection with One-class Classification using Python Machine Learning Client for SAP HANA</A><BR /> <H1 id="toc-hId--1541296922"></H1><BR /> <H1 id="toc-hId--1737810427">Appendix</H1><BR /> <H2 id="toc-hId-2067240357">SAP HANA Connection</H2><BR /> <PRE class="language-python"><CODE>import hana_ml<BR /> from hana_ml import dataframe<BR /> conn = dataframe.ConnectionContext('host', 'port', 'username', 'password')</CODE></PRE><BR /> <H2 id="toc-hId-1870726852">Functions for Table</H2><BR /> <PRE class="language-python"><CODE>def createEmptyTable(table_name, proto, cc):<BR /> with cc.connection.cursor() as cur:<BR /> try:<BR /> joint = []<BR /> for key in proto:<BR /> joint.append(" ".join(['"{:s}"'.format(key), proto[key]]))<BR /> cur.execute('CREATE COLUMN TABLE %s (%s);' %<BR /> (table_name, ",".join(joint)))<BR /> except:<BR /> print(<BR /> f"\"CREATE TABLE {table_name}\" was unsuccessful. Maybe the table has existed.")<BR /> <BR /> def dropTable(table_name, cc):<BR /> with cc.connection.cursor() as cur:<BR /> try:<BR /> cur.execute(f"DROP TABLE {table_name}")<BR /> except:<BR /> print(f"\"DROP TABLE {table_name}\" was unsuccessful. 
Maybe the table does not exist yet.")<BR /> <BR /> <BR /> def createTableFromDataFrame(df, table_name, cc):<BR /> dropTable(table_name, cc)<BR /> dt_ml = dataframe.create_dataframe_from_pandas(cc, df, table_name=table_name, table_structure={"MODEL_CONTENT":"NCLOB"})<BR /> # dt_ml = dataframe.create_dataframe_from_pandas(cc, df, table_name=table_name, table_structure={"COL1":"CLOB"})<BR /> return dt_ml</CODE></PRE><BR /> <H2 id="toc-hId-1842397038">Function of Calling the PAL Procedure of Outlier Detection for Time Series</H2><BR /> <PRE class="language-python"><CODE>def OutlierDetectionForTimeSeries(df, parameters, cc,<BR /> data_table='ZPD_PAL_DATA_TBL',<BR /> parameter_table='ZPD_PAL_PARAMETERS_TBL',<BR /> result_table='ZPD_PAL_RESULT_TBL',<BR /> stats_table='ZPD_PAL_STATS_TBL',<BR /> metri_table='ZPD_PAL_METRI_TBL'):<BR /> <BR /> # Input table<BR /> createTableFromDataFrame(df, data_table, cc)<BR /> <BR /> # Result table<BR /> dropTable(result_table, cc)<BR /> createEmptyTable(result_table, {<BR /> "TIMESTAMP": "INTEGER","RAW_DATA":"DOUBLE","RESIDUAL":"DOUBLE","OUTLIER_SCORE":"DOUBLE","IS_OUTLIER":"INTEGER"}, cc)<BR /> <BR /> # Metri table<BR /> dropTable(metri_table, cc)<BR /> createEmptyTable(metri_table, {<BR /> "STAT_NAME": "NVARCHAR(1000)","VALUE":"DOUBLE"}, cc)<BR /> <BR /> # Stats table<BR /> dropTable(stats_table, cc)<BR /> createEmptyTable(stats_table, {<BR /> "STAT_NAME": "NVARCHAR(1000)", "STAT_VALUE": "NVARCHAR(1000)"}, cc)<BR /> <BR /> # Parameter table<BR /> dropTable(parameter_table, cc)<BR /> createEmptyTable(parameter_table, {"PARAM_NAME": "nvarchar(256)", "INT_VALUE": "integer",<BR /> "DOUBLE_VALUE": "double", "STRING_VALUE": "nvarchar(1000)"}, cc)<BR /> <BR /> if parameters:<BR /> with cc.connection.cursor() as cur:<BR /> for parName, parValue in parameters.items():<BR /> <BR /> if isinstance(parValue, str):<BR /> parValue = f"'{parValue}'"<BR /> parametersSQL = f"{parValue if isinstance(parValue,int) else 'NULL'}, {parValue if isinstance(parValue,float) else 'NULL'}, { parValue if isinstance(parValue,str) else 'NULL'}"<BR /> cur.execute(<BR /> f"INSERT INTO {parameter_table} VALUES ('{parName}', {parametersSQL});")<BR /> <BR /> elif isinstance(parValue,list):<BR /> for x in parValue:<BR /> if isinstance(x, str):<BR /> x = f"'{x}'"<BR /> parametersSQL = f"{x if isinstance(x,int) else 'NULL'}, {x if isinstance(x,float) else 'NULL'}, { x if isinstance(x,str) else 'NULL'}"<BR /> cur.execute(<BR /> f"INSERT INTO {parameter_table} VALUES ('{parName}', {parametersSQL});")<BR /> else:<BR /> parametersSQL = f"{parValue if isinstance(parValue,int) else 'NULL'}, {parValue if isinstance(parValue,float) else 'NULL'}, { parValue if isinstance(parValue,str) else 'NULL'}"<BR /> cur.execute(<BR /> f"INSERT INTO {parameter_table} VALUES ('{parName}', {parametersSQL});")<BR /> <BR /> else:<BR /> print("No parameters given using default values.")<BR /> <BR /> sql_str = f"\<BR /> do begin \<BR /> lt_data = select * from {data_table}; \<BR /> lt_control = select * from {parameter_table};\<BR /> CALL _SYS_AFL.PAL_OUTLIER_DETECTION_FOR_TIME_SERIES(:lt_data, :lt_control, lt_res, lt_stats, lt_metri); \<BR /> INSERT INTO {result_table} SELECT * FROM :lt_res; \<BR /> INSERT INTO {stats_table} SELECT * FROM :lt_stats;\<BR /> INSERT INTO {metri_table} SELECT * FROM :lt_metri; \<BR /> end;"<BR /> <BR /> with cc.connection.cursor() as cur:<BR /> cur.execute(sql_str)<BR /> <BR /> return cc.table(result_table).collect(), cc.table(stats_table).collect(), cc.table(metri_table).collect()<BR /> 
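<BR /> # --- Usage sketch (added for illustration; not part of the original appendix) ---<BR /> # The cases above call this wrapper with {"AUTO": 1}. To experiment with the manual mode<BR /> # mentioned in step 2, parameters could be passed explicitly, for example:<BR /> #<BR /> #     params = {"AUTO": 0, "OUTLIER_METHOD": 1, "THRESHOLD": 3.0}  # parameter names are assumptions - verify them in the PAL documentation<BR /> #     res, stats, metri = OutlierDetectionForTimeSeries(df, params, cc)<BR /> #<BR /> # Integer, float, string and list values are all handled by the INSERT logic above.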
</CODE></PRE><BR /> <H2 id="toc-hId-1645883533">Functions of Plotting Results</H2><BR /> <PRE class="language-python"><CODE>def outlier_result_plot(dResults):<BR /> dResults.sort_values(by = list(dResults)[0], inplace = True, ascending = True)<BR /> raw_data = np.array(dResults['RAW_DATA'])<BR /> residual = np.array(dResults['RESIDUAL'])<BR /> outlier_score = np.array(dResults['OUTLIER_SCORE'])<BR /> is_outlier = np.array(dResults['IS_OUTLIER'])<BR /> plt.figure(figsize = (24,4.5))<BR /> plt.subplot(1,4,1)<BR /> plt.plot(raw_data)<BR /> plt.grid()<BR /> plt.title('RAW_DATA')<BR /> plt.subplot(1,4,2)<BR /> plt.plot(residual)<BR /> plt.grid()<BR /> plt.title('RESIDUAL')<BR /> plt.subplot(1,4,3)<BR /> plt.plot(outlier_score)<BR /> plt.grid()<BR /> plt.title('OUTLIER_SCORE')<BR /> plt.subplot(1,4,4)<BR /> plt.plot(is_outlier)<BR /> plt.grid()<BR /> plt.title('IS_OUTLIER')</CODE></PRE><BR /> <PRE class="language-python"><CODE>def outlier_plot(dResults):<BR /> dResults.sort_values(by = list(dResults)[0], inplace = True, ascending = True)<BR /> raw_data = np.array(dResults['RAW_DATA'])<BR /> is_outlier = np.array(dResults['IS_OUTLIER'])<BR /> outlier_idx = np.array([],dtype = int)<BR /> for i in range(len(is_outlier)):<BR /> if is_outlier[i] == 1:<BR /> outlier_idx = np.append(outlier_idx,i)<BR /> plt.plot(raw_data)<BR /> plt.scatter(outlier_idx,raw_data[outlier_idx],color = 'red')<BR /> plt.grid()<BR /> plt.title("series and outlier")</CODE></PRE> 2023-06-29T11:06:04+02:00 https://community.sap.com/t5/technology-blogs-by-sap/5-steps-to-a-business-data-fabric/ba-p/13580358 5 Steps to a Business Data Fabric 2023-10-17T19:31:56+02:00 SavannahVoll https://community.sap.com/t5/user/viewprofilepage/user-id/13466 <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/10/291513_GettyImages-1146500457_small.jpg" /></P><BR /> <SPAN data-contrast="auto">Managing and leveraging data can be a daunting task. Businesses grapple with complex datasets from many different and unconnected sources, including operations, finance, marketing, customer success, and more. Plus, a lot of organizations are geographically dispersed and have complicated use cases or specific needs, like storing data across cloud, hybrid, multi-cloud, and on-premises devices.</SPAN><SPAN data-ccp-props="{&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <BR /> <SPAN data-contrast="auto"> A </SPAN><A href="https://blogs.sap.com/2023/09/15/what-is-a-business-data-fabric/" target="_blank" rel="noopener noreferrer"><SPAN data-contrast="none">business data fabric</SPAN></A><SPAN data-contrast="auto"> offers a solution.</SPAN><SPAN data-ccp-props="{&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <BR /> <SPAN data-contrast="auto">This data management architecture provides an integrated, semantically-rich data layer over underlying data landscapes to deliver scalable access to data without duplication. </SPAN><SPAN data-contrast="none">In other data platforms, when data is extracted from core systems, much of its original context is lost. 
A business data fabric preserves this context, helping ensure the data remains meaningful and relevant for decision-making, regardless of its origin.</SPAN> <SPAN data-ccp-props="{&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <BR /> <SPAN data-contrast="auto">This approach </SPAN><SPAN data-contrast="auto">offers a number of </SPAN><A href="https://blogs.sap.com/2023/09/18/3-benefits-of-a-business-data-fabric/" target="_blank" rel="noopener noreferrer"><SPAN data-contrast="none">benefits</SPAN></A><SPAN data-contrast="auto"> including enhanced data accessibility, improved data governance, and accelerated insights.</SPAN><SPAN data-ccp-props="{&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <BR /> <SPAN data-contrast="auto">But how do you get started?</SPAN><SPAN data-ccp-props="{&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <H2 id="toc-hId-964759435"><STRONG>Implementation framework&nbsp;</STRONG></H2><BR /> <SPAN data-contrast="auto">Let’s look at a high-level framework for implementing a business data fabric architecture in your organization.&nbsp;</SPAN><SPAN data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559685&quot;:0,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <H3 id="toc-hId-897328649"><STRONG>1. Data Ingestion&nbsp;</STRONG></H3><BR /> <SPAN data-contrast="auto">The first step is to ensure that all your data, whether it’s structured or unstructured, can be easily ingested into the system. A business data fabric, with its open data ecosystem, allows for simple data ingestion, regardless of the source or the format of the data.</SPAN><SPAN data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559685&quot;:0,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <H3 id="toc-hId-700815144"><STRONG>2. Data Integration&nbsp;</STRONG></H3><BR /> <SPAN data-contrast="auto">Data from various sources must be integrated and transformed into a unified format easily consumed by data users. The interoperability of a business data fabric enables data from different sources to be combined and connected rather than being moved around.</SPAN><SPAN data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559685&quot;:0,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <H3 id="toc-hId-504301639"><STRONG>3. Data Governance&nbsp;</STRONG></H3><BR /> <SPAN data-contrast="auto">With the growing c</SPAN><SPAN data-contrast="auto">omplexity and volume of data, governance becomes an increasingly important topic. This includes ensuring data quality, privacy, and compliance with various regulations. 
A business data fabric ensures effective governance by maintaining metadata, lineage, and control measures.</SPAN><SPAN data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559685&quot;:0,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <H3 id="toc-hId-307788134"><STRONG>4. Data Cataloging&nbsp;</STRONG></H3><BR /> <SPAN data-contrast="auto">This involves creating an inventory of data assets and their metadata. The catalog serves as a single source of truth for users to find, understand, and trust the data they need. It’s a critical component of the business data fabric that allows data consumers to understand the business semantics.</SPAN><SPAN data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559685&quot;:0,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <H3 id="toc-hId-111274629"><STRONG>5. Data Consumption&nbsp;</STRONG></H3><BR /> <SPAN data-contrast="auto">This is about delivering the right data, in the right format, at the right time, to the right people. The business data fabric supports data federation, which enables unified and consistent access to data across diverse sources, reducing redundancy. It ensures data is presented in business-friendly terms and contexts, making it simple for data consumers to interpret and use the data for their specific use cases.</SPAN><SPAN data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559685&quot;:0,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <BR /> <SPAN data-contrast="auto">This is where SAP provides the foundation for a business data fabric: SAP Datasphere.</SPAN><SPAN data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559685&quot;:0,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <H2 id="toc-hId--214321595"><STRONG>Transform your organization with SAP Datasphere&nbsp;&nbsp;</STRONG></H2><BR /> <A href="https://www.sap.com/canada/products/technology-platform/datasphere.html" target="_blank" rel="noopener noreferrer"><SPAN data-contrast="none">SAP Datasphere</SPAN></A><SPAN data-contrast="auto"> is a comprehensive data service that empowers users&nbsp;to provide seamless and scalable access to mission-critical business data.&nbsp;</SPAN><SPAN data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559685&quot;:0,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <BR /> <SPAN data-contrast="auto">It makes it easy for organizations to deliver meaningful data to every data consumer with business context and logic intact. 
</SPAN><SPAN data-contrast="none">As organizations need accurate data that is quickly available and described with business-friendly terms, this approach enables data professionals to permeate the clarity that business semantics provide throughout every use case.</SPAN><SPAN data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559685&quot;:0,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <BR /> <SPAN data-contrast="none">In a major moment for our industry and customers, SAP is partnering with other open data partners —&nbsp;</SPAN><A href="https://www.databricks.com/" target="_blank" rel="nofollow noopener noreferrer"><SPAN data-contrast="none">Databricks</SPAN></A><SPAN data-contrast="none">,&nbsp;</SPAN><A href="https://collibra.com/sap-partnership" target="_blank" rel="nofollow noopener noreferrer"><SPAN data-contrast="none">Collibra</SPAN></A><SPAN data-contrast="none">,&nbsp;</SPAN><A href="https://www.confluent.io/" target="_blank" rel="nofollow noopener noreferrer"><SPAN data-contrast="none">Confluent</SPAN></A><SPAN data-contrast="none">, </SPAN><A href="https://www.datarobot.com/blog/datarobot-and-sap-partner-to-deliver-joint-enterprise-ai-solution/" target="_blank" rel="nofollow noopener noreferrer"><SPAN data-contrast="none">DataRobot,</SPAN></A><SPAN data-contrast="none"> and </SPAN><A href="https://discover.sap.com/google/en-us/index.html" target="_blank" rel="noopener noreferrer"><SPAN data-contrast="none">Google Cloud</SPAN></A><SPAN data-contrast="none"> — to radically simplify customers’ data landscapes. By closely integrating their data and AI platforms with SAP Datasphere, organizations can access their mission-critical business data across any cloud infrastructure.</SPAN><SPAN data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559685&quot;:0,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <BR /> <SPAN data-contrast="none">SAP Datasphere, and its open data ecosystem, is the technology foundation that enables a </SPAN><A href="https://news.sap.com/2023/03/sap-datasphere-power-of-business-data/" target="_blank" rel="noopener noreferrer"><SPAN data-contrast="none">business data fabric</SPAN></A><SPAN data-contrast="none">. 
</SPAN><SPAN data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559685&quot;:0,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <H2 id="toc-hId--410835100"><STRONG>Learn more&nbsp;&nbsp;&nbsp;</STRONG></H2><BR /> <SPAN data-contrast="auto">Read the new </SPAN><A href="https://www.sap.com/documents/2023/10/4675c6a3-927e-0010-bca6-c68f7e60039b.html" target="_blank" rel="noopener noreferrer"><SPAN data-contrast="none">e-book</SPAN></A><SPAN data-contrast="auto"> to learn more about the practical applications of a business data fabric, including: </SPAN><SPAN data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559685&quot;:0,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <UL><BR /> <LI data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="1" data-aria-level="1"><SPAN data-contrast="auto">Why you need a business data fabric</SPAN><SPAN data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN></LI><BR /> <LI data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="2" data-aria-level="1"><SPAN data-contrast="auto">How to implement a business data fabric&nbsp;</SPAN><SPAN data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN></LI><BR /> <LI data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="3" data-aria-level="1"><SPAN data-contrast="auto">Five business data fabric use cases</SPAN><SPAN data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN></LI><BR /> </UL><BR /> <SPAN data-contrast="auto">Get started with the </SPAN><A href="https://www.sap.com/documents/2023/10/4675c6a3-927e-0010-bca6-c68f7e60039b.html" target="_blank" rel="noopener 
noreferrer"><SPAN data-contrast="none">Five Steps to a Business Data Fabric Architecture</SPAN></A><SPAN data-contrast="auto"> e-book today.&nbsp;&nbsp;</SPAN><SPAN data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559685&quot;:0,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}">&nbsp;</SPAN><BR /> <P style="overflow: hidden;margin-bottom: 0px"><A href="https://www.sap.com/documents/2023/10/4675c6a3-927e-0010-bca6-c68f7e60039b.html" target="_blank" rel="noopener noreferrer"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/10/Five-Steps-to-a-BDF_Paid_1200x627_V2.png" /></A></P><BR /> &nbsp;<BR /> <BR /> &nbsp; 2023-10-17T19:31:56+02:00 https://community.sap.com/t5/technology-blogs-by-sap/why-business-data-is-fundamental-to-artificial-intelligence/ba-p/13572569 Why Business Data is Fundamental to Artificial Intelligence 2023-10-30T23:09:56+01:00 i032821 https://community.sap.com/t5/user/viewprofilepage/user-id/148569 The introduction of cloud computing has enabled organisations all over the world to store vast amounts of data in a cost-effective way as they digitally transform their business operations.&nbsp; Data has commonly been referred to as the 'new oil' and is where companies are looking to help increase their productivity going forward.&nbsp; However, in order to harness this technology the data needs to be relevant, reliable and responsible.&nbsp; As the adage goes, garbage in, garbage out.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/10/Image-01.png" /></P><BR /> &nbsp;<BR /> <BR /> AI serves as a powerful tool for extracting actionable insights from the vast amount of reliable data generated and stored within SAP systems.&nbsp; Combining AI with SAP BTP, advanced data analytics and machine learning algorithms becomes possible, allowing organisations to tap into the potential of their SAP data along with the AI technologies available in the market.<BR /> <BR /> Five pillars support SAP BTP: App development, automation, integration, data analytics, and AI. These pillars interplay with embedded intelligent technologies like situation handling, machine learning, and analytics, all fully integrated within the SAP S/4HANA Cloud. Furthermore, side-by-side capabilities through SAP BTP offer additional intelligent industry functionalities like Intelligent Situation Automation, SAP Build Process Automation, and chatbot technology.<BR /> <BR /> By harnessing artificial intelligence, SAP BTP combines business data from S4 with external data, enabling the creation of increasingly precise models in real-time.&nbsp; This ensures a versatile and agile platform that propels innovation while retaining a clean digital core. 
This demonstrates a shift from traditional systems of record to systems of intelligence.<BR /> <BR /> &nbsp;<BR /> <BR /> <STRONG>AI-Powered Capabilities&nbsp;</STRONG><BR /> <BR /> SAP has embedded AI into its products for many years, from journal reconciliations in S4 to AI-powered writing assistants aimed to streamline HR-related tasks in Success Factors.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/10/Image-02.png" /></P><BR /> &nbsp;<BR /> <BR /> These innovative functionalities are not merely theoretical but are practically applicable, ensuring HR admins, managers, and employees can operate more efficiently.&nbsp; In fact, these innovations exist across all the functions in your SAP landscape such as procurement, finance, and human resources.<BR /> <BR /> &nbsp;<BR /> <BR /> <STRONG>How to get started?</STRONG><BR /> <BR /> Automation is pivotal for managing manual and repetitive tasks, especially those involving the consolidation and manipulation of data from diverse sources like MS Excel, vendor portals, and SAP systems. High-volume processes, often exceeding 1000 steps a day—such as data migrations and approvals—and those requiring access to multiple applications, can be streamlined, ensuring seamless operation across your SAP environment.<BR /> <BR /> SAP has provided templates across all the business functions to accelerate these initiatives, as shown below.<BR /> <BR /> &nbsp;<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/10/Image-03.png" /></P><BR /> &nbsp;<BR /> <BR /> <STRONG>A More Advanced AI Use Case</STRONG><BR /> <BR /> The transition from a rules-based approach to an AI-empowered, data-driven model is illustrated through an example case study of an Australian customer.&nbsp; A decade ago, an employee scripted manual “if:then” statements for road upgrades; a process that has now been revolutionised by AI. AI can now analyse these rules and infuse them with real-time data like weather, road usage, and vehicle types. As a part of their operations, this customer assesses road conditions using specialised trucks called profilometers, generating colossal data volumes that outpace their storage capacities. SAP BTP, however, can house this data in expansive lakes, giving AI the agility to model exponentially precise “if:then” statements.<BR /> <BR /> &nbsp;<BR /> <BR /> The shift will allow this customer to manage large datasets from disparate sources seamlessly, scaling memory and compute capabilities to handle big data without losing granularity. 
Moreover, unlike fixed rules, the AI algorithms continually evolve based on data, thereby ensuring maintenance and road upgrade strategies that are timely, relevant, and efficient.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/10/Image-05-2.png" /></P><BR /> In the realm of road maintenance, AI’s practical application is manifest, where even a small percentage improvement can result in significant savings for this customer.&nbsp; This financial efficacy, combined with the potential to extend the useful life of assets, underscores the tangible, impactful benefits of combining AI with SAP BTP.<BR /> <BR /> <STRONG>In Summation</STRONG><BR /> <BR /> Early AI integration can offer businesses a decisive advantage. SAP’s AI vision isn’t just about pioneering technology; it's about tangible, real-world applications. From simple tools deployable within days to intricate endeavours with broad impact.<BR /> <BR /> If you’d like to find out about the value AI can bring to businesses through automation and explore other use cases, then visit the <A href="https://www.sap.com/australia/products/artificial-intelligence.html" target="_blank" rel="noopener noreferrer"><STRONG>SAP Business AI</STRONG></A> website. 2023-10-30T23:09:56+01:00 https://community.sap.com/t5/technology-blogs-by-sap/sap-and-dxc-team-plan-to-deliver-rise-with-sap-s-4hana-cloud-in-customer/ba-p/13578185 SAP and DXC team plan to deliver RISE with SAP S/4HANA Cloud, in customer data centers and co-location facilities, creating a new and powerful platform for digital transformation 2023-11-06T18:39:34+01:00 j_zarb https://community.sap.com/t5/user/viewprofilepage/user-id/631199 SAP and DXC aim to deliver RISE with SAP S/4HANA Cloud, private edition, customer data center option as a turn-key service delivered by DXC.&nbsp; The new service is ideally suited for Private Cloud customers and other managed services customers who wish to run SAP either from their own data center or a DXC managed data center and get the transformational benefits of RISE with SAP.<BR /> <BR /> In this blog, DXC reaffirms its commitment as a Partner Managed Cloud (PMC) service provider of RISE with SAP by expanding its distinct deployment capabilities already announced with DXC Hyperscaler solutions to support the customer data center option.<BR /> <BR /> This partnership empowers DXC and SAP to offer a comprehensive catalog of managed services and extraordinary opportunities that surpass what each entity could achieve independently, benefiting our mutual customers.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/11/DXC.png" /></P><BR /> <STRONG>RISE with SAP S/4HANA Cloud, private edition, customer data center option</STRONG> or “CDC” represents the hybridization of SAP’s strategic RISE cloud solution, where a customer can run S/4HANA as a cloud service from their data center; while accessing SAP BTP and SAP Signavio, and all the other innovative RISE components from the public cloud – aka a mixture of internal and external cloud services.&nbsp; For more information about CDC, please visit this dedicated web page: &nbsp;<A href="https://www.sap.com/products/erp/rise/customer-data-center.html" target="_blank" rel="noopener noreferrer">Customer data center option | RISE with SAP</A><BR /> <BR /> Similar to SAP’s cloud solutions, CDC provides 
diverse deployment options on Lenovo, HPE and Dell infrastructure.&nbsp; DXC enhances this offering further by enabling delivery of SAP and non-SAP Infrastructure as a Service (IaaS), platform as a service (PaaS), and software as a service (SaaS), all backed by a top service level agreement.<BR /> <BR /> <STRONG>With DXC, SAP’s strategic RISE Cloud offering deployable in Customer Data Centers,</STRONG> is specifically designed to:<BR /> <OL><BR /> <LI>Enhance RISE with SAP by harnessing DXC’s expertise in delivering SAP and non-SAP managed services and enabling mutual customers to become more agile and alleviate the challenges often associated with digital transformation projects. This is achieved through best-of-breed solutions and a wealth of experience and skills.</LI><BR /> <LI>Establish a secure infrastructure and platform for SAP within the customer data centers, backed, managed, designed to perform by SAP and DXC.</LI><BR /> <LI>Empower customers who want to run in their data center (on-premise) while staying aligned with SAP’s cloud innovation agenda, including ML/AI (Machine Learning / Artificial Intelligence), LLMS (large language models), etc.</LI><BR /> <LI>Meet the needs of industries with strict regulatory compliance requirements that may prevent running SAP in a public shared Hyperscaler such as utilities, public sector, healthcare, pharmaceuticals, aerospace &amp; defense, etc.</LI><BR /> <LI>Provide extended services to address specific data sovereignty needs of customers, governments, and industry stakeholders by keeping sensitive data within their national boundaries, governed by local laws.</LI><BR /> <LI>Offer an innovative approach for those seeking a cloud OpEx model while benefiting from a high-performance dedicated onPrem system with minimal latency.</LI><BR /> <LI>Benefit from a dedicated on-premise setup, without the data center environment having to be managed by the customer.</LI><BR /> </OL><BR /> To learn more about the DXC &amp; SAP cloud solutions and the DXC Premier Services for RISE with SAP, please visit the following link: <A href="https://dxc.com/us/en/offerings/applications/eas-sap/dxc-premier-services-for-rise-with-sap" target="_blank" rel="nofollow noopener noreferrer">DXC Premier Services for RISE with SAP</A><BR /> <BR /> All thoughts and questions are welcome, please share your comments below to contribute to this discussion.<BR /> <BR /> Joseph Zarb<BR /> Head of RISE with SAP – Customer Data Center<BR /> SAP RISE Global GTM Execution<BR /> 10 Hudson Yards, 51st Floor, New York NY 10001 USA<BR /> <BR /> j.zarb@sap.com 2023-11-06T18:39:34+01:00 https://community.sap.com/t5/technology-blogs-by-members/how-to-connect-sap-ecc-s-4hana-on-prem-and-private-cloud-with-confluent/ba-p/13575214 How to connect SAP ECC + S/4HANA on-prem and private cloud with Confluent Cloud or Confluent Platform (1) 2023-12-01T15:27:08+01:00 FlorianFarr https://community.sap.com/t5/user/viewprofilepage/user-id/176163 This blog post explains how to connect SAP ECC or S/4HANA on-prem and private cloud editions with Confluent Cloud or Confluent Platform.<BR /> Whether you're using SAP NetWeaver Event-enablement Add-on or ASAPIO Integration Add-on, this step-by-step guide provides all you need as a first step to enable SAP systems for communication with a Confluent broker.<BR /> <H2 id="toc-hId-963983780">Architecture / Connection types</H2><BR /> When connecting SAP with Confluent, it is important to understand that there are two very different approaches in terms of 
connection architecture.<BR /> <H3 id="toc-hId-896552994"><SPAN style="font-size: 1rem">Using a REST Proxy</SPAN></H3><BR /> "Confluent REST Proxy for Kafka" (or a similar product) is currently the standard approach and can therefore be considered mandatory for this connectivity.<BR /> Please see <A href="https://github.com/confluentinc/kafka-rest" target="_blank" rel="nofollow noopener noreferrer">https://github.com/confluentinc/kafka-rest</A> for details.<BR /> <BR /> The reason for using REST instead of AMQP is that SAP ECC does not support streaming protocols and third-party libraries cannot be used in SAP-certified add-ons.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/11/aia_confluent_architecture_2024.png" height="334" width="261" /></P><BR /> <P class="image_caption" style="text-align: center;font-style: italic">SAP-to-Confluent Architecture</P><BR /> <BR /> <H3 id="toc-hId-700039489"><SPAN style="font-size: 1rem">Direct connect/V3:</SPAN></H3><BR /> A Connector for Confluent Cloud direct connect (V3 REST API) is available as a pre-release in the download section for registered ASAPIO customers. This connector can only handle outbound connectivity at the time of this blog being published (November 2023). For inbound connectivity, the REST proxy approach above is still required.<BR /> <H3 id="toc-hId-503525984"><SPAN style="font-size: 1rem">AMQP support:</SPAN></H3><BR /> AMQP support is planned to be released by ASAPIO in 2024 for S/4HANA systems.<BR /> <H2 id="toc-hId-177929760">System prerequisites</H2><BR /> Before diving into the integration process, make sure you have the following components available:<BR /> <H3 id="toc-hId-110498974">Software components required on your SAP system</H3><BR /> <UL><BR /> <LI>SAP NetWeaver Event-enablement Add-on (SAP Event Mesh Edition)</LI><BR /> <LI>or, alternatively, ASAPIO Integration Add-on - Framework (full version)</LI><BR /> <LI>ASAPIO Connector for Confluent (using REST proxy)</LI><BR /> </UL><BR /> For the direct connect/V3 REST API, a pre-release Connector for Confluent Cloud is available for registered ASAPIO customers.<BR /> <H3 id="toc-hId--86014531">Non-SAP components</H3><BR /> Please make sure you have the endpoint URI and authorization data at hand for:<BR /> <UL><BR /> <LI>Confluent Components:<BR /> <UL><BR /> <LI>Confluent REST Proxy for Kafka (<A href="https://docs.confluent.io/platform/current/kafka-rest/index.html" target="_blank" rel="nofollow noopener noreferrer">More info</A>)</LI><BR /> <LI>Confluent Cloud</LI><BR /> <LI>Or, alternatively, Confluent Platform</LI><BR /> </UL><BR /> </LI><BR /> </UL><BR /> <STRONG>Licensing</STRONG><BR /> <BR /> All software above requires the purchase of appropriate licenses.<BR /> <H2 id="toc-hId--411610755">Set-up Connectivity</H2><BR /> <H3 id="toc-hId--479041541">1. Create RFC Destinations to Confluent REST Proxy</H3><BR /> Transaction: <CODE>SM59</CODE><BR /> <BR /> Type: "G" (HTTP Connection to External Server)<BR /> <BR /> Target Host: Endpoint of the Confluent REST Proxy<BR /> <BR /> Save and perform a "Connection Test" to ensure HTTP status code 200.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/11/confluent_blog1.png" height="344" width="331" /></P><BR /> <BR /> <H3 id="toc-hId--675555046">2. 
Set-up Authentication to REST Proxy</H3><BR /> Pre-requisites: Obtain user and password for the REST proxy or exchange certificates with the SAP system.<BR /> <BR /> Transaction: <CODE>SM59</CODE><BR /> <BR /> Choose the correct RFC destination, go to "Logon &amp; Security," and select authentication method:<BR /> <UL><BR /> <LI>"Basic Authentication" with username and password</LI><BR /> <LI>SSL certificate-based authentication</LI><BR /> </UL><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/11/confluent_blog2.png" height="243" width="422" /></P><BR /> &nbsp;<BR /> <H3 id="toc-hId--947299920">3. Set-up Basic Settings</H3><BR /> Activate BC-Sets: Use <CODE>SCPR20</CODE> to activate BC-Sets for cloud adapter and codepages.<BR /> <BR /> Configure Cloud Adapter: In <CODE>SPRO</CODE>, go to ASAPIO Cloud Integrator, Maintain Cloud Adapter, and add an entry for the Confluent connector.<BR /> <H3 id="toc-hId--1143813425">4. Set-up Connection Instance</H3><BR /> Transaction: <CODE>SPRO</CODE> or <CODE>/ASADEV/68000202</CODE><BR /> <BR /> Add a new entry specifying connection details, RFC destination, ISO code, and cloud type.<BR /> <H3 id="toc-hId--1340326930">5. Set-up Error Type Mapping</H3><BR /> Create an entry mapping response codes to message types.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/11/confluent_blog5-1.png" height="243" width="492" /></P><BR /> <BR /> <H3 id="toc-hId--1536840435">6. Set-up Connection Values</H3><BR /> Maintain default values for the connection to Confluent in <CODE>Connections -&gt; Default values</CODE>.<BR /> <TABLE style="height: 96px" width="560"><BR /> <TBODY><BR /> <TR><BR /> <TH style="width: 199.512px">Default Attribute</TH><BR /> <TH style="width: 345.688px">Default Attribute Value</TH><BR /> </TR><BR /> <TR><BR /> <TD style="width: 199.512px">KAFKA_ACCEPT</TD><BR /> <TD style="width: 345.688px">application/vnd.kafka.v2+json</TD><BR /> </TR><BR /> <TR><BR /> <TD style="width: 199.512px">KAFKA_CALL_METHOD</TD><BR /> <TD style="width: 345.688px">POST</TD><BR /> </TR><BR /> <TR><BR /> <TD style="width: 199.512px">KAFKA_CONTENT_TYPE</TD><BR /> <TD style="width: 345.688px">application/vnd.kafka.json.v2+json<BR /> (or application/vnd.kafka.jsonschema.v2+json)</TD><BR /> </TR><BR /> </TBODY><BR /> </TABLE><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/11/confluent_blog6.png" height="194" width="490" /></P><BR /> <BR /> <H2 id="toc-hId--1439950933">Set-up Outbound Messaging</H2><BR /> <H3 id="toc-hId--1929867445">1. Create Message Type</H3><BR /> Transaction: <CODE>WE81</CODE><BR /> <BR /> Add a new entry specifying a unique name and description for the integration.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/11/confluent_blog7.png" height="70" width="490" /></P><BR /> <BR /> <H3 id="toc-hId--2126380950">2. 
Activate Message Type</H3><BR /> Transaction: <CODE>BD50</CODE><BR /> <BR /> Activate the created message type.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/11/confluent_blog8.png" height="76" width="283" /></P><BR /> <BR /> <H3 id="toc-hId-1972072841">3. Set-up additional settings in 'Header Attributes'</H3><BR /> Configure the topic, fields for the key, and schema IDs for key/value schemas.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/11/confluent_blog9.png" height="152" width="529" /></P><BR /> <BR /> <H3 id="toc-hId-1775559336">4. Set up 'Business Object Event Linkage'</H3><BR /> Link the configuration of the outbound object to a Business Object event.<BR /> <H2 id="toc-hId-1872448838">Send a "Simple Notifications" event for testing</H2><BR /> <H3 id="toc-hId-1550716017">1. Create Outbound Object Configuration</H3><BR /> Transaction: <CODE>SPRO</CODE> or <CODE>/ASADEV/68000202</CODE><BR /> <BR /> Select the created connection and go to Outbound Objects.<BR /> <BR /> Add a new entry specifying the object, extraction function module, message type, load type, and response function.<BR /> <H3 id="toc-hId-1354202512">2. Test Outbound Event Creation</H3><BR /> In the example above, please pick any test sales order in transaction <CODE>/nVA02</CODE> and force a change event, e.g., by changing the requested delivery date on header level.<BR /> <H3 id="toc-hId-1157689007">3. Check monitor transaction for actual message and payload</H3><BR /> Access to monitor application<BR /> User must have PFCG role <CODE>/ASADEV/ACI_ADMIN_ROLE</CODE> to access Add-On monitor.<BR /> <BR /> Use transaction <CODE>/n/ASADEV/ACI_MONITOR</CODE> to start the monitor.<BR /> You will see the entry screen with a selection form on top.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/11/confluent_blog10.png" /></P><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/11/confluent_blog12.png" /></P><BR /> <STRONG>Congrats, you are now able to send data out to Confluent.</STRONG><BR /> <BR /> In the next blog, we will create a custom payload for the event. 
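<BR /> Before sending real change events, it can be useful to smoke-test the REST proxy endpoint and the content types maintained above from outside SAP. The following is a minimal sketch using Python and the standard Confluent REST Proxy v2 produce API; the proxy URL, topic name, credentials, and the JSON value are placeholders only and do not reflect the add-on's actual payload format.<BR /> <PRE class="language-python"><CODE>import json<BR /> import requests<BR /> <BR /> REST_PROXY = "https://rest-proxy.example.com"   # placeholder: your Confluent REST Proxy endpoint (as in the SM59 destination)<BR /> TOPIC = "sap-sales-orders"                      # placeholder topic name<BR /> <BR /> headers = {<BR />     "Content-Type": "application/vnd.kafka.json.v2+json",  # as maintained in KAFKA_CONTENT_TYPE<BR />     "Accept": "application/vnd.kafka.v2+json",              # as maintained in KAFKA_ACCEPT<BR /> }<BR /> body = {"records": [{"key": "0000012345", "value": {"event": "SalesOrder.Changed"}}]}<BR /> <BR /> resp = requests.post(f"{REST_PROXY}/topics/{TOPIC}",<BR />                      data=json.dumps(body),<BR />                      headers=headers,<BR />                      auth=("proxy_user", "proxy_password"))  # or client certificates, as configured in SM59<BR /> resp.raise_for_status()<BR /> print(resp.json())  # per-record offsets if the produce request succeeded</CODE></PRE>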
2023-12-01T15:27:08+01:00 https://community.sap.com/t5/technology-blogs-by-members/event-driven-architecture-simplifying-payload-creation-with-payload/ba-p/13577526 Event-driven architecture: Simplifying Payload Creation with Payload Designer 2023-12-11T08:42:28+01:00 bins2 https://community.sap.com/t5/user/viewprofilepage/user-id/661488 <H2 id="toc-hId-964046278"><STRONG>Overview</STRONG></H2><BR /> Configuring payloads for the SAP NetWeaver Add-On for Event enablement has become remarkably straightforward, all thanks to the Payload Designer.<BR /> <BR /> This powerful tool enables you to effortlessly add tables, define relationships with inner joins or left outer joins, and rename tables and fields, all through simple configuration.<BR /> <BR /> In this blog post, we will guide you through the process of configuring your payload with just a few easy steps.<BR /> <BR /> If you are new to SAP Enterprise Messaging in SAP ERP systems and the Integration Add-on, you can have a look at the following blog posts:<BR /> <UL><BR /> <LI><A title="Event-driven architecture – now available for SAP ECC users" href="https://blogs.sap.com/?p=1132020" target="_blank" rel="noopener noreferrer">Event-driven architecture – now available for SAP ECC users</A></LI><BR /> <LI><A title="SAP Enterprise Messaging for SAP ERP: HowTo-Guide (Part 1 - Connectivity)" href="https://blogs.sap.com/?p=1185933" target="_blank" rel="noopener noreferrer">SAP Enterprise Messaging for SAP ERP: HowTo-Guide (Part 1 - Connectivity)</A></LI><BR /> <LI><A title="SAP Enterprise Messaging for SAP ERP: HowTo-Guide (Part 2 - First use case)" href="https://blogs.sap.com/?p=1179612" target="_blank" rel="noopener noreferrer">SAP Enterprise Messaging for SAP ERP: HowTo-Guide (Part 2 - First use case)</A></LI><BR /> <LI><A title="Data Events scenario With SAP Event Enablement Add-on for SAP S/4HANA, SAP Event Mesh and SAP Cloud Integration: Step-by-Step Guide" href="https://blogs.sap.com/2022/03/04/data-events-scenario-with-sap-event-enablement-add-on-for-sap-s-4hana-sap-event-mesh-and-sap-cloud-integration-step-by-step-guide/" target="_blank" rel="noopener noreferrer">Data Events scenario With SAP Event Enablement Add-on for SAP S/4HANA, SAP Event Mesh and SAP Cloud Integration: Step-by-Step Guide</A></LI><BR /> <LI><A href="https://blogs.sap.com/2021/08/13/emit-data-events-from-sap-s-4hana-or-sap-ecc-through-sap-netweaver-add-on-for-event-enablement/" target="_blank" rel="noopener noreferrer">Emit Data Events from SAP S/4HANA or SAP ECC through SAP NetWeaver Add-On for Event Enablement</A></LI><BR /> <LI><A href="https://blogs.sap.com/2023/12/01/how-to-connect-sap-ecc-s-4hana-on-prem-and-private-cloud-with-confluent-cloud-or-confluent-platform-1/" target="_blank" rel="noopener noreferrer">How to connect SAP ECC + S/4HANA on-prem and private cloud with Confluent Cloud or Confluent Platform (1)</A></LI><BR /> </UL><BR /> &nbsp;<BR /> <H2 id="toc-hId-767532773"><STRONG>System Prerequisite</STRONG></H2><BR /> One of the following software components needs to be available on your system:<BR /> <UL><BR /> <LI>SAP NetWeaver Event-enablement Add-on (SAP Event Mesh Edition)</LI><BR /> <LI>or, alternatively, ASAPIO Integration Add-on – Framework (full version)</LI><BR /> </UL><BR /> <STRONG>Licensing</STRONG><BR /> <BR /> All software above requires the purchase of appropriate licenses.<BR /> <BR /> &nbsp;<BR /> <BR /> &nbsp;<BR /> <H2 id="toc-hId-571019268"><STRONG>Creating Custom Payloads with Payload Designer</STRONG></H2><BR /> <H3 
id="toc-hId-503588482"><STRONG>Step 1:</STRONG> <STRONG>Creating Custom Payloads with Payload Designer</STRONG></H3><BR /> 1. Navigate to transaction /n/ASADEV/DESIGN.<BR /> <BR /> Click the "Create Payload Designer" button on the main screen and fill in the necessary fields. This action creates the initial version of the payload and takes you to the main screen.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/11/1-41.png" height="365" width="468" /></P><BR /> <P class="image_caption" style="text-align: center;font-style: italic">Creating a Payload Designer version in transaction /n/ASADEV/DESIGN</P><BR /> <BR /> <H3 id="toc-hId-307074977"></H3><BR /> 2. Use the join builder to establish table joins.<BR /> <UL><BR /> <LI>Insert new tables or custom views.</LI><BR /> <LI>Adjust table joins through field connections.</LI><BR /> <LI>Return to the main screen.</LI><BR /> <LI>Note: Parent relationships in the Table section and key fields in the Field section are automatically determined based on hierarchical sorting.</LI><BR /> </UL><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/10/PD__join.png" /></P><BR /> <BR /> <H3 id="toc-hId-110561472"></H3><BR /> 3. Add additional payload fields from the tables:<BR /> <UL><BR /> <LI>Double click on the preferred table.</LI><BR /> <LI>Select one or multiple fields.</LI><BR /> <LI>Fields can be reordered using sequence numbers.</LI><BR /> </UL><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/10/2-85.png" /></P><BR /> <BR /> <H3 id="toc-hId--85952033"></H3><BR /> <H3 id="toc-hId--282465538"><STRONG>Step 2: Outbound Configuration Using Payload Designer</STRONG></H3><BR /> To configure outbound objects using Payload Designer, follow these steps:<BR /> <OL><BR /> <LI>Access transaction SPRO.</LI><BR /> <LI>Navigate to IMG &gt; Cloud Integrator – Connection and Replication Object Customizing or directly to transaction: /ASADEV/68000202.</LI><BR /> <LI>Select the created connection.</LI><BR /> <LI>Go to the "Outbound Objects" section.</LI><BR /> <LI>Add a new entry and specify the following:<BR /> <UL><BR /> <LI>Object: Name of the outbound configuration.</LI><BR /> <LI>Extraction Func. 
Module: /ASADEV/ACI_GEN_PDVIEW_EXTRACT.</LI><BR /> <LI>Load Type: Incremental Load.</LI><BR /> <LI>Trace: Activate for testing purposes.</LI><BR /> <LI>Formatting Func.: /ASADEV/ACI_GEN_VIEW_FORM_CB.</LI><BR /> <LI>Field Payload View Name: Payload Name.</LI><BR /> <LI>Field Payload View Version: Payload Version.</LI><BR /> </UL><BR /> </LI><BR /> </OL><BR /> <IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/11/3-12.png" height="396" width="629" /><BR /> <H3 id="toc-hId--478979043"></H3><BR /> <H3 id="toc-hId--675492548"><STRONG>Step 3: </STRONG>See your Payload in the ACI_Monitor</H3><BR /> Navigate to transaction /n/ASADEV/ACI_MONITOR<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/10/ACI_Monitor.png" /></P><BR /> <BR /> <H3 id="toc-hId--947237422"></H3><BR /> <H2 id="toc-hId--850347920"><STRONG>Conclusion</STRONG></H2><BR /> Payload Designer simplifies SAP interface configuration by providing a user-friendly, code-free approach to defining payloads. With its intuitive interface and powerful features, it enables organizations to streamline their data integration processes and improve efficiency in managing payloads for event messages sent to the SAP Event Mesh.<BR /> <BR /> &nbsp; 2023-12-11T08:42:28+01:00 https://community.sap.com/t5/technology-blogs-by-sap/sap-hana-cloud-data-lake-files-%E3%81%B8%E3%81%AE%E6%9C%80%E5%88%9D%E3%81%AE%E3%82%A2%E3%82%AF%E3%82%BB%E3%82%B9%E8%A8%AD%E5%AE%9A/ba-p/13574382 Setting Up Initial Access to SAP HANA Cloud, data lake Files (original title: SAP HANA Cloud, data lake Files への最初のアクセス設定) 2023-12-19T07:37:51+01:00 Sawa_Ito https://community.sap.com/t5/user/viewprofilepage/user-id/7449 <P>This blog post was originally published on the SAP Japan official blog on November 15, 2022, and has been reposted here following the closure of the SAP Japan official blog.<BR /><BR /></P><HR /><P><BR /><BR />This blog post is an abridged translation of the blog "<A href="https://blogs.sap.com/2021/08/05/setting-up-initial-access-to-hana-cloud-data-lake-files/" target="_blank" rel="noopener noreferrer"><STRONG>Setting Up Initial Access to HANA Cloud data lake Files</STRONG></A>" (August 5, 2021) written by <SPAN class="">jason.hinsperger</SPAN>. For the latest information, please refer to the <A href="https://blogs.sap.com/tags/7efde293-f35d-4737-b40f-756b6a798216/" target="_blank" rel="noopener noreferrer">latest blogs in the SAP Community</A> and the <A href="https://help.sap.com/docs/SAP_HANA_DATA_LAKE?locale=en-US" target="_blank" rel="noopener noreferrer">documentation</A>.<BR /><BR /></P><HR /><P><BR /><BR />&nbsp;<BR /><BR />SAP HANA Cloud, data lake supports storing data of any type and format in its native form.<BR /><BR />Managed file storage provides secure storage for files of any type, without having to set up storage in an external hyperscaler account.<BR /><BR />This is very convenient when you need to load data quickly into SAP HANA Cloud, data lake for fast SQL analytics, or when you want to extract data for some other purpose.<BR /><BR />Setting up initial access to SAP HANA Cloud, data lake Files can be a slightly tricky process, especially if you come from a database background and are not familiar with object storage or REST APIs.<BR /><BR />Below is the process I used to test SAP HANA Cloud, data lake Files.<BR /><BR />Because SAP HANA Cloud, data lake Files manages user security and access via certificate-based authentication, setting up user access requires generating signed certificates.<BR /><BR />If you do not have access to a certificate authority, you can use the following OpenSSL-based process to create a CA and a signed client certificate and then update the SAP HANA Cloud, data lake Files configuration.<BR />I have tested this many times, so you should be able to follow it in the same way.<BR /><BR />First, you need to create and upload a CA bundle.<BR /><BR />&nbsp;<BR /><BR />You can generate a CA with the following OpenSSL command:<BR /><BR />openssl genrsa -out ca.key 2048<BR /><BR />&nbsp;<BR /><BR />Next, create the public certificate of the CA (valid for 200 days in this case). Enter at least the common name and fill in the other fields as needed.<BR /><BR />openssl req -x509 -new -key ca.key -days 200 -out ca.crt<BR
<P>Next, you need to create a certificate signing request for the client certificate. Provide at least a common name and fill in the other fields as needed.<BR /><BR />openssl req -new -nodes -newkey rsa:2048 -out client.csr -keyout client.key<BR /><BR />Finally, create the client certificate (valid for 100 days in this example).<BR /><BR />openssl x509 -days 100 -req -in client.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out client.crt</P><BLOCKQUOTE><STRONG>*<EM>Note</EM></STRONG><EM> – Make sure the fields of the CA certificate and the client certificate are not all exactly the same. Otherwise the client certificate is treated as self-signed and the verification below will fail.</EM></BLOCKQUOTE><P><BR />To verify that the client certificate was signed by your CA (and therefore that the CA certificate, once uploaded to SAP HANA Cloud, data lake, can be used to validate the client certificate):<BR /><BR />openssl verify -CAfile ca.crt client.crt<BR /><BR />Next, open your instance in SAP HANA Cloud Central, choose "Manage File Container", and configure the SAP HANA Cloud, data lake Files users.<BR /><BR /><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2021/08/HDLFiles_EditConfig.jpg" border="0" /><BR /><BR />Edit the configuration and choose "Add" in the "Trusts" section. Copy or upload the ca.crt generated earlier and click "Apply". Do not close the "Manage File Container" screen yet.<BR /><BR /><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2021/08/HDLFiles_AddTrust.jpg" border="0" /><BR /><BR />You can now configure users so that they can access the managed file storage.<BR /><BR />Scroll down to the "Authorizations" section and choose "Add". A new entry row appears.<BR /><BR /><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2021/08/HDLFiles_AddUser-1.jpg" border="0" /><BR /><BR />Select the user's role from the drop-down list (by default there are admin and user roles).<BR /><BR />This is where it gets a little tricky.<BR /><BR />You need to add a pattern string derived from the client certificate so that the storage gateway (the entry point to SAP HANA Cloud, data lake Files) can determine which user to authorize when a request comes in.<BR /><BR />There are two options for generating the pattern string.<BR />You can generate it with the following OpenSSL command (omit the "subject=" prefix shown in the output):<BR /><BR />openssl x509 -in client.crt -nameopt RFC2253 -subject -noout<BR /><BR />Alternatively, you can use the "generate pattern" option on the screen. This opens a dialog box where you upload or paste the client certificate, and the pattern is generated automatically. Note that the certificate itself is not stored, only the pattern string.<BR /><BR /><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2021/08/HDLFiles_GeneratePattern.jpg" border="0" /><BR /><BR />Click "Apply" to add the pattern string to the authorization entry.</P><BLOCKQUOTE>Note that pattern strings may contain wildcards, so a whole class of certificates can be authorized for a particular role. If a certificate pattern matches multiple authorizations, the authorization that is used is controlled by the "Rank" value set on each authorization entry.</BLOCKQUOTE><P><BR />You can now access and use SAP HANA Cloud, data lake Files via the REST API.<BR /><BR />Below is a sample curl command that worked in my tests and that you can use to validate that the connection works (the instance ID and the Files REST API endpoint can be copied from the instance details in SAP HANA Cloud Central).<BR /><BR />Use the client certificate and key generated above, which were used to create the authorization.</P><BLOCKQUOTE><EM>Note that curl can be a little tricky. I was testing on Windows and could not get the curl shipped with Windows 10 to work. I eventually downloaded a newer curl version (7.75.0), which worked, but because I did not know how to access the certificate store from curl on Windows, I had to use the '--insecure' option to skip validation of the SAP HANA Cloud server certificate.</EM></BLOCKQUOTE><P><BR />curl --insecure -H "x-sap-filecontainer:&nbsp;<EM>&lt;instance_id&gt;</EM>" --cert ./client.crt --key ./client.key "https://<EM>&lt;Files REST API endpoint&gt;</EM>/webhdfs/v1/?op=LISTSTATUS" -X GET<BR /><BR /></P>
<P>For an empty SAP HANA Cloud, data lake, the command above returns the following:<BR /><BR />{"FileStatuses":{"FileStatus":[]}}<BR /><BR />That completes the setup for using SAP HANA Cloud, data lake Files to store files of any type in SAP HANA Cloud.<BR /><BR />For the full set of REST APIs and parameters supported for managing files, see the <A href="https://help.sap.com/doc/9d084a41830f46d6904fd4c23cd4bbfa/QRC_2_2021/en-US/html/index.html" target="_blank" rel="noopener noreferrer">documentation</A>.<BR /><BR /></P><HR /><P><BR /><BR />This is the end of the original blog.<BR /><BR /></P><HR /><P>&nbsp;</P><P><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/12/data_pyramid_1-2.jpg" border="0" /></P><P>&nbsp;</P><P><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/12/data_pyramid_2-2.jpg" border="0" /></P><HR /><P>&nbsp;</P> 2023-12-19T07:37:51+01:00 https://community.sap.com/t5/technology-blogs-by-sap/sap-hana-cloud-hana-%E3%83%87%E3%83%BC%E3%82%BF%E3%83%99%E3%83%BC%E3%82%B9%E3%81%8B%E3%82%89-sap-hana-cloud-data-lake/ba-p/13574477 The fastest way to move data from the SAP HANA Cloud, HANA database into the SAP HANA Cloud, data lake relational engine, with load-speed test results 2023-12-19T07:47:55+01:00 Sawa_Ito https://community.sap.com/t5/user/viewprofilepage/user-id/7449 <P>This blog was originally published on the official SAP Japan blog on November 17, 2022, and has been reposted here following the closure of that blog.<BR /><BR /></P><HR /><P><BR /><BR />This post is a translated summary of the blog "<A href="https://blogs.sap.com/2022/03/08/the-fastest-way-to-load-data-from-hana-cloud-hana-into-hana-cloud-hana-data-lake/" target="_blank" rel="noopener noreferrer"><STRONG>The fastest way to load data from HANA Cloud, HANA into HANA Cloud, HANA Data Lake</STRONG></A>" (March 8, 2022) written by douglas.hoover. Please also see the comment thread on the original blog page.<BR /><BR />For the latest information, see the <A href="https://blogs.sap.com/tags/7efde293-f35d-4737-b40f-756b6a798216/" target="_blank" rel="noopener noreferrer">latest blogs in the SAP Community</A> and the <A href="https://help.sap.com/docs/SAP_HANA_DATA_LAKE?locale=en-US" target="_blank" rel="noopener noreferrer">documentation</A>.<BR /><BR /></P><HR /><P><BR /><BR />This blog is part of the SAP HANA data strategy blog series:<BR /><A href="https://blogs.sap.com/2019/10/14/sap-hana-data-strategy/" target="_blank" rel="noopener noreferrer">https://blogs.sap.com/2019/10/14/sap-hana-data-strategy/</A></P><H1 id="toc-hId-48819361">Overview</H1><P><BR />As more customers move larger tables from the SAP HANA Cloud, HANA database into the SAP HANA Cloud, data lake relational engine, we are increasingly asked about the fastest way to move data between the two.<BR /><BR />More precisely, we are asked whether there is anything faster than simply running a HANA INSERT against a SAP HANA Cloud, data lake relational engine virtual table.<BR /><BR />You may wonder why customers move large tables from the SAP HANA Cloud, HANA database into the SAP HANA Cloud, data lake relational engine in the first place.<BR /><BR />The most common use cases are the initial materialization of a large database and archiving older data into the SAP HANA Cloud, data lake relational engine.<BR /><BR />Most of these customers perform this materialization with SAP HANA Smart Data Integration (SDI) and keep those tables up to date through the same interface, using SDI flowgraphs or <STRONG>Change Data Capture</STRONG> with SDI real-time replication.<BR /><BR /><STRONG>For more details on SAP HANA SDI, see the following blogs:</STRONG><BR /><BR />
<STRONG>SAP HANA Data Strategy: Data Ingestion including Real-Time Change Data Capture</STRONG><BR /><BR /><A href="https://blogs.sap.com/2020/06/18/hana-data-strategy-data-ingestion-including-real-time-change-data-capture/" target="test_blank" rel="noopener noreferrer">https://blogs.sap.com/2020/06/18/hana-data-strategy-data-ingestion-including-real-time-change-data-capture/</A><BR /><BR /><STRONG>SAP HANA Data Strategy: Data Ingestion – Virtualization</STRONG><BR /><BR /><A href="https://blogs.sap.com/2020/03/09/hana-data-strategy-data-ingestion-virtualization/" target="test_blank" rel="noopener noreferrer">https://blogs.sap.com/2020/03/09/hana-data-strategy-data-ingestion-virtualization/</A><BR /><BR />The three simple data movement methods tested here are:<BR /><BR /></P><UL><LI>A simple HANA INSERT into a SAP HANA Cloud, data lake relational engine virtual table</LI><LI>A simple data lake INSERT, run from the SAP HANA Cloud, data lake relational engine, reading the HANA table through a proxy (virtual) table</LI><LI>A HANA export followed by a data lake LOAD</LI></UL><P><BR />Some people may ask:<BR /><BR />"Why go through the SAP HANA Cloud, HANA database at all?"<BR /><BR />"Why not load the data directly into the SAP HANA Cloud, data lake relational engine?"<BR /><BR />Again, these customers are using SAP HANA Enterprise Information Management (EIM) tools, which require a HANA object (local or virtual) as the target.<BR /><BR />In future blogs I would like to cover loading data directly into the SAP HANA Cloud, data lake relational engine via SAP IQ client-side loads, Data Services, and Data Intelligence.<BR /><BR />The fastest way to load data from the SAP HANA Cloud, HANA database into the SAP HANA Cloud, data lake relational engine is to run, from the data lake relational engine, an INSERT statement that SELECTs from the HANA table through a proxy table created with "create existing local temporary table" and pointing at the physical table in the SAP HANA Cloud, HANA database (see the table below for details).<BR /><BR />&nbsp;</P><TABLE><TBODY><TR><TD width="156"><STRONG>Method</STRONG></TD><TD width="156"><STRONG>Rows</STRONG></TD><TD width="156"><STRONG>Data size</STRONG></TD><TD width="156"><STRONG>Time (seconds)</STRONG></TD></TR><TR><TD width="156">HANA Cloud, data lake/IQ INSERT..SELECT</TD><TD width="156">28,565,809</TD><TD width="156">3.3 GB</TD><TD width="156"><STRONG>52.86</STRONG></TD></TR><TR><TD width="156">*HANA Cloud, data lake/IQ<BR />LOAD<BR />Azure file system</TD><TD width="156">28,565,809</TD><TD width="156">3.3 GB</TD><TD width="156">116 (1 min 56 s)</TD></TR><TR><TD width="156">*HANA Cloud, data lake/IQ<BR />LOAD<BR />Data lake file system</TD><TD width="156">28,565,809</TD><TD width="156">3.3 GB</TD><TD width="156">510 (8 min 30 s)</TD></TR><TR><TD width="156">HANA INSERT..SELECT</TD><TD width="156">28,565,809</TD><TD width="156">3.3 GB</TD><TD width="156">1277 (21 min 17 s)</TD></TR></TBODY></TABLE><P><BR />* Does not include the time needed to export the data from the HANA database to the file system.<BR /><BR />The tests load the TPC-D ORDERS table (28,565,809 rows, roughly 3.3 GB) into a fairly small SAP HANA Cloud, data lake relational engine configuration.<BR /><BR />&nbsp;</P><H2 id="toc-hId--215124930">The following SAP HANA Cloud configuration was used for the tests</H2><P><BR />SAP HANA Cloud, HANA database: 60 GB / 200 GB, 4 vCPUs<BR /><BR />SAP HANA Cloud, data lake relational engine: 16 TB, 8 vCPU worker / 8 vCPU coordinator<BR /><BR />In the SAP HANA Cloud, data lake relational engine, configure more vCPUs to get more parallelism, especially for larger tables.<BR /><BR />Adding more TB to the SAP HANA Cloud, data lake relational engine also gives you more disk I/O throughput.</P>
id="toc-hId--737234659">テストで使用した詳細設定と構文</H1><P><BR />&nbsp;<BR /><BR />SAP HANA Cockpit をスタートして SAP HANA Cloud を管理します。<BR /><BR />&nbsp;<BR /><BR /><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/03/HANACockpitManager.png" border="0" /><BR /><BR />&nbsp;<BR /><BR />SAP HANA Cloud, data lake リレーショナルエンジンから「Open in SAP HANA Database Explorer」を選択します。<BR /><BR />もしこれが初回であれば、SAP HANA Cloud, data lake リレーショナルエンジンの ADMIN パスワードを求められます。<BR /><BR />&nbsp;<BR /><BR /><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/03/HANACockpitManager2.png" border="0" /><BR /><BR />&nbsp;<BR /><BR />SQL コマンドを入力し、クリックして実行します。<BR /><BR />&nbsp;<BR /><BR /><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2022/03/HANADBExplorer.png" border="0" /><BR /><BR />&nbsp;</P><H2 id="toc-hId--804665445">&nbsp;</H2><P>&nbsp;</P><H2 id="toc-hId--653924593">以下を作成するための SAP HANA Cloud, data lake コマンド</H2><P><BR />&nbsp;<BR /><BR /></P><OL><OL><LI>SAP HANA Cloud, data lake リレーショナルエンジンから、SAP HANA Cloud HANA データベースへ接続しているサーバー</LI></OL></OL><P>&nbsp;</P><OL><OL><LI>データをロードして作成するためのローカルの SAP HANA Cloud, data lake リレーショナルエンジンテーブル</LI></OL></OL><P>&nbsp;</P><OL><OL><LI>SAP HANA Cloud インスタンスのテーブルを指定するローカルのテンポラリープロキシーテーブル</LI></OL></OL><P><BR /><BR />&nbsp;<BR /><BR /><STRONG>CREATE SERVER</STRONG><BR /><BR />–DROP SERVER DRHHC2_HDB<BR /><BR />CREATE SERVER DRHHC2_HDB CLASS ‘HANAODBC’ USING ‘Driver=libodbcHDB.so;ConnectTimeout=60000;ServerNode=xyxy.hana.prod-us10.hanacloud.ondemand.com:443;ENCRYPT=TRUE;ssltruststore=xyxy.hana.prod-us10.hanacloud.ondemand.com;ssltrustcert=Yes;UID=DBADMIN;PWD=xyxyx;’<BR /><BR />&nbsp;<BR /><BR />&nbsp;<BR /><BR /><STRONG>CREATE TARGET TABLE</STRONG><BR /><BR />CREATE&nbsp; TABLE REGIONPULL (<BR /><BR />R_REGIONKEY&nbsp;&nbsp; bigint&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; not null,<BR /><BR />R_NAME&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; varchar(25)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; not null,<BR /><BR />R_COMMENT&nbsp;&nbsp;&nbsp; varchar(152)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; not null,<BR /><BR />primary key (R_REGIONKEY)<BR /><BR />);<BR /><BR />&nbsp;<BR /><BR /><STRONG>CREATE local temporary PROXY</STRONG><BR /><BR />create existing local temporary table REGION_PROXY (<BR /><BR />R_REGIONKEY&nbsp;&nbsp; bigint&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; not null,<BR /><BR />R_NAME&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; varchar(25)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; not null,<BR /><BR />R_COMMENT&nbsp;&nbsp;&nbsp; varchar(152)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; not null,<BR /><BR />primary key (R_REGIONKEY)<BR /><BR />)<BR /><BR />at ‘DRHHC2_HDB..TPCD.REGION’;<BR /><BR />&nbsp;<BR /><BR /><STRONG>INSERT DATA</STRONG><BR /><BR />INSERT into REGIONPULL SELECT * from REGION_PROXY;<BR /><BR />Commit;<BR /><BR />–1.9s</P><H2 id="toc-hId--850438098">&nbsp;</H2><P><BR />&nbsp;</P><H2 id="toc-hId--1046951603">ORDERS テーブルテストコマンド</H2><P><BR />&nbsp;<BR /><BR />–DROP TABLE ORDERSPULL;<BR /><BR />create table ORDERSPULL (<BR /><BR />O_ORDERKEY&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; BIGINT&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; not null,<BR /><BR 
<H2 id="toc-hId--1046951603">ORDERS table test commands</H2><P><BR />-- DROP TABLE ORDERSPULL;<BR /><BR />create table ORDERSPULL (<BR />O_ORDERKEY BIGINT not null,<BR />O_CUSTKEY BIGINT not null,<BR />O_ORDERSTATUS VARCHAR(1) not null,<BR />O_TOTALPRICE DECIMAL(12,2) not null,<BR />O_ORDERDATE DATE not null,<BR />O_ORDERPRIORITY VARCHAR(15) not null,<BR />O_CLERK VARCHAR(15) not null,<BR />O_SHIPPRIORITY INTEGER not null,<BR />O_COMMENT VARCHAR(79) not null,<BR />primary key (O_ORDERKEY)<BR />);<BR /><BR />create existing local temporary table ORDERS_PROXY (<BR />O_ORDERKEY BIGINT not null,<BR />O_CUSTKEY BIGINT not null,<BR />O_ORDERSTATUS VARCHAR(1) not null,<BR />O_TOTALPRICE DECIMAL(12,2) not null,<BR />O_ORDERDATE DATE not null,<BR />O_ORDERPRIORITY VARCHAR(15) not null,<BR />O_CLERK VARCHAR(15) not null,<BR />O_SHIPPRIORITY INTEGER not null,<BR />O_COMMENT VARCHAR(79) not null<BR />)<BR />at 'DRHHC2_HDB..TPCD.ORDERS';<BR /><BR />INSERT into ORDERSPULL SELECT * from ORDERS_PROXY;<BR /><BR />Commit;<BR /><BR />-- 59 s<BR /><BR />-- 52.86 s<BR /><BR />SELECT COUNT(*) FROM ORDERSPULL;<BR /><BR />-- 28,565,809</P>
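<P>To double-check the ORDERS load, one option (again not part of the original test, and only valid while the temporary proxy table still exists in the session) is to compare the row count read through the proxy with the rows landed in the data lake table:</P><PRE>-- Compare source (read through the proxy) and target row counts after the load
SELECT (SELECT COUNT(*) FROM ORDERS_PROXY) AS source_rows,
       (SELECT COUNT(*) FROM ORDERSPULL)   AS target_rows;</PRE>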
<H2 id="toc-hId--1439978613">LINEITEM table test commands</H2><P><BR />create table LINEITEM (<BR />L_ORDERKEY BIGINT not null,<BR />L_PARTKEY BIGINT not null,<BR />L_SUPPKEY BIGINT not null,<BR />L_LINENUMBER INTEGER not null,<BR />L_QUANTITY DECIMAL(12,2) not null,<BR />L_EXTENDEDPRICE DECIMAL(12,2) not null,<BR />L_DISCOUNT DECIMAL(12,2) not null,<BR />L_TAX DECIMAL(12,2) not null,<BR />L_RETURNFLAG VARCHAR(1) not null,<BR />L_LINESTATUS VARCHAR(1) not null,<BR />L_SHIPDATE DATE not null,<BR />L_COMMITDATE DATE not null,<BR />L_RECEIPTDATE DATE not null,<BR />L_SHIPINSTRUCT VARCHAR(25) not null,<BR />L_SHIPMODE VARCHAR(10) not null,<BR />L_COMMENT VARCHAR(44) not null,<BR />primary key (L_ORDERKEY,L_LINENUMBER)<BR />);<BR /><BR />create existing local temporary table LINEITEM_PROXY (<BR />L_ORDERKEY BIGINT not null,<BR />L_PARTKEY BIGINT not null,<BR />L_SUPPKEY BIGINT not null,<BR />L_LINENUMBER INTEGER not null,<BR />L_QUANTITY DECIMAL(12,2) not null,<BR />
L_EXTENDEDPRICE DECIMAL(12,2) not null,<BR />L_DISCOUNT DECIMAL(12,2) not null,<BR />L_TAX DECIMAL(12,2) not null,<BR />L_RETURNFLAG VARCHAR(1) not null,<BR />L_LINESTATUS VARCHAR(1) not null,<BR />L_SHIPDATE DATE not null,<BR />L_COMMITDATE DATE not null,<BR />L_RECEIPTDATE DATE not null,<BR />L_SHIPINSTRUCT VARCHAR(25) not null,<BR />L_SHIPMODE VARCHAR(10) not null,<BR />L_COMMENT VARCHAR(44) not null<BR />)<BR />at 'DRHHC2_HDB..TPCD.LINEITEM';<BR /><BR />INSERT into LINEITEM SELECT * from LINEITEM_PROXY;<BR /><BR />Commit;<BR /><BR />-- Rows affected: 114,129,863<BR /><BR />-- Client elapsed time: 4 m 52 s</P><H1 id="toc-hId--1539602616"><STRONG>Summary</STRONG></H1><P><BR />The fastest way to load data from the SAP HANA Cloud, HANA database into the SAP HANA Cloud, data lake relational engine is to run, from the data lake relational engine, an INSERT statement that SELECTs from the HANA table through a proxy table created with "create existing local temporary table" and pointing at the physical table in the SAP HANA Cloud, HANA database.<BR /><BR />This is very easy to do using the commands shown in this blog.<BR /><BR />Alternatively, it becomes even easier if you create a procedure that generates these commands (see Daniel's blog below).</P><H2 id="toc-hId-1844091344">See also</H2><P><BR />Jason Hinsperger's blog on loading data into SAP HANA Cloud, data lake explains how increasing the number of vCPUs and the database size of the SAP HANA Cloud, data lake relational engine affects load performance.<BR /><BR /><A href="https://blogs.sap.com/?p=1866471" target="_blank" rel="noopener noreferrer">https://blogs.sap.com/?p=1866471</A><BR /><BR />Daniel Utvich's blog on moving data quickly from the SAP HANA Cloud, HANA database to SAP HANA Cloud, data lake shows an example procedure that generates this SQL code based on system table information.<BR /><BR /><A href="https://blogs.sap.com/?p=1867099" target="_blank" rel="noopener noreferrer">https://blogs.sap.com/?p=1867099</A><BR /><BR />&nbsp;</P>
<H3 id="toc-hId-1157661327"><STRONG>SAP HANA Data Strategy blog index</STRONG></H3><P><BR /><A href="https://blogs.sap.com/2019/10/14/sap-hana-data-strategy/" target="_blank" rel="noopener noreferrer">SAP HANA Data Strategy</A><BR /><BR /></P><UL><LI><A href="https://blogs.sap.com/2019/11/14/sap-hana-data-strategy-hana-data-modeling-a-detailed-overview/" target="_blank" rel="noopener noreferrer">SAP HANA Data Strategy: HANA Data Modeling a Detailed Overview</A></LI><LI><A href="https://blogs.sap.com/2020/06/18/hana-data-strategy-data-ingestion-including-real-time-change-data-capture/?update=updated" target="_blank" rel="noopener noreferrer">HANA Data Strategy: Data Ingestion including Real-Time Change Data Capture</A><UL><LI><A href="https://blogs.sap.com/2020/03/16/access-sap-erp-data-from-sap-hana-through-sdi-abap-adapter-2/" target="_blank" rel="noopener noreferrer">Access SAP ERP data from SAP HANA through SDI ABAP Adapter by Maxime Simon</A></LI></UL></LI><LI><A href="https://blogs.sap.com/2020/03/09/hana-data-strategy-data-ingestion-virtualization/" target="_blank" rel="noopener noreferrer">HANA Data Strategy: Data Ingestion – Virtualization</A></LI><LI><A href="https://blogs.sap.com/2020/02/12/hana-data-strategy-hana-data-tiering/" target="_blank" rel="noopener noreferrer">HANA Data Strategy: HANA Data Tiering</A><UL><LI><A href="https://blogs.sap.com/2019/06/19/store-more-with-sps04/" target="_blank" rel="noopener noreferrer">Store More with SPS04 – NSE BLOG</A></LI></UL></LI></UL><P><BR /><BR />&nbsp;<BR /><BR /></P><HR /><P><BR /><BR />This is the end of the original blog.<BR /><BR /></P><HR /><P><BR /><BR />&nbsp;</P><P><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/12/data_pyramid_1-4.jpg" border="0" /><BR /><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/12/data_pyramid_2-4.jpg" border="0" /></P><P><BR />&nbsp;</P> 2023-12-19T07:47:55+01:00 https://community.sap.com/t5/technology-blogs-by-members/using-data-lake-and-sql-to-create-custom-reporting-models/ba-p/13579874 Using Data lake and SQL to create custom reporting models 2023-12-20T10:29:19+01:00 Roney_Mathew https://community.sap.com/t5/user/viewprofilepage/user-id/147426 <P><STRONG>Overview:</STRONG> Through a series of blogs, I would like to share scripts that use data lakes built on replicated SAP tables to create reporting models that represent certain sections of SAP screens/transactions or areas of analysis.
Hopefully, these scripts serve as an accelerator that caters to multiple use cases. For this first script we'll look at building user status reporting using JCDS and JEST.</P><P>&nbsp;</P><P><STRONG><U>Background:</U></STRONG> Most structured reporting tools (e.g. BW) or ETL processes don't bring in all of the fields available in the source systems. They are deployed using a predefined data model (dimensions/measures) that collects fields from different tables and limits what is initially available for reporting, restricting the ability of analysts to explore additional fields.</P><P>&nbsp;</P><P>For example, financial reporting models built on the ACDOCA, BSEG, or FAGLFLEXA tables: irrespective of the approach (CDS views or BW models), these don't bring in all fields from the source, as they mostly focus on meeting the initial requirements of the primary stakeholders.</P><P>&nbsp;</P><P>Additional fields may be available in the SAP transactional systems, and making them available for reporting typically takes multiple cycles of enhancements, reflecting a dependency on different support teams and the time involved in meeting these requirements.</P><P>&nbsp;</P><P><STRONG><U>Solution:</U></STRONG> With a data lake that replicates tables from SAP, analysts working with functional resources can build models that meet their specific needs. If replications are managed through SAP SLT, this enables near real-time reporting (with a possible delay of a few seconds). A review must be done with functional consultants to ensure that the tables being replicated do not contain confidential content.</P><P>&nbsp;</P><P>As part of this blog series, we shall see some models that reflect SAP transactions or commonly used reporting metrics.</P><P>&nbsp;</P><P><STRONG><U>Factors that are not addressed in this blog but must be considered:</U></STRONG></P><OL><LI>Organization of reporting models and data lake tables, if not using a reference similar to the SAP application components.
This becomes important for managing confidentiality and for ensuring that personal information of customers, employees, and vendors is only available to those who need it as part of their business roles.</LI><LI>Security models needed for:<OL><LI>Functional areas of reporting (multiple tables grouped in an area of reporting)</LI><LI>Row-based access</LI><LI>Any additional configuration needed to secure fields in tables</LI></OL></LI></OL><P><BR />Here's the first script:<BR /><BR /></P><OL><LI><STRONG><U>Script for plant maintenance object status</U></STRONG></LI></OL><P><BR /><U>Need:</U> Near real-time availability of object statuses for plant maintenance. For example, when an emergency order is created to address a critical equipment failure, the status and progress of the investigation need to be communicated through the manufacturing channels so that bottlenecks in production can be managed.<BR /><BR /><U>Solution:</U> The layout below provides a simplified overview of how the different tables are joined together, with their respective fields.<BR /><BR /><STRONG>Tables used:</STRONG><BR /><BR />JEST – Individual Object Status<BR /><BR />JCDS – Change Documents for System/User Statuses (Table JEST)<BR /><BR />JSTO – Status Object Information<BR /><BR />TJ02 – System Status<BR /><BR />TJ02T – System Status Texts<BR /><BR />TJ04 – Status Control for Object Type<BR /><BR />TJ30 – User Status<BR /><BR />TJ30T – Texts for User Status</P><P><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/11/Object-status-table-overview-1.jpg" border="0" /></P><P>&nbsp;</P><P class="">Object status tables relationship overview</P><P><BR />The script below provides the active statuses for all plant maintenance objects. To view all instances of status changes, remove the JEST.INACT IS NULL clause/restriction. Each table and the filter condition starts with a comment (beginning with --) to show what it represents.
You may have to tweak the formatting based on the tool being used, especially the comments.<BR /><BR />&nbsp;</P><BLOCKQUOTE><BR /><PRE>SELECT JEST.OBJNR AS OBJECT_NUMBER,
       JSTO.OBTYP AS OBJECT_CATEGORY,
       SUBSTR(JEST.OBJNR, 3) AS OBJECT,
       JEST.STAT AS OBJECT_STATUS,
       (CASE WHEN LEFT(JEST.STAT, 1) = 'I' THEN 'SYSTEM' ELSE 'USER' END) AS STATUS_TYPE,
       (CASE WHEN LEFT(JEST.STAT, 1) = 'I' THEN TJ02T.TXT04 ELSE TJ30T.TXT04 END) AS STATUS_SHORT_TEXT,
       (CASE WHEN LEFT(JEST.STAT, 1) = 'I' THEN TJ02T.TXT30 ELSE TJ30T.TXT30 END) AS STATUS_LONG_TEXT,
       JSTO.STSMA AS STATUS_PROFILE,
       JCDS.USNAM AS STATUS_CHANGED_BY,
       JCDS.UDATE AS STATUS_CHANGED_DATE,
       JCDS.UTIME AS STATUS_CHANGED_TIME,
       JCDS.CHIND AS STATUS_CHANGED_TYPE,
       TJ04.INIST AS SYSTEM_STATUS_INITIAL_STATUS_FLAG,
       TJ04.STATP AS SYSTEM_STATUS_DISPLAY_PRIORITY,
       TJ04.LINEP AS SYSTEM_STATUS_LINE_POSITION,
       TJ02.NODIS AS SYSTEM_STATUS_NO_DISPLAY_INDICATOR,
       TJ02.SETONLY AS SYSTEM_STATUS_SET_ONLY_INDICATOR,
       TJ30.STONR AS USER_STATUS_WITH_NUMBER,
       TJ30.INIST AS USER_STATUS_INITIAL_STATUS_FLAG_INDICATOR,
       TJ30.STATP AS USER_STATUS_DISPLAY_PRIORITY,
       TJ30.LINEP AS USER_STATUS_LINE_POSITION,
       CASE WHEN TJ30.LINEP = '01' THEN TJ30T.TXT04 END AS POSITION1_USER_STATUS
FROM JEST                -- Individual object status
INNER JOIN JCDS          -- Change documents for system/user statuses (table JEST)
    ON JEST.OBJNR = JCDS.OBJNR AND JEST.STAT = JCDS.STAT AND JEST.CHGNR = JCDS.CHGNR
LEFT JOIN JSTO           -- Status profile information for objects
    ON JEST.OBJNR = JSTO.OBJNR
LEFT JOIN TJ02T          -- System status texts
    ON JEST.STAT = TJ02T.ISTAT AND TJ02T.SPRAS = 'E'
LEFT JOIN TJ04           -- System status control config table 2
    ON JEST.STAT = TJ04.ISTAT AND TJ04.OBTYP = JSTO.OBTYP
LEFT JOIN TJ30T          -- User status texts
    ON JSTO.STSMA = TJ30T.STSMA AND JEST.STAT = TJ30T.ESTAT AND TJ30T.SPRAS = 'E'
LEFT JOIN TJ02           -- System status config table 1
    ON JEST.STAT = TJ02.ISTAT
LEFT JOIN TJ30           -- User status config table 1
    ON JSTO.STSMA = TJ30.STSMA AND JEST.STAT = TJ30.ESTAT
WHERE JEST.INACT IS NULL -- remove this to see when a status was set inactive or to get timelines for all statuses</PRE></BLOCKQUOTE><P><BR /><STRONG>Conclusion: </STRONG>Using the above code we can get the active statuses, and their respective change times, for all operational objects that have been configured for status tracking. A similar approach can be used to get statuses for CRM using the tables CRM_JEST and CRM_JCDS. Remove the inactive filter to also get statuses that are currently not active. Depending on how values are mapped in the data lake (i.e. whether blank defaults arrive as NULLs), NULL may need to be replaced with ''; a defensive variant is sketched below.</P>
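<P>As a minimal illustration of that caveat (a sketch only, assuming the replicated JEST.INACT column arrives either as NULL or as a blank string, which depends on the replication tooling), the active-status filter can be written so that both cases are treated the same way:</P><PRE>-- Hypothetical defensive variant of the filter: treat NULL and blank INACT the same,
-- so only active status records are returned regardless of how blanks were replicated.
SELECT JEST.OBJNR, JEST.STAT
FROM JEST
WHERE COALESCE(JEST.INACT, '') = '';   -- active = inactive flag not set</PRE>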
<P><BR /><STRONG>Possible variations based on need</STRONG>:<BR /><BR /></P><OL><LI>To plot the timeline of how an operational object moved between statuses, use JCDS (see the sketch after this list)</LI><LI>Restrict to certain status profile(s) in table JSTO when the requirement is to focus on certain types of objects or groups</LI><LI>Restrict using the change date and time if the need is to focus on recent changes within the last hour or day(s)</LI></OL>
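<P>A minimal sketch of the first variation, using only the JCDS change documents already referenced above (the object number is a hypothetical placeholder; in practice it would come from a join or a report parameter):</P><PRE>-- Timeline of status changes for a single object, ordered by change date/time
SELECT JCDS.OBJNR,
       JCDS.STAT,
       JCDS.USNAM AS CHANGED_BY,
       JCDS.UDATE AS CHANGED_DATE,
       JCDS.UTIME AS CHANGED_TIME,
       JCDS.CHIND AS CHANGE_TYPE,
       JCDS.INACT AS SET_INACTIVE
FROM JCDS
WHERE JCDS.OBJNR = 'OR000004711000'   -- hypothetical order object number
ORDER BY JCDS.UDATE, JCDS.UTIME, JCDS.CHGNR;</PRE>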
<P><BR />The next blog will look at combining the details of orders and their related operational tasks.</P> 2023-12-20T10:29:19+01:00 https://community.sap.com/t5/technology-blogs-by-sap/empowering-businesses-with-new-insights-the-google-cloud-and-sap-analytics/ba-p/13580348 Empowering Businesses with New Insights: The Google Cloud and SAP Analytics Partnership 2023-12-28T12:35:19+01:00 Thisgaard https://community.sap.com/t5/user/viewprofilepage/user-id/4350 A year ago, the tech giants Google Cloud and SAP embarked on a journey to revolutionize data analytics for businesses. Their goal: to bring together SAP systems and data with Google's data cloud, offering customers better insights for decision making and innovation. The new SAP Datasphere Replication Flow connector for Google BigQuery is now<A href="https://blogs.sap.com/2023/11/16/replication-flows-sap-datasphere-and-google-big-query/" target="_blank" rel="noopener noreferrer"> available</A>.<BR /> <BR /> From the outset, customers have been excited about the potential of this partnership: integrating SAP's robust data models and real-time processes with Google BigQuery's comprehensive real-time data streams, including search engine data, weather data, marketing data, and customer event data, to inspire new and better ways to do business.<BR /> <BR /> Prior to this collaboration, businesses found it challenging to merge data models from either side due to cost, effort, and time. This partnership aims to eliminate these hurdles, providing real-time data streams that adjust dynamically to changes from Google Cloud and SAP. Haridas Nair, the Head of Cross Product Management for Database and Analytics at SAP, stated, "Customers using SAP Business Technology Platform can now extend the reach of curated and modeled SAP business data for downstream consumption with SAP Datasphere Replication Flow. The integration with BigQuery now enables customers to combine SAP business data with Google BigQuery data. This enables new use cases that can unleash significant business value."<BR /> <BR /> For example, while enterprises rely on SAP S/4HANA for their financial planning, reporting, and budgeting, many also have finance data coming in from other systems. Joint ventures, new acquisitions, or decentralized business models are common cases where finance data will reside in non-SAP S/4HANA systems. Early adopters among such companies are leveraging Google Cloud and the SAP Datasphere Replication Flow connector to unify accounting data insights in SAP Datasphere, giving them a single financial dashboard across all their financial sources, enabling secure, self-service access to reusable data models, and streamlining financial reporting. The result is enhanced analytics that enable new market correlations to transpire, as well as reporting efficiency and reduced data management costs. As such, finance and operations experts can receive new insights that improve their business planning.<BR /> <BR /> The other common use case relates to Consumer Products and Retail companies. A prime example is a North American consumer products company selling through retailers as well as its own online platform. Like many other companies in this space, they're investing in brand loyalty and scaling their product portfolio to target different customer segments. The company strives to correlate online customer trends with their retail channel sales using demographics and other consumer data.<BR /> <BR /> Their business goals involve improving channel inventory turns, trade promotion management, shelf availability, SKU margins, and overall understanding of customer buying behavior. Simultaneously, they aim to reduce the cost, effort, and time for accessing SAP data and enable richer SAP ERP data in real time.<BR /> <BR /> To achieve these goals, they have connected Google Cloud and SAP data to gain better insight into their retail channels and to improve their demand forecasting and supply chain algorithms. The more real-time the connectivity between their SAP data and Google BigQuery data, the more confident they'll be in the predictive algorithms they adopt for their supply chain.<BR /> <BR /> But this is just the beginning. The Google Cloud and SAP Analytics partnership opens the door to a wide range of strategic customer and supply chain programs that leverage data for advanced predictive demand models. Early examples of customer innovations include true customer 360 insight, improved sales performance, yield-driven pricing, new product introductions, unifying external accounting with SAP, manufacturing automation, and operationalizing sustainability.<BR /> <BR /> Both the Google Cloud and SAP development organizations are excited to see that their work is making a difference. Future releases are planned to include more advanced features for managing enterprise-scale federation, replication, and data catalogs between the respective data platforms. Honza Fedak, Director of BigQuery Engineering at Google Cloud, stated, "The combination of Google Cloud's data and AI expertise and SAP's deep understanding of business data is a powerful force that can help businesses unlock the full potential of their data."<BR /> <BR /> As more enterprises adopt it, we are confident that the Google Cloud and SAP analytics partnership can provide better insights, enable better decisions, and foster innovation. This partnership is a significant step towards creating more Intelligent Enterprises, and we hope your enterprise will be one of them. 2023-12-28T12:35:19+01:00