SAP Community - Machine Learning

ML Scenario Implementation (Logistics Regression Model) using SAP Data Intelligence

2024-03-22T02:29:19.346000+01:00

First of all overview of Logistics Regression method which we are going implement using SAP Data Intelligence. Logistic Regression is statistical method and was used in the biological sciences in early twentieth century. It was then used in many social science applications. Logistic Regression is used when the dependent variable(target) is categorical.

For example:-

To predict whether an email is spam (1) or (0)

Whether the tumor is malignant (1) or not (0)

Consider a scenario where we need to classify whether an email is spam or not. If we use linear regression for this problem, there is a need for setting up a threshold based on which classification can be done. Say if the actual class is malignant, predicted continuous value 0.4 and the threshold value is 0.5, the data point will be classified as not malignant which can lead to serious consequence in real time. The following article shows how to implement Logistics Regression statistical method on the dummy data set using SAP Data Intelligence.

UPLOAD THE DATA IN S3 BUCKET

From SAP DI Launchpad go to the connection management.

Go to the icon for creating the connection.

Select S3 as connection type
Give the details of the connection in the Data Intelligence Connection Management window.

When we click on the check status, it tells about the status of the connection.

Open metadata explorer to upload the data in AWS S3 Bucket.

In metadata explorer, go to the Catalog and then select Browse connection.

Browse Connections window is open as shown in the screenshot.
Select S3 Cube Connection to upload the data in S3 Bucket.

Then we have to open our directory.
We have to upload our DATA files by clicking on icon .

Select the file which we want to upload in S3 bucket and click on upload.

After uploading the files, it shows the Upload Complete status in green colour.

The file is successfully uploaded in S3 Bucket.

Build the Pipeline in the Modeler Tile

Go to Modeler tile and open the modeler window. There would be 5 tabs Graphs, Operators, Repository, Configuration Types, Data Types.

Firstly, we have to create a graph – Click on the ‘+’ icon on graphs tab.
Once the graph is created, search for READ FILE operator in Operators tab.

To access AWS S3 files in Pipeline we can use the Read File operator which allows access to S3 directly.
Drag and drop the READ FILE operator on the canvas.

Search Wiretap in the Operators tab.
Drag and drop the Wiretap in the graph area.

Input port (ref): message.fileReference
A File Reference pointing to the file to be read. If the reference is a directory, nothing is done and no output is produced.
Output port (file): message.file
A File whose contents may be presented as a whole or in batches, according to the operator configuration.
Output port (error): message.error
An Error Message, in case an error was raised during an operation.
Here, we connect the Read File message.file port with the Wiretap.

In the connection configuration select “Connection Management” in configuration type
For connection ID select s3Cube

We have to select the path in the Read File configuration.

We can check the configuration of Wiretap.

Now, we can save all the operation of the previous steps.

Once the graph is running, click on the running instance of the graph and click Open UI to see the output in wiretap.

Here, we can see the output of the data in the Wiretap.

Using PANDAS IN THE PYTHON OPERATOR FOR Data Wrangling

The Modeler window is open.
Go to the Operators tab and search Python3 Operator.
Drag and drop Python3 Operator in the graph area.
After dragging, right click on the Python3 Operator and select Add Port for adding the input and output ports.

Here, we have to add the input and output port for taking input and output

The following is the depiction of the pipeline and ToString convertor has been used to convert data into string

Inside the python operator the data manipulation is performing which requires PANDAS library

Output of the Pipeline execution:

UPLOAD THE SAME DATA INTO LOCAL DI DATALAKE THROUGH DATA MANAGER

We have to create the Data Collection by clicking the Create button.

In this Meta Data explorer we have to upload the data in the DI Datalake.

CREATE THE JUPYTER ENVIRONMENT

Create ML Scenario by clicking + icon

After creating the ML Scenario we have various sections like Datasets, Notebooks, Pipelines, Executions, Models, Deployments.

For creating the Jupyter Notebook, click + icon of Notebooks section

Exploratory Data Analysis using Juypter Notebook on the dataset which we have uploaded above

Open Jupyter Notebook

Go to the Data Browser for selecting our Workspace.

Open our data workspace

After open the data workspace, Copy code snippet to clipboard and paste in the Jupyter Notebook cell.

Create a new Kernel in the new environment for installing the libraries in an isolated manner.
Open the launcher and go to terminal.

Now, the kernel is successfully created.

Select new Kernel for the Jupyter Notebook.

Now, We would do Exploratory Data Analysis in Jupyter Notebook.

Check the distribution of age among passengers.

Here, we are plotting the correlation matrix using heat map to identify the correlation between features.

Now, we are training a LogisticRegression model to classify if the passenger survived or not.
Once the model is trained we check the model accuracy

BUILD THE TRAINING PIPELINE AND SAVE THE MODEL

Creating the training pipeline.

Here, we give the name and select the Python Producer template.

In this Python producer graph (Training Pipeline), first we group Python3 operator. After that we need to tag the docker image file

Script of the Python operator

Save the pipeline and run the pipeline. After running the pipeline, we have to check the status.
The artifact producer saves the model pickle file in Semantic Data Lake, for further use

CREATE THE INFERENCE PIPELINE FOR THE SAVED MODEL
TEST IT WITH THE POSTMAN APP TO SEE IF WE ARE GETTING THE OUTPUT

Reference:

https://community.sap.com/t5/technology-blogs-by-sap/di-basics-building-a-custom-dockerfile/ba-p/13518878

https://help.sap.com/docs/SAP_Best_Practices/8c92d5da091847f8bc1f1b319f3df70a/c7bf2bd9fa6b4ddaad46a45e2c4355f4.html

Deliver Real-World Results with SAP Business AI: Q4 2023 & Q1 2024 Release Highlights

2024-03-27T09:51:25.335000+01:00

Artificial intelligence (AI), and above all, generative AI, marks a paradigm shift on how computer systems learn from data, solve complex problems, and even provide creativity when it’s needed. We believe this paradigm shift will fundamentally change work, business, and society.
The impact that AI is expected to have on the global economy of the next three years is tremendous. Morgan Stanley estimate that the impact will be about $4.1 trillion dollars, about the size of the GDP of Germany.

SAP is committed to helping organizations through this transformative moment by creating business outcomes that were unimaginable before through SAP Business AI. Tailored to your unique data landscape and industry nuances, SAP Business AI enables smarter decisions and efficiencies at scale:

AI delivered in the context of your business processes
AI trained on the industry’s broadest business datasets
AI built on leading ethics and data privacy standards

SAP Business AI is relevant, reliable, and responsible. It’s deeply embedded in our business processes and analytics. Already today, more than 27,000 of our customers are seeing the benefits of AI that is built for business every day across every business function. No other technology company can combine the power of AI with the processes and data that run the business world. SAP is helping change how people interact with business software and make work more efficient and more delightful than ever before.

Let’s delve into the newest SAP Business AI capabilities made available for SAP customers helping them deliver real-world results.

Joule

Conversational Search

Introduced last year, Joule, SAP’s natural language generative AI copilot, revolutionizes the way people interact with SAP business systems, making every touchpoint count and every task simpler.

SAP customers can now get quick and seamless access to the exact SAP Help Portal content they need when asking a question in Joule, without ever having to leave the application they are working on. This generative AI-powered feature will be available in all SAP cloud solutions. We started with SAP SuccessFactors and SAP Start in 2023, continued with SAP S/4HANA Cloud Public Edition early 2024, and this will be followed by additional product integrations planned throughout this year.

Using Joule, SAP SuccessFactors and SAP Start users can save up to 10 minutes for an inquiry without leaving the UI and search in different content repositories, including search engines. This is another step towards helping our customers to become more productive and efficient. It drives product adoption, creates business value, and saves our users time and effort in accessing information quickly.

Joule Delivers Seamless Access to SAP Help Portal Content from SAP Solutions

Learn more on the product page.
Get started with the product documentation.

Cloud ERP

SAP S/4HANA Cloud, public edition

We have built Joule directly into SAP S/4HANA Cloud Public Edition, redefining user interaction, streamlining business processes, and enhancing productivity.
Joule can offer quick, contextual access to content and applications; users can simply ask Joule for guidance and get pointed in the right direction.

Looking for help to execute certain finance operations or looking for guidance how to change sales document entries? Joule provides quick and easy access to SAP Help Portal content and creates a summary leveraging generative AI. No need to browse through long search result lists anymore and reading into several topics to find the correct information.
Not sure which application could help to check the status of sales orders? Just express the need in natural language and Joule will propose the most reasonable applications for the required business operation.
Want to quickly check the status of a specific purchase order? Joule can also assist in providing insightful information about specific business objects. Once Joule shows up the required business information directly in the natural language user panel, users can directly jump into the business application from there.

Joule in SAP S/4HANA Cloud Public Edition

Learn more on the product page.
Register for the SAP Early Adopter Care program to test this feature before anyone else.

Are you tired of manually inputting incomplete sales order data? Filling in these missing details requires significant manual data management effort.
The sales order autocompletion capability in SAP S/4HANA Cloud Public Edition eliminates this hassle by providing smart recommendations for seamless data input. Missing fields are automatically populated,
simplifying incomplete sales order handling, delivering cost savings and boosting sales expert productivity.

Sales Order Autocompletion in SAP S/4HANA Cloud Public Edition

Read the blog post.
Get started with the product documentation.

SAP Enterprise Service Management

Shared service centers drive business value by enabling organizations to provide fast and reliable service cost-efficiently. However, too often organizational silos, distributed data and duplicated processes cause complexity and reduce the productivity and quality of service.

With generative AI capabilities in SAP Enterprise Service Management, shared service teams can:

summarize text in email interactions for cases with Case Summary
draft email replies or interactions for cases with Email Draft Recommender
get all the information related to an account that helps the sales representatives with Account Synopsis

Implementation results in significant reductions in overdue items ultimately enhancing customer satisfaction and streamlining processes. By implementing SAP Enterprise Service Management, businesses can save up to 70% of the cost for across various service scenarios or processing service requests 30% faster, like one of our global customers in the automotive industry has achieved for their accounts payable shared-services teams.

SAP Enterprise Service Management Homepage

Get started with the product documentation.

SAP Transportation Management

Imagine the chaotic scene at a freight clearing dock. Freight logistics providers arrive with stacks of manual papers, causing delays, inefficiencies, and the need for manual rework. Anomalies in freight orders are often detected late, leading to even longer processing times. It's a frustrating and time-consuming process for anyone involved.

Generative AI capabilities have been infused as part of SAP Transportation Management to expedite freight verification and documentation seamlessly. Businesses can automate the processing of tens of thousands of goods receipts and delivery notes in SAP Transportation Management, avoiding data entry errors and processing many types of document layouts without needing to do any training.

The benefits of this solution are immense. First and foremost, there's faster processing time with greater accuracy, minimizing errors and reducing the need for manual checks. This means less waiting time for logistics carriers and the potential for reduced yard space through optimized turnaround times. Additionally, the entire operation experiences enhanced operational efficiency and improved data quality for subsequent processes. Delivery notes are processed 50% faster, saving valuable time and resources. And the cost for truck delivery processing is reduced by an impressive 50%.

This capability is available as pilot phase for SAP customers, with a general availability planned for Q4 2024, including new features.

Expedite Freight Verification in SAP Transportation Management

Learn more about SAP Transportation Management.

SAP Digital Manufacturing

Visual inspection for SAP Digital Manufacturing is now available as a redesigned beta release for SAP Digital Manufacturing customers and will be generally available in the second quarter of 2024. This feature transforms human visual inspection experience to overcome human cognitive limits and empowers production engineers to enable AI models to visual inspections.

AI is leveraged to simplify and optimize visual inspection processes, assisting workers in finding defects earlier, faster, and more accurately. Operator productivity, product quality, and user experience are all improved at the same time. According to the first customer pilots, visual inspection can reduce the time to visually inspect parts by 60% when AI-assisted.

Visual Inspection - Detect Product Defects Faster on the Shop Floor

Learn more about SAP Digital Manufacturing.
Get started with the product documentation.

Human Capital Management

SAP SuccessFactors

Joule is integrated natively in SAP SuccessFactors, enabling employee and managers to quickly complete tasks in natural language such as requesting a wide range of record updates, providing feedback, checking and viewing your payment statement, approving time off requests, and much more.
You can access Joule via SAP Start and all pages in the SAP SuccessFactors HXM Suite.
Let's enable a future-ready workforce with our AI copilot that truly elevates every employee's experience.

Access Joule on the SAP SuccessFactors Homepage

Start the product tour to test Joule capabilities in SAP SuccessFactors yourself.
Get started with the product documentation.

In addition, SAP SuccessFactors Recruiting customers can now create compelling job descriptions and interview questions with the help of generative AI.

Attracting the right talent begins with the right job description. That’s why our generative AI-powered assistant gives hiring managers and recruiters a huge jump start to write non-biased, compelling job descriptions in seconds, while still allowing them to edit and improve it afterwards. Not only does it save a significant amount of time – from 2 hours to 5 minutes – as well as costs in creating job descriptions, but it also improves the quality of interviews and preparation. This, in turn, increases attractiveness as employer of choice, because potential applicants can sense the professionalism and attention to detail that the company brings to the hiring process.

Writing Job Descriptions in SAP SuccessFactors Recruiting

Conducting an effective interview is crucial to ensuring you’re making the best hiring decision. Interviewers can now generate interview questions based on the job description using generative AI capabilities and evaluate applicants in Microsoft Teams after they complete their interviews. AI-powered questions will make it easier for you to create an insightful discussion that dives deep into a candidate’s skills and potential. Your hiring process becomes more streamlined and ultimately leads to better hiring decisions.

Generating Interview Questions in SAP SuccessFactors Recruiting

See how these new capabilities help organizations attract, develop and retain top talent:

Put Your People and Their Experiences First with SAP Business AI

Customer Experience

SAP CX AI Toolkit

Available for existing SAP CX users with SAP Sales Cloud, SAP Service Cloud or SAP Commerce Cloud (standard edition) solution licenses, SAP CX AI Toolkit empowers your sales, service and e-commerce teams with proactive and contextual generative AI:

Apply AI models to data from sales, service, and e-commerce, along with operational data, to help your teams make more intelligent decisions.
Enable your teams to use AI to automate routine tasks from right where they work using a stand-alone app or Joule.
Understand customers and answer their questions faster than ever before with proactive insights and answers derived from enterprise data.

Capabilities available for e-commerce teams:

Product Tagging: Unlock the full potential of your products by harnessing the power of AI to extract and enrich your product tags. Take your catalog to new heights.
Personalized Product Descriptions: Let generative AI transform your customer experience with personalized product descriptions that enrich your catalog data.
Visual Search: Improve product discovery - simplifying the process and delivering quick and accurate search results based on AI image detection.
Commerce AI Tools: Commerce role specific prompts utilizing CX data to generate a blog post, social media posts and more.

Capabilities available for sales teams:

Intelligent Customer Profile: Uncover insights with a 360-degree view of every customer through Customer Data Platform integration. Quickly view segment data and profile summarization contextually on top of email and Sales Cloud.
Sales AI Tools: Create your own AI Tools for unique business needs or utilize our standard sales role specific prompts utilizing CX data to generate discovery questions, emails and more.
Intelligent Q&A: Proactively identify questions within your conversations and deliver trusted answers from company knowledge.
Smart Scheduling: Streamline and optimize your calendar management, saving you time and ensuring efficient allocation of appointments with personalized AI responses.

Capabilities available for service teams:

Intelligent Customer Profile: Uncover insights with a 360-degree view of every customer through Customer Data Platform integration. Quickly view segment data and profile summarization contextually on top of Service Cloud.
Service AI Tools: Create your own AI Tools for unique business needs or utilize our standard service role specific prompts utilizing CX data to generate responses, case overviews and more.
Intelligent Q&A: Proactively identify questions within your conversations and deliver trusted answers from company knowledge.
CX Record Summarization: Align your organization on customer issues, pathway to resolution, sentiment, and responsiveness with service and leadership specific case summaries.

Drive Profitable Growth with SAP CX AI Toolkit

Learn more on the product page.
Get started with the product documentation.
Additional AI-powered functionalities are planned to be released as part of SAP CX AI Toolkit throughout 2024.

SAP Sales & Service Cloud Version 2

Generative AI-powered capabilities have also been introduced as part of SAP Service Cloud Version 2 and SAP Sales Cloud Version 2, enhancing sales & service teams productivity:

Account Synopsis (in SAP Sales Cloud Version 2 & SAP Service Cloud Version 2) summarizes all the information related to an account that helps the sales representatives, such as the business, the culture or the competitive landscape. It ensures that your team is always equipped with the latest information, enabling them to make well-informed decisions and deliver a superior customer experience.
Case Summary (in SAP Service Cloud Version 2) automatically fetches a text-based interaction (e.g. email) and pre-processes it. Display summary to customer service rep to quickly grasp the topic and simplify the agent experience.
Email Draft Recommender (in SAP Sales Cloud Version 2 & SAP Service Cloud Version 2) generates a response email based on the previous email reply and purpose selected by the user for which an email needs to be drafted.
Lead Booster (in SAP Sales Cloud Version 2) provides tailored information that enhances the sales team’s understanding of the account’s needs and increases the likelihood of successful sales conversion.

Learn more on SAP Sales Cloud and SAP Service Cloud product pages.
Get started with SAP Sales Cloud Version 2 and SAP Service Cloud Version 2 product documentation pages.

Watch the video below to see how SAP Business AI provides the backbone for understanding and responding to customer needs - helping you get from issue to resolution faster.

Provide Customer Service That’s Personalized and Fast with SAP Business AI

SAP Emarsys Customer Engagement

Make the creation of subject lines intuitive for your email campaigns based on your natural language prompts! As part of SAP Emarsys Customer Engagement, Emarsys’ AI Subject Line Generator – currently in its pilot phase – uses AI to analyze your campaign content and generate multiple engaging and contextually relevant subject line options.

It saves time and effort by enabling marketers to create subject lines more efficiently, enhances creativity with a range of new subject-line ideas and enable data-driven decisions with the ability to A/B test AI-generated and traditional subject lines.

AI Subject Line Generator in SAP Emarsys Customer Engagement

Explore additional capabilities on Emarsys’ website that empower marketers to build, launch, and scale personalized cross-channel campaigns that drive business outcomes.

Intelligent Spend and Business Network

SAP Ariba Category Management

Gone are the days of painstakingly updating category management tools manually. Category managers can now expedite the development of their category strategy with contextual insights and recommendations using generative AI and large language model (LLM) integration in SAP Ariba Category Management tools, such as Category Segmentation, Market Dynamics, and Cost Structure.

It normally takes 12 weeks to manually create a strategy for a new purchasing category, but with SAP Ariba Category Management, research on category specific information is available with a single click. This new generative AI-powered feature accelerates strategy and planning processes across multiple categories, reduces onboarding time and reliance on more senior category managers and helps adapt to market change with insight and agility.

Powering Strategy Planning with Generative AI in SAP Ariba Category Management

Learn more on the product page.
Get started with the product documentation.

SAP Business Network

For suppliers, invoice creation can be one of your most inefficient and redundant processes. Every day, employees are manually keying data to submit invoices to customers.
The introduction of the intelligent invoice conversion feature of SAP Business Network reduces the need for this error-prone manual approach to invoice data entry.

The feature automatically converts your invoices from PDF and other scanned image files into an electronic format in SAP Business Network. It extracts your invoice data and automatically maps it to relevant fields in the electronic invoice and simplifies onboarding by providing a self-service setup process that guides supplier administrators through a streamlined workflow.

With intelligent invoice conversion, improve operational efficiency, reduce errors and rejected invoices, and receive payments faster.

Intelligent Invoice Conversion in SAP Business Network

Learn more on the product page.
Get started with the product documentation.

SAP Concur

AI is only as good as the quality and breadth of the data it draws upon, and SAP Concur is the global market leader in travel and expense solutions, with more than 92 million end users booking travel and/or processing expenses. For a decade, we have been infusing AI into SAP Concur solutions including ExpenseIt, Verify, Intelligent Audit, and Concur Invoice.

This quarter, we are excited to debut new Business AI capabilities in SAP Concur solutions:

Concur Request: Now uses generative AI to provide intelligent cost estimates for trip planning, saving employees time and effort. When the user creates the request, they enter their trip details such as the length of the trip and the destination. They will have the option to select from the following services that they want cost estimates for: flight or train; hotel; taxi. Request Assistant uses generative AI and the trip details provided by the user to provide estimated costs with explanations for the pricing.

The user can adjust, based on their needs for the trip, such as flight class, number of connections, hotel ratings, and then regenerate the estimated cost. The estimated costs are automatically added to the request as expected expenses. Subsequently, the user can choose to add additional expected expenses if needed or submit the request for approval.

Intelligent Cost Estimates for Trip Planning in Concur Request

ExpenseIt: We’ve also brought capabilities from ExpenseIt on mobile to the web interface so customers can upload receipt images to ExpenseIt web, where it taps AI to automatically create a new expense with several key fields prepopulated.

Expense itemizations for hotel folios are cumbersome for end users to manually enter, and inaccurate itemizations can cause returned expense reports and rework. In fact, the median time to complete a report with manual hotel itemizations entries is about 17 minutes longer than an expense report with no manual hotel itemizations.

Using ExpenseIt, hotel itemizations can be processed for the end user in under 30 seconds, saving employee time while increasing accuracy. The recent generative AI update to the traditional AI ExpenseIt models increases the accuracy of hotel itemizations by 30% (as measured by 100% accurate itemizations) in pre-production testing.

Hotel Itemization in Concur ExpenseIt

Learn more about travel and expense with SAP Business AI.
Get started with the Concur Expense and Concur Request product documentation pages.

SAP Business Technology Platform (SAP BTP)

Document Information Extraction, premium edition

The existing Document Information Extraction service on SAP BTP has been supercharged with generative AI capabilities, introducing Document Information Extraction, premium edition.
Businesses can now process documents intelligently across their entire business, with:

The automatic extraction of unstructured data by simply describing the required fields that you need, and watching the solution do the heavy lifting for nearly every document type.
The multilingual support for over 40 languages, catering to global businesses with diverse document types.

With neither manual annotation nor resource-intensive machine learning training required, business document processing use cases can now be onboarded in days instead of weeks or months, reducing the time-to-value drastically.

Learn more on the SAP Discovery Center.
Get started with this tutorial.

Generative AI Hub in SAP AI Core and SAP AI Launchpad

SAP developers can expedite their generative AI development of SAP BTP applications in a secure and trusted way with the new Generative AI hub. This new capability gives streamlined access to a broad range of large language models (LLMs) from different providers (such as GPT-4 by Azure OpenAI or OpenSource Falcon-40b), facilitated in a more secure, harmonized, and orchestrated manner; thereby boosting efficiency and productivity.

Developers can orchestrate multiple models, whether programmatically via SAP AI Core or via the playground within SAP AI Launchpad, and ensure high-quality output by choosing the large language model that best aligns with their unique use case. They can leverage all features while relying on a data privacy policy and robust measures, ensuring secure and trusted operations.

Each generative AI model yields unique qualities and strengths. To meet customers’ needs for their business, SAP has made a strategic move to provide enterprise-grade access to the most common models through its generative AI hub as part of SAP AI Core. This includes leading cloud vendors as well as SAP-managed third-party models deployed on SAP’s own infrastructure for highest compliance standards.
All available and upcoming generative AI-powered capabilities from SAP are running on the generative AI hub.

Generative AI Hub via SAP AI Launchpad

Learn more on the SAP Discovery Center.
Get started with this tutorial.
See why the generative AI hub plays a core role in how SAP's generative AI architecture redefines business applications.

Just ask feature in SAP Analytics Cloud

Just ask is a natural language query interface available in SAP Analytics Cloud used to retrieve data that applies generative AI to search-driven analytics. Users can quickly access trusted insights in their preferred language regardless of expertise, helping to increase data literacy and analytics adoption for more fact-based decision-making.

By providing knowledge workers with self-service access to timely analytics, companies can improve their productivity by up to 5%. Instead of spending hours searching for data, they can focus on more strategic tasks that move their business forward.

Jusk ask users can search data easily and efficiently using business terms they are familiar with, by simply asking questions using natural language. Just ask will instantly provide answers as simple charts and tables. Working with data from acquired models or SAP Datasphere, users can query just ask to get quick answers to questions and incorporate the results into a story, export to CSV or Excel files, or for further exploration using the Data Analyzer tool.

Just ask in SAP Analytics Cloud

Learn more on the product page.
Get started with the product documentation.

SAP Build Code

SAP Build Code, our generative AI-based code development solution, is now generally available.
With its integration of Joule, optimized for Java and JavaScript application development, it provides a turn-key environment for coding, testing, integrations, and application lifecycle management.

Developers can use generative AI to generate code and app logic aligned with SAP-centric programming models from natural language descriptions, create data models and sample data conforming to apps with the help of Joule, and produce rapid unit tests with AI for existing code to increase quality and precision.

The solution is also tailored for SAP development and secure collaboration across SAP Build low-code solutions and ABAP Cloud. It provides rapid extensibility of SAP S/4HANA and other SAP as well as non-SAP systems. It prebuilds integrations, APIs, and business services via the service center. Furthermore, it ensures proven security from SAP BTP for authentication, authorization, and SAP data protection.

With up to 30% faster application development and up to 30% improvement in application development cost, this is a game-changer for your enterprise. Say goodbye to the days of spending endless hours on code development and maintenance. Say hello to a more efficient and productive future with SAP Build Code.

Pro-code App Development with SAP Build Code

Take the product tour to experience the next generation of application development.
Get started with the product documentation.

SAP HANA Cloud vector engine

SAP HANA Cloud vector engine can natively store and search vector embeddings, which are numerical representations of objects, along with business data as part of its industry-leading multi-model processing capabilities to power intelligent data applications.

With these vector capabilities, SAP HANA Cloud will enable Retrieval Augmented Generation (RAG), facilitating the combination of LLMs with private business data. These applications learn and adapt to new information, enabling automated decision-making.

Key benefits of the SAP HANA Cloud vector engine include:

Multi-model: Users can unify all types of data into a single database to build innovative applications using an efficient data architecture and in-memory performance. By adding vector storage and processing to the same database already storing relational, graph, spatial, and even JSON data, application developers can create next-generation solutions that interact more naturally with the user.
Enhanced search and analysis: Businesses can now apply semantic and similarity search to business processes using documents like contracts, design specifications, and even service call notes.
Personalized recommendations: Users can benefit from an improved overall experience with more accurate and personalized suggestions.
Optimized large language models: The output of LLMs is augmented with more effective and contextual data.

Retrieval Augmented Generation (RAG) and GenAI with SAP HANA Cloud Vector Engine

Succeed Now with SAP Business AI

AI plays a pivotal role in SAP’s commitment to enable every organization to become a network of intelligent and sustainable enterprises.

With a focus on ethical and responsible AI, we are not only driving efficiency and productivity, but also ensuring that AI is used for the betterment of society. Our uniquely holistic perspective on the demands of global businesses continues to inspire us as we define the future of business AI.

Accelerate your SAP Business AI journey today:

Visit SAP.com/AI and our AI SAP Community page for all SAP Business AI updates and announcements.
Explore the SAP Roadmap Explorer for a detailed view of upcoming product innovations.

Unleashing AI and Machine Learning in Sales: Advanced Price-Volume Forecasting with SAP Analytics Cl

2024-04-02T16:11:26.655000+02:00

Introduction:

In the ever-changing world of sales, organizations face the challenge of not only accurately planning their sales volume and revenue forecasts, but also considering the impact of price changes on those forecasts. In this blog post, I will talk about how companies can optimize their sales volume price planning using SAP Analytics Cloud and the "Content for Corp FP&A" Rapid Sales Planning package included in their SAP Analytics Cloud license.

In addition to the planning options in the rapid package, we have also built in reports and analyses to better understand the price-volume mix and incorporate the findings into our planning. If you are interested, I will be happy to discuss these options in a future blog.

Today we will focus in particular on the use of predictive scenarios made possible by machine learning and the integration of prices as an important influencing factor in the forecasts.

Using this example, I will show how existing content in the SAP Analytics Cloud can be easily adapted to benefit from machine learning forecasts (predictive scenario).

Sales volume price planning with SAP Analytics Cloud:

SAP Analytics Cloud offers companies a powerful platform for analyzing and planning their sales activities. With the free "Content for Corp FP&A" Rapid Sales Planning package, companies receive ready-made dashboards and models that help them to create their sales forecasts quickly and accurately, taking price changes into account. In the standard package, the planner can plan very flexibly for volume, price, discounts and other variables. I would like to expand these options in the package with the functions of the SAP Analytics Cloud to include machine-generated forecasts. This reduces the manual effort involved in planning, as a large part of the forecasts can be generated from the automatic forecast.

Predictive scenarios and machine learning:

A crucial aspect of sales volume price planning is the use of predictive scenarios enabled by machine learning. These predictive scenarios analyze historical sales data and other relevant factors to automatically generate future sales forecasts taking price changes into account. The price changes are the future variables that the planner enters as influencing factors on the machine forecast in the forecast period.

Integration of prices as an influencing factor:

An important component of sales volume price planning is the consideration of prices as an influencing factor on sales forecasts. By integrating price data into the analysis, companies can better understand how price changes will affect their future sales figures. This information enables companies to make informed decisions about their pricing strategy and adjust their sales forecasts accordingly. Here too, manual effort can be significantly reduced as the system automatically includes prices as an influencing factor in the forecasts.

A look into the future:

To demonstrate the effect of price changes on future sales figures, in our example company we plan to set an extreme price for bicycles ("Mountain Bike") in the month of March (actual data up to Sep/2023 is available in the example). By including this extreme price in our forecasts, we can see how this change will affect sales figures and what impact it will have on the overall sales target.

Structure of the sceanrio in SAP Analytics Cloud:

We need the following versions (scenarios):

Actual (actual data until Septmber 2023)
Plan (plan scenario, in which the future prices are also planned)
Predictive_ML (scenario in which actual data and future prices are transferred).

The "Predictive Forecast Scenario" is executed on a private copy of the Predictive_ML version. After the forecast run, the forecast values are transferred to the private version and can then be copied back to the Predictive_ML version. The execution of these actions (data actions) is triggered by the embedded button in my SAP Analytics Cloud story.

Conclusion:

Sales volume price planning is a crucial aspect of many companies' sales success. By using SAP Analytics Cloud and Predictive Scenarios, companies can improve their sales forecasts and make informed decisions about their sales strategy. The integration of prices as an influencing factor enables companies to better understand the impact of price changes on their sales figures and adapt their strategy accordingly. With a clear view into the future and a reduction in manual effort, companies can achieve their sales targets more effectively and ensure long-term success in sales. SAP Analytics Cloud and content packages such as Rapid Sales Planning and Analysis help with this.

I hope you found this blog post insightful and informative. If you have any questions, feedback, or ideas to share, please don't hesitate to reach out. Your input is valuable in shaping future discussions and content. Thank you for reading, and may your sales planning endeavors with SAP Anayltics Cloud be prosperous. Live long and prosper!

Adversarial Machine Learning: is your AI-based component robust?

2024-04-02T22:29:27.157000+02:00

The epoch of AI is already in motion and it will be touching our everyday lives more and more. This very same blog may have been written by using some AI-based module. The source code underlying this blog platforms may have been implemented together with some AI companion.

All this triggers a lot of excitement, but also some concerns, for sure in people having some security background (you know, the usual mood breakers).

How do we ensure that these AI-components will not jeopardize the security and safety of the system in which they will be integrated? There are already many examples of real AI-components that have been easily fooled with simple attacks (face recognition, autonomous driving, malware classification, etc). How can we make these AI-components more secure, robust, and resilient against these attacks?

Adversarial machine learning (AML, in short) is the process of extracting information about the behavior and characteristics of an ML system and/or learning how to manipulate the inputs into an ML system in order to obtain a preferred outcome. As explained very well in the document released by NIST in Jan 2024 "Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations" [1], studying AML enables understanding how the AI system can be compromised and therefore how it can also be better protected against adversarial manipulations. AML, as discussed in [1], has a very broad scope, touching different security and privacy aspects of AI components. In particular, four main types of attacks are considered: (1) evasion, (2) data and model poisoning, (3) data and model privacy, and (4) abuse.

In collaboration with the University of Cagliari, Pluribus one, and Eurecom, we started a PhD subject in October 2022 focusing on "Security Testing for AI components", mainly targeting evasion attacks. The rest of this blog will introduce this line of study and mention what we have been done so far.

Security Testing for AI components

In our work we study evasion attacks in multiple industrial domains to understand how and at which extent these attacks can be mitigated. Concretely, we aim to contribute to the community effort toward an open-source testing platform for AI components. We are currently channeling our work into SecML-Torch, an open-source Python library designed to evaluate the robustness of AI components. The ultimate goal for us is to build a feedback loop based on adversarial retraining to increase the robustness of the AI component under-test.

In an evasion attack, the adversary’s goal is to generate adversarial examples to alter AI model behavior, i.e., fool its classification result.

Depending on the situation, the adversary may have perfect (white-box), partial (gray-box), or very limited knowledge (black-box) of the AI system under-testing. In the first case, attackers can stage powerful gradient-based attacks, and they can easily guide the optimization of adversarial examples with the full observability of the victim AI model itself. A classical example of partial knowledge is when the attacker knows the learning algorithm and the feature representation, but not the model weights and training data. In black-box scenarios, the attacker does not know the AI system under-testing and can only access it via queries.

In our study, we focused so far on gray-box and black-box scenarios. As industrial domains we have been targeting Web Application Firewall (WAF) and Anti-Phishing classifiers. Both these domains are consuming more and more AI, with their classifiers getting advantage of complex AI models trained on many historical data (e.g., CloudFlare ML WAF, Anti-phishing Vade Secure). Creating adversarial examples for these industrial domains is more challenging than doing it for simpler domains such as image recognition. Indeed the input space the attacker can play with is more elaborated than adding noise to the different pixels of an image.

Our work on adversarial machine learning for WAF is already well explained in this blog [2] written by Davide Ariu CEO at Pluribus One. For the reader interested to the full technical details, I can suggest our paper pre-print "Adversarial ModSecurity: Countering Adversarial SQL Injections with Robust Machine Learning" here [3]. In summary we showed how

full rule-based WAF (e.g., ModSecurity and its Core Rule Set) are of little help in blocking SQL injections in complex WAF scenarios
ML and adversarial retraining can improve and significantly raise the security bar in those scenario

For our study on Anti-phishing, we will write a specific blog as continuation of this one. Full details are available in our paper "Raze to the Ground: Query-Efficient Adversarial HTML Attacks on Machine-Learning Phishing Webpage Detectors" available here [4]. The manipulations created for this work will be soon available in SecML-Torch.

Let me finish by thanking the main leader of all this line of research: Biagio Montaruli, our fellow PhD candidate.

References

[1] NIST. Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations. Jan 2024.
[2] Davide Ariu. The rise and fall of ModSecurity and the OWASP Core Rule Set (thanks, respectively, to robust and adversarial machine learning). Medium. Oct 2023.
[3] Biagio Montaruli et al. Adversarial ModSecurity: Countering Adversarial SQL Injections with Robust Machine Learning. arXiv:2308.04964. Aug 2023.
[4] Biagio Montaruli et al. Raze to the Ground: Query-Efficient Adversarial HTML Attacks on Machine-Learning Phishing Webpage Detectors. Artificial Intelligence and Security. Nov 2023.

AI Foundation on SAP BTP: Q1 2024 Release Highlights

2024-04-05T09:47:35.574000+02:00

AI Foundation is SAP’s all-in-one AI toolkit, offering developers AI that’s ready-to-use, customizable, grounded in business data, and supported by leading generative AI foundation models. It is also the basis for AI capabilities that SAP embeds across its portfolio.

If you are not already familiar with the different capabilities of the AI Foundation stack represented below, read this introductory blog post.

Let's delve into the Q1 2024 release highlights of the AI Foundation, including innovations for the Document Information Extraction service, SAP Translation Hub, SAP AI Core and SAP HANA Cloud.

AI Services

Document Processing - Document Information Extraction

New: Combine Different Setup Types When Adding Data Fields to Schemas

Users can now combine header fields with different setup types in the same schema.
They can add header fields with the following setup types to a schema created for a standard document type (e.g. invoice) or custom document type:

auto (with and without a default extractor)
manual

This extends the scope of the existing standard schemas from SAP, such as for the invoice document and reduce time to value for new business fields (key-value pairs) that need to be extracted from documents.

New: Conversion of Country Specific Unit of Measure Values to ISO Format

The conversion of country specific unit of measure values into ISO format for invoice documents has been improved. For instance, the processing of certain locale-specific unit of measures (e.g. German 'Stk.' for 'Stück' / 'piece'). Users can expect improved quality of extraction and faster business process execution.

New: Purchase order number extraction from line-item level

Users can extract purchase order numbers that are available on-line item field level from invoice documents. It allows a faster execution of accounts payable processes when there are multiple PO numbers listed in the tables of supplier invoices.

New: New Invoice Supported Language – Japanese

The Document Information Extraction service supports now the Japanese language for invoice documents, improving global coverage.

New: Better Models for the Extraction of Standard Document Types

The machine learning models for the extraction of invoice, paymentAdvice, and purchaseOrder documents have been improved. Users can expect improvements in particular when extracting dates, amounts, tax ID, bank accounts.

Invoices: higher extraction accuracy can be expected for Japan, Hungary, Türkiye, and Romania.
Purchase orders: improved extraction accuracy can be expected for Spanish purchase orders.
Payment advice: more consistent column extraction and improved extraction of amounts in line-items.

Get started with Document Information Extraction.

Machine Translation - SAP Translation Hub

New: User settings for the document translation UI

Users can define the preferred UI language and theme in the application, which will be carried over across software and document translation UIs, thus improving the overall user experience. They can also define the preferred source and target languages to translate content using the application, which will apply only while using the document translation UI. It automates the translation process by not having to define the source and target language every time the application opens.

New: Updated tile in SAP BTP

As part of the efforts of migrating the application for software translation to the multi-cloud environment, the tile available on SAP Business Technology Platform has been renamed, from Document Translation to SAP Translation Hub. That’s one of the last steps towards completing the migration planned in Q2 2024.

Get started with SAP Translation Hub.

Generative AI Management & AI Workload Management

SAP AI Core

New: Availability of additional large language models in the generative AI hub

Integration of additional large language models (LLM):

Google PaLM 2 for text (text-bison), PaLM 2 for chat (chat-bison), and embeddings for text (textembedding-gecko)
Google Gemini Pro
Updates to Microsoft Azure OpenAI model versions

You can now leverage a greater selection of "best-to-fit" LLMs for your use case, ease exploration over market-leading generative AI models, without the need to go through lengthy contractual, legal discussions and finally harmonize lifecycle management across models.

Find more information here.

New: SDK for support of large language models

We introduced a series of features in SAP AI Core:

Technical libraries that simplify inference on large language models (LLMs) by automatically injecting the correct headers and paths into each request. It improves the extensibility, allowing users to add additional adaptations as needed.
Tooling for the effective integration and use of LLMs with LangChain in the context of the generative AI hub. It simplifies the developer experience with ready-to-use libraries for access to LLMs deployed using the generative AI hub.
A new library, the ai-core-llm-sdk, in addition to enhancements to the existing ai-core-sdk, to accommodate the required changes to support LLM access. It boosts efficiency when working with various LLM models by streamlining the deployment of LLM models and the querying of available models.

Get started with SAP AI Core.

Business Data & Context

SAP HANA Cloud

New: Support for storage and retrieval of vector embeddings in SAP HANA Cloud, called SAP HANA Cloud vector engine

A vector datastore manages unstructured data - such as text, images, or audio - in high-dimensional vector space as vector embeddings, to provide long-term memory and better context to AI models. This makes it easy to find and retrieve similar objects quickly, for example, by asking a question using natural language. This both simplifies interactions with large language models (LLMs) and empowers developers to securely implement generative AI in applications.

SAP HANA Cloud vector engine can now natively store and search vector embeddings, which are numerical representations of objects, along with business data as part of its industry-leading multi-model processing capabilities to power intelligent data applications.

With these vector capabilities, SAP HANA Cloud will enable Retrieval Augmented Generation (RAG), facilitating the combination of LLMs with private business data. These applications learn and adapt to new information, enabling automated decision-making.

Key benefits of the SAP HANA Cloud vector engine include:

Multi-model: Users can unify all types of data into a single database to build innovative applications using an efficient data architecture and in-memory performance. By adding vector storage and processing to the same database already storing relational, graph, spatial, and even JSON data, application developers can create next-generation solutions that interact more naturally with the user.
Enhanced search and analysis: Businesses can now apply semantic and similarity search to business processes using documents like contracts, design specifications, and even service call notes.
Personalized recommendations: Users can benefit from an improved overall experience with more accurate and personalized suggestions.
Optimized large language models: The output of LLMs is augmented with more effective and contextual data.

To deep dive into the SAP HANA Cloud Vector Engine, read these blog posts by our experts:

Build business-ready AI applications with SAP and stay updated!

Leverage the AI Foundation capabilities by visiting the SAP Discovery Center. Compare and select the service that fits most to your business needs. 
Explore the AI Foundation roadmap to discover past and upcoming innovations.
Engage with our community of SAP experts through Q&A and blog posts.

See you next quarter for exciting innovations!

Forecast Local Explanation with Automated Predictive (APL)

2024-04-05T15:42:30.582000+02:00

In HANA ML 2.20, APL introduces a new tab “Local Explanations” in the time series HTML report. This new tab includes a waterfall chart showing how each component of the time series model contributed to individual forecasts. Thanks to this visualization end users will be able to better understand how individual forecasts are generated by the predictive model. This feature requires APL 2325 or a later version.

Let’s create a Jupyter notebook to see how it works.

We will use a daily number of visits for a touristic site. This series has two candidate predictors, Weather and Temperature, that can help improve the forecast accuracy.

We first define the HANA dataframe for the input series:

from hana_ml import dataframe as hd
conn = hd.ConnectionContext(userkey='MLMDA_KEY')
series_in = conn.table('DAILY_VISITS', schema='APL_SAMPLES')
series_in.head(7).collect()

Then we fit the historical data and extrapolate 7 days ahead:

from hana_ml.algorithms.apl.time_series import AutoTimeSeries
apl_model = AutoTimeSeries(time_column_name= 'Day', target= 'Visits', 
                           horizon= 7, last_training_time_point='2023-12-17 00:00:00')
series_out = apl_model.fit_predict(data = series_in, build_report=True)
df_out = series_out.collect()

Last, we generate the HTML report:

apl_model.generate_html_report('my_html')
apl_model.generate_notebook_iframe_report()

Here are the 7 values in the horizon presented in a table:

The forecasted value for December 19th is: 377. To see how this number is decomposed we go to the Local Explanations tab:

The data used to build the waterfall chart comes from the following tabular report:

df = apl_model.get_debrief_report('TimeSeries_ForecastBreakdown').deselect('Oid').collect()
df.style.hide(axis='index')

To know more about APL

Adversarial Machine Learning: is your AI-based component robust?

2024-04-09T16:37:43.380000+02:00

All this triggers a lot of excitement, but also some concerns, for sure in people having some security background (you know, the usual mood breakers).

Adversarial machine learning (AML, in short) is the process of extracting information about the behavior and characteristics of an ML system and/or learning how to manipulate the inputs into an ML system in order to obtain a preferred outcome. As explained very well in the document released by NIST in Jan 2024 "Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations" [1], studying AML enables understanding how the AI system can be compromised and therefore how it can also be better protected against adversarial manipulations. AML, as discussed in [1], has a very broad scope, touching different security and privacy aspects of AI components. In particular, four main types of attacks are considered: (1) evasion, (2) data and model poisoning, (3) data and model privacy, and (4) abuse.

Security Testing for AI components

In an evasion attack, the adversary’s goal is to generate adversarial examples to alter AI model behavior, i.e., fool its classification result.

full rule-based WAF (e.g., ModSecurity and its Core Rule Set) are of little help in blocking SQL injections in complex WAF scenarios
ML and adversarial retraining can improve and significantly raise the security bar in those scenario

Let me finish by thanking the main leader of all this line of research: Biagio Montaruli, our fellow PhD candidate.

References

[1] NIST. Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations. Jan 2024.
[2] Davide Ariu. The rise and fall of ModSecurity and the OWASP Core Rule Set (thanks, respectively, to robust and adversarial machine learning). Medium. Oct 2023.
[3] Biagio Montaruli et al. Adversarial ModSecurity: Countering Adversarial SQL Injections with Robust Machine Learning. arXiv:2308.04964. Aug 2023.
[4] Biagio Montaruli et al. Raze to the Ground: Query-Efficient Adversarial HTML Attacks on Machine-Learning Phishing Webpage Detectors. Artificial Intelligence and Security. Nov 2023.

AI & External Workforce: A New Technology Trend that unlocks tremendous value to businesses.

2024-04-15T09:48:41.091000+02:00

As companies continue to embrace the idea of digital transformation, integrating Artificial Intelligence into external workforce management offers them numerous chances to improve productivity and streamline processes. Thanks to modern technological advancements, particularly AI, tasks can now be completed more effectively and intelligently than ever before.

Through AI, companies can boost their organizations’ productivity and advance their talent strategies as it relates to the external workforce. This is business critical, as external workers provide a breadth of skills and contributions to a company’s total workforce. Let's take a closer look at some of these significant developments that artificial intelligence is revolutionizing and how they can work together to help both businesses and external workers reach their full potential:

Boost Productivity through Process Excellence, powered by Artificial Intelligence

The days of manually writing down job descriptions and establishing requirements for job roles in a solution are over. Recruitment operations for the external workforce have become smarter because of digital transformation and artificial intelligence. This has pushed organizations to digitize and optimize HR processes for both internal and external workers.

With SAP Fieldglass' Generative AI capabilities, you can now easily generate more effective Job Descriptions, populating specific and relevant skill sets required for Job Templates your organization frequently hires for.

Fig. 1. Generative AI Enhancing Job Descriptions for Job Posting Templates

Fig. 2. Generative AI Proposing an Enhanced Job Description for Job Posting Templates

Resourcing & Workforce Optimization

As the number of external workers increases, it becomes more and more important to ensure that organizations allocate resources optimally to support critical business operations and to fill in critical talent gaps. Businesses can learn about the availability of skills and business demand by utilizing data points and algorithms in resource planning and workforce scheduling.

This is where the value of an intelligent Vendor Management System such as SAP Fieldglass comes into play. You can be sure that your company operates at peak efficiency thanks to AI-powered Candidate Screening and Analysis and data-driven insights. SAP Fieldglass supports workforce planning and resource allocation to optimize existing workers and ensure availability of in-demand and relevant talent.

Fig. 3. Smart Candidate Screening through Resume Ranking via Skill Analysis

Talent Management & Succession Planning

Having an integrated approach to your organizational talent management strategy is made easier by being aware of the strengths and opportunities for growth of your external workers. These resources have critical skill areas and often serve in-demand roles within your business. It becomes critical to comprehend which members of your External Workforce Talent Pool are most qualified to be retained and eventually converted into your full-time employees based on KPIs set to measure performance.

Through intelligent vendor management technology, SAP Fieldglass enables your company to view qualified external workers from your diverse talent pool and intelligently evaluate them to choose, hire, and retain the highest performing resources. Configurations for performance and criteria help guarantee that you choose and keep the best external workers who can help your business succeed.

Fig. 4. Available External Worker Candidates from SAP Fieldglass’ Talent Pool

Learn more about how SAP Fieldglass, empowered by Artificial Intelligence, helps your business and the External Workforce things done efficiently and intelligently.

New Machine Learning features in SAP HANA Cloud 2024 Q1

2024-04-18T08:14:50.590000+02:00

With the 2024 Q1 database release, several new features have been released the SAP HANA Cloud Predictive Analysis Library (PAL), an enhancement summary is available in the What’s new document for SAP HANA Cloud database 2024.02 (QRC 1/2024).

The feature highlights for the current release are described in more detail below

Classification and Regression enhancements

Unified Regression along with Unified Classification and Time Series now supports permutation feature importance, a new and trending method in global explain-ability to evaluate the contribution of individual features to the overall predictive power of a model. This is achieved by measuring the decrease of a model’s performance when a feature‘s values are being shuffled around. A detailed explanation and examples are also given in this blog Global Explanation Capabilities in SAP HANA Machine Learning.

Classic feature importance vs permutation feature importance reports (see blog for details)

The Hybrid Gradient Boosting Tree (HGBT) now supports F1-scores, recall and precision as cross validation metric for improved, more targeted classification models. Furthermore, weight scaling of target values in classification is now supported to address imbalanced classes or weight scale target values in relation for example to different costs associated to the different class values.
A new and trending regression model objective function “reweighted square” has been introduced, aiding to achieve more robust and regularized regression models.
For improved early stopping during model optimization, the validation metric for early stopping can now be explicitly set.

The recently introduced multi-layer perceptron MLP recommender function, now supports multiclass classification and regression recommender scenarios. This allows to reformulate the recommendation task as a classification or regression problem. The implementation employs a dual-stream framework where two sets of features representing for example user – and items features, respectively, are fed into a feature selection module. The outputs are streamed into MLP-neural networks and combined in a bilinear aggregation layer. This new and trending neural network framework can handle large-scale data volumes in recommendation scenarios very effectively.

The K-Nearest Neighbor (KNN) classification and regression functions has been enhanced with a new similarity search method, in addition to brute force and KD-tree searching a matrix enabled search-method has been introduced, allowing for much faster similarity search results especially with high-dimensional numeric feature data.

Auto-ML and ML pipeline function improvements

The Auto-ML functions for the Predictive Analysis Library (PAL) have been enhanced with

a new option to trigger deeper finetuning of the best pipeline found
the genetic algorithm-based Auto-ML optimization has been enriched with a RANDOM SEARCH-based optimization, suited especially for smaller configurations (e.g. simple time series) and yielding with faster results
new method to clear and initialize the Auto-ML log
Auto-ML and pipeline model explain-ability enhancement with a SHAP Global surrogate light-weight model for faster global explanation model calculation and faster local prediction interpretability results

Text Processing

The Text Mining related document and term analysis function do now support massive parallel invocation, allowing for multiple input text to be analyzed in parallel.

Multiple documents (here IDs 0 and 5) are searched in parallel for related documents

New financial data analysis functions

The newly implemented single-factor Hull-White procedure , can be used to model the time evolution of interest rates, which are required for price estimation of financial instruments based on interest rate derivatives.

To apply the Hull-White model it first needs to be adopted to match existing market conditions (interest rates). This is achieved by providing the values of the drift term of the Hull-White model as a time series as input table. The simulation will then provide the mean value for a given number of simulation paths (also specified as an input parameter), their variance, as well as the upper and lower bounds.

The chart above depicts the initial dataset used to calibrate the mode, mean and confidence interval of the Hull-White simulation.

New Benford’s Law function in PAL, a trending algorithm used to detect anomalies in numerical datasets like e.g. financial transactions.

One of the (not so) well-known statistical observations is the fact that in many datasets the leading significant digits are not equally distributed. If all digits were represented equally, then they would appear 11.1 percent (1/9TH) of the time. However, when analyzing real-world datasets, e.g. the population totals of the US census data, it is revealed that the distribution of the leading digits follows the Bedford’s law, also known as the first-digit law.

P(d) = log10 (1+ 1/d), where P(d) is the probability of the leading digit {1,2,....9} to occur.

With the help of PAL’s new BENFORD analysis function it is now very easy to validate if a dataset obeys Bedford’s law or not. A first step means very commonly used in financial applications to detect unexpected value distribution and e.g. potential fraudulent transaction data.

Python ML client (Hana-ML) enhancements

The full list of new methods and enhancements with Hana-ML 2.20 is summarized in the changelog for Hana-ml 2.20.240319 as part of the documentation. The key enhancements in this release include

Time series analysis and forecasting methods

Time series permutation feature importance analysis
Time series outlier detection with voting
Segmented (massive) online Bayesian Change Point Detection

Auto-ML configuration and methods enhancements

Updated Auto-ML configuration dictionary-templates with new operators and random search optimization support for e.g. small time series configurations
Enhanced Auto-ML configuration option for setting connection constraints during optimization of multi-operator pipelines and visualization of pipeline connection scores between operators
Support algorithm-specific parameters with Auto-ML predict-calls, relevant for both pipeline predict and Auto-ML methods.
Enhanced progress monitor for Auto-ML to display at anytime and log management methods, allowing to set log levels, persist progress logs clean up logs and more.

Exploratory data analysis and visualization enhancements

New Bubble Plot and Parallel Co-ordinate Plot

You can find an examples notebook illustrating the highlighted feature enhancements here 24QRC01_2.20.ipynb.

SAP HANA Cloud, SAP HANA database Python Machine Learning

Unveiling SAP Business AI's Potential at the Upcoming Solution Experience Live Session

2024-04-18T08:15:36.770000+02:00

Unveiling SAP Business AI's Potential at the Upcoming Solution Experience Live Session
In an era of continuous innovation and heightened competitiveness, AI's ability to enable smarter decisions and efficiencies on monumental scales has become essential. One such solution that helps businesses to harness the power of AI is SAP Business AI.

The upcoming Solution Experience Live Session: Deliver Real-World Results with SAP Business AI presents a one-of-a-kind opportunity for everyone to grasp the countless benefits and applications of SAP Business AI. Led by Dr. Philipp Herzig, Chief AI Officer at SAP, this one-hour virtual session on Tuesday, April 30, 2024, from 1:30 p.m. to 2:30 p.m. CEST, will expand upon the innovative capabilities of SAP Business AI.

The impact of AI on businesses today cannot be overstated. It is integral – not just in streamlining operational processes, but in offering insights and opportunities that were once unthinkable. Increasingly, businesses are making deeper strides into leveraging AI for corporate advantage – with SAP Business AI at the forefront of this change. With AI-skilled experts being the most sought-after professionals in the workforce, the challenge is to close the AI skills gap. SAP has made it it’s mission to make a variety of learning resources on the SAP Learning site available to equip anyone with the right tools to navigate today's AI-driven business environment.

Join us for the virtual session, collectively debunk the mysteries of AI, and tap into the potentials of SAP Business AI. The free-of-charge solution experience live session is open to everyone — mark your calendars and prepare to journey into a AI-empowered future.

Make sure to sign up today.

Want to discover live sessions beyond AI?
With a subscription to SAP Learning Hub, you can unlock further live sessions in other in-demand topics, such as SAP BTP, ABAP Cloud development and how to build apps with SAP Build. You can explore the full live session schedule and join the sessions of your liking with an SAP Learning Hub subscription.

New Machine Learning features in SAP HANA Cloud

2024-04-19T14:39:25.280000+02:00

SAP HANA Cloud provides out of the box a rich set of embedded AI functions enabling SAP customers and partners to build Intelligent Data Applications, applications embedding AI scenarios like time series forecasting, classification, or regression and more, with applying the AI functions in the database directly on the application data.

Each new release of SAP HANA Cloud includes a lot of exciting new embedded AI features, hence see the list of links below to get an overview about the continuous improvements in the Predictive Analysis Library (PAL), Automated Predictive Library (APL) and the Python Machine Learning client (hana-ml) across the past release versions:

Review all 81 newly released Machine Learning enhancements since the 2022 Q1-release on the What’s New feature page for SAP HANA Cloud
Review planned feature on the SAP HANA Cloud roadmap page
Review the Machine Learning enhancement details in SAP HANA Cloud releases
- 2024_QRC1 (PAL, APL) 2.20 (hana-ml) blog
- 2023_QRC4 (PAL, APL) 2.19 (hana-ml)
- 2023_QRC3 (PAL, APL) 2.18 (hana-ml)
- 2023_QRC2 (PAL, APL) 2.17 (hana-ml)
- 2023_QRC1 (PAL, APL) 2.16 (hana-ml)
- 2022_QRC4 (PAL, APL) 2.15 (hana-ml)
- 2022_QRC3 (PAL, APL) 2.14 (hana-ml)
- 2022_QRC2 (PAL, APL) 2.13 (hana-ml)
- 2022_QRC1 (PAL, APL) 2.12 (hana-ml)

Here are some additional links to reference information introducing to and tutoring about how to get started using SAP HANA Cloud’s Machine Learning capabilities

Featured SAP HANA AI/ML capabilities are highlighted in the following resources:

Leverage data-parallel ML/AI, segmented forecasting from within SAP HANA Cloud utilizing
- APL Time Series Forecast using a Segmented Measure
- Segmented (massive) Forecasting applying PAL Additive Model Analysis (aka prophet) hands-on example here SAP-samples/teched2022/Getting Started with Multi-Model Capabilities in SAP HANA Cloud/Forecasting
Leverage AutoML from the Predictive Analysis Library (PAL)
- Hands-On Tutorial: Leverage AutoML in SAP HANA Cloud with the Predictive Analysis Library
- SAP Community call | Accelerate your Machine Learning efforts - benefit from SAP HANA Cloud AutoML
Integrating SAP HANA ML into your Application Code
- re>≡CAP 2023 conference presentation: "Data Science handshake with CAP developer building an intelligent data app“, Recordings of re>≡CAP 2023 session
- Auto-generating HANA ML CAP Artifacts from Python

Blog collections and sample repository
- Rich collection of references in this blogs post SAP HANA Machine Learning Resources
- SAP HANA ML sample repository https://github.com/SAP-samples/hana-ml-samples see Python samples.

Setting up your Python environment for SAP HANA ML

There are certainly numerous flavors of how to setup a local Python environment for Data Scientists to work with respective Python packages, in this case leveraging hana-ml and hdbcli to work with SAP HANA. The use of Miniconda or venv with Jupyter Notebooks is a common practice. Moreover the following instructions give guidance on how to enrich your SAP Development tools with a Python environment

Moreover, the following blog posts give indications how to leverage SAP HANA Machine Learning and hana-ml from within 3rd party Data Science, Python expert environments

Finally, reference information about the Python native SAP HANA database client can be found in SAP Note 2939501 and the SAP HANA Client documentation. The database client software is available from tools.hana.ondemand.com/#hanatools, while the Python client is also available via the pypi repository at https://pypi.org/project/hdbcli/. Further installation instructions are documented in the section Machine Learning APIs in SAP HANA Client Interface Programming Reference.

The following SAP Note 3220622 indicates which SAP HANA Cloud release is available with SAP Datasphere.

It's never been easier to invoke Machine Learning and Generative AI from an ABAP application

2024-05-03T08:34:43.224000+02:00

We have exciting news for SAP developers! You may remember that last year, IBM and SAP jointly announced a milestone collaboration around embedding AI into SAP solutions. One visible outcome was the adoption of Watson technology in the SAP Start and SAP Joule digital assistants.

And at the same time, IBM's been working on creating a watsonx Software Development toolKit (SDK) for SAP ABAP, complementing the existing Watson SDK that was refreshed in 2023.

I am pleased to confirm that the watsonx ABAP SDK version 1.0.0 is now live. It was released late March 2024 by the IBM SAP Software Engineering Team as expected. It is a separate release from its relative, the Watson SDK, who released a version 2 in Q4 2023. Both SDKs are open source and downloadable from GitHub.

If you develop SAP apps or have clients doing so, this is big news. It is truly a revolution in easily integrating Machine Learning and Generative AI, in just a few clicks, into an SAP ABAP application. Literally in 10 minutes!

The watsonx SDK comes in two flavors, allowing SAP developers to easily access watsonx from an ABAP application in a traditional SAP NetWeaver environment, and also in the latest SAP Business Technology Platform (BTP):

The SDK natively supports both the watsonx.ai (V1) and the Watson Machine Learning (V4). To get developers up to speed quickly, comprehensive examples are included in both of the APIs' documentation.

The sample code for the watsonx.ai sends a AI prompt to a generative AI model and collects the generated response. It uses the granite-13b-chat-v2 in the example but there are many other superb models available through watsonx.
The sample code for the Watson Machine Learning API lets you unleash the full capability of Watson Machine Learning (ML):
- Without going too far in the detail, to use watsonx ML you have to create a watsonx deployment space. This is where you will deploy your ML assets, which can be models, functions, or scripts.
- Python is a popular language for functions, in fact most frameworks in Watson ML rely on Python (or R). With a Python function, you can perform any ML task like creating, training, and using models.
  - For instance you can recognize hand-written digits, or to predict business area for a car rental company, etc.
  - You're welcome to write your own Python functions from scratch but in most cases you'd want to leverage existing example notebooks to save time
- Going back to the watsonx ABAP SDK: in the supplied sample code, the ABAP application deploys arbitrary Python code as a Python function on a watsonx deployment space and calls the deployed function. It also supplies a CURL command that can be used to easily call the deployed function from the command line
- So the SDK connects lets you use the full capabilities of watsonx ML with very little development effort. Then no further SAP programming knowledge is necessary, you can refer to the regular watsonx ML documentation. And of course, reuse the many examples of functions, models, and scripts provided by IBM and third parties.

See also the post by @christian_bartels on the same topic.

Tracking HANA Machine Learning experiments with MLflow: A technical Deep Dive

2024-05-08T17:00:00.007000+02:00

Introduction

This blog post is part of a series describing the usage of MLflow with HANA Machine Learning co-authored by @stojanm and @martinboeckling. In this blog post we provide a more technical deep dive on the setup of a MLFlow instance and provide a general introduction how Machine Learning models trained with HANA ML can be logged with MLflow. The first blog post of the blog series is called Tracking HANA Machine Learning experiments with MLflow: A Conceptual Guide for MLOps and gives an introduction to the topic of MLOps with MLflow.

Starting with the python HANA ML package version 2.13, HANA Machine Learning added support for tracking of experiments with the MLflow package, which makes the incorporation of models developed using HANA Machine Learning into a comprehensive MLOps pipeline easy to achieve.

In this blog post we will provide an overview how MLflow can be used together with HANA ML. MLflow, which manages the experiment tracking and artefact management can run as a managed service at a hyperscaler platform, deployed locally or on a remote infrastructure. In the following we describe how to deploy MLflow on SAP Business Technology Platform and how to track your HANA machine learning experiments with MLFlow. In addition, we present which methods and algorithms in the hana ml package currently support the experiment tracking feature. Finally, we touch on the possibility to use logged models in MLflow for prediction.

Prerequisites

In this blog post we solely focus on the technical integration of HANA ML and MLflow as a logging platform. Generally, we assume that Python is already installed together with an already established development environment. Furthermore, we will not completely explain all details of docker and Cloud Foundry, but simply focus on the essential parts for HANA ML and MLFlow within this blog post.

Set up MLFlow on BTP

MLFlow is leveraged and integrated in different solutions. For example Databricks as well Machine Learning in Microsoft Fabric integrate or provide a managed MLflow instance. In case MLFlow is not yet provided, we outline in this section a possibility to deploy MLflow in SAP BTP. We focus for simplicity reasons on the SQLite based deployment of MLflow. However, for productive environments it is recommended to separate the storage from the runtime of the MLFlow instance. A detailed explanation for setting up your own MLFlow server with alternatives to SQLite can be found under the following link: https://mlflow.org/docs/latest/tracking/server.html. In the following paragraphs, we focus on a step by step overview to set up your own MLFlow instance using SQLite in BTP.

As a first step we want to create a local docker file which can be used to upload it to our BTP environment. In the following code snippet, we provide the coding used to construct your own docker container locally. For that, paste the following code into a file called Dockerfile within your desired local folder.

# Retrieve a python version as a base runtime for our docker container
FROM python:3.10-slim
# Run the pip install command for the package of mlflow
RUN pip install mlflow
# Create a temporary folder within our docker container to store our artifact
RUN mkdir -p /mlflow
# Expose the port 7000 to make our application which runs within docker accessable over the defined port
EXPOSE 7000
# Define environment variables BACKEND_URI and ARTIFACT_ROOT to define the backend uri as also the 
# artifact root
ENV BACKEND_URI sqlite:///mlflow//mlflow.db
ENV ARTIFACT_ROOT /mlflow/artifacts
# run the shell comand to setup our mlflow server within our docker container
CMD mlflow server --backend-store-uri ${BACKEND_URI} --host 0.0.0.0 --port 7000

After creating successfully the Docker object, you can run docker build -t {tagname}. to construct your docker container. Afterwards, the local docker image is locally built. To expose the docker image, we push the image to a docker registry. In our example, we assume that you already have a docker registry set up where you can push your image to. For that step, you can run the following commands: docker tag {tagname} {dockerhub repository tag}, docker push {dockerhub repository tag }. After the successful run of the command, you see within your private docker hub the newly published docker container, which contains MLflow and all its dependencies inside of it.

After the successful publishing of your docker image to your registry, we can run the following command to create a BTP app based on the published docker image:

cf push APP-NAME --docker-image REPO/IMAGE:TAG

After successfully publishing the docker image on our BTP Cloud Foundry environment, we can find our published app within our BTP account and are able to access it under the published URL.

Set up tracking for MLflow

With MLflow users have the possibility to track their trained HANA ML models. In the following paragraph, we introduce the aspects that are needed to be able to log HANA ML models into MLflow itself.

To be able to use MLFlow together with HANA ML, we need to first install besides the HANA ML package also the MLFlow package. Therefore, you need to run the following command in your virtual environment, to be able to run the following scripts.

pip install mlflow hana-ml

As a general setup, we first need to run the following command to set up our tracking with MLFlow to our available MLFlow instance. Therefore, place into the following two lines first your personal MLFlow tracking URI and your own custom experiment. In case you do not want to create a separate experiment, the different runs together with the MLFlow model are stored under the default experiment.

The method that allows us to track HANA ML models is implemented in the HANA ML package and is called enable_mlflow_autologging(schema=None, meta=None, is_exported=False, registered_model_name=None). This method can be used for initialised HANA ML models that are under the following methods:

Within the method enable_mlflow_autologging the user has different keywords that can be filled that allows us to influence the behaviour of our MLFlow autologging in HANA ML.

schema: Defines the HANA database schema for MLFlow autologging where the MLflow logging table is stored
meta: Defines the name of the model storage table in HANA database
is_exported: Determines if the hana model binaries should be exported to MLflow
registered_model_name: Name of the model stored in MLflow

In the following section we provide an overview for the Unified Interface how the logging of MLflow can be used.

Run HANA ML Algorithms with MLflow

As we have explained and outlined in the sections above, we have created a MLFlow instance and have introduced the syntax that is needed for the logging of HANA ML models in MLflow. In the following sections we will provide based on an example how the logging of HANA ML models on MLflow is done.

Model training of HANA ML with MLFlow

For the training of HANA ML in combination with MLflow, we focus in this blog post on the Unified Method. We apply for the respective elements a Classification on the sample bank dataset which can be found under the HANA ML sample dataset folder on GitHub.

The dataset can either be uploaded directly to the SAP HANA database or you could also use SAP Datasphere as your starting point. Generally, to use HANA ML directly you would need to store the dataset in a HANA database. However, HANA ML also provides methods to integrate third party files/ data structures. This involves Pandas, Spark as also shapefiles. In addition also HANA Data Lake file tables can be integrated with HANA ML functionalities. An overview of the different methods can be found under the following page. In the following paragraphs, we will go through the sample code that we have created to combine HANA ML and MLFlow.

Connect to HANA database (Deployed under SAP Datasphere)

To be able to connect to the HANA database instance, we first need to build up a connection to the HANA database. In our example, we load the data from the data samples provided by HANA ML. During the time of this blog post, the OpenSQL schema of Datasphere only supports Basic Authentication. Therefore, in this blog post we only elaborate how the connection is done over basic authentication. SAP HANA standalone supports however non-basic authentication, which are also supported in the HANA ML package to connect certificate based to the SAP HANA instance.

To establish the connection to the HANA database, we make use of the implemented HANA ML dataframe class and call the method ConnectionContext. We store the instance of the connection in the variable conn. To now be able to establish the connection to the HANA database view or table, we will need to specify over the method table the connection. The beautiful aspect is that overall, the dataset is not going to be loaded to the Python runtime, but will only be represented with a proxy to the actual table in the HANA database. All transformations, if done over the methods of HANA ML, are then pushed down to the database itself and executed there if the training is executed. In our case, we load the sample dataset into our database by making use of the provided methods of the HANA ML package.

"""
This script provides a short example how HANA ML and MLFlow can be integrated together.
The credentials to the database are following the currently supported authentification (Basic) of the 
SAP Datasphere OpenSQL schema. Overall, HANA Cloud standalone is also able to support multiple other authentification
methods. We have used an abstraction python file (constants) where we retrieve the securely stored 
authentification properties. To get more details about the exact method structure needed, please
have a look at the documentation:
https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.07/en-US/hana_ml.dataframe.html#hana_ml.dataframe.ConnectionContext
"""
from hana_ml import dataframe
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
from hana_ml.algorithms.pal.auto_ml import AutomaticClassification
import mlflow
from hana_ml.algorithms.pal.auto_ml import Preprocessing
from hana_ml.algorithms.pal.partition import train_test_val_split
from constants import db_url, db_user, db_password
# dataset retrieval
conn = dataframe.ConnectionContext(address=db_url, port=443, user=db_user, password=db_password)
dataset_data, training_data, _, test_data = DataSets.load_bank_data(connection=conn, schema=schema_name, train_percentage=0.7, valid_percentage=0, test_percentage=0.3, seed=43)

After connecting to the database, the user is able to use the preprocessing methods implemented in HANA ML. Generally, the different changes are pushed down to the HANA database and are not executed within the Python runtime. In our use case, we do not need to use the data preprocessing as we directly retrieve a sample dataset which we can directly use for our ML training.

After finishing the potentially needed transformations, we are now able to implement the tracking of our HANA ML runs with the possibility of MLFlow. Similar to the normal usage of MLFlow, we set up first our tracking uri under which we want to store our HANA ML runs and models. In your case you would need to change the keyword mlflow_tracking_uri with your respective MLflow tracking URL. Furthermore, we then are able to specify the experiment name under which the runs are tracked. If we do not specify a specific experiment, the runs are tracked under the Default experiment.

# set up MLFlow
mlflow.set_tracking_uri(mlflow_tracking_uri)
mlflow.set_experiment("HANA ML Experiment")

In the following chapters, we will provide an outline how the exact training is performed and what components are logged to MLflow.

Unified Method

For the example we use the implemented Hybrid Gradient Boosting Tree as a classification algorithm for our Classification. In order to perform the classification, we use the Unified Classification in order to be able to run our algorithm. On the defined variable, we then use the implemented enable_mlflow_autologging method. This allows us to directly log the model using implemented auto logging behaviour.

uc = UnifiedClassification(func="HybridGradientBoostingTree")
uc.enable_mlflow_autologging() uc.fit(training_data, key="ID", label="responded")

We call the fit method once we have initiated the HANA ML model variable and the associated autologging for MLflow. For the fit method, we have in total two different options. Firstly, the non-partitioned training dataset where we only use the training dataset. If we decide to partition our training dataset, we allow to create a validation dataset for which we can log metrics automatically during training.

If we do not define for our fit function the partitioning, we will not log metrics within MLFlow. In the following image, you can see how a potential HANA ML tracked run looks like in MLFlow together with the stored HANA ML model in MLflow.

If we decide to partition our dataset, here for instance to partition the dataset along the defined primary key, we are able to directly log evaluation metrics relevant for the Classification we have used. This includes the following metrics: AUC, Recall, Precision, F1 Score, Accuracy, Kappa coefficient and the Mathews Correlation Coefficient (MCC). This would directly allow us to compare multiple runs within our MLFlow project to one another and measure the different performances.

In addition to the general run, HANA ML also logs the model to MLflow. What is logged to MLflow depends on the parameters set for the method enable_mlflow_autologging. If for instance everything is set to the default settings, we will see the following yaml file to be logged to MLflow.

If within the method enable_mlflow_autologging the parameter is set to is_exported, the model binaries stored in the model storage on HANA are exported to MLflow. This setting would allow us to retrieve the trained model from MLflow and use it in a different HANA database for prediction purposes. In addition to the yaml file containing the metadata we now can see a created subfolder called models which contains the necessary model artefacts normally stored in the HANA database now in MLflow.

After the training is finished, we have besides the auto logging capabilities of HANA ML for MLflow the possibility to track further artefacts in MLflow. In the following section we will outline a few possibilities that exist with the additional tracking.

Additional logging possibilities

Besides the outlined auto logging capabilities, we can track with MLFlow additional artefacts to the respective run. In the following chapters, we outline selected possibilities to further enrich the auto logging for HANA ML runs tracked in MLFlow.

Adding run and experiment description

The description in the experiment section can be handy once the number of your experiments grows in the repository. In addition, mlflow allows to also add individual description to each run of an experiment. Using the following methods you can set up both:

from mlflow.tracking import MlflowClient
current_experiment=dict(mlflow.get_experiment_by_name("HANA ML Experiment"))
experiment_id=current_experiment['experiment_id']
    
run = mlflow.active_run()
MlflowClient().set_experiment_tag(experiment_id,"mlflow.note.content",
"This experiment shows the automated methods of HANA machine learning and how to track them with MLFLOW")
MlflowClient().set_tag(run.info.run_id, "mlflow.note.content", "This is a run tracked with Unified Classification from HANA Machine Learning")

Logging input datasets

Sometimes it is important to keep the input dataset also as part of the tracking with MLflow. Since HANA machine learning datasets are located in HANA, they need to be converted to pandas DataFrames to be tracked as shown in the following code:

# Store training dataset in MLFlow itself
pandas_training_dataset = training_data.collect()
mlflow_dataset = mlflow.data.from_pandas(pandas_training_dataset, name="Customer data", targets="LABEL")
mlflow.log_input(mlflow_dataset, context='training')

This results in the change, that the respective state of the training data is logged to the current run. The logged dataset can be found in the associated MLflow run, where the schema of the dataset is provided together with some metadata information about the number of rows and number of elements. In addition, also the provided context is marked in the UI of MLflow.

Logging a model report

In addition to the logging of the dataset, it might also be important to add a model report to MLFlow. HANA ML generally provides different interactive visualisations for the trained model artefact, which can be stored as an HTML file. After the storing of the model report to your local repository, we can log the input of the model report to our current run. This allows us to interactively explore the model report automatically generated by HANA ML and make it accessible in MLFlow. To log the HANA ML model report, you can use the following code snippet.

# create additional model report in MLFlow
UnifiedReport(uc).display(save_html="UnifiedReport")
mlflow.log_artifact("UnifiedReport_unified_classification_model_report.html")

After the Model report is stored successfully under the current run, we can see in the artefact tab in MLFlow the interactive model report:

The complete script used for this section can be found in the following code snippet:

from hana_ml import dataframe
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
from hana_ml.visualizers.unified_report import UnifiedReport
import mlflow
from hana_ml.algorithms.pal.utility import DataSets
from constants import db_url, db_user, db_password

# dataset retrieval
conn = dataframe.ConnectionContext(address=db_url,
                                   port=443,
                                   user=db_user,
                                   password=db_password)

dataset_data, training_data, _, test_data = DataSets.load_bank_data(connection=conn, schema=schema_name, train_percentage=0.7, valid_percentage=0, test_percentage=0.3, seed=43)

# set up MLflow
mlflow.set_tracking_uri(tracking_uri)
mlflow.set_experiment("HANA ML Experiment")
# set up classification
uc = UnifiedClassification(func="HybridGradientBoostingTree")
uc.enable_mlflow_autologging(is_exported=True)
# train model
uc.fit(training_data, key="ID", label="LABEL", partition_method="stratified", stratified_column="ID", partition_random_state=43, build_report=True)
# create additional model report in MLFlow
UnifiedReport(uc).display(save_html="UnifiedReport")
mlflow.log_artifact("UnifiedReport_unified_classification_model_report.html")
# Store training dataset in MLFlow itself
pandas_training_dataset = training_data.collect()
mlflow_dataset = mlflow.data.from_pandas(pandas_training_dataset, name="Customer data", targets="LABEL")
mlflow.log_input(mlflow_dataset, context='training')

Apply of trained model

After we have finished our training, we are able with HANA ML to retrieve the model from MLFLow and use it for our prediction purposes. For this purpose, we will create a separate Python script where we will provide an overview to retrieve the trained MLflow model.

Similar to our training script, we first set up our connection to the HANA database and establish the connection to our table. In our case, we simply use the sample dataset provided by HANA ML.

from hana_ml import dataframe
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
from hana_ml.visualizers.unified_report import UnifiedReport
import mlflow
from hana_ml.algorithms.pal.utility import DataSets
from constants import db_url, db_user, db_password

# dataset retrieval
conn = dataframe.ConnectionContext(address=db_url,
                                   port=443,
                                   user=db_user,
                                   password=db_password)

dataset_data, training_data, _, test_data = DataSets.load_bank_data(connection=conn, schema=schema_name, train_percentage=0.7, valid_percentage=0, test_percentage=0.3, seed=43)

Similar to our training script, we need to set the tracking url for MLflow and need to initiate the model storage of HANA. If we have decided to not export the HANA ML model to MLflow, we need to specify the same schema for the model storage where our HANA ML model is stored after the successful run. In case we have exported our model, we are able to specify a different schema. In the following, you can see the necessary script in order to retrieve the logged HANA ML model from MLflow.

# set up MLFlow and model storage
mlflow.set_tracking_uri(tracking_url)
model_storage = ModelStorage(connection_context=conn, schema=schema_name)

After the model storage has been initiated, we are able to retrieve the stored HANA ML model from MLflow. In order to select the correct model, you need to extract the correct run id associated to the model you would like to apply for your prediction dataset. In our case, this is the test dataset we have received from the sample dataset method. The model_uri needed for the model retrieval is consisting of the following pattern 'runs:/{run id}/model', in which you would need to exchange the run id with your respective run. For the actual retrieval of the model, we use the initiated model storage, in our case called model_storage and call the method load_mlflow_model to load the MLflow model to our HANA database and assign the respective proxy to our variable mymodel. The variable mymodel is then used to call the predict method in order to apply our model to our dataset. In the end we transform our prediction dataset into a Pandas DataFrame to look at the content of the created DataFrame. Normally, we could directly persist the created temporary table with the save method and therefore make the dataset available for further processing.

# load logged run from MLflow to HANA ML
logged_model = 'runs:/d8a763b7b81940598633605e447cd880/model'
mymodel = model_storage.load_mlflow_model(connection_context=conn, model_uri=logged_model)
dataset_data_predict = mymodel.predict(data=test_data, key="ID")
# collect the predicted dataset to see content in dataframe
print(dataset_data_predict.collect())

After running the script, you should be able to see the following terminal output, for which we can see the download of the artefact stored in MLflow and the created prediction dataset, which consists in our case of 4 columns: ID (primary key), SCORE (predicted label), CONFIDENCE (prediction confidence for applied row) and REASON_CODE (influence of individual variables to prediction output).

In case we have exported our model, the output of our terminal look slightly different indicating that we also download the respective model artefacts stored additionally to the yaml file. In the following you see the complete script used for applying the model to a new dataset.

from hana_ml import dataframe
from hana_ml.model_storage import ModelStorage
from hana_ml.algorithms.pal.utility import DataSets
import mlflow
from constants import db_url, db_user, db_password

conn = dataframe.ConnectionContext(address=db_url,
                                   port=443,
                                   user=db_user,
                                   password=db_password)

# full_set, diabetes_train, diabetes_test, _ = DataSets.load_diabetes_data(conn)
dataset_data, training_data, _, test_data = DataSets.load_bank_data(connection=conn, schema=schema_name, train_percentage=0.7, valid_percentage=0, test_percentage=0.3, seed=43)

# set up MLFlow and model storage
mlflow.set_tracking_uri(tracking_uri)
model_storage = ModelStorage(connection_context=conn, schema=schema_name)
# load logged run from MLflow to HANA ML
logged_model = 'runs:/ed7b8d4734cb42ca90c417f932957b40/model'
mymodel = model_storage.load_mlflow_model(connection_context=conn, model_uri=logged_model)
dataset_data_predict = mymodel.predict(data=test_data, key="ID")
# collect the predicted dataset to see content in dataframe
print(dataset_data_predict.collect())

Key take aways

In this blog post we have showcased an end to end example how MLflow can be integrated in your HANA ML workload by providing the possibility to share and compare multiple tracked runs in MLflow. If the data is already stored in HANA, this allows you to directly interact with MLflow while being able to run your Machine Learning algorithms on data stored in the HANA database without the need to transfer your data between multiple systems. This blog covered an essential part of the automated logging capabilities of HANA ML models into MLflow.

We highly appreciate your thoughts, comments and questions under this blog post. In case you want to reach out for general questions around HANA, or specifically HANA ML, don't hesitate to use the Q&A tool with the respective tags that describe your question.

Tracking HANA Machine Learning experiments with MLflow: A conceptual guide for MLOps

2024-05-08T19:17:01.849000+02:00

Introduction

MLflow is an open-source platform, which is the de facto standard, when it comes to managing and streamlining machine learning lifecycles, including experimentation, reproducibility, and deployment. It offers a centralized repository to track experiments, share projects, and collaborate effectively, making it a common choice among data scientists and can be used with most open-source machine learning frameworks (e.g. scikit-learn, Tensorflow, etc.).

Some providers offer MLflow as a managed service (e.g. Databricks) or integrate it (e.g. Azure ML). In addition, it is possible for the user to deploy the service manually on their platform of choice (e.g. on SAP Business Technology Platform).

Starting with version 2.13 HANA Machine Learning added support for tracking of experiments with the mlflow package. This makes models, which were developed using hanaml, easily incorporated into an extensive MLOps pipeline.

This blog post is part of a series describing the usage of MLflow with HANA Machine Learning co-authored by @martinboeckling and @stojanm. In the first part we present an conceptual guide on how to use MLflow with SAP Datasphere and HANA Machine Learning (through the hanaml package). The objective is to provide to the reader a high level template for machine learning operations (MLOps) for HANA ML specifically with MLflow. In the second part of the series, called Tracking HANA Machine Learning experiments with MLflow: A technical Deep Dive, we provide a more technical deep dive on how to setup an MLflow instance and a general introduction on how Machine Learning models trained with HANA ML can be logged with MLflow.

It is important to mention that SAP offers an extensive MLOps platform for managing ML experiments, AI Core / AI Launchpad, which is out of the scope of this post. For more information on AI Core please refer to the blog post here.

Ok, let's start reviewing our example. We will work our way along a simplified Machine Learning pipeline as shown below and will comment on the architectural patterns for each task.

To simplify the use case, we will assume that the gravity of the data required for our model lies within SAP. This means that majority of the data used for model training is in an SAP application either on-premise or in the cloud.

Data Modeling

Typically data landscapes in enterprises are quite complex and data is distributed across numerous systems. So, even though the majority of the data for our example comes from an SAP source, it is realistic to assume that for the modeling a portion of that data could come from another system. It is the task of a Data Engineer to connect to the data and to prepare it for the algorithm training (e.g. feature engineering). As shown in the picture below SAP Datasphere can help unify data sources in a central repository either via federation or, where not supported, via replication.

In addition to the data modeling features, SAP Datasphere also offers a runtime for Machine Learning tasks thanks to the embedded HANA Cloud instance. This runtime can be utilized by Data Scientists and since it is embedded it allows to perform ML without the need for data movement and replication. This brings several benefits related to security, execution speed, business context preservation and compliance aspects. For more information about those benefits check out this blog post.

Ok, let's move to the data science tasks and model training.

Model Training

During this phase Data Scientists experiment and iteratively develop the ML model. Most Data Science experts have their preferred platform for ML prototyping. The HANA Machine Learning Python package, called hana-ml, can be used with any Python IDE available. The development environment can be either deployed manually by the Data Scientist or hosted centrally on a dedicated platform. The following blog posts show examples how HANA Machine Learning code can be developed using different platforms: Azure ML and Databricks.

Already during training and experimentation, MLflow plays an important role. It helps evaluate the progress and log the details of each experiment run for later reference. Several algorithms from the hana-ml package (e.g. Automated* or Unified* methods) support the automatic logging of model key performance indicators during training. This is seamless for the user and uses the same interface as open source frameworks. It allows to track hyper parameters, model performance KPIs and also log training activities with usernames and timestamps for auditibility. For more technical details about these features please refer to the second part of our blog post.

Model Deployment

Once a suitable model has been selected and trained, it needs to be deployed. For our example with HANA Machine Learning, we will do this in two steps. In the first step, hana-ml is used to store the artifacts into the built-in model repository of SAP HANA Cloud. In the second step the model is exposed via an API to be consumed by other applications. This can be achieved in several ways, but a lean approach is to use a deployed Flask application (e.g. on SAP Business Technology Platform). To see the details on this process please refer to this blog post.

Model Performance Tracking

In addition to being able to track experiments while training the model, also tracking of the model performance after deployment is important - e.g. in order to track prediction quality and also detect effects like data drift, etc. Some information about those concepts can be found in this blog post.

In our example we will achive the monitoring of the model during operations as follows: Since our model is deployed and exposed via a Flask application, as proposed above, in the application code we use the mlflow package to log incoming data as well as predictions. This allows us to run validation tests (compare actual vs. predicted) once validation data becomes available.

Let's now review how to detect deterioration in model performance and how to perform retraining.

Model Re-Training

Model retraining could be either scheduled or triggered based on a condition or an event. There are several ways how this can be achieved, including automation flows, like SAP Build Process Automation, AirFlow (https://airflow.apache.org/) or Kubeflow (https://www.kubeflow.org/), or by simple helper applications deployed by the user.

For the sake of simplicity in our case we use a simple application deployed on SAP Business Technology Platform (e.g. on Cloud Foundry), which can schedule model retraining runs (e.g. if new training data becomes available regularly). This same application can periodically run a check on the model performance via the APIs from mlflow as described in the previous section. And if there is any performance deterioration (e.g. there is a high deviation in predicted vs actuals) a new retraining run can be performed. And as discussed already, during the new training run the model parameters will be logged via hana-ml and mlflow. In addition, the model will be updated in the model repository in SAP HANA Cloud.

This closes the cycle for the example ML pipeline in the first section. Let's put all pieces together to see the big picture.

Key takeaways

In this blog post we showcased an example conceptual and architectural blueprint on how to realize MLOps pipelines using SAP HANA Machine Learning and the open-source framework MLflow. We discussed the end-to-end process and the advantages of integrating these tools to streamline the machine learning lifecycle, especially in the part of model lifecycle management. To see the technical details and example code to achieve the described steps please refer to the second part of the blog series here. Happy reading!

SAP CodeJam HANA ML In Poznań 2024-04 Recap

2024-05-09T16:09:45.370000+02:00

This week, we had the Getting Started with Machine Learning using SAP HANA and Python in Poznań, Poland, kindly hosted by All-for-One Poland and organized by @JulianG

We got participants not only from Poznań, but as well from Polkowice and Warsaw.

I would like to thank Gabriela, Danka, and Julian for the organization!

And, as always, it was great to catch up with local SAP community veterans @DominikTylczyn and @GregMalewski during a spontaneous SAP Stammtisch the night before 🍻

If you want to host the SAP CodeJam on this topic, then please check: https://community.sap.com/t5/technology-blogs-by-sap/quot-getting-started-with-machine-learning-using-sap-hana-quot-as-a-new-sap/ba-p/13574098

Exploring ML Explainability in SAP HANA PAL – Classification and Regression

2024-05-10T08:45:44.156000+02:00

1. Introduction

In this blog post, we will delve into the concept of Machine Learning (ML) Explainabilityin SAP HANA Predictive Analysis Library (PAL) and showcase how HANA PAL has seamlessly integrated this feature into various classification and regression algorithms, providing an effective tool for understanding predictive modeling. ML explainability are integral to achieving SAP's ethical AI goals, ensuring fairness, transparency, and trustworthiness in AI systems.

Upon completing this article, your key takeaways will be:

An understanding of the concept of ML Explainability.
How to utilize HANA PAL for ML Explainability in classification and regression tasks.
Hands-on experience with Python Machine Learning Client for SAP HANA (hana-ml) through an example.

Please note that ML explainability in HANA PAL is not just confined to classification and regression tasks but also extends to time series analysis. We will explore these topics in the following blog post. Stay tuned!

2. ML Explainability

ML Explainability, often intertwined with the concepts of transparency and interpretability, refers to the ability to understand and explain the predictions and decisions made by ML models. It aims to clarify which key features or patterns in the data contribute to specific outcomes.

The necessity for explainability escalates with AI's expanding role in critical sectors of society, where obscure decision-making processes can have significant ramifications. It is essential for fostering trust, advocating fairness, and complying with regulatory standards.

The field of ML explainability is rapidly evolving as researchers in both academia and industry strive to make AI smarter and more reliable. Currently, several techniques are widely employed to enhance the comprehensibility of ML models. These methods are generally divided into two categories: global and local.

Global explainability methods seek to reveal the average behavior of ML models and the overall impact of features. This category encompasses both:

Model-Specific approaches, utilize inherently interpretable models like linear regression, logistic regression, and decision trees, which are designed to be understandable. For instance, feature importance scores in tree-based models assess how often features are used to make decisions within the tree structure.
Model-Agnostic approaches that offer flexibility by detaching the explanation from the model itself, utilizing techniques like permutation importance, functional decomposition, and global surrogate models.

In contrast, local explainability methods focus on explaining individual predictions. These methods include Individual Conditional Expectation, Local Surrogate Models (such as LIME, which stands for Local Interpretable Model-agnostic Explanations), SHAP values (SHapley Additive exPlanations), and Counterfactual Explanations.

3. ML Explainability in PAL

PAL, a key component of SAP HANA's Embedded ML, is designed for data scientists and developers to execute out-of-box ML algorithms within HANA SQL Script procedures. This eliminates the need to export data in another environment for processing, thereby reducing data redundancy and enhancing the performance of analytics applications.

In terms of explainability, PAL offers a variety of robust methods for both classification and regression tasks through its Unified Classification, Unified Regression, and AutoML functions. The model explainability is made accessible via the standard AFL SQL interface and the Python/R machine learning client for SAP HANA (hana_ml and hana.ml.r). By offering both local and global explainability methods, PAL ensures that users can choose the level of detail that best suits their needs.

Local Explainability Methods
- SHAP (SHapley Additive exPlanations values), inspired by game theory, serve as a measure to explain the contribution of each feature towards a model's prediction for a specific instance. PAL implements various SHAP computation methods, including linear, kernel, and tree SHAP, tailored for different functions. For example, in tree algorithms such as Decision Tree (DT), RDT, and HGBT, PAL also provides tree SHAP and Saabaas for computation. PAL also implements kernel SHAP in the context of AutoML pipelines to enhance the interpretability of model outputs.

Global Explainability Methods
- Permutation Importance: A global model-agnostic method that involves randomly shuffling the values of each feature and measuring the impact on the model's performance during the model training phase. A significant drop in performance after shuffling indicates the importance of a feature. For more detailed exploration and examples, one can refer to the blog post on permutation importance.
- Global Surrogate: Within AutoML, after identifying the best pipeline, PAL also provides a Global Surrogate model to explain the pipeline's behavior.
- A native method to tree-based models like RDT and HGBT that quantifies the importance of features based on their frequency of use in splitting nodes within the tree or by the reduction in impurity they achieve.

4. Explainability Example

4.1 Use case and data

In this section, we will use a publicly accessible synthetic recruiting dataset which is derived from an example at the [Centre for Data Ethics and Innovation] as a case study to explore HANA PAL ML explainability. All source code will use Python Machine Learning Client for SAP HANA(hana_ml). Please note that the example code use in this section is only intended to better explain and visualize ML explainability in SAP HANA PAL, not for productive use.

This artificial dataset represents individual job applicants, featuring attributes that relate to their experience, qualifications, and demographics. This same dataset is also used in my another blog post on PAL ML fairness. We have identified the following 13 variables (from Column 2 to Column 14) to be potentially relevant in an automated recruitment setting. The first column includes IDs, and the last one is the target variable, 'employed_yes', hence the model shall predict if an applicant will or shall be employed or not.

ID: ID column
gender : Femail and male, identified as 0 (Female) and 1 (Male)
ethical_group : Two ethic groups, identified as 0 (ethical group X) and 1 (ethical group Y)
years_experience : Number of career years relevant to the job
referred : Did the candidate get referred for this position
gcse : GCSE results
a_level : A-level results
russell_group : Did the candidate attend a Russell Group university
honours : Did the candidate graduate with an honours degree
years_volunteer : Years of volunteering experience
income : Current income
it_skills : Level of IT skills
years_gaps : Years of gaps in the CV
quality_cv : Quality of written CV
employed_yes : Whether currently employed or not (target variable)

A total of 10,000 instances have been generated and the dataset has been divided into two dataframes: employ_train_df (80%) and employ_test_df (20%). The first 5 rows of employ_train_df is shown in Fig.1.

Fig. 1 The first 5 rows of training dataset

4.2 Fitting the Classification ML Model

In the following paragraphs, we will utilize the UnifiedClassification and select the "randomdecisiontree" (RDT) algorithm to showcase the various methods PAL offers for model explainability.

Firstly, we instantiate a 'UnifiedClassification' object "urdt" and train a random decision trees model using a training dataframe employ_train_df. Following this, we employ the score() function to evaluate the model's performance. As shown in Fig.2, the model's overall performance is satisfactory, as indicated by its AUC, accuracy, and precision-recall rates for both classes 0 and 1 in the model report .

>>> from hana_ml.algorithms.pal.unified_classification import UnifiedClassification

>>> features = employ_train_df.columns # obtain the name of columns in training dataset
>>> features.remove('ID') # delete key column name
>>> features.remove('employed_yes') # delete label column name
>>> urdt = UnifiedClassification(func='randomdecisiontree', random_state=2024)
>>> urdt.fit(data=employ_train_df, key="ID", label='employed_yes')
>>> score_result = urdt.score(data=employ_test_df, key="ID", top_k_attributions=20, random_state=1)

>>> from hana_ml.visualizers.unified_report import UnifiedReport
>>> UnifiedReport(urdt).build().display()

Fig.2 Model Report

4.3 Local ML Model Explainability

SHAP values can be easily obtained through the predict() and score() functions. The following code demonstrates the use of the predict() method with 'urdt' to obtain the predictive result "predict_result". Figure 3 displays the first two rows of the results, which include a 'SCORE' column for the predicted outcomes and a 'CONFIDENCE' column representing the probability of the predictions. The 'REASON_CODE' column contains a JSON string that details the specific contribution of each feature value, including "attr" for the attribution name, "val" for the SHAP value, and "pct" for the percentage, which represents the contribution's proportion.

When working with tree-based models, the 'attribution_method' parameter offers three options for calculating SHAP values: the default 'tree-shap', 'saabas' designed for large datasets and can provide faster computation, and 'no' to disable explanation calculation to save computation time as needed.

>>> predict_result = urdt.predict(data=employ_test_df.deselect("employed_yes"), key="ID",
top_k_attributions=20, attribution_method='tree-shap', random_state=1, verbose=True)
>>> predict_result.head(2).collect()

Fig. 3 Predict Result Dataframe

For a more convenient examination of SHAP values, we provide a tool - the force plot, which offers a clear visualization of the impact of individual features on a specific prediction. Taking the first row of the prediction data as an example, we can observe that being female (gender = 0), having 3 years of experience (years_experience = 3), and not being referred (referred = 0), all contribute negatively to the likelihood of being hired. Furthermore, by clicking on the '+' sign in front of each row, you can expand to view the detailed force plot for that particular instance (as shown in Figure 4).

>>> from hana_ml.visualizers.shap import ShapleyExplainer
>>> shapley_explainer = ShapleyExplainer(feature_data=employ_test_df.sort('ID').select(features),
reason_code_data=predict_result.filter('SCORE=1').sort('ID').select('REASON_CODE'))
>>> shapley_explainer.force_plot()

Fig. 4 Force Plot

4.4 Global ML Model Explainability

4.4.1 Permutation Importance Explanations

To compute permutation importance, you need to set the parameter permutation_importance = True when fitting the model. The results of the permutation importance scores can be directly extracted from the importance_ attribute of the UnifiedClassification object, with each feature name suffixed by PERMUTATION_IMP in Fig. 5 and the scores are visualized in Fig. 6.

>>> urdt_per = UnifiedClassification(func='randomdecisiontree', random_state=2024)
>>> urdt_per.fit(data=employ_train_df, key='ID', label='employed_yes', partition_method='stratified',
stratified_column='employed_yes', training_percent=0.8, ntiles=2, permutation_importance=True, permutation_evaluation_metric='accuracy', permutation_n_repeats=10, permutation_seed=2024)
>>> print(urdt_per.importance_.sort('IMPORTANCE', desc=True).collect())

Fig. 5 Permutation Importance Scores

Fig. 6 Bar Plot of Permutation Importance Scores

In Fig. 6, we can see the top three features in terms of importance, are 'years_experience', 'referred', and 'gcse'. This indicates that these features have the most significant impact on the model's predictions when their values are randomly shuffled, leading to a measurable decrease in the model's performance metric.

4.4.2 SHAP Summary Report

The ShapleyExplainer also provides a comprehensive summary report that includes a suite of visualizations such as the beeswarm plot, bar plot, dependence plot, and enhanced dependence plot. Specifically, the beeswarm plot and bar plot offer a global perspective, illustrating the impact of different features on the outcome across the entire dataset.

>>> shapley_explainer.summary_plot()

The beeswarm plot (shownin Fig. 7), which visually illustrate the distribution of SHAP values for features across all instances. Point colors indicate feature value magnitude, with red for larger and blue for smaller values. For instance, the color distribution of 'years_experience' suggests that longer work experience increase hiring chance while the 'years_gaps' spread implies a longer gap negative affects hire likelihood.

Fig. 7 Beeswarm Plot

The order of features in the beeswarm plot is often determined by their importance, as can be more explicitly seen in the bar plot shown in Fig. 8. which ranks features based on the sum of the absolute values of their SHAP values, providing a clear hierarchy of feature importance. For example, the top 3 influential features are 'years_experience', 'referred', and 'ethical_group'.

Fig. 8 Bar Plot

For a more granular understanding of the impact of each feature on the target variable, we can refer to the dependence plot shown in Fig. 9. This plot illustrates the relationship between a feature and the SHAP values. For instance, a dependence plot for 'years_experience' might show that shorter work experience corresponds to negative SHAP values, with a turning point around 6 years of experience, after which the contribution becomes positive. Additionally, the report includes an enhanced dependence plot that examines the relationship between pairs of features. This can provide insights into how feature interactions affect the model's predictions.

Fig. 9 Dependence Plot

4.4.3. Tree-Based Feature Important

The feature important for tree-based models is currently supported by RDT and HGBT in PAL. The feature importance scores can be directly extracted from the importance_ attribute of the 'UnifiedClassification' object "urdt". Below is a code snippet that demonstrates how to obtain and rank these feature importance scores in descending order. The result is shown in Fig. 9 and these scores can then be visualized using a bar plot (as shown in Fig. 10). It is clear that the top three features in terms of importance are 'years_experience', 'income', and 'gcse'.

>>> urdt.importance_.sort('IMPORTANCE', desc=True).collect()

Fig. 10 Feature Importance Scores

Fig. 11 Bar Plot of Feature Importance Scores

Figures 6, 8, and 11 present feature importance scores from 3 different methods, consistently identifying 'years_experience' as the most critical factor. However, the ranking of importance of other features varies across methods. This fluctuation stems from each method's unique approach to assessing feature contributions and the dataset's inherent characteristics. SHAP values are based on a game-theoretic approach that assigns each feature an importance score reflecting its average impact on the model output across all possible feature combinations. In contrast, tree-based models' feature importance scores reflect how frequently a feature is used for data splits within the tree, which may not capture the nuanced interactions between features. Permutation importance, on the other hand, can reveal nonlinear relationships and interactions that are not explicitly modeled. Thus, interpreting the model requires a multifaceted approach, considering the strengths and limitations of each method to inform decision-making.

5. Summary

The blog post introduces ML explainability in SAP HANA PAL, showcasing the use of varous local and global methods like SHAP values, permutation importance, and tree-based feature importance to analyze a synthetic recruiting dataset using Python Client. It emphasizes the necessity for a multifaceted approach to model interpretation, considering the strengths and limitations of each method for informed decision-making. This feature is crucial for SAP's ethical AI objectives, aiming to ensure fairness, transparency, and trustworthiness in AI applications.

4 Lessons from Moms in Risk-Resiliency for Supply Chain Professionals

2024-05-10T13:36:56.541000+02:00

This Sunday is Mother’s Day! It is a beautiful day to show our moms, grandmas, or nanas how much we care and appreciate all the things they do for us. They are our first real-time superheroes, setting an amazing example in every aspect of life. So, why not take a cue from moms and incorporate their best practices into our supply chain strategies?

Let's get inspired and see how we can learn from our very first supply chain managers of all time: our moms!

#1 She is the dictionary definition of risk-resiliency.

When I was a kid, each time I tripped and scraped my knee, like a superhero, my mom would come rushing to my rescue with a napkin and band-aid in hand. Or whenever I mentioned I was hungry or thirsty, she'd pull out a magical purse filled with snacks and water. They are constantly one step ahead, knowing our needs and ready for any kind of shenanigans.

Just like how moms, supply chain professionals need to have their special tools in their toolbox to predict, anticipate, and act with agility toward any issues that may arise. If anyone knows how often the ship hits the fan, it's supply chain managers.

One of the most important tools in the supply chain managers’ toolbox is having full visibility to thrive instead of just surviving in the business world. Breaking down silos and sharing real-time information with each stakeholder in the supply chain operations enables businesses to make informed decisions with an agile reflex.

#2 She leverages Momtifical Analysis capabilities.

Let’s be honest, moms never seem to forget a single detail – this is a dad-approved statement by the way - and they are always ready to give us the guidance and support we need to make the optimal decision in seconds.

Much like moms’ ability to make quick decisions, supply chain professionals need the tools to do the same. Incorporating business AI into operations is like the supermom senses of real-time data and insights, enabling supply chain managers to make quick and resolute decisions and guiding businesses through the complexities.

By harnessing the power of data analytics and machine learning algorithms, organizations can pinpoint and address issues in real time, reduce wastage, and improve overall production quality. They prevent potential equipment malfunctions before they occur, schedule maintenance, and safeguard against costly breakdowns and production chokepoints, which enables them to increase efficiency and maximize profitability.

#3 She can effectively alternate sourcing strategies.

Shopping with mom is like embarking on a wild scavenger hunt through the aisles of multiple stores. She knows where to buy the freshest groceries, and always knows when to get them. Moms’ knowledge of suppliers - including the network amongst the local stores and other moms - is so extensive that she is always prepared and alternates sourcing strategies in case their supplier cannot provide what they need.

In today's digital age, an interconnected, transparent, and versatile network is a must for businesses to establish strong connections with their partners and suppliers along the supply chain. Connecting the partners, suppliers, and logistics data in a single centralized environment eliminates the need for individual integrations with each trading partner. This streamlined and centralized network improves risk management and enhances trust, therefore, enhancing customer satisfaction.

#4 She shapes a greener future.

Back in the day, my schoolbag and pockets were like a treasure trove of empty snack packs and wrappers, as my mom taught me to throw away my trash only in the bin. Fast forward to adulthood, I still find myself hoarding trash in my bags, as her voice still rings in my head. The fact is moms are shaping the future by teaching generations to be eco-conscious by instilling habits like repurposing items and cutting down on waste.

Just as mothers are instrumental in making this change, supply chains, as great contributors to waste in the world, are also essential drivers of change and innovation in creating a more sustainable future for our planet.

By implementing sustainable methodologies and tools, the supply chain can become a valuable asset in not only profitable growth but also positive environmental change.

In a world where we are consistently looking for alternative ways to run supply chains more efficiently, why not draw inspiration from our very first supply chain managers we know: our mothers?

To learn more about enabling a risk-resilient and sustainable supply chain, please join us at the upcoming SAP Sapphire & ASUG Annual Conference, from June 3-5, 2024 in Orlando.

Installing SAPRouter on Linux: A Step-by-Step Guide

2024-05-11T15:55:48.753000+02:00

What is SAP Router ?

SAPRouter is a software component used to secure communication between SAP systems and the internet. Installing SAPRouter on Linux is a crucial step in ensuring secure communication for your SAP landscape. This step-by-step guide will walk you through the installation process.

Prerequisites:

- Linux server (e.g., CentOS, Ubuntu)

- Root access to the server

- SAPRouter software package downloaded from the SAP Support Portal

Step 1: Download SAPRouter:

Download the SAPRouter software package from the SAP Support Portal. Ensure that you download the correct version for your operating system.

Step 2: Extract the SAPRouter Package:

Transfer the downloaded SAPRouter package to your Linux server. Use the following command to extract the package:

tar -xvf saprouter_<version>_linux_x86_64.tar.gz

Step 3: Create a Directory for SAPRouter:

Create a directory to store the SAPRouter files. You can use the following command to create the directory:

mkdir /usr/sap/saprouter

Step 4: Copy SAPRouter Files:

Copy the extracted SAPRouter files to the newly created directory:

cp -R <path_to_extracted_files>/saprouter /usr/sap/saprouter

Step 5: Create a Configuration File:

Create a configuration file named `saprouter.ini` in the `/usr/sap/saprouter` directory. Here's a basic example of the configuration file:

# SAProuter Configuration File

version = 39

httpport = 81

tracefile = /usr/sap/saprouter/saprouter.trc

authid = *

permit = *

Step 6: Set Permissions:

Ensure that the SAPRouter binary and configuration files have the correct permissions:

chmod 755 /usr/sap/saprouter/saprouter

chmod 644 /usr/sap/saprouter/saprouter.ini

Step 7: Start SAPRouter:

Start the SAPRouter using the following command:

/usr/sap/saprouter/saprouter -r -R /usr/sap/saprouter/saprouter.ini

Step 8: Verify SAPRouter Status:

Verify that SAPRouter is running and listening on the specified port (e.g., 81):

netstat -tuln | grep 81

Step 9: Configure Firewall:

Configure your firewall to allow incoming and outgoing traffic on the SAPRouter port (e.g., 81) to ensure proper communication.

Step 10: Configure SAP Systems:

Update the `secinfo` file of your SAP systems to include the SAPRouter details for communication through the SAPRouter.

Overall information:

By following these steps, you can successfully install SAPRouter on your Linux server. This will help secure communication between your SAP systems and the internet, ensuring the integrity and confidentiality of your SAP landscape.

#SAP #SAPRouter #Linux #Installation SAP Young Thinkers #Red Hat Enterprise Linux SUSE Linux Enterprise Server SAP Women in Tech SAP Integration Suite SAP Business Application Studio @Muthumayandi @Subramanian @Sap @sapsapsap

Welcome to the Product Inspiration Series for Industries & CX

2024-05-13T15:20:57.917000+02:00

Curious to learn more about SAP Industries & Customer Experience product innovations and seeing them in action? Then what we've prepared is exactly made for you.

According to the motto "Get Ready and Experience Intelligent CX from SAP" we from the SAP product team are happy to invite you to the Industries & CX Product Inspiration Series 2024. Starting in May we are planning to release multiple deep-dive videos made specifically for you, our dear customers, partners, colleagues and all SAP industries & CX enthusiasts out there.

Each product inspiration video covers a specific domain of the Intelligence CX portfolio by highlighting latest innovations and technologies used e.g. SAP Business AI powered scenarios along with Generative AI and SAP Joule.

By definition each video...

does not contain any slides (zero slides rule),
covers specific innovations as part of the Intelligent CX portfolio,
is made from product and/or industry experts for experts,
typically starts with a short Q&A at the beginning (why, what, how),
consists of deep-dive and/or process demos,
wants to inspire you by showcasing real software,
is produced in high quality to offer a jaw-dropping viewing experience,
helps you to understand the Industries & CX product strategy,
and ultimately is made with love and dedication for you.

And here's the list of the first videos which made it through the finish line...

We hope you like it and you're as excited as we are…

Enjoy and have fun!

Expert Systems in AI: Bridging Human Expertise and Machine Intelligence

2024-05-18T19:46:44.716000+02:00

Artificial Intelligence (AI) is a broad field focused on creating intelligent machines capable of emulating human intelligence. AI expert systems specifically aim to replicate human expertise and decision-making processes within specific domains. These systems leverage knowledge bases, rule sets and inference mechanisms to provide recommendations or make decisions, imitating the problem-solving capabilities of human experts.

Need for an AI Expert System:

AI expert systems offer consistent decision-making and problem-solving processes, ensuring a high level of expertise is consistently applied across various scenarios eliminating bias. This reliability enhances the quality of outcomes.
These systems excel at handling complex problems involving vast amounts of data. They can analyze intricate situations and derive rule-based insights that may be challenging for humans to process manually.
Expert systems automate routine tasks, decision-making processes and problem-solving activities, leading to reduced errors, time savings and increased productivity within organizations.
They also play a crucial role in retaining and preserving valuable knowledge and expertise within an organization, even as experts retire or leave the organization.

Inference Engine - crucial component of AI expert system. An inference engine is the component that processes logical rules and knowledge stored in the knowledge base to make inferences to derive conclusions.
Knowledge Base - core component of an expert system, containing information, facts, rules and heuristics related to the domain of expertise.
Knowledge Acquisition Module - responsible for acquiring knowledge from domain experts and translating it into a format that the expert system can understand.
Explanation Facility - expert systems include an explanation facility that can explain the reasoning process behind the system's recommendations or decisions. This transparency helps users understand why a particular solution or suggestion was provided.

How is an AI expert system different from ML Model?

The key difference between a machine learning model and an expert system is the expert systems rely on explicit domain knowledge and rules provided by experts, while machine learning models learn patterns and relationships directly from data through training. Machine learning models are completely data driven, whereas an expert system is rules, facts, objectives and domain-specific knowledge provided by human experts.

Expert systems are often used when human expertise is critical and rules are well-understood, while machine learning excels in tasks where large amounts of data can provide insights and predictions.

How to build an expert system?

Define the problem statement the expert system will address. Identify the specific tasks, decisions, or problems that the system will help with.
Next is the Knowledge Acquisition from domain experts. This can involve document reviews, expert interviews and extracting rules, facts and heuristics that experts use to make decisions in the domain.
Choose an appropriate knowledge representation method which could be rule-based or case-based (storing and retrieving past cases) or a combination of these approaches.
Develop the inference engine which is responsible for applying the knowledge rules to the input data or user queries to derive conclusions or make recommendations.
Last step is to design a user interface through which users can interact with the expert system.

Can we fine-tune a Large-Language Model to an expert system?

While it's theoretically possible to convert a large-language model into an expert system, the process is complex and the resulting system may not fully capture the richness and nuances of human-provided expert knowledge. It's more practical to use LLMs for tasks where their strengths are like natural language understanding and content generation

Large language models learn implicit knowledge from vast amounts of data and then derive patterns and relationships. This may not capture human-provided rules in an expert system. Converting the learned representations of a large language model to use structured knowledge representation such as rules, facts and ontologies can be crucial and may require significant effort.
Large language models are general-purpose models trained on diverse data, whereas expert systems focus on domain-specific data.
Bigger large language models like GPT-3 have complex internal structures that are not easily interpretable or explainable. Whereas expert systems are designed with transparency and interpretability as key features

Applications of AI Expert Systems:

Medical Diagnosis - AI Expert Systems are used extensively in healthcare for diagnosing diseases, interpreting medical images like X-rays and MRIs to provide decision support to healthcare professionals.
Financial Services - In the financial sector expert systems assist in risk assessment, fraud detection, investment advisory, credit scoring and portfolio management based on market trends and economic indicators.
Manufacturing and Quality Control - In manufacturing and quality control expert systems help optimize the manufacturing processes, monitor quality control, schedule maintenance tasks and improve overall operational efficiency
Education and Training - Expert systems are used in e-learning platforms to create intelligent tutoring systems and personalized educational content tailored to learning styles.
Customer Support and Chatbots - Chatbots powered by expert systems provide personalized customer support, answer queries, troubleshoot technical issues, recommend products or services and handle routine tasks through natural language processing (NLP) and machine learning.

Tools and Platforms:

Java Expert System (Jess)
CLIPS (C Language Integrated Production System)
PyKnow
Prolog

Conclusion:

Overall, AI expert systems leverage domain knowledge, logical reasoning and inference capabilities to emulate human expertise and provide valuable insights and solutions in complex problem-solving scenarios. They are a powerful tool for augmenting human decision-making, improving efficiency and addressing challenges in diverse fields.

Regards,

ArunKumar Balakrishnan