https://raw.githubusercontent.com/ajmaradiaga/feeds/main/scmt/topics/SAP-AI-Launchpad-blog-posts.xml SAP Community - SAP AI Launchpad 2024-05-20T11:11:36.748871+00:00 python-feedgen SAP AI Launchpad blog posts in SAP Community https://community.sap.com/t5/technology-blogs-by-sap/how-sap-s-generative-ai-architecture-redefines-business-applications/ba-p/13580679 How SAP's Generative AI Architecture Redefines Business Applications 2023-12-20T14:42:20+01:00 philipp_herzig https://community.sap.com/t5/user/viewprofilepage/user-id/195453 <P><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/12/01-high-res.png" border="0" /></P><P><SPAN><BR /></SPAN><SPAN>At SAP, we view artificial intelligence (AI) not as a standalone technology, but as an integrated part of our business applications that infuses intelligence natively into business processes. The introduction of generative AI marks a disruptive shift that will transform how businesses operate, make decisions, and optimize processes. Contextualizing this technology in a business setting can unlock unprecedented value for our customers. </SPAN><SPAN>&nbsp;</SPAN><BR /><BR /><A href="https://www.youtube.com/live/6BsvzygS5Hg?si=dSHYyWBWuROpY0OY&amp;t=4358" target="_blank" rel="noopener nofollow noreferrer"><SPAN>At SAP TechEd on November 2-3</SPAN></A><SPAN>, 2023, we announced the introduction of <A href="https://community.sap.com/t5/technology-blogs-by-sap/how-sap-s-generative-ai-hub-facilitates-embedded-trustworthy-and-reliable/ba-p/13596153" target="_self">generative AI hub</A>, expected to be generally available by end of 2023, and <A href="https://news.sap.com/2024/04/sap-hana-cloud-vector-engine-ai-with-business-context/" target="_self" rel="noopener noreferrer">SAP HANA Cloud Vector Engine</A>, which will be available by March 2024. 
</SPAN><SPAN>&nbsp;</SPAN><BR /><SPAN>This blog post is the first of a series that dives deeper into various aspects of SAP’s approach to generative AI and its technical underpinnings.</SPAN><SPAN><BR /></SPAN><SPAN>&nbsp;</SPAN><BR /><STRONG><SPAN>SAP’s Generative AI Strategy&nbsp;</SPAN></STRONG><SPAN>&nbsp;</SPAN><BR /><BR /><SPAN>SAP believes that generative AI is one of the most consequential technologies of our time, and SAP is committed to helping our clients leverage generative AI to create tremendous business value. SAP specializes in business AI: generative AI that is relevant since it’s embedded in your business applications and processes from day one; that is reliable since we train, ground, and adapt generative AI on your business data and context; and that is responsible by design, following SAP’s rigorous AI ethics, privacy, and security practices.</SPAN><SPAN>&nbsp;</SPAN><BR /><BR /><SPAN>SAP is releasing generative AI-powered capabilities and features natively into our cloud business software, including Joule, our generative AI copilot that transforms how people interact with their processes and data. New capabilities on SAP Business Technology Platform, like a vector database engine in SAP HANA and generative AI hub, allow customers and partners to create generative AI-powered extensions and apps. We foster an ecosystem of strategic partnerships to leverage the best generative AI models available.</SPAN><SPAN>&nbsp;</SPAN><BR /><BR /><SPAN>We also believe that while AI is a useful tool, humans still play a key role in business decision-making and reasoning. 
Therefore, we deliberately design our AI solutions following our AI ethics policy and to keep humans in the loop to carefully review AI-generated information.</SPAN><SPAN>&nbsp;</SPAN><BR /><BR /><STRONG><SPAN>SAP’s Architecture for Generative AI</SPAN></STRONG><SPAN>&nbsp;</SPAN><BR /><BR /><SPAN>With more than 75% of the world's transaction revenue touching an SAP software system, SAP is uniquely positioned to integrate AI into business processes to create value. Already, over 24,000 SAP cloud customers actively use more than 130 AI scenarios in our applications and SAP BTP.&nbsp;</SPAN><SPAN>&nbsp;</SPAN><BR /><BR /><SPAN>The recent advancements in AI have opened vast opportunities that go far beyond what has been possible until today. At SAP, we are leveraging our deep business process knowledge to natively embed generative AI across our business application portfolio. Our teams are developing a wide range of </SPAN><STRONG><SPAN>use cases</SPAN></STRONG><SPAN> from question answering, text generation, classification, and summarization to code generation and are exploring the potential of emerging paradigms such as large language model agents.&nbsp;</SPAN><SPAN>&nbsp;</SPAN></P><P><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/12/02-3.png" border="0" /></P><P><BR /><I><SPAN>Figure 1: Generative AI architecture enables fast and compliant productization of use cases with the most common generative AI models while ensuring compliance with enterprise standards.&nbsp; </SPAN></I><SPAN>&nbsp;</SPAN><BR /><BR /><SPAN>Each generative AI model yields unique qualities and strengths. To meet customers’ needs for their business, SAP has made a strategic move to provide enterprise-grade </SPAN><STRONG><SPAN>access</SPAN></STRONG><SPAN> to the most common models through its generative AI hub as part of SAP AI Core as illustrated in Figure 1. 
This includes leading cloud vendors as well as SAP-managed third-party models deployed on SAP’s own infrastructure for highest compliance standards. Moreover, the rapid evolution of open-source models has been impressive, and we're committed to keeping pace with this innovation by making selected open-source models available.</SPAN><SPAN>&nbsp;</SPAN><BR /><BR /><SPAN>However, even the most capable generative AI models risk 'hallucinations' or generate outputs that don't align with reality. At SAP, we are grounding our use cases in the vast amount of high-quality, real-world business and industry data that is at the core of our enterprise solutions. By leveraging techniques such as prompt engineering, in-context learning, and retrieval augmented generation (RAG), we are augmenting large language models (LLMs) with our customers’ data and generating results that are contextual and relevant to their specific business environment. The generative AI hub is </SPAN><STRONG><SPAN>orchestrating</SPAN></STRONG><SPAN> data flows towards our data platform, a key component to enable our development teams to scale development at low cost and provide a rich data model across business processes.</SPAN><SPAN>&nbsp;</SPAN><BR /><BR /><SPAN>For this, we are also working on bringing </SPAN><STRONG><SPAN>vector database</SPAN></STRONG><SPAN> capabilities to SAP HANA Cloud to store embeddings, a form of semantic data representation capable of translating high-dimensional data into a lower-dimensional space.</SPAN> <SPAN>This is particularly useful for making sense of complex and unstructured data, such as text, images, or user behavior, by capturing the essence of the data in a condensed form.&nbsp;</SPAN><SPAN>&nbsp;</SPAN><BR /><BR /><SPAN>We also provide developers with a playground system and libraries to explore and build on top of these technology components. 
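The retrieval-augmented generation pattern described above (embed, retrieve by similarity, then ground the prompt) can be sketched in a few lines. This is an illustrative toy only: the three-dimensional embeddings, the toy corpus, and the prompt template are hypothetical stand-ins, not SAP APIs.

```python
import math

def cosine_similarity(a, b):
    # Similarity of two embedding vectors; 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_embedding, documents, top_k=2):
    # Return the top_k documents whose stored embeddings are closest to the query.
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(query_embedding, d["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]

def build_rag_prompt(question, retrieved_docs):
    # Ground the LLM by injecting retrieved business context into the prompt.
    context = "\n".join(d["text"] for d in retrieved_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Toy corpus with made-up 3-dimensional embeddings; a real system would use
# an embedding model and a vector store such as SAP HANA Cloud.
docs = [
    {"text": "Invoice 4711 is overdue by 12 days.", "embedding": [1.0, 0.0, 0.0]},
    {"text": "The cafeteria menu changes weekly.", "embedding": [0.0, 1.0, 0.0]},
]
prompt = build_rag_prompt(
    "Which invoices are overdue?",
    retrieve([0.9, 0.1, 0.0], docs, top_k=1),
)
```

In a production setup the embeddings would come from an embedding model, the corpus would live in a vector store, and the assembled prompt would be sent to an LLM; the sketch only shows the data flow.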
At SAP, more than 35,000 employees are already using this playground system and have generated over one million prompts.</SPAN><SPAN>&nbsp;</SPAN><BR /><BR /><SPAN>Moreover, we recognize the importance of </SPAN><STRONG><SPAN>trust</SPAN></STRONG><SPAN> and transparency in the context of generative AI. Our technology enables us to implement strong authorization, audit and monitoring capabilities.&nbsp; We apply tenant segregation and zero data retention rules including no third-party training on our customers’ data as well as our strict data privacy policy. We understand that AI models have the potential to generate content that may not always align with ethical or quality standards. For this, we are building robust measures such as</SPAN> <SPAN>prompt templates and validation, and content management to filter these outputs so that they align with common standards.</SPAN><SPAN>&nbsp;</SPAN><BR /><BR /><SPAN>Our approach is not only ensuring enterprise standards, but it is also the technical backbone of SAP's commercial framework. Rather than imposing a one-size-fits-all solution, we provide our customers the flexibility to consume generative AI capabilities in SAP’s business applications that address their specific needs and objectives. </SPAN><SPAN>&nbsp;</SPAN><BR /><BR /><SPAN>Building on these technical underpinnings, we recently </SPAN><A href="https://news.sap.com/2023/09/joule-new-generative-ai-assistant/" target="_blank" rel="noopener noreferrer"><SPAN>announced </SPAN><STRONG><SPAN>Joule</SPAN></STRONG></A><SPAN> as SAP's gen-AI powered digital assistant that is embedded across our suite of business applications. The primary aim of Joule is to enhance productivity for our customers’ users by streamlining their daily tasks and decision-making processes. Recognizing the potential of generative AI, we are integrating technical components into Joule to augment conversational capabilities. 
These include analytics, text editing, document Q&amp;A, and generative capabilities through a federated architecture to enable contextual and relevant user interaction. </SPAN><SPAN>&nbsp;</SPAN><BR /><BR /><SPAN>We recognize the extensive impact of AI on decision-making, fairness, transparency, privacy, and dignity. It is this recognition that led us, in 2018, to establish a robust </SPAN><A href="https://www.sap.com/products/artificial-intelligence/ai-ethics.html?pdf-asset=a8431b91-117e-0010-bca6-c68f7e60039b&amp;page=1" target="_blank" rel="noopener noreferrer"><STRONG><SPAN>ethical framework</SPAN></STRONG></A><SPAN> guiding our AI development and operations. Our ethical principles are anchored in a commitment to human rights, designing for people, striving for bias-free business, transparency, upholding quality and safety, data protection, and engaging with broader societal challenges and regulators.</SPAN><SPAN>&nbsp;</SPAN><BR /><BR /><STRONG><SPAN>Conclusion</SPAN></STRONG><SPAN>&nbsp;</SPAN><BR /><BR /><SPAN>By integrating generative AI into our business processes, applications, and SAP BTP, we aim to unlock unprecedented value for our customers. Our strategy is centered on ensuring trust, control, and business readiness to harness the full potential of generative AI. </SPAN><SPAN>&nbsp;</SPAN><BR /><BR /><SPAN>Looking ahead, we are committed to keeping pace with the rapidly evolving landscape and will continue to explore and innovate in this exciting new field of AI. 
In the coming 6 months, we plan to further enhance our platform and embed Joule in many of our business applications.</SPAN> <SPAN>We are also working on many more generative AI innovations in our business applications from writing assistants generating job descriptions to intelligent process recommendations.</SPAN><SPAN>&nbsp;</SPAN><BR /><BR /><SPAN>We look forward to sharing more insights as part of this blog series and invite you to join us on this journey exploring the transformative potential of generative AI in business processes. Stay tuned for the next part of this series: we will deep dive into <A href="https://community.sap.com/t5/technology-blogs-by-sap/how-sap-s-generative-ai-hub-facilitates-embedded-trustworthy-and-reliable/ba-p/13596153" target="_self">SAP's generative AI hub</A>.</SPAN></P><P>Explore and learn more in our related publications:&nbsp;</P><UL><LI><A class="" title="https://community.sap.com/t5/technology-blogs-by-sap/how-sap-s-generative-ai-hub-facilitates-embedded-trustworthy-and-reliable/ba-p/13596153" href="https://community.sap.com/t5/technology-blogs-by-sap/how-sap-s-generative-ai-hub-facilitates-embedded-trustworthy-and-reliable/ba-p/13596153" target="_blank">How SAP’s Generative AI Hub facilitates embedded, trustworthy, and reliable AI</A>&nbsp;by <A href="https://community.sap.com/t5/user/viewprofilepage/user-id/217337" target="_self">Andreas Roth</A></LI><LI><A class="" title="https://community.sap.com/t5/technology-blogs-by-sap/boosting-benchmarking-for-reliable-business-ai/ba-p/13650962" href="https://community.sap.com/t5/technology-blogs-by-sap/boosting-benchmarking-for-reliable-business-ai/ba-p/13650962" target="_blank">Boosting Benchmarking for Reliable Business AI</A>&nbsp;by <A href="https://community.sap.com/t5/user/viewprofilepage/user-id/1396220" target="_self">Hao Zhi Leong</A></LI></UL><P>-------<BR /><EM>Co-authored by Dr. Philipp Herzig and Dr. 
Jan Dumke</EM></P> 2023-12-20T14:42:20+01:00 https://community.sap.com/t5/technology-blogs-by-sap/generative-ai-hub-out-now/ba-p/13580462 Generative AI Hub - OUT NOW! 2023-12-21T13:39:32+01:00 HadiHares https://community.sap.com/t5/user/viewprofilepage/user-id/8085 <P>At TechEd 2023, we previewed the generative AI hub in SAP AI Core and SAP AI Launchpad and the excitement went through the roof. <span class="lia-unicode-emoji" title=":rocket:">🚀</span><BR /><BR />&nbsp;</P><P><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/12/Bildschirmfoto-2023-12-20-um-01.13.52.png" border="0" /></P><P>&nbsp;</P><P class="">Generative AI Hub Announcement at TechEd 2023</P><P><BR />&nbsp;</P><H3 id="toc-hId-1093843140">I am now thrilled to announce that the generative AI hub is released to customers.</H3><P><BR />&nbsp;<BR /><BR />We worked quite hard and successfully integrated generative AI capabilities into our SaaS solution offerings (e.g. SAP SuccessFactors) with the generative AI hub as their foundation. More and more embedded generative AI capabilities will be shipped across 2024, all leveraging the functionality of the generative AI hub. We can now also offer the same generative AI hub to developers and enable them to create generative AI extensions on SAP BTP for their SAP solutions like SAP S/4HANA, SAP SuccessFactors, etc.<BR /><BR />The generative AI hub gives you instant access to a broad range of large language models (LLMs) from different providers, such as GPT-4 by Azure OpenAI or the open-source Falcon-40b. With this access, you’ll be able to orchestrate multiple models, whether programmatically via SAP AI Core or via the playground within SAP AI Launchpad.<BR /><BR />We truly believe that providing a safe and secure playground to every employee will result in an enormous boost in productivity, creativity, and efficiency, ultimately improving people's lives.
This has been our own learning within SAP, and now we are also making it available to customers for their employees.</P><P><IMG src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2023/12/GenAI_Hub_demo_short_720p_15fps-2.gif" border="0" /></P><P>&nbsp;</P><P class="">Generative AI Hub Playground Demo</P><P><BR />The generative AI hub provides tooling for prompt engineering, experimentation, and other capabilities to accelerate the development of BTP applications infused with generative AI, in a secure and trusted way. AI development teams can submit a prompt to multiple LLMs, compare the generated outcomes to identify the best-suited model for the task, and gain greater control and transparency with the built-in prompt history.</P><H3 id="toc-hId-897329635"><U>Availability</U></H3><P><BR />The generative AI hub is now generally available via SAP AI Core for productive use.<BR /><BR />A dedicated trial environment is planned in the upcoming weeks.<BR /><BR /><SPAN class="">In addition, all task-specific AI capabilities in SAP AI Core &amp; SAP AI Launchpad are getting further enhancements and are planned to take advantage of the generative AI hub.</SPAN><SPAN class="">&nbsp;</SPAN></P><H3 id="toc-hId-700816130"><U>License</U></H3><P><BR />The generative AI hub is available with consumption-based pricing within <A href="https://help.sap.com/docs/btp/sap-business-technology-platform/what-is-consumption-based-commercial-model" target="_blank" rel="noopener noreferrer">CPEA</A>.
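The prompt-comparison workflow described above (one prompt sent to several models, outcomes compared side by side) can be sketched generically. The model clients below are hypothetical stand-ins for whatever client you use to reach a deployed model; they are not SAP AI Core APIs.

```python
def compare_models(prompt, model_clients):
    # Send one prompt to every registered model client and collect the
    # generated outputs keyed by model name, for side-by-side comparison.
    return {name: client(prompt) for name, client in model_clients.items()}

# Toy clients standing in for real LLM deployments behind the hub.
clients = {
    "model-a": lambda p: f"[model-a] {p.upper()}",
    "model-b": lambda p: f"[model-b] {len(p)} characters received",
}
outputs = compare_models("Summarize this invoice.", clients)
for name, text in outputs.items():
    print(f"{name}: {text}")
```

A real comparison harness would additionally record each prompt and response in a history store, which is the role the built-in prompt history plays in the playground.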
Customers with existing cloud credits can directly consume LLMs upon entitling SAP AI Core in their BTP subaccount, without the need to manage different commercial frameworks from various LLM providers.<BR /><BR />For our partners, we are working on a Test, Demonstration and Development (TDD) license and targeting late Q1 ’24.</P><P>UPDATE:</P><P>The TDD license is now available for our partners!</P><H3 id="toc-hId-504302625"><U>SAP's Generative AI Architecture</U></H3><P><BR />For a deeper dive into how we shape our generative AI architecture, please refer to this <A href="https://blogs.sap.com/2023/12/20/how-saps-generative-ai-architecture-redefines-business-applications/" target="_blank" rel="noopener noreferrer">blog post</A>.</P><H3 id="toc-hId-307789120"><U>Learn more</U></H3><P><BR /><A href="https://roadmaps.sap.com/board?range=FIRST-LAST&amp;FT=GEN_AI#Q4%202023" target="_blank" rel="noopener noreferrer">Roadmap for Generative AI overall</A><BR /><BR /><A href="https://roadmaps.sap.com/board?range=FIRST-LAST&amp;PRODUCT=73554900100800003641&amp;PRODUCT=73555000100800003283#Q4%202021" target="_blank" rel="noopener noreferrer">Roadmap for generative AI hub in SAP AI Core &amp; SAP AI Launchpad</A><BR /><BR /><A href="https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core?region=all" target="_blank" rel="noopener nofollow noreferrer">SAP Discovery Center</A><BR /><BR /><A href="https://help.sap.com/docs/sap-ai-core?locale=en-US" target="_blank" rel="noopener noreferrer">Documentation</A><BR /><BR /><A href="https://developers.sap.com/tutorials/ai-core-generative-ai.html" target="_blank" rel="noopener noreferrer">Tutorial for generative AI Hub</A><BR /><BR /><A href="https://discovery-center.cloud.sap/refArchDetail/ref-arch-open-ai" target="_blank" rel="noopener nofollow noreferrer">Reference Architecture for Generative AI in SAP BTP</A><BR /><BR />Make sure to also follow our AI tags on SAP Community and start interacting with your peers.<BR /><BR 
/></P><UL><LI><A href="https://community.sap.com/topics/machine-learning" target="_blank">Artificial Intelligence / Machine Learning</A></LI><LI><A href="https://community.sap.com/topics/ai-core-artificial-intelligence" target="_blank">SAP AI Core</A></LI><LI><A href="https://community.sap.com/topics/ai-launchpad" target="_blank">SAP AI Launchpad</A></LI><LI><A href="https://community.sap.com/topics/artificial-intelligence" target="_blank">SAP AI Services</A></LI></UL><P>&nbsp;</P> 2023-12-21T13:39:32+01:00 https://community.sap.com/t5/technology-blogs-by-members/understanding-generative-ai-core-concepts/ba-p/13574898 Understanding Generative AI: Core Concepts 2024-01-10T15:04:25+01:00 rishi_singh3 https://community.sap.com/t5/user/viewprofilepage/user-id/675993 Dear All,<BR /> <BR /> How are you?<BR /> <BR /> It is nearly impossible nowadays to read any technology news and not hear of Generative AI. Let us first understand it from the very beginning and the basics.<BR /> <BR /> We will cover the basic concepts first and then look at how SAP has incorporated them into its business model.<BR /> <BR /> Before we do that, let us take a good look at SAP's capabilities in the area of AI. It is important to understand that SAP has long incorporated AI capabilities into its products and has now enabled GenAI scenarios as well.<BR /> <BR /> Have a good look at the overview below, and then we will start with the basics.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2024/01/Screenshot-2024-01-12-012655.jpg" /></P><BR /> <B>What is “intelligence”? 
</B>The ability to accomplish complex goals.<BR /> <BR /> <IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2024/01/Picture1-18.png" height="121" width="189" /><BR /> <BR /> <B>What is “</B><B>artificial intelligence</B><B>”? </B>Intelligence exhibited by non-biological systems.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2024/01/Picture2-9.png" height="115" width="190" />Now let us understand the <STRONG>Approaches to Artificial Intelligence</STRONG><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2024/01/Screenshot-2024-01-10-184428.jpg" />Now as we have a basic understanding of AI concepts let us talk about <B>Generative AI.</B></P><BR /> Below are some of the quick facts:<IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2024/01/Screenshot-2024-01-10-184809.jpg" height="131" width="304" /><BR /> <BR /> Now let us look at the basic definitions:<BR /> <UL><BR /> <LI><B>Foundation models </B>are neural networks trained on large volumes of data using self-supervised learning that can be applied to many tasks.</LI><BR /> <LI><B>Large language models (LLMs) </B>are a subcategory of foundation models for text incl. computer code.</LI><BR /> <LI><B>Generative AI </B>can create novel output in text, images, sound, or video based on simple user input (called “prompts”).</LI><BR /> </UL><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2024/01/Picture3-11.png" height="213" width="303" /></P><BR /> <P style="overflow: hidden;margin-bottom: 0px">Now let us look at <B>Generative AI at SAP</B></P><BR /> <P style="overflow: hidden;margin-bottom: 0px">Generative AI enables completely new capabilities in business software. 
With generative AI, we can create new content, summarize complex information, write computer code, and more. It is a new generation of AI that can reason through business problems and make suggestions that were previously unthinkable. Customers can expect generative AI use cases that bring tangible value across the SAP portfolio.</P><BR /> The timeline below shows the evolution of SAP products in terms of technology and innovation.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2024/01/Screenshot-2024-01-10-185403.jpg" /></P><BR /> Generative AI is not all good and rosy; it has its fair share of limitations and hiccups.<BR /> <BR /> <B>The limitations of generative AI models</B><BR /> <UL><BR /> <LI><B>Hallucination</B>. Large language models can generate plausible-sounding yet false answers.</LI><BR /> <LI><B>Up-to-date and specific knowledge</B>. The knowledge of a generative AI model is frozen in time from when it was trained. In contrast to general world knowledge, business information changes quickly.</LI><BR /> <LI><B>Inconsistent math abilities and limited notion of time</B>. Although improving, large language models are not calculators. 
Other kinds of foundation models could conceivably handle forecasting and math.</LI><BR /> </UL><BR /> However, several methods help make generative AI more reliable and efficient.<BR /> <UL><BR /> <LI><B>Prompt Engineering -</B> Provide more information to describe the task.</LI><BR /> <LI><B>Retrieval Augmented Generation (RAG) - </B>Extend the model with external domain knowledge by retrieving and injecting information via embeddings (numeric vectors).</LI><BR /> <LI><B>Orchestration Tools - </B>Agents, functions, plug-ins, prompt and model chaining, memory.</LI><BR /> </UL><BR /> Now let us try to understand how SAP has incorporated generative AI into its business model.<BR /> <BR /> <B>Extending SAP applications with generative AI</B><BR /> <BR /> SAP provides an entry point to generative AI with SAP AI Launchpad, a BTP application that can be found in the SAP Discovery Center.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2024/01/Picture5-5.png" height="140" width="213" /></P><BR /> SAP AI Launchpad is a multitenant software as a service (SaaS) application in the SAP Business Technology Platform. Customers and partners can use SAP AI Launchpad to manage AI use cases (scenarios) across multiple instances of AI runtimes (such as SAP AI Core). 
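The first of the methods listed above, prompt engineering, simply means enriching the prompt with an explicit task description and context before calling the model. A minimal, model-agnostic sketch (the template wording and example values are illustrative):

```python
def engineer_prompt(task, context, question):
    # Combine a task description, business context, and the user question
    # into one explicit prompt: the essence of basic prompt engineering.
    return (
        f"Task: {task}\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Answer concisely and only from the given context."
    )

prompt = engineer_prompt(
    task="You are an assistant for procurement analysts.",
    context="Purchase order 9001 was delivered 12 days late.",
    question="Was purchase order 9001 delivered on time?",
)
```

The resulting string is what gets sent to the LLM; adding the task and context up front is what steers a general-purpose model toward the specific business question.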
SAP AI Launchpad also provides generative AI capabilities via the generative AI hub.<BR /> <BR /> The diagram below helps to explain SAP's capabilities in generative AI.<BR /> <UL><BR /> <LI>The top layer is SAP AI Launchpad, which provides various tools for prompt engineering and management.</LI><BR /> <LI>The bottom layer shows the trust and control measures that SAP has built around the foundation models so that they are reliable for business use.</LI><BR /> <LI>SAP is introducing a vector engine within SAP HANA Cloud so that applications can understand, extract, or insert data using natural language.</LI><BR /> </UL><BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2024/01/Picture4-7.png" /></P><BR /> Now let us talk about <B>generative AI business use cases</B>:<BR /> <BR /> <UL><BR /> <LI><B>Just Ask feature for SAP Analytics Cloud</B></LI><BR /> <LI><B>Document Information Extraction in the SAP Transportation Management application</B></LI><BR /> <LI><B>Joule – the copilot that truly understands your business</B></LI><BR /> </UL><BR /> The diagram below shows some more use cases.<BR /> <P style="overflow: hidden;margin-bottom: 0px"><IMG class="migrated-image" src="https://community.sap.com/legacyfs/online/storage/blog_attachments/2024/01/Picture6-6.png" /></P><BR /> I hope this sparks more interest in the areas of AI and GenAI. Technology is ever-evolving, and GenAI seems to have a long road ahead.<BR /> <BR /> Thanks to all for reading the article; please do leave feedback if possible.<BR /> <BR /> Happy Learning !! 
2024-01-10T15:04:25+01:00 https://community.sap.com/t5/technology-blogs-by-members/unleashing-the-power-of-sap-ai-launchpad-amp-sap-ai-core-create-your-first/ba-p/13583759 Unleashing the Power of SAP AI Launchpad & SAP AI Core: Create Your First AI Project 2024-01-28T17:45:07.242000+01:00 patrosrinivasa https://community.sap.com/t5/user/viewprofilepage/user-id/44687 <P><STRONG>Introduction:</STRONG></P><P>Hello all! Here I am with another blog on SAP AI Launchpad &amp; SAP AI Core. In this blog, I will share how you can create your first AI project with these trending AI technologies on the SAP BTP platform.&nbsp;</P><P><SPAN>As many of you already know, SAP AI Core is a service in SAP BTP that allows us to confidently deploy and manage artificial intelligence models that natively integrate with SAP applications. In this blog, I am going to show you how to create an application with SAP AI Launchpad and SAP AI Core by connecting a GitHub repository.</SPAN></P><P><SPAN>Let's start.</SPAN></P><P><SPAN>Before going further, please complete the prerequisites below; otherwise you won't be able to see the desired output.<span class="lia-unicode-emoji" title=":winking_face:">😉</span></SPAN></P><P><STRONG>Prerequisites:</STRONG></P><OL><LI><SPAN>SAP BTP global account access</SPAN></LI><LI><SPAN>Accounts set up for SAP AI Launchpad &amp; SAP AI Core.</SPAN></LI><LI><SPAN>GitHub account access.</SPAN></LI></OL><P><SPAN>&nbsp;</SPAN><SPAN>Note: &nbsp;If you are using a GitHub account hosted by your organization, it must be external facing (not behind a firewall).</SPAN></P><P><SPAN>&nbsp;</SPAN><SPAN>To connect GitHub to SAP AI Core, we need to create a GitHub repository and a GitHub access token. A GitHub repository is where we store our AI workflows and other files. Each workflow can be used in various AI use cases. 
A GitHub access token is a credential that allows SAP AI Core to access our GitHub repository. Below are screenshots showing how to generate an access token and a workflow pipeline.</SPAN><SPAN>&nbsp;</SPAN></P><P><SPAN><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="image1.PNG" style="width: 881px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/53890iC067B8DD33E6995A/image-size/large?v=v2&amp;px=999" role="button" title="image1.PNG" alt="image1.PNG" /></span></SPAN></P><P>&nbsp;</P><P><SPAN>Once we have created a GitHub repository and a GitHub access token, we establish the bridge between SAP AI Core and our GitHub repository by clicking the Add button and providing the repository URL and credentials.</SPAN></P><P><STRONG>Before creating the repository:</STRONG></P><P><SPAN><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image2.PNG" style="width: 902px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/53891iCC58A4C3E53DC6B1/image-size/large?v=v2&amp;px=999" role="button" title="image2.PNG" alt="image2.PNG" /></span></SPAN></P><P>&nbsp;</P><P><STRONG>After creating the repository:</STRONG></P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image3.PNG" style="width: 910px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/53892i968CEFC67B69810E/image-size/large?v=v2&amp;px=999" role="button" title="image3.PNG" alt="image3.PNG" /></span></P><P>&nbsp;</P><P><SPAN>In case you didn't store your access token locally or forgot it, don't worry: you can regenerate it as shown in the screen below. 
All you have to do is click the Regenerate Token button.</SPAN></P><P><SPAN><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image4.PNG" style="width: 883px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/53893i4E0829EA66F50C29/image-size/large?v=v2&amp;px=999" role="button" title="image4.PNG" alt="image4.PNG" /></span></SPAN></P><P><SPAN>After following all the steps explained above, you will see the screen below. If you have reached it, you are doing great and have completed half of the process&nbsp;<span class="lia-unicode-emoji" title=":smiling_face_with_smiling_eyes:">😊</span></SPAN></P><P><SPAN><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image 5 and 6.png" style="width: 884px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/53894iF28662774A34326D/image-size/large?v=v2&amp;px=999" role="button" title="image 5 and 6.png" alt="image 5 and 6.png" /></span>&nbsp;</SPAN></P><P><SPAN>After creating the workflow in GitHub, we jump into SAP AI Launchpad to create an application that syncs workflows with SAP AI Core.</SPAN></P><P><SPAN><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image7.PNG" style="width: 718px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/53898i200EF4C865B42AF9/image-size/large?v=v2&amp;px=999" role="button" title="image7.PNG" alt="image7.PNG" /></span></SPAN></P><P><SPAN>Once the application is ready, we can check our scenario (name, executable ID, description, and other details) under the ML Operations node in SAP AI Launchpad.</SPAN></P><P><SPAN><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image8.PNG" style="width: 781px;"><img 
src="https://community.sap.com/t5/image/serverpage/image-id/53899iBA77CF70AD8FF8BF/image-size/large?v=v2&amp;px=999" role="button" title="image8.PNG" alt="image8.PNG" /></span></SPAN></P><P><SPAN>To run the executable, all we have to do is create a configuration. The&nbsp;configuration name&nbsp;value should be a descriptive name of your choice. All other values must be set as shown in the image below, in line with your executable identifiers.</SPAN><SPAN>&nbsp;</SPAN></P><P><SPAN>Note: this step is needed only once. Once a configuration is created, you can reuse it again and again.</SPAN></P><P>To execute your workflow, click&nbsp;<STRONG>Create Execution</STRONG>&nbsp;on the configuration details page.</P><P><SPAN><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image9.PNG" style="width: 781px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/53900iB462DA867BC3E4D6/image-size/large?v=v2&amp;px=999" role="button" title="image9.PNG" alt="image9.PNG" /></span></SPAN></P><P>&nbsp;</P><P><SPAN>After updating our workflow with another modification, we can see the sync directly in the SAP AI Launchpad logs section. 
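For orientation, the workflows stored in the GitHub repository are Argo-style YAML templates that SAP AI Core syncs and exposes as executables. The sketch below follows the general shape used in SAP's public tutorials, but every name, label, and annotation value here is illustrative; check the official SAP AI Core documentation for the exact keys required.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: my-first-pipeline                 # must be unique within the synced repository
  labels:
    scenarios.ai.sap.com/id: "my-scenario"        # groups executables into a scenario
    executables.ai.sap.com/id: "my-first-pipeline" # the executable ID a configuration refers to
    ai.sap.com/version: "1.0"
  annotations:
    scenarios.ai.sap.com/name: "My Scenario"
    executables.ai.sap.com/name: "My First Pipeline"
spec:
  entrypoint: main
  templates:
    - name: main
      container:
        image: docker.io/library/alpine:3
        command: ["sh", "-c"]
        args: ["echo 'Hello from SAP AI Core'"]
```

The executable ID label is the link between this file and the configuration created above: the configuration names the executable, and the application sync makes new versions of the workflow visible in the logs.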
This is because the configuration fetches its information based on the executable ID&nbsp;that we defined in our workflow.</SPAN></P><P><SPAN><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image10.PNG" style="width: 768px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/54037i3605052104AC0A94/image-dimensions/768x602?v=v2" width="768" height="602" role="button" title="image10.PNG" alt="image10.PNG" /></span>&nbsp;</SPAN><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image11.PNG" style="width: 763px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/54038iA81B85679D915524/image-dimensions/763x360?v=v2" width="763" height="360" role="button" title="image11.PNG" alt="image11.PNG" /></span></P><P><SPAN>If you see the above message under the Execution Logs section, your workflow is working perfectly. Job well done!</SPAN></P><P><SPAN>&nbsp;</SPAN><STRONG>Conclusion:</STRONG></P><P><SPAN>&nbsp;</SPAN><SPAN>We have now completed a project with the help of SAP AI Launchpad, SAP AI Core, and GitHub. SAP AI Launchpad is a multitenant SaaS application on SAP Business Technology Platform. Customers and partners can use SAP AI Launchpad to manage AI use cases (scenarios) across multiple instances of AI runtimes. Here I have shown one such runtime environment, SAP AI Core. We can explore further generative AI capabilities via the generative AI hub, available in SAP AI Launchpad.</SPAN><SPAN>&nbsp;</SPAN></P><P><SPAN>Hope you liked it. Thanks for reading this blog. 
Happy learning!!</SPAN></P><P><STRONG>Reference Links:</STRONG></P><P><SPAN><A href="https://help.sap.com/docs/sap-ai-core " target="_self" rel="noopener noreferrer">https://help.sap.com/docs/sap-ai-core </A>&nbsp;</SPAN></P><P><SPAN><A href="https://community.sap.com/topics/ai-core-artificial-intelligence" target="_self">https://community.sap.com/topics/ai-core-artificial-intelligence</A>&nbsp;</SPAN></P><P><SPAN><A href="https://help.sap.com/docs/ai-launchpad/sap-ai-launchpad/what-is-sap-ai-launchpad?q=sap%20ai%20launchpad" target="_self" rel="noopener noreferrer">https://help.sap.com/docs/ai-launchpad/sap-ai-launchpad/what-is-sap-ai-launchpad?q=sap%20ai%20launchpad</A>&nbsp;</SPAN></P><P><SPAN><A href="https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/what-is-sap-ai-core" target="_self" rel="noopener noreferrer">https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/what-is-sap-ai-core</A>&nbsp;</SPAN></P><P>&nbsp;</P> 2024-01-28T17:45:07.242000+01:00 https://community.sap.com/t5/technology-blogs-by-sap/release-navigator-for-sap-btp-what-s-new-in-q4-2023-and-what-s-next-in-2024/ba-p/13593034 Release Navigator for SAP BTP What's New in Q4 2023 and What's Next in 2024 2024-02-05T15:50:27.237000+01:00 chriskollhed https://community.sap.com/t5/user/viewprofilepage/user-id/6181 <P><span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="1 Header.png" style="width: 906px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/61165i699F74F35AC303F5/image-size/large?v=v2&amp;px=999" role="button" title="1 Header.png" alt="1 Header.png" /></span></P><P><BR /><BR /><BR />The release navigator for SAP Business Technology Platform by SAP Enterprise Support consolidates release information across SAP BTP products and services. 
You can find blog posts, newsletters, videos, podcasts, webcasts, social media channels, webpages, and more helpful content created by SAP Product Management, SAP User Assistance, SAP Learning, SAP Enterprise Support, and other areas of the SAP ecosystem in this easy-to-use web interface. It’s available to all customers as part of the SAP Enterprise Support offering.<A title="Access the Release Navigator for SAP BTP Here" href="https://readiness-at-scale.enable-now.cloud.sap/pub/20230621_ras/index.html?show=book!BO_EC8330B09B97CDBE&amp;library=library.txt#slide!SL_120BDFE08096029D" target="_blank" rel="noopener nofollow noreferrer"><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="Click here to Access the Release Navigator for SAP BTP.png" style="width: 271px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/59461i243BFA1566FD26F1/image-size/large?v=v2&amp;px=999" role="button" title="Click here to Access the Release Navigator for SAP BTP.png" alt="Click here to Access the Release Navigator for SAP BTP.png" /></span></A><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="2 Entry.png" style="width: 906px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/59494iA35565B45DA169FB/image-size/large?v=v2&amp;px=999" role="button" title="2 Entry.png" alt="2 Entry.png" /></span><BR />You can now access the release navigator more easily and with fewer clicks. 
Based on your feedback, we launched a <A href="https://support.sap.com/en/offerings-programs/enterprise-support/releaseready.html?anchorId=section_501520374" target="_blank" rel="noopener noreferrer">new page on the SAP Support Portal</A> that features a direct link to the release navigator for SAP BTP as well as to all other currently available release navigators.</P><P><A href="https://support.sap.com/en/offerings-programs/enterprise-support/releaseready.html?anchorId=section_501520374" target="_blank" rel="noopener noreferrer"><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="The new Release Ready web page on the SAP Support portal" style="width: 992px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/83296i07FFEBDA6AD461E9/image-size/large?v=v2&amp;px=999" role="button" title="20240319_ReleaseReadyPage.png" alt="The new Release Ready web page on the SAP Support portal" /><span class="lia-inline-image-caption" onclick="event.preventDefault();">The new Release Ready web page on the SAP Support portal</span></span></A></P><P>&nbsp;</P><P>&nbsp;</P><P><SPAN>How to navigate to the Release Ready page in the SAP Support Portal:</SPAN></P><P>In the <A href="https://support.sap.com/en/index.html" target="_blank" rel="noopener noreferrer">SAP Support Portal</A>, click on <STRONG>Offerings &amp; Programs</STRONG> and select <STRONG>Release Ready</STRONG>.<BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="This is where you can find the Release Ready page within the SAP Support Portal" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/59501i9A02E5264BE10722/image-size/large?v=v2&amp;px=999" role="button" title="Offerings and Programs_plus indication.png" alt="This is where you can find the Release Ready page within the SAP Support Portal" /><span class="lia-inline-image-caption" onclick="event.preventDefault();">This is where you can find the 
Release Ready page within the SAP Support Portal</span></span></P><P>Access via the <A href="https://support.sap.com/en/offerings-programs/enterprise-support/value-maps.html#section_117923726" target="_blank" rel="noopener noreferrer">SAP Enterprise Support Value Maps</A> continues as before. Additionally, you can now also find the release navigator within the <A href="https://support.sap.com/en/product/onboarding-resource-center/business-technology-platform.html" target="_blank" rel="noopener noreferrer">Onboarding Resource Center</A>.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="3 Q4 Update.png" style="width: 906px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/59503i8681F68C2CB4A1B5/image-size/large?v=v2&amp;px=999" role="button" title="3 Q4 Update.png" alt="3 Q4 Update.png" /></span><BR />We have published a quarterly maintenance update with the Q4 release cycle. You can find updated dates and resources according to the latest releases for the products featured.</P><P>You can also get to product pages with fewer clicks now, thanks to simplified navigation on the Home page of the release navigator.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="20240306_Phase1_Homepage.png" style="width: 909px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/76668i4CF799A57E3BAEF4/image-size/large?v=v2&amp;px=999" role="button" title="20240306_Phase1_Homepage.png" alt="20240306_Phase1_Homepage.png" /></span></P><P>&nbsp;<span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="4 Translation.png" style="width: 906px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/59506iA2D3D850BFDDEF81/image-size/large?v=v2&amp;px=999" role="button" title="4 Translation.png" alt="4 Translation.png" /></span></P><P><SPAN>Did you know that the release navigator is created using SAP Enable Now, which offers a machine translation 
feature? You can translate all text on the navigator pages automatically to 40 languages.&nbsp;</SPAN></P><P>The translation feature is available in the authenticated version of the release navigator which<U> requires sign in with an S-user, Universal ID or internal ID.</U><BR /><BR />You can access the release navigator using a sign-in, by clicking on the tile "Translation Feature - Authenticated Version" in the section "About the Release Navigator".&nbsp;After you have signed in with one of the authorizations mentioned, you will be able to see the translation option at the top right corner of the screen.<BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="20240306_Phase1_TranslationFeature_callout.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/76669i889A93B6542347FE/image-size/large?v=v2&amp;px=999" role="button" title="20240306_Phase1_TranslationFeature_callout.png" alt="20240306_Phase1_TranslationFeature_callout.png" /></span></P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="5 2024 plan.png" style="width: 906px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/60540i4AB9F836124B7C39/image-size/large?v=v2&amp;px=999" role="button" title="5 2024 plan.png" alt="5 2024 plan.png" /></span><BR />Upcoming updates for the release navigator are scheduled on a quarterly basis in alignment with the <SPAN><A href="https://me.sap.com/notes/2888562" target="_blank" rel="noopener noreferrer">Harmonized Release Calendar for SAP Cloud Products</A></SPAN>.<BR /><span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="Content and UX updates to the BTP release navigator are planned for CW10, CW22, CW35 and CW48 of 2024. This schedule is subject to change without prior notice including, when necessary, delays or incorporating changes in subsequent update windows. 
It is not a commitment, promise, or legal obligation to deliver any material, code, or functionality." style="width: 904px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/60541iCB7102E785843F15/image-size/large?v=v2&amp;px=999" role="button" title="Innovation Plan.png" alt="Content and UX updates to the BTP release navigator are planned for CW10, CW22, CW35 and CW48 of 2024. This schedule is subject to change without prior notice including, when necessary, delays or incorporating changes in subsequent update windows. It is not a commitment, promise, or legal obligation to deliver any material, code, or functionality." /><span class="lia-inline-image-caption" onclick="event.preventDefault();">Content and UX updates to the BTP release navigator are planned for CW10, CW22, CW35 and CW48 of 2024. This schedule is subject to change without prior notice including, when necessary, delays or incorporating changes in subsequent update windows. It is not a commitment, promise, or legal obligation to deliver any material, code, or functionality.</span></span></P><P>Here’s a sneak peek at some of our Roadmap items for the release navigator:</P><P>Content Updates:</P><UL><LI>Updating resources relating to a specific quarterly release cycle</LI><LI>Integrating more products and services on dedicated product pages</LI></UL><P>UX Updates:</P><UL><LI>Continuous improvements of the UX elements for an easier navigation</LI><LI>Alignment of UX among the release navigators</LI></UL><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="6 feedback.png" style="width: 906px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/60561iF2BBDCE81E0B95F2/image-size/large?v=v2&amp;px=999" role="button" title="6 feedback.png" alt="6 feedback.png" /></span></P><P>Since launch of this tool in September 2023 we have received a great number of positive reactions and constructive feedback. 
Thank you, and please <A title="Let us know how we can make the release navigator for SAP BTP better!" href="https://sapinsights.eu.qualtrics.com/jfe/form/SV_egtCPhgGOuO4Iwm?SolutionArea=SAPBTP" target="_blank" rel="noopener nofollow noreferrer">keep the input coming</A>; we appreciate all feedback!</P><P><A title="Let us know how we can make the release navigator for SAP BTP better" href="https://sapinsights.eu.qualtrics.com/jfe/form/SV_egtCPhgGOuO4Iwm?SolutionArea=SAPBTP" target="_blank" rel="noopener nofollow noreferrer"><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="Click here to let us know how we can make the Release Navigator better.png" style="width: 271px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/60576i63952079BC42F05E/image-size/large?v=v2&amp;px=999" role="button" title="Click here to let us know how we can make the Release Navigator better.png" alt="Click here to let us know how we can make the Release Navigator better.png" /></span></A></P><P><SPAN><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="8 connect.png" style="width: 906px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/60565iE268B082DF9B86FD/image-size/large?v=v2&amp;px=999" role="button" title="8 connect.png" alt="8 connect.png" /></span></SPAN>Find all our team’s blogs with the tag <A href="https://blogs.sap.com/tag/readinessscale/" target="_blank" rel="noopener noreferrer">Readiness@Scale</A> and feel free to connect here in the SAP Community or on&nbsp;<A href="https://www.linkedin.com/in/chris-kollhed-796682147/" target="_blank" rel="noopener nofollow noreferrer">LinkedIn</A>.</P><P data-unlink="true">Happy navigating!<BR /><SPAN><BR /></SPAN><EM>Readiness@Scale&nbsp;<BR /></EM><SPAN>SAP Enterprise Support</SPAN></P> 2024-02-05T15:50:27.237000+01:00 
https://community.sap.com/t5/technology-blogs-by-sap/how-sap-s-generative-ai-hub-facilitates-embedded-trustworthy-and-reliable/ba-p/13596153 How SAP’s Generative AI Hub facilitates embedded, trustworthy, and reliable AI 2024-02-07T10:40:16.480000+01:00 andreas_roth https://community.sap.com/t5/user/viewprofilepage/user-id/217337 <P><SPAN>This blog&nbsp;post is part of a series that dives into various aspects of SAP’s approach to generative AI, and its technical underpinnings. Read the <A href="https://blogs.sap.com/2023/12/20/how-saps-generative-ai-architecture-redefines-business-applications/" target="_blank" rel="noopener noreferrer">first blog post of the series</A>. </SPAN></P><P><SPAN>&nbsp;<span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="andreas_roth_1-1707212909945.png" style="width: 755px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/61138i674927AB7824CA0A/image-dimensions/755x100?v=v2" width="755" height="100" role="button" title="andreas_roth_1-1707212909945.png" alt="andreas_roth_1-1707212909945.png" /></span></SPAN></P><H2 id="toc-hId-965859775"><SPAN>Introduction</SPAN></H2><P>In our <SPAN><A href="https://blogs.sap.com/2023/12/20/how-saps-generative-ai-architecture-redefines-business-applications/" target="_blank" rel="noopener noreferrer">previous blog post</A></SPAN>, we discussed how SAP's comprehensive architecture components underpin business AI. In this blog post, we delve deeper into the heart of this architecture – the generative AI hub.</P><H2 id="toc-hId-769346270"><SPAN>Why a generative AI hub</SPAN></H2><P><SPAN>The large language model (LLM) landscape has grown significantly for over a year, with the introduction of multiple models, each equipped with unique benefits and limitations. The suitability of a model often depends on the specific use case, with factors such as accuracy, latency, and operational environment playing pivotal roles. 
</SPAN></P><P><SPAN>At SAP, we are fortunate to have a plethora of use cases to leverage LLMs. For each, there needs to be a systematic and tool supported selection of an appropriate model from a wide range of offerings from SAP’s AI partners, such as Microsoft Azure, Google Cloud Platform (GCP), Amazon Web Services (AWS), Aleph Alpha, Anthropic, Cohere, and others, or from open source. Moreover, we must ensure versatility in application runtime, supporting a variety of models tailored to meet diverse needs. While we appreciate the innovation brought forward by LLM providers, our focus remains on avoiding over-reliance on a single provider. This approach allows us to pivot towards other LLMs that may better cater to our customers and use cases, and react to changes in the market over time.</SPAN></P><P><SPAN>Our emerging use cases share a common set of requirements. First, we prioritize establishing compliance and fostering trust in the application of LLMs. Following this, we diligently meet all commercial, metering, and billing necessities. However, our primary emphasis is on streamlining and harmonizing the incorporation of these models into our business applications. By addressing these concerns in a standardized way and following a common programming model, we are aiming to speed up the innovation process within our application development teams. </SPAN></P><P><SPAN>Grounding, which involves providing LLMs with specific and relevant information beyond their inherent knowledge, is crucial in ensuring the accuracy, quality, and relevance of the generated output. Within SAP, our extensive data assets are integral in enhancing LLM use cases with business context seamlessly and reliably, making them more effective.</SPAN></P><P><SPAN>The transformative potential of generative AI (GenAI) extends beyond just the capabilities of new models; it also lies in their accessibility to both developers and non-developers. 
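</SPAN></P><P><SPAN>As a toy illustration of the grounding idea described above, retrieved business context can simply be placed in front of the user question in the prompt. The template and data here are invented purely for illustration; real grounding pipelines add retrieval, ranking, and formatting steps.</SPAN></P>

```python
def grounded_prompt(question, context_snippets):
    """Assemble a grounded prompt: retrieved business context is prepended
    so the model answers from that context rather than from memory alone."""
    context = "\n".join(f"- {s}" for s in context_snippets)
    return (
        "Answer the question using only the business context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

# Made-up snippets standing in for context retrieved from business data.
snippets = [
    "Purchase order 4711 was approved on 2023-11-02.",
    "Supplier ACME has a 14-day delivery SLA.",
]
print(grounded_prompt("When was purchase order 4711 approved?", snippets))
```

<P><SPAN>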
When the excitement around ChatGPT started a year ago, we realized that SAP engineers required an uncomplicated while enterprise compliant instrument to unlock the potential of LLMs. This led to the creation of the internal SAP AI playground system, a simple service to explore different models. The result was an enthusiastic response and a flood of innovative ideas to incorporate LLMs into SAP applications. We believe that providing such central, easy access to LLMs is a key component of fostering innovation within any organization.</SPAN></P><P><SPAN>With these considerations in mind, <STRONG>SAP has decided to bring together trusted LLM access, business grounding for LLMs, and LLM exploration into a single generative AI hub</STRONG>. This hub is provided as integral part of <A href="https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core?region=all" target="_blank" rel="noopener nofollow noreferrer">SAP AI Core</A> and <A href="https://discovery-center.cloud.sap/serviceCatalog/sap-ai-launchpad?region=all" target="_blank" rel="noopener nofollow noreferrer">SAP AI Launchpad</A>, central elements of our AI foundation on SAP Business Technology Platform (SAP BTP). This strategic move aims to streamline our approach to leveraging LLMs, making it more efficient and effective. 
While the generative AI hub addresses the requirements of SAP’s own business applications, we are aware that these requirements are shared in the wider SAP ecosystem, and thus decided to make the generative AI hub also available to our partners and customers.</SPAN></P><H2 id="toc-hId-572832765"><SPAN>How we realize the generative AI hub</SPAN></H2><P><SPAN>The architecture of the generative AI hub looks as follows:</SPAN></P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="andreas_roth_0-1707212798191.png" style="width: 760px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/61137iFE16B0936EF53E20/image-dimensions/760x323?v=v2" width="760" height="323" role="button" title="andreas_roth_0-1707212798191.png" alt="andreas_roth_0-1707212798191.png" /></span></P><P><EM>Figure 1: Architecture of generative AI hub</EM></P><P><SPAN>The primary purpose of the generative AI hub is to combine the best of LLMs with the best of SAP processes and data. To achieve this, it first needs to be easy to integrate LLMs from a wide range of provider types into SAP’s business applications. </SPAN></P><P><SPAN>Let us go step by step through the components of the generative AI hub. </SPAN></P><H3 id="toc-hId-505401979"><SPAN>Access</SPAN></H3><P><SPAN>In the generative AI hub, we build on SAP AI Core, an SAP BTP service used by all SAP applications and services. SAP AI Core not only provides capabilities to run AI workloads, but also efficiently proxies access to models that are operated as a service by providers. This enables us to maintain consistent access and lifecycle management, and to reuse much of SAP AI Core’s security and metering implementation. </SPAN></P><P><SPAN>For models that we operate independently, we depend on SAP AI Core’s features to cost-effectively run AI models on a large scale. To achieve this, we employ a Kubernetes-based structure with specific features suited to AI, such as GPU support. 
</SPAN></P><P><SPAN>What do applications need to do to integrate an LLM into their application via the generative AI hub? Conceptually, three major steps are involved.</SPAN></P><P><SPAN>In the first step, users must acquire a service instance of SAP AI Core using the SAP BTP Cockpit. </SPAN></P><P><SPAN>In the second step, the goal is to create a <EM>deployment </EM>programmatically or via SAP AI Launchpad, to instantiate a use-case-specific LLM configuration. It references a model-provider-specific executable; for example, models provided via the Azure OpenAI service are bundled together in one executable. Further parameters like model name, model version, etc. can be configured as well. For each deployment, SAP AI Core provides a unique URL that can be used to access the LLM.</SPAN></P><P><SPAN>In the final step, the deployment URL can be embedded within an application. It is possible to use this URL as if interacting directly with the native LLM provider API, e.g. through an LLM-provider-specific SDK. This is intentional, for a good reason: we acknowledge that various LLMs may have unique features, which we want our use case providers to be able to exploit. However, to ease migration from one provider to another, we are also planning to provide an abstract API that works uniformly across all providers, albeit without some provider-specific features.</SPAN></P><P><SPAN>These steps are described in detail in <A href="https://developers.sap.com/tutorials/ai-core-generative-ai.html" target="_blank" rel="noopener noreferrer">this dedicated tutorial</A>.</SPAN></P><H3 id="toc-hId-308888474"><SPAN>Trust</SPAN></H3><P><SPAN>On the service level, the generative AI hub adheres to the highest industry standards by integrating SAP BTP's security functions, including Identity and Access Management, multi-tenancy, audit log service, TLS management, asset management, and CAM. 
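</SPAN></P><P><SPAN>To make the three integration steps above concrete, here is a minimal sketch of preparing a call against a deployment URL as if it were the native provider API. A chat-completions-style schema is assumed here, as for Azure OpenAI; the URL, token, path suffix, and API version are placeholders to check against your provider's documentation.</SPAN></P>

```python
import json
from urllib import request

# Placeholder: the unique URL SAP AI Core returns for the deployment.
DEPLOYMENT_URL = "https://<ai-api-host>/v2/inference/deployments/<deployment-id>"

def chat_body(messages, max_tokens=256):
    """Chat-completions-style request body (schema of the assumed provider API)."""
    return {"messages": messages, "max_tokens": max_tokens}

body = chat_body([{"role": "user", "content": "Summarize this support ticket ..."}])
print(json.dumps(body))

# The actual call would look roughly like this (path and api-version are
# provider-specific assumptions, not guaranteed):
# req = request.Request(
#     DEPLOYMENT_URL + "/chat/completions?api-version=<version>",
#     data=json.dumps(body).encode("utf-8"),
#     headers={"Authorization": "Bearer <token>",
#              "AI-Resource-Group": "default",
#              "Content-Type": "application/json"},
# )
# print(json.loads(request.urlopen(req).read()))
```

<P><SPAN>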
Along with all other SAP products, it strictly adheres to a secure development and operations lifecycle, comprising among others threat modeling, static and dynamic code scans, open-source risk assessment, penetration testing, and periodic scans of cloud infrastructure. Operational security best practices are enforced, including access control, audit logging, multi-tenancy, and network layer security.&nbsp;<SPAN class="">These security procedures and their operations undergo reviews by internal governance procedures e.g. internal audits and the NIST CSF framework, as well as external audits in accordance with ISO27001, SOC1, SOC2 and C5 requirements.</SPAN></SPAN></P><P><SPAN>In addition, the generative AI hub also addresses the need for additional trust capabilities for GenAI use cases, primarily content moderation capabilities. The hub will check prompts and LLM responses for policy violations, and we aim to soon support optional de-identification and re-identification of personal data. </SPAN></P><P><SPAN>All our GenAI use cases need to support consumption-based pricing following our commercial model for AI. The generative AI hub plays a pivotal role in reliable usage metering, automatically reporting and aggregating consumed tokens based on tenants and the relevant business context.</SPAN></P><P><SPAN>Trust needs to extend beyond just the technical aspects. We have a legal and commercial framework in place, including a set of policies, procedures, and agreements that govern how we conduct business, interact with our selected AI partners, and comply with laws and regulations. 
Therefore, from input to outcome, trust is integral to all operations, not just in the technology we use but also in its management, customer engagement, and legal compliance.</SPAN></P><H3 id="toc-hId-112374969"><SPAN>Orchestration and Grounding</SPAN></H3><P><SPAN>LLMs very often incorporate contextual data using patterns such as Retrieval Augmented Generation (RAG) and other in-context learning methods to achieve higher accuracy and include up-to-date information. This process is supported by common libraries that access data sources and orchestrate LLMs; one of the most notable examples is <A href="https://www.langchain.com/" target="_blank" rel="noopener nofollow noreferrer">LangChain</A>. The generative AI hub is designed to work seamlessly with LangChain and similar libraries. </SPAN></P><P><SPAN>Consider an application that would like to use a database of documents (e.g. emails) as contextual information (e.g. to draw on historic email responses). To realize this, we need to set up LLM deployments in the generative AI hub (see above) for an embedding model as well as a completion model. Further, we set up a vector database and store embeddings of our documents, calculated by the embedding model. As outlined in the previous post, our SAP HANA database will soon provide such vector store capabilities, although in some use cases certain SAP BTP services, such as <A href="https://discovery-center.cloud.sap/serviceCatalog/postgresql-hyperscaler-option?region=all" target="_blank" rel="noopener nofollow noreferrer">PostgreSQL</A>, might already be a good enough option. 
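</SPAN></P><P><SPAN>To show what the retrieval step against such a vector store boils down to, here is a toy sketch in pure Python. The three-dimensional "embeddings" are made up; a real embedding model produces vectors with hundreds or thousands of dimensions, and the store would be a vector database rather than a dict.</SPAN></P>

```python
import math

def cosine_similarity(a, b):
    """Similarity measure typically used for nearest-neighbour search over embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy document store: text -> embedding (in reality computed by the embedding
# model deployment and persisted in a vector database).
documents = {
    "Refund processed for order 4711": [0.9, 0.1, 0.0],
    "Shipping delay on order 4712":    [0.1, 0.9, 0.1],
    "Password reset instructions":     [0.0, 0.1, 0.9],
}

def retrieve(query_embedding, k=1):
    """Return the k documents most similar to the query embedding."""
    ranked = sorted(documents,
                    key=lambda d: cosine_similarity(documents[d], query_embedding),
                    reverse=True)
    return ranked[:k]

# A query embedding close to the "refund" region retrieves the refund email first.
print(retrieve([0.8, 0.2, 0.1]))  # → ['Refund processed for order 4711']
```

<P><SPAN>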
With the help of LangChain, we can now orchestrate a retrieval step by embedding a query through the generative AI hub, using the vector database to retrieve the most similar matches, and finally generating a response to the query via the generative AI hub.</SPAN></P><P><SPAN>While the generative AI hub embraces LangChain, to optimize the development process the service will also support an orchestrated completion mechanism. It will enable users to define an orchestration configuration, which outlines the specific orchestration procedure (such as RAG), the types of grounding data sources (such as a designated vector database), the actual data source destination (detailing how to connect to a vector store), and so on. In doing so, this mechanism provides a customizable and efficient solution for developers working with LLMs.</SPAN></P><P><SPAN>For an example of generating email insights with the generative AI hub and LangChain, have a look at this <A href="https://github.com/SAP-samples/btp-cap-genai-rag" target="_blank" rel="noopener nofollow noreferrer">sample project</A>.</SPAN></P><H3 id="toc-hId--84138536"><SPAN>Tooling</SPAN></H3><P><SPAN>As a further generative AI hub capability, there is a professional prompt engineering experience that we integrate into SAP AI Launchpad. SAP AI Launchpad is designed from the outset to be a one-stop shop for all AI-related activities, providing a modular architecture that is extendable with further tools and sits on top of the AI API offered by SAP AI Core. Thus, SAP AI Launchpad is the natural place to add prompt engineering capabilities that simplify getting started with GenAI, such as storing and versioning prompts and comparing different variants with different LLMs. 
The screenshot below shows the prompt editor next to the other AI- and GenAI-related capabilities of SAP AI Launchpad.</SPAN></P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="andreas_roth_2-1707213027444.png" style="width: 745px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/61140i44BEA4FDA0DC6E36/image-dimensions/745x419?v=v2" width="745" height="419" role="button" title="andreas_roth_2-1707213027444.png" alt="andreas_roth_2-1707213027444.png" /></span></P><P><EM>Figure 2: Prompt Engineering with generative AI hub</EM></P><P><SPAN>We also see significant potential in making SAP AI Launchpad the place where the productization of LLM-centric applications begins and continuous monitoring of prompt performance takes place. </SPAN></P><P><SPAN>UI-based tooling is of course just one part of the story; we also provide more developer-focused tooling. In this direction,</SPAN><SPAN>&nbsp;our <A href="https://pypi.org/project/generative-ai-hub-sdk/" target="_self" rel="nofollow noopener noreferrer">generative AI hub SDK</A>&nbsp;and our&nbsp;<A href="https://pypi.org/project/ai-core-sdk/" target="_blank" rel="noopener nofollow noreferrer">SAP AI Core SDK</A>&nbsp;are already available.&nbsp;</SPAN></P><H2 id="toc-hId--409734760"><SPAN>Conclusion</SPAN></H2><P><SPAN>Our technical architecture at SAP for GenAI revolves around the generative AI hub, which centrally addresses common concerns of our business applications and which we plan to gradually open up to the entire SAP ecosystem. This technical architecture is only one part of a successful GenAI narrative. Our next blog post will introduce how we perform benchmarking of LLMs, which is a crucial part of our journey towards relevant, reliable, and responsible business AI powered by GenAI technology.</SPAN></P><P><SPAN><SPAN class=""><I>Co-authored by Dr. Andreas Roth and <A href="https://community.sap.com/t5/user/viewprofilepage/user-id/195453" target="_self">Dr. 
Philipp Herzig</A></I></SPAN></SPAN></P> 2024-02-07T10:40:16.480000+01:00 https://community.sap.com/t5/artificial-intelligence-and-machine-learning-blogs/navigating-the-ai-landscape-a-ux-designer-s-guide/ba-p/13582520 Navigating the AI Landscape: A UX Designer's Guide 2024-02-14T11:01:22.387000+01:00 ArshAshok https://community.sap.com/t5/user/viewprofilepage/user-id/176073 <P><STRONG>Introduction</STRONG></P><P>As we stand on the brink of a technological revolution, AI is reshaping the way we interact with the digital world. I'm here to&nbsp;share my insights&nbsp;regarding this evolving landscape, offering observations into how AI, machine learning, and other related technologies can enhance our work in UX design.</P><P><STRONG>Brief Overview of AI and ML</STRONG></P><P>Artificial Intelligence (AI) is the overarching concept of machines carrying out tasks in ways that we consider 'smart'. It's a broad discipline aimed at creating systems that can simulate various aspects of human intelligence. Machine Learning (ML) is a subset of AI, focused on algorithms that enable machines to improve at tasks with experience. Think of AI as the universe of intelligent computation, with ML being a planet within it.</P><P>Deep Learning, in turn, is a subset of ML. It involves neural networks with many layers (hence 'deep') that can learn and make intelligent decisions on their own. Deep Learning has been instrumental in achieving significant breakthroughs in areas like image and speech recognition.</P><P>Then come Foundation Models, a recent development in AI. These are large-scale models (like GPT-3) that are trained on vast amounts of data and can be adapted to a wide range of tasks without being specifically trained for them. They are called 'foundations' because they provide a base layer of understanding that can be built upon for various applications.</P><P>These technologies are interrelated, forming a hierarchy from broad to specific. 
AI encompasses everything intelligent that a machine might do. Under AI, ML is the method through which machines learn from data. Deep Learning is a further specialization of ML with a focus on complex, layered neural networks. And Foundation Models are the cutting-edge, versatile systems pushing the boundaries of what AI can achieve.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Arsh_0-1706283040223.png" style="width: 504px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/53132i7653F341B368869E/image-dimensions/504x254?v=v2" width="504" height="254" role="button" title="Arsh_0-1706283040223.png" alt="Arsh_0-1706283040223.png" /></span></P><P><STRONG>Generative Pretrained Transformer (GPT) and Transformer Architecture</STRONG></P><P>GPT, or Generative Pretrained Transformer, is a type of language processing AI. It's built on the Transformer architecture, which is revolutionary in the way it handles data. Traditional models processed data sequentially, one piece after another. The Transformer, however, allows for parallel processing, looking at entire sequences of data at once. This is done through mechanisms called 'attention' and 'self-attention,' letting the model weigh the importance of different parts of the input data and learn the context more effectively.</P><P>GPT takes this architecture and applies it to language, learning to predict the next word in a sentence. It's trained on a vast corpus of text and then fine-tuned for specific tasks. This pretraining is what makes it 'generative' - it can generate text, not just understand or classify it.</P><P><STRONG>Inception of Generative AI</STRONG></P><P>Generative AI came into existence as a natural progression from earlier AI models that were primarily discriminative. While discriminative models could classify and understand data, they couldn't create new data. 
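The parallel, all-pairs weighting that the Transformer's 'attention' mechanism performs can be illustrated with a toy single-head self-attention function in plain Python. This is a pedagogical sketch only: real Transformers add learned query/key/value projections, multiple heads, and positional information.

```python
import math

def self_attention(tokens):
    """Minimal single-head self-attention: every position looks at every
    other position and mixes their vectors, weighted by dot-product
    similarity (softmax-normalized). All positions are processed together,
    not sequentially."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product scores between this token and every token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]      # attention weights, sum to 1
        # Output = weighted mix of all token vectors (context-aware vector).
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out

# Three toy "token" embeddings (invented 2-D values for illustration).
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
contextualized = self_attention(tokens)
print(len(contextualized), len(contextualized[0]))  # 3 2
```

Each output row is a context-aware blend of all input vectors, which is precisely how the model "weighs the importance of different parts of the input data."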
The inception of Generative AI marked a shift from understanding to creation.</P><P>This shift was fueled by advancements in neural networks and an exponential increase in computational power and data availability. Researchers began exploring how neural networks could not only recognize patterns but also use those patterns to generate new, similar data. This exploration led to the development of models like Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), and eventually to large-scale models like GPT.</P><P><STRONG>Integrating Reinforcement Learning with Human Feedback</STRONG></P><P>Adding to this is the concept of Reinforcement Learning with Human Feedback (RLHF). This approach trains AI systems not just on static datasets but on dynamic feedback loops. It’s akin to teaching a pet new tricks: the AI tries different strategies and, based on human feedback, learns which actions are desirable or undesirable. This human-in-the-loop methodology ensures that the AI’s learning trajectory aligns more closely with human values and preferences, making it particularly potent for personalization and adaptive learning scenarios.</P><P>Incorporating RLHF into the design of AI systems compels us to consider not just the initial user interaction but the ongoing relationship between the user and the AI as they adapt and learn from each other over time.</P><P><STRONG>Various Approaches to AI</STRONG></P><P>Understanding the various approaches to AI can be daunting, but it's crucial for harnessing its full potential:</P><UL><LI><STRONG>Symbolic AI</STRONG>: Based on the manipulation of symbols and rules, this approach tries to mimic human reasoning.</LI><LI><STRONG>Probabilistic AI</STRONG>: Focuses on using probabilities to make predictions and decisions.</LI><LI><STRONG>Statistical AI</STRONG>: Uses statistical methods to infer patterns and make predictions.</LI><LI><STRONG>Large Language Models (LLMs)</STRONG>: These are trained on vast datasets to 
understand and generate human-like text.</LI></UL><P>Each approach has its strengths, and knowing which to apply can significantly impact the success of a project.</P><P><STRONG>How Are Models Trained?</STRONG></P><P>AI models are trained using large datasets. They learn by recognising patterns and making associations. For instance, an image recognition model might learn to identify cats by being shown thousands of pictures of cats and not-cats. Over time, it improves its accuracy through a process called backpropagation, where it adjusts its internal parameters to minimize errors.</P><P><STRONG>AI and Design</STRONG></P><P>In the realm of AI, as designers, we're tasked with shaping a human-machine relationship that is continually evolving. Unlike traditional design, where interactions are fixed, AI-based design involves a fluid, ever-changing dynamic. Non-AI systems interact in a consistent, unchanging manner. However, AI systems learn and adapt over time, leading to a constantly developing relationship.</P><P>As both the machine and humans learn and adapt, they engage in a reciprocal learning process, forming a dynamic feedback loop. This loop is characterized by an ongoing exchange of information, with both parties growing and adjusting to the interaction. Our role as AI designers is to foster effective communication within this vibrant human-machine relationship.</P><P>In this context, data becomes the pivotal element. Previously, data interaction was straightforward — a user command followed by a machine response. With AI, data is the foundation of learned behaviours, informing the machine and shaping its growth. Here, data inputs are about educating the machine with information, not just instructing it. 
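The error-minimizing parameter adjustment described under "How Are Models Trained?" can be shown in miniature with a one-parameter model and plain gradient descent. This is a toy sketch: backpropagation applies the same adjust-to-reduce-error idea through many layers of a neural network.

```python
# Toy training loop: fit y = w * x to data by repeatedly nudging w
# opposite the gradient of the squared error.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relationship: y = 2x
w = 0.0          # initial "internal parameter"
lr = 0.05        # learning rate

for epoch in range(200):
    # Mean gradient of (w*x - y)^2 with respect to w over the dataset.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad                # adjust the parameter to minimize error

print(round(w, 3))  # converges to 2.0
```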
Conversely, machine outputs are no longer fixed; they are generated responses that necessitate explanations of how the data led to a particular conclusion.</P><P>As designers, we are charged with orchestrating these interactions throughout the entire data lifecycle. This includes the initial data capture, guiding the machine's learning process, designing clear and understandable data outputs, and ensuring a seamless flow of information back into the system for continued learning. Our responsibility extends beyond creating interfaces; it's about crafting experiences that support this rich, ongoing dialogue between humans and AI, ensuring clarity, transparency, and a mutual growth trajectory.</P><P>A comparison can be drawn with the days when selecting a movie at a DVD store involved browsing aisles for a favoured title or relying on a recommendation from a salesperson. In contrast, platforms like Netflix now curate suggestions tailored to our tastes, informed by our viewing history.</P><P><STRONG>Designers find themselves at the intersection of innovation, with three distinct yet interconnected realms to explore:</STRONG></P><P><STRONG>1. Designing with AI: Crafting Alongside AI</STRONG></P><P>Designers are now partnering with AI in a collaborative dance of creation, where the output is a fusion of human ingenuity and machine efficiency. Imagine the synergy of a designer working with AI to conjure up ground-breaking architectural structures, much like Autodesk's venture in utilizing generative design principles to conceive their Toronto office. Here, AI becomes an ally in the creative process, providing new perspectives and solutions that push the boundaries of traditional design.</P><P>Moreover, designers harness AI to streamline repetitive tasks, freeing up creative energy for more complex challenges.
Tools like Airbnb's system for transforming sketches into digital wireframes or Netflix's algorithm for adapting graphics across different cultures exemplify this trend. These innovations signify a new era where AI does not replace the designer but rather amplifies their capabilities.</P><P><STRONG>2. Designing for AI: The Human-Centric Design Approach</STRONG></P><P>Designing for AI requires a human-centred lens, focusing on crafting systems that prioritize user needs and experiences. It's about spotting those unique opportunities where AI can not only function but flourish in addressing real-world problems. Here, the designer's role transcends aesthetics, venturing into the realm of functionality and utility, transforming user needs into data-driven AI solutions.</P><P>Human-centred design makes AI effective in several ways:</P><P style=" padding-left : 60px; "><STRONG>1. </STRONG><STRONG>Beyond Interfaces: Embracing Human-Centred Design in AI Algorithm Development</STRONG></P><P style=" padding-left : 60px; ">UX can aid in designing algorithms that mirror the decision-making processes humans employ by considering their information, goals, and constraints. It can ensure that the decision environment, which encompasses both the algorithm and its human users, is thoughtfully constructed. Users should comprehend their AI tools well enough to use them effectively. Designers also aid in establishing guidelines and business protocols that translate algorithmic predictions into actionable insights, advising when human intervention is appropriate to supplement or override the AI.</P><P style=" padding-left : 60px; "><STRONG>2.
Translating User Needs into Data Requirements:</STRONG> A designer can aid in identifying the type of data necessary for training the model, considering various factors like predictive power (<EM>a measure of an ML model’s ability to correctly predict outcomes for a given input</EM>), relevance, fairness, privacy, and security. Ensure the training dataset is comprehensive, reflecting the real-world scenarios the AI will encounter, and free from biases.</P><P style=" padding-left : 60px; "><STRONG>3. Knowing the Source of the Data &amp; Tuning the Model:</STRONG> Evaluating data sourcing and collection methods for their suitability for the project is critical. Once deployed, a designer will assess whether the AI meets the target user’s needs as per predefined success metrics. They provide feedback on adjusting the model’s parameters as needed to enhance its performance, focusing on metrics that reflect user experience, such as customer satisfaction or the frequency of users following the AI’s recommendations.</P><P style=" padding-left : 60px; "><STRONG>4. Addressing Bias, Fairness, and Transparency</STRONG>: UX designers can help analyze data with an understanding of the domain, goal definition, possible and required outcomes, and the process that generated it. This leads to designers being a crucial part of designing algorithms that are mindful of the environment they will operate in, avoiding controversial predictors. They aid in conducting usability tests or audits to detect and eliminate unintended biases.</P><P style=" padding-left : 60px; "><STRONG>5. Managing the Handoff in AI Systems:</STRONG> Designing for smooth transitions between AI and human control in situations demanding common sense or contextual understanding is very important.
The accountability for any action in the real world still lies with human users, as over-reliance on technology can leave users unprepared for instances where AI fails, necessitating more skilled human intervention. A UX designer can analyse such situations and help design smooth handoff processes.</P><P style=" padding-left : 60px; "><STRONG>6. Designing Reward Functions</STRONG> <STRONG>and Leveraging User Feedback for Model Improvement:</STRONG> Designing the AI’s reward function is critical, as it significantly influences the user experience. User feedback is essential in refining AI models and enhancing user experience. Designers analyse and optimize the reward data to enhance the model for long-term user benefits and anticipate the downstream effects of the product. This also allows users to contribute to the personalization of their experiences, thereby increasing their trust in the system.</P><P style=" padding-left : 60px; "><STRONG>7. Anticipating Errors and Designing Response Pathways:</STRONG> Human-centred design prepares your AI system to facilitate user responses to inevitable errors, turning them into opportunities for learning and improvement.</P><P style=" padding-left : 60px; "><STRONG>8. Educating Users and Setting Realistic Expectations</STRONG>: Designers help communicate the capabilities and limitations of your AI product to customers. They help users develop accurate mental models and understand how their interactions train the system. It’s essential to balance user trust, avoiding both undue scepticism and over-reliance on AI.</P><P style=" padding-left : 60px; "><STRONG>9. Guiding User Trust in AI Systems:</STRONG> Users need to adjust their trust in AI systems appropriately, rather than relying on them implicitly in every situation. The phenomenon of 'algorithm aversion' is well-documented, where users may be sceptical of software systems.
Conversely, there are instances where users place excessive trust in AI capabilities, expecting more than what the system can deliver. Designers help users develop a balanced level of trust, aligning with the system's actual capabilities and limitations by taking a human-centred approach.</P><P>For instance, openly acknowledging the potential inaccuracies in AI predictions can temporarily reduce trust in those specific outcomes. However, this honesty can foster a more sustainable trust in the long term. Users become more judicious in their reliance on the system, reducing the likelihood of disappointment due to misplaced expectations.</P><P><STRONG>3. Designing of AI: The User Experience Frontier</STRONG></P><P>When it comes to the design of AI, it's about envisioning and sculpting the interactions between AI systems and their human users. It's a space where new forms of engagement, like voice-activated assistants or image recognition software, become gateways to enhanced user experiences. The key challenge here is transparency: designing interfaces that not only serve but also educate. Users should be able to grasp, with just the right level of detail, how AI systems make decisions and learn over time.</P><P>An example could be the intuitive dashboards in our cars that provide real-time insights into the vehicle's AI, or the smart home devices that learn our preferences and conversationally explain their actions. These are no longer scenarios from a sci-fi novel; they are today's design challenges that call for a blend of technical knowledge, user empathy, and creative foresight.</P><P>In essence, the designer's canvas has expanded, not just in size but in dimensionality.
As AI continues to intertwine with our daily lives, it invites designers to step into roles that are as diverse as they are dynamic, shaping not only how AI looks but also how it behaves and interacts in the fabric of human experience.</P><P><STRONG>Understanding the Limitations of AI in UX Design</STRONG></P><P>Incorporating AI into UX design comes with distinct limitations that underscore the irreplaceable value of human insight and direction.</P><P><STRONG>1. Hallucinations and Reliability</STRONG></P><P>AI "hallucinations" refer to instances where a model confidently generates an incorrect response. These can be caused by inconsistencies within a large data set or errors in the model's training methodology. In fields where precision is critical, such as financial reporting or legal documentation, these inaccuracies can introduce significant risks. Combating this requires robust document structuring and advanced prompt design techniques to direct AI towards more dependable outcomes.</P><P><STRONG>2. Prompt Sensitivity</STRONG></P><P>Large Language Models (LLMs) are highly sensitive to user input. The nuances of how a prompt is phrased can lead to varied and unpredictable responses. This sensitivity necessitates a careful and strategic approach to prompt engineering, ensuring that the AI's responses align with user intentions. The evolving role of "Prompt Engineer" is a testament to the significance of crafting prompts that steer AI toward delivering consistent and accurate results.</P><P><STRONG>3. Context Window Limits</STRONG></P><P>The context window, the amount of information an AI can consider when generating a response, is a notable constraint. As the context window expands, so does the computational complexity. Despite improvements like GPT-4's extended context window, there remains a ceiling to the volume of data an AI can process at a time.
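A common engineering workaround for this ceiling is chunking: split a long document into overlapping pieces that each fit the context window, process them separately, and combine the results. A minimal sketch follows, using whole words as a stand-in for model tokens (real systems count tokens with the model's own tokenizer):

```python
def chunk_text(text, max_tokens=100, overlap=10):
    """Split text into overlapping word chunks that each fit a token budget.
    Words approximate tokens here; production code would use the target
    model's tokenizer for exact counts."""
    words = text.split()
    chunks = []
    step = max_tokens - overlap          # slide forward, keeping some overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break                        # last chunk already covers the tail
    return chunks

# A synthetic 250-"token" document split under a 100-token budget.
doc = " ".join(f"word{i}" for i in range(250))
pieces = chunk_text(doc, max_tokens=100, overlap=10)
print(len(pieces))  # 3
```

The overlap keeps sentences that straddle a boundary visible in both neighbouring chunks, at the cost of slightly more total tokens processed.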
This limitation is particularly challenging in tasks that require the review of extensive documents, where the AI must understand and analyze large quantities of text. Designing for AI in UX thus requires a thoughtful balance between the AI's capabilities and the complexity of the tasks it is expected to perform<STRONG>.</STRONG></P><P><STRONG>Things Designers Should Know Before Designing for AI</STRONG></P><P><STRONG>Technical Knowledge</STRONG></P><UL><LI>Grasping the basics of data science and AI techniques like NLP and deep learning.</LI><LI>Basic Understanding of the AI toolchain and DevOps processes for AI development.</LI><LI>Access to clean data set.</LI></UL><P><STRONG>Ethics in AI</STRONG></P><UL><LI>Integrate ethical standards into the design and development process, ensuring accountability, fairness, and transparency.</LI><LI>Navigate GDPR/compliance mandates and be mindful of the financial and human impacts of design decisions.</LI></UL><P><STRONG>Collaboration is Key: </STRONG></P><UL><LI>Work closely with data scientists and engineers to understand the technical aspects and constraints.</LI><LI>Foster a shared vision across multidisciplinary teams and participate actively in the AI development lifecycle.</LI><LI>Engage in design thinking activities tailored to AI and address the needs of diverse users.</LI></UL><P><STRONG>AI Strategy</STRONG></P><UL><LI>Develop and communicate a user-focused AI strategy, explaining barriers to adoption and the business and user benefits.</LI><LI>Articulate your company's AI strategy, differentiators, and the journey toward AI success.</LI></UL><P><STRONG>Designing AI Interactions</STRONG></P><UL><LI>Translate AI model outputs into understandable insights for users and design interactions that capture human input for machine learning.</LI><LI>Prototype and test AI solutions frequently, and design for various stages of the AI lifecycle, such as model maintenance and data collection methods.</LI></UL><P><STRONG>User 
Trust and Transparency</STRONG></P><UL><LI>Build trust by making your AI interactions transparent and understandable.</LI></UL><P>In conclusion, AI is not just a tool but a new frontier in design. By embracing it, we can create more personalized, efficient, and engaging user experiences. Let's embark on this journey together, continuously learning and adapting to ensure technology serves humanity in the most beneficial ways.</P><P>&nbsp;</P><P>Resources to Refer :</P><OL><LI>Generative AI at SAP: <A href="https://open.sap.com/courses/genai1" target="_blank" rel="noopener noreferrer">https://open.sap.com/courses/genai1</A></LI><LI>AI Ethics at SAP: <A href="https://open.sap.com/courses/aie1-1" target="_blank" rel="noopener noreferrer">https://open.sap.com/courses/aie1-1</A></LI><LI>Designing for Generative AI: <A href="https://experience.sap.com/internal/fiori-design-web/generative-ai-design/" target="_blank" rel="noopener noreferrer">https://experience.sap.com/internal/fiori-design-web/generative-ai-design/</A></LI><LI>Designing for Intelligent Systems: <A href="https://experience.sap.com/internal/fiori-design-web/designing-intelligent-systems/" target="_blank" rel="noopener noreferrer">https://experience.sap.com/internal/fiori-design-web/designing-intelligent-systems/</A></LI><LI>SAP AI community file: <A href="https://www.figma.com/file/SJh2Eb5KrBIbaK3Bo2wqtq/AI-UX-Pattern-Community-Explorations?type=design&amp;node-id=737-200772&amp;mode=design" target="_blank" rel="noopener nofollow noreferrer">https://www.figma.com/file/SJh2Eb5KrBIbaK3Bo2wqtq/AI-UX-Pattern-Community-Explorations?type=design&amp;node-id=737-200772&amp;mode=design</A></LI><LI><A href="https://teams.microsoft.com/l/channel/19%3a3beb9faf0d144622a649e4eddb97d152%40thread.tacv2/AI%2520Design%2520Guidelines%2520and%2520Patterns%2520Community?groupId=ead6a857-cde8-4c1c-abc4-cbd2751ae3a9&amp;tenantId=42f7676c-f455-423c-82f6-dc2d99791af7" target="_blank" rel="noopener nofollow noreferrer">AI Design 
Community Teams Channel</A></LI><LI><A href="https://www.figma.com/file/SJh2Eb5KrBIbaK3Bo2wqtq/AI-UX-Pattern-Community-Explorations?type=design&amp;node-id=737-200772&amp;mode=design" target="_blank" rel="noopener nofollow noreferrer">AI Design Community Exploration File</A>&nbsp;</LI><LI><A href="https://www.figma.com/file/Nf6HYsfOJGMH8jEl0QbuRh/AI-Prompting-Community-Feedback?type=whiteboard&amp;node-id=0%3A1&amp;t=NSDrZCXJS2kZf6iB-1" target="_blank" rel="noopener nofollow noreferrer">Prompt Pattern Presentation Figjam</A></LI></OL><P>&nbsp;</P> 2024-02-14T11:01:22.387000+01:00 https://community.sap.com/t5/technology-blogs-by-sap/sap-business-ai-infuse-ai-in-applications-using-sap-btp-with-some-use-cases/ba-p/13600062 SAP Business AI : Infuse AI in applications using SAP BTP (with some Use Cases) 2024-02-18T04:43:44.423000+01:00 VinayBhatt https://community.sap.com/t5/user/viewprofilepage/user-id/162013 <P><FONT size="6"><STRONG>Introduction</STRONG></FONT></P><P class="">The primary focus of this article is to lucidly present a broad classification of <STRONG>SAP Business AI</STRONG>, followed by an in-depth explanation on how to infuse AI into applications using <STRONG>SAP Business Technology Platform (BTP)</STRONG>. The spotlight will be on the AI/ML services within SAP BTP.</P><P class="">The article features some <STRONG>use cases drawn from my own projects</STRONG>. 
These examples are intended to illustrate the practical application of various technologies within SAP BTP, showing how they can be synergistically utilized to create end-to-end applications that are <STRONG>infused with AI</STRONG>.</P><P><STRONG>Agenda</STRONG></P><OL><LI><STRONG>Broad classification of SAP Business AI</STRONG> &amp; introduction to various services.</LI><LI><STRONG>SAP BTP AI :&nbsp;</STRONG><SPAN>AI services &amp; solutions in SAP BTP to build applications infused with AI.</SPAN></LI><LI><STRONG>Example Use Cases</STRONG><SPAN>&nbsp;</SPAN><UL><LI><STRONG>Semantic Search App &amp; RAG (Retrieval Augmented Generation example).&nbsp;</STRONG></LI><LI><STRONG>Zero-shot classification with LLMs</STRONG></LI><LI><STRONG>AI Quiz App using LLMs</STRONG></LI></UL></LI></OL><P><FONT size="6"><STRONG>Broad classification of SAP Business AI&nbsp;</STRONG></FONT></P><P><SPAN><STRONG>SAP Business AI</STRONG>&nbsp;can be broadly classified into the following categories:&nbsp;</SPAN></P><UL><LI><SPAN>&nbsp;<STRONG>Embedded AI&nbsp;</STRONG>: AI integrated into <STRONG>business processes</STRONG> and solutions. SAP Business AI includes AI solutions &amp; features infused in S/4HANA (Finance, Sales &amp; Service), Ariba (Procurement), SAP Digital Assistant for CX (Marketing &amp; Commerce), SAP SuccessFactors (HR), SAP Transport Management (Supply Chain), SAP Extended Warehouse Management (Supply Chain), and cross-functional use cases with "Just Ask" in SAP Analytics Cloud, among many more.</SPAN></LI><LI><SPAN>&nbsp;<STRONG>SAP BTP AI :&nbsp;</STRONG>Build <STRONG>custom AI solutions</STRONG> and integrate them into your SAP landscape.
BTP AI solutions&nbsp;infuse artificial intelligence into your apps with scalability and responsibility in mind by&nbsp;enabling consumption of ready-to-use AI models, training &amp; deployment of custom AI models, using Enterprise Process Automation, SAP's foundation models, and AI through SAP partnerships: IBM Watson, Google Cloud, DataRobot, Microsoft OpenAI, etc.&nbsp;</SPAN>SAP combines the power of<SPAN>&nbsp;</SPAN><SPAN>generative AI with the context of business data and business processes</SPAN><SPAN>&nbsp;</SPAN><SPAN>and applies fine-tuning and prompt engineering at scale.&nbsp;</SPAN></LI><LI><STRONG><SPAN>SAP AI Services :&nbsp;</SPAN></STRONG><SPAN><STRONG>Models pretrained</STRONG> on business data: access pretrained AI capabilities, e.g. document information extraction, personalized recommendation, and data attribute recommendation.&nbsp;</SPAN></LI><LI><STRONG><SPAN>Ecosystem AI :&nbsp;</SPAN></STRONG><SPAN>Procure <STRONG>best-of-breed</STRONG> AI solutions from SAP's ecosystem.&nbsp;</SPAN></LI></UL><P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="VinayBhatt_0-1708062648875.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/66337i730DAA176FC58DA3/image-size/large?v=v2&amp;px=999" role="button" title="VinayBhatt_0-1708062648875.png" alt="VinayBhatt_0-1708062648875.png" /></span></P><P><STRONG><FONT size="5">SAP BTP AI</FONT></STRONG></P><P><FONT size="4">We will now focus on the SAP BTP AI foundational platform &amp; services.
Let's look at the foundational platform and services available to customers &amp; partners today to infuse AI into applications.</FONT></P><P><STRONG><FONT size="5">SAP AI Services</FONT></STRONG></P><P style=" text-align : justify; ">Access <STRONG>pretrained models</STRONG> as APIs for Document Information Extraction (DoX), Personalized Recommendation, Data Attribute Recommendation &amp; Machine Translation services, among others. These pretrained models from SAP are trained on <STRONG>business data</STRONG> and can be infused in BTP applications to perform tasks like <STRONG>extracting text from images</STRONG> using OCR, <STRONG>retrieving master data from invoices</STRONG> (e.g. PDFs), or building custom recommendation or machine translation services.</P><P style=" text-align : justify; "><STRONG>How can this help partners and customers build AI applications?</STRONG></P><P style=" text-align : justify; ">Partners and customers can consume the SAP AI services APIs for DoX, recommendation services, etc. in custom BTP apps (no-code/low-code as well as pro-code apps) to create side-by-side AI applications or extensions to existing enterprise applications.</P><P style=" text-align : justify; ">Document Information Extraction <STRONG><A href="https://youtu.be/Ojg447dM66E" target="_blank" rel="noopener nofollow noreferrer">Video</A></STRONG></P><P style=" text-align : justify; "><A href="https://help.sap.com/docs/document-information-extraction" target="_blank" rel="noopener noreferrer">Documentation</A></P><P style=" text-align : justify; ">&nbsp;</P><P style=" text-align : justify; "><FONT size="5"><STRONG>SAP</STRONG> <STRONG>BTP</STRONG> <STRONG>Generative AI hub</STRONG>&nbsp;</FONT></P><P>&nbsp;SAP BTP Generative AI hub is available on <STRONG>SAP AI Launchpad/ SAP AI Core on BTP</STRONG> and provides a central place for accessing <STRONG>Large Language Models</STRONG>&nbsp;from different providers <SPAN>such as GPT-4 by Azure OpenAI or
the open-source Falcon-40b</SPAN>.</P><P>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<STRONG>How can this help partners and customers build AI applications?</STRONG></P><OL><OL><LI>The generative AI hub provides tooling for prompt engineering, experimentation, and other capabilities to accelerate the development of BTP applications infused with generative AI, in a secure and trusted way.</LI><LI>Partners and customers can <STRONG>build AI applications</STRONG> on BTP using the Gen AI hub-based access to LLMs as an API from SAP AI Core – <A href="https://discovery-center.cloud.sap/refArchDetail/ref-arch-open-ai" target="_blank" rel="noopener nofollow noreferrer">Reference Architecture</A></LI></OL></OL><P>&nbsp; &nbsp; &nbsp;&nbsp;<A href="https://community.sap.com/t5/technology-blogs-by-sap/generative-ai-hub-out-now/ba-p/13580462" target="_blank">Generative AI Hub – Out now!</A>&nbsp;: Includes a demo of Gen AI Hub.</P><P>&nbsp;</P><P><FONT size="5"><STRONG>SAP HANA Cloud</STRONG> <STRONG>Vector Engine</STRONG></FONT>&nbsp;(GA)</P><P>The world of Gen AI/LLMs makes frequent mention of <STRONG>vector data </STRONG><SPAN>(embeddings), which is, simply put, a numerical representation of text, voice, image data, etc.
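Similarity search over embeddings can be illustrated with tiny invented 2-D vectors. Note that real embeddings (e.g. those stored in a vector engine) have hundreds or thousands of dimensions; the values below are made up purely for the sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two embedding vectors: 1.0 means the vectors
    point in the same direction (i.e., the items are most similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical 2-D "embeddings": semantically close words get close vectors.
embeddings = {
    "cat":    [0.95, 0.10],
    "kitten": [0.90, 0.15],
    "car":    [0.05, 0.99],
}

# Nearest neighbour to "cat" by cosine similarity (the core of semantic search).
query = embeddings["cat"]
best = max((w for w in embeddings if w != "cat"),
           key=lambda w: cosine_similarity(query, embeddings[w]))
print(best)  # kitten
```

In a RAG pipeline, the same nearest-neighbour step runs against a vector store of document-chunk embeddings, and the top matches are handed to the LLM as context.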
With this vector representation of text, voice &amp; images, use cases like similarity search, contextual search or </SPAN><STRONG>Retrieval Augmented Generation (RAG)</STRONG><SPAN>&nbsp;are prevalent, wherein similar documents matching a user query are pulled from a vector data store before being fed to an LLM for a contextual response.</SPAN></P><P>Such application development can now happen in BTP for enterprise applications using the reference architecture above, which includes Gen AI Hub, HANA Vector Engine, etc.</P><P><STRONG><A href="https://dam.sap.com/mac/app/p/video/asset/preview/Mfbuu9K?ltr=a&amp;rc=10" target="_blank" rel="noopener noreferrer">Video: Vector Engine</A></STRONG></P><P>Below is a 2D representation of embeddings generated on simple text like dog, cat, kitten, man, woman, SAP BTP, etc. Words similar in meaning are clustered together.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="VinayBhatt_0-1708173607147.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/66904iC56FF10DD11E61BD/image-size/large?v=v2&amp;px=999" role="button" title="VinayBhatt_0-1708173607147.png" alt="VinayBhatt_0-1708173607147.png" /></span></P><P>&nbsp;</P><P><A href="https://colab.research.google.com/drive/1pSD9exLbLaKOglUG-FrrVA_gfpJtL6R_?usp=sharing" target="_self" rel="nofollow noopener noreferrer"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="VinayBhatt_0-1708174348894.png" style="width: 200px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/66905i5EE65D949B2687FA/image-size/small?v=v2&amp;px=200" role="button" title="VinayBhatt_0-1708174348894.png" alt="VinayBhatt_0-1708174348894.png" /></span></A></P><P>I have created a&nbsp;<A href="https://colab.research.google.com/drive/1pSD9exLbLaKOglUG-FrrVA_gfpJtL6R_?usp=sharing" target="_self" rel="nofollow noopener noreferrer">Google Colab</A> notebook for basic understanding of these
NLP terms, including:&nbsp;</P><UL><LI><STRONG>Transformers</STRONG> &amp; the self-attention mechanism</LI><LI><STRONG>Word Embeddings</STRONG></LI><LI><STRONG>Similarity search</STRONG></LI><LI>Retrieval Augmented Generation (<STRONG>RAG</STRONG>)</LI><LI>Interaction with <STRONG>PDF</STRONG> documents</LI></UL><P>&nbsp;</P><P><FONT size="5"><STRONG>SAP BTP</STRONG> <STRONG>AI Core/ AI Launchpad</STRONG> </FONT></P><P>This is the foundational technology stack for orchestrating and deploying <STRONG>custom</STRONG> AI/ML applications on BTP. AI Core/Launchpad allows you to incorporate your Git code repository along with a config.yaml file containing the templatized parameters for your ML/AI scenarios, a Docker secret &amp; Docker image, the AI Core resource to use (CPU/GPU config), any input parameters, etc. AI Core provides an SDK to interact with the foundation, whereas AI Launchpad provides a UI5 frontend to configure your ML/AI scenario. You can configure, deploy &amp; monitor your scenarios on AI Core &amp; AI Launchpad.</P><P><STRONG>Example Application :</STRONG> <A href="https://discovery-center.cloud.sap/missiondetail/4264/" target="_blank" rel="noopener nofollow noreferrer">Reduce your CO2 footprint using a smart Generative AI application on SAP BTP</A></P><P>&nbsp;</P><P><FONT size="5"><STRONG>SAP Analytics Cloud "Just Ask"&nbsp;</STRONG></FONT><FONT size="5"><STRONG>&amp;</STRONG> <STRONG>SAP Datasphere</STRONG></FONT></P><P>Just Ask (GA Q1 2024) will provide a <STRONG>Natural Language Processing AI interface</STRONG><SPAN> within SAP Analytics Cloud, allowing you to interact with your data models &amp; dashboards using natural-language queries.</SPAN></P><P><A href="https://www.sap.com/assetdetail/2023/09/7ed3bf68-897e-0010-bca6-c68f7e60039b.html" target="_self" rel="noopener noreferrer"><STRONG>Video Demo</STRONG></A></P><P><STRONG>&nbsp;SAP Datasphere (SaaS data warehouse)</STRONG>&nbsp;allows <STRONG>federated data access</STRONG> from many systems
allowing partners to combine data from SAP business applications (S/4HANA, SuccessFactors, Ariba, etc.) or non-SAP business applications with other datasets across varied on-premise/cloud environments to produce powerful business reporting applications in SAP Analytics Cloud, which now also provides built-in Gen AI capability with&nbsp;<STRONG>Just Ask.</STRONG></P><P>&nbsp;The discovery mission below includes details on how customers &amp; partners can use&nbsp;<SPAN>data from&nbsp;<STRONG>AWS, Azure, and Google Cloud</STRONG> with the federation capabilities of SAP Datasphere.</SPAN></P><P>&nbsp; <A href="https://discovery-center.cloud.sap/missiondetail/3656/3699/" target="_self" rel="nofollow noopener noreferrer">Discovery Mission: Explore your Hyperscaler data with SAP Datasphere&nbsp;</A></P><P>&nbsp; <A href="https://www.sap.com/sea/products/technology-platform/datasphere.html?video=c64e4807-637e-0010-bca6-c68f7e60039b" target="_self" rel="noopener noreferrer">Video: SAP Datasphere</A>&nbsp;</P><P>&nbsp; <A href="https://www.sap.com/sea/products/technology-platform/datasphere/guided-experience.html" target="_self" rel="noopener noreferrer">Try &amp; Experience SAP Datasphere</A></P><P>&nbsp;</P><P><FONT size="5"><STRONG>SAP Build Code</STRONG></FONT>&nbsp;</P><P>Build Code will provide the “Joule” assistant for faster application development by generating code and data models and by providing a guided experience for developers.</P><P>&nbsp;</P><P><div class="video-embed-center video-embed"><iframe class="embedly-embed" src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FLr4JWPVdI0k%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DLr4JWPVdI0k&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FLr4JWPVdI0k%2Fhqdefault.jpg&amp;key=b0d40caa4f094c68be7c29880b16f56e&amp;type=text%2Fhtml&amp;schema=youtube" width="530" height="298" scrolling="no" title="Build faster, build smoother, build easier with SAP Build
Code - Product Demo" frameborder="0" allow="autoplay; fullscreen; encrypted-media; picture-in-picture;" allowfullscreen="true"></iframe></div></P><P>&nbsp;</P><P><FONT size="6"><STRONG>Example Use Cases</STRONG></FONT></P><P><FONT size="5"><STRONG>Semantic search app</STRONG></FONT></P><P>Following is a sample <STRONG>search</STRONG> application deployed on Cloud Foundry using the Python buildpack. A CSV of all documents uploaded to SharePoint (or SAP Work Zone) was extracted (typically using the "Reports" section, which exports the list of docs, URL, number of clicks, title, description, etc.). This is then processed in Python: the <STRONG>text</STRONG>&nbsp;from the title &amp; description is merged and tokenized. The tokenized text for all docs is then converted into a <STRONG>TF-IDF matrix</STRONG> (for term-based search, the default option in the app screen below) and also converted into <STRONG>vector</STRONG>&nbsp;<STRONG>embeddings</STRONG>&nbsp;using Google's <STRONG>BERT&nbsp;</STRONG>model for&nbsp;<STRONG>contextual search.</STRONG></P><P>When a user submits a query, it is processed to remove white spaces, special characters, etc., tokenized, and then compared against the existing TF-IDF matrix or embeddings using <STRONG>cosine similarity</STRONG> to retrieve the <STRONG>top matching documents</STRONG>.&nbsp;</P><P>The missing piece in the architecture below is a <STRONG>vector database to store the text embeddings as a persistent layer.&nbsp;</STRONG>I have used an in-memory FAISS index, which is loaded <STRONG>into</STRONG> <STRONG>memory</STRONG> when the app starts up.
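To make the term-based path concrete, here is a minimal, dependency-free sketch of TF-IDF vectorization plus cosine-similarity ranking. The function names and toy documents are illustrative assumptions, not the actual app code; in practice you would use a library such as scikit-learn for TF-IDF, and a BERT model plus FAISS for the embedding side.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and strip special characters, mirroring the query cleanup step.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_tfidf(docs):
    """Build sparse TF-IDF vectors (dicts of term -> weight) for each document."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency per term
    n = len(docs)
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in tf.items()})
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, docs, vectors, idf, top_k=3):
    """Rank documents by cosine similarity between the query and each document."""
    toks = tokenize(query)
    tf = Counter(toks)
    qvec = {t: (c / len(toks)) * idf.get(t, 0.0) for t, c in tf.items()}
    scored = sorted(((cosine(qvec, v), d) for v, d in zip(vectors, docs)),
                    reverse=True)
    return [d for score, d in scored[:top_k] if score > 0]
```

Swapping `build_tfidf` for an embedding model while keeping the same cosine ranking is essentially what turns this term-based search into the contextual search described above.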
For full-fledged guidance from SAP on such architectures, refer to the&nbsp;<A href="https://discovery-center.cloud.sap/refArchDetail/ref-arch-open-ai" target="_blank" rel="noopener nofollow noreferrer">Reference Architecture</A>.&nbsp;</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="VinayBhatt_0-1708053712083.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/66288i8D58E45E6D8E5FAD/image-size/large?v=v2&amp;px=999" role="button" title="VinayBhatt_0-1708053712083.png" alt="VinayBhatt_0-1708053712083.png" /></span></P><P>&nbsp;</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="VinayBhatt_1-1708053855634.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/66289iFF3AC019CA947327/image-size/large?v=v2&amp;px=999" role="button" title="VinayBhatt_1-1708053855634.png" alt="VinayBhatt_1-1708053855634.png" /></span></P><P>To understand what <STRONG>contextual search</STRONG>&nbsp;means here, let's look at an example. Say a user searches for&nbsp;<STRONG>"Aviation Industry related demo &amp; presentation content"</STRONG>; the results will include all documents that are close in the embedding space, for example demos related to <STRONG>Airlines, Flights, Airport Management, Airport Ground Operation Management, etc.</STRONG></P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="VinayBhatt_2-1708053913535.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/66290i9231ABE8F1520C97/image-size/large?v=v2&amp;px=999" role="button" title="VinayBhatt_2-1708053913535.png" alt="VinayBhatt_2-1708053913535.png" /></span></P><P><FONT size="4"><STRONG>BTP Solution Components</STRONG></FONT></P><P><STRONG>AI Core / BTP CF</STRONG><SPAN>&nbsp;</SPAN>: Runtime Python buildpack (Python API), Docker image</P><P><STRONG>AI Launchpad</STRONG><SPAN>&nbsp;</SPAN>: ML Ops,
orchestration, and monitoring of the deployed API.</P><P><STRONG>SAP Build Apps: </STRONG>Frontend</P><P><STRONG>SAP BTP Object Store: </STRONG>CSV extracts store</P><P><STRONG>HANA Cloud Vector database: </STRONG>Text embeddings / similarity search</P><P><FONT size="6"><STRONG>Production Architecture with RAG</STRONG></FONT></P><P><SPAN>In this reference architecture pattern, learn how to seamlessly combine various Large Language Models (LLMs) using SAP AI Core. Maximize the potential of LangChain in the&nbsp;</SPAN><A href="https://cap.cloud.sap/docs/" target="_blank" rel="noopener noreferrer nofollow">CAP</A><SPAN>&nbsp;model and implement advanced methods such as custom schema-based output parsing or Retrieval Augmented Generation (RAG) with embeddings and a vector database to further enhance the benefits for your specific needs. This reference architecture accommodates both Cloud Foundry and Kyma runtimes, providing adaptability in your endeavor to leverage GenAI on SAP BTP.</SPAN></P><P><A href="https://discovery-center.cloud.sap/refArchDetail/ref-arch-open-ai" target="_self" rel="nofollow noopener noreferrer"><SPAN>Retrieval Augmented Generation with GenAI on SAP BTP</SPAN></A></P><P><STRONG><FONT size="6">Zero-Shot Classification using an LLM</FONT></STRONG></P><P>Following is another example wherein I tried zero-shot classification using a GPT model orchestrated via <STRONG>SAP Gen AI Hub (SAP AI Core/Launchpad)</STRONG>.</P><P>Here users use SAP Analytics Cloud to visualize the<STRONG><SPAN>&nbsp;</SPAN>average CO2<SPAN>&nbsp;</SPAN></STRONG>on a dashboard, based on an analysis of their spend at various vendors/retailers and across various transaction descriptions.
As there are <STRONG>too many descriptions</STRONG>, the dashboard is not very intuitive. We then call the AI Core<SPAN>&nbsp;</SPAN><STRONG>GPT</STRONG><SPAN>&nbsp;</SPAN>proxy to<SPAN>&nbsp;</SPAN><STRONG>categorize</STRONG>&nbsp;<STRONG>transactions into fixed categories like Retail, Accommodation, Food, etc.&nbsp;</STRONG>using a prompt like the one below:&nbsp;</P><P><STRONG>PROMPT</STRONG></P><P>Here we are asking the GPT model to assign 67 transaction descriptions (indexed 0 to 66) to 13 categories; the index of the data is preserved and later used to create the mapping in the data model.</P><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><pre class="lia-code-sample language-abap"><code>Categorize the numerized descriptions below into the 13 categories ['Agricultural Services' 'Contracted Services' 'Airlines' 'Car Rental' 'Lodging' 'Transportation Services' 'Utility Services' 'Retail Outlet Services' 'Clothing Stores' 'Miscellaneous Stores' 'Business Services' 'Professional Services and Membership Organizations' 'Government Services']. Display the result in a table with the format "| INDEX OF DESCRIPTION | CATEGORY |".
Descriptions: | 0 | Meat Provisioners – Freezer and Locker | | 1 | Grocery Stores, Supermarkets | | 2 | Cable and other pay television (previously Cable Services) | | 3 | Digital Goods: Media, Books, Movies, Music | | 4 | Financial Institutions – Manual Cash Disbursements | | 5 | Sporting Goods Stores | | 6 | Service Stations ( with or without ancillary services) | | 7 | Financial Institutions – Merchandise and Services | | 8 | Eating places and Restaurants | | 9 | Household Appliance Stores | | 10 | Drugs, Drug Proprietors, and Druggist’s Sundries | | 11 | Glass, Paint, and Wallpaper Stores | | 12 | Petroleum and Petroleum Products | | 13 | Taxicabs and Limousines | | 14 | Book Stores | | 15 | Drug Stores and Pharmacies | | 16 | Candy, Nut, and Confectionery Stores | | 17 | Veterinary Services | | 18 | Miscellaneous and Specialty Retail Stores | | 19 | Money Orders – Wire Transfer | | 20 | Shoe Stores | | 21 | Lodging – Hotels, Motels, Resorts, Central Reservation Services (not elsewhere classified) | | 22 | Family Clothing Stores | | 23 | Miscellaneous Home Furnishing Specialty Stores | | 24 | Fast Food Restaurants | | 25 | Caterers | | 26 | Local/Suburban Commuter Passenger Transportation – Railroads, Feries, Local Water Transportation. | | 27 | Misc. 
Food Stores – Convenience Stores and Specialty Markets | | 28 | Legal Services and Attorneys | | 29 | Men’s Women’s and Children’s Uniforms and Commercial Clothing | | 30 | Civic, Fraternal, and Social Associations | | 31 | Department Stores | | 32 | Orthopedic Goods Prosthetic Devices | | 33 | Telecommunications Equipment including telephone sales | | 34 | Nurseries – Lawn and Garden Supply Store | | 35 | Package Stores – Beer, Wine, and Liquor | | 36 | Men’s and Women’s Clothing Stores | | 37 | Computers, Computer Peripheral Equipment, Software | | 38 | Dairy Products Stores | | 39 | Membership Organizations ( Not Elsewhere Classified) | | 40 | Equipment Rental and Leasing Services, Tool Rental, Furniture Rental, and Appliance Rental | | 41 | Watch, Clock, Jewelry, and Silverware Stores | | 42 | Travel Agencies and Tour Operations | | 43 | Laundry, Cleaning, and Garment Services | | 44 | Commercial Footwear | | 45 | Card Shops, Gift, Novelty, and Souvenir Shops | | 46 | Digital Goods: Games | | 47 | Hardware Stores | | 48 | Cigar Stores and Stands | | 49 | Bakeries | | 50 | Membership Clubs (Sports, Recreation, Athletic), Country Clubs, and Private Golf Courses | | 51 | Recreation Services (Not Elsewhere Classified) | | 52 | Women’s Ready-to-Wear Stores | | 53 | Women’s Accessory and Specialty Shops | | 54 | Tax Payments | | 55 | Management, Consulting, and Public Relations Services | | 56 | Drinking Places (Alcoholic Beverages), Bars, Taverns, Cocktail lounges, Nightclubs and Discotheques | | 57 | News Dealers and Newsstands | | 58 | Ambulance Services | | 59 | Industrial Supplies, Not Elsewhere Classified | | 60 | Computer Software Stores | | 61 | Motion Picture Theaters | | 62 | Cosmetic Stores | | 63 | Electronic Sales | | 64 | Hospitals | | 65 | Miscellaneous Publishing and Printing | | 66 | Wholesale Clubs |</code></pre><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><P><STRONG>RESULTS:&nbsp;</STRONG></P><P>As you see in the results 66 Indexes are now assigned to 13 
Categories or transaction descriptions:&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><pre class="lia-code-sample language-abap"><code>This classification is based on the provided categories and the details of the descriptions: | INDEX OF DESCRIPTION | CATEGORY | |----------------------|-------------------------------------| | 0 | Retail Outlet Services | | 1 | Retail Outlet Services | | 2 | Utility Services | | 3 | Retail Outlet Services | | 4 | Financial Institutions | | 5 | Retail Outlet Services | | 6 | Retail Outlet Services | | 7 | Business Services | | 8 | Retail Outlet Services | | 9 | Retail Outlet Services | | 10 | Retail Outlet Services | | 11 | Retail Outlet Services | | 12 | Retail Outlet Services | | 13 | Transportation Services | | 14 | Retail Outlet Services | | 15 | Retail Outlet Services | | 16 | Retail Outlet Services | | 17 | Professional Services and Membership Organizations | | 18 | Miscellaneous Stores | | 19 | Financial Institutions | | 20 | Clothing Stores | | 21 | Lodging | | 22 | Clothing Stores | | 23 | Retail Outlet Services | | 24 | Retail Outlet Services | | 25 | Retail Outlet Services | | 26 | Transportation Services | | 27 | Retail Outlet Services | | 28 | Professional Services and Membership Organizations | | 29 | Clothing Stores | | 30 | Professional Services and Membership Organizations | | 31 | Retail Outlet Services | | 32 | Retail Outlet Services | | 33 | Retail Outlet Services | | 34 | Agricultural Services | | 35 | Retail Outlet Services | | 36 | Clothing Stores | | 37 | Retail Outlet Services | | 38 | Retail Outlet Services | | 39 | Professional Services and Membership Organizations | | 40 | Contracted Services | | 41 | Retail Outlet Services | | 42 | Professional Services and Membership Organizations | | 43 | Professional Services and Membership Organizations | | 44 | Clothing Stores | | 45 | Retail Outlet Services | | 46 | Retail Outlet Services | | 47 | Retail Outlet Services | | 48 | Retail Outlet Services | | 49 | 
Retail Outlet Services | | 50 | Professional Services and Membership Organizations | | 51 | Recreation Services | | 52 | Clothing Stores | | 53 | Clothing Stores | | 54 | Government Services | | 55 | Professional Services and Membership Organizations | | 56 | Retail Outlet Services | | 57 | Retail Outlet Services | | 58 | Professional Services and Membership Organizations | | 59 | Retail Outlet Services | | 60 | Retail Outlet Services | | 61 | Recreation Services | | 62 | Retail Outlet Services | | 63 | Retail Outlet Services | | 64 | Professional Services and Membership Organizations | | 65 | Retail Outlet Services | | 66 | Retail Outlet Services | Note: There were some descriptions that didn't fit perfectly into any of the provided categories. For these, the closest match was chosen based on the context. For example, "Financial Institutions – Manual Cash Disbursements" and "Financial Institutions – Merchandise and Services" were assigned to "Financial Institutions", which is not present in the provided categories but seems the most appropriate based on the description.</code></pre><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><P>The AI Core code calls GPT to map the transaction descriptions to the outcome categories &amp; the results are then processed and saved back in the HANA cloud database. 
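The "results are then processed" step amounts to parsing the model's pipe-delimited table back into a mapping keyed by the preserved index. A small illustrative sketch (the helper name and regex are assumptions, not the code used in the actual app):

```python
import re

def parse_category_table(llm_output):
    """Extract '| <index> | <category> |' rows from the LLM's table output
    into an index -> category dict. Header and separator rows are skipped
    automatically because their first cell is not numeric."""
    mapping = {}
    for row in re.finditer(r"\|\s*(\d+)\s*\|\s*([^|]+?)\s*\|", llm_output):
        mapping[int(row.group(1))] = row.group(2)
    return mapping
```

The resulting dict can then be joined against the original transaction descriptions by index and written back to the HANA Cloud table that the SAC live connection reads from.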
The SAC live connection then pulls the updated records into a new <STRONG>simplified</STRONG> dashboard.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="VinayBhatt_0-1708056237744.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/66295i407DC8195A18FB8B/image-size/large?v=v2&amp;px=999" role="button" title="VinayBhatt_0-1708056237744.png" alt="VinayBhatt_0-1708056237744.png" /></span></P><P>Below is a <STRONG>simpler dashboard</STRONG> with fewer categories after the update from GPT:&nbsp;</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="VinayBhatt_1-1708056525314.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/66297iC6725B73936713B5/image-size/large?v=v2&amp;px=999" role="button" title="VinayBhatt_1-1708056525314.png" alt="VinayBhatt_1-1708056525314.png" /></span></P><P><FONT size="4"><STRONG>BTP Solution Components</STRONG></FONT></P><P><STRONG>AI Core &amp; Gen AI Hub:&nbsp;</STRONG>App logic in Python in a Docker image (GPT call / prompt management)&nbsp;</P><P><STRONG>AI Launchpad</STRONG><SPAN>&nbsp;</SPAN>: ML Ops, operations monitoring of the deployed API.</P><P><STRONG>SAP Analytics Cloud: </STRONG>Frontend</P><P>&nbsp;</P><P><FONT size="6"><STRONG>AI QUIZ APP</STRONG></FONT></P><P><FONT size="4">Following is a sample quiz application that uses <STRONG>Gen AI to create multiple-choice questions &amp; answers</STRONG>, using product-specific or organization-specific scenarios.
Quizzes can be generated in JSON format using an LLM and then served via a frontend application.&nbsp;</FONT></P><P><FONT size="5"><STRONG>PROMPT</STRONG></FONT></P><P>In the prompt we have provided a JSON template for the response; the GPT model will return question/answer pairs in this template, making it easy to use directly in the frontend application.</P><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><pre class="lia-code-sample language-abap"><code>using the below text as context : " What Is Edge Integration Cell Edge Integration Cell is an optional hybrid integration runtime offered as part of SAP Integration Suite, which enables you to manage APIs and run integration scenarios within your private landscape. The hybrid deployment model of Edge Integration Cell enables you to: * Design and monitor your integration content in the cloud. * Deploy and run your integration content in your private landscape. To learn more about how hybrid deployment works using Edge Integration Cell, see Hybrid Deployment Using Edge Integration Cell. To compare hybrid deployment with the standard deployment model of SAP Integration Suite, see Standard Deployment in the Cloud. Edge Integration Cell supports the following use cases: * Security or compliance use cases
In many enterprises, sensitive data must be managed and controlled inside the enterprise's firewall. In addition, there could be strict architectural restrictions and you want to keep your data within your private landscape. * Getting a migration path for SAP Process Integration customers
SAP Process Integration is used by many customers for ground-to-ground integration scenarios. Edge Integration Cell allows these customers to benefit from the newest innovations that come with SAP Integration Suite, with the option to design and monitor integration content in the cloud, and deploy and run their scenarios exclusively in their private landscapes.
Note
If you want to use integration flows to integrate your on-premise applications, you can use SAP Process Orchestration in addition to SAP Integration Suite. SAP Integration Suite comes with a set of runtime profiles that make sure that the integration flow editor allows you to design only those features that are supported by the runtime components of a specific SAP Process Orchestration release.
For more information, see Runtime Profiles.
However, this option requires you to install a separate product, SAP Process Orchestration, for processing and monitoring messages.
Using Edge Integration Cell, you can manage all tasks – from integration design up to processing and monitoring the scenario – within one service: SAP Integration Suite.
 In both cases, we recommend using a hybrid integration approach for enterprise-wide connectivity. For more information about the supported features and limitations of Edge Integration Cell, see Edge Integration Cell Runtime Scope. " Generate five multiple choice question answer set output as a single JSON code, using below template , the link property of the JSON should default to the url: “link":"https://help.sap.com/docs/integration-suite/sap-integration-suite/what-is-sap-integration-suite-edge-integration-cell” { "type": "multiple", "difficulty": "easy", "category": “SAP BTP - Edge”, "question": “What is fundamental technology behind ChatGPT”, "correct_answer": “Generative AI”, "incorrect_answers": [ "Its cool", "Ethanol", "Formaldehyde" ], "link":""</code></pre><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><P>Test the response using different LLMs on Gen AI Hub; here we also have the opportunity to maintain our prompts.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="VinayBhatt_1-1708060250367.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/66320iC2737A752A50BCA5/image-size/large?v=v2&amp;px=999" role="button" title="VinayBhatt_1-1708060250367.png" alt="VinayBhatt_1-1708060250367.png" /></span></P><P>The generated questions JSON can be saved in a Git repo and exposed as an API using Python Flask/FastAPI on a Cloud Foundry buildpack.
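Since LLM output is not guaranteed to match the template exactly, it is worth validating each generated question before the frontend consumes it. A stdlib-only sketch; the `validate_quiz` helper and its key set are assumptions derived from the JSON template in the prompt above, not code from the actual app:

```python
import json

# Keys every question must carry, taken from the prompt's JSON template.
REQUIRED_KEYS = {"type", "difficulty", "category", "question",
                 "correct_answer", "incorrect_answers", "link"}

def validate_quiz(raw_json):
    """Parse the LLM response and check each question against the template.

    Returns a list of question dicts, or raises ValueError when a question
    is missing template keys or has a malformed answers field."""
    questions = json.loads(raw_json)
    if isinstance(questions, dict):  # a single question object is also accepted
        questions = [questions]
    for q in questions:
        missing = REQUIRED_KEYS - q.keys()
        if missing:
            raise ValueError(f"question missing keys: {sorted(missing)}")
        if not isinstance(q["incorrect_answers"], list):
            raise ValueError("incorrect_answers must be a list")
    return questions
```

Questions that pass validation can be committed to the Git repo and served unchanged; failures can trigger a retry with the same prompt.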
The frontend application here is built on&nbsp;<STRONG>React.&nbsp;</STRONG>Here is a reference open-source quiz app that can be customized to read from a private API instead of the Open Trivia Database: <A href="https://github.com/SafdarJamal/quiz-app" target="_self" rel="nofollow noopener noreferrer">GIT</A></P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="VinayBhatt_0-1708170480571.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/66890iD95D2616CB7AA38E/image-size/large?v=v2&amp;px=999" role="button" title="VinayBhatt_0-1708170480571.png" alt="VinayBhatt_0-1708170480571.png" /></span></P><P>&nbsp;</P><P><FONT size="4"><STRONG>BTP Solution Components</STRONG></FONT></P><P><STRONG>AI Core &amp; Gen AI Hub:&nbsp;</STRONG>App logic in Python in a Docker image (GPT call / prompt management)&nbsp;</P><P><STRONG>AI Launchpad</STRONG><SPAN>&nbsp;</SPAN>: ML Ops, operations, and monitoring of the deployed API.</P><P><STRONG>React App: </STRONG>Frontend, deployed on a Cloud Foundry Node buildpack.&nbsp;</P><P><STRONG><FONT size="6">Conclusion</FONT></STRONG></P><P>SAP BTP is a powerful platform on which to quickly build highly scalable applications and further infuse Gen AI using the technologies &amp; products that we explored in this article.
I'm happy to collaborate and provide more inputs on any of the applications I have presented in the use cases section. (Please note that these were self-initiated projects for personal learning and development.)</P> 2024-02-18T04:43:44.423000+01:00 https://community.sap.com/t5/artificial-intelligence-and-machine-learning-blogs/unlocking-the-power-of-sap-ai-core-building-a-rag-pipeline-with-llamaindex/ba-p/13619841 Unlocking the Power of SAP AI Core: Building a RAG Pipeline with LlamaIndex 2024-02-26T19:03:06.124000+01:00 PuranK https://community.sap.com/t5/user/viewprofilepage/user-id/181122 <P class="" data-unlink="true"><STRONG><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="Image generated by Author" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/71479i9CA3C451CA7D274E/image-size/large?v=v2&amp;px=999" role="button" title="_165cba82-9a97-4a7f-aaf0-1aa8abefa17f.jfif" alt="Image generated by Author" /><span class="lia-inline-image-caption" onclick="event.preventDefault();">Image generated by Author</span></span></STRONG></P><P>&nbsp;</P><P class="" data-unlink="true"><STRONG>Introduction:</STRONG><SPAN>&nbsp;</SPAN>In the ever-evolving landscape of AI and language models, the year 2023 marked a significant focus on the capabilities of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) architectures. As the year came to a close, I seized the opportunity during the holidays to delve into this cutting-edge field.
This blog recounts my journey of building a sophisticated RAG pipeline, leveraging technologies like LlamaIndex, ChromaDB, HuggingFace’s Zephyr-7b (a fine-tuned version of<SPAN>&nbsp;</SPAN><A href="https://huggingface.co/mistralai/Mistral-7B-v0.1" target="_blank" rel="noopener nofollow noreferrer">mistralai/Mistral-7B-v0.1</A>&nbsp;), and deploying the entire system on SAP Business Technology Platform (BTP).</P><P class=""><STRONG>How it All Began:</STRONG></P><P class="">My exploration began with my earlier learnings in the realms of LLMs and generative AI. LlamaIndex, a potent data framework connecting domain data to LLMs, caught my attention. Having learned the intricacies of executing data ingestion pipelines, generating embeddings, and storing them in a vector database, I decided to extend this knowledge by building and deploying such a pipeline on SAP AI Core.</P><P class=""><STRONG>Challenges and Solutions:</STRONG></P><P class="">While embarking on this endeavor, I encountered a hurdle: the vectorized SAP HANA DB was not due for release until Q1 2024. Undeterred, I scoured for an open-source vector database and discovered ChromaDB, an influential, user-friendly solution with an active online community. The next puzzle piece was finding a free-to-use LLM, and I stumbled upon the HuggingFace platform.</P><P class="">Consider a corporate use case where an SFTP site contains diverse data types, such as text-based PDFs, image-based PDFs, images, and plain text files. The goal is to ingest this data into the RAG pipeline, create vector embeddings, store them in a vector database, and build an index for efficient search and retrieval.
The process flow is visualized in the diagram below:</P><P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="Image credit : https://blog.griddynamics.com/retrieval-augmented-generation-llm/" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/71462i74EC0B79B74989D4/image-size/large?v=v2&amp;px=999" role="button" title="PuranK_21-1708965882146.png" alt="Image credit : https://blog.griddynamics.com/retrieval-augmented-generation-llm/" /><span class="lia-inline-image-caption" onclick="event.preventDefault();">Image credit : https://blog.griddynamics.com/retrieval-augmented-generation-llm/</span></span></P><P>&nbsp;</P><P class="" data-unlink="true"><STRONG>Basic RAG implementation involves 4 steps:</STRONG><BR /><EM>If you are already familiar with RAG techniques, you can jump to<SPAN>&nbsp;</SPAN></EM><STRONG><EM>Build &amp; deployment steps for SAP BTP and AI Core</EM></STRONG>&nbsp;&nbsp;</P><P class=""><STRONG>Step 1: Loading documents</STRONG><SPAN>&nbsp;</SPAN>— In our case we will be loading the documents from the SFTP site.</P><P class=""><STRONG>Step 2: Parsing documents into text chunks (Nodes)<SPAN>&nbsp;</SPAN></STRONG>— Split the document into text chunks, which are called “Nodes” in LlamaIndex; here we define the chunk size as 512 for the embedding model.</P><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><pre class="lia-code-sample language-python"><code>sentence_node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
nodes = sentence_node_parser.get_nodes_from_documents(documents)</code></pre><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><P class=""><STRONG>Step 3: Selection of embedding model and LLM</STRONG></P><UL class=""><LI>The embedding model is used to create vector embeddings for each of the nodes.
Here we are calling the<SPAN>&nbsp;</SPAN><A class="" href="https://huggingface.co/sentence-transformers/all-mpnet-base-v2" target="_blank" rel="noopener ugc nofollow noreferrer"><EM>sentence-transformers/all-mpnet-base-v2</EM></A><SPAN>&nbsp;</SPAN>model (based on Microsoft's MPNet) from Hugging Face. This model tops the<SPAN>&nbsp;</SPAN><A class="" href="https://www.sbert.net/docs/pretrained_models.html" target="_blank" rel="noopener ugc nofollow noreferrer">leaderboard</A><SPAN>&nbsp;</SPAN>in terms of overall performance for embeddings.</LI><LI>LLM: The user query and the relevant text chunks are fed into the LLM so that it can generate answers with relevant context.</LI></UL><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><pre class="lia-code-sample language-python"><code>embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-mpnet-base-v2", max_length=512)</code></pre><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><P class=""><STRONG>Step 4: Create index, retriever, and query engine</STRONG></P><P class="">The index, retriever, and query engine are three basic components for asking questions about your data or documents:</P><UL class=""><LI>The index is a data structure that allows us to retrieve relevant information quickly for a user query from external documents.
The Vector Store Index takes the text chunks/Nodes and then creates vector embeddings of the text of every node, ready to be queried by an LLM.</LI></UL><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><pre class="lia-code-sample language-python"><code>ctx_sentence = ServiceContext.from_defaults(llm=llm, embed_model=embed_model, node_parser=sentence_node_parser)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
sentence_index = VectorStoreIndex(nodes, service_context=ctx_sentence, storage_context=storage_context)</code></pre><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><UL class=""><LI>For retrieval we will use an advanced method,<SPAN>&nbsp;</SPAN><STRONG>sentence window retrieval:<SPAN>&nbsp;</SPAN></STRONG>to achieve an even more fine-grained retrieval, instead of using smaller child chunks, we can parse the documents into a single sentence per chunk.</LI><LI>In this case, single sentences will be similar to the “child” chunk concept. The sentence “window” (5 sentences on either side of the original sentence) will be similar to the “parent” chunk concept. In other words, we use the single sentences during retrieval and pass the retrieved sentence with the sentence window to the LLM.
Details of this technique can be visualized better with this<SPAN>&nbsp;</SPAN><A class="" href="https://link.excalidraw.com/readonly/m6DK7oyEFpyQnuw55DVP?darkMode=true" target="_blank" rel="noopener ugc nofollow noreferrer">diagram</A>.</LI></UL><P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="PuranK_22-1708966131954.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/71464iC0A1D59B7D89187D/image-size/large?v=v2&amp;px=999" role="button" title="PuranK_22-1708966131954.png" alt="PuranK_22-1708966131954.png" /></span></P><UL class=""><LI>Create the sentence window node parser</LI></UL><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><pre class="lia-code-sample language-python"><code># create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
sentence_nodes = node_parser.get_nodes_from_documents(docs)
sentence_index = VectorStoreIndex(sentence_nodes, service_context=service_context)</code></pre><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><UL class=""><LI>Create a query engine</LI></UL><P class="">When we create the query engine, we can replace the sentence with the sentence window using the<SPAN>&nbsp;</SPAN><EM>MetadataReplacementPostProcessor</EM>, so that the window of sentences gets sent to the LLM.</P><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><pre class="lia-code-sample language-python"><code>query_engine = sentence_index.as_query_engine(
    similarity_top_k=2,
    # the target key defaults to `window` to match the node_parser's default
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)
window_response = query_engine.query(
    "Can you tell me about the key concepts for supervised finetuning"
)
print(window_response)</code></pre><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><BLOCKQUOTE><P class=""><STRONG>Build &amp; deployment steps for
SAP BTP and AI Core</STRONG></P></BLOCKQUOTE><P class="">Now, with this understanding, let's build the<SPAN>&nbsp;</SPAN><A class="" href="https://docs.llamaindex.ai/en/stable/examples/low_level/oss_ingestion_retrieval.html" target="_blank" rel="noopener ugc nofollow noreferrer">RAG pipeline using LlamaIndex</A><SPAN>&nbsp;</SPAN>and run this pipeline on SAP AI Core.<BR />Prerequisites and setup:</P><OL class=""><LI>I have<SPAN>&nbsp;</SPAN><A class="" href="https://blogs.sap.com/2023/04/04/running-an-sftp-server-on-sap-kyma/" target="_blank" rel="noopener ugc noreferrer">set up an SFTP server on the BTP Kyma runtime</A>; you can use any other SFTP server as well.</LI><LI>Set up ChromaDB using a Docker image and deploy the Docker image on the Kyma runtime using the YAML file below.<SPAN>&nbsp;</SPAN><A class="" href="https://docs.trychroma.com/deployment" target="_blank" rel="noopener ugc nofollow noreferrer">Other deployment options</A></LI></OL><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><pre class="lia-code-sample language-python"><code>docker pull chromadb/chroma
docker tag chromadb/chroma &lt;your_username&gt;/chroma:latest
docker push &lt;your_username&gt;/chroma:latest</code></pre><pre class="lia-code-sample language-yaml"><code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-chroma-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-chroma-app
  template:
    metadata:
      labels:
        app: my-chroma-app
    spec:
      containers:
        - name: my-chroma-app
          image: your-docker-username/my-chroma-image:latest
          ports:
            - containerPort: 80</code></pre><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><P><SPAN>Create a load balancer service for this deployment</SPAN></P><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><pre class="lia-code-sample language-yaml"><code>apiVersion: v1
kind: Service
metadata:
  name: your-service
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: your-app</code></pre><P>&nbsp;</P><P>&nbsp;</P><P>&nbsp;</P><P class="">and lastly, create a DNS entry on top of the service to access the
ChromaDB using the host and port details over the internet.</P><P class="">3. I have subscribed to the SAP AI Core standard edition for the additional memory required to run the LLM.</P><P class="">4. Now, let’s explore the code for building the RAG pipeline, including data ingestion from SFTP and loading the<SPAN>&nbsp;</SPAN><A class="" href="https://huggingface.co/HuggingFaceH4/zephyr-7b-beta" target="_blank" rel="noopener ugc nofollow noreferrer">Hugging Face model zephyr-7b-beta</A><SPAN>&nbsp;</SPAN>into SAP AI Core.</P><UL class=""><LI>In VS Code, create a directory:</LI></UL><pre class="lia-code-sample language-bash"><code>mkdir zephyr-on-ai-core</code></pre><UL class=""><LI>Code for a custom GPU Docker image, which is required to load the transformer model. I have named the Dockerfile<SPAN>&nbsp;</SPAN><EM>Dockerfile.customgpu</EM>:</LI></UL><pre class="lia-code-sample language-python"><code>FROM python:3.11.6 AS base

WORKDIR /serving
COPY requirements.txt requirements.txt
RUN apt update
RUN git clone https://github.com/huggingface/transformers
RUN pip3 install -r requirements.txt
RUN apt-get update \
    &amp;&amp; apt-get -y install tesseract-ocr
RUN apt-get install -y poppler-utils

FROM base AS final
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
RUN export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/cuda-10.0/targets/x86_64-linux/lib:/usr/local/cuda-10.2/targets/x86_64-linux/lib:/usr/local/cuda-11/targets/x86_64-linux/lib:/usr/local/cuda-11.6/targets/x86_64-linux/lib/stubs:/usr/local/cuda-11.6/compat:/usr/local/cuda-11.6/targets/x86_64-linux/lib
RUN export PATH=$PATH:/usr/local/cuda-11/bin

# Required for huggingface
RUN mkdir -p /nonexistent/
RUN mkdir -p /transformerscache/
RUN chown -R 1000:1000 /nonexistent
RUN chmod -R 777 /nonexistent
RUN chmod -R 777 /transformerscache

# Create the directories and set permissions
RUN mkdir -p /serving/data
RUN chown -R 1000:1000 /serving/data
RUN chmod -R 777 /serving/data
RUN mkdir -p /serving/data/temp
RUN chown -R 1000:1000 /serving/data/temp
RUN chmod -R 777 /serving/data/temp
RUN mkdir -p /usr/local/lib/python3.11/site-packages/llama_index/download/llamahub_modules
RUN chown -R 1000:1000 /usr/local/lib/python3.11/site-packages/llama_index/download/llamahub_modules
RUN chmod -R 777 /usr/local/lib/python3.11/site-packages/llama_index/download/llamahub_modules

ENV TRANSFORMERS_CACHE=/transformerscache

COPY /serving /serving

CMD ["uvicorn", "app:api", "--host", "0.0.0.0", "--port", "8080"]</code></pre><UL class=""><LI>Create a<SPAN>&nbsp;</SPAN><EM>requirements.txt</EM><SPAN>&nbsp;</SPAN>to manage all the dependencies:</LI></UL><pre class="lia-code-sample language-abap"><code>transformers==4.36.2
tokenizers&gt;=0.13.3
--find-links https://download.pytorch.org/whl/torch_stable.html
torch
fastapi
uvicorn
accelerate
huggingface_hub
onnxruntime
requests
chromadb==0.4.9
llama-index
sentence-transformers==2.1.0
PyPDF2==1.26.0
pysftp
# pytesseract and its dependencies
pillow
pytesseract
PyMuPDF
# pdf2image and its dependencies
pdf2image</code></pre><UL class=""><LI>Now let’s create a subfolder<SPAN>&nbsp;</SPAN><EM>template</EM><SPAN>&nbsp;</SPAN>and, in this folder, a<SPAN>&nbsp;</SPAN><EM>transformers.yaml</EM><SPAN>&nbsp;</SPAN>with the code below:</LI></UL><pre class="lia-code-sample language-yaml"><code>apiVersion: ai.sap.com/v1alpha1
kind: ServingTemplate
metadata:
  name: transformers
  annotations:
    scenarios.ai.sap.com/description: "zephyr"
    scenarios.ai.sap.com/name: "zephyr"
    executables.ai.sap.com/description: "zephyr"
    executables.ai.sap.com/name: "zephyr"
  labels:
    scenarios.ai.sap.com/id: "zephyr"
    ai.sap.com/version: "1.0"
spec:
  template:
    apiVersion: "serving.kserve.io/v1beta1"
    metadata:
      annotations: |
        autoscaling.knative.dev/metric: concurrency
        autoscaling.knative.dev/target: 1
        autoscaling.knative.dev/targetBurstCapacity: 0
      labels: |
        ai.sap.com/resourcePlan: infer.l
    spec: |
      predictor:
        imagePullSecrets:
          - name: dockerpk
        minReplicas: 1
        maxReplicas: 5
        containers:
        - name: kserve-container
          image: docker.io/purankhoeval/zephyr:01
          ports:
            - containerPort: 8080
              protocol: TCP</code></pre><UL class=""><LI>Create another folder called<SPAN>&nbsp;</SPAN><EM>serving</EM>; inside it we are going to create two files,<SPAN>&nbsp;</SPAN><EM>app.py</EM><SPAN>&nbsp;</SPAN>and<SPAN>&nbsp;</SPAN><EM>worker_model.py</EM>.</LI><LI>In<SPAN>&nbsp;</SPAN><EM>app.py</EM><SPAN>&nbsp;</SPAN>we define the API using FastAPI, with an entry point for each operation, such as data ingestion and query search.</LI><LI>In<SPAN>&nbsp;</SPAN><EM>worker_model.py</EM><SPAN>&nbsp;</SPAN>we implement the handler functions for data ingestion, query search, and result retrieval.</LI><LI>Data ingestion for RAG is a multistep process. It starts with reading the content from the SFTP server directory. I have kept three files for this scenario: a remittance image, a 20-page text-based PDF that is a subset of this<SPAN>&nbsp;</SPAN><A class="" href="https://arxiv.org/pdf/2307.09288.pdf" target="_blank" rel="noopener ugc nofollow noreferrer">document</A>, and a 46-page image-based PDF with remittance details. These documents are loaded using LlamaIndex library functions and split into smaller chunks, i.e.
nodes in our case.</LI></UL><pre class="lia-code-sample language-python"><code># Code for worker_model.py
import torch
import transformers
import traceback
import sys, os
import pysftp
import huggingface_hub
from pprint import pprint
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.schema import MetadataMode
from llama_index.postprocessor import MetadataReplacementPostProcessor
from llama_index.vector_stores import ChromaVectorStore, VectorStoreQuery
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    ServiceContext,
    Document
)
from llama_index.llms import HuggingFaceInferenceAPI
import chromadb
from chromadb.utils import embedding_functions
from llama_index import download_loader
from pathlib import Path
import pytesseract
import pdf2image
from pdf2image import convert_from_path
import fitz

transformers.utils.logging.set_verbosity_error()
transformers.utils.logging.disable_progress_bar()

os.environ["TRANSFORMERS_CACHE"] = "shared/IMR/llm2023/cache"


def download_pdf_from_sftp(sftp_host, sftp_username, sftp_password, sftp_port, remote_path, local_path):
    cnopts = pysftp.CnOpts()
    cnopts.hostkeys = None  # Disable host key checking (not recommended for production)
    with pysftp.Connection(sftp_host, username=sftp_username, password=sftp_password,
                           port=sftp_port, cnopts=cnopts) as sftp:
        # Create the local directory if it doesn't exist
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        remote_files = sftp.listdir(remote_path)
        # Download each file individually
        for remote_file in remote_files:
            remote_file_path = os.path.join(remote_path, remote_file)
            local_file_path = os.path.join(local_path, remote_file)
            sftp.get(remote_file_path, local_file_path)


# SFTP details
sftp_host = '&lt;sftp_host&gt;'
sftp_port = '&lt;port&gt;'
sftp_username = '&lt;user&gt;'
sftp_password = '&lt;password&gt;'
remote_pdf_path = '/upload/'
local_pdf_path = './data/'

# Download the PDFs from SFTP
download_pdf_from_sftp(sftp_host, sftp_username, sftp_password, sftp_port, remote_pdf_path, local_pdf_path)

ImageReader = download_loader("ImageReader")
# Use the model_kwargs to pass options to the parser function
loader = ImageReader(text_type="plain_text")
image_paths = []
documents = []


def is_text_based_pdf(pdf_path):
    try:
        # Open the PDF file
        pdf_document = fitz.open(pdf_path)
        # Iterate through each page and check for text
        for page_number in range(pdf_document.page_count):
            page = pdf_document[page_number]
            text = page.get_text()
            # If text is found on any page, it's likely a text-based PDF
            if text.strip():
                return True
        # No text found on any page, it might be an image-based PDF
        return False
    except Exception as e:
        # Handle exceptions (e.g., if the PDF is encrypted or malformed)
        print(f"Error checking PDF: {e}")
        return False


def process_pdf_file(pdf_path):
    # Check if the PDF is text-based or image-based
    is_text_based = is_text_based_pdf(pdf_path)
    if is_text_based:
        directory_reader = SimpleDirectoryReader(input_files=[pdf_path])
        # Load data from the specified file path
        documentspdf = directory_reader.load_data()
        # Create a llamaindex Document from ImageDocument
        doc1 = documentspdf[0]
        doc1 = Document(doc_id=doc1.id_, text=doc1.text, metadata=doc1.metadata)
        documents.append(doc1)
        doc1 = []
    else:
        print("The PDF is image-based.")
        # Convert the PDF to images
        images = convert_from_path(pdf_path)
        # Save each image to a file and load as ImageDocuments
        for i, image in enumerate(images):
            image_path = Path(f"./data/temp/page_{i}.png")
            image.save(image_path)
            image_paths.append(image_path)
            doc = loader.load_data(file=image_path)
            documents.extend(doc)


# Process files in the directory
def process_files_in_directory(directory_path):
    # Iterate through files in the directory
    for filename in os.listdir(directory_path):
        file_path = os.path.join(directory_path, filename)
        # Check file extension
        _, file_extension = os.path.splitext(filename)
        # Call the appropriate function based on the file type
        if file_extension.lower() in ['.jpg', '.jpeg', '.png']:
            ImageReader = download_loader("ImageReader")
            loader = ImageReader(text_type="plain_text")
            documentsimg = loader.load_data(file_path)
            documents.extend(documentsimg)
        elif file_extension.lower() == '.pdf':
            process_pdf_file(file_path)


class Model:
    generator = None

    @staticmethod
    def setup():
        """model setup"""
        print("START LOADING SETUP ZEPHYR 7B", file=sys.stderr)
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        model_name = "HuggingFaceH4/zephyr-7b-beta"
        HUB_TOKEN = "&lt;hub_token&gt;"
        huggingface_hub.login(token=HUB_TOKEN)
        llm = HuggingFaceInferenceAPI(
            model_name="HuggingFaceH4/zephyr-7b-beta", token=HUB_TOKEN
        )
        print("SETUP DONE", file=sys.stderr)

    @staticmethod
    def predict(prompt, args):
        """run a prediction"""
        return Model.generator(prompt, args)

    @staticmethod
    def query(question):
        print("Question:", question)
        # Set up ChromaDB client and collection
        chroma_host = "&lt;chroma host on kyma&gt;"
        chroma_port = 8000
        chroma_client = chromadb.HttpClient(host=chroma_host, port=chroma_port)
        print('HEARTBEAT:', chroma_client.heartbeat())
        chroma_collection_name = "multidoc"
        chroma_collection = chroma_client.get_collection(name=chroma_collection_name)
        HUB_TOKEN = "&lt;your token&gt;"
        huggingface_hub.login(token=HUB_TOKEN)
        llm = HuggingFaceInferenceAPI(
            model_name="HuggingFaceH4/zephyr-7b-beta", token=HUB_TOKEN
        )
        embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-mpnet-base-v2", max_length=512)
        # Set up ChromaVectorStore and load in data
        vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
        ctx_sentence = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
        retrieved_sentence_index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=ctx_sentence)
        sentence_query_engine = retrieved_sentence_index.as_query_engine(
            similarity_top_k=5,
            verbose=True,
            # the target key defaults to `window` to match the node_parser's default
            node_postprocessors=[
                MetadataReplacementPostProcessor(target_metadata_key="window")
            ],
        )
        import json
        try:
            sentence_response = sentence_query_engine.query(question)
            # Check if the result is empty
            if not sentence_response:
                result_message = {"success": False, "message": "No results found."}
            else:
                # Extract relevant information from sentence_response
                extracted_info = {"response": sentence_response.response}
                result_message = {"success": True, "results": extracted_info}
            # Print the JSON representation and return it
            print(json.dumps(result_message))
            return result_message
        except Exception as e:
            error_message = {"success": False, "message": f"Error during query execution: {str(e)}"}
            print(json.dumps(error_message))
            traceback.print_exc()
            sys.exit(1)

    @staticmethod
    def DataIngestion():
        print("Data Ingestion Started")
        directory_path = "./data/"
        process_files_in_directory(directory_path)
        sentence_node_parser = SentenceWindowNodeParser.from_defaults(
            window_size=3,
            window_metadata_key="window",
            original_text_metadata_key="original_text"
        )
        nodes = sentence_node_parser.get_nodes_from_documents(documents)
        HUB_TOKEN = "&lt;your token&gt;"
        huggingface_hub.login(token=HUB_TOKEN)
        llm = HuggingFaceInferenceAPI(
            model_name="HuggingFaceH4/zephyr-7b-beta", token=HUB_TOKEN
        )
        chroma_host = "&lt;chroma host on kyma&gt;"
        chroma_port = 8000
        chroma_client = chromadb.HttpClient(host=chroma_host, port=chroma_port)
        chroma_collection_name = "multidocai"
        chroma_collection = chroma_client.get_or_create_collection(name=chroma_collection_name)
        embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-mpnet-base-v2", max_length=512)
        vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
        ctx_sentence = ServiceContext.from_defaults(llm=llm, embed_model=embed_model, node_parser=sentence_node_parser)
        storage_context = StorageContext.from_defaults(vector_store=vector_store)
        sentence_index = VectorStoreIndex(nodes, service_context=ctx_sentence, storage_context=storage_context)


if __name__ == "__main__":
    # for local testing
    Model.setup()
    print(Model.predict("Hello, who are you?", {}))</code></pre><UL class=""><LI>We are now ready to build the Docker image. In the commands below, replace &lt;your_username&gt; with your actual Docker Hub username:</LI></UL><pre class="lia-code-sample language-bash"><code>docker build -t &lt;your_username&gt;/zephyr:01 -f Dockerfile.customgpu .
docker push &lt;your_username&gt;/zephyr:01</code></pre><UL class=""><LI>Now push the entire project to GitHub. We will create an AI application from the serving template in SAP AI Core, using the GitHub repository and a Docker registry secret. Detailed steps for setting up the repository and creating an AI application can be found in this<SPAN>&nbsp;</SPAN><A class="" href="https://developers.sap.com/tutorials/ai-core-helloworld.html" target="_blank" rel="noopener ugc noreferrer">article</A><SPAN>&nbsp;</SPAN>and this<SPAN>&nbsp;</SPAN><A class="" href="https://developers.sap.com/tutorials/ai-core-deploy.html" target="_blank" rel="noopener ugc noreferrer">deployment tutorial</A>.</LI><LI>Once the deployment succeeds, SAP AI Core generates an inference API URL for running LLM predictions (URL generation takes a little while).</LI></UL><P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="PuranK_23-1708966584481.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/71466i766559E7DA3C22ED/image-size/large?v=v2&amp;px=999" role="button" title="PuranK_23-1708966584481.png" alt="PuranK_23-1708966584481.png" /></span></P><UL class=""><LI>You can test this inference endpoint using Postman; just remember to set the bearer token for OAuth authentication, using the client ID and client secret of the AI Core service.</LI></UL><P class="">5. Let’s move on to the last phase of this post: creating a Next.js UI project that calls this AI inference endpoint to generate results for us.
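</P><P>Before wiring up the UI, the deployed endpoint can also be exercised from a small script. The helper below is a hedged sketch: the host, deployment ID, and token are placeholders, and the /v2/query path, the ai-resource-group header, and the prompt/args body shape simply mirror the values used by the Node.js proxy later in this post.</P>

```python
import json

def build_inference_request(deployment_url, token, prompt, resource_group="default"):
    """Return (url, headers, body) for a POST to the deployment's /v2/query route."""
    url = f"{deployment_url.rstrip('/')}/v2/query"
    headers = {
        "Authorization": f"Bearer {token}",   # OAuth bearer token from the AI Core service key
        "ai-resource-group": resource_group,  # resource group the deployment runs in
        "Content-Type": "application/json",
    }
    body = json.dumps({"prompt": prompt, "args": {}})
    return url, headers, body

# Example with placeholder values (replace with your AI Core host, deployment ID, and token)
url, headers, body = build_inference_request(
    "https://AI_CORE_HOST/v2/inference/deployments/DEPLOYMENT_ID", "TOKEN", "Hello"
)
print(url)
```

<P>Any HTTP client (Postman, curl, or Python's requests) can then send the returned pieces as a POST request.</P><P class="">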
While the client side is handled by Next.js, the server-side requests will be handled by Node.js.</P><UL class=""><LI>Back in the VS Code project zephyr-on-ai-core, let’s create a Next.js app inside this parent folder using the terminal commands:</LI></UL><pre class="lia-code-sample language-bash"><code>npx create-next-app ragui
cd ragui
npm run dev</code></pre><UL class=""><LI>This will launch your Next.js app, and you can view it in your browser at<SPAN>&nbsp;</SPAN><A class="" href="http://localhost:3000/" target="_blank" rel="noopener ugc nofollow noreferrer">http://localhost:3000</A>.</LI><LI>Inside<SPAN>&nbsp;</SPAN><EM>ragui</EM><SPAN>&nbsp;</SPAN>there should be a folder<SPAN>&nbsp;</SPAN><EM>app</EM><SPAN>&nbsp;</SPAN>containing<SPAN>&nbsp;</SPAN><EM>page.tsx</EM>; modify this file with the following code. We are creating a basic chat container where the user can enter a query and trigger response retrieval by clicking the Send Message button. An additional Start Data Ingestion button triggers the data ingestion process: it reads the files located on the SFTP server and, for each file, creates nodes using the LlamaIndex sentence window node parser.</LI></UL><pre class="lia-code-sample language-javascript"><code>"use client"; // This is a client component
import React, { useState, useRef, useEffect } from 'react';

interface Message {
  text: string;
  user: string;
}

interface RenderMessageTextProps {
  text: string;
}

const ChatApp = () =&gt; {
  const [messages, setMessages] = useState&lt;Message[]&gt;([]);
  const [newMessage, setNewMessage] = useState&lt;string&gt;('');
  const [dataIngestionStatus, setDataIngestionStatus] = useState('');
  const messagesContainerRef = useRef&lt;HTMLDivElement&gt;(null);

  const renderMessageText = ({ text }: RenderMessageTextProps) =&gt; {
    const formattedText = text.replace(/\n/g, '&lt;br&gt;');
    return &lt;span dangerouslySetInnerHTML={{ __html: formattedText }} /&gt;;
  };

  const handleStartDataIngestion = async () =&gt; {
    try {
      const messageData = {};
      const response = await fetch('/api/startDataIngestion', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(messageData),
      });
      if (response.ok) {
        setDataIngestionStatus('Data ingestion successful');
      } else {
        setDataIngestionStatus('Data ingestion failed');
      }
    } catch (error) {
      console.error('Error starting data ingestion:', error);
      setDataIngestionStatus('Internal server error');
    }
  };

  const handlesendMessage = async () =&gt; {
    try {
      const userMessage = { text: newMessage, user: 'You' };
      setMessages((prevMessages) =&gt; [...prevMessages, userMessage]);
      const messageData = { message: newMessage };
      console.log('Sending message:', newMessage);
      const response = await fetch('/api/runQuery', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(messageData),
      });
      if (response.ok) {
        const responseData = await response.json();
        const responseText = responseData.response?.response || '';
        setMessages((prevMessages) =&gt; [
          ...prevMessages,
          { text: responseText, user: 'AriMa' },
        ]);
      } else {
        setDataIngestionStatus('Failed to send message');
      }
    } catch (error) {
      console.error('Error sending message:', error);
      setDataIngestionStatus('Internal server error');
    }
  };

  // Function to scroll to the bottom of the messages container
  const scrollToBottom = () =&gt; {
    if (messagesContainerRef.current) {
      messagesContainerRef.current.scrollTop = messagesContainerRef.current.scrollHeight;
    }
  };

  // Scroll to the bottom on initial render and whenever messages change
  useEffect(() =&gt; {
    scrollToBottom();
  }, [messages]);

  return (
    &lt;div className="flex flex-col items-center justify-between h-screen p-8 bg-gray-800 text-white"&gt;
      &lt;div className="flex items-center justify-between w-full mb-8"&gt;
        &lt;button
          onClick={handleStartDataIngestion}
          className="px-4 py-2 text-white bg-green-500 rounded-md focus:outline-none"
        &gt;
          Start Data Ingestion
        &lt;/button&gt;
      &lt;/div&gt;
      &lt;div className="flex flex-col items-center justify-end flex-1 w-full mb-8 overflow-y-auto p-4"&gt;
        &lt;div ref={messagesContainerRef} className="flex flex-col"&gt;
          {messages.map((message, index) =&gt; (
            &lt;div key={index} className="mb-2"&gt;
              &lt;span className="text-gray-400"&gt;{message.user}:&lt;/span&gt; {message.text}
            &lt;/div&gt;
          ))}
        &lt;/div&gt;
      &lt;/div&gt;
      &lt;div className="flex items-center justify-between w-full"&gt;
        &lt;input
          type="text"
          value={newMessage}
          onChange={(e) =&gt; setNewMessage(e.target.value)}
          placeholder="Type a message..."
          className="flex-1 px-4 py-2 mr-2 text-black bg-white rounded-md focus:outline-none"
        /&gt;
        &lt;button
          onClick={handlesendMessage}
          className="px-4 py-2 text-white bg-black rounded-md focus:outline-none"
        &gt;
          Send Message
        &lt;/button&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  );
};

export default ChatApp;</code></pre><P><SPAN>Inside the app folder, create another folder api, and within it a folder startDataIngestion containing a file route.ts. The code below for this API route handles the data ingestion process:</SPAN></P><pre class="lia-code-sample language-javascript"><code>// app/api/startDataIngestion/route.ts
import type { NextApiRequest, NextApiResponse } from 'next';
import axios from 'axios';
import { NextResponse } from 'next/server';

interface Message {
  role: string;
  text: string;
}

export const POST = async (request: Request, res: NextResponse) =&gt; {
  try {
    const apiUrl = '&lt; url &gt;/api/DataIngestion';
    const axiosRes = await axios.post(apiUrl);
    // You need to handle conversation storage or update here (use a database, state management, etc.)
    return NextResponse.json({ success: true, message: 'Data Ingestion Completed' });
  } catch (error) {
    console.error('Error with API request:', error);
    return NextResponse.json({ success: false, message: 'Internal server error' });
  }
}</code></pre><P><SPAN>Create another folder under api called runQuery, with a route.ts file inside it containing the code below.
This code will call the Node.js API to run the SAP AI Core inference API we deployed earlier, passing the query submitted from the UI.</SPAN></P><pre class="lia-code-sample language-javascript"><code>// app/api/runQuery/route.ts
import type { NextApiRequest, NextApiResponse } from 'next';
import axios from 'axios';
import { NextResponse } from 'next/server';

interface Message {
  role: string;
  text: string;
}

export const POST = async (request: Request, res: NextResponse) =&gt; {
  try {
    const userInput = await request.json();
    const systemPrompt = "Answer with high precision";
    const fullPrompt = `&lt;|system|&gt;\n${systemPrompt}&lt;/s&gt;\n&lt;|user|&gt;\n${userInput.message}&lt;/s&gt;\n&lt;|assistant|&gt;`;
    console.log(fullPrompt);
    const requestBody = { prompt: fullPrompt, args: {} };
    const apiUrl = '&lt;url&gt;/api/getResponse';
    const axiosRes = await axios.post(apiUrl, requestBody);
    const generatedText = axiosRes.data;
    console.log('generated text:', generatedText);
    const newMessage: Message = { role: 'user', text: userInput };
    const newResponse: Message = { role: 'arima', text: generatedText };
    // You need to handle conversation storage or update here (use a database, state management, etc.)
    return NextResponse.json({ success: true, response: generatedText });
  } catch (error) {
    console.error('Error with API request:', error);
    return NextResponse.json({ success: false, message: 'Internal server error' });
  }
}</code></pre><P><SPAN>Kudos to you if you made it this far! Now, on to the last part: implementing the server-side API handling.</SPAN><BR /><SPAN>Create a folder&nbsp;</SPAN><EM>serverapi</EM><SPAN>&nbsp;under the parent folder zephyr-on-ai-core, and inside it a file server.js with the code below:</SPAN></P><pre class="lia-code-sample language-javascript"><code>const express = require('express');
const axios = require('axios');
const cors = require('cors');
const request = require('request-promise');

const app = express();
const port = process.env.PORT || 3006;

app.use(cors());
// Parse JSON in the request body
app.use(express.json());

app.post('/api/getResponse', async (req, res) =&gt; {
  try {
    // Get OAuth token (replace with your actual credentials)
    const fs = require('fs');
    // Specify the path to your service key file
    const serviceKeyPath = './config.json';
    // Read the content of the file
    const serviceKeyContent = fs.readFileSync(serviceKeyPath, 'utf8');
    const serviceKey = JSON.parse(serviceKeyContent);
    // Extract client ID and client secret
    const clientId = serviceKey.clientid;
    const clientSecret = serviceKey.clientsecret;
    const token = await getOAuthToken(clientId, clientSecret);
    // Make AI API call
    const response = await axios.post(
      'https://&lt;url&gt;/v2/inference/deployments/d87531155571d6e8/v2/query',
      req.body,
      {
        headers: {
          Authorization: `Bearer ${token}`,
          'ai-resource-group': 'default',
          'Content-Type': 'application/json',
        },
      }
    );
    const responseBody = response.data.results.response;
    res.status(200).json({ success: true, response: responseBody });
  } catch (error) {
    console.error('Error with API request:', error);
    res.status(500).json({ error: 'Internal Server Error' });
  }
});

app.post('/api/DataIngestion', async (req, res) =&gt; {
  try {
    // Get OAuth token (replace with your actual credentials)
    const fs = require('fs');
    // Specify the path to your service key file
    const serviceKeyPath = './config.json';
    // Read the content of the file
    const serviceKeyContent = fs.readFileSync(serviceKeyPath, 'utf8');
    const serviceKey = JSON.parse(serviceKeyContent);
    // Extract client ID and client secret
    const clientId = serviceKey.clientid;
    const clientSecret = serviceKey.clientsecret;
    const token = await getOAuthToken(clientId, clientSecret);
    // Make AI API call
    const response = await axios.post(
      'https://&lt;url&gt;/v2/inference/deployments/d87531155571d6e8/v2/DataIngestion',
      req.body,
      {
        headers: {
          Authorization: `Bearer ${token}`,
          'ai-resource-group': 'default',
          'Content-Type': 'application/json',
        },
      }
    );
    res.status(200).json({ success: 'Data Ingestion completed' });
  } catch (error) {
    console.error('Error with API request:', error);
    res.status(500).json({ error: 'Internal Server Error' });
  }
});

app.listen(port, () =&gt; {
  console.log(`Server is running on port ${port}`);
});

async function getOAuthToken(clientId, clientSecret) {
  const tokenUrl = '&lt;token_url&gt;/oauth/token';
  try {
    const credentials = Buffer.from(`${clientId}:${clientSecret}`).toString('base64');
    // Make a POST request to the token endpoint using request-promise
    const response = await request.post({
      uri: tokenUrl,
      form: {
        grant_type: 'client_credentials',
      },
      headers: {
        'Content-Type': 'application/x-www-form-urlencoded',
        'Authorization': `Basic ${credentials}`,
      },
      json: true, // Automatically parses the response body as JSON
    });
    // Check if the request was successful
    if (response) {
      // Access token is available in the response data
      const accessToken = response.access_token;
      return accessToken;
    } else {
      // Print the error details if the request was not successful
      console.error(`Error: ${response.status} - ${response.error_description}`);
      return null;
    }
  } catch (error) {
    // Handle exceptions, e.g., network errors
    console.error(`Error: ${error.message}`);
    return null;
  }
}</code></pre><P class="">The package.json for this Node.js server:</P><pre class="lia-code-sample language-json"><code>{
  "name": "server",
  "version": "1.0.0",
  "description": "",
  "main": "server.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" &amp;&amp; exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "axios": "^1.6.2",
    "cors": "^2.8.5",
    "express": "^4.18.2",
    "request-promise": "^4.2.6"
  }
}</code></pre><P class="">and finally the Cloud Foundry manifest file (manifest.yml) as below:</P><pre class="lia-code-sample language-json"><code>---
applications:
- name: aicoreapirag
  path: .
  memory: 128M
  disk_quota: 250M</code></pre><P><SPAN>Let’s push this Node.js API to the SAP BTP Cloud Foundry runtime with the commands below:</SPAN></P><pre class="lia-code-sample language-bash"><code>cd serverapi
cf login
cf push</code></pre><P class="">Once the Node.js server-side API is deployed successfully, take the API URL from the BTP cockpit and use it in the frontend app code to call the server-side API.</P><P class="">Now we are all set to push the Next.js app to Cloud Foundry with the commands below:</P><pre class="lia-code-sample language-bash"><code>cd ragui
npm run build
cf push</code></pre><P><SPAN>Let’s launch the chat app and ask it a query about the Llama text PDF.</SPAN><BR /><STRONG>Query 1: Can you tell me about the key concepts of supervised fine-tuning?</STRONG></P><P><span class="lia-inline-image-display-wrapper 
lia-image-align-center" image-alt="PuranK_25-1708966938590.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/71469iC99EA21FF5A9B012/image-size/large?v=v2&amp;px=999" role="button" title="PuranK_25-1708966938590.png" alt="PuranK_25-1708966938590.png" /></span></P><P><SPAN>The next query is about the image-based PDF, which contains an itemized remittance summary for patients.</SPAN><BR /><STRONG>Query 2: How much is the total Remittance Summary amount for the patient &lt;patient name&gt;?</STRONG></P><P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="PuranK_26-1708967044353.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/71470iB9CC0A949CF658A1/image-size/large?v=v2&amp;px=999" role="button" title="PuranK_26-1708967044353.png" alt="PuranK_26-1708967044353.png" /></span></P><P class="">It’s a very basic UI chat app, but it works and serves its purpose: relaying queries to the LLM running on SAP AI Core and ingesting data into the vector database.
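</P><P>As an aside, the sentence-window idea used during ingestion is easy to see in plain Python. The sketch below is a simplified illustration, not the actual LlamaIndex SentenceWindowNodeParser: each sentence becomes its own node, while a window of neighboring sentences is stored as metadata, which is what the MetadataReplacementPostProcessor swaps in at query time.</P>

```python
def build_sentence_window_nodes(sentences, window_size=3):
    """Toy version of sentence-window node parsing: one node per sentence,
    with the surrounding window of sentences kept as metadata."""
    nodes = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window_size)                   # clamp the window at the start
        hi = min(len(sentences), i + window_size + 1)  # clamp the window at the end
        nodes.append({
            "original_text": sentence,                 # what gets embedded and matched
            "window": " ".join(sentences[lo:hi]),      # what the LLM later sees as context
        })
    return nodes

nodes = build_sentence_window_nodes(["S1.", "S2.", "S3.", "S4.", "S5."], window_size=1)
print(nodes[2]["window"])  # the matched sentence plus one neighbor on each side
```

<P>Retrieval matches on the single embedded sentence but hands the LLM the wider window, which is why small chunks can still produce well-grounded answers.</P><P class="">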
You can also test the “Start Data Ingestion” button; it works.</P><BLOCKQUOTE><P class=""><EM>Thank you for reading this article. Try the code yourself and let me know if you have any feedback.</EM></P></BLOCKQUOTE><BLOCKQUOTE><P class=""><EM>Advanced RAG references:</EM></P></BLOCKQUOTE><OL class=""><LI><A class="" href="https://towardsdatascience.com/advanced-rag-01-small-to-big-retrieval-172181b396d4" target="_blank" rel="noopener nofollow noreferrer">https://towardsdatascience.com/advanced-rag-01-small-to-big-retrieval-172181b396d4</A></LI><LI><A class="" href="https://pub.towardsai.net/advanced-rag-techniques-an-illustrated-overview-04d193d8fec6" target="_blank" rel="noopener ugc nofollow noreferrer">https://pub.towardsai.net/advanced-rag-techniques-an-illustrated-overview-04d193d8fec6</A></LI></OL><P>I originally published this article on Medium during the transition of SAP Community blogs:</P><P><A href="https://medium.com/@purankhoeval/unlocking-the-power-of-sap-ai-core-building-a-rag-pipeline-with-llamaindex-036c3613b8c3" target="_blank" rel="noopener nofollow noreferrer">https://medium.com/@purankhoeval/unlocking-the-power-of-sap-ai-core-building-a-rag-pipeline-with-llamaindex-036c3613b8c3</A></P> 2024-02-26T19:03:06.124000+01:00 https://community.sap.com/t5/technology-blogs-by-sap/set-up-sap-generative-ai-hub-in-an-sap-btp-enterprise-account-and-consume/ba-p/13624287 Set up SAP Generative AI Hub in an SAP BTP Enterprise Account and Consume Generative AI Model 2024-03-01T21:07:06.919000+01:00 MirelaB https://community.sap.com/t5/user/viewprofilepage/user-id/7971 <P>SAP Generative AI Hub is now available with the SAP AI Core extended service plan. By leveraging Generative AI Hub, SAP partners and customers can build AI-powered extensions and apps on SAP BTP.<BR />&nbsp;<BR />As part of SAP AI Core, Generative AI Hub provides access to the most common large language models (LLMs) from different providers.
To integrate an LLM via Generative AI Hub into an application, there are a few important steps to follow, that will be covered in this blog:</P><UL><LI>Create an SAP AI Core service instance.</LI><LI>Create an LLM configuration to reference a model provider specific executable.</LI><LI>Use the configuration to create a deployment.<BR />The configuration and deployment can be created either via the SAP AI Launchpad UI or using the SAP AI Core APIs.</LI><LI>For the created deployment, SAP AI Core provides a unique URL that can be used to access the LLM. Use this URL to infuse generative AI features into your application. Embedding the use of the URL into an application is not covered in this blog. Instead, the use of the deployment URL is demonstrated with Postman.</LI></UL><H1 id="toc-hId-858883036">Prerequisites</H1><UL><LI>You have an SAP BTP Enterprise Account.</LI><LI>You have an entitlement for SAP AI Core, Service Plan "Extended".</LI><LI>You have an entitlement for SAP AI Launchpad, Service Plan "Standard".</LI><LI>You have an entitlement and free quota for SAP BTP, Cloud Foundry Runtime.</LI></UL><H1 id="toc-hId-662369531">SAP AI Core Setup</H1><P>First, make sure with your global account administrator that all services listed in the Prerequisites section have service plan quota assigned to your global account.</P><P>As subaccount administrator, go to the account level, Entitlements -&gt; Entity Assignments screen and select your subaccount. 
Click Edit, then Add Service Plans to configure entitlement.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="I821210_0-1709231673726.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73619iB04C2EEB537F5CCA/image-size/large?v=v2&amp;px=999" role="button" title="I821210_0-1709231673726.png" alt="I821210_0-1709231673726.png" /></span></P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="I821210_1-1709231673733.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73618i0CF139411E25D618/image-size/large?v=v2&amp;px=999" role="button" title="I821210_1-1709231673733.png" alt="I821210_1-1709231673733.png" /></span></P><P>Add 1 unit quota for the extended service plan then save entitlements update.</P><H2 id="toc-hId-594938745"><FONT size="4">Create SAP AI Core Service Instance</FONT></H2><P>Navigate to your subaccount and select Services -&gt; Instances and Subscriptions. Click on Create. Fill out the fields as shown below and click Create.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="I821210_2-1709231673738.png" style="width: 400px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73617i4C8AAD79613D575E/image-size/medium?v=v2&amp;px=400" role="button" title="I821210_2-1709231673738.png" alt="I821210_2-1709231673738.png" /></span></P><P>In case you have an existing standard plan service instance, the creation will fail as shown below.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="I821210_3-1709231673749.png" style="width: 400px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73621i3181B7007D198045/image-size/medium?v=v2&amp;px=400" role="button" title="I821210_3-1709231673749.png" alt="I821210_3-1709231673749.png" /></span></P><P>Update existing SAP AI Core service instance instead. 
For that, go back to Services -&gt; Instances and Subscriptions and search for the SAP AI Core standard service plan service instance. Click on Update button illustrated below:</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="I821210_4-1709231673755.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73620i3B7BC16E6D7B60BA/image-size/large?v=v2&amp;px=999" role="button" title="I821210_4-1709231673755.png" alt="I821210_4-1709231673755.png" /></span></P><P>In the next screen, select extended plan instead of the standard plan and Update Instance.</P><H2 id="toc-hId-398425240"><FONT size="4">Create Service Key</FONT></H2><P>On the Instances and Subscriptions screen, find your SAP AI Core service instance and choose Create Service Key from the three dots dropdown.</P><P>Enter the name and click Create.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="I821210_5-1709231673760.png" style="width: 400px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73622iB707FDBC153F546D/image-size/medium?v=v2&amp;px=400" role="button" title="I821210_5-1709231673760.png" alt="I821210_5-1709231673760.png" /></span></P><P>Download and store your service key json file locally. You will use it in the next steps.</P><H1 id="toc-hId-72829016">&nbsp;</H1><H1 id="toc-hId--123684489"><FONT size="6">SAP AI Launchpad Setup</FONT></H1><P>As subaccount administrator, go to the account level, Entitlements -&gt; Entity Assignments screen and select your subaccount. 
Click Edit, then Add Service Plans to configure entitlement.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="I821210_6-1709231673769.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73625i040F515067C65017/image-size/large?v=v2&amp;px=999" role="button" title="I821210_6-1709231673769.png" alt="I821210_6-1709231673769.png" /></span></P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="I821210_7-1709231673775.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73623i59234BAB1AA7B02E/image-size/large?v=v2&amp;px=999" role="button" title="I821210_7-1709231673775.png" alt="I821210_7-1709231673775.png" /></span></P><P>Add 1 unit quota for the standard service plan then save entitlements update.</P><H2 id="toc-hId--191115275"><FONT size="4">Subscribe to SAP AI Launchpad</FONT></H2><P>SAP AI Launchpad is a SaaS application, and you need to subscribe to it. Follow these SAP Help steps to create a subscription: <A href="https://help.sap.com/docs/ai-launchpad/sap-ai-launchpad/subscribe-to-sap-ai-launchpad" target="_blank" rel="noopener noreferrer">https://help.sap.com/docs/ai-launchpad/sap-ai-launchpad/subscribe-to-sap-ai-launchpad</A></P><H2 id="toc-hId--387628780"><FONT size="4">Set up User Access to SAP AI Launchpad</FONT></H2><P>In your subaccount go to Security -&gt; Users and select your user. Assign role collections as required. Follow these SAP Help steps: <A href="https://help.sap.com/docs/ai-launchpad/sap-ai-launchpad/allow-access-to-sap-ai-launchpad" target="_blank" rel="noopener noreferrer">https://help.sap.com/docs/ai-launchpad/sap-ai-launchpad/allow-access-to-sap-ai-launchpad</A></P><P>If you want access to Generative AI Hub features, don’t assign role collections containing “without_genai”. 
If you want to be an editor for specific features, assign only the editor role collections for those features, not both the viewer and editor role collections.</P><H1 id="toc-hId--713225004">&nbsp;</H1><H1 id="toc-hId--909738509">Create Connection between SAP AI Launchpad and SAP AI Core</H1><P>Following subscription, you’ll be able to launch the SAP AI Launchpad application. Go to the Instances and Subscriptions screen in your subaccount. You’ll see your SAP AI Launchpad application under Subscriptions. Click on it to open it.</P><P>The Workspaces app is used to manage connections between SAP AI Launchpad and AI runtimes such as SAP AI Core. AI runtimes provide resources to process AI activities such as training and inference.</P><P>In the Workspaces app, you need to add a new AI API connection. Click on Add and fill out the fields as shown below. Under the Service Key field, click the upload symbol to upload the SAP AI Core service key JSON file that you stored locally in the Create Service Key section.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MirelaB_0-1709234819035.png" style="width: 400px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73650iB7085171E9C09D97/image-size/medium?v=v2&amp;px=400" role="button" title="MirelaB_0-1709234819035.png" alt="MirelaB_0-1709234819035.png" /></span></P><P>Click on Create and the connection will be established as shown below.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="I821210_9-1709231673789.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73628i7D395611B4D599F5/image-size/large?v=v2&amp;px=999" role="button" title="I821210_9-1709231673789.png" alt="I821210_9-1709231673789.png" /></span></P><P>You also need to select a resource group so that you can perform ML Operations tasks. Resource groups represent a virtual collection of related resources within the scope of your AI runtime. 
Runtime entities such as executions, deployments, configurations, and artifacts belong to a specific resource group and are not shared across resource groups.</P><P>During subscription to SAP AI Launchpad, when your tenant is onboarded, a default resource group is automatically created. For your connection, you can select the default resource group as shown in the picture above.</P><H1 id="toc-hId--336511931">&nbsp;</H1><H1 id="toc-hId--533025436">Create a Configuration</H1><P>To be able to deploy and consume LLMs you first need to create a configuration that will reference a model provider specific executable.</P><P>An executable is used to define training or serving pipelines for an AI use case. An executable that is used to serve (deploy) an AI model is called a serving executable.</P><P>Access to the LLMs is provided under the global AI scenario foundation-models, which is managed by SAP AI Core. Individual models are provided as executables in the form of serving templates and accessed by choosing the corresponding template for the desired model.</P><P>See models that are currently supported here:</P><P><A href="https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/models-and-scenarios-in-generative-ai-hub" target="_blank" rel="noopener noreferrer">https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/models-and-scenarios-in-generative-ai-hub</A></P><P>In SAP AI Launchpad, go to ML Operations -&gt; Configurations and click Create.</P><P>Give your configuration a name, select the foundation-models scenario and the version, and select the serving executable.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="I821210_24-1709232222205.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73637i63A66C19FC7B4E6D/image-size/large?v=v2&amp;px=999" role="button" title="I821210_24-1709232222205.png" alt="I821210_24-1709232222205.png" /></span></P><P>Click Next to enter Input 
Parameters.</P><P>Select the model name you want to deploy. See the model versions in the link provided above. The model version can be given as latest; if it is not specified, it defaults to latest.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="I821210_25-1709232251899.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73638i997C1DF26C3D2678/image-size/large?v=v2&amp;px=999" role="button" title="I821210_25-1709232251899.png" alt="I821210_25-1709232251899.png" /></span></P><P>Click on&nbsp;Next&nbsp;-&gt;&nbsp;Review&nbsp;-&gt;&nbsp;Create&nbsp;to create the configuration.</P><P>When you now click on ML Operations -&gt; Configurations on the left-hand side menu, you’ll see the created configuration.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="I821210_12-1709231673804.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73629iAAA32B9973909A78/image-size/large?v=v2&amp;px=999" role="button" title="I821210_12-1709231673804.png" alt="I821210_12-1709231673804.png" /></span></P><H1 id="toc-hId--729538941">&nbsp;</H1><H1 id="toc-hId--926052446">Create a Deployment</H1><P>A deployment runs a model for serving (inferencing) purposes.</P><P>A serving executable is an AI pipeline that serves or deploys a model for online predictions. As we saw in the previous section, you use a configuration to specify values for the serving executable, such as the model as an input artifact. The configuration is then used to create a deployment.</P><P>A running deployment can only reference one configuration. Multiple deployments can be created using the same configuration, resulting in separate endpoints for online predictions.</P><P>You use SAP AI Launchpad to create deployments for your runtime connection. Deployments that are implemented on an SAP AI Core runtime produce HTTPS endpoints. 
These endpoints can be used to access the LLM.</P><P>Following the previous step, click on the created configuration, then on Create Deployment.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="I821210_26-1709232485025.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73639iAC73A4706C0FC13A/image-size/large?v=v2&amp;px=999" role="button" title="I821210_26-1709232485025.png" alt="I821210_26-1709232485025.png" /></span></P><P>Keep Duration as Standard and click on Review then Create to create the deployment.</P><P>Once the deployment is created, wait for the current status to be set to RUNNING.</P><P>It could take a few minutes for the deployment to go from UNKNOWN status to RUNNING status.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="I821210_27-1709232538461.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73640iE39423F085394BF6/image-size/large?v=v2&amp;px=999" role="button" title="I821210_27-1709232538461.png" alt="I821210_27-1709232538461.png" /></span></P><P>Take note of the deployment URL as you will use it in the next section.</P><P>Please note that both the configuration and the deployment can also be created using the SAP AI Core APIs. See the API Reference on the SAP Business Accelerator Hub <A href="https://api.sap.com/api/AI_CORE_API/resource/Configuration" target="_blank" rel="noopener noreferrer">https://api.sap.com/api/AI_CORE_API/resource/Configuration</A><SPAN>, </SPAN></P><P><SPAN>and <A href="https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/create-deployment-for-generative-ai-model-in-sap-ai-core#using-the-api" target="_blank" rel="noopener noreferrer">https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/create-deployment-for-generative-ai-model-in-sap-ai-core#using-the-api</A> for details. 
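The API route can be sketched in Python. The endpoint path `/v2/lm/deployments` and the `configurationId` body field follow the AI Core API reference linked above; the helper names and placeholder values here are illustrative, not an official client:

```python
def deployment_request(ai_api_url, token, resource_group, configuration_id):
    """Build URL, headers, and body for a 'create deployment' call against
    the AI API. Path and body field follow the AI Core API reference."""
    url = f"{ai_api_url}/v2/lm/deployments"
    headers = {
        "Authorization": f"Bearer {token}",
        "AI-Resource-Group": resource_group,  # deployments are scoped per resource group
        "Content-Type": "application/json",
    }
    body = {"configurationId": configuration_id}
    return url, headers, body

def create_deployment(ai_api_url, token, resource_group, configuration_id):
    """Send the request and return the API response as a dict."""
    import requests
    url, headers, body = deployment_request(ai_api_url, token,
                                            resource_group, configuration_id)
    resp = requests.post(url, headers=headers, json=body)
    resp.raise_for_status()
    # The response carries the deployment id; poll it until status is RUNNING,
    # as described above for the Launchpad UI.
    return resp.json()
```

As in the UI flow, expect the new deployment to pass through UNKNOWN before reaching RUNNING.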
</SPAN>Use the AI_API_URL from your SAP AI Core service key as the "apiurl".</P><H2 id="toc-hId--1415968958"><FONT size="4">Consume Generative AI Models</FONT></H2><P>Now that you have a deployment running, you can run inference requests against it, and it will return the results from the LLM. LLMs have broad applicability for a range of natural language processing tasks, including text prediction and generation, content summary, translation, content classification and categorization, conversational AI and chatbots, etc.</P><P>Models from Azure OpenAI are accessed through a private instance of the chat-completion API.<BR />The endpoint to use is <STRONG>{{deploymentUrl}}/chat/completions?api-version={{api-version}}</STRONG>.<BR />For more info on this API and version to use, see <A href="https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#chat-completions" target="_blank" rel="noopener nofollow noreferrer">https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#chat-completions</A></P><P>With each inference request, you need to pass SAP AI Core access token and AI-Resource-Group key value in the header. To get a new token, use OAuth 2.0 client credentials grant type and find client ID, client secret and access token URL values in your SAP AI Core service key. 
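The token retrieval and inference call described above can be sketched in Python. The endpoint shape, the bearer token, and the AI-Resource-Group header follow this blog; the helper names and example values are illustrative only:

```python
def chat_completion_request(deployment_url, api_version, token, resource_group, user_prompt):
    """Build URL, headers, and body for a chat-completions inference call:
    bearer token plus AI-Resource-Group in the header, the query under
    'messages' in the body."""
    url = f"{deployment_url}/chat/completions?api-version={api_version}"
    headers = {
        "Authorization": f"Bearer {token}",
        "AI-Resource-Group": resource_group,
        "Content-Type": "application/json",
    }
    body = {"messages": [{"role": "user", "content": user_prompt}]}
    return url, headers, body

def fetch_token(auth_url, client_id, client_secret):
    """OAuth 2.0 client-credentials grant using the client ID, client secret,
    and access token URL from the SAP AI Core service key."""
    import requests
    resp = requests.post(auth_url,
                         data={"grant_type": "client_credentials"},
                         auth=(client_id, client_secret))
    resp.raise_for_status()
    return resp.json()["access_token"]
```

With a valid token, `requests.post(url, headers=headers, json=body)` returns the model's completion, which is what the Postman calls below demonstrate interactively.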
See pictures below:</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MirelaB_1-1709235605736.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73654i6AD6F1837302F631/image-size/large?v=v2&amp;px=999" role="button" title="MirelaB_1-1709235605736.png" alt="MirelaB_1-1709235605736.png" /></span></P><P>&nbsp;</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MirelaB_2-1709235623315.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73655iFF15E22320148577/image-size/large?v=v2&amp;px=999" role="button" title="MirelaB_2-1709235623315.png" alt="MirelaB_2-1709235623315.png" /></span></P><P>The query is sent in the body of the request, under “messages”. There are other optional parameters that you can include, and for their definitions, see <A href="https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/consume-generative-ai-models-using-sap-ai-core" target="_blank" rel="noopener noreferrer">https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/consume-generative-ai-models-using-sap-ai-core</A></P><P>The picture below shows the body and response for a text summarizing request. For more examples and details, see the same link above.</P><DIV class=""><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MirelaB_3-1709235646828.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/73656i9A5F80C3B729BBD2/image-size/large?v=v2&amp;px=999" role="button" title="MirelaB_3-1709235646828.png" alt="MirelaB_3-1709235646828.png" /></span><P>&nbsp;</P><P>&nbsp;</P><SPAN>In conclusion, all the steps covered in this blog are prerequisites to integrating an LLM via SAP Generative AI Hub into extensions and applications built on SAP BTP, with the purpose of adding generative AI features and functionalities to your application. 
Programmatically embedding the deployment URL into your application is the basis for building such functionalities.</SPAN><BR /><P>&nbsp;</P></DIV><P>&nbsp;</P><P>If you are a partner who needs structured support in building your solutions on SAP BTP, check out our Co-Innovated with SAP service: <A href="https://partnerbenefitscatalog.sap.com/Partner-Benefits-Catalog/Innovation/Co-Innovated-with-SAP-Service/p/1000260?siteUid=PartnerMarketplace&amp;siteName=Partner+Marketplace&amp;isProductReferences=true" target="_blank" rel="noopener noreferrer">Co-Innovated with SAP Service | Innovation | Partner Benefits Catalog | Partner Marketplace</A></P><P>If you need additional details, reach out to us at <A href="mailto:coinnovate@sap.com" target="_blank" rel="noopener nofollow noreferrer">coinnovate@sap.com</A>.</P><P>&nbsp;</P> 2024-03-01T21:07:06.919000+01:00 https://community.sap.com/t5/artificial-intelligence-and-machine-learning-blogs/fantastic-sap-ai-sdks-and-where-to-find-them/ba-p/13610510 Fantastic SAP AI SDKs and where to find them 2024-03-19T14:42:07.467000+01:00 MarioDeFelipe https://community.sap.com/t5/user/viewprofilepage/user-id/13491 <P><SPAN>In today's fast-paced digital landscape, businesses are increasingly turning to artificial intelligence (AI) to gain a competitive edge. SAP has recognized this trend and has developed a suite of powerful SDKs (Software Development Kits) to help organizations seamlessly integrate AI capabilities into their applications and workflows. 
In this blog post, we will explore the different SAP AI SDKs, their functionalities, and the value they bring to customers; if I missed any of them, let me know!</SPAN></P><P>&nbsp;</P><H1 id="toc-hId-857843010"><SPAN>SAP AI Core SDK</SPAN></H1><P>&nbsp;</P><P><SPAN>The [SAP AI Core SDK](<A href="https://pypi.org/project/ai-core-sdk/" target="_blank" rel="noopener nofollow noreferrer">https://pypi.org/project/ai-core-sdk/</A>) is a Python-based SDK that allows developers to access and interact with the SAP AI Core service using Python methods and data structures. This SDK provides a comprehensive set of tools to manage AI scenarios and workflows within SAP AI Core.</SPAN></P><P>&nbsp;</P><TABLE><TBODY><TR><TD width="790.391px" height="30px"><STRONG>AI Core Key Features</STRONG></TD></TR><TR><TD width="157.139px" height="57px"><SPAN>Scenario Management</SPAN></TD><TD width="633.252px" height="57px"><SPAN>Create, update, and manage AI scenarios and executables.</SPAN></TD></TR><TR><TD width="157.139px" height="57px"><SPAN>Workflow Execution</SPAN></TD><TD width="633.252px" height="57px"><SPAN>Execute pipelines as batch jobs for tasks such as preprocessing, training models, or performing batch inference.</SPAN></TD></TR><TR><TD width="157.139px" height="57px"><SPAN>Model Deployment</SPAN></TD><TD width="633.252px" height="57px"><SPAN>Deploy trained machine learning models as web services to serve high-performance inference requests.</SPAN></TD></TR><TR><TD width="157.139px" height="57px"><SPAN>Infrastructure Integration</SPAN></TD><TD width="633.252px" height="57px"><SPAN>Register Docker registries, synchronize AI content from Git repositories, and register object stores for training data and models.</SPAN></TD></TR><TR><TD width="157.139px" height="30px"><SPAN>Metrics Tracking</SPAN></TD><TD width="633.252px" height="30px"><SPAN>Log and track metrics within workflow executions for monitoring and analysis.</SPAN></TD></TR></TBODY></TABLE><P>&nbsp;</P><P><SPAN>The SAP 
AI Core SDK simplifies the development process by providing a unified interface to interact with SAP AI Core. This enables developers to focus on building and deploying AI solutions without worrying about the underlying infrastructure complexities.</SPAN></P><P><SPAN><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="Captura de pantalla 2024-03-18 a las 16.11.57.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/82489i645FD19EB02A40E2/image-size/large?v=v2&amp;px=999" role="button" title="Captura de pantalla 2024-03-18 a las 16.11.57.png" alt="Captura de pantalla 2024-03-18 a las 16.11.57.png" /></span></SPAN></P><P><SPAN>Key GitHub repositories that leverage the SAP AI SDKs:</SPAN></P><P><SPAN>[SAP-samples/ai-core-samples](<A href="https://github.com/SAP-samples/ai-core-samples" target="_blank" rel="noopener nofollow noreferrer">https://github.com/SAP-samples/ai-core-samples</A>)</SPAN></P><P>&nbsp;</P><P><SPAN>This repository contains sample code and examples of the SAP AI Core SDK that can be used to interact with SAP AI Core. It demonstrates how to create clients, manage resource groups, track metrics during training, and deploy inference services.</SPAN></P><P><SPAN>[SAP-samples/azure-openai-aicore-cap-api](<A href="https://github.com/SAP-samples/azure-openai-aicore-cap-api" target="_blank" rel="noopener nofollow noreferrer">https://github.com/SAP-samples/azure-openai-aicore-cap-api</A>)</SPAN></P><P><SPAN>This repository shows how to deploy an inference service on SAP AI Core that acts as a proxy for Azure OpenAI Services. 
It leverages the SAP AI Core SDK to create and manage the necessary resources.</SPAN></P><P><SPAN>[SAP-samples/aicore-multioutput-regression-restaurant-inspections](<A href="https://github.com/SAP-samples/aicore-multioutput-regression-restaurant-inspections" target="_blank" rel="noopener nofollow noreferrer">https://github.com/SAP-samples/aicore-multioutput-regression-restaurant-inspections</A>)</SPAN></P><P><SPAN>This repository demonstrates a machine learning service that leverages the capabilities of SAP BTP (SAP AI Core and SAP AI Launchpad) for criticality assessments, likely using the SAP AI Core SDK.</SPAN></P><P><SPAN>[SAP-samples/teched2023-AI269](<A href="https://github.com/SAP-samples/teched2023-AI269" target="_blank" rel="noopener nofollow noreferrer">https://github.com/SAP-samples/teched2023-AI269</A>)</SPAN></P><P><SPAN>This repository contains materials for a TechEd session on prompt engineering with the SAP Generative AI Hub, suggesting the use of the SAP Generative AI Hub SDK.</SPAN></P><P>[SAP-samples/teched2023-AI266](<A href="https://github.com/SAP-samples/teched2023-AI266" target="_blank" rel="noopener nofollow noreferrer">https://github.com/SAP-samples/teched2023-AI266</A>)</P><P><SPAN>This repository provides a hands-on lab for working with SAP AI Core and SAP AI Launchpad, likely leveraging the SAP AI Core SDK and AI API Client SDK.</SPAN></P><P><SPAN>These repositories cover various use cases, such as interacting with SAP AI Core, deploying inference services, leveraging generative AI models, and integrating with other services like Azure OpenAI. 
They serve as valuable resources for developers looking to leverage the capabilities of SAP's AI SDKs in their projects.</SPAN></P><H1 id="toc-hId-661329505"><SPAN>AI API Client SDK</SPAN></H1><P><SPAN>The [AI API Client SDK](<A href="https://pypi.org/project/ai-api-client-sdk/" target="_blank" rel="noopener nofollow noreferrer">https://pypi.org/project/ai-api-client-sdk/</A>) is a Python library that allows developers to interact with the AI API, a standardized interface for managing the AI scenario lifecycle across different runtimes, including SAP AI Core and other partner technologies.</SPAN></P><P>&nbsp;</P><TABLE><TBODY><TR><TD width="790.391px" height="30px"><STRONG>AI API Client Key Features</STRONG></TD></TR><TR><TD width="157.139px" height="57px"><SPAN>Scenario Lifecycle Management</SPAN></TD><TD width="633.252px" height="57px"><SPAN>Create, update, and manage AI scenarios, executables, and configurations.</SPAN></TD></TR><TR><TD width="157.139px" height="57px"><SPAN>Execution Management</SPAN></TD><TD width="633.252px" height="57px"><SPAN>Trigger and monitor AI workflow executions</SPAN></TD></TR><TR><TD width="157.139px" height="57px"><SPAN>Model Management</SPAN></TD><TD width="633.252px" height="57px"><SPAN>Deploy, update, and undeploy machine learning models for serving inference requests.</SPAN></TD></TR><TR><TD width="157.139px" height="57px"><SPAN>Runtime Agnostic</SPAN></TD><TD width="633.252px" height="57px"><P><SPAN>Interact with AI scenarios across multiple runtimes, including SAP AI Core and partner technologies.</SPAN></P></TD></TR></TBODY></TABLE><P><SPAN>The AI API Client SDK provides a consistent and unified way to manage AI assets, regardless of the underlying runtime environment. 
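As an illustration of that unified setup, the values from an SAP AI Core service key map onto the client's connection settings. A sketch under the assumption that the `AIAPIV2Client` constructor accepts the keyword arguments shown (the field names follow the usual SAP BTP service key layout; verify the exact signature in the package documentation):

```python
def client_kwargs(service_key, resource_group="default"):
    """Map an SAP AI Core service key dict onto connection arguments for
    the AI API client. Parameter names are assumptions based on the
    package docs; check them before use."""
    return {
        "base_url": service_key["serviceurls"]["AI_API_URL"] + "/v2",
        "auth_url": service_key["url"] + "/oauth/token",   # OAuth token endpoint
        "client_id": service_key["clientid"],
        "client_secret": service_key["clientsecret"],
        "resource_group": resource_group,
    }

# Illustrative usage (requires the ai-api-client-sdk package and a real key):
# from ai_api_client_sdk.ai_api_v2_client import AIAPIV2Client
# client = AIAPIV2Client(**client_kwargs(key))
```

Keep the service key file out of version control; it grants full access to your SAP AI Core instance.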
This abstraction layer simplifies the development process and promotes portability, allowing organizations to leverage AI capabilities across different platforms and technologies.</SPAN></P><P>&nbsp;</P><H1 id="toc-hId-464816000"><SPAN>SAP Generative AI Hub SDK</SPAN></H1><P>&nbsp;</P><P><SPAN>The [SAP Generative AI Hub SDK](<A href="https://pypi.org/project/generative-ai-hub-sdk/" target="_blank" rel="noopener nofollow noreferrer">https://pypi.org/project/generative-ai-hub-sdk/</A>) is a Python library that enables developers to interact with the SAP Generative AI Hub, a platform that provides access to a broad range of large language models (LLMs) from various providers, such as GPT-4 by Azure OpenAI or open-source models like Falcon-40b.</SPAN></P><P>&nbsp;</P><TABLE><TBODY><TR><TD width="790.391px" height="30px"><STRONG>Generative AI Hub Key Features</STRONG></TD></TR><TR><TD width="157.139px" height="57px"><SPAN>LLM Access</SPAN></TD><TD width="633.252px" height="57px">&nbsp;Submit prompts to multiple LLMs and compare the generated outcomes to identify the best-suited model for a given task.</TD></TR><TR><TD width="157.139px" height="57px"><SPAN>Prompt Engineering</SPAN></TD><TD width="633.252px" height="57px">&nbsp;Leverage tooling for prompt engineering, experimentation, and other capabilities to accelerate the development of applications infused with generative AI.</TD></TR><TR><TD width="157.139px" height="57px"><SPAN>Prompt History</SPAN></TD><TD width="633.252px" height="57px">&nbsp;Gain greater control and transparency with built-in prompt history tracking.</TD></TR><TR><TD width="157.139px" height="57px"><SPAN>Secure and Trusted</SPAN></TD><TD width="633.252px" height="57px">&nbsp;Access LLMs in a secure and trusted environment, ensuring compliance and data privacy.</TD></TR></TBODY></TABLE><P>&nbsp;</P><P><SPAN>The SAP Generative AI Hub SDK empowers developers to harness the power of generative AI, enabling them to create innovative applications that 
leverage the capabilities of state-of-the-art language models while maintaining control, transparency, and compliance.</SPAN></P><P><SPAN><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="Captura de pantalla 2024-03-18 a las 16.13.38.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/82490i9CC67420BD25BEF7/image-size/large?v=v2&amp;px=999" role="button" title="Captura de pantalla 2024-03-18 a las 16.13.38.png" alt="Captura de pantalla 2024-03-18 a las 16.13.38.png" /></span></SPAN></P><P>&nbsp;</P><P><STRONG>[SAP-samples/btp-generative-ai-hub-use-cases](<A href="https://github.com/SAP-samples/btp-generative-ai-hub-use-cases" target="_blank" rel="noopener nofollow noreferrer">https://github.com/SAP-samples/btp-generative-ai-hub-use-cases</A></STRONG><STRONG>)</STRONG></P><P><SPAN>This repository contains samples on how to build industry solutions leveraging generative AI capabilities on top of SAP BTP, integrated with SAP S/4HANA Cloud. It likely utilizes the SAP Generative AI Hub SDK.</SPAN></P><P>&nbsp;</P><H1 id="toc-hId-268302495"><SPAN>AI Core LLM</SPAN></H1><P><SPAN>The [AI Core LLM](<A href="https://pypi.org/project/sap-ai-core-llm/" target="_blank" rel="noopener nofollow noreferrer">https://pypi.org/project/sap-ai-core-llm/</A>) is a Python library that provides a unified interface for interacting with large language models (LLMs) deployed on SAP AI Core. 
This SDK abstracts away the complexities of working with different LLM providers and models, allowing developers to focus on building applications that leverage the power of language models.</SPAN></P><P>&nbsp;</P><TABLE><TBODY><TR><TD width="790.391px" height="30px"><STRONG>AI Core LLM Key Features</STRONG></TD></TR><TR><TD width="157.139px" height="57px"><SPAN>LLM Integration</SPAN></TD><TD width="633.252px" height="57px">&nbsp;Seamlessly integrate LLMs from various providers, such as OpenAI, Anthropic, and others, into your applications.</TD></TR><TR><TD width="157.139px" height="57px"><SPAN>Unified Interface</SPAN></TD><TD width="633.252px" height="57px">&nbsp;Interact with different LLMs using a consistent and standardized API, simplifying development and promoting code reusability.</TD></TR><TR><TD width="157.139px" height="57px"><SPAN>Prompt Management</SPAN></TD><TD width="633.252px" height="57px">&nbsp;Manage and version-control prompts, enabling efficient collaboration and reproducibility.</TD></TR><TR><TD width="157.139px" height="57px"><SPAN>Monitoring and Logging</SPAN></TD><TD width="633.252px" height="57px">&nbsp;Monitor and log LLM interactions for auditing, debugging, and performance analysis.</TD></TR></TBODY></TABLE><P><SPAN>The AI Core LLM SDK streamlines the development process by providing a unified interface for working with LLMs, enabling developers to leverage the power of language models without being tied to specific providers or models.</SPAN></P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MarioDeFelipe_0-1710770775666.png" style="width: 400px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/82457i375A8CA4D6EC1D17/image-size/medium?v=v2&amp;px=400" role="button" title="MarioDeFelipe_0-1710770775666.png" alt="MarioDeFelipe_0-1710770775666.png" /></span></P><H1 id="toc-hId-71788990"><SPAN>Conclusion</SPAN></H1><P><SPAN>SAP's suite of AI SDKs empowers organizations to unlock the full 
potential of artificial intelligence and seamlessly integrate AI capabilities into their applications and workflows. From managing AI scenarios and workflows to deploying machine learning models and leveraging the power of generative AI, these SDKs provide a comprehensive set of tools and utilities to accelerate AI adoption and drive innovation.</SPAN></P><H3 id="toc-hId-133440923"><SPAN>Citations</SPAN></H3><P><SPAN><A href="https://github.com/SAP-docs/sap-artificial-intelligence/tree/main" target="_blank" rel="noopener nofollow noreferrer">https://github.com/SAP-docs/sap-artificial-intelligence/tree/main</A></SPAN></P><P><SPAN><A href="https://pages.community.sap.com/topics/ai-core-artificial-intelligence" target="_blank" rel="noopener noreferrer">https://pages.community.sap.com/topics/ai-core-artificial-intelligence</A></SPAN></P><P><SPAN><A href="https://community.sap.com/t5/technology-blogs-by-members/unleashing-the-power-of-sap-ai-launchpad-amp-sap-ai-core-create-your-first/ba-p/13583759" target="_blank">https://community.sap.com/t5/technology-blogs-by-members/unleashing-the-power-of-sap-ai-launchpad-amp-sap-ai-core-create-your-first/ba-p/13583759</A></SPAN></P><P><SPAN><A href="https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/libraries-and-sdks" target="_blank" rel="noopener noreferrer">https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/libraries-and-sdks</A></SPAN></P><P><SPAN><A href="https://community.sap.com/t5/technology-blogs-by-sap/what-s-new-in-sap-ai-core-sap-ai-launchpad-in-q2-2022/ba-p/13526476" target="_blank">https://community.sap.com/t5/technology-blogs-by-sap/what-s-new-in-sap-ai-core-sap-ai-launchpad-in-q2-2022/ba-p/13526476</A></SPAN></P><P><SPAN><A href="https://community.sap.com/t5/technology-blogs-by-members/understanding-generative-ai-core-concepts/ba-p/13574898" 
target="_blank">https://community.sap.com/t5/technology-blogs-by-members/understanding-generative-ai-core-concepts/ba-p/13574898</A></SPAN></P><P><SPAN><A href="https://help.sap.com/docs/sap-ai-core" target="_blank" rel="noopener noreferrer">https://help.sap.com/docs/sap-ai-core</A></SPAN></P><P><SPAN><A href="https://www.linkedin.com/pulse/sap-ai-core-sridevi-aduri-ibhxc" target="_blank" rel="noopener nofollow noreferrer">https://www.linkedin.com/pulse/sap-ai-core-sridevi-aduri-ibhxc</A></SPAN></P><P><SPAN><A href="https://api.sap.com/package/SAPAICore/overview" target="_blank" rel="noopener noreferrer">https://api.sap.com/package/SAPAICore/overview</A></SPAN></P><P><SPAN><A href="https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core" target="_blank" rel="noopener nofollow noreferrer">https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core</A></SPAN></P><P><SPAN><A href="https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/about-ai-api" target="_blank" rel="noopener noreferrer">https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/about-ai-api</A></SPAN></P><P><SPAN><A href="https://community.sap.com/t5/artificial-intelligence-and-machine-learning-blogs/sap-ai-core-sap-ai-launchpad-a-visual-introduction-part-1/ba-p/13543494" target="_blank">https://community.sap.com/t5/artificial-intelligence-and-machine-learning-blogs/sap-ai-core-sap-ai-launchpad-a-visual-introduction-part-1/ba-p/13543494</A></SPAN></P><P><SPAN><A href="https://github.com/SAP-samples/ai-core-samples" target="_blank" rel="noopener nofollow noreferrer">https://github.com/SAP-samples/ai-core-samples</A></SPAN></P><P><SPAN><A href="https://learning.sap.com/learning-journey/learning-how-to-use-the-sap-ai-core-service-on-sap-business-technology-platform/configuring-sap-ai-core-and-sap-ai-launchpad_a51c5214-d9ab-4f22-bc81-b683d09697fc" target="_blank" rel="noopener 
noreferrer">https://learning.sap.com/learning-journey/learning-how-to-use-the-sap-ai-core-service-on-sap-business-technology-platform/configuring-sap-ai-core-and-sap-ai-launchpad_a51c5214-d9ab-4f22-bc81-b683d09697fc</A></SPAN></P><P><SPAN><A href="https://developers.sap.com/tutorials/ai-core-helloworld.html" target="_blank" rel="noopener noreferrer">https://developers.sap.com/tutorials/ai-core-helloworld.html</A></SPAN></P><P><SPAN><A href="https://github.com/SAP-samples/btp-generative-ai-hub-use-cases" target="_blank" rel="noopener nofollow noreferrer">https://github.com/SAP-samples/btp-generative-ai-hub-use-cases</A></SPAN></P><P><SPAN><A href="https://github.com/SAP-samples/aicore-multioutput-regression-restaurant-inspections/blob/main/mission/register-general-artifacts.md" target="_blank" rel="noopener nofollow noreferrer">https://github.com/SAP-samples/aicore-multioutput-regression-restaurant-inspections/blob/main/mission/register-general-artifacts.md</A></SPAN></P><P><SPAN><A href="https://github.com/SAP-samples/azure-openai-aicore-cap-api" target="_blank" rel="noopener nofollow noreferrer">https://github.com/SAP-samples/azure-openai-aicore-cap-api</A></SPAN></P><P><SPAN><A href="https://github.com/SAP-samples/kyma-runtime-extension-samples" target="_blank" rel="noopener nofollow
noreferrer">https://github.com/SAP-samples/kyma-runtime-extension-samples</A></SPAN></P><P><SPAN><A href="https://github.com/gregorwolf/bookshop-demo/blob/main/package.json" target="_blank" rel="noopener nofollow noreferrer">https://github.com/gregorwolf/bookshop-demo/blob/main/package.json</A></SPAN></P> 2024-03-19T14:42:07.467000+01:00 https://community.sap.com/t5/technology-blogs-by-sap/predict-personalize-prosper-btp-ai-capabilities-redefining-retail/ba-p/13645171 Predict, Personalize, Prosper: BTP AI Capabilities Redefining Retail Intelligence - Part 3/3 2024-03-22T14:26:42.198000+01:00 dominic_r https://community.sap.com/t5/user/viewprofilepage/user-id/177367 <P>Welcome to the third part of our blog series where we explore how <STRONG>SAP Business Technology Platform</STRONG> <STRONG>(BTP)</STRONG> is revolutionizing retail by enabling predictive insights, personalized interactions and ultimately prosperity in the enterprise. 
In this part, we provide an overview of the BTP architecture and its transformative capabilities, focusing on key components such as the <STRONG>SAP HANA Vector Engine</STRONG>, <STRONG>SAP GenAI Hub</STRONG> and <STRONG>Cloud Foundry</STRONG>.<BR /><BR /></P><TABLE border="1"><TBODY><TR><TD><A href="https://blogs.sap.com/2023/12/11/predict-personalize-prosper-crafting-tomorrows-retail-experience-with-rag-part-1-3/" target="_self" rel="noopener noreferrer">Part 1 - Predict, Personalize, Prosper: Crafting Tomorrow's Retail Experience with RAG</A></TD></TR><TR><TD><A href="https://blogs.sap.com/2023/12/11/predict-personalize-prosper-reacts-decision-making-in-retail-part-2-3/" target="_self" rel="noopener noreferrer">Part 2 - Predict, Personalize, Prosper: ReAct's Decision-Making in Retail</A></TD></TR><TR><TD><A href="https://community.sap.com/t5/technology-blogs-by-sap/predict-personalize-prosper-btp-ai-capabilities-redefining-retail/ba-p/13645171" target="_self">Part 3 - Predict, Personalize, Prosper: BTP AI Capabilities Redefining Retail Intelligence</A></TD></TR></TBODY></TABLE><H2 id="toc-hId-989841590"><STRONG>Architectural overview of the BTP</STRONG></H2><P>The SAP Business Technology Platform (BTP) is a comprehensive suite of tools and services designed to help companies achieve greater flexibility, innovation and intelligence. At its core, BTP seamlessly integrates various components to provide a unified platform for application development, deployment and management. Leveraging the power of <STRONG>SAP HANA Vector Engine</STRONG>, <STRONG>SAP GenAI Hub</STRONG> and <STRONG>Cloud Foundry</STRONG>, BTP provides retailers with a solid foundation for AI-driven insights and personalized experiences. 
Below is an architecture overview showing the individual components of an <STRONG>AI Store Assistant</STRONG> mentioned above.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="AI Store Assistant Architecture" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/84937iD36B80EFE491B23A/image-size/large?v=v2&amp;px=999" role="button" title="dominic_r_0-1711097041932.png" alt="AI Store Assistant Architecture" /><span class="lia-inline-image-caption" onclick="event.preventDefault();">AI Store Assistant Architecture</span></span></P><P>Let's take a closer look at the individual components and highlight their capabilities to enable AI innovation.</P><H2 id="toc-hId-793328085"><STRONG>SAP HANA Vector Engine</STRONG></H2><P>At the heart of BTP's AI capabilities is the SAP HANA Vector Engine, a high-performance database optimized for processing vector-based operations. In the context of retail, the <STRONG>SAP HANA Vector Engine</STRONG> serves as a powerhouse for storing embeddings and enables fast and efficient retrieval of relevant data. By using this engine, retailers can seamlessly access and analyze large volumes of data in a form understandable to <STRONG>Large Language Models (LLMs)</STRONG>, leading to predictive insights and personalized interactions.&nbsp;More information about the SAP HANA Vector Engine can <A href="https://community.sap.com/t5/technology-blogs-by-sap/sap-hana-cloud-s-vector-engine-announcement/ba-p/13577010" target="_self">be found here.</A></P><H2 id="toc-hId-596814580"><STRONG>SAP GenAI Hub</STRONG></H2><P>The SAP HANA Vector Engine is complemented by the SAP GenAI Hub, a comprehensive platform for the development and deployment of LLM-based AI models. With the GenAI Hub, retailers can access a variety of pre-trained language models and customize them to their specific requirements. 
With this feature, retailers can send prompts to various Large Language Models (such as&nbsp;<SPAN>GPT-4 from Azure OpenAI or the open-source Falcon-40b</SPAN>) and use their advanced natural language processing capabilities to generate contextualized responses. The GenAI Hub can be deployed as part of the SAP AI Core. Further information can <A href="https://community.sap.com/t5/technology-blogs-by-sap/generative-ai-hub-out-now/ba-p/13580462" target="_self">be found here</A>.</P><H2 id="toc-hId-400301075"><STRONG>Cloud Foundry</STRONG></H2><P>To bring these capabilities to life, BTP leverages the power of <STRONG>Cloud Foundry</STRONG>, an open-source <STRONG>platform-as-a-service (PaaS)</STRONG> environment. By deploying Python-based applications on Cloud Foundry, retailers can quickly develop and deploy AI-driven solutions. For example, Python frameworks such as <STRONG>Flask</STRONG> (web framework) and <STRONG>LangChain</STRONG> (AI framework) help to quickly deploy scalable AI applications. In addition, Cloud Foundry provides retailers with the flexibility and scalability they need to adapt to changing business requirements.</P><H2 id="toc-hId-203787570"><STRONG>Example: AI Store Assistant</STRONG></H2><P>To illustrate this architecture in practice, we look at the AI Store Assistant - a revolutionary tool and proof-of-concept for improving operational activities in a store. The <STRONG>AI Store Assistant</STRONG> utilizes BTP's AI capabilities and integrates seamlessly with <STRONG>SAP S/4HANA</STRONG> and <STRONG>Industry Cloud</STRONG> solutions (such as the Omnichannel Promotion Pricing Service, OPPS) to retrieve relevant data. Based on prompts generated by the SAP GenAI Hub, the AI Store Assistant can obtain information, make suggestions, and ultimately carry them out as follow-up actions. In the example below, a prediction can be made for certain product groups or actions can be created for expiring goods. 
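To make this flow concrete, here is a minimal, self-contained Python sketch of how such an app might assemble a prompt from business data and hand it to an LLM. All names here (build_prompt, FakeLLM, answer) are illustrative assumptions, not SAP APIs; in a real deployment, the FakeLLM stub would be replaced by a model served through the SAP GenAI Hub, typically behind a Flask route.

```python
# Hedged sketch of the AI Store Assistant prompt flow: a Cloud Foundry app
# receives a store associate's question, enriches it with business data
# (e.g. retrieved from S/4HANA or OPPS), and forwards it to an LLM.
# FakeLLM is a stand-in so the example runs without any credentials.

def build_prompt(question: str, business_context: dict) -> str:
    """Combine the user question with retrieved business data."""
    context_lines = "\n".join(f"- {k}: {v}" for k, v in business_context.items())
    return (
        "You are an AI Store Assistant. Answer using only this store data:\n"
        f"{context_lines}\n\nQuestion: {question}"
    )

class FakeLLM:
    """Stand-in for a GenAI Hub model deployment (e.g. GPT-4)."""
    def complete(self, prompt: str) -> str:
        return f"[model answer based on {prompt.count('- ')} context facts]"

def answer(question: str, business_context: dict, llm=None) -> str:
    llm = llm or FakeLLM()
    return llm.complete(build_prompt(question, business_context))

if __name__ == "__main__":
    ctx = {"expiring_goods": "12 units of yogurt (expiry tomorrow)",
           "promotion": "two-for-one on dairy this week"}
    print(answer("Which goods should be discounted today?", ctx))
```

In a real app, `answer` would also decide whether to trigger follow-up actions, such as creating a promotion for expiring goods.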
The AI Store Assistant also has the option of providing charts and tables in addition to text.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="AI Store Assistant predicting sales revenue" style="width: 428px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/85092i827417F9D1FA8667/image-size/large?v=v2&amp;px=999" role="button" title="ai_store_assistant_revenue_promotion-ezgif.com-video-to-gif-converter.gif" alt="AI Store Assistant predicting sales revenue" /><span class="lia-inline-image-caption" onclick="event.preventDefault();">AI Store Assistant predicting sales revenue</span></span></P><H2 id="toc-hId-7274065"><STRONG>Conclusion</STRONG></H2><P>In summary, SAP Business Technology Platform is redefining retail intelligence by leveraging advanced AI capabilities to predict, personalize and prosper. Through the seamless integration of SAP HANA Vector Engine, SAP GenAI Hub and Cloud Foundry, the BTP enables retailers to gain new insights and innovation. As we move into the future of retail, let's continue to harness the transformative power of BTP to create personalized and impactful experiences for customers worldwide. 
Join us on this journey of innovation and discovery as we shape the future of retail intelligence together.</P> 2024-03-22T14:26:42.198000+01:00 https://community.sap.com/t5/technology-blogs-by-sap/augmenting-sap-btp-use-cases-with-ai-foundation-a-deep-dive-into-the/ba-p/13645289 Augmenting SAP BTP Use Cases with AI Foundation: A Deep Dive into the Generative AI Hub 2024-03-25T15:08:29.967000+01:00 Trinidad https://community.sap.com/t5/user/viewprofilepage/user-id/3013 <DIV class=""><DIV class=""><SPAN> Authors: </SPAN><A href="https://community.sap.com/t5/user/viewprofilepage/user-id/7024" target="_blank">Alice Magnani</A><SPAN>, </SPAN><A href="https://community.sap.com/t5/user/viewprofilepage/user-id/42191" target="_blank">Cesare Calabria</A><SPAN>, </SPAN><A href="https://community.sap.com/t5/user/viewprofilepage/user-id/2758" target="_blank">Jacob Tan</A><SPAN>, </SPAN><A href="https://community.sap.com/t5/user/viewprofilepage/user-id/4176" target="_blank">Mostafa Sharaf</A><SPAN>, </SPAN><A href="https://community.sap.com/t5/user/viewprofilepage/user-id/3013" target="_blank">Trinidad Martinez</A><SPAN>.</SPAN><DIV class="">&nbsp;</DIV></DIV></DIV><P>Generative AI is accelerating in the industry, with a McKinsey report stating that Generative AI could add between $2.6 trillion and $4.4 trillion in incremental value to the global economy annually. 
Companies globally are increasingly using Generative AI, with about one-third of them using it regularly in at least one business function.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Trinidad_0-1711012073586.png" style="width: 560px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/84227i4F16D20BFC742012/image-dimensions/560x257?v=v2" width="560" height="257" role="button" title="Trinidad_0-1711012073586.png" alt="Trinidad_0-1711012073586.png" /></span></P><P>At SAP, we integrate Generative AI technology with industry-specific data and process knowledge to create innovative AI capabilities for our applications. To illustrate how Generative AI can benefit businesses, in this blog we'll walk you through a citizen reporting app for the public administrations industry.</P><P>Our use case involves a fictitious city, "Sagenai City", struggling with managing and tracking maintenance in public areas. The city wants to improve how it handles reported issues from citizens by analyzing social media posts and making informed decisions. 
A better perception of public administration by citizens is an anticipated outcome.</P><P>The goal of the citizen reporting app is to assist the Maintenance Manager by extracting insights from citizen social posts, classifying them, and creating maintenance notifications in the SAP S/4HANA Cloud tenant.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Trinidad_1-1711012125813.png" style="width: 571px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/84231iB6512F3CED64F838/image-dimensions/571x280?v=v2" width="571" height="280" role="button" title="Trinidad_1-1711012125813.png" alt="Trinidad_1-1711012125813.png" /></span></P><P>Let’s understand the Citizen Reporting App business flow:</P><OL><LI>A citizen reports an incident through a post on the city's community page on Reddit (we used Reddit because it offers a rich, free API, but any social media platform of your choice can be used).</LI><LI>The citizen reporting app receives the post and notifies the responsible persons from the public administration office.</LI><LI>The post is then processed and analyzed by a Large Language Model through SAP’s Generative AI Hub to extract key points and derive insights such as summarizing the issue, identifying the issue type and its urgency level, determining the incident's location, and analyzing the sentiment behind the post.</LI><LI>The Maintenance Manager reviews the incident details extracted by Generative AI and decides whether to approve or reject the incident. 
As a result, the manager saves time throughout the reporting process, thanks to the powerful text analysis by Generative AI.</LI></OL><P>To see this in action, please check the following demo video of the Citizen Reporting App, leveraging the power of SAP’s Generative AI.</P><P><A href="https://dam.sap.com/mac/u/a/ytKe6yD.htm?rc=10" target="_blank" rel="noopener noreferrer"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2024-03-22 at 09.49.55.png" style="width: 567px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/86002i65F17163DD0A8E5E/image-dimensions/567x319?v=v2" width="567" height="319" role="button" title="Screenshot 2024-03-22 at 09.49.55.png" alt="Screenshot 2024-03-22 at 09.49.55.png" /></span></A></P><P>Before we go into details on how we implemented this proof of concept, let’s review some key concepts.</P><H2 id="toc-hId-989842590">Overview of LLMs</H2><P>Large Language Models (LLMs) have been making waves in the artificial intelligence (AI) landscape. These foundation models, trained on broad datasets, have expanded their influence beyond natural language to domains such as videos, speech, tabular data, and protein sequences.</P><H3 id="toc-hId-922411804">Understanding Large Language Models</H3><P>Foundation models, a term coined by Stanford researchers, are self-supervised models trained on vast datasets. They can be easily adapted to a myriad of downstream tasks. Among these, Large Language Models (LLMs) focus on natural language in the form of text.</P><P><SPAN>In some cases, </SPAN>LLMs also exhibit impressive generative capabilities, making them suitable for creating novel content. 
They are often referred to as generative AI models.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Trinidad_2-1711012125816.png" style="width: 545px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/84230i3B9FDB77B43BE00D/image-dimensions/545x226?v=v2" width="545" height="226" role="button" title="Trinidad_2-1711012125816.png" alt="Trinidad_2-1711012125816.png" /></span></P><P>In this blog we will focus mainly on the applications of LLMs for generative AI.</P><P><SPAN>These models leverage transfer learning, a concept that has been in use for decades in the image domain, on a much larger scale. The change of scale was made possible by a set of key factors, such as improved computing capabilities, the advent of the transformer architecture, and the availability of massive datasets. As a consequence, these large models started to show new emerging capabilities. One of them is in-context learning, which allows the model to be adapted to a certain task with no need for additional training, simply by giving instructions in human language in the form of text. This can tremendously reduce the time to value required to implement artificial intelligence for a wide set of tasks. </SPAN></P><P><SPAN>Also, </SPAN>LLMs are typically trained on the task of predicting the next word in a text. This approach requires the model to gain substantial knowledge about the world, making them a form of "lossy compression" of the information available on the internet.</P><H3 id="toc-hId-725898299"><SPAN>Limitations of </SPAN>Large Language Models</H3><P><SPAN>Despite their remarkable capabilities, LLMs come with some limitations. First of all, </SPAN>training these models is time-consuming and expensive. Furthermore, keeping the models' knowledge correct and updated is a challenge. They are also stochastic, meaning minor variations in the prompt can lead to different outputs. 
There is no way to verify the correctness of the models' answers, and finally, they tend to prioritize form over content, so they can generate incorrect but plausible-sounding answers.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Trinidad_3-1711012125819.png" style="width: 400px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/84229iADDAE13577EABDD1/image-size/medium?v=v2&amp;px=400" role="button" title="Trinidad_3-1711012125819.png" alt="Trinidad_3-1711012125819.png" /></span></P><P><SPAN>With the current state of the art, LLMs can help solve many use cases requiring language understanding and language generation. LLMs struggle, by contrast, with logic and math, and with use cases that require analysis of numerical, structured data. </SPAN>In the following image we share some use case samples, classified into good LLM use cases and less suitable ones.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Trinidad_4-1711012125823.png" style="width: 509px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/84233i50D57A9D0BBA43AF/image-dimensions/509x252?v=v2" width="509" height="252" role="button" title="Trinidad_4-1711012125823.png" alt="Trinidad_4-1711012125823.png" /></span></P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Trinidad_5-1711012125823.png" style="width: 163px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/84232iF0106F76BAA808CF/image-dimensions/163x42?v=v2" width="163" height="42" role="button" title="Trinidad_5-1711012125823.png" alt="Trinidad_5-1711012125823.png" /></span></P><H3 id="toc-hId-529384794">Adapting LLMs to Desired Tasks</H3><P>There are several strategies to adapt LLMs to the desired task<SPAN> and work around some of their limitations. 
The most widely used are listed below:</SPAN></P><OL><LI>Prompt engineering.<BR />Prompt engineering is the process of designing and refining the prompts or instructions given to a language model to elicit desired responses.<BR />The quality of the prompt can greatly help in improving the model's response accuracy, controlling its biases, and generating creative and coherent responses.<BR /><BR /></LI><LI>Retrieval Augmented Generation (RAG).<BR />The idea of RAG is to provide the model with a knowledge base (a document or a corpus of documents) and ask it to answer our prompts based on these documents.<BR /><BR /></LI><LI>Fine-tuning.<BR />Fine-tuning refers to the process of taking a pre-trained model and adapting it to perform a specific task or on a specific dataset. In machine learning, models are typically trained on large datasets to learn general patterns and features. Fine-tuning allows you to take advantage of this pre-trained knowledge and apply it to a more specific task or dataset.<BR />The costs involved in fine-tuning can be considerable, as they might include computational resources, such as GPU or cloud computing costs, as well as the time and effort required to collect and preprocess data, train the model, and evaluate its performance.</LI></OL><H3 id="toc-hId-332871289">Choosing the Right LLM</H3><P>When choosing an LLM, it's important to consider factors like price, latency, and request limits. Newer and larger models are usually more capable and robust. 
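Of the three adaptation strategies above, RAG is the easiest to illustrate end to end. The sketch below is a deliberately toy version: it uses bag-of-words vectors in place of real embeddings (which would come from an embedding model, with the final answer generated by an LLM), but the retrieve-then-stuff-into-the-prompt shape is the same.

```python
# Toy RAG sketch: rank documents by similarity to the question,
# then put the best match into the prompt as grounding context.
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by similarity to the question and keep the top k."""
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def rag_prompt(question: str, documents: list[str]) -> str:
    """Stuff the retrieved context into the prompt sent to the LLM."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer based only on this context:\n{context}\n\nQuestion: {question}"

docs = ["Returns are accepted within 30 days with a receipt.",
        "The store opens at 9am on weekdays."]
print(rag_prompt("When does the store open?", docs))
```

In production, the retrieval step is exactly what a vector database such as the SAP HANA Cloud vector engine performs at scale.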
Consider looking at model benchmarks and leaderboards, such as <A href="https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard" target="_blank" rel="noopener nofollow noreferrer">Chatbot Arena</A>, to compare the performance of different models.</P><H3 id="toc-hId-136357784">Evaluating LLMs</H3><P><SPAN>Before putting an LLM-based application into production, it is always best practice to test the performance of the LLM for your specific task. To do that, you can come up with a sample of data points to test your model. Apply your LLM prompt in batch to the sample and then evaluate the accuracy of the LLM responses. Sometimes this operation can be challenging, perhaps because of a lack of data points to use for testing, or because it is difficult to rate how good the model output is when a language-related task is involved. LLMs can be useful to overcome these challenges. For instance, you can ask an LLM to generate test data points for you. Also, you can use an LLM as a judge, to rate the quality of an LLM-generated text against the criteria that are important to you, for instance creativity or correctness.</SPAN></P><P><STRONG>Libraries / LangChain<BR /></STRONG>There are libraries bridging the gap between traditional software development and Generative AI, making it easier to implement generative AI technology in an application. LangChain is one of the most well-known, but there are other options.</P><P><A href="https://www.langchain.com/" target="_blank" rel="noopener nofollow noreferrer">LangChain</A> is available for Python and JavaScript, and it offers many templates to build prompts effectively, parsers to parse the response output, chains to build sequences of LLM operations, agents, and model evaluation tools. 
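The batch-evaluation practice described above can be sketched in a few lines. The keyword-based sentiment_stub below stands in for a real LLM call so the example is self-contained; the prompt template, labels, and stub are illustrative assumptions, not a recommended setup.

```python
# Sketch of batch evaluation: apply one prompt template to every labeled
# sample, collect the model's answers, and compute response accuracy.

def sentiment_stub(prompt: str) -> str:
    """Stand-in LLM: a naive keyword 'classifier' playing the model's role."""
    negative_words = ("broken", "bad", "dirty")
    return "negative" if any(w in prompt.lower() for w in negative_words) else "positive"

def evaluate(model, test_set: list[tuple[str, str]]) -> float:
    """Apply the prompt in batch and evaluate the accuracy of the responses."""
    template = "Classify the sentiment of this citizen post as positive or negative: {post}"
    hits = sum(model(template.format(post=post)) == label for post, label in test_set)
    return hits / len(test_set)

samples = [
    ("The new park is lovely", "positive"),
    ("The bench is broken again", "negative"),
    ("The square is dirty", "negative"),
]
print(f"accuracy: {evaluate(samples and sentiment_stub, samples):.2f}")
```

Swapping `sentiment_stub` for a real LLM call (or an LLM-as-judge scoring function) turns this into the evaluation loop described above.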
LangChain can be used in combination with SAP tools for Generative AI; we will talk about this later in this blog.</P><H2 id="toc-hId--189238440">Overview of SAP’s Generative AI Hub</H2><P>Artificial Intelligence (AI) is revolutionizing every aspect of business, and at SAP, we are committed to harnessing this power to drive business transformation. Our focus is on Business AI, which means AI for every aspect of the business. We continuously strive to build the best technology and leverage the best tools on the market, including strategic partnerships with industry leaders.</P><H3 id="toc-hId--256669226">SAP Business AI Strategy</H3><P>Our <A href="https://www.sap.com/products/artificial-intelligence.html" target="_blank" rel="noopener noreferrer">SAP Business AI</A> strategy is embedded across various business domains including Finance, Supply Chain, HR, Procurement, Marketing and Commerce, Sales and Services, and IT. We offer a complete set of services that enable developers to create solutions with artificial intelligence. These services form the AI Foundation from which developers can pick whatever they need to build their AI- and generative AI-powered extensions and applications on SAP Business Technology Platform (SAP BTP).</P><H3 id="toc-hId--453182731">SAP BTP AI Portfolio</H3><P>Our AI portfolio includes <A href="https://pages.community.sap.com/topics/artificial-intelligence" target="_blank" rel="noopener noreferrer">SAP AI Services</A> to help you automate and optimize corporate processes by adding intelligence to your applications using AI models pretrained on business-relevant data. These services include document information extraction, document classification, personalized recommendations, data attribute recommendation, and translation of software texts and documents.</P><P>As we have seen, Generative AI, a form of artificial intelligence, can produce text, images, and varied content based on its training data. 
At SAP, we integrate <A href="https://www.sap.com/products/artificial-intelligence/generative-ai.html" target="_blank" rel="noopener noreferrer">Generative AI</A> with extensive industry-specific data and deep process knowledge to create innovative AI capabilities for the SAP applications you use every day. It’s built-in, relevant to your business, and responsible by design.</P><P>We also offer the <A href="https://blogs.sap.com/2023/12/21/generative-ai-hub-out-now/" target="_blank" rel="noopener noreferrer">Generative AI Hub</A>, which provides instant access to a broad range of large language models (LLMs) from different providers. The hub provides tooling for prompt engineering, experimentation, and other capabilities to accelerate the development of your SAP BTP applications infused with generative AI, in a secure and trusted way.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Trinidad_7-1711012125834.png" style="width: 584px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/84236iCDF88D2B9E047FB8/image-dimensions/584x308?v=v2" width="584" height="308" role="button" title="Trinidad_7-1711012125834.png" alt="Trinidad_7-1711012125834.png" /></span></P><H3 id="toc-hId--649696236">Generative AI Hub</H3><P>The <A href="https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/generative-ai-hub-in-sap-ai-core" target="_blank" rel="noopener noreferrer">Generative AI Hub</A> allows access to a large variety of foundation models, from hosted or open-source ones to proprietary and remote models, and in the future also models built or fine-tuned by SAP.</P><P>The Generative AI Hub is a new SAP BTP capability offered across two different services: AI Core and AI Launchpad.</P><H3 id="toc-hId--921441110">Deployment of LLMs in SAP AI Core</H3><P>To consume the LLMs through SAP’s Generative AI Hub, we first need to deploy them in SAP AI Core (<A 
href="https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/create-deployment-for-generative-ai-model-in-sap-ai-core" target="_blank" rel="noopener noreferrer">Create a Deployment for a Generative AI Model</A>). &nbsp;We leverage AI Core capabilities to provide a standard model inference platform on Kubernetes and serve the LLMs while taking care of aspects such as security and multi-tenancy support.</P><P>Check <A href="https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/models-and-scenarios-in-generative-ai-hub" target="_blank" rel="noopener noreferrer">Models and Scenarios in the&nbsp;Generative AI Hub</A> for the up-to-date list of supported models.</P><H3 id="toc-hId--1117954615">Generative AI Hub Capabilities in SAP AI Launchpad</H3><P><A href="https://help.sap.com/docs/ai-launchpad/sap-ai-launchpad/generative-ai-hub" target="_blank" rel="noopener noreferrer">SAP AI Launchpad</A> provides a graphical interface for the Generative AI Hub, and from there several tools are provided for developers. These include a playground where developers can perform prompt engineering and test prompts against different LLMs, and Prompt Management features that allow storing and versioning prompts.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Trinidad_8-1711012125839.png" style="width: 586px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/84239iDE065FF9153A838D/image-dimensions/586x274?v=v2" width="586" height="274" role="button" title="Trinidad_8-1711012125839.png" alt="Trinidad_8-1711012125839.png" /></span></P><H3 id="toc-hId--1314468120">Generative AI Hub SDK</H3><P>The <A href="https://pypi.org/project/generative-ai-hub-sdk/" target="_blank" rel="noopener nofollow noreferrer">Generative AI Hub SDK</A> is a collection of tools, in addition to the existing ai-core-sdk, to support LLM access. 
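As a first taste of the SDK, here is a hedged sketch of a chat call through its OpenAI-compatible proxy. The module path and call signature follow the SDK's public documentation at the time of writing and may differ between versions; a real call also requires configured AI Core credentials, so the sketch falls back to a placeholder when the SDK is missing or unconfigured.

```python
# Hedged sketch of calling an LLM through the Generative AI Hub SDK.
# build_messages is our own illustrative helper, not part of the SDK.

def build_messages(user_prompt: str,
                   system_prompt: str = "You are a helpful assistant.") -> list[dict]:
    """Shape a prompt into the OpenAI-style message list the proxy expects."""
    return [{"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}]

def ask_llm(user_prompt: str, model_name: str = "gpt-4") -> str:
    try:
        # Assumed module path from the SDK docs; needs an AI Core
        # service key (AICORE_* environment variables) to actually work.
        from gen_ai_hub.proxy.native.openai import chat
        resp = chat.completions.create(model_name=model_name,
                                       messages=build_messages(user_prompt))
        return resp.choices[0].message.content
    except Exception:
        # SDK not installed or not configured: keep the sketch runnable.
        return "(generative-ai-hub-sdk not installed or not configured)"

if __name__ == "__main__":
    print(ask_llm("Summarize SAP's generative AI hub in one sentence."))
```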
The SDK boosts efficiency when working with various LLMs by streamlining their deployment and the querying of available models.</P><P><SPAN>At the time of writing this blog, </SPAN>only the Python version is generally available (GA); stay tuned for a JavaScript version!</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Trinidad_9-1711012125845.png" style="width: 548px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/84238iBB6796C98DCAF07B/image-dimensions/548x330?v=v2" width="548" height="330" role="button" title="Trinidad_9-1711012125845.png" alt="Trinidad_9-1711012125845.png" /></span></P><H2 id="toc-hId--1217578618">Deep Dive into the Citizen Reporting App Use Case</H2><P>In the previous sections, we've explored the concept of Generative AI and how SAP's Generative AI Hub can help you implement your own solutions. Now, we're going to delve deeper into our specific use case: the Citizen Reporting app.</P><H3 id="toc-hId--1707495130">Business Scenario</H3><P>The Citizen Reporting app is designed to streamline the reporting of incidents via social media. The app leverages the capabilities of the Generative AI Hub to gain insights from citizens' social posts. This not only saves time for maintenance managers but also improves the public's perception of the administration.</P><P>The app allows maintenance managers to easily review and manage incidents, including AI insights and classification.
Once the relevant incidents have been reviewed, the app creates Maintenance Notifications in the SAP S/4HANA Cloud tenant.</P><H3 id="toc-hId--1904008635">Solution Architecture</H3><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Trinidad_11-1711012125853.png" style="width: 579px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/84242i770BCDCAB14524A6/image-dimensions/579x291?v=v2" width="579" height="291" role="button" title="Trinidad_11-1711012125853.png" alt="Trinidad_11-1711012125853.png" /></span></P><P>The Citizen Reporting app consists of a server-side application that can run in the SAP BTP Cloud Foundry or Kyma runtime, and a user interface developed with SAP Build Apps. The server-side application provides all the APIs required to retrieve data and trigger the actions requested by the maintenance manager.</P><P>The server-side application consumes APIs from different SAP BTP services, including the SAP AI Core service, which provides generative AI APIs from different foundation models in a trusted and controlled way.</P><P>The Destination service is used to securely connect to the SAP AI Core APIs as well as SAP S/4HANA Cloud APIs to automatically create Maintenance Requests after the maintenance manager’s approval.</P><H3 id="toc-hId--2100522140">Implementation Steps</H3><P>Let’s go a bit deeper into the implementation by having a look at the different implementation steps:</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Trinidad_12-1711012125857.png" style="width: 557px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/84241iE6251C0EB1D5DAEE/image-dimensions/557x262?v=v2" width="557" height="262" role="button" title="Trinidad_12-1711012125857.png" alt="Trinidad_12-1711012125857.png" /></span></P><H3 id="toc-hId-1997931651">SAP BTP Setup</H3><P>To work with the Generative AI Hub in SAP Business Technology Platform, the first requirement is to have an SAP
BTP subaccount with an instance of SAP AI Core. The SAP AI Core instance will provide us with a set of APIs to consume the Generative AI Hub from our implementation.</P><P>Once we have our instance of SAP AI Core, we can create an instance of SAP AI Launchpad to gain access to a user interface for configuring and managing our generative AI deployments.</P><P>As we saw in the architecture diagram, other SAP BTP services, like the Cloud Foundry environment, SAP HANA Cloud, the SAP Authorization and Trust Management Service (XSUAA), and the Destination service, are required for the application, and their service instances will need to be created.</P><P><A href="https://dam.sap.com/mac/u/a/iN8jUpD.htm?rc=10" target="_blank" rel="noopener noreferrer"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2024-03-25 at 14.50.16.png" style="width: 564px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/86008iBC4B8741C3EE1BFD/image-dimensions/564x318?v=v2" width="564" height="318" role="button" title="Screenshot 2024-03-25 at 14.50.16.png" alt="Screenshot 2024-03-25 at 14.50.16.png" /></span></A></P><H3 id="toc-hId-1801418146">Prompt Engineering</H3><P>Once your SAP BTP account is ready, we will build the prompt to be sent to the foundation model. This phase is crucial for getting accurate insights from SAP’s Generative AI Hub.
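To make this phase concrete, here is a minimal sketch of how the server-side application could assemble such a prompt for a citizen post. The categories, field names, and wording are illustrative assumptions, not the exact prompt used in the sample application.

```python
# Illustrative categories -- the real app's taxonomy may differ.
CATEGORIES = ["ROADS", "PUBLIC LIGHTING", "WASTE", "PARKS", "OTHER"]

def build_incident_prompt(post_text: str) -> str:
    """Assemble a prompt asking the model for structured incident insights."""
    return (
        "You are an assistant for a city maintenance department.\n"
        "Analyze the citizen's social media post below and reply with JSON only,\n"
        'using the keys "summary", "category", "priority", "sentiment", and "address".\n'
        f"Allowed categories: {', '.join(CATEGORIES)}. "
        "Priority must be HIGH, MEDIUM, or LOW.\n\n"
        f"Post: {post_text}"
    )

prompt = build_incident_prompt(
    "Huge pothole on Main Street near the bakery, my bike almost went down!"
)
print(prompt)
```

Constraining the model to a fixed JSON schema and a closed category list is what makes the response machine-processable by the rest of the application.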
The prompt engineering step is usually iterative and conducted with the help of tools like SAP AI Launchpad or a Jupyter notebook to test and fine-tune the prompt.</P><P>Let’s review how we built our prompt to get insights from the citizens’ social media posts.</P><P><A href="https://dam.sap.com/mac/app/p/video/asset/preview/W4QJ7Ce?ltr=a&amp;rc=10" target="_blank" rel="noopener noreferrer"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2024-03-25 at 14.52.08.png" style="width: 564px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/86010i1439732D049C5821/image-dimensions/564x340?v=v2" width="564" height="340" role="button" title="Screenshot 2024-03-25 at 14.52.08.png" alt="Screenshot 2024-03-25 at 14.52.08.png" /></span></A></P><H3 id="toc-hId-1604904641">Application Skeleton</H3><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Trinidad_13-1711012125862.png" style="width: 586px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/84245iEFF3FB52B6BF6858/image-dimensions/586x280?v=v2" width="586" height="280" role="button" title="Trinidad_13-1711012125862.png" alt="Trinidad_13-1711012125862.png" /></span></P><P>The application flow begins with citizens reporting incidents on social media. The server-side application reads these posts and sends their content, along with a prompt, to SAP's Generative AI Hub. The Generative AI Hub generates insights from each post, which are then presented to the maintenance manager.</P><P>The manager can choose to approve or reject each incident.
Approved incidents result in a maintenance notification being created in SAP S/4HANA, while all incidents, approved or rejected, are logged in the SAP HANA Cloud database.</P><P>We have implemented this application by building three different modules:</P><UL><LI>Social Media integration</LI><LI>Generative AI Hub integration</LI><LI>SAP S/4HANA Cloud integration</LI></UL><P>The full source code of our application is available in the <A href="https://github.com/SAP-samples" target="_blank" rel="noopener nofollow noreferrer">SAP-samples</A>/<A href="https://github.com/SAP-samples/btp-generative-ai-hub-use-cases" target="_blank" rel="noopener nofollow noreferrer"><STRONG>btp-generative-ai-hub-use-cases</STRONG></A> GitHub repository.</P><H3 id="toc-hId-1576574827">Consuming Generative AI Hub in JavaScript</H3><P>To implement the server side of our social media citizen reporting app in JavaScript, we have leveraged the SAP Cloud Application Programming Model.</P><P><A href="https://dam.sap.com/mac/app/p/video/asset/preview/FezoDGZ?ltr=a&amp;rc=10" target="_blank" rel="noopener noreferrer"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2024-03-25 at 14.53.10.png" style="width: 577px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/86011iC21D76FB233E54D6/image-dimensions/577x327?v=v2" width="577" height="327" role="button" title="Screenshot 2024-03-25 at 14.53.10.png" alt="Screenshot 2024-03-25 at 14.53.10.png" /></span></A></P><P>The JavaScript implementation shown in this demo directly calls the SAP AI Core Generative AI Hub APIs through an HTTP POST and <SPAN>shows</SPAN> how to consume the Generative AI Hub’s raw APIs.
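To give an impression of what such a raw HTTP call looks like, here is a Python sketch of an equivalent request. The deployment URL and API version are placeholder assumptions based on the Azure OpenAI-style interface the hub exposes for such models; check the SAP AI Core documentation for the exact paths of your own deployment.

```python
import json
import urllib.request

# Placeholder -- the deployment ID comes from your own SAP AI Core deployment.
DEPLOYMENT_URL = "https://api.ai.example.ondemand.com/v2/inference/deployments/d000000000000000"

def build_chat_request(deployment_url: str, prompt: str, token: str) -> urllib.request.Request:
    """Build an Azure OpenAI-style chat-completions request against a deployed LLM."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 500,
        "temperature": 0.0,  # deterministic output suits extraction tasks
    }
    return urllib.request.Request(
        url=f"{deployment_url}/chat/completions?api-version=2023-05-15",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "AI-Resource-Group": "default",
            "Content-Type": "application/json",
        },
    )

request = build_chat_request(DEPLOYMENT_URL, "Summarize: pothole reported on Main Street.", "<token>")
print(request.full_url)
# urllib.request.urlopen(request)  # execute only with real credentials
```

The same shape of call is what the JavaScript demo issues; only the HTTP client differs.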
Please check the <A href="https://discovery-center.cloud.sap/protected/index.html" target="_blank" rel="noopener nofollow noreferrer">Generative AI Reference Architecture</A>, including its <A href="https://github.com/SAP-samples/btp-cap-genai-rag/blob/main/README.md" target="_blank" rel="noopener nofollow noreferrer">GitHub</A> repository, for other implementation options in TypeScript leveraging LangChain.</P><H3 id="toc-hId-1380061322">Consuming Generative AI Hub in Python</H3><P>To consume the Generative AI Hub in Python, we leverage the <A href="https://pypi.org/project/generative-ai-hub-sdk/" target="_blank" rel="noopener nofollow noreferrer">Generative AI Hub SDK</A>, which simplifies the implementation of the calls to the Generative AI Hub APIs.</P><P><div class="video-embed-center video-embed"><iframe class="embedly-embed" src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2F1hk0E6Lsejs%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D1hk0E6Lsejs&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2F1hk0E6Lsejs%2Fhqdefault.jpg&amp;key=b0d40caa4f094c68be7c29880b16f56e&amp;type=text%2Fhtml&amp;schema=youtube" width="400" height="225" scrolling="no" title="Augment your SAP BTP Use Cases with AI Foundation: Generative AI Hub integration in Python" frameborder="0" allow="autoplay; fullscreen; encrypted-media; picture-in-picture;" allowfullscreen="true"></iframe></div></P><H3 id="toc-hId-1183547817">SAP S/4HANA Cloud Integration</H3><P>Once the maintenance manager approves an incident, a maintenance notification is created in the SAP S/4HANA Cloud tenant. We used the SAP Cloud SDK Maintenance Notification Service module to implement this functionality.
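To illustrate that mapping, the approval handler could translate the AI insights into a notification payload along these lines. The field names are simplified assumptions inspired by the S/4HANA maintenance notification API, not the actual schema used by the SAP Cloud SDK module.

```python
from dataclasses import dataclass

@dataclass
class IncidentInsights:
    """Insights extracted from a citizen post by the Generative AI Hub."""
    summary: str
    category: str
    priority: str
    address: str

def to_maintenance_notification(insights: IncidentInsights) -> dict:
    """Map approved incident insights to a (simplified) notification payload."""
    # Hypothetical field names and values -- consult the S/4HANA Cloud API for the real schema.
    priority_codes = {"HIGH": "1", "MEDIUM": "2", "LOW": "3"}
    return {
        "NotificationType": "MaintenanceRequest",
        "NotificationText": insights.summary[:40],  # short text fields are length-limited
        "MaintNotifLongTextForEdit": f"{insights.summary}\nLocation: {insights.address}",
        "NotificationPriority": priority_codes.get(insights.priority, "3"),
    }

payload = to_maintenance_notification(
    IncidentInsights("Pothole on Main Street", "ROADS", "HIGH", "Main Street 12")
)
print(payload["NotificationPriority"])  # prints "1"
```

Keeping this mapping in one pure function makes it easy to unit-test independently of the remote S/4HANA system.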
The user interface calls a function to create the maintenance notification, providing all the required information from the citizen post and the insights generated by the Generative AI Hub.</P><H3 id="toc-hId-987034312">User Interface Implementation</H3><P>The user interface was implemented using SAP Build Apps, a low-code, no-code tool that enabled us to implement our proof of concept quickly, using ready-to-consume UI components. The user interface connects to the backend via OData APIs to retrieve all the details required by the maintenance manager.</P><P>All the source code of this application is available in our <A href="https://github.com/SAP-samples/btp-generative-ai-hub-use-cases/tree/main/01-social-media-citizen-reporting" target="_blank" rel="noopener nofollow noreferrer">GitHub repository</A> for further exploration and includes both the JavaScript and the Python versions of the backend application, among other resources.</P><H2 id="toc-hId-1083923814">Explore further resources on generative AI at SAP</H2><P><STRONG>Source code of our prototype</STRONG></P><UL><LI><A href="https://github.com/SAP-samples" target="_blank" rel="noopener nofollow noreferrer">SAP-samples</A><SPAN>/</SPAN><A href="https://github.com/SAP-samples/btp-generative-ai-hub-use-cases" target="_blank" rel="noopener nofollow noreferrer"><STRONG>btp-generative-ai-hub-use-cases</STRONG></A></LI></UL><P><STRONG>Generative AI at SAP</STRONG></P><UL><LI><A href="https://www.sap.com/bulgaria/products/artificial-intelligence/generative-ai.html" target="_blank" rel="noopener noreferrer">OpenSAP Course | Generative AI with SAP</A></LI><LI><A href="https://pages.community.sap.com/topics/machine-learning" target="_blank" rel="noopener noreferrer">SAP Community | Artificial Intelligence and Machine Learning at SAP</A></LI><LI><A href="https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/models-and-scenarios-in-generative-ai-hub" target="_blank" rel="noopener noreferrer">Generative
AI Hub in SAP AI Core</A></LI><LI><A href="https://help.sap.com/docs/ai-launchpad/sap-ai-launchpad/generative-ai-hub" target="_blank" rel="noopener noreferrer">Generative AI Hub in SAP AI Launchpad </A></LI><LI><A href="https://roadmaps.sap.com/board?range=FIRST-LAST&amp;FT=GEN_AI" target="_blank" rel="noopener noreferrer">Generative AI Roadmap</A></LI><LI><A href="https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core?region=all&amp;tab=feature" target="_blank" rel="noopener nofollow noreferrer">Discovery Center | SAP AI Core and Generative AI Hub</A></LI><LI><A href="https://www.sap.com/germany/about/trust-center/agreements/on-premise/data-processing-agreements.html?sort=latest_desc" target="_blank" rel="noopener noreferrer">Data Processing Agreement</A></LI></UL><P><STRONG>Implementation Samples</STRONG></P><UL><LI><A href="https://github.com/SAP-samples/btp-generative-ai-hub-use-cases" target="_blank" rel="noopener nofollow noreferrer">Social Media Citizen Reporting App</A><U> – GitHub repository</U></LI><LI><A href="https://discovery-center.cloud.sap/protected/index.html" target="_blank" rel="noopener nofollow noreferrer">Generative AI Reference Architecture</A></LI><LI><A href="https://developers.sap.com/tutorial-navigator.html?search=Generative+AI+Hub" target="_blank" rel="noopener noreferrer">Generative AI Hub Tutorials</A></LI></UL><P><STRONG>More sessions in this series</STRONG></P><P>Please check the central blog <A href="https://partneredge.sap.com/en/library/education/products/btp/build/e_ep_use-cases_ai-foundation.html" target="_blank" rel="noopener noreferrer">Augment Your SAP BTP Use Cases With AI Foundation on SAP BTP</A> to get all the details on the sessions that are part of this series, register for the upcoming sessions, and earn a knowledge badge by attending them and passing the associated web assessment.</P><P><A href="https://partneredge.sap.com/en/library/education/products/btp/build/e_ep_use-cases_ai-foundation.html" target="_blank" rel="noopener 
noreferrer"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Trinidad_14-1711012125863.png" style="width: 672px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/84244i63E4D26CBDF9E22A/image-dimensions/672x126?v=v2" width="672" height="126" role="button" title="Trinidad_14-1711012125863.png" alt="Trinidad_14-1711012125863.png" /></span></A></P><H2 id="toc-hId-887410309">Conclusion and Key Takeaways</H2><P>Large Language Models have ushered in a new era in the field of artificial intelligence. While they come with their limitations, their ability to generate novel content and adapt to a wide range of tasks makes them a powerful tool. By understanding these models and how to adapt, choose, and evaluate them, we can leverage their capabilities in various SAP environments and beyond.</P><P>SAP’s Generative AI Hub offers a streamlined toolset for generative AI solution development, instant access to top-rated foundation models from multiple providers, and an elevated level of trust and control.<BR /><BR /></P><P>We hope this blog has inspired you to start designing and implementing your own use cases to unlock the power of generative AI in your solutions.&nbsp;Stay tuned for more blogs in this series as we continue to explore the exciting possibilities of generative AI with SAP!</P> 2024-03-25T15:08:29.967000+01:00 https://community.sap.com/t5/technology-blogs-by-sap/boosting-benchmarking-for-reliable-business-ai/ba-p/13650962 Boosting Benchmarking for Reliable Business AI 2024-03-27T20:20:07.257000+01:00 hao_zhi_leong https://community.sap.com/t5/user/viewprofilepage/user-id/1396220 <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="hao_zhi_leong_0-1711529144882.png" style="width: 999px;"><img 
src="https://community.sap.com/t5/image/serverpage/image-id/87186iA07D619194DE189A/image-size/large?v=v2&amp;px=999" role="button" title="hao_zhi_leong_0-1711529144882.png" alt="hao_zhi_leong_0-1711529144882.png" /></span></P><P>This blog post is part of a series that dives into various aspects of SAP’s approach to generative AI and its technical underpinnings.</P><P>In the <SPAN><A href="https://community.sap.com/t5/technology-blogs-by-sap/how-sap-s-generative-ai-hub-facilitates-embedded-trustworthy-and-reliable/ba-p/13596153" target="_blank">previous blog post</A></SPAN>&nbsp;of this series, we introduced the SAP generative AI hub in SAP AI Core. In this blog post, we will discuss a significant aspect: the benchmarking of AI models.</P><P>Read the <SPAN><A href="https://blogs.sap.com/2023/12/20/how-saps-generative-ai-architecture-redefines-business-applications/" target="_blank" rel="noopener noreferrer">first blog post</A></SPAN> of the series.</P><P><SPAN><STRONG>Introduction</STRONG></SPAN><SPAN><STRONG>&nbsp;</STRONG></SPAN></P><P>At SAP, we are building business AI: AI that solves our customers’ business problems and is an integrated part of SAP’s applications and platforms. By integrating this technology in our applications, our customers can achieve relevant outcomes with direct business value <SPAN><A href="https://www.nomos-elibrary.de/10.5771/0042-059X-2021-3-340/advancements-in-ml-enabled-intelligent-document-processing-and-how-to-overcome-adoption-challenges-in-enterprises-volume-75-2021-issue-3?hitid=00&amp;search-click&amp;page=1" target="_blank" rel="noopener nofollow noreferrer">(Tomasz Janasz et al. 2021)</A></SPAN>. While generative AI (GenAI) can create great value for companies, applying this technology to business problems comes with high requirements for correctness and reliability. 
&nbsp;</P><P>The number of generative AI use cases developed at SAP and the number of AI models which are available, both commercial and open source, are growing quickly. Therefore, we need a systematic approach for selecting the most suitable model, answering the question “What is the best model?” for each particular use case.</P><P>In our previous blog posts, we have discussed how SAP's <SPAN>generative AI hub in SAP AI Core lets developers access the best AI models from our cloud-technology partners, such as Microsoft Azure, leading AI startups, and open-source models in an SAP environment. </SPAN>In this third blog post of the series, we will describe why benchmarking is important for business AI, how it fits into SAP’s AI strategy, and what the current best practices are for benchmarking and testing AI models to maximize business value.</P><P><SPAN><STRONG>Why benchmarking matters</STRONG></SPAN></P><P><SPAN>To state the obvious, customers expect their software to work correctly and reliably. Conducting rigorous testing is, therefore, part of any professional software development project. It becomes even more important when the software is used to make important business decisions. Now, customers have similarly high expectations regarding the correctness of any AI feature. Effective testing helps in identifying biases, inaccuracies, and unexpected behaviors, and helps to ensure that the models align with ethical standards and user expectations. </SPAN></P><P><SPAN>AI testing often utilizes collections of test samples, typically across different tasks, which are organized in standardized benchmarks. Benchmarks allow comparing AI models and have been a measure of progress in AI. 
</SPAN>Initiatives like Stanford’s Holistic Evaluation of Language Models (<SPAN><A href="https://crfm.stanford.edu/helm/lite/latest/" target="_blank" rel="noopener nofollow noreferrer">HELM</A></SPAN>) or the <SPAN><A href="https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard" target="_blank" rel="noopener nofollow noreferrer">Hugging Face Open LLM Leaderboard</A></SPAN> compare AI models across multiple tasks and datasets.</P><P><SPAN>Compared to traditional software testing, testing AI models, particularly large language models (LLMs), is still a less mature discipline. In conventional software testing, there are established methodologies, like unit testing, and the behavior of the software is largely predictable and deterministic, as it was programmed by a human developer. AI models resemble 'black boxes' that were trained rather than explicitly programmed, and whose inner workings are hard to understand, even for AI experts. This opacity makes understanding the causes behind outputs difficult, complicating the debugging and improvement processes. Furthermore, the vast and varied nature of language models complicates the creation of comprehensive test sets, as it is nearly impossible to predict all the ways users might interact with the model, and the space of potential outputs is nearly unbounded. </SPAN></P><P><SPAN><STRONG>How are models currently being used?</STRONG></SPAN></P><P><SPAN>General-purpose large language models are flexible and have a wide range of applications. However, for specific use cases that require specialized knowledge, these general-purpose models might need fine-tuning or the use of retrieval-augmented generation (RAG) to improve their accuracy and reliability. The high cost of fine-tuning may necessitate the use of smaller models to ensure feasibility. 
Meanwhile, retrieval-augmented generation systems may require models with larger context window sizes.</SPAN></P><P><SPAN>For any large language model used in a user application, there's a balance to be struck between the responsiveness of the model and the quality of its output. For instance, models like GPT-4 from OpenAI are known for their excellent performance in terms of quality but are restricted by high latency and cost. Other models, such as Gemini Pro, Mixtral 8x7B, GPT-3.5 Turbo, and Llama 2 Chat 70B, provide a good mix of quality, speed, and cost-effectiveness. While these models may not perform as well as GPT-4, they still outperform many other models on the market.</SPAN></P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="hao_zhi_leong_1-1711529176026.png" style="width: 249px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/87187i313A16DAA85645F6/image-dimensions/249x184?v=v2" width="249" height="184" role="button" title="hao_zhi_leong_1-1711529176026.png" alt="hao_zhi_leong_1-1711529176026.png" /></span><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="hao_zhi_leong_2-1711529195031.png" style="width: 250px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/87188iE484A60A2EFA45A8/image-dimensions/250x184?v=v2" width="250" height="184" role="button" title="hao_zhi_leong_2-1711529195031.png" alt="hao_zhi_leong_2-1711529195031.png" /></span><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="hao_zhi_leong_3-1711529204990.png" style="width: 250px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/87189i183BD01BCC3BB8C3/image-dimensions/250x184?v=v2" width="250" height="184" role="button" title="hao_zhi_leong_3-1711529204990.png" alt="hao_zhi_leong_3-1711529204990.png" /></span></P><P style=" text-align: center; "><EM>Figure 1: Quality, Performance &amp; Price Analysis of LLMs (from <A 
href="https://artificialanalysis.ai/models?parallel_queries=single&amp;prompt_length=long" target="_blank" rel="noopener nofollow noreferrer">Artificial Analysis</A>)</EM></P><P><SPAN>Therefore, choosing the right large language model requires a careful evaluation of the trade-offs between quality, speed, and cost. This underscores the importance of comprehensive benchmarking when assessing different models.</SPAN></P><P><SPAN><STRONG>From “eyeballing” to supervised evaluation</STRONG></SPAN></P><P><SPAN>Existing AI benchmarks are focused on academic datasets typically created from public sources, like Wikipedia.</SPAN> They do not yet sufficiently cover enterprise use cases <SPAN>which require data from specific business domains</SPAN><A href="#_ftn1" target="_blank" rel="noopener nofollow noreferrer"><SPAN>[1]</SPAN></A><SPAN>. Therefore, additional testing is required. </SPAN><SPAN>Let us investigate the testing process for generative AI in more detail.</SPAN></P><P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="hao_zhi_leong_4-1711529332785.png" style="width: 796px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/87191iB8B0521E5A51E399/image-dimensions/796x160?v=v2" width="796" height="160" role="button" title="hao_zhi_leong_4-1711529332785.png" alt="hao_zhi_leong_4-1711529332785.png" /></span></P><P style=" text-align: center; "><EM>Figure 2: Three approaches to AI model evaluation</EM></P><P><SPAN>Testing and evaluating AI models follows an iterative approach. 
</SPAN><SPAN>When teams are ideating on new generative AI use cases, they typically start experimenting with various AI models in a “playground” environment, such as the generative AI hub in </SPAN><SPAN><A href="https://discovery-center.cloud.sap/serviceCatalog/sap-ai-launchpad?region=all" target="_blank" rel="noopener nofollow noreferrer">SAP AI Launchpad</A></SPAN><SPAN> (read more about it in </SPAN><SPAN><A href="https://community.sap.com/t5/technology-blogs-by-sap/how-sap-s-generative-ai-hub-facilitates-embedded-trustworthy-and-reliable/ba-p/13596153" target="_blank">blog post 2</A></SPAN><SPAN> of this series, or this </SPAN><SPAN><A href="https://blogs.sap.com/2024/01/09/ai-foundation-on-sap-btp-q4-2023-release-highlights/" target="_blank" rel="noopener noreferrer">blog post</A></SPAN><SPAN>). This environment allows for rapid prototyping via prompt engineering. SAP AI Launchpad also </SPAN><SPAN>allows sending prompts to different models for easy comparison. Compared to traditional AI models, generative AI models allow fast prototyping and a first feasibility check without the significant upfront investment of creating training and test data sets.</SPAN></P><P><SPAN>&nbsp;</SPAN></P><P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="hao_zhi_leong_7-1711529592680.png" style="width: 692px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/87194i281584E3A469B444/image-dimensions/692x419?v=v2" width="692" height="419" role="button" title="hao_zhi_leong_7-1711529592680.png" alt="hao_zhi_leong_7-1711529592680.png" /></span></P><P style=" text-align: center; "><EM>Figure 3: Prompt Editor in SAP AI Launchpad</EM></P><P><SPAN>However, simply “eyeballing” a few prompt-answer pairs is not sufficient beyond the prototyping stage. Developers need more robust and reproducible tests. &nbsp;These tests should be automated so that they can be conducted as a part of automated testing and release pipelines. 
Automated testing is also important for regression testing, when the application, the input data, or the underlying model has changed. Finally, these methods should cater to diverse use case scenarios and languages, aiding in informed technology choices and application enhancements.</SPAN></P><P><SPAN>The second stage in benchmarking models is using a large language model as a judge. </SPAN>This approach requires a test set of inputs but does not require ground-truth references, i.e., it does not require the expected output or label. Instead, it uses the ability of powerful large language models to score the output – using one AI model to score another AI model. Employing language models as a judge sidesteps the expensive creation of human references, making it an efficient method to scale the evaluation after the eyeballing step. While <SPAN><A href="https://arxiv.org/pdf/2303.16634.pdf" target="_blank" rel="noopener nofollow noreferrer">research shows high correlation between language models as a judge and human scores</A></SPAN> (Liu et al. 2023), a downside is the potential for bias or errors in the automatic judgments. Thus, the evaluation results should not be trusted blindly but should undergo additional testing, either with a supervised test set, which we describe next, or human evaluation.</P><P><SPAN>The third stage is supervised evaluation, consisting of a test set of model inputs and expected outputs, usually provided by human domain experts to ensure they are accurate and reliable. These test sets should cover a wide range of expected inputs, including edge cases and hard examples. The model output can be compared to the expected outputs automatically with an automated metric that measures the agreement between model outputs and the human reference, producing a measure of the model’s performance. 
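To illustrate what such an automated agreement metric can look like in practice, here is a small self-contained sketch with toy labels from a hypothetical defect/ok classification task:

```python
def evaluate(predictions: list[str], references: list[str], positive: str = "defect") -> dict:
    """Compare model outputs against human references with standard metrics."""
    assert len(predictions) == len(references)
    tp = sum(p == positive == r for p, r in zip(predictions, references))
    fp = sum(p == positive != r for p, r in zip(predictions, references))
    fn = sum(r == positive != p for p, r in zip(predictions, references))
    correct = sum(p == r for p, r in zip(predictions, references))
    return {
        "accuracy": correct / len(references),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Toy test set: model predictions vs. human-provided references.
preds = ["defect", "defect", "ok", "ok"]
refs = ["defect", "ok", "ok", "defect"]
print(evaluate(preds, refs))  # accuracy 0.5, precision 0.5, recall 0.5
```

Because such a function is deterministic and cheap, it can run inside automated testing and release pipelines on every change to the prompt, the data, or the model.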
Examples of metrics are statistical measures, like accuracy, precision, and recall, or AI-specific metrics, like the BLEU (Bilingual Evaluation Understudy) score, which is commonly used in machine translation evaluation to measure the similarity of generated translations to human reference translations.</SPAN></P><P><SPAN>Supervised evaluation is the most reliable way to benchmark AI models, but it is also the costliest. Therefore, it makes sense to start with the simpler and cheaper prompt-engineering approaches, then enhance the tests using language models as a judge, and finally move to supervised evaluation.</SPAN> An overview of these evaluation methods is depicted in Figure 4.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="hao_zhi_leong_8-1711529852386.png" style="width: 709px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/87205i628BEC6CB5F4C3BA/image-dimensions/709x406?v=v2" width="709" height="406" role="button" title="hao_zhi_leong_8-1711529852386.png" alt="hao_zhi_leong_8-1711529852386.png" /></span></P><P style=" text-align: center; "><EM>Figure 4: Ways of evaluating LLMs</EM></P><P><SPAN><STRONG>Evaluation in practice: document processing with LLMs</STRONG></SPAN><SPAN><STRONG>&nbsp;</STRONG></SPAN></P><P><SPAN>Let us illustrate this with an example AI use case at SAP. </SPAN><SPAN><A href="https://help.sap.com/docs/document-information-extraction?locale=en-US&amp;state=PRODUCTION&amp;version=SHIP" target="_blank" rel="noopener noreferrer">Document Information Extraction</A></SPAN><SPAN> is an SAP AI service which extracts information from business documents, such as invoices or purchase orders. 
The service was recently enhanced with </SPAN><SPAN><A href="https://community.sap.com/t5/technology-blogs-by-sap/introducing-document-information-extraction-premium-edition-unleashing-the/ba-p/13574704" target="_blank">large language model capabilities</A></SPAN><SPAN> which greatly enhance its flexibility to process more document types and support more languages. After an initial prototyping phase, which showed the initial feasibility of using large language models, the team conducted a supervised evaluation of different large language models on a subset of header fields. We report the extraction accuracy of each model prompted in a zero-shot fashion relative to a fully supervised baseline. Figure 5 shows the anonymized results. We can see that some of the models can achieve high extraction accuracy, rivaling supervised models. But the results also show that the extraction accuracy varies between models, illustrating the importance of benchmarking. The main business value of large language models in this scenario is their flexibility: allowing users to process any type of document in almost any language (although the exact language coverage differs between models). It is important to note that large language models are several orders of magnitude larger and more expensive than the supervised model. All these factors need to be considered when putting AI into production.</SPAN></P><P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="hao_zhi_leong_6-1711529489955.png" style="width: 706px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/87193i90C7031D815FABBF/image-dimensions/706x397?v=v2" width="706" height="397" role="button" title="hao_zhi_leong_6-1711529489955.png" alt="hao_zhi_leong_6-1711529489955.png" /></span></P><P style=" text-align: center; "><EM>Figure 5: Benchmarking results for information extraction from documents using zero-shot prompting on a subset of header fields. Results are relative to a supervised baseline. 
Model names are anonymized.</EM><SPAN>&nbsp;</SPAN></P><P><SPAN><STRONG>Benchmarking for Enterprise AI</STRONG></SPAN></P><P>To enhance the state of AI benchmarking for enterprise use cases, SAP is teaming up with its academic partners. In 2023, SAP joined the Stanford HAI Corporate Affiliate Program. Under this program, SAP is now starting a collaboration with the Stanford Center for Research on Foundation Models <SPAN>(CRFM) on the Holistic Evaluation of Language Models (HELM) project to study specific enterprise AI use cases. </SPAN></P><P><SPAN><STRONG>Conclusion</STRONG></SPAN></P><P>In this blog post, we have outlined the importance and the challenges of benchmarking for business AI. We have recommended three stages of evaluation as current best practices for benchmarking, going from quick eyeballing of examples to using language models as a judge and supervised evaluation. This helps to quickly establish a first feasibility check and build more robust and reliable tests in an iterative manner<SPAN>. The field of AI benchmarking is still evolving, and we can be sure that we will see more benchmarks and test methodologies covering, for example, AI safety, trustworthy AI, and industry verticals, in the future. </SPAN></P><P><SPAN><STRONG>References</STRONG></SPAN></P><P>Tomasz Janasz, Peter Mortensen, Christian Reisswig, Tobias Weller, Maximilian Herrmann, Ivona Crnoja, and Johannes Höhne. "Advancements in ML-enabled intelligent document processing and how to overcome adoption challenges in enterprises."&nbsp;<EM>Die Unternehmung</EM>&nbsp;75, no. 3 (2021): 340-358.</P><P>Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, and Chenguang Zhu.
"GPTEval: NLG evaluation using GPT-4 with better human alignment."&nbsp;<EM>arXiv preprint arXiv:2303.16634</EM>&nbsp;(2023).</P><P><A href="#_ftnref1" target="_blank" rel="noopener nofollow noreferrer"><SPAN>[1]</SPAN></A> <SPAN>But see the recently released </SPAN><SPAN><A href="https://huggingface.co/blog/leaderboards-on-the-hub-patronus" target="_blank" rel="noopener nofollow noreferrer">Enterprise Scenarios Leaderboard</A></SPAN><SPAN> by Patronus as a first benchmark for enterprise use cases</SPAN></P> 2024-03-27T20:20:07.257000+01:00 https://community.sap.com/t5/enterprise-resource-planning-blogs-by-sap/demystifying-transformers-and-embeddings-some-genai-concepts/ba-p/13652933 Demystifying Transformers and Embeddings: Some GenAI Concepts 2024-03-28T21:07:49.791000+01:00 Louenas https://community.sap.com/t5/user/viewprofilepage/user-id/3372 <P><STRONG>Introduction:</STRONG></P><P>Generative AI (GenAI) keeps generating so much news and so many innovative applications in many domains and industries. The potential impact of GenAI on businesses and society is becoming difficult to deny. Thought leaders have compared the impact of GenAI to the invention of the transistor, the internet, and even fire.</P><P>Being part of the Partner Ecosystem Success group at SAP, I have had the opportunity to talk to partners and customers, who confirm that Business AI is first on the agenda, as SAP's CEO Christian K. and Walter S., Head of AI at SAP, expressed in this <A href="https://www.linkedin.com/posts/christian-klein_exploring-the-future-of-generative-ai-in-activity-7135346329752387584-CQwl/" target="_blank" rel="noopener nofollow noreferrer">short video</A>.</P><P>SAP is at the forefront of this new enterprise revolution, and rightfully so: SAP applications constitute the digital platform of business around the world, where business data, the oil of generative Business AI, is generated and transformed.
As the saying goes, with great power comes great responsibility; therefore, SAP is strategically focusing on Relevant, Reliable, and Responsible AI.</P><P>On the GenAI technology front, the following are the latest announcements from SAP:</P><P><A href="https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/generative-ai-hub-in-sap-ai-core" target="_blank" rel="noopener noreferrer">Generative AI hub</A>: This is the place where SAP and its ecosystem will integrate LLMs and AI into new business processes in a cost-efficient manner.</P><P><A href="https://pages.community.sap.com/topics/joule" target="_blank" rel="noopener noreferrer">Joule</A>: The Generative Business AI Assistant from SAP. This will become the main UX for SAP Applications like S/4HANA Cloud and SuccessFactors HCM, providing information retrieval, navigational assistance, transactional assistance, and analytics assistance with ad-hoc and natural language querying capabilities.</P><P><A href="https://community.sap.com/t5/technology-blogs-by-sap/announcing-general-availability-of-sap-build-code-speed-up-development-with/ba-p/13646073" target="_blank">SAP Build Code</A>: This is where the developer community will build new user experiences with great development speed thanks to embedded productivity tools powered by GenAI.</P><P><A href="https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-vector-engine-guide/introduction" target="_blank" rel="noopener noreferrer">SAP HANA Database Vector Engine</A>: This is the in-memory database from SAP, integrating under a single API set all the required engines for transactional workloads, Analytics/ML, and now GenAI thanks to the added support for a Vector engine. The Vector engine is the technical component used for managing Embeddings and using them to ground GenAI business use cases.
Grounding is a term used in the context of RAG (Retrieval-Augmented Generation) to condition the LLM with the relevant context and therefore limit hallucinations.</P><P><STRONG>Understanding Transformers and Embeddings:</STRONG></P><P>Talking to partners and customers about GenAI is fun, but for more productive brainstorming, it's sometimes important to level set on the technical side with some GenAI basics, such as Embeddings and Transformers.</P><P>There are many great blogs and videos explaining these topics in detail using GPT-2, which makes it a great reference model, as most LLMs use a very similar architecture. The resources that helped me most, and seemed to be the best for people with a technical background, are from Niels Rogge from HuggingFace:</P><UL><LI><A href="https://www.youtube.com/watch?v=C6ZszXYPDDw" target="_blank" rel="noopener nofollow noreferrer">Transformers demystified: how do ChatGPT, GPT-4, LLaMa work?</A></LI><LI><A href="https://www.youtube.com/watch?v=IGu7ivuy1Ag" target="_blank" rel="noopener nofollow noreferrer">How a Transformer works at inference vs training time</A></LI></UL><P>Transformers like GPT act on tokens. Tokens are sub-words, and the set of all of an LLM's sub-words is its vocabulary. GPT-2 has a vocabulary size of 50,257 tokens, with each token represented by an Embedding of 768 dimensions. In comparison, GPT-4 seems to have 100,256 tokens and a default embedding vector of 1,536 dimensions (a larger embedding size of 3,072 dimensions is available with the text-embedding-3-large model).</P><P>The tokens are produced with a tokenization ML process that tries to maximize the compression of language into an adequate vocabulary size. In GPT, the Embeddings are encoded into vectors during a training phase through the Transformer (see the second video from Niels).</P><P>You can experiment with tokenization on OpenAI <A href="https://www.openai.com/tokenizer" target="_blank" rel="noopener nofollow noreferrer">here</A>.
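</P><P>To build intuition for how such a vocabulary is learned, here is a toy sketch of a single byte-pair-encoding (BPE) merge step, the core idea behind GPT-style tokenizers. This is an illustration only, not OpenAI's actual implementation:</P>

```python
from collections import Counter

# Toy corpus, already split into characters (real BPE starts from bytes).
corpus = [list("hello"), list("help"), list("held")]

# Count how often each adjacent symbol pair occurs across the corpus.
pair_counts = Counter()
for word in corpus:
    for a, b in zip(word, word[1:]):
        pair_counts[(a, b)] += 1

# The most frequent pair becomes a new token in the vocabulary...
best_pair = max(pair_counts, key=pair_counts.get)  # ('h', 'e'), found in all three words

# ...and every occurrence of it is merged into a single symbol.
merged_corpus = []
for word in corpus:
    merged, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == best_pair:
            merged.append(word[i] + word[i + 1])
            i += 2
        else:
            merged.append(word[i])
            i += 1
    merged_corpus.append(merged)

print(merged_corpus)  # [['he', 'l', 'l', 'o'], ['he', 'l', 'p'], ['he', 'l', 'd']]
```

<P>Repeating this merge step thousands of times yields a vocabulary that compresses frequent character sequences into single tokens.</P><P>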
For example, "hello world, my name is" will produce these 6 tokens: ['hello', ' world', ',', ' my', ' name', ' is']. In French, when translated literally to "bonjour monde, mon nom est," it produces 7 tokens. Less common words will produce more tokens; therefore (fun fact) non-English languages will tend to produce more tokens, i.e., a higher dollar cost for the same number of words in a given prompt.</P><P>One forward pass through the Transformer architecture of GPT-2, i.e., predicting one next token, can be summarized in Python as follows. The code is self-explanatory with detailed inline comments.</P><P>&nbsp;</P><P>&nbsp;</P><pre class="lia-code-sample language-python"><code>from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the GPT-2 tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Prompt
prompt = "hello, my name is"

# Tokenize the input text and return the token IDs
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Pass the tokenized input through the GPT-2 model to get the logits
# (unnormalized scores) representing the likelihood of each possible next token.
logits = model(input_ids).logits

# Slice the output to get the logits only for the last position in the sequence,
# as we're predicting the very next token.
next_token_logits = logits[:, -1, :]

# Find the index (ID) of the token with the highest score.
next_token_id = torch.argmax(next_token_logits, dim=-1)

# Decode, i.e. convert the numerical token ID back into its corresponding text.
next_token = tokenizer.decode(next_token_id[0])

# Print the predicted next token
print(f"Next token is {next_token}")</code></pre><P>&nbsp;</P><P>&nbsp;</P><P>When running this program in a <A href="https://colab.research.google.com/drive/1OVFfTrHtCdmcBSJRA_XO-5kAYurQMS4I?usp=sharing" target="_blank" rel="noopener nofollow noreferrer">Jupyter notebook</A> or Visual Studio Code, the next token will be "John." As a developer, you can add a loop that keeps appending next_token to the end of the prompt and performs another forward pass to see what the following tokens will be.</P><P>The videos above from Niels are very informative but could be a bit technical for some community members. To make it easier to explain, I spent a lot of time thinking about how to demonstrate the concepts without resorting to the phrase “Some magic happens here…”. Here is what I came up with, hoping it will help you understand and explain broader GenAI topics to your partners and customers.</P><P>Imagine a 3-dimensional space representing some concepts like Fruits, Programming languages, and Software companies.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Louenas_0-1711655043097.png" style="width: 400px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/88229i690C1A63DC5717C1/image-size/medium?v=v2&amp;px=400" role="button" title="Louenas_0-1711655043097.png" alt="Louenas_0-1711655043097.png" /></span></P><P>The following matrix with the name "embedding" is the algebraic representation of such a universe. The labels set is the vocabulary, and the X, Y, and Z coordinates are the corresponding embedding vectors.
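</P><P>Before walking through the code, it helps to see how closeness between two embedding vectors is measured: with a dot product, or its length-normalized variant, cosine similarity. A small numpy sketch using made-up vectors from this universe:</P>

```python
import numpy as np

# Made-up 3-D embeddings from the fruit/language/company universe above.
apple  = np.array([0.85, 0.10, 0.0])   # a fruit
banana = np.array([0.91, 0.05, 0.0])   # another fruit
java   = np.array([0.10, 0.0, 0.85])   # a programming language

def cosine(u, v):
    # Cosine similarity: dot product normalized by the vector lengths.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Two fruits point in almost the same direction (similarity close to 1)...
print(round(cosine(apple, banana), 3))
# ...while a fruit and a programming language are nearly orthogonal (close to 0).
print(round(cosine(apple, java), 3))
```

<P>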
Please continue through the Python code, as it's purposefully well-documented to be self-explanatory.</P><P>&nbsp;</P><P>&nbsp;</P><pre class="lia-code-sample language-python"><code>import numpy as np
import random

# Pre-defined (random) word embeddings
embeddings = {
    # Fruits I love
    "Apple": np.array([0.85, 0.10, 0]),
    "Banana": np.array([0.91, 0.05, 0]),
    "Orange": np.array([0.90, 0.06, 0]),
    "Figs": np.array([0.88, 0.10, 0]),
    # Programming Languages I know
    "Java": np.array([0.10, 0, 0.85]),
    "JavaScript": np.array([0.05, 0, 0.91]),
    "Python": np.array([0.06, 0, 0.90]),
    "ABAP": np.array([0.10, 0, 0.88]),
    # Enterprise software companies :)
    "SAP": np.array([0, 0.85, 0.10]),
    "Oracle": np.array([0, 0.91, 0.05]),
    "Microsoft": np.array([0, 0.90, 0.06]),
    "Salesforce": np.array([0, 0.88, 0.10]),
}

# Prompt
prompt = "I love ABAP and"

# Split the prompt into individual words
prompt_words = prompt.split()

# Function to get the embedding for a word.
# Return a vector of ones (identity vector) if the word is not in the vocabulary.
# Since 'I', 'love', 'and' are not in our vocabulary, they will get an embedding of [1, 1, 1].
def get_embedding(word):
    return embeddings.get(word, np.ones(3))

# Calculate the average embedding of the prompt
prompt_embedding = np.mean([get_embedding(word) for word in prompt_words], axis=0)

# Get our vocabulary
vocabulary = list(embeddings.keys())

# Remove words in the prompt from the vocabulary so that they won't be used to complete the prompt
filtered_vocabulary = [word for word in vocabulary if word not in prompt_words]

# Calculate dot products between the prompt embedding and the remaining vocabulary embeddings
dot_products = {word: np.dot(prompt_embedding, get_embedding(word)) for word in filtered_vocabulary}

# Add some randomness to the chosen next word:
# sort the words by their dot products and select the top two
top_two_words = sorted(dot_products, key=dot_products.get, reverse=True)[:2]

# Randomly select one of the two highest-scoring words.
predicted_word = random.choice(top_two_words)

print(f"Predicted next word: {predicted_word}")</code></pre><P>&nbsp;</P><P>&nbsp;</P><P>When running this code in a <A href="https://colab.research.google.com/drive/1uneQbX6Mhx4dUcVP5lf2mN8JtfBLTNSF?usp=sharing" target="_blank" rel="noopener nofollow noreferrer">Jupyter notebook</A> or Visual Studio Code, you will be able to complete the prompts with the next programming language that I love <span class="lia-unicode-emoji" title=":slightly_smiling_face:">🙂</span> If you type "I like Apple and" it will complete with another fruit. Go ahead and experiment with your own categories, and I hope this simulation can help you appreciate the scale of a real LLM like GPT-4 Turbo that acts on a vocabulary of 100,256 tokens and embedding vectors of 3,072 dimensions!</P><P>This simple code can predict the next word, classify words, etc. from the given vocabulary.</P><P><STRONG>Conclusion:</STRONG></P><P>To be clear, this is a very simplistic illustration of how high school algebra, i.e., matrix multiplication, is at the centre of today's technology revolution. Among other things, it explains why Transformer neural networks rely on hardware like GPUs/TPUs/LPUs, with their great capabilities in matrix operations and parallelization of such workloads. It's important not to underestimate other concepts in the Transformer architecture, such as tokenization, encoding, embedding, attention, and the neural network architecture of the Transformers, as well as Reinforcement Learning, Alignment, etc.</P><P>Understanding the technical concepts behind GenAI, such as Transformers and Embeddings, is crucial for productive brainstorming and discussions on GenAI applications in various domains and industries.
This blog post aimed to provide a primer on these concepts, encouraging readers to explore the provided resources and code examples to deepen their understanding.</P> 2024-03-28T21:07:49.791000+01:00 https://community.sap.com/t5/artificial-intelligence-and-machine-learning-blogs/bring-open-source-llms-into-sap-ai-core/ba-p/13655167 Bring Open-Source LLMs into SAP AI Core 2024-04-04T14:55:04.919000+02:00 YatseaLi https://community.sap.com/t5/user/viewprofilepage/user-id/7022 <P>The open-source community surrounding Large Language Models (LLMs) is evolving rapidly, with new models (model architectures), backends (inference engines), libraries, and tooling constantly emerging. These developments enable the running of LLMs locally or in self-hosted environments. SAP AI Core is a service in the SAP Business Technology Platform that is designed to handle the execution and operations of your AI assets in a standardized, scalable, and hyperscaler-agnostic way. In this blog post series, we will explore various options for running popular open-source Large Language Models like Mistral, Mixtral, LLaVA, Gemma, Llama 2, etc., in <a href="https://community.sap.com/t5/c-khhcw49343/SAP+AI+Core/pd-p/73554900100800003641" class="lia-product-mention" data-product="405-1">SAP AI Core</a>, which complements SAP Generative AI Hub with self-hosted open-source LLMs.&nbsp;We'll utilize widely adopted open-source LLM tools or backends such as <A href="https://ollama.com/" target="_blank" rel="noopener nofollow noreferrer">Ollama</A>, <A href="https://localai.io/" target="_blank" rel="noopener nofollow noreferrer">LocalAI</A>, <A href="https://github.com/ggerganov/llama.cpp/tree/master/examples/server" target="_blank" rel="noopener nofollow noreferrer">llama.cpp</A> and <A href="https://docs.vllm.ai" target="_blank" rel="noopener nofollow noreferrer">vLLM</A>, among others, through the BYOM (Bring Your Own Model) approach.
The <A href="https://github.com/SAP-samples/btp-generative-ai-hub-use-cases/tree/main/10-byom-oss-llm-ai-core" target="_blank" rel="noopener nofollow noreferrer">source code</A> of the AI Core sample app <STRONG>byom-oss-llm-ai-core</STRONG> used in this blog post series is delivered as an SAP example under the <A href="https://github.com/SAP-samples/btp-generative-ai-hub-use-cases/blob/main/LICENSE" target="_blank" rel="noopener nofollow noreferrer">Apache 2.0 license</A>. It is not part of an SAP product.<BR /><BR /></P><TABLE border="0" width="713" cellspacing="0" cellpadding="0"><TBODY><TR><TD width="500">Blog post series of Bring Open-Source LLMs into SAP AI Core</TD></TR><TR><TD width="500"><A href="https://blogs.sap.com/?p=1788496" target="_blank" rel="noopener noreferrer">Part 1 – Bring Open-Source LLMs into SAP AI Core: Overview (this blog post)</A><BR />Part 2 – Bring Open-Source LLMs into SAP AI Core with Ollama (To be published)<BR />Part 3 – Bring Open-Source LLMs into SAP AI Core with LocalAI (To be published)<BR />Part 4 – Bring Open-Source LLMs into SAP AI Core with llama.cpp (To be published)<BR />Part 5 – Bring Open-Source LLMs into SAP AI Core with vLLM (To be published)<BR /><BR />Note: You can try out the sample AI Core app <STRONG>byom-oss-llm-ai-core </STRONG>by following its manual <A href="https://github.com/SAP-samples/btp-generative-ai-hub-use-cases/tree/main/10-byom-oss-llm-ai-core" target="_self" rel="nofollow noopener noreferrer">here</A>, which covers all the technical details. The following blog posts will just wrap up the technical details of each option.</TD></TR></TBODY></TABLE><P>We'll begin the series with an overview of bringing open-source LLMs into SAP AI Core.</P><H2 id="toc-hId-990765086">Overview</H2><P>The SAP Business AI strategy aims to infuse SAP business applications with intelligence by integrating cutting-edge AI technologies.
An essential component of this strategy involves fostering partnerships and investments within the AI ecosystem. In addition to collaboration with proprietary AI vendors as illustrated in this diagram, commercially viable open-source foundation models are also supported through the BYOM (Bring Your Own Model) approach, in which SAP AI Core is&nbsp;leveraged as a state-of-the-art and enterprise-ready AI Infrastructure as a Service to deploy and run inference on open-source LLMs. Notably, an open-source foundation model (falcon-40b) is already available in the <A href="https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/generative-ai-hub-in-sap-ai-core" target="_self" rel="noopener noreferrer">SAP Generative AI Hub</A> within SAP AI Core.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="SAP Business AI Strategy" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/89484i126B5A838C1465FF/image-size/large?v=v2&amp;px=999" role="button" title="sap-business-ai-strategy.jpg" alt="SAP Business AI Strategy" /><span class="lia-inline-image-caption" onclick="event.preventDefault();">SAP Business AI Strategy</span></span></P><P>On the flip side, within the open-source community of LLMs, an increasing number of high-performing models such as Mistral, Mixtral, Gemma, LLaVa, LLaMA-2, etc., are emerging. These models have the potential to serve as drop-in replacements for GPT-3.5 in certain use cases.&nbsp;</P><H3 id="toc-hId-923334300">Sample#1: Customer Message Processing in <A href="https://github.com/SAP-samples/btp-industry-use-cases/tree/main/04-customer-interaction-gpt4" target="_self" rel="nofollow noopener noreferrer">Intelligent Customer Interaction</A> use case</H3><P data-unlink="true">In our Intelligent Customer Interaction sample use case, we have used GPT-3.5 to process the customer message.
Details of this implementation can be found in our blog post <A href="https://blogs.sap.com/2023/06/20/integrating-gpt-chat-api-with-sap-cap/" target="_blank" rel="noopener noreferrer">Integrate with GPT Chat API for Advanced Text Processing</A>. Now, this task can potentially be accomplished using Mistral-7b with comparable results.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Customer Message Processing with Mistral-7b" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/89670iD58EBC558884C939/image-size/large?v=v2&amp;px=999" role="button" title="customer-message-processing-mistral.jpg" alt="Customer Message Processing with Mistral-7b" /><span class="lia-inline-image-caption" onclick="event.preventDefault();">Customer Message Processing with Mistral-7b</span></span></P><H3 id="toc-hId-726820795">Sample#2: Public Facilities Issue Spotter for <A href="https://github.com/SAP-samples/btp-generative-ai-hub-use-cases/tree/main/01-social-media-citizen-reporting" target="_blank" rel="noopener nofollow noreferrer">Citizen Reporting</A> use case</H3><P>We can enhance the citizen reporting use case with the <A href="https://github.com/haotian-liu/LLaVA" target="_self" rel="nofollow noopener noreferrer">LLaVa</A> model's vision capability by automatically analyzing images and identifying public facility issues. This simplifies the user experience in the citizen reporting use case, allowing users to take and upload photos to report an issue through the app effortlessly.
Certainly, the same use case can be achieved with GPT-4V when it becomes available in SAP Generative AI Hub.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Public Facilities Issue Spotter powered by LLaVa Vision" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/89688i3A2F278E696CF694/image-size/large?v=v2&amp;px=999" role="button" title="llava-vision-sample.jpg" alt="Public Facilities Issue Spotter powered by LLaVa Vision" /><span class="lia-inline-image-caption" onclick="event.preventDefault();">Public Facilities Issue Spotter powered by LLaVa Vision</span></span></P><P>These are just a few examples of how open-source LLMs can help in a business context. Next, let's evaluate the rationale for running open-source LLMs with SAP AI Core.</P><DIV class=""><H2 id="toc-hId-401224571">Why run open-source LLMs with SAP AI Core?</H2></DIV><UL><LI><P>Data Protection &amp; Privacy: Ensure compliance with stringent data protection and privacy regulations by leveraging SAP AI Core's robust security measures, particularly in contexts where strict data protection and privacy policies and regulations prohibit the transmission of data to external proprietary LLMs.
A self-hosted LLM with SAP AI Core becomes a complement for SAP Generative AI Hub.</P></LI><LI><P>Security: Safeguard your data and applications with state-of-the-art security protocols and infrastructure provided by SAP AI Core.</P></LI><LI><P>Cost-Effectiveness: Benefit from cost-effective solutions without compromising on quality or security; SAP AI Core offers efficient resource (GPU) allocation and dynamic scalability.</P></LI><LI><P>Flexibility of Choice: Enjoy the flexibility to choose from a variety of open-source LLMs and LLM backends, tailored to your specific business needs and requirements.</P></LI><LI><P>Making Open-Source LLMs Enterprise Ready: Elevate open-source LLMs to enterprise-level standards by integrating them seamlessly with SAP AI Core, ensuring reliability, scalability, and performance.</P></LI></UL><H2 id="toc-hId-204711066">How to bring open-source LLMs into SAP AI Core?</H2><P>Now, let's go straight to the most interesting part: bringing open-source LLMs into SAP AI Core through the BYOM approach. Let's start with an explanation of some basic concepts around LLMs.</P><H3 id="toc-hId-137280280">Basic concepts:</H3><UL><LI>LLM vs LMM<UL><LI>Large Language Model (LLM): A type of machine learning model designed to process and generate human-like text, e.g. GPT-3.5, Mistral, etc.</LI><LI>Large Multimodal Model (LMM): A more versatile type of machine learning model that can handle multiple modalities of data, such as text, images, audio, video, etc. These models are designed to process and generate content across different media formats, e.g. GPT-4V, LLaVa, etc.</LI></UL></LI><LI>Model Architecture, Model Family and Models<UL><LI>Model architecture: Refers to the structural design and organization of the neural network used for natural language processing tasks.</LI><LI>Model Family: Refers to a group of related LLMs that share similar architectural principles, training methodologies, and objectives.
For instance, LLaMa-2 is a model family.</LI><LI>Models refer to individual instances or versions within a model family, each with its own specific architecture, parameters, and trained weights, e.g. the models LLaMa2-7b,&nbsp;LLaMa2-13b, and&nbsp;LLaMa2-70b belong to the model family LLaMa-2.</LI></UL></LI><LI>Non-Quantization vs Quantization: A trade-off between efficiency and accuracy for model inference<BR /><UL><LI>Non-quantization, also referred to as full-precision or floating-point representation, involves storing the model parameters using full-precision floating-point numbers (e.g., 32-bit or 16-bit floating-point), which maintains the model's highest possible accuracy but requires much higher computational resources.</LI><LI>Quantization is the process of reducing the precision of the parameters/weights in AI models from floating-point numbers (typically 32-bit or 16-bit floating-point) to a lower bit precision, such as 8-bit integers or even lower, which can reduce the memory footprint and computational requirements of the model at the cost of some accuracy loss. In this sample, we test models with quantization schemes like Q5_K_M, AWQ, etc., which keep a good balance between model accuracy and inference efficiency.</LI></UL></LI><LI>LLM Backend or Inference Engine: Refers to the computational framework or infrastructure responsible for executing inference tasks with the trained LLM model.
For example, <A href="https://huggingface.co/docs/transformers/en/index" target="_self" rel="nofollow noopener noreferrer">transformers</A>, <A href="https://github.com/ggerganov/llama.cpp/tree/master/examples/server" target="_blank" rel="noopener nofollow noreferrer">llama.cpp</A> and <A href="https://docs.vllm.ai" target="_blank" rel="noopener nofollow noreferrer">vLLM</A> support multiple models with different model architectures for inference.</LI><LI>Model Hub: Refers to a centralized repository or platform where pre-trained LLM models are stored, managed, and made accessible to users for model inference tasks. The Model Hub serves as a resource hub for developers, researchers, and practitioners interested in leveraging pre-trained LLM models for their applications. <A href="https://huggingface.co" target="_blank" rel="noopener nofollow noreferrer">Hugging Face</A> is a great example of a Model Hub.</LI></UL><H3 id="toc-hId--59233225">Solution Architecture:</H3><P>In this sample of bringing open-source LLMs into SAP AI Core, I have implemented four options (<A href="https://ollama.com/" target="_blank" rel="noopener nofollow noreferrer">Ollama</A>, <A href="https://localai.io/" target="_blank" rel="noopener nofollow noreferrer">LocalAI</A>, <A href="https://github.com/ggerganov/llama.cpp/tree/master/examples/server" target="_blank" rel="noopener nofollow noreferrer">llama.cpp</A> and <A href="https://docs.vllm.ai" target="_blank" rel="noopener nofollow noreferrer">vLLM</A>) which allow you to bring your preferred LLMs into SAP AI Core through configuration, with OpenAI-compatible APIs. Some options (Ollama and LocalAI) even enable you to download and switch models efficiently and dynamically at run-time.</P><P>In principle, there are three essential parts for bringing an open-source LLM/LMM into SAP AI Core:</P><UL><LI>Commercially viable Open-Source or Open-Weight Models: e.g.
Mistral, Mixtral, LLaVa, etc.</LI><LI>Publicly accessible Model Hub: For instance, the Ollama Model Library tailored for Ollama, or Hugging Face as a general-purpose model repository.</LI><LI>Custom LLM Inference Server in SAP AI Core: You can bring your own code to implement an inference server. For example, the blog post <A href="https://community.sap.com/t5/technology-blogs-by-sap/running-language-models-deploy-llama2-7b-on-ai-core/ba-p/13557544" target="_self">Running Language Models – Deploy Llama2 7b on AI Core&nbsp;</A>by <A href="https://community.sap.com/t5/user/viewprofilepage/user-id/4997" target="_self">felixbartler</A> shows a great sample of implementing a custom inference server with <A href="https://huggingface.co/docs/transformers/en/index" target="_self" rel="nofollow noopener noreferrer">Hugging Face transformers</A>. Alternatively, there are open-source and ready-to-use LLM inference servers which can be reused in SAP AI Core, like <A href="https://ollama.com/" target="_blank" rel="noopener nofollow noreferrer">Ollama</A>, <A href="https://localai.io/" target="_blank" rel="noopener nofollow noreferrer">LocalAI</A>, <A href="https://github.com/ggerganov/llama.cpp/tree/master/examples/server" target="_blank" rel="noopener nofollow noreferrer">llama.cpp</A> and <A href="https://docs.vllm.ai" target="_blank" rel="noopener nofollow noreferrer">vLLM</A>, with minimal custom code: just a custom Dockerfile and a configurable serving template adapted for SAP AI Core.
<STRONG>Ollama is recommended for its simplicity and efficiency</STRONG>.</LI></UL><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="01-Solution Architecture" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/89478i67B81F96FA2FF556/image-size/large?v=v2&amp;px=999" role="button" title="solution-architecture.jpg" alt="01-Solution Architecture" /><span class="lia-inline-image-caption" onclick="event.preventDefault();">01-Solution Architecture</span></span></P><DIV class=""><H3 id="toc-hId--255746730">Why leverage Ollama, LocalAI, llama.cpp and vLLM as an LLM inference server within SAP AI Core?</H3></DIV><P>Ollama, LocalAI, llama.cpp and vLLM offer a comprehensive solution for running Large Language Models (LLMs) locally or in self-hosted environments. Their full-stack capabilities include:</P><UL><LI>Model Management: Dynamically pull or download LLMs from a model repository through an API at run-time (exclusive to Ollama and LocalAI; vLLM provides seamless integration with Hugging Face models)</LI><LI>Running LLMs efficiently with GPU acceleration in SAP AI Core using open-source backends such as llama.cpp, vLLM, transformers, ExLlama, etc.</LI><LI>Serving with built-in OpenAI-compatible chat completion and embedding APIs.
Exactly the same APIs as in SAP Generative AI Hub.</LI><LI>Easy deployment and setup with minimal custom code in SAP AI Core: just a custom Dockerfile and a configurable serving template adapted for SAP AI Core.</LI><LI>Commercial viability: They are all under the MIT or Apache 2.0 license.</LI></UL><P>Let's have a closer look at each option across the aspects listed above.</P><DIV class=""><H3 id="toc-hId--452260235">Ollama vs LocalAI in the context of SAP AI Core</H3></DIV><TABLE><TBODY><TR><TD width="20%">&nbsp;</TD><TD width="40%"><STRONG><STRONG><A href="https://ollama.com/" target="_blank" rel="noopener nofollow noreferrer"><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="ollama" style="width: 46px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/89795i19F454822DC27E80/image-dimensions/46x65?v=v2" width="46" height="65" role="button" title="ollama" alt="ollama" /></span></A></STRONG></STRONG><STRONG><A href="https://ollama.com/" target="_blank" rel="noopener nofollow noreferrer">Ollama</A></STRONG>(<EM>Recommended</EM>)</TD><TD width="40%"><STRONG><A href="https://localai.io/" target="_blank" rel="noopener nofollow noreferrer"><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="257361158-0966aa2a-166e-4f99-a3e5-6c915fc997dd" style="width: 76px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/89802i74AAA2D3ADACAD65/image-dimensions/76x62?v=v2" width="76" height="62" role="button" title="257361158-0966aa2a-166e-4f99-a3e5-6c915fc997dd" alt="257361158-0966aa2a-166e-4f99-a3e5-6c915fc997dd" /></span>LocalAI</A></STRONG></TD></TR><TR><TD><STRONG>Description</STRONG></TD><TD>"Ollama: Get up and running with Llama 2, Mistral, Gemma, and other large language models."</TD><TD>"LocalAI is the free, Open Source OpenAI alternative.
LocalAI act as a drop-in replacement REST API that’s compatible with OpenAI API specifications for local inferencing..."</TD></TR><TR><TD><STRONG>Recommendation<BR /></STRONG></TD><TD>Recommended if you only need to run inference on LLMs/LMMs in SAP AI Core. See its AI capabilities below for details.</TD><TD>Recommended if speech recognition, speech generation and image generation are required in addition to LLMs/LMMs.</TD></TR><TR><TD><STRONG>AI Capabilities</STRONG></TD><TD>-Text generation<BR />-Vision<BR />-Text Embedding</TD><TD>-Text generation<BR />-Vision<BR />-Text Embedding<BR />-Speech to Text<BR />-Text to Speech<BR />-Image Generation</TD></TR><TR><TD><STRONG>Installation &amp; Setup</STRONG></TD><TD>Easy installation and setup</TD><TD>Ensure you use the corresponding docker image, or build with the right variables, for GPU acceleration.</TD></TR><TR><TD><STRONG>GPU Acceleration</STRONG></TD><TD>Automatically detects and applies GPU acceleration</TD><TD>Supported. Requires additional per-model configuration alongside a GPU-enabled docker image.</TD></TR><TR><TD><STRONG>Model Management</STRONG></TD><TD>Built-in model management through CLI commands or APIs</TD><TD><A href="https://localai.io/models/" target="_blank" rel="noopener nofollow noreferrer">Experimental model gallery</A><BR />May require additional configuration for GPU acceleration per model</TD></TR><TR><TD><STRONG>Supported Backends</STRONG></TD><TD>llama.cpp</TD><TD>Multi-backend and backend-agnostic: llama.cpp is the default backend, with extra backends such as vLLM, rwkv, huggingface transformers, bert, whisper.cpp, etc. 
Please check its <A href="https://localai.io/model-compatibility/" target="_blank" rel="noopener nofollow noreferrer">model compatibility table</A> for details</TD></TR><TR><TD><STRONG>Supported Models</STRONG></TD><TD><A href="https://ollama.com/library" target="_blank" rel="noopener nofollow noreferrer">Built-in Model Library</A></TD><TD><A href="https://localai.io/models/" target="_blank" rel="noopener nofollow noreferrer">Experimental Model Gallery</A></TD></TR><TR><TD><STRONG>Model Switching</STRONG></TD><TD>Seamless model switching with automatic memory management</TD><TD>Supported</TD></TR><TR><TD><STRONG>APIs</STRONG></TD><TD>-<A href="https://github.com/ollama/ollama/blob/main/docs/api.md#pull-a-model" target="_self" rel="nofollow noopener noreferrer">Model Management API</A><BR />-<A href="https://github.com/ollama/ollama/blob/main/docs/openai.md" target="_self" rel="nofollow noopener noreferrer">OpenAI-compatible chat/completion API</A><BR />-<A href="https://github.com/ollama/ollama/blob/main/docs/api.md#generate-embeddings" target="_self" rel="nofollow noopener noreferrer">Embedding API</A></TD><TD>-<A href="https://localai.io/models/" target="_self" rel="nofollow noopener noreferrer">Model Management API</A><BR />-<A href="https://localai.io/features/text-generation/#chat-completions" target="_self" rel="nofollow noopener noreferrer">OpenAI-compatible chat/completion API</A><BR />-<A href="https://localai.io/features/embeddings/" target="_self" rel="nofollow noopener noreferrer">Embedding API</A></TD></TR><TR><TD><STRONG>Model Customization</STRONG></TD><TD>Supported</TD><TD>Supported</TD></TR><TR><TD><STRONG>License</STRONG></TD><TD>MIT</TD><TD>MIT</TD></TR></TBODY></TABLE><P>&nbsp;</P><DIV class=""><H3 id="toc-hId--648773740">llama.cpp vs vLLM in the context of SAP AI Core</H3><TABLE><TBODY><TR><TD width="20%">&nbsp;</TD><TD width="40%"><STRONG><A href="https://github.com/ggerganov/llama.cpp/tree/master/examples/server" target="_blank" rel="noopener 
nofollow noreferrer"><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="230134379-7181e485-c521-4d23-a0d6-f7b3b61ba524" style="width: 200px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/89806i97496A46E8B43857/image-size/small?v=v2&amp;px=200" role="button" title="230134379-7181e485-c521-4d23-a0d6-f7b3b61ba524" alt="230134379-7181e485-c521-4d23-a0d6-f7b3b61ba524" /></span>llama.cpp</A></STRONG></TD><TD width="40%"><STRONG><A href="https://docs.vllm.ai" target="_blank" rel="noopener nofollow noreferrer"><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="vllm-logo-text-light" style="width: 200px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/89809iC5FFE0775E389D52/image-size/small?v=v2&amp;px=200" role="button" title="vllm-logo-text-light" alt="vllm-logo-text-light" /></span>vLLM</A></STRONG></TD></TR><TR><TD width="20%"><STRONG>Description</STRONG></TD><TD width="40%">"The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud."</TD><TD width="40%">"A high-throughput and memory-efficient inference and serving engine for LLMs"</TD></TR><TR><TD><STRONG>AI Capabilities</STRONG></TD><TD>-Text generation<BR />-Vision<BR />-Text Embedding</TD><TD>-Text generation<BR />-Vision<BR />-Text Embedding</TD></TR><TR><TD><STRONG>Deployment &amp; Setup</STRONG></TD><TD>Easy deployment via docker. Many <A href="https://github.com/ggerganov/llama.cpp/tree/master/examples/server" target="_blank" rel="noopener nofollow noreferrer">arguments</A> to explore on starting llama.cpp server</TD><TD>Easy deployment via docker. 
Many <A href="https://docs.vllm.ai/en/latest/models/engine_args.html" target="_blank" rel="noopener nofollow noreferrer">engine arguments</A> to explore when starting vllm.entrypoints.openai.api_server</TD></TR><TR><TD><STRONG>GPU Acceleration</STRONG></TD><TD>Supported</TD><TD>Supported</TD></TR><TR><TD><STRONG>Model Management</STRONG></TD><TD>Not supported; needs an external tool (wget, etc.) to download models from Hugging Face</TD><TD>Seamless integration with popular HuggingFace models</TD></TR><TR><TD><STRONG>Supported Quantization</STRONG></TD><TD>1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization</TD><TD>GPTQ, AWQ, SqueezeLLM, FP8 KV Cache</TD></TR><TR><TD><STRONG>Supported Models</STRONG></TD><TD><A href="https://github.com/ggerganov/llama.cpp" target="_blank" rel="noopener nofollow noreferrer">https://github.com/ggerganov/llama.cpp</A> &gt; Supported models</TD><TD><A href="https://docs.vllm.ai/en/latest/models/supported_models.html" target="_blank" rel="noopener nofollow noreferrer">Supported Models</A></TD></TR><TR><TD><STRONG>Model Switching</STRONG></TD><TD>Not supported. One deployment for one model.</TD><TD>Not supported. One deployment for one model.</TD></TR><TR><TD><STRONG>APIs</STRONG></TD><TD>-OpenAI-compatible chat/completion API<BR />-Embedding API</TD><TD>-OpenAI-compatible chat/completion API<BR />-Embedding API</TD></TR><TR><TD><STRONG>License</STRONG></TD><TD>MIT</TD><TD>Apache-2.0 license</TD></TR></TBODY></TABLE><P data-unlink="true">In this sample, I chose the llama.cpp and vLLM servers for their popularity. 
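Whichever of these servers you deploy, client code targeting the OpenAI-compatible endpoint can stay the same. A minimal sketch follows; the base URL and token are placeholders, and the AI-Resource-Group header value is an assumption to adapt to your own SAP AI Core setup:

```python
# Minimal sketch: the same OpenAI-compatible request body works against
# Ollama, LocalAI, the llama.cpp server, or vLLM deployed in SAP AI Core.
# The base URL, resource group, and token below are placeholders.
import json
import urllib.request

def build_chat_request(base_url, model, prompt, token=None):
    """Build a POST request for the OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    headers = {"Content-Type": "application/json"}
    if token:
        # SAP AI Core deployments additionally expect a bearer token and an
        # AI-Resource-Group header ("default" is assumed here).
        headers["Authorization"] = f"Bearer {token}"
        headers["AI-Resource-Group"] = "default"
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )

# Usage against a live deployment (placeholder URL and token):
# req = build_chat_request("https://<deployment-url>", "mistral:7b-instruct-q5_K_M",
#                          "Write a haiku about SAP AI Core.", token="<access-token>")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because all four servers expose this shape of API, switching the inference backend should not require client-side changes beyond the base URL and model name.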
Similarly, the same approach is applicable to <A href="https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#openai-compatible-web-server" target="_self" rel="nofollow noopener noreferrer">llama-cpp-python's OpenAI-compatible Web Server</A>, <A href="https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#launch-a-controller" target="_self" rel="nofollow noopener noreferrer">LLaVA's API Server</A>, etc.</P><H3 id="toc-hId--920518614">Test Summary</H3><P>I have conducted some tests on three open-source models (mistral-7b, mixtral, llava-7b) within SAP AI Core, covering:</P><UL><LI>chain of thought,</LI><LI>general Q&amp;A,</LI><LI>writing a haiku (about running Ollama in AI Core, etc.),</LI><LI>customer message processing in intelligent customer interaction (JSON mode),</LI><LI>a public facility issue spotter in citizen reporting (LLaVA vision with JSON mode).</LI></UL><P>Overall, the&nbsp;quality of text generation is <STRONG>good enough</STRONG> in my test cases.</P><P>Let's have a look at the generation speed, measured in <STRONG>tokens generated per second</STRONG>, in my test cases. Since Ollama and LocalAI use llama.cpp as their backend, all three share almost the same results.</P><P><STRONG>Important note</STRONG>: Due to the limited number of test cases, the results may not be accurate or objective, and other use cases may yield different results. 
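Such measurements are easy to reproduce: time a completion and divide the number of generated tokens by the elapsed seconds. A minimal sketch; the OpenAI-style usage field and Ollama's eval_count/eval_duration metrics are assumptions to verify against your server's actual response:

```python
# Hedged sketch for benchmarking generation speed on your own prompts:
# tokens/sec = generated tokens / elapsed seconds. Ollama's native
# /api/generate response reports eval_count and eval_duration (nanoseconds),
# which yield the same figure server-side.
import time

def tokens_per_second(completion_tokens, elapsed_s):
    """Client-side rate from an OpenAI-style usage.completion_tokens count."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return completion_tokens / elapsed_s

def ollama_tokens_per_second(eval_count, eval_duration_ns):
    """Server-side rate from Ollama's eval metrics (duration in nanoseconds)."""
    return eval_count / (eval_duration_ns / 1e9)

# Wrap a timed call (placeholder client code):
# start = time.perf_counter()
# resp = ...  # OpenAI-compatible chat completion response as a dict
# print(tokens_per_second(resp["usage"]["completion_tokens"],
#                         time.perf_counter() - start))

print(tokens_per_second(350, 10.0))          # 350 tokens in 10 s -> 35.0
print(ollama_tokens_per_second(300, 7.5e9))  # 300 tokens in 7.5 s -> 40.0
```

Client-side timing includes network and prompt-processing overhead, so it will read slightly lower than the server-reported eval rate.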
To choose an open-source LLM and its backend, you should conduct tests based on your own use cases.</P><H4 id="toc-hId--1410435126">&nbsp;</H4><H4 id="toc-hId--1606948631">Mistral-7b on Ollama, LocalAI and llama.cpp</H4><TABLE border="1" width="100%"><TBODY><TR><TD width="50%"><STRONG>Model</STRONG>:&nbsp;<A href="https://ollama.com/library/mistral:7b-instruct-q5_K_M" target="_self" rel="nofollow noopener noreferrer">mistral:7b-instruct-q5_K_M</A> (5.1 G)</TD><TD width="50%"><STRONG>Resource Plan of SAP AI Core</STRONG>: infer.s</TD></TR><TR><TD width="50%">Ollama, LocalAI and llama.cpp</TD><TD width="50%">Avg token generation per second: 30~40.<BR />Response time under 10 s in all test cases</TD></TR></TBODY></TABLE><P>Here is a sample log screenshot from SAP AI Core showing tokens per second for mistral on Ollama, LocalAI and llama.cpp:&nbsp;</P><P><SPAN><STRONG><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ollama_mistral_token_per_sec_2.jpg" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/91260i75D356A5E94EDC9F/image-size/large?v=v2&amp;px=999" role="button" title="ollama_mistral_token_per_sec_2.jpg" alt="ollama_mistral_token_per_sec_2.jpg" /></span><BR /></STRONG></SPAN></P><H4 id="toc-hId--1803462136">&nbsp;</H4><H4 id="toc-hId--1999975641">Mistral-7b on vLLM</H4><TABLE border="1" width="100%"><TBODY><TR><TD width="49.81036662452592%"><STRONG>Model</STRONG>:&nbsp;<A href="https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-AWQ" target="_self" rel="nofollow noopener noreferrer">Mistral-7B-Instruct-v0.2-AWQ</A> (4.15 G)</TD><TD width="50.06321112515803%"><STRONG>Resource Plan of SAP AI Core</STRONG>: infer.s</TD></TR><TR><TD width="49.81036662452592%"><P>vLLM with the following engine arguments</P><DIV><DIV>--dtype half</DIV><DIV>--gpu-memory-utilization 0.95</DIV><DIV>--enforce-eager</DIV><DIV>--max-model-len 2048</DIV><DIV>--max-num-batched-tokens 2048</DIV><DIV>--max-num-seqs 
2048</DIV><DIV>--quantization awq</DIV></DIV></TD><TD width="50.06321112515803%"><P>Avg token generation per second: 40~50<BR />Response time under 8 s in all test cases</P><P>Encountered "CUDA out of memory" errors with max-model-len etc. set to 4096 or beyond.</P></TD></TR></TBODY></TABLE><P>Here is a sample log screenshot from SAP AI Core showing tokens per second for mistral on vLLM. The calculation of average generation throughput in vLLM (screenshot below) seems to differ from the llama.cpp backend. In fact, vLLM's response time was slightly faster than Ollama's for the same task in my test cases. The faster response time may also be attributable to its quantization method.</P><H4 id="toc-hId-2098478150"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="vllm_token_per_sec_2.jpg" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/91261iDE7E07274970FAF6/image-size/large?v=v2&amp;px=999" role="button" title="vllm_token_per_sec_2.jpg" alt="vllm_token_per_sec_2.jpg" /></span></H4><H4 id="toc-hId-1901964645">&nbsp;</H4><H4 id="toc-hId-1705451140">Mixtral on Ollama, LocalAI and llama.cpp</H4><TABLE border="1" width="100%"><TBODY><TR><TD width="50%"><SPAN><STRONG>Model</STRONG>: <A href="https://ollama.com/library/mixtral:8x7b-instruct-v0.1-q4_0" target="_self" rel="nofollow noopener noreferrer">mixtral:8x7b-instruct-v0.1-q4_0</A> (26 G)</SPAN></TD><TD width="50%"><SPAN>&nbsp;<STRONG>Resource Plan of SAP AI Core</STRONG>: infer.l</SPAN></TD></TR><TR><TD width="50%">Ollama, LocalAI and llama.cpp</TD><TD width="50%">Avg token generation per second: 4~6<BR />Response time takes up to 60 s for the customer message processing use case. This seems due to insufficient GPU VRAM: resource plan infer.l has only 1 T4 GPU (16 G VRAM), though 58 G RAM. 
Only 17 of mixtral's 33 layers are offloaded to the GPU.</TD></TR></TBODY></TABLE><P>Here is a sample log screenshot from SAP AI Core showing tokens per second for mixtral on Ollama, LocalAI and llama.cpp.&nbsp;</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ollama_mixtral_token_per_sec.jpg" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/91330i6FBBF5C5F1AF7054/image-size/large?v=v2&amp;px=999" role="button" title="ollama_mixtral_token_per_sec.jpg" alt="ollama_mixtral_token_per_sec.jpg" /></span></P><H4 id="toc-hId-1508937635">&nbsp;</H4><H4 id="toc-hId-1312424130">Mixtral on vLLM</H4><TABLE border="1" width="100%"><TBODY><TR><TD width="50%"><SPAN><STRONG>Model</STRONG>:&nbsp;<A href="https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ" target="_self" rel="nofollow noopener noreferrer">Mixtral-8x7B-Instruct-v0.1-AWQ</A> (24.65 G)</SPAN></TD><TD width="50%"><SPAN><STRONG>Resource Plan of SAP AI Core</STRONG>: infer.l</SPAN></TD></TR><TR><TD width="50%"><P>vLLM with the following engine arguments</P><DIV><DIV>--dtype half</DIV><DIV>--gpu-memory-utilization 0.95</DIV><DIV>--enforce-eager</DIV><DIV>--max-model-len 512</DIV><DIV>--max-num-batched-tokens 512</DIV><DIV>--max-num-seqs 512</DIV><DIV>--quantization awq</DIV></DIV></TD><TD width="50%">Error with CUDA out of memory, even with max-model-len at 512.<BR />Likely a vLLM bug, according to <A href="https://github.com/vllm-project/vllm/issues/2248" target="_self" rel="nofollow noopener noreferrer">this issue</A></TD></TR></TBODY></TABLE><H4 id="toc-hId-1284094316">&nbsp;</H4><H4 id="toc-hId-1087580811">Llava on Ollama</H4><TABLE border="1" width="100%"><TBODY><TR><TD width="50%"><SPAN><STRONG>Model</STRONG>:&nbsp;<A href="https://ollama.com/library/llava:7b-v1.6" target="_self" rel="nofollow noopener noreferrer">llava:7b-v1.6</A> (4.7 G)</SPAN></TD><TD width="50%"><SPAN><STRONG>Resource Plan of SAP AI 
Core</STRONG>: infer.s</SPAN></TD></TR><TR><TD width="50%">Ollama</TD><TD width="50%">Avg token generation per second: 40~50<BR />Response time under 10 s for the public facility issue use case with JSON output.</TD></TR></TBODY></TABLE><H3 id="toc-hId-1184470313"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ollama_llava_token_per_sec.jpg" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/91370i17B24BEFB9C01213/image-size/large?v=v2&amp;px=999" role="button" title="ollama_llava_token_per_sec.jpg" alt="ollama_llava_token_per_sec.jpg" /></span></H3><H3 id="toc-hId-987956808">Try it out</H3><P>Please refer to <A href="https://github.com/SAP-samples/btp-generative-ai-hub-use-cases/tree/main/10-byom-oss-llm-ai-core" target="_self" rel="nofollow noopener noreferrer">this manual</A> to try it out. The source code of this sample is released under the Apache 2.0 license. You are accountable for your own choice of commercially viable open-source LLMs/LMMs.</P><H2 id="toc-hId-1084846310">Explore further resources on generative AI at SAP</H2><UL><LI><A href="https://community.sap.com/t5/technology-blogs-by-sap/augmenting-sap-btp-use-cases-with-ai-foundation-a-deep-dive-into-the/ba-p/13645289" target="_blank">Augmenting SAP BTP Use Cases with AI Foundation: A Deep Dive into the Generative AI Hub</A></LI><LI><A href="https://www.sap.com/bulgaria/products/artificial-intelligence/generative-ai.html" target="_blank" rel="noopener noreferrer">OpenSAP Course | Generative AI with SAP</A></LI><LI><A href="https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/models-and-scenarios-in-generative-ai-hub" target="_blank" rel="noopener noreferrer">Generative AI Hub in SAP AI Core</A></LI><LI><A href="https://help.sap.com/docs/ai-launchpad/sap-ai-launchpad/generative-ai-hub" target="_blank" rel="noopener noreferrer">Generative AI Hub in SAP AI Launchpad </A></LI><LI><A 
href="https://roadmaps.sap.com/board?range=FIRST-LAST&amp;FT=GEN_AI" target="_blank" rel="noopener noreferrer">Generative AI Roadmap</A></LI><LI><A href="https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core?region=all&amp;tab=feature" target="_blank" rel="noopener nofollow noreferrer">Discovery Center | SAP AI Core and Generative AI Hub</A></LI></UL><H2 id="toc-hId-888332805">Summary</H2><P>Self-hosted open-source LLMs in SAP AI Core with BYOM complement SAP Generative AI Hub by combining the efficiency and innovation of the open-source LLM community with the enterprise-grade security and scalability of SAP AI Core, making open-source LLMs ready for the enterprise. This also opens the door to open-source LLMs for SAP partners and customers adopting AI in their business.</P><P><STRONG><EM>Disclaimer</EM></STRONG><EM>: SAP notes that posts about potential uses of generative AI and large language models are merely the individual poster's ideas and opinions, and do not represent SAP's official position or future development roadmap. SAP has no legal obligation or other commitment to pursue any course of business, or develop or release any functionality, mentioned in any post or related content on this website.</EM></P></DIV> 2024-04-04T14:55:04.919000+02:00 https://community.sap.com/t5/technology-blogs-by-sap/ai-foundation-on-sap-btp-q1-2024-release-highlights/ba-p/13639750 AI Foundation on SAP BTP: Q1 2024 Release Highlights 2024-04-05T09:47:35.574000+02:00 Paul_P https://community.sap.com/t5/user/viewprofilepage/user-id/64 <P style=" text-align : left; "><SPAN>AI Foundation is SAP’s all-in-one AI toolkit, offering developers AI that’s ready-to-use, customizable, grounded in business data, and supported by leading generative AI foundation models. 
It is also the basis for AI capabilities that SAP embeds across its portfolio.</SPAN></P><P style=" text-align : left; "><SPAN><SPAN>If you are not already familiar with the different capabilities of the AI Foundation stack represented below, read this introductory&nbsp;<A href="https://blogs.sap.com/2023/11/30/ai-foundation-saps-all-in-one-ai-toolkit-for-developers/" target="_blank" rel="noopener noreferrer">blog post.<BR /><BR /></A></SPAN></SPAN><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="AI-Foundation-SAP-BTP.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/81465i6E1061D8322B1D6B/image-size/large?v=v2&amp;px=999" role="button" title="AI-Foundation-SAP-BTP.png" alt="AI-Foundation-SAP-BTP.png" /></span><SPAN><SPAN><BR /><SPAN>Let's delve into the Q1 2024 release highlights of the AI Foundation, including innovations for the Document Information Extraction service, SAP Translation Hub, SAP AI Core and SAP HANA Cloud.</SPAN>&nbsp;</SPAN></SPAN></P><H2 id="toc-hId-989042936"><STRONG><SPAN>AI Services<SPAN>&nbsp;</SPAN></SPAN></STRONG></H2><H3 id="toc-hId-921612150"><STRONG><SPAN>Document Processing -&nbsp;<A href="https://discovery-center.cloud.sap/index.html#/serviceCatalog/document-information-extraction?region=all" target="_self" rel="nofollow noopener noreferrer">Document Information Extraction<BR /></A></SPAN></STRONG></H3><P style=" text-align : left; "><STRONG><SPAN><SPAN><STRONG>New:&nbsp;<SPAN class=""><SPAN class="">Combine Different Setup Types When Adding Data Fields to Schemas</SPAN></SPAN></STRONG></SPAN></SPAN></STRONG></P><P style=" text-align : left; "><SPAN>Users can now combine header fields with different setup types in the same schema.<BR />They can add header fields with the following setup types to a schema created for a standard document type (e.g. 
invoice) or custom document type:&nbsp;<SPAN>&nbsp;</SPAN></SPAN></P><UL style=" text-align : left; "><LI><SPAN>auto (with and without a default extractor)<SPAN>&nbsp;</SPAN></SPAN></LI><LI><SPAN><SPAN>manual<SPAN>&nbsp;</SPAN></SPAN></SPAN></LI></UL><P>This extends the scope of the existing standard schemas from SAP, such as for the invoice document, and reduces time to value for new business fields (key-value pairs) that need to be extracted from documents.<SPAN>&nbsp;</SPAN></P><P><STRONG>New:&nbsp;<SPAN class=""><SPAN class="">Conversion of Country Specific Unit of Measure Values to&nbsp;ISO&nbsp;Format</SPAN></SPAN></STRONG></P><P><SPAN>The conversion of country-specific unit of measure values into ISO format for invoice documents has been improved. For instance, the processing of certain locale-specific units of measure (e.g. German 'Stk.' for 'Stück' / 'piece') has been refined. Users can expect improved extraction quality and faster business process execution.</SPAN></P><P><STRONG>New:&nbsp;Purchase order number extraction from line-item level&nbsp;</STRONG></P><P><SPAN>Users can extract purchase order numbers that are available at line-item field level from invoice documents. This allows faster execution of accounts payable processes when there are multiple PO numbers listed in the tables of supplier invoices.</SPAN></P><P><STRONG>New:&nbsp;New Invoice Supported Language –&nbsp;Japanese</STRONG></P><P><SPAN>The Document Information Extraction service now supports the Japanese language for invoice documents, improving global coverage.</SPAN></P><P><STRONG>New: Better Models for the Extraction of Standard Document Types</STRONG></P><P>The machine learning models for the extraction of&nbsp;<EM>invoice, paymentAdvice, and purchaseOrder&nbsp;</EM><SPAN class=""><SPAN class=""><SPAN class="">documents 
have been improved.&nbsp;</SPAN></SPAN></SPAN><SPAN class=""><SPAN class=""><SPAN class="">Users can expect improvements in particular when extracting dates, amounts, tax ID, bank accounts.&nbsp;</SPAN></SPAN></SPAN></P><UL><LI><SPAN>Invoices: higher extraction accuracy can be expected for Japan, Hungary, Türkiye, and Romania.&nbsp;<SPAN>&nbsp;</SPAN></SPAN></LI><LI><SPAN>Purchase orders: improved extraction accuracy can be expected for Spanish purchase orders.</SPAN></LI><LI><SPAN><SPAN>Payment advice: more consistent column extraction and improved extraction of amounts in line-items.</SPAN></SPAN></LI></UL><P>Get started with <A href="https://discovery-center.cloud.sap/index.html#/serviceCatalog/document-information-extraction?region=all" target="_self" rel="nofollow noopener noreferrer">Document Information Extraction.</A></P><H3 id="toc-hId-725098645"><STRONG><SPAN>Machine Translation -&nbsp;<A href="https://discovery-center.cloud.sap/index.html#/serviceCatalog/sap-translation-hub?region=all" target="_self" rel="nofollow noopener noreferrer">SAP Translation Hub</A></SPAN></STRONG></H3><P><STRONG>New: User settings for the document translation UI&nbsp;</STRONG></P><P><SPAN>Users can define the preferred UI language and theme in the application, which will be carried over across software and document translation UIs, thus improving the overall user experience.<SPAN>&nbsp;<SPAN>They can also define the preferred source and target languages to translate content using the application, which will apply only while using the document translation UI. 
This streamlines translation by removing the need to define the source and target languages every time the application opens.<SPAN>&nbsp;</SPAN></SPAN></SPAN></SPAN></P><P><STRONG>New: Updated tile in SAP BTP</STRONG></P><P><SPAN class="">As part of the effort to migrate the application for software translation to the multi-cloud environment, the tile available on SAP Business Technology Platform has been renamed from Document Translation to SAP Translation Hub. That’s one of the last steps towards completing the migration planned for Q2 2024.<STRONG><BR /></STRONG></SPAN></P><P><SPAN class=""><SPAN class="">Get started with <A href="https://discovery-center.cloud.sap/index.html#/serviceCatalog/sap-translation-hub?region=all" target="_self" rel="nofollow noopener noreferrer">SAP Translation Hub.</A></SPAN></SPAN></P><H2 id="toc-hId-399502421">Generative AI Management &amp; AI Workload Management</H2><H3 id="toc-hId-332071635"><STRONG><SPAN class=""><A href="https://discovery-center.cloud.sap/index.html#/serviceCatalog/sap-ai-core?region=all" target="_self" rel="nofollow noopener noreferrer">SAP AI Core</A></SPAN></STRONG></H3><P><STRONG><SPAN class=""><SPAN class="">New:&nbsp;Availability of additional large language models in the generative AI hub</SPAN></SPAN></STRONG></P><P><SPAN>Integration of additional large language models (LLM):</SPAN></P><UL><LI>Google PaLM 2 for text (text-bison), PaLM 2 for chat (chat-bison), and embeddings for text (textembedding-gecko)</LI><LI>Google Gemini Pro</LI><LI>Updates to Microsoft Azure OpenAI model versions</LI></UL><P>You can now leverage a greater selection of "best-to-fit" LLMs for your use case and explore market-leading generative AI models without lengthy contractual and legal discussions, while harmonizing lifecycle management across models.</P><P>Find more information <A href="https://me.sap.com/notes/3437766" target="_blank" rel="noopener noreferrer">here</A>.</P><P><STRONG><SPAN class=""><SPAN class="">New: <SPAN class="">SDK for support of large language models<BR /></SPAN></SPAN></SPAN></STRONG></P><P><SPAN>We introduced a series of features in SAP AI Core:<SPAN>&nbsp;</SPAN></SPAN></P><UL><LI><SPAN>Technical libraries that simplify inference on large language models (LLMs) by automatically injecting the correct headers and paths into each request. This improves extensibility, allowing users to add further adaptations as needed.<SPAN>&nbsp;</SPAN></SPAN></LI><LI><SPAN>Tooling for the effective integration and use of LLMs with LangChain in the context of the generative AI hub. It simplifies the developer experience with ready-to-use libraries for access to LLMs deployed using the generative AI hub.<SPAN>&nbsp;</SPAN></SPAN></LI><LI><SPAN><SPAN>A new library, the </SPAN></SPAN><A href="https://pypi.org/project/generative-ai-hub-sdk/" target="_self" rel="nofollow noopener noreferrer">ai-core-llm-sdk</A>, in addition to enhancements to the existing <EM>ai-core-sdk</EM>, to accommodate the required changes to support LLM access. 
It boosts efficiency when working with various LLMs by streamlining model deployment and the querying of available models.</LI></UL><P>Get started with <A href="https://discovery-center.cloud.sap/index.html#/serviceCatalog/sap-ai-core?region=all" target="_self" rel="nofollow noopener noreferrer">SAP AI Core.</A></P><H2 id="toc-hId-6475411"><STRONG><SPAN>Business Data &amp; Context<SPAN>&nbsp;</SPAN></SPAN></STRONG></H2><H3 id="toc-hId--60955375"><A href="https://discovery-center.cloud.sap/missiondetail/3643/3686/" target="_blank" rel="noopener nofollow noreferrer"><STRONG><SPAN>SAP HANA Cloud</SPAN></STRONG></A></H3><P><STRONG>New: Support for storage and retrieval of vector embeddings in SAP HANA Cloud, called SAP HANA Cloud vector engine&nbsp;<SPAN>&nbsp;</SPAN></STRONG></P><P><SPAN>A vector datastore manages unstructured data - such as text, images, or audio - in high-dimensional vector space as vector embeddings, to provide long-term memory and better context to AI models. This makes it easy to find and retrieve similar objects quickly, for example, by asking a question using natural language. This both simplifies interactions with large language models (LLMs) and empowers developers to securely implement generative AI in applications.<SPAN>&nbsp;</SPAN></SPAN></P><P>SAP HANA Cloud vector engine can now natively store and search vector embeddings, which are numerical representations of objects, along with business data as part of its industry-leading multi-model processing capabilities to power intelligent data applications.</P><P>With these vector capabilities, SAP HANA Cloud will enable Retrieval Augmented Generation (RAG), facilitating the combination of LLMs with private business data. 
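The retrieval step of such a RAG flow can be sketched as a similarity query from Python. This is a hedged sketch: the table and column names are purely illustrative, COSINE_SIMILARITY and TO_REAL_VECTOR are the vector-engine SQL functions as I understand them (verify against your SAP HANA Cloud version), and executing the statement would require the hdbcli driver and a live connection:

```python
# Hedged sketch of a RAG-style similarity lookup with the SAP HANA Cloud
# vector engine. Table/column names are illustrative assumptions; the
# statement builder is shown instead of a live database call.
import json

def build_similarity_query(table, text_col, vec_col, top_k=3):
    """SQL returning the top_k stored chunks most similar to a query embedding."""
    return (
        f"SELECT TOP {int(top_k)} {text_col}, "
        f"COSINE_SIMILARITY({vec_col}, TO_REAL_VECTOR(?)) AS SCORE "
        f"FROM {table} ORDER BY SCORE DESC"
    )

def vector_literal(embedding):
    """Serialize an embedding list for TO_REAL_VECTOR's string parameter."""
    return json.dumps(embedding)

# Assumed usage with the hdbcli driver (connection setup omitted):
# cursor.execute(build_similarity_query("DOC_CHUNKS", "CHUNK_TEXT", "EMBEDDING"),
#                (vector_literal(query_embedding),))
# The retrieved chunks are then passed to the LLM as grounding context.
```

The query embedding itself would come from the same embedding model used to populate the table, for example via an embedding endpoint in the generative AI hub.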
These applications learn and adapt to new information, enabling automated decision-making.</P><P>Key benefits of the SAP HANA Cloud vector engine include:</P><UL><LI><STRONG>Multi-model</STRONG>: Users can unify all types of data into a single database to build innovative applications using an efficient data architecture and in-memory performance. By adding vector storage and processing to the same database already storing relational, graph, spatial, and even JSON data, application developers can create next-generation solutions that interact more naturally with the user.</LI><LI><STRONG>Enhanced search and analysis</STRONG>: Businesses can now apply semantic and similarity search to business processes using documents like contracts, design specifications, and even service call notes.</LI><LI><STRONG>Personalized recommendations</STRONG>: Users can benefit from an improved overall experience with more accurate and personalized suggestions.</LI><LI><STRONG>Optimized large language models</STRONG>: The output of LLMs is augmented with more effective and contextual data.</LI></UL><P>To deep dive into the&nbsp;<SPAN>SAP HANA Cloud Vector Engine, read these blog posts by our experts:</SPAN></P><UL><LI><A href="https://community.sap.com/t5/technology-blogs-by-sap/vectorize-your-data-sap-hana-cloud-s-vector-engine-for-unified-data/ba-p/13579558" target="_blank"><SPAN>Vectorize your Data: SAP HANA Cloud's Vector Engine for Unified Data Excellence</SPAN></A></LI><LI><A href="https://community.sap.com/t5/technology-blogs-by-sap/from-developer-s-desk-sap-hana-cloud-vector-engine/ba-p/13589792" target="_blank"><SPAN>From Developer's Desk: SAP HANA Cloud Vector Engine</SPAN></A></LI></UL><H2 id="toc-hId--386551599"><STRONG>Build business-ready AI applications with SAP and stay updated!&nbsp;</STRONG></H2><UL><LI><SPAN>Leverage the AI Foundation capabilities by visiting the <A href="https://discovery-center.cloud.sap/index.html#/viewServices?category=ai" target="_self" 
rel="nofollow noopener noreferrer">SAP Discovery Center</A>. Compare and select the service that fits most to your business needs. </SPAN></LI><LI><SPAN>Explore the <A href="https://roadmaps.sap.com/board?range=FIRST-LAST&amp;PRODUCT=73554900100800002858&amp;PRODUCT=73554900100800002861&amp;PRODUCT=73554900100800002881&amp;PRODUCT=73554900100800003641&amp;PRODUCT=73554900100800003847&amp;PRODUCT=73555000100800000086&amp;PRODUCT=73555000100800003283#Q4%202023" target="_self" rel="noopener noreferrer">AI Foundation roadmap</A> to discover past and upcoming innovations.&nbsp;</SPAN></LI><LI><SPAN><SPAN>Engage with our community of SAP experts through <A href="https://pages.community.sap.com/topics/machine-learning" target="_self" rel="noopener noreferrer">Q&amp;A and blog posts</A>.</SPAN></SPAN></LI></UL><P>See you next quarter for exciting innovations!</P> 2024-04-05T09:47:35.574000+02:00 https://community.sap.com/t5/gurugram-blog-posts/sap-stammtisch-plus-gurgaon-may-2024-h1/ba-p/13678318 SAP Stammtisch Plus - Gurgaon - May 2024 (H1) 2024-04-22T16:27:18.510000+02:00 Chandra_Bhushan https://community.sap.com/t5/user/viewprofilepage/user-id/123700 <P><SPAN>Dear SAP Community Members,</SPAN><BR /><BR /><SPAN>We are planning to run SAP Stammtisch Plus in Gurgaon - May 2024(H1). Details will be shared very soon.</SPAN><BR /><SPAN>Let us get together to participate, connect and benefit from our peer community members on latest insights from&nbsp;</SPAN><A href="https://www.linkedin.com/company/sap/" target="_blank" rel="noopener nofollow noreferrer">SAP</A><SPAN>&nbsp;on:</SPAN><BR /><A href="https://www.linkedin.com/feed/hashtag/?keywords=sapbusinessai&amp;highlightedUpdateUrns=urn%3Ali%3Aactivity%3A7188068071461339138" target="_blank" rel="noopener nofollow noreferrer">1. 
#SAPBusinessAI</A><SPAN>,</SPAN><BR /><A href="https://www.linkedin.com/feed/hashtag/?keywords=genai&amp;highlightedUpdateUrns=urn%3Ali%3Aactivity%3A7188068071461339138" target="_blank" rel="noopener nofollow noreferrer">2. #GenAI</A><SPAN>,</SPAN><BR /><A href="https://www.linkedin.com/feed/hashtag/?keywords=saphanacloud&amp;highlightedUpdateUrns=urn%3Ali%3Aactivity%3A7188068071461339138" target="_blank" rel="noopener nofollow noreferrer">3. #SAPHANACloud</A><SPAN>&nbsp;Vector Engine,</SPAN><BR /><SPAN>4. Retrieval Augmented Generation(</SPAN><A href="https://www.linkedin.com/feed/hashtag/?keywords=rag&amp;highlightedUpdateUrns=urn%3Ali%3Aactivity%3A7188068071461339138" target="_blank" rel="noopener nofollow noreferrer">#RAG</A><SPAN>),</SPAN><BR /><SPAN>5. Knowledge Graph,</SPAN><BR /><SPAN>6. Enterprise Automation and much more.</SPAN></P><P><SPAN>Thank you to all the community members from another SAP Community Member!</SPAN></P><P><SPAN>Chandra</SPAN></P><P><A href="https://www.linkedin.com/feed/hashtag/?keywords=sapcommunity&amp;highlightedUpdateUrns=urn%3Ali%3Aactivity%3A7188068071461339138" target="_blank" rel="noopener nofollow noreferrer">#SAPCommunity</A><SPAN>&nbsp;</SPAN><A href="https://www.linkedin.com/feed/hashtag/?keywords=sapstammtischggn&amp;highlightedUpdateUrns=urn%3Ali%3Aactivity%3A7188068071461339138" target="_blank" rel="noopener nofollow noreferrer">#SAPStammtischPlusGGN</A><SPAN>&nbsp;</SPAN><A href="https://www.linkedin.com/feed/hashtag/?keywords=sapchampion&amp;highlightedUpdateUrns=urn%3Ali%3Aactivity%3A7188068071461339138" target="_blank" rel="noopener nofollow noreferrer">#SAPChampion</A></P> 2024-04-22T16:27:18.510000+02:00 https://community.sap.com/t5/technology-blogs-by-members/vectorize-your-data-for-infuse-ai-in-to-business-using-hana-vector-and/ba-p/13684158 Vectorize your data for Infuse AI in to Business using Hana Vector and Generative AI 2024-04-27T05:08:45.073000+02:00 Julian_B 
https://community.sap.com/t5/user/viewprofilepage/user-id/157589 <P><FONT size="5">&nbsp; &nbsp; &nbsp; &nbsp; <STRONG><U>Vectorize Your Data to Infuse AI into Your Business&nbsp;</U></STRONG></FONT></P><P><FONT size="4">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<STRONG><U>using SAP HANA Vector Engine, RAG &amp; BTP Gen-AI</U></STRONG></FONT></P><P><STRONG><U>1. Introduction:</U></STRONG></P><P>SAP's generative AI hub and the SAP HANA vector engine enable businesses to harness the power of AI.</P><P>The SAP HANA vector engine stores and processes vector embeddings directly in SAP HANA Cloud's high-performance in-memory database, alongside your business data. This lets businesses run fast similarity searches over vast amounts of information with low query execution times.</P><P>This blog post shows how we adapt SAP's partner-provided foundation models to our business cases. We will also discuss how we use the tools of SAP's generative AI hub and the SAP HANA vector engine, together with RAG techniques, to incorporate AI into a business use case.</P><P>The blog offers a step-by-step guide for implementing the GenAI RAG pattern in your business operations.</P><P><STRONG><U>2. Adapting GenAI to Business </U></STRONG></P><P>To adapt GenAI to business, we started with task-specific techniques such as RAG and zero-shot prompting.</P><P>AI capabilities are already embedded in most SAP business processes through a tool known as Joule. This digital assistant aids users in text summarization, writing, question &amp; answering, and code generation.</P><P>Within the BTP AI Foundation, SAP AI Launchpad serves as the gateway to the generative AI hub, ML operations, and the prompt editor. 
From the GenAI hub, users can easily access any partner-built or SAP-built foundation model.</P><P>Users can also access the prompt editor, which is configured with the foundation models. From this tool you can leverage GenAI capabilities such as text summarization, writing, question &amp; answering, code generation, sentiment analysis, auto-responses for customer service, and more in your business.</P><P>Furthermore, AI can identify potential risks and fraudulent activities through comprehensive data analysis, swiftly detecting anomalies. It can also enhance customer satisfaction by responding to customer queries, ensuring a seamless and personalized experience.</P><P>Moreover, by analyzing extensive customer data, AI offers personalized recommendations and experiences tailored to each individual business's needs.</P><P><STRONG><U>3. Authors</U></STRONG></P><P><STRONG><U>Julian Bellarmin</U></STRONG>, SAP-certified cloud solution BTP architect at Tech Mahindra.</P><P><STRONG>Kedar Kulkarni</STRONG>, leading SAP BTP competency solutions.</P><P><STRONG>Saranya Sampath</STRONG>, BTP consultant.</P><P><STRONG><U>4. Vectorization, Retrieval, and Generation:</U></STRONG></P><P><STRONG><U>&nbsp;4.1&nbsp;</U></STRONG><STRONG><U>Process Flow Diagram:</U></STRONG></P><P>&nbsp;</P><P><span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="S0022939779_0-1714184786747.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102558i83F6996C3283801A/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_0-1714184786747.png" alt="S0022939779_0-1714184786747.png" /></span></P><UL class="lia-list-style-type-disc"><LI><FONT size="1 2 3 4 5 6 7">New column data type: REAL_VECTOR</FONT></LI><LI><FONT size="1 2 3 4 5 6 7">New vector constructor:&nbsp; 
TO_REAL_VECTOR</FONT></LI><LI><FONT size="1 2 3 4 5 6 7">New similarity search distance functions: L2DISTANCE(), COSINE_SIMILARITY()</FONT></LI></UL><P><STRONG><U>4.2 Process Steps:</U></STRONG></P><P><STRONG><U>Upload function:</U></STRONG></P><OL><LI>Collect the TEXT and PDF documents from the directory.</LI><LI>Split all the documents into chunks.</LI><LI>Convert all the chunks into text embeddings through the generative AI hub proxy LangChain embedding libraries and the text-embedding-ada-002 model.</LI><LI>Convert all the embeddings into vectors with the TO_REAL_VECTOR constructor.</LI><LI>Store the embeddings as vectors in the HANA Cloud vector DB.</LI></OL><P><STRONG><U>Semantic search function:</U></STRONG></P><OL><LI>Get the user query.</LI><LI>Convert the query into embeddings.</LI><LI>Perform the cosine similarity search by passing the query vector.</LI><LI>Extract the top-ranked results.&nbsp;</LI></OL><P><STRONG><U>Text generation through LLM:</U></STRONG></P><OL><LI>Refine the prompting context according to the specific business requirements.</LI><LI>Pass the context and search results to partner-built foundation models - the GPT-35-TURBO / GPT-4 LLMs.</LI><LI>Capture the LLM response and display it to the user.</LI></OL><P><STRONG><U>5. Configuration of BTP AI services</U></STRONG></P><P><STRONG><U>5.1 </U></STRONG><STRONG>Pre-Requisites:</STRONG></P><UL><LI>SAP BTP enterprise account.</LI><LI>SAP AI Core in your global account, on the extended service plan.</LI><LI>SAP AI Launchpad on the standard service plan.</LI><LI>SAP HANA instance - for configuring REAL_VECTOR columns.</LI><LI>If you are using a trial version, you can update it to the extended / standard service plan.</LI></UL><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_2-1714185045166.png" style="width: 999px;"><img 
src="https://community.sap.com/t5/image/serverpage/image-id/102559iF8F412EEAD0512C0/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_2-1714185045166.png" alt="S0022939779_2-1714185045166.png" /></span></P><P>&nbsp;</P><P><STRONG><U>5.2&nbsp;</U></STRONG><STRONG>Configuration in BTP Launchpad</STRONG></P><P><STRONG>5.2.1&nbsp;</STRONG><STRONG>Roles Assignment:</STRONG></P><P>You need to have the right roles and permission to access the GEN AI Hub , AI Launchpad , ML Operations, Work space&nbsp; and AI core admin.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_3-1714185126121.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102560i6EE1026B536A0CE1/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_3-1714185126121.png" alt="S0022939779_3-1714185126121.png" /></span></P><P>5.2.2&nbsp;<STRONG>Service Key Creation:</STRONG></P><P>Create the service Keys for AI CORE Instance. You can create your own service keys and get the credentials to make the connection between AI Core and python colab tool</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_4-1714185176596.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102561i5956638581C8EDF0/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_4-1714185176596.png" alt="S0022939779_4-1714185176596.png" /></span></P><P><STRONG>5.2.3 Connection and Resource Group Creation:</STRONG></P><P>You can create your ai-core connection and resource group. 
Here ‘default’ will be the resource group name.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_5-1714185210562.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102562iE2926365BDB44606/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_5-1714185210562.png" alt="S0022939779_5-1714185210562.png" /></span></P><P><STRONG>5.2.4 Validate your Foundation Model - AZURE-OPENAI:</STRONG></P><P><STRONG>&nbsp;</STRONG>In AI Launchpad, go to&nbsp;<STRONG>ML Operations</STRONG>&nbsp;-&gt;&nbsp;<STRONG>Scenarios</STRONG>.</P><P>Check whether the&nbsp;<STRONG>foundation-models</STRONG>&nbsp;scenario is present in your AI Core workspace.</P><P>Click on the azure-openai link to see the models supported in AI Core by default.</P><P>Here we are using gpt-35-turbo, gpt-4, and text-embedding-ada-002.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_6-1714185253811.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102563iFDB567199F316940/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_6-1714185253811.png" alt="S0022939779_6-1714185253811.png" /></span></P><P>&nbsp;</P><P>You can see the list of supported models in AI Core – as of April 26, 2024.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_7-1714185253821.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102564i22BE2CE709FDD9A4/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_7-1714185253821.png" alt="S0022939779_7-1714185253821.png" /></span></P><P><STRONG>5.3 Model Configuration:</STRONG></P><P>Go to&nbsp;ML Operations&nbsp;-&gt;&nbsp;Configurations. 
Click on the&nbsp;Create&nbsp;button to configure the models.</P><P>Enter the parameters and click on the Next button.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_8-1714185314495.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102565i350CD66D6A684A0B/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_8-1714185314495.png" alt="S0022939779_8-1714185314495.png" /></span></P><P>Enter your required foundation model name and version.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_9-1714185314500.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102567i3B6B706BBBC44975/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_9-1714185314500.png" alt="S0022939779_9-1714185314500.png" /></span></P><P>&nbsp;</P><P>You can see your configured model once it has been created.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_10-1714185314517.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102566i320D6EAD289371AB/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_10-1714185314517.png" alt="S0022939779_10-1714185314517.png" /></span></P><P>&nbsp;</P><P><STRONG>5.4 Creating your deployment:</STRONG></P><P>Create your deployment, which is backed by the foundation model.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_11-1714185314534.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102569i4D2D3F7ACB044181/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_11-1714185314534.png" alt="S0022939779_11-1714185314534.png" /></span></P><P>&nbsp;</P><P>Choose your executable from the options below.</P><P><span class="lia-inline-image-display-wrapper 
lia-image-align-inline" image-alt="S0022939779_12-1714185314548.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102570i4AD86A5988477429/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_12-1714185314548.png" alt="S0022939779_12-1714185314548.png" /></span></P><P>Once you created the deployment model , wait for the current status to be set to RUNNING.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_13-1714185314552.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102568iBCC9F2D5664B9478/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_13-1714185314552.png" alt="S0022939779_13-1714185314552.png" /></span></P><P>&nbsp;</P><P>Note down your deployment model id , which will be used when you want to use the LLM context.</P><P><STRONG>5.5 Validate the prompts with the deployed model</STRONG></P><P>In the Generative AI Hub, you can manage your prompts in the Prompt Management page. Go to&nbsp;Generative AI Hub&nbsp;-&gt;&nbsp;Prompt Management. 
Here you can click on a prompt to go to its details page.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_14-1714185399337.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102572iF1DE84C225E8A732/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_14-1714185399337.png" alt="S0022939779_14-1714185399337.png" /></span></P><P>&nbsp;</P><P>Configure the parameters for the model response.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_15-1714185399339.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102571iBD56DBA1169B880D/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_15-1714185399339.png" alt="S0022939779_15-1714185399339.png" /></span></P><P>&nbsp;</P><P><U>Frequency Penalty:</U></P><P>A number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.</P><P><U>Presence Penalty:</U></P><P>A number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.</P><P><U>Max-Tokens:</U></P><P>The maximum number of tokens allowed for the generated answer.</P><P>Increase the response token size to get longer results.</P><P><U>Temperature:</U></P><P>The sampling temperature to use, between 0 and 2. 
Higher values will make the output more random, while lower values will make it more focused and deterministic.</P><P>You can validate your prompts with different contexts.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_16-1714185399350.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102573iFFF5614F5B1A0957/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_16-1714185399350.png" alt="S0022939779_16-1714185399350.png" /></span></P><P><STRONG><U>6. Creating a HANA Cloud Vector Instance</U></STRONG></P><P>From April 2024 onwards, SAP supports the REAL_VECTOR column type in HANA DB containers. Create your HANA Cloud instance and note down your DBADMIN credentials, host name &amp; port number, which will be needed later to make the connection. Start the instance by clicking on it in SAP HANA Cloud Central. Once it has started, open your Database Explorer.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_17-1714185460252.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102574i9AD5E692BD686773/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_17-1714185460252.png" alt="S0022939779_17-1714185460252.png" /></span></P><P><STRONG><U>7. Vectorize your PDF and TEXT files:</U></STRONG></P><P><STRONG>7.1 Pre-requisites:</STRONG></P><P>Python 3 installed on your system. [You can use Google Colab.]</P><P>generative-ai-hub-sdk&nbsp;installed on your system.</P><P>[With this SDK you can use all the generative models available in SAP's generative AI hub.]</P><P><STRONG>7.2 Create the folder directory and upload the text / PDF files.</STRONG></P><P>&nbsp;</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_18-1714185533410.png" style="width: 999px;"><img 
src="https://community.sap.com/t5/image/serverpage/image-id/102575i4CF0DAB257F7BAC1/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_18-1714185533410.png" alt="S0022939779_18-1714185533410.png" /></span></P><P>&nbsp;</P><P>7.3&nbsp;&nbsp;<STRONG>install the HANA ML and GenAI Hub SDK libraries</STRONG></P><P>With this SDK you can leverage the power of generative Models like chatGPT available in SAP's generative AI hub.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_19-1714185572276.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102576i4307A72339025953/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_19-1714185572276.png" alt="S0022939779_19-1714185572276.png" /></span></P><P>&nbsp;</P><P><STRONG>7.4 Set the environment variable.</STRONG></P><P>Update your AI core service key credentials and execute the below script to set the ENV variables.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_20-1714185594185.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102577iE7B0AE3361D16082/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_20-1714185594185.png" alt="S0022939779_20-1714185594185.png" /></span></P><P><STRONG>7.5 Make the &nbsp;Connection Context to the HANA Vector store</STRONG></P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_21-1714185594192.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102578i82F4AAFE80BA3940/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_21-1714185594192.png" alt="S0022939779_21-1714185594192.png" /></span></P><P>&nbsp;</P><P>Import the DBAPI libraries to invoke the connection.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" 
image-alt="S0022939779_22-1714185594195.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102579iBC88330FA9810A4F/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_22-1714185594195.png" alt="S0022939779_22-1714185594195.png" /></span></P><P><STRONG>7.6 Converting all the documents in to chunks</STRONG></P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_23-1714185629957.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102582iA58A90D8F74499DF/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_23-1714185629957.png" alt="S0022939779_23-1714185629957.png" /></span></P><P>Use the below libraries for the PDF documents.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_24-1714185629958.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102580i19A72D2AB3A579C4/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_24-1714185629958.png" alt="S0022939779_24-1714185629958.png" /></span></P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_25-1714185629959.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102581i7A2983D93CB6B50A/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_25-1714185629959.png" alt="S0022939779_25-1714185629959.png" /></span></P><P>Use the below libraries for the text documents.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_26-1714185629960.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102584i7C55BD3DB601D557/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_26-1714185629960.png" alt="S0022939779_26-1714185629960.png" /></span></P><P><span 
class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_27-1714185629962.png" style="width: 721px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102583i0A770BC0942A4EB2/image-dimensions/721x113?v=v2" width="721" height="113" role="button" title="S0022939779_27-1714185629962.png" alt="S0022939779_27-1714185629962.png" /></span></P><P>You can check the size of the chunks of your file by execute the len(text_chunks)script.</P><P>Based on the business data , you have to carefully set the chunk size and its overlap size.</P><P>&nbsp;</P><P><STRONG>7.7 Converting all the chunks in to embeddings and store it as vectors in HANA vector DB</STRONG></P><P>For converting my text /pdf file in to embeddings , we are using langchain init embedding libraries.</P><P>For that we are using text-embedding-ada-002 model.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_28-1714185703434.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102585i64E0168E977D7576/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_28-1714185703434.png" alt="S0022939779_28-1714185703434.png" /></span></P><P>Once you have executed the above script , you can see the “MyTest_VecTable” has been created in the Hana DB with REAL_VECTOR data type.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_29-1714185703443.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102586iD9977EB506462F59/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_29-1714185703443.png" alt="S0022939779_29-1714185703443.png" /></span></P><P>You can open the table data and see all your chunks will be converted vectors in the REAL_VECTOR column.</P><P><U>Note:</U> Don’t use the data upload option to load these vectors. 
Doing so will raise an error.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_30-1714185703457.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102587i48D228729522A1C5/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_30-1714185703457.png" alt="S0022939779_30-1714185703457.png" /></span></P><P>&nbsp;</P><P><STRONG><U>8. Extract query results from the vector DB:</U></STRONG></P><P>Now it’s time to validate the vector store and the GenAI deployment models by passing the user prompt query.</P><P><STRONG><U>8.1 Getting the user query and converting it into embeddings.</U></STRONG></P><P>We are using the text-embedding-ada-002 / text-embedding-ada-002-v2 model to convert the input query strings into embeddings.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_31-1714185814076.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102588iD117D88B52D22BC8/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_31-1714185814076.png" alt="S0022939779_31-1714185814076.png" /></span></P><P>&nbsp;</P><P><STRONG><U>8.2 Performing the cosine similarity search by passing the query vector.</U></STRONG></P><P>Pass the embedded query to the vector store in a SELECT SQL query to fetch the closest matches. 
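As a minimal sketch of such a SELECT statement, the helper below builds a HANA SQL query ranking rows by cosine similarity to a bound query vector. The table and column names (MyTest_VecTable, VEC_TEXT, VEC_VECTOR) are assumed from this walkthrough's screenshots, so adjust them to your own schema; with a live hdbcli connection you would bind the query embedding, serialized as a JSON-style array string, as the single parameter.

```python
def build_similarity_query(table: str, top_k: int = 4) -> str:
    """Return a HANA SQL statement that ranks rows by cosine similarity
    to a query vector bound as the single statement parameter."""
    return (
        f'SELECT TOP {top_k} "VEC_TEXT", '
        f'COSINE_SIMILARITY("VEC_VECTOR", TO_REAL_VECTOR(?)) AS "SIMILARITY" '
        f'FROM "{table}" ORDER BY "SIMILARITY" DESC'
    )

# With a live hdbcli connection (not shown here), usage would look like:
# cursor.execute(build_similarity_query("MyTest_VecTable"),
#                (str(query_embedding),))
```

Swapping COSINE_SIMILARITY for L2DISTANCE (and ordering ascending) gives the Euclidean-distance variant mentioned below.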
You can use the L2DISTANCE or COSINE_SIMILARITY function based on your need.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_32-1714185835681.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102590i86C7C306B565B7AA/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_32-1714185835681.png" alt="S0022939779_32-1714185835681.png" /></span></P><P>&nbsp;</P><P><STRONG><U>8.3 </U></STRONG><STRONG>Extracting the top-ranked results.</STRONG></P><P>You can verify the top-ranked chunk results returned from the HANA vector store.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_33-1714185835683.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102589i8AB5FF02BB58C2CA/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_33-1714185835683.png" alt="S0022939779_33-1714185835683.png" /></span></P><P><STRONG><U>9. T</U></STRONG><STRONG><U>ext Generation through LLM:</U></STRONG></P><P><STRONG><U>9.1 </U></STRONG><STRONG>Configure the prompting context based on the business requirement.</STRONG></P><P>The prompt guides GenAI models to produce the desired output.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_34-1714185877400.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102591i0B6082F6675F5B5C/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_34-1714185877400.png" alt="S0022939779_34-1714185877400.png" /></span></P><P>&nbsp;</P><P><STRONG>9.2 Pass the request to and fetch the response from the GenAI hub deployed model.</STRONG></P><P>In this use case, we are fetching the results based on the RAG [Retrieval Augmented Generation] flag. 
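As a hedged sketch of the prompt-assembly step, the helper below combines the retrieved chunks with the user question into chat messages in the OpenAI chat format used by gpt-35-turbo. The function name and system-message wording are illustrative, not the exact script shown in the screenshots.

```python
def build_rag_prompt(question: str, context_chunks: list) -> list:
    """Assemble chat messages that ground the LLM answer in the chunks
    retrieved from the HANA vector store (OpenAI chat-message shape)."""
    context = "\n\n".join(context_chunks)
    system = (
        "Answer strictly from the provided context. "
        "If the context does not contain the answer, say so."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
```

The returned message list is what you would pass to the chat-completion call against your gpt-35-turbo / gpt-4 deployment.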
Make sure that you are using the right deployment model – the one you configured in the AI hub.&nbsp;</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_35-1714185913717.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102592i1280CDF43503C2AA/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_35-1714185913717.png" alt="S0022939779_35-1714185913717.png" /></span></P><P><STRONG>9.3 Capture the LLM response from the model and display it.</STRONG></P><P>Here I have set the RAG flag to True.</P><P>You can see the response generated by GPT-35-Turbo based on the context returned from the vector store.</P><P>&nbsp;</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_36-1714186021367.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102593i6930EF2BC842652F/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_36-1714186021367.png" alt="S0022939779_36-1714186021367.png" /></span></P><P>Now set the RAG [Retrieval Augmented Generation] flag to False. Then see the response, which comes directly from the LLM without referring to the HANA vector results.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_37-1714186021374.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102594i1018A2E6CF5EA848/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_37-1714186021374.png" alt="S0022939779_37-1714186021374.png" /></span></P><P>Now your model is ready to respond to any question based on the RAG flag.</P><P><STRONG><U>10. Create an Interactive UI to Demo the RAG Technique</U></STRONG></P><P>Here is the script to create the interactive user components. 
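The core dispatch behind such a UI, the logic the ASK LLM button triggers depending on the RAG checkbox, can be sketched as below. The function names retrieve and ask_llm are hypothetical stand-ins for the retrieval and generation steps built in sections 8 and 9; in a notebook you would wire this function to ipywidgets controls.

```python
def answer(question: str, use_rag: bool, retrieve, ask_llm) -> str:
    """Route a question either through retrieval first (RAG) or directly
    to the LLM, mirroring the RAG checkbox in the demo UI."""
    if use_rag:
        chunks = retrieve(question)          # top-ranked chunks from the vector store
        return ask_llm(question, context=chunks)
    return ask_llm(question, context=None)   # plain LLM answer, no grounding
```

Toggling use_rag reproduces the two behaviors shown in the screenshots: grounded answers versus the model's unassisted response.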
Put all your queries into the options array list.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_38-1714186168403.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102597i67397A8FE17344E3/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_38-1714186168403.png" alt="S0022939779_38-1714186168403.png" /></span></P><P>Set RAG to False by unselecting the check box, click the ASK LLM button, and see the response.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_39-1714186168408.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102599i0F90F6FFD1D0DDDA/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_39-1714186168408.png" alt="S0022939779_39-1714186168408.png" /></span></P><P>Set RAG to True by selecting the check box, click the ASK LLM button, and see the response.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="S0022939779_40-1714186168410.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/102598i11CA53477FFBAE12/image-size/large?v=v2&amp;px=999" role="button" title="S0022939779_40-1714186168410.png" alt="S0022939779_40-1714186168410.png" /></span></P><P>&nbsp;</P><P>11.&nbsp;<STRONG><U>Conclusion:</U></STRONG></P><P>&nbsp;The infusion of AI capabilities ultimately results in improved business outcomes, enhanced decision-making, and a competitive edge in the market. Through this simple use case, we saw that we can apply classification, question answering, content generation, task automation via zero-shot prompting, creative writing, language translation, personalized interaction, sentiment analysis, summarization, content extraction, and more 
to our business use case.</P><P>This blog provides a comprehensive walkthrough of the data loading and extraction process using the HANA vector DB and GenAI solution to implement the RAG feature.</P><P>You can also see how accurately the multi-dimensional HANA vector engine fetches the results, and how your LLM deployment helps you generate text for end users based on the provided context.</P><P>&nbsp;</P> 2024-04-27T05:08:45.073000+02:00 https://community.sap.com/t5/technology-blogs-by-members/be-a-cockroach-a-simple-guide-to-ai-and-sap-full-stack-development-part-i/ba-p/13696633 Be a Cockroach: A Simple Guide to AI and SAP Full-Stack Development - Part I 2024-05-09T19:14:30.935000+02:00 karthikarjun https://community.sap.com/t5/user/viewprofilepage/user-id/123682 <P><STRONG><U>Disclaimer</U></STRONG>: This blog delves into SAP full-stack development, incorporating SAP RAP for both front-end and back-end functionalities, Integration Suite for middleware tasks, TensorFlow for AI/ML modeling, and crafting a personalized system dashboard. Geared towards simplifying complex business systems through engaging stories, it's for those wanting to understand these concepts. If you're already well-versed, feel free to just stop here and explore other content.</P><P><STRONG><U>I) Introduction (Story):</U></STRONG></P><P>Most people came here to see what this blog has to say about cockroaches and their connection to SAP development. Some might be curious to learn about it. A few of you might feel grossed out and think cockroaches are yucky and ugly! <STRONG>But after reading this section, I hope you'll respect these special creatures called “COCKROACHES”.</STRONG></P><P>The Earth is billions of years old. Lots of animals have lived and gone extinct over time. Among them were the dinosaurs, huge and strong. Due to changes in the tectonic plates and environment, the dinosaurs couldn't survive and disappeared. 
Now, you can see them only in movies, recreated with computer-generated effects. Cockroaches, however, have been around for hundreds of millions of years, and you can still come across them in your kitchen, living room, bathroom, trash can, and everywhere else. Cockroaches have adjusted to changes in their surroundings, which is why they still exist on the planet.</P><P>Even though companies produce products like HIT and Baygon to kill them, cockroaches survive by figuring out how to deal with those products. A study found that certain products cause some cockroaches to behave as if they're intoxicated; their bodies adapt to these substances. This ability to change is known as adaptation.</P><P>There aren't any charities or groups dedicated to protecting cockroaches the way there are for elephants and dolphins. Everyone wants to kill cockroaches, yet these humble creatures keep on living, undefeated by humans. Now that you understand this, I trust you hold a greater appreciation for the modest cockroach. <EM><U>Similarly, consultants must adjust to shifts in market trends.</U></EM></P><P>The IT market is going through changes like <STRONG><U>"Artificial Intelligence"</U></STRONG> and higher demand for <STRONG><U>“SAP Full-Stack developers”</U></STRONG>, among others. 
In this blog, we will go through SAP full-stack development.</P><P><EM>(The story about cockroaches was inspired by the book "11 Rules for Life" by Chetan Bhagat.)</EM></P><P><STRONG><U>II) Agenda:</U></STRONG></P><OL><LI>Getting to Know SAP Full-Stack: Understanding the Full Stack</LI><LI>Background Insights: Delving into the Story Behind</LI><LI>System Design Visualization: Seeing the Technical Architecture</LI><LI>Practical Implementation: Learning the Practical Details Step-by-Step</LI><LI>Wrapping Up: Summarizing Key Points and Looking Ahead</LI><LI>Further Reading: Offering Links to More Research Materials</LI></OL><P><STRONG><U>III) Getting to Know SAP Full-Stack: Understanding SAP Full-Stack Development</U></STRONG></P><P>Many of our consultants believe that full-stack development involves only front-end and back-end development. However, when it comes to SAP development, there's a third component: middleware.</P><P>In today's market, relying on just one system isn't enough. Data comes from various sources, and we need to consolidate it to generate reports. So, if you're interested in becoming an SAP full-stack developer, this blog will help you understand the basics of all three elements, front-end, middleware, and back-end, and explain them practically.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="karthikarjun_7-1715273481121.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/108136i9CE213319E0DD8FE/image-size/large?v=v2&amp;px=999" role="button" title="karthikarjun_7-1715273481121.png" alt="karthikarjun_7-1715273481121.png" /></span></P><P style=" text-align: center; ">Fig1: Basic diagram for Full-Stack development</P><P><STRONG><U>IV) Background Insights: Delving into the Story Behind</U></STRONG></P><P>Sundhara Pandian is a key part of the Kotravai Coffee Group in Queenstown, New Zealand. He makes really good coffee. His trick? 
He gets top-notch coffee beans by following a careful process.</P><P>Instead of just buying beans whenever he wants, Sundhara Pandian sends a request called a Purchase Order (PO) to a big coffee supplier in Bremen, Germany. But it's not as simple as filling out a form: the PO has to go through a smart system with AI and automation, which checks the beans in the supplier's stock and confirms the order.</P><P>Sundhara Pandian's job doesn't stop there. The results from the system are recorded in SAP S/4HANA Cloud, which helps keep track of orders and how much coffee is left.</P><P>Basically, Sundhara Pandian does more than just make coffee. He's good at handling complicated systems to make sure Queenstown always has enough beans for coffee lovers.</P><P><EM>Let's dive into SAP full-stack development through Sundhara Pandian's story as we build the SAP RAP app for the PO cockpit, the iFlows, and the AI/ML model.</EM></P><P><STRONG><U>V) Technical Architecture Diagram: Visualizing the System Design</U></STRONG></P><P>&nbsp;</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="karthikarjun_8-1715273481135.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/108138iCC4BA4B4D3B68E02/image-size/large?v=v2&amp;px=999" role="button" title="karthikarjun_8-1715273481135.png" alt="karthikarjun_8-1715273481135.png" /></span></P><P style=" text-align: center; ">Fig2: Architecture diagram for E2E – Full Stack development with SAP applications</P><P><STRONG><U>VI) Technical Implementation: Step-by-Step Integration Details</U></STRONG></P><P><STRONG><EM><U>AIM: (Custom Cockpit and Integrated Goods Receipts):</U></EM></STRONG></P><OL><LI><EM>We're making a simple app for custom Purchase Orders (PO) with basic fields.</EM></LI><LI><EM>We obtain the GR information electronically, then use a machine learning algorithm to automatically update the GR within our application system (GR 
automation).</EM></LI><LI><EM>We fetch details about Goods Receipts (GR) from the GR automation system and show them in the custom PO form.</EM></LI></OL><P><STRONG><EM><U>Target Audience:</U></EM></STRONG></P><OL><LI><EM>Individuals with 0 to 4 years of experience in SAP.</EM></LI><LI><EM>Enthusiastic learners eager to explore new concepts and expand their knowledge base.</EM></LI></OL><P><STRONG><EM><U>Before proceeding, please ensure the following prerequisites are met:</U></EM></STRONG></P><OL><LI><EM>Familiarity with basic concepts of SAP S/4HANA Cloud and SAP RAP.</EM></LI><LI><EM>Activation of ADT in Eclipse, Postman, and the Integration Suite API plan in your SAP BTP entitlement.</EM></LI><LI><EM>Understanding of fundamental AI and automation concepts.</EM></LI><LI><EM>Knowledge of Python, JavaScript, the system landscape, and UI.</EM></LI></OL><P><U>Step 1</U>: Install the ABAP Development Tools (ADT) package in Eclipse. You can use the link below to download it.</P><P>ADT Link: <A href="https://tools.hana.ondemand.com/#abap" target="_blank" rel="noopener nofollow noreferrer">https://tools.hana.ondemand.com/#abap</A></P><P><U>Step 2:</U></P><P>Step two is divided into two sections, each detailed below:</P><OL><LI><EM>Section-A: Connect to the S/4 HANA Cloud system.</EM></LI><LI><EM>Section-B: Create the model with SAP RAP.</EM></LI></OL><P><EM>Go to Help &gt; About Eclipse IDE and double-check the highlighted icon below on your system.</EM></P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="karthikarjun_9-1715273481145.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/108137i483B7FC72242E1C7/image-size/large?v=v2&amp;px=999" role="button" title="karthikarjun_9-1715273481145.png" alt="karthikarjun_9-1715273481145.png" /></span></P><P><STRONG><EM><U>Section-A: Connect to the S/4 HANA Cloud system</U></EM></STRONG></P><P>To connect to the S/4 HANA Cloud system, 
follow the steps below.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="karthikarjun_10-1715273481153.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/108140i716A680E6765D0DC/image-size/large?v=v2&amp;px=999" role="button" title="karthikarjun_10-1715273481153.png" alt="karthikarjun_10-1715273481153.png" /></span></P><P>Enter the S/4 HANA Cloud web address: https://&lt;Host&gt;.s4hana.cloud.sap. Remember to select client 80; this client is used for making changes to the system.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="karthikarjun_11-1715273481157.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/108141iF7626E6C217FCB12/image-size/large?v=v2&amp;px=999" role="button" title="karthikarjun_11-1715273481157.png" alt="karthikarjun_11-1715273481157.png" /></span></P><P>Click the "Copy Logon URL to Clipboard" button. Then open your web browser, paste the URL, enter your login details, and press Enter. After a successful login, you'll see the screen below.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="karthikarjun_12-1715273481162.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/108139i86458454809E8CF3/image-size/large?v=v2&amp;px=999" role="button" title="karthikarjun_12-1715273481162.png" alt="karthikarjun_12-1715273481162.png" /></span></P><P>Click "Finish" and create your own package name. 
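As an optional aside before going further in Eclipse: later parts of this series call the tenant's OData APIs from middleware, so it can help to sanity-check that the same S/4HANA Cloud host is reachable over HTTP outside ADT. Below is a minimal Python sketch, not part of the original steps; the host "my000000", the communication user, and the password are placeholder assumptions you must replace with your own tenant values, and it assumes the standard Purchase Order OData V2 service (API_PURCHASEORDER_PROCESS_SRV) has been exposed via a communication arrangement.

```python
# Hypothetical connectivity check for an S/4HANA Cloud tenant.
# Host, user, and password are placeholders -- substitute your own values.
from urllib.parse import urljoin

def service_url(host: str) -> str:
    """Build the OData service-document URL for the standard
    Purchase Order API on the given tenant host."""
    base = f"https://{host}.s4hana.cloud.sap"
    return urljoin(base, "/sap/opu/odata/sap/API_PURCHASEORDER_PROCESS_SRV/")

def fetch_service_document(host: str, user: str, password: str):
    """GET the service document with basic auth. An HTTP 200 means the
    tenant is reachable and the communication user is authorized."""
    import requests  # third-party: pip install requests
    return requests.get(service_url(host), auth=(user, password), timeout=30)

# Example usage (with real tenant values):
#   resp = fetch_service_document("my000000", "COMM_USER", "secret")
#   A 401/403 here usually points at the communication arrangement,
#   not at your code.
```

A 200 response confirms the landscape is wired up end to end before any RAP or Integration Suite work begins.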
I've used "ZDEMO" here.</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="karthikarjun_13-1715273481167.png" style="width: 999px;"><img src="https://community.sap.com/t5/image/serverpage/image-id/108142i37579E5BD041F39B/image-size/large?v=v2&amp;px=999" role="button" title="karthikarjun_13-1715273481167.png" alt="karthikarjun_13-1715273481167.png" /></span></P><P><STRONG><U>Part 1: Key Takeaways:</U></STRONG></P><OL><LI>Gaining foundational knowledge of systems and landscapes through story-based learning.</LI><LI>Grasping the fundamentals of full-stack development with diagrams.</LI><LI>Setting up Eclipse for SAP ADT and establishing a connection to the S/4 HANA Public Cloud system.</LI></OL><P><EM><U><STRONG>Part 2</STRONG></U>: Coming soon - keep an eye out for an exciting story-based learning journey as we delve into creating SAP RAP applications.</EM></P><P><EM><U><STRONG>Part 3:</STRONG></U> Coming soon - explore AI and ML as we build our model and create a system with a dashboard for GR automation.&nbsp;</EM></P><P><EM><U><STRONG>Part 4:</STRONG></U> Coming soon - learn how to connect all the pieces using Integration Suite.</EM></P><P><EM>Author:&nbsp;<SPAN>If you find this information helpful, please consider clicking the "Like" button on this blog and sharing your thoughts in the comments section below. You can also connect with the author on their LinkedIn profile:&nbsp;</SPAN><A title="Author's LinkedIn Profile" href="https://www.linkedin.com/in/karthik-arjun-a5b4a258/" target="_self" rel="nofollow noopener noreferrer">[Author's LinkedIn Profile]</A></EM></P> 2024-05-09T19:14:30.935000+02:00