Einstein Trust Layer for Salesforce AI Specialists

This reference:

/content/learn/trails/drive-productivity-with-einstein-ai

/s/articleView?id=sf.generative_ai_trust_layer.htm&type=5

/meet-salesforces-trusted-ai-principles/

/einstein-ai-trail/

/content/learn/modules/large-language-models

preamble

With this year's DreamForce, salesforce is targeting AI for a stronger promotion, with two AI certificates free exams for salesforce practitioners. The next few blogs will focus on the knowledge of the Salesforce AI Specialist certificate exams, a better understanding of the functions of Einstein and how to better empower our businesses through Salesforce AI. Although I failed the exam two days ago and unfortunately missed only one question, I feel that some of the content is very interesting and may be used in the future, so I expect to start a few pieces of content and make blogs based on the main Topics of the AI Specialist. No question bank, no need to private chat/message question bank questions.

Einstein Trust Layer: 15% of exam
Generative AI in CRM Applications: 17% of exam
Prompt Builder: 37% of exam
Einstein Copilot: 23% of exam
Model Builder: 8% of exam

This post is about the Einstein Trust Layer, so let's take a look at how Salesforce is empowered by AI and how it addresses data privacy, data security, and other pain points.

Einstein Introduction to Generative AI Terminology

nomenclature	descriptive
Artificial intelligence (AI)	A branch of computer science in which computer systems use data to reason, perform tasks, and solve problems with human-like reasoning
Bias bias	Due to inaccurate assumptions in the machine learning process, systematic and reproducible errors in computer systems can produce unfair results in ways that are different from the intended function of the system.
Corpus text corpus	A large text dataset for training LLM.
Domain adaptation domain adaptation	The process of adding organization-specific knowledge to the Prompt and the base model.
Fine-tuning trimming	The process of adapting a pre-trained language model to a specific task by training on a smaller task-specific dataset.
Generative AI gateway Generative Artificial Intelligence Gateway	The gateway exposes standardized APIs to interact with the underlying models and services provided by different vendors in the internal and partner ecosystems
Generative Pre-Trained Transformer (GPT) Generative Pretraining Transformer (GPT)	A series of language models that are trained on large amounts of textual data in order to produce human-like text.
Grounding grounding	The process of adding domain-specific knowledge and customer information to the prompt provides the context needed for the model to respond more accurately.
Hallucination figment of one's imagination	However, given the context, the text output by the model is actually incorrect or almost meaningless
Large language model (LLM) Large Language Model (LLM)	A language model that consists of a neural network that has many parameters trained on a large amount of text.
Machine learning machine learning	A subfield of artificial intelligence that specializes in computer systems designed to learn, adapt, and improve based on feedback and inferences from data rather than explicit instructions.
Natural Language Processing (NLP) Natural Language Processing (NLP)	A branch of artificial intelligence that utilizes machine learning to understand human-written language. Large-scale language modeling is one of the many approaches to NLP.
Prompt	A natural language description of the task to be accomplished. LLM input.
Prompt chaining cue chain	Methods for breaking down complex tasks into several intermediate steps and then putting them back together so that AI generates more specific, customized, and better results.
Prompt design draw attention to sth.devise	Cue design is the process of creating cues that improve the quality and accuracy of model responses. Many models require specific cue structures, so it's important to test and iterate on them on the model you're using. Knowing which structure is best suited for the model allows you to optimize the cue for a given use case.
Prompt injection	A method of controlling or manipulating the output of a model by giving it certain cues. Through this method, users and third parties attempt to bypass restrictions and perform tasks that the model was not designed for
Prompt instructions prompt	Prompt commands are natural language commands entered into a prompt template. Your users simply send instructions to Einstein. Instructions have verb-noun structure and LLM tasks, such as "Write a description of no more than 500 characters". Your user instructions are added to the application's prompt template and the associated CRM data replaces the template's placeholders. The prompt template is now a grounded prompt and sent to LLM.
Prompt template Tip templates	Strings with placeholders that will be replaced with business data values to generate instructions to be sent to the LLM final text.
Retrieval-augmented generation (RAG) Retrieval Augmentation Generation (RAG)	A basic form that uses information retrieval systems such as knowledge bases to enrich relevant contextual cues for reasoning or training.
Semantic retrieval semantic retrieval	Scenarios that allow LLM to use similar and relevant historical business data that exists in the customer's CRM data.
Toxicity poisonous	Terms describing many types of discourse, including, but not limited to, offensive, unreasonable, disrespectful, offensive, harmful, abusive, or hateful language.
Trusted AI credible artificial intelligence (AI)	Salesforce created a guide focused on responsible AI development and implementation.

Einstein Trust Layer

Let's use the demo below to introduce today's topic. The gif below is an Email that selects a specific Prompt Template (more on that later) and then generates content based on gpt4. What is the process from selection to data generation back to salesforce?

1. Secure Data Retrieval & GroundingSecure data retrieval and grounding: In order for LLM to generate more relevant and personalized responses, it needs additional context from the CRM data. This process of adding additional context to the Prompt prompt is what we call Grounding. We can build Prompt prompts by merging fields with CRM data, which can be record fields, flows, Apex, Data Cloud DMOs, and related lists. in simple terms, grounding can be understood as the process of using the merge field/placeholder for additional context.

Secure data retrieval means that Prompt is based only on the data that the executing user has access to, and when we use grounding, such as record fields or related lists, with secure data retrieval we can ensure that it is based only on the data that the executing user has access to. The data retrieval process follows the existing access controls and permissions in Salesforce:

Data retrieval for grounding the prompt is based on the permissions of the user executing the prompt.
The data retrieval used to support prompts is based on the permissions of the user performing the prompt.
Data retrieval for grounding the prompt preserves in place all standard Salesforce role-based controls for user permissions and field-level security when grounding data from your CRM instance.
When data from a CRM instance is grounded, data retrieval for grounding prompts retains all standard Salesforce role-based user permissions and field-level security controls.

2. Data Masking for the LLM Data Masking for LLM: The Einstein Trust Layer identifies and masks personally identifiable information (PII) and payment card industry (PCI) data selected in the prompt and then sends it to the Large Language Model (LLM). Data masking prevents your sensitive data from being exposed to the LLM and stores your sensitive CRM data securely within Salesforce. Use schema and context to identify sensitive data. Then use placeholder text to mask the data to prevent it from being exposed to external models. The Einstein Trust Layer temporarily stores the relationship between the original entity and its respective placeholders. This relationship is later used to decrypt the data in the generated response. This step is associated with the subsequent Data Demasking. We can also use the audit trail to track data desensitization and view desensitized data. audit trail is stored in the data cloud. In addition to regular PII and PCI fields, salesforce also supports custom encryption/masking of some sensitive fields.

You need to enable Einstein Generative AI and Data Masking first, and then search for theThe Einstein Trust Layer can then be manipulated.

I don't have this feature in my dev environment, so I don't have the relevant screenshots. For more details, please refer to:/s/articleView?id=sf.generative_ai_mask_select.htm&type=5

3. Prompt Defense： To help decrease the likelihood of the LLM generating something unintended or harmful, Prompt Builder and Prompt Template Connect API use system policies. System policies are a set of instructions to the LLM for how to behave in a certain manner to build trust with users. For example, we can instruct the LLM to not address content or generate answers that it doesn’t have information about. System policies are one way to defend against jailbreaking and prompt injection attacks.

draw attention to sth.Defense: To help reduce the likelihood of LLM generating unexpected or harmful content, the Prompt Generator and Prompt Template Connection APIs use system policies. A system policy is a set of instructions for how LLM should behave in a certain way to establish trust with users. For example, we can instruct LLM not to process content or generate answers for which it has no information. System policies are a defense against jailbreak and hint injection attacks.

4. LLM Gateway & Zero Data Retention: The steps on the salesforce side have been prepared, nextResponse Generation Response Generation Session. After prompting for the completion of sensitive information protection, it can be sent through the LLM gateway. This gateway manages interactions with different model providers and represents a unified and secure way to communicate with multiple LLMs. The gateway and model providers use TLS encryption to secure data in transit.Salesforce has a zero-data retention policy in place with external partner model providers, such as OpenAI or Azure Open AI. This policy states that data sent from Salesforce to LLM is not retained, but rather deleted after the response is sent back to Salesforce.

5. Toxicity Detection Toxicity Detection: Generated responses are scanned for toxicity. The detection process includes a toxicity confidence score that reflects the probability of a response containing harmful or inappropriate content. The toxicity score and categories are stored in the data cloud. We can run the report in the data cloud by selecting theGenAIGatewayResponse with GenAIContentCategory report, set upfilter：Detector Type istoxicity。

6. Data Demasking Data decryption:The placeholders we created to mask data during prompting have now been replaced with actual data. The relationships between the original entities and their respective placeholders are used to rehydrate the response so that the response is useful and meaningful when sent back.

7. Feedback and Audit Feedback and Review: When a response appears in Salesforce, you can accept, modify, or reject it. You can also provide explicit feedback. Your explicit feedback on a response is captured and stored in the Data Cloud as part of the audit and feedback data (audit trail). Depending on the artificial intelligence capabilities, your implicit actions on responses can also be captured and stored in the Data Cloud.

Audit Trail also includes the original prompt, masked prompt, scores logged during toxicity detection, the original output from the LLM, and the demasked output.
The audit trail also includes raw cues, masked cues, scores recorded during toxicity testing, raw output from the LLM, and masked output.

Audit and feedback data are stored in your instance of Data Cloud. You have control over how long that data is stored in your instance of Data Cloud. Additionally, audit and feedback data are stored by Salesforce for 30 days for compliance purposes.
Audit and feedback data is stored in your Data Cloud instance. You can control how long the data is stored in your Data Cloud instance. In addition, Salesforce stores audit and feedback data for 30 days for compliance purposes.

Through the above 7 steps, a life cycle of AI-based content generation is completed.

Summary:This article mainly introduces Einstein Truest Layer how to protect data security in the process of AI execution and compliance processing, there are errors in this article welcome to point out, there is no understanding of welcome to leave a message.