Can you tell whether a piece of text was written by a human or generated by AI? Being able to recognize AI-generated content is critical for improving information credibility, resolving attribution errors, and curbing misinformation.
Today, Google DeepMind and Hugging Face are excited to jointly announce that, as part of the Transformers v4.46.0 release, we are officially launching SynthID Text technology. This technology adds a watermark to generated text by means of a logits processor applied during generation, and detects that watermark with a classifier.
For details of the technical implementation, see the SynthID Text paper published in Nature, and see Google's Responsible Generative AI Toolkit to learn how to apply SynthID Text in your product.
How It Works
The core goal of SynthID Text is to embed a watermark into AI-generated text so that you can determine whether the text was produced by your large language model (LLM), without affecting how the model functions or the quality of its generations. Google DeepMind developed a watermarking technique that uses a pseudo-random function (a g-function) to augment the generation process of any LLM. This watermark is imperceptible to humans, but can be detected by a trained model. The technique is implemented as a generation utility that is compatible with any LLM, requires no modifications to the model, and can be used via the model.generate() API. A complete end-to-end example shows how to train a detector to recognize watermarked text. Full details can be found in the research paper.
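To make the g-function idea concrete, here is a deliberately simplified, self-contained sketch. This is not the actual SynthID Text algorithm (which uses Tournament sampling over real LLM logits); the hash-based g-function, the fake candidate sampling, and the integer "vocabulary" are all illustrative assumptions.

```python
import hashlib
import random

def g_value(keys, context, token):
    """Toy pseudo-random g-function: hashes the watermark keys together with
    the recent token context and a candidate token down to a bit in {0, 1}."""
    payload = ",".join(map(str, list(keys) + list(context) + [token])).encode()
    return hashlib.sha256(payload).digest()[0] & 1

def generate(keys, vocab, length, ngram_len=5, seed=0):
    """Toy sampler that prefers candidate tokens whose g-value is 1,
    thereby embedding a statistical watermark in the output."""
    rng = random.Random(seed)
    out = [rng.choice(vocab) for _ in range(ngram_len - 1)]  # seed context
    for _ in range(length):
        context = tuple(out[-(ngram_len - 1):])
        candidates = rng.sample(vocab, 4)  # stand-in for an LLM's top-k tokens
        marked = [t for t in candidates if g_value(keys, context, t) == 1]
        out.append(marked[0] if marked else candidates[0])
    return out

def mean_g(keys, tokens, ngram_len=5):
    """Detection statistic: fraction of tokens whose g-value is 1 given their context."""
    scores = [
        g_value(keys, tuple(tokens[i - ngram_len + 1:i]), tokens[i])
        for i in range(ngram_len - 1, len(tokens))
    ]
    return sum(scores) / len(scores)

keys = [654, 400, 836, 123, 340]
vocab = list(range(100))
watermarked = generate(keys, vocab, length=200)
rng = random.Random(1)
plain = [rng.choice(vocab) for _ in range(200)]
print(mean_g(keys, watermarked))  # well above 0.5: watermark detected
print(mean_g(keys, plain))        # near 0.5: no watermark signal
```

Unwatermarked text scores near 0.5 because the g-function behaves like a fair coin on it; the real detector makes this comparison statistically rigorous.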
Configuring Watermarks
The watermark is configured via a dataclass that parameterizes the g-function and defines how it is applied during sampling. Each model you use should have its own unique watermark configuration, and that configuration must be stored securely and privately, otherwise your watermark could be replicated by others.
In the watermark configuration, two key parameters must be defined:
- keys: a list of integers used to compute g-function scores across the model's vocabulary. Using 20 to 30 unique, randomly generated numbers is recommended to balance detectability against generation quality.
- ngram_len: balances robustness and detectability. The higher the value, the more reliably the watermark can be detected, but the more fragile it becomes to perturbations. A value of 5 is recommended, and the minimum is 2.
You can also adjust the configuration to suit your performance needs. More information is available in the SynthIDTextWatermarkingConfig class documentation. The research paper also analyzes how specific configuration values affect watermark performance.
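As an example of producing the keys list, a small helper (hypothetical, not part of Transformers) can draw a set of unique random integers in the recommended 20-30 range; the 16-bit value range here is an arbitrary choice for illustration:

```python
import secrets

def make_watermark_keys(n_keys=25, bits=16):
    """Draw n_keys unique random integers to use as watermark keys.
    25 falls in the recommended 20-30 range; the bit width is arbitrary."""
    keys = set()
    while len(keys) < n_keys:
        keys.add(secrets.randbelow(2 ** bits))
    return sorted(keys)

keys = make_watermark_keys()
print(len(keys))  # 25
```

Remember that, as noted above, the resulting keys must be stored securely and privately.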
Applying Watermarks
Applying a watermark when generating text is very simple. Define your configuration and pass a SynthIDTextWatermarkingConfig object as the watermarking_config= parameter to model.generate(); all generated text will then carry the watermark. Try the interactive example in the SynthID Text Space and see if you can tell that a watermark is present.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SynthIDTextWatermarkingConfig,
)

# Standard model and tokenizer initialization
tokenizer = AutoTokenizer.from_pretrained('repo/id')
model = AutoModelForCausalLM.from_pretrained('repo/id')

# SynthID Text configuration
watermarking_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, ...],
    ngram_len=5,
)

# Generation with watermarking
tokenized_prompts = tokenizer(["your prompts here"], return_tensors="pt")
output_sequences = model.generate(
    **tokenized_prompts,
    watermarking_config=watermarking_config,
    do_sample=True,
)
watermarked_text = tokenizer.batch_decode(output_sequences)
Detecting Watermarks
Watermarks are designed to be nearly imperceptible to humans, but can be detected by trained classifiers. Each watermark configuration requires a corresponding detector.
The basic steps for training a detector are as follows:
- Decide on a watermark configuration.
- Collect a detector training set containing both watermarked and unwatermarked text, split into training and test sets; at least 10,000 examples are recommended.
- Generate unwatermarked text with the model.
- Generate watermarked text with the model.
- Train the watermark detection classifier.
- Deploy the watermark configuration and its corresponding detector to production.
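The workflow above can be sketched end to end with synthetic data. The real detector shipped in Transformers is a Bayesian model over per-token g-values; the simple threshold rule and the synthetic "mean g-score" features below are stand-ins used only to illustrate the labeling, splitting, and evaluation steps.

```python
import random

rng = random.Random(0)

# Stand-in features: per-example mean g-scores. Real features would come from
# scoring model outputs against the watermark's g-function; these are synthetic.
watermarked = [min(1.0, rng.gauss(0.9, 0.05)) for _ in range(1000)]
plain = [rng.gauss(0.5, 0.05) for _ in range(1000)]

# Label the examples and split into train/test sets, as in the recipe above.
data = [(x, 1) for x in watermarked] + [(x, 0) for x in plain]
rng.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# Minimal "detector": a threshold at the midpoint of the two class means.
pos = [x for x, y in train if y == 1]
neg = [x for x, y in train if y == 0]
threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

# Evaluate on the held-out test set.
accuracy = sum((x > threshold) == (y == 1) for x, y in test) / len(test)
print(round(accuracy, 2))
```

With real model outputs, the features overlap far more than these well-separated synthetic ones, which is why a properly trained Bayesian classifier and a large training set are needed in practice.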
Transformers provides a Bayesian detector class along with an end-to-end example showing how to train a detector for a specific watermark configuration. Models that share the same tokenizer can also share a watermark configuration and detector, provided the detector's training set contains samples from all relevant models. The trained detector can be uploaded to a private Hugging Face Hub repository to make it available within your organization. Google's Responsible Generative AI Toolkit provides more guidance on putting SynthID Text into production.
Limitations
SynthID Text watermarks remain detectable under some text distortions, such as truncation, small word changes, or light paraphrasing, but the approach has limitations:
- The watermark is applied only weakly to factual responses, because there is little room to augment generation without reducing accuracy.
- Detector confidence can drop significantly when AI-generated text is thoroughly rewritten or translated into another language.
While SynthID Text cannot directly stop a determined adversary, it can make it harder to misuse AI-generated content, and it can be combined with other approaches to cover more content types and platforms.
Original: /blog/zh/synthid-text
Authors: Sumedh Ghaisas (guest), Sumanth Dathathri (guest), Ryan Mullins (guest), Joao Gante, Marc Sun, Raushan Turganbay
Translator: Luke, Hugging Face Fellow