New research suggests that watermarking tools aimed at blocking AI image editing could backfire. Instead of stopping models such as Stable Diffusion from altering an image, some protections actually help the AI follow the editing prompt more closely, making unwanted manipulations even easier.
There is a notable and persistent strand in the computer vision literature dedicated to protecting copyrighted images, both from being used to train AI models and from being used directly in image-to-image AI processes. Systems of this kind generally target Latent Diffusion Models (LDMs) such as Stable Diffusion and Flux, which use noise-based procedures to encode and decode images.

By inserting adversarial noise into an otherwise normal-looking image, it is possible to cause image detectors to misread the image's content, and to hobble image-generation systems attempting to leverage copyrighted data.
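Below is a minimal, illustrative sketch of the kind of encoder-space perturbation such systems apply, loosely in the spirit of PhotoGuard-style 'immunization'. It assumes the torch and diffusers packages; the model identifier, loss, and hyperparameters are assumptions for illustration, not the exact procedure of any of the tools discussed here.

```python
import torch
from diffusers import AutoencoderKL

# VAE used by Stable Diffusion v1.5 (assumed model identifier).
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
).eval()

def perturb(image, target_latent, eps=8/255, step=1/255, iters=40):
    """PGD-style attack: nudge `image` (a 1x3xHxW tensor in [0,1]) so that the VAE
    encodes it close to `target_latent`, staying within an L-inf budget of eps."""
    adv = image.clone()
    for _ in range(iters):
        adv.requires_grad_(True)
        latent = vae.encode(adv * 2 - 1).latent_dist.mean   # encode to latent space
        loss = torch.nn.functional.mse_loss(latent, target_latent)
        grad, = torch.autograd.grad(loss, adv)
        adv = adv.detach() - step * grad.sign()             # step toward the target latent
        adv = image + (adv - image).clamp(-eps, eps)        # project back into the eps ball
        adv = adv.clamp(0, 1)                               # keep a valid image
    return adv
```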
From the MIT paper 'Raising the Cost of Malicious AI-Powered Image Editing', an example of a source image (bottom row) protected against editing operations. Source: https://arxiv.org/pdf/2302.06588
Since the artists' backlash of 2023 against Stable Diffusion's liberal use of web-scraped images (including copyrighted material), the research scene has produced multiple variations on the same theme.
In all cases, there is a direct correlation between the intensity of the imposed perturbation, the degree to which the image is subsequently protected, and the degree to which the image no longer looks as good as it should.
The quality reproducible in a research PDF does not fully illustrate the problem, but greater amounts of adversarial perturbation are exacted in exchange for greater security. Here we see the range of quality disruption in the 2020 'Fawkes' project led by the University of Chicago. Source: https://arxiv.org/pdf/2002.08327
Of particular interest to artists seeking to protect their styles against unauthorized appropriation is the capacity of such systems not only to obfuscate identity and other information, but to 'convince' the AI training process that it is seeing something other than what it is really seeing, so that no connection forms between the semantic and visual domains for 'protected' training data (i.e., for a prompt such as 'In the style of Paul Klee').
Mist and Glaze are two popular injection methods that attempt to prevent copyrighted styles from being used in AI workflows and training routines. Source: https://arxiv.org/pdf/2506.04394
The New Research
Now, a new study from the US has found that not only can such perturbations fail to protect an image, but that adding them can actually improve the image's exploitability in the very AI processes they were intended to immunize it against.
The paper states:
‘Experiments using a variety of perturbation-based image protection methods across multiple domains (natural scene images and artwork) and editing tasks (image generation and style editing) reveal that such protection does not achieve this goal completely.
'In most scenarios, diffusion-based editing of protected images produces a desirable output image that adheres precisely to the guidance prompt.

'Our findings suggest that adding noise to images may paradoxically increase their association with given text prompts during the generation process, leading to the unintended consequence of better editing results.

'We therefore argue that perturbation-based methods may not provide a sufficient solution for robust image protection against diffusion-based editing.'
In tests, protected images were exposed to two familiar AI editing scenarios: simple image-to-image generation, and style transfer. These processes reflect the common ways in which AI models can exploit protected content, either by directly modifying an image or by borrowing its stylistic properties for use elsewhere.

Protected images, drawn from standard sources of photographs and artwork, were passed through these pipelines to see whether the added perturbations could block or degrade the edits.

Instead, the presence of protection often sharpened the model's alignment with the prompts, producing clean, accurate outputs where failures had been expected.

The authors advise that, in effect, this highly popular method of protection may be providing a false sense of security, and that any such perturbation-based immunization approach should be thoroughly tested against the authors' own methods.
Method
The authors conducted experiments using three protective methods, all of which apply carefully crafted adversarial perturbations: PhotoGuard; Mist; and Glaze.
Glaze, one of the frameworks tested by the authors, shown here protecting three artists. The first two columns show the original artwork; the third shows the results of mimicry without protection; the fourth, the style-transferred version used to optimize the cloak, together with the target style name. The fifth and sixth columns show the results of mimicry with cloaking applied at perturbation levels p = 0.05 and p = 0.1. All results use a Stable Diffusion model. Source: https://arxiv.org/pdf/2302.04222
PhotoGuard was applied to natural scene images, while Mist and Glaze were applied to artwork (i.e., 'artistically styled' domains).

The tests covered both natural and artistic images to reflect plausible real-world uses. The effectiveness of each method was assessed by checking whether the AI model could produce realistic, prompt-consistent edits when working with protected images; if the resulting image was convincing and matched the prompt, the protection was judged to have failed.
Stable Diffusion v1.5 served as the pre-trained image generator for the researchers' editing tasks. Five seeds were selected to ensure reproducibility: 9222, 999, 123, 66, and 42. All other generation settings, including guidance scale, strength, and total steps, followed the default values used in the PhotoGuard experiments.
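As a rough, hypothetical reconstruction of this editing setup (not the authors' actual code), the following sketch runs Stable Diffusion v1.5 img2img over the five seeds using the diffusers library, leaving strength, guidance scale and step count at their defaults; the file name and prompt are placeholders.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

seeds = [9222, 999, 123, 66, 42]                              # the five seeds reported
source = Image.open("protected_image.png").convert("RGB")     # placeholder file name

edited = []
for seed in seeds:
    generator = torch.Generator("cuda").manual_seed(seed)     # fixed seed for reproducibility
    result = pipe(
        prompt="A boy in a blue shirt going to a brick house",  # placeholder editing prompt
        image=source,
        generator=generator,   # strength, guidance scale and steps left at library defaults
    ).images[0]
    edited.append(result)
```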
PhotoGuard was tested on natural scene images using the Flickr8k dataset.
Close and Distant Prompts
With the help of Claude Sonnet 3.5, two sets of modified captions were created from the first caption of each image: one set containing prompts close in context to the original caption, and another containing prompts that were contextually distant.

For example, from the original caption 'A young girl in a pink dress entering a wooden cabin', the close prompt was 'A boy in a blue shirt going to a brick house', while a distant prompt was 'Two cats relaxing on the sofa'.

Close prompts were constructed by replacing nouns and adjectives with semantically similar terms; distant prompts were generated by instructing the model to create a contextually very different caption.

All generated captions were manually checked for quality and semantic relevance, and Google's Universal Sentence Encoder was used to calculate semantic similarity scores between the original and modified captions.
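As a sketch of that similarity check (implementation details are assumptions, since the authors' code is not reproduced here), the Universal Sentence Encoder can be loaded from TensorFlow Hub and the cosine similarity between caption embeddings computed directly:

```python
import numpy as np
import tensorflow_hub as hub

# Google's Universal Sentence Encoder, as named in the article.
use = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

original = "A young girl in a pink dress entering a wooden cabin"
close    = "A boy in a blue shirt going to a brick house"
distant  = "Two cats relaxing on the sofa"

emb = use([original, close, distant]).numpy()
emb /= np.linalg.norm(emb, axis=1, keepdims=True)        # unit-normalise each embedding

print("close similarity:  ", float(emb[0] @ emb[1]))     # expected to be comparatively high
print("distant similarity:", float(emb[0] @ emb[2]))     # expected to be much lower
```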
From the supplementary material, the distribution of semantic similarity for the modified captions used in the Flickr8k tests. The graph on the left shows similarity scores for closely modified captions, averaging around 0.6; the graph on the right shows the extensively modified captions, averaging around 0.1, reflecting their greater semantic distance from the original captions. Values were calculated with Google's Universal Sentence Encoder. Source: https://sigport.org/sites/default/files/docs/incupleteprotection_sm_0.pdf
Each image, in both its protected and unprotected versions, was edited using the close and the distant prompts. The quality of the edited images was assessed with the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE).
Results of image generation from natural photos protected by PhotoGuard. Despite the presence of perturbations, Stable Diffusion v1.5 followed both small and large semantic changes in the editing prompts, producing realistic outputs that matched the new instructions.
The generated images scored 17.88 on BRISQUE, with 17.82 for close prompts and 17.94 for distant prompts, while the original images scored 22.27, indicating that the edited images remained close in quality to the originals.
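BRISQUE is a no-reference metric, so it can be computed on the edited output alone. A minimal sketch, assuming the third-party piq package (the paper does not say which BRISQUE implementation was used):

```python
import torch
import piq
from PIL import Image
from torchvision.transforms.functional import to_tensor

def brisque_score(path: str) -> float:
    """Return the BRISQUE score of an image; lower generally means better perceived quality."""
    img = to_tensor(Image.open(path).convert("RGB")).unsqueeze(0)  # 1x3xHxW tensor in [0, 1]
    return float(piq.brisque(img, data_range=1.0))

print(brisque_score("edited_from_protected.png"))   # placeholder file name
```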
Metrics
To determine how well the protections interfered with AI editing, the researchers used scoring systems that compare an image's content with a text prompt, measuring how closely the final image matches the given instructions.

To this end, the CLIP-S metric uses a model that can interpret both images and text, while PAC-S++ adds extra AI-generated samples to bring its comparisons closer to human judgement.

These Image-Text Alignment (ITA) scores indicate how faithfully the AI followed the instructions when modifying a protected image: if a protected image still leads to a highly aligned output, the protection is considered to have failed to block the edit.
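A minimal sketch of a CLIP-based alignment score of this kind, along the lines of CLIP-S (PAC-S++ follows a similar recipe with additional training data); the rescaling constant and model checkpoint here are assumptions, not the authors' exact configuration:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_alignment(image_path: str, prompt: str, w: float = 2.5) -> float:
    """CLIPScore-style image-text alignment: rescaled, clipped cosine similarity."""
    inputs = proc(text=[prompt], images=Image.open(image_path),
                  return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float(w * torch.clamp((img * txt).sum(), min=0))
```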
The impact of protection on the Flickr8k dataset across five seeds, with image-text alignment measured using CLIP-S and PAC-S++ scores.
The researchers compared how well the AI performed when editing protected versus unprotected images. They first recorded the difference between the two, termed the actual change, and then scaled this difference into a change rate, so that results could be compared more easily across many tests.

This process revealed whether the protection made it harder or easier for the AI to match the prompt. The tests were repeated five times with different random seeds, covering both small and large changes to the original caption.
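The arithmetic here is simple; a sketch, with the exact formula inferred from the description rather than quoted from the paper:

```python
def change_metrics(score_protected: float, score_unprotected: float) -> tuple[float, float]:
    """Actual change in alignment, and that change expressed as a rate (% of the baseline).
    A positive value means the protected image followed the prompt *more* closely."""
    actual_change = score_protected - score_unprotected
    change_rate = 100.0 * actual_change / score_unprotected
    return actual_change, change_rate

# Example with made-up CLIP-S values:
print(change_metrics(0.78, 0.74))   # roughly (0.04, 5.4)
```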
Art Attack
The natural-photography style tests used the Flickr1024 dataset, which contains over one thousand high-quality images. Each image was edited with a prompt following the pattern 'Change the style to [V]', where [V] represents one of seven famous art styles: Cubism; Post-Impressionism; Impressionism; Surrealism; Baroque; Fauvism; and Renaissance.

This process involved applying PhotoGuard to the original images to produce protected versions, and then running both the unprotected and protected images through the same set of style-transfer edits.
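An illustrative harness for that loop; `protect` and `edit` are hypothetical stand-ins for the PhotoGuard perturbation and the img2img call sketched earlier:

```python
styles = ["Cubism", "Post-Impressionism", "Impressionism", "Surrealism",
          "Baroque", "Fauvism", "Renaissance"]
prompts = [f"Change the style to {style}" for style in styles]

def run_style_tests(images, protect, edit):
    """Apply every style prompt to both the plain and the protected version of each image."""
    results = []
    for img in images:
        protected = protect(img)                        # e.g. a PhotoGuard-style perturbation
        for prompt in prompts:
            results.append({
                "prompt": prompt,
                "plain": edit(img, prompt),             # edit the unprotected image
                "protected": edit(protected, prompt),   # edit the protected image
            })
    return results
```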
The original and protected versions of natural scene images, edited to apply the Cubism, Surrealism and Fauvism styles, respectively.
To test how well artworks were protected, style transfer was performed on images from the WikiArt dataset. The editing prompts followed the same format as before, instructing the AI to change each image to a randomly selected, unrelated art style.

Both the Glaze and Mist protection methods were applied to the images prior to editing, allowing the researchers to observe how far each defense blocked or distorted the outcome of the style transfer.
An example of how the protection methods affect style transfer for artwork. The original Baroque image is shown alongside versions protected by Mist and Glaze. After a Cubism-style transfer is applied, the differences in how each protection affects the final output can be seen.
The researchers also quantified these comparisons:
Changes in image-text alignment scores after style-transfer editing.
Of these results, the authors comment:
'These results highlight a major limitation of adversarial perturbations for protection. Instead of blocking alignment, adversarial perturbations often enhance the responsiveness of the generative model to prompts, allowing exploiters to produce outputs that align more closely with their targets. Such protections do not disrupt the image editing process, and may not prevent malicious agents from copying unauthorized material.

'The unintended consequences of using adversarial perturbations reveal vulnerabilities in existing methods and highlight the urgent need for more effective protection techniques.'
The authors explain that these unexpected results can be traced to the workings of diffusion models: LDMs edit an image by first converting it into a compressed representation called a latent, after which noise is added to this latent over many steps, until the data is nearly random.

The model reverses this process during generation, removing the noise in stages. At each stage of this reversal, the text prompt helps guide how the noise is cleaned away, gradually shaping the image to match the prompt.
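The forward half of that process can be sketched in a few lines with a diffusers scheduler; the tensor shapes and timestep are illustrative, not taken from the paper:

```python
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)

latent = torch.randn(1, 4, 64, 64)   # stands in for the VAE-encoded image (the "latent")
noise = torch.randn_like(latent)
t = torch.tensor([600])              # a fairly late timestep: much of the structure is gone

noisy_latent = scheduler.add_noise(latent, noise, t)
# During generation the model runs this in reverse, predicting and removing noise step by
# step; the text prompt conditions each prediction, so the noisier the starting point,
# the more the prompt dictates what gets reconstructed.
```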
Intermediate latent states, converted back into images for visualization, comparing generations from an unprotected image with those from a PhotoGuard-protected image.
Protection methods add a small amount of extra noise to the original image before it enters this process. These perturbations are minor at first, but they accumulate as the model applies its own layers of noise.

This accumulation leaves more of the image 'uncertain' when the model begins to remove the noise. With greater uncertainty, the model leans more heavily on the text prompt to fill in the missing details, giving the prompt even more influence than it would normally have.

In effect, the protection makes it easier, not harder, for the AI to reshape the image around the prompt.
Finally, the authors performed tests in which the perturbations created by the 'Raising the Cost of Malicious AI-Powered Image Editing' (PhotoGuard) method were replaced with pure Gaussian noise.

The results followed the same pattern as before: in all tests, the change-rate values remained positive, meaning that even this random, unstructured noise led to stronger alignment between the generated images and the prompts.
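A sketch of that control condition, with the noise level chosen arbitrarily for illustration rather than matched to the paper's settings:

```python
import torch

def gaussian_protect(image: torch.Tensor, sigma: float = 0.03) -> torch.Tensor:
    """Simulate 'protection' with plain, unstructured Gaussian noise on an image in [0, 1]."""
    return (image + sigma * torch.randn_like(image)).clamp(0.0, 1.0)
```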
The effect of simulated protection using Gaussian noise on the Flickr8k dataset.
This supports the underlying explanation that added noise, whatever its design, increases uncertainty for the generative model, allowing the text prompt to exert more control over the final image.
Conclusion
The research scene has been throwing adversarial perturbation at the problem of LDM copyright infringement for about as long as LDMs have existed, yet no resilient solution has emerged from the extraordinary number of papers published along these lines.

Either the imposed perturbations unacceptably degrade image quality, or their patterns prove not to be resilient to manipulation and transformation processes.

It is, however, a hard dream to abandon, since the alternatives appear to be third-party monitoring and provenance frameworks such as the Adobe-led C2PA scheme, which seeks to maintain a chain of custody for images from the camera sensor onwards.

In any case, if adversarial perturbation actually makes the problem worse, as the new paper indicates may often be true, the search for copyright protection through such means may come to be filed under 'alchemy'.
First published on Monday, June 9, 2025