California Now Requires AI Companies to Reveal If They Trained on Your Photos
Quick take: California's Generative AI Training Data Transparency Act (AB 2013) is now in effect. Every AI company serving Californians must publicly disclose what data they used to train their models - including whether they scraped your photos. For the first time, you have a legal right to know what went into an AI training dataset, and whether the platforms where you post were among its sources. Here's what the law requires, what companies have disclosed so far, and what it means for your photo privacy.

What the law actually requires
AB 2013, California's Generative AI Training Data Transparency Act, took effect on January 1, 2026. It applies to every generative AI system that's publicly available to Californians and was released or substantially modified since January 2022. That covers basically every major AI model you've heard of.
The law requires developers to publish a 'high-level summary' of the datasets used to train their AI systems. This includes 12 categories of information: the sources of the data, whether it contains copyrighted material, whether personal information was included, and the types of data in each dataset.
In practical terms, if an AI company scraped publicly available photos from Flickr, Reddit, or Instagram to train an image generation model, they now have to say so. If they licensed a dataset containing millions of personal photos, they have to disclose that too.
What companies have revealed so far
The early disclosures have been illuminating. Several major AI developers have published their training data summaries, and the details confirm what privacy researchers suspected for years - personal photos from social media platforms, stock photo sites, and web scrapes are a fundamental ingredient in modern AI models.
Not every company has been forthcoming. Elon Musk's xAI has already filed a legal challenge against the law, arguing that requiring disclosure of training datasets threatens trade secrets. The case signals that some companies would rather fight the law in court than reveal what data they used.
That resistance is telling. If the training data was properly licensed and contained no personal information, there'd be little reason to fight the disclosure requirement. The pushback suggests that some training datasets contain exactly the kind of personal data that users would object to.

Were your photos used to train AI?
If you've ever posted a photo publicly on social media, the honest answer is: probably. Research has repeatedly shown that large-scale web scraping datasets like LAION-5B - which was used to train Stable Diffusion and other models - contain millions of personal photos scraped from social platforms without consent.
California's law doesn't give you an individual right to check whether a specific photo of yours was included. But the disclosures will tell you whether the source platforms you use (Instagram, Flickr, Reddit, DeviantArt) were scraped for training data. If you posted publicly on those platforms, your photos were likely included.
Photos shared privately through messaging apps or dedicated private sharing platforms were generally not included in public scraping datasets. The distinction between public and private sharing has never mattered more.
How this compares to EU regulations
The EU AI Act, which began enforcement in 2025, takes a broader approach. It classifies AI systems by risk level and imposes requirements ranging from transparency obligations to outright bans on certain uses. The EU also grants individuals stronger rights regarding their personal data through GDPR.
California's AB 2013 is narrower - it focuses specifically on training data transparency rather than regulating how AI systems are used. But it's the first US law to require AI companies to tell you what's in their training data. That's a significant step for a country that still doesn't have a comprehensive federal privacy law.
There's a catch, though. President Trump's executive order from December 2025 proposes a federal AI policy framework that could preempt state laws like AB 2013. The tension between state-level privacy protections and federal deregulation is already playing out, and your photo privacy rights may depend on which side wins.
What this means for how you share photos
The California law makes one thing crystal clear: public photos are fair game. If your photos are publicly accessible on any platform, they can be scraped, and now at least the companies that used them have to admit it.
Private photo sharing changes the equation entirely. Photos shared through private links, password-protected albums, or end-to-end encrypted services aren't accessible to web scrapers. They can't end up in training datasets because they were never publicly indexed in the first place.
How to protect your photos going forward
- Check company disclosures. Major AI companies are now required to publish training data summaries on their websites. Look up the platforms you use to see if they were listed as data sources.
- Audit your public posts. Any photo that's publicly accessible on social media could have been scraped for AI training. Consider making old posts private or deleting photos you no longer want public.
- Share privately by default. When sharing photos with family and friends, use private links rather than public posts. This keeps your photos out of web scraping datasets entirely.
- Choose platforms that don't train on your data. Viallo stores photos on European infrastructure, strips metadata from shared images, and never uses your photos for AI training. Recipients view albums through private links without needing an account.
- Use metadata stripping. EXIF data embedded in your photos contains GPS coordinates, device information, and timestamps. Stripping this data before sharing, as sketched below this list, prevents it from being harvested alongside your images.
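
If you want to strip metadata yourself before uploading, a few lines of Python with the Pillow library are enough. This is a minimal sketch rather than a recommendation from the law or from any specific platform; the file names are placeholders.

```python
# Minimal EXIF-stripping sketch using Pillow (pip install Pillow).
# File names below are placeholders.
from PIL import Image

def strip_exif(src_path: str, dst_path: str) -> None:
    """Re-save an image with pixel data only, dropping EXIF and other metadata."""
    with Image.open(src_path) as img:
        # Copy just the pixels into a fresh image; GPS, device, and timestamp
        # tags live in metadata blocks that are not carried over.
        clean = Image.new(img.mode, img.size)
        clean.putdata(list(img.getdata()))
        clean.save(dst_path)

strip_exif("vacation.jpg", "vacation_clean.jpg")
```

Command-line tools can do the same job without any code - for example, exiftool's `exiftool -all= photo.jpg` removes all metadata from a copy of the file.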

Frequently Asked Questions
Does the California law let me opt out of AI training?
No. AB 2013 requires disclosure of training data but doesn't create an opt-out right. You can't retroactively remove your photos from models that already trained on them. The law focuses on transparency - knowing what happened - rather than giving individuals control over their data after the fact.
Does this law apply outside California?
AB 2013 applies to AI systems available to Californians, which includes essentially every major AI platform. The disclosures are public, so anyone can read them. However, the enforcement mechanism is California-specific. Other states may pass similar laws.
Which AI companies have disclosed their training data?
Major AI developers including OpenAI, Google, and Meta have begun publishing training data summaries. Elon Musk's xAI has challenged the law in court rather than comply. Check each company's website for their AB 2013 disclosure page.
Can I check if a specific photo was used for training?
Not through this law. AB 2013 requires high-level dataset summaries, not individual image lookups. Tools like 'Have I Been Trained' (haveibeentrained.com) let you search some training datasets for specific images, but coverage is limited.
How do private photo sharing platforms protect against AI scraping?
Private platforms like Viallo serve photos through authenticated, private links that aren't indexed by search engines or accessible to web scrapers. Since AI training datasets are built primarily from public web scrapes, private sharing keeps your photos out of those datasets entirely.
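To make the mechanics concrete, here is a rough sketch of how a private-link photo server can keep albums out of search indexes and crawl-based datasets. It uses Flask purely for illustration; the route, token store, and headers are assumptions for the example, not Viallo's actual implementation.

```python
# Illustrative sketch only: serving an album behind an unguessable private link
# and telling crawlers not to index it. Not any specific platform's real code.
from flask import Flask, abort, make_response

app = Flask(__name__)

# Hypothetical token store; a real service would mint long random tokens
# (e.g. secrets.token_urlsafe) and keep them in a database.
ALBUM_TOKENS = {"a1b2c3d4e5f6g7h8": "family-vacation-2025"}

@app.route("/album/<token>")
def view_album(token: str):
    if token not in ALBUM_TOKENS:
        abort(404)  # the unguessable link acts as the access check
    resp = make_response(f"Album: {ALBUM_TOKENS[token]}")
    # Ask well-behaved crawlers not to index, follow, or archive the page.
    resp.headers["X-Robots-Tag"] = "noindex, nofollow, noarchive"
    resp.headers["Cache-Control"] = "private, no-store"
    return resp

if __name__ == "__main__":
    app.run()
```

The headers only deter well-behaved crawlers; the stronger protection is that the link is never published anywhere a scraper could find it in the first place.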