Sure, porn-trained AI seems a core function.
Porn sites may have blown up Meta’s key defense in a copyright fight with book authors who earlier this year said that Meta torrented “at least 81.7 terabytes of data across multiple shadow libraries” to train its AI models.
Meta has defeated most of the authors’ claims, arguing there is no proof that it ever uploaded pirated data through seeding or leeching on the BitTorrent network it used to download training data. But the authors still have a chance to prove that Meta profited off its massive piracy, and a new lawsuit filed by adult sites last week appears to contain evidence that could help them win their fight, TorrentFreak reported.
The new lawsuit was filed last Friday in a US district court in California by Strike 3 Holdings—which says it attracts “over 25 million monthly visitors” to sites that serve as “ethical sources” for adult videos that “are famous for redefining adult content with Hollywood style and quality.”
After the authors revealed Meta’s torrenting, Strike 3 Holdings checked its proprietary BitTorrent-tracking tools, which are designed to detect infringement of its videos, and alleged that it found evidence that Meta has been torrenting and seeding its copyrighted content for years, since at least 2018. Some of the IP addresses were clearly registered to Meta, while others appeared to be “hidden,” and at least one was linked to a Meta employee, the filing said.
Pure speculation: possibly to identify sexual nudity and “inappropriate” content, as some kind of legitimate use case. What was actually done, I have no idea.
This feels most likely to me.
Meta doesn’t exactly want to taint their brand image with purely sexual content being generated by their base models, so it’s probably for content classification, and/or for fine-tuning their LLMs and other generative models in reverse - that is, fine-tuning them to not create content like what they’re being fed.
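To make that speculation concrete, here’s a minimal sketch in Python of what “fine-tuning in reverse” could look like: flagged content becomes the rejected side of a preference pair, so preference-style fine-tuning steers the model away from producing anything similar. This is purely illustrative; the `score_explicitness` classifier and `build_negative_pairs` helper are hypothetical placeholders, not anything known about Meta’s actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # response the model should prefer
    rejected: str  # response the model should be steered away from

def score_explicitness(text: str) -> float:
    """Hypothetical content classifier stand-in.
    A real pipeline would use a trained model, not a keyword heuristic."""
    flagged_terms = {"explicit", "nsfw"}  # placeholder heuristic
    words = text.lower().split()
    return sum(w in flagged_terms for w in words) / max(len(words), 1)

def build_negative_pairs(scraped_texts, refusal="I can't help with that."):
    """Turn flagged scraped text into DPO-style preference pairs where the
    explicit continuation is the *rejected* side, so fine-tuning pushes the
    model away from generating similar content."""
    pairs = []
    for text in scraped_texts:
        if score_explicitness(text) >= 0.5:
            pairs.append(PreferencePair(
                prompt="Continue the scene:",
                chosen=refusal,
                rejected=text,
            ))
    return pairs

if __name__ == "__main__":
    samples = ["an explicit nsfw scene", "a recipe for banana bread"]
    for pair in build_negative_pairs(samples):
        print(pair)
```

The same flagged/unflagged split would also cover the content-classification use case: instead of building preference pairs, the scores would simply gate what goes into (or gets filtered out of) a training set or a moderation queue.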