
“It was a mistake.” A million posts from Bluesky were stolen to create training data for AI

The team behind Bluesky, a platform that has been rapidly gaining popularity in recent months, promises not to use user data to train AI. However, nothing stops third parties from collecting that data themselves.

This week, one million public posts from Bluesky, along with user identification information, were scraped and uploaded to Hugging Face. The dataset was created by machine learning expert Daniel van Strien and was intended for use in language modeling and natural language processing, as well as for general analysis of social media trends, content moderation, and posting patterns. It contained users' decentralized identifiers (DIDs) and even included a feature for searching content from specific users, 404 Media reported.

According to the dataset description, the posts were collected from Bluesky Social’s Firehose API. Bluesky users did not consent to this use of their data, but the platform does not prohibit it.

Shortly after this dataset became public, it was removed from Hugging Face.

“I have removed Bluesky data from the repository. While I wanted to support the development of tools for the platform, I recognize that this approach violates the principles of transparency and consent for data collection. I apologize for this mistake,” van Strien wrote in a post on Bluesky.

This could be a wake-up call for users of the platform, which has been rapidly gaining popularity in recent weeks. Although the platform’s owners have promised not to use user data to train AI, they have yet to create tools that would prevent third parties from doing so without users’ consent.

Natasha Kumar

Natasha Kumar has been a reporter on the news desk since 2018. Before that she wrote about young adolescence and family dynamics for Styles and was the legal affairs correspondent for the Metro desk. Before joining The Times Hub, she worked as a staff writer at the Village Voice and as a freelancer for Newsday, The Wall Street Journal, GQ and Mirabella. To get in touch, contact her at natasha@thetimeshub.in or 1-800-268-7116.
