Grass Network on Solana Open-Sources 600 Million Reddit Posts for AI Training
Jul 4, 2024, 03:51 AM
Grass Network, the data layer of AI on Solana, has open-sourced a dataset containing 600 million top Reddit posts and comments from 2024. The dataset, named UpvoteWeb-24-600M, includes media links and reply lineage and has been anonymized to preserve user privacy. The data was gathered by 2 million nodes globally in just one week. The release aims to make AI training more accessible to developers, leveling the playing field with centralized model training sets. It marks a significant milestone for the Grass ecosystem and the broader AI community.
Markets
Yes • 50%
No • 50%
Major AI conferences and journals such as NeurIPS, ICML, or arXiv
No • 50%
Yes • 50%
Legal news databases and court records
No • 50%
Yes • 50%
Cryptocurrency market data sources like CoinMarketCap or CoinGecko
Anonymization • 25%
Accessibility • 25%
Scalability • 25%
Quality of Data • 25%
AI research papers, tech news articles, and developer testimonials
Other • 25%
Content Moderation • 25%
Natural Language Processing (NLP) • 25%
Recommendation Systems • 25%
Reports from AI research publications, tech news outlets, and company press releases
Meta • 25%
Google • 25%
OpenAI • 25%
Microsoft • 25%
Public announcements from AI companies such as Google, OpenAI, Microsoft, or Meta