Common Crawl

Making web crawl data accessible and analyzable for everyone.

Common Crawl is a non-profit initiative that builds and maintains a free, open repository of web crawl data. The data is accessible to anyone and is a valuable resource for researchers. The repository holds over 240 billion pages spanning 15 years. It is also a primary training corpus for many LLMs and has been cited in over 8,000 research papers.

Category

Developer Tools

Topics

Data & Analytics

You might also like

HiAPI

One API for leading AI image, video, and text generation models.

p5.js Web Editor

Web editor for p5.js to make coding accessible for all.

Not a Wheelchair

Affordable manual wheelchairs for everyone

One Million Checkboxes

Checking a box checks it for everyone!

YunoHost

Self-hosting for everyone

Open Food Facts

A comprehensive food products database made by everyone, for ever

NotebookLM

An AI notebook for everyone

Starlink Direct to Cell

Seamless access to text, voice, and data for LTE phones.

Carrier GNSS and Location Privacy

How carriers access GNSS data and why it matters.

Phosphor Icons

A flexible icon family for everyone — 588 icons in 6 weights

PromptQL

Agentic data access for your AI

Dynomight CRC Rates

Analyzing CRC trends and what they mean for young people.

Histogram Maker

Web tool for making histograms

Web-Check

All-in-one OSINT tool for analyzing any website

Get out of my <head>

Make faster, more accessible, more environmentally friendly websi

shelf

Asset management infrastructure for everyone

ChromeWinner

Marketing and competitor analysis for Extensions

FlagWhiz.com

Flag quiz for everyone

Quadratic

Analyze your data with AI, Python, SQL, and formulas

Pomofocus

A simple and customizable pomodoro timer for the web

PimEyes

Face Recognition Search Engine Available for Everyone

by @Micadep