RecipeStripper

Public dataset

Recipe Site Markup Coverage and Extraction Observations 2026

A CC BY 4.0 dataset from RecipeStripper: the public "Works With" site inventory plus anonymized, domain-level extraction observations. Submitted recipe URLs, user IDs, IP addresses, and saved recipe content are not included.

The public research page pairs the summary metrics with downloadable CSV and JSON datasets.

Listed site pages: 137
Blocked or limited: 4
Extraction attempts: 441
Observed domains: 122

Download the data

Category coverage

Category       Listed pages
major          26
baking         8
healthy        20
food-blog      43
international  20
niche          20
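The category counts add up to the 137 listed site pages. A minimal sketch of rebuilding this table from the downloadable inventory, assuming (hypothetically) a CSV with one row per listed site page and a `category` column; the real file's name and column headers may differ.

```python
import csv
import io
from collections import Counter


def category_coverage(csv_text: str) -> Counter:
    """Count listed site pages per category.

    Assumes a hypothetical schema: one row per listed page with a
    `category` column (e.g. "major", "baking", "food-blog").
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["category"] for row in reader)


# Published category counts, copied from the table above.
PUBLISHED = {
    "major": 26,
    "baking": 8,
    "healthy": 20,
    "food-blog": 43,
    "international": 20,
    "niche": 20,
}

# The per-category counts should sum to the "Listed site pages" figure.
assert sum(PUBLISHED.values()) == 137

# Tiny made-up sample in the assumed format (not real inventory rows).
sample = "site,category\nexample-a,major\nexample-b,baking\nexample-c,major\n"
print(category_coverage(sample))  # Counter({'major': 2, 'baking': 1})
```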

Most-observed domains

Domain                Attempts  Success rate  Primary source  Common error
cooking.nytimes.com   21        100%          json-ld         none
allrecipes.com        20        30%           json-ld         url_unreachable
foodnetwork.com       20        55%           json-ld         url_unreachable
halfbakedharvest.com  20        45%           json-ld         url_unreachable
bbcgoodfood.com       17        59%           json-ld         url_unreachable
recipetineats.com     14        64%           json-ld         url_unreachable
simplyrecipes.com     13        23%           json-ld         url_unreachable
thekitchn.com         12        17%           json-ld         no_recipe
loveandlemons.com     11        82%           json-ld         url_unreachable
tasteofhome.com       11        73%           json-ld         url_unreachable
delish.com            10        70%           json-ld         url_unreachable
bonappetit.com        9         89%           json-ld         url_unreachable
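A minimal sketch of working with these domain rows, assuming the downloadable data can be loaded into records with the fields below (the field names are hypothetical; check the published CSV/JSON headers). It reproduces the ordering used in the table, which is consistent with sorting by attempts and breaking ties alphabetically, and recovers each domain's success count from its rounded rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainObservation:
    """One row of the domain table (hypothetical field names)."""
    domain: str
    attempts: int
    success_rate: float  # published rate as a fraction, rounded to a whole percent
    primary_source: str  # e.g. "json-ld"
    common_error: str    # e.g. "url_unreachable", or "none"

    def implied_successes(self) -> int:
        # Rates are rounded to the nearest 1%, so attempts * rate is off by at
        # most 0.005 * attempts; below 100 attempts that error is < 0.5, and
        # rounding recovers the exact number of successful extractions.
        return round(self.attempts * self.success_rate)


def most_observed(rows, n=12):
    """Top n domains by attempts, ties broken alphabetically by domain."""
    return sorted(rows, key=lambda r: (-r.attempts, r.domain))[:n]


# A few rows copied from the table, deliberately out of order.
rows = [
    DomainObservation("foodnetwork.com", 20, 0.55, "json-ld", "url_unreachable"),
    DomainObservation("cooking.nytimes.com", 21, 1.00, "json-ld", "none"),
    DomainObservation("allrecipes.com", 20, 0.30, "json-ld", "url_unreachable"),
    DomainObservation("bbcgoodfood.com", 17, 0.59, "json-ld", "url_unreachable"),
]

for r in most_observed(rows, n=3):
    print(r.domain, r.attempts, r.implied_successes())
# cooking.nytimes.com 21 21
# allrecipes.com 20 6
# foodnetwork.com 20 11
```

Recovering counts this way relies on attempts staying under 100; with more attempts, a rate rounded to a whole percent may no longer pin down a single integer.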

Method and caveats

The site inventory reflects the sites RecipeStripper lists as supported; it is not a crawl of every URL on each domain.

The domain observations are anonymized operational aggregates from RecipeStripper extraction attempts.
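The page does not describe the aggregation pipeline itself. The following is a hypothetical sketch of how domain-level aggregates like those in the table could be produced from raw attempt records while dropping identifying fields. The `Attempt` schema, the `www.` normalization, and the "none" placeholder for domains without errors are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class Attempt:
    """A raw extraction attempt (hypothetical schema).

    url, user_id, and ip exist only in raw records and are never
    copied into the aggregate output.
    """
    url: str
    user_id: str
    ip: str
    success: bool
    source: Optional[str]  # extraction source used, e.g. "json-ld"
    error: Optional[str]   # error code on failure, e.g. "url_unreachable"


def domain_of(url: str) -> str:
    # urlparse lowercases the hostname; strip a leading "www." (assumed rule).
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def aggregate(attempts):
    """Collapse raw attempts into per-domain rows with no URLs or user data."""
    by_domain = defaultdict(list)
    for a in attempts:
        by_domain[domain_of(a.url)].append(a)

    rows = []
    for domain, items in by_domain.items():
        successes = sum(a.success for a in items)
        sources = Counter(a.source for a in items if a.source)
        errors = Counter(a.error for a in items if not a.success and a.error)
        rows.append({
            "domain": domain,
            "attempts": len(items),
            "success_rate": round(successes / len(items), 2),
            "primary_source": sources.most_common(1)[0][0] if sources else "none",
            "common_error": errors.most_common(1)[0][0] if errors else "none",
        })
    return rows


# Made-up records using reserved example domains and documentation IPs.
logs = [
    Attempt("https://www.example.com/r/1", "u1", "203.0.113.5", True, "json-ld", None),
    Attempt("https://example.com/r/2", "u2", "198.51.100.7", False, None, "url_unreachable"),
]
print(aggregate(logs))
```

Grouping by hostname before counting is what keeps the output at domain granularity: once the rows are built, nothing in them points back to a specific URL, user, or IP address.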

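Success rates are weighted by usage: they are computed over the URLs users actually submitted, so domains that users submit more often contribute more attempts. They should not be interpreted as a representative, web-wide benchmark.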
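To make the weighting concrete, a small sketch comparing two ways of summarizing the 12 most-observed domains from the table: a pooled rate, where every attempt counts once, and an unweighted mean, where every domain counts once. The values are copied from the table; the function names are illustrative.

```python
# (attempts, success rate) for the 12 most-observed domains in the table.
TABLE = {
    "cooking.nytimes.com": (21, 1.00),
    "allrecipes.com": (20, 0.30),
    "foodnetwork.com": (20, 0.55),
    "halfbakedharvest.com": (20, 0.45),
    "bbcgoodfood.com": (17, 0.59),
    "recipetineats.com": (14, 0.64),
    "simplyrecipes.com": (13, 0.23),
    "thekitchn.com": (12, 0.17),
    "loveandlemons.com": (11, 0.82),
    "tasteofhome.com": (11, 0.73),
    "delish.com": (10, 0.70),
    "bonappetit.com": (9, 0.89),
}


def pooled_rate(table):
    """Attempt-weighted rate: each submitted URL counts once."""
    total = sum(n for n, _ in table.values())
    return sum(n * r for n, r in table.values()) / total


def unweighted_rate(table):
    """Mean of per-domain rates: each domain counts once."""
    return sum(r for _, r in table.values()) / len(table)


print(f"pooled:     {pooled_rate(TABLE):.1%}")      # pooled:     57.9%
print(f"unweighted: {unweighted_rate(TABLE):.1%}")  # unweighted: 58.9%
```

For these rows the two figures are close, but they can diverge sharply when a heavily submitted domain has an unusual success rate. These 12 domains cover 178 of the 441 attempts, so neither number is a dataset-wide rate.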

The same files are mirrored in the public GitHub data repository so search crawlers, AI systems, and researchers can cite a stable copy outside the product site.