RecipeStripper

Public dataset

Recipe Site Markup Coverage and Extraction Observations 2026

A CC BY 4.0 dataset from RecipeStripper: the public Works With inventory plus anonymized domain-level extraction observations. Submitted recipe URLs, user IDs, IP addresses, and saved recipe content are not included.

Listed site pages: 137
Blocked or limited: 4
Extraction attempts: 441
Observed domains: 122

Download the data

Category coverage

Category        Listed pages
major           26
baking          8
healthy         20
food-blog       43
international   20
niche           20

Most-observed domains

Domain                Attempts  Success rate  Primary source  Common error
cooking.nytimes.com   21        100%          json-ld         none
allrecipes.com        20        30%           json-ld         url_unreachable
foodnetwork.com       20        55%           json-ld         url_unreachable
halfbakedharvest.com  20        45%           json-ld         url_unreachable
bbcgoodfood.com       17        59%           json-ld         url_unreachable
recipetineats.com     14        64%           json-ld         url_unreachable
simplyrecipes.com     13        23%           json-ld         url_unreachable
thekitchn.com         12        17%           json-ld         no_recipe
loveandlemons.com     11        82%           json-ld         url_unreachable
tasteofhome.com       11        73%           json-ld         url_unreachable
delish.com            10        70%           json-ld         url_unreachable
bonappetit.com        9         89%           json-ld         url_unreachable

Method and caveats

The site inventory is a product support inventory, not a crawl of every URL on each domain.

The domain observations are anonymized operational aggregates from RecipeStripper extraction attempts.

Success rates are usage-weighted by submitted URLs and should not be interpreted as a representative web-wide benchmark.
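
As a rough illustration of what usage weighting means for these figures, the sketch below compares a pooled success rate (every attempt counts equally, so heavily submitted domains dominate) with an unweighted mean of per-domain rates (every domain counts equally). It uses only the twelve rows of the Most-observed domains table, not the full dataset. The function names are hypothetical, and the per-domain success counts are estimates reconstructed from the rounded percentages, not values published in the dataset.

```python
# Rows copied from the "Most-observed domains" table:
# (domain, attempts, success rate in percent, as published and rounded).
ROWS = [
    ("cooking.nytimes.com", 21, 100),
    ("allrecipes.com", 20, 30),
    ("foodnetwork.com", 20, 55),
    ("halfbakedharvest.com", 20, 45),
    ("bbcgoodfood.com", 17, 59),
    ("recipetineats.com", 14, 64),
    ("simplyrecipes.com", 13, 23),
    ("thekitchn.com", 12, 17),
    ("loveandlemons.com", 11, 82),
    ("tasteofhome.com", 11, 73),
    ("delish.com", 10, 70),
    ("bonappetit.com", 9, 89),
]


def estimated_successes(attempts: int, rate_percent: int) -> int:
    """Assumption: recover a success count by rounding attempts * rate."""
    return round(attempts * rate_percent / 100)


def pooled_rate(rows) -> float:
    """Usage-weighted rate: total estimated successes over total attempts."""
    total_attempts = sum(attempts for _, attempts, _ in rows)
    total_successes = sum(
        estimated_successes(attempts, rate) for _, attempts, rate in rows
    )
    return total_successes / total_attempts


def unweighted_mean_rate(rows) -> float:
    """Plain average of per-domain rates, ignoring attempt counts."""
    return sum(rate for _, _, rate in rows) / (100 * len(rows))


if __name__ == "__main__":
    print(f"pooled (usage-weighted): {pooled_rate(ROWS):.3f}")
    print(f"unweighted domain mean:  {unweighted_mean_rate(ROWS):.3f}")
```

Neither number is a web-wide benchmark. Both describe only the URLs that RecipeStripper users happened to submit for these twelve domains.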

The same files are mirrored in the public GitHub data repository so search crawlers, AI systems, and researchers can cite a stable copy outside the product site.