Crawl API
Automate content extraction from any domain. Simply define the root URL and retrieve the full website content as Markdown, Text, HTML, or JSON files.

- Map entire site structures in one request
- Capture both static and dynamic web content
- Flexible for SEO, AI, and compliance needs
- Integrates with popular dev frameworks and no-code tools
Trusted by 20,000+ customers
```js
const options = {
  method: 'POST',
  headers: {
    // Append your API token after 'Bearer '
    Authorization: 'Bearer ',
    'Content-Type': 'application/json'
  },
  // One JSON object per root URL to crawl
  body: '[{"url":"https://il.linkedin.com/company/bright-data"}]'
};

fetch('https://api.brightdata.com/datasets/v3/trigger', options)
  .then(response => response.json())
  .then(response => console.log(response))
  .catch(err => console.error(err));
```
```python
import requests

url = "https://api.brightdata.com/datasets/v3/trigger"
# One JSON object per root URL to crawl
payload = [{"url": "https://il.linkedin.com/company/bright-data"}]
headers = {
    "Authorization": "Bearer ",  # append your API token
    "Content-Type": "application/json",
}

response = requests.post(url, json=payload, headers=headers)
print(response.text)
```
Easy to start, easier to scale
- Choose target domain: define the target URL and connect to the API with a single line of code
- Send request: edit crawl parameters and insert your custom logic using Python or JavaScript
- Get your data: retrieve website data as Markdown, Text, HTML, or JSON files
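The three steps above can be sketched in Python. The trigger endpoint and JSON-array payload follow the code examples shown earlier; the `/snapshot/{id}` retrieval path is an assumption inferred from the workflow, not a confirmed route, so check the API reference before relying on it.

```python
import json

TRIGGER_URL = "https://api.brightdata.com/datasets/v3/trigger"

def build_trigger_request(api_token: str, urls: list) -> dict:
    """Steps 1-2: assemble the trigger request for a list of root URLs."""
    return {
        "url": TRIGGER_URL,
        "headers": {
            "Authorization": "Bearer " + api_token,
            "Content-Type": "application/json",
        },
        # One JSON object per root URL, matching the examples above.
        "body": json.dumps([{"url": u} for u in urls]),
    }

def snapshot_url(snapshot_id: str) -> str:
    """Step 3: URL to fetch the finished crawl (path is an assumption)."""
    return "https://api.brightdata.com/datasets/v3/snapshot/" + snapshot_id
```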
Developer-first experience
Crawl API pricing
Leading the way in ethical web data collection
Bright Data sets the gold standard in compliance, effectively self-regulating the industry. With transparent operations validated by top security firms, clear peer consent, and dedicated compliance units, we ensure legitimate and safe data collection. By upholding international privacy laws and using tools like BrightBot, we minimize your legal exposure, making a partnership with us a strategic way to reduce legal risk and its associated costs.
Every 15 minutes, our customers scrape enough data to train ChatGPT from scratch.
API for Seamless Crawl Data Access
Comprehensive, Scalable, and Compliant Crawl Data Extraction
Tailored to your workflow
Get structured data in JSON, NDJSON, or CSV files through Webhook or API delivery.
Built-in infrastructure and unblocking
Get maximum control and flexibility without maintaining proxy and unblocking infrastructure. Easily scrape data from any geo-location while avoiding CAPTCHAs and blocks.
Battle-proven infrastructure
Bright Data’s platform powers 20,000+ companies worldwide, offering peace of mind with 99.99% uptime and access to 150M+ real user IPs across 195 countries.
Industry leading compliance
Our privacy practices comply with data protection laws, including the EU data protection regulatory framework (GDPR) and the CCPA, and we respect requests to exercise privacy rights.
Want to learn more?
Talk to an expert to discuss your scraping needs.
Crawl API FAQs
What is Bright Data’s Crawl API?
Bright Data’s Crawl API is a tool that lets you extract, map, and transform content from any website into structured data in formats like HTML, Markdown, and JSON, making it easy to use for AI training, SEO, compliance audits, and more.
What types of content and websites can I crawl?
You can crawl any public website, extracting both static and dynamic content such as articles, product listings, reviews, and complete site structures from any domain worldwide.
Which output formats are supported?
Crawl API delivers results in multiple formats, including Markdown, HTML, plain text, and structured schemas like ld_json. Choose the format that best fits your workflow.
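As a sketch of format selection: assuming the trigger endpoint accepts a `format` query parameter naming the output type (the parameter name and accepted values are assumptions here, not confirmed API details), building the URL could look like:

```python
from urllib.parse import urlencode

TRIGGER_URL = "https://api.brightdata.com/datasets/v3/trigger"

def trigger_url(output_format: str = "json") -> str:
    """Build the trigger URL with an output-format selector.

    The 'format' query-parameter name is an assumption; consult the
    API reference for the exact way to choose an output format.
    """
    return TRIGGER_URL + "?" + urlencode({"format": output_format})
```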
How do I trigger a crawl job using the API?
Simply send an HTTP POST request to the API with your target URLs and preferred output format. You’ll receive a snapshot_id, which you can use to fetch the collected data once it's ready.
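A minimal polling loop for that flow, assuming the results are fetched from a `/datasets/v3/snapshot/{snapshot_id}` path and that a non-200 status means the crawl is still running (both are assumptions; consult the API docs for the exact route and status semantics):

```python
import time
import requests

BASE = "https://api.brightdata.com/datasets/v3"

def extract_snapshot_id(trigger_response: dict) -> str:
    """The trigger call returns JSON containing the job's snapshot_id."""
    return trigger_response["snapshot_id"]

def fetch_snapshot(api_token: str, snapshot_id: str,
                   interval: float = 10, attempts: int = 30):
    """Poll until the crawl results are ready, then return the parsed body."""
    headers = {"Authorization": "Bearer " + api_token}
    for _ in range(attempts):
        resp = requests.get(f"{BASE}/snapshot/{snapshot_id}", headers=headers)
        if resp.status_code == 200:   # data is ready
            return resp.json()
        time.sleep(interval)          # still crawling; wait and retry
    raise TimeoutError(f"snapshot {snapshot_id} not ready after {attempts} tries")
```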
Can I run a crawl without coding?
Yes! Use the no-code option in the Bright Data Control Panel. Just enter your URLs, select an output format, and start crawling with no coding required.
How are the crawl results delivered?
Results can be delivered via webhook, downloaded through the API or Control Panel, or sent to your preferred external storage (such as AWS S3, Google Cloud Storage, etc.).
Can I schedule regular crawl jobs?
Yes, the Crawl API supports scheduling, so you can automate crawls daily, weekly, or on a custom timetable to keep your datasets up to date.
Is developer integration supported?
Absolutely! The API integrates seamlessly with Python, Node.js, BeautifulSoup, Cheerio, and many other popular libraries for developer flexibility.
What are common use cases for the Crawl API?
Customers use the Crawl API for LLM training dataset creation, SEO site audits, competitive research, compliance/accessibility checks, and website content migration and archiving.
What if my crawl returns errors or fails on certain pages?
You can include detailed error logs via the include_errors parameter for every crawl. Troubleshoot issues efficiently, or reach out to Bright Data support for further assistance.
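For instance (a sketch only: treating `include_errors` as a boolean query parameter is inferred from the FAQ wording, and its exact name and placement may differ):

```python
from urllib.parse import urlencode

def trigger_url_with_errors(
    base: str = "https://api.brightdata.com/datasets/v3/trigger",
    include_errors: bool = True,
) -> str:
    """Ask the API to attach per-page error logs to the crawl results.

    Passing include_errors as a query parameter is an assumption based
    on the parameter name mentioned above.
    """
    return base + "?" + urlencode({"include_errors": str(include_errors).lower()})
```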