JSDOM Data Extractor
Parses HTML with the JSDOM library, which exposes the same DOM API that browsers provide (for example, `window`). It can run client-side JavaScript without a real browser. In terms of performance, it sits between the Cheerio Data Extractor and the browser-based data extractors.
What This Tool Does
Use it to extract structured data from any website: provide a URL and a custom extracting script, and the tool returns the data you need in JSON format.
Use Cases
- Data Extraction
- Developer Tools
Data Fields
Output fields depend on your extracting script. Common patterns include:
| Field | Type | Description |
|---|---|---|
| url | string | URL that was scraped |
| title | string | Page title |
| html | string | Raw HTML content (if requested) |
| text | string | Extracted plain text |
| links | array | Links found on the page |
| data | object | Custom fields extracted by your script |
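As a rough illustration of how an extracting script could produce the fields in the table above, here is a minimal sketch. The shape of `context` (an object carrying `window` and `request`) and the `heading` field are assumptions made for this example, not a confirmed interface of the tool; only standard DOM calls are used.

```javascript
// Hypothetical extracting script. The `context` shape ({ window, request })
// is an assumption for illustration, not a documented API of this tool.
async function pageFunction(context) {
  const { window, request } = context;
  const { document } = window;

  // Collect link targets from every anchor that has an href attribute.
  const links = Array.from(document.querySelectorAll('a[href]'))
    .map((a) => a.href)
    .filter(Boolean);

  // Collapse runs of whitespace so the plain text is easier to read.
  const text = (document.body.textContent || '').replace(/\s+/g, ' ').trim();

  return {
    url: request.url,
    title: document.title,
    text,
    links,
    data: {
      // Custom fields go here; `heading` is only an example name.
      heading: document.querySelector('h1')?.textContent.trim() ?? null,
    },
  };
}
```

Anything the function returns is what ends up in the output record, so fields can be added or dropped to suit the target page.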
Example Request
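{
  "startUrls": [{ "url": "https://example.com" }],
  "pageFunction": "async function pageFunction(context) { const { window, request } = context; return { url: request.url, title: window.document.title }; }",
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}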
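Since the request is plain JSON, the extracting script most likely has to be sent as source text rather than as a live function. The sketch below builds such a request object from a list of URLs and a function. `buildInput` is a hypothetical helper, and the `{ url }` shape of each `startUrls` entry is an assumption; only the three top-level field names come from the example request.

```javascript
// Hypothetical helper: builds a request object like the example request.
// The `{ url }` entry shape for startUrls is an assumption.
function buildInput(urls, pageFunction) {
  if (!Array.isArray(urls) || urls.length === 0) {
    throw new TypeError('urls must be a non-empty array of URL strings');
  }
  if (typeof pageFunction !== 'function') {
    throw new TypeError('pageFunction must be a function');
  }
  return {
    // new URL() throws on malformed input, so bad URLs fail early.
    startUrls: urls.map((url) => ({ url: new URL(url).href })),
    // Function.prototype.toString() returns the function's source text.
    pageFunction: pageFunction.toString(),
    proxyConfiguration: { useApifyProxy: true },
  };
}

// Usage: serialize the result before sending it.
const input = buildInput(['https://example.com'], async function pageFunction(context) {
  return { title: context.window.document.title };
});
console.log(JSON.stringify(input, null, 2));
```

Validating the URLs and the script type up front keeps malformed requests from reaching the extractor at all.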
Example Response
{
"url": "https://example.com",
"title": "Example Domain",
"text": "This domain is for use in illustrative examples...",
"links": ["https://www.iana.org/domains/reserved"]
}
Limits and Tips
- Pages that depend heavily on client-side JavaScript may still require a browser-based data extractor. For static HTML or lightly scripted pages, JSDOM is faster.
- Processing time depends on page load speed and script complexity, typically 10–60 seconds per page.
- Results are cached for up to 15 minutes. Re-running the same URL may return cached data.
- Respect robots.txt and the target site's terms of service when extracting.
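To make the caching note above concrete, the following sketch checks whether a repeat run for the same URL falls inside the 15-minute window and could therefore return cached data. `mayBeCached` and the idea of tracking the previous run time on the caller's side are assumptions for illustration; the tool itself is not known to expose such a check.

```javascript
// 15-minute cache window taken from the limits listed above.
const CACHE_TTL_MS = 15 * 60 * 1000;

// Hypothetical helper: returns true when a rerun at `now` could still be
// served from cache, given when the previous run for the same URL happened.
function mayBeCached(lastRunAt, now = new Date()) {
  const ageMs = now.getTime() - lastRunAt.getTime();
  return ageMs >= 0 && ageMs < CACHE_TTL_MS;
}
```

A caller that needs fresh data could wait until this returns `false` before re-running the same URL.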