Playwright Browser Extractor

    Extracts websites with a headless Chromium, Chrome, or Firefox browser and the Playwright browser automation library, driven by provided server-side Node.js code. Supports both recursive extraction and extraction from a list of URLs. Supports logging in to a website.

    1 credit per request
    ~30s
    6 runs

    Features
    • Headless Browser
    • JSON/CSV Export
    • API Access
    • Scalable Automation

    Use Cases
    • Data Extraction
    • Developer Tools

    What This Tool Does

    Playwright Data Extractor — Extracts websites with a headless Chromium, Chrome, or Firefox browser and the Playwright browser automation library, driven by provided server-side Node.js code. Supports both recursive extraction and extraction from a list of URLs. Supports logging in to a website.

    Use it to extract structured data from any website: provide a URL and a custom extraction script, and the tool returns the data you need in JSON format.
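For illustration, an extraction script might look like the sketch below. The `context` shape (a Playwright `page` object and the current `request`) is an assumption based on common scraper conventions, not this tool's documented contract — check the actual input schema before relying on it.

```javascript
// Hypothetical extraction script (pageFunction). The context fields
// (`page`, `request`) are assumptions; verify against the tool's docs.
async function pageFunction(context) {
  const { page, request } = context;

  // Collect the page title and all link targets on the page.
  const title = await page.title();
  const links = await page.$$eval('a[href]', (els) => els.map((el) => el.href));

  // Whatever the function returns becomes one JSON record in the results.
  return { url: request.url, title, links };
}
```

The function is passed to the tool as source text (see the example request below for where it goes in the payload).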

    Use Cases

    • Data Extraction
    • Developer Tools

    Data Fields

    Output fields depend on your extraction script. Common patterns include:

    Field | Type   | Description
    url   | string | URL that was scraped
    title | string | Page title
    html  | string | Raw HTML content (if requested)
    text  | string | Extracted plain text
    links | array  | Links found on the page
    data  | object | Custom fields extracted by your script
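Because the output shape depends entirely on your script, a light runtime check can catch malformed records before they reach downstream consumers. The validator below is a sketch based on the field table above; treating only `url` as required is an assumption.

```javascript
// Minimal shape check for one result record, mirroring the field table.
// Assumption: `url` is always present; all other fields are optional.
function validateRecord(rec) {
  const errors = [];
  if (typeof rec.url !== 'string') errors.push('url must be a string');
  if (rec.title !== undefined && typeof rec.title !== 'string') errors.push('title must be a string');
  if (rec.html !== undefined && typeof rec.html !== 'string') errors.push('html must be a string');
  if (rec.text !== undefined && typeof rec.text !== 'string') errors.push('text must be a string');
  if (rec.links !== undefined && !Array.isArray(rec.links)) errors.push('links must be an array');
  if (rec.data !== undefined && (typeof rec.data !== 'object' || rec.data === null)) {
    errors.push('data must be an object');
  }
  return errors; // empty array means the record looks well-formed
}
```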

    Example Request

    {
      "startUrls": ["https://example.com"],
      "pageFunction": "async function pageFunction(context) { /* return extracted data */ }",
      "proxyConfiguration": {
        "useApifyProxy": true
      }
    }
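Since `pageFunction` is transmitted as source code rather than a live value, one convenient way to build the payload is to write the function locally and serialize it with `toString()`. The field names below simply mirror the example request; the exact schema is the tool's to define.

```javascript
// Sketch: assembling the request payload programmatically.
// `pageFunction` must be sent as source text, so a local function is
// serialized with toString() rather than passed as a value.
const pageFunction = async (context) => ({
  url: context.request.url,
  title: await context.page.title(),
});

const payload = {
  startUrls: ['https://example.com'],
  pageFunction: pageFunction.toString(),
  proxyConfiguration: { useApifyProxy: true },
};

const body = JSON.stringify(payload, null, 2); // ready to POST
```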
    

    Example Response

    {
      "url": "https://example.com",
      "title": "Example Domain",
      "text": "This domain is for use in illustrative examples...",
      "links": ["https://www.iana.org/domains/reserved"]
    }
    

    Limits and Tips

    • JavaScript-heavy pages require a browser-based extractor such as Playwright. For static HTML, jsdom is faster.
    • Processing time depends on page load speed and script complexity — typically 10–60 seconds per page.
    • Results are cached for up to 15 minutes. Re-running the same URL may return cached data.
    • Respect robots.txt and the target site's terms of service when extracting.
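Given the ~15-minute server-side caching mentioned above, a client that must know whether data is fresh can track its own request times. The sketch below is a generic TTL memoizer written for illustration; it is not part of this tool's API.

```javascript
// Generic TTL cache wrapper mirroring the ~15-minute caching behavior
// described above. `fetcher` and the injectable clock are assumptions
// made so the sketch stays self-contained.
function withTtlCache(fetcher, ttlMs = 15 * 60 * 1000, now = Date.now) {
  const cache = new Map(); // url -> { at, value }
  return async function cachedFetch(url) {
    const hit = cache.get(url);
    if (hit && now() - hit.at < ttlMs) return hit.value; // still fresh
    const value = await fetcher(url);
    cache.set(url, { at: now(), value });
    return value;
  };
}
```

Wrapping your request function this way makes the cache window explicit in your own code instead of relying on the server's.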
