Smart Article Extractor

    📰 Smart Article Extractor extracts articles from any scientific, academic, or news website with just one click. It crawls the whole website and automatically distinguishes articles from other web pages. Download your data as an HTML table, JSON, Excel, RSS feed, and more.

    1 credit per request
    ~120s
    23 runs
    Features

    • Article Extraction
    • Full Content
    • JSON Export


    What This Tool Does

    Provide a URL and get back the clean article text, headline, author, publication date, and metadata — without ads, navigation, or boilerplate.

    Use Cases

    • News Aggregation
    • Media Monitoring
    • Content Research

    Data Fields

    Field        Type    Description
    url          string  Source URL
    title        string  Article headline
    text         string  Clean article body text
    author       string  Article author(s)
    publishedAt  string  Publication date (ISO 8601)
    language     string  Detected content language
    description  string  Article summary / meta description
    image        string  Lead image URL
    tags         array   Topic tags or keywords
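
    The fields above map naturally onto a typed record on the client side. The following is a minimal sketch, not part of the tool itself: the names Article, parse_published_at, and article_from_dict are hypothetical, and treating every field except url as optional is an assumption, since the page does not say which fields are guaranteed to be present.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass
class Article:
    """One extracted article, mirroring the Data Fields table (hypothetical type)."""
    url: str
    title: str = ""
    text: str = ""
    author: Optional[str] = None
    published_at: Optional[datetime] = None
    language: Optional[str] = None
    description: Optional[str] = None
    image: Optional[str] = None
    tags: list[str] = field(default_factory=list)


def parse_published_at(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp such as 2024-01-15T10:00:00.000Z."""
    if not value:
        return None
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def article_from_dict(item: dict[str, Any]) -> Article:
    """Build an Article from one decoded JSON object (assumed field names)."""
    return Article(
        url=item["url"],
        title=item.get("title") or "",
        text=item.get("text") or "",
        author=item.get("author"),
        published_at=parse_published_at(item.get("publishedAt")),
        language=item.get("language"),
        description=item.get("description"),
        image=item.get("image"),
        tags=list(item.get("tags") or []),
    )
```

    Converting publishedAt to a timezone-aware datetime early makes later sorting and date filtering less error-prone than comparing raw strings.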

    Example Request

    {
     "proxy": "example",
     "startUrls": ["https://example.com"]
    }
    

    Example Response

    {
     "url": "https://example.com/article",
     "title": "Example Article Headline",
     "text": "This is the full clean text of the article...",
     "author": "Jane Doe",
     "publishedAt": "2024-01-15T10:00:00.000Z",
     "language": "en",
     "description": "A brief summary of the article.",
     "image": "https://example.com/images/lead.jpg",
     "tags": ["technology", "news"]
    }
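
    A response shaped like the one above can be flattened locally when a spreadsheet-friendly format is needed. The sketch below uses only the Python standard library and assumes the response has already been received as JSON text; articles_to_csv, FIELDS, and the semicolon used to join tags are illustrative choices, not part of the tool.

```python
import csv
import io
import json

# Column order for the export; names follow the response fields shown above.
FIELDS = ["url", "title", "author", "publishedAt", "language",
          "description", "image", "tags", "text"]


def articles_to_csv(items):
    """Serialize article records to CSV text, one row per article."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = {name: item.get(name, "") for name in FIELDS}
        # "tags" is an array in the response; join it into a single cell.
        row["tags"] = ";".join(item.get("tags") or [])
        writer.writerow(row)
    return buffer.getvalue()


response_body = """
{
 "url": "https://example.com/article",
 "title": "Example Article Headline",
 "text": "This is the full clean text of the article...",
 "author": "Jane Doe",
 "publishedAt": "2024-01-15T10:00:00.000Z",
 "language": "en",
 "description": "A brief summary of the article.",
 "image": "https://example.com/images/lead.jpg",
 "tags": ["technology", "news"]
}
"""

article = json.loads(response_body)
csv_text = articles_to_csv([article])
print(csv_text)
```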
    

    Limits and Tips

    • Works best on standard news and blog URLs. Paywalled or login-required content may not be extractable.
    • Processing typically takes 5–30 seconds per article.
    • Results are cached for up to 15 minutes.
    • For bulk extraction, provide multiple URLs in the input array to process them in a single run.
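
    The bulk tip above can be sketched as a small helper that assembles one request body from many URLs. This assumes startUrls accepts a list of URL strings, as the tip suggests, and reuses the placeholder proxy value from the example request; build_request is a hypothetical name, and how the body is submitted is not covered here because the page does not document an endpoint.

```python
import json
from urllib.parse import urlsplit


def build_request(urls, proxy="example"):
    """Build a single request body covering several article URLs.

    Duplicate URLs are dropped (first occurrence wins) so that the same
    article is not processed twice in one run.
    """
    seen = set()
    start_urls = []
    for raw in urls:
        url = raw.strip()
        parts = urlsplit(url)
        # Reject anything that is not an absolute http(s) URL.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {raw!r}")
        if url not in seen:
            seen.add(url)
            start_urls.append(url)
    return {"proxy": proxy, "startUrls": start_urls}


payload = build_request([
    "https://example.com/article-1",
    "https://example.com/article-2",
    "https://example.com/article-1",  # duplicate, dropped
])
print(json.dumps(payload, indent=1))
```

    Validating and deduplicating before submission keeps a malformed or repeated URL from consuming part of a run.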
