Smart Article Extractor
📰 Smart Article Extractor extracts articles from scientific, academic, and news websites with one click. It crawls the whole website and automatically distinguishes articles from other web pages. You can download the data as an HTML table, JSON, Excel, an RSS feed, and more.
1 credit per request
~120s
23 runs
Features
- Article Extraction
- Full Content
- JSON Export
Use Cases
- News Aggregation
- Media Monitoring
- Content Research
What This Tool Does
Provide a URL and get back the clean article text, headline, author, publication date, and metadata — without ads, navigation, or boilerplate.
Data Fields
| Field | Type | Description |
|---|---|---|
| url | string | Source URL |
| title | string | Article headline |
| text | string | Clean article body text |
| author | string | Article author(s) |
| publishedAt | string | Publication date (ISO 8601) |
| language | string | Detected content language |
| description | string | Article summary / meta description |
| image | string | Lead image URL |
| tags | array | Topic tags or keywords |
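If you consume the output programmatically, it can help to check each record against the field list before using it. The sketch below encodes the Data Fields table as a Python type map. Treating every field as required is an assumption: this page does not say whether fields the extractor cannot detect are omitted or set to null. `EXPECTED_TYPES` and `schema_errors` are hypothetical names, not part of the tool.

```python
from typing import Any

# Expected JSON types per field, taken from the Data Fields table.
# Assumption: every field is required; relax this if the extractor
# omits fields it cannot detect.
EXPECTED_TYPES: dict[str, type] = {
    "url": str,
    "title": str,
    "text": str,
    "author": str,
    "publishedAt": str,
    "language": str,
    "description": str,
    "image": str,
    "tags": list,
}


def schema_errors(record: dict[str, Any]) -> list[str]:
    """Return the problems found when checking one record against the table."""
    errors: list[str] = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(
                f"{field}: expected {expected.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    # `tags` is an array; check that each entry is a string.
    if isinstance(record.get("tags"), list):
        for i, tag in enumerate(record["tags"]):
            if not isinstance(tag, str):
                errors.append(f"tags[{i}]: expected str, got {type(tag).__name__}")
    return errors
```

An empty list means the record matches the table; the example response below passes this check.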
Example Request
```json
{
  "proxy": "example",
  "startUrls": ["https://example.com"]
}
```
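A minimal sketch of preparing this input from a list of URLs in Python. It assumes `startUrls` accepts a JSON array of plain URL strings (the tips mention an input array) and passes the placeholder `proxy` value through unchanged. `build_input` is a hypothetical helper, not part of the tool.

```python
import json
from urllib.parse import urlparse


def build_input(urls: list[str], proxy: str = "example") -> str:
    """Build the JSON input with validated, de-duplicated start URLs.

    Assumptions: `startUrls` takes an array of URL strings, and `proxy`
    ("example" is the placeholder from the example request) is passed as-is.
    """
    seen: set[str] = set()
    start_urls: list[str] = []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        # Only absolute http(s) URLs make sense as crawl starting points.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {raw!r}")
        if url not in seen:  # keep the first occurrence, preserve order
            seen.add(url)
            start_urls.append(url)
    return json.dumps({"proxy": proxy, "startUrls": start_urls}, indent=2)
```

Rejecting relative URLs up front fails fast on the client instead of spending a run on input the crawler cannot use.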
Example Response
```json
{
  "url": "https://example.com/article",
  "title": "Example Article Headline",
  "text": "This is the full clean text of the article...",
  "author": "Jane Doe",
  "publishedAt": "2024-01-15T10:00:00.000Z",
  "language": "en",
  "description": "A brief summary of the article.",
  "image": "https://example.com/images/lead.jpg",
  "tags": ["technology", "news"]
}
```
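The `publishedAt` value is an ISO 8601 string with a trailing `Z`. The sketch below parses it and selects recent articles, assuming a run returns a JSON array of objects shaped like the example response. `parse_published` and `recent_articles` are hypothetical helpers. Because `datetime.fromisoformat` accepts the `Z` suffix only from Python 3.11 on, the sketch rewrites it as `+00:00` first.

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional


def parse_published(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2024-01-15T10:00:00.000Z'."""
    # fromisoformat() accepts a trailing 'Z' only from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Assumption: a timestamp without an offset is in UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def recent_articles(
    raw_json: str, since: datetime, language: Optional[str] = None
) -> list[dict[str, Any]]:
    """Return articles published at or after `since` (timezone-aware), newest first.

    Assumes `raw_json` is a JSON array of objects shaped like the example
    response. Items without `publishedAt` are skipped.
    """
    selected = []
    for item in json.loads(raw_json):
        published = item.get("publishedAt")
        if not published:
            continue
        if language is not None and item.get("language") != language:
            continue
        when = parse_published(published)
        if when >= since:
            selected.append((when, item))
    # Sort on the parsed datetime only, newest first.
    selected.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in selected]
```

Comparing parsed datetimes rather than raw strings keeps the ordering correct even if some timestamps carry a different UTC offset.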
Limits and Tips
- Works best on standard news and blog URLs. Paywalled or login-required content may not be extractable.
- Processing typically takes 5–30 seconds per article.
- Results are cached for up to 15 minutes.
- For bulk extraction, provide multiple URLs in the input array to process them in a single run.
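To size a bulk run, the 5–30 s per-article figure gives a rough bound. This sketch assumes articles are processed one after another; if the tool processes them concurrently, real runs will be shorter. `estimate_run_seconds` is a hypothetical helper.

```python
def estimate_run_seconds(
    n_articles: int, per_article: tuple[float, float] = (5.0, 30.0)
) -> tuple[float, float]:
    """Return a rough (minimum, maximum) run time in seconds.

    Assumes sequential processing at 5-30 s per article, the range quoted
    in the tips; concurrent processing would make runs shorter.
    """
    if n_articles < 0:
        raise ValueError("n_articles must be non-negative")
    low, high = per_article
    return (n_articles * low, n_articles * high)


low, high = estimate_run_seconds(10)
print(f"10 articles: about {low:.0f}-{high:.0f} s")  # 10 articles: about 50-300 s
```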