📚 API Documentation
Welcome to the Agentic Data Interface API. This REST API provides AI agents and developers with efficient access to arXiv academic papers in various formats.
🎁 Free Testing
Papers 2409.05591 and 2504.21776 are available without authentication for testing purposes.
🌐 Base URL
https://data.rag.ac.cn/arxiv/
🔑 Authentication
Most endpoints require a valid API token. You can provide the token in two ways:
Method 1: Authorization Header
Authorization: Bearer YOUR_TOKEN_HERE
Method 2: Query Parameter
?token=YOUR_TOKEN_HERE
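The two methods can be sketched with Python's standard library as follows. This is a minimal illustration, not an official client: the helper names are hypothetical, and it assumes that endpoint parameters such as arxiv_id and type are sent in the query string of the base URL.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://data.rag.ac.cn/arxiv/"


def request_with_header(params: dict, token: str) -> Request:
    """Method 1: carry the token in the Authorization header."""
    # Assumption: endpoint parameters travel in the query string.
    url = f"{BASE_URL}?{urlencode(params)}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


def request_with_query_token(params: dict, token: str) -> Request:
    """Method 2: append the token as the `token` query parameter."""
    url = f"{BASE_URL}?{urlencode({**params, 'token': token})}"
    return Request(url)


if __name__ == "__main__":
    params = {"arxiv_id": "2409.05591", "type": "head"}
    print(request_with_header(params, "YOUR_TOKEN_HERE").full_url)
    print(request_with_query_token(params, "YOUR_TOKEN_HERE").full_url)
```

The header form keeps the token out of URLs, which tend to end up in logs and browser history, so it is usually the safer choice when both are available.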
🎁 Free Papers (No Token Required)
Papers 2409.05591 and 2504.21776 can be accessed without a token.
📝 Getting a Token
Visit /register to create an account and get your API token. Each token includes 10,000 free daily requests. Need more? Contact us with your use case.
🔌 API Endpoints
All endpoints use the same base URL with different type parameters to specify the data format.
📋 Get Paper Metadata
Returns structured metadata including title, abstract, authors, sections, and statistics.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| arxiv_id | string | Required | arXiv paper ID (e.g., 2409.05591, 2504.21776) |
| type | string | Required | Must be "head" |
| token | string | Optional | API token (not required for free papers) |
Response Fields
- title: Paper title
- abstract: Paper abstract
- authors: List of authors
- sections: Section names and metadata
- token_count: Total tokens in the paper
- categories: arXiv categories
- publish_at: Publication date
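A metadata request might look like the sketch below. It assumes the parameters are passed in the query string and that the response body is JSON; head_url, fetch_head, and summarize_head are hypothetical helper names. Only fields listed above are read, and the shape of authors and sections is left untouched because it is not specified here.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://data.rag.ac.cn/arxiv/"


def head_url(arxiv_id: str, token: Optional[str] = None) -> str:
    """Build the metadata URL; the token is omitted for free papers."""
    params = {"arxiv_id": arxiv_id, "type": "head"}
    if token:
        params["token"] = token
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_head(arxiv_id: str, token: Optional[str] = None) -> dict:
    """Fetch and decode the metadata (assumes a JSON response body)."""
    with urlopen(head_url(arxiv_id, token), timeout=30) as resp:
        return json.load(resp)


def summarize_head(meta: dict) -> str:
    """One-line summary built from documented response fields."""
    return f"{meta.get('title')} ({meta.get('publish_at')}), {meta.get('token_count')} tokens"


if __name__ == "__main__":
    # 2409.05591 is one of the free test papers, so no token is needed.
    print(summarize_head(fetch_head("2409.05591")))
```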
📌 Get Brief Information
Returns concise paper information including title, TLDR, keywords, publication date, and citation count. Perfect for quick summaries and list views.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| arxiv_id | string | Required | arXiv paper ID (e.g., 2409.05591, 2504.21776) |
| type | string | Required | Must be "brief" |
| token | string | Optional | API token (not required for free papers) |
Response Fields
- arxiv_id: arXiv paper ID
- src_url: Direct link to PDF
- title: Paper title
- tldr: AI-generated summary (if available)
- keywords: List of keywords (if available)
- publish_at: Publication date
- citations: Citation count
Example Response
{
  "arxiv_id": "2409.05591",
  "src_url": "https://arxiv.org/pdf/2409.05591",
  "title": "Paper Title",
  "tldr": "Brief summary...",
  "keywords": ["AI", "Machine Learning"],
  "publish_at": "2024-09-05",
  "citations": 42
}
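For list views, the brief response can be mapped onto a small typed record. The sketch below is one possible shape, with BriefInfo and parse_brief as hypothetical names. Because tldr and keywords are documented as "if available", it treats them as optional and assumes they may be missing or null.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BriefInfo:
    arxiv_id: str
    src_url: str
    title: str
    publish_at: str
    citations: int
    tldr: Optional[str] = None                         # "if available"
    keywords: List[str] = field(default_factory=list)  # "if available"


def parse_brief(payload: str) -> BriefInfo:
    """Decode a brief response body into a BriefInfo record."""
    data = json.loads(payload)
    return BriefInfo(
        arxiv_id=data["arxiv_id"],
        src_url=data["src_url"],
        title=data["title"],
        publish_at=data["publish_at"],
        citations=data["citations"],
        tldr=data.get("tldr"),
        keywords=data.get("keywords") or [],
    )


if __name__ == "__main__":
    sample = '''{"arxiv_id": "2409.05591",
                 "src_url": "https://arxiv.org/pdf/2409.05591",
                 "title": "Paper Title", "tldr": "Brief summary...",
                 "keywords": ["AI", "Machine Learning"],
                 "publish_at": "2024-09-05", "citations": 42}'''
    print(parse_brief(sample))
```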
👀 Preview Paper Content
Returns the first N characters of the paper for a quick preview. N defaults to 10,000 and can be set anywhere from 100 to 100,000. Useful for mobile devices or when you want to quickly scan the introduction.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| arxiv_id | string | Required | arXiv paper ID |
| type | string | Required | Must be "preview" |
| characters | integer | Optional | Number of characters to return (default: 10000, range: 100-100000) |
Response Fields
- preview: First N characters (configurable)
- is_truncated: Whether content was truncated
- total_characters: Total characters in full document
- preview_characters: Actual characters in preview
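The sketch below builds a preview URL and keeps the characters value inside the documented range. Clamping on the client is a design choice made for this example, since the server's behavior for out-of-range values is not described here. The helper names are hypothetical, and query-string parameters are assumed.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://data.rag.ac.cn/arxiv/"
MIN_CHARACTERS = 100
MAX_CHARACTERS = 100_000
DEFAULT_CHARACTERS = 10_000


def preview_url(arxiv_id: str, characters: int = DEFAULT_CHARACTERS,
                token: Optional[str] = None) -> str:
    """Build a preview URL, clamping `characters` to the documented range."""
    clamped = max(MIN_CHARACTERS, min(MAX_CHARACTERS, characters))
    params = {"arxiv_id": arxiv_id, "type": "preview", "characters": clamped}
    if token:
        params["token"] = token
    return f"{BASE_URL}?{urlencode(params)}"


def preview_coverage(response: dict) -> float:
    """Fraction of the full document that the preview contains."""
    total = response["total_characters"]
    return response["preview_characters"] / total if total else 1.0


if __name__ == "__main__":
    print(preview_url("2409.05591", characters=2000))
```

A response with is_truncated set to true and a low coverage value is a hint that fetching individual sections or the full content may be worthwhile.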
📄 Get Full Content
Returns the complete paper content in Markdown format.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| arxiv_id | string | Required | arXiv paper ID |
| type | string | Required | Must be "raw" |
📑 Get Specific Section
Returns content from a specific section of the paper (e.g., "Introduction", "Conclusion").
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| arxiv_id | string | Required | arXiv paper ID |
| type | string | Required | Must be "section" |
| section | string | Required | Section name (e.g., "Introduction", "Methods") |
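Section names often contain spaces, so they need URL encoding. A minimal sketch, assuming query-string parameters (section_url is a hypothetical helper, and whether section matching is case-sensitive is not specified here):

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://data.rag.ac.cn/arxiv/"


def section_url(arxiv_id: str, section: str, token: Optional[str] = None) -> str:
    """Build a section URL; urlencode escapes spaces and punctuation."""
    if not section.strip():
        raise ValueError("section name must not be empty")
    params = {"arxiv_id": arxiv_id, "type": "section", "section": section}
    if token:
        params["token"] = token
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(section_url("2409.05591", "Related Work"))
```

The sections field of the metadata endpoint lists the section names available for a paper, so it is a natural source for the section argument.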
📊 Get Complete JSON
Returns the complete structured JSON file with all sections and metadata.
🌐 Get HTML View
Returns a rendered HTML page for viewing in a browser.
🔍 Search & Retrieve
Unified semantic retrieval over arXiv, bioRxiv and medRxiv.
Powered by the upstream jianlv retrieval service (semantic + section + RoC indexes with optional fine reranking), with token authentication, daily quota, and Redis caching layered on top.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | string | Required | Must be "retrieve" |
| query | string | Required | Search query (max 500 chars) |
| source | string | Optional | arxiv (default) / biorxiv / medrxiv |
| top_k | integer | Optional | Number of results, 1–100 (default: 10) |
| offset | integer | Optional | Pagination offset, 0–10000 (default: 0) |
| authors | array[string] | Optional | Author list (filters & affects ranking). Repeat the param for each value. |
| orgs | array[string] | Optional | Organization list (filters & affects ranking). Repeat the param for each value. |
| date_search_type | string | Optional | between / exact / after / before. Must be paired with date_str. |
| date_str | string or array[string] | Optional | Format YYYY / YYYY-MM / YYYY-MM-DD. For between, repeat the param twice (start, end). |
| min_citation | integer | Optional | Minimum citation count (filter, no rerank impact) |
| categories | array[string] | Optional | Category filter, e.g. cs.AI, cs.CL (no rerank impact) |
| search_funcs | array[string] | Optional | Index types to use. Default ["metadata","section","roc"] |
| use_fine_rerank | bool | Optional | Apply fine reranking after recall (default: true) |
| return_contents | bool | Optional | Return retrieved section contents (default: false) |
| return_roc | bool | Optional | Return retrieved RoC list (default: false) |
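The sketch below shows one way to assemble a retrieve query from the table above. It covers a subset of the parameters, repeats array parameters once per value, and checks the documented ranges before sending. retrieve_url is a hypothetical helper, and the example assumes the search endpoint shares the arXiv base URL and takes its parameters in the query string.

```python
from typing import Optional, Sequence, Union
from urllib.parse import urlencode

BASE_URL = "https://data.rag.ac.cn/arxiv/"  # assumption: retrieve uses this base URL


def retrieve_url(
    query: str,
    *,
    source: str = "arxiv",
    top_k: int = 10,
    offset: int = 0,
    authors: Sequence[str] = (),
    categories: Sequence[str] = (),
    date_search_type: Optional[str] = None,
    date_str: Union[str, Sequence[str]] = (),
    token: Optional[str] = None,
) -> str:
    """Build a search URL, validating the documented limits first."""
    dates = [date_str] if isinstance(date_str, str) else list(date_str)
    if not 1 <= len(query) <= 500:
        raise ValueError("query must be 1 to 500 characters")
    if source not in ("arxiv", "biorxiv", "medrxiv"):
        raise ValueError("source must be arxiv, biorxiv, or medrxiv")
    if not 1 <= top_k <= 100:
        raise ValueError("top_k must be between 1 and 100")
    if not 0 <= offset <= 10_000:
        raise ValueError("offset must be between 0 and 10000")
    if bool(date_search_type) != bool(dates):
        raise ValueError("date_search_type and date_str must be given together")
    if date_search_type == "between" and len(dates) != 2:
        raise ValueError("'between' needs exactly two date_str values (start, end)")

    # A list of pairs lets the same key appear more than once.
    params = [("type", "retrieve"), ("query", query), ("source", source),
              ("top_k", top_k), ("offset", offset)]
    params += [("authors", name) for name in authors]
    params += [("categories", cat) for cat in categories]
    if date_search_type:
        params.append(("date_search_type", date_search_type))
        params += [("date_str", day) for day in dates]
    if token:
        params.append(("token", token))
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(retrieve_url("agent memory", categories=["cs.AI", "cs.CL"],
                       date_search_type="between",
                       date_str=["2024-01", "2024-06"]))
```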
Response Format
The ID field name follows the requested source: arxiv_id / biorxiv_id / medrxiv_id.
{
  "status": "success",
  "total_count": 3,
  "result": [
    {
      "arxiv_id": "2506.18871",
      "score": 0.9475,
      "title": "Paper Title",
      "tldr": "...",
      "abstract": "...",
      "authors": [{ "name": "...", "orgs": ["..."] }],
      "url": "https://arxiv.org/abs/2506.18871",
      "date": "2025-06-23T17:38:54Z",
      "citation_count": 217,
      "categories": ["cs.CV"],
      "contents": [{ "section_name": "...", "section_contents": ["..."] }], // when return_contents=true
      "roc": ["..."] // when return_roc=true
    }
  ]
}
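Because the ID field name depends on the source, client code needs to pick the right key when reading results. A minimal sketch, assuming the response has already been decoded from JSON (ranked_ids is a hypothetical helper; status values other than "success" are not documented here, so anything else is treated as an error):

```python
from typing import List, Tuple

# ID field name per source, as described above.
ID_FIELDS = {"arxiv": "arxiv_id", "biorxiv": "biorxiv_id", "medrxiv": "medrxiv_id"}


def ranked_ids(response: dict, source: str = "arxiv") -> List[Tuple[str, float]]:
    """Return (paper ID, score) pairs in the order the API returned them."""
    if response.get("status") != "success":
        raise RuntimeError(f"search failed with status {response.get('status')!r}")
    id_field = ID_FIELDS[source]
    return [(item[id_field], item["score"]) for item in response.get("result", [])]


if __name__ == "__main__":
    sample = {
        "status": "success",
        "total_count": 1,
        "result": [{"arxiv_id": "2506.18871", "score": 0.9475}],
    }
    print(ranked_ids(sample))
```

The `//` remarks in the sample response are annotations for the reader; a strict JSON parser would reject them.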
🎁 Free Queries
These queries don't require a token (case-insensitive, exact match):
- transformer
- attention mechanism
- large language model
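A client can check locally whether a query is on the free list before deciding to attach a token. This sketch follows the stated rule (case-insensitive, exact match); whether the server trims surrounding whitespace is not specified, so the example does not trim either. is_free_query is a hypothetical helper.

```python
FREE_QUERIES = {"transformer", "attention mechanism", "large language model"}


def is_free_query(query: str) -> bool:
    """True if the query matches a free query exactly, ignoring case."""
    return query.casefold() in FREE_QUERIES


if __name__ == "__main__":
    for q in ("Transformer", "Attention Mechanism", "transformers"):
        print(q, is_free_query(q))
```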
⚠️ Migration note: The legacy parameters size, search_mode, bm25_weight, vector_weight, date_from, and date_to are no longer supported. Use top_k, date_search_type, and date_str instead.
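Older client code can be adapted with a small translation step. The mapping below is an interpretation of the migration note rather than a documented rule: it assumes size corresponds to top_k, that date_from and date_to correspond to the after, before, and between search types, and that the remaining legacy parameters have no replacement and can be dropped. migrate_params is a hypothetical helper.

```python
from typing import List, Tuple

# Assumption: these legacy parameters have no direct replacement.
DROPPED = {"search_mode", "bm25_weight", "vector_weight"}


def migrate_params(legacy: dict) -> List[Tuple[str, object]]:
    """Translate legacy search parameters into the current ones."""
    date_from = legacy.get("date_from")
    date_to = legacy.get("date_to")

    params: List[Tuple[str, object]] = []
    for key, value in legacy.items():
        if key in DROPPED or key in ("date_from", "date_to"):
            continue
        # Assumption: `size` maps one-to-one onto `top_k`.
        params.append(("top_k" if key == "size" else key, value))

    if date_from and date_to:
        params += [("date_search_type", "between"),
                   ("date_str", date_from), ("date_str", date_to)]
    elif date_from:
        params += [("date_search_type", "after"), ("date_str", date_from)]
    elif date_to:
        params += [("date_search_type", "before"), ("date_str", date_to)]
    return params


if __name__ == "__main__":
    old = {"query": "graph neural networks", "size": 5,
           "search_mode": "hybrid", "date_from": "2024-01-01"}
    print(migrate_params(old))
```

Whether the date bounds are inclusive may differ between the old and new parameters, so results near the boundaries are worth spot-checking after migrating.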
📈 Get Trending Signal
Get social media engagement metrics for a paper, including tweets, likes, views, and replies. Track how papers are trending in the research community.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| arxiv_id | string | Required | arXiv paper ID (e.g., 2409.05591) |
| token | string | Required | API token (required for all requests) |
Response Fields
- arxiv_id: arXiv paper ID
- total_tweets: Total number of tweets mentioning the paper
- total_likes: Total likes across all tweets
- total_views: Total views across all tweets
- total_replies: Total replies to tweets about the paper
- first_seen_date: When the paper was first mentioned
- last_seen_date: Most recent mention
Example Response
{
  "arxiv_id": "2409.05591",
  "total_tweets": 150,
  "total_likes": 3200,
  "total_views": 25000,
  "total_replies": 45,
  "first_seen_date": "2024-09-05T10:30:00",
  "last_seen_date": "2024-09-10T14:20:00"
}
Use Cases
- 📊 Track paper virality and impact
- 🔥 Identify trending papers in your field
- 📅 Monitor engagement timeline
- 🎯 Discover influential research
💡 Note: If a paper has no social media engagement, you'll receive a 404 error. This is normal for papers that haven't been discussed on Twitter yet.
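Since a 404 here means "no engagement yet" rather than a failure, client code may want to treat it as an empty result. The sketch below works on an HTTP status code and response body that the caller has already obtained, so it makes no assumption about the endpoint URL. parse_trending and days_active are hypothetical helpers.

```python
import json
from datetime import datetime
from typing import Optional


def parse_trending(status: int, body: str) -> Optional[dict]:
    """Return the trending signal, or None when the paper has no engagement."""
    if status == 404:
        return None  # normal for papers not yet discussed
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status {status}")
    return json.loads(body)


def days_active(signal: dict) -> int:
    """Whole days between the first and the most recent mention."""
    first = datetime.fromisoformat(signal["first_seen_date"])
    last = datetime.fromisoformat(signal["last_seen_date"])
    return (last - first).days


if __name__ == "__main__":
    body = ('{"arxiv_id": "2409.05591", "total_tweets": 150, '
            '"first_seen_date": "2024-09-05T10:30:00", '
            '"last_seen_date": "2024-09-10T14:20:00"}')
    signal = parse_trending(200, body)
    print(days_active(signal))
```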
🏥 PMC Endpoints
Access PubMed Central (PMC) research articles. PMC is a free full-text archive of biomedical and life sciences journal literature.
🎁 Free Testing
Papers PMC544940 and PMC514704 are available without authentication for testing purposes.
🌐 Base URL
https://data.rag.ac.cn/pmc/
📋 Get PMC Paper Metadata
Returns structured metadata including title, DOI, abstract, authors, categories, and publication date.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| pmc_id | string | Required | PMC paper ID (e.g., PMC544940, PMC514704) |
| type | string | Optional | Defaults to "head"; may be omitted |
| token | string | Optional | API token (not required for free papers) |
Response Fields
- pmc_id: PMC paper ID
- title: Paper title
- doi: Digital Object Identifier
- abstract: Paper abstract
- authors: List of authors
- categories: Medical subject categories
- publish_at: Publication date
📊 Get PMC Complete JSON
Returns the complete structured JSON file with full paper content and metadata.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| pmc_id | string | Required | PMC paper ID (e.g., PMC544940, PMC514704) |
| type | string | Required | Must be "json" |
| token | string | Optional | API token (not required for free papers) |
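Both PMC endpoints can be reached through one URL builder. This sketch assumes query-string parameters and, based only on the example IDs shown above, that a PMC ID is the prefix "PMC" followed by digits. pmc_url is a hypothetical helper.

```python
import re
from typing import Optional
from urllib.parse import urlencode

PMC_BASE_URL = "https://data.rag.ac.cn/pmc/"
# Assumption inferred from the example IDs: "PMC" followed by digits.
PMC_ID_PATTERN = re.compile(r"PMC\d+")


def pmc_url(pmc_id: str, type_: str = "head", token: Optional[str] = None) -> str:
    """Build a PMC URL for the metadata ("head") or full JSON ("json") view."""
    if not PMC_ID_PATTERN.fullmatch(pmc_id):
        raise ValueError(f"unexpected PMC ID format: {pmc_id!r}")
    if type_ not in ("head", "json"):
        raise ValueError('type must be "head" or "json"')
    params = {"pmc_id": pmc_id, "type": type_}
    if token:
        params["token"] = token
    return f"{PMC_BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # PMC544940 is one of the free test papers, so no token is needed.
    print(pmc_url("PMC544940"))
    print(pmc_url("PMC544940", "json"))
```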
⚠️ Error Handling
HTTP Status Codes
| Code | Meaning | Description |
|---|---|---|
| 200 | Success | Request successful |
| 400 | Bad Request | Invalid parameters |
| 401 | Unauthorized | Invalid or missing token |
| 404 | Not Found | Paper not found |
| 429 | Too Many Requests | Rate limit exceeded |
| 503 | Service Unavailable | Retrieval service error |
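One way to surface these codes in client code is a single exception type that carries the status and its documented meaning. This is a sketch, with ApiError and raise_for_status as hypothetical names; codes not listed in the table are reported as undocumented.

```python
class ApiError(Exception):
    """Raised for any non-200 response."""

    def __init__(self, status: int, meaning: str) -> None:
        super().__init__(f"HTTP {status}: {meaning}")
        self.status = status
        self.meaning = meaning


# Meanings taken from the status code table above.
STATUS_MEANINGS = {
    400: "Bad Request (invalid parameters)",
    401: "Unauthorized (invalid or missing token)",
    404: "Not Found (paper not found)",
    429: "Too Many Requests (rate limit exceeded)",
    503: "Service Unavailable (retrieval service error)",
}


def raise_for_status(status: int) -> None:
    """Do nothing on 200; raise ApiError for every other status code."""
    if status == 200:
        return
    raise ApiError(status, STATUS_MEANINGS.get(status, "undocumented status code"))


if __name__ == "__main__":
    try:
        raise_for_status(429)
    except ApiError as err:
        print(err)
```

Callers can branch on err.status, for example treating 404 from the trending endpoint as "no data" and stopping further requests for the day on 429.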
📊 Rate Limits
Each API token has a daily limit of 10,000 free requests.
When you exceed this limit, you'll receive a 429 Too Many Requests error.
Need higher limits? Contact tommy[at]chien.io with your use case.
Checking Usage
View your usage statistics for the past N days (1-30).