📚 API Documentation

Welcome to the Agentic Data Interface API. This REST API provides AI agents and developers with efficient access to arXiv academic papers in various formats.

🎁 Free Testing

Papers 2409.05591 and 2504.21776 are available without authentication for testing purposes.

⚡ Key Features

🚀 Fast - Redis Cached
🎯 On-Demand Loading
📦 Multiple Formats
🔍 Built-in Search

🌐 Base URL

https://data.rag.ac.cn/arxiv/

🔑 Authentication

Most endpoints require a valid API token. You can provide the token in two ways:

Method 1: Authorization Header

Authorization: Bearer YOUR_TOKEN_HERE

Method 2: Query Parameter

?token=YOUR_TOKEN_HERE
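
As a rough illustration, the two methods might be used from Python's standard library as shown below. This is a minimal sketch: the helper names are hypothetical, the base URL is the one listed in this document, and no request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://data.rag.ac.cn/arxiv/"


def request_with_header(params: dict, token: str) -> Request:
    """Method 1: carry the token in the Authorization header."""
    url = f"{BASE_URL}?{urlencode(params)}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


def request_with_query_token(params: dict, token: str) -> Request:
    """Method 2: append the token as a query parameter."""
    url = f"{BASE_URL}?{urlencode({**params, 'token': token})}"
    return Request(url)


params = {"type": "head", "arxiv_id": "2409.05591"}
header_request = request_with_header(params, "YOUR_TOKEN_HERE")
query_request = request_with_query_token(params, "YOUR_TOKEN_HERE")
```

Either request object can then be passed to `urllib.request.urlopen`. The header form keeps the token out of URLs that may end up in logs, which is a common reason to prefer it.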

🎁 Free Papers (No Token Required)

Papers 2409.05591 and 2504.21776 can be accessed without a token.

📝 Getting a Token

Visit /register to create an account and get your API token. Each token includes 10,000 free daily requests. Need more? Contact us with your use case.

🔌 API Endpoints

All paper endpoints share the same base URL; the type query parameter selects the data format.
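
Because only the type value changes between endpoints, a single URL builder can cover all of them. The sketch below assumes the type values listed in this document; the function name and the validation step are illustrative, not part of the API.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.rag.ac.cn/arxiv/"

# type values documented for single-paper endpoints
PAPER_TYPES = {"head", "brief", "preview", "raw", "section", "json", "markdown"}


def paper_url(data_type: str, arxiv_id: str, **extra: object) -> str:
    """Build a paper URL for the given type; extra keys become query parameters."""
    if data_type not in PAPER_TYPES:
        raise ValueError(f"unknown type: {data_type!r}")
    query = {"type": data_type, "arxiv_id": arxiv_id, **extra}
    return f"{BASE_URL}?{urlencode(query)}"


brief_url = paper_url("brief", "2409.05591")
raw_url = paper_url("raw", "2504.21776")
```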

📋 Get Paper Metadata

GET /arxiv/?type=head&arxiv_id={PAPER_ID}

Returns structured metadata including title, abstract, authors, sections, and statistics.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| arxiv_id | string | Required | arXiv paper ID (e.g., 2409.05591, 2504.21776) |
| type | string | Required | Must be "head" |
| token | string | Optional | API token (not required for free papers) |

Response Fields

  • title: Paper title
  • abstract: Paper abstract
  • authors: List of authors
  • sections: Section names and metadata
  • token_count: Total tokens in the paper
  • categories: arXiv categories
  • publish_at: Publication date
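
One way an agent might use the token_count field is to decide whether the whole paper fits its context budget before requesting it. The sketch below assumes token_count is an integer; the function name, the budget values, and the sample metadata are hypothetical.

```python
def choose_fetch_type(head: dict, token_budget: int) -> str:
    """Return "raw" when the whole paper fits the budget, otherwise "section"."""
    token_count = head.get("token_count")
    if isinstance(token_count, int) and token_count <= token_budget:
        return "raw"
    # Unknown or oversized papers are read one section at a time.
    return "section"


# Hypothetical metadata; only the field names come from the list above.
sample_head = {"title": "Paper Title", "token_count": 18000}

fits = choose_fetch_type(sample_head, token_budget=32000)
too_large = choose_fetch_type(sample_head, token_budget=8000)
```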

📌 Get Brief Information

GET /arxiv/?type=brief&arxiv_id={PAPER_ID}

Returns concise paper information including title, TLDR, keywords, publication date, and citation count. Perfect for quick summaries and list views.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| arxiv_id | string | Required | arXiv paper ID (e.g., 2409.05591, 2504.21776) |
| type | string | Required | Must be "brief" |
| token | string | Optional | API token (not required for free papers) |

Response Fields

  • arxiv_id: arXiv paper ID
  • src_url: Direct link to PDF
  • title: Paper title
  • tldr: AI-generated summary (if available)
  • keywords: List of keywords (if available)
  • publish_at: Publication date
  • citations: Citation count

Example Response

{
  "arxiv_id": "2409.05591",
  "src_url": "https://arxiv.org/pdf/2409.05591",
  "title": "Paper Title",
  "tldr": "Brief summary...",
  "keywords": ["AI", "Machine Learning"],
  "publish_at": "2024-09-05",
  "citations": 42
}
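
A client could map this response onto a small typed record. The sketch below parses the example response shown above; it assumes that tldr and keywords may be missing or null when unavailable, and the class and function names are illustrative.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Brief:
    arxiv_id: str
    src_url: str
    title: str
    publish_at: str
    citations: int
    tldr: Optional[str] = None
    keywords: List[str] = field(default_factory=list)


def parse_brief(payload: str) -> Brief:
    """Parse a type=brief JSON body, tolerating absent optional fields."""
    data = json.loads(payload)
    return Brief(
        arxiv_id=data["arxiv_id"],
        src_url=data["src_url"],
        title=data["title"],
        publish_at=data["publish_at"],
        citations=data["citations"],
        tldr=data.get("tldr"),
        keywords=data.get("keywords") or [],
    )


example = """{
  "arxiv_id": "2409.05591",
  "src_url": "https://arxiv.org/pdf/2409.05591",
  "title": "Paper Title",
  "tldr": "Brief summary...",
  "keywords": ["AI", "Machine Learning"],
  "publish_at": "2024-09-05",
  "citations": 42
}"""
brief = parse_brief(example)
```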

👀 Preview Paper Content

GET /arxiv/?type=preview&arxiv_id={PAPER_ID}

Returns a configurable number of characters from the paper for quick preview. Default is 10,000 characters, but you can adjust it from 100 to 100,000. Useful for mobile devices or when you want to quickly scan the introduction.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| arxiv_id | string | Required | arXiv paper ID |
| type | string | Required | Must be "preview" |
| characters | integer | Optional | Number of characters to return (default: 10000, range: 100-100000) |

Response Fields

  • preview: First N characters (configurable)
  • is_truncated: Whether content was truncated
  • total_characters: Total characters in full document
  • preview_characters: Actual characters in preview
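
The sketch below shows one way to keep the characters value inside the documented range and to work out how much of the paper the preview left unread. It assumes the three numeric and boolean response fields have the obvious types; the helper names and the sample response are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.rag.ac.cn/arxiv/"
MIN_CHARS, MAX_CHARS = 100, 100_000


def preview_url(arxiv_id: str, characters: int = 10_000) -> str:
    """Build a preview URL, clamping characters to the documented range."""
    clamped = max(MIN_CHARS, min(MAX_CHARS, characters))
    query = {"type": "preview", "arxiv_id": arxiv_id, "characters": clamped}
    return f"{BASE_URL}?{urlencode(query)}"


def remaining_characters(response: dict) -> int:
    """Characters not covered by the preview (0 when nothing was truncated)."""
    if not response["is_truncated"]:
        return 0
    return response["total_characters"] - response["preview_characters"]


short_url = preview_url("2409.05591", characters=50)
# Hypothetical response values, used only to exercise the helper.
left = remaining_characters(
    {"is_truncated": True, "total_characters": 52000, "preview_characters": 10000}
)
```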

📄 Get Full Content

GET /arxiv/?type=raw&arxiv_id={PAPER_ID}

Returns the complete paper content in Markdown format.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| arxiv_id | string | Required | arXiv paper ID |
| type | string | Required | Must be "raw" |

📑 Get Specific Section

GET /arxiv/?type=section&arxiv_id={PAPER_ID}&section={SECTION_NAME}

Returns content from a specific section of the paper (e.g., "Introduction", "Conclusion").

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| arxiv_id | string | Required | arXiv paper ID |
| type | string | Required | Must be "section" |
| section | string | Required | Section name (e.g., "Introduction", "Methods") |
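
Section names often contain spaces or punctuation, so they should be URL-encoded before being placed in the query string. The following sketch uses standard form encoding; that the server matches the decoded name against the section names reported by the metadata endpoint is an assumption, and the function name is illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.rag.ac.cn/arxiv/"


def section_url(arxiv_id: str, section: str) -> str:
    """Build a section URL; urlencode escapes spaces, '&', and similar characters."""
    query = {"type": "section", "arxiv_id": arxiv_id, "section": section}
    return f"{BASE_URL}?{urlencode(query)}"


related_work_url = section_url("2409.05591", "Related Work")
```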

📊 Get Complete JSON

GET /arxiv/?type=json&arxiv_id={PAPER_ID}

Returns the complete structured JSON file with all sections and metadata.

🌐 Get HTML View

GET /arxiv/?type=markdown&arxiv_id={PAPER_ID}

Returns the paper as a rendered HTML page for viewing in a browser. The type value for this view is "markdown".


🔍 Search & Retrieve

GET /arxiv/?type=retrieve&query={QUERY}

Provides unified semantic retrieval over arXiv, bioRxiv, and medRxiv. The endpoint is backed by the upstream jianlv retrieval service (semantic + section + RoC indexes with optional fine reranking); token authentication, the daily quota, and Redis caching are layered on top.

Sources

📚 arxiv (default)
🧬 biorxiv
🏥 medrxiv

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| type | string | Required | Must be "retrieve" |
| query | string | Required | Search query (max 500 chars) |
| source | string | Optional | arxiv (default) / biorxiv / medrxiv |
| top_k | integer | Optional | Number of results, 1–100 (default: 10) |
| offset | integer | Optional | Pagination offset, 0–10000 (default: 0) |
| authors | array[string] | Optional | Author list (filters & affects ranking). Repeat the param for each value. |
| orgs | array[string] | Optional | Organization list (filters & affects ranking). Repeat the param for each value. |
| date_search_type | string | Optional | between / exact / after / before. Must be paired with date_str. |
| date_str | string or array[string] | Optional | Format YYYY / YYYY-MM / YYYY-MM-DD. For between, repeat the param twice (start, end). |
| min_citation | integer | Optional | Minimum citation count (filter, no rerank impact) |
| categories | array[string] | Optional | Category filter, e.g. cs.AI, cs.CL (no rerank impact) |
| search_funcs | array[string] | Optional | Index types to use. Default ["metadata","section","roc"] |
| use_fine_rerank | bool | Optional | Apply fine reranking after recall (default: true) |
| return_contents | bool | Optional | Return retrieved section contents (default: false) |
| return_roc | bool | Optional | Return retrieved RoC list (default: false) |
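
Array parameters are sent by repeating the parameter name, and a between date filter needs date_str twice. The sketch below builds such a query string with a list of key/value pairs. It covers only a subset of the parameters, the function name is hypothetical, and the client-side checks simply mirror the limits in the table above.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

BASE_URL = "https://data.rag.ac.cn/arxiv/"


def retrieve_url(
    query: str,
    *,
    source: str = "arxiv",
    top_k: int = 10,
    offset: int = 0,
    authors: Sequence[str] = (),
    categories: Sequence[str] = (),
    date_search_type: Optional[str] = None,
    date_str: Sequence[str] = (),
) -> str:
    """Build a type=retrieve URL, repeating array parameters once per value."""
    if len(query) > 500:
        raise ValueError("query is limited to 500 characters")
    if not 1 <= top_k <= 100:
        raise ValueError("top_k must be between 1 and 100")
    if not 0 <= offset <= 10000:
        raise ValueError("offset must be between 0 and 10000")
    if bool(date_search_type) != bool(date_str):
        raise ValueError("date_search_type and date_str must be given together")
    if date_search_type == "between" and len(date_str) != 2:
        raise ValueError("between needs exactly two date_str values (start, end)")

    pairs = [
        ("type", "retrieve"),
        ("query", query),
        ("source", source),
        ("top_k", top_k),
        ("offset", offset),
    ]
    pairs += [("authors", name) for name in authors]
    pairs += [("categories", category) for category in categories]
    if date_search_type:
        pairs.append(("date_search_type", date_search_type))
        pairs += [("date_str", value) for value in date_str]
    return f"{BASE_URL}?{urlencode(pairs)}"


url = retrieve_url(
    "graph neural networks",
    top_k=5,
    categories=["cs.AI", "cs.CL"],
    date_search_type="between",
    date_str=["2024-01", "2024-06"],
)
```

For the exact, after, and before modes, pass a single-element date_str sequence.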

Response Format

The ID field name follows the requested source: arxiv_id / biorxiv_id / medrxiv_id.

{
  "status": "success",
  "total_count": 3,
  "result": [
    {
      "arxiv_id": "2506.18871",
      "score": 0.9475,
      "title": "Paper Title",
      "tldr": "...",
      "abstract": "...",
      "authors": [{ "name": "...", "orgs": ["..."] }],
      "url": "https://arxiv.org/abs/2506.18871",
      "date": "2025-06-23T17:38:54Z",
      "citation_count": 217,
      "categories": ["cs.CV"],
      "contents": [{ "section_name": "...", "section_contents": ["..."] }],  // when return_contents=true
      "roc": ["..."]  // when return_roc=true
    }
  ]
}
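
Since the ID field name depends on the source, a client can derive it from the source value instead of hard-coding arxiv_id. The following is a minimal sketch over an already-decoded response; the function name is hypothetical, and the sample data is trimmed from the example above.

```python
from typing import List, Tuple


def ranked_ids(response: dict, source: str = "arxiv") -> List[Tuple[str, float]]:
    """Return (paper ID, score) pairs using the source-specific ID field."""
    if response.get("status") != "success":
        raise RuntimeError(f"retrieval failed: {response.get('status')!r}")
    id_field = f"{source}_id"  # arxiv_id / biorxiv_id / medrxiv_id
    return [(item[id_field], item["score"]) for item in response["result"]]


sample = {
    "status": "success",
    "total_count": 3,
    "result": [{"arxiv_id": "2506.18871", "score": 0.9475, "title": "Paper Title"}],
}
hits = ranked_ids(sample, source="arxiv")
```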

🎁 Free Queries

These queries don't require a token (case-insensitive, exact match):

  • transformer
  • attention mechanism
  • large language model

⚠️ Migration note: Legacy parameters size, search_mode, bm25_weight, vector_weight, date_from, date_to are no longer supported. Use top_k, date_search_type, date_str instead.
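
When migrating older client code, it can help to detect leftover legacy parameters before sending a request. The helper below is a hypothetical sketch that only reports them; it does not attempt to translate values, since the exact mapping is not specified here.

```python
LEGACY_PARAMS = {
    "size", "search_mode", "bm25_weight", "vector_weight", "date_from", "date_to",
}


def find_legacy_params(params: dict) -> list:
    """Return the legacy parameter names present in params, sorted by name."""
    return sorted(LEGACY_PARAMS.intersection(params))


leftovers = find_legacy_params({"query": "transformer", "size": 20, "date_from": "2024"})
```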

🏥 PMC Endpoints

Access PubMed Central (PMC) research articles. PMC is a free full-text archive of biomedical and life sciences journal literature.

🎁 Free Testing

Papers PMC544940 and PMC514704 are available without authentication for testing purposes.

🌐 Base URL

https://data.rag.ac.cn/pmc/

📋 Get PMC Paper Metadata

GET /pmc/?type=head&pmc_id={PMC_ID}

Returns structured metadata including title, DOI, abstract, authors, categories, and publication date.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| pmc_id | string | Required | PMC paper ID (e.g., PMC544940, PMC514704) |
| type | string | Optional | Must be "head" (default) |
| token | string | Optional | API token (not required for free papers) |

Response Fields

  • pmc_id: PMC paper ID
  • title: Paper title
  • doi: Digital Object Identifier
  • abstract: Paper abstract
  • authors: List of authors
  • categories: Medical subject categories
  • publish_at: Publication date

📊 Get PMC Complete JSON

GET /pmc/?type=json&pmc_id={PMC_ID}

Returns the complete structured JSON file with full paper content and metadata.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| pmc_id | string | Required | PMC paper ID (e.g., PMC544940, PMC514704) |
| type | string | Required | Must be "json" |
| token | string | Optional | API token (not required for free papers) |
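
The two PMC endpoints differ only in the type value, so one helper can build both URLs. The sketch below assumes PMC IDs always have the form "PMC" followed by digits, as in the examples; that pattern and the function name are assumptions rather than documented rules.

```python
import re
from typing import Optional
from urllib.parse import urlencode

PMC_BASE_URL = "https://data.rag.ac.cn/pmc/"
PMC_ID_PATTERN = re.compile(r"PMC\d+")  # assumed ID shape, based on the examples


def pmc_url(pmc_id: str, data_type: str = "head", token: Optional[str] = None) -> str:
    """Build a PMC URL for type "head" (the default) or "json"."""
    if not PMC_ID_PATTERN.fullmatch(pmc_id):
        raise ValueError(f"unexpected PMC ID: {pmc_id!r}")
    if data_type not in ("head", "json"):
        raise ValueError(f"unknown type: {data_type!r}")
    query = {"type": data_type, "pmc_id": pmc_id}
    if token:
        query["token"] = token
    return f"{PMC_BASE_URL}?{urlencode(query)}"


metadata_url = pmc_url("PMC544940")
full_json_url = pmc_url("PMC514704", "json")
```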


⚠️ Error Handling

HTTP Status Codes

| Code | Meaning | Description |
|---|---|---|
| 200 | Success | Request successful |
| 400 | Bad Request | Invalid parameters |
| 401 | Unauthorized | Invalid or missing token |
| 404 | Not Found | Paper not found |
| 429 | Too Many Requests | Rate limit exceeded |
| 503 | Service Unavailable | Retrieval service error |
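
A client might translate these status codes into a single exception type so that callers see the documented description. The sketch below uses only the standard library and assumes the endpoint being called returns a JSON body (the raw and HTML views do not); ApiError and fetch_json are hypothetical names.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen

STATUS_DESCRIPTIONS = {
    200: "Request successful",
    400: "Invalid parameters",
    401: "Invalid or missing token",
    404: "Paper not found",
    429: "Rate limit exceeded",
    503: "Retrieval service error",
}


class ApiError(Exception):
    """Raised for non-2xx responses, carrying the documented description."""

    def __init__(self, status: int):
        self.status = status
        self.description = STATUS_DESCRIPTIONS.get(status, "Undocumented status")
        super().__init__(f"HTTP {status}: {self.description}")


def fetch_json(url: str) -> dict:
    """GET a JSON endpoint and convert HTTP errors into ApiError."""
    try:
        with urlopen(url, timeout=30) as response:
            return json.load(response)
    except HTTPError as err:
        raise ApiError(err.code) from err
```

Callers can then branch on `error.status`, for example backing off until the next day on 429 or retrying later on 503.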

📊 Rate Limits

Each API token has a daily limit of 10,000 free requests. When you exceed this limit, you'll receive a 429 Too Many Requests error. Need higher limits? Contact tommy[at]chien.io with your use case.

Checking Usage

GET /stats/usage?days=7

View your usage statistics for the past N days (1-30).
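
A usage check could be prepared as follows. This is a sketch under two assumptions that the text above does not confirm: that /stats/usage lives on the same host as the paper endpoints, and that it accepts the token in the Authorization header. The function name is hypothetical and no request is sent.

```python
from urllib.request import Request

# Assumed host; the documentation only gives the path /stats/usage.
USAGE_URL = "https://data.rag.ac.cn/stats/usage"


def usage_request(token: str, days: int = 7) -> Request:
    """Prepare a usage-statistics request for the past `days` days (1-30)."""
    if not 1 <= days <= 30:
        raise ValueError("days must be between 1 and 30")
    return Request(
        f"{USAGE_URL}?days={days}",
        headers={"Authorization": f"Bearer {token}"},
    )


weekly = usage_request("YOUR_TOKEN_HERE", days=7)
```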
