Agentic Data Interface

Project Objective

1. Web data is not agent-friendly. Raw web content is cluttered with ads, navigation elements, and inconsistent formatting that confuse AI agents.

2. Reading full documents directly is inefficient. Loading an entire document wastes tokens and time; agents need compact previews to decide whether a document is worth reading and which sections matter.

3. We refine data into an LLM-ready format: clean markdown with proper structure, semantic markup, and consistent formatting optimized for language model consumption.

4. We provide head files for intelligent decision-making. These metadata headers include token counts, section summaries, key entities, and structural information, so agents can preview and select content without loading full documents.

5. We focus first on academic data to accelerate scientific discovery. Starting with arXiv papers and academic publications, we aim to enable AI4Science, helping researchers discover insights faster and advance scientific knowledge through intelligent data access.
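The preview-then-select workflow described in points 2 and 4 can be sketched in a few lines of Python. The head schema is not spelled out on this page, so the field names below (`token_count`, `sections`, and per-section `title` and `tokens`) are hypothetical placeholders, as is the `plan_reading` helper; adapt them to the fields the API actually returns.

```python
from __future__ import annotations

from typing import Any


def plan_reading(head: dict[str, Any], budget: int) -> list[str] | None:
    """Decide how much of a paper to load, based only on its head.

    Returns None when the whole paper fits within `budget` tokens, meaning
    the agent can simply load the full content. Otherwise returns the titles
    of the sections to load, picked greedily in document order and skipping
    any section larger than the budget still remaining.

    NOTE: "token_count", "sections", "title" and "tokens" are hypothetical
    field names used for illustration only.
    """
    if head["token_count"] <= budget:
        return None  # small enough to read in full
    chosen: list[str] = []
    remaining = budget
    for section in head["sections"]:
        if section["tokens"] <= remaining:
            chosen.append(section["title"])
            remaining -= section["tokens"]
    return chosen
```

Choosing sections greedily in document order is the simplest policy; an agent could instead rank sections by relevance using the section summaries the head provides.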

API Documentation

GET https://data.rag.ac.cn/arxiv/

Retrieve arXiv paper data in agent-friendly format.

🎁 Free Access: Papers 1106.0001 through 1106.0010 are available without a token for testing!
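A client can check locally whether a token is needed before sending a request. This minimal sketch hard-codes the free range stated above and assumes IDs are passed exactly in the form shown (for example `1106.0001`, with no version suffix); the `needs_token` helper is hypothetical.

```python
# The ten papers documented as free to access without a token:
# 1106.0001 through 1106.0010.
FREE_IDS = frozenset(f"1106.{n:04d}" for n in range(1, 11))


def needs_token(arxiv_id: str) -> bool:
    """Return True if requesting this paper requires an API token."""
    return arxiv_id.strip() not in FREE_IDS
```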

Parameters

arxiv_id (required): arXiv paper ID (e.g., 2501.12345, 1106.0001)
type (optional): Data format to return
- head - Metadata only (JSON)
- raw - Raw markdown content (JSON)
- markdown - Rendered HTML page (view in browser) ⭐
- (omit) - Both head and raw (JSON)
token (required except for the free papers): Your API token
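The parameters map directly onto a query string. The sketch below uses only Python's standard library to build request URLs and to reject unknown type values before any network call is made; `build_url` and `VALID_TYPES` are hypothetical names, not part of the API.

```python
from __future__ import annotations

from urllib.parse import urlencode

BASE_URL = "https://data.rag.ac.cn/arxiv/"
# Accepted values of the optional `type` parameter (omit it to get head + raw).
VALID_TYPES = {"head", "raw", "markdown"}


def build_url(arxiv_id: str, type_: str | None = None, token: str | None = None) -> str:
    """Build a request URL; `token` may be omitted only for the free papers."""
    if type_ is not None and type_ not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}, got {type_!r}")
    params = {"arxiv_id": arxiv_id}
    if type_ is not None:
        params["type"] = type_
    if token is not None:
        params["token"] = token
    # urlencode percent-escapes each value and keeps the insertion order.
    return f"{BASE_URL}?{urlencode(params)}"
```

For example, `build_url("1106.0001", "head")` produces the same URL as the metadata-only curl request below.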

Example Requests

# Free access - no token needed
curl "https://data.rag.ac.cn/arxiv/?arxiv_id=1106.0001"

# With authentication for other papers
curl "https://data.rag.ac.cn/arxiv/?arxiv_id=2501.12345&token=YOUR_TOKEN"

# Get only metadata header
curl "https://data.rag.ac.cn/arxiv/?arxiv_id=1106.0001&type=head"

# Get only content
curl "https://data.rag.ac.cn/arxiv/?arxiv_id=1106.0001&type=raw"

# Fetch the rendered HTML page (type=markdown); open this URL in a browser to view it
curl "https://data.rag.ac.cn/arxiv/?arxiv_id=1106.0001&type=markdown"
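The same JSON requests can be made from Python with only the standard library. This is a minimal sketch: `fetch_paper` is a hypothetical helper, it handles only the JSON response types (`type=markdown` returns an HTML page), and any non-2xx response surfaces as a `urllib.error.HTTPError` raised by `urlopen`. How the server reports a missing or invalid token is not documented here.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://data.rag.ac.cn/arxiv/"


def fetch_paper(arxiv_id, type_=None, token=None, timeout=30):
    """Fetch a paper and return the parsed JSON body.

    type_ must be None (head + raw), "head", or "raw"; "markdown" is
    rejected because it returns an HTML page rather than JSON.
    """
    if type_ not in (None, "head", "raw"):
        raise ValueError("fetch_paper only handles the JSON response types")
    params = {"arxiv_id": arxiv_id}
    if type_ is not None:
        params["type"] = type_
    if token is not None:
        params["token"] = token
    url = f"{BASE_URL}?{urlencode(params)}"
    # urlopen raises urllib.error.HTTPError for non-2xx status codes.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_paper("1106.0001", "head")` requests only the metadata of one of the free papers.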

📖 Try in Browser

Click these links to view formatted papers directly in your browser:

📄 1106.0001 📄 1106.0002 📄 1106.0003 📄 1106.0004

💡 Tip: These papers are free to access without a token. Perfect for testing!

Response Format

With type omitted, the endpoint returns both head and raw:

{
  "arxiv_id": "1106.0001",
  "status": "processed",
  "head": {
    // Metadata: token count, sections, etc.
  },
  "raw": "# Paper Title\n\n## Abstract\n\n..."
}
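Because raw is plain markdown, an agent can split it by heading and keep only the sections it needs. This sketch assumes headings use ATX style (`#`, `##`, and so on) as in the example above; `parse_response` and `split_sections` are hypothetical helpers, and lines inside fenced code that start with `#` would also be treated as headings.

```python
from __future__ import annotations

import json
import re

# ATX-style markdown headings such as "# Paper Title" or "## Abstract".
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def split_sections(raw: str) -> dict[str, str]:
    """Map each heading in `raw` to the text between it and the next heading.

    A heading title that appears twice keeps only its last section.
    """
    matches = list(HEADING.finditer(raw))
    sections: dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        sections[match.group(1)] = raw[match.end():end].strip()
    return sections


def parse_response(body: str) -> tuple[dict, dict[str, str]]:
    """Return the head metadata and the sectioned raw markdown of a response."""
    data = json.loads(body)
    return data.get("head", {}), split_sections(data.get("raw", ""))
```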

Development Roadmap

Phase 1: arXiv Papers (Live)

Complete collection of arXiv papers with structured metadata and markdown content.

Phase 2: Academic Papers (Planned)

Expanding to include papers from major academic publishers and conferences.

Phase 3: General Web Pages (Planned)

Comprehensive web content coverage with intelligent extraction and structuring.
