🤖 Agentic Data Interface

Unified Data Protocol for AI Agents

Project Objective

1. Web data is not agent-friendly. Raw web content contains noise, ads, navigation elements, and inconsistent formatting that confuse AI agents.

2. Reading full data directly is not efficient. Loading entire documents wastes tokens and time. Agents need smart previews to decide what to read and which sections matter.

3. We refine data into an LLM-ready format. Clean markdown with proper structure, semantic markup, and consistent formatting, optimized for language model consumption.

4. We provide head files for intelligent decision-making. Metadata headers include token counts, section summaries, key entities, and structural information, so agents can preview and select content efficiently without loading full documents.

5. First focus on academic data, accelerating scientific discovery. We start with arXiv papers and academic publications to enable AI4Science: helping researchers discover insights faster and advance scientific knowledge through intelligent data access.
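
The preview-then-read workflow described in items 2 and 4 can be sketched roughly as follows. This is only an illustration: the head file schema is not specified here, so the field names `sections`, `title`, `tokens`, and `summary` are hypothetical, and the greedy keyword match is one possible selection strategy, not the service's own method.

```python
from typing import Dict, List


def select_sections(head: Dict, keywords: List[str], token_budget: int) -> List[str]:
    """Pick section titles that mention a keyword and fit within a token budget.

    Assumes a hypothetical head layout:
    {"sections": [{"title": str, "tokens": int, "summary": str}, ...]}
    """
    wanted = [k.lower() for k in keywords]
    chosen: List[str] = []
    used = 0
    for section in head.get("sections", []):
        text = (section["title"] + " " + section.get("summary", "")).lower()
        if not any(k in text for k in wanted):
            continue  # not relevant to the agent's question
        if used + section["tokens"] > token_budget:
            continue  # relevant, but would exceed the budget
        chosen.append(section["title"])
        used += section["tokens"]
    return chosen


# Made-up head data, for illustration only.
example_head = {
    "sections": [
        {"title": "Abstract", "tokens": 200, "summary": "Overview of the method"},
        {"title": "Method", "tokens": 1500, "summary": "Describes the training method"},
        {"title": "Related Work", "tokens": 900, "summary": "Prior approaches"},
        {"title": "Results", "tokens": 1200, "summary": "Benchmark results for the method"},
    ]
}

selected = select_sections(example_head, ["method"], token_budget=2000)
```

With the made-up data above, the agent would keep "Abstract" and "Method" and skip "Results" because it no longer fits the 2000-token budget.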

API Documentation

GET https://data.rag.ac.cn/arxiv/

Retrieve arXiv paper data in agent-friendly format.

🎁 Free Access: Papers 1106.0001 - 1106.0010 are available without a token for testing!

Parameters

  • arxiv_id (required): arXiv paper ID (e.g., 2501.12345, 1106.0001)
  • type (optional): Data format to return
    • head - Metadata only (JSON)
    • raw - Raw markdown content (JSON)
    • markdown - Rendered HTML page (view in browser) ⭐
    • (omit) - Both head and raw (JSON)
  • token (required except for free papers): Your API token
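
As a minimal sketch, the parameter rules above could be checked on the client side before a request is sent. The helper name `build_arxiv_url` is hypothetical, and the code assumes the free range 1106.0001 - 1106.0010 is inclusive and that IDs are passed without a version suffix.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://data.rag.ac.cn/arxiv/"
VALID_TYPES = {"head", "raw", "markdown"}
# Assumption: the free-access range 1106.0001 - 1106.0010 is inclusive.
FREE_IDS = {f"1106.{n:04d}" for n in range(1, 11)}


def build_arxiv_url(
    arxiv_id: str,
    data_type: Optional[str] = None,
    token: Optional[str] = None,
) -> str:
    """Build a request URL from the documented parameters (hypothetical helper)."""
    if data_type is not None and data_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)} or omitted")
    if token is None and arxiv_id not in FREE_IDS:
        raise ValueError("a token is required for papers outside the free range")
    params = {"arxiv_id": arxiv_id}
    if data_type is not None:
        params["type"] = data_type  # omit to receive both head and raw
    if token is not None:
        params["token"] = token
    return BASE_URL + "?" + urlencode(params)
```

Calling `build_arxiv_url("1106.0001", data_type="head")` needs no token, while any paper outside the free range raises an error unless a token is supplied.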

Example Requests

# Free access - no token needed
curl "https://data.rag.ac.cn/arxiv/?arxiv_id=1106.0001"

# With authentication for other papers
curl "https://data.rag.ac.cn/arxiv/?arxiv_id=2501.12345&token=YOUR_TOKEN"

# Get only metadata header
curl "https://data.rag.ac.cn/arxiv/?arxiv_id=1106.0001&type=head"

# Get only content
curl "https://data.rag.ac.cn/arxiv/?arxiv_id=1106.0001&type=raw"

# View formatted paper in browser (type=markdown)
curl "https://data.rag.ac.cn/arxiv/?arxiv_id=1106.0001&type=markdown"
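
For agents written in Python, the same requests might look like the sketch below, which uses only the standard library. The function name `fetch_paper` and the `opener` and `timeout` arguments are hypothetical. The code assumes responses are UTF-8 encoded, and error handling for failed requests is left out because the error format is not documented here.

```python
import json
import urllib.request
from typing import Callable, Optional, Union
from urllib.parse import urlencode

BASE_URL = "https://data.rag.ac.cn/arxiv/"


def fetch_paper(
    arxiv_id: str,
    data_type: Optional[str] = None,
    token: Optional[str] = None,
    opener: Callable = urllib.request.urlopen,
    timeout: float = 30.0,
) -> Union[dict, str]:
    """Fetch a paper; returns parsed JSON, or HTML text when type=markdown."""
    params = {"arxiv_id": arxiv_id}
    if data_type is not None:
        params["type"] = data_type
    if token is not None:
        params["token"] = token
    url = f"{BASE_URL}?{urlencode(params)}"
    # `opener` is injectable so the function can be exercised without network access.
    with opener(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")  # assumption: UTF-8 responses
    # type=markdown is documented as a rendered HTML page, everything else as JSON.
    return body if data_type == "markdown" else json.loads(body)
```

A call such as `fetch_paper("1106.0001", data_type="head")` would request only the metadata header of a free paper.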

📖 Try in Browser

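Open a free paper with type=markdown to view it formatted directly in your browser, for example: https://data.rag.ac.cn/arxiv/?arxiv_id=1106.0001&type=markdown

💡 Tip: Papers 1106.0001 - 1106.0010 are free to access without a token. Perfect for testing!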

Response Format

{
  "arxiv_id": "1106.0001",
  "status": "processed",
  "head": {
    // Metadata: token count, sections, etc.
  },
  "raw": "# Paper Title\n\n## Abstract\n\n..."
}
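
One way an agent might consume this response is sketched below. The helper names `read_payload` and `outline` are hypothetical. Treating any status other than "processed" as "content not ready" is an assumption, since other status values are not listed here. The outline is derived from the markdown headings in `raw`, and the simple pattern used does not skip `#` lines inside fenced code blocks.

```python
import re
from typing import Dict, List, Tuple


def read_payload(payload: Dict) -> Tuple[Dict, str]:
    """Return (head, raw) from a response, with defaults for omitted parts."""
    # Assumption: only "processed" means the content is ready to use.
    if payload.get("status") != "processed":
        raise ValueError(f"paper not ready: status={payload.get('status')!r}")
    return payload.get("head", {}), payload.get("raw", "")


def outline(raw_markdown: str) -> List[Tuple[int, str]]:
    """List (level, title) for each '#'-style heading in the markdown."""
    headings = []
    for line in raw_markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*\S)\s*$", line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings


# Sample payload mirroring the response format shown above.
sample = {
    "arxiv_id": "1106.0001",
    "status": "processed",
    "head": {},
    "raw": "# Paper Title\n\n## Abstract\n\n...",
}
head, raw = read_payload(sample)
sections = outline(raw)
```

For the sample payload, `sections` holds the level-1 title "Paper Title" and the level-2 heading "Abstract".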

Ready to Get Started?

Register for an API token and start building your AI agent with access to structured, agent-friendly data.

Register for Free

Development Roadmap

Phase 1: arXiv Papers (Live)

Complete collection of arXiv papers with structured metadata and markdown content.

Phase 2: Academic Papers (Planned)

Expanding to include papers from major academic publishers and conferences.

Phase 3: General Web Pages (Planned)

Comprehensive web content coverage with intelligent extraction and structuring.