Project Objective
1. Web data is not agent-friendly. Raw web content contains noise, ads, navigation elements, and inconsistent formatting that confuse AI agents.
2. Reading full documents directly is inefficient. Loading entire documents wastes tokens and time. Agents need smart previews to decide what to read and which sections matter.
3. We refine data into an LLM-ready format. Clean markdown with proper structure, semantic markup, and consistent formatting optimized for language model consumption.
4. We provide head files for intelligent decision-making. Metadata headers include token counts, section summaries, key entities, and structural information, enabling agents to preview and select content efficiently without loading full documents.
5. We focus first on academic data to accelerate scientific discovery. We are starting with arXiv papers and academic publications to enable AI4Science, helping researchers discover insights faster and advance scientific knowledge through intelligent data access.
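To make the head-file idea concrete, here is a minimal sketch of how an agent might use per-section token counts to decide what to load. The head structure shown (the `sections`, `title`, `token_count`, and `summary` fields) is an assumption for illustration only; the actual head schema is not specified here.

```python
def select_sections(head, budget):
    """Pick sections in document order that fit within a token budget.

    `head` is a hypothetical head file: a dict with a "sections" list,
    where each entry carries a "title" and a "token_count".
    """
    chosen = []
    remaining = budget
    for section in head["sections"]:
        # Skip any section that would exceed the remaining budget.
        if section["token_count"] <= remaining:
            chosen.append(section["title"])
            remaining -= section["token_count"]
    return chosen


# Hypothetical head data, invented purely for this example.
example_head = {
    "sections": [
        {"title": "Abstract", "token_count": 200, "summary": "Overview of the paper."},
        {"title": "Introduction", "token_count": 1500, "summary": "Background and motivation."},
        {"title": "Methods", "token_count": 4000, "summary": "Experimental setup."},
        {"title": "Results", "token_count": 2500, "summary": "Main findings."},
    ]
}

selected = select_sections(example_head, budget=3000)
print(selected)
```

A real agent would likely also rank sections by relevance using the summaries and key entities; the greedy budget check is only the simplest possible policy.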
API Documentation
Retrieve arXiv paper data in agent-friendly format.
Papers 1106.0001 through 1106.0010 are available without a token for testing.
Parameters
- arxiv_id (required): arXiv paper ID (e.g., 2501.12345, 1106.0001)
- type (optional): Data format to return
  - head: Metadata only (JSON)
  - raw: Raw markdown content (JSON)
  - markdown: Rendered HTML page (view in browser) ⭐
  - (omit): Both head and raw (JSON)
- token (required except for free papers): Your API token
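The parameters above could be assembled into a request URL along the following lines. This is a sketch under stated assumptions: the base URL is a placeholder (the real endpoint is not given here), the parameters are assumed to be passed as a query string, and the free range 1106.0001 to 1106.0010 is assumed to be inclusive. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real API URL is not stated in this document.
BASE_URL = "https://api.example.com/arxiv"

# Values documented for the optional `type` parameter.
VALID_TYPES = {"head", "raw", "markdown"}


def is_free_paper(arxiv_id):
    """Return True for IDs 1106.0001 through 1106.0010 (assumed inclusive)."""
    prefix, _, number = arxiv_id.partition(".")
    return (
        prefix == "1106"
        and len(number) == 4
        and number.isdigit()
        and 1 <= int(number) <= 10
    )


def build_request_url(arxiv_id, data_type=None, token=None):
    """Build a request URL, assuming parameters go in the query string."""
    if data_type is not None and data_type not in VALID_TYPES:
        raise ValueError(f"unknown type: {data_type!r}")
    if token is None and not is_free_paper(arxiv_id):
        raise ValueError("a token is required for papers outside the free range")

    params = {"arxiv_id": arxiv_id}
    if data_type is not None:
        # Omitting `type` requests both head and raw.
        params["type"] = data_type
    if token is not None:
        params["token"] = token
    return f"{BASE_URL}?{urlencode(params)}"


print(build_request_url("1106.0001", data_type="head"))
```

Validating the `type` value and the token requirement on the client side avoids a round trip for requests that would be rejected anyway.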
Example Requests
📖 Try in Browser
Click these links to view formatted papers directly in your browser:
💡 Tip: These papers are free to access without a token. Perfect for testing!
Response Format
Ready to Get Started?
Register for an API token and start building your AI agent with access to structured, agent-friendly data.
Register for Free
Development Roadmap
Complete collection of arXiv papers with structured metadata and markdown content.
Expanding to include papers from major academic publishers and conferences.
Comprehensive web content coverage with intelligent extraction and structuring.