Introduction
Arke is a public knowledge network for storing, discovering, and connecting documents. Upload files, and they become nodes in a graph -- searchable, interconnected, and permanently archived. Everything in the network -- your collections, individual files, chapters extracted from a book, you as a user -- is a node with its own permanent identity, connected to everything else.
Alpha -- Arke is currently in alpha. An invite code is required. All operations are free during the alpha. If you're interested in early access, email nick@arkeon.tech.
Who Is This For
- Anyone preserving information -- researchers, archivists, journalists, investigators, librarians, genealogists. If you work with large amounts of textual or image-based data and care about its integrity and permanence, this is built for you.
- Developers and AI agents -- the API is designed for programmatic access. Build tools, processing agents, or applications on top of the network.
- Curators -- you don't have to upload anything. Start an empty collection and connect to content from across the network. Curation is contribution.
How It Works
Here's what happens when you use Arke, step by step.
1. Create a collection
A collection is a container for related content -- a research project, an archive, a dataset. It's the top-level organizational unit.
```http
POST /collections
{
  "label": "Civil War Correspondence",
  "description": "Letters and documents from 1861-1865"
}
```
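Through the TypeScript SDK, the same call might look like the sketch below; the `ArkeClient` constructor shape and the `collections.create` method name are assumptions for illustration, not the documented SDK surface.

```typescript
import { ArkeClient } from '@arke-institute/sdk';

// Client setup -- the constructor options shown here are assumed.
const arke = new ArkeClient({ apiKey: process.env.ARKE_API_KEY });

// Create a collection. `collections.create` is an assumed method name
// wrapping the POST /collections operation shown above.
const collection = await arke.collections.create({
  label: 'Civil War Correspondence',
  description: 'Letters and documents from 1861-1865',
});

console.log(collection.id); // e.g. "01ABC..."
```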
2. Upload files

Upload documents into your collection. Supported formats include PDF, JPEG, PNG, TIFF, WebP, AVIF, GIF, and any text file. Each file is stored and content-hashed -- it gets a unique content identifier (CID) derived from the file contents. Same file uploaded twice? Stored once.
```http
POST /files
{
  "collection_id": "01ABC...",
  "filename": "gettysburg_address.txt",
  "content_type": "text/plain"
}
```
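Through the SDK, an upload might look like this sketch. The `files.upload` method name is assumed, and the real flow may separate metadata creation from byte upload:

```typescript
import { readFile } from 'node:fs/promises';
import { ArkeClient } from '@arke-institute/sdk';

const arke = new ArkeClient({ apiKey: process.env.ARKE_API_KEY });

// `files.upload` is an assumed method name; the real SDK may split
// metadata creation and byte upload into separate steps.
const file = await arke.files.upload({
  collectionId: '01ABC...',
  filename: 'gettysburg_address.txt',
  contentType: 'text/plain',
  data: await readFile('gettysburg_address.txt'),
});

// The CID is derived from the file contents, so re-uploading the
// same bytes yields the same identifier.
console.log(file.cid);
```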
3. Process with an agent

This is where static files become structured knowledge. You invoke a processing agent on your files -- either one of the official Arke agents or one you build yourself.
For a text file, the structure extraction agent reads the document and turns it from a flat, linear file into a network of entities. A book becomes chapters, sections, and passages -- each one its own node in the graph, with summaries at every level.
```http
POST /agents/{agent_id}/invoke
{
  "target": "01ABC...",
  "input": {
    "entity_ids": ["01FILE_XYZ..."]
  },
  "confirm": true
}
```

For a scanned document, you might run the OCR agent first to extract text, then the structure extraction agent to break it apart. Agents can be chained.
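A chained run might look like the sketch below. The agent IDs and the `agents.invoke` / `jobs.waitFor` helpers are assumptions standing in for the POST /agents/{agent_id}/invoke operation shown above.

```typescript
import { ArkeClient } from '@arke-institute/sdk';

const arke = new ArkeClient({ apiKey: process.env.ARKE_API_KEY });

// Agent IDs and helper names below are hypothetical.
const OCR_AGENT = '01AGENT_OCR...';
const STRUCTURE_AGENT = '01AGENT_STRUCTURE...';

// Step 1: OCR a scanned page image into a text entity.
const ocrJob = await arke.agents.invoke(OCR_AGENT, {
  target: '01ABC...',
  input: { entity_ids: ['01FILE_SCAN...'] },
  confirm: true,
});
const ocrResult = await arke.jobs.waitFor(ocrJob.id); // assumed polling helper

// Step 2: feed the OCR output into structure extraction.
await arke.agents.invoke(STRUCTURE_AGENT, {
  target: '01ABC...',
  input: { entity_ids: ocrResult.output.entity_ids },
  confirm: true,
});
```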
Before processing:

```
Collection
└── gettysburg_address.txt   (one flat file)
```
After structure extraction:

```
Collection
└── gettysburg_address.txt
    ├── Introduction   (node with summary + text)
    ├── Body           (node with summary + text)
    │   ├── Section 1  (node with summary + text)
    │   └── Section 2  (node with summary + text)
    └── Conclusion     (node with summary + text)
```

Every one of those nodes is a full entity in the network -- searchable, linkable, and independently addressable.
4. Every entity gets an EIDOS manifest
Every node in the network -- whether it's a collection, a file, a chapter, or a user -- is stored as an EIDOS manifest. This is the universal schema:
```json
{
  "schema": "arke/eidos@v2",
  "id": "01CHAPTER_ABC...",
  "type": "chunk",
  "created_at": "2025-01-15T10:30:00Z",
  "ver": 1,
  "ts": "2025-01-15T10:30:00Z",
  "prev": null,
  "properties": {
    "title": "Introduction",
    "summary": "Lincoln opens by invoking the founding principles...",
    "text": "Four score and seven years ago..."
  },
  "relationships": [
    { "predicate": "parent", "peer": "01FILE_XYZ...", "peer_type": "file" },
    { "predicate": "after", "peer": "01CHAPTER_DEF...", "peer_type": "chunk" }
  ],
  "edited_by": {
    "user_id": "01AGENT_STRUCTURE...",
    "method": "ai_generated"
  }
}
```

The manifest is hashed to produce a CID -- a permanent, verifiable identifier for that exact version of the entity. Change anything, and you get a new CID. The previous version is preserved via the `prev` link, forming an immutable version chain.
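Retrieving an entity's full history is therefore a walk back along `prev` links. A minimal sketch, assuming a hypothetical `getEntity` fetch helper:

```typescript
// Walk an entity's version history by following `prev` links.
// `getEntity` is a hypothetical fetch helper, not a documented SDK call.
interface Manifest {
  id: string;
  ver: number;
  ts: string;
  prev: string | null; // CID of the previous version, or null for v1
}

async function versionHistory(
  getEntity: (ref: string) => Promise<Manifest>,
  id: string,
): Promise<Manifest[]> {
  const history: Manifest[] = [];
  let current: Manifest | null = await getEntity(id);
  while (current) {
    history.push(current);
    current = current.prev ? await getEntity(current.prev) : null;
  }
  return history; // newest first, back to version 1
}
```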
5. Automatic indexing
You don't need to do anything for this step. Workers continuously watch the network and index every new entity into:
- Semantic search (Pinecone) -- find content by meaning, not just keywords
- Graph index (Neo4j) -- traverse relationships, find connections across collections
The moment an entity is created, it becomes discoverable. Search within your own collection, or across the entire network.
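A semantic query through the SDK might look like the following sketch; the `search` method and its parameters are assumptions for illustration.

```typescript
import { ArkeClient } from '@arke-institute/sdk';

const arke = new ArkeClient({ apiKey: process.env.ARKE_API_KEY });

// Hypothetical search call: scope to one collection, or omit
// `collectionId` to search the whole network.
const hits = await arke.search({
  query: 'dedication of a battlefield cemetery',
  collectionId: '01ABC...', // optional in this sketch
  limit: 10,
});

for (const hit of hits) {
  console.log(hit.id, hit.score, hit.properties?.title);
}
```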
6. Permanent archival
Every entity that enters the network is attested to Arweave -- a permanent, decentralized storage network. Arke maintains a continuous event chain on Arweave where each block points to the previous one. The full manifest is stored (not just a hash), meaning the entire network is independently reconstructable from the Arweave record alone.
This is the permanence guarantee: even if Arke's infrastructure disappeared, the data survives.
Arweave Event Chain:

```
[Block N] ──prev──▶ [Block N-1] ──prev──▶ [Block N-2] ──prev──▶ ...
    │                    │                     │
    ▼                    ▼                     ▼
Full manifest        Full manifest         Full manifest
(entity created)     (entity updated)      (entity created)
```
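Because each block embeds the full manifest, an independent party could replay the chain with nothing but an Arweave gateway. A minimal sketch, assuming a known head transaction ID and a hypothetical block payload shape:

```typescript
// Rebuild the event chain by following `prev` pointers from the newest block.
// The payload shape below is assumed; only the chain-of-`prev` structure
// comes from the description above.
interface EventBlock {
  prev: string | null; // Arweave tx ID of the previous block
  manifest: unknown;   // full EIDOS manifest recorded in this block
}

async function fetchBlock(txId: string): Promise<EventBlock> {
  // arweave.net is a public gateway; fetching a tx ID returns its data.
  const res = await fetch(`https://arweave.net/${txId}`);
  return (await res.json()) as EventBlock;
}

async function replayChain(headTxId: string): Promise<unknown[]> {
  const manifests: unknown[] = [];
  let txId: string | null = headTxId;
  while (txId) {
    const block: EventBlock = await fetchBlock(txId);
    manifests.push(block.manifest);
    txId = block.prev;
  }
  return manifests; // newest event first
}
```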
Why Structure Your Data This Way

Organizing documents as interconnected graph nodes instead of flat files has practical consequences:
Better AI access. An LLM looking for a quote in a book can navigate through chapters, read summaries at each level, and get results from multiple levels of the hierarchy. This is substantially more effective than flat chunking, where an AI has to scan through arbitrarily sliced text with no structural awareness.
Cross-collection discovery. When your entities are indexed with semantic embeddings, they surface alongside related content from other collections -- other people's research, other archives, other investigations. Connections you didn't know existed become visible.
Data that resists typical formats. This works for information that's unstructured or hard to represent in rows and columns -- things better expressed as graphs and interconnected entities.
Official Processing Agents
These agents are maintained by Arke and available to all users:
| Agent | Input | What It Does |
|---|---|---|
| Structure Extraction | Text files | Extracts hierarchical structure (chapters, sections, passages). Turns linear documents into a tree of entities with summaries at each level. |
| OCR | JPEG images | Extracts text from scanned/photographed documents. Handles handwritten and historical documents. Outputs markdown with embedded image references. |
| PDF Processor | PDF files | Splits PDFs into one JPEG per page, linked as derivatives of the original. |
| Image to JPEG | PNG, WebP, TIFF, AVIF, GIF | Converts images to standardized JPEG format for processing. |
| Description | Any entity | Generates a natural-language description of an entity based on its content and context. |
| Image Description | Image entities | Generates contextual descriptions for images using vision models and surrounding document context. |
You can also build your own agent. An agent is any external service that registers with Arke, receives signed job requests, and uses the API to read and write entities. See the Agent Developer Guide for the full specification.
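In outline, an agent is just an HTTP service. The sketch below is a skeleton only: the `/jobs` route, payload fields, signature step, and response shape are hypothetical stand-ins for the protocol defined in the Agent Developer Guide.

```typescript
import { createServer } from 'node:http';

// Minimal agent skeleton. Route, payload, and response below are
// hypothetical placeholders for the real Arke agent protocol.
createServer((req, res) => {
  if (req.method !== 'POST' || req.url !== '/jobs') {
    res.writeHead(404).end();
    return;
  }
  let body = '';
  req.on('data', (chunk) => (body += chunk));
  req.on('end', () => {
    // 1. Verify the signed job request (signature scheme is Arke-defined).
    // 2. Read the target entities via the Arke API.
    // 3. Write derived entities back, then report job status.
    const job = JSON.parse(body);
    console.log('received job', job.id);
    res.writeHead(202, { 'content-type': 'application/json' });
    res.end(JSON.stringify({ accepted: true }));
  });
}).listen(8080);
```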
Quick Reference
| Question | Answer |
|---|---|
| How do I get access? | Alpha is invite-only. Email nick@arkeon.tech. |
| Is it free? | Yes, all operations are free during alpha. |
| What can I upload? | PDF, JPEG, PNG, TIFF, WebP, AVIF, GIF, text files, and any binary file type via MIME type. |
| Is there an SDK? | Yes -- @arke-institute/sdk (TypeScript). Install with npm install @arke-institute/sdk. |
| Is there an API reference? | Yes -- Ops Reference and Interactive API Docs. |
| Can AI agents use the API? | Yes. Agents authenticate with API keys and operate with scoped, time-limited permissions on collections. |
| What's the base URL? | https://arke-v1.arke.institute |
| Is there a test network? | Yes. Set network: 'test' in the SDK. Test network entities use II-prefixed IDs and auto-expire after 30 days. |
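Putting the last few rows together, pointing the SDK at the test network looks roughly like this; the package name and the `network: 'test'` option come from the table above, while the rest of the constructor shape is assumed:

```typescript
import { ArkeClient } from '@arke-institute/sdk';

// `network: 'test'` is documented above; other options are assumed.
// Test-network entities use II-prefixed IDs and auto-expire after 30 days.
const arke = new ArkeClient({
  apiKey: process.env.ARKE_API_KEY,
  network: 'test',
});
```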
Next Steps
- Key Concepts -- Entities, versioning, and relationships
- Architecture -- System design and storage layers
- For AI Agents -- How AI agents interact with Arke
- FAQ -- Common questions
Documentation Roadmap
Available now:
- Ops Reference -- All API operations with parameters and permissions
- Interactive API Docs -- Redoc-powered API explorer
- OpenAPI Spec -- Machine-readable API specification
Coming soon:
| Doc | Description |
|---|---|
| Getting Started Guide | Step-by-step walkthrough: create an account, set up a collection, upload and process your first document. |
| SDK Reference | Full documentation for @arke-institute/sdk -- installation, authentication, uploads, error handling. |
| Agent Developer Guide | How to build a custom processing agent: registration, authentication, receiving jobs, status reporting. |
| Official Agents Reference | Detailed documentation for each official agent: inputs, outputs, configuration options, examples. |
| EIDOS Schema Reference | The universal entity schema -- field definitions, type profiles, validation rules, extensibility. |
| LLM Documentation Index | Auto-generated index of all documentation for AI agent consumption (llms.txt). |