Introduction
Arke is a public knowledge network for storing, discovering, and connecting documents. Upload files, and they become nodes in a graph -- searchable, interconnected, and permanently archived. Everything in the network -- your collections, individual files, chapters extracted from a book, you as a user -- is a node with its own permanent identity, connected to everything else.
Alpha -- Arke is currently in alpha. An invite code is required. All operations are free during the alpha. If you're interested in early access, email nick@arkeon.tech.
Who Is This For
- Anyone preserving information -- researchers, archivists, journalists, investigators, librarians, genealogists. If you work with large amounts of textual or image-based data and care about its integrity and permanence, this is built for you.
- Developers and AI agents -- the API is designed for programmatic access. Build tools, processing agents, or applications on top of the network.
- Curators -- you don't have to upload anything. Start an empty collection and connect to content from across the network. Curation is contribution.
How It Works
Here's what happens when you use Arke, step by step.
1. Create a collection
A collection is a container for related content -- a research project, an archive, a dataset. It's the top-level organizational unit.
```http
POST /collections
{
  "label": "Civil War Correspondence",
  "description": "Letters and documents from 1861-1865"
}
```
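Through the TypeScript SDK, the same call might look like the sketch below; the `ArkeClient` constructor shape and the `collections.create` method name are assumptions for illustration, not the documented SDK surface.

```typescript
import { ArkeClient } from '@arke-institute/sdk';

// Client setup -- the constructor options shown here are assumed.
const arke = new ArkeClient({ apiKey: process.env.ARKE_API_KEY });

// Create a collection. `collections.create` is an assumed method name
// wrapping the POST /collections operation shown above.
const collection = await arke.collections.create({
  label: 'Civil War Correspondence',
  description: 'Letters and documents from 1861-1865',
});

console.log(collection.id); // e.g. "01ABC..."
```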
2. Upload files

Upload documents into your collection. Supported formats include PDF, JPEG, PNG, TIFF, WebP, AVIF, GIF, and any text file. Each file is stored and content-hashed -- it gets a unique content identifier (CID) derived from the file contents. Same file uploaded twice? Stored once.
```http
POST /files
{
  "collection_id": "01ABC...",
  "filename": "gettysburg_address.txt",
  "content_type": "text/plain"
}
```
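Through the SDK, an upload might look like this sketch. The `files.upload` method name is assumed, and the real flow may separate metadata creation from byte upload:

```typescript
import { readFile } from 'node:fs/promises';
import { ArkeClient } from '@arke-institute/sdk';

const arke = new ArkeClient({ apiKey: process.env.ARKE_API_KEY });

// `files.upload` is an assumed method name; the real SDK may split
// metadata creation and byte upload into separate steps.
const file = await arke.files.upload({
  collectionId: '01ABC...',
  filename: 'gettysburg_address.txt',
  contentType: 'text/plain',
  data: await readFile('gettysburg_address.txt'),
});

// The CID is derived from the file contents, so re-uploading the
// same bytes yields the same identifier.
console.log(file.cid);
```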
3. Process with an agent

This is where static files become structured knowledge. You invoke a processing agent on your files -- either one of the official Arke agents or one you build yourself.
For a text file, the structure extraction agent reads the document and turns it from a flat, linear file into a network of entities. A book becomes chapters, sections, and passages -- each one its own node in the graph, with summaries at every level.
```http
POST /agents/{agent_id}/invoke
{
  "target": "01ABC...",
  "input": {
    "entity_ids": ["01FILE_XYZ..."]
  },
  "confirm": true
}
```

For a scanned document, you might run the OCR agent first to extract text, then the structure extraction agent to break it apart. Agents can be chained.
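A chained run might look like the sketch below. The agent IDs and the `agents.invoke` / `jobs.waitFor` helpers are assumptions standing in for the POST /agents/{agent_id}/invoke operation shown above.

```typescript
import { ArkeClient } from '@arke-institute/sdk';

const arke = new ArkeClient({ apiKey: process.env.ARKE_API_KEY });

// Agent IDs and helper names below are hypothetical.
const OCR_AGENT = '01AGENT_OCR...';
const STRUCTURE_AGENT = '01AGENT_STRUCTURE...';

// Step 1: OCR a scanned page image into a text entity.
const ocrJob = await arke.agents.invoke(OCR_AGENT, {
  target: '01ABC...',
  input: { entity_ids: ['01FILE_SCAN...'] },
  confirm: true,
});
const ocrResult = await arke.jobs.waitFor(ocrJob.id); // assumed polling helper

// Step 2: feed the OCR output into structure extraction.
await arke.agents.invoke(STRUCTURE_AGENT, {
  target: '01ABC...',
  input: { entity_ids: ocrResult.output.entity_ids },
  confirm: true,
});
```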
Before processing:

```
Collection
└── gettysburg_address.txt   (one flat file)
```
After structure extraction:

```
Collection
└── gettysburg_address.txt
    ├── Introduction   (node with summary + text)
    ├── Body           (node with summary + text)
    │   ├── Section 1  (node with summary + text)
    │   └── Section 2  (node with summary + text)
    └── Conclusion     (node with summary + text)
```

Every one of those nodes is a full entity in the network -- searchable, linkable, and independently addressable.
4. Every entity gets an EIDOS manifest
Every node in the network -- whether it's a collection, a file, a chapter, or a user -- is stored as an EIDOS manifest. This is the universal schema:
```json
{
  "schema": "arke/eidos@v2",
  "id": "01CHAPTER_ABC...",
  "type": "chunk",
  "created_at": "2025-01-15T10:30:00Z",
  "ver": 1,
  "ts": "2025-01-15T10:30:00Z",
  "prev": null,
  "properties": {
    "title": "Introduction",
    "summary": "Lincoln opens by invoking the founding principles...",
    "text": "Four score and seven years ago..."
  },
  "relationships": [
    { "predicate": "parent", "peer": "01FILE_XYZ...", "peer_type": "file" },
    { "predicate": "after", "peer": "01CHAPTER_DEF...", "peer_type": "chunk" }
  ],
  "edited_by": {
    "user_id": "01AGENT_STRUCTURE...",
    "method": "ai_generated"
  }
}
```

The manifest is hashed to produce a CID -- a permanent, verifiable identifier for that exact version of the entity. Change anything, and you get a new CID. The previous version is preserved via the `prev` link, forming an immutable version chain.
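Retrieving an entity's full history is therefore a walk back along `prev` links. A minimal sketch, assuming a hypothetical `getEntity` fetch helper:

```typescript
// Walk an entity's version history by following `prev` links.
// `getEntity` is a hypothetical fetch helper, not a documented SDK call.
interface Manifest {
  id: string;
  ver: number;
  ts: string;
  prev: string | null; // CID of the previous version, or null for v1
}

async function versionHistory(
  getEntity: (ref: string) => Promise<Manifest>,
  id: string,
): Promise<Manifest[]> {
  const history: Manifest[] = [];
  let current: Manifest | null = await getEntity(id);
  while (current) {
    history.push(current);
    current = current.prev ? await getEntity(current.prev) : null;
  }
  return history; // newest first, back to version 1
}
```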
5. Automatic indexing
You don't need to do anything for this step. Workers continuously watch the network and index every new entity into:
- Semantic search (Pinecone) -- find content by meaning, not just keywords
- Graph index (Neo4j) -- traverse relationships, find connections across collections
The moment an entity is created, it becomes discoverable. Search within your own collection, or across the entire network.
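A semantic query through the SDK might look like the following sketch; the `search` method and its parameters are assumptions for illustration.

```typescript
import { ArkeClient } from '@arke-institute/sdk';

const arke = new ArkeClient({ apiKey: process.env.ARKE_API_KEY });

// Hypothetical search call: scope to one collection, or omit
// `collectionId` to search the whole network.
const hits = await arke.search({
  query: 'dedication of a battlefield cemetery',
  collectionId: '01ABC...', // optional in this sketch
  limit: 10,
});

for (const hit of hits) {
  console.log(hit.id, hit.score, hit.properties?.title);
}
```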
6. Permanent archival
Every entity that enters the network is attested to Arweave -- a permanent, decentralized storage network. Arke maintains a continuous event chain on Arweave where each block points to the previous one. The full manifest is stored (not just a hash), meaning the entire network is independently reconstructable from the Arweave record alone.
This is the permanence guarantee: even if Arke's infrastructure disappeared, the data survives.
Arweave Event Chain:

```
[Block N] ──prev──▶ [Block N-1] ──prev──▶ [Block N-2] ──prev──▶ ...
    │                    │                     │
    ▼                    ▼                     ▼
Full manifest        Full manifest         Full manifest
(entity created)     (entity updated)      (entity created)
```
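Because each block embeds the full manifest, an independent party could replay the chain with nothing but an Arweave gateway. A minimal sketch, assuming a known head transaction ID and a hypothetical block payload shape:

```typescript
// Rebuild the event chain by following `prev` pointers from the newest block.
// The payload shape below is assumed; only the chain-of-`prev` structure
// comes from the description above.
interface EventBlock {
  prev: string | null; // Arweave tx ID of the previous block
  manifest: unknown;   // full EIDOS manifest recorded in this block
}

async function fetchBlock(txId: string): Promise<EventBlock> {
  // arweave.net is a public gateway; fetching a tx ID returns its data.
  const res = await fetch(`https://arweave.net/${txId}`);
  return (await res.json()) as EventBlock;
}

async function replayChain(headTxId: string): Promise<unknown[]> {
  const manifests: unknown[] = [];
  let txId: string | null = headTxId;
  while (txId) {
    const block: EventBlock = await fetchBlock(txId);
    manifests.push(block.manifest);
    txId = block.prev;
  }
  return manifests; // newest event first
}
```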
Why Structure Your Data This Way

Organizing documents as interconnected graph nodes instead of flat files has practical consequences:
Better AI access. An LLM looking for a quote in a book can navigate through chapters, read summaries at each level, and get results from multiple levels of the hierarchy. This is substantially more effective than flat chunking, where an AI has to scan through arbitrarily sliced text with no structural awareness.
Cross-collection discovery. When your entities are indexed with semantic embeddings, they surface alongside related content from other collections -- other people's research, other archives, other investigations. Connections you didn't know existed become visible.
Data that resists typical formats. This works for information that's unstructured or hard to represent in rows and columns -- things better expressed as graphs and interconnected entities.
Official Processing Agents
These agents are maintained by Arke and available to all users:
| Agent | Input | What It Does |
|---|---|---|
| Structure Extraction | Text files | Extracts hierarchical structure (chapters, sections, passages). Turns linear documents into a tree of entities with summaries at each level. |
| OCR | JPEG images | Extracts text from scanned/photographed documents. Handles handwritten and historical documents. Outputs markdown with embedded image references. |
| PDF Processor | PDF files | Splits PDFs into one JPEG per page, linked as derivatives of the original. |
| Image to JPEG | PNG, WebP, TIFF, AVIF, GIF | Converts images to standardized JPEG format for processing. |
| Description | Any entity | Generates a natural-language description of an entity based on its content and context. |
| Image Description | Image entities | Generates contextual descriptions for images using vision models and surrounding document context. |
You can also build your own agent. An agent is any external service that registers with Arke, receives signed job requests, and uses the API to read and write entities. See the Agent Developer Guide for the full specification.
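In outline, an agent is just an HTTP service. The sketch below is a skeleton only: the `/jobs` route, payload fields, signature step, and response shape are hypothetical stand-ins for the protocol defined in the Agent Developer Guide.

```typescript
import { createServer } from 'node:http';

// Minimal agent skeleton. Route, payload, and response below are
// hypothetical placeholders for the real Arke agent protocol.
createServer((req, res) => {
  if (req.method !== 'POST' || req.url !== '/jobs') {
    res.writeHead(404).end();
    return;
  }
  let body = '';
  req.on('data', (chunk) => (body += chunk));
  req.on('end', () => {
    // 1. Verify the signed job request (signature scheme is Arke-defined).
    // 2. Read the target entities via the Arke API.
    // 3. Write derived entities back, then report job status.
    const job = JSON.parse(body);
    console.log('received job', job.id);
    res.writeHead(202, { 'content-type': 'application/json' });
    res.end(JSON.stringify({ accepted: true }));
  });
}).listen(8080);
```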
Quick Reference
| Question | Answer |
|---|---|
| How do I get access? | Alpha is invite-only. Email nick@arkeon.tech. |
| Is it free? | Yes, all operations are free during alpha. |
| What can I upload? | PDF, JPEG, PNG, TIFF, WebP, AVIF, GIF, text files, and any binary file type via MIME type. |
| Is there an SDK? | Yes -- @arke-institute/sdk (TypeScript). Install with npm install @arke-institute/sdk. |
| Is there an API reference? | Yes -- Ops Reference and Interactive API Docs. |
| Can AI agents use the API? | Yes. Agents authenticate with API keys and operate with scoped, time-limited permissions on collections. |
| What's the base URL? | https://arke-v1.arke.institute |
| Is there a test network? | Yes. Set network: 'test' in the SDK. Test network entities use II-prefixed IDs and auto-expire after 30 days. |
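Putting the last few rows together, pointing the SDK at the test network looks roughly like this; the package name and the `network: 'test'` option come from the table above, while the rest of the constructor shape is assumed:

```typescript
import { ArkeClient } from '@arke-institute/sdk';

// `network: 'test'` is documented above; other options are assumed.
// Test-network entities use II-prefixed IDs and auto-expire after 30 days.
const arke = new ArkeClient({
  apiKey: process.env.ARKE_API_KEY,
  network: 'test',
});
```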
Next Steps
- Key Concepts -- Entities, versioning, and relationships
- Architecture -- System design and storage layers
- For AI Agents -- How AI agents interact with Arke
- FAQ -- Common questions
Documentation Roadmap
Available now:
- Ops Reference -- All API operations with parameters and permissions
- Interactive API Docs -- Redoc-powered API explorer
- OpenAPI Spec -- Machine-readable API specification
Coming soon:
| Doc | Description |
|---|---|
| Getting Started Guide | Step-by-step walkthrough: create an account, set up a collection, upload and process your first document. |
| SDK Reference | Full documentation for @arke-institute/sdk -- installation, authentication, uploads, error handling. |
| Agent Developer Guide | How to build a custom processing agent: registration, authentication, receiving jobs, status reporting. |
| Official Agents Reference | Detailed documentation for each official agent: inputs, outputs, configuration options, examples. |
| EIDOS Schema Reference | The universal entity schema -- field definitions, type profiles, validation rules, extensibility. |
| LLM Documentation Index | Auto-generated index of all documentation for AI agent consumption (llms.txt). |