Architecture
How Arke's components work together — API, indexes, attestation, and applications.
Arke is a distributed system with several components that work together to provide a permanent, searchable knowledge graph.
┌─────────────────────────────────────────────────────────────────────────────┐
│  APPLICATIONS                                                               │
│                                                                             │
│  ┌─────────────────────┐  ┌─────────────────────┐  ┌─────────────────────┐  │
│  │ Arke Home           │  │ AI Agents           │  │ Processing Agents   │  │
│  │                     │  │                     │  │                     │  │
│  │ Web UI for          │  │ LLMs acting         │  │ OCR, structure      │  │
│  │ browsing and        │  │ on behalf of        │  │ extraction,         │  │
│  │ exploration         │  │ users               │  │ descriptions, etc.  │  │
│  │                     │  │                     │  │                     │  │
│  └──────────┬──────────┘  └──────────┬──────────┘  └──────────┬──────────┘  │
│             │                        │                        │             │
│             └────────────────────────┼────────────────────────┘             │
│                                      │                                      │
│                                      │ API calls                            │
│                                      ▼                                      │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│  ARKE API                                                                   │
│  api.arke.institute                                                         │
│                                                                             │
│  Entities • Relationships • Files • Search • Versions • Permissions         │
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │  CLOUD STORAGE                                                        │  │
│  │                                                                       │  │
│  │  Manifests (JSON)       Tips                   Binary Files           │  │
│  │  keyed by CID           PI → CID mapping       (images, PDFs)         │  │
│  │  (content hash)         (current version)                             │  │
│  │                                                                       │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
└──────────────────┬───────────────────────────────────────┬──────────────────┘
                   │                                       │
                   │ events                                │ events
                   ▼                                       ▼
     ┌───────────────────────────┐           ┌───────────────────────────┐
     │  INDEXING                 │           │  ATTESTATION              │
     │                           │           │                           │
     │  Watches /events          │           │  Queues every manifest    │
     │  every 60 seconds         │           │  created on network       │
     │                           │           │                           │
     │   ┌───────────────────┐   │           │  Signs and bundles        │
     │   │ Pinecone          │   │           │  to Arweave               │
     │   │ (semantic search  │   │           │                           │
     │   │ vectors)          │   │           │   ┌───────────────────┐   │
     │   └───────────────────┘   │           │   │ Arweave           │   │
     │                           │           │   │ (permanent chain  │   │
     │   ┌───────────────────┐   │           │   │ of all events)    │   │
     │   │ Neo4j             │   │           │   └───────────────────┘   │
     │   │ (graph index for  │   │           │                           │
     │   │ traversal)        │   │           │  Note: Binary files       │
     │   └───────────────────┘   │           │  are NOT backed up        │
     │                           │           │                           │
     └───────────────────────────┘           └───────────────────────────┘
Applications
Three types of actors interact with the Arke API:
- Arke Home — Web UI for browsing entities, managing collections, and exploring the knowledge graph
- AI Agents — LLMs and automation acting on behalf of users via API keys
- Processing Agents — Services that process entities (OCR, structure extraction, descriptions). See Building with Agents.
All three are equal actors on the network, calling the same API.
The API
The Arke API (api.arke.institute) is the core of the system:
- Entities — Create, read, update with full version history
- Relationships — Connect entities with bidirectional links
- Files — Upload and download binary content
- Search — Semantic similarity and graph traversal
- Permissions — Collection-based access control
Storage
| What | How |
|---|---|
| Manifests | JSON stored by CID (content hash). Immutable — same content always produces same CID. |
| Tips | Pointer from PI (persistent identifier) to current CID. Updated atomically on each version. |
| Binary files | Images, PDFs, etc. stored in cloud storage. Referenced by manifests but stored separately. |
When you update an entity:
- New manifest created with new CID
- Tip atomically updated to point to new CID
- Old version remains accessible via its CID
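The three update steps can be modeled with a small in-memory store. This is a minimal sketch, not Arke's implementation: the class and method names are hypothetical, and a hex SHA-256 digest of canonical JSON stands in for a real CID.

```python
import hashlib
import json


class ManifestStore:
    """Hypothetical in-memory model: manifests keyed by content hash, plus PI -> CID tips."""

    def __init__(self) -> None:
        self.manifests: dict[str, dict] = {}  # CID -> manifest; entries are never overwritten
        self.tips: dict[str, str] = {}        # PI -> CID of the current version

    @staticmethod
    def content_id(manifest: dict) -> str:
        # Stand-in for a real CID: SHA-256 over canonical JSON (sorted keys, no extra spaces).
        data = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
        return hashlib.sha256(data).hexdigest()

    def update(self, pi: str, properties: dict) -> str:
        prev = self.tips.get(pi)
        ver = self.manifests[prev]["ver"] + 1 if prev else 1
        manifest = {"id": pi, "properties": properties, "ver": ver, "prev": prev}
        cid = self.content_id(manifest)
        self.manifests[cid] = manifest  # 1. new manifest stored under its new CID
        self.tips[pi] = cid             # 2. tip now points at the new CID
        return cid                      # 3. the previous CID is still a valid key

    def resolve(self, pi: str) -> dict:
        # PI -> tip lookup -> current CID -> fetch manifest
        return self.manifests[self.tips[pi]]


if __name__ == "__main__":
    store = ManifestStore()
    old = store.update("01JXYZ", {"title": "Draft"})
    store.update("01JXYZ", {"title": "Final"})
    print(store.resolve("01JXYZ")["properties"]["title"])  # Final
    print(store.manifests[old]["properties"]["title"])     # Draft
```

The tip is the only mutable piece; everything reachable by CID stays fixed, which is why old versions remain readable after an update.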
PI (01JXYZ...)
↓ tip lookup
Current CID (bafkrei...)
↓ fetch manifest
{ id, type, properties, relationships, ver, prev... }
Indexing
The indexing system watches the network and maintains searchable indexes:
- Poller watches /events every 60 seconds for new entity versions
- Pinecone stores vector embeddings for semantic search
- Neo4j stores the relationship graph for traversal queries
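A poller of this kind can be sketched as a loop that remembers the last event it processed. The `fetch_since` callable, the event fields (`seq`, `pi`, `cid`), and the handler list are assumptions for illustration; the actual /events request and response format is not described here.

```python
import time
from typing import Callable, Iterable


class EventPoller:
    """Hypothetical cursor-based poller over an /events-style feed."""

    def __init__(self, fetch_since: Callable[[int], Iterable[dict]],
                 handlers: list[Callable[[dict], None]], interval_s: float = 60.0) -> None:
        self.fetch_since = fetch_since  # assumed: returns events with seq > cursor
        self.handlers = handlers        # e.g. one updating vectors, one updating the graph
        self.interval_s = interval_s
        self.cursor = 0                 # highest event seq already processed

    def poll_once(self) -> int:
        processed = 0
        for event in sorted(self.fetch_since(self.cursor), key=lambda e: e["seq"]):
            for handle in self.handlers:
                handle(event)
            self.cursor = event["seq"]  # advance only after every handler ran
            processed += 1
        return processed

    def run_forever(self) -> None:
        while True:
            self.poll_once()
            time.sleep(self.interval_s)
```

Advancing the cursor only after all handlers succeed means a crash mid-batch re-delivers the unfinished event on the next poll, so handlers should tolerate seeing an event twice.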
Semantic Search
Entity properties are embedded as vectors. Search queries are embedded the same way, so a search returns the entities whose vectors are most similar to the query's vector, that is, entities with similar meaning.
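The ranking step can be illustrated with cosine similarity, one common similarity metric, over toy vectors. Pinecone performs this search server-side at scale; the hand-made three-dimensional vectors and the `search` helper below are hypothetical and only show the idea of "nearest by meaning".

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for vectors pointing the same way, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float], index: dict[str, list[float]], top_k: int = 2) -> list[str]:
    # Rank every indexed entity by similarity to the query and keep the best top_k PIs.
    ranked = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [pi for pi, _ in ranked[:top_k]]


if __name__ == "__main__":
    toy_index = {
        "01JMAP": [0.9, 0.1, 0.0],
        "01JLETTER": [0.1, 0.9, 0.1],
        "01JATLAS": [0.8, 0.2, 0.1],
    }
    print(search([1.0, 0.0, 0.0], toy_index))  # ['01JMAP', '01JATLAS']
```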
Graph Index
All relationships are indexed for:
- Path finding between entities
- Relationship traversal
- Complex graph queries (Argo query language)
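Path finding can be sketched as a breadth-first search over relationship edges, treated as bidirectional. Neo4j and Argo run such queries inside the graph index; this standalone version, its `(source, predicate, target)` edge tuples, and the sample entity names are illustrative only.

```python
from collections import deque


def shortest_path(relationships, start, goal):
    """Return the shortest list of entities linking start to goal, or None if unreachable."""
    graph = {}
    for source, _predicate, target in relationships:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set()).add(source)  # links are traversable both ways
    previous = {start: None}  # visited set plus back-pointers for rebuilding the path
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


if __name__ == "__main__":
    relationships = [
        ("letter", "written_by", "author"),
        ("author", "lived_in", "city"),
        ("map", "depicts", "city"),
        ("letter", "held_by", "archive"),
    ]
    print(shortest_path(relationships, "letter", "map"))  # ['letter', 'author', 'city', 'map']
```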
Attestation
Every entity manifest is permanently recorded on Arweave, creating an immutable audit trail.
How It Works
- When manifests are created, events are queued
- Events are signed and bundled together
- Bundles are uploaded to Arweave
- Each event links to the previous, forming a sequential chain
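The queue, bundle, and link steps can be sketched as follows, with each event carrying a `prev` hash of the event before it. Signing and the Arweave upload are omitted, and the field names and hashing scheme (SHA-256 over sorted-key JSON) are assumptions rather than Arke's actual format.

```python
import hashlib
import json


def event_hash(event: dict) -> str:
    # Assumed link format: SHA-256 of the event's canonical JSON.
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


class AttestationQueue:
    """Hypothetical queue that turns (pi, cid) pairs into a hash-linked event chain."""

    def __init__(self) -> None:
        self.chain = [{"seq": 0, "prev": None}]  # genesis event
        self.pending: list[tuple[str, str]] = []

    def enqueue(self, pi: str, cid: str) -> None:
        self.pending.append((pi, cid))  # called when a manifest is created

    def flush_bundle(self) -> list[dict]:
        bundle = []
        for pi, cid in self.pending:
            last = self.chain[-1]
            event = {"seq": last["seq"] + 1, "pi": pi, "cid": cid, "prev": event_hash(last)}
            self.chain.append(event)
            bundle.append(event)
        self.pending.clear()
        # Signing the bundle and uploading it to Arweave would happen here (omitted).
        return bundle
```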
The Chain
Genesis (seq=0)
↓
Event 1 (seq=1, pi: 01JABC, cid: bafkrei...)
↓
Event 2 (seq=2, pi: 01JDEF, cid: bafyabc...)
↓
Event 3 (seq=3, pi: 01JABC, cid: bafynew...) ← same entity, new version
↓
... continues forever
This chain is:
- Immutable — Changing any event breaks the chain
- Verifiable — Anyone can walk and verify the chain
- Permanent — Arweave stores data forever (one-time fee)
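Walking and verifying such a chain amounts to recomputing each link. This sketch assumes the same hypothetical event layout (a `prev` field holding the SHA-256 of the previous event's sorted-key JSON) and reports the first event whose link no longer matches.

```python
import hashlib
import json


def event_hash(event: dict) -> str:
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def build_chain(entries: list[tuple[str, str]]) -> list[dict]:
    chain = [{"seq": 0, "prev": None}]  # genesis
    for pi, cid in entries:
        chain.append({"seq": len(chain), "pi": pi, "cid": cid, "prev": event_hash(chain[-1])})
    return chain


def verify_chain(chain: list[dict]):
    """Return the seq of the first event with a broken link, or None if all links hold."""
    for i in range(1, len(chain)):
        prev_event, event = chain[i - 1], chain[i]
        if event["seq"] != prev_event["seq"] + 1 or event["prev"] != event_hash(prev_event):
            return event["seq"]
    return None
```

Because each link covers only the event before it, a change to the newest event is caught by comparing against the copy recorded on Arweave rather than by the links alone.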
Binary files (images, PDFs, etc.) are not backed up to Arweave due to prohibitive costs. Only entity manifests are attested.
Design Principles
Content Addressing
Every manifest is identified by a CID derived from the SHA-256 hash of its content. Same content = same CID. This makes manifests immutable, verifiable, and cacheable forever.
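The `bafkrei...` prefix in the examples above is what a CIDv1 with the raw codec and a SHA-256 multihash looks like in lowercase base32. Assuming Arke hashes a canonical JSON serialization of the manifest this way (the serialization and codec are assumptions, not stated here), a CID can be computed with the standard library alone:

```python
import base64
import hashlib
import json

# Multiformats codes; each value is below 0x80, so each encodes as a single varint byte.
CID_V1 = 0x01
RAW_CODEC = 0x55
SHA2_256 = 0x12
DIGEST_LEN = 0x20  # 32-byte digest


def cid_v1_raw(data: bytes) -> str:
    digest = hashlib.sha256(data).digest()
    multihash = bytes([SHA2_256, DIGEST_LEN]) + digest
    cid_bytes = bytes([CID_V1, RAW_CODEC]) + multihash
    # Multibase prefix "b" means lowercase base32 with no padding.
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


def manifest_cid(manifest: dict) -> str:
    # Hypothetical canonical form: sorted keys, compact separators, UTF-8.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return cid_v1_raw(canonical)
```

Key order does not affect the result because the JSON is canonicalized first, while any change to a value yields a different CID.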
Version Chaining
Each version links to its predecessor via prev. The complete history is preserved — no version is ever deleted.
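Reconstructing an entity's history is then a walk backward along `prev` links. The sketch assumes manifests are available in a mapping from CID to manifest dict and that the first version has `prev` set to `None`; the placeholder CIDs are hypothetical.

```python
def history(manifests: dict[str, dict], tip_cid: str) -> list[dict]:
    """Follow prev links from the current version back to the first (newest first)."""
    versions = []
    cid = tip_cid
    while cid is not None:
        manifest = manifests[cid]
        versions.append(manifest)
        cid = manifest.get("prev")
    return versions


if __name__ == "__main__":
    manifests = {
        "cid-1": {"ver": 1, "prev": None},
        "cid-2": {"ver": 2, "prev": "cid-1"},
        "cid-3": {"ver": 3, "prev": "cid-2"},
    }
    print([m["ver"] for m in history(manifests, "cid-3")])  # [3, 2, 1]
```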
Atomic Updates
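Updates use Compare-And-Swap (CAS): an update succeeds only if the entity's tip still points to the version you read. If another update happened since you read the entity, yours fails with 409 Conflict. This prevents concurrent writers from silently overwriting each other's changes.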
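The CAS rule can be sketched with a lock-protected tip table. `ConflictError` stands in for the 409 response and all names are hypothetical; the point is that the check of the expected CID and the write of the new one happen as a single atomic step.

```python
import threading


class ConflictError(Exception):
    """Stands in for an HTTP 409 Conflict response."""


class TipTable:
    """Hypothetical PI -> CID table with compare-and-swap updates."""

    def __init__(self) -> None:
        self._tips: dict[str, str] = {}
        self._lock = threading.Lock()

    def get(self, pi: str):
        with self._lock:
            return self._tips.get(pi)

    def compare_and_swap(self, pi: str, expected_cid, new_cid: str) -> None:
        # Check and write under one lock so no other update can slip in between.
        with self._lock:
            current = self._tips.get(pi)
            if current != expected_cid:
                raise ConflictError(f"409 Conflict: tip is {current}, expected {expected_cid}")
            self._tips[pi] = new_cid
```

A client that receives the conflict would typically re-read the entity, reapply its change on top of the new version, and retry.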