Architecture

How Arke's components work together — API, indexes, attestation, and applications.

Arke is a distributed system with several components that work together to provide a permanent, searchable knowledge graph.

┌─────────────────────────────────────────────────────────────────────────────┐
│                             APPLICATIONS                                     │
│                                                                              │
│  ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────────────┐    │
│  │    Arke Home    │   │    AI Agents    │   │   Processing Agents     │    │
│  │                 │   │                 │   │                         │    │
│  │  Web UI for     │   │  LLMs acting    │   │  OCR, structure         │    │
│  │  browsing and   │   │  on behalf of   │   │  extraction,            │    │
│  │  exploration    │   │  users          │   │  descriptions, etc.     │    │
│  │                 │   │                 │   │                         │    │
│  └────────┬────────┘   └────────┬────────┘   └────────────┬────────────┘    │
│           │                     │                         │                  │
│           └─────────────────────┼─────────────────────────┘                  │
│                                 │                                            │
│                                 │ API calls                                  │
│                                 ▼                                            │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                ARKE API                                     │
│                           api.arke.institute                                │
│                                                                             │
│   Entities • Relationships • Files • Search • Versions • Permissions        │
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                          CLOUD STORAGE                                │  │
│  │                                                                       │  │
│  │   Manifests (JSON)              Tips                 Binary Files     │  │
│  │   keyed by CID              PI → CID mapping         (images, PDFs)   │  │
│  │   (content hash)            (current version)                         │  │
│  │                                                                       │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘
                        │                              │
                        │ events                       │ events
                        ▼                              ▼
         ┌─────────────────────────┐    ┌─────────────────────────┐
         │        INDEXING         │    │      ATTESTATION        │
         │                         │    │                         │
         │  Watches /events        │    │  Queues every manifest  │
         │  every 60 seconds       │    │  created on network     │
         │                         │    │                         │
         │  ┌─────────────────┐    │    │  Signs and bundles      │
         │  │    Pinecone     │    │    │  to Arweave             │
         │  │ (semantic search│    │    │                         │
         │  │   vectors)      │    │    │  ┌─────────────────┐    │
         │  └─────────────────┘    │    │  │    Arweave      │    │
         │                         │    │  │  (permanent     │    │
         │  ┌─────────────────┐    │    │  │   chain of      │    │
         │  │     Neo4j       │    │    │  │   all events)   │    │
         │  │ (graph index    │    │    │  └─────────────────┘    │
         │  │  for traversal) │    │    │                         │
         │  └─────────────────┘    │    │  Note: Binary files     │
         │                         │    │  are NOT backed up      │
         └─────────────────────────┘    └─────────────────────────┘

Applications

Three types of actors interact with the Arke API:

Arke Home — Web UI for browsing entities, managing collections, and exploring the knowledge graph
AI Agents — LLMs and automation acting on behalf of users via API keys
Processing Agents — Services that process entities (OCR, structure extraction, descriptions). See Building with Agents.

All three are equal actors on the network, calling the same API.

The API

The Arke API (api.arke.institute) is the core of the system:

Entities — Create, read, update with full version history
Relationships — Connect entities with bidirectional links
Files — Upload and download binary content
Search — Semantic similarity and graph traversal
Permissions — Collection-based access control

Storage

What	How
Manifests	JSON stored by CID (content hash). Immutable — same content always produces same CID.
Tips	Pointer from PI (persistent identifier) to current CID. Updated atomically on each version.
Binary files	Images, PDFs, etc. stored in cloud storage. Referenced by manifests but stored separately.

When you update an entity:

New manifest created with new CID
Tip atomically updated to point to new CID
Old version remains accessible via its CID

PI (01JXYZ...)
    ↓ tip lookup
Current CID (bafkrei...)
    ↓ fetch manifest
{ id, type, properties, relationships, ver, prev... }
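
The update steps and lookup path above can be sketched with two in-memory dictionaries. This is a minimal illustration, not Arke's implementation: the `ManifestStore` class, its method names, the `01JEXAMPLE` identifier, version numbers starting at 1, and the use of a hex SHA-256 digest in place of a real CID are all assumptions made for the example.

```python
import hashlib
import json


class ManifestStore:
    """Toy model: immutable manifests keyed by content hash, plus mutable tips."""

    def __init__(self):
        self.manifests = {}  # CID -> manifest
        self.tips = {}       # PI -> current CID

    def write_version(self, pi, properties):
        """Create a new manifest for `pi` and move its tip to it."""
        prev = self.tips.get(pi)
        # Assumption: versions count up from 1.
        ver = self.manifests[prev]["ver"] + 1 if prev else 1
        manifest = {"id": pi, "properties": properties, "ver": ver, "prev": prev}
        # Stand-in for a real CID: hex SHA-256 of a canonical JSON encoding.
        encoded = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
        cid = hashlib.sha256(encoded.encode("utf-8")).hexdigest()
        self.manifests[cid] = manifest
        self.tips[pi] = cid
        return cid

    def resolve(self, pi):
        """PI -> tip lookup -> current manifest."""
        return self.manifests[self.tips[pi]]


store = ManifestStore()
v1 = store.write_version("01JEXAMPLE", {"title": "Draft"})
v2 = store.write_version("01JEXAMPLE", {"title": "Final"})
current = store.resolve("01JEXAMPLE")  # follows the tip to the newest manifest
old = store.manifests[v1]              # the old version is still there under its CID
```

After the second write, the tip points at `v2` while the manifest stored under `v1` is unchanged, which is the behavior the three update steps describe.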

Indexing

The indexing system watches the network and maintains searchable indexes:

A poller checks /events every 60 seconds for new entity versions
Pinecone stores vector embeddings for semantic search
Neo4j stores the relationship graph for traversal queries
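
A rough sketch of the polling step follows. The response format of /events is not described here, so the event shape (`seq`, `pi`, `cid`, borrowed from the attestation chain), the cursor-based fetch, and the function names `poll_once` and `run_poller` are assumptions. The event source and the indexer are passed in as plain callables so the example runs without a network.

```python
import time


def poll_once(fetch_events, cursor, index_version):
    """Index every event newer than `cursor`; return the advanced cursor."""
    events = sorted(fetch_events(cursor), key=lambda e: e["seq"])
    for event in events:
        index_version(event["pi"], event["cid"])
        cursor = max(cursor, event["seq"])
    return cursor


def run_poller(fetch_events, index_version, interval_seconds=60):
    """Poll forever at a fixed interval (defined for illustration, not called here)."""
    cursor = 0
    while True:
        cursor = poll_once(fetch_events, cursor, index_version)
        time.sleep(interval_seconds)


# A fake event source standing in for the /events endpoint.
FAKE_EVENTS = [
    {"seq": 1, "pi": "01JAAA", "cid": "cid-a1"},
    {"seq": 2, "pi": "01JBBB", "cid": "cid-b1"},
    {"seq": 3, "pi": "01JAAA", "cid": "cid-a2"},
]
indexed = []
cursor = poll_once(
    lambda after: [e for e in FAKE_EVENTS if e["seq"] > after],
    0,
    lambda pi, cid: indexed.append((pi, cid)),
)
```

Tracking a cursor means a second poll with no new events does no work, so the indexes only see each entity version once.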

Semantic Search

Entity properties are embedded as vectors. Search queries are embedded the same way, returning entities with similar meaning.
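
To make "similar meaning" concrete, here is a toy nearest-neighbor search in plain Python. It assumes cosine similarity as the metric and uses hand-made three-dimensional vectors. Real embeddings come from a model and have many more dimensions, and the entity names are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, index, top_k=2):
    """Return the top_k (entity_id, score) pairs, most similar first."""
    scored = [(eid, cosine_similarity(query_vector, vec)) for eid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical entities with made-up embeddings.
index = {
    "letter-1890": [0.9, 0.1, 0.0],
    "map-1850": [0.1, 0.9, 0.1],
    "letter-1912": [0.8, 0.2, 0.1],
}
results = search([1.0, 0.0, 0.0], index)
```

The two letters rank ahead of the map because their vectors point in nearly the same direction as the query.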

Graph Index

All relationships are indexed for:

Path finding between entities
Relationship traversal
Complex graph queries (Argo query language)
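
Path finding can be illustrated with a breadth-first search over relationship triples. This sketch does not use Neo4j or the Argo query language. The triple format, predicate names, and entity names are assumptions, and links are treated as traversable in both directions to match the bidirectional relationships described earlier.

```python
from collections import deque


def shortest_path(relationships, start, goal):
    """Breadth-first search over (source, predicate, target) triples."""
    neighbors = {}
    for source, _predicate, target in relationships:
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sorted so the result is deterministic when several paths tie.
        for nxt in sorted(neighbors.get(path[-1], ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # no connection between the two entities


relationships = [
    ("letter", "written_by", "author"),
    ("author", "member_of", "society"),
    ("map", "held_by", "society"),
]
path = shortest_path(relationships, "letter", "map")
```

Here the letter and the map are connected only through the author and the society, so the returned path passes through both.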

Attestation

Every entity manifest is permanently recorded on Arweave, creating an immutable audit trail.

How It Works

When manifests are created, events are queued
Events are signed and bundled together
Bundles are uploaded to Arweave
Each event links to the previous, forming a sequential chain

The Chain

Genesis (seq=0)
    ↓
Event 1 (seq=1, pi: 01JABC, cid: bafkrei...)
    ↓
Event 2 (seq=2, pi: 01JDEF, cid: bafyabc...)
    ↓
Event 3 (seq=3, pi: 01JABC, cid: bafynew...)  ← same entity, new version
    ↓
... continues forever

This chain is:

Immutable — Changing any event breaks the chain
Verifiable — Anyone can walk and verify the chain
Permanent — Arweave stores data forever (one-time fee)
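
The "changing any event breaks the chain" property can be sketched as a hash-linked list. How Arke actually links events is not specified here, so the `prev_hash` field, the genesis layout, and the function names are assumptions. Signing, bundling, and the Arweave upload are left out.

```python
import hashlib
import json


def event_hash(event):
    """Hex SHA-256 of a canonical JSON encoding of the event."""
    encoded = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def append_event(chain, pi, cid):
    """Append an event that commits to the hash of the previous one."""
    prev = chain[-1]
    event = {
        "seq": prev["seq"] + 1,
        "pi": pi,
        "cid": cid,
        "prev_hash": event_hash(prev),
    }
    chain.append(event)
    return event


def verify_chain(chain):
    """Walk from genesis and check every link and sequence number."""
    for prev, event in zip(chain, chain[1:]):
        if event["seq"] != prev["seq"] + 1:
            return False
        if event["prev_hash"] != event_hash(prev):
            return False
    return True


chain = [{"seq": 0, "pi": None, "cid": None, "prev_hash": None}]  # genesis
append_event(chain, "01JAAA", "cid-a1")
append_event(chain, "01JBBB", "cid-b1")
append_event(chain, "01JAAA", "cid-a2")  # same entity, new version
valid_before = verify_chain(chain)

chain[1]["cid"] = "tampered"             # rewrite history
valid_after = verify_chain(chain)
```

Editing event 1 changes its hash, so event 2 no longer points at it and verification fails. In this simplified form the newest event has no successor to protect it, which is one reason a real system would also rely on signatures.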

Binary files (images, PDFs, etc.) are not backed up to Arweave due to prohibitive costs. Only entity manifests are attested.

Design Principles

Content Addressing

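Every manifest is identified by a CID derived from the SHA-256 hash of its content. Same content = same CID. Because a CID can only ever refer to one exact manifest, manifests are immutable, verifiable, and safe to cache indefinitely.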
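
A small sketch of the idea, under stated assumptions: the manifest is serialized as canonical JSON (sorted keys, no extra whitespace) and identified by the hex SHA-256 digest. A real CID wraps the digest in a multihash-based encoding, which is skipped here, and the field values are invented.

```python
import hashlib
import json


def content_id(manifest):
    """Hex SHA-256 of a canonical JSON encoding (stand-in for a real CID)."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(manifest, expected_id):
    """Check that fetched content matches the identifier it was requested by."""
    return content_id(manifest) == expected_id


a = {"id": "01JEXAMPLE", "type": "document", "properties": {"title": "Deed"}}
b = {"properties": {"title": "Deed"}, "type": "document", "id": "01JEXAMPLE"}
changed = {**a, "properties": {"title": "Deed (revised)"}}

same = content_id(a) == content_id(b)             # key order does not matter
different = content_id(a) != content_id(changed)  # any edit yields a new identifier
```

Canonical serialization is what makes "same content = same CID" hold in practice: without it, two encodings of the same manifest could hash differently.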

Version Chaining

Each version links to its predecessor via prev. The complete history is preserved — no version is ever deleted.
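
Walking the history amounts to following `prev` until it runs out. The sketch below assumes the first version has `prev` set to `None` and uses placeholder CIDs. The `history` helper is hypothetical.

```python
def history(manifests, tip_cid):
    """Return manifests from newest to oldest by following `prev`."""
    versions = []
    cid = tip_cid
    while cid is not None:
        manifest = manifests[cid]
        versions.append(manifest)
        cid = manifest["prev"]
    return versions


# Placeholder CIDs; real ones are content hashes.
manifests = {
    "cid-v1": {"ver": 1, "prev": None, "properties": {"title": "Draft"}},
    "cid-v2": {"ver": 2, "prev": "cid-v1", "properties": {"title": "Revised"}},
    "cid-v3": {"ver": 3, "prev": "cid-v2", "properties": {"title": "Final"}},
}
versions = history(manifests, "cid-v3")
titles = [m["properties"]["title"] for m in versions]
```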

Atomic Updates

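Updates use Compare-And-Swap (CAS): an update is applied only if the entity is still at the version you read. If another update landed in the meantime, yours fails with 409 Conflict rather than silently overwriting it. This prevents lost updates from concurrent writers.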
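
The following in-memory sketch shows the compare-and-swap rule applied to tips. `TipStore`, `compare_and_swap`, and `ConflictError` are hypothetical names, with `ConflictError` standing in for the API's 409 Conflict response. The real API's request format is not shown here.

```python
import threading


class ConflictError(Exception):
    """Stands in for the API's 409 Conflict response."""


class TipStore:
    def __init__(self):
        self._tips = {}
        self._lock = threading.Lock()

    def get(self, pi):
        return self._tips.get(pi)

    def compare_and_swap(self, pi, expected_cid, new_cid):
        """Move the tip to new_cid only if it still equals expected_cid."""
        with self._lock:
            if self._tips.get(pi) != expected_cid:
                raise ConflictError(f"tip for {pi} changed since it was read")
            self._tips[pi] = new_cid


tips = TipStore()
tips.compare_and_swap("01JEXAMPLE", None, "cid-v1")  # first version

# Two writers read the same tip, then both try to update.
seen_by_alice = tips.get("01JEXAMPLE")
seen_by_bob = tips.get("01JEXAMPLE")
tips.compare_and_swap("01JEXAMPLE", seen_by_alice, "cid-v2a")  # succeeds
try:
    tips.compare_and_swap("01JEXAMPLE", seen_by_bob, "cid-v2b")  # stale read
    bob_conflicted = False
except ConflictError:
    bob_conflicted = True
```

The second writer's update is rejected because the tip moved after it was read. A typical client response is to re-read the entity, reapply the change, and try again.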
