Semantic Search

Overview

Arke uses Pinecone for semantic similarity search. When entities are created or updated, their text content is embedded into vectors and indexed, so you can search by meaning rather than by exact keywords.

Basic Search

POST /search
Content-Type: application/json
Authorization: Bearer <token>

{
  "query": "climate change policy documents",
  "limit": 10
}
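The request above can be issued from Python with only the standard library. This is a minimal sketch: `API_BASE` is a hypothetical placeholder for your Arke host, and the response is decoded as generic JSON because this page does not describe the response shape. `build_search_request` is kept separate from `search` so the request can be inspected without a network call.

```python
import json
import urllib.request

# Hypothetical base URL (assumption); substitute your actual Arke API host.
API_BASE = "https://arke.example.com"


def build_search_request(query: str, token: str, limit: int = 10) -> urllib.request.Request:
    """Build (but do not send) a POST /search request matching the example above."""
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/search",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def search(query: str, token: str, limit: int = 10) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_search_request(query, token, limit)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `search("climate change policy documents", token="<token>")` sends the same body as the raw request shown above.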

Filtering

Narrow results by adding filter fields, such as type, collection, or other properties, to the request body:

{
  "query": "medical research",
  "type": "file",
  "collection_id": "01JCOL...",
  "limit": 20
}
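When filters are optional, it helps to include only the fields the caller actually set. The sketch below assembles a body like the one above. The helper name `build_search_body` and the `entity_type` parameter (which maps to the `type` field, avoiding Python's built-in `type`) are illustrative, not part of the Arke API. Only the `type` and `collection_id` filters shown on this page are covered.

```python
from typing import Any, Dict, Optional


def build_search_body(
    query: str,
    limit: int = 10,
    entity_type: Optional[str] = None,
    collection_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a /search request body, including only the filters that were provided."""
    body: Dict[str, Any] = {"query": query, "limit": limit}
    if entity_type is not None:
        body["type"] = entity_type  # e.g. "file", as in the example above
    if collection_id is not None:
        body["collection_id"] = collection_id
    return body
```

Calling `build_search_body("medical research", limit=20, entity_type="file", collection_id="<collection-id>")` yields the same shape as the filtered example, while `build_search_body("medical research")` sends no filters at all.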

How Indexing Works

1. An entity is created or updated.
2. Its text content is extracted (from properties, OCR output, etc.).
3. The content is embedded into a vector using the embedding model.
4. The vector is stored in Pinecone along with the entity's metadata.
5. At search time, the query is embedded with the same model, and the resulting vector is compared against the indexed vectors using cosine similarity.
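The last step above, comparing a query vector against indexed vectors by cosine similarity, can be illustrated with a brute-force, in-memory sketch. This is not how Pinecone works internally: a vector database uses specialised indexes rather than scanning every vector. The exact-match metadata filter is an assumption about how stored metadata could narrow results. The vectors here are tiny hand-written examples, not real embeddings.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    index: Dict[str, Tuple[List[float], dict]],
    k: int = 10,
    metadata_filter: Optional[dict] = None,
) -> List[Tuple[str, float]]:
    """Score every indexed vector whose metadata matches the filter; return the k best."""
    scored = []
    for entity_id, (vec, meta) in index.items():
        if metadata_filter and any(meta.get(key) != value for key, value in metadata_filter.items()):
            continue  # skip entities whose metadata does not match every filter field
        scored.append((entity_id, cosine_similarity(query_vec, vec)))
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]
```

Cosine similarity depends only on the angle between vectors, not their length, so a short text and a long text about the same topic can still score as close matches.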

Two-Tier Discovery

Arke uses a two-tier search model:

1. Collection-level: find collections with similar content.
2. Entity-level: drill into specific files within those collections.

This keeps costs sustainable while encouraging thoughtful curation.
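This page does not specify how the two tiers are combined. The sketch below assumes one plausible reading: rank collections first, then rank only the entities that belong to the top-ranked collections. It runs in memory on hand-written vectors. The function name `two_tier_search` and its parameters are hypothetical. Against the API, the second tier would roughly correspond to repeating the search with the `collection_id` filter shown under Filtering.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def two_tier_search(
    query_vec: Sequence[float],
    collections: Dict[str, List[float]],
    entities: Dict[str, Tuple[List[float], str]],
    top_collections: int = 3,
    top_entities: int = 10,
) -> List[Tuple[str, str, float]]:
    """Return (entity_id, collection_id, score) for entities in the best-matching collections."""
    # Tier 1: rank collections by similarity to the query and keep the best few.
    ranked = sorted(collections, key=lambda cid: _cosine(query_vec, collections[cid]), reverse=True)
    selected = set(ranked[:top_collections])
    # Tier 2: score only entities whose collection survived tier 1.
    hits = [
        (eid, cid, _cosine(query_vec, vec))
        for eid, (vec, cid) in entities.items()
        if cid in selected
    ]
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits[:top_entities]
```

Under this reading, an entity in a non-matching collection is never scored, even if it would match the query well on its own. That reduces the number of entity comparisons per query, at the cost of relying on collections being curated around coherent topics.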
