
Semantic Search

How to use Arke's semantic search, which is powered by vector embeddings indexed in Pinecone.

Overview

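Arke uses Pinecone for semantic similarity search. When an entity is created or updated, its text content is embedded into a vector and indexed, so you can search by meaning rather than by exact keywords. To run a search, send a POST request to /search with the query text and a result limit: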

POST /search
Content-Type: application/json
Authorization: Bearer <token>

{
  "query": "climate change policy documents",
  "limit": 10
}
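
As a rough illustration, the request above could be sent from Python using only the standard library. This is a minimal sketch, not an official client: BASE_URL is a placeholder, the helper names build_search_request and search are hypothetical, and the response is returned as decoded JSON without assuming anything about its shape.

```python
import json
import urllib.request

# Assumption: placeholder host; substitute the base URL of your Arke API.
BASE_URL = "https://arke.example.com"


def build_search_request(query: str, limit: int, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST /search request (hypothetical helper)."""
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/search",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def search(query: str, limit: int, token: str):
    """Send the request and return the decoded JSON response unchanged."""
    request = build_search_request(query, limit, token)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending makes it possible to inspect the method, headers, and body without touching the network.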

Filtering

Filter results by type, collection, or other properties:

{
  "query": "medical research",
  "type": "file",
  "collection_id": "01JCOL...",
  "limit": 20
}
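
One way to assemble a filtered request body in code is sketched below. The helper build_search_body and its parameter names are hypothetical; only the JSON field names (query, type, collection_id, limit) come from the example above. The sketch leaves out filters that are not set, on the assumption that an absent field means "no filter".

```python
from typing import Any, Dict, Optional


def build_search_body(
    query: str,
    limit: int,
    entity_type: Optional[str] = None,
    collection_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a /search request body, including only the filters that are set."""
    body: Dict[str, Any] = {"query": query, "limit": limit}
    if entity_type is not None:
        body["type"] = entity_type  # e.g. "file"
    if collection_id is not None:
        body["collection_id"] = collection_id
    return body


# Mirrors the filtered example above; the collection ID is elided in the source.
filtered = build_search_body(
    "medical research", 20, entity_type="file", collection_id="01JCOL..."
)
```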

How Indexing Works

  1. Entity is created or updated
  2. Text content is extracted (from properties, OCR output, etc.)
  3. Content is embedded into a vector using the embedding model
  4. Vector is stored in Pinecone with entity metadata
  5. Searches compare query vectors against indexed vectors using cosine similarity
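
To make step 5 concrete, here is a toy sketch of cosine-similarity ranking over a small in-memory index. It is illustrative only: in Arke the vectors come from the embedding model and the comparison happens inside Pinecone, while the two-dimensional vectors, entity IDs, and function names below are invented for the example.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(
    query_vector: List[float], index: Dict[str, List[float]], limit: int
) -> List[Tuple[str, float]]:
    """Score every indexed vector against the query and keep the top `limit`."""
    scored = [
        (entity_id, cosine_similarity(query_vector, vector))
        for entity_id, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy index: real embeddings have hundreds or thousands of dimensions.
toy_index = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.0, 1.0],
    "doc-c": [1.0, 1.0],
}
top_hits = rank([1.0, 0.1], toy_index, limit=2)
```

Because cosine similarity depends only on the angle between vectors, a long document and a short one can score equally if they point in the same direction in embedding space.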

Two-Tier Discovery

Arke uses a two-tier search model:

  1. Collection-level -- Find collections with similar content
  2. Entity-level -- Drill into specific files within collections

Searching at the collection level first, then drilling into entities, keeps search costs sustainable while encouraging thoughtful curation of collections.
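
The two tiers might be chained from client code roughly as follows. This sketch rests on several assumptions that this page does not confirm: that collection-level results can be requested with "type": "collection", that the search call returns a list of hits, and that each hit carries an "id" field. The search function is passed in as a parameter so the flow can be tried with any client or with a stub.

```python
from typing import Any, Callable, Dict, List

# A search function takes a request body and returns a list of hits.
# Assumption: this response shape is illustrative, not documented here.
SearchFn = Callable[[Dict[str, Any]], List[Dict[str, Any]]]


def two_tier_search(
    search: SearchFn, query: str, collection_limit: int, entity_limit: int
) -> Dict[str, List[Dict[str, Any]]]:
    """Find matching collections, then search for files inside each one."""
    # Tier 1 (assumption): collections are selected with type "collection".
    collections = search(
        {"query": query, "type": "collection", "limit": collection_limit}
    )
    results: Dict[str, List[Dict[str, Any]]] = {}
    for collection in collections:
        collection_id = collection["id"]  # assumption: hits expose an "id"
        # Tier 2: restrict the same query to files in this collection.
        results[collection_id] = search(
            {
                "query": query,
                "type": "file",
                "collection_id": collection_id,
                "limit": entity_limit,
            }
        )
    return results
```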
