Semantic Search
How to use Arke's semantic search powered by Pinecone vector embeddings.
Overview
Arke uses Pinecone for semantic similarity search. When entities are created or updated, their text content is embedded into vectors and indexed. You can search by meaning, not just keywords.
Search Endpoints
Arke provides specialized search endpoints for different use cases:
| Endpoint | Purpose |
|---|---|
| `POST /search/collections` | Search for collections by text query |
| `POST /search/entities` | Search entities within specific collection(s) |
| `POST /search/discover` | Two-step discovery: find collections, then search within them |
| `POST /search/agents` | Search for agents across the network |
| `POST /search/similar/collections` | Find collections similar to a given collection |
| `POST /search/similar/items` | Find items similar to a given entity (cross-collection) |
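Every endpoint in the table takes a JSON body and a bearer token. A minimal Python sketch of building such a request with only the standard library follows. The base URL `https://arke.example.com` and the helper name `build_search_request` are hypothetical placeholders, not part of Arke's documented API.

```python
import json
import urllib.request

# Hypothetical base URL; substitute the API host of your Arke deployment.
BASE_URL = "https://arke.example.com"

# Endpoint paths taken from the table above.
SEARCH_PATHS = {
    "/search/collections",
    "/search/entities",
    "/search/discover",
    "/search/agents",
    "/search/similar/collections",
    "/search/similar/items",
}


def build_search_request(path: str, body: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to one of the search endpoints."""
    if path not in SEARCH_PATHS:
        raise ValueError(f"unknown search endpoint: {path}")
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_search_request(
    "/search/collections", {"query": "medical research papers", "limit": 10}, "<token>"
)
# To send it: urllib.request.urlopen(req) performs the call; the response body is JSON.
```

Building the request separately from sending it keeps the payload easy to inspect and test without network access.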
Basic Collection Search
```http
POST /search/collections
Content-Type: application/json
Authorization: Bearer <token>

{
  "query": "medical research papers",
  "limit": 10
}
```

Entity Search Within Collections
Search within one or more collections:
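```http
POST /search/entities
Content-Type: application/json
Authorization: Bearer <token>

{
  "collection_pi": "01KFNR0H0Q791Y1SMZWEQ09FGV",
  "query": "cetology research",
  "limit": 20,
  "types": ["chapter", "document"]
}
```

To search several collections in parallel, pass an array of collection IDs in `collection_pis`; `per_collection_limit` caps the number of results taken from each collection: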
```json
{
  "collection_pis": ["01KCOLL1...", "01KCOLL2..."],
  "query": "whale sightings",
  "limit": 20,
  "per_collection_limit": 5
}
```

Discovery Search
When you do not know which collections to search, use the discover endpoint:
```http
POST /search/discover
Content-Type: application/json
Authorization: Bearer <token>

{
  "query": "white whale sightings",
  "limit": 20,
  "collection_limit": 10,
  "per_collection_limit": 5,
  "types": ["file", "document"]
}
```

This performs a two-step search and then merges the results:
- Finds collections semantically related to your query
- Searches within each collection in parallel
- Aggregates and ranks results across all collections
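The aggregation step can be pictured as a merge of per-collection result lists. The sketch below is a hypothetical client-side illustration, not Arke's server-side implementation: it assumes each hit carries a `score` (as in the response example on this page), keeps at most `per_collection_limit` hits per collection, sorts everything by score, and returns the top `limit`. The function name `aggregate_results` and the collection IDs are made up for illustration.

```python
from typing import Any

# A search hit such as {"pi": "...", "score": 0.92}; shape assumed from the
# response example on this page.
Result = dict[str, Any]


def aggregate_results(
    per_collection: dict[str, list[Result]],
    limit: int,
    per_collection_limit: int,
) -> list[Result]:
    """Merge per-collection hits into one list ranked by score (illustrative only)."""
    merged: list[Result] = []
    for collection_pi, hits in per_collection.items():
        # Keep only the best-scoring hits from each collection.
        best = sorted(hits, key=lambda r: r["score"], reverse=True)[:per_collection_limit]
        # Tag each hit with the collection it came from.
        merged.extend({**hit, "collection_pi": collection_pi} for hit in best)
    # Rank across all collections and truncate to the overall limit.
    merged.sort(key=lambda r: r["score"], reverse=True)
    return merged[:limit]


# Hypothetical per-collection results for one query.
hits_by_collection = {
    "coll-1": [
        {"pi": "a", "score": 0.91},
        {"pi": "b", "score": 0.40},
        {"pi": "c", "score": 0.85},
    ],
    "coll-2": [{"pi": "d", "score": 0.88}],
}
ranked = aggregate_results(hits_by_collection, limit=3, per_collection_limit=2)
print([r["pi"] for r in ranked])  # ['a', 'd', 'c']
```

Applying the per-collection cap before the global sort is what lets a strong hit from a smaller collection (`d`) outrank a weaker hit (`c`) from a collection that returned many matches.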
Entity Expansion
All search endpoints support the `expand` parameter, which controls how much entity data is returned:

| Value | Description |
|---|---|
| `"preview"` (default) | Lightweight preview with label, timestamps, and truncated description |
| `"full"` | Complete entity manifest with all properties and relationships |
| `"none"` | Search metadata only (fastest, smallest payload) |
Example with expansion:
```json
{
  "query": "research papers",
  "limit": 10,
  "expand": "preview"
}
```

The response includes an `entity_preview` or `entity` field, depending on the mode:
```json
{
  "results": [
    {
      "pi": "01KENTITY...",
      "label": "Research Paper.pdf",
      "type": "file",
      "score": 0.92,
      "collection_pi": "01KCOLL...",
      "entity_preview": {
        "id": "01KENTITY...",
        "type": "file",
        "label": "Research Paper.pdf",
        "description_preview": "Analysis of entity management patterns...",
        "created_at": "2025-01-15T10:00:00.000Z",
        "updated_at": "2025-01-20T14:30:00.000Z"
      }
    }
  ],
  "metadata": {
    "query": "research papers",
    "result_count": 1
  }
}
```

Similarity Search
Find entities similar to a known entity:
```http
POST /search/similar/items
Content-Type: application/json
Authorization: Bearer <token>

{
  "pi": "01KENTITY...",
  "collection_pi": "01KCOLL...",
  "limit": 20,
  "tier1_limit": 10,
  "tier2_limit": 5,
  "include_same_collection": true
}
```

This performs a two-tier search and then merges the results:
- Finds collections similar to the entity's collection
- Searches within each collection for similar items
- Aggregates results with diversity weighting
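This page does not specify how the diversity weighting is computed. One common technique, shown here as a hypothetical sketch rather than Arke's actual formula, multiplies each hit's score by a decay factor for every higher-scoring hit from the same collection, so no single collection dominates the merged list. The `decay` value of 0.8, the `weighted_score` field, and the function name `diversify` are assumptions; hits are assumed to carry `score` and `collection_pi` as in the response example above.

```python
from collections import defaultdict
from typing import Any

Result = dict[str, Any]


def diversify(results: list[Result], limit: int, decay: float = 0.8) -> list[Result]:
    """Re-rank hits so repeated hits from one collection are down-weighted.

    Hypothetical illustration: the n-th best hit from a collection (counting
    from 0) has its score multiplied by decay ** n.
    """
    seen: defaultdict[str, int] = defaultdict(int)
    weighted: list[Result] = []
    # Visit hits from best to worst raw score so each collection's best hit is unpenalized.
    for r in sorted(results, key=lambda r: r["score"], reverse=True):
        n = seen[r["collection_pi"]]
        seen[r["collection_pi"]] += 1
        weighted.append({**r, "weighted_score": r["score"] * decay**n})
    weighted.sort(key=lambda r: r["weighted_score"], reverse=True)
    return weighted[:limit]


candidates = [
    {"pi": "a1", "collection_pi": "A", "score": 0.95},
    {"pi": "a2", "collection_pi": "A", "score": 0.93},
    {"pi": "a3", "collection_pi": "A", "score": 0.90},
    {"pi": "b1", "collection_pi": "B", "score": 0.80},
]
print([r["pi"] for r in diversify(candidates, limit=3)])  # ['a1', 'b1', 'a2']
```

Here `a2` drops to 0.93 × 0.8 = 0.744, so the best hit from collection B moves ahead of it. Setting `decay=1.0` disables the penalty and returns a plain score ranking.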
How Indexing Works
- Entity is created or updated
- Text content is extracted (from properties, OCR output, etc.)
- Content is embedded into a vector using the embedding model
- Vector is stored in Pinecone with entity metadata
- Searches compare query vectors against indexed vectors using cosine similarity
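The last step ranks indexed vectors by cosine similarity to the query vector. Pinecone computes this inside the index; the pure-Python sketch below only illustrates what the metric measures. The three-dimensional vectors and their IDs are made-up toy values; real embedding vectors have far more dimensions, fixed by the embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| |b|); 1.0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Return the k indexed IDs most similar to the query, best first."""
    scored = [(pi, cosine_similarity(query, vec)) for pi, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; IDs are illustrative.
index = {
    "whale-chapter": [0.9, 0.1, 0.0],
    "ship-log": [0.6, 0.6, 0.1],
    "recipe": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], index, k=2))  # whale-chapter first, then ship-log
```

Because cosine similarity ignores vector length, a short description and a long one that express the same topic can still score close to 1.0.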
Two-Tier Discovery Model
Arke uses a two-tier search model for discovery:
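- Collection-level: find collections with similar content
- Entity-level: drill into specific entities within those collections
Because entity-level search runs only inside the collections selected by the first tier, each query touches a bounded set of collections instead of every indexed entity. This keeps costs sustainable while encouraging thoughtful curation.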
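The two tiers can be sketched in memory to show why the work per query stays bounded: tier 1 scores only collection vectors, and tier 2 scores entity vectors only inside the top `collection_limit` collections. This is an illustrative sketch with toy data. The parameter names mirror the discover endpoint's `collection_limit` and `per_collection_limit`, but the data layout (one vector per collection, a dict of entity vectors per collection) and the function name `two_tier_search` are assumptions, not Arke's storage model.

```python
import math

Vector = list[float]
Hit = tuple[str, str, float]  # (collection_pi, entity_pi, score)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes non-zero vectors of equal length."""
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def two_tier_search(
    query: Vector,
    collection_vectors: dict[str, Vector],
    entity_vectors: dict[str, dict[str, Vector]],
    collection_limit: int,
    per_collection_limit: int,
) -> tuple[list[Hit], int]:
    """Return ranked hits plus how many entity vectors were compared."""
    # Tier 1: score every collection vector, keep the top collection_limit.
    chosen = sorted(
        collection_vectors,
        key=lambda pi: cosine(query, collection_vectors[pi]),
        reverse=True,
    )[:collection_limit]

    hits: list[Hit] = []
    comparisons = 0
    # Tier 2: score entities only inside the chosen collections.
    for coll in chosen:
        scored = [(coll, pi, cosine(query, vec)) for pi, vec in entity_vectors.get(coll, {}).items()]
        comparisons += len(scored)
        scored.sort(key=lambda h: h[2], reverse=True)
        hits.extend(scored[:per_collection_limit])

    hits.sort(key=lambda h: h[2], reverse=True)
    return hits, comparisons


# Toy 2-dimensional data; real embeddings are much larger.
collections = {"whaling": [1.0, 0.0], "cooking": [0.0, 1.0], "sailing": [0.7, 0.7]}
entities = {
    "whaling": {"w1": [0.9, 0.1], "w2": [0.8, 0.3]},
    "cooking": {"c1": [0.1, 0.9], "c2": [0.0, 1.0]},
    "sailing": {"s1": [0.6, 0.8]},
}
hits, comparisons = two_tier_search([1.0, 0.1], collections, entities, 2, 1)
print([(c, e) for c, e, _ in hits], comparisons)
# [('whaling', 'w1'), ('sailing', 's1')] 3  (5 entities indexed, 3 compared)
```

The `cooking` collection is never opened, so its entities cost nothing for this query; with many collections, that pruning is what keeps per-query work proportional to `collection_limit` rather than to the size of the whole index.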