Vector search & embeddings
Neo4j 5.11+ supports native vector indexes for similarity search. The ORM provides end-to-end support: declare vector properties on your models, generate DDL, search via the query builder or sessions, and plug in any embedding provider.
Declare vector properties
Add __vector_indexes__ to a NodeModel. Each key is a property name, and each value is a VectorProperty descriptor:
from cypher_validator import NodeModel, VectorProperty

class Document(NodeModel):
    __label__ = "Document"
    __vector_indexes__ = {
        "embedding": VectorProperty(dimensions=1536, similarity="cosine"),
    }
    title: str
    content: str
    embedding: list[float] = []
VectorProperty accepts dimensions (required) and similarity ("cosine" or
"euclidean", default "cosine").
Create the vector index
SchemaDDL.vector_indexes() generates the DDL. generate_all() includes it
automatically:
from cypher_validator import GraphSchema, SchemaDDL
schema = GraphSchema.from_models([Document])
ddl = SchemaDDL(schema)
# db is an existing database handle, created elsewhere
for stmt in ddl.generate_all():
    db.execute(stmt)
# Or just the vector indexes:
for stmt in ddl.vector_indexes():
    db.execute(stmt)
The generated statement:
CREATE VECTOR INDEX idx_document_embedding_vector IF NOT EXISTS
FOR (n:Document) ON (n.embedding)
OPTIONS {indexConfig: {`vector.dimensions`: 1536, `vector.similarity_function`: 'cosine'}}
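To make the shape of that statement concrete, here is a minimal sketch that rebuilds it from a label, a property name, and the two index options. The helper names are hypothetical, and the `idx_<label>_<property>_vector` naming rule is only inferred from the single example above; it is not the library's actual implementation.

```python
def vector_index_name(label: str, prop: str) -> str:
    # Naming rule inferred from the example above (an assumption).
    return f"idx_{label.lower()}_{prop}_vector"


def vector_index_ddl(label: str, prop: str, dimensions: int,
                     similarity: str = "cosine") -> str:
    """Build a CREATE VECTOR INDEX statement for one node property."""
    if similarity not in ("cosine", "euclidean"):
        raise ValueError(f"unsupported similarity: {similarity!r}")
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    name = vector_index_name(label, prop)
    # Literal braces are kept out of the f-strings to avoid escaping them.
    return (
        f"CREATE VECTOR INDEX {name} IF NOT EXISTS\n"
        f"FOR (n:{label}) ON (n.{prop})\n"
        "OPTIONS {indexConfig: {"
        f"`vector.dimensions`: {dimensions}, "
        f"`vector.similarity_function`: '{similarity}'"
        "}}"
    )


statement = vector_index_ddl("Document", "embedding", 1536)
print(statement)
```

Labels and property names cannot be passed as Cypher parameters, so a builder like this interpolates them into the text. They should therefore come from trusted model definitions, never from user input.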
Search with the Query builder
vector_search()
from cypher_validator import Query
q = (
    Query()
    .vector_search("idx_document_embedding_vector", query_vector, top_k=5)
    .where("score > 0.8")
    .return_("node.title", "score")
)
cypher, params = q.build()
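A threshold such as `score > 0.8` only makes sense once you know how the score is scaled. The sketch below assumes the normalization described in the Neo4j documentation: both similarity functions map to the range [0, 1], with cosine as `(1 + cos) / 2` and Euclidean as `1 / (1 + d²)`. Treat these formulas as an assumption and check them against the Neo4j version you run. The function names are illustrative.

```python
import math


def cosine_score(a: list[float], b: list[float]) -> float:
    """Cosine similarity rescaled from [-1, 1] to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return (1.0 + dot / (norm_a * norm_b)) / 2.0


def euclidean_score(a: list[float], b: list[float]) -> float:
    """Euclidean distance d mapped to 1 / (1 + d^2); identical vectors score 1."""
    d = math.dist(a, b)  # raises ValueError if the lengths differ
    return 1.0 / (1.0 + d * d)


same_direction = cosine_score([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_score([1.0, 0.0], [0.0, 1.0])
opposite = cosine_score([1.0, 0.0], [-1.0, 0.0])
one_apart = euclidean_score([0.0, 0.0], [1.0, 0.0])
```

Under this scaling, a cosine score above 0.8 corresponds to a raw cosine similarity above 0.6, and a Euclidean score above 0.8 corresponds to a distance below 0.5.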
vector_search_model()
Derives the index name from the model automatically:
q = (
    Query()
    .vector_search_model(Document, "embedding", query_vector, top_k=5)
    .return_("node", "score")
)
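Both builder methods end in a Cypher query against the index. For orientation, here is a sketch of what such a query could look like when written by hand with Neo4j's `db.index.vector.queryNodes` procedure. The function and the parameter names are hypothetical, and the text that `build()` actually emits may differ.

```python
from typing import Any, Optional


def build_vector_query(
    index_name: str,
    query_vector: list[float],
    top_k: int = 5,
    min_score: Optional[float] = None,
) -> tuple[str, dict[str, Any]]:
    """Return (cypher, params) for a k-nearest-neighbour lookup."""
    lines = [
        "CALL db.index.vector.queryNodes($index_name, $top_k, $query_vector)",
        "YIELD node, score",
    ]
    params: dict[str, Any] = {
        "index_name": index_name,
        "top_k": top_k,
        "query_vector": query_vector,
    }
    if min_score is not None:
        # Filter on the yielded score; the value travels as a parameter.
        lines.append("WHERE score > $min_score")
        params["min_score"] = min_score
    lines.append("RETURN node, score")
    return "\n".join(lines), params


cypher, params = build_vector_query(
    "idx_document_embedding_vector", [0.1, 0.2, 0.3], top_k=5, min_score=0.8
)
```

Passing the vector and the threshold as parameters keeps the query text stable between calls.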
Search with sessions
GraphSession and AsyncGraphSession provide high-level methods that execute the
search and hydrate results into Pydantic models:
from cypher_validator import GraphSession
session = GraphSession(db, schema)
# Vector search: pass a pre-computed vector
results = session.vector_search(Document, "embedding", query_vector, top_k=5)
for r in results:
    print(r["node"].title, r["score"])  # r["node"] is a Document instance
# Semantic search: pass text, let the embedding function handle it
results = session.semantic_search(
    Document, "embedding", "what is graph RAG?", embedding_fn=embed, top_k=5
)
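Conceptually, semantic search is just "embed the text, then run a vector search". The in-memory sketch below shows that composition without a database. It uses a brute-force cosine ranking and a toy keyword-count embedding, both of which are stand-ins invented for illustration; nothing here is the session's real implementation.

```python
import math
from typing import Callable

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def vector_search_sketch(items: list[tuple[str, Vector]],
                         query_vector: Vector, top_k: int = 5) -> list[dict]:
    """Rank every stored vector against the query and keep the best top_k."""
    scored = [{"node": node, "score": cosine(vec, query_vector)}
              for node, vec in items]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:top_k]


def semantic_search_sketch(items: list[tuple[str, Vector]], text: str,
                           embedding_fn: Callable[[str], Vector],
                           top_k: int = 5) -> list[dict]:
    """Embed the text first, then delegate to the vector search."""
    return vector_search_sketch(items, embedding_fn(text), top_k)


def toy_embed(text: str) -> Vector:
    # Toy embedding: keyword counts plus a constant so no vector is all zeros.
    words = text.lower().split()
    return [float(words.count("graph")), float(words.count("vector")), 1.0]


titles = ["Graph RAG overview", "Vector index tuning"]
items = [(title, toy_embed(title)) for title in titles]
results = semantic_search_sketch(items, "graph databases", toy_embed, top_k=1)
```

A real vector index replaces the brute-force loop with an approximate nearest-neighbour lookup, but the result has the same shape as in the session examples above: a list of entries with a `node` and a `score`.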
Async version:
from cypher_validator import AsyncGraphSession
# Run inside a coroutine (for example, one started with asyncio.run)
async with AsyncGraphSession(db, schema) as session:
    results = await session.vector_search(Document, "embedding", query_vector)
    results = await session.semantic_search(
        Document, "embedding", "graph databases", embedding_fn=embed
    )
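This page does not say whether the async session awaits the embedding function or calls it synchronously. If it is called synchronously, a blocking embedding call such as an HTTP request would stall the event loop. One cautious option is to compute the vector in a worker thread with `asyncio.to_thread` and pass the result to `vector_search`. The sketch below uses a stand-in `slow_embed` function and assumes Python 3.9+.

```python
import asyncio
import time


def slow_embed(text: str) -> list[float]:
    # Stand-in for a blocking embedding call (network or local model).
    time.sleep(0.05)
    return [float(len(text)), 1.0]


async def embed_off_loop(text: str) -> list[float]:
    # Run the blocking function in a worker thread; the event loop stays free.
    return await asyncio.to_thread(slow_embed, text)


async def main() -> list[list[float]]:
    # Both embeddings are computed concurrently in separate threads.
    return await asyncio.gather(
        embed_off_loop("graph databases"),
        embed_off_loop("vector indexes"),
    )


vectors = asyncio.run(main())
```

Each resulting vector can then be passed to `await session.vector_search(...)` as a pre-computed query vector.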
Embedding adapters
The cypher_validator.embeddings module provides adapters for common embedding providers.
All implement the EmbeddingFn protocol (__call__(text: str) -> list[float]) and the
BatchEmbeddingFn protocol (adds .batch(texts) -> list[list[float]]).
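As a rough picture of what these two protocols mean in practice, the sketch below defines look-alike protocols with `typing.Protocol`. The `...Sketch` names are hypothetical stand-ins; the real definitions live in `cypher_validator.embeddings` and may differ in detail.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class EmbeddingFnSketch(Protocol):
    def __call__(self, text: str) -> list[float]: ...


@runtime_checkable
class BatchEmbeddingFnSketch(EmbeddingFnSketch, Protocol):
    def batch(self, texts: list[str]) -> list[list[float]]: ...


def plain_embed(text: str) -> list[float]:
    return [float(len(text))]


class BatchingEmbedder:
    def __call__(self, text: str) -> list[float]:
        return [float(len(text))]

    def batch(self, texts: list[str]) -> list[list[float]]:
        return [self(t) for t in texts]


# A plain function satisfies the single-text protocol only.
is_embedding_fn = isinstance(plain_embed, EmbeddingFnSketch)
is_batch_fn = isinstance(plain_embed, BatchEmbeddingFnSketch)
# An object with both __call__ and batch satisfies both.
adapter_is_batch_fn = isinstance(BatchingEmbedder(), BatchEmbeddingFnSketch)
```

Runtime protocol checks only confirm that the members exist. They do not verify argument types or the length of the returned vector.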
OpenAI
from cypher_validator.embeddings import OpenAIEmbeddings
embed = OpenAIEmbeddings(model="text-embedding-3-small")
# or with explicit API key:
embed = OpenAIEmbeddings(model="text-embedding-3-small", api_key="sk-...")
vector = embed("what is a knowledge graph?")
vectors = embed.batch(["text one", "text two"])
Requires: pip install openai
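Embedding APIs generally cap how much text a single request may carry, and this page does not say whether `.batch()` splits large inputs itself. A defensive sketch is to chunk the input yourself and call the batch function once per chunk. The helper names, the `fake_batch` stand-in, and the chunk size are all illustrative assumptions.

```python
from typing import Callable, Iterator


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_chunks(
    texts: list[str],
    batch_fn: Callable[[list[str]], list[list[float]]],
    size: int = 100,
) -> list[list[float]]:
    """Call batch_fn once per chunk and keep the vectors in input order."""
    vectors: list[list[float]] = []
    for chunk in chunked(texts, size):
        vectors.extend(batch_fn(chunk))
    return vectors


def fake_batch(texts: list[str]) -> list[list[float]]:
    # Stand-in for embed.batch, so the sketch runs without an API key.
    return [[float(len(t))] for t in texts]


vectors = embed_in_chunks(["a", "bb", "ccc"], fake_batch, size=2)
```

With a real adapter, the call would be `embed_in_chunks(texts, embed.batch, size=...)`.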
Sentence Transformers
from cypher_validator.embeddings import SentenceTransformerEmbeddings
embed = SentenceTransformerEmbeddings(model="all-MiniLM-L6-v2")
vector = embed("local embedding model")
Requires: pip install sentence-transformers
Cohere
from cypher_validator.embeddings import CohereEmbeddings
embed = CohereEmbeddings(model="embed-english-v3.0", api_key="...")
vector = embed("cohere embedding")
Requires: pip install cohere
Custom adapters
Any callable matching the protocol works:
from cypher_validator.embeddings import EmbeddingFn
def my_embed(text: str) -> list[float]:
    # Your embedding logic; the length must match the index dimensions (1536 for Document)
    return [0.0] * 1536
assert isinstance(my_embed, EmbeddingFn)  # runtime_checkable
session.semantic_search(Document, "embedding", "query", embedding_fn=my_embed)
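One practical use of a custom callable is a deterministic stand-in for tests, so a test suite needs neither network access nor an API key. The sketch below derives a unit-length vector from SHA-256 digests of the text. `hash_embed` is a hypothetical helper, and its vectors carry no semantic meaning: similar texts do not get similar vectors.

```python
import hashlib
import math


def hash_embed(text: str, dimensions: int) -> list[float]:
    """Deterministic pseudo-embedding: the same text always yields the same vector."""
    if dimensions <= 0:
        raise ValueError("dimensions must be positive")
    values: list[float] = []
    counter = 0
    while len(values) < dimensions:
        # Each digest contributes 32 bytes, mapped to floats in [-0.5, 0.5].
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        values.extend(byte / 255.0 - 0.5 for byte in digest)
        counter += 1
    values = values[:dimensions]
    # Normalize to unit length so cosine comparisons are well defined.
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]


first = hash_embed("graph databases", 8)
again = hash_embed("graph databases", 8)
other = hash_embed("vector indexes", 8)
```

To use it as an `embedding_fn`, fix the dimension with `functools.partial(hash_embed, dimensions=...)` so the callable takes only the text.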
CLI
The cypher vector-search command runs vector searches from the terminal:
# With a pre-computed vector (JSON array)
cypher vector-search --index idx_document_embedding_vector \
    --vector '[0.1, 0.2, ...]' --top-k 5
# With text + embedding provider
cypher vector-search --index idx_document_embedding_vector \
    --text "graph databases" --provider openai --top-k 5
Requires the NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD environment variables and, for --provider openai, OPENAI_API_KEY.
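When the command is driven from a script, it helps to check the environment and build the `--vector` argument programmatically. The sketch below uses only the flags and variable names shown above. The helper functions are hypothetical, and nothing is executed against a database.

```python
import json
import os
from typing import Mapping, Optional

REQUIRED_ENV = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_env(env: Optional[Mapping[str, str]] = None,
                provider: Optional[str] = None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    required = list(REQUIRED_ENV)
    if provider == "openai":
        required.append("OPENAI_API_KEY")
    return [name for name in required if not env.get(name)]


def vector_search_args(index: str, vector: list[float],
                       top_k: int = 5) -> list[str]:
    """Build the argument list for a pre-computed-vector search."""
    return [
        "cypher", "vector-search",
        "--index", index,
        "--vector", json.dumps(vector),  # JSON array, as the CLI expects
        "--top-k", str(top_k),
    ]


args = vector_search_args("idx_document_embedding_vector", [0.1, 0.2], top_k=5)
missing = missing_env({"NEO4J_URI": "bolt://localhost:7687"}, provider="openai")
```

Passing `args` to `subprocess.run(args, check=True)` as a list avoids shell quoting problems with the JSON array.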
End-to-end example
from cypher_validator import (
    NodeModel, VectorProperty, GraphSchema, SchemaDDL, GraphSession,
)
from cypher_validator.embeddings import OpenAIEmbeddings

# 1. Define model with vector index
class Article(NodeModel):
    __label__ = "Article"
    __vector_indexes__ = {
        "embedding": VectorProperty(dimensions=1536, similarity="cosine"),
    }
    title: str
    embedding: list[float] = []

# 2. Create schema + apply DDL (db is an existing database handle, created elsewhere)
schema = GraphSchema.from_models([Article])
session = GraphSession(db, schema)
session.apply_ddl()

# 3. Ingest with embeddings
embed = OpenAIEmbeddings()
for title in ["Graph RAG overview", "Neo4j fundamentals"]:
    vec = embed(title)
    session.create(Article(title=title, embedding=vec))

# 4. Search
results = session.semantic_search(
    Article, "embedding", "how do knowledge graphs work?",
    embedding_fn=embed, top_k=3
)
for r in results:
    print(f"{r['node'].title} (score: {r['score']:.3f})")
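The index in this example is declared with 1536 dimensions, so every stored vector and every query vector needs that length. A model such as all-MiniLM-L6-v2, which produces 384-dimensional vectors, would not fit. How the database or the ORM reacts to a mismatch is not covered on this page, so a small guard before ingestion is a cheap safeguard. `check_dimensions` is a hypothetical helper, not part of the library.

```python
def check_dimensions(vector: list[float], expected: int) -> list[float]:
    """Return the vector unchanged, or raise if its length is wrong."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, index expects {expected}"
        )
    return vector


ok = check_dimensions([0.0] * 1536, expected=1536)

try:
    check_dimensions([0.0] * 384, expected=1536)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```

In the ingestion loop above, this would wrap the embedding call: `vec = check_dimensions(embed(title), 1536)`.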