Models

NodeModel and RelationshipModel are the two Pydantic base classes that define your graph schema. They auto-register on subclass creation, so GraphSchema.from_registry() discovers every model in your project without you having to enumerate them.

Both classes inherit from pydantic.BaseModel with populate_by_name=True and extra="allow".

NodeModel

from cypher_validator import NodeModel

class Person(NodeModel):
    __label__ = "Person"
    __description__ = "A human in the graph"
    __constraints__ = []
    __indexes__ = []

    name: str
    age: int = 0
    email: str | None = None

Class-level attributes

| Attribute | Type | Default | Purpose |
| --- | --- | --- | --- |
| `__label__` | `str` | class name | Primary Cypher label. |
| `__labels__` | `list[str]` | `[]` | Multi-label support, e.g. `["Person", "Employee"]`. |
| `__description__` | `str` | `""` | Human / LLM description. Surfaces in `to_schema_description()`. |
| `__constraints__` | `list[str]` | `[]` | Custom Cypher DDL, picked up by `SchemaDDL.custom_constraints`. |
| `__indexes__` | `list[str]` | `[]` | Custom Cypher DDL, picked up by `SchemaDDL.custom_indexes`. |
| `__vector_indexes__` | `dict[str, VectorProperty]` | `{}` | Vector index declarations, picked up by `SchemaDDL.vector_indexes`. See Vector search. |

Class methods

| Method | Returns | Notes |
| --- | --- | --- |
| `label()` | `str` | Primary label (first of `__labels__` if set). |
| `labels()` | `list[str]` | All labels for multi-label nodes. |
| `labels_cypher()` | `str` | Formatted for Cypher: `":Person:Employee"`. |
| `property_names()` | `list[str]` | Order of declaration. |
| `property_types()` | `dict[str, str]` | `{"age": "int", "name": "str", …}` |
| `required_properties()` | `list[str]` | Fields without a default. |
| `optional_properties()` | `list[str]` | Fields with a default. |
| `from_record(record, key=None)` | `NodeModel` | Hydrate from a Neo4j record dict. Handles raw dicts and `neo4j.graph.Node`. |
| `from_records(records, key=None)` | `list[NodeModel]` | Vectorised hydration. |
| `match_cypher(var="n", where=None)` | `(str, dict)` | `MATCH (n:Label) [WHERE …] RETURN n` |
| `to_schema_description()` | `str` | LLM-readable description block. |

Instance methods

| Method | Returns | Notes |
| --- | --- | --- |
| `to_property_map()` | `dict[str, Any]` | Drops properties whose value is `None`. Suitable for Cypher `$params`. |
| `to_create_cypher(var="n")` | `(str, dict)` | `CREATE (n:Label {props}) RETURN n`. |
| `to_merge_cypher(var="n", merge_keys=None)` | `(str, dict)` | `MERGE (n:Label {keys}) ON CREATE/MATCH SET … RETURN n`. |

alice = Person(name="Alice", age=30)

cypher, params = alice.to_create_cypher("p")
# CREATE (p:Person {name: $p_name, age: $p_age}) RETURN p
# params = {"p_name": "Alice", "p_age": 30}

cypher, params = alice.to_merge_cypher(merge_keys=["name"])
# MERGE (n:Person {name: $n_name})
#   ON CREATE SET n.age = $n_age
#   ON MATCH SET n.age = $n_age
# RETURN n

Merge-key validation

to_merge_cypher() raises ValueError if any entry in merge_keys is not a declared property of the model. This guards against silent typos such as merge_keys=["nme"].
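The check can be sketched as a standalone function. This is an illustration only, not the library's implementation: `build_merge_cypher` and its arguments are hypothetical names, the `props` dict stands in for the model's declared fields, and the parameter-prefix convention is inferred from the example output above.

```python
def build_merge_cypher(label, props, merge_keys, var="n"):
    """Hypothetical helper: build a MERGE statement, rejecting unknown merge keys."""
    # Here `props` stands in for the model's declared properties.
    unknown = [key for key in merge_keys if key not in props]
    if unknown:
        raise ValueError(f"merge_keys not declared on {label}: {unknown}")

    # Every parameter is prefixed with the Cypher variable, e.g. $n_name.
    params = {f"{var}_{key}": value for key, value in props.items()}
    key_map = ", ".join(f"{key}: ${var}_{key}" for key in merge_keys)
    lines = [f"MERGE ({var}:{label} {{{key_map}}})"]

    # Non-key properties are written on both create and match.
    rest = [key for key in props if key not in merge_keys]
    if rest:
        assignments = ", ".join(f"{var}.{key} = ${var}_{key}" for key in rest)
        lines.append(f"  ON CREATE SET {assignments}")
        lines.append(f"  ON MATCH SET {assignments}")
    lines.append(f"RETURN {var}")
    return "\n".join(lines), params


cypher, params = build_merge_cypher("Person", {"name": "Alice", "age": 30}, ["name"])
print(cypher)

try:
    build_merge_cypher("Person", {"name": "Alice", "age": 30}, ["nme"])
except ValueError as exc:
    print(exc)  # the typo is reported instead of producing a bad MERGE
```

Validating before any string is built means a misspelled key can never reach the database as an unintended MERGE pattern.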

Auto-registry

When a NodeModel subclass is defined, its metaclass adds it to the global _NODE_REGISTRY keyed by label(). This powers:

  • GraphSchema.from_registry(): discover all models without listing them.
  • Dynamic schemas built via the node() / relationship() factories.

class Movie(NodeModel):
    __label__ = "Movie"
    title: str

# Movie is now in _NODE_REGISTRY["Movie"].

from cypher_validator import GraphSchema
schema = GraphSchema.from_registry()
assert any(m.label() == "Movie" for m in schema.node_models)

VectorProperty

Declare vector indexes on node properties for similarity search (Neo4j 5.11+):

from cypher_validator import NodeModel, VectorProperty

class Document(NodeModel):
    __label__ = "Document"
    __vector_indexes__ = {
        "embedding": VectorProperty(dimensions=1536, similarity="cosine"),
    }
    title: str
    embedding: list[float] = []

| Parameter | Type | Default | Notes |
| --- | --- | --- | --- |
| `dimensions` | `int` | required | Vector dimensionality (e.g. 1536 for OpenAI text-embedding-3-small). |
| `similarity` | `str` | `"cosine"` | `"cosine"` or `"euclidean"`. |

SchemaDDL.vector_indexes() reads these declarations to generate CREATE VECTOR INDEX statements. SchemaDDL.generate_all() includes them automatically, and drop_all() drops them. See DDL and Vector search for the full workflow.
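
To give a sense of what such a statement looks like, the sketch below renders one declaration using Neo4j's `CREATE VECTOR INDEX` syntax. The function `vector_index_ddl` and the index naming scheme are assumptions made for illustration; the exact text SchemaDDL emits may differ.

```python
def vector_index_ddl(label, prop, dimensions, similarity="cosine"):
    """Hypothetical helper: render one vector index declaration as Cypher DDL."""
    if similarity not in ("cosine", "euclidean"):
        raise ValueError(f"unsupported similarity function: {similarity!r}")

    # Assumed naming scheme; the library may name indexes differently.
    name = f"{label.lower()}_{prop}_vector"
    return (
        f"CREATE VECTOR INDEX {name} IF NOT EXISTS\n"
        f"FOR (n:{label}) ON (n.{prop})\n"
        "OPTIONS {indexConfig: {"
        f"`vector.dimensions`: {dimensions}, "
        f"`vector.similarity_function`: '{similarity}'"
        "}}"
    )


print(vector_index_ddl("Document", "embedding", 1536))
```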

RelationshipModel

from cypher_validator import RelationshipModel

class ActedIn(RelationshipModel):
    __source__ = Person
    __target__ = Movie
    __rel_type__ = "ACTED_IN"
    __description__ = "Person performed in Movie"

    roles: list[str] = []
    year: int | None = None

Class-level attributes

| Attribute | Type | Default | Purpose |
| --- | --- | --- | --- |
| `__source__` | `Type[NodeModel]` | required | Source node class. |
| `__target__` | `Type[NodeModel]` | required | Target node class. |
| `__rel_type__` | `str` | `_to_upper_snake(class_name)` | Cypher rel-type string. |
| `__description__` | `str` | `""` | Human / LLM description. |
| `__constraints__` | `list[str]` | `[]` | Custom DDL. |

If you omit __rel_type__, the class name is converted from CamelCase to UPPER_SNAKE_CASE automatically (e.g. ActedIn → ACTED_IN).
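
The conversion can be approximated with two regular-expression passes. This is a sketch under assumptions: `to_upper_snake` is a hypothetical stand-in for the library's private helper, and how the real helper treats acronyms or digits is not documented here.

```python
import re


def to_upper_snake(name: str) -> str:
    """Hypothetical CamelCase -> UPPER_SNAKE_CASE conversion."""
    # Break between a lowercase letter or digit and a following capital:
    # "ActedIn" -> "Acted_In".
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Break between an acronym and a following capitalised word:
    # "HTTPRequest" -> "HTTP_Request".
    snake = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", snake)
    return snake.upper()


for class_name in ("ActedIn", "Knows", "TaggedWith"):
    print(class_name, "->", to_upper_snake(class_name))
```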

Class methods

| Method | Returns | Notes |
| --- | --- | --- |
| `rel_type()` | `str` | Computed Cypher rel-type. |
| `source_label()` | `str` | `__source__.label()`. |
| `target_label()` | `str` | `__target__.label()`. |
| `property_names()` | `list[str]` | |
| `property_types()` | `dict[str, str]` | |
| `required_properties()` | `list[str]` | |
| `from_record(record, key=None)` | `RelationshipModel` | Hydrate from a Neo4j record. |
| `to_schema_description()` | `str` | Block describing pattern + properties. |

Instance methods

to_property_map() → dict behaves the same as NodeModel.to_property_map().

to_create_cypher(src_var="a", tgt_var="b", rel_var="r", src_match=None, tgt_match=None) generates a full MATCH (src), (tgt) CREATE (src)-[r:R {props}]->(tgt) RETURN r statement:

rel = ActedIn(roles=["Trinity"])
cypher, params = rel.to_create_cypher(
    src_match={"name": "Carrie-Anne Moss"},
    tgt_match={"title": "The Matrix"},
)
# MATCH (a:Person), (b:Movie)
#   WHERE a.name = $a_name AND b.title = $b_title
# CREATE (a)-[r:ACTED_IN {roles: $r_roles}]->(b)
# RETURN r

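src_match / tgt_match are property predicates used to locate the endpoints. Their keys become WHERE filters, bound through parameters prefixed with the corresponding variable name ($a_* and $b_* with the default src_var and tgt_var, as in the example above).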
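
How the predicates and parameters line up can be sketched in a few lines of plain Python. `build_rel_create` is a hypothetical function whose behaviour is inferred from the example output above; it is not the library's code.

```python
def build_rel_create(rel_type, src_label, tgt_label, props,
                     src_match=None, tgt_match=None,
                     src_var="a", tgt_var="b", rel_var="r"):
    """Hypothetical helper: MATCH both endpoints, then CREATE the relationship."""
    params = {}
    predicates = []
    # Each match key becomes one equality predicate with a prefixed parameter.
    for var, match in ((src_var, src_match or {}), (tgt_var, tgt_match or {})):
        for key, value in match.items():
            params[f"{var}_{key}"] = value
            predicates.append(f"{var}.{key} = ${var}_{key}")

    # Relationship properties are bound under the relationship variable.
    params.update({f"{rel_var}_{key}": value for key, value in props.items()})
    prop_map = ", ".join(f"{key}: ${rel_var}_{key}" for key in props)
    rel = f"[{rel_var}:{rel_type} {{{prop_map}}}]" if props else f"[{rel_var}:{rel_type}]"

    lines = [f"MATCH ({src_var}:{src_label}), ({tgt_var}:{tgt_label})"]
    if predicates:
        lines.append("  WHERE " + " AND ".join(predicates))
    lines.append(f"CREATE ({src_var})-{rel}->({tgt_var})")
    lines.append(f"RETURN {rel_var}")
    return "\n".join(lines), params


cypher, params = build_rel_create(
    "ACTED_IN", "Person", "Movie", {"roles": ["Trinity"]},
    src_match={"name": "Carrie-Anne Moss"},
    tgt_match={"title": "The Matrix"},
)
print(cypher)
print(params)
```

Prefixing each parameter with its variable keeps the source, target, and relationship namespaces separate, so a property called `name` on both endpoints cannot collide.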

GraphSchema

GraphSchema is the bridge between Pydantic models and the Rust validator. It collects node/relationship models and offers conversions:

from cypher_validator import GraphSchema

# Option 1: explicit list
schema = GraphSchema.from_models([Person, Movie, ActedIn])

# Option 2: discover everything declared so far
schema = GraphSchema.from_registry()

# Option 3: introspect a running Neo4j and synthesise models
from cypher_validator import Neo4jDatabase
db = Neo4jDatabase("bolt://localhost:7687", "neo4j", "password")
schema = GraphSchema.from_neo4j_db(db, sample_limit=1000)

Methods

| Method | Returns | Use case |
| --- | --- | --- |
| `to_dict()` | `dict` | Matches `Schema.from_dict()` shape. |
| `to_json()` | `str` | Pretty-printed JSON. |
| `to_cypher_schema()` | `cypher_validator.Schema` | Feed into `CypherValidator`. |
| `to_prompt()` | `str` | LLM-readable prose schema block. |
| `to_markdown()` | `str` | Markdown table format. |
| `merge(other)` | `GraphSchema` | Union of two schemas (no duplicate registration). |
| `get_constraints()` | `list[str]` | All `__constraints__` from every model. |
| `get_indexes()` | `list[str]` | All `__indexes__` from every node model. |
| `from_dict(d)` | `GraphSchema` | Reverse of `to_dict()`. Dynamically creates models. |
| `from_neo4j_db(db, sample_limit=1000)` | `GraphSchema` | Introspect from a live DB. |

schema.to_dict()
# {
#   "nodes": {"Person": ["name", "age"], "Movie": ["title", "year"]},
#   "relationships": {"ACTED_IN": ("Person", "Movie", ["roles", "year"])},
# }
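
Because the dict is plain data, it can be consumed without the library. The sketch below walks the shape shown above and renders each relationship as a Cypher-style pattern; `describe_patterns` is a hypothetical helper, and the dict literal simply repeats the illustrative output.

```python
schema_dict = {
    "nodes": {"Person": ["name", "age"], "Movie": ["title", "year"]},
    "relationships": {"ACTED_IN": ("Person", "Movie", ["roles", "year"])},
}


def describe_patterns(d):
    """Hypothetical helper: render each relationship as a readable pattern."""
    patterns = []
    for rel_type, (src, tgt, props) in sorted(d["relationships"].items()):
        # Both endpoints must refer to a label declared under "nodes".
        for label in (src, tgt):
            if label not in d["nodes"]:
                raise KeyError(f"{rel_type} refers to unknown label {label!r}")
        prop_list = ", ".join(props)
        patterns.append(f"(:{src})-[:{rel_type} {{{prop_list}}}]->(:{tgt})")
    return patterns


print(describe_patterns(schema_dict))
```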

Dynamic model factories

When you don't know the schema at type-checking time (e.g. an agent connecting to an unfamiliar database), node() and relationship() build models on the fly:

from cypher_validator.models import node, relationship

Tag = node("Tag", name=(str, ...), count=(int, 0))
# Required `name`, optional `count` with default 0.

TaggedWith = relationship("TAGGED_WITH", Person, Tag, weight=(float, 1.0))

Field definitions follow Pydantic's (type, default) tuple convention: passing ... (Ellipsis) as the default marks the field as required, as with name above. A bare type also means required with no default.
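
The same convention can be imitated with the standard library, which may help when reasoning about what the factories do. The sketch below uses `dataclasses.make_dataclass` in place of Pydantic, so it performs no validation; `make_node` is a hypothetical name and not part of cypher_validator.

```python
from dataclasses import field, make_dataclass


def make_node(label, **field_defs):
    """Hypothetical factory: build a class from (type, default) definitions."""
    specs = []
    for name, definition in field_defs.items():
        # A bare type is shorthand for (type, ...), i.e. a required field.
        typ, default = definition if isinstance(definition, tuple) else (definition, ...)
        if default is ...:
            specs.append((name, typ))
        else:
            specs.append((name, typ, field(default=default)))
    # Dataclasses need required fields first; the stable sort keeps
    # declaration order within each group.
    specs.sort(key=len)
    return make_dataclass(label, specs)


Tag = make_node("Tag", name=(str, ...), count=(int, 0))
tag = Tag(name="python")
print(tag)  # Tag(name='python', count=0)
```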

GraphSchema.from_dict() uses these factories internally:

schema = GraphSchema.from_dict({
    "nodes": {"Person": ["name", "age"]},
    "relationships": {"KNOWS": ["Person", "Person", ["since"]]},
})
# Person and Knows classes are now in the registry.