DDL & migrations
The ORM ships two helpers for keeping your live Neo4j schema in sync with your
declared Pydantic models: SchemaDDL generates CREATE CONSTRAINT / CREATE
INDEX statements from a GraphSchema, and SchemaDiff compares two schemas
and emits the DDL needed to migrate between them.
SchemaDDL
Construct from a GraphSchema. All methods read from the underlying
schema.node_models / schema.rel_models — they never touch the database.
from cypher_validator import GraphSchema, SchemaDDL, NodeModel, RelationshipModel

class Person(NodeModel):
    __label__ = "Person"
    name: str
    age: int = 0
    email: str | None = None

class Movie(NodeModel):
    __label__ = "Movie"
    title: str
    year: int

class ActedIn(RelationshipModel):
    __source__ = Person
    __target__ = Movie
    __rel_type__ = "ACTED_IN"
    roles: list[str] = []

schema = GraphSchema.from_models([Person, Movie, ActedIn])
ddl = SchemaDDL(schema)
uniqueness_constraints() → list[str]
One CREATE CONSTRAINT … IS UNIQUE per required node property. The
constraint name follows uniq_<label_lower>_<prop>:
ddl.uniqueness_constraints()
# [
# "CREATE CONSTRAINT uniq_person_name IF NOT EXISTS FOR (n:Person) REQUIRE n.name IS UNIQUE",
# "CREATE CONSTRAINT uniq_movie_title IF NOT EXISTS FOR (n:Movie) REQUIRE n.title IS UNIQUE",
# "CREATE CONSTRAINT uniq_movie_year IF NOT EXISTS FOR (n:Movie) REQUIRE n.year IS UNIQUE",
# ]
Required = no default
A property counts as "required" when it has no Pydantic default. age: int = 0
is treated as optional and is not uniqueness-constrained, even though its
annotation (int) does not allow None.
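The "no default" rule and the uniq_<label_lower>_<prop> naming can be illustrated without the library. The sketch below uses a plain Python class as a stand-in for a NodeModel subclass. required_props and uniqueness_ddl are hypothetical helpers, not part of cypher_validator, and the real generator may inspect Pydantic field metadata instead of class attributes.

```python
from typing import Optional


class Person:
    """Plain-class stand-in for the Person model (not a real NodeModel)."""
    __label__ = "Person"
    name: str
    age: int = 0
    email: Optional[str] = None


def required_props(model: type) -> list[str]:
    """Return annotated fields that have no class-level default."""
    return [name for name in model.__annotations__ if name not in vars(model)]


def uniqueness_ddl(model: type) -> list[str]:
    """Build one uniqueness constraint per required property (documented name pattern)."""
    label = model.__label__
    return [
        f"CREATE CONSTRAINT uniq_{label.lower()}_{prop} IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"
        for prop in required_props(model)
    ]


print(required_props(Person))  # ['name']
for stmt in uniqueness_ddl(Person):
    print(stmt)
```

Only name is reported: age and email both carry a default, so neither receives a constraint.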
existence_constraints() → list[str]
IS NOT NULL constraints for required node properties and required
relationship properties. Neo4j Enterprise only — community edition rejects
these.
ddl.existence_constraints()
# [
# "CREATE CONSTRAINT exists_person_name IF NOT EXISTS FOR (n:Person) REQUIRE n.name IS NOT NULL",
# ...
# "CREATE CONSTRAINT exists_acted_in_<prop> IF NOT EXISTS FOR ()-[r:ACTED_IN]-() REQUIRE r.<prop> IS NOT NULL",
# ]
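The node and relationship forms differ only in the pattern after FOR and in the variable name. As a rough illustration, the hypothetical helper below (not a cypher_validator function) formats both shapes following the exists_<name_lower>_<prop> pattern shown above. The since property is invented for the example, because ActedIn declares no required properties.

```python
def existence_constraint(name: str, prop: str, *, relationship: bool = False) -> str:
    """Format an IS NOT NULL constraint for a node label or a relationship type."""
    if relationship:
        pattern, var = f"()-[r:{name}]-()", "r"
    else:
        pattern, var = f"(n:{name})", "n"
    return (
        f"CREATE CONSTRAINT exists_{name.lower()}_{prop} IF NOT EXISTS "
        f"FOR {pattern} REQUIRE {var}.{prop} IS NOT NULL"
    )


print(existence_constraint("Person", "name"))
# "since" is a made-up relationship property, used only for illustration.
print(existence_constraint("ACTED_IN", "since", relationship=True))
```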
property_indexes() → list[str]
A CREATE INDEX … ON (n.<prop>) for every declared node property,
regardless of required-ness. Name pattern: idx_<label_lower>_<prop>.
ddl.property_indexes()
# [
# "CREATE INDEX idx_person_name IF NOT EXISTS FOR (n:Person) ON (n.name)",
# "CREATE INDEX idx_person_age IF NOT EXISTS FOR (n:Person) ON (n.age)",
# "CREATE INDEX idx_person_email IF NOT EXISTS FOR (n:Person) ON (n.email)",
# ...
# ]
composite_indexes(model, props) → str
One composite index across multiple props of a single label:
ddl.composite_indexes(Person, ["name", "age"])
# "CREATE INDEX idx_person_name_age IF NOT EXISTS FOR (n:Person) ON (n.name, n.age)"
Composite indexes aren't generated by generate_all() — emit them yourself
when you know an access pattern uses both keys.
fulltext_index(models, props, index_name) → str
Span a fulltext index across many labels and properties:
ddl.fulltext_index([Person, Movie], ["name", "title"], "search_all")
# 'CREATE FULLTEXT INDEX search_all IF NOT EXISTS FOR (n:Person|Movie) ON EACH [n.name, n.title]'
vector_indexes() → list[str]
Generate CREATE VECTOR INDEX for every property declared in __vector_indexes__ on
any node model. Requires Neo4j 5.11+.
from cypher_validator import NodeModel, VectorProperty, GraphSchema, SchemaDDL

class Document(NodeModel):
    __label__ = "Document"
    __vector_indexes__ = {
        "embedding": VectorProperty(dimensions=1536, similarity="cosine"),
    }
    title: str
    embedding: list[float] = []

schema = GraphSchema.from_models([Document])
ddl = SchemaDDL(schema)
ddl.vector_indexes()
# [
# "CREATE VECTOR INDEX idx_document_embedding_vector IF NOT EXISTS "
# "FOR (n:Document) ON (n.embedding) "
# "OPTIONS {indexConfig: {`vector.dimensions`: 1536, `vector.similarity_function`: 'cosine'}}"
# ]
Index name pattern: idx_<label_lower>_<prop>_vector.
generate_all() includes vector indexes automatically. drop_all() drops them too.
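To show how the pieces of the statement fit together, here is a minimal sketch that assembles the same string from a label, a property, and the two VectorProperty settings. vector_index_ddl is a hypothetical helper, and its validation is an assumption of this example rather than documented library behaviour. The two accepted similarity values are the ones Neo4j's vector indexes support.

```python
def vector_index_ddl(label: str, prop: str, dimensions: int, similarity: str = "cosine") -> str:
    """Format a CREATE VECTOR INDEX statement using the documented name pattern."""
    if similarity not in {"cosine", "euclidean"}:
        raise ValueError(f"unsupported similarity function: {similarity!r}")
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    # Doubled braces are literal braces inside an f-string.
    return (
        f"CREATE VECTOR INDEX idx_{label.lower()}_{prop}_vector IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{prop}) "
        f"OPTIONS {{indexConfig: {{`vector.dimensions`: {dimensions}, "
        f"`vector.similarity_function`: '{similarity}'}}}}"
    )


print(vector_index_ddl("Document", "embedding", 1536))
```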
custom_constraints() / custom_indexes()
Return whatever DDL strings you declared via the model's __constraints__ and
__indexes__ lists, passed through verbatim. Use these for DDL the
generator can't express (property type constraints, point indexes, etc.):
from typing import Any

from cypher_validator import NodeModel

class Sensor(NodeModel):
    __label__ = "Sensor"
    __constraints__ = [
        # Property type constraint; check that your Neo4j version and edition support it.
        "CREATE CONSTRAINT sensor_id_type IF NOT EXISTS FOR (s:Sensor) "
        "REQUIRE s.id IS :: INTEGER",
    ]
    __indexes__ = [
        "CREATE POINT INDEX sensor_location IF NOT EXISTS FOR (s:Sensor) ON (s.location)",
    ]
    id: int
    location: Any
generate_all(include_existence=False) → list[str]
Combines uniqueness constraints, indexes, and custom DDL into a single list: the one-shot "apply everything" call.
Order: uniqueness_constraints → (optional) existence_constraints →
property_indexes → vector_indexes → custom_constraints → custom_indexes.
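The ordering rule amounts to concatenating six lists, with one of them optional. The sketch below mimics that with a hypothetical combine function over a plain dict of per-section statements. It is not the library's implementation, and the one-letter strings stand in for real DDL.

```python
# Section order documented for generate_all().
ORDER = (
    "uniqueness_constraints",
    "existence_constraints",
    "property_indexes",
    "vector_indexes",
    "custom_constraints",
    "custom_indexes",
)


def combine(sections: dict[str, list[str]], include_existence: bool = False) -> list[str]:
    """Concatenate per-section DDL lists in the documented order."""
    statements: list[str] = []
    for key in ORDER:
        if key == "existence_constraints" and not include_existence:
            continue  # left out unless the caller opts in
        statements.extend(sections.get(key, []))
    return statements


# Placeholder strings stand in for real DDL statements.
sections = {
    "custom_indexes": ["X"],
    "property_indexes": ["I1", "I2"],
    "existence_constraints": ["E"],
    "uniqueness_constraints": ["U"],
}
print(combine(sections))                          # ['U', 'I1', 'I2', 'X']
print(combine(sections, include_existence=True))  # ['U', 'E', 'I1', 'I2', 'X']
```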
drop_all() → list[str]
DROP statements for every uniqueness constraint, property index, and vector index this generator would create. Use it for tear-down between integration tests, never in production without review.
drop_all is not a perfect inverse of generate_all
drop_all drops uniqueness constraints, property indexes, and vector
indexes automatically derived from models. It does not drop existence
constraints, composite indexes, fulltext indexes, or custom DDL — those
don't follow the canonical name pattern.
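For the statements drop_all leaves behind, one option is to derive the matching DROP from each CREATE string you emitted yourself. The helper below is a hypothetical sketch, not part of cypher_validator. It assumes every statement names its constraint or index explicitly and that the name is a plain identifier without backticks.

```python
import re

# Matches "CREATE [<kind>] CONSTRAINT|INDEX <name>", e.g. "CREATE FULLTEXT INDEX search_all".
_CREATE = re.compile(
    r"^\s*CREATE\s+(?:\w+\s+)?(?P<object>CONSTRAINT|INDEX)\s+(?P<name>\w+)",
    re.IGNORECASE,
)


def drop_statement(create_stmt: str) -> str:
    """Derive a DROP ... IF EXISTS statement from a named CREATE statement."""
    match = _CREATE.match(create_stmt)
    # "IF" or "FOR" in the name position means the statement was unnamed.
    if match is None or match["name"].upper() in {"IF", "FOR"}:
        raise ValueError(f"cannot find a constraint/index name in: {create_stmt!r}")
    return f"DROP {match['object'].upper()} {match['name']} IF EXISTS"


extras = [
    "CREATE INDEX idx_person_name_age IF NOT EXISTS FOR (n:Person) ON (n.name, n.age)",
    "CREATE FULLTEXT INDEX search_all IF NOT EXISTS FOR (n:Person|Movie) ON EACH [n.name, n.title]",
]
for stmt in extras:
    print(drop_statement(stmt))
```

Keeping the list of extra CREATE statements in one place, next to the models, makes the tear-down list easy to regenerate.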
SchemaDiff
Compare two GraphSchema instances and surface what changed. The diff is
computed in __init__ (by the private _compute method), so all the fields
listed below are populated by the time the constructor returns.
from cypher_validator import GraphSchema, SchemaDiff

v1 = GraphSchema.from_models([Person])
v2 = GraphSchema.from_models([Person, Movie, ActedIn])
diff = SchemaDiff(v1, v2)
Fields
| Field | Type | Meaning |
|---|---|---|
| added_labels | set[str] | Node labels in new but not in old. |
| removed_labels | set[str] | Node labels in old but not in new. |
| unchanged_labels | set[str] | Labels present in both. |
| added_rel_types | set[str] | Relationship types only in new. |
| removed_rel_types | set[str] | Relationship types only in old. |
| unchanged_rel_types | set[str] | Relationship types in both. |
| added_properties | dict[str, list[str]] | {label: [props]} added to existing labels. |
| removed_properties | dict[str, list[str]] | {label: [props]} removed from existing labels. |
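The label and property fields are plain set arithmetic over the two schemas. As a sketch of that idea only, the hypothetical diff_nodes function below works on a simplified {label: {property names}} mapping rather than on real GraphSchema objects. Relationship types would follow the same pattern.

```python
# Simplified stand-in for a schema: {label: {property names}}.
Schema = dict[str, set[str]]


def diff_nodes(old: Schema, new: Schema) -> dict:
    """Compute label and property differences between two simplified schemas."""
    unchanged = set(old) & set(new)
    return {
        "added_labels": set(new) - set(old),
        "removed_labels": set(old) - set(new),
        "unchanged_labels": unchanged,
        # Property changes are only reported for labels present in both schemas.
        "added_properties": {
            label: sorted(new[label] - old[label])
            for label in sorted(unchanged)
            if new[label] - old[label]
        },
        "removed_properties": {
            label: sorted(old[label] - new[label])
            for label in sorted(unchanged)
            if old[label] - new[label]
        },
    }


old = {"Person": {"name"}}
new = {"Person": {"name", "email"}, "Movie": {"title"}}
result = diff_nodes(old, new)
print(result["added_labels"])      # {'Movie'}
print(result["added_properties"])  # {'Person': ['email']}
```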
Methods
| Method | Returns | Notes |
|---|---|---|
| has_changes (property) | bool | True if any of the above are non-empty. |
| summary() | str | Human-readable Markdown diff. |
| migration_ddl() | list[str] | Cypher DDL to move from old → new. |
| to_dict() | dict | JSON-serialisable diff record. |
migration_ddl() emits:
- CREATE CONSTRAINT + CREATE INDEX for every required prop of every new label.
- CREATE INDEX for every property added to an existing label.
- DROP CONSTRAINT + DROP INDEX for every required prop / property of a removed label.
- DROP INDEX for every property removed from an existing label.
It does not drop or alter node data — only the index/constraint metadata. Run a separate cleanup query if you actually want to remove the nodes.
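If you do want the data gone as well, the cleanup can be driven from the set of removed labels. The helper below is a hypothetical sketch (cleanup_queries is not a library function) that produces one DETACH DELETE query per label and rejects anything that is not a plain identifier, since labels cannot be passed as query parameters.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def cleanup_queries(removed_labels: set[str]) -> list[str]:
    """Return one DETACH DELETE query per removed label, in sorted order."""
    queries = []
    for label in sorted(removed_labels):
        # Labels are interpolated into the query text, so validate them first.
        if not _IDENTIFIER.fullmatch(label):
            raise ValueError(f"unsafe label: {label!r}")
        queries.append(f"MATCH (n:{label}) DETACH DELETE n")
    return queries


# In practice the input would be diff.removed_labels.
for query in cleanup_queries({"Movie", "Actor"}):
    print(query)
```

On large graphs a single DETACH DELETE can be too big for one transaction, so consider deleting in batches.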
Typical migration workflow
# v1.py: what's in production today
from cypher_validator import NodeModel

class Person(NodeModel):
    __label__ = "Person"
    name: str

# v2.py: what you want next
from cypher_validator import NodeModel

class Person(NodeModel):
    __label__ = "Person"
    name: str
    email: str | None = None  # new property

class Movie(NodeModel):  # new label
    __label__ = "Movie"
    title: str

# migration script: any module that can import v1 and v2
import v1
import v2
from cypher_validator import GraphSchema, SchemaDiff

old = GraphSchema.from_models([v1.Person])
new = GraphSchema.from_models([v2.Person, v2.Movie])
diff = SchemaDiff(old, new)
print(diff.summary())
# ## Schema Diff
#
# ### Added Node Labels: Movie
#
# ### Added Properties:
# - Person: email
if diff.has_changes:
    for stmt in diff.migration_ddl():
        db.execute(stmt)  # db is your existing database handle
Dry-run before applying
migration_ddl() is just a list of strings — diff your previous and
upcoming version in CI, render summary(), and require a human to
approve before invoking db.execute. The to_dict() method gives you
a stable JSON shape suitable for storing as a migration log.
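One way to tie the human approval to the exact plan is to store a checksum next to the statements. The sketch below is a hypothetical helper, not part of cypher_validator. It builds such a log record from the list returned by migration_ddl() and a dict shaped like to_dict() output. The dict contents shown here are an assumed shape, used only for illustration.

```python
import hashlib
import json


def migration_record(statements: list[str], diff: dict) -> str:
    """Serialise a migration plan plus a checksum of its statements as JSON."""
    checksum = hashlib.sha256("\n".join(statements).encode("utf-8")).hexdigest()
    record = {"checksum": checksum, "statements": statements, "diff": diff}
    return json.dumps(record, indent=2, sort_keys=True)


statements = [
    "CREATE INDEX idx_person_email IF NOT EXISTS FOR (n:Person) ON (n.email)",
]
# Assumed shape of diff.to_dict(); the real keys may differ.
diff_dict = {"added_properties": {"Person": ["email"]}}

print(migration_record(statements, diff_dict))
```

A reviewer approves a specific checksum, and the deploy step can refuse to run if the regenerated plan hashes to a different value.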
Apply DDL through a session
GraphSession provides a small convenience for the common "apply all DDL"
operation. It returns the list of statements it ran so you can log them:
from cypher_validator import GraphSession

with GraphSession(db, schema) as session:
    applied = session.apply_ddl(include_existence=False)
    for stmt in applied:
        print(f"OK: {stmt}")
See GraphSession.apply_ddl for the underlying flow —
it wraps SchemaDDL.generate_all then loops db.execute.
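The "generate, then loop db.execute" flow is easy to exercise without a running Neo4j instance. The sketch below is an assumption-laden stand-in: RecordingDB is a hypothetical test double exposing only an execute(stmt) method, and apply_statements imitates the loop rather than reproducing the library's code.

```python
class RecordingDB:
    """Test double that records statements instead of sending them to Neo4j."""

    def __init__(self, fail_on=None):
        self.executed = []
        self.fail_on = fail_on  # statement that should simulate a rejection

    def execute(self, stmt):
        if stmt == self.fail_on:
            raise RuntimeError(f"rejected: {stmt}")
        self.executed.append(stmt)


def apply_statements(db, statements):
    """Run each statement in order and return the ones that were applied."""
    applied = []
    for stmt in statements:
        db.execute(stmt)  # an exception here stops the loop
        applied.append(stmt)
    return applied


db = RecordingDB()
applied = apply_statements(db, ["STATEMENT A", "STATEMENT B"])
for stmt in applied:
    print(f"OK: {stmt}")
```

Because the generated statements shown on this page all use IF NOT EXISTS, re-running the full list after a partial failure should skip whatever was already created.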