DDL & migrations

The ORM ships two helpers for keeping your live Neo4j schema in sync with your declared Pydantic models: SchemaDDL generates CREATE CONSTRAINT / CREATE INDEX statements from a GraphSchema, and SchemaDiff compares two schemas and emits the DDL needed to migrate between them.

SchemaDDL

SchemaDDL(schema: GraphSchema)

Construct from a GraphSchema. All methods read from the underlying schema.node_models / schema.rel_models — they never touch the database.

from cypher_validator import GraphSchema, SchemaDDL, NodeModel, RelationshipModel

class Person(NodeModel):
    __label__ = "Person"
    name: str
    age: int = 0
    email: str | None = None

class Movie(NodeModel):
    __label__ = "Movie"
    title: str
    year: int

class ActedIn(RelationshipModel):
    __source__ = Person
    __target__ = Movie
    __rel_type__ = "ACTED_IN"
    roles: list[str] = []

schema = GraphSchema.from_models([Person, Movie, ActedIn])
ddl = SchemaDDL(schema)

uniqueness_constraints() → list[str]

One CREATE CONSTRAINT … IS UNIQUE per required node property. The constraint name follows uniq_<label_lower>_<prop>:

ddl.uniqueness_constraints()
# [
#   "CREATE CONSTRAINT uniq_person_name IF NOT EXISTS FOR (n:Person) REQUIRE n.name IS UNIQUE",
#   "CREATE CONSTRAINT uniq_movie_title IF NOT EXISTS FOR (n:Movie) REQUIRE n.title IS UNIQUE",
#   "CREATE CONSTRAINT uniq_movie_year IF NOT EXISTS FOR (n:Movie) REQUIRE n.year IS UNIQUE",
# ]

Required = no default

A property counts as "required" when it has no Pydantic default. age: int = 0 is therefore treated as optional and gets no uniqueness constraint, even though its annotation (int) does not allow None.
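The "no default means required" rule can be mimicked in plain Python. The sketch below is not the library's implementation: required_props and uniqueness_statement are hypothetical helpers, and the stand-in classes are ordinary classes rather than NodeModel subclasses. It treats an annotated attribute with no class-level value as required and formats each statement with the uniq_<label_lower>_<prop> name pattern.

```python
from typing import Optional


class Person:
    # Plain stand-in for the Person model; not a NodeModel subclass.
    label = "Person"
    name: str
    age: int = 0
    email: Optional[str] = None


class Movie:
    # Plain stand-in for the Movie model.
    label = "Movie"
    title: str
    year: int


def required_props(cls) -> list:
    """Annotated attributes without a class-level default count as required."""
    return [name for name in cls.__annotations__ if not hasattr(cls, name)]


def uniqueness_statement(label: str, prop: str) -> str:
    """Format one uniqueness constraint using the uniq_<label_lower>_<prop> name."""
    name = f"uniq_{label.lower()}_{prop}"
    return (
        f"CREATE CONSTRAINT {name} IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"
    )


statements = [
    uniqueness_statement(cls.label, prop)
    for cls in (Person, Movie)
    for prop in required_props(cls)
]
```

Here age and email drop out because they carry defaults, which leaves one statement for Person and two for Movie.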

existence_constraints() → list[str]

IS NOT NULL constraints for required node properties and required relationship properties. These need Neo4j Enterprise Edition; Community Edition rejects them.

ddl.existence_constraints()
# [
#   "CREATE CONSTRAINT exists_person_name IF NOT EXISTS FOR (n:Person) REQUIRE n.name IS NOT NULL",
#   ...
#   "CREATE CONSTRAINT exists_acted_in_<prop> IF NOT EXISTS FOR ()-[r:ACTED_IN]-() REQUIRE r.<prop> IS NOT NULL",
# ]

property_indexes() → list[str]

A CREATE INDEX … ON (n.<prop>) for every declared node property, regardless of required-ness. Name pattern: idx_<label_lower>_<prop>.

ddl.property_indexes()
# [
#   "CREATE INDEX idx_person_name IF NOT EXISTS FOR (n:Person) ON (n.name)",
#   "CREATE INDEX idx_person_age IF NOT EXISTS FOR (n:Person) ON (n.age)",
#   "CREATE INDEX idx_person_email IF NOT EXISTS FOR (n:Person) ON (n.email)",
#   ...
# ]

composite_indexes(model, props) → str

One composite index across multiple props of a single label:

ddl.composite_indexes(Person, ["name", "age"])
# "CREATE INDEX idx_person_name_age IF NOT EXISTS FOR (n:Person) ON (n.name, n.age)"

Composite indexes aren't generated by generate_all() — emit them yourself when you know an access pattern uses both keys.
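To make the naming and column layout concrete, here is a minimal sketch of how such a statement could be assembled. composite_index_statement is a hypothetical helper written for illustration, not the library's code; it only reproduces the string format shown in the example above.

```python
def composite_index_statement(label: str, props: list) -> str:
    """Build a composite index statement; prop order sets the name and columns."""
    if len(props) < 2:
        raise ValueError("a composite index needs at least two properties")
    name = f"idx_{label.lower()}_{'_'.join(props)}"
    columns = ", ".join(f"n.{prop}" for prop in props)
    return f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON ({columns})"


stmt = composite_index_statement("Person", ["name", "age"])
```

Because the property order feeds into the index name, ["name", "age"] and ["age", "name"] produce two differently named indexes.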

fulltext_index(models, props, index_name) → str

Span a fulltext index across many labels and properties:

ddl.fulltext_index([Person, Movie], ["name", "title"], "search_all")
# 'CREATE FULLTEXT INDEX search_all IF NOT EXISTS FOR (n:Person|Movie) ON EACH [n.name, n.title]'

vector_indexes() → list[str]

Generate CREATE VECTOR INDEX for every property declared in __vector_indexes__ on any node model. Requires Neo4j 5.11+.

from cypher_validator import NodeModel, VectorProperty, GraphSchema, SchemaDDL

class Document(NodeModel):
    __label__ = "Document"
    __vector_indexes__ = {
        "embedding": VectorProperty(dimensions=1536, similarity="cosine"),
    }
    title: str
    embedding: list[float] = []

schema = GraphSchema.from_models([Document])
ddl = SchemaDDL(schema)

ddl.vector_indexes()
# [
#   "CREATE VECTOR INDEX idx_document_embedding_vector IF NOT EXISTS "
#   "FOR (n:Document) ON (n.embedding) "
#   "OPTIONS {indexConfig: {`vector.dimensions`: 1536, `vector.similarity_function`: 'cosine'}}"
# ]

Index name pattern: idx_<label_lower>_<prop>_vector.

generate_all() includes vector indexes automatically. drop_all() drops them too.
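The statement layout and the idx_<label_lower>_<prop>_vector name can be sketched as a small formatter. vector_index_statement and ALLOWED_SIMILARITY are hypothetical names, not part of the library; the check against "cosine" and "euclidean" reflects the similarity functions Neo4j accepts and is an assumption about what you would want to validate, not a description of what VectorProperty does.

```python
# Similarity functions accepted by Neo4j vector indexes (assumed validation set).
ALLOWED_SIMILARITY = {"cosine", "euclidean"}


def vector_index_statement(label: str, prop: str, dimensions: int,
                           similarity: str = "cosine") -> str:
    """Format a CREATE VECTOR INDEX statement for one node property."""
    if similarity not in ALLOWED_SIMILARITY:
        raise ValueError(f"unsupported similarity function: {similarity!r}")
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    name = f"idx_{label.lower()}_{prop}_vector"
    # Doubled braces are literal braces in an f-string.
    return (
        f"CREATE VECTOR INDEX {name} IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{prop}) "
        f"OPTIONS {{indexConfig: {{`vector.dimensions`: {dimensions}, "
        f"`vector.similarity_function`: '{similarity}'}}}}"
    )


stmt = vector_index_statement("Document", "embedding", 1536)
```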

custom_constraints() / custom_indexes()

Return whatever DDL strings you declared via the model's __constraints__ and __indexes__ lists — passed through verbatim. Use these for assertions the generator can't express (range constraints, point indexes, etc.):

from typing import Any

class Sensor(NodeModel):
    __label__ = "Sensor"
    __constraints__ = [
        "CREATE CONSTRAINT sensor_id_range IF NOT EXISTS FOR (s:Sensor) "
        "REQUIRE s.id > 0",
    ]
    __indexes__ = [
        "CREATE POINT INDEX sensor_location IF NOT EXISTS FOR (s:Sensor) ON (s.location)",
    ]
    id: int
    location: Any

generate_all(include_existence=False) → list[str]

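Combine uniqueness constraints, property and vector indexes, and custom DDL (plus existence constraints when include_existence=True) in a single list. This is the one-shot "apply everything" call: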

for stmt in ddl.generate_all(include_existence=True):
    db.execute(stmt)

Order: uniqueness_constraints → (optional) existence_constraints → property_indexes → vector_indexes → custom_constraints → custom_indexes.
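Since the statements arrive as an ordered list, a failure partway through leaves the earlier ones applied. The sketch below shows one way to track that. RecordingDB is a hypothetical stand-in that assumes nothing about the real handle beyond an execute(stmt) method, and apply_in_order is an illustrative helper, not a library function. The failure is simulated to resemble a Community Edition server rejecting an existence constraint.

```python
class RecordingDB:
    """Hypothetical stand-in for a database handle exposing execute(stmt)."""

    def __init__(self, fail_on=None):
        self.executed = []
        self.fail_on = fail_on  # substring that triggers a simulated rejection

    def execute(self, stmt):
        if self.fail_on is not None and self.fail_on in stmt:
            raise RuntimeError(f"rejected: {stmt}")
        self.executed.append(stmt)


def apply_in_order(db, statements):
    """Run statements one by one; return (applied, first_failed_or_None)."""
    applied = []
    for stmt in statements:
        try:
            db.execute(stmt)
        except Exception:
            return applied, stmt
        applied.append(stmt)
    return applied, None


statements = [
    "CREATE CONSTRAINT uniq_person_name IF NOT EXISTS FOR (n:Person) REQUIRE n.name IS UNIQUE",
    "CREATE CONSTRAINT exists_person_name IF NOT EXISTS FOR (n:Person) REQUIRE n.name IS NOT NULL",
    "CREATE INDEX idx_person_name IF NOT EXISTS FOR (n:Person) ON (n.name)",
]

db = RecordingDB(fail_on="IS NOT NULL")
applied, failed = apply_in_order(db, statements)
```

Stopping at the first failure keeps the log unambiguous: everything in applied ran, and nothing after failed was attempted.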

drop_all() → list[str]

DROP statements for every uniqueness constraint, property index, and vector index this generator would create. Use it for tear-down between integration tests; never run it in production without review:

for stmt in ddl.drop_all():
    db.execute(stmt)

drop_all is not a perfect inverse of generate_all

drop_all drops uniqueness constraints, property indexes, and vector indexes automatically derived from models. It does not drop existence constraints, composite indexes, fulltext indexes, or custom DDL — those don't follow the canonical name pattern.
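For the DDL that drop_all leaves behind, one option is to derive the DROP statement from the CREATE statement you issued yourself. The sketch below is a hypothetical helper, not part of the library. It assumes the CREATE statement names its constraint or index explicitly with an unquoted identifier, and it emits the Neo4j 5 DROP … IF EXISTS form.

```python
import re

# Matches "CREATE [FULLTEXT|VECTOR|POINT|...] (CONSTRAINT|INDEX) <name>".
_CREATE = re.compile(
    r"^\s*CREATE\s+(?:\w+\s+)*?(CONSTRAINT|INDEX)\s+(\w+)", re.IGNORECASE
)


def drop_statement(create_stmt: str) -> str:
    """Turn a named CREATE CONSTRAINT/INDEX statement into its DROP counterpart."""
    match = _CREATE.match(create_stmt)
    if match is None:
        raise ValueError(f"not a CREATE CONSTRAINT/INDEX statement: {create_stmt!r}")
    kind, name = match.group(1).upper(), match.group(2)
    if name.upper() in {"FOR", "IF"}:
        # The statement has no explicit name, so there is nothing to drop by name.
        raise ValueError("statement does not name its constraint or index")
    return f"DROP {kind} {name} IF EXISTS"


fulltext_drop = drop_statement(
    "CREATE FULLTEXT INDEX search_all IF NOT EXISTS "
    "FOR (n:Person|Movie) ON EACH [n.name, n.title]"
)
composite_drop = drop_statement(
    "CREATE INDEX idx_person_name_age IF NOT EXISTS FOR (n:Person) ON (n.name, n.age)"
)
```

Keeping the CREATE strings you applied (for example in a test fixture) gives you the input this helper needs at tear-down time.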

SchemaDiff

SchemaDiff(old: GraphSchema, new: GraphSchema)

Compare two GraphSchema instances and surface what changed. The diff is computed inside __init__ (via the private _compute method), so every field listed below is populated by the time the constructor returns.

v1 = GraphSchema.from_models([Person])
v2 = GraphSchema.from_models([Person, Movie, ActedIn])
diff = SchemaDiff(v1, v2)

Fields

| Field | Type | Meaning |
| --- | --- | --- |
| added_labels | set[str] | Node labels in new but not in old. |
| removed_labels | set[str] | Node labels in old but not in new. |
| unchanged_labels | set[str] | Labels present in both. |
| added_rel_types | set[str] | Relationship types only in new. |
| removed_rel_types | set[str] | Relationship types only in old. |
| unchanged_rel_types | set[str] | Relationship types in both. |
| added_properties | dict[str, list[str]] | {label: [props]} added to existing labels. |
| removed_properties | dict[str, list[str]] | {label: [props]} removed from existing labels. |
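The label and property fields amount to set arithmetic over the two schemas. The sketch below illustrates this with plain dictionaries mapping each label to its property names. diff_labels is a hypothetical function and the dictionary input is an assumption made for illustration; SchemaDiff itself works on GraphSchema objects.

```python
def diff_labels(old: dict, new: dict) -> dict:
    """Compare two {label: set_of_props} mappings using set operations."""
    old_labels, new_labels = set(old), set(new)
    unchanged = old_labels & new_labels

    added_props, removed_props = {}, {}
    for label in sorted(unchanged):
        gained = new[label] - old[label]
        lost = old[label] - new[label]
        if gained:
            added_props[label] = sorted(gained)
        if lost:
            removed_props[label] = sorted(lost)

    return {
        "added_labels": new_labels - old_labels,
        "removed_labels": old_labels - new_labels,
        "unchanged_labels": unchanged,
        "added_properties": added_props,
        "removed_properties": removed_props,
    }


old = {"Person": {"name"}}
new = {"Person": {"name", "email"}, "Movie": {"title"}}
result = diff_labels(old, new)
```

Property changes are only reported for labels present on both sides, which matches the "existing labels" wording in the table.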

Methods

| Method | Returns | Notes |
| --- | --- | --- |
| has_changes (property) | bool | True if any of the above are non-empty. |
| summary() | str | Human-readable Markdown diff. |
| migration_ddl() | list[str] | Cypher DDL to move from old → new. |
| to_dict() | dict | JSON-serialisable diff record. |

migration_ddl() emits:

  • CREATE CONSTRAINT + CREATE INDEX for every required prop of every new label.
  • CREATE INDEX for every property added to an existing label.
  • DROP CONSTRAINT for every required property, and DROP INDEX for every property, of a removed label.
  • DROP INDEX for every property removed from an existing label.

It does not drop or alter node data — only the index/constraint metadata. Run a separate cleanup query if you actually want to remove the nodes.
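A separate cleanup step for removed labels might look like the sketch below. cleanup_queries is a hypothetical helper, not a library function; it emits one standard Cypher MATCH … DETACH DELETE query per label and backtick-quotes the label so unusual names stay valid.

```python
def cleanup_queries(removed_labels) -> list:
    """Build one DETACH DELETE query per removed label, in a stable order."""
    queries = []
    for label in sorted(removed_labels):
        quoted = label.replace("`", "``")  # escape backticks inside the label
        queries.append(f"MATCH (n:`{quoted}`) DETACH DELETE n")
    return queries


queries = cleanup_queries({"Movie", "Actor"})
```

DETACH DELETE also removes the relationships attached to each deleted node, so review the generated queries before running them against real data.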

Typical migration workflow

# v1.py — what's in production today
class Person(NodeModel):
    __label__ = "Person"
    name: str

# v2.py — what you want next
class Person(NodeModel):
    __label__ = "Person"
    name: str
    email: str | None = None     # new property

class Movie(NodeModel):           # new label
    __label__ = "Movie"
    title: str

# migrate.py: compare the two versions
import v1
import v2

old = GraphSchema.from_models([v1.Person])
new = GraphSchema.from_models([v2.Person, v2.Movie])

diff = SchemaDiff(old, new)
print(diff.summary())
# ## Schema Diff
#
# ### Added Node Labels: Movie
#
# ### Added Properties:
#   - Person: email

if diff.has_changes:
    for stmt in diff.migration_ddl():
        db.execute(stmt)

Dry-run before applying

migration_ddl() is just a list of strings — diff your previous and upcoming version in CI, render summary(), and require a human to approve before invoking db.execute. The to_dict() method gives you a stable JSON shape suitable for storing as a migration log.
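One possible shape for that approval gate is sketched below. Both helpers are hypothetical, and the diff dictionary is a made-up placeholder: the real to_dict() layout is not specified here, so the sketch treats it as an opaque JSON-serialisable value.

```python
import json
from datetime import datetime, timezone


def build_migration_record(diff_dict, statements, approved_by=None):
    """Bundle the diff, the DDL, and the approval state into one JSON-ready dict."""
    return {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "approved_by": approved_by,
        "diff": diff_dict,
        "statements": list(statements),
    }


def apply_if_approved(record, execute):
    """Run the recorded statements only when someone has signed off."""
    if not record["approved_by"]:
        return []
    for stmt in record["statements"]:
        execute(stmt)
    return list(record["statements"])


# Placeholder inputs: in practice these would come from diff.to_dict() and
# diff.migration_ddl(); the dict shape here is invented for illustration.
diff_dict = {"added_labels": ["Movie"]}
statements = ["CREATE INDEX idx_movie_title IF NOT EXISTS FOR (n:Movie) ON (n.title)"]

pending = build_migration_record(diff_dict, statements)
ran_pending = apply_if_approved(pending, print)  # not approved, so nothing runs

approved = build_migration_record(diff_dict, statements, approved_by="reviewer")
executed = []
ran_approved = apply_if_approved(approved, executed.append)
log_line = json.dumps(approved, sort_keys=True)  # store this as the migration log
```

Passing the execute callable in, rather than a database handle, lets the same function serve both the CI dry run and the real apply step.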

Apply DDL through a session

GraphSession provides a small convenience for the common "apply all DDL" operation. It returns the list of statements it ran so you can log them:

from cypher_validator import GraphSession

with GraphSession(db, schema) as session:
    applied = session.apply_ddl(include_existence=False)
    for stmt in applied:
        print(f"OK: {stmt}")

See GraphSession.apply_ddl for the underlying flow — it wraps SchemaDDL.generate_all then loops db.execute.

  • Models: __constraints__ and __indexes__ class attributes feed custom_constraints / custom_indexes.
  • Sessions: execute DDL against a live database.
  • Caveats: schema declaration gotchas.