Tool specs & helpers¶
Standalone helpers from python/cypher_validator/llm_utils.py for working
with LLM output and constructing tool specs. These functions are reused by
LLMNLToCypher, GraphRAGPipeline, and the
AgentTools family, but you can use them directly
when building your own pipeline.
cypher_tool_spec¶
cypher_tool_spec(
    schema: Schema | None = None,
    db_description: str = "",
    format: str = "anthropic",
) -> dict[str, Any]
Builds a tool specification for the LLM. The shape depends on format.
With format="anthropic" (the default):
{
    "name": "execute_cypher",
    "description": "Execute a Cypher query against the Neo4j graph database...",
    "input_schema": {
        "type": "object",
        "properties": {
            "cypher": {"type": "string", "description": "..."},
            "parameters": {"type": "object", "additionalProperties": True},
        },
        "required": ["cypher"],
    },
}
With format="openai":
{
    "type": "function",
    "function": {
        "name": "execute_cypher",
        "description": "Execute a Cypher query against the Neo4j graph database...",
        "parameters": {
            "type": "object",
            "properties": {
                "cypher": {"type": "string"},
                "parameters": {"type": "object", "additionalProperties": True},
            },
            "required": ["cypher"],
        },
    },
}
When schema is supplied, the inline Cypher-pattern representation
(schema.to_cypher_context()) is appended to the description so the LLM
knows which labels and types are available. db_description is a short
free-text hint (e.g. "knowledge graph of scientific papers") that gets
slotted into the description in parentheses.
from cypher_validator import cypher_tool_spec, Schema

schema = Schema(
    nodes={"Person": ["name", "age"]},
    relationships={"WORKS_FOR": ("Person", "Company", [])},
)
tool = cypher_tool_spec(schema, db_description="employees graph", format="openai")
Anthropic is the default
Unlike AgentTools.query_tool_spec, whose format defaults to "openai", this
standalone helper defaults to format="anthropic". Pass format="openai"
explicitly when you want the function-calling shape.
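The two shapes shown above differ mainly in their envelope. As a rough illustration, the sketch below re-wraps an Anthropic-shaped dict into the OpenAI function-calling envelope. The helper name anthropic_to_openai is hypothetical and not part of cypher_validator, and the description strings are placeholders. The library builds each shape itself, and its OpenAI shape omits the per-property description, so treat this as a picture of the structural mapping only.

```python
from typing import Any


def anthropic_to_openai(spec: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: wrap an Anthropic-shaped tool spec in the
    OpenAI function-calling envelope. Not part of cypher_validator."""
    return {
        "type": "function",
        "function": {
            "name": spec["name"],
            "description": spec["description"],
            # Anthropic calls the JSON Schema "input_schema"; OpenAI calls it "parameters".
            "parameters": spec["input_schema"],
        },
    }


# Anthropic-shaped spec mirroring the structure documented above
# (description text is a placeholder).
anthropic_spec = {
    "name": "execute_cypher",
    "description": "Execute a Cypher query against the graph database.",
    "input_schema": {
        "type": "object",
        "properties": {
            "cypher": {"type": "string"},
            "parameters": {"type": "object", "additionalProperties": True},
        },
        "required": ["cypher"],
    },
}

openai_spec = anthropic_to_openai(anthropic_spec)
```

In practice, passing format="openai" to cypher_tool_spec is the simpler route; the mapping is mostly useful when a spec has already been built and stored in one shape.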
extract_cypher_from_text¶
Lifts a Cypher query out of arbitrary LLM output. It walks five fallback tiers in order:
| Tier | Pattern | Notes |
|---|---|---|
| 1 | ```cypher\n…``` / ```sql / ```sparql | Strictest: a tagged fence wins immediately. |
| 2 | ```<any>\n…``` | Untagged fence, but only if the body contains a Cypher keyword. |
| 3 | `…` (inline backtick) | Only if the span looks like Cypher. |
| 4 | Line-anchored MATCH / CREATE / MERGE / WITH / CALL / UNWIND / OPTIONAL | Collects consecutive non-blank lines starting at the first match. |
| 5 | Fallback | Returns text.strip(). |
The Cypher-keyword check uses a frozenset lookup against the upper-cased
text — fast even on multi-kilobyte responses.
from cypher_validator import extract_cypher_from_text
output = """
Sure! Here's the query:
```cypher
MATCH (p:Person {name: $name})-[:WORKS_FOR]->(c:Company)
RETURN p, c
```
"""
extract_cypher_from_text(output)
# 'MATCH (p:Person {name: $name})-[:WORKS_FOR]->(c:Company)\nRETURN p, c'
The regex patterns (_RE_FENCED_TAGGED, _RE_FENCED_ANY, _RE_BACKTICK,
_RE_CYPHER_LINE) are compiled once at module load, so calling the extractor
in a hot ingestion loop adds no per-call compilation cost.
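To make the tier order concrete, here is a minimal sketch of a five-tier extractor written from the table above. It is an illustration under stated assumptions, not the library's implementation: the pattern definitions, the keyword set, the case-sensitive line match in tier 4, and the names extract_sketch and looks_like_cypher are all made up for this example.

```python
import re

FENCE = "`" * 3  # three backticks, built indirectly to keep this listing readable

# Assumed keyword set; the library's frozenset may contain other entries.
CYPHER_KEYWORDS = frozenset(
    {"MATCH", "CREATE", "MERGE", "WITH", "CALL", "UNWIND", "OPTIONAL", "RETURN"}
)

# Patterns are compiled once at import time, as the real module is described to do.
RE_FENCED_TAGGED = re.compile(
    re.escape(FENCE) + r"(?:cypher|sql|sparql)[ \t]*\n(.*?)" + re.escape(FENCE),
    re.DOTALL | re.IGNORECASE,
)
RE_FENCED_ANY = re.compile(
    re.escape(FENCE) + r"[^\n`]*\n(.*?)" + re.escape(FENCE), re.DOTALL
)
RE_BACKTICK = re.compile(r"`([^`\n]+)`")
RE_CYPHER_LINE = re.compile(
    r"^[ \t]*(?:MATCH|CREATE|MERGE|WITH|CALL|UNWIND|OPTIONAL)\b", re.MULTILINE
)


def looks_like_cypher(text: str) -> bool:
    """True if any word of the upper-cased text is in the keyword set."""
    words = re.findall(r"[A-Z]+", text.upper())
    return not CYPHER_KEYWORDS.isdisjoint(words)


def extract_sketch(text: str) -> str:
    # Tier 1: a tagged fence wins immediately.
    m = RE_FENCED_TAGGED.search(text)
    if m:
        return m.group(1).strip()
    # Tier 2: any fence whose body contains a Cypher keyword.
    for m in RE_FENCED_ANY.finditer(text):
        if looks_like_cypher(m.group(1)):
            return m.group(1).strip()
    # Tier 3: an inline backtick span that looks like Cypher.
    for m in RE_BACKTICK.finditer(text):
        if looks_like_cypher(m.group(1)):
            return m.group(1).strip()
    # Tier 4: consecutive non-blank lines from the first keyword-led line.
    m = RE_CYPHER_LINE.search(text)
    if m:
        lines = []
        for line in text[m.start():].splitlines():
            if not line.strip():
                break
            lines.append(line.strip())
        return "\n".join(lines)
    # Tier 5: give back the stripped input unchanged.
    return text.strip()
```

Ordering the tiers from strictest to loosest means explanatory prose around a fenced query never leaks into the result, while unfenced replies still yield something usable.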
repair_cypher¶
repair_cypher(
    validator: CypherValidator,
    query: str,
    llm_fn: Callable[[str, list[str]], str],
    max_retries: int = 3,
) -> tuple[str, ValidationResult]
The standalone repair loop. The signature of llm_fn here is different
from LLMNLToCypher's callable — it receives both the current query and
the list of error strings, and must return a corrected query string.
The loop:
- Validate the current query.
- If valid → return.
- If result.fixed_query is not None, apply that auto-fix first and re-validate. The auto-fix is purely local (typo correction, dropping unknown variables, etc.), so it doesn't burn an LLM call.
- Otherwise call llm_fn(query, errors) for a manual fix.
- Loop up to max_retries times.
from cypher_validator import CypherValidator, repair_cypher, Schema

schema = Schema(nodes={"Person": ["name"]}, relationships={})
validator = CypherValidator(schema)

# my_llm and db are placeholders for your own LLM client and database handle.
def fix_with_llm(query: str, errors: list[str]) -> str:
    error_block = "\n".join(f" - {e}" for e in errors)
    return my_llm(f"Fix this Cypher:\n{query}\n\nErrors:\n{error_block}")

cypher, result = repair_cypher(validator, "MATCH (p:Persn) RETURN p", fix_with_llm)
if result.is_valid:
    db.execute(cypher)
Auto-fix may converge without ever calling the LLM
result.fixed_query is computed by the Rust validator using a
Levenshtein-capped "did you mean?" lookup against the schema. For
simple label/property typos the loop converges on attempt #1 with zero
LLM cost.
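The control flow of the loop can be sketched with a stub validator, which shows why a simple typo never reaches the LLM. Everything here is an assumption for illustration: StubValidator, StubResult, and repair_sketch are hypothetical names, the validate method and errors attribute are assumed, and whether the real loop counts an auto-fix pass against max_retries is not stated in this page (the sketch counts it).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class StubResult:
    """Stand-in for ValidationResult with only the fields the loop reads."""
    is_valid: bool
    errors: list = field(default_factory=list)
    fixed_query: Optional[str] = None


class StubValidator:
    """Hypothetical validator that knows a single label, Person."""

    def validate(self, query: str) -> StubResult:
        if ":Person)" in query:
            return StubResult(True)
        if ":Persn)" in query:
            # A near-miss label gets a local "did you mean?" fix.
            fixed = query.replace(":Persn)", ":Person)")
            return StubResult(False, ["Unknown label 'Persn'"], fixed)
        return StubResult(False, ["Unknown label"])


def repair_sketch(
    validator: StubValidator,
    query: str,
    llm_fn: Callable[[str, list], str],
    max_retries: int = 3,
):
    result = validator.validate(query)
    for _ in range(max_retries):
        if result.is_valid:
            break
        if result.fixed_query is not None:
            query = result.fixed_query  # local auto-fix, no LLM call
        else:
            query = llm_fn(query, result.errors)  # manual fix via the LLM
        result = validator.validate(query)
    return query, result


llm_calls = []


def fake_llm(query: str, errors: list) -> str:
    llm_calls.append(query)
    return "MATCH (p:Person) RETURN p"


fixed, outcome = repair_sketch(StubValidator(), "MATCH (p:Persn) RETURN p", fake_llm)
```

Here the typo is repaired by the validator's own suggestion, so fake_llm is never invoked; a query with an unrecognizable label would fall through to the LLM branch instead.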
format_records¶
Render Neo4j result records as an LLM-context-friendly string.
| Format | Output |
|---|---|
| "markdown" (default) | Aligned pipe-separated table. |
| "csv" | RFC 4180-ish CSV via csv.DictWriter. |
| "json" | Pretty-printed JSON via json.dumps(records, indent=2, default=str). |
| "text" | One record per block: Record N:\n key: value. |
Returns "" for an empty list. Unknown formats raise ValueError:
format_records(rows, format="yaml")
# ValueError: Unknown format 'yaml'. Use 'markdown', 'csv', 'json', or 'text'.
records = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
format_records(records)
# | name  | age |
# |-------|-----|
# | Alice | 30  |
# | Bob   | 25  |
format_records(records, format="csv")
# name,age
# Alice,30
# Bob,25
The markdown table aligns columns to the longest cell — fine for tens of
rows, but if you've got thousands consider format="json" or slicing
records[:50] before formatting (the LLM context window is the bottleneck,
not the function).
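As a rough picture of what aligning to the longest cell involves, and of capping rows before formatting, the sketch below pads each column to its widest value and truncates the input first. The function name markdown_table and the max_rows parameter are hypothetical; format_records itself does not take a row cap according to this page.

```python
def markdown_table(records, max_rows=50):
    """Hypothetical sketch: aligned pipe table over at most max_rows records."""
    shown = records[:max_rows]  # cap rows before any formatting work
    if not shown:
        return ""
    cols = list(shown[0])  # column order follows the first record's keys
    rows = [[str(rec.get(col, "")) for col in cols] for rec in shown]
    # Each column is as wide as its longest cell, header included.
    widths = [
        max(len(col), max(len(row[i]) for row in rows))
        for i, col in enumerate(cols)
    ]

    def fmt(cells):
        padded = (cell.ljust(width) for cell, width in zip(cells, widths))
        return "| " + " | ".join(padded) + " |"

    separator = "|" + "|".join("-" * (width + 2) for width in widths) + "|"
    return "\n".join([fmt(cols), separator, *map(fmt, rows)])


records = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
table = markdown_table(records)
```

Truncating before measuring column widths keeps the cost proportional to the rows actually shown, which matters more for the prompt budget than for runtime.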
few_shot_examples¶
few_shot_examples(
    generator: CypherGenerator,
    n: int = 5,
    query_type: str | None = None,
) -> list[tuple[str, str]]
Generates (description, cypher) pairs from a CypherGenerator. The
descriptions are derived from the query type plus the labels and rel types
the parser detects — so they're guaranteed consistent with the actual Cypher
output:
from cypher_validator import CypherGenerator, Schema, few_shot_examples

schema = Schema(
    nodes={"Person": ["name", "age"], "Company": ["name"]},
    relationships={"WORKS_FOR": ("Person", "Company", [])},
)
gen = CypherGenerator(schema, seed=0)
for desc, cypher in few_shot_examples(gen, n=4):
    print(f"Q: {desc}")
    print(f"A: {cypher}\n")
When query_type=None (default), examples are spread across all
generator.supported_types() cyclically. When set to a specific type, n
examples of that type are generated. Unknown types raise ValueError with
the supported list:
few_shot_examples(gen, n=2, query_type="bogus")
# ValueError: Unknown query_type 'bogus'. Supported: ['match_return', ...]
Stable few-shot prompts
Pair CypherGenerator(schema, seed=N) with a fixed N and the same
schema and you'll get identical few-shot prompts on every run — useful
when you want to A/B test prompt variants without LLM noise from the
examples drifting.
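Two small pieces of this workflow are easy to show in isolation: spreading n examples cyclically over the supported types, and rendering the resulting pairs into a prompt block. The sketch below assumes nothing about the generator itself. The helper names spread_types and render_few_shot are hypothetical, and apart from match_return the type names and example pairs are invented for illustration.

```python
from itertools import cycle, islice


def spread_types(supported, n):
    """Pick n query types by cycling through the supported list in order."""
    return list(islice(cycle(supported), n))


def render_few_shot(pairs):
    """Render (description, cypher) pairs as a Q/A block for a prompt."""
    return "\n\n".join(f"Q: {desc}\nA: {cypher}" for desc, cypher in pairs)


# Hypothetical type names; only "match_return" appears in the docs above.
types = spread_types(["match_return", "create", "merge"], 5)

# Invented pairs standing in for few_shot_examples output.
prompt_block = render_few_shot(
    [
        ("Find all Person nodes", "MATCH (p:Person) RETURN p"),
        ("Find who works for a company", "MATCH (p:Person)-[:WORKS_FOR]->(c:Company) RETURN p, c"),
    ]
)
```

Because the cycle is deterministic, the same supported list and n always produce the same type sequence, which complements the fixed seed when comparing prompt variants.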
Putting it together¶
A minimal "ask the LLM, validate, retry, format" loop using only the helpers from this module:
from cypher_validator import (
    Schema, CypherValidator, Neo4jDatabase,
    cypher_tool_spec, extract_cypher_from_text,
    repair_cypher, format_records,
)

schema = Schema(...)
validator = CypherValidator(schema)
db = Neo4jDatabase(...)

# 1. Get cypher from the LLM
# (my_llm and question are placeholders for your own client and user input)
raw = my_llm(question)
cypher = extract_cypher_from_text(raw)

# 2. Validate + auto-repair
def fix_with_llm(q, errs):
    return extract_cypher_from_text(my_llm(f"fix: {q}\nerrors: {errs}"))

cypher, result = repair_cypher(validator, cypher, fix_with_llm, max_retries=3)

# 3. Execute + format
if result.is_valid:
    rows = db.execute(cypher)
    print(format_records(rows))
That's effectively what GraphRAGPipeline.query_with_context
does, with an answer-synthesis step bolted on top.
Related¶
- LLM pipeline: uses every helper on this page.
- Graph RAG: answer synthesis from query results.
- Agent tools: schema-aware versions of cypher_tool_spec plus dispatch.
- Validator: the ValidationResult.fixed_query contract that repair_cypher leans on.