Neo4j Knowledge Graphs for AI Applications
Integrating Neo4j knowledge graphs with AI has become an important pattern for building applications that need to reason over complex relationships in data. While relational databases store data in tables and rows, graph databases like Neo4j store data as nodes and relationships, which makes it natural to model and query interconnected information such as social networks, supply chains, fraud patterns, and organizational knowledge.
This guide covers building production knowledge graphs with Neo4j, from data modeling and Cypher queries to integrating knowledge graphs with large language models for retrieval-augmented generation (GraphRAG). You will also learn patterns for recommendation engines, fraud detection, and impact analysis that play to the strengths of graph databases.
Why Graph Databases for AI
Traditional RAG pipelines retrieve text chunks based on semantic similarity: they find documents that are about similar topics. Knowledge graphs add a structured relationship layer that captures how entities connect. When you ask “What products did customers who returned item X also buy?”, a graph database traverses relationships directly rather than searching through text. Knowledge graphs also make results explainable, because you can trace exactly which relationships led to a recommendation.
Combining knowledge graphs with LLMs is known as GraphRAG: the LLM uses graph-structured context instead of (or in addition to) vector-retrieved text chunks. This can produce more accurate, contextual, and relationship-aware responses.
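As a rough illustration of the "in addition to" case, the sketch below merges graph facts and vector-retrieved chunks into one context string for an LLM prompt. The function name `build_hybrid_context`, the section headers, and the sample chunk are assumptions made for illustration; the two graph facts mirror relationships from the data model later in this guide.

```python
def build_hybrid_context(graph_facts: list[str], text_chunks: list[str]) -> str:
    """Put graph facts first, then any vector-retrieved chunks, under labeled headers."""
    sections = []
    if graph_facts:
        sections.append("Graph facts:\n" + "\n".join(f"- {fact}" for fact in graph_facts))
    if text_chunks:
        sections.append("Text chunks:\n" + "\n".join(f"- {chunk.strip()}" for chunk in text_chunks))
    return "\n\n".join(sections)


# Facts taken from the example graph; the chunk is a placeholder, not real data.
graph_facts = [
    "order-service DEPENDS_ON payment-service (sync)",
    "payment-service SUBSCRIBES_TO Apache Kafka (topic: orders)",
]
text_chunks = ["  Placeholder passage returned by a vector search.  "]

context = build_hybrid_context(graph_facts, text_chunks)
print(context)
```

Keeping the two sources under separate headers lets the model (and anyone debugging the prompt) tell which statements came from explicit relationships and which came from similarity search.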
Neo4j Knowledge Graphs: Data Modeling
// Create a knowledge graph for a technology company
// Nodes represent entities, relationships capture connections
// Create Technology nodes
CREATE (java:Technology {name: 'Java', category: 'Language', version: '21'})
CREATE (spring:Technology {name: 'Spring Boot', category: 'Framework', version: '3.3'})
CREATE (postgres:Technology {name: 'PostgreSQL', category: 'Database', version: '16'})
CREATE (kafka:Technology {name: 'Apache Kafka', category: 'Messaging', version: '3.7'})
CREATE (k8s:Technology {name: 'Kubernetes', category: 'Orchestration', version: '1.29'})
// Create Team nodes
CREATE (platform:Team {name: 'Platform', headcount: 8})
CREATE (payments:Team {name: 'Payments', headcount: 6})
CREATE (search:Team {name: 'Search', headcount: 5})
// Create Service nodes
CREATE (orderSvc:Service {name: 'order-service', tier: 'critical', repo: 'github.com/myorg/order-service'})
CREATE (paySvc:Service {name: 'payment-service', tier: 'critical', repo: 'github.com/myorg/payment-service'})
CREATE (searchSvc:Service {name: 'search-service', tier: 'important', repo: 'github.com/myorg/search-service'})
CREATE (notifSvc:Service {name: 'notification-service', tier: 'standard', repo: 'github.com/myorg/notification-service'})
// Create relationships
CREATE (orderSvc)-[:USES]->(java)
CREATE (orderSvc)-[:USES]->(spring)
CREATE (orderSvc)-[:USES]->(postgres)
CREATE (orderSvc)-[:PUBLISHES_TO]->(kafka)
CREATE (orderSvc)-[:RUNS_ON]->(k8s)
CREATE (orderSvc)-[:OWNED_BY]->(payments)
CREATE (paySvc)-[:USES]->(java)
CREATE (paySvc)-[:USES]->(spring)
CREATE (paySvc)-[:SUBSCRIBES_TO {topic: 'orders'}]->(kafka)
CREATE (paySvc)-[:OWNED_BY]->(payments)
CREATE (orderSvc)-[:DEPENDS_ON {type: 'sync'}]->(paySvc)
CREATE (orderSvc)-[:DEPENDS_ON {type: 'async'}]->(notifSvc)
CREATE (searchSvc)-[:DEPENDS_ON {type: 'sync'}]->(orderSvc)
Querying the Knowledge Graph
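To build intuition for the variable-length pattern `[:DEPENDS_ON*1..5]` used in the first query of this section, here is a plain-Python sketch of the same idea: a breadth-first walk over the `DEPENDS_ON` edges from the model, bounded by a maximum depth. The dictionary layout and the name `dependency_chains` are assumptions for illustration. Neo4j's own matching forbids reusing a relationship within a path, while this sketch uses the slightly stricter rule of never revisiting a node.

```python
from collections import deque

# DEPENDS_ON edges copied from the example model: service -> direct dependencies.
DEPENDS_ON = {
    "order-service": ["payment-service", "notification-service"],
    "search-service": ["order-service"],
}


def dependency_chains(start: str, max_depth: int = 5) -> list[list[str]]:
    """Return every dependency path from `start`, up to `max_depth` hops, shortest first."""
    chains = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) - 1 == max_depth:
            continue  # this path already uses the maximum number of hops
        for dep in DEPENDS_ON.get(path[-1], []):
            if dep in path:
                continue  # avoid looping forever on circular dependencies
            new_path = path + [dep]
            chains.append(new_path)
            queue.append(new_path)
    return chains


for chain in dependency_chains("search-service"):
    print(len(chain) - 1, " -> ".join(chain))
```

Each printed line corresponds to one row of the Cypher result: the depth, then the chain of service names along the path.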
// Find all dependencies of a service (direct and transitive)
MATCH path = (s:Service {name: 'order-service'})-[:DEPENDS_ON*1..5]->(dep:Service)
RETURN s.name AS service,
       [n IN nodes(path) | n.name] AS dependency_chain,
       length(path) AS depth
ORDER BY depth;
// Impact analysis: What is affected if Kafka goes down?
MATCH (kafka:Technology {name: 'Apache Kafka'})
      <-[:PUBLISHES_TO|SUBSCRIBES_TO]-(svc:Service)
      -[:OWNED_BY]->(team:Team)
RETURN svc.name AS affected_service,
       svc.tier AS service_tier,
       team.name AS owning_team
ORDER BY CASE svc.tier
           WHEN 'critical' THEN 1
           WHEN 'important' THEN 2
           ELSE 3
         END;
// Find teams that share technology dependencies
MATCH (t1:Team)<-[:OWNED_BY]-(s1:Service)-[:USES]->(tech:Technology)
      <-[:USES]-(s2:Service)-[:OWNED_BY]->(t2:Team)
WHERE t1 <> t2
RETURN t1.name AS team1, t2.name AS team2,
       collect(DISTINCT tech.name) AS shared_technologies,
       count(DISTINCT tech) AS shared_count
ORDER BY shared_count DESC;
GraphRAG: Integrating Neo4j with LLMs
GraphRAG combines knowledge graph traversal with LLM reasoning. Instead of retrieving flat text chunks, you retrieve structured subgraphs that preserve relationship context.
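One way to picture "a structured subgraph as context" is to render each relationship as a single line that keeps both endpoints, their labels, and any relationship properties. The sketch below does that for a small hand-built subgraph. The input shape (dictionaries with `id`, `label`, `name`, `start`, `end`, `props`) and the name `format_subgraph` are assumptions chosen for illustration, not the format the Neo4j driver returns.

```python
def format_subgraph(nodes: list[dict], relationships: list[dict]) -> str:
    """Render one '(a:Label)-[:REL {props}]->(b:Label)' line per relationship."""
    by_id = {node["id"]: node for node in nodes}

    def render(node_id: int) -> str:
        node = by_id[node_id]
        return f"({node['name']}:{node['label']})"

    lines = []
    for rel in relationships:
        props = rel.get("props", {})
        prop_text = ""
        if props:
            inner = ", ".join(f"{key}: {value!r}" for key, value in sorted(props.items()))
            prop_text = f" {{{inner}}}"
        lines.append(
            f"{render(rel['start'])}-[:{rel['type']}{prop_text}]->{render(rel['end'])}"
        )
    return "\n".join(lines)


# A hand-built slice of the example graph (ids are arbitrary).
nodes = [
    {"id": 1, "label": "Service", "name": "order-service"},
    {"id": 2, "label": "Service", "name": "payment-service"},
    {"id": 3, "label": "Team", "name": "Payments"},
]
relationships = [
    {"start": 1, "type": "DEPENDS_ON", "end": 2, "props": {"type": "sync"}},
    {"start": 2, "type": "OWNED_BY", "end": 3},
]

print(format_subgraph(nodes, relationships))
```

Unlike a flat text chunk, every line states who is connected to whom and how, which is the relationship context the LLM would otherwise have to infer.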
import json

from neo4j import GraphDatabase
from openai import OpenAI


class GraphRAGEngine:
    """Knowledge graph-powered RAG for contextual AI responses."""

    def __init__(self, neo4j_uri: str, neo4j_auth: tuple, openai_key: str):
        self.driver = GraphDatabase.driver(neo4j_uri, auth=neo4j_auth)
        self.llm = OpenAI(api_key=openai_key)

    def retrieve_context(self, query: str, max_hops: int = 2) -> str:
        """Extract relevant subgraph based on entities in the query."""
        # Step 1: Extract entities from query using LLM
        entities = self._extract_entities(query)
        # Step 2: Retrieve subgraph around those entities
        with self.driver.session() as session:
            subgraph = session.execute_read(
                self._get_subgraph, entities, max_hops
            )
        # Step 3: Format subgraph as context
        return self._format_context(subgraph)

    def answer(self, query: str) -> str:
        """Answer a question using knowledge graph context."""
        context = self.retrieve_context(query)
        system_prompt = (
            "You are a helpful assistant with access to a knowledge graph. "
            "Use the following graph context to answer questions accurately. "
            "If the graph doesn't contain enough information, say so.\n\n"
            f"Knowledge Graph Context:\n{context}"
        )
        response = self.llm.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": query},
            ],
            temperature=0.3,
        )
        return response.choices[0].message.content

    @staticmethod
    def _get_subgraph(tx, entities: list[str], max_hops: int):
        # Requires the APOC plugin to be installed on the Neo4j server.
        result = tx.run("""
            UNWIND $entities AS entityName
            MATCH (n)
            WHERE n.name = entityName
            CALL apoc.path.subgraphAll(n, {
                maxLevel: $maxHops,
                limit: 50
            })
            YIELD nodes, relationships
            RETURN nodes, relationships
        """, entities=entities, maxHops=max_hops)
        return [record.data() for record in result]

    @staticmethod
    def _format_context(subgraph: list[dict]) -> str:
        # Minimal stand-in formatter (an assumption): serialize the records as
        # JSON; default=str covers values that are not JSON-serializable.
        return json.dumps(subgraph, indent=2, default=str)

    def _extract_entities(self, query: str) -> list[str]:
        # JSON mode returns an object, so the prompt asks for an "entities" key
        # rather than a bare array.
        response = self.llm.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": (
                    f"Extract entity names from: '{query}'. "
                    'Return a JSON object of the form {"entities": ["name", ...]}.'
                ),
            }],
            response_format={"type": "json_object"},
        )
        return json.loads(response.choices[0].message.content).get("entities", [])
Recommendation Engine with Neo4j
Graph databases excel at recommendation systems because recommendations are fundamentally about finding connections: “users who liked X also liked Y” is a graph traversal pattern.
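The "users who bought X also bought Y" pattern can be sketched in plain Python before looking at it as a graph query. This is only an illustration of the logic: the purchase data, user names, and the function name `recommend` are made up, and a real graph database performs the same hops over stored relationships rather than scanning a dictionary.

```python
from collections import defaultdict

# Hypothetical purchase data: user -> set of purchased products.
PURCHASES = {
    "alice": {"laptop", "mouse"},
    "bob": {"laptop", "mouse", "keyboard"},
    "carol": {"laptop", "monitor", "keyboard"},
    "dave": {"phone"},
}


def recommend(user: str, limit: int = 10) -> list[tuple[str, int, list[str]]]:
    """Score products the user lacks by how many overlapping users bought them."""
    owned = PURCHASES[user]
    backers = defaultdict(set)  # product -> similar users who bought it
    for other, items in PURCHASES.items():
        if other == user or not (owned & items):
            continue  # skip the user themselves and users with no shared purchase
        for product in items - owned:
            backers[product].add(other)
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(backers.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(product, len(users), sorted(users)[:3]) for product, users in ranked[:limit]]


for product, score, recommended_by in recommend("alice"):
    print(product, score, recommended_by)
```

The third element of each result plays the role of an explanation: it names the similar users whose purchases produced the recommendation.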
// Collaborative filtering: Users who bought similar items
MATCH (u:User {id: $userId})-[:PURCHASED]->(item:Product)
      <-[:PURCHASED]-(similar:User)-[:PURCHASED]->(rec:Product)
WHERE NOT (u)-[:PURCHASED]->(rec)
  AND u <> similar
WITH rec, count(DISTINCT similar) AS score,
     collect(DISTINCT similar.name)[..3] AS similar_users
ORDER BY score DESC
LIMIT 10
RETURN rec.name AS recommendation,
       score AS confidence,
       similar_users AS recommended_by;
When NOT to Use Graph Databases
Graph databases are not general-purpose replacements for relational databases. If your data is tabular with well-defined schemas and your queries involve simple JOINs, PostgreSQL will typically outperform Neo4j in throughput and operational simplicity. Graph databases are also not ideal for heavy analytical aggregations (sum, average, group by); use a columnar database like ClickHouse for those workloads.
Neo4j’s licensing model (Enterprise features like clustering and role-based access require a commercial license) may be prohibitive for smaller teams. Open-source alternatives like Apache AGE (a PostgreSQL extension) or NebulaGraph may be more appropriate. Evaluate the total cost of ownership before committing to Neo4j for production workloads.
Key Takeaways
Neo4j knowledge graphs give AI applications relationship-aware intelligence that traditional databases and vector search do not provide on their own. GraphRAG combines structured graph context with LLM reasoning for more accurate and explainable AI responses. Graph databases also naturally model the interconnected data patterns found in recommendations, fraud detection, and impact analysis.
Start by modeling a subset of your domain as a knowledge graph and experiment with Cypher queries to discover hidden relationships. For comprehensive documentation, see the Neo4j documentation and the Neo4j Graph Data Science library. Our posts on vector databases for AI and RAG architecture patterns provide complementary data storage approaches for AI applications.