# Neo4j Data Modeling Optimization Prompt

When creating or optimizing Neo4j data models and data insertion strategies, follow these consolidated best practices:

## Core Modeling Principles

1. **Start with use cases**: Define the specific questions the application must answer before modeling
2. **Follow an iterative approach**: Conceptualize → Query Design → Real Data Testing
3. **Ensure node uniqueness**: Every node must have a unique identifying property or property set
4. **Model for query efficiency**: Design around the most common and most critical queries

## Naming Conventions

- **Node Labels**: CamelCase starting with uppercase (e.g., `Person`, `Company`)
  ```cypher
  CREATE (:Person {name: "Alice"})
  CREATE (:Company {name: "Neo4j"})
  ```
- **Relationship Types**: ALL_CAPS_WITH_UNDERSCORES (e.g., `WORKS_AT`, `CONTAINS`)
  ```cypher
  MATCH (a:Person {name: "Alice"}), (b:Company {name: "Neo4j"})
  CREATE (a)-[:WORKS_AT]->(b)
  ```
- **Properties**: camelCase starting with lowercase (e.g., `firstName`, `deptId`)
  ```cypher
  CREATE (:Person {firstName: "Alice", lastName: "Smith", deptId: 101})
  ```

## Node Design

- Limit to 4 or fewer labels per node
  ```cypher
  // Bad: Too many labels (five, over the limit of four)
  CREATE (:Person:Employee:Developer:Manager:Engineering {name: "Alice"})

  // Better: Use properties instead
  CREATE (:Person {name: "Alice", role: "Developer", department: "Engineering"})
  ```
- Use properties instead of excessive labels for attributes
- Avoid semantically orthogonal labels
- Eliminate duplicate data across nodes
  ```cypher
  // Bad: Duplicate company data
  CREATE (:Person {name: "Alice", company: "Neo4j"})
  CREATE (:Person {name: "Bob", company: "Neo4j"})

  // Better: Shared company node
  CREATE (c:Company {name: "Neo4j"})
  CREATE (a:Person {name: "Alice"})-[:WORKS_AT]->(c)
  CREATE (b:Person {name: "Bob"})-[:WORKS_AT]->(c)
  ```
- Consider fanout carefully to avoid supernodes
- Model list properties as separate connected nodes when needed
  ```cypher
  CREATE (p:Person {name: "Alice"})
  CREATE (s1:Skill {name: "Cypher"})
  CREATE (s2:Skill {name: "Graph Modeling"})
  CREATE (p)-[:HAS_SKILL]->(s1)
  CREATE (p)-[:HAS_SKILL]->(s2)
  ```

## Relationship Design

- Use specific relationship types instead of generic ones
  ```cypher
  // Bad: Generic relationship
  CREATE (:Person {name: "Alice"})-[:RELATED_TO]->(:Person {name: "Bob"})

  // Better: Specific relationship
  CREATE (:Person {name: "Alice"})-[:FRIENDS_WITH]->(:Person {name: "Bob"})
  ```
- Avoid symmetric relationships, i.e. a second relationship that only restates the first in the opposite direction
  ```cypher
  // Bad: Redundant symmetric relationships between the same two nodes
  CREATE (a:Person {name: "Alice"})-[:PARENT_OF]->(b:Person {name: "Bob"})
  CREATE (b)-[:CHILD_OF]->(a)

  // Better: Single directional relationship (it can be traversed in either direction)
  CREATE (:Person {name: "Alice"})-[:PARENT_OF]->(:Person {name: "Bob"})
  ```
- Choose relationship types vs. properties based on query patterns
- Use intermediate nodes for hyperedges (relationships connecting 3 or more nodes)
  ```cypher
  CREATE (a:Person {name: "Alice"})
  CREATE (b:Person {name: "Bob"})
  CREATE (c:Project {name: "GraphDB Project"})
  CREATE (w:Work {role: "Contributor"})
  CREATE (a)-[:WORKED_ON]->(w)-[:FOR_PROJECT]->(c)
  CREATE (b)-[:WORKED_ON]->(w)
  ```
- Direction matters even for seemingly mutual relationships

## Property Strategy

- Properties serve two purposes: unique identification and answering queries
- Use simple, indexed properties for anchoring and traversal
- Complex properties are acceptable for output/decoration only
- Follow the data accessibility hierarchy, from cheapest to most expensive to access: Anchor node data > Relationship types > Downstream data

## Data Insertion Patterns

- Create unique constraints for business keys
  ```cypher
  // Neo4j 4.4+ syntax; the older CREATE CONSTRAINT ON ... ASSERT form was removed in Neo4j 5
  CREATE CONSTRAINT FOR (p:Person) REQUIRE p.email IS UNIQUE;
  CREATE CONSTRAINT FOR (e:Employee) REQUIRE e.empId IS UNIQUE;
  ```
- Use `MERGE` for nodes with unique identifiers
- Batch operations for large datasets
- Clean and deduplicate data before loading
- Add indexes for frequently queried properties
- Convert relational foreign keys to relationships
  ```cypher
  // Relational: SELECT * FROM employees WHERE manager_id = 101;
  // Graph:
  MATCH (e:Employee)-[:REPORTS_TO]->(m:Employee {empId: 101})
  RETURN e;
  ```

## Query Optimization

- Anchor queries on indexed properties
- Use specific relationship types in traversals
- Minimize gather-and-inspect patterns (traversing to many nodes only to filter them by property afterwards)
- Consider data accessibility when designing traversal paths
- Profile queries with the `PROFILE` keyword
  ```cypher
  PROFILE
  MATCH (p:Person)-[:FOLLOWS]->(f)
  RETURN p, f;
  ```
- Consider aggregating frequently accessed data
  ```cypher
  MATCH (p:Post)-[:LIKED_BY]->(u:User)
  RETURN p, count(u) AS likes;
  ```

## Common Structures

- **Intermediate nodes**: For hyperedges, sharing context/data, organizing data
- **Linked lists**: For sequences using `NEXT`/`PREVIOUS` relationships
  ```cypher
  CREATE (a)-[:NEXT]->(b)-[:NEXT]->(c);
  ```
- **Timeline trees**: For time-based anchoring and navigation
- **Fanout patterns**: Balance between property duplication and query efficiency

## Validation Checklist

- [ ] Can the model answer all defined business questions?
- [ ] Are nodes uniquely identifiable?
- [ ] Are relationship types specific and meaningful?
- [ ] Is the model optimized for the most critical queries?
- [ ] Have you tested with representative data?
- [ ] Are naming conventions consistent?
- [ ] Have you avoided symmetric relationships?

## Industry Reference Example

For financial transaction and account modeling patterns, review the Neo4j transaction base model:

- **Financial Data Model Reference**: https://neo4j.com/developer/industry-use-cases/_attachments/transactions-base-model.txt
- Study how account hierarchies, transaction flows, and temporal relationships are modeled
- Note patterns for handling high-volume transactional data and account relationships
- Observe techniques for modeling financial entity relationships and transaction lineage

Apply this framework iteratively, testing each iteration against real use cases and data.
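
The `MERGE`, batching, and index bullets under Data Insertion Patterns carry no example of their own, so the following is a minimal sketch of how they might combine. It assumes Neo4j 4.4 or later and that a uniqueness constraint on `Person.email` already exists. The index name `person_last_name`, the sample rows, and the `company` field are illustrative and not taken from the reference model. In a real load, the inline list would typically be a `$rows` parameter sent by the client, one batch per transaction.

```cypher
// Index a frequently queried, non-unique property (index name is hypothetical).
CREATE INDEX person_last_name IF NOT EXISTS
FOR (p:Person) ON (p.lastName);

// Upsert one batch of rows in a single statement.
// The inline list stands in for a $rows parameter supplied by the client.
WITH [
  {email: "alice@example.com", firstName: "Alice", lastName: "Smith", company: "Neo4j"},
  {email: "bob@example.com",   firstName: "Bob",   lastName: "Jones", company: "Neo4j"}
] AS rows
UNWIND rows AS row
// MERGE on the business key only, so reruns do not create duplicates.
MERGE (p:Person {email: row.email})
SET p.firstName = row.firstName,
    p.lastName  = row.lastName
// Shared Company node instead of a duplicated company property.
MERGE (c:Company {name: row.company})
MERGE (p)-[:WORKS_AT]->(c);
```

Merging on the key alone and setting the remaining properties afterwards keeps the statement idempotent. Merging on the full property map would no longer match the existing node once any non-key value changes.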