POLE+O Data Model

The POLE+O data model is the default entity classification system in neo4j-agent-memory. It provides a structured, extensible framework for categorizing entities extracted from text.

What is POLE+O?

POLE+O stands for Person, Object, Location, Event + Organization. Originally developed for law enforcement and intelligence analysis, it has been adapted for general-purpose entity extraction in AI applications.

┌─────────────────────────────────────────────────────────────┐
│                      POLE+O Model                           │
├─────────────┬─────────────┬─────────────┬─────────────┬─────┤
│   PERSON    │   OBJECT    │  LOCATION   │    EVENT    │ ORG │
├─────────────┼─────────────┼─────────────┼─────────────┼─────┤
│ Individuals │ Physical/   │ Places &    │ Things that │ Com-│
│ & people    │ digital     │ geographic  │ happen or   │ pan-│
│ mentioned   │ items       │ areas       │ occurred    │ ies │
└─────────────┴─────────────┴─────────────┴─────────────┴─────┘

Entity Types & Subtypes

PERSON

Individuals mentioned by name, role, or description.

Subtype      Description
INDIVIDUAL   A specific named person
ALIAS        An alternative name or identity
PERSONA      A role or character
SUSPECT      Person of interest (law enforcement context)
WITNESS      Someone who observed an event
VICTIM       Someone affected by an event

Examples:

"John Smith"          → PERSON
"CEO"                 → PERSON (role)
"@johndoe"            → PERSON:ALIAS
"Dr. Jane Wilson"     → PERSON:INDIVIDUAL

OBJECT

Physical or digital items, artifacts, or things.

Subtype      Description
VEHICLE      Cars, trucks, boats, aircraft
PHONE        Phone numbers and devices
EMAIL        Email addresses
DOCUMENT     Papers, files, records
DEVICE       Electronic devices
WEAPON       Weapons and armaments
MONEY        Currency and financial instruments
DRUG         Controlled substances
SOFTWARE     Applications and programs
PRODUCT      Commercial products

Examples:

"Tesla Model 3"       → OBJECT:VEHICLE
"555-123-4567"        → OBJECT:PHONE
"john@example.com"    → OBJECT:EMAIL
"passport"            → OBJECT:DOCUMENT
"iPhone 15"           → OBJECT:DEVICE

LOCATION

Places, addresses, and geographic areas.

Subtype      Description
ADDRESS      Street addresses
CITY         Cities and towns
REGION       States, provinces, regions
COUNTRY      Countries and nations
LANDMARK     Notable places and monuments
FACILITY     Buildings and structures

Examples:

"123 Main Street"     → LOCATION:ADDRESS
"San Francisco"       → LOCATION:CITY
"California"          → LOCATION:REGION
"United States"       → LOCATION:COUNTRY
"Eiffel Tower"        → LOCATION:LANDMARK
"JFK Airport"         → LOCATION:FACILITY

EVENT

Things that happened, meetings, transactions, or temporal occurrences.

Subtype         Description
INCIDENT        Accidents, crimes, occurrences
MEETING         Scheduled gatherings
TRANSACTION     Financial or business exchanges
COMMUNICATION   Calls, messages, correspondence
DATE            Calendar dates
TIME            Times of day

Examples:

"car accident"        → EVENT:INCIDENT
"board meeting"       → EVENT:MEETING
"wire transfer"       → EVENT:TRANSACTION
"January 15, 2024"    → EVENT:DATE
"3:30 PM"             → EVENT:TIME

ORGANIZATION

Companies, institutions, groups, and collective entities.

Subtype       Description
COMPANY       Businesses and corporations
NONPROFIT     Charitable organizations
GOVERNMENT    Government agencies
EDUCATIONAL   Schools and universities
GROUP         Informal groups and associations

Examples:

"Apple Inc."          → ORGANIZATION:COMPANY
"Red Cross"           → ORGANIZATION:NONPROFIT
"FBI"                 → ORGANIZATION:GOVERNMENT
"MIT"                 → ORGANIZATION:EDUCATIONAL
"Book Club"           → ORGANIZATION:GROUP

Neo4j Schema

Entities are stored as nodes in Neo4j with multiple labels for efficient querying. Each entity has the base :Entity label plus the type and subtype as additional labels.

(:Entity:Person:Individual {
    id: "uuid",
    name: "John Smith",
    type: "PERSON",                    // POLE+O type (stored as uppercase)
    subtype: "INDIVIDUAL",             // Optional subtype (stored as uppercase)
    canonical_name: "John Smith",      // Resolved name
    description: "CEO of Acme Corp",
    confidence: 0.92,
    embedding: [0.1, 0.2, ...],       // Vector for search
    created_at: datetime(),
    metadata: "{...}"                  // JSON metadata
})

Entity types and subtypes are stored as uppercase properties (e.g., type: "PERSON"), but labels use PascalCase (e.g., :Person), following Neo4j naming conventions.

Label Structure

Each entity has:

  • :Entity - Base label (always present)

  • :<Type> - Entity type as PascalCase label (e.g., :Person, :Object, :Location, :Event, :Organization)

  • :<Subtype> - Subtype as PascalCase label when present (e.g., :Vehicle, :Address, :Company)

Both POLE+O types and custom types are added as PascalCase labels, as long as they are valid Neo4j label identifiers (start with a letter, contain only letters, numbers, and underscores).

This enables efficient queries like:

// Find all people
MATCH (p:Person) RETURN p

// Find all vehicles (regardless of whether they're Object type)
MATCH (v:Vehicle) RETURN v

// Find all entities (any type)
MATCH (e:Entity) RETURN e

// Find people who are individuals
MATCH (p:Person:Individual) RETURN p

// Combine with relationship traversal
MATCH (p:Person)-[:WORKS_AT]->(o:Organization:Company)
RETURN p.name, o.name
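
The label rules described above can be sketched in Python. This is an illustration only: the helper names (is_valid_label, to_pascal_label, labels_for) are hypothetical, and splitting on underscores (so that WORK_OF_ART would become WorkOfArt) is an assumption. The library's own conversion may differ.

```python
import re
from typing import List, Optional

# Validity rule taken from the text: starts with a letter, then only
# letters, digits, and underscores.
_LABEL_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_label(name: str) -> bool:
    """Return True if `name` satisfies the identifier rule stated above."""
    return _LABEL_RE.fullmatch(name) is not None


def to_pascal_label(type_name: str) -> str:
    """Convert an uppercase type such as "PERSON" to a PascalCase label.

    Splitting on underscores is an assumption made for this sketch.
    """
    parts = [p for p in type_name.split("_") if p]
    return "".join(p[:1].upper() + p[1:].lower() for p in parts)


def labels_for(entity_type: str, subtype: Optional[str] = None) -> List[str]:
    """Build the label list: Entity, then type, then subtype when valid."""
    labels = ["Entity"]
    for raw in (entity_type, subtype):
        if raw and is_valid_label(raw):
            labels.append(to_pascal_label(raw))
    return labels


print(labels_for("OBJECT", "VEHICLE"))  # ['Entity', 'Object', 'Vehicle']
print(labels_for("PERSON"))             # ['Entity', 'Person']
print(labels_for("3D_MODEL"))           # ['Entity'] (invalid identifier, skipped)
```

In this sketch an invalid name simply leaves the node with fewer labels; the type and subtype would still be available as properties.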

Custom Entity Types

Custom entity types outside the POLE+O model also become PascalCase labels:

# `client` is an open MemoryClient; custom types become PascalCase labels
await client.long_term.add_entity(
    name="Widget Pro",
    entity_type="PRODUCT",
    subtype="ELECTRONICS",
)

# Creates: (:Entity:Product:Electronics {name: "Widget Pro", ...})

Query by custom type:

MATCH (p:Product) RETURN p
MATCH (e:Electronics) RETURN e

For POLE+O types, subtypes are validated against the known subtypes for that type. For custom types, any valid Neo4j label identifier can be used as a subtype.
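
A minimal sketch of that validation rule follows. POLEO_SUBTYPES is copied from the subtype tables in this document; validate_subtype is a hypothetical helper written for illustration, not the library's API.

```python
import re
from typing import Optional

# Known subtypes per POLE+O type, copied from the tables in this document.
POLEO_SUBTYPES = {
    "PERSON": {"INDIVIDUAL", "ALIAS", "PERSONA", "SUSPECT", "WITNESS", "VICTIM"},
    "OBJECT": {"VEHICLE", "PHONE", "EMAIL", "DOCUMENT", "DEVICE", "WEAPON",
               "MONEY", "DRUG", "SOFTWARE", "PRODUCT"},
    "LOCATION": {"ADDRESS", "CITY", "REGION", "COUNTRY", "LANDMARK", "FACILITY"},
    "EVENT": {"INCIDENT", "MEETING", "TRANSACTION", "COMMUNICATION", "DATE", "TIME"},
    "ORGANIZATION": {"COMPANY", "NONPROFIT", "GOVERNMENT", "EDUCATIONAL", "GROUP"},
}

# Label identifier rule: a letter, then letters, digits, or underscores.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def validate_subtype(entity_type: str, subtype: Optional[str]) -> bool:
    """Hypothetical check mirroring the rule described above."""
    if subtype is None:
        return True  # subtypes are optional
    known = POLEO_SUBTYPES.get(entity_type)
    if known is not None:
        # POLE+O type: the subtype must be one of the documented ones.
        return subtype in known
    # Custom type: any valid label identifier is accepted.
    return _IDENTIFIER.fullmatch(subtype) is not None


print(validate_subtype("OBJECT", "VEHICLE"))       # True
print(validate_subtype("OBJECT", "ELECTRONICS"))   # False
print(validate_subtype("PRODUCT", "ELECTRONICS"))  # True (PRODUCT is a custom top-level type here)
```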

Relationships

Entities can have relationships to other entities:

// Person works at organization
(:Person)-[:WORKS_AT]->(:Organization)

// Person lives in location
(:Person)-[:LIVES_IN]->(:Location)

// Organization located at location
(:Organization)-[:LOCATED_AT]->(:Location)

// Person owns object
(:Person)-[:OWNS]->(:Object)

// Person participated in event
(:Person)-[:PARTICIPATED_IN]->(:Event)

Entities are linked to the messages that mention them:

(:Message)-[:MENTIONS {
    confidence: 0.85,
    start_pos: 10,
    end_pos: 20
}]->(:Entity)

Configuring POLE+O

Using Default POLE+O

from neo4j_agent_memory import MemoryClient, MemorySettings

# Default configuration uses POLE+O
settings = MemorySettings()

# Run this inside an async function (for example, via asyncio.run)
async with MemoryClient(settings) as memory:
    # Entities extracted using POLE+O types
    await memory.short_term.add_message(
        session_id="session-1",
        role="user",
        content="I work at Acme Corp in San Francisco"
    )
    # Extracts: "Acme Corp" (ORGANIZATION), "San Francisco" (LOCATION)

Selecting Specific POLE+O Types

from neo4j_agent_memory import MemorySettings, ExtractionConfig

# Only extract people and organizations
settings = MemorySettings(
    extraction=ExtractionConfig(
        entity_types=["PERSON", "ORGANIZATION"]
    )
)

Using Subtypes

Subtypes are automatically extracted by the LLM extractor and can be specified when manually adding entities:

async with MemoryClient(settings) as memory:
    # Add entity with subtype
    await memory.long_term.add_entity(
        name="Tesla Model 3",
        entity_type="OBJECT",
        subtype="VEHICLE",
        description="Electric sedan"
    )

    # Search within the OBJECT type (this call filters by type only)
    vehicles = await memory.long_term.search_entities(
        query="cars",
        entity_type="OBJECT"
    )

Custom Schema Models

While POLE+O is the default, you can use alternative schema models:

SchemaModel Options

Model    Description
POLEO    Default POLE+O model with all subtypes
LEGACY   Backward-compatible with older EntityType enum
CUSTOM   User-defined entity types

Using Custom Entity Types

from neo4j_agent_memory import MemorySettings, SchemaConfig, SchemaModel, ExtractionConfig

# E-commerce domain example
settings = MemorySettings(
    schema=SchemaConfig(
        model=SchemaModel.CUSTOM,
        entity_types=["CUSTOMER", "PRODUCT", "ORDER", "STORE", "REVIEW"],
        strict_types=True  # Reject unknown types
    ),
    extraction=ExtractionConfig(
        entity_types=["CUSTOMER", "PRODUCT", "ORDER", "STORE", "REVIEW"]
    )
)

Loading Schema from File

Create a JSON schema file:

{
    "name": "ecommerce_schema",
    "version": "1.0",
    "entity_types": [
        {
            "name": "CUSTOMER",
            "description": "A person who purchases products",
            "subtypes": ["PREMIUM", "REGULAR", "NEW"],
            "attributes": ["name", "email", "tier"]
        },
        {
            "name": "PRODUCT",
            "description": "An item available for purchase",
            "subtypes": ["ELECTRONICS", "CLOTHING", "FOOD"],
            "attributes": ["sku", "price", "category"]
        }
    ],
    "relationship_types": [
        {
            "name": "PURCHASED",
            "source": "CUSTOMER",
            "target": "PRODUCT"
        }
    ]
}

Load it in configuration:

from neo4j_agent_memory import MemorySettings, SchemaConfig, SchemaModel

settings = MemorySettings(
    schema=SchemaConfig(
        model=SchemaModel.CUSTOM,
        custom_schema_path="./ecommerce_schema.json"
    )
)
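
Before pointing custom_schema_path at a file, it can help to sanity-check the file's contents. The sketch below uses only the standard library and a trimmed copy of the schema above. check_schema is a hypothetical helper, and the checks it performs (unique type names, relationship endpoints that refer to declared types) are assumptions about what a well-formed schema should satisfy, not rules the library is documented to enforce.

```python
import json
from typing import List

# Trimmed copy of the example schema shown earlier in this section.
SCHEMA_JSON = """
{
    "name": "ecommerce_schema",
    "version": "1.0",
    "entity_types": [
        {"name": "CUSTOMER", "subtypes": ["PREMIUM", "REGULAR", "NEW"]},
        {"name": "PRODUCT", "subtypes": ["ELECTRONICS", "CLOTHING", "FOOD"]}
    ],
    "relationship_types": [
        {"name": "PURCHASED", "source": "CUSTOMER", "target": "PRODUCT"}
    ]
}
"""


def check_schema(schema: dict) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    names = [t["name"] for t in schema.get("entity_types", [])]
    declared = set(names)
    if len(names) != len(declared):
        problems.append("duplicate entity type names")
    for rel in schema.get("relationship_types", []):
        for end in ("source", "target"):
            if rel.get(end) not in declared:
                problems.append(
                    f"{rel.get('name')}: {end} {rel.get(end)!r} is not a declared entity type"
                )
    return problems


schema = json.loads(SCHEMA_JSON)
print(check_schema(schema))  # []

# A relationship that points at an undeclared type is reported.
schema["relationship_types"].append(
    {"name": "REVIEWED", "source": "CUSTOMER", "target": "REVIEW"}
)
print(check_schema(schema))
```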

Working with Entities

Searching Entities

async with MemoryClient(settings) as memory:
    # Semantic search across all entities
    entities = await memory.long_term.search_entities(
        query="companies in technology",
        limit=10
    )

    # Filter by type
    people = await memory.long_term.search_entities(
        query="engineers",
        entity_type="PERSON",
        limit=10
    )

    # Get entity by name
    entity = await memory.long_term.get_entity(
        name="Acme Corp",
        entity_type="ORGANIZATION"
    )

Entity Resolution

Entity resolution merges duplicate entities and links aliases:

from neo4j_agent_memory import MemoryClient, MemorySettings, ResolutionConfig, ResolverStrategy

settings = MemorySettings(
    resolution=ResolutionConfig(
        strategy=ResolverStrategy.COMPOSITE,
        exact_threshold=1.0,      # Exact match
        fuzzy_threshold=0.85,     # Fuzzy match threshold
        semantic_threshold=0.9,   # Embedding similarity
    )
)

async with MemoryClient(settings) as memory:
    # These might resolve to the same entity
    await memory.short_term.add_message(
        session_id="s1", role="user",
        content="I talked to John Smith"
    )
    await memory.short_term.add_message(
        session_id="s1", role="user",
        content="Johnny Smith called me back"
    )
    # "John Smith" and "Johnny Smith" may be resolved as the same person
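
To make the thresholds concrete, here is a rough sketch of how an exact-then-fuzzy cascade could decide whether two names match. It uses difflib.SequenceMatcher from the standard library as a stand-in similarity measure; the library's actual matching algorithm and scores are not described in this document and will likely differ. The semantic (embedding) stage is left out because it needs an embedding model. match_names is a hypothetical helper, and lowercasing before comparison is an assumption.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.85  # same value as fuzzy_threshold in the settings above


def match_names(a: str, b: str, fuzzy_threshold: float = FUZZY_THRESHOLD) -> str:
    """Classify a pair of names as 'exact', 'fuzzy', or 'none'."""
    left, right = a.strip().lower(), b.strip().lower()
    if left == right:
        return "exact"
    # Stand-in similarity in [0, 1]; not the library's scoring function.
    score = SequenceMatcher(None, left, right).ratio()
    return "fuzzy" if score >= fuzzy_threshold else "none"


print(match_names("John Smith", "john smith"))    # exact
print(match_names("John Smith", "Johnny Smith"))  # fuzzy
print(match_names("John Smith", "Jane Wilson"))   # none
```

With this stand-in measure, "John Smith" and "Johnny Smith" score roughly 0.91, which clears the 0.85 threshold. Raising the threshold makes merges more conservative at the cost of leaving more duplicates.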

POLE+O in the Extraction Pipeline

Each extractor maps to POLE+O types differently:

spaCy Mapping

SPACY_TO_POLEO = {
    "PERSON": "PERSON",
    "ORG": "ORGANIZATION",
    "GPE": "LOCATION",     # Geopolitical entities
    "LOC": "LOCATION",     # Other locations
    "FAC": "LOCATION",     # Facilities
    "EVENT": "EVENT",
    "PRODUCT": "OBJECT",
    "WORK_OF_ART": "OBJECT",
    "DATE": "EVENT",
    "TIME": "EVENT",
    "MONEY": "OBJECT",
}
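
As an illustration of how such a table might be applied, the sketch below converts (text, label) pairs into POLE+O-typed pairs and drops labels that have no mapping. The input pairs are hard-coded stand-ins for spaCy output so that the example runs without spaCy installed; to_poleo is a hypothetical helper, and silently skipping unmapped labels is an assumption.

```python
from typing import Iterable, List, Tuple

# Same mapping as shown above.
SPACY_TO_POLEO = {
    "PERSON": "PERSON",
    "ORG": "ORGANIZATION",
    "GPE": "LOCATION",
    "LOC": "LOCATION",
    "FAC": "LOCATION",
    "EVENT": "EVENT",
    "PRODUCT": "OBJECT",
    "WORK_OF_ART": "OBJECT",
    "DATE": "EVENT",
    "TIME": "EVENT",
    "MONEY": "OBJECT",
}


def to_poleo(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map (text, spaCy label) pairs to (text, POLE+O type); skip unmapped labels."""
    return [
        (text, SPACY_TO_POLEO[label])
        for text, label in pairs
        if label in SPACY_TO_POLEO
    ]


# Stand-in for [(ent.text, ent.label_) for ent in doc.ents]
pairs = [("Acme Corp", "ORG"), ("San Francisco", "GPE"), ("three", "CARDINAL")]
print(to_poleo(pairs))
# [('Acme Corp', 'ORGANIZATION'), ('San Francisco', 'LOCATION')]
```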

GLiNER Labels

GLiNER uses lowercase labels that map to POLE+O:

GLINER_LABELS = [
    "person",           # → PERSON
    "organization",     # → ORGANIZATION
    "company",          # → ORGANIZATION:COMPANY
    "location",         # → LOCATION
    "city",             # → LOCATION:CITY
    "country",          # → LOCATION:COUNTRY
    "event",            # → EVENT
    "meeting",          # → EVENT:MEETING
    "object",           # → OBJECT
    "vehicle",          # → OBJECT:VEHICLE
    "document",         # → OBJECT:DOCUMENT
]
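
The comments in the list above imply a label-to-(type, subtype) table. One way to write it down explicitly is shown below. GLINER_TO_POLEO and resolve_gliner_label are hypothetical names, and returning None for unknown labels is an assumption.

```python
from typing import Optional, Tuple

# (POLE+O type, optional subtype) for each label, following the comments above.
GLINER_TO_POLEO = {
    "person": ("PERSON", None),
    "organization": ("ORGANIZATION", None),
    "company": ("ORGANIZATION", "COMPANY"),
    "location": ("LOCATION", None),
    "city": ("LOCATION", "CITY"),
    "country": ("LOCATION", "COUNTRY"),
    "event": ("EVENT", None),
    "meeting": ("EVENT", "MEETING"),
    "object": ("OBJECT", None),
    "vehicle": ("OBJECT", "VEHICLE"),
    "document": ("OBJECT", "DOCUMENT"),
}


def resolve_gliner_label(label: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (type, subtype) for a label, or None if the label is unknown."""
    return GLINER_TO_POLEO.get(label.strip().lower())


print(resolve_gliner_label("company"))  # ('ORGANIZATION', 'COMPANY')
print(resolve_gliner_label("person"))   # ('PERSON', None)
print(resolve_gliner_label("animal"))   # None
```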

LLM Prompt

The LLM extractor uses explicit POLE+O instructions:

Extract entities using the POLE+O model:

- PERSON: Individuals, people mentioned by name or role
- OBJECT: Physical or digital items (vehicles, phones, documents)
- LOCATION: Places, addresses, geographic areas
- EVENT: Incidents, meetings, transactions, things that happened
- ORGANIZATION: Companies, groups, institutions

For each entity, also identify:
- Subtype (e.g., VEHICLE for OBJECT, COMPANY for ORGANIZATION)
- Confidence score (0.0 to 1.0)
- Description (brief context)
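
The prompt asks for a type, a subtype, a confidence score, and a description per entity. The sketch below shows one way a caller might validate such output. It assumes the model replies with a JSON array of objects using the keys name, type, subtype, confidence, and description; that response format, the ExtractedEntity class, and parse_entities are all assumptions made for illustration and are not taken from the library.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

POLEO_TYPES = {"PERSON", "OBJECT", "LOCATION", "EVENT", "ORGANIZATION"}


@dataclass
class ExtractedEntity:
    name: str
    type: str
    subtype: Optional[str]
    confidence: float
    description: str


def parse_entities(raw: str) -> List[ExtractedEntity]:
    """Parse an assumed JSON array of entity objects, keeping only valid rows."""
    entities = []
    for item in json.loads(raw):
        etype = str(item.get("type", "")).upper()
        confidence = float(item.get("confidence", 0.0))
        # Drop rows with an unknown type or an out-of-range confidence.
        if etype not in POLEO_TYPES or not 0.0 <= confidence <= 1.0:
            continue
        entities.append(ExtractedEntity(
            name=item["name"],
            type=etype,
            subtype=item.get("subtype"),
            confidence=confidence,
            description=item.get("description", ""),
        ))
    return entities


# Hypothetical model response, invented for this example.
raw = """[
  {"name": "Acme Corp", "type": "ORGANIZATION", "subtype": "COMPANY",
   "confidence": 0.92, "description": "Employer mentioned by the user"},
  {"name": "soon", "type": "ADVERB", "confidence": 0.4}
]"""
for entity in parse_entities(raw):
    print(entity.name, entity.type, entity.subtype, entity.confidence)
# Acme Corp ORGANIZATION COMPANY 0.92
```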

Best Practices

Choosing Entity Types

  1. Start with POLE+O - The default types cover most use cases

  2. Add subtypes - Use subtypes for finer classification without changing core types

  3. Custom types - Only create custom types for domain-specific needs

Entity Naming

  1. Canonical names - Use full, proper names when possible

  2. Consistency - Use consistent capitalization and formatting

  3. Aliases - Store alternative names as separate entities with ALIAS_OF relationships

Performance

  1. Limit types - Extract only the types you need

  2. Use subtypes - Subtypes are more efficient than many custom types

  3. Index entities - Neo4j indexes are created automatically for name+type