Build an Agent with Anthropic and Local Embeddings
A step-by-step tutorial: build a memory-enabled agent using Anthropic Claude for entity extraction and local sentence-transformers for embeddings. Zero OpenAI dependency.
By the end you’ll have a working memory system where the LLM is Anthropic and embeddings are computed locally on your machine — no embedding API calls leave your network. This is the headline configuration for v0.3.
What you’ll learn
-
How to wire
neo4j-agent-memoryto a non-OpenAI LLM -
How to use a local sentence-transformers embedder
-
How to confirm your chosen models are actually being used
-
How to switch back without losing data
Prerequisites
-
Python 3.10 or higher
-
An Anthropic API key (free trial available)
-
A running Neo4j 5.11+ instance
-
About 30 minutes
Step 1: Install dependencies
pip install "neo4j-agent-memory[anthropic,sentence-transformers]"
This pulls in:
-
The core memory library
-
The native Anthropic adapter (uses the
anthropicSDK directly for strict forced-tool-use structured output) -
sentence-transformersfor local embeddings
Step 2: Set up Neo4j
If you don’t already have a Neo4j 5.11+ instance running:
docker run \
--name neo4j-memory \
-p 7474:7474 -p 7687:7687 \
-e NEO4J_AUTH=neo4j/password123 \
-e NEO4J_PLUGINS='["apoc"]' \
-d \
neo4j:5.26-community
Wait ~30 seconds for it to boot, then open http://localhost:7474 to confirm.
Step 3: Configure environment variables
export ANTHROPIC_API_KEY=sk-ant-...
export NEO4J_URI=bolt://localhost:7687
export NEO4J_PASSWORD=password123
Step 4: Wire the agent
Create agent.py:
import asyncio
import os
from pydantic import SecretStr
from neo4j_agent_memory import MemoryClient, MemorySettings, Neo4jConfig
async def main():
settings = MemorySettings(
neo4j=Neo4jConfig(
uri=os.environ["NEO4J_URI"],
password=SecretStr(os.environ["NEO4J_PASSWORD"]),
),
# Anthropic LLM for entity extraction.
llm="anthropic/claude-3-5-sonnet-latest",
# Local 384-dim sentence-transformers embedder. First run
# downloads the model (~130 MB); subsequent runs use the cache.
embedding="BAAI/bge-small-en-v1.5",
)
async with MemoryClient(settings) as client:
# Store a conversation message. The extractor uses Anthropic's
# forced-tool-use structured output to extract entities.
await client.short_term.add_message(
session_id="tutorial-1",
role="user",
content=(
"Hi, I'm Maya Chen. I work as a product manager at "
"Acme Robotics in San Francisco. I'm building a new "
"feature for our warehouse automation product."
),
extract_entities=True,
)
# Inspect what was extracted.
entities = await client.long_term.search_entities(
query="Maya",
limit=5,
)
print(f"Found {len(entities)} entities:")
for entity in entities:
print(f" - {entity.name} ({entity.full_type})")
# Add a preference.
await client.long_term.add_preference(
category="communication",
preference="Prefers async written updates over meetings",
)
# Assemble context for an LLM prompt.
context = await client.get_context(
"What does Maya prefer for status updates?",
session_id="tutorial-1",
)
print("\nContext:\n", context)
asyncio.run(main())
Run it:
python agent.py
You should see the extracted entities (Maya Chen, Acme Robotics, San Francisco) and the assembled context.
Step 5: Confirm your providers were used
The factory chooses adapters silently. Verify with debug logging:
import logging
logging.getLogger("neo4j_agent_memory.llm.factory").setLevel(logging.DEBUG)
# Run agent.py again. You should see something like:
# DEBUG: from_provider: routing 'anthropic/claude-3-5-sonnet-latest' to native AnthropicProvider
# DEBUG: from_provider: routing 'BAAI/bge-small-en-v1.5' to SentenceTransformersProvider
If you see LiteLLMProvider in the log instead of AnthropicProvider, the [anthropic] extra isn’t installed.
Step 6: Inspect the graph
Open Neo4j Browser at http://localhost:7474 and run:
MATCH (e:Entity)
WHERE e.embedding IS NOT NULL
RETURN e.name, e.type, size(e.embedding) AS dim
LIMIT 10
You should see dim = 384 — the dimensionality of BAAI/bge-small-en-v1.5. This is what the vector index is sized for; if you tried to insert a 1536-dim OpenAI vector now, Neo4j would reject it.
Step 7: Verify dimensions match
SHOW VECTOR INDEXES YIELD name, options
RETURN name, options.indexConfig.`vector.dimensions` AS dim
Every library-managed index should report dim = 384. MemoryClient.connect() already validates this for you — if the dimensions drift you get EmbeddingDimensionMismatchError with a pointer to the migration runbook.
What just happened?
-
MemorySettings.embedding = "BAAI/bge-small-en-v1.5"resolved viafrom_provider. Sentence-transformers prefix detection pickedSentenceTransformersProvider, and dimensions were auto-populated from the defaults table (384). -
MemorySettings.llm = "anthropic/claude-3-5-sonnet-latest"resolved toAnthropicProvider. The[anthropic]extra was installed, so the factory chose the native adapter over LiteLLM. -
When you added a message with
extract_entities=True,LLMEntityExtractorsaw that the provider implementsStructuredExtractorand calledcomplete_structured(…)with a Pydantic schema. Anthropic responded via forced tool use — the model was required to emit valid JSON matching the schema. -
Each extracted entity got embedded locally by
BAAI/bge-small-en-v1.5. No outbound HTTP calls for embeddings.
Switching providers later
The whole config is two strings. Swap them at any time:
# Back to OpenAI for both:
settings = MemorySettings(
neo4j={"password": os.environ["NEO4J_PASSWORD"]},
llm="openai/gpt-4o-mini",
embedding="openai/text-embedding-3-small",
)
If you change the embedding model the dimensions change, and you’ll need to rebuild indexes — see Migrate Embedding Model. Changing only the LLM has no schema impact.
Going further
-
Bring Your Own Model — the full provider matrix.
-
Conversation Memory — add a real chat loop.
-
Knowledge Graphs — bulk-ingest from documents.
-
Why the Provider Protocol? — design rationale.