Bring Your Own Model
How to plug in any LLM or embedding provider — native adapters for OpenAI, Anthropic, Bedrock, Vertex AI, and sentence-transformers; LiteLLM universal fallback for everything else.
As of v0.3 neo4j-agent-memory is provider-pluggable. You pass either a provider-string shorthand ("anthropic/claude-3-5-sonnet-latest"), an explicit Provider instance, or hand off your already-configured framework model. The library picks the most reliable backend available.
TL;DR — string shorthand
from neo4j_agent_memory import MemoryClient, MemorySettings
settings = MemorySettings(
neo4j={"password": "p"},
llm="anthropic/claude-3-5-sonnet-latest",
embedding="openai/text-embedding-3-small",
)
async with MemoryClient(settings) as client:
...
-
Install:
pip install neo4j-agent-memory[anthropic,openai] -
The factory picks a native adapter when the matching extra is installed, and falls back to LiteLLM for unsupported providers.
-
Returned Providers implement the
LLMProvider/EmbeddingProviderProtocols.
Pick a provider
| Provider | Example model string | Extra | Notes |
|---|---|---|---|
OpenAI |
|
|
Native: strict-mode structured output. |
Anthropic |
|
|
Native: forced tool-use + optional prompt caching. |
AWS Bedrock |
|
|
Native: Converse API; reads boto3 credential chain. |
Vertex AI (Gemini) |
|
|
Routes via LiteLLM; needs ADC credentials. |
Ollama (local) |
|
|
Pass |
Groq |
|
|
LiteLLM universal. |
Together |
|
|
LiteLLM universal. |
Cohere |
|
|
LiteLLM universal. |
OpenRouter (any) |
|
|
LiteLLM universal. |
Embedding-only providers:
| Provider | Example model string | Extra | Dimensions |
|---|---|---|---|
OpenAI |
|
|
1536 |
OpenAI (large) |
|
|
3072 |
Vertex AI |
|
|
768 |
Bedrock Titan |
|
|
1024 |
sentence-transformers |
|
|
384 |
sentence-transformers |
|
|
1024 |
Cohere |
|
|
1024 |
Voyage |
|
|
1024 |
For models not in the defaults table, pass an explicit dimensions=N when constructing the adapter directly, or via --embedding-dimensions on the MCP CLI.
Native-first resolution
When you call from_provider("openai/gpt-4o-mini"):
-
Parse
openaias the provider prefix. -
If
[openai]is installed and the prefix is one of{openai, anthropic, bedrock}, use the native adapter. -
Otherwise, if
[litellm]is installed, route throughLiteLLMProvider. -
Otherwise, raise
ImportErrorwith an install hint for both the native extra and the universal fallback.
You can force LiteLLM even when a native adapter is available:
from neo4j_agent_memory.llm import from_provider
provider = from_provider(
"openai/gpt-4o",
prefer_litellm=True,
)
Why this design? Native adapters get provider-specific features (OpenAI strict-mode JSON, Anthropic prompt caching, Bedrock Converse) that LiteLLM normalizes away or lags on. The escape hatch exists for consistency-across-providers testing.
Three ways to wire a provider
A. Provider-string shorthand
Simplest. The factory resolves the string.
settings = MemorySettings(
neo4j={"password": "p"},
llm="anthropic/claude-3-5-sonnet-latest",
)
B. Explicit Provider instance
When you need adapter-specific kwargs (api_base, cache_system, aws_region):
from neo4j_agent_memory.llm.adapters.anthropic import AnthropicProvider
from neo4j_agent_memory.llm.adapters.litellm import LiteLLMProvider
settings = MemorySettings(
neo4j={"password": "p"},
llm=AnthropicProvider(
"anthropic/claude-3-5-sonnet-latest",
cache_system=True, # opt-in prompt caching
),
)
# Or a local model behind LiteLLM:
ollama = LiteLLMProvider(
"ollama/llama3.2",
api_base="http://localhost:11434",
)
C. Framework pass-through
Hand off a model you’ve already configured with your agent framework:
from langchain_anthropic import ChatAnthropic
from neo4j_agent_memory.integrations.langchain import (
llm_provider_from_langchain,
)
chat = ChatAnthropic(model_name="claude-3-5-sonnet-latest")
settings = MemorySettings(
neo4j={"password": "p"},
llm=llm_provider_from_langchain(chat),
)
See the migration guide for the full list of llm_provider_from_<framework> helpers.
Embedding models — the dimension gotcha
Embedding adapters require dimensions: int so Neo4j vector indexes are sized correctly at connect(). The defaults table covers common models; for an unknown model, pass dimensions= explicitly:
from neo4j_agent_memory.llm.adapters.sentence_transformers import (
SentenceTransformersProvider,
)
# Known model — dimensions auto-populated from defaults.
embedder = SentenceTransformersProvider("BAAI/bge-small-en-v1.5")
assert embedder.dimensions == 384
# Unknown model — must specify dimensions.
custom = SentenceTransformersProvider("my-org/my-internal-model", dimensions=512)
If you change embedding model after creating data, see Migrate Embedding Model for the index-rebuild runbook.
Structured extraction
The library’s entity extractor calls complete_structured() when the provider implements StructuredExtractor. This is what makes extraction quality high across providers:
-
OpenAI: strict mode (
response_format={"type": "json_schema", "strict": True}) — schema-conforming output guaranteed. -
Anthropic: forced tool use — the model is required to call a single tool whose input is your Pydantic schema.
-
LiteLLM: schema-aligned retry (
schema_aligned_extract) — feeds validation errors back to the LLM as feedback for up to 2 retries.
You can use the same pattern directly:
from pydantic import BaseModel
from neo4j_agent_memory.llm import ChatMessage, from_provider
class City(BaseModel):
name: str
population: int
provider = from_provider("anthropic/claude-3-5-sonnet-latest")
city = await provider.complete_structured(
[ChatMessage(role="user", content="Population of Paris in 2024?")],
response_model=City,
)
print(city.name, city.population)
If a provider does not implement StructuredExtractor, the universal schema_aligned_extract helper still works:
from neo4j_agent_memory.llm import schema_aligned_extract
city = await schema_aligned_extract(
provider,
messages=[ChatMessage(role="user", content="...")],
response_model=City,
max_retries=2,
)
Error handling
Every adapter translates SDK-specific exceptions to the provider-agnostic hierarchy in neo4j_agent_memory.llm.errors:
from neo4j_agent_memory.llm import ProviderRateLimitError, ProviderTimeoutError
try:
result = await provider.complete([...])
except ProviderRateLimitError as e:
# Same except clause works across OpenAI, Anthropic, Bedrock, LiteLLM.
await asyncio.sleep(e.retry_after or 1.0)
result = await provider.complete([...])
except ProviderTimeoutError:
...
The full hierarchy:
-
ProviderError(base)-
ProviderAuthError— invalid/missing API key. -
ProviderRateLimitError— carriesretry_after: float | None. -
ProviderTimeoutError. -
ProviderInvalidRequestError— unknown model, malformed request. -
ProviderServiceError— 5xx / retriable. -
StructuredExtractionError— SAP retries exhausted; carrieslast_attemptsandvalidation_errors. -
EmbeddingDimensionMismatchError— see migration runbook.
-
Provider matrix at a glance
| Adapter | LLM Bronze | Structured Silver | Embedding | Notes |
|---|---|---|---|---|
|
✓ |
✓ (strict mode) |
Most reliable. |
|
|
✓ |
Dimension reduction supported. |
||
|
✓ |
✓ (forced tool) |
Optional prompt caching. |
|
|
✓ |
✓ (tool use) |
Boto3 credential chain. |
|
|
✓ |
Titan + Cohere via Bedrock. |
||
|
✓ |
✓ (via SAP) |
100+ providers. |
|
|
✓ |
Cohere, Voyage, etc. |
||
|
✓ |
Local, no API key. |
||
|
✓ |
Wraps existing Vertex AI embedder. |
||
|
✓ (Instructor SDK) |
For users already on Instructor. |
Configure via the MCP CLI
Match the Python API surface from the command line:
neo4j-agent-memory mcp serve \
--password mypw \
--llm anthropic/claude-3-5-sonnet-latest \
--embedding BAAI/bge-small-en-v1.5 \
--llm-api-key $ANTHROPIC_API_KEY
Or via env vars:
export NAM_LLM=anthropic/claude-3-5-sonnet-latest
export NAM_EMBEDDING=BAAI/bge-small-en-v1.5
neo4j-agent-memory mcp serve --password mypw
See CLI Reference for the full flag set.
Related
-
Tutorial: Anthropic + local embeddings — a copy-paste-runnable walkthrough.
-
Why the Provider Protocol? — design rationale.
-
Migrate to v0.3 — backward compat and side-by-side examples.