An Extremely Simple but Effective Way to Improve Search Over Text Embeddings

Field Engineer, Neo4j

Experiments on AWS, Azure, and GCP

Abstract
In many GenAI solutions built today, e.g., RAG, similarity search over embeddings of text, images, and other data is becoming more and more popular. Quite a few approaches have been proposed and implemented to improve embedding search, using techniques like embedding model fine-tuning, prompt engineering, and cross-encoder re-ranking, to name a few.
In this article, I will demonstrate an extremely simple but effective way to improve similarity search, without using ANY of the methods mentioned above. Some samples will be given, and test results shown, for embeddings generated on all three major cloud providers, i.e., Azure OpenAI, Google Vertex AI, and AWS Bedrock, using Neo4j APOC ML procedures.
For a more holistic explanation of text embedding, please feel free to check my previous article:
Text Embedding — What, Why and How?
The Goals
In a classical RAG (Retrieval Augmented Generation) solution, the retrieval process often performs a similarity search over the (text) embedding (represented as a data type called vector) of the question (asked in natural language), against text embeddings of content stored in a knowledge base.

For effective vector search, we'd expect similarity scores such that:
- There is enough of a gap between the vectors of higher relevance and those of lower relevance.
- There is a reasonable threshold that can be used to exclude irrelevant vectors from the results.
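If both goals are met, filtering becomes a simple score cut-off. Here is a minimal sketch of what that looks like in Neo4j (assuming version 5.11+ with a native vector index; the index name question_embeddings and the 0.8 threshold are illustrative assumptions only):
// Fetch the 5 nearest neighbours from a (hypothetical) vector index,
// then keep only those above an illustrative 0.8 similarity threshold.
CALL db.index.vector.queryNodes('question_embeddings', 5, $queryEmbedding)
YIELD node, score
WHERE score >= 0.8
RETURN node.text AS text, score
ORDER BY score DESC;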
The Samples
Let's say we have some questions on banking products:
1. 'tell the differences bwteen standard home loan and fixed loan'
2. 'compare standard home loan, line of credit and fixed loan'
3. 'how to calculate interest?'
4. 'what is fixed rate loan?'
5. 'can I make more replayment?'
And a summary of the intent:
The question is about comparing banking product features and summarising differences.
At a quick glance, it wouldn't be too hard for us (human beings) to expect that, for the above intent, questions #1 and #2 should have higher similarity scores than the rest of the questions.
Let's test it.
Making Embedding API Calls
As I use Neo4j to store both text and vector data in a knowledge graph, to start quickly, I will just use the machine learning procedures from the Neo4j APOC library to get text embeddings from all three cloud providers.
1. AWS Bedrock
AWS offers Titan Embedding G1 for text embedding tasks, which is available in certain regions by request. An AWS secret access key and key ID are required. It is also required to grant the AmazonBedrockFullAccess permission to the user account in IAM.
:param aws_secret_access_key=>'AWS-SECRET-ACCESS-KEY';
:param aws_key_id=>'AWS-KEY-ID';
:param region=>'us-west-2';
:param model=>'amazon.titan-embed-text-v1';
CALL apoc.ml.bedrock.embedding(['Some Text'], {region:$region,keyId:$aws_key_id, secretKey:$aws_secret_access_key, model:$model});
2. Azure OpenAI
On Azure, OpenAI text-embedding-ada-002 is the model that produces text embeddings of 1536 dimensions. An OpenAI API key is required to run:
CALL apoc.ml.openai.embedding(['Some Text'], $apiKey, {}) yield index, text, embedding;
3. GCP VertexAI
On GCP Vertex AI, Embeddings for Text (model name textembedding-gecko) is the model that supports text embeddings, producing vectors of 768 dimensions. A GCP account access token, project ID, and region are required:
CALL apoc.ml.vertexai.embedding(['Some Text'], $accessToken, $project, {region:'<region>'}) yield index, text, embedding;
4. Cosine Similarity Function
In Neo4j's Graph Data Science library, there is a Cosine Similarity function. If you don't have it installed, here is a custom function written purely in Cypher:
CALL apoc.custom.declareFunction(
"cosineSimilarity(vector1::LIST OF FLOAT, vector2::LIST OF FLOAT)::FLOAT",
"WITH
reduce(s = 0.0, i IN range(0, size($vector1)-1) | s + $vector1[i] * $vector2[i]) AS dotProduct,
sqrt(reduce(s = 0.0, i IN range(0, size($vector1)-1) | s + $vector1[i]^2)) AS magnitude1,
sqrt(reduce(s = 0.0, i IN range(0, size($vector2)-1) | s + $vector2[i]^2)) AS magnitude2
RETURN
toFloat(dotProduct / (magnitude1 * magnitude2)) AS score;"
);
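Once declared, the function can be sanity-checked with a quick illustrative call: identical vectors should score 1.0 and orthogonal vectors 0.0.
RETURN custom.cosineSimilarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) AS same,  // 1.0
       custom.cosineSimilarity([1.0, 0.0], [0.0, 1.0]) AS orthogonal;      // 0.0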
The Baseline
Let's start with the Azure OpenAI embedding model using the Cypher statement below to calculate similarity:
// Embed the intent summary first
CALL apoc.ml.openai.embedding(['The question is about comparing banking products features and summarising differences'], NULL, {})
YIELD index, text, embedding
WITH text AS text1, embedding AS emb1
// Embed the five sample questions, then score each against the intent
CALL apoc.ml.openai.embedding(['tell the differences bwteen standard home loan and fixed loan',
'compare standard home loan, line of credit and fixed loan',
'how to calculate interest?',
'what is fixed rate loan?',
'can I make more replayment?'
], NULL, {})
YIELD index, text, embedding
WITH text AS text2, embedding AS emb2, emb1, text1
RETURN text1, text2, custom.cosineSimilarity(emb1, emb2) AS score;
and the results are:
text1 (same for every row): "The question is about comparing banking products features and summarising differences"

text2                                                             score
"tell the differences bwteen standard home loan and fixed loan"   0.786548104272935
"compare standard home loan, line of credit and fixed loan"       0.8082209952027013
"how to calculate interest?"                                      0.7799410426972614
"what is fixed rate loan?"                                        0.7562771823612668
"can I make more replayment?"                                     0.7278839096948918
The initial results are aligned with what we expected, i.e., question #2 has the highest score, followed by #1 and, closely, by #3; #4 and #5 have relatively lower scores. However, this is not as good as the goals we defined at the beginning of the article:
- Questions #1~#3 have similarity scores very close to each other.
- The distance between the max and min similarity scores is only about 0.08.
- Question #1 is relevant, but it has a relatively low score if we consider 0.8 a reasonable threshold.
For a large number of vectors to compare, these issues can make search and filtering more challenging.
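To put a number on the second point, the gap can be computed directly from the scores above (a minimal sketch using APOC collection functions; the list simply hard-codes the rounded baseline scores):
WITH [0.7865, 0.8082, 0.7799, 0.7563, 0.7279] AS scores
RETURN apoc.coll.max(scores) - apoc.coll.min(scores) AS gap;
// gap is about 0.08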
Is there any way we can improve this using simple techniques, without doing model fine-tuning or sophisticated prompt engineering?
The Extremely Simple Way
OpenAI Embedding
Of course, we could add some keywords to the questions. For example, by adding "question: " to the front of all questions, we'll get slightly better results, as shown below:
text1 (same for every row): "The question is about comparing banking products features and summarising differences"

text2                                                                       score
"question: tell the differences bwteen standard home loan and fixed loan"   0.8076027901384112
"question: compare standard home loan, line of credit and fixed loan"       0.8209218883771804
"question: how to calculate interest?"                                      0.7897457504166066
"question: what is fixed rate loan?"                                        0.7752367420838878
"question: can I make more replayment?"                                     0.7434177316292667
We can see that both #1 and #2 are now above 0.8, with scores boosted for all questions; #3, however, received the smallest boost.
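Mechanically, the prefix is a one-line change before the embedding call. Here is a sketch (the list comprehension and the $apiKey parameter are my illustration, with only two of the questions shown):
WITH ['tell the differences bwteen standard home loan and fixed loan',
      'compare standard home loan, line of credit and fixed loan'] AS questions
// Prepend 'question: ' to every entry before requesting embeddings
CALL apoc.ml.openai.embedding([q IN questions | 'question: ' + q], $apiKey, {})
YIELD index, text, embedding
RETURN text, embedding;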
A more generic, extremely simple, and effective way is to add some symbols.
For example, if I add ### to the beginning of the questions, the scores are boosted further:
text1 (same for every row): "The question is about comparing banking products features and summarising differences"

text2                                                                          score
"###question: tell the differences bwteen standard home loan and fixed loan"   0.8153785384022421
"###question: compare standard home loan, line of credit and fixed loan"       0.8396923324014403
"###question: how to calculate interest?"                                      0.8033634786411539
"###question: what is fixed rate loan?"                                        0.7924533227169328
"###question: can I make more replayment?"                                     0.7517756135667428
So, I tested various patterns and came up with the summary below:


For OpenAI's embedding model, it seems adding ### to both sides of the questions achieves quite good outcomes: it boosted relevant questions more than less relevant ones, and increased the distance by 13%+!
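Wrapping both sides follows the same pattern as before (a sketch; the exact spacing around the ### markers is an assumption and may change the scores slightly):
WITH ['what is fixed rate loan?', 'can I make more replayment?'] AS questions
// Wrap each question with ### on both sides
CALL apoc.ml.openai.embedding([q IN questions | '### ' + q + ' ###'], $apiKey, {})
YIELD index, text, embedding
RETURN text, embedding;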
What about other embedding models?
AWS Bedrock
The baseline test shows that the Bedrock Titan model differentiates questions #1 and #2 from the others better than OpenAI does: the similarity scores have a much bigger distance, but much lower absolute values, too.
text1 (same for every row): "The question is about comparing banking products features and summarising differences"

text2                                                                       score
"question: tell the differences bwteen standard home loan and fixed loan"   0.48950020909002845
"question: compare standard home loan, line of credit and fixed loan"       0.49591356152310995
"question: how to calculate interest?"                                      0.3239678489106068
"question: what is fixed rate loan?"                                        0.32782421303093584
"question: can I make more replayment?"                                     0.10463201924339886
And the results after applying various patterns:

Again, ### shows more influence over the text embedding.
Google VertexAI
VertexAI has a similar distance but lower scores compared to OpenAI.
text1 (same for every row): "The question is about comparing banking products features and summarising differences"

text2                                                                       score
"question: tell the differences bwteen standard home loan and fixed loan"   0.7222057116992117
"question: compare standard home loan, line of credit and fixed loan"       0.7196310737891332
"question: how to calculate interest?"                                      0.6515748673785347
"question: what is fixed rate loan?"                                        0.618704888535842
"question: can I make more replayment?"                                     0.5397404750178617
And the results after applying some patterns:

It looks like the VertexAI embedding model prefers &&& over ###.
Summary
Embeddings are the knowledge of AI. However, even texts with the same semantic meaning can have very different embeddings, which are subject to the tokenization and pre-training processes of the language model; trivial things like the special characters/symbols included can have quite significant impacts on the outcomes, too.
Some models, especially subword-based models like BERT, treat symbols as separate tokens or as parts of tokens (like "##" in BERT), which influences the subsequent embeddings.
In models capable of generating context-sensitive embeddings (like transformers), the presence of symbols can change the context of the surrounding words, thus altering their embeddings. However, for closed-source LLMs, there is no easy explanation for this, which poses unknown opportunities as well as risks.
For more practical guidance on evaluating embeddings and embedding search, you may find this article helpful, too:
Why Vector Search Didn't Work for Your RAG Solution?