Amazon Web Services (AWS)
The Amazon Web Services (AWS) Comprehend Natural Language API uses machine learning to find insights and relationships in text. The procedures in this chapter act as a wrapper around calls to this API to extract entities and key phrases from text stored as node properties.
Each procedure has two modes:
-
Stream - returns a map constructed from the JSON returned from the API
-
Graph - creates a graph or virtual graph based on the values returned by the API
The procedures described in this chapter make API calls and subsequent updates to the database on the calling thread. If we want to make parallel requests to the API and avoid out of memory errors from keeping too much transaction state in memory while running procedures that write to the database, see Batching Requests. |
Procedure Overview
The procedures are described below:
Entity Extraction
The entity extraction procedures (apoc.nlp.aws.entities.*
) are wrappers around the Detect Entities operations of the AWS Comprehend Natural Language API.
This API method finds entities in the text, which are defined as a textual reference to the unique name of a real-world object such as people, places, and commercial items, and to precise references to measures such as dates and quantities.
The procedures are described below:
signature |
---|
apoc.nlp.aws.entities.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?) |
apoc.nlp.aws.entities.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?) |
The procedures support the following config parameters:
name | type | default | description |
---|---|---|---|
key |
String |
null |
AWS Access Control Key |
secret |
String |
null |
AWS Access Control Secret |
nodeProperty |
String |
text |
The property on the provided node that contains the unstructured text to be analyzed |
In addition, apoc.nlp.aws.entities.graph
supports the following config parameters:
name | type | default | description |
---|---|---|---|
scoreCutoff |
Double |
0.0 |
Lower limit for the score of an entity to be present in the graph. Value must be between 0 and 1. Score is an indicator of the level of confidence that Amazon Comprehend has in the accuracy of the detection. |
write |
Boolean |
false |
persist the graph of entities |
writeRelationshipType |
String |
ENTITY |
relationship type for relationships from source node to entity nodes |
writeRelationshipProperty |
String |
score |
relationship property for relationships from source node to entity nodes |
CALL apoc.nlp.aws.entities.stream(source:Node or List<Node>, {
key: String,
secret: String,
nodeProperty: String
})
YIELD value
CALL apoc.nlp.aws.entities.graph(source:Node or List<Node>, {
key: String,
secret: String,
nodeProperty: String,
scoreCutoff: Double,
writeRelationshipType: String,
writeRelationshipProperty: String,
write: Boolean
})
YIELD graph
Key Phrases
The key phrase procedures (apoc.nlp.aws.keyPhrases.*
) are wrappers around the Detect Key Phrases operations of the AWS Comprehend Natural Language API.
A key phrase is a string containing a noun phrase that describes a particular thing.
It generally consists of a noun and the modifiers that distinguish it.
The procedures are described below:
signature |
---|
apoc.nlp.aws.keyPhrases.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?) |
apoc.nlp.aws.keyPhrases.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?) |
The procedures support the following config parameters:
name | type | default | description |
---|---|---|---|
key |
String |
null |
AWS Access Control Key |
secret |
String |
null |
AWS Access Control Secret |
nodeProperty |
String |
text |
The property on the provided node that contains the unstructured text to be analyzed |
In addition, apoc.nlp.aws.keyPhrases.graph
supports the following config parameters:
name | type | default | description |
---|---|---|---|
scoreCutoff |
Double |
0.0 |
Lower limit for the score of an entity to be present in the graph. Value must be between 0 and 1. Score is an indicator of the level of confidence that Amazon Comprehend has in the accuracy of the detection. |
write |
Boolean |
false |
persist the graph of key phrases |
writeRelationshipType |
String |
KEY_PHRASE |
relationship type for relationships from source node to key phrase nodes |
writeRelationshipProperty |
String |
score |
relationship property for relationships from source node to key phrase nodes |
CALL apoc.nlp.aws.keyPhrases.stream(source:Node or List<Node>, {
key: String,
secret: String,
nodeProperty: String
})
YIELD value
CALL apoc.nlp.aws.keyPhrases.graph(source:Node or List<Node>, {
key: String,
secret: String,
nodeProperty: String,
scoreCutoff: Double,
writeRelationshipType: String,
writeRelationshipProperty: String,
write: Boolean
})
YIELD graph
Sentiment
The sentiment procedures (apoc.nlp.aws.sentiment.*
) are wrappers around the Determine Sentiment operations of the AWS Comprehend Natural Language API.
You can determine if the sentiment is positive, negative, neutral, or mixed.
The procedures are described below:
signature |
---|
apoc.nlp.aws.sentiment.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?) |
apoc.nlp.aws.sentiment.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?) |
The procedures support the following config parameters:
name | type | default | description |
---|---|---|---|
key |
String |
null |
AWS Access Control Key |
secret |
String |
null |
AWS Access Control Secret |
nodeProperty |
String |
text |
The property on the provided node that contains the unstructured text to be analyzed |
In addition, apoc.nlp.aws.sentiment.graph
supports the following config parameters:
name | type | default | description |
---|---|---|---|
write |
Boolean |
false |
persist the graph of sentiment |
CALL apoc.nlp.aws.sentiment.stream(source:Node or List<Node>, {
key: String,
secret: String,
nodeProperty: String
})
YIELD value
CALL apoc.nlp.aws.sentiment.graph(source:Node or List<Node>, {
key: String,
secret: String,
nodeProperty: String,
writeRelationshipType: String,
write: Boolean
})
YIELD graph
Install Dependencies
The NLP procedures have dependencies on Kotlin and client libraries that are not included in the APOC Library.
These dependencies are included in apoc-nlp-dependencies-4.1.0.11.jar, which can be downloaded from the releases page.
Once that file is downloaded, it should be placed in the plugins
directory and the Neo4j Server restarted.
Setting up API Key and Secret
We can generate an Access Key and Secret by following the instructions at docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html. Once we’ve done that, we can populate and execute the following commands to create parameters that contains these details.
apiKey
and apiSecret
parameters:param apiKey => ("<api-key-here>");
:param apiSecret => ("<api-secret-here>");
Alternatively we can add these credentials to apoc.conf
and retrieve them using the static value storage functions.
See Static Value Storage.
apoc.static.aws.apiKey=<api-key-here>
apoc.static.aws.apiSecret=<api-secret-here>
apoc.conf
RETURN apoc.static.getAll("aws") AS aws;
aws |
---|
{apiKey: "<api-key-here>", apiSecret: "<api-secret-here>"} |
Batching Requests
Batching requests to the AWS API and the processing of results can be done using Periodic Iterate. This approach is useful if we want to make parallel requests to the AWS API and reduce the amount of transaction state kept in memory while running procedures that write to the database.
The AWS Comprehend API processes a maximum of 25 documents in one request, so for optimal performance we should pass in lists that are a multiple of this size. Keep in mind that if we pass in big lists this will result in more transaction state when writing to the database and may cause out of memory exceptions. |
CALL apoc.periodic.iterate("
MATCH (n)
WITH collect(n) as total
CALL apoc.coll.partition(total, 25)
YIELD value as nodes
RETURN nodes", "
CALL apoc.nlp.aws.entities.graph(nodes, {
key: $apiKey,
secret: $apiSecret,
nodeProperty: 'body',
writeRelationshipType: 'AWS_ENTITY',
write:true
})
YIELD graph
RETURN distinct 'done'", {
batchSize: 1,
params: { apiKey: $apiKey, apiSecret: $apiSecret }
}
);
Examples
The examples in this section are based on the following sample graph:
CREATE (:Article {
uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/",
body: "These days I’m rarely more than a few feet away from my Nintendo Switch and I play board games, card games and role playing games with friends at least once or twice a week. I’ve even organised lunch-time Mario Kart 8 tournaments between the Neo4j European offices!"
});
CREATE (:Article {
uri: "https://en.wikipedia.org/wiki/Nintendo_Switch",
body: "The Nintendo Switch is a video game console developed by Nintendo, released worldwide in most regions on March 3, 2017. It is a hybrid console that can be used as a home console and portable device. The Nintendo Switch was unveiled on October 20, 2016. Nintendo offers a Joy-Con Wheel, a small steering wheel-like unit that a Joy-Con can slot into, allowing it to be used for racing games such as Mario Kart 8."
});
Entity Extraction
Let’s start by extracting the entities from the Article node.
The text that we want to analyze is stored in the body
property of the node, so we’ll need to specify that via the nodeProperty
configuration parameter.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.aws.entities.stream(a, {
key: $apiKey,
secret: $apiSecret,
nodeProperty: "body"
})
YIELD value
UNWIND value.entities AS entity
RETURN entity;
entity |
---|
{score: 0.780032217502594, endOffset: 71, text: "Nintendo Switch", type: "COMMERCIAL_ITEM", beginOffset: 56} |
{score: 0.8155304193496704, endOffset: 151, text: "at least", type: "QUANTITY", beginOffset: 143} |
{score: 0.7507548332214355, endOffset: 156, text: "once", type: "QUANTITY", beginOffset: 152} |
{score: 0.8760746717453003, endOffset: 172, text: "twice a week", type: "QUANTITY", beginOffset: 160} |
{score: 0.9944096803665161, endOffset: 217, text: "Mario Kart 8", type: "TITLE", beginOffset: 205} |
{score: 0.9946564435958862, endOffset: 247, text: "Neo4j", type: "ORGANIZATION", beginOffset: 242} |
{score: 0.6274040937423706, endOffset: 256, text: "European", type: "LOCATION", beginOffset: 248} |
We get back 7 different entities.
We could then apply a Cypher statement that creates one node per entity and an ENTITY
relationship from each of those nodes back to the Article
node.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.aws.entities.stream(a, {
key: $apiKey,
secret: $apiSecret,
nodeProperty: "body"
})
YIELD value
UNWIND value.entities AS entity
MERGE (e:Entity {name: entity.text})
SET e.type = entity.type
MERGE (a)-[:ENTITY]->(e)
Alternatively we can use the graph mode to automatically create the entity graph.
As well as having the Entity
label, each entity node will have another label based on the value of the type
property.
By default, a virtual graph is returned.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.aws.entities.graph(a, {
key: $apiKey,
secret: $apiSecret,
nodeProperty: "body",
writeRelationshipType: "ENTITY"
})
YIELD graph AS g
RETURN g;
We can see a Neo4j Browser visualization of the virtual graph in Pokemon entities graph.
We can compute the entities for multiple nodes by passing a list of nodes to the procedure.
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.aws.entities.graph(articles, {
key: $apiKey,
secret: $apiSecret,
nodeProperty: "body",
writeRelationshipType: "ENTITY"
})
YIELD graph AS g
RETURN g
We can see a Neo4j Browser visualization of the virtual graph in Pokemon and Nintendo Switch entities graph.
On this visualization we can also see the score for each entity node.
This score represents the level of confidence that the API has in its detection of the entity.
We can specify a minimum cut off value for the score using the scoreCutoff
property.
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.aws.entities.graph(articles, {
key: $apiKey,
secret: $apiSecret,
nodeProperty: "body",
scoreCutoff: 0.7,
writeRelationshipType: "ENTITY"
})
YIELD graph AS g
RETURN g
We can see a Neo4j Browser visualization of the virtual graph in Pokemon and Nintendo Switch entities graph with confidence >= 0.7.
If we’re happy with this graph and would like to persist it in Neo4j, we can do this by specifying the write: true
configuration.
HAS_ENTITY
relationship from the article to each entityMATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.aws.entities.graph(articles, {
key: $apiKey,
secret: $apiSecret,
nodeProperty: "body",
scoreCutoff: 0.7,
writeRelationshipType: "HAS_ENTITY",
writeRelationshipProperty: "awsEntityScore",
write: true
})
YIELD graph AS g
RETURN g;
We can then write a query to return the entities that have been created.
MATCH (article:Article)
RETURN article.uri AS article,
[(article)-[r:HAS_ENTITY]->(e:Entity) | {text: e.text, score: r.awsEntityScore}] AS entities;
article | entities |
---|---|
"https://neo4j.com/blog/pokegraph-gotta-graph-em-all/" |
[{score: 0.9944096803665161, text: "Mario Kart 8"}, {score: 0.8760746717453003, text: "twice a week"}, {score: 0.9946564435958862, text: "Neo4j"}, {score: 0.7507548332214355, text: "once"}, {score: 0.8155304193496704, text: "at least"}, {score: 0.780032217502594, text: "Nintendo Switch"}] |
"https://en.wikipedia.org/wiki/Nintendo_Switch" |
[{score: 0.9990180134773254, text: "Mario Kart 8"}, {score: 0.9997879862785339, text: "March 3, 2017"}, {score: 0.9958534240722656, text: "Nintendo"}, {score: 0.9998348355293274, text: "October 20, 2016"}, {score: 0.753325343132019, text: "Nintendo Switch"}] |
Key Phrases
Let’s now extract the key phrases from the Article node.
The text that we want to analyze is stored in the body
property of the node, so we’ll need to specify that via the nodeProperty
configuration parameter.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.aws.keyPhrases.stream(a, {
key: $apiKey,
secret: $apiSecret,
nodeProperty: "body"
})
YIELD value
UNWIND value.keyPhrases AS keyPhrase
RETURN keyPhrase;
keyPhrase |
---|
{score: 0.9999966621398926, endOffset: 10, text: "These days", beginOffset: 0} |
{score: 0.9867414236068726, endOffset: 42, text: "more than a few feet", beginOffset: 22} |
{score: 0.9999999403953552, endOffset: 71, text: "my Nintendo Switch", beginOffset: 53} |
{score: 0.9999997019767761, endOffset: 94, text: "board games", beginOffset: 83} |
{score: 0.9999964237213135, endOffset: 106, text: "card games", beginOffset: 96} |
{score: 0.9998161792755127, endOffset: 129, text: "role playing games", beginOffset: 111} |
{score: 1.0, endOffset: 142, text: "friends", beginOffset: 135} |
{score: 0.8642383217811584, endOffset: 172, text: "a week", beginOffset: 166} |
{score: 0.9999430179595947, endOffset: 215, text: "lunch-time Mario Kart", beginOffset: 194} |
{score: 0.9983567595481873, endOffset: 229, text: "8 tournaments", beginOffset: 216} |
{score: 0.999997615814209, endOffset: 264, text: "the Neo4j European offices", beginOffset: 238} |
Alternatively we can use the graph mode to automatically create a key phrase graph.
One node with the KeyPhrase
label will be created for each key phrase extracted.
By default, a virtual graph is returned, but the graph can be persisted by specifying the write: true
configuration.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.aws.keyPhrases.graph(a, {
key: $apiKey,
secret: $apiSecret,
nodeProperty: "body",
writeRelationshipType: "KEY_PHRASE",
write: true
})
YIELD graph AS g
RETURN g;
We can see a Neo4j Browser visualization of the virtual graph in Pokemon key phrases graph.
We can then write a query to return the key phrases that have been created.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
RETURN a.uri AS article,
[(a)-[:KEY_PHRASE]->(k:KeyPhrase) | k.text] AS keyPhrases;
article | keyPhrases |
---|---|
"https://neo4j.com/blog/pokegraph-gotta-graph-em-all/" |
["the Neo4j European offices", "a week", "friends", "8 tournaments", "lunch-time Mario Kart", "card games", "board games", "role playing games", "my Nintendo Switch", "more than a few feet", "These days"] |
Sentiment
Let’s now extract the sentiment for the Article node.
The text that we want to analyze is stored in the body
property of the node, so we’ll need to specify that via the nodeProperty
configuration parameter.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.aws.sentiment.stream(a, {
key: $apiKey,
secret: $apiSecret,
nodeProperty: "body"
})
YIELD value
RETURN value;
value |
---|
{index: 0, sentiment: "POSITIVE", sentimentScore: {neutral: 0.33138760924339294, negative: 0.0026062370743602514, mixed: 3.5950531582784606E-6, positive: 0.6660025119781494}} |
Alternatively we can use the graph mode to automatically store the sentiment and its score.
By default, a virtual graph is returned, but the graph can be persisted by specifying the write: true
configuration.
The sentiment is stored in the sentiment
property and the score for that sentiment in the sentimentScore
property.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.aws.sentiment.graph(a, {
key: $apiKey,
secret: $apiSecret,
nodeProperty: "body",
write: true
})
YIELD graph AS g
UNWIND g.nodes AS node
RETURN node {.uri, .sentiment, .sentimentScore} AS node;
node |
---|
{sentiment: "Positive", sentimentScore: 0.6660025119781494, uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"} |