Amazon Web Services (AWS)

The Amazon Web Services (AWS) Comprehend Natural Language API uses machine learning to find insights and relationships in text. The procedures in this chapter act as a wrapper around calls to this API to extract entities and key phrases from text stored as node properties.

Each procedure has two modes:

Stream - returns a map constructed from the JSON returned from the API
Graph - creates a graph or virtual graph based on the values returned by the API

The procedures described in this chapter make API calls and subsequent updates to the database on the calling thread. If we want to make parallel requests to the API and avoid out of memory errors from keeping too much transaction state in memory while running procedures that write to the database, see Batching Requests.

Procedure Overview

The procedures are described below:

Entity Extraction

The entity extraction procedures (apoc.nlp.aws.entities.*) are wrappers around the Detect Entities operations of the AWS Comprehend Natural Language API. This API method finds entities in the text, which are defined as a textual reference to the unique name of a real-world object such as people, places, and commercial items, and to precise references to measures such as dates and quantities.

The procedures are described below:

signature
apoc.nlp.aws.entities.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?)
apoc.nlp.aws.entities.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?)

signature

apoc.nlp.aws.entities.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?)

apoc.nlp.aws.entities.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?)

The procedures support the following config parameters:

Table 1. Config parameters
name	type	default	description
key	String	null	AWS Access Control Key
secret	String	null	AWS Access Control Secret
nodeProperty	String	text	The property on the provided node that contains the unstructured text to be analyzed

In addition, apoc.nlp.aws.entities.graph supports the following config parameters:

Table 2. Config parameters
name	type	default	description
scoreCutoff	Double	0.0	Lower limit for the score of an entity to be present in the graph. Value must be between 0 and 1. Score is an indicator of the level of confidence that Amazon Comprehend has in the accuracy of the detection.
write	Boolean	false	persist the graph of entities
writeRelationshipType	String	ENTITY	relationship type for relationships from source node to entity nodes
writeRelationshipProperty	String	score	relationship property for relationships from source node to entity nodes

Streaming mode

CALL apoc.nlp.aws.entities.stream(source:Node or List<Node>, {
  key: String,
  secret: String,
  nodeProperty: String
})
YIELD value

Graph mode

CALL apoc.nlp.aws.entities.graph(source:Node or List<Node>, {
  key: String,
  secret: String,
  nodeProperty: String,
  scoreCutoff: Double,
  writeRelationshipType: String,
  writeRelationshipProperty: String,
  write: Boolean
})
YIELD graph

Key Phrases

The key phrase procedures (apoc.nlp.aws.keyPhrases.*) are wrappers around the Detect Key Phrases operations of the AWS Comprehend Natural Language API. A key phrase is a string containing a noun phrase that describes a particular thing. It generally consists of a noun and the modifiers that distinguish it.

The procedures are described below:

signature
apoc.nlp.aws.keyPhrases.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?)
apoc.nlp.aws.keyPhrases.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?)

signature

apoc.nlp.aws.keyPhrases.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?)

apoc.nlp.aws.keyPhrases.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?)

The procedures support the following config parameters:

Table 3. Config parameters
name	type	default	description
key	String	null	AWS Access Control Key
secret	String	null	AWS Access Control Secret
nodeProperty	String	text	The property on the provided node that contains the unstructured text to be analyzed

In addition, apoc.nlp.aws.keyPhrases.graph supports the following config parameters:

Table 4. Config parameters
name	type	default	description
scoreCutoff	Double	0.0	Lower limit for the score of an entity to be present in the graph. Value must be between 0 and 1. Score is an indicator of the level of confidence that Amazon Comprehend has in the accuracy of the detection.
write	Boolean	false	persist the graph of key phrases
writeRelationshipType	String	KEY_PHRASE	relationship type for relationships from source node to key phrase nodes
writeRelationshipProperty	String	score	relationship property for relationships from source node to key phrase nodes

Streaming mode

CALL apoc.nlp.aws.keyPhrases.stream(source:Node or List<Node>, {
  key: String,
  secret: String,
  nodeProperty: String
})
YIELD value

Graph mode

CALL apoc.nlp.aws.keyPhrases.graph(source:Node or List<Node>, {
  key: String,
  secret: String,
  nodeProperty: String,
  scoreCutoff: Double,
  writeRelationshipType: String,
  writeRelationshipProperty: String,
  write: Boolean
})
YIELD graph

Sentiment

The sentiment procedures (apoc.nlp.aws.sentiment.*) are wrappers around the Determine Sentiment operations of the AWS Comprehend Natural Language API. You can determine if the sentiment is positive, negative, neutral, or mixed.

The procedures are described below:

signature
apoc.nlp.aws.sentiment.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?)
apoc.nlp.aws.sentiment.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?)

signature

apoc.nlp.aws.sentiment.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?)

apoc.nlp.aws.sentiment.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?)

The procedures support the following config parameters:

Table 5. Config parameters
name	type	default	description
key	String	null	AWS Access Control Key
secret	String	null	AWS Access Control Secret
nodeProperty	String	text	The property on the provided node that contains the unstructured text to be analyzed

In addition, apoc.nlp.aws.sentiment.graph supports the following config parameters:

Table 6. Config parameters
name	type	default	description
write	Boolean	false	persist the graph of sentiment

Streaming mode

CALL apoc.nlp.aws.sentiment.stream(source:Node or List<Node>, {
  key: String,
  secret: String,
  nodeProperty: String
})
YIELD value

Graph mode

CALL apoc.nlp.aws.sentiment.graph(source:Node or List<Node>, {
  key: String,
  secret: String,
  nodeProperty: String,
  writeRelationshipType: String,
  write: Boolean
})
YIELD graph

Install Dependencies

The NLP procedures have dependencies on Kotlin and client libraries that are not included in the APOC Library.

These dependencies are included in apoc-nlp-dependencies-4.1.0.11.jar, which can be downloaded from the releases page. Once that file is downloaded, it should be placed in the plugins directory and the Neo4j Server restarted.

Setting up API Key and Secret

We can generate an Access Key and Secret by following the instructions at docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html. Once we’ve done that, we can populate and execute the following commands to create parameters that contains these details.

The following define the apiKey and apiSecret parameters

:param apiKey => ("<api-key-here>");
:param apiSecret => ("<api-secret-here>");

Alternatively we can add these credentials to apoc.conf and retrieve them using the static value storage functions. See Static Value Storage.

apoc.conf

apoc.static.aws.apiKey=<api-key-here>
apoc.static.aws.apiSecret=<api-secret-here>

The following retrieves AWS credentials from apoc.conf

RETURN apoc.static.getAll("aws") AS aws;

Table 7. Results
aws
{apiKey: "<api-key-here>", apiSecret: "<api-secret-here>"}

Batching Requests

Batching requests to the AWS API and the processing of results can be done using Periodic Iterate. This approach is useful if we want to make parallel requests to the AWS API and reduce the amount of transaction state kept in memory while running procedures that write to the database.

The AWS Comprehend API processes a maximum of 25 documents in one request, so for optimal performance we should pass in lists that are a multiple of this size. Keep in mind that if we pass in big lists this will result in more transaction state when writing to the database and may cause out of memory exceptions.

The following creates an entity graph in batches of 25 nodes

CALL apoc.periodic.iterate("
  MATCH (n)
  WITH collect(n) as total
  CALL apoc.coll.partition(total, 25)
  YIELD value as nodes
  RETURN nodes", "
  CALL apoc.nlp.aws.entities.graph(nodes, {
    key: $apiKey,
    secret: $apiSecret,
    nodeProperty: 'body',
    writeRelationshipType: 'AWS_ENTITY',
    write:true
  })
  YIELD graph
  RETURN distinct 'done'", {
    batchSize: 1,
    params: { apiKey: $apiKey, apiSecret: $apiSecret }
  }
);

Examples

The examples in this section are based on the following sample graph:

CREATE (:Article {
  uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/",
  body: "These days I’m rarely more than a few feet away from my Nintendo Switch and I play board games, card games and role playing games with friends at least once or twice a week. I’ve even organised lunch-time Mario Kart 8 tournaments between the Neo4j European offices!"
});

CREATE (:Article {
  uri: "https://en.wikipedia.org/wiki/Nintendo_Switch",
  body: "The Nintendo Switch is a video game console developed by Nintendo, released worldwide in most regions on March 3, 2017. It is a hybrid console that can be used as a home console and portable device. The Nintendo Switch was unveiled on October 20, 2016. Nintendo offers a Joy-Con Wheel, a small steering wheel-like unit that a Joy-Con can slot into, allowing it to be used for racing games such as Mario Kart 8."
});

Entity Extraction

Let’s start by extracting the entities from the Article node. The text that we want to analyze is stored in the body property of the node, so we’ll need to specify that via the nodeProperty configuration parameter.

The following streams the entities for the Pokemon article

MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.aws.entities.stream(a, {
  key: $apiKey,
  secret: $apiSecret,
  nodeProperty: "body"
})
YIELD value
UNWIND value.entities AS entity
RETURN entity;

Table 8. Results
entity
{score: 0.780032217502594, endOffset: 71, text: "Nintendo Switch", type: "COMMERCIAL_ITEM", beginOffset: 56}
{score: 0.8155304193496704, endOffset: 151, text: "at least", type: "QUANTITY", beginOffset: 143}
{score: 0.7507548332214355, endOffset: 156, text: "once", type: "QUANTITY", beginOffset: 152}
{score: 0.8760746717453003, endOffset: 172, text: "twice a week", type: "QUANTITY", beginOffset: 160}
{score: 0.9944096803665161, endOffset: 217, text: "Mario Kart 8", type: "TITLE", beginOffset: 205}
{score: 0.9946564435958862, endOffset: 247, text: "Neo4j", type: "ORGANIZATION", beginOffset: 242}
{score: 0.6274040937423706, endOffset: 256, text: "European", type: "LOCATION", beginOffset: 248}

We get back 7 different entities. We could then apply a Cypher statement that creates one node per entity and an ENTITY relationship from each of those nodes back to the Article node.

The following streams the entities for the Pokemon article and then creates nodes for each entity

MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.aws.entities.stream(a, {
  key: $apiKey,
  secret: $apiSecret,
  nodeProperty: "body"
})
YIELD value
UNWIND value.entities AS entity
MERGE (e:Entity {name: entity.text})
SET e.type = entity.type
MERGE (a)-[:ENTITY]->(e)

Alternatively we can use the graph mode to automatically create the entity graph. As well as having the Entity label, each entity node will have another label based on the value of the type property. By default, a virtual graph is returned.

The following returns a virtual graph of entities for the Pokemon article

MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.aws.entities.graph(a, {
  key: $apiKey,
  secret: $apiSecret,
  nodeProperty: "body",
  writeRelationshipType: "ENTITY"
})
YIELD graph AS g
RETURN g;

We can see a Neo4j Browser visualization of the virtual graph in Pokemon entities graph.

Figure 1. Pokemon entities graph

We can compute the entities for multiple nodes by passing a list of nodes to the procedure.

The following returns a virtual graph of entities for the Pokemon and Nintendo Switch articles

MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.aws.entities.graph(articles, {
  key: $apiKey,
  secret: $apiSecret,
  nodeProperty: "body",
  writeRelationshipType: "ENTITY"
})
YIELD graph AS g
RETURN g

We can see a Neo4j Browser visualization of the virtual graph in Pokemon and Nintendo Switch entities graph.

Figure 2. Pokemon and Nintendo Switch entities graph

On this visualization we can also see the score for each entity node. This score represents the level of confidence that the API has in its detection of the entity. We can specify a minimum cut off value for the score using the scoreCutoff property.

The following returns a virtual graph of entities with a score >= 0.7 for the Pokemon and Nintendo Switch articles

MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.aws.entities.graph(articles, {
  key: $apiKey,
  secret: $apiSecret,
  nodeProperty: "body",
  scoreCutoff: 0.7,
  writeRelationshipType: "ENTITY"
})
YIELD graph AS g
RETURN g

We can see a Neo4j Browser visualization of the virtual graph in Pokemon and Nintendo Switch entities graph with confidence >= 0.7.

apoc.nlp.aws.entities multiple.graph cutoff

Figure 3. Pokemon and Nintendo Switch entities graph with confidence >= 0.7

If we’re happy with this graph and would like to persist it in Neo4j, we can do this by specifying the write: true configuration.

The following creates a HAS_ENTITY relationship from the article to each entity

MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.aws.entities.graph(articles, {
  key: $apiKey,
  secret: $apiSecret,
  nodeProperty: "body",
  scoreCutoff: 0.7,
  writeRelationshipType: "HAS_ENTITY",
  writeRelationshipProperty: "awsEntityScore",
  write: true
})
YIELD graph AS g
RETURN g;

We can then write a query to return the entities that have been created.

The following returns articles and their entities

MATCH (article:Article)
RETURN article.uri AS article,
       [(article)-[r:HAS_ENTITY]->(e:Entity) | {text: e.text, score: r.awsEntityScore}] AS entities;

Table 9. Results
article	entities
"https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"	[{score: 0.9944096803665161, text: "Mario Kart 8"}, {score: 0.8760746717453003, text: "twice a week"}, {score: 0.9946564435958862, text: "Neo4j"}, {score: 0.7507548332214355, text: "once"}, {score: 0.8155304193496704, text: "at least"}, {score: 0.780032217502594, text: "Nintendo Switch"}]
"https://en.wikipedia.org/wiki/Nintendo_Switch"	[{score: 0.9990180134773254, text: "Mario Kart 8"}, {score: 0.9997879862785339, text: "March 3, 2017"}, {score: 0.9958534240722656, text: "Nintendo"}, {score: 0.9998348355293274, text: "October 20, 2016"}, {score: 0.753325343132019, text: "Nintendo Switch"}]

Key Phrases

Let’s now extract the key phrases from the Article node. The text that we want to analyze is stored in the body property of the node, so we’ll need to specify that via the nodeProperty configuration parameter.

The following streams the key phrases for the Pokemon article

MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.aws.keyPhrases.stream(a, {
  key: $apiKey,
  secret: $apiSecret,
  nodeProperty: "body"
})
YIELD value
UNWIND value.keyPhrases AS keyPhrase
RETURN keyPhrase;

Table 10. Results
keyPhrase
{score: 0.9999966621398926, endOffset: 10, text: "These days", beginOffset: 0}
{score: 0.9867414236068726, endOffset: 42, text: "more than a few feet", beginOffset: 22}
{score: 0.9999999403953552, endOffset: 71, text: "my Nintendo Switch", beginOffset: 53}
{score: 0.9999997019767761, endOffset: 94, text: "board games", beginOffset: 83}
{score: 0.9999964237213135, endOffset: 106, text: "card games", beginOffset: 96}
{score: 0.9998161792755127, endOffset: 129, text: "role playing games", beginOffset: 111}
{score: 1.0, endOffset: 142, text: "friends", beginOffset: 135}
{score: 0.8642383217811584, endOffset: 172, text: "a week", beginOffset: 166}
{score: 0.9999430179595947, endOffset: 215, text: "lunch-time Mario Kart", beginOffset: 194}
{score: 0.9983567595481873, endOffset: 229, text: "8 tournaments", beginOffset: 216}
{score: 0.999997615814209, endOffset: 264, text: "the Neo4j European offices", beginOffset: 238}

Alternatively we can use the graph mode to automatically create a key phrase graph. One node with the KeyPhrase label will be created for each key phrase extracted.

By default, a virtual graph is returned, but the graph can be persisted by specifying the write: true configuration.

The following returns a graph of key phrases for the Pokemon article

MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.aws.keyPhrases.graph(a, {
  key: $apiKey,
  secret: $apiSecret,
  nodeProperty: "body",
  writeRelationshipType: "KEY_PHRASE",
  write: true
})
YIELD graph AS g
RETURN g;

We can see a Neo4j Browser visualization of the virtual graph in Pokemon key phrases graph.

Figure 4. Pokemon key phrases graph

We can then write a query to return the key phrases that have been created.

The following returns articles and their entities

MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
RETURN a.uri AS article,
       [(a)-[:KEY_PHRASE]->(k:KeyPhrase) | k.text] AS keyPhrases;

Table 11. Results
article	keyPhrases
"https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"	["the Neo4j European offices", "a week", "friends", "8 tournaments", "lunch-time Mario Kart", "card games", "board games", "role playing games", "my Nintendo Switch", "more than a few feet", "These days"]

Sentiment

Let’s now extract the sentiment for the Article node. The text that we want to analyze is stored in the body property of the node, so we’ll need to specify that via the nodeProperty configuration parameter.

The following streams the key phrases for the Pokemon article

MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.aws.sentiment.stream(a, {
  key: $apiKey,
  secret: $apiSecret,
  nodeProperty: "body"
})
YIELD value
RETURN value;

Table 12. Results
value
{index: 0, sentiment: "POSITIVE", sentimentScore: {neutral: 0.33138760924339294, negative: 0.0026062370743602514, mixed: 3.5950531582784606E-6, positive: 0.6660025119781494}}

Alternatively we can use the graph mode to automatically store the sentiment and its score.

By default, a virtual graph is returned, but the graph can be persisted by specifying the write: true configuration. The sentiment is stored in the sentiment property and the score for that sentiment in the sentimentScore property.

The following returns a graph with the sentiment for the Pokemon article

MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.aws.sentiment.graph(a, {
  key: $apiKey,
  secret: $apiSecret,
  nodeProperty: "body",
  write: true
})
YIELD graph AS g
UNWIND g.nodes AS node
RETURN node {.uri, .sentiment, .sentimentScore} AS node;

Table 13. Results
node
{sentiment: "Positive", sentimentScore: 0.6660025119781494, uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"}