Google Cloud Platform (GCP)
Google Cloud Platform’s Natural Language API lets users derive insights from unstructured text using Google machine learning. The procedures in this chapter act as a wrapper around calls to this API to extract entities, categories, or sentiment from text stored as node properties.
Each procedure has two modes:
- Stream - returns a map constructed from the JSON returned from the API
- Graph - creates a graph or virtual graph based on the values returned by the API
The procedures described in this chapter make API calls and subsequent updates to the database on the calling thread. To make parallel requests to the API, and to avoid out-of-memory errors caused by keeping too much transaction state in memory while running procedures that write to the database, see Batching Requests.
Procedure Overview
The procedures are described below:
Entity Extraction
The entity extraction procedures (apoc.nlp.gcp.entities.*) are wrappers around the documents.analyzeEntities method of the Google Natural Language API.
This API method finds named entities (currently proper names and common nouns) in the text along with entity types, salience, mentions for each entity, and other properties.
The procedures are described below:
signature |
---|
apoc.nlp.gcp.entities.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?) |
apoc.nlp.gcp.entities.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?) |
The procedures support the following config parameters:
name | type | default | description |
---|---|---|---|
key | String | null | API Key for Google Natural Language API |
nodeProperty | String | text | The property on the provided node that contains the unstructured text to be analyzed |
In addition, apoc.nlp.gcp.entities.graph supports the following config parameters:
name | type | default | description |
---|---|---|---|
scoreCutoff | Double | 0.0 | Lower limit for the salience score of an entity to be present in the graph. Value must be between 0 and 1. Salience is an indicator of the importance or centrality of that entity to the entire document text. Scores closer to 0 are less salient, while scores closer to 1.0 are highly salient. |
write | Boolean | false | Persist the graph of entities |
writeRelationshipType | String | ENTITY | Relationship type for relationships from source node to entity nodes |
writeRelationshipProperty | String | score | Relationship property for relationships from source node to entity nodes |
CALL apoc.nlp.gcp.entities.stream(source:Node or List<Node>, {
key: String,
nodeProperty: String
})
YIELD value
CALL apoc.nlp.gcp.entities.graph(source:Node or List<Node>, {
key: String,
nodeProperty: String,
scoreCutoff: Double,
writeRelationshipType: String,
writeRelationshipProperty: String,
write: Boolean
})
YIELD graph
Classification
The classification procedures (apoc.nlp.gcp.classify.*) are wrappers around the documents.classifyText method of the Google Natural Language API.
This API method classifies a document into categories.
The procedures are described below:
signature |
---|
apoc.nlp.gcp.classify.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?) |
apoc.nlp.gcp.classify.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?) |
The procedures support the following config parameters:
name | type | default | description |
---|---|---|---|
key | String | null | API Key for Google Natural Language API |
nodeProperty | String | text | The property on the provided node that contains the unstructured text to be analyzed |
In addition, apoc.nlp.gcp.classify.graph supports the following config parameters:
name | type | default | description |
---|---|---|---|
scoreCutoff | Double | 0.0 | Lower limit for the confidence score of a category to be present in the graph. Value must be between 0 and 1. Confidence is a number representing how certain the classifier is that this category represents the given text. |
write | Boolean | false | Persist the graph of categories |
writeRelationshipType | String | CATEGORY | Relationship type for relationships from source node to category nodes |
writeRelationshipProperty | String | score | Relationship property for relationships from source node to category nodes |
CALL apoc.nlp.gcp.classify.stream(source:Node or List<Node>, {
key: String,
nodeProperty: String
})
YIELD value
CALL apoc.nlp.gcp.classify.graph(source:Node or List<Node>, {
key: String,
nodeProperty: String,
scoreCutoff: Double,
writeRelationshipType: String,
writeRelationshipProperty: String,
write: Boolean
})
YIELD graph
Install Dependencies
The NLP procedures have dependencies on Kotlin and client libraries that are not included in the APOC Library.
These dependencies are included in apoc-nlp-dependencies-4.1.0.11.jar, which can be downloaded from the releases page.
Once that file is downloaded, it should be placed in the plugins directory and the Neo4j Server restarted.
Setting up API Key
We can generate an API Key that has access to the Cloud Natural Language API by going to console.cloud.google.com/apis/credentials. Once we’ve created a key, we can populate and execute the following command to create a parameter that contains these details.
The following defines the apiKey parameter:
:param apiKey => ("<api-key-here>")
Alternatively, we can add these credentials to apoc.conf and load them using the static value storage functions. See Static Value Storage.
apoc.conf:
apoc.static.gcp.apiKey=<api-key-here>
The following retrieves the GCP credentials from apoc.conf:
RETURN apoc.static.getAll("gcp") AS gcp;
gcp |
---|
{apiKey: "<api-key-here>"} |
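The stored key can then be passed to any of the NLP procedures in place of the apiKey parameter. The following is a minimal sketch, assuming the apoc.static.gcp.apiKey entry shown above is present in apoc.conf and that Article nodes with a body property exist (as in the sample graph used in the Examples section). The column aliases article and entityCount are illustrative only.

```cypher
// Read every static value stored under the "gcp" prefix in apoc.conf
WITH apoc.static.getAll("gcp") AS gcp
MATCH (a:Article)
// Pass the stored key instead of the $apiKey parameter
CALL apoc.nlp.gcp.entities.stream(a, {
  key: gcp.apiKey,
  nodeProperty: "body"
})
YIELD node, value
RETURN node.uri AS article, size(value.entities) AS entityCount;
```

Keeping the key in apoc.conf means it does not have to be re-entered in every client session, at the cost of requiring access to the server configuration.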
Batching Requests
Batching requests to the GCP API and the processing of results can be done using Periodic Iterate. This approach is useful if we want to make parallel requests to the GCP API and reduce the amount of transaction state kept in memory while running procedures that write to the database.
CALL apoc.periodic.iterate("
MATCH (n)
WITH collect(n) as total
CALL apoc.coll.partition(total, 25)
YIELD value as nodes
RETURN nodes", "
CALL apoc.nlp.gcp.entities.graph(nodes, {
key: $apiKey,
nodeProperty: 'body',
writeRelationshipType: 'GCP_ENTITY',
write:true
})
YIELD graph
RETURN distinct 'done'", {
batchSize: 1,
params: { apiKey: $apiKey }
}
);
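To see how the first statement groups the nodes, apoc.coll.partition can be run on its own. This sketch uses a list of integers as a hypothetical stand-in for the collected nodes and a partition size of 3 instead of 25. Each returned row is a list holding at most three values, in the same way that each row in the query above is a list of at most 25 nodes.

```cypher
// range(1, 7) stands in for collect(n); 3 stands in for the partition size of 25
WITH range(1, 7) AS total
CALL apoc.coll.partition(total, 3)
YIELD value AS batch
RETURN batch, size(batch) AS batchSize;
```

Because batchSize is set to 1 in the apoc.periodic.iterate configuration, each transaction should handle one such list, so the transaction state held in memory covers at most 25 nodes at a time.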
Examples
The examples in this section are based on the following sample graph:
CREATE (:Article {
uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/",
body: "These days I’m rarely more than a few feet away from my Nintendo Switch and I play board games, card games and role playing games with friends at least once or twice a week. I’ve even organised lunch-time Mario Kart 8 tournaments between the Neo4j European offices!"
});
CREATE (:Article {
uri: "https://en.wikipedia.org/wiki/Nintendo_Switch",
body: "The Nintendo Switch is a video game console developed by Nintendo, released worldwide in most regions on March 3, 2017. It is a hybrid console that can be used as a home console and portable device. The Nintendo Switch was unveiled on October 20, 2016. Nintendo offers a Joy-Con Wheel, a small steering wheel-like unit that a Joy-Con can slot into, allowing it to be used for racing games such as Mario Kart 8."
});
Entity Extraction
Let’s start by extracting the entities from the Article node.
The text that we want to analyze is stored in the body property of the node, so we’ll need to specify that via the nodeProperty configuration parameter.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.gcp.entities.stream(a, {
key: $apiKey,
nodeProperty: "body"
})
YIELD value
UNWIND value.entities AS entity
RETURN entity;
entity |
---|
{name: "card games", salience: 0.17967656, metadata: {}, type: "CONSUMER_GOOD", mentions: [{type: "COMMON", text: {content: "card games", beginOffset: -1}}]} |
{name: "role playing games", salience: 0.16441391, metadata: {}, type: "OTHER", mentions: [{type: "COMMON", text: {content: "role playing games", beginOffset: -1}}]} |
{name: "Switch", salience: 0.143287, metadata: {}, type: "OTHER", mentions: [{type: "COMMON", text: {content: "Switch", beginOffset: -1}}]} |
{name: "friends", salience: 0.13336793, metadata: {}, type: "PERSON", mentions: [{type: "COMMON", text: {content: "friends", beginOffset: -1}}]} |
{name: "Nintendo", salience: 0.12601112, metadata: {mid: "/g/1ymzszlpz"}, type: "ORGANIZATION", mentions: [{type: "PROPER", text: {content: "Nintendo", beginOffset: -1}}]} |
{name: "board games", salience: 0.08861496, metadata: {}, type: "CONSUMER_GOOD", mentions: [{type: "COMMON", text: {content: "board games", beginOffset: -1}}]} |
{name: "tournaments", salience: 0.0603245, metadata: {}, type: "EVENT", mentions: [{type: "COMMON", text: {content: "tournaments", beginOffset: -1}}]} |
{name: "offices", salience: 0.034420907, metadata: {}, type: "LOCATION", mentions: [{type: "COMMON", text: {content: "offices", beginOffset: -1}}]} |
{name: "Mario Kart 8", salience: 0.029095741, metadata: {wikipedia_url: "https://en.wikipedia.org/wiki/Mario_Kart_8", mid: "/m/0119mf7q"}, type: "PERSON", mentions: [{type: "PROPER", text: {content: "Mario Kart 8", beginOffset: -1}}]} |
{name: "European", salience: 0.020393685, metadata: {mid: "/m/02j9z", wikipedia_url: "https://en.wikipedia.org/wiki/Europe"}, type: "LOCATION", mentions: [{type: "PROPER", text: {content: "European", beginOffset: -1}}]} |
{name: "Neo4j", salience: 0.020393685, metadata: {mid: "/m/0b76t3s", wikipedia_url: "https://en.wikipedia.org/wiki/Neo4j"}, type: "ORGANIZATION", mentions: [{type: "PROPER", text: {content: "Neo4j", beginOffset: -1}}]} |
{name: "8", salience: 0, metadata: {value: "8"}, type: "NUMBER", mentions: [{type: "TYPE_UNKNOWN", text: {content: "8", beginOffset: -1}}]} |
We get back 12 different entities.
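Because each entity is an ordinary map, the stream output can be filtered with plain Cypher before anything is written. The sketch below does not call the API: it uses a hand-trimmed list of three of the entities from the output above as a stand-in for value.entities, and the 0.1 threshold is an arbitrary value chosen for illustration.

```cypher
// Trimmed copy of three entities from the output above (stand-in for value.entities)
WITH [
  {name: "card games", salience: 0.17967656, type: "CONSUMER_GOOD"},
  {name: "Nintendo",   salience: 0.12601112, type: "ORGANIZATION"},
  {name: "8",          salience: 0.0,        type: "NUMBER"}
] AS entities
UNWIND entities AS entity
// Keep only entities at or above the illustrative salience threshold
WITH entity
WHERE entity.salience >= 0.1
RETURN entity.name AS name, entity.type AS type, entity.salience AS salience
ORDER BY salience DESC;
```

With this input the query keeps "card games" and "Nintendo" and drops "8". The same WHERE clause can be placed after UNWIND value.entities AS entity in a real call.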
We could then apply a Cypher statement that creates one node per entity and an ENTITY relationship from the Article node to each of those nodes.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.gcp.entities.stream(a, {
key: $apiKey,
nodeProperty: "body"
})
YIELD value
UNWIND value.entities AS entity
MERGE (e:Entity {name: entity.name})
SET e.type = entity.type
MERGE (a)-[:ENTITY]->(e)
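The statement above discards the salience returned by the API. A possible variant, sketched here, stores it on the relationship. The property name score is an assumption chosen to match the default of the writeRelationshipProperty config parameter; any other name would work equally well.

```cypher
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.gcp.entities.stream(a, {
  key: $apiKey,
  nodeProperty: "body"
})
YIELD value
UNWIND value.entities AS entity
MERGE (e:Entity {name: entity.name})
SET e.type = entity.type
// Bind the relationship so the salience can be stored on it
MERGE (a)-[r:ENTITY]->(e)
SET r.score = entity.salience;
```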
Alternatively we can use the graph mode to automatically create the entity graph.
As well as having the Entity label, each entity node will have another label based on the value of the type property.
By default a virtual graph is returned.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.gcp.entities.graph(a, {
key: $apiKey,
nodeProperty: "body",
writeRelationshipType: "ENTITY"
})
YIELD graph AS g
RETURN g;
We can see a Neo4j Browser visualization of the virtual graph in Pokemon entities graph.
We can compute the entities for multiple nodes by passing a list of nodes to the procedure.
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.gcp.entities.graph(articles, {
key: $apiKey,
nodeProperty: "body",
writeRelationshipType: "ENTITY"
})
YIELD graph AS g
RETURN g;
We can see a Neo4j Browser visualization of the virtual graph in Pokemon and Nintendo Switch entities graph.
On this visualization we can also see the score for each entity node.
This score represents the importance of that entity in the entire document.
We can specify a minimum cutoff value for the score using the scoreCutoff config parameter.
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.gcp.entities.graph(articles, {
key: $apiKey,
nodeProperty: "body",
writeRelationshipType: "ENTITY",
scoreCutoff: 0.01
})
YIELD graph AS g
RETURN g;
We can see a Neo4j Browser visualization of the virtual graph in Pokemon and Nintendo Switch entities graph with importance >= 0.01.
If we’re happy with this graph and would like to persist it in Neo4j, we can do this by specifying the write: true configuration.
The following creates a HAS_ENTITY relationship from the article to each entity:
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.gcp.entities.graph(articles, {
key: $apiKey,
nodeProperty: "body",
scoreCutoff: 0.01,
writeRelationshipType: "HAS_ENTITY",
writeRelationshipProperty: "gcpEntityScore",
write: true
})
YIELD graph AS g
RETURN g;
We can then write a query to return the entities that have been created.
MATCH (article:Article)
RETURN article.uri AS article,
[(article)-[r:HAS_ENTITY]->(e) | {entity: e.text, score: r.gcpEntityScore}] AS entities;
article | entities |
---|---|
"https://neo4j.com/blog/pokegraph-gotta-graph-em-all/" |
[{score: 0.020393685, entity: "Neo4j"}, {score: 0.034420907, entity: "offices"}, {score: 0.0603245, entity: "tournaments"}, {score: 0.020393685, entity: "European"}, {score: 0.029095741, entity: "Mario Kart 8"}, {score: 0.12601112, entity: "Nintendo"}, {score: 0.13336793, entity: "friends"}, {score: 0.08861496, entity: "board games"}, {score: 0.143287, entity: "Switch"}, {score: 0.16441391, entity: "role playing games"}, {score: 0.17967656, entity: "card games"}] |
"https://en.wikipedia.org/wiki/Nintendo_Switch" |
[{score: 0.76108575, entity: "Nintendo Switch"}, {score: 0.07424594, entity: "Nintendo"}, {score: 0.015900765, entity: "home console"}, {score: 0.012772448, entity: "device"}, {score: 0.038113687, entity: "regions"}, {score: 0.07299799, entity: "Joy-Con Wheel"}] |
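Once the graph is persisted, the entity nodes can be used to relate articles to each other. The following sketch looks for entities linked to more than one article. It assumes that the graph mode reuses a single entity node when the same entity appears in several articles, which the output above suggests for "Nintendo" but does not confirm.

```cypher
// Pairs of distinct articles linked to the same entity node
MATCH (a1:Article)-[r1:HAS_ENTITY]->(e)<-[r2:HAS_ENTITY]-(a2:Article)
WHERE a1.uri < a2.uri   // report each pair of articles only once
RETURN e.text AS entity,
       a1.uri AS firstArticle,  r1.gcpEntityScore AS firstScore,
       a2.uri AS secondArticle, r2.gcpEntityScore AS secondScore
ORDER BY entity;
```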
Classification
Now let’s extract categories from the Article node.
The text that we want to analyze is stored in the body property of the node, so we’ll need to specify that via the nodeProperty configuration parameter.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.gcp.classify.stream(a, {
key: $apiKey,
nodeProperty: "body"
})
YIELD value
UNWIND value.categories AS category
RETURN category;
category |
---|
{name: "/Games", confidence: 0.91} |
We get back only one category.
We could then apply a Cypher statement that creates one node per category and a CATEGORY relationship from the Article node to each of those nodes.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.gcp.classify.stream(a, {
key: $apiKey,
nodeProperty: "body"
})
YIELD value
UNWIND value.categories AS category
MERGE (c:Category {name: category.name})
MERGE (a)-[:CATEGORY]->(c)
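As a variant of the statement above, the confidence can be used to filter categories and then kept on the relationship. This is a sketch only: the 0.9 threshold is an arbitrary illustrative value, and the property name score is an assumption chosen to match the default of the writeRelationshipProperty config parameter.

```cypher
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.gcp.classify.stream(a, {
  key: $apiKey,
  nodeProperty: "body"
})
YIELD value
UNWIND value.categories AS category
// Skip categories the classifier is less sure about (illustrative threshold)
WITH a, category
WHERE category.confidence >= 0.9
MERGE (c:Category {name: category.name})
MERGE (a)-[r:CATEGORY]->(c)
SET r.score = category.confidence;
```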
Alternatively we can use the graph mode to automatically create the category graph.
As well as having the Category label, each category node will have another label based on the value of the type property.
By default, a virtual graph is returned.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.gcp.classify.graph(a, {
key: $apiKey,
nodeProperty: "body",
writeRelationshipType: "CATEGORY"
})
YIELD graph AS g
RETURN g;
We can see a Neo4j Browser visualization of the virtual graph in Pokemon categories graph.
The following creates a HAS_CATEGORY relationship from the article to each category:
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.gcp.classify.graph(articles, {
key: $apiKey,
nodeProperty: "body",
writeRelationshipType: "HAS_CATEGORY",
writeRelationshipProperty: "gcpCategoryScore",
write: true
})
YIELD graph AS g
RETURN g;
We can then write a query to return the categories that have been created.
MATCH (article:Article)
RETURN article.uri AS article,
[(article)-[r:HAS_CATEGORY]->(c) | {category: c.text, score: r.gcpCategoryScore}] AS categories;
article | categories |
---|---|
"https://neo4j.com/blog/pokegraph-gotta-graph-em-all/" |
[{category: "/Games", score: 0.91}] |
"https://en.wikipedia.org/wiki/Nintendo_Switch" |
[{category: "/Computers & Electronics/Consumer Electronics/Game Systems & Consoles", score: 0.99}, {category: "/Games/Computer & Video Games", score: 0.99}] |
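Category names are slash-separated paths, so they can be broken into levels with plain Cypher. The sketch below does not call the API: it takes one of the category names from the output above as a literal, and the aliases topLevel, depth and levels are illustrative only.

```cypher
WITH "/Computers & Electronics/Consumer Electronics/Game Systems & Consoles" AS name
// split() yields an empty first element because the name starts with "/", so drop it
WITH name, [part IN split(name, "/") WHERE part <> ""] AS levels
RETURN levels[0] AS topLevel, size(levels) AS depth, levels;
```

For this name, topLevel is "Computers & Electronics" and depth is 3. The same expression could be applied to c.text on the persisted category nodes to group articles by their top-level category.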