Microsoft Azure Cognitive Services
The Microsoft Azure Cognitive Services API uses machine learning to find insights and relationships in text. The procedures in this chapter act as a wrapper around calls to this API to extract entities and key phrases and provide sentiment analysis from text stored as node properties.
Each procedure has two modes:
-
Stream - returns a map constructed from the JSON returned from the API
Procedure Overview
The procedures are described below:
Entity Extraction
The entity extraction procedures (apoc.nlp.azure.entities.*
) are wrappers around the Entities end point of the Azure Text Analytics API.
This API method returns a list of known entities and general named entities ("Person", "Location", "Organization" etc) in a given document.
The procedures are described below:
signature |
---|
apoc.nlp.azure.entities.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?) |
apoc.nlp.azure.entities.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?) |
The procedure supports the following config parameters:
name | type | default | description |
---|---|---|---|
key |
String |
null |
Microsoft.CognitiveServicesTextAnalytics API Key |
url |
String |
null |
Microsoft.CognitiveServicesTextAnalytics Endpoint |
nodeProperty |
String |
text |
The property on the provided node that contains the unstructured text to be analyzed |
In addition, apoc.nlp.azure.entities.graph
supports the following config parameters:
name | type | default | description |
---|---|---|---|
scoreCutoff |
Double |
0.0 |
Lower limit for the score of an entity to be present in the graph. Value must be between 0 and 1. Score is an indicator of the level of confidence that Amazon Comprehend has in the accuracy of the detection. |
write |
Boolean |
false |
persist the graph of entities |
writeRelationshipType |
String |
ENTITY |
relationship type for relationships from source node to entity nodes |
writeRelationshipProperty |
String |
score |
relationship property for relationships from source node to entity nodes |
CALL apoc.nlp.azure.entities.stream(source:Node or List<Node>, {
key: String,
url: String,
nodeProperty: String
})
YIELD value
CALL apoc.nlp.azure.entities.graph(source:Node or List<Node>, {
key: String,
url: String,
nodeProperty: String,
scoreCutoff: Double,
writeRelationshipType: String,
writeRelationshipProperty: String,
write: Boolean
})
YIELD graph
Key Phrases
The key phrase procedures (apoc.nlp.azure.keyPhrases.*
) are wrappers around the Key Phrases end point of the Azure Text Analytics API.
A key phrase is a key talking point in the input text.
The procedure is described below:
signature |
---|
apoc.nlp.azure.keyPhrases.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?) |
apoc.nlp.azure.keyPhrases.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?) |
The procedure support the following config parameters:
name | type | default | description |
---|---|---|---|
key |
String |
null |
Microsoft.CognitiveServicesTextAnalytics API Key |
url |
String |
null |
Microsoft.CognitiveServicesTextAnalytics Endpoint |
nodeProperty |
String |
text |
The property on the provided node that contains the unstructured text to be analyzed |
In addition, apoc.nlp.azure.keyPhrases.graph
supports the following config parameters:
name | type | default | description |
---|---|---|---|
write |
Boolean |
false |
persist the graph of key phrases |
writeRelationshipType |
String |
KEY_PHRASE |
relationship type for relationships from source node to key phrase nodes |
CALL apoc.nlp.azure.keyPhrases.stream(source:Node or List<Node>, {
key: String,
url: String,
nodeProperty: String
})
YIELD value
CALL apoc.nlp.azure.keyPhrases.graph(source:Node or List<Node>, {
key: String,
url: String,
nodeProperty: String,
writeRelationshipType: String,
write: Boolean
})
YIELD graph
Sentiment
The sentiment procedures (apoc.nlp.azure.sentiment.*
) are wrappers around the Sentiment end point of the Azure Text Analytics API.
The API returns a numeric score between 0 and 1.
Scores close to 1 indicate positive sentiment, while scores close to 0 indicate negative sentiment.
A score of 0.5 indicates the lack of sentiment (e.g. a factoid statement).
The procedures are described below:
signature |
---|
apoc.nlp.azure.sentiment.graph(source :: ANY?, config = {} :: MAP?) :: (graph :: MAP?) |
apoc.nlp.azure.sentiment.stream(source :: ANY?, config = {} :: MAP?) :: (node :: NODE?, value :: MAP?, error :: MAP?) |
The procedures support the following config parameters:
name | type | default | description |
---|---|---|---|
key |
String |
null |
Microsoft.CognitiveServicesTextAnalytics API Key |
url |
String |
null |
Microsoft.CognitiveServicesTextAnalytics Endpoint |
nodeProperty |
String |
text |
The property on the provided node that contains the unstructured text to be analyzed |
In addition, apoc.nlp.azure.sentiment.graph
supports the following config parameters:
name | type | default | description |
---|---|---|---|
write |
Boolean |
false |
persist the graph of sentiment |
CALL apoc.nlp.azure.sentiment.stream(source:Node or List<Node>, {
key: String,
url: String,
nodeProperty: String
})
YIELD value
CALL apoc.nlp.azure.sentiment.graph(source:Node or List<Node>, {
key: String,
url: String,
nodeProperty: String,
write: Boolean
})
YIELD graph
Install Dependencies
The NLP procedures have dependencies on Kotlin and client libraries that are not included in the APOC Library.
These dependencies are included in apoc-nlp-dependencies-4.1.0.11.jar, which can be downloaded from the releases page.
Once that file is downloaded, it should be placed in the plugins
directory and the Neo4j Server restarted.
Setting up API Key and URL
We can generate an API key and URL by following the instructions in the Quickstart: Use the Text Analytics client library article. Once we’ve done that, we should be able to see a page listing our credentials, similar to the screenshot below:
In this case our API URL is https://neo4j-nlp-text-analytics.cognitiveservices.azure.com/
, and we can use either of the hidden keys.
Let’s populate and execute the following commands to create parameters that contains these details.
apiKey
and apiSecret
parameters:param apiKey => ("<api-key-here>");
:param apiUrl => ("<api-url-here>");
Alternatively we can add these credentials to apoc.conf
and retrieve them using the static value storage functions.
See Static Value Storage
apoc.static.azure.apiKey=<api-key-here>
apoc.static.azure.apiUrl=<api-url-here>
apoc.conf
RETURN apoc.static.getAll("azure") AS azure;
azure |
---|
{apiKey: "<api-key-here>", apiUrl: "<api-url-here>"} |
Examples
The examples in this section are based on the following sample graph:
CREATE (:Article {
uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/",
body: "These days I’m rarely more than a few feet away from my Nintendo Switch and I play board games, card games and role playing games with friends at least once or twice a week. I’ve even organised lunch-time Mario Kart 8 tournaments between the Neo4j European offices!"
});
CREATE (:Article {
uri: "https://en.wikipedia.org/wiki/Nintendo_Switch",
body: "The Nintendo Switch is a video game console developed by Nintendo, released worldwide in most regions on March 3, 2017. It is a hybrid console that can be used as a home console and portable device. The Nintendo Switch was unveiled on October 20, 2016. Nintendo offers a Joy-Con Wheel, a small steering wheel-like unit that a Joy-Con can slot into, allowing it to be used for racing games such as Mario Kart 8."
});
Entity Extraction
Let’s start by extracting the entities from one of the Article
nodes.
The text that we want to analyze is stored in the body
property of the node, so we’ll need to specify that via the nodeProperty
configuration parameter.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.entities.stream(a, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body"
})
YIELD value
UNWIND value.entities AS entity
RETURN entity;
entity |
---|
{name: "Nintendo Switch", wikipediaId: "Nintendo Switch", type: "Other", matches: [{length: 15, text: "Nintendo Switch", wikipediaScore: 0.8339868065025469, offset: 56}], bingId: "b3d617ef-81fc-4188-9a2b-a5cf1f8534b5", wikipediaLanguage: "en", wikipediaUrl: "https://en.wikipedia.org/wiki/Nintendo_Switch"} |
{name: "Nintendo Switch", type: "Organization", matches: [{length: 15, entityTypeScore: 0.94, text: "Nintendo Switch", offset: 56}]} |
{name: "Oberon Media", wikipediaId: "Oberon Media", type: "Organization", matches: [{length: 6, text: "I play", wikipediaScore: 0.032446316016667254, offset: 76}], bingId: "166f6e0f-33b7-8707-bb8b-5a932c498333", wikipediaLanguage: "en", wikipediaUrl: "https://en.wikipedia.org/wiki/Oberon_Media"} |
{name: "a week", subType: "Duration", type: "DateTime", matches: [{length: 6, entityTypeScore: 0.8, text: "a week", offset: 166}]} |
{name: "Mario Kart 8", wikipediaId: "Mario Kart 8", type: "Other", matches: [{length: 12, text: "Mario Kart 8", wikipediaScore: 0.7802000593632747, offset: 205}], bingId: "ce6f55ec-d3d7-032a-0bf8-15ad3e8df3f4", wikipediaLanguage: "en", wikipediaUrl: "https://en.wikipedia.org/wiki/Mario_Kart_8"} |
{name: "Mario Kart", type: "Organization", matches: [{length: 10, entityTypeScore: 0.72, text: "Mario Kart", offset: 205}]} |
{name: "8", subType: "Number", type: "Quantity", matches: [{length: 1, entityTypeScore: 0.8, text: "8", offset: 216}]} |
{name: "Neo4j", wikipediaId: "Neo4j", type: "Other", matches: [{length: 5, text: "Neo4j", wikipediaScore: 0.8150388253887939, offset: 242}], bingId: "bc2f436b-8edd-6ba6-b2d3-69901348d653", wikipediaLanguage: "en", wikipediaUrl: "https://en.wikipedia.org/wiki/Neo4j"} |
{name: "Europe", wikipediaId: "Europe", type: "Location", matches: [{length: 8, text: "European", wikipediaScore: 0.00591759926701263, offset: 248}], bingId: "501457aa-5b70-cfba-cfd8-be882b4bac1e", wikipediaLanguage: "en", wikipediaUrl: "https://en.wikipedia.org/wiki/Europe"} |
We get back 9 different entities, although we can see that some of them are referring to the same things, albeit with different type
values.
We could then apply a Cypher statement that creates one node per entity and an ENTITY
relationship from each of those nodes back to the Article
node.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.entities.stream(a, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body"
})
YIELD value
UNWIND value.entities AS entity
WITH a, entity.name AS entity, collect(entity.type) AS types
MERGE (e:Entity {name: entity})
SET e.type = types
MERGE (a)-[:ENTITY]->(e);
Alternatively we can use the graph mode to automatically create the entity graph.
As well as having the Entity
label, each entity node will have another label based on the value of the type
property.
By default, a virtual graph is returned.
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.azure.entities.graph(articles, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body",
writeRelationshipType: "ENTITY"
})
YIELD graph AS g
RETURN g
We can see a Neo4j Browser visualization of the virtual graph in Pokemon and Nintendo Switch entities graph.
On this visualization we can also see the score for each entity node.
This score represents the level of confidence that the API has in its detection of the entity.
We can specify a minimum cut off value for the score using the scoreCutoff
property.
MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.azure.entities.graph(articles, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body",
scoreCutoff: 0.7,
writeRelationshipType: "ENTITY"
})
YIELD graph AS g
RETURN g
We can see a Neo4j Browser visualization of the virtual graph in Pokemon and Nintendo Switch entities graph with confidence >= 0.7.
If we’re happy with this graph and would like to persist it in Neo4j, we can do this by specifying the write: true
configuration.
HAS_ENTITY
relationship from the article to each entityMATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.azure.entities.graph(articles, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body",
scoreCutoff: 0.7,
writeRelationshipType: "HAS_ENTITY",
writeRelationshipProperty: "azureEntityScore",
write: true
})
YIELD graph AS g
RETURN g;
We can then write a query to return the entities that have been created.
MATCH (article:Article)
RETURN article.uri AS article,
[(article)-[r:HAS_ENTITY]->(e:Entity) | {text: e.text, score: r.azureEntityScore}] AS entities;
article | entities |
---|---|
"https://neo4j.com/blog/pokegraph-gotta-graph-em-all/" |
[{score: 0.72, text: "Mario Kart"}, {score: 0.7802000593632747, text: "Mario Kart 8"}, {score: 0.8, text: "8"}, {score: 0.8, text: "a week"}, {score: 0.94, text: "Nintendo Switch"}, {score: 0.8150388253887939, text: "Neo4j"}] |
"https://en.wikipedia.org/wiki/Nintendo_Switch" |
[{score: 0.9023679924293266, text: "Joy-Con"}, {score: 0.98, text: "Nintendo"}, {score: 0.8, text: "March 3, 2017"}, {score: 0.9355623498560008, text: "Nintendo Switch"}, {score: 0.92, text: "Mario Kart"}, {score: 0.8, text: "8"}, {score: 0.8863202650046607, text: "Mario Kart 8"}, {score: 0.8, text: "October 20, 2016"}] |
Key Phrases
Let’s now extract the key phrases from the Article node.
The text that we want to analyze is stored in the body
property of the node, so we’ll need to specify that via the nodeProperty
configuration parameter.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.keyPhrases.stream(a, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body"
})
YIELD value
UNWIND value.keyPhrases AS keyPhrase
RETURN keyPhrase;
keyPhrase |
---|
"board games" |
"card games" |
"tournaments" |
"role" |
"organised lunch-time Mario Kart" |
"Neo4j European offices" |
"Nintendo Switch" |
"friends" |
"feet" |
"days" |
Alternatively we can use the graph mode to automatically create a key phrase graph.
One node with the KeyPhrase
label will be created for each key phrase extracted.
By default, a virtual graph is returned, but the graph can be persisted by specifying the write: true
configuration.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.keyPhrases.graph(a, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body",
writeRelationshipType: "KEY_PHRASE",
write: true
})
YIELD graph AS g
RETURN g;
We can see a Neo4j Browser visualization of the virtual graph in Pokemon key phrases graph.
We can then write a query to return the key phrases that have been created.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
RETURN a.uri AS article,
[(a)-[:KEY_PHRASE]->(k:KeyPhrase) | k.text] AS keyPhrases;
article | keyPhrases |
---|---|
"https://neo4j.com/blog/pokegraph-gotta-graph-em-all/" |
["card games", "board games", "friends", "feet", "Nintendo Switch", "days", "organised lunch-time Mario Kart", "tournaments", "Neo4j European offices", "role"] |
Sentiment
Let’s now extract the sentiment for the Article node.
The text that we want to analyze is stored in the body
property of the node, so we’ll need to specify that via the nodeProperty
configuration parameter.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.sentiment.stream(a, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body"
})
YIELD value
RETURN value;
value |
---|
{score: 0.5, id: "0"} |
Alternatively we can use the graph mode to automatically store the sentiment and its score.
By default, a virtual graph is returned, but the graph can be persisted by specifying the write: true
configuration.
The sentiment score is stored in the sentimentScore
property.
MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.sentiment.graph(a, {
key: $apiKey,
url: $apiUrl,
nodeProperty: "body",
write: true
})
YIELD graph AS g
UNWIND g.nodes AS node
RETURN node {.uri, .sentimentScore} AS node;
node |
---|
{uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/", sentimentScore: 0.5} |