apoc.load.html
Procedure APOC Full
apoc.load.html('url',{name: jquery, name2: jquery}, config) YIELD value - Loads an HTML page and returns the result as a map
Signature
apoc.load.html(url :: STRING?, query = {} :: MAP?, config = {} :: MAP?) :: (value :: MAP?)
Config parameters
The procedure supports the following config parameters:
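name | type | default | description |
---|---|---|---|
charset | String | "UTF-8" | The character set of the page being scraped |
baseUri | String | "" | Base URI used to resolve relative paths |
failSilently | Enum [FALSE, WITH_LOG, WITH_LIST] | FALSE | Controls what happens if parsing fails on one or more elements: with FALSE a RuntimeException is thrown, with WITH_LOG a log.warn is written for each failing element, and with WITH_LIST the failing elements are collected in a list in the result |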
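As a minimal sketch of how these parameters are passed, the call below sets charset and baseUri explicitly in the third argument. The map key links and the selector "a" are illustrative choices, not part of the procedure itself, and the URL is simply reused from the usage example.

```cypher
// Third argument is the config map; keys match the table above.
// "links" is an arbitrary result key chosen for this sketch.
CALL apoc.load.html(
  "https://en.wikipedia.org/",
  {links: "a"},
  {charset: "UTF-8", baseUri: "https://en.wikipedia.org/"}
) YIELD value
RETURN value.links AS links;
```

Omitting the config map is equivalent to using the defaults listed in the table.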
Usage Examples
We can extract the metadata and h2 heading from the Wikipedia home page, by running the following query:
CALL apoc.load.html("https://en.wikipedia.org/",{metadata:"meta", h2:"h2"});
Output |
---|
|
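Because the procedure yields a single map keyed by the names given in the query argument, the result can be processed further in the same statement. The sketch below assumes that each matched element is returned as a map with a text entry; that key name is an assumption to check against the actual output.

```cypher
// value.h2 holds the list of elements matched by the "h2" selector.
// heading.text assumes each element map exposes a "text" key.
CALL apoc.load.html("https://en.wikipedia.org/", {h2: "h2"}) YIELD value
UNWIND value.h2 AS heading
RETURN heading.text AS text;
```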
Let’s suppose we have a test.html file like this:
<!DOCTYPE html>
<html class="client-nojs" lang="en" dir="ltr">
<h6 i d="error">test</h6>
<h6 id="correct">test</h6>
</html>
The malformed i d attribute on the first h6 element causes a parse error, which we can handle through the failSilently configuration.
With the default setting (FALSE), the call fails:
CALL apoc.load.html("test.html",{h6:"h6"});
Failed to invoke procedure apoc.load.html : Caused by: java.lang.RuntimeException: Error during parsing element: <h6 i d="error">test</h6> |
---|
Or, with failSilently set to WITH_LIST:
CALL apoc.load.html("test.html",{h6:"h6"}, {failSilently: 'WITH_LIST'});
Output |
---|
|
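To separate the successfully parsed elements from the failed ones, the two parts of the result can be returned as distinct columns. This is only a sketch: the key name errorList is an assumption about where the failed elements are placed, so verify it against the output of your APOC version. Accessing a key that is absent from a map returns null in Cypher, so the query still runs if the name differs.

```cypher
// "errorList" is an assumed key name for the list of failed elements.
CALL apoc.load.html("test.html", {h6: "h6"}, {failSilently: 'WITH_LIST'}) YIELD value
RETURN value.h6 AS parsed, value.errorList AS failed;
```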
Or, with failSilently set to WITH_LOG (note that a log.warn("Error during parsing element: <h6 i d="error">test</h6>") entry will be created):
CALL apoc.load.html("test.html",{h6:"h6"}, {failSilently: 'WITH_LOG'});
Output |
---|
|