apoc.load.html

Procedure APOC Full

apoc.load.html('url', {name: jquery, name2: jquery}, config) YIELD value - loads an HTML page and returns the result as a map. Each key in the query map is paired with a jQuery-style selector.

Signature

apoc.load.html(url :: STRING?, query = {} :: MAP?, config = {} :: MAP?) :: (value :: MAP?)

Input parameters

| Name | Type | Default |
|------|------|---------|
| url | STRING? | null |
| query | MAP? | {} |
| config | MAP? | {} |
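As a minimal sketch of how the query parameter is shaped: each key is a name you choose for the result, and each value is a jQuery-style (CSS) selector. The key names `title` and `links` and the selectors used here are illustrative assumptions, not part of the procedure itself.

```cypher
// Hypothetical keys "title" and "links"; each maps to a CSS-style selector.
CALL apoc.load.html(
  "https://en.wikipedia.org/",
  {title: "title", links: "a[href]"}
)
YIELD value
// Each key in the query map comes back as a key in the value map.
RETURN value.title AS title, value.links AS links;
```

The config parameter is omitted here, so its default ({}) applies.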

Config parameters

The procedure supports the following config parameters:

Table 1. Config parameters
| name | type | default | description |
|------|------|---------|-------------|
| charset | String | "UTF-8" | the character set of the page being scraped |
| baseUri | String | "" | Base URI used to resolve relative paths |
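The config parameters above could be passed as the third argument, roughly as follows. This is a sketch only: the `links` key, the `a` selector, and the choice of the Wikipedia URL as `baseUri` are assumptions made for illustration.

```cypher
// Pass charset and baseUri explicitly instead of relying on the defaults.
CALL apoc.load.html(
  "https://en.wikipedia.org/",
  {links: "a"},                                   // hypothetical key and selector
  {charset: "UTF-8", baseUri: "https://en.wikipedia.org/"}
)
YIELD value
// Count the matched elements rather than returning them all.
RETURN size(value.links) AS linkCount;
```

Setting `charset` is mainly useful when the page is not served as UTF-8; for UTF-8 pages the default already applies.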

Output parameters

| Name | Type |
|------|------|
| value | MAP? |

Usage Examples

We can extract the meta tags and h2 headings from the Wikipedia home page by running the following query:

CALL apoc.load.html("https://en.wikipedia.org/",{metadata:"meta", h2:"h2"});
Table 2. Results
Output
{
   "metadata":[
      {
         "tagName":"meta",
         "attributes":{
            "charset":"UTF-8"
         }
      },
      {
         "tagName":"meta",
         "attributes":{
            "name":"ResourceLoaderDynamicStyles"
         }
      },
      {
         "tagName":"meta",
         "attributes":{
            "name":"generator",
            "content":"MediaWiki 1.36.0-wmf.16"
         }
      },
      {
         "tagName":"meta",
         "attributes":{
            "name":"referrer",
            "content":"origin"
         }
      },
      {
         "tagName":"meta",
         "attributes":{
            "name":"referrer",
            "content":"origin-when-crossorigin"
         }
      },
      {
         "tagName":"meta",
         "attributes":{
            "name":"referrer",
            "content":"origin-when-cross-origin"
         }
      },
      {
         "tagName":"meta",
         "attributes":{
            "property":"og:image",
            "content":"https://upload.wikimedia.org/wikipedia/commons/1/1c/Orion_pulse_unit_%28transparent%29.png"
         }
      }
   ],
   "h2":[
      {
         "attributes":{
            "class":"mp-h2",
            "id":"mp-tfa-h2"
         },
         "text":"From today's featured article",
         "tagName":"h2"
      },
      {
         "attributes":{
            "class":"mp-h2",
            "id":"mp-dyk-h2"
         },
         "text":"Did you know ...",
         "tagName":"h2"
      },
      {
         "attributes":{
            "class":"mp-h2",
            "id":"mp-itn-h2"
         },
         "text":"In the news",
         "tagName":"h2"
      },
      {
         "attributes":{
            "class":"mp-h2",
            "id":"mp-otd-h2"
         },
         "text":"On this day",
         "tagName":"h2"
      },
      {
         "attributes":{
            "class":"mp-h2",
            "id":"mp-tfl-h2"
         },
         "text":"From today's featured list",
         "tagName":"h2"
      },
      {
         "attributes":{
            "class":"mp-h2",
            "id":"mp-tfp-h2"
         },
         "text":"Today's featured picture",
         "tagName":"h2"
      },
      {
         "attributes":{
            "class":"mp-h2",
            "id":"mp-other"
         },
         "text":"Other areas of Wikipedia",
         "tagName":"h2"
      },
      {
         "attributes":{
            "class":"mp-h2",
            "id":"mp-sister"
         },
         "text":"Wikipedia's sister projects",
         "tagName":"h2"
      },
      {
         "attributes":{
            "class":"mp-h2",
            "id":"mp-lang"
         },
         "text":"Wikipedia languages",
         "tagName":"h2"
      },
      {
         "tagName":"h2",
         "text":"Navigation menu"
      }
   ]
}
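Because each key in the result holds a list of element maps, the output above can be flattened into rows with UNWIND. The following is a sketch under the assumption that the page still exposes h2 elements shaped like the ones shown; the column aliases `text` and `id` are illustrative.

```cypher
// Turn the list under the "h2" key into one row per heading.
CALL apoc.load.html("https://en.wikipedia.org/", {h2: "h2"})
YIELD value
UNWIND value.h2 AS heading
// heading.attributes may be absent (as for "Navigation menu" above);
// in that case the id column is null for that row.
RETURN heading.text AS text, heading.attributes.id AS id;
```

The live page changes over time, so the rows returned will likely differ from the sample output.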