One Hot Encoding

This section describes the One Hot Encoding function in the Neo4j Graph Data Science library.

The One Hot Encoding function converts categorical data into a numerical format that can be used by machine learning libraries.

This function is in the alpha tier. For more information on algorithm tiers, see Algorithms.

1. One Hot Encoding sample

One hot encoding returns a list with one entry per available value, in the same order as the available values. Each entry is 1 if the corresponding value is among the selected values, and 0 otherwise.

The following will run the function on hardcoded lists:
RETURN gds.alpha.ml.oneHotEncoding(['Chinese', 'Indian', 'Italian'], ['Italian']) AS embedding
Table 1. Results
| embedding |
|-----------|
| [0,0,1]   |
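
To make the semantics concrete, the same result can be reproduced with a plain Cypher list comprehension. This is only an illustrative sketch of what the function computes, not the library's implementation; the variable names `available`, `selected`, and `value` are assumptions introduced here.

```cypher
// For each available value, emit 1 if it is among the selected values, else 0.
WITH ['Chinese', 'Indian', 'Italian'] AS available,
     ['Italian'] AS selected
RETURN [value IN available | CASE WHEN value IN selected THEN 1 ELSE 0 END] AS embedding
```

For these inputs the comprehension yields [0,0,1], the same list as the function call on the hardcoded lists.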

The following will create a sample graph:
CREATE (french:Cuisine {name:'French'}),
       (italian:Cuisine {name:'Italian'}),
       (indian:Cuisine {name:'Indian'}),

       (zhen:Person {name: "Zhen"}),
       (praveena:Person {name: "Praveena"}),
       (michael:Person {name: "Michael"}),
       (arya:Person {name: "Arya"}),

       (praveena)-[:LIKES]->(indian),
       (zhen)-[:LIKES]->(french),
       (michael)-[:LIKES]->(french),
       (michael)-[:LIKES]->(italian)
The following will return a one hot encoding for each user and the types of cuisine that they like:
MATCH (cuisine:Cuisine)
WITH cuisine
  ORDER BY cuisine.name
WITH collect(cuisine) AS cuisines
MATCH (p:Person)
RETURN p.name AS name, gds.alpha.ml.oneHotEncoding(cuisines, [(p)-[:LIKES]->(cuisine) | cuisine]) AS embedding
  ORDER BY name
Table 2. Results
| name     | embedding |
|----------|-----------|
| Arya     | [0,0,0]   |
| Michael  | [1,0,1]   |
| Praveena | [0,1,0]   |
| Zhen     | [1,0,0]   |
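
If the embeddings are needed by later queries or by an external tool, one option is to store them on the nodes. The following is a minimal sketch of that idea. It assumes a hypothetical property name, `cuisineEmbedding`, which is not part of the original example, and it assumes the returned list can be stored directly as a node property.

```cypher
MATCH (cuisine:Cuisine)
WITH cuisine
  ORDER BY cuisine.name
WITH collect(cuisine) AS cuisines
MATCH (p:Person)
// cuisineEmbedding is a hypothetical property name chosen for this sketch.
SET p.cuisineEmbedding = gds.alpha.ml.oneHotEncoding(cuisines, [(p)-[:LIKES]->(cuisine) | cuisine])
RETURN p.name AS name, p.cuisineEmbedding AS embedding
  ORDER BY name
```

Because the cuisines are collected in name order, each position in the stored list refers to the same cuisine for every person (French, Indian, Italian in the sample graph).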

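Table 3. Parameters
| Name            | Type | Default | Optional | Description                                                                |
|-----------------|------|---------|----------|----------------------------------------------------------------------------|
| availableValues | list | null    | yes      | The available values. If null, the function will return an empty list.     |
| selectedValues  | list | null    | yes      | The selected values. If null, the function will return a list of all 0s.   |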
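
The null handling described in the parameter table can be checked with a query along the following lines. This is a sketch that has not been verified against a specific library version; the column names `noAvailableValues` and `noSelectedValues` are assumptions introduced here.

```cypher
// First call: availableValues is null. Second call: selectedValues is null.
RETURN gds.alpha.ml.oneHotEncoding(null, ['Italian']) AS noAvailableValues,
       gds.alpha.ml.oneHotEncoding(['Chinese', 'Indian', 'Italian'], null) AS noSelectedValues
```

According to the parameter descriptions, the first column should be an empty list and the second a list of three 0s.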

Table 4. Results
| Type | Description                              |
|------|------------------------------------------|
| list | One hot encoding of the selected values. |