Scale Properties

This feature is in the alpha tier. For more information on feature tiers, see API Tiers.

1. Introduction

The Scale Properties algorithm is a utility algorithm that is used to pre-process node properties for model training or post-process algorithm results such as PageRank scores. It scales the node properties based on the specified scaler. Multiple properties can be scaled at once and are returned in a list property.

The input properties must be numbers or lists of numbers. The lists must all have the same size. The output property will always be a list. The size of the output list is equal to the sum of length of the input properties. That is, if the input properties are two scalar numeric properties and one list property of length three, the output list will have a total length of five.

There are a number of supported scalers for the Scale Properties algorithm. These can be configured using the `scaler` configuration parameter.

List properties are scaled index-by-index. See the list example for more details.

In the following equations, `p` denotes the vector containing all property values for a single property across all nodes in the graph.

1.1. Min-max scaler

Scales all property values into the range `[0, 1]` where the minimum value(s) get the scaled value `0` and the maximum value(s) get the scaled value `1`, according to this formula:

1.2. Max scaler

Scales all property values into the range `[-1, 1]` where the absolute maximum value(s) get the scaled value `1`, according to this formula:

1.3. Mean scaler

Scales all property values into the range `[-1, 1]` where the average value(s) get the scaled value `0`.

1.4. Log scaler

Transforms all property values using the natural logarithm.

1.5. Standard Score

Scales all property values using the Standard Score (Wikipedia).

1.6. Center

Transforms all properties by subtracting the mean.

1.7. L1 Norm

Scales all property values into the range `[0.0, 1.0]`.

1.8. L2 Norm

Scales all property values using the L2 Norm (Wikipedia).

2. Syntax

This section covers the syntax used to execute the Scale Properties algorithm in each of its execution modes. We are describing the named graph variant of the syntax. To learn more about general syntax variants, see Syntax overview.

Scale Properties syntax per mode
Run Scale Properties in stream mode on a named graph.
``````CALL gds.alpha.scaleProperties.stream(
graphName: String,
configuration: Map
) YIELD
nodeId: Integer,
scaledProperty: List of Float``````
Table 1. Parameters
Name Type Default Optional Description

graphName

String

`n/a`

no

The name of a graph stored in the catalog.

configuration

Map

`{}`

yes

Configuration for algorithm-specifics and/or graph filtering.

Table 2. Configuration
Name Type Default Optional Description

nodeLabels

List of String

`['*']`

yes

Filter the named graph using the given node labels.

relationshipTypes

List of String

`['*']`

yes

Filter the named graph using the given relationship types.

concurrency

Integer

`4`

yes

The number of concurrent threads used for running the algorithm.

jobId

String

`Generated internally`

yes

An ID that can be provided to more easily track the algorithm’s progress.

logProgress

Boolean

`true`

yes

If disabled the progress percentage will not be logged.

nodeProperties

List of String

`n/a`

no

The names of the node properties that are to be scaled. All property names must exist in the projected graph.

scaler

String

`n/a`

no

The name of the scaler applied for the properties. Supported values are `MinMax`, `Max`, `Mean`, `Log`, `L1Norm`, `L2Norm` and `StdScore`.

Table 3. Results
Name Type Description

nodeId

Integer

Node ID.

scaledProperty

List of Float

Scaled values for each input node property.

Run Scale Properties in mutate mode on a named graph.
``````CALL gds.alpha.scaleProperties.mutate(
graphName: String,
configuration: Map
) YIELD
preProcessingMillis: Integer,
computeMillis: Integer,
mutateMillis: Integer,
postProcessingMillis: Integer,
nodePropertiesWritten: Integer,
configuration: Map``````
Table 4. Parameters
Name Type Default Optional Description

graphName

String

`n/a`

no

The name of a graph stored in the catalog.

configuration

Map

`{}`

yes

Configuration for algorithm-specifics and/or graph filtering.

Table 5. Configuration
Name Type Default Optional Description

mutateProperty

String

`n/a`

no

The node property in the GDS graph to which the scaled properties is written.

nodeLabels

List of String

`['*']`

yes

Filter the named graph using the given node labels.

relationshipTypes

List of String

`['*']`

yes

Filter the named graph using the given relationship types.

concurrency

Integer

`4`

yes

The number of concurrent threads used for running the algorithm.

jobId

String

`Generated internally`

yes

An ID that can be provided to more easily track the algorithm’s progress.

nodeProperties

List of String

`n/a`

no

The names of the node properties that are to be scaled. All property names must exist in the projected graph.

scaler

String

`n/a`

no

The name of the scaler applied for the properties. Supported values are `MinMax`, `Max`, `Mean`, `Log`, `L1Norm`, `L2Norm` and `StdScore`.

Table 6. Results
Name Type Description

preProcessingMillis

Integer

Milliseconds for preprocessing the data.

computeMillis

Integer

Milliseconds for running the algorithm.

mutateMillis

Integer

Milliseconds for adding properties to the projected graph.

postProcessingMillis

Integer

Unused.

nodePropertiesWritten

Integer

Number of node properties written.

configuration

Map

Configuration used for running the algorithm.

3. Examples

In this section we will show examples of running the Scale Properties algorithm on a concrete graph. The intention is to illustrate what the results look like and to provide a guide in how to make use of the algorithm in a real setting. We will do this on a small hotel graph of a handful nodes connected in a particular pattern. The example graph looks like this:

The following Cypher statement will create the example graph in the Neo4j database:
``````CREATE
(:Hotel {avgReview: 4.2, buildYear: 1978, storyCapacity: [32, 32, 0], name: 'East'}),
(:Hotel {avgReview: 8.1, buildYear: 1958, storyCapacity: [18, 20, 0], name: 'Plaza'}),
(:Hotel {avgReview: 19.0, buildYear: 1999, storyCapacity: [100, 100, 70], name: 'Central'}),
(:Hotel {avgReview: -4.12, buildYear: 2005, storyCapacity: [250, 250, 250], name: 'West'}),
(:Hotel {avgReview: 0.01, buildYear: 2020, storyCapacity: [1250, 1250, 900], name: 'Polar'}),
(:Hotel {avgReview: 3.3, buildYear: 1981, storyCapacity: [240, 240, 0], name: 'Beach'}),
(:Hotel {avgReview: 6.7, buildYear: 1984, storyCapacity: [80, 0, 0], name: 'Mountain'}),
(:Hotel {avgReview: -1.2, buildYear: 2010, storyCapacity: [55, 20, 0], name: 'Forest'})``````

With the graph in Neo4j we can now project it into the graph catalog to prepare it for algorithm execution. We do this using a native projection targeting the `Hotel` nodes, including their properties. Note that no relationships are necessary to scale the node properties. Thus we use a star projection ('*') for relationships.

 In the examples below we will use named graphs and native projections as the norm. However, Cypher projections can also be used.
The following statement will project a graph using a native projection and store it in the graph catalog under the name 'myGraph'.
``````CALL gds.graph.project(
'myGraph',
'Hotel',
'*',
{ nodeProperties: ['avgReview', 'buildYear', 'storyCapacity'] }
)``````

In the following examples we will demonstrate how to scale the node properties of this graph.

3.1. Stream

In the `stream` execution mode, the algorithm returns the scaled properties for each node. This allows us to inspect the results directly or post-process them in Cypher without any side effects. Note that the output is always a single list property, containing all scaled node properties in the input order.

For more details on the `stream` mode in general, see Stream.

The following will run the algorithm in `stream` mode:
``````CALL gds.alpha.scaleProperties.stream('myGraph', {
nodeProperties: ['buildYear', 'avgReview'],
scaler: 'MinMax'
}) YIELD nodeId, scaledProperty
RETURN gds.util.asNode(nodeId).name AS name, scaledProperty
ORDER BY name ASC``````
Table 7. Results
name scaledProperty

"Beach"

[0.3709677419354839, 0.3209342560553633]

"Central"

[0.6612903225806451, 1.0]

"East"

[0.3225806451612903, 0.35986159169550175]

"Forest"

[0.8387096774193549, 0.12629757785467127]

"Mountain"

[0.41935483870967744, 0.4679930795847751]

"Plaza"

[0.0, 0.5285467128027681]

"Polar"

[1.0, 0.17863321799307957]

"West"

[0.7580645161290323, 0.0]

In the results we can observe that the first element in the resulting `scaledProperty` we get the min-max-scaled values for `buildYear`, where the `Plaza` hotel has the minimum value and is scaled to zero, while the `Polar` hotel has the maximum value and is scaled to one. This can be verified with the example graph. The second value in the `scaledProperty` result are the scaled values of the `avgReview` property.

3.2. Mutate

The `mutate` execution mode enables updating the named graph with a new node property containing the scaled properties for that node. The name of the new property is specified using the mandatory configuration parameter `mutateProperty`. The result is a single summary row containing metrics from the computation. The `mutate` mode is especially useful when multiple algorithms are used in conjunction.

For more details on the `mutate` mode in general, see Mutate.

In this example we will scale the two hotel properties of `buildYear` and `avgReview` using the Mean scaler. The output is a list property which we will call `hotelFeatures`, imagining that we will use this as input for a machine learning model later on.

The following will run the algorithm in `mutate` mode:
``````CALL gds.alpha.scaleProperties.mutate('myGraph', {
nodeProperties: ['buildYear', 'avgReview'],
scaler: 'Mean',
mutateProperty: 'hotelFeatures'
}) YIELD nodePropertiesWritten``````
Table 8. Results
nodePropertiesWritten

8

The result shows that there are now eight new node properties in the in-memory graph. These contain the scaled values from the input properties, where the scaled `buildYear` values are in the first list position and scaled `avgReview` values are in the second position. To find out how to inspect the new schema of the in-memory graph, see Listing graphs in the catalog.

3.3. List properties

The `storyCapacity` property models the amount of rooms on each story of the hotel. The property is normalized so that hotels with fewer stories have a zero value. This is because the Scale Properties algorithm requires that all values for the same property have the same length. In this example we will show how to scale the values in these lists using the Scale Properties algorithm. We imagine using the output as feature vector to input in a machine learning algorithm. Additionally, we will include the `avgReview` property in our feature vector.

The following will run the algorithm in `mutate` mode:
``````CALL gds.alpha.scaleProperties.stream('myGraph', {
nodeProperties: ['avgReview', 'storyCapacity'],
scaler: 'StdScore'
}) YIELD nodeId, scaledProperty
RETURN gds.util.asNode(nodeId).name AS name, scaledProperty AS features
ORDER BY name ASC``````
Table 9. Results
name features

"Beach"

[-0.17956547594003253, -0.03401933556831381, 0.00254261210704973, -0.5187592498702616]

"Central"

[2.172199255871029, -0.3968922482969945, -0.3534230828799124, -0.2806402499298136]

"East"

[-0.0447509371737933, -0.5731448059080679, -0.526320706159294, -0.5187592498702616]

"Forest"

[-0.8536381697712284, -0.513529970245499, -0.5568320514438908, -0.5187592498702616]

"Mountain"

[0.32973389273242665, -0.4487312358296632, -0.6076842935848854, -0.5187592498702616]

"Plaza"

[0.5394453974799097, -0.609432097180936, -0.5568320514438908, -0.5187592498702616]

"Polar"

[-0.672387512096618, 2.583849534831454, 2.5705808402272767, 2.542770749364069]

"West"

[-1.2910364511016934, -0.00809984180197948, 0.027968733177547028, 0.3316657499170525]

The resulting feature vector contains the standard-score scaled value for the `avgReview` property in the first list position. We can see that some values are negative and that the maximum value sticks out for the `Central` hotel.

The other three list positions are the scaled values for the `storyCapacity` list property. Note that each list item is scaled only with respect to the corresponding item in the other lists. Thus, the `Polar` hotel has the greatest scaled value in all list positions.