Node Embedding Algorithms

class graphdatascience.procedure_surface.api.node_embedding.FastRPEndpoints
abstract estimate(G: GraphV2 | dict[str, Any], embedding_dimension: int, iteration_weights: list[float] = [0.0, 1.0, 1.0], normalization_strength: float = 0.0, node_self_influence: float = 0.0, property_ratio: float = 0.0, feature_properties: list[str] | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], concurrency: int | None = None, relationship_weight_property: str | None = None, random_seed: int | None = None) → EstimationResult

Returns an estimation of the memory consumption for that procedure.

Parameters:
  • G (GraphV2 | dict[str, Any]) – Graph object to use or a dictionary representing the graph dimensions.

  • embedding_dimension (int) – The dimension of the generated embeddings.

  • iteration_weights (list[float], default=[0.0, 1.0, 1.0]) – Weight of each iteration's intermediate embedding in the final embedding. The number of weights determines the number of iterations.

  • normalization_strength (float, default=0.0) – Exponent applied to each node's degree when scaling its initial random vector. Negative values reduce the contribution of high-degree neighbors.

  • node_self_influence (float, default=0.0) – How much each node's own initial random vector contributes to its final embedding.

  • property_ratio (float, default=0.0) – The desired ratio of the property-based part of the embedding to the total embedding_dimension. A positive value requires feature_properties to be non-empty.

  • feature_properties (list[str] | None, default=None) – Node properties to use as features in the embedding. Defaults to [] if not specified.

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • concurrency (int | None) – Number of concurrent threads to use.

  • relationship_weight_property (str | None) – Name of the property to be used as weights.

  • random_seed (int | None) – Seed for random number generation to ensure reproducible results.

Returns:

Memory estimation details

Return type:

EstimationResult
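As a rough sanity check on an estimate, the final embeddings alone need at least node_count × embedding_dimension values. The sketch below assumes 32-bit floats and ignores intermediate buffers and other overhead, so it gives a lower bound rather than what estimate reports; embedding_bytes_lower_bound and human_readable are hypothetical helpers, not part of the library.

```python
def embedding_bytes_lower_bound(node_count: int, embedding_dimension: int, bytes_per_value: int = 4) -> int:
    """Bytes needed just to hold one embedding per node (hypothetical helper, 32-bit floats assumed)."""
    if node_count < 0 or embedding_dimension <= 0:
        raise ValueError("node_count must be >= 0 and embedding_dimension > 0")
    return node_count * embedding_dimension * bytes_per_value


def human_readable(num_bytes: int) -> str:
    """Format a byte count using binary units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


# One million nodes with 256-dimensional embeddings.
print(human_readable(embedding_bytes_lower_bound(1_000_000, 256)))  # prints 976.6 MiB
```

If estimate reports far less than this bound for the same graph size, the inputs passed to it are worth double-checking.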

abstract mutate(G: GraphV2, mutate_property: str, embedding_dimension: int, iteration_weights: list[float] = [0.0, 1.0, 1.0], normalization_strength: float = 0.0, node_self_influence: float = 0.0, property_ratio: float = 0.0, feature_properties: list[str] | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None, relationship_weight_property: str | None = None, random_seed: int | None = None) → FastRPMutateResult

Executes the FastRP algorithm and adds the embeddings to the in-memory graph as a node property.

Parameters:
  • G (GraphV2) – Graph object to use

  • mutate_property (str) – Name of the node property to store the results in.

  • embedding_dimension (int) – The dimension of the generated embeddings.

  • iteration_weights (list[float], default=[0.0, 1.0, 1.0]) – Weight of each iteration's intermediate embedding in the final embedding. The number of weights determines the number of iterations.

  • normalization_strength (float, default=0.0) – Exponent applied to each node's degree when scaling its initial random vector. Negative values reduce the contribution of high-degree neighbors.

  • node_self_influence (float, default=0.0) – How much each node's own initial random vector contributes to its final embedding.

  • property_ratio (float, default=0.0) – The desired ratio of the property-based part of the embedding to the total embedding_dimension. A positive value requires feature_properties to be non-empty.

  • feature_properties (list[str] | None, default=None) – Node properties to use as features in the embedding. Defaults to [] if not specified.

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • sudo (bool) – Disable the memory guard.

  • log_progress (bool) – Display progress logging.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • concurrency (int | None) – Number of concurrent threads to use.

  • job_id (str | None) – Identifier for the computation.

  • relationship_weight_property (str | None) – Name of the property to be used as weights.

  • random_seed (int | None) – Seed for random number generation to ensure reproducible results.

Returns:

Algorithm metrics and statistics

Return type:

FastRPMutateResult
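The FastRP parameters interact roughly as follows, going by the Neo4j GDS description of the algorithm: each node gets a sparse random initial vector scaled by degree ** normalization_strength, each iteration averages the neighbors' previous vectors and L2-normalizes the result, and the final embedding is node_self_influence × initial vector plus the iteration_weights-weighted sum of the intermediate vectors. The pure-Python sketch below illustrates that reading on an unweighted graph. It is not the library's implementation: fastrp_sketch and its adjacency-dict input are hypothetical, and the ±1 sparse vectors (probability 1/6 each) are a simple stand-in for the library's random projection.

```python
import math
import random


def _l2_normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec[:]


def fastrp_sketch(
    adjacency: dict[int, list[int]],
    embedding_dimension: int,
    iteration_weights: list[float],
    normalization_strength: float = 0.0,
    node_self_influence: float = 0.0,
    random_seed: int = 42,
) -> dict[int, list[float]]:
    """Simplified FastRP-style propagation (hypothetical helper, not library code)."""
    rng = random.Random(random_seed)
    # Sparse random initial vectors: +1 or -1 with probability 1/6 each, else 0,
    # scaled by degree ** normalization_strength (isolated nodes use degree 1).
    initial = {}
    for node, neighbors in adjacency.items():
        vec = [rng.choice((1.0, -1.0, 0.0, 0.0, 0.0, 0.0)) for _ in range(embedding_dimension)]
        scale = max(len(neighbors), 1) ** normalization_strength
        initial[node] = [x * scale for x in vec]

    final = {node: [node_self_influence * x for x in vec] for node, vec in initial.items()}
    current = initial
    # One propagation step per entry in iteration_weights.
    for weight in iteration_weights:
        nxt = {}
        for node, neighbors in adjacency.items():
            summed = [0.0] * embedding_dimension
            for nb in neighbors:
                summed = [a + b for a, b in zip(summed, current[nb])]
            mean = [x / len(neighbors) for x in summed] if neighbors else summed
            nxt[node] = _l2_normalize(mean)
        for node, vec in nxt.items():
            final[node] = [f + weight * x for f, x in zip(final[node], vec)]
        current = nxt
    return final


graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [2]}
emb = fastrp_sketch(graph, embedding_dimension=8, iteration_weights=[0.0, 1.0, 1.0], random_seed=7)
print({node: [round(x, 3) for x in vec] for node, vec in emb.items()})
```

With the default iteration_weights of [0.0, 1.0, 1.0], the first iteration still runs (it feeds the second), but only the second and third intermediate embeddings are added to the result.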

abstract stats(G: GraphV2, embedding_dimension: int, iteration_weights: list[float] = [0.0, 1.0, 1.0], normalization_strength: float = 0.0, node_self_influence: float = 0.0, property_ratio: float = 0.0, feature_properties: list[str] | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None, relationship_weight_property: str | None = None, random_seed: int | None = None) → FastRPStatsResult

Executes the FastRP algorithm and returns result statistics without writing the result to Neo4j.

Parameters:
  • G (GraphV2) – Graph object to use

  • embedding_dimension (int) – The dimension of the generated embeddings.

  • iteration_weights (list[float], default=[0.0, 1.0, 1.0]) – Weight of each iteration's intermediate embedding in the final embedding. The number of weights determines the number of iterations.

  • normalization_strength (float, default=0.0) – Exponent applied to each node's degree when scaling its initial random vector. Negative values reduce the contribution of high-degree neighbors.

  • node_self_influence (float, default=0.0) – How much each node's own initial random vector contributes to its final embedding.

  • property_ratio (float, default=0.0) – The desired ratio of the property-based part of the embedding to the total embedding_dimension. A positive value requires feature_properties to be non-empty.

  • feature_properties (list[str] | None, default=None) – Node properties to use as features in the embedding. Defaults to [] if not specified.

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • sudo (bool) – Disable the memory guard.

  • log_progress (bool) – Display progress logging.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • concurrency (int | None) – Number of concurrent threads to use.

  • job_id (str | None) – Identifier for the computation.

  • relationship_weight_property (str | None) – Name of the property to be used as weights.

  • random_seed (int | None) – Seed for random number generation to ensure reproducible results.

Returns:

Algorithm statistics

Return type:

FastRPStatsResult

abstract stream(G: GraphV2, embedding_dimension: int, iteration_weights: list[float] = [0.0, 1.0, 1.0], normalization_strength: float = 0.0, node_self_influence: float = 0.0, property_ratio: float = 0.0, feature_properties: list[str] | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None, relationship_weight_property: str | None = None, random_seed: int | None = None) → DataFrame

Executes the FastRP algorithm and returns the results as a stream.

Parameters:
  • G (GraphV2) – Graph object to use

  • embedding_dimension (int) – The dimension of the generated embeddings.

  • iteration_weights (list[float], default=[0.0, 1.0, 1.0]) – Weight of each iteration's intermediate embedding in the final embedding. The number of weights determines the number of iterations.

  • normalization_strength (float, default=0.0) – Exponent applied to each node's degree when scaling its initial random vector. Negative values reduce the contribution of high-degree neighbors.

  • node_self_influence (float, default=0.0) – How much each node's own initial random vector contributes to its final embedding.

  • property_ratio (float, default=0.0) – The desired ratio of the property-based part of the embedding to the total embedding_dimension. A positive value requires feature_properties to be non-empty.

  • feature_properties (list[str] | None, default=None) – Node properties to use as features in the embedding. Defaults to [] if not specified.

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • sudo (bool) – Disable the memory guard.

  • log_progress (bool) – Display progress logging.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • concurrency (int | None) – Number of concurrent threads to use.

  • job_id (str | None) – Identifier for the computation.

  • relationship_weight_property (str | None) – Name of the property to be used as weights.

  • random_seed (int | None) – Seed for random number generation to ensure reproducible results.

Returns:

DataFrame with node IDs and their FastRP embeddings

Return type:

DataFrame
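A common next step with streamed embeddings is to compare nodes, for example by cosine similarity. The sketch below works on plain lists of row dicts, such as those returned by pandas' `df.to_dict("records")`. The column names `nodeId` and `embedding` and the sample rows are assumptions for illustration, so check the actual DataFrame columns first; cosine_similarity and most_similar are hypothetical helpers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(rows: list[dict], node_id: int) -> tuple[int, float]:
    """Return (other_node_id, similarity) for the row closest to node_id."""
    by_id = {row["nodeId"]: row["embedding"] for row in rows}  # column names assumed
    target = by_id[node_id]
    candidates = (
        (other, cosine_similarity(target, emb)) for other, emb in by_id.items() if other != node_id
    )
    return max(candidates, key=lambda pair: pair[1])


# Hypothetical rows shaped like a streamed result.
rows = [
    {"nodeId": 0, "embedding": [1.0, 0.0, 0.0]},
    {"nodeId": 1, "embedding": [0.9, 0.1, 0.0]},
    {"nodeId": 2, "embedding": [0.0, 1.0, 0.0]},
]
print(most_similar(rows, 0))  # node 1 is closest, similarity about 0.994
```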

abstract write(G: GraphV2, write_property: str, embedding_dimension: int, iteration_weights: list[float] = [0.0, 1.0, 1.0], normalization_strength: float = 0.0, node_self_influence: float = 0.0, property_ratio: float = 0.0, feature_properties: list[str] | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None, relationship_weight_property: str | None = None, random_seed: int | None = None, write_concurrency: int | None = None) → FastRPWriteResult

Executes the FastRP algorithm and writes the results to Neo4j.

Parameters:
  • G (GraphV2) – Graph object to use

  • write_property (str) – Name of the node property to store the results in.

  • embedding_dimension (int) – The dimension of the generated embeddings.

  • iteration_weights (list[float], default=[0.0, 1.0, 1.0]) – Weight of each iteration's intermediate embedding in the final embedding. The number of weights determines the number of iterations.

  • normalization_strength (float, default=0.0) – Exponent applied to each node's degree when scaling its initial random vector. Negative values reduce the contribution of high-degree neighbors.

  • node_self_influence (float, default=0.0) – How much each node's own initial random vector contributes to its final embedding.

  • property_ratio (float, default=0.0) – The desired ratio of the property-based part of the embedding to the total embedding_dimension. A positive value requires feature_properties to be non-empty.

  • feature_properties (list[str] | None, default=None) – Node properties to use as features in the embedding. Defaults to [] if not specified.

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • sudo (bool) – Disable the memory guard.

  • log_progress (bool) – Display progress logging.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • concurrency (int | None) – Number of concurrent threads to use.

  • job_id (str | None) – Identifier for the computation.

  • relationship_weight_property (str | None) – Name of the property to be used as weights.

  • random_seed (int | None) – Seed for random number generation to ensure reproducible results.

  • write_concurrency (int | None) – Number of concurrent threads to use for writing.

Returns:

Algorithm metrics and statistics

Return type:

FastRPWriteResult
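The node_labels and relationship_types filters shared by the methods above include an element when any of its labels or types is listed, and the default ['*'] selects everything. A minimal sketch of that predicate, assuming '*' is the only wildcard; matches_filter is a hypothetical helper that mirrors the documented behavior, not library code.

```python
def matches_filter(element_labels: set[str], filter_values: list[str]) -> bool:
    """True if an element with element_labels passes a node_labels/relationship_types filter."""
    if "*" in filter_values:
        return True  # the default ['*'] keeps every element
    return any(label in element_labels for label in filter_values)


nodes = {
    "alice": {"Person", "Customer"},
    "bob": {"Person"},
    "acme": {"Company"},
}
selected = [name for name, labels in nodes.items() if matches_filter(labels, ["Customer", "Company"])]
print(selected)  # ['alice', 'acme']
```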

pydantic model graphdatascience.procedure_surface.api.node_embedding.FastRPMutateResult
field compute_millis: int
field configuration: dict[str, Any]
field mutate_millis: int
field node_count: int
field node_properties_written: int
field pre_processing_millis: int
pydantic model graphdatascience.procedure_surface.api.node_embedding.FastRPStatsResult
field compute_millis: int
field configuration: dict[str, Any]
field node_count: int
field pre_processing_millis: int
pydantic model graphdatascience.procedure_surface.api.node_embedding.FastRPWriteResult
field compute_millis: int
field configuration: dict[str, Any]
field node_count: int
field node_properties_written: int
field pre_processing_millis: int
field write_millis: int
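The result models report per-phase timings in milliseconds. The sketch below uses a plain dataclass as a stand-in for the pydantic FastRPWriteResult (same field names, hypothetical values) to show one way to see where time was spent. Treating the sum of the phase fields as the total is an approximation, since these fields may not cover every step of a call.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class WriteResultStandIn:
    """Stand-in with the same fields as FastRPWriteResult (not the real pydantic model)."""
    compute_millis: int
    configuration: dict[str, Any] = field(default_factory=dict)
    node_count: int = 0
    node_properties_written: int = 0
    pre_processing_millis: int = 0
    write_millis: int = 0


def phase_breakdown(result: WriteResultStandIn) -> dict[str, float]:
    """Share of the summed phase timings spent in each phase."""
    phases = {
        "pre_processing": result.pre_processing_millis,
        "compute": result.compute_millis,
        "write": result.write_millis,
    }
    total = sum(phases.values())
    return {name: (ms / total if total else 0.0) for name, ms in phases.items()}


result = WriteResultStandIn(compute_millis=600, node_count=1000, node_properties_written=1000,
                            pre_processing_millis=100, write_millis=300)
print(phase_breakdown(result))  # {'pre_processing': 0.1, 'compute': 0.6, 'write': 0.3}
```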
class graphdatascience.procedure_surface.api.node_embedding.GraphSageEndpoints

API for the GraphSage algorithm, combining both training and prediction functionalities.

estimate(G: GraphV2 | dict[str, Any], model_name: str, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], batch_size: int = 100, concurrency: int | None = None, log_progress: bool = True, username: str | None = None, sudo: bool = False, job_id: str | None = None) → EstimationResult

Returns an estimation of the memory consumption for that procedure.

Parameters:
  • G (GraphV2 | dict[str, Any]) – Graph object to use or a dictionary representing the graph dimensions.

  • model_name (str) – Name under which the model is stored.

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • batch_size (int = 100) – The batch size for prediction.

  • concurrency (int | None) – Number of concurrent threads to use.

  • log_progress (bool) – Display progress logging.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • sudo (bool) – Disable the memory guard.

  • job_id (str | None) – Identifier for the computation.

Returns:

The estimated cost of running the algorithm

Return type:

EstimationResult

mutate(G: GraphV2, model_name: str, mutate_property: str, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None, batch_size: int = 100) → GraphSageMutateResult

Uses a pre-trained GraphSage model to predict embeddings for a graph and adds them to the in-memory graph as a node property.

Parameters:
  • G (GraphV2) – Graph object to use

  • model_name (str) – Name under which the model is stored.

  • mutate_property (str) – Name of the node property to store the results in.

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • log_progress (bool) – Display progress logging.

  • sudo (bool) – Disable the memory guard.

  • concurrency (int | None) – Number of concurrent threads to use.

  • job_id (str | None) – Identifier for the computation.

  • batch_size (int) – Number of nodes to process in each batch.

Returns:

Algorithm metrics and statistics

Return type:

GraphSageMutateResult

stream(G: GraphV2, model_name: str, *, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None, batch_size: int = 100) → DataFrame

Uses a pre-trained GraphSage model to predict embeddings for a graph and returns the results as a stream.

Parameters:
  • G (GraphV2) – Graph object to use

  • model_name (str) – Name under which the model is stored.

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • log_progress (bool) – Display progress logging.

  • sudo (bool) – Disable the memory guard.

  • concurrency (int | None) – Number of concurrent threads to use.

  • job_id (str | None) – Identifier for the computation.

  • batch_size (int) – Number of nodes to process in each batch.

Returns:

DataFrame with node IDs and their embeddings

Return type:

DataFrame
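batch_size controls how many nodes are processed together during prediction. A minimal sketch of splitting node IDs into consecutive batches of that size, where the last batch may be smaller; batches is a hypothetical helper, and the library's internal batching may differ.

```python
from typing import Iterator


def batches(node_ids: list[int], batch_size: int = 100) -> Iterator[list[int]]:
    """Yield consecutive slices of at most batch_size node IDs."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(node_ids), batch_size):
        yield node_ids[start:start + batch_size]


node_ids = list(range(250))
sizes = [len(b) for b in batches(node_ids, batch_size=100)]
print(sizes)  # [100, 100, 50]
```

The number of batches is ceil(node_count / batch_size), so 250 nodes with the default of 100 give three batches.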

property train: GraphSageTrainEndpoints

Trains a GraphSage model on the given graph.

Parameters:
  • G – Graph object to use

  • model_name (str) – Name under which the model will be stored

  • feature_properties (list[str]) – The names of the node properties to use as input features

  • activation_function (str | None) – The activation function to apply after each layer

  • negative_sample_weight (int | None, default=None) – Weight of negative samples in the loss function

  • embedding_dimension (int | None, default=None) – The dimension of the generated embeddings

  • tolerance – Minimum change in loss between iterations; an epoch stops early once the change falls below this value.

  • learning_rate (float | None, default=None) – Learning rate for the training optimization

  • max_iterations – Maximum number of iterations to run.

  • sample_sizes (list[int] | None, default=None) – Number of neighbors to sample at each layer

  • aggregator (str | None) – The aggregator function for neighborhood aggregation

  • penalty_l2 (float | None, default=None) – L2 regularization penalty

  • search_depth (int | None, default=None) – Maximum depth of the random walks used to sample nearby nodes for training.

  • epochs (int | None, default=None) – Number of training epochs

  • projected_feature_dimension (int | None, default=None) – Dimension to project input features to before training

  • batch_sampling_ratio (float | None, default=None) – Ratio of nodes to sample for each training batch

  • store_model_to_disk (bool | None, default=None) – Whether to persist the model to disk

  • relationship_types – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • username – As an administrator, impersonate a different user for accessing their graphs.

  • log_progress – Display progress logging.

  • sudo – Disable the memory guard.

  • concurrency – Number of concurrent threads to use.

  • job_id – Identifier for the computation.

  • batch_size – Number of nodes to process in each batch.

  • relationship_weight_property – Name of the property to be used as weights.

  • random_seed – Seed for random number generation to ensure reproducible results.

Returns:

Trained model

Return type:

GraphSageModelV2
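The aggregator and activation_function parameters (defaults 'MEAN' and 'SIGMOID' in GraphSageTrainEndpoints.estimate) determine how one GraphSage layer combines a node with its sampled neighbors. The sketch shows one mean-aggregator layer as described in the GraphSAGE paper: average the node's vector with its neighbors' vectors, multiply by a weight matrix, then apply sigmoid. The weight matrix and inputs are made-up values; this illustrates the technique, not the library's training code.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mean_aggregator_layer(
    self_vec: list[float],
    neighbor_vecs: list[list[float]],
    weights: list[list[float]],
) -> list[float]:
    """One mean-aggregator layer: sigmoid(W . mean({self} plus neighbors))."""
    group = [self_vec] + neighbor_vecs
    dim = len(self_vec)
    mean = [sum(vec[i] for vec in group) / len(group) for i in range(dim)]
    # weights has one row per output dimension and len(self_vec) columns.
    return [sigmoid(sum(w * m for w, m in zip(row, mean))) for row in weights]


h = mean_aggregator_layer(
    self_vec=[1.0, 0.0],
    neighbor_vecs=[[0.0, 1.0], [1.0, 1.0]],
    weights=[[0.5, -0.5], [1.0, 1.0], [0.0, 0.0]],
)
print([round(x, 3) for x in h])  # [0.5, 0.791, 0.5]
```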

write(G: GraphV2, model_name: str, write_property: str, *, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, write_concurrency: int | None = None, job_id: str | None = None, batch_size: int = 100) → GraphSageWriteResult

Uses a pre-trained GraphSage model to predict embeddings for a graph and writes the results back to the database.

Parameters:
  • G (GraphV2) – Graph object to use

  • model_name (str) – Name under which the model is stored.

  • write_property (str) – Name of the node property to store the results in.

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • log_progress (bool) – Display progress logging.

  • sudo (bool) – Disable the memory guard.

  • concurrency (int | None) – Number of concurrent threads to use.

  • write_concurrency (int | None) – Number of concurrent threads to use for writing.

  • job_id (str | None) – Identifier for the computation.

  • batch_size (int) – Number of nodes to process in each batch.

Returns:

Algorithm metrics and statistics

Return type:

GraphSageWriteResult

pydantic model graphdatascience.procedure_surface.api.node_embedding.GraphSageMutateResult
field compute_millis: int
field configuration: dict[str, Any]
field mutate_millis: int
field node_count: int
field node_properties_written: int
field pre_processing_millis: int
class graphdatascience.procedure_surface.api.node_embedding.GraphSagePredictEndpoints
abstract estimate(G: GraphV2 | dict[str, Any], model_name: str, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], batch_size: int = 100, concurrency: int | None = None, log_progress: bool = True, username: str | None = None, sudo: bool = False, job_id: str | None = None) → EstimationResult

Returns an estimation of the memory consumption for that procedure.

Parameters:
  • G (GraphV2 | dict[str, Any]) – Graph object to use or a dictionary representing the graph dimensions.

  • model_name (str) – Name under which the model is stored.

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • batch_size (int = 100) – The batch size for prediction.

  • concurrency (int | None) – Number of concurrent threads to use.

  • log_progress (bool) – Display progress logging.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • sudo (bool) – Disable the memory guard.

  • job_id (str | None) – Identifier for the computation.

Returns:

The estimated cost of running the algorithm

Return type:

EstimationResult

abstract mutate(G: GraphV2, model_name: str, mutate_property: str, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None, batch_size: int = 100) → GraphSageMutateResult

Uses a pre-trained GraphSage model to predict embeddings for a graph and adds them to the in-memory graph as a node property.

Parameters:
  • G (GraphV2) – Graph object to use

  • model_name (str) – Name under which the model is stored.

  • mutate_property (str) – Name of the node property to store the results in.

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • log_progress (bool) – Display progress logging.

  • sudo (bool) – Disable the memory guard.

  • concurrency (int | None) – Number of concurrent threads to use.

  • job_id (str | None) – Identifier for the computation.

  • batch_size (int) – Number of nodes to process in each batch.

Returns:

Algorithm metrics and statistics

Return type:

GraphSageMutateResult

abstract stream(G: GraphV2, model_name: str, *, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None, batch_size: int = 100) → DataFrame

Uses a pre-trained GraphSage model to predict embeddings for a graph and returns the results as a stream.

Parameters:
  • G (GraphV2) – Graph object to use

  • model_name (str) – Name under which the model is stored.

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • log_progress (bool) – Display progress logging.

  • sudo (bool) – Disable the memory guard.

  • concurrency (int | None) – Number of concurrent threads to use.

  • job_id (str | None) – Identifier for the computation.

  • batch_size (int) – Number of nodes to process in each batch.

Returns:

DataFrame with node IDs and their embeddings

Return type:

DataFrame

abstract write(G: GraphV2, model_name: str, write_property: str, *, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, write_concurrency: int | None = None, job_id: str | None = None, batch_size: int = 100) → GraphSageWriteResult

Uses a pre-trained GraphSage model to predict embeddings for a graph and writes the results back to the database.

Parameters:
  • G (GraphV2) – Graph object to use

  • model_name (str) – Name under which the model is stored.

  • write_property (str) – Name of the node property to store the results in.

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • log_progress (bool) – Display progress logging.

  • sudo (bool) – Disable the memory guard.

  • concurrency (int | None) – Number of concurrent threads to use.

  • write_concurrency (int | None) – Number of concurrent threads to use for writing.

  • job_id (str | None) – Identifier for the computation.

  • batch_size (int) – Number of nodes to process in each batch.

Returns:

Algorithm metrics and statistics

Return type:

GraphSageWriteResult

class graphdatascience.procedure_surface.api.node_embedding.GraphSageTrainEndpoints
abstract estimate(G: GraphV2, model_name: str, feature_properties: list[str], *, activation_function: str = 'SIGMOID', negative_sample_weight: int = 20, embedding_dimension: int = 64, tolerance: float = 0.0001, learning_rate: float = 0.1, max_iterations: int = 10, sample_sizes: list[int] | None = None, aggregator: str = 'MEAN', penalty_l2: float = 0.0, search_depth: int = 5, epochs: int = 1, projected_feature_dimension: int | None = None, batch_sampling_ratio: float | None = None, store_model_to_disk: bool = False, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None, batch_size: int = 100, relationship_weight_property: str | None = None, random_seed: int | None = None) → EstimationResult

Estimates memory requirements and other statistics for training a GraphSage model.

The estimate is computed without running the training, so it can be used to check resource requirements before starting the actual training procedure.

Parameters:
  • G (GraphV2) – Graph object to use

  • model_name (str) – Name under which the model will be stored

  • feature_properties (list[str]) – The names of the node properties to use as input features

  • activation_function (str = "SIGMOID") – The activation function to apply after each layer

  • negative_sample_weight (int = 20) – Weight of negative samples in the loss function

  • embedding_dimension (int = 64) – The dimension of the generated embeddings

  • tolerance (float = 0.0001) – Minimum change in loss between iterations; an epoch stops early once the change falls below this value.

  • learning_rate (float = 0.1) – Learning rate for the training optimization

  • max_iterations (int = 10) – Maximum number of iterations to run.

  • sample_sizes (list[int] | None = None) – Number of neighbors to sample at each layer. Defaults to [25, 10] if not specified.

  • aggregator (str = "MEAN") – The aggregator function for neighborhood aggregation

  • penalty_l2 (float = 0.0) – L2 regularization penalty

  • search_depth (int = 5) – Maximum depth of the random walks used to sample nearby nodes for training.

  • epochs (int = 1) – Number of training epochs

  • projected_feature_dimension (int | None = None) – Dimension to project input features to before training

  • batch_sampling_ratio (float | None = None) – Ratio of nodes to sample for each training batch

  • store_model_to_disk (bool = False) – Whether to persist the model to disk

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • log_progress (bool) – Display progress logging.

  • sudo (bool) – Disable the memory guard.

  • concurrency (int | None) – Number of concurrent threads to use.

  • job_id (str | None) – Identifier for the computation.

  • batch_size (int = 100) – Number of nodes to process in each batch.

  • relationship_weight_property (str | None) – Name of the property to be used as weights.

  • random_seed (int | None) – Seed for random number generation to ensure reproducible results.

Returns:

The estimation result containing memory requirements and other statistics

Return type:

EstimationResult
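With sample_sizes at its default of [25, 10], fixed-fan-out sampling as in the GraphSAGE paper bounds how many nodes each target node can touch: 1 (the node itself) plus the running products of the sample sizes. The arithmetic below is a back-of-envelope check to read alongside the estimate. It assumes the first entry applies to the target's direct neighbors (the library may order layers differently, which changes the middle term), and max_sampled_nodes is a hypothetical helper.

```python
from itertools import accumulate
from operator import mul


def max_sampled_nodes(sample_sizes: list[int]) -> int:
    """Upper bound on nodes visited per target node: 1 + s1 + s1*s2 + ..."""
    # Running products give the maximum fan-out at each hop: [s1, s1*s2, ...].
    per_hop = list(accumulate(sample_sizes, mul))
    return 1 + sum(per_hop)


print(max_sampled_nodes([25, 10]))        # 1 + 25 + 250 = 276
print(100 * max_sampled_nodes([25, 10]))  # at most 27600 visits for a batch of 100 targets
```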

pydantic model graphdatascience.procedure_surface.api.node_embedding.GraphSageTrainResult
field configuration: dict[str, Any]
field model_info: dict[str, Any]
field train_millis: int
pydantic model graphdatascience.procedure_surface.api.node_embedding.GraphSageWriteResult
field compute_millis: int
field configuration: dict[str, Any]
field node_count: int
field node_properties_written: int
field pre_processing_millis: int
field write_millis: int
class graphdatascience.procedure_surface.api.node_embedding.HashGNNEndpoints
abstract estimate(G: GraphV2 | dict[str, Any], iterations: int, embedding_density: int, output_dimension: int | None = None, neighbor_influence: float = 1.0, generate_features: dict[str, Any] | None = None, binarize_features: dict[str, Any] | None = None, heterogeneous: bool = False, feature_properties: list[str] | None = None, random_seed: int | None = None) → EstimationResult

Returns an estimation of the memory consumption for that procedure.

Parameters:
  • G (GraphV2 | dict[str, Any]) – Graph object to use or a dictionary representing the graph dimensions.

  • iterations (int) – Number of iterations to run.

  • embedding_density (int) – The number of features to sample per node in each iteration (called K in the original HashGNN paper).

  • output_dimension (int | None, default=None) – If set, the binary embeddings are randomly projected into dense features of this dimension.

  • neighbor_influence (float, default=1.0) – Controls how often neighbors' features are sampled relative to the node's own features in each iteration. Must be non-negative.

  • generate_features (dict[str, Any] | None, default=None) – Configuration for generating random binary input features instead of using node properties. Provide it only when feature_properties is empty.

  • binarize_features (dict[str, Any] | None, default=None) – Configuration for binarizing continuous input features.

  • heterogeneous (bool, default=False) – Whether different relationship types should be treated differently.

  • feature_properties (list[str] | None, default=None) – The names of the node properties to use as input features. Defaults to [] if not specified.

  • random_seed (int | None) – Seed for random number generation to ensure reproducible results.

Returns:

The estimated cost of running the algorithm

Return type:

EstimationResult
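HashGNN builds binary embeddings by min-hash sampling: roughly, in each iteration every node keeps, for each of embedding_density hash functions, the feature with the smallest hash among its own active features and its neighbors'. The sketch shows a single, simplified iteration of that idea. It ignores neighbor_influence, heterogeneous and feature generation, and it uses random per-feature values as hash functions; minhash_iteration is hypothetical and not the library's implementation.

```python
import random


def minhash_iteration(
    features: dict[int, set[int]],
    adjacency: dict[int, list[int]],
    embedding_density: int,
    random_seed: int = 0,
) -> dict[int, set[int]]:
    """One simplified min-hash round: for each of K hash functions, keep the
    feature with the smallest hash among a node's and its neighbors' features."""
    rng = random.Random(random_seed)
    # Each hash function assigns a random value to every feature index.
    all_features = sorted(set().union(*features.values()))
    hashes = [{f: rng.random() for f in all_features} for _ in range(embedding_density)]

    result = {}
    for node, own in features.items():
        candidates = set(own)
        for nb in adjacency.get(node, []):
            candidates |= features[nb]
        selected = set()
        if candidates:
            for h in hashes:
                selected.add(min(candidates, key=h.__getitem__))
        result[node] = selected  # at most embedding_density active features
    return result


features = {0: {0, 1}, 1: {2}, 2: {3, 4}}
adjacency = {0: [1], 1: [0, 2], 2: [1]}
out = minhash_iteration(features, adjacency, embedding_density=2, random_seed=1)
print(out)
```

Several hash functions can pick the same feature, which is why a node can end up with fewer than embedding_density active features.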

abstract mutate(G: GraphV2, iterations: int, embedding_density: int, mutate_property: str, output_dimension: int | None = None, neighbor_influence: float = 1.0, generate_features: dict[str, Any] | None = None, binarize_features: dict[str, Any] | None = None, heterogeneous: bool = False, feature_properties: list[str] | None = None, random_seed: int | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None) → HashGNNMutateResult

Executes the HashGNN algorithm and stores the results in the in-memory graph as a node property.

Parameters:
  • G (GraphV2) – Graph object to use

  • iterations (int) – Number of iterations to run.

  • embedding_density (int) – The density of the generated embeddings (number of bits per embedding)

  • mutate_property (str) – Name of the node property to store the results in.

  • output_dimension (int | None, default=None) – The dimension of the output embeddings

  • neighbor_influence (float, default=1.0) – The influence of neighboring nodes

  • generate_features (dict[str, Any] | None, default=None) – Configuration for generating random binary input features, intended for use when feature_properties is not given

  • binarize_features (dict[str, Any] | None, default=None) – Configuration for binarizing continuous features

  • heterogeneous (bool, default=False) – Whether to use heterogeneous node processing for different node types

  • feature_properties (list[str] | None, default=None) – The names of the node properties to use as input features. Defaults to [] if not specified

  • random_seed (int | None) – Seed for random number generation to ensure reproducible results.

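  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • sudo (bool) – Disable the memory guard.

  • log_progress (bool) – Display progress logging.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • concurrency (int | None) – Number of concurrent threads to use.

  • job_id (str | None) – Identifier for the computation.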

Returns:

Algorithm metrics and statistics

Return type:

HashGNNMutateResult
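HashGNN builds embeddings by repeatedly sampling features from a node and its neighbours with min-hashing. The following is a simplified, hypothetical sketch of one such iteration on binary feature sets, not the library's implementation. In this sketch embedding_density is used as the number of hash functions, so each node keeps at most that many feature bits per iteration, and neighbour hash values are divided by neighbor_influence so that larger values favour neighbour features; both choices are assumptions made for illustration.

```python
import random


def hashgnn_iteration(
    features: dict[int, set[int]],
    adjacency: dict[int, list[int]],
    dimension: int,
    embedding_density: int,
    neighbor_influence: float = 1.0,
    seed: int = 0,
) -> dict[int, set[int]]:
    """One simplified min-hash message-passing step over binary features (hypothetical)."""
    if neighbor_influence <= 0:
        raise ValueError("neighbor_influence must be positive")
    rng = random.Random(seed)
    # embedding_density independent "hash functions": a random value per feature index
    hashes = [[rng.random() for _ in range(dimension)] for _ in range(embedding_density)]
    updated = {}
    for node, own in features.items():
        neighbor_features = set().union(*(features[n] for n in adjacency.get(node, [])))
        selected = set()
        for h in hashes:
            best, best_value = None, float("inf")
            for f in own:
                if h[f] < best_value:
                    best, best_value = f, h[f]
            # A larger neighbor_influence shrinks neighbour hash values, so they win more often
            for f in neighbor_features:
                value = h[f] / neighbor_influence
                if value < best_value:
                    best, best_value = f, value
            if best is not None:
                selected.add(best)
        updated[node] = selected
    return updated


features = {0: {0, 1}, 1: {2}, 2: {3, 4}}
adjacency = {0: [1], 1: [0, 2], 2: [1]}
state = features
for i in range(2):  # iterations=2, with fresh hash functions in each iteration
    state = hashgnn_iteration(state, adjacency, dimension=5, embedding_density=3, seed=i)
print(state)
```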

abstract stream(G: GraphV2, iterations: int, embedding_density: int, output_dimension: int | None = None, neighbor_influence: float = 1.0, generate_features: dict[str, Any] | None = None, binarize_features: dict[str, Any] | None = None, heterogeneous: bool = False, feature_properties: list[str] | None = None, random_seed: int | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None) DataFrame

Executes the HashGNN algorithm and returns the results as a stream.

Parameters:
  • G (GraphV2) – Graph object to use

  • iterations (int) – Number of iterations to run.

  • embedding_density (int) – The density of the generated embeddings (number of bits per embedding)

  • output_dimension (int | None, default=None) – The dimension of the output embeddings

  • neighbor_influence (float, default=1.0) – The influence of neighboring nodes

  • generate_features (dict[str, Any] | None, default=None) – Configuration for generating random binary input features, intended for use when feature_properties is not given

  • binarize_features (dict[str, Any] | None, default=None) – Configuration for binarizing continuous features

  • heterogeneous (bool, default=False) – Whether to use heterogeneous node processing for different node types

  • feature_properties (list[str] | None, default=None) – The names of the node properties to use as input features. Defaults to [] if not specified

  • random_seed (int | None) – Seed for random number generation to ensure reproducible results.

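  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • sudo (bool) – Disable the memory guard.

  • log_progress (bool) – Display progress logging.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • concurrency (int | None) – Number of concurrent threads to use.

  • job_id (str | None) – Identifier for the computation.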

Returns:

DataFrame with node IDs and their embeddings

Return type:

DataFrame
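A common way to use the DataFrame returned by stream() is a nearest-neighbour lookup over the embeddings. The sketch below uses plain lists of (nodeId, embedding) pairs as toy stand-in data; the helpers cosine_similarity and most_similar are illustrative and not part of the client.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat all-zero embeddings as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(rows, node_id, top_k=1):
    """rows: iterable of (nodeId, embedding) pairs, e.g. taken from a stream result."""
    embeddings = dict(rows)
    target = embeddings[node_id]
    scored = [
        (other, cosine_similarity(target, emb))
        for other, emb in embeddings.items()
        if other != node_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy rows standing in for the DataFrame returned by stream()
rows = [(0, [1.0, 0.0, 1.0]), (1, [1.0, 0.0, 0.9]), (2, [0.0, 1.0, 0.0])]
print(most_similar(rows, 0))
```

With an actual result, the pairs could be built with zip(df["nodeId"], df["embedding"]), assuming the DataFrame uses those column names (the Node2Vec stream documents nodeId and embedding columns).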

abstract write(G: GraphV2, iterations: int, embedding_density: int, write_property: str, output_dimension: int | None = None, neighbor_influence: float = 1.0, generate_features: dict[str, Any] | None = None, binarize_features: dict[str, Any] | None = None, heterogeneous: bool = False, feature_properties: list[str] | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None, write_concurrency: int | None = None, random_seed: int | None = None) HashGNNWriteResult

Executes the HashGNN algorithm and writes the results back to the database.

Parameters:
  • G (GraphV2) – Graph object to use

  • iterations (int) – Number of iterations to run.

  • embedding_density (int) – The density of the generated embeddings (number of bits per embedding)

  • write_property (str) – Name of the node property to store the results in.

  • output_dimension (int | None, default=None) – The dimension of the output embeddings. If not specified, defaults to embedding_density / 64

  • neighbor_influence (float, default=1.0) – The influence of neighboring nodes (0.0 to 1.0)

  • generate_features (dict[str, Any] | None, default=None) – Configuration for generating random binary input features, intended for use when feature_properties is not given

  • binarize_features (dict[str, Any] | None, default=None) – Configuration for binarizing continuous features

  • heterogeneous (bool, default=False) – Whether to use heterogeneous node processing for different node types

  • feature_properties (list[str] | None, default=None) – The names of the node properties to use as input features. Defaults to [] if not specified

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • sudo (bool) – Disable the memory guard.

  • log_progress (bool) – Display progress logging.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • concurrency (int | None) – Number of concurrent threads to use.

  • job_id (str | None) – Identifier for the computation.

  • write_concurrency (int | None) – Number of concurrent threads to use for writing.

  • random_seed (int | None) – Seed for random number generation to ensure reproducible results.

Returns:

Algorithm metrics and statistics

Return type:

HashGNNWriteResult
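binarize_features is likewise only described as a configuration dictionary. One standard way to turn continuous features into bits is random hyperplane projection: each output bit records whether a vector's dot product with a random Gaussian vector exceeds a threshold. The sketch below assumes the dictionary provides "dimension" (number of output bits) and "threshold"; both the key names and the exact procedure the library uses are assumptions, and binarize_features here is a hypothetical helper.

```python
import random


def binarize_features(
    vectors: list[list[float]], dimension: int, threshold: float, seed: int = 0
) -> list[list[int]]:
    """Map real-valued vectors to `dimension` bits via random hyperplanes (hypothetical)."""
    rng = random.Random(seed)
    input_dim = len(vectors[0])
    # One random Gaussian hyperplane per output bit, shared by all nodes
    planes = [[rng.gauss(0.0, 1.0) for _ in range(input_dim)] for _ in range(dimension)]
    binary = []
    for x in vectors:
        bits = [1 if sum(w * v for w, v in zip(plane, x)) > threshold else 0 for plane in planes]
        binary.append(bits)
    return binary


vectors = [[1.0, 2.0], [-1.0, -2.0], [0.0, 0.0]]
print(binarize_features(vectors, dimension=8, threshold=0.0, seed=3))
```

With threshold 0.0, a vector and its negation land on opposite sides of every hyperplane, so their bit patterns are complementary.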

pydantic model graphdatascience.procedure_surface.api.node_embedding.HashGNNMutateResult

Result object representing the results of running a HashGNN algorithm in mutate mode.

field compute_millis: int
field configuration: dict[str, Any]
field mutate_millis: int
field node_count: int
field node_properties_written: int
field pre_processing_millis: int
pydantic model graphdatascience.procedure_surface.api.node_embedding.HashGNNWriteResult

Result object representing the results of running a HashGNN algorithm in write mode.

field compute_millis: int
field configuration: dict[str, Any]
field node_count: int
field node_properties_written: int
field pre_processing_millis: int
field write_millis: int
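The result models are pydantic models, so their fields are read as ordinary attributes. The sketch below uses a hypothetical dataclass with the same fields as HashGNNMutateResult so it runs without a database; the numbers are made up. The same pattern applies to HashGNNWriteResult by using write_millis in place of mutate_millis.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class HashGNNMutateResultStandIn:
    """Hypothetical stand-in with the same fields as HashGNNMutateResult."""

    compute_millis: int
    configuration: dict[str, Any]
    mutate_millis: int
    node_count: int
    node_properties_written: int
    pre_processing_millis: int


def summarize(result) -> str:
    # Works for any object exposing these attributes, e.g. the pydantic model
    total = result.pre_processing_millis + result.compute_millis + result.mutate_millis
    return f"{result.node_properties_written}/{result.node_count} properties written in {total} ms"


example = HashGNNMutateResultStandIn(
    compute_millis=120,
    configuration={},  # hypothetical, left empty
    mutate_millis=15,
    node_count=1000,
    node_properties_written=1000,
    pre_processing_millis=5,
)
print(summarize(example))  # 1000/1000 properties written in 140 ms
```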
class graphdatascience.procedure_surface.api.node_embedding.Node2VecEndpoints
abstract estimate(G: GraphV2 | dict[str, Any], iterations: int = 1, negative_sampling_rate: int = 5, positive_sampling_factor: float = 0.001, embedding_dimension: int = 128, embedding_initializer: str = 'NORMALIZED', initial_learning_rate: float = 0.025, min_learning_rate: float = 0.0001, window_size: int = 10, negative_sampling_exponent: float = 0.75, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], concurrency: int | None = None, walk_length: int = 80, walks_per_node: int = 10, in_out_factor: float = 1.0, return_factor: float = 1.0, walk_buffer_size: int = 1000, relationship_weight_property: str | None = None, random_seed: int | None = None) EstimationResult

Returns an estimation of the memory consumption for that procedure.

Parameters:
  • G (GraphV2 | dict[str, Any]) – Graph object to use or a dictionary representing the graph dimensions.

  • iterations (int, default=1) – Number of iterations to run.

  • negative_sampling_rate (int, default=5) – Number of negative samples for each positive sample

  • positive_sampling_factor (float, default=0.001) – Factor to multiply positive sampling weights

  • embedding_dimension (int, default=128) – The dimension of the generated embeddings

  • embedding_initializer (str, default="NORMALIZED") – Strategy for initializing node embeddings. Either “UNIFORM” or “NORMALIZED”

  • initial_learning_rate (float, default=0.025) – The initial learning rate

  • min_learning_rate (float, default=0.0001) – The minimum learning rate

  • window_size (int, default=10) – Size of the context window

  • negative_sampling_exponent (float, default=0.75) – Exponent for negative sampling probability distribution

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • concurrency (int | None) – Number of concurrent threads to use.

  • walk_length (int, default=80) – The length of each random walk

  • walks_per_node (int, default=10) – Number of walks to sample for each node

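  • in_out_factor (float, default=1.0) – Controls whether the walk stays close to the start node or moves outward (the q parameter of node2vec). Higher values keep the walk more local.

  • return_factor (float, default=1.0) – Controls the likelihood of immediately returning to the previously visited node (the p parameter of node2vec). Values below 1.0 make returning more likely.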

  • walk_buffer_size (int, default=1000) – Buffer size for walk sampling

  • relationship_weight_property (str | None) – Name of the property to be used as weights.

  • random_seed (int | None) – Seed for random number generation to ensure reproducible results.

Returns:

Memory estimation details

Return type:

EstimationResult
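The in_out_factor and return_factor parameters correspond to node2vec's q and p biases on a second-order random walk. The sketch below implements the transition weights from the original node2vec formulation for an unweighted graph: 1/p for stepping back to the previous node, 1 for a neighbour that is also adjacent to the previous node, and 1/q otherwise. It assumes the library follows the same scheme; next_node and random_walk are hypothetical helpers, and relationship weights are ignored.

```python
import random


def next_node(graph, prev, current, return_factor, in_out_factor, rng):
    """Pick the next node using node2vec's second-order bias (unweighted graph)."""
    neighbors = graph[current]
    if prev is None:
        return rng.choice(neighbors)  # first step is uniform
    weights = []
    for candidate in neighbors:
        if candidate == prev:
            weights.append(1.0 / return_factor)  # step back: weight 1/p
        elif candidate in graph[prev]:
            weights.append(1.0)  # stays at distance 1 from prev
        else:
            weights.append(1.0 / in_out_factor)  # moves outward: weight 1/q
    return rng.choices(neighbors, weights=weights, k=1)[0]


def random_walk(graph, start, walk_length, return_factor=1.0, in_out_factor=1.0, seed=None):
    rng = random.Random(seed)
    walk, prev = [start], None
    # Stop early if the walk reaches a node without neighbours
    while len(walk) < walk_length and graph[walk[-1]]:
        current = walk[-1]
        walk.append(next_node(graph, prev, current, return_factor, in_out_factor, rng))
        prev = current
    return walk


graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(random_walk(graph, start=0, walk_length=10, return_factor=4.0, in_out_factor=0.5, seed=7))
```

To mirror walks_per_node, random_walk would be called that many times per start node, each with a different seed.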

abstract mutate(G: GraphV2, mutate_property: str, iterations: int = 1, negative_sampling_rate: int = 5, positive_sampling_factor: float = 0.001, embedding_dimension: int = 128, embedding_initializer: str = 'NORMALIZED', initial_learning_rate: float = 0.025, min_learning_rate: float = 0.0001, window_size: int = 10, negative_sampling_exponent: float = 0.75, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None, walk_length: int = 80, walks_per_node: int = 10, in_out_factor: float = 1.0, return_factor: float = 1.0, walk_buffer_size: int = 1000, relationship_weight_property: str | None = None, random_seed: int | None = None) Node2VecMutateResult

Executes the Node2Vec algorithm and stores the results in the in-memory graph as a node property.

Parameters:
  • G (GraphV2) – Graph object to use

  • mutate_property (str) – Name of the node property to store the results in.

  • iterations (int, default=1) – Number of iterations to run.

  • negative_sampling_rate (int, default=5) – Number of negative samples for each positive sample

  • positive_sampling_factor (float, default=0.001) – Factor to multiply positive sampling weights

  • embedding_dimension (int, default=128) – The dimension of the generated embeddings

  • embedding_initializer (str, default="NORMALIZED") – Strategy for initializing node embeddings. Either “UNIFORM” or “NORMALIZED”

  • initial_learning_rate (float, default=0.025) – The initial learning rate

  • min_learning_rate (float, default=0.0001) – The minimum learning rate

  • window_size (int, default=10) – Size of the context window

  • negative_sampling_exponent (float, default=0.75) – Exponent for negative sampling probability distribution

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • log_progress (bool) – Display progress logging.

  • sudo (bool) – Disable the memory guard.

  • concurrency (int | None) – Number of concurrent threads to use.

  • job_id (str | None) – Identifier for the computation.

  • walk_length (int, default=80) – The length of each random walk

  • walks_per_node (int, default=10) – Number of walks to sample for each node

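  • in_out_factor (float, default=1.0) – Controls whether the walk stays close to the start node or moves outward (the q parameter of node2vec). Higher values keep the walk more local.

  • return_factor (float, default=1.0) – Controls the likelihood of immediately returning to the previously visited node (the p parameter of node2vec). Values below 1.0 make returning more likely.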

  • walk_buffer_size (int, default=1000) – Buffer size for walk sampling

  • relationship_weight_property (str | None) – Name of the property to be used as weights.

  • random_seed (int | None) – Seed for random number generation to ensure reproducible results.

Returns:

Algorithm metrics and statistics

Return type:

Node2VecMutateResult
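Node2Vec trains a skip-gram model on the sampled walks, and window_size bounds how far apart two nodes in a walk may be to form a (center, context) training pair. The sketch assumes, as in word2vec, that the window extends window_size positions on each side of the center node; the library's exact interpretation may differ, and skipgram_pairs is a hypothetical helper.

```python
def skipgram_pairs(walk: list[int], window_size: int) -> list[tuple[int, int]]:
    """(center, context) pairs for every node within window_size positions on either side."""
    pairs = []
    for i, center in enumerate(walk):
        start = max(0, i - window_size)
        end = min(len(walk), i + window_size + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


print(skipgram_pairs([10, 11, 12], window_size=1))
# [(10, 11), (11, 10), (11, 12), (12, 11)]
```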

abstract stream(G: GraphV2, iterations: int = 1, negative_sampling_rate: int = 5, positive_sampling_factor: float = 0.001, embedding_dimension: int = 128, embedding_initializer: str = 'NORMALIZED', initial_learning_rate: float = 0.025, min_learning_rate: float = 0.0001, window_size: int = 10, negative_sampling_exponent: float = 0.75, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None, walk_length: int = 80, walks_per_node: int = 10, in_out_factor: float = 1.0, return_factor: float = 1.0, walk_buffer_size: int = 1000, relationship_weight_property: str | None = None, random_seed: int | None = None) DataFrame

Executes the Node2Vec algorithm and returns the results as a stream.

Parameters:
  • G (GraphV2) – Graph object to use

  • iterations (int, default=1) – Number of iterations to run.

  • negative_sampling_rate (int, default=5) – Number of negative samples for each positive sample

  • positive_sampling_factor (float, default=0.001) – Factor to multiply positive sampling weights

  • embedding_dimension (int, default=128) – The dimension of the generated embeddings

  • embedding_initializer (str, default="NORMALIZED") – Strategy for initializing node embeddings. Either “UNIFORM” or “NORMALIZED”

  • initial_learning_rate (float, default=0.025) – The initial learning rate

  • min_learning_rate (float, default=0.0001) – The minimum learning rate

  • window_size (int, default=10) – Size of the context window

  • negative_sampling_exponent (float, default=0.75) – Exponent for negative sampling probability distribution

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • log_progress (bool) – Display progress logging.

  • sudo (bool) – Disable the memory guard.

  • concurrency (int | None) – Number of concurrent threads to use.

  • job_id (str | None) – Identifier for the computation.

  • walk_length (int, default=80) – The length of each random walk

  • walks_per_node (int, default=10) – Number of walks to sample for each node

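  • in_out_factor (float, default=1.0) – Controls whether the walk stays close to the start node or moves outward (the q parameter of node2vec). Higher values keep the walk more local.

  • return_factor (float, default=1.0) – Controls the likelihood of immediately returning to the previously visited node (the p parameter of node2vec). Values below 1.0 make returning more likely.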

  • walk_buffer_size (int, default=1000) – Buffer size for walk sampling

  • relationship_weight_property (str | None) – Name of the property to be used as weights.

  • random_seed (int | None) – Seed for random number generation to ensure reproducible results.

Returns:

Embeddings as a stream with columns nodeId and embedding

Return type:

DataFrame
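negative_sampling_exponent shapes the distribution from which negative samples are drawn, and negative_sampling_rate sets how many are drawn per positive sample. The sketch shows the word2vec-style smoothed unigram distribution, in which a node's probability is proportional to its occurrence count raised to the exponent. The counts are hypothetical, and whether the library uses exactly this formula is an assumption not confirmed by this page.

```python
import random


def negative_sampling_distribution(
    frequencies: dict[int, int], exponent: float = 0.75
) -> dict[int, float]:
    """Smoothed unigram distribution: P(node) is proportional to count ** exponent."""
    weights = {node: count**exponent for node, count in frequencies.items()}
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


def sample_negatives(
    distribution: dict[int, float], negative_sampling_rate: int, rng: random.Random
) -> list[int]:
    nodes = list(distribution)
    return rng.choices(nodes, weights=[distribution[n] for n in nodes], k=negative_sampling_rate)


# Hypothetical counts of how often each node occurs in the sampled walks
counts = {0: 16, 1: 1}
dist = negative_sampling_distribution(counts)
print(dist)  # node 0 gets about 0.889 instead of its raw share 16/17 (about 0.941)
print(sample_negatives(dist, negative_sampling_rate=5, rng=random.Random(0)))
```

An exponent below 1.0 flattens the distribution, so rare nodes are drawn as negatives more often than their raw frequency would suggest.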

abstract write(G: GraphV2, write_property: str, iterations: int = 1, negative_sampling_rate: int = 5, positive_sampling_factor: float = 0.001, embedding_dimension: int = 128, embedding_initializer: str = 'NORMALIZED', initial_learning_rate: float = 0.025, min_learning_rate: float = 0.0001, window_size: int = 10, negative_sampling_exponent: float = 0.75, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None, walk_length: int = 80, walks_per_node: int = 10, in_out_factor: float = 1.0, return_factor: float = 1.0, walk_buffer_size: int = 1000, relationship_weight_property: str | None = None, random_seed: int | None = None, write_concurrency: int | None = None) Node2VecWriteResult

Executes the Node2Vec algorithm and writes the results back to the database.

Parameters:
  • G (GraphV2) – Graph object to use

  • write_property (str) – Name of the node property to store the results in.

  • iterations (int, default=1) – Number of iterations to run.

  • negative_sampling_rate (int, default=5) – Number of negative samples for each positive sample

  • positive_sampling_factor (float, default=0.001) – Factor to multiply positive sampling weights

  • embedding_dimension (int, default=128) – The dimension of the generated embeddings

  • embedding_initializer (str, default="NORMALIZED") – Strategy for initializing node embeddings. Either “UNIFORM” or “NORMALIZED”

  • initial_learning_rate (float, default=0.025) – The initial learning rate

  • min_learning_rate (float, default=0.0001) – The minimum learning rate

  • window_size (int, default=10) – Size of the context window

  • negative_sampling_exponent (float, default=0.75) – Exponent for negative sampling probability distribution

  • relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.

  • node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

  • log_progress (bool) – Display progress logging.

  • sudo (bool) – Disable the memory guard.

  • concurrency (int | None) – Number of concurrent threads to use.

  • job_id (str | None) – Identifier for the computation.

  • walk_length (int, default=80) – The length of each random walk

  • walks_per_node (int, default=10) – Number of walks to sample for each node

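  • in_out_factor (float, default=1.0) – Controls whether the walk stays close to the start node or moves outward (the q parameter of node2vec). Higher values keep the walk more local.

  • return_factor (float, default=1.0) – Controls the likelihood of immediately returning to the previously visited node (the p parameter of node2vec). Values below 1.0 make returning more likely.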

  • walk_buffer_size (int, default=1000) – Buffer size for walk sampling

  • relationship_weight_property (str | None) – Name of the property to be used as weights.

  • random_seed (int | None) – Seed for random number generation to ensure reproducible results.

  • write_concurrency (int | None) – Number of concurrent threads to use for writing.

Returns:

Algorithm metrics and statistics

Return type:

Node2VecWriteResult
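positive_sampling_factor defaults to 0.001, the same value word2vec uses for sub-sampling frequent words. The sketch below shows word2vec's sub-sampling rule, which keeps each occurrence of a node with relative frequency f with probability (sqrt(f/t) + 1) * t/f for factor t, capped at 1. Treating positive_sampling_factor as this t is an assumption, and keep_probability is a hypothetical helper.

```python
import math


def keep_probability(relative_frequency: float, sampling_factor: float = 0.001) -> float:
    """word2vec's sub-sampling rule; relative_frequency must be in (0, 1]."""
    ratio = relative_frequency / sampling_factor
    return min(1.0, (math.sqrt(ratio) + 1.0) / ratio)


print(keep_probability(0.1))     # about 0.11: a very frequent node is mostly dropped
print(keep_probability(0.0001))  # 1.0: rare nodes are always kept
```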

pydantic model graphdatascience.procedure_surface.api.node_embedding.Node2VecMutateResult
field compute_millis: int
field configuration: dict[str, Any]
field loss_per_iteration: list[float]
field mutate_millis: int
field node_count: int
field node_properties_written: int
field post_processing_millis: int
field pre_processing_millis: int
pydantic model graphdatascience.procedure_surface.api.node_embedding.Node2VecWriteResult
field compute_millis: int
field configuration: dict[str, Any]
field loss_per_iteration: list[float]
field node_count: int
field node_properties_written: int
field pre_processing_millis: int
field write_millis: int
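Both Node2Vec result models expose loss_per_iteration, which can help decide whether a larger iterations value is worthwhile. The helper below is a hypothetical heuristic, not a library feature: it reports convergence when the last iteration changed the loss by less than a relative tolerance. The loss values in the example are made up.

```python
def has_converged(loss_per_iteration: list[float], rel_tol: float = 0.01) -> bool:
    """True if the last iteration changed the loss by less than rel_tol (relative)."""
    if len(loss_per_iteration) < 2:
        return False  # a single iteration gives nothing to compare against
    previous, last = loss_per_iteration[-2], loss_per_iteration[-1]
    if previous == 0.0:
        return last == 0.0
    return abs(previous - last) / abs(previous) < rel_tol


print(has_converged([10.0, 6.0, 5.98]))  # True: last change is about 0.3%
print(has_converged([10.0, 6.0]))        # False: last change is 40%
```

With the default iterations=1 the list has a single entry, so this check always returns False.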