Node Embedding Algorithms
- class graphdatascience.procedure_surface.api.node_embedding.FastRPEndpoints
- abstract estimate(G: GraphV2 | dict[str, Any], embedding_dimension: int, iteration_weights: list[float] = [0.0, 1.0, 1.0], normalization_strength: float = 0.0, node_self_influence: float = 0.0, property_ratio: float = 0.0, feature_properties: list[str] | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], concurrency: int | None = None, relationship_weight_property: str | None = None, random_seed: int | None = None) → EstimationResult
Returns an estimation of the memory consumption for that procedure.
- Parameters:
G (GraphV2 | dict[str, Any]) – Graph object to use or a dictionary representing the graph dimensions.
embedding_dimension (int) – The dimension of the generated embeddings
iteration_weights (list[float] = [0.0, 1.0, 1.0]) – Weights for each iteration. Controls the influence of each iteration on the final embedding.
normalization_strength (float, default=0.0) – The normalization strength parameter controls how much the embedding is normalized
node_self_influence (float, default=0.0) – The influence of the node’s own features on its embedding
property_ratio (float, default=0.0) – The ratio of node properties to use in the embedding
feature_properties (list[str] | None, default=None) – List of node properties to use as features in the embedding. Defaults to [] if not specified
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
concurrency (int | None) – Number of concurrent threads to use.
relationship_weight_property (str | None) – Name of the property to be used as weights.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
- Returns:
Memory estimation details
- Return type:
EstimationResult
- abstract mutate(G: GraphV2, mutate_property: str, embedding_dimension: int, iteration_weights: list[float] = [0.0, 1.0, 1.0], normalization_strength: float = 0.0, node_self_influence: float = 0.0, property_ratio: float = 0.0, feature_properties: list[str] | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None, relationship_weight_property: str | None = None, random_seed: int | None = None) → FastRPMutateResult
Executes the FastRP algorithm and writes the results back to the graph as a node property.
- Parameters:
G (GraphV2) – Graph object to use
mutate_property (str) – Name of the node property to store the results in.
embedding_dimension (int) – The dimension of the generated embeddings
iteration_weights (list[float] = [0.0, 1.0, 1.0]) – Weights for each iteration. Controls the influence of each iteration on the final embedding.
normalization_strength (float, default=0.0) – The normalization strength parameter controls how much the embedding is normalized
node_self_influence (float, default=0.0) – The influence of the node’s own features on its embedding
property_ratio (float, default=0.0) – The ratio of node properties to use in the embedding
feature_properties (list[str] | None, default=None) – List of node properties to use as features in the embedding. Defaults to [] if not specified
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
relationship_weight_property (str | None) – Name of the property to be used as weights.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
- Returns:
Algorithm metrics and statistics
- Return type:
FastRPMutateResult
- abstract stats(G: GraphV2, embedding_dimension: int, iteration_weights: list[float] = [0.0, 1.0, 1.0], normalization_strength: float = 0.0, node_self_influence: float = 0.0, property_ratio: float = 0.0, feature_properties: list[str] | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None, relationship_weight_property: str | None = None, random_seed: int | None = None) → FastRPStatsResult
Executes the FastRP algorithm and returns result statistics without writing the result to Neo4j.
- Parameters:
G (GraphV2) – Graph object to use
embedding_dimension (int) – The dimension of the generated embeddings
iteration_weights (list[float] = [0.0, 1.0, 1.0]) – Weights for each iteration. Controls the influence of each iteration on the final embedding.
normalization_strength (float, default=0.0) – The normalization strength parameter controls how much the embedding is normalized
node_self_influence (float, default=0.0) – The influence of the node’s own features on its embedding
property_ratio (float, default=0.0) – The ratio of node properties to use in the embedding
feature_properties (list[str] | None, default=None) – List of node properties to use as features in the embedding. Defaults to [] if not specified
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
relationship_weight_property (str | None) – Name of the property to be used as weights.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
- Returns:
Algorithm statistics
- Return type:
FastRPStatsResult
- abstract stream(G: GraphV2, embedding_dimension: int, iteration_weights: list[float] = [0.0, 1.0, 1.0], normalization_strength: float = 0.0, node_self_influence: float = 0.0, property_ratio: float = 0.0, feature_properties: list[str] | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None, relationship_weight_property: str | None = None, random_seed: int | None = None) → DataFrame
Executes the FastRP algorithm and returns the results as a stream.
- Parameters:
G (GraphV2) – Graph object to use
embedding_dimension (int) – The dimension of the generated embeddings
iteration_weights (list[float] = [0.0, 1.0, 1.0]) – Weights for each iteration. Controls the influence of each iteration on the final embedding.
normalization_strength (float, default=0.0) – The normalization strength parameter controls how much the embedding is normalized
node_self_influence (float, default=0.0) – The influence of the node’s own features on its embedding
property_ratio (float, default=0.0) – The ratio of node properties to use in the embedding
feature_properties (list[str] | None, default=None) – List of node properties to use as features in the embedding. Defaults to [] if not specified
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
relationship_weight_property (str | None) – Name of the property to be used as weights.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
- Returns:
DataFrame with node IDs and their FastRP embeddings
- Return type:
DataFrame
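To make the role of iteration_weights and random_seed more concrete, the following is a deliberately simplified, pure-Python sketch of a FastRP-style computation. It is not the library's implementation: the function name toy_fastrp, the adjacency-dict input, and the choice of initial vector entries are assumptions made for illustration, and normalization_strength, node_self_influence, and property features are left out entirely.

```python
import math
import random


def toy_fastrp(adjacency, embedding_dimension, iteration_weights, random_seed=None):
    """Simplified FastRP-style embedding (illustrative only, not the GDS code)."""
    rng = random.Random(random_seed)
    nodes = sorted(adjacency)
    # Sparse random starting vectors with entries drawn from {-1, 0, 0, +1}.
    current = {
        n: [rng.choice((-1.0, 0.0, 0.0, 1.0)) for _ in range(embedding_dimension)]
        for n in nodes
    }
    result = {n: [0.0] * embedding_dimension for n in nodes}
    for weight in iteration_weights:
        propagated = {}
        for n in nodes:
            neighbors = adjacency[n]
            # Average the neighbours' vectors from the previous iteration.
            summed = [0.0] * embedding_dimension
            for m in neighbors:
                for i, value in enumerate(current[m]):
                    summed[i] += value
            if neighbors:
                summed = [v / len(neighbors) for v in summed]
            # L2-normalise the intermediate vector (zero vectors stay zero).
            norm = math.sqrt(sum(v * v for v in summed))
            propagated[n] = [v / norm for v in summed] if norm > 0 else summed
        # Each iteration contributes to the final embedding according to its weight.
        for n in nodes:
            for i in range(embedding_dimension):
                result[n][i] += weight * propagated[n][i]
        current = propagated
    return result


# A three-node path graph: 0 - 1 - 2
path = {0: [1], 1: [0, 2], 2: [1]}
embeddings = toy_fastrp(path, embedding_dimension=8,
                        iteration_weights=[0.0, 1.0, 1.0], random_seed=42)
```

With the default weights [0.0, 1.0, 1.0], the first propagation step is computed but contributes nothing to the result, while the second and third steps contribute equally. Passing the same random_seed twice yields the same vectors in this sketch.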
- abstract write(G: GraphV2, write_property: str, embedding_dimension: int, iteration_weights: list[float] = [0.0, 1.0, 1.0], normalization_strength: float = 0.0, node_self_influence: float = 0.0, property_ratio: float = 0.0, feature_properties: list[str] | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None, relationship_weight_property: str | None = None, random_seed: int | None = None, write_concurrency: int | None = None) → FastRPWriteResult
Executes the FastRP algorithm and writes the results to Neo4j.
- Parameters:
G (GraphV2) – Graph object to use
write_property (str) – Name of the node property to store the results in.
embedding_dimension (int) – The dimension of the generated embeddings
iteration_weights (list[float] = [0.0, 1.0, 1.0]) – Weights for each iteration. Controls the influence of each iteration on the final embedding.
normalization_strength (float, default=0.0) – The normalization strength parameter controls how much the embedding is normalized
node_self_influence (float, default=0.0) – The influence of the node’s own features on its embedding
property_ratio (float, default=0.0) – The ratio of node properties to use in the embedding
feature_properties (list[str] | None, default=None) – List of node properties to use as features in the embedding. Defaults to [] if not specified
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
relationship_weight_property (str | None) – Name of the property to be used as weights.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
write_concurrency (int | None) – Number of concurrent threads to use for writing.
- Returns:
Algorithm metrics and statistics
- Return type:
FastRPWriteResult
- pydantic model graphdatascience.procedure_surface.api.node_embedding.FastRPMutateResult
- pydantic model graphdatascience.procedure_surface.api.node_embedding.FastRPStatsResult
- pydantic model graphdatascience.procedure_surface.api.node_embedding.FastRPWriteResult
- class graphdatascience.procedure_surface.api.node_embedding.GraphSageEndpoints
API for the GraphSage algorithm, combining both training and prediction functionalities.
- estimate(G: GraphV2 | dict[str, Any], model_name: str, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], batch_size: int = 100, concurrency: int | None = None, log_progress: bool = True, username: str | None = None, sudo: bool = False, job_id: str | None = None) → EstimationResult
Returns an estimation of the memory consumption for that procedure.
- Parameters:
G (GraphV2 | dict[str, Any]) – Graph object to use or a dictionary representing the graph dimensions.
model_name (str) – Name under which the model is stored
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
batch_size (int = 100) – The batch size for prediction.
concurrency (int | None) – Number of concurrent threads to use.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
sudo (bool) – Disable the memory guard.
job_id (str | None) – Identifier for the computation.
- Returns:
The estimated cost of running the algorithm
- Return type:
EstimationResult
- mutate(G: GraphV2, model_name: str, mutate_property: str, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None, batch_size: int = 100) → GraphSageMutateResult
Uses a pre-trained GraphSage model to predict embeddings for a graph and writes the results back to the graph as a node property.
- Parameters:
G (GraphV2) – Graph object to use
model_name (str) – Name under which the model is stored
mutate_property (str) – Name of the node property to store the results in.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
batch_size (int) – Number of nodes to process in each batch.
- Returns:
Algorithm metrics and statistics
- Return type:
GraphSageMutateResult
- stream(G: GraphV2, model_name: str, *, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None, batch_size: int = 100) → DataFrame
Uses a pre-trained GraphSage model to predict embeddings for a graph and returns the results as a stream.
- Parameters:
G (GraphV2) – Graph object to use
model_name (str) – Name under which the model is stored
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
batch_size (int) – Number of nodes to process in each batch.
- Returns:
DataFrame with node IDs and their embeddings
- Return type:
DataFrame
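Once embeddings have been streamed, a common next step is to compare nodes by the similarity of their vectors. The sketch below works on plain dictionaries rather than a live DataFrame so that it runs without a database; the column names nodeId and embedding are assumptions about the streamed result, and with pandas such rows could be obtained via DataFrame.to_dict("records").

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def most_similar(rows, node_id, top_k=3):
    """Rank the other nodes by cosine similarity to the given node."""
    # "nodeId" and "embedding" are assumed column names.
    by_id = {row["nodeId"]: row["embedding"] for row in rows}
    target = by_id[node_id]
    scored = [
        (other, cosine_similarity(target, emb))
        for other, emb in by_id.items()
        if other != node_id
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hand-written rows standing in for a streamed result.
rows = [
    {"nodeId": 0, "embedding": [1.0, 0.0]},
    {"nodeId": 1, "embedding": [0.8, 0.6]},
    {"nodeId": 2, "embedding": [0.0, 1.0]},
]
ranking = most_similar(rows, node_id=0)
```

Here node 1 ranks ahead of node 2 because its vector points in nearly the same direction as that of node 0.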
- property train: GraphSageTrainEndpoints
Trains a GraphSage model on the given graph.
- Parameters:
G – Graph object to use
model_name (str) – Name under which the model will be stored
feature_properties (list[str]) – The names of the node properties to use as input features
activation_function (str | None) – The activation function to apply after each layer
negative_sample_weight (int | None, default=None) – Weight of negative samples in the loss function
embedding_dimension (int | None, default=None) – The dimension of the generated embeddings
tolerance – Minimum change in loss between iterations; an epoch stops early once the change falls below this value.
learning_rate (float | None, default=None) – Learning rate for the training optimization
max_iterations – Maximum number of iterations to run.
sample_sizes (list[int] | None, default=None) – Number of neighbors to sample at each layer
aggregator (str | None) – The aggregator function for neighborhood aggregation
penalty_l2 (float | None, default=None) – L2 regularization penalty
search_depth (int | None, default=None) – Maximum search depth for neighbor sampling
epochs (int | None, default=None) – Number of training epochs
projected_feature_dimension (int | None, default=None) – Dimension to project input features to before training
batch_sampling_ratio (float | None, default=None) – Ratio of nodes to sample for each training batch
store_model_to_disk (bool | None, default=None) – Whether to persist the model to disk
relationship_types – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
username – As an administrator, impersonate a different user for accessing their graphs.
log_progress – Display progress logging.
sudo – Disable the memory guard.
concurrency – Number of concurrent threads to use.
job_id – Identifier for the computation.
batch_size – Number of nodes to process in each batch.
relationship_weight_property – Name of the property to be used as weights.
random_seed – Seed for random number generation to ensure reproducible results.
- Returns:
Trained model
- Return type:
GraphSageModelV2
- write(G: GraphV2, model_name: str, write_property: str, *, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, write_concurrency: int | None = None, job_id: str | None = None, batch_size: int = 100) → GraphSageWriteResult
Uses a pre-trained GraphSage model to predict embeddings for a graph and writes the results back to the database.
- Parameters:
G (GraphV2) – Graph object to use
model_name (str) – Name under which the model is stored
write_property (str) – Name of the node property to store the results in.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
write_concurrency (int | None) – Number of concurrent threads to use for writing.
job_id (str | None) – Identifier for the computation.
batch_size (int) – Number of nodes to process in each batch.
- Returns:
Algorithm metrics and statistics
- Return type:
GraphSageWriteResult
- pydantic model graphdatascience.procedure_surface.api.node_embedding.GraphSageMutateResult
- class graphdatascience.procedure_surface.api.node_embedding.GraphSagePredictEndpoints
- abstract estimate(G: GraphV2 | dict[str, Any], model_name: str, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], batch_size: int = 100, concurrency: int | None = None, log_progress: bool = True, username: str | None = None, sudo: bool = False, job_id: str | None = None) → EstimationResult
Returns an estimation of the memory consumption for that procedure.
- Parameters:
G (GraphV2 | dict[str, Any]) – Graph object to use or a dictionary representing the graph dimensions.
model_name (str) – Name under which the model is stored
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
batch_size (int = 100) – The batch size for prediction.
concurrency (int | None) – Number of concurrent threads to use.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
sudo (bool) – Disable the memory guard.
job_id (str | None) – Identifier for the computation.
- Returns:
The estimated cost of running the algorithm
- Return type:
EstimationResult
- abstract mutate(G: GraphV2, model_name: str, mutate_property: str, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None, batch_size: int = 100) → GraphSageMutateResult
Uses a pre-trained GraphSage model to predict embeddings for a graph and writes the results back to the graph as a node property.
- Parameters:
G (GraphV2) – Graph object to use
model_name (str) – Name under which the model is stored
mutate_property (str) – Name of the node property to store the results in.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
batch_size (int) – Number of nodes to process in each batch.
- Returns:
Algorithm metrics and statistics
- Return type:
GraphSageMutateResult
- abstract stream(G: GraphV2, model_name: str, *, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None, batch_size: int = 100) → DataFrame
Uses a pre-trained GraphSage model to predict embeddings for a graph and returns the results as a stream.
- Parameters:
G (GraphV2) – Graph object to use
model_name (str) – Name under which the model is stored
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
batch_size (int) – Number of nodes to process in each batch.
- Returns:
DataFrame with node IDs and their embeddings
- Return type:
DataFrame
- abstract write(G: GraphV2, model_name: str, write_property: str, *, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, write_concurrency: int | None = None, job_id: str | None = None, batch_size: int = 100) → GraphSageWriteResult
Uses a pre-trained GraphSage model to predict embeddings for a graph and writes the results back to the database.
- Parameters:
G (GraphV2) – Graph object to use
model_name (str) – Name under which the model is stored
write_property (str) – Name of the node property to store the results in.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
write_concurrency (int | None) – Number of concurrent threads to use for writing.
job_id (str | None) – Identifier for the computation.
batch_size (int) – Number of nodes to process in each batch.
- Returns:
Algorithm metrics and statistics
- Return type:
GraphSageWriteResult
- class graphdatascience.procedure_surface.api.node_embedding.GraphSageTrainEndpoints
- abstract estimate(G: GraphV2, model_name: str, feature_properties: list[str], *, activation_function: str = 'SIGMOID', negative_sample_weight: int = 20, embedding_dimension: int = 64, tolerance: float = 0.0001, learning_rate: float = 0.1, max_iterations: int = 10, sample_sizes: list[int] | None = None, aggregator: str = 'MEAN', penalty_l2: float = 0.0, search_depth: int = 5, epochs: int = 1, projected_feature_dimension: int | None = None, batch_sampling_ratio: float | None = None, store_model_to_disk: bool = False, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None, batch_size: int = 100, relationship_weight_property: str | None = None, random_seed: int | None = None) → EstimationResult
Estimates memory requirements and other statistics for training a GraphSage model.
This method provides memory estimation for the GraphSage training algorithm without actually executing the training. It helps determine the computational requirements before running the actual training procedure.
- Parameters:
G (GraphV2) – Graph object to use
model_name (str) – Name under which the model will be stored
feature_properties (list[str]) – The names of the node properties to use as input features
activation_function (str = "SIGMOID") – The activation function to apply after each layer
negative_sample_weight (int = 20) – Weight of negative samples in the loss function
embedding_dimension (int = 64) – The dimension of the generated embeddings
tolerance (float) – Minimum change in loss between iterations; an epoch stops early once the change falls below this value.
learning_rate (float = 0.1) – Learning rate for the training optimization
max_iterations (int) – Maximum number of iterations to run.
sample_sizes (list[int] | None = None) – Number of neighbors to sample at each layer. Defaults to [25, 10] if not specified
aggregator (str = "MEAN") – The aggregator function for neighborhood aggregation
penalty_l2 (float = 0.0) – L2 regularization penalty
search_depth (int = 5) – Maximum search depth for neighbor sampling
epochs (int = 1) – Number of training epochs
projected_feature_dimension (int | None = None) – Dimension to project input features to before training
batch_sampling_ratio (float | None = None) – Ratio of nodes to sample for each training batch
store_model_to_disk (bool = False) – Whether to persist the model to disk
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
batch_size (int) – Number of nodes to process in each batch.
relationship_weight_property (str | None) – Name of the property to be used as weights.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
- Returns:
The estimation result containing memory requirements and other statistics
- Return type:
EstimationResult
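The sample_sizes, aggregator, and activation_function parameters can be pictured with a toy forward pass. The sketch below is an illustration under strong simplifications, not the library's training code: it has no trainable weights, loss, or negative sampling, and the names toy_graphsage, mean_aggregate_layer, and sample_neighbors are hypothetical. It only shows one mean-aggregation layer per entry in sample_sizes, each followed by a sigmoid.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_neighbors(adjacency, node, sample_size, rng):
    """Sample up to sample_size neighbours of node without replacement."""
    neighbors = adjacency[node]
    if len(neighbors) <= sample_size:
        return list(neighbors)
    return rng.sample(neighbors, sample_size)


def mean_aggregate_layer(adjacency, features, sample_size, rng):
    """One unweighted layer: average a node with its sampled neighbours, then squash."""
    updated = {}
    for node in sorted(adjacency):
        group = [node] + sample_neighbors(adjacency, node, sample_size, rng)
        dim = len(features[node])
        mean = [sum(features[m][i] for m in group) / len(group) for i in range(dim)]
        updated[node] = [sigmoid(v) for v in mean]
    return updated


def toy_graphsage(adjacency, features, sample_sizes, random_seed=None):
    """Apply one mean-aggregation layer per entry in sample_sizes."""
    rng = random.Random(random_seed)
    hidden = features
    for size in sample_sizes:
        hidden = mean_aggregate_layer(adjacency, hidden, size, rng)
    return hidden


# A small star graph with all-zero input features.
star = {0: [1, 2], 1: [0], 2: [0]}
zeros = {n: [0.0, 0.0] for n in star}
hidden = toy_graphsage(star, zeros, sample_sizes=[25, 10], random_seed=1)
```

The number of entries in sample_sizes sets the number of layers in this sketch, so [25, 10] runs two layers, sampling at most 25 neighbours in the first and 10 in the second.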
- pydantic model graphdatascience.procedure_surface.api.node_embedding.GraphSageTrainResult
- pydantic model graphdatascience.procedure_surface.api.node_embedding.GraphSageWriteResult
- class graphdatascience.procedure_surface.api.node_embedding.HashGNNEndpoints
- abstract estimate(G: GraphV2 | dict[str, Any], iterations: int, embedding_density: int, output_dimension: int | None = None, neighbor_influence: float = 1.0, generate_features: dict[str, Any] | None = None, binarize_features: dict[str, Any] | None = None, heterogeneous: bool = False, feature_properties: list[str] | None = None, random_seed: int | None = None) → EstimationResult
Returns an estimation of the memory consumption for that procedure.
- Parameters:
G (GraphV2 | dict[str, Any]) – Graph object to use or a dictionary representing the graph dimensions.
iterations (int) – Number of iterations to run.
embedding_density (int) – The density of the generated embeddings (number of bits per embedding)
output_dimension (int | None, default=None) – The dimension of the output embeddings.
neighbor_influence (float, default=1.0) – The influence of neighboring nodes.
generate_features (dict[str, Any] | None, default=None) – Configuration for generating synthetic features from existing node properties
binarize_features (dict[str, Any] | None, default=None) – Configuration for binarizing continuous features
heterogeneous (bool, default=False) – Whether to use heterogeneous node processing for different node types
feature_properties (list[str] | None, default=None) – The names of the node properties to use as input features. Defaults to [] if not specified
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
- Returns:
The estimated cost of running the algorithm
- Return type:
EstimationResult
- abstract mutate(G: GraphV2, iterations: int, embedding_density: int, mutate_property: str, output_dimension: int | None = None, neighbor_influence: float = 1.0, generate_features: dict[str, Any] | None = None, binarize_features: dict[str, Any] | None = None, heterogeneous: bool = False, feature_properties: list[str] | None = None, random_seed: int | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None) → HashGNNMutateResult
Executes the HashGNN algorithm and writes the results back to the graph as a node property.
- Parameters:
G (GraphV2) – Graph object to use
iterations (int) – Number of iterations to run.
embedding_density (int) – The density of the generated embeddings (number of bits per embedding)
mutate_property (str) – Name of the node property to store the results in.
output_dimension (int | None, default=None) – The dimension of the output embeddings
neighbor_influence (float, default=1.0) – The influence of neighboring nodes
generate_features (dict[str, Any] | None, default=None) – Configuration for generating synthetic features from existing node properties
binarize_features (dict[str, Any] | None, default=None) – Configuration for binarizing continuous features
heterogeneous (bool, default=False) – Whether to use heterogeneous node processing for different node types
feature_properties (list[str] | None, default=None) – The names of the node properties to use as input features. Defaults to [] if not specified
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
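relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.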
- Returns:
Algorithm metrics and statistics
- Return type:
HashGNNMutateResult
- abstract stream(G: GraphV2, iterations: int, embedding_density: int, output_dimension: int | None = None, neighbor_influence: float = 1.0, generate_features: dict[str, Any] | None = None, binarize_features: dict[str, Any] | None = None, heterogeneous: bool = False, feature_properties: list[str] | None = None, random_seed: int | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None) → DataFrame
Executes the HashGNN algorithm and returns the results as a stream.
- Parameters:
G (GraphV2) – Graph object to use
iterations (int) – Number of iterations to run.
embedding_density (int) – The density of the generated embeddings (number of bits per embedding)
output_dimension (int | None, default=None) – The dimension of the output embeddings
neighbor_influence (float, default=1.0) – The influence of neighboring nodes
generate_features (dict[str, Any] | None, default=None) – Configuration for generating synthetic features from existing node properties
binarize_features (dict[str, Any] | None, default=None) – Configuration for binarizing continuous features
heterogeneous (bool, default=False) – Whether to use heterogeneous node processing for different node types
feature_properties (list[str] | None, default=None) – The names of the node properties to use as input features. Defaults to [] if not specified
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
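relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.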
- Returns:
DataFrame with node IDs and their embeddings
- Return type:
DataFrame
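To illustrate what binarizing continuous features can mean, the following sketch applies generic hyperplane rounding: each output bit records whether the feature vector falls on the positive side of a random hyperplane. This is an illustration under stated assumptions, not the library's implementation; the function `binarize` and its `dimension`, `threshold`, and `seed` arguments are hypothetical names chosen for this example and are not confirmed keys of `binarize_features`.

```python
import random


def binarize(features, dimension, threshold=0.0, seed=42):
    """Map a continuous feature vector to `dimension` bits (hypothetical helper)."""
    rng = random.Random(seed)
    bits = []
    for _ in range(dimension):
        # One random hyperplane per output bit, drawn from a standard normal.
        plane = [rng.gauss(0.0, 1.0) for _ in features]
        projection = sum(w * x for w, x in zip(plane, features))
        # The bit is set only when the projection exceeds the threshold.
        bits.append(1 if projection > threshold else 0)
    return bits


bits = binarize([0.3, -1.2, 4.0], dimension=8)
```

Fixing the seed makes the hyperplanes, and therefore the bits, reproducible across calls, which mirrors the role `random_seed` plays in the endpoints above.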
- abstract write(G: GraphV2, iterations: int, embedding_density: int, write_property: str, output_dimension: int | None = None, neighbor_influence: float = 1.0, generate_features: dict[str, Any] | None = None, binarize_features: dict[str, Any] | None = None, heterogeneous: bool = False, feature_properties: list[str] | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None, write_concurrency: int | None = None, random_seed: int | None = None) HashGNNWriteResult¶
Executes the HashGNN algorithm and writes the results back to the database.
- Parameters:
G (GraphV2) – Graph object to use
iterations (int) – Number of iterations to run.
embedding_density (int) – The density of the generated embeddings (number of bits per embedding)
write_property (str) – Name of the node property to store the results in.
output_dimension (int | None, default=None) – The dimension of the output embeddings. If not specified, defaults to embedding_density / 64
neighbor_influence (float, default=1.0) – The influence of neighboring nodes (0.0 to 1.0)
generate_features (dict[str, Any] | None, default=None) – Configuration for generating synthetic features from existing node properties
binarize_features (dict[str, Any] | None, default=None) – Configuration for binarizing continuous features
heterogeneous (bool, default=False) – Whether to use heterogeneous node processing for different node types
feature_properties (list[str] | None, default=None) – The names of the node properties to use as input features. Defaults to [] if not specified
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
write_concurrency (int | None) – Number of concurrent threads to use for writing.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
- Returns:
Algorithm metrics and statistics
- Return type:
HashGNNWriteResult
- pydantic model graphdatascience.procedure_surface.api.node_embedding.HashGNNMutateResult¶
Result object representing the results of running a HashGNN algorithm in mutate mode.
- pydantic model graphdatascience.procedure_surface.api.node_embedding.HashGNNWriteResult¶
Result object representing the results of running a HashGNN algorithm in write mode.
- class graphdatascience.procedure_surface.api.node_embedding.Node2VecEndpoints¶
- abstract estimate(G: GraphV2 | dict[str, Any], iterations: int = 1, negative_sampling_rate: int = 5, positive_sampling_factor: float = 0.001, embedding_dimension: int = 128, embedding_initializer: str = 'NORMALIZED', initial_learning_rate: float = 0.025, min_learning_rate: float = 0.0001, window_size: int = 10, negative_sampling_exponent: float = 0.75, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], concurrency: int | None = None, walk_length: int = 80, walks_per_node: int = 10, in_out_factor: float = 1.0, return_factor: float = 1.0, walk_buffer_size: int = 1000, relationship_weight_property: str | None = None, random_seed: int | None = None) EstimationResult¶
Returns an estimation of the memory consumption for that procedure.
- Parameters:
G (GraphV2 | dict[str, Any]) – Graph object to use or a dictionary representing the graph dimensions.
iterations (int) – Number of iterations to run.
negative_sampling_rate (int, default=5) – Number of negative samples for each positive sample
positive_sampling_factor (float, default=0.001) – Factor to multiply positive sampling weights
embedding_dimension (int, default=128) – The dimension of the generated embeddings
embedding_initializer (str, default="NORMALIZED") – Strategy for initializing node embeddings. Either “UNIFORM” or “NORMALIZED”
initial_learning_rate (float, default=0.025) – The initial learning rate
min_learning_rate (float, default=0.0001) – The minimum learning rate
window_size (int, default=10) – Size of the context window
negative_sampling_exponent (float, default=0.75) – Exponent for negative sampling probability distribution
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
concurrency (int | None) – Number of concurrent threads to use.
walk_length (int, default=80) – The length of each random walk
walks_per_node (int, default=10) – Number of walks to sample for each node
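in_out_factor (float, default=1.0) – Controls whether the walk stays close to the start node or fans out in the graph. Higher values keep the walk local.
return_factor (float, default=1.0) – Controls the likelihood of immediately returning to the previously visited node. Values below 1.0 make returning more likely.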
walk_buffer_size (int, default=1000) – Buffer size for walk sampling
relationship_weight_property (str | None) – Name of the property to be used as weights.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
- Returns:
Memory estimation details
- Return type:
EstimationResult
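The relationship between `iterations`, `initial_learning_rate`, and `min_learning_rate` can be sketched as a decay schedule. The snippet below assumes a linear decay with `min_learning_rate` as a floor; the exact schedule used by the procedure is not stated here, so treat both the shape of the decay and the helper name `learning_rate_schedule` as assumptions.

```python
def learning_rate_schedule(iterations, initial_learning_rate=0.025, min_learning_rate=0.0001):
    """Return one learning rate per iteration, assuming linear decay (an assumption)."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    # Size of the per-iteration decrease under the linear-decay assumption.
    step = (initial_learning_rate - min_learning_rate) / iterations
    return [
        # Never drop below the configured minimum.
        max(min_learning_rate, initial_learning_rate - step * i)
        for i in range(iterations)
    ]


rates = learning_rate_schedule(iterations=5)
```

Under this assumption, the default `iterations=1` trains with the initial learning rate only, so `min_learning_rate` starts to matter once more iterations are requested.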
- abstract mutate(G: GraphV2, mutate_property: str, iterations: int = 1, negative_sampling_rate: int = 5, positive_sampling_factor: float = 0.001, embedding_dimension: int = 128, embedding_initializer: str = 'NORMALIZED', initial_learning_rate: float = 0.025, min_learning_rate: float = 0.0001, window_size: int = 10, negative_sampling_exponent: float = 0.75, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None, walk_length: int = 80, walks_per_node: int = 10, in_out_factor: float = 1.0, return_factor: float = 1.0, walk_buffer_size: int = 1000, relationship_weight_property: str | None = None, random_seed: int | None = None) Node2VecMutateResult¶
Executes the Node2Vec algorithm and writes the results back to the graph as a node property.
- Parameters:
G (GraphV2) – Graph object to use
mutate_property (str) – Name of the node property to store the results in.
iterations (int) – Number of iterations to run.
negative_sampling_rate (int, default=5) – Number of negative samples for each positive sample
positive_sampling_factor (float, default=0.001) – Factor to multiply positive sampling weights
embedding_dimension (int, default=128) – The dimension of the generated embeddings
embedding_initializer (str, default="NORMALIZED") – Strategy for initializing node embeddings. Either “UNIFORM” or “NORMALIZED”
initial_learning_rate (float, default=0.025) – The initial learning rate
min_learning_rate (float, default=0.0001) – The minimum learning rate
window_size (int, default=10) – Size of the context window
negative_sampling_exponent (float, default=0.75) – Exponent for negative sampling probability distribution
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
walk_length (int, default=80) – The length of each random walk
walks_per_node (int, default=10) – Number of walks to sample for each node
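in_out_factor (float, default=1.0) – Controls whether the walk stays close to the start node or fans out in the graph. Higher values keep the walk local.
return_factor (float, default=1.0) – Controls the likelihood of immediately returning to the previously visited node. Values below 1.0 make returning more likely.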
walk_buffer_size (int, default=1000) – Buffer size for walk sampling
relationship_weight_property (str | None) – Name of the property to be used as weights.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
- Returns:
Algorithm metrics and statistics
- Return type:
Node2VecMutateResult
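As a rough illustration of `negative_sampling_exponent`, the sketch below turns raw occurrence counts into a sampling distribution proportional to `count ** exponent`, following the word2vec-style convention. Whether the procedure computes its distribution in exactly this way is an assumption; `negative_sampling_distribution` and the example counts are hypothetical.

```python
def negative_sampling_distribution(node_counts, exponent=0.75):
    """Probabilities proportional to count ** exponent (hypothetical helper)."""
    # Raising counts to a power below 1.0 flattens the distribution.
    scaled = {node: count ** exponent for node, count in node_counts.items()}
    total = sum(scaled.values())
    return {node: value / total for node, value in scaled.items()}


# Node "a" appears 16 times in the sampled walks, node "b" once.
probabilities = negative_sampling_distribution({"a": 16, "b": 1})
```

With an exponent of 1.0 the nodes would be sampled in proportion to their raw counts, and with 0.0 they would be sampled uniformly; values in between trade off the two.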
- abstract stream(G: GraphV2, iterations: int = 1, negative_sampling_rate: int = 5, positive_sampling_factor: float = 0.001, embedding_dimension: int = 128, embedding_initializer: str = 'NORMALIZED', initial_learning_rate: float = 0.025, min_learning_rate: float = 0.0001, window_size: int = 10, negative_sampling_exponent: float = 0.75, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None, walk_length: int = 80, walks_per_node: int = 10, in_out_factor: float = 1.0, return_factor: float = 1.0, walk_buffer_size: int = 1000, relationship_weight_property: str | None = None, random_seed: int | None = None) DataFrame¶
Executes the Node2Vec algorithm and returns the results as a stream.
- Parameters:
G (GraphV2) – Graph object to use
iterations (int) – Number of iterations to run.
negative_sampling_rate (int, default=5) – Number of negative samples for each positive sample
positive_sampling_factor (float, default=0.001) – Factor to multiply positive sampling weights
embedding_dimension (int, default=128) – The dimension of the generated embeddings
embedding_initializer (str, default="NORMALIZED") – Strategy for initializing node embeddings. Either “UNIFORM” or “NORMALIZED”
initial_learning_rate (float, default=0.025) – The initial learning rate
min_learning_rate (float, default=0.0001) – The minimum learning rate
window_size (int, default=10) – Size of the context window
negative_sampling_exponent (float, default=0.75) – Exponent for negative sampling probability distribution
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
walk_length (int, default=80) – The length of each random walk
walks_per_node (int, default=10) – Number of walks to sample for each node
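in_out_factor (float, default=1.0) – Controls whether the walk stays close to the start node or fans out in the graph. Higher values keep the walk local.
return_factor (float, default=1.0) – Controls the likelihood of immediately returning to the previously visited node. Values below 1.0 make returning more likely.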
walk_buffer_size (int, default=1000) – Buffer size for walk sampling
relationship_weight_property (str | None) – Name of the property to be used as weights.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
- Returns:
Embeddings as a stream with columns nodeId and embedding
- Return type:
DataFrame
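The effect of `return_factor` and `in_out_factor` on a walk can be sketched with a small second-order random walk over an unweighted adjacency dictionary. This is a minimal illustration of the general node2vec idea, not the library's sampler; `biased_walk` is a hypothetical helper, and treating `walk_length` as the number of nodes in the walk is an assumption.

```python
import random


def biased_walk(adjacency, start, walk_length, return_factor=1.0, in_out_factor=1.0, rng=None):
    """Sample one node2vec-style walk from `start` (hypothetical helper)."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbors = sorted(adjacency.get(current, ()))
        if not neighbors:
            break  # Dead end: no outgoing relationships.
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                # Stepping straight back to the previous node.
                weights.append(1.0 / return_factor)
            elif candidate in adjacency.get(previous, ()):
                # Staying at distance one from the previous node.
                weights.append(1.0)
            else:
                # Moving further away from the previous node.
                weights.append(1.0 / in_out_factor)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# A four-node ring; an infinite return_factor forbids immediate backtracking.
ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["a", "c"]}
walk = biased_walk(ring, "a", walk_length=9, return_factor=float("inf"), rng=random.Random(7))
```

In this sketch a larger `return_factor` lowers the weight of stepping back, while a larger `in_out_factor` lowers the weight of moving outward, which keeps the walk closer to where it started.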
- abstract write(G: GraphV2, write_property: str, iterations: int = 1, negative_sampling_rate: int = 5, positive_sampling_factor: float = 0.001, embedding_dimension: int = 128, embedding_initializer: str = 'NORMALIZED', initial_learning_rate: float = 0.025, min_learning_rate: float = 0.0001, window_size: int = 10, negative_sampling_exponent: float = 0.75, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None, walk_length: int = 80, walks_per_node: int = 10, in_out_factor: float = 1.0, return_factor: float = 1.0, walk_buffer_size: int = 1000, relationship_weight_property: str | None = None, random_seed: int | None = None, write_concurrency: int | None = None) Node2VecWriteResult¶
Executes the Node2Vec algorithm and writes the results back to the database.
- Parameters:
G (GraphV2) – Graph object to use
write_property (str) – Name of the node property to store the results in.
iterations (int) – Number of iterations to run.
negative_sampling_rate (int, default=5) – Number of negative samples for each positive sample
positive_sampling_factor (float, default=0.001) – Factor to multiply positive sampling weights
embedding_dimension (int, default=128) – The dimension of the generated embeddings
embedding_initializer (str, default="NORMALIZED") – Strategy for initializing node embeddings. Either “UNIFORM” or “NORMALIZED”
initial_learning_rate (float, default=0.025) – The initial learning rate
min_learning_rate (float, default=0.0001) – The minimum learning rate
window_size (int, default=10) – Size of the context window
negative_sampling_exponent (float, default=0.75) – Exponent for negative sampling probability distribution
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
walk_length (int, default=80) – The length of each random walk
walks_per_node (int, default=10) – Number of walks to sample for each node
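in_out_factor (float, default=1.0) – Controls whether the walk stays close to the start node or fans out in the graph. Higher values keep the walk local.
return_factor (float, default=1.0) – Controls the likelihood of immediately returning to the previously visited node. Values below 1.0 make returning more likely.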
walk_buffer_size (int, default=1000) – Buffer size for walk sampling
relationship_weight_property (str | None) – Name of the property to be used as weights.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
write_concurrency (int | None) – Number of concurrent threads to use for writing.
- Returns:
Algorithm metrics and statistics
- Return type:
Node2VecWriteResult
- pydantic model graphdatascience.procedure_surface.api.node_embedding.Node2VecMutateResult¶
- pydantic model graphdatascience.procedure_surface.api.node_embedding.Node2VecWriteResult¶