Pathfinding Algorithms¶

abstract add_logistic_regression(pipeline_name: str, *, batch_size: int | tuple[int, int] = 100, class_weights: list[float] | None = None, focus_weight: float | tuple[float, float] = 0.0, learning_rate: float | tuple[float, float] = 0.001, max_epochs: int | tuple[int, int] = 100, min_epochs: int | tuple[int, int] = 1, patience: int | tuple[int, int] = 1, penalty: float | tuple[float, float] = 0.0, tolerance: float | tuple[float, float] = 0.001) → LinkPredictionPipelineInfoResult¶

Add a logistic regression model candidate to the pipeline.

Parameters:

pipeline_name (str) – Name of the pipeline.
batch_size (int | tuple[int, int]) – Batch size to use during training. Pass a two-value tuple to define a parameter range.
class_weights (list[float] | None) – Optional class weights to use during training.
focus_weight (float | tuple[float, float]) – Focus weight for optimization. Pass a two-value tuple to define a parameter range.
learning_rate (float | tuple[float, float]) – Learning rate for optimization. Pass a two-value tuple to define a parameter range.
max_epochs (int | tuple[int, int]) – Maximum number of training epochs. Pass a two-value tuple to define a parameter range.
min_epochs (int | tuple[int, int]) – Minimum number of training epochs. Pass a two-value tuple to define a parameter range.
patience (int | tuple[int, int]) – Early stopping patience. Pass a two-value tuple to define a parameter range.
penalty (float | tuple[float, float]) – Penalty term to use during training. Pass a two-value tuple to define a parameter range.
tolerance (float | tuple[float, float]) – Convergence tolerance. Pass a two-value tuple to define a parameter range.

Returns:

The updated pipeline state.

Return type:

LinkPredictionPipelineInfoResult

abstract add_mlp(pipeline_name: str, *, batch_size: int | tuple[int, int] = 100, class_weights: list[float] | None = None, focus_weight: float | tuple[float, float] = 0.0, hidden_layer_sizes: list[int] = [100], learning_rate: float | tuple[float, float] = 0.001, max_epochs: int | tuple[int, int] = 100, min_epochs: int | tuple[int, int] = 1, patience: int | tuple[int, int] = 1, penalty: float | tuple[float, float] = 0.0, tolerance: float | tuple[float, float] = 0.001) → LinkPredictionPipelineInfoResult¶

Add a multi-layer perceptron model candidate to the pipeline.

Parameters:

pipeline_name (str) – Name of the pipeline.
batch_size (int | tuple[int, int]) – Batch size to use during training. Pass a two-value tuple to define a parameter range.
class_weights (list[float] | None) – Optional class weights to use during training.
focus_weight (float | tuple[float, float]) – Focus weight for optimization. Pass a two-value tuple to define a parameter range.
hidden_layer_sizes (list[int]) – Sizes of the hidden layers in the neural network.
learning_rate (float | tuple[float, float]) – Learning rate for optimization. Pass a two-value tuple to define a parameter range.
max_epochs (int | tuple[int, int]) – Maximum number of training epochs. Pass a two-value tuple to define a parameter range.
min_epochs (int | tuple[int, int]) – Minimum number of training epochs. Pass a two-value tuple to define a parameter range.
patience (int | tuple[int, int]) – Early stopping patience. Pass a two-value tuple to define a parameter range.
penalty (float | tuple[float, float]) – Penalty term to use during training. Pass a two-value tuple to define a parameter range.
tolerance (float | tuple[float, float]) – Convergence tolerance. Pass a two-value tuple to define a parameter range.

Returns:

The updated pipeline state.

Return type:

LinkPredictionPipelineInfoResult

abstract add_node_property(pipeline_name: str, task_name: str, **config: Any) → LinkPredictionPipelineInfoResult¶

Add a node property step to the pipeline.

Parameters:

pipeline_name (str) – Name of the pipeline.
task_name (str) – The task name of the node property step to add.
config (Any) – Additional configuration for the node property step.

Returns:

The updated pipeline state.

Return type:

LinkPredictionPipelineInfoResult

abstract add_random_forest(pipeline_name: str, *, criterion: str | None = 'GINI', max_depth: int | tuple[int, int] = 2147483647, max_features_ratio: float | tuple[float, float] | None = None, min_leaf_size: int | tuple[int, int] = 1, min_split_size: int | tuple[int, int] = 2, number_of_decision_trees: int | tuple[int, int] = 100, number_of_samples_ratio: float | tuple[float, float] = 1.0) → LinkPredictionPipelineInfoResult¶

Add a random forest model candidate to the pipeline.

Parameters:

pipeline_name (str) – Name of the pipeline.
criterion (str | None) – Split criterion to optimize.
max_depth (int | tuple[int, int]) – Maximum tree depth. Pass a two-value tuple to define a parameter range.
max_features_ratio (float | tuple[float, float] | None) – Fraction of features sampled per split. Pass a two-value tuple to define a parameter range.
min_leaf_size (int | tuple[int, int]) – Minimum number of samples in a leaf. Pass a two-value tuple to define a parameter range.
min_split_size (int | tuple[int, int]) – Minimum number of samples required to split a node. Pass a two-value tuple to define a parameter range.
number_of_decision_trees (int | tuple[int, int]) – Number of trees to train. Pass a two-value tuple to define a parameter range.
number_of_samples_ratio (float | tuple[float, float]) – Fraction of samples used per tree. Pass a two-value tuple to define a parameter range.

Returns:

The updated pipeline state.

Return type:

LinkPredictionPipelineInfoResult

abstract configure_auto_tuning(pipeline_name: str, *, max_trials: int = 10) → LinkPredictionPipelineInfoResult¶

Configure auto-tuning for the pipeline.

Parameters:

pipeline_name (str) – Name of the pipeline.
max_trials (int) – Maximum number of trials to run during auto-tuning.

Returns:

The updated pipeline state.

Return type:

LinkPredictionPipelineInfoResult

abstract configure_split(pipeline_name: str, *, negative_relationship_type: str | None = None, negative_sampling_ratio: float = 1.0, test_fraction: float = 0.1, train_fraction: float = 0.1, validation_folds: int = 3) → LinkPredictionPipelineInfoResult¶

Configure the train-test split used by the pipeline.

Parameters:

pipeline_name (str) – Name of the pipeline.
negative_relationship_type (str | None) – Relationship type to use for the negative samples.
negative_sampling_ratio (float) – Ratio of sampled negative relationships.
test_fraction (float) – Fraction of relationships reserved for testing.
train_fraction (float) – Fraction of relationships reserved for training.
validation_folds (int) – Number of validation folds to use.

Returns:

The updated pipeline state.

Return type:

LinkPredictionPipelineInfoResult

abstract create(pipeline_name: str) → tuple[LinkPredictionPipeline, LinkPredictionPipelineInfoResult]¶

Create a new link prediction pipeline.

Parameters: pipeline_name (str) – Name of the pipeline.
Returns: The created pipeline and the corresponding result payload.
Return type: tuple[LinkPredictionPipeline, LinkPredictionPipelineInfoResult]
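
The name-based endpoints above can be chained roughly as follows. This is a minimal sketch, assuming `endpoints` is some object implementing this interface (how it is obtained is not covered in this section). The pipeline name, the `"degree"` task name, and its `mutateProperty` config key are illustrative assumptions, not values confirmed by this reference.

```python
from typing import Any


def configure_link_prediction(endpoints: Any, pipeline_name: str) -> tuple[Any, Any]:
    """Create a pipeline and register one model candidate by name."""
    # create() returns the pipeline object together with its info result.
    pipeline, _created_info = endpoints.create(pipeline_name)

    # Hypothetical node property step: task name and config key are assumptions.
    endpoints.add_node_property(pipeline_name, "degree", mutateProperty="degree")

    # Keyword-only split settings, using parameter names from the signature.
    endpoints.configure_split(pipeline_name, test_fraction=0.2, validation_folds=5)

    # A scalar fixes a value; a two-value tuple defines a parameter range.
    info = endpoints.add_logistic_regression(
        pipeline_name, max_epochs=200, penalty=(0.01, 1.0)
    )
    return pipeline, info
```

Each call returns the updated pipeline state, so only the last result is kept here.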

abstract get(pipeline_name: str) → LinkPredictionPipeline¶

Retrieve an existing link prediction pipeline by name.

Parameters: pipeline_name (str) – Name of the pipeline.
Returns: The reconstructed pipeline object.
Return type: LinkPredictionPipeline

abstract property predict: LinkPredictionPipelinePredictEndpoints¶

Access prediction endpoints for link prediction models trained from this surface.

abstract property train: LinkPredictionPipelineTrainEndpoints¶

Access training endpoints for link prediction pipelines.

pydantic model graphdatascience.procedure_surface.api.pipeline.LinkPredictionPipelineInfoResult¶

field auto_tuning_config: dict[str, Any]¶

field feature_steps: list[Any]¶

field name: str¶

field node_property_steps: list[Any]¶

field parameter_space: dict[str, Any]¶

field split_config: dict[str, Any]¶

class graphdatascience.procedure_surface.api.pipeline.LinkPredictionPipelinePredictEndpoints¶

pydantic model graphdatascience.procedure_surface.api.pipeline.LinkPredictionPipelinePredictMutateResult¶

field compute_millis: int | None¶

field configuration: dict[str, Any] | None¶

field mutate_millis: int | None¶

field post_processing_millis: int | None¶

field pre_processing_millis: int | None¶

field probability_distribution: dict[str, Any] | None¶

field relationships_written: int | None¶

field sampling_stats: dict[str, Any] | None¶

pydantic model graphdatascience.procedure_surface.api.pipeline.LinkPredictionPipelineTrainResult¶

field configuration: dict[str, Any]¶

field model_info: LinkPredictionModelInfoResult¶

field model_selection_stats: dict[str, Any]¶

field train_millis: int¶

class graphdatascience.procedure_surface.api.pipeline.NodeClassificationPipeline¶

Represents a node classification training pipeline.

Construct this using gds.v2.pipeline.node_classification.create().

add_logistic_regression(*, batch_size: int | tuple[int, int] = 100, class_weights: list[float] | None = None, focus_weight: float | tuple[float, float] = 0.0, learning_rate: float | tuple[float, float] = 0.001, max_epochs: int | tuple[int, int] = 100, min_epochs: int | tuple[int, int] = 1, patience: int | tuple[int, int] = 1, penalty: float | tuple[float, float] = 0.0, tolerance: float | tuple[float, float] = 0.001) → NodeClassificationPipelineInfoResult¶

Add a logistic regression model candidate to the pipeline.

Parameters:

batch_size (int | tuple[int, int]) – Batch size to use during training. Pass a two-value tuple to define a parameter range.
class_weights (list[float] | None) – Optional class weights to use during training.
focus_weight (float | tuple[float, float]) – Focus weight for optimization. Pass a two-value tuple to define a parameter range.
learning_rate (float | tuple[float, float]) – Learning rate for optimization. Pass a two-value tuple to define a parameter range.
max_epochs (int | tuple[int, int]) – Maximum number of training epochs. Pass a two-value tuple to define a parameter range.
min_epochs (int | tuple[int, int]) – Minimum number of training epochs. Pass a two-value tuple to define a parameter range.
patience (int | tuple[int, int]) – Early stopping patience. Pass a two-value tuple to define a parameter range.
penalty (float | tuple[float, float]) – Penalty term to use during training. Pass a two-value tuple to define a parameter range.
tolerance (float | tuple[float, float]) – Convergence tolerance. Pass a two-value tuple to define a parameter range.

Returns:

The updated pipeline state.

Return type:

NodeClassificationPipelineInfoResult

add_mlp(*, batch_size: int | tuple[int, int] = 100, class_weights: list[float] | None = None, focus_weight: float | tuple[float, float] = 0.0, hidden_layer_sizes: list[int] = [100], learning_rate: float | tuple[float, float] = 0.001, max_epochs: int | tuple[int, int] = 100, min_epochs: int | tuple[int, int] = 1, patience: int | tuple[int, int] = 1, penalty: float | tuple[float, float] = 0.0, tolerance: float | tuple[float, float] = 0.001) → NodeClassificationPipelineInfoResult¶

Add a multi-layer perceptron model candidate to the pipeline.

Parameters:

batch_size (int | tuple[int, int]) – Batch size to use during training. Pass a two-value tuple to define a parameter range.
class_weights (list[float] | None) – Optional class weights to use during training.
focus_weight (float | tuple[float, float]) – Focus weight for optimization. Pass a two-value tuple to define a parameter range.
hidden_layer_sizes (list[int]) – Sizes of the hidden layers in the neural network.
learning_rate (float | tuple[float, float]) – Learning rate for optimization. Pass a two-value tuple to define a parameter range.
max_epochs (int | tuple[int, int]) – Maximum number of training epochs. Pass a two-value tuple to define a parameter range.
min_epochs (int | tuple[int, int]) – Minimum number of training epochs. Pass a two-value tuple to define a parameter range.
patience (int | tuple[int, int]) – Early stopping patience. Pass a two-value tuple to define a parameter range.
penalty (float | tuple[float, float]) – Penalty term to use during training. Pass a two-value tuple to define a parameter range.
tolerance (float | tuple[float, float]) – Convergence tolerance. Pass a two-value tuple to define a parameter range.

Returns:

The updated pipeline state.

Return type:

NodeClassificationPipelineInfoResult

add_node_property(task_name: str, **config: Any) → NodeClassificationPipelineInfoResult¶

Add a node property step to the pipeline.

Parameters:

task_name (str) – The task name of the node property step to add.
config (Any) – Additional configuration for the node property step.

Returns:

The updated pipeline state.

Return type:

NodeClassificationPipelineInfoResult

add_random_forest(*, criterion: str | None = 'GINI', max_depth: int | tuple[int, int] = 2147483647, max_features_ratio: float | tuple[float, float] | None = None, min_leaf_size: int | tuple[int, int] = 1, min_split_size: int | tuple[int, int] = 2, number_of_decision_trees: int | tuple[int, int] = 100, number_of_samples_ratio: float | tuple[float, float] = 1.0) → NodeClassificationPipelineInfoResult¶

Add a random forest model candidate to the pipeline.

Parameters:

criterion (str | None) – Split criterion to optimize.
max_depth (int | tuple[int, int]) – Maximum tree depth. Pass a two-value tuple to define a parameter range.
max_features_ratio (float | tuple[float, float] | None) – Fraction of features sampled per split. Pass a two-value tuple to define a parameter range.
min_leaf_size (int | tuple[int, int]) – Minimum number of samples in a leaf. Pass a two-value tuple to define a parameter range.
min_split_size (int | tuple[int, int]) – Minimum number of samples required to split a node. Pass a two-value tuple to define a parameter range.
number_of_decision_trees (int | tuple[int, int]) – Number of trees to train. Pass a two-value tuple to define a parameter range.
number_of_samples_ratio (float | tuple[float, float]) – Fraction of samples used per tree. Pass a two-value tuple to define a parameter range.

Returns:

The updated pipeline state.

Return type:

NodeClassificationPipelineInfoResult

configure_auto_tuning(*, max_trials: int = 10) → NodeClassificationPipelineInfoResult¶

Configure auto-tuning for the pipeline.

Parameters: max_trials (int) – Maximum number of trials to run during auto-tuning.
Returns: The updated pipeline state.
Return type: NodeClassificationPipelineInfoResult
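
As a rough illustration of the model-candidate methods on the pipeline object, the sketch below registers one candidate with fixed values and one with ranged values, then caps the auto-tuning budget. It assumes `pipeline` behaves like a `NodeClassificationPipeline`, and that a two-value tuple is read as (lower, upper); the ordering is an assumption, since the reference only says the tuple defines a range.

```python
from typing import Any


def add_candidates(pipeline: Any) -> Any:
    """Register two model candidates and limit auto-tuning trials."""
    # Fixed hyperparameters: every value is a scalar.
    pipeline.add_random_forest(number_of_decision_trees=50, max_depth=8)

    # Ranged hyperparameters: each tuple is assumed to be (lower, upper).
    pipeline.add_logistic_regression(
        learning_rate=(0.0001, 0.01), penalty=(0.0, 1.0)
    )

    # Cap the number of auto-tuning trials; returns the updated state.
    return pipeline.configure_auto_tuning(max_trials=5)
```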

configure_split(*, test_fraction: float = 0.3, validation_folds: int = 3) → NodeClassificationPipelineInfoResult¶

Configure the train-test split used by the pipeline.

Parameters:

test_fraction (float) – Fraction of nodes reserved for testing.
validation_folds (int) – Number of validation folds to use.

Returns:

The updated pipeline state.

Return type:

NodeClassificationPipelineInfoResult

drop(fail_if_missing: bool = False) → PipelineCatalogEntryProtocol | None¶

Drop the pipeline and return its catalog entry when available.

Parameters: fail_if_missing (bool) – Whether to raise an error when the pipeline does not exist.
Return type: PipelineCatalogEntryProtocol | None

exists() → bool¶

Return whether the pipeline exists.

Return type: bool

name() → str¶

Return the pipeline name.

Return type: str

select_features(node_properties: str | list[str]) → NodeClassificationPipelineInfoResult¶

Select the node properties used as input features.

Parameters: node_properties (str | list[str]) – One or more node properties to use as features.
Returns: The updated pipeline state.
Return type: NodeClassificationPipelineInfoResult
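
The `str | list[str]` argument means a single property name and a list of names are both accepted. The helper below is a hypothetical sketch of how such a union could be normalized to a list; it is not taken from the library and is only meant to show the two accepted shapes.

```python
from __future__ import annotations


def as_property_list(node_properties: str | list[str]) -> list[str]:
    """Return the properties as a list, wrapping a single name if needed."""
    # A bare string is one property name, not a sequence of characters.
    if isinstance(node_properties, str):
        return [node_properties]
    # Copy so the caller's list is not shared.
    return list(node_properties)
```

Under this reading, `select_features("age")` and `select_features(["age"])` would describe the same feature set.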

train(G: GraphV2, *, metrics: list[str], model_name: str, target_property: str, relationship_types: list[str] = ['*'], target_node_labels: list[str] = ['*'], store_model_to_disk: bool = False, random_seed: Any | None = None, username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None) → tuple[NodeClassificationModelV2, NodeClassificationPipelineTrainResult]¶

Train a node classification model from this pipeline.

Parameters:

G (GraphV2) – Graph object to use.
metrics (list[str]) – Metrics to optimize for.
model_name (str) – Name of the trained model.
target_property (str) – The target node property to predict.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
target_node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
store_model_to_disk (bool) – Whether to persist the trained model to disk.
random_seed (Any | None) – Seed for random number generation to ensure reproducible results.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

The trained model and the corresponding training result.

Return type:

tuple[NodeClassificationModelV2, NodeClassificationPipelineTrainResult]
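
A minimal sketch of a training call, assuming `pipeline` is a configured `NodeClassificationPipeline` and `G` is a projected graph object. The metric name `"ACCURACY"` and the `"label"` target property are assumptions for illustration; this section does not list the accepted metric strings.

```python
from typing import Any


def train_and_report(pipeline: Any, G: Any, model_name: str) -> tuple[Any, int]:
    """Train a model and return it with the reported training time."""
    # metrics, model_name and target_property are required keyword arguments.
    model, result = pipeline.train(
        G,
        metrics=["ACCURACY"],       # assumed metric name
        model_name=model_name,
        target_property="label",    # hypothetical node property
        random_seed=42,             # fixed seed for reproducible results
    )
    # train_millis is a field of NodeClassificationPipelineTrainResult.
    return model, result.train_millis
```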

train_estimate(G: GraphV2, *, metrics: list[str], model_name: str, target_property: str, relationship_types: list[str] = ['*'], target_node_labels: list[str] = ['*'], store_model_to_disk: bool = False, random_seed: Any | None = None, username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None) → EstimationResult¶

Estimate the memory required to train a node classification model from this pipeline.

Parameters:

G (GraphV2) – Graph object to use.
metrics (list[str]) – Metrics to optimize for.
model_name (str) – Name of the trained model.
target_property (str) – The target node property to predict.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
target_node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
store_model_to_disk (bool) – Whether to persist the trained model to disk.
random_seed (Any | None) – Seed for random number generation to ensure reproducible results.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

The estimated memory footprint for training.

Return type:

EstimationResult

class graphdatascience.procedure_surface.api.pipeline.NodeClassificationPipelineEndpoints¶

abstract add_logistic_regression(pipeline_name: str, *, batch_size: int | tuple[int, int] = 100, class_weights: list[float] | None = None, focus_weight: float | tuple[float, float] = 0.0, learning_rate: float | tuple[float, float] = 0.001, max_epochs: int | tuple[int, int] = 100, min_epochs: int | tuple[int, int] = 1, patience: int | tuple[int, int] = 1, penalty: float | tuple[float, float] = 0.0, tolerance: float | tuple[float, float] = 0.001) → NodeClassificationPipelineInfoResult¶

Add a logistic regression model candidate to the pipeline.

Parameters:

pipeline_name (str) – Name of the pipeline.
batch_size (int | tuple[int, int]) – Batch size to use during training. Pass a two-value tuple to define a parameter range.
class_weights (list[float] | None) – Optional class weights to use during training.
focus_weight (float | tuple[float, float]) – Focus weight for optimization. Pass a two-value tuple to define a parameter range.
learning_rate (float | tuple[float, float]) – Learning rate for optimization. Pass a two-value tuple to define a parameter range.
max_epochs (int | tuple[int, int]) – Maximum number of training epochs. Pass a two-value tuple to define a parameter range.
min_epochs (int | tuple[int, int]) – Minimum number of training epochs. Pass a two-value tuple to define a parameter range.
patience (int | tuple[int, int]) – Early stopping patience. Pass a two-value tuple to define a parameter range.
penalty (float | tuple[float, float]) – Penalty term to use during training. Pass a two-value tuple to define a parameter range.
tolerance (float | tuple[float, float]) – Convergence tolerance. Pass a two-value tuple to define a parameter range.

Returns:

The updated pipeline state.

Return type:

NodeClassificationPipelineInfoResult

abstract add_mlp(pipeline_name: str, *, batch_size: int | tuple[int, int] = 100, class_weights: list[float] | None = None, focus_weight: float | tuple[float, float] = 0.0, hidden_layer_sizes: list[int] = [100], learning_rate: float | tuple[float, float] = 0.001, max_epochs: int | tuple[int, int] = 100, min_epochs: int | tuple[int, int] = 1, patience: int | tuple[int, int] = 1, penalty: float | tuple[float, float] = 0.0, tolerance: float | tuple[float, float] = 0.001) → NodeClassificationPipelineInfoResult¶

Add a multi-layer perceptron model candidate to the pipeline.

Parameters:

pipeline_name (str) – Name of the pipeline.
batch_size (int | tuple[int, int]) – Batch size to use during training. Pass a two-value tuple to define a parameter range.
class_weights (list[float] | None) – Optional class weights to use during training.
focus_weight (float | tuple[float, float]) – Focus weight for optimization. Pass a two-value tuple to define a parameter range.
hidden_layer_sizes (list[int]) – Sizes of the hidden layers in the neural network.
learning_rate (float | tuple[float, float]) – Learning rate for optimization. Pass a two-value tuple to define a parameter range.
max_epochs (int | tuple[int, int]) – Maximum number of training epochs. Pass a two-value tuple to define a parameter range.
min_epochs (int | tuple[int, int]) – Minimum number of training epochs. Pass a two-value tuple to define a parameter range.
patience (int | tuple[int, int]) – Early stopping patience. Pass a two-value tuple to define a parameter range.
penalty (float | tuple[float, float]) – Penalty term to use during training. Pass a two-value tuple to define a parameter range.
tolerance (float | tuple[float, float]) – Convergence tolerance. Pass a two-value tuple to define a parameter range.

Returns:

The updated pipeline state.

Return type:

NodeClassificationPipelineInfoResult

abstract add_node_property(pipeline_name: str, task_name: str, **config: Any) → NodeClassificationPipelineInfoResult¶

Add a node property step to the pipeline.

Parameters:

pipeline_name (str) – Name of the pipeline.
task_name (str) – The task name of the node property step to add.
config (Any) – Additional configuration for the node property step.

Returns:

The updated pipeline state.

Return type:

NodeClassificationPipelineInfoResult

abstract add_random_forest(pipeline_name: str, *, criterion: str | None = 'GINI', max_depth: int | tuple[int, int] = 2147483647, max_features_ratio: float | tuple[float, float] | None = None, min_leaf_size: int | tuple[int, int] = 1, min_split_size: int | tuple[int, int] = 2, number_of_decision_trees: int | tuple[int, int] = 100, number_of_samples_ratio: float | tuple[float, float] = 1.0) → NodeClassificationPipelineInfoResult¶

Add a random forest model candidate to the pipeline.

Parameters:

pipeline_name (str) – Name of the pipeline.
criterion (str | None) – Split criterion to optimize.
max_depth (int | tuple[int, int]) – Maximum tree depth. Pass a two-value tuple to define a parameter range.
max_features_ratio (float | tuple[float, float] | None) – Fraction of features sampled per split. Pass a two-value tuple to define a parameter range.
min_leaf_size (int | tuple[int, int]) – Minimum number of samples in a leaf. Pass a two-value tuple to define a parameter range.
min_split_size (int | tuple[int, int]) – Minimum number of samples required to split a node. Pass a two-value tuple to define a parameter range.
number_of_decision_trees (int | tuple[int, int]) – Number of trees to train. Pass a two-value tuple to define a parameter range.
number_of_samples_ratio (float | tuple[float, float]) – Fraction of samples used per tree. Pass a two-value tuple to define a parameter range.

Returns:

The updated pipeline state.

Return type:

NodeClassificationPipelineInfoResult

abstract configure_auto_tuning(pipeline_name: str, *, max_trials: int = 10) → NodeClassificationPipelineInfoResult¶

Configure auto-tuning for the pipeline.

Parameters:

pipeline_name (str) – Name of the pipeline.
max_trials (int) – Maximum number of trials to run during auto-tuning.

Returns:

The updated pipeline state.

Return type:

NodeClassificationPipelineInfoResult

abstract configure_split(pipeline_name: str, *, test_fraction: float = 0.3, validation_folds: int = 3) → NodeClassificationPipelineInfoResult¶

Configure the train-test split used by the pipeline.

Parameters:

pipeline_name (str) – Name of the pipeline.
test_fraction (float) – Fraction of nodes reserved for testing.
validation_folds (int) – Number of validation folds to use.

Returns:

The updated pipeline state.

Return type:

NodeClassificationPipelineInfoResult

abstract create(pipeline_name: str) → tuple[NodeClassificationPipeline, NodeClassificationPipelineInfoResult]¶

Create a new node classification pipeline.

Parameters: pipeline_name (str) – Name of the pipeline.
Returns: The created pipeline and the corresponding result payload.
Return type: tuple[NodeClassificationPipeline, NodeClassificationPipelineInfoResult]

abstract get(pipeline_name: str) → NodeClassificationPipeline¶

Retrieve an existing node classification pipeline by name.

Parameters: pipeline_name (str) – Name of the pipeline.
Returns: The reconstructed pipeline object.
Return type: NodeClassificationPipeline
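
The endpoint methods in this class have the same shape as the methods on `NodeClassificationPipeline`, with `pipeline_name` added as the first argument. One way to picture that relationship is a thin wrapper that binds a name to the endpoints. The `BoundPipeline` class below is hypothetical and only illustrates the idea; this reference does not say the library is implemented this way.

```python
from __future__ import annotations

from typing import Any


class BoundPipeline:
    """Hypothetical wrapper binding a pipeline name to name-based endpoints."""

    def __init__(self, name: str, endpoints: Any) -> None:
        self._name = name
        self._endpoints = endpoints

    def name(self) -> str:
        return self._name

    def configure_split(
        self, *, test_fraction: float = 0.3, validation_folds: int = 3
    ) -> Any:
        # Forward to the endpoint, supplying the bound pipeline name.
        return self._endpoints.configure_split(
            self._name,
            test_fraction=test_fraction,
            validation_folds=validation_folds,
        )

    def select_features(self, node_properties: str | list[str]) -> Any:
        return self._endpoints.select_features(self._name, node_properties)
```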

abstract property predict: NodeClassificationPipelinePredictEndpoints¶

Access prediction endpoints for node classification models trained from this surface.

abstract select_features(pipeline_name: str, node_properties: str | list[str]) → NodeClassificationPipelineInfoResult¶

Select the node properties used as input features.

Parameters:

pipeline_name (str) – Name of the pipeline.
node_properties (str | list[str]) – One or more node properties to use as features.

Returns:

The updated pipeline state.

Return type:

NodeClassificationPipelineInfoResult

abstract property train: NodeClassificationPipelineTrainEndpoints¶

Access training endpoints for node classification pipelines.

pydantic model graphdatascience.procedure_surface.api.pipeline.NodeClassificationPipelineInfoResult¶

field auto_tuning_config: dict[str, Any]¶

field feature_properties: list[Any]¶

field name: str¶

field node_property_steps: list[Any]¶

field parameter_space: dict[str, Any]¶

field split_config: dict[str, Any]¶

class graphdatascience.procedure_surface.api.pipeline.NodeClassificationPipelinePredictEndpoints¶

abstract estimate(G: GraphV2, model_name: str, *, relationship_types: list[str] | None = None, target_node_labels: list[str] | None = None, username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None) → EstimationResult¶

Estimate the memory required to run node classification prediction.

Parameters:

G (GraphV2) – Graph object to use.
model_name (str) – Name of the model.
relationship_types (list[str] | None) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
target_node_labels (list[str] | None) – Optional node label filter.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

The estimated memory footprint for prediction.

Return type:

EstimationResult

abstract mutate(G: GraphV2, model_name: str, mutate_property: str, *, relationship_types: list[str] | None = None, target_node_labels: list[str] | None = None, predicted_probability_property: str | None = None, username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None) → NodeClassificationPipelinePredictMutateResult¶

Run node classification prediction in mutate mode.

Parameters:

G (GraphV2) – Graph object to use.
model_name (str) – Name of the model.
mutate_property (str) – Name of the node property to store the results in.
relationship_types (list[str] | None) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
target_node_labels (list[str] | None) – Optional node label filter.
predicted_probability_property (str | None) – Optional node property to store the predicted probability distribution in.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

The mutate result summary.

Return type:

NodeClassificationPipelinePredictMutateResult

abstract stream(G: GraphV2, model_name: str, *, relationship_types: list[str] | None = None, target_node_labels: list[str] | None = None, include_predicted_probabilities: bool = False, username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None) → DataFrame¶

Run node classification prediction in stream mode.

Parameters:

G (GraphV2) – Graph object to use.
model_name (str) – Name of the model.
relationship_types (list[str] | None) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
target_node_labels (list[str] | None) – Optional node label filter.
include_predicted_probabilities (bool) – Whether to include the predicted probability distribution in the streamed results.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

The prediction results as a DataFrame.

Return type:

DataFrame

abstract write(G: GraphV2, model_name: str, write_property: str, *, relationship_types: list[str] | None = None, target_node_labels: list[str] | None = None, predicted_probability_property: str | None = None, username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, write_concurrency: int | None = None, job_id: str | None = None) → NodeClassificationPipelinePredictWriteResult¶

Run node classification prediction in write mode.

Parameters:

G (GraphV2) – Graph object to use.
model_name (str) – Name of the model.
write_property (str) – Name of the node property to store the results in.
relationship_types (list[str] | None) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
target_node_labels (list[str] | None) – Optional node label filter.
predicted_probability_property (str | None) – Optional node property to store the predicted probability distribution in.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
write_concurrency (int | None) – Number of concurrent threads to use for writing.
job_id (str | None) – Identifier for the computation.

Returns:

The write result summary.

Return type:

NodeClassificationPipelinePredictWriteResult
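
The prediction modes above share the same leading arguments (`G`, `model_name`), so an estimate can be requested before running a mode. The sketch below assumes `predict` is an object implementing `NodeClassificationPipelinePredictEndpoints`. The property names `"predicted_class"` and `"class_probabilities"` are hypothetical.

```python
from __future__ import annotations

from typing import Any


def predict_into_graph(predict: Any, G: Any, model_name: str) -> int | None:
    """Estimate memory, then store predictions in the in-memory graph."""
    # The estimate is requested but not inspected: the fields of
    # EstimationResult are not documented in this section.
    predict.estimate(G, model_name)

    result = predict.mutate(
        G,
        model_name,
        "predicted_class",  # hypothetical mutate property
        predicted_probability_property="class_probabilities",  # hypothetical
    )
    # node_properties_written is typed int | None on the mutate result.
    return result.node_properties_written
```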

pydantic model graphdatascience.procedure_surface.api.pipeline.NodeClassificationPipelinePredictMutateResult¶

field compute_millis: int | None¶

field configuration: dict[str, Any] | None¶

field mutate_millis: int | None¶

field node_properties_written: int | None¶

field post_processing_millis: int | None¶

field pre_processing_millis: int | None¶

pydantic model graphdatascience.procedure_surface.api.pipeline.NodeClassificationPipelinePredictWriteResult¶

field compute_millis: int | None¶

field configuration: dict[str, Any] | None¶

field node_properties_written: int | None¶

field post_processing_millis: int | None¶

field pre_processing_millis: int | None¶

field write_millis: int | None¶
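
Every timing field on the two prediction result models is typed int | None, so code that adds them up has to tolerate missing values. The following is a minimal sketch, not library code: WriteResultStandIn is a hypothetical dataclass that copies only the timing fields listed above, and total_millis is an illustrative helper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WriteResultStandIn:
    """Hypothetical stand-in mirroring the documented timing fields."""

    pre_processing_millis: int | None = None
    compute_millis: int | None = None
    write_millis: int | None = None
    post_processing_millis: int | None = None


def total_millis(result: WriteResultStandIn) -> int:
    """Sum all timing phases, counting a missing (None) phase as zero."""
    phases = (
        result.pre_processing_millis,
        result.compute_millis,
        result.write_millis,
        result.post_processing_millis,
    )
    return sum(phase or 0 for phase in phases)


# One phase is missing, so it contributes nothing to the total.
example = WriteResultStandIn(
    pre_processing_millis=5, compute_millis=20, post_processing_millis=3
)
example_total = total_millis(example)
```

The same pattern would apply to the mutate result by reading mutate_millis instead of write_millis.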

pydantic model graphdatascience.procedure_surface.api.pipeline.NodeClassificationPipelineTrainResult¶

field configuration: dict[str, Any]¶

field model_info: NodeClassificationModelInfoResult¶

field model_selection_stats: dict[str, Any]¶

field train_millis: int¶

enum graphdatascience.procedure_surface.api.pipeline.NodeRegressionMetric(value)¶

Member Type: str

Valid values are as follows:

MEAN_SQUARED_ERROR = <NodeRegressionMetric.MEAN_SQUARED_ERROR: 'MEAN_SQUARED_ERROR'>¶

ROOT_MEAN_SQUARED_ERROR = <NodeRegressionMetric.ROOT_MEAN_SQUARED_ERROR: 'ROOT_MEAN_SQUARED_ERROR'>¶

MEAN_ABSOLUTE_ERROR = <NodeRegressionMetric.MEAN_ABSOLUTE_ERROR: 'MEAN_ABSOLUTE_ERROR'>¶

The Enum and its members also have the following methods:

__new__(value)¶
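
Since each member's value is the same string as its name, a metric can be looked up from a plain string. The sketch below uses a local stand-in enum that copies the three members listed above. NodeRegressionMetricStandIn and normalize_metrics are hypothetical names, and deriving the stand-in from str is an assumption based on the documented member type.

```python
from enum import Enum


class NodeRegressionMetricStandIn(str, Enum):
    """Hypothetical copy of the documented members, for illustration only."""

    MEAN_SQUARED_ERROR = "MEAN_SQUARED_ERROR"
    ROOT_MEAN_SQUARED_ERROR = "ROOT_MEAN_SQUARED_ERROR"
    MEAN_ABSOLUTE_ERROR = "MEAN_ABSOLUTE_ERROR"


def normalize_metrics(metrics):
    """Turn a mix of plain strings and enum members into enum members.

    Calling the enum with a member returns that member unchanged; calling it
    with an unknown string raises ValueError.
    """
    return [NodeRegressionMetricStandIn(metric) for metric in metrics]


mixed = ["MEAN_SQUARED_ERROR", NodeRegressionMetricStandIn.MEAN_ABSOLUTE_ERROR]
normalized = normalize_metrics(mixed)
```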

class graphdatascience.procedure_surface.api.pipeline.NodeRegressionPipeline¶

Represents a node regression training pipeline.

Construct this using gds.v2.pipeline.node_regression.create().

add_linear_regression(*, batch_size: int | tuple[int, int] = 100, learning_rate: float | tuple[float, float] = 0.001, max_epochs: int | tuple[int, int] = 100, min_epochs: int | tuple[int, int] = 1, patience: int | tuple[int, int] = 1, penalty: float | tuple[float, float] = 0.0, tolerance: float | tuple[float, float] = 0.001) → NodeRegressionPipelineInfoResult¶

Add a linear regression model candidate to the pipeline.

Parameters:

batch_size (int | tuple[int, int]) – Batch size to use during training. Pass a two-value tuple to define a parameter range.
learning_rate (float | tuple[float, float]) – Learning rate for optimization. Pass a two-value tuple to define a parameter range.
max_epochs (int | tuple[int, int]) – Maximum number of training epochs. Pass a two-value tuple to define a parameter range.
min_epochs (int | tuple[int, int]) – Minimum number of training epochs. Pass a two-value tuple to define a parameter range.
patience (int | tuple[int, int]) – Early stopping patience. Pass a two-value tuple to define a parameter range.
penalty (float | tuple[float, float]) – Penalty term to use during training. Pass a two-value tuple to define a parameter range.
tolerance (float | tuple[float, float]) – Convergence tolerance. Pass a two-value tuple to define a parameter range.

Returns:

The updated pipeline state.

Return type:

NodeRegressionPipelineInfoResult
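
Each tunable parameter of add_linear_regression accepts either a single value or a two-value tuple that defines a range. The sketch below shows one way to check a set of keyword arguments before passing them on. classify_param is a hypothetical helper, and the rule that the first tuple value must not exceed the second is an assumption, not something this reference states.

```python
from numbers import Real


def classify_param(name, value):
    """Return "fixed" for a single number or "range" for a two-value tuple."""
    if isinstance(value, tuple):
        if len(value) != 2:
            raise ValueError(f"{name}: a range needs exactly two values")
        low, high = value
        # Assumption: a range is written as (low, high).
        if low > high:
            raise ValueError(f"{name}: range bounds are out of order")
        return "range"
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, Real) and not isinstance(value, bool):
        return "fixed"
    raise TypeError(f"{name}: expected a number or a two-value tuple")


# Keyword arguments using only parameter names documented above.
candidate = {
    "learning_rate": (0.0001, 0.01),  # searched over a range
    "penalty": 0.0,                   # fixed value
    "max_epochs": (50, 200),          # searched over a range
}
kinds = {name: classify_param(name, value) for name, value in candidate.items()}
```

Because every name in candidate is a documented keyword-only parameter, the dictionary could then be unpacked into the call as add_linear_regression(**candidate) on a pipeline object.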

add_node_property(task_name: str, **config: Any) → NodeRegressionPipelineInfoResult¶

Add a node property step to the pipeline.

Parameters:

task_name (str) – The name of the node property step to add.
config (Any) – Additional configuration for the node property step.

Returns:

The updated pipeline state.

Return type:

NodeRegressionPipelineInfoResult

add_random_forest(*, max_depth: int | tuple[int, int] = 2147483647, max_features_ratio: float | tuple[float, float] | None = None, min_leaf_size: int | tuple[int, int] = 1, min_split_size: int | tuple[int, int] = 2, number_of_decision_trees: int | tuple[int, int] = 100, number_of_samples_ratio: float | tuple[float, float] = 1.0) → NodeRegressionPipelineInfoResult¶

Add a random forest model candidate to the pipeline.

Parameters:

max_depth (int | tuple[int, int]) – Maximum tree depth. Pass a two-value tuple to define a parameter range.
max_features_ratio (float | tuple[float, float] | None) – Fraction of features sampled per split. Pass a two-value tuple to define a parameter range.
min_leaf_size (int | tuple[int, int]) – Minimum number of samples in a leaf. Pass a two-value tuple to define a parameter range.
min_split_size (int | tuple[int, int]) – Minimum number of samples required to split a node. Pass a two-value tuple to define a parameter range.
number_of_decision_trees (int | tuple[int, int]) – Number of trees to train. Pass a two-value tuple to define a parameter range.
number_of_samples_ratio (float | tuple[float, float]) – Fraction of samples used per tree. Pass a two-value tuple to define a parameter range.

Returns:

The updated pipeline state.

Return type:

NodeRegressionPipelineInfoResult
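
The default max_depth of 2147483647 is the largest signed 32-bit integer. Reading that default as "no practical depth limit" is an interpretation rather than something this reference states. The short check below confirms the arithmetic; INT32_MAX and is_default_max_depth are illustrative names.

```python
# Largest value representable as a signed 32-bit integer.
INT32_MAX = 2**31 - 1

# Default taken from the add_random_forest signature above.
DEFAULT_MAX_DEPTH = 2147483647


def is_default_max_depth(max_depth):
    """Return True when max_depth equals the documented default."""
    return max_depth == DEFAULT_MAX_DEPTH
```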

configure_auto_tuning(*, max_trials: int = 10) → NodeRegressionPipelineInfoResult¶

Configure auto-tuning for the pipeline.

Parameters: max_trials (int) – Maximum number of trials to run during auto-tuning.
Returns: The updated pipeline state.
Return type: NodeRegressionPipelineInfoResult

configure_split(*, test_fraction: float = 0.3, validation_folds: int = 3) → NodeRegressionPipelineInfoResult¶

Configure the train-test split used by the pipeline.

Parameters:

test_fraction (float) – Fraction of nodes reserved for testing.
validation_folds (int) – Number of validation folds to use.

Returns:

The updated pipeline state.

Return type:

NodeRegressionPipelineInfoResult
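
To get a feel for the defaults, the arithmetic below estimates how many nodes land in each part of the split. It is a rough sketch under stated assumptions: that the test set is carved out first and the remaining nodes are divided evenly across the validation folds, and that sizes are rounded as shown. The library's exact rounding may differ, and split_sizes is a hypothetical helper.

```python
def split_sizes(node_count, test_fraction=0.3, validation_folds=3):
    """Estimate set sizes for a given node count (rounding is an assumption)."""
    test = round(node_count * test_fraction)
    train = node_count - test
    # Each fold holds out roughly this many training nodes for validation.
    per_fold = train // validation_folds
    return {"test": test, "train": train, "approx_validation_fold": per_fold}


sizes = split_sizes(1000)
```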

drop(fail_if_missing: bool = False) → PipelineCatalogEntryProtocol | None¶

Drop the pipeline and return its catalog entry when available.

Parameters: fail_if_missing (bool) – Whether to raise an error if the pipeline does not exist.
Return type: PipelineCatalogEntryProtocol | None

exists() → bool¶

Return whether the pipeline exists.

Return type: bool

name() → str¶

Return the pipeline name.

Return type: str

select_features(feature_properties: str | list[str]) → NodeRegressionPipelineInfoResult¶

Select the node properties used as input features.

Parameters: feature_properties (str | list[str]) – One or more node properties to use as features.
Returns: The updated pipeline state.
Return type: NodeRegressionPipelineInfoResult

train(G: GraphV2, *, metrics: list[str | NodeRegressionMetric], model_name: str, target_property: str, relationship_types: list[str] = ['*'], target_node_labels: list[str] = ['*'], store_model_to_disk: bool = False, random_seed: Any | None = None, username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None) → tuple[NodeRegressionModelV2, NodeRegressionPipelineTrainResult]¶

Train a node regression model from this pipeline.

Parameters:

G (GraphV2) – Graph object to use.
metrics (list[str | NodeRegressionMetric]) – Metrics to optimize for. Plain strings and NodeRegressionMetric values are both accepted.
model_name (str) – Name of the trained model.
target_property (str) – The target node property to predict.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
target_node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
store_model_to_disk (bool) – Whether to persist the trained model to disk.
random_seed (Any | None) – Seed for random number generation to ensure reproducible results.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

The trained model and the corresponding training result.

Return type:

tuple[NodeRegressionModelV2, NodeRegressionPipelineTrainResult]
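
The methods of NodeRegressionPipeline are typically combined into a short configure-then-train sequence. The sketch below wires them together using only the signatures documented above. The function name train_with_defaults, the chosen values, and the call order are illustrative assumptions. RecordingPipeline is a hypothetical test double so the example runs without a database; a real run would pass a pipeline created through the library and a GraphV2 object.

```python
from typing import Any


def train_with_defaults(pipeline: Any, G: Any, *, features, target, model_name):
    """Configure a pipeline with one model candidate and train it."""
    pipeline.select_features(features)
    pipeline.configure_split(test_fraction=0.3, validation_folds=3)
    # A two-value tuple asks for the learning rate to be searched in a range.
    pipeline.add_linear_regression(learning_rate=(0.0001, 0.01))
    return pipeline.train(
        G,
        metrics=["MEAN_SQUARED_ERROR"],
        model_name=model_name,
        target_property=target,
    )


class RecordingPipeline:
    """Hypothetical double that records method names instead of calling GDS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(*args, **kwargs):
            self.calls.append(name)
            # train is documented to return a (model, result) tuple.
            return ("model", "result") if name == "train" else None

        return record


fake = RecordingPipeline()
outcome = train_with_defaults(
    fake, object(), features=["age"], target="price", model_name="demo-model"
)
```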

class graphdatascience.procedure_surface.api.pipeline.NodeRegressionPipelineEndpoints¶

abstract add_linear_regression(pipeline_name: str, *, batch_size: int | tuple[int, int] = 100, learning_rate: float | tuple[float, float] = 0.001, max_epochs: int | tuple[int, int] = 100, min_epochs: int | tuple[int, int] = 1, patience: int | tuple[int, int] = 1, penalty: float | tuple[float, float] = 0.0, tolerance: float | tuple[float, float] = 0.001) → NodeRegressionPipelineInfoResult¶

Add a linear regression model candidate to the pipeline.

Parameters:

pipeline_name (str) – Name of the pipeline.
batch_size (int | tuple[int, int]) – Batch size to use during training. Pass a two-value tuple to define a parameter range.
learning_rate (float | tuple[float, float]) – Learning rate for optimization. Pass a two-value tuple to define a parameter range.
max_epochs (int | tuple[int, int]) – Maximum number of training epochs. Pass a two-value tuple to define a parameter range.
min_epochs (int | tuple[int, int]) – Minimum number of training epochs. Pass a two-value tuple to define a parameter range.
patience (int | tuple[int, int]) – Early stopping patience. Pass a two-value tuple to define a parameter range.
penalty (float | tuple[float, float]) – Penalty term to use during training. Pass a two-value tuple to define a parameter range.
tolerance (float | tuple[float, float]) – Convergence tolerance. Pass a two-value tuple to define a parameter range.

Returns:

The updated pipeline state.

Return type:

NodeRegressionPipelineInfoResult

abstract add_node_property(pipeline_name: str, task_name: str, **config: Any) → NodeRegressionPipelineInfoResult¶

Add a node property step to the pipeline.

Parameters:

pipeline_name (str) – Name of the pipeline.
task_name (str) – The name of the node property step to add.
config (Any) – Additional configuration for the node property step.

Returns:

The updated pipeline state.

Return type:

NodeRegressionPipelineInfoResult

abstract add_random_forest(pipeline_name: str, *, max_depth: int | tuple[int, int] = 2147483647, max_features_ratio: float | tuple[float, float] | None = None, min_leaf_size: int | tuple[int, int] = 1, min_split_size: int | tuple[int, int] = 2, number_of_decision_trees: int | tuple[int, int] = 100, number_of_samples_ratio: float | tuple[float, float] = 1.0) → NodeRegressionPipelineInfoResult¶

Add a random forest model candidate to the pipeline.

Parameters:

pipeline_name (str) – Name of the pipeline.
max_depth (int | tuple[int, int]) – Maximum tree depth. Pass a two-value tuple to define a parameter range.
max_features_ratio (float | tuple[float, float] | None) – Fraction of features sampled per split. Pass a two-value tuple to define a parameter range.
min_leaf_size (int | tuple[int, int]) – Minimum number of samples in a leaf. Pass a two-value tuple to define a parameter range.
min_split_size (int | tuple[int, int]) – Minimum number of samples required to split a node. Pass a two-value tuple to define a parameter range.
number_of_decision_trees (int | tuple[int, int]) – Number of trees to train. Pass a two-value tuple to define a parameter range.
number_of_samples_ratio (float | tuple[float, float]) – Fraction of samples used per tree. Pass a two-value tuple to define a parameter range.

Returns:

The updated pipeline state.

Return type:

NodeRegressionPipelineInfoResult

abstract configure_auto_tuning(pipeline_name: str, *, max_trials: int = 10) → NodeRegressionPipelineInfoResult¶

Configure auto-tuning for the pipeline.

Parameters:

pipeline_name (str) – Name of the pipeline.
max_trials (int) – Maximum number of trials to run during auto-tuning.

Returns:

The updated pipeline state.

Return type:

NodeRegressionPipelineInfoResult

abstract configure_split(pipeline_name: str, *, test_fraction: float = 0.3, validation_folds: int = 3) → NodeRegressionPipelineInfoResult¶

Configure the train-test split used by the pipeline.

Parameters:

pipeline_name (str) – Name of the pipeline.
test_fraction (float) – Fraction of nodes reserved for testing.
validation_folds (int) – Number of validation folds to use.

Returns:

The updated pipeline state.

Return type:

NodeRegressionPipelineInfoResult

abstract create(pipeline_name: str) → tuple[NodeRegressionPipeline, NodeRegressionPipelineInfoResult]¶

Create a new node regression pipeline.

Parameters: pipeline_name (str) – Name of the pipeline.
Returns: The created pipeline and the corresponding result payload.
Return type: tuple[NodeRegressionPipeline, NodeRegressionPipelineInfoResult]

abstract get(pipeline_name: str) → NodeRegressionPipeline¶

Retrieve an existing node regression pipeline by name.

Parameters: pipeline_name (str) – Name of the pipeline.
Returns: The reconstructed pipeline object.
Return type: NodeRegressionPipeline
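
The endpoint methods take pipeline_name as their first argument, while the corresponding NodeRegressionPipeline methods omit it. One way to picture the relationship is a thin object that remembers the name and forwards each call. This is an illustration of the two signatures only, not the library's implementation; BoundPipelineSketch and FakeEndpoints are hypothetical classes.

```python
class BoundPipelineSketch:
    """Hypothetical wrapper that binds a pipeline name to endpoint calls."""

    def __init__(self, name, endpoints):
        self._name = name
        self._endpoints = endpoints

    def name(self):
        return self._name

    def configure_split(self, *, test_fraction=0.3, validation_folds=3):
        return self._endpoints.configure_split(
            self._name,
            test_fraction=test_fraction,
            validation_folds=validation_folds,
        )

    def select_features(self, feature_properties):
        # The endpoint signature names this argument node_properties.
        return self._endpoints.select_features(self._name, feature_properties)


class FakeEndpoints:
    """Hypothetical double that echoes its arguments back."""

    def configure_split(self, pipeline_name, *, test_fraction=0.3, validation_folds=3):
        return (pipeline_name, test_fraction, validation_folds)

    def select_features(self, pipeline_name, node_properties):
        return (pipeline_name, node_properties)


bound = BoundPipelineSketch("my-pipe", FakeEndpoints())
split_call = bound.configure_split(test_fraction=0.2)
feature_call = bound.select_features(["age", "income"])
```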

abstract property predict: NodeRegressionPipelinePredictEndpoints¶

Access prediction endpoints for node regression models trained from this surface.

abstract select_features(pipeline_name: str, node_properties: str | list[str]) → NodeRegressionPipelineInfoResult¶

Select the node properties used as input features.

Parameters:

pipeline_name (str) – Name of the pipeline.
node_properties (str | list[str]) – One or more node properties to use as features.

Returns:

The updated pipeline state.

Return type:

NodeRegressionPipelineInfoResult