Pipelines¶
- class graphdatascience.procedure_surface.api.pipeline.LinkPredictionModelV2¶
Represents a link prediction model in the model catalog.
Construct this using gds.v2.pipeline.link_prediction.train().
- class graphdatascience.procedure_surface.api.pipeline.LinkPredictionPipeline¶
Represents a link prediction training pipeline.
Construct this using
gds.v2.pipeline.link_prediction.create().
- class graphdatascience.procedure_surface.api.pipeline.LinkPredictionPipelineEndpoints¶
- abstract add_feature(pipeline_name: str, feature_type: str, *, node_properties: list[str]) LinkPredictionPipelineInfoResult¶
Add a relationship feature step to the pipeline.
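- Parameters:
pipeline_name (str) – Name of the pipeline.
feature_type (str) – Type of the relationship feature to add.
node_properties (list[str]) – Node properties to compute the feature from.
- Returns:
The updated pipeline state.
- Return type:
LinkPredictionPipelineInfoResult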
- abstract add_logistic_regression(pipeline_name: str, *, batch_size: int | tuple[int, int] = 100, class_weights: list[float] | None = None, focus_weight: float | tuple[float, float] = 0.0, learning_rate: float | tuple[float, float] = 0.001, max_epochs: int | tuple[int, int] = 100, min_epochs: int | tuple[int, int] = 1, patience: int | tuple[int, int] = 1, penalty: float | tuple[float, float] = 0.0, tolerance: float | tuple[float, float] = 0.001) LinkPredictionPipelineInfoResult¶
Add a logistic regression model candidate to the pipeline.
- Parameters:
pipeline_name (str) – Name of the pipeline.
batch_size (int | tuple[int, int]) – Batch size to use during training. Pass a two-value tuple to define a parameter range.
class_weights (list[float] | None) – Optional class weights to use during training.
focus_weight (float | tuple[float, float]) – Focus weight for optimization. Pass a two-value tuple to define a parameter range.
learning_rate (float | tuple[float, float]) – Learning rate for optimization. Pass a two-value tuple to define a parameter range.
max_epochs (int | tuple[int, int]) – Maximum number of training epochs. Pass a two-value tuple to define a parameter range.
min_epochs (int | tuple[int, int]) – Minimum number of training epochs. Pass a two-value tuple to define a parameter range.
patience (int | tuple[int, int]) – Early stopping patience. Pass a two-value tuple to define a parameter range.
penalty (float | tuple[float, float]) – Penalty term to use during training. Pass a two-value tuple to define a parameter range.
tolerance (float | tuple[float, float]) – Convergence tolerance. Pass a two-value tuple to define a parameter range.
- Returns:
The updated pipeline state.
- Return type:
LinkPredictionPipelineInfoResult
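The range convention described above (a scalar for a fixed value, a two-value tuple for a range) can be checked before the endpoint is called. The following is a minimal sketch under stated assumptions: `validate_range` is a hypothetical helper that is not part of the library, and the requirement that the lower bound comes first is an assumption rather than documented behaviour.

```python
from typing import Union

Number = Union[int, float]
ParamValue = Union[Number, tuple[Number, Number]]


def validate_range(name: str, value: ParamValue) -> ParamValue:
    """Hypothetical helper: accept a scalar or an ordered (low, high) tuple."""
    if isinstance(value, tuple):
        if len(value) != 2:
            raise ValueError(f"{name}: a range needs exactly two values")
        low, high = value
        # Assumption: the lower bound is expected first.
        if low > high:
            raise ValueError(f"{name}: lower bound {low} exceeds upper bound {high}")
    return value


# Keyword arguments shaped like the add_logistic_regression signature:
# a fixed batch size, and ranges for penalty and learning rate.
lr_config = {
    "batch_size": validate_range("batch_size", 100),
    "penalty": validate_range("penalty", (0.0, 1.0)),
    "learning_rate": validate_range("learning_rate", (0.0001, 0.01)),
}
```

The resulting dictionary could then be unpacked into a call such as `add_logistic_regression("my-pipeline", **lr_config)`, where the pipeline name is illustrative.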
- abstract add_mlp(pipeline_name: str, *, batch_size: int | tuple[int, int] = 100, class_weights: list[float] | None = None, focus_weight: float | tuple[float, float] = 0.0, hidden_layer_sizes: list[int] = [100], learning_rate: float | tuple[float, float] = 0.001, max_epochs: int | tuple[int, int] = 100, min_epochs: int | tuple[int, int] = 1, patience: int | tuple[int, int] = 1, penalty: float | tuple[float, float] = 0.0, tolerance: float | tuple[float, float] = 0.001) LinkPredictionPipelineInfoResult¶
Add a multi-layer perceptron model candidate to the pipeline.
- Parameters:
pipeline_name (str) – Name of the pipeline.
batch_size (int | tuple[int, int]) – Batch size to use during training. Pass a two-value tuple to define a parameter range.
class_weights (list[float] | None) – Optional class weights to use during training.
focus_weight (float | tuple[float, float]) – Focus weight for optimization. Pass a two-value tuple to define a parameter range.
hidden_layer_sizes (list[int]) – Sizes of the hidden layers in the neural network.
learning_rate (float | tuple[float, float]) – Learning rate for optimization. Pass a two-value tuple to define a parameter range.
max_epochs (int | tuple[int, int]) – Maximum number of training epochs. Pass a two-value tuple to define a parameter range.
min_epochs (int | tuple[int, int]) – Minimum number of training epochs. Pass a two-value tuple to define a parameter range.
patience (int | tuple[int, int]) – Early stopping patience. Pass a two-value tuple to define a parameter range.
penalty (float | tuple[float, float]) – Penalty term to use during training. Pass a two-value tuple to define a parameter range.
tolerance (float | tuple[float, float]) – Convergence tolerance. Pass a two-value tuple to define a parameter range.
- Returns:
The updated pipeline state.
- Return type:
LinkPredictionPipelineInfoResult
- abstract add_node_property(pipeline_name: str, task_name: str, **config: Any) LinkPredictionPipelineInfoResult¶
Add a node property step to the pipeline.
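- Parameters:
pipeline_name (str) – Name of the pipeline.
task_name (str) – Name of the task that computes the node property.
config (Any) – Configuration for the node property step.
- Returns:
The updated pipeline state.
- Return type:
LinkPredictionPipelineInfoResult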
- abstract add_random_forest(pipeline_name: str, *, criterion: str | None = 'GINI', max_depth: int | tuple[int, int] = 2147483647, max_features_ratio: float | tuple[float, float] | None = None, min_leaf_size: int | tuple[int, int] = 1, min_split_size: int | tuple[int, int] = 2, number_of_decision_trees: int | tuple[int, int] = 100, number_of_samples_ratio: float | tuple[float, float] = 1.0) LinkPredictionPipelineInfoResult¶
Add a random forest model candidate to the pipeline.
- Parameters:
pipeline_name (str) – Name of the pipeline.
criterion (str | None) – Split criterion to optimize.
max_depth (int | tuple[int, int]) – Maximum tree depth. Pass a two-value tuple to define a parameter range.
max_features_ratio (float | tuple[float, float] | None) – Fraction of features sampled per split. Pass a two-value tuple to define a parameter range.
min_leaf_size (int | tuple[int, int]) – Minimum number of samples in a leaf. Pass a two-value tuple to define a parameter range.
min_split_size (int | tuple[int, int]) – Minimum number of samples required to split a node. Pass a two-value tuple to define a parameter range.
number_of_decision_trees (int | tuple[int, int]) – Number of trees to train. Pass a two-value tuple to define a parameter range.
number_of_samples_ratio (float | tuple[float, float]) – Fraction of samples used per tree. Pass a two-value tuple to define a parameter range.
- Returns:
The updated pipeline state.
- Return type:
LinkPredictionPipelineInfoResult
- abstract configure_auto_tuning(pipeline_name: str, *, max_trials: int = 10) LinkPredictionPipelineInfoResult¶
Configure auto-tuning for the pipeline.
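- Parameters:
pipeline_name (str) – Name of the pipeline.
max_trials (int) – Maximum number of trials to run during auto-tuning.
- Returns:
The updated pipeline state.
- Return type:
LinkPredictionPipelineInfoResult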
- abstract configure_split(pipeline_name: str, *, negative_relationship_type: str | None = None, negative_sampling_ratio: float = 1.0, test_fraction: float = 0.1, train_fraction: float = 0.1, validation_folds: int = 3) LinkPredictionPipelineInfoResult¶
Configure the train-test split used by the pipeline.
- Parameters:
pipeline_name (str) – Name of the pipeline.
negative_relationship_type (str | None) – Relationship type to use for the negative samples.
negative_sampling_ratio (float) – Ratio of sampled negative relationships.
test_fraction (float) – Fraction of relationships reserved for testing.
train_fraction (float) – Fraction of relationships reserved for training.
validation_folds (int) – Number of validation folds to use.
- Returns:
The updated pipeline state.
- Return type:
LinkPredictionPipelineInfoResult
- abstract create(pipeline_name: str) tuple[LinkPredictionPipeline, LinkPredictionPipelineInfoResult]¶
Create a new link prediction pipeline.
- Parameters:
pipeline_name (str) – Name of the pipeline.
- Returns:
The created pipeline and the corresponding result payload.
- Return type:
tuple[LinkPredictionPipeline, LinkPredictionPipelineInfoResult]
- abstract get(pipeline_name: str) LinkPredictionPipeline¶
Retrieve an existing link prediction pipeline by name.
- Parameters:
pipeline_name (str) – Name of the pipeline.
- Returns:
The reconstructed pipeline object.
- Return type:
LinkPredictionPipeline
- abstract property predict: LinkPredictionPipelinePredictEndpoints¶
Access prediction endpoints for link prediction models trained from this surface.
- abstract property train: LinkPredictionPipelineTrainEndpoints¶
Access training endpoints for link prediction pipelines.
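The name-based endpoints above can be chained into a configuration sequence in which every call carries the pipeline name. The sketch below runs against `RecordingEndpoints`, a hypothetical stand-in that only records calls so the example works without a database. The task name `fastRP`, the feature type `hadamard`, and the config key `mutate_property` are illustrative assumptions, not values confirmed by this reference.

```python
from typing import Any


class RecordingEndpoints:
    """Hypothetical stand-in for LinkPredictionPipelineEndpoints; records calls."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, tuple, dict]] = []

    def __getattr__(self, method: str):
        # Any method name is accepted and logged with its arguments.
        def record(*args: Any, **kwargs: Any) -> None:
            self.calls.append((method, args, kwargs))
        return record


def build_link_prediction_pipeline(endpoints: Any, name: str) -> None:
    """Configure a pipeline step by step through the name-based endpoints."""
    endpoints.create(name)
    # Node property step: a task name plus free-form keyword config.
    endpoints.add_node_property(name, "fastRP", mutate_property="embedding")
    # Feature step built from the node property produced above.
    endpoints.add_feature(name, "hadamard", node_properties=["embedding"])
    endpoints.configure_split(name, test_fraction=0.2, train_fraction=0.2,
                              validation_folds=5)
    # A two-value tuple defines a range for the number of trees.
    endpoints.add_random_forest(name, number_of_decision_trees=(10, 100))
    endpoints.configure_auto_tuning(name, max_trials=5)


endpoints = RecordingEndpoints()
build_link_prediction_pipeline(endpoints, "lp-pipeline")
```

With the real endpoints object, `create` would return the pipeline and its info result as a tuple; the stand-in discards return values.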
- pydantic model graphdatascience.procedure_surface.api.pipeline.LinkPredictionPipelineInfoResult¶
- class graphdatascience.procedure_surface.api.pipeline.LinkPredictionPipelinePredictEndpoints¶
- pydantic model graphdatascience.procedure_surface.api.pipeline.LinkPredictionPipelinePredictMutateResult¶
- pydantic model graphdatascience.procedure_surface.api.pipeline.LinkPredictionPipelineTrainResult¶
- field model_info: LinkPredictionModelInfoResult¶
- class graphdatascience.procedure_surface.api.pipeline.NodeClassificationPipeline¶
Represents a node classification training pipeline.
Construct this using
gds.v2.pipeline.node_classification.create().
- add_logistic_regression(*, batch_size: int | tuple[int, int] = 100, class_weights: list[float] | None = None, focus_weight: float | tuple[float, float] = 0.0, learning_rate: float | tuple[float, float] = 0.001, max_epochs: int | tuple[int, int] = 100, min_epochs: int | tuple[int, int] = 1, patience: int | tuple[int, int] = 1, penalty: float | tuple[float, float] = 0.0, tolerance: float | tuple[float, float] = 0.001) NodeClassificationPipelineInfoResult¶
Add a logistic regression model candidate to the pipeline.
- Parameters:
batch_size (int | tuple[int, int]) – Batch size to use during training. Pass a two-value tuple to define a parameter range.
class_weights (list[float] | None) – Optional class weights to use during training.
focus_weight (float | tuple[float, float]) – Focus weight for optimization. Pass a two-value tuple to define a parameter range.
learning_rate (float | tuple[float, float]) – Learning rate for optimization. Pass a two-value tuple to define a parameter range.
max_epochs (int | tuple[int, int]) – Maximum number of training epochs. Pass a two-value tuple to define a parameter range.
min_epochs (int | tuple[int, int]) – Minimum number of training epochs. Pass a two-value tuple to define a parameter range.
patience (int | tuple[int, int]) – Early stopping patience. Pass a two-value tuple to define a parameter range.
penalty (float | tuple[float, float]) – Penalty term to use during training. Pass a two-value tuple to define a parameter range.
tolerance (float | tuple[float, float]) – Convergence tolerance. Pass a two-value tuple to define a parameter range.
- Returns:
The updated pipeline state.
- Return type:
NodeClassificationPipelineInfoResult
- add_mlp(*, batch_size: int | tuple[int, int] = 100, class_weights: list[float] | None = None, focus_weight: float | tuple[float, float] = 0.0, hidden_layer_sizes: list[int] = [100], learning_rate: float | tuple[float, float] = 0.001, max_epochs: int | tuple[int, int] = 100, min_epochs: int | tuple[int, int] = 1, patience: int | tuple[int, int] = 1, penalty: float | tuple[float, float] = 0.0, tolerance: float | tuple[float, float] = 0.001) NodeClassificationPipelineInfoResult¶
Add a multi-layer perceptron model candidate to the pipeline.
- Parameters:
batch_size (int | tuple[int, int]) – Batch size to use during training. Pass a two-value tuple to define a parameter range.
class_weights (list[float] | None) – Optional class weights to use during training.
focus_weight (float | tuple[float, float]) – Focus weight for optimization. Pass a two-value tuple to define a parameter range.
hidden_layer_sizes (list[int]) – Sizes of the hidden layers in the neural network.
learning_rate (float | tuple[float, float]) – Learning rate for optimization. Pass a two-value tuple to define a parameter range.
max_epochs (int | tuple[int, int]) – Maximum number of training epochs. Pass a two-value tuple to define a parameter range.
min_epochs (int | tuple[int, int]) – Minimum number of training epochs. Pass a two-value tuple to define a parameter range.
patience (int | tuple[int, int]) – Early stopping patience. Pass a two-value tuple to define a parameter range.
penalty (float | tuple[float, float]) – Penalty term to use during training. Pass a two-value tuple to define a parameter range.
tolerance (float | tuple[float, float]) – Convergence tolerance. Pass a two-value tuple to define a parameter range.
- Returns:
The updated pipeline state.
- Return type:
NodeClassificationPipelineInfoResult
- add_node_property(task_name: str, **config: Any) NodeClassificationPipelineInfoResult¶
Add a node property step to the pipeline.
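- Parameters:
task_name (str) – Name of the task that computes the node property.
config (Any) – Configuration for the node property step.
- Returns:
The updated pipeline state.
- Return type:
NodeClassificationPipelineInfoResult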
- add_random_forest(*, criterion: str | None = 'GINI', max_depth: int | tuple[int, int] = 2147483647, max_features_ratio: float | tuple[float, float] | None = None, min_leaf_size: int | tuple[int, int] = 1, min_split_size: int | tuple[int, int] = 2, number_of_decision_trees: int | tuple[int, int] = 100, number_of_samples_ratio: float | tuple[float, float] = 1.0) NodeClassificationPipelineInfoResult¶
Add a random forest model candidate to the pipeline.
- Parameters:
criterion (str | None) – Split criterion to optimize.
max_depth (int | tuple[int, int]) – Maximum tree depth. Pass a two-value tuple to define a parameter range.
max_features_ratio (float | tuple[float, float] | None) – Fraction of features sampled per split. Pass a two-value tuple to define a parameter range.
min_leaf_size (int | tuple[int, int]) – Minimum number of samples in a leaf. Pass a two-value tuple to define a parameter range.
min_split_size (int | tuple[int, int]) – Minimum number of samples required to split a node. Pass a two-value tuple to define a parameter range.
number_of_decision_trees (int | tuple[int, int]) – Number of trees to train. Pass a two-value tuple to define a parameter range.
number_of_samples_ratio (float | tuple[float, float]) – Fraction of samples used per tree. Pass a two-value tuple to define a parameter range.
- Returns:
The updated pipeline state.
- Return type:
NodeClassificationPipelineInfoResult
- configure_auto_tuning(*, max_trials: int = 10) NodeClassificationPipelineInfoResult¶
Configure auto-tuning for the pipeline.
- Parameters:
max_trials (int) – Maximum number of trials to run during auto-tuning.
- Returns:
The updated pipeline state.
- Return type:
NodeClassificationPipelineInfoResult
- configure_split(*, test_fraction: float = 0.3, validation_folds: int = 3) NodeClassificationPipelineInfoResult¶
Configure the train-test split used by the pipeline.
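- Parameters:
test_fraction (float) – Fraction of nodes reserved for testing.
validation_folds (int) – Number of validation folds to use.
- Returns:
The updated pipeline state.
- Return type:
NodeClassificationPipelineInfoResult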
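As a rough worked example of what `test_fraction` and `validation_folds` imply, the sketch below estimates set sizes for a given node count. It assumes the test set is taken first and the remaining nodes are divided evenly across the folds. `split_sizes` is a hypothetical helper, and its rounding is an assumption that may differ from what the library does.

```python
def split_sizes(node_count: int, test_fraction: float = 0.3,
                validation_folds: int = 3) -> dict[str, int]:
    """Approximate set sizes; the rounding used here is an assumption."""
    test = round(node_count * test_fraction)
    train = node_count - test
    # Each fold holds out roughly one share of the training nodes for validation.
    validation_per_fold = train // validation_folds
    return {"test": test, "train": train, "validation_per_fold": validation_per_fold}


sizes = split_sizes(1000)
```

With the default arguments and 1000 nodes this gives 300 test nodes, 700 training nodes, and 233 validation nodes per fold.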
- drop(fail_if_missing: bool = False) PipelineCatalogEntryProtocol | None¶
Drop the pipeline and return its catalog entry when available.
- Parameters:
fail_if_missing (bool) – Whether to raise an error if the pipeline does not exist.
- Return type:
PipelineCatalogEntryProtocol | None
- select_features(node_properties: str | list[str]) NodeClassificationPipelineInfoResult¶
Select the node properties used as input features.
- train(G: GraphV2, *, metrics: list[str], model_name: str, target_property: str, relationship_types: list[str] = ['*'], target_node_labels: list[str] = ['*'], store_model_to_disk: bool = False, random_seed: Any | None = None, username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None) tuple[NodeClassificationModelV2, NodeClassificationPipelineTrainResult]¶
Train a node classification model from this pipeline.
- Parameters:
G (GraphV2) – Graph object to use.
metrics (list[str]) – Metrics used to evaluate the model candidates.
model_name (str) – Name of the trained model.
target_property (str) – The target node property to predict.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
target_node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
store_model_to_disk (bool) – Whether to persist the trained model to disk.
random_seed (Any | None) – Seed for random number generation to ensure reproducible results.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
- Returns:
The trained model and the corresponding training result.
- Return type:
tuple[NodeClassificationModelV2, NodeClassificationPipelineTrainResult]
- train_estimate(G: GraphV2, *, metrics: list[str], model_name: str, target_property: str, relationship_types: list[str] = ['*'], target_node_labels: list[str] = ['*'], store_model_to_disk: bool = False, random_seed: Any | None = None, username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None) EstimationResult¶
Estimate the memory required to train a node classification model from this pipeline.
- Parameters:
G (GraphV2) – Graph object to use.
metrics (list[str]) – Metrics used to evaluate the model candidates.
model_name (str) – Name of the trained model.
target_property (str) – The target node property to predict.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
target_node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
store_model_to_disk (bool) – Whether to persist the trained model to disk.
random_seed (Any | None) – Seed for random number generation to ensure reproducible results.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
- Returns:
The estimated memory footprint for training.
- Return type:
EstimationResult
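Because `train_estimate` and `train` accept the same keyword arguments, one set of arguments can serve both calls. The sketch below shows that pattern against `FakePipeline`, a hypothetical stand-in that returns placeholders so the example runs without a database. The metric name `F1_WEIGHTED`, the model name, and the target property are illustrative assumptions.

```python
from typing import Any


class FakePipeline:
    """Hypothetical stand-in for NodeClassificationPipeline (no database needed)."""

    def __init__(self) -> None:
        self.seen: list[tuple[str, dict]] = []

    def train_estimate(self, G: Any, **kwargs: Any) -> str:
        self.seen.append(("train_estimate", kwargs))
        return "estimation-placeholder"

    def train(self, G: Any, **kwargs: Any) -> tuple[str, str]:
        self.seen.append(("train", kwargs))
        return "model-placeholder", "train-result-placeholder"


def estimate_then_train(pipeline: Any, G: Any) -> tuple[Any, Any, Any]:
    """Estimate memory first, then train with identical arguments."""
    shared = dict(
        metrics=["F1_WEIGHTED"],     # assumed metric name
        model_name="nc-model",       # illustrative
        target_property="category",  # illustrative
        random_seed=42,
    )
    estimate = pipeline.train_estimate(G, **shared)
    # train returns the model and the training result as a tuple.
    model, train_result = pipeline.train(G, **shared)
    return estimate, model, train_result


pipeline = FakePipeline()
estimate, model, train_result = estimate_then_train(pipeline, G=object())
```

In real use the estimate would typically be inspected before deciding whether to call `train` at all.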
- class graphdatascience.procedure_surface.api.pipeline.NodeClassificationPipelineEndpoints¶
- abstract add_logistic_regression(pipeline_name: str, *, batch_size: int | tuple[int, int] = 100, class_weights: list[float] | None = None, focus_weight: float | tuple[float, float] = 0.0, learning_rate: float | tuple[float, float] = 0.001, max_epochs: int | tuple[int, int] = 100, min_epochs: int | tuple[int, int] = 1, patience: int | tuple[int, int] = 1, penalty: float | tuple[float, float] = 0.0, tolerance: float | tuple[float, float] = 0.001) NodeClassificationPipelineInfoResult¶
Add a logistic regression model candidate to the pipeline.
- Parameters:
pipeline_name (str) – Name of the pipeline.
batch_size (int | tuple[int, int]) – Batch size to use during training. Pass a two-value tuple to define a parameter range.
class_weights (list[float] | None) – Optional class weights to use during training.
focus_weight (float | tuple[float, float]) – Focus weight for optimization. Pass a two-value tuple to define a parameter range.
learning_rate (float | tuple[float, float]) – Learning rate for optimization. Pass a two-value tuple to define a parameter range.
max_epochs (int | tuple[int, int]) – Maximum number of training epochs. Pass a two-value tuple to define a parameter range.
min_epochs (int | tuple[int, int]) – Minimum number of training epochs. Pass a two-value tuple to define a parameter range.
patience (int | tuple[int, int]) – Early stopping patience. Pass a two-value tuple to define a parameter range.
penalty (float | tuple[float, float]) – Penalty term to use during training. Pass a two-value tuple to define a parameter range.
tolerance (float | tuple[float, float]) – Convergence tolerance. Pass a two-value tuple to define a parameter range.
- Returns:
The updated pipeline state.
- Return type:
NodeClassificationPipelineInfoResult
- abstract add_mlp(pipeline_name: str, *, batch_size: int | tuple[int, int] = 100, class_weights: list[float] | None = None, focus_weight: float | tuple[float, float] = 0.0, hidden_layer_sizes: list[int] = [100], learning_rate: float | tuple[float, float] = 0.001, max_epochs: int | tuple[int, int] = 100, min_epochs: int | tuple[int, int] = 1, patience: int | tuple[int, int] = 1, penalty: float | tuple[float, float] = 0.0, tolerance: float | tuple[float, float] = 0.001) NodeClassificationPipelineInfoResult¶
Add a multi-layer perceptron model candidate to the pipeline.
- Parameters:
pipeline_name (str) – Name of the pipeline.
batch_size (int | tuple[int, int]) – Batch size to use during training. Pass a two-value tuple to define a parameter range.
class_weights (list[float] | None) – Optional class weights to use during training.
focus_weight (float | tuple[float, float]) – Focus weight for optimization. Pass a two-value tuple to define a parameter range.
hidden_layer_sizes (list[int]) – Sizes of the hidden layers in the neural network.
learning_rate (float | tuple[float, float]) – Learning rate for optimization. Pass a two-value tuple to define a parameter range.
max_epochs (int | tuple[int, int]) – Maximum number of training epochs. Pass a two-value tuple to define a parameter range.
min_epochs (int | tuple[int, int]) – Minimum number of training epochs. Pass a two-value tuple to define a parameter range.
patience (int | tuple[int, int]) – Early stopping patience. Pass a two-value tuple to define a parameter range.
penalty (float | tuple[float, float]) – Penalty term to use during training. Pass a two-value tuple to define a parameter range.
tolerance (float | tuple[float, float]) – Convergence tolerance. Pass a two-value tuple to define a parameter range.
- Returns:
The updated pipeline state.
- Return type:
NodeClassificationPipelineInfoResult
- abstract add_node_property(pipeline_name: str, task_name: str, **config: Any) NodeClassificationPipelineInfoResult¶
Add a node property step to the pipeline.
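- Parameters:
pipeline_name (str) – Name of the pipeline.
task_name (str) – Name of the task that computes the node property.
config (Any) – Configuration for the node property step.
- Returns:
The updated pipeline state.
- Return type:
NodeClassificationPipelineInfoResult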
- abstract add_random_forest(pipeline_name: str, *, criterion: str | None = 'GINI', max_depth: int | tuple[int, int] = 2147483647, max_features_ratio: float | tuple[float, float] | None = None, min_leaf_size: int | tuple[int, int] = 1, min_split_size: int | tuple[int, int] = 2, number_of_decision_trees: int | tuple[int, int] = 100, number_of_samples_ratio: float | tuple[float, float] = 1.0) NodeClassificationPipelineInfoResult¶
Add a random forest model candidate to the pipeline.
- Parameters:
pipeline_name (str) – Name of the pipeline.
criterion (str | None) – Split criterion to optimize.
max_depth (int | tuple[int, int]) – Maximum tree depth. Pass a two-value tuple to define a parameter range.
max_features_ratio (float | tuple[float, float] | None) – Fraction of features sampled per split. Pass a two-value tuple to define a parameter range.
min_leaf_size (int | tuple[int, int]) – Minimum number of samples in a leaf. Pass a two-value tuple to define a parameter range.
min_split_size (int | tuple[int, int]) – Minimum number of samples required to split a node. Pass a two-value tuple to define a parameter range.
number_of_decision_trees (int | tuple[int, int]) – Number of trees to train. Pass a two-value tuple to define a parameter range.
number_of_samples_ratio (float | tuple[float, float]) – Fraction of samples used per tree. Pass a two-value tuple to define a parameter range.
- Returns:
The updated pipeline state.
- Return type:
NodeClassificationPipelineInfoResult
- abstract configure_auto_tuning(pipeline_name: str, *, max_trials: int = 10) NodeClassificationPipelineInfoResult¶
Configure auto-tuning for the pipeline.
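- Parameters:
pipeline_name (str) – Name of the pipeline.
max_trials (int) – Maximum number of trials to run during auto-tuning.
- Returns:
The updated pipeline state.
- Return type:
NodeClassificationPipelineInfoResult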
- abstract configure_split(pipeline_name: str, *, test_fraction: float = 0.3, validation_folds: int = 3) NodeClassificationPipelineInfoResult¶
Configure the train-test split used by the pipeline.
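- Parameters:
pipeline_name (str) – Name of the pipeline.
test_fraction (float) – Fraction of nodes reserved for testing.
validation_folds (int) – Number of validation folds to use.
- Returns:
The updated pipeline state.
- Return type:
NodeClassificationPipelineInfoResult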
- abstract create(pipeline_name: str) tuple[NodeClassificationPipeline, NodeClassificationPipelineInfoResult]¶
Create a new node classification pipeline.
- Parameters:
pipeline_name (str) – Name of the pipeline.
- Returns:
The created pipeline and the corresponding result payload.
- Return type:
tuple[NodeClassificationPipeline, NodeClassificationPipelineInfoResult]
- abstract get(pipeline_name: str) NodeClassificationPipeline¶
Retrieve an existing node classification pipeline by name.
- Parameters:
pipeline_name (str) – Name of the pipeline.
- Returns:
The reconstructed pipeline object.
- Return type:
NodeClassificationPipeline
- abstract property predict: NodeClassificationPipelinePredictEndpoints¶
Access prediction endpoints for node classification models trained from this surface.
- abstract select_features(pipeline_name: str, node_properties: str | list[str]) NodeClassificationPipelineInfoResult¶
Select the node properties used as input features.
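- Parameters:
pipeline_name (str) – Name of the pipeline.
node_properties (str | list[str]) – Node properties to use as input features.
- Returns:
The updated pipeline state.
- Return type:
NodeClassificationPipelineInfoResult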
- abstract property train: NodeClassificationPipelineTrainEndpoints¶
Access training endpoints for node classification pipelines.
- pydantic model graphdatascience.procedure_surface.api.pipeline.NodeClassificationPipelineInfoResult¶
- class graphdatascience.procedure_surface.api.pipeline.NodeClassificationPipelinePredictEndpoints¶
- abstract estimate(G: GraphV2, model_name: str, *, relationship_types: list[str] | None = None, target_node_labels: list[str] | None = None, username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None) EstimationResult¶
Estimate the memory required to run node classification prediction.
- Parameters:
G (GraphV2) – Graph object to use.
model_name (str) – Name of the model.
relationship_types (list[str] | None) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
target_node_labels (list[str] | None) – Optional node label filter.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
- Returns:
The estimated memory footprint for prediction.
- Return type:
EstimationResult
- abstract mutate(G: GraphV2, model_name: str, mutate_property: str, *, relationship_types: list[str] | None = None, target_node_labels: list[str] | None = None, predicted_probability_property: str | None = None, username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None) NodeClassificationPipelinePredictMutateResult¶
Run node classification prediction in mutate mode.
- Parameters:
G (GraphV2) – Graph object to use.
model_name (str) – Name of the model.
mutate_property (str) – Name of the node property to store the results in.
relationship_types (list[str] | None) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
target_node_labels (list[str] | None) – Optional node label filter.
predicted_probability_property (str | None) – Optional node property to store the predicted probability distribution in.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
- Returns:
The mutate result summary.
- Return type:
NodeClassificationPipelinePredictMutateResult
- abstract stream(G: GraphV2, model_name: str, *, relationship_types: list[str] | None = None, target_node_labels: list[str] | None = None, include_predicted_probabilities: bool = False, username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None) DataFrame¶
Run node classification prediction in stream mode.
- Parameters:
G (GraphV2) – Graph object to use.
model_name (str) – Name of the model.
relationship_types (list[str] | None) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
target_node_labels (list[str] | None) – Optional node label filter.
include_predicted_probabilities (bool) – Whether to include the predicted probability distribution in the streamed results.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
- Returns:
The prediction results as a DataFrame.
- Return type:
DataFrame
- abstract write(G: GraphV2, model_name: str, write_property: str, *, relationship_types: list[str] | None = None, target_node_labels: list[str] | None = None, predicted_probability_property: str | None = None, username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, write_concurrency: int | None = None, job_id: str | None = None) NodeClassificationPipelinePredictWriteResult¶
Run node classification prediction in write mode.
- Parameters:
G (GraphV2) – Graph object to use.
model_name (str) – Name of the model.
write_property (str) – Name of the node property to store the results in.
relationship_types (list[str] | None) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
target_node_labels (list[str] | None) – Optional node label filter.
predicted_probability_property (str | None) – Optional node property to store the predicted probability distribution in.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
write_concurrency (int | None) – Number of concurrent threads to use for writing.
job_id (str | None) – Identifier for the computation.
- Returns:
The write result summary.
- Return type:
NodeClassificationPipelinePredictWriteResult
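The three prediction modes differ mainly in how results and probabilities are requested: `stream` takes a boolean flag, while `mutate` and `write` take a target property plus an optional probability property. A minimal sketch of a dispatcher over those signatures follows. `run_prediction` and `RecordingPredict` are hypothetical names introduced for illustration, and the stand-in returns placeholders instead of real results.

```python
from typing import Any, Optional


class RecordingPredict:
    """Hypothetical stand-in for the predict endpoints; records each call."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, tuple, dict]] = []

    def __getattr__(self, mode: str):
        def record(*args: Any, **kwargs: Any) -> str:
            self.calls.append((mode, args, kwargs))
            return f"{mode}-result-placeholder"
        return record


def run_prediction(predict: Any, G: Any, model_name: str, mode: str,
                   property_name: Optional[str] = None,
                   probability_property: Optional[str] = None) -> Any:
    """Dispatch to stream, mutate, or write using the documented signatures."""
    if mode not in ("stream", "mutate", "write"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "stream":
        # stream takes a boolean flag rather than a property name.
        return predict.stream(
            G, model_name,
            include_predicted_probabilities=probability_property is not None,
        )
    if property_name is None:
        raise ValueError(f"{mode} needs a property name for the predicted class")
    method = predict.mutate if mode == "mutate" else predict.write
    return method(G, model_name, property_name,
                  predicted_probability_property=probability_property)


predict = RecordingPredict()
graph = object()  # placeholder for a GraphV2 object
streamed = run_prediction(predict, graph, "nc-model", "stream")
written = run_prediction(predict, graph, "nc-model", "write",
                         property_name="predicted_class",
                         probability_property="class_probabilities")
```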
- pydantic model graphdatascience.procedure_surface.api.pipeline.NodeClassificationPipelinePredictMutateResult¶
- pydantic model graphdatascience.procedure_surface.api.pipeline.NodeClassificationPipelinePredictWriteResult¶
- pydantic model graphdatascience.procedure_surface.api.pipeline.NodeClassificationPipelineTrainResult¶
- field model_info: NodeClassificationModelInfoResult¶
- enum graphdatascience.procedure_surface.api.pipeline.NodeRegressionMetric(value)¶
- Member Type:
Valid values are as follows:
- MEAN_SQUARED_ERROR = <NodeRegressionMetric.MEAN_SQUARED_ERROR: 'MEAN_SQUARED_ERROR'>¶
- ROOT_MEAN_SQUARED_ERROR = <NodeRegressionMetric.ROOT_MEAN_SQUARED_ERROR: 'ROOT_MEAN_SQUARED_ERROR'>¶
- MEAN_ABSOLUTE_ERROR = <NodeRegressionMetric.MEAN_ABSOLUTE_ERROR: 'MEAN_ABSOLUTE_ERROR'>¶
The Enum and its members also have the following methods:
- __new__(value)¶
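Since the enum can be constructed from a value, a metric name received as text can be turned into a member by value lookup. The sketch below uses a local mirror of the three documented members so that it runs without the library installed. The mirror class and the `parse_metric` helper are illustrative assumptions, not part of the package.

```python
from enum import Enum


class NodeRegressionMetric(Enum):
    """Local mirror of the documented members, for illustration only."""
    MEAN_SQUARED_ERROR = "MEAN_SQUARED_ERROR"
    ROOT_MEAN_SQUARED_ERROR = "ROOT_MEAN_SQUARED_ERROR"
    MEAN_ABSOLUTE_ERROR = "MEAN_ABSOLUTE_ERROR"


def parse_metric(name: str) -> NodeRegressionMetric:
    """Look a member up by value, tolerating case and surrounding whitespace."""
    # Calling the Enum class with a value returns the matching member
    # or raises ValueError when no member has that value.
    return NodeRegressionMetric(name.strip().upper())


metric = parse_metric(" mean_squared_error ")
metric_names = [member.value for member in NodeRegressionMetric]
```

An unknown name raises `ValueError`, which is a convenient place to report the list in `metric_names` back to the caller.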
- class graphdatascience.procedure_surface.api.pipeline.NodeRegressionPipeline¶
Represents a node regression training pipeline.
Construct this using gds.v2.pipeline.node_regression.create().
- add_linear_regression(*, batch_size: int | tuple[int, int] = 100, learning_rate: float | tuple[float, float] = 0.001, max_epochs: int | tuple[int, int] = 100, min_epochs: int | tuple[int, int] = 1, patience: int | tuple[int, int] = 1, penalty: float | tuple[float, float] = 0.0, tolerance: float | tuple[float, float] = 0.001) NodeRegressionPipelineInfoResult¶
Add a linear regression model candidate to the pipeline.
- Parameters:
batch_size (int | tuple[int, int]) – Batch size to use during training. Pass a two-value tuple to define a parameter range.
learning_rate (float | tuple[float, float]) – Learning rate for optimization. Pass a two-value tuple to define a parameter range.
max_epochs (int | tuple[int, int]) – Maximum number of training epochs. Pass a two-value tuple to define a parameter range.
min_epochs (int | tuple[int, int]) – Minimum number of training epochs. Pass a two-value tuple to define a parameter range.
patience (int | tuple[int, int]) – Early stopping patience. Pass a two-value tuple to define a parameter range.
penalty (float | tuple[float, float]) – Penalty term to use during training. Pass a two-value tuple to define a parameter range.
tolerance (float | tuple[float, float]) – Convergence tolerance. Pass a two-value tuple to define a parameter range.
- Returns:
The updated pipeline state.
- Return type:
NodeRegressionPipelineInfoResult
- add_node_property(task_name: str, **config: Any) NodeRegressionPipelineInfoResult¶
Add a node property step to the pipeline.
- Parameters:
task_name (str)
config (Any)
- Returns:
The updated pipeline state.
- Return type:
NodeRegressionPipelineInfoResult
- add_random_forest(*, max_depth: int | tuple[int, int] = 2147483647, max_features_ratio: float | tuple[float, float] | None = None, min_leaf_size: int | tuple[int, int] = 1, min_split_size: int | tuple[int, int] = 2, number_of_decision_trees: int | tuple[int, int] = 100, number_of_samples_ratio: float | tuple[float, float] = 1.0) NodeRegressionPipelineInfoResult¶
Add a random forest model candidate to the pipeline.
- Parameters:
max_depth (int | tuple[int, int]) – Maximum tree depth. Pass a two-value tuple to define a parameter range.
max_features_ratio (float | tuple[float, float] | None) – Fraction of features sampled per split. Pass a two-value tuple to define a parameter range.
min_leaf_size (int | tuple[int, int]) – Minimum number of samples in a leaf. Pass a two-value tuple to define a parameter range.
min_split_size (int | tuple[int, int]) – Minimum number of samples required to split a node. Pass a two-value tuple to define a parameter range.
number_of_decision_trees (int | tuple[int, int]) – Number of trees to train. Pass a two-value tuple to define a parameter range.
number_of_samples_ratio (float | tuple[float, float]) – Fraction of samples used per tree. Pass a two-value tuple to define a parameter range.
- Returns:
The updated pipeline state.
- Return type:
NodeRegressionPipelineInfoResult
- configure_auto_tuning(*, max_trials: int = 10) NodeRegressionPipelineInfoResult¶
Configure auto-tuning for the pipeline.
- Parameters:
max_trials (int) – Maximum number of trials to run during auto-tuning.
- Returns:
The updated pipeline state.
- Return type:
NodeRegressionPipelineInfoResult
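A random forest candidate defined with ranges is typically paired with an auto-tuning budget. The following is a sketch under stated assumptions: the helper name and the range values are hypothetical, tuples are assumed to mean (low, high), and the idea that max_trials bounds how many concrete candidates are tried from those ranges is an interpretation of the parameter description, not a documented guarantee.

```python
def add_tuned_random_forest(pipeline, max_trials=20):
    """Add a range-based random forest candidate and set the tuning budget.

    `pipeline` is assumed to be a NodeRegressionPipeline.
    """
    pipeline.add_random_forest(
        max_depth=(4, 16),                   # assumed (low, high)
        min_leaf_size=1,                     # fixed value
        number_of_decision_trees=(50, 300),  # assumed (low, high)
        number_of_samples_ratio=(0.5, 1.0),  # assumed (low, high)
    )
    # Cap the number of auto-tuning trials.
    return pipeline.configure_auto_tuning(max_trials=max_trials)
```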
- configure_split(*, test_fraction: float = 0.3, validation_folds: int = 3) NodeRegressionPipelineInfoResult¶
Configure the train-test split used by the pipeline.
- Parameters:
test_fraction (float)
validation_folds (int)
- Returns:
The updated pipeline state.
- Return type:
NodeRegressionPipelineInfoResult
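To give a feel for what the defaults of configure_split imply, the sketch below works through the arithmetic for a hypothetical graph with 1000 nodes. It assumes that test_fraction is the share of nodes held out for testing and that the remaining nodes are divided into validation_folds cross-validation folds. The rounding rule and the way leftover nodes are spread across folds are assumptions for illustration, not the library's documented behaviour.

```python
def split_sizes(node_count, test_fraction=0.3, validation_folds=3):
    """Estimate the train size, test size and per-fold sizes for a split."""
    # Assumption: the test set size is the rounded fraction of all nodes.
    test_size = round(node_count * test_fraction)
    train_size = node_count - test_size
    # Assumption: leftover nodes are given to the first folds, one each.
    base, extra = divmod(train_size, validation_folds)
    fold_sizes = [base + 1 if i < extra else base for i in range(validation_folds)]
    return train_size, test_size, fold_sizes


train_size, test_size, fold_sizes = split_sizes(1000)
print(train_size, test_size, fold_sizes)  # 700 300 [234, 233, 233]
```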
- drop(fail_if_missing: bool = False) PipelineCatalogEntryProtocol | None¶
Drop the pipeline and return its catalog entry when available.
- Parameters:
fail_if_missing (bool)
- Return type:
PipelineCatalogEntryProtocol | None
- select_features(feature_properties: str | list[str]) NodeRegressionPipelineInfoResult¶
Select the node properties used as input features.
- train(G: GraphV2, *, metrics: list[str | NodeRegressionMetric], model_name: str, target_property: str, relationship_types: list[str] = ['*'], target_node_labels: list[str] = ['*'], store_model_to_disk: bool = False, random_seed: Any | None = None, username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None) tuple[NodeRegressionModelV2, NodeRegressionPipelineTrainResult]¶
Train a node regression model from this pipeline.
- Parameters:
G (GraphV2) – Graph object to use.
metrics (list[str | NodeRegressionMetric]) – Metrics to optimize for. Plain strings and NodeRegressionMetric values are both accepted.
model_name (str) – Name of the trained model.
target_property (str) – The target node property to predict.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
target_node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
store_model_to_disk (bool) – Whether to persist the trained model to disk.
random_seed (Any | None) – Seed for random number generation to ensure reproducible results.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
- Returns:
The trained model and the corresponding training result.
- Return type:
tuple[NodeRegressionModelV2, NodeRegressionPipelineTrainResult]
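Putting the object-style methods together, a training run might look like the sketch below. The helper name, the feature property names, the target property "price" and the model name are hypothetical. The metric strings are the enum values listed for NodeRegressionMetric.

```python
def train_node_regression(pipeline, G, model_name="price-model"):
    """Select features and train a model from an already configured pipeline.

    `pipeline` is assumed to be a NodeRegressionPipeline with at least one
    model candidate added, and `G` a projected GraphV2 object.
    """
    # Hypothetical node properties used as input features.
    pipeline.select_features(["size", "rooms"])

    # train() returns a (model, train result) tuple.
    model, result = pipeline.train(
        G,
        metrics=["MEAN_SQUARED_ERROR", "MEAN_ABSOLUTE_ERROR"],
        model_name=model_name,
        target_property="price",  # hypothetical target property
        random_seed=42,           # fixed seed for reproducible results
    )
    return model, result
```

The keyword-only parameters after G mean that metrics, model_name and target_property must always be passed by name.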
- class graphdatascience.procedure_surface.api.pipeline.NodeRegressionPipelineEndpoints¶
- abstract add_linear_regression(pipeline_name: str, *, batch_size: int | tuple[int, int] = 100, learning_rate: float | tuple[float, float] = 0.001, max_epochs: int | tuple[int, int] = 100, min_epochs: int | tuple[int, int] = 1, patience: int | tuple[int, int] = 1, penalty: float | tuple[float, float] = 0.0, tolerance: float | tuple[float, float] = 0.001) NodeRegressionPipelineInfoResult¶
Add a linear regression model candidate to the pipeline.
- Parameters:
pipeline_name (str) – Name of the pipeline.
batch_size (int | tuple[int, int]) – Batch size to use during training. Pass a two-value tuple to define a parameter range.
learning_rate (float | tuple[float, float]) – Learning rate for optimization. Pass a two-value tuple to define a parameter range.
max_epochs (int | tuple[int, int]) – Maximum number of training epochs. Pass a two-value tuple to define a parameter range.
min_epochs (int | tuple[int, int]) – Minimum number of training epochs. Pass a two-value tuple to define a parameter range.
patience (int | tuple[int, int]) – Early stopping patience. Pass a two-value tuple to define a parameter range.
penalty (float | tuple[float, float]) – Penalty term to use during training. Pass a two-value tuple to define a parameter range.
tolerance (float | tuple[float, float]) – Convergence tolerance. Pass a two-value tuple to define a parameter range.
- Returns:
The updated pipeline state.
- Return type:
NodeRegressionPipelineInfoResult
- abstract add_node_property(pipeline_name: str, task_name: str, **config: Any) NodeRegressionPipelineInfoResult¶
Add a node property step to the pipeline.
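- Parameters:
pipeline_name (str) – Name of the pipeline.
task_name (str)
config (Any)
- Returns:
The updated pipeline state.
- Return type:
NodeRegressionPipelineInfoResult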
- abstract add_random_forest(pipeline_name: str, *, max_depth: int | tuple[int, int] = 2147483647, max_features_ratio: float | tuple[float, float] | None = None, min_leaf_size: int | tuple[int, int] = 1, min_split_size: int | tuple[int, int] = 2, number_of_decision_trees: int | tuple[int, int] = 100, number_of_samples_ratio: float | tuple[float, float] = 1.0) NodeRegressionPipelineInfoResult¶
Add a random forest model candidate to the pipeline.
- Parameters:
pipeline_name (str) – Name of the pipeline.
max_depth (int | tuple[int, int]) – Maximum tree depth. Pass a two-value tuple to define a parameter range.
max_features_ratio (float | tuple[float, float] | None) – Fraction of features sampled per split. Pass a two-value tuple to define a parameter range.
min_leaf_size (int | tuple[int, int]) – Minimum number of samples in a leaf. Pass a two-value tuple to define a parameter range.
min_split_size (int | tuple[int, int]) – Minimum number of samples required to split a node. Pass a two-value tuple to define a parameter range.
number_of_decision_trees (int | tuple[int, int]) – Number of trees to train. Pass a two-value tuple to define a parameter range.
number_of_samples_ratio (float | tuple[float, float]) – Fraction of samples used per tree. Pass a two-value tuple to define a parameter range.
- Returns:
The updated pipeline state.
- Return type:
NodeRegressionPipelineInfoResult
- abstract configure_auto_tuning(pipeline_name: str, *, max_trials: int = 10) NodeRegressionPipelineInfoResult¶
Configure auto-tuning for the pipeline.
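- Parameters:
pipeline_name (str) – Name of the pipeline.
max_trials (int) – Maximum number of trials to run during auto-tuning.
- Returns:
The updated pipeline state.
- Return type:
NodeRegressionPipelineInfoResult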
- abstract configure_split(pipeline_name: str, *, test_fraction: float = 0.3, validation_folds: int = 3) NodeRegressionPipelineInfoResult¶
Configure the train-test split used by the pipeline.
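- Parameters:
pipeline_name (str) – Name of the pipeline.
test_fraction (float)
validation_folds (int)
- Returns:
The updated pipeline state.
- Return type:
NodeRegressionPipelineInfoResult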
- abstract create(pipeline_name: str) tuple[NodeRegressionPipeline, NodeRegressionPipelineInfoResult]¶
Create a new node regression pipeline.
- Parameters:
pipeline_name (str) – Name of the pipeline.
- Returns:
The created pipeline and the corresponding result payload.
- Return type:
tuple[NodeRegressionPipeline, NodeRegressionPipelineInfoResult]
- abstract get(pipeline_name: str) NodeRegressionPipeline¶
Retrieve an existing node regression pipeline by name.
- Parameters:
pipeline_name (str) – Name of the pipeline.
- Returns:
The reconstructed pipeline object.
- Return type:
NodeRegressionPipeline
- abstract property predict: NodeRegressionPipelinePredictEndpoints¶
Access prediction endpoints for node regression models trained from this surface.
- abstract select_features(pipeline_name: str, node_properties: str | list[str]) NodeRegressionPipelineInfoResult¶
Select the node properties used as input features.
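- Parameters:
pipeline_name (str) – Name of the pipeline.
node_properties (str | list[str])
- Returns:
The updated pipeline state.
- Return type:
NodeRegressionPipelineInfoResult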
- abstract train(G: GraphV2, pipeline_name: str, *, metrics: list[str | NodeRegressionMetric], model_name: str, target_property: str, relationship_types: list[str] = ['*'], target_node_labels: list[str] = ['*'], store_model_to_disk: bool = False, random_seed: Any | None = None, username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None) tuple[NodeRegressionModelV2, NodeRegressionPipelineTrainResult]¶
Train a node regression model from the given pipeline.
- Parameters:
G (GraphV2) – Graph object to use.
pipeline_name (str) – Name of the pipeline.
metrics (list[str | NodeRegressionMetric]) – Metrics to optimize for. Plain strings and NodeRegressionMetric values are both accepted.
model_name (str) – Name of the trained model.
target_property (str) – The target node property to predict.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
target_node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
store_model_to_disk (bool) – Whether to persist the trained model to disk.
random_seed (Any | None) – Seed for random number generation to ensure reproducible results.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
- Returns:
The trained model and the corresponding training result.
- Return type:
tuple[NodeRegressionModelV2, NodeRegressionPipelineTrainResult]
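The endpoint-style methods mirror those on NodeRegressionPipeline but take the pipeline name explicitly on every call. The sketch below shows that calling pattern. The helper name, the property names and the split values are hypothetical, and `endpoints` is assumed to be the object exposed as gds.v2.pipeline.node_regression.

```python
def build_pipeline_by_name(endpoints, pipeline_name):
    """Create and configure a pipeline through the name-based endpoints."""
    # create() returns a (pipeline object, info result) tuple.
    pipeline, info = endpoints.create(pipeline_name)

    # Every later call identifies the pipeline by its name.
    endpoints.select_features(pipeline_name, ["size", "rooms"])
    endpoints.configure_split(pipeline_name, test_fraction=0.2, validation_folds=5)
    endpoints.add_linear_regression(pipeline_name, penalty=(0.0, 1.0))

    # get() looks the pipeline up again by name.
    return endpoints.get(pipeline_name)
```

Either style can be used: the object returned by create() carries the name for you, while the endpoints are convenient when only the name is at hand.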
- pydantic model graphdatascience.procedure_surface.api.pipeline.NodeRegressionPipelineInfoResult¶
- class graphdatascience.procedure_surface.api.pipeline.NodeRegressionPipelinePredictEndpoints¶
- abstract mutate(G: GraphV2, model_name: str, mutate_property: str, *, relationship_types: list[str] | None = None, target_node_labels: list[str] | None = None, username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None) NodeRegressionPipelinePredictMutateResult¶
Run node regression prediction in mutate mode.
- Parameters:
G (GraphV2) – Graph object to use.
model_name (str) – Name of the model.
mutate_property (str) – Name of the node property to store the results in.
relationship_types (list[str] | None) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
target_node_labels (list[str] | None) – Optional node label filter.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
- Returns:
The mutate result summary.
- Return type:
NodeRegressionPipelinePredictMutateResult
- abstract stream(G: GraphV2, model_name: str, *, relationship_types: list[str] | None = None, target_node_labels: list[str] | None = None, username: str | None = None, log_progress: bool = True, sudo: bool = False, concurrency: int | None = None, job_id: str | None = None) DataFrame¶
Run node regression prediction in stream mode.
- Parameters:
G (GraphV2) – Graph object to use.
model_name (str) – Name of the model.
relationship_types (list[str] | None) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
target_node_labels (list[str] | None) – Optional node label filter.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
log_progress (bool) – Display progress logging.
sudo (bool) – Disable the memory guard.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
- Returns:
The prediction results as a DataFrame.
- Return type:
DataFrame
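The two prediction modes differ in where the results end up: mutate writes them to a node property of the graph and returns a summary, while stream returns them as a DataFrame. A minimal sketch, assuming `predict` is the object exposed by the predict property of NodeRegressionPipelineEndpoints. The helper name and the property name "predictedPrice" are hypothetical.

```python
def predict_both_ways(predict, G, model_name):
    """Run a trained model in mutate mode, then in stream mode."""
    # mutate: store predictions under the given node property.
    summary = predict.mutate(G, model_name, "predictedPrice")

    # stream: return predictions directly as a DataFrame.
    frame = predict.stream(G, model_name, concurrency=4)
    return summary, frame
```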
- pydantic model graphdatascience.procedure_surface.api.pipeline.NodeRegressionPipelinePredictMutateResult¶
- pydantic model graphdatascience.procedure_surface.api.pipeline.NodeRegressionPipelineTrainResult¶
- field model_info: NodeRegressionModelInfoResult¶
- pydantic model graphdatascience.procedure_surface.api.pipeline.PipelineCatalogEntry¶
- class graphdatascience.procedure_surface.api.pipeline.PipelineEndpoints¶
- abstract drop(pipeline_name: str, *, fail_if_missing: bool = False) PipelineCatalogEntry | None¶
Drop a pipeline from the catalog, optionally failing when missing.
- Parameters:
pipeline_name (str)
fail_if_missing (bool)
- Return type:
PipelineCatalogEntry | None
- abstract exists(pipeline_name: str) PipelineExistsResult | None¶
Return pipeline existence details when present, otherwise None.
- Parameters:
pipeline_name (str)
- Return type:
PipelineExistsResult | None
- abstract property link_prediction: LinkPredictionPipelineEndpoints¶
Access link prediction pipeline procedures.
- abstract list(pipeline_name: str | None = None) list[PipelineCatalogEntry]¶
List pipeline catalog entries, optionally filtered by pipeline name.
- Parameters:
pipeline_name (str | None)
- Return type:
list[PipelineCatalogEntry]
- abstract property node_classification: NodeClassificationPipelineEndpoints¶
Access node classification pipeline procedures.
- abstract property node_regression: NodeRegressionPipelineEndpoints¶
Access node regression pipeline procedures.
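Since exists returns None for a missing pipeline, the catalog methods can be combined into a guarded drop. The sketch below assumes `pipelines` is a PipelineEndpoints object, presumably the one exposed as gds.v2.pipeline. The helper name is hypothetical.

```python
def drop_if_present(pipelines, pipeline_name):
    """Drop a pipeline only when the catalog reports that it exists."""
    # exists() returns None when no pipeline with this name is present.
    if pipelines.exists(pipeline_name) is None:
        return None
    # The pipeline was just seen, so a missing one now is treated as an error.
    return pipelines.drop(pipeline_name, fail_if_missing=True)
```

Calling drop with the default fail_if_missing=False achieves a similar effect in one call. The explicit check is useful mainly when the existence details are needed as well.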