Similarity Algorithms¶

class graphdatascience.procedure_surface.api.similarity.KnnEndpoints¶

abstract estimate(G: GraphV2 | dict[str, Any], node_properties: str | list[str] | dict[str, str], top_k: int = 10, similarity_cutoff: float = 0.0, delta_threshold: float = 0.001, max_iterations: int = 100, sample_rate: float = 0.5, perturbation_rate: float = 0.0, random_joins: int = 10, random_seed: int | None = None, initial_sampler: str = 'UNIFORM', relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None) → EstimationResult¶

Estimates the memory requirements for running the K-Nearest Neighbors algorithm.

Parameters:

G (GraphV2 | dict[str, Any]) – Graph object to use or a dictionary representing the graph dimensions.
node_properties (str | list[str] | dict[str, str]) – Node properties to use for the similarity computation.
top_k (int) – Number of most similar nodes to return for each node.
similarity_cutoff (float, default=0.0) – The threshold for similarity scores.
delta_threshold (float) – Minimum change between iterations.
max_iterations (int) – Maximum number of iterations to run.
sample_rate (float, default=0.5) – The sampling rate for the algorithm.
perturbation_rate (float, default=0.0) – The rate at which to perturb the similarity graph.
random_joins (int, default=10) – The number of random joins to perform.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
initial_sampler (str) – Sampling strategy for the initial nearest neighbors.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id – Identifier for the computation.

Returns:

Object containing the estimated memory requirements.

Return type:

EstimationResult
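The node_properties parameter accepts three shapes: a single name, a list of names, or a dict. A minimal sketch of how caller code might normalize them into one mapping is shown here. It assumes that the dict values name a similarity metric per property, which the signature (dict[str, str]) does not itself confirm; normalize_node_properties and the UNSPECIFIED placeholder are hypothetical names, not part of the library.

```python
from typing import Union

NodeProperties = Union[str, list[str], dict[str, str]]

# Hypothetical placeholder meaning "no metric chosen by the caller".
# This is not a constant of the graphdatascience library.
UNSPECIFIED = "UNSPECIFIED"


def normalize_node_properties(node_properties: NodeProperties) -> dict[str, str]:
    """Return a {property_name: metric_name} mapping for any accepted shape."""
    if isinstance(node_properties, str):
        # A single property name.
        return {node_properties: UNSPECIFIED}
    if isinstance(node_properties, dict):
        # Assumed to already map property names to metric names; copy it.
        return dict(node_properties)
    # A list of property names.
    return {name: UNSPECIFIED for name in node_properties}


print(normalize_node_properties(["age", "embedding"]))
```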

abstract mutate(G: GraphV2, mutate_relationship_type: str, mutate_property: str, node_properties: str | list[str] | dict[str, str], top_k: int = 10, similarity_cutoff: float = 0.0, delta_threshold: float = 0.001, max_iterations: int = 100, sample_rate: float = 0.5, perturbation_rate: float = 0.0, random_joins: int = 10, random_seed: int | None = None, initial_sampler: str = 'UNIFORM', relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None) → KnnMutateResult¶

Runs the K-Nearest Neighbors algorithm and stores the results as new relationships in the graph catalog.

Parameters:

G (GraphV2) – Graph object to use.
mutate_relationship_type (str) – Name of the relationship type to store the results in.
mutate_property (str) – Name of the relationship property to store the similarity scores in.
node_properties (str | list[str] | dict[str, str]) – Node properties to use for the similarity computation.
top_k (int) – Number of most similar nodes to return for each node.
similarity_cutoff (float, default=0.0) – The threshold for similarity scores.
delta_threshold (float) – Minimum change between iterations.
max_iterations (int) – Maximum number of iterations to run.
sample_rate (float, default=0.5) – The sampling rate for the algorithm.
perturbation_rate (float, default=0.0) – The rate at which to perturb the similarity graph.
random_joins (int, default=10) – The number of random joins to perform.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
initial_sampler (str) – Sampling strategy for the initial nearest neighbors.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

Object containing metadata from the execution.

Return type:

KnnMutateResult

abstract stats(G: GraphV2, node_properties: str | list[str] | dict[str, str], top_k: int = 10, similarity_cutoff: float = 0.0, delta_threshold: float = 0.001, max_iterations: int = 100, sample_rate: float = 0.5, perturbation_rate: float = 0.0, random_joins: int = 10, random_seed: int | None = None, initial_sampler: str = 'UNIFORM', relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None) → KnnStatsResult¶

Runs the K-Nearest Neighbors algorithm and returns execution statistics.

Parameters:

G (GraphV2) – Graph object to use.
node_properties (str | list[str] | dict[str, str]) – Node properties to use for the similarity computation.
top_k (int) – Number of most similar nodes to return for each node.
similarity_cutoff (float, default=0.0) – The threshold for similarity scores.
delta_threshold (float) – Minimum change between iterations.
max_iterations (int) – Maximum number of iterations to run.
sample_rate (float, default=0.5) – The sampling rate for the algorithm.
perturbation_rate (float, default=0.0) – The rate at which to perturb the similarity graph.
random_joins (int, default=10) – The number of random joins to perform.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
initial_sampler (str) – Sampling strategy for the initial nearest neighbors.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

Object containing execution statistics and algorithm-specific results.

Return type:

KnnStatsResult

abstract stream(G: GraphV2, node_properties: str | list[str] | dict[str, str], top_k: int = 10, similarity_cutoff: float = 0.0, delta_threshold: float = 0.001, max_iterations: int = 100, sample_rate: float = 0.5, perturbation_rate: float = 0.0, random_joins: int = 10, random_seed: int | None = None, initial_sampler: str = 'UNIFORM', relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None) → DataFrame¶

Runs the K-Nearest Neighbors algorithm and returns the result as a DataFrame.

Parameters:

G (GraphV2) – Graph object to use.
node_properties (str | list[str] | dict[str, str]) – Node properties to use for the similarity computation.
top_k (int) – Number of most similar nodes to return for each node.
similarity_cutoff (float, default=0.0) – The threshold for similarity scores.
delta_threshold (float) – Minimum change between iterations.
max_iterations (int) – Maximum number of iterations to run.
sample_rate (float, default=0.5) – The sampling rate for the algorithm.
perturbation_rate (float, default=0.0) – The rate at which to perturb the similarity graph.
random_joins (int, default=10) – The number of random joins to perform.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
initial_sampler (str) – Sampling strategy for the initial nearest neighbors.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

The similarity results as a DataFrame with columns ‘node1’, ‘node2’, and ‘similarity’.

Return type:

DataFrame
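To make the roles of top_k and similarity_cutoff concrete, the following sketch computes the same kind of rows (node1, node2, similarity) by exhaustive comparison. It is an illustration only: the endpoint's iterative parameters (max_iterations, sample_rate, random_joins) suggest an approximate, sampled search, whereas this version compares every pair. Plain cosine similarity, the inclusive cutoff, and the tie-breaking by node ID are assumptions of the sketch, and brute_force_knn is a hypothetical helper.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Plain cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_knn(
    vectors: dict[int, list[float]],
    top_k: int = 10,
    similarity_cutoff: float = 0.0,
) -> list[dict]:
    """Return one row per (node, neighbor) pair, at most top_k rows per node."""
    rows = []
    for node1, v1 in vectors.items():
        scored = [
            (cosine(v1, v2), node2)
            for node2, v2 in vectors.items()
            if node2 != node1
        ]
        # Drop pairs below the cutoff (assumed inclusive), then sort by
        # descending similarity, breaking ties by the smaller node ID.
        kept = sorted(
            (pair for pair in scored if pair[0] >= similarity_cutoff),
            key=lambda pair: (-pair[0], pair[1]),
        )[:top_k]
        rows.extend(
            {"node1": node1, "node2": node2, "similarity": sim}
            for sim, node2 in kept
        )
    return rows


example = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
for row in brute_force_knn(example, top_k=1, similarity_cutoff=0.5):
    print(row)
```

With top_k=1, each node contributes at most one row; raising similarity_cutoff can reduce that to zero rows for poorly matched nodes.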

abstract write(G: GraphV2, write_relationship_type: str, write_property: str, node_properties: str | list[str] | dict[str, str], top_k: int = 10, similarity_cutoff: float = 0.0, delta_threshold: float = 0.001, max_iterations: int = 100, sample_rate: float = 0.5, perturbation_rate: float = 0.0, random_joins: int = 10, random_seed: int | None = None, initial_sampler: str = 'UNIFORM', relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None, write_concurrency: int | None = None) → KnnWriteResult¶

Runs the K-Nearest Neighbors algorithm and writes the results back to the database.

Parameters:

G (GraphV2) – Graph object to use.
write_relationship_type (str) – Name of the relationship type to store the results in.
write_property (str) – Name of the relationship property to store the similarity scores in.
node_properties (str | list[str] | dict[str, str]) – Node properties to use for the similarity computation.
top_k (int) – Number of most similar nodes to return for each node.
similarity_cutoff (float, default=0.0) – The threshold for similarity scores.
delta_threshold (float) – Minimum change between iterations.
max_iterations (int) – Maximum number of iterations to run.
sample_rate (float, default=0.5) – The sampling rate for the algorithm.
perturbation_rate (float, default=0.0) – The rate at which to perturb the similarity graph.
random_joins (int, default=10) – The number of random joins to perform.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
initial_sampler (str) – Sampling strategy for the initial nearest neighbors.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
write_concurrency (int | None) – Number of concurrent threads to use for writing.

Returns:

Object containing metadata from the execution.

Return type:

KnnWriteResult

class graphdatascience.procedure_surface.api.similarity.KnnFilteredEndpoints¶

abstract estimate(G: GraphV2 | dict[str, Any], node_properties: str | list[str] | dict[str, str], source_node_filter: str, target_node_filter: str, seed_target_nodes: bool = False, top_k: int = 10, similarity_cutoff: float = 0.0, delta_threshold: float = 0.001, max_iterations: int = 100, sample_rate: float = 0.5, perturbation_rate: float = 0.0, random_joins: int = 10, random_seed: int | None = None, initial_sampler: str = 'UNIFORM', relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, username: str | None = None, concurrency: int | None = None) → EstimationResult¶

Estimates the memory requirements for running the Filtered K-Nearest Neighbors algorithm.

The Filtered K-Nearest Neighbors algorithm computes a distance value for node pairs in the graph with customizable source and target node filters, creating new relationships between each node and its k nearest neighbors within the filtered subset.

Parameters:

G (GraphV2 | dict[str, Any]) – Graph object to use or a dictionary representing the graph dimensions.
node_properties (str | list[str] | dict[str, str]) – Node properties to use for the similarity computation.
source_node_filter (str) – A Cypher expression to filter which nodes can be sources in the similarity computation.
target_node_filter (str) – A Cypher expression to filter which nodes can be targets in the similarity computation.
seed_target_nodes (bool, default=False) – Whether to use a seeded approach for target node selection.
top_k (int) – Number of most similar nodes to return for each node.
similarity_cutoff (float, default=0.0) – The threshold for similarity scores.
delta_threshold (float) – Minimum change between iterations.
max_iterations (int) – Maximum number of iterations to run.
sample_rate (float, default=0.5) – The sampling rate for the algorithm.
perturbation_rate (float, default=0.0) – The rate at which to perturb the similarity graph.
random_joins (int, default=10) – The number of random joins to perform.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
initial_sampler (str) – Sampling strategy for the initial nearest neighbors.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.

Returns:

Object containing the estimated memory requirements.

Return type:

EstimationResult

abstract mutate(G: GraphV2, mutate_relationship_type: str, mutate_property: str, node_properties: str | list[str] | dict[str, str], source_node_filter: str, target_node_filter: str, seed_target_nodes: bool = False, top_k: int = 10, similarity_cutoff: float = 0.0, delta_threshold: float = 0.001, max_iterations: int = 100, sample_rate: float = 0.5, perturbation_rate: float = 0.0, random_joins: int = 10, random_seed: int | None = None, initial_sampler: str = 'UNIFORM', relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None) → KnnMutateResult¶

Runs the Filtered K-Nearest Neighbors algorithm and stores the results as new relationships in the graph catalog.

The Filtered K-Nearest Neighbors algorithm computes a distance value for node pairs in the graph with customizable source and target node filters, creating new relationships between each node and its k nearest neighbors within the filtered subset.

Parameters:

G (GraphV2) – Graph object to use.
mutate_relationship_type (str) – Name of the relationship type to store the results in.
mutate_property (str) – Name of the relationship property to store the similarity scores in.
node_properties (str | list[str] | dict[str, str]) – Node properties to use for the similarity computation.
source_node_filter (str) – A Cypher expression to filter which nodes can be sources in the similarity computation.
target_node_filter (str) – A Cypher expression to filter which nodes can be targets in the similarity computation.
seed_target_nodes (bool, default=False) – Whether to use a seeded approach for target node selection.
top_k (int) – Number of most similar nodes to return for each node.
similarity_cutoff (float, default=0.0) – The threshold for similarity scores.
delta_threshold (float) – Minimum change between iterations.
max_iterations (int) – Maximum number of iterations to run.
sample_rate (float, default=0.5) – The sampling rate for the algorithm.
perturbation_rate (float, default=0.0) – The rate at which to perturb the similarity graph.
random_joins (int, default=10) – The number of random joins to perform.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
initial_sampler (str) – Sampling strategy for the initial nearest neighbors.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

Object containing metadata from the execution.

Return type:

KnnMutateResult

abstract stats(G: GraphV2, node_properties: str | list[str] | dict[str, str], source_node_filter: str, target_node_filter: str, seed_target_nodes: bool = False, top_k: int = 10, similarity_cutoff: float = 0.0, delta_threshold: float = 0.001, max_iterations: int = 100, sample_rate: float = 0.5, perturbation_rate: float = 0.0, random_joins: int = 10, random_seed: int | None = None, initial_sampler: str = 'UNIFORM', relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None) → KnnStatsResult¶

Runs the Filtered K-Nearest Neighbors algorithm and returns execution statistics.

The Filtered K-Nearest Neighbors algorithm computes a distance value for node pairs in the graph with customizable source and target node filters, creating new relationships between each node and its k nearest neighbors within the filtered subset.

Parameters:

G (GraphV2) – Graph object to use.
node_properties (str | list[str] | dict[str, str]) – Node properties to use for the similarity computation.
source_node_filter (str) – A Cypher expression to filter which nodes can be sources in the similarity computation.
target_node_filter (str) – A Cypher expression to filter which nodes can be targets in the similarity computation.
seed_target_nodes (bool, default=False) – Whether to use a seeded approach for target node selection.
top_k (int) – Number of most similar nodes to return for each node.
similarity_cutoff (float, default=0.0) – The threshold for similarity scores.
delta_threshold (float) – Minimum change between iterations.
max_iterations (int) – Maximum number of iterations to run.
sample_rate (float, default=0.5) – The sampling rate for the algorithm.
perturbation_rate (float, default=0.0) – The rate at which to perturb the similarity graph.
random_joins (int, default=10) – The number of random joins to perform.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
initial_sampler (str) – Sampling strategy for the initial nearest neighbors.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

Object containing execution statistics and algorithm-specific results.

Return type:

KnnStatsResult

abstract stream(G: GraphV2, node_properties: str | list[str] | dict[str, str], source_node_filter: str, target_node_filter: str, seed_target_nodes: bool = False, top_k: int = 10, similarity_cutoff: float = 0.0, delta_threshold: float = 0.001, max_iterations: int = 100, sample_rate: float = 0.5, perturbation_rate: float = 0.0, random_joins: int = 10, random_seed: int | None = None, initial_sampler: str = 'UNIFORM', relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None) → DataFrame¶

Runs the Filtered K-Nearest Neighbors algorithm and returns the result as a DataFrame.

The Filtered K-Nearest Neighbors algorithm computes a distance value for node pairs in the graph with customizable source and target node filters, creating new relationships between each node and its k nearest neighbors within the filtered subset.

Parameters:

G (GraphV2) – Graph object to use.
node_properties (str | list[str] | dict[str, str]) – Node properties to use for the similarity computation.
source_node_filter (str) – A Cypher expression to filter which nodes can be sources in the similarity computation.
target_node_filter (str) – A Cypher expression to filter which nodes can be targets in the similarity computation.
seed_target_nodes (bool, default=False) – Whether to use a seeded approach for target node selection.
top_k (int) – Number of most similar nodes to return for each node.
similarity_cutoff (float, default=0.0) – The threshold for similarity scores.
delta_threshold (float) – Minimum change between iterations.
max_iterations (int) – Maximum number of iterations to run.
sample_rate (float, default=0.5) – The sampling rate for the algorithm.
perturbation_rate (float, default=0.0) – The rate at which to perturb the similarity graph.
random_joins (int, default=10) – The number of random joins to perform.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
initial_sampler (str) – Sampling strategy for the initial nearest neighbors.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

The similarity results as a DataFrame with columns ‘node1’, ‘node2’, and ‘similarity’.

Return type:

DataFrame
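The effect of source_node_filter and target_node_filter can be sketched as follows: only nodes passing the source filter get result rows, and only nodes passing the target filter may appear as their neighbors. In this illustration the filters are plain Python predicates standing in for the filter strings the endpoint takes, similarity on a single numeric property is assumed to be 1 / (1 + |a - b|), and filtered_knn is a hypothetical helper that compares all pairs rather than sampling.

```python
from typing import Callable


def filtered_knn(
    values: dict[int, float],
    is_source: Callable[[int], bool],
    is_target: Callable[[int], bool],
    top_k: int = 10,
) -> list[dict]:
    """Return up to top_k neighbor rows for each node accepted by is_source."""
    rows = []
    for node1 in filter(is_source, values):
        scored = sorted(
            (
                # Assumed similarity for scalar properties: 1 / (1 + distance).
                (1.0 / (1.0 + abs(values[node1] - values[node2])), node2)
                for node2 in values
                if node2 != node1 and is_target(node2)
            ),
            # Descending similarity, ties broken by the smaller node ID.
            key=lambda pair: (-pair[0], pair[1]),
        )
        rows.extend(
            {"node1": node1, "node2": node2, "similarity": sim}
            for sim, node2 in scored[:top_k]
        )
    return rows


values = {0: 1.0, 1: 2.0, 2: 4.0, 3: 1.5}
rows = filtered_knn(values, lambda n: n == 0, lambda n: n >= 2, top_k=1)
print(rows)
```

In this example node 1 is closer to node 0 than node 2 is, but it never appears in the output because it does not pass the target filter.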

abstract write(G: GraphV2, write_relationship_type: str, write_property: str, node_properties: str | list[str] | dict[str, str], source_node_filter: str, target_node_filter: str, seed_target_nodes: bool = False, top_k: int = 10, similarity_cutoff: float = 0.0, delta_threshold: float = 0.001, max_iterations: int = 100, sample_rate: float = 0.5, perturbation_rate: float = 0.0, random_joins: int = 10, random_seed: int | None = None, initial_sampler: str = 'UNIFORM', relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], write_concurrency: int | None = None, sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None) → KnnWriteResult¶

Runs the Filtered K-Nearest Neighbors algorithm and writes the results back to the database.

The Filtered K-Nearest Neighbors algorithm computes a distance value for node pairs in the graph with customizable source and target node filters, creating new relationships between each node and its k nearest neighbors within the filtered subset.

Parameters:

G (GraphV2) – Graph object to use.
write_relationship_type (str) – Name of the relationship type to store the results in.
write_property (str) – Name of the relationship property to store the similarity scores in.
node_properties (str | list[str] | dict[str, str]) – Node properties to use for the similarity computation.
source_node_filter (str) – A Cypher expression to filter which nodes can be sources in the similarity computation.
target_node_filter (str) – A Cypher expression to filter which nodes can be targets in the similarity computation.
seed_target_nodes (bool, default=False) – Whether to use a seeded approach for target node selection.
top_k (int) – Number of most similar nodes to return for each node.
similarity_cutoff (float, default=0.0) – The threshold for similarity scores.
delta_threshold (float) – Minimum change between iterations.
max_iterations (int) – Maximum number of iterations to run.
sample_rate (float, default=0.5) – The sampling rate for the algorithm.
perturbation_rate (float, default=0.0) – The rate at which to perturb the similarity graph.
random_joins (int, default=10) – The number of random joins to perform.
random_seed (int | None) – Seed for random number generation to ensure reproducible results.
initial_sampler (str) – Sampling strategy for the initial nearest neighbors.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
write_concurrency (int | None) – Number of concurrent threads to use for writing.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

Object containing metadata from the execution.

Return type:

KnnWriteResult

pydantic model graphdatascience.procedure_surface.api.similarity.KnnMutateResult¶

field compute_millis: int¶

field configuration: dict[str, Any]¶

field did_converge: bool¶

field mutate_millis: int¶

field node_pairs_considered: int¶

field nodes_compared: int¶

field post_processing_millis: int¶

field pre_processing_millis: int¶

field ran_iterations: int¶

field relationships_written: int¶

field similarity_distribution: dict[str, int | float]¶

pydantic model graphdatascience.procedure_surface.api.similarity.KnnStatsResult¶

field compute_millis: int¶

field configuration: dict[str, Any]¶

field did_converge: bool¶

field node_pairs_considered: int¶

field nodes_compared: int¶

field post_processing_millis: int¶

field pre_processing_millis: int¶

field ran_iterations: int¶

field similarity_distribution: dict[str, int | float]¶

field similarity_pairs: int¶
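A result object with these fields can be summarized in a few lines. The sketch below uses a small dataclass as a stand-in that carries only a subset of the fields listed above; it is not the library's pydantic model. It also assumes, from the field names alone, that the three timing fields cover disjoint phases and can be added up. StatsSummary and describe are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StatsSummary:
    # Stand-in with a subset of the fields listed above (hypothetical type).
    pre_processing_millis: int
    compute_millis: int
    post_processing_millis: int
    ran_iterations: int
    did_converge: bool


def describe(result: StatsSummary, max_iterations: int = 100) -> str:
    """Build a one-line summary of convergence and elapsed time."""
    # Assumption: the three phases do not overlap, so their sum is the total.
    total = (
        result.pre_processing_millis
        + result.compute_millis
        + result.post_processing_millis
    )
    if result.did_converge:
        status = f"converged after {result.ran_iterations} iterations"
    else:
        status = (
            f"stopped after {result.ran_iterations} of "
            f"{max_iterations} iterations without converging"
        )
    return f"{status}; {total} ms total"


print(describe(StatsSummary(5, 120, 3, 7, True)))
```

If did_converge is False, raising max_iterations is one possible response; whether that is appropriate depends on the data.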

pydantic model graphdatascience.procedure_surface.api.similarity.KnnWriteResult¶

field compute_millis: int¶

field configuration: dict[str, Any]¶

field did_converge: bool¶

field node_pairs_considered: int¶

field nodes_compared: int¶

field post_processing_millis: int¶

field pre_processing_millis: int¶

field ran_iterations: int¶

field relationships_written: int¶

field similarity_distribution: dict[str, int | float]¶

field write_millis: int¶

class graphdatascience.procedure_surface.api.similarity.NodeSimilarityEndpoints¶

abstract estimate(G: GraphV2, top_k: int = 10, bottom_k: int = 10, top_n: int = 0, bottom_n: int = 0, similarity_cutoff: float = 1e-42, degree_cutoff: int = 1, upper_degree_cutoff: int = 2147483647, similarity_metric: str = 'JACCARD', use_components: bool | str = False, relationship_weight_property: str | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, username: str | None = None, concurrency: int | None = None) → EstimationResult¶

Estimates the memory requirements for running the Node Similarity algorithm.

Parameters:

G (GraphV2) – Graph object to use.
top_k (int) – Number of most similar nodes to return for each node.
bottom_k (int, default=10) – The maximum number of neighbors with the lowest similarity scores to compute per node.
top_n (int, default=0) – The maximum number of neighbors to select globally based on similarity scores.
bottom_n (int, default=0) – The maximum number of neighbors to select globally based on lowest similarity scores.
similarity_cutoff (float) – The threshold for similarity scores.
degree_cutoff (int, default=1) – The minimum degree a node must have to be considered.
upper_degree_cutoff (int, default=2147483647) – The maximum degree a node can have to be considered.
similarity_metric (str, default="JACCARD") – The similarity metric to use for computation.
use_components (bool | str, default=False) – Whether to compute similarity within connected components. If a string is given, it is used as the name of a node property in the graph that holds precomputed components.
relationship_weight_property (str | None) – Name of the property to be used as weights.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.

Returns:

Object containing the estimated memory requirements.

Return type:

EstimationResult

abstract mutate(G: GraphV2, mutate_relationship_type: str, mutate_property: str, top_k: int = 10, bottom_k: int = 10, top_n: int = 0, bottom_n: int = 0, similarity_cutoff: float = 1e-42, degree_cutoff: int = 1, upper_degree_cutoff: int = 2147483647, similarity_metric: str = 'JACCARD', use_components: bool | str = False, relationship_weight_property: str | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None) → NodeSimilarityMutateResult¶

Runs the Node Similarity algorithm and stores the results as new relationships in the graph catalog.

Parameters:

G (GraphV2) – Graph object to use.
mutate_relationship_type (str) – Name of the relationship type to store the results in.
mutate_property (str) – Name of the relationship property to store the similarity scores in.
top_k (int) – Number of most similar nodes to return for each node.
bottom_k (int, default=10) – The maximum number of neighbors with the lowest similarity scores to compute per node.
top_n (int, default=0) – The maximum number of neighbors to select globally based on similarity scores.
bottom_n (int, default=0) – The maximum number of neighbors to select globally based on lowest similarity scores.
similarity_cutoff (float) – The threshold for similarity scores.
degree_cutoff (int, default=1) – The minimum degree a node must have to be considered.
upper_degree_cutoff (int, default=2147483647) – The maximum degree a node can have to be considered.
similarity_metric (str, default="JACCARD") – The similarity metric to use for computation.
use_components (bool | str, default=False) – Whether to compute similarity within connected components. If a string is given, it is used as the name of a node property in the graph that holds precomputed components.
relationship_weight_property (str | None) – Name of the property to be used as weights.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

Object containing metadata from the execution.

Return type:

NodeSimilarityMutateResult
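
The `similarity_metric` options can be illustrated on plain Python sets. The sketch below uses the textbook set definitions of Jaccard, overlap, and cosine similarity on unweighted neighbor sets. It is not the library's implementation, and the function name `set_similarity` is hypothetical.

```python
import math


def set_similarity(a, b, metric="JACCARD"):
    """Textbook set similarity between two neighbor sets (illustrative only)."""
    if not a or not b:
        return 0.0
    shared = len(a & b)
    if metric == "JACCARD":
        # Shared neighbors relative to all distinct neighbors.
        return shared / len(a | b)
    if metric == "OVERLAP":
        # Shared neighbors relative to the smaller neighbor set.
        return shared / min(len(a), len(b))
    if metric == "COSINE":
        # Unweighted cosine: shared neighbors over the geometric mean of the set sizes.
        return shared / math.sqrt(len(a) * len(b))
    raise ValueError(f"unknown metric: {metric}")


# Two nodes whose neighbor sets share two members.
left, right = {1, 2, 3}, {2, 3, 4, 5}
scores = {m: set_similarity(left, right, m) for m in ("JACCARD", "OVERLAP", "COSINE")}
```

For the same pair of nodes the three metrics generally give different scores, so a `similarity_cutoff` tuned for one metric may not carry over to another.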

abstract stats(G: GraphV2, top_k: int = 10, bottom_k: int = 10, top_n: int = 0, bottom_n: int = 0, similarity_cutoff: float = 1e-42, degree_cutoff: int = 1, upper_degree_cutoff: int = 2147483647, similarity_metric: str = 'JACCARD', use_components: bool | str = False, relationship_weight_property: str | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None) → NodeSimilarityStatsResult¶

Runs the Node Similarity algorithm and returns execution statistics.

Parameters:

G (GraphV2) – Graph object to use.
top_k (int) – Number of most similar nodes to return for each node.
bottom_k (int, default=10) – The maximum number of neighbors with the lowest similarity scores to compute per node.
top_n (int, default=0) – The maximum number of neighbors to select globally based on similarity scores.
bottom_n (int, default=0) – The maximum number of neighbors to select globally based on lowest similarity scores.
similarity_cutoff (float, default=1e-42) – Lower bound on the similarity score; pairs scoring below it are excluded from the results.
degree_cutoff (int, default=1) – The minimum degree a node must have to be considered.
upper_degree_cutoff (int, default=2147483647) – The maximum degree a node can have to be considered.
similarity_metric (str, default="JACCARD") – The similarity metric to use for computation. One of JACCARD, OVERLAP, or COSINE.
use_components (bool | str, default=False) – Whether to compute similarity within connected components. If a string is given, the node property of that name stored in the graph is used to identify the components.
relationship_weight_property (str | None) – Name of the property to be used as weights.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

Object containing execution statistics and algorithm-specific results.

Return type:

NodeSimilarityStatsResult
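
As a rough illustration of `degree_cutoff` and `upper_degree_cutoff`, the sketch below keeps only nodes whose degree lies inside the configured range before any comparison takes place. The adjacency dictionary and the helper name `eligible_nodes` are assumptions made for illustration, and both bounds are assumed to be inclusive.

```python
def eligible_nodes(neighbors, degree_cutoff=1, upper_degree_cutoff=2147483647):
    """Return the nodes whose degree lies within [degree_cutoff, upper_degree_cutoff]."""
    return sorted(
        node
        for node, targets in neighbors.items()
        if degree_cutoff <= len(targets) <= upper_degree_cutoff
    )


# Hypothetical adjacency: node id -> set of neighbor ids.
neighbors = {0: {10, 11, 12}, 1: {10}, 2: set(), 3: {10, 11, 12, 13, 14}}

# With the defaults only the isolated node 2 is dropped.
default_selection = eligible_nodes(neighbors)

# A narrower range also drops the low-degree node 1 and the high-degree node 3.
narrow_selection = eligible_nodes(neighbors, degree_cutoff=2, upper_degree_cutoff=4)
```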

abstract stream(G: GraphV2, top_k: int = 10, bottom_k: int = 10, top_n: int = 0, bottom_n: int = 0, similarity_cutoff: float = 1e-42, degree_cutoff: int = 1, upper_degree_cutoff: int = 2147483647, similarity_metric: str = 'JACCARD', use_components: bool | str = False, relationship_weight_property: str | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None) → DataFrame¶

Runs the Node Similarity algorithm and returns the result as a DataFrame.

Parameters:

G (GraphV2) – Graph object to use.
top_k (int) – Number of most similar nodes to return for each node.
bottom_k (int, default=10) – The maximum number of neighbors with the lowest similarity scores to compute per node.
top_n (int, default=0) – The maximum number of neighbors to select globally based on similarity scores.
bottom_n (int, default=0) – The maximum number of neighbors to select globally based on lowest similarity scores.
similarity_cutoff (float, default=1e-42) – Lower bound on the similarity score; pairs scoring below it are excluded from the results.
degree_cutoff (int, default=1) – The minimum degree a node must have to be considered.
upper_degree_cutoff (int, default=2147483647) – The maximum degree a node can have to be considered.
similarity_metric (str, default="JACCARD") – The similarity metric to use for computation. One of JACCARD, OVERLAP, or COSINE.
use_components (bool | str, default=False) – Whether to compute similarity within connected components. If a string is given, the node property of that name stored in the graph is used to identify the components.
relationship_weight_property (str | None) – Name of the property to be used as weights.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

The similarity results as a DataFrame with columns ‘node1’, ‘node2’, and ‘similarity’.

Return type:

DataFrame
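
To make the shape of the streamed result concrete, the following sketch computes (node1, node2, similarity) rows in plain Python using Jaccard similarity, then applies `similarity_cutoff` and a per-node `top_k`. It is a simplified model under stated assumptions, not the library's algorithm: the helper names are hypothetical, rows are plain tuples rather than a DataFrame, and the cutoff is treated as an inclusive lower bound.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_rows(neighbors, top_k=10, similarity_cutoff=1e-42):
    """Return (node1, node2, similarity) rows, keeping the best top_k per node1."""
    per_node = {}
    for n1, n2 in permutations(sorted(neighbors), 2):
        score = jaccard(neighbors[n1], neighbors[n2])
        if score >= similarity_cutoff:
            per_node.setdefault(n1, []).append((n1, n2, score))
    rows = []
    for n1 in sorted(per_node):
        # Highest score first; ties are broken by the smaller node2 id.
        ranked = sorted(per_node[n1], key=lambda row: (-row[2], row[1]))
        rows.extend(ranked[:top_k])
    return rows


# Hypothetical adjacency: node id -> set of neighbor ids.
neighbors = {0: {10, 11}, 1: {10, 11, 12}, 2: {12}, 3: {13}}
rows = similarity_rows(neighbors, top_k=1)
```

Node 3 shares no neighbors with any other node, so it never appears in a row. With `top_k=1`, node 1 keeps only its best match.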

abstract write(G: GraphV2, write_relationship_type: str, write_property: str, top_k: int = 10, bottom_k: int = 10, top_n: int = 0, bottom_n: int = 0, similarity_cutoff: float = 1e-42, degree_cutoff: int = 1, upper_degree_cutoff: int = 2147483647, similarity_metric: str = 'JACCARD', use_components: bool | str = False, relationship_weight_property: str | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None, write_concurrency: int | None = None) → NodeSimilarityWriteResult¶

Runs the Node Similarity algorithm and writes the results back to the database.

Parameters:

G (GraphV2) – Graph object to use.
write_relationship_type (str) – Name of the relationship type to store the results in.
write_property (str) – Name of the relationship property to store the similarity score in.
top_k (int) – Number of most similar nodes to return for each node.
bottom_k (int, default=10) – The maximum number of neighbors with the lowest similarity scores to compute per node.
top_n (int, default=0) – The maximum number of neighbors to select globally based on similarity scores.
bottom_n (int, default=0) – The maximum number of neighbors to select globally based on lowest similarity scores.
similarity_cutoff (float, default=1e-42) – Lower bound on the similarity score; pairs scoring below it are excluded from the results.
degree_cutoff (int, default=1) – The minimum degree a node must have to be considered.
upper_degree_cutoff (int, default=2147483647) – The maximum degree a node can have to be considered.
similarity_metric (str, default="JACCARD") – The similarity metric to use for computation. One of JACCARD, OVERLAP, or COSINE.
use_components (bool | str, default=False) – Whether to compute similarity within connected components. If a string is given, the node property of that name stored in the graph is used to identify the components.
relationship_weight_property (str | None) – Name of the property to be used as weights.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
write_concurrency (int | None) – Number of concurrent threads to use for writing.

Returns:

Object containing metadata from the execution.

Return type:

NodeSimilarityWriteResult
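
The keyword arguments documented above might be passed as in the sketch below. How an endpoints object is obtained is not covered here, so the example takes it as a parameter and exercises it with `RecordingEndpoints`, a hypothetical stand-in that only records what it receives. The relationship type `SIMILAR`, the property name `score`, and the chosen values are example assumptions.

```python
from typing import Any


class RecordingEndpoints:
    """Hypothetical stand-in that records the arguments passed to write()."""

    def __init__(self):
        self.calls = []

    def write(self, G: Any, write_relationship_type: str, write_property: str, **config: Any):
        call = {
            "write_relationship_type": write_relationship_type,
            "write_property": write_property,
            **config,
        }
        self.calls.append(call)
        return call


def write_similar_pairs(endpoints: Any, G: Any) -> Any:
    """Call write() with keyword names taken from the documented signature."""
    return endpoints.write(
        G,
        write_relationship_type="SIMILAR",  # example relationship type
        write_property="score",  # example relationship property
        top_k=5,
        similarity_cutoff=0.5,
        similarity_metric="OVERLAP",
    )


endpoints = RecordingEndpoints()
result = write_similar_pairs(endpoints, G=object())
```

Passing everything after `G` by keyword keeps the call readable, since the signature has many parameters with defaults.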

class graphdatascience.procedure_surface.api.similarity.NodeSimilarityFilteredEndpoints¶

abstract estimate(G: GraphV2 | dict[str, Any], source_node_filter: str | list[int], target_node_filter: str | list[int], top_k: int = 10, bottom_k: int = 10, top_n: int = 0, bottom_n: int = 0, similarity_cutoff: float = 1e-42, degree_cutoff: int = 1, upper_degree_cutoff: int = 2147483647, similarity_metric: str = 'JACCARD', use_components: bool | str = False, relationship_weight_property: str | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, username: str | None = None, concurrency: int | None = None) → EstimationResult¶

Estimates the memory requirements for running the Filtered Node Similarity algorithm.

Parameters:

G (GraphV2 | dict[str, Any]) – Graph object to use or a dictionary representing the graph dimensions.
source_node_filter (str | list[int]) – A Cypher expression or list of node IDs to filter which nodes can be sources.
target_node_filter (str | list[int]) – A Cypher expression or list of node IDs to filter which nodes can be targets.
top_k (int) – Number of most similar nodes to return for each node.
bottom_k (int, default=10) – The maximum number of neighbors with the lowest similarity scores to compute per node.
top_n (int, default=0) – The maximum number of neighbors to select globally based on similarity scores.
bottom_n (int, default=0) – The maximum number of neighbors to select globally based on lowest similarity scores.
similarity_cutoff (float, default=1e-42) – Lower bound on the similarity score; pairs scoring below it are excluded from the results.
degree_cutoff (int, default=1) – The minimum degree a node must have to be considered.
upper_degree_cutoff (int, default=2147483647) – The maximum degree a node can have to be considered.
similarity_metric (str, default="JACCARD") – The similarity metric to use for computation. One of JACCARD, OVERLAP, or COSINE.
use_components (bool | str, default=False) – Whether to compute similarity within connected components. If a string is given, the node property of that name stored in the graph is used to identify the components.
relationship_weight_property (str | None) – Name of the property to be used as weights.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.

Returns:

Object containing the estimated memory requirements.

Return type:

EstimationResult

abstract mutate(G: GraphV2, mutate_relationship_type: str, mutate_property: str, source_node_filter: str | list[int], target_node_filter: str | list[int], top_k: int = 10, bottom_k: int = 10, top_n: int = 0, bottom_n: int = 0, similarity_cutoff: float = 1e-42, degree_cutoff: int = 1, upper_degree_cutoff: int = 2147483647, similarity_metric: str = 'JACCARD', use_components: bool | str = False, relationship_weight_property: str | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None) → NodeSimilarityMutateResult¶

Runs the Filtered Node Similarity algorithm and stores the results as new relationships in the graph catalog.

Parameters:

G (GraphV2) – Graph object to use.
mutate_relationship_type (str) – Name of the relationship type to store the results in.
mutate_property (str) – Name of the relationship property to store the similarity score in.
source_node_filter (str | list[int]) – A Cypher expression or list of node IDs to filter which nodes can be sources.
target_node_filter (str | list[int]) – A Cypher expression or list of node IDs to filter which nodes can be targets.
top_k (int) – Number of most similar nodes to return for each node.
bottom_k (int, default=10) – The maximum number of neighbors with the lowest similarity scores to compute per node.
top_n (int, default=0) – The maximum number of neighbors to select globally based on similarity scores.
bottom_n (int, default=0) – The maximum number of neighbors to select globally based on lowest similarity scores.
similarity_cutoff (float, default=1e-42) – Lower bound on the similarity score; pairs scoring below it are excluded from the results.
degree_cutoff (int, default=1) – The minimum degree a node must have to be considered.
upper_degree_cutoff (int, default=2147483647) – The maximum degree a node can have to be considered.
similarity_metric (str, default="JACCARD") – The similarity metric to use for computation. One of JACCARD, OVERLAP, or COSINE.
use_components (bool | str, default=False) – Whether to compute similarity within connected components. If a string is given, the node property of that name stored in the graph is used to identify the components.
relationship_weight_property (str | None) – Name of the property to be used as weights.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

Object containing metadata from the execution.

Return type:

NodeSimilarityMutateResult

abstract stats(G: GraphV2, source_node_filter: str | list[int], target_node_filter: str | list[int], top_k: int = 10, bottom_k: int = 10, top_n: int = 0, bottom_n: int = 0, similarity_cutoff: float = 1e-42, degree_cutoff: int = 1, upper_degree_cutoff: int = 2147483647, similarity_metric: str = 'JACCARD', use_components: bool | str = False, relationship_weight_property: str | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None) → NodeSimilarityStatsResult¶

Runs the Filtered Node Similarity algorithm and returns execution statistics.

Parameters:

G (GraphV2) – Graph object to use.
source_node_filter (str | list[int]) – A Cypher expression or list of node IDs to filter which nodes can be sources.
target_node_filter (str | list[int]) – A Cypher expression or list of node IDs to filter which nodes can be targets.
top_k (int) – Number of most similar nodes to return for each node.
bottom_k (int, default=10) – The maximum number of neighbors with the lowest similarity scores to compute per node.
top_n (int, default=0) – The maximum number of neighbors to select globally based on similarity scores.
bottom_n (int, default=0) – The maximum number of neighbors to select globally based on lowest similarity scores.
similarity_cutoff (float, default=1e-42) – Lower bound on the similarity score; pairs scoring below it are excluded from the results.
degree_cutoff (int, default=1) – The minimum degree a node must have to be considered.
upper_degree_cutoff (int, default=2147483647) – The maximum degree a node can have to be considered.
similarity_metric (str, default="JACCARD") – The similarity metric to use for computation. One of JACCARD, OVERLAP, or COSINE.
use_components (bool | str, default=False) – Whether to compute similarity within connected components. If a string is given, the node property of that name stored in the graph is used to identify the components.
relationship_weight_property (str | None) – Name of the property to be used as weights.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

Object containing execution statistics and algorithm-specific results.

Return type:

NodeSimilarityStatsResult
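
The difference between the per-node `top_k` and the global `top_n` can be sketched as a second selection step over rows that have already been limited per node. The helper `apply_top_n` is hypothetical, the rows are made-up example values, and the sketch assumes that the default `top_n=0` means no global limit.

```python
def apply_top_n(rows, top_n=0):
    """Keep the top_n highest-scoring rows overall; 0 is assumed to mean no limit."""
    ranked = sorted(rows, key=lambda row: (-row[2], row[0], row[1]))
    return ranked if top_n == 0 else ranked[:top_n]


# Made-up (node1, node2, similarity) rows, as if already limited per node by top_k.
rows = [(0, 1, 0.5), (1, 0, 0.5), (1, 2, 0.9), (2, 1, 0.9), (2, 3, 0.1)]

best_two = apply_top_n(rows, top_n=2)
unlimited = apply_top_n(rows)
```

In this model `top_k` bounds how many rows each node contributes, while `top_n` bounds the size of the whole result.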

abstract stream(G: GraphV2, source_node_filter: str | list[int], target_node_filter: str | list[int], top_k: int = 10, bottom_k: int = 10, top_n: int = 0, bottom_n: int = 0, similarity_cutoff: float = 1e-42, degree_cutoff: int = 1, upper_degree_cutoff: int = 2147483647, similarity_metric: str = 'JACCARD', use_components: bool | str = False, relationship_weight_property: str | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None) → DataFrame¶

Runs the Filtered Node Similarity algorithm and returns the result as a DataFrame.

Parameters:

G (GraphV2) – Graph object to use.
source_node_filter (str | list[int]) – A Cypher expression or list of node IDs to filter which nodes can be sources.
target_node_filter (str | list[int]) – A Cypher expression or list of node IDs to filter which nodes can be targets.
top_k (int) – Number of most similar nodes to return for each node.
bottom_k (int, default=10) – The maximum number of neighbors with the lowest similarity scores to compute per node.
top_n (int, default=0) – The maximum number of neighbors to select globally based on similarity scores.
bottom_n (int, default=0) – The maximum number of neighbors to select globally based on lowest similarity scores.
similarity_cutoff (float, default=1e-42) – Lower bound on the similarity score; pairs scoring below it are excluded from the results.
degree_cutoff (int, default=1) – The minimum degree a node must have to be considered.
upper_degree_cutoff (int, default=2147483647) – The maximum degree a node can have to be considered.
similarity_metric (str, default="JACCARD") – The similarity metric to use for computation. One of JACCARD, OVERLAP, or COSINE.
use_components (bool | str, default=False) – Whether to compute similarity within connected components. If a string is given, the node property of that name stored in the graph is used to identify the components.
relationship_weight_property (str | None) – Name of the property to be used as weights.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.

Returns:

The similarity results as a DataFrame with columns ‘node1’, ‘node2’, and ‘similarity’.

Return type:

DataFrame
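
The effect of `source_node_filter` and `target_node_filter` can be modelled in plain Python when both are given as lists of node IDs: only pairs whose first node is an allowed source and whose second node is an allowed target are compared. The sketch below is an illustration under that assumption, using Jaccard similarity and a hypothetical helper `filtered_pairs`. It does not cover the string form of the filters.

```python
def filtered_pairs(neighbors, source_node_filter, target_node_filter):
    """Return (node1, node2, similarity) rows restricted to allowed sources and targets."""
    sources = set(source_node_filter) & set(neighbors)
    targets = set(target_node_filter) & set(neighbors)
    rows = []
    for n1 in sorted(sources):
        for n2 in sorted(targets):
            if n1 == n2:
                continue  # a node is not compared with itself
            union = neighbors[n1] | neighbors[n2]
            score = len(neighbors[n1] & neighbors[n2]) / len(union) if union else 0.0
            if score > 0:
                rows.append((n1, n2, score))
    return rows


# Hypothetical adjacency: node id -> set of neighbor ids.
neighbors = {0: {10, 11}, 1: {10, 11, 12}, 2: {11, 12}}
rows = filtered_pairs(neighbors, source_node_filter=[0], target_node_filter=[1, 2])
```

Because node 0 is the only allowed source, every row starts at node 0, even though nodes 1 and 2 are also similar to each other.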

abstract write(G: GraphV2, write_relationship_type: str, write_property: str, source_node_filter: str | list[int], target_node_filter: str | list[int], top_k: int = 10, bottom_k: int = 10, top_n: int = 0, bottom_n: int = 0, similarity_cutoff: float = 1e-42, degree_cutoff: int = 1, upper_degree_cutoff: int = 2147483647, similarity_metric: str = 'JACCARD', use_components: bool | str = False, relationship_weight_property: str | None = None, relationship_types: list[str] = ['*'], node_labels: list[str] = ['*'], sudo: bool = False, log_progress: bool = True, username: str | None = None, concurrency: int | None = None, job_id: str | None = None, write_concurrency: int | None = None) → NodeSimilarityWriteResult¶

Runs the Filtered Node Similarity algorithm and writes the results back to the database.

Parameters:

G (GraphV2) – Graph object to use.
write_relationship_type (str) – Name of the relationship type to store the results in.
write_property (str) – Name of the relationship property to store the similarity score in.
source_node_filter (str | list[int]) – A Cypher expression or list of node IDs to filter which nodes can be sources.
target_node_filter (str | list[int]) – A Cypher expression or list of node IDs to filter which nodes can be targets.
top_k (int) – Number of most similar nodes to return for each node.
bottom_k (int, default=10) – The maximum number of neighbors with the lowest similarity scores to compute per node.
top_n (int, default=0) – The maximum number of neighbors to select globally based on similarity scores.
bottom_n (int, default=0) – The maximum number of neighbors to select globally based on lowest similarity scores.
similarity_cutoff (float, default=1e-42) – Lower bound on the similarity score; pairs scoring below it are excluded from the results.
degree_cutoff (int, default=1) – The minimum degree a node must have to be considered.
upper_degree_cutoff (int, default=2147483647) – The maximum degree a node can have to be considered.
similarity_metric (str, default="JACCARD") – The similarity metric to use for computation. One of JACCARD, OVERLAP, or COSINE.
use_components (bool | str, default=False) – Whether to compute similarity within connected components. If a string is given, the node property of that name stored in the graph is used to identify the components.
relationship_weight_property (str | None) – Name of the property to be used as weights.
relationship_types (list[str]) – Filter the graph using the given relationship types. Relationships with any of the given types will be included.
node_labels (list[str]) – Filter the graph using the given node labels. Nodes with any of the given labels will be included.
sudo (bool) – Disable the memory guard.
log_progress (bool) – Display progress logging.
username (str | None) – As an administrator, impersonate a different user for accessing their graphs.
concurrency (int | None) – Number of concurrent threads to use.
job_id (str | None) – Identifier for the computation.
write_concurrency (int | None) – Number of concurrent threads to use for writing.

Returns:

Object containing metadata from the execution.

Return type:

NodeSimilarityWriteResult

pydantic model graphdatascience.procedure_surface.api.similarity.NodeSimilarityMutateResult¶

field compute_millis: int¶

field configuration: dict[str, Any]¶

field mutate_millis: int¶

field nodes_compared: int¶

field post_processing_millis: int¶

field pre_processing_millis: int¶

field relationships_written: int¶

field similarity_distribution: dict[str, Any]¶

pydantic model graphdatascience.procedure_surface.api.similarity.NodeSimilarityStatsResult¶

field compute_millis: int¶

field configuration: dict[str, Any]¶

field nodes_compared: int¶

field post_processing_millis: int¶

field pre_processing_millis: int¶

field similarity_distribution: dict[str, Any]¶

field similarity_pairs: int¶

pydantic model graphdatascience.procedure_surface.api.similarity.NodeSimilarityWriteResult¶

field compute_millis: int¶

field configuration: dict[str, Any]¶

field nodes_compared: int¶

field post_processing_millis: int¶

field pre_processing_millis: int¶

field relationships_written: int¶

field similarity_distribution: dict[str, Any]¶

field write_millis: int¶
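
The timing fields on the result models lend themselves to simple post-processing. As a minimal sketch, the dataclass below mirrors only the four `*_millis` fields listed for `NodeSimilarityWriteResult` and sums them. `WriteTimings` and `total_millis` are hypothetical names, the numbers are placeholder values rather than measurements, and whether the phases are strictly sequential is an assumption.

```python
from dataclasses import dataclass


@dataclass
class WriteTimings:
    """Hypothetical stand-in for the timing fields of NodeSimilarityWriteResult."""

    pre_processing_millis: int
    compute_millis: int
    post_processing_millis: int
    write_millis: int


def total_millis(timings):
    """Sum the per-phase timings into one rough overall figure."""
    return (
        timings.pre_processing_millis
        + timings.compute_millis
        + timings.post_processing_millis
        + timings.write_millis
    )


# Placeholder values chosen only to make the arithmetic easy to follow.
timings = WriteTimings(
    pre_processing_millis=12,
    compute_millis=340,
    post_processing_millis=5,
    write_millis=48,
)
total = total_millis(timings)
```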