Catalog Arrow Endpoints

class graphdatascience.procedure_surface.arrow.catalog.catalog_arrow_endpoints.CatalogArrowEndpoints
__new__(**kwargs)
construct(graph_name: str, nodes: DataFrame | list[DataFrame], relationships: DataFrame | list[DataFrame] | None = None, concurrency: int | None = None, undirected_relationship_types: list[str] | None = None) GraphV2

Construct a graph from a list of node and relationship dataframes.

Parameters:
  • graph_name (str) – Name of the graph to construct

  • nodes (DataFrame | list[DataFrame]) –

    Node dataframes. A dataframe should follow the schema:

    • nodeId to uniquely identify the node across all dataframes

    • labels to specify the labels of the node as a list of strings (optional)

    • other columns are treated as node properties

  • relationships (DataFrame | list[DataFrame] | None) –

    Relationship dataframes. A dataframe should follow the schema:

    • sourceNodeId to identify the start node of the relationship

    • targetNodeId to identify the end node of the relationship

    • relationshipType to specify the type of the relationship (optional)

    • other columns are treated as relationship properties

  • concurrency (int | None) – Number of concurrent threads to use.

  • undirected_relationship_types (list[str] | None) – List of relationship types to treat as undirected.

Returns:

Constructed graph object.

Return type:

GraphV2
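
The column conventions above can be checked before calling construct. The following is a minimal sketch, not part of the library: node_property_columns and relationship_property_columns are hypothetical helpers that take a list of column names (for example list(df.columns)) and return the columns that would be treated as properties. It assumes nodeId, sourceNodeId and targetNodeId are required, since only labels and relationshipType are marked optional.

```python
# Hypothetical helpers (not part of graphdatascience) that mirror the
# dataframe schema described for construct().

NODE_RESERVED = ("nodeId", "labels")
REL_RESERVED = ("sourceNodeId", "targetNodeId", "relationshipType")


def node_property_columns(columns):
    """Return the columns of a node dataframe that count as properties."""
    if "nodeId" not in columns:
        raise ValueError("node dataframe needs a 'nodeId' column")
    return [c for c in columns if c not in NODE_RESERVED]


def relationship_property_columns(columns):
    """Return the columns of a relationship dataframe that count as properties."""
    missing = [c for c in ("sourceNodeId", "targetNodeId") if c not in columns]
    if missing:
        raise ValueError(f"relationship dataframe is missing {missing}")
    return [c for c in columns if c not in REL_RESERVED]


node_props = node_property_columns(["nodeId", "labels", "age"])
rel_props = relationship_property_columns(
    ["sourceNodeId", "targetNodeId", "relationshipType", "weight"]
)
print(node_props, rel_props)
```

With dataframes nodes_df and rels_df that pass these checks, the call itself would look like catalog.construct("my-graph", nodes_df, rels_df), where catalog is assumed to be a CatalogArrowEndpoints instance.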

property datasets: DatasetEndpoints

Endpoints for loading predefined datasets into the graph catalog.

drop(G: GraphV2 | str, fail_if_missing: bool = True) GraphInfo | None

Drop a graph from the graph catalog.

Parameters:
  • G (GraphV2 | str) – Graph to drop, given by name or as a graph object.

  • fail_if_missing (bool) – Whether to fail if the graph is missing

Returns:

Metadata object for the dropped graph, containing information like node count, or None if no graph was dropped.

Return type:

GraphInfo | None
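
A common pattern is cleanup that must not fail when the graph is already gone. The sketch below assumes that catalog is a CatalogArrowEndpoints instance and that drop returns None when the graph is missing and fail_if_missing is False, as the signature suggests; drop_if_present is a hypothetical helper name.

```python
def drop_if_present(catalog, graph_name):
    """Drop graph_name and report whether a graph was actually removed.

    Assumption: drop() returns None instead of raising when the graph
    does not exist and fail_if_missing=False.
    """
    info = catalog.drop(graph_name, fail_if_missing=False)
    return info is not None
```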

filter(G: GraphV2, graph_name: str, node_filter: str, relationship_filter: str, concurrency: int | None = None, job_id: str | None = None) GraphWithFilterResult

Create a subgraph of a graph based on a filter expression.

Parameters:
  • G (GraphV2) – Graph object to use

  • graph_name (str) – Name of subgraph to create

  • node_filter (str) – Filter expression for nodes

  • relationship_filter (str) – Filter expression for relationships

  • concurrency (int | None) – Number of concurrent threads to use.

  • job_id (str | None) – Identifier for the computation.

Returns:

Tuple of the filtered graph object and a result object with information such as graph name, node count, and relationship count.

Return type:

GraphWithFilterResult
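
As an illustration, the sketch below keeps only nodes with a given label. It rests on several assumptions: catalog is a CatalogArrowEndpoints instance, filter expressions are Cypher-style predicates over the variable n (as in the GDS graph filter procedure), "*" keeps every relationship between the remaining nodes, and the returned GraphWithFilterResult can be unpacked like a tuple. filter_by_label is a hypothetical helper name.

```python
def filter_by_label(catalog, G, subgraph_name, label):
    """Create a subgraph of G that keeps only nodes carrying `label`.

    Assumptions: node filters are predicates over `n`, "*" matches all
    relationships, and the result unpacks into (graph, result).
    """
    node_filter = f"n:{label}"
    subgraph, result = catalog.filter(G, subgraph_name, node_filter, "*")
    return subgraph, result
```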

generate(graph_name: str, node_count: int, average_degree: float, *, relationship_distribution: str | None = None, relationship_seed: int | None = None, relationship_property: RelationshipPropertySpec | None = None, orientation: str | None = None, allow_self_loops: bool | None = None, read_concurrency: int | None = None, job_id: str | None = None, sudo: bool = False, log_progress: bool = True, username: str | None = None) GraphWithGenerationStats

Generate a random graph and store it in the graph catalog.

Parameters:
  • graph_name (str) – Name of the generated graph.

  • node_count (int) – The number of nodes in the generated graph

  • average_degree (float) – The average out-degree of the generated nodes

  • relationship_distribution (str | None, default=None) – Determines the relationship distribution strategy.

  • relationship_seed (int | None, default=None) – Seed value for generating deterministic relationships.

  • relationship_property (RelationshipPropertySpec | None, default=None) – Configure generated relationship properties.

  • orientation (str | None, default=None) – Specifies the orientation of the generated relationships.

  • allow_self_loops (bool | None, default=None) – Whether nodes in the graph can have relationships where start and end nodes are the same.

  • read_concurrency (int | None, default=None) – Number of concurrent threads/processes to use during graph generation.

  • job_id (str | None) – Identifier for the computation.

  • sudo (bool) – Disable the memory guard.

  • log_progress (bool) – Display progress logging.

  • username (str | None) – As an administrator, impersonate a different user for accessing their graphs.

Returns:

Tuple of the generated graph object and a result object containing statistics about the generation.

Return type:

GraphWithGenerationStats
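
Because average_degree is the average out-degree, the generated graph should contain roughly node_count × average_degree relationships. The sketch below shows that arithmetic and a reproducible call with a fixed relationship_seed. It assumes catalog is a CatalogArrowEndpoints instance and that the result unpacks into a graph and a stats object; both helper names are hypothetical.

```python
def expected_relationship_count(node_count, average_degree):
    """Rough number of relationships implied by the average out-degree."""
    return round(node_count * average_degree)


def generate_reproducible(catalog, graph_name, node_count, average_degree, seed=42):
    """Generate a random graph whose relationships depend only on `seed`.

    Options after the first three arguments are keyword-only in the
    documented signature, so relationship_seed is passed by name.
    """
    G, stats = catalog.generate(
        graph_name, node_count, average_degree, relationship_seed=seed
    )
    return G, stats
```

For example, expected_relationship_count(100, 2.5) gives 250; the actual count may differ depending on the relationship distribution.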

get(graph_name: str) GraphV2

Retrieve a handle to a graph from the graph catalog.

Parameters:

graph_name (str) – The name of the graph.

Returns:

A handle to the graph.

Return type:

GraphV2

list(G: GraphV2 | str | None = None) list[GraphInfoWithDegrees]

List graphs in the graph catalog.

Parameters:

G (GraphV2 | str | None) – GraphV2 object or name to filter results. If None, list all graphs.

Returns:

List of graph metadata objects containing information like node count.

Return type:

list[GraphInfoWithDegrees]
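
The list and get methods can be combined to fetch a handle only when the graph exists. This is a sketch under the assumption that list returns an empty list when no graph matches the given name and that catalog is a CatalogArrowEndpoints instance; get_if_listed is a hypothetical helper name.

```python
def get_if_listed(catalog, graph_name):
    """Return a graph handle, or None if the catalog does not list it.

    Assumption: list(graph_name) yields an empty list for unknown names.
    """
    if not catalog.list(graph_name):
        return None
    return catalog.get(graph_name)
```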

property node_labels: NodeLabelEndpoints

Endpoints for node label operations.

property node_properties: NodePropertiesEndpoints

Endpoints for node property operations.

project(graph_name: str, query: str, *, job_id: str | None = None, concurrency: int | None = None, undirected_relationship_types: list[str] | None = None, inverse_indexed_relationship_types: list[str] | None = None, batch_size: int | None = None, logging: bool = True) GraphWithProjectResult

Projects a graph from the Neo4j database into the GDS graph catalog.

Parameters:
  • graph_name (str) – Name of the graph to be created in the catalog.

  • query (str) – Cypher query to select nodes and relationships for the graph projection. Must contain gds.graph.project.remote. Example: MATCH (n)-->(m) RETURN gds.graph.project.remote(n, m)

  • job_id (str | None) – Identifier for the computation.

  • concurrency (int | None) – Number of concurrent threads to use.

  • undirected_relationship_types (list[str] | None) – List of relationship types to treat as undirected.

  • inverse_indexed_relationship_types (list[str] | None) – List of relationship types to index in both directions.

  • batch_size (int | None, default=None) – Number of rows to process in each batch when projecting the graph.

  • logging (bool, default=True) – Whether to log progress during graph projection.

Returns:

Tuple of the projected graph object and a result object containing information about the projection.

Return type:

GraphWithProjectResult
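
Since the query must contain gds.graph.project.remote, a small guard can catch a wrong query before it is sent. The sketch below assumes catalog is a CatalogArrowEndpoints instance; project_checked is a hypothetical wrapper, not part of the library.

```python
REMOTE_PROJECT_FUNCTION = "gds.graph.project.remote"


def project_checked(catalog, graph_name, query, **options):
    """Call project() only if the query uses the remote projection function.

    Extra keyword options (for example concurrency or batch_size) are
    forwarded unchanged to project().
    """
    if REMOTE_PROJECT_FUNCTION not in query:
        raise ValueError(f"query must contain {REMOTE_PROJECT_FUNCTION}")
    return catalog.project(graph_name, query, **options)
```

A call could then look like project_checked(catalog, "my-graph", "MATCH (n)-->(m) RETURN gds.graph.project.remote(n, m)", concurrency=4).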

property relationships: RelationshipsEndpoints

Endpoints for relationship operations.

property sample: GraphSamplingEndpoints

Endpoints for graph sampling.

class graphdatascience.procedure_surface.arrow.catalog.CatalogArrowEndpoints

Same class as graphdatascience.procedure_surface.arrow.catalog.catalog_arrow_endpoints.CatalogArrowEndpoints, exposed at the package level. Its members are identical to those documented above.

class graphdatascience.procedure_surface.arrow.catalog.GraphWithProjectResult

Result object for graph projection jobs, containing the projected graph and the projection result. Can be used as a context manager to ensure the projected graph is dropped after use.

static __new__(_cls, graph: GraphV2, result: ProjectionResult)

Create new instance of GraphWithProjectResult(graph, result)

Parameters:
  • graph (GraphV2)

  • result (ProjectionResult)

graph: GraphV2

Alias for field number 0

result: ProjectionResult

Alias for field number 1
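
The context-manager behaviour described above can be used for short-lived projections. The sketch below assumes that catalog is a CatalogArrowEndpoints instance and that entering the context yields the GraphWithProjectResult itself, so its graph and result fields are available inside the block; node_count_of_projection is a hypothetical helper name.

```python
def node_count_of_projection(catalog, graph_name, query):
    """Project a graph, read its node count, and let the context drop it.

    Assumption: `with ... as projected` binds the GraphWithProjectResult,
    whose fields are `graph` and `result`.
    """
    with catalog.project(graph_name, query) as projected:
        return projected.result.node_count
```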

pydantic model graphdatascience.procedure_surface.arrow.catalog.ProjectionResult

Result object for graph projection jobs.

field configuration: dict[str, Any]
field graph_name: str
field node_count: int
field project_millis: int
field query: str
field relationship_count: int