Common datasets
For convenience, the library ships with a few common datasets. Each can be imported into GDS with a single call that returns a graph object representing the dataset.
Each common dataset comes with a loader method that takes two optional parameters: `graph_name`, which assigns a name to the graph, and `undirected`, a boolean that loads the graph as undirected when set to true.
A graph loaded with `undirected = True` has twice as many relationships as its directed version.
The default value of `undirected` varies by dataset.
For example:
G = gds.graph.load_cora()
assert G.node_count() == 2708
assert G.node_labels() == ["Paper"]
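To make the effect of `undirected` concrete, the following sketch loads Cora twice under different graph names and returns both relationship counts. It assumes `gds` is an already connected `GraphDataScience` client. The helper name `compare_orientations` and the graph names `cora_directed` and `cora_undirected` are illustrative choices, not part of the library.

```python
def compare_orientations(gds):
    """Return (directed_count, undirected_count) for the Cora dataset.

    `gds` is assumed to be a connected graphdatascience.GraphDataScience client.
    """
    graphs = []
    try:
        # Load the same dataset under two names, once per orientation.
        for name, flag in (("cora_directed", False), ("cora_undirected", True)):
            graphs.append(gds.graph.load_cora(graph_name=name, undirected=flag))
        return tuple(G.relationship_count() for G in graphs)
    finally:
        # Drop the projected graphs so they do not linger in the graph catalog.
        for G in graphs:
            G.drop()

# Hypothetical usage:
# directed, undirected = compare_orientations(gds)
# According to the description above, undirected should equal 2 * directed.
```

Using distinct graph names lets both projections coexist in the graph catalog while they are compared.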
1. Datasets
1.1. Cora
A well-known citation network introduced in "Automating the Construction of Internet Portals with Machine Learning" and used in many node classification and link prediction publications.
By default, Cora is loaded with `undirected = False`.
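The basic statistics of a loaded dataset can be read from the graph object itself. The sketch below collects a few of them into a dictionary. The helper name `summarize` and the dictionary keys are illustrative choices, and `G` is assumed to be a graph object returned by one of the loader methods.

```python
def summarize(G):
    """Collect basic statistics of a projected graph as name/value pairs."""
    return {
        "name": G.name(),
        "node_count": G.node_count(),
        "relationship_count": G.relationship_count(),
        "node_labels": G.node_labels(),
        "relationship_types": G.relationship_types(),
    }

# Hypothetical usage, assuming `gds` is a connected GraphDataScience client:
# G = gds.graph.load_cora()
# for key, value in summarize(G).items():
#     print(f"{key}: {value}")
```

The same helper should work for any of the bundled datasets, since it relies only on the graph object.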
1.2. Karate club
A well-known social network introduced by Zachary.
By default, Karate club is loaded with `undirected = False`.
1.3. IMDB
A heterogeneous graph used to benchmark node classification and link prediction models such as Heterogeneous Graph Attention Network, MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding, and Graph Transformer Networks. The graph contains Actors, Directors, Movies (and UnclassifiedMovies) as nodes, with relationships between actors and the movies they acted in, and between directors and the movies they directed.
By default, the IMDB dataset is loaded with `undirected = True`. If loaded as directed, it will have half as many relationships.