Semantically Linking Cancer Models

This GraphGist provides a set of Cypher queries that use the property graph data model to link cancer model descriptions with metadata.

By a cancer 'model', we mean a mathematical or computational model that forms the basis for a simulation. Here the models describe cancer biology, so the simulations predict cancer function and progression. A 'model description' is an abstraction of a model that includes metadata as well as a reference to a model implementation (perhaps code or an executable). A model typically has an initial state, and a simulation produces one or more successive changes to that state. For a cancer model, the initial state might consist of some starting parameters and a 3D model of a tumor; when a simulation is run, the parameters are updated and the 3D model changes to reflect a prediction of the tumor's lifecycle over time. For an introduction to cancer biology, see the free eBook at https://bookboon.com/en/introduction-to-cancer-biology-ebook

We believe that representing a collection of cancer models in a property graph moves us towards queryable systems in which we can explore combinations of cancer models via semantic links to domain data. The idea behind using a property graph data model is to link different cancer models through domain-specific metadata, such as units, computational types, biological terms, and taxonomic categorizations.

With this database, we can ask questions such as:

  • What cancer models fall into the same categories?

  • What cancer models have common input and output parameters?

  • What cancer models are directly or indirectly linked together?

Data model

Entities

  • MODEL - Represents an abstract model description

  • PARAMETER - Represents a model interface parameter

  • CATEGORY - Metadata representing a categorization of a cancer model

  • TERM - Metadata representing a controlled vocabulary term

  • UNIT - Metadata representing a unit of measurement

  • PERSON - Represents a person

  • ORGANISATION - Represents an organisation

Relationships

  • HAS_INPUT - Connects a model with its input parameters

  • HAS_OUTPUT - Connects a model with its output parameters

  • HAS_METADATA - Connects models and parameters to metadata

  • HAS_CATEGORY - Connects categories with other categories (i.e. subcategories)

  • CONTAINS - Connects a model with other sub-models (to show model composition)

  • SYNONYM_OF_TERM - Connects terms with synonymous terms

  • CREATED_BY - Connects a model with a creator (or author) of the model

  • CONTRIBUTED_BY - Connects a model with a publisher of the model description (i.e. the database record)

Populate the graph
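
As a rough illustration of how a graph following the data model above could be populated, here is a minimal sketch. It is not the gist's actual dataset: every name, title and the example URN are hypothetical, and the relationship directions are an assumption (the queries in this gist match relationships without a direction). The labels and property keys (`title`, `URN`, `name`, `term`) and the `MEMBER_OF` relationship follow the queries used further down.

```cypher
// Hypothetical sample data: names, titles and the URN are illustrative only.
CREATE (org:Organisation {name: 'Example Lab'})
CREATE (author:Person {name: 'A. Researcher'})-[:MEMBER_OF]->(org)

// Category hierarchy: 'Subcellular scale' is a subcategory of 'Single-scale'.
CREATE (single:Category {name: 'Single-scale'})
CREATE (subcell:Category {name: 'Subcellular scale'})-[:HAS_CATEGORY]->(single)

// Domain metadata that parameters can share.
CREATE (egf:Term {term: 'EGF concentration'})
CREATE (nM:Unit {name: 'nanomolar'})

// A model description with one input and one output parameter.
CREATE (m:Model {title: 'Example pathway model', URN: 'urn:example:model:1'})
CREATE (pin:Parameter {name: 'egf_in'})
CREATE (pout:Parameter {name: 'cell_cycle_time'})
CREATE (m)-[:HAS_INPUT]->(pin)
CREATE (m)-[:HAS_OUTPUT]->(pout)
CREATE (pin)-[:HAS_METADATA]->(egf)
CREATE (pin)-[:HAS_METADATA]->(nM)
CREATE (m)-[:HAS_METADATA]->(subcell)
CREATE (m)-[:CREATED_BY]->(author)
CREATE (m)-[:CONTRIBUTED_BY]->(org)
```

Keeping terms, units and categories as separate nodes, rather than as properties on each parameter, is what lets two models be linked through the metadata they share.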

Visualize the graph

Below is a visualization of the whole property graph using a test dataset of cancer model descriptions. For more about the cancer models used, check out our paper 'Semantically Linking In Silico Cancer Models' https://www.la-press.com/semantically-linking-in-silico-cancer-models-article-a4552

The full graph is not particularly useful on its own. It does, however, provide the basis for more interesting queries, such as those below.


Cancer model descriptions

This is how a model is visualized as a property graph. The example is an EGFR-ERK molecular pathway model by Z. Wang et al., described in 'Simulating non-small cell lung cancer with a multiscale agent-based model', Theor Biol Med Model, 4(1):50+, Dec. 2007. https://dx.doi.org/10.1186/1742-4682-4-50

EGFR-ERK Pathway as property graph

A model has metadata such as a creator, contributor, publisher, model categorizations (e.g. Lung cancer, Homogenous tumor mass, Subcellular scale), and input and output parameters (e.g. EGF concentration, cell cycle time). Parameters can be linked to other domain metadata represented in domain graphs.
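
To inspect this structure for a single model, a query along the following lines could list everything directly attached to it. This is a sketch only: `$title` is a query parameter that the caller would need to supply, and the `coalesce` call assumes that neighboring nodes carry one of the `name`, `title` or `term` properties seen in the other queries of this gist.

```cypher
// $title is a caller-supplied parameter (hypothetical); it must match a Model's title exactly.
MATCH (m:Model {title: $title})-[r]-(x)
RETURN type(r) AS Relationship,
       labels(x) AS Labels,
       coalesce(x.name, x.title, x.term) AS Value
ORDER BY Relationship
```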

Domain metadata

Categories

  • Taxonomic categorizations of cancer models.

MATCH (n:Category)
RETURN collect(n.name) AS Categories
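
Building on the category list, the question of which cancer models fall into the same categories could be approached by grouping models per category. The following is a sketch that assumes models are attached to categories through `HAS_METADATA`, as in the queries later in this gist.

```cypher
// Group model titles by category and keep only categories shared by more than one model.
MATCH (m:Model)-[:HAS_METADATA]-(c:Category)
WITH c, collect(m.title) AS Models
WHERE size(Models) > 1
RETURN c.name AS Category, Models
ORDER BY size(Models) DESC
```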

Units

  • Units of measurement, including SI units and some derived units.

MATCH (n:Unit)
RETURN collect(n.name) AS Units

CLI datatypes

  • Command Line Interface datatypes, such as Strings, Filenames, and differently formatted numbers.

MATCH (n:Type)
RETURN collect(n.name) AS CLIdatatypes

People and organisations

  • External entities that own or have contributed to a model's development or description.

MATCH (n:Person)-[:MEMBER_OF]-(m:Organisation)
RETURN n.name AS Name, m.name AS Organisation

What cancer models are modeled using continuous mathematics?

MATCH (n:Model)
WHERE (n)-[:HAS_METADATA]-({ name:'Continuous'})
RETURN n.title AS Model

What cancer models have common and compatible input and output parameters?

MATCH (n:Model)-[:HAS_INPUT]-(p:Parameter)-[:HAS_METADATA]-(meta:Term)-[:HAS_METADATA]-(q:Parameter)-[:HAS_OUTPUT]-(m:Model)
WHERE n<>m
RETURN DISTINCT m.title AS ModelA, n.title AS ModelB, q.name AS OutputA, p.name AS InputB, meta.term

In the multiscale Alarcon 2003 model, what are the component models and corresponding scales?

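// Anchor on the Alarcon 2003 model and confirm it is categorized as multiscale
MATCH (n:Model {URN: 'urn:miriam:tumor:000013'})-[:HAS_METADATA]-(:Category {name: 'Multiscale'})
// Component models it contains
MATCH (n)-[:CONTAINS]-(m:Model)
// Scale of each component: a subcategory of 'Single-scale'
MATCH (m)-[:HAS_METADATA]-(scale:Category)-[:HAS_CATEGORY]-(:Category {name: 'Single-scale'})
RETURN m.title AS ComponentModel, scale.name AS Scale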
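
The remaining question from the introduction, which cancer models are directly or indirectly linked together, could be explored with a variable-length path. The sketch below is one possible reading of 'linked', not a definitive one: it treats two models as linked when they are joined by a chain of composition, parameter or metadata relationships, and the 4-hop limit is an arbitrary assumption chosen so that a shared term between two parameters (model, parameter, term, parameter, model) is still reachable.

```cypher
// Shortest number of hops between each pair of distinct models, up to 4 hops.
// Each pair appears twice (A-B and B-A) because the pattern is undirected.
MATCH path = (a:Model)-[:HAS_INPUT|HAS_OUTPUT|HAS_METADATA|CONTAINS*1..4]-(b:Model)
WHERE a <> b
RETURN a.title AS ModelA, b.title AS ModelB, min(length(path)) AS Hops
ORDER BY Hops
```

A value of 1 indicates direct composition, 2 indicates shared metadata such as a common category, and larger values indicate links through parameters and their terms or units.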