Session Track: Data Intelligence
Session Time:
Session description
Understanding gene-disease associations is essential for uncovering the molecular mechanisms underlying disease and for identifying new therapeutic targets. Traditional methods, such as genome-wide association studies (GWAS), have provided valuable insights but are limited to single-modality genomic data and often overlook complementary information from other biological and clinical domains. In this study, we use a multimodal knowledge graph (KG) to predict gene-disease associations for cardiovascular diseases and assess the impact of integrating cardiac magnetic resonance imaging (CMR) traits. We constructed the KG from UK Biobank data and 18 external biomedical databases, and incorporated the CMR traits as additional nodes. Embeddings were generated with a directed variational graph auto-encoder (DVGAE), and gene-disease associations were predicted with three machine learning models: support vector machine (SVM), random forest (RF), and artificial neural network (ANN). The top predicted genes were validated using pathway enrichment analysis, and PageRank was applied to assess the importance of the entities in the graph, especially the imaging data. This work shows how to integrate imaging data into a KG and how to assess their significance. It also demonstrates how to include features stored on nodes and on relationships in the embedding algorithm. The SVM model performed best, with AUCs of 0.80 for heart failure (HF), 0.78 for atrial fibrillation (AF), and 0.83 for myocardial infarction (MI). Notably, CMR nodes received high PageRank scores (average 51.1), indicating their strong influence in the graph. Including CMR data also improved predictive performance and increased the number of enriched pathways.
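To make the PageRank step more concrete, the following is a minimal sketch of power-iteration PageRank on a tiny directed graph. It is not the study's implementation: the node names (`gene:G1`, `cmr:trait_A`, `disease:D1`), the edges, and the damping factor of 0.85 are all hypothetical choices made for illustration. The scores here are normalized to sum to 1, so their scale is not comparable to the figures reported above.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a list of directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node passes an equal share of its rank along every outgoing edge.
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        converged = sum(abs(new_rank[x] - rank[x]) for x in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical toy graph: two genes, one imaging trait, one disease.
toy_edges = [
    ("gene:G1", "disease:D1"),
    ("gene:G1", "cmr:trait_A"),
    ("gene:G2", "cmr:trait_A"),
    ("cmr:trait_A", "disease:D1"),
    ("disease:D1", "cmr:trait_A"),
]

scores = pagerank(toy_edges)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```

In this toy graph the imaging-trait node ranks highest because it receives links from both genes and from the disease node. A real multimodal KG would typically be scored with a graph library or a graph database's built-in PageRank rather than a hand-written loop.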
Postdoctoral Researcher, MRC Laboratory of Medical Sciences, Imperial College London
Khaled Rjoob is a Postdoctoral Researcher in Data Science at MRC Laboratory of Medical Sciences, Imperial College London, Hammersmith Hospital Campus, London, UK. He has more than six years of experience applying machine learning and statistical methods in healthcare. His current work focuses on using AI and knowledge graphs on UK Biobank data to predict novel gene–disease associations and support drug repurposing. Previously, at UCL's Institute of Cardiovascular Science, Khaled collaborated on long-COVID research using data from the ZOE app and NSHD metabolomics. He has published peer-reviewed articles, presented at international conferences, and enjoys interdisciplinary collaboration and teaching data science.