Towards understanding the factors tuning specific biomolecular interactions
The interplay between specificity and affinity is a significant driver of biomolecular recognition. For many systems, the experimental determination of these factors has been challenging, which has aroused interest in the computational prediction of binding affinity and specificity. Numerous algorithms are poised to calculate the binding affinity of biomolecular complexes. On the other hand, there are only a handful of approaches tuned to predict the specificity of binding. To that end, we have developed novel and simple specificity metrics while working on two model systems. Our metrics are based on the statistical analysis of dynamic interaction profiles of intermolecular surfaces. In our first model system, we uncover that a specific salt-bridge network directs the ligand selection of the transmembrane receptor tyrosine kinase Axl. In our second system, we reveal that distinct hydrogen bonding profiles explain the sequence specificity of de novo DNA methyltransferases. We are currently exploring the generic applicability of our metrics to epigenetics-related systems.
Ezgi Karaca currently works as an Assistant Professor at Izmir Biomedicine and Genome Center (Turkey), where she is the Principal Investigator of Karaca Lab. Her research in computational structural biology focuses on integrative modeling, protein dynamics, and the physical principles of biomolecular interactions, by determining (and/or dissecting) the structures of biomolecular complexes as well as developing various computational tools, such as docking, homology modeling, and molecular dynamics. She was a Postdoctoral Fellow (2013-2016) at EMBL (European Molecular Biology Laboratory) and was selected for the Alexander von Humboldt Research Fellowship in 2014 for her research on the Molecular Basis of Ribosomal RNA Methylation. Dr. Ezgi Karaca was awarded the Young Scientist Award (BAGEP) by the Science Academy (Turkey) and an Installation Grant by EMBO (European Molecular Biology Organization) in 2020. She became a member of the Turkish Medical Informatics Association (TURKMIA) in 2018. Dr. Ezgi Karaca received her BS and MS in chemical engineering from Bogazici University (Turkey), and her PhD in chemistry from Utrecht University (Netherlands).
Presenting author: Rıza Özçelik
ChemBoost: A chemical language based approach for protein - ligand binding affinity prediction
Rıza Özçelik, Hakime Öztürk, Arzucan Özgür and Elif Ozkirimli
Identification of high affinity drug-target interactions is a major research question in drug discovery. Proteins are generally represented by their structures or sequences. However, structures are available only for a small subset of biomolecules and sequence similarity is not always correlated with functional similarity. We propose ChemBoost, a chemical language based approach for affinity prediction using SMILES syntax. We hypothesize that SMILES is a codified language and ligands are documents composed of chemical words. These documents can be used to learn chemical word vectors that represent words in similar contexts with similar vectors. In ChemBoost, the ligands are represented via chemical word embeddings, while the proteins are represented through sequence-based features and/or chemical words of their ligands. Our aim is to process the patterns in SMILES as a language to predict protein-ligand affinity, even when we cannot infer the function from the sequence. We used eXtreme Gradient Boosting to predict protein-ligand affinities in KIBA and BindingDB data sets. ChemBoost was able to predict drug-target binding affinity as well as or better than state-of-the-art machine learning systems. When powered with ligand-centric representations, ChemBoost was more robust to the changes in protein sequence similarity and successfully captured the interactions between a protein and a ligand, even if the protein has low sequence similarity to the known targets of the ligand.
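As a rough illustration of the "ligands as documents" idea, the sketch below splits a SMILES string into overlapping fixed-length substrings and counts them. The k-mer tokenization, the function names `chemical_words` and `bag_of_words`, and the choice of k=3 are assumptions made for this example; the abstract does not state how ChemBoost forms its chemical words.

```python
from collections import Counter


def chemical_words(smiles, k=3):
    """Split a SMILES string into overlapping k-character 'words'.

    Assumption for illustration: a chemical word is any k-mer of the
    SMILES string. Strings shorter than k are returned as a single word.
    """
    if not smiles:
        return []
    if len(smiles) < k:
        return [smiles]
    return [smiles[i:i + k] for i in range(len(smiles) - k + 1)]


def bag_of_words(smiles, k=3):
    """Count how often each chemical word occurs in one ligand 'document'."""
    return Counter(chemical_words(smiles, k))


# Aspirin, written as a SMILES string.
aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"
words = chemical_words(aspirin)
counts = bag_of_words(aspirin)
```

Word counts of this kind could then feed a word-embedding model or a gradient-boosted regressor; which representation works best is an empirical question that this sketch does not address.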
Presenting author: Altuğ Kamacıoğlu
Systematic analysis of phosphorylation structure
Altuğ Kamacıoğlu, Nurhan Ozlu and Nurcan Tuncbag
Phosphorylation is an essential post-translational modification for the regulation of almost all cellular processes. Widespread quantitative phosphoproteomics analyses have revealed many phosphorylation sites involved in diverse cellular mechanisms, their corresponding kinases, and quantitative changes in phosphorylation. Although the structure of a single protein and its phosphorylation sites has been studied, no systematic analysis of the structure of whole phosphoproteomes has been performed. In this study, we focused on the structural mechanism of phosphorylation: we determined the location of phospho-sites through their relative solvent accessibility and examined their characteristic features based on that location. We build on data from all phosphorylation regions in current databases and from a selected paper that filters false-positive phosphorylation sites via quality control. We find that a certain fraction of phosphorylation sites is located in the protein core, with extremely low solvent accessibility, and we observe that core phosphorylation sites are frequently found among the false-positive phosphorylation sites in databases. Core phosphorylation sites are significantly less functional and more rigid than other types of phosphorylation sites. Nevertheless, we found that some core phosphorylation sites are very dynamic and highly functional. Lastly, we performed the same analysis on the data of Karayel et al., which cover phosphorylation regulation in cell division, and almost all core phosphorylation sites regulated throughout cell division were detected as dynamic.
Presenting author: Tandac Guclu
Ligand switching mutations in PDZ domain explained by centrality of amino acids
Tandac Guclu, Canan Atilgan and Ali Rana Atilgan
Mutations occasionally affect protein structure and/or function. Among these changes, alterations in ligand specificity are important because they may have significant consequences, such as the emergence of antibiotic resistance or the disruption of cell signaling. Here we study the PDZ3 domain, which has an important role in mammalian neural cell signaling. PDZ domains construct the PSD-95 complex by binding the CRIPT (ligand I) and T-2F (ligand II) ligands. Previously, specific mutations of PDZ3 have been demonstrated to alter ligand preference: the wild-type (WT) protein has higher binding affinity for ligand I, and the G330T mutant binds to both ligands I and II, while the H372A mutant and the G330T-H372A double mutant (DM) tend to bind only to ligand II. To scrutinize the structural features that emerge due to the mutations, we conducted network analyses on snapshots from 400-ns-long molecular dynamics simulations. Then, we utilized betweenness centrality (BC) to find the nodes that act as hubs for information communication in biological function. ΔBC results show that the N-terminus has an impact on the formation of H372AL2. Furthermore, we employed the Girvan-Newman algorithm to investigate the modularity of the PDZ3 protein. The results indicate that the N- and C-termini of the structure are in the same community, while the N-terminus and the ligand tend to be located in the same community only in the favorable WT and single-mutation cases. We explain how the changes in residue centralities caused by perturbations introduced in the form of mutations lead to the ligand-switching behavior in the PDZ domain, and discuss why this behavior is governed by the N-terminal region.
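The betweenness centrality comparison described above can be sketched as follows. This is a minimal example, assuming an unweighted, undirected residue contact network and using Brandes' algorithm; the toy networks, the node labels, and the function names `betweenness_centrality` and `delta_bc` are hypothetical and are not taken from the study.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours. Returns the
    unnormalized betweenness of every node.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every undirected pair was visited from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


def delta_bc(mutant_adj, wild_type_adj):
    """Per-node change in betweenness (mutant minus wild type)."""
    bc_mut = betweenness_centrality(mutant_adj)
    bc_wt = betweenness_centrality(wild_type_adj)
    return {v: bc_mut[v] - bc_wt[v] for v in bc_wt if v in bc_mut}


# Toy networks: in the "mutant", a new contact bypasses residue B.
wild_type = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
mutant = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
change = delta_bc(mutant, wild_type)
```

In this toy case residue B loses its hub role once A and C contact each other directly, which shows up as a negative ΔBC for B.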
Presenting author: Serhan Yılmaz
Robust inference of kinase activity using functional networks
Serhan Yılmaz, Marzieh Ayati, Daniela Schlatzer, A. Ercument Cicek, Mark Chance and Mehmet Koyuturk
Mass spectrometry enables high-throughput screening of phospho-proteins across a broad range of biological contexts. When complemented by computational algorithms, phospho-proteomic data allow the inference of kinase activity, facilitating the identification of dysregulated kinases in various diseases including cancer, Alzheimer’s disease and Parkinson’s disease. To enhance the reliability of kinase activity inference, we present a network-based framework, RoKAI, that integrates various sources of functional information to capture coordinated changes in signaling. Through computational experiments, we show that phosphorylation of sites in the functional neighborhood of a kinase is significantly predictive of its activity. The incorporation of this knowledge in RoKAI consistently enhances the accuracy of kinase activity inference methods while making them more robust to missing annotations and quantifications. This enables the identification of understudied kinases and will likely lead to the development of novel kinase inhibitors for targeted therapy of many diseases. RoKAI is available as a web-based tool at http://rokai.io.
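For readers unfamiliar with kinase activity inference, the sketch below shows one generic baseline: scoring a kinase by the mean log fold change of its known substrate sites, scaled to a z-score. This is not RoKAI's network-based method; the function name `kinase_z_scores`, the site and kinase labels, and the input layout are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean, pstdev


def kinase_z_scores(site_log_fc, kinase_substrates):
    """Score each kinase from the phosphorylation changes of its substrates.

    site_log_fc: dict mapping a site id to its log fold change.
    kinase_substrates: dict mapping a kinase to a list of site ids.
    Kinases with no quantified substrate are left out of the result.
    """
    sd = pstdev(site_log_fc.values())
    scores = {}
    for kinase, sites in kinase_substrates.items():
        # Missing quantifications are simply skipped in this baseline.
        observed = [site_log_fc[s] for s in sites if s in site_log_fc]
        if observed and sd > 0:
            scores[kinase] = mean(observed) * sqrt(len(observed)) / sd
    return scores


# Hypothetical sites and kinases.
log_fc = {"s1": 1.0, "s2": 1.0, "s3": -1.0, "s4": -1.0}
substrates = {"K1": ["s1", "s2"], "K2": ["s3", "s9"], "K3": ["s9"]}
scores = kinase_z_scores(log_fc, substrates)
```

Note how K3 receives no score because none of its substrates were quantified; sparse annotation of this kind is the sort of gap that motivates borrowing information from functionally related sites.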
Computational Enzymology: The structure, function and evolution of enzymes
We seek to understand how enzymes work and how they evolve to perform new enzyme functions using computational biology approaches. Almost all domains that perform catalysis have evolved to work on numerous substrates. We have generated family trees showing the evolution of these new specificities and gained a broad overview of enzyme evolution as we know it today. But we find that each family is different and careful analysis is required to understand individual mechanisms. In this talk I will focus on three practical topics relevant to understanding enzyme catalysis: (1) analysis of the basic catalytic machinery in proteins; (2) a new way to estimate substrate transformations and find the most appropriate enzymes or pathways to transform a given substrate into a given product; and (3) progress towards predicting enzyme mechanisms. These approaches will allow us to analyse complex enzyme families, their mechanisms and their evolution, and may ultimately help in the design of new enzymes.
Professor Dame Janet Thornton is Director Emeritus of, and a senior scientist at, the European Bioinformatics Institute (EBI). Her research focuses on understanding protein structure and function, and their effects on disease and ageing, using highly interdisciplinary approaches. Thornton's work contributed significantly to understanding protein three-dimensional structure. She is the author of popular tools and databases that are used in academia as well as in pharmaceutical companies. Professor Thornton is a Fellow of the Royal Society, a Fellow of the Academy of Medical Sciences, a member of EMBO and a foreign associate of the US National Academy of Sciences.
Error-free and error-prone DNA repair shape mutation landscapes in human tumors
Mutation rates in human somatic cells are highly heterogeneous, and this heterogeneity is variable across different resolutions, from the single-nucleotide scale to the megabase-sized chromosomal domain scale. We suggest that variable local activity of DNA repair pathways is a common factor underpinning mutation rate diversity across the human genome. A common principle is that gene-rich, active chromatin regions tend to be protected from mutagenesis. We also highlight examples where DNA repair can be co-opted by mutagens, resulting in mutational processes that are unusually impactful because they are directed towards active chromatin.
Fran Supek is an ICREA Research Professor at the Institute for Research in Biomedicine (IRB Barcelona, Spain), where he leads the Genome Data Science laboratory. His research focuses on large-scale statistical analyses of genomic, transcriptomic, and epigenomic data. Dr. Supek received his BS and MS (integrated) in biology and his PhD in molecular biology from the University of Zagreb (Croatia). He then became a postdoctoral researcher at the Centre for Genomic Regulation (CRG Barcelona, Spain) as a Marie Curie fellow. Dr. Supek is an EMBO Young Investigator and the PI of an ERC Starting Grant awarded in 2017.
Lineage tracing of human embryonic development and foetal haematopoiesis through somatic mutations
To date, ontogeny of the human haematopoietic system during foetal development has been characterized mainly through careful microscopic observations. Here we used whole-genome sequencing (WGS) of 511 single-cell derived haematopoietic colonies from healthy human foetuses of 8 and 18 post-conception weeks (pcw) coupled with deep targeted sequencing of tissues of known embryonic origin to reconstruct a phylogenetic tree of blood development. We found that in healthy foetuses, individual haematopoietic progenitors acquire tens of somatic mutations by 18 pcw. Using these mutations as barcodes, we timed the divergence of embryonic and extra-embryonic tissues during development and estimated the number of blood antecedents at different stages of embryonic development. Our analysis has shown that ectoderm originates from a smaller set of blood antecedents compared to endoderm and mesoderm. Finally, our data support a hypoblast origin of the extra-embryonic mesoderm and primitive blood in humans.
Dr. Ana Cvejic is a Principal Investigator at the Department of Haematology, University of Cambridge, and an Honorary Faculty member at the Sanger Institute. Her research focuses on the genes and mechanisms behind haematopoietic stem cell (HSC) differentiation. Dr. Cvejic received her MS in Molecular Biology and Physiology from the University of Belgrade and her PhD in Biochemistry from the University of Bristol. She then joined the University of Cambridge/Wellcome Trust Sanger Institute for a postdoctoral fellowship. In 2012, Dr. Cvejic was awarded a Cancer Research UK Career Development Fellowship. She was awarded an ERC Starting Grant in 2016 and an EMBO Young Investigator Award in 2017.
Presenting author: Dilek Koptekin
New solutions to old problems: Mitigating data loss and bias in ancient genome data processing
Dilek Koptekin, Etka Yapar, Ekin Sağlıcan, Can Alkan and Mehmet Somel
DNA in ancient samples is highly fragmented due to decay after death, has exogenous contamination and contains a low amount of endogenous DNA. Consequently, ancient DNA processing usually involves studying genome data with <1x coverage, composed of short reads with frequent C-to-T transitions at their ends. These create two types of challenges. One is the inability to call full diploid genotypes. Solutions include pseudo-haploidization and genotype likelihood methods. However, it has been observed that such ancient genome data are “reference biased”, i.e., they contain more reference alleles than alternative alleles at heterozygous positions. This appears to be caused by the loss of alternative allele-bearing reads due to their slightly lower mapping quality. The second challenge is to avoid confusing postmortem C-to-T transitions with authentic variation. The solution is to use variants identified in worldwide populations instead of de novo calls. Further, one may use only transversions, or both transversions and transitions but after trimming 2-10 nucleotides from the read ends, where postmortem damage accumulates. Unfortunately, the former approach means not using c. 67% of SNP data, while the latter means losing up to 30% of data due to short read lengths. Here we propose solutions to mitigate these effects in ancient genome data preprocessing. The first addresses reference bias. We show that aligning read data to a graph genome, or aligning to a linear reference genome after masking common polymorphic sites in the reference, effectively removes reference bias in ancient genotype data. The second involves avoiding postmortem damage effects while minimizing data loss. Here, instead of trimming read ends, we mask potential sites where the read’s genotype can be affected by postmortem cytosine deamination. 
Our primary analysis increases genotyping by 15% especially in the lowest coverage samples without compromising accuracy, thereby significantly boosting statistical power in downstream population genetics analyses.
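The contrast between trimming and masking can be made concrete with a small sketch. It assumes one plausible masking rule: within a fixed window of each read end, a T near the 5' end (possibly a deaminated C) and an A near the 3' end (the complementary G-to-A signal) are replaced by N, while all other bases are kept. The rule, the window size and the function name `mask_damage_prone_bases` are assumptions; the abstract does not spell out the exact procedure.

```python
def mask_damage_prone_bases(read, window=5):
    """Mask bases that could stem from postmortem cytosine deamination.

    Assumed rule (illustration only): within `window` bases of the 5' end
    a T is masked, and within `window` bases of the 3' end an A is masked.
    Every other base is retained, unlike trimming, which would discard
    the whole window at both ends.
    """
    bases = list(read.upper())
    n = len(bases)
    for i, base in enumerate(bases):
        near_5p = i < window
        near_3p = i >= n - window
        if (near_5p and base == "T") or (near_3p and base == "A"):
            bases[i] = "N"
    return "".join(bases)


read = "TTACGTACGTAA"
masked = mask_damage_prone_bases(read, window=2)
trimmed = read[2:-2]  # what end trimming with the same window would keep
```

For a 12-base read and a 2-base window, trimming always discards 4 bases. Masking discards the same 4 only when all of them are damage-prone, as happens in this toy read, and fewer otherwise.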
Presenting author: Marzieh Eslami Rasekh
Genotyping macro-satellites in the human population
Marzieh Eslami Rasekh and Gary Benson
Macrosatellite repeats (MSRs) are DNA patterns of 100 bp or longer that repeat tandemly throughout the genome. MSRs that change copy number are called variable number tandem repeats (VNTRs), which have been predicted to have biological effects and have been linked to diseases. However, MSRs have not been studied in a high-throughput fashion. Therefore, we have developed a computational tool named Macro-Satellites Using Depth (MaSUD) to genotype MSR loci in the human genome. To predict copy number changes, MaSUD compares the number of reads mapping inside each MSR locus to a background distribution of similarly simulated reads of the reference allele. The performance of MaSUD was demonstrated on simulated datasets (precision > 90% and recall > 50%) and validated using long PacBio reads (linear regression p-value < 2e-16 and r² = 0.55, correlation = 74.76%). We ran MaSUD on 2,504 genomes from five super-populations of the 1000 Genomes Project using 3,875 reference MSR loci. MaSUD predicted that >95% of these MSRs have a copy number variant in at least one individual and that, on average, a locus was variant in 1,457 individuals. A total of 2,512 VNTRs overlapped with 1,190 genes that were enriched in pathways related to cancer, diabetes, neuron differentiation, and neurogenesis. To identify VNTRs affecting gene expression, we compared the mean B-cell mRNA expression levels from 448 individuals using probes overlapping VNTRs (t-test, FDR < 5%). Expression of 84 genes was significantly correlated with the corresponding VNTR allele. Top genes correlated with VNTRs include FANCA, AMFR, SPG7, INPP5E, DPYSL4, GPR35, PIGN, PEX5, PRPF6, EXOC2, MXRA7, and LRCH3. Alternative Splicing was among the UniProt keywords enriched for these genes (FDR = 5e-3). In addition, unsupervised clustering shows that VNTRs separate human super-populations, and using a Random Forest model we could predict ancestry with 78% accuracy. This represents the first high-throughput analysis of macrosatellites in humans.
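The read-depth comparison described above can be illustrated with a simple z-score test of the observed read count at a locus against counts from simulated reference-allele reads. This is a minimal sketch, not MaSUD's actual statistic; the function name `call_copy_number_change`, the cutoff of 3 standard deviations, and the example counts are assumptions.

```python
from statistics import mean, stdev


def call_copy_number_change(observed_reads, background_counts, z_cutoff=3.0):
    """Compare an observed read count with a simulated background.

    background_counts: read counts from simulations of the reference
    allele at the same locus (at least two values). Returns "gain",
    "loss", "reference", or "no call" when the background has no spread.
    """
    mu = mean(background_counts)
    sigma = stdev(background_counts)
    if sigma == 0:
        return "no call"
    z = (observed_reads - mu) / sigma
    if z >= z_cutoff:
        return "gain"
    if z <= -z_cutoff:
        return "loss"
    return "reference"


# Hypothetical simulated counts for one locus.
background = [98, 100, 102, 100, 100]
calls = [call_copy_number_change(n, background) for n in (150, 60, 101)]
```

A count far above the background suggests extra repeat copies, one far below suggests fewer copies, and anything within the cutoff is treated as the reference allele.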
How novel sensors arise in bacteria
Bacteria possess various receptors that monitor changes in the environment and help adjust cellular functions accordingly. The number of receptors in bacterial genomes varies significantly, and different bacterial species seem to evolve unique receptors. Where do these novel receptors come from? Here, I will consider several basic mechanisms for their birth, using bacterial chemoreceptors as a model. In the first example, I will show how gene duplication serves as a foundation for evolving novel ligand specificities, while preserving conserved signaling determinants. In the second example, I will show how two proteins come together to form a novel receptor in a plug-and-play fashion. In the third example, I will demonstrate how a single alpha-helix insertion transformed a classical ligand-binding domain into a redox sensor. Finally, these three examples will be placed into a wider concept of molecular evolution of sensing and signaling in bacteria.
Igor B. Jouline is the Rod Sharp Endowed Professor in the Department of Microbiology and a Co-Director of Computational Health and Life Sciences at The Ohio State University Translational Data Analytics Institute. His research focuses on the evolution of genes, including those implicated in various diseases. Dr. Jouline was a Wellcome Trust Fellow at the University of Oxford (United Kingdom) and a Distinguished Staff Member at the US Department of Energy Oak Ridge National Laboratory. He served as a Chair of the International Odysseus Jury (Belgian Research Foundation) and several review panels at the National Institutes of Health. He was elected to the American Academy of Microbiology in 2017 and became a Fellow of the American Association for the Advancement of Science in 2019. Dr. Jouline received his BS in biology and MS in biophysics from Saratov State University (Russia), and his PhD in microbiology from St. Petersburg State University (Russia).
Proximity-based proteomics uncovers new mechanisms underlying rare developmental disorders
The centrosome/cilium complex is required for key functions ranging from cell division to cellular signaling, and its deregulation causes various human diseases including cancer and ciliopathies. To elucidate disease mechanisms, it is essential to determine how centrosomes and cilia assemble and function in time and in space. To this end, we used innovative proximity-based labeling approaches and generated spatial and temporal interaction maps for functional modules mutated in various ciliopathies including retinal degeneration and Joubert syndrome. Functional and molecular characterization of these functional modules identified CCDC66 as a new component of the centrosome/cilium complex that regulates cilium biogenesis, ciliary transport and ciliary signaling. Using live imaging and FRAP experiments, we identified centriolar satellites as regulators of centrosomal and ciliary targeting of CCDC66. Strikingly, CCDC66-positive satellites underwent frequent splitting and fusion events, analogous to the dynamic behavior of liquid-like membrane-less compartments. Finally, we showed that CCDC66 directly binds to microtubules and localizes to the microtubule-based structures of the cilium, suggesting a link between them and ciliopathy disease mechanisms. Taken together, our results uncovered new mechanisms that underlie ciliopathies and showed the power of proximity proteomics in discovery-based research.
Elif Nur Fırat Karalar currently works as an Assistant Professor in the Department of Molecular Biology and Genetics at Koç University (Istanbul, Turkey). Her studies mainly focus on cell biology, cancer research, and developmental biology, including the biology of centrosomes, cilia and microtubules. Previously, she worked as a post-doctoral scholar in the Department of Biology at Stanford University (United States) and as a graduate student in the Department of Molecular and Cell Biology at the University of California (United States). Dr. Fırat-Karalar was awarded the Women In Science Award by LOREAL UNESCO in 2015 and was selected as one of the 27 EMBO Young Investigators in 2019, becoming the first scientist from Turkey to be selected for the EMBO YIP program. She is also a member of The American Society for Cell Biology (2008-present) and the Molecular Biology Association of Turkey (2014-present). Dr. Fırat-Karalar received her BS in Molecular Biology and Genetics from Bilkent University (Turkey), and her PhD in Molecular and Cell Biology from the University of California (United States).
Presenting author: Tülay Karakulak
Pathogenic impact of transcript isoform switching in 1209 cancer samples covering 27 cancer types using an isoform-specific interaction network
Tülay Karakulak, Abdullah Kahraman, Damian Szklarczyk and Christian von Mering
Alternative splicing regulation is often disturbed in various cancers, leading to cancer-specific switches in the Most Dominant Transcripts (cMDT). To understand how these switches drive oncogenesis, we have analyzed isoform-specific protein interaction disruptions in the Pan-Cancer Analysis of Whole Genomes (PCAWG) project. Our study identified large variations in the number of cMDT, with the highest frequency in cancers of female reproductive organs. Surprisingly, in contrast to the mutational load, cancers arising from the same primary tissue showed similar numbers of cMDT. Some cMDT were found in almost all samples of a cancer type, rendering them ideal diagnostic biomarkers. Other cMDT tended to be located in densely populated protein network regions, disrupting interactions next to pathogenic cancer gene products in enzyme signalling, protein translation, and RNA splicing pathways. The highlighted common and distinct patterns of alternative splicing deregulation constitute new avenues for novel therapeutic targets in the fight against cancer.
Presenting author: Gökçe Senger
Integrated analysis of transcriptomic and proteomic data to understand the effect of aneuploidy on cancer genomes
Gökçe Senger and Martin Schaefer
Aneuploidy, defined as whole-chromosome or chromosome-arm-level changes, is a hallmark of human cancer cells, but its role in cancer remains to be fully elucidated. In this work, we focus on developing an understanding of how cancer cells deal with the excess expression induced by chromosome gains at both the transcriptome and proteome levels, and how the excess expression affects protein complex stoichiometry. For 298 tumor samples, for which aneuploidy, transcriptomic and proteomic data have been made available by the TCGA and CPTAC consortia, we first identified cancer-type-specific chromosomes that are altered at higher frequencies than would be expected by chance. Then we profiled transcriptomic changes in response to chromosome number changes. To our surprise, we found that a relatively small number of genes on the aneuploid chromosomes changed expression, while many expression changes happened on other chromosomes. Those differentially expressed genes on other chromosomes often form complexes and, even more, are often in the same complexes as differentially expressed genes on aneuploid chromosomes. These observations are even more pronounced at the proteome level. To further investigate the differential co-regulation between co-complex members, we calculated protein-level correlations between proteins of aneuploid chromosomes and their partner proteins on other chromosomes. We found that proteins involved in a smaller number of complexes have stronger correlations with their partners, highlighting the importance of compensation for stoichiometric imbalance in protein complexes. Aggregation-prone complex members also show stronger expression correlations, suggesting that the proteotoxicity of unpaired complex members makes this compensation necessary. Our ongoing efforts focus on deciphering the regulatory control of gene expression of complex members (at both the transcriptome and proteome levels) to understand the molecular mechanisms of cancer cell adaptation to aneuploidy.
Translational Bioinformatics
Advancements in omics technologies have facilitated the study of complex diseases to reveal individual mechanisms of disease development and to identify personalized therapy targets. The bioinformatics methods developed so far tend to maximize model performance metrics and most frequently result in models that do not make sense clinically. As a result, they are not used in daily medical practice, although they are useful for understanding the mechanisms at play in disease etiology. Physicians urgently want these AI models translated into daily use, so bioinformaticians need to develop models that make sense to clinicians and can be easily adopted in their daily practice. In this presentation, I will briefly talk about bioinformatics methods that are used to develop models for studying complex diseases and give examples of how to modify them to develop clinically actionable models.
Uğur Sezerman graduated from Bogazici University and holds a Ph.D. in Biomedical Engineering from Boston University. He specializes in structural bioinformatics, protein engineering, computational genome analysis, personalized medicine and systems biology. He has lectured and conducted research at Boston University and Bogazici University. He worked at Sabanci University from September 1999 until March 2015, where he established the Computational Biology Laboratory and the Protein Engineering Laboratory. Currently, he is working as a researcher and an instructor at Acibadem University.
Presenting author: Tunca Dogan
Heterogeneous COVID-19 knowledge graphs in comprehensive resource of biomedical relations (CROssBAR) system
Tunca Dogan, Heval Ataş, Vishal Joshi, Ahmet Atakan, Ahmet Süreyya Rifaioğlu, Esra Nalbat, Andrew Nightingale, Rabie Saidi, Vladimir Volynkin, Hermann Zellner, Rengul Atalay, Maria Martin and Volkan Atalay
Systemic analysis of available biological/biomedical data is critical for developing novel and effective treatment approaches against both complex diseases and rapidly emerging outbreaks (e.g., COVID-19). Because different sections of the biomedical data are produced by different organizations/institutions using various technologies, the data are scattered across individual resources without any explicit relations/connections, hindering comprehensive multi-omics-based analysis. We aimed to address this issue by constructing a comprehensive biological/biomedical resource, CROssBAR, with large-scale data integration from various data sources, enrichment of these data with deep learning-based prediction of relations, and presentation via cutting-edge knowledge graph (KG) representations in our open-access web-service at https://crossbar.kansil.org. Starting from late 2019, the new coronavirus pandemic has wreaked havoc and brought along nearly 850K deaths. Systemic evaluation of the current biomedical knowledge about SARS-CoV-2 infection is expected to aid researchers in developing effective drugs and vaccines. With the aim of contributing to this endeavor, we have constructed two COVID-19 KGs (https://crossbar.kansil.org/covid_main.php) using the CROssBAR system: (i) a large-scale version including the entirety of related information on various CROssBAR-integrated resources, and (ii) a simplified version distilled to include only the most relevant terms, ideal for fast interpretation. CROssBAR COVID-19 KGs incorporate relevant virus and host genes/proteins, interactions, pathways, phenotypes and other diseases, as well as drugs/compounds, some of which are new. These new drugs have been incorporated into the KGs either through our network analysis-based pipeline or through prediction by our deep-learning-based tools. 
We conducted a literature-based validation study and found that many of these drugs are now being tested at preclinical/clinical stages against COVID-19. It is interesting to observe direct/indirect relations between the phenotypes/diseases in the KGs and COVID-19 over the incorporated host genes/proteins and enriched pathways, and between COVID-19 and our computationally predicted drugs/compounds, as they may reveal further evidence to be utilized against this disease.
Presenting author: Ilyes Baali
DriveWays: A method for identifying possibly overlapping driver pathways in cancer
Ilyes Baali, Cesim Erten and Hilal Kazan
The majority of the previous methods for identifying cancer driver modules output non-overlapping modules. This assumption is biologically inaccurate, as genes can participate in multiple molecular pathways. This is particularly true for cancer-associated genes, as many of them are network hubs connecting functionally distinct sets of genes. It is important to provide combinatorial optimization problem definitions that model this biological phenomenon and to suggest efficient algorithms for their solution. We provide a formal definition of the Overlapping Driver Module Identification in Cancer (ODMIC) problem. We show that the problem is NP-hard. We propose a seed-and-extend based heuristic named DriveWays that identifies overlapping cancer driver modules from the graph built from the IntAct PPI network. DriveWays incorporates mutual exclusivity, coverage, and the network connectivity information of the genes. We show that DriveWays outperforms the state-of-the-art methods in recovering well-known cancer driver genes on TCGA pan-cancer data. Additionally, DriveWays' output modules show a stronger enrichment for the reference pathways in almost all cases. Overall, we show that enabling modules to overlap improves the recovery of functional pathways filtered with known cancer drivers, which essentially constitute the reference set of cancer-related pathways. The data, the source code, and useful scripts are available at: https://github.com/abucompbio/DriveWays.
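Two of the quantities named above, coverage and mutual exclusivity, are often defined as in the sketch below: coverage is the fraction of samples with a mutation in at least one module gene, and mutual exclusivity is the number of covered samples divided by the total number of gene-level mutations in the module. These are common textbook-style definitions assumed here for illustration; the abstract does not give DriveWays' own scoring function, and the gene-to-sample data are made up.

```python
def covered_samples(module, mutated_in):
    """Samples carrying a mutation in at least one gene of the module."""
    return set().union(*(mutated_in[g] for g in module))


def coverage(module, mutated_in, n_samples):
    """Fraction of all samples covered by the module."""
    return len(covered_samples(module, mutated_in)) / n_samples


def mutual_exclusivity(module, mutated_in):
    """1.0 when no sample is mutated in more than one module gene."""
    total = sum(len(mutated_in[g]) for g in module)
    if total == 0:
        return 0.0
    return len(covered_samples(module, mutated_in)) / total


# Hypothetical mutation data: gene -> set of mutated sample ids.
mutations = {
    "TP53": {"p1", "p2"},
    "KRAS": {"p3"},
    "PIK3CA": {"p2", "p4"},
}
cov = coverage(["TP53", "KRAS"], mutations, n_samples=5)
me_exclusive = mutual_exclusivity(["TP53", "KRAS"], mutations)
me_overlap = mutual_exclusivity(["TP53", "PIK3CA"], mutations)
```

Because both scores are computed per module, a hub gene such as TP53 can contribute to several modules at once, which is the situation that overlapping module identification is meant to allow.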
Presenting author: Yeşim Aydın Son
Validation of LOAD-RF-RF selected risk SNVs for the early and differential diagnosis of Alzheimer’s disease
Sevda Rafatov, Hüseyin Cahit Burduroğu, Yavuzhan Çakır, Onur Erdoğan, Cem İyigün and Yeşim Aydın Son
Late-Onset Alzheimer’s Disease (LOAD) is the most common type of dementia in aging populations, characterized by deterioration of memory and other cognitive domains. The complex genetic etiology of LOAD is still unclear, which restrains its early and differential diagnosis. Genome-Wide Association Studies (GWAS) allow exploration of the statistical associations of individual variants, but the univariate analysis overlooks interactions between variants. Machine learning algorithms can capture hidden, novel, and significant patterns by considering nonlinear interactions between variants, which helps in understanding the genetic predisposition to complex genetic disorders, where multiple variants determine the risk. We developed in-silico LOAD models based on genotyping data from three different datasets obtained through controlled access from the ADNI and dbGaP initiatives. GWAS datasets provided by ADNI (210 controls and 344 cases), GenADA (777 controls and 798 cases), and NCRAD by dbGaP (1310 controls and 1289 cases) were analyzed. In the first step, the GenADA, NCRAD, and ADNI datasets were analyzed independently; after preprocessing, PLINK was used for GWAS, followed by p-value filtering for the initial dimension reduction. For each dataset, a two-step Random Forest (RF) was implemented with 5-fold cross-validation (CV) using the RANGER R package after GWAS with PLINK. Test performances of the LOAD-RF models of the ADNI, NCRAD, and GenADA datasets were 72.9%, 68.8%, and 92.4%, respectively. 390 SNVs from ADNI, 1740 from NCRAD, and 434 from GenADA were selected by the individual LOAD-RF models considering permutation importance of variants at 95% confidence. There were no consensus variants, but 62 genes common to at least two datasets were identified. Additionally, six genes common to all three LOAD-RF models were identified. The test performances of the LOAD-RF-RF models of the ADNI, NCRAD, and GenADA datasets were 74.0%, 72.1%, and 85.1%, respectively.
32 SNVs from ADNI, 581 from NCRAD, and 107 from GenADA were selected by the individual LOAD-RF-RF models considering permutation importance of variants at 95% confidence. The LOAD-RF-RF analysis identified the SNVs that are highly significant, and six SNVs were selected for experimental validation with pyrosequencing. Initially, we genotyped 41 LOAD patients for the SPOCK1 variant and observed a minor allele frequency of 0.317, which is significantly higher than the expected global minor allele frequency of 0.154. The experimental validation of the rest of the LOAD-RF-RF selected risk variants is still ongoing. SNVs identified and validated in this study will be utilized for the development of a genotyping kit for the early and differential diagnosis of LOAD. The kit will support the clinician’s decision in the early and differential diagnosis of LOAD and benefit patients and their families in planning treatment and support strategies.
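The allele-frequency comparison reported for the SPOCK1 variant can be checked with a short calculation. The abstract does not state which test was used; the sketch below assumes a one-sided exact binomial test, a diploid autosomal locus (so 41 patients contribute 82 alleles), and that the global frequency of 0.154 is a fixed reference value. The minor allele count is reconstructed by rounding and is therefore also an assumption.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_patients = 41
n_alleles = 2 * n_patients            # assumed diploid locus: 82 alleles
observed_maf = 0.317                  # reported in the abstract
expected_maf = 0.154                  # reported global frequency

# Reconstructed count of minor alleles (0.317 * 82 is about 26).
minor_alleles = round(observed_maf * n_alleles)

# Probability of seeing at least this many minor alleles if the
# patients had the global allele frequency.
p_value = binomial_upper_tail(minor_alleles, n_alleles, expected_maf)
print(f"{minor_alleles}/{n_alleles} minor alleles, one-sided p = {p_value:.2e}")
```

With such a small cohort, a test of this kind only indicates that the observed frequency is unlikely under the global reference; population-matched controls would be needed for a stronger statement.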
Presenting author: Handan Melike Donertas
Age-related diseases share common genetic associations
Handan Melike Donertas, Daniel K Fabian, Matias Fuentealba Valenzuela, Linda Partridge and Janet M. Thornton
Ageing is the major risk factor for many diseases. With the rise in life expectancy, the overall burden of ageing-related diseases increases. The molecular link between ageing and age-related diseases, however, remains elusive. In this study, we test whether diseases with similar age-of-onset share a genetic component that is also implicated in ageing. We perform GWAS on UK Biobank data, which includes genomic, medical and lifestyle measures for almost half a million participants. Our analysis comparing 116 diseases suggested four disease clusters defined by their age-of-onset. We found that diseases with the same onset profile are genetically more similar, suggesting a common aetiology. Moreover, this similarity cannot be explained by disease categories (e.g. cardiovascular, endocrine), co-occurrences, or disease cause-effect relationships. Two of the clusters showed an age-dependent profile, starting to increase in prevalence after the ages of 20 and 40 years, respectively. These clusters had genetic risk factors associated with senescence regulators and targets of pro-longevity drugs. However, they had distinct functional enrichment and risk allele frequency distributions. We also tested predictions of the mutation accumulation and antagonistic pleiotropy theories of ageing and found support for both. We are now working on a drug repurposing approach to find drugs targeting the common genetics between age-related diseases. This approach has the potential to identify drugs targeting multiple diseases simultaneously and alleviate the effects of multimorbidity and polypharmacy in late ages.
Presenting author: Nour Alserr
Inference Attacks Against Differentially-Private Query Results from Genomic Datasets Including Dependent Tuples
Nour Alserr, Erman Ayday and Ozgur Ulusoy
Fast-paced high-throughput sequencing technologies have resulted in large-scale datasets and biobanks. The number of sequenced human genomes has been increasing at an exponential rate, and about 2.5 million genomes have now been sequenced around the world. This number is projected to reach 105 million, and possibly far more, in 2025, especially after the COVID-19 pandemic, during which many countries decided to study genomic data at population scale. These rich troves of data can empower scientific advances. However, owing to the sensitive nature of genetic information, shared genomic datasets that include sensitive genetic or medical information about individuals can be misused if they land in the wrong hands. Hence, in the hope of sharing genomic datasets to gain a better understanding of human genetics, differential privacy (DP) is one of the privacy concepts proposed for sharing the summary statistics of genomic datasets in a private manner. The DP mechanism provides a rigorous mathematical foundation for preserving privacy, but it does not consider the dependency of the data tuples in the dataset, which is a common situation for genomic datasets due to the inherent correlations between the genomes of family members. We show how kin relationships between individuals in a genomic dataset cause a significant reduction in the privacy guarantees of traditional DP-based mechanisms. We formulate this as an attribute inference attack and show the privacy loss using differentially-private results of minor allele frequency (MAF) and chi-square queries over two real-life genomic datasets.
Our results show that, using the results of differentially-private MAF queries and exploiting the dependency between tuples, an adversary can reveal up to 50% more sensitive information about the genome of a target (compared to the original privacy guarantees of standard DP-based mechanisms), while differentially-private chi-square queries can reveal up to 40% more sensitive information. Furthermore, we show that these inferred genomic records (as a result of the attribute inference attack) can be utilized to perform successful membership inference attacks on other statistical genomic datasets (e.g., associated with a sensitive trait). Using a log-likelihood-ratio (LLR) test, our results also show that the inference power of the adversary can be significantly high in such an attack even when using inferred (and hence partially incorrect) genomes. This work was presented at the 28th Conference on Intelligent Systems for Molecular Biology (ISMB2020). The full paper is available at: https://doi.org/10.1093/bioinformatics/btaa475
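For readers unfamiliar with differentially-private queries, the following minimal sketch shows how a MAF query might be released with the standard Laplace mechanism. It is not taken from the paper: the function names, the genotype encoding (0, 1 or 2 minor alleles per individual), the choice of epsilon, and the toy genotypes are assumptions. The sensitivity of 1/n assumes that replacing one individual changes the minor allele count by at most 2 out of 2n alleles, and the guarantee assumes independent tuples, which is exactly the assumption the abstract shows to fail when relatives are present.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_maf(genotypes, epsilon, rng):
    """Release a noisy minor allele frequency for one SNP.

    genotypes: list of minor allele counts per individual (0, 1 or 2).
    epsilon:   privacy budget spent on this single query.
    """
    n = len(genotypes)
    true_maf = sum(genotypes) / (2 * n)
    sensitivity = 1.0 / n                      # assumed: one record replaced
    noisy = true_maf + laplace_noise(sensitivity / epsilon, rng)
    # Clamping is post-processing and does not weaken the DP guarantee.
    return min(1.0, max(0.0, noisy))


# Hypothetical toy cohort of 200 individuals.
toy_genotypes = [0, 1, 2, 0, 1, 0, 0, 1] * 25
rng = random.Random(0)
noisy_maf = dp_maf(toy_genotypes, epsilon=1.0, rng=rng)
```

An adversary who knows the genotypes of a target's relatives can combine that knowledge with such noisy releases, which is why the effective privacy loss can exceed what epsilon alone suggests.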
Presenting author: Arda Eskin
Cardiac atrial transcriptomic landscaping reveals defects in various pathways in patients with ischemic heart disease or heart failure
Arda Eskin, Severi Mulari, Nurcan Tunçbağ and Esko Kankuri
Ischemic heart disease (IHD), causing high morbidity and mortality, continues to be the leading cause of death worldwide. In this study, samples of right atrial appendage were collected for transcriptomic profiling from 40 patients with IHD undergoing elective coronary artery bypass grafting (CABG) surgery. Additionally, 8 samples from patients with solitary valvular disease undergoing corrective valvular surgery were harvested to serve as controls. Clinical and follow-up data, including medication and laboratory measurements, were also collected for each patient. We obtained the transcriptomic data of healthy right atrial appendage tissue from GTEx (n = 429). Our aim in this study is to find novel associations and genes related to IHD and to predict the risk of having IHD by integrating transcriptomic, clinical and interactome data. We found 357 upregulated and 310 downregulated DEGs in IHD samples compared to healthy tissues (FDR < 0.05 and |logFC| > 2). Among these, genes from the protocadherin gamma subfamily were found to be significantly different in the patient group with an ejection fraction (the percentage of blood leaving the heart each time it contracts) lower than 55% (p value < 0.05). We inferred the most critical pathways from the list of DEGs and found that agrin interactions at the neuromuscular junction, epithelial adherens junction signaling, sirtuin signaling and oxidative phosphorylation are significantly enriched. DEGs associated with oxidative phosphorylation are downregulated. Additionally, functional analysis of miRNAs and their targets that have significantly different expression values between patient groups resulted in the enrichment of lipid metabolism. Overall, our results provide a transcriptome-level understanding of processes reactive to IHD and of the association of gene-level data with phenotypic information.
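The DEG thresholds quoted above (FDR < 0.05 and |logFC| > 2) can be applied as in the following sketch. The abstract does not say how the FDR was computed; the Benjamini-Hochberg adjustment used here is an assumption, and the gene names, fold changes and p-values are hypothetical placeholders rather than study data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, fdr_cutoff=0.05, lfc_cutoff=2.0):
    """Split genes passing both thresholds into up- and downregulated lists."""
    q_values = benjamini_hochberg([g["p"] for g in genes])
    up, down = [], []
    for gene, q in zip(genes, q_values):
        if q < fdr_cutoff and abs(gene["logFC"]) > lfc_cutoff:
            (up if gene["logFC"] > 0 else down).append(gene["name"])
    return up, down


# Hypothetical toy results table (not study data).
toy_genes = [
    {"name": "geneA", "logFC": 3.1, "p": 0.0001},
    {"name": "geneB", "logFC": -2.5, "p": 0.001},
    {"name": "geneC", "logFC": 0.4, "p": 0.0005},
    {"name": "geneD", "logFC": 2.8, "p": 0.2},
]
toy_up, toy_down = select_degs(toy_genes)
```

In this toy table, geneC is excluded because its fold change is too small and geneD because its adjusted p-value is too large, which shows why both filters are applied together.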
Identifying effector genes of human GWAS variants by INFIMA
Genome-wide association studies (GWAS) have revealed many non-coding single nucleotide variants that are statistically associated with complex traits and diseases. However, the effector genes through which these disease risk variants mediate their effects are largely unknown. Recently, transcriptome-wide association studies that leverage reference transcriptomes have emerged as a powerful tool for revealing candidate disease-related genes when disease-relevant reference transcriptomes exist. However, they lack the ability to link risk variants to their effector genes. Surprisingly, model organism studies have also largely remained an untapped resource for unveiling such effector genes. A recent well-powered expression quantitative trait locus (eQTL) study in islets from Diversity Outbred (DO) mice identified thousands of eQTLs; however, due to high linkage disequilibrium, it lacked the resolution to pinpoint causal single nucleotide variants and the regulatory mechanisms responsible for the wide range in susceptibility to diabetes. To address this bottleneck and leverage eQTLs derived from DO mice for elucidating effector genes of human GWAS variants, we developed a statistical data integration model, INFIMA, for Integrative Fine-Mapping with Model Organism Multi-Omics Data. INFIMA capitalizes on multi-omics data modalities such as chromatin accessibility and transcriptome from the eight DO mice founder strains to fine-map DO islet eQTLs. In addition, INFIMA employs footprinting and in silico mutation analysis to reveal regulatory genetic variants that mediate strain-specific expression differences. We applied INFIMA to identify novel effector genes in pancreatic islets for human GWAS variants associated with diabetes. We computationally validated INFIMA predictions with high-resolution chromatin capture data sets from mouse and human islets.
Our results demonstrate that INFIMA is a powerful method for leveraging model organism multi-omics data to identify candidate effector genes of non-coding human GWAS variants and performs better than baseline alternatives.
Sunduz Keles is Professor of Statistics and of Biostatistics and Medical Informatics at the University of Wisconsin, Madison; she and her team are also affiliated with the Computation and Informatics in Biology and Medicine (CIBM) Program and the Genome Center of Wisconsin. Professor Keles's research interests broadly include statistical genomics, with a specific focus on high dimensional genomic data, high throughput sequencing experiments and biomedical data. Their research concerns developing and applying statistical methods for problems that arise in genome biology and for the fundamentals of gene regulation in development and disease. She received her Ph.D. in Biostatistics from the University of California, Berkeley in 2003. Professor Keles received the PhRMA Foundation Award in Informatics in 2007 and the H.I. Romnes Faculty Fellowship in 2012.