Prof. Dr. Ralf Thiele
We propose a new alignment procedure that aligns protein sequences and structures in a unified manner. Recursive dynamic programming (RDP) is a hierarchical method that, on each level of the hierarchy, identifies locally optimal solutions and assembles them into partial alignments of sequences and/or structures. In contrast to classical dynamic programming, RDP can also handle alignment problems whose objective functions do not obey the principle of prefix optimality, e.g. scoring schemes derived from energy potentials of mean force. For such problems, RDP aims to compute solutions that are both near-optimal with respect to the cost function and biologically meaningful. To this end, RDP maintains a dynamic balance between the different factors governing alignment fitness, such as evolutionary relationships and structural preferences. Because gaps are not scored explicitly in RDP, the problematic choice of gap cost parameters is avoided. To evaluate the RDP approach, we analyse whether known and accepted multiple alignments based on structural information can be reproduced with RDP, using the family of ferredoxins as our prime example. Our experiments show that, if properly tuned, RDP can outperform methods based on classical sequence alignment algorithms as well as methods that take purely structural information into account.
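The recursive idea can be illustrated with a small toy. The sketch below is not the authors' RDP implementation; it only mimics the description under stated assumptions. On each level it picks the best-scoring ungapped segment pair (a locally optimal partial alignment, with a hypothetical minimum length `min_len`), fixes it, and recurses on the unaligned regions to its left and right. Residues left between fixed segments are implicit gaps and are never scored, so no gap cost parameter appears. The identity score `pair_score` is a hypothetical placeholder; because it is additive, this toy does not itself exercise a non-prefix-optimal objective such as a potential of mean force.

```python
from typing import List, Tuple

def pair_score(a: str, b: str) -> int:
    # Hypothetical identity score: +2 for a match, -1 for a mismatch.
    return 2 if a == b else -1

def best_segment(s: str, t: str, i0: int, i1: int, j0: int, j1: int,
                 min_len: int) -> Tuple[int, int, int, int]:
    """Best ungapped segment pair (i, j, length, score) within s[i0:i1] x t[j0:j1]."""
    best = (-1, -1, 0, 0)
    for i in range(i0, i1):
        for j in range(j0, j1):
            score, length = 0, 0
            # Extend the segment along the diagonal that starts at (i, j).
            while i + length < i1 and j + length < j1:
                score += pair_score(s[i + length], t[j + length])
                length += 1
                if length >= min_len and score > best[3]:
                    best = (i, j, length, score)
    return best

def recursive_align(s: str, t: str, i0: int, i1: int, j0: int, j1: int,
                    min_len: int = 3) -> List[Tuple[int, int]]:
    """Fix the best segment, then recurse on the regions to its left and right."""
    i, j, length, score = best_segment(s, t, i0, i1, j0, j1, min_len)
    if score <= 0:
        return []  # nothing worth aligning; residues stay unaligned (implicit gaps)
    left = recursive_align(s, t, i0, i, j0, j, min_len)
    core = [(i + k, j + k) for k in range(length)]
    right = recursive_align(s, t, i + length, i1, j + length, j1, min_len)
    return left + core + right

def align(s: str, t: str, min_len: int = 3) -> List[Tuple[int, int]]:
    """Aligned index pairs (position in s, position in t)."""
    return recursive_align(s, t, 0, len(s), 0, len(t), min_len)
```

For example, `align("ACDEFGHI", "ACDEWWFGHI")` fixes the segment `ACDE` on the first level and `FGHI` on the second, leaving the two `W` residues unaligned without charging any gap penalty.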
MOTIVATION: Genome projects produce a wealth of protein sequences. Theoretical methods to predict possible structures and functions are needed for screening, large-scale comparisons and in-depth analyses that identify worthwhile targets for further experimental research. Sequence-structure alignment is a basic tool for identifying model folds for protein sequences and for constructing crude structural models. Empirical contact potentials (potentials of mean force) are used to optimize and evaluate such alignments. RESULTS: We propose new scoring schemes based on a contact definition derived from Voronoi decompositions of the three-dimensional coordinates of protein structures. We demonstrate that Voronoi potentials are superior to purely distance-based contact potentials with respect to recognition rate and significance for native folds. Moreover, the scoring scheme has the potential to provide a reasonable balance of detail and abstraction, so that it is also useful for recognizing distantly related (both homologous and non-homologous) proteins. We demonstrate this on a set of structural alignments, which show much better correspondence between native and model scores for the Voronoi potentials than for conventional distance-based potentials.
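A minimal sketch of how a contact potential (potential of mean force) can be derived from contact counts and then used to score a sequence-structure alignment. The contact lists are taken as given: they could come from a Voronoi-based or a distance-based contact definition, and computing Voronoi contacts is not shown. The reference state (expected counts from the product of residue frequencies), the treatment of unseen pairs (score 0) and the function names are assumptions for illustration, not the scoring schemes of the paper. Energies are in units of kT; lower scores are more favourable.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Contact = Tuple[int, int]

def contact_potential(structures: Sequence[Tuple[str, List[Contact]]]) -> Dict[Tuple[str, str], float]:
    """Pseudo-energy E(a, b) = -ln(N_obs(a, b) / N_exp(a, b)) for unordered residue pairs."""
    pair_counts: Counter = Counter()
    residue_counts: Counter = Counter()
    for seq, contacts in structures:
        for i, j in contacts:
            a, b = sorted((seq[i], seq[j]))
            pair_counts[(a, b)] += 1
            residue_counts[seq[i]] += 1
            residue_counts[seq[j]] += 1
    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    potential = {}
    for (a, b), n_obs in pair_counts.items():
        f_a = residue_counts[a] / total_residues
        f_b = residue_counts[b] / total_residues
        # Assumed reference state: random pairing by residue contact frequency.
        p_exp = f_a * f_b if a == b else 2 * f_a * f_b
        potential[(a, b)] = -math.log(n_obs / (p_exp * total_pairs))
    return potential

def threading_score(query: str, alignment: Dict[int, int], contacts: List[Contact],
                    potential: Dict[Tuple[str, str], float]) -> float:
    """Sum pair energies over template contacts whose two positions are both aligned.

    alignment maps a template (structure) position to a query sequence position.
    """
    score = 0.0
    for i, j in contacts:
        if i in alignment and j in alignment:
            a, b = sorted((query[alignment[i]], query[alignment[j]]))
            score += potential.get((a, b), 0.0)  # unseen pairs contribute 0 (assumption)
    return score
```

In a toy training set such as `[("HHPP", [(0, 1), (2, 3), (0, 2)])]`, like-with-like contacts are observed more often than the reference state predicts and therefore receive negative (favourable) energies.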
The automated annotation of data from high-throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene-finding and gene-function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting a function assignment to enrich the annotation process, both for use by expert curators and for predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database Ara-Cyc), which has been established in the ONDEX data integration system. We also compare different methods for integrating GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation. The methods and algorithms presented in this publication are an integral part of the ONDEX system, which is freely available from http://ondex.sf.net/.
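To illustrate how the network of supporting information could serve as a confidence measure, the sketch below counts distinct evidence paths from a gene to each candidate GO term in a small graph. It is not the ONDEX API; the node names, the graph, and the choice of path count as a confidence score are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, List[str]]

def count_paths(graph: Graph, source: str, target: str) -> int:
    """Number of distinct simple paths from source to target (exponential in the worst case)."""
    def visit(node: str, seen: Set[str]) -> int:
        if node == target:
            return 1
        return sum(visit(nxt, seen | {nxt}) for nxt in graph.get(node, []) if nxt not in seen)
    return visit(source, {source})

def rank_annotations(graph: Graph, gene: str, candidates: Iterable[str]) -> List[Tuple[int, str]]:
    """Candidate GO terms sorted by the number of supporting evidence paths."""
    return sorted(((count_paths(graph, gene, go), go) for go in candidates), reverse=True)

# Hypothetical evidence graph: edges point from a gene via reference-database hits to GO terms.
EVIDENCE: Graph = {
    "gene_x": ["uniprot_hit", "pfam_domain"],
    "uniprot_hit": ["GO:term_A", "pdb_structure"],
    "pdb_structure": ["GO:term_A"],
    "pfam_domain": ["GO:term_A", "GO:term_B"],
}

print(rank_annotations(EVIDENCE, "gene_x", ["GO:term_A", "GO:term_B"]))
# [(3, 'GO:term_A'), (1, 'GO:term_B')]
```

Here `GO:term_A` is supported by three paths (via the UniProt hit, via the UniProt hit and a PDB structure, and via the Pfam domain), whereas `GO:term_B` has only one, so a curator would see the first assignment as better corroborated.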
The reciprocal translocation t(12;21)(p13;q22), the most common structural genomic alteration in childhood B-cell precursor acute lymphoblastic leukaemia, results in the chimeric transcription factor TEL-AML1 (ETV6-RUNX1). We identified directly and indirectly regulated target genes using an inducible TEL-AML1 system derived from the murine pro-B-cell line BA/F3 and a monoclonal antibody directed against TEL-AML1. By integrating promoter binding identified with chromatin immunoprecipitation (ChIP)-on-chip, gene expression measured with microarrays and protein output measured by stable isotope labelling by amino acids in cell culture (SILAC), we identified 217 directly and 118 indirectly regulated targets of the TEL-AML1 fusion protein. Directly, but not indirectly, regulated promoters were enriched in AML1-binding sites. The majority of promoter regions were specific to the fusion protein and not bound by native AML1 or TEL. Comparison with gene expression profiles from TEL-AML1-positive patients identified 56 concordantly misregulated genes with negative effects on proliferation and cellular transport mechanisms and positive effects on cellular migration and stress responses, including immunological responses. In summary, this work gives, for the first time, a comprehensive insight into how TEL-AML1 expression may directly and indirectly contribute to altering cells so that they become prone to leukemic transformation.
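The integration step can be expressed as simple set logic. The sketch below assumes each assay has already been reduced to a set of hit genes (the study's actual statistical thresholds are not reproduced): a gene counts as directly regulated if its promoter is bound and its mRNA or protein level changes, and as indirectly regulated if it changes without promoter binding. A second helper keeps genes misregulated in the same direction in the cell-line model and in patients. Gene names and the direction encoding (+1 up, -1 down) are hypothetical.

```python
from typing import Dict, Set, Tuple

def classify_targets(bound: Set[str], expr_changed: Set[str],
                     protein_changed: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split regulated genes into direct (promoter bound) and indirect (not bound) targets."""
    regulated = expr_changed | protein_changed   # changed at mRNA or protein level
    direct = bound & regulated                   # bound and changed
    indirect = regulated - bound                 # changed without promoter binding
    return direct, indirect

def concordant(model: Dict[str, int], patients: Dict[str, int]) -> Set[str]:
    """Genes misregulated in the same direction (+1 up, -1 down) in the model and in patients."""
    return {g for g, d in model.items() if patients.get(g) == d}

direct, indirect = classify_targets(bound={"G1", "G2", "G5"},
                                    expr_changed={"G1", "G3"},
                                    protein_changed={"G2", "G4"})
print(sorted(direct), sorted(indirect))
# ['G1', 'G2'] ['G3', 'G4']
```

Note that `G5` is bound but unchanged, so under this rule it is neither a direct nor an indirect target.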