RNA is one of the most important molecules in living organisms. One of its main functions is to regulate gene expression, which involves binding to a messenger RNA and forming a joint structure with it. An RNA's function is determined by its sequence and the structure it folds into. Accordingly, the prediction of individual as well as joint structures is an important area of research. In this thesis, a method for predicting RNA-RNA joint structures from the minimum free energy (mfe) structures of the individual RNAs was developed. It is able to extensively explore the joint structural landscape of two interacting RNAs by taking advantage of the locality of changes in the RNAs' structures as well as natural and energetic constraints. The method predicts the mfe joint structure as well as alternative stable joint structures, and also computes non-optimal folding pathways from the unbound individual mfe structures to the predicted joint structures. An enumeration approach is used that copes with the enormous search space and avoids cyclic behaviour. The method is evaluated on two standard datasets of known interacting RNAs and shows good results.
Today, publications are digitally available, which enables researchers to search the text and often also the content of tables. Images, in contrast, cannot be searched. This is not a problem for most fields, but in chemistry most of the information is contained in images, especially structure diagrams. Besides "normal" chemical structures, which represent exactly one molecule, there also exist generic structures, so-called Markush structures. These contain variable parts and additional textual information, which enable them to represent several molecules at once. The number of molecules can range from just a few up to thousands or even millions. This ability led to a spread of Markush structures in patents, because it enables patents to protect entire families of molecules at once. Besides avoiding the need to enumerate all structures, it also has the advantage that, if a Markush structure is used in a patent, it is much harder to determine whether a specific structure is protected by it or not. To answer the question of whether a structure is protected, it is necessary to search the patents. Appropriate databases for this task already exist, but they are filled manually. Automatic processing does not yet exist. In this project, a Markush structure reconstruction prototype is developed which is able to reconstruct bitmaps containing Markush structures (meaning a depiction of the structure and a text part describing the generic parts) into a digital format and save them in the newly developed file format extSMILES. This format is searchable due to its context-free grammar based design. To be able to develop a Markush structure reconstruction prototype, an in-depth analysis of the concept of Markush structures and their requirements for a reconstruction process was performed. The analysis shows that the common connection table concept of the existing file formats is not able to store Markush structures. 
Conditions are especially challenging for most of the formats. Thus, a context-free grammar based file format is developed, which extends the SMILES format. This format, called extSMILES, ensures the searchability of the results through its context-free grammar based concept, and is able to store all information contained in Markush structures. In addition, it is generic, extensible and easy to understand. The developed prototype for the Markush structure reconstruction uses extSMILES as its output format and is based on the chemical structure recognition tool chemoCR and the Unstructured Information Management Architecture (UIMA). For chemoCR, modules are developed which enable it to recognize and assemble Markush structures as well as to return the reconstruction result in extSMILES. For UIMA, on the other hand, a pipeline is developed which is able to analyse the input text files and translate them to extSMILES. The results of both tools are then combined and presented in chemoCR. An evaluation of the prototype is performed on a representative set of twelve structures of interest with low image quality, which contain all typical Markush elements. Trivial structures containing only one R-group are not evaluated. Due to the challenging nature of the images, no Markush structure could be correctly reconstructed. However, under the assumption that R-group definitions described in natural language are excluded from the task, and provided that the core structure reconstruction is improved, the success rate can be increased to 58.4%.
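To illustrate why a single Markush structure can stand for anything from a few to millions of molecules, the following minimal sketch counts and enumerates the members of a toy generic structure. It is not the extSMILES format and not part of chemoCR; the SMILES-like template, the R-group lists and the function names are assumptions made purely for illustration.

```python
from itertools import product
from math import prod

# Hypothetical core with two variable positions (SMILES-like template).
core = "c1ccc({R1})cc1{R2}"

# Hypothetical R-group definitions: each position lists its alternatives.
r_groups = {
    "R1": ["C", "CC", "O"],
    "R2": ["F", "Cl", "Br", "I"],
}


def count_members(groups):
    """Number of concrete molecules: product of the alternatives per position."""
    return prod(len(alternatives) for alternatives in groups.values())


def enumerate_members(template, groups):
    """Yield one concrete string per combination of R-group choices."""
    names = list(groups)
    for combo in product(*(groups[name] for name in names)):
        yield template.format(**dict(zip(names, combo)))


if __name__ == "__main__":
    print(count_members(r_groups))
    for member in enumerate_members(core, r_groups):
        print(member)
```

Because the count is a product over all variable positions, a handful of R-groups with a few dozen alternatives each already reaches millions of members, which is why checking a specific molecule against a patent claim is easier by matching than by enumeration.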
Grid services will form the base for future computational Grids. Web Services have been extended to build Grid services. Grid services are defined in the Open Grid Service Architecture (OGSA). The Globus Alliance has released a Web Service Resource Framework, which is still under development and still missing vital parts. One of them is a concept, and its realization, that allows Grid service requests to securely traverse firewalls. This thesis aims at the development and realization of a detailed concept for an Application Level Gateway for Grid services, based on an existing rough concept. This approach should enable a strict division between a local network and the Internet. The Internet is considered an untrusted site and the local network a trusted site. Grid resources are placed in the Internet as well as in the local network. This means that the possibility to communicate through a firewall is essential. Further protocols such as Grid Resource Allocation and Management (GRAM) and the Grid File Transfer Protocol (GridFTP) must be able to traverse the network borders securely as well, without requiring further action from the user. The German Federal Office for Information Security (BSI) proposes a Firewall - Application Level Gateway (ALG) - Firewall solution to the German Aerospace Center (DLR), where this thesis is written, as a general approach. In this approach, the local network is separated from the Internet by two firewalls. Between those firewalls is a demilitarized zone (DMZ), in which computers may be placed that can be accessed from the Internet and from the local network. An ALG placed in this DMZ should represent the local Grid nodes to the Internet and act as a client to the local nodes. All Grid service requests must be directed to the ALG instead of the protected Grid nodes. The ALG then checks and validates the requests on the application level (OSI layer 7). 
Requests that pose no security threat and fulfill certain criteria will then be forwarded to the local Grid nodes. The responses from the local Grid nodes are checked and validated by the ALG as well.
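The check-then-forward behaviour of such a gateway can be sketched as follows. This is a simplified illustration under stated assumptions, not the concept developed in the thesis: the request fields, the allowlist entries, the size limit and the function names are all hypothetical, and a real ALG would parse and validate actual Grid service messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridRequest:
    source_zone: str   # "internet" or "local" (hypothetical field)
    service: str       # hypothetical service name
    operation: str     # hypothetical operation name
    payload_size: int  # size of the message body in bytes


# Hypothetical acceptance criteria.
ALLOWED_OPERATIONS = {("JobSubmission", "submit"), ("FileTransfer", "get")}
MAX_PAYLOAD_BYTES = 1_000_000


def validate(request):
    """Application-level check: only known zones, allowlisted operations, bounded size."""
    if request.source_zone not in {"internet", "local"}:
        return False
    if (request.service, request.operation) not in ALLOWED_OPERATIONS:
        return False
    return 0 <= request.payload_size <= MAX_PAYLOAD_BYTES


def forward_via_alg(request, backend):
    """Forward a validated request to a local node and validate the response too.

    `backend` stands in for the protected Grid node; it is any callable that
    takes the request and returns a response. Returns None when the request
    or the response is rejected.
    """
    if not validate(request):
        return None
    response = backend(request)
    if isinstance(response, str) and len(response) <= MAX_PAYLOAD_BYTES:
        return response
    return None
```

The point of the sketch is the symmetry: both the inbound request and the outbound response pass through a check in the gateway, so the protected node is never addressed directly.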
The initially large number of variants is reduced by applying custom variant annotation and filtering procedures. This requires complex software toolchains to be set up and data sources to be integrated. Furthermore, increasing study sizes require greater effort to manage datasets in a multi-user and multi-institution environment. When the cause of a disease or phenotype is unknown, numerous iterations of respecifying and refining the filter strategies are to be expected. Data analysis support during this phase is fundamental, because users with limited computer literacy cannot handle the large volume of data, or can do so only inadequately. Constant feedback and communication are necessary when filter parameters are adjusted or the study grows with additional samples. Consequently, variant filtering and interpretation become time-consuming, which hinders dynamic and explorative data analysis by experts.
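As a rough illustration of what one iteration of such a filter strategy might look like, the sketch below reduces a list of annotated variants with adjustable thresholds. The record fields, default thresholds and consequence labels are assumptions chosen for the example and do not describe any particular toolchain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    quality: float               # hypothetical call quality score
    population_frequency: float  # hypothetical allele frequency annotation
    consequence: str             # hypothetical functional annotation label


def filter_variants(
    variants,
    min_quality=30.0,
    max_frequency=0.01,
    consequences=frozenset({"missense", "stop_gained"}),
):
    """Keep variants that are well supported, rare, and of a selected consequence type."""
    return [
        v
        for v in variants
        if v.quality >= min_quality
        and v.population_frequency <= max_frequency
        and v.consequence in consequences
    ]


if __name__ == "__main__":
    sample = [
        Variant("1", 1000, "A", "G", 50.0, 0.001, "missense"),
        Variant("1", 2000, "C", "T", 10.0, 0.001, "missense"),
        Variant("2", 3000, "G", "A", 60.0, 0.200, "stop_gained"),
        Variant("3", 4000, "T", "C", 45.0, 0.005, "synonymous"),
    ]
    for variant in filter_variants(sample):
        print(variant)
```

Refining the strategy then amounts to calling the function again with different parameters, for example a stricter `max_frequency`, which is the kind of repeated adjustment the text describes.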