Repositório da Universidade de Lisboa (University of Lisbon Repository)

Please use this identifier to cite or link to this item: http://hdl.handle.net/10451/7159

Title: Automated knowledge extraction from protein sequence
Author: Faria, Daniel Pedro de Jesus, 1981-
Supervisors: Falcão, André Osório e Cruz de Azerêdo, 1969-
Ferreira, António Eduardo do Nascimento, 1964-
Keywords: Bioinformatics
Doctoral theses - 2012
Issue Date: 2012
Abstract: Efficient and reliable prediction of protein function from sequence is one of the long-standing problems in genetics and bioinformatics, because experimental methods for determining protein function cannot keep up with the rate at which new sequences are published. The function of a protein is conditioned by its three-dimensional structure, which is closely tied to its sequence, but we cannot yet model this information reliably enough to make de novo protein function predictions. Protein function prediction is therefore necessarily comparative. The most common approaches rely on sequence alignments and on the assumption that proteins with similar sequences evolved from a common ancestor and should therefore perform similar functions. However, divergent evolution is relatively common and can cause these approaches to make prediction errors. Machine learning approaches that do not involve sequence alignments have also been applied to protein function prediction, but mostly to predict generic functional aspects of proteins. My thesis is that it is possible to extract sufficient information from protein sequences to make reliable, detailed function predictions without using sequence alignments, and therefore to develop machine learning approaches that can compete generally with alignment-based approaches. To prove this thesis, I developed and evaluated multiple machine learning approaches in the context of detailed function prediction. Several of these approaches were able to compete with alignment-based classifiers in precision, and two notably outperformed them on small classification problems. The main contribution of my work was the discovery of the informativeness of tripeptide subsequences.
The tripeptide composition of protein sequences not only led to the most precise classification of all the approaches tested, but was also sufficiently informative to measure similarity between proteins directly and to compete with sequence alignments.
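To make the idea of tripeptide composition concrete, here is a minimal sketch, not the thesis's own implementation. It assumes the composition is the relative frequency of each of the 20^3 = 8000 possible overlapping tripeptides over the 20 standard amino acids, and that windows containing non-standard residues (such as X) are simply skipped. The similarity measure shown, cosine similarity between composition vectors, is an assumption chosen for illustration; the abstract does not say which measure the thesis used. The function names `tripeptide_composition` and `cosine_similarity` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20**3 = 8000 possible tripeptides, in a fixed order so vectors are comparable.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]


def tripeptide_composition(sequence: str) -> list[float]:
    """Return the relative frequency of each of the 8000 tripeptides.

    Overlapping windows of length 3 are counted. Windows containing a
    non-standard residue (e.g. X, B, U) are skipped (an assumption).
    """
    seq = sequence.upper()
    counts = Counter(
        seq[i:i + 3]
        for i in range(len(seq) - 2)
        if all(aa in AMINO_ACIDS for aa in seq[i:i + 3])
    )
    total = sum(counts.values())
    if total == 0:
        # Sequence too short or made only of non-standard residues.
        return [0.0] * len(TRIPEPTIDES)
    return [counts[t] / total for t in TRIPEPTIDES]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two composition vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

For example, `tripeptide_composition("MKVLA")` assigns a frequency of 1/3 to each of MKV, KVL and VLA and zero to the other 7997 tripeptides. Comparing a query's vector against those of annotated proteins with `cosine_similarity` gives an alignment-free ranking of candidate homologues. The 8000-dimensional vector is mostly zeros for typical proteins, so a sparse representation would be preferable at database scale.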
Appears in Collections: FC - Teses de Doutoramento (Faculty of Sciences, doctoral theses)

Files in This Item:
File: ulsd_RE1200_td.pdf | Size: 806.8 kB | Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.