Sequence gazing

Sequence gazing is the practice of inspecting a protein sequence, or an alignment of sequences, for features that may help in interpreting function, even in the absence of detectable sequence homology. Interesting features include signal peptides and transmembrane domains, ligand-binding motifs, invariant residues, low-complexity regions, and cleavage and modification sites. A combination of telltale features, together with contextual and comparative genomics information, may conform to a known bioinformatics grammar and so provide clues to function.
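As an illustration of how two of the features listed above might be flagged programmatically, here is a minimal Python sketch. It is not a substitute for dedicated tools. The function names (n_glyc_sites, hydrophobic_windows) are hypothetical; the N-glycosylation pattern N-{P}-[ST]-{P} follows the PROSITE-style consensus, and the 19-residue window with a mean Kyte-Doolittle hydropathy cutoff of 1.6 is the conventional rule of thumb for candidate membrane-spanning segments, used here as an assumption rather than a validated threshold.

```python
import re

# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# N-{P}-[ST]-{P} consensus; the lookahead lets overlapping sites be reported.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def n_glyc_sites(seq):
    """Return 1-based positions of candidate N-glycosylation asparagines."""
    return [m.start() + 1 for m in N_GLYC.finditer(seq.upper())]


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return (1-based start, mean hydropathy) for each window at or above threshold.

    Residues missing from the scale (e.g. X) are scored as 0.0, an
    arbitrary choice made for this sketch.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean >= threshold:
            hits.append((i + 1, round(mean, 2)))
    return hits
```

A hit from either function is only a hint: short patterns of this kind match many sequences by chance, so they are best read alongside the contextual and comparative information mentioned above.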

The term sequence gazing is slightly pejorative, as it describes a typically (though not necessarily) manual annotation process at a time when it is clear that annotation must become automated. An obvious danger, besides the time cost, is over-interpretation or misinterpretation of small samples of sequence. Sequence gazing does occasionally lead to impressive new insights, as in the discovery of the leucine zipper, based on a periodicity that implicated alpha-helical structures in transcription factor dimerization. Once new types of feature have been discovered and interpreted, responsibility for finding those features moves to motif libraries such as PROSITE, protein family libraries such as Pfam, or dedicated search tools such as PSORT. This progress leaves fewer proteins seemingly featureless and ripe for sequence gazing.
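The leucine zipper periodicity mentioned above, a leucine at every seventh position, is simple enough to express as a pattern, which shows how a feature found by eye can be handed over to an automated scan. The following Python sketch assumes the PROSITE-style consensus L-x(6)-L-x(6)-L-x(6)-L; the function name leucine_heptad_starts is hypothetical.

```python
import re

# Four leucines spaced exactly seven residues apart: L-x(6)-L-x(6)-L-x(6)-L.
# The lookahead allows overlapping matches to be reported.
HEPTAD = re.compile(r"(?=L.{6}L.{6}L.{6}L)")


def leucine_heptad_starts(seq):
    """Return 1-based start positions of candidate leucine heptad repeats."""
    return [m.start() + 1 for m in HEPTAD.finditer(seq.upper())]
```

For example, leucine_heptad_starts("LAAAAAA" * 3 + "L") reports a single candidate starting at position 1. A pattern this short matches many unrelated proteins by chance, which is exactly the over-interpretation risk described above, so a match should be treated as a prompt for further evidence (for instance, predicted helical structure and an adjacent basic region) rather than as an annotation.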

The type of data mining termed sequence gazing can increasingly be done automatically, and should promote automated reasoning for hypothesizing some protein functions even in the absence of homology (see PMID:19396742).