Orphan marker

To understand a species from its genome sequence, it is important to move from viewing the genome as a bag of genes to a systems-level understanding, as in the Genome Properties and SEED Subsystems approaches. A gene encodes a protein, and the protein performs a role in some system, so the gene is a molecular marker for that system. However, a protein with one specific function may sometimes serve any of several different roles. When the genome sequence shows that a gene lacks its expected context, that gene is an orphan marker.

SelD, known originally as a marker for selenocysteine biosynthesis (together with the selA, selB, and selC genes), turned out to be an orphan marker in a large number of species that lack selenocysteine but produce a selenouridine-modified tRNA. Further work identified species (Haloarcula marismortui, Enterococcus faecalis, Clostridium phytofermentans) in which SelD was orphaned again, unaccompanied by any other markers from the selenocysteine or selenouridine systems. In this case, the orphan marker revealed the third major selenium incorporation system, featuring a maturase that adds an essential labile selenium post-translationally to selenium-dependent molybdenum hydroxylases (SDMH). Study of the maturase then showed SDMH to be quite widespread.
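The SelD case can be restated as a simple presence/absence check over a genome's annotated genes. The following Python sketch is illustrative only and is not taken from Genome Properties or SEED. The function names, the KNOWN_CONTEXTS table, and the toy gene sets are hypothetical; selA, selB, and selC come from the text above, while "selU" is an assumed stand-in for a selenouridine-system marker, which the text does not name.

```python
from typing import Dict, List, Optional, Set

# Companion genes for each known system that uses the marker.
# selA/selB/selC are named in the text; "selU" is an ASSUMED placeholder
# for a selenouridine-system marker (the text does not name one).
KNOWN_CONTEXTS: Dict[str, Set[str]] = {
    "selenocysteine": {"selA", "selB", "selC"},
    "selenouridine": {"selU"},
}


def marker_contexts(
    genome_genes: Set[str],
    marker: str = "selD",
    contexts: Dict[str, Set[str]] = KNOWN_CONTEXTS,
) -> Optional[List[str]]:
    """Return the known systems with at least one companion gene present.

    Returns None if the marker itself is absent from the genome.
    """
    if marker not in genome_genes:
        return None
    return [name for name, companions in contexts.items()
            if companions & genome_genes]


def is_orphan(
    genome_genes: Set[str],
    marker: str = "selD",
    contexts: Dict[str, Set[str]] = KNOWN_CONTEXTS,
) -> bool:
    """True if the marker is present but no known companion gene is."""
    found = marker_contexts(genome_genes, marker, contexts)
    return found is not None and len(found) == 0


# Toy gene sets (invented for illustration, not real annotations).
toy_genomes = {
    "genome_A": {"selD", "selA", "selB", "selC"},
    "genome_B": {"selD", "selU"},
    "genome_C": {"selD", "geneX"},
    "genome_D": {"geneX"},
}

orphans = [name for name, genes in toy_genomes.items() if is_orphan(genes)]
print(orphans)
```

Here a genome counts as an orphan case only when no companion gene from any known system is present, matching the "unaccompanied by any other markers" wording above. A stricter variant could require every companion of a system to be present before accepting that context.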

When a previously well-understood marker appears as an orphan marker in a sufficient number of complete genomes, the pattern is a notable observation in its own right, one that invites a new bioinformatics investigation.