It's a thing

"It's a thing" is a catchphrase for an interesting, unexplained characteristic or arrangement found in some bioinformatics analysis. The phrase makes the claim that a suspiciously interesting feature found in one genome matches recognizably similar features in enough other genomes to be meaningful. It suggests an opportunity to learn more through further bioinformatics investigation, building on preliminary work even when it is not yet clear what that work is preliminary to.

The SelD protein activates selenium for use in biological systems that turn it into selenocysteine or selenouridine. Finding a selD gene as an orphan marker in the complete genome of Enterococcus faecalis, with no other markers of the selenocysteine or selenouridine systems, is intriguing. A single orphaned marker, however, might merely point to recent gene loss and a system in decay. But when selD occurs as an orphan marker again in Haloarcula marismortui, and again in Clostridium phytofermentans, it's a thing. Orphan SelD turned out to be the marker that led to the discovery of the maturation system for selenium-dependent molybdenum hydroxylases (see PMID: 18289380).
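The orphan-marker reasoning can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the genome names and marker sets below are made-up toy data, the companion marker list (selA, selB, ybbB) is an assumed stand-in for "other markers of the selenocysteine or selenouridine systems", and the threshold of three genomes is an arbitrary choice rather than an established cutoff.

```python
from typing import Dict, List, Set


def orphan_genomes(marker: str,
                   companions: Set[str],
                   genome_markers: Dict[str, Set[str]]) -> List[str]:
    """Return genomes that carry `marker` but none of the `companions`."""
    return sorted(
        genome
        for genome, found in genome_markers.items()
        if marker in found and not (found & companions)
    )


def is_a_thing(marker: str,
               companions: Set[str],
               genome_markers: Dict[str, Set[str]],
               min_genomes: int = 3) -> bool:
    """True when the orphan pattern recurs in at least `min_genomes` genomes.

    `min_genomes` is a hypothetical threshold chosen for illustration.
    """
    return len(orphan_genomes(marker, companions, genome_markers)) >= min_genomes


# Toy data (hypothetical): which markers were detected in each genome.
genome_markers = {
    "genome_A": {"selD"},
    "genome_B": {"selD", "selA", "selB"},
    "genome_C": {"selD"},
    "genome_D": {"selD"},
    "genome_E": set(),
}
companions = {"selA", "selB", "ybbB"}  # assumed companion markers

orphans = orphan_genomes("selD", companions, genome_markers)
print(orphans)
print(is_a_thing("selD", companions, genome_markers))
```

In practice the marker sets would come from something like protein family model searches against each genome; one orphan is an anecdote, while several independent orphans are what the phrase is about.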

Every gene, including every gene for a transcription factor, must have neighboring genes. But if the genes for some transcription factor family and some other protein family regularly occur as adjacent genes pointing away from each other, that's a divergon, and it's a thing. There is a good chance the transcription factor binds the DNA between the two diverging operons and regulates them both. The actual transcription factor binding site becomes a candidate for discovery through its phylogenetic footprint.
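A divergon check along these lines might look like the following sketch. It assumes each genome is a single linear contig given as a list of gene records, that "pointing away from each other" means the left-hand gene is on the minus strand and the right-hand gene on the plus strand, and that family labels such as "TF_fam" and "partner_fam" are hypothetical placeholders. The coordinates are toy values, not real annotations.

```python
from typing import Dict, List, NamedTuple, Tuple


class Gene(NamedTuple):
    family: str   # protein family label (hypothetical names below)
    start: int    # leftmost coordinate on the contig
    end: int      # rightmost coordinate on the contig
    strand: str   # "+" or "-"


def divergent_pairs(genes: List[Gene]) -> List[Tuple[Gene, Gene]]:
    """Adjacent gene pairs transcribed away from each other (<- ->)."""
    ordered = sorted(genes, key=lambda g: g.start)
    return [
        (left, right)
        for left, right in zip(ordered, ordered[1:])
        if left.strand == "-" and right.strand == "+"
    ]


def divergon_regions(genomes: Dict[str, List[Gene]],
                     tf_family: str,
                     partner_family: str) -> Dict[str, Tuple[int, int]]:
    """Map each genome with the divergon to its intergenic interval.

    The interval between the two diverging genes is where a shared
    regulatory site would be sought by phylogenetic footprinting.
    """
    regions = {}
    for name, genes in genomes.items():
        for left, right in divergent_pairs(genes):
            if {left.family, right.family} == {tf_family, partner_family}:
                regions[name] = (left.end, right.start)
                break  # count each genome once
    return regions


# Toy data (hypothetical coordinates and families).
genomes = {
    "genome_A": [Gene("TF_fam", 100, 700, "-"),
                 Gene("partner_fam", 850, 2000, "+"),
                 Gene("other", 2100, 3000, "+")],
    "genome_B": [Gene("partner_fam", 500, 1600, "-"),
                 Gene("TF_fam", 1750, 2400, "+")],
    "genome_C": [Gene("TF_fam", 100, 700, "+"),
                 Gene("partner_fam", 850, 2000, "+")],
}

regions = divergon_regions(genomes, "TF_fam", "partner_fam")
print(regions)
```

The sketch ignores circular chromosomes and multi-contig assemblies; a real analysis would also need to handle the first and last genes wrapping around the origin.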

If some predicted small gene translates into a short polypeptide, it may or may not be real. Even if similar translations can be predicted in other genomes, those too may be fortuitous translations of a nucleic acid sequence feature that is not really protein-coding. But if members of one family of short hypothetical polypeptides always occur paired with enzymes that modify short peptides, it's a thing. It may be a bacteriocin production system, as with subtilosin A. It may turn out to follow the bioinformatics grammar of a cofactor production system, as with PQQ or mycofactocin. Or it may be a pheromone production system, or something else. But if the pairing of the short polypeptide gene with the maturase enzyme gene is reliably conserved, it's a piece of a larger bioinformatics story, and it's a thing.
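One way to put a number on "reliably conserved" is to ask what fraction of the short polypeptide genes have a maturase gene close by. The sketch below assumes each genome has been reduced to an ordered list of family labels along a contig; the labels "short_pep" and "maturase", the toy gene orders, and the window of two genes are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def pairing_fraction(genomes: Dict[str, List[str]],
                     peptide_family: str,
                     maturase_family: str,
                     window: int = 2) -> float:
    """Fraction of peptide genes with a maturase gene within `window` genes.

    `window` counts neighboring gene positions on either side; it is an
    assumed parameter, not a standard value.
    """
    total = 0
    paired = 0
    for gene_order in genomes.values():
        for i, family in enumerate(gene_order):
            if family != peptide_family:
                continue
            total += 1
            lo = max(0, i - window)
            neighbors = gene_order[lo:i] + gene_order[i + 1:i + 1 + window]
            if maturase_family in neighbors:
                paired += 1
    return paired / total if total else 0.0


# Toy data (hypothetical): gene order by family label in three genomes.
genomes = {
    "genome_A": ["x", "short_pep", "maturase", "y"],
    "genome_B": ["maturase", "z", "short_pep"],
    "genome_C": ["short_pep", "x", "y", "z", "maturase"],
}

fraction = pairing_fraction(genomes, "short_pep", "maturase")
print(f"{fraction:.2f}")
```

A fraction near 1.0 across many distantly related genomes is the kind of evidence the phrase points to; a low fraction suggests the adjacency may be coincidental or the peptide predictions spurious.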