Edge piece for system reconstruction

A Genome Property, or a SEED subsystem, is a rule for using a set of protein family definitions to find a set of proteins that work together, such as the enzymes of one pathway or the subunits of one transporter. An edge piece is a protein family whose boundaries are so clear that identifying its members guides the construction of other protein family definitions for the same subsystem. The term "edge piece" draws an analogy between solving a jigsaw puzzle, where the easily recognized edge pieces are often placed first and frame the rest of the picture, and reasoning via comparative genomics to find cohorts of proteins that work cooperatively.
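To make the idea of such a rule concrete, here is a minimal Python sketch loosely modeled on how a Genome Property scores a genome. It assumes that each step of a property is satisfied by a hit to any one of a set of family accessions, and that a genome scores YES when every required step is found, PARTIAL when only some are, and NO otherwise. This simplified scoring scheme and the names `Step`, `evaluate_property`, and `SCIFF_PROPERTY` are illustrative assumptions, not the actual Genome Properties software. The two SCIFF families described in the next paragraph serve as example steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step of a hypothetical property: satisfied by a hit to any listed family."""
    name: str
    families: frozenset
    required: bool = True


def evaluate_property(steps, genome_hits):
    """Score one genome, given the set of family accessions it has hits to.

    Simplified, assumed scoring: 'YES' if all required steps are found,
    'PARTIAL' if only some are, 'NO' if none are.
    """
    required = [s for s in steps if s.required]
    found = [s for s in required if s.families & genome_hits]
    if len(found) == len(required):
        return "YES"
    return "PARTIAL" if found else "NO"


# Illustrative two-step property built from the SCIFF families.
SCIFF_PROPERTY = [
    Step("SCIFF precursor peptide", frozenset({"TIGR03973"})),
    Step("SCIFF maturase", frozenset({"TIGR03974"})),
]

print(evaluate_property(SCIFF_PROPERTY, {"TIGR03973", "TIGR03974"}))  # YES
print(evaluate_property(SCIFF_PROPERTY, {"TIGR03974"}))               # PARTIAL
```

Keeping each step as a set of acceptable families lets one step be filled by any of several non-homologous families, which is why a rule of this kind is more flexible than a single family definition.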

When the Genome Property for the SCIFF system was built, the SCIFF (Six Cysteines in Forty-Five) precursor peptide itself served as the edge piece. TIGRFAMs protein family TIGR03973 describes this peptide unambiguously: every discernible homologue appears to be a true member of the family. This clarity made it easy to build a second protein family definition, TIGR03974, for a branch of the radical SAM enzyme superfamily. Its members always occur where SCIFF occurs, never occur anywhere else, and appear to be SCIFF maturase, an enzyme that carries out post-translational modification of the SCIFF precursor.
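The co-occurrence pattern described above (maturase present wherever SCIFF is present, and nowhere else) amounts to comparing the presence/absence profiles of two families across genomes. A minimal sketch follows, assuming each genome has already been reduced to the set of family accessions it has hits to. The genome names and profiles are invented for illustration, and `cooccurrence_table` and `orphans` are hypothetical helper names. A strict edge-piece pairing would leave both off-diagonal cells of the table empty.

```python
from collections import Counter


def cooccurrence_table(profiles, fam_a, fam_b):
    """Count genomes in each cell of the 2x2 presence/absence table.

    Keys are (has_fam_a, has_fam_b) tuples; missing cells read as 0.
    """
    return Counter((fam_a in hits, fam_b in hits) for hits in profiles.values())


def orphans(profiles, present, absent):
    """Sorted IDs of genomes that have family `present` but lack family `absent`."""
    return sorted(g for g, hits in profiles.items() if present in hits and absent not in hits)


# Invented example profiles: genome ID -> family accessions with hits.
profiles = {
    "genome1": {"TIGR03973", "TIGR03974"},
    "genome2": {"TIGR03973", "TIGR03974"},
    "genome3": set(),
    "genome4": {"TIGR03974"},  # maturase without a detected precursor
}

table = cooccurrence_table(profiles, "TIGR03973", "TIGR03974")
print(table[(True, True)], table[(True, False)], table[(False, True)])  # 2 0 1
print(orphans(profiles, present="TIGR03974", absent="TIGR03973"))      # ['genome4']
```

In this toy data, `genome4` breaks the pairing, so under the edge-piece view it becomes a candidate for closer inspection rather than evidence against the association.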

Postulating that a protein family is a true edge piece for a subsystem reconstruction puzzle leads to several specific, testable predictions. If the SCIFF precursor family is truly an edge piece for a peptide modification system, then apparent SCIFF maturases with no nearby precursor peptide gene should be explained by precursor genes missed during gene model prediction; SCIFF maturases should be monophyletic within the radical SAM superfamily; and several additional families of radical SAM enzymes closely related to SCIFF maturase should show similar pairings with different families of precursor peptides, such as the mycofactocin precursor. All of these predictions were borne out by comparative genomics across nearly 2000 genomes, completing a bioinformatics journey toward the discovery of a new peptide modification system.
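The first prediction suggests a simple screen: list every maturase gene that has no precursor gene nearby on the same contig, then re-examine those loci for precursor genes that gene model prediction may have missed. The sketch below assumes genes are given with a contig name and an ordinal position along the contig, and treats "nearby" as within a fixed window of genes. The `Gene` record, the function name, the example loci, and the default window of 5 genes are all illustrative assumptions rather than the procedure actually used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gene:
    locus_tag: str
    contig: str
    order: int             # ordinal position of the gene along its contig
    family: Optional[str]  # assigned family accession, or None if unassigned


def maturases_without_nearby_precursor(genes, maturase_fam, precursor_fam, window=5):
    """Locus tags of maturase genes with no precursor gene within `window` genes on the same contig."""
    # Index precursor gene positions by contig.
    precursor_positions = {}
    for g in genes:
        if g.family == precursor_fam:
            precursor_positions.setdefault(g.contig, []).append(g.order)

    flagged = []
    for g in genes:
        if g.family != maturase_fam:
            continue
        nearby = any(abs(p - g.order) <= window for p in precursor_positions.get(g.contig, []))
        if not nearby:
            flagged.append(g.locus_tag)
    return flagged


# Invented example loci.
genes = [
    Gene("A_001", "contigA", 1, "TIGR03973"),
    Gene("A_002", "contigA", 2, "TIGR03974"),   # precursor adjacent: not flagged
    Gene("B_011", "contigB", 11, "TIGR03974"),  # no precursor on contig: flagged
    Gene("C_050", "contigC", 50, "TIGR03973"),
    Gene("C_080", "contigC", 80, "TIGR03974"),  # precursor 30 genes away: flagged
]

print(maturases_without_nearby_precursor(genes, "TIGR03974", "TIGR03973"))
# ['B_011', 'C_080']
```

Under the edge-piece hypothesis, each flagged locus is expected to resolve into a missed precursor gene on closer inspection, rather than a genuinely unpaired maturase.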