Trail grouping

Once CoMetGeNe trails have been identified for several species, the conservation of metabolic and genomic organizational motifs can be investigated across species. For simplicity, trails of metabolic reactions catalyzed by products of neighboring genes in a given species are called metabolic and genomic patterns. The role of trail grouping is to identify such patterns that are conserved across several species.

Given a reference species S among those for which trail finding has been performed (using CoMetGeNe.py or CoMetGeNe_launcher.py), the script grouping.py can exploit the trails of the reference species in either of the following two ways:

  • CoMetGeNe trails of S are analyzed in terms of genomic conservation across the other species in the data set, referred to as grouping trails by genes. This consists of determining whether the genes of S involved in these trails have neighboring homologues in other species. See Genomic conservation patterns (grouping by genes) below.
  • CoMetGeNe trails of S are analyzed in terms of metabolic conservation across the other species in the data set, referred to as grouping trails by reactions. This consists of determining whether reactions in the CoMetGeNe trails of S are also performed by products of neighboring genes in other species. See Metabolic conservation patterns (grouping by reactions) below.

Trail grouping is provided with a user manual. You can also check out a few examples.

Note regarding directory structure

In order to perform trail grouping, the metabolic pathway maps of all species in the data set need to be stored in KGML format in a single directory with subdirectories for every species. The subdirectory names need to be the three- or four-letter KEGG codes for the species in question. For example, a correct directory structure can look like this:
data/aae/path_aae00010.kgml, path_aae00020.kgml, ...
data/bbn/path_bbn00010.kgml, path_bbn00020.kgml, ...
data/eco/path_eco00010.kgml, path_eco00020.kgml, ...
data/mpn/path_mpn00010.kgml, path_mpn00020.kgml, ...

It is important to preserve this type of directory structure if CoMetGeNe.py is launched directly. If CoMetGeNe_launcher.py was used instead, the launcher ensures this directory structure.
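Before running grouping.py, the layout described above can be checked with a short script. The following is a minimal sketch and not part of CoMetGeNe: the function name check_kgml_layout is hypothetical, and it assumes lowercase species codes and KGML file names that start with path_ followed by the species code, as in the example listing.

```python
import re
from pathlib import Path

# Three- or four-letter species codes, as required by the note above.
# Assumption: codes are lowercase.
KEGG_CODE = re.compile(r"^[a-z]{3,4}$")


def check_kgml_layout(data_dir):
    """Return a list of problems found under data_dir (empty if none)."""
    problems = []
    for entry in sorted(Path(data_dir).iterdir()):
        if not entry.is_dir():
            problems.append(f"{entry.name}: not a species subdirectory")
            continue
        if not KEGG_CODE.match(entry.name):
            problems.append(f"{entry.name}: not a three- or four-letter code")
            continue
        kgml_files = sorted(entry.glob("*.kgml"))
        if not kgml_files:
            problems.append(f"{entry.name}: no .kgml files")
        # Assumption: file names embed the species code (path_eco00010.kgml).
        for kgml in kgml_files:
            if not kgml.name.startswith(f"path_{entry.name}"):
                problems.append(f"{entry.name}/{kgml.name}: species code mismatch")
    return problems
```

Calling check_kgml_layout("data/") returns an empty list when every subdirectory passes these checks; otherwise each string names the offending subdirectory or file.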

Genomic conservation patterns (grouping by genes)

Trail grouping by genes identifies conservation patterns between a reference species and the other species in the data set in terms of genomic organization.

Syntax

For example, suppose grouping.py is executed as follows:
python2 grouping.py genes results/ data/ eco -o tsg_eco.csv

This results in detecting genomic conservation patterns (trail grouping by genes) for species eco, using CoMetGeNe results stored in results/ and metabolic pathway maps stored in KGML format in data/ (see the Note regarding directory structure above). The output of trail grouping by genes is stored in CSV format in the file tsg_eco.csv.

Output format

The CSV file contains a line for every gene of the reference species S involved in CoMetGeNe trails of S that are common to S and at least one other species from the data set. Groups of neighboring genes in S involved in CoMetGeNe trails of S are separated by the line ***. In this CSV file, the line for a gene g of S contains:

  • The name of gene g.
  • The name of the chromosome on which g is located.
  • The strand on the chromosome on which g is located (+ for the positive strand, - for the negative strand).
  • A column for every other species in the data set that can take either of the following values:
    • A cross (x) if g has a homologue in the other species that is a neighbor of at least one other gene involved in the trail;
    • A dot (.) if g has no such homologue.
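The output described above can be loaded for further analysis with a few lines of Python. This is a minimal sketch under stated assumptions: the function name read_gene_groups is hypothetical, fields are assumed to be comma-separated and to appear in the order listed above, and any header line is assumed to have been removed beforehand. Check these assumptions against an actual output file before relying on it.

```python
import csv
import io


def read_gene_groups(csv_text, delimiter=","):
    """Split gene-grouping output into groups of rows separated by '***' lines."""
    groups, current = [], []
    for row in csv.reader(io.StringIO(csv_text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        if row[0].strip() == "***":
            # A separator line closes the current group of neighboring genes.
            if current:
                groups.append(current)
                current = []
            continue
        # Assumed column order: gene, chromosome, strand, then one mark
        # (x or .) per other species in the data set.
        gene, chromosome, strand, *marks = [cell.strip() for cell in row]
        current.append({"gene": gene, "chromosome": chromosome,
                        "strand": strand, "marks": marks})
    if current:
        groups.append(current)
    return groups
```

Each returned group is a list of dictionaries, one per gene, so the conservation marks of a whole group of neighboring genes can be inspected together.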

Metabolic conservation patterns (grouping by reactions)

Trail grouping by reactions identifies conservation patterns between a reference species and the other species in the data set in terms of metabolic organization.

Syntax

For example, suppose grouping.py is executed as follows:
python2 grouping.py reactions results/ data/ eco -o tsr_eco.csv

This results in detecting metabolic conservation patterns (trail grouping by reactions) for species eco, using CoMetGeNe results stored in results/ and metabolic pathway maps stored in KGML format in data/ (see the Note regarding directory structure above). The output of trail grouping by reactions is stored in CSV format in the file tsr_eco.csv.
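The two invocations shown on this page differ only in the grouping mode and the output file, so both can be driven from a small wrapper. The sketch below only reproduces the argument order of the example commands; the function names grouping_command and run_grouping are hypothetical, and it assumes that python2 and grouping.py are reachable from the current working directory.

```python
import subprocess


def grouping_command(mode, results_dir, data_dir, species, output):
    """Build the argument list for one grouping.py run, as in the examples."""
    if mode not in ("genes", "reactions"):
        raise ValueError("mode must be 'genes' or 'reactions'")
    return ["python2", "grouping.py", mode, results_dir, data_dir,
            species, "-o", output]


def run_grouping(mode, results_dir, data_dir, species, output):
    """Run grouping.py and raise CalledProcessError if it exits with an error."""
    subprocess.run(
        grouping_command(mode, results_dir, data_dir, species, output),
        check=True,
    )
```

For instance, run_grouping("reactions", "results/", "data/", "eco", "tsr_eco.csv") issues the same command as the example above.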

Output format

The CSV file contains a line for every reaction of the reference species S involved in CoMetGeNe trails of S. Groups of reactions involved in CoMetGeNe trails of S are separated by the line ***. Note that a given reaction may appear several times in the CSV file, if it occurs in several CoMetGeNe trails of S. In this CSV file, the line for a reaction r in a CoMetGeNe trail of S contains:

  • The KEGG R number for reaction r.
  • The gene name(s) of the gene(s) of S involved in reaction r.
  • The KEGG pathway map ID(s) for the pathway(s) in which the R number associated with r occurs.
  • A column for every other species S' in the data set that can take one of the three following values:
    • A cross (x) if r is performed in species S' by the product of at least one gene neighboring at least one other gene involved in the CoMetGeNe trail to which reaction r belongs;
    • A dot (.) if r is performed in species S' only by products of genes that do not neighbor any other gene involved in the CoMetGeNe trail to which reaction r belongs;
    • A circle (o) if r is absent from species S'.
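Since each species column takes one of three values, a per-species tally gives a quick overview of how well the reference trails are conserved. The following is a minimal sketch: the function name tally_reaction_marks and the labels neighboring, not_neighboring and absent are hypothetical, and the caller is assumed to have already extracted the species columns from each reaction line, in a known species order.

```python
from collections import Counter

# Meaning of the three marks, as described in the list above.
MARK_MEANING = {"x": "neighboring", ".": "not_neighboring", "o": "absent"}


def tally_reaction_marks(rows, species):
    """Count x / . / o marks per species.

    rows: one list of marks per reaction line, in the same order as species.
    species: species codes for the columns (assumed to be known by the caller).
    """
    tallies = {code: Counter() for code in species}
    for marks in rows:
        if len(marks) != len(species):
            raise ValueError("expected one mark per species")
        for code, mark in zip(species, marks):
            tallies[code][MARK_MEANING[mark]] += 1
    return tallies
```

Because a reaction may appear in several trails, the tallies count occurrences in the output file rather than distinct reactions.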

Created by Alexandra Zaharia. Maintained by Alain Denise.