User manual

Both CoMetGeNe.py (trail finding) and grouping.py (trail grouping) provide detailed usage explanations.

CoMetGeNe.py usage

Run CoMetGeNe.py -h to obtain the user manual:
    usage: CoMetGeNe.py [-h] [--delta_G NUMBER] [--delta_D NUMBER]
                        [--timeout SECONDS] [--output OUTPUT] [--skip-import]
                        ORG DIR

    Determines maximum trails of reactions for the specified organisms such
    that the genes encoding the enzymes involved in the trails are neighbors.
    A trail of reactions is a sequence of reactions that can repeat reactions
    (vertices), but not arcs between reactions. Metabolic pathways and genomic
    information are automatically retrieved from the KEGG knowledge base.

    Required arguments:
      ORG                   query organism (three- or four-letter KEGG code,
                            e.g. 'eco' for Escherichia coli K-12 MG1655). See
                            full list of KEGG organism codes at
                            http://rest.kegg.jp/list/genome
      DIR                   directory storing metabolic pathways for the query
                            organism ORG or where metabolic pathways for ORG
                            will be downloaded

    Optional arguments:
      -h, --help            show this help message and exit
      --delta_G NUMBER, -dG NUMBER
                            the NUMBER of genes that can be skipped (default: 0)
      --delta_D NUMBER, -dD NUMBER
                            the NUMBER of reactions that can be skipped
                            (default: 0)
      --timeout SECONDS, -t SECONDS
                            timeout in SECONDS (default: 300)
      --output OUTPUT, -o OUTPUT
                            output file
      --skip-import, -s     skips importing metabolic pathways from KEGG,
                            attempting to use locally stored KGML files if they
                            are present under the specified directory (DIR)
      --both-strands, -b    considers neighboring genes on both strands of a
                            given chromosome (by default, only genes located on
                            a single strand are considered neighbors)

    Example: running
        python2 CoMetGeNe.py eco data/ -dG 2 -o eco.out
    downloads metabolic pathways for species 'eco' to directory 'data/'. Trail
    finding is performed, allowing two genes to be skipped at most (-dG 2).
    Reactions cannot be skipped (-dD is 0 by default). Maximum trails of
    reactions such that the reactions are catalyzed by products of neighboring
    genes are saved in the output file 'eco.out'.
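When CoMetGeNe.py is called from another script, it can help to assemble the command line in one place. The following is a minimal sketch that uses only the flags listed in the help text above. The helper names build_command and run are hypothetical and are not part of CoMetGeNe, and the sketch assumes that python2 and CoMetGeNe.py are reachable from the current directory.

```python
import subprocess


def build_command(org, kgml_dir, delta_g=0, delta_d=0, timeout=300,
                  output=None, skip_import=False, both_strands=False):
    """Return the argument list for one CoMetGeNe.py run.

    Hypothetical helper: only the flags themselves come from the help text.
    """
    cmd = ['python2', 'CoMetGeNe.py', org, kgml_dir,
           '-dG', str(delta_g), '-dD', str(delta_d), '-t', str(timeout)]
    if output is not None:
        cmd += ['-o', output]
    if skip_import:
        cmd.append('-s')  # reuse KGML files already stored under kgml_dir
    if both_strands:
        cmd.append('-b')  # treat genes on both strands as neighbors
    return cmd


def run(cmd):
    """Launch the command and return its exit status."""
    return subprocess.call(cmd)


if __name__ == '__main__':
    # Same settings as the example in the help text: eco, two skippable genes.
    cmd = build_command('eco', 'data/', delta_g=2, output='eco.out')
    print(' '.join(cmd))
    # run(cmd)  # uncomment to actually start trail finding
```

Keeping the command as a list rather than a single string avoids shell quoting issues if directory names contain spaces.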

grouping.py usage

Run grouping.py -h to obtain the user manual:
    usage: grouping.py [-h] [--output OUTPUT] {genes,reactions} RESULTS KGML ORG

    Groups CoMetGeNe trails by either genes or reactions, optionally producing
    a CSV file.

    Required arguments:
      {genes,reactions}     type of trail grouping to perform (possible values:
                            'genes' or 'reactions')
      RESULTS               directory storing CoMetGeNe results
      KGML                  directory containing input KGML files
      ORG                   reference species (KEGG organism code)

    Optional arguments:
      -h, --help            show this help message and exit
      --output OUTPUT, -o OUTPUT
                            output file (CSV)

    KGML needs to contain a subdirectory for every species for which a result
    file is present in RESULTS. The subdirectory names need to be the three- or
    four-letter KEGG codes for the species in question (e.g., 'bsu', 'eco',
    'pae', etc.). Each species subdirectory is expected to contain metabolic
    pathways in KGML format.

    Example: running
        python2 grouping.py genes results/ data/ eco -o grouping_gene_eco.csv
    will perform trail grouping by genes for the reference species 'eco'. The
    CoMetGeNe results are stored in 'results/', and the KGML files are
    available in 'data/'. A CSV file is produced ('grouping_gene_eco.csv').
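Because grouping.py expects one KGML subdirectory per species, a quick layout check before grouping can catch a missing or empty directory early. The sketch below is an illustration only: check_kgml_layout is a hypothetical helper, and the list of species codes is supplied by hand because the naming of the result files in RESULTS is not described here.

```python
import os


def check_kgml_layout(kgml_dir, species_codes):
    """Report species whose KGML subdirectory is missing or empty.

    Returns a pair (missing, empty) of lists of KEGG organism codes.
    Hypothetical helper, not part of CoMetGeNe.
    """
    missing, empty = [], []
    for code in species_codes:
        path = os.path.join(kgml_dir, code)
        if not os.path.isdir(path):
            missing.append(code)      # no subdirectory named after the code
        elif not os.listdir(path):
            empty.append(code)        # subdirectory exists but holds no files
    return missing, empty


if __name__ == '__main__':
    missing, empty = check_kgml_layout('data/', ['bsu', 'eco', 'pae'])
    if missing:
        print('No KGML subdirectory for: ' + ', '.join(missing))
    if empty:
        print('Empty KGML subdirectory for: ' + ', '.join(empty))
```

The check only looks at directory structure. It does not validate that the files inside are well-formed KGML.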

CoMetGeNe_launcher.py

Although CoMetGeNe_launcher.py does not accept command-line arguments, it can easily be configured to perform trail finding in parallel by altering a few variables. Examples:

  • If you wish to have CoMetGeNe consider only genes on the same strand of a chromosome as neighbors, then leave the following variable unchanged:
    both_strands = False
    If, however, you wish to take into account genes on both strands when defining gene neighborhoods, then you would need to set both_strands to True.
  • If you wish to run trail finding for species aae, bbn, eco, and mpn instead of the default 50 bacterial species, then simply modify the variable org_codes as follows:
    org_codes = ['aae', 'bbn', 'eco', 'mpn']
  • If you wish to save metabolic pathways in a directory KGML/ rather than the default data/, then simply modify the variable kgml_dir as follows:
    kgml_dir = 'KGML'
  • If you wish to save CoMetGeNe results in a directory tf_20180701/ rather than the default results/, then simply modify the variable results_dir as follows:
    results_dir = 'tf_20180701'
  • If you wish to allow CoMetGeNe to skip at most 4 genes instead of the default 3, and at most 2 reactions instead of the default 3, then simply modify the variables delta_genes_max and delta_reactions_max as follows:
    delta_genes_max = 4
    delta_reactions_max = 2

Blacklisted pathways

The underlying problem formulation for trail finding is NP-hard (see our accompanying publication for more details). This is why CoMetGeNe.py sets a timeout (of 5 minutes, by default) for analyzing a given metabolic pathway. If this timeout is reached without finishing the analysis, then the pathway in question is blacklisted, i.e. it is added to a list of exclusions for the species and combination of gap parameters for which the analysis could not be finished. The blacklist is a text file called excluded_pathways.txt, placed in the CoMetGeNe/ project root.

If CoMetGeNe.py is rerun with the same species and gap parameters, it will no longer attempt to analyze the blacklisted pathway. The pathway is also skipped in any later execution whose gap parameters are greater than or equal to those recorded in the blacklist entry. For example, if the blacklist contains the entry

cpe 2 2 00500
then the pathway 00500 will not be analyzed for species cpe when CoMetGeNe.py is run with gap parameters (δD, δG) set to (2, 2), (2, 3), (3, 2), (3, 3), and so on.
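The exclusion rule described above can be sketched in a few lines. This is not CoMetGeNe's own code. It assumes that each blacklist entry is whitespace-separated and ordered as species, δD, δG, pathway. The order of the two gap values cannot be read off the example entry, since both are 2. The function names are hypothetical.

```python
def parse_blacklist_line(line):
    """Parse an entry such as 'cpe 2 2 00500'.

    Assumed field order: species, delta_D, delta_G, pathway.
    """
    org, delta_d, delta_g, pathway = line.split()
    return org, int(delta_d), int(delta_g), pathway


def is_blacklisted(entries, org, delta_d, delta_g, pathway):
    """Return True if a run with these settings should skip the pathway.

    A pathway is skipped when an entry exists for the same species and
    pathway whose gap parameters are both less than or equal to the
    requested ones.
    """
    for b_org, b_dd, b_dg, b_pathway in entries:
        if (b_org == org and b_pathway == pathway
                and delta_d >= b_dd and delta_g >= b_dg):
            return True
    return False


if __name__ == '__main__':
    entries = [parse_blacklist_line('cpe 2 2 00500')]
    for gaps in [(2, 2), (2, 3), (3, 2), (1, 2)]:
        print(gaps, is_blacklisted(entries, 'cpe', gaps[0], gaps[1], '00500'))
```

Under this reading, a run with smaller gap parameters, such as (1, 2), is still attempted, since the entry only records a timeout for (2, 2).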

Created by Alexandra Zaharia. Maintained by Alain Denise.