ReconManager 
===============

.. automodule:: ReconManager 
    :members: 
    :undoc-members: 
    :show-inheritance: 
    
    **Recon-manager procedure**
    
    This project is to develop a framework that systematically predicts Gene-Transcript-Protein-Reaction Associations (GeTPRA) in human metabolims and updates a human genome-scale metabolic model (GEM) accordingly. **Recon manager** is a part of the project, which is a collection of scripts to generate Recon 2M.1 and simulate Recon models.
    
    ------
    

    **Features**
    **Recon manager** contains scripts that implement following tasks independently:

    - Convert GPR to TPR associations
    - Update metabolite information
    - Calculate model statistics
    - Evaluate functionality of metabolic model
    - Reconstruct personal GEMs using tINIT

    **Installation**
    *Major dependencies*
    - [gurobipy](http://www.gurobi.com/)

    *Procedure*
    **Note**: This source code was developed in Linux, and has been tested in Ubuntu 14.04.5 LTS (i7-4770 CPU @ 3.40GHz)

    1. Clone the repository

    2. Create and activate virtual environment
    
    .. code-block::
    
            $ virtualenv venv
            $ source venv/bin/activate

    3. Install packages at the root of the repository

    .. code-block::
    
            $ pip install pip --upgrade
            $ pip install -r requirements.txt

    4. Install [gurobipy](http://www.gurobi.com/)

        In our case, we installed gurobipy in the root of a server, and created its symbolic link in `venv`:

        .. code-block::
        
            $ ln -s /usr/local/lib/python2.7/dist-packages/gurobipy/ $HOME/recon-manager/venv/lib/python2.7/site-packages/

    *Feature: Convert GPR to TPR associations*
    Input arguments and corresponding files
    Following working input files can be found in: `./input_data/GPR_to_TPR_inputs`. 
    These files were used for the data presented in the manuscript.

    - `-o` : Output directory

    - `-model` : COBRA-compliant SBML file (generic human GEM)
        - File name in the source: `Recon2M.1_Entrez_Gene.xml`

    - `-gene_transcript_information` : A list of gene IDs and their matching transcript IDs
        - File name in the source: `Ensembl88_GRCh38_all_transcript_information.txt`

        **File format**

            NCBI gene ID    Gene stable ID  Transcript stable ID    RefSeq mRNA ID  UCSC Stable ID
            2733    ENSG00000119392 ENST00000309971 NM_001003722    uc004bvj.4
            2733    ENSG00000119392 ENST00000372770 NM_001499       uc004bvi.4
            5690    ENSG00000126067 ENST00000373237 NM_002794       uc001bzf.4
            5690    ENSG00000126067 ENST00000373237 NM_001199779    uc001bzf.4
            5690    ENSG00000126067 ENST00000621781 NM_001199780    uc021olh.3

        **Download procedure**

        1. Go to [Ensembl BioMarts](http://www.ensembl.org/biomart/martview)
        2. Click *Dataset* on the left menu

            Select *Ensembl Genes 89* in the drop-down menu *CHOOSE DATABASE*

            Select *Human genes (GRCh38.p10)* in the drop-down menu *CHOOSE DATASET*

        3. Click *Filters* on the left menu

            Click *GENE:* in the main menu (center)

            Check *Gene type* and select *protein_coding*

        4. Click *Attributes* on the left menu

            Check *Features* in the center

        5. Click both *GENE:* and *EXTERNAL:* in the main menu (center)

            *GENE:* -> *Ensembl* -> Uncheck *Gene stable ID* and *Transcript stable ID*

            Check following items in order:

            - *EXTERNAL:* -> *External References (max 3)* -> *NCBI gene ID*
            - *GENE:* -> *Ensembl* -> *Gene stable ID*
            - *GENE:* -> *Ensembl* -> *Transcript stable ID*
            - *EXTERNAL:* -> *External References (max 3)* -> *RefSeq mRNA ID*
            - *EXTERNAL:* -> *External References (max 3)* -> *UCSC Stable ID*

        6. Click the button *Results* on the top left
        7. Click the button *Go* in the top center

    *Implementation*
    **Note**: Running this script takes ~ 4 m

    .. code-block::
    
      $ python model_GPR_to_TPR_converter.py \
      -o ./results/GPR_to_TPR_results/ \
      -gene_transcript_information ./input_data/GPR_to_TPR_inputs/Ensembl88_GRCh38_all_transcript_information.txt \
      -model ./input_data/GPR_to_TPR_inputs/Recon2M.1_Entrez_Gene.xml


    *Feature: Update metabolite information*
    Input arguments and corresponding files
    Following working input files can be found in: `./input_data/metabolite_information_update_inputs`. 
    These files were used for the data presented in the manuscript.

    - `-o` : Output directory

    - `-model` : COBRA-compliant SBML file (generic human GEM)
        - File name in the source: `Recon2M.1_Entrez_Gene.xml`

    - `-mnx_xref` : Info on chemical identifiers from [MetaNetX](http://www.metanetx.org/)
        - File name in the source: `chem_xref.tsv`
        - Click [chem_xref.tsv](http://www.metanetx.org/cgi-bin/mnxget/mnxref/chem_xref.tsv) for downloading

    - `-mnx_prop` : Info on chemical structures from [MetaNetX](http://www.metanetx.org/)
        - File name in the source: `chem_prop.tsv`
        - Click [chem_prop.tsv](http://www.metanetx.org/cgi-bin/mnxget/mnxref/chem_prop.tsv) for downloading

    - `-bigg` : Info on BiGG metabolites from [BiGG Models](http://bigg.ucsd.edu/)
        - File name in the source: `bigg_models_metabolites.txt`
        - Click [bigg_models_metabolites.txt](http://bigg.ucsd.edu/static/namespace/bigg_models_metabolites.txt) for downloading

    - `-chebi` : Info on ChEBI and InChI from [ChEBI](https://www.ebi.ac.uk/chebi/init.do)
        - File name in the source: `chebiId_inchi.tsv`
        - Click [chebiId_inchi.tsv](ftp://ftp.ebi.ac.uk/pub/databases/chebi/Flat_file_tab_delimited/chebiId_inchi.tsv) for downloading

    *Implementation*
    **Note**: Running this script takes ~ 5 s

    .. code-block::
    
      $ python model_update_metabolite_information.py \
      -o ./results/metabolite_information_update_results/ \
      -model ./input_data/metabolite_information_update_inputs/Recon2M.1_Entrez_Gene.xml \
      -mnx_xref ./input_data/metabolite_information_update_inputs/chem_xref.tsv \
      -mnx_prop ./input_data/metabolite_information_update_inputs/chem_prop.tsv \
      -bigg ./input_data/metabolite_information_update_inputs/bigg_models_metabolites.txt \
      -chebi ./input_data/metabolite_information_update_inputs/chebiId_inchi.tsv


    *Feature: Calculate model statistics*
    Input arguments and corresponding files
    Following working input files can be found in: `./input_data/model_function_inputs`. 
    These files were used for the data presented in the manuscript.

    - `-o` : Output directory

    - `-model` : COBRA-compliant SBML file (generic human GEM)
        - File name in the source: `Recon2M.1_Entrez_Gene.xml`

    - `-medium` : A representative medium (RPMI-1640 medium)
        - File name in the source: `RPMI1640_medium.txt`

        **File format**
        
        .. code-block::

            EX_gly_LPAREN_e_RPAREN_	-0.05	1000
            EX_arg_L_LPAREN_e_RPAREN_	-0.05	1000
            EX_asn_L_LPAREN_e_RPAREN_	-0.05	1000
            EX_asp_L_LPAREN_e_RPAREN_	-0.05	1000
            EX_cys_L_LPAREN_e_RPAREN_	-0.05	1000
            EX_glu_L_LPAREN_e_RPAREN_	-0.05	1000
            EX_his_L_LPAREN_e_RPAREN_	-0.05	1000

    *Implementation*
    **Note**: Running this script takes ~ 47 m

    .. code-block::

      $ python model_metabolic_model_statistics.py \
      -o ./results/model_statistics_results/ \
      -medium ./input_data/model_function_inputs/RPMI1640_medium.txt \
      -model ./input_data/model_function_inputs/Recon2M.1_Entrez_Gene.xml


    *Feature: Evaluate functionality of metabolic model*
    Input arguments and corresponding files
    Following working input files can be found in: `./input_data/model_function_inputs`. 
    These files were used for the data presented in the manuscript.

    - `-o` : Output directory

    - `-model` : COBRA-compliant SBML file (generic human GEM)
        - File name in the source: `Recon2M.1_Entrez_Gene.xml`

    - `-medium` : A representative medium (RPMI-1640 medium)
        - File name in the source: `RPMI1640_medium.txt`

    - `-defined_medium` : A defined minimal medium
        - File name in the source: `Defined_medium.txt`

    - `-es_genes` : A list of essential genes
        - File name in the source: `Essential_genes_from_wang_et_al.txt`

    - `-ne_genes` : A list of non-essential genes
        - File name in the source: `Non_essential_genes_from_wang_et_al.txt`

    - `-c_source` : A list of carbon sources
        - File name in the source: `atp_carbon_source.txt`

    - `-biomass` : Reaction ID for biomass generation equation

    - `-oxygen` : Reaction ID for oxygen uptake

    - `-atp` : Reaction ID for ATP production

    *Implementation*
    **Note**: Directly insert reaction IDs in terminal for `-biomass`, `-oxygen` and `-atp`

    **Note**: Running this script takes ~ 7 m

    .. code-block::
    
      $ python model_metabolic_function.py \
      -o ./results/model_function_results/ \
      -model ./input_data/model_function_inputs/Recon2M.1_Entrez_Gene.xml \
      -medium ./input_data/model_function_inputs/RPMI1640_medium.txt \
      -defined_medium ./input_data/model_function_inputs/Defined_medium.txt \
      -es_genes ./input_data/model_function_inputs/Essential_genes_from_wang_et_al.txt \
      -ne_genes ./input_data/model_function_inputs/Non_essential_genes_from_wang_et_al.txt \
      -c_source ./input_data/model_function_inputs/atp_carbon_source.txt \
      -biomass biomass_reaction \
      -oxygen EX_o2_LPAREN_eRPAREN \
      -atp DM_atp_c_
    

    *Feature: Reconstruct personal GEMs using [tINIT](http://msb.embopress.org/content/10/3/721.long)*
    Input arguments and corresponding files
    Following working input files can be found in: `./input_data/tINIT_inputs`. 
    These files were used for the data presented in the manuscript.

    - `-o` : Output directory

    - `-model` : COBRA-compliant SBML file (generic human GEM)
        - File name in the source: `Recon2M.1_Entrez_Gene.xml`

    - `-medium` : A representative medium (RPMI-1640 medium)
        - File name in the source: `RPMI1640_medium.txt`

    - `-task` : A list of metabolic tasks
        - File name in the source: `MetabolicTasks.csv`

    - `-present_reaction` : A list of reactions that should be present in model
        - File name in the source: `essential_reactions.txt`

    - `-present_metabolite` : A list of metabolites that should be present in model
        - File name in the source: `essential_metabolites.txt`

    - `-i` : Omics data
        - File name in the source: `BLCA_T_TTL.csv`

    - `-biomass` : Reaction ID for biomass generation equation

    *Implementation*
    **Note**: Directly insert a reaction ID in terminal for `-biomass`

    **Note**: Running this script for a Recon model takes ~ 8 m

    .. code-block::
    
      $ python personal_GEM_tINIT.py \
      -o ./results/tINIT_results/ \
      -medium ./input_data/tINIT_inputs/RPMI1640_medium.txt \
      -model ./input_data/tINIT_inputs/Recon2M.1_Entrez_Gene.xml \
      -biomass biomass_reaction \
      -task ./input_data/tINIT_inputs/MetabolicTasks.csv \
      -present_reaction ./input_data/tINIT_inputs/essential_reactions.txt \
      -present_metabolite ./input_data/tINIT_inputs/essential_metabolites.txt \
      -i ./input_data/tINIT_inputs/BLCA_N_TTL.csv
    

    *Feature: Predict flux using transcript-level RNA-Seq data and GeTPRA*
    Input arguments and corresponding files
    Following working input files can be found in: `./input_data/Flux_prediction`. 
    These files were used for the data presented in the manuscript.

    - `-o` : Output directory

    - `-g_model` : COBRA-compliant SBML file (generic human GEM)
        - File name in the source: `Recon2M.2_BiGG_UCSC_Transcript.xml`

    - `-c_model` : COBRA-compliant SBML file (context-specific human GEM)
        - File name in the source: `LIHC_TCGA-BC-A10Q.xml`

    - `-getpra` : GeTPRA file 
        - File name in the source: `GeTPRA.txt`

    - `-use_getpra` : Option for flux prediction using transcript-level data
        - Insert`yes` for flux prediction using transcript-level data
        - Insert`no` for flux prediction using gene-level data

    *Implementation*
    **Note**: Running this script takes ~ 7 m

    .. code-block::
    
      $ python flux_prediction.py \
      -o ./results/Flux_prediction/ \
      -i ./input_data/Flux_prediction/LIHC_TCGA-BC-A10Q.csv \
      -getpra ./input_data/Flux_prediction/GeTPRA.txt \
      -g_model ./input_data/Flux_prediction/Recon2M.2_BiGG_UCSC_Transcript.xml \
      -c_model ./input_data/Flux_prediction/LIHC_TCGA-BC-A10Q.xml \
      -use_getpra yes
    

    **Publication**
    Jae Yong Ryu 1, Hyun Uk Kim 1 & Sang Yup Lee. Framework and resource for more than 11,000 gene-transcript-protein-reaction associations in human metabolism., Proc. Natl. Acad. Sci. U.S.A., 2017, http://www.pnas.org/content/early/2017/10/23/1713050114

    
    -------