# MO4

MO4: A Many-objective Evolutionary Algorithm for Protein Structure Prediction

MO4 is an ab initio modeling method for protein structure prediction (PSP). It uses four objectives (Bond, Non-bond, SASA and RWplus) to evaluate protein conformations.

The Bond and Non-bond energies come from the CHARMM force field. The executables that compute them are provided by Tinker: https://dasher.wustl.edu/tinker/

SASA (solvent-accessible surface area) measures the surface area of atoms exposed to solvent. It has been used to improve the accuracy of protein structure prediction and to restrict the protein surface in MaOPSP. PyRosetta supports the calculation of SASA: http://www.pyrosetta.org/

RWplus is a knowledge-based energy function proposed by Y. Zhang: https://zhanglab.ccmb.med.umich.edu/RW/

## Requirements

* Python 3.6 or higher is required.
* Ubuntu 16.04 or higher is required.
* The required Python packages are listed in [requirements.txt](./requirements.txt).

## Project Structure

```
ProteinPredict_MaOSearch  
    │── bin\                                   // External binary files folder
    │    │──params\                            // Directory contains parameter files used by binary files
    │    │    └──charmm27.prm   
    │    │──analyze                            // Executable binary program to calculate the bond and non-bond energy values
    │    │──calRWplus                          // Executable binary program to calculate the RWplus energy value
    │    │──protein                            // Executable binary program to generate xyz coordinates of proteins from torsion angles
    │    └──xyzpdb                             // Executable binary program to generate pdb files
    │── energy\                                // Energy function folder
    │    └──energy.py                          // The integration of energy functions' usages 
    │── search\                                // The MaOEA directory
    │    ├──DDFC.py                            // Modified many-objective evolutionary algorithms.
    │    ├──MOEA.py                            // Mutation and crossover operators, and the crowding distance calculation
    │    └──SelectMethod.py                    // Selection methods
    │── utils\                                 // Utils folder
    │    ├──LogUtils.py                        // Log generation
    │    └──ProteinUtils.py                    // Protein and residue structure
    │── logs\                                  // Project log folder
    │── config.json                            // Project config file
    │── energy_config.json                     // Energy function config file
    │── protein_config.json                    // Protein function config file
    │── main.py                                // The entry point of the project
    └── requirements.txt                       // The list of required python packages
```

## Config Structure
All config files are written in JSON and set the parameters for protein structure prediction. Each experiment saves a copy of all config files to its corresponding logs folder.
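
Saving the config files next to the logs of a run could look roughly like the sketch below. It is only an illustration: `snapshot_configs` and the `run_name` argument are hypothetical names, not functions of this project, and the per-run folder layout is an assumption.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot_configs(config_paths, logs_root="logs", run_name=None):
    """Copy each config file into a per-run folder under logs_root.

    Hypothetical helper: the real project may name or lay out
    its log folders differently.
    """
    # Fall back to a timestamp so that every run gets its own folder.
    run_name = run_name or datetime.now().strftime("%Y%m%d-%H%M%S")
    run_dir = Path(logs_root) / run_name
    run_dir.mkdir(parents=True, exist_ok=True)
    for path in config_paths:
        # copy2 also preserves file metadata such as modification time.
        shutil.copy2(path, run_dir / Path(path).name)
    return run_dir
```

A call such as `snapshot_configs(["config.json", "energy_config.json", "protein_config.json"])` would then leave a copy of the three files in one folder under `logs`.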

### config.json
config.json sets global parameters such as program paths, energy parameters, MOEA parameters and the parameters of the protein to predict. \
The following is an example of config.json and its parameters.
```json
{
    "paths": {
        "energy_path": "",
        "algo_path": "",
        "test_path": "",
        "logs_root": "logs",
        "root": ""
    },
    "energy_params": {
        "number_objective": 4,
        "max_thread": 32
    },
    "algo_params": {
        "name": "urs",
        "select_method": "urs",
        "pop_size": 50,
        "max_gen": 3000,
        "save_all": false,
        "k": 5,
        "l": 3,
        "pro_c": 1,
        "dis_c": 20,
        "dis_m": 20,
        "mesh_div": 20,
        "archive_thresh": 3000
    },
    "protein_params": {
        "name": "1ab1",
        "second_struct_file": "protein_structure/1ab1.seq",
        "status": "native"
    }
}
```
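
As a minimal sketch of how such a file can be read, the snippet below parses the JSON text and checks that the four top-level sections shown in the example are present. `parse_config` and `REQUIRED_SECTIONS` are hypothetical names introduced for illustration; the project's own loading code may differ.

```python
import json

# Top-level sections taken from the config.json example above.
REQUIRED_SECTIONS = ("paths", "energy_params", "algo_params", "protein_params")


def parse_config(text):
    """Parse config JSON text and verify the expected sections exist."""
    config = json.loads(text)
    missing = [name for name in REQUIRED_SECTIONS if name not in config]
    if missing:
        raise KeyError(f"config is missing sections: {missing}")
    return config


# Abbreviated version of the example file, used only for this demo.
example_text = """
{
    "paths": {"logs_root": "logs"},
    "energy_params": {"number_objective": 4, "max_thread": 32},
    "algo_params": {"pop_size": 50, "max_gen": 3000},
    "protein_params": {"name": "1ab1"}
}
"""
config = parse_config(example_text)
print(config["algo_params"]["pop_size"], config["protein_params"]["name"])
```

For a file on disk, the text can be obtained with `open("config.json").read()` before calling the helper.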

### energy_config.json
The parameters in energy_config.json are used to initialize the class Energy. They mainly configure the binary files. Each objective entry in energy_config.json is a different energy function. The serial numbers of the objectives must be consecutive and must start at 0 (objective0, objective1, ...). \
The following is an example of energy_config.json and its parameters.
```json
{
    "general_params": {
        "prefix": "",
        "save_all": false,
        "generate_xyz_bin_path": "bin/protein",
        "generate_pdb_from_xyz_bin_path": "bin/xyzpdb",
        "prm_path": "bin/params/charmm27.prm"
    },
    "objective0": {
        "name": "analyze",
        "bin_path": "bin/analyze",
        "param_path": "bin/params/charmm27.prm",
        "objective_index": [0,1]
    },
    "objective1": {
        "name": "sas",
        "bin_path": "",
        "param_path": "bin/params/charmm27.prm", 
        "objective_index": 2
    },
    "objective2": {
        "name": "RWplus",
        "bin_path": "bin/calRWplus",
        "param_path": "bin/params/charmm27.prm",
        "objective_index": 3
    }
}
```
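
The numbering rule for the objective entries can be checked before a run. The sketch below is an illustration under assumptions: `objective_sections` and `objective_indices` are hypothetical helpers, and the expectation that the collected `objective_index` values cover 0 to 3 is inferred from `number_objective` being 4 in config.json.

```python
import re


def objective_sections(energy_config):
    """Return the objectiveN sections ordered by N.

    Raises ValueError unless the serials are exactly 0, 1, ..., n-1.
    """
    found = {}
    for key, value in energy_config.items():
        match = re.fullmatch(r"objective(\d+)", key)
        if match:
            found[int(match.group(1))] = value
    if sorted(found) != list(range(len(found))):
        raise ValueError(f"objective serials must be 0..n-1, got {sorted(found)}")
    return [found[i] for i in range(len(found))]


def objective_indices(sections):
    """Flatten objective_index, which may be an int or a list of ints."""
    indices = []
    for section in sections:
        index = section["objective_index"]
        indices.extend(index if isinstance(index, list) else [index])
    return indices


# Abbreviated version of the example above.
example = {
    "general_params": {"prefix": ""},
    "objective0": {"name": "analyze", "objective_index": [0, 1]},
    "objective1": {"name": "sas", "objective_index": 2},
    "objective2": {"name": "RWplus", "objective_index": 3},
}
indices = objective_indices(objective_sections(example))
print(indices)
```

Note that one entry may fill several objective slots: `analyze` supplies both index 0 and index 1, so three entries yield four objective values.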

### protein_config.json
The secondary structure constraints and side-chain constraints of proteins are recorded in protein_config.json.


## How to run?
1. Configure the environment.
2. Add the external binary executable programs. \
   Bond and Non-bond: https://dasher.wustl.edu/tinker/ \
   SASA: http://www.pyrosetta.org/ \
   RWplus: https://zhanglab.ccmb.med.umich.edu/RW/
3. Run the program: `python main.py --config config.json --energy_config energy_config.json`

**If the protein files are not generated, re-download these programs or run `sudo chmod 777 *` in the bin folder.**
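
Before changing permissions, it can help to see which binaries are actually missing or not executable. The following is a small sketch using only the standard library; `non_executable` is a hypothetical helper, and the paths in the usage note are the ones listed in the project structure.

```python
import os


def non_executable(paths):
    """Return the paths that are missing or lack execute permission."""
    return [
        path
        for path in paths
        if not (os.path.isfile(path) and os.access(path, os.X_OK))
    ]
```

Calling `non_executable(["bin/analyze", "bin/calRWplus", "bin/protein", "bin/xyzpdb"])` from the project root returns the binaries that still need attention. An empty list suggests the permission step above is not the cause of the problem.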


