Index of /ftp/pub/bmrb/timedomain/bmr6682/analysis/9valid

All required steps and software documentation for doing a refinement in
water on an already good structure. Make sure the settings at the bottom
of this document are defined.

*** START
*** Got new topology and parameter files from HIC-Up database in Sweden
        Contains definitions for Fe2S2 cluster.
        Called them fes_xplor_par.txt and fes_xplor_top.txt
        Adjusted many angles to 180 and 0 from measured values in order to
        impose perfect symmetry. For most parameters the averaged values
        observed in Milo's calculation were imposed.
*** Using protocol from Aart Nederveen to generate an xplor psf/mtf file
	This protocol actually uses Chris Spronk/Jens Linge's code
*** Split the pdb file into multiple files using my code:
        splitpdb_new $x.pdb
        This assumes nawk is present or linked to gawk (as on cesg-master)
        and x is a variable set to 1xxx which will be the project name.
        The input PDB file at this point is molmol's output without pseudo atoms.
        Doesn't need to have protons because they will be generated.
*** Renames a bunch of atom names to xplor convention.
        $SCRIPTS/pdbcns/convert.py -convertpdb $x"_1".pdb PDB out.pdb XPLOR        
*** Rename the atom names in the FES cluster:
        sed -e 's/FE  HIS    67/FE1 FES   113/' out.pdb \
            -e 's/FE  CYS    45/FE2 FES   113/'            \
            -e 's/SE  CYS    45/S1  FES   113/'            \
            -e 's/SE  HIS    67/S2  FES   113/'       > $x"_template".pdb
*** Added topology and parameters to the same file (topallhdg5.3.pro). 
        Don't want to bother with specifying multiple files for them.        
*** Using xplor-nih to create psf topology file using generate_tmoc.inp
        script but needs parameter and topology files specific for fes cluster.
        Creating one takes a lot of time. Getting the patches in order
        takes even more time. Don't forget the cis-peptide planes patch
        if needed in your protein.
        xplor < generate_tmoc.inp > generate_tmoc.log
        This creates the psf file. It's a good idea to check the 
        energy so the parameters get checked too.
        Result: 1xxx.psf protein structure file (no coordinates)
*** Strip the protons and generate them anew, which is apparently easier
        than renaming them. The way to do this is by reading the psf file 1xxx.psf
        just created. Reading all atoms that already coincide which are all but the
        cluster atoms and many protons. Keep the coordinates of the known
        atoms just read and generate coordinates for all unknown atoms. This way
        the coordinate file doesn't need to be corrected too much.
        generateProtons.csh  can be run on cluster like:
        $QSUB $CONV_DIR/generateProtons.csh 
        It takes about 30 seconds per model so 10 min for 20 models.
        This could of course be done in parallel.
        Output can be found in directory: /u02/jurgen/job_io (define your own)
        in a file named after the job id and output stream type error ER or
        output OU. The output is very async-ed so wait for it.
*** Rename the files from 1xxx_cns_99.pdb to 1xxx_99.pdb etc.
        A simple script for doing this is called renameFile.csh
        Note that this will overwrite the files generate before by 
        splitpdb_new.        
*** Copy the converted files to the water refinement input dir:
        \cp *_*.pdb $COORINDIR
        \cp *.psf $COORINDIR/../psf
*** Convert the experimental data like upper/lower bounds and dihedrals.
        (aqua is installed and compiled on cesg-master)
        type: 'aqua' alias (see settings)
        type: 'qproject tmoc'
        type: 'qconvert -t 4 tmoc.upl'
        this specifies the input format is dyana (format 4)
        it will create an aqua output file tmoc.noe among other files.
        replace all + and - signs with a space in a text editor or do
        something like (so negative angles don't get inverted when 
        you use this for angles): 
                sed 's/[+-] /  /g' tmoc.noe > t
                mv t tmoc.noe
        type: cp ../conversion/1xxx_1.pdb .
        To convert the aqua data to xplor data
        type: aq2x xplor tmoc.noe tmoc.tbl 1xxx_1.pdb
        Move and rename to the input dir for structure calculation:
        cp tmoc.tbl ../RESAMPLE/tables/1xxx_noe.tbl
        And for dihedrals:
        convXPLORtorsrestr.awk tmoc.aco > tmoc_dihedral.tbl
        (this 31 line script resides in my bin dir)
        Edit manually for deviating format and move to tables dir
        like the NOEs.
*** To check the data coordinates versus constraints type
        Use the complete pdb file with all models by first creating it:
        joinpdb -o 1xxx_xplor.pdb 1xxx_[0-9]*.pdb 
        and then
        aqpc -r tmoc -xplor -r6sum 1 1xxx_xplor.pdb
        I found in the summary file: 1xxx_xplor.nsm
        Largest UPPER BOUND violations:
        _   THR   70 HG1    _   ASN   80 HB1   0.56 (model 10)
        _   CYS   84 HA     _   CYS   85 HN    0.44 (model 10)
        _   LYS  103 HA     _   GLY  104 HN    0.42 (model 12)
        Note that the order of the models in the ensemble PDB
        file is alphabetical on the number and not sequential as
        expected (I would have done this differently if I developed it).        
*** Run the prelim procheck/whatif analyses
        In file refine.pars set to 1 structure to test.
        NINPUTSTRUCTURES  = 1
        Set code in nmr_waterrefine.py to exit after these checks.
        python2 nmr_waterrefine.py
*** When all is successfull rerun it with the full number of structures
        (20 in tmoc case).
*** Run prelim refinement on 2 structures or so with small number of
        steps. I found this time that the structures explode during the phase
        where the temp is high and the dihe terms are scaled up by factor of 3.
        This phase is called hot. We'll it is.
        Trying to find out what is causing the problem. So far tried:
        - lowering the max temp to 300
        - downsizing step from 4 to 1 fs
        - now trying structures without fes group and with normal protonation on them.
                that works without explosion twice already.
        Ok, turns out I need to use the IMPRoper forces in stead of DIHEdral.
        The calculation was instable with dihedral terms of 750 times the factor 
        of 3. The impropers aren't scaled with this factor 3 in the protocol and
        is stable. Set everything back to original protocol settings.
*** Adjusting the number of structures back to the full number (20) and reran the
        refinement with the normal number of steps.
*** Get statistics from the summary files: analyzed_input/summary_analyzed.txt
*** Get statistics from PDB files by typing:
        grep violations refined_input/refined_*.pdb | gawk '{if ($6>0) print}'
        This time the only violations were small dihedral angle constraint violations
        over 5 (but below 10).
*** Get statistics from PDB files by typing:
        grep violations refined_input/refined_*.pdb | gawk '{if ($6>0) print}'
        This time the only violations were small dihedral angle constraint violations
        over 5 (but below 10).
*** DONE



--- Settings:
setenv x=1xxx
setenv CNS_BIN  /s/src/cns_solve_1.1/intel-i686-linux/bin
setenv TOPPAR   /u02/jurgen/CESG/Brian/tmoc/RESAMPLE/toppar
setenv CONV_DIR /u02/jurgen/CESG/Brian/tmoc/conversion
setenv SCRIPTS  /u/jurgen/CloneWars/Aart/water_bmrb

## You should change the job_io dirs for sure.
setenv QSUB    '/usr/bin/qsub -q seriallong -o /u02/jurgen/job_io -e /u02/jurgen/job_io -l nodes=1:ppn=1 -S /bin/tcsh -V' 
setenv COORINDIR /u02/jurgen/CESG/Brian/tmoc/RESAMPLE/input
alias aqua 'setenv aquaroot ~jurgen/src/aqua3.2;      source $aquaroot/aqsetup'

--- Nice software on cesg-master some of it in:
/u02/jurgen/bin

jedit 		text editor works over x (in java so slow but doable)
aqua            first type: aqua which should be an alias defined in .cshrc see settings.

Name	Last modified	Size

Parent Directory		-
PAR_ZINC.PRO	2021-02-13 18:54	2.8K
TOP_ZINC.PRO	2021-02-13 18:55	3.9K
c70a/	2021-02-13 18:55	-
c74c/	2021-02-13 18:54	-
c108a/	2021-02-13 18:56	-
final/	2021-02-13 18:56	-
generate_easy.inp	2021-02-13 18:54	19K
generate_inem.inp	2021-02-13 18:55	1.3K
generate_tmoc.inp	2021-02-13 18:53	1.7K
renumber/	2021-02-13 18:56	-

All required steps and software documentation for doing a refinement in
water on an already good structure. Make sure the settings at the bottom
of this document are defined.

*** START
*** Got new topology and parameter files from HIC-Up database in Sweden
        Contains definitions for Fe2S2 cluster.
        Called them fes_xplor_par.txt and fes_xplor_top.txt
        Adjusted many angles to 180 and 0 from measured values in order to
        impose perfect symmetry. For most parameters the averaged values
        observed in Milo's calculation were imposed.
*** Using protocol from Aart Nederveen to generate an xplor psf/mtf file
	This protocol actually uses Chris Spronk/Jens Linge's code
*** Split the pdb file into multiple files using my code:
        splitpdb_new $x.pdb
        This assumes nawk is present or linked to gawk (as on cesg-master)
        and x is a variable set to 1xxx which will be the project name.
        The input PDB file at this point is molmol's output without pseudo atoms.
        Doesn't need to have protons because they will be generated.
*** Renames a bunch of atom names to xplor convention.
        $SCRIPTS/pdbcns/convert.py -convertpdb $x"_1".pdb PDB out.pdb XPLOR        
*** Rename the atom names in the FES cluster:
        sed -e 's/FE  HIS    67/FE1 FES   113/' out.pdb \
            -e 's/FE  CYS    45/FE2 FES   113/'            \
            -e 's/SE  CYS    45/S1  FES   113/'            \
            -e 's/SE  HIS    67/S2  FES   113/'       > $x"_template".pdb
*** Added topology and parameters to the same file (topallhdg5.3.pro). 
        Don't want to bother with specifying multiple files for them.        
*** Using xplor-nih to create psf topology file using generate_tmoc.inp
        script but needs parameter and topology files specific for fes cluster.
        Creating one takes a lot of time. Getting the patches in order
        takes even more time. Don't forget the cis-peptide planes patch
        if needed in your protein.
        xplor < generate_tmoc.inp > generate_tmoc.log
        This creates the psf file. It's a good idea to check the 
        energy so the parameters get checked too.
        Result: 1xxx.psf protein structure file (no coordinates)
*** Strip the protons and generate them anew, which is apparently easier
        than renaming them. The way to do this is by reading the psf file 1xxx.psf
        just created. Reading all atoms that already coincide which are all but the
        cluster atoms and many protons. Keep the coordinates of the known
        atoms just read and generate coordinates for all unknown atoms. This way
        the coordinate file doesn't need to be corrected too much.
        generateProtons.csh  can be run on cluster like:
        $QSUB $CONV_DIR/generateProtons.csh 
        It takes about 30 seconds per model so 10 min for 20 models.
        This could of course be done in parallel.
        Output can be found in directory: /u02/jurgen/job_io (define your own)
        in a file named after the job id and output stream type error ER or
        output OU. The output is very async-ed so wait for it.
*** Rename the files from 1xxx_cns_99.pdb to 1xxx_99.pdb etc.
        A simple script for doing this is called renameFile.csh
        Note that this will overwrite the files generate before by 
        splitpdb_new.        
*** Copy the converted files to the water refinement input dir:
        \cp *_*.pdb $COORINDIR
        \cp *.psf $COORINDIR/../psf
*** Convert the experimental data like upper/lower bounds and dihedrals.
        (aqua is installed and compiled on cesg-master)
        type: 'aqua' alias (see settings)
        type: 'qproject tmoc'
        type: 'qconvert -t 4 tmoc.upl'
        this specifies the input format is dyana (format 4)
        it will create an aqua output file tmoc.noe among other files.
        replace all + and - signs with a space in a text editor or do
        something like (so negative angles don't get inverted when 
        you use this for angles): 
                sed 's/[+-] /  /g' tmoc.noe > t
                mv t tmoc.noe
        type: cp ../conversion/1xxx_1.pdb .
        To convert the aqua data to xplor data
        type: aq2x xplor tmoc.noe tmoc.tbl 1xxx_1.pdb
        Move and rename to the input dir for structure calculation:
        cp tmoc.tbl ../RESAMPLE/tables/1xxx_noe.tbl
        And for dihedrals:
        convXPLORtorsrestr.awk tmoc.aco > tmoc_dihedral.tbl
        (this 31 line script resides in my bin dir)
        Edit manually for deviating format and move to tables dir
        like the NOEs.
*** To check the data coordinates versus constraints type
        Use the complete pdb file with all models by first creating it:
        joinpdb -o 1xxx_xplor.pdb 1xxx_[0-9]*.pdb 
        and then
        aqpc -r tmoc -xplor -r6sum 1 1xxx_xplor.pdb
        I found in the summary file: 1xxx_xplor.nsm
        Largest UPPER BOUND violations:
        _   THR   70 HG1    _   ASN   80 HB1   0.56 (model 10)
        _   CYS   84 HA     _   CYS   85 HN    0.44 (model 10)
        _   LYS  103 HA     _   GLY  104 HN    0.42 (model 12)
        Note that the order of the models in the ensemble PDB
        file is alphabetical on the number and not sequential as
        expected (I would have done this differently if I developed it).        
*** Run the prelim procheck/whatif analyses
        In file refine.pars set to 1 structure to test.
        NINPUTSTRUCTURES  = 1
        Set code in nmr_waterrefine.py to exit after these checks.
        python2 nmr_waterrefine.py
*** When all is successfull rerun it with the full number of structures
        (20 in tmoc case).
*** Run prelim refinement on 2 structures or so with small number of
        steps. I found this time that the structures explode during the phase
        where the temp is high and the dihe terms are scaled up by factor of 3.
        This phase is called hot. We'll it is.
        Trying to find out what is causing the problem. So far tried:
        - lowering the max temp to 300
        - downsizing step from 4 to 1 fs
        - now trying structures without fes group and with normal protonation on them.
                that works without explosion twice already.
        Ok, turns out I need to use the IMPRoper forces in stead of DIHEdral.
        The calculation was instable with dihedral terms of 750 times the factor 
        of 3. The impropers aren't scaled with this factor 3 in the protocol and
        is stable. Set everything back to original protocol settings.
*** Adjusting the number of structures back to the full number (20) and reran the
        refinement with the normal number of steps.
*** Get statistics from the summary files: analyzed_input/summary_analyzed.txt
*** Get statistics from PDB files by typing:
        grep violations refined_input/refined_*.pdb | gawk '{if ($6>0) print}'
        This time the only violations were small dihedral angle constraint violations
        over 5 (but below 10).
*** Get statistics from PDB files by typing:
        grep violations refined_input/refined_*.pdb | gawk '{if ($6>0) print}'
        This time the only violations were small dihedral angle constraint violations
        over 5 (but below 10).
*** DONE



--- Settings:
setenv x=1xxx
setenv CNS_BIN  /s/src/cns_solve_1.1/intel-i686-linux/bin
setenv TOPPAR   /u02/jurgen/CESG/Brian/tmoc/RESAMPLE/toppar
setenv CONV_DIR /u02/jurgen/CESG/Brian/tmoc/conversion
setenv SCRIPTS  /u/jurgen/CloneWars/Aart/water_bmrb

## You should change the job_io dirs for sure.
setenv QSUB    '/usr/bin/qsub -q seriallong -o /u02/jurgen/job_io -e /u02/jurgen/job_io -l nodes=1:ppn=1 -S /bin/tcsh -V' 
setenv COORINDIR /u02/jurgen/CESG/Brian/tmoc/RESAMPLE/input
alias aqua 'setenv aquaroot ~jurgen/src/aqua3.2;      source $aquaroot/aqsetup'

--- Nice software on cesg-master some of it in:
/u02/jurgen/bin

jedit 		text editor works over x (in java so slow but doable)
aqua            first type: aqua which should be an alias defined in .cshrc see settings.