All required steps and software documentation for doing a refinement in water on an already good structure. Make sure the settings at the bottom of this document are defined. *** START *** Got new topology and parameter files from HIC-Up database in Sweden Contains definitions for Fe2S2 cluster. Called them fes_xplor_par.txt and fes_xplor_top.txt Adjusted many angles to 180 and 0 from measured values in order to impose perfect symmetry. For most parameters the averaged values observed in Milo's calculation were imposed. *** Using protocol from Aart Nederveen to generate an xplor psf/mtf file This protocol actually uses Chris Spronk/Jens Linge's code *** Split the pdb file into multiple files using my code: splitpdb_new $x.pdb This assumes nawk is present or linked to gawk (as on cesg-master) and x is a variable set to 1xxx which will be the project name. The input PDB file at this point is molmol's output without pseudo atoms. Doesn't need to have protons because they will be generated. *** Renames a bunch of atom names to xplor convention. $SCRIPTS/pdbcns/convert.py -convertpdb $x"_1".pdb PDB out.pdb XPLOR *** Rename the atom names in the FES cluster: sed -e 's/FE HIS 67/FE1 FES 113/' out.pdb \ -e 's/FE CYS 45/FE2 FES 113/' \ -e 's/SE CYS 45/S1 FES 113/' \ -e 's/SE HIS 67/S2 FES 113/' > $x"_template".pdb *** Added topology and parameters to the same file (topallhdg5.3.pro). Don't want to bother with specifying multiple files for them. *** Using xplor-nih to create psf topology file using generate_tmoc.inp script but needs parameter and topology files specific for fes cluster. Creating one takes a lot of time. Getting the patches in order takes even more time. Don't forget the cis-peptide planes patch if needed in your protein. xplor < generate_tmoc.inp > generate_tmoc.log This creates the psf file. It's a good idea to check the energy so the parameters get checked too. Result: 1xxx.psf protein structure file (no coordinates) *** Strip the protons and generate them anew, which is apparently easier than renaming them. The way to do this is by reading the psf file 1xxx.psf just created. Reading all atoms that already coincide which are all but the cluster atoms and many protons. Keep the coordinates of the known atoms just read and generate coordinates for all unknown atoms. This way the coordinate file doesn't need to be corrected too much. generateProtons.csh can be run on cluster like: $QSUB $CONV_DIR/generateProtons.csh It takes about 30 seconds per model so 10 min for 20 models. This could of course be done in parallel. Output can be found in directory: /u02/jurgen/job_io (define your own) in a file named after the job id and output stream type error ER or output OU. The output is very async-ed so wait for it. *** Rename the files from 1xxx_cns_99.pdb to 1xxx_99.pdb etc. A simple script for doing this is called renameFile.csh Note that this will overwrite the files generate before by splitpdb_new. *** Copy the converted files to the water refinement input dir: \cp *_*.pdb $COORINDIR \cp *.psf $COORINDIR/../psf *** Convert the experimental data like upper/lower bounds and dihedrals. (aqua is installed and compiled on cesg-master) type: 'aqua' alias (see settings) type: 'qproject tmoc' type: 'qconvert -t 4 tmoc.upl' this specifies the input format is dyana (format 4) it will create an aqua output file tmoc.noe among other files. replace all + and - signs with a space in a text editor or do something like (so negative angles don't get inverted when you use this for angles): sed 's/[+-] / /g' tmoc.noe > t mv t tmoc.noe type: cp ../conversion/1xxx_1.pdb . To convert the aqua data to xplor data type: aq2x xplor tmoc.noe tmoc.tbl 1xxx_1.pdb Move and rename to the input dir for structure calculation: cp tmoc.tbl ../RESAMPLE/tables/1xxx_noe.tbl And for dihedrals: convXPLORtorsrestr.awk tmoc.aco > tmoc_dihedral.tbl (this 31 line script resides in my bin dir) Edit manually for deviating format and move to tables dir like the NOEs. *** To check the data coordinates versus constraints type Use the complete pdb file with all models by first creating it: joinpdb -o 1xxx_xplor.pdb 1xxx_[0-9]*.pdb and then aqpc -r tmoc -xplor -r6sum 1 1xxx_xplor.pdb I found in the summary file: 1xxx_xplor.nsm Largest UPPER BOUND violations: _ THR 70 HG1 _ ASN 80 HB1 0.56 (model 10) _ CYS 84 HA _ CYS 85 HN 0.44 (model 10) _ LYS 103 HA _ GLY 104 HN 0.42 (model 12) Note that the order of the models in the ensemble PDB file is alphabetical on the number and not sequential as expected (I would have done this differently if I developed it). *** Run the prelim procheck/whatif analyses In file refine.pars set to 1 structure to test. NINPUTSTRUCTURES = 1 Set code in nmr_waterrefine.py to exit after these checks. python2 nmr_waterrefine.py *** When all is successfull rerun it with the full number of structures (20 in tmoc case). *** Run prelim refinement on 2 structures or so with small number of steps. I found this time that the structures explode during the phase where the temp is high and the dihe terms are scaled up by factor of 3. This phase is called hot. We'll it is. Trying to find out what is causing the problem. So far tried: - lowering the max temp to 300 - downsizing step from 4 to 1 fs - now trying structures without fes group and with normal protonation on them. that works without explosion twice already. Ok, turns out I need to use the IMPRoper forces in stead of DIHEdral. The calculation was instable with dihedral terms of 750 times the factor of 3. The impropers aren't scaled with this factor 3 in the protocol and is stable. Set everything back to original protocol settings. *** Adjusting the number of structures back to the full number (20) and reran the refinement with the normal number of steps. *** Get statistics from the summary files: analyzed_input/summary_analyzed.txt *** Get statistics from PDB files by typing: grep violations refined_input/refined_*.pdb | gawk '{if ($6>0) print}' This time the only violations were small dihedral angle constraint violations over 5 (but below 10). *** Get statistics from PDB files by typing: grep violations refined_input/refined_*.pdb | gawk '{if ($6>0) print}' This time the only violations were small dihedral angle constraint violations over 5 (but below 10). *** DONE --- Settings: setenv x=1xxx setenv CNS_BIN /s/src/cns_solve_1.1/intel-i686-linux/bin setenv TOPPAR /u02/jurgen/CESG/Brian/tmoc/RESAMPLE/toppar setenv CONV_DIR /u02/jurgen/CESG/Brian/tmoc/conversion setenv SCRIPTS /u/jurgen/CloneWars/Aart/water_bmrb ## You should change the job_io dirs for sure. setenv QSUB '/usr/bin/qsub -q seriallong -o /u02/jurgen/job_io -e /u02/jurgen/job_io -l nodes=1:ppn=1 -S /bin/tcsh -V' setenv COORINDIR /u02/jurgen/CESG/Brian/tmoc/RESAMPLE/input alias aqua 'setenv aquaroot ~jurgen/src/aqua3.2; source $aquaroot/aqsetup' --- Nice software on cesg-master some of it in: /u02/jurgen/bin jedit text editor works over x (in java so slow but doable) aqua first type: aqua which should be an alias defined in .cshrc see settings.