------------------------------------------------------------------------
			   PROTEIN DATA BANK
		          Quarterly Newsletter
		      Release #74  - October 1995
------------------------------------------------------------------------
INTERNET SITES

	WWW.....................http://www.pdb.bnl.gov
	FTP (anonymous).........ftp.pdb.bnl.gov
	Gopher..................gopher.pdb.bnl.gov

------------------------------------------------------------------------

OCTOBER 1995 PDB RELEASE

	3821 full-release atomic coordinate entries

	Molecule Type
	-------------
	3469	proteins, peptides, and viruses
	 80	protein/nucleic acid complexes
	260	nucleic acids
	 12	carbohydrates

	Experimental Technique
	----------------------
	 120	theoretical modeling
	 490	NMR
	3211	diffraction and other

The total size of the atomic coordinate entry database is 1350 Mbytes uncompressed.

------------------------------------------------------------------------

TABLE OF CONTENTS
		
	What's New at the PDB
	Managing the Archives
		 Proposed Changes
		 HET Groups
		 CIF from PDB Entries
	PDB Format Change Policy
	To be Ala or Not to be Ala? That is the Question
	PDB Structure Factor Files in CIF - A Proposal
		 Examples of SF Files in CIF
	DALI - Server for Protein Structure Database Searches in 3D
	WHATCHECK - Tool for Verification of Protein Structures
	Biotech Protein Structure Validation Server Now On-Line
	Notes of a Protein Crystallographer
		 Professor M. G. Replacement's 65th Birthday
	CIF Workshop at 1995 Annual ACA Meeting
	Order Form
	Affiliated Centers

------------------------------------------------------------------------


WHAT'S NEW AT THE PDB

In order to make it easier to exchange the structure factors, i.e., 
the observed experimental data from X-ray crystallographic experiments, 
the PDB, in close collaboration with a number of macromolecular 
crystallographers, has developed a proposed standard interchange 
format for structure factors. This standard is in CIF, which stands 
for the IUCr-developed `Crystallographic Information File'. It was 
chosen both for simplicity of design and for being clearly 
self-defining, i.e., the top of the file contains sufficient 
information for the rest of the file to be read and understood by 
either a program or a person. It thus allows for an archival record 
of the data to be stored for future use. This format will be easy 
to extend, just by adding additional tokens to the top of the file, 
as new crystallographic experimental methods are developed.

This project has been led by Dr. Vivian Stojanoff of the BNL Biology 
Department, who has worked closely with the Chairperson of the IUCr 
Working Group on Macromolecular CIF, Dr. Paula Fitzgerald, and other 
members 
of this committee. Vivian recently presented this standard at the CIF 
Workshop held during the Annual ACA Meeting in Montreal, Canada (see 
relevant articles in this Newsletter - one summarizing the Workshop and 
another presenting details of the format). We encourage all depositors 
of X-ray structures to submit their structure factors and to do so in 
this format. Over the years, the PDB has observed that one of the 
most useful reasons for archiving structure factors is to let the 
crystallographer who did the experiment retrieve his/her own data 
if they have been misplaced in the laboratory. Wider deposition will 
also likely foster a great deal of research into methods of structure 
determination and into validation techniques based on comparison of 
the final models with the experimental data. All of the presently 
deposited structure factors have been converted to this format and 
are available via WWW. In addition, the PDB is providing simple filters 
which convert between this sfCIF format and other prominent formats, 
such as those in PROLSQ, TNT, X-PLOR and the CCP4 package.

						 Joel L. Sussman

------------------------------------------------------------------------

MANAGING THE ARCHIVES

This article inaugurates a regular column that will discuss the 
contents and format of PDB coordinate entries. We plan to use this 
column to alert users to possible changes in our data representation 
procedures. We will summarize implementation schedules, major policy 
decisions, and related items. For example, we regularly receive 
comments and suggestions for improving both the stored data and 
access to it. Some of these suggestions have been incorporated into 
PDB entries, while others are still being evaluated. In the future the PDB 
will also provide data in CIF format using the mmCIF data dictionary. 
These and other changes in PDB operations require that we keep our 
users abreast of the issues - this column is intended to do just that.

The PDB is pleased to announce the appointment of Nancy Oeder Manning 
as Outreach Program Coordinator. Included in her responsibilities is 
the task of making sure that questions regarding entry content and 
format are resolved. If you would like to discuss the issues, please 
send e-mail to Nancy at oeder@bnl.gov and/or Enrique Abola at 
abola1@bnl.gov. You may also send a message to a larger audience via 
our listserver (pdb-l@pdb.pdb.bnl.gov).


We suggest keeping a close watch on the PDB Format Description 
document available through our Internet sites (WWW, FTP, and Gopher). 
This document will reflect changes in PDB Format and will serve as 
the reference for all questions regarding the content and format of 
PDB entries.

A common concern is the time frame for incorporating announced changes 
into existing entries. There are several reasons for the delay in 
implementation of changes. The first of these was the need to formulate 
a clear policy on how changes to the entry content and data format are 
to be introduced and implemented. We have developed the protocol 
detailed in the following article. The policy will become effective 
after a short period of public discussion.

 Proposed Changes

Professor George Sheldrick, University of Göttingen, Germany, suggested 
that we use B-equivalent in ATOM/HETATM records instead of U-equivalent 
when anisotropic temperature factors are included in an entry. This 
will obviate the need to check if ANISOU records are present before 
interpreting the contents of the B-value field. A number of entries 
recently deposited have anisotropic thermal parameters. Some of these 
resulted from studies by Keith Wilson and collaborators on very high 
resolution structures. George Sheldrick also deposited a structure, 
refined by SHELX, that includes anisotropic thermal parameters and he 
requested that the B-value field contain B-equivalent instead of 
U-equivalent.

We acknowledge the merit in this suggestion and do not foresee any 
major impediments to its adoption. Since only a few entries containing 
ANISOU records will be affected by this change, Professor Sheldrick's 
entry will, at his request, be released with B-equivalent values.
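
The arithmetic behind Professor Sheldrick's suggestion is simple: 
B = 8*pi**2 * U, and for anisotropic parameters U-equivalent is 
commonly taken as one third of the trace of the U tensor. The Python 
sketch below illustrates the conversion. It assumes the ANISOU layout 
of the PDB Format Description (U(1,1), U(2,2) and U(3,3) in columns 
29-49 as integers scaled by 10**4, in an orthogonal Angstrom frame); 
the function names are hypothetical and are not part of any PDB 
software.

```python
import math


def b_from_u(u):
    """Convert an isotropic displacement parameter U (A**2) to B (A**2)."""
    return 8.0 * math.pi ** 2 * u


def b_equivalent_from_anisou(line):
    """Return B-equivalent computed from one ANISOU record.

    Assumes U(1,1), U(2,2) and U(3,3) occupy columns 29-35, 36-42 and 43-49
    as integers scaled by 10**4 and refer to an orthogonal frame, so that
    U-equivalent is one third of the trace of the U tensor.
    """
    u11, u22, u33 = (int(line[start:start + 7]) / 1.0e4 for start in (28, 35, 42))
    u_eq = (u11 + u22 + u33) / 3.0
    return b_from_u(u_eq)
```

For example, b_from_u(0.25) gives about 19.74 A**2.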

 HET Groups

The number of heterogen groups stored in the PDB is rapidly growing. 
Users have been requesting improvements in representation and handling 
of these groups. There are a number of initiatives underway in response 
to these requests and suggestions. Most important is the collaboration 
recently initiated with the Cambridge Crystallographic Data Centre 
(CCDC). As part of this work CCDC has provided the PDB with PreQuest, 
the processing program they use in preparing new entries for inclusion 
into their database. This program will help PDB in representing and 
managing this growing data set.

This software allows us to build 2-D diagrams of the HET groups from 
the 3-D coordinates. It automatically generates an atom numbering 
scheme using a standard algorithm. Matching of the 2-D atom names 
to those used in the crystal study is also provided. It then becomes 
less important to follow conventions such as IUPAC when naming atoms. 
IUPAC conventions for atom nomenclature are difficult to use in 
computer-based applications. Attempts have been made to follow 
them over the years; however, as users are aware, violations have 
occurred because of the difficulties involved in representing 
heterogens.

The PDB will keep users abreast of major developments in this area. 
Our WWW server (http://www.pdb.bnl.gov) contains a link to our 
heterogen collection and also to other databases that provide 
additional information about these groups.

 CIF from PDB Entries

The latest macromolecular CIF (mmCIF) dictionary containing definitions 
for items normally found in PDB entries has recently been released by 
the working group commissioned by the IUCr. This dictionary uses the 
CIF standard and is available through PDB's Internet sites (WWW, FTP, 
and Gopher) for study and comment. PDB is mounting a major effort to 
read and write CIF coordinate entry files. Frances Bernstein is 
currently building a table to relate PDB fields to CIF tokens. Once 
this table is ready it will be shared with the community along with 
tools that have been developed for handling PDB entries in CIF.
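
To give a flavor of what such a table involves, here is a 
hypothetical Python fragment covering a single record, CRYST1. The 
column ranges follow the current PDB format; the CIF data names are 
assumptions drawn from the mmCIF dictionary and may not match the 
table now being built. The usage lines reuse the cell and space 
group of entry 1AAK.

```python
# Hypothetical fragment of a PDB-field -> CIF-token table for CRYST1.
# (CIF data name, first column index, end column index) in 0-based slices.
CRYST1_TO_CIF = [
    ("_cell.length_a", 6, 15),
    ("_cell.length_b", 15, 24),
    ("_cell.length_c", 24, 33),
    ("_cell.angle_alpha", 33, 40),
    ("_cell.angle_beta", 40, 47),
    ("_cell.angle_gamma", 47, 54),
    ("_symmetry.space_group_name_H-M", 55, 66),
    ("_cell.Z_PDB", 66, 70),
]


def cryst1_to_cif(line):
    """Return (CIF data name, value) pairs for one CRYST1 record."""
    return [(name, line[start:end].strip()) for name, start, end in CRYST1_TO_CIF]


def format_cif(pairs):
    """Write data items one per line, quoting values that contain blanks."""
    lines = []
    for name, value in pairs:
        if " " in value:
            value = "'" + value + "'"
        lines.append(name + " " + value)
    return "\n".join(lines)


if __name__ == "__main__":
    record = "CRYST1%9.3f%9.3f%9.3f%7.2f%7.2f%7.2f %-11s%4d" % (
        41.8, 44.9, 83.2, 90.0, 90.0, 90.0, "P 21 21 21", 4)
    print(format_cif(cryst1_to_cif(record)))
```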

Comments on this column should be sent to Enrique Abola 
(abola1@bnl.gov) or Nancy Oeder Manning (oeder@bnl.gov).

------------------------------------------------------------------------

PDB FORMAT CHANGE POLICY

The PDB will use the following protocol in making changes to the way 
PDB coordinate entries are represented and archived. The purpose of 
the new policy is to allow ample time for everyone to understand these 
changes and to assess their impact on existing programs. These 
modifications are necessary to address the changing needs of our 
users as well as the changing nature of the data that is archived.

   1. Comments and suggestions will be solicited from the community 
      on specific problems and data representation issues as they 
      arise.

   2. Proposed format changes will be disseminated through the PDB 
      Listserver (pdb-l@pdb.pdb.bnl.gov) and PDB's Internet sites 
      (WWW, FTP, and Gopher). They will also be summarized in the 
      PDB Quarterly Newsletter.

   3. A sixty-day discussion period will follow the announcement of 
      proposed changes. Comments and suggestions must be received 
      within this time period. Major changes which are not upwardly 
      compatible will be allotted up to twice the standard amount of 
      discussion time.

   4. This sixty-day discussion period will be followed by a 
      thirty-day period in which the PDB staff, the PDB Advisory 
      Board, and the User Group Chair will evaluate and reconcile all 
      suggestions. The final decision pertaining to the format change, 
      which lies with the Advisory Board Chair, will then be officially 
      announced via the PDB Listserver and PDB's Internet sites (WWW, 
      FTP, and Gopher).

   5. Implementation will follow official announcement of the format 
      change. Major changes will not appear in PDB files earlier than 
      sixty days after the announcement, allowing sufficient time to 
      modify files and programs.

   6. Changes will be released no more than twice a year, unless 
      extraordinary circumstances require action. This will be done 
      only in consultation with the Advisory Board and following the
      usual ninety-day discussion and evaluation period.
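
As a rough planning aid, the periods in items 3 to 5 can be added up. 
The Python sketch below assumes that the decision is announced as 
soon as the thirty-day evaluation ends, that a change which is not 
upwardly compatible uses the full doubled discussion period (the 
worst case of "up to twice"), and that the sixty-day lead time of 
item 5 applies; the function name is hypothetical.

```python
from datetime import date, timedelta

# Periods taken from the policy above.
DISCUSSION_DAYS = 60  # item 3: standard public discussion period
EVALUATION_DAYS = 30  # item 4: evaluation and reconciliation
LEAD_TIME_DAYS = 60   # item 5: minimum delay before major changes appear


def change_schedule(announced, not_upward_compatible=False):
    """Return (decision date, earliest implementation date) for a proposal.

    Assumes the decision is announced as soon as the evaluation period ends
    and that a non-upward-compatible change uses the full doubled discussion
    period, i.e. the worst case.
    """
    discussion = DISCUSSION_DAYS * (2 if not_upward_compatible else 1)
    decision = announced + timedelta(days=discussion + EVALUATION_DAYS)
    return decision, decision + timedelta(days=LEAD_TIME_DAYS)
```

For a proposal announced on 1 October 1995, change_schedule(date(1995, 
10, 1)) returns a decision date of 30 December 1995 and an earliest 
implementation date of 28 February 1996.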

The PDB format has been in use since the late 1970s. A number of 
groups, including the mmCIF Committee, have been looking at ways to 
upgrade both the file content and the interchange format used by PDB. 
This is clearly needed because of changes in the data that PDB 
archives and in the size of the database itself, and to allow us to 
use more up-to-date methods for representing and storing biological data.

The PDB plans to be prudent and deliberate in making changes to the
current PDB files in order to minimize the need to change existing 
programs. In particular, we will explore ways and means of ensuring 
that programs which read the current ATOM/HETATM records can continue 
to do so in the foreseeable future.

Finally, we wish to acknowledge Dr. Gerald Selzer of the National 
Science Foundation who urged us to formulate this policy.

Any questions, comments, or suggestions should be sent to 
Joel Sussman (jls@bnl.gov), Enrique Abola (abola1@bnl.gov), or 
Dave Stampf (drs@bnl.gov).

------------------------------------------------------------------------

TO BE ALA OR NOT TO BE ALA? THAT IS THE QUESTION.

The SEQRES records in each PDB entry represent the complete sequence 
of the molecule studied. The sequence that appears on ATOM records 
must match the SEQRES records. It is essential that each depositor 
answer the sequence-related questions in the PDB Electronic Deposition 
Form.

One issue that continues to cause problems is addressed in the 
Deposition Form but often not handled correctly in the coordinate 
files that are actually sent to the PDB. Following is text taken from 
the Deposition Form:

	Some procedures force users to identify residues as 
	ALA when side-chain atoms are missing from the density. 
	However, the correct residue name must be used in your 
	deposition, both in the coordinate and SEQRES records.

It is important to recognize the difference between a residue that 
is identified as ALA because it actually is alanine and a residue for 
which the side chain atoms could not be located in the electron 
density maps. In the latter case the residue must be given the 
correct residue name. When these instructions are not followed, we 
have a great deal of difficulty in determining whether a residue that 
does not match the sequence database is a mutation that we were not 
told about, a misnamed ALA (for example), or a mismatch for some 
other reason.
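
Depositors may wish to check their own files before submission. The 
Python sketch below flags residues whose name on the ATOM records 
differs from the SEQRES records, which is how a truncated side chain 
deposited as ALA would show up. It reads the fixed PDB columns for 
SEQRES and ATOM records and, as a hypothetical simplification, 
assumes that residue number n corresponds to the n-th SEQRES residue 
of the same chain; entries with other numbering schemes need a 
proper sequence alignment. The function names are illustrative only.

```python
def seqres_by_chain(lines):
    """Collect SEQRES residue names for each chain ID (fixed PDB columns)."""
    chains = {}
    for line in lines:
        if line.startswith("SEQRES"):
            chains.setdefault(line[11], []).extend(line[19:].split())
    return chains


def atom_residue_names(lines):
    """Map (chain ID, residue number) to the residue name on ATOM records."""
    residues = {}
    for line in lines:
        if line.startswith("ATOM"):
            key = (line[21], int(line[22:26]))
            residues.setdefault(key, line[17:20].strip())
    return residues


def name_mismatches(lines):
    """List (chain, number, ATOM name, SEQRES name) where the names disagree.

    Hypothetical simplification: residue number n is assumed to correspond
    to the n-th SEQRES entry of the same chain.
    """
    seqres = seqres_by_chain(lines)
    mismatches = []
    for (chain, number), name in atom_residue_names(lines).items():
        sequence = seqres.get(chain, [])
        if 1 <= number <= len(sequence) and sequence[number - 1] != name:
            mismatches.append((chain, number, name, sequence[number - 1]))
    return mismatches
```

For a hypothetical chain A whose SEQRES records read GLY LYS SER but 
whose residue 2 was deposited as ALA, name_mismatches would report 
('A', 2, 'ALA', 'LYS').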

In a similar vein, there is a distinction that must be made between 
the following possible situations:

   1. The atoms of residues 45 - 50 could not be located in the 
      experiment. In this case residues 45 - 50 will be included 
      in the SEQRES records although no coordinates are present 
      in the PDB entry.

   2. A recombinant experiment was done in which residues 45 - 50 were 
      deleted, so they are not present in the protein and residue 51 
      forms a normal peptide bond with residue 44. In this case the 
      SEQRES records will not include residues 45 - 50 because they 
      were not present.

   3. Residues 45 - 50 are excised from the protein and there are 
      now two chains instead of one. In this case the protein will 
      be presented as two separate chains and residues 45 - 50 will 
      not appear in the PDB entry.
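
For situation 1, the residues lacking coordinates can be listed as 
runs of SEQRES positions. The Python sketch below assumes that 
residue number n corresponds to the n-th SEQRES residue, which will 
not hold in situation 2 if the deletion mutant keeps the wild-type 
numbering; in situation 3 each chain would be checked separately. 
The function name is hypothetical.

```python
def missing_ranges(seqres_length, observed_numbers):
    """Return (first, last) runs of SEQRES positions that have no coordinates.

    Assumes residue number n on the ATOM records corresponds to the n-th
    SEQRES residue of the same chain (a hypothetical simplification).
    """
    observed = set(observed_numbers)
    ranges, start = [], None
    # Walk one position past the end so a run at the C terminus is closed.
    for position in range(1, seqres_length + 2):
        missing = position <= seqres_length and position not in observed
        if missing and start is None:
            start = position
        elif not missing and start is not None:
            ranges.append((start, position - 1))
            start = None
    return ranges
```

For a hypothetical 60-residue chain with coordinates for residues 
1 - 44 and 51 - 60, missing_ranges returns [(45, 50)].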

Providing the necessary information on the Deposition Form will allow 
us to prepare entries for your structure more quickly and, most 
importantly, correctly.

------------------------------------------------------------------------

PDB STRUCTURE FACTOR FILES IN CIF - A PROPOSAL

A standard ASCII interchange format for structure factor files based 
on the CIF format has been developed by the PDB in collaboration with 
several macromolecular crystallographers. The work was presented by 
its primary developer, Dr. Vivian Stojanoff of the BNL Biology 
Department, at the CIF Workshop held during the ACA Annual Meeting in 
Montreal, Canada. The importance of a standard for the submission and 
retrieval of structure factor data as well as its role in structure 
validation procedures was discussed.

Structure factor files are received by the PDB for archiving and 
redistribution in a variety of formats. These files have to be 
converted by users to the different data input formats required by 
analysis programs. The purpose of the new standard is to provide a 
single format for submitting structure factors to, and retrieving 
them from, the PDB. In addition, it will facilitate the interchange 
of data between 
laboratories. The proposed standard is in CIF and was prepared in 
consultation with Dr. Paula Fitzgerald, Chairperson of the IUCr 
Working Group on Macromolecular CIF. A number of the data names 
presented should be considered provisional, as they have not yet 
been formally approved by the IUCr. However, no substantial changes 
are anticipated in this part of the mmCIF dictionary.

The proposed standard is compatible with most crystallographic 
packages, such as CCP4, PROLSQ, TNT, X-PLOR, SHELXL93, SFTools, 
etc. A parser (filter) which converts this format into other 
standard formats will be distributed by the PDB. The examples 
presented below illustrate the proposed format. Included in the 
standard are provisions to accommodate a variety of data items 
from experiments such as multiple wavelength anomalous scattering. 
A full description and other examples may be accessed through PDB's 
Internet sites (WWW, FTP, and Gopher).

Comments and suggestions are welcome; please send them to Enrique Abola 
(abola1@bnl.gov).

 Examples of SF Files in CIF

  - First Example

This example is one of the simpler possible cases, in which the 
depositor has provided measured and calculated structure factors 
and calculated phases.

Those reflections that have _refln.status set to 'o' were considered 
observed and were used in refinement. The reflection with _refln.status 
set to 'f' indicates that the reflection was considered observed but 
has been excluded from refinement for the purposes of calculating an 
R-free value.

data_r1aaksf

loop_
_database_2.database_code
_database_2.database_id
pdb_sf 'R1AAKSF'
pdb_coords '1AAK'

_audit.creation_date	'1992-04-08'
loop_
_audit_author.name
'COOK,W.J.'
'JEFFREY,L.C.'
'SULLIVAN,M.L.'
'VIERSTRA,R.D.'

#COMPND	 UBIQUITIN CONJUGATING ENZYME (E.C.6.3.2.19)
#SOURCE	 (ARABIDOPSIS THALIANA)
#AUTHOR	 W.J.COOK,L.C.JEFFREY,M.L.SULLIVAN,R.D.VIERSTRA
#REMARK  2
#REMARK	 2   RESOLUTION. 2.4 ANGSTROMS.
#REMARK	 3
#REMARK	 3   REFINEMENT.
#REMARK	 3   PROGRAM 1             X-PLOR
#REMARK	 3   AUTHORS 1	           BRUNGER
#REMARK	 3   PROGRAM 2	           PROLSQ
#REMARK	 3   AUTHORS 2	           KONNERT,HENDRICKSON
#REMARK	 3   R VALUE	           0.22
#REMARK	 3   RMSD BOND DISTANCES   0.025 ANGSTROMS
#REMARK	 3   RMSD BOND ANGLES	   2.8 DEGREES
#CRYST1	 41.800	44.900  83.200  90.00	90.00  90.00  P  21  21   21  4

loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas_au
_refln.F_calc_au
_refln.phase_calc
_refln.status
 0	0	18	553.600	  538.455   180.00	o
 0	0	20	783.700	  781.204    80.00	o
.
.
.
17	3	 4	 21.100	  145.713     23.03	o
17	3	 	 63.800	   81.747    291.83	f
17	4	 0	 52.000	   40.654    270.02	o
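
The self-defining layout makes such files easy to read by program. 
The Python sketch below, which is not the PDB's filter, pulls the 
_refln loop out of a file like the one above, separates the working 
set (status 'o') from the R-free set (status 'f'), and computes 
R = sum|F_meas - F_calc| / sum F_meas for a set. It assumes unquoted, 
whitespace-separated values, treats text after '#' as a comment, and 
applies no scale factor between F_meas_au and F_calc_au. The printed 
listing is abbreviated, so the '.' lines (and any incomplete row) 
would have to be removed before running it on this text. The 
function names are hypothetical.

```python
def read_loop(text, category):
    """Return the rows of the first loop_ whose data names start with category.

    Minimal sketch, not a general CIF parser: values are assumed to be
    unquoted and whitespace-separated, and text after '#' is a comment.
    """
    tokens = []
    for line in text.splitlines():
        tokens.extend(line.split("#", 1)[0].split())
    prefix = category + "."
    for i, token in enumerate(tokens):
        if token != "loop_" or i + 1 >= len(tokens):
            continue
        if not tokens[i + 1].startswith(prefix):
            continue  # a loop belonging to another category
        j = i + 1
        names = []
        while j < len(tokens) and tokens[j].startswith(prefix):
            names.append(tokens[j][len(prefix):])
            j += 1
        values = []
        while (j < len(tokens) and tokens[j] != "loop_"
               and not tokens[j].startswith(("_", "data_"))):
            values.append(tokens[j])
            j += 1
        width = len(names)
        # Any trailing partial row is dropped.
        return [dict(zip(names, values[k:k + width]))
                for k in range(0, len(values) - width + 1, width)]
    return []


def split_by_status(rows):
    """Separate the working set (status 'o') from the R-free set (status 'f')."""
    working = [row for row in rows if row["status"] == "o"]
    free = [row for row in rows if row["status"] == "f"]
    return working, free


def r_factor(rows):
    """R = sum|F_meas - F_calc| / sum F_meas, with no scale factor applied."""
    f_meas = [float(row["F_meas_au"]) for row in rows]
    f_calc = [float(row["F_calc_au"]) for row in rows]
    if not f_meas:
        raise ValueError("no reflections in this set")
    return sum(abs(fo - fc) for fo, fc in zip(f_meas, f_calc)) / sum(f_meas)


if __name__ == "__main__":
    sample = """loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas_au
_refln.F_calc_au
_refln.phase_calc
_refln.status
 0 0 18 553.600 538.455 180.00 o
 0 0 20 783.700 781.204 80.00 o
17 4 0  52.000  40.654 270.02 o
"""
    working, free = split_by_status(read_loop(sample, "_refln"))
    print(len(working), len(free), round(r_factor(working), 3))
```

With so few reflections the resulting number is only a smoke test; 
in practice R is computed over the whole set after scaling.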

  - Second Example

In this example, the depositor has provided measured and calculated 
structure factors, calculated phases, measured sigmas, and three 
sets of heavy-atom derivative data.

data_r1bsrsf

loop_
_database_2.database_code
_database_2.database_id
pdb_sf 'R1BSRSF'
pdb_coords '1BSR'

_audit.creation_date	'1993-04-28'
loop_
_audit_author.name
'MAZZARELLA,L.'

#COMPND	RIBONUCLEASE (BOVINE, SEMINAL) (BS-RNASE)
#SOURCE	BOVINE (BOS TAURUS) SEMINAL FLUID
#AUTHOR	L.MAZZARELLA
#REMARK  2
#REMARK	 2   RESOLUTION. 1.9 ANGSTROMS.
#REMARK	 3
#REMARK	 3   REFINEMENT.
#REMARK	 3   PROGRAM               X-PLOR
#REMARK	 3   AUTHORS               BRUNGER
#REMARK  3   R VALUE	           0.177
#REMARK	 3   RMSD BOND DISTANCES   0.020 ANGSTROMS
#REMARK	 3   RMSD BOND ANGLES	   3.70 DEGREES
#CRYST1	 36.500	 66.700	 107.500  90.00	 90.00  90.00  P  2  21  21  4

loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas_au
_refln.F_calc_au
_refln.phase_calc
_refln.F_meas_sigma_au
_refln.status
 0	0	4	197.8	1351.0	-180.0	 1.9	o
 0	0	6	733.2	 807.0	   0.0	 7.0	o
.
.
.
19	3	1	157.0	 131.2	-145.0	24.2	o
19	3	2	207.6	 199.7	 -11.8	23.5	o
19	3	4	273.8	 374.2	  64.1	18.5	o
loop_
_phasing_mir.der_id
_phasing_mir.der_details
HgCl4 'best derivative'
AgNO3_1 'silver nitrate at low concentration - two sites'
AgNO3_2 'silver nitrate at high concentration - six sites'

loop_
_phasing_mir_refln.index_h
_phasing_mir_refln.index_k
_phasing_mir_refln.index_l
_phasing_mir_refln.der_id
_phasing_mir_refln.F_meas_au
_phasing_mir_refln.F_calc_au
_phasing_mir_refln.phase_calc
_phasing_mir_refln.F_meas_sigma_au
0	0	4	HgCl4	 197.8	 1351.0	  -180.0     1.9
0	0	4	AgNO3_1	 206.7	 1462.0	  -180.0    12.3
0	0	4	AgNO3_2	 367.3	 1551.0	  -180.0    36.7
0	0	6	HgCl4	 733.2	  807.0	     0.0     7.0
0	0	6	AgNO3_2	 856.6	  912.0	     0.0    15.4
	.
	.
	.
19	3	1	HgCl4	 157.0	 131.2	  -145.0    24.2
19	3	1	AgNO3_1	 142.2	 156.3	  -112.0    12.7
19	3	1	AgNO3_2	 168.3	 142.3	  -162.0    29.6
19	3	2	AgNO3_1	 207.6	 199.7	   -11.8    23.5
19	3	2	AgNO3_2	 183.4	 201.9	    15.0    25.2
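
Because each derivative measurement is a separate row keyed by 
der_id, a reflection need not appear for every derivative; above, 
0 0 6 has no AgNO3_1 entry and 19 3 2 has no HgCl4 entry. The Python 
sketch below, with hypothetical function and variable names, regroups 
the _phasing_mir_refln rows into one record per reflection. It 
assumes the eight data names in the order listed above and simply 
skips lines that do not have eight fields, such as the '.' elision 
marks.

```python
from collections import defaultdict

# Data names of the _phasing_mir_refln loop, in the order of the example.
MIR_FIELDS = ["index_h", "index_k", "index_l", "der_id",
              "F_meas_au", "F_calc_au", "phase_calc", "F_meas_sigma_au"]


def group_by_reflection(data_lines, fields=MIR_FIELDS):
    """Return {(h, k, l): {der_id: (F_meas, sigma)}} from loop data lines."""
    table = defaultdict(dict)
    for line in data_lines:
        values = line.split()
        if len(values) != len(fields):
            continue  # skip elision marks and incomplete rows
        row = dict(zip(fields, values))
        hkl = (int(row["index_h"]), int(row["index_k"]), int(row["index_l"]))
        table[hkl][row["der_id"]] = (float(row["F_meas_au"]),
                                     float(row["F_meas_sigma_au"]))
    return table


if __name__ == "__main__":
    rows = [
        "0 0 4 HgCl4 197.8 1351.0 -180.0 1.9",
        "0 0 6 HgCl4 733.2 807.0 0.0 7.0",
        "0 0 6 AgNO3_2 856.6 912.0 0.0 15.4",
    ]
    for hkl, derivatives in sorted(group_by_reflection(rows).items()):
        print(hkl, sorted(derivatives))
```

Keeping one row per derivative measurement avoids placeholder values 
for derivatives that were not measured at a given reflection.
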
------------------------------------------------------------------------

DALI - SERVER FOR PROTEIN STRUCTURE DATABASE SEARCHES IN 3D

	This article was written by Chris Sander, European 
	Molecular Biology Laboratory - European Bioinformatics 
	Institute, Heidelberg, Germany. It describes a service
	which may be useful to PDB users.

A database search service allowing exploration of the PDB for structural 
similarities is now available on WWW. Developed by Liisa Holm and 
Chris Sander of the European Molecular Biology Laboratory - European 
Bioinformatics Institute (EMBL-EBI) in Heidelberg, with help and input 
from Antoine Daruvar and Reinhard Schneider, the server takes protein 
structure coordinates and returns a list of all protein structures 
judged as similar by the Dali method. Typically, users may want to:

   - establish whether a newly determined structure has a unique fold

   - find structural homologues not detectable at the sequence level

   - explore the spectrum of currently known protein folds

   - extract remotely homologous structures for testing of threading 
     methods

To request a search, go to the PDB WWW site or go directly to 
http://www.embl-heidelberg.de/dali/dali.html and follow instructions. 
The currently preferred mechanism for submission of search coordinates 
is via e-mail to dali@embl-heidelberg.de.

If a user is interested in knowing the structural neighbors of a PDB 
protein complete with alignment information or would like to see the 
tree organization of all known structures, the Dali search server, 
which is linked to the FSSP database [L. Holm and C. Sander, Nucleic 
Acids Res. 22, 3600-9 (1994); L. Holm and C. Sander, J. Mol. Biol. 
233, 123-38 (1993); L. Holm and C. Sander, TIBS in press (Sept. 1995)], 
holds the results of a continuously updated all-against-all 
comparison of known protein structures in the PDB. For each 
protein in the PDB, its structural neighbors can be retrieved, 
complete with alignment information, and, if needed, protein 
coordinates optimally superimposed in a common frame of reference.

If a user would like to inspect sequence families grouped around each 
protein structure, the Dali server makes use of the continuously 
updated HSSP database of PDB/Swissprot alignments [C. Sander and 
R. Schneider, Nucleic Acids Res. 22, 3597-9 (1994)]. For example, 
the Swissprot sequence families (aligned using sequence alignment) 
of two remote homologues (aligned in 3D) can be retrieved aligned to 
each other according to the 3D alignment, e.g., the sequence families 
of actin and hsp70.

For further information, contact Liisa Holm at the EMBL-EBI 
(holm@embl-ebi.ac.uk).

------------------------------------------------------------------------

WHATCHECK - TOOL FOR VERIFICATION OF PROTEIN STRUCTURES

	This article was written by Rob Hooft, European Molecular 
	Biology Laboratory, Heidelberg, Germany. It describes a
	tool which may be useful to PDB users.

A new stand-alone version of the protein structure checking tools 
developed as part of the WHAT IF modeling system is now available 
for downloading. The stand-alone version, called WHATCHECK, was 
developed by Rob Hooft and Gerrit Vriend of the European Molecular 
Biology Laboratory (EMBL) in Heidelberg and is being made freely 
available to academic and commercial users alike. Crystallographers, 
NMR spectroscopists, and modelers interested in using the software 
in their own lab should free 80 MBytes of disk space on a Silicon 
Graphics machine and download executable and source code via 
anonymous FTP from Heidelberg (swift.embl-heidelberg.de/whatcheck), 
from the PDB via anonymous FTP (ftp.pdb.bnl.gov/pub/whatcheck), or 
from the PDB WWW site (http://www.pdb.bnl.gov). Versions for other 
systems may be made available if there is sufficient demand.

The methods in WHATCHECK are based on several years of experience 
analyzing protein models and include the following:

   - analysis of packing quality

   - analysis of backbone conformation and rotamers using a 
     position-specific residue database

   - analysis of the hydrogen bond network, resulting in corrections 
     to HIS/GLN/ASN side-chain orientations and HISD/HISE/HISH 
     assignments

   - temperature factor analysis

   - water molecule position checks

   - IUPAC and IUCr convention checks 

   - a number of geometric checks

WHATCHECK functionality is also part of the Biotech Validation server 
available from EBI, PDB, and EMBL.

Questions or suggestions regarding WHATCHECK should be directed via 
e-mail to Rob.Hooft@EMBL-Heidelberg.DE.

------------------------------------------------------------------------

BIOTECH PROTEIN STRUCTURE VALIDATION SERVER NOW ON-LINE

	This article was written by Chris Sander, European 
	Molecular Biology Laboratory - European Bioinformatics 
	Institute, Heidelberg, Germany. It describes a service 
	which may be useful to PDB users.

The Biotech partners, funded by the Commission of the European Union, are: 

	Shoshana Wodak, Joan Pontius, Alexei Vagin, Jean Richelle
		Université Libre, Bruxelles, BE

	Keith Wilson, Victor Lamzin
		EMBL-HH, Hamburg, DE

	Chris Sander, Rob Hooft, Gerrit Vriend, Michael Scharf
		EMBL-HD, Heidelberg, DE

	Janet Thornton, Roman Laskowski, Malcolm MacArthur
		University College London, UK

	Rob Kaptein, Ton Rullmann, Jurgen Doreleijers
		Bijvoet Center, Universiteit Utrecht, NL

	Eleanor Dodson, Gideon Davies, Jan Zelinka, Garib Murshudov
		University of York, UK

These partners, in collaboration with the EBI Outstation of the 
European Molecular Biology Laboratory (EMBL-EBI) and the Protein 
Data Bank, Brookhaven National Laboratory (PDB, BNL), announce a 
new WWW service for validating protein structures. The server takes 
a set of protein coordinates in PDB format and returns a set of 
check reports that evaluate various stereochemical, geometric, 
and physical properties. The evaluation is based on careful and 
systematic comparison with values derived from a database of 
well-determined structures. These checks may be particularly useful 
to crystallographers and NMR spectroscopists in the final stages of 
structure determination prior to submission of model coordinates to 
the public databases. They may also be used to check protein models 
built by theoretical methods, such as homology modeling. 

The service is now available from three sites:

	European Bioinformatics Institute
	EMBL-EBI, Hinxton Hall, Cambridge, UK
		http://saturn.embl-ebi.ac.uk:8400/
		or http://www.embl-ebi.ac.uk/

	Protein Data Bank
	Brookhaven National Laboratory
	Upton, New York, USA
		http://biotech.pdb.bnl.gov:8400/
		or http://www.pdb.bnl.gov/

	European Molecular Biology Laboratory
	EMBL-HD, Heidelberg, DE
		http://www.embl-heidelberg.de:8400/
		or http://www.sander.embl-heidelberg.de/

For further information, contact Chris Sander at the EMBL-EBI 
(Chris.Sander@embl-heidelberg.de).

------------------------------------------------------------------------

NOTES OF A PROTEIN CRYSTALLOGRAPHER

 Professor M. G. Replacement's 65th Birthday

	This article was written by Cele Abad-Zapatero, 
	Abbott Laboratories, Abbott Park, IL, USA. He 
	intends to contribute regularly under this heading. 
	If you have comments or suggestions please contact 
	him at abad@abbott.com.

Nowadays, it is possible that many older practitioners in the field 
of macromolecular crystallography, or even some novices, experience 
a sense of déjà vu when reading the methodology section of many 
crystallographic papers. They have encountered many times sentences 
like: `the structure was solved by the method of Molecular Replacement 
as implemented in the program suite [...]....'. Or, in a different 
context: `the initial phases were improved by non-crystallographic 
electron density averaging between the N copies in the asymmetric unit 
[...]'. Yet the ideas, concepts, and even the terminology were terra 
incognita only thirty years ago. I do not intend to explain the method 
here, nor do I intend to statistically show how many macromolecular 
structures have been solved by the explicit or implicit use of those 
ideas. I just want to pay homage to the person who, approximately 
thirty years ago, had the intellectual vision of using the 
conservation of protein folds and the `redundancy' of information 
in the asymmetric unit to aid in the structure solution of 
macromolecular structures.

Professor Molecular G. Replacement was born in Frankfurt, Germany on 
July 30, 1930 and emigrated to England with a member of his family 
when he was nine years old. He obtained his B.Sc. from the University 
of London in 1950, M.Sc. in 1953 and later a Ph.D. in Chemistry from 
the University of Glasgow. He did his postdoctoral research from 1956 
through 1958 with Professor W. Lipscomb in Minneapolis. Inspired by a 
lecture by Dorothy Hodgkin to work on the determination of the crystal 
structure of biological macromolecules, he returned to England to work
with Max Perutz on the structure of haemoglobin at the MRC during the 
exciting years spanning from 1958 to 1964. From then on, his ideas, 
programs, structures, and papers have had a tremendous influence in 
the field of macromolecular crystallography.

He married a remarkable woman, Audrey Pearson, in 1954 and they 
have three children (Alice, Martin, and Heather). Many veteran 
crystallographers will remember the very first `cartoon' sketches 
of lactate dehydrogenase (LDH). Following Anders Liljas' suggestion, 
they were drawn by Audrey to simplify the wanderings of the 
polypeptide chain in space. Those drawings that today are so 
commonplace in all kinds of colors, shades and hues, have their 
origin in those crude hand-drawn diagrams. Incidentally, she is 
also a superb potter.

Some may wonder why there is this sudden interest in recognizing 
Professor M. G. Replacement. I must confess that his 65th birthday is 
only partly the reason. The idea originated when I recently saw a paper 
in one of the leading journals of our field where a structure had been 
solved by phase refinement using electron density averaging and there 
was no reference to Professor M. G. Replacement. At first, thinking 
it was pretty sad, I decided to write this note. But then, on second 
thought, I realized that in the end it may be a great honor to 
pass into oblivion. Do we quote Sir Isaac Newton every time 
we use the principle of inertia? Do we refer to the author of the 
Principia every time we use any of his equations? Certainly not. His 
work is ingrained in the fabric of our science and our culture. 
Similarly, the work of Professor M. G. Replacement is now, and for 
evermore, part of the framework of macromolecular crystallography.

Taking some literary liberties, I would like to finish this brief 
homage with an adaptation of a well-known American folk song. I leave 
it to the reader to find out the first and last names of 
Professor M. G. Replacement, his favorite hobby (I already disclosed 
the name of his wife and partner), and the name of the river that 
crosses the city in Indiana where both have lived since 1964.

	M... sail the boat ashore, Hallelujah
	M... sail the boat ashore, Hallelujah

	Audrey help to trim the sails, Hallelujah
	Audrey help to trim the sails, Hallelujah

	Wabash river is chilly and cold, Hallelujah
	Freezes the body but not the soul, Hallelujah

	Wabash river is deep and wide, Hallelujah
	Milk and honey on the other side, Hallelujah

------------------------------------------------------------------------

CIF WORKSHOP AT 1995 ANNUAL ACA MEETING

	This article was written by Philip Bourne, San Diego 
	Supercomputer Center (SDSC), San Diego, CA, USA.

The Crystallographic Information File (CIF) is a data representation 
and exchange format commonly used in small-molecule crystallography. 
Most small-molecule structure solution packages can produce CIF files, 
which are a common form of submission for papers to Acta 
Crystallographica Section C (Acta C).

With a macromolecular CIF (mmCIF) dictionary and a powder diffraction 
dictionary now available, and other dictionaries in progress 
(modulated structures, meta graphics, amino acid and nucleotide 
properties, etc.), a Workshop describing these dictionaries and the 
new software that uses them was considered timely. The Workshop was 
sponsored by COMCIFs, the IUCr-appointed committee that oversees the 
CIF standard, was organized by Phil Bourne of SDSC, and took place 
during the 1995 Annual ACA Meeting in Montreal, Canada. Approximately 
forty people attended.

Bourne opened the Workshop with a brief discussion of the history of 
CIF and described how similar efforts are only now starting to emerge 
in other disciplines. He emphasized that the crystallographic community 
should recognize that its level of formalized data description is, at 
present, unique among scientific disciplines. This offers new 
opportunities for software development, as well as better access, 
through comprehensive databases, to a fast-growing body of data.

Syd Hall (University of Western Australia), the founder of CIF, 
described the history of CIF itself and of the Self-defining Text 
Archival and Retrieval (STAR) format from which it is derived. STAR is 
a simple set of rules from which a Dictionary Definition Language 
(DDL) can be derived. The DDL defines the form of the various 
dictionaries and, as Hall pointed out, is critical to the development 
of good software. He also described CIFtbx (CIF Toolbox), a set of 
Fortran routines for basic reading and writing of CIF files (ftp site 
130.95.232.12), and emphasized the need for further software 
development, particularly for reading and browsing CIF files.

Brian McMahon (IUCr) described CIF processing at the IUCr offices in 
Chester and the software developed to process CIF-based submissions 
to Acta C. Of the 582 submissions to Acta C this year, 76 percent were 
CIFs that included the text of the paper. Submissions received in 
hardcopy form are first converted into a CIF, and all submissions 
undergo data checking and subsequent conversion into an easily read 
form. Once accepted, a paper may be converted automatically into a 
final typeset version in the style of Acta C. McMahon also reported 
on three recent developments of great benefit to authors. The first is 
a booklet entitled `A Guide to CIF for Authors.' The second is the 
ability to submit a CIF file via e-mail to checkcif@iucr.ac.uk; in 
return, the submitter receives a detailed report of errors and 
potential errors. This includes syntax errors, data items not 
conforming to the official dictionaries (a potential error), missing 
data items, data items with unusual values, derived values, and an 
indication that higher symmetry than reported may exist. The third is 
printcif: once the file has passed checkcif, it can be sent to 
printcif@iucr.ac.uk, which produces a PostScript-formatted version of 
the file with any problematic values highlighted. This formatted 
version is not used by the journal, but is easy to read and correct.
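
Several of the checks McMahon listed can be pictured, in much 
simplified form, as a comparison of each data block against a 
dictionary. The Python sketch below is not checkcif and makes no claim 
about how checkcif works. The ItemDef class, its `usual' ranges and 
the dictionary contents in the example are hypothetical. It covers 
only three of the checks mentioned: data names missing from the 
dictionary, non-numeric or unusual values, and missing mandatory 
items. Syntax checking, derived values and the search for higher 
symmetry are not attempted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemDef:
    """Hypothetical dictionary entry for one data item."""
    numeric: bool = False
    low: Optional[float] = None    # lower end of the 'usual' range
    high: Optional[float] = None   # upper end of the 'usual' range
    mandatory: bool = False


def check_block(block, dictionary):
    """Return (severity, message) pairs for one block of scalar items."""
    report = []
    for name, value in block.items():
        item = dictionary.get(name)
        if item is None:
            # Undefined names are reported only as potential errors.
            report.append(("warning", f"{name} is not in the dictionary"))
            continue
        if not item.numeric or value in ("?", "."):
            continue  # '?' (unknown) and '.' (inapplicable) are not checked
        # A CIF number may carry a standard uncertainty, e.g. 90.0(2).
        try:
            number = float(value.split("(")[0])
        except ValueError:
            report.append(("error", f"{name}: {value!r} is not a number"))
            continue
        too_low = item.low is not None and number < item.low
        too_high = item.high is not None and number > item.high
        if too_low or too_high:
            report.append(("warning", f"{name}: unusual value {value}"))
    for name, item in dictionary.items():
        if item.mandatory and name not in block:
            report.append(("error", f"{name} is missing"))
    return report
```

For instance, if a hypothetical dictionary gives _cell.length_a a 
usual range of 2 to 1000 and marks it mandatory, a block containing 
_cell.length_a 5000.0 yields the warning `_cell.length_a: unusual 
value 5000.0'.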

Paula Fitzgerald (Merck Sharp and Dohme Research), Chairperson of 
the mmCIF Working Group, presented an overview of the mmCIF 
dictionary and an update on recent progress. The current mmCIF 
dictionary comprises over 20,000 lines and contains descriptions of 
several thousand data items organized into over 300 categories. The 
categories are in turn organized into groups to provide a 
hierarchical representation that traces the progress of the 
crystallographic experiment and the subsequent structure. The 
dictionary is the product of five years of volunteer effort, and the 
result is comprehensive and a very useful reference work. The draft 
mmCIF dictionary is finished except for the incorporation of changes 
identified by the community and final editorial checking by COMCIFs. 
The current draft dictionary can be found as an ASCII file at the WWW 
site http://ndbserver.rutgers.edu/mmCIF or on PDB's WWW home page, 
together with a limited set of examples and introductory material; 
the contents of the site will be expanded in the near future. A 
listserver has also been established. To subscribe, send an e-mail 
message to mmciflist@ndbserver.rutgers.edu containing the words: 
subscribe mmciflist your-name. Version 1.0 of the dictionary will be 
available within three to six months.

Eldon Ulrich (University of Wisconsin) described progress with the 
NMR dictionary (NMRif). This project began at the April 10, 1994 
Workshop, Biological Macromolecular NMR Data Exchange and Archiving, 
organized by members of BioMagResBank and the mmCIF committee. The 
CIF and mmCIF concepts and their development were described to 
scientists from the NMR community and proposed as a format for 
macromolecular NMR data exchange. BioMagResBank, under the direction 
of Eldon Ulrich, agreed to undertake the task of developing the 
dictionary with the help and advice of volunteers from the NMR 
community. The dictionary is represented as a relational database 
using a schema design tool called Opossum. Development of the 
relational schema has been carried out in collaboration with Miron 
Livny and Yannis Ioannidis, computer scientists at the University of 
Wisconsin. The current dictionary contains over 260 tables (or 
categories) comprising more than 750 unique data names. No completion 
date was estimated, but the dictionary is expected to grow to two or 
three times its current size. It was suggested that Opossum also be 
used as a graphical tool for representing and browsing the formidable 
mmCIF dictionary.

Gotzon Madariaga (University del Pais Vasco, Bilbao, Spain) presented 
the main features of the modulated structures dictionary he is 
developing, which is a superset of all the data items defined by the 
Commission on Aperiodic Crystals. A draft of this dictionary has been 
submitted to the IUCr. Interestingly, he raised several problems with 
CIF that had also been noted by the mmCIF developers. Notable among 
them was the need to provide references between related blocks of 
data, possibly contained in different files. He also pointed out that 
CIF lacks the nested loops available in STAR, which makes the data 
more cumbersome to represent.

The final talk of the morning session was given by Bourne and 
concerned two new dictionaries based on CIF. The first was a 
dictionary for processing abstracts for the ACA Meeting. Each of the 
seventy CIF abstracts was submitted either through a WWW form or by 
completing a template. The WWW form submissions were processed 
automatically, whereas many of the templates had to be hand-edited. 
Bourne concluded that, since it was not necessary to collect 
statistics on the CIF submissions, processing them was more effort 
than was warranted. The second dictionary is a meta graphics (mg) 
dictionary to facilitate data exchange between graphics programs. The 
mg dictionary, while preliminary, is designed to capture all the 
salient features seen in the display of a macromolecule, so that the 
biological structure and function conveyed by the graphics 
representation are preserved and can be loaded into a variety of 
graphics programs. A long-term advantage of this approach is that 
graphics images can be easily stored in a database and extracted as 
needed.

Vivian Stojanoff (Brookhaven National Laboratory) began the afternoon 
session with a discussion of an extension to the mmCIF dictionary for 
representing structure factors. While the current structure factor 
dictionary is not considered complete, it describes more than the 
basic h, k, l, F and sigmaF found in many existing PDB submissions. It 
currently supports data from multiple derivatives and wavelengths.
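
The following Python sketch shows how reflections from more than one 
wavelength can share a single loop. It writes h, k, l, F and sigmaF as 
a CIF loop with a wavelength identifier column. The function refln_loop 
is hypothetical. The data names are modeled on the mmCIF _refln 
category but are an assumption, since the extension described here was 
still a draft. A further identifier column, for example for the 
derivative, could be added in the same way.

```python
def refln_loop(reflections):
    """Format (wavelength_id, h, k, l, F, sigmaF) tuples as a CIF loop_.

    The data names are modeled on the mmCIF _refln category and are an
    assumption, not a quotation from the draft extension.
    """
    names = ["_refln.wavelength_id", "_refln.index_h", "_refln.index_k",
             "_refln.index_l", "_refln.F_meas_au", "_refln.F_meas_sigma_au"]
    lines = ["loop_", *names]
    for wavelength_id, h, k, l, f, sigma_f in reflections:
        # One row per reflection; the wavelength_id column lets data
        # measured at different wavelengths share a single loop.
        lines.append(f"{wavelength_id} {h:4d} {k:4d} {l:4d} {f:9.2f} {sigma_f:7.2f}")
    return "\n".join(lines) + "\n"
```

Each row holds one reflection at one wavelength, so the same h, k, l 
can appear once for each wavelength at which it was measured.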

John Westbrook (Rutgers University) described the Dictionary 
Definition Language (DDL) he developed in response to the needs of 
the macromolecular dictionary. The current small-molecule core 
dictionary uses DDL version 1.4. Westbrook's version 2.1 is upwardly 
compatible with version 1.4 but much more rigorous. While this has 
little significance for the crystallographer, it is important to the 
software developer, who needs a consistent way of representing both 
dictionaries and data files so that one piece of software can handle 
both. Version 2.1 of the DDL offers that opportunity.
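
The data names themselves show one visible difference between the two 
DDL versions. In the mmCIF dictionary each name has the form 
_category.attribute, for example _cell.length_a, so software can read 
an item's category directly from its name. Names in the DDL 1.4 core 
dictionary, such as _cell_length_a, record the category only in the 
dictionary. The Python sketch below shows that split and a grouping of 
names by category. The function names split_name and group_by_category 
are hypothetical.

```python
from collections import defaultdict


def split_name(data_name):
    """Split a DDL2-style name '_category.attribute' into its two parts.

    DDL1-style names such as '_cell_length_a' carry no separator, so the
    category cannot be read from the name and None is returned for it.
    """
    body = data_name.lstrip("_")
    category, sep, attribute = body.partition(".")
    return (category, attribute) if sep else (None, body)


def group_by_category(names):
    """Group DDL2-style data names by category, like columns of a table."""
    groups = defaultdict(list)
    for name in names:
        category, attribute = split_name(name)
        groups[category].append(attribute)
    return dict(groups)
```

Grouping names this way gives each category the shape of a relational 
table, with one attribute per column.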

Syd Hall returned to describe the program Xtal_GX, which provides a 
general approach to processing CIFs and uses the CIFtbx routines he 
described earlier in the day. It also includes a graphical display 
feature.

The final formal presentation was given by Weider Chang (Columbia 
University), who described the next-generation, object-oriented tools 
he is developing with Bourne for basic CIF manipulation. Chang 
presented the basic layout of the library, which is written in 
Objective C and from which browsers, a CIF2HTML converter, and several 
dictionary checking tools have been developed. The CIF2HTML converter 
was used in processing the ACA abstracts.
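
Chang's converter belongs to an Objective C library, and its code is 
not described here. To illustrate the general idea only, the following 
Python sketch renders one parsed data block as an HTML table. The 
function block_to_html and the data names in the example are 
hypothetical. Looped items, given as lists, are joined into a single 
cell.

```python
from html import escape


def block_to_html(name, items):
    """Render one CIF data block ({data_name: value}) as an HTML table."""
    rows = [f"<h2>data_{escape(name)}</h2>", "<table>"]
    for data_name, value in items.items():
        # Loop columns arrive as lists; show them as one comma-separated cell.
        text = ", ".join(value) if isinstance(value, list) else value
        # Escape both cells so characters such as < and & display literally.
        rows.append(f"  <tr><td>{escape(data_name)}</td><td>{escape(text)}</td></tr>")
    rows.append("</table>")
    return "\n".join(rows)
```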

A lively discussion followed the formal presentations, concerning the 
need to decide which subset of mmCIF data items will constitute a 
formal PDB entry. Attendees seemed well satisfied with the Workshop; 
perhaps it will lead to the development of other dictionaries as well 
as new, useful software.

------------------------------------------------------------------------

			BROOKHAVEN ORDER FORM
Name of User	__________________________________	Date	_________
Organization	__________________________________	Phone	_________
Address	        __________________________________	Fax	_________
	        __________________________________	E-mail	_________
	        __________________________________
 
--------------------------------------------------------------------------
- Price is valid through September 30, 1996
- Price is per CD-ROM set released - releases occur four times per year
- Facsimile and phone orders are not acceptable
--------------------------------------------------------------------------

The Protein Data Bank MUST receive all three of the following items 
before shipment can be completed (please send all required items 
together via postal mail - facsimile and phone orders are NOT acceptable):

1. Completed order form;
2. Mailing label indicating exact shipping address; 
3. Payment (using one of the two options below):

 - Check payable to Brookhaven National Laboratory in U.S. dollars and drawn 
   on a U.S. bank. Foreign checks cannot be accepted and will be returned.

 - Original purchase order payable to Brookhaven National Laboratory. After 
   your order is processed, you will be invoiced by Brookhaven National 
   Laboratory. Please indicate the exact address to which the invoice 
   should be sent:
		________________________________________
		________________________________________
		________________________________________
		
A wire transfer is acceptable only AFTER we have received an original purchase
order from your organization and you have been invoiced by Brookhaven. After 
receiving Brookhaven's invoice, your bank may send a wire transfer to:

	   Bank name	   :  Morgan Guaranty Trust Co. of New York
	   Account name	   :  Brookhaven National Laboratory
	   Account number  :  076-51-912

Please send all three required items together via postal mail to:

			Protein Data Bank Orders
			Chemistry Department, Building 555
			Brookhaven National Laboratory
			P.O. Box 5000
			Upton, NY 11973-5000 USA

------------------------------------------------------------------------

	1 Protein Data Bank CD-ROM Set - ISO 9660 Format	$332.26
	    (tax and shipping charges not applicable)	        

------------------------------------------------------------------------

AFFILIATED CENTERS

Twenty-two affiliated centers offer DATAPRTP information for distribution. 
These centers are members of the Protein Data Bank Service Association 
(PDBSA). Centers designated with an asterisk (*) may distribute DATAPRTP 
information both on-line and on magnetic or optical media; those without 
an asterisk are on-line distributors only.

BMERC
BioMolecular Engineering Research Center
College of Engineering, Boston University
Boston, Massachusetts
Nancy Sands (617-353-7123)
sands@darwin.bu.edu
http://bmerc-www.bu.edu/

*BIOSYM
BIOSYM Technologies, Inc.
San Diego, California
Rick Lee (619-546-5536)
rickl@biosym.com
http://www.biosym.com/

BIRKBECK
Crystallography Department
Birkbeck College, University of London
London, United Kingdom
Alan Mills (44-171-6316810)
a.mills@cryst.bbk.ac.uk
http://www.cryst.bbk.ac.uk/PDB/pdb.html/

CAN/SND
Canadian Scientific Numeric Data Base Service
Ottawa, Ontario, Canada
Roger Gough (613-993-3294)
cansnd@vm.nrc.ca

CAOS/CAMM
Dutch National Facility for Computer Assisted Chemistry
Nijmegen, The Netherlands
Jan Noordik (31-80-653386)
noordik@caos.caos.kun.nl
http://www.caos.kun.nl/

*CCDC
Cambridge Crystallographic Data Centre
Cambridge, United Kingdom
David Watson (44-1223-336394)
watson@chemcrys.cam.ac.uk

CSC
CSC Scientific Computing Ltd.
Espoo, Finland
Heikki Lehvaslaiho (358-0-457-2076)
heikki.lehvaslaiho@csc.fi
http://www.csc.fi/

CINECA
NE Italy Interuniversity Computing Center
Casalecchio di Reno (BO), Italy
Laura Setti (39-51-6599478)
asltc0@icineca.cineca.it

ICGEB
International Centre for Genetic Engineering and Biotechnology
Trieste, Italy
Sandor Pongor (39-40-3757300)
pongor@icgeb.trieste.it

EMBL
European Molecular Biology Laboratory
Heidelberg, Germany
Hans Doebbeling (49-6221-387-247)
hans.doebbeling@embl-heidelberg.de
http://www.EMBL-Heidelberg.DE/

INN
Israeli National Node
Weizmann Institute of Science
Rehovot, Israel
Leon Esterman (972-8-343934)
lsestern@weizmann.weizmann.ac.il

*JAICI
Japan Association for International Chemical Information
Tokyo, Japan
Hideaki Chihara (81-3-5978-3608)

*MAG
Molecular Applications Group
Palo Alto, California
Hilary Jensen (415-473-3039)
hilary@suerte.mag.com
http://hyper.stanford.edu/~Mag/

*MSI
Molecular Simulations Inc.
Burlington, Massachusetts
Lance J. Ransom Wright (617-229-9800)
lance@msi.com
http://www.msi.com/

NCHC
National Center for High-Performance Computing
Hsinchu, Taiwan, ROC
Jyh-Shyong Ho (886-35-776085; ex: 342)
c00jsh00@nchc.gov.tw

NCSA
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
Champaign, Illinois
Patricia Carlson (217-244-0768)
pcarlson@ncsa.uiuc.edu

NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION
National Library of Medicine
National Institutes of Health
Bethesda, Maryland
Stephen Bryant (301-496-2475)
bryant@ncbi.nlm.nih.gov
http://www.ncbi.nlm.nih.gov/

*OML
Oxford Molecular Ltd.
Oxford, United Kingdom
Steve Gardner (44-1865-784600)
sgardner@oxmol.co.uk
http://www.oxmol.co.uk/

*OSAKA UNIVERSITY
Institute for Protein Research
Osaka, Japan
Yoshiki Matsuura (81-6-879-8605)
matsuura@protein.osaka-u.ac.jp	

PITTSBURGH SUPERCOMPUTING CENTER 
Pittsburgh, Pennsylvania
Hugh Nicholas (412-268-4960)
nicholas@psc.edu
http://pscinfo.psc.edu/biomed/biomed.html/

SEQNET
Daresbury Laboratory
Warrington, United Kingdom 
User Interface Group (44-1925-603351)
uig@daresbury.ac.uk

*TRIPOS
Tripos, Inc.
St. Louis, Missouri
Akbar Nayeem (314-647-1099; ex: 3224)
akbar@tripos.com

------------------------------------------------------------------------

Protein Data Bank
Chemistry Department, Bldg. 555
Brookhaven National Laboratory
P.O. Box 5000
Upton, NY 11973-5000 USA

------------------------------------------------------------------------

TO CONTACT PDB

Telephone	516-282-3629
Facsimile	516-282-5751

	Internet:

	pdb@bnl.gov.....................general correspondence
	orders@pdb.pdb.bnl.gov..........order information
	sysadmin@pdb.pdb.bnl.gov........network services
	listserv@pdb.pdb.bnl.gov........Listserver subscriptions
	pdb-l@pdb.pdb.bnl.gov...........Listserver postings
	errata@pdb.pdb.bnl.gov..........entry error reporting

Please include your name, postal mailing address, e-mail address, 
facsimile number, and telephone number in all correspondence.

------------------------------------------------------------------------

STATEMENT OF SUPPORT

PDB is supported by a combination of Federal Government Agency 
funds (work supported by the U.S. National Science Foundation; 
the U.S. Public Health Service, National Institutes of Health, 
National Center for Research Resources, National Institute of 
General Medical Sciences, and National Library of Medicine; and 
the U.S. Department of Energy under contract DE-AC02-76CH00016) 
and user fees.

------------------------------------------------------------------------

PDB STAFF

Joel L. Sussman, Head
David R. Stampf, Sr. Project Mgr.
Enrique E. Abola, Science Coordinator
Jaime Prilusky, Interim Head Database Dev.

Frances C. Bernstein
Judith A. Callaway
Minette Cummings
Betty R. Deroski
Pamela A. Esposito
Arthur Forman
Patricia A. Langdon
Michael D. Libeson
Nancy O. Manning
John E. McCarthy
Regina K. Shea
Janet L. Sikora
Karen E. Smith
Dejun Xue

------------------------------------------------------------------------
------------------------------------------------------------------------