0.  Data Collection and Processing:  


Because a raw data set is rather large (on the order of a gigabyte even for a highly symmetric crystal), downloading and processing one is too painful to contemplate. It is also of limited pedagogical value unless you actually grow a crystal, mount it, and collect the data yourself.  We will therefore start with the processed data.  If you would like to see what software is involved in this process, take a look at the following programs (from CCP4) and their documentation:


mosflm
Used to determine the space group and cell parameters, index the diffraction pattern, integrate the intensity of each individual spot, and record these intensities for an entire data set in a single file.
scala
The data are recorded on anywhere from 10 or 20 to 100 or more separate imaging plates or CCD exposures. These must be put on a common scale, accounting for fluctuating beam intensities, etc.  In the process, symmetry-related reflections are compared, and their differences are calculated for reflections recorded on the same exposure (Rsymm) and on different exposures (Rmerge; Rcryst).
truncate
The scaled intensity data are reduced to an asymmetric unit of reciprocal space, and the square roots of the intensities are then calculated, yielding reflection amplitudes F(hkl) and their associated errors, sigmaF(hkl), along with the anomalous differences between Bijvoet pairs and their estimated errors.
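
To make the last two steps concrete, here is a minimal Python sketch of a merging R-factor and of the square-root step. It is not how scala or truncate are implemented. The function names and the input layout are hypothetical, the R-factor uses the common textbook definition (sum of absolute deviations from the mean intensity, divided by the sum of intensities), and the amplitude conversion is the plain square root with first-order error propagation. As far as I know, truncate treats weak and negative intensities more carefully than this, which the sketch does not attempt.

```python
import math
from collections import defaultdict


def r_merge(observations):
    """Merging R-factor over repeated measurements of the same reflection.

    `observations` is an iterable of ((h, k, l), intensity) pairs whose
    indices are assumed to be already mapped to the asymmetric unit.
    (Hypothetical input layout, chosen only for this illustration.)
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        if len(intensities) < 2:
            continue  # a single measurement has nothing to be compared with
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)

    if denominator == 0.0:
        raise ValueError("no reflection was measured more than once")
    return numerator / denominator


def intensity_to_amplitude(intensity, sigma_i):
    """Naive conversion: F = sqrt(I), sigmaF = sigmaI / (2 * F).

    Only valid for positive intensities; weak or negative measurements
    need a statistical treatment that is outside this sketch.
    """
    if intensity <= 0.0:
        raise ValueError("naive square root needs a positive intensity")
    amplitude = math.sqrt(intensity)
    return amplitude, sigma_i / (2.0 * amplitude)


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    obs = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
           ((2, 0, 0), 50.0), ((2, 0, 0), 40.0)]
    print("R-factor:", r_merge(obs))
    print("F, sigmaF:", intensity_to_amplitude(100.0, 4.0))
```

Whether the result corresponds to Rsymm or Rmerge depends only on which measurements you feed in: those from one exposure, or those from different exposures.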

This is done for each data set.  As examples, you can download my hammerhead ribozyme native data set as well as two different bromine derivative data sets, which for obscure reasons I called derivative 123 and derivative 124.  The native data set contains no atoms that absorb X-rays significantly, but Br absorbs quite strongly at the wavelength used to collect these data sets (0.88 Å, if memory serves). Both derivatives therefore contain useful anomalous differences as well.  The anomalous differences reported in the native data set are essentially noise.
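
One rough way to see the difference between "useful" and "essentially noise" is to compare each anomalous difference with its estimated error. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the Bijvoet difference is taken as F(+) minus F(-), and the two errors are combined in quadrature as if they were independent. A mean ratio well above 1 would suggest real anomalous signal, and a ratio near 1 is what you would expect from noise alone.

```python
import math


def anomalous_difference(f_plus, sig_plus, f_minus, sig_minus):
    """Bijvoet difference D = F(+) - F(-) and its estimated error.

    The error assumes the two amplitudes were measured independently,
    so their variances simply add (an assumption of this sketch).
    """
    d = f_plus - f_minus
    sig_d = math.sqrt(sig_plus ** 2 + sig_minus ** 2)
    return d, sig_d


def mean_anomalous_signal(pairs):
    """Mean of |D| / sigma(D) over a list of Bijvoet pairs.

    Each pair is (F(+), sigF(+), F(-), sigF(-)); hypothetical layout.
    """
    ratios = []
    for f_plus, sig_plus, f_minus, sig_minus in pairs:
        d, sig_d = anomalous_difference(f_plus, sig_plus, f_minus, sig_minus)
        ratios.append(abs(d) / sig_d)
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Made-up amplitudes, not taken from the data sets described here.
    pairs = [(105.0, 3.0, 100.0, 4.0), (50.0, 3.0, 50.0, 4.0)]
    print("mean |D|/sigma(D):", mean_anomalous_signal(pairs))
```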

The data you just downloaded are in the form of a gzipped tarball, so to unpack them you have to issue the commands:

% gunzip all_data.tar.gz ; tar xvf all_data.tar

or on some systems, simply

% tar xvfz all_data.tar.gz  

and you will get three files, each having the suffix mtz.  I called the native data set nomnph5.mtz and the two derivatives 124_trunc_anom.mtz and R_123_trunc_anom.mtz.
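
If tar is not available on your system, the same unpacking can be sketched in Python with the standard tarfile module. This is an alternative, not part of the procedure above: the function name is hypothetical, and the sketch writes only regular files ending in .mtz, flattened into the destination directory.

```python
import os
import tarfile


def extract_mtz_files(archive_path, dest_dir="."):
    """Copy every regular *.mtz member of a gzipped tarball into dest_dir.

    Returns the sorted list of file names written. Directory components
    stored in the archive are dropped, so nothing can be written outside
    dest_dir.
    """
    written = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not (member.isfile() and member.name.endswith(".mtz")):
                continue  # skip directories, links and unrelated files
            name = os.path.basename(member.name)
            source = archive.extractfile(member)
            with source, open(os.path.join(dest_dir, name), "wb") as out:
                out.write(source.read())
            written.append(name)
    return sorted(written)


if __name__ == "__main__":
    # Assumes all_data.tar.gz sits in the current directory.
    print(extract_mtz_files("all_data.tar.gz"))
```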


You can look at the contents of these binary files using the CCP4 utility mtzdmp, e.g.:

% mtzdmp nomnph5.mtz
Do this to familiarize yourself with how the data look.

Next:  Scaling Native data to derivative data
