Step 1
By Vinzenz Unger and Anchi Cheng
Optical diffraction, image acquisition and preparation, determination of lattice parameters, calculation of unit cell dimensions and magnification, calculation of a filtered image
Optical diffraction
Objective: Evaluation of image quality, determination of tilt parameter (see step 5)
If illuminated by coherent light (e.g. laser beam) the images of crystals act as diffraction gratings. Sharp and bright spots that extend far and evenly in all directions are the hallmark of well-ordered crystals. In contrast, large and diffuse diffraction spots or a pattern that only shows a few spots in each direction are signs of disorder. Such crystals may not be suitable for further processing. To obtain reliable results and to avoid potential complications at later stages in the processing the following points are important:
• Analyze the micrograph with the emulsion facing towards the detector (e.g. in a vertical setup this corresponds to the emulsion side facing up).
• Move the micrograph around to identify the largest coherently diffracting area. Ideally, there would be only one coherent area. In practice, however, the following may happen:
  observation: spots “jump” while the micrograph is being moved --> more than one independent crystalline area is present
  observation: defined spots of multiple lattices are visible --> many small areas or stacked 2D crystals
  remedy: decrease the size of the aperture at the optical bench to determine whether any one part of the image can be used
• Identify the best area by finding the part of the image for which the number, sharpness and intensity of visible diffraction spots are maximal. Clearly mark the centre of this area by placing marks at the edge of the image. If an image contains two or more good areas that do not overlap, mark and treat them as independent “images”.
• Scanning the best area for further processing sometimes will require including unwanted parts of a neighboring crystalline patch. Keep the “contaminating” area to a minimum. This will be impossible if the two areas are stacked. If the layers are both good, estimate the number of spots that overlap. Do not proceed if this is a significant number because it will be impossible to combine the data.
• Keep a record of the approximate resolution limit observed by optical diffraction. Image processing should provide data to about twice this limit. If this is not the case some processing parameters may need adjustment.
Mainly upon analyzing images of tilted crystals one may encounter an uneven distribution of spots along a preferential axis. Such anisotropic diffraction can be caused by a lack of specimen flatness, specimen movement during the exposure or splitting of spots if the specimen tilt is high. While the first two problems cannot be overcome computationally the splitting of spots observed at high specimen tilts can and needs to be corrected for in the image processing procedure. Accordingly, most images showing anisotropic diffraction (especially at lower specimen tilts) are not suitable for further processing.
Image acquisition
• Calibrate the scanner / microdensitometer using a blank (unexposed) part of the micrograph if possible.
• Scan only the best parts of micrographs to keep files reasonably small. Image sizes up to about 6000x6000 pixels can be handled quite comfortably if disk space is not a problem. MRC processing software requires square images as input. This makes arrays in increments of 1000 pixels a convenient choice.
• Scan micrographs with the emulsion side facing the detector (to stay consistent with orientation of film in microscope and optical diffraction). For instance, on a flatbed microdensitometer that has the light source underneath the scanning stage and the detector above the stage the emulsion side should face up.
• Sample densities with a step size of no more than 1/2 to 1/3 of the desired resolution as it appears on the micrograph at the given magnification, i.e.
stepsize [µm] ≤ (magnification / 10,000) * resolution [Å] / 2 (1)
• Typical magnifications are ~60,000x for images of a specimen that diffracts to high resolution. This requires scan step sizes of 5-7 µm to retain all the information. In these cases the number of unit cells represented in the scanned part of the image quickly becomes too small for the amplitudes of the high-resolution terms to rise above the background noise of the image (i.e. no phase information can be obtained). This emphasizes why the best area of the micrograph should be chosen carefully.
• Keep a record identifying the corners for start and end of the scan at least initially to be able to identify unwanted changes in the orientation of the image during the acquisition process.
• Retrieve the binary image file, convert it to MRC-format and visually check for scan artifacts (e.g. purely white or noisy “ghost lines”). Files in MRC-format can be displayed with the software package XIMDISP (available from the MRC upon request).
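Equation (1) can be checked quickly before committing to a scan. The following is a minimal sketch of that arithmetic only; the function name, the `samples_per_period` parameter and the example resolution of 3.5 Å are illustrative assumptions and not part of any scanner or MRC software.

```python
def max_scan_step_um(magnification, resolution_A, samples_per_period=2.0):
    """Largest scan step (µm on the film) that still retains `resolution_A`.

    Follows equation (1): 1 Å on the specimen corresponds to
    magnification * 1e-4 µm on the micrograph. `samples_per_period` = 2
    is the minimum (equation (1) as written); 3 gives the finer
    1/3 sampling mentioned in the text. Hypothetical helper.
    """
    if samples_per_period < 2.0:
        raise ValueError("at least two samples per period are required")
    return magnification * 1e-4 * resolution_A / samples_per_period


if __name__ == "__main__":
    # Assumed example: 60,000x magnification, 3.5 Å target resolution.
    for n in (2.0, 3.0):
        step = max_scan_step_um(60000, 3.5, n)
        print(f"{n:.0f} samples per period: step <= {step:.1f} µm")
```

For the assumed example this gives 10.5 µm at two samples per period and 7 µm at three, which is in line with the 5-7 µm range quoted for high-resolution work.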
Depending on the type of scanner, different routines will be required for the conversion to MRC-format. In cases where the scanner output is given as transmission values, a conversion to optical densities needs to be done before proceeding any further. Precisely what is required needs to be determined by the local user. Furthermore, since machine type and display software will differ between laboratories, it cannot be emphasized enough that the apparent orientation of the displayed image should be the same as the orientation of the image on the scanner. For instance, assume that the micrograph is scanned along a normal x,y coordinate system (i.e. x going to the right and y going up). In this case the origin (0,0) of the displayed image should be in the bottom left corner and the last pixel scanned (x,y) should occupy the top right corner on the display. In other words, any two corners on the display must correspond to the same corners on the selected area. However, this is often not the case initially. The MRC-package uses the convention that the image origin (0,0) is at the bottom left corner. Accordingly, the image may need to be rotated or flipped to meet the program requirements. Otherwise, some rules have to be established to determine the correct handedness later on in the processing. Scanning some easily identifiable patterns as test cases is helpful to ensure the correct orientation and that the handedness of objects is not altered during the acquisition steps.
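As an illustration of the two conversions discussed above, the sketch below turns transmission values into optical densities (OD = -log10 T) and flips an array that was stored top row first so that row 0 becomes the bottom row. It assumes NumPy and a scanner that delivers transmission on a linear scale; the function names, the `full_scale` and `floor` parameters and the assumed row order are hypothetical and must be checked against the local scanner.

```python
import numpy as np


def transmission_to_od(transmission, full_scale=1.0, floor=1e-6):
    """Convert transmission values to optical densities, OD = -log10(T).

    `full_scale` is the scanner reading for a fully transparent film
    (assumption: linear scale). Values are clipped at `floor` so that
    completely opaque pixels do not produce infinite densities.
    """
    t = np.asarray(transmission, dtype=float) / full_scale
    t = np.clip(t, floor, 1.0)
    return -np.log10(t)


def to_bottom_left_origin(image_top_row_first):
    """Reorder rows so that index 0 is the bottom row of the scan.

    Only needed if the scanner file stores the top row first
    (assumption); a single vertical flip changes handedness, so it
    must be applied exactly once.
    """
    return np.flipud(np.asarray(image_top_row_first))


if __name__ == "__main__":
    print(transmission_to_od([1.0, 0.1, 0.01]))
    test_pattern = np.array([[1, 0], [1, 0], [1, 1]])  # an "L" shape
    print(to_bottom_left_origin(test_pattern))
```

An asymmetric test pattern such as the "L" above is the kind of easily identifiable object that reveals an unintended mirror operation.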
Image preparation and determination of lattice parameters
Before submitting the image to the main protocol some “cosmetic” operations should be performed to prepare the image.
• Create a smooth gradient of optical densities around the edges of the image (program: TAPEREDGE). This is necessary because in many cases the optical densities along opposite sides of the image are significantly different. In the calculated transform this leads to pronounced spikes, which may overlap with some of the low-resolution terms. Any overlap will make it impossible to extract correct amplitude and phase values for these Fourier terms.
• Generate a histogram of the optical densities (program: HISTO) and eliminate extreme densities caused by dust particles, scratches in the negative or unexposed areas (program: LABEL). This step is optional.
• Create a smaller copy of the original by pixel averaging if the image is 3000x3000 pixels or larger. This step does not influence the accuracy of the main processing routines but saves disk space and speeds up the steps that can be performed on a reduced copy of the image. In practice 2x2 pixel averaging (program: LABEL) is advisable and sufficient.
• Calculate the Fourier transform of the tapered and reduced (or tapered original) image (program: FFTRANS) to provide the input for the calculation of a filtered image and to allow the determination of the lattice parameters.
• Display the transform and index several diffraction spots. Avoid reflections that are dispersed or do not fit the lattice well because including these may enforce a worse overall fit for the lattice parameter which does not serve the purpose of this step.
Generally, spots to as high a resolution as possible should be included in the indexing as long as they are unambiguously determined (see Fig.1). Indexing the Friedel mates of the chosen reflections helps to minimize the offset of the fitted lattice from the true transform origin.
• Determine the lattice parameters as precisely as possible. For instance, if ~20-40 spot coordinates are used to fit the lattice vectors (menu option in XIMDISP), the root mean square error for the fit should not exceed 0.5 pixel.
This accuracy is critical to be able to detect and retrieve the weaker high-resolution data. However, it is often impossible to achieve this accuracy for the transforms of uncorrected images, especially if the specimen does not diffract to near atomic resolution. In this case a preliminary run through the processing routines under less stringent conditions is helpful to get more precise lattice parameters. Root mean square errors up to about 1-1.5 pixels are acceptable for this preliminary run. The rough correction of the image usually will sharpen the high-resolution spots enough to allow them to be included in the indexing. It is advisable to display the final lattice and check that most of the spots fall onto their predicted positions.
• Make a list of spots suitable for calculating a filtered image while inspecting the outcome of the final lattice fit. Use only those terms for which a “perfect spot” (as shown in Fig. 1a) centered on its predicted position would include a significant part of the observed spot (see Fig. 2). This is important if the explicit protocol given later is used.
• For reasons outlined in the more detailed explanation for setting the program parameter in MASKTRANA it may not be advantageous to include spots beyond 15Å unless their intensities are high.
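The lattice fit described above can be illustrated with an ordinary least-squares fit of indexed spot positions. This is a minimal sketch assuming NumPy; it is not the fitting routine used by XIMDISP, and the function name and return layout are hypothetical. Each spot is modelled as origin + h·a* + k·b*, and fitting the origin together with the two vectors is what makes the Friedel mates useful.

```python
import numpy as np


def fit_lattice(hk, xy):
    """Least-squares fit of reciprocal lattice vectors to indexed spots.

    hk : sequence of (h, k) Miller indices of the picked spots
    xy : sequence of (x, y) spot coordinates in transform pixels
    Returns (origin, a_star, b_star, rms), all in pixels, where rms is
    the root mean square distance between observed and predicted spots.
    """
    hk = np.asarray(hk, dtype=float)
    xy = np.asarray(xy, dtype=float)
    # Model: xy = origin + h * a_star + k * b_star
    design = np.column_stack([np.ones(len(hk)), hk])
    params, *_ = np.linalg.lstsq(design, xy, rcond=None)
    origin, a_star, b_star = params
    residuals = xy - design @ params
    rms = float(np.sqrt(np.mean(np.sum(residuals**2, axis=1))))
    return origin, a_star, b_star, rms


if __name__ == "__main__":
    # Synthetic spots (made-up numbers) including Friedel mates.
    hk = [(1, 0), (-1, 0), (0, 1), (0, -1), (2, 1), (-2, -1)]
    a, b, o = np.array([20.5, 3.25]), np.array([-4.0, 18.75]), np.array([512.0, 512.0])
    xy = [o + h * a + k * b for h, k in hk]
    print(fit_lattice(hk, xy))
```

With real data the returned `rms` is the quantity to compare against the 0.5 pixel target, or against the 1-1.5 pixel limit for a preliminary run.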
Calculation of the unit cell dimensions and magnification
Besides serving as input for subsequent programs the lattice vectors are also used to calculate the unit cell dimensions, the precise magnification of the image and where applicable the overall tilt geometry.
For a new specimen the first step is to determine the real space unit cell dimensions. The transform coordinates for the (1,0) and (0,1) reflections allow the lengths of the reciprocal space vectors (a* and b*, respectively) to be calculated; these lengths reflect the maximal number of unit cells along the two directions. In XIMDISP this number is given explicitly. If the nominal magnification at which the image was recorded is used as an initial estimate, the unit cell size can be calculated as follows:

Similarly the length of the b-axis can be calculated by using the b* value instead of a*. The reciprocal space (γ*) angle needs to be accounted for if different from 90˚ because a* and b* are projections of the real space axes. If the two unit cell axes are not the same the smaller is usually identified with the a-axis (i.e. in reciprocal space this is the longer axis).
Once a reliable average has been determined from several images, the above formula can be rearranged to adjust the magnification for all images. The precise magnification is required for a simulation of the so-called contrast transfer function that affects the phase data.
Two things need to be pointed out at this stage. First, the magnification of the microscope should be calibrated by taking some images of a test specimen with known unit cell dimensions (e.g. purple membrane or catalase). Second, equation (2) only holds for specimens with tilts below 30˚. For images of crystals at higher tilts a scale factor needs to be included to correct for the observed distortions of the lattice vectors. The required scale factor is calculated by the program EMTILT, along with the tilt geometry. To calculate the magnification the following equation may be used:
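The relationship between reciprocal vector length, unit cell size and magnification can be sketched as follows. This is a reconstruction from the standard geometry of a discrete Fourier transform, under the assumption that a* is measured in transform pixels and that the array size and scan step refer to the image that was actually transformed; it is not a copy of the authors' printed equation, and the function names are hypothetical. It applies to low tilts only and does not include the EMTILT scale factor.

```python
import math


def unit_cell_length_A(recip_len_px, array_size_px, step_um,
                       magnification, gamma_star_deg=90.0):
    """Real space axis length (Å) from a reciprocal vector length in pixels.

    recip_len_px   : length of a* (or b*) in transform pixels
    array_size_px  : edge length of the square transformed array
    step_um        : pixel size on the film in µm (scan step, times the
                     averaging factor if a reduced copy was transformed)
    gamma_star_deg : reciprocal space angle between a* and b*
    """
    pixel_size_A = step_um * 1e4 / magnification  # Å per pixel on the specimen
    sin_gamma = math.sin(math.radians(gamma_star_deg))
    return array_size_px * pixel_size_A / (recip_len_px * sin_gamma)


def magnification_from_cell(recip_len_px, array_size_px, step_um,
                            known_cell_A, gamma_star_deg=90.0):
    """Same relation rearranged for the magnification, given a known axis."""
    sin_gamma = math.sin(math.radians(gamma_star_deg))
    return array_size_px * step_um * 1e4 / (known_cell_A * recip_len_px * sin_gamma)


if __name__ == "__main__":
    # Made-up numbers: 4000 pixel array, 7 µm step, nominal 60,000x.
    a = unit_cell_length_A(75.0, 4000, 7.0, 60000.0)
    print(f"a = {a:.1f} Å")
    print(f"M = {magnification_from_cell(75.0, 4000, 7.0, a):.0f}")
```

Averaging the axis length over several images and then calling the second function with that average is one way to obtain a consistent magnification for each image.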

Calculation of a filtered image
Objective: Identify best centre coordinates for future reference area
This step takes advantage of the fact that the information about the structure of the protein is contained in discrete diffraction maxima. Consequently, most of the noise can be removed by placing mask holes around the diffraction spots in the transform and zeroing all remaining values (program: MASKTRANA). The backtransform (program: FFTRANS) yields a filtered image that shows the object much more clearly (see Fig. 3). For an ideal and infinite lattice all of the diffraction spots would be so-called “delta functions”, i.e. peaks occupying a single pixel only. However, due to the finite size of the image and lattice disorder the real peaks are spread over several pixels. Accordingly, the outcome of the filtering step will largely depend on the size and type of the mask hole used. The smaller the mask hole radius the “nicer” the image will look, since decreasing the radius excludes information about the lattice distortions (see Fig. 3). This immediately suggests that distortions can be identified by “comparing” (= cross-correlating) a tightly masked transform (= reference) with a loosely masked transform that still contains all the information about irregularities in the lattice. However, to get the best results one first needs to determine the xy-coordinates that will serve to centre the reference area. If the optical diffraction was done carefully the centre will be close to the actual centre of the digitized array. Nevertheless, checking a filtered image prior to executing the main script is often beneficial, especially for specimens that are not perfectly ordered.
• Tightly mask the transform to produce a mostly uniform image.
• Choose the reference area (1/10 - 1/20 of the size of the filtered image) and keep a record of centre-coordinate for later use.
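The principle of Fourier filtering can be illustrated with a few lines of NumPy. The sketch below places circular mask holes at the predicted lattice positions, zeroes everything else and back-transforms. It is an illustration of the idea only and does not reproduce the options of MASKTRANA; the function name, the argument layout and the use of hard-edged circular holes are assumptions.

```python
import numpy as np


def fourier_filter(image, origin, a_star, b_star, max_index, hole_radius):
    """Keep only circular holes around lattice points of the transform.

    origin, a_star, b_star : (x, y) in pixels of the centred transform
    max_index              : holes are placed for |h|, |k| <= max_index
    hole_radius            : mask hole radius in pixels
    Returns the filtered real space image.
    """
    ny, nx = image.shape
    transform = np.fft.fftshift(np.fft.fft2(image))  # origin moved to centre
    yy, xx = np.mgrid[0:ny, 0:nx]
    mask = np.zeros(image.shape, dtype=bool)
    for h in range(-max_index, max_index + 1):
        for k in range(-max_index, max_index + 1):
            cx = origin[0] + h * a_star[0] + k * b_star[0]
            cy = origin[1] + h * a_star[1] + k * b_star[1]
            mask |= (xx - cx) ** 2 + (yy - cy) ** 2 <= hole_radius ** 2
    filtered = np.fft.ifft2(np.fft.ifftshift(transform * mask))
    # Friedel-related holes are both included, so the result is real
    # up to rounding; the imaginary part is discarded.
    return filtered.real


if __name__ == "__main__":
    n = 128
    yy, xx = np.mgrid[0:n, 0:n]
    clean = np.cos(2 * np.pi * 8 * xx / n) + np.cos(2 * np.pi * 8 * yy / n)
    noisy = clean + np.random.default_rng(0).normal(0.0, 1.0, clean.shape)
    out = fourier_filter(noisy, (n // 2, n // 2), (8, 0), (0, 8), 1, 1.0)
    print(np.std(noisy - clean), np.std(out - clean))
```

Running the function twice with a small and a large `hole_radius` gives the tightly and loosely masked versions that the text contrasts.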
Correction of lattice distortions, extraction of raw image data and correction of phases for the effect of the contrast transfer function (CTF)
The steps that are necessary to generate the basic image data are outlined in Fig. 4. As mentioned in the previous paragraph, lattice distortions can be identified by cross-correlation procedures between a reference area and the original image. Most conveniently this is done in reciprocal space (program: TWOFILE) because in this case it amounts to a multiplication of the loosely masked transform of the image with the complex conjugate of the transform of the reference area (see REF for more detailed explanation). Backtransformation of the cross-correlation map is the first step to translate its information into real space disorder parameters. Subsequently, all unit cells can be searched for the exact position (= offset) of the object and the degree of similarity with the reference area (program: QUADSERCHB). In practical terms this is achieved by first correlating the object (or part of it) with itself. This so-called autocorrelation map is calculated from a small part of the reference area and its centre appears as a more or less symmetrical peak (see Fig. 5) whose shape depends on the actual distribution of Fourier terms in the image (program: AUTOCORRL). Finding the best fit of this central peak and the cross-correlation peaks for the individual unit cells provides the length and orientation of the distortion vectors and the goodness (= height) of correlation. It should be mentioned that QUADSERCHB only determines the translational offset for the cross-correlation peak but does not take into account any rotational disorder. Once the distortion vectors are known, the original image can be corrected by re-interpolating its optical densities to bring the unit cells onto a “perfect” lattice (program: CCUNBENDD).
From this outline it becomes clear that the quality of the reference area is the main factor in the success of the “unbending” procedure because any disorder present in the reference will remain uncorrected. Accordingly, improvements in the quality of the reference area by successive passes of processing make the determination of the lattice distortions and, consequently, the correction of the image more accurate. In most cases the result will not get any better after 2-3 passes of image filtering and lattice unbending. However, a further improvement can be obtained once a 3D model is available, because in this case a “perfect” reference area can be created de novo by calculating a back projection of the model for the “precise” imaging conditions (program: MAKETRAN).
A second factor that greatly influences the quality of the data is the image area that is used to generate the final transform. As mentioned above, the cross-correlation provides a measure of how well each individual unit cell corresponds to the motif defined by the reference area. Choosing an appropriate cross-correlation cutoff for boxing the best image area (program: BOXIMAGE) has a large impact on the data quality and resolution. In our experience boxing is particularly important if a significant number of unit cells show cross-correlation levels of less than 50%. This is more likely to occur for specimens that are only ordered to intermediate resolutions (5-10Å) and for thick specimens (> 150Å). However, it should be mentioned that choosing too small an area will result in a number of random reflections at higher resolution that appear to have a significant signal-to-noise ratio. This effect can be minimized by ensuring that the boxed area still contains several thousand copies of the molecule. In our experience coherent areas of ~4000-5000 molecules with cross-correlation values of ≥75% are sufficient to provide reliable data in the 5-10Å regime.
Once the best area has been selected and transformed, the program MMBOX is used to extract raw data. The output is a file that lists the amplitude, phase, quality, background and a dummy column used in the next step for each of the unique (h,k) reflections. The quality of each measurement is based on the signal-to-noise ratio and is expressed as an “IQ” value. Measurements with a signal-to-noise ratio of at least 8 have an IQ=1 whereas an IQ=8 indicates that the measurement was above the background noise but only within the standard deviation of the pixel intensities around the predicted peak position.
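The statement that cross-correlation in reciprocal space is a multiplication with the complex conjugate can be demonstrated on a toy example. The sketch below assumes NumPy and uses a plain, circular cross-correlation of two equally sized arrays; it is not a re-implementation of TWOFILE or QUADSERCHB, and the function names are hypothetical. In the real procedure the reference is a small masked area and one peak is located per unit cell, whereas here a single global offset is recovered.

```python
import numpy as np


def cross_correlate(image, reference):
    """Cross-correlation map via F(image) * conj(F(reference))."""
    f_image = np.fft.fft2(image)
    f_ref = np.fft.fft2(reference, s=image.shape)  # zero-pads a smaller reference
    return np.fft.ifft2(f_image * np.conj(f_ref)).real


def peak_offset(cc_map):
    """Signed (dy, dx) position of the highest correlation peak."""
    iy, ix = np.unravel_index(np.argmax(cc_map), cc_map.shape)
    ny, nx = cc_map.shape
    # Indices past the midpoint correspond to negative shifts (wrap-around).
    dy = iy - ny if iy > ny // 2 else iy
    dx = ix - nx if ix > nx // 2 else ix
    return int(dy), int(dx)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    reference = rng.normal(size=(64, 64))
    shifted = np.roll(reference, shift=(5, -3), axis=(0, 1))
    print(peak_offset(cross_correlate(shifted, reference)))
```

The peak position plays the role of a distortion vector and the peak height that of the goodness of correlation described above.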
The next step consists of an initial correction of the phase data for the effect of the contrast transfer function (CTF; program: CTFAPPLY). The CTF is a modulation of the scattered waves by the objective lens, which results in periodic contrast reversals across the transform of the image (see Fig. 6). In reciprocal space this corresponds to a phase shift by 180˚ in certain parts of the transform. Because this modulation will be different for each image (due to their different amounts of underfocus) the phases must be corrected to allow the combination of data from several images. The visual manifestation of the CTF is a characteristic pattern of alternating light and dark bands (= Thon rings) in the diffuse diffraction patterns of amorphous materials. The dark areas correspond to frequencies that are not or only poorly transferred and hence do not contribute to the formation of the image. Conversely, good transfer is achieved in the bright areas. This phenomenon can be used to determine the defocus, which then allows simulating the modulation and correcting for the CTF-imposed contrast reversals.
The initial estimate of the amount of underfocus should be as accurate as possible to avoid potential problems later on. This is particularly true for thick specimens where the molecular transform is not as constrained. Errors in the initial underfocus estimate can make the buildup of a three-dimensional data set a lot more difficult in these cases. Using the nominal underfocus at which the image was recorded is not advisable because the actual underfocus will often be significantly different. A more accurate estimate can be obtained from the Thon ring pattern if this is visible in the calculated transform.
• Determine whether the image shows astigmatism. In this case the concentric nodes in the Thon ring pattern appear elliptical instead of round.
• Determine the transform coordinates of a point within the first zero of the pattern (see Fig.6). If the image is astigmatic determine a point on each of the principal axes of the ellipse.
• Calculate the length of the reciprocal space vector that connects the chosen point(s) with the transform origin: l = √(x² + y²), where x and y are the coordinates read off the transform.
• Convert the length of the vector into reciprocal Å

• Calculate a set of reference curves (program: CTFCALC; see Fig.6, upper panels) and match against the value obtained from equation (4). The amount of underfocus that places the first zero closest to this value is the starting estimate.
• Correct the phase data by running CTFAPPLY and store both the raw and CTF-corrected data lists for future use.
It is important to keep the original list of raw data since refinement of the initial underfocus values is inevitable at a later stage of the data handling.
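The defocus estimate and the phase correction can be sketched numerically. The code below is a simplified model and not the algorithm of CTFCALC or CTFAPPLY: it uses the common phase contrast expression CTF = -sin(χ) with χ = πλΔf·s² - (π/2)·Cs·λ³·s⁴ (underfocus Δf positive), ignores amplitude contrast and astigmatism, and leaves phases inside the first zero unchanged, which is a sign convention chosen here for illustration. The pixel-to-Å⁻¹ conversion follows from the transform geometry and is an assumption rather than the printed equation (4). All function names and the default Cs of 2 mm are hypothetical.

```python
import math


def electron_wavelength_A(voltage_V):
    """Relativistic electron wavelength in Å."""
    return 12.2643 / math.sqrt(voltage_V + 0.97845e-6 * voltage_V ** 2)


def spatial_frequency_inv_A(x_px, y_px, array_size_px, step_um, magnification):
    """Length of a transform vector (pixels from the origin) in 1/Å."""
    pixel_size_A = step_um * 1e4 / magnification
    return math.hypot(x_px, y_px) / (array_size_px * pixel_size_A)


def chi(s, underfocus_A, voltage_V, cs_mm=2.0):
    """Phase aberration at spatial frequency s (1/Å)."""
    lam = electron_wavelength_A(voltage_V)
    cs_A = cs_mm * 1e7
    return (math.pi * lam * underfocus_A * s ** 2
            - 0.5 * math.pi * cs_A * lam ** 3 * s ** 4)


def ctf(s, underfocus_A, voltage_V, cs_mm=2.0):
    """Phase contrast transfer, amplitude contrast neglected."""
    return -math.sin(chi(s, underfocus_A, voltage_V, cs_mm))


def first_zero_inv_A(underfocus_A, voltage_V, cs_mm=2.0):
    """Spatial frequency of the first CTF zero (chi = pi)."""
    lam = electron_wavelength_A(voltage_V)
    cs_A = cs_mm * 1e7
    if cs_A == 0.0:
        return math.sqrt(1.0 / (lam * underfocus_A))
    disc = (lam * underfocus_A) ** 2 - 2.0 * cs_A * lam ** 3
    if disc < 0.0:
        raise ValueError("underfocus too small for chi to reach pi")
    return math.sqrt((lam * underfocus_A - math.sqrt(disc)) / (cs_A * lam ** 3))


def corrected_phase_deg(phase_deg, s, underfocus_A, voltage_V, cs_mm=2.0):
    """Add 180 degrees where the contrast is reversed relative to the first band."""
    reversed_band = math.sin(chi(s, underfocus_A, voltage_V, cs_mm)) < 0.0
    return (phase_deg + 180.0) % 360.0 if reversed_band else phase_deg % 360.0


if __name__ == "__main__":
    # Assumed example: 200 kV, candidate underfocus values in Å.
    for df in (3000.0, 5000.0, 8000.0):
        s0 = first_zero_inv_A(df, 200000.0)
        print(f"underfocus {df:.0f} Å: first zero at {1.0 / s0:.1f} Å")
```

Tabulating the first zero for a range of underfocus values, as in the loop above, mirrors the idea of matching reference curves against the measured position of the first zero.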
Multiple passes of image filtering and lattice straightening
There are two major options for performing a second pass of filtering and unbending. In our hands the best results are achieved if the corrected and boxed image is used to generate the reference area while the masked transform of the simply unbent but unboxed image provides the counterpart for the cross-correlation. In this case the full-sized, already corrected image will be unbent a second time and the final data are extracted from its transform after boxing off the best area. A protocol for this procedure can be obtained from our ftp-site (“JOBB.com”). However, generating the reference from the boxed and corrected image and using the uncorrected image for the cross-correlation is equally possible. Which option works best for a particular specimen will have to be determined empirically. Differences between these two approaches are likely to occur in cases where the specimen is ordered to intermediate resolutions only.