
Basecalling

Quite a few steps must be completed between the original DNA sequence and the decoded sequence. We will briefly describe what each of them involves. The following is a schematic presentation of the main steps.

Model

Ultimately, we want a signal that is easy to decode: peaks that are evenly spaced, of approximately the same height, and on a common baseline. We will see that it is possible to get nice-looking peaks in the middle region of a read (roughly bases 100 to 500), but decoding becomes increasingly difficult later in the read, and in most cases it is hopeless after about base 800. There may also be problems in the first 50 bp. The following is an example of what the decoded signal looks like. The peaks can easily be read off up to about scan 400, but after that it becomes less clear where the real signal is.

What confuses the signal?

  1. Fragment formation
    The primer extension reaction may not go smoothly. There may be false stops, i.e. at certain points replication simply stops without a terminator being incorporated, or conversely, several terminators accumulate at the same position. The relative concentrations of ddNTPs and dNTPs may not have been right, so that we get too many short or too many long fragments. While DNA moves down the gel, secondary structures (e.g., hairpins) may form and change the mobility of the DNA fragments.
  2. Convolution
    The peaks for fragments of length, say, 299 and 300 are well separated, but the peaks for longer fragments may not be. The movement of DNA down the gel is a stochastic process, and the longer the expected time to get from one point to another, the more variable that time is. At some point along the sequence, the variance of the passage time becomes so large that adjacent peaks merge completely.

  3. Cross Talk
    The fluorophores employed in the four-dye sequencing strategy have distinct emission spectra, but these spectra overlap significantly (see the next figure). Hence a transformation is needed to recover the relative concentrations of the four dyes from the fluorescence intensities measured at four different wavelengths.

    Cross-Talk

  4. Measurement error
    White noise can originate from several sources, including background noise, detector noise and other noise from the operating environment. Another type of noise encountered is low-frequency variation due to slow changes in the background light level during collection. Such variations may be caused by deformation of the gel due to heating, the formation of bubbles in the path of the laser, variations in laser output power and other systematic changes in the environment.
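
The cross-talk transformation in item 3 can be sketched as a linear unmixing problem. The following is a minimal illustration, assuming that each measured channel is a linear mixture of the four dye signals; the matrix `CROSSTALK` and the function `unmix` are hypothetical names, and the matrix values are invented for illustration rather than taken from any real instrument.

```python
import numpy as np

# Hypothetical cross-talk matrix: entry [i, j] is the signal that one unit
# of dye j contributes to detection channel i. Values are illustrative only.
CROSSTALK = np.array([
    [1.00, 0.40, 0.05, 0.00],
    [0.30, 1.00, 0.35, 0.05],
    [0.05, 0.30, 1.00, 0.40],
    [0.00, 0.05, 0.30, 1.00],
])

def unmix(measured, crosstalk=CROSSTALK):
    """Recover relative dye concentrations from 4-channel intensities.

    measured: array of shape (n_scans, 4), one row per scan.
    Returns an array of the same shape with one column per dye.
    """
    measured = np.asarray(measured, dtype=float)
    # Each scan satisfies measured_row = crosstalk @ dye_row, so we solve
    # the linear system for all scans at once.
    return np.linalg.solve(crosstalk, measured.T).T

# Two synthetic scans: a pure first dye, then a mixture of dyes two and four.
true_dyes = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 0.5, 0.0, 0.2]])
measured = true_dyes @ CROSSTALK.T   # what the detector would record
recovered = unmix(measured)          # matches true_dyes up to rounding
```

In practice the mixing matrix is not known in advance and has to be calibrated or estimated from the data, and measurement noise means the recovered values are only approximate.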

What's involved in basecalling?

Decoding is its last step, but it has to be preceded by several data-processing steps. These pre-processing steps vary from one basecalling algorithm to another, but their main features are the same, and they are necessary for successful basecalling.
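
As one example of such a pre-processing step, the slow background variation described under measurement error can be reduced by estimating a baseline and subtracting it. The sketch below uses a sliding-window minimum as the baseline estimate; this is a generic technique chosen for illustration, not necessarily what any particular basecaller does. The function name `subtract_baseline`, the window width, and the synthetic trace are all assumptions.

```python
import numpy as np

def subtract_baseline(trace, window=101):
    """Subtract a sliding-window-minimum baseline from a single-channel trace.

    window is the width in scans; it should be wider than the spacing between
    peaks so that every window contains some peak-free signal.
    """
    trace = np.asarray(trace, dtype=float)
    half = window // 2
    # Repeat the edge values so that every scan has a full window around it.
    padded = np.pad(trace, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * half + 1)
    baseline = windows.min(axis=1)
    return trace - baseline, baseline

# Synthetic trace: evenly spaced Gaussian peaks on a slowly rising background.
scans = np.arange(1000)
drift = 5.0 + 0.01 * scans
centres = np.arange(25, 1000, 50)
peaks = 10.0 * np.exp(
    -0.5 * ((scans[:, None] - centres[None, :]) / 3.0) ** 2
).sum(axis=1)
trace = drift + peaks

corrected, baseline = subtract_baseline(trace)
```

A minimum filter slightly underestimates a rising baseline, so the corrected trace keeps a small positive offset. Real traces also carry white noise, which would normally be smoothed before or after this step.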






Simon Cawley
Wed Apr 22 15:50:11 PDT 1998