genome sequencing project
There has been much debate about how best to obtain the essential sequence information of large eukaryotic genomes, such as those of mammals and plants. A relatively straightforward approach is to isolate mRNA from different tissues at different ages and to sequence single reads of cDNAs, also called expressed sequence tags (ESTs). While this approach rapidly leads to the discovery of genes and shows when and at what levels they are expressed, it does not provide the genomic positional information, the gene-flanking regions containing regulatory sequences, or the actual gene number, because of gene amplification and because some genes are expressed in only a few cells and at low levels.
2) Whole-Genome Shotgun Sequencing
An alternative has been the whole-genome shotgun sequencing approach, which is not biased by whether genes are rarely or abundantly expressed. This approach has been attractive because the actual sequencing can be accomplished in a short period of time and is easily scaled, since all clones can be sequenced in parallel. The major challenge, however, is to assemble these sequences into contiguous sequence and to anchor them to the genetic map. While this approach works well for genomes with a low content of repeat sequences, it is more challenging for genomes with a high content of repeat sequences. Key to assembling such genomes has been the use of sequence mates, that is, paired reads from the two ends of clones with insert sizes of 2-3 kb, 8-10 kb, and 150 kb; in the last case the end sequences also serve as Sequence-Tagged Connectors (STCs). In addition, the 150-kb clones (BACs) have to be anchored to the genetic map to provide a chromosomal framework of clones (see also next section). Still, STCs contain many repeat sequences from transposable elements and multigene families, which generate many erroneous results unless other supporting data can be added.
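The role of sequence mates in ordering assembled sequence can be illustrated with a minimal Python sketch. It is not the method of any particular assembler: the function name `contig_links`, the read and contig names, and the `min_support` threshold are all hypothetical. It only shows the idea that two contigs are considered linked when the two reads of a mate pair fall in different contigs, and that requiring more than one supporting pair is one simple way to guard against spurious links caused by repeats.

```python
from collections import Counter


def contig_links(mate_pairs, read_to_contig, min_support=2):
    """Count mate pairs that bridge two different contigs.

    mate_pairs:     list of (read_name, mate_name) tuples
    read_to_contig: dict mapping a read name to the contig it was assembled into
    min_support:    hypothetical threshold; links seen fewer times are dropped,
                    since a single link may stem from a repeat sequence
    """
    links = Counter()
    for left, right in mate_pairs:
        c1 = read_to_contig.get(left)
        c2 = read_to_contig.get(right)
        # Skip pairs with an unplaced read or with both reads in one contig.
        if c1 is None or c2 is None or c1 == c2:
            continue
        links[tuple(sorted((c1, c2)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_support}


# Illustrative (made-up) data: two pairs bridge A and B, one bridges A and C.
read_to_contig = {
    "r1a": "contigA", "r1b": "contigB",
    "r2a": "contigA", "r2b": "contigB",
    "r3a": "contigA", "r3b": "contigC",
    "r4a": "contigB", "r4b": "contigB",
}
mate_pairs = [("r1a", "r1b"), ("r2a", "r2b"), ("r3a", "r3b"), ("r4a", "r4b")]
supported = contig_links(mate_pairs, read_to_contig)
print(supported)
```

In this toy data set only the link between contigA and contigB is retained, because the single pair connecting contigA and contigC falls below the assumed support threshold. A real assembly would also check that the implied distance agrees with the clone insert size (2-3 kb, 8-10 kb, or 150 kb).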
3) BAC-by-BAC Sequencing
A third approach has been to sequence overlapping BAC clones, also referred to as a minimum tiling path (MTP). As in the previous case, the BAC clones are anchored to the genetic map. A great aid to anchoring the clones is a process called DNA fingerprinting. If a BAC library contains genomic fragments that cover the genome several times over, then many BAC clones share restriction patterns. Based on these common restriction patterns, tiles of overlapping BAC clones can be assembled into fingerprinted contigs (FPCs). For instance, a 23x BAC library of the 400-Mb rice genome has been assembled into 438 FPCs. By comparison, the larger mouse genome of 2.8 Gb was reduced to only 296 FPCs from a BAC library of 305,716 clones (ca. 16x coverage). The relatively small number of mouse FPCs was mainly due to the comparison of 453,962 STCs with the human genome sequence. About 11.3% of these STCs established collinearity between mouse and human, thereby facilitating the ordering of FPCs. The advantage of these long tiles of BAC clones is that it becomes easier to anchor them to the genetic map because they span a distance of many centiMorgans (cM). Although these tiles are reasonably correct, their limited resolution makes them unsuitable for determining an MTP because they are based on restriction patterns generated with six-base-pair cutters. Therefore, it has been necessary to use STCs in combination with FPCs to identify the minimum overlap between two neighboring BAC clones. The disadvantage of this strategy is that sequencing a whole genome takes a lot of time, because many nucleation points are needed and the extension of BAC clones by further rounds of BAC clones takes many cycles.
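The coverage figure and the fingerprint comparison described above can be made concrete with a short Python sketch. It rests on stated assumptions: an average BAC insert of 150 kb (the clone size given in the previous section), and overlap scored simply as the number of restriction fragments whose sizes agree within a tolerance. The function names, the tolerance, and the fragment sizes are hypothetical, and actual fingerprinting software relies on statistical cutoff scores rather than this plain count.

```python
def library_coverage(num_clones, avg_insert_bp, genome_size_bp):
    """Return the fold coverage of a clone library over a genome."""
    return num_clones * avg_insert_bp / genome_size_bp


def shared_bands(fingerprint_a, fingerprint_b, tolerance=5):
    """Count restriction fragments of matching size in two fingerprints.

    Each fingerprint is a list of fragment sizes in bp. Two fragments
    match if their sizes differ by at most `tolerance` bp (an assumed
    value); each fragment is matched at most once.
    """
    a = sorted(fingerprint_a)
    b = sorted(fingerprint_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


# Mouse library from the text, with the assumed 150-kb average insert.
mouse_cov = library_coverage(305_716, 150_000, 2_800_000_000)
print(f"mouse BAC coverage: about {mouse_cov:.1f}x")

# Made-up fingerprints of two BAC clones (fragment sizes in bp).
bac1 = [1200, 2500, 3100, 4800, 7600]
bac2 = [2503, 3098, 4800, 9100, 12000]
print("shared bands:", shared_bands(bac1, bac2))
```

Under the 150-kb assumption the mouse library works out to roughly 16x, in line with the figure quoted above. Two clones that share many bands are candidates for overlap, but because the count says little about how far the clones overlap, it cannot by itself select a minimum tiling path.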