Model Organism Research
Most mapping and sequencing technologies were developed from studies of nonhuman genomes, notably those of the bacterium Escherichia coli, the yeast Saccharomyces cerevisiae, the fruit fly Drosophila melanogaster, the roundworm Caenorhabditis elegans, and the laboratory mouse Mus musculus. These simpler systems provide excellent models for developing and testing the procedures needed for studying the much more complex human genome.
A large amount of genetic information has already been derived from these organisms, providing valuable data for the analysis of normal gene regulation, genetic diseases, and evolutionary processes. Physical maps have been completed for E. coli, and extensive overlapping clone sets are available for S. cerevisiae and C. elegans. In addition, sequencing projects have been initiated by the NIH genome program for E. coli, S. cerevisiae, and C. elegans.
Mouse genome research will provide much valuable comparative information because of the many biological and genetic similarities between mouse and man. Comparisons of human and mouse DNA sequences will reveal areas that have been conserved during evolution and are therefore likely to be functionally important. An extensive database of mouse DNA sequences will allow counterparts of particular human genes to be identified in the mouse and extensively studied. Conversely, information on genes first found to be important in the mouse will lead to associated human studies. The mouse genetic map, based on morphological markers, has already led to many insights into human biology. Mouse models are being developed to explore the effects of mutations causing human diseases, including diabetes, muscular dystrophy, and several cancers. A genetic map based on DNA markers is presently being constructed, and a physical map is planned to allow direct comparison with the human physical map.
Informatics: Data Collection and Interpretation
Collecting and Storing Data
The reference map and sequence generated by genome research will be used as a primary information source for human biology and medicine far into the future. The vast amount of data produced will first need to be collected, stored, and distributed. If compiled in books, the data would fill an estimated 200 volumes the size of a Manhattan telephone book (at 1000 pages each), and reading it would require 26 years working around the clock (Fig. 14).
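The figures quoted above can be checked with a little arithmetic. The sketch below assumes a haploid genome of roughly 3 billion bases, a value that is not stated in this section, and derives what the 200-volume and 26-year estimates imply: about 15,000 characters per page and a reading rate of a few bases per second.

```python
# Assumption (not stated in the text): about 3 billion bases in the human genome.
GENOME_BASES = 3_000_000_000

# Figures taken from the text.
VOLUMES = 200
PAGES_PER_VOLUME = 1000
YEARS_READING = 26

total_pages = VOLUMES * PAGES_PER_VOLUME
bases_per_page = GENOME_BASES / total_pages

# "Around the clock" means every second of every day; 365.25 days per year.
seconds_reading = YEARS_READING * 365.25 * 24 * 3600
bases_per_second = GENOME_BASES / seconds_reading

print(f"{total_pages:,} pages, about {bases_per_page:,.0f} bases per page")
print(f"implied reading rate: about {bases_per_second:.1f} bases per second")
```

Changing `GENOME_BASES` shows how sensitive the page and time estimates are to the assumed genome size.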
Because handling this amount of data will require extensive use of computers, database development will be a major focus of the Human Genome Project. The present challenge is to improve database design, software for database access and manipulation, and data-entry procedures to compensate for the varied computer procedures and systems used in different laboratories. Databases need to be designed that will accurately represent map information (linkage, STSs, physical location, disease loci) and sequences (genomic, cDNAs, proteins) and link them to each other and to bibliographic text databases of the scientific and medical literature.
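One way to picture the linking requirement described above is a small in-memory catalog in which map entries carry references to sequence records and literature records. This is only an illustrative sketch: the class names, fields, and identifiers (`GenomeCatalog`, `MARKER_A`, `SEQ0001`, `REF0001`) are hypothetical and do not describe the schema of any actual genome database.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceRecord:
    accession: str   # hypothetical identifier, not a real accession number
    kind: str        # "genomic", "cDNA", or "protein"
    residues: str


@dataclass
class MapEntry:
    name: str
    kind: str        # "linkage", "STS", "physical", or "disease locus"
    chromosome: str
    sequence_ids: list = field(default_factory=list)  # links to SequenceRecord
    citation_ids: list = field(default_factory=list)  # links to literature records


class GenomeCatalog:
    """In-memory stand-in for linked map and sequence databases."""

    def __init__(self):
        self.sequences = {}
        self.map_entries = {}

    def add_sequence(self, record):
        self.sequences[record.accession] = record

    def add_map_entry(self, entry):
        # Reject links that point at sequences the catalog does not hold.
        missing = [s for s in entry.sequence_ids if s not in self.sequences]
        if missing:
            raise KeyError(f"unknown sequence ids: {missing}")
        self.map_entries[entry.name] = entry

    def sequences_for(self, map_name):
        # Follow links from a map entry to its sequence records.
        entry = self.map_entries[map_name]
        return [self.sequences[s] for s in entry.sequence_ids]

    def map_entries_for(self, accession):
        # Follow links in the other direction, from a sequence to map entries.
        return [e.name for e in self.map_entries.values()
                if accession in e.sequence_ids]


catalog = GenomeCatalog()
catalog.add_sequence(SequenceRecord("SEQ0001", "cDNA", "ATGGCC"))
catalog.add_map_entry(MapEntry("MARKER_A", "STS", "7", ["SEQ0001"], ["REF0001"]))
print([r.accession for r in catalog.sequences_for("MARKER_A")])
print(catalog.map_entries_for("SEQ0001"))
```

Checking links at entry time is one possible design choice; it addresses the data-entry consistency concern raised above by refusing map entries that refer to sequences the catalog does not contain.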
Interpreting Data
New tools will also be needed for analyzing the data from genome maps and sequences. Recognizing where genes begin and end and identifying their exons, introns, and regulatory sequences may require extensive comparisons with sequences from related species such as the mouse to search for conserved similarities (homologies). Searching a database for a particular DNA sequence may uncover these homologous sequences in a known gene from a model organism, revealing insights into the function of the corresponding human gene.
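The database search described above can be illustrated with a deliberately simple sketch. It slides a query along each stored sequence without gaps and reports the best fraction of matching bases. Practical homology searches use gapped alignment and statistical scoring, which this example does not attempt; the function names, the 0.8 threshold, and the toy sequences are all assumptions made for illustration.

```python
def best_ungapped_match(query, subject):
    """Slide query along subject; return (best identity fraction, offset)."""
    if not query:
        raise ValueError("query must not be empty")
    best_identity, best_offset = 0.0, -1
    for offset in range(len(subject) - len(query) + 1):
        window = subject[offset:offset + len(query)]
        matches = sum(a == b for a, b in zip(query, window))
        identity = matches / len(query)
        if identity > best_identity:
            best_identity, best_offset = identity, offset
    return best_identity, best_offset


def search(query, database, min_identity=0.8):
    """Return (name, offset, identity) for entries meeting the threshold."""
    hits = []
    for name, sequence in database.items():
        identity, offset = best_ungapped_match(query, sequence)
        if identity >= min_identity:
            hits.append((name, offset, identity))
    # Most similar entries first.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Toy data: made-up sequences, not real genes.
database = {
    "mouse_gene_x": "TTGACCGTAAGCTTGG",
    "unrelated": "CCCCCCCCCCCCCCCC",
}
hits = search("ACCGTTAGC", database)
for name, offset, identity in hits:
    print(f"{name}: offset {offset}, identity {identity:.2f}")
```

Here the query differs from the stored mouse sequence at one position out of nine, so it still passes the threshold, which mirrors the idea that conserved but not identical sequences can point to a related gene.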
Correlating sequence information with genetic linkage data and disease gene research will reveal the molecular basis for human variation. If a newly identified gene is found to code for a flawed protein, the altered protein must be compared with the normal version to identify the specific abnormality that causes disease. Once the error is pinpointed, researchers must try to determine how to correct it in the human body, a task that will require knowledge about how the protein functions and in which cells it is active.
Fig. 14. Magnitude of Genome Data. If the DNA sequence of the human genome were compiled in books, the equivalent of 200 volumes the size of a Manhattan telephone book (at 1000 pages each) would be needed to hold it all. New data-analysis tools will be needed for understanding the information from genome maps and sequences.
Fig. 15. Understanding Gene Function. Understanding how genes function will require analyses of the 3-D structures of the proteins for which the genes code.
Correct protein function depends on the three-dimensional (3-D), or folded, structure the proteins assume in biological environments; thus, understanding protein structure will be essential in determining gene function. DNA sequences will be translated into amino acid sequences, and researchers will try to make inferences about functions either by comparing protein sequences with each other or by comparing their specific 3-D structures (Fig. 15).
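The translation step mentioned above, from DNA sequence to amino acid sequence, can be sketched as follows. The example assumes the standard genetic code, a coding-strand sequence read in frame from its first base, and one-letter amino acid symbols; the function name `translate` and the use of "X" for unrecognized codons are choices made for this illustration.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# (first base varies slowest, third base fastest, bases in the order T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding-strand DNA string, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    # Step through complete codons; a trailing partial codon is ignored.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" marks unknown codons
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

The resulting protein strings are the inputs to the sequence comparisons described in the paragraph above.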
Because the 3-D structure patterns (motifs) that protein molecules assume are much more evolutionarily conserved than amino acid sequences, this type of homology search could prove more fruitful. Particular motifs may serve similar functions in several different proteins, information that would be valuable in genome analyses. Currently, however, only a few protein motifs can be recognized at the sequence level. Continued development of analytic capabilities to facilitate grouping protein sequences into motif families will make homology searches more successful.
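Where a motif can be recognized at the sequence level, the search can be as simple as a pattern match. The sketch below scans a protein sequence for the commonly cited N-glycosylation consensus (asparagine, any residue except proline, serine or threonine, any residue except proline). That pattern is chosen here purely as an example and is not taken from the text; the function name and the test sequence are likewise made up for illustration.

```python
import re

# Example pattern: N, then any residue except P, then S or T, then any except P.
# The lookahead lets overlapping occurrences be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein, pattern=N_GLYCOSYLATION):
    """Return (start index, matched residues) for each motif occurrence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein)]


# Made-up protein sequence in one-letter code.
sites = find_motif("MKNGSAPNPTLNVTQ")
for start, residues in sites:
    print(start, residues)
```

Sequence-level patterns like this one capture only a small class of motifs; most structural motifs have no such simple signature, which is the limitation the paragraph above describes.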
Mapping Databases
The Genome Data Base (GDB), located at Johns Hopkins University (Baltimore, Maryland), provides location, ordering, and distance information for human genetic markers, probes, and contigs linked to known human genetic disease. GDB is presently working on incorporating physical mapping data. Also at Hopkins is the Online Mendelian Inheritance in Man database, a catalog of inherited human traits and diseases.
The Human and Mouse Probes and Libraries Database (located at the American Type Culture Collection in Rockville, Maryland) and the GBASE mouse database (located at Jackson Laboratory, Bar Harbor, Maine) include data on RFLPs, chromosomal assignments, and probes from the laboratory mouse.
Sequence Databases
Nucleic Acids (DNA and RNA)
Public databases containing the complete nucleotide sequence of the human genome and those of selected model organisms will be one of the most useful products of the Human Genome Project. Four major public databases now store nucleotide sequences: GenBank and the Genome Sequence DataBase (GSDB) in the United States, European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Database in the United Kingdom, and the DNA Database of Japan (DDBJ). The databases collaborate to share sequences, which are compiled from direct author submissions and journal scans. The four databases now house a total of almost 200 Mb of sequence. Although human sequences predominate, more than 8000 species are represented. [Paragraph updated July 1994]
Proteins
The major protein sequence databases are the Protein Identification Resource (National Biomedical Research Foundation), SWISS-PROT, and GenPept (both distributed with GenBank). In addition to sequence information, they contain information on protein motifs and other features of protein structure.
Impact of the Human Genome Project
The atlas of the human genome will revolutionize medical practice and biological research into the 21st century and beyond. All human genes will eventually be found, and accurate diagnostics will be developed for most inherited diseases. In addition, animal models for human disease research will be more easily developed, facilitating the understanding of gene function in health and disease.
Researchers have already identified single genes associated with a number of diseases, such as cystic fibrosis, Duchenne muscular dystrophy, myotonic dystrophy, neurofibromatosis, and retinoblastoma. As research progresses, investigators will also uncover the mechanisms for diseases caused by several genes or by a gene interacting with environmental factors. Genetic susceptibilities have been implicated in many major disabling and fatal diseases including heart disease, stroke, diabetes, and several kinds of cancer. The identification of these genes and their proteins will pave the way to more effective therapies and preventive measures. Investigators determining the underlying biology of genome organization and gene regulation will also begin to understand how humans develop from single cells to adults, why this process sometimes goes awry, and what changes take place as people age.
New technologies developed for genome research will also find myriad applications in industry, as well as in projects to map (and ultimately improve) the genomes of economically important farm animals and crops.
While human genome research itself does not pose any new ethical dilemmas, the use of data arising from these studies presents challenges that need to be addressed before the data accumulate significantly. To assist in policy development, the ethics component of the Human Genome Project is funding conferences and research projects to identify and consider relevant issues, as well as activities to promote public awareness of these topics.
Page: 0015. Version: 0001. Produced by: GEENOR. No rights reserved.