February 12, 2001
Graduate student lands starring role in the Human Genome Project
By Tim Stephens
Jim Kent is not a typical biologist, having spent 15 years writing computer animation software after earning his bachelor's and master's degrees in mathematics at UCSC. As part of his initial graduate research in biology (studying gene expression in the roundworm C. elegans), he wrote an impressive program that caught the eye of professor of computer science David Haussler. Haussler, a Howard Hughes Medical Institute investigator and a leader in the field of bioinformatics, soon began collaborating with Kent. Shortly thereafter, Eric Lander, director of the Genome Center at MIT's Whitehead Institute, asked Haussler to help analyze the human genome.
That project led to Kent's finest hour--so far. Working flat out for a solid month in the garage office behind his Santa Cruz bungalow, he wrote most of GigAssembler in a mad rush. When the 80-hour work weeks began to take a toll on his wrists, he used ice packs to control the pain. His efforts paid off.
Haussler led the team of UCSC researchers, including Kent, that used GigAssembler to analyze data from the genome consortium's sequencing laboratories and piece together a draft of the genome sequence. Running the program on a hastily built parallel-processing cluster of 100 computer workstations, they finished just in time for Francis Collins, director of the National Human Genome Research Institute, to announce the completion of a working draft of the human genome at a White House press conference in June 2000.
"The assembly process is kind of like solving a giant jigsaw puzzle, but it is a much more complicated jigsaw puzzle than you ever imagined," Kent said.
When Kent decided to take on the task of assembling the genome, he faced a daunting challenge. The human genetic code is spelled out in roughly 3 billion DNA subunits arranged in specific sequences on the chromosomes. To determine those sequences, Genome Project scientists divided the chromosomal DNA into about 25,000 small overlapping regions for analysis by sequencing labs. The labs obtained sequences for many random fragments of DNA from each region, providing a total of about 400,000 sequenced fragments. These sequences then had to be assembled in the proper order and orientation to represent the sequences of each of the 23 human chromosomes as accurately as possible.
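To give a feel for the puzzle described above, here is a minimal sketch of one textbook approach, greedy overlap merging, written in Python. It is a toy illustration only and is not GigAssembler's actual algorithm, which the article does not describe. The function names, the three short fragments, and the minimum overlap of three letters are all assumptions made up for the example. The sketch finds the two fragments that share the longest overlap, tries the second fragment in both orientations (since a fragment may come from either DNA strand), merges them, and repeats.

```python
def reverse_complement(seq):
    """Return the opposite DNA strand, read in its own forward direction."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def overlap(left, right, min_len):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(fragments)
    while len(contigs) > 1:
        best = (0, None, None, None)  # (overlap size, i, j, oriented j)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                # Orientation is unknown, so try b as given and flipped.
                for oriented in (b, reverse_complement(b)):
                    size = overlap(a, oriented, min_len)
                    if size > best[0]:
                        best = (size, i, j, oriented)
        size, i, j, oriented = best
        if size == 0:
            break  # nothing left overlaps; the remaining pieces stay separate
        merged = contigs[i] + oriented[size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)
    return contigs


# Hypothetical 16-letter "genome" and three overlapping fragments from it.
# The third fragment is deliberately given in the opposite orientation.
target = "ATTGCCAGGTCATAGC"
fragments = ["ATTGCCAG", "CCAGGTCA", "GCTATGA"]
contigs = greedy_assemble(fragments)
```

With these three fragments the sketch returns a single contig that spells out the made-up target sequence on one strand or the other. Real data are far harder: the fragments number in the hundreds of thousands, contain sequencing errors, and include long repeated stretches that make the longest overlap a misleading guide.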
Kent turned his attention to the problem shortly after passing his Ph.D. qualifying exam in May 2000. "After my oral exams, I looked around and saw that the assembly wasn't very far along. I had an idea that I was pretty sure would work, so I pursued it," he said, then added, in a typical understatement, "It ended up being a bigger project than I had thought."
Kent, Haussler, and other members of the UCSC team are coauthors on three scientific papers on the human genome in the February 15 issue of Nature. They continue to work on the task of completing the genome sequence, identifying gaps where new sequencing data are needed, and updating the assembled sequence as new data become available. Haussler's group has also been involved in the analysis of the genome to predict locations of genes using a program called Genie, developed by his former student David Kulp, now vice president of bioinformatics at Affymetrix. In addition, one of Haussler's graduate students, Terrence Furey, has been working with a group that is identifying the locations in the sequence of the dark bands that are used as landmarks in cytogenetic studies of human chromosomes.
As an encore to GigAssembler, Kent created a web-based human genome browser that has already proved quite useful to biomedical researchers. The browser is publicly available at genome.ucsc.edu. The site gets an average of 20,000 "hits" per day, Haussler said.
Kent has a story he likes to tell about how he ended up switching from computer animation to biology. He had written graphics and animation programs for some of the first personal computers in the 1980s, and later developed products for Atari and Autodesk. But he finally got tired of keeping up with the constantly changing operating systems. Windows 95 was the last straw, he said.
"The platform for software developers came on 12 CD-ROMs, and I said to myself, heck, the whole human genome would fit on one CD-ROM, and it doesn't change every three months," he said.
Little did he know that he would actually be the one to put the human genome onto a CD-ROM. A human genome CD-ROM produced at UCSC was recently placed in the National Millennium Time Capsule, to be housed at the National Archives for the next 100 years.
"It is kind of odd to find myself in the middle of all this," Kent said. "But it's always thrilling when you get something complicated to work, and there is an awful lot of research that depends on having the genome sequence finished."
Of course, what Kent achieved with GigAssembler was entirely dependent on input from many other scientists involved in the sequencing and analysis of the human genome. In particular, a map of the genome developed by Robert Waterston, director of the Genome Sequencing Center at Washington University in St. Louis, served as an invaluable guide in putting the pieces together, Haussler said.
Haussler's team at UCSC was part of the genome analysis group led by Whitehead's Lander. The key members of this group include Collins, Lander, Waterston, Ewan Birney at the European Bioinformatics Institute in Cambridge, England, and Gregory Schuler at the National Center for Biotechnology Information. A complete list of the members of the genome analysis group, which includes scientists at more than a dozen institutions, can be found in the Nature paper on sequencing and analysis of the human genome.