International 'ENCODE' project reveals function of human genome; Massive catalog 'like Google Maps for the human genome,' includes work by WSU professor
Detroit - Leonard Lipovich's determination to prove that genetic matter once deemed "junk" has a place in clinical medicine is bringing the Wayne State University School of Medicine to the forefront of a field occupying genome enthusiasts, while also contributing to ENCODE, the global follow-up to The Human Genome Project.
Upon completion in 2003, the HGP produced an almost complete list of the 3 billion pairs of chemical letters in the DNA that embodies the human genetic code, but revealed nothing about how this blueprint works. Five years of concerted effort by more than 440 researchers in 32 labs around the world has changed that.
The group, working collaboratively in the Encyclopedia of DNA Elements (ENCODE) Project, has produced the first view of how the human genome actually does its job. Researchers across the United States, United Kingdom, Spain, Singapore and Japan performed more than 1,600 sets of experiments on 147 types of tissue with technologies standardized across the consortium.
Among them are WSU Assistant Professor of Molecular Medicine and Genetics and of Neurology Leonard Lipovich, Ph.D., and his team at WSU's Center for Molecular Medicine and Genetics.
Two groundbreaking articles highlighting results from his lab are included in a coordinated publication released Sept. 5. It includes one main integrative paper and five other papers in the journal Nature, 18 papers in Genome Research and six papers in Genome Biology.
The experiments relied on innovative uses of next-generation sequencing technologies, which had only become available around the start of the ENCODE production effort five years ago. ENCODE generated more than 15 trillion bytes of raw data, and analyzing it consumed the equivalent of more than 300 years of computer time.
Lipovich's contributions, on long non-coding ribonucleic acids, or lncRNAs, could lead to new therapeutics for cancer and other diseases.
"Long non-coding RNA genes comprise half of human genes. Most medical, therapeutic work so far has focused on normal, protein-coding genes. So, we - working as part of a multinational team of scientists - have just expanded, twofold, the set of genes that can be therapeutic targets," he said.
During the ENCODE study, researchers found that more than 80 percent of the human genome sequence is linked to biological function. They mapped more than 4 million regulatory regions where proteins interact with DNA with exquisite specificity. These findings represent a significant advance in understanding the precise and complex controls over the expression of genetic information within a cell.
The findings bring into much sharper focus the continually active genome in which proteins routinely turn genes on and off using sites that are sometimes at great distances from the genes they regulate; where sites on a chromosome interact with each other, also sometimes at great distances; where chemical modifications of DNA influence gene expression; and where various functional forms of RNA, a form of nucleic acid related to DNA, help regulate the whole system.
"For many years people pooh-poohed our field, saying that our long non-coding RNAs are either junk or conventional protein-coding messenger RNAs that we failed to properly understand. We now demonstrate, using an experimental approach, that they are really non-protein-coding (never translated into protein) RNAs in human cells," Lipovich said.
Lipovich, together with his colleague James B. Brown at the University of California, Berkeley, is a joint last author on a paper on whole-genome translation testing of human lncRNAs included in the September 2012 issue of Genome Research, the genetics and genomics field's leading peer-reviewed journal. The paper places Wayne State on the international genomics radar, amidst an elite group of institutions - including the University of California, Berkeley and Stanford University - whose pioneering collaborative work is defining the current phase of ENCODE. The September issue is dedicated to the latest phase of results from the ENCODE consortium.
The research project is supported by the National Human Genome Research Institute. In 2003, ENCODE's goal of identifying all functional elements contained in the human genome seemed just as daunting as sequencing the human genome was in 1990. ENCODE was launched as a pilot project. By 2007, NHGRI concluded that the technology had sufficiently evolved for a full-scale project, in which the institute invested approximately $123 million over five years. In addition, NHGRI has devoted about $40 million to the ENCODE pilot project, plus approximately $125 million to ENCODE-related technology development and model organism research since 2003.
Lipovich's paper sets a new standard for integrating RNA data with protein data, in a way that has never been done before. "My lab, through its computational work here at Wayne, did a vital part of the integration, developing a method that can be used in any future studies that intersect protein and RNA data genome-wide," he said. "Unusual, rare lncRNA-encoded proteins, such as those we found, could be the results of incorrect lncRNA processing by cells in diseased tissues, and hence a huge resource of biomarkers for diagnostics."
Data can be accessed through the ENCODE project portal (www.encodeproject.org), the University of California at Santa Cruz genome browser (http://genome.ucsc.edu/ENCODE/), the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/geo/info/ENCODE.html/) and the European Bioinformatics Institute (http://useast.ensembl.org/Homo_sapiens/encode.html?redirect=mirror;source=www.ensembl.org).
"The ENCODE catalog is like Google Maps for the human genome," said Elise Feingold, Ph.D., an NHGRI program director who helped start the ENCODE Project. "Simply by selecting the magnification in Google Maps, you can see countries, states, cities, streets, even individual intersections, and by selecting different features, you can get directions, see street names and photos, and get information about traffic and even weather. The ENCODE maps allow researchers to inspect the chromosomes, genes, functional elements and individual nucleotides in the human genome in much the same way."
Since the same topics were addressed in different ways in different papers, the new website, http://www.nature.com/encode/, will allow anyone to follow a topic through all of the papers in the ENCODE publication set in which it appears by clicking on the relevant "thread" at the Nature ENCODE explorer page.
The ENCODE data are rapidly becoming a fundamental resource for researchers to help understand human biology and disease. More than 100 papers using ENCODE data have already been published by investigators who were not part of the ENCODE Project. For example, researchers studying the genetic basis of human diseases are using ENCODE to sort through the many disease-associated variants, or markers, in an effort to determine which specific variants contribute to disease. These variants map not only to protein-coding regions of the genome but also to its non-coding regions, the vast tracts of sequence between genes where ENCODE has identified many regulatory sites.
"As much as nine out of 10 times, disease-linked genetic variants are not in protein-coding regions," said Mike Pazin, Ph.D., an ENCODE program director at NHGRI. "So what does that mean? The answer is going to turn out to be that some of the time, the genetic changes causing the disorder are in fact within regulatory regions, or switches, that affect how much protein is produced or when the protein is produced, rather than affecting the structure of the protein itself. The medical condition will occur because the gene is aberrantly turned on or turned off or abnormal amounts of the protein are made. Far from being 'junk' DNA, this regulatory DNA clearly makes important contributions to human disease."
An additional paper in September's Genome Research co-written by Lipovich, "The GENCODE catalogue of human long non-coding RNAs: Analysis of their gene structure, evolution and expression," presents the most authoritative reference catalog of non-coding RNA genes ever constructed. Non-coding RNA genes exemplify a huge category of sequence that was, until recently, thought of as "junk" DNA.
"It will be used by the entire international ENCODE consortium as a foundation for functional studies linking this exciting new class of RNAs to human health and disease," said Dr. Lipovich.
Along with Dr. Brown at Berkeley, Lipovich is a middle author on this large, international effort from ENCODE's Analysis Working Group.
The ENCODE AWG is open to all academic, government and private sector scientists who are interested in participating in an open process to facilitate the comprehensive identification of the functional elements in the human genome sequence, and who agree to a set of criteria for participation.
###
Wayne State University is one of the nation's pre-eminent public research universities in an urban setting. Through its multidisciplinary approach to research and education, and its ongoing collaboration with government, industry and other institutions, the university seeks to enhance economic growth and improve the quality of life in the city of Detroit, state of Michigan and throughout the world. For more information about research at Wayne State University, visit http://www.research.wayne.edu.
Founded in 1868, the Wayne State University School of Medicine is the largest single-campus medical school in the nation, with more than 1,000 medical students. In addition to undergraduate medical education, the school offers master's degree, Ph.D. and M.D.-Ph.D. programs in 14 areas of basic science to about 400 students annually.