Rat Genome Database
The Rat Genome Database (RGD) is a database of rat genomics, genetics, physiology and functional data, as well as data for comparative genomics between rat, human and mouse.[1] RGD is responsible for attaching biological information to the rat genome through structured-vocabulary (ontology) annotations assigned to genes and quantitative trait loci (QTL), and for consolidating rat strain data and making it available to the research community. RGD is working with groups such as the PhysGen Programs for Genomic Applications[2] at the Medical College of Wisconsin (MCW) and the National BioResource Project for the Rat (NBRP-Rat) in Japan[3] to collect and make available comprehensive physiologic data for a variety of rat strains. RGD is also developing a suite of tools for mining and analyzing genomic, physiologic and functional data for the rat, and comparative data for rat, mouse and human.
Content | |
---|---|
Description | The Rat Genome Database |
Organisms | Rattus norvegicus (rat) |
Contact | |
Research center | Medical College of Wisconsin |
Laboratory | Biomedical Engineering |
Authors | Anne E. Kwitek, PhD |
Primary citation | PMID 25355511 |
Access | |
Website | rgd.mcw.edu |
Download URL | RGD Data Release |
RGD began as a collaborative effort between research institutions involved in rat genetic and genomic research. Its goal, as stated in request for applications (RFA) HL-99-013, was to establish a Rat Genome Database that would collect, consolidate and integrate data generated by ongoing rat genetic and genomic research efforts and make these data widely available to the scientific community. A secondary but critical goal was to curate the mapped positions of quantitative trait loci, known mutations and other phenotypic data.
The rat continues to be used extensively as a model organism for investigating pharmacology, toxicology, general physiology, and the biology and pathophysiology of disease.[4] In recent years, the volume of rat genetic and genomic data has increased rapidly. RGD has become a central source of information on the rat and now covers not only genetics and genomics but also physiology and molecular biology, with data pages and tools for all of these fields curated by RGD staff.[5]
Data
RGD's Data page[6] lists eight types of data stored in the database: Genes, QTLs, Markers, Maps, Strains, Ontologies, Sequences and References. Of these, six are actively used and regularly updated. The "Maps" data type refers to legacy genetic and radiation hybrid maps, which have been largely superseded by the rat whole-genome sequence. The "Sequences" data type is not a comprehensive collection of genomic, transcript or protein sequences; it consists mostly of the PCR primer sequences that define simple sequence length polymorphism (SSLP) and expressed sequence tag (EST) markers. These sequences are useful primarily to researchers who still use these markers to genotype their animals, and for distinguishing between markers with the same name. The six major data types in RGD are as follows:
- Genes: Gene records are imported from the National Center for Biotechnology Information's (NCBI's) Gene database and updated from it weekly. Data imported during this process include the Gene ID, GenBank/RefSeq nucleotide and protein sequence identifiers, HomoloGene group IDs and Ensembl Gene, Transcript and Protein IDs. Additional protein-related data are imported from the UniProtKB database. RGD curators review the literature and manually curate Gene Ontology (GO), disease, phenotype and pathway annotations for rat genes; disease and pathway annotations for mouse genes; and disease, phenotype and pathway annotations for human genes.[7][8] In addition, RGD imports GO annotations for mouse and human genes from the GO Consortium, electronic annotations for rat genes from UniProt, and mouse phenotype annotations from the Mouse Genome Database/Mouse Genome Informatics (MGD/MGI).
- QTLs: RGD staff manually curate data for rat and human QTLs from the literature, where such publications exist, or from records submitted directly by researchers. Mouse QTL records, including Mammalian Phenotype (MP) ontology assignments, are imported directly from MGI. For rat and human QTLs, curation includes assigning MP and disease ontology annotations. QTL positions are automatically assigned based on the genomic positions of peak and/or flanking markers or single nucleotide polymorphisms (SNPs). QTL records link to information about related strains, candidate genes, associated markers and related QTLs.
- Strains: As with QTL records, RGD strain records are either manually curated from the literature or submitted by researchers. Strain records include information about the origin and availability of the strain, associated phenotypes, whether the strain is a model for a human disease, and any available information about breeding, behavior, husbandry, etc. Strain records link to information about related genes and QTLs, associated strains (e.g. parental strains or substrains) and, where available, strain-specific nucleotide variants. For congenic and mutant strains, genomic positions are assigned for the introgressed region (congenic strains) or the location of the mutated sequence (mutant strains). RGD does not import data for mouse strains.
- Markers: Because genetic markers such as SSLPs and ESTs have been, and continue to be, used to map QTLs and to characterize strains, RGD stores marker data for rat, human and mouse. Marker data include the sequences of the associated forward and reverse PCR primers, genomic positions and links to NCBI's Probe database. Marker records link to associated QTL, strain and gene records.
- Ontologies: To make its data both human-readable and available for computational analysis and retrieval, RGD relies on multiple ontologies. As of July 2015, RGD used 16 different ontologies to represent its diverse data types. Ontology annotations are either assigned manually by curators[7][8] or imported from external sources through automated pipelines. Six of the ontologies in use at RGD were created or co-created at RGD, and seven are under development by RGD staff and/or collaborators: Pathway (PW), Rat Strain (RS), Vertebrate Trait (VT), Disease (RDO), Clinical Measurement (CMO), Measurement Method (MMO) and Experimental Condition (XCO).[9] Ontologies imported from outside sources are updated weekly.
- References: RGD references are scientific publications that have been used for curation or that are sources for data objects such as QTLs and strains. For references available in NCBI's PubMed, imported data include the title, authors, citation and PubMed ID. Some references are internal records, for example for an automated pipeline or a personal communication, which show users the source of a particular piece of data; PubMed records are not available for these. Each reference record links to all of the data curated from that article, including genes, QTLs, strains and ontology annotations.
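The QTL description above says positions are assigned automatically from the genomic positions of peak and/or flanking markers or SNPs, but it does not give the exact rule. The following Python sketch assumes one plausible rule: when flanking markers exist, the QTL spans from the leftmost to the rightmost marker coordinate; when only a peak marker is known, a fixed window (an arbitrary 1 Mb on each side here) is placed around it. The `Marker` type, the `assign_qtl_position` function, the window size and the marker symbols are all hypothetical and are not RGD's pipeline.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Marker:
    """A mapped marker or SNP on one chromosome (1-based base-pair positions)."""
    symbol: str
    chromosome: str
    start: int
    stop: int


def assign_qtl_position(
    flanking: Sequence[Marker],
    peak: Optional[Marker] = None,
    window_bp: int = 1_000_000,  # hypothetical half-width used when only a peak is known
) -> Tuple[str, int, int]:
    """Return (chromosome, start, stop) for a QTL under the assumed rule."""
    markers = list(flanking) + ([peak] if peak is not None else [])
    if not markers:
        raise ValueError("a QTL needs at least one mapped peak or flanking marker")
    chromosomes = {m.chromosome for m in markers}
    if len(chromosomes) != 1:
        raise ValueError(f"markers map to several chromosomes: {sorted(chromosomes)}")
    chrom = chromosomes.pop()
    if flanking:
        # Span every supplied marker, from the leftmost start to the rightmost stop.
        start = min(m.start for m in markers)
        stop = max(m.stop for m in markers)
    else:
        # Only a peak marker: center a fixed window on it, clipped at position 1.
        start = max(1, peak.start - window_bp)
        stop = peak.stop + window_bp
    return chrom, start, stop


# Hypothetical markers on chromosome 1.
left = Marker("markerA", "1", 10_000_000, 10_000_200)
right = Marker("markerB", "1", 25_000_000, 25_000_180)
print(assign_qtl_position([left, right]))  # ('1', 10000000, 25000180)
```

Flanking markers on different chromosomes are rejected rather than silently merged, since a single interval cannot span them.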
Genome tools
RGD's Genome tools[10] include both software tools developed at RGD and tools from third-party sources.
Genome tools developed at RGD
RGD develops web-based tools designed to use the data stored in the RGD database for analyses in rat and across species. These include:
- Gene Annotator: The Gene Annotator (GA) tool takes as input a list of gene symbols, RGD IDs, GenBank accession numbers, Ensembl identifiers and/or a chromosomal region, and retrieves gene orthologs, external database identifiers and ontology annotations for the corresponding genes in RGD. The data can be downloaded as an Excel spreadsheet or analyzed within the tool. The "Annotation Distribution" function displays a list of terms in each of seven categories, with the percentage of genes from the input list annotated to each term. The "Comparison Heat Map" function compares annotations for genes in the input list across two ontologies or across two branches of the same ontology.
- Variant Visualizer: Variant Visualizer (VV) is a viewing and analysis tool for rat strain-specific sequence polymorphisms. VV takes as input a list of gene symbols or a genomic region, defined either by chromosome, start and stop positions or by two gene or marker symbols. Users must also select their strains of interest from a list of strains with whole-genome sequences, and can set parameters for the variants in the result set. The output is a heatmap-style display of variants, and additional information for individual variants can be viewed in a "detail pane".
- OLGA - Object List Generator & Analyzer: OLGA is a search engine designed to allow users to run multiple queries, generate a list of objects from each query and flexibly combine the results. OLGA takes as input either a list of object symbols or search parameters based on ontology annotations or position. The final list of genes, QTLs or strains can be downloaded, or submitted from within the tool to the GA tool, the Variant Visualizer or the Genome Viewer.
- Genome Viewer: The Genome Viewer (GViewer) tool provides users with complete genome views of genes, QTLs and mapped strains annotated to a function, biological process, cellular component, phenotype, disease, pathway, or chemical interaction. GViewer allows Boolean searches across multiple ontologies. Output is displayed against a karyotype of the rat genome.
- Overgo Probe Designer: Overgo probes are pairs of partially overlapping 22-mer oligonucleotides derived from repeat-masked genomic sequence and used as high-specific-activity probes for genome mapping. The Overgo Probe Designer tool takes a nucleotide sequence as input and outputs a list of optimized probe sequences with the requisite 8-nucleotide overlap at their 3' ends.
- ACP Haplotyper: The ACP Haplotyper creates a "visual haplotype" that can be used to identify conserved and non-conserved chromosomal regions among any of the 48 rat strains characterized as part of the ACP project. For the selected chromosome and strains, the tool compares allele size data for microsatellite markers on the selected genetic or radiation hybrid (RH) map.[11]
- SNPlotyper: SNPlotyper is a visualization and analysis tool for rat SNP data imported from dbSNP and Ensembl. It enables users to view haplotype blocks shared between strains and to identify informative (polymorphic) markers between two or more strains. SNPlotyper contains legacy genotyping data and does not include the strain-specific variants derived from whole-genome sequencing (WGS) of rat strains.
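The Gene Annotator's "Annotation Distribution" reduces to counting, for each ontology term, the share of input genes annotated to it. A minimal Python sketch, assuming annotations are already available as a mapping from gene symbol to a set of term names; the function name and the toy annotations are hypothetical illustrations, not RGD data or code.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Set, Tuple


def annotation_distribution(
    gene_list: Iterable[str],
    annotations: Mapping[str, Set[str]],
) -> List[Tuple[str, float]]:
    """Return (term, percent of input genes annotated to it), highest first."""
    genes = list(dict.fromkeys(gene_list))  # drop duplicate symbols, keep input order
    if not genes:
        return []
    counts = Counter()
    for gene in genes:
        # Annotations are sets, so each gene contributes at most once per term.
        counts.update(annotations.get(gene, set()))
    total = len(genes)  # unannotated genes still count in the denominator
    distribution = [(term, 100.0 * n / total) for term, n in counts.items()]
    distribution.sort(key=lambda item: (-item[1], item[0]))  # ties broken alphabetically
    return distribution


# Toy annotations for illustration only.
toy = {"Ace": {"hypertension", "kidney disease"}, "Agt": {"hypertension"}, "Ren": {"hypertension"}}
result = annotation_distribution(["Ace", "Agt", "Ren", "Ace"], toy)
```

Here `result` starts with `("hypertension", 100.0)`, because all three distinct input genes carry that toy term, followed by "kidney disease" at one third of the genes.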
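The OLGA description says lists from several queries can be "flexibly combined" but does not say how. A natural reading is set algebra over object identifiers, so the Python sketch below assumes three operations (union, intersection and subtraction) applied left to right. The operation names, the `combine_lists` function and the placeholder IDs are hypothetical and do not reproduce OLGA's interface.

```python
from typing import Iterable, List, Sequence, Set, Tuple

# Hypothetical operation names; OLGA's own interface may label these differently.
OPERATIONS = {
    "union": lambda acc, new: acc | new,
    "intersect": lambda acc, new: acc & new,
    "subtract": lambda acc, new: acc - new,
}


def combine_lists(
    first: Iterable[str],
    steps: Sequence[Tuple[str, Iterable[str]]],
) -> List[str]:
    """Fold each (operation, query result) pair into a running set of object IDs."""
    result: Set[str] = set(first)
    for op_name, ids in steps:
        try:
            op = OPERATIONS[op_name]
        except KeyError:
            raise ValueError(f"unknown operation: {op_name}") from None
        result = op(result, set(ids))
    return sorted(result)


# Placeholder IDs: genes from an ontology query, narrowed by a position query,
# minus a list the user wants to exclude.
combined = combine_lists(
    ["G1", "G2", "G3"],
    [("intersect", ["G2", "G3", "G4"]), ("subtract", ["G3"])],
)
```

Applying the steps in order gives `["G2"]`; the order matters, because subtraction and intersection do not commute with union.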
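The Overgo geometry follows from the numbers given for the Overgo Probe Designer: two 22-mers overlapping by 8 nucleotides cover 2 × 22 − 8 = 36 nucleotides of genomic sequence. The forward oligo is the first 22 bases of a 36-base window and the reverse oligo is the reverse complement of the last 22 bases, so their 3' ends pair over 8 bases. The Python sketch below shows only that geometry. The GC-content bounds and the treatment of lowercase or N bases as repeat-masked are assumptions standing in for the tool's own optimization criteria, which are not described here, and `design_overgos` is a hypothetical name.

```python
from typing import List, Tuple

OLIGO_LEN = 22                   # oligo length stated for the Overgo probes
OVERLAP = 8                      # complementary overlap at the 3' ends
SPAN = 2 * OLIGO_LEN - OVERLAP   # 36 nt of genomic sequence per probe pair

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def design_overgos(
    sequence: str,
    gc_min: float = 0.4,   # assumed bounds, not the tool's actual criteria
    gc_max: float = 0.6,
    step: int = 1,
) -> List[Tuple[int, str, str]]:
    """Return (offset, forward_oligo, reverse_oligo) for each acceptable 36-nt window.

    Assumes repeats are soft-masked (lowercase) or hard-masked (N), so any window
    containing a character other than uppercase A, C, G or T is skipped.
    """
    probes = []
    for offset in range(0, len(sequence) - SPAN + 1, step):
        window = sequence[offset:offset + SPAN]
        if set(window) - set("ACGT"):
            continue  # masked repeat, ambiguity code or lowercase base
        if not gc_min <= gc_fraction(window) <= gc_max:
            continue
        forward = window[:OLIGO_LEN]
        # The reverse oligo is the opposite strand of the last 22 nt, so its
        # 3' end pairs with the 3' end of the forward oligo over OVERLAP bases.
        reverse = reverse_complement(window[SPAN - OLIGO_LEN:])
        probes.append((offset, forward, reverse))
    return probes


probes = design_overgos("ACGT" * 9)  # one 36-nt window with 50% GC
```

After annealing, the single-stranded 5' overhangs would be filled in with labeled nucleotides, which is what gives Overgo probes their high specific activity.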
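SNPlotyper's search for informative markers, like the ACP Haplotyper's comparison of microsatellite allele sizes, comes down to checking marker by marker whether the selected strains carry different alleles. A minimal Python sketch, assuming genotypes are held as a mapping from marker to strain to allele, with `None` for a missing call. The data layout, the `informative_markers` function and the toy genotypes are hypothetical. Alleles may be any hashable value, so SNP bases and microsatellite sizes can both be used.

```python
from typing import Hashable, List, Mapping, Optional, Sequence

# marker -> strain -> allele (e.g. "A" for a SNP, 182 for a microsatellite size);
# None marks a missing call.
Genotypes = Mapping[str, Mapping[str, Optional[Hashable]]]


def informative_markers(genotypes: Genotypes, strains: Sequence[str]) -> List[str]:
    """Return markers whose alleles differ among the selected strains.

    Markers lacking a call in any selected strain are skipped, because they
    cannot be shown to be either polymorphic or shared across all strains.
    """
    result = []
    for marker, calls in genotypes.items():
        alleles = [calls.get(strain) for strain in strains]
        if any(allele is None for allele in alleles):
            continue
        if len(set(alleles)) > 1:
            result.append(marker)
    return result


# Toy genotypes for three commonly used rat strains (values are illustrative only).
toy_genotypes = {
    "snp1": {"BN": "A", "SHR": "G", "WKY": "A"},
    "snp2": {"BN": "C", "SHR": "C", "WKY": "C"},
    "snp3": {"BN": "T", "SHR": None, "WKY": "C"},
}
```

With these toy data, `informative_markers(toy_genotypes, ["BN", "SHR"])` returns `["snp1"]` (snp3 is skipped because SHR has no call), while the BN/WKY comparison returns `["snp3"]`. Runs of consecutive non-informative markers would correspond to the shared haplotype blocks the tools display.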
Third-party genome tools adapted for use with RGD data
RGD offers several third-party software tools that have been adapted to use data stored in the RGD database. These include:
- Genome Browsers: As of July 2015, RGD supported two genome browsers for viewing data for rat, mouse and human. Both tools, GBrowse[12] and JBrowse,[13] have been or are being developed by the Generic Model Organism Database (GMOD) project. These tools allow users to view the location of a genetic landmark (sequence, gene, locus, marker and/or oligonucleotide) on the genome of the applicable species. They also allow comparisons between species through "synteny tracks" and links between the browser instances for the different species.
- RatMine: RatMine is a rat-centric version of the InterMine[14] software. It enables users to mine and analyze rat data from diverse databases including RGD, NCBI, UniProtKB and Ensembl in a single location using a consistent format. The InterMine platform has been adapted for multiple species in other databases and is designed to be interoperable between instances so that users can query across species from the RatMine interface.[15]
- Virtual Comparative Map: The Virtual Comparative Map (VCMap) was originally developed to explore the syntenic relationships between rat, mouse and human genomes. A new version of VCMap is now available which also incorporates cow, pig and chicken. Users select a "primary" or "backbone" species, then can view the syntenic regions in one or more of the other species.
Additional data and tools
Phenotypes and Models portal
RGD's Phenotypes and Models portal[16] focuses on strains, phenotypes and the rat as a model organism for physiology and disease. The Phenotypes and Models portal has five sections: "Phenotypes", "Strains & Models", "Meet Joe Rat", "PhenoMiner" and "Strain Medical Records".
- Phenotypes: The Phenotypes section contains a large body of data from the PhysGen Program for Genomic Applications project, an NHLBI-funded project to "develop consomic and knockout rat strains, phenotypically characterize these strains, and provide these resources to the scientific community."[17] Data categories include measurements of cardiovascular, renal and respiratory function, blood chemistry, body morphology and behavior. Links are also provided to protocols for phenotyping rats and to similar high-throughput phenotyping data at the National BioResource Project for the Rat in Japan (NBRP-Rat).
- Strains & Models: The Strains and Models section contains general information on rat strains, including information about strain availability and animal husbandry, and links to the RGD strain search and to review articles about rat strains. The section also includes a subsection about disease models that gives detailed information about which rat strains have been used as models for human cardiovascular disease, neurological disease, mammary cancer, diabetes, respiratory diseases, and immune and inflammatory diseases.
- Meet Joe Rat: "Meet Joe Rat" is designed as a general information resource for rat researchers. The Photos and Images pages link to images of PGA/PhysGen parental and consomic strains which in turn link to data for those strains. "Ratday" links to the yearly RGD rat calendar. "Community Submissions" gives information and forms for submitting photos, for registering strains and for submitting quantitative phenotype data for PhenoMiner. The final subsection contains information about strain availability.
- PhenoMiner: PhenoMiner[18] is a database and web application for finding and analyzing quantitative rat phenotype data. Data is annotated to ontologies for rat strain, clinical measurement, measurement method, and experimental condition. Experiments are categorized by the trait or disease assessed by the measurement. The use of standardized vocabularies and data formats allows comparison of values across experiments for the same measurement. The PhenoMiner results page includes a graph of the measurement values and a downloadable table of the values with their accompanying metadata. A link is provided to give users the opportunity to submit their own data to the database.
- Strain Medical Records: RGD's Strain Medical Records (SMR) are designed to consolidate what is known about a particular strain. They present information such as coat color, average body weights for males and females at various time points, and reproduction. Average values for quantitative phenotype measurements such as blood pressure, heart rate and blood chemistry for rats of that strain under standard/control conditions are given, along with the corresponding range of values for other commonly used strains. Each SMR links to the source(s) from which the strain can be obtained, to PhenoMiner for the quantitative phenotype data, and to variant, QTL and microarray expression data.
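Because PhenoMiner records are annotated to shared strain, clinical measurement, measurement method and experimental condition terms, values from different experiments can be pooled by matching those terms. The Strain Medical Records present the same kind of per-strain summary. The Python sketch below assumes each record already carries ontology term labels and a unit. The `Record` type, the `summarize_measurement` function, the toy values and the choice to reject mixed units are illustrative assumptions, not PhenoMiner's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    strain: str        # rat strain (RS) term label
    measurement: str   # clinical measurement (CMO) term label
    method: str        # measurement method (MMO) term label; kept for provenance only
    condition: str     # experimental condition (XCO) term label
    value: float
    unit: str


def summarize_measurement(
    records: Iterable[Record], measurement: str, condition: str
) -> Dict[str, Tuple[float, float, float]]:
    """Return strain -> (mean, min, max) for one measurement under one condition."""
    values_by_strain: Dict[str, List[float]] = defaultdict(list)
    units = set()
    for r in records:
        if r.measurement == measurement and r.condition == condition:
            values_by_strain[r.strain].append(r.value)
            units.add(r.unit)
    if len(units) > 1:
        # Values in different units are not comparable without conversion.
        raise ValueError(f"mixed units for {measurement!r}: {sorted(units)}")
    return {s: (mean(v), min(v), max(v)) for s, v in values_by_strain.items()}


# Toy records from two hypothetical experiments (values are illustrative only).
sbp = "systolic blood pressure"
records = [
    Record("SHR", sbp, "tail cuff", "control condition", 180.0, "mmHg"),
    Record("SHR", sbp, "telemetry", "control condition", 190.0, "mmHg"),
    Record("WKY", sbp, "tail cuff", "control condition", 120.0, "mmHg"),
    Record("SHR", sbp, "tail cuff", "high salt diet", 200.0, "mmHg"),
]
summary = summarize_measurement(records, sbp, "control condition")
```

For the control condition, SHR summarizes to a mean of 185.0 (range 180.0 to 190.0) and WKY to 120.0; the high-salt record is excluded because its condition term differs. Exact label matching only works because the data were standardized to ontology terms first.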
Diseases
As of July 2015, RGD had nine disease portals:[19][20]
- Cancer
- Cardiovascular Disease
- Diabetes
- Immune and Inflammatory Disease
- Neurological Disease
- Obesity and Metabolic Syndrome
- Renal Disease
- Respiratory Disease
- Sensory Organ Disease
Disease portals consolidate the data in RGD for a specific disease category and present it in a single group of pages. Genes, QTLs and strains annotated to any disease in the category are listed, with genome-wide views of their locations in rat, human and mouse (see "Genome Viewer" in Genome tools developed at RGD). Additional sections of the portal display data for phenotypes, biological processes and pathways related to the disease category. Pages are also supplied to give users access to information about rat strains used as models for one or more diseases in the category, tools that could be used to analyze the data and additional resources related to the disease category.
Pathways
RGD's Pathway resources[21][22] include an ontology[23] of pathway terms, encompassing not only metabolic pathways but also disease, drug, regulatory and signaling pathways; interactive diagrams of the components and interactions of selected pathways; "Pathway Suites and Suite Networks", i.e. groupings of related pathways that all contribute to a larger process such as glucose homeostasis or the regulation of gene expression; and Physiological Pathway diagrams, which display networks of organs, tissues, cells and molecular pathways at the whole-animal or systems level.
Knockouts
Until recently, direct, targeted genomic manipulation was not possible in the rat. With the development of zinc finger nuclease (ZFN)- and CRISPR-based mutagenesis, that is no longer the case.[24] Groups producing rat gene knockouts and other types of genetically modified rats include the Human and Molecular Genetics Center at MCW. RGD links to information about the rat strains produced in these studies through pages for the PhysGen Knockout (PhysGenKO) project[25] and the MCW Gene Editing Rat Resource Center (GERRC),[26] accessible from RGD page headers. Both projects were funded by the National Heart, Lung, and Blood Institute (NHLBI), and the stated goal of each was to produce rats with alterations in one or more specific genes related to the NHLBI's mission. Genes were nominated by rat researchers, and the nominations were adjudicated by an External Advisory Board. In the case of the PhysGenKO project, many of the rats produced were phenotyped using a standardized high-throughput phenotyping protocol, and the data are available in RGD's PhenoMiner tool.
Community outreach and education
RGD reaches out to the rat research community in a variety of ways, including an email forum, a news page, a Facebook page, and regular attendance and presentations at scientific meetings and conferences.[27] Additional educational activities include producing tutorial videos, both on how to use RGD tools and data and on more general topics such as biomedical ontologies and biological (i.e. gene, QTL and strain) nomenclature. These videos are hosted on several online video sites, including YouTube.
Funding
RGD is funded by grant HL64541 from the National Heart, Lung, and Blood Institute (NHLBI) on behalf of the NIH, with some additional funding from the National Human Genome Research Institute (NHGRI). As of July 2015, the principal investigator of the grant was Mary E. Shimoyama, PhD, who succeeded Howard J. Jacob, PhD, in early 2015.[28]
Accession numbers and genome assembly
As of July 2015, the most current genomic sequence for rat is available under accession numbers AABR07000001-AABR07073554 in the international sequence databases (GenBank, DDBJ and EMBL). The most current assembly is Rnor_6.0. The assembly level is "chromosome" and the genome representation is "full", including a sequence of the Y chromosome (missing from all previous assemblies).[29]
References
- Shimoyama M, De Pons J, Hayman GT, et al. (2015). "The Rat Genome Database 2015: genomic, phenotypic and environmental variations and disease". Nucleic Acids Res. 43 (Database issue): D743–50. doi:10.1093/nar/gku1026. PMC 4383884. PMID 25355511.
- PGA. "PhysGen Programs for Genomic Applications". Pga.mcw.edu. Retrieved 2015-07-15.
- NBRP-Rat. "The National BioResource Project for the Rat in Japan". anim.med.kyoto-u.ac.jp. Retrieved 2015-07-15.
- Aitman TJ, Critser JK, Cuppen E, et al. (2008). "Progress and prospects in rat genetics: a community view". Nat. Genet. 40 (5): 516–22. doi:10.1038/ng.147. PMID 18443588.
- RGD. "About RGD - Rat Genome Database". Rgd.mcw.edu. Retrieved 2013-02-17.
- "RGD Data - Rat Genome Database". Rgd.mcw.edu. Retrieved 2015-07-08.
- Laulederkind SJ, Shimoyama M, Hayman GT, et al. (2011). "The Rat Genome Database curation tool suite: a set of optimized software tools enabling efficient acquisition, organization, and presentation of biological data". Database (Oxford). 2011: bar002. doi:10.1093/database/bar002. PMC 3041158. PMID 21321022.
- Shimoyama M, Hayman GT, Laulederkind SJ, et al. (2009). "The rat genome database curators: who, what, where, why". PLoS Comput. Biol. 5 (11): e1000582. doi:10.1371/journal.pcbi.1000582. PMC 2775909. PMID 19956751.
- RGD. "About RGD Ontologies - Rat Genome Database". Rgd.mcw.edu. Retrieved 2015-07-08.
- RGD. "Genome Tools - Rat Genome Database". Rgd.mcw.edu. Retrieved 2015-07-08.
- de la Cruz N, Bromberg S, Pasko D, et al. (2005). "The Rat Genome Database (RGD): developments towards a phenome database". Nucleic Acids Res. 33 (Database issue): D485–91. doi:10.1093/nar/gki050. PMC 540004. PMID 15608243.
- Stein LD, Mungall C, Shu S, et al. (2002). "The generic genome browser: a building block for a model organism system database". Genome Res. 12 (10): 1599–610. doi:10.1101/gr.403602. PMC 187535. PMID 12368253.
- Skinner ME, Uzilov AV, Stein LD, Mungall CJ, Holmes IH (2009). "JBrowse: a next-generation genome browser". Genome Res. 19 (9): 1630–8. doi:10.1101/gr.094607.109. PMC 2752129. PMID 19570905.
- Smith RN, Aleksic J, Butano D, et al. (2012). "InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data". Bioinformatics. 28 (23): 3163–5. doi:10.1093/bioinformatics/bts577. PMC 3516146. PMID 23023984.
- Lyne R, Sullivan J, Butano D, et al. (2015). "Cross-organism analysis using InterMine". Genesis. 53 (8): 547–60. doi:10.1002/dvg.22869. PMC 4545681. PMID 26097192.
- RGD. "Phenotypes & Models - Rat Genome Database". Rgd.mcw.edu. Retrieved 2015-07-15.
- RGD. "Phenotype Data - Rat Genome Database". Rgd.mcw.edu. Retrieved 2015-07-14.
- Laulederkind SJ, Liu W, Smith JR, et al. (2013). "PhenoMiner: quantitative phenotype curation at the rat genome database". Database (Oxford). 2013: bat015. doi:10.1093/database/bat015. PMC 3630803. PMID 23603846.
- RGD. "RGD Disease Portals - Rat Genome Database". Rgd.mcw.edu. Retrieved 2015-07-15.
- Twigger SN, Shimoyama M, Bromberg S, Kwitek AE, Jacob HJ (2007). "The Rat Genome Database, update 2007--easing the path from disease to data and back again". Nucleic Acids Res. 35 (Database issue): D658–62. doi:10.1093/nar/gkl988. PMC 1761441. PMID 17151068.
- Petri V, Shimoyama M, Hayman GT, et al. (2011). "The Rat Genome Database pathway portal". Database (Oxford). 2011: bar010. doi:10.1093/database/bar010. PMC 3072770. PMID 21478484.
- Hayman GT, Jayaraman P, Petri V, et al. (2013). "The updated RGD Pathway Portal utilizes increased curation efficiency and provides expanded pathway information". Hum. Genomics. 7: 4. doi:10.1186/1479-7364-7-4. PMC 3598722. PMID 23379628.
- Petri V, Jayaraman P, Tutaj M, et al. (2014). "The pathway ontology - updates and applications". J Biomed Semantics. 5 (1): 7. doi:10.1186/2041-1480-5-7. PMC 3922094. PMID 24499703.
- Flister MJ, Prokop JW, Lazar J, Shimoyama M, Dwinell M, Geurts A (2015). "2015 Guidelines for Establishing Genetically Modified Rat Models for Cardiovascular Research". J Cardiovasc Transl Res. 8 (4): 269–77. doi:10.1007/s12265-015-9626-4. PMC 4475456. PMID 25920443.
- RGD. "PhysGen Knockouts - Rat Genome Database". Rgd.mcw.edu. Retrieved 2015-07-15.
- RGD. "Gene Editing Rat Resource Center". Rgd.mcw.edu. Retrieved 2015-07-21.
- RGD. "Rat Community - Rat Genome Database". Rgd.mcw.edu. Retrieved 2015-07-15.
- NIH. "Project Information for grant HL64541: Rat Genome Database". nih.gov. Retrieved 2015-07-15.
- NIH. "Rnor_6.0 - Assembly - NCBI". nih.gov. Retrieved 2015-07-15.