PGSB Repeat Element Database (PGSB-REdat) and Catalog (PGSB-REcat)


Mobile elements, the major group of repetitive elements, were first discovered in maize in the 1950s by Barbara McClintock. Plant genomes are crowded with taxon-specific mobile elements and their deteriorated remnants, which make up between 20 and over 90 percent of the genome. Most of these are LTR-retrotransposon insertions, which lead to complex and highly repetitive structures. Because of their harmful effects on genome integrity, transposons are usually kept at bay within heterochromatic regions and transcriptionally silenced by epigenetic processes. At the same time, they can play beneficial roles in evolutionary processes by creating the genetic diversity needed for selection. The interplay between proliferation and removal of transposable elements greatly influences genome size and chromosomal architecture. Their prominent differential accumulation, even within closely related species, poses intriguing questions about host control, transposon countermeasures and the factors that disturb and restore the balance (references).

The umbrella term "mobile element" covers a large and heterogeneous group of genetic elements, which are often highly degenerated and inserted within each other, leading to complex, fragmented and nested structures. Their detection and exact annotation are not trivial, especially for large plant genomes. Usually, genomic sequences are repeat-masked prior to gene detection to minimize unwanted transposon-related gene calls. Our aim goes beyond masking alone: we try to provide cross-species-consistent annotations of transposons and other repetitive elements in plant genomes, suited for evolutionary analyses and comparative genomics. For that purpose we have developed an in-house pipeline termed ANGELA (Automated Nested Genetic Element Annotation). Repeat identification, element defragmentation and data extraction are based on two main components: a continuously updated database of repeat elements (PGSB-REdat) and a detailed repeat classification schema (PGSB-REcat).

The PGSB Repeat Element Database (PGSB-REdat)

PGSB-REdat started as a compilation of publicly available plant repeat sequences from TREP, TIGR repeats, PlantSat and GenBank. It was rapidly expanded with a large set of de novo detected LTR-retrotransposon sequences (~37,000, detected by LTR_STRUC) from the genomes presented in PGSB PlantsDB. PGSB-REdat provides not only sequences but also, where available, additional information on source (institution, GenBank ID), literature (PubMed ID), organism (NCBI tax ID) and sequence completeness. Most importantly, it provides classification keys that link the repeat sequences to PGSB-REcat, a generic and detailed ontology for repeat elements.

The current public version, PGSB-REdat_v9.3p, has a size of ~450 Mb and contains ~62,000 sequences. To reduce redundancy, the sequences were clustered at >=95% identity over >=95% length coverage, taking the longest element of each cluster as its representative. The public release does not contain as yet unpublished data or sequences from Repbase. The repeat database can be browsed on this website, and customized subsets can be downloaded with user-defined taxon and/or repeat type restrictions. A bulk download is available via FTP.
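The redundancy reduction described above can be illustrated with a minimal sketch. It is not the procedure actually used for PGSB-REdat, which is not specified here; it is a generic greedy scheme in which the longest unassigned sequence seeds each cluster. The input format (a list of pairwise hits with percent identity and aligned length), the function name and the choice to measure coverage against the shorter sequence of each pair are all assumptions made for illustration.

```python
from collections import defaultdict

MIN_IDENTITY = 95.0  # percent identity threshold taken from the text
MIN_COVERAGE = 0.95  # aligned fraction of the shorter sequence (assumed definition)


def cluster_longest_representative(lengths, hits):
    """Greedy clustering: the longest unassigned sequence seeds each cluster.

    lengths: dict mapping sequence ID -> sequence length in bp
    hits: iterable of (id_a, id_b, percent_identity, aligned_length) tuples,
          a hypothetical summary of pairwise alignments
    Returns a dict mapping representative ID -> list of member IDs.
    """
    # Keep only pairs that pass both thresholds, stored symmetrically.
    linked = defaultdict(set)
    for a, b, identity, aligned in hits:
        shorter = min(lengths[a], lengths[b])
        if identity >= MIN_IDENTITY and aligned / shorter >= MIN_COVERAGE:
            linked[a].add(b)
            linked[b].add(a)

    clusters = {}
    assigned = set()
    # Visit sequences from longest to shortest; ties are broken by ID.
    for seq_id in sorted(lengths, key=lambda s: (-lengths[s], s)):
        if seq_id in assigned:
            continue
        members = [seq_id] + sorted(m for m in linked[seq_id] if m not in assigned)
        assigned.update(members)
        clusters[seq_id] = members
    return clusters


# Made-up example data, not taken from PGSB-REdat.
lengths = {"re1": 10000, "re2": 9800, "re3": 5000, "re4": 4900}
hits = [
    ("re1", "re2", 97.2, 9700),  # passes both thresholds
    ("re3", "re4", 96.0, 4800),  # passes both thresholds
    ("re1", "re3", 99.0, 3000),  # fails the coverage threshold
]
clusters = cluster_longest_representative(lengths, hits)
print(clusters)
```

In this toy input, re1 and re3 end up as the representatives of two clusters because they are the longest members of their groups. A production setup would more likely rely on a dedicated clustering tool than on an all-against-all hit list.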

PGSB Repeat Element Catalog (PGSB-REcat)

PGSB-REcat integrates existing classifications for repetitive elements (3-letter code, Repbase, RepeatMasker, TIGR and tandem repeats VNTRs) into a more detailed hierarchical tree structure. The machine-readable key facilitates data extraction from annotated sequences at different levels of detail. The repetitive elements are divided into three main groups:

  • Simple Sequence Repeats (e.g. micro-, minisatellites and satellites)
  • Mobile Elements (Retroelements, DNA transposons and Helitrons)
  • High Copy Number Genes (e.g. RNA genes, histones)

A further 'Additional Attributes' category enables the assignment of general features, such as replication type or chromosome location, and the annotation of sequence attributes, such as partial/complete sequence or nested level. A repeat sequence can be annotated with only one key from the three main groups, but with several from the 'Additional Attributes' category. A "universal" classification scheme for mobile elements remains a matter of controversial debate (references). PGSB-REcat should therefore be regarded as a usable working draft, without any claim to be exhaustive or correct in all details. It may still be subject to changes and improvements.
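The annotation rule and the level-wise data extraction described above can be sketched as follows. The real PGSB-REcat key format is not shown here, so the sketch represents each key as a hypothetical tuple of node names running from the tree root downwards. All node names, function names and sequence IDs are illustrative assumptions, and reading "only one key" as "exactly one" is likewise an assumption.

```python
# Hypothetical root node names; the real PGSB-REcat keys may look different.
MAIN_GROUPS = {"simple_sequence_repeat", "mobile_element", "high_copy_number_gene"}


def is_valid_annotation(keys):
    """True if exactly one key belongs to a main group (assumed reading of the rule).

    Keys rooted elsewhere, e.g. under 'additional_attribute', may occur any number of times.
    """
    main_keys = [k for k in keys if k[0] in MAIN_GROUPS]
    return len(main_keys) == 1


def is_under(key, prefix):
    """True if the key lies at or below the tree node given by prefix."""
    return key[:len(prefix)] == prefix


def select(annotations, prefix):
    """Return sorted IDs of validly annotated sequences with a key under prefix."""
    return sorted(
        seq_id
        for seq_id, keys in annotations.items()
        if is_valid_annotation(keys) and any(is_under(k, prefix) for k in keys)
    )


# Made-up annotations for illustration only.
annotations = {
    "seqA": [("mobile_element", "retroelement", "ltr_retrotransposon"),
             ("additional_attribute", "complete")],
    "seqB": [("mobile_element", "dna_transposon")],
    "seqC": [("simple_sequence_repeat", "satellite"),
             ("additional_attribute", "partial"),
             ("additional_attribute", "nested")],
    # Two main-group keys: violates the rule and is skipped by select().
    "seqD": [("mobile_element", "helitron"), ("high_copy_number_gene", "rna_gene")],
}

all_mobile = select(annotations, ("mobile_element",))
retro_only = select(annotations, ("mobile_element", "retroelement"))
print(all_mobile, retro_only)
```

Selecting by a shorter prefix returns a coarser group and a longer prefix a finer one, which is the practical benefit of a hierarchical key.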
For questions or improvement suggestions please contact Heidrun Gundlach.


