Arabidopsis thaliana Reference Transcript Dataset 2 (AtRTD2)

Background and purpose

Groups at the University of Dundee/James Hutton Institute, the Universities of Glasgow and Vienna, and other colleagues have released AtRTD2, a new transcriptome for Arabidopsis (Zhang et al., 2016; bioRxiv, 6 May 2016, doi: http://dx.doi.org/10.1101/051938). AtRTD2 was built to exploit the accurate transcript quantification offered by programmes such as Salmon and Kallisto when analysing Arabidopsis RNA-seq data for gene expression and, in particular, alternative splicing (AS).

AtRTD2 and AtRTD2-QUASI transcriptomes

The Arabidopsis thaliana Reference Transcript Dataset 2 (AtRTD2) is a comprehensive, non-redundant, high-quality transcript reference dataset developed for RNA-seq analysis. It was generated by assembling transcripts from ca. 8.5 billion read pairs (285 RNA-seq data sets obtained from 129 RNA-seq libraries) and merging these assemblies with the previous version, AtRTD (which covers TAIR10 transcripts and transcripts from Marquez et al., 2012), and with Araport11 transcript assemblies.

Comparison with experimental data from high-resolution RT-PCR showed that transcript isoforms were quantified inaccurately for genes whose transcripts vary in the lengths of their UTRs. RNA-seq analysis programmes do not correct effectively for this variation. To address this, we modified AtRTD2 genome-wide by padding the ends of the shorter transcripts with genomic sequence, up to the end co-ordinates of the transcript(s) covering the largest region of the gene. This significantly improved transcript quantification and the correlation of alternative splicing analysis with experimental data. As a result, we release AtRTD2-QUASI specifically for use in Quantification of Alternatively Spliced Isoforms with Salmon or Kallisto, and demonstrate that it outperforms other available transcriptomes for RNA-seq analysis.
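The padding idea can be illustrated with a minimal Python sketch. This is not the pipeline used to build AtRTD2-QUASI: it works on exon co-ordinates only (the real dataset pads with genomic sequence), the function name pad_transcripts and the example identifiers are hypothetical, and it assumes that transcripts are only ever extended, never trimmed.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive, as in GTF


def pad_transcripts(transcripts: Dict[str, List[Exon]]) -> Dict[str, List[Exon]]:
    """Extend the outer ends of each transcript of ONE gene to the ends of
    the transcript that covers the largest genomic region (illustrative only)."""

    def span(exons: List[Exon]) -> int:
        return max(e for _, e in exons) - min(s for s, _ in exons) + 1

    # Reference transcript: the one spanning the largest region of the gene.
    ref = max(transcripts.values(), key=span)
    ref_start = min(s for s, _ in ref)
    ref_end = max(e for _, e in ref)

    padded = {}
    for tid, exons in transcripts.items():
        exons = sorted(exons)
        # Pad the first exon on its outer side; never shorten a transcript.
        first_start, first_end = exons[0]
        exons[0] = (min(first_start, ref_start), first_end)
        # Pad the last exon on its outer side in the same way.
        last_start, last_end = exons[-1]
        exons[-1] = (last_start, max(last_end, ref_end))
        padded[tid] = exons
    return padded


# Hypothetical gene with a long and a short isoform.
gene = {
    "G1.1": [(100, 200), (300, 500)],
    "G1_s1": [(150, 220), (300, 420)],
}
print(pad_transcripts(gene))
```

Internal splice junctions are left untouched in this sketch; only the outermost start and end of each shorter transcript move, which is why the padded regions need the caution described in the warning further down.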

Available transcriptomes:

AtRTD2_19April2016.fa contains the sequences of the actual transcripts, and AtRTD2_19April2016.gtf holds the corresponding transcript annotation in GTF (Gene Transfer Format). AtRTDv2_QUASI_19April2016.fa contains the sequences of the padded transcripts, and AtRTDv2_QUASI_19April2016.gtf holds the corresponding padded transcript annotation in GTF.

WARNING! AtRTD2-QUASI will generate accurate quantification of AS isoforms for the majority of genes, but it is intended ONLY for use in Quantification of Alternatively Spliced Isoforms. It should be used with caution when investigating alternative splice junctions in the padded regions of the 3' and 5' UTRs (e.g. if a shorter transcript terminates in an intron in the UTR), and it may not be appropriate for genes with bona fide alternative transcription start sites or major alternative poly A sites. In these cases, both AtRTD2 and AtRTD2-QUASI could be run and the results validated against experimental data.

Download AtRTD2_19April2016.fa
Download AtRTD2_19April2016.gtf
Download AtRTDv2_QUASI_19April2016.fa
Download AtRTDv2_QUASI_19April2016.gtf
Download readme.txt
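As a rough illustration of how a downloaded transcriptome might be passed to Salmon, the Python sketch below assembles the two usual command lines (index, then quant). The helper salmon_commands, the read file names and the directory names are hypothetical, and the options shown (-t, -i, -l A, -1, -2, -o) should be checked against the documentation of the Salmon version installed. Nothing is executed here; the commands are only printed.

```python
import shlex
from typing import List


def salmon_commands(transcripts_fa: str, index_dir: str,
                    reads_1: str, reads_2: str, out_dir: str) -> List[List[str]]:
    """Build 'salmon index' and 'salmon quant' argument lists for paired-end reads."""
    index_cmd = ["salmon", "index", "-t", transcripts_fa, "-i", index_dir]
    quant_cmd = [
        "salmon", "quant",
        "-i", index_dir,
        "-l", "A",          # let Salmon infer the library type
        "-1", reads_1,
        "-2", reads_2,
        "-o", out_dir,
    ]
    return [index_cmd, quant_cmd]


# Hypothetical read files and output locations.
for cmd in salmon_commands(
    "AtRTDv2_QUASI_19April2016.fa",
    "atrtd2_quasi_index",
    "sample_1.fastq.gz",
    "sample_2.fastq.gz",
    "sample_quant",
):
    print(shlex.join(cmd))
```

Each list can be handed to subprocess.run(cmd, check=True) once Salmon is on the PATH. Swapping in AtRTD2_19April2016.fa gives the unpadded run suggested in the warning above for genes where padding may not be appropriate.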

Nomenclature in the GTF file

We have used different identifiers to show the source of specific transcripts in the transcriptomes.

Transcript source          Identifier    Example
TAIR10 (AGI code)          .n            AT5G37850.3
Marquez et al (2012)       _IDnn         AT5G17660_ID4
Dataset 1 - Cufflinks      _JCn          AT3G48050_JC20
Dataset 1 - StringTie      _JSn          AT5G45190_JS1
Dataset 2 - Cufflinks      _cn           At2G04430_c1
Dataset 2 - StringTie      _sn           AT5G37850_s3
Araport11                  _Pn           AT5G14610_P6
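The identifier scheme above lends itself to a small lookup. The following Python sketch is one possible way to recover the source of a transcript from its ID; the function name classify_transcript and the regular expression are assumptions made for illustration and have only been checked against the example IDs listed above, not against the full GTF files.

```python
import re
from typing import Optional, Tuple

# Suffix tag -> transcript source, following the nomenclature listed above.
SOURCES = {
    ".": "TAIR10",
    "_ID": "Marquez et al. (2012)",
    "_JC": "Dataset 1 - Cufflinks",
    "_JS": "Dataset 1 - StringTie",
    "_c": "Dataset 2 - Cufflinks",
    "_s": "Dataset 2 - StringTie",
    "_P": "Araport11",
}

# Assumed layout: gene ID, then one of the tags, then a number (case-sensitive).
ID_PATTERN = re.compile(
    r"^(?P<gene>[^._]+)(?P<tag>\.|_ID|_JC|_JS|_c|_s|_P)(?P<n>\d+)$"
)


def classify_transcript(transcript_id: str) -> Optional[Tuple[str, str, int]]:
    """Return (gene ID, source, isoform number), or None if the ID does not fit."""
    match = ID_PATTERN.match(transcript_id)
    if match is None:
        return None
    return match["gene"], SOURCES[match["tag"]], int(match["n"])


for tid in ["AT5G37850.3", "AT3G48050_JC20", "At2G04430_c1", "AT5G14610_P6"]:
    print(tid, classify_transcript(tid))
```

Matching is deliberately case-sensitive so that _JS (Dataset 1) and _s (Dataset 2) stay distinct.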

Citation

If you use the AtRTD2 or AtRTD2-QUASI datasets in your research and analyses, we would appreciate it if you included the following citation in your publication.

Zhang, R., Calixto, C.P.G., Marquez, Y., Venhuizen, P., Tzioutziou, N.A., Guo, W., Spensley, M., Frei dit Frey, N., Hirt, H., James, A.B., Nimmo, H.G., Barta, A., Kalyna, M. and Brown, J.W.S. (2016) AtRTD2: A Reference Transcript Dataset for accurate quantification of alternative splicing and expression changes in Arabidopsis thaliana RNA-seq data. bioRxiv doi: http://dx.doi.org/10.1101/051938

Previous AtRTD

We released the first AtRTD (now referred to as AtRTD1) in 2015. It is still available for download; if you use it, please cite:

Zhang, R., Calixto, C.P.G., Tzioutziou, N.A., James, A.B., Simpson, C.G., Guo, W., Marquez, Y., Kalyna, M., Patro, R., Eyras, E., Barta, A., Nimmo, H.G. and Brown, J.W.S. (2015) AtRTD - A comprehensive Reference Transcript Dataset resource for accurate quantification of transcript-specific expression in Arabidopsis thaliana. New Phytologist 208, 96-101.

Download atRTD.fasta
Download atRTD.gtf