Groups at the University of Dundee/James Hutton Institute, the Universities of Glasgow and Vienna, and other colleagues have released AtRTD2, a new transcriptome for Arabidopsis (Zhang et al., 2016; bioRxiv, 6 May 2016, doi: http://dx.doi.org/10.1101/051938). The objective of making the Reference Transcript Dataset (AtRTD2) was to exploit the accuracy of transcript quantification programmes such as Salmon and Kallisto when analysing Arabidopsis RNA-seq data for gene expression and, in particular, alternative splicing (AS).
The Arabidopsis thaliana Reference Transcript Dataset 2 (AtRTD2) is a comprehensive, non-redundant, high-quality transcript reference dataset developed for RNA-seq analysis. It was generated by integrating transcript assemblies of ca. 8.5 billion pairs of reads from 285 RNA-seq data sets obtained from 129 RNA-seq libraries, and merging them with the previous version, AtRTD (which covers TAIR10 transcripts and transcripts from Marquez et al., 2012), and with Araport11 transcript assemblies. By comparison with experimental data from high resolution RT-PCR, we observed inaccurate quantification of transcript isoforms for genes whose transcripts vary in the lengths of their UTRs. This variation is not effectively corrected in RNA-seq analysis programmes. To address this, we have carried out genome-wide modifications of AtRTD2 by padding genomic sequence from the ends of the shorter transcripts up to the co-ordinates of the end of the transcript(s) that covered the biggest region of the gene. This significantly improved transcript quantification and the correlation of alternative splicing analysis with experimental data. As a result, we release AtRTD2-QUASI specifically for use in Quantification of Alternatively Spliced Isoforms with Salmon or Kallisto and demonstrate that it outperforms other available transcriptomes for RNA-seq analysis.
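The padding step described above can be illustrated with a minimal Python sketch. It is not the procedure used to build AtRTD2-QUASI, only one plausible reading of the description: it works on exon coordinates of a single gene rather than on sequence, it assumes 1-based inclusive coordinates, and the function name `pad_gene_transcripts` and the transcript IDs `GENE.1` and `GENE.2` are hypothetical.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), assumed 1-based inclusive genomic coordinates


def pad_gene_transcripts(transcripts: Dict[str, List[Exon]]) -> Dict[str, List[Exon]]:
    """Extend the terminal exons of shorter transcripts of one gene.

    The target ends are taken from the transcript that covers the largest
    genomic region. Transcripts are only ever extended, never trimmed.
    """
    # Sort exons by position so the first and last exons are the terminal ones.
    sorted_tx = {tid: sorted(exons) for tid, exons in transcripts.items()}

    # Genomic span (start, end) of the transcript covering the biggest region.
    target_start, target_end = max(
        ((exons[0][0], exons[-1][1]) for exons in sorted_tx.values()),
        key=lambda span: span[1] - span[0],
    )

    padded: Dict[str, List[Exon]] = {}
    for tid, exons in sorted_tx.items():
        exons = list(exons)
        first_start, first_end = exons[0]
        exons[0] = (min(first_start, target_start), first_end)
        # Re-read the last exon so single-exon transcripts keep the new start.
        last_start, last_end = exons[-1]
        exons[-1] = (last_start, max(last_end, target_end))
        padded[tid] = exons
    return padded


# Hypothetical gene with one long and one short transcript.
example = {
    "GENE.1": [(100, 300), (500, 900)],
    "GENE.2": [(200, 300), (500, 700)],
}
padded = pad_gene_transcripts(example)
```

In this sketch the internal exon boundaries, and therefore the splice junctions, are left unchanged; only the outer ends of the first and last exons move. A real implementation would also need to handle strand and extract the corresponding genomic sequence.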
AtRTD2_19April2016.fa contains all the sequences for the actual transcripts and AtRTD2_19April2016.gtf is the corresponding transcript information in gene annotation format. AtRTDv2_QUASI_19April2016.fa contains all the sequences for the padded transcripts and AtRTDv2_QUASI_19April2016.gtf is the corresponding padded transcript information in gene annotation format.
WARNING! AtRTD2-QUASI will generate accurate quantification of AS isoforms for the majority of genes. AtRTD2-QUASI is ONLY intended for use in Quantification of Alternatively Spliced Isoforms. It should be used with caution to investigate alternative splice junctions in the padded regions in the 3' and 5' UTRs (e.g. if a shorter transcript terminates in an intron in the UTR) and it may not be appropriate for genes with bona fide alternative transcription start sites or major alternative poly A sites. In these cases, both AtRTD2 and AtRTD2-QUASI could be run and validated against experimental data.
Download AtRTD2_19April2016.fa
We have used different identifiers to show the source of specific transcripts in the transcriptomes.
Transcript source | Identifier | Example |
---|---|---|
TAIR10 (AGI code) | .n | AT5G37850.3 |
Marquez et al. (2012) | _IDnn | AT5G17660_ID4 |
Dataset 1 - Cufflinks | _JCn | AT3G48050_JC20 |
Dataset 1 - StringTie | _JSn | AT5G45190_JS1 |
Dataset 2 - Cufflinks | _cn | At2G04430_c1 |
Dataset 2 - StringTie | _sn | AT5G37850_s3 |
Araport11 | _Pn | AT5G14610_P6 |
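As an illustration of how the identifier suffixes in the table can be used, the following minimal Python sketch maps a transcript ID to its source. The function name `transcript_source` is hypothetical, and the sketch assumes that the suffix is always the last part of the ID, that it is case-sensitive, and that `n` stands for one or more digits.

```python
import re
from typing import Optional

# Suffix prefix -> transcript source, taken from the table above.
SOURCES = {
    ".": "TAIR10 (AGI code)",
    "_ID": "Marquez et al. (2012)",
    "_JC": "Dataset 1 - Cufflinks",
    "_JS": "Dataset 1 - StringTie",
    "_c": "Dataset 2 - Cufflinks",
    "_s": "Dataset 2 - StringTie",
    "_P": "Araport11",
}

# Match one of the known suffix prefixes followed by digits at the end of the ID.
SUFFIX_RE = re.compile(r"(\.|_ID|_JC|_JS|_c|_s|_P)\d+$")


def transcript_source(transcript_id: str) -> Optional[str]:
    """Return the source of a transcript ID, or None if the suffix is unknown."""
    match = SUFFIX_RE.search(transcript_id)
    return SOURCES[match.group(1)] if match else None


# Example IDs copied from the table.
for tid in ["AT5G37850.3", "AT5G17660_ID4", "AT3G48050_JC20", "AT5G14610_P6"]:
    print(tid, "->", transcript_source(tid))
```

Such a lookup could be applied to the transcript IDs in the FASTA headers or the GTF file, for example to count how many transcripts each source contributes.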
Zhang, R., Calixto, C.P.G., Marquez, Y., Venhuizen, P., Tzioutziou, N.A., Guo, W., Spensley, M., Frei dit Frey, N., Hirt, H., James, A.B., Nimmo, H.G., Barta, A., Kalyna, M. and Brown, J.W.S. (2016) AtRTD2: A Reference Transcript Dataset for accurate quantification of alternative splicing and expression changes in Arabidopsis thaliana RNA-seq data. bioRxiv doi: http://dx.doi.org/10.1101/051938
We released the first AtRTD (now referred to as AtRTD1) in 2015. It is still available for download under the following citation:
Zhang, R., Calixto, C.P.G., Tzioutziou, N.A., James, A.B., Simpson, C.G., Guo, W., Marquez, Y., Kalyna, M., Patro, R., Eyras, E., Barta, A., Nimmo, H.G. and Brown, J.W.S. (2015) AtRTD - A comprehensive Reference Transcript Dataset resource for accurate quantification of transcript-specific expression in Arabidopsis thaliana. New Phytologist 208, 96-101.
Download atRTD.fasta