Assembly Conversion
In addition to Tablet, several other assembly viewers have been created by groups around the world, each with its own advantages and disadvantages. These applications support a wide range of assembly formats from an even wider range of next-generation sequence assemblers, so converting between formats (so that a data set can be inspected with more than one assembly viewer) can be something of a challenge.
The preferred file format for viewing assemblies or mappings in Tablet is SAM/BAM, which has emerged as a de facto standard.
Summary of formats
- MAQ (binary) – MAQ assembly output consists of several binary files, with the two main outputs being the .map assembly file and the .cns consensus file.
- MAQ (text) – An assembly in MAQ (text) format is stored in a tab-delimited text file, usually accompanied by reference data in .fasta or consensus data in .fastq. Supported by Tablet.
- ACE – An ACE file contains all its assembly information in a single text-based file: both the reads and the consensus/contig information. Several assembly tools can produce ACE files, including the Roche 454 “Newbler” gsAssembler, and MIRA. Supported by Tablet.
- AFG – Similar to ACE, AFG is a single text-based file assembly container that holds read and consensus information together. Supported by Tablet.
- BANK – An AMOS bank folder is a special directory of binary encoded files containing all information on an assembly.
- MAF (MIRA) – The MIRA assembly format (MAF) is similar to ACE but includes read quality scores and explicit paired-end information. This can be converted into SAM/BAM for use in Tablet.
- SAM – SAM aims to be a generic format for storing large nucleotide sequence alignments, and many assemblers are converging on it. It is a tab-delimited text format, with optional header information. Supported by Tablet.
- BAM – A BAM file is a highly compressed, binary version of SAM. Supported by Tablet.
- SOAP – The SOAP format is a tab-delimited text assembly, usually accompanied by reference data stored in a .fasta file. Supported by Tablet.
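As a rough aid for telling these formats apart, the sketch below guesses a format from a file name. The helper name guess_assembly_format is hypothetical, and the extension mapping is an assumption based only on the extensions used on this page; MAQ (text) and SOAP files have no distinctive extension here, so they are reported as unknown.

```shell
# Hypothetical helper: guess the assembly format from a file name.
# The mapping is an assumption based on the extensions used on this page;
# real files may be named differently, so treat the result as a hint only.
guess_assembly_format() {
    case "$1" in
        *.map)         echo "MAQ (binary)" ;;
        *.ace)         echo "ACE" ;;
        *.afg)         echo "AFG" ;;
        *.bnk|*.bnk/)  echo "BANK" ;;
        *.maf)         echo "MAF (MIRA)" ;;
        *.sam)         echo "SAM" ;;
        *.bam)         echo "BAM" ;;
        *)             echo "unknown"; return 1 ;;
    esac
}
```

For example, guess_assembly_format assembly.ace prints ACE, which tells you which of the conversion recipes below applies.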
MAQ (binary) to MAQ (text)
Conversion requires Maq.
Convert the .map to .txt (maq formatted):
maq mapview assembly.map > assembly.txt
To generate a consensus file (.fastq formatted) from the .cns file:
maq cns2fq assembly.cns > assembly.fastq
or to generate a reference file (.fasta formatted) from the .cns file:
maq cns2ref assembly.cns > assembly.fasta
MAQ (text) to ACE
Conversion requires Tablet.
The .txt Maq alignment and an accompanying .fastq consensus file can be converted to .ace using the command-line tool maqtoace, which we distribute with Tablet (located in the utils directory). We may design a GUI for this tool in the near future.
Convert the .txt file to .ace:
maqtoace -maqtxt=assembly.txt -fastq=consensus.fastq -dir=. -filename=assembly.ace
For OS X users, maqtoace must be run as follows using the terminal:
cd /Applications/Tablet.app/Contents/Resources/app/lib
java -Xmx1024m -cp tablet.jar tablet.io.utils.MaqToAce <options as above>
ACE to AFG
Conversion requires AMOS.
Convert the .ace file to .afg:
toAmos -ace assembly.ace -o assembly.afg
AFG to BANK
Conversion requires AMOS.
Convert the .afg file to a bank folder:
bank-transact -m assembly.afg -b assembly.bnk -c
This will create a folder named assembly.bnk that will contain a collection of binary files.
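The ACE to AFG and AFG to BANK steps can be chained when the goal is to go from an ACE file to a bank. The sketch below only prints the two commands, deriving the intermediate names from the input file. The helper name ace_to_bank_plan is hypothetical, and it assumes file names without spaces.

```shell
# Hypothetical helper: print the two AMOS commands that take an ACE file
# to a bank folder. Nothing is executed; the commands are only printed.
# Assumes the input name ends in .ace and contains no spaces.
ace_to_bank_plan() {
    ace=$1
    base=${ace%.ace}    # strip the .ace extension
    printf 'toAmos -ace %s -o %s.afg\n' "$ace" "$base"
    printf 'bank-transact -m %s.afg -b %s.bnk -c\n' "$base" "$base"
}
```

After reviewing the output, the commands can be run by piping them to a shell, for example ace_to_bank_plan assembly.ace | sh.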
MIRA to SAM
MIRA can produce a number of output formats including ACE (which Tablet supports), and its own MIRA Assembly Format MAF which includes paired end information explicitly. Converting MAF to SAM/BAM allows you to view paired end reads (and read group information like strains) in Tablet.
Conversion requires maf2sam.py (cross-platform; requires Python and Biopython).
Convert the MAF file to (unsorted) SAM:
maf2sam.py EXAMPLE_out.unpadded.fasta EXAMPLE_out.maf > EXAMPLE_out.sam
Then follow the SAM to BAM instructions below, including sorting and indexing. Experienced Unix/Linux users may find it useful to pipe the maf2sam.py output into samtools view to go directly to (unsorted) BAM.
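The piped variant mentioned above might look like the following sketch. The function name maf_to_bam and the MAF2SAM and SAMTOOLS override variables are hypothetical conveniences, not part of either tool; the trailing - tells samtools view to read SAM from standard input.

```shell
# Sketch: convert MAF straight to (unsorted) BAM with no intermediate SAM file.
# MAF2SAM and SAMTOOLS are hypothetical override variables for the tool paths.
maf_to_bam() {
    fasta=$1    # unpadded consensus FASTA written by MIRA
    maf=$2      # MIRA assembly in MAF format
    bam=$3      # output BAM file (still unsorted)
    "${MAF2SAM:-maf2sam.py}" "$fasta" "$maf" |
        "${SAMTOOLS:-samtools}" view -b -S -o "$bam" -
}
```

A call such as maf_to_bam EXAMPLE_out.unpadded.fasta EXAMPLE_out.maf EXAMPLE_out.bam would then be followed by the sorting and indexing steps described under SAM to BAM.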
MAQ (binary) to SAM
Conversion requires SAMtools (Linux only; no conversion tools are provided with the Windows release).
Depending on the version of Maq used to assemble the file, you will need to use either maq2sam-long (for .map files generated by maq-0.7.x) or maq2sam-short (for .map files generated by maq-0.6.x).
Convert the .map file to .sam:
maq2sam-long assembly.map > assembly.sam
maq2sam-short assembly.map > assembly.sam
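If the Maq version is known, the choice between the two converters can be scripted. This is a minimal sketch that encodes only the rule stated above; the function name pick_maq2sam is hypothetical, and any version outside 0.6.x and 0.7.x is rejected rather than guessed.

```shell
# Hypothetical helper: choose the converter from the Maq version that
# produced the .map file, following the rule described above.
pick_maq2sam() {
    case "$1" in
        0.7.*) echo maq2sam-long ;;
        0.6.*) echo maq2sam-short ;;
        *)     echo "unsupported Maq version: $1" >&2; return 1 ;;
    esac
}
```

It could be used as "$(pick_maq2sam 0.7.1)" assembly.map > assembly.sam.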
SAM to BAM
Conversion requires SAMtools.
If reference data is to be included, it must first be indexed from an input .fasta file:
samtools faidx reference.fasta
This generates a BAM-compatible reference index (reference.fasta.fai).
Next, generate the actual .bam file (-t can be skipped if excluding reference data):
samtools view -b -S -t reference.fasta.fai -o assembly.bam assembly.sam
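To work efficiently, the .bam file must also be sorted. With SAMtools 1.x, name the output file with -o:
samtools sort -o assembly_sorted.bam assembly.bam
Older SAMtools 0.1.x releases take an output prefix instead and append .bam themselves:
samtools sort assembly.bam assembly_sorted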
The final step is to index the .bam file:
samtools index assembly_sorted.bam
This generates an index named assembly_sorted.bam.bai.
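The whole SAM to BAM sequence can be wrapped in one function, as in the sketch below. It assumes SAMtools 1.x (where sort takes -o), and the function name sam_to_sorted_bam and the SAMTOOLS override variable are hypothetical. The reference argument is optional, mirroring the note that -t can be skipped.

```shell
# Sketch: SAM -> BAM -> sorted BAM -> index, assuming SAMtools 1.x syntax.
# Usage: sam_to_sorted_bam assembly.sam [reference.fasta]
# SAMTOOLS is a hypothetical override variable for the samtools path.
sam_to_sorted_bam() {
    sam=$1
    ref=${2:-}                  # optional reference FASTA
    base=${sam%.sam}            # strip the .sam extension
    st=${SAMTOOLS:-samtools}
    if [ -n "$ref" ]; then
        # Index the reference, then pass its .fai file to view with -t.
        "$st" faidx "$ref" &&
        "$st" view -b -S -t "$ref.fai" -o "$base.bam" "$sam"
    else
        "$st" view -b -S -o "$base.bam" "$sam"
    fi &&
    "$st" sort -o "${base}_sorted.bam" "$base.bam" &&
    "$st" index "${base}_sorted.bam"
}
```

Each step runs only if the previous one succeeded, so a failed conversion does not leave a misleading index behind.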
The final collection of files should contain:
- reference.fasta
- reference.fasta.fai
- assembly_sorted.bam
- assembly_sorted.bam.bai
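Before opening the assembly in Tablet, it can help to confirm that all four files are present. The following is a minimal sketch; the function name check_bam_outputs is hypothetical, and it assumes the index files sit next to the files they index with the .fai and .bai suffixes shown above.

```shell
# Hypothetical helper: report any of the four expected files that is missing.
# Usage: check_bam_outputs reference.fasta assembly_sorted.bam
check_bam_outputs() {
    ref=$1
    bam=$2
    missing=0
    for f in "$ref" "$ref.fai" "$bam" "$bam.bai"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"    # 0 when everything is present, 1 otherwise
}
```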
Many of the programs and scripts listed here also have options and command-line flags beyond what we have covered. Consult the actual tools’ documentation for further details. If you know of additional programs or formats that should be included, then please let us know.