Information & Computational Sciences

Assembly Conversion

In addition to Tablet, several other assembly viewers have been created by groups around the world, each with its own advantages and disadvantages. These applications support a wide range of assembly formats from an even wider range of next-generation sequence assemblers, and converting between formats – so that a data set can be inspected with more than one assembly viewer – can be something of a challenge.

The preferred file format for viewing assemblies or mappings in Tablet is SAM/BAM, which has emerged as a de facto standard.

Summary of formats

  • MAQ (binary) – MAQ assembly output consists of several binary files, the two main ones being the .map assembly file and the .cns consensus file.
  • MAQ (text) – An assembly in MAQ (text) format is stored in a tab-delimited text file, usually accompanied by reference data in .fasta or consensus data in .fastq. Supported by Tablet.
  • ACE – An ACE file contains all its assembly information in a single text-based file: both the reads and the consensus/contig information. Several assembly tools can produce ACE files, including the Roche 454 “Newbler” gsAssembler, and MIRA. Supported by Tablet.
  • AFG – Similar to ACE, AFG is a single text-based file assembly container that holds read and consensus information together. Supported by Tablet.
  • BANK – An AMOS bank folder is a special directory of binary encoded files containing all information on an assembly.
  • MAF (MIRA) – The MIRA assembly format (MAF) is similar to ACE but includes read quality scores and explicit paired-end information. It can be converted into SAM/BAM for use in Tablet.
  • SAM – SAM aims to be a generic format for storing large nucleotide sequence alignments, which many assemblers are converging towards using. It is a tab-delimited text format, with optional header information. Supported by Tablet.
  • BAM – A BAM file is a highly compressed, binary version of SAM. Supported by Tablet.
  • SOAP – The SOAP format is a tab-delimited text assembly, usually accompanied by reference data stored in a .fasta file. Supported by Tablet.
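
As a rough aid to reading the list above, the following sketch guesses a format from a file name and reports whether this page describes it as loadable in Tablet directly. The function name tablet_format_hint is hypothetical, and the extension mapping is an assumption based on the example file names used on this page. MAQ (text) and SOAP files have no distinctive extension, so they fall through to the default case.

```shell
# Hypothetical helper: guess an assembly format from a file name.
# The extension mapping is an assumption based on the example file names
# on this page, not something Tablet itself defines.
tablet_format_hint() {
    case "$1" in
        *.ace) echo "ACE: supported by Tablet" ;;
        *.afg) echo "AFG: supported by Tablet" ;;
        *.sam) echo "SAM: supported by Tablet" ;;
        *.bam) echo "BAM: supported by Tablet" ;;
        *.map) echo "MAQ (binary): convert to MAQ (text) or SAM first" ;;
        *.maf) echo "MAF (MIRA): convert to SAM/BAM first" ;;
        *.bnk) echo "BANK: AMOS bank folder" ;;
        *)     echo "unknown: inspect the file contents" ;;
    esac
}
```

For example, tablet_format_hint assembly.ace prints "ACE: supported by Tablet".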

MAQ (binary) to MAQ (text)

Conversion requires Maq.

Convert the .map to .txt (maq formatted):

    maq mapview assembly.map > assembly.txt

To generate a consensus file (.fastq formatted) from the .cns file:

    maq cns2fq assembly.cns > assembly.fastq

or to generate a reference file (.fasta formatted) from the .cns file:

    maq cns2ref assembly.cns > assembly.fasta
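
When a directory holds several assemblies, the mapview step can be looped over every .map file. This is only a sketch: the function name maq_maps_to_text is hypothetical, maq is assumed to be on the PATH, and naming each output by swapping .map for .txt is a convention chosen here rather than anything Maq requires.

```shell
# Hypothetical helper: convert every .map file in a directory to MAQ text.
# Assumes `maq` is on the PATH; output names are derived by swapping the
# extension, which is a convention chosen for this sketch.
maq_maps_to_text() {
    dir=${1:-.}
    for map in "$dir"/*.map; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$map" ] || continue
        maq mapview "$map" > "${map%.map}.txt" || return 1
    done
}
```

Calling maq_maps_to_text with no argument works on the current directory.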

MAQ (text) to ACE

Conversion requires Tablet.

The .txt Maq alignment and an accompanying .fastq consensus file can be converted to .ace using maqtoace, a command-line tool that we distribute with Tablet (located in the utils directory). We may design a GUI for this tool in the near future.

Convert the .txt file to .ace:

    maqtoace -maqtxt=assembly.txt -fastq=consensus.fastq -dir=. -filename=assembly.ace

For OS X users, maqtoace must be run as follows using the terminal:

    cd /Applications/Tablet.app/Contents/Resources/app/lib
    java -Xmx1024m -cp tablet.jar tablet.io.utils.MaqToAce <options as above>
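
The two ways of starting maqtoace can be wrapped in one function so that the same call works on either platform. This is a sketch under stated assumptions: run_maqtoace and TABLET_LIB are hypothetical names, the OS X path is the one given above, and on other platforms maqtoace (from Tablet's utils directory) is assumed to be on the PATH.

```shell
# Hypothetical wrapper that picks the right way to start maqtoace.
# TABLET_LIB defaults to the OS X path given above; on other platforms
# the sketch assumes `maqtoace` is on the PATH.
TABLET_LIB=${TABLET_LIB:-/Applications/Tablet.app/Contents/Resources/app/lib}

run_maqtoace() {
    if [ "$(uname -s)" = "Darwin" ]; then
        # Subshell so the caller's working directory is left unchanged.
        ( cd "$TABLET_LIB" &&
          java -Xmx1024m -cp tablet.jar tablet.io.utils.MaqToAce "$@" )
    else
        maqtoace "$@"
    fi
}
```

Because the OS X branch changes directory before starting Java, relative paths in the options (including -dir=.) would resolve against the Tablet lib directory, so absolute paths are the safer choice there.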

ACE to AFG

Conversion requires AMOS.

Convert the .ace file to .afg:

    toAmos -ace assembly.ace -o assembly.afg

AFG to BANK

Conversion requires AMOS.

Convert the .afg file to a bank folder:

    bank-transact -m assembly.afg -b assembly.bnk -c

This will create a folder named assembly.bnk that will contain a collection of binary files.
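
The ACE to AFG and AFG to BANK steps can be chained so that the bank is only built if the first conversion succeeds. A minimal sketch, assuming the AMOS tools toAmos and bank-transact are on the PATH; the function name ace_to_bank and the choice to derive the .afg and .bnk names from the .ace name are hypothetical.

```shell
# Hypothetical helper: ACE -> AFG -> AMOS bank in one call.
# Assumes the AMOS tools `toAmos` and `bank-transact` are on the PATH.
ace_to_bank() {
    ace=$1
    base=${ace%.ace}
    # Only build the bank if the AFG conversion succeeded.
    toAmos -ace "$ace" -o "$base.afg" &&
        bank-transact -m "$base.afg" -b "$base.bnk" -c
}
```

Running ace_to_bank assembly.ace issues the same two commands shown above, in order.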

MIRA to SAM

MIRA can produce a number of output formats, including ACE (which Tablet supports) and its own MIRA Assembly Format (MAF), which includes paired-end information explicitly. Converting MAF to SAM/BAM allows you to view paired-end reads (and read group information such as strains) in Tablet.

Conversion requires maf2sam.py (cross-platform; requires Python and Biopython).

Convert the MAF file to (unsorted) SAM:

    maf2sam.py EXAMPLE_out.unpadded.fasta EXAMPLE_out.maf > EXAMPLE_out.sam

Then follow the SAM to BAM instructions below, including sorting and indexing. Experienced Unix/Linux users may find it useful to pipe the maf2sam.py output into samtools view to go directly to (unsorted) BAM.
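
The piping approach mentioned above might look like the following sketch. It assumes maf2sam.py and samtools are both on the PATH; the function name maf_to_unsorted_bam is hypothetical. The result is still unsorted, so the sorting and indexing steps are needed afterwards.

```shell
# Hypothetical helper: MAF -> unsorted BAM without an intermediate SAM file.
# Assumes maf2sam.py and samtools are on the PATH. The trailing "-" tells
# samtools view to read SAM from standard input.
maf_to_unsorted_bam() {
    fasta=$1
    maf=$2
    bam=$3
    maf2sam.py "$fasta" "$maf" | samtools view -b -S -o "$bam" -
}
```

For example: maf_to_unsorted_bam EXAMPLE_out.unpadded.fasta EXAMPLE_out.maf EXAMPLE_out.bam. Note that in a plain sh pipeline the exit status reflects only samtools, so a failure inside maf2sam.py may go unnoticed unless its error output is checked.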

MAQ (binary) to SAM

Conversion requires SAMtools (Linux only; no conversion tools are provided with the Windows release).

Which converter to use depends on the version of Maq that produced the .map file: maq2sam-long for files generated by maq-0.7.x, or maq2sam-short for files generated by maq-0.6.x.

Convert the .map file to .sam by running whichever one of the following matches your Maq version:

    maq2sam-long assembly.map > assembly.sam
    maq2sam-short assembly.map > assembly.sam
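
The version rule above can be written down as a small lookup. This is a sketch only: the function name maq2sam_tool is hypothetical, and any version outside the 0.6.x and 0.7.x series mentioned above is reported as unknown rather than guessed.

```shell
# Hypothetical helper: name the converter that matches a Maq version string.
# The 0.6.x / 0.7.x split comes from the text above; other versions are
# reported as unknown rather than guessed.
maq2sam_tool() {
    case "$1" in
        0.7.*) echo "maq2sam-long" ;;
        0.6.*) echo "maq2sam-short" ;;
        *)     echo "unknown Maq version: $1" >&2; return 1 ;;
    esac
}
```

It could then be used as "$(maq2sam_tool 0.7.1)" assembly.map > assembly.sam.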

SAM to BAM

Conversion requires SAMtools.

If reference data is to be included, it must first be indexed from an input .fasta file:

    samtools faidx reference.fasta

This generates a BAM-compatible reference index (reference.fasta.fai).

Next, generate the actual .bam file (-t can be skipped if excluding reference data):

    samtools view -b -S -t reference.fasta.fai -o assembly.bam assembly.sam

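To work efficiently, the .bam file must also be sorted. With SAMtools 1.3 or later the output file is named with -o (older releases instead took an output prefix as the second argument, as in samtools sort assembly.bam assembly_sorted):

    samtools sort -o assembly_sorted.bam assembly.bam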

The final step is to index the .bam file:

    samtools index assembly_sorted.bam

This generates an index named assembly_sorted.bam.bai.

The final collection of files should contain:

    reference.fasta
    reference.fasta.fai
    assembly_sorted.bam
    assembly_sorted.bam.bai
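
The whole SAM to BAM sequence can be collected into one function that stops at the first failing step. This is a sketch under stated assumptions: the function name sam_to_indexed_bam is hypothetical, samtools is assumed to be on the PATH, and the sort step uses the -o option, which assumes SAMtools 1.3 or later.

```shell
# Hypothetical helper: SAM + FASTA reference -> sorted, indexed BAM.
# Assumes SAMtools 1.3 or later on the PATH (for `samtools sort -o`).
sam_to_indexed_bam() {
    ref=$1
    sam=$2
    base=${sam%.sam}
    # Each step runs only if the previous one succeeded.
    samtools faidx "$ref" &&
    samtools view -b -S -t "$ref.fai" -o "$base.bam" "$sam" &&
    samtools sort -o "${base}_sorted.bam" "$base.bam" &&
    samtools index "${base}_sorted.bam"
}
```

Running sam_to_indexed_bam reference.fasta assembly.sam should leave the four files listed above, plus the intermediate unsorted assembly.bam, which can be deleted once the sorted file has been checked.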

 

Many of the programs and scripts listed here also have options and command-line flags beyond what we have covered. Consult the actual tools’ documentation for further details. If you know of additional programs or formats that should be included, then please let us know.
