Export

To export alignments or clones from a binary file (.vdjca or .clns) to a human-readable text file, use the exportAlignments and exportClones commands, respectively. The syntax for these commands is:

mixcr exportAlignments [options] alignments.vdjca alignments.txt
mixcr exportClones [options] clones.clns clones.txt

The resulting tab-delimited text file contains columns with different types of information. If no options are specified, the default set of columns, which is sufficient in most cases, is exported. The available columns include (see below for details): aligned sequences, qualities, all hits or just the best hit for V, D, J and C genes, the corresponding alignments, nucleotide and amino acid sequences of gene regions present in the sequence, etc. For clones, additional columns are available: clone count, clone fraction, etc.

One can customize the list of exported fields by passing parameters to the export commands. For example, to export just the clone count, the best hits for V and J genes with their corresponding alignments, and the CDR3 amino acid sequence, run:

mixcr exportClones -count -vHit -jHit -vAlignment -jAlignment -aaFeature CDR3 clones.clns clones.txt

The columns in the resulting file appear in exactly the same order as the parameters on the command line. The available fields are listed in the following subsections. For convenience, MiXCR provides two predefined sets of fields: min (exports the minimal required information about clones or alignments) and full (used by default). Select a set with the --preset option:

mixcr exportClones --preset min clones.clns clones.txt

Additional columns can be appended to a preset as follows:

mixcr exportClones --preset min -qFeature CDR2 clones.clns clones.txt
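Because columns follow the order of the command-line parameters, an exported file can be read by column position without relying on the header text. The following is a minimal Python sketch under that assumption; the function name read_export and the sample data are made up for illustration and are not part of MiXCR.

```python
import csv
import io

# Options in the order they were passed to exportClones in the example above.
FIELDS = ["-count", "-vHit", "-jHit", "-vAlignment", "-jAlignment", "-aaFeature CDR3"]


def read_export(handle, fields):
    """Yield one dict per data row, keyed by the option that produced each column."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header line
    for row in reader:
        yield dict(zip(fields, row))


# Made-up stand-in for clones.txt: header text and values are placeholders,
# not real MiXCR output.
sample = io.StringIO(
    "h1\th2\th3\th4\th5\th6\n"
    "25\tV-gene\tJ-gene\tv-aln\tj-aln\tCASS\n"
)
for clone in read_export(sample, FIELDS):
    print(clone["-count"], clone["-aaFeature CDR3"])
```

For a real file, open clones.txt and pass the file object in place of the sample. All values are returned as strings, so numeric columns need an explicit conversion.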

One can also list all export fields in a file, for example:

-vHits
-dHits
-nFeature CDR3
...

and pass this file to the export command:

mixcr exportClones --presetFile myFields.txt clones.clns clones.txt
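When the field list is generated by a script, the fields file and the matching command line can be produced together. The sketch below assumes the file holds one field per line, as in the listing above; the helper names write_fields_file and export_clones_command are hypothetical, and the command is only built, not executed.

```python
import os
import tempfile


def write_fields_file(path, fields):
    """Write one export field per line, as in the listing above."""
    with open(path, "w") as handle:
        for field in fields:
            handle.write(field + "\n")


def export_clones_command(fields_file, clns, out):
    """Build the argument list for the --presetFile invocation shown above."""
    return ["mixcr", "exportClones", "--presetFile", fields_file, clns, out]


# Write the fields file into a temporary directory and print the command.
fields_path = os.path.join(tempfile.mkdtemp(), "myFields.txt")
write_fields_file(fields_path, ["-vHits", "-dHits", "-aaFeature CDR3"])
print(" ".join(export_clones_command(fields_path, "clones.clns", "clones.txt")))
```

The returned list can be passed to subprocess.run to execute the export, provided the mixcr executable is on the PATH.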

Command line parameters

The list of command line parameters for both exportAlignments and exportClones is the following:

Option Description
-h, --help print help message
-f, --fields list available fields that can be exported
-p, --preset select predefined set of fields to export (full or min)
-pf, --presetFile load file with a list of fields to export
-s, --no-spaces output short versions of column headers, which facilitates analysis with Pandas, R data frames or other data table processing libraries

Available fields

The following fields can be exported both for alignments and clones:

Field Description
-vHit Best V hit.
-dHit Best D hit.
-jHit Best J hit.
-cHit Best C hit.
-vHits All V hits.
-dHits All D hits.
-jHits All J hits.
-cHits All C hits.
-vHitsWithoutScore All V hits without scores.
-dHitsWithoutScore All D hits without scores.
-jHitsWithoutScore All J hits without scores.
-cHitsWithoutScore All C hits without scores.
-vAlignment Best V alignment.
-dAlignment Best D alignment.
-jAlignment Best J alignment.
-cAlignment Best C alignment.
-vAlignments All V alignments.
-dAlignments All D alignments.
-jAlignments All J alignments.
-cAlignments All C alignments.
-nFeature [feature] Nucleotide sequence of specified gene feature.
-qFeature [feature] Quality of sequences of specified gene feature.
-aaFeature [feature] Amino acid sequence of specified gene feature.
-avrgFeatureQuality [feature] Average quality of sequence of specified gene feature.
-minFeatureQuality [feature] Minimal quality of sequence of specified gene feature.
-defaultAnchorPoints Outputs a list of default anchor points (see table below for the list of anchor points and format).
-lengthOf [feature] Outputs length of specified gene feature.
-positionOf [anchorPoint] Outputs position of specified anchor point in the clonal sequence or aligned read.

The following fields are specific for alignments:

Field Description
-sequence Aligned sequence (initial read), or 2 sequences in case of paired-end reads.
-quality Initial read quality, or 2 qualities in case of paired-end reads.
-readId Index of source read (in e.g. .fastq file) for alignment.
-targets Number of targets, i.e. 1 in case of single reads and 2 in case of paired-end reads.
-descrR1 Description line from initial .fasta or .fastq file of the first read (only available if --save-description was used in align command).
-descrR2 Description line from initial .fastq file of the second read (only available if --save-description was used in align command).

The following fields are specific for clones:

Field Description
-count Clone count.
-fraction Clone fraction.
-sequence Clonal sequence (or several sequences in case of multi-featured assembling).
-quality Clonal sequence quality (or several qualities in case of multi-featured assembling).
-targets Number of targets, i.e. number of gene regions used to assemble clones.

Default anchor point positions

Positions of anchor points produced by the -defaultAnchorPoints option are output as a colon-separated list. If an anchor point is not covered by the target sequence, nothing is printed for it, but the flanking colons are preserved to maintain positions in the array. For example:

:::::::::108:117:125:152:186:213:243:244:

If there are several target sequences (e.g. paired-end reads or a multi-part clonal sequence), an array is output for each target sequence. In this case the arrays are separated by commas:

2:61:107:107:118:::::::::::::,:::::::::103:112:120:147:181:208:238:239:

An array is printed even for a part that contains no anchor points:

:::::::::::::::::,:::::::::108:117:125:152:186:213:243:244:

The following table shows the correspondence between anchor points and positions in the default anchor point array:

Anchor point Zero-based position One-based position
V5UTRBeginTrimmed 0 1
V5UTREnd / L1Begin 1 2
L1End / VIntronBegin 2 3
VIntronEnd / L2Begin 3 4
L2End / FR1Begin 4 5
FR1End / CDR1Begin 5 6
CDR1End / FR2Begin 6 7
FR2End / CDR2Begin 7 8
CDR2End / FR3Begin 8 9
FR3End / CDR3Begin 9 10
VEndTrimmed 10 11
DBeginTrimmed 11 12
DEndTrimmed 12 13
JBeginTrimmed 13 14
CDR3End / FR4Begin 14 15
FR4End 15 16
CBegin 16 17
CExon1End 17 18
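The array format described above can be decoded with a few lines of Python. This is a minimal sketch, assuming each per-target array always has exactly the 18 cells listed in the table; the function name parse_anchor_points is hypothetical and not part of MiXCR.

```python
# Anchor point labels in array order, copied from the table above.
ANCHOR_POINTS = [
    "V5UTRBeginTrimmed",
    "V5UTREnd / L1Begin",
    "L1End / VIntronBegin",
    "VIntronEnd / L2Begin",
    "L2End / FR1Begin",
    "FR1End / CDR1Begin",
    "CDR1End / FR2Begin",
    "FR2End / CDR2Begin",
    "CDR2End / FR3Begin",
    "FR3End / CDR3Begin",
    "VEndTrimmed",
    "DBeginTrimmed",
    "DEndTrimmed",
    "JBeginTrimmed",
    "CDR3End / FR4Begin",
    "FR4End",
    "CBegin",
    "CExon1End",
]


def parse_anchor_points(value):
    """Return one dict per target sequence mapping anchor label to position.

    Anchor points not covered by the target (empty cells) map to None.
    """
    targets = []
    for part in value.strip().split(","):  # one array per target sequence
        cells = part.split(":")
        if len(cells) != len(ANCHOR_POINTS):
            raise ValueError(
                "expected %d cells, got %d" % (len(ANCHOR_POINTS), len(cells))
            )
        targets.append(
            {name: int(cell) if cell else None
             for name, cell in zip(ANCHOR_POINTS, cells)}
        )
    return targets
```

Applied to the single-target example above, the result is a one-element list in which FR3End / CDR3Begin maps to 108 and CExon1End maps to None. For paired-end data the list has one dict per read.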

Examples

Export only the best V, D and J hits and the best V hit alignment from a .vdjca file:

mixcr exportAlignments -vHit -dHit -jHit -vAlignment input.vdjca test.txt
Best V hit Best D hit Best J hit Best V alignment
IGHV4-34*00   IGHJ4*00 |262|452|453|47|237|SC268GSC271ASC275G|956.1,58|303|450| 56|301|SG72TSA73CSG136TSA144CSA158CSG171T|331.0|
IGHV2-23*00 IGHD2*21 IGHJ6*00 |262|452|453|47|237|SC268GSC271ASC275G|956.1,58|303|450| 56|301|SG72TSA73CSG136TSA144CSA158CSG171T|331.0|

The alignment syntax is described in the appendix.

Exporting well-formatted alignments for manual inspection

MiXCR can export the alignments produced by the align step as pretty-formatted text for manual inspection. Examining the alignments and the structure of the library helps optimize analysis parameters and the library preparation protocol. To export pretty-formatted alignments, use the exportAlignmentsPretty command:

mixcr exportAlignmentsPretty --skip 1000 --limit 10 input.vdjca test.txt

This exports 10 results after skipping the first 1000 records and writes them to test.txt. Skipping the first records is often useful because the first sequences in a fastq file may have lower quality than average reads, so the first results are not representative. The output file name (the last parameter) can be omitted to print the result directly to the standard output stream (the console):

mixcr exportAlignmentsPretty --skip 1000 --limit 10 input.vdjca

Here is a summary of command line options:

Option Description
-h, --help print help message
-n, --limit limit number of alignments; no more than the provided number of results will be output
-s, --skip number of results to skip
-t, --top output only top hits for V, D, J and C genes
--cdr3-contains output only those alignments whose CDR3 contains the specified nucleotides (e.g. --cdr3-contains TTCAGAGGAGC)
--read-contains output only those alignments whose corresponding reads contain the specified nucleotides (e.g. --read-contains ATGCTTGCGCGCT)
--verbose use more verbose format for alignments (see below for example)
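The options in the table can be assembled programmatically when many files are inspected with the same settings. The sketch below only builds the argument list using the long option names from the table; the function pretty_command and its parameter names are hypothetical.

```python
def pretty_command(vdjca, output=None, skip=None, limit=None,
                   cdr3_contains=None, verbose=False):
    """Build an exportAlignmentsPretty argument list from the options above."""
    cmd = ["mixcr", "exportAlignmentsPretty"]
    if skip is not None:
        cmd += ["--skip", str(skip)]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if cdr3_contains:
        cmd += ["--cdr3-contains", cdr3_contains]
    if verbose:
        cmd.append("--verbose")
    cmd.append(vdjca)
    if output is not None:  # without an output file the result goes to stdout
        cmd.append(output)
    return cmd


print(" ".join(pretty_command("input.vdjca", "test.txt", skip=1000, limit=10)))
```

When the output file is left out, the list can be run with subprocess.run(cmd, capture_output=True, text=True) to collect the pretty-printed alignments from standard output, provided mixcr is installed.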

Results produced by this command have the following structure:


  >>> Read id: 1

                                                      5'UTR><L1
   Quality    88888888888888888888888887888888888888888888888888888888888888888888888887888878
   Target0  0 AAGGCCTTTCCACTTGGTGATCAGCACTGAGCACAGAGGACTCACCATGGAGTTGGGGCTGAGCTGGGTTTTCCTTGTTG 79
IGHV3-7*00 54 aaggcctttccacttggtgatcagcactgagcacagaggactcaccatggaAttggggctgagctgggttttccttgttg 133

                        L1><L2     L2><FR1
   Quality     88888888887888888888888888888889989989989889999997999999989999999999999999999899
   Target0  80 CTATTTTAGAAGGTGTCCAGTGTGAGGTGAAGTTGGTGGAGTCTGGGGGAGGCCTGGTCCAGCCTGGGGGGTCCCTGAGA 159
IGHV3-7*00 134 ctattttagaaggtgtccagtgtgaggtgCagCtggtggagtctgggggaggcTtggtccagcctggggggtccctgaga 213

                             FR1><CDR1              CDR1><FR2
   Quality     999999999999999999999999999999999999999999999 9999999999999999999999999999999999
   Target0 160 CTCTCCTGTGAAGCCTCCGGATTCACCTTTAGTAGTTATTGGATG-GCATGGGTCCGCCAGGGTCCAGGGCAGGGGCTGG 238
IGHV3-7*00 214 ctctcctgtgCagcctcTggattcacctttagtagCtattggatgAgc-tgggtccgccaggCtccagggAaggggctgg 292

                         FR2><CDR2              CDR2><FR3
   Quality     99999999999999999999999999999999999799999999999999999999999999998999899898999999
   Target0 239 AATGGGTGGGCAACATAAGGCCGGATGGAAGTGAGAGTTGGTACTTGGAGTCTGTGATGGGGCGATTCATGATATCTAGA 318
IGHV3-7*00 293 aGtgggtggCcaacataaAgcAAgatggaagtgagaAAtACtaTGtggaCtctgtgaAgggCcgattcaCCatCtcCaga 372

                                                                                 FR3><CDR3
    Quality     99899899999999988989999889979988888888878878788888888878888888778788888888878888
    Target0 319 GACAACGCCAAGAAGTCACTTTATCTGCAAATGGACAGCCTGAGAGTCGAGGACACGGCCGTCTATTATTGTGCGACTTC 398
 IGHV3-7*00 373 gacaacgccaagaaCtcactGtatctgcaaatgAacagcctgagagCcgaggacacggcTgtGtattaCtgtgcga     448
IGHD3-10*00  12                                                                              ttc 14

                                 CDR3><FR4
    Quality     88888788888888888888888787788777887787777877777877787787877878788788777767778788
    Target0 399 GGAGGAGCCGGAGGACTACTGGGGCCAGGGAGCCCTGGTCACCGTCTCCTCGGCTTCCACCAAGGGCCCATCGGTCTTCC 478
IGHD3-10*00  15 gg-ggag                                                                          20
   IGHJ4*00   8              gactactggggccagggaAccctggtcaccgtctcctc                              45
   IGHG4*00   0                                                      cttccaccaagggcccatcggtcttcc 26
   IGHG3*00   0                                                      cttccaccaagggcccatcggtcttcc 26
   IGHG2*00   0                                                      cCtccaccaagggcccatcggtcttcc 26
   IGHG1*00   0                                                      cCtccaccaagggcccatcggtcttcc 26
   IGHGP*00 194                                                    AgcCtccaccaagggcccatcggtcttcc 222


 Quality     87370
 Target0 479 CCTTG 483
IGHG4*00  27 ccCtg 31
IGHG3*00  27 ccCtg 31
IGHG2*00  27 ccCtg 31
IGHG1*00  27 ccCtg 31
IGHGP*00 223 ccCtg 227
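For browsing a larger export, it can help to group the output by read. This is a minimal sketch, assuming every record starts with a ">>> Read id: N" line as in the example above; the function name split_records is hypothetical, and the parsing is based only on the layout shown here, not on a formal specification of the format.

```python
import re

# Matches the line that opens each record, e.g. "  >>> Read id: 1".
READ_ID = re.compile(r"^\s*>>> Read id:\s*(\d+)")


def split_records(lines):
    """Group lines of exportAlignmentsPretty output by read id.

    Returns a dict mapping each read id to the list of lines that follow
    its ">>> Read id" line, up to the next record.
    """
    records = {}
    current = None
    for line in lines:
        match = READ_ID.match(line)
        if match:
            current = int(match.group(1))
            records[current] = []
        elif current is not None:
            records[current].append(line.rstrip("\n"))
    return records
```

An open file object for test.txt can be passed directly as the lines argument, since iterating over it yields one line at a time.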

Using the --verbose option produces alignments in a slightly different format:

>>> Read id: 12343    <--- Index of analysed read in input file

>>> Target sequences (input sequences):

Sequence0:   <--- Read 1 from paired-end read
Contains features: CDR1, VRegionTrimmed, L2, L, Intron, VLIntronL, FR1, Exon1,              <--- Gene features
VExon2Trimmed                                                                                    found in read 1

     0 TCTTGGGGGATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAACTGGGTTTTCCTCGTTGCT 79  <--- Sequence & quality
       FGGEGGGGGDG8F78CFC6CEFF<,CFG9EED,6,CFCC<EEGFG,CE:CCAFFGGC87CEF?A?FBC@FGGFG>B,FC9          of read 1

    80 CTATTAAGAGGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGTGGCGTGTTCCAGCCTGGGGGGTCCGTGAGACT 159
       F9,A,95AFE,B?,E,C,9AC<FGA<EE5??,A,A<:=:E,=B8C7+++8,++@+,885=D7:@8E+:5*1**11**++<
   160 CTCCTGTGCAGCGTCGGGATGCACATCATGGAGCTATGGCCAGCCCTGGGTACGCCAGGCTACAGGCCACGGGCTGGAGG 239
       <++*++0++2A:ECE5EC5**2@C+:++++++22*2:+29+*2***25/79*0299))*/)*0*0*.75)7:)1)1/)))

   240 GGGTGCGTGGTAGATGGGAA 259
       )9:.)))*1)12***-/).)

Sequence1:   <--- Read 2 from paired-end read
Contains features: JCDR3Part, DCDR3Part, DJJunction, CDR2, JRegionTrimmed, CDR3, VDJunction,
VJJunction, VCDR3Part, ShortCDR3, FR4, FR3

     0 CGAGGCAAGAGGCTGGTGTGGGTGGCGGTTATATGGTATGGTGGAAGTAATAAACACTATGCAGACCCCGTGAAGGGCCG 79
       **0*0**)2**/**5D7<15*9<5:1+*0:GF:=C>6A52++*:2+++FF>>3<++++++302**:**/<+**;:/**2+

    80 ATTCACCATCGCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAAGAGCCTGAGAGCCGAGGACACGGCTTTGT 159
       +++<0***C:2+9GGFB?,5,4,+,2F<>FC=*,,C:>,=,@,,;3<@=,3,,<3,CF?=**<>@,?3,<<:3,CC,E,@

   160 ATTACTGTGCGAGAGGTCAACAGGGTGACTATGTCTACGGTAGGGACGTCGGGGGCCAAGGGACCACGGTCACCGTCTCC 239
       ,@;FCF@+F@FGGF9FD,F>>+B:=,,=><GFCGGCFEGFF?+=B+7EF>+FFA,8F<E:,5+GDFFE,@F?,,7GGDFE

   240 TCAGGGAGTGCATCCGCCCCAACCCTTTTCCCCCTCTCTGCGTTGATACCACTGGCAGCTC 300
       C,FGGGEFCCGEEGGCFCC:8FGEGGGE@DFB-GFGGGGF@GFGFE<,GFCCFCAGC@CCC

>>> Gene features that can be extracted from this (paired-)read:                         <--- For paired-end reads
JCDR3Part, CDR1, VRegionTrimmed, L2, DCDR3Part, VDJTranscriptWithout5UTR, Exon2, L,           some gene features
DJJunction, Intron, FR2, CDR2, VDJRegion, JRegionTrimmed, CDR3, VDJunction, VJJunction,       can be extracted by
VLIntronL, FR1, VCDR3Part, ShortCDR3, Exon1, FR4, VExon2Trimmed, FR3                          merging sequence
                                                                                              information

>>> Alignments with V gene:

IGHV3-33*00 (total score = 1638.0) <--- Alignment of both reads with IGHV3-33
Alignment of Sequence0 (score = 899.0):   <--- Alignment of IGHV3-33 with read 1 from paired-end read
     65 ATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAGCTGGGTTTTCCTCGTTGCTCTTTTAAGA 144 <--- Germline
        ||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||| ||||||
      9 ATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAACTGGGTTTTCCTCGTTGCTCTATTAAGA 88  <--- Read
        DG8F78CFC6CEFF<,CFG9EED,6,CFCC<EEGFG,CE:CCAFFGGC87CEF?A?FBC@FGGFG>B,FC9F9,A,95AF     <--- Quality score

    145 GGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGC 224
        |||||||||||||||||||||||||||||||||||||| |||||| ||||||||||| ||||| ||||||||||||||||
     89 GGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGTGGCGTGTTCCAGCCTGGGGGGTCCGTGAGACTCTCCTGTGC 168
        E,B?,E,C,9AC<FGA<EE5??,A,A<:=:E,=B8C7+++8,++@+,885=D7:@8E+:5*1**11**++<<++*++0++

    225 AGCGTCTGGATTCACCTTCA-GTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTG 300
        |||||| |||| || | ||| | |||||||||  || |||||| ||||||||| ||||| | ||||||||| |||||
    169 AGCGTCGGGATGCA-CATCATGGAGCTATGGCCAGCCCTGGGTACGCCAGGCTACAGGCCACGGGCTGGAGGGGGTG 244
        2A:ECE5EC5**2@ C+:++++++22*2:+29+*2***25/79*0299))*/)*0*0*.75)7:)1)1/))))9:.)

Alignment of Sequence1 (score = 739.0):   <--- Alignment of IGHV3-33 with read 2 from paired-end read
    279 AGGCAAGGGGCTGGAGTGGGTGGCAGTTATATGGTATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGAT 358
        ||||||| |||||| ||||||||| ||||||||||||| ||||||||||||| ||||||||||| |||||||||||||||
      2 AGGCAAGAGGCTGGTGTGGGTGGCGGTTATATGGTATGGTGGAAGTAATAAACACTATGCAGACCCCGTGAAGGGCCGAT 81
        0*0**)2**/**5D7<15*9<5:1+*0:GF:=C>6A52++*:2+++FF>>3<++++++302**:**/<+**;:/**2+++

    359 TCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTAT 438
        |||||||| |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||| |||||
     82 TCACCATCGCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAAGAGCCTGAGAGCCGAGGACACGGCTTTGTAT 161
        +<0***C:2+9GGFB?,5,4,+,2F<>FC=*,,C:>,=,@,,;3<@=,3,,<3,CF?=**<>@,?3,<<:3,CC,E,@,@

    439 TACTGTGCGAGAG 451
        |||||||||||||
    162 TACTGTGCGAGAG 174
        ;FCF@+F@FGGF9

IGHV3-30*00 (total score = 1582.0)  <--- Alternative hit for V gene
Alignment of Sequence0 (score = 885.0):
     65 ATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAGCTGGGTTTTCCTCGTTGCTCTTTTAAGA 144
        ||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||| ||||||
      9 ATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAACTGGGTTTTCCTCGTTGCTCTATTAAGA 88
        DG8F78CFC6CEFF<,CFG9EED,6,CFCC<EEGFG,CE:CCAFFGGC87CEF?A?FBC@FGGFG>B,FC9F9,A,95AF

    145 GGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGC 224
        |||||||||||||||||||||||||||||||||||||| |||||| ||||||||||| ||||| ||||||||||||||||
     89 GGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGTGGCGTGTTCCAGCCTGGGGGGTCCGTGAGACTCTCCTGTGC 168
        E,B?,E,C,9AC<FGA<EE5??,A,A<:=:E,=B8C7+++8,++@+,885=D7:@8E+:5*1**11**++<<++*++0++

    225 AGCCTCTGGATTCACCTTCA-GTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTG 300
        ||| || |||| || | ||| | |||||||||  || |||||| ||||||||| ||||| | ||||||||| |||||
    169 AGCGTCGGGATGCA-CATCATGGAGCTATGGCCAGCCCTGGGTACGCCAGGCTACAGGCCACGGGCTGGAGGGGGTG 244
        2A:ECE5EC5**2@ C+:++++++22*2:+29+*2***25/79*0299))*/)*0*0*.75)7:)1)1/))))9:.)

Alignment of Sequence1 (score = 697.0):
    279 AGGCAAGGGGCTGGAGTGGGTGGCAGTTATATCATATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGAT 358
        ||||||| |||||| ||||||||| |||||||  |||| ||||||||||||| ||||||||||| |||||||||||||||
      2 AGGCAAGAGGCTGGTGTGGGTGGCGGTTATATGGTATGGTGGAAGTAATAAACACTATGCAGACCCCGTGAAGGGCCGAT 81
        0*0**)2**/**5D7<15*9<5:1+*0:GF:=C>6A52++*:2+++FF>>3<++++++302**:**/<+**;:/**2+++

    359 TCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTAT 438
        |||||||| |||||||||||||||||||||||||||||||||||||||| ||||||||||| |||||||||||| |||||
     82 TCACCATCGCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAAGAGCCTGAGAGCCGAGGACACGGCTTTGTAT 161
        +<0***C:2+9GGFB?,5,4,+,2F<>FC=*,,C:>,=,@,,;3<@=,3,,<3,CF?=**<>@,?3,<<:3,CC,E,@,@

    439 TACTGTGCGAGAG 451
        |||||||||||||
    162 TACTGTGCGAGAG 174
        ;FCF@+F@FGGF9

>>> Alignments with D gene:

IGHD4-17*00 (total score = 40.0)
Alignment of Sequence1 (score = 40.0):
      7 GGTGACTA 14
        ||||||||
    183 GGTGACTA 190
        :=,,=><G

IGHD4-23*00 (total score = 36.0)
Alignment of Sequence1 (score = 36.0):
      0 TGACTACGGT 9
        || |||||||
    191 TGTCTACGGT 200
        FCGGCFEGFF

IGHD2-21*00 (total score = 35.0)
Alignment of Sequence1 (score = 35.0):
     13 GGTGACT 19
        |||||||
    183 GGTGACT 189
        :=,,=><

>>> Alignments with J gene:

IGHJ6*00 (total score = 172.0)
Alignment of Sequence1 (score = 172.0):
     22 GGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCA 61
        ||||||| ||||| ||||||||||||||||||||||||||
    203 GGACGTCGGGGGCCAAGGGACCACGGTCACCGTCTCCTCA 242
        =B+7EF>+FFA,8F<E:,5+GDFFE,@F?,,7GGDFEC,F

>>> Alignments with C gene:

No hits.