Export

To export alignment results or clones from a binary file (.vdjca or .clns) to a human-readable text file, use the exportAlignments and exportClones commands respectively. The syntax for these commands is:

mixcr exportAlignments [options] alignments.vdjca alignments.txt
mixcr exportClones [options] clones.clns clones.txt

The resulting tab-delimited text file contains columns with different types of information. If no options are specified, the default set of columns, which is sufficient in most cases, is exported. The possible columns include (see below for details): aligned sequences, qualities, all or just the best hit for V, D, J and C genes, the corresponding alignments, nucleotide and amino acid sequences of gene regions present in the sequence, etc. For clones, the additional columns include clone count, clone fraction, etc.

One can customize the list of fields that will be exported by passing parameters to export commands. For example, in order to export just clone count, best hits for V and J genes with corresponding alignments and CDR3 amino acid sequence, one can do:

mixcr exportClones -count -vHit -jHit -vAlignment -jAlignment -aaFeature CDR3 clones.clns clones.txt

The columns in the resulting file will be exported in the exact same order as parameters in the command line. The list of available fields will be reviewed in the next subsections. For convenience, MiXCR provides two predefined sets of fields for exporting: min (will export minimal required information about clones or alignments) and full (used by default); one can use these sets by specifying --preset option:

mixcr exportClones --preset min clones.clns clones.txt

One can add columns to a preset in the following way:

mixcr exportClones --preset min -qFeature CDR2 clones.clns clones.txt

One can also list all export fields in a file, for example:

-vHits
-dHits
-feature CDR3
...

and pass this file to the export command:

mixcr exportClones --preset-file myFields.txt clones.clns clones.txt
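The field file can also be generated from a script. The following is a minimal sketch, assuming the file simply holds one option per line as in the listing above. The helper names write_fields_file and export_clones_command are hypothetical, and the command is only assembled here, not executed:

```python
import tempfile
from pathlib import Path


def write_fields_file(path, fields):
    """Write one export field per line, e.g. '-vHits' or '-aaFeature CDR3'."""
    Path(path).write_text("\n".join(fields) + "\n")


def export_clones_command(fields_file, clns, out):
    """Build the argument list for exportClones with --preset-file."""
    return ["mixcr", "exportClones", "--preset-file", str(fields_file), clns, out]


fields = ["-count", "-vHit", "-jHit", "-aaFeature CDR3"]
with tempfile.TemporaryDirectory() as tmp:
    fields_file = Path(tmp) / "myFields.txt"
    write_fields_file(fields_file, fields)
    # Read the file back to confirm that the order of fields is preserved.
    written = fields_file.read_text().splitlines()
    command = export_clones_command(fields_file, "clones.clns", "clones.txt")

print(written)
print(" ".join(command))
```

With MiXCR installed and the field file kept on disk, the argument list could be passed to subprocess.run(command, check=True).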

Command line parameters

The list of command line parameters for both exportAlignments and exportClones is the following:

Option Description
-h, --help print help message
-f, --fields list available fields that can be exported
-p, --preset select predefined set of fields to export (full or min)
-pf, --preset-file load file with a list of fields to export
-lf, --list-fields list available fields that can be exported
-s, --no-spaces output short versions of column headers, which facilitates analysis with Pandas, R data frames or other data table processing libraries

The following parameters are available only for exportClones:

-c, --chains Limit output to specific locus (e.g. TRA or IGH). Clone fractions will be recalculated accordingly.
-o, --filter-out-of-frames Exclude out of frames (fractions will be recalculated)
-t, --filter-stops Exclude sequences containing stop codons (fractions will be recalculated)
-m, --minimal-clone-count Filter clones by minimal read count.
-q, --minimal-clone-fraction Filter clones by minimal clone fraction.
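The options above state that fractions are recalculated after filtering, but not how. One plausible reading, shown here only as an illustrative sketch and not as MiXCR's actual implementation, is that each remaining clone's fraction becomes its count divided by the total count of the clones that pass the filter. The function name, the in_frame key and the example numbers are hypothetical:

```python
def recalculate_fractions(clones, keep):
    """Keep clones passing `keep` and renormalise fractions over the kept counts."""
    kept = [dict(c) for c in clones if keep(c)]  # copy so the input is untouched
    total = sum(c["count"] for c in kept)
    for c in kept:
        c["fraction"] = c["count"] / total if total else 0.0
    return kept


# Hypothetical clones; 'in_frame' stands in for the out-of-frame check.
clones = [
    {"cloneId": 0, "count": 60, "fraction": 0.6, "in_frame": True},
    {"cloneId": 1, "count": 20, "fraction": 0.2, "in_frame": False},
    {"cloneId": 2, "count": 20, "fraction": 0.2, "in_frame": True},
]
in_frame_only = recalculate_fractions(clones, lambda c: c["in_frame"])
for c in in_frame_only:
    print(c["cloneId"], c["count"], c["fraction"])
```

Under this reading, the two remaining clones end up with fractions 0.75 and 0.25, which again sum to 1.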

Available fields

The following fields can be exported both for alignments and clones:

Field name Description
-targets Number of targets
-vHit Best V hit
-dHit Best D hit
-jHit Best J hit
-cHit Best C hit
-vGene Best V hit gene name (e.g. TRBV12-3 for TRBV12-3*00)
-dGene Best D hit gene name (e.g. TRBV12-3 for TRBV12-3*00)
-jGene Best J hit gene name (e.g. TRBV12-3 for TRBV12-3*00)
-cGene Best C hit gene name (e.g. TRBV12-3 for TRBV12-3*00)
-vFamily Best V hit family name (e.g. TRBV12 for TRBV12-3*00)
-dFamily Best D hit family name (e.g. TRBV12 for TRBV12-3*00)
-jFamily Best J hit family name (e.g. TRBV12 for TRBV12-3*00)
-cFamily Best C hit family name (e.g. TRBV12 for TRBV12-3*00)
-vHitScore Score for best V hit
-dHitScore Score for best D hit
-jHitScore Score for best J hit
-cHitScore Score for best C hit
-vHitsWithScore All V hits with score
-dHitsWithScore All D hits with score
-jHitsWithScore All J hits with score
-cHitsWithScore All C hits with score
-vHits All V hits
-dHits All D hits
-jHits All J hits
-cHits All C hits
-vGenes All V gene names (e.g. TRBV12-3 for TRBV12-3*00)
-dGenes All D gene names (e.g. TRBV12-3 for TRBV12-3*00)
-jGenes All J gene names (e.g. TRBV12-3 for TRBV12-3*00)
-cGenes All C gene names (e.g. TRBV12-3 for TRBV12-3*00)
-vFamilies All V gene family names (e.g. TRBV12 for TRBV12-3*00)
-dFamilies All D gene family names (e.g. TRBV12 for TRBV12-3*00)
-jFamilies All J gene family names (e.g. TRBV12 for TRBV12-3*00)
-cFamilies All C gene family names (e.g. TRBV12 for TRBV12-3*00)
-vAlignment Best V alignment
-dAlignment Best D alignment
-jAlignment Best J alignment
-cAlignment Best C alignment
-vAlignments All V alignments
-dAlignments All D alignments
-jAlignments All J alignments
-cAlignments All C alignments
-nFeature <gene_feature> Nucleotide sequence of specified gene feature
-qFeature <gene_feature> Quality string of specified gene feature
-aaFeature <gene_feature> Amino acid sequence of specified gene feature
-minFeatureQuality <gene_feature> Minimal quality of specified gene feature
-avrgFeatureQuality <gene_feature> Average quality of specified gene feature
-lengthOf <gene_feature> Length of specified gene feature.
-nMutations <gene_feature> Extract nucleotide mutations for specific gene feature; relative to germline sequence.
-nMutationsRelative <gene_feature> <relative_to_gene_feature> Extract nucleotide mutations for specific gene feature relative to another feature.
-aaMutations <gene_feature> Extract amino acid mutations for specific gene feature
-aaMutationsRelative <gene_feature> <relative_to_gene_feature> Extract amino acid mutations for specific gene feature relative to another feature.
-mutationsDetailed <gene_feature> Detailed list of nucleotide and corresponding amino acid mutations. Format: <nt_mutation>:<aa_mutation_individual>:<aa_mutation_cumulative>, where <aa_mutation_individual> is the amino acid mutation expected if no other mutations had occurred, and <aa_mutation_cumulative> is the observed amino acid mutation, combining the effect of all other mutations.
-mutationsDetailedRelative <gene_feature> <relative_to_gene_feature> Detailed list of nucleotide and corresponding amino acid mutations, with positions relative to the specified gene feature. Format: <nt_mutation>:<aa_mutation_individual>:<aa_mutation_cumulative>, where <aa_mutation_individual> is the amino acid mutation expected if no other mutations had occurred, and <aa_mutation_cumulative> is the observed amino acid mutation, combining the effect of all other mutations.
-positionOf <reference_point> Position of specified reference point inside target sequences (clonal sequence / read sequence).
-defaultAnchorPoints Outputs a list of default reference points (like CDR2Begin, FR4End, etc.; see documentation for the full list and formatting)
-vIdentityPercents V alignment identity percents
-dIdentityPercents D alignment identity percents
-jIdentityPercents J alignment identity percents
-cIdentityPercents C alignment identity percents
-vBestIdentityPercent Best V alignment identity percent
-dBestIdentityPercent Best D alignment identity percent
-jBestIdentityPercent Best J alignment identity percent
-cBestIdentityPercent Best C alignment identity percent

The following fields are specific for alignments:

Field name Description
-readId Id of read corresponding to alignment
-sequence Aligned sequence (initial read), or 2 sequences in case of paired-end reads
-quality Initial read quality, or 2 qualities in case of paired-end reads
-descrR1 Description line from initial .fasta or .fastq file of the first read (only available if --save-description was used in align command)
-descrR2 Description line from initial .fasta or .fastq file of the second read (only available if --save-description was used in align command)
-cloneId <index_file> ID of the clone to which the alignment was attached.
-cloneIdWithMappingType <index_file> ID of the clone to which the alignment was attached, with additional information on the mapping type.

The following fields are specific for clones:

Field name Description
-cloneId Unique clone identifier
-count Clone count
-fraction Clone fraction
-sequence Aligned sequence (initial read), or 2 sequences in case of paired-end reads
-quality Initial read quality, or 2 qualities in case of paired-end reads
-readIds <index_file> Read IDs aggregated by clone.

Default anchor point positions

Positions of the anchor points produced by the -defaultAnchorPoints option are output as a colon-separated list. If an anchor point is not covered by the target sequence, nothing is printed for it, but the flanking colons are preserved to maintain positions in the array. For example:

:::::::::108:117:125:152:186:213:243:244:

If there are several target sequences (e.g. paired-end reads or a multi-part clonal sequence), an array is output for each target sequence. In this case the arrays are separated by commas:

2:61:107:107:118:::::::::::::,:::::::::103:112:120:147:181:208:238:239:

This holds even if one of the parts contains no anchor points:

:::::::::::::::::,:::::::::108:117:125:152:186:213:243:244:

The following table shows the correspondence between anchor points and positions in the default anchor point array:

Anchor point Zero-based position One-based position
V5UTRBeginTrimmed 0 1
V5UTREnd / L1Begin 1 2
L1End / VIntronBegin 2 3
VIntronEnd / L2Begin 3 4
L2End / FR1Begin 4 5
FR1End / CDR1Begin 5 6
CDR1End / FR2Begin 6 7
FR2End / CDR2Begin 7 8
CDR2End / FR3Begin 8 9
FR3End / CDR3Begin 9 10
Number of 3’ V deletions (negative value), or length of 3’ V P-segment (positive value) 10 11
VEndTrimmed, next position after last aligned nucleotide of V gene 11 12
DBeginTrimmed, position of first aligned nucleotide of D gene 12 13
Number of 5’ D deletions (negative value), or length of 5’ D P-segment (positive value) 13 14
Number of 3’ D deletions (negative value), or length of 3’ D P-segment (positive value) 14 15
DEndTrimmed, next position after last aligned nucleotide of D gene 15 16
JBeginTrimmed, position of first aligned nucleotide of J gene 16 17
Number of 5’ J deletions (negative value), or length of 5’ J P-segment (positive value) 17 18
CDR3End / FR4Begin 18 19
FR4End 19 20
CBegin 20 21
CExon1End 21 22

The following regular expressions can be used to parse content of this field in Python:

  • for length analysis, or analysis of raw alignments:

    ^(?P<V5UTRBegin>-?[0-9]*):(?P<L1Begin>-?[0-9]*):(?P<VIntronBegin>-?[0-9]*):(?P<L2Begin>-?[0-9]*):(?P<FR1Begin>-?[0-9]*):(?P<CDR1Begin>-?[0-9]*):(?P<FR2Begin>-?[0-9]*):(?P<CDR2Begin>-?[0-9]*):(?P<FR3Begin>-?[0-9]*):(?P<CDR3Begin>-?[0-9]*):(?P<V3Deletion>-?[0-9]*):(?P<VEnd>-?[0-9]*):(?P<DBegin>-?[0-9]*):(?P<D5Deletion>-?[0-9]*):(?P<D3Deletion>-?[0-9]*):(?P<DEnd>-?[0-9]*):(?P<JBegin>-?[0-9]*):(?P<J5Deletion>-?[0-9]*):(?P<CDR3End>-?[0-9]*):(?P<FR4End>-?[0-9]*):(?P<CBegin>-?[0-9]*):(?P<CExon1End>-?[0-9]*)$
    

    snippet for Pandas:

    import pandas as pd
    # Load the tab-delimited export; the anchor point column is assumed to be named "refPoints".
    data = pd.read_table("exported.txt", low_memory=False)
    anchorPointsRegex="^(?P<V5UTRBegin>-?[0-9]*):(?P<L1Begin>-?[0-9]*):(?P<VIntronBegin>-?[0-9]*):(?P<L2Begin>-?[0-9]*):(?P<FR1Begin>-?[0-9]*):(?P<CDR1Begin>-?[0-9]*):(?P<FR2Begin>-?[0-9]*):(?P<CDR2Begin>-?[0-9]*):(?P<FR3Begin>-?[0-9]*):(?P<CDR3Begin>-?[0-9]*):(?P<V3Deletion>-?[0-9]*):(?P<VEnd>-?[0-9]*):(?P<DBegin>-?[0-9]*):(?P<D5Deletion>-?[0-9]*):(?P<D3Deletion>-?[0-9]*):(?P<DEnd>-?[0-9]*):(?P<JBegin>-?[0-9]*):(?P<J5Deletion>-?[0-9]*):(?P<CDR3End>-?[0-9]*):(?P<FR4End>-?[0-9]*):(?P<CBegin>-?[0-9]*):(?P<CExon1End>-?[0-9]*)$"
    # Extract every named group into its own numeric column and append the columns to the table.
    data = pd.concat([data, data.refPoints.str.extract(anchorPointsRegex, expand=True).apply(pd.to_numeric)], axis=1)
    
  • a simplified regular expression with fewer fields, which can be used for analysis of CDR3-assembled clonotypes:

    ^(?:-?[0-9]*:){8}(?:-?[0-9]*):(?P<CDR3Begin>-?[0-9]*):(?P<V3Deletion>-?[0-9]*):(?P<VEnd>-?[0-9]*):(?P<DBegin>-?[0-9]*):(?P<D5Deletion>-?[0-9]*):(?P<D3Deletion>-?[0-9]*):(?P<DEnd>-?[0-9]*):(?P<JBegin>-?[0-9]*):(?P<J5Deletion>-?[0-9]*):(?P<CDR3End>-?[0-9]*):(?:-?[0-9]*:){2}(?:-?[0-9]*)$
    

    snippet for Pandas:

    import pandas as pd
    # Load the tab-delimited export; the anchor point column is assumed to be named "refPoints".
    data = pd.read_table("exported.txt", low_memory=False)
    anchorPointsRegex="^(?:-?[0-9]*:){8}(?:-?[0-9]*):(?P<CDR3Begin>-?[0-9]*):(?P<V3Deletion>-?[0-9]*):(?P<VEnd>-?[0-9]*):(?P<DBegin>-?[0-9]*):(?P<D5Deletion>-?[0-9]*):(?P<D3Deletion>-?[0-9]*):(?P<DEnd>-?[0-9]*):(?P<JBegin>-?[0-9]*):(?P<J5Deletion>-?[0-9]*):(?P<CDR3End>-?[0-9]*):(?:-?[0-9]*:){2}(?:-?[0-9]*)$"
    # Extract the named groups into numeric columns and append them to the table.
    data = pd.concat([data, data.refPoints.str.extract(anchorPointsRegex, expand=True).apply(pd.to_numeric)], axis=1)
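Without Pandas, the same field can be split by hand. The sketch below assumes only the layout described above (targets separated by commas, positions by colons, empty fields for uncovered anchor points) and does not assume a fixed number of fields. The function name parse_anchor_points is hypothetical:

```python
def parse_anchor_points(value):
    """Split a -defaultAnchorPoints value into one list of positions per target.

    Empty fields (anchor point not covered by the target) become None.
    """
    return [
        [int(field) if field else None for field in target.split(":")]
        for target in value.split(",")
    ]


single = parse_anchor_points(":::::::::108:117:125:152:186:213:243:244:")
paired = parse_anchor_points(
    "2:61:107:107:118:::::::::::::,:::::::::103:112:120:147:181:208:238:239:"
)

CDR3_BEGIN = 9  # zero-based position of FR3End / CDR3Begin in the table above
print(single[0][CDR3_BEGIN])
print(paired[0][CDR3_BEGIN], paired[1][CDR3_BEGIN])
```

Looking positions up by their zero-based index from the table keeps the code independent of how many trailing fields a particular export contains.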
    

Examples

Export only best V, D, J hits and best V hit alignment from .vdjca file:

mixcr exportAlignments -vHit -dHit -jHit -vAlignment input.vdjca test.txt
Best V hit Best D hit Best J hit Best V alignment
IGHV4-34*00   IGHJ4*00 |262|452|453|47|237|SC268GSC271ASC275G|956.1,58|303|450| 56|301|SG72TSA73CSG136TSA144CSA158CSG171T|331.0|
IGHV2-23*00 IGHD2*21 IGHJ6*00 |262|452|453|47|237|SC268GSC271ASC275G|956.1,58|303|450| 56|301|SG72TSA73CSG136TSA144CSA158CSG171T|331.0|

The alignment syntax is described in the appendix.

Exporting well formatted alignments for manual inspection

MiXCR can export the alignments produced by the align step as pretty-formatted text for manual inspection. Examining the alignments and the structure of the library helps to optimize analysis parameters and the library preparation protocol. To export pretty-formatted alignments, use the exportAlignmentsPretty command:

mixcr exportAlignmentsPretty --skip 1000 --limit 10 input.vdjca test.txt

This exports 10 results after skipping the first 1000 records and places them in the test.txt file. Skipping the first records is often useful because the first sequences in a fastq file may have lower quality than average reads, so the first results are not representative. The last parameter (the output file name) can be omitted to print the result directly to the standard output stream (the console), like this:

mixcr exportAlignmentsPretty --skip 1000 --limit 10 input.vdjca
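The combined effect of --skip and --limit described above can be pictured as slicing a stream of records. This is only an illustration of the stated behaviour (skip the first N records, then output at most M); how MiXCR applies these options together with its other filters is not covered here. The function name skip_and_limit is hypothetical:

```python
from itertools import islice


def skip_and_limit(records, skip=0, limit=None):
    """Yield at most `limit` records after discarding the first `skip`."""
    stop = None if limit is None else skip + limit
    return islice(records, skip, stop)


read_ids = range(5000)  # stand-in for a stream of alignment records
selected = list(skip_and_limit(read_ids, skip=1000, limit=10))
print(selected[0], selected[-1], len(selected))
```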

Here is a summary of command line options:

Option Description
-h, --help print help message
-n, --limit limit number of alignments; no more than provided number of results will be outputted
-s, --skip number of results to skip
-t, --top output only top hits for V, D, J and C genes
--cdr3-contains output only those alignments whose CDR3 contains the specified nucleotides (e.g. --cdr3-contains TTCAGAGGAGC)
--read-contains output only those alignments for which the corresponding reads contain the specified nucleotides (e.g. --read-contains ATGCTTGCGCGCT)
--verbose use more verbose format for alignments (see below for example)

Results produced by this command have the following structure:


  >>> Read id: 1

                                                      5'UTR><L1
   Quality    88888888888888888888888887888888888888888888888888888888888888888888888887888878
   Target0  0 AAGGCCTTTCCACTTGGTGATCAGCACTGAGCACAGAGGACTCACCATGGAGTTGGGGCTGAGCTGGGTTTTCCTTGTTG 79
IGHV3-7*00 54 aaggcctttccacttggtgatcagcactgagcacagaggactcaccatggaAttggggctgagctgggttttccttgttg 133

                        L1><L2     L2><FR1
   Quality     88888888887888888888888888888889989989989889999997999999989999999999999999999899
   Target0  80 CTATTTTAGAAGGTGTCCAGTGTGAGGTGAAGTTGGTGGAGTCTGGGGGAGGCCTGGTCCAGCCTGGGGGGTCCCTGAGA 159
IGHV3-7*00 134 ctattttagaaggtgtccagtgtgaggtgCagCtggtggagtctgggggaggcTtggtccagcctggggggtccctgaga 213

                             FR1><CDR1              CDR1><FR2
   Quality     999999999999999999999999999999999999999999999 9999999999999999999999999999999999
   Target0 160 CTCTCCTGTGAAGCCTCCGGATTCACCTTTAGTAGTTATTGGATG-GCATGGGTCCGCCAGGGTCCAGGGCAGGGGCTGG 238
IGHV3-7*00 214 ctctcctgtgCagcctcTggattcacctttagtagCtattggatgAgc-tgggtccgccaggCtccagggAaggggctgg 292

                         FR2><CDR2              CDR2><FR3
   Quality     99999999999999999999999999999999999799999999999999999999999999998999899898999999
   Target0 239 AATGGGTGGGCAACATAAGGCCGGATGGAAGTGAGAGTTGGTACTTGGAGTCTGTGATGGGGCGATTCATGATATCTAGA 318
IGHV3-7*00 293 aGtgggtggCcaacataaAgcAAgatggaagtgagaAAtACtaTGtggaCtctgtgaAgggCcgattcaCCatCtcCaga 372

                                                                                 FR3><CDR3
    Quality     99899899999999988989999889979988888888878878788888888878888888778788888888878888
    Target0 319 GACAACGCCAAGAAGTCACTTTATCTGCAAATGGACAGCCTGAGAGTCGAGGACACGGCCGTCTATTATTGTGCGACTTC 398
 IGHV3-7*00 373 gacaacgccaagaaCtcactGtatctgcaaatgAacagcctgagagCcgaggacacggcTgtGtattaCtgtgcga     448
IGHD3-10*00  12                                                                              ttc 14

                                 CDR3><FR4
    Quality     88888788888888888888888787788777887787777877777877787787877878788788777767778788
    Target0 399 GGAGGAGCCGGAGGACTACTGGGGCCAGGGAGCCCTGGTCACCGTCTCCTCGGCTTCCACCAAGGGCCCATCGGTCTTCC 478
IGHD3-10*00  15 gg-ggag                                                                          20
   IGHJ4*00   8              gactactggggccagggaAccctggtcaccgtctcctc                              45
   IGHG4*00   0                                                      cttccaccaagggcccatcggtcttcc 26
   IGHG3*00   0                                                      cttccaccaagggcccatcggtcttcc 26
   IGHG2*00   0                                                      cCtccaccaagggcccatcggtcttcc 26
   IGHG1*00   0                                                      cCtccaccaagggcccatcggtcttcc 26
   IGHGP*00 194                                                    AgcCtccaccaagggcccatcggtcttcc 222


 Quality     87370
 Target0 479 CCTTG 483
IGHG4*00  27 ccCtg 31
IGHG3*00  27 ccCtg 31
IGHG2*00  27 ccCtg 31
IGHG1*00  27 ccCtg 31
IGHGP*00 223 ccCtg 227

Using the --verbose option will produce alignments in a slightly different format:

>>> Read id: 12343    <--- Index of analysed read in input file

>>> Target sequences (input sequences):

Sequence0:   <--- Read 1 from paired-end read
Contains features: CDR1, VRegionTrimmed, L2, L, Intron, VLIntronL, FR1, Exon1,              <--- Gene features
VExon2Trimmed                                                                                    found in read 1

     0 TCTTGGGGGATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAACTGGGTTTTCCTCGTTGCT 79  <--- Sequence & quality
       FGGEGGGGGDG8F78CFC6CEFF<,CFG9EED,6,CFCC<EEGFG,CE:CCAFFGGC87CEF?A?FBC@FGGFG>B,FC9          of read 1

    80 CTATTAAGAGGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGTGGCGTGTTCCAGCCTGGGGGGTCCGTGAGACT 159
       F9,A,95AFE,B?,E,C,9AC<FGA<EE5??,A,A<:=:E,=B8C7+++8,++@+,885=D7:@8E+:5*1**11**++<
   160 CTCCTGTGCAGCGTCGGGATGCACATCATGGAGCTATGGCCAGCCCTGGGTACGCCAGGCTACAGGCCACGGGCTGGAGG 239
       <++*++0++2A:ECE5EC5**2@C+:++++++22*2:+29+*2***25/79*0299))*/)*0*0*.75)7:)1)1/)))

   240 GGGTGCGTGGTAGATGGGAA 259
       )9:.)))*1)12***-/).)

Sequence1:   <--- Read 2 from paired-end read
Contains features: JCDR3Part, DCDR3Part, DJJunction, CDR2, JRegionTrimmed, CDR3, VDJunction,
VJJunction, VCDR3Part, ShortCDR3, FR4, FR3

     0 CGAGGCAAGAGGCTGGTGTGGGTGGCGGTTATATGGTATGGTGGAAGTAATAAACACTATGCAGACCCCGTGAAGGGCCG 79
       **0*0**)2**/**5D7<15*9<5:1+*0:GF:=C>6A52++*:2+++FF>>3<++++++302**:**/<+**;:/**2+

    80 ATTCACCATCGCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAAGAGCCTGAGAGCCGAGGACACGGCTTTGT 159
       +++<0***C:2+9GGFB?,5,4,+,2F<>FC=*,,C:>,=,@,,;3<@=,3,,<3,CF?=**<>@,?3,<<:3,CC,E,@

   160 ATTACTGTGCGAGAGGTCAACAGGGTGACTATGTCTACGGTAGGGACGTCGGGGGCCAAGGGACCACGGTCACCGTCTCC 239
       ,@;FCF@+F@FGGF9FD,F>>+B:=,,=><GFCGGCFEGFF?+=B+7EF>+FFA,8F<E:,5+GDFFE,@F?,,7GGDFE

   240 TCAGGGAGTGCATCCGCCCCAACCCTTTTCCCCCTCTCTGCGTTGATACCACTGGCAGCTC 300
       C,FGGGEFCCGEEGGCFCC:8FGEGGGE@DFB-GFGGGGF@GFGFE<,GFCCFCAGC@CCC

>>> Gene features that can be extracted from this (paired-)read:                         <--- For paired-end reads
JCDR3Part, CDR1, VRegionTrimmed, L2, DCDR3Part, VDJTranscriptWithout5UTR, Exon2, L,           some gene features
DJJunction, Intron, FR2, CDR2, VDJRegion, JRegionTrimmed, CDR3, VDJunction, VJJunction,       can be extracted by
VLIntronL, FR1, VCDR3Part, ShortCDR3, Exon1, FR4, VExon2Trimmed, FR3                          merging sequence
                                                                                              information

>>> Alignments with V gene:

IGHV3-33*00 (total score = 1638.0) <--- Alignment of both reads with IGHV3-33
Alignment of Sequence0 (score = 899.0):   <--- Alignment of IGHV3-33 with read 1 from paired-end read
     65 ATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAGCTGGGTTTTCCTCGTTGCTCTTTTAAGA 144 <--- Germline
        ||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||| ||||||
      9 ATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAACTGGGTTTTCCTCGTTGCTCTATTAAGA 88  <--- Read
        DG8F78CFC6CEFF<,CFG9EED,6,CFCC<EEGFG,CE:CCAFFGGC87CEF?A?FBC@FGGFG>B,FC9F9,A,95AF     <--- Quality score

    145 GGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGC 224
        |||||||||||||||||||||||||||||||||||||| |||||| ||||||||||| ||||| ||||||||||||||||
     89 GGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGTGGCGTGTTCCAGCCTGGGGGGTCCGTGAGACTCTCCTGTGC 168
        E,B?,E,C,9AC<FGA<EE5??,A,A<:=:E,=B8C7+++8,++@+,885=D7:@8E+:5*1**11**++<<++*++0++

    225 AGCGTCTGGATTCACCTTCA-GTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTG 300
        |||||| |||| || | ||| | |||||||||  || |||||| ||||||||| ||||| | ||||||||| |||||
    169 AGCGTCGGGATGCA-CATCATGGAGCTATGGCCAGCCCTGGGTACGCCAGGCTACAGGCCACGGGCTGGAGGGGGTG 244
        2A:ECE5EC5**2@ C+:++++++22*2:+29+*2***25/79*0299))*/)*0*0*.75)7:)1)1/))))9:.)

Alignment of Sequence1 (score = 739.0):   <--- Alignment of IGHV3-33 with read 2 from paired-end read
    279 AGGCAAGGGGCTGGAGTGGGTGGCAGTTATATGGTATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGAT 358
        ||||||| |||||| ||||||||| ||||||||||||| ||||||||||||| ||||||||||| |||||||||||||||
      2 AGGCAAGAGGCTGGTGTGGGTGGCGGTTATATGGTATGGTGGAAGTAATAAACACTATGCAGACCCCGTGAAGGGCCGAT 81
        0*0**)2**/**5D7<15*9<5:1+*0:GF:=C>6A52++*:2+++FF>>3<++++++302**:**/<+**;:/**2+++

    359 TCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTAT 438
        |||||||| |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||| |||||
     82 TCACCATCGCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAAGAGCCTGAGAGCCGAGGACACGGCTTTGTAT 161
        +<0***C:2+9GGFB?,5,4,+,2F<>FC=*,,C:>,=,@,,;3<@=,3,,<3,CF?=**<>@,?3,<<:3,CC,E,@,@

    439 TACTGTGCGAGAG 451
        |||||||||||||
    162 TACTGTGCGAGAG 174
        ;FCF@+F@FGGF9

IGHV3-30*00 (total score = 1582.0)  <--- Alternative hit for V gene
Alignment of Sequence0 (score = 885.0):
     65 ATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAGCTGGGTTTTCCTCGTTGCTCTTTTAAGA 144
        ||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||| ||||||
      9 ATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAACTGGGTTTTCCTCGTTGCTCTATTAAGA 88
        DG8F78CFC6CEFF<,CFG9EED,6,CFCC<EEGFG,CE:CCAFFGGC87CEF?A?FBC@FGGFG>B,FC9F9,A,95AF

    145 GGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGC 224
        |||||||||||||||||||||||||||||||||||||| |||||| ||||||||||| ||||| ||||||||||||||||
     89 GGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGTGGCGTGTTCCAGCCTGGGGGGTCCGTGAGACTCTCCTGTGC 168
        E,B?,E,C,9AC<FGA<EE5??,A,A<:=:E,=B8C7+++8,++@+,885=D7:@8E+:5*1**11**++<<++*++0++

    225 AGCCTCTGGATTCACCTTCA-GTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTG 300
        ||| || |||| || | ||| | |||||||||  || |||||| ||||||||| ||||| | ||||||||| |||||
    169 AGCGTCGGGATGCA-CATCATGGAGCTATGGCCAGCCCTGGGTACGCCAGGCTACAGGCCACGGGCTGGAGGGGGTG 244
        2A:ECE5EC5**2@ C+:++++++22*2:+29+*2***25/79*0299))*/)*0*0*.75)7:)1)1/))))9:.)

Alignment of Sequence1 (score = 697.0):
    279 AGGCAAGGGGCTGGAGTGGGTGGCAGTTATATCATATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGAT 358
        ||||||| |||||| ||||||||| |||||||  |||| ||||||||||||| ||||||||||| |||||||||||||||
      2 AGGCAAGAGGCTGGTGTGGGTGGCGGTTATATGGTATGGTGGAAGTAATAAACACTATGCAGACCCCGTGAAGGGCCGAT 81
        0*0**)2**/**5D7<15*9<5:1+*0:GF:=C>6A52++*:2+++FF>>3<++++++302**:**/<+**;:/**2+++

    359 TCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTAT 438
        |||||||| |||||||||||||||||||||||||||||||||||||||| ||||||||||| |||||||||||| |||||
     82 TCACCATCGCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAAGAGCCTGAGAGCCGAGGACACGGCTTTGTAT 161
        +<0***C:2+9GGFB?,5,4,+,2F<>FC=*,,C:>,=,@,,;3<@=,3,,<3,CF?=**<>@,?3,<<:3,CC,E,@,@

    439 TACTGTGCGAGAG 451
        |||||||||||||
    162 TACTGTGCGAGAG 174
        ;FCF@+F@FGGF9

>>> Alignments with D gene:

IGHD4-17*00 (total score = 40.0)
Alignment of Sequence1 (score = 40.0):
      7 GGTGACTA 14
        ||||||||
    183 GGTGACTA 190
        :=,,=><G

IGHD4-23*00 (total score = 36.0)
Alignment of Sequence1 (score = 36.0):
      0 TGACTACGGT 9
        || |||||||
    191 TGTCTACGGT 200
        FCGGCFEGFF

IGHD2-21*00 (total score = 35.0)
Alignment of Sequence1 (score = 35.0):
     13 GGTGACT 19
        |||||||
    183 GGTGACT 189
        :=,,=><

>>> Alignments with J gene:

IGHJ6*00 (total score = 172.0)
Alignment of Sequence1 (score = 172.0):
     22 GGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCA 61
        ||||||| ||||| ||||||||||||||||||||||||||
    203 GGACGTCGGGGGCCAAGGGACCACGGTCACCGTCTCCTCA 242
        =B+7EF>+FFA,8F<E:,5+GDFFE,@F?,,7GGDFEC,F

>>> Alignments with C gene:

No hits.

Exporting reads aggregated by clones

MiXCR can preserve the mapping between initial reads and final clonotypes. There are several ways to access this information.

In either case, one first needs to specify the additional --index option for the assemble command:

mixcr assemble --index index_file alignments.vdjca output.clns

This will tell MiXCR to store the mapping in the file index_file (actually two files will be created, index_file and index_file.p, both of which are used to store the index; in subsequent commands one should specify only index_file, without the .p extension, and MiXCR will automatically read both files). One can now use index_file to access this information. For example, using the -cloneId option of the exportAlignments command:

mixcr exportAlignments -p min -cloneId index_file alignments.vdjca alignments.txt

will print an additional column with the ID of the clone that contains the corresponding alignment:

Best V hit Best D hit ... CloneId
IGHV4-34*00   ... 321
IGHV2-23*00 IGHD2*21 ...  
IGHV4-34*00 IGHD2*21 ... 22143
... ... ... ...

For more information one can export mapping type as well:

mixcr exportAlignments -p min -cloneIdWithMappingType index_file alignments.vdjca alignments.txt

which will give something like:

Best V hit Best D hit ... Clone mapping
IGHV4-34*00   ... 321:core
IGHV2-23*00 IGHD2*21 ... dropped
IGHV4-34*00 IGHD2*21 ... 22143:clustered
IGHV4-34*00 IGHD2*21 ... 23:mapped
... ... ... ...
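Cells of the mapping column can be split into a clone ID and a mapping type with a few lines of Python. The sketch below assumes only the cell formats visible in the example above ("<cloneId>:<type>", or a bare status such as "dropped"). The function name parse_clone_mapping is hypothetical:

```python
from collections import Counter


def parse_clone_mapping(value):
    """Return (clone_id, mapping_type) for one 'Clone mapping' cell."""
    clone_id, sep, mapping_type = value.strip().partition(":")
    if not sep:
        # No colon: the cell holds only a status such as 'dropped'.
        return None, clone_id
    return int(clone_id), mapping_type


cells = ["321:core", "dropped", "22143:clustered", "23:mapped"]
parsed = [parse_clone_mapping(cell) for cell in cells]
by_type = Counter(mapping_type for _, mapping_type in parsed)

print(parsed)
print(by_type["dropped"])
```

Counting rows per mapping type in this way gives a quick summary of how many alignments ended up in each category.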

One can also export all read IDs that were aggregated by each clone, using the -readIds export option of the exportClones command:

mixcr exportClones -c IGH -p min -readIds index_file clones.clns clones.txt

This will add a column with a full enumeration of all reads that were absorbed by each clone:

Clone ID Clone count Best V hit ... Reads
0 7213 IGHV4-34*00 ... 56,74,92,96,101,119,169,183...
1 2951 IGHV2-23*00 ... 46,145,194,226,382,451,464...
2 2269 IGHV4-34*00 ... 58,85,90,103,113,116,122,123...
3 124 IGHV4-34*00 ... 240,376,496,617,715,783,813...
...   ... ... ...

Note that the resulting text file may be very large, since all read numbers that were successfully assembled will be printed.
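Given the possible size of this file, it may be preferable to process it one row at a time instead of loading it whole. The sketch below uses Python's standard csv module and assumes the column headers "Clone ID" and "Reads" shown in the table above (they may differ, for instance when --no-spaces is used). The function name and the tiny in-memory example are hypothetical:

```python
import csv
import io

# Cells listing every read id can exceed the csv module's default
# per-field limit (131072 characters), so raise it first.
csv.field_size_limit(2**31 - 1)


def iter_clone_reads(handle, id_column="Clone ID", reads_column="Reads"):
    """Yield (clone_id, [read ids]) one row at a time from a tab-delimited export."""
    for row in csv.DictReader(handle, delimiter="\t"):
        reads = [int(r) for r in row[reads_column].split(",") if r]
        yield int(row[id_column]), reads


# Tiny hypothetical export; real files come from exportClones with -readIds.
example = io.StringIO(
    "Clone ID\tClone count\tReads\n"
    "0\t3\t56,74,92\n"
    "1\t2\t46,145\n"
)
for clone_id, reads in iter_clone_reads(example):
    print(clone_id, len(reads), reads[:2])
```

For a real export, the StringIO object would be replaced by a file opened with open("clones.txt", newline="").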

Finally, one can export the reads aggregated by each clone into separate .fastq files. To do so, one first needs to specify the additional -g option for the align command:

mixcr align -g input.fastq alignments.vdjca.gz

With this option MiXCR will store the original reads in the .vdjca file. One can then export the reads corresponding to a particular clone with the exportReadsForClones command. For example, to export all reads that were assembled into the first clone (the clone with cloneId = 0):

mixcr exportReadsForClones index_file alignments.vdjca.gz 0 reads.fastq.gz

This will create file reads_clns0.fastq.gz (or two files reads_clns0_R1.fastq.gz and reads_clns0_R2.fastq.gz if the original data were paired) with all reads that were aggregated by the first clone. One can export reads for several clones at a time:

mixcr exportReadsForClones index_file alignments.vdjca.gz 0 1 2 33 54 reads.fastq.gz

This will create a separate file (reads_clns0.fastq.gz, reads_clns1.fastq.gz etc.) for each of the clones with cloneId equal to 0, 1, 2, 33 and 54.
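When scripting around exportReadsForClones, it can help to predict the names of the files it produces. The sketch below reproduces the naming pattern from the examples above (reads_clns<ID>, plus _R1/_R2 for paired data) and assumes that the suffix is inserted before the first dot of a plain file name without a directory part. This is an inference from the examples, not documented behaviour, and the function name clone_read_files is hypothetical:

```python
def clone_read_files(output_name, clone_ids, paired=False):
    """Predict per-clone file names following the reads_clns<ID> pattern."""
    # Split at the first dot so that 'reads.fastq.gz' -> ('reads', '.fastq.gz').
    stem, dot, extension = output_name.partition(".")
    suffix = dot + extension
    names = []
    for clone_id in clone_ids:
        base = f"{stem}_clns{clone_id}"
        if paired:
            names += [f"{base}_R1{suffix}", f"{base}_R2{suffix}"]
        else:
            names.append(f"{base}{suffix}")
    return names


print(clone_read_files("reads.fastq.gz", [0, 1, 2, 33, 54]))
print(clone_read_files("reads.fastq.gz", [0], paired=True))
```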