Export
To export alignment results or clones from a binary file (.vdjca, .clna or .clns) to a human-readable text file, use the exportAlignments and exportClones commands, respectively. The syntax for these commands is:
# export alignments from .vdjca file
mixcr exportAlignments [options] alignments.vdjca alignments.txt
# export alignments from .clna file
mixcr exportAlignments [options] clonesAndAlignments.clna alignments.txt
# export clones from .clns file
mixcr exportClones [options] clones.clns clones.txt
# export clones from .clna file
mixcr exportClones [options] clonesAndAlignments.clna clones.txt
The resulting tab-delimited text file will contain columns with different types of information. If no options are specified, the default set of columns - which is sufficient in most cases - will be exported. The possible columns include (see below for details): aligned sequences, qualities, all or just best hit for V, D, J and C genes, corresponding alignments, nucleotide and amino acid sequences of gene region present in sequence, etc. When exporting clones, the additional columns include: clone count, clone fraction etc.
One can customize the list of exported fields by passing parameters to the export commands. For example, to export just the clone count, the best hits for V and J genes with the corresponding alignments, and the CDR3 amino acid sequence, one can do:
mixcr exportClones -count -vHit -jHit -vAlignment -jAlignment -aaFeature CDR3 clones.clns clones.txt
The columns in the resulting file are exported in exactly the same order as the parameters on the command line. The available fields are listed in the sections below. For convenience, MiXCR provides predefined sets of fields for export, including min (exports the minimal required information about clones or alignments) and full (used by default); select a set with the --preset option:
mixcr exportClones --preset min clones.clns clones.txt
One can add additional columns to the preset in the following way:
mixcr exportClones --preset min -qFeature CDR2 clones.clns clones.txt
One can also list the export fields in a separate file, one per line:
-vHits
-dHits
-nFeature CDR3
...
and pass this file to the export command:
mixcr exportClones --preset-file myFields.txt clones.clns clones.txt
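The same workflow can be scripted. The following is a minimal sketch, not part of MiXCR: the helper name `build_export_command` and the temporary file location are hypothetical, and the command is only assembled, not executed. Only the `--preset-file` option, the field names and the input/output file names come from the text above.

```python
import tempfile
from pathlib import Path


def build_export_command(fields, fields_file, clones_file, output_file):
    """Write one export field per line and return the exportClones argument list.

    Hypothetical helper: the command is only assembled here, not run.
    """
    Path(fields_file).write_text("\n".join(fields) + "\n")
    return [
        "mixcr", "exportClones",
        "--preset-file", str(fields_file),
        str(clones_file), str(output_file),
    ]


# Fields exactly as they would be typed on the command line.
fields = ["-vHits", "-dHits", "-nFeature CDR3"]
fields_path = Path(tempfile.mkdtemp()) / "myFields.txt"
command = build_export_command(fields, fields_path, "clones.clns", "clones.txt")
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run` on a machine where MiXCR is installed; the columns would then appear in the order in which the fields are listed in the file.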
To get command-line help for the export actions, use:
mixcr help exportAlignments
mixcr help exportClones
Command line parameters
The following command line parameters apply to both exportAlignments and exportClones:
Option | Description |
---|---|
-c, --chains | Limit output to specific chain(s) (e.g. TRA or IGH). When used with exportClones, clone fractions will be recalculated accordingly. |
-p, --preset | Select a predefined set of fields to export (full, min, fullImputed and minImputed). The last two use -nFeatureImputed and -aaFeatureImputed instead of -nFeature and -aaFeature, so germline sequences (marked lowercase) are used for unaligned regions. |
-pf, --preset-file | Load a file with a list of fields to export. |
-v, --with-spaces | Output in a more human-readable format. |
-n, --limit | Output only the first n records. |
The following parameters only apply to exportClones:
Option | Description |
---|---|
-o, --filter-out-of-frames | Exclude out-of-frame clones (fractions will be recalculated). |
-t, --filter-stops | Exclude sequences containing stop codons (fractions will be recalculated). |
-m, --minimal-clone-count | Filter clones by minimal read count. |
-q, --minimal-clone-fraction | Filter clones by minimal clone fraction. |
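The phrase "fractions will be recalculated" can be pictured with a small sketch. This is an illustration only, not MiXCR's implementation: it assumes that a recalculated fraction is a clone's count divided by the total count of the clones that remain after filtering, and the record layout (`id`, `count`, `out_of_frame`) is hypothetical.

```python
def drop_out_of_frame(clones):
    """Remove out-of-frame clones and recompute fractions over the remainder."""
    kept = [c for c in clones if not c["out_of_frame"]]
    total = sum(c["count"] for c in kept)
    if total == 0:
        return []
    # Assumed rule: fraction = count / total count of the retained clones.
    return [{**c, "fraction": c["count"] / total} for c in kept]


clones = [
    {"id": 0, "count": 60, "out_of_frame": False},
    {"id": 1, "count": 30, "out_of_frame": True},
    {"id": 2, "count": 10, "out_of_frame": False},
]
filtered = drop_out_of_frame(clones)
for clone in filtered:
    print(clone["id"], clone["count"], round(clone["fraction"], 3))
```

Under this assumption the fractions of the retained clones always sum to one, which is why they differ from the fractions reported without the filter.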
Available fields
The following fields can be exported both for alignments and clones:
Field name | Description |
---|---|
-targets | Number of targets |
-vHit | Best V hit |
-dHit | Best D hit |
-jHit | Best J hit |
-cHit | Best C hit |
-vGene | Best V hit gene name (e.g. TRBV12-3 for TRBV12-3*00) |
-dGene | Best D hit gene name (e.g. TRBV12-3 for TRBV12-3*00) |
-jGene | Best J hit gene name (e.g. TRBV12-3 for TRBV12-3*00) |
-cGene | Best C hit gene name (e.g. TRBV12-3 for TRBV12-3*00) |
-vFamily | Best V hit family name (e.g. TRBV12 for TRBV12-3*00) |
-dFamily | Best D hit family name (e.g. TRBV12 for TRBV12-3*00) |
-jFamily | Best J hit family name (e.g. TRBV12 for TRBV12-3*00) |
-cFamily | Best C hit family name (e.g. TRBV12 for TRBV12-3*00) |
-vHitScore | Score for best V hit |
-dHitScore | Score for best D hit |
-jHitScore | Score for best J hit |
-cHitScore | Score for best C hit |
-vHitsWithScore | All V hits with score |
-dHitsWithScore | All D hits with score |
-jHitsWithScore | All J hits with score |
-cHitsWithScore | All C hits with score |
-vHits | All V hits |
-dHits | All D hits |
-jHits | All J hits |
-cHits | All C hits |
-vGenes | All V gene names (e.g. TRBV12-3 for TRBV12-3*00) |
-dGenes | All D gene names (e.g. TRBV12-3 for TRBV12-3*00) |
-jGenes | All J gene names (e.g. TRBV12-3 for TRBV12-3*00) |
-cGenes | All C gene names (e.g. TRBV12-3 for TRBV12-3*00) |
-vFamilies | All V gene family names (e.g. TRBV12 for TRBV12-3*00) |
-dFamilies | All D gene family names (e.g. TRBV12 for TRBV12-3*00) |
-jFamilies | All J gene family names (e.g. TRBV12 for TRBV12-3*00) |
-cFamilies | All C gene family names (e.g. TRBV12 for TRBV12-3*00) |
-vAlignment | Best V alignment |
-dAlignment | Best D alignment |
-jAlignment | Best J alignment |
-cAlignment | Best C alignment |
-vAlignments | All V alignments |
-dAlignments | All D alignments |
-jAlignments | All J alignments |
-cAlignments | All C alignments |
-nFeature <gene_feature> | Nucleotide sequence of specified gene feature |
-qFeature <gene_feature> | Quality string of specified gene feature |
-aaFeature <gene_feature> | Amino acid sequence of specified gene feature |
-nFeatureImputed <gene_feature> | Nucleotide sequence of specified gene feature using letters from germline (marked lowercase) for uncovered regions |
-aaFeatureImputed <gene_feature> | Amino acid sequence of specified gene feature using letters from germline (marked lowercase) for uncovered regions |
-minFeatureQuality <gene_feature> | Minimal quality of specified gene feature |
-avrgFeatureQuality <gene_feature> | Average quality of specified gene feature |
-lengthOf <gene_feature> | Length of specified gene feature |
-nMutations <gene_feature> | Nucleotide mutations for the specified gene feature, relative to the germline sequence |
-nMutationsRelative <gene_feature> <relative_to_gene_feature> | Nucleotide mutations for the specified gene feature, relative to another feature |
-aaMutations <gene_feature> | Amino acid mutations for the specified gene feature |
-aaMutationsRelative <gene_feature> <relative_to_gene_feature> | Amino acid mutations for the specified gene feature, relative to another feature |
-mutationsDetailed <gene_feature> | Detailed list of nucleotide mutations and the corresponding amino acid mutations. Format: <nt_mutation>:<aa_mutation_individual>:<aa_mutation_cumulative>, where <aa_mutation_individual> is the amino acid mutation expected if no other mutations had occurred, and <aa_mutation_cumulative> is the observed amino acid mutation combining the effects of all other mutations. WARNING: the format may change in future versions. |
-mutationsDetailedRelative <gene_feature> <relative_to_gene_feature> | Detailed list of nucleotide mutations and the corresponding amino acid mutations, with positions relative to the specified gene feature. Format: <nt_mutation>:<aa_mutation_individual>:<aa_mutation_cumulative>, where <aa_mutation_individual> is the amino acid mutation expected if no other mutations had occurred, and <aa_mutation_cumulative> is the observed amino acid mutation combining the effects of all other mutations. WARNING: the format may change in future versions. |
-positionInReferenceOf <reference_point> | Position of specified reference point inside reference sequences (clonal sequence / read sequence) |
-positionOf <reference_point> | Position of specified reference point inside target sequences (clonal sequence / read sequence) |
-defaultAnchorPoints | Outputs a list of default reference points (like CDR2Begin, FR4End, etc.; see documentation for the full list and formatting) |
-targetSequences | Aligned sequences (targets), separated with comma |
-targetQualities | Aligned sequence (target) qualities, separated with comma |
-vIdentityPercents | V alignment identity percents |
-dIdentityPercents | D alignment identity percents |
-jIdentityPercents | J alignment identity percents |
-cIdentityPercents | C alignment identity percents |
-vBestIdentityPercent | V best alignment identity percent |
-dBestIdentityPercent | D best alignment identity percent |
-jBestIdentityPercent | J best alignment identity percent |
-cBestIdentityPercent | C best alignment identity percent |
-chains | Chains |
-topChains | Top chains |
The following fields are specific for alignments:
Field name | Description |
---|---|
-readId | Id of read corresponding to alignment (deprecated) |
-readIds | Id(s) of read(s) corresponding to alignment |
-descrR1 | Description line from initial .fasta or .fastq file (deprecated) |
-descrR2 | Description line from initial .fasta or .fastq file (deprecated) |
-descrsR1 | Description lines from initial .fasta or .fastq file for R1 reads (only available if -OsaveOriginalReads=true was used in align command) |
-descrsR2 | Description lines from initial .fastq file for R2 reads (only available if -OsaveOriginalReads=true was used in align command) |
-readHistory | Read history |
-cloneId | Clone to which the alignment was attached (make sure to use a .clna file as input for exportAlignments) |
-cloneIdWithMappingType | Clone to which the alignment was attached, with additional information on the mapping type (make sure to use a .clna file as input for exportAlignments) |
The following fields are specific for clones:
Field name | Description |
---|---|
-cloneId | Unique clone identifier |
-count | Clone count |
-fraction | Clone fraction |
See this chapter for the translation rules used for options like -aaFeature.
Default anchor point positions
Positions of anchor points produced by the -defaultAnchorPoints option are output as a colon-separated list. If an anchor point is not covered by the target sequence, nothing is printed for it, but the flanking colons are preserved to maintain positions in the array. For example:
:::::::::108:117:125:152:186:213:243:244:
If there are several target sequences (e.g. paired-end reads or a multi-part clonal sequence), an array is output for each target sequence. In this case the arrays are separated by a comma:
2:61:107:107:118:::::::::::::,:::::::::103:112:120:147:181:208:238:239:
Even if there are no anchor points in one of the parts:
:::::::::::::::::,:::::::::108:117:125:152:186:213:243:244:
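The layout described above can be parsed without knowing the number of anchor points in advance. The sketch below is one possible approach, not an official MiXCR utility: the function name `parse_anchor_points` is hypothetical, and it simply splits on commas (one array per target) and colons (one slot per anchor point), turning empty slots into `None`.

```python
def parse_anchor_points(field):
    """Split a -defaultAnchorPoints value into one list per target sequence.

    Anchor points that are not covered (empty slots) are returned as None.
    """
    return [
        [int(slot) if slot else None for slot in target.split(":")]
        for target in field.split(",")
    ]


single = parse_anchor_points(":::::::::108:117:125:152:186:213:243:244:")
paired = parse_anchor_points(
    "2:61:107:107:118:::::::::::::,:::::::::103:112:120:147:181:208:238:239:"
)

# Map zero-based array position -> value for the covered anchor points only.
covered = {i: v for i, v in enumerate(single[0]) if v is not None}
print(covered)
print(len(paired), "targets")
```

Keeping the `None` placeholders preserves the array positions, so a value can still be looked up by its zero-based index in the anchor point array.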
The following table shows the correspondence between anchor points and positions in the default anchor point array:
Anchor point | Zero-based position | One-based position |
---|---|---|
V5UTRBeginTrimmed | 0 | 1 |
V5UTREnd / L1Begin | 1 | 2 |
L1End / VIntronBegin | 2 | 3 |
VIntronEnd / L2Begin | 3 | 4 |
L2End / FR1Begin | 4 | 5 |
FR1End / CDR1Begin | 5 | 6 |
CDR1End / FR2Begin | 6 | 7 |
FR2End / CDR2Begin | 7 | 8 |
CDR2End / FR3Begin | 8 | 9 |
FR3End / CDR3Begin | 9 | 10 |
Number of 3’ V deletions (negative value), or length of 3’ V P-segment (positive value) | 10 | 11 |
VEndTrimmed, next position after last aligned nucleotide of V gene | 11 | 12 |
DBeginTrimmed, position of first aligned nucleotide of D gene | 12 | 13 |
Number of 5’ D deletions (negative value), or length of 5’ D P-segment (positive value) | 13 | 14 |
Number of 3’ D deletions (negative value), or length of 3’ D P-segment (positive value) | 14 | 15 |
DEndTrimmed, next position after last aligned nucleotide of D gene | 15 | 16 |
JBeginTrimmed, position of first aligned nucleotide of J gene | 16 | 17 |
Number of 5’ J deletions (negative value), or length of 5’ J P-segment (positive value) | 17 | 18 |
CDR3End / FR4Begin | 18 | 19 |
FR4End | 19 | 20 |
CBegin | 20 | 21 |
CExon1End | 21 | 22 |
The following regular expressions can be used to parse the contents of this field in Python.
For length analysis, or analysis of raw alignments:
^(?P<V5UTRBegin>-?[0-9]*):(?P<L1Begin>-?[0-9]*):(?P<VIntronBegin>-?[0-9]*):(?P<L2Begin>-?[0-9]*):(?P<FR1Begin>-?[0-9]*):(?P<CDR1Begin>-?[0-9]*):(?P<FR2Begin>-?[0-9]*):(?P<CDR2Begin>-?[0-9]*):(?P<FR3Begin>-?[0-9]*):(?P<CDR3Begin>-?[0-9]*):(?P<V3Deletion>-?[0-9]*):(?P<VEnd>-?[0-9]*):(?P<DBegin>-?[0-9]*):(?P<D5Deletion>-?[0-9]*):(?P<D3Deletion>-?[0-9]*):(?P<DEnd>-?[0-9]*):(?P<JBegin>-?[0-9]*):(?P<J5Deletion>-?[0-9]*):(?P<CDR3End>-?[0-9]*):(?P<FR4End>-?[0-9]*):(?P<CBegin>-?[0-9]*):(?P<CExon1End>-?[0-9]*)$
Snippet for Pandas:
import pandas as pd
# Load the exported tab-delimited file
data = pd.read_table("exported.txt", low_memory=False)
# One named group per position of the default anchor point array (22 in total)
anchorPointsRegex = "^(?P<V5UTRBegin>-?[0-9]*):(?P<L1Begin>-?[0-9]*):(?P<VIntronBegin>-?[0-9]*):(?P<L2Begin>-?[0-9]*):(?P<FR1Begin>-?[0-9]*):(?P<CDR1Begin>-?[0-9]*):(?P<FR2Begin>-?[0-9]*):(?P<CDR2Begin>-?[0-9]*):(?P<FR3Begin>-?[0-9]*):(?P<CDR3Begin>-?[0-9]*):(?P<V3Deletion>-?[0-9]*):(?P<VEnd>-?[0-9]*):(?P<DBegin>-?[0-9]*):(?P<D5Deletion>-?[0-9]*):(?P<D3Deletion>-?[0-9]*):(?P<DEnd>-?[0-9]*):(?P<JBegin>-?[0-9]*):(?P<J5Deletion>-?[0-9]*):(?P<CDR3End>-?[0-9]*):(?P<FR4End>-?[0-9]*):(?P<CBegin>-?[0-9]*):(?P<CExon1End>-?[0-9]*)$"
# Extract each anchor point from the refPoints column into its own numeric column
data = pd.concat([data, data.refPoints.str.extract(anchorPointsRegex, expand=True).apply(pd.to_numeric)], axis=1)
A simplified regular expression with a smaller number of fields can be used for analysis of CDR3-assembled clonotypes:
^(?:-?[0-9]*:){8}(?:-?[0-9]*):(?P<CDR3Begin>-?[0-9]*):(?P<V3Deletion>-?[0-9]*):(?P<VEnd>-?[0-9]*):(?P<DBegin>-?[0-9]*):(?P<D5Deletion>-?[0-9]*):(?P<D3Deletion>-?[0-9]*):(?P<DEnd>-?[0-9]*):(?P<JBegin>-?[0-9]*):(?P<J5Deletion>-?[0-9]*):(?P<CDR3End>-?[0-9]*):(?:-?[0-9]*:){2}(?:-?[0-9]*)$
Snippet for Pandas:
import pandas as pd
# Load the exported tab-delimited file
data = pd.read_table("exported.txt", low_memory=False)
# Only the CDR3-related anchor points are captured; the other positions are skipped
anchorPointsRegex = "^(?:-?[0-9]*:){8}(?:-?[0-9]*):(?P<CDR3Begin>-?[0-9]*):(?P<V3Deletion>-?[0-9]*):(?P<VEnd>-?[0-9]*):(?P<DBegin>-?[0-9]*):(?P<D5Deletion>-?[0-9]*):(?P<D3Deletion>-?[0-9]*):(?P<DEnd>-?[0-9]*):(?P<JBegin>-?[0-9]*):(?P<J5Deletion>-?[0-9]*):(?P<CDR3End>-?[0-9]*):(?:-?[0-9]*:){2}(?:-?[0-9]*)$"
# Extract each captured anchor point from the refPoints column into its own numeric column
data = pd.concat([data, data.refPoints.str.extract(anchorPointsRegex, expand=True).apply(pd.to_numeric)], axis=1)
Examples
Export only the best V, D, J hits and the best V hit alignment from a .vdjca file:
mixcr exportAlignments -vHit -dHit -jHit -vAlignment input.vdjca test.txt
Best V hit | Best D hit | Best J hit | Best V alignment |
---|---|---|---|
IGHV4-34*00 | | IGHJ4*00 | |262|452|453|47|237|SC268GSC271ASC275G|956.1,58|303|450|56|301|SG72TSA73CSG136TSA144CSA158CSG171T|331.0| |
IGHV2-23*00 | IGHD2*21 | IGHJ6*00 | |262|452|453|47|237|SC268GSC271ASC275G|956.1,58|303|450|56|301|SG72TSA73CSG136TSA144CSA158CSG171T|331.0| |
The alignment syntax is described in the appendix.
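Because the export is a plain tab-delimited file with a header row, it can be read with standard tools. The following is a minimal sketch using Python's `csv` module; it assumes that the column headers match those shown in the example table above (they may differ between versions) and that the first record has no D hit. The in-memory `sample` string stands in for the contents of test.txt.

```python
import csv
import io

# Stand-in for test.txt; a real file would be opened with open(path, newline="").
sample = (
    "Best V hit\tBest D hit\tBest J hit\n"
    "IGHV4-34*00\t\tIGHJ4*00\n"
    "IGHV2-23*00\tIGHD2*21\tIGHJ6*00\n"
)

rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
for row in rows:
    # An empty cell means that no hit was reported for that gene.
    d_hit = row["Best D hit"] or None
    print(row["Best V hit"], d_hit, row["Best J hit"])
```

Accessing columns by header name rather than by position keeps such a script working when the set or order of exported fields changes.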
Exporting well formatted alignments for manual inspection
MiXCR can export alignments created with the align step as pretty-formatted (human-readable) text for manual analysis. This can be used both to inspect alignments and to facilitate optimization of analysis parameters and the library preparation protocol. To export pretty-formatted alignments, use the exportAlignmentsPretty command:
mixcr exportAlignmentsPretty --skip 1000 --limit 10 input.vdjca test.txt
This exports 10 results after skipping the first 1000 records and writes them to the file test.txt. Skipping earlier records is often useful because the first sequences in a fastq file may have lower than average read quality. Omitting the last parameter (output file name) prints the results directly to the standard output stream (the console):
mixcr exportAlignmentsPretty --skip 1000 --limit 10 input.vdjca
Here is a summary of the command line options:
Option | Description |
---|---|
-n, --limit | Limit the number of alignments; no more than the given number of results will be output |
-s, --skip | Number of results to skip |
-t, --top | Output only top hits for V, D, J and C genes |
--cdr3-contains | Output only those alignments in which CDR3 contains the specified nucleotides (e.g. --cdr3-contains TTCAGAGGAGC) |
--read-contains | Output only those alignments for which the corresponding reads contain the specified nucleotides (e.g. --read-contains ATGCTTGCGCGCT) |
--verbose | Use a more verbose format for alignments (see below for an example) |
Results produced by this command have the following structure:
>>> Read id: 1

                                                        5'UTR><L1
    Quality     88888888888888888888888887888888888888888888888888888888888888888888888887888878
    Target0   0 AAGGCCTTTCCACTTGGTGATCAGCACTGAGCACAGAGGACTCACCATGGAGTTGGGGCTGAGCTGGGTTTTCCTTGTTG 79
 IGHV3-7*00  54 aaggcctttccacttggtgatcagcactgagcacagaggactcaccatggaAttggggctgagctgggttttccttgttg 133

                         L1><L2     L2><FR1
    Quality     88888888887888888888888888888889989989989889999997999999989999999999999999999899
    Target0  80 CTATTTTAGAAGGTGTCCAGTGTGAGGTGAAGTTGGTGGAGTCTGGGGGAGGCCTGGTCCAGCCTGGGGGGTCCCTGAGA 159
 IGHV3-7*00 134 ctattttagaaggtgtccagtgtgaggtgCagCtggtggagtctgggggaggcTtggtccagcctggggggtccctgaga 213

                              FR1><CDR1              CDR1><FR2
    Quality     999999999999999999999999999999999999999999999 9999999999999999999999999999999999
    Target0 160 CTCTCCTGTGAAGCCTCCGGATTCACCTTTAGTAGTTATTGGATG-GCATGGGTCCGCCAGGGTCCAGGGCAGGGGCTGG 238
 IGHV3-7*00 214 ctctcctgtgCagcctcTggattcacctttagtagCtattggatgAgc-tgggtccgccaggCtccagggAaggggctgg 292

                          FR2><CDR2              CDR2><FR3
    Quality     99999999999999999999999999999999999799999999999999999999999999998999899898999999
    Target0 239 AATGGGTGGGCAACATAAGGCCGGATGGAAGTGAGAGTTGGTACTTGGAGTCTGTGATGGGGCGATTCATGATATCTAGA 318
 IGHV3-7*00 293 aGtgggtggCcaacataaAgcAAgatggaagtgagaAAtACtaTGtggaCtctgtgaAgggCcgattcaCCatCtcCaga 372

                                                                                 FR3><CDR3
    Quality     99899899999999988989999889979988888888878878788888888878888888778788888888878888
    Target0 319 GACAACGCCAAGAAGTCACTTTATCTGCAAATGGACAGCCTGAGAGTCGAGGACACGGCCGTCTATTATTGTGCGACTTC 398
 IGHV3-7*00 373 gacaacgccaagaaCtcactGtatctgcaaatgAacagcctgagagCcgaggacacggcTgtGtattaCtgtgcga 448
IGHD3-10*00  12                                                                              ttc 14

                                 CDR3><FR4
    Quality     88888788888888888888888787788777887787777877777877787787877878788788777767778788
    Target0 399 GGAGGAGCCGGAGGACTACTGGGGCCAGGGAGCCCTGGTCACCGTCTCCTCGGCTTCCACCAAGGGCCCATCGGTCTTCC 478
IGHD3-10*00  15 gg-ggag 20
   IGHJ4*00   8              gactactggggccagggaAccctggtcaccgtctcctc 45
   IGHG4*00   0                                                      cttccaccaagggcccatcggtcttcc 26
   IGHG3*00   0                                                      cttccaccaagggcccatcggtcttcc 26
   IGHG2*00   0                                                      cCtccaccaagggcccatcggtcttcc 26
   IGHG1*00   0                                                      cCtccaccaagggcccatcggtcttcc 26
   IGHGP*00 194                                                    AgcCtccaccaagggcccatcggtcttcc 222

    Quality     87370
    Target0 479 CCTTG 483
   IGHG4*00  27 ccCtg 31
   IGHG3*00  27 ccCtg 31
   IGHG2*00  27 ccCtg 31
   IGHG1*00  27 ccCtg 31
   IGHGP*00 223 ccCtg 227
Using the --verbose option will produce alignments in a slightly different format:
>>> Read id: 12343     <--- Index of analysed read in input file

>>> Target sequences (input sequences):

Sequence0:     <--- Read 1 from paired-end read

Contains features: CDR1, VRegionTrimmed, L2, L, Intron, VLIntronL, FR1, Exon1,     <--- Gene features
VExon2Trimmed                                                                           found in read 1

      0 TCTTGGGGGATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAACTGGGTTTTCCTCGTTGCT 79     <--- Sequence & quality
        FGGEGGGGGDG8F78CFC6CEFF<,CFG9EED,6,CFCC<EEGFG,CE:CCAFFGGC87CEF?A?FBC@FGGFG>B,FC9             of read 1
     80 CTATTAAGAGGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGTGGCGTGTTCCAGCCTGGGGGGTCCGTGAGACT 159
        F9,A,95AFE,B?,E,C,9AC<FGA<EE5??,A,A<:=:E,=B8C7+++8,++@+,885=D7:@8E+:5*1**11**++<
    160 CTCCTGTGCAGCGTCGGGATGCACATCATGGAGCTATGGCCAGCCCTGGGTACGCCAGGCTACAGGCCACGGGCTGGAGG 239
        <++*++0++2A:ECE5EC5**2@C+:++++++22*2:+29+*2***25/79*0299))*/)*0*0*.75)7:)1)1/)))
    240 GGGTGCGTGGTAGATGGGAA 259
        )9:.)))*1)12***-/).)

Sequence1:     <--- Read 2 from paired-end read

Contains features: JCDR3Part, DCDR3Part, DJJunction, CDR2, JRegionTrimmed, CDR3, VDJunction,
VJJunction, VCDR3Part, ShortCDR3, FR4, FR3

      0 CGAGGCAAGAGGCTGGTGTGGGTGGCGGTTATATGGTATGGTGGAAGTAATAAACACTATGCAGACCCCGTGAAGGGCCG 79
        **0*0**)2**/**5D7<15*9<5:1+*0:GF:=C>6A52++*:2+++FF>>3<++++++302**:**/<+**;:/**2+
     80 ATTCACCATCGCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAAGAGCCTGAGAGCCGAGGACACGGCTTTGT 159
        +++<0***C:2+9GGFB?,5,4,+,2F<>FC=*,,C:>,=,@,,;3<@=,3,,<3,CF?=**<>@,?3,<<:3,CC,E,@
    160 ATTACTGTGCGAGAGGTCAACAGGGTGACTATGTCTACGGTAGGGACGTCGGGGGCCAAGGGACCACGGTCACCGTCTCC 239
        ,@;FCF@+F@FGGF9FD,F>>+B:=,,=><GFCGGCFEGFF?+=B+7EF>+FFA,8F<E:,5+GDFFE,@F?,,7GGDFE
    240 TCAGGGAGTGCATCCGCCCCAACCCTTTTCCCCCTCTCTGCGTTGATACCACTGGCAGCTC 300
        C,FGGGEFCCGEEGGCFCC:8FGEGGGE@DFB-GFGGGGF@GFGFE<,GFCCFCAGC@CCC

>>> Gene features that can be extracted from this (paired-)read:                              <--- For paired-end reads
JCDR3Part, CDR1, VRegionTrimmed, L2, DCDR3Part, VDJTranscriptWithout5UTR, Exon2, L,                some gene features
DJJunction, Intron, FR2, CDR2, VDJRegion, JRegionTrimmed, CDR3, VDJunction, VJJunction,            can be extracted by
VLIntronL, FR1, VCDR3Part, ShortCDR3, Exon1, FR4, VExon2Trimmed, FR3                               merging sequence information

>>> Alignments with V gene:

IGHV3-33*00 (total score = 1638.0)     <--- Alignment of both reads with IGHV3-33

Alignment of Sequence0 (score = 899.0):     <--- Alignment of IGHV3-33 with read 1 from paired-end read

     65 ATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAGCTGGGTTTTCCTCGTTGCTCTTTTAAGA 144     <--- Germline
        ||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||| ||||||
      9 ATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAACTGGGTTTTCCTCGTTGCTCTATTAAGA 88      <--- Read
        DG8F78CFC6CEFF<,CFG9EED,6,CFCC<EEGFG,CE:CCAFFGGC87CEF?A?FBC@FGGFG>B,FC9F9,A,95AF         <--- Quality score

    145 GGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGC 224
        |||||||||||||||||||||||||||||||||||||| |||||| ||||||||||| ||||| ||||||||||||||||
     89 GGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGTGGCGTGTTCCAGCCTGGGGGGTCCGTGAGACTCTCCTGTGC 168
        E,B?,E,C,9AC<FGA<EE5??,A,A<:=:E,=B8C7+++8,++@+,885=D7:@8E+:5*1**11**++<<++*++0++

    225 AGCGTCTGGATTCACCTTCA-GTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTG 300
        |||||| |||| || | ||| | |||||||||  || |||||| ||||||||| ||||| | ||||||||| |||||
    169 AGCGTCGGGATGCA-CATCATGGAGCTATGGCCAGCCCTGGGTACGCCAGGCTACAGGCCACGGGCTGGAGGGGGTG 244
        2A:ECE5EC5**2@ C+:++++++22*2:+29+*2***25/79*0299))*/)*0*0*.75)7:)1)1/))))9:.)

Alignment of Sequence1 (score = 739.0):     <--- Alignment of IGHV3-33 with read 2 from paired-end read

    279 AGGCAAGGGGCTGGAGTGGGTGGCAGTTATATGGTATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGAT 358
        ||||||| |||||| ||||||||| ||||||||||||| ||||||||||||| ||||||||||| |||||||||||||||
      2 AGGCAAGAGGCTGGTGTGGGTGGCGGTTATATGGTATGGTGGAAGTAATAAACACTATGCAGACCCCGTGAAGGGCCGAT 81
        0*0**)2**/**5D7<15*9<5:1+*0:GF:=C>6A52++*:2+++FF>>3<++++++302**:**/<+**;:/**2+++

    359 TCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTAT 438
        |||||||| |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||| |||||
     82 TCACCATCGCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAAGAGCCTGAGAGCCGAGGACACGGCTTTGTAT 161
        +<0***C:2+9GGFB?,5,4,+,2F<>FC=*,,C:>,=,@,,;3<@=,3,,<3,CF?=**<>@,?3,<<:3,CC,E,@,@

    439 TACTGTGCGAGAG 451
        |||||||||||||
    162 TACTGTGCGAGAG 174
        ;FCF@+F@FGGF9

IGHV3-30*00 (total score = 1582.0)     <--- Alternative hit for V gene

Alignment of Sequence0 (score = 885.0):

     65 ATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAGCTGGGTTTTCCTCGTTGCTCTTTTAAGA 144
        ||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||| ||||||
      9 ATTCGGTGATCAGCACTGAACACAGAGGACTCACCATGGAGTTTGGGCTGAACTGGGTTTTCCTCGTTGCTCTATTAAGA 88
        DG8F78CFC6CEFF<,CFG9EED,6,CFCC<EEGFG,CE:CCAFFGGC87CEF?A?FBC@FGGFG>B,FC9F9,A,95AF

    145 GGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGC 224
        |||||||||||||||||||||||||||||||||||||| |||||| ||||||||||| ||||| ||||||||||||||||
     89 GGTGTCCAGTGTCAGGTGCAGCTGGTGGAGTCTGGGGGTGGCGTGTTCCAGCCTGGGGGGTCCGTGAGACTCTCCTGTGC 168
        E,B?,E,C,9AC<FGA<EE5??,A,A<:=:E,=B8C7+++8,++@+,885=D7:@8E+:5*1**11**++<<++*++0++

    225 AGCCTCTGGATTCACCTTCA-GTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTG 300
        ||| || |||| || | ||| | |||||||||  || |||||| ||||||||| ||||| | ||||||||| |||||
    169 AGCGTCGGGATGCA-CATCATGGAGCTATGGCCAGCCCTGGGTACGCCAGGCTACAGGCCACGGGCTGGAGGGGGTG 244
        2A:ECE5EC5**2@ C+:++++++22*2:+29+*2***25/79*0299))*/)*0*0*.75)7:)1)1/))))9:.)

Alignment of Sequence1 (score = 697.0):

    279 AGGCAAGGGGCTGGAGTGGGTGGCAGTTATATCATATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGAT 358
        ||||||| |||||| ||||||||| |||||||  |||| ||||||||||||| ||||||||||| |||||||||||||||
      2 AGGCAAGAGGCTGGTGTGGGTGGCGGTTATATGGTATGGTGGAAGTAATAAACACTATGCAGACCCCGTGAAGGGCCGAT 81
        0*0**)2**/**5D7<15*9<5:1+*0:GF:=C>6A52++*:2+++FF>>3<++++++302**:**/<+**;:/**2+++

    359 TCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTAT 438
        |||||||| |||||||||||||||||||||||||||||||||||||||| ||||||||||| |||||||||||| |||||
     82 TCACCATCGCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAAGAGCCTGAGAGCCGAGGACACGGCTTTGTAT 161
        +<0***C:2+9GGFB?,5,4,+,2F<>FC=*,,C:>,=,@,,;3<@=,3,,<3,CF?=**<>@,?3,<<:3,CC,E,@,@

    439 TACTGTGCGAGAG 451
        |||||||||||||
    162 TACTGTGCGAGAG 174
        ;FCF@+F@FGGF9

>>> Alignments with D gene:

IGHD4-17*00 (total score = 40.0)

Alignment of Sequence1 (score = 40.0):

      7 GGTGACTA 14
        ||||||||
    183 GGTGACTA 190
        :=,,=><G

IGHD4-23*00 (total score = 36.0)

Alignment of Sequence1 (score = 36.0):

      0 TGACTACGGT 9
        || |||||||
    191 TGTCTACGGT 200
        FCGGCFEGFF

IGHD2-21*00 (total score = 35.0)

Alignment of Sequence1 (score = 35.0):

     13 GGTGACT 19
        |||||||
    183 GGTGACT 189
        :=,,=><

>>> Alignments with J gene:

IGHJ6*00 (total score = 172.0)

Alignment of Sequence1 (score = 172.0):

     22 GGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCA 61
        ||||||| ||||| ||||||||||||||||||||||||||
    203 GGACGTCGGGGGCCAAGGGACCACGGTCACCGTCTCCTCA 242
        =B+7EF>+FFA,8F<E:,5+GDFFE,@F?,,7GGDFEC,F

>>> Alignments with C gene:

No hits.
Exporting reads aggregated by clones
MiXCR can preserve information about the mapping between initial reads, alignments and final clonotypes by storing the output of the assemble step in a special “clones & alignments” container format. There are several ways of accessing this information.
Extracting reads for specific clones
The exportReadsForClones command extracts the original reads that were mapped to specific clones back into fastq or fasta format.
The following command will create reads_cln0_R1.fastq.gz/reads_cln0_R2.fastq.gz, reads_cln1_R1.fastq.gz/reads_cln1_R2.fastq.gz, etc., containing the reads corresponding to clone0, clone1, etc.:
mixcr exportReadsForClones -s clonesAndAlignments.clna reads.fastq.gz
Alternatively, one can extract the reads for a set of clones into a single output:
mixcr exportReadsForClones --id 2 12 45 clonesAndAlignments.clna reads_of_my_clones.fastq.gz
See mixcr help exportReadsForClones for more information.