Processing RNA-seq dataΒΆ

The typical MiXCR workflow can be applied for the analysis of RNA-seq samples. Though MiXCR can be used with the default parameters for aligning RNA-seq data, it is recommended to use rna-seq preset which is specifically tuned to perform well on such type of input:

mixcr align --parameters rna-seq data_R1.fastq.gz data_R2.fastq.gz alignments.vdjca
mixcr assemble alignments.vdjca clones.clns

Due to the small length of typical sequences obtained by the RNA-seq method, a tangible amount of alignments may be lost on the assemble step due the absence of V or J CDR3 part in the alignment (i.e. V CDR3 found but J CDR3 is too short to extract clonal sequence or vice versa). MiXCR allows to “rescue” such alignments by performing partial assembling of the alignments, i.e. to find overlapping sequences and merge them in order to extend possible alignemnts.

The typical workflow including partial assembling looks like:

mixcr align --parameters rna-seq -OallowPartialAlignments=true data_R1.fastq.gz data_R2.fastq.gz alignments.vdjca
mixcr assemblePartial alignments.vdjca alignmentsRescued.vdjca
mixcr assemble alignmentsRescued.vdjca clones.clns

Note option -OallowPartialAlignments=true of the align command: it will preserve alignments with not fully aligned V and J parts in the .vdjca file. The assemblePartial action will then process theese “bad” alignments in order to find overlapping sequences and restore the full alignment. The following options are available for assemblePartial:

The above parameters can be specified in e.g. the following way:

mixcr assemblePartial -OminimalVJJunctionOverlap=25 alignments.vdjca alignmentsRescued.vdjca

The algorithm which restores merged sequence from two overlapped alignments has the following parameters:

The above parameters can be specified in e.g. the following way:

mixcr assemblePartial -OmergerParameters.minimalAssembleOverlap=15 alignments.vdjca alignmentsRescued.vdjca