Things about High-throughput Sequencing

Complementary approaches


The four main steps of Illumina sequencing

  • library construction
  • cluster formation
  • sequencing
  • data analysis

Library prep


RNA-seq? Actually DNA sequencing

On Illumina platforms, RNA is not sequenced directly. The RNA (A, U, C, G) is first reverse-transcribed into DNA (A, T, C, G), and that DNA is then sequenced.
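The base conversion in reverse transcription can be sketched in a few lines of Python. This is only an illustration of the pairing rules; the function names are made up for this example, and the input sequence is arbitrary.

```python
# Watson-Crick pairing between an RNA template base and the DNA base
# that is incorporated opposite it during reverse transcription.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "C": "G", "G": "C"}


def first_strand_cdna(rna: str) -> str:
    """Return the first-strand cDNA (written 5'->3') for an RNA written 5'->3'."""
    rna = rna.upper()
    # The new strand is antiparallel to the template: complement, then reverse.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna))


def second_strand_cdna(rna: str) -> str:
    """Return the second strand: the RNA sequence with U replaced by T."""
    return rna.upper().replace("U", "T")


print(first_strand_cdna("AUGCCU"))   # AGGCAT
print(second_strand_cdna("AUGCCU"))  # ATGCCT
```

The second strand carries the same sequence as the original RNA, with T in place of U, which is why reads can be compared against a DNA reference.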

Library prep also differs depending on the goal:

  • mRNA: poly(A) selection
  • lncRNA, or lncRNA + mRNA: rRNA depletion
  • sRNA: size selection (18-40 nt)


What can go wrong during sequencing?

  • cluster identification

  • bubbles

  • synthesis errors: phasing and pre-phasing (figure: yellow marks the cause, red the result; the upper panel shows phasing, the lower panel pre-phasing)

  • duplication
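To get a feel for why phasing and pre-phasing matter, here is a rough sketch under a deliberately simplified model: in every cycle each molecule in a cluster independently falls one base behind (phasing) or jumps one base ahead (pre-phasing) with a fixed probability, and never comes back into sync. The function name and the rates used below are hypothetical, chosen only for illustration.

```python
def in_phase_fraction(cycle: int, phasing: float, prephasing: float) -> float:
    """Fraction of molecules in a cluster still in sync after `cycle` cycles.

    Simplified model (an assumption of this sketch): each cycle a molecule
    leaves the in-phase population with probability phasing + prephasing
    and never returns to it.
    """
    if cycle < 0:
        raise ValueError("cycle must be non-negative")
    if phasing < 0 or prephasing < 0 or phasing + prephasing > 1:
        raise ValueError("rates must be non-negative and sum to at most 1")
    return (1.0 - phasing - prephasing) ** cycle


# Hypothetical per-cycle rates: 0.2% phasing, 0.1% pre-phasing.
for cycle in (1, 50, 100, 150):
    print(cycle, round(in_phase_fraction(cycle, 0.002, 0.001), 3))
```

Even small per-cycle rates compound over the run, so under this model the signal from a cluster gets noisier toward the end of the read.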

Multiplexed sequencing


Libraries are commonly pooled together and sequenced simultaneously via a process known as multiplexing.


To distinguish individual libraries throughout this process, sample-specific sequences, called sample indexes or sample barcodes, are added to each fragment in the library during construction. (The barcode information is then used to computationally assign the sequence reads back to the individual libraries.)

The pooled libraries are sequenced simultaneously in a single sequencing run.


Multiplexing substantially reduces the cost of sequencing and facilitates experimental scalability.
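The barcode-based assignment described above can be sketched as follows. This is a minimal illustration, not how any particular demultiplexing tool works: it assumes each read arrives with its index sequence already extracted and uses exact matching only, whereas real tools usually also tolerate a small number of index mismatches. The index sequences, sample names, and reads are made up.

```python
from collections import defaultdict


def demultiplex(reads, sample_by_index):
    """Group reads by sample using an exact match on the index sequence.

    reads: iterable of (index_sequence, read_sequence) pairs.
    sample_by_index: dict mapping each index sequence to a sample name.
    Reads whose index is not in the dict are collected under "undetermined".
    """
    assigned = defaultdict(list)
    for index_seq, read_seq in reads:
        sample = sample_by_index.get(index_seq, "undetermined")
        assigned[sample].append(read_seq)
    return dict(assigned)


# Hypothetical indexes and reads, for illustration only.
sample_by_index = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
pooled_reads = [
    ("ACGTAC", "GATTACA"),
    ("TGCATG", "CCGGTTA"),
    ("ACGTAC", "TTAGGCA"),
    ("NNNNNN", "ACACACA"),  # unreadable index, cannot be assigned
]

result = demultiplex(pooled_reads, sample_by_index)
for sample, seqs in result.items():
    print(sample, seqs)
```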


  • Duplication: the fraction of mapped reads that share the same 5′ and 3′ coordinates with at least one other read. Duplicates mostly arise during PCR-based library construction, or from artifacts such as the same template forming multiple clusters on a flow cell.

    Duplicates may distort the apparent allele frequencies.

    Minimizing duplicates in NGS experiments is critically important. Tools: Picard (MarkDuplicates); SAMtools (rmdup).
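As a rough sketch of the idea, duplicates can be counted by grouping mapped reads on their coordinates. This is not how Picard or SAMtools implement it. It assumes each read is already reduced to a (chromosome, start, end, strand) tuple, and it counts every copy beyond the first in a group as a duplicate, which is one possible convention. The function name and the example reads are made up.

```python
from collections import Counter


def duplication_rate(mapped_reads):
    """Fraction of mapped reads that are extra copies of an already seen fragment.

    mapped_reads: iterable of (chrom, start, end, strand) tuples.
    Reads with identical tuples are treated as copies of one fragment;
    one copy per group is kept and the rest count as duplicates.
    """
    reads = list(mapped_reads)
    if not reads:
        return 0.0
    counts = Counter(reads)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(reads)


# Hypothetical alignments: three copies of one fragment plus two unique reads.
reads = [
    ("chr1", 100, 250, "+"),
    ("chr1", 100, 250, "+"),
    ("chr1", 100, 250, "+"),
    ("chr1", 300, 450, "-"),
    ("chr2", 100, 250, "+"),
]
print(duplication_rate(reads))  # 0.4
```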

Yunze Liu
Bioinformatics Sharer

Co-founder of Bioinfoplanet(生信星球)