Introduction to High-Throughput Sequencing Technologies and Principles
Tong Yigang (童贻刚)
Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences

Company | System | Read length (bp) | Advantages | Disadvantages
Roche/454 | FLX System | 200-700 | Longest reads; high throughput | Homopolymer errors; expensive instrument and reagents
Illumina | HiSeq 2000/MiSeq | 2 x 150 | Very high throughput | Expensive; complex downstream analysis
ABI/SOLiD | 5500 xl SOLiD | 25-35 | High throughput; low reagent consumption | Reads too short
Helicos | HeliScope | 25-30 | High throughput | Reads too short

Sequencing platform | Read length | Evolution (earlier read lengths) | Output | Run time
SOLiD | 30 bp | 15 bp | 30 G | 10 days
Solexa HiSeq 2000 | 150 bp x 2 | 30, 50, 75, 100 bp | 600 G | 14 days
454 | 750 bp | 100, 400 bp | 0.7 G | 7 hours
3730 | 1000 bp x 96 | 300, 600 bp | 0.0001 G | 2 hours

Illumina workflow
- Sample preparation: shearing, adapter ligation
- Cluster generation: bridge PCR
- Sequencing on the Genome Analyzer IIx: RTA (Real-Time Analysis)
- Analysis pipeline: offline analysis, alignment, SNP calling, read counting; visualize the data and report the results

Sequencing process
1. Fragment DNA
2. Repair ends / add A overhang
3. Ligate adapters
4. Select ligated DNA
5. Hybridize to flow cell
6. Extend hybridized oligos
7. Perform bridge amplification
8. Perform sequencing on forward strand
9. Re-generate reverse strand
10. Perform sequencing on reverse strand
CONFIDENTIAL DO NOT DISTRIBUTE

Time per stage:
1. Library prep (6 hrs)
2. Automated cluster generation (5 hrs), 1-8 samples
3. Sequencing (46 to 120 hrs), 1-8 samples

Sample prep: resequencing
[Figure: adapter layout showing surface-bound adapter 1, sequencing primer binding site, surface-bound adapter 2]

Flow cell
- 8 channels
- Key to the simplified workflow: clonal clusters are generated in a contained environment (no clean room needed), and sequencing is also performed in the flow cell on the generated clusters.
- The flow cell surface is coated with a lawn of oligo pairs.

Cluster generation: hybridize fragment and extend
- 50 M single molecules hybridize to the lawn of primers through their adapter sequence.
- Bound molecules are then extended by polymerases (3' extension).

Cluster generation: denature double-stranded DNA
- The double-stranded molecule is denatured and the original template is washed away (discarded).
- The newly synthesized strand remains covalently attached to the flow cell surface.

Cluster generation: covalently bound, spatially separated single molecules
- Single molecules are bound to the flow cell in a random pattern.

Cluster generation: bridge amplification
1. The single strand flips over to hybridize to an adjacent primer, forming a bridge; the hybridized primer is extended by polymerase.
2. A double-stranded bridge is formed.
3. The double-stranded bridge is denatured. Result: two copies of covalently bound single-stranded templates.
4. The single strands flip over to hybridize to adjacent primers, forming bridges; the hybridized primers are extended by polymerase.
5. The bridge amplification cycle is repeated until multiple bridges are formed.

Cluster generation
- The dsDNA bridges are denatured; the reverse strands are cleaved and washed away, leaving a cluster with forward strands only.
- Free 3' ends are blocked to prevent unwanted DNA priming.

Sequencing
- The sequencing primer is hybridized to the adapter sequence.
- Each cycle: add 4 Fl-NTPs + polymerase; the incorporated Fl-NTP is imaged; the terminator and fluorescent dye are cleaved from the Fl-NTP.
- Repeated x 36.

Flow cell imaging: total internal reflection fluorescence (TIRF)
[Figure: fluidics port, flow cell, prism, fluidics port]

Paired-end sequencing
1. The sequenced strand is stripped off and the 3' ends are unblocked.
2. Bridge formation and 3' extension.
3. The double-stranded DNA is denatured.
4. The 3' ends are blocked and the original forward strand is cleaved.
5. Sequencing the reverse strand: hybridize the sequencing primer, then add 4 Fl-NTPs + polymerase, image the incorporated Fl-NTP, and cleave the terminator and fluorescent dye from the Fl-NTP; repeated x 36-50.

Solexa: flow cell in the GAIIx [photo]

Image re-analysis pipeline
GA Analysis Pipeline: Image Analysis -> Base calling -> Sequence Analysis (Instrument PC -> data transfer -> Analysis PC/cluster)
- Input: images (.tif) for lanes 1-8 and cycles 1-36, one per channel (Tile_Cycle_Image_a, Tile_Cycle_Image_c, Tile_Cycle_Image_g, Tile_Cycle_Image_t), plus a .params file
- Image analysis, for each tile: cluster intensities, cluster noise
- Base calling, for each tile: corrected cluster intensities, cluster sequence, cluster probabilities
- Sequence analysis, for all data: quality filtering, sequence alignment, run statistics, visualization

Bustard
- The base with the highest corrected intensity is called.
[Figure: corrected intensities for A, C, G and T; C is called]

GERALD (GEneration of Recursive Analyses Linked by Dependency)
- Filtering removes low-quality base calls.
- Chastity: C = I_A / (I_A + I_B), where I_A is the highest intensity and I_B the second-highest; default threshold 0.6.
- Other filters include purity, similarity, neighbor and neighborhood.

Bustard output: *_qseq.txt
Columns: machine name, run number, lane number, tile number, X coord, Y coord, index, read number, sequence, quality, passed filter (1 = passed, 0 = failed)
EAS1 89 1 59 111 525 AACCTT 2 TGACCAGCGTCAACCAGTACTACGTCTTTGTCGATAG aaaaa_V_OYOZZYUPJZRX 1
EAS1 89 1 59 111 726 AACCTT 2 TCTGGATGAAGAACGATCCGCTGCAGAGGTGCTGGCA _FNXXZWFZ_YYTYMUVBBBBBBBBBBB 0
EAS1 89 1 59 111 860 AACCTT 2 TATCGCGTAGTGTAGCACGGCCTTTTTTTCGTCCACC aaaXFUWQUHVN_ZRWZZXFWYFTX 1
EAS1 89 1 59 112 377 AACCTT 2 TTTTCTTCTCCTTCGCCATCAGCGACAAAATCAAGCA abbbabbbbbbaaaTaaaaaY_YNaZZ 1
EAS1 89 1 59 112 538 AACCTT 2 TGTGAATTAACAGTATTGGCGTAGTTACAGGCAGTGT aa_aabbaaa_aSYZYUBBBBB 1
EAS1 89 1 59 112 576 AACCTT 2 TCTCCTTCGTCTTCTTCCATCAGTTGTTCGACCGGCT GJRNGBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
EAS1 89 1 59 112 607 AACCTT 2 TCCACCATCAACTGGTTGCCAGTGCGCGGGCAGTTAA aabaaaaaaX_YTTHTTZQYTX 1
EAS1 89 1 59 112 255 AACCTT 2 TGATGCTGATAAGCAGCGTGCTCACAACCCAGATTTG aaba_abaabbbbabababbbbb_aba_Zabbb 1

FASTQ format
- http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

GERALD sequence

Summary.html (PF: pass filter)

FastQC

Third-party software (Brief Bioinform. 2011 Jan 18)

NGS technology forums
- SEQwiki: http:/

Paired-end vs. Mate-Pair

SOLiD System Mate-Paired Library Preparation
[Figure: genomic DNA; shearing and end repair (sheared DNA); methylation; ligation of EcoP15I CAP linkers onto the sheared, methylated DNA; circularization (circularized DNA with biotinylated internal adaptors); digestion, leaving biotinylated internal adaptors with 25-27 bp tags from the genomic DNA; ligation, giving FDV-RDV ligated library molecules]

[Figures: Solexa Mate Pair Library I and II; 454 Long Span Paired-End I and II]

Four key enzymes
1. Digestion of non-circular DNA: degrades linear DNA (ATP-dependent Plasmid-Safe DNase)
2. Nick translation (DNA polymerase I)
3. T7 exonuclease digestion: further degrades linear DNA from the 5' end, leaving 3' single-stranded overhangs
4. S1 nuclease digestion: removes the 3' single-stranded overhangs

Thank you!
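As a rough worked example, the Output and Run time columns of the platform comparison table can be turned into an average throughput per hour. This is a minimal sketch, assuming "G" in the table means gigabases (Gb) and that a run occupies the full listed time. The dictionary only copies the table's figures, and the names `PLATFORMS` and `gb_per_hour` are hypothetical.

```python
# Output and run-time figures copied from the platform comparison table.
# Assumptions: "G" means gigabases (Gb); a run occupies the full listed time.
PLATFORMS = {
    "SOLiD": (30.0, 10 * 24),                # 30 Gb in 10 days
    "Solexa HiSeq 2000": (600.0, 14 * 24),   # 600 Gb in 14 days
    "454": (0.7, 7),                         # 0.7 Gb in 7 hours
    "3730": (0.0001, 2),                     # 0.0001 Gb in 2 hours
}


def gb_per_hour(output_gb: float, run_hours: float) -> float:
    """Average throughput over one run, in gigabases per hour."""
    return output_gb / run_hours


if __name__ == "__main__":
    for name, (output_gb, hours) in PLATFORMS.items():
        print(f"{name:>18}: {gb_per_hour(output_gb, hours):.5f} Gb/h")
```

With these figures, HiSeq 2000 averages about 1.8 Gb/h, SOLiD about 0.13 Gb/h and 454 about 0.1 Gb/h. The per-hour view hides the trade-off in read length and turnaround time listed in the same table.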
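The Bustard and GERALD slides describe two per-cycle computations. The first calls the channel with the highest corrected intensity. The second scores how cleanly that channel beats the runner-up, using chastity C = I_A / (I_A + I_B) with a default threshold of 0.6.

A minimal Python sketch of both follows, under stated assumptions:
- Intensities arrive as already-corrected, non-negative values in A, C, G, T order.
- All function names are hypothetical.
- The pass-filter rule in `passes_filter` is an assumption about the pipeline's defaults, not something stated on the slides. It allows at most one chastity failure in the first 25 cycles.

```python
from typing import List, Sequence

BASES = "ACGT"  # assumed channel order of each intensity vector


def call_base(intensities: Sequence[float]) -> str:
    """Call the base whose corrected intensity is highest (Bustard-style argmax)."""
    best = max(range(len(BASES)), key=lambda i: intensities[i])
    return BASES[best]


def chastity(intensities: Sequence[float]) -> float:
    """C = I_A / (I_A + I_B): I_A is the highest, I_B the second-highest intensity.

    Assumes non-negative intensities; returns 0.0 if the two top values are zero.
    """
    i_a, i_b = sorted(intensities, reverse=True)[:2]
    total = i_a + i_b
    return i_a / total if total > 0 else 0.0


def passes_filter(cycles: List[Sequence[float]], threshold: float = 0.6,
                  first_n: int = 25, max_failures: int = 1) -> bool:
    """Assumed pass-filter rule: at most `max_failures` of the first `first_n`
    cycles may have a chastity below `threshold` (0.6 is the slides' default)."""
    failures = sum(1 for cycle in cycles[:first_n] if chastity(cycle) < threshold)
    return failures <= max_failures


if __name__ == "__main__":
    cycle = [120.0, 900.0, 80.0, 60.0]  # made-up A, C, G, T intensities
    print(call_base(cycle), round(chastity(cycle), 3))
```

Chastity only compares the top two channels. A cluster whose two brightest channels are nearly equal (a mixed or overlapping cluster) therefore scores close to 0.5 and is caught by the 0.6 threshold, even if its absolute brightness is high.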
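The 11 columns of the Bustard *_qseq.txt output map directly onto a FASTQ record, the format the FastQC link refers to. A minimal Python sketch of the conversion follows, under stated assumptions:
- Fields are whitespace-separated (tab-separated in real files).
- Qualities are ASCII with offset 64 and are re-encoded to offset 33.
- '.' in the sequence marks a no-call.
- The `machine_run:lane:tile:x:y#index/read` read name is a hypothetical layout chosen for illustration.

Reads whose passed-filter column is 0 are dropped. All names are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class QseqRecord(NamedTuple):
    machine: str
    run: int
    lane: int
    tile: int
    x: int
    y: int
    index: str
    read: int
    sequence: str
    quality: str
    passed_filter: bool


def parse_qseq_line(line: str) -> QseqRecord:
    """Split one *_qseq.txt line into the 11 columns listed on the slide."""
    f = line.split()  # real files are tab-separated; split() also accepts spaces
    if len(f) != 11:
        raise ValueError(f"expected 11 columns, got {len(f)}")
    return QseqRecord(f[0], int(f[1]), int(f[2]), int(f[3]), int(f[4]), int(f[5]),
                      f[6], int(f[7]), f[8], f[9], f[10] == "1")


def phred64_to_phred33(quality: str) -> str:
    """Re-encode ASCII qualities from offset 64 (assumed for qseq) to offset 33."""
    return "".join(chr(ord(ch) - 64 + 33) for ch in quality)


def to_fastq(rec: QseqRecord) -> str:
    """Format one record as a 4-line FASTQ entry (hypothetical read-name layout)."""
    if len(rec.sequence) != len(rec.quality):
        raise ValueError("sequence and quality lengths differ")
    name = f"{rec.machine}_{rec.run}:{rec.lane}:{rec.tile}:{rec.x}:{rec.y}#{rec.index}/{rec.read}"
    seq = rec.sequence.replace(".", "N")  # assumed: '.' marks a no-call
    return f"@{name}\n{seq}\n+\n{phred64_to_phred33(rec.quality)}\n"


def qseq_to_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ entries only for reads whose passed-filter column is 1."""
    for line in lines:
        if line.strip():
            rec = parse_qseq_line(line)
            if rec.passed_filter:
                yield to_fastq(rec)
```

The example rows on the qseq slide have quality strings shorter than their sequences, so `to_fastq` would reject them. A well-formed file has exactly one quality character per base.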