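IPI (International Protein Index): Database Introduction
Yu Hong (俞鸿)
yuhong@scbit.org

IPI: International Protein Index
EMBL-EBI
URL: http://www.ebi.ac.uk/IPI/IPIhelp.html
IPI provides a top-level guide to the main databases that describe the proteomes of higher eukaryotic organisms.
It is commonly used as the sequence database for mass spectrometry database searches.

Algorithm
IPI is generated by mapping entries between different databases on the basis of protein sequence similarity.
Two key questions:
- How are entries matched between databases?
- How are the mapping results for each dataset merged into a single dataset?

Workflow
1. Download the sequence data.
2. Compare all database sequences pairwise for similarity.
3. Identify reciprocal best-match protein pairs; the match percentage must be greater than 95%.
4. Cluster: all reciprocal best-match pairs together form a cluster.
Diagram labels: IPI; isolated protein; partial fragment of an existing cluster; partial fragment not in an existing cluster.
Source databases: Swiss-Prot, RefSeq, Ensembl, TrEMBL.

Reciprocal best-match protein pairs
Database A: a1, a2, a3, ...
Database B: b1, b2, b3, ...
After a1 is compared with all proteins in Database B, its best match is b1.
After b1 is compared with all proteins in Database A, its best match is a1.
a1 and b1 are therefore a reciprocal best-match pair.

Determining the IPI Sequence
The sequence of an IPI entry is selected using the following database priority: Swiss-Prot / RefSeq / TrEMBL / Ensembl. The precondition is that if the cluster contains short fragment sequences, the selected sequence must contain those fragments.

MSIPI
MSIPI is a database derived from IPI containing additional information about cSNPs and N-terminal peptides in a format suitable for easy use in mass spectrometry search engines. MSIPI is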
available in the directory ftp://ftp.ebi.ac.uk/pub/databases/IPI/msipi.

Species
Human, Mouse, Rat, Zebrafish, Arabidopsis, Chicken, Cow

Data Retrieval

FTP Data Resources
Current release: ftp://ftp.ebi.ac.uk/pub/databases/IPI/current/
Old releases: ftp://ftp.ebi.ac.uk/pub/databases/IPI/old/

Data Files
ipi.HUMAN.dat.gz
ipi.HUMAN.fasta.gz
ipi.HUMAN.history.gz
ipi.HUMAN.IPC.gz
ipi.HUMAN.mysql.gz
ipi.HUMAN.xrefs.gz
gi2ipi.xrefs.gz
ipi.gene.HUMAN.xrefs.gz

Data Format: UniProt (*.dat.gz)
ID   IPI00003881.5         IPI;      PRT;   415 AA.
AC   IPI00003881;
DT   01-OCT-2001 (IPI Human rel. 2.00, Created)
DT   06-OCT-2005 (IPI Human rel. 3.11, Last sequence update)
DE   SIMILAR TO HETEROGENEOUS
DE   NUCLEAR RIBONUCLEOPROTEIN H.
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX   NCBI_TaxID=9606;
CC   -!- CHROMOSOME: 10.
CC   -!- START CO-ORDINATE: 43201071.
CC   -!- END CO-ORDINATE: 43224620.
DR   UniProtKB/Swiss-Prot; P52597; HNRPF_HUMAN; -.
DR   Vega; OTTHUMP00000019482; OTTHUMG00000018029; M.
DR   Vega; OTTHUMP00000043413; OTTHUMG00000018029; -.
DR   Vega; OTTHUMP00000043414; OTTHUMG00000018029; -.
DR   REFSEQ_REVIEWED; NP_004957; GI:4826760; -.
DR   UniProtKB/TrEMBL; Q5T0N2; Q5T0N2_HUMAN; -.
DR   UniProtKB/TrEMBL; Q8NI96; Q8NI96_HUMAN; -.
DR   UniProtKB/TrEMBL; Q96AU2; Q96AU2_HUMAN; -.
DR   ENSEMBL; ENSP00000338477; ENSG00000169813; -.
DR   ENSEMBL; ENSP00000348345; ENSG00000169813; -.
DR   H-InvDB; HIT000003838; HIX0008779; -.
DR   H-InvDB; HIT000030409; HIX0008779; -.
DR   H-InvDB; HIT000031821; HIX0008779; -.
DR   H-InvDB; HIT000037199; HIX0008779; -.
DR   H-InvDB; HIT000037659; HIX0008779; -.
DR   UniParc; UPI0000000C5C; -; -.
DR   HGNC; 5039; HNRPF; -.
DR   Entrez Gene; 3185; HNRPF; -.
DR   UniGene; Hs.808; -; -.
DR   CCDS; CCDS7204.1; -; -.
DR   ReAlSplice protein; SL0000062; hnRNPF; factor involved in alternative splicing.
DR   trome; HTR002991; -; -.
DR   RZPD; Hs.808; -; Clones and other research material.
DR   CleanEx; HS_HNRPF; -; -.
DR   InterPro; IPR012677; a_b_plait_nuc_bd.
DR   InterPro; IPR000504; RNP1_RNA_bd.
DR   InterPro; IPR012996; Znf_CHHC.
DR   Pfam; PF00076; RRM_1; 3.
DR   Pfam; PF08080; zf-RNPHF; 1.
DR   SMART; SM00360; RRM; 3.
DR   PROSITE; PS50102; RRM; 2.
DR   GENE3D; G3D.3.30.70.330; Nucl_bd_a/b_plat; 3.
SQ   SEQUENCE   415 AA;  45672 MW;  D14E170631FB1F31 CRC64;
     MMLGPEGGEG FVVKLRGLPW SCSVEDVQNF LSDCTIHDGA AGVHFIYTRE GRQSGEAFVE
     LGSEDDVKMA LKKDRESMGH RYIEVFKSHR TEMDWVLKHS GPNSADSAND GFVRLRGLPF
     GCTKEEIVQF FSGLEIVPNG ITLPVDPEGK ITGEAFVQFA SQELAEKALG KHKERIGHRY
     IEVFKSSQEE VRSYSDPPLK FMSVQRPGPY DRPGTARRYI GIVKQAGLER MRPGAYSTGY
     GGYEEYSGLS DGYGFTTDLF GRDLSYCLSG MYDHRYGDSE FTVQSTTGHC VHMRGLPYKA
     TENDIYNFFS PLNPVRVHIE IGPDGRVTGE ADVEFATHEE AVAAMSKDRA NMQHRYIELF
     LNSTTGASNG AYSSQVMQGM GVSAAQATYS GLESQSVSGC YGAGYSGQNS MGGYD
//
Slide callout: the CC lines are the comments (here the chromosome and the start and end coordinates).

Data Format: FASTA (*.fasta.gz)
>IPI:IPI00000005.1|SWISS-PROT:P01111-3|TREMBL:P54111|REFSEQ:NP_002515;XP_032698;XP_001317|ENSEMBL:ENSP00000261444|H-INV:HIT000032298 Tax_Id=9606 Transforming protein N-Ras
MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAG
QEEYSAMRDQYMRTGEGFLCVFAINNSKSFADINLYREQIKRVKDSDDVPMVLVGNKCDL
PTRTVDTKQAHELAKSYGIPFIETSAKTRQGVEDAFYTLVREIRQYRMKKLNSSDDGTQG
CMGLPCVVM

Data Format: Protein Cross-References File (*.xrefs)
1. Database from which the master entry of this IPI entry has been taken (source database of the master entry).
2. UniProtKB accession number, Vega ID, Ensembl ID, RefSeq ID, TAIR Protein ID or H-InvDB ID (source ID).
3. International Protein Index identifier (IPI ID).
4. Supplementary UniProtKB/Swiss-Prot entries associated with this IPI entry (Swiss-Prot IDs).
5. Supplementary UniProtKB/TrEMBL entries associated with this IPI entry (TrEMBL IDs).
6. Supplementary Ensembl entries associated with this IPI entry. Havana curated transcripts are preceded by the key HAVANA: (e.g. HAVANA:ENSP00000237305;ENSP00000356824;).
7. Supplementary list of RefSeq STATUS:ID couples (separated by a semicolon ';') associated with this IPI entry (RefSeq entry revision status details).
8. Supplementary TAIR Protein entries associated with this IPI entry.
9. Supplementary H-Inv Protein entries associated with this IPI entry.
10. Protein identifiers (cross-reference to EMBL/GenBank/DDBJ nucleotide databases).
11. List of HGNC number, HGNC official gene symbol couples (separated by a semicolon ';') associated with this IPI entry.
12. List of NCBI Entrez Gene gene number, Entrez Gene Default Gene Symbol couples (separated by a semicolon ';') associated with this IPI entry.
13. UniParc identifier associated with the sequence of this IPI entry.
14. UniGene identifiers associated with this IPI entry.
15. CCDS identifiers associated with this IPI entry.
16. RefSeq GI protein identifiers associated with this IPI entry.
17. Supplementary Vega entries associated with this IPI entry.

Data Format: Gene Cross-References File (ipi.genes.*.xrefs)

Data Format: GI Cross-References File

Data Format: InterPro Hits (ipi.ipc)

Data Format: History File (*.history.gz)
- IPI ID
- Release version when the ID was created
- Release version when the ID was deleted, if available, or "-" if not
- Successor ID, if available, or "-" if not
- Comments

MySQL Database and Its Use
    gunzip ipi.HUMAN.mysql.gz
    mysql -h host_name -u username -ppassword IPIhuman < ipi.HUMAN.mysql

Extended Analysis
- Distribution of coding genes
- Molecular weight distribution
- Domain analysis
- ...

Functional Classification: GOA
GOA: Gene Ontology Annotation Database
The GOA project aims to provide high-quality Gene Ontology (GO) annotations to proteins in the UniProt Knowledgebase (UniProtKB) and International Protein Index (IPI), and is a central dataset for other major multi-species databases such as Ensembl and NCBI.

Yu Hong (俞鸿)
Shanghai Zhongxin Biotechnology Co., Ltd. (上海众信生物技术有限公司)
Company website: http:/
Email: yuhong@scbit.org
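
Supplementary sketch: reciprocal best matches and clustering
The matching and clustering steps from the workflow slides can be illustrated with a small Python sketch. It is not the actual IPI pipeline. The function names, the input layout (a dictionary of pairwise match percentages) and the union-find clustering are assumptions made for illustration. The slides also do not say whether the 95% threshold refers to identity or coverage, so the code treats it as an opaque score.

```python
from collections import defaultdict


def best_hits(scores):
    """Map each query to its best-scoring target.

    scores: {(query_id, target_id): match_percentage} (hypothetical layout).
    """
    best = {}
    for (query, target), pct in scores.items():
        if query not in best or pct > best[query][1]:
            best[query] = (target, pct)
    return best


def reciprocal_best_pairs(a_vs_b, b_vs_a, min_pct=95.0):
    """Return (a, b) pairs that are each other's best match above min_pct."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = []
    for a, (b, pct) in best_ab.items():
        back = best_ba.get(b)
        if back and back[0] == a and pct > min_pct and back[1] > min_pct:
            pairs.append((a, b))
    return sorted(pairs)


def cluster(pairs):
    """Group linked proteins into clusters with a simple union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for member in parent:
        groups[find(member)].add(member)
    return sorted(sorted(group) for group in groups.values())


# Toy data: a1 and b1 pick each other; a2 picks b2, but b2 prefers a1.
a_vs_b = {("a1", "b1"): 99.0, ("a1", "b2"): 96.0, ("a2", "b2"): 90.0}
b_vs_a = {("b1", "a1"): 99.0, ("b2", "a1"): 96.0, ("b2", "a2"): 90.0}
print(reciprocal_best_pairs(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

With more than two source databases, the pairs found for each database combination can be passed together to `cluster`, so that proteins linked through a shared partner end up in the same cluster.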
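
Supplementary sketch: reading an IPI FASTA header
The FASTA header shown in the data format slide packs several cross-references into one line. The sketch below shows one way to split it, assuming the layout of that single example: `DB:ID;ID` fields separated by `|`, followed by `Tax_Id=` and a free-text description. The function name and the returned dictionary layout are hypothetical.

```python
def parse_ipi_fasta_header(header):
    """Split an IPI FASTA header into cross-references, taxon and description."""
    header = header.lstrip(">").strip()
    ids_part, _, rest = header.partition(" ")

    xrefs = {}
    for field in ids_part.split("|"):
        db, _, ids = field.partition(":")
        xrefs[db] = [i for i in ids.split(";") if i]

    tax_id = None
    description = rest
    if rest.startswith("Tax_Id="):
        tax_field, _, description = rest.partition(" ")
        tax_id = int(tax_field.split("=", 1)[1])

    ipi_ids = xrefs.get("IPI") or [None]
    return {
        "ipi": ipi_ids[0],
        "xrefs": xrefs,
        "tax_id": tax_id,
        "description": description.strip(),
    }


example = (
    ">IPI:IPI00000005.1|SWISS-PROT:P01111-3|TREMBL:P54111|"
    "REFSEQ:NP_002515;XP_032698;XP_001317|ENSEMBL:ENSP00000261444|"
    "H-INV:HIT000032298 Tax_Id=9606 Transforming protein N-Ras"
)
entry = parse_ipi_fasta_header(example)
print(entry["ipi"], entry["tax_id"], entry["description"])
```

Keeping the identifiers grouped by source database makes it straightforward to map a mass spectrometry hit reported by IPI ID back to its Swiss-Prot or Ensembl entries.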
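
Supplementary sketch: collecting cross-references from a .dat record
The UniProt-style record in the *.dat.gz slide keeps its cross-references in DR lines. The sketch below collects them per database. It assumes the usual flat-file convention of a two-letter line code followed by the data, with `;` between fields and a closing `.`. The function name and the shortened sample record are illustrative only.

```python
from collections import defaultdict


def parse_dat_record(lines):
    """Collect the ID, accessions and DR cross-references of one record."""
    record = {"id": None, "accessions": [], "xrefs": defaultdict(list)}
    for line in lines:
        code, data = line[:2], line[2:].strip()
        if code == "ID":
            record["id"] = data.split()[0]
        elif code == "AC":
            record["accessions"] += [a.strip() for a in data.split(";") if a.strip()]
        elif code == "DR":
            if data.endswith("."):
                data = data[:-1]  # drop the closing full stop
            fields = [f.strip() for f in data.split(";")]
            record["xrefs"][fields[0]].append(fields[1:])
        elif code == "//":
            break  # end of record
    return record


sample = [
    "ID   IPI00003881.5         IPI;      PRT;   415 AA.",
    "AC   IPI00003881;",
    "DR   UniProtKB/Swiss-Prot; P52597; HNRPF_HUMAN; -.",
    "DR   ENSEMBL; ENSP00000338477; ENSG00000169813; -.",
    "DR   ENSEMBL; ENSP00000348345; ENSG00000169813; -.",
    "//",
]
record = parse_dat_record(sample)
print(record["id"], sorted(record["xrefs"]))
```

For whole files, an established parser for UniProt-style flat files is likely a safer choice than this hand-rolled sketch, since real records may contain line types not handled here.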