
Webbased-Inference-Detection.ppt

Richard Chow, Philippe Golle, Jessica Staddon
PARC
Web-based Inference Detection
Web 2.0 Security & Privacy, 5/24/2007

Declassified FBI Report
- Web search on: "sibling saudi magnate"

Observations
- Most web pages with the terms "sibling saudi magnate" also contain the terms "osama bin laden"
- Hence, deduce the inference: (sibling saudi magnate) IMPLIES (osama bin laden)
- Get most valid inferences, since the Web is a proxy for all human knowledge
  - Not complete though!
- Idea: Deduce inferences from co-occurrence of terms on the Web

Conceptual Framework
- Consider any Boolean formula of terms, e.g. (saudi AND magnate AND sibling), (osama AND bin AND laden)
  - Evaluates to TRUE or FALSE for each Web page
  - Or, for each paragraph in each Web page
- Strength of inference: Conditional Probability
  - Given (PRECEDENT) is TRUE, what is the probability that (CONSEQUENT) is TRUE?
  - Write: (PRECEDENT) IMPLIES (CONSEQUENT)
- From now on, restrict to a special case: a conjunction of terms implying another conjunction of terms
  - Other cases may be of interest as well: (xxx) IMPLIES (Person1 OR Person2 OR ...)

Traditional Association Rules
- Problem: Find market items that are commonly purchased together
- Rules are of the form: (A) IMPLIES (B), where A and B are sets of items
  - Legendary example: (diapers) IMPLIES (beer)
- Confidence of a rule: Pr(B|A)
  - Given that A is purchased, how likely is B to be purchased?
- Support of a rule: Pr(A and B)
  - What portion of all purchases contain both A and B?
- Apriori (Agrawal et al.): well-known algorithm for this problem
  - Works for given confidence and support cutoffs

Web Association Rules
- Our problem: Find terms that are commonly found together on web pages
- Key differences from traditional association rules
  - Web is very large and unstructured
  - Natural Language Processing (NLP) may provide additional information, since we are mining terms from text
  - More complex rules are of interest
    - Boolean formulae such as (A) IMPLIES (B OR C)
    - Linguistic patterns such as (A followed by B) IMPLIES (C)
- Note that for privacy applications, we need to find rules with very low support
  - Apriori algorithm not directly useful

Using search engines to estimate probabilities

Another Way
- Probability is about 81/234

HIV Precision: Top 60 Inferences
- Precision: fraction of "correct" inferences produced
- Analyzed top precedents appearing in at least 100K documents
- Medical expert reviewed these inferences
  - 28 were "correct"
  - 3 not necessarily connected to HIV, but were related conditions
  - 29 unknown or did not indicate HIV
- Medical expert appropriate for medical records; note that the appropriate reviewer depends on the application
  - "Montagnier" not considered "correct", but was the discoverer of HIV
  - "Kwazulu" not considered "correct", but this province of South Africa has one of the highest HIV infection rates in the world

Inference Problem
- More and more publicly available data
  - Web 2.0 technologies becoming common
  - "long tail of the Internet"
- How to control the release of data?
  - What does the data reveal?
  - Need automated techniques
- Scenarios:
  - Individuals
    - Anonymous blogs or postings
    - Redaction of medical records
  - Corporations
    - News releases
    - Identification of content representing risk
  - Government
    - Declassification of government documents
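
The confidence and support measures defined in the slides above, Pr(B|A) and Pr(A and B), can be illustrated with a small Python sketch. This is not the authors' implementation: the function names, the whitespace tokenizer, and the four-document toy corpus are all invented for illustration. The last helper shows one plausible reading of the "81/234" figure, assuming 81 is the hit count for precedent and consequent together and 234 is the hit count for the precedent alone; the slides do not spell this out.

```python
from typing import Iterable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase a document and split on whitespace (a deliberately crude tokenizer)."""
    return set(text.lower().split())


def holds(terms: Iterable[str], doc_terms: Set[str]) -> bool:
    """A conjunction of terms is TRUE for a document if every term occurs in it."""
    return all(t in doc_terms for t in terms)


def support(precedent: List[str], consequent: List[str], docs: List[Set[str]]) -> float:
    """Pr(A and B): fraction of all documents where both conjunctions hold."""
    if not docs:
        return 0.0
    both = sum(1 for d in docs if holds(precedent, d) and holds(consequent, d))
    return both / len(docs)


def confidence(precedent: List[str], consequent: List[str], docs: List[Set[str]]) -> float:
    """Pr(B | A): among documents where A holds, the fraction where B also holds."""
    matching = [d for d in docs if holds(precedent, d)]
    if not matching:
        return 0.0
    return sum(1 for d in matching if holds(consequent, d)) / len(matching)


def confidence_from_hits(hits_both: int, hits_precedent: int) -> float:
    """Estimate Pr(B | A) from two search-engine hit counts (assumed interpretation)."""
    if hits_precedent <= 0:
        raise ValueError("precedent must match at least one document")
    return hits_both / hits_precedent


# Hypothetical toy corpus standing in for "the Web".
corpus = [tokenize(t) for t in [
    "sibling of saudi magnate osama bin laden",
    "saudi magnate and sibling osama bin laden named in report",
    "saudi magnate sibling buys football club",
    "weather report for today",
]]
A = ["sibling", "saudi", "magnate"]
B = ["osama", "bin", "laden"]

print(confidence(A, B, corpus))       # 2 of the 3 documents matching A also match B
print(support(A, B, corpus))          # 2 of the 4 documents match both A and B
print(confidence_from_hits(81, 234))  # ratio of the two counts from the slide
```

Working from hit counts rather than a local corpus matters here because, as the slides note, privacy-relevant rules tend to have very low support, so enumerating frequent itemsets first (as Apriori does) is not directly useful.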