收藏 分销(赏)

Optics算法.doc

上传人:xrp****65 文档编号:5963690 上传时间:2024-11-24 格式:DOC 页数:11 大小:119.50KB 下载积分:10 金币
下载 相关 举报
Optics算法.doc_第1页
第1页 / 共11页
Optics算法.doc_第2页
第2页 / 共11页


点击查看更多>>
资源描述
1  什么是OPTICS算法 在前面介绍的DBSCAN算法中,有两个初始参数E(邻域半径)和minPts(E邻域最小点数)需要用户手动设置输入,并且聚类的类簇结果对这两个参数的取值非常敏感,不同的取值将产生不同的聚类结果,其实这也是大多数其他需要初始化参数聚类算法的弊端。 为了克服DBSCAN算法这一缺点,提出了OPTICS算法(Ordering Points to identify the clustering structure)。OPTICS并不显示的产生结果类簇,而是为聚类分析生成一个增广的簇排序(比如,以可达距离为纵轴,样本点输出次序为横轴的坐标图),这个排序代表了各样本点基于密度的聚类结构。它包含的信息等价于从一个广泛的参数设置所获得的基于密度的聚类,换句话说,从这个排序中可以得到基于任何参数E和minPts的DBSCAN算法的聚类结果。      2  OPTICS两个概念 核心距离: 对象p的核心距离是指是p成为核心对象的最小E’。如果p不是核心对象,那么p的核心距离没有任何意义。 可达距离: 对象q到对象p的可达距离是指p的核心距离和p与q之间欧几里得距离之间的较大值。如果p不是核心对象,p和q之间的可达距离没有意义。 例如:假设邻域半径E=2, minPts=3,存在点A(2,3),B(2,4),C(1,4),D(1,3),E(2,2),F(3,2) 点A为核心对象,在A的E领域中有点{A,B,C,D,E,F},其中A的核心距离为E’=1,因为在点A的E’邻域中有点{A,B,D,E}>3; 点F到核心对象点A的可达距离为,因为A到F的欧几里得距离,大于点A的核心距离1. 3 算法描述 OPTICS算法额外存储了每个对象的核心距离和可达距离。基于OPTICS产生的排序信息来提取类簇。 算法描述如下: 算法:OPTICS 输入:样本集D, 邻域半径E, 给定点在E领域内成为核心对象的最小领域点数MinPts 输出:具有可达距离信息的样本点输出排序 方法:1 创建两个队列,有序队列和结果队列。(有序队列用来存储核心对象及其该核心对象的直接可达对象,并按可达距离升序排列;结果队列用来存储样本点的输出次 序);        2 如果所有样本集D中所有点都处理完毕,则算法结束。否则,选择一个未处理(即不在结果队列中)且为核心对象的样本点,找到其所有直接密度可达样本点,如过该样本点不存在于结果队列中,则将其放入有序队列中,并按可达距离排序;       3 如果有序队列为空,则跳至步骤2,否则,从有序队列中取出第一个样本点(即可达距离最小的样本点)进行拓展,并将取出的样本点保存至结果队列中,如果它不存在结果队列当中的话。 3.1 判断该拓展点是否是核心对象,如果不是,回到步骤3,否则找到该拓展点所有的直接密度可达点; 3.2 判断该直接密度可达样本点是否已经存在结果队列,是则不处理,否则下一步; 3.2 如果有序队列中已经存在该直接密度可达点,如果此时新的可达距离小于旧的可达距离,则用新可达距离取代旧可达距离,有序队列重新排序; 3.3 如果有序队列中不存在该直接密度可达样本点,则插入该点,并对有序队列重新排序;        4 算法结束,输出结果队列中的有序样本点。 大家或许会很疑惑,这里不也有输入参数E和MinPts吗?其实这里的E和MinPts只是起到算法辅助作用,也就是说E和MinPts的细微变化并不会影响到样本点的相对输出顺序,这对我们分析聚类结果是没有任何影响。 我们采用与先前DBSCAN相同的样本点集合, 对于样本点 a={2,3};b={2,4};c={1,4};d={1,3};e={2,2};f={3,2}; g={8,7};h={8,6};i={7,7};j={7,6};k={8,5}; l={100,2};//孤立点 m={8,20};n={8,19};o={7,18};p={7,17};q={8,21}; 并且使用相同的E=2 MinPts=4时,输出序列为 1->a:1.0 2->e:1.0 3->b:1.0 4->d:1.0 5->c:1.4142135623730951 6->f:1.4142135623730951 ------ 7->g:1.4142135623730951 8->j:1.4142135623730951 9->k:1.4142135623730951 10->i:1.4142135623730951 11->h:1.4142135623730951 ------ 12->n:2.0 13->q:2.0 14->o:2.0 15->m:2.0 如图,按照算法,分三个阶段输出了三波值 {a,e,b,d,c,f} ,{g,j,k,I,h},{n,q,o,m} 这和DBSCAN的类簇结果是一样的。不仅如此,我们通过分析有序图还能直接得到当参数E=1.5,minPts=4时DBSCAN的类簇结果,只要在坐标图中找到Y值小于1.5的样本点即可,只有两类{a,e,b,d,c,f} ,{g,j,k,I,h},其他点被认为是孤立点,和DBSCAN聚类算法取E=1.5,minPts=4时的结果一致。 所以说,这个OPTICS聚类算法所得的簇排序信息等价于一个广泛的参数设置所获得的基于密度的聚类结果。 具体实现算法如下: package com.optics; public class DataPoint {     private String name; // 样本点名     private double dimensioin[]; // 样本点的维度     private double coreDistance; //核心距离,如果该点不是核心对象,则距离为-1     private double reachableDistance; //可达距离     public DataPoint(){     }     public DataPoint(DataPoint e){         this.name=e.name;         this.dimensioin=e.dimensioin;         this.coreDistance=e.coreDistance;         this.reachableDistance=e.reachableDistance;     }     public DataPoint(double dimensioin[],String name){         this.name=name;         this.dimensioin=dimensioin;         this.coreDistance=-1;         this.reachableDistance=-1;     }     public String getName() {         return name;     }     public void setName(String name) {         this.name = name;     }     public double[] getDimensioin() {         return dimensioin;     }     public void setDimensioin(double[] dimensioin) {         this.dimensioin = dimensioin;     }     public double getCoreDistance() {         return coreDistance;     }     public void setCoreDistance(double coreDistance) {         this.coreDistance = coreDistance;     }     public double getReachableDistance() {         return reachableDistance;     }     public void setReachableDistance(double reachableDistance) {         this.reachableDistance = reachableDistance;     } } package com.optics; import java.util.ArrayList; import java.util.Collections; import java.util.Comparator; import java.util.List; public class ClusterAnalysis {     class ComparatorDp implements Comparator<DataPoint>{         public int compare(DataPoint arg0, DataPoint arg1) {             double temp=arg0.getReachableDistance()-arg1.getReachableDistance();             int a = 0;             if (temp < 0) {                 a = -1;             } else {                 a = 1;             }             return a;         }     }         public List<DataPoint> startAnalysis(List<DataPoint> dataPoints,             double radius, int ObjectNum) {         List<DataPoint> dpList = new ArrayList<DataPoint>();         List<DataPoint> dpQue = new ArrayList<DataPoint>();         int total = 0;         while (total < dataPoints.size()) {             if (isContainedInList(dataPoints.get(total), dpList) == -1 ) {                 List<DataPoint> tmpDpList = isKeyAndReturnObjects(dataPoints.get(total),                         dataPoints, radius, ObjectNum);                 if(tmpDpList != null && tmpDpList.size() > 0){                     DataPoint newDataPoint=new DataPoint(dataPoints.get(total));                    dpQue.add(newDataPoint);                 }             }             while (!dpQue.isEmpty()) {                 DataPoint tempDpfromQ = dpQue.remove(0);                 DataPoint newDataPoint=new DataPoint(tempDpfromQ);                 dpList.add(newDataPoint);                 List<DataPoint> tempDpList = isKeyAndReturnObjects(tempDpfromQ,                         dataPoints, radius, ObjectNum);                System.out.println(newDataPoint.getName()+":"+newDataPoint.getReachableDistance());                 if (tempDpList != null && tempDpList.size() > 0) {                     for (int i = 0; i < tempDpList.size(); i++) {                         DataPoint tempDpfromList = tempDpList.get(i);                         int indexInList = isContainedInList(tempDpfromList,                                 dpList);                         int indexInQ = isContainedInList(tempDpfromList, dpQue);                         if (indexInList == -1) {                             if (indexInQ > -1) {                                 int index = -1;                                 for (DataPoint dataPoint : dpQue) {                                     index++;                                     if (index == indexInQ) {                                         if (dataPoint.getReachableDistance() > tempDpfromList                                                 .getReachableDistance()) {                                             dataPoint                                                     .setReachableDistance(tempDpfromList                                                             .getReachableDistance());                                         }                                     }                                 }                             } else {                                 dpQue.add(new DataPoint(tempDpfromList));                             }                         }                     }                     // TODO:对Q进行重新排序                     Collections.sort(dpQue, new ComparatorDp());                 }             }             System.out.println("------");             total++;         }         return dpList;     }         public void displayDataPoints(List<DataPoint> dps){         for(DataPoint dp: dps){             System.out.println(dp.getName()+":"+dp.getReachableDistance());         }     }         private int isContainedInList(DataPoint dp, List<DataPoint> dpList) {         int index = -1;         for (DataPoint dataPoint : dpList) {             index++;             if (dataPoint.getName().equals(dp.getName())) {                 return index;             }         }         return -1;     }        private List<DataPoint> isKeyAndReturnObjects(DataPoint dataPoint,List<DataPoint> dataPoints,double radius,int ObjectNum){        List<DataPoint> arrivableObjects=new ArrayList<DataPoint>(); //用来存储所有直接密度可达对象        List<Double> distances=new ArrayList<Double>(); //欧几里得距离        double coreDistance; //核心距离         for (int i = 0; i < dataPoints.size(); i++) {             DataPoint dp = dataPoints.get(i);             double distance = getDistance(dataPoint, dp);             if (distance <= radius) {                 distances.add(distance);                 arrivableObjects.add(dp);             }         }        if(arrivableObjects.size()>=ObjectNum){            List<Double> newDistances=new ArrayList<Double>(distances);            Collections.sort(distances);            coreDistance=distances.get(ObjectNum-1);            for(int j=0;j<arrivableObjects.size();j++){                 if (coreDistance > newDistances.get(j)) {                     if(newDistances.get(j)==0){                         dataPoint.setReachableDistance(coreDistance);                     }                     arrivableObjects.get(j).setReachableDistance(coreDistance);                 }else{                     arrivableObjects.get(j).setReachableDistance(newDistances.get(j));                 }            }            return arrivableObjects;        }        return null;    }         private double getDistance(DataPoint dp1,DataPoint dp2){         double distance=0.0;         double[] dim1=dp1.getDimensioin();         double[] dim2=dp2.getDimensioin();         if(dim1.length==dim2.length){             for(int i=0;i<dim1.length;i++){                 double temp=Math.pow((dim1[i]-dim2[i]), 2);                 distance=distance+temp;             }             distance=Math.pow(distance, 0.5);             return distance;         }         return distance;     }     public static void main(String[] args){          ArrayList<DataPoint> dpoints = new ArrayList<DataPoint>();                   double[] a={2,3};          double[] b={2,4};          double[] c={1,4};          double[] d={1,3};          double[] e={2,2};          double[] f={3,2};          double[] g={8,7};          double[] h={8,6};          double[] i={7,7};          double[] j={7,6};          double[] k={8,5};          double[] l={100,2};//孤立点          double[] m={8,20};          double[] n={8,19};          double[] o={7,18};          double[] p={7,17};          double[] q={8,21};          dpoints.add(new DataPoint(a,"a"));          dpoints.add(new DataPoint(b,"b"));          dpoints.add(new DataPoint(c,"c"));          dpoints.add(new DataPoint(d,"d"));          dpoints.add(new DataPoint(e,"e"));          dpoints.add(new DataPoint(f,"f"));          dpoints.add(new DataPoint(g,"g"));          dpoints.add(new DataPoint(h,"h"));          dpoints.add(new DataPoint(i,"i"));          dpoints.add(new DataPoint(j,"j"));          dpoints.add(new DataPoint(k,"k"));          dpoints.add(new DataPoint(l,"l"));          dpoints.add(new DataPoint(m,"m"));          dpoints.add(new DataPoint(n,"n"));          dpoints.add(new DataPoint(o,"o"));          dpoints.add(new DataPoint(p,"p"));          dpoints.add(new DataPoint(q,"q"));          ClusterAnalysis ca=new ClusterAnalysis();          List<DataPoint> dps=ca.startAnalysis(dpoints, 2, 4);          ca.displayDataPoints(dps);     } }
展开阅读全文

开通  VIP会员、SVIP会员  优惠大
下载10份以上建议开通VIP会员
下载20份以上建议开通SVIP会员


开通VIP      成为共赢上传

当前位置:首页 > 百科休闲 > 其他

移动网页_全站_页脚广告1

关于我们      便捷服务       自信AI       AI导航        抽奖活动

©2010-2025 宁波自信网络信息技术有限公司  版权所有

客服电话:4009-655-100  投诉/维权电话:18658249818

gongan.png浙公网安备33021202000488号   

icp.png浙ICP备2021020529号-1  |  浙B2-20240490  

关注我们 :微信公众号    抖音    微博    LOFTER 

客服