Python聚類算法之DBSACN實例分析-CDA數據分析師官網

熱線電話：13121318867

登錄

首頁精彩閱讀Python聚類算法之DBSACN實例分析

Python聚類算法之DBSACN實例分析

2018-05-23

收藏

Python聚類算法之DBSACN實例分析

本文實例講述了Python聚類算法之DBSACN。分享給大家供大家參考，具體如下：
DBSCAN：是一種簡單的，基于密度的聚類算法。本次實現中，DBSCAN使用了基于中心的方法。在基于中心的方法中，每個數據點的密度通過對以該點為中心以邊長為2*EPs的網格(鄰域)內的其他數據點的個數來度量。根據數據點的密度分為三類點:
核心點：該點在鄰域內的密度超過給定的閥值MinPs。
邊界點：該點不是核心點，但是其鄰域內包含至少一個核心點。
噪音點：不是核心點，也不是邊界點。
有了以上對數據點的劃分，聚合可以這樣進行：各個核心點與其鄰域內的所有核心點放在同一個簇中，把邊界點跟其鄰域內的某個核心點放在同一個簇中。
# scoding=utf-8
import pylab as pl
from collections import defaultdict,Counter
points = [[int(eachpoint.split("#")[0]), int(eachpoint.split("#")[1])] for eachpoint in open("points","r")]
# 計算每個數據點相鄰的數據點，鄰域定義為以該點為中心以邊長為2*EPs的網格
Eps = 10
surroundPoints = defaultdict(list)
for idx1,point1 in enumerate(points):
for idx2,point2 in enumerate(points):
    if (idx1 < idx2):
      if(abs(point1[0]-point2[0])<=Eps and abs(point1[1]-point2[1])<=Eps):
        surroundPoints[idx1].append(idx2)
        surroundPoints[idx2].append(idx1)
# 定義鄰域內相鄰的數據點的個數大于4的為核心點
MinPts = 5
corePointIdx = [pointIdx for pointIdx,surPointIdxs in surroundPoints.iteritems() if len(surPointIdxs)>=MinPts]
# 鄰域內包含某個核心點的非核心點，定義為邊界點
borderPointIdx = []
for pointIdx,surPointIdxs in surroundPoints.iteritems():
if (pointIdx not in corePointIdx):
    for onesurPointIdx in surPointIdxs:
      if onesurPointIdx in corePointIdx:
        borderPointIdx.append(pointIdx)
        break
# 噪音點既不是邊界點也不是核心點
noisePointIdx = [pointIdx for pointIdx in range(len(points)) if pointIdx not in corePointIdx and pointIdx not in borderPointIdx]
corePoint = [points[pointIdx] for pointIdx in corePointIdx]
borderPoint = [points[pointIdx] for pointIdx in borderPointIdx]
noisePoint = [points[pointIdx] for pointIdx in noisePointIdx]
# pl.plot([eachpoint[0] for eachpoint in corePoint], [eachpoint[1] for eachpoint in corePoint], 'or')
# pl.plot([eachpoint[0] for eachpoint in borderPoint], [eachpoint[1] for eachpoint in borderPoint], 'oy')
# pl.plot([eachpoint[0] for eachpoint in noisePoint], [eachpoint[1] for eachpoint in noisePoint], 'ok')
groups = [idx for idx in range(len(points))]
# 各個核心點與其鄰域內的所有核心點放在同一個簇中
for pointidx,surroundIdxs in surroundPoints.iteritems():
for oneSurroundIdx in surroundIdxs:
    if (pointidx in corePointIdx and oneSurroundIdx in corePointIdx and pointidx < oneSurroundIdx):
      for idx in range(len(groups)):
        if groups[idx] == groups[oneSurroundIdx]:
          groups[idx] = groups[pointidx]
# 邊界點跟其鄰域內的某個核心點放在同一個簇中
for pointidx,surroundIdxs in surroundPoints.iteritems():
for oneSurroundIdx in surroundIdxs:
    if (pointidx in borderPointIdx and oneSurroundIdx in corePointIdx):
      groups[pointidx] = groups[oneSurroundIdx]
      break
# 取簇規模最大的5個簇
wantGroupNum = 3
finalGroup = Counter(groups).most_common(3)
finalGroup = [onecount[0] for onecount in finalGroup]
group1 = [points[idx] for idx in xrange(len(points)) if groups[idx]==finalGroup[0]]
group2 = [points[idx] for idx in xrange(len(points)) if groups[idx]==finalGroup[1]]
group3 = [points[idx] for idx in xrange(len(points)) if groups[idx]==finalGroup[2]]
pl.plot([eachpoint[0] for eachpoint in group1], [eachpoint[1] for eachpoint in group1], 'or')
pl.plot([eachpoint[0] for eachpoint in group2], [eachpoint[1] for eachpoint in group2], 'oy')
pl.plot([eachpoint[0] for eachpoint in group3], [eachpoint[1] for eachpoint in group3], 'og')
# 打印噪音點，黑色
pl.plot([eachpoint[0] for eachpoint in noisePoint], [eachpoint[1] for eachpoint in noisePoint], 'ok')
pl.show()

運行效果截圖如下：

希望本文所述對大家Python程序設計有所幫助。

CDA數據分析師考試相關入口一覽（建議收藏）：

? 想報名CDA認證考試，點擊>>> “CDA報名” 了解CDA考試詳情；

? 想學習CDA考試教材，點擊>>> “CDA教材” 了解CDA考試詳情；

? 想加入CDA考試題庫，點擊>>> “CDA題庫” 了解CDA考試詳情；

? 想了解CDA考試含金量，點擊>>> “CDA含金量” 了解CDA考試詳情；

聚類

數據分析咨詢請掃描二維碼

若不方便掃碼，搜微信號：CDAshujufenxi

上一篇回歸系列（一）| 怎樣正確地理解線性回歸

下一篇2020美國總統競選大戲開鑼，川普當選的奇跡會再發生嗎？

數據分析師考試動態

考試介紹
考試大綱
考試內容
考試地點

CDA報考指南

報考流程
考試時間
報名費用
聯系我們

數據分析學習

數據分析師資訊

更多

Copyright © 2015-2021, www.ruiqisteel.com All Rights Reserved. CDA數據分析師(北京國富如荷網絡科技有限公司) 版權所有京ICP備11001960號-9

京公網安備 11010802034615號經營許可證編號：京B2-20210330

聯系電話：13321103290 (微信同號)

OK

免費資料
免費試聽
訂制課程
職業規劃
認證考試

客服在線

日韩人妻系列无码专区视频,先锋高清无码,无码免费视欧非,国精产品一区一区三区无码

客服在線

立即咨詢

免密碼登錄

提交首次登錄驗證后自動注冊