커패시턴스가있는 k-means 클러스터링?

저는 알고리즘과 최적화에 초보자입니다.
k- 수용체을 구현하려고했지만 지금까지 해결되지 않았고 결과가 좋지 않습니다.
이것은 CVRP 시뮬레이션의 일부로 사용됩니다 (용량이있는 차량 라우팅 문제).
참조 된 알고리즘을 잘못 해석하면 궁금합니다.커패시턴스가있는 k-means 클러스터링?

참조 : "Improved K-Means Algorithm for Capacitated Clustering Problem" (Geetha, Poonthalir, Vanathi)

시뮬레이션 CVRP 1 개 창고로, 15 개 고객을 가지고있다.
각 고객은 유클리드 좌표 (x, y)와 수요가 있습니다.
3 대의 차량이 있으며 용량은 각각 90 명입니다.

이렇게 용량이 큰 k- 수단은 15 명의 고객을 3 대의 차량으로 클러스터링하려고합니다. 각 클러스터의 총 요구량은 차량 용량을 초과하지 않아야합니다.

UPDATE : 참조 된 알고리즘에서

, 나는 코드가 "옆에 가까운 무게 중심"에서 실행될 때 무엇을해야하는지에 대한 정보를 잡을 수 없었다.
즉, 가장 가까운 중심선을 모두 조사한 경우 customers[1]은 아직 할당되지 않은 상태에서 단계 14.b입니다.

이렇게하면 색인 1이 할당되지 않은 고객이 생성됩니다.
참고 : customer[1]은 가장 수요가 많은 고객입니다 (30).
Q :이 조건이 충족되면 코드가 수행해야하는 작업은 무엇입니까?

다음은 참조 된 알고리즘에 대한 나의 해석입니다. 제 코드를 수정하십시오. 감사합니다.

주어 n 스터 (고객) n = customerCount 및 저장소
N 수요
N 좌표 (x, y)는 클러스터
계산 번호 k = (모든 요구 사항의 합)/vehicleCapacity
초기 중량 선택,
5.a. demand을 기준으로 고객을 내림차순으로 정렬 = d_customers,
5.b. ,
진 매트릭스 bin_matrix, 치수 = (customerCount) x (k) 만들기, 초기의 무게 중심 = centroids[0 .. k-1]로 d_customers에서 k 첫 번째 고객을 선택
6.A. 모든 0이 포함 된 bin_matrix을 채우십시오.
시작 WHILE 루프, 조건 = WHILE not converged.
7.a.converged = False
each customers FOR 루프 상태 = FOR 시작
8.a. 고객의 인덱스 = 제가
모든 centroids =>edist
9.a.에 customers[i]에서 유클리드 거리를 계산 edist을 오름차순으로 정렬하십시오.
9.b. 루프, 조건 = while customers[i] 어떤 클러스터에 지정되지 않은 상태 = 가장 가까운 거리에 closest_centroid
시작을 처음 centroid을 선택합니다.
그룹 다른 모든 할당되지 않은 고객 = G,
11.a. G에 대해 closest_centroid을 중심으로 간주하십시오.
계산 우선 customersG 각 Pi,
12.a. 우선 Pi = (distance from customers[i] to closest_cent)/demand[i]
12.b. 최고 우선 순위가 Pi 인 고객을 선택하십시오.
12.c. 우선 순위가 가장 높은 고객은 index = hpc
12d입니다. Q : 우선 순위가 가장 높은 고객을 찾을 수없는 경우 어떻게해야합니까?
할당 centroids[closest_centroid]-customers[hpc] 가능한 경우.
13.a. customers[hpc] = d1,
13b. 중도 회원의 모든 요구의 합 = dtot,
13.c. IF (d1 + dtot) <= vehicleCapacity, THEN ..
13d. customers[hpc]을 centroids[closest_centroid]
13e로 지정하십시오. bin_matrix 업데이트, 행 인덱스 = hpc, 열 인덱스 = closest_centroid, 1으로 설정하십시오.
customers[i] ... THEN는 어떤 클러스터에 (정지) not assigned 경우
14.a. edist에서 가장 가까운 거리에 next nearest centroid을 선택하십시오.
14.b. Q : 다음 가장 가까운 중심이 없다면 무엇을해야합니까?
계산 이전 행렬과 업데이트 된 행렬 bin_matrix를 비교하여 수렴 됨.
15.a. bin_matrix에 변경 사항이없는 경우 converged = True을 설정하십시오.
그렇지 않은 경우 업데이트 된 클러스터에서 new centroids을 계산하십시오.
16.a. 각 클러스터의 구성원을 기반으로 새 centroids' coordinates을 계산하십시오.
16.b. sum_x = 클러스터의 모두 x-coordinate의 합 members,
16.c.num_c = 클러스터의 모든 숫자는 customers (members)
16.d입니다. 새로운 중심의 x-coordinate은 클러스터 = sum_x/num_c입니다.
16.e. 동일한 수식을 사용하여 클러스터의 새로운 중심 값 y-coordinate = sum_y/num_c을 계산합니다.
은 주 WHILE 루프를 반복합니다.

내 코드는 항상 단계 14.b에서 할당되지 않은 고객과 종료됩니다.
그래도 어떤 중심에 아직 할당되지 않은 customers[i]이 있고 "다음 가장 가까운 중심"이 부족합니다.

그리고 결과 클러스터가 열악합니다. 출력 그래프 : 사진 -In

가, 스타가 중심이고, 사각형은 창고입니다.
그림에서 고객은 "1"이라는 레이블이 붙어 있고 demand = 30은 항상 할당 된 클러스터없이 끝났습니다. 첫 번째와 세 번째 클러스터가 잘못 계산

k_cluster 3 
idx [ 1 -1 1 0 2 0 1 1 2 2 2 0 0 2 0] 
centroids [(22.6, 29.2), (34.25, 60.25), (39.4, 33.4)] 
members [[3, 14, 12, 5, 11], [0, 2, 6, 7], [9, 8, 4, 13, 10]] 
demands [86, 65, 77]

프로그램

출력. 인덱스
idx은 '1을'(-1)

Q 할당되지 않은 : 어떻게 내 해석과 내 구현 잘못?
모든 수정, 제안, 도움을 주시면 대단히 감사하겠습니다. 여기

내 전체 코드입니다 :

#!/usr/bin/python 
# -*- coding: utf-8 -*- 
# pastebin.com/UwqUrHhh 
# output graph: i.imgur.com/u3v2OFt.png 

import math 
import random 
from operator import itemgetter 
from copy import deepcopy 
import numpy 
import pylab 

# depot and customers, [index, x, y, demand] 
depot = [0, 30.0, 40.0, 0] 
customers = [[1, 37.0, 52.0, 7], \ 
      [2, 49.0, 49.0, 30], [3, 52.0, 64.0, 16], \ 
      [4, 20.0, 26.0, 9], [5, 40.0, 30.0, 21], \ 
      [6, 21.0, 47.0, 15], [7, 17.0, 63.0, 19], \ 
      [8, 31.0, 62.0, 23], [9, 52.0, 33.0, 11], \ 
      [10, 51.0, 21.0, 5], [11, 42.0, 41.0, 19], \ 
      [12, 31.0, 32.0, 29], [13, 5.0, 25.0, 23], \ 
      [14, 12.0, 42.0, 21], [15, 36.0, 16.0, 10]] 
customerCount = 15 
vehicleCount = 3 
vehicleCapacity = 90 
assigned = [-1] * customerCount 

# number of clusters 
k_cluster = 0 
# binary matrix 
bin_matrix = [] 
# coordinate of centroids 
centroids = [] 
# total demand for each cluster, must be <= capacity 
tot_demand = [] 
# members of each cluster 
members = [] 
# coordinate of members of each cluster 
xy_members = [] 

def distance(p1, p2): 
    return math.sqrt((p1[0] - p2[0])**2 + (p1[1] - p2[1])**2) 

# capacitated k-means clustering 
# http://www.dcc.ufla.br/infocomp/artigos/v8.4/art07.pdf 
def cap_k_means(): 
    global k_cluster, bin_matrix, centroids, tot_demand 
    global members, xy_members, prev_members 

    # calculate number of clusters 
    tot_demand = sum([c[3] for c in customers]) 
    k_cluster = int(math.ceil(float(tot_demand)/vehicleCapacity)) 
    print 'k_cluster', k_cluster 

    # initial centroids = first sorted-customers based on demand 
    d_customers = sorted(customers, key=itemgetter(3), reverse=True) 
    centroids, tot_demand, members, xy_members = [], [], [], [] 
    for i in range(k_cluster): 
     centroids.append(d_customers[i][1:3]) # [x,y] 

     # initial total demand and members for each cluster 
     tot_demand.append(0) 
     members.append([]) 
     xy_members.append([]) 

    # binary matrix, dimension = customerCount-1 x k_cluster 
    bin_matrix = [[0] * k_cluster for i in range(len(customers))] 

    converged = False 
    while not converged: # until no changes in formed-clusters 
     prev_matrix = deepcopy(bin_matrix) 

     for i in range(len(customers)): 
      edist = [] # list of distance to clusters 

      if assigned[i] == -1: # if not assigned yet 
       # Calculate the Euclidean distance to each of k-clusters 
       for k in range(k_cluster): 
        p1 = (customers[i][1], customers[i][2]) # x,y 
        p2 = (centroids[k][0], centroids[k][1]) 
        edist.append((distance(p1, p2), k)) 

       # sort, based on closest distance 
       edist = sorted(edist, key=itemgetter(0)) 

      closest_centroid = 0 # first index of edist 
      # loop while customer[i] is not assigned 
      while assigned[i] == -1: 
       # calculate all unsigned customers (G)'s priority 
       max_prior = (0, -1) # value, index 
       for n in range(len(customers)): 
        pc = customers[n] 

        if assigned[n] == -1: # if unassigned 
         # get index of current centroid 
         c = edist[closest_centroid][1] 
         cen = centroids[c]  # x,y 

         # distance_cost/demand 
         p = distance((pc[1], pc[2]), cen)/pc[3] 

         # find highest priority 
         if p > max_prior[0]: 
          max_prior = (p, n) # priority,customer-index 

       # if highest-priority is not found, what should we do ??? 
       if max_prior[1] == -1: 
        break 

       # try to assign current cluster to highest-priority customer 
       hpc = max_prior[1] # index of highest-priority customer 
       c = edist[closest_centroid][1] # index of current cluster 

       # constraint, total demand in a cluster <= capacity 
       if tot_demand[c] + customers[hpc][3] <= vehicleCapacity: 
        # assign new member of cluster 
        members[c].append(hpc) # add index of customer 

        xy = (customers[hpc][1], customers[hpc][2]) # x,y 
        xy_members[c].append(xy) 

        tot_demand[c] += customers[hpc][3] 
        assigned[hpc] = c # update cluster to assigned-customer 

        # update binary matrix 
        bin_matrix[hpc][c] = 1 

       # if customer is not assigned then, 
       if assigned[i] == -1: 
        if closest_centroid < len(edist)-1: 
         # choose the next nearest centroid 
         closest_centroid += 1 

        # if run out of closest centroid, what must we do ??? 
        else: 
         break # exit without centroid ??? 

      # end while 
     # end for 

     # Calculate the new centroid from the formed clusters 
     for j in range(k_cluster): 
      xj = sum([cn[0] for cn in xy_members[j]]) 
      yj = sum([cn[1] for cn in xy_members[j]]) 
      xj = float(xj)/len(xy_members[j]) 
      yj = float(yj)/len(xy_members[j]) 
      centroids[j] = (xj, yj) 

     # calculate converged 
     converged = numpy.array_equal(numpy.array(prev_matrix), numpy.array(bin_matrix)) 
    # end while 

def clustering(): 
    cap_k_means() 

    # debug plot 
    idx = numpy.array([c for c in assigned]) 
    xy = numpy.array([(c[1], c[2]) for c in customers]) 

    COLORS = ["Blue", "DarkSeaGreen", "DarkTurquoise", 
      "IndianRed", "MediumVioletRed", "Orange", "Purple"] 

    for i in range(min(idx), max(idx)+1): 
     clr = random.choice(COLORS) 
     pylab.plot(xy[idx==i, 0], xy[idx==i, 1], color=clr, \ 
      linestyle='dashed', \ 
      marker='o', markerfacecolor=clr, markersize=8) 
    pylab.plot(centroids[:][0], centroids[:][1], '*k', markersize=12) 
    pylab.plot(depot[1], depot[2], 'sk', markersize=12) 

    for i in range(len(idx)): 
     pylab.annotate(str(i), xy[i]) 

    pylab.savefig('clust1.png') 
    pylab.show() 

    return idx 

def main(): 
    idx = clustering() 
    print 'idx', idx 
    print 'centroids', centroids 
    print 'members', members 
    print 'demands', tot_demand 

if __name__ == '__main__': 
    main()

출처

2013-08-01 silo

나는 downvote에 대한 충동에 저항하고 있습니다 - 이것은 너무 많은 정보입니다. 할당되지 않은 색인 1을 분류하면 해결할 수 있습니까? – doctorlove

@doctorlove 답변을 주셔서 대단히 감사합니다. 위의 "UPDATE (업데이트)"섹션 바로 뒤의 질문을 업데이트했습니다. (귀하의 의견을 오해 할 가능성이있어서 죄송합니다. 저는 영어 원어민이 아니므로, Google이 구문을 번역 한 후에도 "정렬을 위해 해결"의 의미에 대해 의문의 여지가 없습니다. 매우 유감 스럽습니다, 감사합니다.) – silo

s/정렬 "/"문제를 다루십시오 " – doctorlove

총 수요가 전체 용량에 가까울 때이 문제는 bin packing의 양상을 나타 내기 시작합니다. 발견 한 것처럼이 특정 알고리즘의 욕심 많은 접근 방식은 항상 성공적이라고 할 수 없습니다. 나는 저자가 그것을 인정했는지 여부를 모르지만, 그렇지 않은 경우, 검토자가 그것을 잡았어야했다.

이 알고리즘과 같은 것을 계속하려면 integer programming을 사용하여 중도 자에게 요청자를 할당 해 봅니다.

출처

2013-08-01 14:01:30

나는 그것을 이해한다. 고맙습니다. 그런 다음 혼합 정수 프로그래밍으로 전환하겠습니다. – silo

모든 세부 사항을 거치지 않고, 당신이 인용 논문은 섹션의 끝 부분에있는 알고리즘에

if ri is not assigned then 
    choose the next nearest centroid 
end if

를 말한다 5.

다음 가장 가까운 중심이 있어야합니다. 두 개가 등거리라면 나는 당신이 선택한 것이 중요하지 않다고 추정합니다.

출처

2013-08-01 12:39:53 doctorlove

고맙습니다. 다음 가장 가까운 중심을 목록에 저장합니다. 가장 근접한 중심이 3 개까지 있습니다. 필자의 경우, 고객 [1]은 아직 할당되지 않은 상태에서 다음 가장 가까운 센트로가 3 개 선택되었습니다. 우선 순위가 높은 "자체 및 기타"할당되지 않은 고객을 기반으로 "교차"할당을 수행하므로 고객 [1]은 더 이상 우선 순위가 없어 "다음"이 없을 때까지 할당 할 기회가 없습니다. 가장 가까운 중심 "(이미 가장 가까운 중심의 세 번째 색인에 부딪쳤다). 감사. – silo

아우슈비주 - 알 고리즘의 변종을 아는 다른 사람이 도움을 받아 도움을 줄 수 있기를 바랍니다. – doctorlove

내 문제를 조사하고 답변하고 댓글을 달았 기 때문에 고맙습니다. 정말 고맙습니다. – silo

커패시턴스가있는 k-means 클러스터링?

답변

관련 문제