2016-06-03

Numpy: does vstack automatically detect that an index is out of range and correct it?

In the code below (the part I've marked "HERE"), when j is incremented, the slice of the list of folds (X_train_folds[j+1:]) can run past the end of the list, so I'm puzzled about why it doesn't fail. Why does this work? Is it because vstack can automatically detect this and handle it? I couldn't find any documentation on it.

num_folds = 5 
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100] 

X_train_folds = [] 
y_train_folds = [] 
################################################################################ 
# Split up the training data into folds. After splitting, X_train_folds and # 
# y_train_folds should each be lists of length num_folds, where    # 
# y_train_folds[i] is the label vector for the points in X_train_folds[i].  # 
# Hint: Look up the numpy array_split function.        # 
################################################################################ 
X_train_folds = np.array_split(X_train, num_folds) 
y_train_folds = np.array_split(y_train, num_folds) 

# print y_train_folds 

# A dictionary holding the accuracies for different values of k that we find 
# when running cross-validation. After running cross-validation, 
# k_to_accuracies[k] should be a list of length num_folds giving the different 
# accuracy values that we found when using that value of k. 
k_to_accuracies = {} 

################################################################################ 
# Perform k-fold cross validation to find the best value of k. For each  # 
# possible value of k, run the k-nearest-neighbor algorithm num_folds times, # 
# where in each case you use all but one of the folds as training data and the # 
# last fold as a validation set. Store the accuracies for all fold and all  # 
# values of k in the k_to_accuracies dictionary.        # 
################################################################################ 

for k in k_choices: 
    k_to_accuracies[k] = [] 

for k in k_choices:
    print('evaluating k=%d' % k)
    for j in range(num_folds):
        # Concatenate every fold except fold j to form the training set.
        X_train_cv = np.vstack(X_train_folds[0:j] + X_train_folds[j+1:])  # <-------------- HERE
        X_test_cv = X_train_folds[j]

        # print(len(y_train_folds), y_train_folds[0].shape)

        y_train_cv = np.hstack(y_train_folds[0:j] + y_train_folds[j+1:])  # <-------------- HERE
        y_test_cv = y_train_folds[j]

        # print('Training data shape: ', X_train_cv.shape)
        # print('Training labels shape: ', y_train_cv.shape)
        # print('Test data shape: ', X_test_cv.shape)
        # print('Test labels shape: ', y_test_cv.shape)

        classifier.train(X_train_cv, y_train_cv)
        dists_cv = classifier.compute_distances_no_loops(X_test_cv)
        # print('predicting now')
        y_test_pred = classifier.predict_labels(dists_cv, k)
        num_correct = np.sum(y_test_pred == y_test_cv)
        accuracy = float(num_correct) / len(y_test_cv)  # divide by the fold's size, not num_test

        k_to_accuracies[k].append(accuracy)

################################################################################ 
#         END OF YOUR CODE        # 
################################################################################ 

# Print out the computed accuracies 
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (k, accuracy))

What are `X_train` and `y_train`? – miradulo


It's because you are using a slice. `np.arange(5)[:10]` is fine. – hpaulj

Answer


No. vstack is not responsible for this; it is NumPy's indexing that is very forgiving. NumPy's internals are complex, and indexing sometimes returns a copy and other times a view, but in both cases the indexing machinery is what runs first. In particular, a slice that starts beyond the end of the array simply returns an empty array rather than raising an error; indexing with a single out-of-range integer, by contrast, does raise:

import numpy as np 

a = np.array([1, 2, 3]) 
print(a[10:])  # This returns an empty array
print(a[10])   # This raises an IndexError

The output is:

[]

Traceback (most recent call last):
  File "C:/Users/imactuallyavegetable/temp.py", line 333, in <module>
    print(a[10])
IndexError: index 10 is out of bounds for axis 0 with size 3

First comes the empty array, then the exception, exactly matching the two print calls in the example above.
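The same forgiving behavior applies to plain Python lists, which is what the question's code actually slices: `X_train_folds` is a list of arrays, so `X_train_folds[j+1:]` with j at the last fold is just an empty list, and vstack never sees an out-of-range index. A minimal sketch with made-up fold data (the shapes here are hypothetical, chosen only for illustration):

```python
import numpy as np

# Six samples with two features each, split into 3 folds of 2 rows.
X = np.arange(12).reshape(6, 2)
folds = np.array_split(X, 3)

j = 2                        # index of the last fold
print(folds[j + 1:])         # [] -- slicing past the end gives an empty list
train = np.vstack(folds[0:j] + folds[j + 1:])
print(train.shape)           # (4, 2): only folds 0 and 1 are stacked
```

Concatenating with the empty list is a no-op, so vstack just receives the first j folds and stacks them normally.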


Wow, this really helps! Thank you so much! – kwotsin
