2016-10-03

Using the LeaveOneGroupOut strategy with sklearn's cross_val_score

I want to evaluate a model using the LeaveOneGroupOut strategy. According to sklearn's documentation, cross_val_score looks convenient for this.

However, the following code does not work.

import sklearn.svm
from sklearn import datasets
from sklearn.model_selection import LeaveOneGroupOut, ShuffleSplit, cross_val_score

iris = datasets.load_iris()
clf = sklearn.svm.SVC(kernel='linear', C=1)
# cv = ShuffleSplit(n_splits=3, test_size=0.3, random_state=0)  # => this works
cv = LeaveOneGroupOut()  # => this does not work
scores = cross_val_score(clf, iris.data, iris.target, cv=cv)

The error message is as follows:

ValueError        Traceback (most recent call last) 
<ipython-input-40-435a3a7fa16c> in <module>() 
     4 from sklearn.model_selection import cross_val_score 
     5 clf = sklearn.svm.SVC(kernel='linear', C=1) 
----> 6 scores = cross_val_score(clf, iris.data, iris.target, cv=LeaveOneGroupOut()) 
     7 scores 

/Users/xxx/.pyenv/versions/anaconda-2.0.1/lib/python2.7/site-packages/sklearn/model_selection/_validation.pyc in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch) 
    138            train, test, verbose, None, 
    139            fit_params) 
--> 140      for train, test in cv.split(X, y, groups)) 
    141  return np.array(scores)[:, 0] 
    142 

/Users/xxx/.pyenv/versions/anaconda-2.0.1/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable) 
    756    # was dispatched. In particular this covers the edge 
    757    # case of Parallel used with an exhausted iterator. 
--> 758    while self.dispatch_one_batch(iterator): 
    759     self._iterating = True 
    760    else: 

/Users/xxx/.pyenv/versions/anaconda-2.0.1/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in dispatch_one_batch(self, iterator) 
    601 
    602   with self._lock: 
--> 603    tasks = BatchedCalls(itertools.islice(iterator, batch_size)) 
    604    if len(tasks) == 0: 
    605     # No more tasks available in the iterator: tell caller to stop. 

/Users/xxx/.pyenv/versions/anaconda-2.0.1/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __init__(self, iterator_slice) 
    125 
    126  def __init__(self, iterator_slice): 
--> 127   self.items = list(iterator_slice) 
    128   self._size = len(self.items) 
    129 

/Users/xxx/.pyenv/versions/anaconda-2.0.1/lib/python2.7/site-packages/sklearn/model_selection/_validation.pyc in <genexpr>(***failed resolving arguments***) 
    135  parallel = Parallel(n_jobs=n_jobs, verbose=verbose, 
    136       pre_dispatch=pre_dispatch) 
--> 137  scores = parallel(delayed(_fit_and_score)(clone(estimator), X, y, scorer, 
    138            train, test, verbose, None, 
    139            fit_params) 

/Users/xxx/.pyenv/versions/anaconda-2.0.1/lib/python2.7/site-packages/sklearn/model_selection/_split.pyc in split(self, X, y, groups) 
    88   X, y, groups = indexable(X, y, groups) 
    89   indices = np.arange(_num_samples(X)) 
---> 90   for test_index in self._iter_test_masks(X, y, groups): 
    91    train_index = indices[np.logical_not(test_index)] 
    92    test_index = indices[test_index] 

/Users/xxx/.pyenv/versions/anaconda-2.0.1/lib/python2.7/site-packages/sklearn/model_selection/_split.pyc in _iter_test_masks(self, X, y, groups) 
    770  def _iter_test_masks(self, X, y, groups): 
    771   if groups is None: 
--> 772    raise ValueError("The groups parameter should not be None") 
    773   # We make a copy of groups to avoid side-effects during iteration 
    774   groups = np.array(groups, copy=True) 

ValueError: The groups parameter should not be None 
scores 

Answers


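You have not defined the groups parameter, which specifies the groups by which your data will be split. The error comes from cross_val_score, which takes this parameter as an argument; in your case it is None.

Try following the example below: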

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 2, 1, 2])
groups = np.array([1, 1, 2, 2])  # one group label per sample
lol = LeaveOneGroupOut()

Check the number of splits:

[In] lol.get_n_splits(X, y, groups) 
[Out] 2 

Then you can use:

lol.split(X, y, groups) 
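To make the behaviour of split concrete, here is a minimal sketch that reuses the toy arrays from this answer and prints the train/test indices of each fold. The variable names logo and splits are only illustrative; each fold holds out all samples of one group.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Same toy data as in the answer above
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 2, 1, 2])
groups = np.array([1, 1, 2, 2])  # one group label per sample

logo = LeaveOneGroupOut()

# split() yields (train_indices, test_indices); convert to lists for readability
splits = [(train.tolist(), test.tolist())
          for train, test in logo.split(X, y, groups)]

for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
# train: [2, 3] test: [0, 1]
# train: [0, 1] test: [2, 3]
```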

Can't LeaveOneGroupOut be made to work with cross_val_score? – rkjt50r983


@rkjt50r983 Define 'cv = LeaveOneGroupOut().split(X, y, groups)', then pass 'cv=cv' to 'cross_val_score()'. – Michael
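Putting the pieces together, the sketch below shows two ways this might look on the iris data from the question. The groups array is a made-up assumption (iris has no natural groups, so samples are assigned to five arbitrary groups for illustration only). The first call passes groups directly to cross_val_score, which accepts it as shown in the signature in the traceback; the second precomputes the splits as the comment above suggests.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Hypothetical grouping for illustration: five arbitrary groups of 30 samples
groups = np.arange(len(y)) % 5

clf = svm.SVC(kernel="linear", C=1)
logo = LeaveOneGroupOut()

# Option 1: pass the group labels so the splitter receives them
scores_a = cross_val_score(clf, X, y, groups=groups, cv=logo)

# Option 2: precompute the (train, test) index pairs and pass them as cv
cv = list(logo.split(X, y, groups))
scores_b = cross_val_score(clf, X, y, cv=cv)

print(scores_a)  # one accuracy score per held-out group
```

Materialising the splits with list() in the second option keeps cv reusable, since the generator returned by split can only be consumed once.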