How to implement momentum-based stochastic gradient descent (SGD)

I am using the Python code network3.py (http://neuralnetworksanddeeplearning.com/chap6.html) to develop convolutional neural networks. So far I have only coded SGD from scratch (without Theano). Now I want to modify the code a bit by adding a momentum learning rule as follows:

velocity = momentum_constant * velocity - learning_rate * gradient
params = params + velocity

Does anyone know how to do that? In particular, how do I set up or initialize the velocity? I post the code for SGD below:

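# Excerpt: methods of the Network class in network3.py (Python 2, Theano).
# np, theano, T and the size() helper are defined at module level in that file.
def __init__(self, layers, mini_batch_size):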
    """Takes a list of `layers`, describing the network architecture, and 
    a value for the `mini_batch_size` to be used during training 
    by stochastic gradient descent. 

    """ 
    self.layers = layers 
    self.mini_batch_size = mini_batch_size 
    self.params = [param for layer in self.layers for param in layer.params] 
    self.x = T.matrix("x") 
    self.y = T.ivector("y") 
    init_layer = self.layers[0] 
    init_layer.set_inpt(self.x, self.x, self.mini_batch_size) 
    for j in xrange(1, len(self.layers)):
        prev_layer, layer = self.layers[j-1], self.layers[j]
        layer.set_inpt(
            prev_layer.output, prev_layer.output_dropout, self.mini_batch_size)
    self.output = self.layers[-1].output 
    self.output_dropout = self.layers[-1].output_dropout 


def SGD(self, training_data, epochs, mini_batch_size, eta, 
     validation_data, test_data, lmbda=0.0): 
    """Train the network using mini-batch stochastic gradient descent.""" 
    training_x, training_y = training_data 
    validation_x, validation_y = validation_data 
    test_x, test_y = test_data 

    # compute number of minibatches for training, validation and testing 
    num_training_batches = size(training_data)/mini_batch_size 
    num_validation_batches = size(validation_data)/mini_batch_size 
    num_test_batches = size(test_data)/mini_batch_size 

    # define the (regularized) cost function, symbolic gradients, and updates 
    l2_norm_squared = sum([(layer.w**2).sum() for layer in self.layers]) 
    # Parentheses instead of a trailing backslash: a space after "\" is a syntax error.
    cost = (self.layers[-1].cost(self) +
            0.5*lmbda*l2_norm_squared/num_training_batches)
    grads = T.grad(cost, self.params) 
    updates = [(param, param-eta*grad) 
       for param, grad in zip(self.params, grads)] 

    # define functions to train a mini-batch, and to compute the 
    # accuracy in validation and test mini-batches. 
    i = T.lscalar() # mini-batch index 
    train_mb = theano.function(
     [i], cost, updates=updates, 
     givens={ 
      self.x: 
      training_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size], 
      self.y: 
      training_y[i*self.mini_batch_size: (i+1)*self.mini_batch_size] 
     }) 
    validate_mb_accuracy = theano.function(
     [i], self.layers[-1].accuracy(self.y), 
     givens={ 
      self.x: 
      validation_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size], 
      self.y: 
      validation_y[i*self.mini_batch_size: (i+1)*self.mini_batch_size] 
     }) 
    test_mb_accuracy = theano.function(
     [i], self.layers[-1].accuracy(self.y), 
     givens={ 
      self.x: 
      test_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size], 
      self.y: 
      test_y[i*self.mini_batch_size: (i+1)*self.mini_batch_size] 
     }) 
    self.test_mb_predictions = theano.function(
     [i], self.layers[-1].y_out, 
     givens={ 
      self.x: 
      test_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size] 
     }) 
    # Do the actual training 
    best_validation_accuracy = 0.0 
    for epoch in xrange(epochs):
        for minibatch_index in xrange(num_training_batches):
            iteration = num_training_batches*epoch+minibatch_index
            if iteration % 1000 == 0:
                print("Training mini-batch number {0}".format(iteration))
            cost_ij = train_mb(minibatch_index)
            if (iteration+1) % num_training_batches == 0:
                validation_accuracy = np.mean(
                    [validate_mb_accuracy(j) for j in xrange(num_validation_batches)])
                print("Epoch {0}: validation accuracy {1:.2%}".format(
                    epoch, validation_accuracy))
                if validation_accuracy >= best_validation_accuracy:
                    print("This is the best validation accuracy to date.")
                    best_validation_accuracy = validation_accuracy
                    best_iteration = iteration
                    if test_data:
                        test_accuracy = np.mean(
                            [test_mb_accuracy(j) for j in xrange(num_test_batches)])
                        print('The corresponding test accuracy is {0:.2%}'.format(
                            test_accuracy))
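
The two-line rule from the question can be checked on its own in plain NumPy before wiring it into Theano. This is a minimal sketch, assuming a toy objective f(p) = 0.5 * ||p||^2 (whose gradient is simply p) and illustrative values learning_rate=0.1, momentum_constant=0.9; the function name momentum_sgd is hypothetical. The velocity is initialized to zeros with the same shape as the parameters, a common choice that makes the first step identical to plain SGD.

```python
import numpy as np

def momentum_sgd(grad_fn, params, learning_rate=0.1, momentum_constant=0.9, steps=200):
    """Hypothetical helper: classical momentum, i.e. the two-line rule from the question."""
    params = np.asarray(params, dtype=float)
    # Velocity: same shape as the parameters, initialized to zero, so the
    # very first step is an ordinary SGD step.
    velocity = np.zeros_like(params)
    for _ in range(steps):
        gradient = grad_fn(params)
        velocity = momentum_constant * velocity - learning_rate * gradient
        params = params + velocity
    return params

# Toy objective (assumed for illustration): f(p) = 0.5 * ||p||^2, whose gradient is p.
final = momentum_sgd(lambda p: p, [5.0, -3.0])
print(final)  # close to [0, 0]
```

With one step the result is [4.5, -2.7], the same as a plain SGD step, because the velocity starts at zero.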

2016-10-04 jingweimo

Judging from your code, you need to:

1) initialize a bunch of velocities (one per gradient),

2) include the velocities in the updates:

updates = [(param, param - eta*grad + momentum_constant*vel)
           for param, grad, vel in zip(self.params, grads, velocities)]

3) modify the training function to return the gradients on each iteration, so you can update the velocities.
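
The three steps can be sketched without Theano, closer to the from-scratch SGD the asker mentions. This is a minimal NumPy sketch under stated assumptions: mse_grads is a hypothetical stand-in for T.grad(cost, self.params) on a toy linear least-squares model, and train_step plays the role of train_mb. The key detail is that the step-2 update uses the velocity from the previous iteration, so param - eta*grad + momentum_constant*vel equals param + new_velocity, which is exactly the question's rule.

```python
import numpy as np

def mse_grads(params, x, y):
    """Hypothetical stand-in for T.grad(cost, self.params): gradients of
    0.5 * mean(||x @ w + b - y||^2) with respect to w and b."""
    w, b = params
    err = x @ w + b - y
    n = x.shape[0]
    return [x.T @ err / n, err.sum(axis=0) / n]

def train_step(params, velocities, x, y, eta, momentum_constant):
    """Analogue of train_mb: applies the step-2 update in place and returns
    the gradients (step 3) so the caller can update the velocities."""
    grads = mse_grads(params, x, y)
    for param, grad, vel in zip(params, grads, velocities):
        # Uses the velocity from the previous iteration.
        param += -eta * grad + momentum_constant * vel
    return grads

def train_with_momentum(params, x, y, eta=0.1, momentum_constant=0.9, steps=300):
    # Step 1: one zero-initialized velocity per parameter (i.e. per gradient).
    velocities = [np.zeros_like(p) for p in params]
    for _ in range(steps):
        grads = train_step(params, velocities, x, y, eta, momentum_constant)
        # Step 3, continued: velocity = momentum_constant * velocity - eta * gradient.
        for vel, grad in zip(velocities, grads):
            vel *= momentum_constant
            vel -= eta * grad
    return params

# Usage on synthetic data (assumed): recover w_true and b_true of a linear model.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
w_true, b_true = rng.normal(size=(3, 2)), rng.normal(size=2)
y = x @ w_true + b_true
w, b = train_with_momentum([np.zeros((3, 2)), np.zeros(2)], x, y)
print(np.abs(w - w_true).max(), np.abs(b - b_true).max())
```

In network3.py itself the velocities would presumably be Theano shared variables built from zero arrays shaped like each param.get_value(); that part is not shown here. Since Theano computes every pair in an updates list from the values before the call, adding (vel, momentum_constant*vel - eta*grad) to the same list should also work and would make returning the gradients unnecessary; check this against the theano.function documentation.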

2016-10-04 15:15:14 Paul

Could you write the code for 1) and 3) in your answer? I am only a beginner in Python. Is theano.function the training function you are referring to? Thanks! – jingweimo

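Your training function is a theano function, but not every theano function is a training function; the code you posted, for example, contains three that are not. This may seem harsh, but if you don't know how to initialize an array with zeros (step 1), you should a) forget about machine learning for a while and b) work through a tutorial on scientific programming with numpy. The same goes for theano. Otherwise everything else will be black magic (to you). – Paul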

This is my class assignment. I have to work hard to keep going! – jingweimo
