TensorFlow CNN example: training accuracy unexpectedly drops from 1 to 0.06 after many iterations

The training accuracy unexpectedly drops from 1 to 0.06 after 26700 iterations. The code comes from the TensorFlow online documentation; I only changed the filter size from 5x5 to 3x3, the number of iterations from 20000 to 100000, and the batch size from 50 to 100. Can anybody explain this? ~~It may be related to AdamOptimizer, because the drop did not occur within 56200 iterations when I switched to GradientDescentOptimizer. I am not sure.~~ GradientDescentOptimizer has this problem too.

step 26400, training accuracy 1, loss 0.00202696 
step 26500, training accuracy 1, loss 0.0750173 
step 26600, training accuracy 1, loss 0.0790716 
step 26700, training accuracy 1, loss 0.0136688 
step 26800, training accuracy 0.06, loss nan 
step 26900, training accuracy 0.03, loss nan 
step 27000, training accuracy 0.12, loss nan 
step 27100, training accuracy 0.08, loss nan

Python code:

import tensorflow as tf 
from tensorflow.examples.tutorials.mnist import input_data 

def weight_varible(shape): 
    initial = tf.truncated_normal(shape, stddev=0.1) 
    return tf.Variable(initial) 

def bias_variable(shape): 
    initial = tf.constant(0.1, shape=shape) 
    return tf.Variable(initial) 

def conv2d(x, W): 
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') 

def max_pool_2x2(x): 
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME') 


mnist = input_data.read_data_sets("MNIST_data/", one_hot=True) 
print("Download Done!") 

sess = tf.InteractiveSession() 

# paras 
W_conv1 = weight_varible([3, 3, 1, 32]) 
b_conv1 = bias_variable([32]) 

# conv layer-1 
x = tf.placeholder(tf.float32, [None, 784]) 
x_image = tf.reshape(x, [-1, 28, 28, 1]) 

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1) 
h_pool1 = max_pool_2x2(h_conv1) 

# conv layer-2 
W_conv2 = weight_varible([3, 3, 32, 64]) 
b_conv2 = bias_variable([64]) 

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2) 
h_pool2 = max_pool_2x2(h_conv2) 

# full connection 
W_fc1 = weight_varible([7 * 7 * 64, 1204]) 
b_fc1 = bias_variable([1204]) 

h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64]) 
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1) 

# dropout 
keep_prob = tf.placeholder(tf.float32) 
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob) 

# output layer: softmax 
W_fc2 = weight_varible([1204, 10]) 
b_fc2 = bias_variable([10]) 

y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2) 
y_ = tf.placeholder(tf.float32, [None, 10]) 

# model training 
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv)) 
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy) 

correct_prediction = tf.equal(tf.arg_max(y_conv, 1), tf.arg_max(y_, 1)) 
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) 

saver = tf.train.Saver() 
sess.run(tf.initialize_all_variables()) 
for i in range(100000): 
    batch = mnist.train.next_batch(100) 

    if i % 10 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        train_cross_entropy = cross_entropy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g, loss %g" % (i, train_accuracy, train_cross_entropy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

# accuracy on test
save_path = saver.save(sess, "./mnist.model") 
#saver.restore(sess,"./mnist.model") 
print("Model saved in file: %s" % save_path) 
print("test accuracy %g"%(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})))

Source

2016-08-26 lariven

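I actually just ran into this problem with a CNN I was training: after a certain point in the optimization, everything went to nan. I think it is a numerical stability issue with the log in your cost function. When the network starts making predictions with high confidence (which becomes more likely as the network trains), the y_conv vector will look something like y_conv = [1, 0] (ignoring batching). That means log(y_conv) = log([1, 0]) = [0, -inf]. Say [1, 0] is also the correct label; then y_ * tf.log(y_conv) is really computing [1, 0] * [0, -inf] = [0, nan], because 0 times infinity is undefined. Summing these terms gives a nan cost. I think you can fix the problem by adding a small epsilon inside the log. In my case, switching to tf.nn.sparse_softmax_cross_entropy_with_logits(...) seems to have fixed it, since it appears to take care of the numerical issues.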
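To make the failure mode concrete, here is a minimal sketch in plain Python (no TensorFlow needed) of how a saturated prediction turns the loss into nan, and how an epsilon inside the log avoids it. The helper names and the value `eps=1e-10` are illustrative assumptions, not taken from the question or the answer.

```python
import math

NEG_INF = float("-inf")

def ieee_log(p):
    """Natural log that returns -inf at 0, the way floating-point tensor ops do.

    (Python's math.log raises ValueError for 0 instead.)
    """
    return math.log(p) if p > 0.0 else NEG_INF

def naive_cross_entropy(y_true, y_pred):
    # Mirrors -tf.reduce_sum(y_ * tf.log(y_conv)) for a single example.
    return -sum(t * ieee_log(p) for t, p in zip(y_true, y_pred))

def eps_cross_entropy(y_true, y_pred, eps=1e-10):
    # eps is an assumed, illustrative value; it keeps the log argument above 0.
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, y_pred))

y_true = [1.0, 0.0]  # one-hot label
y_pred = [1.0, 0.0]  # saturated softmax output: fully confident and correct

naive_loss = naive_cross_entropy(y_true, y_pred)  # 1*0 + 0*(-inf) gives nan
eps_loss = eps_cross_entropy(y_true, y_pred)      # finite and very close to 0

print(naive_loss, eps_loss)
```

Once a single batch produces a nan loss, the gradients and then the weights become nan as well, which would be consistent with the accuracy collapsing to roughly chance level in the log above.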

Source

2016-08-26 10:30:41 chasep255

softmax_cross_entropy_with_logits() works well. After 50000 iterations I got 99.3% test accuracy. Computing the accuracy on the test set uses too much memory, though (about 5 GB). – lariven
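One plausible reason a loss that takes raw logits avoids the nan is that it can compute the log-softmax directly with the log-sum-exp trick, so it never takes the log of a probability that has underflowed to 0. The plain-Python sketch below illustrates that idea; that TensorFlow's op works exactly this way internally is an assumption, and the logit values are hypothetical.

```python
import math

def softmax(logits):
    # Subtract the max so exp() cannot overflow; small terms may underflow to 0.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    # log-sum-exp: log softmax(z)_i = z_i - (m + log(sum_j exp(z_j - m)))
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def cross_entropy_from_logits(y_true, logits):
    # Cross entropy computed from logits, never forming log(0).
    return -sum(t * lp for t, lp in zip(y_true, log_softmax(logits)))

logits = [800.0, 0.0]  # hypothetical, extremely confident logits
y_true = [1.0, 0.0]    # one-hot label

probs = softmax(logits)                           # second probability underflows to 0
loss = cross_entropy_from_logits(y_true, logits)  # finite, no nan

print(probs, loss)
```

Note that tf.nn.softmax_cross_entropy_with_logits expects the unnormalized logits, so in the question's code it would be fed tf.matmul(h_fc1_drop, W_fc2) + b_fc2 rather than the already-softmaxed y_conv.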
