
Tensorflow CNN model always predicts the same class

I have been trying to develop a CNN model for image classification, taking help from the following books on TensorFlow and machine intelligence:

  • Learning.TensorFlow.A.Guide.to.Building.Deep.Learning.Systems

  • TensorFlow for Machine Intelligence (Sam Abrahams)

For the last few weeks I have been trying to develop a good model, but it always makes the same prediction for every input. I have tried many different architectures, but no luck!

Recently I decided to test my model with the CIFAR-10 dataset, using the very model given in the Learning TensorFlow book. But even after training for 50K steps the result was the same: the same class predicted for every image.

Here are the highlights of my model and code:

1) The downloaded CIFAR-10 image set is converted into TFRecord files for the training and test sets, with labels (the label is the string name of each CIFAR-10 category in the TFRecord file).

2) Images are read back from the TFRecord files, and each string label is converted into int32 format (0-9, one integer per category).

3) Randomly shuffled batches of size 100 are created.

4) The training and test batches are passed to the network, which produces an output of size [batch_size, num_class].

5) The model is trained with the Adam optimizer and the softmax cross-entropy loss function (I tried the gradient descent optimizer as well).

6) I tried evaluating the model on test batches before and after training.

7) I re-ran the code several times over the whole dataset, but I get the same prediction every time (so I must be doing something wrong here). I would appreciate it if someone could help me with this problem.

Note: The way I convert the images and labels into TFRecords may be unusual, but I took the idea from the books mentioned above (a rough sketch of this kind of conversion follows).
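
For reference, here is a minimal sketch of what this conversion might look like. This is an assumption-laden illustration rather than my actual conversion script: it presumes the images are available as uint8 numpy arrays of shape [32, 32, 3], and the helper name write_images_to_tfrecord is made up; only the 'label' and 'image' feature keys are taken from the reader code below.

    import tensorflow as tf 

    # Hypothetical helper: serialize uint8 images of shape [32, 32, 3] and their 
    # string category labels (e.g. "airplane") into a single TFRecord file. 
    def write_images_to_tfrecord(images, labels, output_path): 
        writer = tf.python_io.TFRecordWriter(output_path) 
        for image, label in zip(images, labels): 
            # Both the label string and the raw pixel bytes are stored as bytes 
            # features, matching the tf.parse_single_example spec used when reading 
            example = tf.train.Example(features=tf.train.Features(feature={ 
                'label': tf.train.Feature(bytes_list=tf.train.BytesList(value=[label.encode('utf-8')])), 
                'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image.tobytes()])), 
            })) 
            writer.write(example.SerializeToString()) 
        writer.close() 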

My code for the problem:

  • Here is the code:

    import tensorflow as tf 
    import numpy as np 
    import datetime as dt 
    import PIL 
    
    # The glob module allows directory listing 
    import glob 
    import random 
    
    from itertools import groupby 
    from collections import defaultdict 
    
    H , W = 32 , 32  # Height and width of the image 
    C = 3     # Number of channels 
    
    
    sessInt = tf.InteractiveSession() 
    
    # Read file and return the batches of the input data 
    def get_Batches_From_TFrecord(tf_record_filenames_list, batch_size): 
        # Match and load all the tfrecords found in the specified directory 
        tf_record_filename_queue = tf.train.string_input_producer(tf_record_filenames_list) 
    
        # It may have more than one example in them. 
        tf_record_reader = tf.TFRecordReader() 
        tf_image_name, tf_record_serialized = tf_record_reader.read(tf_record_filename_queue) 
    
        # The label and image are stored as bytes but could be stored as int64 or float64 values in a 
        # serialized tf.Example protobuf. 
        tf_record_features = tf.parse_single_example(tf_record_serialized, 
                   features={'label': tf.FixedLenFeature([], tf.string), 
                      'image': tf.FixedLenFeature([], tf.string), }) 
    
        # Using tf.uint8 because all of the channel information is between 0-255 
        tf_record_image = tf.decode_raw(tf_record_features['image'], tf.uint8) 
    
        try: 
         # Reshape the flat decoded bytes back into an [H, W, C] image 
         tf_record_image = tf.reshape(tf_record_image, [H, W, C]) 

        except Exception: 
         print(tf_image_name) 
    
        tf_record_label = tf.cast(tf_record_features['label'], tf.string) 
    
        ''' 
        #Check the image and label 
    
        coord = tf.train.Coordinator() 
        threads = tf.train.start_queue_runners(sess=sessInt, coord=coord) 
    
        label = tf_record_label.eval().decode() 
        print(label) 
    
        image = PIL.Image.fromarray(tf_record_image.eval()) 
        image.show() 
    
        coord.request_stop() 
        coord.join(threads) 
        ''' 
    
        # creating a batch to feed the data 
    
        min_after_dequeue = 10 * batch_size 
        capacity = min_after_dequeue + 5 * batch_size 
    
        # Shuffle examples while feeding in the queue 
        image_batch, label_batch = tf.train.shuffle_batch([tf_record_image, tf_record_label], batch_size=batch_size, 
                     capacity=capacity, min_after_dequeue=min_after_dequeue) 
    
        # Sequential feed in the examples in the queue (Don't shuffle) 
        # image_batch, label_batch = tf.train.batch([tf_record_image, tf_record_label], batch_size=batch_size, capacity=capacity) 
    
        # Converting the images to a float to match the expected input to convolution2d 
        float_image_batch = tf.image.convert_image_dtype(image_batch, tf.float32) 
    
        string_label_batch = label_batch 
    
        return float_image_batch, string_label_batch 
    
    #Count the number of images in the tfrecord file 
    
    def number_of_records(tfrecord_file_name): 
        count = 0 
        record_iterator = tf.python_io.tf_record_iterator(path = tfrecord_file_name) 
        for record in record_iterator: 
         count+=1 
    
        return count 
    
    def get_num_of_samples(tfrecords_list): 
        total_samples = 0 
        for tfrecord in tfrecords_list: 
         total_samples += number_of_records(tfrecord) 
    
        return total_samples 
    
    # Provide the input tfrecord names in a list 
    train_filenames = ["./TFRecords/cifar_train.tfrecord"] 
    test_filename = ["./TFRecords/cifar_test.tfrecord"] 
    
    num_train_samples = get_num_of_samples(train_filenames) 
    num_test_samples = get_num_of_samples(test_filename) 
    
    
    print("Number of Training samples: ", num_train_samples) 
    print("Number of Test samples: ", num_test_samples) 
    
    
    ''' 
    IMP Note : (Batch_size * Training_Steps) should be at least greater than (2*Number_of_samples) for shuffling of batches 
    
    ''' 
    train_batch_size = 100 
    
    # Total number of batches for input records 
    # Note - Num of samples in the tfrecord file can be determined by the tfrecord iterator. 
    
    # Batch size for test samples 
    test_batch_size = 50 
    
    train_image_batch, train_label_batch = get_Batches_From_TFrecord(train_filenames, train_batch_size) 
    test_image_batch, test_label_batch = get_Batches_From_TFrecord(test_filename, test_batch_size) 
    
    
    # Definition of the convolution network which returns a single neuron for each input image in the batch 
    
    
    # Define a placeholder for keep probability in dropout 
    # (Dropout should only use while training, for testing dropout should be always 1.0) 
    
    fc_prob = tf.placeholder(tf.float32) 
    conv_prob = tf.placeholder(tf.float32) 
    
    #Helper function to add learned filters(images) into tensorboard summary - for a random input in the batch 
    def add_filter_summary(name, filter_tensor): 
    
        rand_idx = random.randint(0,filter_tensor.get_shape()[0]-1) # Choose a random index from [0, batch_size) (currently unused) 

        #display_filter = filter_tensor[random.randint(0,filter_tensor.get_shape()[3])] 

        display_filter = filter_tensor[5]  # keeping the index fixed for consistency in visualization 

        with tf.name_scope("Filter_Summaries"): 
         img_summary = tf.summary.image(name, tf.reshape(display_filter,[-1 , filter_tensor.get_shape()[1],filter_tensor.get_shape()[1],1]), max_outputs = 500) 
    
    
    # Helper functions for the network 
    
    def weight_initializer(shape): 
        weights = tf.truncated_normal(shape, stddev=0.1) 
        return tf.Variable(weights) 
    
    
    def bias_initializer(shape): 
        biases = tf.constant(0.1, shape=shape) 
        return tf.Variable(biases) 
    
    
    def conv2d(input, weights, stride): 
        return tf.nn.conv2d(input, filter=weights, strides=[1, stride, stride, 1], padding="SAME") 
    
    
    def pool_layer(input, window_size=2 , stride=2): 
        return tf.nn.max_pool(input, ksize=[1, window_size, window_size, 1], strides=[1, stride, stride, 1], padding='VALID') 
    
    
    # This is the actual layer we will use. 
    # Linear convolution as defined in conv2d, with a bias, 
    # followed by the ReLU nonlinearity. 
    def conv_layer(input, filter_shape , stride=1): 
        W = weight_initializer(filter_shape) 
        b = bias_initializer([filter_shape[3]]) 
        return tf.nn.relu(conv2d(input, W, stride) + b) 
    
    
    # A standard full layer with a bias. Notice that here we didn’t add the ReLU. 
    # This allows us to use the same layer for the final output, 
    # where we don’t need the nonlinear part. 
    def full_layer(input, out_size): 
        in_size = int(input.get_shape()[1]) 
        W = weight_initializer([in_size, out_size]) 
        b = bias_initializer([out_size]) 
        return tf.matmul(input, W) + b 
    
    ## Model from the book Learning TensorFlow - for CIFAR data 
    
    def conv_network(image_batch, batch_size): 
        # Now create the model which returns the output neurons (equal to the number of labels) 
        # as the final fully connected layer output, which we can use as input to the softmax classifier 
    
        C1 , C2 , C3 = 30 , 50, 80  # Number of output features for each convolution layer 
        F1 = 500      # Number of output neuron for FC1 layer 
    
        #Add original image to tensorboard summary 
    
        add_filter_summary("Original" , image_batch) 
    
        # First convolution layer: 3x3 filter size with C1 (=30) filters 
        conv1 = conv_layer(image_batch, filter_shape=[3, 3, C, C1]) 
        pool1 = pool_layer(conv1, window_size=2) 
    
        pool1 = tf.nn.dropout(pool1, keep_prob=conv_prob) 
    
        add_filter_summary("conv1" , pool1) 
    
        # Second convolution layer: 5x5 filter size with C2 (=50) filters 
        conv2 = conv_layer(pool1, filter_shape=[5, 5, C1, C2]) 
        pool2 = pool_layer(conv2, 2) 
        pool2 = tf.nn.dropout(pool2, keep_prob=conv_prob) 
    
        add_filter_summary("conv2" , pool2) 
    
        # Third convolution layer 
    
        conv3 = conv_layer(pool2, filter_shape=[5, 5, C2, C3]) 
    
        # Since at this point the feature maps are of size 8×8 (following the first two poolings 
        # that each reduced the 32×32 pictures by half on each axis). 
        # This last pool layer pools each of the feature maps and keeps only the maximal value. 
        # The number of feature maps at the third block was set to 80, 
        # so at that point (following the max pooling) the representation is reduced to only 80 numbers 
    
    
        pool3 = pool_layer(conv3, window_size = 8 , stride=8) 
        pool3 = tf.nn.dropout(pool3, keep_prob=conv_prob) 
    
        add_filter_summary("conv3" , pool3) 
    
        # Reshape the output to feed to the FC layer 
        flattened_layer = tf.reshape(pool3, [batch_size, 
                  -1]) # -1 means: use all the remaining dimensions of the input (other than batch_size) 

        fc1 = tf.nn.relu(full_layer(flattened_layer, F1)) 
    
        full1_drop = tf.nn.dropout(fc1, keep_prob=fc_prob) 
    
        # Fully connected layer 2 (output layer) 
        final_Output = full_layer(full1_drop, 10) 
    
        return final_Output, tf.summary.merge_all() 
    
    # Now that the architecture is created, the next step is to create the classification model 
    # (to predict the output class of the input data). 
    # This is a multi-class problem (10 classes), so softmax is the appropriate 
    # prediction function (a sigmoid/logistic output would only suit a two-class problem) 
    
    
    # Build the network outputs for the training and test input batches 
    Train_X , img_summary = conv_network(train_image_batch, train_batch_size) 
    Test_X , _ = conv_network(test_image_batch, test_batch_size) 
    
    # Generate 0 based index for labels 
    Train_Y = tf.to_int32(tf.argmax(
        tf.to_int32(tf.stack([tf.equal(train_label_batch, ["airplane"]), tf.equal(train_label_batch, ["automobile"]), 
              tf.equal(train_label_batch, ["bird"]),tf.equal(train_label_batch, ["cat"]), 
              tf.equal(train_label_batch, ["deer"]),tf.equal(train_label_batch, ["dog"]), 
              tf.equal(train_label_batch, ["frog"]),tf.equal(train_label_batch, ["horse"]), 
              tf.equal(train_label_batch, ["ship"]), tf.equal(train_label_batch, ["truck"]) ])), 0)) 
    
    Test_Y = tf.to_int32(tf.argmax(
         tf.to_int32(tf.stack([tf.equal(test_label_batch, ["airplane"]), tf.equal(test_label_batch, ["automobile"]), 
              tf.equal(test_label_batch, ["bird"]),tf.equal(test_label_batch, ["cat"]), 
              tf.equal(test_label_batch, ["deer"]),tf.equal(test_label_batch, ["dog"]), 
              tf.equal(test_label_batch, ["frog"]),tf.equal(test_label_batch, ["horse"]), 
              tf.equal(test_label_batch, ["ship"]), tf.equal(test_label_batch, ["truck"]) ])), 0)) 
    
    
    # Y = tf.reshape(float_label_batch, X.get_shape()) 
    
    
    # compute inference model over data X and return the result 
    # (using the softmax function, which is the appropriate prediction 
    # function for a multi-class problem like this one) 
    def inference(X): 
        return tf.nn.softmax(X) 
    
    
    # compute loss over training data X and expected outputs Y 
    # The cross-entropy function is better suited to this loss calculation than the squared error function 
    
    def loss(X, Y): 
        return tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=X, labels=Y)) 
    
    
    # train/adjust model parameters according to the computed total loss (using the Adam optimizer) 
    def train(total_loss, learning_rate): 
        return tf.train.AdamOptimizer(learning_rate).minimize(total_loss) 
    
    
    # evaluate the resulting trained model with dropout probability (Ideally 1.0 for testing) 
    def evaluate(sess, X, Y, dropout_prob): 
        # predicted = tf.cast(inference(X) > 0.5 , tf.float32) 
    
        #print("\nNetwork output:") 
        #print(sess.run(inference(X) , feed_dict={conv_prob:1.0 , fc_prob:1.0})) 
    
        # Inference contains the predicted probability of each class for each input image. 
        # The class having higher probability is the prediction of the network. y_pred_cls = tf.argmax(y_pred, dimension=1) 
        predicted = tf.cast(tf.argmax(X, 1), tf.int32) 
    
        #print("\npredicted labels:") 
        #print(sess.run(predicted , feed_dict={conv_prob:1.0 , fc_prob:1.0})) 
        #print("\nTrue Labels:") 
        #print(sess.run(Y , feed_dict={conv_prob:1.0 , fc_prob:1.0})) 
    
        batch_accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, Y), tf.float32)) 

        # calculate the mean of the per-batch accuracies over several iterations; 
        # the number of iterations should satisfy (test_batch_size * num_of_iterations) >= (2 * num_of_test_samples) 
        total_accuracy = np.mean([sess.run(batch_accuracy, feed_dict={conv_prob: dropout_prob, fc_prob: dropout_prob}) for i in range(250)]) 
    
        print("Accuracy of the model(in %): {:.4f} ".format(100 * total_accuracy)) 
    
    # create a saver class to save the training checkpoints 
    saver = tf.train.Saver(max_to_keep=10) 
    
    # Create tensorboard summary for the loss function 
    with tf.name_scope("summaries"): 
        loss_summary = tf.summary.scalar("loss", loss(Train_X, Train_Y)) 
    
    #merged = tf.summary.merge_all() 
    
    # Launch the graph in a session, setup boilerplate 
    with tf.Session() as sess: 
        log_writer = tf.summary.FileWriter('./logs', sess.graph) 
    
        total_loss = loss(Train_X, Train_Y) 
    
        train_op = train(total_loss, 0.001) 
    
        #Initialise all variables after defining all variables 
        tf.global_variables_initializer().run() 
    
        coord = tf.train.Coordinator() 
        threads = tf.train.start_queue_runners(sess=sess, coord=coord) 
    
        print(sess.run(Train_Y)) 
        print(sess.run(Test_Y)) 
    
        evaluate(sess, Test_X, Test_Y,1.0) 
    
    
        # actual training loop------------------------------------------------------ 
        training_steps = 50000 
        print("\nStarting to train model with", str(training_steps), " steps...") 
        to1 = dt.datetime.now() 
    
        for step in range(1, training_steps + 1): 
    
         # print(sess.run(train_label_batch)) 
         sess.run([train_op], feed_dict={fc_prob: 0.5 , conv_prob:0.8}) # Pass the dropout value for training batch to the placeholder 
    
         # for debugging and learning purposes, see how the loss gets decremented thru training steps 
    
         if step % 100 == 0: 
          # print("\n") 
          # print(sess.run(train_label_batch)) 
          loss_summaries, img_summaries , Tloss = sess.run([loss_summary, img_summary, total_loss], 
                 feed_dict={fc_prob: 0.5 , conv_prob:0.8}) # evaluate total loss to add it in summary object 
          log_writer.add_summary(loss_summaries, step) # add summary for each step 
          log_writer.add_summary(img_summaries, step) 
          print("Step:", step, " , loss: ", Tloss) 
    
         if step%2000 == 0: 
          saver.save(sess, "./Models/BookLT_CIFAR", global_step=step, latest_filename="model_chkpoint") 
          print("\n") 
          evaluate(sess, Test_X, Test_Y,1.0) 
    
        saver.save(sess, "./Models/BookLT_CIFAR", global_step=step, latest_filename="model_chkpoint") 
        to2 = dt.datetime.now() 
        print("\nTotal Training time elapsed: ", str(to2 - to1)) 
    
    
        # once the training is complete, evaluate the model with test (validation set)------------------------------------------- 
    
        # Restore the model file and perform the testing 
        #saver.restore(sess, "./Models/BookLT3_CIFAR-15000") 
    
        print("\nPost Training....") 
    
        # Performs Evaluation of model on batches of test samples 
        # In order to evaluate entire test set , number of iteration should be chosen such that , 
        # (test_batch_size * num_of_iteration) >= (2* num_of_test_samples) 
    
        evaluate(sess, Test_X, Test_Y,1.0) # Evaluate multiple batch of test data set (randomly chosen by shuffle train batch queue) 
        evaluate(sess, Test_X, Test_Y,1.0) 
        evaluate(sess, Test_X, Test_Y,1.0) 
    
        coord.request_stop() 
        coord.join(threads) 
        sess.close() 
    
    • Here is a screenshot of my pre-training results and of the results during training.

    • Here is a screenshot of my post-training results.

  • Answer


    I did not run your code to verify that this is the only problem, but here is one important one: for classification, you should use one-hot encoding for your labels. If you have 3 classes, you would have [1, 0, 0] for class 1, [0, 1, 0] for class 2, and [0, 0, 1] for class 3. Using 1, 2 and 3 as the labels leads to various problems; for example, the network is penalized more for predicting class 2 than for predicting class 1 on an image of class 3. TensorFlow functions like tf.nn.softmax_cross_entropy_with_logits work with these representations. Here is a basic example of using one_hot labels to compute the loss properly: https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/examples/tutorials/mnist/mnist_softmax.py

    And here is how one_hot labels are constructed for the mnist digits: https://github.com/tensorflow/tensorflow/blob/438604fc885208ee05f9eef2d0f2c630e1360a83/tensorflow/contrib/learn/python/learn/datasets/mnist.py#L69
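
    As a minimal sketch of that idea (the label values below are made up purely for illustration):

        import tensorflow as tf 

        # Made-up integer class indices for a batch of 3 images (0-9 for CIFAR-10) 
        int_labels = tf.constant([3, 0, 9]) 

        # tf.one_hot turns each index into a one-hot row vector, 
        # e.g. 3 -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0] 
        one_hot_labels = tf.one_hot(int_labels, depth=10) 

        # Stand-in for the output of the final fully connected layer 
        logits = tf.placeholder(tf.float32, shape=[None, 10]) 

        # Dense (one-hot) form of the softmax cross-entropy loss 
        loss = tf.reduce_mean( 
            tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_labels, logits=logits)) 

    (Note that tf.nn.sparse_softmax_cross_entropy_with_logits, which the question's code uses, accepts the integer indices directly and computes the same loss.)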


    Thanks for the example. I will convert the labels to one-hot encoded values. Apart from that, what else could be the problem? –


    That is the only thing I could find. – iga


    Thank you. I tried one-hot encoding for the labels, but I still cannot get the expected results. –
