2017-11-20 7 views

Nan in summary histogram for: FirstStageFeatureExtractor

I am trying to run the TensorFlow Object Detection API. I used labelImg to create a dataset with 1 class and then converted the XML to a TFRecord file. Some information:

os: Ubuntu 16.04 
gpu: nvidia geforce 1080Ti & 1060 
tensorflow version: 1.3.0 
training model: faster_rcnn_resnet101_coco (although I have tried others) 
Classes: 1 

I run train.py and training starts, but it then fails with the following output:

INFO:tensorflow:global step 363: loss = 1.4006 (0.294 sec/step) 
INFO:tensorflow:Finished training! Saving model to disk. 
Traceback (most recent call last): 
    File "object_detection/train.py", line 163, in <module> 
    tf.app.run() 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run 
    _sys.exit(main(_sys.argv[:1] + flags_passthrough)) 
    File "object_detection/train.py", line 159, in main 
    worker_job_name, is_chief, FLAGS.train_dir) 
    File "/home/ucfadng/tensorflow/models/research/object_detection/trainer.py", line 332, in train 
    saver=saver) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 767, in train 
    sv.stop(threads, close_summary_writer=True) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 792, in stop 
    stop_grace_period_secs=self._stop_grace_secs) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join 
    six.reraise(*self._exc_info_to_raise) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 296, in stop_on_exception 
    yield 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 494, in run 
    self.run_loop() 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 994, in run_loop 
    self._sv.global_step]) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run 
    run_metadata_ptr) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run 
    feed_dict_tensor, options, run_metadata) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run 
    options, run_metadata) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call 
    raise type(e)(node_def, op, message) 
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1 
     [[Node: SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1/tag, SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma/read)]] 
     [[Node: Loss/RPNLoss/map/TensorArray_2/_1353 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_5239_Loss/RPNLoss/map/TensorArray_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]] 

Caused by op u'SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1', defined at: 
    File "object_detection/train.py", line 163, in <module> 
    tf.app.run() 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run 
    _sys.exit(main(_sys.argv[:1] + flags_passthrough)) 
    File "object_detection/train.py", line 159, in main 
    worker_job_name, is_chief, FLAGS.train_dir) 
    File "/home/ucfadng/tensorflow/models/research/object_detection/trainer.py", line 295, in train 
    global_summaries.add(tf.summary.histogram(model_var.op.name, model_var)) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/summary.py", line 192, in histogram 
    tag=tag, values=values, name=scope) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_logging_ops.py", line 129, in _histogram_summary 
    name=name) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op 
    op_def=op_def) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op 
    original_op=self._default_original_op, op_def=op_def) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__ 
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access 

InvalidArgumentError (see above for traceback): Nan in summary histogram for: SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1 
     [[Node: SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1/tag, SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma/read)]] 
     [[Node: Loss/RPNLoss/map/TensorArray_2/_1353 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_5239_Loss/RPNLoss/map/TensorArray_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]] 

My label map is as follows:

item { 
    id: 1 
    name: 'rail' 
} 

The pipeline config I used was downloaded from: https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/faster_rcnn_resnet101_coco.config

The error appears when I reach around step 355. I have looked at the TFRecord file and it looks as expected. I suspect this has something to do with my initial dataset, but what specifically causes this error?

My dataset consists of 250 clear images containing railway tracks. I labelled sections of the track so that each image has roughly 20 to 30 labelled objects.
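With this many hand-drawn boxes, one check that may be worth running before the XML is converted to TFRecord is a scan for degenerate or out-of-bounds boxes. The sketch below assumes the annotations are in the Pascal VOC XML layout that labelImg writes (`size/width`, `size/height`, `object/bndbox/xmin` and so on). The function name `find_bad_boxes` is hypothetical, and invalid boxes are only a commonly reported source of NaN values in training, not a confirmed cause of this particular error.

```python
import xml.etree.ElementTree as ET


def find_bad_boxes(xml_text):
    """Return (label, reason) pairs for suspicious boxes in one annotation.

    Assumes a Pascal VOC style annotation as produced by labelImg.
    """
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)

    problems = []
    for obj in root.findall("object"):
        name = obj.find("name").text
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # A box with no area cannot be normalised or matched sensibly.
        if xmin >= xmax or ymin >= ymax:
            problems.append((name, "zero or negative area"))
        # Coordinates must lie inside the image described by <size>.
        elif xmin < 0 or ymin < 0 or xmax > width or ymax > height:
            problems.append((name, "outside image bounds"))
    return problems
```

Calling `find_bad_boxes(open(path).read())` for each annotation file and printing any non-empty result would show which images need relabelling.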

The main things I have tried so far are lowering the learning rate and changing the batch size, but neither solved the problem.

Any help resolving this would be greatly appreciated.

Cheers

Answers


After updating to TensorFlow version 1.4.0, this is no longer a problem. I did not change anything else. Perhaps this was a bug in version 1.3.0. I will leave this here in case anyone else runs into the same problem.
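For anyone who wants to confirm which side of that upgrade an environment is on, here is a minimal sketch of a version comparison. The helper names `version_tuple` and `is_at_least` are hypothetical, and the 1.4.0 threshold simply reflects the experience described above, not a documented fix.

```python
import re


def version_tuple(version):
    """Parse the leading 'major.minor.patch' of a version string into ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return tuple(int(part) for part in match.groups())


def is_at_least(installed, required="1.4.0"):
    """Return True if `installed` is the same as or newer than `required`."""
    return version_tuple(installed) >= version_tuple(required)
```

With TensorFlow installed, `is_at_least(tf.__version__)` after `import tensorflow as tf` reports whether the environment is on 1.4.0 or later.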