tensorflow 개체 검색 api를 실행하려고합니다. labelImg를 사용하여 1 클래스의 데이터 세트를 만든 다음 xml을 tfrecord 파일로 변환합니다. 일부 정보 :Nan의 요약 히스토그램 : FirstStageFeatureExtractor
os: Ubuntu 16.04
gpu: nvidia geforce 1080Ti & 1060
tensorflow version: 1.3.0
training model: faster_rcnn_resnet101_coco (although I have tried others)
Classes: 1
나는 train.py
을 실행하고 교육을 시작합니다.
INFO:tensorflow:global step 363: loss = 1.4006 (0.294 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
File "object_detection/train.py", line 163, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/train.py", line 159, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/home/ucfadng/tensorflow/models/research/object_detection/trainer.py", line 332, in train
saver=saver)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 767, in train
sv.stop(threads, close_summary_writer=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 792, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 296, in stop_on_exception
yield
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 494, in run
self.run_loop()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 994, in run_loop
self._sv.global_step])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1
[[Node: SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1/tag, SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma/read)]]
[[Node: Loss/RPNLoss/map/TensorArray_2/_1353 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_5239_Loss/RPNLoss/map/TensorArray_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op u'SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1', defined at:
File "object_detection/train.py", line 163, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/train.py", line 159, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/home/ucfadng/tensorflow/models/research/object_detection/trainer.py", line 295, in train
global_summaries.add(tf.summary.histogram(model_var.op.name, model_var))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/summary.py", line 192, in histogram
tag=tag, values=values, name=scope)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_logging_ops.py", line 129, in _histogram_summary
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Nan in summary histogram for: SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1
[[Node: SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1/tag, SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma/read)]]
[[Node: Loss/RPNLoss/map/TensorArray_2/_1353 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_5239_Loss/RPNLoss/map/TensorArray_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
내 라벨 맵은 다음과 같습니다 :
이 사용되는 파이프 라인에서 다운로드item {
id: 1
name: 'rail'
}
은 내가 TFRecord 살펴 있었다 내가 도착하면 ~ 단계 (355) 나는 오류 파일을 찾고 예상대로 보면 이것이 초기 데이터 세트와 관련이 있다고 생각할 수 있습니다. 그러나 구체적으로이 오류의 원인은 무엇입니까?
내 데이터 세트는 철도 트랙이 포함 된 250 개의 선명한 이미지로 구성됩니다. 각 이미지에 ~ 20/30 개의 레이블이 지정된 객체가 있도록 트랙의 섹션에 레이블을 지정했습니다.
지금까지 시도한 주요한 점은 학습 속도를 낮추고 배치 크기를 변경하는 것이지만 문제를 해결하지 못했습니다.
이 문제를 해결할 수있는 도움이 있으면 매우 감사하겠습니다.
건배