[TensorFlow] <legacy> input pipelines / Threading and Queue

input pipeline (guide)

An effective way to read files in TensorFlow is to build an input pipeline.

An input pipeline consists of the following steps.

```python
import glob

import tensorflow as tf  # TensorFlow 1.x queue-based API

# step 1: collect the list of file names
fnames = glob.glob("../sctf_asm/imgs/*")

# step 2: create a FIFO queue and fill it with the file names.
# This function also supports shuffling and an epoch limit.
fname_queue = tf.train.string_input_producer(fnames)

# step 3: choose a reader that matches the file format
reader = tf.WholeFileReader()
fname, content = reader.read(fname_queue)

# step 4: decode
image = tf.image.decode_png(content, channels=1)
# decode_image would be preferable here, but its result has shape=<unknown>,
# which raises "ValueError: 'images' contains no shape."

# step 5: optional preprocessing (resize, batch, ...)
image = tf.cast(image, tf.float32)
resized_image = tf.image.resize_images(image, [28, 28])
image_batch = tf.train.batch([resized_image], batch_size=5)
```

Because an input pipeline built this way is backed by queues, the QueueRunners must be started before anything can be read from it.

```python
sess = tf.Session()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

sess.run(resized_image)
```
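To see why the runners have to be started first, here is a minimal analogy that uses only Python's standard library rather than TensorFlow. The names `fname_queue` and `fill_queue` and the file names are hypothetical stand-ins. The point it illustrates is that a dequeue from an empty queue has nothing to return until some thread enqueues, which is the job the queue-runner threads do.

```python
import queue
import threading

# Hypothetical stand-in for the filename queue; not a TensorFlow object.
fname_queue = queue.Queue(maxsize=32)


def fill_queue(fnames):
    """Play the role of a queue-runner thread: enqueue every file name."""
    for name in fnames:
        fname_queue.put(name)


# Before any filler thread runs, a dequeue finds nothing and times out.
try:
    fname_queue.get(timeout=0.1)
    dequeued_before_start = True
except queue.Empty:
    dequeued_before_start = False

# Starting the filler thread is the analogue of tf.train.start_queue_runners().
filler = threading.Thread(target=fill_queue, args=(["a.png", "b.png", "c.png"],))
filler.start()

# Now dequeues succeed, in FIFO order.
batch = [fname_queue.get(timeout=1.0) for _ in range(3)]
filler.join()
```

The sketch uses a timeout so that it terminates. In the TensorFlow pipeline, running a dequeue-dependent op without started runners would simply wait.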

Threading and Queue (guide)

QueueRunner (API)

Queues are TensorFlow's mechanism for computing tensors with multiple threads.

You can create a QueueRunner directly with `tf.train.QueueRunner` and handle enqueue and dequeue yourself, but usually a function such as `tf.train.string_input_producer()` returns a queue and automatically adds a QueueRunner to the current graph, so all that remains is to start the QueueRunners that were added.

```python
# qr is a tf.train.QueueRunner instance created beforehand
tf.train.add_queue_runner(qr)   # add the QueueRunner to the graph
tf.train.start_queue_runners()  # run the QueueRunners added to the graph as threads
```
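The "register automatically, start later" pattern can be sketched with the standard library alone. Everything here is a hypothetical stand-in, not TensorFlow code: `QUEUE_RUNNERS` plays the graph's collection of runners, `input_producer` plays a function like `tf.train.string_input_producer()`, and a `None` sentinel marks the end of the data where TensorFlow would close the queue and raise an out-of-range error instead.

```python
import queue
import threading

# Hypothetical stand-in for the graph's collection of queue runners.
QUEUE_RUNNERS = []


class MiniQueueRunner:
    """Holds the enqueue work for one queue until a thread runs it."""

    def __init__(self, q, items, num_epochs):
        self.q = q
        self.items = items
        self.num_epochs = num_epochs

    def run(self):
        for _ in range(self.num_epochs):
            for item in self.items:
                self.q.put(item)
        self.q.put(None)  # sentinel: no more data


def input_producer(items, num_epochs=1):
    """Return a queue and register its runner as a side effect."""
    q = queue.Queue()
    QUEUE_RUNNERS.append(MiniQueueRunner(q, items, num_epochs))
    return q


def start_queue_runners():
    """Start one thread per registered runner and return the threads."""
    threads = [threading.Thread(target=r.run) for r in QUEUE_RUNNERS]
    for t in threads:
        t.start()
    return threads


fname_queue = input_producer(["a.png", "b.png"], num_epochs=2)
threads = start_queue_runners()

received = []
while True:
    item = fname_queue.get(timeout=1.0)
    if item is None:
        break
    received.append(item)

for t in threads:
    t.join()
```

The caller never builds a runner explicitly. It only asks for a queue and later starts whatever was registered, which mirrors how the legacy pipeline is normally used.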

Coordinator

`tf.train.Coordinator()` returns a coordinator that is used to coordinate the termination of multiple threads.

Because queues run on multiple threads, a Coordinator is needed.

```python
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

sess.run(...)

# ask all the threads to stop
coord.request_stop()

# wait for all the threads to terminate
coord.join(threads)
```

When threads are created with a coordinator `coord`, the termination of every thread attached to that coordinator can be controlled at once.

Think of it as a shutdown group.

* Calling `coord.request_stop()` asks every thread to stop. After this call, `coord.should_stop()` returns `True` in every thread, so a thread can call `should_stop()` to check whether another thread has requested a stop.
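The stop protocol can be illustrated without TensorFlow. The sketch below is a hypothetical `MiniCoordinator` built on `threading.Event` that mirrors only the three methods discussed here. It is not the actual implementation of `tf.train.Coordinator`, and the `worker` function and its arguments are made up for the example.

```python
import threading
import time


class MiniCoordinator:
    """Hypothetical stand-in mirroring three tf.train.Coordinator methods."""

    def __init__(self):
        self._stop_event = threading.Event()

    def request_stop(self):
        self._stop_event.set()

    def should_stop(self):
        return self._stop_event.is_set()

    def join(self, threads):
        for t in threads:
            t.join()


def worker(coord, worker_id, stop_after, counts):
    """Loop until any thread has requested a stop."""
    steps = 0
    while not coord.should_stop():
        steps += 1
        if stop_after is not None and steps >= stop_after:
            coord.request_stop()  # one thread ends the whole group
        time.sleep(0.001)
    counts[worker_id] = steps


coord = MiniCoordinator()
counts = {}
threads = [
    # Only worker 0 ever requests a stop, after 5 steps.
    threading.Thread(target=worker, args=(coord, i, 5 if i == 0 else None, counts))
    for i in range(3)
]
for t in threads:
    t.start()
coord.join(threads)
```

Workers 1 and 2 never call `request_stop()` themselves. They exit only because `should_stop()` becomes true once worker 0 has asked the group to stop.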

Data types passed to the model

Whether one-hot encoding is enabled affects only `label`. `image` is always a `numpy.ndarray`, whichever option is chosen.

```python
>>> mnist
Datasets(train=...
>>> type(mnist.test.images)
>>> mnist_no_one_hot.test.labels[0]
>>> mnist.test.labels[0]
array([ 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.])
>>> mnist.test.images[0] # == mnist_no_one_hot.test.images[0]
array([ 0. , 0. , 0. , 0. , 0. ,
. . . . . . . . . . . . . .
0. , 0. , 0. , 0.47450984, 0.99607849,
0.81176478, 0.07058824, 0. , 0. , 0. ,
. . . . . . . . . . . . . .
0. , 0. , 0. , 0. ], dtype=float32)
```
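In the session above, `mnist.test.labels[0]` is a ten-element vector with a single 1 at index 7. A minimal pure-Python sketch of that encoding follows. The helper names `one_hot` and `from_one_hot` are hypothetical, and plain lists stand in for the NumPy arrays the dataset actually returns.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def from_one_hot(vec):
    """Recover the integer label: the index of the largest entry."""
    return vec.index(max(vec))


encoded = one_hot(7)
decoded = from_one_hot(encoded)
```

Taking the index of the largest entry is the same idea as applying an argmax to the label vector.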

To run a `prediction` on an arbitrary image, the image must first be converted to a `numpy.ndarray`.
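As a rough sketch of that conversion, assume the model expects the same layout as `mnist.test.images`: one flat vector per image with values between 0 and 1. The values printed above are consistent with 8-bit pixels divided by 255, for example 254 / 255 ≈ 0.99607849. The helper `to_model_input` and the sample image are hypothetical. The sketch stays in pure Python, so the final wrapping into a float32 `numpy.ndarray` is left as a comment.

```python
def to_model_input(pixels):
    """Flatten a 2-D grid of 0-255 integers into a flat list of floats in [0, 1]."""
    return [value / 255.0 for row in pixels for value in row]


# Hypothetical 28x28 grayscale image: all black except one bright pixel.
image = [[0] * 28 for _ in range(28)]
image[14][14] = 254

model_input = to_model_input(image)
# With NumPy available, the last step would be something like:
#   numpy.asarray(model_input, dtype=numpy.float32)
```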

License: Attribution, NonCommercial, ShareAlike
