Techniques of the Hikvision and Trimps-Soushen teams at ILSVRC2016

Hikvision

Ensemble A of 3 RPN and 6 FRCN models, mAP is 67 on val2

Our work on object detection is based on Faster R-CNN. We design and validate the following improvements:

* Better network. We find that the identity-mapping variant of ResNet-101 is superior for object detection over the original version.

* Better RPN proposals. A novel cascade RPN is proposed to refine the proposals' scores and locations. Constraining the neg/pos anchor ratio further increases proposal recall dramatically.

* Pretraining matters. We find that a pretrained global context branch increases mAP by over 3 points. Pretraining on the 1000-class LOC dataset increases mAP by a further ~0.5 points.

* Training strategies. To address the class imbalance problem, we design a balanced sampling strategy over the different classes. With balanced sampling, the provided negative training data can be safely added for training. Other training strategies, such as multi-scale training and online hard example mining, are also applied.

* Testing strategies. During inference, multi-scale testing, horizontal flipping and weighted box voting are applied.
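The description does not say how the weighted box voting is computed. As a minimal sketch, assuming the common scheme in which each box kept after NMS is replaced by the score-weighted mean of all candidate boxes that overlap it, it might look like the following. The function names `iou` and `box_vote` and the 0.5 IoU threshold are illustrative assumptions, not details from the Hikvision entry.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def box_vote(kept, candidates, iou_thresh=0.5):
    """Refine each kept box using overlapping candidate boxes.

    kept, candidates: lists of (box, score) with box = (x1, y1, x2, y2).
    The threshold and the score weighting are assumptions of this sketch.
    """
    refined = []
    for box, score in kept:
        # Candidates that overlap the kept box enough get a vote.
        voters = [(b, s) for b, s in candidates if iou(box, b) >= iou_thresh]
        total = sum(s for _, s in voters)
        if total == 0:
            refined.append((box, score))
            continue
        # Each coordinate becomes the score-weighted mean over the voters.
        voted = tuple(sum(b[i] * s for b, s in voters) / total for i in range(4))
        refined.append((voted, score))
    return refined
```

In this sketch `candidates` would hold the pre-NMS detections for one class, pooled over the scales and flips used at test time, and `kept` the boxes that survive NMS.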

The final mAP is 65.1 (single model) and 67 (ensemble of 6 models) on val2.
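The balanced sampling listed under the training strategies is not specified further. One common way to balance classes, shown here purely as an assumption, is to draw a class uniformly at random and then draw an image that contains it. The names `build_class_index` and `balanced_sample` are hypothetical.

```python
import random
from collections import defaultdict


def build_class_index(image_labels):
    """Map each class to the list of image ids that contain it.

    image_labels: dict of image_id -> iterable of class names in that image.
    """
    index = defaultdict(list)
    for image_id, classes in image_labels.items():
        for c in set(classes):
            index[c].append(image_id)
    return index


def balanced_sample(index, n, rng):
    """Draw n (class, image_id) pairs with every class equally likely."""
    classes = sorted(index)
    batch = []
    for _ in range(n):
        c = rng.choice(classes)            # uniform over classes, not images
        batch.append((c, rng.choice(index[c])))
    return batch
```

With this scheme a class that appears in 2 images is drawn as often as one that appears in 98, so images of rare classes are simply revisited more often.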

Object classification/localization (CLS-LOC)

A combination of 3 Inception networks and 3 residual networks is used to make the class prediction. For localization, the same Faster R-CNN configuration described above for DET is applied. On the validation set, the top-5 classification error rate is 3.46% and the localization error is 8.8%.

Trimps-Soushen

Object detection (DET)

We use several pre-trained models, including ResNet, Inception, and Inception-ResNet. Taking the predicted boxes from our best model as region proposals, we average the softmax scores and the box regression outputs across all models. Other improvements include annotation refinement, box voting, and feature maxout.
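Because every model scores the same set of region proposals, the averaging step can be done proposal by proposal. The following is a minimal sketch under that assumption. The function name `average_model_outputs` and the nested-list data layout are illustrative choices, not the team's code.

```python
def average_model_outputs(per_model_scores, per_model_boxes):
    """Average softmax scores and regressed boxes over models.

    per_model_scores[m][p]: list of class probabilities from model m
                            for shared proposal p.
    per_model_boxes[m][p]:  regressed box (x1, y1, x2, y2) from model m
                            for shared proposal p.
    Returns (avg_scores, avg_boxes), each indexed by proposal.
    """
    n_models = len(per_model_scores)
    n_props = len(per_model_scores[0])
    avg_scores, avg_boxes = [], []
    for p in range(n_props):
        n_classes = len(per_model_scores[0][p])
        # Unweighted mean of each class probability across models.
        avg_scores.append([
            sum(per_model_scores[m][p][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ])
        # Unweighted mean of each box coordinate across models.
        avg_boxes.append(tuple(
            sum(per_model_boxes[m][p][i] for m in range(n_models)) / n_models
            for i in range(4)
        ))
    return avg_scores, avg_boxes
```

The sketch averages one box per proposal. A detector that regresses a separate box per class would apply the same mean to each class's box.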

Object classification/localization (CLS-LOC)

We predict the class labels of each image using image classification models such as Inception, Inception-ResNet, ResNet, and Wide Residual Network (WRN). We then follow the Faster R-CNN framework to predict bounding boxes based on those labels. Results from multiple models are fused in different ways, using model accuracy as the weights.
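The description says only that results are fused "in different ways" with accuracy as the weight. One simple possibility, shown as an assumption, is a weighted average of the per-class probabilities, with each model's weight proportional to its validation accuracy. The names `fuse_by_accuracy` and `top_k` are hypothetical.

```python
def fuse_by_accuracy(probs_per_model, accuracies):
    """Weighted average of class probabilities across models.

    probs_per_model[m]: list of class probabilities from model m for one image.
    accuracies[m]:      validation accuracy of model m, used as its weight
                        (normalizing the weights to sum to 1 is an assumption).
    """
    total = sum(accuracies)
    weights = [a / total for a in accuracies]
    n_classes = len(probs_per_model[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, probs_per_model))
        for c in range(n_classes)
    ]


def top_k(probs, k=5):
    """Indices of the k highest-probability classes, best first."""
    return sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]
```

For example, two models with accuracies 0.9 and 0.6 receive weights 0.6 and 0.4, so the more accurate model has more influence on the fused top-5 list.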
