PhD Proposal: Efficient Detection and Retrieval in Images

Mahyar Najibi
05.04.2017 13:00 to 14:30
AVW 4172

Object detection and content-based image search are central to many computer vision applications. Since the successful re-emergence of deep convolutional neural networks in computer vision, the accuracy of object detectors has improved greatly. However, state-of-the-art detectors are usually computationally expensive, partly because they model detection as a two-stage problem. First, a proposal stage addresses the localization sub-problem by generating hundreds of candidate object locations. Then, a classifier verifies whether an object is present in each candidate bounding box and, if so, determines its type. We study generic object detection, salient object detection, and face detection, and show how each can be solved efficiently without the expensive proposal stage. For generic object detection, we propose a network that treats detection as a search in the space of all possible bounding boxes. Starting from a regular grid, our network iteratively moves and scales the initial boxes towards objects, leading to a 5x speed-up compared to the baseline. For salient object detection, we design an algorithm that generates, in one step, the same number of bounding boxes as there are salient objects, achieving state-of-the-art results while processing ~120 frames per second on a single GPU. For face detection, we propose a lightweight, single-stage, fully convolutional neural network that outperforms more computationally expensive face detectors. Finally, we explore the problem of large-scale image retrieval. Most fast image search approaches discretize the feature space (using hashing or coding methods) and deploy fast indexing structures. However, all of these methods assume that the underlying image dataset is fixed. This assumption is limiting in many online search applications. Re-training the models whenever the dataset changes is infeasible.
To this end, we propose a supervised incremental hashing technique that adapts to changes in the underlying data. Moreover, for unsupervised similarity-based search, we experimentally compare the effectiveness of n-ary and binary coding approaches in different search scenarios. Based on our findings, we propose a new n-ary encoder that improves performance under different search strategies.

Examining Committee:

Chair: Dr. Larry Davis

Dept rep: Dr. Tom Goldstein

Member: Dr. David Jacobs