PowerPoint: Query Learning for Fish Identification
In the fishery community, supervised image-classification approaches to species identification have been well studied. However, a classifier learned from one dataset usually cannot be applied directly to a new dataset, since data captured in different years or regions often differ substantially, for example in color, camera distortion, and species distribution. Labeling all of the new data to train new classifiers is impractical and hugely expensive. Hence, we propose a novel method that combines query learning and semi-supervised learning to address this challenge while requiring only a small amount of data to be labeled. First, an uncertainty measure is designed based on the distance between labeled and unlabeled samples in the feature representations transformed through support vector machine (SVM) classifiers. Then, diverse samples with large uncertainty are selected for labeling by a fast greedy search under the query learning framework. In addition, highly confident unlabeled samples are selected for re-training the classifier under a semi-supervised learning approach. With the proposed fully automatic method, identification accuracy increases by over 25% with only 5% additional human-labeled data on the NOAA chute data, which shows that query learning is an efficient way to address the challenge of species identification across different datasets.
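The selection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each sample is already represented by a vector in the SVM-transformed space (for example, per-class decision values), takes uncertainty to be the distance from an unlabeled sample to its nearest labeled sample, and swaps in a farthest-first greedy rule as a stand-in for the fast greedy search. The function names, the confidence threshold, and the toy data are all hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def uncertainty(sample, labeled):
    """Assumed uncertainty: distance to the nearest labeled sample."""
    return min(dist(sample, l) for l in labeled)

def select_queries(unlabeled, labeled, budget):
    """Greedily pick indices of samples that are uncertain and mutually diverse.

    Farthest-first rule (an assumption, not the paper's search): each step
    takes the sample farthest from everything labeled or already chosen.
    """
    score = [uncertainty(u, labeled) for u in unlabeled]
    chosen = []
    for _ in range(min(budget, len(unlabeled))):
        candidates = [i for i in range(len(unlabeled)) if i not in chosen]
        best = max(candidates, key=lambda i: score[i])
        chosen.append(best)
        # Samples close to the new pick become less worth querying.
        for i, u in enumerate(unlabeled):
            score[i] = min(score[i], dist(u, unlabeled[best]))
    return chosen

def select_confident(unlabeled, labeled, threshold):
    """Indices of unlabeled samples close enough to labeled data to pseudo-label."""
    return [i for i, u in enumerate(unlabeled)
            if uncertainty(u, labeled) <= threshold]

# Toy data in a hypothetical 2-D transformed feature space.
labeled = [(0.0, 0.0)]
unlabeled = [(0.1, 0.0), (5.0, 0.0), (5.1, 0.0), (0.0, 4.0)]

queries = select_queries(unlabeled, labeled, budget=2)          # [2, 3]
confident = select_confident(unlabeled, labeled, threshold=0.5)  # [0]
```

In this toy run, sample 1 is skipped even though it is far from the labeled data, because it lies next to the already chosen sample 2. That is the diversity effect the query step aims for. In a full loop, the queried samples would receive human labels, the confident ones would receive pseudo-labels, and the SVM would be retrained on both.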