Object recognition is a well-investigated task. Nevertheless, some difficulties remain. For instance, packaging boxes may have a distinguishing feature on only one side. Classification becomes impossible when different objects show visually identical sides to the camera. A recent study proposes a novel active perception pipeline to solve the problem.

Object recognition. Image credit: Wikitude via Flickr, CC BY-SA 2.0

An image similarity metric based on the embedding of a denoising autoencoder is proposed. This score makes it possible to train classifiers correctly by excluding ambiguous views from the training data.
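The idea of filtering ambiguous views can be illustrated with a minimal sketch. It assumes that each view has already been encoded into an embedding vector, and it uses cosine similarity with a fixed threshold as a stand-in for the study's similarity score, whose exact form is not given here. The function names, the threshold value, and the toy embeddings are all hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between the rows of `embeddings`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def ambiguous_view_mask(embeddings, labels, threshold=0.95):
    """Mark views that look too similar to a view of a different class.

    Assumption: cosine similarity and a fixed threshold stand in for
    the similarity score described in the article.
    """
    labels = np.asarray(labels)
    sim = cosine_similarity_matrix(np.asarray(embeddings, dtype=float))
    different_class = labels[:, None] != labels[None, :]
    # Same-class pairs are ignored: only cross-class similarity matters.
    cross_sim = np.where(different_class, sim, -np.inf)
    return cross_sim.max(axis=1) > threshold


# Toy data: two boxes that share one near-identical side.
embeddings = np.array([
    [1.0, 0.0],    # box_a, shared side
    [0.0, 1.0],    # box_a, distinctive side
    [1.0, 0.01],   # box_b, shared side
    [-1.0, 0.0],   # box_b, distinctive side
])
labels = ["box_a", "box_a", "box_b", "box_b"]

mask = ambiguous_view_mask(embeddings, labels)
train_embeddings = embeddings[~mask]  # keep only the unambiguous views
```

In this toy example the two shared-side views are flagged and dropped, so a classifier would be trained only on the two distinctive views.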

An active perception framework selects the next best viewpoint to acquire a non-ambiguous, classifiable view. For instance, a robot could look at the other side of the object to avoid ambiguity. Experiments with various household objects showed that the approach is feasible and performs well.
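One way to picture viewpoint selection is the sketch below. It is not the study's algorithm: it assumes that embeddings of every candidate object are available for a fixed set of candidate viewpoints (for example, from scans), and it simply picks the viewpoint at which the most similar pair of objects is least similar. The function name and the array layout are hypothetical.

```python
import numpy as np


def next_best_viewpoint(view_embeddings):
    """Pick the viewpoint at which candidate objects look least alike.

    view_embeddings has shape (n_viewpoints, n_objects, dim); entry
    [v, k] is the embedding of candidate object k seen from viewpoint v.
    Returns (viewpoint index, worst-case cosine similarity there).
    Assumption: cosine similarity stands in for the article's score.
    """
    emb = np.asarray(view_embeddings, dtype=float)
    norms = np.linalg.norm(emb, axis=2, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)
    # sim[v, i, j]: similarity between objects i and j at viewpoint v.
    sim = unit @ unit.transpose(0, 2, 1)
    off_diagonal = ~np.eye(emb.shape[1], dtype=bool)
    # Worst case per viewpoint: the most similar pair of distinct objects.
    worst_case = np.where(off_diagonal, sim, -np.inf).max(axis=(1, 2))
    best = int(np.argmin(worst_case))
    return best, float(worst_case[best])


# Toy data: three viewpoints, two candidate objects, 2-D embeddings.
views = np.array([
    [[1.0, 0.0], [1.0, 0.0]],  # viewpoint 0: objects look identical
    [[1.0, 0.0], [1.0, 1.0]],  # viewpoint 1: partially distinguishable
    [[1.0, 0.0], [0.0, 1.0]],  # viewpoint 2: clearly distinguishable
])

best, score = next_best_viewpoint(views)
```

Here viewpoint 2 is chosen, since it is the only one where the two objects' embeddings do not overlap. A real system would also need to map the selected index to camera coordinates reachable by the robot, which this sketch leaves out.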

Recent visual pose estimation and tracking solutions provide notable results on popular datasets such as T-LESS and YCB. However, in the real world, we can find ambiguous objects that do not allow exact classification and detection from a single view. In this work, we propose a framework that, given a single view of an object, provides the coordinates of a next viewpoint to discriminate the object against similar ones, if any, and eliminate ambiguities. We also describe a complete pipeline from a real object’s scans to the viewpoint selection and classification. We validate our approach with a Franka Emika Panda robot and common household objects that exhibit such ambiguities. We have released the source code to reproduce our experiments.