A generalized multiple instance learning algorithm for iterative distillation and cross-granular propagation of video annotations
Abstract
Video annotation is an expensive but necessary task for most vision and learning problems that require building models of visual semantics. It becomes prohibitively expensive when annotation must be performed at the finer-grained level of regions within a video. One way around this dilemma is to support annotation at a coarser granularity and then propagate it to the finer granularity in a concept-dependent way. In this paper we propose a new generalized multiple instance learning algorithm that can work with any underlying density modeling technique and helps propagate semantic concepts provided at the coarse granularity of video key-frames to finer-grained regions. Our experiments on the NIST TRECVID common annotation corpus show improvements in annotation propagation accuracy ranging from 3% to a dramatic 161%. ©2007 IEEE.