Optimal sequential grouping for robust video scene detection using multiple modalities
Abstract
Video scene detection is the task of dividing a video into semantic sections. To perform this fundamental task, we propose a novel and effective method for the temporal grouping of shots into scenes using an arbitrary set of features computed from the video. We formulate video scene detection as a generic optimization problem that optimally groups shots into scenes, and propose an efficient procedure for solving this problem based on a novel dynamic programming scheme. This formulation directly yields a temporally consistent segmentation and has the advantage of being parameter-free, making it applicable across various domains. We provide detailed experimental results showing that our algorithm outperforms current state-of-the-art methods. To further assess the generality of the method, we present experiments evaluating different types of modalities and their applicability within this formulation.
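To make the sequential-grouping idea concrete, the sketch below shows a generic dynamic program that splits a sequence of shots into contiguous scenes by minimizing the total within-scene pairwise distance. This is only an illustration under simplifying assumptions, not the paper's exact formulation: the number of scenes K is assumed given here (the paper's method is parameter-free), and the cost function is a generic placeholder; the function names are hypothetical.

```python
"""
Illustrative sketch: optimal sequential grouping of shots into scenes via
dynamic programming, given a pairwise distance matrix D between shot features.
Assumes the number of scenes K is known (unlike the parameter-free method in
the paper) and uses a generic within-scene cost.
"""
import numpy as np


def within_cost(D, i, j):
    """Sum of pairwise distances among shots i..j (inclusive): one candidate scene."""
    block = D[i:j + 1, i:j + 1]
    return block.sum() / 2.0  # each pair counted once


def sequential_grouping(D, K):
    """Split shots 0..N-1 into K contiguous scenes minimizing total within-scene cost.

    Returns the index of the last shot in each scene.
    """
    N = D.shape[0]
    INF = float("inf")
    # cost[k][n]: minimal cost of grouping the first n shots into k scenes
    cost = np.full((K + 1, N + 1), INF)
    back = np.zeros((K + 1, N + 1), dtype=int)
    cost[0][0] = 0.0
    for k in range(1, K + 1):
        for n in range(k, N + 1):
            # the k-th scene covers shots m..n-1; try all valid start points m
            for m in range(k - 1, n):
                c = cost[k - 1][m] + within_cost(D, m, n - 1)
                if c < cost[k][n]:
                    cost[k][n] = c
                    back[k][n] = m
    # recover scene boundaries by backtracking
    bounds, n = [], N
    for k in range(K, 0, -1):
        bounds.append(n - 1)  # last shot of the k-th scene
        n = back[k][n]
    return sorted(bounds)


if __name__ == "__main__":
    # toy example: 8 shots whose 1-D features form two clear groups
    feats = np.array([[0.0], [0.1], [0.05], [0.08], [5.0], [5.1], [4.9], [5.05]])
    D = np.abs(feats - feats.T)
    print(sequential_grouping(D, K=2))  # expected boundaries: [3, 7]
```

Because the grouping is constrained to contiguous blocks of shots, the result is temporally consistent by construction, which mirrors the property claimed in the abstract.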