Video shot boundary detection is the primary task for content-based video management and retrieval systems. This paper proposes a shot boundary detection strategy that exploits the normalized periodogram to represent video content efficiently. A normalized periodogram based distance metric for detecting key frames at shot boundaries, namely Distance-Left-Right (D_{LR}), is introduced; it is computed on a sliding sub-window basis. The D_{LR} sequence is used to detect suspected shot boundary frames, and a transition type detection procedure is applied to these suspected frames to discriminate between abrupt and gradual transitions. The proposed shot boundary detection methodology yields precision 95.02%, recall 93.15% and F1 score 94.07% for cut transitions; precision 86.57%, recall 86.67% and F1 score 86.61% for gradual transitions; and precision 90.6%, recall 90.02% and F1 score 90.3% for overall transitions. Experimental results show that the proposed approach is superior to recently available shot boundary detection techniques because of its robustness and simplicity, and that it provides an effective distance metric for detecting shot boundaries.

In this internet era, digital video plays a significant role in people's daily lives. Many practical applications such as video retrieval, video surveillance, video content analysis, video indexing and video skimming face a trade-off between complexity and accuracy. The diverse content of video makes video management a challenging task for multimedia researchers. Manual annotation of multimedia data is possible but highly time consuming, which creates the need for automatic vision algorithms that annotate multimedia databases over the Internet. Video Shot Boundary Detection (VSBD) has been widely accepted as a solution to this trade-off and as a means of structural analysis of video. Generally, the frames extracted at shot boundaries are few compared to the entire video content, yet they represent the video effectively. A set of frames captured by a single camera is termed a shot. A shot transition can be categorized as cut (abrupt) or smooth (gradual) based on the frames involved in the transition, as shown in

Prior work on shot boundary detection (SBD) mainly concentrates on abrupt boundaries, which are easy to detect because a sudden transition involves great discontinuity between adjacent frames. Most approaches compute a feature dissimilarity measure between adjacent frames and predict a cut transition when the measure exceeds a threshold. Compared to abrupt SBD, gradual SBD is more complex because it does not involve great discontinuity between consecutive frames. Gradual SBD algorithms should be robust to issues like camera and object motion. The overall research work carried out in VSBD can be categorized into pixel-wise, global, block-based and motion-activity-based techniques. Various methodologies like [

Block based approaches have been introduced to improve the SBD accuracy and reduce the computation time. All the approaches discussed so far involve features like moment invariants, local feature fusion, entropy, motion vectors, Visual Bag of Words, edge change ratio, feature points, etc. Detecting the gradual transition by predicting and training an appropriate model [

As motion is continuous along a shot, motion is also used as a cue to detect shot changes. As the camera and objects move smoothly within a shot, the resulting motion field within the shot is continuous. In [

Multiple features such as pixel-wise difference, color histogram and edge histogram are extracted from the video frames and fed as input to a machine learning classifier, such as a support vector machine, for transition classification [

One limitation of the various algorithms proposed for VSBD is the lack of a unified approach for detecting all types of transitions in various video streams such as video lectures, news, entertainment shows, sports and movies. Many algorithms proposed for detecting all types of transitions involve a tedious procedure and high computational cost. Most earlier SBD works are evaluated only on benchmark datasets and produce better results at high computational cost. Hence, the proposed methodology introduces a normalized periodogram distance based Left-Right (LR) ratio to detect abrupt as well as gradual shot boundaries in video, which is efficient and effective in terms of accuracy and computational cost. The main contributions of this work are:

1) A normalized periodogram distance metric based LR ratio is introduced to detect the shots in a given video.

2) The proposed methodology is evaluated on unconstrained videos, including news, entertainment shows, movies and sports, and on the TRECVID 2001 dataset.

This section elaborates the proposed normalized periodogram based D_{LR} metric for detecting abrupt and gradual transitions simultaneously. Given a video, the frames obtained by partitioning the video are indexed by k, and the power spectrum of each frame is estimated using the classical nonparametric periodogram method. The D_{LR} metric is then computed for the feature frames and compared against the statistical threshold S_{th}, chosen by trial and error. Frames with a D_{LR} metric greater than S_{th} are suspected shot frames. With the suspected frame as centre, a suitable suspected window is selected to decide the transition type as abrupt or gradual. The proposed flow graph for VSBD is shown in

The periodogram is a nonparametric technique for power spectrum estimation [

The periodogram can be written as,

Even though the periodogram is represented using the autocorrelation function, it is necessary to represent the periodogram in terms of the input frame/matrix ‘f’. Let f_{B}(i, j) be the element-wise product of f(i, j) and the box window filter B(i, j),

The autocorrelation function of f_{B}(i, j),

Using the convolution theorem of the Fourier transform,

where F_{B}(k, l) is the Fourier transform of the frame f_{B}(i, j), of size M × N, at pixel (i, j).
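The chain of relations described above can be sketched as follows. This is a reconstruction from the standard periodogram definitions, not the original typeset equations: it assumes a box window B equal to 1 on the M × N frame and 0 elsewhere, and the symbol \hat{r}_f for the estimated autocorrelation and the 1/(MN) normalization are assumptions.

```latex
\begin{align}
f_B(i,j) &= f(i,j)\,B(i,j), \\
\hat{r}_f(m,n) &= \frac{1}{MN}\sum_{i}\sum_{j} f_B(i+m,\,j+n)\,f_B^{*}(i,j), \\
\mathrm{Per}_f(k,l) &= \sum_{m}\sum_{n}\hat{r}_f(m,n)\,
   \exp\!\Bigl(-2\pi\sqrt{-1}\,\bigl(\tfrac{km}{M}+\tfrac{ln}{N}\bigr)\Bigr)
   = \frac{1}{MN}\,\bigl\lvert F_B(k,l)\bigr\rvert^{2}.
\end{align}
```

The last equality is the convolution theorem applied to the autocorrelation, which is the correlation of f_B with itself; it is what lets the periodogram be computed from a single Fourier transform of the frame.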

The previous section shows that the periodogram is directly proportional to the squared magnitude of the Fourier transform and is very simple to compute. This section summarizes the properties of the periodogram:

1) Bias of the Periodogram:

The expected value of the periodogram of f(i, j) is the convolution of the power spectrum with the Fourier transform of the Bartlett window; hence the periodogram is a biased estimate.

where P_{f}(k, l) is the power spectrum of f(i, j) and W_{B}(k, l) is the Fourier transform of the Bartlett window.

2) Variance of the periodogram:

The variance of the periodogram does not converge to zero, so the periodogram Per_{f}(k, l) is not a consistent estimate of the power spectrum. The variance of the periodogram is proportional to the square of the power spectrum of f(i, j).
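In symbols, the two properties can be summarized as below. This is a hedged restatement of the textbook results, written up to normalization constants that depend on the transform convention; the double asterisk denotes two-dimensional convolution and is notation assumed here rather than taken from the original.

```latex
\begin{align}
\mathrm{E}\bigl\{\mathrm{Per}_f(k,l)\bigr\} &\propto \bigl(P_f \ast\ast\; W_B\bigr)(k,l), \\
\operatorname{Var}\bigl\{\mathrm{Per}_f(k,l)\bigr\} &\propto P_f^{2}(k,l).
\end{align}
```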

A normalized periodogram distance, a periodogram based metric for shot boundary classification, is detailed in this section. Consider the power spectral estimates of two frames as

The periodogram distance between frame x and y can be written as,

The main intention of using the periodogram in this work is to capture the correlation between frames. Hence the normalized periodogram is sufficient for this objective and is given by

From property 2 of the periodogram, it is evident that the variance of the periodogram is proportional to the square of the spectral value, and therefore it is meaningful to use the logarithm of the normalized periodogram. The normalized periodogram distance satisfies the basic properties of a metric:

Property 1: Symmetry property;

Property 2: Non-negative property;

Property 3: Triangle-inequality;
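A minimal sketch of a log normalized periodogram distance is given below so the listed properties can be checked numerically. The exact normalization used in the paper is not recoverable from this text; the sketch assumes each periodogram is divided by its total power and that the distance is the Euclidean norm of the difference of logarithms. The function names, the flattened-list input and the `eps` guard are all hypothetical.

```python
import math


def normalized_periodogram(per):
    """Scale a flattened periodogram so its values sum to 1.

    Assumption: normalization by total power; the paper's exact form is not shown.
    """
    total = sum(per)
    if total <= 0:
        raise ValueError("periodogram must have positive total power")
    return [p / total for p in per]


def log_npd(per_x, per_y, eps=1e-12):
    """Euclidean distance between the logs of two normalized periodograms.

    eps (hypothetical) guards against log(0) for empty frequency bins.
    """
    if len(per_x) != len(per_y):
        raise ValueError("periodograms must have the same number of bins")
    nx = normalized_periodogram(per_x)
    ny = normalized_periodogram(per_y)
    return math.sqrt(sum((math.log(a + eps) - math.log(b + eps)) ** 2
                         for a, b in zip(nx, ny)))
```

Because of the normalization, two frames whose spectra differ only by a constant gain have distance zero under this sketch, which is one reason a normalized form suits comparisons under global brightness scaling.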

With the knowledge of Normalized Periodogram Distance (NPD) between consecutive frames, select a sub-window of size 2W + 1 for the D_{LR} metric computation, explained as follows:

Step 1: Select the left “W” frames as sample set “L” and right “W” frames as sample set “R”.

Step 2: Compute the normalized periodogram distance between each sample in L and the centre sample, D_{LC} = median(L_{j}-C), where j indexes the frames in the left window and C is the centre frame of the sub-window. Similarly, calculate D_{RC} = median(R_{j}-C).

Step 3: Compute D_{LR} = D_{LC}/D_{RC}.
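Steps 1 to 3 can be sketched as follows. The sketch reads "L_{j}-C" as the distance between frame L_{j} and the centre frame C, which is an interpretation of the notation rather than something the text states explicitly. The function name `dlr_sequence`, the caller-supplied `dist` function, the zero value at border frames and the `eps` guard in the denominator are hypothetical choices.

```python
from statistics import median


def dlr_sequence(frames, dist, W, eps=1e-12):
    """Compute the D_LR ratio for every frame that has a full 2W + 1 sub-window.

    frames: per-frame features (for example, periodograms)
    dist:   function returning the distance between two frame features
    """
    if W < 1:
        raise ValueError("W must be at least 1")
    n = len(frames)
    dlr = [0.0] * n  # assumption: border frames without a full window stay 0.0
    for c in range(W, n - W):
        # Step 2: median distance from the left and right sample sets to the centre
        d_lc = median(dist(frames[j], frames[c]) for j in range(c - W, c))
        d_rc = median(dist(frames[j], frames[c]) for j in range(c + 1, c + W + 1))
        # Step 3: left/right ratio; eps avoids division by zero (assumption)
        dlr[c] = d_lc / (d_rc + eps)
    return dlr
```

Under this reading, the first frame of a new shot is far from its left neighbours and close to its right neighbours, so the ratio peaks there and stays small inside a shot.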

The same process is repeated for all k frames in the video. The obtained D_{LR} metric is compared against the statistical threshold given by,

The frames with a D_{LR} metric greater than S_{th} are termed suspected frames. These suspected frames are given as input to the Transition Type Identification Procedure (TTIP), which is detailed in the following algorithm. The flow graph of the D_{LR} metric computation followed by TTIP is shown in

Algorithm 1: Transition Type Identification Procedure

Input: Peaks of D_{LR} metric of suspected frames

Output: cut/gradual

Step 1: For each peak of the D_{LR} metric, choose a window of size 2Z + 1 centred on the peak. Select the left “Z” D_{LR} values as sample set L_{Z} and the right “Z” D_{LR} values as sample set R_{Z}.

Step 2: T_{DLR} = min(R_{Z}) - min(L_{Z}).

Step 3: If T_{DLR} > α1, suspected frame is gradual; Else suspected frame is cut.
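Algorithm 1 can be sketched in a few lines. The function name `classify_transitions`, the dictionary return type and the decision to clip windows at the sequence borders and skip peaks with an empty side are assumptions made for illustration; the value of α1 is not given in this text, so it is left as a parameter.

```python
def classify_transitions(dlr, peaks, Z, alpha1):
    """Label each suspected peak index in the D_LR sequence as 'cut' or 'gradual'."""
    labels = {}
    n = len(dlr)
    for p in peaks:
        # Step 1: left and right sample sets around the peak (clipped at the borders)
        left = dlr[max(0, p - Z):p]
        right = dlr[p + 1:min(n, p + Z + 1)]
        if not left or not right:
            continue  # assumption: skip peaks with no samples on one side
        # Step 2: difference between the minima of the right and left sets
        t_dlr = min(right) - min(left)
        # Step 3: compare against the threshold alpha1
        labels[p] = "gradual" if t_dlr > alpha1 else "cut"
    return labels
```

For example, a peak whose right-hand D_{LR} values stay elevated relative to the left-hand minimum is labelled gradual, while an isolated spike is labelled cut.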

This section presents the evaluation of the proposed method against existing methodologies for shot boundary detection. Experimentation is carried out using Matlab 8.5 on a DELL Core i3 system. The test dataset, evaluation measures and performance of the proposed methodology relative to state-of-the-art methods are detailed below:

To evaluate the performance of the proposed approach, various test videos from OPEN VIDEO [

The parameters that need to be set in the proposed method are “W”, “Z”, “α” and “α1”. The sub-window size “W” is varied from 5 to 25 in steps of 5 and tested on TRECVID data of 5000 frames, as shown in

Video | Number of Frames | Duration (s) | Number of Shots | Characteristics |
---|---|---|---|---|
VID1 | 200 | 7 | 3 | Varying lighting effects, Object motion |
VID2 | 300 | 10 | 2 | Varying lighting effects, Camera and Object motion |
VID3 | 500 | 16 | 1 | Camera motion |
VID4 | 400 | 13 | 3 | Varying illumination and Object motion |
VID5 | 300 | 10 | 2 | Varying illumination and Object motion |
VID6 | 300 | 10 | 2 | Varying illumination and Object motion |
VID7 | 150 | 5 | 3 | Special effects, Varying illumination and Object motion |
VID8 | 900 | 30 | 6 | Camera and Object motion |
VID9 | 350 | 11 | 4 | Varying lighting effects, Camera and Object motion |
VID10 | 500 | 16 | 3 | Varying lighting effects, Camera and Object motion |
VID11 | 300 | 10 | 2 | Varying lighting effects, Camera and Object motion |
VID12 | 11,356 | 379 | 65 | Varying lighting effects, Camera and Object motion |
VID13 | 16,586 | 553 | 73 | Varying lighting effects, Camera and Object motion |
VID14 | 12,304 | 410 | 103 | Varying lighting effects, Camera and Object motion |
VID15 | 31,389 | 1046 | 153 | Varying lighting effects, Camera and Object motion |

method and it varies for different videos. The window size “Z”, required to determine the type of shot, is chosen as 20, since the minimum gradual transition duration involves 25 frames.

For evaluating the proposed D_{LR} based methodology, the benchmark video dataset TRECVID 2001 is used to select the key frames from the video. The evaluation metrics, namely precision and recall, are computed using,
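The standard definitions of these measures, together with the F1 score reported in the abstract, can be sketched as below. The argument names `correct`, `false_alarms` and `missed` are hypothetical labels for correctly detected, falsely detected and undetected transitions; returning 0.0 when a denominator is zero is a convention assumed here.

```python
def precision_recall_f1(correct, false_alarms, missed):
    """Return (precision, recall, F1) from transition counts.

    precision = correct / (correct + false_alarms)
    recall    = correct / (correct + missed)
    F1        = 2 * precision * recall / (precision + recall)
    """
    detected = correct + false_alarms
    actual = correct + missed
    precision = correct / detected if detected else 0.0
    recall = correct / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

The F1 score is the harmonic mean of precision and recall, so it penalizes a method that trades many missed transitions for few false alarms, or the reverse.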

As illustrated in D_{LR} metric. Running the proposed approach on an i3 core system, the time taken for processing the consecutive frames using [

VSBD using WHT [ (columns “WHT”) versus VSBD using the proposed method (columns “Proposed”); P: precision, R: recall.

Video | WHT Cut P | WHT Cut R | WHT Gradual P | WHT Gradual R | Proposed Cut P | Proposed Cut R | Proposed Gradual P | Proposed Gradual R | Proposed Overall P | Proposed Overall R |
---|---|---|---|---|---|---|---|---|---|---|
VID12 | 85.4 | 97.6 | 90 | 87 | 94.7 | 100 | 92 | 100 | 93.6 | 100 |
VID13 | 86.5 | 82.1 | 88.7 | 85.9 | 95.2 | 86.9 | 80.6 | 80.6 | 89 | 84.4 |
VID14 | 90.6 | 88.8 | 84.6 | 80 | 92.3 | 87.8 | 82.8 | 76.8 | 86.4 | 80.9 |
VID15 | 93.5 | 95.6 | 88.3 | 88.5 | 97.9 | 97.9 | 90.9 | 89.3 | 93.4 | 94.8 |

VSBD using proposed methodology:

Video | Manually annotated shots | Proposed automatic detection |
---|---|---|
VID1 | 3 | 3 |
VID2 | 2 | 2 |
VID3 | 1 | 1 |
VID4 | 3 | 3 |
VID5 | 2 | 2 |
VID6 | 2 | 2 |
VID7 | 3 | 3 |
VID8 | 6 | 6 |
VID9 | 4 | 5 |

A robust and efficient technique for detecting abrupt and gradual shots in a video is presented. The power spectrum is estimated for video frames and, using a suitable window size, the D_{LR} metric is evaluated for the spectral features extracted from the frames. Suspected video frames are detected using a statistical threshold approach on the computed D_{LR} metric, and the transition type detection procedure is used to classify the abrupt and gradual transitions. Thus the proposed periodogram based D_{LR} metric shows promising performance on constrained and unconstrained video data for detecting shot boundaries. The proposed method fails under some drastic camera and object movement conditions, which can be improved by including motion features.

A. Sasithradevi, S. Mohamed Mansoor Roomi (2016) Video Shot Boundary Detection Using Normalized Periodogram Distance Metric. Circuits and Systems, 07, 2875-2883. doi: 10.4236/cs.2016.710246