This paper aims to speed up the “Grab Cut” segmentation algorithm by separating the segmentation process into hierarchical steps. The Grab Cut algorithm segments images by means of color clustering, and the process requires many iterations to converge. It is therefore time-consuming, and we are interested in improving it. In this study, we adopt the idea of hierarchical processing. The first step is to compute at low resolution, which makes each iteration much faster; the second step uses the result of the first step to continue iterating at the original resolution, so that the total execution time is reduced. Specifically, segmenting a low-resolution image is fast and yields a result similar to segmentation at the original resolution. Hence, once the iterations at low resolution have converged, we can use the parameters of that segmentation result to initialize the next segmentation at the original resolution. In this way, the number of iterations at the original resolution is reduced by the initialization of those parameters. Since the execution time at low resolution is relatively short, the total hierarchical execution time is reduced as a consequence. We also compare four methods of reducing image resolution, and we find that reducing the number of basins with a median filter yields the best segmentation speed.
The efficiency of interactive foreground/background segmentation is practically important for image editing. There are two kinds of segmentation methods: “Auto” [
On the other hand, semi-automatic segmentation adds user information, for example a desired region assigned by the user, and is therefore more accurate and faster than automatic segmentation. A powerful and popular semi-automatic segmentation algorithm is “Graph Cut” [
Our work focuses on the rectangle segmentation stage. Since the rectangle segmentation utilizes the clustering concept [
The practical measure is about “Lazy Snapping” [
The implementation of Grab Cut segmentation is based on clustering and the max-flow min-cut theorem. Clustering is a concept, and also an algorithm, that separates multi-dimensional data into several groups. The Gaussian Mixture Model (GMM) [
The probability (responsibility) produced by the k-th Gaussian model (component) for a data point is given by (1).
$$r(i,k) = \frac{\pi_k\,\mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\,\mathcal{N}(x_i \mid \mu_j, \Sigma_j)} \tag{1}$$
Each data point is now assigned to the model with the highest probability. We can then calculate the mean value and the covariance matrix of every Gaussian probability model (component), as in (2) and (3). These steps are repeated until the results converge.
$$\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} r(i,k)\, x_i \tag{2}$$
$$\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} r(i,k)\,(x_i - \mu_k)(x_i - \mu_k)^{T} \tag{3}$$
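As a concrete sketch, the E-step of Eq. (1) and the M-step of Eqs. (2)-(3) can be written in a few lines of NumPy. The function names and the small covariance regularizer are our own additions, not part of the original formulation:

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Density of a multivariate normal N(x | mu, cov)."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

def em_step(X, pis, mus, covs, reg=1e-3):
    """One EM iteration following Eqs. (1)-(3).

    E-step: responsibilities r(i, k) as in Eq. (1).
    M-step: updated means and covariances as in Eqs. (2)-(3).
    A small regularizer keeps the covariances invertible.
    """
    N, K = len(X), len(pis)
    d = X.shape[1]
    r = np.array([[pis[k] * gaussian_pdf(X[i], mus[k], covs[k])
                   for k in range(K)] for i in range(N)])
    r /= r.sum(axis=1, keepdims=True)            # Eq. (1)
    Nk = r.sum(axis=0)
    new_mus = (r.T @ X) / Nk[:, None]            # Eq. (2)
    new_covs = []
    for k in range(K):
        diff = X - new_mus[k]
        cov_k = (r[:, k, None] * diff).T @ diff / Nk[k]   # Eq. (3)
        new_covs.append(cov_k + reg * np.eye(d))
    return Nk / N, new_mus, np.array(new_covs)
```

Repeating `em_step` until the parameters stop changing is exactly the “iterate until converged” loop described above.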
If we build a flow network as
amounts of them are all the same, because the current flow value must be identical to the previous flow, or the flow would disappear. We can find that in
If we imagine a chessboard where each crossing point is a pixel, we can apply this theorem [
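The max-flow min-cut computation itself can be illustrated with a minimal Edmonds-Karp implementation on a toy graph; in the segmentation setting the nodes would be pixels plus two terminal nodes, but the mechanics are the same. This sketch is our own illustration, not the solver used in the paper:

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp: repeatedly augment along shortest s-t paths
    in the residual graph until none remains (max-flow = min-cut)."""
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and capacity[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:          # no augmenting path left
            return total
        # find the bottleneck capacity along the path, then push it
        bottleneck, v = float("inf"), t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, capacity[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        total += bottleneck
```

By the theorem, the value returned equals the capacity of the minimum cut separating `s` from `t`, which is what the segmentation uses to split foreground from background.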
For a segmentation algorithm, a cost function is always necessary; ours are given in Equations (4)-(6), where $k_n$ denotes the GMM component to which the n-th pixel belongs, and $\alpha_n$ takes only the values 0 and 1, indicating whether the n-th pixel belongs to the background or the foreground. $\theta$ denotes the parameters of the GMMs: the weighting coefficients, mean values, and covariance matrices. $z_n$ is the value of the n-th pixel. Moreover, $COV(\alpha_n, k_n)$ refers to the covariance matrix of the component that belongs to the background/foreground and corresponds to the n-th pixel.
$$U(\alpha, k, \theta, z) = \sum_{n} D(\alpha_n, k_n, \theta, z_n) \tag{4}$$
$$D(\alpha_n, k_n, \theta, z_n) = -\log \pi(\alpha_n, k_n) + \tfrac{1}{2} \log \det\!\big[COV(\alpha_n, k_n)\big] + \tfrac{1}{2} \big[z_n - \mu(\alpha_n, k_n)\big]^{T} \big[COV(\alpha_n, k_n)\big]^{-1} \big[z_n - \mu(\alpha_n, k_n)\big] \tag{5}$$
$$\theta = \big\{ \pi(\alpha, k),\, \mu(\alpha, k),\, COV(\alpha, k) \;\big|\; \alpha = 0, 1,\; k = 1, \dots, K \big\} \tag{6}$$
We can see that the function D in (4)-(5) is the negative logarithm of a weighted multi-dimensional Gaussian probability distribution; therefore, D measures how likely a pixel is to belong to the background/foreground. The total cost function is (7), which includes the functions U and V. The function V is written as (8), where $f(\alpha) = 0$ for $\alpha_n = \alpha_m$ and $f(\alpha) = 1$ for $\alpha_n \neq \alpha_m$.
$$E(\alpha, k, \theta, z) = U(\alpha, k, \theta, z) + V(\alpha, z) \tag{7}$$
$$V(\alpha, z) = \gamma \sum_{(m,n) \in C} f(\alpha) \exp\!\big( -\beta \| z_m - z_n \|^2 \big), \quad f(\alpha) = \begin{cases} 0, & \alpha_n = \alpha_m \\ 1, & \alpha_n \neq \alpha_m \end{cases} \tag{8}$$
If two neighboring pixels differ greatly in value, they can be assigned to background and foreground at low cost. If the values of two neighboring pixels are close, the cost of assigning them to different sides of the background/foreground boundary is large. Segmentations executed with the max-flow min-cut theorem and cost function (7) are as
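Equations (5) and (8) can be evaluated directly. The following is a small sketch with our own helper names; the constants passed as `gamma` and `beta` are placeholders for the values written $\gamma$ and $\beta$ in the text:

```python
import numpy as np

def data_cost(z, pi_k, mu_k, cov_k):
    """D(alpha_n, k_n, theta, z_n) of Eq. (5): negative log of the
    weighted Gaussian likelihood of pixel value z under one component."""
    diff = z - mu_k
    return (-np.log(pi_k)
            + 0.5 * np.log(np.linalg.det(cov_k))
            + 0.5 * diff @ np.linalg.inv(cov_k) @ diff)

def smoothness_cost(zm, zn, alpha_m, alpha_n, gamma=50.0, beta=1.0):
    """One pairwise term of V in Eq. (8): zero for equal labels,
    contrast-weighted penalty when neighboring labels differ."""
    if alpha_m == alpha_n:
        return 0.0
    return gamma * np.exp(-beta * float(np.linalg.norm(zm - zn)) ** 2)
```

Note the behavior discussed above: a label change between two similar pixels is expensive, while a label change across a strong edge (large $\|z_m - z_n\|$) is nearly free.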
In the end, the process of Grab Cut can be divided into four steps. First, the user marks two corners of a rectangle to define a segmentation region; the pixels inside are used to build the foreground GMMs, while the pixels outside are used to build the background GMMs. Second, these GMMs form the cost function U, and the segmentation begins. Third, the segmentation result forms new background/foreground GMMs, and we check whether the new GMMs are identical to the old ones. If they did not change, the segmentation has converged and the program goes to step four; if they did change, the program goes back to step two until it converges. The fourth step uses Graph Cut to mark the imperfect regions and complete the segmentation.
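The structure of steps one to three can be sketched with a deliberately simplified stand-in: a single mean per region replaces the GMMs, and nearest-model assignment replaces the min-cut step, so only the iterate-until-the-models-stop-changing loop is illustrated (all names here are ours):

```python
import numpy as np

def grabcut_like_loop(pixels, inside_rect, max_iters=20):
    """Toy version of steps 1-3 of the Grab Cut loop.

    Step 1: the rectangle mask initializes fore/background labels.
    Step 2: build a model per region (here just its mean) and reassign
            every pixel to the closer model (stand-in for min-cut).
    Step 3: stop when the labels, and hence the models, no longer change.
    """
    alpha = inside_rect.astype(int)           # 1 = foreground
    for it in range(1, max_iters + 1):
        fg_mean = pixels[alpha == 1].mean()
        bg_mean = pixels[alpha == 0].mean()
        new_alpha = (np.abs(pixels - fg_mean)
                     < np.abs(pixels - bg_mean)).astype(int)
        if np.array_equal(new_alpha, alpha):  # models unchanged: converged
            break
        alpha = new_alpha
    return alpha, it
```

Even with an imperfect initial rectangle (one dark pixel included in the foreground), the loop converges in a couple of iterations, which is the behavior the real algorithm exhibits with full GMMs and min-cut.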
The computational complexity of Graph Cut is O(mn²), where n is the number of pixels and m is the number of flows; it grows as the third power of the pixel number. So, [
basin as a pixel. In this way, the new pixel number is reduced rapidly.
Our work is based on lazy snapping and [
Our proposed method is as
The second step is to reduce the resolution with a smoothing filter and then use the GMM parameters from the first step to continue the segmentation. As a result, the GMM parameters of the second step are closer to the converged parameters.
The third step is to go back to the original image and use the final parameters of step two to carry out a new segmentation. Although the process is more complicated, the execution time decreases because part of the full-resolution computation is replaced by high-speed computation at the low-resolution stage.
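The hierarchical schedule can be sketched end to end with the same toy mean-per-region model used above; the point of the sketch is only the hand-off of converged parameters from the low-resolution pass to the full-resolution pass (the function names and the crude subsampling are our own assumptions):

```python
import numpy as np

def segment(pixels, fg_mean, bg_mean, max_iters=50):
    """Iterate nearest-model assignment until the labels converge.
    A toy stand-in for GMM fitting plus min-cut."""
    labels = np.abs(pixels - fg_mean) < np.abs(pixels - bg_mean)
    for it in range(1, max_iters + 1):
        fg_mean, bg_mean = pixels[labels].mean(), pixels[~labels].mean()
        new = np.abs(pixels - fg_mean) < np.abs(pixels - bg_mean)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels, fg_mean, bg_mean, it

def hierarchical_segment(pixels, factor=4):
    """Converge on a low-resolution copy first, then reuse the converged
    model parameters to initialize the full-resolution segmentation."""
    low = pixels[::factor]                      # crude resolution reduction
    _, fg, bg, low_iters = segment(low, low.max(), low.min())
    labels, _, _, full_iters = segment(pixels, fg, bg)
    return labels, low_iters, full_iters
```

Because the full-resolution pass starts from already-converged parameters, it typically needs far fewer iterations than a cold start, which is exactly the source of the speed-up claimed here.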
We pick four approaches to implement the proposed speed-up algorithm: the median filter, the mean filter, the Gaussian filter, and down-sampling. By using these filters and down-sampling, we lower the number of basins, so that execution on these images becomes faster.
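A minimal sketch of the resolution-reduction step follows. It combines filtering and down-sampling into one block-wise operation on a plain pixel grid; the paper operates on watershed basins, so this is an illustrative simplification, not the exact procedure:

```python
import numpy as np

def downsample(img, factor=2, mode="median"):
    """Reduce resolution by summarizing each factor x factor block
    with its median or mean, or by plain subsampling."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor      # crop to a multiple of factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(h // factor, w // factor, -1)
    if mode == "median":
        return np.median(blocks, axis=2)
    if mode == "mean":
        return blocks.mean(axis=2)
    if mode == "subsample":                    # keep top-left pixel of each block
        return blocks[:, :, 0]
    raise ValueError(mode)
```

The median variant, which the experiments below favor, has the usual advantage of median filtering: it suppresses outlier pixels without blurring edges as strongly as the mean.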
We use a method called “normalized ranking” to determine which of the four methods has the best execution time. In normalized ranking, the longest execution time on each pattern is set to 1, and the other times are divided by the longest one. We then obtain a grand score by summing the scores over all patterns; the largest grand score corresponds to the longest overall execution time. In our simulation, the median filter achieved the best (lowest) grand score among the four methods. The simulation platform is a personal computer with an Intel Core i3-530 at 2.93 GHz and 4 GB of DRAM.
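The normalized-ranking computation described above is straightforward; a small sketch with hypothetical timing data (the dictionary layout is our own choice):

```python
def normalized_ranking(times_per_pattern):
    """times_per_pattern: {method: [time on pattern 0, pattern 1, ...]}.
    Per pattern, divide every method's time by the slowest method's time
    (so the slowest scores 1), then sum over patterns into a grand score.
    A lower grand score means a faster method overall."""
    methods = list(times_per_pattern)
    n_patterns = len(next(iter(times_per_pattern.values())))
    grand = {m: 0.0 for m in methods}
    for p in range(n_patterns):
        worst = max(times_per_pattern[m][p] for m in methods)
        for m in methods:
            grand[m] += times_per_pattern[m][p] / worst
    return grand
```

Because each pattern's scores are normalized to the slowest method, patterns with very different absolute runtimes contribute equally to the grand score.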
Moreover, the speed-up rates obtained with different mask sizes are as follows:

Mask size | Speed-up rate
---|---
9-1 | 27.6724%
7-1 | 28.6286%
5-1 | 27.8606%
3-1 | 20.4462%
9-5-1 | 29.5481%
9-3-1 | 26.7035%

Image segmentation is a highly developed field, and many researchers are still trying to improve it. Our improvement mainly concerns speed, and we used four methods to make the segmentation faster. We found that the median filter has the best performance among the four. In the future, if there is a strong need for real-time processing with less emphasis on image quality, as in object tracking, our proposed algorithm may be useful for such needs.
The authors would like to thank the Editor and the referee for their comments. This work was supported in part by the National Science Council, Taiwan, under Grant No. 98-2221-E-009-138.
Dung, L.-R., Yang, Y.-M. and Wu, Y.-Y. (2018) A Hierarchical Grab Cut Image Segmentation Algorithm. Journal of Computer and Communications, 6, 48-55. https://doi.org/10.4236/jcc.2018.62005