Hand-coloring is widely used when people draw pictures or produce movies, but it is also very time-consuming and expensive. Therefore, with the development of segmentation algorithms and deep learning, many computer-assisted colorization algorithms have been invented, including user-guided, semi-automatic and automatic colorization. In this survey paper, we focus only on scribble-based colorization, a branch of user-guided colorization, and review its development. We discuss the principles and methods of different kinds of scribble-based colorization algorithms. We also compare the visual results and performance of algorithms for grayscale-image colorization, sketch colorization and colorization with networks in order to summarize their advantages and disadvantages.
Colorization refers to the process of adding colors to black-and-white images (which contain only grayscale information), sketches, or even monochrome motion pictures. Traditional hand-coloring is very time-consuming and requires the effort of a whole group of professional artists. Therefore, many computer-assisted colorization algorithms have been invented to make the process of colorization more efficient.
*All authors are co-first authors, listed in alphabetical order by last name.
Computer-assisted colorization can be applied in a wide range of areas. For example, when colorization could only be done by hand, colorizing a black-and-white movie cost more than $3000 per minute of running time in 1987, according to a report in Popular Mechanics. To reduce this cost, computer-assisted colorization algorithms were invented, offering filmmakers a less expensive way of producing technicolor and bringing more realistic visual effects to the audience.
As a second example, colorizing a sketch after a painter finishes the creative drawing takes considerable effort: the painter must first find proper colors, which is difficult for novices, and then colorize every region of the picture manually, which is very time-consuming. To improve both the visual effect and the efficiency of drawing, colorization algorithms are desirable by which the colors of a sketch can be assigned automatically and changed quickly according to the painter's preference.
According to the level of user participation, computer-assisted colorization methods can be divided into three domains: automatic, semi-automatic and user-guided. Automatic methods are recent approaches in which monochrome pictures are colorized directly by a convolutional neural network (CNN) trained on a large-scale image collection. Semi-automatic methods transfer the color pattern of one or more reference images to the input monochrome picture. User-guided methods let users directly decide the color of each region.
In this paper, we focus on scribble-based colorization within user-guided colorization. First, we briefly introduce the history of scribble-based colorization. Then, we discuss three areas of scribble-based colorization in detail: grayscale image colorization, sketch colorization and colorization with networks. In each area, we analyze the differences in principles, methods and performance among the colorization algorithms. Finally, we draw conclusions about their advantages and disadvantages. We hope that these analyses can become a useful resource for the scribble-based colorization and computer graphics community.
In the early days, the process of colorization was divided into two separate parts: segmentation and filling. However, the error rate of segmentation remained high, which means that many user interventions were needed to fix the errors, making colorization a tedious, time-consuming and expensive task. To reduce user interventions, Anat Levin and her colleagues designed a method that assigns colors to pixels with an optimization algorithm based on the similarity of intensities, improving the accuracy of colorization (Levin et al., 2004). Nonetheless, the processing time was still too long, and users still needed to be very careful when choosing the colors and the positions of the strokes. A non-iterative method combined with adaptive edge extraction (Huang et al., 2005) was then proposed to reduce the running time of colorization optimization and the color-bleeding effects. Differently from previous methods, Yatziv & Sapiro (2006) presented the idea of color blending, in which the chrominance value of a pixel results from the contributions of the given stroke colors. To achieve a better visual effect when user strokes are sparse, Luan et al. (2007) proposed to consider both neighboring pixels with similar intensity and remote pixels with similar textures. For the same purpose, Xu et al. (2013) applied a probability distribution to find the most confident stroke color for every pixel and thus decide its color.
However, intensity-based colorization methods used for grayscale images may fail on sketches, because manga contains no grayscale information. Qu et al. designed a method that can propagate color through pattern-continuous regions as well as intensity-continuous regions (Qu et al., 2006). Later, Sykora and colleagues pointed out the limitations of Qu et al.'s approach and formulated their ideal painting tool by converting the desired properties into an energy minimization framework, in order to colorize black-and-white cartoons more conveniently (Sýkora et al., 2009). With the development of deep learning, Zhang et al. (2017) incorporated the U-Net structure into their colorization framework to reduce user intervention and improve the visual effect of colorization.
The principle that neighboring pixels with similar intensities should have similar colors was first proposed by Levin et al. (2004) and later used by Huang et al. (2005). Yatziv & Sapiro (2006) proposed another method based on color blending, and simply assumed that the closer two pixels are in intrinsic (geodesic) distance, the more similar their chrominance values should be.
Levin et al. and Huang et al. used the YUV color space, where Y(x, y) represents the luminance of the pixel at coordinate (x, y), and U(x, y) and V(x, y) represent the color volumes. The value of Y(x, y) is regarded as the input, and the values of U(x, y) and V(x, y) as the output, which is decided by minimizing the cost functions:
J(U) = \sum_p \Big( U(p) - \sum_{q \in N_s(p)} w_{pq}^{s} \, U(q) \Big)^2 \qquad (1)

J(V) = \sum_p \Big( V(p) - \sum_{q \in N_s(p)} w_{pq}^{s} \, V(q) \Big)^2 \qquad (2)
where N_s(p) is the set of 8 spatially neighboring pixels around pixel p. When the cost functions J(U) and J(V) reach their minimum, the corresponding U and V are the final color volumes for pixel p. It is worth mentioning that Levin et al. and Huang et al. had different definitions of the weight function w_{pq}^{s}. The formula proposed by Levin et al. is:
w_{pq}^{s} \propto e^{-(Y(p) - Y(q))^2 / 2\sigma_p^2} \qquad (3)
where \sigma_p^2 is the variance of the luminance in the window around pixel p. Huang et al., however, designed a new weight function:
w_{pq}^{s} = \frac{1}{1 + \frac{|Y(p) - Y(q)|}{\sigma_p^2 + 1}} \qquad (4)
By assuming \sigma_p^2 = 1, we can plot w_{pq}^{s} as a function of |Y(p) - Y(q)| for Equation (3) and Equation (4). The graphs show that as |Y(p) - Y(q)| (the luminance difference) grows, the weight defined by Equation (3) converges to zero much faster than the one defined by Equation (4); once the intensity difference between Y(p) and Y(q) is very large, the weight of Equation (3) is essentially zero and barely changes.
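Plotting aside, the comparison can also be checked numerically. A minimal sketch in Python, taking \sigma_p^2 = 1 as above (the sampling range of luminance differences is our own choice):

```python
import numpy as np

# Compare the two weight definitions for sigma_p^2 = 1 (an assumption
# made purely for illustration, as in the text above).
d = np.linspace(0.0, 3.0, 31)            # luminance difference |Y(p) - Y(q)|
w_eq3 = np.exp(-d ** 2 / 2.0)            # Equation (3), Gaussian weight
w_eq4 = 1.0 / (1.0 + d / (1.0 + 1.0))    # Equation (4), rational weight

# Both start at 1 for identical luminance, but the Gaussian weight of
# Equation (3) decays much faster as the luminance difference grows.
```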
Rather than following the optimization framework, Yatziv et al. proposed a new framework based on color blending, in which the chrominance of a pixel is a weighted average of the chrominances in the user strokes. The weight of a specific chrominance is given by a carefully designed weighting function that takes as input the minimum distance from the pixel to that chrominance's strokes. An important but unstated idea behind this paper is that a grayscale image can be seen as a two-dimensional representation of a special geographic surface. The geodesic distance on this surface is then a reasonable choice for measuring the distance between two pixels, but efficiency may suffer because calculating the geodesic distance involves complex integrals (Yatziv & Sapiro, 2006). However, since the grayscale image is actually not an ideal geographic surface but a collection of discrete spatial points, it can be simplified as a nonplanar undirected graph whose edge weights take on the Euclidean distance between two pixels (vertices). Therefore, a significant speed-up can be achieved by computing the minimum distance with Dijkstra's algorithm, which is sufficient for the required precision. Combining it with other tricks (Yatziv & Sapiro, 2006) yields a good tradeoff between efficiency and quality.
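As a rough illustration of this idea, the sketch below runs a multi-source Dijkstra per stroke color on a toy grayscale image and blends chrominances with a simple inverse-power weighting. The edge weights, the blending exponent `b`, and all names here are our own simplifications, not the authors' implementation:

```python
import heapq
import numpy as np

def blend_colors(Y, scribbles, b=4.0):
    """Yatziv & Sapiro-style color blending (a simplified sketch).

    Y         : 2-D grayscale image.
    scribbles : dict mapping a chrominance value to a list of (i, j)
                seed pixels carrying that chrominance.
    For each chrominance, a multi-source Dijkstra computes the minimum
    distance (approximated here by accumulated luminance change) from
    its scribbles to every pixel; chrominances are then blended with
    weights dist**(-b), so closer scribbles dominate."""
    h, w = Y.shape
    dists = {}
    for chroma, seeds in scribbles.items():
        dist = np.full((h, w), np.inf)
        heap = []
        for i, j in seeds:
            dist[i, j] = 0.0
            heapq.heappush(heap, (0.0, i, j))
        while heap:                       # standard Dijkstra loop
            d, i, j = heapq.heappop(heap)
            if d > dist[i, j]:
                continue
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, c = i + di, j + dj
                if 0 <= a < h and 0 <= c < w:
                    nd = d + abs(Y[i, j] - Y[a, c]) + 1e-6
                    if nd < dist[a, c]:
                        dist[a, c] = nd
                        heapq.heappush(heap, (nd, a, c))
        dists[chroma] = dist
    out = np.zeros((h, w))
    total = np.zeros((h, w))
    for chroma, dist in dists.items():
        wgt = (dist + 1e-9) ** (-b)       # inverse-power blending weight
        out += wgt * chroma
        total += wgt
    return out / total
```

On a toy image with two flat intensity regions and one seed per region, each region ends up carrying its own scribble's chrominance, with blending only near the boundary.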
The method proposed by Levin et al. is very time-consuming, since the optimization takes hundreds of iterations. To reduce the running time of the optimization, Huang et al. improved the method with four steps: 1) triangulate the pixels with assigned colors using the Delaunay triangulation algorithm; 2) calculate the colors of pixels along the edges of those triangles from the color volumes of the corresponding end vertices; 3) initialize the color volumes of the pixels inside the triangles according to the color volumes of the edges; 4) run the colorization optimization using Equation (1) and Equation (2).
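The optimization of Equations (1) and (2) can also be sketched as a linear solve: fixing the scribbled pixels and requiring every other pixel's chrominance to equal the weighted average of its 8 neighbors (with the Gaussian weights of Equation (3)) yields a sparse linear system. The toy dense-matrix version below is purely illustrative, not either paper's implementation:

```python
import numpy as np

def colorize_channel(Y, scribble_mask, scribble_U, sigma2=0.1):
    """Solve a Levin-style least-squares system for one chrominance
    channel on a small grayscale image Y (2-D array in [0, 1]).

    Pixels where scribble_mask is True keep their scribbled value from
    scribble_U; every other pixel p satisfies
        U(p) = sum over q in N_s(p) of w_pq * U(q),
    with normalized Gaussian weights in the spirit of Equation (3)."""
    h, w = Y.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i in range(h):
        for j in range(w):
            p = idx[i, j]
            if scribble_mask[i, j]:
                A[p, p] = 1.0          # scribbled pixel: value is fixed
                b[p] = scribble_U[i, j]
                continue
            # 8-connected neighbourhood N_s(p), clipped at the borders
            nbrs = [(i + di, j + dj)
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)
                    if (di, dj) != (0, 0)
                    and 0 <= i + di < h and 0 <= j + dj < w]
            weights = np.array(
                [np.exp(-(Y[i, j] - Y[a, c]) ** 2 / (2 * sigma2))
                 for a, c in nbrs])
            weights /= weights.sum()
            A[p, p] = 1.0
            for (a, c), wq in zip(nbrs, weights):
                A[p, idx[a, c]] = -wq
    return np.linalg.solve(A, b).reshape(h, w)
```

With one scribble per intensity region, the solved chrominance stays close to each region's scribble and transitions only across the intensity edge; a real implementation would use a sparse solver instead of a dense matrix.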
When colorizing natural images, these methods require users to input many strokes and draw them carefully. To reduce the users' effort, Luan et al. (2007) considered not only the neighboring pixels with similar intensities but also the remote pixels with similar texture. For the same purpose, Xu et al. (2013) applied a probability distribution to find the most confident stroke color for every pixel and thus decide the color of each pixel.
By adding the rule that remote pixels with similar textures should receive similar colors, a new colorization approach (Luan et al., 2007) achieves colorization with only a few strokes. After studying failure examples, Luan et al. decided to colorize pixels near edges according to texture similarity and pixels in smooth regions according to intensity similarity. Consequently, they designed an energy function framework:
E = \sum_{p \in I} \big( \lambda(p) E_1 + (1 - \lambda(p)) E_2 \big) \qquad (5)
where \lambda(p) is the weight map, and E_1 and E_2 are the textural term and the spatial term:
E_1 = \sum_{q \in N_t(p)} w_{pq}^{t} \, \| L(C; p) - L(C; q) \| \qquad (6)

E_2 = \sum_{q \in N_s(p)} w_{pq}^{s} \, \| L(C; p) - L(C; q) \| \qquad (7)
where N_t(p) is the set of texturally neighboring pixels of p, N_s(p) is the set of spatially neighboring pixels of p, C represents all the stroke colors, C = [C_1, C_2, \cdots, C_n]^T, and L(C; p) is the likelihood function. Besides, E_1 and E_2 are defined under the constraint that L(C; p) = [0, \cdots, 1, \cdots, 0]^T (with the 1 in the k-th position) when p belongs to the stroke with color C_k. When pixel p is close to edges, \lambda(p) is larger, so the spatial term is less influential than the textural term. N_s(p) and w_{pq}^{s} are defined in the same way as before, while N_t(p) is defined by using k-means to cluster patches according to their similarity in appearance, and w_{pq}^{t} is defined according to both textural similarity and spatial continuity. By iteratively updating L(C; p) until it converges, we can solve the energy function and obtain the likelihood vector; the color with the maximum likelihood is then assigned to the corresponding pixel. Finally, users can adjust the colors to make the image more vivid simply by choosing a few pixels and assigning colors to them; from these assigned colors and the intensity information, the colors of the other pixels are produced by interpolation.
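One possible way to realize the iterative update of L(C; p) is a fixed-point sweep in which each unconstrained pixel's likelihood becomes a \lambda(p)-weighted mix of its textural and spatial neighbors' likelihoods. The data layout below is our own toy simplification, not Luan et al.'s code:

```python
import numpy as np

def propagate_likelihood(L, lam, nbr_s, w_s, nbr_t, w_t, fixed, iters=50):
    """A possible fixed-point iteration for the energy of Equation (5)
    (a sketch; the variable names are ours, not Luan et al.'s).

    L          : (n, k) likelihood vectors, one row per pixel.
    lam        : (n,) weight map lambda(p) mixing textural vs. spatial.
    nbr_s, w_s : spatial neighbour indices and weights per pixel.
    nbr_t, w_t : textural neighbour indices and weights per pixel.
    fixed      : booleans marking scribbled pixels (rows stay one-hot)."""
    for _ in range(iters):
        new = np.empty_like(L)
        for p in range(L.shape[0]):
            if fixed[p]:
                new[p] = L[p]          # scribbled pixels are constrained
                continue
            tex = sum(wq * L[q] for q, wq in zip(nbr_t[p], w_t[p]))
            spa = sum(wq * L[q] for q, wq in zip(nbr_s[p], w_s[p]))
            new[p] = lam[p] * tex + (1 - lam[p]) * spa
            new[p] /= new[p].sum()     # keep each row a distribution
        L = new
    return L    # argmax over axis 1 gives each pixel's colour label
```

On a 4-pixel chain with the two end pixels scribbled in different colors and \lambda = 0 (purely spatial propagation), each interior pixel converges to the label of its nearer scribble.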
Later, Xu et al. proposed to reduce the number of user strokes by designing an adaptive feature space. The feature vector of a pixel contains information about position, color, user-control confidence, and the propagation result of the previous iteration. According to the closeness between pixels in the feature space, a weighted sum of Gaussian functions is generated from all pixels in the user strokes. After some derivation, to choose the color with maximum probability, Xu et al. presented the objective function:
E(\mathbf{p}) = (\mathbf{p} - W\mathbf{g})^T (\mathbf{p} - W\mathbf{g}) + \lambda_s \, \mathbf{p}^T L \mathbf{p} \qquad (8)
where \mathbf{g} is the user-input color vector (g_j is non-zero only when j \in \Omega_{stroke}), \mathbf{p} is the output color vector, L is the sparse Laplacian used for regularization, \mathbf{p}^T L \mathbf{p} is the smoothness term (Lischinski et al., 2006), \lambda_s determines the smoothness level, and W is the weight matrix derived from the feature vectors.
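Since E(\mathbf{p}) is quadratic in \mathbf{p}, setting its gradient to zero gives the linear system (I + \lambda_s L)\,\mathbf{p} = W\mathbf{g} (assuming L is symmetric), which can be solved directly. A toy sketch with inputs of our own choosing:

```python
import numpy as np

def solve_propagation(W, g, L, lam_s=0.5):
    """Minimise Equation (8) in closed form (a sketch).

    Setting dE/dp = 2(p - W g) + 2 * lam_s * L p = 0 yields
        (I + lam_s * L) p = W g,
    assuming the Laplacian L is symmetric."""
    n = L.shape[0]
    return np.linalg.solve(np.eye(n) + lam_s * L, W @ g)
```

For a 3-pixel chain with its graph Laplacian and a single "stroke" on the first pixel, the solved values decay smoothly away from the stroke.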
There is a significant difference between grayscale images and sketch images: sketch images have no grayscale information. While many scribble-based colorization algorithms used in colorizing grayscale images were invented after Levin et al., few algorithms can achieve desirable results when they are employed in colorizing sketch images.
Qu et al. first designed a method that can propagate colors through pattern-continuous and intensity-continuous regions in manga (Qu et al., 2006). Regions with similar pattern features and with open boundaries can be segmented intelligently. After the segmentation, several colorization techniques can be employed to fill in colors.
In the context of colorizing black-and-white cartoons, however, Sykora et al. pointed out the limitations of manga colorization and instead briefly discussed an ideal painting tool (Sýkora et al., 2009). An ideal painting tool has four major properties, as follows: 1) it tends to colorize the "largest" area by seeking an optimal boundary, regardless of inner holes and gappy outlines; 2) within a locality, the color is continuous and determined by the nearby scribbles; 3) soft scribbles (imprecisely placed strokes, with the majority lying in the interior and very few parts outside) are preferred over subtle strokes placed exactly inside the region; 4) the anti-aliasing effect of the original image is preserved (to address the problem of colorizing regions with vague boundaries). A new black-and-white colorization algorithm was carefully designed to pursue these four ideal properties; it significantly enhances convenience and achieves the same visual effect with less precise strokes.
Qu et al. use the level set method to describe the evolving curve in order to implement segmentation. With appropriate formulas for the propagating speed, this method achieves desirable segmentation results, after which various colorization techniques are applied for color-filling.
According to Qu et al. (2006), manga colorization includes two parts: segmentation and color-filling.
Three color-filling methods were mentioned by Qu et al.: color replacement, stroke-preserving colorization, and pattern-to-shading. Color replacement replaces the black or white color of the original manga image with the color of the user's scribble; stroke-preserving colorization preserves the original pattern during colorization; and pattern-to-shading transforms the pattern into smooth color shading.
As for segmentation, Qu et al. used the level set method to describe the evolving curve. From the Hamilton–Jacobi equation, the following equation can be derived:
\frac{\partial \phi}{\partial t} = -F \, |\nabla \phi| \qquad (9)
where \phi is the implicit 3D function, t is the evolving time, and F denotes the moving speed of the evolving front. For segmentation problems, F can be replaced by:
F = h \cdot (F_A + F_G) \qquad (10)
where F_A is normally a constant and F_G depends on the geometry of the curve's evolving front. The term h is used to stop the curve evolution, and it has different expressions in pattern-continuous regions and intensity-continuous regions.
・ Pattern-continuous regions
In the pattern-continuous regions, h can be expressed by the following formula:
h_P(x, y) = \frac{1}{1 + |D(T_{user}, T_{front}(x, y))|} \qquad (11)
where T_{front}(x, y) and T_{user} are the pattern features on the evolving front and on the user scribbles, respectively, and the function D measures the difference between them. The pattern feature T can be expressed by statistical features in the Gabor wavelet domain: after applying the Gabor wavelet transform, the mean \mu_{m,n} and the standard deviation \sigma_{m,n} of each band are calculated to form the feature vector:
T = [\mu_{0,0} \ \sigma_{0,0} \ \mu_{0,1} \ \cdots \ \mu_{3,5} \ \sigma_{3,5}] \qquad (12)
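Assembling T from a bank of Gabor responses is straightforward once the 4-scale, 6-orientation responses are available. The sketch below assumes the responses are given (computing the Gabor transform itself is omitted), so only the statistics of Equation (12) are shown:

```python
import numpy as np

def gabor_feature(responses):
    """Assemble the pattern-feature vector of Equation (12) from a bank
    of Gabor responses.

    responses : dict {(m, n): 2-D response array}, assumed to come from
    a 4-scale (m), 6-orientation (n) Gabor wavelet transform.
    Returns the 48-dimensional vector [mu_00, sigma_00, ..., sigma_35]."""
    T = []
    for m in range(4):
        for n in range(6):
            r = np.abs(responses[(m, n)])      # magnitude of each band
            T.extend([r.mean(), r.std()])      # mu_{m,n}, sigma_{m,n}
    return np.array(T)
```

In Qu et al.'s setting, the function D of Equation (11) would then compare two such vectors, e.g. with a (weighted) norm of their difference.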
・ Intensity-continuous regions
In the intensity-continuous regions, h can be expressed as:
h_I(x, y) = \frac{1}{1 + |\nabla (G_\sigma \otimes I(x, y))|} \qquad (13)
where G_\sigma is the Gaussian smoothing filter. |\nabla (G_\sigma \otimes I(x, y))| is normally close to zero except where an abrupt change of image gradient happens.
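A sketch of Equation (13) using numpy only (the kernel radius of 3\sigma and the separable-convolution implementation are our own choices):

```python
import numpy as np

def stopping_term(I, sigma=1.0):
    """Compute h_I of Equation (13): close to 1 in flat regions and
    small at strong edges, which stalls the evolving front there.

    Gaussian smoothing (G_sigma convolved with I) is done with a small
    separable kernel; the gradient magnitude then enters Equation (13)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()                                   # normalized 1-D kernel
    smooth = np.apply_along_axis(
        lambda r: np.convolve(r, k, mode="same"), 0, I)   # columns
    smooth = np.apply_along_axis(
        lambda r: np.convolve(r, k, mode="same"), 1, smooth)  # rows
    gy, gx = np.gradient(smooth)
    return 1.0 / (1.0 + np.hypot(gx, gy))
```

On a step-edge test image, h_I stays near 1 in the flat interior and drops noticeably along the edge, which is exactly the behavior the speed term F = h(F_A + F_G) relies on.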
To implement leak-proofing, a new expression for F is defined:
F = h_I (F_A + F_I + F_G) \qquad (14)
where F_I can be expressed by:
F_I(x, y) = -F_A \, R\!\left( \frac{|\nabla G_\sigma \otimes I(x, y)| - M_2}{M_1 - M_2} - \delta \right) \qquad (15)
where M_1 and M_2 are the maximum and minimum values of |\nabla (G_\sigma \otimes I(x, y))|, the relaxation factor \delta \in [0, M_1 - M_2], and the function R returns values in [0, 1] (inclusive).
Sykora et al. converted the four requirements they proposed into an energy minimization framework, by which a desirable result was obtained in colorizing black-and-white cartoons.
・ Energy Minimization Framework
Given a grayscale image I, the energy function of a colorization scheme S is denoted by:
E(S) = \sum_{p \in I} \sum_{q \in N(p)} E_{p,q}(S_p, S_q) + \sum_{p \in I} E_p(S_p) \qquad (16)
where S_p is the color assigned to pixel p in the scheme, E_{p,q} is the energy of color discontinuity between two neighboring pixels p and q, and E_p is the energy of assigning color S_p to pixel p. Minimizing the energy function, which considers both the pixel-wise color assignment and the chrominance correlation of neighboring pixels, gives an optimal scheme for colorizing black-and-white sketches that satisfies the proposed properties (or requirements).
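For a concrete feel of Equation (16), the sketch below evaluates the energy of a candidate labeling on a grid. It counts each neighboring pair once (a slight simplification of the double sum) and leaves the cost functions as user-supplied callables, since the paper derives them from the preprocessed image:

```python
import numpy as np

def scheme_energy(S, pair_cost, unary_cost):
    """Evaluate the energy of a colorization scheme S (Equation (16))
    on a 2-D label grid, with illustrative cost functions of the
    caller's choosing.

    pair_cost(a, b)  : penalty E_pq for neighbouring labels a and b.
    unary_cost(p, a) : penalty E_p for assigning label a at pixel p."""
    h, w = S.shape
    E = 0.0
    for i in range(h):
        for j in range(w):
            if i + 1 < h:                      # vertical neighbour pair
                E += pair_cost(S[i, j], S[i + 1, j])
            if j + 1 < w:                      # horizontal neighbour pair
                E += pair_cost(S[i, j], S[i, j + 1])
            E += unary_cost((i, j), S[i, j])
    return E
```

With a Potts-style pair cost (penalty 1 for differing neighbors, 0 otherwise) and zero unary cost, a uniform labeling has zero energy while a checkerboard pays for every neighboring pair, which is why minimizing E(S) favors large coherent color regions.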
The reason for introducing E_{p,q} is to manage color discontinuities. One simple observation is that even though the original image is a typical black-and-white drawing, many vague boundaries may still exist. Another observation is that color discontinuities appear in narrow regions with low-intensity pixels. Therefore, a contrast enhancement step is performed first, in which the outlines are extracted by applying a Laplacian-of-Gaussian (LoG) filter to the source image. Subsequently, the energy of color discontinuity is obtained from the intensities of the pixels in the processed image.
The term E_p measures the effect of the pixel-wise color assignment. E_p takes on three distinct values corresponding to three different conditions of a pixel: not scribbled, under hard scribbles, and under soft scribbles.
・ Multiway Cut Problem
Minimizing the proposed energy function can be reduced to solving a multiway cut problem, because the energy of color discontinuity mainly depends on the luminance rather than the chrominance of the strokes (Potts, 1952; Boykov et al., 1998).
Given an undirected graph G = {V, E} (V is a set of vertices consisting of both colored and uncolored vertices, and E is a set of edges, each assigned a weight), the multiway cut problem is to find a scheme of removing edges that properly partitions the graph according to the color labels (colored pixels), while minimizing the sum of the weights of the removed edges.
Since the multiway cut problem with three or more terminals cannot currently be solved in polynomial time (Dahlhaus et al., 1992), the resulting schemes can only guarantee optimality within a certain factor (Dahlhaus et al., 1992; Boykov et al., 2001; Karger et al., 2004). Sykora et al. proposed a hierarchical method that greedily and gradually divides the original problem into min-cut subproblems and prunes trivial cases with only one terminal. Compared with the widely applied α-expansion algorithm (Boykov et al., 2001), this algorithm is significantly more efficient and therefore better suited to the interactive colorization setting, though it only approximates the optimal solution within a comparatively larger factor (i.e., with lower precision).
Compared with previous intensity-based colorization methods, the results produced by Qu et al. (2006) are much more satisfying, since their algorithm can propagate colors intelligently through pattern-continuous regions.
However, there are still many limitations in Qu et al.'s method, on which Sykora et al. significantly improve. For example, when given an input with very few strokes in the interior of a region, the soft-scribble formulation of Sýkora et al. (2009) can still colorize the region as intended.
With the development of deep learning, studies in many areas have incorporated deep learning networks into their frameworks. Colorization is one of the areas that can be improved with the help of such networks.
Zhang et al. (2017) proposed to train a convolutional neural network (CNN) on many images and then apply the network to colorize new grayscale images. Specifically, Zhang et al. implemented the U-Net structure (Ronneberger et al., 2015), designing a feature extraction part, a global input part, a fusion part and a reconstruction part. However, it is very difficult to collect real user input data for training the CNN. To overcome this difficulty, they simulated user inputs by sampling several points from the corresponding color images. The parameters of the CNN are obtained by minimizing the cost function:
\theta^{*} = \arg\min_{\theta} \, \mathbb{E}_{X, U, Y \sim \mathcal{D}} \big[ \Gamma(F(X, U; \theta), Y) \big] \qquad (17)
where \theta^{*} is the parameter that minimizes the loss function, X is the grayscale image, U is the user input, Y is the color output, F is the CNN with parameters \theta, and \Gamma is a function describing the closeness between the true value and the output value.
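The simulated-user-input trick can be sketched as follows. The channel layout (two chrominance channels plus a binary hint mask) is a common convention, but the exact shapes and names here are our own simplification of Zhang et al.'s setup:

```python
import numpy as np

def simulate_user_hints(Y_color, n_points, rng=None):
    """Simulate user input U for training, in the spirit of Zhang et
    al.'s trick: sample a few pixels of the ground-truth color image
    and expose their chrominance, plus a mask, as fake "user clicks".

    Y_color : (H, W, 2) ground-truth chrominance channels.
    Returns U : (H, W, 3) array holding the sampled chrominance values
    in channels 0-1 and a binary hint mask in channel 2."""
    rng = np.random.default_rng(rng)
    h, w, _ = Y_color.shape
    U = np.zeros((h, w, 3))
    for _ in range(n_points):
        i, j = rng.integers(h), rng.integers(w)
        U[i, j, :2] = Y_color[i, j]   # reveal the true chrominance here
        U[i, j, 2] = 1.0              # mask channel: a hint exists here
    return U
```

During training, such a U would be concatenated with the grayscale input X and fed to F(X, U; \theta), so the network learns to propagate sparse hints rather than relying on hand-collected user data.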
In this survey paper, we have introduced many scribble-based colorization algorithms published in recent years, divided into three types: 1) colorization of grayscale images; 2) colorization of sketches; 3) colorization with networks. For each area, we presented their principles, mathematical methods and experiments.
Comparing the algorithms for grayscale-image colorization and sketch colorization, we find that researchers applied different mathematical tools to implement colorization. Each of these tools performs differently with respect to efficiency, convenience, leak-proofing and color continuity (and, for sketches, anti-aliasing and special effects), as summarized in the following tables.
| | Efficiency | Convenience | Leak-proof | Color continuity |
|---|---|---|---|---|
| Levin et al. (2004) | ★ | ★ | ★ | ★ |
| Huang et al. (2005) | ★★ | ★ | ★ | ★★ |
| Yatziv & Sapiro (2006) | ★★★★★ | ★★ | ★★★ | ★★★★ |
| Luan et al. (2007) | ★★ | ★★★ | ★★★★ | ★★ |
| Xu et al. (2013) | ★★ | ★★★★★ | ★★★★ | ★★★★ |
| | Convenience | Leak-proof | Color continuity | Anti-aliasing | Special effect |
|---|---|---|---|---|---|
| Levin et al. (2004) | ★★ | ★ | ★★ | × | × |
| Qu et al. (2006) | ★★★ | ★★★★ | ★★★★ | × | √ |
| Sýkora et al. (2009) | ★★★★★ | ★★★★ | ★★★★ | √ | × |
“×” represents bad performance and “√” represents good performance; “★” represents the degree of performance, so more stars mean better performance.
"Efficiency" is measured by the running time under the same settings; "Convenience" is a subjective metric for users, with algorithms ranked according to the extent of required user intervention; "Leak-proof" measures an algorithm's ability to avoid color leaking; "Color continuity" reflects the "smoothness" of colors in images, which is also an important metric of visual effect. For sketch colorization, the ability to recognize and utilize the anti-aliasing of lines in the original image ("Anti-aliasing") and the ability to preserve and colorize special effects such as gradients ("Special effect") are also displayed in the table. It should also be noted that these tables may implicitly reflect some subjectivity, and "five stars" simply means that the graded algorithm achieves a comparatively better (or the best among those shown) performance.
Few of the surveyed algorithms can achieve perfect visual effects while minimizing the amount of user intervention, because it is very difficult to design a specific mathematical model that covers all situations. Given the absence of methods that achieve perfect visual effects efficiently with only a few strokes, more work is still needed to optimize the methods mentioned above.
In addition, with the development of deep learning, colorization methods using neural networks are becoming more pervasive. Unfortunately, the parameters of a network are very difficult to determine, because people still know very little about the relations between the parameters of a neural network and the actual features of an image; therefore, the architecture of a neural network must be carefully designed. Nevertheless, these network-based methods can colorize complex images with some sense of intelligence, which is very useful for illustrators. In conclusion, colorization methods combined with advanced artificial neural networks will be very promising in the future.
We would like to thank Prof. Barsky for guiding our investigation, and we also thank Yang and Saba for helping us with the writing of this paper.
The authors declare no conflicts of interest regarding the publication of this paper.
Li, S. M., Liu, Q. F., & Yuan, H. Y. (2018). Overview of Scribbled-Based Colorization. Art and Design Review, 6, 169-184. https://doi.org/10.4236/adr.2018.64017