Journal of Signal and Information Processing, 2012, 3, 481-490 Published Online November 2012 (
Human Friendly Interface Design for Virtual Fitting Room
Applications on Android Based Mobile Devices
Cecilia Garcia Martin, Erdal Oruklu
Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, USA.
Received August 7th, 2012; revised September 6th, 2012; accepted September 13th, 2012
This paper presents an image processing desig n flow for virtual fitting room (VFR) applications, targeting both personal
computers and mobile devices. The proposed human friendly interface is implemented by a three-stage algorithm: De-
tection and sizing of the user’s body, detection of reference points based on face detection and augmented reality mark-
ers, and superimposition of the clothing over the user’s image. Compared to other existing VFR systems, key difference
is the lack of any prop rietary hardware co mponents or periph erals. Proposed VFR is software based and designed to be
universally compatible as long as the device has a camera. Furthermore, JAVA implementation on Android based mo-
bile systems is computationally efficient and it can run in real-time on existing mobile devices.
Keywords: Virtual Fitting Room; Face Detection; Augmented Reality; Virtual Reality; Human Friendly Interfaces
1. Introduction
Today, mobile commerce and online sales are increasing
at a rapid rate. In 2011, mobile traffic on Black Friday
was 14.3 percent of all retail traffic compared to 5.6 per-
cent in 2010 [1]. Sales on mobile devices increased to 9.8
percent fro m 3.2 percent year over year [2]. Nevert heless,
one area online sales traditionally struggled is fashion
items and clothing. It is estimated that majority of the
consumers don’t buy clothing online because they don’t
want to take any risk with the sizes. In addition, a large
percentage of the purchased items are returned. This
brings an additional financial burden to retail companies.
Therefore, the objective of this work is to develop a vir-
tual fitting room (VFR) application that can run on any
mobile device that has a camera and network connection.
This VFR application can enhance the way customers
shop online and he lp them to choose the correct type and
size of the clothing item. The proposed algorithm is de-
signed to be computationally efficient and it can be used
with existing smart phone devices, improving the way
users shop online for new clothes.
In the next section, we first discuss the existing ap-
proaches for virtual fitting room applications. Section III
presents the detection and sizing of the user’s body. In
Section 4, we present a face detection method and aug-
mented reality markers for determining the reference
points. VFR software implementation details and the user
interface are shown in Section 5. Finally in Section 6,
JAVA based Android application development is pre-
2. Background
Several commercial products exist for VFR implementa-
tion. Styku [3] presents a body scanner that creates a
complete 3D model of the user. This 3D model is then
used in other web pages to try the clothing items on. The
model can be rotated, it can match any size and it even
uses a color map to analyze the fit. The body scanning is
implemented using Microsoft’s Kinect and Asus’ Xtion
devices. A VFR implementation by JCPteen [4] gets an
image of the user and using adobe flash player displays
the clothing items. At the beginning, it shows a shadow
on the screen where users have to fit themselves and after
that the cloth is displayed. In this system if the user is
moving, the item won’t follow or track him. Zugara [5]
offers a VFR that is similar to the JCPteens since the
items don’t move once they are displayed. It is based on
the augmented reality concept. The VFR doesn’t con-
sider the proportions of the user, only shows how it looks
as a fixed template. Similarly, Swivel [6] is labeled as a
try-on system that let users to see how clothes and ac-
cessories look on them in real-time. On the Ray-Ban [7]
web page, there is a Virtual Mirror where a user can see
how the glasses fit on him. If the user turns his head, the
model fits the glasses. User only needs to download a
plug-in and install it. The program works based on aug-
Copyright © 2012 SciRes. JSIP
Human Friendly Interface Design for Virtual Fitting Room Applications on Android Based Mobile Devices
mented reality: At the beginning the user has to match
the face within a shape and position the eyes in a line that
it is shown so it takes references of the head. After that it
displays the model of glasses that have been chosen. On
the Google Play there is one app for Android mobile de-
vices, Divalicious [8], called itself as a virtual dressing
room with more than 300 brands. It works by changing
the clothes of a default model. Finally, there is AR-Door
[9] which has also has a product based on Microsoft Ki-
nect [10]. With this system, the camera tracks the per-
son’s body and a 3D copy of cloth ing is superimposed on
top of the users’ image.
The key difference in our approach is the lack of any
proprietary hardware components or peripherals. Pro-
posed VFR is software based (JAVA) and designed to be
universally compatible as long as the device has a camera.
For the Android application, the minimum API version
supported is the 14. Additionally, p roposed algorithm can
track and resize the clothing according to user’s spatial
In order to create the Android app, we have developed
a human-friend ly interfa ce [11-13] which is defined as an
interactive computing system providing the user an easier
way to communica te with the machin es. In particular, thi s
can be achieved through touch-screen operations and
gestures similar to what people naturally feel with their
five senses. Creating intuitive interfaces with a few but-
tons that illustrate the basi c functionality to the user is pa-
ramount for the wider acceptance of the virtual reality ap-
plications. This was one of the key objectives of this study.
3. Detecting and Sizing the Body
First step of the proposed VFR method is the acquisition
of the shape of the body to get reference points. Refer-
ence points are then used to determine where to display
the clothes. In order to obtain the body shape, we applied
several techniques: 1) Filtering with thresholding, Canny
edge detection, K-means, and 2) Motion detection or
skeleton detection wherein multiple frames were ana-
lyzed for any movement. However, the results were un-
reliable and not good enough to obtain reference points
for displayi n g clot hs.
Therefore, we introduced a new detection methodol-
ogy based on locating the face of the user, adjusting a
reference point at his/her neck and displaying the clothes
based on that point. In addition, another point of refer-
ence can be obtained by using an Augmented Reality
(AR) marker. Details of this algorithm are explained in
Section 4.
For obtaining the size of the user, we follow a similar
automated body featur e extraction technique as shown in
[14]. The idea is to set up the user in front of the camera
and hold him at the beginning at a certain predetermined
distance. The algorithm extracts points on the shoulders
and the belly. Measuring the distance between these
points and knowing the distance from the user to the
camera, the size of the user can be obtained. When the
image (video frame) is acquired, a Canny edge detection
filter is applied to obtain only the silhouette of the body.
Canny edge detection is really susceptible to no ise that is
present in unprocessed data; therefore it uses a filter
where the raw image is convolved with a Gaussian filter.
After convolution, four filters are applied to detect hori-
zontal, vertical and diagonal edges in the processed im-
age. Morphological functions are also applied to obtain a
closed silhouette. Finally, an 8-point Freeman chain code,
shown in Figure 1 is applied to assign a direction to each
We can choose to apply 8 or 4 chain code, then, the
following formula can be used:
 
4 delta2delta2z *x + + y + (1)
which gives the sequence corresponding to rows 1 - 8 in
the preceding table:
11,7,6,5,9,13,14,15 .z These
values can be used as indices into the table, improving
the speed of computing the chain code. Each variation
between consecutive numbers represents a variation of
45˚ so if the difference of direction between consecutive
points is measured and if the change is more than two
(90˚) then a feature point is detected and marked in the
kj j
ed d
 (2)
This is the same than saying that the absolute differ-
ence between two points is bigger than 2 as Equation (2)
states. Finally the distance between them is measured in
the image and related to the distance from user to the
camera to obtain the size. Figure 2 show s an example o f
feature points extraction and the red line that should be
taken for the measure.
4. Reference Points and Cloth Display
A reference point is required to determine where the user
is located. Using face detection, the neck (point of refer-
Figure 1. Freeman’s codification.
Copyright © 2012 SciRes. JSIP
Human Friendly Interface Design for Virtual Fitting Room Applications on Android Based Mobile Devices 483
(a) (b)
Figure 2. Feature points extraction. (a) Canny edges; (b)
Feature points.
ence) is easily detected and the cloths can be fitted auto-
matically for the user. On the other hand, by using an AR
marker (point of reference), a user will have more free-
dom to choose how the cloths fit on her/him. Both refer-
ence points can be obtained and displayed by using the
OpenCV (Open Source Computer Vision) library [15].
OpenCV is a library of programming functions for real
time computer vision applications. This library provides
a comprehensive computer vision infrastructure and
thereby allows users to wo rk at a higher abstraction layer.
Additionally, library functions are optimized for fast and
efficient proce ssi ng .
4.1. Face Detection
In order to detect faces, we use the Haar-like features
[16,17] that are digital image features used in object rec-
ognition. Other face detection approaches in the literature
include methods using OpenCV [18], rectangular fea-
tures [19] as well as improvements to make the algo-
rithms faster for hardware implementation [20,21].
The Haar-like features are so called because they are
computed similar to the coefficients in Haar wavelet
transforms. A set of these features can be used to encode
the contrasts exhibited by a human face and their spatial
relationships. A Haar-like feature considers adjacent rec-
tangular regions at a specific location in a detection
window, sums up the pixel intensities in these regions
and calculates the difference between them. This differ-
ence is then used to categorize subsections of an image
by comparing it to a learned threshold that separates
non-objects from objects. Since a Haar-like feature is
only a classifier, a large number of Haar-like features are
necessary to describe an object with sufficient accuracy.
The database to obtain a stron g learner for the object that
we want to detect and where the Haar-like features are
organized is called classifier cascade.
In the case of proposed VFR algorithm, the encoded
contrasts are the contrasts of the human face as well as
their spatial relationships. This classifier needs to be
trained with hundreds of samples of a particular object,
which will represent the positive examples. Negative
examples are also trained with samples that are not con-
sidered as the object to detect. All the samples must have
the same size (for example 20 × 20). The classifier must
be easily resizable to be able to detect the object with
different size in the image. Therefore, the matching pro-
cedure has to be executed on several scales.
OpenCV uses an xml file that contains all of the char-
acteristics to detect the face. This xml file is read by the
function “CV Haar Detect Objects” and it is compared
with a region of interest of the input image and the clas-
sifier returns 1 if the object is detected, 0 otherwise. If
every simple classifier is positive, the cascade classifier
is positive, otherwise it is negative. In other words, the
face detection is made with a sum of these detected sam-
ples in a predefined position. Once the face is detected a
rectangle is plotted around the face and the location of a
reference point is chosen based on the supposition that
the neck is placed at the middle of the head, half of the
rectangle’s height. Also it is taken that it measures ap-
proximately a third of the head height, hence a third of
the rectangle height. Note that only one face will be de-
tected and it will be the one closest to the camera.
4.2. Marker Detection
An augmented reality marker is used to display (super-
impose) the cloths over the users’ image. In order to de-
tect the marker and obtain a reference, an algorithm with
seven stages has been used.
Divide image in regions: When the image is received,
it is divided into regions of 40 × 40 pixels and there
are horizontal and vertical scan lines every 5 pixels.
Detect edges in regions: For each region, a Gaussian
derivative filter is used to detect the black/white
edges. The filter used is [–3 –5 0 5 3]*A. Once an
edge is detected a Sobel operator is used to determine
the orientation of the edges. The Sobel operator used
121 101
000 ,202
121 101
 
In Figure 3 we can see the edges in blue if they are
vertical and green if they are horizontal.
Find Segments: A Random Sample Consensus grou-
per algorithm is then used to create the line segments
in the regions. This algorithm groups series of points
that can be fitted into lines. First, it arbitrarily takes 2
points of the same region with the same orientation,
then the close points that have a compatible orienta-
Copyright © 2012 SciRes. JSIP
Human Friendly Interface Design for Virtual Fitting Room Applications on Android Based Mobile Devices
Figure 3. Detected edges after Sobel operator.
tion are added to the line segment. Finally, lines con-
taining at least 4 segments are considered as a de-
tected line. This can be observed in Figure 4.
Extend lines along edges: So far only pixels on scan
lines were scanned. Now, the lines are extended pixel
by pixel until a detected corner or until there is no
edges. The green lines displayed on Figure 5 repre-
sent the extension of the previous red lines from one
corner to another corner.
Keep lines with corner: The lines with at least one
corner are kept.
Find markers: Chains of 3 or 4 lines are kept. A chain
of line is found when the end of one line corresponds
to the beginning of the second one. The rectangles
with a black inside are checked as markers. The two
markers on Figure 6 are now detected. However, we
want to detect only ou r marker.
Identify markers: The last step is to identify the inside
of the marker to check that it is our marker. As we
can see on the left picture, this step only checks that
the marker has a black center and is white on the in-
side of the boundaries as we can see in Figure 7.
The result of all these steps combined is shown on Fig-
ure 8. We can see that only our marker is detected and
we have the desired reference point.
Keeping the marker in a position near the belt or even
using one with the AR marker allows obtaining the ref-
erence point and can properly place clothes. This also
would be comfortable to the user because he will have
free hands.
4.3. Cloth Display
Cloth masks are needed to determine which pixels should
be displaying the clothes and which ones not. Each cloth
should have a mask; there is an example of a mask in
Figure 9. A stabilization method is used since face de-
tection provides a rectangle for identification that moves
quickly in position and size. As a consequence, the mean
Figure 4. Detected lines.
Figure 5. Extended lines from one corner to another.
Figure 6. Markers detected.
of 5 previous images is shown.
4.4. VFR Application Flow Chart
Figure 10 shows the flowchart of the VFR functionality.
The algorithm includes several options to change pa-
rameters with user input. By pressing 1 or 2, the detec-
tion method is changed (1 for AR marker, 2 for face de-
Copyright © 2012 SciRes. JSIP
Human Friendly Interface Design for Virtual Fitting Room Applications on Android Based Mobile Devices 485
Figure 7. Points recognized to identify the marker.
Figure 8. Result of marker detection.
Figure 9. Cloth and its mask.
tection). Choosing between S, M and L changes a coeffi-
cient and it changes the size of the selected item. By
pressing a key between Z and N the user will be able to
try different items available in the fitting room. + and –
make the width of the cloth bigger or smaller. Finally,
with the arrow keys, incremental changes can be done
and the position of the item can be modified to make it
more accurate. Figure 11 shows a typical VFR applica-
tion example running on a laptop computer with web-
5. VFR Implementation and Interface
For universal compatibility across different mobile de-
vices, we developed a Java Applet [22], which presents
an ideal solution to enable every customer to be able to
run the Virtual Fitting Room. Java provides easy func-
tions to create a graphical user interface to select size,
different cloths and ad just the position of the clothes. On
the other side, the OpenCV code needs to be adapted
from C++ to Java [23]. In order to be able to execute the
algorithm in Java, JavaCV is used which is a java wrap-
per for OpenCV library [24,25]. JavaCV includes the
commonly used libraries by researchers in the field of
computer vision, such as OpenCV, FFmpeg, libdc1394,
PGR, FlyCapture, OpenKinect, videoInput, and AR-
ToolKitPlus. Hence, the same functions that had been
used in C++ can be now used in Java. JavaCV also
comes with hardware-accelerated displays, easy-to-use
methods to execute code in parallel on multiple cores,
user-friendly geometric and color calibration of cameras
and projectors, and detection and matching of feature
points. The final look of the VFR java applet and the user
interface can be seen in Figure 12.
6. Android Based App Development
The Android app has been developed using Eclipse and
Java. Android SDK [26] used in combination with ADT
plug in Eclipse present a flexible environment to built
and test any Android applications. This app is available
for any device with front camera but will work at full
with two cameras.
Once the Android VFR app is started, the user can
choose between different options. First he has to select
the size, between XSmall, by default, and XLarge. After
that he selects the clothing that he wants to try. The app
detects the face of the user and it displays the cloth using
as reference a rectangle that is drawn around the face of
the user (Green rectangle in Figure 13). The size of this
rectangle depends on the user’s face: if the user is close
to the camera the rectangle will be bigger, on the other
hand if he is farther it will be smaller. By using this rec-
tangle and references, the clothes that are going to appear
in the screen are scaled by the app.
The maximum distance that the user can reach with his
arm sometimes isn’t enough to obtain a clear view so
here the app offers two possibilities. One is leaving the
phone in a fixed position and make himself fit into the
screen and see how the clothet on him or use the but- s fi
Copyright © 2012 SciRes. JSIP
Human Friendly Interface Design for Virtual Fitting Room Applications on Android Based Mobile Devices
Copyright © 2012 SciRes. JSIP
Figure 10. VFR flowchart.
Human Friendly Interface Design for Virtual Fitting Room Applications on Android Based Mobile Devices 487
Figure 11. VFR example.
Figure 12. VFR graphic interface.
Figure 13. Calculations for the position.
ton to switch to the back camera and let someone else
hold the phone and take a picture of the user to be able to
check how the cloth looks in him.
The clothes that are displayed on the screen follow and
track the user similar to desktop computer implementa-
tion shown in Section 5. In order to calculate the posi-
tion where the image has to be displayed, the measure-
ments from the face rectangle and the image width and
height has been used. The image is displayed setting the
origin, reference point, at the top-left corner. X coordi-
nate is obtained by acquiring the X center of the rectan-
gle and subtracting half of the image’s width. Y coordi-
nate is obtained starting from the bottom point of the
rectangle and adding one third of the image’s height.
This is represented in Figure 13. The equations applied
osXRect.centerX imageWidth/2
Both equations have an offset in X and Y coordinates
since the images used may have different sizes.
As expected, the camera follows the user in real time.
In order to implement the face detection, the Android’s
face detection API is used, released in the Android 4.0
SDK, which can detect the face in the previous frame and
specify facial features as the position of the eyes or the
mouth. To use this, a Face Detection Listener has been
implemented and it returns an array containing all the
faces that have been detected with their characteristics.
A transformation has to be applied to the rectangle’s
coordinates that come from (–1000, –1000) top-left to
(1000, 1000) bottom-right in order to be able to adjust it
to any android screen. Using the API has been chosen
over using JavaCV and the Haar-like classifier because
this one has been optimized for Android systems. To
adjust the clothes, it has been settled as Image View and
displayed with addView() method from that class.
The VFR app works using activities; these are single
and focused “actions” that the user can do. Each activity
has its own lifecycle and as soon as the app is started the
main activity comes to the foreground of the system. In
Figure 14, the complete lifecycle of an activity can be
seen. The visible lifetime of the app happens between
onStart() until a call to onStop(). The foreground lifetime,
interacting with the user, occurs between onResume()
and onPause(). It has to be noted that we have to declare
in the Android Manifest all the hardware that it is going
to be used in our device as well as the fundamental char-
acteristics of our app. The user interface has been created
in a hierarchical way using View and different Layouts
rovided. p
Copyright © 2012 SciRes. JSIP
Human Friendly Interface Design for Virtual Fitting Room Applications on Android Based Mobile Devices
Figure 14. Android activity lifecycle.
In order to show the camera, a class CameraPreview
extending SurfaceView has been used to draw the cam-
era surface embedded inside a view hierarchy. This class
works as a secondary thread that can render into the screen
and also has an activity cycle as the Activity does. In this
case, it follows SurfaceCreated(), Surface-Changed() and
SurfaceDestroyed() where Created() and Destroyed() sets
the visibility of the window and Change() looks for any
change on it.
In the main activity of the VFR, there are the listeners
for the user interface buttons as well as the method to
adjust the position of the clothes depending on the face
position. In Figure 15, the final view of the Android
VFR can be seen.
Finally, the performance of the application has been
analyzed using the debugging tool called DDMS (Dalvik
Debug Monitor Server). When DDM S st art s, it connect s t o
Android Debug Bridge (ADB). When a device is con-
nected, a VM monitoring service is created between ADB
and DDMS, which notifies DDMS w hen a VM p roc ess i s
Copyright © 2012 SciRes. JSIP
Human Friendly Interface Design for Virtual Fitting Room Applications on Android Based Mobile Devices 489
Figure 15. Final view of android VFR.
initiated or the changes that have been done through an
assigned port that usually is the 8600.
In order to measure the performance of the tool, the
method profiling has been started. This tool tracks certain
metrics about a method, such as number of calls, execu-
tion time, and time spent executing the method. Here two
panels are obtained: the timeline panel that describes
when each thread and method started/stopped and on the
other hand the profile panel that provides a summary of
what happened inside a method. With the profile panel
comes a table that shows exclusive time, that is the time
spent in a method and the inclusive time that is the time
spent in the method plus the time spent in any other
called method (child).
Based on this profiling, face detection algorithm takes
only 0.1% exclusive time and 13.7% of inclusive time.
This is due to the fact that its child has to display the
cloths depending upon the position of the face. The
method for drawing (i.e., displaying) clothes shows an
exclusive time of 0.2% and 29.7% of inclusive time.
Therefore, it can be seen that the most expensive proc-
esses, in terms of computation time, are the ones related
with image displaying.
7. Conclusion
In this work, a virtual fitting room application for mobile
devices was implemented successfully. The main objec-
tive to obtain a real time, platform independent applica-
tion was achieved. Users are able to select sizes from XS
to XL and chose between different cameras on the device
to implement the VFR. In addition, the algorithm can
track and scale the clothing according to user’s position
and movement. By deploying it to the Android Market or
Apple Store, this application can be used by retail com-
panies for increasing their online presence.
[1] IBM Coremetrics Benchmark Reports, 2012.
[2] N. Y. Armonk, “Cyber Monday Online Spending Increase s
by 33 Percent Over 2010, Reports IBM,” 2011.
[3] Skytu, 2011.
[4] JCPteen, 2011.
[5] Zugara, 2011.
[6] Swivel, 2011.
[7] RayBan, 2011.
[8] Divalicious, 2011.
[9] Topshop, 2011.
[10] Microsoft Kinect, 2011.
[11] M. Popa, “Hand Gesture Recognition Based on Acceler-
ometer Sensors,” International Conference on Networked
Computing and Advanced Information Management, Gye-
ongju, 21-23 June 2011, pp. 115-120.
[12] J. Liu, “A New Reading Interface Design for Senior Citi-
zens,” 2011 First International conference on Instrumenta-
tion, Measurement, Computer, Communication and Con-
trol, Beijing, 21-23 October 2011, pp. 349-352.
[13] N. Kubota, D. Koudu and S. Kamijima, “Human-Friendly
Interface Based on Visual Attention for Multiple Mobile
Robots,” Automation Congress World, Budapest, 24-26
July 2006, pp. 1-6.
[14] Y. Lin, M. Jiun and J. Wang, “Automated Body Feature
Extraction from 2D Images,” Expert Systems with Appli-
cations, Vol. 38, No. 3, 2011, pp. 2585-2591.
[15] Intel Inc., “Open Computer Vision Library,” 2011.
[16] “Haar-Like Feature Face Detection,” 2011.
[17] C. Chai and Y. Wang, “Face Detection Based on Extend-
ed Haar-Like Features,” International Conference on Me-
Copyright © 2012 SciRes. JSIP
Human Friendly Interface Design for Virtual Fitting Room Applications on Android Based Mobile Devices
Copyright © 2012 SciRes. JSIP
chanical and Electronics Engineering (ICMEE), Kyoto,
1-3 August 2010, pp. 442-445.
[18] M. Zuo, G. Zeng and X. Tu, “Research and Improvement
of Face Detection Algorithm Based on the OpenCV,” In-
ternational Conference on Information Science and En-
gineering (ICISE), Hangzhou, 4-6 December 2010, pp.
[19] D. Lee, “A Face Detection and Recognition Sy stem Based
on Rectangular Feature Orientation,” International Con-
ference on System Science and Engineering (ICSSE),
Taipei, 1-3 July 2010, pp. 495-499.
[20] L. Acasandrei and A. Barriga, “Accelerating Viola-Jones
Face Detection for Embedded and SoC Environments,”
2011 5th ACM/IEEE international conference Distributed
Smart Cameras (ICDSC), Ghent, 22-25 August 2011, pp.
[21] S. Rigos, “A Hardware Acceleration Unit for Face De-
tection,” Mediterranean Conference on Embedded Com-
puting (MECO), Bar, 19-21 June 2012, pp. 17-21.
[22] Java Documentation, 2011.
[23] S. Audet, “Hints for Converting OpenCV C/C++ Code to
JavaCV,” 2012.
[24] S. Audet, “Java Interface to OpenCV,” 2011.
[25] Java Wrapper, 2012.
[26] Android Developers Website, 2012.