Optics and Photonics Journal, 2013, 3, 331-336
doi:10.4236/opj.2013.32B076 Published Online June 2013 (http://www.scirp.org/journal/opj)
Holographic Raman Tweezers Controlled by Hand
Gestures and Voice Commands
Zoltan Tomori1, Marian Antalik1,2, Peter Kesa2, Jan Kanka3, Petr Jakl3, Mojmir Sery3,
Silvie Bernatova3, Pavel Zemanek3
1Department of Biophysics, Institute of Experimental Physics SAS, Kosice, Slovakia
2Department of Biochemistry, P. J. Safarik University, Kosice, Slovakia
3Institute of Scientific Instruments of the ASCR v.v.i., Brno, Czech Republic
Email: tomori@saske.sk
Received 2013
ABSTRACT
Several attempts have appeared recently to control optical trapping systems via touch tablets and cameras instead of a mouse and joystick. Our approach is based on modern low-cost hardware combined with fingertip and speech recognition software. The positions of the operator's hands or fingertips control the positions of the trapping beams in holographic optical tweezers that provide optical manipulation of microobjects. We tested and adapted two systems for hand position detection and gesture recognition – the Creative Interactive Gesture Camera and the Leap Motion. We further enhanced the system of holographic Raman tweezers (HRT) by voice commands controlling the micropositioning stage and the acquisition of Raman spectra. The interface communicates with the HRT either directly, which requires adaptation of the HRT firmware, or indirectly by simulating mouse and keyboard messages. Its utilization in real experiments sped up the operator's communication with the system approximately two times in comparison with traditional control by mouse and keyboard.
Keywords: Holographic Optical Tweezers; Raman Tweezers; Natural User Interface; Leap Motion; Gesture Camera
1. Introduction
Optical tweezers represent a tool that uses a tightly focused laser beam for contactless three-dimensional manipulation of electrically neutral objects with sizes from tens of nanometers to tens of micrometers [1]. An object is trapped near the focus of the laser beam, and repositioning of the beam focus is followed by the object moving to the new focus position. The human operator usually controls the position of the trapping beam with a traditional pointing device such as a mouse or joystick. Some research groups have used 3D joysticks and haptic devices [2]. Manipulation of several objects requires several independent trapping beams, which can be obtained by a number of different methods. The most flexible one uses a spatial light modulator that splits one beam into several beams with independently positioned foci in all three dimensions. Such holographic optical tweezers [3] can be easily controlled by a PC and, in combination with a system detecting the positions of the manipulated objects, provide the basis for efficient feedback control. For example, a CCD camera with appropriate software detects the position of each finger and transforms it into the controlled manipulation of several particles simultaneously [4]. In [5] this idea was modified into a "multi-touch console", a large horizontal board where several operators can work simultaneously. A natural consequence of the recent worldwide expansion of touch-screen tablets was their exploitation to move particles in the XY plane, while the Z coordinate is determined by zooming ("stretching" a particle between two fingers) [6].
The Microsoft Kinect sensor is able to capture 3D images (X, Y, Z coordinates) and thus opens new possibilities for controlling optical trapping [7]. However, the Kinect was primarily designed to capture the whole human body, and therefore its recognition of near objects (the hands of a sitting person) is limited. A new generation of sensors (the Creative Interactive Gesture Camera and the Leap Motion) makes it possible to overcome this problem. They integrate several methods belonging to the quickly growing area of computer vision called the "Natural User Interface" (NUI).
Optical micromanipulation techniques can be easily combined with other techniques, such as force measurement (the photonic force microscope) or Raman laser spectroscopy (Raman tweezers), which results in a rather complex control interface. NUI technology in these areas can significantly reduce the dwell-time between the action of the operator and the reaction of the system. Our proof-of-concept experiments combine hand gesture recognition for
the navigation of the trapping beams with speech recognition for other advanced commands to the HRT.
2. Material and Methods
2.1. Hardware
We used a home-made system utilizing a spatial light modulator (Hamamatsu X10468-03) and a trapping fiber laser (IPG YLM-10-LP-SC) with a maximal output power of 10 W at a wavelength of 1070 nm for the holographic optical tweezers, a Shamrock SR-303i spectrometer and a low-noise camera (Andor Newton DU970P) for Raman spectra acquisition, and a Mad City Labs micropositioning stage (Nano-view) for three-dimensional positioning of the sample in an optical microscope of our own design. Both the optical tweezers and the NUI devices are controlled by a PC (Intel processor, 8 GB RAM) running the Windows 7 operating system.

The hand positions are acquired by two advanced devices released almost at the same time (end of 2012). The "Creative Interactive Gesture Camera" sold by Intel ("Gesture Camera" in the following text) captures both an RGB image and a depth map [8]. Their combination gives 3D information about the positions of the human operator's hands. Although the camera also contains a microphone array, we employed an external microphone to acquire the voice commands.

"Leap Motion" is a sensor based on a principle similar to the Kinect, but it achieves 200x better precision thanks to a patented algorithm based on a built-in model of fingers, hands and elongated tools (e.g. a pencil) [9]. As members of the official developer community, we had the chance to test the prototype version (not yet on sale).
2.2. Software
In our work we exploited several independent software packages and libraries supplied with the hardware:
- HRT software written in LabVIEW, which controls all hardware parts of the optical tweezers in the traditional "keyboard & mouse" mode.
- The Intel Perceptual Computing SDK 2013 Beta 3 (PCSDK), the software development kit and libraries supplied with the Creative Interactive Gesture Camera, which significantly simplify programming. The PCSDK contains demo samples explaining how to acquire a depth image of the hands and how to calculate the coordinates of the fingertips.
- "Nuance Dragon Assistant Core" software for contextual voice recognition, which transforms the voice signal from the microphone into text.
- The Leap Motion SDK, supplied with libraries and examples for the development of custom applications. The latest version of the Leap Motion SDK supports hand gesture recognition.
- Microsoft Visual Studio 2010 with the C++ compiler, used for the development of the NUI software and for the modifications of the PCSDK samples.
2.3. Control of HRT via NUI Module
We developed "Natural User Interface" (NUI) software for the control of the HRT system employing the above-mentioned libraries. Figure 1 shows the whole system, whose parts run asynchronously as described below.
2.3.1. Voice Commands Recognition
A voice detector watches the acoustic signal of the microphone. If a continuous signal appears that can be interpreted as a voice command, it is acquired and sent to the speech recognition program (Nuance Dragon Assistant Core). The program compares the voice sample with templates in a dictionary and returns the text string of the most similar word. We created a specific dictionary of commands (see Table 1), which was used instead of the default one. The limited number of commands significantly reduces the risk of wrong recognition. The selected set of commands should have well-recognizable acoustic sounds; similar words such as "one" and "done" should not both be included in the same dictionary. The dictionary can be modified simply by editing a text file. Currently, voice commands in English are implemented; an extension to other languages is expected soon. All text strings are based on Unicode encoding to simplify the transfer to other languages.

Figure 1. Data flow diagram. Left: The operator sits in a convenient position with elbows resting on the table. Right: Data from the detectors are sent either directly to the tweezers firmware (CONTROL SW module) or indirectly to it through the NUI module via simulated keyboard (KB) messages.

Table 1. Dictionary of voice commands.

  #  Command   Param.  Function
  1  click     0       Activate clicker 0 (Raman)
  2  one       1       Activate clicker 1
  3  previous  33      Press key Page Up (focus up)
  4  next      34      Press key Page Down (focus down)
  5  left      37      Press key Left (move stage)
  6  up        38      Press key Up (move stage)
  7  right     39      Press key Right (move stage)
  8  down      40      Press key Down (move stage)
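As an illustration of how such a dictionary can be dispatched, the following sketch (our own simplified C++ example, not the actual NUI module code; sendKey() and activateClicker() are hypothetical stand-ins for the mechanisms described in Section 2.3.7) maps each word returned by the recognizer either to a virtual-key code or to a clicker identifier, following Table 1.

#include <cstdio>
#include <map>
#include <string>

// Hypothetical helpers; in the real system they would simulate a key press
// or software-click the button under the corresponding clicker window.
static void sendKey(int virtualKeyCode)    { std::printf("key %d\n", virtualKeyCode); }
static void activateClicker(int clickerId) { std::printf("clicker %d\n", clickerId); }

// Dispatch one recognized voice command (the text string returned by the
// speech recognition program) according to the dictionary of Table 1.
void dispatchVoiceCommand(const std::string& word)
{
    // Virtual-key codes in decimal, as listed in the "Param." column.
    static const std::map<std::string, int> keyCommands = {
        {"previous", 33}, {"next", 34},                        // focus up/down
        {"left", 37}, {"up", 38}, {"right", 39}, {"down", 40}  // move stage
    };
    static const std::map<std::string, int> clickerCommands = {
        {"click", 0},   // clicker 0 (e.g. placed above the RAMAN button)
        {"one",   1}    // clicker 1
    };

    auto k = keyCommands.find(word);
    if (k != keyCommands.end()) { sendKey(k->second); return; }

    auto c = clickerCommands.find(word);
    if (c != clickerCommands.end()) activateClicker(c->second);
}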
2.3.2. Hands and Fingertips Detection
We tested both of the mentioned devices. The Gesture Camera periodically (25 times per second) acquires depth images and identifies connected components (blobs) of the same depth. If the shape of a blob looks like a hand, it detects its skeleton – a set of lines between the hand center and the individual fingers (see Figure 2(d)). The Leap Motion is able to capture approximately 150 frames per second (more with USB 3.0). Both devices send the positions of the fingertips (x, y, z coordinates) to the NUI module along with a flag containing information related to hand visibility, openness, etc.
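For illustration, a minimal polling loop on the Leap Motion side could look like the following sketch. It assumes the C++ API of the v1 Leap SDK mentioned in Section 2.2 (Leap::Controller, Frame, Hand, Finger); the forwarding of the coordinates to the NUI module over OSC (Section 2.3.5) and any error handling are omitted.

#include <iostream>
#include "Leap.h"

int main()
{
    Leap::Controller controller;   // connects to the Leap Motion service

    for (;;) {                     // busy-polling loop, kept minimal for brevity
        Leap::Frame frame = controller.frame();   // most recent tracking frame
        Leap::HandList hands = frame.hands();
        for (int h = 0; h < hands.count(); ++h) {
            Leap::FingerList fingers = hands[h].fingers();
            for (int f = 0; f < fingers.count(); ++f) {
                // Fingertip position in millimeters, origin at the sensor center.
                Leap::Vector tip = fingers[f].tipPosition();
                std::cout << "fingertip " << f << ": "
                          << tip.x << " " << tip.y << " " << tip.z << std::endl;
            }
        }
    }
    return 0;
}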
2.3.3. Hand Gestures Recognition
Both detectors support the recognition of simple one-hand gestures. The Gesture Camera identifies gestures such as THUMB UP/DOWN, VICTORY, SWIPE, CIRCLE, WAVE, etc. Due to its more precise detection of fingertips, the Leap Motion is able to recognize fine gestures, for instance the movement of a finger towards the keyboard and back (KEYTAP gesture, simulating a key press), a finger movement towards the screen and back (SCREENTAP gesture, simulating a click), or a swiping motion of a finger (SWIPE gesture, intuitively meaning rejection). All these gestures can be transformed into commands and exploited to control the optical tweezers similarly to the voice commands.
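As an example of how such gestures can be turned into tweezers commands, the sketch below (our own illustration under the same v1 Leap SDK assumption; trapActivate() and trapDeactivate() are hypothetical stand-ins for the NUI-module actions) implements the mapping described in Section 2.3.8: KEYTAP activates the pointed laser trap and SWIPE deactivates it.

#include <cstdio>
#include "Leap.h"

// Hypothetical stand-ins for the trap control performed by the NUI module.
static void trapActivate()   { std::printf("activate pointed trap\n"); }
static void trapDeactivate() { std::printf("deactivate trap\n"); }

// Gesture recognition must be switched on once, e.g. at start-up:
//   controller.enableGesture(Leap::Gesture::TYPE_KEY_TAP);
//   controller.enableGesture(Leap::Gesture::TYPE_SWIPE);
void processGestures(const Leap::Controller& controller)
{
    Leap::GestureList gestures = controller.frame().gestures();
    for (int i = 0; i < gestures.count(); ++i) {
        switch (gestures[i].type()) {
        case Leap::Gesture::TYPE_KEY_TAP: trapActivate();   break;  // finger "key press"
        case Leap::Gesture::TYPE_SWIPE:   trapDeactivate(); break;  // rejection gesture
        default: break;                                             // other gestures ignored
        }
    }
}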
2.3.4. Screen Calibration
The hand detector sends the coordinates of the fingertips (X, Y, Z) in real units (e.g. millimeters), with the origin at the center of the sensor. In the simplest case, the camera is placed at the center of the top edge of the screen and we roughly consider the z-coordinate to be equal to the distance of the fingertip from the screen. The screen coordinates (x, y) are then given by the ratio of the screen and camera resolutions.
Figure 2. Dialogs and windows on the screen. A) Tweezers control dialog box. B) "Clicker" window, which can be placed above any control button to invoke its clicking by software. Horizontal arrows change the transparency of the clicker window between fully visible and invisible. C) Live video from the tweezers camera displaying trapped objects. Red circles near the centers of the particles indicate active traps. D) Two-finger mode (thumb + index) suggesting real tweezers. E) Live video from the NUI camera displaying the hands, their centers and the index fingertips.
In general, the sensor can be placed apart from the screen (e.g. the Leap Motion usually lies on the desktop in front of the screen). The relation between the sensor and screen coordinate systems is called the "pose" and is defined by a transformation matrix containing rotation, translation and scale factors [10]. This matrix, obtained by a calibration process, transforms the 3D fingertip position (X, Y, Z) to the screen coordinates (x, y). The user can exploit his finger as a laser pointer: the cursor appears on the calibrated screen at the position where the finger is pointing. Leap Motion supplies a calibration program as well as a set of functions (e.g. for calculating the distance between a fingertip and the screen plane).
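A minimal numerical sketch of this calibration transform is shown below (our own illustration with hypothetical helper names, not the Leap Motion calibration routine): a 3x4 pose matrix obtained from calibration maps a fingertip position given in sensor coordinates (millimeters) onto the screen plane, and a final scaling converts millimeters to pixels.

#include <array>

struct ScreenPoint { double x; double y; };

// 3x4 pose matrix [sR | t] (rotation, scale and translation) from calibration.
typedef std::array<std::array<double, 4>, 3> Pose;

// Map a fingertip position (X, Y, Z) in sensor coordinates to screen pixels.
ScreenPoint sensorToScreen(const Pose& P, double X, double Y, double Z,
                           double pxPerMmX, double pxPerMmY)
{
    // Transform the sensor-space point into screen-plane coordinates (mm).
    double xs = P[0][0]*X + P[0][1]*Y + P[0][2]*Z + P[0][3];
    double ys = P[1][0]*X + P[1][1]*Y + P[1][2]*Z + P[1][3];
    // The third row, P[2][.], would give the distance from the screen plane
    // (useful e.g. for pointer behaviour); it is not needed here.

    // Convert millimeters on the screen plane to pixel coordinates.
    ScreenPoint p = { xs * pxPerMmX, ys * pxPerMmY };
    return p;
}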
2.3.5. Communication between Programs
The communication is based on a client/server strategy using the UDP network protocol (the OSC format [11], supported by the "liblo" library). The head of an OSC message contains the identification string "/voice" or "/hands", which the NUI module detects and processes with the proper function. In our experiments both the client and the server programs run on the same computer; however, the use of a network protocol leads to a straightforward extension to a real network, as described in the section "Future Work".
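A minimal receiver on the NUI-module side could be written with liblo roughly as follows (a sketch only; the port number 7770 and the argument layouts of the "/voice" and "/hands" messages are illustrative assumptions, not the values used in our setup).

#include <cstdio>
#include <lo/lo.h>

// Handler for "/voice" messages carrying one string: the recognized command.
static int voiceHandler(const char* path, const char* types, lo_arg** argv,
                        int argc, lo_message msg, void* user_data)
{
    std::printf("voice command: %s\n", &argv[0]->s);
    return 0;
}

// Handler for "/hands" messages carrying one fingertip position (x, y, z).
static int handsHandler(const char* path, const char* types, lo_arg** argv,
                        int argc, lo_message msg, void* user_data)
{
    std::printf("fingertip: %f %f %f\n", argv[0]->f, argv[1]->f, argv[2]->f);
    return 0;
}

int main()
{
    // UDP server listening for OSC messages from the detector programs.
    lo_server_thread st = lo_server_thread_new("7770", NULL);
    lo_server_thread_add_method(st, "/voice", "s",   voiceHandler, NULL);
    lo_server_thread_add_method(st, "/hands", "fff", handsHandler, NULL);
    lo_server_thread_start(st);

    std::getchar();              // keep the server running until a key is pressed
    lo_server_thread_free(st);
    return 0;
}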
2.3.6. Direct Control of a System
The fingertip recognition module as well as the speech recognition module are able to control a system directly, assuming a proper communication interface (such as in our HRT system). If this is not the case, cooperation with the developer of the system is needed. To avoid the complications connected with such direct control, we also developed an indirect control via the simulation of mouse and keyboard commands.
2.3.7. Indirect Control of a System
The majority of systems are controlled by firmware via the keyboard (pressing and releasing a key) and the mouse (clicking the corresponding button in a dialog box). The idea is to catch the control commands from the camera and voice recognition modules in an intermediate NUI module, which controls the system firmware by simulating keyboard and mouse messages. While the simulation of a keyboard event is straightforward, the simulation of mouse events has to take into account the variable position of the control dialog on the screen. We therefore developed a small target-like window with adjustable transparency (window B in Figure 2). This "clicker" window should be manually placed above the controlled button of the dialog box before the experiment. As a result, the NUI program can "click" the external program's button whenever required. The column "Param." of Table 1 shows the parameter tied to the given command (either the virtual code of the keyboard key in decimal format or the identifier of the clicker).
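The sketch below illustrates the two kinds of simulated input on Windows (our own example using the standard Win32 SendInput API, not the NUI module itself): a key press given by its decimal virtual-key code from Table 1, and a left mouse click at the screen position where a clicker window has been placed.

#include <windows.h>

// Simulate pressing and releasing a key, e.g. sendKey(38) for the Up arrow.
void sendKey(WORD virtualKeyCode)
{
    INPUT in[2] = {};
    in[0].type = INPUT_KEYBOARD;
    in[0].ki.wVk = virtualKeyCode;            // key down
    in[1].type = INPUT_KEYBOARD;
    in[1].ki.wVk = virtualKeyCode;
    in[1].ki.dwFlags = KEYEVENTF_KEYUP;       // key up
    SendInput(2, in, sizeof(INPUT));
}

// Simulate a left mouse click at absolute screen coordinates, i.e. at the
// position of the "clicker" window placed above the target button.
void clickAt(int x, int y)
{
    SetCursorPos(x, y);
    INPUT in[2] = {};
    in[0].type = INPUT_MOUSE;
    in[0].mi.dwFlags = MOUSEEVENTF_LEFTDOWN;
    in[1].type = INPUT_MOUSE;
    in[1].mi.dwFlags = MOUSEEVENTF_LEFTUP;
    SendInput(2, in, sizeof(INPUT));
}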
2.3.8. Modes of Operation
Traditional control of the HRT system by mouse or joystick can be combined with the new NUI control tools in many ways.

Conservative users may prefer a traditional approach, keeping the mouse in the primary (usually right) hand for precise and reliable positioning of the laser trap. The other (secondary) hand can show the target position of a moving particle, or it can define the gestures controlling the mode of operation. For instance, the KEYTAP gesture activates the pointed laser trap and the SWIPE gesture deactivates it (see Figure 3).

In some situations, the mouse can be replaced by the fingertip detector completely. This allows the simultaneous movement of up to 10 particles per user (the number of users is not limited to one). Of course, the risk of wrong fingertip detection increases with the number of tracked fingers.

Currently, a reasonable compromise is the use of the two index fingers of both hands, with the possibility of operatively replacing the primary hand by the mouse when necessary. The optimal mode of operation is the user's choice and strongly depends on the application and the type of sample.
Figure 3. Example of gestures recognized by the Leap Motion detector. Left: KEYTAP gesture, right: SWIPE gesture. (Pictures copied from the Leap Motion documentation [9].)
3. Experiment
We used the described HRT system, and as the sample we took droplets of liquid crystal (6CB or 8CB) dyed with Nile red and dispersed in water. The droplets were manipulated by two laser beams with a total power of 3 W. The NUI module was programmed in such a way that an open hand represents an inactive trap, while closing the hand activates the trap at the given position. Moving the closed hand intuitively corresponds to moving the trapped object. A closed hand with the index finger up increases the sensitivity of detection. The angle between the thumb and the index finger can intuitively suggest the function of real tweezers. One possibility is to exploit this gesture for focusing: if the angle between the index finger and the thumb is minimal, the system uses only the XY coordinates to move objects; if this angle is above a threshold, the system also evaluates the Z coordinate and changes the focus according to the distance of the hand from the screen. We found that the fingertip navigation is more sensitive than the traditional mouse control in experiments where we excited the whispering gallery modes [12] in the LC droplets by navigating the laser beam precisely onto the droplet edge.
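The following sketch summarizes this mapping in code (an illustration only, with hypothetical data types and a made-up angle threshold; it is not the actual NUI module implementation): a closed hand activates a trap at the current position, and the thumb-index angle switches between pure XY movement and Z (focus) control.

struct HandState {
    bool   open;             // true when the hand is open (no active trap)
    double x, y, z;          // hand position in screen-calibrated units
    double thumbIndexAngle;  // angle between thumb and index finger, in degrees
};

struct TrapCommand {
    bool   active;           // trap on/off
    double x, y;             // lateral trap position
    bool   useZ;             // true: also change focus
    double focusZ;           // focus position derived from the hand-screen distance
};

// Map the detected hand state to a trap command (threshold value is illustrative).
TrapCommand handToTrap(const HandState& h, double angleThresholdDeg = 30.0)
{
    TrapCommand cmd;
    cmd.active = !h.open;                           // closing the hand activates the trap
    cmd.x = h.x;
    cmd.y = h.y;
    cmd.useZ = (h.thumbIndexAngle > angleThresholdDeg);
    cmd.focusZ = cmd.useZ ? h.z : 0.0;              // distance from the screen drives focus
    return cmd;
}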
If both hands were busy manipulating the objects, the voice commands were very helpful for controlling the other functions of the system described in Table 1. We placed the clicker window above the RAMAN button of the control dialog, so that the voice command "click" switches the device to the Raman spectra measurement mode. We could easily add commands for the fast movement of the micropositioning stage corresponding to the SHIFT+Arrow keys; however, this function was rarely used in the experiments.
4. Conclusions
Extremely fast progress in NUI technology has brought an intensive search for possible applications in various areas. In our opinion, one such area is optical micromanipulation of microobjects, where the three-dimensional positions of trapped objects are intuitively controlled by fingertip positions combined with gestures and voice commands. For this purpose we exploited very recent technology (the Gesture Camera and the Leap Motion sensor). Unlike the solution based on the Microsoft Kinect sensor [7], our solution allows convenient work in a sitting position with the elbows supported by the table.

A comparison of the two sensors mentioned above is beyond the scope of this paper and would require more extensive testing. However, our experiments showed that the Leap Motion is more precise, faster and more reliable, and has a simpler SDK. On the other hand, the Gesture Camera and the Intel SDK offer a broader range of NUI functions: they provide color images and depth maps (not only fingertip coordinates) and exploit the OpenCV library, which is well suited to image processing applications. A comparison of the prices ($70 and $150) does not make sense for this application.

Voice commands are especially helpful if both hands are occupied. Our proof-of-concept experiment showed that NUI increases the efficiency of tweezers control approximately two times compared to mouse-based trapping. However, this number can be higher with increasing experimental experience. The efficiency is also image and task dependent. In any case, the application of NUI methods is a way to improve interactive micromanipulation techniques with respect to the expected standardization in this area.
Future Work
Our software was designed to remain open for future improvements. Further experiments should determine the optimal set of gestures and voice commands. We plan to extend the software to a full network version allowing remote control of the tweezers ("NUI teletweezing"). This extension assumes the streaming of live images from the microscope camera and sending them to the client. Semi-automated methods of optical trapping based on image analysis would then be possible. We plan additional testing of other NUI software tools in order to achieve better control, and we will try to define a set of gestures specific to optical tweezers.
5. Acknowledgements
This work was supported by the Slovak research grant agencies APVV (grant 0526-11) and VEGA (grant 2-191-11), by the Slovak Academy of Sciences within the frame of CEX NANOFLUID, and by the Agency for Structural Funds of the EU (projects 26220120033 and 26220220061). We thank Leap Motion for providing the prototype of the sensor.
REFERENCES
[1] K. C. Neuman and S. M. Block, “Optical Trapping,” Re-
view of Scientific Instruments, Vol. 75, No. 9, 2004, pp.
2787-2809. doi:10.1063/1.1785844
[2] R. Bowman, D. Preece, G. Gibson and M. Padgett,
“Stereoscopic Particle Tracking for 3D Touch, Vision and
Closed-loop Control in Optical Tweezers,” Journal of
Optics, Vol. 13, No. 4, 2011, p. 044003.
doi:10.1088/2040-8978/13/4/044003
[3] J. E. Curtis, B. A. Koss and D. G. Grier, “Dynamic Holo-
graphic Optical Tweezers," Optics Communications, Vol. 207, No. 1-6, 2002, pp. 169-175.
doi:10.1016/S0030-4018(02)01524-9
[4] G. Whyte, G. Gibson, J. Leach, M. Padgett, D. Robert,
and M. Miles, “An Optical Trapped Microhand for Ma-
nipulating Micron-sized Objects,” Optics Express, Vol.
14, No. 25, 2006, pp. 12497-12502.
doi:10.1364/OE.14.012497
[5] J. A. Grieve, A. Ulcinas, S. Subramanian, G. M. Gibson,
M. J. Padgett, D. M. Carberry and M. J. Miles, “Hands-
on with Optical Tweezers: A Multitouch Interface for
Holographic Optical Trapping,” Optics Express, Vol. 17,
No. 5, 2009, pp. 3595-3602. doi:10.1364/OE.17.003595
[6] R. W. Bowman, et al., "iTweezers: Optical Micromanipulation Controlled by an Apple iPad," Journal of Optics, Vol. 13, No. 4, 2011, p. 044002.
doi:10.1088/2040-8978/13/4/044002
[7] C. McDonald, M. McPherson, C. McDougall and D. McGloin, "HoloHands: Kinect Control of Optical Tweezers," arXiv:1211.0220v1 [physics.pop-ph].
[8] Intel Perceptual Computing SDK 2013 Beta.
http://software.intel.com/en-us/vcsource/tools/perceptual-
computing-sdk
[9] “Leap Motion”, 2013, https://www.leapmotion.com
[10] R. Hartley and A. Zisserman, “Multiple View Geometry
in Computer Vision,” 2-nd edition, Cambridge University
Press, 2004.
[11] Open Sound Control (OSC) Specification, http://opensoundcontrol.org
[12] F. Vollmer and S. Arnold, “Whispering-gallery-mode
biosensing: label-free detection down to single mole-
cules," Nature Methods, Vol. 5, No. 7, 2008, pp. 591-596. doi:10.1038/nmeth.1221