World Journal of Engineering and Technology
Vol. 03, No. 03 (2015), Article ID: 60465, 12 pages

Towards Autonomous Vehicles with Advanced Sensor Solutions

Matti Kutila1, Pasi Pyykönen1, Aarno Lybeck2, Pirita Niemi2, Erik Nordin3

1VTT Technical Research Centre Ltd., P.O. Box 1300, FI-33101 Tampere, Finland

2TTS, P.O. Box 5, FI-05200 Rajamäki, FINLAND

3Volvo Group Trucks Technology (GTT), BF40562 M1.6, SE-405 08 Göteborg, Sweden


Received 29 May 2015; accepted 15 October 2015; published 22 October 2015


Professional truck drivers are an essential part of transportation, keeping the global economy alive and commercial products moving. In order to increase productivity and improve safety, an increasing amount of automation is implemented in modern trucks. The transition to automated heavy goods vehicles is intended to make trucks accident-free and, at the same time, more comfortable to drive. This motivates the automotive industry to bring more embedded ICT into their vehicles in the future. The avenue towards autonomous vehicles requires robust environmental perception and driver monitoring technologies to be introduced, which is the main motivation behind the DESERVE project. This study covers sensor technology trials aimed at minimizing blind spots around the truck and, on the other hand, keeping the driver's vigilance at a sufficiently high level. The outcomes are two innovative truck demonstrations: one R & D study for bringing equipment to production in the future and one implementation in a driver training vehicle. The earlier experiments include both driver monitoring technology, which works at a 60% - 80% accuracy level, and environment perception (stereo and thermal cameras), whose performance rates are 70% - 100%. The results are not yet sufficient for autonomous vehicles, but they are a step forward, since they hold up even when moved from the lab to real automotive implementations.


Autonomous Driving, Camera, Driver Monitoring, Environment Perception, Automated Vehicle, Sensor, Laser Scanner, Truck, Radar, Data Fusion

1. Introduction

This paper aims to outline the major innovations introduced by the DESERVE (DEvelopment platform for Safe and Efficient dRiVe) project in the area of driver monitoring [1]-[3]. The whole project aims at designing and developing a Tool Platform for embedded Advanced Driver Assistance Systems (ADAS) to exploit the benefits of cross-domain software reuse, standardised interfaces, and easy, safety-compliant integration of heterogeneous automotive modules. The main research question of the study is to identify the optimal sensor solutions for the DESERVE platform, as required by the ADAS functions selected to support the transition to automated vehicles [4]. The project has selected 22 different modules to implement 11 driver support applications, chosen according to a user needs analysis:

• Lane change assistance system

• Pedestrian safety systems

• Forward/rearward looking system (distant range)

• Adaptive light control

• Park assistance

• Night vision system

• Cruise control system

• Traffic sign and traffic light recognition

• Map-supported systems

• Vehicle interior observation

• Driver monitoring

Essential for intelligent vehicles is the perception of the external environment [5]: so far, the fusion of available sensors and artificial intelligence is not capable of "seeing" and understanding the vehicle's surroundings as accurately as a human being can. Concerning environment perception sensing and human detection technologies [6]-[9], there are still gaps, and the need for enhanced sensor performance has been addressed a number of times over the past decades, e.g. in the Adose, PReVENT, InteractIVe, MiniFaros, Artrac, AdaptIVe, DESERVE and RobustSense projects, to name only the most recent ones. To sum up, the enhancement needs are: 1) practically all sensor types: high price; 2) monostatic radio radar: insufficient range, lack of lateral resolution and of reliable target classification; 3) lidar: insufficient poor-weather performance and size; 4) camera-based image processing: short range; it offers the possibility of identifying objects, but requires high computational power, which makes response times critical. Cameras are expensive and, as with laser scanners, have unstable performance in poor lighting conditions or in adverse weather.

Fitted into the vehicle, this approach gives a modular software architecture for the in-vehicle platform, as depicted in Figure 1. The platform provides the means to combine the sensors, which are the main topic of this article, with the 11 applications.

The applications developed will be tested in different demonstrations to show that the platform is not limited to one single vehicle type. The vehicles are:

Figure 1. DESERVE’s in-vehicle platform software architecture.

• medium class passenger car => Fiat

• luxury passenger car => Daimler

• motorcycle => Ramboll

• heavy goods vehicle => Volvo

• driver training truck => TTS

Additionally, tests will also be conducted in simulators, e.g. a simulator for driver monitoring functions and a simulator for cruise control systems.

Driver monitoring is one of the key areas, and only very few real applications have made it out of the labs and onto the market to date. It has been recognized as an important feature for introducing semi-automatic vehicles before full automation can be deployed [10]. The vehicle environment needs to be adapted according to driver behaviour and, on the other hand, the driver's engagement with the driving task needs to be supervised.

Drivers of heavy goods vehicles in daily traffic suffer just as much from lack of attention as drivers of passenger cars do. The consequences of traffic accidents with heavy vehicles are more serious for other road users than those involving passenger cars, which raises the significance of the problem. The project target was to introduce driver monitoring in driving simulators to provide training to professional truck drivers, see Figure 2. For the training supervisor, it is challenging to detect all the situations in which the candidate is not paying sufficient attention to traffic in a simulator or during a driving session. Thus, the driver monitoring system helps to gather data during the training period.

Detecting the objects around a heavy goods vehicle has been under intensive research for years. Ten years ago, in addition to mirror adjustments and cockpit ergonomics, truck manufacturers started to bring more environment perception technology into vehicles. Subsequently, the number of laser scanners, radars and cameras has increased in order to detect objects in the areas outside the driver's line of sight. Autonomous vehicles use similar techniques, but robustness demands a steady increase in performance. This study presents the stereo vision and thermal camera experiments conducted in the truck demonstration cases.

2. Driver Monitoring Concept

The DESERVE project demonstrates the multi-camera-based driver monitoring system in the cabin of a heavy goods vehicle. The aim of driver monitoring is to ensure that critical safety information has been properly registered by a driver. The driver monitoring cameras are situated so that the driver’s face is visible on camera, even if the driver is not turning their head to look out of the windows. However, monitoring is more challenging, since the driver needs to turn their head to see out of the large cabin of a heavy goods vehicle. Therefore, this implementation requires a new way of adapting the driver monitoring system to detect the direction of the driver’s gaze and calculate an activity index of driver awareness.

Visual distraction detection is a rule-based classifier for detecting whether the driver focuses his or her attention on the road or on another attraction (e.g. vehicle controls, a mobile phone, the radio, etc.). Cognitive distraction detection is based on a Support Vector Machine (SVM), a classification method which optimises the locations of

Figure 2. Driver monitoring simulator environment for truck driver training.

hyperplanes in such a way that the margin between the negative and positive feature examples is maximised. It was in our interest to focus on testing the feasibility of the SVM for the highly non-linear input data, in which the cognitive indicators are quite poorly interpretable. Thus, the previously tested SVM gave us an opportunity to avoid allocating too many resources to writing the classifier and allowed us to put more effort into optimising the performance. Therefore, the SVM principles are not explained here in depth; rather, the goal is to identify the constraints for detecting in-vehicle cognitive distraction.
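To illustrate the margin-maximisation principle, a minimal linear SVM can be trained with sub-gradient descent on the regularised hinge loss. This is only a sketch with made-up feature vectors (e.g. standard deviations of gaze angle and head rotation), not the project's classifier or data:

```python
def train_linear_svm(samples, labels, eta=0.05, lam=0.01, epochs=3000):
    """Batch sub-gradient descent on the regularised hinge loss.

    samples: feature vectors (illustrative driver-activity features)
    labels:  +1 (cognitively distracted) / -1 (attentive)
    """
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        # regularisation term pulls the weights towards a wider margin
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, y in zip(samples, labels):
            # only margin violators (y * score < 1) contribute to the loss
            if y * (sum(wj * xj for wj, xj in zip(w, x)) + b) < 1:
                grad_w = [gw - y * xj for gw, xj in zip(grad_w, x)]
                grad_b -= y
        w = [wj - eta * gw for wj, gw in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Illustrative, separable toy data: low activity std = attentive (-1),
# high activity std = cognitively distracted (+1).
samples = [[0.1, 0.1], [0.2, 0.0], [0.0, 0.2], [1.0, 1.2], [1.1, 0.9], [0.9, 1.1]]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(samples, labels)
```

In practice a kernelised, library-provided SVM would be used; the sketch only shows why the hyperplane settles between the two clusters of feature examples.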

The monitoring development is not limited to heavy goods vehicles (see Figure 3) but has also been implemented on a motorcycle within the DESERVE project. This is an interesting approach, since the same technique can rarely be used both on a motorcycle and inside a vehicle. In the two-wheeler implementation, the rider is monitored with only a single camera, installed on the handlebars of the motorcycle. The camera looks towards the rider's helmet, so that an optimal image of the rider's eyes and face is captured (see Figure 4). In some scenarios, the rider's eyes are protected by the visor of the helmet. This blocks the optical path and makes it impossible to use a camera to detect the rider's eyes and estimate gaze orientations. Therefore, the helmet orientation is measured when facial features (i.e. the rider's eyes) are not visible. The orientation of the helmet can be calculated from the orientation of static helmet components, i.e. the visor and reflector stamps. The main idea is to add a software module to the system which detects the orientation of the helmet when eye recognition is impossible. This system needs an automatic adaptation procedure so as to analyze the orientation of the rider's eyes and their contact with the road ahead.

Figure 3. The driver monitoring in the truck demonstration.

Figure 4. Rider monitoring in motorcycle demonstration.

In the motorcycle case, the hardware consists of one USB 3.0 camera providing 720p still images for rider monitoring and one front-facing camera for the environmental image. The cameras are connected to a fanless automotive PC running the Windows 7 32-bit operating system. The automotive PC runs the Microsoft SQL 2010 database to store measurements and analyse the results from the rider detection applications. In the truck case, however, the driver monitoring functionality is implemented using three RGB HD cameras installed in the cabin of the training vehicle [11].

The next development steps are to compare the collected driver monitoring and environment perception data in real time and to create a multilayer warning system so as to raise the driver's awareness of possible incidents. High-tech multi-sensor software development products are used to build this. A special warning protocol will be applied if the driver is not reacting in a safe way. For example: the driver's intention is to change lane. Does the driver look in the side mirror first? Does the driver react to the lane change warning? Does the driver recognize the blind spot area? How does the driver react when noticing an obstacle in another lane? Does the driver slow down or continue to change lane?
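As a sketch of such a warning protocol, the lane-change checks listed above can be combined into a simple rule-based escalation. The event names and the meaning of the levels are hypothetical, chosen only for illustration:

```python
def lane_change_warning_level(events):
    """Return an escalating warning level for one lane-change manoeuvre.

    `events` is a dict of hypothetical boolean observations fused from
    driver monitoring and environment perception:
      checked_mirror      - driver glanced at the side mirror first
      blind_spot_occupied - an object was detected in the blind spot
      reacted_to_warning  - driver slowed down / aborted after the alert
    """
    level = 0
    if not events.get("checked_mirror", False):
        level += 1                      # no mirror glance before the manoeuvre
    if events.get("blind_spot_occupied", False):
        level += 1                      # obstacle present in the adjacent lane
        if not events.get("reacted_to_warning", False):
            level += 1                  # driver ignored the escalated alert
    return level                        # 0 = safe ... 3 = strongest alert
```

The returned level could then select the warning device (sound, light, vibration, etc.) mentioned later for the training vehicle; the mapping itself is left open here.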

As a result of comparing the data, the numeric information on driver behaviour from the different sources is analysed. In this way, the driver can be shown how he or she reacts and behaves in different traffic situations and how well he or she succeeds in terms of economical driving and traffic safety. This enables us to develop a reliable and independent method for assessing drivers' skills.

3. Environment Perception

3.1. Overview

One essential area in the automotive industry is the detection of vulnerable road users (VRUs) in front of the vehicle. Two alternative ways were implemented for environment perception: thermal and stereo cameras, both dedicated to human detection. In the Volvo truck demonstrator, the normal sensors for object detection in front (forward-looking camera, long-range and mid-range radars) have been complemented with wide-angle cameras for near-range object detection.

The main focus of the VRU detection was to find reliable descriptors for human feature extraction from the thermal camera image. For this purpose, we selected the HOG (Histogram of Oriented Gradients) descriptor algorithm, which is especially suitable for human detection [6]. The idea of the algorithm is to concentrate on object appearance instead of other features, such as the colour or edge shape of human objects. The algorithm therefore gives a good starting point for human detection in difficult conditions where only the appearance of a human body can be detected, for example dark, smoky, snowy or rainy conditions where it is possible to detect only the shape or "appearance" of the human.
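The core of HOG is a histogram of gradient orientations, weighted by gradient magnitude, computed per image cell. A minimal sketch of one cell's histogram (central-difference gradients, unsigned 0-180° orientation, 9 bins as in the original descriptor) could look like this:

```python
import math

def hog_cell_histogram(cell, bins=9):
    """Gradient-orientation histogram for one HOG cell.

    `cell` is a small 2-D list of grey values; gradients are taken with
    central differences, and each interior pixel votes into an unsigned
    orientation bin (0-180 degrees) weighted by its gradient magnitude.
    """
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(ang / (180.0 / bins)) % bins] += mag
    return hist
```

A full HOG descriptor normalises such cell histograms over overlapping blocks before classification; this sketch only shows why the descriptor responds to shape ("appearance") rather than colour.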

We have used the well-known OpenCV implementation [9], which is based directly on the original HOG [6]. This implementation also includes a feature classification module, which gave us a good starting point for detection. The algorithm is modified by also using features for partial human descriptors (lower and upper body). This enables human detection when the human object is only partially visible in the image.

Figure 5 shows the flow chart of the algorithm's basic working principle for human detection. The first step is to acquire a time-stamped grayscale image from a thermal camera, which highlights warm objects against

Figure 5. Human detection algorithm modules with HOG features from image acquisition to object selection.

the colder background. Basic image filtering with noise reduction is performed on this image. The next phase is to extract HOG features from the image and perform a classification. The output of this phase is a set of regions of interest (ROIs). These regions represent areas where HOG features were classified as human detections. The last step is to filter out false detections by using a rule-based classification based on ROI location, shape and size. This removes detections that represent objects other than humans, e.g. warm cars, animals or other warm objects.
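The rule-based ROI filtering step described above can be sketched as simple checks on detection geometry. The aspect-ratio and minimum-size thresholds below are illustrative assumptions, not the values used in the project:

```python
def filter_rois(rois, img_h, min_aspect=1.2, max_aspect=4.0):
    """Rule-based rejection of false HOG detections.

    Each ROI is (x, y, w, h) in pixels. The idea: a standing human is
    taller than wide, and a box far smaller than plausible for the scene
    is most likely noise. Thresholds here are illustrative only.
    """
    kept = []
    for (x, y, w, h) in rois:
        if w <= 0 or h <= 0:
            continue
        aspect = h / w
        if not (min_aspect <= aspect <= max_aspect):
            continue                    # too wide or too thin for a pedestrian
        if h < 0.05 * img_h:
            continue                    # implausibly small -> likely noise
        kept.append((x, y, w, h))
    return kept
```

For example, with a 480-pixel-high image, a 40 × 120 box survives (aspect 3), while a 200 × 50 box (a warm car roof, say) is rejected.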

3.2. Thermal Camera

In our experiments, a FLIR thermal camera with a 320 × 240 vanadium oxide (VOx) uncooled microbolometer sensor array was used. The camera provides a 360° field of view. Figure 6 shows an example of how human objects are seen by the thermal camera: objects that are warmer than the dark, colder background are shown in light grey colours.

The developed VRU detection system is based on feature descriptors called Histograms of Oriented Gradients (HOG) [12]. The algorithm was originally designed for human detection from video or image sequences. In these experiments, the OpenCV library was used, which is based on the original HOG descriptors [13]. The original algorithm is modified by also using features for partial human descriptors for classification (lower and upper body). The use of separate lower and upper body descriptors also enables the detection of humans that are only partially in the camera view.

3.3. Stereo Camera

In addition to the thermal camera, a stereo camera system has been implemented for human detection. The basic idea for boosting VRU detection with a stereo camera is to detect the human object first from the original stereo camera grey scale image with HOG detectors, and then to determine the object's distance and position from the depth image. VRU detection can also be carried out directly from the depth image, but in that case, instead of using HOG descriptors, detection is done by extracting all possible obstacles rather than trying to extract only the VRUs.

When the cameras are installed in a heavy goods vehicle, near the top of the vehicle structures, the camera points at objects from an upper position, which at the same time "hides" human features. Figure 8 is a good example of the upper view from the stereo camera system onto a human object. In this case, only part of the human upper body is visible, and it therefore does not generate features recognizable as a human.

For obstacle detection for blind spot purposes, the main focus is to detect whether an obstacle (a VRU or a more stationary object) is at risk of colliding with the moving vehicle. In this case, HOG features were found to be too heavy for fast obstacle detection and classification with the developed algorithm.

Figure 6. Human detection with thermal camera.

Figure 7 shows a flow chart describing the phases of the obstacle detection algorithm with the stereo camera system. In the first phase, two grayscale stereo images are acquired from the camera and filtered to remove noise. From these images, a disparity map with a point cloud is calculated to give a distance map of the scene outside the vehicle. In this example, one object (green) stands out from the background (orange). The last phase is to segment the distance map so as to extract the interesting objects. These objects are also classified with a light rule-based algorithm based on object size and distance.
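The distance map follows from the standard pinhole stereo relation Z = f·B/d, where f is the focal length in pixels, B the camera baseline and d the disparity. A minimal sketch, with illustrative calibration values rather than those of the actual camera system:

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Classic pinhole stereo relation: Z = f * B / d.

    focal_px     - focal length in pixels (from calibration)
    baseline_m   - distance between the two camera centres in metres
    disparity_px - horizontal pixel shift of the same point between images
    The calibration values used below are illustrative only.
    """
    if disparity_px <= 0:
        return float("inf")   # no measurable shift -> effectively at infinity
    return focal_px * baseline_m / disparity_px

# e.g. with a 700 px focal length and a 0.30 m baseline,
# a 21 px disparity corresponds to a point about 10 m away
depth_m = disparity_to_depth(21, 700.0, 0.30)
```

Applying this per pixel of the disparity map yields the distance map that the segmentation phase then splits into objects by depth.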

The Vislab 3D-E stereo camera system, with a 639 × 476 image array and a 57-degree field of view, was used. An example of how human objects are seen in the depth image and in the original grey scale image is visualised in Figure 8. As can be seen there, the grey scale image from the camera sensors provides better resolution for human detection. The depth image provides a low-resolution depth map that gives the position estimate of a detected human relative to the vehicle.

4. Experimental Setup

The purpose of this experimental setup is to find a way to gather data from different sources for the demonstrations. In the Volvo truck, the production ADAS sensors in front (forward-looking camera, long-range and mid-range radars) have been complemented with wide-angle cameras for short-range object detection (see Figure 9).

Figure 7. Human detection algorithm modules with stereo vision from image acquisition to object classification.

Figure 8. Human detection with stereo vision.

Figure 9. ADAS sensor setup and views in front of the vehicle.

The data is collected from the training-dedicated IVECO heavy vehicle's internal signals, in which a wide range of valuable data is available (e.g. lane detection, axle loads, distant range, speed, braking, engine torque, steering angle, etc.). In addition, long and short-range radars, stereo cameras, an acceleration sensor, GPS systems, etc. are installed so as to ensure that adequate and reliable information is read from the vehicle and its surroundings (see Figure 10). On the basis of this information, the selected ADAS functions (e.g. blind spot detection, lane changing, start-and-stop functionality, etc.) are initiated. According to the ADAS functions implemented, the system triggers different warning devices (sound, voice signal, light, vibration, etc.) to alert the driver if necessary. The main aim in the training case is to monitor whether the driver reacts immediately when he/she observes the warning signals.

The test truck also collects data concerning the behaviour of the driver. Behaviour is analysed from the vehicle signals (braking, acceleration, driving on bends, turning of the steering wheel, etc.) and with the driver monitoring system installed on the cabin dashboard. The monitoring system detects the driver's gaze and head activity indices. Rules have been created for gaze orientation variation in various driving situations, based on careful, economical, safe driving and perception of the driving environment. For example, when starting to reverse, the driver should pay attention to the side mirrors; when changing lane, the driver needs to check the free space in the mirror first.

During the experimental driver training tests, experience of the reliability and functionality of the sensors in various weather conditions (e.g. snow, rain, ice, salt, darkness) is collected. For testing VRU detection in the vehicle environment, a thermal camera and a stereo camera system were installed in the vehicle. The tests focus on measuring how reliable human detection with the different camera systems can be. In the test, both camera systems were installed at the front of the vehicle to capture the same scenery in the same direction.

5. Expected Performance

5.1. Driver Monitoring

The heavy goods vehicle in the prior setup was equipped with the same driver monitoring functionality as the new demonstration vehicles [14]. The results refer to a similar system experimented with in the past, since the latest setup was not available for data gathering when this article was prepared. Twelve drivers drove the truck for about an hour in different traffic environments, described by the experimenters as "motorway" (low complexity), "city" (high complexity) and "intermediate complexity". The test samples covered people of different ages and genders and were recorded in real traffic over two weeks so as to cover an extensive number of driving scenarios.

The performance of visual distraction detection was determined in terms of how well the algorithm can detect glances towards various clusters in the vehicle cockpit (see Table 1). The attention clusters implemented consist of the left and right mirrors and the windscreen, which also includes the road-ahead cluster. The table shows the manually evaluated results in the truck application [14].

The features used by the cognitive distraction detection module are: gaze angles, head rotations and lane position. The standard deviations of the above features are used as indicative measures of the driver’s activity. Figure 11 shows the results of the cognitive distraction detection evaluation. The graph provides a realistic

Figure 10. Environment sensor setup in the Iveco driver training vehicle.

Figure 11. Results of the truck tests for detecting cognitive distraction with a Support Vector Machine-type classifier. The horizontal axis is the threshold between cognitive and non-cognitive output, and the vertical axis is hit-rate. The hit-rate for non-cognitive detections is shown as a continuous line while cognitive detection is a dashed line.

Table 1. The results of the truck tests for capturing the different clusters in the truck’s cockpit.

picture of the performance in the truck. The overall detection performance is some 68% in the truck and about 80% in passenger cars.
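The standard-deviation activity features mentioned above (gaze angles, head rotations, lane position) are naturally computed over a sliding window of recent samples. A minimal sketch, with the window length as an assumption rather than the project's value:

```python
import math
from collections import deque

def sliding_std(signal, window):
    """Standard deviation of the last `window` samples at each time step.

    A low standard deviation of the gaze-angle signal over the window
    suggests gaze concentration, one of the indicators of cognitive
    distraction. The window length is an illustrative assumption.
    """
    buf = deque(maxlen=window)
    out = []
    for v in signal:
        buf.append(v)
        m = sum(buf) / len(buf)
        out.append(math.sqrt(sum((x - m) ** 2 for x in buf) / len(buf)))
    return out
```

These per-signal standard deviations would then form the feature vector fed to the SVM classifier evaluated in Figure 11.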

5.2. Environment Perception

5.2.1. Thermal Camera

Figure 12 shows the detection results with the thermal camera system. The detection results are shown by dark and light bars: the dark bar indicates the number of undetected objects, and the light bar the number of detected objects. In the test video sequence, every human object was identified with a specified ID number (x-axis numbering). As one can see, IDs 1 to 4 get lower detection results than IDs 5 to 14. This indicates that human objects that are only partially visible are not any easier to detect in thermal imaging either. It should be noted that the human objects with IDs 5, 6 and 8 are not detected with the thermal camera due to its narrower field of view compared to the stereo camera case. This causes a human object at the edge of the field of view to be blurred by the optics or to fall out of view.

5.2.2. Stereo Camera

Figure 13 shows the number of detected and undetected VRU objects with the stereo camera system. In this test, the depth image was not used, because the algorithm uses a grey scale image; the depth image provides only position information for detected objects. Detection was done using the grayscale image from the stereo camera image sensor.

Figure 12. Number of detected and undetected human objects with a thermal camera system. Dark bars indicate the number of undetected, and light bars detected human objects.

Figure 13. Number of detected and undetected human objects with the stereo camera system. Dark bars indicate the number of undetected, and light bars detected human objects.

In this sequence, the total number of separate human objects was 14. The Y-axis in the diagram shows the total number of video frames in which humans with the given ID numbers are visible and which should, therefore, also be visible to the detection algorithm. In this test, there were different scenarios for human visibility: IDs 1 to 4 were only partially visible or almost hidden behind static objects (corners, traffic signs or other vehicles), while IDs 5 to 14 were clearly visible (i.e. walking in free space).

As can be seen, human objects with IDs 1 to 4 were not detected in most of the cases, and their detection rate was less than 50%. On the other hand, human IDs higher than 4 were detected with almost a 100% detection rate, except for IDs 10, 12 and 14. In these cases, the human object was clearly visible in an open field, but the features of the human appearance were not classified correctly. This may be due to shadows, illumination or other environmental features degrading the quality of these test images.
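The per-ID detection rates read from Figures 12 and 13 are simply the ratio of detected frames to visible frames. A small sketch with made-up counts (not the actual experiment data):

```python
def detection_rates(frames_visible, frames_detected):
    """Per-ID detection rate: detected frames / frames the object was visible.

    frames_visible:  {object_id: number of frames the human was visible}
    frames_detected: {object_id: number of frames the algorithm found it}
    IDs with zero visible frames are skipped. Counts below are made up.
    """
    return {oid: frames_detected.get(oid, 0) / n
            for oid, n in frames_visible.items() if n > 0}

rates = detection_rates({1: 100, 5: 50, 9: 0}, {1: 40, 5: 50})
```

With these made-up counts, ID 1 corresponds to a partially occluded pedestrian (rate below 50%) and ID 5 to one walking in free space (rate 100%), mirroring the pattern reported above.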

6. Conclusions

This article showed the preliminary results of the DESERVE project, which aims to enhance existing ADAS functions. Sensor technology, both in environment perception and driver monitoring, plays an essential role in the path to automated driving.

Two heavy goods vehicles (Volvo and Iveco) have been fitted with advanced camera systems so as to detect blind spots and humans around the vehicle. The first demonstration is dedicated to market introduction in the coming years with improved safety functions. The second one is equipped for training professional truck drivers in ecological and safe driving habits.

Even if the detection ranges of stereo vision and far-infrared imaging are similar, a difference exists in the detection performance. Although the far-infrared camera can capture pedestrians clearly (Figure 6), the true positive rate of stereo vision is quite low in comparison. This implies that the thermal camera is better, but its drawback is performance in hot conditions, when the outdoor temperature approaches 37˚C. On the other hand, stereo vision can detect static obstacles that emit no thermal radiation. The final sensor setup of an autonomous vehicle is always a balance between different sensors, whose performance differs according to the actual outdoor conditions.

The preliminary results obtained before this latest setup provide the initial estimate that a driver monitoring system performs with 60% to 80% accuracy. On the other hand, environment perception depends on outdoor conditions, and performance varies from 50% to 100% depending on shadows, illumination, etc. This is the challenge facing future autonomous vehicles, and many steps are still needed to improve the robustness of optical sensing technology. Even with this pre-study, without having the final implementation available, indications of how to develop the software further have been identified, which is the main aim of these tests. Under specific conditions in a real-world automotive application, these functions already work.


This study has been conducted in the Development platform for Safe and Efficient dRiVE (DESERVE) project under the ECSEL Joint Undertaking programme. Financial support has kindly been given for this research by the European Commission under the ECSEL Joint Undertaking and TEKES, the Finnish Funding Agency for Innovation.

We would like to express our gratitude to the whole consortium for providing scope for our work and fruitful discussions. We also express special thanks to Mr Pertti Peussa from VTT and Mr Arto Kyytinen from TTS for their support in preparing the experimental design.

Cite this paper

Matti Kutila, Pasi Pyykönen, Aarno Lybeck, Pirita Niemi, Erik Nordin (2015) Towards Autonomous Vehicles with Advanced Sensor Solutions. World Journal of Engineering and Technology, 03, 6-17. doi: 10.4236/wjet.2015.33C002


1. Pallaro, P., et al. (2013) Development Platform Requirements. The DESERVE Artemis-JU (295364) Deliverable D12.1.

2. Kutila, M., Pyykönen, P., van Koningsbruggen, P., Pallaro, N. and Pérez-Rastelli, J. (2014) The DESERVE Project: Towards Future ADAS Functions. Proceedings of the International Conference on Embedded Computer Systems: Architectures, MOdeling and Simulation (SAMOS XIV), Samos, Greece, 14-17 July 2014, 308-313.

3. Morignot, P., Perez, J.R. and Nashashibi, F. (2014) Arbitration for Balancing Control between the Driver and ADAS Systems in an Automated Vehicle: Survey and Approach. Proceedings of the IEEE Intelligent Vehicles Symposium, Michigan, USA, 8-11 June 2014, 575-580.

4. ERTRAC (2015) Automated Driving Roadmap. 3rd Draft for Public Consultation. ERTRAC Task Force Connectivity and Automated Driving.

5. Kuchinskas, S. (2015) Levelling Up to Driverless Cars. TU-Automotive Magazine Article.

6. Dollar, P., Wojek, C., Schiele, B. and Perona, P. (2012) Pedestrian Detection: An Evaluation of the State of the Art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34, 743-761.

7. Enzweiler, M. and Gavrila, D.M. (2009) Monocular Pedestrian Detection: Survey and Experiments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 2179-2195.

8. Benenson, R., Omran, M., Hosang, J. and Schiele, B. (2014) Ten Years of Pedestrian Detection, What Have We Learned? ECCV Workshop on Computer Vision for Road Scene Understanding and Autonomous Driving.

9. Gavrila, D.M. (2001) Sensor-Based Pedestrian Protection. IEEE Intelligent Systems, 16, 77-81.

10. Barua, N., Natarajan, P.T., Chandrasekar, P. and Singh, S. (2014) Strategic Analysis of the European Market for V2V and V2I Communication Systems. Frost & Sullivan Report MA29-18.

11. Pyykönen, P., Virtanen, A. and Kyytinen, A. (2015) Developing Intelligent Blind Spot Detection System for Heavy Goods Vehicles. Submitted to the 18th International IEEE Conference on Intelligent Transportation Systems (ITSC 2015), Las Palmas de Gran Canaria, Spain, 15-18 September 2015. (unpublished)

12. Dalal, N. and Triggs, B. (2005) Histograms of Oriented Gradients for Human Detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 1, 886-893, 25 June 2005.

13. HOG Descriptors for OpenCV Library.

14. Kutila, M., Jokela, M., Markkula, G. and Rué, M.R. (2008) Driver Distraction Detection with a Camera Vision System. Proceedings of the IEEE International Conference on Image Processing (ICIP 2007), 16-19 September 2007, San Antonio, Texas, Vol. VI, 201-204.