Dynamic Voltage and Frequency Scaling (DVFS) techniques are used to improve the energy efficiency of GPUs. This paper presents a literature survey and a thorough analysis of the DVFS schemes proposed during the last decade, including a comparison of the various DVFS techniques over the years. The main objective of this paper is to provide knowledge of the power management techniques that have utilized DVFS during the last decade. During the study, we found that DVFS is used not only on its own but also in coordination with other power optimization techniques, such as load balancing and task mapping, and that performance and energy efficiency vary with the platform and benchmark. The analysis is presented so that further research in the field of DVFS can build on it.
As we move from the megascale to the petascale era, the requirements for data processing and computation are growing exponentially. To meet this high computational demand, researchers have moved from serial computation platforms to high performance computing (HPC) platforms such as multicore processors, FPGAs and heterogeneous (GPU-supported) systems. GPUs, in particular, have been widely used for HPC applications due to their extremely high computational power. Many supercomputers in the TOP500 list use GPUs to achieve unprecedented computational power [
Today, GPUs have become a core part of high performance systems, with hundreds to thousands of processor cores and much higher peak performance than CPUs. Hence, many HPC applications exploit the power of GPUs. For example, the recently built supercomputer Tianhe-1A won the second spot on the TOP500 list [
On the other hand, manufacturers have increased the number of processing cores to gain performance, which has raised the power consumption of GPUs. GPUs consume much more power than CPUs, and these elevated power levels have a significant impact on reliability, architecture design, economic feasibility and deployment across a wide range of application domains. In recent years, considerable research has been carried out on reducing the power consumption of homogeneous as well as heterogeneous systems, and various techniques have been proposed for both. In [
1. DVFS based techniques;
2. CPU-GPU workload division techniques;
3. Saving energy in GPU components;
4. Dynamic resource allocation techniques;
5. Application specific and programming level techniques.
In this paper, we present a literature survey and a thorough analysis of DVFS schemes aimed specifically at improving the energy efficiency of GPUs. The rest of the paper is organized as follows.
As shown in
Due to application limitations, an application cannot always make use of all the available cores. In several applications, memory bandwidth [
to save the power consumed by these unutilized cores. As noted by Anderson in [
Dynamic voltage and frequency scaling (DVFS) is a technique widely used for reducing the energy consumption of processors by varying the voltage and frequency at run time [
DVFS can be used to eliminate power-wasting idle time by lowering the processor's voltage and frequency during low-workload periods, so that the processor always has meaningful work to do; this reduces the overall power consumption. The energy consumed by the GPU is given by the following equation [
where,
E = Energy consumed by the GPU, measured in joules (J);
C = Capacitance, in farads (F);
V = Supply voltage of the GPU, in volts (V);
f = Clock frequency of the GPU, in hertz (Hz).
Thus, the power consumed by a task may be decreased by reducing V or f, or both. However, for tasks that require a fixed amount of work, reducing the frequency may simply lengthen the time needed to complete the work, so little or no energy is saved. Therefore, intelligent DVFS techniques are required to improve the energy efficiency of GPUs. Many techniques control power consumption by controlling the frequency, since processor frequency has a strong effect on power consumption and temperature. Dynamic voltage and frequency scaling (DVFS) is the most commonly used such technique in modern processors [
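The fixed-work argument can be made explicit with a short derivation. It assumes the standard CMOS model in which dynamic power is C V^2 f; since the equation referenced above is not reproduced here, this form is an assumption. W (the number of clock cycles the task needs) and P_static (leakage power) are symbols introduced only for this illustration.

```latex
\begin{aligned}
P &= C V^2 f + P_{\mathrm{static}}, \qquad t = \frac{W}{f},\\
E &= P\,t = C V^2 W + P_{\mathrm{static}}\,\frac{W}{f}.
\end{aligned}
```

At a fixed voltage, lowering f leaves the dynamic term C V^2 W unchanged and enlarges the static term, which is why frequency scaling alone can save little or no energy. The savings come mainly from the lower voltage that a lower frequency permits, because the dynamic term falls quadratically with V.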
This section describes the DVFS techniques that researchers have explored specifically for GPUs. We found that DVFS is used not only on its own but also in coordination with other techniques, such as workload/task division, to give the best results. This section classifies the DVFS techniques under the following headings:
A. Schemes using core DVFS technique
B. Schemes using DVFS with other GPU optimization (Hybrid DVFS)
Section 3.1 describes energy saving methods that use only the DVFS technique. Section 3.2 discusses methods that combine DVFS with other optimizations, such as task mapping, workload division and load balancing, in a coordinated manner.
Intelligent use of DVFS can reduce the GPU's energy consumption. However, saving energy while preserving performance is a challenging task for DVFS [
1. Compute Intensive Application
2. Memory Intensive Application
3. Hybrid Application
The authors identified these three classes of kernels on the basis of two metrics proposed in [
1. Instruction issue rate
2. Ratio of global memory transactions to computation instructions
On the basis of these metrics, the following application kernels belonging to the three categories were identified. The kernel categories and application kernels are shown in
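A minimal sketch of how a kernel could be placed in one of the three classes using the two metrics above. The threshold values, the `KernelProfile` type and the kernel names are hypothetical and would need to be calibrated by profiling a specific GPU; the source gives no numeric cut-offs.

```python
from dataclasses import dataclass

# Hypothetical thresholds: real cut-offs must come from profiling the target GPU.
HIGH_ISSUE_RATE = 0.6   # instruction issue rate, normalized to the peak (0..1)
HIGH_MEM_RATIO = 0.2    # global memory transactions per computation instruction
LOW_MEM_RATIO = 0.05


@dataclass
class KernelProfile:
    name: str
    issue_rate: float       # instruction issue rate (normalized, 0..1)
    mem_to_compute: float   # global memory transactions / computation instructions


def classify(k: KernelProfile) -> str:
    """Place a kernel in one of the three classes using the two metrics."""
    compute_bound = k.issue_rate >= HIGH_ISSUE_RATE and k.mem_to_compute <= LOW_MEM_RATIO
    memory_bound = k.issue_rate < HIGH_ISSUE_RATE and k.mem_to_compute >= HIGH_MEM_RATIO
    if compute_bound:
        return "compute intensive"
    if memory_bound:
        return "memory intensive"
    return "hybrid"  # neither metric clearly dominates


if __name__ == "__main__":
    # Illustrative, made-up profiles (not measurements from the surveyed work).
    for profile in (
        KernelProfile("kernel_a", issue_rate=0.9, mem_to_compute=0.01),
        KernelProfile("kernel_b", issue_rate=0.2, mem_to_compute=0.50),
        KernelProfile("kernel_c", issue_rate=0.5, mem_to_compute=0.10),
    ):
        print(profile.name, "->", classify(profile))
```

A common heuristic, not a result reported here, is that compute-intensive kernels tolerate a lower memory clock and memory-intensive kernels tolerate a lower core clock, which is what makes such a classification useful for DVFS.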
In [
In [
On average, [
A power management approach is presented in [
S. No | Kernel Category | Kernel |
---|---|---|
1 | Compute intensive | Dense matrix multiplication |
2 | Memory intensive | Dense matrix transpose |
3 | Hybrid | Fast Fourier transform |
Method Adopted | Throughput Improvement | Power Constraint |
---|---|---|
Choosing the number of operating cores and their voltage/frequency appropriately for a given application | 29% | No power constraint |
Changing the number of operating cores and the voltage/frequency of on-chip interconnects/caches for a given application | 13% | No power constraint |
Varying the number of operating cores and the voltages/frequencies of both cores and on-chip interconnects/caches | 10% | Power constraint |
Varying the number of operating cores and the voltages/frequencies of both cores and on-chip interconnects/caches every 20 µs within the power constraint | 38% | Power constraint |
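The rows above all amount to choosing how many cores to run and at which voltage/frequency level. A minimal sketch of that choice as an exhaustive search under a power cap follows; it assumes a toy Amdahl-style performance model and a C·V²·f-plus-leakage power model, and every constant is hypothetical rather than taken from the surveyed scheme.

```python
from itertools import product
from typing import Optional, Tuple

# All constants are hypothetical and chosen only to illustrate the search.
VF_LEVELS = [(0.8, 0.6), (0.9, 0.9), (1.0, 1.2), (1.1, 1.5)]  # (volts, GHz)
MAX_CORES = 16
C_EFF = 2.0            # effective switched-capacitance term per core, W / (V^2 * GHz)
P_LEAK_PER_CORE = 0.5  # static power of one active core (W)
P_UNCORE = 10.0        # power of the rest of the chip, independent of core count (W)
SERIAL_FRACTION = 0.1  # Amdahl serial fraction of the application


def throughput(cores: int, freq_ghz: float) -> float:
    """Amdahl-style speedup scaled by clock frequency (arbitrary units)."""
    speedup = 1.0 / (SERIAL_FRACTION + (1.0 - SERIAL_FRACTION) / cores)
    return speedup * freq_ghz


def chip_power(cores: int, volts: float, freq_ghz: float) -> float:
    """Dynamic C*V^2*f plus leakage for each active core, plus uncore power."""
    return P_UNCORE + cores * (C_EFF * volts**2 * freq_ghz + P_LEAK_PER_CORE)


def best_config(power_cap_w: float) -> Optional[Tuple[float, int, float, float]]:
    """Search every (cores, V/F) pair for the highest throughput within the cap."""
    best = None
    for cores, (volts, freq) in product(range(1, MAX_CORES + 1), VF_LEVELS):
        if chip_power(cores, volts, freq) > power_cap_w:
            continue  # violates the power constraint
        perf = throughput(cores, freq)
        if best is None or perf > best[0]:
            best = (perf, cores, volts, freq)
    return best  # None if no configuration fits under the cap


if __name__ == "__main__":
    print(best_config(40.0))
```

With these constants, `best_config(40.0)` selects 7 cores at the highest V/F level rather than all 16 cores at a lower level, illustrating the core-count versus V/F trade-off. The adaptive variant in the last table row would rerun such a decision periodically (every 20 µs in the reported scheme); this sketch does not model the interconnect/cache V/F domain.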
The proposed integrated approach reduces the power consumption of 3-D games by up to 26% at comparable frame rates.
A broad study of GPU DVFS conducted on 37 benchmark kernels is presented in [
Core and memory frequency scaling do not work well for every application kernel; some kernels give the best results at the default frequency settings.
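The observation that some kernels do best at the default clocks can be illustrated with a minimal exhaustive-sweep sketch: measure runtime and average power at each (core, memory) clock pair, compute energy as runtime × power, and keep the minimum. The clock values and measurements below are invented for illustration and are not data from the study.

```python
from typing import Dict, Tuple

Setting = Tuple[int, int]      # (core clock MHz, memory clock MHz)
Sample = Tuple[float, float]   # (runtime in s, average power in W)


def energy_j(sample: Sample) -> float:
    runtime_s, avg_power_w = sample
    return runtime_s * avg_power_w


def best_setting(measurements: Dict[Setting, Sample]) -> Setting:
    """Return the clock pair with the lowest measured energy (runtime x power)."""
    return min(measurements, key=lambda s: energy_j(measurements[s]))


DEFAULT = (1000, 2000)  # hypothetical default clocks

# Made-up measurements for two illustrative kernels.
compute_kernel = {
    (1000, 2000): (2.0, 150.0),   # 300 J at the default setting
    (800, 2000): (2.6, 125.0),    # lower core clock: slower and more energy overall
    (1000, 1500): (2.3, 135.0),
    (800, 1500): (2.9, 115.0),
}
memory_kernel = {
    (1000, 2000): (3.0, 140.0),
    (800, 2000): (3.05, 120.0),   # barely slower, noticeably lower power
    (1000, 1500): (3.6, 130.0),
    (800, 1500): (3.7, 112.0),
}

if __name__ == "__main__":
    for name, data in (("compute_kernel", compute_kernel), ("memory_kernel", memory_kernel)):
        choice = best_setting(data)
        print(name, choice, "default" if choice == DEFAULT else "scaled")
```

In this toy data the compute-bound kernel is most efficient at the default clocks, while the memory-bound kernel saves energy with a lower core clock, mirroring the qualitative finding above.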
Ge et al. [
1. Performance
2. Power
3. Energy
4. Energy Efficiency
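The four metrics listed above are related to each other. A minimal sketch of computing them from one run follows; it assumes performance is expressed as GFLOP/s, which may differ from the units used in the study, and the run data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    work_gflop: float    # useful work performed by the kernel (GFLOP)
    runtime_s: float     # wall-clock execution time (s)
    avg_power_w: float   # average GPU power during the run (W)


def metrics(run: Run) -> dict:
    performance = run.work_gflop / run.runtime_s    # GFLOP/s
    energy = run.avg_power_w * run.runtime_s        # joules
    efficiency = run.work_gflop / energy            # GFLOP/J, i.e. (GFLOP/s) per watt
    return {
        "performance_gflops": performance,
        "power_w": run.avg_power_w,
        "energy_j": energy,
        "efficiency_gflop_per_j": efficiency,
    }


if __name__ == "__main__":
    # Hypothetical runs of the same kernel at two clock settings.
    high_clock = Run(work_gflop=100.0, runtime_s=2.0, avg_power_w=50.0)
    low_clock = Run(work_gflop=100.0, runtime_s=2.5, avg_power_w=36.0)
    for label, run in (("high", high_clock), ("low", low_clock)):
        print(label, metrics(run))
```

In this made-up example the lower clock setting loses 20% performance but uses 90 J instead of 100 J, so it is the more energy-efficient choice. Because the metrics can move in different directions, reporting all four gives a fuller picture than any one of them.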
Experiments are performed on a Tesla K20-series GPU from NVIDIA's Kepler architecture family, which supports power management and power accounting features. The scheme presented in [
The DVFS schemes described in the previous section suffer from a major energy/performance trade-off: if the settings are not selected intelligently, performance, energy or both may suffer. Therefore, researchers combine DVFS with other optimization techniques to further improve both the performance and the energy efficiency of running kernels. Over the years, considerable work has been proposed in [
Liu et al. [
The assignment phase is responsible for assigning each application to a processor (GPU or CPU). Thereafter, the load balancing phase manages the load between the CPU and GPU. Finally, the DVFS phase scales the frequency as required while meeting all the deadlines. To make the assignment decision, the assignment phase calculates the heterogeneous ratio of each application, which is given by
where
If
With the proposed method in [
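The assignment and DVFS phases described above can be sketched as follows. This is a minimal illustration, not Liu et al.'s algorithm: the heterogeneous ratio is assumed here to be CPU runtime divided by GPU runtime (the paper's own formula is not reproduced), the threshold and frequency levels are hypothetical, runtime is assumed to scale as 1/f, and the load balancing phase is omitted.

```python
from dataclasses import dataclass
from typing import Optional

FREQ_LEVELS = [0.4, 0.6, 0.8, 1.0]  # hypothetical frequency levels, normalized to f_max


@dataclass
class App:
    name: str
    t_cpu: float     # runtime on the CPU at maximum frequency (s)
    t_gpu: float     # runtime on the GPU at maximum frequency (s)
    deadline: float  # s


def heterogeneous_ratio(app: App) -> float:
    # Hypothetical definition (CPU time / GPU time), assumed for illustration only.
    return app.t_cpu / app.t_gpu


def assign(app: App, threshold: float = 1.0) -> str:
    """Assignment phase: apps that run faster on the GPU are sent to the GPU."""
    return "GPU" if heterogeneous_ratio(app) > threshold else "CPU"


def lowest_feasible_freq(runtime_at_max: float, deadline: float) -> Optional[float]:
    """DVFS phase: the slowest level that still meets the deadline.

    Assumes runtime scales as 1/f, which holds only for compute-bound work.
    """
    for f in sorted(FREQ_LEVELS):
        if runtime_at_max / f <= deadline:
            return f
    return None  # the deadline cannot be met even at maximum frequency


if __name__ == "__main__":
    app = App("app_x", t_cpu=8.0, t_gpu=2.0, deadline=3.0)
    device = assign(app)
    runtime = app.t_gpu if device == "GPU" else app.t_cpu
    print(app.name, device, lowest_feasible_freq(runtime, app.deadline))
```

Picking the lowest frequency that still meets the deadline turns any slack into energy savings, which is the intent of the final DVFS phase.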
Much work has been done on reducing the energy consumption of either the CPU or the GPU, but optimizing each in isolation cannot achieve maximum performance. Ma et al. [
Tier 2 adjusts the frequency of the GPU cores and memory along with the frequency and voltage of the CPU to achieve the largest possible energy savings with marginal performance degradation. However, [
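The coordinated frequency selection of Tier 2 can be sketched as a search over CPU frequency, GPU core frequency and GPU memory frequency that minimizes predicted energy while keeping the slowdown within a bound. The performance/power model, frequency levels and 5% default bound below are hypothetical; Ma et al.'s actual controllers are not reproduced here.

```python
from itertools import product

# Hypothetical frequency levels (GHz).
CPU_FREQS = [1.6, 2.4, 3.2]
GPU_CORE_FREQS = [0.9, 1.2, 1.5]
GPU_MEM_FREQS = [1.5, 2.0]


def predict(cpu_f: float, core_f: float, mem_f: float) -> tuple:
    """Toy model returning (runtime_s, energy_j); not the model used by Ma et al."""
    gpu_time = 6.0 / core_f + 3.0 / mem_f   # core-bound part + memory-bound part
    cpu_time = 2.0 / cpu_f                  # CPU share of the divided workload
    runtime = max(gpu_time, cpu_time)       # CPU and GPU run concurrently
    gpu_power = 50.0 + 20.0 * core_f**3 + 10.0 * mem_f  # cubic term: V scaled with f
    cpu_power = 20.0 + 5.0 * cpu_f**3
    return runtime, (gpu_power + cpu_power) * runtime


def select(max_slowdown: float = 0.05) -> tuple:
    """Lowest-energy (cpu_f, core_f, mem_f) whose runtime stays within the bound."""
    fastest = (max(CPU_FREQS), max(GPU_CORE_FREQS), max(GPU_MEM_FREQS))
    limit = predict(*fastest)[0] * (1.0 + max_slowdown)
    feasible = [c for c in product(CPU_FREQS, GPU_CORE_FREQS, GPU_MEM_FREQS)
                if predict(*c)[0] <= limit]  # never empty: `fastest` always qualifies
    return min(feasible, key=lambda c: predict(*c)[1])


if __name__ == "__main__":
    print(select(0.0))
```

With zero allowed slowdown, `select(0.0)` keeps both GPU clocks at their maximum but drops the CPU to 1.6 GHz, because in this toy model the CPU finishes its share well before the GPU. Scaling the device that is off the critical path is the kind of saving that coordinated CPU-GPU frequency control targets.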
where
capping techniques can achieve more than 93% of the performance of the ideal case. In [
1. Energy efficiency mode
2. High performance mode
The working of Equalizer can easily be understood with the help of
At the cost of 6% extra energy consumption, this mode achieves a 22% performance improvement. To achieve these improvements in both modes, Equalizer tunes three major architectural parameters, namely the number of concurrent threads, the core frequency and the memory frequency, according to the selected mode. Wang and Nagarajan [
with GPGPU-Sim [
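Equalizer's mode-dependent tuning of the number of concurrent threads, the core frequency and the memory frequency can be approximated by a small rule table. In the sketch below, run-time bottleneck detection is left out, and the `Bottleneck` categories and rules are a hypothetical simplification, not Equalizer's published decision logic.

```python
from enum import Enum
from typing import Dict


class Bottleneck(Enum):
    COMPUTE = "compute"
    MEMORY = "memory"
    CACHE = "cache"                  # cache thrashing caused by too many threads
    UNDERUTILIZED = "underutilized"  # too few threads to hide latency


def tune(bottleneck: Bottleneck, mode: str) -> Dict[str, str]:
    """Decide the three knobs (threads, core clock, memory clock) for one interval."""
    knobs = {"threads": "keep", "core_freq": "nominal", "mem_freq": "nominal"}
    if bottleneck is Bottleneck.CACHE:
        knobs["threads"] = "decrease"
    elif bottleneck is Bottleneck.UNDERUTILIZED:
        knobs["threads"] = "increase"

    if mode == "performance":
        # Boost the resource that limits progress.
        if bottleneck is Bottleneck.COMPUTE:
            knobs["core_freq"] = "high"
        elif bottleneck is Bottleneck.MEMORY:
            knobs["mem_freq"] = "high"
    elif mode == "energy":
        # Slow down the resource that is not the bottleneck.
        if bottleneck is Bottleneck.COMPUTE:
            knobs["mem_freq"] = "low"
        elif bottleneck is Bottleneck.MEMORY:
            knobs["core_freq"] = "low"
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return knobs


if __name__ == "__main__":
    print(tune(Bottleneck.MEMORY, "energy"))
```

The same bottleneck leads to opposite clock decisions in the two modes: the performance mode spends energy on the limiting resource, whereas the energy efficiency mode saves it on the non-limiting one.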
Since most authors in this field applied DVFS to a single GPU, Ren et al. [
The scheme demonstrated that intelligent use of GPU parallelization, CPU frequency scaling and power load scheduling can improve application performance while reducing the energy consumption of the processing elements on multi-GPU platforms. Wu et al. [
DVFS can be used either in isolation or in coordination with other techniques. As shown in
Platform | Load Balancing | DVFS (CPU Only) | Execution Time Improvement | Energy Improvement |
---|---|---|---|---|
CPU + GPU | NO | YES | YES | NO |
CPU + MULTIPLE GPU | NO | YES | YES | NO |
CPU + MULTIPLE GPU | YES | YES | YES | YES |
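The last row of the table combines load balancing across multiple GPUs with CPU DVFS. A common way to balance load is to split the work in proportion to each device's measured throughput so that all devices finish at about the same time. The sketch below is a generic proportional scheme, not necessarily the one used in the surveyed work, and the device names and rates are hypothetical.

```python
import math
from typing import Dict


def partition(total_items: int, throughputs: Dict[str, float]) -> Dict[str, int]:
    """Split work in proportion to each device's throughput (items per second).

    Uses the largest-remainder method so that the shares always sum to total_items.
    """
    total_rate = sum(throughputs.values())
    exact = {d: total_items * r / total_rate for d, r in throughputs.items()}
    shares = {d: math.floor(x) for d, x in exact.items()}
    leftover = total_items - sum(shares.values())
    # Hand the remaining items to the devices with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda d: exact[d] - shares[d], reverse=True)
    for d in by_remainder[:leftover]:
        shares[d] += 1
    return shares


def makespan(shares: Dict[str, int], throughputs: Dict[str, float]) -> float:
    """Time until the slowest device finishes its share."""
    return max(shares[d] / throughputs[d] for d in shares)


if __name__ == "__main__":
    rates = {"cpu": 1.0, "gpu0": 2.0, "gpu1": 2.0}  # hypothetical measured rates
    shares = partition(100, rates)
    print(shares, makespan(shares, rates))
```

With balanced shares no device idles while waiting for another; without balancing, a device that finishes early is a natural candidate for frequency scaling, which is one reason the two techniques are combined.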
Author | Technique Used | No. of Benchmarks Used | Benchmarks or Application Kernels | Energy Improvement | Performance Improvement | Platform for Parallel Implementation |
---|---|---|---|---|---|---|
Core DVFS | | | | | | |
Lee et al. [ | DVFS | Not specified | Not specified | 65% | Not specified | Not specified |
Jiao et al. [ | DVFS | 3 | Dense matrix multiplication, dense matrix transpose, fast Fourier transform | 4% | Not specified | NVIDIA GTX 280 |
Lee et al. [ | DVFS | 39 | GPGPU-Sim, Rodinia, ERCBench | Power constraint | 20% | GPGPU-Sim (simulating Quadro FX 5800) |
Mei et al. [ | DVFS | 37 | CUDA SDK 4.1, Rodinia | 19.28% | 4 | NVIDIA GeForce GTX 560 Ti |
Ge et al. [ | DVFS | 1 | Matrix multiplication, traveling salesman problem, finite state machine | Not specified | Not specified | NVIDIA Tesla K20c |
Hybrid DVFS | | | | | | |
Liu et al. [ | DVFS with load balancing | 4 | AMD OpenCL SDK, IBM | 20% | Performance constraint | AMD Radeon HD 5770 |
Ma et al. [ | DVFS with task mapping | 9 | Rodinia | 21.04% | Marginal performance degradation | NVIDIA GeForce 8800 GTX |
Komoda et al. [ | DVFS with task mapping | 25 | Rodinia, BLAS library | Power constraint | 93% | NVIDIA Tesla K20c |
Sethia and Mahlke [ | DVFS with varying number of threads | 27 | Rodinia, Parboil | 15% (energy efficiency mode) | 20% (performance mode) | GPGPU-Sim (simulating GTX 480) |
Wang & Nagarajan [ | DVFS with PID | 12 | CUDA SDK | 23% | 4% | GPGPU-Sim (simulating GTX 480) |
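The table lists Wang & Nagarajan's scheme as DVFS with PID control. As a generic illustration of that control style, and not their controller, a discrete PID loop can adjust a clock frequency so that a measured quantity tracks a target. The choice of power as the controlled quantity, the gains and the linear power model in the example are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDFrequencyController:
    """Generic discrete PID loop that sets a clock frequency to track a target.

    Using measured power (W) as the controlled metric is an assumption for illustration.
    """
    kp: float
    ki: float
    kd: float
    setpoint: float     # target value of the controlled metric
    f_nominal: float    # frequency applied when the PID terms are zero
    f_min: float
    f_max: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, measured: float, dt: float = 1.0) -> float:
        error = self.setpoint - measured
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        f = self.f_nominal + self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(self.f_max, max(self.f_min, f))  # clamp to the supported range


if __name__ == "__main__":
    # Toy plant: power grows linearly with frequency (100 W per GHz); target is 120 W.
    ctrl = PIDFrequencyController(kp=0.005, ki=0.002, kd=0.0, setpoint=120.0,
                                  f_nominal=1.0, f_min=0.5, f_max=2.0)
    freq = 1.0
    for _ in range(200):
        freq = ctrl.update(measured=100.0 * freq)
    print(round(freq, 3))  # settles near 1.2 GHz in this toy model
```

A practical controller would also add anti-windup for the integral term while the output is clamped, and would quantize the output to the discrete frequency levels the GPU actually supports.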
designed to improve either performance [
In this paper, we present a survey and analysis of several DVFS techniques aimed at analyzing and improving the energy efficiency of GPUs. The key emphasis is on the need for power management in GPUs and on identifying important trends in DVFS that are worthy of future study. We classify the research on DVFS into schemes using the core DVFS technique and schemes combining DVFS with other GPU optimizations (hybrid DVFS), and highlight the underlying similarities and differences between them. The energy efficiency and performance variation of applications running on GPUs are presented so that further research toward designing green GPUs can build on them. In future, DVFS can be combined with other techniques to achieve optimized energy savings, thereby reducing both the electricity bills and the carbon footprint of IT infrastructure.
Ashish Mishra, Nilay Khare (2015). Analysis of DVFS Techniques for Improving the GPU Energy Efficiency. Open Journal of Energy Efficiency, 04, 77-86. doi: 10.4236/ojee.2015.44009