GPU (Graphics Processing Unit) is used in various areas. Therefore, the demand for the verification of GPU programs is increasing. In this paper, we suggest the method to detect bank conflict by using symbolic execution. Bank conflict is one of the bugs happening in GPU and it leads the performance of programs lower. Bank conflict happens when some processing units in GPU access the same shared memory. Symbolic execution is the method to analysis programs with symbolic values. By using it, we can detect bank conflict on GPU programs which use many threads. We implement a prototype of the detector for bank conflict and evaluate it with some GPU programs. The result states that we can detect bank conflict on the programs with no loop regardless of the number of threads.
Nowadays, GPGPU (General-purpose computing on GPU) is one of the most remarkable topics in the field of HPC and in various study fields [
If you want to make the most use of GPU, you should write GPU programs with considering the architecture of GPU. One of what you should do is not to make bank conflict. Bank conflict happens when some processing units in GPU access the same shared memory. When bank conflict happens, the program instruction executed in parallel is executed in sequential. As a result, it makes the performance lower. It is hard for programmers to find whether bank conflict happens manually. If there is a bank conflict detector for GPU programs, debugging is easier.
In this paper, we propose the method to detect bank conflict. The method uses symbolic execution to analysis GPU programs. By executing symbolically, GPU programs using many threads can be executed with the number of states small and state explosion is lightened. We implement a prototype of our method and evaluate it with some tests.
In Utah university, P. Li, G. Li and G. Gopalakrishnan study verification for GPU programs with symbolic execution [
In Imperial College London, A. Betts et al. study verification for GPU programs with Boogie [
In Twente University, M. Huisman et al. study verification for GPU programs with separation logic [
There are two differences between these related works and our study. First, we target GPU programs written in CUDA instead of GPU programs written in LLVM. GPU programs written in CUDA is small than written in LLVM so that we can reduce the number of states. Second, we verify only bank conflict. Bank conflict happens in the same instruction, so we can assume all instruction is executed at the same time. By these two differences, the number of states in execution is smaller and the verification is easier.
GPU architecture is so unique compared to CPU [
One SM has eight SPs, one instruction buffer, and one shared memory. There is only one instruction buffer so that eight SPs in one SM execute the same instruction at the same time. Each instruction needs four clocks and Super-pipeline is employed in GPU. So each of eight SPs can execute four instructions at the same time. Therefore, one SM can execute 32 instructions at the same time and 32 threads make one warp. GPU handles many warps containing thousands or millions of threads and each warp is assigned to one SM.
GPU Programs are written in the form of SIMD (Single-Instrument Multiple-Data) [
In CUDA, there are some built-in variables to write programs in form of SIMD. To make the description easier, we describe only one built-in variable, threadIdx. This variable indicates the index of threads. Enormous threads are made on GPU. So threads are managed in a three-dimensional such as x-axis, y-axis, and z-axis. ThreadIdx is defined as a structure containing the variables, x, y, and z. For example, when six threads are made and each size of dimensions is 3, 2, and 1, threadIdx’s values are the followings. (x, y, z) = (1, 1, 1), (1, 2, 1), (2, 1, 1), (2, 2, 1), (3, 1, 1), (3, 2, 1) Each thread uses the same program but the processing is changed by using threadIdx. In detail, the condition of branch and the address of arrays contain threadIdx.
Shared memory of SM has 32 banks (
Bank conflict is the bug happening by the property of bank. When multiple SP tries to access the same bank of shared memory, completely parallel execution isn’t achieved and the performance is lower. This phenomenon is called bank conflict. For example, bank conflict happens in
Before symbolic execution, we describe SAT and SMT. Given a proposal formula with proposal variables and logic operators, the problem to check whether a set of variables meeting this formula exists is SAT (Boolean or Propositional Satisfiability Testing). If the set exists, it is satisfiable. If not, it is unsatisfiable. A tool to check SAT is called SAT solver.
SMT is the extension of SAT with backgrounds in the area of mathematics. In SMT, check a first-logic formula with real number, bit operation, and data structure. If it is satisfiable, one example meeting this formula is got. Many SMT solvers have been developed and we use Z3.
Z3 is a SMT solver developed by Microsoft Research [
Symbolic execution is one of methods to execute and analysis programs [
values as inputs and to represent the values of program variables as symbolic formula. As a result, the final outputs calculated by programs represent functions with symbolic values. In symbolic execution, a state of programs in execute has the symbolic values of variables and the path condition. A path condition is a symbolic formula which means the condition to execute this state. Symbolic execution tree is made as a result of symbolic execution. It is an execution tree with symbolic formula.
Model checking is one of formal methods to check whether a program meets a property. In model checking, use a model to represent the system and a property met in the system. Generally, a model is an automaton such as Kripke structure and a property is written in temporal logic.
In our study, a model is a symbolic execution tree and a property is written in temporal logic. For the states which access shared memory, the conjunction of the negation of the symbolic formula representing the property and the path condition is checked by SMT solver. If the conjunction is unsatisfiable, the property is met. If the conjunction is satisfiable, the property isn’t met and we can get the counter-example. This counter-example is an example of an input which doesn’t meet the property.
In our study, bank conflict on GPU programs is detected. To detect it, we use model checking with symbolic execution. There are some reasons to use it.
The first reason is to reduce the state explosion. Bank conflict happens between any pair of threads. The number of states is enormous considering all combinations of all threads. If we assume that the number of threads is n, the number of two-threads combinations is
The second reason is that we target GPU programs written in CUDA. Inputs of programs is usually undecidable, so it is convenient to set symbolic values to inputs. If inputs aren’t given, we can detect bank conflict.
The overview of our study is shown in
the solver says satisfiable, bank conflict happens. If the solver says unsatisfiable, bank conflict doesn’t happen. A condition for detecting bank conflict is shown in the later subsection.
It describes symbolic executor in our study. It takes a control flow graph as inputs and outputs the symbolic execution tree. Each state of a symbolic execution tree contains a program counter, values of program variables, and a path condition. Initially, an initial state is made. This path condition is True and program variables are assigned different symbolic values. Symbolic execution is done in the following rules.
1) In an assignment expression, the path condition isn’t changed but values of program variables are calculated.
2) In a branch expression if (e) S1 else S2, the state is divided into two state, “the state that the condition is true” and “the state that the condition is false”. In “the state that the condition is true”, the path condition pc is updated to pc
3) In a loop expression, the loop is expanded until the condition is met. SMT solver is used to check whether a loop is expanded. If the condition is satisfiable, expand the loop. If not, stop expanding the loop.
We check whether bank conflict happens in each state of the symbolic execution tree. For it, 1) we convert a path condition for one thread convert into the path condition for two threads and 2) we check satisfiability of the conjunction of two path conditions and the condition for detecting bank conflict.
1) Two path conditions are needed because bank conflict happens between two threads. The variables in CUDA are local variables, shared memory variables, device public variables, and built-in variables. Local variables and built-in variables are peculiar to one thread, so these variables of two threads has different symbolic values. Shared memory variables and device public variables are common in threads, so these variables have the same symbolic values.
2) The condition for detecting bank conflict is the conjunction of three conditions. The first condition is for two symbolic variables, a_vaddr and b_vaddr. These variables indicate the address of shared memory in which we check whether bank conflict happens. These variables are based on the state of symbolic execution tree. The second condition is the following.
The variable, threadIdx_a, indicates one of two threads and the variable, threadIdx_b, indicates the other. To consider threadIdx in one warp, two threadIdxs need to be chose in a warp. For it, introduce the variables, tn, a, and b. The third condition is the following.
This condition indicates that the two addresses of shared memory belong to the same bank. By this condition, we can check whether bank conflict happens. If SMT solver says satisfiable, bank conflict happens and the counter-example means the actual values of programs variables. If SMT solver says unsatisfiable, bank conflict doesn’t happen.
We develop a prototype of our model checker to detect bank conflict. We prepare seven GPU programs to evaluate the prototype. These programs are a part of test cases for GPU programs, Gklee Tests, published by Utah University [
In the paper, we describe GPU architecture, model checking with symbolic execution, and, the method to detect bank conflict. In our method, we convert GPU programs written in CUDA to the control flow graph to execute symbolically and execute symbolically the programs on the control flow graph. Then we check whether bank conflict happens on each states of the symbolic execution tree given by a symbolic executor.
We evaluated the prototype of this method. The result shows that we can detect bank conflict on programs with no loop regardless of the number of threads. It means we achieve to avoid state explosion due to increasing of the number of threads. However, when we analyze the programs containing a loop, state explosion happens. Hence we need to reduce the number of states by handling a loop unrolling well.
As the future work, we try to think about the method to reduce the number of states by using counter-example guided abstraction refinement (CEGAR) [
Hamaya, K. and Yamane, S. (2017) Detecting Bank Conflict of GPU Programs Using Symbolic Execution―Case Study. Journal of Software Engineering and Applications, 10, 159-167. https://doi.org/10.4236/jsea.2017.102009