**Journal of Modern Physics**

Vol.09 No.09(2018), Article ID:86407,11 pages

10.4236/jmp.2018.99107

Two-Level Block Decompositions for Solving Helmholtz Equation via Chebyshev Pseudo Spectral Method

Hsin-Chu Chen^{ }

Department of Computer and Information Science, Clark Atlanta University, Atlanta, GA, USA

Copyright © 2018 by author and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).

http://creativecommons.org/licenses/by/4.0/

Received: June 14, 2018; Accepted: July 29, 2018; Published: August 1, 2018

ABSTRACT

In this paper, we consider solving the Helmholtz equation ${\nabla}^{2}u+{\omega}^{2}u=f\left(x,y\right)$ in the Cartesian domain $\Omega =\left[-1,1\right]\otimes \left[-1,1\right]$, subject to homogeneous Dirichlet boundary condition, discretized with the Chebyshev pseudo-spectral method. The main purpose of this paper is to present the formulation of a two-level decomposition scheme for decoupling the linear system obtained from the discretization into independent subsystems. This scheme takes advantage of the homogeneity property of the physical problem along one direction to reduce a 2D problem to several 1D problems via a block diagonalization approach and the reflexivity property along the second direction to decompose each of the 1D problems to two independent subproblems using a reflexive decomposition, effectively doubling the number of subproblems. Based on the special structure of the coefficient matrix of the linear system derived from the discretization and a reflexivity property of the second-order Chebyshev differentiation matrix, we show that the decomposed submatrices exhibits a similar property, enabling the system to be decomposed using reflexive decompositions. Explicit forms of the decomposed submatrices are derived. The decomposition not only yields more efficient algorithm but introduces coarse-grain parallelism. Furthermore, it preserves all eigenvalues of the original matrix.

**Keywords:**

Helmholtz Equation, Chebyshev Pseudo-Spectral Method, Chebyshev Differentiation Matrix, Coarse-Grain Parallelism, Reflexive Matrix

1. Introduction

In this paper, we consider the numerical solution to the Helmholtz equation ${\nabla}^{2}u+{\omega}^{2}u=f\left(x,y\right)$ in the Cartesian domain $\Omega =\left[-1,1\right]\otimes \left[-1,1\right]$ with homogeneous Dirichlet boundary condition, where ${\nabla}^{2}$ is the Laplacian operator, $\omega $ is a real parameter representing the wavenumber, u is the amplitude associated with wave propagation, and $f\left(x\mathrm{,}y\right)$ is a forcing function. Without loss of generality we assume u = 0 on the boundary, i.e.,

${\nabla}^{2}u+{\omega}^{2}u=f\left(x,y\right),\left(x,y\right)\in \Omega \text{and}\mathrm{}u=0\text{on}\mathrm{}\partial \Omega $ (1)

We present a highly parallel numerical algorithm for solving the Helmholtz equation discretized by the Chebyshev pseudospectral scheme. Chebyshev pseudo-spectral methods have long been used to numerically solve partial differential equations [1] [2] [3] . This discretization yields a linear system with coefficient matrix of a special block structure that allows for a block diagonalization which decouples the original matrix, along with some proper permutation of the decoupled matrix into several independent submatrices. By exploiting a reflexivity property that is inherent in the Chebyshev differentiation matrix, it is possible to further decompose each of the submatrices into two independent submatrices when the boundary conditions at both ends of the decoupled 1D subproblems are the same. This second level of decomposition effectively doubles the number of independent submatrices, yielding a higher degree of course-grain parallelism.

This paper is organized as follows. In Section 2, we present the general structure of the linear system derived from the Chebyshev collocation method discretized independently in x and y direction. In Section 3, we address the two-level block decomposition. This decomposition was proposed by the author in [4] for solving Poisson equations with identical grids in either direction. In this paper, we reformulate the decomposition to allow for the number of grid points in the x direction to be different from that in the y direction and derive the explicit forms for the decomposed submatrices, making applications of this approach more flexible. The computational procedure and a numerical example to demonstrate its usefulness are presented in Sections 4 and 5, respectively.

2. Linear System from the Chebyshev Collocation Method

In this section, we briefly describe the Chebyshev collocation method for solving the Helmholtz equation in two dimensions. We assume that the problem is discretized with a tensor product grid $\left({x}_{i}\mathrm{,}{y}_{j}\right)$, $0\le i\le {N}_{x}\mathrm{,0}\le j\le {N}_{y}$, where ${x}_{i}$ and ${y}_{j}$ are the Chebyshev points in x and y direction, respectively. Let ${D}_{x}$, indexed from 0 to ${N}_{x}$, be the $\left({N}_{x}+1\right)\ast \left({N}_{x}+1\right)$ Chebyshev spectral differentiation matrix associated with the x direction. The entries of ${D}_{x}$ are given as [1] [2] [3]

${\left({D}_{x}\right)}_{00}=\frac{2{N}_{x}^{2}+1}{6},\mathrm{}{\left({D}_{x}\right)}_{{N}_{x}{N}_{x}}=-\frac{2{N}_{x}^{2}+1}{6},$

${\left({D}_{x}\right)}_{jj}=\frac{-{x}_{j}}{2\left(1-{x}_{j}^{2}\right)},\mathrm{}j=1,2,\cdots ,{N}_{x}-1$

${\left({D}_{x}\right)}_{ij}=\frac{{c}_{i}{\left(-1\right)}^{i+j}}{{c}_{j}\left({x}_{i}-{x}_{j}\right)},\mathrm{}i\ne j,i,j=0,1,\cdots ,{N}_{x}$

where ${c}_{i}=\{\begin{array}{l}2\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{for}\text{\hspace{0.17em}}i=0\text{or}\mathrm{}{N}_{x}\\ 1\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{otherwise}\end{array}$ . The Chebyshev spectral differentiation matrix ${D}_{y}$ associated with the y direction can be obtained in a similar may.

Now, let ${A}_{x}$ be the stripped matrix of ${D}_{x}^{2}$ by removing the first and last rows and columns of ${D}_{x}^{2}$ and ${A}_{y}$ be the stripped matrix of ${D}_{y}^{2}$ . We have

$\begin{array}{l}{\left({A}_{x}\right)}_{ij}={\left({D}_{x}^{2}\right)}_{ij},i,j=1,2,\cdots ,{N}_{x}-1,\\ {\left({A}_{y}\right)}_{ij}={\left({D}_{y}^{2}\right)}_{ij},i,j=1,2,\cdots ,{N}_{y}-1.\end{array}$ (2)

With the grid just described, the Chebyshev collocation method yields, after removing the boundary nodes whose values are already known from the system, the following linear system

$Ku=f,K={I}_{p}\otimes {A}_{y}+{A}_{x}\otimes {I}_{q}+{\omega}^{2}\left({I}_{p}\otimes {I}_{q}\right)$ (3)

where $p={N}_{x}-1$, $q={N}_{y}-1$, and ${I}_{k}$ represents the identity matrix of dimension k.

Many solution schemes can be employed to solve Equation (3), including Gaussian eliminations, Gauss-Seidel iterations, ADI iterations, and matrix diagonalizations [5] [6] [7] [8] . In this paper, the block version of the matrix diagonalization method will be employed to first decompose the matrix K into a block diagonal matrix consisting of q diagonal blocks. Each diagonal block will then be further decomposed into two smaller diagonal sub-blocks using reflexive decompositions, yielding a total of 2q diagonal blocks. Note that q is the number of internal grid points along the y direction. This two-step block decomposition scheme allows for the linear system Equation (3) to be decoupled into 2q linear subsystems which can then be solved in parallel with course-grain parallelism using multiprocessors or networked computers.

3. Two-Level Block Decompositions

3.1. Level-1 Decomposition: Block Diagonalization

The diagonalization scheme has been used elsewhere [7] [8] [9] . In this section, however, we address the decomposition of the matrix K into q diagonal blocks using a different representation. To begin with, let ${Q}_{y}$ be such a matrix that the similarity transformation ${Q}_{y}^{-1}{A}_{y}{Q}_{y}$ diagonalizes ${A}_{y}$ into ${\Lambda}_{y}$ :

${Q}_{y}^{-1}{A}_{y}{Q}_{y}={\Lambda}_{y}$

A natural choice of ${Q}_{y}$ is to have its columns formed from the eigenvectors of ${A}_{y}$ so that ${\Lambda}_{y}$ is a diagonal matrix that consists of the eigenvalues of ${A}_{y}$ . Now split the coefficient matrix K in Equation (3) into three parts ${K}_{1}$, ${K}_{2}$, and ${K}_{3}$ :

${K}_{1}={I}_{p}\otimes {A}_{y},{K}_{2}={A}_{x}\otimes {I}_{q},{K}_{3}={\omega}^{2}\left({I}_{p}\otimes {I}_{q}\right)$

and let $Q={I}_{p}\otimes {Q}_{y}$ . We have the transformed matrix

$\stackrel{\u02dc}{K}={Q}^{-1}KQ={\stackrel{\u02dc}{K}}_{1}+{\stackrel{\u02dc}{K}}_{2}+{\stackrel{\u02dc}{K}}_{3}$ (4)

with

${\stackrel{\u02dc}{K}}_{1}={Q}^{-1}{K}_{1}Q,{\stackrel{\u02dc}{K}}_{2}={Q}^{-1}{K}_{2}Q,{\stackrel{\u02dc}{K}}_{3}={Q}^{-1}{K}_{3}Q,\text{where}{Q}^{-1}={I}_{p}\otimes {Q}_{y}^{-1}.$

We now show that the transformed matrix $\stackrel{\u02dc}{K}$ can be permuted to yield a block diagonal matrix. First, we observe that both ${\stackrel{\u02dc}{K}}_{1}$ and ${\stackrel{\u02dc}{K}}_{3}$ are diagonal matrices and ${\stackrel{\u02dc}{K}}_{2}$ is a diagonal block matrix since

${\stackrel{\u02dc}{K}}_{1}={Q}^{-1}{K}_{1}Q=\left({I}_{p}\otimes {Q}_{y}^{-1}\right)\left({I}_{p}\otimes {A}_{y}\right)\left({I}_{p}\otimes {Q}_{y}\right)={I}_{p}\otimes \left({Q}_{y}^{-1}{A}_{y}{Q}_{y}\right)={I}_{p}\otimes {\Lambda}_{y},$

${\stackrel{\u02dc}{K}}_{3}={Q}^{-1}{K}_{3}Q=\left({I}_{p}\otimes {Q}_{y}^{-1}\right)\left({\omega}^{2}{I}_{p}\otimes {I}_{q}\right)\left({I}_{p}\otimes {Q}_{y}\right)={\omega}^{2}\left({I}_{p}\otimes {I}_{q}\right)$

and

${\stackrel{\u02dc}{K}}_{2}={Q}^{-1}{K}_{2}Q=\left({I}_{p}\otimes {Q}_{y}^{-1}\right)\left({A}_{x}\otimes {I}_{q}\right)\left({I}_{p}\otimes {Q}_{y}\right)={A}_{x}\otimes {I}_{q}.$

Accordingly,

$\stackrel{\u02dc}{K}={Q}^{-1}KQ={I}_{p}\otimes {\Lambda}_{y}+{A}_{x}\otimes {I}_{q}+{\omega}^{2}\left({I}_{p}\otimes {I}_{q}\right).$ (5)

Let ${a}_{ij}$ be the ${\left(i\mathrm{,}j\right)}^{th}$ element of ${A}_{y}$, $i,j=1,2,\cdots ,q$ . In its matrix form, $\stackrel{\u02dc}{K}$ can be expressed as

$\stackrel{\u02dc}{K}=\left[\begin{array}{cccc}\left({a}_{11}+{\omega}^{2}\right)I+{\Lambda}_{y}& {a}_{12}I& \cdots & {a}_{1p}I\\ {a}_{21}I& \left({a}_{22}+{\omega}^{2}\right)I+{\Lambda}_{y}& \cdots & {a}_{2p}I\\ \vdots & \vdots & \ddots & \vdots \\ {a}_{p1}I& {a}_{p2}I& \cdots & \left({a}_{pp}+{\omega}^{2}\right)I+{\Lambda}_{y}\end{array}\right]$

where I is of dimension q. Apparently, $\stackrel{\u02dc}{K}$ is also a diagonal block matrix because both I and ${\Lambda}_{y}$ are diagonal matrices. The diagonal block matrix $\stackrel{\u02dc}{K}$ can be rearranged to yield a block diagonal matrix $\stackrel{^}{K}$ :

$\stackrel{^}{K}=\left[\begin{array}{cccc}{\stackrel{^}{K}}_{1}& 0& \cdots & 0\\ 0& {\stackrel{^}{K}}_{2}& \ddots & \vdots \\ \vdots & \ddots & \ddots & 0\\ 0& \cdots & 0& {\stackrel{^}{K}}_{q}\end{array}\right]$

via appropriate permutations of rows and columns as shown below. Let ${\stackrel{\u02dc}{K}}_{ij}$ be the ${\left(i\mathrm{,}j\right)}^{th}$ block of $\stackrel{\u02dc}{K}$ and P be such a permutation matrix that the transformation ${P}^{\text{T}}\stackrel{\u02dc}{K}P$ rearranges the entries of $\stackrel{\u02dc}{K}$ in such a way that

${\stackrel{^}{K}}_{l}\left(i,j\right)={\stackrel{\u02dc}{K}}_{i,j}\left(l\right),i,j=1,2,\cdots ,p$

where ${\stackrel{^}{K}}_{l}\left(i\mathrm{,}j\right)$ is the ${\left(i\mathrm{,}j\right)}^{th}$ element of ${\stackrel{^}{K}}_{l}$ and ${\stackrel{\u02dc}{K}}_{i\mathrm{,}j}\left(l\right)$ is the ${l}^{th}$ diagonal element of ${\stackrel{\u02dc}{K}}_{i\mathrm{,}j}$ . Then we have

${P}^{\text{T}}\stackrel{\u02dc}{K}P=\stackrel{^}{K}={\stackrel{^}{K}}_{1}\oplus {\stackrel{^}{K}}_{2}\oplus \cdots \oplus {\stackrel{^}{K}}_{q},$ (6)

where

${\stackrel{^}{K}}_{l}={A}_{x}+\left({\omega}^{2}+{\left({\lambda}_{l}\right)}_{y}\right){I}_{p},l=1,2,\cdots ,q$

with ${\left({\lambda}_{l}\right)}_{y}$ being the ${l}^{th}$ eigenvalue of ${A}_{y}$ . Accordingly,

$\stackrel{^}{K}={P}^{\text{T}}\stackrel{\u02dc}{K}P={P}^{\text{T}}{Q}^{-1}KQP$

since $\stackrel{\u02dc}{K}={Q}^{-1}KQ$ . In other words, the matrix K has been transformed to a block diagonal matrix consisting of q blocks. This decomposition takes the advantage of the homogeneity property of the problem with the boundary conditions mentioned in Equation (1) along the y direction to decouple the 2D problem into q independent 1D problems. A similar decomposition can be performed by using the homogeneity property of the problem along the x direction.

3.2. Level-2 Decomposition: Reflexive Decomposition

As seen from the previous section, the matrix K can be transformed to a block diagonal matrix, implying that the original problem can be decomposed into independent subproblems which can then be solved in parallel using multi-processors. Although each diagonal block ${\stackrel{^}{K}}_{l}$ in Equation (6) can be further decomposed into a diagonal matrix using a single transformation matrix ${Q}_{x}$ that consists of the eigenvectors of ${A}_{x}$, this diagonalization yields little advantage in general, especially a multiprocessing environment, unless the eigenpairs are readily available, which is not the case for the matrix ${A}_{x}$ .

In this section, we shall employ the reflexivity property of the diagonal blocks $\text{\hspace{0.05em}}{\stackrel{^}{K}}_{l}\text{\hspace{0.05em}}$ to further decompose each of them into two independent sub-blocks using another decomposition technique called the reflexive decomposition [10] .

Let ${J}_{n}$ be the cross identity matrix of dimension n: ${J}_{n}=\left[\begin{array}{ccc}& & 1\\ & \u22f0& \\ 1& & \end{array}\right]$ . It is

trivial to see that ${J}_{n}{J}_{n}={I}_{n}$ . Now, we note that the Chebyshev differentiation matrix ${D}_{x}$ is anti-reflexive and ${D}_{x}^{2}$ is reflexive, with respect to ${J}_{N+1}$ . In other words,

${D}_{x}=-{J}_{N+1}{D}_{x}{J}_{N+1}\text{and}\mathrm{}{D}_{x}^{2}={J}_{N+1}{D}_{x}^{2}{J}_{N+1}.$

Accordingly, the $p\times p$ matrix ${A}_{x}$, which is the stripped matrix of ${D}_{x}^{2}$ (Equation (2)), is reflexive with respect to ${J}_{p}$ . In fact, it is a centro-symmetric matrix [11] [12] . Recall that ${\stackrel{^}{K}}_{l}={A}_{x}+\left({\omega}^{2}+{\left({\lambda}_{l}\right)}_{y}\right){I}_{p}$ . Therefore, $\text{\hspace{0.05em}}{\stackrel{^}{K}}_{l}\text{\hspace{0.05em}}$ satisfy the reflexivity property

$\text{\hspace{0.05em}}{\stackrel{^}{K}}_{l}\text{\hspace{0.05em}}={J}_{p}{\stackrel{^}{K}}_{l}{J}_{p},l=1,2,\cdots ,q,$

indicating that each $\text{\hspace{0.05em}}{\stackrel{^}{K}}_{l}\text{\hspace{0.05em}}$ can be decomposed via reflexive decompositions into two block diagonal submatrices of almost equal size, depending on whether p is even or odd. The decompositions can be done through orthogonal transformations which have been shown in [10] . Here we just present the results.

First consider the case when p is even. Assume $p=2k$ for some $k>0$,

and ${A}_{x}$ is evenly partitioned into $2\times 2$ sub-blocks: ${A}_{x}=\left[\begin{array}{cc}{A}_{11}& {A}_{12}\\ {A}_{21}& {A}_{22}\end{array}\right]$ . Taking

the advantage of the reflexivity property of $\text{\hspace{0.05em}}{\stackrel{^}{K}}_{l}\text{\hspace{0.05em}}$ by applying the orthogonal

transformation ${X}_{A}^{\text{T}}\text{\hspace{0.05em}}{\stackrel{^}{K}}_{l}\text{\hspace{0.05em}}{X}_{A}$ to $\text{\hspace{0.05em}}{\stackrel{^}{K}}_{l}\text{\hspace{0.05em}}$ where ${X}_{A}=\frac{1}{\sqrt{2}}\left[\begin{array}{cc}I& -{J}_{k}\\ {J}_{k}& I\end{array}\right]$, one can easily

decompose $\text{\hspace{0.05em}}{\stackrel{^}{K}}_{l}\text{\hspace{0.05em}}$ into two decoupled diagonal subblocks of equal size:

$\begin{array}{l}{D}_{l}={X}_{A}^{\text{T}}{\stackrel{^}{K}}_{l}{X}_{A}={X}_{A}^{\text{T}}\left({A}_{x}+\left({\omega}^{2}+{\left({\lambda}_{l}\right)}_{y}\right)I\right)X\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}={X}_{A}^{\text{T}}{A}_{x}{X}_{A}+\left({\omega}^{2}+{\lambda}_{l}\right)I=\left[\begin{array}{cc}{D}_{l1}& 0\\ 0& {D}_{l2}\end{array}\right]\end{array}$ (7)

where ${D}_{l1}={A}_{11}+{A}_{12}{J}_{k}+\left({\omega}^{2}+{\left({\lambda}_{l}\right)}_{y}\right){I}_{k}$ and ${D}_{l2}={A}_{22}-{A}_{21}{J}_{k}+\left({\omega}^{2}+{\left({\lambda}_{l}\right)}_{y}\right){I}_{k}$

In the case when p is odd, say $p=2k+1,k>0$, we consistently partition ${A}_{x}$ and J as

${A}_{x}=\left[\begin{array}{ccc}{A}_{11}& {A}_{12}& {A}_{13}\\ {A}_{21}& {a}_{22}& {A}_{23}\\ {A}_{31}& {A}_{32}& {A}_{33}\end{array}\right],J=\left[\begin{array}{ccc}& & {J}_{k}\\ & 1& \\ {J}_{k}& & \end{array}\right].$

By taking the transformation matrix ${X}_{A}$ to be ${X}_{A}=\frac{1}{\sqrt{2}}\left[\begin{array}{ccc}I& 0& -{J}_{k}\\ 0& \sqrt{2}& 0\\ {J}_{k}& 0& I\end{array}\right]$, the matrix $\text{\hspace{0.05em}}{\stackrel{^}{K}}_{l}$ can be decoupled via the orthogonal transformation ${X}_{A}^{\text{T}}{\stackrel{^}{K}}_{l}{X}_{A}$ as

$\begin{array}{c}{D}_{l}={X}_{A}^{\text{T}}{\stackrel{^}{K}}_{l}{X}_{A}={X}_{A}^{\text{T}}\left({A}_{x}+\left({\omega}^{2}+{\left({\lambda}_{l}\right)}_{y}\right)I\right){X}_{A}\\ ={X}_{A}^{\text{T}}{A}_{x}{X}_{A}+\left({\omega}^{2}+{\left({\lambda}_{l}\right)}_{y}\right)I=\left[\begin{array}{cc}{D}_{l1}& 0\\ 0& {D}_{l2}\end{array}\right]\end{array}$

where

${D}_{l1}=\left[\begin{array}{cc}{A}_{11}+{A}_{13}{J}_{k}& \sqrt{2}{A}_{12}\\ \sqrt{2}{A}_{21}& {a}_{22}\end{array}\right]+\left({\omega}^{2}+{\left({\lambda}_{l}\right)}_{y}\right)$

and

${D}_{l2}=\left({A}_{33}-{A}_{31}{J}_{k}\right)+\left({\omega}^{2}+{\left({\lambda}_{l}\right)}_{y}\right){I}_{k}.$

In either case, let $X={I}_{q}\otimes {X}_{A}={X}_{A}\oplus {X}_{A}\oplus \cdots \oplus {X}_{A}$ . The orthogonal transformation ${X}^{\text{T}}\stackrel{^}{K}X$ yields

$\begin{array}{c}{X}^{\text{T}}\stackrel{^}{K}X={X}_{A}^{\text{T}}{\stackrel{^}{K}}_{1}{X}_{A}\oplus {X}_{A}^{\text{T}}{\stackrel{^}{K}}_{2}{X}_{A}\oplus \cdots \oplus {X}_{A}^{\text{T}}{\stackrel{^}{K}}_{q}{X}_{A}\\ ={D}_{1}\oplus {D}_{2}\oplus \cdots \oplus {D}_{q}\\ =\left({D}_{11}\oplus {D}_{12}\right)\oplus \left({D}_{21}\oplus {D}_{22}\right)\oplus \cdots \left({D}_{q1}\oplus {D}_{q2}\right),\end{array}$

which obviously consists of 2q independent diagonal blocks since each ${D}_{l}$ has two independent diagonal blocks.

4. Computational Procedure

The numerical solution to the Helmholtz Equation (1) discretized by the Chebyshev pseudospectral method yields the linear system

$Ku=f,K={I}_{p}\otimes {A}_{y}+{A}_{x}\otimes {I}_{q}+{\omega}^{2}\left({I}_{p}\otimes {I}_{q}\right)$

where ${I}_{p}$ is the identity matrix of dimension p, ${A}_{y}$ is a $q\times q$, ${A}_{x}$ a $p\times p$, and K a $pq\times pq$ matrix. Here p (q) is the number of internal grid points in the x (y) direction. The two-level decomposition scheme described in this paper yields the following transformed linear system consisting of 2q independent

subsystems, each of dimension $\frac{p}{2}$ when p is even.

$Dv=g,D=\left({D}_{11}\oplus {D}_{12}\right)\oplus \left({D}_{21}\oplus {D}_{22}\right)\oplus \cdots \oplus \left({D}_{q1}\oplus {D}_{q2}\right).$

The sizes of the sub-blocks in D differ at most by 1 when p is an odd number. Computationally, this decomposition is very attractive. Not only can the decoupled subsystems be solved much more efficiently, they can also be solved in parallel with course-grain parallelism using a computer system with multi-processors. This two-level composition consists of the following three stages.

1) Transform $Ku=f$ to $\stackrel{\u02dc}{K}\stackrel{\u02dc}{u}=\stackrel{\u02dc}{f}$ where $\stackrel{\u02dc}{K}={Q}^{-1}KQ$, $\stackrel{\u02dc}{u}={Q}^{-1}u$, and $\stackrel{\u02dc}{f}={Q}^{-1}f$ .

This is a typical similarity transformation that is achieved by premultiplying both sides of $Ku=f$ with ${Q}^{-1}$ and inseting $Q{Q}^{-1}$, which is simply an identity matrix, between K and u. The matrix Q has been defined in Section 3.1 and $\stackrel{\u02dc}{K}$ can be found in Equation (5).

2) Transform $\stackrel{\u02dc}{K}\stackrel{\u02dc}{u}=\stackrel{\u02dc}{f}$ to $\stackrel{^}{K}\stackrel{^}{u}=\stackrel{^}{f}$ where $\stackrel{^}{K}={P}^{\text{T}}\stackrel{\u02dc}{K}P$, $\stackrel{^}{u}={P}^{\text{T}}\stackrel{\u02dc}{u}$, and $\stackrel{^}{f}={P}^{\text{T}}\stackrel{\u02dc}{f}$ .

The main purpose of this transformation is to reorder the unknowns of the linear system so that the decoupled coefficent matrix $\stackrel{\u02dc}{K}$ is converted to a block diagonal matrix ( $\stackrel{^}{K}$ ) for the sake of computational convenience. The matrix P, which is just a permutation matrix, has been explained in Section 3.1 and $\stackrel{^}{K}$ can be found in Equation (6).

3) Transform $\stackrel{^}{K}\stackrel{^}{u}=\stackrel{^}{f}$ to $Dv=g$ where $D={X}^{\text{T}}\stackrel{^}{K}X$, $v={X}^{\text{T}}\stackrel{^}{u}$, and $g={X}^{\text{T}}\stackrel{^}{f}$ .

This transformation is analogous to that in stage 1 except that X here is an orthogonal matrix. Therefore we replace ${X}^{-1}$ by ${X}^{\text{T}}$ . The matrices X and D can be found in Section 3.2

The advantage of this approach is that the decomposed form of D is explicitly known and, therefore, the actual decompositions from K to $\stackrel{\u02dc}{K}$, from $\stackrel{\u02dc}{K}$ to $\stackrel{^}{K}$, and from $\stackrel{^}{K}$ to D are not necessary. Furthermore, the diagonal blocks of D can be obtained without any matrix-matrix multiplications. The computational procedure of this two-level decomposition boils down to the following three typical steps.

1) Decomposition step. In this step, we shall compute D and g. Since each diagonal block of D,

$D=\left({D}_{11}\oplus {D}_{12}\right)\oplus \left({D}_{21}\oplus {D}_{22}\right)\oplus \cdots \oplus \left({D}_{q1}\oplus {D}_{q2}\right)\mathrm{,}$

is independent and explicitly known, D can be obtained in parallel using 2q processors, without any difficulty. The main task here is, thus, to obtain g. This can be achieved by transforming f to $\stackrel{\u02dc}{f}$ first, which involves solving the linear system $Q\stackrel{\u02dc}{f}=f$ . It is essential to note that Q, $Q={I}_{p}\otimes {Q}_{y}$, is a block diagonal matrix and, therefore, $\stackrel{\u02dc}{f}$ can be obtained by solving p independent linear subsystems, each of size q, instead of solving the linear system as a whole, which is of size pq. This step can be performed in parallel with large granularity. The matrix ${Q}_{y}$, consisting of the eigenvectors of ${A}_{y}$, is in general not known explicitly and, thus, has to be computed from ${A}_{y}$ . Fortunately, ${A}_{y}$ is only of dimension q. Once $\stackrel{\u02dc}{f}$ has been computed, $\stackrel{^}{f}$ and g can be obtained easily because $\stackrel{^}{f}$ is simply a permuted version of $\stackrel{\u02dc}{f}$ and $g={X}^{\text{T}}\stackrel{^}{f}$ involves no matrix-vector multiplications due to the special form of X.

2) Solution step. In this step, we solve the linear system $Dv=q$ for v. Note that D consists of 2q diagonal blocks: ${D}_{l1}$ and ${D}_{l2}$, $l=1,2,\cdots ,q$ . By consistently partitioning the vectors x and g into 2q parts, based on the size of the blocks, the linear system $Dv=g$ can now be expressed as

${D}_{l1}{v}_{2l-1}={g}_{2l-1},$

${D}_{l2}{v}_{2l}={g}_{2l}$

for $l=1,2,\cdots ,q$ . Since all of the subsystems are independent, the linear system $Dv=g$ can be solved in parallel, using 2q processors, one for each subsystem.

3) Retrieval step. Once v has been obtained, the solution u to the original system can be retrieved from v as follows. We first compute $\stackrel{^}{u}$ from v using $\stackrel{^}{u}=Xv$, which again involves no matrix-vector multiplication. We then obtain $\stackrel{\u02dc}{u}$ by simply permuting $\stackrel{^}{u}$ using $\stackrel{\u02dc}{u}=P\stackrel{^}{u}$ . The final solution u can now be computed by the matrix multiplication $u=Q\stackrel{\u02dc}{u}$ . Note that $X={I}_{q}\otimes {X}_{A}$ and $Q={I}_{p}\otimes {Q}_{y}$, implying that $\stackrel{^}{u}$ and u can be obtained in parallel with coursegrain parallelism.

To end this section, it deserves mentioning that this two-level decomposition is a similarity transformation and therefore, all eigenvalues of the original matrix K are preserved in matrix D. Obtaining eigenvalues from D is far more efficient than from K.

5. Numerical Example

To demonstrate the usefulness of the two-level decomposition scheme, we present a numerical example derived from the Helmholtz equation with $\omega =3$ over a square domain on $\left[-\mathrm{1,1}\right]\times \left[-\mathrm{1,1}\right]$, subject to homogeneous Dirichlet boundary conditions on a grid with ${N}_{x}=5$ and ${N}_{y}=4$ . The numerical results presented in this section were conducted using GNU Octave which is software that offers many powerful high-level programming features for scientific computations, similar to Matlab. It suffices to show the final decomposed submatrices to illustrate the advantage of this approach. By excluding the grid points on the boundary and using the lexicographic ordering to number the internal nodes [3] , the matrix K, of dimension 12, yields

$K={I}_{4}\otimes {A}_{y}+{A}_{x}\otimes {I}_{3}+{\omega}^{2}\left({I}_{4}\otimes {I}_{3}\right)$

where

${A}_{x}=\left[\begin{array}{rrrr}\hfill -31.53& \hfill 12.68& \hfill -3.69& \hfill 2.21\\ \hfill 7.32& \hfill -10.07& \hfill 5.79& \hfill -1.91\\ \hfill -1.91& \hfill 5.79& \hfill -10.07& \hfill 7.32\\ \hfill 2.21& \hfill -3.69& \hfill 12.68& \hfill -31.53\end{array}\right]\text{\hspace{0.17em}}\text{and}\text{\hspace{0.17em}}{A}_{y}=\left[\begin{array}{rrr}\hfill -14.0& \hfill 6.0& \hfill -2.0\\ \hfill 4.0& \hfill -6.0& \hfill 4.0\\ \hfill -2.0& \hfill 6.0& \hfill -14.0\end{array}\right].$

For simplicity, we have rounded off the numbers after the last digit shown. First we observe that both ${A}_{x}$ and ${A}_{y}$ are reflexive. In fact, they are centrosymmetric matrices. Let the columns of ${Q}_{y}$ be the eigenvectors of ${A}_{y}$, of dimension 3, and ${P}^{\text{T}}$ be the following permutation matrix of dimension 12,

${P}^{\text{T}}\left(k\mathrm{,}l\right)=\{\begin{array}{ll}1\hfill & \text{for}k=i+\left(j-1\right)\ast 4,l=j+\left(i-1\right)\ast 3,1\le i\le 4,1\le j\le 3\hfill \\ 0\hfill & \text{otherwise}\text{.}\hfill \end{array}$

From the Level-1 decomposition presented in Section 3,

${P}^{\text{T}}\stackrel{\u02dc}{K}P=\stackrel{^}{K}={P}^{\text{T}}{Q}^{-1}KQP={\stackrel{^}{K}}_{1}\oplus {\stackrel{^}{K}}_{2}\oplus {\stackrel{^}{K}}_{3},Q={I}_{3}\otimes {Q}_{x},$

we easily obtain the decoupled diagonal blocks ${\stackrel{^}{K}}_{l}$ without the need of any matrix multiplications once the eigenvalues of ${A}_{y}$, ${\left({\lambda}_{l}\right)}_{y}$, are available:

$\begin{array}{l}{\stackrel{^}{K}}_{l}={A}_{x}+\left({\omega}^{2}+{\left({\lambda}_{l}\right)}_{y}\right){I}_{4}={\stackrel{^}{K}}_{l}={A}_{x}+\left({\omega}^{2}{I}_{4}\right)+{\left({\lambda}_{l}\right)}_{y}{I}_{4}\\ =\left[\begin{array}{cccc}-22.53+{\left({\lambda}_{l}\right)}_{y}& 12.68& -3.69& 2.21\\ 7.32& -1.07+{\left({\lambda}_{l}\right)}_{y}& 5.79& -1.91\\ -1.91& 5.79& -1.07+{\left({\lambda}_{l}\right)}_{y}& 7.32\\ 2.21& -3.69& 12.68& -22.53+{\left({\lambda}_{l}\right)}_{y}\end{array}\right],1\le l\le 3,\end{array}$

where $\left\{{\left({\lambda}_{1}\right)}_{y},{\left({\lambda}_{2}\right)}_{y},{\left({\lambda}_{3}\right)}_{y}\right\}=\left\{-19.54,-12.00,-2.46\right\}.$

The numerical values of $\text{\hspace{0.05em}}{\stackrel{^}{K}}_{l}\text{\hspace{0.05em}}$ clearly indicate that each $\text{\hspace{0.05em}}{\stackrel{^}{K}}_{l}\text{\hspace{0.05em}}$ is reflexive with respect to ${J}_{4}$ . Now, applying the Level-2 decomposition to each $\text{\hspace{0.05em}}{\stackrel{^}{K}}_{l}\text{\hspace{0.05em}}$ by taking

${X}_{A}=\frac{1}{\sqrt{2}}\left[\begin{array}{cc}I& -{J}_{2}\\ {J}_{2}& I\end{array}\right]$ yields

${D}_{1}={X}_{A}^{\text{T}}{\stackrel{^}{K}}_{1}{X}_{A}=\left[\begin{array}{rrrr}\hfill -39.87& \hfill 8.99& \hfill & \hfill \\ \hfill 5.41& \hfill -14.82& \hfill & \hfill \\ \hfill & \hfill & \hfill -26.40& \hfill 9.22\\ \hfill & \hfill & \hfill 16.38& \hfill -44.29\end{array}\right],$

${D}_{2}={X}_{A}^{\text{T}}{\stackrel{^}{K}}_{2}{X}_{A}=\left[\begin{array}{rrrr}\hfill -32.32& \hfill 8.99& \hfill & \hfill \\ \hfill 5.41& \hfill -7.28& \hfill & \hfill \\ \hfill & \hfill & \hfill -18.86& \hfill 9.22\\ \hfill & \hfill & \hfill 16.38& \hfill -36.74\end{array}\right],$

${D}_{3}={X}_{A}^{\text{T}}{\stackrel{^}{K}}_{3}{X}_{A}=\left[\begin{array}{rrrr}\hfill -22.78& \hfill 8.99& \hfill & \hfill \\ \hfill 5.41& \hfill 2.27& \hfill & \hfill \\ \hfill & \hfill & \hfill -9.31& \hfill 9.22\\ \hfill & \hfill & \hfill 16.38& \hfill -27.20\end{array}\right],$

where the unshown entries in the matrices are 0. Apparently, each $\text{\hspace{0.05em}}{\stackrel{^}{K}}_{l}\text{\hspace{0.05em}}$ has been further decomposed into two independent diagonal blocks, yielding a total of six independent diagonal blocks from the original matrix K. Note that the decomposed matrices ${D}_{l}\mathrm{,1}\le l\le 3$ can be obtained directly from ${A}_{x}$, $\omega $, and ${\left({\lambda}_{l}\right)}_{y}$ using Equation (7) without the need of computing $\text{\hspace{0.05em}}{\stackrel{^}{K}}_{l}\text{\hspace{0.05em}}$ . Note also that the eigenvalues of K, of dimension 12, can now be easily obtained from the six diagonal blocks, each of dimension only 2, a clear indication of the advantage of this approach.

6. Conclusions

In this paper, we have presented a two-level block decomposition scheme for the numerical solution to the Helmholtz equation ${\nabla}^{2}u+{\omega}^{2}u=f\left(x,y\right)$ in the Cartesian domain $\Omega =\left[-\mathrm{1,1}\right]\otimes \left[-\mathrm{1,1}\right]$, subject to homogeneous Dirichlet boundary condition, discretized with the Chebyshev pseudo-spectral method. For a computational grid with ${N}_{x}+1$ grid points in the x direction and ${N}_{y}+1$ in the y direction, our first-level decomposition yields ${N}_{y}-1$ independent 1D subproblems, excluding the known values at boundary points, using the eigen-pairs of the 1D second-order Chebyshev differentiation matrix along y. The decoupled coefficient matrices are then rearranged to form a block diagonal matrix through a proper permutation. The second level of decomposition further decouples each of the subproblem into two smaller and independent subproblems using reflexive decompositions, yielding a total of $2\left({N}_{y}-1\right)$ independent subproblems.

The general form of the final decoupled submatrices has also been explicitly derived, eliminating the need of performing matrix-matrix multiplications during the decomposition step. A numerical example is presented to illustrate the applicability and to demonstrate the advantage of this two-level decomposition scheme. This scheme leads naturally to a highly parallel numerical algorithm for solving the problem under consideration, since all subproblems can be solved in parallel using multiprocessors.

To close our conclusion, it deserves mentioning that this block decomposition can be readily employed for finding the eigenvalues of the problem as well. This is due to the fact that all eigenvalues of the original matrix are preserved in the decomposed submatrices because the first-level decomposition is a similarity transformation and the second-level decomposition is an orthogonal transformation. The preservation of all eigenvalues in the submatrices makes this block decomposition scheme even more attractive because finding eigenvalues from the decomposed submatrices is obviously much more efficient than from the original matrix, not to mention the computational benefit that can be achieved from the large-grain parallelism induced by the decomposition.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

Cite this paper

Chen, H.-C. (2018) Two-Level Block Decompositions for Solving Helmholtz Equation via Chebyshev Pseudo Spectral Method. Journal of Modern Physics, 9, 1713-1723. https://doi.org/10.4236/jmp.2018.99107

References

- 1. Martinez, J.d.J. and Esperanca, P.d.T.T. (2007) Journal of the Brazilian Society of Mechanical Sciences and Engineering, XXIX, 317-328. https://doi.org/10.1590/S1678-58782007000300013
- 2. Peyret, R. (2002) Applied Mathematical Sciences. Vol. 148, Springer-Verlag, New York, 448 p.
- 3. Trefethen, L.N. (2000) Spectral Methods in Matlab. SIAM University City Center, Philadelphia, 165 p.
- 4. Chen, H.-C. (2015) Neural, Parallel, and Scientific Computations, 23, 267-275.
- 5. Ehrenstein, U. and Peyret, P. (1989) International Journal for Numerical Methods in Fluids, 9, 427-452. https://doi.org/10.1002/fld.1650090405
- 6. Golub, G.H. and Van Loan, C.F. (1983) Matrix Computations. The Johns Hopkins University Press, Maryland, 476 p.
- 7. Haidvogel, D.B. and Zang, T. (1979) Journal of Computational Physics, 30, 167-180. https://doi.org/10.1016/0021-9991(79)90097-4
- 8. Julien, K. and Watson, M. (2009) Journal of Computational Physics, 228, 1480-1503. https://doi.org/10.1016/j.jcp.2008.10.043
- 9. Lynch, R.E., Rice, J.R. and Thomas, D.H. (1964) Numerische Mathematik, 6, 185-199. https://doi.org/10.1007/BF01386067
- 10. Chen, H.-C. and Sameh, A. (1989) SIAM Journal on Matrix Analysis and Applications, 10, 39-64. https://doi.org/10.1137/0610004
- 11. Andrew, A.L. (1973) Technometrics, 15, 405-407. https://doi.org/10.1080/00401706.1973.10489052
- 12. Cantoni, A. and Butler, P. (1976) Linear Algebra and Its Applications, 13, 275-288. https://doi.org/10.1016/0024-3795(76)90101-4