CMC CMC CMC Computers, Materials & Continua 1546-2226 1546-2218 Tech Science Press USA 19397 10.32604/cmc.2022.019397 Article Efficient Gauss-Seidel Precoding with Parallel Calculation in Massive MIMO Systems Efficient Gauss-Seidel Precoding with Parallel Calculation in Massive MIMO Systems Efficient Gauss-Seidel Precoding with Parallel Calculation in Massive MIMO Systems HwangHyun-Sun1 RoJae-Hyun2 ParkChan-Yeob1 YouYoung-Hwan3 SongHyoung-Kyu1songhk@sejong.ac.kr Department of Information and Communication Engineering, Convergence Engineering for Intelligent Drone, Sejong University, Seoul, 05006, Korea Department of Convergence Engineering for Intelligent Drone, Sejong University, Seoul, 05006, Korea Department of Computer Engineering, Convergence Engineering for Intelligent Drone, Sejong University, Seoul, 05006, Korea Corresponding Author: Hyoung-Kyu Song. Email: songhk@sejong.ac.kr 30082021 70 1 491 504 1242021 1852021 © 2022 Hwang et al. 2022 Hwang et al. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

A number of requirements for 5G mobile communication are satisfied by adopting multiple input multiple output (MIMO) systems. The inter user interference (IUI) which is an inevitable problem in MIMO systems becomes controllable when the precoding scheme is used. In this paper, the horizontal Gauss-Seidel (HGS) method is proposed as precoding scheme in massive MIMO systems. In massive MIMO systems, the exact inversion of channel matrix is impractical due to the severe computational complexity. Therefore, the conventional Gauss-Seidel (GS) method is used to approximate the inversion of channel matrix. The GS has good performance by using previous calculation results as feedback. However, the required time for obtaining the precoding symbols is too long due to the sequential process of GS. Therefore, the HGS with parallel calculation is proposed in this paper to reduce the required time. The rows of channel matrix are eliminated for parallel calculation in HGS method. In addition, HGS uses the ordered channel matrix to prevent performance degradation which is occurred by parallel calculation. The HGS with proper number of parallelly computed symbols has better performance and reduced required time compared to the traditional GS.

Massive MIMO GS matrix inversion linear precoding
Introduction

Multiple input multiple output (MIMO) systems are key components of future wireless communication in terms of high data rate over a limited frequency resource [1,2]. Massive MIMO systems are specific cases of MIMO systems. The number of base station (BS) antennas is much larger than the number of antennas for terminals in massive MIMO systems. In massive MIMO systems, the inter user interference (IUI) which is an inevitable problem in MIMO systems becomes controllable when the precoding scheme is used. The precoding schemes in massive MIMO systems have been studied as promising techniques which achieve spatial multiplexing gain to increase throughput and spatial diversity to improve reliability at 5G . The massive MIMO systems promise significant improvements in terms of spectral efficiency, reliability, and data rate compared to multi-user (MU) MIMO systems by using large channel matrix . The significant improvement of performances for the massive MIMO is shown mathematically in . If the channel matrix is extremely large, the effect of fast fading and non-correlated noise becomes extinct . However, computational complexity at BS is extremely increased because of large array of antennas at BS. The inversion of the channel matrix is critical task for massive MIMO signal processing . It means that the precoding schemes which use exact inversion of channel matrix are impractical in massive MIMO systems. Therefore, the schemes which use approximated matrix inversion have been studied for achieving low computational complexity in massive MIMO systems.

The match filter (MF) precoding is the simplest precoding scheme. However, the performance of the MF precoding is much poorer than the ZF scheme . The ZF scheme at MU-MIMO systems has sub-optimal performance and the lowest computational complexity. In contrast, the ZF in massive MIMO systems has optimal performance because of the large channel matrix . However, in massive MIMO systems, since the inversion of large gram matrix has to be calculated, ZF scheme has impractically large computational complexity which is cubic order with respect to the number of users.

Therefore, in , the approximate inversion of gram matrix is calculated by using Neumann Series (NS). The computational complexity of approximate inversion using NS is lower than exact inversion and the calculation of NS is parallelized. However, when the iteration of NS is more than 2, NS has also computational complexity which is cubic order.

In contrast, precoding scheme based on Gauss-Seidel (GS) which was proposed in  keeps the computational complexity which is square order. In addition, since the result of the previous precoding symbol is used as the feedback of the following calculation of next precoding symbol, the performance of the scheme based on GS is better than the scheme based on NS. It means that the calculation at GS is not parallel processing and requires longer time for obtaining precoding symbol.

Therefore, in this paper, horizontal GS (HGS) which reduces required time for obtaining precoding symbol is proposed. The HGS calculates some precoding symbols without feedback of the previous precoding symbol for parallel processing. The conventional schemes use calculation result of previous symbol for calculating present symbol. It means that conventional schemes have to wait for result of the (n1)th symbol to calculate the nth symbol . However, parallel calculation at HGS does not need to wait for result of previous symbol. Parallel calculation technique calculates some symbols simultaneously. The calculation without feedback at GS results in the performance degradation. In HGS, the channel matrix is sorted to overcome performance degradation. The calculation without feedback at GS results in the performance degradation. In HGS, the channel matrix is sorted to overcome performance degradation. Thus, the required time for obtaining precoding symbol is reduced at HGS by calculating precoding symbol without feedback. In addition, the performance of HGS is better than GS by sorting the channel matrix.

In Section 2, system model which is used in this paper is explained. The conventional schemes which are compared with the proposed scheme are explained in Section 3. In Section 4, HGS scheme is proposed. Section 5 shows the performance comparison between the conventional schemes and the proposed scheme.

System Model

In Fig. 1, massive MIMO broadcasting system which is composed of one base station with NT transmit antennas and K users is considered. Rayleigh flat fading channel is assumed and the kth user has one receive antenna. The number of transmit antennas is much larger than the total number of receive antennas NR(NTNR).

System model of the downlink massive MIMO channel

The transmit signal vector for the kth user is xk. The received signal yk for the kth device can be expressed as follows, yk=HkPkxkintended signal+j=1,jkKHkPjxjIUI+nk, where HkCNR×NT, PkCNT×1, j and nkCNk×1 are the channel matrix of the kth user, precoding matrix for the kth user, index of other users except for the kth user and the additive white Gaussian noise vector of the kth user which has zero mean and variance σn2, respectively.

The set of entire received signal vectors yCNR×1 can be expressed as follows, y=[y1TyKT]T=[H1HK][P1PK][x1xK]+[n1nK]=H~[x1xK]+[n1nK], where H~CNR×K is the effective channel. The effective channel H~ is obtained as follows, H~=[H1P1H1P2H1PKH2P1H2P2H2PKHKP1HKP2HKPK].

Diagonal components in the effective channel matrix are MIMO channels for each user and off-diagonal components express the IUI.

Conventional Schemes Zero Forcing Precoding

The ZF precoding scheme uses exact inversion of the gram matrix to eliminate IUI and inter antenna interference (IAI). The precoding matrix of ZF can be expressed as follows, PZF=βHH(HHH)1=βHHZ1, where β=Ktr(Z1) and Z=HHH are the normalization factor and the gram matrix, respectively.

The precoded signal vector can be expressed as follows, s=βHHZ1x=βHHx^, where x^=Z1x. In massive MIMO systems, since the channel matrix has large array, the ZF can achieve optimal performance. However, the exact inversion of gram matrix is hard to be calculated because of high computational complexity of {\cal O}(K3). Therefore, the method to calculate approximate inversion of the gram matrix is needed.

Gauss-Seidel Precoding

For downlink massive MIMO systems, the columns of channel matrix H are asymptotically orthogonal . It means that the gram matrix Z of the channel is the Hermitian positive definite which is the condition to exploit GS method. Therefore, the approximate inversion of gram matrix can be obtained by GS method.

The GS method is an iterative technique for solving linear equation Ax=b when A, x and b are the K×K Hermitian positive definite matrix, the K×1 unknown vector and the K×1 measurement vector, respectively. The A can be decomposed as follows, A=D+L+LH, where D, L and LH are the diagonal component, the strictly lower triangular component and the strictly upper triangular component of A, respectively. The solution of Ax=b which is calculated iteratively by using GS method can be expressed as follows, x(i+1)=(D+L)1(bLHx(i)), where x(i+1) and x(i) are the (i+1)th and the (i)th approximations of x.

The GS method can be applied to linear equation x^=Z1x which can be rewritten as Zx^=x. The solution of linear equation Zx^=x is expressed as follows, x^(i+1)=(D+L)1(xLHx^(i)), where x^(0) is the initial solution which is zero vector in this paper. Each component of x^(i=1) in Eq. (8) is sequentially calculated as follows, x^m(i+1)=1zmm(xmn=1m1zmmxn(i+1)n=m+1Kzmmxn(i)). It means that the components of x^(i+1) from the first to the (m1)th index are used as feedback for computing x^m(i+1). Therefore, the performance for the GS method is enhanced due to the use of previous results. However, since the calculation of the mth component needs results of previous components, the mth component has to wait for calculation results of the components from the first to the (m1)th index. It means that the required time to calculate precoding symbol is long. Therefore, GS method which can be computed in parallel is proposed in this paper to reduce the required time.

Proposed Horizontal Gauss-Seidel Precoding

The ZF precoding scheme has optimal performance with large scale MIMO systems. However, when the ZF is used, large computational complexity of exact matrix inversion is inevitable. Therefore, the methods which compute approximate inversion of channel matrix with low complexity have been studied such as GS method. In GS method, the results of the precoding symbol from the first to the (m1)th index are used to calculate the mth precoding symbol as feedback. Since the previous results are used at the current calculation, the performance of GS method is improved. It means that the precoding symbols are computed sequentially. The required time to obtain precoding symbols becomes long due to sequential process of GS method. Therefore, HGS method with parallel calculation is proposed to reduce the required time.

Fig. 2 is the flow chart of the HGS in massive MIMO systems. The parallel calculation at GS method means the calculation of current precoding symbol without feedback. The precoding symbols which are computed without feedback have to be chosen for parallel calculation. Therefore, the criterion for choosing the precoding symbols without feedback has to be defined. For example, when x^1(i+1) at the (i+1)th iteration is calculated at conventional GS method, the results at the (i)th iteration are only needed. However, when x^2(i+1) at the (i+1)th iteration is computed, a result x^1(i+1) at the (i+1)th iteration is used. In other words, when the mth component of x^(i+1) is current calculated symbol, (m1) results at the (i+1)th iteration are used as feedback. The number of used feedbacks is increased as an index of x^(i+1) grows. In other words, with HGS, if the mth precoding symbol is chosen for calculating without feedback, the (m1) feedbacks are not used. The number of feedbacks which are not used in HGS varies depending on the index of the selected symbol. Therefore, the precoding symbol with the smallest index has to be chosen as calculated symbol without feedback. The symbol that requires the least feedbacks at calculation is chosen to reduce performance degradation by not using feedbacks. In HGS, the rows of L are eliminated horizontally from the second to the (s+1)th index not to use results of calculation at the current iteration when the s is the number of selected symbols.

For example, Fig. 3 shows lower triangular matrix L when the number of calculated precoding symbol without feedback is 0 (s=0). When the number of computed precoding symbol without feedback is 1 or 2 (s=1 or s=2), the row of L is eliminated as shown in Fig. 3.

The components which are excluded from L are added to upper triangular matrix LH as shown in Fig. 4. Therefore, Eq. (7) for selected symbol can be rewritten as follows, x^m(i+1)=1zmm(xmn=1Kzmmxn(i)),m=1,,s+1.

Flow chart of HGS in massive MIMO systems

The required time for calculating a precoding symbol is shown in Tab. 1. If the required time for calculating a precoding symbol is t, the Kt time is needed at conventional GS method. In HGS, when the number of symbols which are parallelly calculated is s, the total required time is (Ks)t. However, HGS method cannot avoid the performance degradation because of the parallel calculation. Since the previous results are not used for computing current precoding symbol, the performance of the HGS method becomes poor.

Lower triangle matrix <inline-formula id="ieqn-103"><mml:math id="mml-ieqn-103"><mml:mrow><mml:mrow><mml:mi mathvariant="bold">L</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula> with various <inline-formula id="ieqn-104"><mml:math id="mml-ieqn-104"><mml:mi>s</mml:mi></mml:math></inline-formula> Upper triangle matrix <inline-formula id="ieqn-105"><mml:math id="mml-ieqn-105"><mml:mrow><mml:mrow><mml:mi mathvariant="bold">L</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula> with various <inline-formula id="ieqn-106"><mml:math id="mml-ieqn-106"><mml:mi>s</mml:mi></mml:math></inline-formula> The required time for calculating a precoding symbol
Scheme Required time for all symbols
ZF t
GS Kt
HGS (Ks)t

The performance degradation of the HGS can be prevented by sorting the gram matrix of the channel. The criterion of sorting the gram matrix is the off-diagonal components of Z. The power of the off-diagonal components for the first ordering is expressed as follows, Pm1=n=1nmK|zmn|. The off-diagonal components of the gram matrix mean the correlation with other channels. The large correlation with other channels means that the interference from other channels is large. Therefore, the symbol which has the channel corresponding to large off-diagonal components needs many feedbacks. It means that the row of gram matrix with maximum off-diagonal components should replace the bottom of the matrix. If the gth row of Z has maximum off-diagonal components, it becomes the Kth row of ordered Z. Since the interference between the gth channel and other channels is already considered at the first ordering, the power of the off-diagonal components is calculated except for the interference from the gth row to find the row with the second largest off-diagonal component. Therefore, the power of the off-diagonal components for second ordering is expressed as follows, Pm2=Pm1|zmg|. The row with maximum Pm2 becomes the (K1)th row of ordered Z. The power of the off-diagonal components for the third ordering is also calculated except for the interference from the row with maximum Pm2. Therefore, the power of the off-diagonal components for the second ordering is expressed as follows, Pmi=Pm(i1)|zmg|i=2,,K, where the g is the index of the row with maximum Pm(i1). The row with minimum off-diagonal components are positioned on top of the matrix. The gram matrix at HGS is sorted into the off-diagonal components of the gram matrix. The performance of HGS is better than the GS by sorting the gram matrix. In addition, the performance of HGS is still better than the GS when the s is smaller than K2. It means that the HGS has better performance and shorter required time than traditional GS. The HGS needs extra operation for ordering the gram matrix of the channel compared to the traditional GS. However, while the channel is unchanged, the benefit from reducing the required time with an arrangement is considerable.

Since the approximation of HGS rapidly gets close to the exact inversion of the gram matrix, the performance of HGS is better than the conventional GS. However, the GS has better convergence rate by using only sequential calculation. The HGS overcomes poor convergence rate by sorting the gram matrix. The approximation error between approximate solution and exact solution can be expressed as follows, x^(i+1)x^=(D+L)1LH(x^(i)x^)==((D+L)1LH)i+1(x^(0)x^), where the x^=Z1x is the exact solution with inversion of the gram matrix. Eq. (14) can be rewritten as follows, x^(i+1)x^=Mi+1(x^(0)x^), where M=(D+L)1LH is the iteration matrix. Since the small approximation error means fast convergence, the convergence rate accelerates when Frobenius norm of M is small .

Simulation Results

The error and throughput performances for the HGS are evaluated and compared with the conventional GS scheme. The Rayleigh flat fading channel is used and the perfect channel estimation is assumed. All elements of the channel matrices have independent complex Gaussian random variables with zero mean and unit variance. The system which has much larger number of transmit antennas compared to the number of receive antennas is considered. The number of transmit antennas is 80 or 100. The number of antennas for each user is 1 and total number of receive antennas is 10. The used modulation is 16-quadrature amplitude modulation (QAM). The number of GS iterations is 2 or 3. The number of precoding symbols which are computed parallelly at HGS is 0, 2 or 5. The bit error rate (BER) performances for the HGS are evaluated with various number of parallelly calculated symbols. Tab. 2 shows the simulation parameters.

Simulation parameters for HGS
 Number of transmit antennas NT=80,100 Number of receive antennas NR=10 Number of users K=10 Modulation 16QAM Symbol size 128 Number of GS iterations 2, 3 Channel Rayleigh flat fading

The BER and throughput performances for the HGS are shown in Figs. 58. The enhancement of the BER performance at HGS is obtained by ordering the gram matrix of the channel. The performances with 2 GS iterations are shown in Fig. 5. Since the performance enhancement is occurred by ordering gram matrix but the performance degradation of parallel calculation is not occurred, the HGS without parallel calculation (HGS-0) has the best performance. Therefore, the throughput of HGS-0 is larger than other schemes in Fig. 7a. In Fig. 7b, the gap of throughput between the schemes except HGS-5 is small. However, the HGS without parallel calculation has the same required time for obtaining precoding symbol as conventional GS. The HGS with 2 symbols which are parallelly computed (HGS-2) has better performance compared to the conventional GS. In Fig. 7a, HGS-2 achieves the maximum throughput rapidly compared to the conventional GS. In addition, the required time for calculating precoding symbol is reduced compared to the conventional GS due to the parallel calculation. However, since the number of feedback at HGS-2 is smaller than the feedback at HGS-0, the BER performance of HGS-2 is poorer than HGS-0. The throughput of HGS-2 is slightly smaller than HGS-0 in Fig. 7a. However, the throughput performances of HGS-0 and HGS-2 are almost same in Fig. 7b. When the number of symbols which are computed parallelly is larger than K2, the performance degradation of parallel calculation is larger than the performance enhancement of ordering gram matrix. Therefore, the performance of HGS with 5 symbols which are computed parallelly is poorer than the conventional GS. In Fig. 5b, since the number of transmit antennas grows to 100, the channel becomes more diagonal dominant. Therefore, the performances of all schemes are improved due to the diagonal dominant channel. Even though all schemes except HGS-5 has almost same throughput performance due to the diagonal dominant channel in Fig. 7b, the HGS-0 has poorer throughput performance because of the large number of parallelly computed symbols.

BER performance comparison with 2 GS iterations between the GS and the proposed HGS. (a) 2 GS iterations and <inline-formula id="ieqn-107"><mml:math id="mml-ieqn-107"><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>T</mml:mi></mml:msub></mml:mrow><mml:mo>×</mml:mo><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>80</mml:mn><mml:mo>×</mml:mo><mml:mn>10</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, (b) 2 GS iterations and <inline-formula id="ieqn-108"><mml:math id="mml-ieqn-108"><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>T</mml:mi></mml:msub></mml:mrow><mml:mo>×</mml:mo><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>80</mml:mn><mml:mo>×</mml:mo><mml:mn>10</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> BER performance comparison with 3 GS iterations between the GS and the proposed HGS. (a) 3 GS iterations and <inline-formula id="ieqn-109"><mml:math id="mml-ieqn-109"><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>T</mml:mi></mml:msub></mml:mrow><mml:mo>×</mml:mo><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>80</mml:mn><mml:mo>×</mml:mo><mml:mn>10</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, (b) 3 GS iterations and <inline-formula id="ieqn-110"><mml:math id="mml-ieqn-110"><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>T</mml:mi></mml:msub></mml:mrow><mml:mo>×</mml:mo><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>80</mml:mn><mml:mo>×</mml:mo><mml:mn>10</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> Throughput performance comparison with 2 GS iterations between the GS and the proposed HGS. (a) 2 GS iterations and <inline-formula id="ieqn-111"><mml:math id="mml-ieqn-111"><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>T</mml:mi></mml:msub></mml:mrow><mml:mo>×</mml:mo><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>80</mml:mn><mml:mo>×</mml:mo><mml:mn>10</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, (b) 2 GS iterations and <inline-formula id="ieqn-112"><mml:math id="mml-ieqn-112"><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>T</mml:mi></mml:msub></mml:mrow><mml:mo>×</mml:mo><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>80</mml:mn><mml:mo>×</mml:mo><mml:mn>10</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> Throughput performance comparison with 3 GS iterations between the GS and the proposed HGS. (a) 3 GS iterations and <inline-formula id="ieqn-113"><mml:math id="mml-ieqn-113"><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>T</mml:mi></mml:msub></mml:mrow><mml:mo>×</mml:mo><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>80</mml:mn><mml:mo>×</mml:mo><mml:mn>10</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, (b) 3 GS iterations and <inline-formula id="ieqn-114"><mml:math id="mml-ieqn-114"><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>T</mml:mi></mml:msub></mml:mrow><mml:mo>×</mml:mo><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>80</mml:mn><mml:mo>×</mml:mo><mml:mn>10</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>

The BER and throughput performances with 3 GS iterations are shown in Figs. 6 and 8. Since the 1 GS iteration is added, the approximate solution becomes close to the exact solution. Therefore, the performances of all schemes with 3 GS iterations are better than the performances of all schemes with 2 GS iteration. In addition, the GS, HGS-0 and HGS-2 have almost optimal BER performance. However, the HGS-5 has poorer performance compared to other schemes due to parallel calculation of many symbols in Fig. 6a. The HGS-5 has almost similar performance with other schemes due to diagonal dominant channel in Fig. 6b. The all schemes have almost optimal throughput performance due to added GS iteration in Fig. 8.

The comparison of Frobenius norm is shown in Fig. 9. Since precoding symbols of the GS and HGS-0 are calculated with whole lower and upper triangular matrix for sequential calculation, the GS and HGS-0 have the smallest Frobenius norm. In addition, since the HGS-0 has performance enhancement by sorting the gram matrix, the HGS-0 has better BER performance compared to conventional GS. Since the lower triangular matrix without 2 rows is used at calculation, the HGS-2 has slightly large Frobenius norm compared to GS and HGS-0. However, the HGS-2 overcomes the reduced Frobenius norm by sorting the gram matrix. Therefore, the performance of HGS-2 is better than the GS although the Frobenius norm of the HGS-2 is larger than GS. Since the HGS-5 has too many symbols which are calculated parallelly, the difference of Frobenius norm between GS and HGS-5 becomes large. Therefore, the HGS-5 can not overcome the reduced Frobenius norm even though the HGS-5 uses the ordered gram matrix. The BER performance of HGS-5 is poorer than the GS because of the reduced Frobenius norm.

Comparison of Frobenius norm with <inline-formula id="ieqn-115"><mml:math id="mml-ieqn-115"><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mn>10</mml:mn></mml:math></inline-formula>
Conclusion

Massive MIMO system is a key component of future wireless communication in terms of high data rate over a limited frequency resource. In MIMO broadcast channel, IUI occurs inevitably at each device. Therefore, the BS has to utilize precoding schemes for IUI reduction.

The HGS with parallel operation is proposed to reduce the required time for obtaining the precoding symbol in massive MIMO systems. The conventional GS method has performance enhancement due to the sequential calculation which uses previous results as feedback. It means that the required time for obtaining the precoding symbols is too long. When the required time for obtaining one precoding symbol is t, the total required time at GS is Kt because of sequential calculation. Therefore, in HGS, some symbols are parallelly computed without feedback to reduce the required time. When the number of symbols which are parallelly calculated is s, the total required time is reduced to (Ks)t. However, the parallel calculation without feedback in HGS gives bad influence to BER performance. Therefore, the gram matrix which is sorted by interference of other channel is used to overcome the performance degradation of parallel calculation in HGS. When the number of symbols which are parallelly calculated is under K2, the BER performance of HGS is better than the conventional GS. However, the performance degradation of parallel calculation has serious impact on the performance compared to the performance enhancement of ordered gram matrix when the number of parallelly calculated symbol is larger than K2 . Therefore, the performance of HGS with too many parallelly calculated symbols is poorer than the GS. The way to overcome performance degradation which is occurred when the number of parallelly calculated symbols is larger than K2 has to be studied. In this paper, the proposed HGS scheme is proposed for reducing the required time by using parallel calculation.

Funding Statement: This research was supported by Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Education (2020R1A6A1A03038540) and was supported by the National Research Foundation of Korea(NRF) Grant funded by the Korea government (MSIT) (2021R1A2C2005777).

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.

References P. Li, Y. Gao, Z. Li and D. Yang, “Pilot sequence assignment for spatially correlated massive MIMO circumstances,” KSII Transactions on Internet and Information Systems, vol. 13, no. 1, pp. 237253, 2019. P. I. Tebe, G. Wen, J. Li, Y. Huang, A. E. Ampoma et al., “Massive MIMO with transceiver hardware impairments: Performance analysis and phase noise error minimization,” KSII Transactions on Internet and Information Systems, vol. 13, no. 5, pp. 23572380, 2019. Q. Deng, X. Liang, X. Wang, M. Huang, C. Dong et al., “Fast converging iterative precoding for massive MIMO systems: An accelerated Weighted neumann series-steepest descent approach,” IEEE Access, vol. 8, pp. 5024450255, 2020. Y. Liu, J. Liu, Q. Wu, Y. Zhang and M. Jin, “A near-optimal iterative linear precoding with low complexity for massive MIMO systems,” IEEE Communications Letters, vol. 23, no. 6, pp. 11051108, 2019. J. Ø. Nielsen, A. Karstensen, P. C. F. Eggers, E. De Carvalho, G. Steinböck et al., “Precoding for TDD and FDD in measured massive MIMO channels,” IEEE Access, vol. 8, pp. 193644193654, 2020. J. Chen, “Low-cost and power-efficient massive MIMO precoding: Architecture and algorithm designs,” IEEE Transactions on Vehicular Technology, vol. 69, no. 7, pp. 74297442, 2020. M. A. M. Albreem, A. A. El-Saleh and M. Juntti, “Linear massive MIMO uplink detector based on joint jacobi and gauss-seidel methods,” in 2020 16th Int. Conf. on the Design of Reliable Communication Networks DRCN 2020, Milano, Italy, pp. 14, 2020. R. Chataut, R. Akl and M. Robaei, “Accelerated and preconditioned refinement of gauss-seidel method for uplink signal detection in 5G massive MIMO systems,” in 2020 10th Annual Computing and Communication Workshop and Conf. (CCWC), Las Vegas, NV, USA, pp. 8389, 2020. F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta et al., “Scaling up MIMO: Opportunities and challenges with very large arrays,” IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 4060, 2013. H. Huh, G. Caire, H. C. Papadopoulos and S. A. Ramprashad, “Achieving massive MIMO spectral efficiency with a not-so-large number of antennas,” IEEE Transactions on Wireless Communications, vol. 11, no. 9, pp. 32263239, 2012. H. Q. Ngo, E. G. Larsson and T. L. Marzetta, “Energy and spectral efficiency of very large multiuser MIMO systems,” IEEE Transactions on Communications, vol. 61, no. 4, pp. 14361449, 2013. T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Transactions on Wireless Communications, vol. 9, no. 11, pp. 35903600, 2010. I. A. Khoso, X. Dai, M. N. Irshad, A. Khan and X. Wang, “A low-complexity data detection algorithm for massive MIMO systems,” IEEE Access, vol. 7, pp. 3934139351, 2019. C. Tang, C. Liu, L. Yuan and Z. Xing, “High precision low complexity matrix inversion based on newton iteration for data detection in the massive MIMO,” IEEE Communications Letters, vol. 20, no. 3, pp. 490493, 2016. X. Gao, O. Edfors, F. Rusek and F. Tufvesson, “Linear pre-coding performance in measured very-large MIMO channels,” in 2011 IEEE Vehicular Technology Conf. (VTC Fall), San Francisco, CA, pp. 15, 2011. H. Prabhu, J. Rodrigues, O. Edfors and F. Rusek, “Approximative matrix inverse computations for very-large MIMO and applications to linear precoding systems,” in 2013 IEEE Wireless Communications and Networking Conf. (WCNC), Shanghai, pp. 27102715, 2013. L. Dai, X. Gao, X. Su, S. Han, I. Chih-Lin et al., “Low-complexity soft-output signal detection based on gauss–seidel method for uplink multiuser large-scale MIMO systems,” IEEE Transactions on Vehicular Technology, vol. 64, no. 10, pp. 48394845, 2015. W. S. Lee, J. Y. Jang, J. H. Ro, J. Kim and H. K. Song, “An efficient modified gauss-seidel precoder for downlink massive MIMO systems,” IEEE Access, vol. 8, pp. 202164202173, 2020. G. H. Golub, and C. F. Van Loan, “Matrix computations,” Google Scholar, vol. 3, pp. 111263112264, 2012.