Search Results (9)
  • Open Access

    ARTICLE

    A Hybrid Parallel Strategy for Isogeometric Topology Optimization via CPU/GPU Heterogeneous Computing

    Zhaohui Xia1,3, Baichuan Gao3, Chen Yu2,*, Haotian Han3, Haobo Zhang3, Shuting Wang3

    CMES-Computer Modeling in Engineering & Sciences, Vol.138, No.2, pp. 1103-1137, 2024, DOI:10.32604/cmes.2023.029177

    Abstract This paper aims to solve large-scale and complex isogeometric topology optimization problems that consume significant computational resources. A novel isogeometric topology optimization method with a hybrid parallel strategy of CPU/GPU is proposed, while the hybrid parallel strategies for stiffness matrix assembly, equation solving, sensitivity analysis, and design variable update are discussed in detail. To ensure the high efficiency of CPU/GPU computing, a workload balancing strategy is presented for optimally distributing the workload between CPU and GPU. To illustrate the advantages of the proposed method, three benchmark examples are tested to verify the hybrid parallel strategy in this paper. The results…

  • Open Access

    ARTICLE

    Accelerating Falcon Post-Quantum Digital Signature Algorithm on Graphic Processing Units

    Seog Chung Seo1, Sang Woo An2, Dooho Choi3,*

    CMC-Computers, Materials & Continua, Vol.75, No.1, pp. 1963-1980, 2023, DOI:10.32604/cmc.2023.033910

    Abstract Since 2016, the National Institute of Standards and Technology (NIST) has been running a competition to standardize post-quantum cryptography (PQC). Although Falcon has been selected in the competition as one of the standard PQC algorithms because of its advantages in short key and signature sizes, its performance overhead is larger than that of other lattice-based cryptosystems. This study presents multiple methodologies to accelerate the performance of Falcon using graphics processing units (GPUs) for server-side use. Direct GPU porting significantly degrades performance because the Falcon reference code requires recursive functions in its sampling process. Thus, an iterative sampling approach for efficient…

  • Open Access

    ARTICLE

    Implementing Delay Multiply and Sum Beamformer on a Hybrid CPU-GPU Platform for Medical Ultrasound Imaging Using OpenMP and CUDA

    Ke Song1,*, Paul Liu2, Dongquan Liu3

    CMES-Computer Modeling in Engineering & Sciences, Vol.128, No.3, pp. 1133-1150, 2021, DOI:10.32604/cmes.2021.016008

    Abstract A novel beamforming algorithm named Delay Multiply and Sum (DMAS), which excels at enhancing the resolution and contrast of ultrasonic images, has recently been proposed. However, this algorithm contains nested loops, so its computational complexity is higher than that of the Delay and Sum (DAS) beamformer, which is widely used in industry. Thus, we propose a simple vector-based method to lower its complexity. The key point is to transform the nested loops into several vector operations, which can be efficiently implemented on many parallel platforms, such as Graphics Processing Units (GPUs) and multi-core Central Processing Units (CPUs). Consequently, we…

  • Open Access

    ARTICLE

    Efficient Concurrent L1-Minimization Solvers on GPUs

    Xinyue Chu1, Jiaquan Gao1,*, Bo Sheng2

    Computer Systems Science and Engineering, Vol.38, No.3, pp. 305-320, 2021, DOI:10.32604/csse.2021.017144

    Abstract Given that the concurrent L1-minimization (L1-min) problem is often required in some real applications, we investigate how to solve it in parallel on GPUs in this paper. First, we propose a novel self-adaptive warp implementation of the matrix-vector multiplication (Ax) and a novel self-adaptive thread implementation of the matrix-vector multiplication (Aᵀx) on the GPU. The vector-operation and inner-product decision trees are adopted to choose the optimal vector-operation and inner-product kernels for vectors of any size. Second, based on the above proposed kernels, the iterative shrinkage-thresholding algorithm is utilized to present two concurrent L1-min solvers from the perspective of the…

  • Open Access

    ABSTRACT

    CUDA Techniques in Computational Mechanics

    Peng Wang

    The International Conference on Computational & Experimental Engineering and Sciences, Vol.20, No.4, pp. 117-118, 2011, DOI:10.3970/icces.2011.020.117

    Abstract Current trends in high performance computing (HPC) are moving towards the availability of several cores on the same chip of contemporary processors in order to achieve speed-up through the extraction of potential fine-grain parallelism of applications. The trend is led by GPUs, which have been developed exclusively for computational tasks as massively-parallel co-processors to the CPU. During 2010, an extensive set of new HPC architectural features was developed in the third generation of NVIDIA GPUs (Fermi), giving computational mechanics an opportunity to expand use of GPU modelling and simulation.

    This presentation will examine examples relevant to industry-scale HPC practice…

  • Open Access

    ARTICLE

    Local strong form meshless method on multiple Graphics Processing Units

    G. Kosec1,2, P. Zinterhof3

    CMES-Computer Modeling in Engineering & Sciences, Vol.91, No.5, pp. 377-396, 2013, DOI:10.3970/cmes.2013.091.377

    Abstract This paper deals with the implementation of the local meshless numerical method (LMM) on general purpose graphics processing units (GPUs) for solving partial differential equations (PDEs). The local meshless solution procedure is formulated in a way suitable for parallel execution and has been implemented on multiple GPUs. The implementation is tested on a solution of the diffusion equation in a 2D domain. Different setups of the meshless approach regarding the selection of basis functions are tested on problem sizes of up to 2.5 million computational points. It is shown that monomials are a good selection of the basis when working with…

  • Open Access

    ARTICLE

    Particle-based Fluid Flow Simulations on GPGPU Using CUDA

    Kazuhiko Kakuda1, Tsuyoki Nagashima1, Yuki Hayashi1, Shunsuke Obara1, Jun Toyotani1, Nobuya Katsurada2, Shunji Higuchi2, Shohei Matsuda2

    CMES-Computer Modeling in Engineering & Sciences, Vol.88, No.1, pp. 17-28, 2012, DOI:10.3970/cmes.2012.088.017

    Abstract An acceleration of particle-based incompressible fluid flow simulations on a GPU using CUDA is presented. The particle method is based on the MPS (Moving Particle Semi-implicit) scheme, using a logarithmic-type weighting function to stabilize the spurious oscillatory solutions for the pressure fields, which are governed by the Poisson equation. The standard MPS scheme is widely utilized as a particle strategy for free-surface flows, moving-boundary problems, multi-physics/multi-scale problems, and so forth. Numerical results demonstrate the workability and the validity of the present approach through a dam-breaking flow problem.

  • Open Access

    ARTICLE

    Optimizations for Elastodynamic Simulation Analysis with FMM-DRBEM and CUDA

    Yixiong Wei1, Qifu Wang1,2, Yingjun Wang1, Yunbao Huang1

    CMES-Computer Modeling in Engineering & Sciences, Vol.86, No.3, pp. 241-274, 2012, DOI:10.3970/cmes.2012.086.241

    Abstract In this study, we propose a novel method to accelerate the process of elastodynamic analysis in 3D problems with BEM (boundary element method). The DRBEM (dual reciprocity boundary element method) is applied to form new integral equations that reduce complexity; the modified FMM (fast multipole method) is introduced to simplify the computation process and save storage space by avoiding intermediate coefficient matrices. At the same time, FMM-DRBEM is reprogrammed in parallel on the GPU with CUDA (Compute Unified Device Architecture) to improve efficiency further. The main features in this paper are: (1) with respect to defects of the classical method for elastodynamics, a modified FMM-DRBEM algorithm is…

  • Open Access

    ARTICLE

    Fast and High-Resolution Optical Inspection System for In-Line Detection and Labeling of Surface Defects

    M. Chang1,2,3, Y. C. Chou1,2, P. T. Lin1,2, J. L. Gabayno2,4

    CMC-Computers, Materials & Continua, Vol.42, No.2, pp. 125-140, 2014, DOI:10.3970/cmc.2014.042.125

    Abstract Automated optical inspection systems installed in production lines help ensure high throughput by speeding up inspection of defects that are otherwise difficult to detect using the naked eye. However, depending on the size and surface properties of the defects, such as micro-cracks on the glass covers of touchscreen panels, the detection speed and accuracy are limited by the imaging module and lighting technique. Therefore, the current inspection methods are still delegated to a few qualified personnel whose limited capacity has been a huge tradeoff for high volume production. In this study, an automated optical technology for in-line surface defect inspection is developed…
