Analysis of Schedule and Layout Tuning for Sparse Matrices With Compound Entries on GPUs
Computer Graphics Forum
Large sparse matrices with compound entries, i.e. complex and quaternionic matrices as well as matrices with dense blocks, are a core component of many algorithms in geometry processing, physically based animation and other areas of computer graphics. We generalize several matrix layouts and apply joint schedule and layout autotuning to improve the performance of the sparse matrix-vector product on massively parallel graphics processing units. Compared to schedule tuning without layout tuning, we achieve speedups of up to 5.5×. In comparison to cuSPARSE, we achieve speedups of up to 4.7×.
GPU Data Structures and Code Generation for Modeling, Simulation, and Visualization
Darmstadt, TU, Diss., 2019
Virtual prototyping, the iterative process of using computer-aided (CAx) modeling, simulation, and visualization tools to optimize prototypes and products before manufacturing the first physical artifact, plays an increasingly important role in the modern product development process. Especially due to the availability of affordable additive manufacturing (AM) methods (3D printing), it is becoming increasingly possible to manufacture customized products or even for customers to print items for themselves. In such cases, the first physical prototype is frequently the final product. In this dissertation, methods to efficiently parallelize modeling, simulation, and visualization operations are examined with the goal of reducing iteration times in the virtual prototyping cycle, while simultaneously improving the availability of the necessary CAx tools. The presented methods focus on parallelization on programmable graphics processing units (GPUs). Modern GPUs are fully programmable massively parallel manycore processors that are characterized by their high energy efficiency and good price-performance ratio. Additionally, GPUs are already present in many workstations and home computers due to their use in computer-aided design (CAD) and computer games. However, specialized algorithms and data structures are required to make efficient use of the processing power of GPUs. Using the novel GPU-optimized data structures and algorithms as well as the new applications of compiler technology introduced in this dissertation, speedups between approximately one (10×) and more than two orders of magnitude (> 100×) are achieved compared to the state of the art in the three core areas of virtual prototyping. Additionally, memory use and required bandwidths are reduced by up to nearly 86%. As a result, not only can computations on existing models be executed more efficiently but larger models can be created and processed as well.
In the area of modeling, efficient discrete mesh processing algorithms are examined with a focus on volumetric meshes. In the field of simulation, the assembly of the large sparse system matrices resulting from the finite element method (FEM) and the simulation of fluid dynamics are accelerated. As sparse matrices form the foundation of the presented approaches to mesh processing and simulation, GPU-optimized sparse matrix data structures and hardware- and domain-specific automatic tuning of these data structures are developed and examined as well. In the area of visualization, visualization latencies in remote visualization of cloud-based simulations are reduced by using an optimizing query compiler. By using hybrid visualization, various user interactions can be performed without network round trip latencies.
Integrating Server-based Simulations into Web-based Geo-applications
Eurographics 2019. Short Papers
Annual Conference of the European Association for Computer Graphics (Eurographics) <40, 2019, Genoa, Italy>
In this work, we present a novel approach for combining fluid simulations running on a GPU server with terrain rendered by a web-based 3D GIS system. We introduce a hybrid rendering approach, combining server-side and client-side rendering, to interactively display the results of a shallow water simulation on client devices using web technology. To display water and terrain in unison, we utilize image merging based on depth values. We extend it to deal with numerical and compression artifacts as well as level-of-detail rendering, and use depth-image-based rendering to counteract network latency.
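The depth-based image merging mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: images are modeled as flat lists of pixels, and the tolerance `eps` is a hypothetical parameter standing in for whatever handling of numerical and compression artifacts the paper actually uses.

```python
def merge_by_depth(server_rgb, server_depth, client_rgb, client_depth, eps=1e-3):
    """Per pixel, keep the color of the fragment that is closer to the camera.

    server_* holds the server-rendered layer (e.g. water), client_* the
    client-rendered layer (e.g. terrain). eps is an assumed tolerance: the
    server pixel only wins if it is closer by more than eps, which damps
    flicker caused by small depth errors.
    """
    merged = []
    for s_rgb, s_d, c_rgb, c_d in zip(server_rgb, server_depth,
                                      client_rgb, client_depth):
        merged.append(s_rgb if s_d + eps < c_d else c_rgb)
    return merged
```

With a water layer at depths `[0.2, 0.5, 0.4995]` and a terrain layer at `[0.5, 0.2, 0.5]`, only the first pixel shows water; the third pixel falls inside the tolerance band and stays terrain.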
Joint Schedule and Layout Autotuning for Sparse Matrices with Compound Entries on GPUs
Vision, Modeling, and Visualization
Vision, Modeling, and Visualization (VMV) <24, 2019, Rostock, Germany>
Large sparse matrices with compound entries, i.e., complex and quaternionic matrices as well as matrices with dense blocks, are a core component of many algorithms in geometry processing, physically based animation, and other areas of computer graphics. We generalize several matrix layouts and apply joint schedule and layout autotuning to improve the performance of the sparse matrix-vector product on massively parallel graphics processing units. Compared to schedule tuning without layout tuning, we achieve speedups of up to 5.5×. In comparison to cuSPARSE, we achieve speedups of up to 4.7×.
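As a point of reference for what a sparse matrix-vector product with compound entries looks like, the following sequential sketch uses a plain block-CSR layout with dense b-by-b blocks stored contiguously. This is only one possible layout and is assumed here for illustration; the generalized layouts and the autotuned GPU schedules of the paper are not reproduced.

```python
def bsr_spmv(row_ptr, col_idx, blocks, x, b):
    """Compute y = A @ x for a block-CSR matrix with dense b-by-b blocks.

    row_ptr and col_idx index block rows and block columns; blocks[k] is the
    k-th dense block, stored row-major as a flat list of b*b values.
    """
    n_block_rows = len(row_ptr) - 1
    y = [0.0] * (n_block_rows * b)
    for i in range(n_block_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            blk = blocks[k]
            for r in range(b):
                acc = 0.0
                for c in range(b):
                    acc += blk[r * b + c] * x[j * b + c]
                y[i * b + r] += acc
    return y
```

A layout tuner could, for example, choose between this contiguous-block storage and a layout that keeps each block component in its own array; which choice is faster depends on the hardware and the matrix, which is what motivates tuning layout and schedule together.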
Tetrahedral Mesh Processing and Data Structures for Adaptive Volumetric Mesh Booleans on GPUs
Darmstadt, TU, Master Thesis, 2019
The virtual prototyping process is time-consuming and laborious. In particular, the need to return to CAD after CAE analysis lengthens development times. To allow immediate editing of FEM meshes, this thesis devises an efficient concept for adaptive Boolean operations on tetrahedral FEM meshes. Because of the large aggregate processing power of GPUs, the proposed concept runs on the GPU. The thesis presents algorithms and data structures amenable to tetrahedral mesh processing on the GPU, including an efficient adaptive subdivision refinement algorithm. It also introduces a new spatial data structure designed for efficient traversal and construction on the GPU. Another contribution is a GPU-efficient mesh facet classification procedure, which in conjunction with a GPU-parallel mesh composition procedure enables rapid construction of result meshes. Additionally, the thesis presents a parallel tetrahedral mesh optimization procedure for GPUs. The proposed concept allows for basic adaptive Boolean operations on tetrahedral meshes accelerated by the GPU. With the contributed tetrahedral mesh processing functionality, development times in the virtual prototyping process are reduced by a factor of at least 50. This is a compelling result for tetrahedral mesh processing on the GPU.
Continuous Property Gradation for Multi-material 3D-printed Objects
Solid Freeform Fabrication 2018: Proceedings of the 29th Annual International Solid Freeform Fabrication Symposium - An Additive Manufacturing Conference
Annual International Solid Freeform Fabrication Symposium - An Additive Manufacturing Conference <29, 2018, Austin, TX, USA>
Modern AM processes allow for printing multiple materials. The resulting objects can be stiff/dense in some areas and soft/porous in others, resulting in distinct physical properties. However, modeling material gradients is still tedious with current approaches, especially when smooth transitions are required. Current approaches can be distinguished into a) NURBS-BReps-based and b) voxel-based. In case of NURBS-BReps, discrete material distributions can be modeled by manually introducing separate shells inside the object; smooth gradation can only be approximated in discrete steps. For voxel representations, gradation is discrete by design and comes along with an approximation error. In addition, interacting on a per-voxel basis is tedious for the designer/engineer. We present a novel approach for representing material gradients in volumetric models using subdivision schemes, supporting continuity and providing elegant ways for interactive modeling of locally varying properties. Additionally, the continuous volumetric representation allows for on-demand sampling at any resolution required by the 3D printer.
Deconstruction Project Planning of Existing Buildings Based on Automated Acquisition and Reconstruction of Building Information
Automation in Construction
During their lifecycles, buildings are changed and adapted to the requirements of generations of users, residents, and proprietors over several decades. At the end of their lifetime, buildings undergo either retrofit or deconstruction (and replacement) processes. Modifications and deviations from the original building structure, equipment, and fittings, as well as the deterioration and contamination of buildings, are often not well documented or only available in an outdated and unstructured way. Thus, in many existing buildings, incomplete, obsolete, or fragmented building information predominates and hampers retrofit and deconstruction project planning. To plan change or deconstruction measures in existing buildings, buildings are audited manually or with stationary laser scans, which requires great effort from skilled staff and expensive equipment. Furthermore, current building information models or deconstruction planning systems are often not able to deal with incomplete building information as it occurs in existing buildings. We develop a combined system named ResourceApp, consisting of a hardware sensor and software modules for building information acquisition, 3D reconstruction, object detection, building inventory generation, and optimized project planning. The mobile and wearable system enables planners, experts, or decision makers to inspect a building and at the same time record, analyze, reconstruct, and store the building digitally. For this purpose, a Kinect sensor acquires point clouds, and newly developed algorithms analyze them in real time to detect construction elements. From this information, a 3D building model and building inventory are automatically derived. Then, the generated building reconstruction information is used for optimized project planning with a solution algorithm for the multi-mode resource-constrained project scheduling problem (MRCPSP).
In contrast to existing approaches, the system allows mobile building recording during a building walkthrough, real-time reconstruction, and object detection. Based on the building conditions automatically captured and processed from sensor data, the system performs integrated project planning of the building deconstruction with the available resources and the required decontamination and deconstruction activities. Furthermore, it optimizes time and cost considering secondary raw material recovery, usage of renewable resources, staff qualification, onsite logistics, material storage, and recycling options. Results from field tests on acquisition, reconstruction, and deconstruction planning are presented and discussed in an extensive non-residential case study. The case study shows that the building inventory masses are quite well approximated and that project planning works well based on the chosen methods. Nevertheless, further testing and parameter adjustment for the automated data processing are needed and will further improve the system's quality, effectiveness, and accuracy. Future research and application areas are seen in the quantification and analysis of the effects of missing data, the integration of material classification and sampling sensors into the system, the connection of the system to Building Information Modelling (BIM) software via a respective interface, and the transfer and extension to retrofit project planning.
GPU-based Polynomial Finite Element Matrix Assembly for Simplex Meshes
Computer Graphics Forum
Pacific Conference on Computer Graphics and Applications (PG) <26, 2018, Hong Kong, China>
In this paper, we present a matrix assembly technique for arbitrary polynomial order finite element simulations on simplex meshes for graphics processing units (GPU). Compared to the current state of the art in GPU-based matrix assembly, we avoid the need for an intermediate sparse matrix and perform assembly directly into the final, GPU-optimized data structure. Thereby, we avoid the resulting 180% to 600% memory overhead, depending on polynomial order, and associated allocation time, while simplifying the assembly code and using a more compact mesh representation. We compare our method with existing algorithms and demonstrate significant speedups.
Copyright: This is the accepted version of the following article: Mueller‐Roemer, J. S., and A. Stork. "GPU-based Polynomial Finite Element Matrix Assembly for Simplex Meshes." Computer Graphics Forum 37, no. 7 (2018): 443-454, which has been published in final form at http://onlinelibrary.wiley.com. This article may be used for non-commercial purposes in accordance with the Wiley Self-Archiving Policy [http://olabout.wiley.com/WileyCDA/Section/id-820227.html].
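The idea of assembling directly into the final sparse structure, without an intermediate matrix, can be sketched as follows. This is a sequential CPU illustration under stated assumptions, not the paper's GPU algorithm: the sparsity pattern is derived from element connectivity, and each element matrix is scattered into a plain CSR value array. The function names and the `element_matrix` callback are hypothetical.

```python
from bisect import bisect_left


def build_csr_pattern(n_nodes, elements):
    """Derive the CSR sparsity pattern directly from element connectivity."""
    cols = [set() for _ in range(n_nodes)]
    for elem in elements:
        for i in elem:
            cols[i].update(elem)
    row_ptr, col_idx = [0], []
    for row in cols:
        col_idx.extend(sorted(row))
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def assemble_direct(n_nodes, elements, element_matrix):
    """Scatter element matrices straight into the final CSR value array."""
    row_ptr, col_idx = build_csr_pattern(n_nodes, elements)
    values = [0.0] * len(col_idx)
    for elem in elements:
        ke = element_matrix(elem)
        for a, i in enumerate(elem):
            start, end = row_ptr[i], row_ptr[i + 1]
            for b, j in enumerate(elem):
                # Binary search for column j inside the sorted row i.
                k = bisect_left(col_idx, j, start, end)
                values[k] += ke[a][b]
    return row_ptr, col_idx, values
```

For a 1D chain of three linear elements with element matrix `[[1, -1], [-1, 1]]`, this yields the familiar tridiagonal stiffness matrix. A parallel version would additionally need atomic additions or a conflict-free ordering of the scatter step.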
GPU-basierte Shallow Water Simulation
Darmstadt, TU, Bachelor Thesis, 2018
Human interventions in natural river courses and increasing urbanization raise the risk of flood damage, particularly in densely populated areas. Past events such as the Elbe floods of 2002, 2006, 2013, and summer 2016 have shown that the flood risk to people and property has increased in recent years. To support disaster planning as well as forecasting, numerical simulations are used to estimate the course of the water and the runoff behavior. To be usable for forecasting, these simulations must run faster than real time. Because the areas to be simulated often cover several square kilometers, approximations such as the shallow water equations, also known as the Saint-Venant equations, are typically used (see, e.g., [1]). They approximate the Navier-Stokes equations of general fluid dynamics for cases in which the pressure is approximately hydrostatic and the fluid volume can be described by a height field. Besides approximations, it is important to make optimal use of the available hardware, whether through specialized data structures (see, e.g., [2]) or through parallelization on massively parallel graphics processing units (GPUs), as demonstrated by Brodtkorb et al. [3] and Vacondio et al. [4]. The goal of this thesis is to select a set of methods with regard to their suitability for computation on GPUs, to implement this selection, and to compare the methods in terms of speed and accuracy. Only methods that can also represent wet-dry fronts are to be considered, as these are necessary in flood simulation.
References:
[1] F. Aureli, A. Maranzoni, P. Mignosa, and C. Ziveri, "A weighted surface-depth gradient method for the numerical integration of the 2D shallow water equations with topography," Advances in Water Resources, vol. 31, no. 7, pp. 962-974, 2008.
[2] L. Qiuhua and A. Borthwick, "Adaptive quadtree simulation of shallow flows with wet-dry fronts over," Computers & Fluids, vol. 38, pp. 221-234, 2009.
[3] A. Brodtkorb, M. Sætra, and M. Altinakar, "Efficient shallow water simulations on GPUs: Implementation, visualization, verification, and validation," Computers & Fluids, vol. 55, pp. 1-12, 2012.
[4] R. Vacondio, A. Dal Palù, and P. Mignosa, "GPU-enhanced Finite Volume Shallow Water solver for fast flood simulations," Environmental Modeling & Software, vol. 57, pp. 60-75, 2014.
Integrating Interactive Design and Simulation for Mass Customized 3D-Printed Objects - A Cup Holder Example
Solid Freeform Fabrication 2017: Proceedings of the 28th Annual International Solid Freeform Fabrication Symposium - An Additive Manufacturing Conference
Annual International Solid Freeform Fabrication Symposium - An Additive Manufacturing Conference <28, 2017, Austin, USA>
We present an approach for integrating interactive design and simulation for customizing parameterized 3D models. Instead of manipulating the mesh directly, a simplified interface for casual users allows for adapting intuitive parameters, such as handle diameter or height of our example object, a cup holder. The transition between modeling and simulation is performed with a volumetric subdivision representation, allowing direct adaptation of the simulation mesh without re-meshing. Our GPU-based FEM solver calculates deformation and stresses for the current parameter configuration within seconds with a pre-defined load case. If the physical constraints are met, our system allows the user to 3D print the object. Otherwise, it provides guidance on which parameters to change to optimize stability while adding as little material as possible, based on a finite-difference optimization approach. The speed of our GPU solver and the fluent transition between design and simulation render the system interactive, requiring no pre-computation.
Ternary Sparse Matrix Representation for Volumetric Mesh Subdivision and Processing on GPUs
Computer Graphics Forum
Eurographics Symposium on Geometry Processing (SGP) <15, 2017, London, UK>
In this paper, we present a novel volumetric mesh representation suited for parallel computing on modern GPU architectures. The data structure is based on a compact, ternary sparse matrix storage of boundary operators. Boundary operators correspond to the first-order top-down relations of k-faces to their (k-1)-face facets. The compact, ternary matrix storage format is based on compressed sparse row matrices with signed indices and allows for efficient parallel computation of indirect and bottom-up relations. This representation is then used in the implementation of several parallel volumetric mesh algorithms including Laplacian smoothing and volumetric Catmull-Clark subdivision. We compare these algorithms with their counterparts based on OpenVolumeMesh and achieve speedups from 3× to 531×, for sufficiently large meshes, while reducing memory consumption by up to 36%.
Copyright: This is the accepted version of the following article: Mueller‐Roemer, J. S., C. Altenhofen, and A. Stork. "Ternary Sparse Matrix Representation for Volumetric Mesh Subdivision and Processing on GPUs." Computer Graphics Forum 36, no. 5 (2017): 59-69, which has been published in final form at http://onlinelibrary.wiley.com. This article may be used for non-commercial purposes in accordance with the Wiley Self-Archiving Policy [http://olabout.wiley.com/WileyCDA/Section/id-820227.html].
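A ternary matrix (entries in {-1, 0, +1}) needs no value array if the sign is folded into the column index. The sketch below illustrates this with one possible encoding, which is an assumption rather than the paper's documented format: a +1 entry in column j is stored as `j`, and a -1 entry as the bitwise complement `~j`, so that column 0 can also carry a sign.

```python
def encode(col, sign):
    """Store a +1 entry as col and a -1 entry as ~col (always negative)."""
    return col if sign > 0 else ~col


def decode(entry):
    """Recover (column, sign) from a signed index."""
    return (entry, 1) if entry >= 0 else (~entry, -1)


def ternary_spmv(row_ptr, signed_cols, x):
    """Compute y = A @ x where every stored entry of A is +1 or -1."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            col, sign = decode(signed_cols[k])
            acc += sign * x[col]
        y.append(acc)
    return y


# Edge-to-vertex relations of one triangle: e0 = v1 - v0, e1 = v2 - v1, e2 = v0 - v2.
row_ptr = [0, 2, 4, 6]
signed_cols = [encode(0, -1), encode(1, +1),
               encode(1, -1), encode(2, +1),
               encode(0, +1), encode(2, -1)]
```

Each row lists the facets of one k-face, here the two end vertices of an edge with their orientation signs. Multiplying by a vector of per-vertex values gives the signed difference along each edge, and the three differences around the closed triangle sum to zero.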
Adaptives und hybrides SLAM für handgeführte RGBD-Kameras
Darmstadt, TU, Master Thesis, 2016
With the growing popularity of RGBD sensors, much research has been devoted to capturing and reconstructing three-dimensional environments with such sensors. Reconstruction requires solving the so-called simultaneous localization and mapping (SLAM) problem. Most RGBD SLAM systems use the point-based iterative closest point (ICP) algorithm for this purpose. Although ICP is a well-studied algorithm, it runs into problems with noisy data and especially in low-texture regions with few geometric features, such as large empty surfaces. One option for addressing this limitation is to additionally exploit planes in the scene, particularly since they are the most common shape in man-made indoor and outdoor environments. In 2013, Taguchi et al. [TJRF13] published the first global registration method in which point-to-point and plane-to-plane correspondences are combined into a real-time-capable SLAM system. Shortly afterwards, Ataer-Cansizoglu et al. [ACTRG13] published an approach that additionally exploits a motion prediction model to determine correspondences. A drawback of these methods is the long processing time of a registration step, which prevents them from performing interactive reconstructions. The goal of this thesis is the implementation of a SLAM algorithm for handheld RGBD cameras that uses both points and planes for registration. In contrast to existing methods, a local registration algorithm is implemented in this thesis. Plane features are used preferentially, as their number in scenes is significantly lower than that of points. This enables faster correspondence search and registration. The underlying RANSAC-based algorithm needs only a minimal number of correspondences to determine the sensor pose.
Thus, the algorithm is able to perform registration even in low-texture regions with few geometric features, where techniques that use only points fail. Furthermore, the local registration approach enables interactive use, giving the user real-time feedback on the registration process. Additionally implemented extensions, which exploit the detected plane information for geometry correction, support the registration process. Experiments demonstrate interactive reconstruction of indoor spaces with a handheld RGBD camera, a Kinect. Moreover, the system achieves a reconstruction rate six times higher than that of comparable hybrid systems. In a comparison on a benchmark dataset for RGBD sensors, superiority over point-based methods was also demonstrated in low-texture environments.
JIT-Compilation for Interactive Scientific Visualization
WSCG 2016. Short Papers Proceedings
International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG) <24, 2016, Plzen, Czech Republic>
Due to the proliferation of mobile devices and cloud computing, remote simulation and visualization have become increasingly important. In order to reduce bandwidth and (de)serialization costs, and to improve mobile battery life, we examine the performance and bandwidth benefits of using an optimizing query compiler for remote postprocessing of interactive and in-situ simulations. We conduct a detailed analysis of streaming performance for interactive simulations. By evaluating pre-compiled expressions and only sending one calculated field instead of the raw simulation results, we reduce the amount of data transmitted over the network by up to 2/3 for our test cases. A CPU and a GPU version of the query compiler are implemented and evaluated. The latter is used to additionally reduce PCIe bus bandwidth costs and provides an improvement of over 70% relative to the CPU implementation when using a GPU-based simulation back-end.
A Cut-Cell Geometric Multigrid Poisson Solver for Fluid Simulation
Computer Graphics Forum
Annual Conference of the European Association for Computer Graphics (Eurographics) <36, 2015, Zürich, Switzerland>
We present a novel multigrid scheme based on a cut-cell formulation on regular staggered grids which generates compatible systems of linear equations on all levels of the multigrid hierarchy. This geometrically motivated formulation is derived from a finite volume approach and exhibits an improved rate of convergence compared to previous methods. Existing fluid solvers with voxelized domains can directly benefit from this approach by only modifying the representation of the non-fluid domain. The necessary building blocks are fully parallelizable and can therefore benefit from multi- and many-core architectures.
Deformation Simulation using Cubic Finite Elements and Efficient p-multigrid Methods
Computers & Graphics
We present a novel p-multigrid method for efficient simulation of corotational elasticity with higher-order finite elements. In contrast to other multigrid methods proposed for volumetric deformation, the resolution hierarchy is realized by varying polynomial degrees on a tetrahedral mesh. The multigrid approach can be either used as a direct method or as a preconditioner for a conjugate gradient algorithm. We demonstrate the efficiency of our approach and compare it to commonly used direct sparse solvers and preconditioned conjugate gradient methods. As the polynomial representation is defined w.r.t. the same mesh, the update of the matrix hierarchy necessary for corotational elasticity can be computed efficiently. We introduce the use of cubic finite elements for volumetric deformation and investigate different combinations of polynomial degrees for the hierarchy. We analyze the applicability of cubic finite elements for deformation simulation by comparing analytical results in a static and dynamic scenario and demonstrate our algorithm in dynamic simulations with quadratic and cubic elements. Applying our method to quadratic and cubic finite elements results in a speed-up of up to a factor of 7 for solving the linear system.
A p-Multigrid Algorithm using Cubic Finite Elements for Efficient Deformation Simulation
VRIPHYS 14: 11th Workshop in Virtual Reality Interactions and Physical Simulations
International Workshop in Virtual Reality Interaction and Physical Simulations (VRIPHYS) <11, 2014, Bremen, Germany>
We present a novel p-multigrid method for efficient simulation of co-rotational elasticity with higher-order finite elements. In contrast to other multigrid methods proposed for volumetric deformation, the resolution hierarchy is realized by varying polynomial degrees on a tetrahedral mesh. We demonstrate the efficiency of our approach and compare it to commonly used direct sparse solvers and preconditioned conjugate gradient methods. As the polynomial representation is defined w.r.t. the same mesh, the update of the matrix hierarchy necessary for co-rotational elasticity can be computed efficiently. We introduce the use of cubic finite elements for volumetric deformation and investigate different combinations of polynomial degrees for the hierarchy. We analyze the applicability of cubic finite elements for deformation simulation by comparing analytical results in a static scenario and demonstrate our algorithm in dynamic simulations with quadratic and cubic elements. Applying our method to quadratic and cubic finite elements results in a speedup of up to a factor of 7 for solving the linear system.
Wind Tunnel Test and CFD/CAA Analysis on a Scaled Model of a Nose Landing Gear
Greener Aviation. Clean Sky Breakthroughs and Worldwide Status
Conference "Greener Air" <2014, Brussels, Belgium>
In work package 2.2.4 "NLG Low-Noise Enabling Technologies" of the Clean Sky GRA LNC project, the Fraunhofer Institute proposes hubcaps as the most promising solution for reducing noise from a nose landing gear (NLG). The purpose of this paper is to demonstrate the effect of the hubcaps experimentally and numerically. A simplified, 1:5-scale model of an NLG was first created by rapid prototyping, together with hubcaps that can cover both the outer and inner hub cavities. The noise radiated from various NLG configurations with and without hubcaps was measured while they were placed in the wind tunnel. In the configuration without hubcaps, two major noise peaks in addition to a continuous spectrum were observed in the direction parallel to the wheel axle. When the inner hubcaps were attached to the NLG, the levels of the peaks were significantly reduced. The outer caps had no effect on noise reduction: nearly the same noise spectrum as in the original no-hubcap configuration was observed. Although the peaks were not clearly observed in the direction perpendicular to the axle, the same noise reduction could be recognized in the inner-hubcap configuration. In the numerical examination, a stationary CFD analysis with a k-\\'0f turbulence model was first performed, and a CAA analysis was then carried out based on Lighthill's aeroacoustic analogy after reconstructing a time-varying turbulent flow with a stochastic noise generation and radiation model. In the CAA analysis of the no-hubcap configuration, a strong fluctuation was observed in the right and left inner hub cavities, where the pressure oscillates alternately. This fluctuation served as a dipole noise source whose direction is parallel to the wheel axle. The simulated spectrum of far-field sound pressure in this direction has peaks corresponding to the ones observed experimentally. In the hubcap configuration, the pressure fluctuation in the inner hubcap cavities was greatly reduced.
Because of this, the noise peaks were well suppressed. Due to the dipole characteristics of the noise source, no clear peaks were simulated in the far-field spectrum in the direction perpendicular to the axle. In conclusion, the effectiveness of the inner hubcaps has been demonstrated in the wind tunnel experiment and confirmed in the numerical analysis. The mechanism of noise reduction by the inner caps has also been clarified.