This volume contains 27 contributions to the Fourth Russian-German Advanced Research Workshop on Computational Science and High Performance Computing, presented in October 2009 in Freiburg, Germany. The workshop was organized jointly by the High Performance Computing Center Stuttgart (HLRS), the Institute of Computational Technologies of the Siberian Branch of the Russian Academy of Sciences (ICT SB RAS) and the Section of Applied Mathematics of the University of Freiburg (IAM Freiburg). The contributions range from computer science, mathematics and high performance computing to applications in mechanical and aerospace engineering. They show a wealth of theoretical work and simulation experience, with the potential of bringing together theoretical mathematical modelling and the use of high performance computing systems, and they present the state of the art of computational technologies.

A register file with 128 registers, each 128 bits wide, holds the vector operands. Two instructions are issued per clock cycle and executed in order. Four single precision floating point multiply-add instructions may be handled per cycle. [...] 2 GHz processor is around 200 GFLOP/s. The single precision units deliver results rounded toward zero, which restricts their usability. The SPEs' aggregated IEEE double precision performance of 26 GFLOP/s is far lower, but still four times more than the peak performance of today's PCs. The ratio of main memory bandwidth to peak performance of 1 B/FLOP is a reasonable value for double precision results.
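The peak figures quoted above follow from simple arithmetic, sketched below in Python. The number of SPEs (8) and the clock frequency (3.2 GHz) are assumptions made for illustration; the excerpt truncates the clock figure and does not give the SPE count. The function names are hypothetical helpers, not part of any library.

```python
def peak_gflops(units, simd_lanes, flops_per_lane, clock_ghz):
    """Peak rate in GFLOP/s: units * lanes * FLOPs per lane per cycle * clock in GHz."""
    return units * simd_lanes * flops_per_lane * clock_ghz


def bandwidth_gb_s(gflops, bytes_per_flop):
    """Memory bandwidth in GB/s implied by a given byte-per-FLOP ratio."""
    return gflops * bytes_per_flop


# Assumptions, NOT taken from the text: 8 SPEs running at 3.2 GHz.
ASSUMED_SPES = 8
ASSUMED_CLOCK_GHZ = 3.2

# From the text: four single precision multiply-adds per cycle.
# A multiply-add counts as two floating point operations.
sp_peak = peak_gflops(ASSUMED_SPES, simd_lanes=4, flops_per_lane=2,
                      clock_ghz=ASSUMED_CLOCK_GHZ)

# From the text: 26 GFLOP/s aggregated double precision peak at 1 B/FLOP.
dp_bandwidth = bandwidth_gb_s(26.0, bytes_per_flop=1.0)

print(f"single precision peak: {sp_peak:.1f} GFLOP/s")
print(f"bandwidth at 1 B/FLOP: {dp_bandwidth:.1f} GB/s")
```

Under these assumptions the single precision peak comes out close to the "around 200 GFLOP/s" quoted in the text, and a 1 B/FLOP ratio at 26 GFLOP/s corresponds to roughly 26 GB/s of main memory bandwidth.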

The PowerPC processor supports two hardware threads and operates in order. In-order execution gives up the ability to reorder instructions at run time to fill the pipelines better, but it simplifies the architecture and allows for higher frequencies. The burden on the compiler is higher, as is seen with the IA64 processors. The Altivec/VMX vector part of the PowerPC allows for relatively high single precision floating point performance. The instruction and data L1 caches each hold 32 KB. (M. Resch and U. Küster)

