Refine
H-BRS Bibliography
- yes (19)
Departments, institutes and facilities
Document Type
- Conference Object (16)
- Article (2)
- Part of a Book (1)
Year of publication
Language
- English (19)
Keywords
- SpMV (3)
- GPU (2)
- Single Instruction Multiple Data (SIMD) (2)
- Sparse Matrix Vector multiply (SpMV) (2)
- intrinsics (2)
- Augmented Reality (1)
- CPU (1)
- CSR5BC (1)
- CUDA (1)
- Electromagnetic Fields (1)
We present an interactive system that uses ray tracing as a rendering technique. The system consists of a modular Virtual Reality framework and a cluster-based ray tracing rendering extension running on a number of Cell Broadband Engine-based servers. The VR framework allows for loading rendering plugins at runtime. By using this combination it is possible to simulate interactively effects from geometric optics, like correct reflections and refractions.
We present our approach to extend a Virtual Reality software framework towards the use for Augmented Reality applications. Although VR and AR applications have very similar requirements in terms of abstract components (like 6DOF input, stereoscopic output, simulation engines), the requirements in terms of hardware and software vary considerably. In this article we would like to share the experience gained from adapting our VR software framework for AR applications. We will address design issues for this task. The result is a VR/AR basic software that allows us to implement interactive applications without fixing their type (VR or AR) beforehand. Switching from VR to AR is a matter of changing the configuration file of the application. We also give an example of the use of the extended framework: Augmenting the magnetic field of bar magnets in physics classes. We describe the setup of the system and the real-time calculation of the magnetic field, using a GPU.
This paper describes the work done at our Lab to improve visual and other quality of Virtual Environments. To be able to achieve better quality we built a new Virtual Environments framework called basho. basho is a renderer independent VE framework. Although renderers are not limited to graphics renderers we first concentrated on improving visual quality. Independence is gained from designing basho to have a small kernel and several plug-ins.
Todays Virtual Environment frameworks use scene graphs to represent virtual worlds. We believe that this is a proper technical approach, but a VE framework should try to model its application area as accurate as possible. Therefore a scene graph is not the best way to represent a virtual world. In this paper we present an easily extensible model to describe entities in the virtual world. Further on we show how this model drives the design of our VE framework and how it is integrated.
Ray Tracing, accurate physical simulations with collision detection, particle systems and spatial audio rendering are only a few components that become more and more interesting for Virtual Environments due to the steadily increasing computing power. Many components use geometric queries for their calculations. To speed up those queries spatial data structures are used. These data structures are mostly implemented for every problem individually resulting in many individually maintained parts, unnecessary memory consumption and waste of computing power to maintain all the individual data structures. We propose a design for a centralized spatial data structure that can be used everywhere within the system.
Most VE-frameworks try to support many different input and output devices. They do not concentrate so much on the rendering because this is tradi- tionally done by graphics workstation. In this short paper we present a modern VE framework that has a small kernel and is able to use different renderers. This includes sound renderers, physics renderers and software based graphics renderers. While our VE framework, named basho is still under development we have an alpha version running under Linux and MacOS X.
We present basho, a light weight and easily extendable virtual environment (VE) framework. Key benefits of this framework are independence of the scene element representation and the rendering API. The main goal was to make VE applications flexible without the need to change them, not only by being independent from input and output devices. As an example, with basho it is possible to switch from local illumination models to ray tracing by just replacing the renderer. Or to replace the graphical representation of the scene elements without the need to change the application. Furthermore it is possible to mix rendering technologies within a scene. This paper emphasises on the abstraction of the scene element representation.
The SpMV operation -- the multiplication of a sparse matrix with a dense vector -- is used in many simulations in natural and engineering sciences as a computational kernel. This kernel is quite performance critical as it is used, e.g.,~in a linear solver many times in a simulation run. Such performance critical kernels of a program may be optimized on certain levels, ranging from using a rather coarse grained and comfortable single compiler optimization switch down to utilizing architecural features by explicitly using special instructions on an assembler level. This paper discusses a selection of such program optimization techniques in this spectrum applied to the SpMV operation. The achievable performance gain as well as the additional programming effort are discussed. It is shown that low effort optimizations can improve the performance of the SpMV operation compared to a basic implementation. But further than that, more complex low level optimizations have a higher impact on the performance, although changing the original program and the readability / maintainability of a program significantly.
Improving the Performance of Parallel SpMV Operations on NUMA Systems with Adaptive Load Balancing
(2018)
For a parallel Sparse Matrix Vector Multiply (SpMV) on a multiprocessor, rather simple and efficient work distributions often produce good results. In cases where this is not true, adaptive load balancing can improve the balance and performance. This paper introduces a low overhead framework for adaptive load balancing of parallel SpMV operations. It uses statistical filters to gather relevant runtime performance data and detects an imbalance situation. Three different algorithms were compared that adaptively balance the load with high quality and low overhead. Results show that for sparse matrices, where the adaptive load balancing was enabled, an average speedup of 1.15 (regarding the total execution time) could be achieved with our best algorithm over 4 different matrix formats and two different NUMA systems.
SpMV Runtime Improvements with Program Optimization Techniques on Different Abstraction Levels
(2016)
The multiplication of a sparse matrix with a dense vector is a performance critical computational kernel in many applications, especially in natural and engineering sciences. To speed up this operation, many optimization techniques have been developed in the past, mainly focusing on the data layout for the sparse matrix. Strongly related to the data layout is the program code for the multiplication. But even for a fixed data layout with an accommodated kernel, there are several alternatives for program optimizations. This paper discusses a spectrum of program optimization techniques on different abstraction layers for six different sparse matrix data format and kernels. At the one end of the spectrum, compiler options can be used that hide from the programmer all optimizations done by the compiler internally. On the other end of the spectrum, a multiplication kernel can be programmed that use highly sophisticated intrinsics on an assembler level that ask for a programmer with a deep understanding of processor architectures. These special instructions can be used to efficiently utilize hardware features in processors like vector units that have the potential to speed up sparse matrix computations. The paper compares the programming effort and required knowledge level for certain program optimizations in relation to the gained runtime improvements.
In this paper, a set of micro-benchmarks is proposed to determine basic performance parameters of single-node mainstream hardware architectures for High Performance Computing. Performance parameters of recent processors, including those of accelerators, are determined. The investigated systems are Intel server processor architectures as well as the two accelerator lines Intel Xeon Phi and Nvidia graphic processors. Results show similarities for some parameters between all architectures, but significant differences for others.