
**ThaOneDon** (Member)

25/06/2016

**??? ... hmm**

Efficient Ray Tracing Kernels for Modern CPU Architectures (NanoRT Source Code)

Ray stream techniques augment fast single-ray traversal with increased utilization of CPU vector units and leverage memory bandwidth for batches of rays. Despite their success, previously proposed implementations suffer from high bookkeeping costs and batch fragmentation, especially for small batch sizes. The contribution is twofold:

for coherent ray sets, a large packet traversal tailored to the BVH4 that is faster than the original BVH2 variant; and

for incoherent ray batches, a novel implementation of ray streams that reduces the bookkeeping cost while strictly maintaining the preferred traversal order of individual rays.
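To make "batches of rays" concrete, here is a minimal scalar sketch of the basic step any packet or stream traversal repeats: the standard slab test of every active ray in a batch against one node's axis-aligned bounding box. The `Ray` type and the `packet_vs_box` helper are hypothetical names of ours, not NanoRT's API. The per-ray active mask is the kind of bookkeeping the paper tries to keep cheap; a real BVH4 kernel would test four child boxes at once in SIMD lanes instead of looping in Python.

```python
import math
from dataclasses import dataclass


@dataclass
class Ray:
    origin: tuple            # (x, y, z)
    direction: tuple         # (dx, dy, dz), need not be normalized
    t_max: float = math.inf


def packet_vs_box(rays, box_min, box_max, active):
    """Slab test for a batch of rays against one AABB.

    Returns a new active mask: a ray stays active only if it was active
    and its segment [0, t_max] overlaps the box.
    """
    mask = []
    for ray, is_active in zip(rays, active):
        if not is_active:
            mask.append(False)
            continue
        t_near, t_far = 0.0, ray.t_max
        hit = True
        for axis in range(3):
            o, d = ray.origin[axis], ray.direction[axis]
            if d == 0.0:
                # Ray parallel to this slab: it hits only if the origin lies inside it.
                if o < box_min[axis] or o > box_max[axis]:
                    hit = False
                    break
                continue
            inv_d = 1.0 / d
            t0 = (box_min[axis] - o) * inv_d
            t1 = (box_max[axis] - o) * inv_d
            if t0 > t1:
                t0, t1 = t1, t0
            t_near = max(t_near, t0)   # latest entry over all slabs
            t_far = min(t_far, t1)     # earliest exit over all slabs
        mask.append(hit and t_near <= t_far)
    return mask
```

Rays that drop out of the mask simply skip the node's subtree; keeping such masks compact as rays diverge is where batch fragmentation comes from.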


07/2016

**Renderer/Rendering**

State-of-the-Art in GPU-Based Large-Scale Volume Visualization (Source Code for Out-of-Core Renderer)

When the parallel processing power of GPUs is combined with out-of-core methods and data streaming, a major enabler for interactivity is making both the computational and the visualization effort proportional to the amount and resolution of data that is actually visible on screen, i.e., "output-sensitive" algorithms and system designs. This has led to recent output-sensitive approaches that are "ray-guided," "visualization-driven," or "display-aware."

State-of-the-art in Compressed GPU-Based Direct Volume Rendering

Compression and level-of-detail pre-computation do not have to adhere to real-time constraints and can be performed offline for high-quality results. In contrast, adaptive real-time rendering from compressed representations requires fast, transient, and spatially independent decompression.

In this report, we review the existing compressed GPU volume rendering approaches, covering sampling grid layouts, compact representation models, compression techniques, GPU rendering architectures and fast decoding techniques.

**Anti-Aliasing**

Source Code/Paper for "Temporal Reprojection Anti-Aliasing" used in Playdead's INSIDE

Transparency and Anti-Aliasing Techniques for Real-Time Rendering

Filtering Approaches for Real-Time Anti-Aliasing

Two huge papers rounding up and comparing all the most useful anti-aliasing techniques used before 2013.

**Shadows/Shadow Mapping-Volumes**

Deep Partitioned Shadow Volumes Using Stackless and Hybrid Traversals (Shader Code, PSV)

Despite many works in the literature, Shadow Mapping and Shadow Volumes retain several drawbacks.

While Shadow Maps are not pixel-accurate, they are widely used in practice because they are fast.

Shadow Volumes are less efficient but they are still investigated because they are pixel-accurate.

A huge amount of work has tried to improve the accuracy of shadow maps and the efficiency of shadow volumes.

Gerhards et al. have recently proposed a novel approach, the Partitioned Shadow Volumes (PSV) algorithm [GMAG15]. This method relies on an old idea [CF89] which was completely different from the original Shadow Volumes algorithm proposed by Crow [Cro77]. Thanks to a specific partitioning strategy, PSV has many advantages and allows real-time, object-based, pixel-accurate shadows. This makes the algorithm an interesting option which, unlike shadow mapping or shadow volumes, has been little explored.

*Last edited by ThaOneDon (2016-08-03 20:04:12)*


08/2016

**Renderer/Rendering**

Temporal Coherence Methods in Real-Time Rendering

Although graphics cards continue to evolve with an ever-increasing amount of computational power, the speed gain is easily counteracted by increasingly complex and sophisticated shading computations. For real-time applications, the direct consequence is that image resolution and temporal resolution are often the first candidates to bow to the performance constraints.

The observation underlying the methods in this paper is that a higher resolution and frame rate do not necessarily imply a much higher workload, but rather a larger amount of redundancy and a higher potential for amortizing rendering over several frames.

We describe a general approach, image-space reprojection, with several implementation algorithms that facilitate reusing shading information across adjacent frames. We also discuss data-reuse quality and performance related to reprojection techniques. Finally, in the second half of this survey, we demonstrate various applications that exploit temporal coherence (TC) in real-time rendering.
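Image-space reprojection can be sketched in a few lines. This is a minimal CPU illustration, assuming a row-major 4x4 previous-frame view-projection matrix, a previous-frame depth buffer that stores NDC depth, and nearest-texel lookups (real implementations typically filter bilinearly). The function names and the `alpha` and `depth_eps` parameters are illustrative choices, not taken from the survey. A pixel reuses its history only when the reprojected depth matches what was stored last frame, which is the usual disocclusion test.

```python
def mat_vec4(m, v):
    """Multiply a row-major 4x4 matrix by a column 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def reproject(world_pos, prev_view_proj, width, height):
    """Project a world-space point with last frame's matrix.

    Returns (px, py, ndc_depth) in last frame's screen space, or None
    if the point was behind the camera or off screen.
    """
    x, y, z, w = mat_vec4(prev_view_proj, [*world_pos, 1.0])
    if w <= 0.0:
        return None
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w
    if not (-1.0 <= ndc_x <= 1.0 and -1.0 <= ndc_y <= 1.0):
        return None
    px = (ndc_x * 0.5 + 0.5) * width
    py = (ndc_y * 0.5 + 0.5) * height
    return px, py, ndc_z


def temporal_shade(world_pos, fresh, prev_view_proj, prev_depth, prev_color,
                   alpha=0.1, depth_eps=1e-3):
    """Blend a freshly shaded value with reprojected history (exponential moving average)."""
    height, width = len(prev_depth), len(prev_depth[0])
    hit = reproject(world_pos, prev_view_proj, width, height)
    if hit is None:
        return fresh                                   # off screen last frame: cache miss
    px, py, depth = hit
    ix, iy = min(int(px), width - 1), min(int(py), height - 1)   # nearest texel
    if abs(prev_depth[iy][ix] - depth) > depth_eps:
        return fresh                                   # depth mismatch: disocclusion
    return alpha * fresh + (1.0 - alpha) * prev_color[iy][ix]
```

A smaller `alpha` amortizes more work over previous frames but reacts more slowly to lighting changes.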

**Ambient Obscurance/Occlusion/GI**

A Survey of Volumetric Illumination Techniques for Interactive Volume Rendering; DSSDO

In recent years, several advanced volumetric illumination techniques for interactive scenarios have been proposed. These techniques claim to have perceptual benefits and to produce more realistic volume-rendered images. Naturally, they cover a wide spectrum of illumination effects, including varying shading and scattering effects. This survey reviews and classifies the existing techniques: their technical realization, their performance behavior, and their perceptual capabilities.

Ambient Occlusion on Mobile: an empirical comparison (Source Code - last pages)

An enormous paper on GI for mobile, but it really applies everywhere.

**Meshes**

Real-time Rendering Techniques with Hardware Tessellation

State-of-the-Art on Tessellation.

As a result of recent advances in graphics hardware, in particular the GPU tessellation unit, complex geometry can now be generated on-the-fly within the GPU’s rendering pipeline. This has enabled the generation and displacement of smooth parametric surfaces in real-time applications. However, many well established approaches in offline rendering are not directly transferable due to the limited tessellation patterns or the parallel execution model of the tessellation stage. In this survey, we provide an overview of recent work and challenges in this topic by summarizing, discussing, and comparing methods for the rendering of smooth and highly-detailed surfaces in real-time.

Efficient GPU Rendering of Subdivision Surfaces using Adaptive Quadtrees

In this method, a subdivision surface model is rendered in a single pass, without a separate subdivision step. Each quad face is submitted as a single tessellated primitive; a per-face adaptive quadtree is used to map tessellated vertices to the appropriate subdivided face. By traversing the quadtree for each post-tessellation vertex, we are able to accurately and efficiently evaluate the limit surface.

We evaluate our method on a variety of assets, and realize performance that can be three times faster than state-of-the-art approaches. In addition, our streaming formulation makes it easier to integrate subdivision surfaces into applications and shader code written for polygonal models.

**Shadows/Shadow Mapping-Volumes**

Fast Percentage Closer Soft Shadows using Temporal Coherence

This method improves the rendering performance of the Percentage Closer Soft Shadows method by exploiting the temporal coherence between individual frames: The costly soft shadow recalculation is saved whenever possible by storing the old shadow values in a screen-space History Buffer. By extending the shadow map algorithm by a so-called Movement Map, we can not only identify regions disoccluded by camera movement, but also robustly detect and update shadows cast by moving objects: Only the shadows in the areas marked red in the right image have to be re-evaluated. This saves rendering time and doubles the soft shadow rendering performance in real-time 3D scenes with both static and dynamic objects.
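The method builds on Percentage Closer Soft Shadows, whose penumbra size follows from similar triangles, and adds a per-pixel decision about whether cached history can be reused. The sketch below shows both pieces under stated assumptions: depths are measured from the light, the `moved` and `disoccluded` flags stand in for the paper's Movement Map and reprojection test, and `recompute` is a hypothetical callback for the full PCSS evaluation. Buffers are flattened to one list per frame for brevity.

```python
def penumbra_width(d_receiver, d_blocker, light_size):
    """PCSS penumbra estimate from similar triangles:
    w = (d_receiver - d_blocker) * light_size / d_blocker."""
    return (d_receiver - d_blocker) * light_size / d_blocker


def update_shadows(history, moved, disoccluded, recompute):
    """Reuse cached shadow values where nothing changed.

    history:     cached shadow value per pixel, or None if invalid
    moved:       True where a moving caster or receiver touches the pixel
    disoccluded: True where camera motion revealed new surface
    recompute:   callback computing the full soft shadow for pixel i
    Returns (new_values, number_of_recomputed_pixels).
    """
    out, recomputed = [], 0
    for i, old in enumerate(history):
        if old is None or moved[i] or disoccluded[i]:
            out.append(recompute(i))
            recomputed += 1
        else:
            out.append(old)
    return out, recomputed
```

The saving comes entirely from how rarely `recompute` is called; the reuse path is a plain copy.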

Revectorization-Based Shadow Mapping (Source Code/etc.; In General: Survey)

In this paper, we reduce aliasing with revectorization-based shadow mapping. To effectively reduce perspective aliasing, we revectorize shadow boundaries based on their discontinuity directions. Then, we take advantage of the discontinuity space to filter the shadow silhouettes, further suppressing the remaining artifacts. To control the filter kernel size, we incorporate percentage-closer filtering into the algorithm. This enables us to reduce jagged shadow boundaries, to simulate penumbra and to provide high-quality screen-space anti-aliasing. Compared to previous techniques, we show that shadow revectorization produces fewer artifacts, consumes less memory and offers real-time performance.

**Textures**

Real-time BC6H Compression on GPU (Source Code for the Compressor; Intel's ISPC Compressor)

BC6H is a lossy block-based compression format designed for compressing half-precision floating-point textures. It is fully hardware-supported starting from DX11 and on current-gen consoles. It uses fixed-size 4x4 texel blocks, which is very convenient for native hardware decompression. It doesn't support alpha, and sampling alpha is required to return one. It has a 6:1 compression ratio, or in other words it uses 8 bits per texel. All these properties make it a very convenient replacement for hacky encodings like RGBE, RGBM or RGBK, which we used in previous-generation games for storing HDR textures.
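The size figures follow directly from the block layout: each 4x4 block occupies 128 bits (16 bytes), and the uncompressed reference is RGB at 16 bits per channel. A small worked calculation (the helper name `bc6h_size_bytes` is ours):

```python
BLOCK_TEXELS = 4 * 4        # BC6H works on fixed 4x4 blocks
BLOCK_BITS = 128            # every BC6H block is 16 bytes
RGB16F_BITS = 3 * 16        # one uncompressed half-float RGB texel

BITS_PER_TEXEL = BLOCK_BITS / BLOCK_TEXELS          # 128 / 16 = 8.0
COMPRESSION_RATIO = RGB16F_BITS / BITS_PER_TEXEL    # 48 / 8 = 6.0


def bc6h_size_bytes(width, height):
    """Storage for one mip level; partial blocks at the edges still cost a full block."""
    blocks_x = (width + 3) // 4
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * (BLOCK_BITS // 8)
```

For example, a 1024x1024 level needs 256 x 256 blocks, i.e. exactly 1 MiB. Against an RGBA16F source (64 bits per texel) the same 8 bits per texel would correspond to 8:1; the 6:1 figure assumes RGB.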

*Last edited by ThaOneDon (2016-08-30 10:02:18)*


09/2016

**Ambient Obscurance/Occlusion/GI**

Interactive Global Illumination Effects Using Deterministically Directed Layered Depth Maps (More, Source Code)

A layered depth map (LDM) is an extension of the well-known depth map used in rasterization. Multiple layered depth maps can be used as a coarse scene representation.

LDMs were invented to solve the issue of rendering transparent objects in a rasterization pipeline [MCTB11], specifically to implement so-called order-independent transparency (OIT).

LDMs were originally constructed using the so-called depth peeling method, which employed the shadow map depth test to "peel off" each layer. Atomic operations and shader storage buffer objects (SSBOs) now allow for more advanced implementations.

We develop two global illumination methods which make use of such scene representations. The first is an interactive ambient occlusion method. The second is an interactive single-bounce indirect lighting method based on photon differentials.
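The depth peeling construction mentioned above is easy to state per pixel: layer k keeps the nearest fragment strictly behind layer k-1. Here is a minimal CPU sketch, assuming each pixel's fragment depths are already available as a list (on a GPU this is one render pass per layer, with a second depth test against the previously peeled layer). The `depth_peel` name is ours.

```python
import math


def depth_peel(fragments_per_pixel, num_layers):
    """Return num_layers depth layers; math.inf marks 'no fragment left'."""
    layers = []
    prev = [-math.inf] * len(fragments_per_pixel)
    for _ in range(num_layers):
        layer = []
        for i, frags in enumerate(fragments_per_pixel):
            # Nearest fragment strictly behind the previous layer at this pixel.
            behind = [d for d in frags if d > prev[i]]
            layer.append(min(behind) if behind else math.inf)
        layers.append(layer)
        prev = layer
    return layers
```

Because the comparison is strict, two fragments at exactly the same depth collapse into one layer, a known limitation of depth peeling with coplanar geometry.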

Expressive Single Scattering for Light Shaft Stylization

This paper presents several strategies to stylize volumetric single scattering, overcoming the difficulty that light shafts depend on the layout of an entire environment. Our approach is compatible with animated scenes and relies on very efficient solutions, which makes it ready to be used for real-time applications, and enables a quick exploration of the various settings. The techniques are applied at a global scope – i.e., for the whole scene – but can also be used to make local changes to the scattering behavior.

Image-based occluder manipulations modify the complexity of the scattering appearance and are controlled by only a few parameters. Transfer functions allow us to interactively design a general mood, and the result can even be transferred to other scenes. As an alternative, users can design a light map to modify the light emittance, relying on an optimization process which ensures that user-defined constraints, specified using a painting metaphor, are respected.

Furthermore, we employ an efficient algorithm to approximate heterogeneity and enable the control of scattering intensity, noise frequency, and heterogeneity ratio. Finally, our solution supports key-framed animation to steer the stylization over time.

Guarded order independent transparency

Real-Time Deep Image Rendering and Order Independent Transparency

Faster Transparency from Low Level Shader Optimisation (Source Code)

Recent graphics hardware features, namely atomic operations and dynamic memory location writes, now make it possible to capture and store all per-pixel fragment data from the rasterizer in a single pass in what we call a deep image. A deep image provides a state where all fragments are available and gives a more complete image based geometry representation, providing new possibilities in image based rendering techniques.

A core and driving application is order-independent transparency (OIT). A number of deep image sorting improvements are presented, through which an order of magnitude performance increase is achieved, significantly advancing the ability to perform transparency rendering in real time. In the broader context of image based rendering we look at deep images as a discretized 3D geometry representation and discuss sampling techniques for raycasting and antialiasing with an implicit fragment connectivity approach.
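Once all fragments of a pixel are captured in a deep image, order-independent transparency reduces to sorting that pixel's list by depth and compositing front to back with the "over" operator. A minimal sketch, assuming each fragment is stored as `(depth, (r, g, b), alpha)` with non-premultiplied color; the papers' contribution is making the sort fast on the GPU, which Python's `sorted` does not model.

```python
def composite_pixel(fragments, background):
    """Front-to-back 'over' compositing of one pixel's unsorted fragment list."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0                         # fraction of light still passing through
    for _depth, rgb, alpha in sorted(fragments, key=lambda f: f[0]):
        for c in range(3):
            color[c] += transmittance * alpha * rgb[c]
        transmittance *= 1.0 - alpha
    return tuple(color[c] + transmittance * background[c] for c in range(3))
```

The result does not depend on the order in which fragments were rasterized, which is exactly the property OIT is after.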

**Terrain Rendering/Level-of-Detail(LOD)/Occlusion Culling**

Adaptive level of detail rendering estimation based on wavelet transform

An alternative approach to the adaptive triangulation problem: a terrain rendering technique that uses the wavelet transform to select the appropriate LOD. It is a region-based multi-resolution approach that partitions the terrain into tiles that can be processed independently. The wavelet transform serves as a mathematical framework that localizes the rough surface approximation where the approximation error has to be controlled, which allows the appropriate resolution to be chosen according to the local characteristics of the examined surface. To avoid visual artifacts like popping, either geomorphs are used or the maximum screen-space error is restricted to one pixel and the C-BDAM method is utilized.
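The "one pixel" criterion can be illustrated with the usual perspective projection of a geometric error into screen space. This sketch assumes each tile has a precomputed geometric error per LOD level (in the paper these come from the wavelet decomposition; here they are simply given numbers, with level 0 the finest) and picks the coarsest level whose projected error stays within the pixel budget. The function names are ours.

```python
import math


def projected_error_px(geom_error, distance, viewport_height, fov_y):
    """World-space error at `distance`, expressed in pixels for a vertical FOV `fov_y` (radians)."""
    return geom_error * viewport_height / (2.0 * distance * math.tan(fov_y / 2.0))


def select_lod(level_errors, distance, viewport_height, fov_y, max_px=1.0):
    """Coarsest level whose projected error is within max_px (levels ordered fine to coarse)."""
    chosen = 0
    for level, err in enumerate(level_errors):
        if projected_error_px(err, distance, viewport_height, fov_y) <= max_px:
            chosen = level
    return chosen
```

With a 90-degree FOV and a 1000-pixel-high viewport, a tile 500 units away tolerates about 0.5 units of error, while at 5000 units it tolerates about 5 units.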

Vertex Discard Occlusion Culling Hierarchical-Z with Compute Shaders (video)

with Voxels

Performing visibility determination in densely occluded environments is essential to avoid rendering unnecessary objects and to achieve high frame rates. In this implementation, the image-space occlusion culling algorithm runs entirely on the GPU, avoiding the latency introduced by returning the visibility results to the CPU. It uses the GPU's rendering power to construct the occlusion map and then performs the image-space visibility test by splitting the screen-space region of each occludee into parallelizable blocks. This implementation is especially suited to low-end graphics hardware, and the visibility results are accessible to GPU shaders. It can be applied with excellent results in scenes where pixel shaders alter the depth values of the pixels, without interfering with hardware Early-Z culling. We demonstrate the benefits and show the results of this method in real-time, densely occluded scenes.
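The core data structure behind GPU image-space occlusion culling is a hierarchical depth (Hi-Z) pyramid in which each coarser texel keeps the farthest depth of the four texels below it. An object can be rejected when its nearest depth lies behind the farthest occluder depth over its screen footprint. A minimal sketch, assuming a square power-of-two depth buffer where larger values mean farther away, and an occludee footprint already projected to an inclusive pixel rectangle; `build_hiz` and `is_occluded` are illustrative names.

```python
def build_hiz(depth):
    """Max-depth mip pyramid; level 0 is the full-resolution depth buffer."""
    levels = [depth]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        n = len(prev) // 2
        levels.append([[max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                            prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
                        for x in range(n)] for y in range(n)])
    return levels


def is_occluded(hiz, rect, occludee_min_depth):
    """rect = (x0, y0, x1, y1): inclusive level-0 pixel bounds of the occludee."""
    x0, y0, x1, y1 = rect
    size = max(x1 - x0, y1 - y0) + 1
    # A coarser level means fewer texels to read; any level gives a conservative answer.
    level = min(max(0, (size - 1).bit_length() - 1), len(hiz) - 1)
    tex = hiz[level]
    farthest = max(tex[ty][tx]
                   for ty in range(y0 >> level, (y1 >> level) + 1)
                   for tx in range(x0 >> level, (x1 >> level) + 1))
    return occludee_min_depth > farthest
```

Choosing a coarser level trades culling precision for fewer texel reads, since each coarse texel may include far depths from outside the footprint.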

Aggressive Region-Based Visibility Computation Using Importance Sampling

An aggressive region-based visibility sampling algorithm for general 3D scenes. The visibility portal in the scene is a cue for guiding the visibility sampling. Our algorithm extends the image space sampling algorithm by measuring the size and orientation of portals in the scene, and results in a predictable sampling mechanism of visible set in the image space. An importance visibility sampling scheme in the image space is proposed based on the visibility portals, and used to guide the sampling process. Each newly added visibility sample is placed at the position potentially visible for most missing polygons. Experiments show that our sampling approach can effectively improve the performance of visibility sampling in both the convergence rate and the visual quality compared to the previous approaches.

Downsampling Scattering Parameters for Rendering Anisotropic Media

A new approach to computing scattering parameters at reduced resolutions. Many detailed appearance models involve high-resolution volumetric representations. Such a level of detail leads to high storage costs but is usually unnecessary, especially when the object is rendered at a distance. However, naïve downsampling often loses intrinsic shadowing structures and brightens the resulting images.

Our method computes scaled phase functions, a combined representation of single-scattering albedo and phase function, and provides significantly better accuracy while reducing the data size by almost three orders of magnitude.

We also show that modularity can be exploited to greatly reduce the amortized optimization overhead by allowing multiple synthesized models to share one set of downsampled parameters. Our optimized parameters generalize well to novel lighting and viewing configurations.

**Compression/Caching/Streaming**

Zstandard (Source Code), BSD licensed

A fast lossless compression algorithm, targeting zlib-level compression ratios and better.

**State-of-the-Art/Comparisons/Roundups/Surveys**

Rendering view dependent reflections using the graphics card (Source Code)

(Screen space reflections (SSR), parallax-corrected cube mapping (PCCM) and billboard reflections (BBR))

*Last edited by ThaOneDon (2016-11-18 10:25:30)*


10/2016

**Renderer/Rendering**

PBR (Physically Based Rendering)

Moving FROSTBITE to PBR (Detailed Course Notes, Source Code/Algorithms etc.)

Progressive Light Transport Simulation on the GPU: Survey and Improvements; Implementing PBR In Detail (Code)

Position-Dependent Importance Sampling of Light Field Luminaires (Supplemental)

Depth-fighting Aware Methods for Multi-fragment Rendering

Multi-fragment rasterization is susceptible to flickering artifacts when two or more visible fragments of the scene have identical depth values. This phenomenon is called coplanarity or Z-fighting and incurs various unpleasant and unintuitive results when rendering complex multi-layer scenes.

In this work, we develop depth-fighting aware algorithms for reducing, eliminating and/or detecting related flaws in scenes suffering from duplicate geometry. We adapt previously presented single and multi-pass rendering methods, providing alternatives for both commodity and modern graphics hardware.

Optimizing the Graphics Pipeline with Compute

Low-level optimization for GCN

SIMD

More specifically, how to render triangles fast, by not rendering so many triangles.
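"Not rendering so many triangles" largely means rejecting triangles in a compute pass before the rasterizer sees them. Two of the cheapest tests are sketched below, assuming screen-space vertex positions in pixels, counter-clockwise front faces in a y-up frame, and pixel centers at integer + 0.5. The function names are illustrative. A triangle is dropped if it is back-facing or degenerate, or if its bounding box falls between pixel centers so that it can never produce a sample.

```python
import math


def signed_area(p0, p1, p2):
    """Signed screen-space area; positive for counter-clockwise winding."""
    return 0.5 * ((p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1]))


def covers_pixel_center(lo, hi):
    """True if the interval [lo, hi] contains some k + 0.5 for an integer k."""
    return math.floor(hi - 0.5) >= math.ceil(lo - 0.5)


def should_cull(p0, p1, p2):
    # Back-facing (clockwise) or zero-area triangles produce no front-facing pixels.
    if signed_area(p0, p1, p2) <= 0.0:
        return True
    xs = (p0[0], p1[0], p2[0])
    ys = (p0[1], p1[1], p2[1])
    # Small-primitive test: the bounding box misses every pixel center on some axis.
    return not (covers_pixel_center(min(xs), max(xs))
                and covers_pixel_center(min(ys), max(ys)))
```

Both tests are conservative: they only reject triangles that certainly produce no samples, so a surviving triangle may still end up covering nothing.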

**Ambient Obscurance/Occlusion/GI**

Spherical Illuminance Composition for Real-Time Indirect Illumination (Source Code etc. included; More)

Another Implementation

(Sample Elimination for Generating Poisson Disk Sample Sets)

The concepts of light transport for the purpose of rendering are well understood, but expensive to calculate. For real-time solutions, simplification is necessary, often at the cost of visual quality.

The proposed method is fast enough to be suitable for real-time applications on contemporary consumer hardware. It is based on the radiosity technique with various adaptations to speed up the computation. An infinite number of light bounces can be calculated iteratively. Indirect light is stored in the form of spherical harmonics (SH). This directional representation increases the quality of the results and enables the use of normal mapping. The idea of irradiance volumes has been incorporated to provide indirect lighting for dynamic objects.

The proposed technique supports full dynamic lighting and works with all commonly used light source models. In addition, area and environment lighting are facilitated.

Furthermore, we present details on how our technique can be implemented on contemporary hardware. Various approaches are explained and compared to give guidelines for practical implementation.
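Storing indirect light as spherical harmonics makes irradiance for any normal a short dot product. The sketch below is the generic band 0-1 (four-coefficient) version using the well-known constants from Ramamoorthi and Hanrahan's irradiance environment maps, not anything specific to this thesis. The radiance function and the midpoint quadrature over the sphere are assumptions for illustration.

```python
import math

C0 = 0.282095   # Y_0^0 = 1 / (2 sqrt(pi))
C1 = 0.488603   # Y_1^m = sqrt(3 / (4 pi)) * (y, z, x)


def sh_basis(d):
    x, y, z = d
    return [C0, C1 * y, C1 * z, C1 * x]


def project_radiance(radiance_fn, n_theta=64, n_phi=128):
    """Project L(direction) into 4 SH coefficients with a midpoint rule over the sphere."""
    coeffs = [0.0] * 4
    d_theta, d_phi = math.pi / n_theta, 2.0 * math.pi / n_phi
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            d = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            weight = math.sin(theta) * d_theta * d_phi        # solid angle of the cell
            value = radiance_fn(d)
            for k, y in enumerate(sh_basis(d)):
                coeffs[k] += value * y * weight
    return coeffs


def irradiance(coeffs, normal):
    """Convolve with the clamped cosine (A0 = pi, A1 = 2 pi / 3) and evaluate at `normal`."""
    a = [math.pi] + [2.0 * math.pi / 3.0] * 3
    return sum(ak * ck * yk for ak, ck, yk in zip(a, coeffs, sh_basis(normal)))
```

For a uniform white sky over the upper hemisphere this gives an irradiance of about pi for an upward normal and about 0 for a downward one, matching the exact values.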

Practical Realtime Strategies for Accurate Indirect Occlusion (More: newer paper from 2016)

GTAO is able to match a ground-truth reference in half a millisecond on current console hardware. This is done by using an alternative formulation of the ambient occlusion equation and an efficient implementation that distributes computation using spatio-temporal filtering. We then extend GTAO with a novel technique that takes into account near-field global illumination, which is lost when using ambient occlusion alone. Finally, we introduce a technique for specular occlusion, GTSO, symmetric to ambient occlusion, which allows computing realistic specular reflections from probe-based illumination. Our techniques are efficient, give results close to the ray-traced ground truth, and have been integrated in recent AAA console titles.
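The "alternative formulation" evaluates cosine-weighted visibility analytically per slice once the two horizon angles are known. Below is a sketch of that inner integral as we understand it from the GTAO paper: `h1` and `h2` are the horizon angles on either side of the view vector within a slice, and `gamma` is the angle of the normal projected into the slice. Horizon search, averaging over slice directions, the projected-normal-length weighting and the spatio-temporal filtering are all omitted.

```python
import math


def gtao_slice_visibility(h1, h2, gamma):
    """Cosine-weighted visible fraction for one slice (1.0 = unoccluded for gamma = 0).

    h1 <= 0 <= h2 are horizon angles measured from the view vector (radians);
    gamma is the angle of the projected normal within the slice.
    """
    # Horizons are clamped to the hemisphere around the projected normal.
    h1 = gamma + max(h1 - gamma, -math.pi / 2)
    h2 = gamma + min(h2 - gamma, math.pi / 2)

    def arc(h):
        return 0.25 * (-math.cos(2.0 * h - gamma) + math.cos(gamma) + 2.0 * h * math.sin(gamma))

    return arc(h1) + arc(h2)
```

With the normal facing the viewer (`gamma = 0`), open horizons give 1, horizons collapsed onto the view vector give 0, and blocking one side gives 0.5.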

Screen Space Reflections (Source Code)

kode80's screen space reflections implementation for Unity3D 5. Features screen space ray casting, backface depth buffer for per-pixel geometry thickness, distance attenuated pixel stride, rough/smooth surface reflection blurring, fully customizable for quality/speed and more.
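The heart of such an implementation is a screen-space ray march that accepts a hit only when the ray lies behind the depth buffer by no more than an assumed surface thickness. The sketch below is a simplified illustration, not kode80's code. It steps a fixed 2D direction with a configurable stride and interpolates view depth linearly along the screen-space ray, which is only an approximation; perspective-correct marchers interpolate 1/z instead.

```python
import math


def ssr_march(depth, start, step, ray_depth, ray_depth_step, thickness,
              max_steps=64, stride=1):
    """March a reflected ray across a depth buffer (depth[y][x], larger = farther).

    start: (x, y) screen position of the reflecting pixel
    step:  (dx, dy) screen-space direction per unit step
    ray_depth, ray_depth_step: ray depth at start and its change per unit step
    Returns the (x, y) pixel that was hit, or None.
    """
    height, width = len(depth), len(depth[0])
    x, y = start
    for _ in range(max_steps):
        x += step[0] * stride
        y += step[1] * stride
        ray_depth += ray_depth_step * stride
        ix, iy = math.floor(x), math.floor(y)
        if not (0 <= ix < width and 0 <= iy < height):
            return None                      # left the screen: no information available
        scene = depth[iy][ix]
        # Hit only if the ray is behind the visible surface by at most `thickness`.
        if scene <= ray_depth <= scene + thickness:
            return ix, iy
    return None
```

A larger `stride` reaches distant reflections in fewer iterations but can step over thin objects, which is why such implementations often attenuate the stride with distance.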

Irradiance regression for efficient final gathering in global illumination

Photon mapping is widely used for global illumination rendering because of its high computational efficiency. But its efficiency is still limited, mainly by the intensive sampling required in final gathering, a process that is critical for removing low frequency artifacts of density estimation. In this paper, we propose a method to predict the final gathering estimation with direct density estimation, thereby achieving high quality global illumination by photon mapping with high efficiency. We first sample the irradiance of a subset of shading points by both final gathering and direct radiance estimation. Then we use the samples as a training set to predict the final gathered irradiance of other shading points through regression.

**Shadows/Shadow Mapping-Volumes**

Fast robust and precise shadow algorithm

The algorithm is based on silhouette shadow volumes and rivals the performance of standard shadow mapping. Our performance is usually superior to that of high-resolution shadow maps. Moreover, it does not suffer from a number of shadow mapping artefacts and always provides per-pixel correct results.

We put all our algorithms for evaluating silhouette edges into vertex shaders. Specially precomputed data are fed to the vertex shaders, which extrude shadow volume sides only for silhouette edges. Some optimizations are deployed for performance and data size reasons, which are especially important on low-performance configurations such as cost-effective tablets and mobile phones.

The paper evaluates our solution on a number of models. Our solution performs on par with high-resolution omnidirectional shadow mapping.
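Silhouette extraction for shadow volumes comes down to a per-edge test: an edge lies on the silhouette when exactly one of its two adjacent faces faces the light. The paper evaluates this in vertex shaders from precomputed data; the CPU sketch below only illustrates the test itself, assuming a closed, consistently counter-clockwise-wound triangle mesh and a point light. The helper names are ours.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def silhouette_edges(vertices, triangles, light_pos):
    """Return sorted (i, j) vertex-index pairs whose two adjacent faces disagree on light facing."""
    facing = []
    for i0, i1, i2 in triangles:
        v0, v1, v2 = vertices[i0], vertices[i1], vertices[i2]
        normal = cross(sub(v1, v0), sub(v2, v0))            # outward for CCW winding
        facing.append(dot(normal, sub(light_pos, v0)) > 0.0)
    edge_faces = {}                                          # undirected edge -> adjacent faces
    for f, tri in enumerate(triangles):
        for k in range(3):
            a, b = tri[k], tri[(k + 1) % 3]
            edge_faces.setdefault((min(a, b), max(a, b)), []).append(f)
    return sorted(edge for edge, faces in edge_faces.items()
                  if len(faces) == 2 and facing[faces[0]] != facing[faces[1]])
```

Only these edges need extruded shadow-volume sides, which is what keeps the geometry cost low.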

Real-Time Image-Based Volume Lighting

We propose a two-step, GPU-friendly technique for realtime rendering of heterogeneous participating media under distant environment lighting.

First, our algorithm estimates the spherical scattered radiance at a number of points in the medium and projects this function into the spherical harmonics basis. In the second step, we use the scattered radiance information to compute single scattering by ray marching.

Our method is easy to implement using GPU shaders and does not require any precomputation, hence supporting dynamic lighting, animated media, dynamic optical properties of the volume, emission and self-shadowing.
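The second step, computing single scattering by ray marching, can be sketched as follows. This assumes a hypothetical `in_scatter(t)` callback standing in for evaluating the stored spherical-harmonics radiance against the phase function at distance `t` along the view ray. The march accumulates scattered light weighted by the transmittance (Beer-Lambert law) built up so far.

```python
import math


def march_single_scattering(sigma_t, sigma_s, in_scatter, t_end, steps=256):
    """Return (scattered_radiance, transmittance) along the view-ray segment [0, t_end].

    sigma_t(t), sigma_s(t): extinction and scattering coefficients at distance t
    in_scatter(t):          radiance scattered toward the eye per unit sigma_s
    """
    dt = t_end / steps
    transmittance = 1.0
    radiance = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt                          # sample at the segment midpoint
        ext = sigma_t(t)
        # Attenuate by half a step, since the sample sits mid-segment.
        radiance += transmittance * math.exp(-0.5 * ext * dt) * sigma_s(t) * in_scatter(t) * dt
        transmittance *= math.exp(-ext * dt)        # Beer-Lambert over the full step
    return radiance, transmittance
```

For a homogeneous, non-absorbing medium with constant in-scattering S, the result approaches the closed form S(1 - exp(-sigma t)), which makes a convenient sanity check.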

Efficient High-Quality Shadow Maps

This thesis provides an efficient GPU implementation of various optimizations to basic shadow mapping. The optimizations, which echo the idea of making full use of the available resolution and precision, are simple to implement, provide a great deal of improvement and allow for some amount of dynamic refinement of shadows with change in the camera view.

**Terrain Rendering/Level-of-Detail(LOD)/Occlusion Culling**

On-the-fly Generation and Rendering of Infinite Cities on the GPU

Traditional approaches rely on evaluating a shape grammar and storing the geometry produced as a preprocessing step. During rendering, the pregenerated data is then streamed to the GPU. By interweaving generation and rendering, we overcome the problems and limitations of streaming pregenerated data. Using our methods of visibility pruning and adaptive level of detail, we are able to dynamically generate only the geometry needed to render the current view in real-time directly on the GPU. We also present a robust and efficient way to dynamically update a scene’s derivation tree and geometry, enabling us to exploit frame-to-frame coherence.

An Adaptive and Hybrid Approach to Revisiting the Visibility Pipeline

We introduce a new visibility paradigm based on the Ja1 triangulation structure that avoids rendering too many unnecessary primitives in a 3D scene. This structure is capable of executing culling operations so that as few primitives as possible have to be processed during the rendering stage. To do that, we propose to perform culling by combining view-frustum culling, back-face culling and occlusion culling, all using the Ja1 triangulation as a spatial data structure.

To our knowledge, this approach is new compared to existing approaches, and a direct comparison is not possible because its objective (real-time and interactive visualization of massive and large scenarios through the web) differs from theirs. We also believe this approach to occlusion culling differs from previously known works.

**Compression/Caching/Streaming**

Real-Time 3-D Wavelet Lifting

A single-loop approach designed to transform 3-D data.

This work presents a fast streaming unit for computing a 3-D discrete wavelet transform. The unit can continuously consume source data and instantly produce the resulting coefficients. With this approach, every input and output sample is visited only once. The streaming unit can be further improved by exploiting a suitable SIMD instruction set.
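Wavelet lifting splits each transform level into cheap, in-place predict and update steps. The sketch below uses the simplest (Haar) lifting scheme and applies it separably along x, y and z for one level of a 3-D transform. It is an illustration assuming even-sized dimensions, not the paper's single-loop streaming unit, which may use a different wavelet and processes samples as they arrive rather than in separate passes per axis.

```python
def haar_lift(data):
    """One Haar level via lifting: predict (detail), then update (approximation)."""
    even, odd = data[0::2], data[1::2]
    detail = [o - e for e, o in zip(even, odd)]          # predict odd from even
    approx = [e + d / 2 for e, d in zip(even, detail)]   # update -> pair average
    return approx, detail


def haar_unlift(approx, detail):
    """Exact inverse: undo update, undo predict, then interleave."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out


def _forward(line):
    approx, detail = haar_lift(line)
    return approx + detail                               # [low | high] layout


def haar_3d_level(vol):
    """One separable 3-D level on vol[z][y][x]; all dimensions must be even."""
    vol = [[_forward(row) for row in plane] for plane in vol]      # along x (new lists)
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    for z in range(nz):                                            # along y
        for x in range(nx):
            col = _forward([vol[z][y][x] for y in range(ny)])
            for y in range(ny):
                vol[z][y][x] = col[y]
    for y in range(ny):                                            # along z
        for x in range(nx):
            col = _forward([vol[z][y][x] for z in range(nz)])
            for z in range(nz):
                vol[z][y][x] = col[z]
    return vol
```

Each lifting step only reads neighbours that are already available, which is what makes a streaming, visit-once formulation possible.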

**State-of-the-Art/Comparisons/Roundups/Surveys/Analysis**

Rendering massive 3D scenes in real-time

Analyzing Deferred Rendering Techniques

Register Efficient Dynamic Memory Allocator for GPUs

A Survey on Implicit Surface Polygonization

Continuity and Interpolation Techniques (More in Detail: Code/Samples/Algorithms)

Vector Field Processing on Triangle Meshes; Tangent Vector Fields on Triangulated Surfaces

Recent Advances in Adaptive Sampling and Reconstruction for Monte Carlo Rendering

A survey of photon mapping state-of-the-art research and future challenges

Theory and Numerical Integration of Subsurface Light Transport

A Survey Of Techniques for Approximate Computing

**Animation/Physics**

Deferred Warping

The technique works in two steps: First, the surface deformation of the target object is determined and the resulting transformation field is stored as a matrix texture. Then the matrix texture is used as a look-up table to transform a given geometry onto a deformed surface. Splitting the process into two steps yields great flexibility, since different attachment types can be realized by simply defining specific mapping functions.

Our technique can directly handle complex topology changes within the surface. We demonstrate a fast implementation in the vertex shading stage allowing the use of highly decorated surfaces with millions of triangles in real-time.

**Anti-Aliasing**

A Temporal Stable Distance To Edge Anti-aliasing

Without any sub-pixel information, and by storing extra geometric data in a pre-render pass, the implementation can prevent temporal instability and resolve aliasing artifacts during a post-render pass. This makes it a real alternative to state-of-the-art post-processing anti-aliasing solutions, in terms of both performance and quality, in high-end game engines and systems.

Right now it uses features supported only by GCN hardware for solving triangle edges. However, this feature can easily be removed from the solution, making it implementable on a wide variety of hardware. If so, prototype 1 can be an excellent complement to anti-aliasing solutions such as multisampling, which cannot resolve alpha-clipped edges.

Triangle-based Geometry Anti-Aliasing(TGAA)

In this paper, we present a new post-processing sub-pixel anti-aliasing algorithm via reliable triangle-based geometry extracted from the rendering pipeline.

The geometry information is employed to recover sub-pixel level structures by estimating coverage areas of relevant triangles.

The method scales well in performance, since the geometry buffer is stored efficiently and is inexpensive to access. In addition, linear rather than nearest-neighbor color sampling is incorporated into the geometric filter to produce a more accurate anti-aliasing result.

**Shaders**

Extending the Graphics Pipeline with Adaptive, Multi-Rate Shading

Due to complex shaders and high-resolution displays (particularly on mobile graphics platforms), fragment shading often dominates the cost of rendering in games. To improve the efficiency of shading on GPUs, we extend the graphics pipeline to natively support techniques that adaptively sample components of the shading function more sparsely than per-pixel rates.

We perform an extensive study of the challenges of integrating adaptive, multi-rate shading into the graphics pipeline, and evaluate two- and three-rate implementations that we believe are practical evolutions of modern GPU designs.

Towards Automatic Band-Limited Procedural Shaders

This paper explores the problem of analytically computing a band-limited version of a procedural shader as a continuous function of the sampling rate. There is currently no known way of analytically computing these integrals in general. We explore the conditions under which exact solutions are possible and develop several approximation strategies for when they are not.

Rather than addressing rendering time while tolerating a certain amount of infidelity in the resulting image, our approach explicitly addresses the visual property (lower sampling rate) that enables previous techniques to tolerate error in shaders at lower levels of detail. We apply local transformations to the shader program, but produce a single shader with a dependent frequency spectrum.

Compared to supersampling methods, our approach produces shaders that are less expensive to evaluate and closer to ground truth in many cases. Compared to mipmapping or precomputation, our approach produces shaders that support an arbitrary bandwidth parameter and require less storage.

We evaluate our method on a range of spatially-varying shader functions, automatically producing antialiased versions that have comparable error to 4x4 multisampling but can be over an order of magnitude faster.
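A case where the band-limiting integral does have an exact answer is a sine pattern filtered by a box of width w: averaging sin over [x - w/2, x + w/2] gives sin(x) * sin(w/2) / (w/2). The sketch below uses a box kernel chosen here for simplicity (not necessarily the paper's kernel). It shows how the filtered pattern fades toward its mean as the footprint grows instead of aliasing, and compares it with brute-force numerical averaging, i.e. the supersampling route.

```python
import math


def sin_box_filtered(x, w):
    """Exact average of sin(t) over [x - w/2, x + w/2]; w is the pixel footprint."""
    if w < 1e-8:
        return math.sin(x)                 # footprint ~0: no filtering needed
    half = 0.5 * w
    return math.sin(x) * math.sin(half) / half


def sin_box_numeric(x, w, n=10000):
    """Brute-force midpoint-rule average of sin over the same footprint."""
    return sum(math.sin(x - 0.5 * w + (i + 0.5) * w / n) for i in range(n)) / n
```

The analytic version costs one extra `sin` per evaluation, while the numeric one needs many samples for the same accuracy, which mirrors the paper's comparison against supersampling.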

Fast multi-resolution shading of acquired reflectance using bandwidth prediction

We accelerate the shading of acquired materials by selectively shading image pixels and adapting the number of samples used in the integration across pixels.

Integration with Stochastic Point Processes

Low-Discrepancy Blue Noise Sampling More (Source Code etc)

We derive exact formulae for bias and variance of integral estimates in terms of the spatial or spectral characteristics of integrands, and first and second order product density measures of general point patterns. The formulae allow us to study and design sampling schemes adapted to different classes of integrands by analyzing the effect of sampling density, weighting, and correlations among point locations separately.

We then focus on non-adaptive correlated stratified sampling patterns and specialize the formulae to derive closed-form and easy-to-analyze expressions of bias and variance for various stratified sampling strategies. Based on these expressions, we perform a theoretical error analysis for integrands involving the discontinuous visibility function.

We show that significant reductions in error can be obtained by considering alternative sampling strategies instead of the commonly used random jittering or low discrepancy patterns.

Such sampling methods are used throughout rendering: anti-aliasing, shadows/occlusion, geometry, etc.
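Of the strategies discussed, jittered (stratified) sampling is the easiest to sketch: the unit square is split into n x n cells and one uniformly random point is placed in each. The example below estimates the integral of a discontinuous, visibility-like step function whose true value is 0.5. The function names and the seed are illustrative.

```python
import random


def jittered_samples(n, rng):
    """One uniformly random point in each cell of an n x n grid over [0, 1)^2."""
    return [((i + rng.random()) / n, (j + rng.random()) / n)
            for j in range(n) for i in range(n)]


def estimate(f, samples):
    """Equal-weight Monte Carlo estimate of the integral of f over the unit square."""
    return sum(f(x, y) for x, y in samples) / len(samples)


def visibility(x, y):
    """Binary step: 1 below the diagonal x + y = 1, else 0 (exact integral 0.5)."""
    return 1.0 if x + y < 1.0 else 0.0


rng = random.Random(7)
print(estimate(visibility, jittered_samples(16, rng)))
```

Only the cells crossed by the discontinuity contribute variance, which is why stratification helps so much for visibility-type integrands compared with purely random points.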

**Meshes**

Multi-Resolution Meshes for Feature-Aware Hardware Tessellation

A general framework for the construction and rendering of non-uniform LODs suitable for hardware tessellation.

Its key component is a novel hierarchical representation of multiresolution meshes that allows us to finely control the topological locations of vertex splits and merges. We thus managed to relax the regularity of fractional tessellation, while retaining the efficiency of the respective GPU’s units.

Within our framework, we presented a dedicated mesh decimation scheme that can be driven by any edge-based error metric. In particular, by applying it with a feature-preserving geometric error, we leveraged hardware tessellation for feature-aware LOD rendering of meshes.

Quantized Global Parametrization

Global surface parametrization often requires the use of cuts or charts due to non-trivial topology. In recent years a focus has been on so-called seamless parametrizations, where the transition functions across the cuts are rigid transformations with a rotation about some multiple of 90°.

Of particular interest, e.g. for quadrilateral meshing, paneling, or texturing, are those instances where, in addition, the translational part of these transitions is integral (or, more generally, quantized). We show that finding even an arbitrary valid quantization (one that does not imply parametric degeneracies), let alone an optimal one, is a complex combinatorial problem.

We present a novel method that allows us to solve it, i.e. to find valid as well as good quality quantizations. It is based on an original approach to quickly construct solutions to linear Diophantine equation systems, exploiting the specific geometric nature of the parametrization problem. We thereby largely outperform the state-of-the-art, sometimes by several orders of magnitude.
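The paper's solver exploits the geometry of the parametrization problem and is not reproduced here. To illustrate only the underlying arithmetic, here is the textbook building block: finding one integer solution of a single equation a·x + b·y = c with the extended Euclidean algorithm. The function names are illustrative.

```python
from typing import Optional, Tuple

def extended_gcd(a: int, b: int) -> Tuple[int, int, int]:
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b), for a, b >= 0."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

def solve_linear_diophantine(a: int, b: int, c: int) -> Optional[Tuple[int, int]]:
    """One integer solution (x, y) of a*x + b*y == c, or None if none exists.

    Assumes a, b >= 0 and not both zero. Every other solution is
    (x + k*b//g, y - k*a//g) for an integer k, with g = gcd(a, b).
    """
    g, x, y = extended_gcd(a, b)
    if c % g != 0:
        return None  # solvable only if c is a multiple of gcd(a, b)
    k = c // g
    return x * k, y * k
```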

**Textures**

GST(GPU-decodable Supercompressed Textures)(Source Code)

Modern GPUs supporting compressed textures allow interactive application developers to save scarce GPU resources such as VRAM and bandwidth. Compressed textures use fixed compression ratios, and their lossy representations are of significantly poorer quality than traditional image compression formats such as JPEG. We present a new method in the class of supercompressed textures that provides an additional layer of compression to already compressed textures. Our texture representation is designed for endpoint compressed formats such as DXT and PVRTC and for decoding on commodity GPUs. We apply our algorithm to commonly used formats by separating their representation into two parts that are processed independently and then entropy encoded. Our method preserves the CPU-GPU bandwidth during the decoding phase and exploits the parallelism of GPUs to provide up to 3X faster decoding compared to prior texture supercompression algorithms. Along with the gains in decoding speed, our method maintains both the compression size and quality of current state-of-the-art supercompressed texture representations.
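To make the "two parts" concrete for the simplest endpoint format: a DXT1 (BC1) block stores two RGB565 endpoints plus a 2-bit palette index per texel. The sketch below decodes one such block. It illustrates only the endpoint/index split that a supercompression scheme can entropy-code separately. It is not GST's encoder or decoder, and the function names are illustrative.

```python
import struct
from typing import List, Tuple

RGB = Tuple[int, int, int]

def rgb565_to_rgb888(c: int) -> RGB:
    """Expand a packed RGB565 endpoint to 8 bits per channel by bit replication."""
    r, g, b = (c >> 11) & 0x1F, (c >> 5) & 0x3F, c & 0x1F
    return (r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2)

def decode_dxt1_block(block: bytes) -> List[List[RGB]]:
    """Decode one 8-byte DXT1 (BC1) block into a 4x4 grid of RGB texels.

    Part 1: two little-endian RGB565 endpoints. Part 2: sixteen 2-bit indices,
    texel (x, y) at bits 2*(4*y + x) of a little-endian uint32.
    """
    if len(block) != 8:
        raise ValueError("a DXT1 block is exactly 8 bytes")
    c0, c1, indices = struct.unpack("<HHI", block)
    p0, p1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    if c0 > c1:  # four-color mode: interpolated colors at 1/3 and 2/3
        p2 = tuple((2 * a + b) // 3 for a, b in zip(p0, p1))
        p3 = tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))
    else:        # three-color mode: midpoint plus black (transparent with 1-bit alpha)
        p2 = tuple((a + b) // 2 for a, b in zip(p0, p1))
        p3 = (0, 0, 0)
    palette = (p0, p1, p2, p3)
    return [[palette[(indices >> (2 * (4 * y + x))) & 0b11] for x in range(4)]
            for y in range(4)]
```

The exact rounding of the interpolated colors varies between decoders and hardware, so treat the integer divisions here as one reasonable choice.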

RGBV

The overall result is an image the same size as DXT5, trading some alpha precision for a large reduction in color artifacts. The color map also lends itself well to lossless compression, reducing disk size significantly compared to unmodified DXT when used with wfLZ.

Lightmap Compression

Various tests and observations aimed at finding the most effective methods.

The main challenge of compressing lightmaps is that they often have a wider range than regular diffuse textures. This range is not as large as in typical HDR textures, but it's large enough that using regular LDR formats results in obvious quantization artifacts. Lightmaps don't usually have high-frequency details; they are often close to greyscale and only have smooth variations in chrominance.

Per-face parameterization for Texture Mapping of Geometry in Real-Time

Ptex-like.

Traditional UV mapping often causes discontinuities, which commonly result in visible seams in the final image. If any change is made to the vertex positions or the topology, the UV map has to be redone. Mesh colors aims to avoid these problems by skipping the transformation to 2D space used in UV mapping and associating color samples directly with the geometry of a mesh.

The results show that mesh colors is a viable alternative in a real-time renderer. Though not as fast as regular UV-mapped textures due to the lack of hardware-accelerated filtering, mesh colors is a realistic alternative for special cases where regular texture mapping would be cumbersome to work with or produce sub-par results.

Virtual Texturing(Source Code) Software Virtual Textures Virtual Texturing in WebGL

Selection of Just-In-Time texture tiles for the compression of gigapixel textures

ARB_sparse_texture2 Real Virtual Texturing Adaptive Virtual Texture Rendering

Incremental loading of terrain textures

Volume Encoded UV-Maps Bindless Texturing

Virtual texturing is a solution to the problem of real-time rendering of scenes with vast amounts of texture data which does not fit into graphics or main memory. Virtual texturing works by preprocessing the aggregate texture data into equally-sized tiles and determining the necessary tiles for rendering before each frame. These tiles are then streamed to the graphics card and rendering is performed with a special virtual texturing fragment shader that does texture coordinate adjustments to sample from the tile storage texture.
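The per-frame flow described above can be sketched as follows. The sketch makes several simplifying assumptions:

* a single mip level;
* a square virtual texture split into `virtual_pages × virtual_pages` tiles;
* a page table held as a dictionary.

All names are hypothetical. Real implementations keep the page table in a texture, handle mip levels, and pad tiles with borders for filtering.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Page = Tuple[int, int]
UV = Tuple[float, float]

def page_of(uv: UV, virtual_pages: int) -> Page:
    """Virtual page (tile) that a UV coordinate in [0, 1]^2 falls into."""
    u, v = uv
    return (min(int(u * virtual_pages), virtual_pages - 1),
            min(int(v * virtual_pages), virtual_pages - 1))

def required_pages(visible_uvs: Iterable[UV], virtual_pages: int) -> Set[Page]:
    """Per-frame feedback pass: the set of tiles the visible pixels touch."""
    return {page_of(uv, virtual_pages) for uv in visible_uvs}

def virtual_to_physical(uv: UV, page_table: Dict[Page, Page], virtual_pages: int,
                        physical_pages: int) -> Optional[UV]:
    """Fragment-shader-style remap from virtual UV to tile-storage UV.

    page_table maps a virtual page to the slot holding it in the square
    physical tile cache. Returns None if the tile is not resident yet.
    """
    px, py = page_of(uv, virtual_pages)
    slot = page_table.get((px, py))
    if slot is None:
        return None
    fx = uv[0] * virtual_pages - px  # position inside the tile, in [0, 1]
    fy = uv[1] * virtual_pages - py
    sx, sy = slot
    return ((sx + fx) / physical_pages, (sy + fy) / physical_pages)
```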

**??? ... hmm**

Dynamic Occlusion with Signed Distance Fields

Free Penumbra Shadows for Raymarching Distance Fields

Raymarching Distance Fields

Ray Marching Distance Fields in Real-time on WebGL

Raymarching Distance Fields: Concepts and Implementation

Enhanced Sphere Tracing hg_sdf library

Vector-to-Closest-Point Octree for Surface Ray-Casting

GPU Ray Tracer Using Ray Marching and Distance Fields

Raymarching is a 3D rendering technique praised by programming enthusiasts for both its simplicity and speed. It has been used extensively in the demoscene, producing small executables and amazing visuals.
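The core loop, usually called sphere tracing, is short enough to show in full. This minimal sketch assumes a normalized ray direction and a signed distance function (SDF) that never overestimates the distance to the surface. The sphere SDF and the parameter defaults are illustrative.

```python
import math
from typing import Callable, Optional, Sequence

Vec3 = Sequence[float]

def sd_sphere(p: Vec3, center: Vec3 = (0.0, 0.0, 0.0), radius: float = 1.0) -> float:
    """Signed distance from p to a sphere: negative inside, positive outside."""
    return math.dist(p, center) - radius

def sphere_trace(origin: Vec3, direction: Vec3, sdf: Callable[[Vec3], float],
                 max_steps: int = 128, max_dist: float = 100.0,
                 eps: float = 1e-4) -> Optional[float]:
    """March along origin + t*direction; return the hit distance t, or None on a miss.

    direction must be unit length so that stepping by the SDF value never
    jumps past a surface.
    """
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t      # close enough to the surface: report a hit
        t += dist         # largest step guaranteed not to skip geometry
        if t > max_dist:
            break
    return None
```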

DIRT: Deferred Image-based Ray Tracing

Our method, designed entirely on the rasterization pipeline, alters the acceleration data structure construction from a per-fragment to a per-primitive basis in order to simultaneously support three important objectives that generally conflicted in prior art: fast construction times, analytic intersection tests and reduced memory requirements.

In every frame, our algorithm operates in two stages: a compact representation of the scene geometry is built based on primitive linked lists, followed by a traversal step that decouples the ray-primitive intersection tests from the illumination calculations, a process inspired by deferred rendering and the path integral formulation of light transport.

Efficient empty space skipping is achieved by exploiting several culling optimizations both in xy- and z space, such as pixel frustum clipping, depth subdivision and lossless buffer down-scaling.

An extensive experimental study is finally offered showing that our method advances the area of image-based ray tracing under the constraints posed by arbitrarily complex and animated scenarios.

*Last edited by ThaOneDon (2016-11-18 10:26:40)*


**ThaOneDon****Member**

11/2016

**Renderer/Rendering**

VOXELS

Geometry-shader-based real-time voxelization and applications

PROCEDURAL

Real-Time Rendering of Volumetric Clouds

Amortized Noise

**Ambient Obscurance/Occlusion/GI**

Interactive diffuse global illumination discretization methods for dynamic environments

The solutions proposed in this dissertation are based on approximations that concentrate on discretization methods of the problem domain.

First, we considered the creation of a discretized representation of the visibility function around an object, as exact visibility is expensive to compute in real time. Then we examined the creation of a discretized representation of the incoming light in order to estimate diffuse interactions from multiple light bounces. Finally, we investigated the creation of a discretized representation of the scene geometry and its use to accelerate the above process.

Stochastic Screen Space Reflections(Source Code)

**Terrain Rendering/Level-of-Detail(LOD)/Occlusion Culling**

Optimisations of the light culling algorithm in a Forward+ Rendering Pipeline

Tile-based systems, such as Forward+, utilise general-purpose compute to dynamically build linked lists of lights entirely on the GPU; however, the effectiveness of the pipeline is heavily dependent on the accuracy and performance of the light/tile intersection tests. This project focuses on improving these intersection tests, providing a more efficient Forward+ rendering pipeline.
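One common form of the light/tile test is a sphere-versus-box check. Each tile, together with its min/max depth, is bounded by a view-space axis-aligned box, and each point light by a sphere. The sketch below uses the closest-point distance test and builds per-tile light index lists. It is a serial stand-in for the per-tile compute work, not the specific optimised tests the project proposes, and its names are illustrative.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

def sphere_intersects_aabb(center: Vec3, radius: float, box_min: Vec3, box_max: Vec3) -> bool:
    """True if the sphere overlaps the box (squared distance to closest box point)."""
    d2 = 0.0
    for c, lo, hi in zip(center, box_min, box_max):
        if c < lo:
            d2 += (lo - c) ** 2
        elif c > hi:
            d2 += (c - hi) ** 2
    return d2 <= radius * radius

def build_tile_light_lists(lights: Sequence[Tuple[Vec3, float]],
                           tiles: Sequence[Tuple[Vec3, Vec3]]) -> List[List[int]]:
    """For each tile bounding box, the indices of the lights that may affect it."""
    return [[i for i, (center, radius) in enumerate(lights)
             if sphere_intersects_aabb(center, radius, lo, hi)]
            for lo, hi in tiles]
```

For perspective tiles an axis-aligned box is only a conservative bound, so it can admit lights that a tighter frustum test would reject. That trade-off between accuracy and cost is the one the project sets out to improve.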

**Compression/Caching/Streaming**

A Caching System for a Dependency-aware Scene Graph(Poster)(Full Paper, Source Code included)

This thesis proposes a scene graph caching system that automatically creates an alternative representation of selected subgraphs. This alternative representation serves as a render cache in the form of a so-called instruction stream, which allows the cached subgraph to be rendered at lower CPU cost, and thus more quickly, than with a regular render traversal.

In order to update render caches incrementally in reaction to certain scene graph changes, a dependency system was developed. This system provides a model for describing and tracking changes in the scene graph and enables the scene graph caching system to update only those parts of the render cache that need to be updated.

The actual performance characteristics of the scene graph caching system were investigated using a number of synthetic test scenes in different configurations. These tests showed that the caching system is most useful in scenes with a high structural complexity (high geometry count and/or deep scene graph hierarchies) and moderate primitive count per geometry.

**Animations/Physics**

An Efficient Energy Transfer Inverse Kinematics Solution(Source Code - page 102)

Our method builds upon a mass-spring model and relies on force interactions between masses. Joint rotations are computed using a closed-form method with predefined local axis coordinates. Combining these two approaches yields convincing visual results with high time performance.


Particle Systems Using 3D Vector Fields with OpenGL Compute Shaders(Source Code - last pages)

Particle systems and particle effects are used to simulate a realistic and appealing atmosphere in many virtual environments. However, they occupy a significant amount of computational resources. The demand for more advanced graphics increases with each generation, and particle systems likewise need to become increasingly detailed.

This thesis proposes a texture-based 3D vector field particle system, computed on the GPU, and compares it to an equation-based particle system.

Several tests were conducted comparing different situations and parameters. All of the tests measured the computational time needed to execute the different methods.
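The texture-based approach comes down to two steps. First, sample a 3D grid of velocities with trilinear filtering, which is what a 3D texture fetch does in hardware. Then integrate each particle. The minimal CPU sketch below assumes a unit-spaced grid stored as `field[z][y][x]`, clamp-to-edge addressing and explicit Euler integration. All three are choices made here, not taken from the thesis, and the names are illustrative.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Field = List[List[List[Vec3]]]  # indexed field[z][y][x]

def _lerp(a: Vec3, b: Vec3, t: float) -> Vec3:
    return tuple(ai + (bi - ai) * t for ai, bi in zip(a, b))

def sample_trilinear(field: Field, p: Vec3) -> Vec3:
    """Trilinearly interpolate the velocity at grid-space position p (clamped to the grid)."""
    nz, ny, nx = len(field), len(field[0]), len(field[0][0])
    x = min(max(p[0], 0.0), nx - 1.0)
    y = min(max(p[1], 0.0), ny - 1.0)
    z = min(max(p[2], 0.0), nz - 1.0)
    x0, y0, z0 = int(x), int(y), int(z)
    x1, y1, z1 = min(x0 + 1, nx - 1), min(y0 + 1, ny - 1), min(z0 + 1, nz - 1)
    fx, fy, fz = x - x0, y - y0, z - z0
    # Interpolate along x on four cell edges, then along y, then along z.
    c00 = _lerp(field[z0][y0][x0], field[z0][y0][x1], fx)
    c10 = _lerp(field[z0][y1][x0], field[z0][y1][x1], fx)
    c01 = _lerp(field[z1][y0][x0], field[z1][y0][x1], fx)
    c11 = _lerp(field[z1][y1][x0], field[z1][y1][x1], fx)
    return _lerp(_lerp(c00, c10, fy), _lerp(c01, c11, fy), fz)

def advect(particles: Sequence[Vec3], field: Field, dt: float) -> List[Vec3]:
    """One explicit Euler step: p += v(p) * dt for every particle."""
    return [tuple(pi + vi * dt for pi, vi in zip(p, sample_trilinear(field, p)))
            for p in particles]
```

By contrast, an equation-based system would evaluate a closed-form velocity function instead of `sample_trilinear`. The thesis benchmarks these two options against each other.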

Rigid Body Physics for Synthetic Data Generation

For synthetic data generation with concave collision objects, two physics simulation techniques are investigated: convex decomposition of mesh models for globally concave collision results, and a GPU-implemented rigid body solver using spherical decomposition and impulse-based physics with spatial-sorting-based collision detection.

Fast Collision Culling in Large-Scale Environments Using GPU Mapping Function

In order to take advantage of the high number of cores, a new mapping function is defined that enables GPU threads to determine the pair of objects to test without any global memory access.

These new optimized GPU kernel functions use the thread indexes and turn them into a unique pair of objects to test. A square root approximation technique is used based on Newton’s estimation, enabling the threads to only perform a few atomic operations.

A first characterization of the approximation errors is presented, which makes it possible to fix incorrect computations. The I/O GPU streams are optimized using binary masks.
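The idea of turning a thread index into a unique object pair can be sketched with the standard triangular-number inversion. For `N` objects there are `N*(N-1)/2` pairs `(i, j)` with `j < i`, and index `k` maps to `i = floor((1 + sqrt(1 + 8k)) / 2)`, `j = k - i*(i-1)/2`. The sketch below computes the square root with an exact integer Newton iteration. The paper instead uses a floating-point approximation on the GPU and characterizes and corrects its errors; that part is not reproduced here. The function names are illustrative.

```python
from typing import Tuple

def isqrt_newton(n: int) -> int:
    """floor(sqrt(n)) via Newton's method on integers."""
    if n < 0:
        raise ValueError("square root of a negative number")
    if n < 2:
        return n
    x = n
    y = (x + n // x) // 2
    while y < x:  # iterates decrease monotonically until they reach floor(sqrt(n))
        x = y
        y = (x + n // x) // 2
    return x

def pair_from_index(k: int) -> Tuple[int, int]:
    """Map a linear thread index k >= 0 to the k-th pair (i, j) with 0 <= j < i."""
    i = (1 + isqrt_newton(1 + 8 * k)) // 2
    j = k - i * (i - 1) // 2
    return i, j
```

Launching `N*(N-1)/2` threads and calling `pair_from_index(thread_id)` in each gives every pair exactly once, with no lookup table in global memory.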

**Anti-Aliasing**

Enhanced Subpixel Morphological Antialiasing(SMAA)(Source Code)

A very efficient GPU-based MLAA implementation, capable of handling subpixel features seamlessly, and featuring an improved and advanced pattern detection & handling mechanism.

**Shaders**

Shader Minifier(Source Code)(Latest Binary)

**Meshes**

Fast Screen Space Curvature Estimation on GPU(Source-Shader Code/Demo etc)

Curvature is an important geometric property in computer graphics that provides information about the behavior of object surfaces. The exact curvature can only be calculated for a limited set of surface descriptions. Most of the time, we deal with triangles, point sets or some other discrete representation of the surface, for which curvature computation is problematic. Moreover, most existing algorithms were developed for static geometry and can be slow for interactive modeling.

This paper proposes a screen space method which estimates the mean and Gaussian curvature at interactive rates. The algorithm uses positions and normals to estimate the curvature from the second fundamental form matrix. Using the screen space has advantages over the classical approach: low-poly geometry can be used and additional detail can be added with normal and bump maps.
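Once the first and second fundamental forms have been estimated at a pixel, the two curvatures follow from the standard closed-form expressions:

* Gaussian curvature: K = (LN - M²)/(EG - F²)
* mean curvature: H = (EN - 2FM + GL)/(2(EG - F²))

Estimating the forms in screen space is the paper's contribution and is not shown here. This sketch covers only the final step. The names are illustrative, and the sign of H depends on the normal orientation.

```python
import math
from typing import Tuple

def curvatures(E: float, F: float, G: float,
               L: float, M: float, N: float) -> Tuple[float, float, float, float]:
    """Mean H, Gaussian K and principal curvatures k1 >= k2 from the
    first (E, F, G) and second (L, M, N) fundamental forms."""
    det_i = E * G - F * F
    if det_i <= 0.0:
        raise ValueError("degenerate first fundamental form")
    K = (L * N - M * M) / det_i
    H = (E * N - 2.0 * F * M + G * L) / (2.0 * det_i)
    disc = math.sqrt(max(H * H - K, 0.0))  # clamp tiny negatives caused by round-off
    return H, K, H + disc, H - disc
```

For example, a sphere of radius 2 in an orthonormal tangent frame has E = G = 1, F = M = 0 and L = N = 0.5, giving H = 0.5 and K = 0.25.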

**State-of-the-Art/Comparisons/Roundups/Surveys/Analysis**

Temporal Coherence Methods in Real-Time Rendering(Warping library)(Source Code for examples)

Spatial and Spectral Methods for Irregular Sampling in Computer Graphics

Feature Aware Sampling and Reconstruction

Combining displacement mapping methods on the GPU for real-time terrain visualization

**??? ... hmm**

Denoising Point Sets via L0 Minimization

Surface reconstruction is a widely-used geometry processing tool for digitizing real-world objects. In many cases, the input to a reconstruction algorithm is a point set acquired from the object in question. However, despite new methods and acquisition hardware, errors such as noise and outliers inevitably appear in these point sets. Moreover, the quality of the reconstructed surface strongly depends on the quality of the input point set.

We present an anisotropic point cloud denoising method using L0 minimization. The L0 norm directly measures the sparsity of a solution, and we observe that many common objects can be defined as piece-wise smooth surfaces with a small number of features. Hence, we demonstrate how to apply an L0 optimization directly to point clouds, which produces sparser solutions and sharper surfaces than either the L1 or L2 norms.
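The scalar proximal operators that iterative L0/L1 solvers apply repeatedly show why L0 gives sharper results than L1. The L0 operator (hard thresholding) either keeps a value unchanged or zeroes it. The L1 operator (soft thresholding) shrinks every surviving value. This is only an illustration of how the two norms behave; the paper's full point-cloud optimization is not reproduced, and the names and the sample signal are hypothetical.

```python
import math

def prox_l0(x: float, lam: float) -> float:
    """argmin_z 0.5*(z - x)**2 + lam*[z != 0]: hard thresholding at sqrt(2*lam)."""
    return x if x * x > 2.0 * lam else 0.0

def prox_l1(x: float, lam: float) -> float:
    """argmin_z 0.5*(z - x)**2 + lam*|z|: soft thresholding by lam."""
    return math.copysign(max(abs(x) - lam, 0.0), x)

# A sparse "feature" signal (e.g. normal differences) corrupted by small noise.
signal = [0.05, -0.08, 3.0, 0.02, -0.04]
hard = [prox_l0(v, 0.5) for v in signal]  # the 3.0 feature survives unchanged
soft = [prox_l1(v, 0.5) for v in signal]  # the feature is shrunk to 2.5
```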

Our method can faithfully recover sharp features while at the same time smoothing the remaining regions even in the presence of large amounts of noise.

*Last edited by ThaOneDon (2017-02-08 10:29:13)*


**ThaOneDon****Member**

12/2016

**Renderer/Rendering**

GPGPU Scalable Compiler Optimizations

Unite 2016 - Tools, Tricks and Technologies for Reaching Stutter Free 60 FPS in INSIDE

**Ambient Obscurance/Occlusion/GI**

original Spherical Harmonics(Source Code, etc)

Neural Network Ambient Occlusion(NNAO)(Homepage(Source Code/Shaders/Filters))

We build a database of camera depths, normals, and ground truth ambient occlusion as calculated using an offline renderer, and use a neural network to learn a mapping from the depth and normals surrounding the pixel to the ambient occlusion of that pixel. Once trained we convert the neural network into an optimised shader which is more accurate than existing techniques, has better performance, no user parameters other than the occlusion radius, and can be computed in a single pass allowing it to be used as a drop-in replacement.

**Shadows/Shadow Mapping-Volumes**

An evaluation of moving shadow detection techniques

Shadows of moving objects may cause serious problems in many computer vision applications, including object tracking and object recognition. In common object detection systems, due to having similar characteristics, shadows can be easily misclassified as either part of moving objects or independent moving objects. To deal with the problem of misclassifying shadows as foreground, various methods have been introduced. This paper addresses the main problematic situations associated with shadows and provides a comprehensive performance comparison on up-to-date methods that have been proposed to tackle these problems.

**Compression/Caching/Streaming**

Convex Hull Problems(Streaming Geometry)(Source Code)

The convex hull is a well-studied problem with a large body of results and algorithms in a variety of contexts.

We consider three contexts: when only an approximate convex hull is required, when the input points come from a (potentially unbounded) data stream, and when layers of concentric convex hulls are required.

Existing algorithms for these problems either do not achieve optimal runtime and linear space, or are overly complex and difficult to implement and use in practice. This thesis remedies this situation by proposing novel algorithms that are both simple and optimal. The simplicity is achieved by independently computing four sets of monotone convex layers in O(n log n) time and linear space. These are then merged together in O(n log n) time.
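For context, the classic way to build an O(n log n) hull from monotone pieces is Andrew's monotone chain. Sort the points, then build the lower and upper chains with a cross-product turn test. The thesis's convex-layers algorithm (four monotone layer sets, then a merge) goes well beyond this. The sketch below shows only the basic monotone-chain building block, with illustrative names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o): positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order, O(n log n)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half_hull(seq) -> List[Point]:
        chain: List[Point] = []
        for p in seq:
            # Pop points that would make a clockwise (or collinear) turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```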

**Animation/Physics**

Project Chrono(Source Code)

A multi-physics simulation engine / C++ library based on a platform-independent, open-source design.

**Anti-Aliasing**

Variance reduction using interframe coherence for animated scenes

In an animated scene, geometry and lighting often change in unpredictable ways. Rendering algorithms based on various methods are usually employed to precisely capture all features of an animated scene. However, these methods typically take a long time to produce a noise-free image.

In this paper, we propose a variance reduction technique which exploits coherence between frames.

Firstly, we introduce a dual cone model to measure the incident coherence intersecting camera rays in object space. Secondly, we allocate multiple frame buffers to store image samples from consecutive frames. Finally, the color of a pixel in one frame is computed by borrowing samples from neighboring pixels in current, previous, and subsequent frames. Our experiments show that noise is greatly reduced by our method since the number of effective samples is increased by use of borrowed samples.

*Last edited by ThaOneDon (2016-12-28 18:17:49)*


**ThaOneDon****Member**

01/2017

**Renderer/Rendering**

PBR (Physically Based Rendering)

Renderers Laugh Engine(Vulkan based)

**Ambient Obscurance/Occlusion/GI**

Real-Time Global Illumination using Precomputed Light Field Probes (Homepage(Source Code etc)) NVIDIA's

Raytracing Reflection, Refraction, Fresnel, Total Internal Reflection, and Beer’s Law(Shader Code)

**Animation/Physics**

Cubemap based collision detection

The usual algorithm requires computing an octree for the scenery meshes. Collisions between the character and the scenery are then computed using a sphere-octree collision detection algorithm. The octree can either be precomputed and included in the mesh data, or computed when the application loads.

Our algorithm computes physics by rendering a world-axis-aligned depth cubemap. It can work with low-end graphics devices, and computations are done mainly on the GPU.

**Terrain Rendering/Level-of-Detail(LOD)/Occlusion Culling**

Irregular Morphing for Real-Time Rendering of Large Terrain

**Audio**

Efficient Approximation of HRTF in Subbands for Accurate Sound Localization(Source Code)

Results indicate that the proposed algorithms preserve the salience of spatial cues, even for relatively high approximation tolerances, yielding computationally very efficient implementations.

**Shaders**

High Dynamic Range Imaging Pipeline on the GPU

In this article we aim to fill a gap by providing a detailed description of how the HDRI pipeline, from HDR image assembly to tone mapping, can be implemented exclusively on the GPU. We also explain the trade-offs that need to be made to improve efficiency and show timing comparisons for CPU vs GPU implementations.

Another goal of this paper is to demonstrate how both the global and local versions of this operator can be efficiently implemented using fragment shaders. Unlike previous work, we show that the implementation of this operator requires neither expensive convolution nor Fourier transform operations to compute local adaptation luminances.
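The excerpt does not name the operator. Assuming it is Reinhard et al.'s photographic tone reproduction operator (the usual choice in this kind of pipeline), its global version consists of a log-average luminance followed by a per-pixel curve. On the GPU the log-average is typically computed with a parallel reduction. This serial sketch shows only the arithmetic, and its names and defaults are illustrative.

```python
import math
from typing import List, Optional, Sequence

def reinhard_global(lum: Sequence[float], key: float = 0.18,
                    l_white: Optional[float] = None, eps: float = 1e-6) -> List[float]:
    """Global photographic operator: scale by key / log-average, then compress.

    l_white is the smallest scaled luminance mapped to pure white. It defaults
    to the scene maximum, so the brightest pixel maps exactly to 1.0.
    """
    log_avg = math.exp(sum(math.log(eps + lw) for lw in lum) / len(lum))
    scaled = [key / log_avg * lw for lw in lum]
    if l_white is None:
        l_white = max(max(scaled), eps)
    w2 = l_white * l_white
    return [l * (1.0 + l / w2) / (1.0 + l) for l in scaled]
```

The local version replaces the single scene-wide average with a per-pixel adaptation luminance. This is where the paper's claim about avoiding convolutions and Fourier transforms comes in.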

*Last edited by ThaOneDon (2017-01-15 07:54:55)*


**Hypernova^****Member**

Hey ThaOneDon,

It's great that you're compiling a list of interesting technologies, but the list has grown to a size that's overwhelming for anyone who wants to take a look.

I would recommend instead keeping a very short and concise list of suggestions, providing your own reasoning on why it should be implemented, backed by your own experience in that system. Throwing out information like this (much of which is quite irrelevant to Tesseract) is not likely to attract any attention.

Better yet, make your own proof-of-concept by integrating the changes yourself and show it off.

Best of luck!

*Last edited by Hypernova^ (2017-01-12 20:48:45)*


**RaZgRiZ****Moderator**

Hypernova^ wrote:

> Hey ThaOneDon,
>
> It's great that you're compiling a list of interesting technologies, but the list has grown to a size that's overwhelming for anyone who wants to take a look.
>
> I would recommend instead keeping a very short and concise list of suggestions, providing your own reasoning on why it should be implemented, backed by your own experience in that system. Throwing out information like this (much of which is quite irrelevant to Tesseract) is not likely to attract any attention.
>
> Better yet, make your own proof-of-concept by integrating the changes yourself and show it off.
>
> Best of luck!

I think at this point it's more of a tech blog thing and he just adds to it every once in a while when he stumbles onto something interesting :P


**ThaOneDon****Member**

Yeah...

I've already started to scrub stuff that overlaps or has been replaced by something better.

I'm also doing all I can think of to format it in a way that makes things easier to find.

How "compact" it is depends on the tech/papers themselves; if stuff is completely different I can't throw it out. (Source code etc. is also actually very hard to find; it's lucky if any of this has any.)

Everything I've added was picked with the mindset that it has to be somehow beyond what Tesseract accomplishes, to keep the scope realistic.

I've also started to keep alternatives, which I guess I should limit to something like 2.

The reason I'm not keeping it short is that I want this to be the most definitive, effective, up-to-date and useful tech "post" or blog etc. that covers all the bases... that I can make. lol

I think it's getting plenty of views; 60k is not bad.

*Last edited by ThaOneDon (2017-01-13 06:14:23)*


**ThaOneDon****Member**

02/2017

**Renderer/Rendering**

An Incremental Rendering VM

PROCEDURAL

Real-Time Editing of Procedural Terrains More(Source Code etc)

**Ambient Obscurance/Occlusion/GI**

Screen Space Reflections in Killing Floor 2(Source Code included)

**Animation/Physics**

Generalized Canonical Time Warping(Source Code)

Generic Convex Collision Detection using Support Mapping

libccd(Source Code) - 3-clause BSD Licensed. Open Source.

Defending Continuous Collision Detection against Errors

Numerical errors and rounding errors in continuous collision detection (CCD) can easily cause collision detection failures if they are not handled properly.

This paper demonstrates a set of simple modifications to make a basic CCD implementation failure-proof. Using error analysis, we prove the safety of these methods and we formulate suggested tolerance values to reduce false positives.

**Anti-Aliasing**

Improved Geometry Buffer Anti-Aliasing(GBAA+)(Source Code)

GBAA is an improved version of GPAA.

The underlying idea is that instead of looking for sharp edges in the original image to estimate the location of the geometric edges, you can use the edge information in "pure form", received directly from the renderer.

The actual improvement is that the direction and distance to the triangle edges are calculated in a geometry shader. This eliminates the need to pre-process geometry and rasterize lines, reduces memory usage and, most importantly, removes the dependence of performance on the geometric complexity of the scene.

**State-of-the-Art/Comparisons/Roundups/Surveys/Analysis**

Fundamental computational geometry on the GPU

*Last edited by ThaOneDon (2017-02-08 10:06:29)*


**ThaOneDon****Member**

03/2017

**Renderer/Rendering**

Reversed-Z Logarithmic

A Method for Automatically Creating and Using Billboards to Increase the Speed of Object Rendering

OIT

Real Time Depth Sorting of Transparent Fragments

Phenomenological Transparency

VOXELS

Infinite Sparse Volumes

Real-Time Volumetric Lighting using SVOs

PROCEDURAL

Generating Compelling Procedural 3D Environments and Landscapes

A Non-linear GPU Thread Map for Triangular Domains

There is a stage in the GPU computing pipeline where a grid of thread-blocks, in parallel space, is mapped onto the problem domain, in data space. Threads that fall inside the domain perform computations while threads that fall outside are discarded at runtime.

In this work we study the case of mapping threads efficiently onto triangular domain problems and propose a block-space linear map λ(ω), based on the properties of the lower triangular matrix, that reduces the number of unnecessary threads from O(n²) to O(n).

This study is about the performance of algorithms and has a purpose similar to Carmack's and Lomont's fast inverse square root, which combines the magic number "0x5f3759df" with Newton-Raphson iterations.
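For reference, here is how the fast inverse square root trick mentioned above works:

1. Reinterpret the 32-bit float's bits as an integer.
2. Subtract half of that integer from the magic constant 0x5f3759df to get a first guess.
3. Refine the guess with Newton-Raphson.

This Python transcription is a sketch. Unlike the float32 C original, Python evaluates the refinement in double precision, and the function name is illustrative.

```python
import struct

def fast_inv_sqrt(x: float, newton_steps: int = 1) -> float:
    """Approximate 1/sqrt(x) for positive x using the 0x5f3759df bit trick."""
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Halving the bits roughly halves the exponent; the constant corrects the bias.
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    for _ in range(newton_steps):
        y = y * (1.5 - 0.5 * x * y * y)  # Newton-Raphson step for f(y) = 1/y^2 - x
    return y
```

With a single step the relative error stays below about 0.2%, and each further step roughly squares it.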

**Ambient Obscurance/Occlusion/GI/Reflections**

Global illumination effects with sampled geometry

Wrap Shading Extension to Energy-Conserving Wrapped Diffuse

Automatic Optimization for Large-Scale Real-Time Coastal Water Simulation

Real-time Interactive Water Waves

Real-Time Screen Space Fluid Rendering with Scene Reflections

To solve the singular problem of water waves obtained with the traditional model, a hybrid deep-shallow-water model is estimated using an automatic coupling algorithm. It can handle arbitrary water depths and different underwater terrain. As a distinctive feature of coastal terrain, the coastline is detected using collision detection. Then, unnecessary water grid cells are simplified by an automatic simplification algorithm according to depth. Finally, the model is calculated on the CPU and the simulation is implemented on the GPU.

**Terrain Rendering/Level-of-Detail(LOD)/Occlusion Culling**

Real-Time Level-of-Detail (Algorithms, Notes, Code etc, ALL included)

**Animation/Physics**

6D Frictional Contact for Rigid Bodies

Enhanced FFD-AABB Collision Algorithm for Deformable Objects

Particle Simulation with GPUs

A Novel GPU-Based Deformation Pipeline

A deformation pipeline that is independent of the integration solver used and allows fast rendering of deformable soft bodies on the GPU. The proposed method exploits the transform feedback mechanism of modern GPUs to bypass read-back. This lets it reuse the modified positions and/or velocities of the deformable object in a single pass in real time.

**Audio**

Steam Audio(License) (Copyrighted - Valve Corporation) - HRTF etc, Requires a License but Free

Geometric-based reverberator using acoustic rendering networks

**Anti-Aliasing**

Wire-AA

**Meshes**

Geometry Batching Using Texture-Arrays

Batching can be used to group and sort geometric primitives into batches to reduce the number of required state changes, whereas the size of the batches determines the number of required draw-calls, and therefore, is critical for rendering performance.

For example, in the case of texture atlases, which provide an approach for efficient texture management, the batch size is limited by the efficiency of the texture-packing algorithm and the texture resolution itself.

This paper presents a pre-processing approach and rendering technique that overcomes these limitations by further grouping textures or texture atlases and thus enables the creation of larger geometry batches. It is based on texture arrays in combination with an additional indexing schema that is evaluated at run-time using shader programs.

Basically, it enables flexible partitioning of geometry.

Exact, robust, and efficient regularized Booleans on general 3D meshes

Besides their utility and importance, Booleans are challenging to compute when dealing with meshes, because of topological changes, geometric degeneracies, etc.

We overcome these limitations and present an exact and robust approach that operates on general meshes, which are only required to be closed and orientable.

Our method is based on a few geometric and topological predicates. These make it possible to handle all input/output cases considered degenerate in existing solutions, such as voids, non-manifold, disconnected, and unbounded meshes, and to deal robustly with special input configurations.

Robust Polyhedral Minkowski Sums with GPU Implementation

A convolution algorithm for Minkowski sums of polyhedra with robust CPU and GPU implementations.

The algorithm contains several innovations that support distributed computation. The computational bottleneck is finding the intersecting pairs of facets. We enabled a distributed algorithm by creating a novel type of kd-tree that eliminates duplicate entries without using global memory.

The memory bottleneck is the arrangement of the facets of the convolution. We removed this bottleneck by processing the facets in groups and by removing most of the blocked geometry.

We solved the robustness problem, which is the primary implementation challenge for computational geometry algorithms, using our ACP strategy.

**Textures**

The Implementation of a Scalable Texture Cache(Source Code)

Real Time Rendering of Parametric Surfaces on the GPU (Algorithms, Notes, Code etc, ALL included)

**AI/Scripting**

Dynamic and Robust Local Clearance Triangulations

A optimization of A* algorithm to make it close to human pathfinding behavior

Time-Bounded Best-First Search for Reversible and Non-reversible Search Graphs

**State-of-the-Art/Comparisons/Roundups/Surveys/Analysis**

Optimization Techniques for 3D Graphics Deployment

A Catalog of Stream Processing Optimizations

Light Shafts Rendering for Indoor Scenes

On Some Interactive Mesh Deformations

Adaptive Physically Based Models in Computer Graphics

Algorithms and Criteria for Volumetric Centroidal Voronoi Tessellations

Shadow Mapping Algorithms: Applications and Limitations

A Comprehensive Study on Pathfinding Techniques for Robotics and Video Games

**??? ... hmm**

Fast Data Parallel Radix Sort Implementation in DirectX 11 Compute Shader

The algorithms implement several optimization techniques to take advantage of the HW architecture, such as:

kernel fusion; exploiting the synchronous execution of threads in a warp/wavefront to eliminate the need for barrier synchronization; using shared memory across threads within a group; managing bank conflicts; eliminating divergence by avoiding branch conditions and fully unrolling loops; using adequate group/thread dimensions to increase HW occupancy; and applying highly data-parallel algorithms to accelerate the scan operations.
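The histogram, scan and scatter phases that those optimizations target can be seen in a serial least-significant-digit radix sort. In the compute-shader version, each phase runs across thread groups, with the scan done in group-shared memory. This sketch shows only the data flow. It assumes non-negative 32-bit integer keys and 4-bit digits, both illustrative choices, and its names are hypothetical.

```python
from typing import List, Sequence

def exclusive_scan(counts: Sequence[int]) -> List[int]:
    """Prefix sums excluding the current element: [3, 1, 2] -> [0, 3, 4]."""
    total, out = 0, []
    for c in counts:
        out.append(total)
        total += c
    return out

def radix_sort(keys: Sequence[int], bits_per_pass: int = 4, key_bits: int = 32) -> List[int]:
    """LSD radix sort of non-negative integers below 2**key_bits."""
    radix = 1 << bits_per_pass
    mask = radix - 1
    keys = list(keys)
    for shift in range(0, key_bits, bits_per_pass):
        counts = [0] * radix                   # phase 1: digit histogram
        for k in keys:
            counts[(k >> shift) & mask] += 1
        offsets = exclusive_scan(counts)       # phase 2: scan -> output offsets
        out = [0] * len(keys)
        for k in keys:                         # phase 3: stable scatter
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys
```

The scatter must be stable, so that order established by earlier (lower) digits survives later passes. On the GPU this is why each group needs its own offsets within a digit bucket, and not just the global ones.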

Simulating Rigid Body Fracture with Surface Meshes More Notes/Source Code etc

By combining an indirect boundary integral formulation, explicit surface tracking and a kernel-independent fast multipole method, the presented method is effective for rigid body brittle fracture using only the boundary surface mesh.

Existing explicit mesh tracking methods are modified to support evolving cracks directly in the triangle mesh representation, giving highly detailed fractures with sharp features, independent of any volumetric sampling (unlike tetrahedral mesh or level set approaches) and avoids the need for calculations; the triangle mesh representation also allows simple integration into rigid body engines.

It is accurate, and at the same time computationally economical, and it successfully resolves crack evolution in various settings.

A study of parallelism-locality tradeoffs across memory hierarchy

They first study parallelism vs. locality tradeoffs in each layer of the memory hierarchy, as well as the cross-layer interactions.

Using the observations from the characterization study they propose a dynamic memory migration technique which optimizes both parallelism and locality metrics in the memory subsystem.

They break application frames into smaller ones to exploit memory locality and significantly reduce memory bandwidth requirements (cooperative parallelization).

PICKLOCK: A Deadlock Prediction Approach under Nested Locking

The solution proposed for predicting potential deadlocks and for confirming them involves taking a concurrent program and a test harness, executing the program under test to get an arbitrarily interleaved execution, and then predicting alternate executions leading to deadlocks.

Finally, in order to check if a real deadlock has been found, the program being tested is re-executed precisely under these predicted deadlocking schedules.

The algorithm is based on lock-sets and acquisition histories, which only ensure that the predicted run respects lock acquisitions and releases in the run.

The crucial observation is that acquisition histories give not only enough traction to detect alternate deadlocking interleavings, but also provide an effective mechanism to re-schedule the precise interleaving under which deadlock will occur; the latter helps our re-execution engine to run the predicted schedule and confirm the deadlock, which entirely eliminates false positives.
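A much simpler relative of this idea shows what "predicting" a deadlock from a non-deadlocking run means: a lock-order graph built from one observed trace. This is not PICKLOCK. It skips the acquisition-history and re-execution steps, so unlike PICKLOCK it can report false positives, for example when a common gate lock serializes both threads. It also checks only two-lock cycles for brevity. The trace format and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Event = Tuple[str, str, str]  # (thread, "acq" | "rel", lock)

def lock_order_edges(trace: Sequence[Event]) -> Dict[str, Set[Tuple[str, str]]]:
    """Edge held -> (acquired, thread) whenever a thread takes a lock while holding another."""
    held: Dict[str, List[str]] = defaultdict(list)
    edges: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for thread, op, lock in trace:
        if op == "acq":
            for h in held[thread]:
                edges[h].add((lock, thread))
            held[thread].append(lock)
        elif op == "rel":
            held[thread].remove(lock)
        else:
            raise ValueError(f"unknown operation {op!r}")
    return edges

def potential_deadlocks(trace: Sequence[Event]) -> Set[FrozenSet[str]]:
    """Lock pairs taken in opposite orders by two different threads."""
    edges = lock_order_edges(trace)
    found: Set[FrozenSet[str]] = set()
    for a, outs in edges.items():
        for b, t1 in outs:
            for c, t2 in edges.get(b, ()):
                if c == a and t1 != t2:
                    found.add(frozenset((a, b)))
    return found
```

A trace in which T1 takes A then B while T2 takes B then A completes without deadlocking, yet this sketch flags the pair {A, B}. Confirming that such a prediction is real, by re-executing under the predicted schedule, is the part PICKLOCK adds.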

*Last edited by ThaOneDon (Yesterday 23:21:06)*
