#101 2016-06-25 14:48:12


Re: tech thread


??? ... hmm
Efficient Ray Tracing Kernels for Modern CPU Architectures NanoRT Source Code
Ray stream techniques augment fast single-ray traversal with increased utilization of CPU vector units and leverage memory bandwidth for batches of rays. Despite their success, the proposed implementations suffer from high bookkeeping cost and batch fragmentation, especially for small batch sizes. The contribution is two-fold:
For coherent ray sets: a large packet traversal tailored to the BVH4 that is faster than the original BVH2 variant, and
for incoherent ray batches: a novel implementation of ray streams which reduces the bookkeeping cost while strictly maintaining the preferred traversal order of individual rays.
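The basic operation behind BVH4 traversal is testing one ray against the four child boxes of a node and visiting the hits front to back. A minimal scalar sketch of that slab test follows; the paper's kernels vectorize this across SIMD lanes, and the function name and box layout here are hypothetical.

```python
import math

def ray_vs_bvh4_node(origin, direction, boxes, t_max=math.inf):
    """Slab test of one ray against the four child AABBs of a BVH4 node.

    boxes: four (min_xyz, max_xyz) tuples.
    Returns [(t_entry, child_index), ...] for the children that are hit,
    sorted near to far, i.e. the order a single ray would visit them.
    """
    # Reciprocal direction computed once per ray, as traversal kernels do.
    inv_d = [1.0 / d if d != 0.0 else 0.0 for d in direction]
    hits = []
    for i, (bmin, bmax) in enumerate(boxes):
        t0, t1 = 0.0, t_max
        for axis in range(3):
            if direction[axis] == 0.0:
                # Ray parallel to this slab: it only hits if the origin lies inside it.
                if not (bmin[axis] <= origin[axis] <= bmax[axis]):
                    t0, t1 = 1.0, 0.0  # force a miss
                    break
                continue
            ta = (bmin[axis] - origin[axis]) * inv_d[axis]
            tb = (bmax[axis] - origin[axis]) * inv_d[axis]
            t0 = max(t0, min(ta, tb))
            t1 = min(t1, max(ta, tb))
        if t0 <= t1:
            hits.append((t0, i))
    hits.sort()
    return hits
```

Usage: a ray along +x with children at x in [5, 6], [2, 3], off-axis, and behind the origin visits child 1 first, then child 0.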


#102 2016-07-08 14:17:00


Re: tech thread


State-of-the-Art in GPU-Based Large-Scale Volume Visualization /Source Code for out-of-core Renderer/
When the parallel processing power of GPUs is combined with out-of-core methods and data streaming, a major enabler for interactivity is making both the computational and the visualization effort proportional to the amount and resolution of data that is actually visible on screen, i.e., "output-sensitive" algorithms and system designs. This has led to recent output-sensitive approaches that are "ray-guided," "visualization-driven," or "display-aware."

State-of-the-art in Compressed GPU-Based Direct Volume Rendering
Compression and level-of-detail pre-computation do not have to adhere to real-time constraints and can be performed off-line for high-quality results. In contrast, adaptive real-time rendering from compressed representations requires fast, transient, and spatially independent decompression.
In this report, we review the existing compressed GPU volume rendering approaches, covering sampling grid layouts, compact representation models, compression techniques, GPU rendering architectures and fast decoding techniques.

Source Code/Paper for "Temporal Reprojection Anti-Aliasing" used in Playdead's INSIDE

Transparency and Anti-Aliasing Techniques for Real-Time Rendering
Filtering Approaches for Real-Time Anti-Aliasing
Two huge papers rounding up and comparing the most useful anti-aliasing techniques in use before 2013.

Shadows/Shadow Mapping-Volumes
Deep Partitioned Shadow Volumes Using Stackless and Hybrid Traversals Shader Code (PSV)
In spite of many works in the literature, Shadow Mapping and Shadow Volumes retain several drawbacks.

While Shadow Maps are not pixel-accurate, they are widely used in practice because they are fast.
Shadow Volumes are less efficient but they are still investigated because they are pixel-accurate.

A huge amount of work has tried to improve the accuracy of shadow maps and the efficiency of shadow volumes.

Gerhards et al. have recently proposed a novel approach, the Partitioned Shadow Volumes (PSV) algorithm [GMAG15]. This method relies on an old idea [CF89] that is completely different from the original Shadow Volumes algorithm proposed by Crow [Cro77]. Thanks to a specific partitioning strategy, PSV has many advantages and allows real-time, object-based, pixel-accurate shadows. This makes the algorithm an interesting option which, unlike shadow mapping or shadow volumes, has been little explored.
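PSV answers "is this point inside the shadow volume of some triangle?" by organizing the triangles' shadow planes in a tree. The per-triangle test that such a tree accelerates can be sketched as a brute-force check for a point light: the point must lie beyond the triangle's plane and inside the three planes through the light and each edge. Function names are hypothetical, and the real PSV traversal is far more involved.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def in_triangle_shadow(p, light, tri):
    """True if point p lies in the shadow volume cast by triangle tri from a point light."""
    a, b, c = tri
    # 1) p must be on the opposite side of the triangle's plane from the light.
    n = cross(sub(b, a), sub(c, a))
    if dot(n, sub(p, a)) * dot(n, sub(light, a)) >= 0:
        return False
    # 2) p must be inside the three "shadow planes" through the light and each edge:
    #    the signed distances to all three planes share one sign.
    lp = sub(p, light)
    signs = [dot(cross(sub(v0, light), sub(v1, light)), lp)
             for v0, v1 in ((a, b), (b, c), (c, a))]
    return all(s > 0 for s in signs) or all(s < 0 for s in signs)
```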

Last edited by ThaOneDon (2016-08-03 20:04:12)


#103 2016-08-10 03:08:50


Re: tech thread

Temporal Coherence Methods in Real-Time Rendering
Although graphics cards continue to evolve with an ever-increasing amount of computational power, the speed gain is easily counteracted by increasingly complex and sophisticated shading computations. For real-time applications, the direct consequence is that image resolution and temporal resolution are often the first candidates to bow to the performance constraints.
The underlying observation behind the methods in this paper is that a higher resolution and frame rate do not necessarily imply a much higher workload, but rather a larger amount of redundancy and a higher potential for amortizing rendering over several frames.
We describe a general approach, image-space reprojection, with several implementation algorithms that facilitate reusing shading information across adjacent frames. We also discuss data-reuse quality and performance related to reprojection techniques. Finally, in the second half of this survey, we demonstrate various applications that exploit temporal coherence (TC) in real-time rendering.
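Reverse reprojection boils down to projecting the current surface point with last frame's view-projection matrix and reusing last frame's shading if the stored depth matches (otherwise the point was disoccluded or off-screen). A minimal sketch with nearest-pixel lookup, assuming row-major 4x4 matrices, a D3D-style y-down pixel grid, and hypothetical helper names; real implementations usually filter bilinearly.

```python
def mat_vec(m, v):
    """Multiply a row-major 4x4 matrix (list of lists) by a 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def reproject(world_pos, prev_view_proj, width, height):
    """Project a world-space point into the previous frame: (pixel_x, pixel_y, ndc_depth)."""
    x, y, z, w = mat_vec(prev_view_proj, [*world_pos, 1.0])
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w
    px = (ndc_x * 0.5 + 0.5) * width
    py = (1.0 - (ndc_y * 0.5 + 0.5)) * height  # pixel rows grow downward
    return px, py, ndc_z

def fetch_or_recompute(world_pos, prev_view_proj, prev_depth, prev_color,
                       width, height, shade, eps=1e-3):
    """Reuse last frame's shading when the surface was visible there, else shade anew."""
    px, py, depth = reproject(world_pos, prev_view_proj, width, height)
    ix, iy = int(px), int(py)
    if 0 <= ix < width and 0 <= iy < height and abs(prev_depth[iy][ix] - depth) < eps:
        return prev_color[iy][ix]  # cache hit
    return shade(world_pos)        # disocclusion or off-screen: recompute
```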

Ambient Obscurance/Occlusion/GI
A Survey of Volumetric Illumination Techniques for Interactive Volume Rendering DSSDO
In recent years, several advanced volumetric illumination techniques to be used in interactive scenarios have been proposed. These techniques claim to have perceptual benefits as well as being capable of producing more realistic volume rendered images. Naturally, they cover a wide spectrum of illumination effects, including varying shading and scattering effects. This survey reviews and classifies the existing techniques: their technical realization, their performance behavior and their perceptual capabilities.

Ambient Occlusion on Mobile: an empirical comparison (Source Code - last pages)
Enormous paper on GI for mobile but applies everywhere really.

Real-time Rendering Techniques with Hardware Tessellation
State-of-the-Art on Tessellation.
As a result of recent advances in graphics hardware, in particular the GPU tessellation unit, complex geometry can now be generated on-the-fly within the GPU's rendering pipeline. This has enabled the generation and displacement of smooth parametric surfaces in real-time applications. However, many well-established approaches in offline rendering are not directly transferable due to the limited tessellation patterns or the parallel execution model of the tessellation stage. In this survey, we provide an overview of recent work and challenges in this topic by summarizing, discussing, and comparing methods for the rendering of smooth and highly-detailed surfaces in real-time.
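A common way to drive the tessellation unit adaptively is to derive each patch edge's tessellation factor from its projected length in pixels. A minimal sketch, assuming the edge endpoints are already in screen pixels and a hypothetical target of 8 pixels per segment; 64 is the maximum tessellation factor in Direct3D 11.

```python
import math

MAX_TESS_FACTOR = 64.0  # Direct3D 11 hardware limit

def edge_tess_factor(p0, p1, pixels_per_segment=8.0):
    """Tessellation factor for one patch edge from its screen-space length."""
    length = math.dist(p0, p1)
    return min(max(length / pixels_per_segment, 1.0), MAX_TESS_FACTOR)
```

Because the factor depends only on the two shared endpoints (and the distance is symmetric), two adjacent patches compute the same factor for their common edge, which avoids cracks.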

Efficient GPU Rendering of Subdivision Surfaces using Adaptive Quadtrees
In this method, a subdivision surface model is rendered in a single pass, without a separate subdivision step. Each quad face is submitted as a single tessellated primitive; a per-face adaptive quadtree is used to map tessellated vertices to the appropriate subdivided face. By traversing the quadtree for each post-tessellation vertex, we are able to accurately and efficiently evaluate the limit surface.
We evaluate our method on a variety of assets and achieve performance up to three times faster than state-of-the-art approaches. In addition, our streaming formulation makes it easier to integrate subdivision surfaces into applications and shader code written for polygonal models.

Shadows/Shadow Mapping-Volumes
Fast Percentage Closer Soft Shadows using Temporal Coherence
This method improves the rendering performance of the Percentage Closer Soft Shadows method by exploiting the temporal coherence between individual frames: the costly soft shadow recalculation is skipped whenever possible by storing the old shadow values in a screen-space History Buffer. By extending the shadow map algorithm with a so-called Movement Map, we can not only identify regions disoccluded by camera movement, but also robustly detect and update shadows cast by moving objects: only the shadows in the areas marked red in the right image have to be re-evaluated. This saves rendering time and doubles the soft shadow rendering performance in real-time 3D scenes with both static and dynamic objects.
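For reference, the PCSS work that the History Buffer caches is a blocker search followed by a penumbra estimate from similar triangles (Fernando's original formulation). A minimal sketch, assuming depths are linear distances from the light and light_size uses the same units; the function name is hypothetical and the caching logic is not shown.

```python
def pcss_penumbra_width(receiver_depth, blocker_depths, light_size):
    """Blocker search + penumbra estimate of Percentage Closer Soft Shadows.
    blocker_depths: shadow-map depths sampled in the search region.
    Returns None when no blocker is found (the point is fully lit)."""
    blockers = [d for d in blocker_depths if d < receiver_depth]
    if not blockers:
        return None
    avg_blocker = sum(blockers) / len(blockers)
    # Similar triangles: the penumbra widens as the receiver moves away from the blocker.
    return (receiver_depth - avg_blocker) * light_size / avg_blocker
```

The resulting width then sets the PCF kernel size for the final filtering pass.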

Revectorization-Based Shadow Mapping Source Code/etc In General: Survey
In this paper, we reduce aliasing with revectorization-based shadow mapping. To effectively reduce the perspective aliasing, we revectorize shadow boundaries based on their discontinuity directions. Then, we take advantage of the discontinuity space to filter the shadow silhouettes, further suppressing the remaining artifacts. To control the filter kernel size, we incorporate percentage-closer filtering into the algorithm. This enables us to reduce jagged shadow boundaries, to simulate penumbra and to provide high-quality screen-space anti-aliasing. Compared to previous techniques, we show that shadow revectorization produces fewer artifacts, consumes less memory and offers real-time performance.
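The percentage-closer filtering mentioned above averages binary depth tests over a kernel rather than averaging depths. A minimal sketch on a 2D list of depths, assuming integer texel coordinates, clamp-to-edge addressing and a hypothetical constant bias; revectorization itself is not shown.

```python
def pcf(shadow_map, u, v, receiver_depth, radius=1, bias=1e-3):
    """Fraction of lit samples in a (2r+1) x (2r+1) kernel around texel (u, v)."""
    h, w = len(shadow_map), len(shadow_map[0])
    lit = total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x = min(max(u + dx, 0), w - 1)  # clamp to edge
            y = min(max(v + dy, 0), h - 1)
            # Binary shadow test per sample: lit if nothing closer to the light.
            lit += receiver_depth - bias <= shadow_map[y][x]
            total += 1
    return lit / total
```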

Real-time BC6H Compression on GPU Source Code for the Compressor Intel's ISPC Compressor
BC6H is a lossy block-based compression format designed for compressing half-precision floating-point textures. It's fully hardware supported starting from DX11 and on current-gen consoles. It uses fixed-size 4x4 texel blocks, which is very convenient for native hardware decompression. It doesn't support alpha, and sampling alpha is required to return one. It has a 6:1 compression ratio, or in other words it uses 8 bits per texel. All these properties make it a very convenient replacement for hacky encodings like RGBE, RGBM or RGBK, which we used in previous-generation games for storing HDR textures.
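The 6:1 figure follows from the block size: a 128-bit block covers 16 texels (8 bits each), versus 48 bits for an uncompressed RGB16F texel. For comparison, one common RGBM variant is sketched too; the range constant of 6 is an assumption (engines pick their own), and both the multiplier and the color are quantized to 8 bits as they would be in an RGBA8 texture.

```python
import math

BLOCK_BITS = 128          # every BC6H block is 16 bytes
TEXELS_PER_BLOCK = 4 * 4
RGB_HALF_BITS = 3 * 16    # uncompressed RGB16F texel

bits_per_texel = BLOCK_BITS / TEXELS_PER_BLOCK      # 8.0
compression_ratio = RGB_HALF_BITS / bits_per_texel  # 6.0

RGBM_RANGE = 6.0  # assumed maximum representable value

def rgbm_encode(rgb):
    """HDR color -> RGBA8-style (r, g, b, m) with a shared multiplier in alpha."""
    r, g, b = (c / RGBM_RANGE for c in rgb)
    m = min(max(r, g, b, 1e-6), 1.0)
    m = math.ceil(m * 255.0) / 255.0  # round the multiplier up so r/m, g/m, b/m <= 1
    return tuple(round(min(c / m, 1.0) * 255) / 255 for c in (r, g, b)) + (m,)

def rgbm_decode(rgbm):
    r, g, b, m = rgbm
    return tuple(c * m * RGBM_RANGE for c in (r, g, b))
```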

Last edited by ThaOneDon (2016-08-30 10:02:18)


#104 2016-09-03 19:47:49


Re: tech thread

Ambient Obscurance/Occlusion/GI
Interactive Global Illumination Effects Using Deterministically Directed Layered Depth Maps More (Source Code)
A layered depth map (LDM) is an extension of the well-known depth map used in rasterization. Multiple layered depth maps can be used as a coarse scene representation.
LDMs were invented to solve the issue of rendering transparent objects in a rasterization pipeline [MCTB11], specifically to implement so-called order-independent transparency (OIT).
LDMs were originally constructed using the so-called depth peeling method, which employed the shadow map depth test to "peel off" each layer. Atomic operations and shader storage buffer objects (SSBOs) allow for more advanced implementations.
We develop two global illumination methods which make use of such scene representations. The first is an interactive ambient occlusion method. The second is an interactive single-bounce indirect lighting method based on photon differentials.
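Depth peeling can be emulated on per-pixel fragment lists: each pass keeps, per pixel, the nearest fragment strictly behind the layer extracted by the previous pass. A minimal sketch with hypothetical names; note that the strict comparison drops fragments with identical depths, which is exactly the coplanarity problem discussed further down the thread.

```python
import math

def depth_peel(fragments_per_pixel, num_layers):
    """Return num_layers lists of per-pixel depths (None where a pixel has no layer left)."""
    layers = []
    prev = [-math.inf] * len(fragments_per_pixel)
    for _ in range(num_layers):
        layer = []
        for i, frags in enumerate(fragments_per_pixel):
            behind = [d for d in frags if d > prev[i]]  # second depth test against the last layer
            layer.append(min(behind) if behind else None)
        # Empty pixels get +inf so later passes stay empty.
        prev = [d if d is not None else math.inf for d in layer]
        layers.append(layer)
    return layers
```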

Expressive Single Scattering for Light Shaft Stylization
This paper presents several strategies to stylize volumetric single scattering, overcoming the difficulty that light shafts depend on the layout of an entire environment. Our approach is compatible with animated scenes and relies on very efficient solutions, which makes it ready to be used for real-time applications, and enables a quick exploration of the various settings. The techniques are applied at a global scope – i.e., for the whole scene – but can also be used to make local changes to the scattering behavior.
Image-based occluder manipulations modify the complexity of the scattering appearance and are controlled by only a few parameters. Transfer functions allow us to interactively design a general mood, and the result can even be transferred to other scenes. As an alternative, users can design a light map to modify the light emittance, relying on an optimization process which ensures that user-defined constraints, specified with a painting metaphor, are respected.
Furthermore, we employ an efficient algorithm to approximate heterogeneity and enable the control of scattering intensity, noise frequency, and heterogeneity ratio. Finally, our solution supports key-framed animation to steer the stylization over time.

Guarded order independent transparency
Real-Time Deep Image Rendering and Order Independent Transparency
Faster Transparency from Low Level Shader Optimisation (Source Code)
Recent graphics hardware features, namely atomic operations and dynamic memory location writes, now make it possible to capture and store all per-pixel fragment data from the rasterizer in a single pass in what we call a deep image. A deep image provides a state where all fragments are available and gives a more complete image based geometry representation, providing new possibilities in image based rendering techniques.
A core and driving application is order-independent transparency (OIT). A number of deep image sorting improvements are presented, through which an order-of-magnitude performance increase is achieved, significantly advancing the ability to perform transparency rendering in real time. In the broader context of image-based rendering, we look at deep images as a discretized 3D geometry representation and discuss sampling techniques for raycasting and antialiasing with an implicit fragment connectivity approach.
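The resolve step of deep-image OIT sorts each pixel's fragment list by depth and then composites. A minimal sketch for one pixel, using insertion sort (a common choice for short per-pixel lists) and back-to-front "over" blending; the fragment tuple layout is an assumption.

```python
def composite_pixel(fragments, background=(0.0, 0.0, 0.0)):
    """Resolve one deep-image pixel.
    fragments: list of (depth, (r, g, b), alpha), in arbitrary order."""
    frags = list(fragments)
    # Insertion sort by depth, nearest first.
    for i in range(1, len(frags)):
        key = frags[i]
        j = i - 1
        while j >= 0 and frags[j][0] > key[0]:
            frags[j + 1] = frags[j]
            j -= 1
        frags[j + 1] = key
    color = list(background)
    for _, rgb, alpha in reversed(frags):  # farthest first
        color = [alpha * c + (1.0 - alpha) * acc for c, acc in zip(rgb, color)]
    return tuple(color)
```

Because the list is sorted before blending, the result no longer depends on the order in which the rasterizer produced the fragments.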

Terrain Rendering/Level-of-Detail(LOD)/Occlusion Culling
Adaptive level of detail rendering estimation based on wavelet transform
An alternative approach to the adaptive triangulation problem. A terrain rendering technique which uses the wavelet transform to select the appropriate LOD is described. This technique is a region-based multi-resolution approach that partitions the terrain into tiles that can be processed independently. The wavelet transform is used as a mathematical framework that localizes the rough surface approximation where the approximation error has to be controlled. It allows the appropriate resolution to be chosen according to local characteristics of the examined surface. To avoid visual artifacts like popping, either geomorphs are used, or the maximum screen-space error is restricted to one pixel and the C-BDAM method is utilized.
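Restricting the maximum screen-space error to one pixel means projecting each LOD level's geometric error to pixels and choosing the coarsest level that stays under the threshold. A minimal sketch for a perspective camera; the per-level error list and function names are hypothetical, and the wavelet analysis that would produce those errors is not shown.

```python
import math

def screen_space_error(geometric_error, distance, viewport_height, fov_y):
    """Project a world-space error at a given view distance to pixels."""
    pixels_per_unit = viewport_height / (2.0 * distance * math.tan(fov_y / 2.0))
    return geometric_error * pixels_per_unit

def select_lod(level_errors, distance, viewport_height, fov_y, max_pixel_error=1.0):
    """Coarsest level whose projected error is within max_pixel_error.
    level_errors[i] is the geometric error of level i, from finest (0) to coarsest."""
    for level in range(len(level_errors) - 1, -1, -1):
        if screen_space_error(level_errors[level], distance, viewport_height, fov_y) <= max_pixel_error:
            return level
    return 0  # even the finest level is too coarse: use it anyway
```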

Vertex Discard Occlusion Culling Hierarchical-Z with Compute Shaders (video)
with Voxels
Performing visibility determination in densely occluded environments is essential to avoid rendering unnecessary objects and achieve high frame rates. In this implementation, the image-space occlusion culling algorithm runs entirely on the GPU, avoiding the latency introduced by returning the visibility results to the CPU. It uses the GPU's rendering power to construct the occlusion map and then performs the image-space visibility test by splitting the screen-space region of the occludees into parallelizable blocks. This implementation is especially applicable to low-end graphics hardware, and the visibility results are accessible by GPU shaders. It can be applied with excellent results in scenes where pixel shaders alter the depth values of the pixels, without interfering with hardware Early-Z culling methods. We demonstrate the benefits and show the results of this method in real-time densely occluded scenes.
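A typical form of the Hierarchical-Z test mentioned above: build a max-depth pyramid from the occluder depth buffer, pick the level at which an object's screen rectangle covers at most 2x2 texels, and cull the object if its nearest depth lies behind every covered texel. A minimal sketch assuming a square power-of-two depth buffer with depth increasing away from the camera; names are hypothetical.

```python
def build_hiz(depth):
    """Max-depth mip pyramid of a square, power-of-two depth buffer (list of rows)."""
    levels = [depth]
    while len(levels[-1]) > 1:
        src = levels[-1]
        n = len(src) // 2
        levels.append([[max(src[2 * y][2 * x], src[2 * y][2 * x + 1],
                            src[2 * y + 1][2 * x], src[2 * y + 1][2 * x + 1])
                        for x in range(n)] for y in range(n)])
    return levels

def is_occluded(hiz, x0, y0, x1, y1, nearest_depth):
    """Conservative test of the inclusive level-0 pixel rectangle [x0, x1] x [y0, y1]."""
    size = max(x1 - x0, y1 - y0) + 1
    level = min((size - 1).bit_length(), len(hiz) - 1)  # ceil(log2(size))
    mip = hiz[level]
    farthest = max(mip[y][x]
                   for y in range(y0 >> level, (y1 >> level) + 1)
                   for x in range(x0 >> level, (x1 >> level) + 1))
    return nearest_depth > farthest
```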

Aggressive Region-Based Visibility Computation Using Importance Sampling
An aggressive region-based visibility sampling algorithm for general 3D scenes. The visibility portals in the scene serve as cues for guiding the visibility sampling. Our algorithm extends the image-space sampling algorithm by measuring the size and orientation of portals in the scene, and results in a predictable mechanism for sampling the visible set in image space. An importance visibility sampling scheme in image space is proposed based on the visibility portals and used to guide the sampling process. Each newly added visibility sample is placed at the position from which the most missing polygons are potentially visible. Experiments show that our sampling approach can effectively improve the performance of visibility sampling in both convergence rate and visual quality compared to previous approaches.

Downsampling Scattering Parameters for Rendering Anisotropic Media
A new approach to computing scattering parameters at reduced resolutions. Many detailed appearance models involve high-resolution volumetric representations. Such a level of detail leads to high storage costs but is usually unnecessary, especially when the object is rendered at a distance. However, naïve downsampling often loses intrinsic shadowing structures and brightens the resulting images.
Our method computes scaled phase functions, a combined representation of single-scattering albedo and phase function, and provides significantly better accuracy while reducing the data size by almost three orders of magnitude.
We also show that modularity can be exploited to greatly reduce the amortized optimization overhead by allowing multiple synthesized models to share one set of downsampled parameters. Our optimized parameters generalize well to novel lighting and viewing configurations.

Zstandard (Source Code) - BSD Licensed
Fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level and better compression ratios.

Rendering view dependent reflections using the graphics card (Source Code)
(Screen space reflections (SSR), parallax-corrected cube mapping (PCCM) and billboard reflections (BBR))
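Parallax-corrected cube mapping intersects the reflection ray with a proxy box around the probe and looks the cubemap up with the direction from the probe center to that hit point, instead of with the raw reflection vector. A minimal sketch, assuming the shaded point lies inside an axis-aligned proxy box; the function name is hypothetical.

```python
def parallax_corrected_dir(pos, refl_dir, box_min, box_max, probe_pos):
    """Cubemap lookup direction for a reflection ray starting at pos inside the proxy box."""
    # The ray starts inside the box, so the hit is the nearest exit over the three slabs.
    t_exit = float("inf")
    for a in range(3):
        d = refl_dir[a]
        if d > 0:
            t_exit = min(t_exit, (box_max[a] - pos[a]) / d)
        elif d < 0:
            t_exit = min(t_exit, (box_min[a] - pos[a]) / d)
    hit = [pos[a] + t_exit * refl_dir[a] for a in range(3)]
    return [hit[a] - probe_pos[a] for a in range(3)]
```

Usage: for a point at (5, 0, 0) reflecting straight up inside a [-10, 10]^3 box with the probe at the origin, the lookup direction becomes (5, 10, 0) rather than (0, 1, 0), which is what keeps reflections anchored to the room's walls.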

Last edited by ThaOneDon (2016-11-18 10:25:30)


#105 2016-10-01 06:51:20


Re: tech thread

PBR (Physically Based Rendering)
Moving FROSTBITE to PBR Detailed Course Notes (Source Code/Algorithms etc)
Progressive Light Transport Simulation on the GPU: Survey and Improvements Implementing PBR In Detail Code
Position-Dependent Importance Sampling of Light Field Luminaires (Supplemental)

Depth-fighting Aware Methods for Multi-fragment Rendering
Multi-fragment rasterization is susceptible to flickering artifacts when two or more visible fragments of the scene have identical depth values. This phenomenon is called coplanarity or Z-fighting and leads to various unpleasant and unintuitive results when rendering complex multi-layer scenes.
In this work, we develop depth-fighting aware algorithms for reducing, eliminating and/or detecting related flaws in scenes suffering from duplicate geometry. We adapt previously presented single and multi-pass rendering methods, providing alternatives for both commodity and modern graphics hardware.

Optimizing the Graphics Pipeline with Compute
Low-level optimization for GCN
More specifically, how to render triangles fast, by not rendering so many triangles.
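Two of the cheap per-triangle tests commonly used in compute-based triangle culling are back-face rejection from the signed screen-space area and a small-primitive test that rejects triangles whose bounding box contains no pixel center. A minimal sketch; treating positive signed area as front-facing is an assumed convention, and the talk's exact formulation may differ.

```python
import math

def signed_area_2x(a, b, c):
    """Twice the signed area of a screen-space triangle."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def cull_triangle(a, b, c):
    """True if the triangle (pixel coordinates, samples at pixel centres k + 0.5) can be dropped."""
    if signed_area_2x(a, b, c) <= 0:
        return True  # back-facing or degenerate
    xs = (a[0], b[0], c[0])
    ys = (a[1], b[1], c[1])
    # A bounding-box range [lo, hi] contains a centre k + 0.5 iff ceil(lo - 0.5) <= floor(hi - 0.5).
    if math.ceil(min(xs) - 0.5) > math.floor(max(xs) - 0.5):
        return True
    if math.ceil(min(ys) - 0.5) > math.floor(max(ys) - 0.5):
        return True
    return False
```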

Ambient Obscurance/Occlusion/GI
Spherical Illuminance Composition for Real-Time Indirect Illumination (Source Code etc included) More
Another Implementation
(Sample Elimination for Generating Poisson Disk Sample Sets)
The concepts of light transport for the purpose of rendering are well understood, but expensive to calculate. For real-time solutions, simplification is necessary, often at the cost of visual quality.
The proposed method is fast enough to be suitable for real-time applications on contemporary consumer hardware. It is based on the radiosity technique with various adaptations to speed up the computation. An infinite number of light bounces can be calculated iteratively. Indirect light is stored in the form of spherical harmonics (SH). This directional representation increases the quality of the results and enables the use of normal mapping. The idea of irradiance volumes has been incorporated to provide indirect lighting for dynamic objects.
The proposed technique supports full dynamic lighting and works with all commonly used light source models. In addition, area and environment lighting are facilitated.
Furthermore, we present details on how our technique can be implemented on contemporary hardware. Various approaches are explained and compared to give guidelines for practical implementation.
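Storing indirect light as SH means projecting radiance onto a few basis functions and later evaluating irradiance for any normal by scaling each band with the clamped-cosine factors (π for band 0, 2π/3 for band 1). A minimal L1 (4-coefficient) sketch with standard real SH constants; the SH order used in the thesis is not stated here, and the names are hypothetical.

```python
import math

Y00 = 0.5 * math.sqrt(1.0 / math.pi)      # 0.282095
Y1 = math.sqrt(3.0 / (4.0 * math.pi))     # 0.488603

def sh_basis(d):
    """Real L1 SH basis for a unit direction, ordered (Y00, Y1-1, Y10, Y11)."""
    x, y, z = d
    return [Y00, Y1 * y, Y1 * z, Y1 * x]

def project_directional_light(direction, intensity):
    """L1 SH coefficients of a directional (delta) light."""
    return [intensity * b for b in sh_basis(direction)]

def irradiance(coeffs, normal):
    """Irradiance for a unit normal from L1 radiance coefficients."""
    band = [math.pi] + [2.0 * math.pi / 3.0] * 3
    return sum(a * c * b for a, c, b in zip(band, coeffs, sh_basis(normal)))
```

Usage: a uniform environment of radiance 1 has only the coefficient 4πY00 and yields irradiance π for every normal. A directional light along the normal yields 0.75 instead of 1, the expected L1 truncation error.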

Practical Realtime Strategies for Accurate Indirect Occlusion More (newer paper from 2016)
GTAO is able to match a ground-truth reference in half a millisecond on current console hardware. This is done by using an alternative formulation of the ambient occlusion equation and an efficient implementation which distributes computation using spatio-temporal filtering. We then extend GTAO with a novel technique that takes into account near-field global illumination, which is lost when using ambient occlusion alone. Finally, we introduce a technique for specular occlusion, GTSO, analogous to ambient occlusion, which allows computing realistic specular reflections from probe-based illumination. Our techniques are efficient, give results close to the ray-traced ground truth, and have been integrated in recent AAA console titles.
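The alternative formulation integrates, per screen-space slice, the cosine-weighted visible arc between two horizon angles in closed form. A sketch of that inner integral as we understand it, with a numerical check of the same integrand; the names are hypothetical, and the outer integration over slice directions (weighted by the projected normal length) is omitted.

```python
import math

def gtao_slice(h1, h2, gamma):
    """Closed form of the integral of cos(theta - gamma) * |sin(theta)| over [h1, h2].
    h1 <= 0 <= h2 are horizon angles from the view vector; gamma is the angle
    of the normal projected into the slice."""
    def part(h):
        return 0.25 * (-math.cos(2.0 * h - gamma) + math.cos(gamma) + 2.0 * h * math.sin(gamma))
    return part(h1) + part(h2)

def slice_numeric(h1, h2, gamma, n=20000):
    """Midpoint-rule reference for the same integral."""
    step = (h2 - h1) / n
    return sum(math.cos(t - gamma) * abs(math.sin(t)) * step
               for t in (h1 + (i + 0.5) * step for i in range(n)))
```

With gamma = 0, an unoccluded slice (horizons at ±π/2) gives 1 and a fully occluded one (both horizons at 0) gives 0.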

Screen Space Reflections(Source Code)
kode80's screen space reflections implementation for Unity3D 5. Features screen space ray casting, backface depth buffer for per-pixel geometry thickness, distance attenuated pixel stride, rough/smooth surface reflection blurring, fully customizable for quality/speed and more.

Irradiance regression for efficient final gathering in global illumination
Photon mapping is widely used for global illumination rendering because of its high computational efficiency. But its efficiency is still limited, mainly by the intensive sampling required in final gathering, a process that is critical for removing low-frequency artifacts of density estimation. In this paper, we propose a method to predict the final gathering estimation with direct density estimation, thereby achieving high-quality global illumination by photon mapping with high efficiency. We first sample the irradiance of a subset of shading points by both final gathering and direct radiance estimation. Then we use the samples as a training set to predict the final gathered irradiance of other shading points through regression.
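The train-then-predict structure can be illustrated with the simplest possible regressor: a one-feature linear least-squares fit from the cheap direct estimate to the expensive final-gathered value. This is only a stand-in; the paper's actual regression model and features are not described here, and all names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y ~ a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def predict_irradiance(train_direct, train_gathered, query_direct):
    """Train on shading points that have both estimates, predict the rest."""
    a, b = fit_line(train_direct, train_gathered)
    return [a * x + b for x in query_direct]
```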

Shadows/Shadow Mapping-Volumes
Fast robust and precise shadow algorithm
The algorithm is based on silhouette shadow volumes and rivals the performance of standard shadow mapping. Our performance is usually superior when compared with high-resolution shadow maps. Moreover, it does not suffer from a number of artefacts of shadow mapping and always provides per-pixel correct results.
We put all our silhouette-edge evaluation algorithms into vertex shaders. Specially precomputed data are fed to the vertex shaders, which extrude shadow volume sides just for silhouette edges. Some optimizations are deployed for performance and data size reasons, which are especially important on low-performance configurations such as cost-effective tablets and mobile phones.
The paper evaluates our solution on a number of models. Our solution performs on par with high-resolution omnidirectional shadow mapping.
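The silhouette edges that get extruded are those shared by one light-facing and one back-facing triangle. A minimal CPU sketch for a point light, assuming a closed mesh with consistently counter-clockwise (outward) triangles; the paper moves this evaluation into vertex shaders with precomputed data, which is not shown.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def silhouette_edges(vertices, triangles, light_pos):
    """Sorted list of (i, j) vertex-index edges on the silhouette as seen from the light."""
    def faces_light(tri):
        a, b, c = (vertices[i] for i in tri)
        return dot(cross(sub(b, a), sub(c, a)), sub(light_pos, a)) > 0
    edge_faces = {}
    for tri in triangles:
        lit = faces_light(tri)
        for k in range(3):
            edge = tuple(sorted((tri[k], tri[(k + 1) % 3])))
            edge_faces.setdefault(edge, []).append(lit)
    return sorted(e for e, f in edge_faces.items() if len(f) == 2 and f[0] != f[1])
```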

Real-Time Image-Based Volume Lighting
We propose a two-step, GPU-friendly technique for real-time rendering of heterogeneous participating media under distant environment lighting.
First, our algorithm estimates the spherical scattered radiance at a number of points in the medium and projects this function into the spherical harmonics basis. In the second step, we use the scattered radiance information to compute single scattering by ray marching.
Our method is easy to implement using GPU shaders and does not require any precomputation, hence supporting dynamic lighting, animated media, dynamic optical properties of the volume, emission and self-shadowing.
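The second step is a standard single-scattering ray march: accumulate in-scattered radiance weighted by the transmittance back to the eye. A minimal sketch where the SH lookup is replaced by a hypothetical in_scatter(t) callback that already includes the phase function; coefficients are taken as constant within each step and sampled at the step midpoint.

```python
import math

def ray_march_single_scattering(sigma_s, sigma_t, in_scatter, length, steps=256):
    """Return (radiance, transmittance) along a view ray of the given length.
    sigma_s(t), sigma_t(t): scattering / extinction coefficients at distance t."""
    dt = length / steps
    transmittance = 1.0
    radiance = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        seg = math.exp(-sigma_t(t) * dt)  # transmittance across this step
        # Midpoint rule: attenuate by half a step before accumulating.
        radiance += transmittance * math.sqrt(seg) * sigma_s(t) * in_scatter(t) * dt
        transmittance *= seg
    return radiance, transmittance
```

For a homogeneous medium this converges to the closed form σs·S/σt·(1 − exp(−σt·d)), which is a handy sanity check.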

Efficient High-Quality Shadow Maps
This thesis provides an efficient GPU implementation of various optimizations to basic shadow mapping. The optimizations, which echo the idea of making full use of the available resolution and precision, are simple to implement, provide a great deal of improvement and allow for some amount of dynamic refinement of shadows with change in the camera view.

Terrain Rendering/Level-of-Detail(LOD)/Occlusion Culling
On-the-fly Generation and Rendering of Infinite Cities on the GPU
Traditional approaches rely on evaluating a shape grammar and storing the geometry produced as a preprocessing step. During rendering, the pregenerated data is then streamed to the GPU. By interweaving generation and rendering, we overcome the problems and limitations of streaming pregenerated data. Using our methods of visibility pruning and adaptive level of detail, we are able to dynamically generate only the geometry needed to render the current view in real-time directly on the GPU. We also present a robust and efficient way to dynamically update a scene’s derivation tree and geometry, enabling us to exploit frame-to-frame coherence.

An Adaptive and Hybrid Approach to Revisiting the Visibility Pipeline
We introduce a new visibility paradigm based on the Ja1 triangulation structure that avoids rendering too many unnecessary primitives in a 3D scene. This structure is capable of executing culling operations so that as few primitives as possible have to be dealt with during the rendering stage. To do that, we propose to execute the culling by combining the paradigms based on view frustum culling, back-face culling and occlusion culling, all using the Ja1 triangulation as a spatial data structure.
To our knowledge, this approach is new in comparison to existing approaches, and a direct comparison is not possible because its objective (real-time and interactive visualization of massive, large scenarios through the web) differs from theirs. We also believe this approach to occlusion culling differs from previously known works.
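Of the three paradigms combined above, view frustum culling is the simplest to show: a box is rejected if it lies entirely behind any frustum plane, which only requires testing the box corner farthest along each plane normal. A minimal sketch with inward-facing planes; it is generic and does not use the Ja1 structure.

```python
def aabb_outside_frustum(planes, box_min, box_max):
    """Conservative AABB-vs-frustum rejection.
    planes: (nx, ny, nz, d) with inward normals; p is inside a plane when n.p + d >= 0."""
    for nx, ny, nz, d in planes:
        # "Positive vertex": the box corner farthest along the plane normal.
        px = box_max[0] if nx >= 0 else box_min[0]
        py = box_max[1] if ny >= 0 else box_min[1]
        pz = box_max[2] if nz >= 0 else box_min[2]
        if nx * px + ny * py + nz * pz + d < 0:
            return True  # the whole box is behind this plane
    return False
```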

Frustum Culling

Real-Time 3-D Wavelet Lifting
A single-loop approach designed to transform 3-D data.
This work presents a fast streaming unit for computing a 3-D discrete wavelet transform. The unit can continuously consume source data and instantly produce resulting coefficients. With this approach, every input and output sample is visited only once. The streaming unit can be further improved by exploiting a suitable SIMD instruction set.
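Lifting factors a wavelet transform into in-place predict and update steps, which is what makes single-pass streaming implementations possible. The Haar wavelet is the simplest instance and is shown below in 1-D; the wavelet used in the paper may differ, and a 3-D transform applies the same step along each axis (the paper fuses these into one loop).

```python
def haar_lift_forward(signal):
    """One Haar level via lifting (even-length input) -> (approximation, detail)."""
    even, odd = signal[0::2], signal[1::2]
    detail = [o - e for e, o in zip(even, odd)]           # predict: odd from even
    approx = [e + d / 2.0 for e, d in zip(even, detail)]  # update: keep the pair average
    return approx, detail

def haar_lift_inverse(approx, detail):
    """Undo the lifting steps in reverse order."""
    even = [s - d / 2.0 for s, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out
```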

Rendering massive 3D scenes in real-time
Analyzing Deferred Rendering Techniques
Register Efficient Dynamic Memory Allocator for GPUs
A Survey on Implicit Surface Polygonization
Continuity and Interpolation Techniques More in Detail (Code/Samples/Algorithms)
Vector Field Processing on Triangle Meshes Tangent Vector Fields on Triangulated Surfaces
Recent Advances in Adaptive Sampling and Reconstruction for Monte Carlo Rendering
A survey of photon mapping state-of-the-art research and future challenges
Theory and Numerical Integration of Subsurface Light Transport
A Survey Of Techniques for Approximate Computing

Deferred Warping
The technique works in two steps: First, the surface deformation of the target object is determined and the resulting transformation field is stored as a matrix texture. Then the matrix texture is used as look-up table to transform a given geometry onto a deformed surface. Splitting the process in two steps yields a large flexibility since different attachment types can be realized by simply defining specific mapping functions.
Our technique can directly handle complex topology changes within the surface. We demonstrate a fast implementation in the vertex shading stage allowing the use of highly decorated surfaces with millions of triangles in real-time.

A Temporal Stable Distance To Edge Anti-aliasing
The implementation can, without any sub-pixel information and by storing extra geometrical data in a pre-render pass, prevent temporal instability and solve aliasing artifacts during a post-render pass. This makes it a real alternative to state-of-the-art post-processing anti-aliasing solutions in terms of performance and quality in high-end game engines and systems.
Right now it uses features only supported by GCN hardware for solving triangle edges. However, this feature can easily be removed from the solution, making it implementable on a large variety of hardware. If so, prototype 1 can be an excellent complement to anti-aliasing solutions such as multisampling, which cannot solve alpha-clipped edges.
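Distance-to-edge anti-aliasing generally turns the stored distance from a pixel center to the nearest geometric edge into a coverage estimate and blends with the neighbor across that edge. A minimal sketch assuming a straight edge crossing a unit pixel perpendicular to one axis (so coverage is a linear ramp); this is the general idea, not the thesis's exact formula, and the names are hypothetical.

```python
def edge_coverage(signed_distance_px):
    """Coverage of the pixel by its own side of the edge, from the signed distance
    (in pixels) of the pixel centre to the edge; 0.5 when the edge passes through the centre."""
    return min(max(0.5 + signed_distance_px, 0.0), 1.0)

def resolve_edge_pixel(color, neighbor_color, signed_distance_px):
    """Blend with the neighbour across the edge according to coverage."""
    c = edge_coverage(signed_distance_px)
    return tuple(c * a + (1.0 - c) * b for a, b in zip(color, neighbor_color))
```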

Triangle-based Geometry Anti-Aliasing(TGAA)
In this paper, we present a new post-processing sub-pixel anti-aliasing algorithm via reliable triangle-based geometry extracted from the rendering pipeline.
The geometry information is employed to recover sub-pixel level structures by estimating coverage areas of relevant triangles.
The method shows good scalability in performance since the geometry buffer is stored efficiently and is inexpensive to access. In addition, a linear rather than nearest-neighbor color sampling process is incorporated into the geometric filter to generate a more accurate anti-aliasing result.
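The quantity TGAA estimates, the fraction of a pixel covered by a triangle, can be computed exactly on the CPU by clipping the triangle to the pixel square (Sutherland-Hodgman) and taking the area of the result. A reference sketch with hypothetical names; a GPU post-process would use a cheaper approximation.

```python
def clip_axis(poly, axis, bound, keep_greater):
    """Clip a polygon (list of (x, y)) against the half-plane p[axis] >= or <= bound."""
    def inside(p):
        return p[axis] >= bound if keep_greater else p[axis] <= bound
    def intersect(p, q):
        t = (bound - p[axis]) / (q[axis] - p[axis])
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
    out = []
    for i, cur in enumerate(poly):
        prev = poly[i - 1]
        if inside(cur):
            if not inside(prev):
                out.append(intersect(prev, cur))
            out.append(cur)
        elif inside(prev):
            out.append(intersect(prev, cur))
    return out

def polygon_area(poly):
    """Shoelace formula."""
    return abs(sum(p[0] * q[1] - q[0] * p[1] for p, q in zip(poly, poly[1:] + poly[:1]))) / 2.0

def pixel_coverage(tri, px, py):
    """Area of triangle tri inside the unit pixel [px, px+1] x [py, py+1]."""
    poly = list(tri)
    for axis, bound, keep_greater in ((0, px, True), (0, px + 1, False),
                                      (1, py, True), (1, py + 1, False)):
        poly = clip_axis(poly, axis, bound, keep_greater)
        if not poly:
            return 0.0
    return polygon_area(poly)
```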

Extending the Graphics Pipeline with Adaptive, Multi-Rate Shading
Due to complex shaders and high-resolution displays (particularly on mobile graphics platforms), fragment shading often dominates the cost of rendering in games. To improve the efficiency of shading on GPUs, we extend the graphics pipeline to natively support techniques that adaptively sample components of the shading function more sparsely than per-pixel rates.
We perform an extensive study of the challenges of integrating adaptive, multi-rate shading into the graphics pipeline, and evaluate two- and three-rate implementations that we believe are practical evolutions of modern GPU designs.

Towards Automatic Band-Limited Procedural Shaders
This paper explores the problem of analytically computing a band-limited version of a procedural shader as a continuous function of the sampling rate. There is currently no known way of analytically computing these integrals in general. We explore the conditions under which exact solutions are possible and develop several approximation strategies for when they are not.
Rather than addressing rendering time while tolerating a certain amount of infidelity in the resulting image, our approach explicitly addresses the visual property (lower sampling rate) that enables previous techniques to tolerate error in shaders at lower levels of detail. We apply local transformations to the shader program, but produce a single shader with a dependent frequency spectrum.
Compared to supersampling methods, our approach produces shaders that are less expensive to evaluate and closer to ground truth in many cases. Compared to mipmapping or precomputation, our approach produces shaders that support an arbitrary bandwidth parameter and require less storage.
We evaluate our method on a range of spatially-varying shader functions, automatically producing antialiased versions that have comparable error to 4x4 multisampling but can be over an order of magnitude faster.
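For a single primitive, the band-limiting the paper aims to derive automatically has a simple closed form: box-filtering sin(x) over a width w gives sin(x)·sin(w/2)/(w/2), so the oscillation fades smoothly as the filter (pixel footprint) grows instead of aliasing. A minimal sketch with a numerical check; the paper handles whole shader programs, which this does not attempt.

```python
import math

def box_filtered_sin(x, w):
    """Average of sin over [x - w/2, x + w/2], in closed form."""
    if w == 0.0:
        return math.sin(x)
    half = 0.5 * w
    return math.sin(x) * math.sin(half) / half
```

When w equals the period 2π, the filtered value is zero, which is the correct average of a full sine cycle.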

Fast multi-resolution shading of acquired reflectance using bandwidth prediction
We accelerate the shading of acquired materials by selectively shading image pixels and adapting the number of samples used in the integration across pixels.

Integration with Stochastic Point Processes
Low-Discrepancy Blue Noise Sampling More (Source Code etc)
We derive exact formulae for bias and variance of integral estimates in terms of the spatial or spectral characteristics of integrands, and first and second order product density measures of general point patterns. The formulae allow us to study and design sampling schemes adapted to different classes of integrands by analyzing the effect of sampling density, weighting, and correlations among point locations separately.
We then focus on non-adaptive correlated stratified sampling patterns and specialize the formulae to derive closed-form and easy-to-analyze expressions of bias and variance for various stratified sampling strategies. Based on these expressions, we perform a theoretical error analysis for integrands involving the discontinuous visibility function.
We show that significant reductions in error can be obtained by considering alternative sampling strategies instead of the commonly used random jittering or low discrepancy patterns.
Such sampling methods are used in rendering for anti-aliasing, shadows/occlusion, geometry, etc.
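Why stratification matters for discontinuous integrands such as visibility: with jittered sampling only the one stratum containing the discontinuity contributes variance, so the error drops much faster than with independent random samples. A minimal 1-D experiment with a hypothetical step integrand; it only compares the two common baselines the paper starts from, not the alternative patterns it proposes.

```python
import random

def step(x, edge=0.3):
    # Discontinuous "visibility" integrand: 1 where visible, 0 where occluded.
    return 1.0 if x < edge else 0.0

def estimate(n, rng, stratified):
    if stratified:
        xs = [(i + rng.random()) / n for i in range(n)]  # jittered: one sample per stratum
    else:
        xs = [rng.random() for _ in range(n)]            # independent uniform samples
    return sum(step(x) for x in xs) / n

def empirical_variance(n, trials, stratified, seed=1):
    rng = random.Random(seed)
    vals = [estimate(n, rng, stratified) for _ in range(trials)]
    mean = sum(vals) / trials
    return sum((v - mean) ** 2 for v in vals) / (trials - 1)
```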

Multi-Resolution Meshes for Feature-Aware Hardware Tessellation
A general framework for the construction and rendering of non-uniform LODs suitable for hardware tessellation.
Its key component is a novel hierarchical representation of multiresolution meshes that allows us to finely control the topological locations of vertex splits and merges. We thus managed to relax the regularity of fractional tessellation, while retaining the efficiency of the respective GPU’s units.
Within our framework, we presented a dedicated mesh decimation scheme that can be driven by any edge-based error metric. In particular, by applying it with a feature-preserving geometric error, we leveraged hardware tessellation for feature-aware LOD rendering of meshes.

Quantized Global Parametrization
Global surface parametrization often requires the use of cuts or charts due to non-trivial topology. In recent years, a focus has been on so-called seamless parametrizations, where the transition functions across the cuts are rigid transformations with a rotation by some multiple of 90°.
Of particular interest, e.g. for quadrilateral meshing, paneling, or texturing, are those instances where, in addition, the translational part of these transitions is integral (or more generally: quantized). We show that finding even an arbitrary valid quantization (one that does not imply parametric degeneracies), let alone the optimal one, is a complex combinatorial problem.
We present a novel method that allows us to solve it, i.e. to find valid as well as good quality quantizations. It is based on an original approach to quickly construct solutions to linear Diophantine equation systems, exploiting the specific geometric nature of the parametrization problem. We thereby largely outperform the state-of-the-art, sometimes by several orders of magnitude.
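The paper's solver works on whole systems of equations and exploits the geometry of the parametrization. As a much simpler illustration of the underlying building block, this Python sketch solves a single linear Diophantine equation a·x + b·y = c with the extended Euclidean algorithm. It is a textbook method, not the authors' algorithm.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g, where g = gcd(a, b) >= 0."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:  # normalize the sign so g is non-negative
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


def solve_linear_diophantine(a, b, c):
    """One integer solution (x, y) of a*x + b*y == c, or None if none exists.

    Every other solution is (x + t*b//g, y - t*a//g) for some integer t.
    """
    g, x0, y0 = extended_gcd(a, b)
    if g == 0:
        return (0, 0) if c == 0 else None
    if c % g != 0:
        return None  # solvable iff gcd(a, b) divides c
    k = c // g
    return x0 * k, y0 * k
```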

GST(GPU-decodable Supercompressed Textures)(Source Code)
Modern GPUs supporting compressed textures allow interactive application developers to save scarce GPU resources such as VRAM and bandwidth. Compressed textures use fixed compression ratios whose lossy representations are of significantly poorer quality than traditional image compression formats such as JPEG. We present a new method in the class of supercompressed textures that provides an additional layer of compression to already compressed textures. Our texture representation is designed for endpoint compressed formats such as DXT and PVRTC and decoding on commodity GPUs. We apply our algorithm to commonly used formats by separating their representation into two parts that are processed independently and then entropy encoded. Our method preserves the CPU-GPU bandwidth during the decoding phase and exploits the parallelism of GPUs to provide up to 3X faster decode compared to prior texture supercompression algorithms. Along with the gains in decoding speed, our method maintains both the compression size and quality of current state of the art supercompressed texture representations.
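GST's exact bitstream isn't described here, so the following is only a sketch under assumptions. It decodes a standard 8-byte DXT1/BC1 block (two RGB565 endpoints followed by sixteen 2-bit indices). It then shows one plausible way to split a texture into an endpoint stream and an index stream, which is the kind of two-part separation the abstract describes. The entropy-coding stage is omitted. Interpolation rounding also differs slightly between real hardware decoders.

```python
import struct


def rgb565_to_rgb888(c):
    r5, g6, b5 = (c >> 11) & 0x1F, (c >> 5) & 0x3F, c & 0x1F
    # Replicate the high bits into the low bits so that 0x1F maps to 255.
    return ((r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2))


def decode_bc1_block(block):
    """Decode one 8-byte DXT1/BC1 block into 16 RGB tuples (row-major 4x4)."""
    c0, c1, bits = struct.unpack("<HHI", block)
    p0, p1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    if c0 > c1:  # four-color mode: two interpolated colors
        p2 = tuple((2 * a + b) // 3 for a, b in zip(p0, p1))
        p3 = tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))
    else:        # three-color mode: midpoint plus (transparent) black
        p2 = tuple((a + b) // 2 for a, b in zip(p0, p1))
        p3 = (0, 0, 0)
    palette = (p0, p1, p2, p3)
    # Texel i uses bits 2i..2i+1 of the index word.
    return [palette[(bits >> (2 * i)) & 0b11] for i in range(16)]


def split_streams(blocks):
    """Hypothetical split of a BC1 texture into an endpoint stream and an index stream."""
    endpoints = bytearray()
    indices = bytearray()
    for off in range(0, len(blocks), 8):
        endpoints += blocks[off:off + 4]      # two RGB565 endpoints
        indices += blocks[off + 4:off + 8]    # sixteen 2-bit selectors
    return bytes(endpoints), bytes(indices)
```

Keeping similar data together (endpoints with endpoints, selectors with selectors) is what gives a subsequent entropy coder something to work with.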

The overall result is an image the same size as DXT5, trading some alpha precision for a large reduction in color artifacts. The color map also lends itself well to lossless compression, reducing disk size significantly compared to unmodified DXT when used with wfLZ.

Lightmap Compression
Various tests and observations aimed at finding the most effective methods.
The main challenge of compressing lightmaps is that they often have a wider range than regular diffuse textures. This range is not as large as in typical HDR textures, but it’s large enough that using regular LDR formats results in obvious quantization artifacts. Lightmaps don’t usually have high-frequency details; they are often close to greyscale and only have smooth variations in chrominance.
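One common way to fit a moderate range like this into 8-bit channels is RGBM: RGB plus a shared multiplier stored in alpha. The linked tests may or may not end up recommending it, so treat this Python sketch as an illustration that rests on assumptions. The maximum range of 6.0 is a hypothetical choice.

```python
import math

RGBM_RANGE = 6.0  # hypothetical maximum lightmap value; pick per project


def encode_rgbm(rgb, max_range=RGBM_RANGE):
    """Pack a linear HDR color into four 8-bit values: RGB scaled by a shared multiplier M."""
    scaled = [c / max_range for c in rgb]
    m = min(max(max(scaled), 1e-6), 1.0)
    m = math.ceil(m * 255.0) / 255.0        # round M up so every channel / M stays <= 1
    r, g, b = (min(round(c / m * 255.0), 255) for c in scaled)
    return r, g, b, round(m * 255.0)


def decode_rgbm(rgbm, max_range=RGBM_RANGE):
    """Inverse of encode_rgbm (what the lightmap shader would do after sampling)."""
    r, g, b, a = rgbm
    m = a / 255.0 * max_range
    return (r / 255.0 * m, g / 255.0 * m, b / 255.0 * m)
```

Because the multiplier is shared by all three channels, near-greyscale texels with smooth chrominance lose little precision.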

Per-face parameterization for Texture Mapping of Geometry in Real-Time
Traditional UV-mapping often causes discontinuities, which commonly result in visible seams in the final result. If any change is made to the vertex positions or the topology, the UV map has to be redone. Mesh colors aims to avoid these problems by skipping the transformation to 2D space used in UV-mapping and associating color samples directly with the geometry of a mesh.
The results show that mesh colors is a viable alternative in a real-time renderer. Though not as fast as regular UV-mapped textures due to lack of hardware accelerated filtering operations, mesh colors is a realistic alternative for special cases where regular texture-mapping would be cumbersome to work with or produce sub-par results.
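In the mesh colors representation, a triangle at resolution R holds color samples at barycentric positions (i/R, j/R, k/R) with i + j + k = R. Samples on vertices and edges are shared with neighbouring faces; only the interior ones belong to the face itself. The Python sketch below classifies samples and does a nearest-sample lookup. The function names and the rounding fix-up are my own assumptions, not the thesis's renderer.

```python
def classify_sample(i, j, R):
    """Classify sample (i, j) of a triangle at resolution R as vertex, edge or face.

    The sample sits at barycentric position (i/R, j/R, k/R) with k = R - i - j.
    """
    k = R - i - j
    if min(i, j, k) < 0:
        raise ValueError("sample lies outside the triangle")
    zeros = (i == 0) + (j == 0) + (k == 0)
    if zeros == 2:
        return "vertex"   # shared by every face around that vertex
    if zeros == 1:
        return "edge"     # shared with the face across that edge
    return "face"         # owned by this face only


def sample_counts(R):
    """Samples stored per edge and per face interior (R >= 1)."""
    return R - 1, (R - 1) * (R - 2) // 2


def nearest_sample(b0, b1, R):
    """Nearest sample (i, j) for barycentric coordinates (b0, b1, 1 - b0 - b1)."""
    target = (b0 * R, b1 * R, (1.0 - b0 - b1) * R)
    idx = [round(t) for t in target]
    # Rounding may break i + j + k == R; adjust the coordinates rounded furthest.
    diff = sum(idx) - R
    order = sorted(range(3), key=lambda a: idx[a] - target[a], reverse=diff > 0)
    for a in order[:abs(diff)]:
        idx[a] += -1 if diff > 0 else 1
    return idx[0], idx[1]
```

Nearest-sample lookup is the unfiltered case. The lack of hardware filtering mentioned above means any smoother filtering has to be done in the shader.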

Virtual Texturing(Source Code) Software Virtual Textures Virtual Texturing in WebGL
Selection of Just-In-Time texture tiles for the compression of gigapixel textures
ARB_sparse_texture2 Real Virtual Texturing Adaptive Virtual Texture Rendering
Incremental loading of terrain textures
Volume Encoded UV-Maps Bindless Texturing
Virtual texturing is a solution to the problem of real-time rendering of scenes with vast amounts of texture data which does not fit into graphics or main memory. Virtual texturing works by preprocessing the aggregate texture data into equally-sized tiles and determining the necessary tiles for rendering before each frame. These tiles are then streamed to the graphics card and rendering is performed with a special virtual texturing fragment shader that does texture coordinate adjustments to sample from the tile storage texture.
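The address translation step can be sketched as follows. This is a simplified Python illustration with several assumptions: a single mip level, no tile borders, and a hypothetical dict-based page table that maps resident virtual tiles to slots in a square physical cache texture. Real implementations store the page table in a texture and do this lookup in the fragment shader.

```python
from dataclasses import dataclass


@dataclass
class PageTable:
    tiles_x: int       # virtual texture width in tiles
    tiles_y: int       # virtual texture height in tiles
    cache_tiles: int   # physical tile cache is cache_tiles x cache_tiles tiles
    entries: dict      # hypothetical: (vx, vy) -> (px, py) slot in the cache


def tiles_needed(uvs, table):
    """Feedback pass: the set of virtual tiles touched by a frame's UV samples."""
    return {(int(u * table.tiles_x), int(v * table.tiles_y)) for u, v in uvs}


def virtual_to_physical(uv, table):
    """Map a virtual UV in [0, 1)^2 to a UV in the physical tile cache texture."""
    fx, fy = uv[0] * table.tiles_x, uv[1] * table.tiles_y
    vx, vy = int(fx), int(fy)                 # which virtual tile
    if (vx, vy) not in table.entries:
        return None                           # not resident: stream it in, use a fallback
    px, py = table.entries[(vx, vy)]
    ox, oy = fx - vx, fy - vy                 # position inside the tile, in [0, 1)
    return ((px + ox) / table.cache_tiles, (py + oy) / table.cache_tiles)
```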

??? ... hmm
Dynamic Occlusion with Signed Distance Fields
Free Penumbra Shadows for Raymarching Distance Fields
Raymarching Distance Fields
Ray Marching Distance Fields in Real-time on WebGL
Raymarching Distance Fields: Concepts and Implementation
Enhanced Sphere Tracing hg_sdf library
Vector-to-Closest-Point Octree for Surface Ray-Casting
GPU Ray Tracer Using Ray Marching and Distance Fields
Raymarching is a 3D rendering technique, praised by programming enthusiasts for both its simplicity and speed. It has been used extensively in the demoscene, producing low-size executables and amazing visuals.
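The core loop (sphere tracing) is short enough to show in full. This is a minimal Python sketch that assumes a single hypothetical sphere SDF and arbitrary step and epsilon limits. Each step advances the ray by the distance field value, which is safe because no surface can be closer than that.

```python
import math


def sd_sphere(p, center=(0.0, 0.0, 5.0), radius=1.0):
    """Signed distance from point p to a (hypothetical) sphere; negative inside."""
    return math.dist(p, center) - radius


def sphere_trace(origin, direction, sdf, max_steps=128, eps=1e-4, max_dist=100.0):
    """March along a normalized ray, advancing by the SDF value at each step.

    Returns the hit distance t, or None if the ray escapes or runs out of steps.
    """
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * k for o, k in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t        # close enough to the surface: report a hit
        t += dist           # safe step: no surface is closer than dist
        if t > max_dist:
            break
    return None
```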

DIRT: Deferred Image-based Ray Tracing
Our method, designed entirely on the rasterization pipeline, changes the construction of the acceleration data structure from a per-fragment to a per-primitive basis. This allows it to simultaneously support three important objectives that generally conflicted in prior art: fast construction times, analytic intersection tests and reduced memory requirements.
In every frame, our algorithm operates in two stages: A compact representation of the scene geometry is built based on primitive linked-lists, followed by a traversal step that decouples the ray-primitive intersection tests from the illumination calculations; a process inspired by deferred rendering and the path integral formulation of light transport.
Efficient empty space skipping is achieved by exploiting several culling optimizations both in xy- and z space, such as pixel frustum clipping, depth subdivision and lossless buffer down-scaling.
An extensive experimental study is finally offered showing that our method advances the area of image-based ray tracing under the constraints posed by arbitrarily complex and animated scenarios.
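DIRT's actual layout, with depth subdivision and its culling tricks, is more involved. The basic per-pixel primitive linked list it builds on can still be sketched in Python. The GPU atomics are emulated sequentially and noted in the comments. The fragment input format is a hypothetical assumption.

```python
def build_linked_lists(width, height, fragments):
    """Build per-pixel linked lists of primitive ids.

    fragments: iterable of (x, y, primitive_id), a hypothetical stand-in for
    what rasterizing each primitive would produce.
    """
    head = [[-1] * width for _ in range(height)]  # -1 marks an empty list
    nodes = []                                    # node pool of (primitive_id, next)
    for x, y, prim in fragments:
        index = len(nodes)                        # GPU: atomicAdd on a node counter
        nodes.append((prim, head[y][x]))          # GPU: atomic exchange of head[y][x]...
        head[y][x] = index                        # ...so the new node becomes the head
    return head, nodes


def primitives_at(head, nodes, x, y):
    """Walk one pixel's list (newest first), as the traversal stage would."""
    out = []
    i = head[y][x]
    while i != -1:
        prim, i = nodes[i]
        out.append(prim)
    return out
```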

Last edited by ThaOneDon (2016-11-18 10:26:40)


#106 2016-11-01 08:10:00


Re: tech thread

Geometry-shader-based real-time voxelization and applications

Real-Time Rendering of Volumetric Clouds
Amortized Noise

Ambient Obscurance/Occlusion/GI
Interactive diffuse global illumination discretization methods for dynamic environments
The solutions proposed in this dissertation are based on approximations that concentrate on discretization methods of the problem domain.
First, we considered the creation of a discretized representation of the visibility function around an object, since exact visibility is expensive to compute in real time. Then we examined the creation of a discretized representation of the incoming light in order to estimate diffuse interactions from multiple light bounces. Finally, we investigated the creation of a discretized representation of the scene geometry and used it to accelerate the above process.

Stochastic Screen Space Reflections(Source Code)

Terrain Rendering/Level-of-Detail(LOD)/Occlusion Culling
Optimisations of the light culling algorithm in a Forward+ Rendering Pipeline
Tile-based systems, such as Forward+, utilise general-purpose compute to dynamically build linked lists of lights entirely on the GPU. However, the effectiveness of the pipeline is heavily dependent on the accuracy and performance of the light/tile intersection tests. This project focuses on improving these intersection tests, providing a more efficient Forward+ rendering pipeline.
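For reference, one common baseline light/tile test works like this: bound each tile with a view-space AABB (the tile rectangle plus the tile's min/max depth), then test every light's bounding sphere against that box using Arvo's closest-point distance. The Python sketch assumes the tile boxes have already been computed. It is not the optimised test from the project.

```python
from dataclasses import dataclass


@dataclass
class Light:
    center: tuple   # view-space position (x, y, z)
    radius: float   # radius of influence


def sphere_intersects_aabb(center, radius, box_min, box_max):
    """Arvo's test: squared distance from the sphere center to the box vs radius^2."""
    d2 = 0.0
    for c, lo, hi in zip(center, box_min, box_max):
        if c < lo:
            d2 += (lo - c) ** 2
        elif c > hi:
            d2 += (c - hi) ** 2
    return d2 <= radius * radius


def cull_lights(tiles, lights):
    """Build per-tile light index lists.

    tiles: dict tile_id -> (box_min, box_max), hypothetical view-space bounds
    built elsewhere from the tile rectangle and its min/max depth.
    """
    return {
        tile_id: [i for i, light in enumerate(lights)
                  if sphere_intersects_aabb(light.center, light.radius, lo, hi)]
        for tile_id, (lo, hi) in tiles.items()
    }
```

An AABB over-approximates the tile's frustum, so this test produces false positives. That is the kind of accuracy issue the project is about.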

A Caching System for a Dependency-aware Scene Graph(Poster)(Full Paper, Source Code included)
This thesis proposes a scene graph caching system that automatically creates an alternative representation of selected subgraphs. This alternative representation acts as a render cache in the form of a so-called instruction stream, which allows the cached subgraph to be rendered at lower CPU cost, and thus more quickly, than with a regular render traversal.
To update render caches incrementally in reaction to certain scene graph changes, a dependency system was developed. This system provides a model for describing and tracking changes in the scene graph and enables the scene graph caching system to update only those parts of the render cache that need to be updated.
The actual performance characteristics of the scene graph caching system were investigated using a number of synthetic test scenes in different configurations. These tests showed that the caching system is most useful in scenes with a high structural complexity (high geometry count and/or deep scene graph hierarchies) and moderate primitive count per geometry.

An Efficient Energy Transfer Inverse Kinematics Solution(Source Code - page 102)
Our method builds upon a mass-spring model and relies on force interactions between masses. Joint rotations are computed using the closed-form method with predefined local axis coordinates. Combining these two approaches gives convincing visual results with high performance.

Particle Systems Using 3D Vector Fields with OpenGL Compute Shaders(Source Code - last pages)
Particle systems and particle effects are used to create a realistic and appealing atmosphere in many virtual environments. However, they occupy a significant amount of computational resources. The demand for more advanced graphics increases with each generation, and particle systems likewise need to become increasingly detailed.
This thesis proposes a texture-based 3D vector field particle system, computed on the GPU, and compares it to an equation-based particle system.
Several tests were conducted comparing different situations and parameters. All of the tests measured the computational time needed to execute the different methods.
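A texture-based field comes down to two steps: sample a 3D grid of velocities with trilinear filtering (a GPU 3D texture sampler does this in hardware), then integrate particle positions. The Python sketch below emulates that on the CPU. The field[z][y][x] layout, the grid-space coordinates and the explicit Euler step are assumptions, not the thesis's compute shader.

```python
def sample_trilinear(field, p):
    """Trilinearly sample a vector field stored as field[z][y][x] -> (vx, vy, vz).

    p is in grid coordinates and is clamped to the grid (like clamp-to-edge
    addressing). The grid must be at least 2 cells in each dimension.
    """
    nz, ny, nx = len(field), len(field[0]), len(field[0][0])
    x = min(max(p[0], 0.0), nx - 1.0)
    y = min(max(p[1], 0.0), ny - 1.0)
    z = min(max(p[2], 0.0), nz - 1.0)
    x0, y0, z0 = min(int(x), nx - 2), min(int(y), ny - 2), min(int(z), nz - 2)
    tx, ty, tz = x - x0, y - y0, z - z0
    out = [0.0, 0.0, 0.0]
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                w = ((tx if dx else 1.0 - tx) * (ty if dy else 1.0 - ty)
                     * (tz if dz else 1.0 - tz))
                v = field[z0 + dz][y0 + dy][x0 + dx]
                for k in range(3):
                    out[k] += w * v[k]
    return tuple(out)


def step_particles(positions, field, dt):
    """One explicit Euler step: every particle moves with the sampled field velocity."""
    moved = []
    for p in positions:
        v = sample_trilinear(field, p)
        moved.append(tuple(pk + dt * vk for pk, vk in zip(p, v)))
    return moved
```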

Rigid Body Physics for Synthetic Data Generation
For synthetic data generation with concave collision objects, two physics simulation techniques are investigated: convex decomposition of mesh models for globally concave collision results, and a GPU-implemented rigid body solver using spherical decomposition and impulse-based physics with spatial-sorting-based collision detection.

Fast Collision Culling in Large-Scale Environments Using GPU Mapping Function
To take advantage of the high number of cores, a new mapping function is defined that enables GPU threads to determine which pair of objects to test without any global memory access.
These new optimized GPU kernel functions use the thread indexes and turn them into a unique pair of objects to test. A square root approximation technique is used based on Newton’s estimation, enabling the threads to only perform a few atomic operations.
A first characterization of the approximation errors is presented, enabling the fixing of incorrect computations. The I/O GPU streams are optimized using binary masks.
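The mapping itself comes from enumerating pairs row by row, k = i(i - 1)/2 + j with j < i, and inverting that with a square root. In this Python sketch, a float sqrt plus an integer fix-up loop stands in for the paper's Newton-based approximation and error correction. It shows the idea, not the authors' kernel.

```python
import math


def pair_count(n):
    """Number of unordered object pairs among n objects."""
    return n * (n - 1) // 2


def index_to_pair(k):
    """Map a linear thread index k to a unique object pair (i, j) with 0 <= j < i.

    Pairs are enumerated row by row: k = i*(i-1)/2 + j.
    """
    i = int((1.0 + math.sqrt(1.0 + 8.0 * k)) / 2.0)  # float estimate, may be off by one
    # Fix-up step: correct the row if the square root estimate rounded the wrong way.
    while i * (i - 1) // 2 > k:
        i -= 1
    while (i + 1) * i // 2 <= k:
        i += 1
    j = k - i * (i - 1) // 2
    return i, j
```

Each thread only needs its own index to find its pair, so no pair list has to be read from global memory.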

Enhanced Subpixel Morphological Antialiasing(SMAA)(Source Code)
A very efficient GPU-based MLAA implementation, capable of handling subpixel features seamlessly, and featuring an improved and advanced pattern detection & handling mechanism.

Shader Minifier(Source Code)(Latest Binary)

Fast Screen Space Curvature Estimation on GPU(Source-Shader Code/Demo etc)
Curvature is an important geometric property in computer graphics that provides information about the behavior of object surfaces. Exact curvature can only be calculated for a limited set of surface descriptions. Most of the time we deal with triangles, point sets or some other discrete representation of the surface, for which curvature computation is problematic. Moreover, most existing algorithms were developed for static geometry and can be slow for interactive modeling.
This paper proposes a screen space method which estimates the mean and Gaussian curvature at interactive rates. The algorithm uses positions and normals to estimate the curvature from the second fundamental form matrix. Using the screen space has advantages over the classical approach: low-poly geometry can be used and additional detail can be added with normal and bump maps.
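For reference, the standard formulas for K and H in terms of the first and second fundamental forms are easy to evaluate for a height field z = f(x, y). This Python sketch uses central finite differences and the Monge-patch expressions. It is a textbook illustration, not the paper's screen-space estimator, and the step size h is an assumption.

```python
import math


def curvatures_of_height_field(f, x, y, h=1e-3):
    """Mean (H) and Gaussian (K) curvature of the surface z = f(x, y) at (x, y).

    Derivatives come from central differences; the curvature formulas are the
    standard first/second fundamental form expressions for a Monge patch.
    """
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / (h * h)
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / (h * h)
    fxy = (f(x + h, y + h) - f(x + h, y - h) - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)
    # First fundamental form coefficients.
    E, F, G = 1 + fx * fx, fx * fy, 1 + fy * fy
    # Second fundamental form coefficients, with the normal pointing towards +z.
    w = math.sqrt(1 + fx * fx + fy * fy)
    L, M, N = fxx / w, fxy / w, fyy / w
    K = (L * N - M * M) / (E * G - F * F)
    H = (E * N - 2 * F * M + G * L) / (2 * (E * G - F * F))
    return H, K
```

For the top of a sphere of radius r this gives K ≈ 1/r² and H ≈ -1/r. The sign of H follows the chosen normal orientation.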

Temporal Coherence Methods in Real-Time Rendering(Warping library)(Source Code for examples)
Spatial and Spectral Methods for Irregular Sampling in Computer Graphics
Feature Aware Sampling and Reconstruction
Combining displacement mapping methods on the GPU for real-time terrain visualization

??? ... hmm
Denoising Point Sets via L0 Minimization
Surface reconstruction is a widely-used geometry processing tool for digitizing real-world objects. In many cases, the input to a reconstruction algorithm is a point set acquired from the object in question. However, despite new methods and acquisition hardware, errors such as noise and outliers inevitably appear in these point sets. Moreover, the quality of the reconstructed surface strongly depends on the quality of the input point set.
We present an anisotropic point cloud denoising method using L0 minimization. The L0 norm directly measures the sparsity of a solution, and we observe that many common objects can be defined as piecewise smooth surfaces with a small number of features. Hence, we demonstrate how to apply an L0 optimization directly to point clouds, which produces sparser solutions and sharper surfaces than either the L1 or L2 norms.
Our method can faithfully recover sharp features while at the same time smoothing the remaining regions even in the presence of large amounts of noise.
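The per-element proximal operators already show why L0 keeps features sharp. The L0 penalty either keeps a value untouched or zeroes it (hard thresholding at sqrt(2λ)). L1, by contrast, shrinks every value by λ (soft thresholding). Here is a small Python sketch with hypothetical gradient values; it is not the paper's anisotropic point-cloud solver.

```python
import math


def hard_threshold(values, lam):
    """Proximal operator of lam * ||x||_0: keep a value unchanged or set it to zero."""
    return [v if v * v > 2.0 * lam else 0.0 for v in values]


def soft_threshold(values, lam):
    """Proximal operator of lam * ||x||_1: shrink every value toward zero by lam."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in values]


# Hypothetical "gradients" along a noisy step: small noise plus one real jump.
gradients = [0.05, -0.03, 2.0, 0.04]
print(hard_threshold(gradients, 0.01))   # noise removed, jump kept at full height
print(soft_threshold(gradients, 0.01))   # noise only shrunk, and the jump shrinks too
```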

Last edited by ThaOneDon (2017-02-08 10:29:13)


#107 2016-12-02 23:02:16


Re: tech thread

GPGPU Scalable Compiler Optimizations
Unite 2016 - Tools, Tricks and Technologies for Reaching Stutter Free 60 FPS in INSIDE

Ambient Obscurance/Occlusion/GI
original Spherical Harmonics(Source Code, etc)

Neural Network Ambient Occlusion(NNAO)(Homepage(Source Code/Shaders/Filters))
We build a database of camera depths, normals, and ground truth ambient occlusion as calculated using an offline renderer, and use a neural network to learn a mapping from the depth and normals surrounding the pixel to the ambient occlusion of that pixel. Once trained we convert the neural network into an optimised shader which is more accurate than existing techniques, has better performance, no user parameters other than the occlusion radius, and can be computed in a single pass allowing it to be used as a drop-in replacement.

Shadows/Shadow Mapping-Volumes
An evaluation of moving shadow detection techniques
Shadows of moving objects may cause serious problems in many computer vision applications, including object tracking and object recognition. In common object detection systems, due to having similar characteristics, shadows can be easily misclassified as either part of moving objects or independent moving objects. To deal with the problem of misclassifying shadows as foreground, various methods have been introduced. This paper addresses the main problematic situations associated with shadows and provides a comprehensive performance comparison on up-to-date methods that have been proposed to tackle these problems.

Convex Hull Problems(Streaming Geometry)(Source Code)
The convex hull is a well-studied problem with a large body of results and algorithms in a variety of contexts.
We consider three contexts: when only an approximate convex hull is required, when the input points come from a (potentially unbounded) data stream, and when layers of concentric convex hulls are required.
Existing algorithms for these problems either do not achieve optimal runtime and linear space, or are overly complex and difficult to implement and use in practice. This thesis remedies this situation by proposing novel algorithms that are both simple and optimal. The simplicity is achieved by independently computing four sets of monotone convex layers in time and linear space. These are then merged together in O(n log n) time.
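As a point of comparison for "simple and optimal", Andrew's monotone chain algorithm builds a single convex hull in O(n log n) time from a lower and an upper monotone chain. This Python sketch is that classic algorithm. It does not implement the thesis's layered or streaming methods.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:                        # lower chain, left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):              # upper chain, right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop each chain's last point: it is the first point of the other chain.
    return lower[:-1] + upper[:-1]
```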

Project Chrono(Source Code)
A multi-physics simulation engine/C++ library based on a platform-independent, open-source design.

Variance reduction using interframe coherence for animated scenes
In an animated scene, geometry and lighting often change in unpredictable ways. Rendering algorithms based on various methods are usually employed to precisely capture all features of an animated scene. However, these methods typically take a long time to produce a noise-free image.
In this paper, we propose a variance reduction technique which exploits coherence between frames.
Firstly, we introduce a dual cone model to measure the incident coherence intersecting camera rays in object space. Secondly, we allocate multiple frame buffers to store image samples from consecutive frames. Finally, the color of a pixel in one frame is computed by borrowing samples from neighboring pixels in current, previous, and subsequent frames. Our experiments show that noise is greatly reduced by our method since the number of effective samples is increased by use of borrowed samples.

Last edited by ThaOneDon (2016-12-28 18:17:49)


#108 2017-01-08 07:59:57


Re: tech thread

PBR (Physically Based Rendering)
Renderers Laugh Engine(Vulkan based)

Ambient Obscurance/Occlusion/GI
Real-Time Global Illumination using Precomputed Light Field Probes (Homepage(Source Code etc)) NVIDIA's

Raytracing Reflection, Refraction, Fresnel, Total Internal Reflection, and Beer’s Law(Shader Code)
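The usual building blocks behind that kind of shader are Schlick's Fresnel approximation (with a total-internal-reflection check when leaving a denser medium) and Beer-Lambert absorption. Here they are as a Python sketch. The function names and parameters are my own, not the linked shader's.

```python
import math


def schlick_fresnel(cos_i, n1, n2):
    """Schlick approximation of reflectance for light going from medium n1 into n2.

    cos_i is the cosine of the incident angle (0..1). Returns 1.0 on total
    internal reflection.
    """
    if n1 > n2:
        # Leaving a denser medium: check for TIR and use the transmitted angle.
        ratio = n1 / n2
        sin2_t = ratio * ratio * (1.0 - cos_i * cos_i)
        if sin2_t > 1.0:
            return 1.0  # total internal reflection: everything is reflected
        cos_i = math.sqrt(1.0 - sin2_t)
    r0 = ((n1 - n2) / (n1 + n2)) ** 2
    return r0 + (1.0 - r0) * (1.0 - cos_i) ** 5


def beer_lambert(absorbance, distance):
    """Per-channel transmittance after travelling `distance` through an absorbing medium."""
    return tuple(math.exp(-a * distance) for a in absorbance)
```

The reflectance weights the reflected ray's color and (1 - reflectance) weights the refracted one. The Beer-Lambert factor then tints light by how far it travelled inside the object.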

Cubemap based collision detection
The usual algorithm requires computing an octree for the scenery meshes. Collisions between the character and the scenery are then computed using a sphere-octree collision detection algorithm. The octree can be either precomputed and included in the mesh data, or computed when the application loads.
Our algorithm computes physics by rendering a world axis aligned depth cubemap. It can work with low end graphic devices, and computations are done mainly on GPU.

Terrain Rendering/Level-of-Detail(LOD)/Occlusion Culling
Irregular Morphing for Real-Time Rendering of Large Terrain

Efficient Approximation of HRTF in Subbands for Accurate Sound Localization(Source Code)
Results indicate that the proposed algorithms preserve the salience of spatial cues, even for relatively high approximation tolerances, yielding computationally very efficient implementations.

High Dynamic Range Imaging Pipeline on the GPU
In this article we aim to fill a gap by providing a detailed description of how the HDRI pipeline, from HDR image assembly to tone mapping, can be implemented exclusively on the GPU. We also explain the trade-offs that need to be made to improve efficiency and show timing comparisons between CPU and GPU implementations.
Another goal of this paper is to demonstrate how both the global and local versions of this operator can be efficiently implemented using fragment shaders. Unlike previous work, we show that the implementation of this operator requires neither expensive convolution nor Fourier transform operations to compute local adaptation luminances.
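The excerpt doesn't name "this operator". Assuming it is Reinhard et al.'s photographic tone reproduction operator, its global version reduces to a log-average luminance (a reduction on the GPU) followed by a per-pixel curve. This is a minimal Python sketch; the key of 0.18 and the epsilon are conventional but assumed values.

```python
import math


def reinhard_global(luminances, key=0.18, eps=1e-6):
    """Global photographic tone mapping of world luminances to display values in [0, 1)."""
    n = len(luminances)
    # Log-average ("key") of the scene; on a GPU this is a parallel reduction.
    log_avg = math.exp(sum(math.log(eps + lw) for lw in luminances) / n)
    out = []
    for lw in luminances:
        scaled = key / log_avg * lw           # map the log-average to the chosen key
        out.append(scaled / (1.0 + scaled))   # compress highlights smoothly
    return out
```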

Last edited by ThaOneDon (2017-01-15 07:54:55)


#109 2017-01-12 20:46:29


Re: tech thread

Hey ThaOneDon,
  It's great that you're compiling a list of interesting technologies, but the list has grown to the size that it's overwhelming for anyone who wants to take a look.
  I would recommend instead keeping a very short and concise list of suggestions, providing your own reasoning on why it should be implemented, backed by your own experience in that system. Throwing out information like this (much of which is quite irrelevant to Tesseract) is not likely to attract any attention.
  Better yet, make your own proof-of-concept by integrating the changes yourself and show it off.
  Best of luck!

Last edited by Hypernova^ (2017-01-12 20:48:45)


#110 2017-01-12 23:34:32


Re: tech thread

Hypernova^ wrote:

Hey ThaOneDon,
  It's great that you're compiling a list of interesting technologies, but the list has grown to the size that it's overwhelming for anyone who wants to take a look.
  I would recommend instead keeping a very short and concise list of suggestions, providing your own reasoning on why it should be implemented, backed by your own experience in that system. Throwing out information like this (much of which is quite irrelevant to Tesseract) is not likely to attract any attention.
  Better yet, make your own proof-of-concept by integrating the changes yourself and show it off.
  Best of luck!

I think at this point it's more of a tech blog thing and he just adds to it every once in a while when he stumbles onto something interesting :P


#111 2017-01-13 05:48:45


Re: tech thread

I've already started to scrub stuff that overlaps or has been replaced by something better.
I'm also doing all I can think of to format it in a way that makes things easier to find.
How "compact" it is depends on the tech/papers themselves; if stuff is completely different I can't throw it out. (Source code etc. is also very hard to find, actually; it's lucky if any of this has any.)
Everything I've added is already picked with the mindset that it has to be somehow beyond what Tesseract accomplishes, to limit the scope realistically.
I've also started to keep alternatives, which I guess I should limit to something like 2.
The reason I'm not keeping it short is that I want this to be the most definitive, effective, up-to-date and useful tech "post" or blog etc. that covers all the bases... that I can make. lol
I think it's getting plenty of views; 60k is not bad.

Last edited by ThaOneDon (2017-01-13 06:14:23)


#112 2017-02-01 00:58:59


Re: tech thread

An Incremental Rendering VM

Real-Time Editing of Procedural Terrains More(Source Code etc)

Ambient Obscurance/Occlusion/GI
Screen Space Reflections in Killing Floor 2(Source Code included)

Generalized Canonical Time Warping(Source Code)

Generic Convex Collision Detection using Support Mapping
libccd(Source Code) - 3-clause BSD Licensed. Open Source.
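The idea behind support-mapping-based collision is that a convex shape only needs a function that returns its furthest point in a given direction. GJK then searches the Minkowski difference A - B for the origin. This is a minimal 2D boolean GJK in Python for illustration. It is not libccd's API; libccd works in 3D and also provides MPR.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def neg(a):
    return (-a[0], -a[1])


def triple(a, b, c):
    # 2D version of (a x b) x c = b * (a . c) - a * (b . c).
    ac, bc = dot(a, c), dot(b, c)
    return (b[0] * ac - a[0] * bc, b[1] * ac - a[1] * bc)


def polygon_support(vertices):
    """Support mapping of a convex polygon: the vertex furthest along direction d."""
    return lambda d: max(vertices, key=lambda v: dot(v, d))


def gjk_intersect(support_a, support_b, max_iter=64):
    """Boolean GJK: True if convex shapes A and B overlap (touching counts)."""
    def support(d):
        # Support point of the Minkowski difference A - B in direction d.
        return sub(support_a(d), support_b(neg(d)))

    simplex = [support((1.0, 0.0))]
    d = neg(simplex[0])
    for _ in range(max_iter):
        if d == (0.0, 0.0):
            return True                      # origin lies on the current simplex
        p = support(d)
        if dot(p, d) < 0:
            return False                     # A - B cannot reach the origin: separated
        simplex.append(p)
        if len(simplex) == 2:
            b, a = simplex
            ab, ao = sub(b, a), neg(a)
            d = triple(ab, ao, ab)           # perpendicular to ab, towards the origin
        else:
            c, b, a = simplex
            ab, ac, ao = sub(b, a), sub(c, a), neg(a)
            ab_perp = triple(ac, ab, ab)     # normal of edge ab pointing away from c
            ac_perp = triple(ab, ac, ac)     # normal of edge ac pointing away from b
            if dot(ab_perp, ao) > 0:
                simplex, d = [b, a], ab_perp
            elif dot(ac_perp, ao) > 0:
                simplex, d = [c, a], ac_perp
            else:
                return True                  # origin inside the triangle
    return False
```

Because the algorithm only ever calls the support functions, the same loop works for circles, capsules or hulls once each shape provides its own support mapping.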

Defending Continuous Collision Detection against Errors
Numerical errors and rounding errors in continuous collision detection (CCD) can easily cause collision detection failures if they are not handled properly.
This paper demonstrates a set of simple modifications to make a basic CCD implementation failure-proof. Using error analysis, we prove the safety of these methods and we formulate suggested tolerance values to reduce false positives.

Improved Geometry Buffer Anti-Aliasing(GBAA+)(Source Code)
GBAA is an improved version of GPAA (Geometric Post-process Anti-Aliasing).
The underlying idea is that, instead of searching the final image for sharp edges to guess where the geometric edges are, you use the edge information in its "pure form", received directly from the renderer.
The actual improvement is that the direction and distance to the triangle edges are calculated in a geometry shader. This eliminates the need for pre-processing geometry and rasterizing lines, reduces memory usage and, most importantly, removes the dependence of performance on the geometric complexity of the scene.

Fundamental computational geometry on the GPU

Last edited by ThaOneDon (2017-02-08 10:06:29)


#113 2017-02-28 21:29:55


Re: tech thread

Reversed-Z Logarithmic
A Method for Automatically Creating and Using Billboards to Increase the Speed of Object Rendering
Real Time Depth Sorting of Transparent Fragments
Phenomenological Transparency
Infinite Sparse Volumes
Real-Time Volumetric Lighting using SVOs
Generating Compelling Procedural 3D Environments and Landscapes

A Non-linear GPU Thread Map for Triangular Domains
There is a stage in the GPU computing pipeline where a grid of thread-blocks, in parallel space, is mapped onto the problem domain, in data space. Threads that fall inside the domain perform computations while threads that fall outside are discarded at runtime.
In this work we study the case of mapping threads efficiently onto triangular domain problems and propose a block-space linear map λ(ω), based on the properties of the lower triangular matrix, that reduces the number of unnecessary threads from O(n²) to O(n).
This study is about the performance of algorithms, with a similar purpose to Carmack and Lomont's fast inverse square root, which uses the magic number “0x5f3759df” followed by Newton-Raphson iterations.
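For reference, the fast inverse square root trick mentioned above can be reproduced in Python with the struct module. Reinterpret the float32 bits as an integer, subtract half of that integer from the magic constant to get an initial guess, then refine with Newton-Raphson. The iteration count is a parameter here; the well-known Quake III code used a single iteration.

```python
import struct


def fast_inverse_sqrt(x, iterations=1):
    """Approximate 1/sqrt(x) for positive inputs using the float32 bit-level trick."""
    i = struct.unpack("<I", struct.pack("<f", x))[0]   # reinterpret float32 bits as uint32
    i = 0x5F3759DF - (i >> 1)                          # magic-constant initial guess
    y = struct.unpack("<f", struct.pack("<I", i))[0]   # back to a float
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * x * y * y)                # Newton-Raphson refinement step
    return y
```

One iteration already lands within a fraction of a percent; each further iteration roughly squares the relative error.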

Ambient Obscurance/Occlusion/GI/Reflections
Global illumination effects with sampled geometry

Wrap Shading Extension to Energy-Conserving Wrapped Diffuse

Optimizing a Water Simulation based on Waterfront Parameter Interpolation
Automatic Optimization for Large-Scale Real-Time Coastal Water Simulation
Real-time Interactive Water Waves
Real-Time Screen Space Fluid Rendering with Scene Reflections
To solve the singular problem of water waves obtained with the traditional model, a hybrid deep-shallow-water model is estimated using an automatic coupling algorithm. It can handle arbitrary water depth and different underwater terrain. The coastline, a characteristic feature of coastal terrain, is detected using collision detection. Then, unnecessary water grid cells are simplified by an automatic simplification algorithm according to depth. Finally, the model is calculated on the CPU and the simulation is implemented on the GPU.

Terrain Rendering/Level-of-Detail(LOD)/Occlusion Culling
Real-Time Level-of-Detail (Algorithms, Notes, Code etc, ALL included)

6D Frictional Contact for Rigid Bodies
Enhanced FFD-AABB Collision Algorithm for Deformable Objects
Particle Simulation with GPUs

A Novel GPU-Based Deformation Pipeline
A deformation pipeline that is independent of the integration solver used and allows fast rendering of deformable soft bodies on the GPU. The proposed method exploits the transform feedback mechanism of modern GPUs to bypass read-back, reusing the modified positions and/or velocities of the deformable object in a single pass in real time.

Steam Audio(License) (Copyrighted - Valve Corporation) - HRTF etc, Requires a License but Free
Geometric-based reverberator using acoustic rendering networks


Geometry Batching Using Texture-Arrays
Batching can be used to group and sort geometric primitives into batches to reduce the number of required state changes, whereas the size of the batches determines the number of required draw-calls, and therefore, is critical for rendering performance.
For example, in the case of texture atlases, which provide an approach for efficient texture management, the batch size is limited by the efficiency of the texture-packing algorithm and the texture resolution itself.
This paper presents a pre-processing approach and rendering technique that overcomes these limitations by further grouping textures or texture atlases and thus enables the creation of larger geometry batches. It is based on texture arrays in combination with an additional indexing schema that is evaluated at run-time using shader programs.
Basically, it facilitates flexible partitioning of geometry.

Exact, robust, and efficient regularized Booleans on general 3D meshes
Besides their utility and importance, Booleans are challenging to compute when dealing with meshes, because of topological changes, geometric degeneracies, etc.
We overcome these limitations and present an exact and robust approach performing on general meshes, required to be only closed and orientable.
Our method is based on a few geometric and topological predicates that allow us to handle all input/output cases considered degenerate in existing solutions, such as voids, non-manifold, disconnected, and unbounded meshes, and to robustly deal with special input configurations.
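The paper's predicates aren't listed here. The basic ingredient of exact geometric computation can still be illustrated with an orientation test evaluated in rational arithmetic, so coplanarity is decided exactly instead of by a floating-point tolerance. This Python sketch uses fractions.Fraction. It is slow; production code usually relies on adaptive floating-point predicates such as Shewchuk's.

```python
from fractions import Fraction


def orient3d(a, b, c, d):
    """Exact sign of det[b - a, c - a, d - a] for 3D points given as floats or ints.

    Returns +1 or -1 depending on which side of the plane through a, b, c the
    point d lies, and 0 if the four points are exactly coplanar.
    """
    ax, ay, az = (Fraction(v) for v in a)      # Fraction(float) is an exact conversion
    bx, by, bz = Fraction(b[0]) - ax, Fraction(b[1]) - ay, Fraction(b[2]) - az
    cx, cy, cz = Fraction(c[0]) - ax, Fraction(c[1]) - ay, Fraction(c[2]) - az
    dx, dy, dz = Fraction(d[0]) - ax, Fraction(d[1]) - ay, Fraction(d[2]) - az
    det = (bx * (cy * dz - cz * dy)
           - by * (cx * dz - cz * dx)
           + bz * (cx * dy - cy * dx))
    return (det > 0) - (det < 0)
```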

Robust Polyhedral Minkowski Sums with GPU Implementation
A convolution algorithm for Minkowski sums of polyhedra with robust CPU and GPU implementations.
The algorithm contains several innovations that support distributed computation. The computational bottleneck is finding the intersecting pairs of facets. We enabled a distributed algorithm by creating a novel type of kd-tree that eliminates duplicate entries without using global memory.
The memory bottleneck is the arrangements of the facets of the convolution. We removed this bottleneck by processing the facets in groups and by removing most of the blocked geometry.
We solved the robustness problem, which is the primary implementation challenge for computational geometry algorithms, using our ACP strategy.

The Implementation of a Scalable Texture Cache(Source Code)

Real Time Rendering of Parametric Surfaces on the GPU (Algorithms, Notes, Code etc, ALL included)

Dynamic and Robust Local Clearance Triangulations
A optimization of A* algorithm to make it close to human pathfinding behavior
Time-Bounded Best-First Search for Reversible and Non-reversible Search Graphs

Optimization Techniques for 3D Graphics Deployment
A Catalog of Stream Processing Optimizations
Light Shafts Rendering for Indoor Scenes
On Some Interactive Mesh Deformations
Adaptive Physically Based Models in Computer Graphics
Algorithms and Criteria for Volumetric Centroidal Voronoi Tessellations
Shadow Mapping Algorithms: Applications and Limitations
A Comprehensive Study on Pathfinding Techniques for Robotics and Video Games

??? ... hmm
Fast Data Parallel Radix Sort Implementation in DirectX 11 Compute Shader
The algorithms implement several optimization techniques to take advantage of the HW architecture such as:
taking advantage of a kernel fusion strategy; exploiting the synchronous execution of threads in a warp/wavefront to eliminate the need for barrier synchronization; using shared memory across threads within a group; managing bank conflicts; eliminating divergence by avoiding branch conditions and fully unrolling loops; using adequate group/thread dimensions to increase HW occupancy; and applying highly data-parallel algorithms to accelerate the scan operations.
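Those optimisations apply to a fixed pass structure, which can be sketched sequentially. Each LSD radix pass computes a digit histogram, takes an exclusive scan of it, and does a stable scatter. This Python sketch is only a CPU reference for that structure; the comments mark where shared memory and the scan would go on the GPU. It assumes bits_per_pass divides 32.

```python
def radix_sort_u32(keys, bits_per_pass=4):
    """LSD radix sort of unsigned 32-bit keys, one digit per pass.

    Each pass mirrors the GPU structure: per-digit histogram, exclusive
    prefix sum (scan) of the histogram, then a stable scatter.
    """
    if 32 % bits_per_pass:
        raise ValueError("bits_per_pass must divide 32")
    radix = 1 << bits_per_pass
    mask = radix - 1
    for shift in range(0, 32, bits_per_pass):
        # 1) Histogram of the current digit (GPU: per-group counts in shared memory).
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # 2) Exclusive scan gives the first output slot of each digit.
        offsets, total = [0] * radix, 0
        for digit in range(radix):
            offsets[digit], total = total, total + counts[digit]
        # 3) Stable scatter: keys keep their relative order within a digit.
        out = [0] * len(keys)
        for k in keys:
            digit = (k >> shift) & mask
            out[offsets[digit]] = k
            offsets[digit] += 1
        keys = out
    return keys
```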

Simulating Rigid Body Fracture with Surface Meshes More Notes/Source Code etc
By combining an indirect boundary integral formulation, explicit surface tracking and a kernel-independent fast multipole method, the presented method is effective for rigid body brittle fracture using only the boundary surface mesh.
Existing explicit mesh tracking methods are modified to support evolving cracks directly in the triangle mesh representation, giving highly detailed fractures with sharp features, independent of any volumetric sampling (unlike tetrahedral mesh or level set approaches) and avoids the need for calculations; the triangle mesh representation also allows simple integration into rigid body engines.
It is accurate, and at the same time computationally economical, and it successfully resolves crack evolution in various settings.

A study of parallelism-locality tradeoffs across memory hierarchy
They first study parallelism vs. locality tradeoffs in each layer of the memory hierarchy, as well as the cross-layer interactions.
Using the observations from the characterization study they propose a dynamic memory migration technique which optimizes both parallelism and locality metrics in the memory subsystem.
They also break application frames into smaller ones to exploit memory locality and significantly reduce memory bandwidth requirements (Cooperative Parallelization).

PICKLOCK: A Deadlock Prediction Approach under Nested Locking
The solution proposed for predicting potential deadlocks and for confirming them involves taking a concurrent program and a test harness, executing the program under test to get an arbitrarily interleaved execution, and then predicting alternate executions leading to deadlocks.
Finally, in order to check if a real deadlock has been found, the program being tested is re-executed precisely under these predicted deadlocking schedules.
The algorithm is based on lock-sets and acquisition histories, which only ensure that the predicted run respects lock acquisitions and releases in the run.
The crucial observation is that acquisition histories give not only enough traction to detect alternate deadlocking interleavings, but also provide an effective mechanism to re-schedule the precise interleaving under which deadlock will occur; the latter helps our re-execution engine to run the predicted schedule and confirm the deadlock, which entirely eliminates false positives.

Last edited by ThaOneDon (2017-08-06 15:58:42)


#114 2017-04-01 04:43:48


Re: tech thread

Fragment Reduction on GPU with Content Adaptive Sampling
Small G-Buffers
Complex Transformative Portal Interaction
Procedural Terrain Generation using a Level of Detail System (More Source Code)

Ambient Obscurance/Occlusion/GI/Reflections
A Radiance Cache Method for Highly Glossy Surfaces
Subsurface Scattering-Based Object Rendering Techniques
Horizon Occlusion for Normal Mapped Reflections

Animated Foliage and Cloth (gif of this)
The GPU is very effective at vector math, and the vertex program already loops through all of the vertices to convert them to screen space and send them to the fragment program. Just before that, we can simply add an offset to these positions before sending them down the line.
By supplying properties for material stiffness, wind direction and wind speed, a more realistic look can be achieved that correlates more closely with what actually happens in nature. A second property, describing how firmly each part of an object is attached, could be supplied through vertex color.
The second scenario relates to vegetation. By building the bending properties into the shader instead, you wouldn't need any collision detection for each individual plant; you would just look at the distance between the vertex and the player and scale the bending based on that.
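Here is a minimal sketch of the per-vertex offset described above. It is written in Python for readability rather than as shader code. The function name, the sine-based sway and the linear distance falloff are assumptions and are not taken from the linked post. `attachment` stands in for the vertex-color channel.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def displace_vertex(pos: Vec3, wind_dir: Vec3, wind_speed: float,
                    stiffness: float, attachment: float, time: float,
                    player_pos: Vec3, bend_radius: float) -> Vec3:
    """Offset one vertex before it is transformed to screen space.

    attachment: 0 = rigidly attached (e.g. a trunk), 1 = free (a leaf tip);
    in a real shader this would be read from a vertex color channel.
    """
    # Stiffer materials flex less; rigidly attached vertices do not move at all.
    flex = attachment / (1.0 + stiffness)

    # Cheap oscillation in [0, 1]; the phase depends on position so that
    # neighbouring plants do not sway in lockstep.
    sway = math.sin(time * wind_speed + pos[0] + pos[2]) * 0.5 + 0.5
    amount = wind_speed * flex * sway
    x = pos[0] + wind_dir[0] * amount
    y = pos[1] + wind_dir[1] * amount
    z = pos[2] + wind_dir[2] * amount

    # Player interaction without per-plant collision detection: push the
    # vertex away from the player (in the ground plane), scaled by proximity.
    dx, dz = x - player_pos[0], z - player_pos[2]
    dist = math.hypot(dx, dz)
    if 0.0 < dist < bend_radius:
        push = (1.0 - dist / bend_radius) * flex
        x += dx / dist * push
        z += dz / dist * push
    return (x, y, z)


# A free vertex at the origin, wind along +x, player far away.
print(displace_vertex((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 1.0, 0.0, 1.0, 0.0,
                      (100.0, 0.0, 100.0), 1.0))  # (0.5, 0.0, 0.0)
```

Stiffness and attachment both scale the same `flex` term, so the wind and the player push are both damped for rigid parts.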

Parallel explicit FEM algorithms using GPU's

(Alpha Mipmaps)

CoSMo: Intent-based Composition of Shader Modules
Shading Framework for Modern Rendering Engines
A Shader Framework for Rapid Prototyping
Automated Combination of Real-Time Shader Programs
The SuperShader
SuperShader aka Uber-Shader aka Meta Shader

Filtering Non-Linear Transfer Functions on Surfaces Supplemental (Algorithms, Notes, Code etc, ALL included)

Feature-Adaptive Catmull-Clark Subdivision on the GPU (Algorithms, Notes, Code etc, ALL included)
Feature-Adaptive Rendering of Loop Subdivision Surfaces on Modern GPUs

Algorithms for Efficient Computation of Convolution

??? ... hmm
Shading with Dynamic Lightmaps
They implement a system that works around the main disadvantage of lightmaps: that they only work with static lights.
It shows how the light distribution across a scene from a moving light source with a cyclic path can be stored in a small, finite number of processed samples, while preserving the feeling of continuity between one sample and the next.
Various techniques used to solve these problems are also useful outside the scope of this paper.
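The paper's actual storage scheme isn't spelled out here, so this is only a sketch. It assumes K lightmaps are baked at evenly spaced points along the light's loop and blended linearly at runtime. `sample_cyclic_lightmap` and the flattened texel layout are hypothetical. Wrapping the last sample back to the first is what keeps the cycle continuous.

```python
from typing import List

Lightmap = List[float]  # flattened texel intensities of one precomputed lightmap


def sample_cyclic_lightmap(samples: List[Lightmap], phase: float) -> Lightmap:
    """Blend the two precomputed lightmaps that bracket `phase`.

    samples[i] is assumed to be baked at phase i / len(samples) of the
    light's cyclic path; phase is in cycles (1.0 == one full loop).
    """
    k = len(samples)
    pos = (phase % 1.0) * k
    i = int(pos) % k
    j = (i + 1) % k  # wrap to the first sample so the loop has no seam
    f = pos - int(pos)
    return [(1.0 - f) * a + f * b for a, b in zip(samples[i], samples[j])]


baked = [[0.0, 0.0], [1.0, 2.0]]
print(sample_cyclic_lightmap(baked, 0.25))  # [0.5, 1.0]
```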

Job System and ParallelFor

Last edited by ThaOneDon (2017-04-19 16:09:54)


#115 2017-05-02 05:37:27


Re: tech thread

PBR (Physically Based Rendering)
A Physically-Based Reflectance Model Combining Reflection and Diffraction

Terrain Rendering/Level-of-Detail(LOD)/Occlusion Culling/Caching
Applying Tessellation to Clipmap Terrain Rendering More (solutions for clipmap/heightmap issues)
Implementation Details of Sample App Using Hybrid Terrain Representation (voxels+heightmap, and deformation)

A Bigger Mathematical Picture for Computer Graphics
Fast Fourier Transform Efficient FFT Algorithms

??? ... hmm
Revised fast convolution
Convolution is a mathematical tool used in filtering, correlation, compression and many other applications. Although the concept of convolution is not new, its efficient computation is still an open topic. As data volumes keep growing, there is increasing demand for fast processing of large data sets.
The fast convolution is computed recursively for the case where one new signal sample (or a small portion of new samples) arrives within the given period N of a realization x(n), replacing the oldest sample (or portion of samples). The original recursive expression substantially reduces the number of operations compared with the ordinary FFT procedure, which is applicable only when the samples x(n) are fixed. The recursive algorithm could be effective in real-time applications for very large N.
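The paper's exact recursive expression isn't quoted here. The sketch below shows a well-known recursion of the same kind, the sliding DFT, in which one new sample replaces the oldest one and every spectral bin is updated in O(1). It is not necessarily the paper's formulation. The direct `dft`/`idft` helpers exist only to initialise and check the result. A convolution then follows from the pointwise product of spectra.

```python
import cmath
from typing import List


def dft(x: List[float]) -> List[complex]:
    """Direct N-point DFT, used here only to initialise and check the spectrum."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft(spectrum: List[complex]) -> List[complex]:
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]


def sliding_dft_update(spectrum: List[complex], oldest: float, newest: float) -> List[complex]:
    """Spectrum of the window after dropping `oldest` and appending `newest`.

    X_k(n) = (X_k(n-1) - x[n-N] + x[n]) * exp(+2j*pi*k/N): O(N) work per new
    sample instead of recomputing a full FFT.
    """
    n = len(spectrum)
    return [(spectrum[k] - oldest + newest) * cmath.exp(2j * cmath.pi * k / n)
            for k in range(n)]


window = [1.0, 2.0, 3.0, 4.0]
spectrum = sliding_dft_update(dft(window), oldest=1.0, newest=5.0)  # window is now [2, 3, 4, 5]

# Circular convolution of the current window with a kernel = pointwise product of spectra.
kernel_spectrum = dft([0.0, 1.0, 0.0, 0.0])  # a one-sample circular delay
print([round(v.real, 6) for v in idft([a * b for a, b in zip(spectrum, kernel_spectrum)])])
# [5.0, 2.0, 3.0, 4.0]
```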

Last edited by ThaOneDon (2017-05-09 06:54:11)


#116 2017-06-11 18:58:38


Re: tech thread

Flying Edges: A High-Performance Scalable Isocontouring Algorithm

Dijkstra-based Terrain Generation Using Advanced Weight Functions

Conformal Surface Morphing with Applications on Facial Expressions
Morphing is the process of changing one figure into another. This study shows some numerical methods for 3D surface morphing by deformable modeling and conformal mapping. It is well known from the Riemann mapping theorem that there exists a Riemann conformal mapping from a simply connected surface onto a unit disk, unique up to a Möbius transformation of the disk. A 3D surface deformable model can be built via various approaches, such as mutual parameterization from direct interpolation or surface matching using landmarks. In this paper, they take advantage of the unique representation of 3D surfaces by the mean curvatures and the conformal factors associated with the Riemann mapping.
In 3D surface morphing, similar to traditional morphing approaches based on boundary representation, a warp has to be created via feature correspondence, and interpolation between shapes based on the warp is employed to generate the morphing sequence. By taking advantage of the conformal parameterization and the unique surface representation by conformal factor and mean curvature, the warp can easily be obtained by composing deformations from the Möbius transformation and the thin-plate matching function. To avoid the non-isomorphism risk that usually occurs when matching largely deformed surfaces, a single mesh based on a geodesic frame is employed. As a result, the correspondence of the whole surface, including geometric and texture information, can be defined, and interpolation between the original and target surfaces can be computed by the usual cubic spline homotopy in a disk parametric domain. This non-linear iterative surface reconstruction algorithm can be accelerated using the multigrid method on a uniform mesh, which also yields multi-resolution surfaces.
Several numerical experiments on face morphing are presented to demonstrate the robustness of this approach.

Efficient and Reliable Self-Collision Culling using Unprojected Normal Cones
They present a simple, linear-time algorithm that performs the normal cone test using the unprojected 3D vertices, reducing it to a sequence of point-plane classification tests.
Moreover, they present a hierarchical traversal scheme that can significantly reduce the number of normal cone tests and the memory overhead using front-based normal cone culling. The overall algorithm can reliably detect all (self-)collisions in models composed of hundreds of thousands of triangles.
There is a general perception that the overhead of normal cone tests is high, and their application has mostly been limited to discrete collision detection (DCD). The presented unprojected contour test provides a big improvement over prior continuous contour tests for continuous collision detection (CCD). It has no preprocessing overhead and can perform fast collision queries on complex benchmarks on a single CPU core.
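For context, here is a sketch of only the classic normal-cone half of self-collision culling. It is not the paper's unprojected contour test, which is not shown. The cone axis here is simply the normalized sum of the normals, which gives a valid but not minimal bound. All names are illustrative.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Vec3) -> Vec3:
    length = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return (v[0] / length, v[1] / length, v[2] / length)


def normal_cone(normals: List[Vec3]) -> Tuple[Vec3, float]:
    """Bounding cone (axis, half-angle in radians) of a patch's face normals."""
    s = (sum(n[0] for n in normals), sum(n[1] for n in normals), sum(n[2] for n in normals))
    if math.sqrt(s[0] ** 2 + s[1] ** 2 + s[2] ** 2) < 1e-12:
        return (0.0, 0.0, 1.0), math.pi  # normals cancel out: no useful cone
    axis = _normalize(s)
    half_angle = 0.0
    for n in normals:
        u = _normalize(n)
        cos_a = max(-1.0, min(1.0, axis[0] * u[0] + axis[1] * u[1] + axis[2] * u[2]))
        half_angle = max(half_angle, math.acos(cos_a))
    return axis, half_angle


def may_self_intersect(normals: List[Vec3]) -> bool:
    """Normal-cone half of the self-collision test.

    If every normal lies strictly inside one hemisphere (cone half-angle
    below 90 degrees), the patch cannot fold onto itself, *provided* its
    boundary contour also passes a contour test, which is not shown here.
    """
    _, half_angle = normal_cone(normals)
    return half_angle >= math.pi / 2
```

A patch that fails this check (or the contour check) has to be tested for self-collisions in the narrow phase. A patch that passes both can be culled.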

Efficient HRTF-based Spatial Audio for Area and Volumetric Sources

Evaluating BVH splitting strategies More TSS BVH
A Comparative Study on a Novel Drawcall-Wise Visibility Culling and Space-Partitioning Data Structures

??? ... hmm
Parallel Computing and Optimization for Radiosity (Source Code etc)
Radiosity for Real-Time Simulations of Highly Tessellated Models
Real-Time Dynamic Radiosity for High Quality Global Illumination
Perspective-Driven Radiosity on Graphics Hardware
Image-space radiosity lighting method for dynamic and complex virtual environments
Techniques based around radiosity, each offering unique advantages.
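All of these build on the classic radiosity equation B_i = E_i + rho_i * sum_j F_ij * B_j. Below is a minimal Jacobi-iteration sketch of that equation, with form factors supplied by hand. It is not any specific paper's solver. Each iteration adds one bounce of light, and every patch updates independently, which is what the parallel and GPU variants above exploit.

```python
from typing import List


def solve_radiosity(emission: List[float], reflectance: List[float],
                    form_factors: List[List[float]], iterations: int = 100) -> List[float]:
    """Jacobi iteration on B_i = E_i + rho_i * sum_j F_ij * B_j.

    Converges when every rho_i * sum_j F_ij < 1, which holds for physical
    inputs (rho < 1, form factors of a patch summing to at most 1).
    """
    n = len(emission)
    b = list(emission)  # start from emitted light only
    for _ in range(iterations):
        # Each pass gathers one more bounce of light; every patch is
        # independent, which is what makes the update easy to parallelise.
        b = [emission[i] + reflectance[i] * sum(form_factors[i][j] * b[j] for j in range(n))
             for i in range(n)]
    return b


# Two patches facing each other: patch 0 emits, patch 1 only reflects.
print(solve_radiosity([1.0, 0.0], [0.5, 0.5], [[0.0, 0.5], [0.5, 0.0]]))
```

For this two-patch example, the exact solution is B = (16/15, 4/15), and the iteration converges to it quickly because rho * F is only 0.25 here.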

Last edited by ThaOneDon (2017-06-25 11:00:30)


#117 2017-07-08 21:38:38


Re: tech thread

PBR (Physically Based Rendering)
Real-time 2D manipulation of plausible 3D appearance using shading and geometry buffers

Terrain Rendering/Level-of-Detail(LOD)/Occlusion Culling/Caching
Deferred Shading of Transparent Surfaces With Shadows and Refraction

Height map compression techniques
In practice, many applications handle the real-time rendering well with LOD schemes tailored to their needs.
In such cases, a compression method tied to a concrete LOD scheme is not feasible.
This method handles only the compression, so it can be used as a plug-and-play component in an existing real-time renderer. Its only job is to compress a block of terrain height samples sized 2^n × 2^n and to provide fast progressive decompression of its mip-maps, while respecting the maximum error bound at every mip-map. The source code is written modularly, so that any representation of the height samples can be compressed: doubles, floats or even arbitrary structures. It is inspired by C-BDAM; the compression method is extracted from the LOD scheme and simplified.
This approach introduces heavy data redundancy: a block corresponding to a certain quadtree node contains simplified blocks of its children, and all these blocks are stored separately. The reason for this approach is that the user can navigate to any area almost immediately, because only the data needed for the scene has to be fetched, without having to reconstruct it by traversing from the root. Moreover, this approach enables the user to flexibly extend the terrain data with high-resolution insets.
The algorithm should be able to compress a regular square block of height samples and progressively decompress it in real time, from the smallest mip-map to the largest one. Apart from this, it should not interfere in any way with the rendering pipeline of the application.
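Here is a toy sketch of error-bounded progressive mip decompression. It is not C-BDAM's actual codec and not the linked method's source. Each mip level is predicted by nearest-neighbour upsampling of the already reconstructed coarser level, and the residual is quantised with step 2 * max_error, so every level stays within max_error of its true mip. It assumes a 2^n × 2^n block. A real implementation would also entropy-code the small integer residuals.

```python
from typing import Iterator, List

Grid = List[List[float]]


def downsample(g: Grid) -> Grid:
    """Next mip level: average of each 2x2 group."""
    n = len(g) // 2
    return [[(g[2 * y][2 * x] + g[2 * y][2 * x + 1] + g[2 * y + 1][2 * x] + g[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(n)] for y in range(n)]


def upsample(g: Grid) -> Grid:
    """Nearest-neighbour prediction: each coarse sample predicts its 2x2 children."""
    return [[g[y // 2][x // 2] for x in range(2 * len(g))] for y in range(2 * len(g))]


def _reconstruct(pred: Grid, codes: List[List[int]], step: float) -> Grid:
    return [[p + c * step for p, c in zip(prow, crow)] for prow, crow in zip(pred, codes)]


def compress(heights: Grid, max_error: float) -> List[List[List[int]]]:
    """Quantised residuals per mip level, coarsest (1x1) first."""
    levels = [heights]
    while len(levels[-1]) > 1:
        levels.append(downsample(levels[-1]))
    levels.reverse()
    step = 2.0 * max_error  # rounding to this step keeps |error| <= max_error
    codes: List[List[List[int]]] = []
    recon: Grid = [[0.0]]
    for i, level in enumerate(levels):
        pred = upsample(recon) if i > 0 else [[0.0]]
        # Predict from the *reconstructed* coarser level, exactly as the
        # decoder will, so quantisation errors do not accumulate.
        q = [[round((v - p) / step) for v, p in zip(row, prow)] for row, prow in zip(level, pred)]
        recon = _reconstruct(pred, q, step)
        codes.append(q)
    return codes


def decompress_progressively(codes: List[List[List[int]]], max_error: float) -> Iterator[Grid]:
    """Yield every mip-map from the smallest to the largest."""
    step = 2.0 * max_error
    recon: Grid = [[0.0]]
    for i, q in enumerate(codes):
        pred = upsample(recon) if i > 0 else [[0.0]]
        recon = _reconstruct(pred, q, step)
        yield recon
```

Because decoding is a generator, a renderer could stop after any level and still hold a valid, error-bounded mip.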

Controlling and Sampling Visibility Information on the Image Plane
Visibility-induced aliasing can be reduced substantially by, first, choosing a suitable function space that admits a sampling theorem for the given locations; second, determining the pre-filtering of the step function for this space; third, constructing a sampling theorem with the given locations; and fourth, deriving the quadrature weights from the sampling theorem.
They applied their methodology to the classical setting of bandlimited functions but also considered shift-invariant spaces, and demonstrated that the better spatial localization of the kernel functions in the latter setting, compared to the sinc function, also yields lower error rates.

??? ... hmm
GPU Ray-Traced Collision Detection: Fine Pipeline Reorganization

Last edited by ThaOneDon (2017-07-30 11:21:32)


#118 2017-08-06 16:59:29


Re: tech thread

Forward And Backward Reaching Inverse Kinematics (FABRIK) (Source Code)

Compromise-free Pathfinding on a Navigation Mesh (Source Code)

Last edited by ThaOneDon (2017-08-19 13:20:07)


#119 2017-09-13 05:04:48


Re: tech thread

- Clustered, Forward+, Deferred -
Volume Tiled Forward Shading (Source Code/etc included)

Ambient Obscurance/Occlusion/GI/Reflections
Frequency Based Radiance Cache for Rendering Animations

Analyzing and Predicting Anisotropic Effects of BRDFs
The majority of materials we encounter in the real world have variable reflectance when rotated around the surface normal, an azimuthally variable behavior known as visual anisotropy. Such behavior can be represented by a four-dimensional anisotropic BRDF that characterizes the anisotropic appearance of homogeneous materials. Unfortunately, most past research has been devoted to simplistic three-dimensional isotropic BRDFs.
In this paper, they analyze and categorize basic types of BRDF anisotropy and use a psychophysical study to assess under which conditions isotropic appearance can be used without loss of detail in material appearance. To this end, they tested the human impression of material anisotropy on various shapes and under two illuminations, concluding that subjects' sensitivity to anisotropy declines with increasing complexity of 3D geometry and increasing uniformity of the illumination environment.
Next, they propose two anisotropy measures: the first is based on the entire BRDF information, while the second requires only a sparse subset of reflectance values. Both measures perform similarly on the tested dataset and show a positive correlation with the results of the psychophysical study. The results demonstrate that the proposed anisotropy measures can be considered a promising approximation of human perception of real-world visual anisotropy.

Real-Time Light Transport in Analytically Integrable Participating Media (More, Source Code etc)
Real-time rendering of participating media, such as fog, is an important problem, because such media significantly influence the appearance of the rendered scene. A physically correct solution involves a costly simulation of a very large number of light-particle interactions, especially when considering multiple scattering.
The existing real-time approaches are mostly based on empirical or single-scattering approximations, or only consider homogeneous media. This work briefly examines the existing solutions and then presents an improved method for real-time multiple scattering in quasi-heterogeneous media. Inherent visual artifacts are minimized with several techniques using analytically integrable density functions and efficient MIP map filtering.
The resulting highly-parallel method achieves good visual fidelity and has a stable computation time of only a few milliseconds per frame.
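The specific density functions used in the paper aren't listed here. Exponential height fog is one standard example of an analytically integrable density, and the sketch below shows only that case. It computes the closed-form optical depth and transmittance along a ray segment, with no scattering term. The parameter names `a`, `b` and `sigma_t` are illustrative.

```python
import math


def optical_depth_exp_fog(origin_y: float, dir_y: float, length: float,
                          a: float, b: float, sigma_t: float) -> float:
    """Closed-form integral of sigma_t * a * exp(-b * y) along a ray segment.

    origin_y: height of the ray origin; dir_y: vertical component of the
    unit ray direction; length: distance travelled along the ray.
    """
    base = sigma_t * a * math.exp(-b * origin_y)  # extinction at the ray origin
    k = b * dir_y
    if k == 0.0:
        return base * length  # horizontal ray (or b == 0): constant density
    # integral_0^length exp(-k t) dt = (1 - exp(-k * length)) / k; expm1 keeps
    # precision when k * length is tiny.
    return base * -math.expm1(-k * length) / k


def transmittance(origin_y: float, dir_y: float, length: float,
                  a: float, b: float, sigma_t: float) -> float:
    """Fraction of light surviving the segment (Beer-Lambert)."""
    return math.exp(-optical_depth_exp_fog(origin_y, dir_y, length, a, b, sigma_t))
```

The point of the closed form is that the cost no longer depends on the segment length, which fits the stable per-frame cost mentioned above.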

A Closed-Form Model for Image-Based Distant Lighting
Optimized Rendering Techniques Based on Local Cubemaps
The key idea proposed in this paper is that any light source can be modeled as an area light source. For instance, a spot light or a directional linear light can both be modeled as special cases of an area light source with reduced dimensionality. Similarly, environment lighting using cubemaps may be viewed as the limiting case of pointwise-varying multiple area light sources.
Therefore, the paper first shows how the light integral can be solved in closed form for a constant area light source of rectangular shape, and applies this solution to rendering Lambertian and Phong-like materials.
Because of the closed-form nature of this solution, no sampling is required and noise is completely eliminated. Moreover, without sampling, rendering time drops significantly: it depends only on the complexity of the object (i.e. the number of triangles used to represent it) for a constant area light source, and on the required highest light frequency in the case of pointwise-varying environment lighting.
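As an illustration of a noise-free closed form, here is the classic point-to-polygon form factor (Lambert's formula) for a diffuse receiver and a constant-radiance polygon light. The paper's own derivation for rectangles and Phong-like lobes may differ. The sketch assumes the polygon lies entirely above the receiver's tangent plane, so no horizon clipping is done.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _normalize(a: Vec3) -> Vec3:
    length = math.sqrt(_dot(a, a))
    return (a[0] / length, a[1] / length, a[2] / length)


def polygon_form_factor(p: Vec3, n: Vec3, verts: List[Vec3]) -> float:
    """Closed-form form factor from a point (unit normal n) to a polygon light.

    Assumes the polygon lies entirely above the tangent plane at p (no
    horizon clipping). For a light of constant radiance L, irradiance is
    pi * L * F and Lambertian outgoing radiance is albedo * L * F.
    """
    total = 0.0
    for i in range(len(verts)):
        a = _normalize(_sub(verts[i], p))
        b = _normalize(_sub(verts[(i + 1) % len(verts)], p))
        g = _cross(a, b)
        g_len = math.sqrt(_dot(g, g))
        if g_len > 0.0:
            theta = math.atan2(g_len, _dot(a, b))  # angle subtended by this edge
            total += theta * _dot(n, g) / g_len
    return abs(total) / (2.0 * math.pi)  # abs() makes the result winding-independent
```

The cost is one loop over the polygon's edges, with no random samples, which is where the "no noise" property comes from.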

Area Lights (Source Code included)
Explains efficient rendering of punctual light sources, such as point, spot and directional lights. Most games get away with using these simplistic light sources.
Inspired by Frostbite's representative point method. Essentially, the technique keeps the specular calculation but changes the light vector. For a punctual light, the light vector simply points from the surface position to the light position. But for area lights, we are not interested in the reflection between the light's center and the surface, but between the light "mesh" and the surface.

Shadows/Shadow Mapping-Volumes
Non-Linearly Quantized Moment Shadow Maps (Source Code included)
Moment Shadow Maps enable direct filtering to accomplish proper antialiasing of dynamic hard shadows. For each texel, the moment shadow map stores four powers of the depth in either 64 or 128 bits. After filtering, this information enables a heuristic reconstruction. However, the rounding errors introduced at 64 bits per texel necessitate a bias that noticeably strengthens light leaking artifacts. In this paper, they propose a non-linear transform which maps the four moments to four quantities describing the depth distribution more directly.
As a prerequisite for the use of their quantization schemes, they propose a compute shader that applies a resolve for a multisampled shadow map and a 9² two-pass Gaussian filter in shared memory. The quantized moments are written back to device memory only once, at the very end. This approach makes the technique roughly as fast as Variance Shadow Mapping without any of its drawbacks.
Since hardware-accelerated bilinear filtering is incompatible with non-linear quantization, they employ blue noise dithering as an inexpensive alternative to manual bilinear filtering.

Output Sensitive Collision Detection for Unisize Boxes

Simplified and Tessellated Mesh for Realtime High Quality Rendering

Last edited by ThaOneDon (2017-09-23 20:14:51)


#120 2017-10-01 04:23:15


Re: tech thread

Ambient Obscurance/Occlusion/GI/Reflections
Real-time Rendering of Translucent Material by Contrast-Reversing Procedure
The conventional method of rendering the translucence of an object is difficult to implement in real time, since the translucency is accompanied by complicated light behavior such as scattering and absorption. To simplify this rendering process, they focus on the contrast-reversing stimulant property in vision science. This property is based on the perception that we can recognize a luminance histogram compatible between scattering and absorption. According to this property, they propose a simple rendering method to reverse the light path between reflection and transmission.
Their method adopts an additional function for selecting a front or back scattering process in the calculation of each pixel value. Because this improvement makes only slight alterations in the conventional reflection model, it can reproduce a translucent appearance in real time while inheriting the advantages of various reflection models.

Little Lightmap Tricks

Meshfree C2-Weighting for Shape Deformation

A General Framework for Constrained Mesh Parameterization
Parameterizing or flattening a triangle mesh is necessary for many applications in computer graphics and geometry. Certain downstream applications require adherence to more general geometric constraints, possibly at the cost of higher distortion. With this method, various geometric features, such as lines, circular arcs and various subregions, can be constrained while the energy is also minimized, providing a more general solution than previous approaches. The presented framework is motivated by the As-Rigid-As-Possible parameterization method, and its effectiveness is demonstrated through several examples. The method can easily be adapted to parameterization methods that minimize alternative distortion measures.

Semi-Calibrated Near-Light Photometric Stereo

??? ... hmm
Ray Tracing Surface Patches

Exploring Clustering Algorithms in the Appearance Modeling Problem
Modeling the appearance of a given material is a complex task with many approaches in the literature.
Solutions such as measuring the BRDF of the material or using a linear combination of an existing BRDF basis to approximate the material's appearance are time- and resource-consuming.
In this paper, they use two classical clustering algorithms and one evolutionary clustering algorithm to reduce the number of terms in the linear combination, and a non-negative least squares (NNLS) procedure to estimate the contribution of each basis BRDF to the reproduction of a desired material.

Guided Robust Matte-Model Fitting for Accelerating Multi-light Reflectance Processing Techniques
To recover information on an object's shape and appearance, the matte model is used directly or combined with specialized methods for modeling high-frequency behaviors. Multivariate robust regression offers a general solution to reliably extract the matte component when source data is heavily contaminated by shadows, inter-reflections, specularity, or noise. However, it is usually very slow.
In this paper, they accelerate robust fitting by drastically reducing the number of tested candidate solutions using a guided approach. The method propagates already known solutions to nearby pixels using a similarity-driven flood-fill strategy, and exploits this knowledge to order possible candidate solutions and to determine convergence conditions.

Last edited by ThaOneDon (2017-11-01 08:10:27)


#121 2017-11-15 18:38:17


Re: tech thread


#122 2017-12-11 22:08:12


Re: tech thread

- Clustered, Forward+, Deferred -
Frustum vs Pyramid intersection (More Culling)
Pyramid vs Frustum tests are useful because pyramids are a good way to enclose a cone relatively tightly (i.e. much tighter than a sphere, an AABB, or an OBB).

Tiny Clouds
Every pixel does a ray march from far to near. It does it backwards to make for simpler alpha blending math.
At every ray step, it samples FBM data to figure out whether the current position is below the surface of the cloud or above it.
If below, it alpha blends the pixel color with the cloud color at that point, using the vertical distance into the cloud as the cloud density.
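The per-pixel march can be sketched as follows. `default_cloud_top` is a hypothetical smooth stand-in for the FBM lookup, and `density_scale` is an assumed tuning constant. Marching from far to near means each nearer sample is simply blended over the colour accumulated so far.

```python
import math
from typing import Callable, Sequence, Tuple

Color = Tuple[float, float, float]


def default_cloud_top(x: float, z: float) -> float:
    """Hypothetical stand-in for the FBM lookup: a gently rolling cloud surface."""
    return 1.0 + 0.3 * math.sin(0.7 * x) * math.cos(0.5 * z)


def march_clouds(origin: Sequence[float], direction: Sequence[float], far: float,
                 steps: int, sky: Color, cloud: Color, density_scale: float = 2.0,
                 cloud_top: Callable[[float, float], float] = default_cloud_top) -> Color:
    color = list(sky)
    # March from far to near, so each nearer sample is simply blended over
    # everything already accumulated behind it ("over" compositing).
    for i in range(steps, 0, -1):
        t = far * i / steps
        x = origin[0] + direction[0] * t
        y = origin[1] + direction[1] * t
        z = origin[2] + direction[2] * t
        depth = cloud_top(x, z) - y  # vertical distance below the cloud surface
        if depth > 0.0:
            alpha = min(1.0, depth * density_scale)  # deeper inside = denser
            color = [c + (k - c) * alpha for c, k in zip(color, cloud)]
    return (color[0], color[1], color[2])
```

A ray that stays above the cloud surface returns the sky colour unchanged. A ray deep inside it saturates to the cloud colour.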

Terrain Rendering/Level-of-Detail(LOD)/Occlusion Culling/Caching
Sphere Projection
In computer graphics, you very often want to know how big an object looks on screen, probably measured in pixels. Or at least you want an upper bound of the pixel coverage, because that allows you to perform intelligent Level of Detail (LOD) for that object. For example, if a character or a tree covers only a couple of pixels on screen, you probably want to render it with less detail. One easy way to get an upper bound of the pixel coverage is to embed your object in a bounding box or sphere, then rasterize the sphere or box and count the pixels.
This adds complexity to your engine, and probably some delayed processing, as the result of that rasterization won't be immediately ready.
It would be cool if a tessellation shader or a geometry shader could tessellate or kill geometry on the fly, immediately, based on the pixel coverage of the object.
The pixel coverage of a (bounding) sphere happens to have an analytic expression that can be evaluated with no more than one square root; it's very compact.
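Here is a sketch of that analytic expression. The projected ellipse of a sphere with view-space center (x, y, z) and radius r has area pi * f^2 * r^2 * sqrt(l^2 - r^2) / (z^2 - r^2)^(3/2), where l^2 = x^2 + y^2 + z^2 and f is the focal length. The function name, the LOD threshold and the camera convention (origin, looking along z) are assumptions for illustration. The formula only holds while the whole sphere is in front of the camera (|z| > r).

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def projected_sphere_area(center: Vec3, radius: float, focal_length: float) -> float:
    """Screen-space area of a sphere's perspective projection.

    center: sphere center in view space (camera at the origin, looking
    along the z axis); focal_length: image-plane distance in the unit you
    want the area in (e.g. pixels). Valid only while |z| > radius.
    """
    x, y, z = center
    r2 = radius * radius
    z2 = z * z
    l2 = x * x + y * y + z2
    # Area of the projected ellipse; the only square root in the expression.
    return math.pi * focal_length ** 2 * r2 * math.sqrt((l2 - r2) / (z2 - r2)) / (z2 - r2)


# Pick a LOD from pixel coverage (focal length 800 px, sphere 20 units away).
coverage = projected_sphere_area((0.0, 0.0, -20.0), 1.0, 800.0)
lod = 0 if coverage > 1000.0 else 1
```

On the view axis, this reduces to pi * f^2 * r^2 / (z^2 - r^2), the area of a circle of radius f * tan(alpha) with sin(alpha) = r / |z|. Off-axis spheres project to larger ellipses.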

GPU Assisted Self-Collisions of Cloths
Fixed Spherical n Points Density-Based Clustering Technique for Cloth Simulation
During the first step of these algorithms, the compute shader is dispatched several times, since the time step of the simulation is very small. This of course slows the simulation down a bit, because the cloth simulation has to be performed several times per frame, but it is a price that needs to be paid if we want to observe the simulation in a reasonable time.
On each invocation, the shaders read the initial positions and velocities from the first position and velocity buffers, modify this data and save it to the other two position and velocity buffers (so the newest position and velocity data are in the second buffers).
After the compute dispatch calls, they issue memory barrier calls to make sure that all shader writes have completed. Then they swap buffers, so that the next compute shader dispatch reads data from the second pair of buffers and writes to the first pair, and so on.
It performs really well despite the use of a naive algorithm for checking constraints and an integration method that forces the same shaders to be evaluated many times.

Collision Detection between Dynamic Rigid Objects and Static Displacement Mapped Surfaces
Two more accurate backward projection methods, the aligned projection direction method and the alpha plane method, are derived and compared with the old method. Both new methods perform well in the bad cases where the old backward projection can produce artifacts. However, the aligned projection direction method still has errors, while the alpha plane method is safer to use and has no such problems. It also needs fewer iterations and has higher accuracy.
A better bounding plane is integrated into the early-out stage, which helps filter potentially colliding primitives and objects and reduces the number of objects to be tested in the narrow phase. Multiresolution bounding surfaces are generated by solving inequality-constrained nonlinear programming problems with consideration of extreme cases. From these upper and lower bounding surfaces, multiresolution convex bounding volumes are constructed. Convex-versus-convex collision detection is then performed between the bounding volume and the query object at different resolutions.
The multiresolution bounding volume collision detection greatly reduces the number of objects to be tested against the terrain surface. Compared with the old collision detection method, which tests every vertex in the query object, the multiresolution detection reduces the number of vertices to be tested, i.e. the number of backward projections.

Dithering Improved box/triangle filtering
To give your eyes the best chance at recombining everything, dithering works best when the dither pattern dots have a 1:1 correlation with the output pixels. But, correlating only with the output means that as a scene post effect there's no connection between the geometry being rendered and the pattern that thresholds it. Each frame, moving scene elements threshold against different values. What we want instead is for the dither pattern to be "pinned" to the geometry and to appear stable as it moves with the rest of the scene.
The core of this is a mapping problem. As the length of this post suggests, there's a conflict between the ideal dither pattern mapping (1:1 with the screen) and the ideal scene mapping (x:1 with the geometry), so get ready for some compromises. Most of this work focused on mapping the input dither pattern into different spaces that better correlate the pattern with the scene geometry. Everything here is done at the pre-thresholding stage.
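Here is a sketch of the two mappings being traded off, using a standard Bayer ordered-dither matrix. `dither` thresholds 1:1 with screen pixels. The hypothetical `dither_pinned` indexes the same pattern by surface UV, so the pattern moves with the geometry but loses the exact pixel correspondence. This shows only the basic trade-off, not the post's more elaborate mappings.

```python
import math
from typing import List


def bayer_matrix(levels: int) -> List[List[int]]:
    """Ordered-dither index matrix of size 2^levels, values 0 .. 4^levels - 1."""
    m = [[0]]
    for _ in range(levels):
        size = len(m)
        nxt = [[0] * (2 * size) for _ in range(2 * size)]
        for y in range(size):
            for x in range(size):
                v = 4 * m[y][x]
                nxt[y][x] = v
                nxt[y][x + size] = v + 2
                nxt[y + size][x] = v + 3
                nxt[y + size][x + size] = v + 1
        m = nxt
    return m


def dither(value: float, px: int, py: int, matrix: List[List[int]]) -> bool:
    """Screen-pinned: threshold by output pixel, 1:1 with the screen."""
    n = len(matrix)
    return value > (matrix[py % n][px % n] + 0.5) / (n * n)


def dither_pinned(value: float, u: float, v: float, texels_per_unit: float,
                  matrix: List[List[int]]) -> bool:
    """Geometry-pinned: index the pattern by surface UV instead of screen position,
    so the pattern moves with the object (at the cost of the 1:1 pixel mapping)."""
    return dither(value, math.floor(u * texels_per_unit), math.floor(v * texels_per_unit), matrix)
```

With a 4x4 matrix, a 50% grey turns on exactly 8 of the 16 cells. That is the kind of evenly spread pattern the eye recombines best when it lands 1:1 on screen pixels.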

GPU-based particle simulation (Source Code included)
The old system could spawn particles on the surface of a mesh, with the starting velocity of each particle modulated by the surface normal. It kept a copy of each particle on the CPU, updated them sequentially, then uploaded them to the GPU for rendering each frame. The new system needed to keep at least the same set of features, but GPU simulation also opens up more possibilities, because we have direct access to resources like textures created by the rendering pipeline. It is also highly parallelized compared to the CPU solution, in both the emitting and the simulation phases, which means we can simulate far more particles in the same amount of time. Less data moves between the CPU and the GPU: we can get away with only a single constant buffer update and command buffer generation, and the rest of the data lives completely in VRAM. This makes simulation on a massive scale a reality.

3D Mesh Simplification
Decomposition with the plane-fit approach models every block of a decomposition as a planar area. Unlike a block-based quadtree, this decomposes the depth image into homogeneous triangular areas. The algorithm is recursive: by taking several depth samples within the current block, one can estimate the parameters of the underlying depth plane. A binary tree approach is followed. First, the block is divided in half horizontally. For each block, triangles are formed by cutting the block along different diagonals, and a plane is fitted to these triangles. If the error produced by the triangles is less than a threshold, the block is not divided; otherwise it is further divided vertically. This process continues until there are no more blocks to divide or the condition is met.
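Here is a simplified sketch of the binary-tree plane-fit decomposition. For brevity, it fits one least-squares plane per rectangular block instead of testing the triangles formed by the block's diagonals. It alternates horizontal and vertical splits until each block's maximum error is within the threshold. The function names and the max-error criterion are assumptions.

```python
from typing import List, Tuple

Depth = List[List[float]]
Block = Tuple[int, int, int, int]   # x0, y0, width, height
Plane = Tuple[float, float, float]  # depth = a*x + b*y + c


def fit_plane(depth: Depth, blk: Block) -> Tuple[Plane, float]:
    """Least-squares plane through a block; returns the plane and its max error."""
    x0, y0, w, h = blk
    pts = [(x, y, depth[y][x]) for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    mz = sum(p[2] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts)
    syy = sum((p[1] - my) ** 2 for p in pts)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts)
    sxz = sum((p[0] - mx) * (p[2] - mz) for p in pts)
    syz = sum((p[1] - my) * (p[2] - mz) for p in pts)
    det = sxx * syy - sxy * sxy
    if abs(det) > 1e-12:
        a = (sxz * syy - sxy * syz) / det
        b = (sxx * syz - sxy * sxz) / det
    else:  # a single row or column: fit along the one varying axis
        a = sxz / sxx if sxx > 0 else 0.0
        b = syz / syy if syy > 0 else 0.0
    c = mz - a * mx - b * my
    err = max(abs(a * x + b * y + c - z) for x, y, z in pts)
    return (a, b, c), err


def decompose(depth: Depth, blk: Block, threshold: float,
              split_horizontal: bool = True) -> List[Tuple[Block, Plane]]:
    """Binary-tree split until each block is planar within `threshold`."""
    plane, err = fit_plane(depth, blk)
    x0, y0, w, h = blk
    if err <= threshold or (w == 1 and h == 1):
        return [(blk, plane)]
    if (split_horizontal and h > 1) or w == 1:  # cut into top and bottom halves
        top = h // 2
        return (decompose(depth, (x0, y0, w, top), threshold, False)
                + decompose(depth, (x0, y0 + top, w, h - top), threshold, False))
    left = w // 2  # cut into left and right halves
    return (decompose(depth, (x0, y0, left, h), threshold, True)
            + decompose(depth, (x0 + left, y0, w - left, h), threshold, True))
```

A perfectly planar depth image stays a single block, while a depth discontinuity keeps splitting until each side is planar.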

Last edited by ThaOneDon (2017-12-24 14:59:09)


#123 2019-09-04 23:14:55


Re: tech thread

Improved Geometric Specular Antialiasing(IGSA) Supplemental
Specific ways to adjust the lighting model and NDF filtering, plus a few more things in the pixel shader.
Mostly limited to specular aliasing.


#124 2019-10-08 19:44:55


Re: tech thread

Surface Gradient Based Bump Mapping
Some of the better techniques related to normal/bump mapping.


#125 2020-02-27 01:47:04


Re: tech thread

Texture-space Decals Another
There are a few ways to do decals, which are used in games to draw images onto other surfaces, but each comes with its own tradeoffs. Rendering into texture space is one of them.

A way to blend between textures; the most common example of this is terrain. It explains an effect where additional lerp interpolation and height data control exactly where the blending should occur.
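Below is one common formulation of height-based blending, sketched for a single texel. The linked article's exact code may differ, and `depth` (the softness of the transition) is an assumed tuning parameter. The blend weight is biased by each layer's height, so near the transition the taller layer wins instead of both simply cross-fading.

```python
from typing import Tuple

Color = Tuple[float, float, float]


def height_blend(color1: Color, height1: float, color2: Color, height2: float,
                 a: float, depth: float = 0.2) -> Color:
    """Blend two layers with weight `a` (0 = layer 1, 1 = layer 2), letting the
    taller layer win near the transition. `depth` (> 0) sets how soft the
    transition is."""
    h1 = height1 + (1.0 - a)
    h2 = height2 + a
    cutoff = max(h1, h2) - depth
    w1 = max(h1 - cutoff, 0.0)  # only layers within `depth` of the top contribute
    w2 = max(h2 - cutoff, 0.0)
    c = [(c1 * w1 + c2 * w2) / (w1 + w2) for c1, c2 in zip(color1, color2)]
    return (c[0], c[1], c[2])
```

With `a = 0.5`, a plain lerp would give a 50/50 mix. Here, a grass layer with height 0.9 fully covers a rock layer with height 0.1.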

Last edited by ThaOneDon (2020-02-27 02:13:49)

