I've been interested in programming (and computer graphics in particular) ever since I got my first 286 and the explosive
banana-throwing gorillas that came with it, but it didn't really take off until many years later,
when I found myself messing around with DarkBASIC, trying to force the game maker's scripting engine
into doing all sorts of sadistic things it wasn't meant for, like rendering 1996-era game levels triangle
by triangle and doing collision detection by hand. Realizing that this wasn't cutting the mustard in terms of performance,
I made the jump to C++ and OpenGL and haven't looked back.
Along the way, I've come up with more than a few demos that are just collecting dust on my hard drive, so I figured I'd put some of
the more interesting ones up for people to see. A large number of these require pixel shader 2-capable graphics cards (my older demos have long since
died horrible deaths in hard drive crashes, and I don't need to be inflicting register combiners on anyone these days), and some of them need SSE-capable CPUs.
Contact: anankervis at gmail
Most of the demos include source code and a readme with instructions. The source code was originally never meant to see the light of day, but it shouldn't be
too hard to follow - I figure it beats the heck out of just a binary, at least. If you find something scary in the code for the older demos, I warned
you!
January 21, 2013 - updated Voxel Cone Tracing Global Illumination demo
December 17, 2012 - updated recent demos to include full source, Windows 8 fix
November 24, 2012 - added Voxel Cone Tracing Global Illumination demo
September 30, 2012 - added Pixel Dead Reckoning demo
September 25, 2012 - added Linked List Antialiasing and Edge Resampling demo
March 31, 2011 - added Super Frontier Wars
December 9, 2010 - added Straight Aces, Gammon Trigger demo, Astronomo Rex video
September 20, 2009 - added CUDA 3D Perlin Noise demo
July 15, 2009 - embedded Felwyrld tech slides
July 14, 2009 - added Air Master 3D, added Gammon Trigger video
June 21, 2008 - fixed GS Painterly demo under NVIDIA 177 drivers
April 10, 2008 - added code samples
March 23, 2008 - added new Felwyrld screenshots, Gammon Trigger
July 1, 2007 - added Geometry Shader Painterly Rendering demo
June 30, 2007 - added DXT GPU Compression demo, Geometry Shader Tessellation demo
October 23, 2006 - added Light Beams and GPU Clouds demo, Soft Particles demo
August 30, 2006 - added GPU Ray-Triangle Intersection demo
An implementation of global illumination using voxel cone tracing, as described by Crassin et al. in Interactive Indirect Illumination Using Voxel Cone Tracing, with the Crytek Sponza model used for content.
This demo served both as a means to familiarize myself with voxel cone tracing and as a testbed for performance experiments with the voxel storage: plain 3D textures, real-time compressed 3D textures, and 3D textures aligned with the diffuse sample rays were tested. Sparse voxel octrees were not implemented due to time constraints, but would have been nice to have as a baseline reference. Compared to SVO in the context of voxel cone tracing (as opposed to ray casting, where SVO is a clear winner), 3D textures allow for easier filtering, direct lookups without evaluating the octree structure, and potentially better cache and memory bandwidth utilization (depending on cone size and scene density). The clear downside is the space requirement: 3D textures can't scale to larger scenes or smaller, more detailed voxels. There may be ways to work around this deficiency: sparse textures (GL_AMD_sparse_texture), compression, or hybrid schemes that mix tree structures with 3D textures.
Real-time DXT compression is fast enough to convert the 3D voxel textures on the fly; however, API and driver limitations prevent this from being an effective choice, due to the inability to write directly to tiled texture memory and the CPU fallbacks that get triggered when trying to populate a compressed 3D texture from GPU memory. The potential memory bandwidth savings did not result in a performance advantage - it seems that the cone tracing is limited by texture filtering and ALU on the hardware tested. This approach may still be worth considering for the compression alone.
Aligning the 3D textures with the diffuse sample cone directions simplifies cone tracing significantly (removing the need to manually filter the directionally-dependent voxels), allowing the diffuse cones to be traced much faster. Unfortunately, this also requires that the cone directions be uniform for all fragments, which in turn requires more cones to maintain quality, giving a net loss.
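To make the tracing step concrete, here is a minimal C++ sketch of the core relationship a cone trace relies on: the sample diameter grows with distance according to the cone's half-angle, and the mip level of the voxel texture follows from that diameter. Function and parameter names are mine, not from the demo.

```cpp
#include <cassert>
#include <cmath>
#include <algorithm>

// Sketch (not the demo's actual code): choosing a mip level while stepping
// a cone through a mipmapped 3D voxel texture. The sample diameter grows
// linearly with distance from the cone apex; the mip level is picked so one
// texel covers roughly one sample. voxelSize is the edge length of a
// level-0 voxel; level 0 is the finest level.
float mipForConeSample(float distance, float coneTanHalfAngle, float voxelSize)
{
    float diameter = std::max(voxelSize, 2.0f * coneTanHalfAngle * distance);
    return std::log2(diameter / voxelSize);
}
```

Each sample fetched this way is then accumulated front to back until the cone's opacity saturates, which is what lets a handful of cones stand in for many rays.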
Requires OpenGL 4.3. Tested on an NVIDIA GeForce GTX 680 with the 310.54 beta drivers.
January 21, 2013 update: improved performance (voxel clear and mipmap steps), corrected voxel mipmap blending equation, changed camera start location
Extrapolates new frames using color, velocity, and transform data from a previous frame. This can be used to upsample framerate (for example, from something unsteady and below 120 Hz to a steady 120 Hz), reduce perceived lag in the input-process-render-display loop, align images with different delays or transforms (such as video and a rendered overlay), and generate stereo pairs. Just as with dead reckoning in networking, there will be artifacts when the new frame cannot be extrapolated entirely from previous values. This demo demonstrates framerate upsampling, delay compensation, and anaglyph stereo pair generation.
A keyframe is generated every N frames with color, depth, and velocity, and is tagged with the transform matrix and current time used to render the frame. Subsequent frames skip rendering the scene, and instead jump directly to the extrapolation pass, providing a new time and transform for the keyframe to be projected to. The reprojection algorithm first scans over the keyframe, determining which points are static between the keyframe and extrapolated frame (outputting these directly). Points which are not static are appended to a buffer with color, position, and size information (for points undergoing rotation, size must be adjusted to maintain area and prevent gaps). A second pass scatters these points to their final location in the extrapolated image.
Delay compensation simply projects the keyframe ahead in time by some number of milliseconds, and makes use of the velocity buffer. Framerate upsampling does the same, but also applies a new transform matrix to take into account user input between keyframes. Stereo pair generation reprojects a left and right pair from the keyframe (a possible optimization would be to generate the left or right side as the keyframe, and only do one reprojection to create the other half of the pair). The black regions around objects are areas of disocclusion where there is not enough information to reconstruct the frame. These could be filled with some sort of hole-filling algorithm or with data from older keyframes.
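The extrapolation itself is plain dead reckoning. A minimal C++ sketch of the per-point step, with illustrative names (the demo operates on whole buffers of keyframe points, not single values):

```cpp
#include <cassert>

// Sketch: advancing one keyframe point forward in time. Each point carries
// a position and velocity; delay compensation advances the point by dt
// under a constant-velocity assumption, which holds well over the short gap
// between a keyframe and an extrapolated frame.
struct Vec3 { float x, y, z; };

Vec3 extrapolate(Vec3 position, Vec3 velocity, float dtSeconds)
{
    return { position.x + velocity.x * dtSeconds,
             position.y + velocity.y * dtSeconds,
             position.z + velocity.z * dtSeconds };
}
```

Framerate upsampling additionally applies the new camera transform to the extrapolated position before projecting it into the output image.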
Requires OpenGL 4.3. Tested on an NVIDIA GeForce GTX 680 with the 306.63 GL 4.3 beta drivers.
Renders the scene at a resolution lower than the screen resolution (1/10th x 1/10th resolution, in this case) and stores a linked list of fragments that touch each pixel, along with enough information about each fragment to reconstruct the edges of the triangle that produced it. A custom resolve shader then traverses the linked list of fragments for each pixel, and calculates coverage at a higher resolution. The end result combines the lower-resolution shading with full-resolution, antialiased edges (3x3 antialiasing in this example). Conservative rasterization of the scene geometry is used to ensure adequate coverage of fragments to produce the final, upscaled image.
Compared to image post-processing techniques such as FXAA and MLAA, the performance leaves a bit to be desired and the implementation is fairly intrusive (requiring modifications to fragment and geometry shaders to implement conservative rasterization and the linked list of fragment data). However, this technique has the advantages of separating the shading resolution from the output geometry resolution (for shading-heavy applications, you could reduce the shading workload while still keeping sharp geometry edges), allowing for higher quality antialiasing than MSAA or image post-processing AA, and providing a straightforward avenue to order-independent transparency using the existing linked list and resolve shader infrastructure.
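To make the resolve step concrete, here is a C++ sketch of evaluating 3x3 subsample coverage from a triangle's three edge functions. This is a hypothetical simplification (the actual resolve shader walks a per-pixel linked list of fragments), but it shows how edge data reconstructs coverage at a higher resolution than the shading.

```cpp
#include <cassert>

// Sketch: an edge function E(x,y) = a*x + b*y + c is >= 0 on the interior
// side of the edge; a subsample is covered when all three edges agree.
struct Edge { float a, b, c; };

int coverage3x3(const Edge e[3], float pixelX, float pixelY)
{
    int covered = 0;
    for (int sy = 0; sy < 3; ++sy)
        for (int sx = 0; sx < 3; ++sx)
        {
            // Subsample centers at 1/6, 3/6, 5/6 offsets within the pixel.
            float x = pixelX + (2 * sx + 1) / 6.0f;
            float y = pixelY + (2 * sy + 1) / 6.0f;
            bool inside = true;
            for (int i = 0; i < 3; ++i)
                inside = inside && (e[i].a * x + e[i].b * y + e[i].c >= 0.0f);
            if (inside)
                ++covered;
        }
    return covered; // 0..9, used to weight this fragment's color
}
```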
Requires OpenGL 4.3. Tested on an NVIDIA GeForce GTX 680 with the 305.67 GL 4.3 beta drivers.
Air Master 3D is an arcade flying sim for the iPhone and iPod Touch. Click on the link above
to see the official website with screenshots and more details. Air Master 3D features
procedural terrain, a high speed OpenGL ES renderer, and a cross-platform code base that
allows simultaneous development on both the Win32 and iPhone platforms. The latest version
adds a completely new shader-based render code path for OpenGL ES 2.0 devices, enabling
effects such as fully-reflective, rippling water and dynamically lit volume clouds.
Super Frontier Wars
Super Frontier Wars merges the team-based multiplayer and jetpacks from Tribes
with the ability to quickly modify (construct, harvest, and destroy) the surrounding
environment to your advantage. The premise is a future where mining corporations battle
over the resources of alien planets, using re-purposed mining technology as weapons to
alter the environment. Rendering features include dynamically lit volume clouds, real-time
ambient occlusion, procedural trees, detail geometry (grass, etc.) generated via geometry
shader, and displacement mapping using parallax mapping, relief mapping, or tessellation
depending on hardware capabilities. The game is playable over the internet or against AI.
Gammon Trigger is a top-down space adventure game. On the rendering
side, it features normal-mapped, specular-mapped, environment-mapped, emissive-mapped, and
pretty much every-other-mapped sprites, which allows an otherwise 2D game to take
advantage of dynamic per-pixel lighting, image based lighting, and all sorts of other
fancy effects. Amazingly, it manages to run quite well even on most integrated graphics
chips that support fragment programs. There's plenty more besides graphics, though -
a randomly generated galaxy for each game; a dialogue system and snazzy UI; physics, AI,
and audio; a mission system - and anything else that comes up along the way.
Felwyrld is a prototype of a graphical MUD, and includes a client/server setup
designed for minimal bandwidth usage through means such as server prediction of clients'
extrapolated data state (updates are sent only when client-side extrapolated data will
become out of sync), client views, compression, and quantization. Server CPU load is kept
low through dynamic spawning around clients, and logins are handled through an encrypted
system.
The Felwyrld client features generated terrain (about 50km x 50km, requiring relative
coordinates for accurate rendering) with disk paging, procedural trees with wind animation,
and sky rendering with day/night cycles.
The terrain and all models are fogged using the sky color (including sunset variations) and lit with the sun color and ambient directional
sky color. Trees are procedurally generated using a method similar to the one described in 'Creation and Rendering of Realistic Trees' by Weber and
Penn. Vertex programs are used to make tree branches bend and leaves twist based on wind direction. Sky rendering is performed similar to bump
mapping and allows for overly dramatic sunsets, cloud linings, and sun bleed-through based on cloud depth. Occlusion queries are used extensively
in conjunction with a quad tree to cull as much as possible against the terrain and other occluders. (Update: the newest Felwyrld screenshots display
improved terrain texturing and lighting, high quality dynamically lit volume clouds, dense grass and undergrowth, and the engine is now using deferred rendering.)
Renders a scene into color, position, and normal textures, and then outputs the scene as a large number of brush strokes covering the screen using the geometry shader. Brush strokes can be applied from arbitrary triangle meshes or screen-space meshes, and adjust size and quantity to fill the source triangles. Strokes rotate to follow the curvature of objects in the scene, and are placed using random numbers generated in the geometry shader. Requires a shader model 4 card.
Implements DXT1 texture compression entirely on the GPU for shader model 3 cards. Lookup tables are used to get FP16 values containing the correct bit values needed for compression. Pixel buffer objects are then used to perform a GPU copy from the compression results into a compressed texture. Useful for when the results of a render to texture are reused multiple times before being rendered again, or when compressing a large number of textures.
Implements 3D Perlin Noise on the GPU in a CUDA kernel. Two samples of 3D noise are taken per vertex and used to animate a 1024x1024 heightfield. The resulting vertex buffer is filled directly by the CUDA kernel, avoiding the need for readbacks or copies.
Astronomo Rex is a side-scrolling shooter (or close enough) that was created for a game competition. It features per-pixel dynamically lit
particles, which makes for a pretty spectacular addition to otherwise boring particle-based smoke. Other effects are put to good use, such as screen
displacement and 3d textures that make models appear to burn through and disintegrate over time (with a little fragment program assistance).
Particles are already fill-intensive, and adding per-pixel lighting certainly doesn't help, but the game manages a good framerate on most graphics
cards and performance is easily scaled by changing screen resolution, as well as limiting the number of contributing lights on less capable cards.
Tessellates a heightfield based on distance to the viewer using the geometry shader, with smooth transitions between each tessellation level. The geometry shader isn't really meant for tessellation, but it works well when combined with transform feedback to store the results for later reuse. Requires a shader model 4 card.
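The distance-to-level mapping can be sketched as follows (C++, with illustrative names). The smooth transitions come from the fractional part of the continuous LOD, which acts as a morph weight blending newly introduced vertices out from their parent edge midpoints instead of popping them in.

```cpp
#include <cassert>
#include <cmath>
#include <algorithm>

// Sketch: continuous LOD for heightfield tessellation. The LOD falls off
// with log2 of distance; the integer part picks the subdivision level and
// the fractional part is a morph weight for geomorphing between levels.
struct Lod { int level; float morph; };

Lod heightfieldLod(float distance, float nearDistance, int maxLevel)
{
    float continuous = std::log2(std::max(distance, nearDistance) / nearDistance);
    continuous = std::min(continuous, float(maxLevel));
    float levelF = std::floor(continuous);
    return { maxLevel - int(levelF), continuous - levelF };
}
```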
Finds the closest triangle that intersects a ray, using the GPU. A GLSL shader is used to calculate the ray-triangle intersection for each triangle
and then another shader is used to progressively downsample the set of results into a 1x1 texture containing the nearest intersection. This demo
probably requires a shader model 3 card.
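The per-triangle test is the standard Moller-Trumbore intersection; a C++ version of what the shader computes for each triangle might look like this (names are mine, not from the demo's source):

```cpp
#include <cassert>
#include <cmath>

// Sketch: Moller-Trumbore ray-triangle intersection. Returns the distance t
// along the ray on a hit, or a negative value on a miss; the downsampling
// passes then keep the minimum positive t across all triangles.
struct V3 { float x, y, z; };
static V3 sub(V3 a, V3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static V3 cross(V3 a, V3 b)
{
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
static float dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

float rayTriangle(V3 orig, V3 dir, V3 v0, V3 v1, V3 v2)
{
    V3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    V3 p = cross(dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < 1e-7f) return -1.0f;   // ray parallel to triangle
    float inv = 1.0f / det;
    V3 t = sub(orig, v0);
    float u = dot(t, p) * inv;                  // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return -1.0f;
    V3 q = cross(t, e1);
    float v = dot(dir, q) * inv;                // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return -1.0f;
    return dot(e2, q) * inv;                    // distance along dir
}
```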
Shows off bump mapping with shadow maps, making use of the stencil buffer and alpha testing to give a huge performance boost when looking
at shadowed pixels or pixels out of a light's range. This goes a long way towards having game levels with a large number of dynamic or static per-pixel
lights in the same room. Parallax mapping gives a nice sense of depth to the bump maps. The shadow maps use packed RGBA8 cube map textures, so
there's none of the headaches associated with float or depth textures. Stencil shadow volumes can also provide the same shadow masking acceleration,
though the light range optimization is much more difficult.
Uses the GPU to accelerate calculation of an ambient occlusion and bent normal texture map. Variance shadow mapping is used to determine visibility
from a particular direction, and a large number of passes from random angles are summed into a floating point texture. After all passes are
completed, the texture is downloaded to system memory and edges inside the texture map are bled outwards to avoid artifacts when mipmaps of the
texture are sampled. A source model with unique texture mapping works best. Variance shadow mapping seems to produce better results than standard
shadow mapping, despite precision issues, though this does have an impact on the program's ability to catch small details in the model. A small
change to the fragment program can also enable cosine weighting.
Allows editing of high resolution models by using 'push' and 'pull' tools. Geometry is manipulated and rendered quickly through spatial
partitioning and the construction of a topology map. Multiple tool shapes are supported by placing grayscale targa files in the brushes folder,
and tools can operate in either the normal push/pull mode or in a smooth/sharpen mode. Models with several million triangles can be edited easily,
provided your graphics card is up to the task and you have enough system memory. Rendering of several million triangle models in real time works
well enough, though it requires splitting the model into a large number of chunks to avoid issues with driver/hardware vertex buffer size
limitations. A nice extension to this demo would be to only re-render altered sections of the model when using the tools.
Implements motion blur by combining previous and current frame transform matrices. Vertices are stretched along the direction of motion, and the
per-pixel velocity is stored into a 2nd render target, which is then used to blur the color output. The blur takes into account both object and
camera movement because calculations are done using post-transform positions and screen space velocity.
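A sketch of the velocity calculation in C++ (the demo does this in a vertex program; structures and names here are illustrative):

```cpp
#include <cassert>
#include <cmath>

// Sketch: computing the screen-space velocity stored in the second render
// target. The same vertex is transformed by the previous and current
// frame's matrices; after the perspective divide, the difference captures
// both object and camera movement in a single value.
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; }; // row-major

Vec4 transform(const Mat4& a, const Vec4& p)
{
    return { a.m[0][0]*p.x + a.m[0][1]*p.y + a.m[0][2]*p.z + a.m[0][3]*p.w,
             a.m[1][0]*p.x + a.m[1][1]*p.y + a.m[1][2]*p.z + a.m[1][3]*p.w,
             a.m[2][0]*p.x + a.m[2][1]*p.y + a.m[2][2]*p.z + a.m[2][3]*p.w,
             a.m[3][0]*p.x + a.m[3][1]*p.y + a.m[3][2]*p.z + a.m[3][3]*p.w };
}

// Screen-space velocity (in NDC units) for one vertex.
void screenVelocity(const Mat4& prevMvp, const Mat4& curMvp, Vec4 pos,
                    float* velX, float* velY)
{
    Vec4 prev = transform(prevMvp, pos);
    Vec4 cur  = transform(curMvp, pos);
    *velX = cur.x / cur.w - prev.x / prev.w;
    *velY = cur.y / cur.w - prev.y / prev.w;
}
```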
Red is a completed top-down shooter. It will work on pretty much anything that supports OpenGL 1.1 and OpenAL, and features a variety of generated
content, namely space backdrops and planet sprites.
Creates several procedural planets using SSE-optimized 3d Perlin noise. Extra detail is generated as you get closer to the planets, and a
cube-based warped mesh is used to facilitate GPU-friendly level of detail and texture mapping. The planets even sport cheesy atmospheres that are
calculated inside a fragment program by intersecting the view vector against the near and far sides of the atmosphere shell and calculating the
distance between the two intersections, then passing that and the distance to the planet's bounding sphere (altitude) into a texture lookup. Two
versions exist, one using multithreading for detail generation and the other not, in case of performance issues on some processors.
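The shell-thickness part of the atmosphere trick reduces to a ray-sphere intersection. A C++ sketch of what the fragment program computes, assuming the planet is at the origin and the view direction is normalized (names are mine):

```cpp
#include <cassert>
#include <cmath>

// Sketch: distance between the near and far intersections of the view ray
// with the atmosphere shell (a sphere of radius shellRadius around the
// planet). Returns 0 when the ray misses the shell entirely.
float atmosphereThickness(float ox, float oy, float oz,   // ray origin
                          float dx, float dy, float dz,   // normalized direction
                          float shellRadius)
{
    // Solve |o + t*d|^2 = r^2 for t:  t^2 + 2(o.d)t + (|o|^2 - r^2) = 0
    float b = ox * dx + oy * dy + oz * dz;
    float c = ox * ox + oy * oy + oz * oz - shellRadius * shellRadius;
    float disc = b * b - c;
    if (disc <= 0.0f) return 0.0f;     // ray misses the shell
    return 2.0f * std::sqrt(disc);     // tFar - tNear
}
```

This thickness, together with the altitude term, is what feeds the texture lookup described above.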
Implements radial light beams from a point light source (in screen space). Light emitters are rendered into a texture, with the depth buffer primed with occluders that block light sources. An image processing step similar to a radial blur is performed and the image is then added on top of the scene. This approach could be modified to create shadows as well.
Clouds are generated using a GPU Perlin noise implementation, which prevents stutters when creating the next cloud animation target.
Large particles typically clip noticeably against other geometry. 'Soft particles' get around this problem by scaling alpha when the distance between the particle and the nearest solid object behind it is less than the particle's size. A texture with RGBA8-encoded depth is used for the scene depth lookup in the particle's fragment shader. Proper truncation of values is important when encoding floats into an RGBA8 texture, as the results of the encoding process could otherwise be rounded, producing an incorrect depth when unpacked. This effect is from a Crytek paper about screen depth effects.
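Here is a C++ sketch of the encode/decode pair, showing where the truncation matters. The shader-side version is typically expressed with frac() and a dot product against a bit-shift vector, but the arithmetic is the same idea; names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Sketch: packing a depth value in [0,1) into four 8-bit channels.
// Truncating (never rounding) each digit is what keeps the unpacked value
// correct - rounding a channel up would overstate the depth.
void packDepth(float depth, uint8_t out[4])
{
    float r = depth;
    for (int i = 0; i < 4; ++i)
    {
        r *= 256.0f;
        float digit = std::floor(r);        // truncate, never round
        if (digit > 255.0f) digit = 255.0f;
        out[i] = uint8_t(digit);
        r -= digit;                         // carry the fractional remainder
    }
}

float unpackDepth(const uint8_t in[4])
{
    return in[0] / 256.0f + in[1] / 65536.0f
         + in[2] / 16777216.0f + in[3] / 4294967296.0f;
}
```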
Straight Aces for the iPhone is a poker-themed puzzle game, created in collaboration with
designer Aaron Calta and artist Zack Wallig. The objective of the game is to recognize and
select high-value poker hands by choosing five adjacent cards from the randomly dealt game
grid. Bonus time is awarded for valuable hands, allowing for games that range from a minute
in length all the way up to 10-minute-long epic play sessions by chaining multiple high-value
hands in a row. High score and in-depth statistic tracking shows your improvement over time.
A simple neural network that performs image compression and visually displays the results of learning. Contains C++, C++ with x86 assembly, and C++
with x86 + SSE assembly implementations.
A 3D viewer for the Lorenz attractor that lets you pause, adjust evaluation speed, and create multiple simultaneous instances. The differential
equations are adaptively evaluated to avoid creating excessively dense line segments while retaining detail where needed and maintaining a good
rendering speed on older graphics cards.
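A minimal sketch of the adaptive stepping idea in C++: a forward Euler step with the step size tied to the local speed, so every line segment comes out at roughly the same length. The actual implementation may differ; names and constants here are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Sketch: evaluating the Lorenz system with a step size adapted to the
// local speed - short time steps where the state moves fast, long ones
// where it moves slowly, yielding roughly uniform segment lengths.
struct State { float x, y, z; };

State lorenzDeriv(State s, float sigma = 10.0f, float rho = 28.0f,
                  float beta = 8.0f / 3.0f)
{
    return { sigma * (s.y - s.x),
             s.x * (rho - s.z) - s.y,
             s.x * s.y - beta * s.z };
}

State stepAdaptive(State s, float targetSegmentLength)
{
    State d = lorenzDeriv(s);
    float speed = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    float h = targetSegmentLength / (speed + 1e-6f);  // shorter dt when fast
    return { s.x + d.x * h, s.y + d.y * h, s.z + d.z * h };
}
```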
An ocean water simulation with FFT-animated heightfields, reflections, and a Fresnel effect. Began as an attempt to implement a projected grid, where
all vertices are equally spaced in screen space, but this resulted in a disturbing swimming effect on the height of vertices, and so it was dropped
in favor of a traditional heightfield grid.
Contains code samples from my current project. Many of the demos on this site were originally
written for my personal learning and haven't been touched in quite a while, so I've
uploaded some newer code demonstrating a few engine utilities and components from my
current project. The code is mostly portable thanks to cross-platform libraries, but does
have some non-essential dependencies on the Win32 API or MSVC (directory monitoring and debug
support). Included are:
A fixed-size string implementation that emulates STL string while avoiding allocations
A texture loader supporting PSD and TGA files, and runtime reloading of modified texture files
A shader loader for ARB_fragment_program and ARB_vertex_program which supports runtime reloading of modified shaders
An OpenAL wrapper which can load and play .ogg files, including music streamed from file
Win32/MSVC Debug support - console creation and output, memory allocation tracking
A simple allocator for reusing a block of scratch memory, good for avoiding allocations and fragmentation
A compiler for a stripped down version of C, written for a university compilers class. Outputs MIPS assembly.
Hair Tool
This was created as a styling tool to explore the possibility of character models with individual animated hairs, but mostly resulted in comical
Sean Connery look-alikes and power mullets.
An implementation of Mark Harris's 3d clouds. By using dynamic billboards, a large number of these can be rendered. They require a preprocessing
step any time the light source changes, which does not take excessively long, but a different approach borrowing from volume fog might be more
suitable for use with dynamic light sources. (Update: though not shown in this demo, using a simple software rasterizer instead of GPU
readback gives interactive performance with dynamic lights. See Felwyrld above for a large scale implementation of this.)
Jedi Knight modifications and reverse engineering:
Jedi Knight is an old LucasArts game released in 1997. It presents a number of unique challenges for the editors that still work with it, mostly due
to the software rendering-era portal engine that it is built around. For more information see jkhub.net.
The Sith2 Engine: this was a short-lived open source project that made quite a bit of progress towards cloning the original game engine. My
contributions included an optimized renderer, portal visibility determination, audio, input, and some collision detection/response coding.
Soft Shadows: a plugin for the game's level editor that creates static soft shadows by exploiting vertex lighting through cutting up of surfaces to
create areas of blending between lit and unlit vertices. Efficient culling of a light's view frustum against successive level portals makes this
a relatively quick operation on levels of all complexity.
Scripting Extensions and Rendering Improvements: by reverse engineering and debugging the game, I was able to produce a series of patches that
removed limitations in the game's rendering and resource management, as well as inserting an extendable DLL addon into the game's built-in scripting
engine that provides new scripting functions.
DirectX Wrappers: a pair of wrappers, one for DirectInput which adds mouse wheel support to the game, and another for DirectDraw which converts all
rendering to OpenGL. The DirectDraw -> OpenGL wrapper has to properly handle pre-transformed vertices that lack perspective texture correction (the game
engine supplies transformed vertices, but expects DirectX to handle perspective correction).