University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 5
- Hanming Zhang
- Tested on: Google Chrome 62.0 on Windows 10 Education, i7-3630QM @ 2.40GHz, 16GB, GTX 670MX 3072MB (Personal laptop)
This project requires a WebGL-capable browser with support for several extensions (listed below). You can check for support on WebGL Report:
- OES_texture_float
- OES_texture_float_linear
- OES_element_index_uint
- EXT_frag_depth
- WEBGL_depth_texture
- WEBGL_draw_buffers
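The same check can also be done from code. Below is a minimal sketch, not code from this project: `findMissingExtensions` and `REQUIRED_EXTENSIONS` are hypothetical names, and the only WebGL call relied on is `getExtension`, which returns `null` when an extension is unavailable.

```javascript
// Extensions this project relies on. The official name of the multiple
// render target extension is WEBGL_draw_buffers (with a trailing "s").
const REQUIRED_EXTENSIONS = [
  'OES_texture_float',
  'OES_texture_float_linear',
  'OES_element_index_uint',
  'EXT_frag_depth',
  'WEBGL_depth_texture',
  'WEBGL_draw_buffers',
];

// Hypothetical helper: returns the names of required extensions that the
// given context does not expose. `gl` is any object with a WebGL-style
// getExtension(name) method.
function findMissingExtensions(gl, required = REQUIRED_EXTENSIONS) {
  return required.filter((name) => gl.getExtension(name) == null);
}

// In a browser one might call it like this:
//   const gl = document.createElement('canvas').getContext('webgl');
//   console.log(findMissingExtensions(gl));
```

An empty result means every listed extension is available in that context.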
![]() | ![]() |
---|---|
no post processing | bloom |
![]() | ![]() |
ramp shading | bloom + ramp shading |
- Cluster data structure that keeps track of how many lights are in each cluster and what their indices are
- Render the scene using only the lights that overlap a given cluster
- The clustering part is the same as in Clustered Forward+
- Store vertex attributes in g-buffer
- Read g-buffer in a shader to produce final output
- Two g-buffers are used (the normal is reconstructed from a two-component normal, and the position from screen-space X, Y, and depth information)
- Bloom (additional pipeline stages: brightness filter, horizontal and vertical Gaussian blur, and a final combine)
- Ramp shading
- Left mouse button to rotate the camera
- Right mouse button to move the camera
- Middle mouse button to zoom in/out
Before optimization, four g-buffers are used: albedo, normal, depth, and position. They are shown below.
![]() | ![]() |
---|---|
color / albedo | normal |
![]() | ![]() |
depth | position |
After optimization, two g-buffers are used. Together they hold the albedo, the view-space depth (used to determine the fragment's cluster depth index, because I slice clusters in view space), a two-component normal (X, Y), and the NDC depth (used to reconstruct the position from the screen-space position). The structure is shown below.
As we can see, reducing the number of g-buffers from 4 to 2 gives around a 16% performance increase. During rendering, we only need to write to two framebuffers and read back from those same two, so the bandwidth burden is reduced.
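To make the two reconstructions concrete, here is a CPU-side sketch of the math, written in JavaScript rather than GLSL so it can be run directly. It is not the project's shader code. It assumes a unit view-space normal whose Z component is non-negative (facing the camera), a standard OpenGL-style symmetric perspective projection, and reconstruction into view space; the function names and the `camera` fields are hypothetical.

```javascript
// Recover the third normal component from the two stored ones.
// Assumes the normal is unit length and nz >= 0.
function decodeNormal(nx, ny) {
  const nz = Math.sqrt(Math.max(0, 1 - nx * nx - ny * ny));
  return [nx, ny, nz];
}

// Rebuild a view-space position from screen-space UV in [0, 1] and NDC
// depth in [-1, 1]. `camera` = { fovY (radians), aspect, near, far } is
// an assumed description of an OpenGL-style perspective projection.
function reconstructViewPosition(u, v, ndcDepth, camera) {
  const { fovY, aspect, near, far } = camera;
  const ndcX = u * 2 - 1;
  const ndcY = v * 2 - 1;
  // Invert ndcZ = (f + n) / (f - n) + 2fn / ((f - n) * viewZ).
  // viewZ is negative in front of the camera.
  const viewZ = (2 * far * near) / ((far - near) * ndcDepth - (far + near));
  // Half the frustum height at this depth.
  const halfHeight = Math.tan(fovY / 2) * -viewZ;
  return [ndcX * halfHeight * aspect, ndcY * halfHeight, viewZ];
}
```

Dropping the position buffer costs a few extra arithmetic operations per fragment, which is usually a good trade against the saved texture writes and reads.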
Because Forward shading checks every light in the scene in the fragment shader, it naturally takes the most time. Clustering works well and gives a huge performance increase, even though on the CPU side we still need to loop through every light and assign its index to the clusters it influences.
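The CPU-side light assignment can be sketched roughly as follows. This is an illustration of the idea, not the project's code: `buildClusterBuffer` is a hypothetical name, the layout (one count followed by up to `maxLightsPerCluster` indices per cluster) is an assumed one, and `clusterRangeOf` stands in for whatever test decides which clusters a light's volume overlaps.

```javascript
// Build a flat buffer where each cluster owns (maxLightsPerCluster + 1)
// slots: slot 0 is the light count, the rest are light indices.
// clusterRangeOf(light) is a hypothetical callback returning the inclusive
// index range { xMin, xMax, yMin, yMax, zMin, zMax } the light overlaps.
function buildClusterBuffer(lights, dims, maxLightsPerCluster, clusterRangeOf) {
  const [xSlices, ySlices, zSlices] = dims;
  const stride = maxLightsPerCluster + 1;
  // Float32Array so the data could be uploaded as a float texture.
  const buffer = new Float32Array(xSlices * ySlices * zSlices * stride);

  lights.forEach((light, lightIndex) => {
    const r = clusterRangeOf(light);
    // Clamp the range to the grid so partially visible lights are kept.
    const x0 = Math.max(0, r.xMin), x1 = Math.min(xSlices - 1, r.xMax);
    const y0 = Math.max(0, r.yMin), y1 = Math.min(ySlices - 1, r.yMax);
    const z0 = Math.max(0, r.zMin), z1 = Math.min(zSlices - 1, r.zMax);
    for (let z = z0; z <= z1; z++) {
      for (let y = y0; y <= y1; y++) {
        for (let x = x0; x <= x1; x++) {
          const base = (x + y * xSlices + z * xSlices * ySlices) * stride;
          const count = buffer[base];
          // Lights beyond the per-cluster capacity are dropped.
          if (count < maxLightsPerCluster) {
            buffer[base + 1 + count] = lightIndex;
            buffer[base] = count + 1;
          }
        }
      }
    }
  });
  return buffer;
}
```

The fragment shader then only needs the fragment's cluster index to find its light count and loop over that short list.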
About the cluster slicing method I use (all slicing happens in view space):
- Slice view-space depth (between the near and far clip planes) logarithmically (natural log), which means clusters are thinner in depth near the camera and thicker far away.
- Slice the view frustum evenly in the X and Y directions (which means slicing the screen evenly).
Finally, our view frustum should look like this:
As a result, in the fragment shader we only shade with the limited set of lights that influence the cluster each fragment belongs to. This saves a lot of rendering time. In clustered deferred shading, since only the visible fragments (nearest to the camera and not occluded) have their information (color, normal, depth) saved to the g-buffers, we also avoid spending shading time on fragments that are not visible.
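The slicing scheme described above can be sketched as a function that maps a fragment to its cluster. This is one common way to write a logarithmic depth split and may differ in detail from the project's shader. It assumes `viewDepth` is the positive distance along the view direction and that slice boundaries sit at `near * (far / near)^(k / zSlices)`; the function names and the `grid` fields are hypothetical.

```javascript
// Map screen-space UV in [0, 1] and positive view-space depth to a
// cluster index. X and Y are split evenly across the screen; Z is split
// logarithmically between the near and far planes.
function clusterIndexOf(u, v, viewDepth, grid) {
  const { xSlices, ySlices, zSlices, near, far } = grid;
  const clampIndex = (i, n) => Math.min(n - 1, Math.max(0, i));
  const x = clampIndex(Math.floor(u * xSlices), xSlices);
  const y = clampIndex(Math.floor(v * ySlices), ySlices);
  // t goes from 0 at the near plane to 1 at the far plane.
  const t = Math.log(viewDepth / near) / Math.log(far / near);
  const z = clampIndex(Math.floor(t * zSlices), zSlices);
  return { x, y, z };
}

// Depth at which slice k starts, to show how slices grow with distance.
function sliceNearDepth(k, grid) {
  return grid.near * Math.pow(grid.far / grid.near, k / grid.zSlices);
}
```

With `near = 1`, `far = 16`, and 4 depth slices, the boundaries fall at depths 1, 2, 4, 8, and 16, so each slice is twice as thick as the one in front of it.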
The Bloom post-processing effect takes additional stages: a brightness filter, a horizontal Gaussian blur, a vertical Gaussian blur, and finally a combine with the original, unprocessed frame. In our case these stages do not take long, since each one is very basic and just reads and writes framebuffers. Results from several of these stages are shown below.
![]() | ![]() | ![]() |
---|---|---|
after brightness filter | after horizontal Gaussian Blur | after vertical Gaussian Blur |
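For reference, the four bloom stages can be mimicked on the CPU with a single-channel image. This is only a sketch of the pipeline's logic, not the shaders used here: the 5-tap binomial kernel, the threshold, and all function names are assumptions chosen for illustration.

```javascript
// 5-tap binomial approximation of a Gaussian; the weights sum to 1.
const KERNEL = [0.0625, 0.25, 0.375, 0.25, 0.0625];

// Stage 1: keep only pixels brighter than the threshold.
function brightPass(img, threshold) {
  return img.map((v) => (v > threshold ? v : 0));
}

// Stages 2 and 3: one-dimensional blur along (dx, dy), clamping at edges.
function blur1D(img, width, height, dx, dy) {
  const out = new Float32Array(img.length);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let sum = 0;
      for (let k = 0; k < KERNEL.length; k++) {
        const offset = k - 2;
        const sx = Math.min(width - 1, Math.max(0, x + offset * dx));
        const sy = Math.min(height - 1, Math.max(0, y + offset * dy));
        sum += KERNEL[k] * img[sy * width + sx];
      }
      out[y * width + x] = sum;
    }
  }
  return out;
}

// Stage 4: add the blurred highlights back onto the original frame.
// `img` is a Float32Array of width * height brightness values.
function bloom(img, width, height, threshold) {
  const bright = brightPass(img, threshold);
  const blurredH = blur1D(bright, width, height, 1, 0);
  const blurredHV = blur1D(blurredH, width, height, 0, 1);
  return img.map((v, i) => v + blurredHV[i]);
}
```

Splitting the blur into a horizontal and a vertical pass keeps the cost at 5 + 5 samples per pixel instead of the 25 a full 5x5 kernel would need.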
Ramp shading performs almost the same as rendering with no effect, since it only adds a few extra steps in the fragment shader to modulate the diffuse coefficient (Lambert term).
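As an illustration of what modulating the Lambert term can look like, the sketch below snaps it to a small number of discrete bands. The actual ramp used in this project may be different (for example a lookup into a ramp texture); `rampLambert` and the default band count are assumptions.

```javascript
// Quantize the Lambert term into `bands` discrete steps.
// `normal` and `lightDir` are assumed to be unit-length 3-vectors, with
// lightDir pointing from the surface toward the light.
function rampLambert(normal, lightDir, bands = 4) {
  const dot =
    normal[0] * lightDir[0] +
    normal[1] * lightDir[1] +
    normal[2] * lightDir[2];
  const lambert = Math.max(0, dot);
  // Snap down to the nearest band edge to get flat, stepped shading.
  return Math.floor(lambert * bands) / bands;
}
```

The extra work per fragment is one multiply, one floor, and one divide, which fits the observation that ramp shading costs about the same as plain shading.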
- OpenGL Bloom Tutorial by Philip Rideout
- OpenGL Bloom Effects by ThinMatrix
- Clustered Deferred and Forward Shading, Olsson et al., 2012
- Three.js by @mrdoob and contributors
- stats.js by @mrdoob and contributors
- webgl-debug by Khronos Group Inc.
- glMatrix by @toji and contributors
- minimal-gltf-loader by @shrekshao