Purloined Primitives

Our method of rendering impostor spheres is very similar to our method of rendering mesh spheres. In both cases, we set uniforms that define the sphere's position and radius. We bind a material uniform buffer, then bind a VAO and execute a draw command. We do this for each sphere.

However, this seems rather wasteful for impostors. Our per-vertex data for the impostor is really just the position and the radius. If we could somehow send this data 4 times, once for each corner of the square, then we could simply put all of our position and radius values in a buffer object and render every sphere in one draw call. Of course, we would also need to find a way to tell it which material to use.

We accomplish this task in the Geometry Impostor tutorial project. It looks exactly the same as before; it always draws impostors, using the depth-accurate shader.

Impostor Interleaving

To see how this works, we will start from the front of the rendering pipeline and follow the data. This begins with the buffer object and vertex array object we use to render.

Example 13.5. Impostor Geometry Creation

glGenBuffers(1, &g_imposterVBO);
glBindBuffer(GL_ARRAY_BUFFER, g_imposterVBO);
glBufferData(GL_ARRAY_BUFFER, NUMBER_OF_SPHERES * 4 * sizeof(float), NULL, GL_STREAM_DRAW);

glGenVertexArrays(1, &g_imposterVAO);
glBindVertexArray(g_imposterVAO);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 4 * sizeof(float), (void*)(0));
glEnableVertexAttribArray(1);
glVertexAttribPointer(1, 1, GL_FLOAT, GL_FALSE, 4 * sizeof(float), (void*)(12));

glBindVertexArray(0);
glBindBuffer(GL_ARRAY_BUFFER, 0);

This code introduces us to a new feature of glVertexAttribPointer. In all prior cases the fifth parameter was 0. Now it is 4 * sizeof(float). What does this parameter mean?

This parameter is the array's stride. It is the number of bytes from one value for this attribute to the next in the buffer. When this parameter is 0, OpenGL computes the stride for us as the size of the base type (GL_FLOAT in our case) times the number of components; that is, it assumes the values are tightly packed. When the stride is non-zero, it can be larger than that, leaving room between successive values for other data, such as other attributes of the same vertex.

What this means for our vertex data is that the first 3 floats represent attribute 0, and the next float represents attribute 1. The next 3 floats are attribute 0 of the next vertex, and the float after that is attribute 1 of that vertex. And so on.

Arranging attributes of the same vertex alongside one another is called interleaving. It is a very useful technique; indeed, for performance reasons, data should generally be interleaved where possible. One thing that it allows us to do is build our vertex data based on a struct:

struct VertexData
{
    glm::vec3 cameraPosition;
    float sphereRadius;
};

Our vertex array object perfectly describes the arrangement of data in an array of VertexData objects. So when we upload our positions and radii to the buffer object, we simply create an array of these structs, fill in the values, and upload them with glBufferData.
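
For example, the per-frame upload might look something like the following. This is only a sketch: ComputeCameraSpherePos and ComputeSphereRadius are hypothetical helpers, standing in for however the program computes each sphere's camera-space position and radius.

//Sketch only: fill an array of VertexData and stream it to the buffer.
VertexData sphereData[NUMBER_OF_SPHERES];
for(int sphere = 0; sphere < NUMBER_OF_SPHERES; sphere++)
{
    sphereData[sphere].cameraPosition = ComputeCameraSpherePos(sphere);
    sphereData[sphere].sphereRadius = ComputeSphereRadius(sphere);
}

glBindBuffer(GL_ARRAY_BUFFER, g_imposterVBO);
glBufferData(GL_ARRAY_BUFFER, sizeof(sphereData), sphereData, GL_STREAM_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);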

Misnamed and Maligned

So, our vertex data now consists of a position and a radius. But we need to draw four vertices, not one. How do we do that?

We could replicate each vertex's data 4 times and use some simple gl_VertexID math in the vertex shader to figure out which corner we're using. Or we could get complicated and learn something new. That new thing is an entirely new programmatic shader stage: geometry shaders.

Our initial pipeline discussion ignored this shader stage, because it is an entirely optional part of the pipeline. If a program object does not contain a geometry shader, then OpenGL just does its normal stuff.

The most confusing thing about geometry shaders is that they do not shade geometry. Vertex shaders take a vertex as input and write a vertex as output. Fragment shaders take a fragment as input and potentially write a fragment as output. Geometry shaders take a primitive as input and write zero or more primitives as output. By all rights, they should be called primitive shaders.

In any case, geometry shaders are invoked just after the hardware collects vertex shader outputs into a primitive, but before any clipping, transforming, or rasterization happens. Geometry shaders get the values output from multiple vertex shaders, perform arbitrary computations on them, and output one or more sets of values to new primitives.

In our case, the logic begins with our drawing call:

glBindVertexArray(g_imposterVAO);
glDrawArrays(GL_POINTS, 0, NUMBER_OF_SPHERES);
glBindVertexArray(0);

This introduces a completely new primitive and primitive type: GL_POINTS. Recall that multiple primitives can have the same base type. GL_TRIANGLE_STRIP and GL_TRIANGLES are both separate primitives, but both generate triangles. GL_POINTS does not generate triangle primitives; it generates point primitives.

GL_POINTS interprets each individual vertex as a separate point primitive. There are no other forms of point primitives, because points only contain a single vertex worth of information.

The vertex shader is quite simple, but it does have some new things to show us:

Example 13.6. Vertex Shader for Points

#version 330

layout(location = 0) in vec3 cameraSpherePos;
layout(location = 1) in float sphereRadius;

out VertexData
{
    vec3 cameraSpherePos;
    float sphereRadius;
} outData;

void main()
{
    outData.cameraSpherePos = cameraSpherePos;
    outData.sphereRadius = sphereRadius;
}

VertexData is not a struct definition, though it does look like one. It is an interface block definition. Uniform blocks are a kind of interface block, but inputs and outputs can also have interface blocks.

An interface block used for inputs and outputs is a way of collecting them into groups. One of the main uses for these is to separate namespaces of inputs and outputs using the interface name (outData, in this case). This allows us to use the same names for inputs as we do for their corresponding outputs. They do have other virtues, as we will soon see.

Do note that this vertex shader does not write to gl_Position. That is not necessary when a vertex shader is paired with a geometry shader.

Speaking of which, let's look at the global definitions of our geometry shader.

Example 13.7. Geometry Shader Definitions

#version 330
#extension GL_EXT_gpu_shader4 : enable

layout(std140) uniform;
layout(points) in;
layout(triangle_strip, max_vertices=4) out;

uniform Projection
{
    mat4 cameraToClipMatrix;
};

in VertexData
{
    vec3 cameraSpherePos;
    float sphereRadius;
} vert[];

out FragData
{
    flat vec3 cameraSpherePos;
    flat float sphereRadius;
    smooth vec2 mapping;
};

Note

The #extension line exists to fix a compiler bug for NVIDIA's OpenGL. It should not be necessary.

We see some new uses of the layout directive. The layout(points) in command is geometry shader-specific. It tells OpenGL that this geometry shader is intended to take point primitives. This is required; also, OpenGL will fail to render if you try to draw something other than GL_POINTS through this geometry shader.

Similarly, the output layout definition states that this geometry shader outputs triangle strips. The max_vertices directive states that we will write at most 4 vertices. There are implementation-defined limits on how large max_vertices can be. Both of these declarations are required for geometry shaders.
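
Both limits are queryable at runtime if you want to check them. A minimal sketch, using the standard OpenGL 3.2+ queries:

GLint maxOutputVertices = 0;
GLint maxTotalComponents = 0;
glGetIntegerv(GL_MAX_GEOMETRY_OUTPUT_VERTICES, &maxOutputVertices);
glGetIntegerv(GL_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS, &maxTotalComponents);
printf("max_vertices limit: %d, total output components: %d\n",
    maxOutputVertices, maxTotalComponents);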

Below the Projection uniform block, we have two interface blocks. The first one matches the definition from the vertex shader, with two differences: it has a different interface name, and that name has an array qualifier on it.

Geometry shaders take a primitive. And a primitive is defined as some number of vertices in a particular order. The input interface blocks define what the input vertex data is, but there is more than one set of vertex data. Therefore, the interface blocks must be defined as arrays. Granted, in our case, it is an array of length 1, since point primitives have only one vertex. But this is still necessary even in that case.
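
For comparison, here is what the same input block might look like in a hypothetical geometry shader that took triangles instead of points (not part of this tutorial):

layout(triangles) in;

in VertexData
{
    vec3 cameraSpherePos;
    float sphereRadius;
} vert[];    //vert[0], vert[1], and vert[2] are all valid here.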

We also have another output interface block. This one matches the definition from the fragment shader, as we will see a bit later. It does not have an instance name. Also, note that several of the values use the flat qualifier. We could have just used smooth, since we write the same values for every vertex we emit. However, it is more descriptive to use the flat qualifier for values that are not supposed to be interpolated. It might even save performance.

Here is the geometry shader code for computing one of the vertices of the output triangle strip:

Example 13.8. Geometry Shader Vertex Computation

//Bottom-left
mapping = vec2(-1.0, -1.0) * g_boxCorrection;
cameraSpherePos = vec3(vert[0].cameraSpherePos);
sphereRadius = vert[0].sphereRadius;
cameraCornerPos = vec4(vert[0].cameraSpherePos, 1.0);
cameraCornerPos.xy += vec2(-vert[0].sphereRadius, -vert[0].sphereRadius) * g_boxCorrection;
gl_Position = cameraToClipMatrix * cameraCornerPos;
gl_PrimitiveID = gl_PrimitiveIDIn;
EmitVertex();

This code is followed by three more of these, using different mapping and offset values for the different corners of the square. The cameraCornerPos is a local variable that is re-used as temporary storage.

To output a vertex, write to each of the output variables. In this case, we have the three from the output interface block, as well as the built-in variables gl_Position and gl_PrimitiveID (which we will discuss more in a bit). Then, call EmitVertex(); this causes all of the values in the output variables to be transformed into a vertex that is sent to the output primitive type. After calling this function, the contents of those outputs are undefined. So if you want to use the same value for multiple vertices, you have to store the value in a different variable or recompute it.

Note that clipping, face-culling, and all of that stuff happens after the geometry shader. This means that we must ensure that the order of our output positions will be correct given the current winding order.
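
The tutorial's shader writes the four corner blocks out longhand, but the same work could be expressed as a loop. The following is merely a sketch of that idea, keeping the tutorial's corner order so that the triangle strip's winding stays correct:

const vec2 corners[4] = vec2[4](
    vec2(-1.0, -1.0),    //Bottom-left
    vec2(-1.0, 1.0),     //Top-left
    vec2(1.0, -1.0),     //Bottom-right
    vec2(1.0, 1.0));     //Top-right

for(int corner = 0; corner < 4; corner++)
{
    mapping = corners[corner] * g_boxCorrection;
    cameraSpherePos = vert[0].cameraSpherePos;
    sphereRadius = vert[0].sphereRadius;
    vec4 cameraCornerPos = vec4(vert[0].cameraSpherePos, 1.0);
    cameraCornerPos.xy += corners[corner] * vert[0].sphereRadius * g_boxCorrection;
    gl_Position = cameraToClipMatrix * cameraCornerPos;
    gl_PrimitiveID = gl_PrimitiveIDIn;
    EmitVertex();
}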

gl_PrimitiveIDIn is a special input value. Much like gl_VertexID from the vertex shader, gl_PrimitiveIDIn represents the current primitive being processed by the geometry shader (one more reason for calling it a primitive shader). We write this to the built-in output gl_PrimitiveID, so that the fragment shader can use it to select which material to use.

And speaking of the fragment shader, it's time to have a look at that.

Example 13.9. Fragment Shader Changes

in FragData
{
    flat vec3 cameraSpherePos;
    flat float sphereRadius;
    smooth vec2 mapping;
};

out vec4 outputColor;

layout(std140) uniform;

struct MaterialEntry
{
    vec4 diffuseColor;
    vec4 specularColor;
    vec4 specularShininess;        //ATI Array Bug fix. Not really a vec4.
};

const int NUMBER_OF_SPHERES = 4;

uniform Material
{
    MaterialEntry material[NUMBER_OF_SPHERES];
} Mtl;

The input interface is just the mirror of the output from the geometry shader. What's more interesting is what happened to our material blocks.

In our original code, we had an array of uniform blocks stored in a single uniform buffer in C++. We bound specific portions of this material block when we wanted to render with a particular material. That will not work now that we are trying to render multiple spheres in a single draw call.

So, instead of having an array of uniform blocks, we have a uniform block that contains an array. We bind all of the materials to the shader, and let the shader pick which one it wants as needed. The source code to do this is pretty straightforward.
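
That setup might look something like the following sketch. The struct and names here are illustrative, not necessarily the tutorial's exact code; g_materialBlockIndex is assumed to be whatever binding index the Material block was given with glUniformBlockBinding.

//Sketch: one uniform buffer holding every sphere's material, laid out
//to match the std140 MaterialEntry struct (three vec4s per entry).
struct MaterialBlock
{
    glm::vec4 diffuseColor;
    glm::vec4 specularColor;
    glm::vec4 specularShininess;    //Only the first component is used.
};

MaterialBlock materials[NUMBER_OF_SPHERES];
for(int mtl = 0; mtl < NUMBER_OF_SPHERES; mtl++)
{
    materials[mtl].diffuseColor = glm::vec4(0.5f, 0.5f, 0.5f, 1.0f);    //placeholder values
    materials[mtl].specularColor = glm::vec4(0.8f, 0.8f, 0.8f, 1.0f);
    materials[mtl].specularShininess = glm::vec4(0.1f);
}

GLuint materialUBO;
glGenBuffers(1, &materialUBO);
glBindBuffer(GL_UNIFORM_BUFFER, materialUBO);
glBufferData(GL_UNIFORM_BUFFER, sizeof(materials), materials, GL_STATIC_DRAW);
glBindBuffer(GL_UNIFORM_BUFFER, 0);

//Bind the entire buffer to the Material block's binding index.
glBindBufferRange(GL_UNIFORM_BUFFER, g_materialBlockIndex, materialUBO,
    0, sizeof(materials));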

Note

Notice that the material specularShininess became a vec4 instead of a simple float. This is due to an unfortunate bug in ATI's OpenGL implementation.

As for how the material selection happens, that's simple. In our case, we use the primitive identifier. The gl_PrimitiveID value written by the geometry shader is used to index into the Mtl.material[] array.
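
In the fragment shader, the lookup itself is a one-liner; something like this sketch:

//Fetch this impostor's material, using the primitive index the
//geometry shader wrote.
MaterialEntry currMaterial = Mtl.material[gl_PrimitiveID];
vec4 diffuse = currMaterial.diffuseColor;
float shininess = currMaterial.specularShininess.x;    //only .x is meaningful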

Do note that uniform blocks have a maximum size that is hardware-dependent. If we wanted to have a large palette of materials, on the order of several thousand, then we might exceed this limit. At that point, we would need an entirely new way to handle this data; one that we have not learned about yet.

Or we could just split it up into multiple draw calls instead of one.
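
For reference, the per-block size limit mentioned above can be queried at runtime; a minimal sketch:

GLint maxBlockSize = 0;
glGetIntegerv(GL_MAX_UNIFORM_BLOCK_SIZE, &maxBlockSize);
//OpenGL 3.x implementations must support at least 16KB per uniform block.
printf("Max uniform block size: %d bytes\n", maxBlockSize);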