Unity3D Compute Shaders
I love the Unity3D Engine. I’ve been developing with it since Unity 3.5 back in 2012, and one capability I’ve only recently explored is GPU computation using Compute Shaders. Above, you can see a program I wrote that shatters objects in real time on the GPU using Compute Shaders.
Compute Shaders are simple to set up, and this tutorial should be accessible to anyone who has used Unity for at least a year, or has worked with the Unity shader system. However, Compute Shaders are not simple to use correctly. Identifying the right time, place, and format to ship data to the GPU for processing is an art, and is far beyond the scope of this article.
What is a Compute Shader?
Unlike a vert/frag shader for materials, image effects, and the like, a Compute Shader is a body of code executed in parallel on the GPU to accomplish a specific task quickly. It does this by utilizing the large volume of threads available on the GPU to work on a single repeated task. Inside the Unity3D Engine, a Compute Shader is written in HLSL (which is easiest to port to GLSL afterwards). At the time of this article (Unity 5.3.4f1), Compute Shaders will run on a Windows machine with DirectX 11 and a Shader Model 5.0 GPU. Additionally, Compute Shaders work on any machine running OpenGL 4.3+ or OpenGL ES 3.1+ (Android), which currently means no OSX or iOS support. Modern consoles (PS4 & Xbox One) can also run Compute Shaders on their platforms.
Setting Up Your Environment
Before we can properly begin learning about Compute Shaders, we need to make sure that your Player settings for rendering are configured appropriately.
On a Windows/Mac/Linux system it is best to leave your rendering settings at the Unity defaults for Auto Graphics API selection:
On an Android system you will have to perform a more elaborate setup. You’ll have to unselect “Auto Graphics API” and have only “OpenGLES3” (or higher) in the list:
The First Compute Shader
Now that your environment is fully established, let’s begin working with a Compute Shader. Each Compute Shader has two parts: the .compute shader code and the .cs marshalling class that goes along with it. The marshalling class is optional for any given shader, but at least one script must exist to actually send the data to the graphics card and run the shader.
Let’s go ahead and create a Compute Shader by right-clicking inside Assets, selecting Create -> Shader -> Compute Shader, and naming it “MyFirstCompute”.
Currently MonoDevelop doesn’t support HLSL syntax highlighting, so I use an HLSL syntax definition inside Notepad++ here, or you can define your own.
Open up MyFirstCompute and you will see the following code (I removed comments):
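    #pragma kernel CSMain

    RWTexture2D<float4> Result;

    [numthreads(8,8,1)]
    void CSMain (uint3 id : SV_DispatchThreadID)
    {
        Result[id.xy] = float4(id.x & id.y, (id.x & 15)/15.0, (id.y & 15)/15.0, 0.0);
    }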
What Is All This?
Now that we have a .compute shader, what does all of it do? What are the parts for?
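Taking it from the top, the first line is:

    #pragma kernel CSMain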
This is a kernel declaration: it ties a name to the function the GPU will execute, all as one group. You can have many kernels inside a single .compute file.
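Next we have:

    RWTexture2D<float4> Result;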
This is an HLSL data type meaning a read-write Texture2D templated on the type float4. In other words, you have a Texture2D where each position holds 4 floats (16 bytes) that you can both read and write. For every RW type there is also a read-only type without the RW prefix; in this case it would be “Texture2D<float4>”.
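Then, just above the function:

    [numthreads(8,8,1)]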
numthreads defines how many threads the GPU will spawn per group for the shader. The threads that are spawned are accessed like an array of 1 to 3 dimensions. So in this case, since we say numthreads(8, 8, 1), it defines a thread group of 64 threads accessed like a 2D array, since the x and y dimensions are specified. Identifying how many threads to work with is complicated and relies heavily on your situation and platform.
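And finally, the function itself:

    void CSMain (uint3 id : SV_DispatchThreadID)
    {
        Result[id.xy] = float4(id.x & id.y, (id.x & 15)/15.0, (id.y & 15)/15.0, 0.0);
    }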
This is the actual function associated with the kernel declaration. The names of the kernel and function are up to you, as long as the #pragma kernel line and the function name match.
Inside the signature of the function we have “id”, a set of 3 unsigned integers (accessible as x, y, and z) filled in by the GPU through the SV_DispatchThreadID semantic. SV_DispatchThreadID is the set of indices identifying the current thread within the dispatch. So if I am the first thread inside a 2D thread group, my id would be <0, 0> (x = 0, y = 0) according to SV_DispatchThreadID. If you are feeling adventurous you can read more about what access you have to your indices inside the thread group here.
Sending Data & Running Shaders
So now that we have a rough understanding of how a Compute Shader operates, we can begin to send data to the GPU for processing. To do this we will first need to create a .cs script that will send and receive data to/from the GPU.
Go ahead and create a C# script named “MyComputeScript” and open it up. We are going to build the base skeleton of the shader handler script: the initial Start() method, and our own method RunShader() that Start() will call.
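Here is a minimal sketch of that script (reconstructed to match the walkthrough below; the kernel and property names line up with the default shader template above):

    using UnityEngine;

    public class MyComputeScript : MonoBehaviour
    {
        // Assign the MyFirstCompute asset to this field in the Inspector.
        public ComputeShader shader;

        void Start()
        {
            RunShader();
        }

        void RunShader()
        {
            // Look up the kernel's index by name.
            int kernelIndex = shader.FindKernel("CSMain");

            // Create a 512x512 texture the GPU is allowed to write into.
            RenderTexture tex = new RenderTexture(512, 512, 24);
            tex.enableRandomWrite = true;
            tex.Create();

            // Bind the texture to the shader's "Result" property and run it.
            shader.SetTexture(kernelIndex, "Result", tex);
            shader.Dispatch(kernelIndex, 512 / 8, 512 / 8, 1);
        }
    }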
Once a GameObject has MyComputeScript on it, and the Compute Shader has been properly assigned to the public “shader” field, you can run the script. The code above is horrendous, but it is written this way to avoid obscuring the important details. Let’s break down what RunShader() is doing and how data really gets to the card.
The first task inside the method is to capture the index of the kernel inside the shader. It is important to perform a lookup for the kernel index since a single .compute file can have multiple kernels defined.
Unity Advice: Please, oh please don’t perform string operations like this. You are literally wasting time and resources by doing it like this. Instead create a public static class with public const string values you can reference. The only time you should perform string operations like this is inside public properties set within the Unity Editor, but even then it should be avoided.
Once the kernel is identified you can begin to send data to the GPU. We initialize an arbitrary RenderTexture with a width and height of 512 and a 24-bit depth buffer. Each pixel holds RGBA values, which the shader reads and writes as a float4. The most important part of the code is setting “enableRandomWrite” to true, which allows the GPU to write directly into the texture we send.
Once the texture is created we can send it to the GPU using SetTexture which takes the kernel index, the name of the property to be set, and the data that will be sent there.
Dispatch is a bit more complex than the other methods. Inside Dispatch you specify the kernel you are launching, then how many thread groups to spawn in each dimension. Since numthreads(8, 8, 1) gives each group 8 threads in x and y, and our texture is 512 pixels in each dimension, we dispatch 512/8 = 64 groups in both x and y, which yields one thread per pixel.
Additionally, here is how I would properly structure the code:
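As a sketch of what I mean (the class and constant names here are just illustrative):

    using UnityEngine;

    // Constants instead of scattered string literals, per the advice above.
    public static class MyShaderConstants
    {
        public const string KERNEL_MAIN = "CSMain";
        public const string PROP_RESULT = "Result";
    }

    public class MyComputeScript : MonoBehaviour
    {
        public ComputeShader shader;

        // Cache the kernel index once instead of looking it up every run.
        private int kernelIndex;
        private RenderTexture tex;

        void Start()
        {
            kernelIndex = shader.FindKernel(MyShaderConstants.KERNEL_MAIN);

            tex = new RenderTexture(512, 512, 24);
            tex.enableRandomWrite = true;
            tex.Create();

            shader.SetTexture(kernelIndex, MyShaderConstants.PROP_RESULT, tex);
            shader.Dispatch(kernelIndex, 512 / 8, 512 / 8, 1);
        }
    }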
StructuredBuffers Are Your Friend
Now let’s get into why Compute Shaders are super awesome: the ability to marshal custom data from the CPU to the GPU! This is especially useful when performing complex operations on the GPU that require structured data types.
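Here is a sketch of such a shader; the Plane struct’s fields and the kernel body are illustrative stand-ins rather than the article’s originals:

    #pragma kernel CSMain

    // Custom type usable inside the kernel (fields are illustrative).
    struct Plane
    {
        half3 normal;
        half offset;
    };

    // Read-only input buffer of our custom structs.
    StructuredBuffer<Plane> planes;

    // Read-write output buffer the kernel fills in.
    RWStructuredBuffer<half3> result;

    [numthreads(64, 1, 1)]
    void CSMain (uint3 id : SV_DispatchThreadID)
    {
        // One thread per element: read a plane, write a half3.
        result[id.x] = planes[id.x].normal * planes[id.x].offset;
    }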
So, let’s talk about the code above. We have a new shader with a kernel defined, and inside the file we define a struct we can use as a custom type within the kernel. We also have two different StructuredBuffers: an RW and a non-RW version. As with Texture2D, the RW version can be both read and written, whereas the other is read-only. In this case we have a StructuredBuffer that takes the plane struct as its type, as well as an RWStructuredBuffer that takes a half3.
Briefly, a half is literally half the size of a float (16 bits instead of 32) and is being used because the code inside this kernel’s function doesn’t require the precision of a full float. Using halves makes sending data between the CPU and GPU faster, since 16 bits are sent per value instead of 32.
Inside the kernel function you interact with a StructuredBuffer as you would any buffer or array, and you typically map the number of threads you are using to the buffer’s size as well.
But How Do I Use These Magical Beasts?
Inside the MyComputeScript script we can modify the code inside RunShader() to say the following instead:
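Something along these lines (a sketch reconstructed from the description that follows; the buffer name matches the sketch shader above):

    void RunShader()
    {
        int kernelIndex = shader.FindKernel("CSMain");

        // CPU-side array for the results; a Vector3 lines up with the
        // GPU buffer's 3-component elements.
        Vector3[] output = new Vector3[64];

        // Count = number of elements, stride = bytes per element
        // (a Vector3 is 3 floats * 4 bytes = 12).
        ComputeBuffer buffer = new ComputeBuffer(output.Length, sizeof(float) * 3);
        buffer.SetData(output);
        shader.SetBuffer(kernelIndex, "result", buffer);

        // (A real version would also create and bind the "planes" input
        // buffer the same way.)

        // One 64-thread group covers all 64 elements.
        shader.Dispatch(kernelIndex, 1, 1, 1);

        // Copy the results from GPU RAM back into our CPU-side array.
        buffer.GetData(output);

        // Clean the buffer up explicitly once you're done with it.
        buffer.Dispose();
    }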
Since we are working with half3 on the GPU, we can define a Vector3 (equivalent to a float3 on the GPU) for each element of our output buffer. We can then send this data to the GPU with a buffer and read back the result.
We do this by creating a ComputeBuffer object and giving it a number of elements and the stride of each element. The stride is very important: it is the total number of bytes in each element of the array (for a Vector3, 3 floats x 4 bytes = 12). After that you can set the data of the buffer, and then set the buffer on the shader. In this process the data in CPU RAM is copied to GPU RAM for processing. Dispatch is called as before.
The new twist is that in order to get your results, you have to have the buffer copy the resulting data back from the GPU to a location in CPU RAM. This is done using the GetData method after Dispatch is called.
Finally, once you are done receiving your data and you no longer need the buffer, call Dispose so that it is properly cleaned up. Leaving it for the Garbage Collector to collect at the end of the method will cause Unity to complain at runtime.
This has been a basic tutorial to get you started with the Compute Shader system inside Unity3D. If you have any further questions, feel free to contact me on my website here, or you can reference the Unity Manual for more information. Remember that this is HLSL, so the MSDN documentation will be useful for understanding what you have available, as well as the more advanced concepts involved in GPU processing.