Best approaches to performance analysis

Good coding practices and project asset management often make finding the root cause of a performance issue relatively simple, at which point the only real problem is figuring out how to improve the code. For instance, if the method only processes a single gigantic for loop, then it will be a pretty safe assumption that the problem is either with how many iterations the loop is performing, whether or not the loop is causing cache misses by reading memory in a non-sequential fashion, how much work is done in each iteration, or how much work it takes to prepare for the next iteration.

Of course, whether we're working individually or in a group setting, a lot of our code is not always written in the cleanest way possible, and we should expect to have to profile some poor coding work from time to time. Sometimes we are forced to implement a hacky solution for the sake of speed, and we don't always have the time to go back and refactor everything to keep up with our best coding practices. In fact, many code changes made in the name of performance optimization tend to appear very strange or arcane, often making our codebase more difficult to read. The common goal of software development is to make code that is clean, feature-rich, and fast. Achieving one of these is relatively easy, but the reality is that achieving two will cost significantly more time and effort, while achieving all three is a near-impossibility.

At its most basic level, performance optimization is just another form of problem-solving, and overlooking the obvious while problem-solving can be an expensive mistake. Our goal is to use benchmarking to observe our application looking for instances of problematic behavior, then use instrumentation to hunt through the code for clues about where the problem originates. Unfortunately, it's often very easy to get distracted by invalid data or jump to conclusions because we're being too impatient or missed a subtle detail. Many of us have run into occasions during software debugging where we could have found the root cause of the problem much faster if we had simply challenged and verified our earlier assumptions. Hunting down performance issues is no different.

A checklist of tasks would be helpful to keep us focused on the issue, and not waste time chasing so-called ghosts. Of course, every project is different, with its own unique challenges to overcome, but the following checklist is general enough that it should be able to apply to any Unity project:

Verifying that the target script is present in the Scene
Verifying that the script appears in the Scene the correct number of times
Verifying the correct order of events
Minimizing ongoing code changes
Minimizing internal distractions
Minimizing external distractions

Verifying script presence

Sometimes, there are things we expect to see, but don't. These are usually easy to spot because the human brain is very good at pattern recognition and spotting differences we didn't expect. Meanwhile, there are times where we assume that something has been happening, but it didn't. These are generally more difficult to notice, because we're often scanning for the first kind of problem, and we’re assuming that the things we don’t see are working as intended. In the context of Unity, one problem that manifests itself this way is verifying that the scripts we expect to be operating are actually present in the Scene.

Script presence can be quickly verified by typing the following into the Hierarchy window textbox:

t:&lt;monobehaviour name&gt;

For example, typing t:mytestmonobehaviour (note that it is not case-sensitive) into the Hierarchy textbox will show a shortlist of all GameObjects that currently have at least one MyTestMonoBehaviour script attached as a Component.

Note that this shortlist feature also includes any GameObjects with Components that derive from the given script name.

We should also double-check that the GameObjects they are attached to are
still enabled, since we may have disabled them during earlier testing since someone or something may have accidentally deactivated the object.

Verifying script count

If we’re looking at our Profiler data and note that a certain MonoBehaviour method is being executed more times than expected, or is taking longer than expected, we might want to double-check that it only occurs as many times in the Scene as we expect it to. It’s entirely possible that someone created the object more times than expected in the Scene file, or that we accidentally instantiated the object more than the expected number of times from code. If so, the problem could be due to conflicting or duplicated method invocations generating a performance bottleneck. We can verify the count using the same shortlist method used in the Best approaches to performance analysis section.

If we expected a specific number of Components to appear in the Scene, but the shortlist revealed more (or less!) than this, then it might be wise to write some initialization code that prevents this from ever happening again. We could also write some custom Editor helpers to display warnings to any level designers who might be making this mistake.

Preventing casual mistakes such as this is essential for good productivity, since experience tells us that if we don't explicitly disallow something, then someone, somewhere, at some point, for whatever reason, will do it anyway. This is likely to cost us a frustrating afternoon hunting down a problem that eventually turned out to be caused by human-error.

Verifying the order of events

Unity applications mostly operate as a series of callbacks from Native code to Managed code. This concept will be explained in more detail in Chapter 8, Masterful Memory Management, but for the sake of a brief summary, Unity's main thread doesn't operate like a simple console application would. In such applications, code would be executed with some obvious starting point (usually a main() function), and we would then have direct control of the game engine, where we will initialize major subsystems, then the game runs in a big while-loop (often called the Game Loop) that checks for user input, updates the game, renders the current Scene, and repeats. This loop only exits once the player chooses to quit the game.

Instead, Unity handles the Game Loop for us, and we expect callbacks such as Awake(), Start(), Update(), and FixedUpdate() to be called at specific moments. The big difference is that we don't have fine-grained control over the order in which events of the same type are called. When a new Scene is loaded (whether it's the first Scene of the game, or a later Scene), every MonoBehaviour Component's Awake() callback gets called, but there's no way of telling which order this will happen in.

So, if we one set of objects that configure some data in their Awake() callback, and then another set of objects do something with that configured data in their own Awake() callback, some reorganization or recreation of Scene objects or a random change in the codebase or compilation process (it's unclear what exactly causes it) may cause the order of these Awake() calls to change, and then the dependent objects will probably try to do things with data that wasn't initialized how we expected. The same goes for all other callbacks provided by MonoBehaviour Components, such as Start() and Update().

There’s no way or telling the order in which the same type of callback gets called among a group of MonoBehaviour Components, so we should be very careful not to assume that object callbacks are happening in a specific order. In fact, its essential practice to never write code in a way that assumes these callbacks will need to be called in a certain order because it could break at any time.

A better place to handle late-stage initialization is in a MonoBehaviour Component's Start() callback, which is always called after every object’s Awake() is called and just before its first Update(). Late-stage updates can also be done in the LateUpdate() callback.

If we’re having trouble determining the actual order of events, then this is best handled by either step-through debugging with an IDE (MonoDevelop, Visual Studio, and so on) or by printing simple logging statements with Debug.Log().

Be warned that Unity's logger is notoriously expensive. Logging is unlikely to change the order of the callbacks, but it can cause some unwanted spikes in performance if used too aggressively. Be smart and do targeted logging only on the most relevant parts of the codebase.

Coroutines are typically used to script some sequence of events, and when they’re triggered will depend on what yield types are being used. The most difficult and unpredictable type to debug is perhaps the WaitForSeconds yield type. The Unity Engine is non-deterministic, meaning that you'll get a slightly different behavior from one session and the next, even on the same hardware. For example, you might get 60 updates called during the first second of application runtime during one session, 59 in the next, and 62 in the one after that. In another session, you might get 61 updates in the first second, then 60, followed by 59.

A variable number of Update() callbacks will be called between when the Coroutine starts and when it ends, and so if the Coroutine depends on something’s Update() being called a specific number of times, we will run into problems. It’s best to keep Coroutine behavior dead-simple and dependency-free of other behavior once it begins. Breaking this rule may be tempting, but it's essentially guaranteed that some future change is going to interact with the Coroutine in an unexpected way leading to a long, painful debugging session of a game-breaking bug that’s very hard to reproduce.

Minimizing ongoing code changes

Making code changes to the application in order to hunt down performance issues is best done carefully, as the changes are easy to forget as time wears on. Adding debug logging statements to our code can be tempting, but remember that it costs us time to introduce these calls, recompile our code, and remove these calls once our analysis is complete. In addition, if we forget to remove them, then they can cost unnecessary runtime overhead in the final build since Unity's debug Console window logging can be prohibitively expensive in both CPU and memory.

A good way to combat this problem is to add a flag or comment everywhere we made a change with our name so that it's easy to find and remove it later. Hopefully we're also wise enough to use a source-control tool for our codebase making it easy to differentiate the contents of any modified files and revert them to their original state. This is an excellent way to ensure that unnecessary changes don't make it into the final version. Of course, this is by no means a guaranteed solution if we also applied a fix at the same time and didn’t double-check all of our modified files before committing the change.

Making use of breakpoints during runtime debugging is the preferred approach, as we can trace the full callstack, variable data, and conditional code paths (for example, if-else blocks), without risking any code changes or wasting time on recompilation. Of course, this is not always an option if, for example, we're trying to figure out what causes something strange to happen in one out of a thousand frames. In this case, it's better to determine a threshold value to look for and add an if statement with a breakpoint inside, which will be triggered when the value has exceeded the threshold.

Minimizing internal distractions

The Unity Editor has its own little quirks and nuances, which can sometimes make it confusing to debug some kinds of problems.

Firstly, if a single frame takes a long time to process, such that our game noticeably freezes, then the Profiler may not be capable of picking up the results and recording them in the Profiler window. This can be especially annoying if we wish to catch data during application/Scene initialization. The upcoming section, Custom CPU profiling, will offer some alternatives to explore to solve this problem.

One common mistake (that I have admittedly fallen victim to multiple times during the writing of this book) is that if we are trying to initiate a test with a keystroke and have the Profiler open, we should not forget to click back into the Editor's Game window before triggering the keystroke. If the Profiler is the most recently clicked window, then the Editor will send keystroke events to that, instead of the runtime application, and hence no GameObject will catch the event for that keystroke. This can also apply to the GameView for rendering tasks and even Coroutines using the WaitForEndOfFrame yield type. If the Game window is not visible and active in the Editor, then nothing is being rendered to that view, and therefore, no events that rely on the Game window rendering will be triggered. Be warned!

Vertical Sync (otherwise known as VSync) is used to match the application's frame rate to the frame rate of the device it is being displayed to, for example, a monitor may run at 60 Hertz (60 cycles per-second), and if a rendering loop in our game is running faster than this then it will sit and wait until that time has elapsed before outputting the rendered frame. This feature reduces screen-tearing which occurs when a new image is pushed to the monitor before the previous image was finished, and for a brief moment part of the new image overlaps the old image.

Executing the Profiler with VSync enabled will probably generate a lot of noisy spikes in the CPU Usage Area under the WaitForTargetFPS heading, as the application intentionally slows itself down to match the frame rate of the display. These spikes often appear very large in Editor Mode since the Editor is typically rendering to a very small window, which doesn’t take a lot of CPU or GPU work to render.

This will generate unnecessary clutter, making it harder to spot the real issue(s). We should ensure that we disable the VSync checkbox under the CPU Usage Area when we're on the lookout for CPU spikes during performance tests. We can disable the VSync feature entirely by navigating to Edit | Project Settings | Quality and then to the sub-page for the currently selected platform.

We should also ensure that a drop in performance isn't a direct result of a massive number of exceptions and error messages appearing in the Editor Console window. Unity's Debug.Log() and similar methods, such as Debug.LogError() and Debug.LogWarning() are notoriously expensive in terms of CPU usage and heap memory consumption, which can then cause garbage collection to occur and even more lost CPU cycles (refer to Chapter 8, Masterful Memory Management, for more information on these topics).

This overhead is usually unnoticeable to a human being looking at the project in Editor Mode, where most errors come from the compiler or misconfigured objects. However, they can be problematic when used during any kind of runtime process, especially during profiling, where we wish to observe how the game runs in the absence of external disruptions. For example, if we are missing an object reference that we were supposed to assign through the Editor and it is being used in an Update() callback, then a single MonoBehaviour could throw new exceptions every single update. This adds lots of unnecessary noise to our profiling data.

Note that we can hide different log level types with the buttons shown in the next screenshot. The extra logging still costs CPU and memory to execute, even though they are not being rendered, but it does allow us to filter out the junk we don't want. Although, it is often a good practice to keep all of these options enabled to verify that we're not missing anything important.

Minimizing external distractions

This one is simple but absolutely necessary. We should double-check that there are no background processes eating away CPU cycles or consuming vast swathes of memory. Being low on available memory will generally interfere with our testing, as it can cause more cache misses, hard-drive access for virtual memory page-file swapping, and generally slow responsiveness of the application. If our application is suddenly behaving significantly worse than what we expected, double-check the system's task manager (or equivalent) for any CPU/memory/hard disk activity, which might be causing problems.

Targeted profiling of code segments

If our performance problem isn't resolved by the checklist mentioned previously, then we probably have a real issue on our hands that demands further analysis. The Profiler window is effective at showing us a broad overview of performance; it can help us find specific frames to investigate and can quickly inform us which MonoBehaviour and/or method may be causing issues. We would then need to figure out whether the problem is reproducible, under what circumstances a performance bottleneck arises, and where exactly within the problematic code block the issue is originating from.

To accomplish these, we will need to perform some profiling of targeted sections of our code, and there are a handful of useful techniques we can employ for this task. For Unity projects, they essentially fit into two categories:

Controlling the Profiler from script code
Custom timing and logging methods

Note that the next section focuses on how to investigate Scripting bottlenecks through C# code. Detecting the source of bottlenecks in other engine subsystems will be discussed in their related chapters.

Profiler script control

The Profiler can be controlled in script code through the Profiler class. There are several useful methods in this class that we can explore within the Unity documentation, but the most important methods are the delimiter methods that activate and deactivate profiling at runtime. These can be accessed through the UnityEngine.Profiling.Profiler class through its BeginSample() and EndSample() methods.

Note that the delimiter methods, BeginSample() and EndSample(), are only compiled in development builds, and as such, they will not be compiled or executed in release builds where Development Mode is unchecked. This is commonly known as non-operation or no-op code.

The BeginSample() method has an overload that allows a custom name for the sample to appear in the CPU Usage Area's Hierarchy mode. For example, the following code will profile invocations of this method and make the data appear in the Breakdown View under a custom heading, as follows:

void DoSomethingCompletelyStupid() { 
  Profiler.BeginSample("My Profiler Sample");  
  List&lt;int&gt; listOfInts = new List&lt;int&gt;();  
  for(int i = 0; i &lt; 1000000; ++i) {    
    listOfInts.Add(i);  
  }
  Profiler.EndSample();
}

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you.

We should expect that invoking this poorly designed method (which generates a List containing a million integers, and then does absolutely nothing with it) will cause a huge spike in CPU usage, chew up several megabytes of memory, and appear in the Profiler Breakdown View under the heading My Profiler Sample, as the following screenshot shows:

Custom CPU Profiling

The Profiler is just one tool at our disposal. Sometimes, we may want to perform customized profiling and logging of our code. Maybe we're not confident that the Unity Profiler is giving us the right answer, maybe we consider its overhead cost too great, or maybe we just like having complete control of every single aspect of our application. Whatever our motivations, knowing some techniques to perform an independent analysis of our code is a useful skill to have. It's unlikely we'll only be working with Unity for the entirety of our game development careers, after all.

Profiling tools are generally very complex, so it's unlikely we would be able to generate a comparable solution on our own within a reasonable time frame. When it comes to testing CPU usage, all we should really need is an accurate timing system, a fast, low-cost way of logging that information, and some piece of code to test against. It just so happens that the .NET library (or, technically, the Mono framework) comes with a Stopwatch class under the System.Diagnostics namespace. We can stop and start a Stopwatch object at any time, and we can easily acquire a measure of how much time has passed since the Stopwatch was started.

Unfortunately, this class is not perfectly accurate; it is accurate only to milliseconds, or tenths of a millisecond, at best. Counting high-precision real time with a CPU clock can be a surprisingly difficult task when we start to get into it; so, in order to avoid a detailed discussion of the topic, we should try to find a way for the Stopwatch class to satisfy our needs.

If precision is important, then one effective way to increase it is by running the same test multiple times. Assuming that the test code block is both easily repeatable and not exceptionally long, we should be able to run thousands, or even millions of tests within a reasonable time frame and then divide the total elapsed time by the number of tests we just performed to get a more accurate time for a single test.

Before we get obsessed with the topic of high precision, we should first ask ourselves if we even need it. Most games expect to run at 30 FPS or 60 FPS, which means that they only have around 33 milliseconds or 16 milliseconds, respectively, to compute everything for the entire frame. So, hypothetically, if we need to bring only the performance of a particular code block under ten milliseconds, then repeating the test thousands of times to get microsecond precision is too many orders of magnitude away from the target to be worthwhile.

The following is a class definition for a custom timer that uses a Stopwatch to count time for a given number of tests:

using System;
using System.Diagnostics;

public class CustomTimer : IDisposable {
  private string _timerName;
  private int _numTests;
  private Stopwatch _watch;

  // give the timer a name, and a count of the 
  // number of tests we're running
  public CustomTimer(string timerName, int numTests) {
    _timerName = timerName;
    _numTests = numTests;
    if (_numTests &lt;= 0) {
      _numTests = 1;
    }
    _watch = Stopwatch.StartNew();
  }

    // automatically called when the 'using()' block ends
    public void Dispose() {
    _watch.Stop();
    float ms = _watch.ElapsedMilliseconds;
    UnityEngine.Debug.Log(string.Format("{0} finished: {1:0.00} " + 
        "milliseconds total, {2:0.000000} milliseconds per-test " + 
        "for {3} tests", _timerName, ms, ms / _numTests, _numTests));
    }
}

Adding an underscore before member variable names is a common and useful way of distinguishing a class' member variables (also known as fields) from a method's arguments and local variables.

The following is an example of the CustomTimer class usage:

const int numTests = 1000;
using (new CustomTimer("My Test", numTests)) {
  for(int i = 0; i &lt; numTests; ++i) {
    TestFunction();
  }
} // the timer's Dispose() method is automatically called here

There are three things to note when using this approach. Firstly, we are only making an average of multiple method invocations. If processing time varies enormously between invocations, then that will not be well represented in the final average.

Secondly, if memory access is common, then repeatedly requesting the same blocks of memory will result in an artificially higher cache hit rate (where the CPU can find data in memory very quickly because it's accessed the same region recently), which will bring the average time down when compared to a typical invocation.

Thirdly, the effects of Just-In-Time (JIT) compilation will be effectively hidden for similarly artificial reasons, as it only affects the first invocation of the method. JIT compilation is a .NET feature that will be covered in more detail in Chapter 8, Masterful Memory Management.

The using block is typically used to safely ensure that unmanaged resources are properly destroyed when they go out of scope. When the using block ends, it will automatically invoke the object's Dispose() method to handle any cleanup operations. In order to achieve this, the object must implement the IDisposable interface, which forces it to define the Dispose() method.

However, the same language feature can be used to create a distinct code block, which creates a short-term object, which then automatically processes something useful when the code block ends, which is how it is being used in the preceding code block.

Note that the using block should not be confused with the using statement, which is used at the start of a script file to pull in additional namespaces. It's extremely ironic that the keyword for managing namespaces in C# has a naming conflict with another keyword.

As a result, the using block and the CustomTimer class give us a clean way of wrapping our test code in a way that makes it obvious when and where it is being used.

Another concern to worry about is application warm-up time. Unity has a significant startup cost when a Scene begins, given the amount of data that needs to be loaded from disk, the initialization of complex subsystems, such as the Physics and Rendering Systems, and the number of calls to various Awake() and Start() callbacks that need to be resolved before anything else can happen. This early overhead might only last a second, but that can have a significant effect on the results of our testing if the code is also executed during this early initialization period. This makes it crucial that if we want an accurate test, then any runtime testing should begin only after the application has reached a steady state.

Ideally, we would be able to execute the target code block in its own Scene after its initialization has completed. This is not always possible, so as a backup plan, we could wrap the target code block in an Input.GetKeyDown() check in order to assume control over when it is invoked. For example, the following code will execute our test method only when the spacebar is pressed:

if (Input.GetKeyDown(KeyCode.Space)) {
  const int numTests = 1000;
  using (new CustomTimer("Controlled Test", numTests)) {
    for(int i = 0; i &lt; numTests; ++i) {
      TestFunction();
    }
  }
}

As mentioned previously, Unity's Console window logging mechanism is prohibitively expensive. As a result, we should try not to use these logging methods in the middle of a profiling test (or during gameplay, for that matter). If we find ourselves absolutely needing detailed profiling data that prints out lots of individual messages (such as performing a timing test on a loop to figure out which iteration is costing more time than the rest), then it would be wiser to cache the logging data and print it all out at the end, as the CustomTimer class does. This will reduce runtime overhead, at the cost of some memory consumption. The alternative is that many milliseconds are lost to printing each Debug.Log() message in the middle of the test, which pollutes the results.

The CustomTimer class also makes use of string.Format(). This will be covered in more detail in Chapter 8, Masterful Memory Management, but the short explanation is that this method is used because generating custom string object using the + operator (for example, code such as Debug.Log("Test: " + output);) can result in a surprisingly large amount of memory allocations, which attracts the attention of the Garbage Collector. Doing otherwise would conflict with our goal of achieving accurate timing and analysis and should be avoided.