The Unity Profiler

The Unity Profiler is built into the Unity Editor itself and provides an expedient way of narrowing down our search for performance bottlenecks by generating usage and statistics reports on a multitude of Unity3D subsystems during runtime. The different subsystems it can gather data for are listed as follows:

CPU consumption (per major subsystem)
Basic and detailed rendering and GPU information
Runtime memory allocations and overall consumption
Audio source/data usage
Physics Engine (2D and 3D) usage
Network messaging and operation usage
Video playback usage
Basic and detailed user interface performance (new in Unity 2017)
Global Illumination statistics (new in Unity 2017)

There are generally two approaches to using a profiling tool: instrumentation and benchmarking (although, admittedly, the two terms are often used interchangeably).

Instrumentation typically means taking a close look into the inner workings of the application by observing the behavior of targeted function calls and where and how much memory is being allocated, and generally getting an accurate picture of what is happening, with the hope of finding the root cause of a problem. However, this is normally not an efficient way of starting to find performance problems because profiling of any application comes with a performance cost of its own.

When a Unity application is compiled in Development Mode (determined by the Development Build flag in the Build Settings menu), additional compiler flags are enabled causing the application to generate special events at runtime, which get logged and stored by the Profiler. Naturally, this will cause additional CPU and memory overhead at runtime due to all of the extra workload the application takes on. Even worse, if the application is being profiled through the Unity Editor, then even more CPU and memory will be spent, ensuring that the Editor updates its interface, renders additional windows (such as the Scene window), and handles background tasks. This profiling cost is not always negligible. In excessively large projects, it can sometimes cause wildly inconsistent behavior when the Profiler is enabled. In some cases, the inconsistency is significant enough to cause completely unexpected behavior due to changes in event timings and potential race conditions in asynchronous behavior. This is a necessary price we pay for a deep analysis of our code's behavior at runtime, and we should always be aware of its presence.

Before we get ahead of ourselves and start analyzing every line of code in our application, it would be wiser to perform a surface-level measurement of the application. We should gather some rudimentary data and perform test scenarios during a runtime session of our game while it runs on the target hardware; the test case could simply be a few seconds of gameplay, playback of a cut scene, a partial playthrough of a level, and so on. The idea of this activity is to get a general feel for what the user might experience and keep watching for moments when performance becomes noticeably worse. Such problems may be severe enough to warrant further analysis.

This activity is commonly known as benchmarking, and the important metrics we're interested in are often the number of frames per second (FPS) being rendered, overall memory consumption, how CPU activity behaves (looking for large spikes in activity), and sometimes CPU/GPU temperature. These are all relatively simple metrics to collect and can be used as a first approach to performance analysis for one important reason: doing so will save us an enormous amount of time in the long run, since it ensures that we only spend our time investigating problems that users would notice.
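As a rough illustration of how these benchmarking metrics could be derived from raw frame timings, consider the following minimal sketch. The BenchmarkSummary class and its members are hypothetical names invented for this example and are not part of Unity; the sketch assumes that we have already collected a list of frame durations in milliseconds during a test scenario.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Unity): summarizes frame times
// gathered during a benchmarking run.
public static class BenchmarkSummary
{
    public struct Result
    {
        public double AverageFps;     // frame count divided by total elapsed seconds
        public double WorstFrameMs;   // duration of the longest single frame
        public int FramesOverBudget;  // frames that took longer than the budget
    }

    public static Result Summarize(IList<double> frameTimesMs, double budgetMs)
    {
        if (frameTimesMs == null || frameTimesMs.Count == 0)
        {
            throw new ArgumentException("At least one frame time is required.", "frameTimesMs");
        }

        double totalMs = 0.0;
        double worstMs = 0.0;
        int overBudget = 0;

        foreach (double ms in frameTimesMs)
        {
            totalMs += ms;
            if (ms > worstMs) { worstMs = ms; }
            if (ms > budgetMs) { overBudget++; }
        }

        Result result = new Result();
        result.AverageFps = frameTimesMs.Count / (totalMs / 1000.0);
        result.WorstFrameMs = worstMs;
        result.FramesOverBudget = overBudget;
        return result;
    }
}
```

Inside Unity, one plausible way to feed this helper would be to append Time.unscaledDeltaTime * 1000 to a list from an Update() callback while the test scenario runs, and then call Summarize() with a budget of 1000.0 / 60.0 for a 60 FPS target. A high count of frames over budget, or a worst frame far above the average, would suggest that deeper instrumentation is worthwhile.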

We should dig deeper into instrumentation only after a benchmarking test indicates that further analysis is required. It is also very important to benchmark by simulating actual platform behavior as much as possible if we want a realistic data sample. As such, we should never accept benchmarking data that was generated through Editor Mode as representative of real gameplay, since Editor Mode comes with some additional overhead costs that might mislead us, or hide potential race conditions in a real application. Instead, we should hook the profiling tool into the application while it is running in a standalone format on the target hardware.

Many Unity developers are surprised to find that the Editor sometimes calculates the results of operations much faster than a standalone application does. This is particularly common when dealing with serialized data like audio files, Prefabs and Scriptable Objects. This is because the Editor will cache previously imported data and is able to access it much faster than a real application would.

Let's cover how to access the Unity Profiler and connect it to the target device so that we can start to make accurate benchmarking tests.

Users who are already familiar with connecting the Unity Profiler to their applications can skip to the section titled The Profiler window.

Launching the Profiler

We will begin with a brief tutorial on how to connect our game to the Unity Profiler within a variety of contexts:

Local instances of the application, either through the Editor or a standalone instance
Local instances of a WebGL application running in a browser
Remote instances of the application on an iOS device (for example, iPhone or iPad)
Remote instances of the application on an Android device (for example, an Android tablet or phone)
Profiling the Editor itself

We will briefly cover the requirements for setting up the Profiler in each of these contexts.

Editor or standalone instances

The only way to access the Profiler is to launch it through the Unity Editor and connect it to a running instance of our application. This is the case whether we're executing our game in Play Mode within the Editor, running a standalone application on a local or remote device, or profiling the Editor itself.

To open the Profiler, navigate to Window | Profiler within the Editor:

If the Editor is already running in Play Mode, then we should see reporting data actively gathering in the Profiler window.

To profile standalone projects, ensure that the Development Build and Autoconnect Profiler flags are enabled when the application is built.
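The same two flags can also be set from an Editor script, which may be convenient for automated builds. The following is a minimal sketch rather than a recommended pipeline: the class name, menu path, output path, and target platform are all assumptions made for illustration, and the script would need to live in a folder named Editor.

```csharp
using System.Collections.Generic;
using UnityEditor;

// Editor-only sketch; ProfilingBuild is a hypothetical class name.
public static class ProfilingBuild
{
    // Hypothetical menu path; adjust it to suit the project.
    [MenuItem("Tools/Build Profiling Player")]
    public static void BuildProfilingPlayer()
    {
        // Collect the scenes that are enabled in the Build Settings window.
        List<string> scenes = new List<string>();
        foreach (EditorBuildSettingsScene scene in EditorBuildSettings.scenes)
        {
            if (scene.enabled)
            {
                scenes.Add(scene.path);
            }
        }

        BuildPlayerOptions options = new BuildPlayerOptions();
        options.scenes = scenes.ToArray();
        options.locationPathName = "Builds/ProfilingPlayer.exe"; // assumed output path
        options.target = BuildTarget.StandaloneWindows64;        // assumed platform

        // Development corresponds to the Development Build flag, and
        // ConnectWithProfiler corresponds to the Autoconnect Profiler flag.
        options.options = BuildOptions.Development | BuildOptions.ConnectWithProfiler;

        BuildPipeline.BuildPlayer(options);
    }
}
```

Selecting the menu item should produce a player that attempts to connect to the Profiler on startup, just as if both checkboxes had been ticked in the Build Settings menu.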

Choosing whether to profile an Editor-based instance (through the Editor's Play Mode) or a standalone instance (built and running separately from the Editor) can be achieved through the Connected Player option in the Profiler window:

Note that switching back to the Unity Editor while profiling a separate standalone project will halt all data collection since the application will not be updated while it is in the background.

Note that the Development Build option is named Use Development Mode and the Connected Player option is named Active Profiler in Unity 5.

Connecting to a WebGL instance

The Profiler can also be connected to an instance of the Unity WebGL Player. This can be achieved by ensuring that the Development Build and Autoconnect Profiler flags are enabled when the WebGL application is built and run from the Editor. The application will then be launched through the Operating System's default browser. This enables us to profile our web-based application in a more real-world scenario through the target browser and test multiple browser types for inconsistencies in behavior (although this requires us to keep changing the default browser).

Unfortunately, the Profiler connection can only be established when the application is first launched from the Editor. It currently (at least in early builds of Unity 2017) cannot be connected to a standalone WebGL instance already running in a browser. This limits the accuracy of benchmarking WebGL applications since there will be some Editor-based overhead, but it’s the only option we have available for the moment.

Remote connection to an iOS device

The Profiler can also be connected to an active instance of the application running remotely on an iOS device, such as an iPad or iPhone. This can be achieved through a shared Wi-Fi connection.

Note that remote connection to an iOS device is only possible when Unity (and hence the Profiler) is running on an Apple Mac device.

Follow the given steps to connect the Profiler to an iOS device:

Ensure that the Development Build and Autoconnect Profiler flags are enabled when the application is built.
Connect both the iOS device and Mac device to a local Wi-Fi network, or to an ad hoc Wi-Fi network.
Attach the iOS device to the Mac via the USB or Lightning cable.
Begin building the application with the Build & Run option as usual.
Open the Profiler window in the Unity Editor and select the device under Connected Player.

You should now see the iOS device's profiling data gathering in the Profiler window.

The Profiler uses ports from 54998 to 55511 to broadcast profiling data. Ensure that these ports are available for outbound traffic if there is a firewall on the network.

For troubleshooting problems with building iOS applications and connecting the Profiler to them, consult the following documentation page: https://docs.unity3d.com/Manual/TroubleShootingIPhone.html.

Remote connection to an Android device

There are two different methods for connecting an Android device to the Unity Profiler: either through a Wi-Fi connection or using the Android Debug Bridge (ADB) tool. Either of these approaches will work from an Apple Mac or a Windows PC.

Perform the following steps to connect an Android device over a Wi-Fi connection:

Ensure that the Development Build and Autoconnect Profiler flags are enabled when the application is built.
Connect both the Android and desktop devices to a local Wi-Fi network.
Attach the Android device to the desktop device via the USB cable.
Begin building the application with the Build & Run option as usual.
Open the Profiler window in the Unity Editor and select the device under Connected Player.

The application should then be built and pushed to the Android device through the USB connection, and the Profiler should connect through the Wi-Fi connection. You should then see the Android device's profiling data gathering in the Profiler window.

The second option is to use ADB. This is a suite of debugging tools that comes bundled with the Android Software Development Kit (SDK). For ADB profiling, follow these steps:

Ensure that the Android SDK is installed by following Unity's guide for Android SDK/NDK setup: https://docs.unity3d.com/Manual/android-sdksetup.html.
Connect the Android device to your desktop machine via the USB cable.
Ensure that the Development Build and Autoconnect Profiler flags are enabled when the application is built.
Begin building the application with the Build & Run option as usual.
Open the Profiler window in the Unity Editor and select the device under Connected Player.

You should now see the Android device's profiling data gathering in the Profiler window.

For troubleshooting problems with building Android applications and connecting the Profiler to them, consult the following documentation page: https://docs.unity3d.com/Manual/TroubleShootingAndroid.html.

Editor profiling

We can profile the Editor itself. This is normally used when trying to profile the performance of custom Editor Scripts. This can be achieved by enabling the Profile Editor option in the Profiler window and configuring the Connected Player option to Editor, as shown in the following screenshot:

Note that both options must be configured this way if we want to profile the Editor. Setting Connected Player to Editor without enabling the Profile Editor button is the default case, where the Profiler is collecting data for our application while it is running in Play Mode.

The Profiler window

We will now cover the essential features of the Profiler as they can be found within the interface.

The Profiler window is split into four main sections:

Profiler Controls
Timeline View
Breakdown View Controls
Breakdown View

These sections are shown in the following screenshot:

We'll cover each of these sections in detail.

Profiler controls

The top bar in the previous screenshot contains multiple drop-downs and toggle buttons that we can use to affect what is being profiled and how deeply data is gathered from each subsystem. They are covered in the next subsections.

Add Profiler

By default, the Profiler will collect data for several different subsystems that cover the majority of the Unity Engine's subsystems in the Timeline View. These subsystems are organized into various Areas containing relevant data. The Add Profiler option can be used to add additional Areas or restore them if they were removed. Refer to the Timeline View section for a complete list of subsystems we can profile.

Record

Enabling the Record option makes the Profiler record profiling data. This will happen continuously while this option is enabled. Note that runtime data can only be recorded if the application is actively running. For an app running in the Editor, this means that Play Mode must be enabled and it should not be paused; alternatively, for a standalone app, it must be the active window. If Profile Editor is enabled, then the data that appears will be gathered for the Editor itself.

Deep Profile

Ordinary profiling will only record the time and memory allocations made by the common Unity callback methods, such as Awake(), Start(), Update(), and FixedUpdate(). Enabling the Deep Profile option re-compiles our scripts with a much deeper level of instrumentation, allowing it to measure each and every invoked method. This causes a significantly greater instrumentation cost during runtime than normal, and uses substantially more memory since data is being collected for the entire callstack at runtime. As a consequence, Deep Profiling may not even be possible in large projects, as Unity may run out of memory before testing even begins or the application may run so slowly as to make the test pointless.

Note that toggling Deep Profile requires the entire project to be completely re-compiled before profiling can begin again, so it is best to avoid toggling the option back and forth between tests.

Since this option blindly measures the entire callstack, it would be unwise to keep it enabled during most of our profiling tests. This option is best reserved for when default profiling is not providing enough detail to figure out the root cause, or if we’re testing performance of a small test Scene, which we're using to isolate certain activities.

If Deep Profiling is required for larger projects and scenes, but the Deep Profile option is too much of a hindrance during runtime, then alternative approaches for performing more detailed profiling are covered in the upcoming section titled Targeted profiling of code segments.
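One lighter-weight option that Unity's scripting API offers is to wrap only the code we care about in Profiler.BeginSample() and Profiler.EndSample() calls, which makes that block appear as its own named entry in the CPU Usage Area without instrumenting every method. The following is a minimal sketch; the PathfindingAgent component, the RecalculatePath() method, and the sample label are hypothetical stand-ins for whatever code is under suspicion.

```csharp
using UnityEngine;
using UnityEngine.Profiling;

// Hypothetical component used only to illustrate targeted sampling.
public class PathfindingAgent : MonoBehaviour
{
    void Update()
    {
        // Everything between BeginSample and EndSample is reported
        // under this label while the Profiler is recording.
        Profiler.BeginSample("PathfindingAgent.RecalculatePath");
        RecalculatePath();
        Profiler.EndSample();
    }

    // Stand-in for an expensive operation we want to measure.
    void RecalculatePath()
    {
        float total = 0f;
        for (int i = 0; i < 10000; i++)
        {
            total += Mathf.Sqrt(i);
        }
    }
}
```

Every BeginSample() call should be matched by an EndSample() call on the same code path, otherwise the reported hierarchy is likely to be misleading.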

Profile Editor

The Profile Editor option enables Editor profiling, that is, gathering profiling data for the Unity Editor itself. This is useful in order to profile any custom Editor scripts we have developed.

Remember that Connected Player must also be set to the Editor option for Editor profiling to occur.

Connected Player

The Connected Player drop-down offers choices to select the target instance of Unity we want to profile. This can be the current Editor application, a local standalone instance of our application, or an instance of our application running on a remote device.

Clear

The Clear button clears all profiling data from the Timeline View.

Load

The Load button will open up a dialog window to load in any previously-saved Profiler data (from using the Save option).

Save

The Save button saves any Profiler data currently presented in the Timeline View to a file. Only 300 frames of data can be saved in this fashion at a time, and a new file must be manually created for any more data. This is typically sufficient for most situations, since when a performance spike occurs we then have about five to ten seconds to pause the application and save the data for future analysis (such as attaching it to a bug report) before it gets pushed off the left side of the Timeline View. Any saved Profiler data can be loaded into the Profiler for future examination using the Load option.
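When more data is needed than the Save button allows, the UnityEngine.Profiling.Profiler class exposes properties for writing profiling data to a file from script. The sketch below is an assumption-laden illustration, not a definitive recipe: the ProfilerFileLogger component and the file name are hypothetical, the properties are intended for use in built players, and the exact file naming and format have varied between Unity versions, so the scripting reference for the version in use should be checked.

```csharp
using UnityEngine;
using UnityEngine.Profiling;

// Hypothetical component: logs profiling data to disk while it is enabled.
public class ProfilerFileLogger : MonoBehaviour
{
    void OnEnable()
    {
        // Assumed output location and file name.
        Profiler.logFile = Application.persistentDataPath + "/profilerLog";
        Profiler.enableBinaryLog = true; // request binary profiling output
        Profiler.enabled = true;         // start recording
    }

    void OnDisable()
    {
        // Stop recording when the component is disabled or destroyed.
        Profiler.enabled = false;
    }
}
```

Attaching such a component to a GameObject in a Development Build would let a long test session be captured without having to pause and save manually, at the cost of additional disk activity during the run.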

Frame Selection

The Frame Counter shows how many frames have been profiled and which frame is currently selected in the Timeline View. There are two buttons to move the currently selected frame forward or backward by one frame and a third button (the Current button) that resets the selected frame to the most recent frame and keeps that position. This will cause the Breakdown View to always show the profiling data for the current frame during runtime profiling and will display the word Current.

Timeline View

The Timeline View reveals profiling data that has been collected during runtime, organized into a series of Areas. Each Area focuses on profiling data for a different subsystem of the Unity Engine and each is split into two sections: a graphical representation of profiling data on the right, and a series of checkboxes to enable/disable different activities/data types on the left. These colored boxes can be toggled, which changes the visibility of the corresponding data types within the graphical section of the Timeline View.

When an Area is selected in the Timeline View, more detailed information for that subsystem will be revealed in the Breakdown View (beneath the Timeline View) for the currently selected frame. The kinds of information displayed in the Breakdown View vary depending on which Area is currently selected in the Timeline View.

Areas can be removed from the Timeline View by clicking on the X at the top-right corner of an Area. Recall that Areas can be restored to the Timeline View through the Add Profiler option in the Controls bar.

At any time, we can click at a location in the graphical part of the Timeline View to reveal information about a given frame. A large vertical white bar will appear (usually with some additional information on either side coinciding with the line graphs), showing us which frame is selected.

Depending on which Area is currently selected (determined by which Area is currently highlighted in blue), different information will be available in the Breakdown View, and different options will be available in the Breakdown View Controls. Changing the Area that is selected is as simple as clicking on the relevant box on the left-hand side of the Timeline View or on the graphical side, although clicking inside the graphical Area might also change which frame has been selected, so be careful clicking in the graphical Area if you wish to see Breakdown View information for the same frame.

Breakdown View Controls

Different drop-downs and toggle button options will appear within the Breakdown View Controls, depending on which Area is currently selected in the Timeline View. Different Areas offer different controls, and these options dictate what information is available, and how that information is presented in the Breakdown View.

Breakdown View

The information revealed in the Breakdown View will vary enormously based on which Area is currently selected and which Breakdown View Controls options are selected. For instance, some Areas offer different modes in a drop-down within the Breakdown View Controls, which can provide a simpler or more detailed view of the information or even a graphical layout of the same information so that it can be parsed more easily.

Let's cover each Area and the different kinds of information and options available in the Breakdown View.

The CPU Usage Area

This Area shows data for all CPU usage and statistics. This Area is perhaps the most complex and useful since it covers a large number of Unity subsystems, such as MonoBehaviour Components, cameras, some rendering and physics processes, user interface (including the Editor's interface, if we're running through the Editor), audio processing, the Profiler itself, and more.

There are three different modes of displaying CPU usage data in the Breakdown View:

Hierarchy mode
Raw Hierarchy mode
Timeline mode

Hierarchy mode reveals most callstack invocations, while grouping similar data elements and global Unity function calls together for convenience. For instance, rendering delimiters, such as BeginGUI() and EndGUI() calls, are combined together in this mode. Hierarchy mode is helpful as a first step to determine which function calls cost the most CPU time to execute.

Raw Hierarchy mode is similar to Hierarchy mode, except that it splits global Unity function calls into separate entries rather than combining them into one bulk entry. This will tend to make the Breakdown View more difficult to read, but may be helpful if we're trying to count how many times a particular global method is invoked or determining whether one of these calls is costing more CPU/memory than expected. For example, each BeginGUI() and EndGUI() call will be separated into different entries, making it clearer how many times each is being called compared to the Hierarchy mode.

Perhaps the most useful mode for the CPU Usage Area is the Timeline mode option (not to be confused with the main Timeline View). This mode organizes CPU usage during the current frame by how the call stack expanded and contracted during processing.

Timeline mode organizes the Breakdown View vertically into different sections that represent different threads at runtime, such as Main Thread, Render Thread, and various background job threads called Unity Job System, used for loading activity such as scenes and other assets. The horizontal axis represents time, so wider blocks are consuming more CPU time than narrower blocks. The horizontal size also represents relative time, making it easy to compare how much time one function call took compared to another. The vertical axis represents the callstack, so deeper chains represent more calls in the callstack at that time.

Under Timeline mode, blocks at the top of the Breakdown View are functions (or technically, callbacks) called by the Unity Engine at runtime (such as Start(), Awake(), or Update()), whereas blocks underneath them are functions that those functions had called into, which can include functions on other Components or regular C# objects.

The Timeline mode offers a very clean and organized way to determine which particular method in the callstack consumes the most time and how that processing time measures up against other methods being called during the same frame. This allows us to gauge the method that is the biggest cause of performance problems with minimal effort.

For example, let's assume that we are looking at a performance problem in the following screenshot. We can tell, with a quick glance, that there are three methods that are causing a problem, and they each consume similar amounts of processing time, due to their similar widths:

In the previous screenshot, we have exceeded our 16.667 millisecond budget with calls to three different MonoBehaviour Components. The good news is that we have three possible methods through which we can find performance improvements, which means lots of opportunities to find code that can be improved. The bad news is that increasing the performance of one method will only improve about one-third of the total processing for that frame. Hence, all three methods may need to be examined and optimized in order to get back under budget.
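The arithmetic behind this reasoning can be sketched in a few lines. The 16.667 millisecond figure is simply 1000 milliseconds divided by a 60 FPS target. The FrameBudget class below is a hypothetical helper written for this illustration, and the per-method costs used in the follow-up are invented numbers rather than values read from the screenshot.

```csharp
using System;

// Hypothetical helper (not part of Unity) for frame budget arithmetic.
public static class FrameBudget
{
    // Time available per frame, in milliseconds, for a target frame rate.
    public static double BudgetMs(double targetFps)
    {
        return 1000.0 / targetFps;
    }

    // Total frame time after speeding up one method by the given factor.
    // A speedup of 2.0 means the optimized method takes half as long.
    public static double FrameTimeAfterOptimizing(double[] methodCostsMs, int index, double speedup)
    {
        double total = 0.0;
        for (int i = 0; i < methodCostsMs.Length; i++)
        {
            total += (i == index) ? methodCostsMs[i] / speedup : methodCostsMs[i];
        }
        return total;
    }
}
```

Suppose, purely as an assumption, that the three methods each cost 7 milliseconds, for a 21 millisecond frame. Calling FrameTimeAfterOptimizing(new double[] { 7.0, 7.0, 7.0 }, 0, 2.0) gives 17.5 milliseconds, so even doubling the speed of one method leaves the frame above the budget returned by BudgetMs(60.0).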

It's a good idea to collapse the Unity Job System list when using Timeline mode, as it tends to obstruct the visibility of items shown in the Main Thread block, which is probably what we’re most interested in.

In general, the CPU Usage Area will be most useful for detecting issues that can be solved by solutions that will be explored in Chapter 2, Scripting Strategies.

The GPU Usage Area

The GPU Usage Area is similar to the CPU Usage Area, except that it shows method calls and processing time as it occurs on the GPU. Relevant Unity method calls in this Area will relate to cameras, drawing, opaque and transparent geometry, lighting and shadows, and so on.

The GPU Usage Area offers hierarchical information similar to the CPU Usage Area and estimates time spent calling into various rendering functions such as Camera.Render() (provided rendering actually occurs during the frame currently selected in the Timeline View).

The GPU Usage Area will be a useful tool to refer to when you go through Chapter 6, Dynamic Graphics.

The Rendering Area

The Rendering Area provides some generic rendering statistics that tend to focus on activities related to preparing the GPU for rendering, which is a set of activities that occur on the CPU (as opposed to the act of rendering, which is activity handled within the GPU and is detailed in the GPU Usage Area). The Breakdown View offers useful information, such as the number of SetPass calls (otherwise known as Draw Calls), the total number of batches used to render the Scene, the number of batches saved from Dynamic Batching and Static Batching and how they are being generated, as well as memory consumed for textures.

The Rendering Area also offers a button to open the Frame Debugger, which will be explored more in Chapter 3, The Benefits of Batching. The rest of this Area's information will prove useful when you go through Chapter 3, The Benefits of Batching, and Chapter 6, Dynamic Graphics.

The Memory Area

The Memory Area allows us to inspect memory usage of the application in the Breakdown View in the following two modes:

Simple mode
Detailed mode

Simple mode provides only a high-level overview of memory consumption of subsystems. This includes Unity's low-level Engine, the Mono framework (total heap size that is being watched by the Garbage Collector), graphical assets, audio assets and buffers, and even memory used to store data collected by the Profiler.
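Some of these high-level figures can also be queried from script through the UnityEngine.Profiling.Profiler class, which can be handy for logging memory consumption during a benchmarking run. The following minimal sketch assumes a Development Build and a Unity version that provides the Long variants of these methods; the MemoryReporter component is a hypothetical name.

```csharp
using UnityEngine;
using UnityEngine.Profiling;

// Hypothetical component: logs a one-off memory overview at startup.
public class MemoryReporter : MonoBehaviour
{
    void Start()
    {
        // Native memory that Unity has allocated and reserved, in bytes.
        long totalAllocated = Profiler.GetTotalAllocatedMemoryLong();
        long totalReserved = Profiler.GetTotalReservedMemoryLong();

        // Managed (Mono) heap size and the portion currently in use, in bytes.
        long monoHeap = Profiler.GetMonoHeapSizeLong();
        long monoUsed = Profiler.GetMonoUsedSizeLong();

        Debug.Log("Total allocated: " + ToMegabytes(totalAllocated) + " MB, " +
                  "total reserved: " + ToMegabytes(totalReserved) + " MB, " +
                  "Mono heap: " + ToMegabytes(monoHeap) + " MB, " +
                  "Mono used: " + ToMegabytes(monoUsed) + " MB");
    }

    static string ToMegabytes(long bytes)
    {
        return (bytes / (1024f * 1024f)).ToString("F1");
    }
}
```

The values logged this way are only a coarse summary; the Profiler window remains the better tool for seeing how they change from frame to frame.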

Detailed mode shows memory consumption of individual GameObjects and MonoBehaviours for both their Native and Managed representations. It also has a column explaining the reason why an object may be consuming memory and when it might be deallocated.

The Garbage Collector is a common feature provided by the various languages Unity supports, which automatically releases any memory we have allocated to store data, but if it is handled poorly it has the potential to stall our application for brief moments. This topic, and many more related topics such as Native and Managed memory spaces, will be explored in Chapter 8, Masterful Memory Management.

Note that information only appears in Detailed mode through manual sampling by clicking on the Take Sample: <TargetName> button. This is the only way to gather information when using Detailed mode, since performing this kind of analysis automatically for each update would be prohibitively expensive.

The Breakdown View also provides a button labelled Gather Object References, which can gather deeper memory information about some objects.

The Memory Area will be a useful tool to use when we dive into the complexities of memory management, Native versus Managed memory, and the Garbage Collector in Chapter 8, Masterful Memory Management.

The Audio Area

The Audio Area grants an overview of audio statistics and can be used to measure both CPU usage from the audio system and total memory consumed by Audio Sources (both those that are playing and those that are paused) and Audio Clips.

The Breakdown View provides lots of useful insight into how the Audio System is operating and how various audio channels and groups are being used.

The Audio Area may come in handy as we explore art assets in Chapter 4, Kickstart Your Art.

Audio is often overlooked when it comes to performance optimization, but audio can become a surprisingly large source of bottlenecks if it is not managed properly due to the potential amount of hard disk access and CPU processing required. Don’t neglect it!

The Physics 3D and Physics 2D Areas

There are two different Physics Areas, one for 3D physics (Nvidia's PhysX) and another for the 2D physics system (Box2D). These Areas provide various physics statistics, such as Rigidbody, Collider, and Contact counts.

The Breakdown View for each Physics Area provides some rudimentary insight into the subsystem’s inner workings, but we can get further insight by exploring the Physics Debugger, which we will introduce in Chapter 5, Faster Physics.

The Network Messages and Network Operations Areas

These two Areas provide information about Unity's Networking System, which was introduced during the Unity 5 release cycle. The information present will depend on whether the application is using the High-Level API (HLAPI) or Transport Layer API (TLAPI) provided by Unity. The HLAPI is an easier-to-use system for managing Player and GameObject network synchronization automatically, whereas the TLAPI is a thin layer that operates just above the socket level, allowing Unity developers to conjure up their own networking system.

Optimizing network traffic is a subject that fills an entire book all by itself, where the right solution is typically very dependent on the particular needs of the application. This will not be a Unity-specific problem, and as such, the topic of network traffic optimization will not be explored in this book.

The Video Area

If our application happens to make use of Unity's VideoPlayer API, then we might find this Area useful for profiling video playback behavior.

Optimization of media playback is also a complex, non-Unity-specific topic and will not be explored in this book.

The UI and UI Details Areas

These Areas are new in Unity 2017 and provide insight into applications making use of Unity's built-in User Interface System. If we’re using a custom-built or third-party User Interface System (such as NGUI), then these Areas will probably provide little benefit.

A poorly optimized user interface can often affect one or both of the CPU and GPU, so we will investigate some code optimization strategies for UI in Chapter 2, Scripting Strategies, and graphics-related approaches in Chapter 6, Dynamic Graphics.

The Global Illumination Area

The Global Illumination Area is another new Area in Unity 2017, and gives us a fantastic amount of detail about Unity's Global Illumination (GI) system. If our application makes use of GI, then we should refer to this Area to verify that it is performing properly.

This Area may become useful as we explore lighting and shadowing in Chapter 6, Dynamic Graphics.

You're reading from Unity 2017 Game Optimization Optimize all aspects of Unity performance
