What is profiling?
A program that hasn't been optimized will normally spend most of its CPU cycles in a few particular subroutines. Profiling is the analysis of how the code behaves in relation to the resources it uses. For instance, profiling will tell you how much CPU time an instruction consumes or how much memory the whole program needs. It is achieved by modifying either the source code of the program or (when possible) its binary executable form so that it runs under a tool called a profiler.
Developers normally profile their programs when they need to optimize performance or when a program is suffering from some kind of elusive bug, which can often be traced back to memory leaks. In such cases, profiling can help them get an in-depth understanding of how their code is using the computer's resources (that is, how many times a certain function is being called).
A developer can use this information, along with a working knowledge of the source code, to find the program's bottlenecks and memory leaks. The developer can then fix whatever is wrong with the code.
There are two main methodologies for profiling software: event-based profiling and statistical profiling. Both have their pros and cons, which you should keep in mind when choosing between them.
Event-based profiling
Not every programming language supports this type of profiling. Here are some programming languages that support event-based profiling:
- Java: The JVMTI (JVM Tools Interface) provides hooks for profilers to trap events such as calls, thread-related events, class loads and so on
- .NET: Just like with Java, the runtime provides events (http://en.wikibooks.org/wiki/Introduction_to_Software_Engineering/Testing/Profiling#Methods_of_data_gathering)
- Python: Using the sys.setprofile function, a developer can trap events such as python_[call|return|exception] or c_[call|return|exception]
Event-based profilers (also known as tracing profilers) work by gathering data on specific events during the execution of our program. These profilers generate a large amount of data; basically, the more events they listen to, the more data they gather. This makes them somewhat impractical to use, and they are not the first choice when starting to profile a program. However, they are a good last resort when other profiling methods aren't enough or just aren't specific enough. Consider the case where you want to profile every return statement: this type of profiler gives you the granularity you need for that task, while others simply would not allow you to do it.
A simple example of an event-based profiler in Python could be the following code (we'll understand this topic better once we reach the upcoming chapters):
import sys

def profiler(frame, event, arg):
    print 'PROFILER: %r %r' % (event, arg)

sys.setprofile(profiler)

# simple (and very inefficient) example of how to calculate
# the Fibonacci sequence for a number.
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

def fib_seq(n):
    seq = []
    if n > 0:
        seq.extend(fib_seq(n-1))
    seq.append(fib(n))
    return seq

print fib_seq(2)
The preceding code produces the following output:
PROFILER: 'call' None
PROFILER: 'call' None
PROFILER: 'call' None
PROFILER: 'call' None
PROFILER: 'return' 0
PROFILER: 'c_call' <built-in method append of list object at 0x7f570ca215f0>
PROFILER: 'c_return' <built-in method append of list object at 0x7f570ca215f0>
PROFILER: 'return' [0]
PROFILER: 'c_call' <built-in method extend of list object at 0x7f570ca21bd8>
PROFILER: 'c_return' <built-in method extend of list object at 0x7f570ca21bd8>
PROFILER: 'call' None
PROFILER: 'return' 1
PROFILER: 'c_call' <built-in method append of list object at 0x7f570ca21bd8>
PROFILER: 'c_return' <built-in method append of list object at 0x7f570ca21bd8>
PROFILER: 'return' [0, 1]
PROFILER: 'c_call' <built-in method extend of list object at 0x7f570ca55bd8>
PROFILER: 'c_return' <built-in method extend of list object at 0x7f570ca55bd8>
PROFILER: 'call' None
PROFILER: 'call' None
PROFILER: 'return' 1
PROFILER: 'call' None
PROFILER: 'return' 0
PROFILER: 'return' 1
PROFILER: 'c_call' <built-in method append of list object at 0x7f570ca55bd8>
PROFILER: 'c_return' <built-in method append of list object at 0x7f570ca55bd8>
PROFILER: 'return' [0, 1, 1]
[0, 1, 1]
PROFILER: 'return' None
PROFILER: 'call' None
PROFILER: 'c_call' <built-in method discard of set object at 0x7f570ca8a960>
PROFILER: 'c_return' <built-in method discard of set object at 0x7f570ca8a960>
PROFILER: 'return' None
PROFILER: 'call' None
PROFILER: 'c_call' <built-in method discard of set object at 0x7f570ca8f3f0>
PROFILER: 'c_return' <built-in method discard of set object at 0x7f570ca8f3f0>
PROFILER: 'return' None
As you can see, PROFILER is called on every event. We can print or gather whatever information we deem relevant inside the PROFILER function. The last line of the sample code shows that the simple execution of fib_seq(2) generates a lot of output data. If we were dealing with a real-world program, this output would be several orders of magnitude bigger. This is why event-based profiling is normally the last option when it comes to profiling. There are other alternatives out there (as we'll see) that generate much less output, but, of course, are less accurate.
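Coming back to the return statement example mentioned earlier, one way to keep the amount of gathered data under control is to have the hook react only to the events you care about. The following sketch is not taken from any library; the helper names are made up for illustration, and it simply filters the same sys.setprofile events down to Python-level returns:

import sys

def return_profiler(frame, event, arg):
    # For Python-level 'return' events, arg holds the value being returned.
    # Every other event type is silently ignored.
    if event == 'return':
        print 'RETURN from %s: %r' % (frame.f_code.co_name, arg)

sys.setprofile(return_profiler)

def double(x):
    return x * 2

double(21)            # reports the value returned by double()
sys.setprofile(None)  # detach the profiler hook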
Statistical profiling
Statistical profilers work by sampling the program counter at regular intervals. This in turn allows the developer to get an idea of how much time the target program is spending on each function. Since it works by sampling the PC, the resulting numbers will be a statistical approximation of reality instead of exact numbers. Still, it should be enough to get a glimpse of what the profiled program is doing and where the bottlenecks are.
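To make the mechanism more tangible, here is a minimal sketch of the sampling idea written with nothing but the standard library. It is not any real profiler, and it assumes a Unix-like system (signal.setitimer and SIGPROF are not available on Windows): a timer interrupts the program at regular intervals of CPU time, and we count which function each sample lands in, which yields exactly the kind of statistical approximation described above.

import collections
import signal

# Counter of (filename, function name) -> number of samples observed there.
samples = collections.Counter()

def take_sample(signum, frame):
    # frame points at the code that was executing when the timer fired.
    code = frame.f_code
    samples[(code.co_filename, code.co_name)] += 1

signal.signal(signal.SIGPROF, take_sample)
signal.setitimer(signal.ITIMER_PROF, 0.001, 0.001)  # sample every ~1 ms of CPU time

def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

fib(28)  # some CPU-bound work for the sampler to observe

signal.setitimer(signal.ITIMER_PROF, 0, 0)  # stop sampling
total = sum(samples.values())
for (filename, funcname), count in samples.most_common():
    print '%6.2f%%  %s:%s' % (100.0 * count / total, filename, funcname)

Note that the program never reports individual calls; it only accumulates counts, which is why the overhead stays low and the numbers are approximate.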
Some advantages of this type of profiling are as follows:
- Less data to analyze: Since we're only sampling the program's execution instead of saving every little piece of data, the amount of information to analyze will be significantly smaller.
- Smaller profiling footprint: Due to the way the sampling is done (using OS interrupts), the target program takes a smaller performance hit. Although the presence of the profiler is not 100 percent unnoticeable, statistical profiling does less damage than the event-based kind.
Here is an example of the output of OProfile (http://oprofile.sourceforge.net/news/), a Linux statistical profiler:
Function name,File name,Times Encountered,Percentage
"func80000","statistical_profiling.c",30760,48.96%
"func40000","statistical_profiling.c",17515,27.88%
"func20000","static_functions.c",7141,11.37%
"func10000","static_functions.c",3572,5.69%
"func5000","static_functions.c",1787,2.84%
"func2000","static_functions.c",768,1.22%
"func1500","statistical_profiling.c",701,1.12%
"func1000","static_functions.c",385,0.61%
"func500","statistical_profiling.c",194,0.31%
Here is the output of profiling the same Fibonacci code from the preceding code using a statistical profiler for Python called statprof:
  %   cumulative      self
 time    seconds   seconds  name
100.00       0.01      0.01  B02088_01_03.py:11:fib
  0.00       0.01      0.00  B02088_01_03.py:17:fib_seq
  0.00       0.01      0.00  B02088_01_03.py:21:<module>
---
Sample count: 1
Total time: 0.010000 seconds
As you can see, there is quite a difference between the output of both profilers for the same code.
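In case you want to reproduce a report like the statprof one shown above, a driver along the following lines should do. It is only a sketch: it assumes statprof's start()/stop()/display() interface, reuses the Fibonacci functions from the earlier example, and the exact figures will of course vary from run to run.

import statprof

# Same Fibonacci code as in the event-based example earlier in the chapter.
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

def fib_seq(n):
    seq = []
    if n > 0:
        seq.extend(fib_seq(n-1))
    seq.append(fib(n))
    return seq

statprof.start()
try:
    fib_seq(30)  # a larger input, so the sampler collects more than one sample
finally:
    statprof.stop()

statprof.display()  # prints a report similar to the one shown above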