What can we profile?
As we go deeper into profiling, it is important to understand what we can actually profile. Measuring is at the core of profiling, so let's take a detailed look at the things we can measure during a program's execution.
Execution time
The most basic number we can gather when profiling is the execution time. The execution time of the entire process, or of just a particular portion of the code, will shed some light on its own. If you have experience in the area your program runs in (say, you're a web developer working on a web framework), you probably already know what it means for your system to take too much time. For instance, a simple web server might take up to 100 milliseconds to query the database, render the response, and send it back to the client. However, if the same piece of code starts to slow down and now takes 60 seconds to do the same task, then you should start thinking about profiling. You also have to consider that the numbers here are relative. Consider another process: a MapReduce job that processes 2 TB of information stored in a set of text files and takes 20 minutes to complete. In this case, you might not consider it a slow process, even though it takes considerably more time than the slow web server mentioned earlier.
To gather this type of information, you don't need much profiling experience or even complex tools. Just add the required lines to your code and run the program.
For instance, the following code calculates the Fibonacci sequence up to the number 30:
import datetime

tstart = None
tend = None

def start_time():
    global tstart
    tstart = datetime.datetime.now()

def get_delta():
    # Return the time elapsed since the last call to start_time()
    global tstart
    tend = datetime.datetime.now()
    return tend - tstart

def fib(n):
    # Naive recursive implementation of the Fibonacci function
    return n if n == 0 or n == 1 else fib(n-1) + fib(n-2)

def fib_seq(n):
    # Build the list [fib(0), fib(1), ..., fib(n)]
    seq = []
    if n > 0:
        seq.extend(fib_seq(n-1))
    seq.append(fib(n))
    return seq

start_time()
print "About to calculate the Fibonacci sequence for the number 30"
delta1 = get_delta()

start_time()
seq = fib_seq(30)
delta2 = get_delta()

print "Now we print the numbers: "
start_time()
for n in seq:
    print n
delta3 = get_delta()

print "====== Profiling results ======="
print "Time required to print a simple message: %(delta1)s" % locals()
print "Time required to calculate fibonacci: %(delta2)s" % locals()
print "Time required to iterate and print the numbers: %(delta3)s" % locals()
print "====== ======="
Now, the code will produce the following output:
About to calculate the Fibonacci sequence for the number 30
Now we print the numbers: 
0
1
1
2
3
5
8
13
21
#...more numbers
4181
6765
10946
17711
28657
46368
75025
121393
196418
317811
514229
832040
====== Profiling results =======
Time required to print a simple message: 0:00:00.000030
Time required to calculate fibonacci: 0:00:00.642092
Time required to iterate and print the numbers: 0:00:00.000102
====== =======
Based on the last three lines, we see the obvious results: the most expensive part of the code is the actual calculation of the Fibonacci sequence.
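As an alternative to hand-rolled timing helpers, the standard library's timeit module can take this kind of measurement for you. The following is a minimal sketch; the fib definition is simply repeated here so that the snippet is self-contained:

import timeit

def fib(n):
    # Same naive recursive implementation as in the earlier listing
    return n if n == 0 or n == 1 else fib(n-1) + fib(n-2)

# Time a single call to fib(30); the result is the elapsed time in seconds.
print("One run: %f seconds" % timeit.timeit(lambda: fib(30), number=1))

# Averaging over a few runs gives a more stable measurement.
print("Average of 5 runs: %f seconds" % (timeit.timeit(lambda: fib(30), number=5) / 5))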
Where are the bottlenecks?
Once you've measured how much time your code needs to execute, you can profile it by paying special attention to the slow sections. These are the bottlenecks, and normally, they are related to one or a combination of the following reasons:
- Heavy I/O operations, such as reading and parsing big files, executing long-running database queries, calling external services (such as HTTP requests), and so on
- Unexpected memory leaks that start building up until there is no memory left for the rest of the program to execute properly
- Unoptimized code that gets executed frequently
- Intensive operations that are not cached when they could be (see the caching sketch after this list)
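To illustrate the last point about caching, here is a minimal sketch that memoizes the fib function from the earlier listing with a plain dictionary, so that each Fibonacci number is computed only once. The cache dictionary and the fib_cached name are illustrative choices, not part of the original example:

import datetime

cache = {}

def fib_cached(n):
    # Return the memoized value if fib(n) was already computed
    if n in cache:
        return cache[n]
    result = n if n == 0 or n == 1 else fib_cached(n-1) + fib_cached(n-2)
    cache[n] = result
    return result

tstart = datetime.datetime.now()
seq = [fib_cached(i) for i in range(31)]
print("Cached calculation took: %s" % (datetime.datetime.now() - tstart))

Because every subproblem is now computed exactly once, the calculation should finish in a small fraction of the time the naive version needed.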
I/O-bound code (file reads/writes, database queries, and so on) is usually harder to optimize, because doing so implies changing the way the program deals with that I/O (which normally goes through core functions of the language). In contrast, optimizing compute-bound code (such as a function that uses a badly implemented algorithm) is easier, although not necessarily easy, because improving it usually just means rewriting the function.
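As an example of this kind of rewrite, the naive recursive fib from the earlier listing could be replaced with an iterative version that computes each value in linear time. Again, this is only an illustrative sketch; the fib_iterative name is not part of the original example:

def fib_iterative(n):
    # Build the value bottom-up instead of recomputing subproblems recursively
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_seq_iterative(n):
    # Same output as fib_seq from the earlier listing: [fib(0), ..., fib(n)]
    return [fib_iterative(i) for i in range(n + 1)]

print(fib_seq_iterative(30)[-1])  # prints 832040

Note that the rewrite only touches the function itself; the rest of the program, including the timing helpers, stays exactly the same.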
A general indicator that you're near the end of a performance optimization process is when most of the bottlenecks left are due to I/O-bound code.