Operating system-level commands for tuning and diagnosis
While the previous Oracle database commands prove useful for managing and troubleshooting Oracle RAC and Oracle Clusterware issues, there is a critical need to use operating system commands with the Linux and Unix environment to diagnose and tune difficult issues that may arise in the Oracle RAC ecosystem. We will now provide you with some useful details on tools that can benefit you in these situations.
Strace
Linux provides a useful tracing tool for system calls that can be used for diagnosis and debugging issues with Oracle 11g RAC environments. As DBAs, we often have problems with our database environments when a runaway Oracle database process causes a performance problem in consuming excessive memory from the operating system. By understanding how to use a trace tool, such as strace for your Linux environment, you can pinpoint the root cause of such a problem and find a solution to the failure more effectively.
To display the options for strace, enter the strace at a Linux or Unix shell prompt session as follows:
[oracle@raclinux1 ~]$ strace usage: strace [-dffhiqrtttTvVxx] [-a column] [-e expr] ... [-o file] [-p pid] ... [-s strsize] [-u username] [-E var=val] ... [command [arg ...]] or: strace -c [-e expr] ... [-O overhead] [-S sortby] [-E var=val] ... [command [arg ...]] -c -- count time, calls, and errors for each syscall and report summary -f -- follow forks, -ff -- with output into separate files -F -- attempt to follow vforks, -h -- print help message -i -- print instruction pointer at time of syscall -q -- suppress messages about attaching, detaching, etc. -r -- print relative timestamp, -t -- absolute timestamp, -tt -- with usecs -T -- print time spent in each syscall, -V -- print version -v -- verbose mode: print unabbreviated argv, stat, termio[s], etc. args -x -- print non-ascii strings in hex, -xx -- print all strings in hex -a column -- alignment COLUMN for printing syscall results (default 40) -e expr -- a qualifying expression: option=[!]all or option=[!]val1[,val2]... options: trace, abbrev, verbose, raw, signal, read, or write -o file -- send trace output to FILE instead of stderr -O overhead -- set overhead for tracing syscalls to OVERHEAD usecs -p pid -- trace process with process id PID, may be repeated -s strsize -- limit length of print strings to STRSIZE chars (default 32) -S sortby -- sort syscall counts by: time, calls, name, nothing (default time) -u username -- run command as username handling setuid and/or setgid -E var=val -- put var=val in the environment for command -E var -- remove var from the environment for command [oracle@raclinux1 ~]$
Now, let's take a look at how to use STRACE to trace an Oracle 11g RAC process:
[oracle@raclinux1 ~]$ strace oracle execve("/u01/app/oracle/product/11.1.0/11g/bin/oracle", ["oracle"], [/* 41 vars */]) = 0 uname({sys="Linux", node="raclinux1.us.oracle.com", ...}) = 0 brk(0) = 0xf56b000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/u01/app/oracle/product/11.1.0/11g/lib/tls/i686/sse2/libskg
We can also use STRACE to examine system calls for an Oracle 11g database instance, in an Oracle 11g RAC environment, using the following example:
[oracle@raclinux1 bin]$ strace -c oracle % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 4.19 0.865232 41202 21 write 4.02 0.831352 207838 4 lstat64 4.02 0.831329 207832 4 _llseek 4.02 0.831327 207832 4 times 3.42 0.707903 707903 1 io_setup 3.42 0.707867 707867 1 _sysctl 3.42 0.707864 707864 1 socket 3.42 0.707843 707843 1 bind 3.42 0.707839 707839 1 set_thread_area 3.42 0.707837 707837 1 futex 3.42 0.707836 707836 1 dup 3.42 0.707836 707836 1 set_tid_address 3.42 0.707835 707835 1 gettid 3.42 0.707833 707833 1 fcntl64 3.42 0.707831 707831 1 getuid32 3.42 0.707831 707831 1 geteuid32 3.21 0.662962 82870 8 mprotect 3.21 0.662671 82834 8 gettimeofday 2.77 0.572299 26014 22 fstat64 2.71 0.560835 12192 46 old_mmap 2.66 0.549201 109840 5 shmdt 2.61 0.539206 107841 5 shmat 2.61 0.539194 107839 5 5 access 2.61 0.539158 107832 5 rt_sigprocmask 2.46 0.509131 2204 231 close 2.39 0.494136 41178 12 9 shmget 2.01 0.415677 207839 2 uname 1.57 0.325381 20336 16 16 mkdir 1.30 0.268287 5708 47 17 stat64 1.27 0.263489 909 290 61 open 1.19 0.246993 41166 6 getrlimit 1.19 0.246990 41165 6 rt_sigaction 0.87 0.179247 2938 61 read 0.60 0.123500 41167 3 brk 0.60 0.123500 41167 3 setrlimit 0.60 0.123494 41165 3 lseek 2.70 -1.1557161 2786 200 mmap2 1.52 -1.1313491 2986 105 munmap ------ ----------- ----------- --------- --------- ---------------- 100.00 20.669398 1134 108 total
Truss
Truss is a useful utility available for the Sun Solaris Unix operating system that can be used for tracing Oracle 11g background processes.
Truss provides the ability to walk through trace, step by step, for Oracle 11g internal processes, by using the –p
parameter to the truss
command, followed by the Solaris process id (PID). You can also trace child processes for Oracle background processes by using the truss
command with the –f
parameter. The parameters available for the truss command are listed next:
truss [-fcaeil] [-[tvx] [!]syscall...] [-s [!]signal...] [-m [!]fault...] [-[rw] [!]fd...] [-o outfile] command | -p pid
Some typical options for truss are as follows:
-o
: output to file-f
: trace child processes-c
: count system calls-p
: trace calls based on Unix PID
Now, let's take a look at how to trace the Oracle 11g database instance process by using the truss utility with Solaris:
solaris06$ truss -cp 7612 ^C Syscall seconds calls errors read .900 55 Write .500 1 times .100 93 yield .500 426
GDB
GDB is a system debugger available for the Linux platform that can be used to discover problems with Oracle system processes. The man page available in Unix and Linux platforms for GDB will provide you with all of the command-line syntax options. To display the syntax for GDB, enter the man GDB
command at a Unix or Linux shell prompt. Further details on GDB are also available online at http://linux.die.net/man/1/gdb.
GDB provides you with the ability to trace system memory usage for background processes, as well as the ability to dump memory values for these background processes. This is useful in troubleshooting difficult problems, such as when a memory leak occurs with an Oracle 11g database process. Before you can use the GDB utility to trace the Oracle 11g database instance process, you will first need to determine the specific Process ID (PID) in Linux by using the ps
and grep
Linux/Unix operating system commands. In the following example, we will use GDB to trace an open SQL*PLUS session within the Linux operating system.
After finding the process id (4107) for the SQL*PLUS session, we can begin the debug session with GDB:
[oracle@raclinux1 ~]$ gdb $ORACLE_HOME/bin/oracle 4107 start the new session for GDB GNU gdb Red Hat Linux (6.1post-1.20040607.62rh) Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)...Using host libthread_db library "/lib/tls/libthread_db.so.1". Attaching to program: /u01/app/oracle/product/11.1.0/11g/bin/oracle, process 4107 The GDB dump starts by attaching to the PID we gave it earlier Reading symbols from /u01/app/oracle/product/11.1.0/11g/lib/libskgxp11.so...(no debugging symbols found)...done. Loaded symbols for /u01/app/oracle/product/11.1.0/11g/lib/libskgxp11.so Reading symbols from /lib/tls/librt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/tls/librt.so.1 GDB loads the shared libraries for Oracle to perform the memory dump trace Reading symbols from /u01/app/oracle/product/11.1.0/11g/lib/libnnz11.so...(no debugging symbols found)...done. Loaded symbols for /u01/app/oracle/product/11.1.0/11g/lib/libnnz11.so Reading symbols from /u01/app/oracle/product/11.1.0/11g/lib/libclsra11.so...done. Loaded symbols for /u01/app/oracle/product/11.1.0/11g/lib/libclsra11.so Reading symbols from /u01/app/oracle/product/11.1.0/11g/lib/libdbcfg11.so...done. Loaded symbols for /u01/app/oracle/product/11.1.0/11g/lib/libdbcfg11.so Reading symbols from /u01/app/oracle/product/11.1.0/11g/lib/libhasgen11.so...done. Loaded symbols for /u01/app/oracle/product/11.1.0/11g/lib/libhasgen11.so Reading symbols from /u01/app/oracle/product/11.1.0/11g/lib/libskgxn2.so...done.Loaded symbols for /u01/app/oracle/product/11.1.0/11g/lib/libskgxn2.so Reading symbols from /u01/app/oracle/product/11.1.0/11g/lib/libocr11.so...done. Loaded symbols for /u01/app/oracle/product/11.1.0/11g/lib/libocr11.so Reading symbols from /u01/app/oracle/product/11.1.0/11g/lib/libocrb11.so...done.Loaded symbols for /u01/app/oracle/product/11.1.0/11g/lib/libocrb11.so Reading symbols from /u01/app/oracle/product/11.1.0/11g/lib/libocrutl11.so...done. Loaded symbols for /u01/app/oracle/product/11.1.0/11g/lib/libocrutl11.so Reading symbols from /usr/lib/libaio.so.1...done. Loaded symbols for /usr/lib/libaio.so.1 Reading symbols from /lib/libdl.so.2...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/tls/libm.so.6...done. Loaded symbols for /lib/tls/libm.so.6 Reading symbols from /lib/tls/libpthread.so.0...done. (gdb) stepi we use the STEPI command for GDB to walk the dump through the memory [Switching to Thread -1208035648 (LWP 4107)] 0x0090a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 (gdb) call ksudss(10) We issue the CALL command to move through the memory stack $1 = 0 (gdb) detach we use the detach command for GDB to end our memory tracing session Detaching from program: /u01/app/oracle/product/11.1.0/11g/bin/oracle, process 4107
The next step is to attach the Oracle process id (PID) 4107 found earlier to the GDB debugger. Before we can make use of the GDB functions, we must first use the GDB attach command to link the debugger to the Oracle process that will be traced. This will enable us to drill down further into the tracing for Oracle 11g by using the attach command with GDB, as well as show the memory dump information:
(gdb) attach 4107 we start a new memory dump trace by using the attach command for GDB Attaching to program: /u01/app/oracle/product/11.1.0/11g/bin/oracle, process 4107 [New Thread -1208035648 (LWP 4107)] Symbols already loaded for /u01/app/oracle/product/11.1.0/11g/lib/libskgxp11.so Symbols already loaded for /lib/tls/librt.so.1 Symbols already loaded for /u01/app/oracle/product/11.1.0/11g/lib/libnnz11.so Symbols already loaded for /u01/app/oracle/product/11.1.0/11g/lib/libclsra11.so Symbols already loaded for /u01/app/oracle/product/11.1.0/11g/lib/libdbcfg11.so Symbols already loaded for /u01/app/oracle/product/11.1.0/11g/lib/libhasgen11.soSymbols already loaded for /u01/app/oracle/product/11.1.0/11g/lib/libskgxn2.so Symbols already loaded for /u01/app/oracle/product/11.1.0/11g/lib/libocr11.so Symbols already loaded for /u01/app/oracle/product/11.1.0/11g/lib/libocrb11.so Symbols already loaded for /u01/app/oracle/product/11.1.0/11g/lib/libocrutl11.soSymbols already loaded for /usr/lib/libaio.so.1 Symbols already loaded for /lib/libdl.so.2 Symbols already loaded for /lib/tls/libm.so.6 Symbols already loaded for /lib/tls/libpthread.so.0 Symbols already loaded for /lib/libnsl.so.1 Symbols already loaded for /lib/tls/libc.so.6 Symbols already loaded for /lib/ld-linux.so.2 Symbols already loaded for /usr/lib/libnuma.so Symbols already loaded for /lib/libnss_files.so.2 Symbols already loaded for /u01/app/oracle/product/11.1.0/11g/lib/libnque11.so [Switching to Thread -1208035648 (LWP 3107)] 0x0090a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
Now, to get the stack dump, we will need to issue the bt
command for our GDB session:
(gdb) bt We issue the BT command for GDB to obtain a listing of the memory stack dump #0 0x0090a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 #1 0x003d14c3 in __read_nocancel () from /lib/tls/libpthread.so.0 #2 0x0e5b652e in sntpread () #3 0x0e5b64cf in ntpfprd () #4 0x0e59b3c7 in nsbasic_brc () #5 0x0e59e20e in nsbrecv () #6 0x0e5a2d20 in nioqrc () #7 0x0e39da65 in __PGOSF20_opikndf2 ()