
Use gdb with Zope (or Python)

When Python is hanging in a C extension, you might want to use gdb.

Today, Jean-François Roche and I were debugging a spinning Zope instance.

We first tried the signal handling feature of Zope 2.13.

When Zope catches a SIGUSR1 signal, sent by issuing kill -10 myzope_pid, it dumps the stack trace of each thread to stdout.
(For the record, if you work with Zope2 < 2.13, you can install Products.signalstack to get the same feature.)
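
To illustrate the mechanism, here is a minimal sketch of a SIGUSR1 handler that prints the stack of every thread. This is not Zope's actual implementation: the names dump_all_threads and install_handler are made up for this example, and it only relies on the standard library.

```python
import signal
import sys
import threading
import traceback


def dump_all_threads(out=None):
    """Write the current Python stack of every thread to `out` (stdout by default)."""
    if out is None:
        out = sys.stdout
    # Map thread ids to readable names where the threading module knows them.
    names = {t.ident: t.name for t in threading.enumerate()}
    for thread_id, frame in sys._current_frames().items():
        out.write("Thread %s (%s):\n" % (thread_id, names.get(thread_id, "unknown")))
        traceback.print_stack(frame, file=out)
        out.write("\n")
    out.flush()


def install_handler():
    """Register the dump on SIGUSR1 (hypothetical helper; Unix, main thread only)."""
    signal.signal(signal.SIGUSR1, lambda signum, frame: dump_all_threads())
```

After calling install_handler(), sending SIGUSR1 to the process triggers the dump. The handler is plain Python code, so it only runs once the interpreter gets control back.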

If you do not have access to stdout, for instance when running Zope in the background, you can use five.z2monitor.
Jean-François released it a few weeks ago. Its README will tell you more.

Back to our problem.

Unfortunately, neither of the two solutions above helped us: we did not get any stack trace.

This made us guess that we were stuck in a C extension:
while looping in C extension code, the Python interpreter has no chance to run registered signal handlers or to switch to other Python threads.

To find where we were looping, we needed to use gdb (which we had never used before).
We searched for "start zope with gdb" and found an old article: "Debug a spinning Zope".

This article explains very well how to use gdb to debug this type of problem. It was very easy to follow step by step. Its content is copied below.
It allowed us to confirm that Zope was actually spinning in a C extension.

"Spinning" is when a request causes a running Zope to consume all available CPU indefinitely. This is usually caused by some kind of infinite loop or deadlock, and is painful to debug. Under Linux, at least, I've been able to use gdb to solve one spinning problem.

I've only tried this on a Mandrake 8.1 Linux installation, with a multi-threaded, zdaemoned Zope 2.5.1 running under Python 2.1.3. I have no experience debugging any other configuration this way.

  1. Attach to Zope with the GNU Debugger

    Don't know how to use gdb? Neither do I, but I was able to muddle through.

    • Look in your var/ file and get the second pid listed there.
    • Run gdb with the name of your python executable. For example, with Python 2.1.3, I ran gdb python2.1.
    • At the (gdb) prompt, type attach pid, using the pid you found earlier.
    • If all goes well, you should have to page through several screens worth of "Reading symbols" spew. Hit return until it's done.
  2. Find the spinning thread
    • Type info threads at the (gdb) prompt.
    • Unless your Zope is very busy, most of the threads should be in sigsuspend(), poll(), or select().
      You should be able to spot the troublemaker here.
      Failing that, check top for the pid of the thread that's using all the CPU time, and look for that pid after LWP in the thread list.
    • Supposing our culprit is listed as 4 Thread 2051 (LWP 8236) ...,
      we now switch to thread 4 with the command thread 4.
  3. Get a traceback

    Now for the fun part, thanks to a post by Barry Warsaw.

    • Type the following at the prompt:
          call PyRun_SimpleString("import sys, traceback; sys.stderr=open('/tmp/tb','w',0); traceback.print_stack()")
    • Look in /tmp/tb for a complete Python traceback of the current call stack of the thread.
  4. Figure out where the loop/deadlock is.

    I can't give step-by-step instructions on this one. Try repeating step 3 several times; you should see a pattern.
    In my case, the thread was always in the __read() method of an NMB connection, and I discovered that it was being called with no timeout value.
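
The string passed to PyRun_SimpleString in step 3 is ordinary Python. Spelled out as a function, a rough Python 3 equivalent could look like the sketch below. The name dump_stack_to_file is hypothetical. Unlike the original one-liner, it does not replace sys.stderr, and it opens the file in append mode so that repeated dumps accumulate for step 4. Note that the original open('/tmp/tb','w',0) relies on Python 2: Python 3 rejects unbuffered text-mode files.

```python
import time
import traceback


def dump_stack_to_file(path="/tmp/tb"):
    """Append the calling thread's current Python stack to `path`."""
    with open(path, "a") as out:
        # A timestamped separator makes successive dumps easy to compare.
        out.write("--- %s ---\n" % time.strftime("%Y-%m-%d %H:%M:%S"))
        traceback.print_stack(file=out)
```

Taking a few dumps some seconds apart and comparing the innermost frames should show the pattern step 4 talks about.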

This was a chance to go back in time and be reminded of the famous Evan Simpson, one of the authors of the Zope book.

I also want to thank the people who keep that site running, giving us a chance to keep access to precious information like the article copied above.
