Contact Search RSS Feed

Filter: Yearly view
Filter (Chronologic, Yearly view)
2010>>
January(1)
February
March(1)
April(1)
May(1)
June
July
August
September
October
November
December

Smalltak meets DTrace

EDIT: the VM patches and the examples are available from Github now. If you make changes, please send me pull requests!

This post is an introduction to DTrace and a provider I implemented for the Smalltalk Squeak/Pharo VM.

What is DTrace and why is it interesting for Smalltalk?

DTrace is a powerful, kernel-based dynamic tracing framework. It was originally built for Solaris but is being added to Linux and is shipped with OSX since version 10.5. DTrace can provide very detailed information that is useful when you need to track down bugs or performance issues (e.g., syscalls, I/O events). Programs are dynamically instrumented -- meaning you can deliver fully optimized binaries and start a debugging session on a live production server.

DTrace allows one to add high-level probes, which give the capability of creating static instrumentation points anywhere in the VM. That means, for a high-level language like Smalltalk we can add probes for events such as method enter and method return. So, what does that buy us for Squeak/Pharo? This allows us to use the DTrace technology to trace Smalltalk code execution. DTrace allows unprecedented view across all layers of Smalltalk software. With DTrace we can simultaneously observe and correlate events from Smalltalk, the VM, and the OS.

The provider I’ve implemented currently provides the following events: method enter and return, object allocation, full and incremental GC begin and end, and process activation (that is, when the scheduler transfers control to a Smalltalk process). Similar providers exist for languages such as Java and Ruby.

A simple example

To wet your appetite, here is an example that illustrates the basic usage of DTrace for Squeak/Pharo. For this example, lets assume our application has a high GC rate so to start tracking down the problem we want to know if there are any classes that are extensively instantiated during a given program run. In the DTrace scripting language (a subset of C named D), we define the probes we are interested in and how to print and aggregate the traced data:

Squeak$target:::proc_transferto
{
  self->priority = arg3;
}

Squeak$target:::objectalloc
/ self->priority == 30 /
{
  @number[copyinstr(arg0)] = count();
  @size[copyinstr(arg0)] = sum(arg1);
}

dtrace:::END
{
   ... print number and size vars ...
}

The first line captures the firing of the proc_transferto probe, which is triggered whenever a Smalltalk process is activated. Through the variable arg3 we can grab the process’ priority, which we store for later use. With the next statement we tell DTrace that we are also interested in the probe objectalloc, which is fired each time the VM creates a new object. However, we only want the probe to fire when the current process priority is 30. Like this we restrict the triggered events to the processes in question (e.g., only events from Seaside processes, which run at priority 30, but excluding GUI activity).

The variables ‘number’ and ‘size’ are maps that associate each class name with the number of instantiations, respectively with the total sum of bytes allocated. The class name is fetched from arg0, the size of the newly instantiated object from arg1.

The pseudo probe END is run when the tracing session is terminated (that is, when you hit Ctrl-c). In this declaration we specify how to print the results of the aggregated data. The printout  shows the top 4 classes:

Number of instances created per class

CLASS           NUMBER
Array             8248
Rectangle      19670
Float              41521
Point              79272

Total memory allocated for instances of class (in bytes)
CLASS           SIZE
Array              162516
Rectangle      236040
Float              498252
Point              951264

These numbers are taken from tracing the execution of the statement ‘Browser open’ in Pharo. The most instantiated class is Point with 79.000 instances, which together consume 0.9MB (!) of memory. For the complete D source see the file objectalloc.d in the attached zip file (see below).

As a follow up question, we may want to know which methods generated the Point instances. With another small script we can quickly answer also that question.

Rectangle>>intersect:                                    2534
ImageFillStyle>>initialize                            2566
Form>>extent                                                 2792
Point>>asIntegerPoint                                  5102
SmallInteger>>adaptToPoint:andSend:     7480
Point>>-                                                          10271
Point>>+                                                         13369

The script remembers for each methodenter and methodreturn even the current class and method in the variable self->currentcaller. On each objectalloc event we then do a @methods[self->currentcaller] = count().

Quick start (for now only OS X)

1. Download the ZIP file:

2. Unzip and execute start.command (just double click). This should start up a Pharo image using the instrumented VM.

3. Open a terminal window and execute

sudo dtrace -s objectalloc.d -p `ps -ef | grep squeak | grep -v grep | awk '{ print $2 }'`

4. In the Pharo image evaluate

[ Browser open ] forkAt: 30

5. When the browser has opened (takes a while), quit the D script (Ctr-c). Its output is printed to the terminal.

How to build your own DTrace-instrumented VM

Follow my previous blog post that explains how to build a Squeak/Pharo VM from zero. Before generating the sources, file in the following changeset:

Also, you need to add the following header file into your ../platforms/unix/src/vm/

EDIT: the VM patches and the examples are available from Github now. If you make changes, please send me pull requests!

 

Currently available probes

methodenter
       arg0: name of class
       arg1: name of method
       arg2: primitive index (0 if not a primitive)
       arg3: successFlag of the previously run primitive (false if method is run due to a primitive failure; for non-primitive methods arg3 is always true)

methodreturn
       arg0: name of class
       arg1: name of method
       arg2: primitive index (0 if not a primitive)

objectalloc
      arg0: name of class
      arg1: size in bytes

gcbegin

gcend

igcbegin

igcend

proc_transferto
      arg0: the previous process' pid (hash of Process instance)
      arg1: the previous process' priority
      arg2: the new process' pid
      arg3: the new process' priority

Performance overhead

While the low level probes of DTrace do not add overhead to the programs (because of dynamic instrumentation), the ones I added to the VM (aka. User-land Statically Defined Tracing (USDT)) do slow down the execution. The overhead compared to a vanilla VM are:

Bytecodes/sec
Sends/sec

1.5%
5.3%

This is not a huge slowdown, but it is still measurable. There may also be room for tuning the C code I added to the VM.

Where to go from here?

There are some more probes that would be useful to add. One of these would be a custom probe that can be triggered via a primitive from Smalltalk code. Like this one could fire high-level events, for instance, Seaside-start-request and Seaside-end-request, which would allow one to associate or scope traces to specific Seaside requests. Another useful feature would be to be able to access the current Smalltalk stack. This would allow for sampling; similar to message tally.

DTrace by itself does not provide visualizations but since its output can be formatted as required (using printf) and piped into a file, one can feed the data to a visualization tool (e.g., print output in MSE format, import it into Moose, and create interactive visualizations with Mondrian).

The interesting use cases will emerge from debugging real-world problems. What makes DTrace a compelling technique is that it allows one to seamlessly combine data from different providers. For instance, you can now figure out which of your Smalltalk code triggers expensive I/O events because it hits the disk. Or you could add Firefox probes and see how control flows between Ajax and Smalltalk.

If you find this useful or if you have interesting scripts to share, please drop me a message.

Posted by Adrian Lienhard on May 10, 2010


Archive

Filter (Chronologic, Yearly view)
2010>>
January(1)
February
March(1)
April(1)
May(1)
June
July
August
September
October
November
December

Tweets

CMSBOX Powered by cmsbox