jvisualvm: A free Java memory and CPU profiler

I needed to profile a Java application, and since we had a JProfiler floating license, I used it. JProfiler works well, although it’s pricey. I was googling for other Java profiling tools, and [stackoverflow.com](http://stackoverflow.com/search?q=visualvm) made mention of [jvisualvm](https://visualvm.dev.java.net/), which comes bundled with JDK 6 release 7. I noticed that on my Fedora 10 box, the java-1.6.0-openjdk package includes jvisualvm. None of my coworkers had heard of it.

JProfiler introduces a significant performance penalty into the code it profiles, whereas other tools including jvisualvm and YourKit have a much lower impact. I’m going to give jvisualvm a try, once I get the target environment set up properly with the new JDK.

UPDATE: jvisualvm won’t profile remote applications like JProfiler can. jvisualvm is not quite as easy to use, and I haven’t figured out how to get stack traces on the CPU and memory hot spots. Overall, I like the tool.

UPDATE 2: jvisualvm can be configured to give a stack trace of memory hot spots. I’ve learned that performance between the Java 1.5 and 1.6 jvms can be very different. I’ve learned that I can run ‘kill -3 ‘ to print a stack trace of my running java processes. It’s helped me to narrow down bottlenecks in an application when the profiler wasn’t granular enough.

The future of Gnome Apps: JavaScript?

There’s an interesting article called “[Building desktop Linux
applications with JavaScript](http://arstechnica.com/articles/paedia/javascript-gtk-bindings.ars?bub)” By Ryan Paul, January 19, 2009.

I didn’t immediately understand the vision. Don’t we already have
Python, Ruby, Java, C++ and Perl bindings for Gnome? Yes, we do. So why
would we add JavaScript to the mix? Or any other scripting language?

The best way to think about it is Firefox plugins, like Greasemonkey,
that actually modify the web browser to give you a new experience.
Firefox extensions are written in JavaScript. JavaScript has hooks into
the application (Firefox) to manipulate it.

Gnome hackers want to do the same thing for Gnome. Not only could you
write Gnome application in JavaScript, you could extend a Gnome
application using JavaScript, no matter what language it was written in.

Another way to think about it is this: When most people think of Java,
they don’t think of the language. They think of the platform — the
libraries that are shipped with the language (networking, database
connectivity, etc.). The same is true for Python, Perl, and Ruby.

The goal is to us an embeddable language to tweak the Gnome platform,
not to use a platform (like Java, Python or Perl) to tweak Gnome. When
they embed a language into Gnome, application developers will use the
Gnome platform way of doing networking, instead of doing it the Java
library way. They will use the Gnome way of opening file picker, not the
Java library way. They will use the Gnome way of doing HTTP, not the
Python or the Java or the Perl way.

Using the 2.6.26 Linux Kernel Debugger (KGDB) with VMware

Reading the linux kernel documentation on KGDB wasn’t enough for me to be able
to use the newly built-in KGDB kernel debugger with version 2.6.26 or 2.6.27.
The breakthrough for me was reading [part of Jason Wessel’s
guide](http://www.kernel.org/pub/linux/kernel/people/jwessel/kgdb/ch03s03.html).

I have two machines:

* developer – where I run gdb
* target – where the kernel is being debugged, running in VMware

Configure VMware on the developer machine

* Power down the guest (target)
* Edit the VM guest settings
* Add a serial port
* Use named pipe `/tmp/com_1` (it’s really a UNIX domain socket)
* Configure it to “Yield CPU on poll” (under Advanced)
* Install ‘socat’, if not already installed

Configure and Compile the kernel on the developer or the target machine

– Get kernel 2.6.26 or newer
– `make menuconfig` # or make gconfig
– Under Kernel Hacking:
– enable KGDB
– enable the Magic SysRq key
– enable “Compile the kernel with debug info”
– Build kernel: `make`

Configure target

– Enable Magic SysRq key on target:
– Edit /etc/sysctl.conf and set `kernel.sysrq = 1`
– or run `sysctl -w kernel.sysrq=1` # this doesn’t survive a reboot
– Install developer kernel
– On the developer machine:
`rsync -av –exclude .git ./ root@target.host.name:/mnt/work/linux-2.6.26`
– On the target, a RedHat based system:
`make install`
`make modules_install`
– Edit /boot/grub/grub.conf and set `timeout=15`
– Boot into the newly installed kernel

Start debugging

– On target:
`echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc`
– On developer:
`socat -d -d /tmp/com_1 PTY:` # notice what pty is allocated — /dev/pts/1 in my case
`gdb vmlinux`
`set remotebaud 115200`
`target remote /dev/pts/1`
– On target, do one of the following:
– `echo “g” > /proc/sysrq-trigger`
– Type ALT-SysRq-G
– Ready, get set, go! Go back to developer machine and use gdb to set
breakpoints, continue, etc.

I set up debugging because I wanted to understand the behavior of the kernel
when loading a module. It turns out that loading of the module failed because
sitting in a debugger delayed the execution, causing a timeout in module load
by the time I stepped through the code. Use of printk turned out to work
better.

DjangoCon videos on YouTube

I’m not a Django programmer, but for those who are, this may be useful. YouTube has videos of the inagural DjangoCon conference:
[http://www.youtube.com/results?search_query=djangocon&search=tag](http://www.youtube.com/results?search_query=djangocon&search=tag)

Web App Security Statistics

Perhaps this is a bit old, but it’s the first time I’ve seen it, and I thought it was interesting enough to share.

[http://www.webappsec.org/projects/statistics/](http://www.webappsec.org/projects/statistics/)

* more than 7% of analyzed sites can be compromised automatically
* Detailed manual and automated assessment using white and black box methods shows that probability to detect high severity vulnerability reaches 96.85%.
* The most prevalent vulnerabilities are Cross-Site Scripting, Information Leakage, SQL Injection and Predictable Resource Location

Git Book, yap

The Pragmatic Bookshelf is releasing a [book on using Git](http://www.pragprog.com/titles/tsgit/pragmatic-version-control-using-git) for version control.

Steven Walter released a new command-line front-end for git called [yap](http://lwn.net/Articles/297285/). It’s not only supposed to make it easier to work with git, but also with subversion repositories. It’s available from [http://repo.or.cz/w/yap.git](http://repo.or.cz/w/yap.git)

MySQL or PostgreSQL?

I’ve often wondered why people seem to prefer either MySQL or PostgreSQL. For the most part, I think it comes down to the following:

* Familiarity.
* Friends (a.k.a. support system) being more familiar with one over the other.
* Ease of getting started. Most web hosting providers support MySQL out-of-the box.
* Name recognition.
* Ease of support.

Here are some resources that could be useful for learning the pros and cons of each database:

* [What MySQL can learn from PostgreSQL](http://www.scribd.com/doc/2575733/The-future-of-MySQL-The-Project)
* [What can PostgreSQL learn from MySQL](http://www.postgresonline.com/journal/index.php?/archives/48-What-can-PostgreSQL-learn-from-MySQL.html) and the [accompanying presentation](http://www.commandprompt.com/files/mysql_learn.pdf)
* [MySQL quirks and limitations](http://use.perl.org/~Smylers/journal/34246)
* [Why PostgreSQL?](http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007)

Effective forms of communication

Have you ever wondered what forms of communication are the most and the least
effective for software engineers? See Scott Ambler’s [“Models of Communication” diagram in his essay](http://www.agilemodeling.com/essays/communication.htm). Face-to-face is most effective, and paper is the least effective, with email, telephone and video conferencing falling in-between the two ends of the spectrum.

REST versus RPC

Have you considered the merits and applicability of RESTful web apps? Here are a few notes I’ve made.

There was quite a [discussion about RPC, REST, and message queuing](http://steve.vinoski.net/blog/2008/07/13/protocol-buffers-leaky-rpc) — they are not the same thing. Each one is needed in a different scenario. All are used in building distributed systems.

Wikipedia’s [explanation of REST](http://en.wikipedia.org/wiki/Representational_State_Transfer) is quite informative, especially their [examples](http://en.wikipedia.org/wiki/Representational_State_Transfer#Example) of RPC versus REST.

The poster “soabloke” says RPC “Promotes tightly coupled systems which are difficult to
scale and maintain. Other abstractions have been more successful in building
distributed systems. One such abstraction is message queueing where systems
communicate with each other by passing messages through a distributed queue.
REST is another completely different abstraction based around the concept of a
‘Resource’. Message queuing can be used to simulate RPC-type calls
(request/reply) and REST might commonly use a request/reply protocol (HTTP) but
they are fundamentally different from RPC as most people conceive it. ”

The [REST FAQ](http://rest.blueoxen.net/cgi-bin/wiki.pl?RestFaq) says, “Most applications that self-identify as using “RPC” do not conform to the REST. In particular,
most use a single URL to represent the end-point (dispatch point) instead of using a multitude of
URLs representing every interesting data object. Then they hide their data objects behind method
calls and parameters, making them unavailable to applications built of the Web. REST-based
services give addresses to every useful data object and use the resources themselves as the
targets for method calls (typically using HTTP methods)… REST is incompatible with
‘end-point’ RPC. Either you address data objects (REST) or you don’t.”

RPC: Remote Procedure Call assumes that people agree on what kinds of procedures they would like
to do. RPC is about algorithms, code, etc. that operate on data, rather than about the data
itself. Usually fast. Usually binary encoded. Okay for software designed and consumed by a
single vendor.

REST: All data is addressed using URLs, and is encoded using a standard MIME type. Data that is
made up of other data would simply have URLs pointing to the other data. Assumes that people
won’t agree on what they want to do with data, so they let people get the data, and act on it
independently, without agreeing on procedures.

Perl one liners for email analysis

I thought it’d be interesting to know what times of day people were most likely to send me email. My email is stored in mbox format (I used Thunderbird and mutt for email), so I wrote a perl one-liner to analyze it for me.

The first one-liner prints a histogram, in 80 columns, of activity per-hour of the day. The second prints it in a form suitable for import into a spreadsheet

Histogram:

perl -nle ‘$sum[$1]++ if m/^Date: .* (\d\d):\d\d:\d\d/; END {foreach (@sum) { $max = $_ if $_ > $max }; $div = $max/80; foreach (@sum) { print $i++ . ” ” . (“#” x ($_ / $div)) . ” ($_)”;}}’ /path/to/Inbox

0 #################################### (115)
1 ########################## (84)
2 ################### (62)
3 ################ (54)
4 ############ (40)
5 ######### (31)
6 ####### (23)
7 ######################## (79)
8 ####################################### (126)
9 ############################################### (152)
10 ######################################### (133)
11 ###################################### (124)
12 ############################################################### (202)
13 ############################################################## (200)
14 ############################################################ (192)
15 #################################################################### (218)
16 ######################################################################## (229)
17 ################################################################ (206)
18 ################################################## (160)
19 ############################### (101)
20 ##################################### (118)
21 ######################################## (129)
22 ######################################################### (183)
23 ######################################## (129)

Tabular data:

perl -nle ‘$sum[$1] += 1 if m/^Date: \w{3}, \d+ \w{3} \d{4} (\d\d):\d\d:\d\d/; END {foreach (@sum) { print $i++ . “\t” . $_;} }’ /path/to/Inbox

While I was at it, I wanted to know what the most common timezone offsets were. Again, I wrote two separate one-liners. One prints a histogram, and the other doesn’t.

Histogram:

perl -nle ‘$tz{$1} += 1 if m/^Date: .*([+-]\d{4})/; END {foreach (values %tz) {$max = $_ if $_ > $max }; $div = $max/80; foreach (sort(keys %tz)) { print “$_ ” . (“#” x ($tz{$_}/$div)) . ” ($tz{$_})”; }}’ /path/to/Inbox

Non-histogram:

perl -nle ‘$tz{$1} += 1 if m/^Date: .*([+-]\d{4})/; END {foreach (sort(keys %tz)) { print “$_ $tz{$_}”; }}’ /path/to/Inbox

I subscribe to various email lists, and each has different characteristics. I was surprised to find that my family email box usage pattern was fairly spread out around the clock, except that it drops off significantly during dinner and during the wee hours of the morning. Evening hours are the most active.

I’ve taken the timezone one-liner and modified it to tell me the most common months of the year, or the most common days of the week for email to be sent. For all my email boxes, analyzed over the last few years, email is most active on weekdays, and drops off on weekends.

Mon ############################################################### (5630)
Tue ##################################################################### (6129)
Wed ######################################################################## (6372)
Thu ##################################################################### (6155)
Fri ############################################################ (5329)
Sat ############################## (2675)
Sun ########################## (2368)

I tried translating those one-liners into Ruby, but it wasn’t as compact, and doing it as a one-liner in Java just isn’t going to happen.