Perl one-liners for email analysis

I thought it’d be interesting to know what times of day people were most likely to send me email. My email is stored in mbox format (I used Thunderbird and mutt for email), so I wrote a Perl one-liner to analyze it for me.

The first one-liner prints a histogram, in 80 columns, of activity per hour of the day. The second prints it in a form suitable for import into a spreadsheet.

Histogram:

perl -nle '$sum[$1]++ if m/^Date: .* (\d\d):\d\d:\d\d/; END {foreach (@sum) { $max = $_ if $_ > $max }; $div = $max/80; foreach (@sum) { print $i++ . " " . ("#" x ($_ / $div)) . " ($_)";}}' /path/to/Inbox

0 #################################### (115)
1 ########################## (84)
2 ################### (62)
3 ################ (54)
4 ############ (40)
5 ######### (31)
6 ####### (23)
7 ######################## (79)
8 ####################################### (126)
9 ############################################### (152)
10 ######################################### (133)
11 ###################################### (124)
12 ############################################################### (202)
13 ############################################################## (200)
14 ############################################################ (192)
15 #################################################################### (218)
16 ######################################################################## (229)
17 ################################################################ (206)
18 ################################################## (160)
19 ############################### (101)
20 ##################################### (118)
21 ######################################## (129)
22 ######################################################### (183)
23 ######################################## (129)

Tabular data:

perl -nle '$sum[$1] += 1 if m/^Date: \w{3}, \d+ \w{3} \d{4} (\d\d):\d\d:\d\d/; END {foreach (@sum) { print $i++ . "\t" . $_;} }' /path/to/Inbox

While I was at it, I wanted to know what the most common timezone offsets were. Again, I wrote two separate one-liners. One prints a histogram, and the other doesn’t.

Histogram:

perl -nle '$tz{$1} += 1 if m/^Date: .*([+-]\d{4})/; END {foreach (values %tz) {$max = $_ if $_ > $max }; $div = $max/80; foreach (sort(keys %tz)) { print "$_ " . ("#" x ($tz{$_}/$div)) . " ($tz{$_})"; }}' /path/to/Inbox

Non-histogram:

perl -nle '$tz{$1} += 1 if m/^Date: .*([+-]\d{4})/; END {foreach (sort(keys %tz)) { print "$_ $tz{$_}"; }}' /path/to/Inbox

I subscribe to various email lists, and each has different characteristics. I was surprised to find that my family email box usage pattern was fairly spread out around the clock, except that it drops off significantly during dinner and during the wee hours of the morning. Evening hours are the most active.

I’ve taken the timezone one-liner and modified it to tell me the most common months of the year, or the most common days of the week for email to be sent. For all my email boxes, analyzed over the last few years, email is most active on weekdays, and drops off on weekends.

Mon ############################################################### (5630)
Tue ##################################################################### (6129)
Wed ######################################################################## (6372)
Thu ##################################################################### (6155)
Fri ############################################################ (5329)
Sat ############################## (2675)
Sun ########################## (2368)

I tried translating those one-liners into Ruby, but it wasn’t as compact, and doing it as a one-liner in Java just isn’t going to happen.

Perl 5 to 6

Moritz Lenz has written a series of informative blog posts about Perl 6, for Perl 5 programmers. Here’s a bit of his introduction:

> Perl 6 is underdocumented. That’s no surprise, because (apart from the specification) writing a compiler for Perl 6 seems to be much more urgent than writing documentation that targets the user.

> Unfortunately that means that it’s not easy to learn Perl 6, and that you have to have a profound interest in Perl 6 to actually find the motivation to learn it from the specification, IRC channels or from the test suite.

> This project, which I’ll preliminary call “Perl 5 to 6” (in lack of a better name) attempts to fill that gap with a series of short articles.

[Read more…](http://perlgeek.de/blog-en/perl-5-to-6/)

Google’s new web browser: Chrome

Google is [releasing](http://www.google.com/chrome) a beta web browser called “[Chrome](http://www.google.com/chrome)” tomorrow, and they’ve even got a [comic strip](http://www.google.com/googlebooks/chrome/) to explain the design choices they made, and how it’s supposed to make life better.

The browser is based on [WebKit](http://en.wikipedia.org/wiki/WebKit).
They aim to make JavaScript vastly faster with a new JavaScript virtual
machine called V8. At the same time, the Mozilla team is beefing up
Firefox 3.1 with a faster JavaScript engine called [TraceMonkey](http://www.pcmag.com/article2/0,2704,2328737,00.asp).

V8 and TraceMonkey reportedly race down the freeway while IE 7 and IE 8
are left puttering along at pedestrian speeds.

Firefox 3.1 to include Theora video support

[LWN reports](http://lwn.net/Articles/292939) that the Ogg Theora video format will be supported in Firefox 3.1. I believe this is a game-changing move on the web. Because there are no patent royalties to pay, it will be easier and cheaper to distribute video that renders on any OS running Firefox. It will catapult the Theora video format into the mainstream.

An LWN reader [pointed out](http://lwn.net/Articles/293076/) that Theora has traditionally lacked quality and performance compared to MPEG-4, but that it’s being remedied by the in-progress “Thusnelda” project.

xguest

I just discovered and installed the _xguest_ package for Fedora 8 and 9. Here’s what it does:

> Installing this package sets up the xguest user to be used as a temporary account to switch to or as a kiosk user account. The account is disabled unless SELinux is in enforcing mode. The user is only allowed to log in via gdm [or the fast-user-switching applet]. The home and temporary directories of the user will be polyinstantiated and mounted on tmpfs.

Here’s how to install it:

yum install xguest

I hit a brick wall when I first tried it. I thought my machine was in SELinux Enforcing mode, when it wasn’t — it was in Permissive mode. I fixed it using system-config-selinux.
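A quick way to catch this before installing is to ask SELinux for its mode directly. This is a minimal sketch, assuming the standard `getenforce` utility from the SELinux userland tools is installed; the `mode` variable is only for illustration.

```shell
# Hypothetical pre-flight check before installing xguest.
# getenforce prints Enforcing, Permissive, or Disabled.
if command -v getenforce >/dev/null 2>&1; then
    mode=$(getenforce)
else
    mode="unknown (getenforce not found)"
fi
echo "SELinux mode: $mode"

# If this prints Permissive, running "setenforce 1" as root switches to
# Enforcing until the next reboot.
```

`setenforce` only changes the running system. For the mode to survive a reboot, it also has to be set persistently, for example with system-config-selinux as described above.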

It’s possible to change what the xguest user can do using system-config-selinux. I’ve attached a screenshot showing what capabilities can be granted or revoked.

[Screenshot: SELinux Administration for xguest user]

Fedora 9 and the OpenJDK

Java development is getting easier under Linux because of Sun’s OpenJDK, which Linux distributors like Fedora now include. No more need to go through the hassle of downloading it from Sun. Here’s how I installed it:

yum install java-1.6.0-openjdk-devel java-1.6.0-openjdk-javadoc java-1.6.0-openjdk-plugin

A downside is that the default fonts in some Java applications, like IntelliJ IDEA, look terrible. Fedora 9 includes the Red Hat Liberation fonts, which stand in for Microsoft fonts. I went into IDEA’s configuration and changed the default font from “Arial” to “Liberation Sans”. IDEA’s visual appearance is nearly, but not completely, _fontastic_ compared to what it was before.

Blog Infrastructure: Control versus Capitulation

I like to be in control of my destiny where my public website (and my blog) is concerned. That way, my content isn’t at the mercy of a third party that may start charging to host my content, remove content, or stop hosting my content. I call this control *self reliance*.

Being in control of my blog has its costs. I am the person responsible for making sure the blog software (WordPress) stays up to date, which takes valuable time that I’d rather spend doing something else (and usually do).

Most people I know who blog have already outsourced their blogging platform, whether they realize it or not. Should I capitulate (i.e. surrender control) and do the same thing?

In some sense, my ability to function in this high-tech world requires that I rely on others. I rely on one third party to provide the blogging software (WordPress), another to host my web server (digitalspace.net), another to provide bandwidth, and another to provide a domain name (joker.com). On and on the list goes. I am not an island unto myself. My ability to succeed depends on being a part of civilized society.

I’d capitulate control of my blog, except that I still want a canonical location for my blog to live — one that is a little bit less subject to the whims of a single corporate entity. The best place is at jaredrobinson.com. If I need to switch to a new hosting provider or switch to a different domain name registrar, the canonical URL doesn’t have to change.

I’m not ready to capitulate yet. I like my canonical blog URL.

Test-driven development in Perl

There’s an impressively in-depth presentation from [OSCON 2008](http://en.oreilly.com/oscon2008/public/schedule/proceedings) about [Practical Test Driven Development in Perl](http://assets.en.oreilly.com/1/event/12/Practical%20Test-driven%20Development%20Presentation.pdf). It covers Test::More, Test::Class, Test::Differences, Test::Deep and Test::MockObject.

I also found the following to be interesting: [Even Faster Web Sites](http://assets.en.oreilly.com/1/event/12/Even%20Faster%20Web%20Sites%20Presentation%202.ppt) and [Pro PostgreSQL](http://assets.en.oreilly.com/1/event/12/Pro%20PostgreSQL%20Presentation.odp). Reading these helps me to know a little bit about what I don’t know.

Visualize your hard drive using a TreeMap viewer

Every once in a while, I get low on disk space, and hunting manually for large directories or files to delete is tedious. [Tree Map visualization](http://en.wikipedia.org/wiki/Treemap) tools make the job easier. There’s [WinDirStat](http://windirstat.info/) for Windows, [KDirStat](http://kdirstat.sourceforge.net) for KDE, and [Disk Usage Analyzer](http://live.gnome.org/GnomeUtils/Baobab) (baobab) for GNOME.

![TreeMap Image](http://library.gnome.org/users/baobab/stable/figures/baobab_fullscan.png.en)
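For comparison, the manual hunt usually looks something like the sketch below, which lists the largest entries directly under a directory using `du`. The function name `largest_entries` and its two arguments are hypothetical, and the `*` glob means hidden files and directories are not counted.

```shell
# Hypothetical helper: show the N largest entries directly under a directory.
# Arguments: directory (default: current), number of entries (default: 10).
largest_entries() {
    dir=${1:-.}
    count=${2:-10}
    # du -sk prints one total per entry, in kilobytes;
    # sort numerically with the largest first and keep the top N.
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Usage: largest_entries /home 5
```

This only shows one level at a time, so it has to be repeated in each large directory. A treemap shows the whole hierarchy at once.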