XML for documents, not for large data streams

I like XML, and I hate XML. XML is great because robust parsers already exist for nearly every programming language, thus saving work for programmers and reducing bugs. XML stinks because it’s not always the right tool for the job — it’s ugly, and it’s bulky. So when I read Michael E. Driscoll’s [comparison of documents (including XML) to trees and data to streams](http://dataspora.com/blog/the-rise-of-the-data-web/), it struck a chord with me:

> Trees are rooted and finite: you can’t chop up a tree and easily put it back together again. Streams can be split, sampled, and filtered. The divisibility of data streams lends itself to parallelism in a way that document trees do not. The stream paradigm conceives of data as extending infinitely forward in time. The Twitter data stream has no end: it ought have no end tag. Conceiving of data as streams moves us out of the realm of static objects and into the realm of signal processing.

He also [explains why XML shouldn’t be used for large data streams](http://dataspora.com/blog/xml-and-big-data/):

> XML is a poor language for data because it solves the wrong problems — those of documents — while leaving many of data’s unique issues unaddressed. But many promising alternatives exist — microformats like JSON, Thrift, and even SQLite’s file format.

I wouldn’t have thought of using SQLite’s file format, though it has become somewhat ubiquitous. I admire Google Protocol Buffers and Apache Thrift for offering open-source, multi-language binary encoding for data. Now programmers won’t be as likely to reinvent the wheel, and they can rely on robust libraries.
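To make the tree-versus-stream distinction concrete, here is a minimal Python sketch that treats data as a stream of newline-delimited JSON records. The record fields (`user`, `text`) and the in-memory `feed` are hypothetical stand-ins for a real source such as a socket or a growing log file; nothing here comes from Driscoll's posts. Each line is a complete record, so the reader can filter or sample without ever seeing an end tag.

```python
import io
import json
from itertools import islice


def read_records(stream):
    """Yield one parsed record per non-empty line, lazily."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def keep_every(records, n):
    """Sample a stream by keeping every n-th record, starting with the first."""
    return islice(records, 0, None, n)


# Hypothetical feed; a real one could be unbounded.
feed = io.StringIO(
    '{"user": "alice", "text": "hello"}\n'
    '{"user": "bob", "text": "xml is bulky"}\n'
    '{"user": "alice", "text": "streams never end"}\n'
    '{"user": "carol", "text": "goodbye"}\n'
)

# Filter the stream one record at a time; nothing is held in memory
# except the records we decide to keep.
alice_texts = [r["text"] for r in read_records(feed) if r["user"] == "alice"]
print(alice_texts)
```

An XML document, by contrast, is only well-formed once its closing root tag arrives, so a tree parser has to wait for the whole document before it can hand back a result.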

Vim multi-line search-and-replace for WordPress comments

When I switched web hosting providers, I migrated my WordPress instance by exporting to the WordPress XML format (as opposed to doing a SQL export).

I didn’t want the spam comments to be imported into the new WordPress instance, so I used Vim’s multi-line search and replace to delete the unwanted comments from the XML.

:%s#<wp:comment>\_.\{-}<.wp:comment>##

I gleaned that syntax from [http://osdir.com/ml/editors.vim/2002-06/msg00468.html](http://osdir.com/ml/editors.vim/2002-06/msg00468.html)
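The same non-greedy, multi-line match can be sketched in Python for anyone who would rather drop only the spam and keep the legitimate comments. This is a rough sketch, not what I actually ran: it assumes each comment sits inside a `<wp:comment>` element and that spam is flagged with a `<wp:comment_approved>` value of `spam` (optionally wrapped in CDATA), so check your own export file before trusting it. The function name `strip_spam_comments` is made up for illustration.

```python
import re

# One whole comment element, matched non-greedily across line breaks
# (the Python counterpart of Vim's \_.\{-}).
COMMENT_RE = re.compile(r"<wp:comment>.*?</wp:comment>", re.DOTALL)

# Assumption: spam comments carry this status field in the export.
SPAM_RE = re.compile(
    r"<wp:comment_approved>(?:<!\[CDATA\[)?spam(?:\]\]>)?</wp:comment_approved>"
)


def strip_spam_comments(xml_text):
    """Return xml_text with every comment element flagged as spam removed."""

    def replace(match):
        block = match.group(0)
        # Keep the comment unchanged unless it is flagged as spam.
        return "" if SPAM_RE.search(block) else block

    return COMMENT_RE.sub(replace, xml_text)
```

Usage would be along the lines of reading the export file into a string, passing it through `strip_spam_comments`, and writing the result to a new file so the original export stays untouched.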

Palm T|X Security: Counterproductive

The other day, I was looking through the preferences on my Palm T|X, and I found out that I could enable “Intrusion Protection”. I set it so that it would destroy all data on the T|X if I failed to enter my password 25 times. That seemed like enough of a grace period that I wouldn’t accidentally destroy my data, even if I mistyped the password several times.

The next day, I let my three-year-old play “Bombel”, and draw on the “Note Pad”. Several minutes later, I noticed that she was pushing buttons willy-nilly at the password screen.

“Oh!”, I thought, “That’s not good.” She was well on her way to exceeding the 25 password attempts and wiping out my data. I knew I could get it back with a hot-sync, but I didn’t want to resort to that.

Palm’s “Intrusion Protection” became counterproductive when placed in the hands of a child.

---

I also tried the Palm TX feature to “Encrypt data when locked”. First, I tried using [AES](http://en.wikipedia.org/wiki/Advanced_Encryption_Standard) encryption, since it would likely be “stronger” than the default of [RC4](http://en.wikipedia.org/wiki/RC4). AES was unusable — it took minutes to encrypt and decrypt my calendar and address databases. RC4 was barely usable, taking ten seconds or so to encrypt and decrypt my calendar. When I whip out my Palm, I want access to my data immediately, so I disabled encryption.

---

I’ve chosen convenience over confidentiality for the data on my Palm TX, because I felt that the price to pay for confidentiality was too high. I’m not sure that it’s the right decision. I might feel differently if the Palm is lost or stolen. And so might some of the contacts in the address book. I would re-evaluate my decision if I were required to notify those contacts in the case of a lost Palm.

Fedora 11 and Virtualization (KVM)

I’ve recently upgraded another computer from Fedora 9 to Fedora 11, and I’ve decided to try the built-in [KVM](http://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine) (i.e. Applications -> System Tools -> [Virtual Machine Manager](http://virt-manager.et.redhat.com/)). I wanted a virtual machine that had bridged-mode networking, but it wasn’t available by default. To get it as an option, I disabled SELinux (not sure if it was necessary), followed [some special instructions](http://wiki.libvirt.org/page/Networking#Fedora.2FRHEL_Bridging) to set up a bridged interface, and restarted my network and libvirtd.

Now I’ve got a working guest OS inside of KVM, and I like it. The guest OS feels snappy and responsive.

Update: KVM and the accompanying tools aren’t as mature as VirtualBox or VMware. For example, I didn’t see how to get my USB flash drive to be recognized by a KVM guest OS. At one point, I tried to use VirtualBox at the same time as KVM. VirtualBox told me I needed to disable the KVM kernel module before using VirtualBox.

Switched from digitalspace to justhost

I’ve been running my website on digitalspace.net hosting for years. Then they sold out to Jumpline, and my ability to push changes to my website via ‘[rsync](http://www.samba.org/rsync/)’ disappeared and was never restored. Although I still had ssh shell access, the account was seriously limited. It was probably a good security decision on their part, but I missed having wget, tar, gunzip, chmod, and other essential utilities that I used when upgrading my blogging software. It became tedious, at best, to maintain my website.

I’ve finally switched to hosting through http://www.justhost.com and the transition has taken more time than I wanted. As a father of four dear children, I feel the time pinch. Migrating WordPress has been more tedious than expected. And then there’s email, which was a pain to switch as well. At one point, I even considered abandoning my website and switching my blog to a site like blogger.com. But I stuck with it.

Justhost support has been good to work with, and I’m pleased with my ssh shell access. I get the power of a typical Linux shell with my favorite utilities: rsync, tar, etc.