Using rsync with SELinux

Last week, I needed to move /home from one Fedora computer to another, and I used rsync over ssh to transfer the data.

On the new system, I noticed that procmail didn’t seem to be working, and neither did Dovecot. Nor could Apache serve up my files. This had all been working on my previous Fedora system, which was running SELinux, as was my new system. What had happened?

I hadn’t told rsync to bring across the SELinux file contexts, which are stored in extended attributes. Here is the rsync option I should have used:

-X, --xattrs
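For example, a full copy of /home that preserves extended attributes (and with them the SELinux contexts) might look like the following; the destination host is illustrative:

rsync -aAX /home/ root@newhost:/home/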

I could have used `tar` to move my home directory as well. In that case, I would have needed one of the following options: `--selinux` or `--xattrs`.
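On Fedora, for instance, creating and unpacking a context-preserving archive could look something like this:

tar --selinux -cf home.tar -C / home
tar --selinux -xf home.tar -C /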

I resolved my SELinux issues using the excellent [SETroubleShoot](https://fedorahosted.org/setroubleshoot/), which explained what commands to run to restore the proper SELinux contexts on various files.
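For a migrated home directory, the suggested fix usually amounts to relabeling the affected tree, along these lines:

restorecon -Rv /home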

SELinux requires time to tune, but I use it because it enhances the security of my Linux system, which serves up content over HTTP (Apache), IMAP (Dovecot), and CIFS (Samba).

XML for documents, not for large data streams

I like XML, and I hate XML. XML is great because robust parsers already exist for nearly every programming language, thus saving work for programmers and reducing bugs. XML stinks because it’s not always the right tool for the job — it’s ugly, and it’s bulky. So when I read Michael E. Driscoll’s [comparison of documents (including XML) to trees and data to streams](http://dataspora.com/blog/the-rise-of-the-data-web/), it struck a chord with me:

> Trees are rooted and finite: you can’t chop up a tree and easily put it back together again. Streams can be split, sampled, and filtered. The divisibility of data streams lends itself to parallelism in a way that document trees do not. The stream paradigm conceives of data as extending infinitely forward in time. The Twitter data stream has no end: it ought have no end tag. Conceiving of data as streams moves us out of the realm of static objects and into the realm of signal processing.
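That divisibility is easy to see with a newline-delimited stream, where every line is a self-contained record that ordinary tools can sample and filter; the file name and field here are hypothetical:

head -n 1000 tweets.json | grep '"lang": "en"' > sample.json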

He also [explains why XML shouldn’t be used for large data streams](http://dataspora.com/blog/xml-and-big-data/):

> XML is a poor language for data because it solves the wrong problems — those of documents — while leaving many of data’s unique issues unaddressed. But many promising alternatives exist — microformats like JSON, Thrift, and even SQLite’s file format.

I wouldn’t have thought of using SQLite’s file format, though it has indeed become somewhat ubiquitous. I admire Google Protocol Buffers and Apache Thrift for offering open-source, multi-language binary encoding for data. Now programmers won’t be as likely to reinvent the wheel, and they can rely on robust libraries instead.

Vim multi-line search-and-replace for WordPress comments

When I switched web hosting providers, I migrated my WordPress instance by exporting to the WordPress XML format (as opposed to doing a SQL export).

I didn’t want the spam comments to be imported into the new WordPress instance, so I used a Vim multi-line search and replace to delete the unwanted comments from the XML:

:%s#\_.\{-}<.wp:comment>##
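Breaking that down: `\_.` matches any character including a newline, `\{-}` is Vim’s non-greedy quantifier, and `#` serves as the substitution delimiter, so each match runs from the current position through the next `</wp:comment>` closing tag (the `.` before `wp:comment` matches the slash). In the WordPress export format, each comment sits in its own `<wp:comment>…</wp:comment>` element.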

I gleaned that syntax from [http://osdir.com/ml/editors.vim/2002-06/msg00468.html](http://osdir.com/ml/editors.vim/2002-06/msg00468.html).