mtnwestruby: Simple Bayesian Networks

Mountain West Ruby Conference: SBN
16 March 2007

Carl Youngblood. Simple Bayesian Networks with Ruby.

How do we handle uncertainty? Probablilty theory.
How do we come up with probabilities? From experience, from beliefs — but either of these may not be accurate because of too few samples or skewed samples.


gem install sbn

The Ruby version is faster than his C++ version. Premature optimization is the root of all evil. — Donald Knuth

xmlbif – the format that the bayes network is stored to and read from. Compatible with the Java Bayes package. See http://en.wikipedia.org/wiki/Bayesian_network

He couldn’t get his SBN demonstration program working as expected. Sounds like SBN isn’t ready for prime time. Future improvements:

  • Exact inference
  • True continuous variables
  • Etc.

http://www.pixelglow.com/macstl/

Q: Why not do this in R, and connect R to Ruby?
A: Good idea.

I think it would be cool if someone showed how to connect SBN into CruiseControl.rb to identify what patterns of development and which developers are most likely to break a build.

mtnwestruby: Ruport

Mountain West Ruby Conference by Gregory Brown
16 March 2007
Pragmatic Community Driven Development in Ruby. Or “Rolling the dice with Ruport”. “Reporting Sucks”.

Gregory spent several minutes at the beginning of the presentation trying to get his Nvidia X display settings worked out for both the projector and his screen. None of the other people using Linux desktops had the same problems, and neither did the people using Windows or Mac OS X.

Ruport couldn’t have happened without a community to support it. Community = people + problems. Why is community important? Community helps root out bad ideas, and we all have bad ideas. With a good community, you can discover bad ideas sooner, and replace them with better ideas (hopefully).

How do we leverage community? One aspect is to pick a license that encourages people to get involved.

What license do we choose? Licenses aren’t as easy to “refactor” as code. Choose a license that is “right” for your project. Don’t write your own. In order to be taken seriously at a community and at a corporate level, pick one of the already approved OSS licenses. Picking a license is a compromise.

BSD/MIT favors individuals.
GPL favors communities.

Reporting is a big domain. Sometimes, it’s necessary to integrate with software with less than ideal licensing terms. The Ruport community chose the Ruby License.

Communities are idea warehouses.

You can only really work passionately on your problems. Find people who have the same problems. Learn to say “no”. Sometimes, less is more – you’ve got so many ideas floating around and such large groups – sometimes you need smaller groups and a focused, trimmed down set of ideas so that you can define software and implement it. It’s easier to read tens of emails per week than thousands.

Discussion led to code, which led to bug reports.

Mailing lists are not a bug tracker. Gforge is, but it does too much. They decided to use SVN + Trac for Ruport. Allows a casual user to subscribe to what’s happening and to contribute. It’s amazing how much good tools can affect a community and allow contributions. But what about RubyForge? Virtualy all interesting Ruby software is on RubyForge, which gives your software exposure. They used svk and cron to mirror their repository over to RubyForge.

Friction affects contribution. Explain how people can contribute bug reports, patches, etc.

Every patch is valuable. Not every patch is suitable, but the relationships you can establish are important. Patches (suitable or not) give you an idea of what people are trying to get out of your software. Still want relevant patches. What is relevant? It took them 15 months to figure it out – way to long. They came up with a roadmap and a scope definition. They didn’t just scope features, but the design as well.

We have to be careful not to accept code/features that are only for ourselves – that will never be used by anyone else. They moved some things out of the core into plugins.

Unique project identity is good. Half-way implemented features are bad for users.

Recommends the book “Producing OSS Software” by Karl Fogel, which is available for free online.

Ruport 1.0 will be released on May 15, 2007.

Sometimes, developers have to “hide” from the large community surrounding their project, simply because they don’t have time to respond to all questions and ideas. They have to make time for families, work, and for development. It’s good to get smart users involved to help answer questions.

mtnwestruby: Ruby Queues

Mountain West Ruby Conference
Ruby Queues (RQ) by Ara Howard
16 March 2007

http://www.linuxjournal.com/article/7922
Ara works for NOAA — primarily with satellite data sets. 50KLOC, all paid for by tax payer dollars. Builds medium sized 10-20 node distributed systems.

RQ helps build instant distributed linux clusters. When presenting RQ to scientists, he rarely mentions Ruby. Today, he will talk about the technical side of RQ. RQ isn’t one of the most interesting pieces of software he’s written, but he learned more than average while writing it. One of the reasons he teaches and presents is because he learns while doing it.

RQ has been used to help generate power outage maps after hurricanes hit. Why did he develop it? The lab purchased a bunch of linux machines instead of a Cray, because it was cheaper. His job was to make them work together. He tends to believe that the first link on Goggle will yield the information he needs, so he went looking for a simple distributed computing framework. The solutions he found were the wrong fit, or overly heavyweight. In their environment, the programs that act on the data follow the data, because it’s more expensive to move data than to move programs. He decided to write RQ.

Tried using MySQL for the server queue controller. However, it adds complication with setting up usernames and passwords, and getting approval for the security thereof. He decided to leverage what was already approved – traditional UNIX file permissions and NFS for shared access to data. He also couldn’t run a process as root, or have it listen on a TCP port.

Needed NFS-safe lock files.

gem install lockfile # he wrote this package

NFS lockd wasn’t very good at throughput or fairness. One node would get the lock 500 times in a row, then the next node 500 times, etc. He wrote lock-polling code with a back-off algorithm. It took a while to get it right.

Ended up using SQLite for the shared data store. “Beats the pants off pstore, fsdb, Madeline, etc.” Most of these un-ideal solutions didn’t work well with NFS-heavily-cached data. They would run for 2 weeks, then get corruption. In contrast, SQLite is very robust over NFS — it detects and recovers from corruption.

gem install slave

How does a normal user install daemon processes? RQ cron.

nrtq query – input and output in YAML. He didn’t tell scientists that it was YAML. They didn’t need to know. Using YAML meant he didn’t have to write his own parser, and it’s human readable.
RQ is being used on a single host to queue jobs. There’s a Rails plugin.

Lessons learned:

  • NFS is quirky, but it’s the defacto standard. We get to live with it and work around the quirks.
  • LVM kills performance.
  • Roll your own NFS locking. The standard one is insufficient.
  • Use NFS hard mounts. Puts nodes to sleep until NFS server comes back online.
  • RQ does not move data around. They use vsftpd to allow data to be moved.
  • Constraints are good. Turns out many people and organizations operate under the same constraints.

Linux C++ IDE; NX

Lately, I’ve been developing on Linux. When developing remotely, I can get
along with a shell and vim, with VNC, or with remote-X. However, none of these
options are as fast or as nice as using NX. Here are the instructions to install and use
the NX server and client on Fedora Core 5 and 6:
http://fedoranews.org/contributors/rick_stout/freenx/

What’s the best C++ IDE in Linux? Out of the three IDEs I have evaluated, I’d
recommend either SlickEdit or NetBeans C++. I haven’t tried Emacs. I’ve installed KDevelop, but haven’t tried it much yet.

Eclipse CDT

  Overall: Immature and over complicated. I prefer vim with a ctags file, jedit, nedit, or gedit.
  Code Completion: Broken -- rarely works
  Search by Symbol or Reference: Broken
  Debugger support: Yes. Ugly user interface
  Custom build (bjam): Yes
  Project support: Yes. Automatically adds new files, removes old files from workspace
  Refactoring support: No
  Subversion support: Yes, with plugin

SlickEdit

  Overall: Excellent IDE
  Code Completion: The best of the bunch, but not as good as Visual Studio
  Search by Symbol or Reference: Excellent
  Debugger support: Yes. Difficult to setup
  Custom build (bjam): Yes
  Project support: Yes
  Refactoring support: Good
  Subversion support: Yes
  Notes: Has fairly good key emulation support for Visual Studio, Vim, Brief, Emacs, etc.
  Language Support: Tagging and syntax highlighting for C++, Java, Perl, Python and Ruby (to name just a few).

NetBeans C++

  Overall: Better than Eclipse CDT
  Code Completion: Yes
  Search by Symbol or Reference: Yes
  Debugger support: Yes, but haven't yet figured out how to set breakpoints.
  Custom build (bjam): Yes
  Project support: Not yet evaluated
  Refactoring support: No
  Subversion support: Yes, with plugin or with NetBeans beta 6.0.

KDevelop

  Overall: Not yet evaluated
  Code Completion: Yes
  Search by Symbol or Reference: Symbol - Yes (using ctags); Reference - Unknown.
  Debugger support: Yes
  Custom build (bjam): Most likely
  Project support: Yes
  Refactoring support: Unknown
  Subversion support: Yes

None of these tools are as good at code completion as Microsoft Visual Studio 2005.