An abundance of databases

We live in an age of an abundance of database choices. The databases have trade-offs in terms of work to implement, rigidity vs flexibility, write performance, read performance, query performance, maintenance, support, robustness, security, and so on. It seems that many databases can be tuned to meet requirements, but it may require hiring an expert to get the most out of it, or to tell you that a given database may not be the right fit.

I recently learned of the existence of MemSQL, AeroSpike, Cockroach DB, Clustrix, VoltDB and NuoDB. Several of these came to my attention from reading an InfoWorld article, although what I cover here doesn’t exctly overlap.

MemSQL

  • Commercial only, with gratis community edition.
  • It supports a json column type, and can index, query and update data within the json.
  • Keen insights from their team of engineers. See http://blog.memsql.com/cache-is-the-new-ram/. “Throughput and latency always have the last laugh.” I.e. locality still matters.
  • “As various NoSQL databases matured, a curious thing happened to their APIs: they started looking more like SQL. This is because SQL is a pretty direct implementation of relational set theory, and math is hard to fool.”
  • “We realized that caching cost at least as much RAM as the working set (otherwise it was ineffective), plus the nearly unbearable headache of cache consistency.”
  • http://blog.memsql.com/bpf-linux-performance/

AeroSpike

  • AGPL NoSQL db, led by a former CEO of Salesforce.com. http://stackoverflow.com/questions/25208914
  • key-value store, although since it supports nested key-values, it may be somewhat equivalent to MongoDB’s schemaless json doc storage.
  • Scaleable. Far better than Redis when it’s time to scale.
  • Aerospike is reportedly faster than MongoDB (in 2014, that is)
  • Needs fewer nodes than MongoDB, and so it reportedly costs less.

Cockroach DB

  • APL 2.0
  • survivable
  • scaleable (distributed)
  • SQL
  • beta software
  • Higher write latencies. Built on RocksDB from Facebook.

Clustrix

  • Proprietary drop-in replacement for MySQL.
  • 540 million transactions per minute.
  • Higher write throughput than MongoDB (reportedly).
  • Not a document store. It’s an RDBMS

VoltDB

NuoDB

  • ACID complaint, SQL RDBMS
  • Memory centric
  • Scaleable, without sharding. (how does that work?)
  • More than 1 million transactions per second
  • Flexible schema
  • Java stored procedures
  • Despite claims that it “automatically adjusts for optimal workload”, my guess is that one must monitor and tune it. Computer algorithms are smart… until they’re not.

Faster WiFi: 802.11ac and 802.11ad

I must be out of touch with WiFi networking. The last thing I remember is when 802.11n came out and supported up to 72 Mbps network speeds. Last year, I think we finally jettisoned our last computing device that was 802.11g. Oh wait, I forgot about my home security system. It still uses an 802.11g 2 GHz network — the same frequency that commonly gets interference from microwave ovens, old bluetooth devices, cordless phones, baby monitors, and more.

While I’ve been “out of touch”, 802.11ac has become available. It operates at 5 Ghz and in most home networks, will run no faster than 800 Mbps. The iPhone 6 and LG Nexus 6 support 802.11ac. The 5 GHz frequency range gets less interference than the 2 GHz range.

In the next few years, WiGig (aka 802.11ad) will become available. It operates in the 60 GHz range, and supports streaming 4K video, and can offer throughput of up to 7 Gbit/s.

I look forward to faster WiFi. In the meantime, when I have the need for speed, I use a wired ethernet connection.

UPnP, SSDP, mDNS, LLMNR, etc. on the home network

Sometime in the distant past, I was aware of Universal Plug and Play (UPnP), but I didn’t know much about it. It’s a technology that allows devices in the home to talk to each other without prior configuration — it allows auto-discovery and configuration of printers and media servers, among other things.

The auto-discovery happens via SSDP (Simple Service Discovery Protocol). A device joins a network and announces “I’m here!”, and then other device can choose to respond. Even if the device gets a different IP address, it can still be uniquely identified by its unique identifier (UUID).

Here’s more information about UPnP and related protocols that run on the home network:

https://en.wikipedia.org/wiki/Zero-configuration_networking

UPnP:

UPnP protocol (no authentication):

  • Discovery (SSDP)
  • Description – HTTPU and HTTPMU
  • Control
  • Event notification
  • Presentation

UPnP has well defined device profiles for:

  • Audio & Video — DLNA, and
  • Routers:

    • Internet Gateway Device Protocol

    • Retrieve external IP addr

    • Enumerate port mappings
    • Add/Remove port mappings & port forwarding: firewall-hole-punching
  • Devices Profile for Web Services (DPWS)

Other protocols that help on the home network:

  • LLMNR: Link-local Multicast Name Resolution — implemented by Microsoft in Windows.
  • mDNS (multicast DNS) runs on port 5353. Uses .local hostnames.
  • DNS-SD: DNS service discovery. Can use DNS or mDNS.

Apple’s Bonjour uses mDNS and DNS-SD. Linux’s Avahi uses IPv4LL, mDNS, and DNS-SD. Linux’s systemd has “systemd-resolve”, a command-linetool to resolve hostnames on a network via DNS, mDNS, and LMMNR.

Runtime debugging tools for Linux

Here’s a useful presentation on Linux debugging tools — tools that don’t require source code, additional prints or logging.

http://jvns.ca/blog/2016/09/17/strange-loop-talk/

  • strace has a new flag that I didn’t know about: -y, which prints the paths that are associated with file descriptors.

  • opensnoop lets you see the details of open() calls across the entire system, or for an individual process, or for paths containing certain characters, or it can print the file paths that couldn’t be opened.

  • pgrep shows the stack trace of a running process, which can be useful to get an idea of what a program spends most of its time doing.

  • dstat shows system resource stats. It is a replacement for vmstat, iostat and ifstat.

  • htop — a more beautiful ‘top’, and easier to use. I still mostly use ‘top’ because it is installed by default. Other great tools I use include ‘powertop’ and ‘iotop’.

  • ngrep — an alternative to tcpdump, but allows the use of regexes to match plain-text data in packets.

  • tcpdump — useful when troubleshooting network connections between servers.

  • wireshark — a more UI-friendly tool than tcpdump, with dissectors for most protocols

Python attrs library; stackoverflow documentation

Article: The One Python Library Everyone Needs: attrs

Some people are excited about eventually being able to program in Python 3 everywhere. What I’m looking forward to is being able to program in Python-with-attrs everywhere. It exerts a subtle, but positive, design influence in all the codebases I’ve see it used in.

Or, for those who want more power (an complexity) than the attrs module, there’s macropy and it’s case-classes.


Stackoverflow has introduced a new tech documentation tool that focuses on providing examples, rather then merely sparsely documenting an API. The one on Python string formatting is quite useful.

Chrome Remote Desktop

I needed to help a friend on a remote computer recently. A coworker told me about Chrome Remote Desktop, which works on any computer that has a Chrome browser, including Linux, Mac, Windows, iPhone and Android.

Chrome Remote Desktop is an easy-to-install plugin for Chrome, and is gratis (no cost). It worked quite well, and I’m happy to recommend it.

Alternatives include copilot.com, which is free on weekends. Lifehacker has a list of solutions as well.

Idioms facilitate communication

No matter what you think of a computer language, you ought to respect its idioms for the same reason one has to know idioms in a human language—they facilitate communication, which is the true purpose of all languages, programming or otherwise.

George V. Neville-Neil

George also explains that “a single cache miss is more expensive than many instructions, so optimizing away a few instructions is not really going to win your software any speed tests”.

HTML Subresource Integrity

LWN covers the new W3C spec for HTML subresource integrity (SRI):

SRI is designed to combat injection attacks that come through third-party content. The originating site can include cryptographic hashes of third-party script and image files, enabling the user’s browser to hash the corresponding files it receives from the third-party servers and verify that the hashes match.

Most browsers already support SRI, including Firefox, Chrome and Opera.