Google’s use of Java APIs ruled “fair use”

I’ve been following the Ars coverage of the Oracle v Google trial regarding whether Google’s use of Java APIs is “fair use”. I didn’t think Google would win, but was pleasantly surprised when the jury decided in their favor. Hurrah!

However, Google’s win doesn’t mean that companies can indiscriminately copy APIs and expect it to count as “fair use”. It seems safest to me to use APIs that fall under an open source license. That way, the code individuals and companies write can more easily be run against competing API implementations without being held hostage by the owners of the original API.

URL shorteners can compromise security

It’s useful to shorten long URLs, especially when sending them in tweets and in text messages. An LWN.net article helped me learn that they can be a security risk:

URL shorteners such as bit.ly and goo.gl perform a straightforward task: they turn long URLs into short ones, consisting of a domain name followed by a 5-, 6-, or 7-character token. This simple convenience feature turns out to have an unintended consequence. The tokens are so short that the entire set of URLs can be scanned by brute force. The actual, long URLs are thus effectively public and can be discovered by anyone with a little patience and a few machines at her disposal.

Around 7% of the OneDrive folders discovered in this fashion allow writing. This means that anyone who randomly scans bit.ly URLs will find thousands of unlocked OneDrive folders and can modify existing files in them or upload arbitrary content

— VITALY SHMATIKOV
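To get a feel for why short tokens can be scanned, here is a rough back-of-the-envelope sketch in Python. The 62-character alphabet (letters and digits) and the request rate are assumptions for illustration, not figures from the article.

```python
import string

# Assumed alphabet: a-z, A-Z, 0-9 (62 characters). Real shorteners may differ.
ALPHABET = string.ascii_letters + string.digits


def token_space(length, alphabet_size=len(ALPHABET)):
    """Number of distinct tokens of the given length."""
    return alphabet_size ** length


def days_to_scan(length, requests_per_second):
    """Days needed to try every token at a fixed (hypothetical) request rate."""
    return token_space(length) / requests_per_second / 86_400


if __name__ == "__main__":
    rate = 1_000  # hypothetical aggregate requests per second across a few machines
    for n in (5, 6, 7):
        print(f"{n} chars: {token_space(n):,} tokens, "
              f"about {days_to_scan(n, rate):,.1f} days at {rate} req/s")
```

Under these assumptions there are roughly 916 million 5-character tokens, which is small enough to enumerate, and a scanner does not need to cover the whole space to find interesting URLs.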

KeyCzar: Encryption made easy

Encrypting sensitive data-at-rest (e.g. in a database) is a good idea, but how does one manage the encryption keys, and rotate keys or start using a new algorithm down the road without orphaning or migrating the old data? Use KeyCzar:

Cryptography is easy to get wrong. Developers can choose improper
cipher modes, use obsolete algorithms, compose primitives in an unsafe
manner, or fail to anticipate the need for key rotation. Keyczar
abstracts some of these details by choosing safe defaults,
automatically tagging outputs with key version information, and
providing a simple programming interface.

Keyczar is designed to be open, extensible, and cross-platform
compatible. It is not intended to replace existing cryptographic
libraries like OpenSSL, PyCrypto, or the Java JCE, and in fact is
built on these libraries.

Or learn from what Google did with KeyCzar, and implement the same ideas (key rotation and key version info) using a more modern encryption library, like libsodium.
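The two ideas mentioned above, key rotation and tagging outputs with a key version, can be sketched with nothing but the Python standard library. This is a toy illustration and not Keyczar's actual format or API: the `VersionedSigner` class and its one-byte version prefix are hypothetical, and it uses HMAC signing rather than encryption because the standard library has no authenticated cipher. With libsodium one would presumably wrap its secret-key encryption in the same way.

```python
import hashlib
import hmac
import secrets


class VersionedSigner:
    """Toy keyring: each tag is prefixed with the version of the key that made it."""

    def __init__(self):
        self._keys = {}    # version number -> secret key bytes
        self._primary = 0  # version used for new signatures
        self.rotate()      # start with one key

    def rotate(self):
        """Add a fresh key and make it primary; older keys remain for verification."""
        self._primary += 1
        self._keys[self._primary] = secrets.token_bytes(32)
        return self._primary

    def sign(self, message: bytes) -> bytes:
        key = self._keys[self._primary]
        mac = hmac.new(key, message, hashlib.sha256).digest()
        # One version byte followed by the MAC (so at most 255 versions in this sketch).
        return bytes([self._primary]) + mac

    def verify(self, message: bytes, tag: bytes) -> bool:
        if not tag:
            return False
        version, mac = tag[0], tag[1:]
        key = self._keys.get(version)
        if key is None:
            return False  # unknown or retired key version
        expected = hmac.new(key, message, hashlib.sha256).digest()
        return hmac.compare_digest(mac, expected)
```

Because the version travels with the output, data written under an old key can still be checked after a rotation, and it can be re-signed (or re-encrypted) lazily instead of being migrated all at once.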

RabbitMQ, memcache, and too many socket connections

What happens when you have hundreds of services connected to RabbitMQ and memcache, and those services have a bug that causes them to keep their previous socket connections open, and repeatedly reconnect to RabbitMQ and memcache?

They crash.

It occurred to me that one can prevent too many connections using iptables on the RabbitMQ and memcache machines. Here’s how:

http://www.cyberciti.biz/faq/iptables-connection-limits-howto/

The corollary is that setting the per-ip connection limit too low can also cause problems.
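The idea of a per-IP connection cap can be illustrated with a small Python sketch. This is not how iptables implements its limits; the `PerIpLimiter` class and its method names are hypothetical and only model the bookkeeping.

```python
from collections import Counter


class PerIpLimiter:
    """Track open connections per client IP and refuse new ones above a cap."""

    def __init__(self, max_per_ip):
        self.max_per_ip = max_per_ip
        self._open = Counter()  # ip -> number of currently open connections

    def try_open(self, ip):
        """Return True and count the connection, or False if the IP is at its cap."""
        if self._open[ip] >= self.max_per_ip:
            return False
        self._open[ip] += 1
        return True

    def close(self, ip):
        """Record that one connection from this IP was closed."""
        if self._open[ip] > 0:
            self._open[ip] -= 1


if __name__ == "__main__":
    limiter = PerIpLimiter(max_per_ip=2)
    # A buggy client that never closes its old sockets is cut off at the cap.
    print([limiter.try_open("10.0.0.5") for _ in range(3)])
    # Other clients are unaffected.
    print(limiter.try_open("10.0.0.6"))
```

The same sketch shows the corollary: a well-behaved client that legitimately needs three connections would also be refused with a cap of two.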

I’d guess that servers that are more commonly public-facing, like NGINX and Apache, don’t have this problem of crashing. Hopefully, they degrade gracefully, and refuse additional connections while continuing to service the connections they already have open.

LWN.net: “Changes in the TLS certificate ecosystem”

I was glad to come up to speed with what has been happening with TLS in the last couple of years, and I highly recommend reading these articles.

  • https://lwn.net/Articles/663875/
  • https://lwn.net/Articles/664385/

I learned about HTTP Public Key Pinning, Certificate Transparency, and STARTTLS stripping, among other things.

Here’s one of many good quotes:

The core problem of the TLS certificate system is that there exist hundreds of certificate authorities. And unless extra protection measures are in place, each of those can create valid certificates for any domain. Therefore the whole system is only as strong as the weakest of all certificate authorities.

And as for embedded devices that handle encryption:

We are well aware that crypto appears to be something that needs to be field replaceable, and yet we more or less have no clue how to do that in deployed embedded hardware. Indeed, we seem to have a very poor idea in general on how to maintain the software on field deployed embedded hardware. — Perry Metzger

Working with Python, I’ve come to realize that it’s one thing to implement TLS, and another thing to verify server certificates. The Python requests library can be configured to do the right thing, but the Python SMTP library cannot. It’s still another thing to check on certificate revocation. Python doesn’t implement OCSP or CRLs, and those mechanisms are problematic anyway. It doesn’t yet implement HTTP Public Key Pinning. The state of affairs may not be much better in other programming toolboxes.

So I’d guess that machine to machine internet communication is probably more vulnerable to man in the middle attacks than consumer web browsers.
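For what it’s worth, on Python 3 the standard library does allow certificate verification for SMTP, since `smtplib.SMTP.starttls()` accepts an `ssl.SSLContext`. The sketch below assumes Python 3 (the post does not say which version was in use); the helper names, port and timeout are placeholders chosen for illustration.

```python
import smtplib
import ssl


def make_verifying_context():
    """Context that requires a valid certificate chain and a matching hostname."""
    # create_default_context() enables CERT_REQUIRED and check_hostname by default.
    return ssl.create_default_context()


def open_verified_smtp(host, port=587, timeout=10):
    """Connect and upgrade to TLS, failing if the certificate does not verify."""
    server = smtplib.SMTP(host, port, timeout=timeout)
    try:
        # starttls() raises if the server does not advertise STARTTLS, so a
        # stripped STARTTLS offer fails closed instead of falling back to plaintext.
        server.starttls(context=make_verifying_context())
        server.ehlo()  # re-identify over the encrypted channel
    except Exception:
        server.close()
        raise
    return server
```

This covers chain and hostname verification only. Revocation checking and key pinning are not addressed by this sketch.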

Unsatisfactory FreedomPop cellular experience

In September, my son started junior high, and he craved having a smartphone. His lawn-mowing money was burning a hole in his self-made duct-tape wallet. So I googled for inexpensive options. FreedomPop sounded like a great deal: free phone service (500 MB data per month), based on VoIP over cellular data. Too good to be true? Read on.

We ordered a Samsung Galaxy S4 that was supposed to take up to three weeks to be delivered. The website listed the phone as sitting in a “kitting” stage for that whole time. When the three weeks were up, there was still no phone, and no follow-up email to tell us whether our order had disappeared into a black hole.

I couldn’t find a phone number for tech support, and there was no “live chat” option on the FreedomPop website, so I turned to Google, and they came to the rescue. I called, navigated a deep option tree, and eventually reached a live person, which was no easy feat. They told me the phones were on back order, and to wait another week or two.

So I did, with a very anxious son asking me daily about the order status. I showed him how to check the status himself. When two weeks had passed, I called again, and for the second time, I was told the phone was on back order, but it should be coming in the next week or two. Two weeks later, the phone finally shipped. My son was so excited to open the box and play with his shiny new device.

Knowing that smartphones love to consume data, we blocked most of the apps from using cellular data.

Over the next two weeks, I received emails from FreedomPop saying that cellular data had run out, and that I had been charged $10 to “top-up”. That was a surprise, so I disabled that feature, using their website.

I opened the cellular data usage graph on the smartphone. It told me that the app known as “Android OS” had helpfully been chewing up the cellular data in large stair-step usage patterns. Android doesn’t allow us to disable cellular data for “Android OS”.


Apparently the high cellular data usage correlates with the download of a 263 MB OS update over cellular. I recalled that we tried installing the update a few times, and it didn’t work because FreedomPop modifies the phone so that OS updates don’t install.

So the phone is caught in an impossible cycle, and the FreedomPop experience has been anything but a good one.

I submitted an email to their support system, which told me I would have a response within 48 to 72 hours. It’s been five days, and I’ve heard nothing further. I tried calling tech support, and after waiting on the line for 25 minutes, had to hang up.

What I thought we would save in cellular costs, I’ve paid for in time and top-up charges. The promise of “free” was too good to be true.

I wish we had chosen Ting, which has phenomenal customer service.

MongoDB: Pre-splitting a sharded collection

When suddenly writing high volumes of data to a MongoDB collection that’s had little or no data previously, it’s important to pre-split the collection so that there’s good write performance; we don’t want to write all data to a single shard while waiting for the MongoDB balancer to figure things out. While it’s possible to programmatically specify the split points in advance, MongoDB has an easier way: hashed shard keys. E.g.:

db.adminCommand({shardCollection: 'test.user',
  key: {uid: 'hashed'},
  numInitialChunks: 500})

Equivalent python code looks approximately like this:

# my_pymongo_connection is a pymongo MongoClient connected to a mongos
my_pymongo_connection["admin"].command("shardCollection", "test.user",
  key={'uid': 'hashed'},
  numInitialChunks=500)

The downside is that it can take a while to shard the collection — the call doesn’t return until it’s complete, and it reportedly blocks all other clients of a given mongos instance until it finishes.

Apparently it’s not possible to specify more than 8192 initial chunks.
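For comparison, the manual alternative mentioned earlier (computing split points yourself) amounts to dividing the shard key range into equal slices. The sketch below is not MongoDB’s own code: it assumes hashed key values are signed 64-bit integers, and `even_split_points` is a hypothetical helper. Each returned value would then be passed to a split command by whatever means you prefer.

```python
def even_split_points(num_chunks, lo=-2**63, hi=2**63):
    """Return the num_chunks - 1 boundaries that cut [lo, hi) into equal slices."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    width = hi - lo
    # Integer arithmetic keeps the boundaries exact for 64-bit values.
    return [lo + (width * i) // num_chunks for i in range(1, num_chunks)]


if __name__ == "__main__":
    # Four chunks need three boundaries across the assumed 64-bit hash range.
    for point in even_split_points(4):
        print(point)
```

With a hashed shard key and numInitialChunks, MongoDB does this bookkeeping for you, which is why it is the easier route.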