MongoDB: Pre-splitting a sharded collection

When you suddenly start writing high volumes of data to a MongoDB collection that has held little or no data, it’s important to pre-split the collection to get good write performance; otherwise all the data lands on a single shard while the MongoDB balancer figures things out. While it’s possible to specify the split points programmatically in advance, MongoDB has an easier way: hashed shard keys. E.g.:

// Shard test.user on a hashed uid key and create 500 chunks up front
db.adminCommand({shardCollection: 'test.user',
  key: {uid: 'hashed'},
  numInitialChunks: 500})

Equivalent Python code (using pymongo) looks approximately like this:

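from pymongo import MongoClient
my_pymongo_connection = MongoClient("mongodb://localhost:27017")  # assumed mongos address
my_pymongo_connection["admin"].command("shardCollection", "test.user",
  key={'uid': 'hashed'},
  numInitialChunks=500)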

The downside is that sharding the collection this way can take a while: the call doesn’t return until it’s complete, and it reportedly blocks all other clients of a given mongos instance until it finishes.
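For comparison, the manual route mentioned at the top (choosing split points yourself) might look roughly like the sketch below. It assumes a ranged shard key on an integer uid whose range is known in advance, which is not the hashed setup shown above. The helper names even_split_points and presplit are made up for illustration, and the balancer (or explicit moveChunk calls) would still have to spread the resulting empty chunks across the shards.

```python
def even_split_points(low, high, num_chunks):
    """Return the num_chunks - 1 boundaries that cut [low, high) into equal ranges."""
    if num_chunks < 1 or high <= low:
        raise ValueError("need num_chunks >= 1 and high > low")
    span = high - low
    # Integer arithmetic keeps the boundaries exact for integer keys.
    return [low + span * i // num_chunks for i in range(1, num_chunks)]


def presplit(client, namespace, field, points):
    """Issue one `split` admin command per boundary (needs a live mongos).

    `client` is assumed to be a pymongo MongoClient connected to a mongos,
    and `namespace` a collection already sharded on {field: 1}.
    """
    for point in points:
        client.admin.command("split", namespace, middle={field: point})


# Hypothetical uid range of 0..1,000,000 cut into 4 chunks:
# even_split_points(0, 1_000_000, 4) gives [250000, 500000, 750000]
```

This is noticeably more work than passing numInitialChunks, which is why the hashed-key approach is the easier one when the key suits it.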

Apparently there’s also a cap on numInitialChunks: the MongoDB documentation says it must be less than 8192 per shard.
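Since an oversized chunk count is only rejected once the command reaches the server, it can be handy to check it client-side first. Here is a minimal sketch: build_shard_command is a made-up helper name, and the limit is deliberately left as a parameter because the exact figure may depend on your MongoDB version and shard count.

```python
def build_shard_command(namespace, field, num_chunks, max_chunks):
    """Build a shardCollection command document for a hashed key.

    Raises ValueError if num_chunks falls outside 1..max_chunks, where
    max_chunks is whatever limit the caller believes the server enforces.
    """
    if not 1 <= num_chunks <= max_chunks:
        raise ValueError(
            f"numInitialChunks must be between 1 and {max_chunks}, got {num_chunks}"
        )
    # The command name has to be the first key; plain dicts keep
    # insertion order in Python 3.7+.
    return {
        "shardCollection": namespace,
        "key": {field: "hashed"},
        "numInitialChunks": num_chunks,
    }
```

The resulting dict can be passed straight to pymongo, e.g. my_pymongo_connection["admin"].command(build_shard_command("test.user", "uid", 500, max_chunks=your_limit)), with your_limit set to the cap that applies to your cluster.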