When suddenly writing high volumes of data to a MongoDB collection that has held little or no data so far, it’s important to pre-split the collection to get good write performance. Otherwise all the data is written to a single shard while we wait for the MongoDB balancer to figure things out. While it’s possible to specify the split points programmatically in advance, MongoDB has an easier way: hashed shard keys. For example, in the mongo shell:
db.adminCommand({
    shardCollection: 'test.user',
    key: {uid: 'hashed'},
    numInitialChunks: 500
})
Equivalent Python code, using PyMongo, looks approximately like this:
my_pymongo_connection["admin"].command(
    "shardcollection", "test.user",
    key={'uid': 'hashed'},
    numInitialChunks=500)
The downside is that sharding the collection this way can take a while: the call doesn’t return until it’s complete, and it reportedly blocks all other clients of the given mongos instance until it finishes.
Apparently the number of initial chunks is also limited: the MongoDB documentation says numInitialChunks must be less than 8192 per shard.
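To avoid finding out about the chunk limit only when the server rejects the command, the command document can be built and checked on the client first. The following is a minimal sketch, not a definitive implementation: the helper names `build_shard_command` and `presplit_hashed` are made up for illustration, the default limit of 8192 per shard is an assumption taken from the MongoDB documentation, and the number of shards is something the caller has to supply.

```python
def build_shard_command(namespace, field, num_initial_chunks,
                        num_shards=1, per_shard_limit=8192):
    """Build a shardCollection command document for a hashed shard key.

    per_shard_limit is an assumed value (numInitialChunks is documented
    as having to be less than 8192 per shard); adjust it if your
    MongoDB version behaves differently.
    """
    if num_initial_chunks < 1:
        raise ValueError("num_initial_chunks must be at least 1")
    if num_initial_chunks >= per_shard_limit * num_shards:
        raise ValueError(
            "num_initial_chunks must be less than %d for %d shard(s)"
            % (per_shard_limit * num_shards, num_shards))
    # The command name must be the first key; dicts keep insertion
    # order on Python 3.7+.
    return {
        "shardCollection": namespace,
        "key": {field: "hashed"},
        "numInitialChunks": num_initial_chunks,
    }


def presplit_hashed(client, namespace, field, num_initial_chunks,
                    num_shards=1):
    """Send the command through the admin database of a PyMongo client.

    As noted above, this call does not return until sharding is done.
    """
    cmd = build_shard_command(namespace, field, num_initial_chunks,
                              num_shards=num_shards)
    return client["admin"].command(cmd)
```

With a PyMongo client connected to a mongos, `presplit_hashed(my_pymongo_connection, "test.user", "uid", 500)` would send the same command as the example above.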