Choosing a shard key

An important aspect of sharding is choosing a shard key. A shard key is a field that exists in every every shard, so the choice of a shard key is an important one. Key considerations in choosing a shard key are the following:

  • A shard key must be part of the table's primary key.  This requirement is to prevents scenarios where the data with the same primary key ends up in different nodes because the shard key changed.
  • Tables that are sharded should use the same shard key if they are joined together.  This choice leads to the best performance.
  • Choose a shard key that results in a wide variety of keys.  For example, if you have only two values for the shard key, but 128 partitions, then you will get all of the data into only two partitions and performance will be slowed.  
  • Choose a shard key so that the data is distributed well across the keys.  Do not choose a shard key where the data falls heavily into a couple of shard keys; this results in data in only a few partitions and degrades performance.

Any other recommendations or best practices that you would add?

4replies Oldest first
  • Oldest first
  • Newest first
  • Active threads
  • Popular
reply to topic
Like3 Follow
  • 3 Likes
  • 3 mths agoLast active
  • 4Replies
  • 888Views
  • 4 Following