Choosing a shard key
Choosing a shard key is an important part of sharding. A shard key is a field that exists in every shard, and its values determine how rows are spread across the shards, so the choice has a direct effect on performance. Key considerations in choosing a shard key are the following:
- A shard key must be part of the table's primary key. This requirement prevents scenarios where rows with the same primary key end up on different nodes because the shard key changed.
- Tables that are sharded should use the same shard key if they are joined together. This choice leads to the best performance.
- Choose a shard key that results in a wide variety of keys. For example, if the shard key has only two values but the table has 128 partitions, all of the data lands in only two partitions, the other 126 stay empty, and performance suffers.
- Choose a shard key so that the data is distributed well across the keys. Do not choose a shard key where most of the data falls under a couple of key values; this puts the data in only a few partitions and degrades performance.
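To make the last two points concrete, here is a minimal Python sketch that counts how rows would spread across partitions for different key choices. It assumes rows are assigned by hashing the shard key and taking the result modulo the partition count. The MD5-based hash and the function names `partition_for` and `partition_counts` are made up for illustration and are not how ThoughtSpot actually assigns rows.

```python
import hashlib
from collections import Counter


def partition_for(key, num_partitions):
    """Map a shard-key value to a partition number with a stable hash.

    Illustrative only: the real hash function used by the database is not
    documented here, so MD5 stands in as an assumption.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_counts(keys, num_partitions):
    """Count how many rows land in each partition for a list of key values."""
    return Counter(partition_for(k, num_partitions) for k in keys)


NUM_PARTITIONS = 128

# Only two distinct key values: at most two partitions ever receive data.
low_cardinality = partition_counts(["east", "west"] * 5000, NUM_PARTITIONS)

# 10,000 distinct key values: rows spread across many partitions.
high_cardinality = partition_counts(range(10000), NUM_PARTITIONS)

# Many distinct values, but 90% of the rows share one value: one hot partition.
skewed = partition_counts([0] * 9000 + list(range(1, 1001)), NUM_PARTITIONS)

print("partitions used, two key values:", len(low_cardinality))
print("partitions used, 10,000 key values:", len(high_cardinality))
print("share of rows in busiest partition, skewed keys:",
      max(skewed.values()) / 10000)
```

Running a check like this against a sample of the real column values before settling on a shard key is one cheap way to spot both problems: too few distinct keys, and too many rows under one key.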
Any other recommendations or best practices that you would add?
Bill Back - Details on how create table, create fact table, and create dimension table affect sharding strategy. There is some mention here: https://help.thoughtspot.com/02_Administration/Administrator_Guide_4.2/020/040/030
(The TQL syntax for create fact table and create dimension table does not exist in the documentation. We caught it in our DDL script, which led us to inquire. Pankaj also gave us some insight.)