In statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. In Teradata, skew means imbalanced processing caused by uneven distribution of data across the AMPs (Access Module Processors). A highly skewed table is one where some AMPs hold many more rows than others, that is, the data is not evenly distributed. Skew can show up as data skew, CPU skew, or I/O skew.
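To put a number on "highly skewed", the sketch below computes a skew percentage from per-AMP row counts. The formula 100 * (1 - average / maximum) is a convention often seen in Teradata skew queries; it is an assumption here rather than something this article defines, and the function name and sample counts are made up for illustration.

```python
def skew_percent(rows_per_amp):
    """Skew as a percentage: 0 is perfectly even, values near 100
    mean one AMP holds almost all of the rows.

    Assumed formula (a common convention, not defined in the article):
        100 * (1 - average_rows / max_rows)
    """
    if not rows_per_amp:
        raise ValueError("need at least one AMP")
    busiest = max(rows_per_amp)
    if busiest == 0:
        return 0.0  # empty table: nothing to be skewed
    average = sum(rows_per_amp) / len(rows_per_amp)
    return 100.0 * (1.0 - average / busiest)


# Hypothetical row counts on a 4-AMP system.
even = [250, 250, 250, 250]
skewed = [700, 100, 100, 100]

print(f"even table:   {skew_percent(even):.1f}%")
print(f"skewed table: {skew_percent(skewed):.1f}%")
```

In a parallel system the slowest AMP sets the pace, so the busiest AMP in the second example has to process 700 rows while the others finish after 100.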
Shared Nothing architecture – dividing the work
The shared nothing architecture ensures that each virtual processor is responsible for the storage and retrieval of its own unique data. The data is physically stored together on the node, but because each virtual processor works only on its own portion, the work runs in parallel. This is also the basis of Teradata scalability. Each AMP owns an equal slice of the disk.
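The effect of dividing rows among AMPs can be simulated in a few lines. This is only a sketch under stated assumptions: MD5 stands in for Teradata's own row-hash algorithm, the 8-AMP system and the sample columns are hypothetical, and the helper names are invented for this example. It compares a unique primary index with a two-valued one.

```python
import hashlib
from collections import Counter

NUM_AMPS = 8  # hypothetical system size


def amp_for(pi_value, num_amps=NUM_AMPS):
    """Map a primary index value to an AMP number.

    Assumption: MD5 is a stand-in hash, NOT Teradata's row-hash algorithm.
    """
    digest = hashlib.md5(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_per_amp(pi_values, num_amps=NUM_AMPS):
    """Count how many rows land on each AMP for the given PI values."""
    counts = Counter(amp_for(v, num_amps) for v in pi_values)
    return [counts.get(amp, 0) for amp in range(num_amps)]


# A unique PI (for example a customer id): every row hashes differently.
unique_pi = range(10_000)

# A low-cardinality PI (for example a two-valued flag): only two hash values.
low_cardinality_pi = ["M" if i % 2 else "F" for i in range(10_000)]

print("unique PI:         ", rows_per_amp(unique_pi))
print("low-cardinality PI:", rows_per_amp(low_cardinality_pi))
```

With only two distinct values, at most two AMPs can ever receive rows, no matter how many AMPs the system has. The rest sit idle, which is the kind of skew a bad PI produces.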
We will now look at how to detect skew in a table that has a bad PI (primary index) and how to resolve it...