Matrix multiplication is commonly written as shown in the example below:
for (i = 0; i < N; i++)
{
    for (j = 0; j < n; j++)
    {
        for (k = 0; k < n; k++)
        {
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }
    }
}
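The reference b[k][j] is not stride-1 with respect to the innermost loop index k: successive iterations touch elements one full row apart in memory, so the loop will not normally be vectorizable. If the j and k loops are interchanged, however, c[i][j] and b[k][j] become stride-1 in the innermost loop and a[i][k] becomes loop-invariant, as shown in the "Matrix Multiplication With Stride-1" example.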
Caution
Interchanging loops is not always possible: when data dependencies exist between iterations, changing the iteration order can produce different results.
for (i = 0; i < N; i++)
{
    for (k = 0; k < n; k++)
    {
        for (j = 0; j < n; j++)
        {
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }
    }
}