The following two code sequences produce the same result:
uint mod = val % 4;
uint mod1 = val & 0x3;
I know that in hardware the & operator is much simpler to implement than the % operator. Therefore I expect it to perform better than the % operator.
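To make the claimed equivalence concrete, here is a minimal sketch. It assumes C, with `uint32_t` standing in for `uint`; the helper names `mod4_percent`, `mod4_mask` and `forms_agree_below` are hypothetical and only for illustration. It checks that both forms agree for unsigned values; it says nothing about which one is faster.

```c
#include <stdbool.h>
#include <stdint.h>

/* Remainder of division by 4 using the modulo operator. */
static uint32_t mod4_percent(uint32_t val) { return val % 4u; }

/* Same remainder using a bit mask that keeps the two lowest bits. */
static uint32_t mod4_mask(uint32_t val) { return val & 0x3u; }

/* Returns true if both forms agree for every value in [0, limit). */
static bool forms_agree_below(uint32_t limit)
{
    for (uint32_t v = 0; v < limit; ++v) {
        if (mod4_percent(v) != mod4_mask(v)) {
            return false;
        }
    }
    return true;
}
```

Calling `forms_agree_below(100000u)` should return true. The equivalence relies on the value being unsigned and the divisor being a power of two. For a signed `int` in C the two forms can differ on negative values: `-1 % 4` is `-1`, while `-1 & 3` is `3` on two's complement machines.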