What is the overhead in splitting a for-loop into multiple for-loops, if the total work inside is the same?

1 Answer

For example, splitting the following loop into three separate loops, one per statement, results in almost a 2x slow-down (full test code at the bottom):

for (int c = 0; c < size; c++){
    data[c] *= 10;
    data[c] += 7;
    data[c] &= 15;
}

This is almost stating the obvious: the split version pays the loop overhead three times instead of once and makes three passes over the entire array instead of one.
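For reference, the split form being compared is the same three statements, each in its own loop, as in the test code at the bottom. A minimal sketch of both forms follows; the function names `fused` and `split` and the use of `std::vector` are illustrative choices, not part of the original test code.

```cpp
#include <cstddef>
#include <vector>

// One pass: all three operations are applied to each element in turn.
void fused(std::vector<int>& data) {
    for (std::size_t c = 0; c < data.size(); c++) {
        data[c] *= 10;
        data[c] += 7;
        data[c] &= 15;
    }
}

// Three passes: each operation gets its own loop over the whole array.
void split(std::vector<int>& data) {
    for (std::size_t c = 0; c < data.size(); c++) data[c] *= 10;
    for (std::size_t c = 0; c < data.size(); c++) data[c] += 7;
    for (std::size_t c = 0; c < data.size(); c++) data[c] &= 15;
}
```

Both functions leave the array with the same contents; only the loop overhead and the number of passes over memory differ.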

On the other hand, take a look at this question, "Why is one loop so much slower than two loops?", which is about the following loop:

for(int j=0;j<n;j++){
    a1[j] += b1[j];
    c1[j] += d1[j];
}

Here the opposite is sometimes true: the combined loop can be slower than two separate loops, due to memory alignment.
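For comparison, the two-loop form of that code would look roughly like the second function below. This is only a sketch: the function names and the use of `std::vector<double>` are assumptions made for illustration, and all four arrays are assumed to have the same length.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Combined loop: touches all four arrays in every iteration.
void combined(Vec& a1, const Vec& b1, Vec& c1, const Vec& d1) {
    for (std::size_t j = 0; j < a1.size(); j++) {
        a1[j] += b1[j];
        c1[j] += d1[j];
    }
}

// Separate loops: each loop streams through only two arrays.
void separate(Vec& a1, const Vec& b1, Vec& c1, const Vec& d1) {
    for (std::size_t j = 0; j < a1.size(); j++) a1[j] += b1[j];
    for (std::size_t j = 0; j < c1.size(); j++) c1[j] += d1[j];
}
```

The results are identical; which one runs faster depends on how the arrays happen to be laid out in memory and on the processor.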

What to take from this?

Pretty much anything can happen. Neither way is always faster and it depends heavily on what's inside the loops.

As such, determining whether such an optimization will improve performance is usually a matter of trial and error. With enough experience you can make fairly confident (educated) guesses. But in general, expect anything.
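Since this comes down to measurement, a small timing helper makes the trial and error less tedious. The sketch below is one way to do it with `std::chrono::steady_clock`; the helper name `seconds_for` and its parameters are hypothetical and not part of the test code in this answer.

```cpp
#include <chrono>

// Runs `work` `repeats` times and returns the elapsed wall-clock time in seconds.
template <typename Work>
double seconds_for(Work work, int repeats) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; i++) {
        work();
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

Calling it once per loop variant, for example `seconds_for([&]{ /* one variant */ }, 1000)`, on the target machine and with the real compiler flags gives numbers that can be compared directly. Make sure the data is actually used afterwards, or an optimizing compiler may remove the work being timed.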

"Each additional for-loop would cost the equivalent of two int allocations."
You are correct that it's not that simple. In fact, it's so complicated that the numbers don't mean much. A loop iteration may take X cycles in one context but Y cycles in another, due to a multitude of factors such as out-of-order execution and data dependencies.
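As one illustration of how data dependencies can change the cost of an iteration, compare a sum kept in a single accumulator with one kept in two independent accumulators. Both perform the same additions, but in the first every addition has to wait for the previous one, while the second gives an out-of-order core two independent chains to work on. This is a sketch with hypothetical function names; whether it is actually faster depends on the compiler and the processor.

```cpp
#include <cstddef>
#include <vector>

// Single accumulator: every addition depends on the result of the previous one.
long long sum_one_chain(const std::vector<int>& v) {
    long long s = 0;
    for (std::size_t i = 0; i < v.size(); i++) s += v[i];
    return s;
}

// Two accumulators: the two chains are independent of each other.
long long sum_two_chains(const std::vector<int>& v) {
    long long s0 = 0, s1 = 0;
    std::size_t i = 0;
    for (; i + 1 < v.size(); i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < v.size()) s0 += v[i];   // leftover element when the size is odd
    return s0 + s1;
}
```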

Not only is the performance context-dependent, but it also varies across processors.

Here's the test code:

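#include <cstdlib>    // system
#include <ctime>      // clock, CLOCKS_PER_SEC
#include <iostream>
using namespace std;

// Build with -DTOGETHER to time the single combined loop;
// without it, the three split loops are timed.
int main(){

    int size = 10000;
    int *data = new int[size]();   // zero-initialized so the loops never read indeterminate values

    clock_t start = clock();

    // Repeat the work many times so the total is long enough to measure.
    for (int i = 0; i < 1000000; i++){
#ifdef TOGETHER
        // One pass over the array, three operations per element.
        for (int c = 0; c < size; c++){
            data[c] *= 10;
            data[c] += 7;
            data[c] &= 15;
        }
#else
        // Three passes over the array, one operation per pass.
        for (int c = 0; c < size; c++){
            data[c] *= 10;
        }
        for (int c = 0; c < size; c++){
            data[c] += 7;
        }
        for (int c = 0; c < size; c++){
            data[c] &= 15;
        }
#endif
    }

    clock_t end = clock();
    cout << (double)(end - start) / CLOCKS_PER_SEC << endl;

    delete[] data;
    system("pause");   // Windows-only: keeps the console window open
}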

Output (one loop): 4.08 seconds
Output (3 loops): 7.17 seconds

answer Mar 16, 2016 by Manikandan J

