GCC - loop optimizations

+2 votes
539 views

I want to use some loop optimizations. I have built GCC-4.8.1 using gmp 5.1.3, mpfr 3.1.2, mpc 1.0.1, cloog 0.18.0 and isl 0.11.1. I tried the optimization flags -floop-interchange and -floop-block, but I don't see any changes in the optimized code.

Are the Graphite loop transforms implemented in GCC-4.8.1? Should I configure and build GCC in a different way?

posted Nov 29, 2013 by Mandeep Sehgal

1 Answer

+1 vote

Are the Graphite loop transforms implemented in GCC-4.8.1?

Yes.

Should I configure and build GCC in a different way?

You didn't say how you configured it, so no one can say whether you need to do it differently.

answer Nov 30, 2013 by Sonu Jindal
I have used the following configuration:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/gcc/libexec/gcc/x86_64-unknown-linux-gnu/4.8.1/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc_4_8_1/configure --prefix=/opt/gcc --with-gmp=/opt/gmp-5.1.3 --with-mpfr=/opt/mpfr-3.1.2 --with-mpc=/opt/mpc-1.0.1 --enable-languages=c,c++ --with-cloog=/opt/cloog-0.18.0 --with-isl=/opt/isl-0.11.1
Thread model: posix
gcc version 4.8.1 (GCC)

I tried the -floop-block and -floop-interchange optimization flags on a simple program with for loops. I don't see any difference in the optimized assembly code that is generated. I have enabled the -O1 optimization level and disabled all the other flags, leaving only -floop-block and -floop-interchange enabled. Should I enable any more options for this to work?
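
When comparing the output with and without the flags, it may help to start from a loop nest that is a textbook candidate for interchange. The following is only a sketch under assumptions: the function names and the 64x64 size are made up for illustration and are not taken from the program described above. The first function reads the array column by column; the second is the hand-interchanged form that a compiler could, in principle, produce.

```c
#include <stddef.h>

#define ROWS 64   /* hypothetical sizes, chosen only for illustration */
#define COLS 64

/* Inner loop walks down a column: consecutive accesses are COLS
   elements apart in memory. This is the access pattern that loop
   interchange is meant to improve. */
long sum_column_major(int a[ROWS][COLS])
{
    size_t i, j;
    long total = 0;

    for (j = 0; j < COLS; j++)
        for (i = 0; i < ROWS; i++)
            total += a[i][j];
    return total;
}

/* Hand-interchanged version: the inner loop now walks along a row,
   so consecutive accesses are adjacent in memory. */
long sum_row_major(int a[ROWS][COLS])
{
    size_t i, j;
    long total = 0;

    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            total += a[i][j];
    return total;
}
```

If the assembly for sum_column_major is identical with and without -floop-interchange, comparing the dump files written by -fdump-tree-all for the two builds is one way to see whether any pass changed the loop at all.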
Similar Questions
0 votes

I am looking for a detailed description of the optimization flags provided by the GCC compiler.

+1 vote

My current compiler settings are

arm-none-eabi-gcc -Wall -mfpu=fpv4-sp-d16 -mfloat-abi=softfp -fshort-double -mcpu=cortex-m4 -mthumb -Qn -Os -finstrument-functions -mlong-calls -c temp.c -o temp.o
and so on for temp1, temp2, etc.

I am compiling multiple C files, and the linker settings are

ld -r temp.o -o one.o
and so on for temp1, temp2, etc.

I am linking multiple .o files. I am also using a linker script, link.txt (invoked externally as below):

ld -cref -Map map.txt -S -T link.txt -temp.o -lm -lc -lgcc
arm-none-eabi-gcc -Wall sample.o -O Firmware.abs

Now, as output I get an .abs file of 140 KB.

My questions are:
1. How can I reduce the size of the .abs file using compiler- or linker-specific options?
2. There are a number of unused global variables and blank function calls in my C files (no dead code!). Is it possible to perform some link-time optimization?
3. Please elaborate, as I am new to this. I have consulted websites and the gcc help, and tried some "lto" options, but there was no reduction in size.
4. I also tried -fdata-sections -ffunction-sections, but that increased the size of my Firmware.abs file from KB to MB (yes, MB!). I wonder why.
I am using the CodeSourcery ARM toolchain (evaluation version), which has gcc version 4.5.2.

+5 votes

I have C code like this:

int foo(void)
{ 
 int phase;
 . . .
 phase = 1;
 phase = 2;
 phase = 3;
 . . .
}

With -O0, gcc generates machine instructions for every assignment 'phase = ...'. With -O2, gcc does not generate instructions for some of the assignments. Of course, this is correct. However, is there any way to tell gcc that the 'phase' object is inspected by another thread, so that it does not remove such statements?
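
One commonly used way to keep every store is to qualify the object as volatile, since accesses to volatile objects count as observable behaviour that the compiler may not remove. The sketch below is an illustration under assumptions, not a recommendation from the thread: 'phase' is moved to file scope (a hypothetical change, so that another thread could actually reach it), and the elided parts of foo are left out.

```c
/* Hypothetical file-scope object; in the original snippet 'phase'
   is a local variable, which another thread could not see. */
volatile int phase;

int foo(void)
{
    /* Each store to a volatile object must be emitted, even when
       optimizing, so none of these assignments is dropped. */
    phase = 1;
    phase = 2;
    phase = 3;
    return phase;
}
```

Note that volatile only keeps the accesses in the generated code; it does not by itself guarantee atomicity or ordering between threads. Where that matters, C11 atomics or GCC's __atomic builtins are the usual tools.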

0 votes

I've noticed a difference between gcc and llvm behaviour with the following code:

$ cat test.c

int main()
{
 for(int i = 0;; ({break;}))
 printf("Hello, world\n");
}

$ clang test.c -pedantic && ./a.out
[...]
 for(int i = 0;; ({break;}))
1 warning generated.
Hello, world

$ gcc test.c -std=gnu11 -pedantic && ./a.out
[...]
 for(int i = 0;; ({break;}))
test.c:5:21: warning: ISO C forbids braced-groups within expressions [-Wpedantic]
 for(int i = 0;; ({break;}))

So llvm treats this as a GNU extension (which it really seems to be) but compiles it, while gcc does not, even though the standard is specified as gnu11. Is this a bug?

0 votes

For the simple case below, I think the load of poly1[i] should be hoisted to the outermost loop, to avoid loading it in the innermost loop at each iteration.
But for an arm-none-eabi target like cortex-m4, gcc fails to do so. Is this a normal case or a missed optimization? Please advise.

void
PolyMul (float *poly1, unsigned int n1,
 float *poly2, unsigned int n2,
 float *polymul, unsigned int *nmul)
{
 unsigned int i, j;

 for (i = 0; i < n1; i++)
   for (j = 0; j < n2; j++)
     polymul[i+j] += poly1[i] * poly2[j];
}
...
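
One plausible explanation, offered here as an assumption rather than a confirmed diagnosis, is aliasing: polymul and poly1 are both float pointers, so the compiler may be unable to prove that the store to polymul[i+j] leaves poly1[i] unchanged. A minimal sketch of hoisting the load by hand follows. The function name PolyMulHoisted is hypothetical, the loop bounds are assumed to be i < n1 and j < n2, and the nmul parameter is dropped because the visible code does not use it.

```c
/* Hand-hoisted variant (hypothetical name): poly1[i] is read once per
   outer iteration into a local, so stores through polymul cannot force
   a reload inside the inner loop. */
void
PolyMulHoisted (const float *poly1, unsigned int n1,
                const float *poly2, unsigned int n2,
                float *polymul)
{
  unsigned int i, j;

  for (i = 0; i < n1; i++)
    {
      const float p1 = poly1[i];   /* loaded outside the inner loop */

      for (j = 0; j < n2; j++)
        polymul[i + j] += p1 * poly2[j];
    }
}
```

This version gives the same result as the original only when polymul does not overlap poly1. If that holds, qualifying the pointers with C99 restrict is another way to tell the compiler so.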