GCC optimization

This guide provides an introduction to optimizing compiled code using safe, sane CFLAGS and CXXFLAGS. It also describes the theory behind optimization in general.

Introduction

What are CFLAGS and CXXFLAGS?

CFLAGS and CXXFLAGS are among the environment variables conventionally used to specify compiler options to a build system when compiling C and C++ code. While these variables are not standardized, their use is essentially ubiquitous and any correctly written build should understand these for passing extra or custom options when it invokes the compiler. See the GNU make info page for a list of some of the commonly used variables in this category.

Because the majority of the packages that make up a Gentoo system are written in C or C++, these are two variables an administrator will generally want to set correctly, as they greatly influence the way much of the system is built.

They can be used to decrease the amount of debug messages for a program, increase warning levels and, of course, optimize the code produced. The GCC manual maintains a complete list of the available options and their purposes.

How are they used?

Normally, CFLAGS and CXXFLAGS would be set in the environment when invoking a configure script or with makefiles generated by the automake program. In Gentoo-based systems, set the CFLAGS and CXXFLAGS variables in /etc/portage/make.conf. Variables set in this file will be exported to the environment of programs invoked by portage such that all packages will be compiled using these options as a base.

CODE Setting CFLAGS in /etc/portage/make.conf
CFLAGS="-march=athlon64 -O2 -pipe"
CXXFLAGS="${CFLAGS}"
Important
While it is possible to spread the USE flags over multiple lines, doing the same with CFLAGS will cause problems with programs such as cmake. Make sure the CFLAGS declaration is on a single line, with as little whitespace as possible, to avoid these problems. See bug #500034 as an example.

As seen in the example above, the CXXFLAGS variable is set to use all the options present in CFLAGS. Most systems should be configured this way. Additional options for CXXFLAGS are less common and do not apply generally enough to be worth setting globally.
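Because make.conf uses shell-like variable syntax, the expansion in the example can be tried in an ordinary shell. The following is a minimal sketch that reuses the example values; it does not read the real /etc/portage/make.conf, and the single-line check at the end is only a rough illustration of the advice given in the Important box.

```shell
# Same assignments as in the make.conf example (illustrative values only).
CFLAGS="-march=athlon64 -O2 -pipe"
CXXFLAGS="${CFLAGS}"

# CXXFLAGS now carries every option that was set in CFLAGS.
echo "CFLAGS=${CFLAGS}"
echo "CXXFLAGS=${CXXFLAGS}"

# Rough check that the value contains no embedded line break.
case "${CFLAGS}" in
    *"
"*) echo "CFLAGS spans several lines" ;;
    *)  echo "CFLAGS is on a single line" ;;
esac
```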

Tip
The Safe CFLAGS article may help beginners start optimizing their systems.

Misconceptions

While compiler optimizations enabled by various CFLAGS can be an effective means of producing smaller and/or faster binaries, they can also impair the function of the code, bloat its size, slow down its execution time, or simply cause a build failure. The point of diminishing performance returns is reached rather quickly when dealing with CFLAGS. Don't set them arbitrarily.

Remember that the global CFLAGS configured in /etc/portage/make.conf are applied to every package on the system, so administrators typically set only general, widely applicable options. Individual packages then modify these options, either in the ebuild or in the build system itself, to produce the final set of flags used during compilation.

Ready?

Being aware of the risks involved, take a look at some sane, safe optimizations. These help keep relations with developers on good terms the next time a problem is reported on Bugzilla. (Developers will usually ask for a package to be recompiled with minimal CFLAGS to see if the problem persists. Remember that aggressive flags can ruin code!)

Optimizing

The basics

The goal behind CFLAGS and CXXFLAGS is to create code tailor-made to the system; it should function perfectly while being as lean and fast as possible. Sometimes these conditions are mutually exclusive, so this guide sticks to combinations known to work well. Ideally, they are the best available for any CPU architecture. For informational purposes, aggressive flags are covered later. Not every option listed in the GCC manual (there are hundreds) will be discussed; only the most basic and common ones are reviewed.

Note
When unsure what a flag does, refer to the relevant chapter of the GCC manual. If the manual is not clear enough, try a search engine or check the GCC mailing lists.

-march

The first and most important option is -march. It tells the compiler what code it should produce for the system's processor architecture (or arch); it tells GCC that it should produce code for a certain kind of CPU. Different CPUs have different capabilities, support different instruction sets, and have different ways of executing code. The -march flag instructs the compiler to produce code specific to the CPU, taking into account all of its capabilities, features, instruction sets, quirks, and so on, provided that the source code is prepared to use them. For instance, to benefit from AVX instructions, the source code needs to be adapted to support them.

-march= is an ISA selection option; it tells the compiler that it may use the instructions from the ISA. On an Intel/AMD64 platform with -march=native and an optimization level of -O2 or lower, the code will likely end up using AVX instructions, but with the shorter SSE XMM registers. To take full advantage of AVX YMM registers, the -ftree-vectorize, -O3 or -Ofast options should be used as well[1].

-ftree-vectorize is an optimization option (default at -O3 and -Ofast), which attempts to vectorize loops using the selected ISA if possible. The reason it isn't enabled at -O2 is that it doesn't always improve code, it can make code slower as well, and usually makes the code larger; it really depends on the loop etc.

Even though the CHOST variable in /etc/portage/make.conf specifies the general architecture used, -march should still be used so that programs can be optimized for the system's specific processor. x86 and x86-64 CPUs (among others) should make use of the -march flag.

What kind of CPU does the system have? To find out, run the following command:

user $cat /proc/cpuinfo

or even install app-portage/cpuid2cpuflags and add the available CPU-specific options to the make.conf file, which the tool does through e.g. the CPU_FLAGS_X86 variable:

user $cpuid2cpuflags
CPU_FLAGS_X86: aes avx avx2 f16c fma3 mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3
root #echo "CPU_FLAGS_X86='aes avx avx2 f16c fma3 mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3'" >> /etc/portage/make.conf

To get more details, including the march and mtune values, two commands can be used:

  • The first command tells the compiler not to do any linking (-c), and instead of interpreting the --help option for clarifying command line options, it now shows if certain options are enabled or disabled (-Q). In this case, the options shown are those enabled for the selected target:
user $gcc -c -Q -march=native --help=target
  • The second command will show the compiler directives for building the header file, but without actually performing the steps and instead showing them on the screen (-###). The final output line is the command that holds all the optimization options and architecture selection:
user $gcc -### -march=native /usr/include/stdlib.h
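The full --help=target listing is long. As a rough sketch, the output of the first command can be narrowed down to the -march= and -mtune= rows. Here filter_arch_rows is a hypothetical helper name, and the layout of GCC's output (one option per row, starting with optional whitespace) is an assumption rather than a guarantee.

```shell
# Hypothetical helper: keep only the -march= and -mtune= rows read from stdin.
filter_arch_rows() {
    grep -E '^[[:space:]]*-m(arch|tune)='
}

if command -v gcc >/dev/null 2>&1; then
    # Errors are discarded because -march=native is not accepted on every target.
    gcc -c -Q -march=native --help=target 2>/dev/null | filter_arch_rows \
        || echo "no -march/-mtune rows found"
else
    echo "gcc is not installed"
fi
```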

Now, let's see -march in action. This example is for an older Pentium III chip:

FILE /etc/portage/make.conf Pentium III example
CFLAGS="-march=pentium3"
CXXFLAGS="${CFLAGS}"

Here is another one for a 64-bit AMD CPU:

FILE /etc/portage/make.conf AMD64 example
CFLAGS="-march=athlon64"
CXXFLAGS="${CFLAGS}"

If the type of CPU is still in doubt, the -march=native option can be used. When this option is used, GCC will attempt to detect the processor and automatically set appropriate flags for it. However, -march=native should not be used when the intent is to compile packages for a different CPU!

Warning
Do NOT use -march=native or -mtune=native in the CFLAGS or CXXFLAGS variables of make.conf when compiling with distcc.

If compiling packages on one computer in order to run them on a different computer (such as when using a fast computer to build for an older, slower machine), then do not use -march=native. "Native" means that the code produced will run only on that type of CPU. The applications built with -march=native on an AMD Athlon 64 CPU will not be able to run on an old VIA C3 CPU.

Also available are the -mtune and -mcpu flags. These flags are normally only used when there is no -march option available; certain processor architectures may require -mtune or even -mcpu. Unfortunately, GCC's behavior is not very consistent in how each flag behaves from one architecture to the next.

On x86 and x86-64 CPUs, -march will generate code specifically for that CPU using its available instruction sets and the correct ABI; it will have no backwards compatibility for older/different CPUs. Consider using -mtune when generating code for older CPUs such as i386 and i486. -mtune produces more generic code than -march; though it will tune code for a certain CPU, it does not take into account available instruction sets and ABI. Do not use -mcpu on x86 or x86-64 systems, as it is deprecated for those arches.

Only non-x86/x86-64 CPUs (such as SPARC, Alpha, and PowerPC) may require -mtune or -mcpu instead of -march. On these architectures, -mtune / -mcpu will sometimes behave just like -march (on x86/x86-64) but with a different flag name. Again, GCC's behavior and flag naming is not consistent across architectures, so be sure to check the GCC manual to determine which one should be used.

Note
For more suggested -march / -mtune / -mcpu settings, please read chapter 5 of the appropriate Gentoo Installation Handbook for the arch. Also, read the GCC manual's list of architecture-specific options, as well as more detailed explanations about the differences between -march, -mcpu, and -mtune.

-O

Note
To print all packages that were built with specified CFLAGS/CXXFLAGS it's possible to use the following command: grep Ofast /var/db/pkg/*/*/CFLAGS

Next up is the -O flag. This flag controls the overall level of optimization. Raising the level makes compilation take more time and use much more memory, especially at the higher levels of optimization.

There are seven -O settings: -O0, -O1, -O2, -O3, -Os, -Og, and -Ofast. Only use one of them in /etc/portage/make.conf.

With the exception of -O0, the -O settings each activate several additional flags, so be sure to read the GCC manual's chapter on optimization options to learn which flags are activated at each -O level, as well as some explanations as to what they do.
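GCC can also report these flags itself: with -Q, the --help=optimizers option lists each optimizer flag along with its state at the given -O level. The sketch below compares two levels in the way the GCC manual suggests; diff_opt_levels is a hypothetical helper name, and the listing naturally varies between GCC versions.

```shell
# Hypothetical helper: list optimizer flags whose state differs between two -O levels.
diff_opt_levels() {
    tmpdir=$(mktemp -d) || return 1
    gcc -c -Q "$1" --help=optimizers > "$tmpdir/a.txt" 2>/dev/null
    gcc -c -Q "$2" --help=optimizers > "$tmpdir/b.txt" 2>/dev/null
    # diff exits non-zero when the listings differ, which is expected here.
    diff "$tmpdir/a.txt" "$tmpdir/b.txt" | grep enabled || true
    rm -rf "$tmpdir"
}

if command -v gcc >/dev/null 2>&1; then
    diff_opt_levels -O2 -O3
else
    echo "gcc is not installed"
fi
```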

Let us examine each optimization level:

  • -O0: This level (that is the letter "O" followed by a zero) turns off optimization entirely and is the default if no -O level is specified in CFLAGS or CXXFLAGS. This reduces compilation time and can improve debugging info, but some applications will not work properly without optimization enabled. This option is not recommended except for debugging purposes.
  • -O1: the most basic optimization level. The compiler will try to produce faster, smaller code without taking much compilation time. It is basic, but it should get the job done all the time.
  • -O2: A step up from -O1. The recommended level of optimization unless the system has special needs. -O2 will activate a few more flags in addition to the ones activated by -O1. With -O2, the compiler will attempt to increase code performance without compromising on size, and without taking too much compilation time. SSE or AVX may be utilized at this level but no YMM registers will be used unless -ftree-vectorize is also enabled.
  • -O3: the highest level of optimization possible. It enables optimizations that are expensive in terms of compile time and memory usage. Compiling with -O3 is not a guaranteed way to improve performance, and in fact, in many cases, can slow down a system due to larger binaries and increased memory usage. -O3 is also known to break several packages. Using -O3 is not recommended. However, it also enables -ftree-vectorize so that loops in the code get vectorized and will use AVX YMM registers.
  • -Os: optimizes code for size. It activates all -O2 options that do not increase the size of the generated code. It can be useful for machines that have extremely limited disk storage space and/or CPUs with small cache sizes.
  • -Og: In GCC 4.8, a new general optimization level, -Og, was introduced. It addresses the need for fast compilation and a better debugging experience while providing a reasonable level of runtime performance. The overall development experience should be better than with the -O0 optimization level. Note that -Og does not imply -g; it simply disables optimizations that may interfere with debugging.
  • -Ofast: New in GCC 4.7, consists of -O3 plus -ffast-math, -fno-protect-parens, and -fstack-arrays. This option breaks strict standards compliance, and is not recommended for use.

As previously mentioned, -O2 is the recommended optimization level. If package compilation fails while not using -O2, try rebuilding with that option. As a fallback option, try setting the CFLAGS and CXXFLAGS to a lower optimization level, such as -O1 or even -O0 -g2 -ggdb (for error reporting and checking for possible problems).

-pipe

A common flag is -pipe. This flag has no effect on the generated code, but it makes the compilation process faster. It tells the compiler to use pipes instead of temporary files during the different stages of compilation, which uses more memory. On systems with low memory, GCC might get killed. In those cases do not use this flag.

-fomit-frame-pointer

This is a very common flag designed to reduce generated code size. It is turned on at all levels of -O (except -O0) on architectures where doing so does not interfere with debugging (such as x86-64); on other architectures it may need to be added to the flags explicitly. The GCC manual does not list every architecture on which the -O option turns it on. The flag must still be enabled explicitly on x86-32 with GCC up to version 4.6, or when using -Os on x86-32 with any version of GCC. However, using -fomit-frame-pointer will make debugging hard or impossible.

In particular, it makes troubleshooting applications written in Java and compiled by gcj much harder, though Java is not the only code affected by using this flag. So while the flag can help, it also makes debugging harder; backtraces in particular will be useless. When not doing software debugging and no other debugging-related CFLAGS such as -ggdb have been used, then try using -fomit-frame-pointer.

Important
Do not combine -fomit-frame-pointer with the similar flag -momit-leaf-frame-pointer. Using the latter flag is discouraged, as -fomit-frame-pointer already does the job properly. Furthermore, -momit-leaf-frame-pointer has been shown to negatively impact code performance.

-msse, -msse2, -msse3, -mmmx, and -m3dnow

These flags enable the Streaming SIMD Extensions (SSE), SSE2, SSE3, MMX, and 3DNow! instruction sets for x86 and x86-64 architectures. These are useful primarily in multimedia, gaming, and other floating point-intensive computing tasks, though they also contain several other mathematical enhancements. These instruction sets are found in more modern CPUs.

Important
Be sure to see if the CPU supports these instruction sets by running cat /proc/cpuinfo. The output will include any supported additional instruction sets. Note that pni is just a different name for SSE3.
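As a small sketch of that check, the loop below looks for the names discussed here in the first "flags" line of /proc/cpuinfo. It assumes a Linux system with an x86 or x86-64 CPU; has_cpu_flag is a hypothetical helper, and its optional second argument (an alternative file in the same format) exists only to make the helper easy to try out.

```shell
# Hypothetical helper: succeed if flag $1 appears in the first "flags" line.
# $2 optionally names an alternative file in /proc/cpuinfo format.
has_cpu_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null | grep -w -- "$1" >/dev/null
}

# pni is the name under which /proc/cpuinfo reports SSE3.
for flag in mmx sse sse2 pni 3dnow; do
    if has_cpu_flag "$flag"; then
        echo "$flag: supported"
    else
        echo "$flag: not reported"
    fi
done
```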

Normally none of these flags need to be added to /etc/portage/make.conf, as long as the system is using the correct -march (for example, -march=nocona implies -msse3). Some notable exceptions are newer VIA and AMD64 CPUs that support instructions not implied by -march (such as SSE3). For CPUs like these additional flags will need to be enabled where appropriate after checking /proc/cpuinfo.

Note
Check the list of x86 and x86-64-specific flags to see which of these instruction sets are activated by the proper CPU type flag. If an instruction is listed, then it does not need to be separately specified; it will be turned on by using the proper -march setting.

Optimization FAQs

But I get better performance with -funroll-loops -fomg-optimize!

No, people only think they do because someone has convinced them that more flags are better. Aggressive flags will only hurt applications when used system-wide. Even the GCC manual says that using -funroll-loops and -funroll-all-loops will make code larger and run more slowly. Yet for some reason, these two flags, along with -ffast-math, -fforce-mem, -fforce-addr, and similar flags, continue to be very popular among ricers who want the biggest bragging rights.

The truth of the matter is that these are dangerously aggressive flags. Take a good look around the Gentoo Forums and Bugzilla to see what those flags really do: nothing good!

These flags are not needed globally in CFLAGS or CXXFLAGS. They will only hurt performance. They might bring on the idea of having a high-performance system running on the bleeding edge, but they don't do anything but bloat the code and get bugs marked INVALID or WONTFIX.

Dangerous flags like these are not needed. Don't use them. Stick to the basics: -march, -O, and -pipe.

What about -O levels higher than 3?

Some users boast about even better performance obtained by using -O4, -O9, and so on, but the reality is that -O levels higher than 3 have no effect. The compiler may accept CFLAGS like -O4, but it actually doesn't do anything with them. It only performs the optimizations for -O3, nothing more.

Need more proof? Examine the source code:

CODE -O source code
if (optimize >= 3)
    {
      flag_inline_functions = 1;
      flag_unswitch_loops = 1;
      flag_gcse_after_reload = 1;
      /* Allow even more virtual operators.  */
      set_param_value ("max-aliased-vops", 1000);
      set_param_value ("avg-aliased-vops", 3);
    }

As can be seen, any value higher than 3 is treated as just -O3.
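The claim can also be checked empirically on an installed compiler. The following sketch compares the optimizer flags GCC reports for two levels; same_optimizers is a hypothetical helper name, and the result depends on the GCC version in use, so no particular output is promised here.

```shell
# Hypothetical helper: report whether two -O levels enable the same optimizer flags.
same_optimizers() {
    tmpdir=$(mktemp -d) || return 2
    gcc -c -Q "$1" --help=optimizers > "$tmpdir/a.txt" 2>/dev/null
    gcc -c -Q "$2" --help=optimizers > "$tmpdir/b.txt" 2>/dev/null
    if cmp -s "$tmpdir/a.txt" "$tmpdir/b.txt"; then
        echo "$1 and $2: identical optimizer flags"
    else
        echo "$1 and $2: optimizer flags differ"
    fi
    rm -rf "$tmpdir"
}

if command -v gcc >/dev/null 2>&1; then
    same_optimizers -O3 -O9
else
    echo "gcc is not installed"
fi
```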

What about compiling outside the target machine?

Some readers might wonder if compiling outside the target machine with a strictly inferior CPU or GCC sub-architecture will result in inferior optimization results (compared to a native compilation). The answer is simple: No. Regardless of the actual hardware on which the compilation takes place and the CHOST for which GCC was built, as long as the same arguments are used (except for -march=native) and the same version of GCC is used (although minor version might be different), the resulting optimizations are strictly the same.

To exemplify, if Gentoo is installed on a machine whose GCC's CHOST is i686-pc-linux-gnu, and a Distcc server is set up on another computer whose GCC's CHOST is i486-linux-gnu, then there is no need to be afraid that the results would be less optimal because of the strictly inferior sub-architecture of the remote compiler and/or hardware. The result would be as optimized as a native build, as long as the same options are passed to both compilers (and the -march parameter doesn't get a native argument). In this particular case the target architecture needs to be specified explicitly as explained in Distcc and -march=native.

The only difference in behavior between two GCC versions built targeting different sub-architectures is the implicit default argument for the -march parameter, which is derived from the GCC's CHOST when not explicitly provided in the command line.

What about redundant flags?

Oftentimes CFLAGS and CXXFLAGS that are turned on at various -O levels are specified redundantly in /etc/portage/make.conf. Sometimes this is done out of ignorance, but it is also done to avoid flag filtering or flag replacing.

Flag filtering/replacing is done in many of the ebuilds in the Portage tree. It is usually done because packages fail to compile at certain -O levels, or when the source code is too sensitive for any additional flags to be used. The ebuild will either filter out some or all CFLAGS and CXXFLAGS, or it may replace -O with a different level.
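To make the idea concrete, here is a simplified stand-in for what filtering and replacing do to the flag string. In real ebuilds this is handled by the filter-flags and replace-flags helpers of the flag-o-matic eclass; the filter_flag and replace_flag functions below are hypothetical illustrations, not that eclass's implementation.

```shell
# Hypothetical helper: drop every occurrence of flag $1 from the remaining arguments.
filter_flag() {
    drop=$1
    shift
    kept=""
    for flag in "$@"; do
        [ "$flag" = "$drop" ] || kept="${kept:+$kept }$flag"
    done
    printf '%s\n' "$kept"
}

# Hypothetical helper: replace flag $1 with $2 in the remaining arguments.
replace_flag() {
    old=$1
    new=$2
    shift 2
    result=""
    for flag in "$@"; do
        [ "$flag" = "$old" ] && flag=$new
        result="${result:+$result }$flag"
    done
    printf '%s\n' "$result"
}

CFLAGS="-march=athlon64 -O3 -pipe -ffast-math"
# Word splitting of ${CFLAGS} is intentional: each flag becomes one argument.
CFLAGS=$(filter_flag -ffast-math ${CFLAGS})
CFLAGS=$(replace_flag -O3 -O2 ${CFLAGS})
echo "CFLAGS=${CFLAGS}"
```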

The Gentoo Developer Manual outlines where and how flag filtering/replacing works.

It is possible to circumvent -O filtering by redundantly listing the flags for a certain level, such as -O3, by doing things like:

CODE Specifying redundant CFLAGS
CFLAGS="-O3 -finline-functions -funswitch-loops"

However, this is not a smart thing to do. CFLAGS are filtered for a reason! When flags are filtered, it means that it is unsafe to build a package with those flags. Clearly, it is not safe to compile the whole system with -O3 if some of the flags turned on by that level will cause problems with certain packages. Therefore, don't try to "outsmart" the developers who maintain those packages. Trust the developers. Flag filtering and replacing is done to ensure stability of the system and application! If an ebuild specifies alternative flags, then don't try to get around it.

Building packages with unacceptable flags will most likely lead to problems. When reporting problems on Bugzilla, the flags that are used in /etc/portage/make.conf will be readily visible and developers will ask to recompile without those flags. Save the trouble of recompiling by not using redundant flags in the first place! Don't just automatically assume to be more knowledgeable than the developers.

What about LDFLAGS?

The Gentoo developers have already set basic, safe LDFLAGS in the base profiles, so they do not need to be changed.

Can I use per-package flags?

Warning
Using per-package flags complicates debugging and support. Make sure to mention the use of this feature in the bug reports together with the changes made.

Information on how to use per-package environment variables (including CFLAGS) is described in the Gentoo Handbook, "Per-Package Environment Variables".
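As a minimal sketch of that mechanism, Portage reads a file from /etc/portage/env/ for each package listed against it in /etc/portage/package.env. The script below writes both files into a scratch directory instead of the live /etc/portage; the file name debug-flags.conf, the package atom app-misc/example-package, and the flag values are all hypothetical and chosen only for illustration.

```shell
# Work in a scratch directory so the live configuration is left untouched.
scratch=$(mktemp -d)
mkdir -p "$scratch/etc/portage/env"

# Hypothetical per-package settings file; any file name under env/ can be used.
cat > "$scratch/etc/portage/env/debug-flags.conf" <<'EOF'
CFLAGS="-O1 -g -pipe"
CXXFLAGS="${CFLAGS}"
EOF

# Map a (hypothetical) package atom to the settings file above.
echo 'app-misc/example-package debug-flags.conf' > "$scratch/etc/portage/package.env"

cat "$scratch/etc/portage/package.env"
```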

See also

External resources

The following resources are of some help in further understanding optimization:

  • man make.conf

References

  1. GNU GCC Bugzilla, AVX/AVX2 no ymm registers used in a trivial reduction. Retrieved on 2017/07/18.

This page is based on a document formerly found on our main website gentoo.org.
The following people contributed to the original document: Joshua Saddler (nightmorph)
They are listed here because wiki history does not allow for any external attribution. If you edit the wiki article, please do not add yourself here; your contributions are recorded on each article's associated history page.