PGI Compiler Suite reference card

PGI Compiler Suite

pgcc C compiler driver.
pgCC C++ compiler driver.
pgf95, pgf77, pgfortran Fortran compiler drivers.
pghpf High Performance Fortran compiler driver.
pgdbg Debugger.
pgcollect, pgprof Profiler.
pgcpuid Display the CPU type the compiler sees and display the default -tp switch it will use.
pgaccelinfo Display the accelerator GPU the compiler sees.

File extensions

.c C source files.
.f/for/f90/f95 Fortran source files.
.F/FOR/F90/F95 Fortran source files (containing macros) to be processed by the Fortran preprocessor.
.hpf High Performance Fortran source files.
.cuf Fortran source files with CUDA extensions.
.CUF Fortran source files with CUDA extensions to be processed by the Fortran preprocessor.
.h C/C++ header files.
.i Preprocessed C source files.
.C/cc C++ source files.
.s Assembler code.
.d Dependency files. They contain rules suitable for a Makefile describing the dependencies of the source file. Created by the -MD option.

Now the compiler...

Beginning with version 7.0, the default compiler options
can be placed in the ~/.mypgirc file (for every PGI compiler),
~/.mypgccrc file (for C compiler),
~/.mypgcpprc file (for C++ compiler),
~/.mypgfortranrc file (for Fortran compiler), etc. The file should contain
something like:

append PREOPTIONS=-fast;
append POSTOPTIONS=-Mipa;

(Notice the semicolons.)
That is, you can set at most two default compiler options: one
precedes everything on the command line, and the other
follows everything on the command line. If you have more than one
append PREOPTIONS=.. or append POSTOPTIONS=.., only
the FIRST occurrence will be used.
Also note that you cannot use
spaces in the options. For example, instead of -tp barcelona-64, you
must use -tp=barcelona-64. Moreover, not all command-line
options can be used. For example, -### is not allowed.

For details, see Technical Problem Report 3985.

Compile

-c Compile *.c and assemble *.s. NO linking.
-Idir Also search dir for header files.

This can also be controlled by environmental variables
C_INCLUDE_PATH and CPLUS_INCLUDE_PATH.

-S Compile *.c into assembly code *.s. NO linking.
-Manno Make the generated assembly code more readable.
-E Run preprocessor only. The output is sent to stdout.
-C When running preprocessor, don't discard comments in the program.
-dM Display definitions of all built-in macros.
-o file Place output in file
-v When compiling, also display the programs invoked by the compiler.
-dryrun, -### Display the programs invoked by the driver and exit.
-drystdin Display standard header directories and exit.
-show Display detailed information of current driver.
-V Display the version number.
-# When compiling, also display the programs invoked by the compiler.
-help=hidden Display all available compiler switches, including
the hidden & undocumented ones (Yes, PGI has many of them!)

C/C++ dialect

-A Follow strict ANSI C++ standard.
-a Follow proposed ANSI C++ standard.
-B Accept C++ style comments in C code.
--gnu_extensions Accept GNU extensions.
-mp, -mp=mode Enable OpenMP.

mode can be align, allcores, bind, nonuma, numa (use thread-CPU affinity).

Preprocessor

-Dname, -Dname=value Predefine the macro name, with value 1 or with the specified value.
-Uname Un-define the (built-in or -D defined) macro name
-M, -MM Output a rule (to stdout) suitable for a Makefile describing the dependencies of the source file.

-MM only outputs header files not in the system header directories.

This option implies -E option.

-MD The same as -M, but *.d files will be generated.
-MMD The same as -MM, but *.d files will be generated.

Warning messages

-Minform=warn Show warning messages.
-w Suppress all warnings.

Link

-Ldir Also search dir for library files.
This can also be controlled by environmental variable
LIBRARY_PATH.
-llibrary Link to liblibrary

The linker searches libraries and object files in the order they are specified, so

    foo.o -lz bar.o

will search library z after file foo.o but before bar.o, so if bar.o refers to functions
in z, then -lz must appear AFTER bar.o

-s Remove all symbol information from the executable
-Bstatic Produce statically linked executable
-shared, -fPIC, -r Produce shared libraries.
-Mnostartup Don't link to the standard startup files (so the start point of a program is not main, but _start).

To compile crt1.o, one has to use this option.


-Mnostdlib Don't link to the standard system libraries (e.g. libgcc.a) or startup files.
-Bstatic_pgi, -Bdynamic Whether PGI-provided libraries should be statically or dynamically linked.
-pgcpplibs, -pgf77libs, -pgf90libs Link to C++, PGF77, or PGF90 runtime libraries.
-Mmpi=mpilib Link to MPI library.

mpilib can be mpich1, mpich2, hpmpi, mvapich1.

-Mscalapack Link to ScaLAPACK library.
-Rdir Tell linker to add dir to the runtime shared/dynamic libraries search path.
-Wl,opt Pass opt to the linker.
-rpath=dir Tell linker to add dir to the runtime shared/dynamic libraries search path.
-m Enable linker to output trace/link map information.
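-Wl,--start-group
-Wl,--end-group
The libraries listed between this pair are passed to the linker as a group, which the linker searches repeatedly until no new undefined references can be resolved (useful for circular dependencies between libraries).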

Debugging

-g Produce debugging information.
-gopt Produce debugging information in the presence of optimization.
-Mkeepasm Save all temporary/intermediate assembly files produced during compiling.
-traceback Add debug information for runtime traceback. Should be used together
with -Meh_frame.

Set the environmental variable PGI_TERM to trace to
enable the stack trace-back on error.

Profiling

-pg Produce profiling information for pgprof.
-Mprof=option Produce profiling information for pgprof.
option can be func, hwcts (PAPI must be installed), lines, mpich1,
mpich2, mvapich1.

Optimization

-O0 Don't optimize.
-O1 Optimize.
-O2 Optimize even more.

This is default.

-O3, -O4 Optimize yet more.
-fast This implies -O2 and other optimizations such as
loop unrolling, SSE instructions, loop redundancy elimination (LRE),
partial redundancy elimination (PRE),
Flush To Zero (FTZ) & Denormals Are Zero (DAZ) modes, etc.
-Msmart Invoke a post-pass assembly instruction scheduling optimization.
-Mdaz Treat denormal values used as input to floating-point instruction as 0.
-Mflushz Set denormal results from floating-point calculations to 0.
-Mfprelaxed Generate fast but less accurate code for math functions
(division, reciprocal, square root, reciprocal square root, etc)
-Mfapprox Generate fast but low-precision code for math functions
(division, reciprocal, reciprocal square root)
-Kieee Perform floating-point operations in strict conformance with the IEEE 754
standard. Some optimizations are disabled.
-Minline Enable function inlining.
-Mipa=fast,inline Link time/Inter-procedural optimization.
-Minfo, -Minfo=lvl Display compile-time optimization information.

lvl can be all, ccff, ftn, ipa, loop, lre, mp, opt, par, pfo, unroll, vect, etc.

Note: CCFF means "Common Compiler Feedback Format"

-Mneginfo Display messages explaining why certain optimizations were disabled
at compile time.
-Mchkfpstk Generate extra code after every function call to ensure that
the FPU register stack is in the expected state.
-Msmartalloc=huge Link to the huge page runtime library.
-Mpfi, -Mpfo Profile guided optimization (PGO).
-Mconcur Automatically parallelize loops.
-Mvect Automatically vectorize loops.
-tp cpu Generate code for specific cpu, e.g.
athlon, barcelona, barcelona-64, core2-64, istanbul-64, nehalem-64, p7-64, penryn-64, shanghai-64 ...
-help=target List all CPU names that can be used
with the -tp switch.
-ta=nvidia,sub_options Generate code for NVIDIA
accelerator with specific sub_options, e.g.
cc20, cuda2.3, cuda3.0, fastmath...
-pc=n Round the significand to n bits, n can be 32, 64, 80.
-W0,-beta -# (Undocumented) Enable beta release optimizations.

Miscellaneous features

-Mchkstk Generate code to check for sufficient stack space upon subprogram entry.
-Mbounds Generate code to check array bounds
-Mbyteswapio (Fortran) Swap byte-order (big-endian to little-endian or vice
versa) during I/O of Fortran unformatted data.
-Mchkptr (Fortran) Check for NULL pointers.
-Mcray (Fortran) Enable Cray compatibility mode.
-Mcuda, -Mcuda=emu (Fortran) Enable CUDA Fortran.

-Mcuda=emu enables emulation mode.

Run-time environmental variables

In addition to the standard OpenMP run-time environmental variables,
the following variables also affect run-time behavior of
PGI-compiled programs.

NCPUS (OpenMP) Specify the number of processes or threads used in parallel regions.
NCPUS_MAX (OpenMP) Specify the maximal number of processes or threads used in parallel regions.
MP_SPIN (OpenMP) Specify the number of times to check a semaphore before calling
sched_yield (on Linux or Mac OS X) or _sleep (on Windows).
MP_BIND (OpenMP) Set to y to use thread-CPU affinity (binding processes or threads to a physical core/processor).
MP_BLIST (OpenMP) If MP_BIND is set to y,
this variable specifically defines the thread-CPU relationship,
overriding the default values.
MPSTKZ (OpenMP) Specify the number of bytes (e.g. 2m, 4m)
allocated for each thread to use as the private
stack for the thread.
PGI_HUGE_PAGES Specify the number of huge pages (2 MB).

The purpose of huge pages is to
reduce TLB cache misses.

ACML_FAST_MALLOC Set to 1 to use optimized memory management for the BLAS function dgemm
in ACML.

This is a new feature introduced in ACML version 4.4.0.

ACML_FAST_MALLOC_CHUNK_SIZE, ACML_FAST_MALLOC_MAX_CHUNKS These two parameters further fine-tune the behavior of ACML_FAST_MALLOC.
By default the limit is set to 64 chunks of size 10,000,000 bytes.
ACML_FAST_MALLOC_DEBUG Set to any value to display the debugging information of ACML_FAST_MALLOC.
NO_STOP_MESSAGE (Fortran) Set to any value to disable FORTRAN STOP message when STOP is called.
FORTRANOPT (Fortran) This controls Fortran I/O behavior. Its value is a comma-separated list of options, which
can be:
  • vaxio: Use VAX I/O conventions.
  • crlf: Interpret DOS/Windows style \r\n (carriage return and line feed) as new line.
  • format_relaxed: An I/O item corresponding to a numerical edit
    descriptor (such as F, E, I) is not required to be a type implied by the descriptor.
PGI_TERM This controls the stack trace-back and just-in-time debugging. Its value
is a comma-separated list of options, which can be:
  • debug: Invoke the debugger on error. By default it invokes pgdbg, but
    can be set to use other debuggers. See PGI_TERM_DEBUG environmental variable.
  • trace: Enable stack trace-back.
  • abort: Enable core dump when abort is called.

Each option can be disabled (which is default) by attaching no to it, e.g. noabort.

PGI_TERM_DEBUG This controls how the debugger is invoked. For example, it can be set to
gdb --quiet --pid %d

to use GDB instead.

PGI_STACK_USAGE, STAKSTAT Set to any value to display the stack usage when the program ends.

PGI C Compiler built-in macros

__cplusplus Is defined if C++ compiler is in use.
__FILE__, __BASE_FILE__ Name of the current input file (as a C string constant).

__FILE__ is an ANSI C standard macro.

__LINE__ Current input line number (as an integer constant).

This is an ANSI C standard macro.

__DATE__, __TIME__ Date & time on which the preprocessor is run (as C string constants).

These are ANSI C standard macros.

__TIMESTAMP__ Last modification time of the input file (as a C string constant)
__STDC__, __STDC_VERSION__ __STDC__ evaluates to 1 to mean the compiler is ISO standard conformant.

__STDC_VERSION__ evaluates to a long integer constant
of the form yyyymmL.

__STDC__ is an ANSI C standard macro.

__PGIC__, __PGIC_MINOR__, __PGIC_PATCHLEVEL__ Evaluate to integer constants representing the PGI
compiler version numbers (major/minor/patch level).
__PGI Defined for PGI compiler.
__x86_64__, __amd64__ Defined for x86_64.
__MMX__, __SSE__, __SSE2__, __SSE3__, __SSSE3__ Defined for processors that support MMX/SSE/SSE2... instructions.

Original from: www.acsu.buffalo.edu/~charngda/pgi.html. Slightly horrified to discover the page was gone, so have saved it as it has saved me more than a few times.