The auto-parallelization feature of the Intel® Fortran Compiler implements a high-level symmetric multiprocessing (SMP) programming model that provides you with an easy way to exploit the parallelism of SMP systems.
Automatic parallelization relieves you of the low-level details of iteration modification, data partitioning, thread scheduling, and synchronization, while exploiting the performance potential of multiprocessor systems.
To enable the auto-parallelizer, use the -parallel option. With -parallel, the compiler detects loops that can safely execute in parallel and automatically generates multithreaded code for them. Example commands using auto-parallelization are as follows:
IA-32 compilations:
prompt>ifc -c -parallel -par_threshold0 myprog.f
Itanium-based compilations:
prompt>efc -c -parallel -par_threshold0 myprog.f
The following environment variables apply to auto-parallelized code at run time:
Variable | Description | Default
OMP_NUM_THREADS | Controls the number of threads used. | Number of processors currently installed in the system
OMP_SCHEDULE | Specifies the type of runtime scheduling. | static
Enhance the power and effectiveness of the auto-parallelizer by following these coding guidelines:
Expose the trip count of loops whenever possible; specifically, use constants where the trip count is known, and save loop parameters in local variables.
Avoid placing constructs inside loop bodies that the compiler may assume carry data dependencies, for example, procedure calls or global references.
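The two guidelines above can be illustrated with a minimal sketch. It assumes free-form Fortran source, and the names scale_demo, n_active, and scale_array are hypothetical, chosen only for this example. The loop bound is copied from a module variable into a local variable, and the loop body contains no procedure calls or global references. Whether the compiler actually parallelizes this loop depends on its own analysis.

```fortran
module scale_demo
  implicit none
  ! Hypothetical global loop parameter (illustration only)
  integer :: n_active = 0
contains
  ! Scales the first n_active elements of a by factor, in place.
  subroutine scale_array(a, factor)
    real, intent(inout) :: a(:)
    real, intent(in)    :: factor
    integer :: i, n
    ! Save the loop parameter in a local variable so the trip
    ! count is fixed and visible at loop entry.
    n = min(n_active, size(a))
    do i = 1, n
      ! Body touches only a(i) and locals: no calls, no globals.
      a(i) = a(i) * factor
    end do
  end subroutine scale_array
end module scale_demo

program guidelines_demo
  use scale_demo
  implicit none
  integer, parameter :: n = 1000   ! trip count known as a constant
  real :: a(n)
  integer :: i
  do i = 1, n
    a(i) = real(i)
  end do
  n_active = n
  call scale_array(a, 2.0)
  if (a(1) /= 2.0 .or. a(n) /= 2000.0) error stop 'unexpected result'
  print *, 'scaled first and last:', a(1), a(n)
end program guidelines_demo
```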
Currently, the compiler applies auto-parallelization analysis only to loop nests, although the analysis could potentially extend to independent regions of code (task parallelism). A loop is parallelizable if:
there is no loop-carried dependency or
any loop-carried dependencies can be resolved by some code transformation, for example: privatization of scalars or runtime dependency testing.
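As an illustration of the distinction, the sketch below contrasts a loop with no loop-carried dependency against one that has such a dependency. It assumes free-form Fortran source; the module and procedure names are hypothetical.

```fortran
module dependency_demo
  implicit none
contains
  ! No loop-carried dependency: iteration i reads and writes
  ! only element i, so iterations can run in any order.
  subroutine add_arrays(a, b, c)
    real, intent(out) :: a(:)
    real, intent(in)  :: b(:), c(:)
    integer :: i
    do i = 1, size(a)
      a(i) = b(i) + c(i)
    end do
  end subroutine add_arrays

  ! Loop-carried dependency: iteration i reads c(i-1), which
  ! iteration i-1 has just written (a running sum).
  subroutine running_sum(c, b)
    real, intent(inout) :: c(:)
    real, intent(in)    :: b(:)
    integer :: i
    do i = 2, size(c)
      c(i) = c(i-1) + b(i)
    end do
  end subroutine running_sum
end module dependency_demo

program dependency_main
  use dependency_demo
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n), c(n)
  integer :: i
  do i = 1, n
    b(i) = real(i)
    c(i) = 1.0
  end do
  call add_arrays(a, b, c)    ! a(n) = 8 + 1 = 9
  call running_sum(c, b)      ! c(n) = 1 + (2 + ... + 8) = 36
  if (a(n) /= 9.0 .or. c(n) /= 36.0) error stop 'unexpected result'
  print *, a(n), c(n)
end program dependency_main
```

The first loop is a natural candidate for parallel execution. The second cannot be split across threads as written, because each iteration needs the result of the previous one.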
Privatization of scalars reassigns the storage of scalars from the static or parent stack area to the local stack of a thread to enable parallelization. This operation requires WRITE permission and is usually performed to remove a data dependency between concurrently executing threads.
To prepare for auto-parallelization, the compiler performs the following transformations:
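A minimal sketch of a loop where privatization applies follows, assuming free-form Fortran source and hypothetical names. The scalar t is written before it is read in every iteration, so giving each thread its own copy removes the apparent dependency between iterations. The OpenMP directive only makes that intent explicit for illustration; it is treated as a comment unless OpenMP is enabled, and it is not a statement of what the auto-parallelizer emits.

```fortran
module privatize_demo
  implicit none
contains
  ! Computes b(i) = 2*a(i) + 1 using a scalar temporary.
  subroutine affine(a, b)
    real, intent(in)  :: a(:)
    real, intent(out) :: b(:)
    real :: t
    integer :: i
    ! If t were shared, concurrent iterations would overwrite each
    ! other's value. Making t private gives each thread its own copy.
    !$omp parallel do private(t)
    do i = 1, size(a)
      t = 2.0 * a(i)      ! t is assigned before any use
      b(i) = t + 1.0
    end do
    !$omp end parallel do
  end subroutine affine
end module privatize_demo

program privatize_main
  use privatize_demo
  implicit none
  real :: a(4), b(4)
  a = (/ 1.0, 2.0, 3.0, 4.0 /)
  call affine(a, b)
  if (any(b /= (/ 3.0, 5.0, 7.0, 9.0 /))) error stop 'unexpected result'
  print *, b
end program privatize_main
```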
Partitions data accesses: shared, private, first-private, last-private, reduction
Modifies loop parameters and references
Generates new entry/exit per threaded task
Generates both parallel and serial versions with conditional execution based on:
- work/overhead threshold analysis
- runtime dependency testing
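The transformations listed above can be pictured with a hand-written sketch. It is an assumption-laden illustration, not the code the compiler generates: the names, the threshold value of 100, and the static block split are all hypothetical, and the "threads" are simulated by an ordinary serial loop over thread numbers. The sketch shows modified loop bounds per thread, a private partial sum combined as a reduction, and a serial fallback chosen by a work threshold.

```fortran
module partition_demo
  implicit none
  ! Hypothetical work/overhead cutoff (illustration only)
  integer, parameter :: threshold = 100
contains
  ! Iteration bounds lo..hi for thread tid (0-based) out of
  ! nthreads, using a static block split of 1..n.
  subroutine chunk_bounds(n, nthreads, tid, lo, hi)
    integer, intent(in)  :: n, nthreads, tid
    integer, intent(out) :: lo, hi
    integer :: chunk
    chunk = (n + nthreads - 1) / nthreads   ! ceiling division
    lo = tid * chunk + 1
    hi = min(n, lo + chunk - 1)             ! may be < lo: empty chunk
  end subroutine chunk_bounds

  function array_sum(a, nthreads) result(total)
    real, intent(in)    :: a(:)
    integer, intent(in) :: nthreads
    real :: total, partial
    integer :: n, tid, lo, hi, i
    n = size(a)
    total = 0.0
    if (n < threshold .or. nthreads < 2) then
      ! Serial version: too little work to justify threading.
      do i = 1, n
        total = total + a(i)
      end do
    else
      ! "Parallel" version: each pass of the tid loop stands in
      ! for the work one thread would do.
      do tid = 0, nthreads - 1
        call chunk_bounds(n, nthreads, tid, lo, hi)
        partial = 0.0                 ! private partial sum
        do i = lo, hi                 ! modified loop parameters
          partial = partial + a(i)
        end do
        total = total + partial       ! reduction step
      end do
    end if
  end function array_sum
end module partition_demo

program partition_main
  use partition_demo
  implicit none
  real :: a(1000)
  a = 1.0
  if (array_sum(a, 4) /= 1000.0) error stop 'unexpected result'
  print *, array_sum(a, 4)
end program partition_main
```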