I am Aditya Trivedi, a contributor to Google Summer of Code (GSoC) 2025, working on enhancing OpenMP support in LFortran, an LLVM-based Fortran compiler. This blog post summarizes my progress during the first week, in which I focused on laying the groundwork for extending OpenMP features such as teams, tasks, and sections. My project builds upon LFortran’s existing OpenMP capabilities, positioning it as a robust tool for high-performance computing (HPC).
Objective
The primary goal of this project is to expand LFortran’s OpenMP support beyond the existing `parallel do` to constructs such as `teams`, `tasks`, `sections`, `single`, and SIMD, in alignment with the OpenMP 6.0 standard. This enhancement will enable LFortran to handle complex parallel workloads, making it competitive with established compilers such as GFortran and Clang for HPC applications. During Week 1, I worked for 38 hours, concentrating on analyzing the current design, identifying its limitations, proposing a new approach, and studying how other compilers implement OpenMP, thereby establishing a foundation for the upcoming implementation phase.
Current Design
LFortran currently supports the `parallel do` construct with clauses such as `private`, `shared`, `reduction`, and `collapse`. The implementation, as detailed in Issue #3777, operates as follows:
- Parsing: The `visit_Pragma` function in `ast_body_visitor.cpp` recognizes `!$omp parallel do` and converts it to a `DoConcurrentLoop` node in the Abstract Semantic Representation (ASR), capturing clauses and loop details.
- Backend: The OpenMP pass (`openmp.cpp`) outlines the loop body into a function, partitions iterations across threads using `omp_get_thread_num` and `omp_get_num_threads`, and generates `GOMP_parallel` calls to the `libgomp` runtime.
- Example: A `parallel do` loop is transformed into a `DoConcurrentLoop` node and lowered to a function with thread partitioning, as described in Issue #3777, Comment #2104814180; a sketch of this lowering appears below.
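To make the outlining concrete, here is a minimal sketch in Fortran of what such an outlined function looks like conceptually. The subroutine name, the static chunking scheme, and the placeholder loop body are my own illustrations rather than LFortran’s actual generated code; a driver would launch this through `GOMP_parallel`, as in the GOMP-based example later in this post.

```fortran
! Sketch of an outlined `parallel do` body: every thread computes its own
! contiguous chunk of the iteration space 1..n (static partitioning).
subroutine outlined_loop_body(data) bind(C)
    use, intrinsic :: iso_c_binding
    implicit none
    type(c_ptr), value :: data
    integer(c_int), pointer :: n
    integer :: tid, nthreads, chunk, first, last, i
    interface
        function omp_get_thread_num() bind(C, name="omp_get_thread_num")
            import :: c_int
            integer(c_int) :: omp_get_thread_num
        end function
        function omp_get_num_threads() bind(C, name="omp_get_num_threads")
            import :: c_int
            integer(c_int) :: omp_get_num_threads
        end function
    end interface
    call c_f_pointer(data, n)
    tid      = omp_get_thread_num()
    nthreads = omp_get_num_threads()
    chunk    = (n + nthreads - 1) / nthreads   ! ceiling division
    first    = tid * chunk + 1
    last     = min((tid + 1) * chunk, int(n))
    do i = first, last
        print *, "iteration ", i, " on thread ", tid   ! stands in for the loop body
    end do
end subroutine outlined_loop_body
```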
While this design is effective for loop-based parallelism, it presents limitations when attempting to support other OpenMP constructs.
Challenges in the Current Design
While the `DoConcurrentLoop` approach is suitable for `parallel do`, extending it to support new constructs such as `teams`, `tasks`, and `sections` introduces several challenges:
- Non-loop constructs: Constructs like `sections` (independent blocks) and `tasks` (dynamic scheduling) do not fit the loop-centric `DoConcurrentLoop` structure, necessitating complex workarounds; see the snippet after this list.
- Clause support: New clauses (e.g., `num_teams` for `teams`, `depend` for `tasks`) are difficult to integrate into the existing node’s clause arrays.
- Nesting: Handling nested constructs (e.g., a `parallel do` inside `teams`) is challenging, as `DoConcurrentLoop` assumes a single loop level.
- Scalability: Adapting a loop-based node to diverse constructs risks a convoluted design that would complicate maintenance as OpenMP continues to evolve.
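For instance, a `sections` region such as the following contains no loop at all, so there is nothing for a `DoConcurrentLoop` node to latch onto (`compute_a` and `compute_b` are hypothetical placeholders for independent work):

```fortran
! Two independent blocks, each executed once by some thread of the team;
! compute_a/compute_b are hypothetical placeholders.
!$omp parallel sections
!$omp section
call compute_a()
!$omp section
call compute_b()
!$omp end parallel sections
```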
These limitations, which are further discussed in Issue #7332, necessitated the exploration of alternative designs to better accommodate a wider range of OpenMP constructs.
Proposed Design: OMPRegion ASR Node
To address the identified challenges, I propose introducing a new `OMPRegion` ASR node, designed to handle all OpenMP constructs in a flexible manner. The proposed node structure is outlined below:
```
stmt
    = ...
    | OMPRegion(omp_region_type region, omp_clause* clauses, stmt* body)

omp_region_type
    = Parallel | Do | ParallelDo | Sections | Single | Task | Simd | Teams | Target | TargetData

omp_clause
    = OMPPrivate(expr* vars)
    | OMPShared(expr* vars)
    | OMPReduction(reduction_op operator, expr* vars)
    | ...

reduction_op
    = ReduceAdd | ReduceSub | ReduceMul | ReduceMIN | ReduceMAX

schedule_type
    = Static | Dynamic | Guided | Auto | Runtime

...
```
The benefits of this approach include:
- Flexibility: The node naturally supports both loop-based constructs (e.g., `parallel do`) and non-loop constructs (e.g., `sections`, `tasks`).
- Extensibility: New constructs and clauses can be added by extending the enums, keeping the design aligned with the OpenMP 6.0 specification.
- Nesting: Nested directives are managed through recursive `OMPRegion` nodes, covering complex scenarios such as `teams` containing a `parallel do` (sketched below).
- Standards alignment: The design mirrors GFortran’s tree nodes (e.g., `OMP_SECTIONS`, `OMP_TASK`) and Clang’s AST classes (e.g., `OMPSectionsDirective`, `OMPTaskDirective`), simplifying integration with the `libgomp` runtime.
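As a quick sketch of the nesting case, a `teams` region containing a `parallel do` is simply one `OMPRegion` nested inside another. The `OMPNumTeams` clause node and the constant are illustrative placeholders (the clause list in the structure above is elided), not a finalized ASR dump:

```
OMPRegion(
    region  = Teams,
    clauses = [OMPNumTeams(IntegerConstant(4))],
    body    = [
        OMPRegion(
            region  = ParallelDo,
            clauses = [],
            body    = [DoLoop(...)]
        )
    ]
)
```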
This proposed design, along with prototype minimal reproducible examples (MREs), has been detailed in Issue #7332, demonstrating its feasibility for implementation.
Exploration of Clang and GFortran’s OpenMP Handling
To inform the design of the `OMPRegion` node, I analyzed how Clang and GFortran process OpenMP constructs, focusing on `teams`, `tasks`, and `sections`:
GFortran:
- Frontend: Directives are parsed into specific tree nodes (e.g., `OMP_TEAMS`, `OMP_SECTIONS`), with `OMP_CLAUSE` nodes representing the associated clauses.
- Backend: These nodes are lowered to GIMPLE, where the directive bodies are outlined into functions and `libgomp` calls (e.g., `GOMP_teams`, `GOMP_sections_start`) are generated.
- Example: A `sections` directive is transformed into an `OMP_SECTIONS` node, which is then lowered to a switch statement driven by `GOMP_sections_start`, as documented in Issue #7332; a Fortran transcription of this lowering follows.
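To make this concrete, here is a minimal Fortran transcription of that switch-based lowering for a two-section region. GFortran emits the equivalent in GIMPLE, and in real generated code this loop runs inside a `GOMP_parallel` region so that several threads pull section IDs concurrently; the interface declarations are my own assumptions for this sketch.

```fortran
! Each thread repeatedly asks libgomp for the ID of the next unexecuted
! section (0 means none remain) and dispatches on it.
program sections_lowered
    use, intrinsic :: iso_c_binding
    implicit none
    interface
        function GOMP_sections_start(count) bind(C, name="GOMP_sections_start")
            import :: c_int
            integer(c_int), value :: count
            integer(c_int) :: GOMP_sections_start
        end function
        function GOMP_sections_next() bind(C, name="GOMP_sections_next")
            import :: c_int
            integer(c_int) :: GOMP_sections_next
        end function
        subroutine GOMP_sections_end() bind(C, name="GOMP_sections_end")
        end subroutine
    end interface
    integer(c_int) :: s
    s = GOMP_sections_start(2_c_int)   ! two sections in this region
    do while (s /= 0)
        select case (s)
        case (1)
            print *, "section 1"       ! body of the first section
        case (2)
            print *, "section 2"       ! body of the second section
        end select
        s = GOMP_sections_next()
    end do
    call GOMP_sections_end()           ! barrier at the end of the sections region
end program sections_lowered
```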
Clang:
- Frontend: OpenMP directives are represented as AST classes (e.g., `OMPTeamsDirective`, `OMPTaskDirective`), with separate objects for clauses.
- Backend: The AST is lowered to LLVM IR, generating `libomp` calls (e.g., `__kmpc_fork_teams`). LFortran, however, uses `libgomp` because of issues with variadic functions in `libomp`.
- Example: A `task` directive is represented as an `OMPTaskDirective`, which is lowered to `__kmpc_omp_task` calls.
This analysis, documented in Issue #7332, highlights the advantage of dedicated nodes for each construct: they enhance type safety and modularity. These findings support the adoption of the `OMPRegion` approach for LFortran.
Issues Opened
To track progress and facilitate collaboration, I have opened the following issues, each accompanied by minimal reproducible examples (MREs) in C and Fortran, both with and without pragmas, as well as GOMP-based implementations:
- [OPENMP] TEAM Construct #7363: Proposes support for the teams construct with clauses such as num_teams and thread_limit.
- [OPENMP] TASK Construct #7365: Focuses on implementing the task construct to enable dynamic scheduling.
- [OPENMP] SECTIONS Construct #7366: Addresses the sections construct for concurrent execution of independent code blocks.
- OpenMP Support Design Discussion #7332: Serves as the central hub for design discussions, MREs, and analysis of Clang and GFortran’s OpenMP implementations.
Example: Task Construct Representations
To illustrate the application of the proposed `OMPRegion` node, this section presents an example of the task construct as detailed in Issue #7365. The example is provided in multiple forms: Fortran code with OpenMP pragmas, Fortran code using GOMP runtime calls, the corresponding Clang AST representation, and the proposed LFortran ASR design.
Fortran Code with Pragmas
```fortran
program parallel_processing
use omp_lib
implicit none
integer, parameter :: N = 10
integer :: i
!$omp parallel
!$omp single
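! Only one thread executes this loop; each !$omp task spawns work
! that any thread of the team may pick up and run.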
do i = 1, N
!$omp task
call process_item(i)
!$omp end task
end do
!$omp end single
!$omp end parallel
contains
subroutine process_item(i)
integer, intent(in) :: i
integer :: thread_num
thread_num = omp_get_thread_num()
print *, "Processing item ", i, " on thread ", thread_num
end subroutine process_item
end program parallel_processing
```
Fortran Code without Pragmas (Using GOMP Calls)
```fortran
module thread_data_module_tasks
use, intrinsic :: iso_c_binding
implicit none
type, bind(C) :: thread_data
integer(c_int) :: i
end type thread_data
integer(c_long), parameter :: THREAD_DATA_SIZE = 4 ! Size of thread_data (bytes)
integer(c_long), parameter :: THREAD_DATA_ALIGN = 4 ! Alignment of thread_data (bytes)
end module thread_data_module_tasks
module omp_lib
use iso_c_binding
implicit none
interface
subroutine GOMP_parallel(fn, data, num_threads, flags) bind(C, name="GOMP_parallel")
import :: c_funptr, c_ptr, c_int
type(c_funptr), value :: fn
type(c_ptr), value :: data
integer(c_int), value :: num_threads
integer(c_int), value :: flags
end subroutine
subroutine GOMP_task(fn, data, cpyfn, arg_size, arg_align, if_clause, flags, depend) &
bind(C, name="GOMP_task")
use, intrinsic :: iso_c_binding
type(c_funptr), value :: fn, cpyfn
type(c_ptr), value :: data, depend
integer(c_long), value :: arg_size, arg_align
logical(c_bool), value :: if_clause
integer(c_int), value :: flags
end subroutine
function omp_get_thread_num() bind(c, name="omp_get_thread_num")
import :: c_int
integer(c_int) :: omp_get_thread_num
end function
end interface
end module omp_lib
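! The work performed by each task: report the item and the executing thread.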
subroutine process_item(i)
use omp_lib
implicit none
integer, intent(in) :: i
print *, "Processing item ", i, " on thread ", omp_get_thread_num()
end subroutine process_item
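! Trampoline passed to GOMP_task: unpacks the c_ptr payload into thread_data
! and invokes process_item.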
subroutine task_fn(data) bind(C)
use thread_data_module_tasks
implicit none
type(c_ptr), value :: data
type(thread_data), pointer :: d
call c_f_pointer(data, d)
call process_item(d%i)
end subroutine task_fn
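! Body of the parallel region: thread 0 plays the role of !$omp single,
! creating one task per item via GOMP_task.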
subroutine parallel_region(data) bind(C)
use thread_data_module_tasks
use omp_lib
implicit none
type(c_ptr), value :: data
integer(c_int), pointer :: n
integer :: i
type(thread_data), target :: task_data
type(c_ptr) :: task_ptr
interface
subroutine task_fn(data) bind(C)
use thread_data_module_tasks
type(c_ptr), value :: data
end subroutine task_fn
end interface
call c_f_pointer(data, n)
if (omp_get_thread_num() == 0) then
do i = 1, n
task_data%i = i
task_ptr = c_loc(task_data)
call GOMP_task(c_funloc(task_fn), task_ptr, c_null_funptr, THREAD_DATA_SIZE, &
THREAD_DATA_ALIGN, .true._c_bool, 0, c_null_ptr)
end do
end if
end subroutine parallel_region
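! Driver: launches parallel_region on a thread team via GOMP_parallel
! (num_threads = 0 selects the default).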
program main
use thread_data_module_tasks
use omp_lib
use, intrinsic :: iso_c_binding
implicit none
integer, target :: n = 10
type(c_ptr) :: ptr
interface
subroutine parallel_region(data) bind(C)
use thread_data_module_tasks
type(c_ptr), value :: data
end subroutine parallel_region
end interface
ptr = c_loc(n)
call GOMP_parallel(c_funloc(parallel_region), ptr, 0, 0)
end program main
```
Clang AST Representation
```
|-OMPParallelDirective 0x5dcc678e58f0 <line:12:5, col:25>
| `-CapturedStmt 0x5dcc678e5870 <line:13:5, line:23:5>
| |-CapturedDecl 0x5dcc678e44b8 <<invalid sloc>> <invalid sloc> nothrow
| | |-CompoundStmt 0x5dcc678e57d0 <line:13:5, line:23:5>
| | | `-OMPSingleDirective 0x5dcc678e5798 <line:14:9, col:27>
| | | `-CapturedStmt 0x5dcc678e5738 <line:15:9, line:22:9>
| | | |-CapturedDecl 0x5dcc678e4ae8 <<invalid sloc>> <invalid sloc>
| | | | |-CompoundStmt 0x5dcc678e5698 <line:15:9, line:22:9>
| | | | | `-ForStmt 0x5dcc678e5660 <line:16:13, line:21:13>
| | | | | |-DeclStmt 0x5dcc678e4c88 <line:16:18, col:27>
| | | | | | `-VarDecl 0x5dcc678e4c00 <col:18, col:26> col:22 used i 'int' cinit
| | | | | `-CompoundStmt 0x5dcc678e5648 <col:42, line:21:13>
| | | | | `-OMPTaskDirective 0x5dcc678e5600 <line:17:17, col:33>
| | | | | |-OMPFirstprivateClause 0x5dcc678e55c0 <<invalid sloc>> <implicit>
| | | | | `-CapturedStmt 0x5dcc678e53e0 <line:18:17, line:20:17>
| | | | | `-CapturedDecl 0x5dcc678e4f78 <<invalid sloc>> <invalid sloc> nothrow
| | | | | |-CompoundStmt 0x5dcc678e53c8 <line:18:17, line:20:17>
| | | | | | `-CallExpr 0x5dcc678e5388 <line:19:21, col:35> 'void'
| | | | | | |-ImplicitCastExpr 0x5dcc678e5370 <col:21> 'void (*)(int)' <FunctionToPointerDecay>
| | | | | | | `-DeclRefExpr 0x5dcc678e5300 <col:21> 'void (int)' Function 0x5dcc678e3f08 'process_item' 'void (int)'
| | | | | | `-ImplicitCastExpr 0x5dcc678e53b0 <col:34> 'int' <LValueToRValue>
```
Proposed LFortran ASR Design
```
OMPRegion(
region = Parallel,
clauses = [],
body = [
OMPRegion(
region = Single,
clauses = [],
body = [
DoLoop(
head = [{v = "i", start = IntegerConstant(1), end = IntegerConstant(10)}],
body = [
OMPRegion(
region = Task,
clauses = [],
body = [Call(symbol="process_item")]
)
]
)
]
)
]
)
```
Next Steps
In Week 2, I plan to focus on the following tasks:
- Implement the `OMPRegion` node in ASR for the sections construct (Issue #7366), designing the AST visitor so that it can be easily extended to represent other constructs and clauses.
- Validate the MREs against the outputs of GFortran and Clang, using flags such as `-fdump-tree-all` (GFortran) and `-Xclang -ast-dump` (Clang), to ensure correctness.
I would like to thank my mentors, Ondrej Certik, Pranav Goswami, and Gaurav Dhingra, for their critical reviews and guidance, which played an important role in improving the design of `OMPRegion`. I also thank the other LFortran contributors for their support and help whenever needed.