Following the foundation laid in Week 1, where I proposed the OMPRegion ASR node to address the limitations of LFortran’s existing OpenMP support, Week 2 focused on implementing this design. In my previous blog post, I outlined a plan to represent the OMPRegion node in the Abstract Semantic Representation (ASR) for the sections construct and validate minimal reproducible examples (MREs) against GFortran and Clang outputs. This week, I successfully implemented the foundational structure for the OMPRegion node through PR #7449, dedicating approximately 25 hours to this task.

Choice of Sections Construct for Implementation

I chose to implement the sections construct first for several strategic reasons. The parallel do and parallel sections constructs represent fundamentally different paradigms: the former is loop-based, while the latter is non-loop-based, allowing independent code blocks to execute concurrently. The existing design in LFortran was heavily adapted to the parallel do construct, mapping it directly to a DoConcurrentLoop node in the ASR. Migrating parallel do to the new OMPRegion design would have required significant changes to the OpenMP pass, which means larger PRs and a risk of regressions: parallel do test cases that compile today might stop compiling until the OpenMP pass is mature enough to process OMPRegion. I want to avoid that for now, so starting with sections lets the new node grow alongside the existing parallel do support.

Implementation

The previous design in LFortran was tailored specifically for the parallel do construct, which limited its flexibility. When a parallel do directive was encountered, the visit_Pragma function assumed that a do loop followed immediately, collected the loop body through visit_DoLoop, and converted it into a DoConcurrentLoop node in the ASR. This approach was restrictive because OpenMP constructs like sections or task can contain arbitrary statements or nested directives, which do not fit the loop-centric model.

In my initial attempt to implement the OMPRegion node, I focused on a special case for the sections construct, aiming to represent its structure in the ASR. However, during discussions with my mentor Pranav Goswami in our weekly calls, we recognized that this approach lacked the generality needed to support a wider range of OpenMP constructs, especially those with nested structures.

Following this feedback, I revised the implementation to use a stack-based approach: the body of an OMPRegion is now collected through the transform_stmts visitor, which can collect statements of any type. This makes the implementation generic and extensible, so it can represent any OpenMP construct, including nested ones. The design is implemented in PR #7449 and provides a foundation for further OpenMP enhancements in LFortran.
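
To make the idea concrete, here is a minimal Python sketch of stack-based region collection. It is not LFortran's C++ code: the token format, the Region class, and the collect_regions function are hypothetical names chosen for illustration. A begin directive pushes a new region, ordinary statements are appended to the region on top of the stack, and an end directive pops the finished region and attaches it to its parent.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical stand-in for an OMPRegion node: kind, clauses, body."""
    kind: str
    clauses: list = field(default_factory=list)
    body: list = field(default_factory=list)


def _close_top(stack):
    """Pop the finished region and attach it to its parent's body."""
    done = stack.pop()
    stack[-1].body.append(done)


def collect_regions(tokens):
    """Group a flat token stream into nested regions using a stack.

    Tokens are ("begin", kind, clauses), ("end", kind), or ("stmt", text).
    This token format is an assumption made for the sketch.
    """
    root = Region("Root")
    stack = [root]
    for tok in tokens:
        if tok[0] == "begin":
            _, kind, clauses = tok
            # "!$omp section" has no end directive: a new section
            # implicitly closes the previous one.
            if kind == "Section" and stack[-1].kind == "Section":
                _close_top(stack)
            stack.append(Region(kind, list(clauses)))
        elif tok[0] == "end":
            # Closing "sections" also closes the last open "section".
            if tok[1] == "Sections" and stack[-1].kind == "Section":
                _close_top(stack)
            if stack[-1].kind != tok[1]:
                raise ValueError(f"unexpected end of {tok[1]}")
            _close_top(stack)
        else:
            # Ordinary statement: it belongs to the innermost open region.
            stack[-1].body.append(tok[1])
    if len(stack) != 1:
        raise ValueError(f"unclosed region: {stack[-1].kind}")
    return root.body


tokens = [
    ("begin", "Sections", ["reduction(+:tid)"]),
    ("begin", "Section", []),
    ("stmt", "call compute_a()"),
    ("begin", "Task", []),
    ("stmt", "call compute_b()"),
    ("end", "Task"),
    ("begin", "Section", []),
    ("stmt", "call compute_c()"),
    ("end", "Sections"),
]
regions = collect_regions(tokens)
print([child.kind for child in regions[0].body])  # ['Section', 'Section']
```

Because the stack only ever looks at its top element, the same loop handles a flat sections block and a task nested several levels deep without any construct-specific code.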

Minimal Reproducible Examples (MREs)

To validate the stack-based approach and the new OMPRegion implementation, I developed three MREs, focusing on the sections and task constructs. These examples demonstrate the ability of the new design to handle both simple and nested OpenMP directives, accurately representing them in the ASR.

MRE 1: Sections Construct with Reduction Clause

The first MRE illustrates a sections construct with a reduction clause, distributing independent tasks across multiple threads:

!$omp parallel sections reduction(+:tid)
  !$omp section
  call compute_a()
  tid = tid + omp_get_thread_num()
  print *, "Thread ID:", tid

  !$omp section
  call compute_b()
  tid = tid + omp_get_thread_num()
  print *, "Thread ID:", tid

  !$omp section
  call compute_c()
  tid = tid + omp_get_thread_num()
  print *, "Thread ID:", tid
!$omp end parallel sections

The corresponding ASR representation captures the nested structure of the sections construct, including the reduction clause and individual section regions:

[(OMPRegion
    Sections
    [(OMPReduction
        ReduceAdd
        [(Var 6 tid)]
    )]
    [(OMPRegion
        Section
        []
        [(SubroutineCall
            6 compute_a
            ()
            []
            ()
        )
        (Assignment
            (Var 6 tid)
            (IntegerBinOp
                (Var 6 tid)
                Add
                (FunctionCall
                    6 omp_get_thread_num
                    ()
                    []
                    (Integer 4)
                    ()
                    ()
                )
                (Integer 4)
                ()
            )
            ()
            .false.
        )
        (Print
            (StringFormat
                ()
                [(StringConstant
                    "Thread ID:"
                    (String 1 (IntegerConstant 10 (Integer 4) Decimal) ExpressionLength PointerString)
                )
                (Var 6 tid)]
                FormatFortran
                (String 1 () ExpressionLength CString)
                ()
            )
        )]
    )
    (OMPRegion
        Section
        []
        [(SubroutineCall
            6 compute_b
            ()
            []
            ()
        )
        (Assignment
            (Var 6 tid)
            (IntegerBinOp
                (Var 6 tid)
                Add
                (FunctionCall
                    6 omp_get_thread_num
                    ()
                    []
                    (Integer 4)
                    ()
                    ()
                )
                (Integer 4)
                ()
            )
            ()
            .false.
        )
        (Print
            (StringFormat
                ()
                [(StringConstant
                    "Thread ID:"
                    (String 1 (IntegerConstant 10 (Integer 4) Decimal) ExpressionLength PointerString)
                )
                (Var 6 tid)]
                FormatFortran
                (String 1 () ExpressionLength CString)
                ()
            )
        )]
    )
    (OMPRegion
        Section
        []
        [(SubroutineCall
            6 compute_c
            ()
            []
            ()
        )
        (Assignment
            (Var 6 tid)
            (IntegerBinOp
                (Var 6 tid)
                Add
                (FunctionCall
                    6 omp_get_thread_num
                    ()
                    []
                    (Integer 4)
                    ()
                    ()
                )
                (Integer 4)
                ()
            )
            ()
            .false.
        )
        (Print
            (StringFormat
                ()
                [(StringConstant
                    "Thread ID:"
                    (String 1 (IntegerConstant 10 (Integer 4) Decimal) ExpressionLength PointerString)
                )
                (Var 6 tid)]
                FormatFortran
                (String 1 () ExpressionLength CString)
                ()
            )
        )]
    )]
)]
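
For readers less familiar with the reduction(+:tid) clause in this MRE: OpenMP gives each thread a private copy of tid initialized to 0 (the identity for +) and adds the private copies to the original variable when the region ends. The sequential Python sketch below models only that rule. The function name and the thread-to-section mapping are made up for illustration, since the real assignment of sections to threads is decided by the OpenMP runtime.

```python
def simulate_sections_reduction(initial_tid, section_threads):
    """Model reduction(+:tid) over sections, sequentially.

    section_threads[k] is the thread number that runs section k. This
    mapping is a hypothetical input; a real runtime chooses it.
    """
    private = {}  # one private copy of tid per participating thread
    printed = []
    for thread in section_threads:
        # Each private copy starts at 0, the identity of "+".
        private.setdefault(thread, 0)
        private[thread] += thread        # tid = tid + omp_get_thread_num()
        printed.append(private[thread])  # print *, "Thread ID:", tid
    # At the end of the region the private copies are combined with the
    # original value of tid.
    return initial_tid + sum(private.values()), printed


final_tid, printed = simulate_sections_reduction(0, [0, 1, 2])
print(final_tid, printed)  # 3 [0, 1, 2]
```

The values printed inside each section are the private copies, so they generally differ from the final value of tid after the region.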

MRE 2: Task Construct with Nested Sections

The second MRE demonstrates a more complex scenario involving a task construct nested within a sections directive, using firstprivate and shared clauses to manage variable scoping:

  !$OMP PARALLEL SECTIONS SHARED(array)
    !$OMP SECTION
    do i = 1, n
      !$OMP TASK FIRSTPRIVATE(i) SHARED(array)
        array(i) = array(i) * real(i)
        print *, "Task: i = ", i, ", computed by thread ", omp_get_thread_num()
      !$OMP END TASK
    end do
    !$OMP SECTION
    print*, "All tasks submitted. Waiting for completion."
  !$OMP END PARALLEL SECTIONS

The ASR representation below showcases the nested structure, with the task construct properly embedded within a section region:

[(OMPRegion
    Sections
    [(OMPShared
        [(Var 2 array)]
    )]
    [(OMPRegion
        Section
        []
        [(DoLoop
            ()
            ((Var 2 i)
            (IntegerConstant 1 (Integer 4) Decimal)
            (Var 2 n)
            ())
            [(OMPRegion
                Task
                [(OMPFirstPrivate
                    [(Var 2 i)]
                )
                (OMPShared
                    [(Var 2 array)]
                )]
                [(Assignment
                    (ArrayItem
                        (Var 2 array)
                        [(()
                        (Var 2 i)
                        ())]
                        (Real 4)
                        ColMajor
                        ()
                    )
                    (RealBinOp
                        (ArrayItem
                            (Var 2 array)
                            [(()
                            (Var 2 i)
                            ())]
                            (Real 4)
                            ColMajor
                            ()
                        )
                        Mul
                        (IntrinsicElementalFunction
                            Real
                            [(Var 2 i)]
                            0
                            (Real 4)
                            ()
                        )
                        (Real 4)
                        ()
                    )
                    ()
                    .false.
                )
                (Print
                    (StringFormat
                        ()
                        [(StringConstant
                            "Task: i = "
                            (String 1 (IntegerConstant 10 (Integer 4) Decimal) ExpressionLength PointerString)
                        )
                        (Var 2 i)
                        (StringConstant
                            ", computed by thread "
                            (String 1 (IntegerConstant 21 (Integer 4) Decimal) ExpressionLength PointerString)
                        )
                        (FunctionCall
                            2 omp_get_thread_num
                            ()
                            []
                            (Integer 4)
                            ()
                            ()
                        )]
                        FormatFortran
                        (String 1 () ExpressionLength CString)
                        ()
                    )
                )]
            )]
            []
        )]
    )
    (OMPRegion
        Section
        []
        [(Print
            (StringConstant
                "All tasks submitted. Waiting for completion."
                (String 1 (IntegerConstant 44 (Integer 4) Decimal) ExpressionLength PointerString)
            )
        )]
    )]
)]
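
A pass that processes OMPRegion nodes has to find regions even when they sit inside ordinary statements, as the Task region does inside the DoLoop above. The sketch below models this with plain Python tuples. The node encoding and the region_paths function are hypothetical and far simpler than LFortran's actual ASR visitors; they only illustrate the recursion.

```python
def region_paths(node, path=()):
    """Yield the chain of enclosing OMPRegion kinds for every region.

    Nodes are modelled as tuples (an assumption for this sketch):
      ("OMPRegion", kind, clauses, body)  -- an OpenMP region
      ("DoLoop", body)                    -- an ordinary loop
      ("Stmt", text)                      -- any other statement
    """
    tag = node[0]
    if tag == "OMPRegion":
        _, kind, _clauses, body = node
        path = path + (kind,)
        yield path
    elif tag == "DoLoop":
        # Not a region itself, but regions may be nested in its body.
        body = node[1]
    else:
        return  # leaf statement: nothing nested inside
    for child in body:
        yield from region_paths(child, path)


# Shape of MRE 2: Sections > Section > DoLoop > Task, plus a second Section.
mre2 = ("OMPRegion", "Sections", ["shared(array)"], [
    ("OMPRegion", "Section", [], [
        ("DoLoop", [
            ("OMPRegion", "Task", ["firstprivate(i)", "shared(array)"], [
                ("Stmt", "assignment to array(i)"),
                ("Stmt", "print"),
            ]),
        ]),
    ]),
    ("OMPRegion", "Section", [], [
        ("Stmt", "print"),
    ]),
])

for p in region_paths(mre2):
    print(" > ".join(p))
# Sections
# Sections > Section
# Sections > Section > Task
# Sections > Section
```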

The third MRE, also focusing on the sections construct, follows a similar structure and is included in PR #7449. These MREs validate the stack-based approach’s ability to handle nested OpenMP directives, ensuring that LFortran can accurately represent complex parallel constructs in the ASR.

Next Steps

In Week 3, I plan to focus on the following task:

  • Extend the OpenMP pass to visit the OMPRegion node and support the sections construct, lowering it to appropriate runtime calls.

I would like to thank my mentors, Ondrej Certik, Pranav Goswami, and Gaurav Dhingra, for their critical reviews and guidance, which played an important role in shaping the stack-based approach. I also thank the other contributors of LFortran for their support and help whenever needed.