Week 4 continued from Week 3, where I shifted the parallel do logic to the OMPRegion node and extended the OpenMP pass: this week I focused on implementing the sections construct and adding support for the single and master constructs. In my previous blog, I had planned to implement sections by lowering it to GOMP_sections_start and GOMP_sections_end calls. This week, I completed that task through PR #7619 and extended support for single and master in PR #7638, spending around 19 hours on these updates.

Choice of Implementing the Sections Construct

I started Week 4 aiming to add the sections construct, which lets different threads run separate code blocks concurrently, unlike the loop-based parallel do. Rather than starting from scratch, it made more sense to build on last week’s parallel do work: since sections relies on the parallel construct anyway, having tackled parallel early gave me a strong base. Here’s why this choice mattered:

  • Solid Foundation: Shifting parallel do to OMPRegion first made it easier to add sections later.
  • Kept Things Working: I ensured all current tests still ran smoothly, avoiding any setbacks.
  • Saved Time: Handling parallel early meant less rework when adding other constructs.

Implementation Details and Results

In PR #7619, I implemented the sections construct in the OpenMP pass, addressing Issue #7366 and contributing to Issue #7332. I added test cases openmp_44.f90 and openmp_45.f90 to verify the implementation. Additionally, I fixed a bug in the reduction clause for standalone parallel constructs (reported in Issue #7618) by maintaining a map of clauses based on nesting levels, ensuring proper hierarchical application across parallel, parallel do, and standalone do constructs.
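
To see why clauses have to be tracked per nesting level, consider a sketch like the one below (my own illustrative example, not one of the new test cases): the parallel region carries one reduction clause while the enclosed do construct carries a different one, so the pass has to apply each clause at the level where it appears rather than merging them.

program nested_reduction_sketch
    use omp_lib
    implicit none
    integer :: i, s
    real :: p
    s = 0
    p = 1.0
    call omp_set_num_threads(4)
    ! Outer level: the parallel region carries a multiplicative reduction on p.
    !$omp parallel reduction(*:p)
        p = p*1.5
        ! Inner level: the do construct carries its own additive reduction on s.
        !$omp do reduction(+:s)
        do i = 1, 100
            s = s + i
        end do
        !$omp end do
    !$omp end parallel
    print *, p, s   ! expected: 1.5**4 = 5.0625 and 5050 with 4 threads
end program nested_reduction_sketch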

In PR #7638, I implemented the single and master constructs using a similar approach. For single, I assigned execution to the thread with ID 0, aligning it with the master construct for consistency. This choice is supported by the OpenMP 6.0 specification (page 405), which notes that the choice of thread executing a single block is implementation defined. If needed, this can be refined in future updates based on further feedback or requirements.
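
For a concrete picture, the source form these constructs accept looks like the sketch below (an illustrative example of my own, not one of the test cases from the PR). Conceptually, the pass guards each block inside the outlined parallel-region subroutine with a check equivalent to if (omp_get_thread_num() == 0); for single, OpenMP additionally specifies an implicit barrier at the end of the block unless nowait is given, while master implies no barrier.

program single_master_sketch
    use omp_lib
    implicit none
    call omp_set_num_threads(4)
    !$omp parallel
    !$omp single
        ! Run by exactly one thread (thread 0 in this implementation).
        print *, "single block"
    !$omp end single
    !$omp master
        ! Run by thread 0 only, with no implied barrier.
        print *, "master block"
    !$omp end master
    !$omp end parallel
end program single_master_sketch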

The results are promising. Both PRs ensure no regression in existing OpenMP test cases, with the updated ASR representations handled seamlessly by the extended OpenMP pass. The sections construct now supports concurrent execution of independent code blocks, while the bug fix enables correct reduction behavior. The single and master constructs are now supported, with thread ID 0 executing the designated blocks, providing a good step toward implementing the task construct.

Example: Sections and Reduction Constructs

To demonstrate the new implementations, consider the following MREs from PR #7619. The first example tests the bug fix for the reduction clause in a standalone parallel construct:

View MRE for Reduction Bug Fix
program openmp_47
    use omp_lib
    implicit none
    real :: res
    res = 1
    call omp_set_num_threads(16)
    !$omp parallel reduction(*:res)
        res = res*1.5
    !$omp end parallel
    if (res /= 1.5**16) error stop
    print *, res
end program openmp_47

The OpenMP pass lowers this to the following Fortran code using GOMP calls, ensuring the reduction(*:res) operates correctly across threads:

View Lowered Fortran Code for Reduction Bug Fix
module thread_data_module_openmp_47
  use, intrinsic :: iso_c_binding
  implicit none
  type, bind(C) :: thread_data
    real(c_float) :: res
  end type thread_data
end module thread_data_module_openmp_47

interface
  subroutine GOMP_parallel(fn, data, num_threads, flags) bind(C, name="GOMP_parallel")
    use, intrinsic :: iso_c_binding
    type(c_ptr), value :: fn, data
    integer(c_int), value :: num_threads, flags
  end subroutine
  subroutine GOMP_atomic_start() bind(C, name="GOMP_atomic_start")
  end subroutine
  subroutine GOMP_atomic_end() bind(C, name="GOMP_atomic_end")
  end subroutine
end interface

subroutine parallel_region(data) bind(C)
  use omp_lib
  use thread_data_module_openmp_47
  implicit none
  type(c_ptr), value :: data
  type(thread_data), pointer :: d
  real :: res_local

  call c_f_pointer(data, d)
  res_local = 1.5 ! Local computation for reduction

  ! Perform atomic update for reduction
  call GOMP_atomic_start()
  d%res = d%res * res_local
  call GOMP_atomic_end()
end subroutine

program openmp_47
  use omp_lib
  use thread_data_module_openmp_47
  implicit none
  real :: res
  type(thread_data), target :: data
  type(c_ptr) :: ptr

  res = 1
  call omp_set_num_threads(16)
  data%res = 1
  ptr = c_loc(data)

  call GOMP_parallel(c_funloc(parallel_region), ptr, 0, 0)
  res = data%res

  if (res /= 1.5**16) error stop
  print *, res
end program openmp_47

The second example showcases the sections construct with a reduction(+:tid) clause:

View MRE for Sections Construct
module openmp_44_parallel_sections
  implicit none

contains

  subroutine compute_a()
    print *, "Computing A"
  end subroutine compute_a

  subroutine compute_b()
    print *, "Computing B"
  end subroutine compute_b

  subroutine compute_c()
    print *, "Computing C"
  end subroutine compute_c

end module openmp_44_parallel_sections

program openmp_44
  use omp_lib
  use openmp_44_parallel_sections
  implicit none
  integer :: tid = 0

  !$omp parallel sections reduction(+:tid)
  !$omp section
  call compute_a()
  tid = tid + omp_get_thread_num()
  print *, "Thread ID:", tid

  !$omp section
  call compute_b()
  tid = tid + omp_get_thread_num()
  print *, "Thread ID:", tid

  !$omp section
  call compute_c()
  tid = tid + omp_get_thread_num()
  print *, "Thread ID:", tid
  !$omp end parallel sections
  print *, "Final Thread ID:", tid

end program openmp_44

The OpenMP pass lowers this to the following Fortran code using GOMP calls, distributing the sections across threads:

View Lowered Fortran Code for Sections Construct
module openmp_44_parallel_sections
  implicit none
contains
  subroutine compute_a()
    print *, "Computing A"
  end subroutine compute_a
  subroutine compute_b()
    print *, "Computing B"
  end subroutine compute_b
  subroutine compute_c()
    print *, "Computing C"
  end subroutine compute_c
end module openmp_44_parallel_sections

module thread_data_module_openmp_44
  use, intrinsic :: iso_c_binding
  implicit none
  type, bind(C) :: thread_data
    integer(c_int) :: tid
  end type thread_data
end module thread_data_module_openmp_44

interface
  subroutine GOMP_parallel(fn, data, num_threads, flags) bind(C, name="GOMP_parallel")
    use, intrinsic :: iso_c_binding
    type(c_ptr), value :: fn, data
    integer(c_int), value :: num_threads, flags
  end subroutine
  integer(c_int) function GOMP_sections_start(count) bind(C, name="GOMP_sections_start")
    use, intrinsic :: iso_c_binding
    integer(c_int), value :: count
  end function
  integer(c_int) function GOMP_sections_next() bind(C, name="GOMP_sections_next")
    use, intrinsic :: iso_c_binding
  end function
  subroutine GOMP_sections_end() bind(C, name="GOMP_sections_end")
  end subroutine
  subroutine GOMP_atomic_start() bind(C, name="GOMP_atomic_start")
  end subroutine
  subroutine GOMP_atomic_end() bind(C, name="GOMP_atomic_end")
  end subroutine
end interface

subroutine parallel_sections(data) bind(C)
  use omp_lib
  use thread_data_module_openmp_44
  use openmp_44_parallel_sections
  implicit none
  type(c_ptr), value :: data
  type(thread_data), pointer :: d
  integer(c_int) :: section_id
  integer :: tid_local

  call c_f_pointer(data, d)
  tid_local = 0 ! Initialize local reduction variable

  section_id = GOMP_sections_start(3)
  do while (section_id /= 0)
    if (section_id == 1) then
      call compute_a()
      tid_local = tid_local + omp_get_thread_num()
      print *, "Thread ID:", tid_local
    else if (section_id == 2) then
      call compute_b()
      tid_local = tid_local + omp_get_thread_num()
      print *, "Thread ID:", tid_local
    else if (section_id == 3) then
      call compute_c()
      tid_local = tid_local + omp_get_thread_num()
      print *, "Thread ID:", tid_local
    end if
    section_id = GOMP_sections_next()
  end do
  call GOMP_sections_end()

  ! Perform atomic update for reduction
  call GOMP_atomic_start()
  d%tid = d%tid + tid_local
  call GOMP_atomic_end()
end subroutine

program openmp_44
  use omp_lib
  use thread_data_module_openmp_44
  implicit none
  integer :: tid = 0
  type(thread_data), target :: data
  type(c_ptr) :: ptr

  data%tid = 0
  ptr = c_loc(data)

  call GOMP_parallel(c_funloc(parallel_sections), ptr, 0, 0)
  tid = data%tid
  print *, "Final Thread ID:", tid
end program openmp_44

Next Steps

In Week 5, I plan to focus on the following tasks:

  • Implement the task construct using the OMPRegion node, lowering it to GOMP_task calls (Issue #7365); a rough sketch of the source form this will handle appears after this list.
  • Fix some bugs related to the ASR generation for complicated nested pragmas and to the updating of variables that are not in the reduction clause.
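
As mentioned in the first item, the kind of source the task lowering will need to handle looks roughly like the sketch below (my own illustrative example, not an existing LFortran test case); the construct creates explicit tasks that the runtime schedules onto the team's threads.

program task_sketch
    use omp_lib
    implicit none
    integer :: x
    x = 0
    !$omp parallel
    !$omp single
        ! The pass will need to outline the task body and hand it to the
        ! runtime, e.g. through a GOMP_task call.
        !$omp task shared(x)
            x = x + 1
        !$omp end task
        !$omp taskwait
    !$omp end single
    !$omp end parallel
    print *, x   ! expected: 1
end program task_sketch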

I would like to thank my mentors, Ondrej Certik, Pranav Goswami, and Gaurav Dhingra, for their valuable guidance and reviews, which helped shape these implementations. I also thank the LFortran community for their ongoing support.