Continuing from Week 3, where I shifted the `parallel do` logic to the `OMPRegion` node and extended the OpenMP pass, Week 4 focused on implementing the `sections` construct and adding support for the `single` and `master` constructs. In my previous blog, I planned to implement `sections` by lowering it to `GOMP_sections_start` and `GOMP_sections_end` calls. This week, I completed that task in PR #7619 and extended support for `single` and `master` in PR #7638, spending around 19 hours on these updates.
Choice of Implementing the Sections Construct
I started Week 4 aiming to add the `sections` construct, which lets different threads run separate code blocks at the same time, unlike the loop-based `parallel do`. It made more sense, though, to build on last week's `parallel do` work first: since `sections` needs the `parallel` construct anyway, tackling `parallel` early helped set a strong base. Here's why this choice mattered:
- Solid Foundation: Shifting `parallel do` to `OMPRegion` first made it easier to add `sections` later.
- Kept Things Working: I ensured all current tests still ran smoothly, avoiding any setbacks.
- Saved Time: Handling `parallel` early meant less rework when adding other constructs.
Implementation Details and Results
In PR #7619, I implemented the `sections` construct in the OpenMP pass, addressing Issue #7366 and contributing to Issue #7332. I added test cases `openmp_44.f90` and `openmp_45.f90` to verify the implementation. Additionally, I fixed a bug in the `reduction` clause for standalone `parallel` constructs (reported in Issue #7618) by maintaining a map of clauses keyed on nesting level, ensuring the clauses are applied hierarchically across `parallel`, `parallel do`, and standalone `do` constructs.
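To see why a per-nesting-level map is needed, consider a hypothetical program (not one of the PR's test files; the names are placeholders) in which the `parallel` construct and an enclosed `do` construct each carry their own `reduction` clause. The pass has to apply each clause at its own level rather than flattening them:

```fortran
program nested_reduction_demo  ! hypothetical name, for illustration only
    use omp_lib
    implicit none
    integer :: i, outer_cnt, inner_sum
    outer_cnt = 0
    inner_sum = 0
    call omp_set_num_threads(4)
    !$omp parallel reduction(+:outer_cnt)
    outer_cnt = outer_cnt + 1      ! reduced at the parallel level
    !$omp do reduction(+:inner_sum)
    do i = 1, 10
        inner_sum = inner_sum + i  ! reduced at the enclosed do level
    end do
    !$omp end do
    !$omp end parallel
    print *, outer_cnt, inner_sum  ! expected: 4 and 55
end program nested_reduction_demo
```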
In PR #7638, I implemented the `single` and `master` constructs using a similar approach. For `single`, I assigned execution to the thread with ID 0, aligning it with the `master` construct for consistency. This choice is supported by the OpenMP 6.0 specification (page 405), which notes that the choice of thread executing a `single` block is implementation-defined. If needed, this can be refined in future updates based on further feedback or requirements.
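As a quick illustration, here is a hypothetical snippet (not one of the PR's test files) showing both constructs; under the thread-ID-0 mapping, both blocks end up running on thread 0. Note that, per the OpenMP specification, `end single` carries an implicit barrier while `master` has none:

```fortran
program single_master_demo  ! hypothetical name, for illustration only
    use omp_lib
    implicit none
    call omp_set_num_threads(4)
    !$omp parallel
    !$omp single
    print *, "single block executed by thread", omp_get_thread_num()
    !$omp end single   ! implicit barrier: the whole team waits here
    !$omp master
    print *, "master block executed by thread", omp_get_thread_num()
    !$omp end master   ! no implicit barrier for master
    !$omp end parallel
end program single_master_demo
```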
The results are promising. Both PRs introduce no regressions in the existing OpenMP test cases, and the updated ASR representations are handled seamlessly by the extended OpenMP pass. The `sections` construct now supports concurrent execution of independent code blocks, while the bug fix enables correct `reduction` behavior. The `single` and `master` constructs are now supported, with thread ID 0 executing the designated blocks, which is a good step toward implementing the `task` construct.
Example: Sections and Reduction Constructs
To demonstrate the new implementations, consider the following minimal reproducible examples (MREs) from PR #7619. The first example tests the bug fix for the `reduction` clause in a standalone `parallel` construct:
```fortran
program openmp_47
    use omp_lib
    implicit none
    real :: res
    res = 1
    call omp_set_num_threads(16)
    !$omp parallel reduction(*:res)
    res = res*1.5
    !$omp end parallel
    if (res /= 1.5**16) error stop
    print *, res
end program openmp_47
```
The OpenMP pass lowers this to the following Fortran code using GOMP calls, ensuring that `reduction(*:res)` operates correctly across threads:
```fortran
module thread_data_module_openmp_47
    use, intrinsic :: iso_c_binding
    implicit none
    type, bind(C) :: thread_data
        real(c_float) :: res
    end type thread_data
end module thread_data_module_openmp_47

interface
    subroutine GOMP_parallel(fn, data, num_threads, flags) bind(C, name="GOMP_parallel")
        use, intrinsic :: iso_c_binding
        type(c_ptr), value :: fn, data
        integer(c_int), value :: num_threads, flags
    end subroutine
    subroutine GOMP_atomic_start() bind(C, name="GOMP_atomic_start")
    end subroutine
    subroutine GOMP_atomic_end() bind(C, name="GOMP_atomic_end")
    end subroutine
end interface

subroutine parallel_region(data) bind(C)
    use omp_lib
    use thread_data_module_openmp_47
    implicit none
    type(c_ptr), value :: data
    type(thread_data), pointer :: d
    real :: res_local

    call c_f_pointer(data, d)
    res_local = 1.5 ! Local computation for reduction

    ! Perform atomic update for reduction
    call GOMP_atomic_start()
    d%res = d%res * res_local
    call GOMP_atomic_end()
end subroutine

program openmp_47
    use omp_lib
    use thread_data_module_openmp_47
    implicit none
    real :: res
    type(thread_data), target :: data
    type(c_ptr) :: ptr

    res = 1
    call omp_set_num_threads(16)
    data%res = 1
    ptr = c_loc(data)

    call GOMP_parallel(c_funloc(parallel_region), ptr, 0, 0)
    res = data%res

    if (res /= 1.5**16) error stop
    print *, res
end program openmp_47
```
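Two details of this lowering are worth noting. Each thread folds its contribution into the shared `res` through a local copy guarded by the `GOMP_atomic_start`/`GOMP_atomic_end` pair, which serializes the combine step. And the `num_threads` argument of `GOMP_parallel` is passed as 0, which libgomp interprets as "use the current default", i.e. the team size set earlier by `omp_set_num_threads(16)`.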
The second example showcases the `sections` construct with a `reduction(+:tid)` clause:
```fortran
module openmp_44_parallel_sections
    implicit none
contains
    subroutine compute_a()
        print *, "Computing A"
    end subroutine compute_a

    subroutine compute_b()
        print *, "Computing B"
    end subroutine compute_b

    subroutine compute_c()
        print *, "Computing C"
    end subroutine compute_c
end module openmp_44_parallel_sections

program openmp_44
    use omp_lib
    use openmp_44_parallel_sections
    implicit none
    integer :: tid = 0

    !$omp parallel sections reduction(+:tid)
    !$omp section
    call compute_a()
    tid = tid + omp_get_thread_num()
    print *, "Thread ID:", tid

    !$omp section
    call compute_b()
    tid = tid + omp_get_thread_num()
    print *, "Thread ID:", tid

    !$omp section
    call compute_c()
    tid = tid + omp_get_thread_num()
    print *, "Thread ID:", tid
    !$omp end parallel sections
    print *, "Final Thread ID:", tid
end program openmp_44
```
The OpenMP pass lowers this to the following Fortran code using GOMP calls, distributing the sections across threads:
```fortran
module openmp_44_parallel_sections
    implicit none
contains
    subroutine compute_a()
        print *, "Computing A"
    end subroutine compute_a
    subroutine compute_b()
        print *, "Computing B"
    end subroutine compute_b
    subroutine compute_c()
        print *, "Computing C"
    end subroutine compute_c
end module openmp_44_parallel_sections

module thread_data_module_openmp_44
    use, intrinsic :: iso_c_binding
    implicit none
    type, bind(C) :: thread_data
        integer(c_int) :: tid
    end type thread_data
end module thread_data_module_openmp_44

interface
    subroutine GOMP_parallel(fn, data, num_threads, flags) bind(C, name="GOMP_parallel")
        use, intrinsic :: iso_c_binding
        type(c_ptr), value :: fn, data
        integer(c_int), value :: num_threads, flags
    end subroutine
    integer(c_int) function GOMP_sections_start(count) bind(C, name="GOMP_sections_start")
        use, intrinsic :: iso_c_binding
        integer(c_int), value :: count
    end function
    integer(c_int) function GOMP_sections_next() bind(C, name="GOMP_sections_next")
        use, intrinsic :: iso_c_binding
    end function
    subroutine GOMP_sections_end() bind(C, name="GOMP_sections_end")
    end subroutine
    subroutine GOMP_atomic_start() bind(C, name="GOMP_atomic_start")
    end subroutine
    subroutine GOMP_atomic_end() bind(C, name="GOMP_atomic_end")
    end subroutine
end interface

subroutine parallel_sections(data) bind(C)
    use omp_lib
    use thread_data_module_openmp_44
    use openmp_44_parallel_sections
    implicit none
    type(c_ptr), value :: data
    type(thread_data), pointer :: d
    integer(c_int) :: section_id
    integer :: tid_local

    call c_f_pointer(data, d)
    tid_local = 0 ! Initialize local reduction variable

    section_id = GOMP_sections_start(3)
    do while (section_id /= 0)
        if (section_id == 1) then
            call compute_a()
            tid_local = tid_local + omp_get_thread_num()
            print *, "Thread ID:", tid_local
        else if (section_id == 2) then
            call compute_b()
            tid_local = tid_local + omp_get_thread_num()
            print *, "Thread ID:", tid_local
        else if (section_id == 3) then
            call compute_c()
            tid_local = tid_local + omp_get_thread_num()
            print *, "Thread ID:", tid_local
        end if
        section_id = GOMP_sections_next()
    end do
    call GOMP_sections_end()

    ! Perform atomic update for reduction
    call GOMP_atomic_start()
    d%tid = d%tid + tid_local
    call GOMP_atomic_end()
end subroutine

program openmp_44
    use omp_lib
    use thread_data_module_openmp_44
    implicit none
    integer :: tid = 0
    type(thread_data), target :: data
    type(c_ptr) :: ptr

    data%tid = 0
    ptr = c_loc(data)

    call GOMP_parallel(c_funloc(parallel_sections), ptr, 0, 0)
    tid = data%tid
    print *, "Final Thread ID:", tid
end program openmp_44
```
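Here `GOMP_sections_start(3)` both enters the work-sharing region and returns the first section index assigned to the calling thread, `GOMP_sections_next()` keeps handing out the remaining section indices until it returns 0, and `GOMP_sections_end()` closes the region with its implied barrier. The `reduction(+:tid)` clause is handled the same way as before: each thread accumulates into `tid_local`, and the combine into `d%tid` is serialized atomically.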
Next Steps
In Week 5, I plan to focus on the following tasks:
- Implement the `task` construct using the `OMPRegion` node, lowering it to `GOMP_task` calls (Issue #7365); a minimal sketch of the kind of program this will need to handle follows this list.
- Fix some bugs in ASR generation for complicated nested pragmas and in the updating of variables that are not in the reduction clause.
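For reference, here is a hypothetical MRE (the program and variable names are placeholders, not actual test files) of the kind the planned `task` lowering will need to handle:

```fortran
program task_demo  ! hypothetical name, for illustration only
    use omp_lib
    implicit none
    integer :: x
    x = 0
    !$omp parallel
    !$omp single
    !$omp task shared(x)
    x = x + 1       ! deferred work; may run on any thread in the team
    !$omp end task
    !$omp taskwait  ! wait for the child task before leaving the single block
    !$omp end single
    !$omp end parallel
    print *, x      ! expected: 1
end program task_demo
```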
I would like to thank my mentors, Ondrej Certik, Pranav Goswami, and Gaurav Dhingra, for their valuable guidance and reviews, which helped shape these implementations. I also thank the LFortran community for their ongoing support.