Following Week 6's enhancements to the TASK and TASKWAIT constructs, Week 7 focused on implementing the TASKLOOP, TEAMS, and DISTRIBUTE constructs. Last week, I planned to work on teams and optimize TASK support, but I expanded the scope to include these additional constructs via #7951. This week, I successfully compiled and ran five MREs, spending about 27 hours to ensure reduction and shared variables work smoothly, including parallel regions within teams.
Implementing TASKLOOP, TEAMS, and DISTRIBUTE Constructs
This week, I worked on adding support for the TASKLOOP, TEAMS, and DISTRIBUTE constructs to LFortran. TASKLOOP turned out to be very similar to the TASK construct, with just a couple of extra clauses, so I reused the existing TASK implementation by transforming !$omp taskloop into a do loop with nested TASK directives, keeping the approach simple and effective. For TEAMS, I used a GOMP_teams_reg@GOMP_5.0 call with a function pointer for the body and handled shared data, similar to parallel regions. Since TEAMS sits higher in the hierarchy, each team can then spawn multiple threads with a parallel construct. The DISTRIBUTE construct was integrated to divide loop iterations across the teams, ensuring proper work distribution.
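To make the TASKLOOP rewrite concrete, below is a minimal sketch of the equivalence written as ordinary user-level OpenMP, not the actual code generated in #7951; the program name, variable names, and the atomic update are my own additions for illustration.

program taskloop_sketch
   use omp_lib
   implicit none
   integer, parameter :: N = 5
   integer :: A(N), i, total

   A = 1
   total = 0
   !$omp parallel shared(A, total)
   !$omp single
   ! Conceptually, !$omp taskloop over i = 1..N becomes a plain do loop
   ! that spawns one explicit task per iteration (sketch only).
   do i = 1, N
      !$omp task firstprivate(i) shared(A, total)
      !$omp atomic
      total = total + A(i) * 2
      !$omp end task
   end do
   !$omp end single
   !$omp taskwait
   !$omp end parallel

   print *, "total =", total   ! expected 10
end program taskloop_sketch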
Bug Fix and Improvements
During implementation, I found a small bug: the DO construct's worksharing logic divided iterations by omp_get_max_threads instead of omp_get_num_threads, which returns the current team's thread count. This caused wrong distribution and incorrect results. I fixed it to use the correct function, ensuring accurate work sharing. I also confirmed that reduction and shared variables work properly across all constructs, including parallel regions inside teams. To set the thread_limit for CI testing, I added the environment variable KMP_TEAMS_THREAD_LIMIT=32 in CMakeLists.txt, applying it only to the LLVM-OMP backend for OpenMP tests.
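As a small illustration of why the fix matters, the sketch below hand-computes static worksharing chunks from omp_get_num_threads(); it is my own example with an assumed ceiling-division chunking scheme, not LFortran's generated code. If the divisor were omp_get_max_threads() while the team actually had fewer threads, the chunks assigned to the nonexistent threads would simply never run.

program worksharing_sketch
   use omp_lib
   implicit none
   integer, parameter :: N = 100
   integer :: nthreads, tid, chunk, lo, hi, i, total

   total = 0
   !$omp parallel private(nthreads, tid, chunk, lo, hi, i) reduction(+:total)
   ! The chunk size must come from the number of threads actually in the
   ! current team, not from omp_get_max_threads().
   nthreads = omp_get_num_threads()
   tid      = omp_get_thread_num()
   chunk    = (N + nthreads - 1) / nthreads
   lo       = tid * chunk + 1
   hi       = min(lo + chunk - 1, N)
   do i = lo, hi
      total = total + i
   end do
   !$omp end parallel

   print *, "total =", total   ! expected 5050 regardless of thread count
end program worksharing_sketch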
Examples: TASKLOOP, TEAMS, and DISTRIBUTE Constructs
Below are the five MREs I compiled and ran successfully this week to test the new constructs and fixes via #7951.

View MRE for TASKLOOP (openmp_58.f90)
program openmp_58
   use omp_lib
   implicit none
   integer, parameter :: N = 5
   integer :: A(N)
   integer :: i, index, total

   A = 1
   total = 0
   index = 1

   !$omp parallel
   !$omp single
   !$omp taskloop shared(A)
   do i = 1, N
      total = total + A(index) * 2
      index = index + 1
   end do
   !$omp end taskloop
   !$omp end single
   !$omp taskwait
   !$omp end parallel

   print *, "Total = ", total, index
   if (total /= 10) error stop
end program openmp_58
View MRE for TEAMS (openmp_59.f90)
program openmp_59
   use omp_lib
   integer :: sum = 0
   !$omp teams num_teams(3) reduction(+:sum)
   print *, omp_get_team_num()
   sum = sum + omp_get_team_num()
   !$omp end teams
   if (sum /= 3) error stop
end program openmp_59
View MRE for TEAMS with PARALLEL (openmp_60.f90)
program openmp_60
   use omp_lib
   implicit none
   integer :: sum, team_sums(4) = 0, local_sum = 0

   sum = 0
   !$omp teams num_teams(4) thread_limit(3) shared(team_sums) private(local_sum) reduction(+:sum)
   !$omp parallel shared(team_sums) private(local_sum) reduction(+:sum)
   local_sum = omp_get_thread_num() * 10 + omp_get_team_num()
   sum = sum + local_sum
   !$omp critical
   team_sums(omp_get_team_num() + 1) = team_sums(omp_get_team_num() + 1) + local_sum
   !$omp end critical
   !$omp end parallel
   !$omp end teams

   print *, team_sums
   print *, sum
   if (sum /= 138) error stop
   if (team_sums(1) /= 30) error stop
   if (team_sums(2) /= 33) error stop
   if (team_sums(3) /= 36) error stop
   if (team_sums(4) /= 39) error stop
end program openmp_60
View MRE for DISTRIBUTE (openmp_61.f90)
program openmp_61
   use omp_lib
   implicit none
   integer :: array(1005), i, sum = 0

   !$omp teams num_teams(4)
   !$omp distribute
   do i = 1, 1000
      array(i) = i * 2
   end do
   !$omp end distribute
   !$omp end teams

   ! Sum of all elements
   !$omp parallel do reduction(+:sum)
   do i = 1, 1000
      sum = sum + array(i)
   end do
   !$omp end parallel do

   print *, sum
   if (sum /= 1001000) error stop
end program openmp_61
View MRE for DISTRIBUTE with Nested PARALLEL DO (openmp_62.f90)
program openmp_62
   use omp_lib
   implicit none
   integer :: array(1000), i, j, sum = 0

   array(1) = 3
   !$omp teams num_teams(2) thread_limit(5)
   !$omp distribute
   do i = 1, 1000, 100
      print *, omp_get_num_threads(), omp_get_max_threads()
      !$omp parallel do
      do j = i, min(i + 99, 1000)
         array(j) = j * 3
      end do
      !$omp end parallel do
   end do
   !$omp end distribute
   !$omp end teams

   ! Sum of all elements
   !$omp parallel do reduction(+:sum)
   do i = 1, 1000
      sum = sum + array(i)
   end do
   !$omp end parallel do

   print *, sum
   if (sum /= 1501500) error stop
end program openmp_62
Next Steps
For Week 8, I plan to:
- Implement the SIMD construct using the OMPRegion node (Issue #7332).
- Fix more bugs and implement other clauses if possible.
I thank my mentors, Ondrej Certik, Pranav Goswami, and Gaurav Dhingra, for their guidance and support. I also appreciate the LFortran community’s support throughout this process.