Following Week 6’s enhancements to the TASK and TASKWAIT constructs, Week 7 focused on implementing the TASKLOOP, TEAMS, and DISTRIBUTE constructs. Last week I had planned to work on teams and optimize TASK support, but I expanded the scope to cover these additional constructs in #7951. The week took about 27 hours: I successfully compiled and ran five MREs and verified that reduction and shared variables work correctly across the new constructs, including parallel regions nested inside teams.

Implementing TASKLOOP, TEAMS, and DISTRIBUTE Constructs

This week, I worked on adding support for the TASKLOOP, TEAMS, and DISTRIBUTE constructs to LFortran. TASKLOOP is very similar to the TASK construct, with only a couple of extra clauses, so I reused the existing TASK implementation by transforming !$omp taskloop into a do loop with nested TASK directives, keeping the approach simple and effective. For TEAMS, I used a GOMP_teams_reg@GOMP_5.0 call with a function pointer for the region body and handled shared data the same way as for parallel regions. Since TEAMS sits higher in the hierarchy, each team can in turn spawn multiple threads with a parallel construct. The DISTRIBUTE construct divides loop iterations across the teams created by TEAMS.
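To make the TASKLOOP lowering concrete, here is a small hand-written Fortran sketch of the idea. This is only an illustration of the transformation described above, not LFortran’s generated code; grainsize/num_tasks chunking and full clause handling are omitted. The taskloop body becomes an ordinary do loop inside a single region that creates one TASK per iteration:

program taskloop_sketch
  use omp_lib
  implicit none
  integer, parameter :: N = 5
  integer :: A(N), i, total

  A = 1
  total = 0

  ! User-level form:
  !   !$omp taskloop shared(A, total)
  !   do i = 1, N
  !     total = total + A(i) * 2
  !   end do
  !   !$omp end taskloop
  !
  ! Conceptual lowering: a plain do loop that spawns one task per iteration.
  !$omp parallel
  !$omp single
  do i = 1, N
    !$omp task shared(A, total) firstprivate(i)
    !$omp atomic
    total = total + A(i) * 2
    !$omp end task
  end do
  !$omp taskwait
  !$omp end single
  !$omp end parallel

  print *, "Total = ", total   ! expected 10
end program taskloop_sketch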

Bug Fix and Improvements

During implementation, I found a small bug: the DO construct’s worksharing logic divided iterations by omp_get_max_threads instead of omp_get_num_threads. The latter returns the number of threads in the team actually executing the region, so using the maximum thread count produced a wrong iteration distribution and incorrect results. I switched to omp_get_num_threads, which restored correct work sharing. I also confirmed that reduction and shared variables work properly across all constructs, including parallel regions inside teams. To cap the team thread limit in CI, I set the environment variable KMP_TEAMS_THREAD_LIMIT=32 in CMakeLists.txt, applied only to OpenMP tests run with the LLVM-OMP backend.
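As a minimal sketch of what the fix means (an illustration, not LFortran’s actual code generation), the per-thread chunk of a statically divided worksharing DO has to be computed from omp_get_num_threads(), the size of the team currently executing the region; omp_get_max_threads() can report a larger value and leave some iterations unassigned:

program do_split_sketch
  use omp_lib
  implicit none
  integer, parameter :: N = 100
  integer :: a(N)
  integer :: tid, nthreads, chunk, lo, hi, i

  !$omp parallel shared(a) private(tid, nthreads, chunk, lo, hi, i)
  tid = omp_get_thread_num()
  ! Correct: the number of threads in the team executing this region.
  nthreads = omp_get_num_threads()
  ! The bug used omp_get_max_threads() here, which may exceed the actual
  ! team size, so some chunks were never executed.
  chunk = (N + nthreads - 1) / nthreads
  lo = tid * chunk + 1
  hi = min(lo + chunk - 1, N)
  do i = lo, hi
    a(i) = i
  end do
  !$omp end parallel

  print *, sum(a)   ! expected N*(N+1)/2 = 5050
end program do_split_sketch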

Examples: TASKLOOP, TEAMS, and DISTRIBUTE Constructs

Below are the five MREs I compiled and ran successfully this week to test the new constructs and fixes in #7951.

View MRE for TASKLOOP (openmp_58.f90)
program openmp_58
  use omp_lib
  implicit none
  integer, parameter :: N = 5
  integer :: A(N)
  integer :: i, index, total

  A = 1
  total = 0
  index = 1

  !$omp parallel
  !$omp single
  !$omp taskloop shared(A)
  do i = 1, N
    total = total + A(index) * 2
    index = index + 1
  end do
  !$omp end taskloop
  !$omp end single
  !$omp taskwait
  !$omp end parallel

  print *, "Total = ", total, index
  if (total /= 10) error stop
end program openmp_58
View MRE for TEAMS (openmp_59.f90)
program openmp_59
  use omp_lib
  integer :: sum = 0
  !$omp teams num_teams(3) reduction(+:sum)
  print *, omp_get_team_num()
  sum = sum + omp_get_team_num()
  !$omp end teams
  if (sum /= 3) error stop
end program openmp_59
View MRE for TEAMS with PARALLEL (openmp_60.f90)
program openmp_60
  use omp_lib
  implicit none
  integer :: sum, team_sums(4) = 0, local_sum = 0

  sum = 0
  !$omp teams num_teams(4) thread_limit(3) shared(team_sums) private(local_sum) reduction(+:sum)
  !$omp parallel shared(team_sums) private(local_sum) reduction(+:sum)
  local_sum = omp_get_thread_num() * 10 + omp_get_team_num()
  sum = sum + local_sum
  !$omp critical
  team_sums(omp_get_team_num() + 1) = team_sums(omp_get_team_num() + 1) + local_sum
  !$omp end critical
  !$omp end parallel
  !$omp end teams

  print *, team_sums
  print *, sum
  if (sum /= 138) error stop
  if (team_sums(1) /= 30) error stop
  if (team_sums(2) /= 33) error stop
  if (team_sums(3) /= 36) error stop
  if (team_sums(4) /= 39) error stop
end program openmp_60
View MRE for DISTRIBUTE (openmp_61.f90)
program openmp_61
  use omp_lib
  implicit none
  integer :: array(1005), i, sum = 0

  !$omp teams num_teams(4)
  !$omp distribute
  do i = 1, 1000
    array(i) = i * 2
  end do
  !$omp end distribute
  !$omp end teams

  ! Sum of all elements
  !$omp parallel do reduction(+:sum)
  do i = 1, 1000
    sum = sum + array(i)
  end do
  !$omp end parallel do

  print *, sum
  if (sum /= 1001000) error stop
end program openmp_61
View MRE for DISTRIBUTE with Nested PARALLEL DO (openmp_62.f90)
program openmp_62
  use omp_lib
  implicit none
  integer :: array(1000), i, j, sum = 0

  array(1) = 3
  !$omp teams num_teams(2) thread_limit(5)
  !$omp distribute
  do i = 1, 1000, 100
    print *, omp_get_num_threads(), omp_get_max_threads()
    !$omp parallel do
    do j = i, min(i + 99, 1000)
      array(j) = j * 3
    end do
    !$omp end parallel do
  end do
  !$omp end distribute
  !$omp end teams

  ! Sum of all elements
  !$omp parallel do reduction(+:sum)
  do i = 1, 1000
    sum = sum + array(i)
  end do
  !$omp end parallel do

  print *, sum
  if (sum /= 1501500) error stop
end program openmp_62

Next Steps

For Week 8, I plan to:

  • Implement the SIMD construct using the OMPRegion node (Issue #7332).
  • Fix more bugs and implement other clauses if possible.

I thank my mentors, Ondrej Certik, Pranav Goswami, and Gaurav Dhingra, for their guidance and support. I also appreciate the LFortran community’s support throughout this process.