This is the third and the final post in the series of matrix multiplication.
For what is tiling, you can look up this post.
Current OpenMP programming language is tile oblivious, although it is the de facto standard for writing parallel programs on shared memory systems. Tiling not only can improve data locality for both the sequential and parallel programs, but also can help the compiler to maximise parallelism and minimise synchronisation for programs running on parallel machines. There is a promising future work in trying to make OpenMP tile-aware and suitably add directives which can be sort of an annotation to mark regions that can help OpenMP become tile-aware.
Currently, OpenMP doesn’t support reduction on multi dimensional array and it requires us to manually add the code. Work has been done to add functionality to OpenMP and do parallel reductions on multi dimensional arrays and not just on scalar variables. Also apparently, the current OpenMP parallelisation techniques can not harvest the maximum parallelism and data locality in the code at the same time. They suffer from either insufficient parallelism or poor data locality.
This can be done basically in four steps
- Distribute the iterations of the parallelized loop among the threads
- Allocate memory for the private copy of the tile used in the local recursive calculation
- Perform the local recursive calculation which is specified by the reduction kernelloops
- Update the global copy of the reduction tile
In my solution, I have instead used a flattened 2D array. A private variable is then created and updates happen to that variable which is then stored back to the cell.
You can find my code here.