Stencil Computing

Sounds nice, right? But what is stencil computing?

Stencil computations, involving operations over the elements of an array, are a common programming pattern in scientific computing, games, and image processing. As a programming pattern, stencil computations are highly regular and amenable to optimisation and parallelisation.

This is why this series of posts should interest you: it will sharpen your skills and get you started on a topic with lots of active research, and a chance for you to make a tangible difference!

There is not a lot on the internet for people exploring stencil computations, unless you are looking on Google Scholar! 😛 Hence, my motivation to work on this area and write about it is for the whole community, not just the scientific and research community.

So this is what I got started with!

Partial differential equation (PDE) solvers constitute a large fraction of scientific applications in such diverse areas as heat diffusion, electromagnetics, and fluid dynamics. These applications are often implemented using iterative finite-difference techniques, which sweep over a spatial grid, performing nearest neighbor computations called stencils. In a stencil operation, each point in a multidimensional grid is updated with weighted contributions from a subset of its neighbors in both time and space, thereby representing the coefficients of the PDE for that data element. These operations are then used to build solvers that range from simple Jacobi iterations to complex multigrid and adaptive mesh refinement methods.

To summarise, stencils are computations where each cell’s value in a multidimensional array is computed from its neighbouring cells. This can be shown to be true after working through some amount of mathematics. The good news is that you as programmers need not deal with the equations and the calculus behind it.

For the series of posts that follow, all explanations will refer to a 3D 7-point stencil for a heat equation solver. You can also check out my code for a 2D 5-point stencil for the same heat solver.

(a) 3D 7-point stencil
(b) 2D 5-point stencil

As shown above, all seven grid points in fig. (a) are used to compute the value of the cell in the middle, and similarly in the case of fig. (b).
The stencil equation for which all implementations are tested is
Anext(i, j, k) = A0(i, j, k+1) + A0(i, j, k-1) + A0(i, j+1, k)
               + A0(i, j-1, k) + A0(i+1, j, k) + A0(i-1, j, k)
               - 6 * A0(i, j, k) / factor2

This update is applied repeatedly; the number of repetitions is called the number of time steps. The name comes from the fact that the sweeps happen over time, each one depending on the result of the previous iteration.

The next posts will discuss advances in the area and how we can harness the parallelism:

Part 1. Naive Algorithm

Part 2. Block Algorithm

Part 3. Time Skewing Algorithm

The code for both the 2D solver and the 3D solver can be found on GitHub – Stencil Code.

I use a MacBook Air with an i5 processor, 4 GB RAM, and an L1 cache size of 64 KB. The maximum number of threads that can run on my CPU is 8. All experimental setup and results are based on the above system configuration. The results, with their checksum values for verifying a correct implementation, can be found here.

I looked into many research papers to get a glimpse of what is happening and to understand what sort of research is taking place. The following are the references –

1. Tiling Optimizations for 3D Scientific Computation

2. Implicit and Explicit Optimizations for Stencil Computation

3. Tiling Stencil Computations to Maximize Parallelism

There is a lot of promising work to be done in this field. Other areas on which further posts will be based are –

1. Diamond Tiling: Tiling Techniques to Maximize Parallelism for Stencil Computations

2. Parameterized Diamond Tiling for Stencil Computations

Also, stencil computations on GPUs hold a lot of promise, and research in the area is very active. As soon as I am able to lay my hands on a GPU, I will definitely try out all these optimisations and explore further possibilities for improved performance and parallelisation.

Thank you.


Introduction To Parallel Computing!

Meet Prof X and Student Y

Student Y keeps asking all the “why” questions, and the answers from our very own Prof X will probably help you resolve your doubts and clarify your concepts!

Ahh! So now that you’ve met the two protagonists, what are we waiting for?

Prof X : So a long time ago, the hero of our story, Mr. Comp, was a very hardworking guy. His services were quite expensive and available only to the elite, and his special resources were used only by knowledgeable people equipped with special skill sets, who exploited Comp’s services in the best possible manner to make every buck worth it.

Student Y : Looks like he was a big shot! By the way, I do get the implicit references you are making, professor. Yeah, I know that in the olden days computers were big, expensive things and the cost of running them ran into millions. Hence, only expert programmers were allowed to handle them. With computing resources becoming cheaper day by day, they are now accessible to millions around the world.

Prof X : Haha! I give it to you. Yes, the technological advancements have been very quick, but do you know that these advancements actually stick to a pattern? To put it dramatically, the prophecy (Moore’s law) of Gordon Moore continues to be fulfilled. Now that the capabilities of our computers continue to grow, we should also learn to harness these powers. This is what parallel computing is about.

Student : Yeah, true. I read somewhere that the iPhone itself is almost 3 times more powerful than the supercomputers of the 1980s, but what is there for me to learn about all of this on the software side? I mean, it’s like Mr Comp has grown more muscle and he still does the same tasks, just with more speed and power.

Prof X : Haha, no! There is a vast ocean of knowledge regarding this new era of computing. Coming back to our story: Mr Comp did indeed become stronger and more powerful, but how much stress could he take alone? Bad for him, right? So after some time he evolved into a company of more than one employee. See now where we come into the picture?

Student : Huh? I’m not totally getting you, Prof X.

Prof X : Wait for it. He started out by creating an excellent team of professionals. To handle the tasks given to the Memory department in a faster manner, we recommended he hire L1, L2, L3 and a few others, and at the crux of the company he added a few more people just as capable as him.

Student : Haha! I get it now. It’s like a multiprocessor machine with a cache hierarchy, hence those names L1, L2 and L3. It still doesn’t answer my previous question, professor. More people, more work done. How are things changing for us?

Prof X : Smart. Yes, to some extent Comp was able to do your same old job in a quicker way. Although the labour had become cheap, we were still sticking to our old ways. The responsibility fell on us to best use Mr Comp’s resources. We now have to think about how best to use those additional resources available to us. What task should each of them do? How do we make sure there are no clashes of interest among them? We would be giving them orders and making use of the hierarchical, structured resources provided to us.

Student : Oh! Now I’m getting it. It’s the developer’s onus to make the best use of this new computer architecture, and that is what parallel computing and high performance computing are about. Right?

Prof X : Yes, you’re indeed right. To harness these resources best, you need to understand a little of the hardware aspects as well as be able to direct the workers of Mr Comp efficiently (the software). We must be able to exploit Mr Comp’s resources and finish our tasks early, so that we can go home and have some good family time.

Student : Thanks, Prof X!! You certainly have ignited a spark in me, but it does take some effort to turn that spark into a fire and make me *dangerous*. I will do a bit more reading and come back to you.

Prof X : Before you leave, I think I should introduce you to a person from the open source world who is here to help us for free. With his help, you’ll be better able to make use of Mr Comp’s resources. Meet none other than Mr. OpenMP. He is like a middleman who makes our life simpler.

Student : Thank you, professor! Before leaving, could you suggest a few books or good sources of information?

Prof X : There happen to be a lot of books out there. The few I’ve been able to lay my eyes on and find interesting are –
1. Parallel Programming in C with MPI and OpenMP, Michael Quinn
2. Using Open MP, Barbara Chapman , Gabrielle Jost, Ruud Van Der Pas 
I’ll update the list whenever I find something new and interesting. These books also discuss other techniques, so do read up on them too. The OpenMP documentation from LLNL also happens to be a good source to read from.

Student : Thanks professor!

Prof X : Anyways, go into the wild. Google a bit, read a bit, experiment a bit and come back with lots of queries and doubts!

I guess a few of your “why” questions were answered and hopefully Prof X and his conversation with Student Y turned out to be helpful for you.

Thank you.

Signing Off,
Sumedh Arani.