Stencil Computing

Sounds nice? But what is stencil computing?

Stencil computations, involving operations over the elements of an array, are a common programming pattern in scientific computing, games, and image processing. As a programming pattern, stencil computations are highly regular and amenable to optimisation and parallelisation.

This is why this series of posts should interest you for you to sharpen your skills and get you started on a topic with with lots of research happening and also a chance for you to make a tangible difference!

Not a lot is there on the internet for people who are exploring about stencil computations unless you are looking out on Google Scholar! šŸ˜› Hence, my motivation to work on this area and write about it is for all of the community and not just for the scientific and the research community.

So this is what I got started with!

Partial differential equation (PDE) solvers constitute a large fraction of scientific applications in such diverse areas as heat diffusion, electromagnetics, and fluid dynamics. These applications are often implemented using iterative finite-difference techniques, which sweep over a spatial grid, performing nearest neighbor computations called stencils. In a stencil operation, each point in a multidimensional grid is updated with weighted contributions from a subset of its neighbors in both time and spaceā€” thereby representing the coefficients of the PDE for that data element. These operations are then used to build solvers that range from simple Jacobi iterations to complex multigrid and adaptive mesh refinement methods.

To summarise it, stencils are computations where each cell’s value in your multi dimensional array is computed by it’s neighbouring cells. This is shown to be true after dealing with some amount of mathematics. The good news for you is that you as programmers need not deal with the equations and the calculus behind it.

For the series of posts that follow, all explanations will be in tandem with 3D 7 point stencil for a heat equation solver. You can also check out my code for 2D 5 point stencil for the same heat solver.

Screen Shot 2017-04-19 at 23.43.02
(a) 3D 7Point Stencil
Screen Shot 2017-04-19 at 23.45.00
(b) 2D 5Point Stencil

As shown above, all the seven grids in fig (a) are used to compute the value for the cell in the middle and similarly in the case of fig(b).
The stencil equation for which all implementations are tested is
Anext(i, j, k) = A0(i, j, k+1)+ A0(i, j, k-1)+ A0(i, j+1, k)
+ A0(i, j-1, k)+ A0(i+1, j, k)+ A0(i-1, j, k)
- 6 * A0(i, j, k)/ factor2

This is iterated over multiple times whose value is calledĀ time steps.Ā This is because they happen over time and have a dependency on the previous iteration and hence the name.

The next posts will discuss the advancement in the area and how can we harness the parallelism?

Part 1. Naive Algorithm

Part 2. Block Algorithm

Part 3. Time Skewing Algorithm

The code for both 2D solver and the 3D solver can be found on Github –Ā Stencil Code.

I use a MacBook Air with an i5 processor and 4GB RAM with and L1 cache size of 64K. The maximum number of threads that can run on my CPU are 8. All experimental setup and results are based on the above system configuration. The results with their checksum value for checking accurate implementation can be found here.

There were many research papers that I looked into to get a glimpse of what is happening and understand what sort of research is taking place. The following are the references to them –

1.Tiling Optimizations for 3D Scientific Computation

2.Implicit and Explicit Optimizations for Stencil Computation

3.Tiling Stencil Computations to Maximize Parallelism

There is a lot of promising work to be done in this field. Other areas on which further posts will be based are –

1. Diamond Tiling: Tiling Techniques to Maximize Parallelism for Stencil Computations

2. Parameterized Diamond Tiling for Stencil Computations

Also, stencil computations on GPU‘s hold a lot of promise and research in the area is also very active. As soon as I am able to lay my hands on a GPU, I will definitely try out all these optimisations and other possible scope of research for improved performance and parallelisations.

Thank you.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s