Extracting StandAlone kernel #253

TApplencourt · 2020-03-17T20:50:43Z

Hi,

I am opening this issue to discuss the possibility of extracting key miniQMC kernels into standalone files.

Having standalone kernels will help collaboration between QMCPACK and other ECP projects and vendors.
These kernels will be easy to install, benchmark, and port to different programming models. This will greatly facilitate early exploration and validation of new hardware, software, and programming models.

Regards,
Thomas

markdewing · 2020-03-17T21:03:25Z

A little bit of work here
https://github.com/markdewing/qmc_kernels

The only kernels present are vector add (not really qmc-specific, but the simplest kernel) and 3D spline.

Possible additional kernels

distance calculation (various boundary conditions?)
inverse update (and delayed update version)
computation converting raw 3D splines to SPO's ?
1D spline for Jastrow ?

prckent · 2020-03-17T21:07:25Z

The plan is to make an official, maintained QMCPACK repository, with splines and updates at first. The idea is that the kernels are clean, zero-baggage, well documented, and accessible for performance analysis and total refactoring, including by non-experts. We have much of the code, but which versions should @TApplencourt start from? I think reference CPU, CUDA, GPU offload etc. would all be of interest. e.g. @PDoakORNL made fresh CUDA implementations in a fork of miniqmc...

TApplencourt · 2020-03-18T14:45:47Z

I can start with the spline of @markdewing if you (aka QMCPACK community) want.

If I understand correctly, this code handles {double, single} / {real, complex} data types and many more types of spline.

My recommendation is to start with the bare minimum functionality (one type only for example) and to trim down the rest. It will make the porting / analysis easier.

prckent · 2020-03-18T18:28:51Z

Please take a careful look at the one in this repo (here, https://github.com/QMCPACK/miniqmc ). I am not sure which branch is best though; someone else will need to chime in. miniqmc knows how to set up various problem sizes corresponding to NiO, i.e. it is realistic.

prckent · 2020-03-18T18:30:06Z

I would start with only single precision real. This is the "legacy CUDA" default in mainline and the one used in benchmarks.

TApplencourt · 2020-03-18T21:14:42Z

@markdewing does your implementation differ from the miniqmc one? I would prefer to start from yours as it looks simpler. But if they are different, I can trim down the miniqmc one too.

In any case, I will use miniqmc to generate realistic problem sizes.

markdewing · 2020-03-18T22:15:57Z

I started from the miniqmc version.

For correctness checking, the driver prints a couple of values from the reference implementation and a couple of values from the non-reference version and the user has to compare them manually. This needs to be done better.
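One way the manual comparison could be automated is a tolerance-based check that reports every mismatching entry. This is only a minimal sketch, not code from qmc_kernels or miniqmc: the function name `compare_outputs`, the sample values, and the tolerances are all hypothetical, and the tolerances in particular would need tuning for single precision.

```python
import math

def compare_outputs(reference, candidate, rel_tol=1e-5, abs_tol=1e-6):
    """Return (index, ref, cand) for every entry that disagrees.

    The tolerances are illustrative assumptions, not values taken
    from miniqmc; pick them to match the kernel's precision.
    """
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate differ in length")
    mismatches = []
    for i, (ref, cand) in enumerate(zip(reference, candidate)):
        if not math.isclose(ref, cand, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((i, ref, cand))
    return mismatches

# Made-up sample data standing in for the two kernel outputs.
reference = [0.125, -3.5, 42.0]
candidate = [0.125, -3.5000001, 42.1]

bad = compare_outputs(reference, candidate)
print("PASS" if not bad else f"FAIL: {bad}")
```

A driver could exit with a non-zero status when the returned list is non-empty, so the check can run unattended.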

The nx,ny,nz and nspline parameters for a few NiO problem sizes are:

a32-e384 is 112x66x66 with 144 splines
a64-e768 is 112x66x66 with 240 splines
a128-e1536 is 112x66x66 with 408 splines

PDoakORNL · 2020-03-18T22:18:42Z

It would be quite easy to take these
https://github.com/PDoakORNL/miniqmc/tree/one_code/src/Numerics/Spline2/test
And make a standalone repo with "my" spline kernel. Should I do that?

prckent · 2020-03-19T13:33:05Z

It looks like Peter's code has CPU, CUDA and Kokkos already. Peter - are/were these all working? It might well be better for Thomas to start with these since they look like a more comprehensive starting point.

prckent · 2020-03-19T13:34:56Z

@markdewing Those spline counts look very strange to me, but perhaps I misunderstand? a32-e384 = 32 atoms and 384 electrons, so 192 electrons per spin = 192 splines. The others should be multiples of this number.

Thomas: The grid size corresponds to the primitive cell, i.e. we assume we are doing tiling for the larger cells and running bulks, as we do for the ECP and CORAL benchmarks.

PDoakORNL · 2020-03-19T13:45:53Z

Yes but probably I should merge to the main repo again. The onecode in my current branch is the current state. The Kokkos had been dropped at that point so I don’t believe it works anymore. I started to look at extracting just the batched/blocked spline eval yesterday, I think it could be made fairly compact especially if some of the variants are deleted/templated.

markdewing · 2020-03-19T14:34:59Z

@prckent I took the numbers from QMCPACK. My understanding is that the splines are complex, and depending on the k-point, some of the values are converted to two orbitals, and some are not (in assign_v). Is this correct?
Maybe this is not necessary for a kernel - using real with number of splines equal to the number of SPO's is sufficient.

prckent · 2020-03-19T14:55:31Z

Yes, that explains the difference.

e.g. For the a32-e384 performance test we can see this on the line "NumDistinctOrbitals 144 numOrbs = 192"
https://cdash.qmcpack.org/CDash/testDetails.php?test=7697041&build=108519
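The arithmetic behind this can be checked for all three problem sizes quoted earlier in the thread. The sketch below assumes an equal number of electrons per spin and that every distinct complex spline yields either one or two orbitals, as described above; the function name `split_splines` is hypothetical and the breakdown is inferred from the counts, not read from QMCPACK.

```python
def split_splines(n_electrons, n_splines):
    """Infer how many splines yield one orbital and how many yield two.

    Assumptions: electrons split evenly between spins, and each
    distinct spline contributes either one or two orbitals.
    Solves  n_one + n_two = n_splines,  n_one + 2*n_two = n_orbitals.
    """
    n_orbitals = n_electrons // 2          # orbitals per spin
    n_two = n_orbitals - n_splines         # splines giving two orbitals
    n_one = n_splines - n_two              # splines giving one orbital
    if n_one < 0 or n_two < 0:
        raise ValueError("counts are inconsistent with the assumptions")
    return n_orbitals, n_one, n_two

# Electron and spline counts as quoted in this thread.
cases = [("a32-e384", 384, 144), ("a64-e768", 768, 240), ("a128-e1536", 1536, 408)]
for name, n_electrons, n_splines in cases:
    n_orbitals, n_one, n_two = split_splines(n_electrons, n_splines)
    print(f"{name}: {n_splines} splines -> {n_orbitals} orbitals "
          f"({n_one} single, {n_two} doubled)")
```

For a32-e384 this gives 96 splines that yield one orbital and 48 that yield two, which adds up to the 192 orbitals reported on the CDash line.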

TApplencourt · 2020-06-05T19:56:02Z

It took longer than expected[*], but with Kevin, we made some progress on extracting the inner vgh-float kernel. It's really preliminary, but you can find it here: https://github.com/TApplencourt/nanoQMC.

May I ask the people on this thread for a review? I'm not sure if we initialize the input correctly.

Do you know of any sanity check I can run on the output to verify we aren't doing anything stupid? (The norm should be 1, or something like that...) Right now, the Hessian and gradient values look suspiciously large...

The next step is to create more robust testing, then put the outer_loop back, and then port it to multiple programming languages.

[*] I would like to be able to say that it is because I work from home and have to take care of my young child. But as far as I know, I don't have a toddler...
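One generic check that does not depend on QMC specifics is to compare the returned gradient against central finite differences of the returned value. The sketch below is not nanoQMC code: `value` and `gradient` are a hypothetical analytic stand-in for the kernel's value and gradient outputs, and the step size `h` is an assumption that would need to be larger in single precision.

```python
import math

def value(r):
    """Stand-in for the kernel's value output (hypothetical test function)."""
    x, y, z = r
    return math.sin(x) * math.cos(y) * math.exp(-z)

def gradient(r):
    """Stand-in for the kernel's gradient output (analytic derivative of value)."""
    x, y, z = r
    e = math.exp(-z)
    return [math.cos(x) * math.cos(y) * e,
            -math.sin(x) * math.sin(y) * e,
            -math.sin(x) * math.cos(y) * e]

def fd_gradient(f, r, h=1e-4):
    """Central finite-difference gradient of f at point r."""
    grad = []
    for i in range(len(r)):
        plus, minus = list(r), list(r)
        plus[i] += h
        minus[i] -= h
        grad.append((f(plus) - f(minus)) / (2.0 * h))
    return grad

def max_abs_diff(a, b):
    """Largest component-wise absolute difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

r = [0.3, 0.7, 0.2]
err = max_abs_diff(gradient(r), fd_gradient(value, r))
print(f"max gradient error: {err:.2e}")
```

The same idea applies to the Hessian by differencing the gradient. If the two disagree by a roughly constant factor, that would point at a scaling or units issue in the inputs or outputs instead of a bug in the evaluation itself.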
