Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SplineC2C/R2R rotation with BLAS #4710

Merged
merged 15 commits into from
Aug 28, 2023
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 7 additions & 15 deletions src/QMCWaveFunctions/BsplineFactory/SplineR2R.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
#include "SplineR2R.h"
#include "spline2/MultiBsplineEval.hpp"
#include "QMCWaveFunctions/BsplineFactory/contraction_helper.hpp"
#include "Platforms/CPU/BLAS.hpp"

namespace qmcplusplus
{
Expand Down Expand Up @@ -56,7 +57,7 @@ void SplineR2R<ST>::storeParamsBeforeRotation()
{
const auto spline_ptr = SplineInst->getSplinePtr();
const auto coefs_tot_size = spline_ptr->coefs_size;
coef_copy_ = std::make_shared<std::vector<RealType>>(coefs_tot_size);
coef_copy_ = std::make_shared<std::vector<ST>>(coefs_tot_size);

std::copy_n(spline_ptr->coefs, coefs_tot_size, coef_copy_->begin());
}
Expand Down Expand Up @@ -120,21 +121,12 @@ void SplineR2R<ST>::applyRotation(const ValueMatrix& rot_mat, bool use_stored_co
std::copy_n(spl_coefs, coefs_tot_size, coef_copy_->begin());
}

// Apply rotation the dumb way b/c I can't get BLAS::gemm to work...
for (auto i = 0; i < BasisSetSize; i++)
{
std::vector<ST> rot_mat_padded(Nsplines * Nsplines, 0);
for (auto i = 0; i < OrbitalSetSize; i++)
for (auto j = 0; j < OrbitalSetSize; j++)
{
const auto cur_elem = Nsplines * i + j;
auto newval{0.};
for (auto k = 0; k < OrbitalSetSize; k++)
{
const auto index = i * Nsplines + k;
newval += (*coef_copy_)[index] * rot_mat[k][j];
}
spl_coefs[cur_elem] = newval;
}
}
rot_mat_padded[i * Nsplines + j] = rot_mat.data()[i * OrbitalSetSize + j];
BLAS::gemm('N', 'N', Nsplines, BasisSetSize, Nsplines, ST(1.0), rot_mat_padded.data(), Nsplines, (*coef_copy_).data(), Nsplines, ST(0.0), spl_coefs, Nsplines);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we don't need rot_mat_padded.

BLAS::gemm('N', 'N', OrbitalSetSize, BasisSetSize, OrbitalSetSize, ST(1.0), rot_mat.data(), OrbitalSetSize, (*coef_copy_).data(), Nsplines, ST(0.0), spl_coefs, Nsplines);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still a precision issue, unless BLAS::gemm can handle matrices with different precision types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was having trouble getting the BLAS call to work without padding. I am trying to figure it out now

I agree there is still a precision issue. I wasn't sure the best way to handle that either. the spline coeffs are ST type and the rotation is RealType. The BLAS::gemm doesn't like mixed precision types, so I had to do some conversion somewhere. If I changed coef_copy_ back to RealType, and the rotation is RealType, the problem then lies in spl_coefs. It is ST*, and I can't just cast it to RealType* in the case of ST=float.
Making another temporary copy would be costly memory wise

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you intending to work with single or double precision splines? I ran into some problems with rotation and single precision splines early on, and since then have only used double precision splines.
The code might need to be split differently based spline precision - call gemm for double precision, and an explicit loop for single precision. That would at least speed up the double precision case.

The precision issue also relates to the history list vs. global rotation method. With the history list, the rotations get accumulated in the existing coefficient storage. Single precision coefficients fail at this accumulation of small changes. With the global rotation method this might not be an issue, since this accumulation of rotations happens in the rotation matrix. It might work to compute the rotation matrix in double precision, then convert to single precision, and then do the matrix multiplication. But that would need to be tested, since the matrix multiplication step might also be sensitive to precision.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer double only, but some of the unit tests use float so I was trying to get it to work in general

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be okay with different code paths for float vs double (use the loop for float and gemm for double). Using "if constexpr" to test the type should make it straightforward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markdewing, added a split depending on float vs. double. this should avoid the loss in precision.

@ye-luo still trying to figure out how to avoid adding the padded rotation matrix. I thought it should be as simple as fiddling with the LDx parameters, but having trouble getting it to work

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A code path split by precision should also avoid the need for the extra padded rotation matrix.

Copy link
Contributor Author

@camelto2 camelto2 Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ye-luo I think I have the blas call correct now to avoid needing to construct the padded rotation matrix. It is working correctly on some test matrices, and unit tests pass

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we don't need rot_mat_padded.

BLAS::gemm('N', 'N', OrbitalSetSize, BasisSetSize, OrbitalSetSize, ST(1.0), rot_mat.data(), OrbitalSetSize, (*coef_copy_).data(), Nsplines, ST(0.0), spl_coefs, Nsplines);

I should have read this in detail sooner. I didn't realize you had already told me how to it. Took me a bit, but I stumbled onto doing correctly


}


Expand Down
2 changes: 1 addition & 1 deletion src/QMCWaveFunctions/BsplineFactory/SplineR2R.h
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ class SplineR2R : public BsplineSet
std::shared_ptr<MultiBspline<ST>> SplineInst;

///Copy of original splines for orbital rotation
std::shared_ptr<std::vector<RealType>> coef_copy_;
std::shared_ptr<std::vector<ST>> coef_copy_;

///thread private ratios for reduction when using nested threading, numVP x numThread
Matrix<TT> ratios_private;
Expand Down