-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SplineC2C/R2R rotation with BLAS #4710
Changes from all commits
332ad01
047b901
3474373
53f111e
29dadab
c86e331
85b18c5
79e3de9
3259dde
2fb7ce3
b1b093a
88e8077
493fe22
c86cbf8
91adaa9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,6 +17,7 @@ | |
#include "SplineR2R.h" | ||
#include "spline2/MultiBsplineEval.hpp" | ||
#include "QMCWaveFunctions/BsplineFactory/contraction_helper.hpp" | ||
#include "Platforms/CPU/BLAS.hpp" | ||
|
||
namespace qmcplusplus | ||
{ | ||
|
@@ -56,7 +57,7 @@ void SplineR2R<ST>::storeParamsBeforeRotation() | |
{ | ||
const auto spline_ptr = SplineInst->getSplinePtr(); | ||
const auto coefs_tot_size = spline_ptr->coefs_size; | ||
coef_copy_ = std::make_shared<std::vector<RealType>>(coefs_tot_size); | ||
coef_copy_ = std::make_shared<std::vector<ST>>(coefs_tot_size); | ||
|
||
std::copy_n(spline_ptr->coefs, coefs_tot_size, coef_copy_->begin()); | ||
} | ||
|
@@ -120,20 +121,28 @@ void SplineR2R<ST>::applyRotation(const ValueMatrix& rot_mat, bool use_stored_co | |
std::copy_n(spl_coefs, coefs_tot_size, coef_copy_->begin()); | ||
} | ||
|
||
// Apply rotation the dumb way b/c I can't get BLAS::gemm to work... | ||
for (auto i = 0; i < BasisSetSize; i++) | ||
|
||
if constexpr (std::is_same_v<ST, RealType>) | ||
{ | ||
for (auto j = 0; j < OrbitalSetSize; j++) | ||
{ | ||
const auto cur_elem = Nsplines * i + j; | ||
auto newval{0.}; | ||
for (auto k = 0; k < OrbitalSetSize; k++) | ||
//Here, ST should be equal to ValueType, which will be double for R2R. Using BLAS to make things faster | ||
BLAS::gemm('N', 'N', OrbitalSetSize, BasisSetSize, OrbitalSetSize, ST(1.0), rot_mat.data(), OrbitalSetSize, | ||
coef_copy_->data(), Nsplines, ST(0.0), spl_coefs, Nsplines); | ||
} | ||
else | ||
{ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is tough: apparently everything I did had hidden type mis-matches because ST is not guaranteed to be equivalent to ValueType. Seems to me the underlying problem is the type system of the splines is kinda divorced from the rest of the code. If it's true that single precision may be problematic for rotations, then why not ditch single precision splines and keep the production code simple? Or is there some way to enforce double precision splines if rotation is added? Just trying to think of ways to avoid adding complexity to production code if not absolutely necessary. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My guess is single precision splines are still of interest for size reasons. |
||
//Here, ST is float but ValueType is double for R2R. Due to issues with type conversions, just doing naive matrix multiplication in this case to not lose precision on rot_mat | ||
for (IndexType i = 0; i < BasisSetSize; i++) | ||
for (IndexType j = 0; j < OrbitalSetSize; j++) | ||
{ | ||
const auto index = i * Nsplines + k; | ||
newval += (*coef_copy_)[index] * rot_mat[k][j]; | ||
const auto cur_elem = Nsplines * i + j; | ||
FullPrecValueType newval{0.}; | ||
for (IndexType k = 0; k < OrbitalSetSize; k++) | ||
{ | ||
const auto index = i * Nsplines + k; | ||
newval += (*coef_copy_)[index] * rot_mat[k][j]; | ||
} | ||
spl_coefs[cur_elem] = newval; | ||
} | ||
spl_coefs[cur_elem] = newval; | ||
} | ||
} | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of doubt precision loss matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markdewing was mentioning that he had run into problems with precision when using the float splines and wanted the split between double and float. Maybe he can chime in with more details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were cases where splines with float precision had trouble with orbital rotation optimization, and those problems went away when I switched to double precision splines. I haven't investigated further.
The important point for these code paths is that the type of the spline (ST) and the main code type (RealType) have to match for BLAS calls to work without additional copies. For full precision builds, that means double precision splines. For mixed precision builds, I would guess the single precision splines would follow the BLAS code path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How it stands currently is that coefs_copy_ and spl_coefs are ST, whereas rot_mat is passed in as ValueType. There isn't a mixed precision blas call, so in order to use blas we would have to make a copy of rot_mat of ST type. That would allow BLAS to be used always, but you are losing precision when ST is float since it is passed in as ValueType (double or std::complex). I'm not entirely sure how much that precision matters, since the output spline coeffs are ST anyway. We are doing an extra copy, but it is only Norb^2 so shouldn't be a problem.
How it is currently written avoids the extra copy of rot_mat, but only benefits from BLAS when ST == RealType.
How common are runs where ST != RealType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NiO performance tests have the splines set with precision="single", so I think it's still an important case in general. I'm not sure how important it will be for cases involving rotation. Though my guess is someone will want to try it due to memory pressure on the size of the spline coefficients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us defer the investigation of precision. It is important but we need the feature working first.