Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MCPopulation now gives walkers useful walker and parent ids #5063

Merged
merged 6 commits into from
Jun 28, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
59 changes: 46 additions & 13 deletions src/Particle/Walker.h
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
// This file is distributed under the University of Illinois/NCSA Open Source License.
// See LICENSE file in top directory for details.
//
// Copyright (c) 2016 Jeongnim Kim and QMCPACK developers.
// Copyright (c) 2024 QMCPACK developers.
//
// File developed by: D. Das, University of Illinois at Urbana-Champaign
// Ken Esler, [email protected], University of Illinois at Urbana-Champaign
Expand All @@ -11,6 +11,7 @@
// Jeongnim Kim, [email protected], University of Illinois at Urbana-Champaign
// Raymond Clay III, [email protected], Lawrence Livermore National Laboratory
// Ye Luo, [email protected], Argonne National Laboratory
// Peter W. Doak, [email protected], Oak Ridge National Laboratory
// Mark A. Berrill, [email protected], Oak Ridge National Laboratory
// Jeongnim Kim, [email protected], Intel Corp
//
Expand Down Expand Up @@ -82,38 +83,55 @@ class Walker
/** }@ */

private:
/** in legacy the ancients have said only:
* id reserved for forward walking
/** walker identifier during a QMCSection
*
* batched:
* Only MCPopulation should set A living walker will have a WalkerID > 0
* 0 is the value of a default constructed walker.
* Any negative value must have been set by an outside entity and indicates
* an invalid walker ID.
*/
long walker_id_ = 0;
/** in legacy the ancients have said only:
* id reserved for forward walking
/** walker identifier that provided the initial state of walker
*
* batched:
* default constructed = 0;
* parentID > 0 it is the walkerID in this section it was assigned from
* parentID < 0 it is the walkerID of a walker in a WalkerConfiguration
* used to construct an initial population of walkers.
*/
long parent_id_ = 0;

public:
/** allegedly DMCgeneration
/** allegedly DMC generation
* PD: I can find no evidence it is ever updated anywhere in the code.
*/
int Generation = 0;
///Age of this walker age is incremented when a walker is not moved after a sweep
///Age is incremented when a walker is not moved after a sweep
int Age = 0;
///Weight of the walker
FullPrecRealType Weight = 1.0;
/** Number of copies for branching
/** Number of replicas of this walker after branching
* When Multiplicity = 0, this walker will be destroyed.
* PD: It seems to me that this should be an integer.
*/
FullPrecRealType Multiplicity = 1.0;
/// mark true if this walker is being sent.
bool SendInProgress;
/// if true, this walker is either copied or tranferred from another MPI rank.

/** if true, this walker is either a copy to lower multiplicity or tranferred from another MPI rank.
* This walker will need distance table, jastrow factors, etc recomputed.
* So this is really a variable tracking the "cache" of the ParticleSet, etc.
* \todo this is a smell and would be nice to address in ParticleSet refactoring.
*/
bool wasTouched = true;

/** The configuration vector (3N-dimensional vector to store
the positions of all the particles for a single walker)*/
ParticlePos R;

//Dynamical spin variable.
/** Spin configuration vector (size N)
* i.e. Dynamical spin variable. */
ParticleScalar spins;
#if !defined(SOA_MEMORY_OPTIMIZED)
/** \f$ \nabla_i d\log \Psi for the i-th particle */
Expand Down Expand Up @@ -161,10 +179,13 @@ class Walker
}

Walker(const Walker& a) : Properties(1, WP::NUMPROPERTIES, 1, WP::MAXPROPERTIES) { makeCopy(a); }
Walker(const Walker& a, long walker_id, long parent_id)
: walker_id_(walker_id), parent_id_(parent_id), Properties(1, WP::NUMPROPERTIES, 1, WP::MAXPROPERTIES)
Walker(const Walker& a, long walker_id, long parent_id) : Properties(1, WP::NUMPROPERTIES, 1, WP::MAXPROPERTIES)
{
makeCopy(a);
// makeCopy replaces walker_id_ and parent_id with a.walker_id_ ...
// so these can not be set in the contructor initializer list
walker_id_ = walker_id;
parent_id_ = parent_id;
ye-luo marked this conversation as resolved.
Show resolved Hide resolved
}

/** create a valid walker for n-particles (batched version)
Expand Down Expand Up @@ -243,7 +264,18 @@ class Walker
L.resize(nptcl);
}

///copy the content of a walker
/** assign the content of a walker
* except:
* SendInProgress
* wasTouched
* has_been_on_wire
*
* Special Behavior:
* R & spins
* Properties.copy instead of operator= ConstantSizeMatrix has strict size semantics for assignment.
* but they seem like they should be fine.
* PropertyHistory
*/
inline void makeCopy(const Walker& a)
{
walker_id_ = a.walker_id_;
Expand All @@ -252,6 +284,7 @@ class Walker
Age = a.Age;
Weight = a.Weight;
Multiplicity = a.Multiplicity;
// PD. \todo Why this strange idiom, something wrong with ParticleAttrib assignment operator
if (R.size() != a.R.size())
resize(a.R.size());
R = a.R;
Expand Down
123 changes: 91 additions & 32 deletions src/QMCDrivers/MCPopulation.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,10 @@
// This file is distributed under the University of Illinois/NCSA Open Source License.
// See LICENSE file in top directory for details.
//
// Copyright (c) 2020 QMCPACK developers.
// Copyright (c) 2024 QMCPACK developers.
//
// File developed by: Peter Doak, [email protected], Oak Ridge National Laboratory
// Ye Luo, [email protected], Argonne National Laboratory
//
// File refactored from: MCWalkerConfiguration.cpp, QMCUpdate.cpp
//////////////////////////////////////////////////////////////////////////////////////
Expand Down Expand Up @@ -44,8 +45,43 @@ MCPopulation::MCPopulation(int num_ranks,

MCPopulation::~MCPopulation() = default;

void MCPopulation::fissionHighMultiplicityWalkers()
{
// we need to do this because spawnWalker changes walkers_ so we
// can't just iterate on that collection.
auto good_walkers = convertUPtrToRefVector(walkers_);
for (MCPWalker& good_walker : good_walkers)
{
int num_copies = static_cast<int>(good_walker.Multiplicity);
while (num_copies > 1)
{
auto walker_elements = spawnWalker();
// In the batched version walker ids are set when walkers are born and are unique,
// parent ids are set to the walker that provided the initial configuration.
// If the amplified walker was a transfer from another rank its copys get its ID
// as their parent, Not the walker id the received walker had on its original rank.
ye-luo marked this conversation as resolved.
Show resolved Hide resolved
// So walkers don't have to maintain state that they were transfers.
// We don't need branching here for transfered and nontransfered high multiplicity
// walkers. The walker assignment operator could avoid writing to the walker_id of
// left side walker but perhaps the assignment operator is surprising enough as is.
auto walker_id = walker_elements.walker.getWalkerID();
walker_elements.walker = good_walker;
// copy the copied from walkers id to parent id.
walker_elements.walker.setParentID(walker_elements.walker.getWalkerID());
// put the walkers actual id back.
walker_elements.walker.setWalkerID(walker_id);
// fix the multiplicity of the new walker
walker_elements.walker.Multiplicity = 1.0;
// keep good walker valid.
good_walker.Multiplicity -= 1.0;
num_copies--;
}
}
}

void MCPopulation::createWalkers(IndexType num_walkers, const WalkerConfigurations& walker_configs, RealType reserve)
{
assert(reserve >= 1.0);
IndexType num_walkers_plus_reserve = static_cast<IndexType>(num_walkers * reserve);

// Hack to hopefully insure no truly new walkers will be made by spawn, since I suspect that
Expand All @@ -64,22 +100,42 @@ void MCPopulation::createWalkers(IndexType num_walkers, const WalkerConfiguratio

outputManager.pause();

//this part is time consuming, it must be threaded and calls should be thread-safe.
#pragma omp parallel for
// nextWalkerID is not thread safe so we need to get our walker id's here
// before we enter a parallel section.
std::vector<long> walker_ids(num_walkers_plus_reserve, 0);
// notice we only get walker_ids for the number of walkers in the walker_configs
for (size_t iw = 0; iw < num_walkers; iw++)
walker_ids[iw] = nextWalkerID();

// this part is time consuming, it must be threaded and calls should be thread-safe.
// It would make more sense if it was over crowd threads as the thread locality of the walkers
// would at least initially be "optimal" Depending on the number of OMP threads and implementation
// this may be equivalent.
#pragma omp parallel for shared(walker_ids)
for (size_t iw = 0; iw < num_walkers_plus_reserve; iw++)
{
walkers_[iw] = std::make_unique<MCPWalker>(elec_particle_set_->getTotalNum());
walkers_[iw]->Properties = elec_particle_set_->Properties;

// initialize coord
// initialize walkers from existing walker_configs
if (const auto num_existing_walkers = walker_configs.getActiveWalkers())
*walkers_[iw] = *walker_configs[iw % num_existing_walkers];
else
{
walkers_[iw] = std::make_unique<MCPWalker>(*walker_configs[iw % num_existing_walkers]);
// An outside section context parent ID is multiplied by -1.
walkers_[iw]->setParentID(-1 * walker_configs[iw % num_existing_walkers]->getWalkerID());
walkers_[iw]->setWalkerID(walker_ids[iw]);
}
else // these are fresh walkers no incoming walkers
{
// These walkers are orphans they don't get their intial configuration from a walkerconfig
// but from the golden particle set. They get an walker ID of 0;
walkers_[iw] = std::make_unique<MCPWalker>(walker_ids[iw], 0 /* parent_id */, elec_particle_set_->getTotalNum() );
// Should these get a randomize from source?
// This seems to be what happens in legacy but its surprisingly opaque there
// How is it not undesirable to have all these walkers start from the same positions
walkers_[iw]->R = elec_particle_set_->R;
walkers_[iw]->spins = elec_particle_set_->spins;
}

walkers_[iw]->Properties = elec_particle_set_->Properties;
walkers_[iw]->registerData();
walkers_[iw]->DataSet.allocate();

Expand All @@ -91,18 +147,6 @@ void MCPopulation::createWalkers(IndexType num_walkers, const WalkerConfiguratio

outputManager.resume();

int num_walkers_created = 0;
for (auto& walker_ptr : walkers_)
{
if (walker_ptr->getWalkerID() == 0)
{
// And so walker ID's start at one because 0 is magic.
// \todo This is C++ all indexes start at 0, make uninitialized ID = -1
walker_ptr->setWalkerID((num_walkers_created++) * num_ranks_ + rank_ + 1);
walker_ptr->setParentID(walker_ptr->getWalkerID());
}
}

// kill and spawn walkers update the state variable num_local_walkers_
// so it must start at the number of reserved walkers
num_local_walkers_ = num_walkers_plus_reserve;
Expand All @@ -111,8 +155,11 @@ void MCPopulation::createWalkers(IndexType num_walkers, const WalkerConfiguratio
// Now we kill the extra reserve walkers and elements that we made.
for (int i = 0; i < extra_walkers; ++i)
killLastWalker();
// And now num_local_walkers_ will be correct.
}

long MCPopulation::nextWalkerID() { return num_walkers_created_++ * num_ranks_ + rank_ + 1; }

WalkerElementsRef MCPopulation::getWalkerElementsRef(const size_t index)
{
return {*walkers_[index], *walker_elec_particle_sets_[index], *walker_trial_wavefunctions_[index]};
Expand All @@ -130,10 +177,15 @@ std::vector<WalkerElementsRef> MCPopulation::get_walker_elements()

/** creates a walker and returns a reference
*
* Walkers are reused
* It would be better if this could be done just by
* reusing memory.
* Not thread safe.
*
* In most cases Walkers that are reused have either died during DMC
* or were created dead as a reserve. This is due to the dynamic allocation of the Walker
* and walker elements being expensive in time. If still true this is an important optimization.
* I think its entirely possible that the walker elements are bloated and if they only included
* necessary per walker mutable elements that this entire complication could be removed.
*
* Walker ID's are handed out per independent trajectory (life) not per allocation.
*/
WalkerElementsRef MCPopulation::spawnWalker()
{
Expand All @@ -156,27 +208,26 @@ WalkerElementsRef MCPopulation::spawnWalker()
walkers_.back()->Age = 0;
walkers_.back()->Multiplicity = 1.0;
walkers_.back()->Weight = 1.0;
// this does count as a walker creation so it gets a new walker id
walkers_.back()->setWalkerID(nextWalkerID());
}
else
{
app_warning() << "Spawning walker number " << walkers_.size() + 1
<< " outside of reserves, this ideally should never happened." << std::endl;
walkers_.push_back(std::make_unique<MCPWalker>(*(walkers_.back())));
outputManager.resume();
auto walker_id = nextWalkerID();
app_warning() << "Spawning walker ID " << walker_id << " this is living walker number " << walkers_.size()
<< "which is beyond the allocated walker reserves, ideally this should never happen." << '\n';
walkers_.push_back(std::make_unique<MCPWalker>(*(walkers_.back()), walker_id, 0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put "0 /* parent_id */"


// There is no value in doing this here because its going to be wiped out
// When we load from the receive buffer. It also won't necessarily be correct
// Because the buffer is changed by Hamiltonians and wavefunctions that
// Add to the dataSet.
outputManager.pause();

walker_elec_particle_sets_.emplace_back(std::make_unique<ParticleSet>(*elec_particle_set_));
walker_trial_wavefunctions_.emplace_back(trial_wf_->makeClone(*walker_elec_particle_sets_.back()));
walker_hamiltonians_.emplace_back(
hamiltonian_->makeClone(*walker_elec_particle_sets_.back(), *walker_trial_wavefunctions_.back()));
walkers_.back()->Multiplicity = 1.0;
walkers_.back()->Weight = 1.0;
walkers_.back()->setWalkerID((walkers_.size() + 1) * num_ranks_ + rank_ + 1);
}

outputManager.resume();
return {*walkers_.back().get(), *walker_elec_particle_sets_.back().get(), *walker_trial_wavefunctions_.back().get()};
}
Expand Down Expand Up @@ -293,8 +344,16 @@ void MCPopulation::saveWalkerConfigurations(WalkerConfigurations& walker_configs
walker_configs.resize(walker_elec_particle_sets_.size(), elec_particle_set_->getTotalNum());
for (int iw = 0; iw < walker_elec_particle_sets_.size(); iw++)
{
// The semantics of this call are not what would I would expect.
// Are walkers's R's invalid here?
// you are not serializing the population walkers but the particle sets
// to the walker_configs walkers...
// So if you would like them to carry information between sections be careful
// you need to copy it in explicitly.
walker_elec_particle_sets_[iw]->saveWalker(*walker_configs[iw]);
ye-luo marked this conversation as resolved.
Show resolved Hide resolved
walker_configs[iw]->Weight = walkers_[iw]->Weight;
walker_configs[iw]->setWalkerID(walkers_[iw]->getWalkerID());
walker_configs[iw]->setParentID(walkers_[iw]->getParentID());
}
}
} // namespace qmcplusplus
Loading
Loading