Troubleshooting Performance

VSim is designed to optimally use the computational hardware you have available, whether a laptop or a leadership class supercomputing facility.

It achieves this through advanced algorithms, but that does not guarantee that any given simulation will run as fast as possible. This document outlines some simple checks that may help speed up a simulation.

There are different types of algorithms for field solves, particle movers and Monte Carlo interactions. These may scale in slightly different ways and require different amounts of inter-processor communication in large parallel simulations. Having many histories may also impact performance.

First, it helps to understand which parts of the simulation take the most time. The best way to do this is to remove elements of the simulation one at a time and assess the difference in speed. Having measured the performance of field solves, particle pushes (if applicable), Monte Carlo interactions (if applicable) and history objects, it may be possible to simplify some of these, for example by adjusting the number of physical particles per macroparticle.
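The subtraction approach described above can be sketched in a few lines of Python. This is only an illustration: the component names and wall-clock times are made-up values, and the helper component_costs is hypothetical, not part of VSim. It assumes every run covers the same number of time steps on the same hardware.

```python
def component_costs(full_time, time_without):
    """Estimate each component's cost as the time saved when it is removed."""
    return {name: full_time - t for name, t in time_without.items()}


# Hypothetical wall-clock times in seconds (not measured VSim data).
full = 120.0
without = {"particles": 70.0, "collisions": 95.0, "histories": 115.0}

costs = component_costs(full, without)
# Report the most expensive components first.
for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {cost:.1f} s ({100 * cost / full:.0f}% of full run)")
```

The estimate is approximate, because components interact (for example, removing particles also removes the work of depositing their current), but it is usually enough to show where to focus.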

In general, do not dump the fields and particles more often than necessary: frequent dumps slow visualisation and slow the simulation while the data is written.

Electromagnetic Solves

Electromagnetic solves tend to be bound by memory access and by the ability to pass boundary data across the network. As a rule of thumb, performance tends not to improve much once the domain on each processor is smaller than 40x40x40 cells, but this limit depends on the relative performance of your network fabric and CPU. Also, cells outside a perfect electrical conductor take longer to update than cells inside it, so it can sometimes be worth adjusting the domain decomposition strategy to balance the load. Minimize the regions over which any MAL or PML boundary conditions are applied, as these are slow compared with a normal cell update.
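As a rough illustration of the 40x40x40 rule of thumb, the following Python sketch estimates how many processes a grid can be split across before the per-process domain drops below that size. The function max_useful_ranks and the example grid are hypothetical; the threshold of 40 is only the rule of thumb quoted above and should be adjusted for your network and CPU.

```python
def max_useful_ranks(grid, min_cells_per_side=40):
    """Upper bound on the process count that keeps each subdomain at
    least min_cells_per_side cells wide in every dimension."""
    ranks = 1
    for n in grid:
        # Number of slabs of at least min_cells_per_side cells that fit
        # along this dimension (at least one).
        ranks *= max(1, n // min_cells_per_side)
    return ranks


# Hypothetical 400 x 200 x 200 grid: 10 * 5 * 5 subdomains.
print(max_useful_ranks((400, 200, 200)))
```

Running on more processes than this estimate is not forbidden, but by the rule of thumb the extra communication is likely to cancel much of the gain.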

Electrostatic Solves

Electrostatic solves can be very communication intensive. Consequently, there may be a ‘sweet spot’ not only in the total number of processors to use, but also in the number of processors per node. System administrators may be able to provide diagnostic tools to help monitor this behavior. With a fixed number of nodes, varying the domain size may also help with understanding the performance bottlenecks.

Particle Balancing

Particles may not be evenly distributed throughout the simulation domain at all times, and their distributions may vary from species to species. The ‘binning’ tool in VSimComposer may be used to count the number of particles as a function of x, y and z, and modifying the domain decomposition will sometimes help balance the load. It can help to temporarily add extra diagnostics to measure the total populations of macroparticles for all species as they vary through your simulation, and to focus on the species with the most macroparticles.
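Once particle counts per subdomain are known (for example from binning), a simple imbalance figure makes it easier to compare decompositions. The sketch below is an assumption-laden illustration: load_imbalance is a hypothetical helper, the counts are invented, and it treats particle count as a proxy for work per process.

```python
def load_imbalance(counts):
    """Ratio of the busiest subdomain's particle count to the mean.
    A value of 1.0 means the particles are perfectly balanced."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean > 0 else 1.0


# Hypothetical macroparticle counts on four subdomains.
counts = [100_000, 100_000, 700_000, 100_000]
print(f"imbalance factor: {load_imbalance(counts):.2f}")
```

Because each time step waits for the slowest process, an imbalance factor well above 1 suggests that the particle push takes roughly that many times longer than it would with an even distribution.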

If the number of macroparticles of some species is changing by many orders of magnitude, consider a strategy which includes macroparticle combining or splitting. See: nullSelfCombination

For densities that are uniformly high, with a consequently short mean free path, consider whether a fluid or hybrid approach might be better.

Particle Interactions

For collisional plasmas, the time to run discharge simulations is often dominated by the computation of interactions between the particles.

The most important thing is to make sure you are using the right set of interactions to describe your plasma. If you are uncertain, consider running a global model to analyze the reaction paths and ensure insignificant paths are eliminated from the simulation.

Monte Carlo type algorithms for interactions should be configured so that the timestep is small enough that interaction probabilities are always small, as advised in monte_carlo_interactions_package.
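To check whether a timestep keeps interaction probabilities small, one can use the standard estimate P = 1 - exp(-n σ v Δt) for a particle of speed v moving through a background of density n with cross section σ. The Python sketch below applies it. The function names, the numerical values and the target probability of 0.01 are illustrative assumptions, not VSim defaults or recommendations.

```python
import math


def collision_probability(density, cross_section, speed, dt):
    """P = 1 - exp(-n * sigma * v * dt), evaluated with expm1 for accuracy
    when the probability is small. SI units throughout."""
    return -math.expm1(-density * cross_section * speed * dt)


def max_timestep(density, cross_section, speed, p_max):
    """Largest dt for which the collision probability stays at or below p_max."""
    return -math.log1p(-p_max) / (density * cross_section * speed)


# Hypothetical values: n = 1e21 m^-3, sigma = 1e-19 m^2, v = 1e6 m/s.
n, sigma, v = 1e21, 1e-19, 1e6
print(collision_probability(n, sigma, v, dt=1e-12))  # about 1e-4
print(max_timestep(n, sigma, v, p_max=0.01))         # about 1e-10 s
```

The check should use the largest density, cross section and particle speed expected in the simulation, since the requirement is that the probability is small everywhere.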

Histories

Histories store their data in RAM in between data dumps and can write very large datasets. Some histories need to do non-trivial amounts of computation each time step.
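A back-of-the-envelope estimate of the memory a history holds between dumps can be sketched as follows. This assumes the history records a fixed number of double-precision (8-byte) values on every time step; history_buffer_bytes is a hypothetical helper and the numbers are illustrative, so the actual storage used by VSim may differ.

```python
def history_buffer_bytes(values_per_step, steps_between_dumps, bytes_per_value=8):
    """Approximate memory held by one history between data dumps,
    assuming one fixed-size record per time step."""
    return values_per_step * steps_between_dumps * bytes_per_value


# Hypothetical history recording 1000 values per step, dumped every 100000 steps.
size = history_buffer_bytes(1000, 100_000)
print(f"{size / 1e6:.0f} MB held in RAM between dumps")
```

If the estimate is large compared with the available memory per process, recording fewer values or dumping more often reduces the buffer, at the cost of more frequent writes.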

Configuration Issues

The installer Tech-X provides can be expected to work well out of the box on desktop and high performance computing systems.

HPC systems often have high performance parallel file systems. These are commonly set up differently from your home area, and you will need to ensure that your data is written to the appropriate partition. Check the cluster documentation for more information.

HPC systems are sometimes configured with a different MPI (rather than the MPICH that VSim requires) in the environment set up by the queue system. In rare cases the MPI installation provided with VSim can pick up the wrong network card or fail to use the correct InfiniBand driver (normally where this has been heavily customized on the cluster). This will likely manifest as very poor parallel performance. For example, a simulation using sixteen cores on one node may run much faster than the same simulation on thirty-two cores across two nodes (subject to the scaling advice above). In these cases we recommend you contact Tech-X support at support@txcorp.com for advice. Modifying the environment is non-trivial and may have unexpected consequences.
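Before contacting support, it can help to quantify the symptom. The Python sketch below computes speedup and parallel efficiency relative to a reference run. The function parallel_efficiency and the timings are hypothetical; it assumes both runs are the same simulation for the same number of steps.

```python
def parallel_efficiency(ref_cores, ref_time, cores, time):
    """Return (speedup, efficiency) of a run relative to a reference run.
    Efficiency is speedup divided by the increase in core count."""
    speedup = ref_time / time
    efficiency = speedup / (cores / ref_cores)
    return speedup, efficiency


# Hypothetical timings: 100 s on 16 cores (one node), 180 s on 32 cores (two nodes).
speedup, efficiency = parallel_efficiency(16, 100.0, 32, 180.0)
print(f"speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

A speedup below 1 when going from one node to two, as in this made-up example, points to an inter-node communication problem rather than ordinary scaling limits, and is useful information to include in a support request.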