Analysis
The DaskChunkMdanalysis class is the base class for defining multi-frame
parallel analyses of MD trajectories. It wraps MDAnalysis analysis functions
so that they can be mapped over the Dask dataframe.
The class takes care of loading the right universe and dumping the
results to an npy file, which avoids a large memory footprint
and keeps the Dask scheduler from clogging.
To define a new analysis, subclass DaskChunkMdanalysis and define two methods:
set_feature_info(self, universe) and run_analysis(self, universe, start, stop, step).
set_feature_info should return a list of feature names, e.g. the name of each torsion angle.
run_analysis should return a list of analysis results.
The name attribute is the feature name that is appended to the dataframe.
By default, only the protein universe file is used to run the analysis.
This can be overridden by defining universe_file = 'system':
from ENPMDA.analysis import DaskChunkMdanalysis

class NewAnalysis(DaskChunkMdanalysis):
    name = 'new_analysis'
    # 'protein' is the default; set to 'system' to override it
    universe_file = 'protein'

    def set_feature_info(self, universe):
        return ['some_info']

    def run_analysis(self, universe, start, stop, step):
        result = []
        for ts in universe.trajectory[start:stop:step]:
            # some_analysis is a placeholder for a per-frame analysis function
            result.append(some_analysis(universe.atoms))
        return result
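
To make the contract between the two methods concrete, the following is a minimal sketch that runs without ENPMDA or MDAnalysis installed. FakeTrajectory, FakeUniverse and DistanceAnalysis are hypothetical stand-ins invented for illustration; they are not part of either library, and a real subclass would inherit from DaskChunkMdanalysis and receive an MDAnalysis universe. The sketch also assumes that each entry returned by run_analysis holds one value per feature name returned by set_feature_info.

```python
import math


class FakeTrajectory:
    """Hypothetical stand-in: supports slicing like universe.trajectory."""

    def __init__(self, frames):
        self._frames = frames

    def __getitem__(self, item):
        return self._frames[item]


class FakeUniverse:
    """Hypothetical stand-in for an MDAnalysis universe."""

    def __init__(self, frames):
        self.trajectory = FakeTrajectory(frames)


class DistanceAnalysis:
    """Illustrative only; a real analysis would subclass DaskChunkMdanalysis."""

    name = "distance"
    universe_file = "protein"

    def set_feature_info(self, universe):
        # One feature: the distance between particle 0 and particle 1.
        return ["distance_0_1"]

    def run_analysis(self, universe, start, stop, step):
        result = []
        for frame in universe.trajectory[start:stop:step]:
            # Assumption: one row per frame, one value per feature name.
            result.append([math.dist(frame[0], frame[1])])
        return result


# Three fake frames, each holding the coordinates of two particles.
frames = [
    [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)],
    [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0)],
]
universe = FakeUniverse(frames)
analysis = DistanceAnalysis()
features = analysis.set_feature_info(universe)
rows = analysis.run_analysis(universe, 0, 3, 2)
```

Calling run_analysis(universe, 0, 3, 2) here processes frames 0 and 2 only, which mirrors how start, stop and step select a slice of the trajectory.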
Classes
- class ENPMDA.analysis.base.DaskChunkMdanalysis(filename, **kwargs)
This class is the base class for all analysis classes. The analysis results are dumped to an npy file with a unique UUID for each partition.
- run_analysis(universe, start, stop, step)
The function to be overridden by the analysis subclass.
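
The idea of dumping each partition's result to its own npy file can be sketched as follows. This is not ENPMDA's implementation: dump_partition_result is a hypothetical helper, and the file naming scheme and output directory are assumptions chosen for illustration. Only the standard library and NumPy are used.

```python
import os
import tempfile
import uuid

import numpy as np


def dump_partition_result(result, out_dir):
    """Hypothetical helper: save one partition's result to a uniquely
    named .npy file and return the file path."""
    array = np.asarray(result)
    # A random UUID keeps partitions from overwriting each other's files.
    path = os.path.join(out_dir, f"{uuid.uuid4().hex}.npy")
    np.save(path, array)
    return path


# Simulate three partitions, each producing one row of two values.
out_dir = tempfile.mkdtemp()
paths = [dump_partition_result([[i, i + 1.0]], out_dir) for i in range(3)]

# Only the small path strings need to travel between workers;
# the arrays are loaded from disk when they are actually needed.
loaded = np.concatenate([np.load(p) for p in paths])
```

Passing file paths rather than arrays between tasks is one way to keep large results out of memory and away from the scheduler, which is the motivation the class description gives for writing npy files.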