Analysis

The DaskChunkMdanalysis class is the base class for defining multi-frame, parallel analyses of MD trajectories. It wraps MDAnalysis analysis functions so that they can be mapped over a Dask dataframe. The class takes care of loading the right universe and dumps the results to an npy file, which avoids a large memory footprint and keeps the Dask scheduler from clogging.

To define a new analysis, subclass DaskChunkMdanalysis and define set_feature_info(self, universe) and run_analysis(self, universe, start, stop, step). set_feature_info should return a list of feature names, e.g. the name of each torsion angle. run_analysis should return a list of analysis results.

name is the feature name that is appended to the dataframe. By default, only the protein universe file is used to run the analysis. This can be overridden by setting universe_file, e.g. universe_file = 'system'. The example below sets the default value explicitly:

from ENPMDA.analysis import DaskChunkMdanalysis


class NewAnalysis(DaskChunkMdanalysis):
    name = 'new_analysis'
    universe_file = 'protein'

    def set_feature_info(self, universe):
        # One label per feature produced by the analysis.
        return ['some_info']

    def run_analysis(self, universe, start, stop, step):
        result = []
        # Iterate over the frames assigned to this chunk.
        for ts in universe.trajectory[start:stop:step]:
            # some_analysis is a placeholder for your own per-frame function.
            result.append(some_analysis(universe.atoms))
        return result
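The two-method contract described above can be tried in isolation with a small stand-in. This is only a sketch: FakeTrajectory, FakeUniverse and FrameIndexAnalysis are hypothetical names, the class does not subclass DaskChunkMdanalysis, and the stand-in universe mimics only the trajectory[start:stop:step] slicing used in run_analysis. It also assumes that each per-frame entry has one value per feature name, which the documentation does not state explicitly.

```python
# Stand-in objects only: FakeTrajectory and FakeUniverse are hypothetical and
# mimic just enough of an MDAnalysis Universe to run the loop below.
class FakeTrajectory:
    def __init__(self, n_frames):
        self._frames = list(range(n_frames))

    def __getitem__(self, key):
        # Slicing returns the selected frame indices.
        return self._frames[key]


class FakeUniverse:
    def __init__(self, n_frames):
        self.trajectory = FakeTrajectory(n_frames)


class FrameIndexAnalysis:
    """Hypothetical analysis that follows the two-method contract."""

    name = 'frame_index'

    def set_feature_info(self, universe):
        # One label per value produced for each frame (an assumption).
        return ['index', 'index_squared']

    def run_analysis(self, universe, start, stop, step):
        result = []
        for frame in universe.trajectory[start:stop:step]:
            result.append([frame, frame ** 2])
        return result


universe = FakeUniverse(n_frames=10)
analysis = FrameIndexAnalysis()
features = analysis.set_feature_info(universe)
result = analysis.run_analysis(universe, start=0, stop=10, step=3)
print(features)
print(result)
```

With start=0, stop=10 and step=3 the loop visits frames 0, 3, 6 and 9, so the result has four entries, each as long as the feature list.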

Classes

class ENPMDA.analysis.base.DaskChunkMdanalysis(filename, **kwargs)[source]

This class is the base class for all analysis classes. The analysis results of each partition are dumped to an npy file whose name contains a unique UUID.
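The idea of writing each partition's result to a uniquely named npy file, and passing around only the path, can be sketched as follows. dump_partition_result is a hypothetical helper and the file naming scheme is an assumption; the actual ENPMDA implementation may differ.

```python
import os
import tempfile
import uuid

import numpy as np


def dump_partition_result(result, directory, name):
    """Save one partition's result to a uniquely named .npy file.

    Hypothetical helper: the '<name>_<uuid>.npy' pattern is an assumption.
    """
    filename = os.path.join(directory, f"{name}_{uuid.uuid4().hex}.npy")
    np.save(filename, np.asarray(result))
    # Only the short path string needs to travel through the scheduler.
    return filename


with tempfile.TemporaryDirectory() as tmpdir:
    path_a = dump_partition_result([[0.1, 0.2], [0.3, 0.4]], tmpdir, "new_analysis")
    path_b = dump_partition_result([[0.5, 0.6]], tmpdir, "new_analysis")
    # The array is read back from disk only when it is actually needed.
    shape_a = np.load(path_a).shape
```

Because each file name embeds a fresh UUID, two partitions of the same analysis do not overwrite each other's output.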

run_analysis(universe, start, stop, step)[source]

The method to be overridden by the analysis subclass.

set_feature_info(universe)[source]

This method sets the feature information. It should return a list of features.

classmethod test_on_universe(universe, start=0, stop=2, step=1)[source]

This function is used to test the analysis function on a universe.