MDDataFrame

The MDDataFrame class both stores the metadata of the simulations in the ensemble and functions as a dask dataframe for adding, computing, and storing analyses.

An MDDataFrame is created from files:

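from ENPMDA import MDDataFrame
# dataframe_name is a required argument; 'dataframe' is a placeholder name
md_dataframe = MDDataFrame(dataframe_name='dataframe')
# traj_ensemble is an existing ENPMDA.TrajectoryEnsemble
md_dataframe.add_traj_ensemble(traj_ensemble, npartitions=16)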

Classes

class ENPMDA.ENPMDA.MDDataFrame(dataframe_name, meta_data_list=['universe_protein', 'universe_system', 'system', 'traj_name', 'frame', 'traj_time', 'stride'], timestamp='20230328-081317')[source]

Class to store the metadata and analysis results of the ensemble simulations.

It uses pandas.DataFrame to store metadata and dask.DataFrame to distribute computation jobs, so that analysis can run in parallel not only within a single trajectory but also across simulations and analyses.

__init__(dataframe_name, meta_data_list=['universe_protein', 'universe_system', 'system', 'traj_name', 'frame', 'traj_time', 'stride'], timestamp='20230328-081317')[source]

Parameters:

dataframe_name (str) – The name of the dataframe. It will be used as the folder in which all the analysis results are saved. It can also be the absolute path to the folder.
meta_data_list (list, optional) – List of metadata in the dataframe. By default, the locations of the pickled universes of the protein and the system, the system index, the trajectory filename, the frame index, the trajectory time, and the stride are stored.
timestamp (str, optional) – The timestamp of the creation of the ensemble. It will be set to the current time if not provided.
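
The default timestamp shown in the signature follows a YYYYMMDD-HHMMSS pattern. The following is a minimal sketch, using only the standard library, of how such a value could be produced and how a dataframe name could be turned into an absolute folder path. make_timestamp and resolve_folder are hypothetical helpers for illustration and are not part of ENPMDA; resolving relative names against the current working directory is an assumption.

```python
import os
from datetime import datetime


def make_timestamp(now=None):
    """Format a datetime as YYYYMMDD-HHMMSS (hypothetical helper)."""
    now = now or datetime.now()
    return now.strftime("%Y%m%d-%H%M%S")


def resolve_folder(dataframe_name):
    """Return an absolute folder path for a dataframe name.

    Assumption: relative names are resolved against the current
    working directory; absolute paths are returned unchanged.
    """
    return os.path.abspath(dataframe_name)


# A fixed datetime reproduces the value shown in the signature above.
stamp = make_timestamp(datetime(2023, 3, 28, 8, 13, 17))
folder = resolve_folder("dataframe")
print(stamp)
print(folder)
```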

add_analysis(analysis, overwrite=False, **kwargs)[source]

Add an analysis to the dataframe.

Parameters:

analysis (ENPMDA.analysis.base.DaskChunkMdanalysis) – The analysis to be added to the dataframe.
overwrite (bool, optional) – Whether to overwrite the analysis if it is already in the dataframe.
**kwargs (dict, optional) – Keyword arguments to be passed to the analysis.
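
The overwrite flag can be pictured with a small stand-in. The sketch below is not ENPMDA code: AnalysisRegistry, its name-keyed storage, and the choice to silently skip an existing analysis (rather than raise) are all assumptions made for illustration.

```python
class AnalysisRegistry:
    """Toy stand-in for a dataframe's analysis bookkeeping (hypothetical)."""

    def __init__(self):
        self.analyses = {}

    def add_analysis(self, name, func, overwrite=False, **kwargs):
        # Assumption: an existing analysis is kept unless overwrite=True.
        if name in self.analyses and not overwrite:
            return False
        # Keyword arguments are stored and later passed to the analysis.
        self.analyses[name] = (func, kwargs)
        return True


def first_version(frame, **kwargs):
    return frame


def second_version(frame, **kwargs):
    return frame * 2


registry = AnalysisRegistry()
added_first = registry.add_analysis("rmsd", first_version, selection="name CA")
added_again = registry.add_analysis("rmsd", second_version)  # skipped
replaced = registry.add_analysis("rmsd", second_version, overwrite=True)
```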

add_traj_ensemble(trajectory_ensemble: TrajectoryEnsemble, npartitions, stride=1)[source]

Parameters:

trajectory_ensemble (ENPMDA.TrajectoryEnsemble) – The trajectory ensemble to be added to the dataframe.
npartitions (int) – The number of partitions to be used in the dask dataframe.
stride (int, optional) – The stride to be used in the dask dataframe. It is used to skip frames in the trajectory.
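
To illustrate how stride and npartitions interact, here is a minimal stdlib-only sketch that builds one metadata row per retained frame and then splits the rows into roughly equal partitions. It is not how ENPMDA or dask implement partitioning; build_frame_rows, split_into_partitions, and the trajectory names and lengths are hypothetical.

```python
def build_frame_rows(traj_lengths, stride=1):
    """Create one metadata row for every stride-th frame of each trajectory."""
    rows = []
    for traj_name, n_frames in traj_lengths.items():
        for frame in range(0, n_frames, stride):
            rows.append({"traj_name": traj_name, "frame": frame, "stride": stride})
    return rows


def split_into_partitions(rows, npartitions):
    """Split rows into npartitions contiguous chunks of near-equal size."""
    size, extra = divmod(len(rows), npartitions)
    partitions = []
    start = 0
    for i in range(npartitions):
        # The first `extra` partitions take one additional row.
        end = start + size + (1 if i < extra else 0)
        partitions.append(rows[start:end])
        start = end
    return partitions


# Two made-up trajectories with 10 and 6 frames, keeping every 2nd frame.
rows = build_frame_rows({"a.xtc": 10, "b.xtc": 6}, stride=2)
partitions = split_into_partitions(rows, npartitions=4)
print(len(rows), [len(p) for p in partitions])
```

Each partition could then be handed to a separate worker, which is the role the dask dataframe plays in the class described here.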

compute()[source]: Compute the analysis results. It will be append the analysis results to the dataframe.

property filename: The saving location of all the pickled files.

get_feature(feature_list, extra_metadata=[], in_memory=True)[source]

Get the features from the dataframe.

Parameters:

feature_list (list of str) – The list of features to be extracted.
extra_metadata (list of str, optional) – The list of extra metadata to be extracted.
in_memory (bool, optional) – Whether to load the features in memory.

get_feature_info(feature_name)[source]

Get the information about a feature.

Parameters:

feature_name (str) – The name of the feature.

classmethod load_dataframe(filename) → MDDataFrame[source]

Load the dataframe from a pickle file.

Parameters:

filename (str) – The name of the pickle file.

save(name='dataframe', overwrite=False)[source]

Compute the analysis results and save the dataframe to a pickle file.

Parameters:

name (str, optional) – The name of the pickle file. It will be saved in the working directory.
overwrite (bool, optional) – Whether to overwrite the file if it exists.
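
The save and load_dataframe pair amounts to a pickle round trip guarded by an overwrite check. The sketch below shows that pattern with the standard library only. save_pickle and load_pickle are hypothetical helpers, the ".pickle" file suffix is an assumption, and so is the choice to leave an existing file untouched (rather than raise) when overwrite is False.

```python
import os
import pickle
import tempfile


def save_pickle(obj, folder, name="dataframe", overwrite=False):
    """Pickle obj to <folder>/<name>.pickle (hypothetical layout)."""
    path = os.path.join(folder, name + ".pickle")
    # Assumption: an existing file is left untouched unless overwrite=True.
    if os.path.exists(path) and not overwrite:
        return path, False
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)
    return path, True


def load_pickle(path):
    """Load a previously pickled object from path."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as folder:
    path, written = save_pickle({"rmsd": [1.0, 1.2]}, folder)
    # A second save without overwrite=True does not replace the file.
    _, written_again = save_pickle({"rmsd": []}, folder)
    restored = load_pickle(path)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.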