Library
rdworks.mollibr.MolLibr
Attributes
chunksize = chunksize
instance-attribute
clusters = None
instance-attribute
libr = []
instance-attribute
max_workers = max_workers
instance-attribute
progress = progress
instance-attribute
query = None
instance-attribute
threshold = None
instance-attribute
Functions
align_drawing(ref=0, mcs=True, scaffold='', coordgen=True, **kwargs)
Align 2D drawings by using MCS or scaffold SMILES.
Parameters:
-
ref(int, default:0) –index of the reference molecule. Defaults to 0.
-
mcs(bool, default:True) –whether to use MCS(maximum common substructure). Defaults to True.
-
scaffold(str, default:'') –scaffold SMILES to use for alignment. Defaults to "".
Returns:
-
Self(Self) –self
cluster(threshold=0.3, ordered=True, drop_singleton=True)
Clusters molecules using fingerprints.
Parameters:
-
threshold(float, default:0.3) –Tanimoto similarity threshold. Defaults to 0.3.
-
ordered(bool, default:True) –order clusters by size of cluster. Defaults to True.
-
drop_singleton(bool, default:True) –exclude singletons. Defaults to True.
Returns:
-
list(list) –[(centroid_1, idx, idx,), (centroid_2, idx, idx,), ...]
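The documented return shape and the effect of ordered and drop_singleton can be sketched in plain Python. This is an illustration only, not rdworks code: postprocess_clusters and the sample index tuples are hypothetical, and sorting largest-first is an assumption about what "order by size" means.

```python
def postprocess_clusters(raw_clusters, ordered=True, drop_singleton=True):
    """Post-process clusters given as tuples of (centroid, idx, idx, ...)."""
    clusters = list(raw_clusters)
    if drop_singleton:
        # A singleton is a cluster that holds only its centroid.
        clusters = [c for c in clusters if len(c) > 1]
    if ordered:
        # Assumption: largest cluster first. sorted() is stable, so ties keep input order.
        clusters = sorted(clusters, key=len, reverse=True)
    return clusters


# Hypothetical clustering result: molecule indices, centroid first in each tuple.
raw = [(4,), (0, 2), (1, 3, 5, 6)]
result = postprocess_clusters(raw)
```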
compute(**kwargs)
Change settings for parallel computing.
Parameters:
-
max_workers(int) –max number of workers. Defaults to 4.
-
chunksize(int) –chunk size of the split workload. Defaults to 10.
-
progress(bool) –whether to show progress bar. Defaults to False.
Returns:
-
Self(Self) –rdworks.MolLibr object.
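How max_workers and chunksize interact can be sketched with the standard library. This is a minimal sketch, not how rdworks schedules work internally: chunked and run_in_chunks are hypothetical helpers, and a thread pool is used here only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, chunksize):
    """Split a list into consecutive chunks of at most `chunksize` items."""
    return [items[i:i + chunksize] for i in range(0, len(items), chunksize)]


def run_in_chunks(func, items, max_workers=4, chunksize=10):
    """Apply `func` to every item, one chunk per task, preserving input order."""
    def work(chunk):
        return [func(x) for x in chunk]

    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields chunk results in submission order.
        for part in pool.map(work, chunked(items, chunksize)):
            results.extend(part)
    return results


squares = run_in_chunks(lambda x: x * x, list(range(25)), max_workers=4, chunksize=10)
```

Larger chunks reduce scheduling overhead per item, while smaller chunks balance the load better when items differ in cost.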
copy()
Returns a copy of self.
Returns:
-
Self(Self) –rdworks.MolLibr object.
count()
Returns number of molecules.
Returns:
-
int(int) –count of molecules.
drop(terms=None, invert=False, **kwargs)
Drops matched molecules and returns a copy of the library with the remaining molecules.
Parameters:
-
terms(str | Path | None, default:None) –matching terms. Defaults to None.
-
invert(bool, default:False) –whether to invert selection by the terms. Defaults to False.
Returns:
-
Self(Self) –a copy of self.
nnp_ready(model, **kwargs)
Returns a copy of the subset of the library that is ready for the given neural network potential.
Examples:
>>> libr = rdworks.MolLibr(drug_smiles, drug_names)
>>> ani2x_compatible_subset = libr.nnp_ready('ANI-2x', progress=False)
Parameters:
-
model(str) –name of model.
Returns:
-
Self(Self) –subset of library.
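One plausible readiness criterion is element coverage: the published ANI-2x model is parameterized for H, C, N, O, F, S and Cl. The sketch below checks only that criterion. It is an assumption that rdworks tests something similar, and it may apply further checks (for example on formal charge); is_element_compatible and the sample molecules are hypothetical.

```python
# Elements covered by the published ANI-2x model.
ANI2X_ELEMENTS = {"H", "C", "N", "O", "F", "S", "Cl"}


def is_element_compatible(symbols, supported=ANI2X_ELEMENTS):
    """Return True if every atom symbol belongs to the supported element set."""
    return set(symbols) <= supported


# Hypothetical molecules given as name -> list of atom symbols (H included).
molecules = {
    "ethanol": ["C", "C", "O", "H", "H", "H", "H", "H", "H"],
    "bromomethane": ["C", "Br", "H", "H", "H"],
}
ready = [name for name, symbols in molecules.items() if is_element_compatible(symbols)]
```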
overlap(other)
Returns the subset shared with another library.
Parameters:
-
other(Self) –rdworks.MolLibr object.
Returns:
-
Self(Self) –common subset of rdworks.MolLibr.
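Conceptually, an overlap is an intersection on some molecular identity key. The sketch below uses plain dictionaries and assumes the key is a canonical SMILES string. Which key rdworks actually compares is not stated here, so overlap_by_key and the record layout are hypothetical.

```python
def overlap_by_key(left, right, key=lambda rec: rec["smiles"]):
    """Return records of `left` whose key also occurs in `right`, in `left` order."""
    right_keys = {key(rec) for rec in right}
    return [rec for rec in left if key(rec) in right_keys]


# Hypothetical libraries; the SMILES strings are assumed to be canonicalized already.
lib_a = [{"name": "a1", "smiles": "CCO"}, {"name": "a2", "smiles": "c1ccccc1"}]
lib_b = [{"name": "b1", "smiles": "c1ccccc1"}, {"name": "b2", "smiles": "CC(=O)O"}]
common = overlap_by_key(lib_a, lib_b)
```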
pick_diverse(n, seed=0, fp_type='morgan', radius=2, n_bits=2048)
Select diverse molecules using MaxMin algorithm.
Parameters:
-
n(int) –Number of molecules to select.
-
seed(int, default:0) –Random seed for reproducibility. Defaults to 0.
-
fp_type(str, default:'morgan') –Type of fingerprint to use: morgan, rdkit, maccs. Defaults to 'morgan'
-
radius(int, default:2) –Radius for Morgan fingerprints. Defaults to 2
-
n_bits(int, default:2048) –Number of bits for fingerprints. Defaults to 2048
Returns:
-
Self–List of selected indices
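The MaxMin heuristic can be sketched without any cheminformatics dependency: start from a seeded random item, then repeatedly add the item whose distance to its nearest already-picked item is largest. The sketch assumes fingerprints are given as sets of on-bit indices and uses 1 minus Tanimoto similarity as the distance. tanimoto and maxmin_pick are hypothetical helpers, not the rdworks implementation.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def maxmin_pick(fps, n, seed=0):
    """Return indices of n fingerprints chosen by the MaxMin heuristic."""
    if not 0 <= n <= len(fps):
        raise ValueError("n must be between 0 and len(fps)")
    if n == 0:
        return []
    rng = random.Random(seed)
    picked = [rng.randrange(len(fps))]  # random start, fixed by the seed
    # Distance from every item to its nearest picked item.
    min_dist = [1.0 - tanimoto(fp, fps[picked[0]]) for fp in fps]
    while len(picked) < n:
        candidates = [i for i in range(len(fps)) if i not in picked]
        # Choose the candidate farthest from everything picked so far.
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(fp, fps[nxt]))
    return picked


# Hypothetical fingerprints: three close relatives and one outlier (index 2).
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
picked = maxmin_pick(fps, 3, seed=0)
```

With this data the outlier at index 2 is always among the first two picks, whatever the seed, because it is at maximum distance from every other item.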
qed(properties=['QED', 'MolWt', 'LogP', 'TPSA', 'HBD'], **kwargs)
Returns a copy of self with calculated quantitative estimate of drug-likeness (QED).
Parameters:
-
properties(list[str], default:['QED', 'MolWt', 'LogP', 'TPSA', 'HBD']) –names of the properties to calculate. Defaults to ['QED', 'MolWt', 'LogP', 'TPSA', 'HBD'].
Returns:
-
Self(Self) –self.
rename(prefix=None, sep='.', start=1)
Renames molecules and their conformers in place with serial numbers.
Molecules are named in the format {prefix}{sep}{serial_number}, and
conformers are named accordingly.
Parameters:
-
prefix(str, default:None) –prefix for the new name. If prefix is None, molecules are not renamed, but conformers are still renamed. This is useful for renumbering conformers serially after some of them have been dropped.
-
sep(str, default:'.') –separator between prefix and serial number. Defaults to '.'.
-
start(int, default:1) –start number of serial number.
Returns:
-
Self(Self) –rdworks.MolLibr object.
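The molecule naming format {prefix}{sep}{serial_number} can be illustrated with a small helper. serial_names is hypothetical and covers molecule names only. It assumes the serial number is not zero-padded, and it does not attempt to reproduce how conformer names are derived.

```python
def serial_names(count, prefix, sep=".", start=1):
    """Build `count` names of the form {prefix}{sep}{serial_number}."""
    return [f"{prefix}{sep}{i}" for i in range(start, start + count)]


names = serial_names(3, prefix="CPD")
custom = serial_names(2, prefix="mol", sep="_", start=10)
```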
similar(query, threshold=0.2, **kwargs)
Returns a copy of the subset of molecules that are similar to the query.
Parameters:
-
query(Mol) –query molecule.
-
threshold(float, default:0.2) –similarity threshold. Defaults to 0.2.
Raises:
-
TypeError–if query is not rdworks.Mol type.
Returns:
-
Self(Self) –a copy of self.
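A threshold-based similarity filter can be sketched as follows, assuming fingerprints are sets of on-bit indices compared by Tanimoto similarity. tanimoto and similar_indices are hypothetical helpers, and treating the threshold as inclusive (>=) is an assumption.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def similar_indices(query_fp, fps, threshold=0.2):
    """Return indices of fingerprints whose similarity to the query meets the threshold."""
    # Assumption: the comparison is inclusive.
    return [i for i, fp in enumerate(fps) if tanimoto(query_fp, fp) >= threshold]


# Hypothetical fingerprints: similarities to the query are 1.0, 1/3 and 0.0.
query_fp = {1, 2, 3, 4}
fps = [{1, 2, 3, 4}, {1, 2, 5, 6}, {7, 8}]
loose = similar_indices(query_fp, fps, threshold=0.2)
strict = similar_indices(query_fp, fps, threshold=0.5)
```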
to_batches(batchsize_atoms=1000)
Splits the workload flexibly into a number of batches.
- Each batch has up to batchsize_atoms atoms.
- Conformers originating from the same molecule can be split across multiple batches.
- One batch can contain conformers originating from multiple molecules.
coord: coordinates of input molecules, shape (N, m, 3), where N is the number of structures and m is the number of atoms in each structure.
numbers: atomic numbers in the molecule (including H), shape (N, m).
charges: shape (N,).
Parameters:
-
batchsize_atoms(int, default:1000) –max. number of atoms in a batch.
Returns:
-
list(list) –list of batches.
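Packing by atom budget can be sketched as a greedy pass over per-structure atom counts. This is a simplified illustration under stated assumptions, not the rdworks algorithm: pack_by_atoms is hypothetical, it returns structure indices rather than coordinate arrays, and a structure larger than the budget is given a batch of its own.

```python
def pack_by_atoms(atom_counts, batchsize_atoms=1000):
    """Greedily group structure indices so each batch stays within the atom budget."""
    batches, current, used = [], [], 0
    for idx, n_atoms in enumerate(atom_counts):
        # Start a new batch when adding this structure would exceed the budget.
        if current and used + n_atoms > batchsize_atoms:
            batches.append(current)
            current, used = [], 0
        current.append(idx)
        used += n_atoms
    if current:
        batches.append(current)
    return batches


# Hypothetical conformers: three of a 400-atom molecule, two of a 300-atom molecule.
atom_counts = [400, 400, 400, 300, 300]
batches = pack_by_atoms(atom_counts, batchsize_atoms=1000)
```

In this example the third conformer of the first molecule shares a batch with conformers of the second molecule, which matches the behavior described in the list above.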
to_csv(path, confs=False, decimals=3)
Writes to a .csv file.
Parameters:
-
path(str | Path) –output filename or path.
-
confs(bool, default:False) –whether to include conformer properties. Defaults to False.
-
decimals(int, default:3) –decimal places for float numbers. Defaults to 3.
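The effect of decimals can be illustrated with the standard csv module. rows_to_csv and the sample row are hypothetical. How rdworks formats floats internally (for example, whether trailing zeros are kept) is an assumption here.

```python
import csv
import io


def rows_to_csv(rows, decimals=3):
    """Serialize a list of dicts to CSV text, formatting floats to `decimals` places."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Only float values are reformatted; other values are written unchanged.
        writer.writerow({
            k: (f"{v:.{decimals}f}" if isinstance(v, float) else v)
            for k, v in row.items()
        })
    return buf.getvalue()


# Hypothetical record with one float property.
text = rows_to_csv([{"name": "ethanol", "smiles": "CCO", "MolWt": 46.0684}])
```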
to_df(name='name', smiles='smiles', confs=False)
Returns a Pandas DataFrame.
Parameters:
-
name(str, default:'name') –column name for name. Defaults to 'name'.
-
smiles(str, default:'smiles') –column name for SMILES. Defaults to 'smiles'.
-
confs(bool, default:False) –whether to include conformer properties. Defaults to False.
Returns:
-
DataFrame–pd.DataFrame: pandas DataFrame.
to_html()
Writes the library to an HTML string.
Returns:
-
str(str) –HTML strings.
to_png(filename=None, mols_per_row=5, width=200, height=200, atom_index=False, redraw=False, coordgen=False)
Writes to a .png file.
Parameters:
-
mols_per_row(int, default:5) –number of molecules per row. Defaults to 5.
-
width(int, default:200) –width. Defaults to 200.
-
height(int, default:200) –height. Defaults to 200.
-
atom_index(bool, default:False) –whether to show atom index. Defaults to False.
-
redraw(bool, default:False) –whether to redraw. Defaults to False.
-
coordgen(bool, default:False) –whether to use coordgen. Defaults to False.
to_sdf(path, confs=False, props=True, separate=False)
Writes to a .sdf or .sdf.gz file.
Chem.SDWriter is supposed to write all non-private molecular properties.
When separate is True, each molecule is written to its own file, named after the molecule:
dirname/filename.sdf -> dirname/filename_{molecule name}.sdf
dirname/filename.sdf.gz -> dirname/filename_{molecule name}.sdf.gz
Parameters:
-
path(str | PosixPath) –output filename or path.
-
confs(bool, default:False) –whether to write 3D coordinates and conformer properties. Defaults to False.
-
props(bool, default:True) –whether to write SDF properties. Defaults to True.
-
separate(bool, default:False) –whether to write each molecule to a separate file. Defaults to False.
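The per-molecule file naming shown above for separate output can be reproduced with pathlib. per_molecule_path is a hypothetical helper that only derives the path and writes nothing. The molecule name "aspirin" is an example value.

```python
from pathlib import Path


def per_molecule_path(path, mol_name):
    """Derive dirname/filename_{mol_name}.sdf[.gz] from dirname/filename.sdf[.gz]."""
    path = Path(path)
    # Check the double suffix .sdf.gz before the single suffix .sdf.
    for suffix in (".sdf.gz", ".sdf"):
        if path.name.endswith(suffix):
            stem = path.name[: -len(suffix)]
            return path.with_name(f"{stem}_{mol_name}{suffix}")
    raise ValueError(f"unsupported extension: {path.name}")


plain = per_molecule_path("dirname/filename.sdf", "aspirin")
gzipped = per_molecule_path("dirname/filename.sdf.gz", "aspirin")
```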
to_smi(path)
Writes to .smi file.
Parameters:
-
path(str | Path) –output filename or path.
to_svg(mols_per_row=5, width=200, height=200, atom_index=False, redraw=False, coordgen=False)
Writes to an SVG string for Jupyter notebook.
Parameters:
-
mols_per_row(int, default:5) –number of molecules per row. Defaults to 5.
-
width(int, default:200) –width. Defaults to 200.
-
height(int, default:200) –height. Defaults to 200.
-
atom_index(bool, default:False) –whether to show atom index. Defaults to False.
-
redraw(bool, default:False) –whether to redraw. Defaults to False.
-
coordgen(bool, default:False) –whether to use coordgen. Defaults to False.
unique(report=False)
Removes duplicates and returns a copy of the library containing only unique molecules.
Parameters:
-
report(bool, default:False) –whether to report duplicates. Defaults to False.
Returns:
-
Self(Self) –a copy of self.
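Order-preserving deduplication with an optional report can be sketched as follows. The sketch assumes molecules are compared by a canonical SMILES key and that the first occurrence is the one kept. Both are assumptions, and unique_by_key and the record layout are hypothetical.

```python
def unique_by_key(records, key=lambda rec: rec["smiles"], report=False):
    """Keep the first record for each key; optionally print the dropped duplicates."""
    first_seen = {}  # key -> name of the first record with that key
    kept = []
    for rec in records:
        k = key(rec)
        if k in first_seen:
            if report:
                print(f"duplicate: {rec['name']} == {first_seen[k]}")
            continue
        first_seen[k] = rec["name"]
        kept.append(rec)
    return kept


# Hypothetical records; the SMILES strings are assumed to be canonicalized already.
records = [
    {"name": "m1", "smiles": "CCO"},
    {"name": "m2", "smiles": "c1ccccc1"},
    {"name": "m3", "smiles": "CCO"},
]
unique_records = unique_by_key(records)
```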