Library

rdworks.mollibr.MolLibr

Attributes

chunksize = chunksize instance-attribute

clusters = None instance-attribute

libr = [] instance-attribute

max_workers = max_workers instance-attribute

progress = progress instance-attribute

query = None instance-attribute

threshold = None instance-attribute

Functions

align_drawing(ref=0, mcs=True, scaffold='', coordgen=True, **kwargs)

Align 2D drawings by using MCS or scaffold SMILES.

Parameters:

  • ref (int, default: 0 ) –

    index to the reference. Defaults to 0.

  • mcs (bool, default: True ) –

    whether to use MCS (maximum common substructure). Defaults to True.

  • scaffold (str, default: '' ) –

    scaffold SMILES to use for the alignment. Defaults to "".

Returns:

  • Self ( Self ) –

    self

cluster(threshold=0.3, ordered=True, drop_singleton=True)

Clusters molecules using fingerprints.

Parameters:

  • threshold (float, default: 0.3 ) –

    Tanimoto similarity threshold. Defaults to 0.3.

  • ordered (bool, default: True ) –

    order clusters by size of cluster. Defaults to True.

  • drop_singleton (bool, default: True ) –

    exclude singletons. Defaults to True.

Returns:

  • list ( list ) –

    [(centroid_1, idx, idx,), (centroid_2, idx, idx,), ...]
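
As a rough illustration of the return shape and of the ordered and drop_singleton options, the sketch below clusters toy fingerprints with a simple leader algorithm. It is not the rdworks implementation: the algorithm, the helper names (tanimoto, leader_cluster), the set-of-on-bits fingerprints, and the use of >= for the threshold comparison are all assumptions made for this example.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def leader_cluster(fps, threshold=0.3, ordered=True, drop_singleton=True):
    """Assign each fingerprint to the first centroid it is similar enough to."""
    clusters = []  # each cluster is [centroid, member, member, ...]
    for i, fp in enumerate(fps):
        for members in clusters:
            if tanimoto(fps[members[0]], fp) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])  # no centroid matched: i starts a new cluster
    if drop_singleton:
        clusters = [c for c in clusters if len(c) > 1]
    if ordered:
        clusters.sort(key=len, reverse=True)  # largest cluster first
    return [tuple(c) for c in clusters]


# Hypothetical toy fingerprints (sets of on-bit indices).
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}, {20, 21}, {7, 8}]
result = leader_cluster(fps)
print(result)
```

Each tuple starts with the centroid index followed by the other members, mirroring the documented [(centroid_1, idx, idx,), ...] layout.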

compute(**kwargs)

Change settings for parallel computing.

Parameters:

  • max_workers (int) –

    max number of workers. Defaults to 4.

  • chunksize (int) –

    chunk size of the split workload. Defaults to 10.

  • progress (bool) –

    whether to show progress bar. Defaults to False.

Returns:

  • Self ( Self ) –

    rdworks.MolLibr object.
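
To make the roles of max_workers and chunksize concrete, here is a minimal standard-library sketch of splitting a workload into chunks and handing them to a worker pool. It only illustrates the idea: the thread pool, the helper names (chunked, process_chunk, run_parallel), and the squaring workload are assumptions, and rdworks may schedule its work differently.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, chunksize):
    """Yield consecutive slices of at most `chunksize` items."""
    for start in range(0, len(items), chunksize):
        yield items[start:start + chunksize]


def process_chunk(chunk):
    """Stand-in for per-molecule work (hypothetical): square each number."""
    return [x * x for x in chunk]


def run_parallel(items, max_workers=4, chunksize=10):
    """Process chunks on a pool of workers and flatten results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(process_chunk, chunked(items, chunksize))
        return [y for chunk in results for y in chunk]


workload = list(range(25))
chunks = list(chunked(workload, 10))  # 25 items split into chunks of 10
squares = run_parallel(workload)
```

Larger chunks reduce scheduling overhead per item, while smaller chunks balance the load more evenly across workers.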

copy()

Returns a copy of self.

Returns:

  • Self ( Self ) –

    rdworks.MolLibr object.

count()

Returns number of molecules.

Returns:

  • int ( int ) –

    count of molecules.

drop(terms=None, invert=False, **kwargs)

Drops matched molecules and returns a copy of library with remaining molecules.

Parameters:

  • terms (str | Path | None, default: None ) –

    matching terms. Defaults to None.

  • invert (bool, default: False ) –

    whether to invert selection by the terms. Defaults to False.

Returns:

  • Self ( Self ) –

    a copy of self.

nnp_ready(model, **kwargs)

Returns a copy of the subset of the library that is ready for the given neural network potential.

Examples:

>>> libr = rdworks.MolLibr(drug_smiles, drug_names)
>>> ani2x_compatible_subset = libr.nnp_ready('ANI-2x', progress=False)

Parameters:

  • model (str) –

    name of model.

Returns:

  • Self ( Self ) –

    subset of library.

overlap(other)

Returns a common subset with other library.

Parameters:

  • other (Self) –

    rdworks.MolLibr object.

Returns:

  • Self ( Self ) –

    common subset of rdworks.MolLibr.

pick_diverse(n, seed=0, fp_type='morgan', radius=2, n_bits=2048)

Select diverse molecules using MaxMin algorithm.

Parameters:

  • n (int) –

    Number of molecules to select.

  • seed (int, default: 0 ) –

    Random seed for reproducibility. Defaults to 0.

  • fp_type (str, default: 'morgan' ) –

    Type of fingerprint to use: morgan, rdkit, maccs. Defaults to 'morgan'

  • radius (int, default: 2 ) –

    Radius for Morgan fingerprints. Defaults to 2

  • n_bits (int, default: 2048 ) –

    Number of bits for fingerprints. Defaults to 2048

Returns:

  • Self

    List of selected indices
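
The MaxMin idea can be sketched in plain Python: start from one item, then repeatedly add the candidate whose nearest already-picked neighbour is farthest away. The code below is an illustrative sketch only, not the rdworks or RDKit implementation; the helper names, the set-of-on-bits fingerprints, and the random choice of the first item are assumptions.

```python
import random


def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 0.0)


def maxmin_pick(fps, n, seed=0):
    """Return indices of up to n fingerprints chosen by the MaxMin heuristic."""
    if n <= 0 or not fps:
        return []
    rng = random.Random(seed)
    picked = [rng.randrange(len(fps))]  # seeded random first pick
    # Distance from every item to its nearest already-picked item.
    nearest = [tanimoto_distance(fp, fps[picked[0]]) for fp in fps]
    while len(picked) < min(n, len(fps)):
        candidates = [i for i in range(len(fps)) if i not in picked]
        # Choose the candidate farthest from everything picked so far.
        best = max(candidates, key=lambda i: nearest[i])
        picked.append(best)
        for i, fp in enumerate(fps):
            nearest[i] = min(nearest[i], tanimoto_distance(fp, fps[best]))
    return picked


# Hypothetical toy data: two pairs of near-duplicates and one outlier.
fps = [{1, 2, 3}, {1, 2, 3, 4}, {10, 11}, {10, 11, 12}, {50}]
picked = maxmin_pick(fps, 3, seed=0)
```

With this toy data, any three picks end up with one item from each group, because items in different groups share no bits and are therefore at the maximum distance from each other.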

qed(properties=['QED', 'MolWt', 'LogP', 'TPSA', 'HBD'], **kwargs)

Returns a copy of self with calculated quantitative estimate of drug-likeness (QED).

Parameters:

  • properties (list[str], default: ['QED', 'MolWt', 'LogP', 'TPSA', 'HBD'] ) –

    names of the properties to calculate. Defaults to ['QED', 'MolWt', 'LogP', 'TPSA', 'HBD'].

Returns:

  • Self ( Self ) –

    self.

rename(prefix=None, sep='.', start=1)

Renames molecules and their conformers in place with serial numbers.

Molecules are named in the format {prefix}{sep}{serial_number}, and conformers are named accordingly.

Examples:

>>> a.rename(prefix='a')

Parameters:

  • prefix (str, default: None ) –

    prefix for the new name. If prefix is None, molecules are not renamed but conformers are still renamed. This is useful for renumbering conformers serially after some of them have been dropped.

  • sep (str, default: '.' ) –

    separator between prefix and serial number (default: .)

  • start (int, default: 1 ) –

    start number of serial number.

Returns:

  • Self ( Self ) –

    rdworks.MolLibr object.
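
The naming format can be illustrated with a small stand-alone helper. This is a sketch of the {prefix}{sep}{serial_number} pattern only: serial_names is a hypothetical function, and it assumes the serial number is not zero-padded, which the description above does not specify.

```python
def serial_names(count, prefix, sep=".", start=1):
    """Build names of the form {prefix}{sep}{serial_number}."""
    return [f"{prefix}{sep}{number}" for number in range(start, start + count)]


names = serial_names(3, prefix="a")
offset_names = serial_names(2, prefix="lig", sep="-", start=10)
```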

similar(query, threshold=0.2, **kwargs)

Returns a copy of the subset of molecules that are similar to the query.

Parameters:

  • query (Mol) –

    query molecule.

  • threshold (float, default: 0.2 ) –

    similarity threshold. Defaults to 0.2.

Raises:

  • TypeError

    if query is not rdworks.Mol type.

Returns:

  • Self ( Self ) –

    a copy of self.
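
A threshold-based similarity filter can be sketched as follows. The fingerprints are toy sets of on-bit indices, similar_indices is a hypothetical helper, and keeping molecules whose similarity is greater than or equal to the threshold is an assumption; the fingerprint type and comparison used by rdworks are not stated here.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_indices(query_fp, fps, threshold=0.2):
    """Indices of fingerprints whose similarity to the query meets the threshold."""
    return [i for i, fp in enumerate(fps) if tanimoto(query_fp, fp) >= threshold]


query_fp = {1, 2, 3, 4}
fps = [{1, 2, 3, 4}, {1, 2, 9, 10}, {1, 20, 21, 22, 23, 24}, {30, 31}]
hits = similar_indices(query_fp, fps)  # similarities: 1.0, 2/6, 1/9, 0.0
```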

to_batches(batchsize_atoms=1000)

Splits the workload flexibly into a number of batches.

  • Each batch has up to batchsize_atoms atoms.
  • Conformers originating from the same molecule can be split across multiple batches.
  • One batch can contain conformers originating from multiple molecules.

Array shapes:

  • coord: coordinates of input molecules, shape (N, m, 3), where N is the number of structures and m is the number of atoms in each structure.
  • numbers: atomic numbers in the molecule (including H), shape (N, m).
  • charges: shape (N,).

Parameters:

  • batchsize_atoms (int, default: 1000 ) –

    max. number of atoms in a batch.

Returns:

  • list ( list ) –

    list of batches.
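
The batching rule can be sketched as a greedy packer over per-structure atom counts. This is an illustrative sketch, not the rdworks implementation: pack_batches is a hypothetical helper, it returns structure indices rather than coordinate arrays, and it assumes structures are packed in their original order.

```python
def pack_batches(atom_counts, batchsize_atoms=1000):
    """Greedily pack structures into batches capped by total atom count.

    atom_counts[i] is the number of atoms in structure i; the result is a
    list of batches, each a list of structure indices.
    """
    batches, current, current_atoms = [], [], 0
    for index, n_atoms in enumerate(atom_counts):
        # Close the current batch if adding this structure would exceed the cap.
        if current and current_atoms + n_atoms > batchsize_atoms:
            batches.append(current)
            current, current_atoms = [], 0
        current.append(index)
        current_atoms += n_atoms
    if current:
        batches.append(current)
    return batches


# Three conformers of a 40-atom molecule followed by two of a 30-atom molecule.
atom_counts = [40, 40, 40, 30, 30]
batches = pack_batches(atom_counts, batchsize_atoms=100)
```

Here the three conformers of the first molecule are split across two batches, and the second batch mixes conformers from both molecules. In this sketch, a single structure larger than the cap is placed in a batch of its own.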

to_csv(path, confs=False, decimals=3)

Writes to a .csv file.

Parameters:

  • path (str | Path) –

    output filename or path.

  • confs (bool, default: False ) –

    whether to include conformer properties. Defaults to False.

  • decimals (int, default: 3 ) –

    decimal places for float numbers. Defaults to 3.

to_df(name='name', smiles='smiles', confs=False)

Returns a Pandas DataFrame.

Parameters:

  • name (str, default: 'name' ) –

    column name for name. Defaults to 'name'.

  • smiles (str, default: 'smiles' ) –

    column name for SMILES. Defaults to 'smiles'.

  • confs (bool, default: False ) –

    whether to include conformer properties. Defaults to False.

Returns:

  • DataFrame

    pd.DataFrame: pandas DataFrame.

to_html()

Writes to an HTML string.

Returns:

  • str ( str ) –

    HTML strings.

to_png(filename=None, mols_per_row=5, width=200, height=200, atom_index=False, redraw=False, coordgen=False)

Writes to a .png file.

Parameters:

  • mols_per_row (int, default: 5 ) –

    number of molecules per row. Defaults to 5.

  • width (int, default: 200 ) –

    width. Defaults to 200.

  • height (int, default: 200 ) –

    height. Defaults to 200.

  • atom_index (bool, default: False ) –

    whether to show atom index. Defaults to False.

  • redraw (bool, default: False ) –

    whether to redraw. Defaults to False.

  • coordgen (bool, default: False ) –

    whether to use coordgen. Defaults to False.

to_sdf(path, confs=False, props=True, separate=False)

Writes to .sdf or .sdf.gz file.

Chem.SDWriter is supposed to write all non-private molecular properties.

When separate is True, the molecule name is appended to the file name:

  • dirname/filename.sdf -> dirname/filename_{molecule name}.sdf
  • dirname/filename.sdf.gz -> dirname/filename_{molecule name}.sdf.gz

Parameters:

  • path (str or PosixPath)

    output filename or path

  • confs (bool)

    whether to write 3D coordinates and conformer properties. Defaults to False.

  • props (bool)

    whether to write SDF properties. Defaults to True.

  • separate (bool)

    write each molecule to separate files. Defaults to False.
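
The per-molecule file naming pattern shown above can be reproduced with pathlib. The helper below is a hypothetical sketch of that pattern only; it does not write any files and is not taken from rdworks.

```python
from pathlib import Path


def separate_path(path, molecule_name):
    """Insert _{molecule_name} before the .sdf or .sdf.gz extension."""
    path = Path(path)
    # Treat ".sdf.gz" as one extension; otherwise use the last suffix.
    suffix = ".sdf.gz" if path.name.endswith(".sdf.gz") else path.suffix
    stem = path.name[: len(path.name) - len(suffix)]
    return path.with_name(f"{stem}_{molecule_name}{suffix}")


plain = separate_path("dirname/filename.sdf", "aspirin")
gzipped = separate_path("dirname/filename.sdf.gz", "aspirin")
```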

to_smi(path)

Writes to .smi file.

Parameters:

  • path (str | Path) –

    output filename or path.

to_svg(mols_per_row=5, width=200, height=200, atom_index=False, redraw=False, coordgen=False)

Writes to an SVG string for display in a Jupyter notebook.

Parameters:

  • mols_per_row (int, default: 5 ) –

    number of molecules per row. Defaults to 5.

  • width (int, default: 200 ) –

    width. Defaults to 200.

  • height (int, default: 200 ) –

    height. Defaults to 200.

  • atom_index (bool, default: False ) –

    whether to show atom index. Defaults to False.

  • redraw (bool, default: False ) –

    whether to redraw. Defaults to False.

  • coordgen (bool, default: False ) –

    whether to use coordgen. Defaults to False.

unique(report=False)

Removes duplicates and returns a copy of unique library.

Parameters:

  • report (bool, default: False ) –

    whether to report duplicates. Defaults to False.

Returns:

  • Self ( Self ) –

    a copy of self.