Molecule

Conformer Generation¶

To yield more chemically meaningful conformers, Riniker and Landrum implemented the experimental torsion knowledge distance geometry (ETKDG) method which uses torsion angle preferences from the Cambridge Structural Database (CSD) to correct the conformers after distance geometry has been used to generate them.

The settings for the various conformer generation options are stored in an EmbedParameters object:

from rdkit.Chem import AllChem

# To explicitly request the ETKDG parameters (an EmbedParameters object):
params = AllChem.ETKDG()
params = AllChem.ETKDGv2()

# Additional small-ring torsion potentials:
params = AllChem.srETKDGv3()

# Additional macrocycle torsion potentials and macrocycle-specific handling:
params = AllChem.ETKDGv3()

# To use the above two in conjunction:
params = AllChem.ETKDGv3()
params.useSmallRingTorsions = True

# default rdDistGeom.EmbedMultipleConfs() settings
    randomSeed=-1           RNG will not be seeded
    pruneRmsThresh=-1.0     No pruning by default
    useRandomCoords=False   Start the embedding from random coordinates
                            instead of using eigenvalues of the distance matrix
    numThreads=1            Number of threads to use while embedding
                            (only if built with multi-thread support);
                            0 uses the maximum supported by the system

Number of conformers¶

RotBonds	N_confs	RotBonds	N_confs
0	1	11	434
1	8	12	501
2	26	13	572
3	51	14	646
4	82	15	723
5	119	16	804
6	160	17	888
7	207	18	976
8	257	19	1000
9	312	20	1000
10	371
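The table can be used as a plain lookup. The sketch below hard-codes the values listed above; the function name `suggested_num_confs` and the decision to stay at 1000 beyond 20 rotatable bonds are assumptions made for illustration only.

```python
# Values copied from the table above; list index = number of rotatable bonds.
N_CONFS_BY_ROTBONDS = [
    1, 8, 26, 51, 82, 119, 160, 207, 257, 312, 371,
    434, 501, 572, 646, 723, 804, 888, 976, 1000, 1000,
]


def suggested_num_confs(rot_bonds: int) -> int:
    """Return the tabulated conformer count for a rotatable-bond count.

    Assumption: counts beyond the last table row stay capped at 1000.
    """
    if rot_bonds < 0:
        raise ValueError("rot_bonds must be non-negative")
    idx = min(rot_bonds, len(N_CONFS_BY_ROTBONDS) - 1)
    return N_CONFS_BY_ROTBONDS[idx]


print(suggested_num_confs(5))   # 119
print(suggested_num_confs(25))  # 1000 (capped, by assumption)
```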

ETKDG method¶

The ETKDG method of Riniker and Landrum uses torsion angle preferences from the Cambridge Structural Database (CSD) to correct the conformers after distance geometry has been used to generate them. With this method, there should be no need to use a minimisation step to clean up the structures. Since the 2018.09 release of the RDKit, ETKDG is the default conformer generation method.

Optimizing a molecule's conformers is often unnecessary when using ETKDG, but it can be done with:

res = AllChem.MMFFOptimizeMoleculeConfs(m)

RMS between conformers

rmslist = []  # receives the RMS of each conformer to the reference conformer
AllChem.AlignMolConformers(m, RMSlist=rmslist)
rms = AllChem.GetConformerRMS(m, 1, 9, prealigned=True)
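For orientation, the RMS between two prealigned conformers is the root-mean-square deviation over matched atom positions. The following pure-Python sketch shows that arithmetic only; it is not RDKit's implementation, and the function name `rmsd` and the coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered, already superimposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        sq_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(sq_sum / len(coords_a))


# Hypothetical three-atom conformers that differ only in the last atom.
conf_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0)]
conf_2 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 3.0)]
print(rmsd(conf_1, conf_2))
```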

Contrary to Greg's comment above, the quality of RDKit ETKDG conformers improves greatly when they are optimized with MMFF94 or MMFF94s after embedding. UFF optimization is less effective than MMFF94 or MMFF94s.

Torsional Diffusion is the most efficient and accurate AI method according to Wang et al. (2023).

Auto3D uses RDKit conformers as a starting point for optimizing with a modified version of the ANI-2x deep learning molecular potential (AIMNET), but this does not result in better performance than that of RDKit in the bioactive conformation identification task.

Generating a larger ensemble will monotonically increase the likelihood of retrieving a bioactive conformation, but for efficiency reasons, it is desirable to generate smaller ensembles. As shown in Figures 5 and S6, selecting the lowest energy poses from a larger ensemble does not improve the retrieval of bioactive conformations with the exception of reducing down to a single conformer.

References¶

L. Chan, G. M. Morris, G. R. Hutchison, Understanding Conformational Entropy in Small Molecules. J. Chem. Theory Comput. 17, 2099–2106 (2021).
S. Riniker, G. A. Landrum, Better Informed Distance Geometry: Using What We Know To Improve Conformation Generation. J. Chem. Inf. Model. 55, 2562–2574 (2015).
Z. Wang, et al., Small-Molecule Conformer Generators: Evaluation of Traditional Methods and AI Models on High-Quality Data Sets. J. Chem. Inf. Model. 63, 6525–6536 (2023).
B. Jing, G. Corso, J. Chang, R. Barzilay, T. Jaakkola, Torsional Diffusion for Molecular Conformer Generation. arXiv [physics.chem-ph] (2022).
A. T. McNutt, et al., Conformer Generation for Structure-Based Drug Design: How Many and How Good? J. Chem. Inf. Model. 63, 6598–6607 (2023).

Clustering:¶

Clustering conformers

Usage of `Butina.ClusterData()`¶

isDistData: set this toggle when the data passed in is a distance matrix. The distance matrix should be stored symmetrically.
distFunc: a function to calculate distances between points. Receives 2 points as arguments, should return a float
reordering: if this toggle is set, the number of neighbors is updated for the unassigned molecules after a new cluster is created such that always the molecule with the largest number of unassigned neighbors is selected as the next cluster center.
returns a tuple of tuples containing information about the clusters: ( (cluster1_elem1, cluster1_elem2, …), (cluster2_elem1, cluster2_elem2, …), …, ) where the first element for each cluster is its centroid.
https://www.rdkit.org/docs/source/rdkit.ML.Cluster.Butina.html
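To make the return format concrete, here is a small pure-Python sketch of the clustering idea: count neighbors within a threshold, then repeatedly take the point with the most neighbors as a centroid. It is an illustrative re-implementation, not RDKit's code; the name `butina_sketch`, the tie-breaking by lowest index, and the omission of the `reordering` option are assumptions of this sketch.

```python
def butina_sketch(points, dist_thresh, dist_func):
    """Cluster points; each returned tuple lists the centroid index first."""
    n = len(points)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i):
            if dist_func(points[i], points[j]) <= dist_thresh:
                neighbors[i].append(j)
                neighbors[j].append(i)

    # Candidate centroids, most neighbors first (ties: lowest index first).
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))

    assigned = [False] * n
    clusters = []
    for center in order:
        if assigned[center]:
            continue
        members = [center] + [j for j in neighbors[center] if not assigned[j]]
        for idx in members:
            assigned[idx] = True
        clusters.append(tuple(members))
    return tuple(clusters)


# Hypothetical 1D data; distFunc receives two points and returns a float.
data = [0, 1, 2, 50, 51, 90]
result = butina_sketch(data, 1.5, lambda a, b: float(abs(a - b)))
print(result)  # ((1, 0, 2), (3, 4), (5,))
```

Point 1 has two neighbors and becomes the first centroid; the singleton at index 5 ends up in its own cluster.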

References¶

D. Butina, Unsupervised Data Base Clustering Based on Daylight’s Fingerprint and Tanimoto Similarity: A Fast and Automated Way To Cluster Small and Large Data Sets. J. Chem. Inf. Comput. Sci. 39, 747–750 (1999).

Fingerprint¶

Tutorial

fpgen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

fpgen.GetFingerprint(m)               : returns a bit vector of size fpSize
fpgen.GetCountFingerprint(m)          : returns a count vector of size fpSize
fpgen.GetSparseFingerprint(m)         : returns a sparse bit vector
fpgen.GetSparseCountFingerprint(m)    : returns a sparse count vector

Note that the sparse vectors are not folded to fpSize, so they are very long.

Similarity¶

Similarity search thresholds

Druglikeness and QED¶

References¶

G. R. Bickerton, G. V. Paolini, J. Besnard, S. Muresan, A. L. Hopkins, Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).

Hydrogen Bonding Acceptor¶

QED definition in RDKit

Acceptors = [
    Chem.MolFromSmarts(hba) for hba in [
        '[oH0;X2]', 
        '[OH1;X2;v2]', 
        '[OH0;X2;v2]', 
        '[OH0;X1;v2]', 
        '[O-;X1]', 
        '[SH0;X2;v2]', 
        '[SH0;X1;v2]',
        '[S-;X1]',
        '[nH0;X2]',
        '[NH0;X1;v3]',
        '[$([N;+0;X3;v3]);!$(N[C,S]=O)]'
    ]]

API¶

`rdworks.mol.Mol` ¶

Container for molecular structure, conformers, and other information.

Attributes¶

`ETKDG_params = rdDistGeom.ETKDGv3()` `class-attribute` `instance-attribute` ¶

`InChIKey = generate_inchi_key(self.rdmol)` `instance-attribute` ¶

`MFP2 = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)` `class-attribute` `instance-attribute` ¶

`charge` `property` ¶

Returns molecular formal charge

Returns:

int ( int ) –

molecular formal charge

`chunksize = chunksize` `instance-attribute` ¶

`confs = [molecule]` `instance-attribute` ¶

`fp = None` `instance-attribute` ¶

`is_confs_aligned = False` `instance-attribute` ¶

`is_stereo_specified` `property` ¶

Check if the molecule is stereo-specified at tetrahedral atom and double bond.

This function uses Chem.FindPotentialStereo(), which returns a list of elements. Explanation of the elements:

element.type:
    Whether the element is a stereocenter ('stereoAtom') or a stereobond ('stereoBond').
    - Atom_Octahedral
    - Atom_SquarePlanar
    - Atom_Tetrahedral
    - Atom_TrigonalBipyramidal
    - Bond_Atropisomer
    - Bond_Cumulene_Even
    - Bond_Double
    - Unspecified

element.centeredOn:
    The atom or bond index where the stereochemistry is centered.

element.specified:
    A boolean indicating whether the stereochemistry at that location
    is explicitly specified in the molecule.
    values = {
        0: rdkit.Chem.rdchem.StereoSpecified.Unspecified,
        1: rdkit.Chem.rdchem.StereoSpecified.Specified,
        2: rdkit.Chem.rdchem.StereoSpecified.Unknown,
        }

element.descriptor:
    A descriptor that can be used to identify the type of stereochemistry (e.g., 'R', 'S', 'E', 'Z').
    - Bond_Cis = rdkit.Chem.StereoDescriptor.Bond_Cis
    - Bond_Trans = rdkit.Chem.StereoDescriptor.Bond_Trans
    - NoValue = rdkit.Chem.StereoDescriptor.NoValue
    - Tet_CCW = rdkit.Chem.StereoDescriptor.Tet_CCW
    - Tet_CW = rdkit.Chem.StereoDescriptor.Tet_CW

Returns:

bool ( bool ) –

True if stereo-specified.

`max_workers = max_workers` `instance-attribute` ¶

`molblock` `property` ¶

Returns MolBlock

`name = str(name)` `instance-attribute` ¶

`num_confs` `property` ¶

Returns the total number of conformers.

Returns:

int ( int ) –

total count of conformers.

`num_stereoisomers` `property` ¶

Counts number of all possible stereoisomers ignoring the current stereochemistry.

Returns:

int ( int ) –

number of stereoisomers.

`numbers` `property` ¶

Returns the atomic numbers.

Returns:

list ( list[int] ) –

list of atomic numbers.

`progress = progress` `instance-attribute` ¶

`props = {}` `instance-attribute` ¶

`rdmol = None` `instance-attribute` ¶

`ring_bond_stereo_info` `property` ¶

Returns double bond and cis/trans stereochemistry information.

Returns:

list[tuple] –

list[tuple]: [(element.centeredOn, element.descriptor), ...]

`smiles = Chem.MolToSmiles(self.rdmol)` `instance-attribute` ¶

`symbols` `property` ¶

Returns the element symbols.

Returns:

list ( list[str] ) –

list of element symbols.

Functions¶

`align_confs(method='rigid_fragment')` ¶

Aligns all conformers to the first conformer.

Parameters:

method (str, default: 'rigid_fragment' ) –

alignment method: rigid_fragment, CrippenO3A, MMFFO3A, best_rms. Defaults to rigid_fragment.

Returns:

Self ( Self ) –

modified self.

`calculate_sp_torsion_energies(calculator='MMFF94', torsion_angle_idx=None, simplify=True, interval=20.0, water=None, batchsize_atoms=0)` ¶

Calculates potential energy profiles for each torsion angle using ASE optimizer.

It uses the first conformer as a reference.

Parameters:

calculator (str | Callable, default: 'MMFF94' ) –

'MMFF', 'UFF', 'xTB' or ASE calculator.
torsion_angle_idx (int | None, default: None ) –

torsion index to calculate. Defaults to None (all).
simplify (bool, default: True ) –

whether to use fragment surrogate. Defaults to True.
interval (float, default: 20.0 ) –

interval of torsion angles in degree. Defaults to 20.0.
batchsize_atoms (int, default: 0 ) –

maximum number of atoms in a single batch. Set any number smaller than conf.natoms to disable batch optimization. Defaults to 0.

Args for xTB calculator

water (str, optional): water solvation model (choose 'gbsa', 'alpb', or 'cpcmx') alpb: ALPB solvation model (Analytical Linearized Poisson-Boltzmann). gbsa: generalized Born (GB) model with Surface Area contributions. cpcmx: Extended Conductor-like Polarizable Continuum Solvation Model (CPCM-X). Defaults to None.

Returns:

Self ( Self ) –

modified self.

`calculate_torsion_energies(calculator='MMFF94', torsion_angle_idx=None, simplify=True, fmax=0.05, interval=20.0, use_converged_only=True, batchsize_atoms=0, water=None)` ¶

Calculates potential energy profiles for each torsion angle using ASE optimizer.

It uses the first conformer as a reference.

Parameters:

calculator (str | Callable, default: 'MMFF94' ) –

'MMFF', 'UFF', 'xTB' or ASE calculator.
torsion_angle_idx (int | None, default: None ) –

torsion index to calculate. Defaults to None (all).
simplify (bool, default: True ) –

whether to use fragment surrogate. Defaults to True.
fmax (float, default: 0.05 ) –

fmax of ASE optimizer. Defaults to 0.05.
interval (float, default: 20.0 ) –

interval of torsion angles in degree. Defaults to 20.0.
use_converged_only (bool, default: True ) –

whether to use only converged data. Defaults to True.
batchsize_atoms (int, default: 0 ) –

maximum number of atoms in a single batch. Set any number smaller than conf.natoms to disable batch optimization. Defaults to 0.

Args for xTB calculator

water (str, optional): water solvation model (choose 'gbsa' or 'alpb') alpb: ALPB solvation model (Analytical Linearized Poisson-Boltzmann). gbsa: generalized Born (GB) model with Surface Area contributions. Defaults to None.

Returns:

Self ( Self ) –

modified self.

`cluster_confs(method='QT', threshold=1.0, sort='size', symmetry_aware=True)` ¶

Clusters all conformers and sets cluster properties.

Conformers are expected to be aligned.

Sets the following properties: cluster, cluster_mean_energy, cluster_median_energy, cluster_IQR_energy, cluster_size, cluster_centroid (True or False).

RCKMeans algorithm is unreliable and not supported for now.

Parameters:

method (str, default: 'QT' ) –

clustering algorithm: Butina, QT, NMRCLUST, DQT, BitQT, DynamicTreeCut, AutoGraph. Defaults to QT.
threshold (float, default: 1.0 ) –

RMSD threshold of a cluster. Defaults to 1.0.
sort (str, default: 'size' ) –

sort cluster(s) by mean energy or cluster size. Defaults to size.
symmetry_aware (bool, default: True ) –

whether to use symmetry-aware rmsd. Defaults to True.

Raises:

NotImplementedError –

if unsupported method is requested.

Returns:

Self ( Self ) –

modified self.

`compute(**kwargs)` ¶

Change settings for parallel computing.

Parameters:

max_workers (int) –

max number of workers.
chunksize (int) –

chunk size of the split workload.
progress (bool) –

whether to show progress bar.

Returns:

Self ( Self ) –

modified self.

`copy()` ¶

Returns a copy of self.

Returns:

Self –

a copy of self.

`count()` ¶

Returns the number of conformers

`deserialize(serialized)` ¶

De-serialize the information and build a new Mol object.

Example

>>> serialized = mol1.serialize()
>>> mol2 = Mol().deserialize(serialized)

Parameters:

serialized (str) –

serialized string.

Returns:

Self ( Self ) –

modified self.

`draw(coordgen=False, rotate=False, axis='z', degree=0.0)` ¶

Draw molecule in 2D.

Parameters:

coordgen (bool, default: False ) –

whether to use coordgen. Defaults to False.
rotate (bool, default: False ) –

whether to rotate drawing. Defaults to False.
axis (str, default: 'z' ) –

axis for rotation. Defaults to 'z'.
degree (float, default: 0.0 ) –

degree for rotation. Defaults to 0.0.

Returns:

Self –

Self.

`drop_confs(stereo_flipped=True, unconverged=True, similar=None, similar_rmsd=0.3, cluster=None, k=None, window=None, **kwargs)` ¶

Drop conformers that meet some condition(s).

Parameters:

stereo_flipped (bool, default: True ) –

drop conformers whose R/S and cis/trans stereo is unintentionally flipped. For example, a trans double bond in a macrocycle can end up with both trans and cis isomers in the final optimized conformers.
unconverged (bool, default: True ) –

drop unconverged conformers. see Converged property.
similar (bool, default: None ) –

drop similar conformers. see similar_rmsd.
similar_rmsd (float, default: 0.3 ) –

RMSD (A) below similar_rmsd is regarded similar (default: 0.3)
cluster (bool, default: None ) –

drop all except for the lowest energy conformer in each cluster.
k (int, default: None ) –

drop all except for k lowest energy conformers.
window (float, default: None ) –

drop all except for conformers within window of relative energy.

Examples:

To drop similar conformers within rmsd of 0.5 A

>>> mol.drop_confs(similar=True, similar_rmsd=0.5)

To drop conformers beyond 5 kcal/mol

>>> mol.drop_confs(window=5.0)

Returns:

Self ( Self ) –

modified self.
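The `k` and `window` filters can be pictured with plain lists of energies. This is a minimal sketch, not rdworks code: the helper names are hypothetical, energies are assumed to be in kcal/mol, and treating the window boundary as inclusive is an assumption.

```python
def keep_k_lowest(energies, k):
    """Return indices of the k lowest-energy conformers, lowest first."""
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    return order[:k]


def keep_within_window(energies, window):
    """Return indices whose energy relative to the minimum is <= window."""
    e_min = min(energies)
    return [i for i, e in enumerate(energies) if e - e_min <= window]


e_tot = [-12.0, -10.5, -3.0, -11.8, -6.9]  # hypothetical E_tot(kcal/mol)
print(keep_k_lowest(e_tot, 2))         # [0, 3]
print(keep_within_window(e_tot, 5.0))  # [0, 1, 3]
```

Conformers not returned by the chosen filter are the ones that would be dropped.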

`dumps(key='', decimals=2)` ¶

Returns JSON dumps of properties.

Parameters:

key (str | None, default: '' ) –

key for a subset of properties. Defaults to ''.
decimals (int, default: 2 ) –

decimal places for float numbers. Defaults to 2.

Returns:

str ( str ) –

JSON dumps.

`from_molblock(molblock, compressed=False)` ¶

Initialize a new Mol object from MolBlock.

Parameters:

molblock (str) –

MolBlock string

Raises:

ValueError –

invalid MolBlock

Returns:

Self ( Self ) –

self.

`get_SASA()` ¶

Get Solvent Accessible Surface Area

`get_similarity(other)` ¶

Returns Tanimoto similarity with other Mol object.

Parameters:

other (Mol) –

other Mol object.

Raises:

TypeError –

if other is not Mol object type.

Returns:

float ( float ) –

Tanimoto similarity.
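As a reminder of what the returned number means, Tanimoto similarity is the size of the intersection of the on-bits divided by the size of their union. The sketch below works on plain sets of bit indices; the function name and the fingerprints are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(on_bits_a, on_bits_b):
    """Tanimoto coefficient of two collections of on-bit indices."""
    a, b = set(on_bits_a), set(on_bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention of this sketch for two empty fingerprints
    return len(a & b) / union


fp_a = {1, 5, 9, 42}  # hypothetical on-bits
fp_b = {5, 9, 77}
print(tanimoto(fp_a, fp_b))  # 0.4 (2 shared bits out of 5 distinct bits)
```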

`get_torsion_angle_atoms(strict=True)` ¶

Determine torsion/dihedral angle atoms (i-j-k-l) and rotating group for each rotatable bond (j-k).

Parameters:

strict (bool, default: True ) –

whether to exclude amide/imide/ester/acid bonds.

Returns:

list[tuple] –

[(i, j, k, l), ...]
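Each tuple (i, j, k, l) defines a dihedral angle about the j-k bond. As an illustration of how such an angle can be measured from four atom positions, here is a pure-Python sketch using the standard atan2 formula; `dihedral_deg` and the coordinates are hypothetical and not part of rdworks.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dihedral_deg(p_i, p_j, p_k, p_l):
    """Signed dihedral angle i-j-k-l in degrees, in the range [-180, 180]."""
    b1 = _sub(p_j, p_i)
    b2 = _sub(p_k, p_j)
    b3 = _sub(p_l, p_k)
    n1 = _cross(b1, b2)  # normal of the i-j-k plane
    n2 = _cross(b2, b3)  # normal of the j-k-l plane
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


p_i, p_j, p_k = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
cis = dihedral_deg(p_i, p_j, p_k, (1.0, 1.0, 0.0))     # eclipsed arrangement
trans = dihedral_deg(p_i, p_j, p_k, (1.0, -1.0, 0.0))  # anti arrangement
gauche = dihedral_deg(p_i, p_j, p_k, (1.0, 0.0, 1.0))  # perpendicular arrangement
print(abs(cis), abs(trans), abs(gauche))
```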

`has_substr(substr)` ¶

Determine if the molecule has the substructure match.

Parameters:

substr (str) –

SMARTS or SMILES.

Returns:

bool ( bool ) –

True if matches.

`is_matching(terms, invert=False)` ¶

Determines if the molecule matches the predefined substructure and/or descriptor ranges.

invert	terms(~ or !)	effect
True	~	No inversion
True		Inversion
False	~	Inversion
False		No inversion

Parameters:

terms (str | Path) –

substructure SMARTS expression or a path to predefined descriptor ranges.
invert (bool, default: False ) –

whether to invert the result. Defaults to False.

Returns:

bool ( bool ) –

True if matches.
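The table above amounts to an exclusive OR: the result is inverted when exactly one of the two negations is present. A minimal sketch, assuming the `~` or `!` marker is a prefix of the term (the function name and example terms are hypothetical):

```python
def effective_inversion(term: str, invert: bool) -> bool:
    """Return True if the match result for `term` ends up inverted."""
    negated = term.startswith(("~", "!"))
    return invert != negated  # XOR: two negations cancel out


for invert in (True, False):
    for term in ("~[OH]", "[OH]"):
        print(invert, term, effective_inversion(term, invert))
```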

`is_nnp_ready(model='aimnet2')` ¶

Check if a particular neural network model is applicable to current molecule.

Parameters:

model (str, default: 'aimnet2' ) –

neural network models: ANI-2x, ANI-2xt, AIMNET

Raises:

ValueError –

if model is not supported.

Returns:

bool ( bool ) –

True if applicable.

`is_similar(other, threshold)` ¶

Check if other molecule is similar within Tanimoto similarity threshold.

Parameters:

other (Mol) –

other Mol object to compare with.
threshold (float) –

Tanimoto similarity threshold.

Returns:

bool ( bool ) –

True if similar.

`make_confs(n=50, method='ETKDG', **kwargs)` ¶

Generates 3D conformers.

Parameters:

n (int, default: 50 ) –

number of conformers to generate. Defaults to 50.
method (str, default: 'ETKDG' ) –

conformer generation method. Choices are ETKDG, CONFORGE. Defaults to 'ETKDG'.

Returns:

Self ( Self ) –

modified self.

Reference

T. Seidel, C. Permann, O. Wieder, S. M. Kohlbacher, T. Langer, High-Quality Conformer Generation with CONFORGE: Algorithm and Performance Assessment. J. Chem. Inf. Model. 63, 5549-5570 (2023).

`optimize_confs(calculator='MMFF94', fmax=0.05, max_iter=1000, water=None, batchsize_atoms=0)` ¶

Optimizes 3D geometry of conformers.

Parameters:

calculator (str | Callable, default: 'MMFF94' ) –

MMFF94 (= MMFF), MMFF94s, UFF, or ASE calculator.
- MMFF94 or MMFF: intended for general use, including organic molecules and proteins, and primarily relies on data from quantum mechanical calculations. It is often used in molecular dynamics simulations.
- MMFF94s: a "static" variant of MMFF94 (the "s" stands for "static"), with adjusted parameters for out-of-plane bending and dihedral torsions to favor planar geometries for specific nitrogen atoms. This makes it better suited for geometry optimization studies where a static, time-averaged structure is desired.
- UFF: the Universal Force Field, a force field model used for molecular mechanics calculations. It is a tool for geometry optimization, energy minimization, and exploring molecular conformations in 3D space. UFF is often used to refine conformers generated by other methods, such as random conformer generation, to produce more physically plausible and stable structures.
fmax (float, default: 0.05 ) –

fmax for the calculator convergence. Defaults to 0.05.
max_iter (int, default: 1000 ) –

max iterations for the calculator. Defaults to 1000.
batchsize_atoms (int, default: 0 ) –

max number of atoms in one batch. Defaults to 0. Batch optimization is disabled if zero or negative.

Args for xTB calculator

water (str, optional): water solvation model (choose 'gbsa' or 'alpb') alpb: ALPB solvation model (Analytical Linearized Poisson-Boltzmann). gbsa: generalized Born (GB) model with Surface Area contributions. Defaults to None.

Returns:

Self ( Self ) –

modified self.

`qed(properties=['QED', 'MolWt', 'LogP', 'TPSA', 'HBD'])` ¶

Updates quantitative estimate of drug-likeness (QED) and other descriptors.

Parameters:

properties (list[str], default: ['QED', 'MolWt', 'LogP', 'TPSA', 'HBD'] ) –

Defaults to ['QED', 'MolWt', 'LogP', 'TPSA', 'HBD'].

Raises:

KeyError –

if property key is unknown.

Returns:

Self ( Self ) –

modified self.
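QED, as defined by Bickerton et al. (2012), combines per-property desirability scores (each between 0 and 1) through a weighted geometric mean. The sketch below shows only that final combination step; the function name, desirability values, and weights are hypothetical, and it does not reproduce the desirability functions themselves.

```python
import math


def weighted_geometric_mean(desirabilities, weights):
    """exp(sum(w_i * ln d_i) / sum(w_i)) for positive d_i and weights."""
    if len(desirabilities) != len(weights) or not weights:
        raise ValueError("need equally long, non-empty inputs")
    if any(d <= 0 for d in desirabilities):
        raise ValueError("desirabilities must be positive")
    total_w = sum(weights)
    log_sum = sum(w * math.log(d) for d, w in zip(desirabilities, weights))
    return math.exp(log_sum / total_w)


d = [0.9, 0.5, 0.8]  # hypothetical desirability scores
w = [1.0, 2.0, 1.0]  # hypothetical weights
print(round(weighted_geometric_mean(d, w), 3))
```

Because the mean is geometric, one poor desirability score pulls the overall value down more strongly than it would in an arithmetic average.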

`remove_stereo()` ¶

Removes stereochemistry.

Examples:

>>> m = Mol(r"C/C=C/C=C\C", "double_bond")
>>> m.remove_stereo().smiles == "CC=CC=CC"

Returns:

Self ( Self ) –

modified self.

`rename(prefix='', sep='/', start=1)` ¶

Updates name and conformer names.

The first conformer name is {prefix}{sep}{start}

Parameters:

prefix (str, default: '' ) –

prefix of the name. Defaults to ''.
sep (str, default: '/' ) –

separator between prefix and serial number. Defaults to '/'.
start (int, default: 1 ) –

first serial number. Defaults to 1.

Returns:

Self ( Self ) –

modified self.

`report_props()` ¶

Report properties

`report_stereo()` ¶

Report stereochemistry information for debug

`serialize(decimals=3)` ¶

Serialize information necessary to rebuild a Mol object.

Parameters:

decimals (int, default: 3 ) –

number of decimal places for float data type. Defaults to 3.

Returns:

str ( str ) –

serialized string for json.loads()

`singlepoint_confs(calculator, water=None, batchsize_atoms=0)` ¶

Evaluates potential energy of each conformer without geometry optimization.

It sets E_tot(kcal/mol) property for each conformer.

Parameters:

calculator (str | Callable) –

MMFF94 (= MMFF), MMFF94s, UFF, xTB or ASE calculator.
- MMFF94 or MMFF: intended for general use, including organic molecules and proteins, and primarily relies on data from quantum mechanical calculations. It is often used in molecular dynamics simulations.
- MMFF94s: a "static" variant of MMFF94 (the "s" stands for "static"), with adjusted parameters for out-of-plane bending and dihedral torsions to favor planar geometries for specific nitrogen atoms. This makes it better suited for geometry optimization studies where a static, time-averaged structure is desired.
- UFF: the Universal Force Field, a force field model used for molecular mechanics calculations. It is a tool for geometry optimization, energy minimization, and exploring molecular conformations in 3D space. UFF is often used to refine conformers generated by other methods, such as random conformer generation, to produce more physically plausible and stable structures.
fmax (float) –

fmax for the calculator convergence. Defaults to 0.05.
max_iter (int) –

max iterations for the calculator. Defaults to 1000.
batchsize_atoms (int, default: 0 ) –

maximum number of atoms in a single batch. Set any number smaller than conf.natoms to disable batch optimization. Defaults to 0.

Args for xTB calculator

water (str, optional): water solvation model (choose 'gbsa' or 'alpb') alpb: ALPB solvation model (Analytical Linearized Poisson-Boltzmann). gbsa: generalized Born (GB) model with Surface Area contributions. Defaults to None.

Returns:

Self ( Self ) –

modified self.

`sort_confs(calculator=None, **kwargs)` ¶

Sorts by E_tot(kcal/mol) or E_tot(eV) and sets E_rel(kcal/mol).

Parameters:

calculator (str | Callable | None, default: None ) –

MMFF94 (= MMFF), MMFF94s, UFF, or ASE calculator.
- MMFF94 or MMFF: intended for general use, including organic molecules and proteins, and primarily relies on data from quantum mechanical calculations. It is often used in molecular dynamics simulations.
- MMFF94s: a "static" variant of MMFF94 (the "s" stands for "static"), with adjusted parameters for out-of-plane bending and dihedral torsions to favor planar geometries for specific nitrogen atoms. This makes it better suited for geometry optimization studies where a static, time-averaged structure is desired.
- UFF: the Universal Force Field, a force field model used for molecular mechanics calculations. It is a tool for geometry optimization, energy minimization, and exploring molecular conformations in 3D space. UFF is often used to refine conformers generated by other methods, such as random conformer generation, to produce more physically plausible and stable structures.

Raises:

KeyError –

if E_tot(eV) or E_tot(kcal/mol) is not defined.

Returns:

Self ( Self ) –

modified self.

`to_plot_data_torsion_angle_vs_energy(idx=None)` ¶

Returns plot data for torsion angle vs energy.

For seaborn plot,

```py
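# This snippet appears to run inside a Mol method: `self` is the Mol, and
# `torsion_angle_idx`, `upper_limit`, `zoomin_limit`, `svg` and `kwargs`
# are expected to be supplied by the caller.
from io import BytesIO, StringIO

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib import ticker
from PIL import Image
from scour.scour import scourString

data = self.props['torsion'][torsion_angle_idx]
df = pd.DataFrame({ax: data[ax] for ax in ['angle', 'E_rel(kcal/mol)']})

plt.figure(**kwargs)
plt.clf()  # Clear the current figure to prevent overlapping plots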

sns.set_theme()
sns.color_palette("tab10")
sns.set_style("whitegrid")

if len(df['angle']) == len(df['angle'].drop_duplicates()):
    g = sns.lineplot(x="angle",
                    y="E_rel(kcal/mol)",
                    data=df,
                    marker='o',
                    markersize=10)
else:
    g = sns.lineplot(x="angle",
                    y="E_rel(kcal/mol)",
                    data=df,
                    errorbar=('ci', 95),
                    err_style='bars',
                    marker='o',
                    markersize=10)
g.xaxis.set_major_locator(ticker.MultipleLocator(30))
g.xaxis.set_major_formatter(ticker.ScalarFormatter())
if df["E_rel(kcal/mol)"].max() > upper_limit:
    g.set(title=self.name,
        xlabel='Dihedral Angle (degree)',
        ylabel='Relative Energy (Kcal/mol)',
        xlim=(-190, 190),
        ylim=(-1.5, upper_limit))
elif df["E_rel(kcal/mol)"].max() < zoomin_limit:
    g.set(title=self.name,
        xlabel='Dihedral Angle (degree)',
        ylabel='Relative Energy (Kcal/mol)',
        xlim=(-190, 190),
        ylim=(-1.5, zoomin_limit))
else:
    g.set(title=self.name,
        xlabel='Dihedral Angle (degree)',
        ylabel='Relative Energy (Kcal/mol)',
        xlim=(-190, 190),)
g.tick_params(axis='x', rotation=30)

if svg:
    buf = StringIO()
    plt.savefig(buf, format='svg', bbox_inches='tight')
    plt.close() # prevents duplicate plot outputs in Jupyter Notebook
    svg_string = buf.getvalue()
    # optimize SVG string
    scour_options = {
        'strip_comments': True,
        'strip_ids': True,
        'shorten_ids': True,
        'compact_paths': True,
        'indent_type': 'none',
    }
    svg_string = scourString(svg_string, options=scour_options)

    return svg_string

else:
    buf = BytesIO()
    plt.savefig(buf, format='png', bbox_inches='tight')
    plt.close() # prevents duplicate plot outputs in Jupyter Notebook
    buf.seek(0)
    img = Image.open(buf)
    plt.imshow(img)
    plt.axis('off') # Optional: remove axes
    plt.show()
```

Parameters:

idx (int | None, default: None ) –

0-based torsion angle index. Defaults to None (all torsion angles).

Returns:

dict –

{'idx': list, 'angle': list, 'E_rel(kcal/mol)': list}

`to_png(width=300, height=300, legend='', atom_index=False, highlight_atoms=None, highlight_bonds=None, redraw=False, coordgen=False, trim=True)` ¶

Draw 2D molecule in PNG format.

Parameters:

width (int, default: 300 ) –

width. Defaults to 300.
height (int, default: 300 ) –

height. Defaults to 300.
legend (str, default: '' ) –

legend. Defaults to ''.
atom_index (bool, default: False ) –

whether to show atom index. Defaults to False.
highlight_atoms (list[int] | None, default: None ) –

atom(s) to highlight. Defaults to None.
highlight_bonds (list[int] | None, default: None ) –

bond(s) to highlight. Defaults to None.
redraw (bool, default: False ) –

whether to redraw. Defaults to False.
coordgen (bool, default: False ) –

whether to use coordgen. Defaults to False.
trim (bool, default: True ) –

whether to trim white margins. Defaults to True.

Returns:

Image –

Image.Image: output PIL Image object.

`to_sdf(confs=False, props=True)` ¶

Returns strings of SDF output.

Parameters:

confs (bool, default: False ) –

whether to include conformers. Defaults to False.
props (bool, default: True ) –

whether to include properties. Defaults to True.

Returns:

str ( str ) –

strings of SDF output.

`to_svg(width=300, height=300, legend='', atom_index=False, highlight_atoms=None, highlight_bonds=None, redraw=False, coordgen=False, optimize=True)` ¶

Draw 2D molecule in SVG format.

Examples:

For Jupyter Notebook, wrap the output with SVG:

>>> from IPython.display import SVG
>>> SVG(libr[0].to_svg())

Parameters:

width (int, default: 300 ) –

width. Defaults to 300.
height (int, default: 300 ) –

height. Defaults to 300.
legend (str, default: '' ) –

legend. Defaults to ''.
atom_index (bool, default: False ) –

whether to show atom index. Defaults to False.
highlight_atoms (list[int] | None, default: None ) –

atom(s) to highlight. Defaults to None.
highlight_bonds (list[int] | None, default: None ) –

bond(s) to highlight. Defaults to None.
redraw (bool, default: False ) –

whether to redraw. Defaults to False.
coordgen (bool, default: False ) –

whether to use coordgen. Defaults to False.
optimize (bool, default: True ) –

whether to optimize SVG string. Defaults to True.

Returns:

str ( str ) –

SVG string
