Microstate

API¶

`rdworks.microstate` ¶

Attributes¶

`AcidBasePatterns = pd.read_csv(smarts_path / 'smarts_pattern.csv')` `module-attribute` ¶

`AcidBasePatternsSimple = pd.read_csv(smarts_path / 'simple_smarts_pattern.csv')` `module-attribute` ¶

`UnreasonablePatterns = list(map(Chem.MolFromSmarts, ['[#6X5]', '[#7X5]', '[#8X4]', '[r]=[r]=[r]', '[#1]-[+1]~[-1]', '[#1]-[+1]=,:[]-,:[-1]', '[#1]-[+1]-,:[]=,:[-1]', '[+2]', '[-2]', '[#1]-[#8+1].[#8-1,#7-1,#6-1]', '[#1]-[#7+1,#8+1].[#7-1,#6-1]', '[#1]-[#8+1].[#8-1,#6-1]', '[#1]-[#7+1].[#8-1]-[C](-[C,#1])(-[C,#1])', '[OX1]=[C]-[OH2+1]', '[NX1,NX2H1,NX3H2]=[C]-[O]-[H]', '[#6-1]=[]-[*]', '[cX2-1]', '[N+1](=O)-[O]-[H]']))` `module-attribute` ¶

`ln10 = math.log(10)` `module-attribute` ¶

`logger = logging.getLogger(name)` `module-attribute` ¶

`smarts_path = importlib.resources.files('rdworks.data.ionized')` `module-attribute` ¶

Classes¶

`InflectionDetector` ¶

Inflection point detection with plateau analysis. Distinguishes real inflections from noise by analyzing curve behavior.

Attributes¶

`curve_info = 'continuous_no_plateaus'` `instance-attribute` ¶

`num_plateaus = len(self.plateaus)` `instance-attribute` ¶

`num_transitions = 0` `instance-attribute` ¶

`plateaus = self.detect_plateaus()` `instance-attribute` ¶

`x = self.x[sort_idx]` `instance-attribute` ¶

`y = self.y[sort_idx]` `instance-attribute` ¶

Functions¶

`analyze(require_plateaus=True, high_confidence=True)` ¶

Complete analysis with plateau consideration.

Parameters:

require_plateaus (bool, default: True ) –

Only report inflections between clear plateaus

Returns:

dict ( list ) –

analysis results

`detect_plateaus(derivative_threshold=None, min_plateau_length=5)` ¶

Detect plateau regions in the curve.

Parameters:¶

derivative_threshold : float or None Max abs(dy/dx) to consider as plateau. If None, auto-calculated min_plateau_length : int Minimum number of consecutive points to consider a plateau

Returns:¶

plateaus : list of dicts with 'start_idx', 'end_idx', 'mean_y', 'std_y'

`IonizableSite` `dataclass` ¶

(de)protonation site information

Attributes¶

`acid_base` `instance-attribute` ¶

`atom` `instance-attribute` ¶

`atom_idx` `instance-attribute` ¶

`de` `instance-attribute` ¶

`hs` `instance-attribute` ¶

`name` `instance-attribute` ¶

`pr` `instance-attribute` ¶

`q` `instance-attribute` ¶

`QupkakeMicrostates` ¶

Attributes¶

`acidic_sites = []` `instance-attribute` ¶

`basic_sites = []` `instance-attribute` ¶

`calculator = calculator` `instance-attribute` ¶

`mols = []` `instance-attribute` ¶

`origin = origin` `instance-attribute` ¶

`reference = None` `instance-attribute` ¶

`states = []` `instance-attribute` ¶

Functions¶

`Boltzmann_weighted_average(potential_energies)` `staticmethod` ¶

Calculate Boltzmann weighted average potential energy at pH 0.

Parameters:

potential_energies (list) –

a list of potential energies.

Returns:

float ( float ) –

Boltzmann weighted average potential energy.

`count()` ¶

`enumerate()` ¶

`get_ensemble()` ¶

`get_mol(idx)` ¶

`get_populations(pH)` ¶

`populate()` ¶

`State` ¶

Attributes¶

`charge = None` `instance-attribute` ¶

`energy = None` `instance-attribute` ¶

`max_atomic_charge = max_atomic_charge` `instance-attribute` ¶

`max_formal_charge = max_formal_charge` `instance-attribute` ¶

`min_atomic_charge = min_atomic_charge` `instance-attribute` ¶

`min_formal_charge = min_formal_charge` `instance-attribute` ¶

`origin = origin` `instance-attribute` ¶

`protomer_rule = protomer_rule` `instance-attribute` ¶

`rdmol = None` `instance-attribute` ¶

`rdmolH = None` `instance-attribute` ¶

`ref_ph = None` `instance-attribute` ¶

`sites = []` `instance-attribute` ¶

`smiles = smiles` `instance-attribute` ¶

`tautomer_rule = tautomer_rule` `instance-attribute` ¶

`transformation = transformation` `instance-attribute` ¶

Functions¶

`can_be_deprotonated_at(atom_idx)` ¶

Check if an atom can potentially be deprotonated

`can_be_protonated_at(atom_idx)` ¶

Check if an atom can potentially be protonated

`copy()` ¶

`deserialize(encoded_str)` ¶

Deserialize the state from a string.

`find_ionizable_sites()` ¶

`get_deprotonated(atom_idx=None, site_idx=None)` ¶

Make deprotonated state(s) from the current state.

Parameters:

atom_idx (int | None, default: None ) –

atom index. Defaults to None.
site_idx (int | None, default: None ) –

site index. Defaults to None.

Returns:

list[Self] –

list[Self]: list of deprotonated States.

`get_protonated(atom_idx=None, site_idx=None)` ¶

Make protonated state(s) from the current state.

All ionizable sites are considered for protonation unless atom_idx or site_idx is given.

Parameters:

atom_idx (int | None, default: None ) –

atom index. Defaults to None.
site_idx (int | None, default: None ) –

site index. Defaults to None.

Returns:

list[Self] –

list[Self]: list of protonated States.

`get_tautomers()` ¶

`hydrogen_count(idx)` ¶

`info(index=None)` ¶

`ionize(idx, mode)` ¶

`serialize()` ¶

Serialize the state to a string.

`site_info()` ¶

`update()` ¶

`StateEnsemble` ¶

Attributes¶

`states = []` `instance-attribute` ¶

Functions¶

`copy()` ¶

Copy.

`deserialize(encoded_str)` ¶

Deserialize states from a string.

`drop()` ¶

Drop duplicate and unreasonable states.

`get_charge_groups()` ¶

Get charge groups

Returns:

dict ( dict[int, list[int]] ) –

{charge: [state_idx, ...], ...}

`get_charge_groups_per_site()` ¶

Get charge groups per atom site

Returns:

dict ( dict[int, dict[int, list[int]]] ) –

{atom_idx: {atom_charge: [state_idx, ...], ...}}

`get_macro_pKa(ph_values, p)` ¶

Compute inflection points in pH vs charge that could be macroscopic pKa values. For example, ideal pKa transitions should happen at +2.5, +1.5, +0.5, -0.5 in case of +3 to -1 charges.

Parameters:

ph_values (ndarray) –

Array of pH values to evaluate populations
p (ndarray) –

Population matrix for each state at each pH

Returns:

list[tuple[float, float]] –

list[tuple[float,float]] : [(pH, charge), ...]

`get_micro_pKa(ph_values, p)` ¶

Compute inflection points in pH vs site_charge that could be microscopic pKa values For example, ideal pKa transitions should happen at +1.5, +0.5, -0.5 in case of +2 to -1 charges.

Parameters:

ph_values (ndarray) –

Array of pH values to evaluate populations
p (ndarray) –

Population matrix for each state at each pH

Returns:

dict[int, list[tuple[float, float]]] –

dict[int, tuple[float,float]] : {atom_index: [(pH, charge), ..], ...}

`get_plot_data_pH_vs_charge(ph_values, p)` ¶

`get_plot_data_pH_vs_population(ph_values, p, threshold=0.0)` ¶

Get an Altair plot object for pH-dependent population curve.

For Altair chart:

```py
# plot pH vs population

import pandas as pd
import altair as alt

width = 600
height = 300
palette = 'tableau10'

df = pd.DataFrame(data)

# line plot
lineplot = alt.Chart(df).mark_line().encode(
    x=alt.X('pH:Q', title='pH'),
    y=alt.Y('p:Q', title='Population'),
    color=alt.Color('microstate:N', scale=alt.Scale(scheme=palette)),
).properties(
    width=width,
    height=height)

# data labels
labels = alt.Chart(df).mark_text(
    align='left',
    dx=5,
    dy=-5
).encode(
    x=alt.X('pH', aggregate={'argmax': 'p'}),
    y=alt.Y('p', aggregate={'argmax': 'p'}),
    text='microstate:N',
    color='microstate:N'
)

chart = (lineplot + labels)
```

Args: pH (np.ndarray): array of pH values. C (float): constant for pH-dependent dG calculation. Defaults to ln(10). beta (float, optional): \( \beta = \frac{1}{k_{B} T} \). Defaults to 1.0. threshold (float) : threshold to ignore low populated microstates. Defaults to 0.0.

Returns:

dict, microstate indices –

dict: {'pH':list[float], 'p':list[float], 'microstate':list[int(1-based)]} microstate indices: list of state index (0-based)

`get_plot_data_pH_vs_site_charge(ph_values, p)` ¶

`get_population(ph_values, C=ln10, beta=1.0)` ¶

Get populations at a given pH array.

\[ \begin{align} \Delta G_{i, ref} &= PE_{i} - PE_{ref} \\[0.5em] \Delta m_{i, ref} &= charge_{i} - charge _{ref} \\[0.5em] \Delta G_{i, pH} &= \Delta G_{i, ref} + \Delta m_{i, ref} C pH \\[0.5em] p_{i, pH} &= \frac {exp(-\beta \Delta G_{i, pH})}{\sum_{i} exp(-\beta \Delta G_{i, pH})} \end{align} \]

Parameters:

ph_values (ndarray) –

array of pH values.
C (float, default: ln10 ) –

constant for pH-dependent dG calculation. Defaults to ln(10).
beta (float, default: 1.0 ) –

\( \beta = \frac{1}{k_{B} T} \). Defaults to 1.0.

Returns:

ndarray –

np.ndarray: array of populations with shape of (number of states, number of pH).

`get_state(index)` ¶

Get a state by index

Parameters:

index (int) –

state index.

Returns:

State ( State ) –

State

`info()` ¶

Print information of all states.

`serialize()` ¶

Serialize states to a string.

`set_energies(dG, ref_ph=7.0)` ¶

Set energies to states.

Parameters:

dG (list[float] | ndarray) –

list or array of energies.
ref_ph (float, default: 7.0 ) –

pH at which the energies are calculated. Defaults to 7.0

Returns:

Self ( Self ) –

StateEnsemble

`size()` ¶

Number of states.

`sort(p)` ¶

Sort states by population.

Parameters:

p (ndarray) –

array of population.

Returns:

Self ( Self ) –

StateEnsemble

`trim(p, threshold=0.0)` ¶

Trim states whose pH-dependent population is below a given threshold across pH range 0-14.

Parameters:

p (ndarray) –

array of populations.
threshold (float, default: 0.0 ) –

min population. Defaults to 0.0.

Returns:

Self ( Self ) –

StateEnsemble

`StateNetwork` ¶

Attributes¶

`graph = nx.DiGraph()` `instance-attribute` ¶

`initial_state = None` `instance-attribute` ¶

`visited_states = []` `instance-attribute` ¶

Functions¶

`build(smiles, origin=None, transformation=None, min_formal_charge=-2, max_formal_charge=+2, min_atomic_charge=-1, max_atomic_charge=+1, protomer_rule='default', tautomer_rule=None, verbose=False)` ¶

Build the microstate network using BFS from initial state.

`copy()` ¶

Copy.

`deserialize(encoded_str)` ¶

Deserialize the network from a string.

`get_initial_state()` ¶

Get the initial state.

`get_macro_pKa(ph_values, p)` ¶

Calculatate macro-pKa with provided potential energies.

Args:

Returns:

list[tuple[float, float]] –

list[tuple[float,float]] : [(pH, charge), ...]

`get_micro_pKa(ph_values, p)` ¶

Calculate micro-pKa with provided potential energies.

Args:

Returns:

dict[int, list[tuple[float, float]]] –

dict[int,list[float]]: micro-pKa values for each ionizable site.

`get_num_edges()` ¶

Number of edges in the network.

`get_num_nodes()` ¶

Number of nodes in the network.

`get_plot_data_pH_vs_charge(ph_values, p)` ¶

`get_plot_data_pH_vs_population(ph_values, p, threshold=0.0)` ¶

Make an Altair plot object for pH-dependent population curve.

    Args:
        pH (np.ndarray): array of pH values.
        C (float): constant for pH-dependent dG calculation. Defaults to ln(10).

. beta (float, optional): \( \beta = \frac{1}{k_{B} T} \). Defaults to 1.0. threshold (float) : threshold to ignore low populated microstates. Defaults to 0.0.

    Returns:
        (dict, indices):
            dict: {'pH':list[float], 'p':list[float], 'microstate':list[int(1-based)]}
            indices: list of state index (0-based)

`get_plot_data_pH_vs_site_charge(ph_values, p)` ¶

`get_population(ph_values, C=ln10, beta=1.0)` ¶

Calculate populations with provided potential energies and pH.

The reference state can be arbitrary and is set to the initial_state here.

\[ \begin{align} \Delta G_{i, ref} &= PE_{i} - PE_{ref} \\[0.5em] \Delta m_{i, ref} &= charge_{i} - charge _{ref} \\[0.5em] \Delta G_{i, pH} &= \Delta G_{i, ref} + \Delta m_{i, ref} C pH \\[0.5em] p_{i, pH} &= \frac {exp(-\beta \Delta G_{i, pH})}{\sum_{i} exp(-\beta \Delta G_{i, pH})} \end{align} \]

Parameters:

pH (ndarray) –

array of pH values.
C (float, default: ln10 ) –

constant for pH-dependent dG calculation. Defaults to ln(10).
beta (float, default: 1.0 ) –

\( \beta = \frac{1}{k_{B} T} \). Defaults to 1.0.

Returns:

ndarray –

np.ndarray: array of populations with shape of (number of states, number of pH).

`get_state_ensemble(index=None)` ¶

Get states by index or slice or all states.

`info()` ¶

Print information of the network.

`serialize()` ¶

Serialize the network to a string.

`set_energies(dG, ref_ph=7.0)` ¶

Set energies to states.

Parameters:

energies (list[float] | ndarray) –

list or array of energies.

Returns:

Self ( Self ) –

self

`size()` ¶

Number of unique states in the network.

`trim(p, threshold=0.0)` ¶

Remove states whose pH-dependent population is not above a given threshold at pH values.

Parameters:

p (ndarray) –

array of populations.
threshold (float, default: 0.0 ) –

min population. Defaults to 0.0.

Returns:

Self ( Self ) –

StateNetwork

Functions¶

`Boltzmann_weighted_average(energies, beta=1.0)` ¶

Returns the Boltzmann weighted average of energies. [ E_{avg} = \frac{\sum_{i} E_{i} exp(-\beta E_{i})}{\sum_{i} exp(-\beta E_{i})} ]

Parameters:

energies (list[float] | np.ndarray) –

energies
beta (float) –

\( \beta = \frac{1}{k_{B}T} \) (Kcal/mol)

Returns:

float –

float

`Boltzmann_weights(energies, beta=1.0)` ¶

Returns the Boltzmann weights of energies.

The Boltzmann weight, \( p_{i} \) is defined as:

\[ p_{i} = \frac{exp(- \beta E_{i}) }{ \sum_{i} exp(- \beta E_{i})} \]

Since the Boltzmann weighted average of any property is taken at a specific temperature, changing the temperature means changing the value of \( \beta \).

Parameters:

energies (list[float] | np.ndarray) –

energies
beta (float) –

\( \beta = \frac{1}{ k_{B} T } \) (Kcal/mol)

Returns:s np.ndarray

`beta_constant(T=278.15)` ¶

Returns the beta constant in Kcal/mol unit at a given temperature (Kelvin).

The constant \( \beta \) is defined as:

\[ \beta = \frac{1}{ k_{B} T } \]

where \( k_{B} \) is the Boltzmann constant and T is the absolute temperature of the system in Kelvin.

\( k_{B} \) = 1.987204259e-3 Kcal/(mol K)

For example, \( \beta \) = 0.5527408646408499 Kcal/(mol K) at 278.15 K

Parameters:

T (float) –

temperature in Kelvin unit.

Microstate

API¶

rdworks.microstate ¶

Attributes¶

AcidBasePatterns = pd.read_csv(smarts_path / 'smarts_pattern.csv') module-attribute ¶

AcidBasePatternsSimple = pd.read_csv(smarts_path / 'simple_smarts_pattern.csv') module-attribute ¶

ln10 = math.log(10) module-attribute ¶

logger = logging.getLogger(__name__) module-attribute ¶

smarts_path = importlib.resources.files('rdworks.data.ionized') module-attribute ¶

Classes¶

InflectionDetector ¶

Attributes¶

curve_info = 'continuous_no_plateaus' instance-attribute ¶

num_plateaus = len(self.plateaus) instance-attribute ¶

num_transitions = 0 instance-attribute ¶

plateaus = self.detect_plateaus() instance-attribute ¶

x = self.x[sort_idx] instance-attribute ¶

y = self.y[sort_idx] instance-attribute ¶

Functions¶

analyze(require_plateaus=True, high_confidence=True) ¶

detect_plateaus(derivative_threshold=None, min_plateau_length=5) ¶

Parameters:¶

Returns:¶

IonizableSite dataclass ¶

Attributes¶

acid_base instance-attribute ¶

atom instance-attribute ¶

atom_idx instance-attribute ¶

de instance-attribute ¶

hs instance-attribute ¶

name instance-attribute ¶

pr instance-attribute ¶

q instance-attribute ¶

QupkakeMicrostates ¶

Attributes¶

acidic_sites = [] instance-attribute ¶

basic_sites = [] instance-attribute ¶

calculator = calculator instance-attribute ¶

mols = [] instance-attribute ¶

origin = origin instance-attribute ¶

reference = None instance-attribute ¶

states = [] instance-attribute ¶

Functions¶

Boltzmann_weighted_average(potential_energies) staticmethod ¶

count() ¶

enumerate() ¶

get_ensemble() ¶

get_mol(idx) ¶

get_populations(pH) ¶

populate() ¶

State ¶

Attributes¶

charge = None instance-attribute ¶

energy = None instance-attribute ¶

max_atomic_charge = max_atomic_charge instance-attribute ¶

max_formal_charge = max_formal_charge instance-attribute ¶

min_atomic_charge = min_atomic_charge instance-attribute ¶

min_formal_charge = min_formal_charge instance-attribute ¶

origin = origin instance-attribute ¶

protomer_rule = protomer_rule instance-attribute ¶

rdmol = None instance-attribute ¶

rdmolH = None instance-attribute ¶

ref_ph = None instance-attribute ¶

sites = [] instance-attribute ¶

smiles = smiles instance-attribute ¶

tautomer_rule = tautomer_rule instance-attribute ¶

transformation = transformation instance-attribute ¶

Functions¶

can_be_deprotonated_at(atom_idx) ¶

can_be_protonated_at(atom_idx) ¶

copy() ¶

deserialize(encoded_str) ¶

find_ionizable_sites() ¶

get_deprotonated(atom_idx=None, site_idx=None) ¶

get_protonated(atom_idx=None, site_idx=None) ¶

get_tautomers() ¶

hydrogen_count(idx) ¶

info(index=None) ¶

ionize(idx, mode) ¶

serialize() ¶

`rdworks.microstate` ¶

`AcidBasePatterns = pd.read_csv(smarts_path / 'smarts_pattern.csv')` `module-attribute` ¶

`AcidBasePatternsSimple = pd.read_csv(smarts_path / 'simple_smarts_pattern.csv')` `module-attribute` ¶

`ln10 = math.log(10)` `module-attribute` ¶

`logger = logging.getLogger(name)` `module-attribute` ¶

`smarts_path = importlib.resources.files('rdworks.data.ionized')` `module-attribute` ¶

`InflectionDetector` ¶

`curve_info = 'continuous_no_plateaus'` `instance-attribute` ¶

`num_plateaus = len(self.plateaus)` `instance-attribute` ¶

`num_transitions = 0` `instance-attribute` ¶

`plateaus = self.detect_plateaus()` `instance-attribute` ¶

`x = self.x[sort_idx]` `instance-attribute` ¶

`y = self.y[sort_idx]` `instance-attribute` ¶

`analyze(require_plateaus=True, high_confidence=True)` ¶

`detect_plateaus(derivative_threshold=None, min_plateau_length=5)` ¶

`IonizableSite` `dataclass` ¶

`acid_base` `instance-attribute` ¶

`atom` `instance-attribute` ¶

`atom_idx` `instance-attribute` ¶

`de` `instance-attribute` ¶

`hs` `instance-attribute` ¶

`name` `instance-attribute` ¶

`pr` `instance-attribute` ¶

`q` `instance-attribute` ¶

`QupkakeMicrostates` ¶

`acidic_sites = []` `instance-attribute` ¶

`basic_sites = []` `instance-attribute` ¶

`calculator = calculator` `instance-attribute` ¶

`mols = []` `instance-attribute` ¶

`origin = origin` `instance-attribute` ¶

`reference = None` `instance-attribute` ¶

`states = []` `instance-attribute` ¶

`Boltzmann_weighted_average(potential_energies)` `staticmethod` ¶

`count()` ¶

`enumerate()` ¶

`get_ensemble()` ¶

`get_mol(idx)` ¶

`get_populations(pH)` ¶

`populate()` ¶

`State` ¶

`charge = None` `instance-attribute` ¶

`energy = None` `instance-attribute` ¶

`max_atomic_charge = max_atomic_charge` `instance-attribute` ¶

`max_formal_charge = max_formal_charge` `instance-attribute` ¶

`min_atomic_charge = min_atomic_charge` `instance-attribute` ¶

`min_formal_charge = min_formal_charge` `instance-attribute` ¶

`origin = origin` `instance-attribute` ¶

`protomer_rule = protomer_rule` `instance-attribute` ¶

`rdmol = None` `instance-attribute` ¶

`rdmolH = None` `instance-attribute` ¶

`ref_ph = None` `instance-attribute` ¶

`sites = []` `instance-attribute` ¶

`smiles = smiles` `instance-attribute` ¶

`tautomer_rule = tautomer_rule` `instance-attribute` ¶

`transformation = transformation` `instance-attribute` ¶

`can_be_deprotonated_at(atom_idx)` ¶

`can_be_protonated_at(atom_idx)` ¶

`copy()` ¶

`deserialize(encoded_str)` ¶

`find_ionizable_sites()` ¶

`get_deprotonated(atom_idx=None, site_idx=None)` ¶

`get_protonated(atom_idx=None, site_idx=None)` ¶

`get_tautomers()` ¶

`hydrogen_count(idx)` ¶

`info(index=None)` ¶

`ionize(idx, mode)` ¶

`serialize()` ¶

`site_info()` ¶

`update()` ¶

`StateEnsemble` ¶

`states = []` `instance-attribute` ¶

`copy()` ¶

`deserialize(encoded_str)` ¶

`drop()` ¶

`get_charge_groups()` ¶

`get_charge_groups_per_site()` ¶

`get_macro_pKa(ph_values, p)` ¶

`get_micro_pKa(ph_values, p)` ¶

`get_plot_data_pH_vs_charge(ph_values, p)` ¶

`get_plot_data_pH_vs_population(ph_values, p, threshold=0.0)` ¶