Skip to content

Microstate

API

rdworks.microstate

Attributes

AcidBasePatterns = pd.read_csv(smarts_path / 'smarts_pattern.csv') module-attribute

AcidBasePatternsSimple = pd.read_csv(smarts_path / 'simple_smarts_pattern.csv') module-attribute

UnreasonablePatterns = list(map(Chem.MolFromSmarts, ['[#6X5]', '[#7X5]', '[#8X4]', '[*r]=[*r]=[*r]', '[#1]-[*+1]~[*-1]', '[#1]-[*+1]=,:[*]-,:[*-1]', '[#1]-[*+1]-,:[*]=,:[*-1]', '[*+2]', '[*-2]', '[#1]-[#8+1].[#8-1,#7-1,#6-1]', '[#1]-[#7+1,#8+1].[#7-1,#6-1]', '[#1]-[#8+1].[#8-1,#6-1]', '[#1]-[#7+1].[#8-1]-[C](-[C,#1])(-[C,#1])', '[OX1]=[C]-[OH2+1]', '[NX1,NX2H1,NX3H2]=[C]-[O]-[H]', '[#6-1]=[*]-[*]', '[cX2-1]', '[N+1](=O)-[O]-[H]'])) module-attribute

ln10 = math.log(10) module-attribute

logger = logging.getLogger(__name__) module-attribute

smarts_path = importlib.resources.files('rdworks.data.ionized') module-attribute

Classes

InflectionDetector

Inflection point detection with plateau analysis. Distinguishes real inflections from noise by analyzing curve behavior.

Attributes
curve_info = 'continuous_no_plateaus' instance-attribute
num_plateaus = len(self.plateaus) instance-attribute
num_transitions = 0 instance-attribute
plateaus = self.detect_plateaus() instance-attribute
x = self.x[sort_idx] instance-attribute
y = self.y[sort_idx] instance-attribute
Functions
analyze(require_plateaus=True, high_confidence=True)

Complete analysis with plateau consideration.

Parameters:

  • require_plateaus (bool, default: True ) –

    Only report inflections between clear plateaus

Returns:

  • dict ( list ) –

    analysis results

detect_plateaus(derivative_threshold=None, min_plateau_length=5)

Detect plateau regions in the curve.

Parameters:

derivative_threshold : float or None Max abs(dy/dx) to consider as plateau. If None, auto-calculated min_plateau_length : int Minimum number of consecutive points to consider a plateau

Returns:

plateaus : list of dicts with 'start_idx', 'end_idx', 'mean_y', 'std_y'

IonizableSite dataclass

(de)protonation site information

Attributes
acid_base instance-attribute
atom instance-attribute
atom_idx instance-attribute
de instance-attribute
hs instance-attribute
name instance-attribute
pr instance-attribute
q instance-attribute

QupkakeMicrostates

Attributes
acidic_sites = [] instance-attribute
basic_sites = [] instance-attribute
calculator = calculator instance-attribute
mols = [] instance-attribute
origin = origin instance-attribute
reference = None instance-attribute
states = [] instance-attribute
Functions
Boltzmann_weighted_average(potential_energies) staticmethod

Calculate Boltzmann weighted average potential energy at pH 0.

Parameters:

  • potential_energies (list) –

    a list of potential energies.

Returns:

  • float ( float ) –

    Boltzmann weighted average potential energy.

count()
enumerate()
get_ensemble()
get_mol(idx)
get_populations(pH)
populate()

State

Attributes
charge = None instance-attribute
energy = None instance-attribute
max_atomic_charge = max_atomic_charge instance-attribute
max_formal_charge = max_formal_charge instance-attribute
min_atomic_charge = min_atomic_charge instance-attribute
min_formal_charge = min_formal_charge instance-attribute
origin = origin instance-attribute
protomer_rule = protomer_rule instance-attribute
rdmol = None instance-attribute
rdmolH = None instance-attribute
ref_ph = None instance-attribute
sites = [] instance-attribute
smiles = smiles instance-attribute
tautomer_rule = tautomer_rule instance-attribute
transformation = transformation instance-attribute
Functions
can_be_deprotonated_at(atom_idx)

Check if an atom can potentially be deprotonated

can_be_protonated_at(atom_idx)

Check if an atom can potentially be protonated

copy()
deserialize(encoded_str)

Deserialize the state from a string.

find_ionizable_sites()
get_deprotonated(atom_idx=None, site_idx=None)

Make deprotonated state(s) from the current state.

Parameters:

  • atom_idx (int | None, default: None ) –

    atom index. Defaults to None.

  • site_idx (int | None, default: None ) –

    site index. Defaults to None.

Returns:

  • list[Self]

    list[Self]: list of deprotonated States.

get_protonated(atom_idx=None, site_idx=None)

Make protonated state(s) from the current state.

All ionizable sites are considered for protonation unless atom_idx or site_idx is given.

Parameters:

  • atom_idx (int | None, default: None ) –

    atom index. Defaults to None.

  • site_idx (int | None, default: None ) –

    site index. Defaults to None.

Returns:

  • list[Self]

    list[Self]: list of protonated States.

get_tautomers()
hydrogen_count(idx)
info(index=None)
ionize(idx, mode)
serialize()

Serialize the state to a string.

site_info()
update()

StateEnsemble

Attributes
states = [] instance-attribute
Functions
copy()

Copy.

deserialize(encoded_str)

Deserialize states from a string.

drop()

Drop duplicate and unreasonable states.

get_charge_groups()

Get charge groups

Returns:

  • dict ( dict[int, list[int]] ) –

    {charge: [state_idx, ...], ...}

get_charge_groups_per_site()

Get charge groups per atom site

Returns:

  • dict ( dict[int, dict[int, list[int]]] ) –

    {atom_idx: {atom_charge: [state_idx, ...], ...}}

get_macro_pKa(ph_values, p)

Compute inflection points in pH vs charge that could be macroscopic pKa values. For example, ideal pKa transitions should happen at +2.5, +1.5, +0.5, -0.5 in case of +3 to -1 charges.

Parameters:

  • ph_values (ndarray) –

    Array of pH values to evaluate populations

  • p (ndarray) –

    Population matrix for each state at each pH

Returns:

  • list[tuple[float, float]]

    list[tuple[float,float]] : [(pH, charge), ...]

get_micro_pKa(ph_values, p)

Compute inflection points in pH vs site_charge that could be microscopic pKa values For example, ideal pKa transitions should happen at +1.5, +0.5, -0.5 in case of +2 to -1 charges.

Parameters:

  • ph_values (ndarray) –

    Array of pH values to evaluate populations

  • p (ndarray) –

    Population matrix for each state at each pH

Returns:

  • dict[int, list[tuple[float, float]]]

    dict[int, tuple[float,float]] : {atom_index: [(pH, charge), ..], ...}

get_plot_data_pH_vs_charge(ph_values, p)
get_plot_data_pH_vs_population(ph_values, p, threshold=0.0)

Get an Altair plot object for pH-dependent population curve.

For Altair chart:

```py
# plot pH vs population

import pandas as pd
import altair as alt

width = 600
height = 300
palette = 'tableau10'

df = pd.DataFrame(data)

# line plot
lineplot = alt.Chart(df).mark_line().encode(
    x=alt.X('pH:Q', title='pH'),
    y=alt.Y('p:Q', title='Population'),
    color=alt.Color('microstate:N', scale=alt.Scale(scheme=palette)),
).properties(
    width=width,
    height=height)

# data labels
labels = alt.Chart(df).mark_text(
    align='left',
    dx=5,
    dy=-5
).encode(
    x=alt.X('pH', aggregate={'argmax': 'p'}),
    y=alt.Y('p', aggregate={'argmax': 'p'}),
    text='microstate:N',
    color='microstate:N'
)

chart = (lineplot + labels)
```

Args: pH (np.ndarray): array of pH values. C (float): constant for pH-dependent dG calculation. Defaults to ln(10). beta (float, optional): \( \beta = \frac{1}{k_{B} T} \). Defaults to 1.0. threshold (float) : threshold to ignore low populated microstates. Defaults to 0.0.

Returns:

  • dict, microstate indices

    dict: {'pH':list[float], 'p':list[float], 'microstate':list[int(1-based)]} microstate indices: list of state index (0-based)

get_plot_data_pH_vs_site_charge(ph_values, p)
get_population(ph_values, C=ln10, beta=1.0)

Get populations at a given pH array.

\[ \begin{align} \Delta G_{i, ref} &= PE_{i} - PE_{ref} \\[0.5em] \Delta m_{i, ref} &= charge_{i} - charge _{ref} \\[0.5em] \Delta G_{i, pH} &= \Delta G_{i, ref} + \Delta m_{i, ref} C pH \\[0.5em] p_{i, pH} &= \frac {exp(-\beta \Delta G_{i, pH})}{\sum_{i} exp(-\beta \Delta G_{i, pH})} \end{align} \]

Parameters:

  • ph_values (ndarray) –

    array of pH values.

  • C (float, default: ln10 ) –

    constant for pH-dependent dG calculation. Defaults to ln(10).

  • beta (float, default: 1.0 ) –

    \( \beta = \frac{1}{k_{B} T} \). Defaults to 1.0.

Returns:

  • ndarray

    np.ndarray: array of populations with shape of (number of states, number of pH).

get_state(index)

Get a state by index

Parameters:

  • index (int) –

    state index.

Returns:

  • State ( State ) –

    State

info()

Print information of all states.

serialize()

Serialize states to a string.

set_energies(dG, ref_ph=7.0)

Set energies to states.

Parameters:

  • dG (list[float] | ndarray) –

    list or array of energies.

  • ref_ph (float, default: 7.0 ) –

    pH at which the energies are calculated. Defaults to 7.0

Returns:

  • Self ( Self ) –

    StateEnsemble

size()

Number of states.

sort(p)

Sort states by population.

Parameters:

  • p (ndarray) –

    array of population.

Returns:

  • Self ( Self ) –

    StateEnsemble

trim(p, threshold=0.0)

Trim states whose pH-dependent population is below a given threshold across pH range 0-14.

Parameters:

  • p (ndarray) –

    array of populations.

  • threshold (float, default: 0.0 ) –

    min population. Defaults to 0.0.

Returns:

  • Self ( Self ) –

    StateEnsemble

StateNetwork

Attributes
graph = nx.DiGraph() instance-attribute
initial_state = None instance-attribute
visited_states = [] instance-attribute
Functions
build(smiles, origin=None, transformation=None, min_formal_charge=-2, max_formal_charge=+2, min_atomic_charge=-1, max_atomic_charge=+1, protomer_rule='default', tautomer_rule=None, verbose=False)

Build the microstate network using BFS from initial state.

copy()

Copy.

deserialize(encoded_str)

Deserialize the network from a string.

get_initial_state()

Get the initial state.

get_macro_pKa(ph_values, p)

Calculatate macro-pKa with provided potential energies.

Args:

Returns:

  • list[tuple[float, float]]

    list[tuple[float,float]] : [(pH, charge), ...]

get_micro_pKa(ph_values, p)

Calculate micro-pKa with provided potential energies.

Args:

Returns:

  • dict[int, list[tuple[float, float]]]

    dict[int,list[float]]: micro-pKa values for each ionizable site.

get_num_edges()

Number of edges in the network.

get_num_nodes()

Number of nodes in the network.

get_plot_data_pH_vs_charge(ph_values, p)
get_plot_data_pH_vs_population(ph_values, p, threshold=0.0)

Make an Altair plot object for pH-dependent population curve.

    Args:
        pH (np.ndarray): array of pH values.
        C (float): constant for pH-dependent dG calculation. Defaults to ln(10).

. beta (float, optional): \( \beta = \frac{1}{k_{B} T} \). Defaults to 1.0. threshold (float) : threshold to ignore low populated microstates. Defaults to 0.0.

    Returns:
        (dict, indices):
            dict: {'pH':list[float], 'p':list[float], 'microstate':list[int(1-based)]}
            indices: list of state index (0-based)
get_plot_data_pH_vs_site_charge(ph_values, p)
get_population(ph_values, C=ln10, beta=1.0)

Calculate populations with provided potential energies and pH.

The reference state can be arbitrary and is set to the initial_state here.

\[ \begin{align} \Delta G_{i, ref} &= PE_{i} - PE_{ref} \\[0.5em] \Delta m_{i, ref} &= charge_{i} - charge _{ref} \\[0.5em] \Delta G_{i, pH} &= \Delta G_{i, ref} + \Delta m_{i, ref} C pH \\[0.5em] p_{i, pH} &= \frac {exp(-\beta \Delta G_{i, pH})}{\sum_{i} exp(-\beta \Delta G_{i, pH})} \end{align} \]

Parameters:

  • pH (ndarray) –

    array of pH values.

  • C (float, default: ln10 ) –

    constant for pH-dependent dG calculation. Defaults to ln(10).

  • beta (float, default: 1.0 ) –

    \( \beta = \frac{1}{k_{B} T} \). Defaults to 1.0.

Returns:

  • ndarray

    np.ndarray: array of populations with shape of (number of states, number of pH).

get_state_ensemble(index=None)

Get states by index or slice or all states.

info()

Print information of the network.

serialize()

Serialize the network to a string.

set_energies(dG, ref_ph=7.0)

Set energies to states.

Parameters:

  • energies (list[float] | ndarray) –

    list or array of energies.

Returns:

  • Self ( Self ) –

    self

size()

Number of unique states in the network.

trim(p, threshold=0.0)

Remove states whose pH-dependent population is not above a given threshold at pH values.

Parameters:

  • p (ndarray) –

    array of populations.

  • threshold (float, default: 0.0 ) –

    min population. Defaults to 0.0.

Returns:

  • Self ( Self ) –

    StateNetwork

Functions

Boltzmann_weighted_average(energies, beta=1.0)

Returns the Boltzmann weighted average of energies. [ E_{avg} = \frac{\sum_{i} E_{i} exp(-\beta E_{i})}{\sum_{i} exp(-\beta E_{i})} ]

Parameters:

  • energies (list[float] | np.ndarray)

    energies

  • beta (float)

    \( \beta = \frac{1}{k_{B}T} \) (Kcal/mol)

Returns:

  • float

    float

Boltzmann_weights(energies, beta=1.0)

Returns the Boltzmann weights of energies.

The Boltzmann weight, \( p_{i} \) is defined as:

\[ p_{i} = \frac{exp(- \beta E_{i}) }{ \sum_{i} exp(- \beta E_{i})} \]

Since the Boltzmann weighted average of any property is taken at a specific temperature, changing the temperature means changing the value of \( \beta \).

Parameters:

  • energies (list[float] | np.ndarray)

    energies

  • beta (float)

    \( \beta = \frac{1}{ k_{B} T } \) (Kcal/mol)

Returns:s np.ndarray

beta_constant(T=278.15)

Returns the beta constant in Kcal/mol unit at a given temperature (Kelvin).

The constant \( \beta \) is defined as:

\[ \beta = \frac{1}{ k_{B} T } \]

where \( k_{B} \) is the Boltzmann constant and T is the absolute temperature of the system in Kelvin.

\( k_{B} \) = 1.987204259e-3 Kcal/(mol K)

For example, \( \beta \) = 0.5527408646408499 Kcal/(mol K) at 278.15 K

Parameters:

  • T (float)

    temperature in Kelvin unit.