Microstate
API¶
rdworks.microstate
¶
Attributes¶
AcidBasePatterns = pd.read_csv(smarts_path / 'smarts_pattern.csv')
module-attribute
¶
AcidBasePatternsSimple = pd.read_csv(smarts_path / 'simple_smarts_pattern.csv')
module-attribute
¶
UnreasonablePatterns = list(map(Chem.MolFromSmarts, ['[#6X5]', '[#7X5]', '[#8X4]', '[*r]=[*r]=[*r]', '[#1]-[*+1]~[*-1]', '[#1]-[*+1]=,:[*]-,:[*-1]', '[#1]-[*+1]-,:[*]=,:[*-1]', '[*+2]', '[*-2]', '[#1]-[#8+1].[#8-1,#7-1,#6-1]', '[#1]-[#7+1,#8+1].[#7-1,#6-1]', '[#1]-[#8+1].[#8-1,#6-1]', '[#1]-[#7+1].[#8-1]-[C](-[C,#1])(-[C,#1])', '[OX1]=[C]-[OH2+1]', '[NX1,NX2H1,NX3H2]=[C]-[O]-[H]', '[#6-1]=[*]-[*]', '[cX2-1]', '[N+1](=O)-[O]-[H]']))
module-attribute
¶
ln10 = math.log(10)
module-attribute
¶
logger = logging.getLogger(__name__)
module-attribute
¶
smarts_path = importlib.resources.files('rdworks.data.ionized')
module-attribute
¶
Classes¶
InflectionDetector
¶
Inflection point detection with plateau analysis. Distinguishes real inflections from noise by analyzing curve behavior.
Attributes¶
curve_info = 'continuous_no_plateaus'
instance-attribute
¶
num_plateaus = len(self.plateaus)
instance-attribute
¶
num_transitions = 0
instance-attribute
¶
plateaus = self.detect_plateaus()
instance-attribute
¶
x = self.x[sort_idx]
instance-attribute
¶
y = self.y[sort_idx]
instance-attribute
¶
Functions¶
analyze(require_plateaus=True, high_confidence=True)
¶
Complete analysis with plateau consideration.
Parameters:
-
require_plateaus(bool, default:True) –Only report inflections between clear plateaus
Returns:
-
dict(list) –analysis results
detect_plateaus(derivative_threshold=None, min_plateau_length=5)
¶
Detect plateau regions in the curve.
Parameters:¶
derivative_threshold : float or None Max abs(dy/dx) to consider as plateau. If None, auto-calculated min_plateau_length : int Minimum number of consecutive points to consider a plateau
Returns:¶
plateaus : list of dicts with 'start_idx', 'end_idx', 'mean_y', 'std_y'
IonizableSite
dataclass
¶
QupkakeMicrostates
¶
Attributes¶
acidic_sites = []
instance-attribute
¶
basic_sites = []
instance-attribute
¶
calculator = calculator
instance-attribute
¶
mols = []
instance-attribute
¶
origin = origin
instance-attribute
¶
reference = None
instance-attribute
¶
states = []
instance-attribute
¶
Functions¶
Boltzmann_weighted_average(potential_energies)
staticmethod
¶
Calculate Boltzmann weighted average potential energy at pH 0.
Parameters:
-
potential_energies(list) –a list of potential energies.
Returns:
-
float(float) –Boltzmann weighted average potential energy.
count()
¶
enumerate()
¶
get_ensemble()
¶
get_mol(idx)
¶
get_populations(pH)
¶
populate()
¶
State
¶
Attributes¶
charge = None
instance-attribute
¶
energy = None
instance-attribute
¶
max_atomic_charge = max_atomic_charge
instance-attribute
¶
max_formal_charge = max_formal_charge
instance-attribute
¶
min_atomic_charge = min_atomic_charge
instance-attribute
¶
min_formal_charge = min_formal_charge
instance-attribute
¶
origin = origin
instance-attribute
¶
protomer_rule = protomer_rule
instance-attribute
¶
rdmol = None
instance-attribute
¶
rdmolH = None
instance-attribute
¶
ref_ph = None
instance-attribute
¶
sites = []
instance-attribute
¶
smiles = smiles
instance-attribute
¶
tautomer_rule = tautomer_rule
instance-attribute
¶
transformation = transformation
instance-attribute
¶
Functions¶
can_be_deprotonated_at(atom_idx)
¶
Check if an atom can potentially be deprotonated
can_be_protonated_at(atom_idx)
¶
Check if an atom can potentially be protonated
copy()
¶
deserialize(encoded_str)
¶
Deserialize the state from a string.
find_ionizable_sites()
¶
get_deprotonated(atom_idx=None, site_idx=None)
¶
Make deprotonated state(s) from the current state.
Parameters:
-
atom_idx(int | None, default:None) –atom index. Defaults to None.
-
site_idx(int | None, default:None) –site index. Defaults to None.
Returns:
-
list[Self]–list[Self]: list of deprotonated States.
get_protonated(atom_idx=None, site_idx=None)
¶
Make protonated state(s) from the current state.
All ionizable sites are considered for protonation unless atom_idx or site_idx is given.
Parameters:
-
atom_idx(int | None, default:None) –atom index. Defaults to None.
-
site_idx(int | None, default:None) –site index. Defaults to None.
Returns:
-
list[Self]–list[Self]: list of protonated States.
get_tautomers()
¶
hydrogen_count(idx)
¶
info(index=None)
¶
ionize(idx, mode)
¶
serialize()
¶
Serialize the state to a string.
site_info()
¶
update()
¶
StateEnsemble
¶
Attributes¶
states = []
instance-attribute
¶
Functions¶
copy()
¶
Copy.
deserialize(encoded_str)
¶
Deserialize states from a string.
drop()
¶
Drop duplicate and unreasonable states.
get_charge_groups()
¶
Get charge groups
Returns:
-
dict(dict[int, list[int]]) –{charge: [state_idx, ...], ...}
get_charge_groups_per_site()
¶
Get charge groups per atom site
Returns:
-
dict(dict[int, dict[int, list[int]]]) –{atom_idx: {atom_charge: [state_idx, ...], ...}}
get_macro_pKa(ph_values, p)
¶
Compute inflection points in pH vs charge that could be macroscopic pKa values. For example, ideal pKa transitions should happen at +2.5, +1.5, +0.5, -0.5 in case of +3 to -1 charges.
Parameters:
-
ph_values(ndarray) –Array of pH values to evaluate populations
-
p(ndarray) –Population matrix for each state at each pH
Returns:
-
list[tuple[float, float]]–list[tuple[float,float]] : [(pH, charge), ...]
get_micro_pKa(ph_values, p)
¶
Compute inflection points in pH vs site_charge that could be microscopic pKa values For example, ideal pKa transitions should happen at +1.5, +0.5, -0.5 in case of +2 to -1 charges.
Parameters:
-
ph_values(ndarray) –Array of pH values to evaluate populations
-
p(ndarray) –Population matrix for each state at each pH
Returns:
-
dict[int, list[tuple[float, float]]]–dict[int, tuple[float,float]] : {atom_index: [(pH, charge), ..], ...}
get_plot_data_pH_vs_charge(ph_values, p)
¶
get_plot_data_pH_vs_population(ph_values, p, threshold=0.0)
¶
Get an Altair plot object for pH-dependent population curve.
For Altair chart:
```py
# plot pH vs population
import pandas as pd
import altair as alt
width = 600
height = 300
palette = 'tableau10'
df = pd.DataFrame(data)
# line plot
lineplot = alt.Chart(df).mark_line().encode(
x=alt.X('pH:Q', title='pH'),
y=alt.Y('p:Q', title='Population'),
color=alt.Color('microstate:N', scale=alt.Scale(scheme=palette)),
).properties(
width=width,
height=height)
# data labels
labels = alt.Chart(df).mark_text(
align='left',
dx=5,
dy=-5
).encode(
x=alt.X('pH', aggregate={'argmax': 'p'}),
y=alt.Y('p', aggregate={'argmax': 'p'}),
text='microstate:N',
color='microstate:N'
)
chart = (lineplot + labels)
```
Args: pH (np.ndarray): array of pH values. C (float): constant for pH-dependent dG calculation. Defaults to ln(10). beta (float, optional): \( \beta = \frac{1}{k_{B} T} \). Defaults to 1.0. threshold (float) : threshold to ignore low populated microstates. Defaults to 0.0.
Returns:
-
dict, microstate indices–dict: {'pH':list[float], 'p':list[float], 'microstate':list[int(1-based)]} microstate indices: list of state index (0-based)
get_plot_data_pH_vs_site_charge(ph_values, p)
¶
get_population(ph_values, C=ln10, beta=1.0)
¶
Get populations at a given pH array.
Parameters:
-
ph_values(ndarray) –array of pH values.
-
C(float, default:ln10) –constant for pH-dependent dG calculation. Defaults to ln(10).
-
beta(float, default:1.0) –\( \beta = \frac{1}{k_{B} T} \). Defaults to 1.0.
Returns:
-
ndarray–np.ndarray: array of populations with shape of (number of states, number of pH).
get_state(index)
¶
info()
¶
Print information of all states.
serialize()
¶
Serialize states to a string.
set_energies(dG, ref_ph=7.0)
¶
Set energies to states.
Parameters:
-
dG(list[float] | ndarray) –list or array of energies.
-
ref_ph(float, default:7.0) –pH at which the energies are calculated. Defaults to 7.0
Returns:
-
Self(Self) –StateEnsemble
size()
¶
Number of states.
sort(p)
¶
Sort states by population.
Parameters:
-
p(ndarray) –array of population.
Returns:
-
Self(Self) –StateEnsemble
trim(p, threshold=0.0)
¶
Trim states whose pH-dependent population is below a given threshold across pH range 0-14.
Parameters:
-
p(ndarray) –array of populations.
-
threshold(float, default:0.0) –min population. Defaults to 0.0.
Returns:
-
Self(Self) –StateEnsemble
StateNetwork
¶
Attributes¶
graph = nx.DiGraph()
instance-attribute
¶
initial_state = None
instance-attribute
¶
visited_states = []
instance-attribute
¶
Functions¶
build(smiles, origin=None, transformation=None, min_formal_charge=-2, max_formal_charge=+2, min_atomic_charge=-1, max_atomic_charge=+1, protomer_rule='default', tautomer_rule=None, verbose=False)
¶
Build the microstate network using BFS from initial state.
copy()
¶
Copy.
deserialize(encoded_str)
¶
Deserialize the network from a string.
get_initial_state()
¶
Get the initial state.
get_macro_pKa(ph_values, p)
¶
Calculatate macro-pKa with provided potential energies.
Args:
Returns:
-
list[tuple[float, float]]–list[tuple[float,float]] : [(pH, charge), ...]
get_micro_pKa(ph_values, p)
¶
Calculate micro-pKa with provided potential energies.
Args:
Returns:
-
dict[int, list[tuple[float, float]]]–dict[int,list[float]]: micro-pKa values for each ionizable site.
get_num_edges()
¶
Number of edges in the network.
get_num_nodes()
¶
Number of nodes in the network.
get_plot_data_pH_vs_charge(ph_values, p)
¶
get_plot_data_pH_vs_population(ph_values, p, threshold=0.0)
¶
Make an Altair plot object for pH-dependent population curve.
Args:
pH (np.ndarray): array of pH values.
C (float): constant for pH-dependent dG calculation. Defaults to ln(10).
. beta (float, optional): \( \beta = \frac{1}{k_{B} T} \). Defaults to 1.0. threshold (float) : threshold to ignore low populated microstates. Defaults to 0.0.
Returns:
(dict, indices):
dict: {'pH':list[float], 'p':list[float], 'microstate':list[int(1-based)]}
indices: list of state index (0-based)
get_plot_data_pH_vs_site_charge(ph_values, p)
¶
get_population(ph_values, C=ln10, beta=1.0)
¶
Calculate populations with provided potential energies and pH.
The reference state can be arbitrary and is set to the initial_state here.
Parameters:
-
pH(ndarray) –array of pH values.
-
C(float, default:ln10) –constant for pH-dependent dG calculation. Defaults to ln(10).
-
beta(float, default:1.0) –\( \beta = \frac{1}{k_{B} T} \). Defaults to 1.0.
Returns:
-
ndarray–np.ndarray: array of populations with shape of (number of states, number of pH).
get_state_ensemble(index=None)
¶
Get states by index or slice or all states.
info()
¶
Print information of the network.
serialize()
¶
Serialize the network to a string.
set_energies(dG, ref_ph=7.0)
¶
Set energies to states.
Parameters:
-
energies(list[float] | ndarray) –list or array of energies.
Returns:
-
Self(Self) –self
size()
¶
Number of unique states in the network.
trim(p, threshold=0.0)
¶
Remove states whose pH-dependent population is not above a given threshold at pH values.
Parameters:
-
p(ndarray) –array of populations.
-
threshold(float, default:0.0) –min population. Defaults to 0.0.
Returns:
-
Self(Self) –StateNetwork
Functions¶
Boltzmann_weighted_average(energies, beta=1.0)
¶
Returns the Boltzmann weighted average of energies. [ E_{avg} = \frac{\sum_{i} E_{i} exp(-\beta E_{i})}{\sum_{i} exp(-\beta E_{i})} ]
Parameters:
-
energies (list[float] | np.ndarray)–energies
-
beta (float)–\( \beta = \frac{1}{k_{B}T} \) (Kcal/mol)
Returns:
-
float–float
Boltzmann_weights(energies, beta=1.0)
¶
Returns the Boltzmann weights of energies.
The Boltzmann weight, \( p_{i} \) is defined as:
Since the Boltzmann weighted average of any property is taken at a specific temperature, changing the temperature means changing the value of \( \beta \).
Parameters:
-
energies (list[float] | np.ndarray)–energies
-
beta (float)–\( \beta = \frac{1}{ k_{B} T } \) (Kcal/mol)
Returns:s np.ndarray
beta_constant(T=278.15)
¶
Returns the beta constant in Kcal/mol unit at a given temperature (Kelvin).
The constant \( \beta \) is defined as:
where \( k_{B} \) is the Boltzmann constant and T is the absolute temperature of the system in Kelvin.
\( k_{B} \) = 1.987204259e-3 Kcal/(mol K)
For example, \( \beta \) = 0.5527408646408499 Kcal/(mol K) at 278.15 K
Parameters:
-
T (float)–temperature in Kelvin unit.