Training of the correction library for pK_a calculations

If you think your experimental data could improve the performance of the default pK_a calculator, you can use the supervised pK_a learning method that is built into the calculator. Particular structural features may affect the pK_a values calculated by the built-in method, so a correction library based on experimental data for your compound family can help the pK_a calculator increase its prediction accuracy.

How to improve the accuracy of the pK_a calculation?

First, identify which ionization centers the pK_a calculator predicted inaccurately, and collect experimental data for those centers. The learning algorithm is based on linear regression analysis, so you need a sufficient amount of experimental pK_a data; otherwise the regression analysis will fail. There is no strict rule for how large a pool of data is needed for reliable pK_a training. If your purpose is to create a local model for only a certain type of chemical environment of the ionization center, a few representative structures may be enough. A more robust model, however, requires as many diverse structures and pK_a values for the ionization center in question as possible.

The first step of the training process is to enter the collected data into an SDF file. Next, run the training algorithm, which creates a correction library from your data and stores it on your computer. You can then use this correction library in MarvinSketch, cxcalc, or Chemical Terms.

How to create a training set and generate a correction library

Create a training set in SDF (.sdf) format.
This can be done easily with the graphical user interface of Instant JChem. Your SDF file must contain the following fields:
- structure of the molecule
- pK_a value 1 (field name: pKa1)
- ID of the atom which has the pKa1 value (field name: ID1). It can be viewed by checking the Atom number option in MarvinView (menu: View > Misc).
Additional pK_a fields are optional but recommended for multiprotic compounds: for example, pK_a value 2 (pKa2) with ID2, and so on. Defining only one pK_a value is enough to use the training data, but supplying more values for multiprotic compounds makes the pK_a training more reliable.
Example
The picture below shows the details of the training set (pKa_trainingset.sdf). ID1 is the index of the atom with the experimental pK_a1 value; ID2 would be the index of the atom with the second measured value (pK_a2), and so on.
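If you prefer to assemble the training file with a script rather than in Instant JChem, the field layout described above can be sketched in Python. This is a minimal illustration, not a ChemAxon tool: the function name sdf_record and the hand-written acetic acid molblock are assumptions made for the example, and only the field names pKa1 and ID1 come from this page. It relies on the standard SDF convention that each data item is a `>  <name>` header line, a value line, and a blank line, with `$$$$` closing the record.

```python
def sdf_record(molblock, fields):
    """Return one SDF record: the molblock followed by its data fields."""
    lines = [molblock.rstrip("\n")]
    for name, value in fields.items():
        lines.append(f">  <{name}>")   # data header line
        lines.append(str(value))       # data value
        lines.append("")               # blank line ends the data item
    lines.append("$$$$")               # record separator
    return "\n".join(lines) + "\n"


# Hand-written V2000 molblock for acetic acid (illustrative only).
# Atom 4 is the hydroxyl oxygen, i.e. the acidic center.
ACETIC_ACID = "\n".join([
    "acetic acid",
    "",
    "",
    "  4  3  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.2500   -1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "  2  3  2  0  0  0  0",
    "  2  4  1  0  0  0  0",
    "M  END",
])

# pKa1 is the measured value, ID1 the index of the atom it belongs to.
record = sdf_record(ACETIC_ACID, {"pKa1": 4.76, "ID1": 4})
print(record)
```

Concatenating the strings returned for each molecule and writing the result to a .sdf file gives a training set with one record per structure. For multiprotic compounds, add pKa2 and ID2 (and so on) to the same dictionary.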
Generate the correction library
Execute the following command from the command line:
```
cxtrain pka -i [library name] [training file] 
```
Example
```
cxtrain pka -i mypka mydata.sdf
```
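Before running the training, it can be useful to confirm that every record in the training file carries the required fields. The following Python sketch does that and then calls cxtrain with the arguments shown above. The helper names (missing_fields, cxtrain_command, train) are hypothetical, and the check only looks for the pKa1 and ID1 data headers; it does not validate the structures or the values.

```python
import shutil
import subprocess

REQUIRED_FIELDS = ("pKa1", "ID1")


def missing_fields(sdf_text):
    """Return (record number, field name) pairs for absent required fields."""
    problems = []
    records = [r for r in sdf_text.split("$$$$") if r.strip()]
    for number, record in enumerate(records, start=1):
        # SDF data headers are the lines that start with '>'.
        headers = [line for line in record.splitlines() if line.startswith(">")]
        for field in REQUIRED_FIELDS:
            if not any(f"<{field}>" in header for header in headers):
                problems.append((number, field))
    return problems


def cxtrain_command(library_name, training_file):
    """Build the argument list for the cxtrain call shown in this section."""
    return ["cxtrain", "pka", "-i", library_name, training_file]


def train(library_name, training_file):
    """Check the training file, then run cxtrain (must be on PATH)."""
    with open(training_file, encoding="utf-8") as handle:
        problems = missing_fields(handle.read())
    if problems:
        raise ValueError(f"training set is missing fields: {problems}")
    if shutil.which("cxtrain") is None:
        raise RuntimeError("cxtrain was not found on PATH")
    subprocess.run(cxtrain_command(library_name, training_file), check=True)
```

With these helpers, train("mypka", "mydata.sdf") would run the same command as the example above, but only after the field check passes.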

Usage of the pK_a plugin with correction library

MarvinSketch

In MarvinSketch, select Tools > Protonation > pKa from the menu.
Check the 'Use correction library' box to activate the training option (see figure below).
If you have created multiple training sets, choose the most accurate one from the dropdown list below the checkbox.

[Figure: MarvinSketch pK_a calculation. I. pK_a calculation with training data; II. pK_a calculation without training data]

`cxcalc`

Use the --correctionlibrary option (short form: -L) to select the correction library:

cxcalc pKa --correctionlibrary [library name] [input file/string]

Example (with the correction library)

$ cxcalc pKa --correctionlibrary mypka "CSC1=NC2=C(N1)C=NC(O)=N2"

Result

 id      apKa1   apKa2   bpKa1   bpKa2   atoms
 1       11.19   16.01   2.34    -2.59   7,11,9,4

Example (without the correction library)

$ cxcalc pKa "CSC1=NC2=C(N1)C=NC(O)=N2"

Result

 id      apKa1   apKa2   bpKa1   bpKa2   atoms
 1       8.34    16.01   2.34    -2.59   7,11,9,4
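To see which ionization center the correction library actually changed, the two result tables can be compared with a short script. The Python sketch below parses the whitespace-separated output shown above. It assumes that the entries of the atoms column line up one-to-one with the pKa columns in the same order, and that no value is missing from a row; parse_cxcalc_pka is a hypothetical helper, not part of cxcalc.

```python
def parse_cxcalc_pka(output):
    """Return one {atom index: (column name, value)} dict per result row."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    names = header[1:-1]                 # columns between 'id' and 'atoms'
    results = []
    for row in data:
        values = [float(v) for v in row[1:-1]]
        atoms = [int(a) for a in row[-1].split(",")]
        results.append(
            {atom: (name, value) for atom, name, value in zip(atoms, names, values)}
        )
    return results


TRAINED = """
 id      apKa1   apKa2   bpKa1   bpKa2   atoms
 1       11.19   16.01   2.34    -2.59   7,11,9,4
"""

DEFAULT = """
 id      apKa1   apKa2   bpKa1   bpKa2   atoms
 1       8.34    16.01   2.34    -2.59   7,11,9,4
"""

trained = parse_cxcalc_pka(TRAINED)[0]
default = parse_cxcalc_pka(DEFAULT)[0]

# Difference (trained minus default) for every atom whose value changed.
changes = {
    atom: round(trained[atom][1] - default[atom][1], 2)
    for atom in trained
    if trained[atom][1] != default[atom][1]
}
print(changes)  # {7: 2.85}
```

For the example molecule, only the apKa1 value on atom 7 differs between the two runs (11.19 with the library, 8.34 without); the other three values are unchanged.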

For more options see this page.

Chemical Terms

Chemical Terms Evaluator

evaluate -e "pKa('correctionlibrary:[library name]')" "[input file/string]"

Example

evaluate -e "pKa('correctionlibrary:mypka')" "CSC1=NC2=C(N1)C=NC(O)=N2"

Result

;;;-2,59;;;11,19;;2,34;;16,01;
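The evaluator result appears to hold one semicolon-separated slot per atom, with decimal commas in this particular output (the separator may depend on the locale). Under that reading, which agrees with the atoms column 7,11,9,4 of the cxcalc result for the same molecule, the string can be turned into a per-atom mapping with a few lines of Python. The function name parse_per_atom is an assumption made for this sketch.

```python
def parse_per_atom(result, decimal_comma=True):
    """Map 1-based atom index -> pKa for every non-empty slot of the result."""
    values = {}
    for index, slot in enumerate(result.strip().split(";"), start=1):
        slot = slot.strip()
        if not slot:
            continue                      # atom without a pKa value
        if decimal_comma:
            slot = slot.replace(",", ".")
        values[index] = float(slot)
    return values


per_atom = parse_per_atom(";;;-2,59;;;11,19;;2,34;;16,01;")
print(per_atom)  # {4: -2.59, 7: 11.19, 9: 2.34, 11: 16.01}
```

The 12 slots correspond to the 12 heavy atoms of the input structure, and the four non-empty slots carry the same values as the cxcalc result obtained with the correction library.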

For more details see this page.

Chemical Terms in Instant JChem

Click the 'New Chemical Terms Field' icon on the panel on the right side.
Type the Chemical Terms expression into the window, using the correctionlibrary:[library name] parameter. Remember to set the Name, the Type, and the DB Column Name.

Example

pKa('correctionlibrary:mypka type:acidic','1')

New Chemical Terms window in Instant JChem
