Document to Structure (d2s) Conversion
Document to Structure processes PDF, HTML, XML, and text files, as well as office file formats: DOC, DOCX, PPT, PPTX, XLS, XLSX, and ODT.
It recognizes the chemical names (IUPAC, CAS, common, and drug names), SMILES, and InChI found in the document and converts them into chemical structures.
d2s conversion uses the name-to-structure converter. For the supported names and current limitations, see the "Name to Structure Conversion" page.
You can extend the document to structure conversion by creating a custom dictionary file.
d2s can be used via the API, the command-line application (MolConverter), or MarvinView.
Text mining can also be automated by using d2s integrated into KNIME or Pipeline Pilot.
OCR and syntax correction
Chemaxon's d2s toolkit can correct several simple OCR and syntax errors. For instance, given the incorrect name "3-rnethyl-l-me-thoxynaphthalene",
it automatically corrects the name to "3-methyl-1-methoxynaphthalene" and generates the corresponding structure.
In MarvinView, open a PDF file containing chemical names. MarvinView displays all the structures corresponding to the recognized names.
The structures can then be saved, copied and pasted, or opened in the MarvinSketch editor.
MolConverter can be used as a command-line tool for d2s conversion.
Example:
- Converting the "test.pdf" file to a MOL file:
molconvert mol test.pdf -o test.mol
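The single-file command above can be wrapped in a loop to process a whole directory of documents. The following is a minimal sketch, not an official Chemaxon script: it assumes that molconvert is on the PATH and reuses only the invocation form shown above. The helper names out_name and convert_dir are hypothetical and introduced here for illustration.

```shell
#!/bin/sh
# Sketch: convert every PDF in a directory to a MOL file with the same base name.
# Only the documented form "molconvert mol <input> -o <output>" is used.

# out_name (hypothetical helper): maps "dir/report.pdf" to "dir/report.mol".
out_name() {
    printf '%s\n' "${1%.pdf}.mol"
}

# convert_dir (hypothetical helper): runs molconvert on each PDF in directory $1.
convert_dir() {
    for pdf in "$1"/*.pdf; do
        [ -e "$pdf" ] || continue      # skip when the glob matches nothing
        molconvert mol "$pdf" -o "$(out_name "$pdf")" || echo "failed: $pdf" >&2
    done
}

# Run only when a directory argument is given, e.g.: sh d2s_batch.sh ./papers
if [ "$#" -gt 0 ]; then
    convert_dir "$1"
fi
```

A failed conversion is reported on standard error and the loop continues, so one unreadable document does not stop the rest of the batch.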
d2s converts chemical structures from OLE objects embedded in office documents. Such objects are created by chemical sketchers such as Marvin, ChemDraw, ISIS/Draw, Symyx Draw, and Accelrys Draw.
If OSRA is installed on your computer, d2s also converts images of compounds in PDF files into editable chemical structures.
License information
- You need the "Document to Structure" license.