MDL Molfiles, RGfiles, SDfiles, Rxnfiles, RDfiles

Codenames: mol, mol:V3, mol:V3ec, mol:V3ea, rgf, sdf, rxn, rxn:V3, rdf; file extensions: .mol, .sdf, .rxn, .rdf

MDL Molfiles, RGfiles, SDfiles, Rxnfiles, RDfiles formats

Marvin imports and exports MDL Molfiles, RGfiles, SDfiles, REACCS Rxnfiles and RDfiles. The following features are supported in V2.0 molfiles:

Extended molfiles (V3.0). If the number of atoms or bonds in a molecule exceeds 999, then the extended format is used. In an extended molfile, the following properties are supported:

Reaction files (V2.0). A reaction file consists of a REACTANT block, a PRODUCT block, and (optionally) an AGENT block. Reaction files containing reaction agents are non-standard.

Each block starts with a molecule or reaction identifier. A molecule identifier must have one of the following forms:

$MFMT $MIREG N
$MFMT $MEREG N
$MIREG N
$MEREG N.
Here $MFMT means that a molecule is given in molfile format, $MIREG N is the internal and $MEREG N is the external registry number of the molecule. Similarly, a reaction identifier must have one of the following forms:
$RFMT $RIREG N
$RFMT $REREG N
$RIREG N
$REREG N.
Here $RFMT means that a reaction is given in rxnfile format, $RIREG N is the internal and $REREG N is the external registry number of the reaction.
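
The identifier forms listed above can be recognized with a single pattern. The following Python sketch is illustrative only and is not part of Marvin: the names RecordId and parse_identifier are hypothetical, only the eight forms listed above are accepted, and the registry number is kept as text because its exact syntax is not specified here.

```python
import re
from typing import NamedTuple


class RecordId(NamedTuple):
    kind: str             # "molecule" or "reaction"
    has_format_tag: bool  # True if the line starts with $MFMT or $RFMT
    registry: str         # "internal" or "external"
    number: str           # registry number, kept as text


# Optional $MFMT/$RFMT tag, then $MIREG/$MEREG/$RIREG/$REREG and the number.
_ID_RE = re.compile(
    r"^(?:\$(?P<fmt>[MR])FMT\s+)?"
    r"\$(?P<kind>[MR])(?P<reg>[IE])REG\s+(?P<number>.+)$"
)


def parse_identifier(line: str) -> RecordId:
    """Parse one molecule or reaction identifier line."""
    match = _ID_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a recognized identifier: {line!r}")
    fmt, kind = match.group("fmt"), match.group("kind")
    # A molecule format tag must not be combined with a reaction registry
    # number, and vice versa.
    if fmt is not None and fmt != kind:
        raise ValueError(f"mismatched format tag and registry type: {line!r}")
    return RecordId(
        kind="molecule" if kind == "M" else "reaction",
        has_format_tag=fmt is not None,
        registry="internal" if match.group("reg") == "I" else "external",
        number=match.group("number"),
    )
```

For example, parse_identifier("$RFMT $RIREG 12") yields a reaction record with an internal registry number of "12".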

A reaction agent is a molecule structure that does not take part in the chemical reaction but is added to the reaction equation for informative purposes only. Agents are normally displayed above the reaction arrow and are written to the reaction file after the reactants and the products. If the number of agents is non-zero, it is written in the file header after the number of reactants and the number of products. Reaction files containing agents are non-standard.
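
As an illustration of the header layout described above, the following Python sketch reads the reactant, product and optional agent counts from one header line. It assumes that each count occupies a fixed-width field of three characters, as in V2.0 rxnfiles; the function name parse_reaction_counts is hypothetical and not part of Marvin.

```python
from typing import Tuple


def parse_reaction_counts(line: str) -> Tuple[int, int, int]:
    """Return (reactants, products, agents) from a reaction counts line.

    Assumption: three consecutive fields of three characters each.
    A missing or blank agent field means that there are no agents.
    """
    text = line.rstrip("\r\n")
    fields = [text[i:i + 3] for i in range(0, 9, 3)]
    reactants, products, agents = (
        int(field) if field.strip() else 0 for field in fields
    )
    return reactants, products, agents
```

A standard line such as "  2  1" gives zero agents, while the non-standard "  2  1  1" gives one agent.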

Extended reaction files (V3.0). This format is used automatically if a reaction includes Rgroups and/or the number of atoms or bonds exceeds 999. An extended reaction file consists of a REACTANT block, a PRODUCT block, (optionally) an AGENT block, and (optionally) RGROUP blocks.

In SDfiles read by Marvin, the name field is special: it overrides the molecule name specified in the molfile part.

A special feature of Marvin RGfiles is that they can contain a reaction as the root structure. This feature is non-standard, so such mixed RG/Rxnfiles can only be imported by Marvin.

Special data types in SDfile and RDfile fields

Data fields store strings normally, but other data types are also supported in Marvin, in a non-standard way. If the data starts with the "MProp:scalar:" or "MProp:array:" string, then it can have a special type:

Molfile compression

MarvinSketch and MarvinView can handle compressed molfiles that are typically five times smaller than their original, uncompressed version. This reduces the download time of HTML pages containing molecule applets.

Compressed molfiles can be created by choosing Edit/Source, then Format/Compressed Molfile in MarvinSketch or MarvinView. If you cannot find the Edit menu, click on the upper left arrow in MarvinSketch, or right-click or double-click the compound in MarvinView.

Codenames: csmol, csrgf, cssdf, csrxn, csrdf; file extensions: .csmol, .cssdf, .csrxn, .csrdf

Special information

Implicit hydrogens on aromatic nitrogen

The mol family of formats cannot store the implicit hydrogens of atoms, so they are calculated from the bond orders. This is always correct when the molecule is in Kekule form, but causes problems when nitrogen-containing aromatic rings are saved with aromatic bond types.

To counteract the information loss, the implicit hydrogen count is stored in these formats as attached data on the nitrogen. The associated data sgroup has the field name MRV_IMPLICIT_H and the value IMPL_H<n>, where n is the number of implicit hydrogens. These special data attachments are converted back to implicit hydrogens upon import. When the file is read in ISIS/Draw, the lost hydrogen will not reappear, but the attached data will be visible as a warning.
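
A reader written outside Marvin could recover the hydrogen count from such an attachment as sketched below in Python. This is only an illustration: the function name implicit_h_from_data is hypothetical, and the field name and value are assumed to have already been extracted from the data sgroup.

```python
import re
from typing import Optional

# Value format described above: IMPL_H followed by the hydrogen count.
_IMPL_H_RE = re.compile(r"^IMPL_H(\d+)$")


def implicit_h_from_data(field_name: str, value: str) -> Optional[int]:
    """Return the implicit hydrogen count, or None if the attached data
    is not an MRV_IMPLICIT_H attachment in the expected form."""
    if field_name != "MRV_IMPLICIT_H":
        return None
    match = _IMPL_H_RE.match(value.strip())
    return int(match.group(1)) if match else None
```

For an aromatic nitrogen carrying one hydrogen, the attachment would have the value IMPL_H1, and the function returns 1.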

Stereo bond information loss without atom coordinates

The V2 version of the mol family of formats cannot store stereo information without atom coordinates. If the molecule was imported from a format that contains stereo information but no coordinates (e.g. name, SMILES), Marvin cleans the molecule in 2D when exporting it to a V2 ctfile format in order to preserve the stereo information. This clean can be omitted by passing the omitClean0D parameter to the exporter, or avoided by using the V3 format.

Multipage molecular document

To save information about a multipage molecular document, properties are stored as attached data. The field names and values are the following:

Coordination compounds and Markush structures

To save information about coordination compounds and Markush structures, properties are stored as attached data. The field names and values are the following:

Charge displayed on S-group bracket

To save information about charge location in S-groups in case of generic, monomer, mer and component S-group types, properties are stored as attached data. The field name and value are:

Import options

Xsg Expand all S-groups.
Usg Ungroup all S-groups.
Fsg Ungroup S-groups with 3 or more attachment points.
bXXX     Set the C-C bond length used in the molfile. The molecule file is supposed to store coordinates in 1.54Å/XXX units. Marvin uses Å units internally, thus coordinates are rescaled by factor 1.54/XXX at import if XXX is a nonzero number. If XXX = 0, then coordinates are not rescaled (default for 3D V2 molfiles and for V3 molfiles). If XXX = A, then coordinates are rescaled to transform the molfile's average C-C bond length to 1.54 Å (default for 2D V2 molfiles). Examples: "caffeine.mol{b0}" or "caffeine.mol{b1.54}" (bond lengths are in angstroms), "caffeine.mol{b0.825}" (bond lengths are in ISISDraw's units), "caffeine-V3.mol{bA}" (forces average bond length calculation for V3 molfile).
nomolp Read molecule type data fields ($DTYPE $MFMT and $RFMT in RDfiles) as strings instead of Molecule objects.
skipMMRV Neglect ChemAxon/Marvin specific lines in the properties block. Such lines are in the following format: M  MRV ... They should be skipped if the file is converted with non-ChemAxon software, which preserved them but made them invalid, e.g. by changing the total number of atoms and bonds.
skipAtomValue Disables the import of "Atom values" from the given ctfile.
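
The rescaling rule of the bXXX import option can be summarized in a short Python sketch. It only restates the arithmetic described above and is not Marvin's implementation; the function names are hypothetical, and the C-C bonds are assumed to be given as pairs of atom indices.

```python
import math
from typing import Optional, Sequence, Tuple

CC_BOND_ANGSTROM = 1.54  # reference C-C bond length in angstroms


def average_cc_length(
    coords: Sequence[Sequence[float]],
    cc_bonds: Sequence[Tuple[int, int]],
) -> float:
    """Average length of the C-C bonds, in the units used by the file."""
    lengths = [math.dist(coords[i], coords[j]) for i, j in cc_bonds]
    return sum(lengths) / len(lengths)


def import_scale_factor(xxx: str, average_cc: Optional[float] = None) -> float:
    """Factor applied to the coordinates at import for option bXXX."""
    if xxx == "A":
        # bA: rescale so that the average C-C bond length becomes 1.54.
        if not average_cc:
            raise ValueError("bA needs a positive average C-C bond length")
        return CC_BOND_ANGSTROM / average_cc
    value = float(xxx)
    if value == 0:
        return 1.0  # b0: coordinates are not rescaled
    return CC_BOND_ANGSTROM / value
```

With b0.825 the factor is 1.54/0.825, about 1.87, while b0 and b1.54 both leave the coordinates unchanged.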

Export options

... Basic options for aromatization and H atom adding/removal.
V2 or V3 Force writing V2 or V3 (extended) molfiles. The default format is V2 for simple molecules, and V3 if the number of atoms or bonds exceeds 999 or if a reaction contains Rgroups. Example: "mol:V3"
P Write floating point numbers with maximum precision. Only meaningful for V3 molfiles. Example: "mol:V3P"
bXXX Set C-C bond length. If XXX is nonzero, then the exported atom coordinates are scaled in such a way that the average C-C bond length will be the specified number. If XXX = 0, then coordinates are not rescaled.
Examples: "mol:b0" or "mol:b1.54" (bond lengths are in angstroms), "mol:b1.54a" (set bond length, aromatize).
Default: 0.825 in V2 format for 2D molecules, 1.54 (Å units) in any other case.
ec Convert to enhanced stereo representation, considering the chiral flag. Only meaningful with option V3. (Chiral centers are grouped into an ABS or an AND stereo group, depending on the chiral flag. If the input molecule contains any enhanced stereo labels, the unlabeled stereo centers will always form a new AND group.) Example: "mol:V3ec"
ea Convert to enhanced stereo representation, assuming absolute stereochemistry. Only meaningful with option V3. (Chiral centers are grouped into the ABS group. If the input molecule already contains enhanced stereo labels, the behaviour is similar to the one described at option ec above.) Example: "mol:V3ea"
omitClean0D Omit the clean operation when exporting 0D molecules into ctfile format with V2 compatibility, which is the default. This clean was introduced in 5.4 because the ctfile format cannot contain stereo information without coordinates.
Example: "mol:omitClean0D"
BOM Write the UTF-8 byte order mark (BOM), if the given or the system's encoding is UTF-8.
Example: "mol:BOM"
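
The defaults described for the V2/V3 and bXXX export options can be expressed as the following Python sketch. It mirrors the rules stated above and is not Marvin's code; all function names are hypothetical.

```python
def default_version(n_atoms: int, n_bonds: int,
                    reaction_with_rgroups: bool = False) -> str:
    """Format version used when neither V2 nor V3 is forced."""
    if n_atoms > 999 or n_bonds > 999 or reaction_with_rgroups:
        return "V3"
    return "V2"


def default_bond_length(version: str, dimension: int) -> float:
    """Default C-C bond length on export: 0.825 for 2D molecules in
    V2 format, 1.54 (angstroms) in any other case."""
    return 0.825 if version == "V2" and dimension == 2 else 1.54


def export_scale_factor(target: float, current_average_cc: float) -> float:
    """Factor that makes the average C-C bond length equal to target.

    A target of 0 (option b0) means that coordinates are not rescaled.
    """
    if target == 0:
        return 1.0
    return target / current_average_cc
```

For a 2D molecule with 20 atoms whose average C-C bond length is 1.54, the default is V2 with a bond length of 0.825, so the coordinates are multiplied by 0.825/1.54.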
