Technical Documentation
A Reduced Representation Model of RNA: Preparation

Introduction

The rrRNAv1 force field assembler requires a description of the molecule. The description is called a blueprint and it comes in two forms, one for RNA and another for proteins. Blueprints are Python dictionaries. A dictionary can be created within a Python script and passed to the force field assembler in the same script. However, it is better to define the dictionary in a Python module so that it can be reused by many programs. In fact, the rrRNAv1 force field assembler contains a library of ready-made molecules and molecular assemblies.

RNA Blueprint

The RNA blueprint is a dictionary with the keys and associated values listed in the following table. A complete RNA blueprint has at least the first four keys, even if the associated values are empty tuples. The keys are always strings. Beyond the first four keys, the other keys fall into two groups. The "FIX" key references a tuple of strings that form the first group of keys; each of these names a set of constraints. Each key in the other group references a single constraint. For a blueprint B:

| Key | Value |
|---|---|
| "RNA" | Secondary structure: a tuple of three items. Since the tuple is recursive, it can be deeply nested. See Secondary Structure Description for details. |
| "BSQ" | Base sequence: tuple of uppercase strings - single letter code for the standard bases, multiple letters for modified bases. See Yup.Tools.ChemNames for acceptable names. |
| "XYZ" | Coordinates: tuple of tuples of three floats. |
| "FIX" | Named sets: tuple of strings - each string is a key of B that maps to a list of names (next row). |
| any string of B["FIX"] | Sets of constraints: tuple of strings - each string is a key of B that maps to a constraint (next row). |
| any other string | Constraint: tuple of individual settings. |

For a concrete example, see the file Yup/Models/rrRNAv1/TNA4/tRNAPhe.py, which defines a dictionary named BLUE. BLUE["FIX"] maps to a tuple containing a single string: ( "Default", ). BLUE["Default"] maps to a tuple of 12 strings, each of which is the key to a constraint. For example, BLUE["T-loop angle"] is a constraint containing nine SET3ATOMS settings. The next chapter shows how these constraints are applied. One can select constraints with a single string from BLUE["FIX"]; here the only choice is "Default", and the outcome is that all 12 constraints named in BLUE["Default"] are applied. Alternatively, one can supply a list of constraint names, e.g. ( "T-loop distance", "D-loop/T-loop", "D-loop distance" ), and only these three constraints are applied. Finally, one can pass None, in which case no constraints are applied.
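The layout and the three ways of selecting constraints can be sketched with a toy blueprint. This is only an illustration: the keyword constants are string stand-ins for the ones in Yup.Models.rrRNAv1.const, the atom names and coordinates are made up, and select_constraints is a hypothetical helper, not a function of the force field assembler. It also assumes that None selects no constraints.

```python
# Stand-ins for the keywords defined in Yup.Models.rrRNAv1.const (assumed values).
DOMAIN, TRACT, HELIX = "DOMAIN", "TRACT", "HELIX"
SET3ATOMS = "SET3ATOMS"

# A toy RNA blueprint: the four required keys, one named set, one constraint.
# Atom names, atom inclusion types and coordinates are placeholders.
BLUE = {
    "RNA": (DOMAIN, "", ((TRACT, "strand", (1, 3)),)),
    "BSQ": ("G", "C", "A"),
    "XYZ": ((0.0, 0.0, 0.0), (5.9, 0.0, 0.0), (11.8, 0.0, 0.0)),
    "FIX": ("Default",),
    "Default": ("bend",),
    "bend": ((SET3ATOMS, ("G1", "C2", "A3"), ("#0", "#0", "#0"), 180.0),),
}


def select_constraints(blueprint, choice):
    """Return the constraint names selected by `choice` (hypothetical helper).

    choice is None        -> no constraints
    choice is a string    -> every constraint in that named set of "FIX"
    choice is a sequence  -> exactly the named constraints
    """
    if choice is None:
        return ()
    if isinstance(choice, str):
        if choice not in blueprint["FIX"]:
            raise KeyError(f"{choice!r} is not a named set in FIX")
        return tuple(blueprint[choice])
    missing = [name for name in choice if name not in blueprint]
    if missing:
        raise KeyError(f"unknown constraints: {missing}")
    return tuple(choice)


print(select_constraints(BLUE, "Default"))
print(select_constraints(BLUE, ("bend",)))
print(select_constraints(BLUE, None))
```

Keeping the named sets under "FIX" separate from the constraints themselves lets one blueprint offer several alternative combinations without duplicating any settings.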

Protein Blueprint

The Protein blueprint is a dictionary with the following keys and associated values. A complete protein blueprint has at least the first three keys, even if the associated values are empty tuples. Keys are always strings. Beyond the first three keys, each of the remaining strings maps to a constraint. A constraint is a tuple of settings.

| Key | Value |
|---|---|
| "SEQ" | Amino acid sequence: tuple of strings, each string starts with the lower case single letter code for the amino acid followed by the sequence number. |
| "XYZ" | Coordinates: tuple of tuples of three floats. |
| "FIX" | Names of default constraints: tuple of strings - each string is a key to a constraint (next row). These constraints are applied unless an alternative set is specified. |
| any other string | Constraint: tuple of individual settings. |

For a concrete example, see the file Yup/Models/rrRNAv1/GIX1/THX.py, which defines a dictionary named BLUE. BLUE["FIX"] maps to a tuple containing a single string: ( "8A cutoff to nearest 0.1A", ). BLUE["8A cutoff to nearest 0.1A"] maps to a list of 102 distance settings. There is one other constraint, named "8A cutoff to nearest 0.01A".
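As a rough illustration of this layout, the sketch below builds a two-residue protein blueprint and checks it against the rules in the table. The checker check_protein_blueprint is hypothetical and not part of rrRNAv1; SET2ATOMS is a string stand-in for the constant in Yup.Models.rrRNAv1.const; the atom names are borrowed from the settings example later in this chapter. The check that "SEQ" and "XYZ" have the same length is an assumption (one coordinate per residue), not something the text states.

```python
SET2ATOMS = "SET2ATOMS"  # stand-in for the constant in Yup.Models.rrRNAv1.const

# A toy protein blueprint: lysine 23 and serine 24, 3.8 Angstroms apart.
BLUE = {
    "SEQ": ("k23", "s24"),
    "XYZ": ((0.0, 0.0, 0.0), (3.8, 0.0, 0.0)),
    "FIX": ("backbone",),
    "backbone": ((SET2ATOMS, ("LYS023", "SER024"), ("alfaC", "alfaC"), 3.8),),
}

REQUIRED_KEYS = ("SEQ", "XYZ", "FIX")


def check_protein_blueprint(blueprint):
    """Return a list of problems found in the blueprint (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in blueprint:
            problems.append(f"missing key {key!r}")
    seq = blueprint.get("SEQ", ())
    xyz = blueprint.get("XYZ", ())
    # Assumption: one coordinate triple per residue.
    if len(seq) != len(xyz):
        problems.append(f"{len(seq)} residues but {len(xyz)} coordinates")
    for index, point in enumerate(xyz):
        if len(point) != 3:
            problems.append(f"coordinate {index} does not have three values")
    # Every default constraint named in "FIX" must be a key of the blueprint.
    for name in blueprint.get("FIX", ()):
        if name not in blueprint:
            problems.append(f"FIX names unknown constraint {name!r}")
    return problems


print(check_protein_blueprint(BLUE))
```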

Secondary Structure Description

The secondary structure description is a nested (or recursive) tuple built from three components. The keywords DOMAIN, TRACT and HELIX, defined in Yup.Models.rrRNAv1.const, identify these components. Each component has three parts: a keyword, the name of the component and the content of the component.

| Keyword | Name | Content |
|---|---|---|
| DOMAIN | A string to name this component. A group is created with this name, to hold the contents of this component. If the name is a blank string, the contents are placed in the parent group. | tuple of any number of DOMAIN, TRACT or HELIX components. |
| TRACT | As for DOMAIN. | tuple of two integers: ( x, y ) where y >= x > 0 defines a single stranded tract with the sequence number running from x to y. The length of the tract is y - x + 1. |
| HELIX | As for DOMAIN. | tuple of three integers: ( x, y, z ) where z > x > 0 defines a double helix where the sequence number of one strand runs from x to x + y - 1 and the sequence number of the other strand runs from z to z + y - 1. |

For example, a stem-loop can be defined as: ( DOMAIN, "", ( ( HELIX, "stem", ( 1, 5, 10 ) ), ( TRACT, "loop", ( 6, 9 ) ) ) ). This is an unnamed domain containing a helix and a tract. The helix is five base pairs long and is named "stem"; one strand runs from 1 to 5 and the other from 10 to 14. An unpaired strand named "loop" runs from 6 to 9. The rrRNAv1 force field assembler will place 10 P-atoms and 4 X-atoms in a group named "stem", and four P-atoms in another group named "loop". Because the domain is unnamed, the two groups are placed in the root group.
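The numbering rules for TRACT and HELIX can be checked with a short recursive walk over the nested tuple. This is a minimal sketch: the keywords are string stand-ins for the constants in Yup.Models.rrRNAv1.const, and residues_by_name is a hypothetical helper that only lists sequence numbers; it does not reproduce the grouping or atom placement done by the assembler.

```python
# Stand-ins for the keywords defined in Yup.Models.rrRNAv1.const (assumed values).
DOMAIN, TRACT, HELIX = "DOMAIN", "TRACT", "HELIX"


def residues_by_name(component, found=None):
    """Map each TRACT or HELIX name to the sequence numbers it covers."""
    if found is None:
        found = {}
    keyword, name, content = component
    if keyword == DOMAIN:
        # A domain holds any number of nested components.
        for child in content:
            residues_by_name(child, found)
    elif keyword == TRACT:
        x, y = content
        found[name] = list(range(x, y + 1))
    elif keyword == HELIX:
        # y is the helix length: strands run x..x+y-1 and z..z+y-1.
        x, y, z = content
        found[name] = list(range(x, x + y)) + list(range(z, z + y))
    else:
        raise ValueError(f"unknown keyword: {keyword!r}")
    return found


STEM_LOOP = (DOMAIN, "", ((HELIX, "stem", (1, 5, 10)), (TRACT, "loop", (6, 9))))
print(residues_by_name(STEM_LOOP))
# {'stem': [1, 2, 3, 4, 5, 10, 11, 12, 13, 14], 'loop': [6, 7, 8, 9]}
```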

Settings

An individual setting is a tuple of four items:

( keyword,
  ( tuple-of-atom-names ),
  ( tuple-of-atom-inclusion-type-names ),
  target-value )

The keywords are defined in Yup.Models.rrRNAv1.const. There are six constants, formed from a prefix (SET or MOD) and the number of atoms (2, 3 or 4): SET2ATOMS, SET3ATOMS, SET4ATOMS, MOD2ATOMS, MOD3ATOMS and MOD4ATOMS.

The tuple of atom names lists the names of the interacting atoms. Thus, i names are required for an interaction that involves i atoms.

There must be the same number of atom inclusion type names. These are not arbitrary strings; there must be actual atom inclusion types defined in the parameter library. Furthermore, the list of atom types must correspond to an existing interaction type in the parameter library.

The target value is in Angstroms for distances and in degrees for angles.

In each case, the group of atoms is set to interact with the parameters referenced by the set of atom inclusion types. In 'SET' settings, we accept whatever equilibrium dimensions (distances, angles, torsion angles) the list of atom inclusion types references; the target value is ignored. In 'MOD' settings, we also change the equilibrium dimension to the target value. All interactions that make use of this parameter set will then be subject to the new equilibrium value.

For example: ( SET2ATOMS, ( 'LYS023', 'SER024' ), ( 'alfaC', 'alfaC' ), 3.8 ) sets a two-atom interaction, i.e. a bond between the atoms named 'LYS023' and 'SER024' and the bond will be of the type ":alfaC:alfaC:". The target value is 3.8Å but this is purely for documentation. The actual distance constraint is whatever equilibrium bond length is set for the ":alfaC:alfaC:" bond type.

Another example: ( MOD3ATOMS, ( 'U09', 'U10', 'G11' ), ( '#0', '#0', '#1' ), 175.2 ) sets a three-atom interaction, i.e. an angle between the three atoms 'U09', 'U10' and 'G11' and the angle will be of the type ":#0:#0:#1:". Furthermore, the equilibrium angle for this angle type will be set to 175.2 degrees. This means that all interactions that make use of this interaction type will now be subject to an equilibrium angle of 175.2 degrees.
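The two examples above can be unpacked mechanically. The sketch below is illustrative only: the six keywords are string stand-ins for the constants in Yup.Models.rrRNAv1.const, and describe_setting is a hypothetical helper. It checks the atom and type counts against the keyword and builds the interaction type name in the ":a:b:" form used above; it does not consult the parameter library.

```python
# Stand-ins for the six keywords in Yup.Models.rrRNAv1.const (assumed values).
SET2ATOMS, SET3ATOMS, SET4ATOMS = "SET2ATOMS", "SET3ATOMS", "SET4ATOMS"
MOD2ATOMS, MOD3ATOMS, MOD4ATOMS = "MOD2ATOMS", "MOD3ATOMS", "MOD4ATOMS"

ATOM_COUNT = {
    SET2ATOMS: 2, SET3ATOMS: 3, SET4ATOMS: 4,
    MOD2ATOMS: 2, MOD3ATOMS: 3, MOD4ATOMS: 4,
}
MODIFYING = {MOD2ATOMS, MOD3ATOMS, MOD4ATOMS}


def describe_setting(setting):
    """Validate the shape of one setting and summarize what it requests."""
    keyword, atoms, types, target = setting
    count = ATOM_COUNT[keyword]
    if len(atoms) != count or len(types) != count:
        raise ValueError(f"{keyword} needs {count} atom names and {count} types")
    return {
        "atoms": tuple(atoms),
        "interaction_type": ":" + ":".join(types) + ":",
        # Only MOD settings overwrite the equilibrium value with the target.
        "changes_equilibrium": keyword in MODIFYING,
        "target": target,
    }


bond = (SET2ATOMS, ("LYS023", "SER024"), ("alfaC", "alfaC"), 3.8)
angle = (MOD3ATOMS, ("U09", "U10", "G11"), ("#0", "#0", "#1"), 175.2)
print(describe_setting(bond))
print(describe_setting(angle))
```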

Creating Blueprints

The programs to create blueprints are located in Yup/Models/rrRNAv1/lib/. They are incomplete implementations with two major defects: [1] there is no automatic processing of secondary structures, so these have to be translated manually (perhaps from RNAML), and [2] the input files must be heavily edited Protein Data Bank (PDB) files.

The mkprobp script processes a (stripped down) PDB file and generates a Protein blueprint file. This script takes two or three arguments. The first argument is the name of the PDB file. This file must be stripped down to only the ATOM and HETATM records of the alpha carbons of one protein. The second argument is the name of the output Protein blueprint file. Since the output is a Python program, the file should be given a name with the ".py" extension. The third argument is optional and it is the cutoff distance. If not specified, a distance of 8.0Å is used. The generated blueprint will contain two sets of distance constraints. Both cover all the alpha carbon pairs that are within the cutoff distance; in the THX example above, the two sets are named for rounding to the nearest 0.1Å and 0.01Å.
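The idea of collecting alpha carbon pairs within a cutoff can be sketched as follows. This is not the mkprobp implementation: pairs_within_cutoff is a hypothetical function, and whether the script treats the cutoff as inclusive, or skips neighbouring residues, is not stated here, so the sketch simply keeps every pair at or below the cutoff and rounds the distance to a chosen number of decimal places.

```python
import math
from itertools import combinations


def pairs_within_cutoff(coords, cutoff=8.0, places=1):
    """Return (i, j, distance) for every pair of points within `cutoff`.

    coords is a sequence of (x, y, z) tuples in Angstroms; i and j are
    zero-based indices into coords; distance is rounded to `places` decimals.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        distance = math.dist(a, b)
        if distance <= cutoff:
            pairs.append((i, j, round(distance, places)))
    return pairs


# Four alpha carbons on a line; the last one is far from the others.
CA = ((0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0))
print(pairs_within_cutoff(CA))
```

Each resulting pair would correspond to one two-atom distance setting in the generated constraint.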

The mkrnabp script processes a (stripped down) PDB file and generates an incomplete RNA blueprint file. This script takes three or four arguments. The first argument is the name of the stripped down PDB file. The file must contain only the ATOM or HETATM records for the phosphorus atoms of one RNA molecule. The second argument is the name of a module that defines the constraints. The file will probably have a name with the extension ".py", but the script needs the name of the module, not the file. The third argument is the name of the output RNA blueprint file. Since the output is a Python program, you would want to specify a name with the ".py" extension. The last argument is optional; it specifies the number of decimal places of precision to use. This must be 0, 1 or 2. If not specified, 2 is assumed.

Constraint Specifications

The constraint specification file is a Python program that defines a tuple named "TODO". TODO is a tuple of tuples. Each inner tuple contains a string as the first item; the other items are tuples of two, three or four atom names. For example,

TODO = ( ( "set1", ( "atom1", "atom2", "atom3" ), ( "atom1", "atom2" ) ), ( "set2", ( "atomx", "atomy" ), ( "atoma", "atomb", "atomc", "atomd" ) ) )

defines two constraints. The first is named "set1" and it contains two settings. The first setting is for an angle involving the atoms "atom1", "atom2" and "atom3". The second setting is for a bond involving the atoms "atom1" and "atom2". The second constraint is named "set2" and it has two settings: one for a bond and one for a torsion.

See Yup/Models/rrRNAv1/lib/CXtRNA.py for an example of a constraint specification. Note how the definition of TODO is built in steps using intermediate variables. This makes the file much easier to read. However, bear in mind that TODO is a deeply nested tuple.
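Building TODO in steps might look like the sketch below, which reuses the atom names from the example above. The intermediate variable names and the summarize helper are hypothetical and are not taken from CXtRNA.py; the helper only reports whether each setting is a bond, an angle or a torsion, judged by the number of atom names.

```python
# Number of atom names in a setting -> kind of interaction.
KIND = {2: "bond", 3: "angle", 4: "torsion"}

# Build TODO in steps: one intermediate variable per constraint.
set1 = ("set1", ("atom1", "atom2", "atom3"), ("atom1", "atom2"))
set2 = ("set2", ("atomx", "atomy"), ("atoma", "atomb", "atomc", "atomd"))
TODO = (set1, set2)


def summarize(todo):
    """Map each constraint name to the kinds of settings it contains."""
    summary = {}
    for name, *settings in todo:
        summary[name] = [KIND[len(atoms)] for atoms in settings]
    return summary


print(summarize(TODO))
# {'set1': ['angle', 'bond'], 'set2': ['bond', 'torsion']}
```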

