Introduction
The rrRNAv1 force field assembler requires a
description of the molecule. The description is
called a blueprint and it comes in two forms, one
for RNA and another for proteins. Blueprints are
Python dictionaries. It is possible to create the
dictionary within a Python script and then pass the
dictionary to the force field assembler within the
same script. However, it is better to define the
dictionary in a Python module so that it can be
used in many programs. In fact, the rrRNAv1 force
field assembler contains a library of ready-made
molecules and molecular assemblies.
RNA Blueprint
The RNA blueprint is a dictionary with the keys and associated values listed in the following table. A complete RNA blueprint has at least the first four keys, even if the associated values are empty tuples. The keys are always strings. Beyond the first four keys, the other keys fall into two groups. The "FIX" key references a tuple of strings that form the first group of keys. Each key in the other group references a constraint. For a blueprint B:
| Key | Value |
| "RNA" | Secondary structure: this is a tuple of three items. Since the tuple is recursive, it can be deeply nested. See Secondary Structure Description for details. |
| "BSQ" | Base sequence: tuple of uppercase strings - single letter code for the standard bases, multiple letters for modified bases. See Yup.Tools.ChemNames for acceptable names. |
| "XYZ" | Coordinates: tuple of tuples of three floats |
| "FIX" | Named sets: tuple of strings - each string is a key of B that maps to a list of names (next row). |
| any string of B["FIX"] | Sets of constraints: tuple of strings - each string is a key of B that maps to a constraint (next row). |
| any other string | Constraint: tuple of individual settings |
For a concrete example, see the file Yup/Models/rrRNAv1/TNA4/tRNAPhe.py, which defines a dictionary named BLUE. BLUE["FIX"] maps to a tuple of a single string: ( "Default", ). BLUE["Default"] maps to a tuple of 12 strings, each of which is the key to a constraint. For example, BLUE["T-loop angle"] is a constraint containing nine SET3ATOMS settings. In the next chapter, you will see how these constraints can be applied. One can specify the constraints as a single string out of BLUE["FIX"]; in this case there is only one choice, "Default", and the outcome is that all 12 constraints named in BLUE["Default"] are applied. Or, one can supply a list of strings, e.g. ( "T-loop distance", "D-loop/T-loop", "D-loop distance" ), and only these three constraints will be applied. Finally, one can also name None as a constraint, in which case no constraints are applied.
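The selection rules above can be sketched in a few lines of Python. This is only an illustration: the constants are stand-in strings (the real ones are in Yup.Models.rrRNAv1.const and their values are not shown here), the toy blueprint and its atom names are invented, and select_constraints is a hypothetical helper, not a function of the assembler. Treating None as "apply no constraints" is an assumption.

```python
# Stand-ins for constants that live in Yup.Models.rrRNAv1.const (values assumed).
DOMAIN, TRACT, HELIX = "DOMAIN", "TRACT", "HELIX"
SET2ATOMS = "SET2ATOMS"

# A toy RNA blueprint: the four required keys, one named set, two constraints.
# Sequence, coordinates and atom names are made up for illustration.
BLUE = {
    "RNA": (DOMAIN, "", ((HELIX, "stem", (1, 2, 7)), (TRACT, "loop", (3, 6)))),
    "BSQ": ("G", "C", "U", "U", "C", "G", "G", "C"),
    "XYZ": tuple((0.0, 0.0, 5.9 * i) for i in range(8)),  # placeholder coordinates
    "FIX": ("Default",),
    "Default": ("stem bond", "loop bond"),
    "stem bond": ((SET2ATOMS, ("G01", "C02"), ("#0", "#0"), 5.9),),
    "loop bond": ((SET2ATOMS, ("U03", "U04"), ("#0", "#0"), 5.9),),
}


def select_constraints(blueprint, choice):
    """Return the constraint names picked by `choice` (hypothetical helper).

    choice may be a string listed in blueprint["FIX"], a sequence of
    constraint names, or None (assumed here to mean no constraints).
    """
    if choice is None:
        return ()
    if isinstance(choice, str):
        if choice not in blueprint["FIX"]:
            raise KeyError("not a named set: " + choice)
        return tuple(blueprint[choice])
    missing = [name for name in choice if name not in blueprint]
    if missing:
        raise KeyError("unknown constraints: " + ", ".join(missing))
    return tuple(choice)


all_names = select_constraints(BLUE, "Default")
some_names = select_constraints(BLUE, ("loop bond",))
no_names = select_constraints(BLUE, None)
```

With this toy blueprint, "Default" selects both constraints, the explicit tuple selects only "loop bond", and None selects nothing.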
Protein Blueprint
The Protein blueprint is a dictionary with the following keys and associated values. A complete protein blueprint has at least the first three keys, even if the associated values are empty tuples. Keys are always strings. Beyond the first three keys, each of the remaining keys maps to a constraint. A constraint is a list of settings.
| Key | Value |
| "SEQ" | Amino acid sequence: tuple of strings, each string starts with the lower case single letter code for the amino acid followed by the sequence number. |
| "XYZ" | Coordinates: tuple of tuples of three floats |
| "FIX" | Names of default constraints: tuple of strings - each string is a key to a constraint (next row). These constraints are applied unless an alternative set is specified. |
| any other string | Constraint: tuple of individual settings. |
For a concrete example, see the file Yup/Models/rrRNAv1/GIX1/THX.py, which defines a dictionary named BLUE. BLUE["FIX"] maps to a tuple of a single string: ( "8A cutoff to nearest 0.1A", ). BLUE["8A cutoff to nearest 0.1A"] maps to a list of 102 distance settings. There is one other constraint, named "8A cutoff to nearest 0.01A".
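As a rough illustration of the layout described in the table, here is a tiny protein blueprint together with a consistency check. Everything in it is an assumption made for the sketch: SET2ATOMS is a stand-in string for the constant in Yup.Models.rrRNAv1.const, the residues, coordinates and atom names are invented, check_protein_blueprint is a hypothetical helper, and the rule "one coordinate per residue" is inferred rather than stated in this document.

```python
# Stand-in for the constant defined in Yup.Models.rrRNAv1.const (value assumed).
SET2ATOMS = "SET2ATOMS"

# A toy three-residue protein blueprint (all names and numbers are illustrative).
BLUE = {
    "SEQ": ("k1", "s2", "g3"),  # lower case one-letter code + sequence number
    "XYZ": ((0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)),
    "FIX": ("backbone",),       # constraints applied by default
    "backbone": (
        (SET2ATOMS, ("k1", "s2"), ("alfaC", "alfaC"), 3.8),
        (SET2ATOMS, ("s2", "g3"), ("alfaC", "alfaC"), 3.8),
    ),
}


def check_protein_blueprint(bp):
    """Return a list of problems found in a protein blueprint (hypothetical helper)."""
    problems = ["missing key: " + key for key in ("SEQ", "XYZ", "FIX") if key not in bp]
    if problems:
        return problems
    # Assumption: one alpha-carbon coordinate per residue.
    if len(bp["SEQ"]) != len(bp["XYZ"]):
        problems.append("SEQ and XYZ differ in length")
    # Every default constraint name must be a key of the blueprint.
    for name in bp["FIX"]:
        if name not in bp:
            problems.append("FIX names an undefined constraint: " + name)
    # Every setting must be a tuple of four items.
    for key, value in bp.items():
        if key in ("SEQ", "XYZ", "FIX"):
            continue
        for setting in value:
            if len(setting) != 4:
                problems.append("bad setting in " + key)
    return problems


problems = check_protein_blueprint(BLUE)
broken = {key: value for key, value in BLUE.items() if key != "XYZ"}
broken_problems = check_protein_blueprint(broken)
```

The toy blueprint passes the check, while the copy without "XYZ" is reported as incomplete.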
Secondary Structure Description
The secondary structure description is a nested (or
recursive) tuple built from three components. The
keywords DOMAIN, TRACT and
HELIX, defined in
Yup.Models.rrRNAv1.const,
identify these components. Each component has three
parts: a keyword, the name of the component and the
content of the component.
| Keyword | Name | Content |
| DOMAIN | A string to name this component. A group is created with this name, to hold the contents of this component. If the name is a blank string, the contents are placed in the parent group. | tuple of any number of DOMAIN, TRACT or HELIX. |
| TRACT | (as above) | tuple of two integers: ( x, y ) where y >= x > 0 defines a single stranded tract with the sequence number running from x to y. The length of the tract is y - x + 1. |
| HELIX | (as above) | tuple of three integers: ( x, y, z ) where z > x > 0 defines a double helix where the sequence number of one strand runs from x to x + y - 1 and the sequence number of the other strand runs from z to z + y - 1. |
For example, a stem-loop can be defined as:
( DOMAIN, "", ( ( HELIX, "stem", ( 1, 5, 10 ) ), ( TRACT, "loop", ( 6, 9 ) ) ) ).
This is an unnamed domain containing a helix and a tract. The helix is five basepairs long and is named "stem". One strand runs from 1 to 5 and the other from 10 to 14. An unpaired strand runs from 6 to 9 and it is named "loop". The rrRNAv1 force field assembler will place 10 P-atoms and 4 X-atoms in a group named "stem". Into another group named "loop" will be placed four P-atoms. The two groups are then placed in the root group.
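The numbering rules for TRACT and HELIX can be checked with a short recursive walk over the nested tuple. This is a minimal sketch, not the assembler's code: the three keywords are stand-in strings (the real constants are in Yup.Models.rrRNAv1.const), and residues is a hypothetical helper that only lists sequence numbers per group; it does not create atoms.

```python
# Stand-ins for the keywords defined in Yup.Models.rrRNAv1.const (values assumed).
DOMAIN, TRACT, HELIX = "DOMAIN", "TRACT", "HELIX"


def residues(component, path=""):
    """Yield (group path, sequence numbers) for every TRACT and HELIX found."""
    keyword, name, content = component
    # A blank name means the contents go into the parent group.
    here = path + "/" + name if name else path
    if keyword == DOMAIN:
        for child in content:
            yield from residues(child, here)
    elif keyword == TRACT:
        x, y = content
        if not y >= x > 0:
            raise ValueError("TRACT needs y >= x > 0")
        yield here, list(range(x, y + 1))
    elif keyword == HELIX:
        x, y, z = content
        if not z > x > 0:
            raise ValueError("HELIX needs z > x > 0")
        # One strand runs x .. x + y - 1, the other z .. z + y - 1.
        yield here, list(range(x, x + y)) + list(range(z, z + y))
    else:
        raise ValueError("unknown keyword: " + repr(keyword))


stem_loop = (DOMAIN, "", ((HELIX, "stem", (1, 5, 10)), (TRACT, "loop", (6, 9))))
groups = dict(residues(stem_loop))
```

For the stem-loop example, groups maps "/stem" to the ten residues 1 to 5 and 10 to 14, and "/loop" to the four residues 6 to 9, matching the P-atom counts given in the text.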
Settings
An individual setting is a tuple of four items:
( keyword, ( tuple-of-atom-names ), ( tuple-of-atom-inclusion-type-names ), target-value )
The keywords are defined in Yup.Models.rrRNAv1.const. There are six constants: SET2ATOMS, SET3ATOMS, SET4ATOMS, MOD2ATOMS, MOD3ATOMS and MOD4ATOMS.
The tuple of atom names lists the names of the interacting atoms. Thus, i names are required for an interaction that involves i atoms.
There must be the same number of atom inclusion type names. These are not arbitrary strings; there must be actual atom inclusion types defined in the parameter library. Furthermore, the list of atom types must correspond to an existing interaction type in the parameter library.
The target value has units of Angstroms for distances and degrees for angles.
In each case, the group of atoms is set to interact with the parameters referenced by the set of atom inclusion types. In 'SET' settings, we accept whatever equilibrium dimensions (distances, angles, torsion angles) the list of atom inclusion types references; the target value is ignored. In 'MOD' settings, we also change the equilibrium dimension to the target value. All interactions that make use of this parameter set will then be subject to the new equilibrium value.
For example:
( SET2ATOMS, ( 'LYS023', 'SER024' ), ( 'alfaC', 'alfaC' ), 3.8 )
sets a two-atom interaction, i.e. a bond between the atoms named 'LYS023' and 'SER024', and the bond will be of the type ":alfaC:alfaC:". The target value is 3.8Å but this is purely for documentation. The actual distance constraint is whatever equilibrium bond length is set for the ":alfaC:alfaC:" bond type.
Another example:
( MOD3ATOMS, ( 'U09', 'U10', 'G11' ), ( '#0', '#0', '#1' ), 175.2 )
sets a three-atom interaction, i.e. an angle between the three atoms 'U09', 'U10' and 'G11', and the angle will be of the type ":#0:#0:#1:". Furthermore, the equilibrium angle for this angle type will be set to 175.2 degrees. This means that all interactions that make use of this interaction type will now be subject to an equilibrium angle of 175.2 degrees.
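The difference between 'SET' and 'MOD' settings can be sketched with a toy parameter library. The code below is an illustration under stated assumptions, not the assembler's implementation: the keywords are stand-in strings, the library is a plain dictionary from interaction type to equilibrium value, the starting value of 150.0 and the atom names in the first setting are invented, and apply_setting is a hypothetical helper.

```python
# Stand-ins for constants from Yup.Models.rrRNAv1.const (values assumed).
SET3ATOMS, MOD3ATOMS = "SET3ATOMS", "MOD3ATOMS"

# Toy parameter library: interaction type -> equilibrium value (150.0 is made up).
library = {":#0:#0:#1:": 150.0}
interactions = []  # (atom names, interaction type) pairs that have been set


def apply_setting(setting, library, interactions):
    """Record one interaction; a MOD keyword also rewrites the type's equilibrium."""
    keyword, atoms, types, target = setting
    if len(atoms) != len(types):
        raise ValueError("need one atom inclusion type per atom")
    type_name = ":" + ":".join(types) + ":"
    if type_name not in library:
        raise KeyError("no such interaction type: " + type_name)
    if keyword.startswith("MOD"):
        library[type_name] = target  # affects every user of this type
    interactions.append((atoms, type_name))


# SET: the target value (0.0 here) is ignored; the library value is kept.
apply_setting((SET3ATOMS, ("C05", "C06", "G07"), ("#0", "#0", "#1"), 0.0),
              library, interactions)
before = library[":#0:#0:#1:"]

# MOD: the equilibrium angle of the whole interaction type becomes 175.2.
apply_setting((MOD3ATOMS, ("U09", "U10", "G11"), ("#0", "#0", "#1"), 175.2),
              library, interactions)
equilibria = [library[type_name] for _, type_name in interactions]
```

After the second call both interactions see 175.2 degrees, including the one created earlier with a 'SET' keyword, because they share the same interaction type.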
Creating Blueprints
The programs that create blueprints are located in Yup/Models/rrRNAv1/lib/ and are incomplete implementations. Two major defects are: [1] there is no automatic processing of secondary structures, which have to be translated manually (perhaps from RNAML), and [2] the input files must be heavily edited Protein Data Bank (PDB) files.
The mkprobp script processes a (stripped down) PDB file and generates a Protein blueprint file. This script takes two or three arguments. The first argument is the name of the PDB file. This file must be stripped down to only the ATOM and HETATM records of the alpha carbons of one protein. The second argument is the name of the output Protein blueprint file. Since the output is a Python program, the file should be given a name with the ".py" extension. The third argument is optional and it is the cutoff distance. If not specified, a distance of 8.0Å is used. The generated blueprint will contain two sets of distance constraints. These are all the alpha carbon pairs that are within the cutoff distance.
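The idea of collecting all alpha carbon pairs within a cutoff can be sketched as follows. This is not the mkprobp source: pairs_within_cutoff is a hypothetical helper, the names and coordinates are invented, and two details are assumptions, namely that a pair exactly at the cutoff is included and that the two generated sets differ only in how the distance is rounded (as the constraint names "to nearest 0.1A" and "to nearest 0.01A" in THX.py suggest).

```python
import math
from itertools import combinations


def pairs_within_cutoff(names, coords, cutoff=8.0, places=1):
    """Return (name_a, name_b, rounded distance) for every pair within the cutoff."""
    found = []
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        distance = math.dist(a, b)
        if distance <= cutoff:  # assumption: the cutoff itself is included
            found.append((names[i], names[j], round(distance, places)))
    return found


# Three invented alpha carbons on a line (coordinates in Angstroms).
names = ("k1", "s2", "g3")
coords = ((0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (12.0, 0.0, 0.0))

coarse = pairs_within_cutoff(names, coords, cutoff=8.0, places=1)
wide = pairs_within_cutoff(names, coords, cutoff=10.0, places=2)
```

With the default 8.0Å cutoff only the k1/s2 pair qualifies; widening the cutoff to 10.0Å also picks up s2/g3, which are 8.2Å apart.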
The mkrnabp script processes a (stripped down) PDB file and generates an incomplete RNA blueprint file. This script takes three or four arguments. The first argument is the name of the stripped down PDB file. The file must contain only the ATOM or HETATM records for the phosphorus atoms of one RNA molecule. The second argument is the name of a module that defines the constraints. The file will probably have a name with the extension ".py", but the script needs the name of the module, not the file. The third argument is the name of the output RNA blueprint file. Since the output is a Python program, you would want to specify a name with the ".py" extension. The last argument is optional; it specifies the number of decimal places of precision to use. This must be 0, 1 or 2 only. If not specified, 2 is assumed.
Constraint Specifications
The constraint specification file is a Python program that defines a tuple named "TODO". TODO is a tuple of tuples. Each inner tuple contains a string as the first item, and the other items are two-, three- or four-item tuples whose contents are atom names. For example,
TODO = ( ( "set1", ( "atom1", "atom2", "atom3" ), ( "atom1", "atom2" ) ), ( "set2", ( "atomx", "atomy" ), ( "atoma", "atomb", "atomc", "atomd" ) ) )
defines two constraints. The first is named "set1" and it contains two settings. The first setting is for an angle involving the atoms "atom1", "atom2" and "atom3". The second setting is for a bond involving the atoms "atom1" and "atom2". The second constraint is named "set2" and it has two settings: for a bond and a torsion.
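How the TODO tuple is read can be illustrated by classifying each setting by the number of atom names it holds. The describe function and the KIND table are hypothetical names introduced for this sketch; they are not part of mkrnabp.

```python
# The example specification from the text.
TODO = (
    ("set1", ("atom1", "atom2", "atom3"), ("atom1", "atom2")),
    ("set2", ("atomx", "atomy"), ("atoma", "atomb", "atomc", "atomd")),
)

# Number of atom names -> kind of interaction.
KIND = {2: "bond", 3: "angle", 4: "torsion"}


def describe(todo):
    """Map each constraint name to the kinds of settings it contains."""
    summary = {}
    for name, *settings in todo:
        kinds = []
        for atoms in settings:
            if len(atoms) not in KIND:
                raise ValueError("a setting needs two, three or four atom names")
            kinds.append(KIND[len(atoms)])
        summary[name] = kinds
    return summary


summary = describe(TODO)
```

For the example, summary records an angle and a bond under "set1" and a bond and a torsion under "set2", which is the reading given in the text.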
See Yup/Models/rrRNAv1/lib/CXtRNA.py
for an example of a constraint specification. Note
how the definition of TODO is built in steps using
intermediate variables. This makes the file much
easier to read. However, bear in mind that TODO is
a deeply nested tuple.