This page describes the format of an Atom Vector file.
The Atom Vector file, also called an av file or AVF, consists of a
header, an encoded Atom Map, an Edge record and any number of Atom
Vector records. An AVF can store a single record of any type, such as
coordinates, gradients or velocities. It can also store multiple
records of any number of Atom Vector types or, more usually, of one
type; for example, a molecular dynamics trajectory.
The data are stored in unformatted (binary) form with predictable
widths, which allows a given record to be located quickly. The Atom
Vectors are stored as single precision floating point numbers.
Header
Every AVF must have a three-part header. The first part is:
| Data | C Type |
| File Tag = "AtmVctr" | char[8] |
| Version Number | int |
| File ID = 0x41564543 | int |
| Number of Atoms | int |
| Number of Dimensions | int |
The File Tag and the File ID must have the exact values listed in the
above table. The Version Number can have any value and can be used in
any way, but it cannot be omitted. All the Taro utilities create AVFs
with a Version Number of zero; they read this number but ignore it. The
Number of Atoms and the Number of Dimensions determine the size of each
Atom Vector record: each record has as many items as the product of
these two numbers.
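The first header part can be decoded with a few lines of Python. This is
only a sketch: it assumes 32-bit ints, no padding between fields and a
byte order chosen by the caller. The names parse_header_part1 and
byte_order are hypothetical and are not part of the Taro utilities.

```python
import struct

FILE_TAG = b"AtmVctr"   # stored in a char[8] field, null padded
FILE_ID = 0x41564543    # required File ID from the table above


def parse_header_part1(buf, byte_order="<"):
    """Decode the first header part; byte_order is '<' (little) or '>' (big)."""
    # Assumption: char[8] followed by four 32-bit ints with no padding.
    fmt = byte_order + "8s4i"
    tag, version, file_id, n_atoms, n_dims = struct.unpack_from(fmt, buf, 0)
    if tag.rstrip(b"\0") != FILE_TAG:
        raise ValueError("unexpected File Tag: %r" % tag)
    if file_id != FILE_ID:
        raise ValueError("unexpected File ID: %#x" % file_id)
    return {
        "version": version,
        "n_atoms": n_atoms,
        "n_dims": n_dims,
        # Each record holds Number of Atoms x Number of Dimensions items.
        "items_per_record": n_atoms * n_dims,
        "size": struct.calcsize(fmt),
    }


# Build a sample header in memory: version 0, 3 atoms, 3 dimensions.
sample = struct.pack("<8s4i", FILE_TAG, 0, FILE_ID, 3, 3)
header = parse_header_part1(sample)
print(header["items_per_record"], header["size"])
```

Under these assumptions the first part occupies 24 bytes, and the item
count per record follows directly from the last two fields.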
The second part of the header consists of a length count and the
encoded form of the Atom Map:
| Data | C Type |
| Length of Encoded Atom Map (not including the terminating null character) | int |
| Encoded Atom Map | char[] |
If there is no atom map, this part of the header contains only the
length item, which has the value zero. The encoded atom map must
contain exactly the same number of atoms as declared earlier. An atom
vector will be read from an AVF into a mapped Atom Vector only if the
hierarchy of atom names matches. When reading into an anonymous Atom
Vector, this part of the header is skipped.
The encoded Atom Map follows very strict formatting rules. It must be
balanced and complete, and it should not be written by hand.
The last part of the header consists of:
| Data | C Type |
| Length of the Edge List | int |
| Edge List | <not implemented> |
The edge record is used to draw bond connections in the visual
representation of molecules. If this record is not present, the bonds
can be created using the usual distance criterion. The edge list is not
yet implemented; only the length count, which should be zero, is
expected to be present.
Records
There may be any number of Atom Vector records. The only restriction is
that each record contains the same number of items, i.e.,
[Number of Atoms] x [Number of Dimensions] single precision floating
point numbers. The Atom Vectors must also belong to the same atom map,
if there is one.
| Data | C Type |
| Type ID | int |
| Annotation | char[80] |
| Centroid | float [Number of Dimensions] |
| Atom Vector | float [Number of Atoms] [Number of Dimensions] |
The Type ID is a number that identifies the vector type. For example,
one may use -1 to identify a coordinate vector, -2 for a velocity
vector and so on. (The only official ID is zero; negative IDs are
semi-official; user IDs should be positive numbers.) When a record is
read from a file, the request includes the expected Type ID.
If the expected Type ID does not match the ID read from the file, the
Atom Vector is not read and an error condition is flagged. If the
expected Type ID is zero, the matching criterion is not enforced. If
the Type ID encoded in the file is zero, any Type ID expected by the
user will be matched.
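The matching rule described above can be written as a single predicate.
The function name type_id_matches is hypothetical; this is a sketch of
the rule as stated here, not the code used by the Taro utilities.

```python
def type_id_matches(expected, stored):
    """Return True if a record whose Type ID is `stored` satisfies a
    request for Type ID `expected`."""
    # Zero on either side acts as a wildcard; otherwise the IDs must agree.
    return expected == 0 or stored == 0 or expected == stored


print(type_id_matches(-1, -1))  # same ID on both sides
print(type_id_matches(0, -2))   # request does not enforce a type
print(type_id_matches(-1, 0))   # file record accepts any request
print(type_id_matches(-1, -2))  # mismatch: the record would not be read
```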
The Annotation is a text field of up to 79 characters (the char[80]
field leaves room for a terminating null character). If the user enters
a longer note, the excess characters are simply removed. The annotation
field is always present, although it may hold only null characters. The
annotation is printed on the standard error stream when a record is
read.
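The fixed-width annotation field can be illustrated as follows. This is
a sketch that assumes ASCII text and null padding up to 80 bytes; the
helper names encode_annotation and decode_annotation are hypothetical.

```python
ANNOTATION_WIDTH = 80  # char[80] in the record layout


def encode_annotation(note):
    """Truncate a note to 79 characters and pad it with nulls to 80 bytes."""
    # Assumption: annotations are ASCII; other characters become '?'.
    raw = note.encode("ascii", errors="replace")[:ANNOTATION_WIDTH - 1]
    return raw.ljust(ANNOTATION_WIDTH, b"\0")


def decode_annotation(field):
    """Return the text up to the first null character."""
    return field.split(b"\0", 1)[0].decode("ascii", errors="replace")


field = encode_annotation("step 100, T = 300 K")
print(len(field), decode_annotation(field))
```

An empty note yields a field of 80 null characters, which matches the
case where the annotation holds only nulls.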
The coordinate information is stored as a centroid together with the
coordinates centered at the origin.
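A record laid out as in the table above can be decoded as sketched
below. The sketch assumes 32-bit ints and floats, no padding, atom-major
ordering of the Atom Vector (as the C array type suggests), and that the
original coordinates are recovered by adding the centroid back to each
centered coordinate. The function name unpack_record is hypothetical.

```python
import struct


def unpack_record(buf, n_atoms, n_dims, byte_order="<"):
    """Decode one record: Type ID, annotation, centroid and coordinates."""
    n_items = n_atoms * n_dims
    # int Type ID, char[80] annotation, then centroid and atom vector floats.
    fmt = "%si80s%df" % (byte_order, n_dims + n_items)
    fields = struct.unpack_from(fmt, buf, 0)
    type_id = fields[0]
    annotation = fields[1].split(b"\0", 1)[0].decode("ascii", errors="replace")
    centroid = list(fields[2:2 + n_dims])
    centered = fields[2 + n_dims:]
    # Assumption: absolute coordinate = centered coordinate + centroid.
    coords = [
        [centered[a * n_dims + d] + centroid[d] for d in range(n_dims)]
        for a in range(n_atoms)
    ]
    return type_id, annotation, centroid, coords


# Two atoms at (1, 2, 3) and (3, 4, 5): centroid (2, 3, 4), offsets -1 and +1.
sample = struct.pack(
    "<i80s9f", -1, b"frame 1",
    2.0, 3.0, 4.0,
    -1.0, -1.0, -1.0,
    1.0, 1.0, 1.0,
)
type_id, note, centroid, coords = unpack_record(sample, n_atoms=2, n_dims=3)
print(type_id, note, centroid, coords)
```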
Variants
The AVF is an unformatted file; this is the form used in all programs.
Records in the file header are either fixed width or variable width
with a length indicator. All records in the body of the file have a
fixed width; if necessary, data fields are padded with null entries.
This allows a record to be located easily and, once located, to be read
quickly.
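Because every body record has the same width, the position of any record
follows from simple arithmetic. The sketch below assumes 4-byte ints and
floats and no padding between fields; records_start (the byte offset of
the first record, known once the header has been read) and the function
names are hypothetical.

```python
INT_SIZE = 4          # assumption: 32-bit int
FLOAT_SIZE = 4        # single precision float
ANNOTATION_SIZE = 80  # char[80]


def record_size(n_atoms, n_dims):
    """Bytes in one record: Type ID + annotation + centroid + atom vector."""
    return (INT_SIZE + ANNOTATION_SIZE
            + FLOAT_SIZE * n_dims
            + FLOAT_SIZE * n_atoms * n_dims)


def record_offset(records_start, index, n_atoms, n_dims):
    """Byte offset of record `index` (counting from zero)."""
    return records_start + index * record_size(n_atoms, n_dims)


# With no atom map and no edge list, the header would be 24 + 4 + 4 = 32
# bytes under the assumptions above.
print(record_size(2, 3))
print(record_offset(32, 2, 2, 3))
```

A reader can seek directly to the computed offset instead of scanning
the preceding records.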
The older binary format is still supported, but files in that format
should be converted to the current format. The main differences are
that the older format uses "atmvctr" as the File Tag and stores the
coordinates without recentering, as double precision floating point
numbers. Thus, the older files are about twice the size of files in the
current format.
AVFs are not meant to be transferred between different classes of
machines. However, the following types of machines can use the same
AVFs without conversion because they store data in the same byte order:
IRIX/MIPS, Linux/PowerPC and MacOS (Classic and X)/PowerPC. The other
camp is represented by Linux/Intel, whose hardware uses a different
byte order. Among machines with the same byte order, AVFs are
compatible if ints are 32 bits wide; this includes many 64-bit
machines.
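One way to tell which byte order a binary AVF was written in is to
decode the File ID both ways and see which interpretation gives
0x41564543. This is a sketch under the assumption that the File ID sits
at byte offset 12 (after the char[8] tag and a 32-bit Version Number);
it is not necessarily how Yup.avf detects an alien byte order, and the
function name is hypothetical.

```python
import struct

FILE_ID = 0x41564543


def detect_byte_order(header_bytes):
    """Return '<' (little endian) or '>' (big endian) for an AVF header."""
    for order in ("<", ">"):
        # Assumption: File ID is a 32-bit int at offset 12.
        (file_id,) = struct.unpack_from(order + "i", header_bytes, 12)
        if file_id == FILE_ID:
            return order
    raise ValueError("File ID not recognized in either byte order")


big = struct.pack(">8s4i", b"AtmVctr", 0, FILE_ID, 10, 3)
little = struct.pack("<8s4i", b"AtmVctr", 0, FILE_ID, 10, 3)
print(detect_byte_order(big), detect_byte_order(little))
```

In big-endian order the four File ID bytes spell the ASCII text "AVEC";
in little-endian order they appear reversed.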
AVFs should be converted to a formatted form before they are
transferred between machines with different byte order, or before
storage. Compress the formatted file (using GZIP, for example) before
archival storage; the archive can later be used on any platform. The
formatted form of this type of file is shown in outline below. Use the
Yup.avf program to convert between the formatted form and the binary
AVF file, or from an alien byte order to the native order.
ATMVCTR version-number 1096172867 number-of-atoms number-of-dimensions 500 19
&length-of-atommap
Atom Map ...
...
~length-of-edgelist
#1 record-type annotation
@0 dimension-1 dimension-2
dimension-3 ...
...
@1 ...
#2 record-type annotation
@0 ...
Shown in red are the values that must appear exactly as shown. The
first line must start with the text "ATMVCTR", followed by the file
version number (any value), a magic number (1096172867), the number of
atoms, the number of dimensions, the column width (500) and the maximum
number of dimensions on a line (19). The second line must start with
the checkpoint character "&" followed by the length of the atom map
(number of characters). The atom map is then listed on the following
lines; none of these lines, except the last, may exceed the column
width declared on line 1. Next comes the "~" checkpoint character
followed by the length of the edge list; the edge list itself is listed
on the following lines.
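The first line of the formatted form can be checked with a short parser.
This sketch splits on white space and validates only the leading text
and the magic number; the function name and the keys of the returned
dictionary are hypothetical.

```python
MAGIC = 1096172867  # decimal form of the File ID 0x41564543


def parse_formatted_first_line(line):
    """Split the first line of a formatted AVF into its seven fields."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError("expected 7 fields, found %d" % len(fields))
    if fields[0] != "ATMVCTR":
        raise ValueError("line does not start with ATMVCTR")
    version, magic, n_atoms, n_dims, width, per_line = map(int, fields[1:])
    if magic != MAGIC:
        raise ValueError("unexpected magic number: %d" % magic)
    return {
        "version": version,
        "n_atoms": n_atoms,
        "n_dims": n_dims,
        "column_width": width,
        "dims_per_line": per_line,
    }


info = parse_formatted_first_line("ATMVCTR\t0\t1096172867\t3\t3\t500\t19")
print(info)
```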
Each record starts with the "#" checkpoint character followed by the
sequence number; sequence numbers must start at 1 and be consecutive.
This is followed on the same line by the record type (an integer) and
the annotation of up to 79 characters. The coordinates are listed
consecutively for each atom, starting from atom 0. Each listing starts
with the "@" checkpoint character followed by the atom number and then
the coordinates, continuing over additional lines if the number of
dimensions exceeds 19.
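The record layout described above can be sketched as a small formatter.
The number format ("%g") and the use of a tab as the separator are
assumptions made for illustration; the exact output of Yup.avf may
differ, and the function name format_record is hypothetical.

```python
def format_record(seq, type_id, annotation, coords, per_line=19):
    """Return the text lines for one record of a formatted AVF."""
    # "#" checkpoint, sequence number, record type, annotation (max 79 chars).
    lines = ["#%d\t%d\t%s" % (seq, type_id, annotation[:79])]
    for atom, values in enumerate(coords):
        # Assumption: coordinates are printed with "%g".
        text = ["%g" % v for v in values]
        # "@" checkpoint and atom number, then up to per_line values.
        lines.append("\t".join(["@%d" % atom] + text[:per_line]))
        # Remaining values continue on additional lines.
        for i in range(per_line, len(text), per_line):
            lines.append("\t".join(text[i:i + per_line]))
    return lines


record = format_record(1, -1, "start", [[0.0, 1.5], [2.0, 3.0]])
print("\n".join(record))
```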
Use Yup.avf to convert an existing AVF file to a formatted text dump,
which you can then examine to figure out the layout.
Several fields of the file can contain arbitrary text that includes
white space, which makes the file hard to parse. The checkpoint
characters help to determine whether the parsing is on track. It is
preferable to use the tab character instead of a single space. Do not
use more than one space character. Do not insert white space or other
characters before or after data. Do not insert blank lines except to
separate records.
Older UNIX editors may not be able to handle the long lines in this
file. Your text editor may automatically fold long lines.
Using AVFs
We suggest that simulation procedures include a step to convert any
generated AVFs to a formatted text dump, which is then compressed using
GZIP. Then save the job script file, the standard output from its
execution, the compressed AVF text dump and other files relating to the
project to a storage medium.
The compressed AVF text dump has to be decompressed and converted back
to an unformatted AVF before it can be analyzed. However, because the
Atom Vector data are stored in text form, they can be converted to the
binary form on any platform, even one that has the opposite byte order
from the machine that produced the data.
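The compression step can be done with the gzip command line tool or, as
sketched here, with Python's standard gzip module. The function names
are hypothetical, and the conversion between the binary AVF and the text
dump is still left to Yup.avf.

```python
import gzip
import shutil


def compress_dump(text_path, gz_path):
    """Compress a formatted AVF text dump for archival storage."""
    with open(text_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def decompress_dump(gz_path, text_path):
    """Restore the text dump so it can be converted back to a binary AVF."""
    with gzip.open(gz_path, "rb") as src, open(text_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Because the archive holds plain text, it can be restored on a machine of
either byte order before conversion.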
Utilities
Yup.avf converts between the formatted and unformatted forms of the
AVF, and between little-endian and big-endian files.