Technical Documentation
The Atom Vector File

This page describes the format of an Atom Vector file.

The Atom Vector file (av file, or AVF) consists of a header, an encoded Atom Map, an Edge record and any number of Atom Vector records. An AVF can store a single record of any type, such as coordinates, gradients or velocities. It can also store multiple records, either of several Atom Vector types or, more usually, of one type, as in a molecular dynamics trajectory.

The data are stored in unformatted form with predictable widths. This allows a given record to be located quickly. The Atom Vectors are stored as single precision floating point numbers.

Header

Every AVF must have a three-part header. The first part is:

Data                      C Type
File Tag = "AtmVctr"      char[8]
Version Number            int
File ID = 0x41564543      int
Number of Atoms           int
Number of Dimensions      int

The File Tag and the File ID must have exactly the values listed in the table above. The Version Number may hold any value and may be used for any purpose, but it cannot be omitted. All the Taro utilities create AVFs with a Version Number of zero; they read this number but ignore it. The Number of Atoms and the Number of Dimensions determine the size of each Atom Vector record: each record has as many items as the product of these two numbers.
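The first part of the header can be decoded with a few lines of Python. This is a minimal sketch, not the Taro reader: it assumes 32-bit ints and no padding between fields (24 bytes in total), and the function name is hypothetical. Trying both byte orders until the File ID matches is a convenience of this sketch, not documented Taro behaviour. The bytes of the File ID, 0x41 0x56 0x45 0x43, spell "AVEC" in ASCII.

```python
import struct

FILE_TAG = b"AtmVctr"      # 7 characters stored in a char[8] field
FILE_ID = 0x41564543       # ASCII "AVEC"

def parse_fixed_header(buf):
    """Decode the first 24 bytes of an AVF (hypothetical helper).

    Returns (byte_order, version, natoms, ndims), where byte_order is
    "<" for little endian or ">" for big endian.
    """
    if len(buf) < 24:
        raise ValueError("header too short")
    if buf[:7] != FILE_TAG:
        raise ValueError("not an AVF: bad File Tag")
    # Assumption: four 32-bit ints follow the 8-byte tag without padding.
    for order in ("<", ">"):
        version, file_id, natoms, ndims = struct.unpack(order + "4i", buf[8:24])
        if file_id == FILE_ID:
            return order, version, natoms, ndims
    raise ValueError("not an AVF: bad File ID")
```

The returned byte order can then be reused when unpacking the remaining ints and floats in the file.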

The second part of the header consists of a length count and the encoded form of the Atom Map:

Data                                                                        C Type
Length of Encoded Atom Map (not including the terminating null character)   int
Encoded Atom Map                                                            char[]

If there is no atom map, this part of the header contains only the length item, which has the value zero. The encoded atom map must contain exactly the same number of atoms as declared in the first part of the header. An atom vector is read from an AVF into a mapped Atom Vector only if the hierarchy of atom names matches. When reading into an anonymous Atom Vector, this part of the header is skipped.

The encoded Atom Map follows very strict formatting rules. It must be balanced and complete. This is not something that should be formed by hand.

The last part of the header consists of:

Data                      C Type
Length of the Edge List   int
Edge List                 <not implemented>

The edge record is used to draw bond connections in the visual representation of molecules. If this record is not present, the bonds can be created using the usual distance criterion. The edge list is not yet implemented; only the length count, which should be zero, is expected to be present.

Records

There may be any number of Atom Vector records. The only restriction is that each record contains the same number of items, i.e., [Number of Atoms] x [Number of Dimensions] single precision floating point numbers. The Atom Vectors must also belong to the same atom map, if there is one.

Data          C Type
Type ID       int
Annotation    char[80]
Centroid      float [Number of Dimensions]
Atom Vector   float [Number of Atoms] [Number of Dimensions]

The Type ID is a number that identifies the vector type. For example, one may use -1 to identify a coordinate vector, -2 for a velocity vector and so on. (The only official ID is zero; negative IDs are semi-official; user IDs should be positive numbers.) When a record is read from file, the request includes the expected Type ID. If the expected Type ID does not match the ID read from the file, the Atom Vector is not read and an error condition is flagged. If the expected Type ID is zero, the matching criterion is not enforced. If the Type ID encoded in the file is zero, any Type ID expected by the user will be matched.
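The matching rule described above reduces to a single test. The following Python sketch restates it; the function name is hypothetical and is not part of the Taro utilities.

```python
def type_id_matches(expected, stored):
    """Return True if a record with Type ID `stored` may be read
    by a request that expects Type ID `expected`.

    Zero acts as a wildcard on either side; otherwise the IDs must be equal.
    """
    return expected == 0 or stored == 0 or expected == stored
```

For example, a request for Type ID -1 accepts records stored with ID -1 or 0, and rejects a record stored with ID -2.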

The Annotation is a text field of up to 79 characters, stored in a fixed 80-byte field. If the user enters a longer note, the excess characters are simply discarded. The annotation field is always present. It may hold only null characters. The annotation is printed on the standard error stream when a record is read.
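The truncation and padding of the annotation can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, and ASCII encoding is assumed because the source does not name a character set.

```python
ANNOTATION_WIDTH = 80   # size of the char[80] field

def encode_annotation(text):
    """Truncate to 79 characters and pad with nulls to exactly 80 bytes."""
    raw = text.encode("ascii", errors="replace")[:ANNOTATION_WIDTH - 1]
    return raw.ljust(ANNOTATION_WIDTH, b"\x00")

def decode_annotation(field):
    """Return the text up to, but not including, the first null byte."""
    return field.split(b"\x00", 1)[0].decode("ascii", errors="replace")
```

Because the field is always 80 bytes wide, an empty annotation is written as 80 null bytes.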

Coordinate information is stored as a centroid plus the coordinates relative to that centroid, i.e., recentered at the origin.
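The relation between the stored values and the original coordinates can be sketched in Python. The function names are hypothetical, and the centroid is assumed here to be the unweighted mean of the atom positions; the source does not say how the centroid is computed. Restoring the original coordinates only requires adding the centroid back to each atom.

```python
def split_centroid(coords):
    """Split coords (a list of per-atom lists) into (centroid, centered).

    Assumption: the centroid is the unweighted mean over all atoms.
    """
    natoms = len(coords)
    ndims = len(coords[0])
    centroid = [sum(atom[d] for atom in coords) / natoms for d in range(ndims)]
    centered = [[atom[d] - centroid[d] for d in range(ndims)] for atom in coords]
    return centroid, centered

def restore_coords(centroid, centered):
    """Add the centroid back to each recentered atom position."""
    return [[c + x for c, x in zip(centroid, atom)] for atom in centered]
```

One likely benefit of this layout is that recentered values stay small, which helps when they are stored in single precision.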

Variants

The AVF is an unformatted file. This is the form that is used in all programs. Records in the file header are either fixed width or variable width but with a length indicator. All records in the body of the file have a fixed width; if necessary, data fields are padded with null entries. This allows records to be easily located, and once a record is located, it can be read quickly.
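Because every record has the same width, the position of any record follows directly from the record layout. A minimal Python sketch, assuming 4-byte ints and floats with no padding between fields; the function names and the header_size parameter are hypothetical. The caller must supply header_size, since the header length depends on the length of the encoded atom map.

```python
def record_size(natoms, ndims):
    """Bytes in one record: Type ID + Annotation + Centroid + Atom Vector."""
    type_id = 4                    # int
    annotation = 80                # char[80]
    centroid = 4 * ndims           # float[ndims]
    vector = 4 * natoms * ndims    # float[natoms][ndims]
    return type_id + annotation + centroid + vector

def record_offset(header_size, natoms, ndims, index):
    """Byte offset of record `index` (counted from 0) from the start of the file."""
    return header_size + index * record_size(natoms, ndims)
```

With 10 atoms in 3 dimensions each record occupies 216 bytes, so a reader can seek straight to any frame of a trajectory without scanning the preceding ones.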

The older binary format is still supported, but files in it should be converted to the current format. The main differences are that the older format uses "atmvctr" as the File Tag and stores the coordinates without recentering, as double precision floating point numbers. Thus, the older files are about twice the size of files in the current version.

AVFs are not meant to be transferred between different classes of machines. However, the following types of machines can use the same AVFs without conversion because they order the data in the same direction: IRIX/MIPS, Linux/PowerPC and MacOS (Classic and X)/PowerPC. The other camp is represented by Linux/Intel, whose hardware has a different byte order from the other machines. Among machines with the same byte order, AVFs are compatible provided that ints are defined to be 32 bits wide; this includes many 64-bit machines.

AVFs should be converted to a formatted form before transferring between machines with different data order or for storage. Compress the formatted file (using GZIP for example) before archival storage; the archive can later be used on any platform. The formatted form of this type of file is shown in outline form below. Use the Yup.avf program to convert between the formatted form and the binary AVF file, or from an alien byte-order to the native order.

ATMVCTR  version-number  1096172867  number-of-atoms  number-of-dimensions  500  19
&length-of-atommap
Atom Map ...
...
~length-of-edgelist

#1  record-type  annotation
@0  dimension-1  dimension-2  dimension-3  ...
    ...
@1  ...

#2  record-type annotation
@0  ...

Literal values in the outline, as opposed to descriptive placeholders such as version-number, must appear exactly as shown. The first line must start with the text "ATMVCTR", followed by the file version number (any value), a magic number (1096172867, the decimal form of the File ID 0x41564543), the number of atoms, the number of dimensions, the column width (500) and the maximum number of dimensions on a line (19). The second line must start with the checkpoint character "&" followed by the length of the atom map (number of characters). The atom map is then listed on the following lines; no line may exceed the column width declared in line 1, except for the last line. Following this is the "~" checkpoint character followed by the length of the edge list, which is listed on the following lines.
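The first line of the formatted file can be checked with a short Python sketch. The function name is hypothetical; the sketch verifies only the tag and the magic number and returns the remaining fields unchecked, since it is not the Yup.avf parser.

```python
MAGIC = 1096172867   # same value as the binary File ID, 0x41564543

def parse_formatted_header(line):
    """Parse line 1 of a formatted AVF dump.

    Returns (version, natoms, ndims, column_width, dims_per_line).
    """
    fields = line.split()
    if len(fields) != 7 or fields[0] != "ATMVCTR":
        raise ValueError("first line must start with ATMVCTR and hold 7 fields")
    version, magic, natoms, ndims, width, per_line = (int(f) for f in fields[1:])
    if magic != MAGIC:
        raise ValueError("bad magic number")
    return version, natoms, ndims, width, per_line
```

A header line for 10 atoms in 3 dimensions, with tab-separated fields, would parse to (0, 10, 3, 500, 19).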

Each record starts with the "#" checkpoint character followed by the sequence number which must start with 1 and be consecutive. This is followed on the same line by the record type (an integer) and the annotation up to 79 characters. The coordinates are listed consecutively for each atom starting from atom 0. Each listing starts with the "@" checkpoint followed by the atom number and then the coordinates, continuing over additional lines if the number of dimensions exceeds 19.
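The record line can be parsed while using the checkpoint character and the sequence number to confirm that parsing is still on track. This Python sketch uses a hypothetical function name and assumes that the sequence number follows "#" directly, as in the outline above.

```python
def parse_record_line(line, expected_seq):
    """Parse a '#seq type annotation' line; return (type_id, annotation)."""
    if not line.startswith("#"):
        raise ValueError("missing '#' checkpoint character")
    # Split off the two integers; the rest is free text and may contain spaces.
    parts = line[1:].rstrip("\n").split(None, 2)
    if len(parts) < 2:
        raise ValueError("record line needs a sequence number and a record type")
    seq, type_id = int(parts[0]), int(parts[1])
    if seq != expected_seq:
        raise ValueError("record out of sequence")
    annotation = parts[2] if len(parts) > 2 else ""
    return type_id, annotation[:79]
```

Limiting the split to the first two fields keeps any white space inside the annotation intact.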

Use Yup.avf to convert an existing AVF file to a formatted text dump, which you can then examine to figure out the layout.

Several fields of the file can contain arbitrary text that includes white space. This makes the file hard to parse. The checkpoint characters help to determine whether the parsing is on track. It is preferable to use the tab character instead of a single space. Do not use more than one space character. Do not insert white space or other characters before or after data. Do not insert blank lines except to separate records.

Older UNIX editors may not be able to handle the long lines in this file. Your text editor may automatically fold long lines.

Using AVFs

We suggest that simulation procedures include a step to convert any generated AVFs to a formatted text dump, which is then compressed using GZIP. Then save the job script file, the standard output from the execution of the file, the compressed AVF text dump and other files relating to the project, to a storage medium.

The compressed AVF text dump has to be decompressed and converted back to an unformatted AVF before it can be analyzed. However, because the Atom Vector data are stored in text form, they can be converted to the binary form on any platform, even one that has the opposite byte order from the machine that produced the data.

Utilities

Yup.avf interconverts between the formatted and unformatted forms of the AVF, and between little and big endian files.
