#######################
#### Start of File ####
#######################
----------------------------------------------------------------- 
File Name: vbdspec.txt 
Contents: VBD Specs Sheet 
----------------------------------------------------------------- 

Variable Block Database Version 1031, Revision A

TOPICS:
-------
Overview
VBD File Format
VBD File Specs
Platforms

OVERVIEW:
---------
The Variable Block Database is a file format used to store any
type of variable-length binary data in re-sizeable blocks. The
VBD file format was originally created to store persistent
objects in disk files. A persistent object refers to an object
that saves its state between program invocations. The VBD file
format is not limited to storing persistent objects. Any type of
data can be stored and manipulated in a VBD file. VBD files have
this capability because the methods by which any data type is
stored or retrieved must be defined within the application. 

VBD FILE FORMAT:
----------------
Every VBD file contains a file header, an optional static storage
area, and a dynamic storage area where variable blocks of data
are stored. The file header is composed of six fields and is a
total of 28 bytes in length: 

(1) Free space field - stores a pointer to the first block of
de-allocated heap space (4 bytes)

(2) End of file field - stores a pointer to the address of the
last byte in the file (4 bytes)

(3) Start of heap field - stores a pointer to the start of the
dynamic storage area (4 bytes)

(4) Highest block field - stores a pointer to the highest
allocated block (4 bytes)

(5) Signature field - signature used to identify the file type (8
bytes)

(6) Version field - used to identify the VBD file version
number(4 bytes)

This header is used to store information needed by the allocation
functions, a signature, and a version number. The static data
area is used to store fixed data that cannot be altered by any of
the dynamic allocation routines. The size of the static data must
by specified by the application that creates the VBD file. If no
static area is specified then the dynamic data area will start
directly after the VBD file header. 

The first 28 bytes of the VBD file (file addresses 0 - 28) will
always contain the VBD header. If a static area is requested by
the application, the static area will occupy the bytes requested.
For example: if an application request 20 bytes to be reserved
for a static data area when the file is created, then bytes 29
through 48 (file addresses 29 - 48) will be reserved. The dynamic
data area will start directly after that and occupy the rest of
the file. A pointer to the start of the dynamic storage area can
be calculated by adding the size of the VBD file header to the
size of the static data area. 

The dynamic data area always starts out empty and grows when
variable data blocks are allocated. The "free space", "end of
file", and "start of heap" pointers stored in the VBD file header
are used to maintain the dynamic data area. The "start of heap"
pointer stores the address where the dynamic data area starts
(which is where the static area ends.) The "end of file" pointer
marks the end of the file and points to the location where the
file can be extended during block allocation. 

Revision 'A' adds a 32-bit CRC checksum routine used to detect
any bit errors that occur during data storage. The CRC is based
on the Ethernet polynomial of 0x4C11DB7. A checksum is calculated
when data is written to the VBD file, this includes the block
header and the block data. The calculated checksum is then
compared to data actually stored on disk. If the calculated
checksum does not match the actual checksum, a bit error has
occurred during data storage. All bit errors must be handled by
the application since the type of data being stored is not known.

Revision 'A' reserves four bytes at the end of each block that
can be used by an application to store a 32-bit checksum with
each block. Each time a new block is allocated space is reserved
for the number of bytes requested plus four additional bytes. The
application is responsible for reading and writing the block
checksum since the type of data being stored is not known. The
use of a block checksum is optional and does not have to be used
to detect bit errors because of the built-in CRC checksum routine
used each time any data is stored. Some applications require the
use of block checksums to maintain the integrity of the file from
one program invocation to the next.

When blocks are de-allocated, the block is marked deleted and
left in the file. The size of the file is determined by the
number of blocks allocated and will remain the same no matter how
many blocks are deleted. The number of deleted blocks is
maintained in a non-contiguous list. Each block in the list
points to the next block in the list starting at a specified
address. The "free space" pointer in the VBD file header is used
to store the file address of the block where the free space list
starts. 

The "highest block" pointer in the VBD file header is used to
store the address of the highest allocated variable block. It is
used by the application when it needs to index data starting at
the end of the file. 

The signature and version fields in the VBD file header must be
set when a new file is created. The signature field is used to
determine if the file is of the correct type. If the "VBDFILE"
signature is not found then it is assumed that this is not a
valid VBD file. The eighth byte is used for all revision changes.
Version 1031 sets the revision letter to 'A' and performs
compatibility checks to ensure backward compatibility with
previous releases. Revision 'A' adds a 32-bit CRC checksum
routine and reserves space at the end of each block for an
application to write a 32-bit check-word. 

The version field is to represent this file version number. A
version number is used to indicate that changes have been made to
the application used to create VBD files. Version numbers can be
used inside of an application to conditionally perform certain
operations based on its value. This will ensure backward
compatibility with previous versions. 

Variable Block (VB) headers manage all the variable data blocks
created inside the dynamic data area. Block headers are used to
mark the start of a variable data block. Every time a block is
allocated a block header is written to the file. Allocation works
by writing the block header and then reserving a specified number
of bytes for the data that will be stored in the block. Revision
'A' reserves four bytes at the end of the block to allow the
application to store a block checksum. After the space is
allocated, the application is responsible for writing the data to
the file starting at the address after the block header. Each
block header contains four fields and is a total of 16 bytes in
length: 

(1) Check word field - "0x0000FEFE" check word used for file
integrity checks (4 bytes)

(2) Length field - block length including the object, block
header, and CRC (4 bytes) 

(3) Status field - stores the status of dynamic data stored in
the block (4 byte)

(4) Next deleted block field - stores a pointer to the next
deleted block (4 bytes) 

The VB header "check word" field represents a 32-bit check word
used for file integrity checks and is used to maintain
synchronization within the VBD creator and the application.
Revision 'A' uses all 32 bits of the block check-word. In
previous releases the upper 16 bits of the block check-word was
reserved for a 16-bit CRC. Revision 'A' implements a 32-bit CRC
routine and reserves four bytes at the end of each block for a
32-bit checksum. If the check word value is changed, the file
will no longer be compatible with previous releases. 

The "length" field stores the length of the object plus the size
of the block header and the size of the block's CRC checksum
value. This field is used to index the file block by block. By
reading this value the application will always know where the
next block is in sequence. The check-word is used to ensure that
the next block in sequence is a valid variable block. The length
of the object can be calculated by subtracting the size of the
block header and the size of the CRC checksum from the "length"
field.

The "status" field stores the status of dynamic data stored in
this block. Only one byte of the status field is used. The
remaining three bytes of the status field is reserved for future
use. The status of a variable data block can be determined by one
of three byte values: 'N' for normal (ASCII 78), 'D' for deleted
(ASCII 68), or 'R' for removed (ASCII 82.) A block marked 'N' for
normal indicates that the block is in use and cannot be reclaimed
by any of the allocation routines. A block marked 'D' for deleted
means that the data in the block is still valid, but the block
can be overwritten if needed. A block marked 'R' for removed
means that the data in the block has been removed and the block
can be overwritten if needed. Marking deleted blocks 'N' for
normal can easily restore them. Once a block is removed it can
never be restored. 

The "next deleted block" pointer in the VB header stores a
pointer to the next deleted or removed variable block, only if
this block has been deleted or removed. The total number of
deleted blocks is maintained in a non-contiguous list. Each
deleted or removed block in the list points to the next deleted
or removed block in the list starting at the head of the list.
The "free space" pointer in the VBD file header is used to store
the file address where the head of the free space list is
located. When a variable block is deleted or removed the next
deleted block field is set to the VBD header's free space value
to become the head of the free space list and the VBD header's
"free space" pointer is set to the address of this deleted or
removed block. 

In order to prevent the VBD files from becoming extremely
fragmented due to numerous deletions, blocks marked deleted or
removed will be reused by the allocation routine. When a new
variable block is allocated and the "free space" field is not
empty, the allocation routine will walk through the free space
list looking for a deleted or removed block of the size to be
allocated. 

One of two methods can be used to prevent fragmentation, the
best-fit method or the first-fit method. The best-fit method
works by scanning the entire free space list until the best
location to reuse a block is found. Best-fit values are
calculated based on a percentage of the number of bytes
requested. Any blocks greater then 2.5 times bigger then the
number of bytes requested will not be reused. This ensures that
smaller blocks will not use small portions of very large blocks
and all new blocks will use the maximum amount of space in any
block that is reused. However, the best-fit method is very costly
in terms of speed when the free space list is very large.

The first-fit method works by searching the free space list until
the first block of the appropriate size is found. The first block
large enough to hold two block headers plus the number of bytes
requested with at least one byte left over will be reused.
Several splits can occur if the blocks vary greatly in size. When
a block is split, the unused portion of the block is assigned a
new block header (marked removed) and placed back on the free
space list. A small block can cause a very large block to be
divided several times leaving gaps of smaller and smaller blocks. 
The reclaim method used is application dependent. If all the
blocks are more or less the same size the first-fit method is
more efficient terms of speed. If all the blocks vary greatly in
size the best-fit method is more efficient in preventing
fragmentation.
 
VBD FILE SPECIFICATIONS:
------------------------
VBD version 1031, revision 'A' requires that all file addresses
be represented by 32-bit signed integer values. This means that
the file will be allowed to grow to a maximum size of 2.1 GB. The
VBD file header must occupy 28 bytes of disk space starting at
file address zero. The header must be comprised of four 32-bit
signed integers and an eight-byte character array followed by
another 32-bit signed integer. A VBD file's static data area will
only occupy as many bytes as requested. Every variable data block
will be stored directly after the static data and be marked with
a block header. The block header must occupy 16 bytes of disk
space and be comprised of four 32-bit unsigned integers. The data
following any variable block header can have as many bytes as
requested, up to 2.1 GB. Four bytes will be reserved at the end
of each block to store an optional 32-bit checksum with each
block. 

PLATFORM INDEPENDENCE:
----------------------
File addresses are multi-byte numbers used to point to specific
locations in a disk file. The byte order in which multi-byte
numbers are stored in a computer's memory is specific to each
microprocessor. For example, the Intel x86 family of
microprocessors store the lowest-order byte first. Hewlett
Packard's PA-RISC family of microprocessors store the
highest-order byte first. A file address stored in a disk file
will represent different values if the file is created on one
system and read on the other system. 

In order to gain platform independent VBD files must use their
own representation of 32-bit signed integers for file addresses.
By manipulating the file addresses in memory before they are
written to disk, it is possible to represent 32-bit signed
integers independently of the operating system or hardware
platform used. This will overcome any byte ordering problems
encountered when writing file addresses to a common database file
accessed by several different types of machines. 

The same scheme used to overcome the byte ordering problem
encountered with 32-bit signed integers must also be applied to
32-bit unsigned integers and 16-bit integers if any of these data
types are stored in the file. Also a similar method must be
implemented for single and double precision floating point values
if any of these types are used. 

String values are represented in memory exactly as they appear.
Since none of the bytes in a string are reordered in memory, they
can be written directly from memory to disk regardless of the
platform used to create them. 
----------------------------------------------------------------- 
#####################
#### End of File ####
#####################


 
