py::bio Namespace Reference

Functions

def find
def sfx
def parseSfx
def mkSfx
def mapP
def resize
def mapW
def save
def load
def Str
def strSet
def ldRep
def isNA
def cvt

Variables

 E = os.environ.get
list tys = ['b','B','h','H','i','I','l','L','f','d']
dictionary chToTy
dictionary tySz
dictionary tyToCh
dictionary NA
tuple path = E("BIO_PATH","./")

Detailed Description

bio.py - Binary Input/Output Library
Author: Andrew Schein

BIO -- Library for storing vectors and matrices of basic C type data
in self-describing binary files.

Briefly, this module helps solve 3 problems:

1. How to store binary data of basic C types on disk in a tool-neutral
   format.
2. How to represent missing values.
3. How to access large binary data without incredibly large executable
   start-up costs or heavy-handed software architecture (e.g. by lazy
   loading of disk pages).

In greater detail...

This module provides a python numpy/ndarray interface to what is
ultimately a C standard (on-going work) for representing binary data
of basic C types on disk.  Type descriptors are given by the BIO
suffix.  File size (combined with the size of the underlying C type)
is used to infer row span.  For matrices, the BIO suffix provides
column span information.  Eventually, the library will provide
facility for cubes and other n-dimensional structures consisting of
basic C numeric types.  Character strings are encoded via bytes (as in
C).
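
The row-span inference described above comes down to simple arithmetic.
The following is a minimal sketch, not the library's own code: the
helper name infer_rows and the demo file name are hypothetical, and the
element sizes are copied from the tySz table documented below.

```python
import os
import tempfile

# Element sizes in bytes, copied from the module's tySz table.
TY_SZ = {'b': 1, 'B': 1, 'h': 2, 'H': 2, 'i': 4, 'I': 4,
         'l': 8, 'L': 8, 'f': 4, 'd': 8}

def infer_rows(file_size, type_char, cols=1):
    """Hypothetical helper: number of rows implied by a file size in bytes."""
    row_bytes = TY_SZ[type_char] * cols
    if file_size % row_bytes != 0:
        raise ValueError("file size is not a whole number of rows")
    return file_size // row_bytes

# Demo: a file holding 12 doubles, read as a matrix with 3 columns.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")  # hypothetical file name
    with open(demo_path, "wb") as fh:
        fh.write(bytes(12 * TY_SZ['d']))
    rows = infer_rows(os.path.getsize(demo_path), 'd', cols=3)
```

Here rows is derived only from the file size, the type character, and
the column span, which is why the suffix does not need to record the
row count.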

BIO files can be memory-mapped using numpy's memory map interface,
providing convenient loading of data as it is actually used.  The BIO
library provides mapP (memory map 'private'--disk copy is unaltered)
and mapW (memory map 'writable'--changes are stored on disk) functions
for this purpose, in addition to a more conventional save routine for
storing unmapped numpy ndarrays to the file system.
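
The difference between a private and a writable mapping can be seen
directly with numpy.memmap. This sketch does not call mapP or mapW; it
assumes they correspond roughly to numpy's copy-on-write ("c") and
read-write ("r+") modes, and the file name is hypothetical.

```python
import os
import shutil
import tempfile
import numpy as np

tmp_dir = tempfile.mkdtemp()
data_path = os.path.join(tmp_dir, "vec.bin")  # hypothetical file name
np.arange(4, dtype=np.float64).tofile(data_path)

# Private mapping: copy-on-write, so edits stay in memory only.
private = np.memmap(data_path, dtype=np.float64, mode="c")
private[0] = 99.0
after_private = np.fromfile(data_path, dtype=np.float64)
del private

# Writable mapping: edits are written back to the file.
writable = np.memmap(data_path, dtype=np.float64, mode="r+")
writable[0] = 42.0
writable.flush()
after_writable = np.fromfile(data_path, dtype=np.float64)
del writable

shutil.rmtree(tmp_dir, ignore_errors=True)
```

In both cases pages are read from disk only when they are touched, which
is what keeps start-up cost low for large files.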

A desired property of a database is the representation of missing
values, e.g. NULL values in a SQL database.  Python's numpy has no
such facility or standard, and so BIO establishes a convention for
each type.  The missing code is called NA ('not applicable').  For
floats and doubles, NaN values will suffice. For signed integer types,
BIO establishes the largest magnitude negative number as the NA code.
For unsigned integer types, BIO establishes the largest magnitude
number as the NA code. Note that standard numpy type conversions
(.astype(...)) will not convert NA codes properly, and so BIO provides
NA-aware ndarray conversions.
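
To illustrate why a plain conversion loses NA codes, here is a minimal
sketch of the convention as described in the prose above. The helpers
na_code, is_na and na_aware_astype are hypothetical names and are not
the library's cvt or isNA; the NA values are derived from numpy.iinfo
rather than read from the module's NA dictionary.

```python
import numpy as np

def na_code(dtype):
    """NA code per the prose convention: NaN, signed min, or unsigned max."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.floating):
        return np.nan
    info = np.iinfo(dtype)
    return info.min if np.issubdtype(dtype, np.signedinteger) else info.max

def is_na(b):
    """Boolean mask marking the NA entries of b."""
    if np.issubdtype(b.dtype, np.floating):
        return np.isnan(b)
    return b == na_code(b.dtype)

def na_aware_astype(b, new_dtype):
    """Convert b to new_dtype, mapping source NA codes to target NA codes."""
    mask = is_na(b)
    tmp = b.copy()
    tmp[mask] = 0  # neutral placeholder so the cast itself is well defined
    out = tmp.astype(new_dtype)
    out[mask] = na_code(new_dtype)
    return out

small = np.array([1, -128, 3], dtype=np.int8)   # -128 is the int8 NA code
naive = small.astype(np.int32)                  # -128 is an ordinary int32
aware = na_aware_astype(small, np.int32)        # NA becomes the int32 minimum
as_float = na_aware_astype(small, np.float64)   # NA becomes NaN
```

The sketch does not handle ordinary values that fall outside the range
of the target type; a real conversion would need a policy for those.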


Function Documentation

def py::bio::cvt (   b,
  newTy 
)
INPUT: an ndarray b, type specifier newTy
    POST:  converts b to newTy while preserving NA values
def py::bio::find (   p,
  wrn = True 
)
INPUT: (potentially partial) directory path p, warning toggle wrn.
    POST:  look up filename p as absolute path (if it has BIO suffix) or else in BIO_PATH. 
def py::bio::isNA (   b  ) 
INPUT: an ndarray b
    POST:  returns a byte ndarray describing the NA structure of b
def py::bio::ldRep (   f  ) 
def py::bio::load (   f  ) 
INPUT: file name f
    POST:  load BIO array f into matrix and return matrix 
def py::bio::mapP (   p  ) 
INPUT: path to file.
   POST: if p contains a BIO suffix, map p privately.  Otherwise, use find to locate the file
   and map that one.  Raises an error if the file can't be found.
def py::bio::mapW (   p,
  rows,
  replace = True,
  fill = True,
  default = NA 
)
INPUT:
    p:       absolute path to file for storage including suffix (used to infer size).
    rows:    the number of rows in the file.
    cols:    the column structure as a list.  Currently only 1 or 2 dimensions are supported.
    replace: do we eliminate data in current file (if present).
    fill:    do we fill newly allocated space with the default value.
    default: value to fill matrix when fill is set.  Defaults to type-specific NA value
    POST:    creates space on disk for the ndarray, memory-maps it, and returns the ndarray.
    Status:  first draft does not support resizing or re-using files. To be added.
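
The allocate, fill, and map sequence described for mapW can be sketched
with numpy.memmap in "w+" mode, which creates or overwrites the file.
This is an illustration under assumptions, not the library's code:
map_writable, its parameters, and the file name are hypothetical, and
the type is passed explicitly instead of being read from a BIO suffix.

```python
import os
import tempfile
import numpy as np

def map_writable(path, dtype, rows, cols=1, fill=None):
    """Hypothetical helper: allocate a file, optionally fill it, return the map."""
    shape = (rows,) if cols == 1 else (rows, cols)
    arr = np.memmap(path, dtype=dtype, mode="w+", shape=shape)
    if fill is not None:
        arr[...] = fill
        arr.flush()
    return arr

map_path = os.path.join(tempfile.mkdtemp(), "matrix.bin")  # hypothetical name
na_int32 = np.iinfo(np.int32).min  # signed NA code per the convention above
mapped = map_writable(map_path, np.int32, rows=3, cols=2, fill=na_int32)
mapped[0, 0] = 7   # later writes go through the mapping to disk
mapped.flush()
on_disk = np.fromfile(map_path, dtype=np.int32)
file_bytes = os.path.getsize(map_path)
del mapped
```

Filling with the NA code up front means any cell that is never written
still reads back as missing rather than as an arbitrary zero.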
def py::bio::mkSfx (   shape,
  typCh 
)
INPUT: shape of a numpy ndarray with >= 1 dimensions, type character typCh.
    POST: constructs a BIO suffix.  Raises an error if the ndarray type is not in the BIO set.
def py::bio::parseSfx (   sfx  ) 
INPUT: BIO suffix sfx, e.g. .B1000d
POST: parses sfx into column span and type information 
def py::bio::resize (   p,
  rows,
  default = NA 
)
resizing code is activated by mapW 
def py::bio::save (   f,
  b 
)
INPUT: file name prefix f, ndarray b
POST:  write b to f + BIO suffix.
def py::bio::sfx (   fname  ) 
INPUT: fname -- a string.
   POST: sfx returns a tuple (fname_pre,suffix) which splits fname into everything before
   and after the suffix. 
def py::bio::Str (   b  ) 
INPUT: ndarray of bytes or ubytes representing strings.
    POST:  returns string 
def py::bio::strSet (   b,
  string,
  nullTerm = True 
)
INPUT: a byte/ubyte ndarray b intended to receive string.
    POST:  guarantees null termination by default. Truncates string as necessary.
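
The C-style string handling described for Str and strSet can be
sketched as follows. The helpers str_set and to_str are hypothetical
stand-ins, not the library's functions, and the sketch assumes ASCII
text stored in a one-dimensional uint8 buffer.

```python
import numpy as np

def str_set(buf, text, null_term=True):
    """Copy text into buf, truncating and (by default) null-terminating it."""
    raw = text.encode("ascii")
    limit = buf.size - 1 if null_term else buf.size
    raw = raw[:max(limit, 0)]   # truncate to the space available
    buf[:] = 0                  # clear stale bytes; also supplies the terminator
    buf[:len(raw)] = np.frombuffer(raw, dtype=np.uint8)

def to_str(buf):
    """Read bytes up to the first null (or the end of buf) as a string."""
    return buf.tobytes().split(b"\x00", 1)[0].decode("ascii")

name_buf = np.zeros(6, dtype=np.uint8)
str_set(name_buf, "genome")      # 6 characters do not fit alongside a null
truncated = to_str(name_buf)
str_set(name_buf, "abc")
short = to_str(name_buf)
str_set(name_buf, "genome", null_term=False)
unterminated = to_str(name_buf)
```

With null termination on, a buffer of n bytes holds at most n - 1
characters, which is the same trade-off as a fixed-size char array in C.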

Variable Documentation

dictionary py::bio::chToTy
Initial value:
{ 'b'       : N.int8   ,    'B'   : N.uint8,  'h'      : N.int16,     'H' : N.uint16,
           'i'       : N.int32  ,    'I'   : N.uint32, 'l'      : N.int64,     'L' : N.uint64,
           'f'       : N.float32,    'd'   : N.float64}
py::bio::E = os.environ.get
dictionary py::bio::NA
Initial value:
{ N.int8    : -1 << 7 , N.uint8   : (1 << 8) , N.int16 : -1<<15, N.uint16  : 1 << 16,
           N.int32   : -1 << 31, N.uint32  : (1 << 32), N.int64 : -1<<63, N.uint64  : 1 << 64,
           N.float32 : N.nan   , N.float64 : N.nan}
tuple py::bio::path = E("BIO_PATH","./")
list py::bio::tys = ['b','B','h','H','i','I','l','L','f','d']
dictionary py::bio::tySz
Initial value:
{   'b'       : 1   ,         'B'   : 1,        'h'      : 2,           'H' : 2,
           'i'       : 4   ,         'I'   : 4,        'l'      : 8,           'L' : 8,
           'f'       : 4   ,         'd'   : 8}
dictionary py::bio::tyToCh
Initial value:
{ N.int8    : 'b'    ,  N.uint8   : 'B',  N.int16      : 'h',    N.uint16 : 'H',
           N.int32   : 'i'    ,  N.uint32  : 'I',  N.int64      : 'l',    N.uint64 : 'L',
           N.float32 : 'f'    ,  N.float64 : 'd'}
Generated on Sun Sep 11 09:40:46 2011 for NPSML by  doxygen 1.6.3