py::bio Namespace Reference

Functions
def	find
def	sfx
def	parseSfx
def	mkSfx
def	mapP
def	resize
def	mapW
def	save
def	load
def	Str
def	strSet
def	ldRep
def	isNA
def	cvt
Variables
	E = os.environ.get
list	tys = ['b','B','h','H','i','I','l','L','f','d']
dictionary	chToTy
dictionary	tySz
dictionary	tyToCh
dictionary	NA
tuple	path = E("BIO_PATH","./")

Detailed Description

bio.py - Binary Input/Output Library
Author: Andrew Schein

BIO -- Library for storing vectors and matrices of basic C type data
in self-describing binary files.

Briefly, this module helps solve 3 problems:

1. How to store binary data of basic C types on disk in a tool-neutral
   format.
2. How to represent missing values.
3. How to access large binary data without incredibly large executable
   start-up costs or heavy-handed software architecture (e.g. by lazy
   loading of disk pages).

In greater detail...

This module provides a python numpy/ndarray interface to what is
ultimately a C standard (on-going work) for representing binary data
of basic C types on disk.  Type descriptors are given by the BIO
suffix.  File size (combined with the size of the underlying C type)
is used to infer row span.  For matrices, the BIO suffix provides
column span information.  Eventually, the library will provide
facility for cubes and other n-dimensional structures consisting of
basic C numeric types.  Character strings are encoded via bytes (as in
C).
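
The row-span inference described above can be sketched as follows. This is a minimal illustration, not the library's code: the helper name infer_rows is hypothetical, the element sizes are copied from the tySz table in this reference, and the scratch file uses a plain name rather than a real BIO suffix.

```python
import os
import tempfile

import numpy as np

# Element sizes in bytes, copied from the tySz table in this reference.
TY_SZ = {'b': 1, 'B': 1, 'h': 2, 'H': 2, 'i': 4, 'I': 4,
         'l': 8, 'L': 8, 'f': 4, 'd': 8}


def infer_rows(file_size, type_char, cols=1):
    """Hypothetical helper: rows = file size / (columns * element size)."""
    row_bytes = cols * TY_SZ[type_char]
    if file_size % row_bytes != 0:
        raise ValueError("file size is not a whole number of rows")
    return file_size // row_bytes


# Write 6 float64 values to a scratch file, then recover the row count
# of a 3-column layout from the file size alone.
with tempfile.TemporaryDirectory() as tmp:
    scratch = os.path.join(tmp, "example.bin")  # assumed name, not a BIO suffix
    np.arange(6, dtype=np.float64).tofile(scratch)
    rows = infer_rows(os.path.getsize(scratch), 'd', cols=3)
```

Because the row count is derived rather than stored, a file whose size is not a multiple of the row width can be rejected as malformed, which is what the ValueError in the sketch does.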

BIO files can be memory-mapped using numpy's memory map interface,
providing convenient loading of data as it is actually used.  The BIO
library provides mapP (memory map 'private'--disk copy is unaltered)
and mapW (memory map 'writable'--changes are stored on disk) functions
for this purpose, in addition to a more conventional save routine for
storing unmapped numpy ndarrays to the file system.
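
The difference between a private and a writable mapping can be seen with numpy.memmap directly. The sketch below does not call mapP or mapW; it assumes they correspond roughly to numpy's copy-on-write mode ("c") and read-write mode ("r+"), which is an assumption about the implementation rather than something this reference states.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    scratch = os.path.join(tmp, "example.bin")  # assumed scratch file name
    np.arange(4, dtype=np.int32).tofile(scratch)

    # mode="c" is copy-on-write: assignments change memory, not the file.
    private = np.memmap(scratch, dtype=np.int32, mode="c")
    private[0] = 99
    after_private = np.fromfile(scratch, dtype=np.int32)
    del private

    # mode="r+" writes through to the file once the mapping is flushed.
    writable = np.memmap(scratch, dtype=np.int32, mode="r+")
    writable[0] = 7
    writable.flush()
    after_writable = np.fromfile(scratch, dtype=np.int32)
    del writable  # release the mapping before the directory is removed
```

In both modes only the pages that are actually touched are read from disk, which is what keeps start-up cost low for large files.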

A desired property of a database is the representation of missing
values, e.g.  NULL values in a SQL database.  Python's numpy has no
such facility or standard, and so BIO establishes a convention for
each type.  The missing code is called NA ('not applicable').  For
floats and doubles, NAN values will suffice. For signed integer types,
BIO establishes the largest magnitude negative number as the NA code.
For unsigned integer types, BIO establishes the largest magnitude
number as the NA code. Note that standard numpy type conversions
(ndarray.astype) will not convert NA codes properly, and so BIO provides
NA-aware ndarray conversions.
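
The NA convention and the reason a plain astype call is not enough can be illustrated with a short sketch. The function names na_code, is_na and convert_preserving_na are hypothetical stand-ins, not the library's NA, isNA and cvt; the sketch follows the convention as described in the text (minimum value for signed integers, maximum value for unsigned integers, NaN for floats) and returns a boolean mask where the library documents a byte ndarray.

```python
import numpy as np


def na_code(dtype):
    """NA code for a dtype, following the convention described in the text."""
    dtype = np.dtype(dtype)
    if dtype.kind == 'f':
        return dtype.type(np.nan)
    if dtype.kind == 'i':
        return np.iinfo(dtype).min  # largest-magnitude negative value
    if dtype.kind == 'u':
        return np.iinfo(dtype).max  # largest representable value
    raise TypeError("unsupported dtype: %s" % dtype)


def is_na(b):
    """Boolean mask marking the NA entries of b."""
    if b.dtype.kind == 'f':
        return np.isnan(b)  # NaN never compares equal to itself
    return b == na_code(b.dtype)


def convert_preserving_na(b, new_dtype):
    """Convert b to new_dtype, rewriting NA positions with the new NA code."""
    mask = is_na(b)
    clean = b.copy()
    clean[mask] = 0  # placeholder, so NaN is never cast to an integer
    out = clean.astype(new_dtype)
    out[mask] = na_code(new_dtype)
    return out


a = np.array([1, -32768, 3], dtype=np.int16)       # middle entry is the int16 NA
naive = a.astype(np.float64)                       # NA becomes the number -32768.0
aware = convert_preserving_na(a, np.float64)       # NA becomes NaN
as_uint8 = convert_preserving_na(aware, np.uint8)  # NaN becomes 255
```

The sketch does not check whether ordinary values fit in the target type; a narrowing conversion would need its own range handling.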

Function Documentation

def py::bio::cvt	(	b,
		newTy
	)

INPUT: an ndarray b, type specifier newTy
    POST:  converts b to newTy while preserving NA values

def py::bio::find	(	p,
		wrn = `True`
	)

INPUT: (potentially partial) directory path p, warning toggle wrn.
    POST:  look up filename p as absolute path (if it has BIO suffix) or else in BIO_PATH.

def py::bio::isNA ( b )

INPUT: an ndarray b
    POST:  returns byte ndarray describing NA structure of b

def py::bio::ldRep ( f )

def py::bio::load ( f )

INPUT: file name f
    POST:  load BIO array f into matrix and return matrix

def py::bio::mapP ( p )

INPUT: path to file.
   POST: if p contains a BIO suffix, map p privately.  Otherwise, use find to locate the file
   and map that one.  Raises an error if the file can't be found.

def py::bio::mapW	(	p,
		rows,
		replace = `True`,
		fill = `True`,
		default = `NA`
	)

INPUT:
    p:       absolute path to file for storage including suffix (used to infer size).
    rows:    the number of rows in the file.
    cols:    the column structure as a list.  Currently only 1 or 2 dimensions are supported.
    replace: whether to eliminate data in the current file (if present).
    fill:    whether to fill newly allocated space with the default value.
    default: value to fill the matrix with when fill is set.  Defaults to the type-specific NA value.
    POST:    creates space on disk for the ndarray, memory-maps it, and returns the ndarray.
    Status:  first draft does not support resizing or re-using files. To be added.

def py::bio::mkSfx	(	shape,
		typCh
	)

INPUT: numpy ndarray b with >= 1 dimensions.
    POST: construct BIO suffix.  Raise error if ndarray type is not in BIO set.

def py::bio::parseSfx ( sfx )

INPUT: BIO suffix sfx, e.g. .B1000d
POST: parses sfx into column span and type information
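
This reference does not spell out the suffix grammar, so the following is only a sketch under an assumed reading of the example .B1000d: a literal ".B" marker, an optional decimal column span, and one type character from tys. The name parse_suffix is hypothetical, and the real parseSfx may accept a different grammar or return a different structure.

```python
import re

import numpy as np

# Type characters and numpy types, copied from the chToTy table in this reference.
CH_TO_TY = {'b': np.int8, 'B': np.uint8, 'h': np.int16, 'H': np.uint16,
            'i': np.int32, 'I': np.uint32, 'l': np.int64, 'L': np.uint64,
            'f': np.float32, 'd': np.float64}

# Assumed grammar: ".B", optional column count, one type character.
_SFX_RE = re.compile(r"\.B(\d*)([bBhHiIlLfd])")


def parse_suffix(sfx):
    """Hypothetical parser returning (column span or None, numpy type)."""
    match = _SFX_RE.fullmatch(sfx)
    if match is None:
        raise ValueError("not a recognized suffix: %r" % sfx)
    digits, type_char = match.groups()
    cols = int(digits) if digits else None  # None stands for a plain vector
    return cols, CH_TO_TY[type_char]


cols, ty = parse_suffix(".B1000d")
```

Under this reading, the example suffix describes a matrix of 1000 columns of C doubles.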

def py::bio::resize	(	p,
		rows,
		default = `NA`
	)

resizing code is activated by mapW

def py::bio::save	(	f,
		b
	)

INPUT: file name prefix f, ndarray b
POST:  write b to f + BIO suffix.

def py::bio::sfx ( fname )

INPUT: fname -- a string.
   POST: sfx returns a tuple (fname_pre,suffix) which splits fname into everything before
   and after the suffix.

def py::bio::Str ( b )

INPUT: ndarray of bytes or ubytes representing strings.
    POST:  returns string

def py::bio::strSet	(	b,
		string,
		nullTerm = `True`
	)

INPUT: a byte/ubyte ndarray intended to receive string.
    POST:  will guarantee null termination by default. Will truncate string as necessary.
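
The truncation and null-termination behaviour described for strSet, together with the reverse direction described for Str, can be sketched as below. The names set_string and get_string are hypothetical, the ASCII encoding is an assumption, and the library's own functions may differ in detail.

```python
import numpy as np


def set_string(b, string, null_term=True):
    """Copy string into byte array b, truncating it to fit."""
    raw = string.encode("ascii")  # assumed encoding
    limit = b.size - 1 if null_term else b.size
    raw = raw[:max(limit, 0)]     # reserve one byte for the terminator
    b[:] = 0                      # clear leftovers from earlier contents
    b[:len(raw)] = np.array(list(raw), dtype=np.uint8)


def get_string(b):
    """Read bytes from b up to the first null byte and decode them."""
    raw = b.astype(np.uint8).tobytes()
    return raw.split(b"\x00", 1)[0].decode("ascii")


buf = np.zeros(8, dtype=np.uint8)
set_string(buf, "truncated")  # 9 characters into 8 bytes: 7 kept, 1 terminator
stored = get_string(buf)
```

Zeroing the buffer before copying means a shorter string written later does not leave the tail of a longer one behind.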

Variable Documentation

dictionary py::bio::chToTy

Initial value:

{ 'b'       : N.int8   ,    'B'   : N.uint8,  'h'      : N.int16,     'H' : N.uint16,
           'i'       : N.int32  ,    'I'   : N.uint32, 'l'      : N.int64,     'L' : N.uint64,
           'f'       : N.float32,    'd'   : N.float64}

py::bio::E = os.environ.get

dictionary py::bio::NA

Initial value:

{ N.int8    : -1 << 7 , N.uint8   : (1 << 8) , N.int16 : -1<<15, N.uint16  : 1 << 16,
           N.int32   : -1 << 31, N.uint32  : (1 << 32), N.int64 : -1<<63, N.uint64  : 1 << 64,
           N.float32 : N.nan   , N.float64 : N.nan}

tuple py::bio::path = E("BIO_PATH","./")

list py::bio::tys = ['b','B','h','H','i','I','l','L','f','d']

dictionary py::bio::tySz

Initial value:

{   'b'       : 1   ,         'B'   : 1,        'h'      : 2,           'H' : 2,
           'i'       : 4   ,         'I'   : 4,        'l'      : 8,           'L' : 8,
           'f'       : 4   ,         'd'   : 8}

dictionary py::bio::tyToCh

Initial value:

{ N.int8    : 'b'    ,  N.uint8   : 'B',  N.int16      : 'h',    N.uint16 : 'H',
           N.int32   : 'i'    ,  N.uint32  : 'I',  N.int64      : 'l',    N.uint64 : 'L',
           N.float32 : 'f'    ,  N.float64 : 'd'}
