Persistency and APIs
Updated: 2004-02-11
Created: 2004-01-24
This document is an incomplete draft.
There are at least two completely different and often incompatible
goals in selling a library:
- Sell the library as such to some people with considerable
software experience who regard it as a component and who need it
only as one toolkit in an application they have already designed
and are developing using their own favoured toolkits.
- In this case the maximum chance of selling derives from the
intrinsic quality of the library and whether and how well it fits
or can be made to fit with whatever other toolkits the customer
has already chosen.
- Sell the library to some people who need to use it as the
core of an application that they are writing from scratch
without great software skills.
- Better chances of selling come from offering a ready made
selection of integrated toolkits that come with the library and
that the client can easily customize in some high level way.
The two goals are often incompatible because facilitating
integration into an arbitrary collection of toolkits is not needed
in the second case, and is quite different from facilitating easy
customization of a well chosen, static collection of toolkits.
However, at least some of the tools used in either situation are
usually the same.
TODO: Mention hidden state problem, initial load, final store,
partial/full persistence, call limitations wrt speed and data
types.
TODO: mention debuggers, interpreters, domains
Dealing with persistency and API integration requires three very
high level and rarely used concepts, which come with many
different names:
- metadata
- Metadata here is data that describes the structure and
properties of other data, rather than its values. For example,
the list of fields in a record, or the list of parameters and
the body of a function.
- functionals
- These are second-order functions, that is functions that
operate on metadata (for example the description of a function)
rather than on data.
- reflection
- The ability of a program to operate on itself.
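To make metadata and functionals concrete in C terms, here is a minimal sketch under stated assumptions: a hypothetical `struct point`, described by a hand-written descriptor table (`struct field_desc`, `struct record_desc` and `describe_type` are illustrative names, not an existing API). The table is the metadata; `describe_type` is a functional in the sense above, because it reads only the description and would work unchanged for any other described record type. The only reflective help plain C gives is `offsetof` and `sizeof`; everything else has to be written by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record type, used only for illustration. */
struct point {
    int x;
    int y;
    double weight;
};

/* Metadata for one field: it describes the field instead of holding a value. */
enum field_type { FT_INT, FT_DOUBLE };

struct field_desc {
    const char *name;      /* field name as written in the source */
    enum field_type type;  /* how the field's bytes are to be interpreted */
    size_t offset;         /* where the field starts within the record */
};

/* Metadata for a whole record type. */
struct record_desc {
    const char *name;
    const struct field_desc *fields;
    size_t nfields;
};

/* Written by hand: plain C cannot derive this table from the declaration;
   offsetof is the only help the language gives. */
static const struct field_desc point_fields[] = {
    { "x",      FT_INT,    offsetof(struct point, x) },
    { "y",      FT_INT,    offsetof(struct point, y) },
    { "weight", FT_DOUBLE, offsetof(struct point, weight) },
};

static const struct record_desc point_desc = {
    "point", point_fields, sizeof point_fields / sizeof point_fields[0]
};

/* A functional: it operates only on metadata, never on a struct point
   value, and writes a schema such as "point{x:int,y:int,weight:double}"
   into buf. Returns 0, or -1 if buf is too small. */
static int describe_type(const struct record_desc *d, char *buf, size_t size)
{
    size_t len;
    int k;

    assert(d != NULL && buf != NULL);
    k = snprintf(buf, size, "%s{", d->name);
    if (k < 0 || (size_t)k >= size)
        return -1;
    len = (size_t)k;
    for (size_t i = 0; i < d->nfields; i++) {
        const char *tname = d->fields[i].type == FT_INT ? "int" : "double";
        k = snprintf(buf + len, size - len, "%s%s:%s",
                     i > 0 ? "," : "", d->fields[i].name, tname);
        if (k < 0 || (size_t)k >= size - len)
            return -1;
        len += (size_t)k;
    }
    if (size - len < 2)        /* room for '}' and the terminating NUL */
        return -1;
    memcpy(buf + len, "}", 2);
    return 0;
}
```

Calling `describe_type(&point_desc, buf, sizeof buf)` with a large enough buffer yields `point{x:int,y:int,weight:double}`; supporting a new record type means writing a new table, not a new function.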
These concepts are essential for both persistency and
integration because:
- Persistency depends on the ability to take an arbitrary
piece of memory and its type, and to reflect on it to convert
its contents to some other format according to that type.
- Integration depends on the ability to redefine the function
invocation functional (which in C/C++ is normally implicit,
but is there) for functions called from or by other languages,
so that argument lists etc. get converted.
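A minimal sketch of the persistency side, assuming the same kind of hand-written field descriptor table (all names here, such as `store_record` and `load_record`, are hypothetical): the store functional takes a piece of memory plus its description and converts the contents to a line of text, and the load functional does the reverse. Neither function mentions `struct point` itself; only the metadata does.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record plus a hand-written field descriptor table. */
struct point { int x; int y; double weight; };

enum field_type { FT_INT, FT_DOUBLE };
struct field_desc { const char *name; enum field_type type; size_t offset; };

static const struct field_desc point_fields[] = {
    { "x",      FT_INT,    offsetof(struct point, x) },
    { "y",      FT_INT,    offsetof(struct point, y) },
    { "weight", FT_DOUBLE, offsetof(struct point, weight) },
};
#define POINT_NFIELDS (sizeof point_fields / sizeof point_fields[0])

/* Store functional: memory + metadata -> space-separated text, one value
   per field in table order. Returns 0, or -1 if buf is too small. */
static int store_record(const void *rec, const struct field_desc *f,
                        size_t n, char *buf, size_t size)
{
    const char *base = rec;
    size_t len = 0;

    assert(rec != NULL && f != NULL && buf != NULL && size > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        const char *sep = i > 0 ? " " : "";
        int k;
        if (f[i].type == FT_INT) {
            int v;
            memcpy(&v, base + f[i].offset, sizeof v);
            k = snprintf(buf + len, size - len, "%s%d", sep, v);
        } else {
            double v;
            memcpy(&v, base + f[i].offset, sizeof v);
            k = snprintf(buf + len, size - len, "%s%.17g", sep, v);
        }
        if (k < 0 || (size_t)k >= size - len)
            return -1;
        len += (size_t)k;
    }
    return 0;
}

/* Load functional: the inverse conversion, text + metadata -> memory.
   Returns 0, or -1 on malformed input or an int value out of range. */
static int load_record(void *rec, const struct field_desc *f, size_t n,
                       const char *text)
{
    char *base = rec;
    const char *p = text;

    assert(rec != NULL && f != NULL && text != NULL);
    for (size_t i = 0; i < n; i++) {
        char *end;
        if (f[i].type == FT_INT) {
            long v;
            int iv;
            errno = 0;
            v = strtol(p, &end, 10);
            if (end == p || errno == ERANGE || v < INT_MIN || v > INT_MAX)
                return -1;
            iv = (int)v;
            memcpy(base + f[i].offset, &iv, sizeof iv);
        } else {
            double v = strtod(p, &end);
            if (end == p)
                return -1;
            memcpy(base + f[i].offset, &v, sizeof v);
        }
        p = end;
    }
    return 0;
}
```

Doubles are written with `%.17g`, which is enough digits for IEEE 754 doubles to read back exactly. Strings, pointers and nested records would each need more metadata and more cases in both functionals, which is where the hand-written approach gets expensive.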
Both persistence and integration depend on having metadata that
describes the types to persist or the functions to integrate, and
functionals that store/load the data to persist or that
convert the call frame from one language ABI to another.
The important choices are about the details of how and when,
not what:
TODO: Mention GCC extensions, SL/5
- How to generate the metadata, and in which format, and when
to use it.
- What kind of save/restore functional to write, and
where.
- What kind of call conversion functional to write, and
where.
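For the call conversion functional, one possible answer in C is sketched below, under the assumption that the foreign caller passes dynamically typed, tagged values, as an interpreter typically would. Per-function metadata lists the parameter types, a generic `invoke` converts each argument accordingly, and a small adapter written once per C signature makes the actual typed call. The adapters exist because portable C cannot construct a call frame for an arbitrary signature at run time. All names (`struct value`, `invoke`, `add_ints`, `scale`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A value as a dynamically typed caller (e.g. an interpreter) holds it. */
enum vtype { V_INT, V_DOUBLE };
struct value {
    enum vtype type;
    union { long i; double d; } u;
};

/* Native C functions to expose; hypothetical examples. */
static int add_ints(int a, int b) { return a + b; }
static double scale(double x, int factor) { return x * factor; }

/* Generic function pointer type used to store any function in the table;
   converting back to the real type before calling is well defined in C. */
typedef void (*generic_fn)(void);
typedef struct value (*adapter_fn)(generic_fn fn, const struct value *args);

/* One adapter per C signature: each call frame shape is written by hand. */
static struct value call_int_int_int(generic_fn fn, const struct value *a)
{
    int (*f)(int, int) = (int (*)(int, int))fn;
    struct value r;
    r.type = V_INT;
    r.u.i = f((int)a[0].u.i, (int)a[1].u.i);
    return r;
}

static struct value call_double_double_int(generic_fn fn, const struct value *a)
{
    double (*f)(double, int) = (double (*)(double, int))fn;
    struct value r;
    r.type = V_DOUBLE;
    r.u.d = f(a[0].u.d, (int)a[1].u.i);
    return r;
}

/* Metadata: name, parameter types, the function and its matching adapter. */
#define MAX_PARAMS 4
struct func_desc {
    const char *name;
    size_t nparams;
    enum vtype params[MAX_PARAMS];
    generic_fn fn;
    adapter_fn adapter;
};

static const struct func_desc exported[] = {
    { "add_ints", 2, { V_INT, V_INT },    (generic_fn)add_ints, call_int_int_int },
    { "scale",    2, { V_DOUBLE, V_INT }, (generic_fn)scale,    call_double_double_int },
};

/* The call conversion functional: find the metadata, check the argument
   count, convert each argument to the declared parameter type, then let
   the adapter make the statically typed call. Returns 0, or -1 if the
   function is unknown or the argument count is wrong. Double-to-integer
   conversions truncate and assume the value fits. */
static int invoke(const char *name, const struct value *args, size_t nargs,
                  struct value *result)
{
    assert(name != NULL && result != NULL);
    for (size_t i = 0; i < sizeof exported / sizeof exported[0]; i++) {
        const struct func_desc *d = &exported[i];
        struct value conv[MAX_PARAMS];

        if (strcmp(d->name, name) != 0)
            continue;
        if (nargs != d->nparams)
            return -1;
        for (size_t j = 0; j < nargs; j++) {
            conv[j].type = d->params[j];
            if (d->params[j] == V_INT)
                conv[j].u.i = args[j].type == V_INT ? args[j].u.i
                                                     : (long)args[j].u.d;
            else
                conv[j].u.d = args[j].type == V_DOUBLE ? args[j].u.d
                                                        : (double)args[j].u.i;
        }
        *result = d->adapter(d->fn, conv);
        return 0;
    }
    return -1;
}
```

The "where" of this functional is the dispatch table plus `invoke`; the price of staying within portable C is one hand-written adapter per distinct signature. Libraries such as libffi build call frames dynamically instead, but they are outside the C standard.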
In some languages it's easier than in others; for example in
Lisp, since programs and data structures have exactly the same
representation, a program is in effect its own metadata.
In Java and Objective-C the compiler embeds a significant amount
of metadata in the compiled code; in other languages there are
builtin primitives to reflect on function calls.
The main problem is that neither C nor C++ has any easy way
to generate metadata or write general functionals. Some extended
versions do, but the extensions are as a rule not portable.
It is therefore in general very difficult to write general
purpose save/restore or call conversion functionals for C or
C++. This means that special purpose ones have to be written,
and that some degree of flexibility has to be given up.
The loss of flexibility can take several different
forms.
The big problem with metadata extraction is that to extract
truly accurate metadata one needs full parsing of the source,
with exactly the same processing as done by the compiler.
Ideally, therefore, this would be done by the compiler; but if
the compiler does not do it and can't be modified, that's
just not an option.
Using any other tool will to some extent produce inaccurate
metadata; the issue is how often and how inaccurate.
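One partly manual way to keep hand-written metadata from drifting away from the real declarations, using nothing beyond the standard C preprocessor, is the X-macro idiom (a technique chosen here for illustration; the text above does not prescribe it). The field list is written once and expanded twice: into the struct declaration the compiler actually sees, and into the descriptor table, so the two cannot disagree. The price is flexibility: only the type tags that the expansion macros understand may appear. `POINT_FIELDS`, `struct point` and `field_index` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The single hand-maintained list (hypothetical): X(type_tag, c_type, name).
   Only the tags handled below (INT, DOUBLE) are allowed; arbitrary C types
   cannot be described this way. */
#define POINT_FIELDS(X)      \
    X(INT,    int,    x)     \
    X(INT,    int,    y)     \
    X(DOUBLE, double, weight)

enum field_type { FT_INT, FT_DOUBLE };

struct field_desc {
    const char *name;
    enum field_type type;
    size_t offset;
};

/* Expansion 1: the struct declaration itself. */
#define DECLARE_FIELD(tag, ctype, name) ctype name;
struct point { POINT_FIELDS(DECLARE_FIELD) };
#undef DECLARE_FIELD

/* Expansion 2: the metadata table, generated from the same list. */
#define DESCRIBE_FIELD(tag, ctype, name) \
    { #name, FT_##tag, offsetof(struct point, name) },
static const struct field_desc point_fields[] = { POINT_FIELDS(DESCRIBE_FIELD) };
#undef DESCRIBE_FIELD

static const size_t point_nfields = sizeof point_fields / sizeof point_fields[0];

/* Small consumer of the generated metadata: index of a field by name,
   or -1 if there is no such field. */
static int field_index(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < point_nfields; i++)
        if (strcmp(point_fields[i].name, name) == 0)
            return (int)i;
    return -1;
}
```

Adding a field to `POINT_FIELDS` updates both the struct and its metadata in one place; the table can be fed to the same kind of store/load or describe functionals as a fully hand-written one.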
- The metadata is generated by a separate tool
- The metadata describing a program's data structures or
functions can be generated by another tool than the compiler.
This can be a preprocessor or a postprocessor, for example:
- A tool that scans the debugging information generated by
the compiler, since a source oriented debugger is a fully
reflective program that needs extensive metadata.
After all, the compiler usually generates fairly
complete and accurate metadata in the form of debugger
information, and this may be backprocessed into source
form.
- A version of GCC that
converts the program into a tree represented in XML
(special thanks to Marek for pointing it out).
The problem with this approach is that it will
generate metadata that is accurate only with regard to
a binary compiled by GCC, and on some platforms that
just is not a viable option.
- A header file scanner that extracts function
declarations (e.g. proto and
unproto).
- The metadata is generated manually
- This requires writing by hand a description of the data
structures and functions in an API. This is often done before
the fact, for example for RPC oriented programs.
There are several API description languages, for example
those related to ILU, SWIG, or DCOM.
- The metadata is generated in part manually and in part
by a preprocessor.
- This usually involves adding some manual tags to the
definitions or declarations of types and functions. These tags
are either used by a special purpose preprocessor or by a