$G Software Engineering Day Notes


Mantras for Developing and Using Visualization Software

Here are some mantras for developing software tools.  They are
motivated by longevity, reproducibility, and the ability for the group
to develop tools together and build on each other's work.


 *  use $G for all code

$G provides you with source control, Makefile support, portability
across platforms (including Windows and Linux), exportability, and a
framework that facilitates collaborative software development.


 *  nothing goes in your own directories

Imagine you are trying to figure out something that someone before you
had already figured out.  They worked on it in their home directory or
in /map/gfx0/common/users/.  Now that they've graduated, those
directories are gone.  Bummer!

Think about the future.  Anything that you work on that might be
useful in the future should be in an appropriate place on the file
server.  For example:

	code in $G/src/
	scientific datasets in $G/data/
	research results in /map/gfx0/common//
	talks in /pro/graphics/talks///
	papers in /pro/graphics/papers///
	proposals in /pro/graphics/proposals///

And, if it's not useful for the future, then why are you working on
it?  Everything in its place, nothing in /home or
/map/gfx0/common/users!


 *  use the carriage-return principle

The carriage-return principle states that anything that can be done
with a command-line interface should be.  This is important for
reproducibility and scriptability.

First, consider reproducibility.  It is tempting to tweak results
interactively while racing toward a deadline.  But quantifying what's
been done, refining the results later, or replicating them for another
dataset or example is much more difficult when the settings were
chosen interactively.

Scriptability is akin to reproducibility.  If someone wants to do
something to hundreds of examples they'll be happier if they can write
a script to do it than if they have to do something interactive for
each instance.  When we can build on earlier results, we can take
bigger conceptual steps.  The carriage-return principle helps!
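
As a rough sketch of what this looks like in code, here is a parser
that takes every setting that affects the result from the command
line, so a run can be repeated or scripted.  The tool, the
RenderSettings struct, and the "-iso" and "-width" flags are all
made up for illustration; they are not part of $G.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical settings for an imaginary isosurface tool.  Every value
// that affects the result can be set from the command line.
struct RenderSettings {
    double isovalue;   // surface threshold
    int    width;      // image width in pixels
};

// Fill 'out' from arguments of the form "-iso <value>" and
// "-width <pixels>".  Returns false on an unknown flag, a flag with no
// value, or a value that is not a number.
bool parse_settings(int argc, const char* const argv[], RenderSettings& out)
{
    assert(argc >= 1);    // argv[0] is the program name
    out.isovalue = 0.5;   // defaults, so a bare invocation still works
    out.width = 512;
    for (int i = 1; i < argc; i += 2) {
        if (i + 1 >= argc) return false;            // flag without a value
        char* end = 0;
        if (std::strcmp(argv[i], "-iso") == 0) {
            out.isovalue = std::strtod(argv[i + 1], &end);
        } else if (std::strcmp(argv[i], "-width") == 0) {
            out.width = static_cast<int>(std::strtol(argv[i + 1], &end, 10));
        } else {
            return false;                           // unknown flag
        }
        if (end == argv[i + 1] || *end != '\0') return false;  // not a number
    }
    return true;
}
```

With something like this in place, the exact command line used for a
figure can be pasted into a script or a Makefile and rerun later.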


 *  use Makefiles and symlinks to leave an e-trail for your results

When I write a paper, I try to leave a trail specifying how the
results were generated.  Each figure that I include in my LaTeX has an
entry in the Makefile in that directory.  Sometimes that entry creates
a tweaked version of a local image or figure.  Sometimes it creates a
tweaked version, symlink, or copy of an image or figure from
elsewhere.  All of those intermediate results are, in turn, built by
commands in a Makefile in the directory where they live.

The implication here is that I can always go back and figure out how
results were generated.  Some things can't be scripted like this, but
in that case I put comments in the Makefile saying how things were
generated.

The same e-trail mantra applies any time you generate results, not
just for papers.  If you make a habit of it, you'll be able to
reconstruct what you did, as will others.  It is a little more work as
you go, but not as much as you might think.


 *  write big libraries and small "main"s

In general, keep the non-library source for your programs as small as
you can.  Functionality that isn't in libraries can't be shared, so try
to create libraries that encapsulate your main functionality.  Then
write a small main that uses the library.
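
Here is a minimal sketch of that split.  The mean() routine stands in
for code that would live in a library (the names are made up for this
example), and mean_tool() is the whole "program": it only handles
arguments and output.  It is written as a function so it can be called
from a test; a real main(argc, argv) would just return
mean_tool(argc, argv).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <vector>

// ---- library part: reusable by any other program ----

// Mean of the samples.  The caller must pass at least one value.
double mean(const std::vector<double>& samples)
{
    assert(!samples.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < samples.size(); ++i) sum += samples[i];
    return sum / static_cast<double>(samples.size());
}

// ---- program part: argument handling and output only ----

// Prints the mean of the numbers given as arguments.  Returns 0 on
// success and 1 if no numbers were given.  std::atof keeps the sketch
// short; it does not detect malformed numbers.
int mean_tool(int argc, const char* const argv[])
{
    if (argc < 2) return 1;
    std::vector<double> samples;
    for (int i = 1; i < argc; ++i) samples.push_back(std::atof(argv[i]));
    std::cout << mean(samples) << "\n";
    return 0;
}
```

Anything interesting is in mean(), where the next program can reuse it.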


 *  read Writing Solid Code (WSC)

This book is full of excellent suggestions about making your software
better.  The next couple of mantras are drawn from WSC.

 +  use automatic run-time error checking (WSC)

On Solaris, dbx has an excellent "check -all" command that finds lots
of bugs.  It looks for out-of-bounds memory references, including
accesses to memory that has been freed.  It also flags data that is
used before being set.

On Linux, gdb does not do this, but the somewhat quirky tool
"valgrind" performs many of the same checks.  It is tricky to get
working, particularly for graphics applications.  The 2003 CS190 web
page has some instructions.  Under Linux, "electric fence" will also
find some memory problems.

	see also http://www.linuxgazette.com/book/view/8755


 +  check assumptions with assertions (WSC)

When you make an assumption within your software, check it with the
"assert" macro.  The macro produces no code when NDEBUG is defined, as
it typically is for optimized builds, so there's no run-time cost
there.  Assertions often find subtle bugs.
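
For example, here is a small sketch of a routine that states its
assumptions as assertions.  The sample_at() function is invented for
illustration; the point is that each assumption in the comment has a
matching assert.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear interpolation into a table of samples at fractional index t.
// Assumptions, checked unless NDEBUG is defined: the table has at
// least two entries and 0 <= t <= size - 1.
double sample_at(const std::vector<double>& table, double t)
{
    assert(table.size() >= 2);
    assert(t >= 0.0 && t <= static_cast<double>(table.size() - 1));

    std::size_t i = static_cast<std::size_t>(t);
    if (i == table.size() - 1) --i;          // t is exactly the last index
    double frac = t - static_cast<double>(i);
    return (1.0 - frac) * table[i] + frac * table[i + 1];
}
```

A caller that passes an empty table or an out-of-range index stops at
the assertion, close to the real bug, instead of reading bad memory.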


 +  keep interfaces and implementations simple (WSC)

In general, avoid generalization.  A real-valued square-root routine
should fail if passed a negative number.  Don't do your caller a
"favor" by hiding erroneous calls.  They probably are bugs!
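
To make the contrast concrete, here is a sketch of both versions.  The
function names are made up, and failing through assert is just one way
to fail; the point is only that the second version refuses the bad
call rather than papering over it.

```cpp
#include <cassert>
#include <cmath>

// The "favor": quietly takes the absolute value, so a caller that
// passes a negative number by mistake never finds out.
double forgiving_sqrt(double x)
{
    return std::sqrt(std::fabs(x));
}

// The simple version: a negative argument is treated as a caller bug
// and trips the assertion (unless NDEBUG is defined).
double real_sqrt(double x)
{
    assert(x >= 0.0);
    return std::sqrt(x);
}
```

forgiving_sqrt(-4.0) happily returns 2, and the bug that produced the
-4 lives on somewhere upstream.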


 *  write simple component-level tests for all your code

For each library you build, please build some simple test programs or
scripts that link against the library in the source directory, run
with no arguments, and return non-zero if they don't work.  Often, a
script can run a test executable, use diff to compare its output
against output saved from an earlier successful run, and return the
output and return value of the diff.  The library "libnum+" has some
excellent examples of test cases to use as models.

	$G/src/libnum+/runtests
	$G/src/libnum+/t{mat,pode,qnewton,linsys}.C
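
If you don't have libnum+ handy, here is a rough sketch of the shape
such a test program can take.  It is not taken from libnum+; blend()
is a stand-in for a routine from the library under test, and check()
and run_tests() are invented helpers.  The test program's main() would
take no arguments and simply return run_tests().

```cpp
#include <cassert>
#include <cmath>
#include <iostream>

// Stand-in for a library routine under test (hypothetical; a real test
// would link against the library in the source directory instead).
double blend(double a, double b, double t) { return a + (b - a) * t; }

static int failures = 0;

// Record one check: print a message and count a failure if it is false.
void check(bool ok, const char* what)
{
    assert(what != 0);   // every check needs a description
    if (!ok) {
        std::cerr << "FAILED: " << what << "\n";
        ++failures;
    }
}

// Runs every check and returns the exit status for the test program:
// 0 if all checks passed, 1 otherwise.
int run_tests()
{
    check(blend(0.0, 10.0, 0.0) == 0.0, "blend at t=0");
    check(blend(0.0, 10.0, 1.0) == 10.0, "blend at t=1");
    check(std::fabs(blend(2.0, 4.0, 0.5) - 3.0) < 1e-12, "blend midpoint");
    return failures == 0 ? 0 : 1;
}
```

Because the result is just an exit status, a script or Makefile target
can run every test in a directory and stop at the first failure.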


 *  create and use integration tests

Some things just can't be completely tested inside a single source
directory with component tests.  The more others come to depend on
your library, the more this will be true.  Test it with those other
components by installing a beta version and testing it before you
install the production version.

Here's the process: first test it with component tests; second,
install it as $G/lib/libmylib-beta.a (note the beta); third, test it
and make sure that it is stable; fourth, install the production
version.  Sometimes you can do this in 10 minutes.  Sometimes you
should test the beta version for a few weeks before replacing the
production one.  It's great to move fast, but don't go so fast that
others pay more than you are gaining.


 *  use the args package

The args package defines, in a compile-time initialized structure, how
a main program should be used.  It processes argc and argv to capture
the argument values and also generates a usage message automatically.
Any program that uses the args package can be invoked with a single
"--" argument to produce the usage message.

If you are writing a script or other non-C/C++ program, a usage
message that shows up with a "--" argument is a good idea.

The test program tqnewton in libnum+ is a good simple example.  This
package does a lot that isn't illustrated in that example; see the
man page for all the details.
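
To illustrate the idea only, here is a hand-rolled sketch of a
table-driven usage message.  This is NOT the args package or its API
(see the man page for that); ArgSpec, kArgs, usage(), and
wants_usage() are names invented for this example, as are the flags.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <string>

// One row per argument.  The table is initialized at compile time.
struct ArgSpec {
    const char* name;   // flag as typed on the command line
    const char* help;   // one-line description
};

static const ArgSpec kArgs[] = {
    { "-input <file>", "dataset to load" },
    { "-iso <value>",  "isosurface threshold" },
};
static const int kNumArgs = sizeof(kArgs) / sizeof(kArgs[0]);

// Build the usage message from the table, so the message cannot drift
// away from the list of arguments the table describes.
std::string usage(const char* program)
{
    std::ostringstream out;
    out << "usage: " << program << " [options]\n";
    for (int i = 0; i < kNumArgs; ++i)
        out << "  " << kArgs[i].name << "  " << kArgs[i].help << "\n";
    return out.str();
}

// True when the program was invoked with a single "--" argument.
bool wants_usage(int argc, const char* const argv[])
{
    assert(argc >= 1);
    return argc == 2 && std::strcmp(argv[1], "--") == 0;
}
```

A main() built this way would print usage(argv[0]) and exit whenever
wants_usage(argc, argv) is true.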


 *  install demo scripts, make them default to an interesting demo

When you have a demo working, install it in $G/bin so others can use
it.  With no arguments, it should start up with something reasonable
displayed.  For example, you might have a program called bat-vis.  You
might install $G/bin/bat-vis as well as a script $G/bin/bat-vis-demo.
The demo script should automatically load an interesting dataset.
Basically, make it so that someone else who knows nothing about your
program can at least start it up and see something interesting.  If
you're working on a Cave program, it's a good idea to have both
bat-vis-demo-desktop and bat-vis-demo-cave installed.  That way, there
is a fallback option if the Cave hardware is down.  A README file
describing how to work your demo should go in $G/doc.  Man pages,
if you have them, should go in $G/man.


 *  make your .h files work for any architecture, use #ifdefs in .cpp
    files

The $G Make.gfxtools.rules makefile defines these symbols, depending
on the architecture you are running on:

	WIN32, LINUX, SGI, SOL

If you need to do architecture-specific stuff, use #ifdefs:

#ifdef WIN32
  // windows specific code
#else
  // unix style code
#endif
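
Here is a slightly fuller sketch of the split between header and
implementation.  The file names platform.h and platform.cpp and both
functions are hypothetical; both "files" are shown in one listing to
keep the example short.  The header part has no #ifdefs, and only the
.cpp part looks at WIN32.

```cpp
#include <cassert>
#include <string>

// ---- platform.h (hypothetical): identical on every architecture ----

// Directory separator used by the host file system.
char path_separator();

// Joins a directory and a file name with the host's separator.
std::string join_path(const std::string& dir, const std::string& file);

// ---- platform.cpp (hypothetical): architecture-specific code ----

char path_separator()
{
#ifdef WIN32
    return '\\';   // windows specific code
#else
    return '/';    // unix style code
#endif
}

// Portable code built on top never needs to know the architecture.
std::string join_path(const std::string& dir, const std::string& file)
{
    assert(!dir.empty() && !file.empty());
    return dir + path_separator() + file;
}
```

Code that includes the header compiles the same way everywhere, and
porting to a new architecture means touching only the .cpp file.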