Description of Multiple Projects Workflow

Author: Martin Blais
Contact: blais@furius.ca
LastUpdate:$Date$

Abstract

A description of a simple but efficient system for organizing source files and websites when working on many projects at the same time. The workflow covers multiple sites, files under version control, and data files.

Introduction

This document describes the system that I have been using to work on many projects at the same time: managing all the source files and publishing easily to a web server, without going crazy. The system is composed of very simple ideas, but the decisions behind them were many and carefully considered. Unfortunately, not all of them are documented here: there are too many, and over time I have forgotten some of the original reasons. The system has proved over time to make my varied work manageable, and it has helped me maintain my sanity in keeping all files and details under control when working at multiple sites.

In extreme programming, it has been said that simple things are often more difficult to accomplish than complicated tasks. This is an attempt at finding the simplest design for efficiently working on lots of projects at the same time by someone who does.

Requirements

The scope and needs of the system here considered are the following: the need to...

General System Description

Location of Projects

After much experimentation [1], we have decided that all project directories will lie in a single directory, typically $HOME/p, which will contain only directories of the types mentioned below. That location is the heart of the system. We use the following environment variable to specify it:

PROJECTS=$HOME/p
export PROJECTS

The scripts are written so that this variable can be overridden. The user can thus maintain several projects directories (typically one for the source code at each site, and one for the web server).

This implies that the name of each project directory must be unique amongst the set of all source code, data directory and generated data projects (see below).
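As a minimal sketch of how a script might locate the projects root, consider the following. The function name projects_root and the precedence (explicit argument first, then the PROJECTS variable, then $HOME/p) are assumptions made for illustration, not a description of the actual scripts.

```python
import os
from pathlib import Path
from typing import Optional


def projects_root(override: Optional[str] = None) -> Path:
    """Pick the projects root: explicit override, then $PROJECTS, then ~/p.

    The precedence order is an assumption for this sketch.
    """
    if override:
        return Path(override).expanduser()
    from_env = os.environ.get("PROJECTS")
    if from_env:
        return Path(from_env)
    return Path.home() / "p"
```

With something like this, the same script can be pointed at a personal work space or at a web server's document root simply by changing one argument or one environment variable.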

[1] We used to have separate root directories for source-files directories and data-files directories (i.e. projects-src would be in $HOME/p and projects-data would be in $HOME/data). This makes things more complicated, because if we need to refer to the data from the source code we need to use the basename of the source directory. It is not a big deal, but it is that much nicer not to have to do that.

Types of Files

Each project directory consists of one of a few types of files:

  1. SOURCE FILES, written by the user/developer. These files need to be under version control.

    We detect that a directory contains source files by the presence of a <project>/CVS subdirectory [2];

  2. DATA FILES, used by the source code or web publishing system. These files should not be under version control, but it should be easy to share them between sites and to make sure that new additions are backed up automatically on a centralized server. The reason is that data files rarely change and are often too big to be managed efficiently by version control systems.

    We detect that a directory contains data files to be backed up to a remote server by the presence of a DATADIR file in the root of the project directory. The contents of the DATADIR file determine the backup policy: the first line of the file is interpreted as the location for the remote rsync'ed directory to back it up to, e.g. for project <pro>, using DATADIR contents:

       user@host.domain.com:/u/blais/dataroot
    
    It will be mirrored at:
    
       user@host.domain.com:/u/blais/dataroot/<pro>/
    

    Note: a DATADIR file is not allowed to be empty.

  3. GENERATED FILES, files that are generated either from the source code or through some other process. Those files can be lost or deleted without much worry. (However, since the configuration of the web server is unified with the projects source, it has been decided that the generated files lie next to the source code for simplicity's sake.)

    Generated files are detected by the absence of both a CVS subdirectory and a DATADIR file in the root of the project directory.
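The three detection rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names classify_project and data_mirror_location are hypothetical, and the choice to let a CVS subdirectory win when both markers are present is mine, since the rules do not say what happens in that case.

```python
from pathlib import Path


def classify_project(project_dir: Path) -> str:
    """Return 'source', 'data', or 'other' for one project directory."""
    if (project_dir / "CVS").is_dir():
        return "source"  # a CVS subdirectory marks version-controlled files
    if (project_dir / "DATADIR").is_file():
        return "data"    # a DATADIR file marks data backed up with rsync
    return "other"       # neither marker: generated or unmanaged files


def data_mirror_location(project_dir: Path) -> str:
    """Build '<remote root>/<project>/' from the first line of DATADIR."""
    lines = (project_dir / "DATADIR").read_text().splitlines()
    if not lines or not lines[0].strip():
        # The document states that a DATADIR file may not be empty.
        raise ValueError(f"{project_dir / 'DATADIR'} must not be empty")
    return f"{lines[0].strip().rstrip('/')}/{project_dir.name}/"
```

For a project named jukebox-data whose DATADIR contains user@host.domain.com:/u/blais/dataroot, the second function returns user@host.domain.com:/u/blais/dataroot/jukebox-data/.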

[2] Right now we use CVS, but nothing prevents us from eventually moving to another revision control system such as tla, Subversion or Perforce.

The separation of source and data files is important, because stray files lying in the source directories that are not meant to be under version control make it a royal pain in the neck to manage code. If a file is necessary but does not require revision control, we don't hack: we acknowledge it and place it in an appropriately named directory (e.g. <project>-data). This makes it a lot easier to know what's going on when you get lost. The basic message is: always place files immediately where they should be, as if you were about to leave for a 3-month holiday. Don't leave stuff lying around.

Naming Convention

Each project name must be short, and unique amongst the set of all source code, data directory and generated data projects. For consistency (and ease of typing), names may not contain underscores; dashes are used to separate words. (Whatever you choose, be consistent: it makes it easier on the fingers.)

When a data directory relates to a project, we name it <project>-data.

When a source code project's purpose is to generate a "generated project", we name it <project>-src and name the generated project simply <project>. This way, when we put the generated project under a web server (the typical use), it can simply be referred to as URL/.../<project>, which is very convenient. So we have the following arrangement:

$PROJECTS/<project1>-src           : source code directory
$PROJECTS/<project1>-data          : data directory associated to project1
$PROJECTS/<project1>               : generated directory

$PROJECTS/<project2>               : (could be a) source code directory

$PROJECTS/<project3>               : (could be a) source code directory
$PROJECTS/<project3>-data          : data directory associated to project3

$PROJECTS/<project4>               : (could be a) pure data directory not
                                     associated to any project
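The naming convention lends itself to a small check. The sketch below is an assumption-laden illustration: the helper names split_role and is_valid_name are hypothetical, and restricting names to lowercase letters and digits is my choice (the convention itself only forbids underscores and asks for dashes between words).

```python
import re

# Words of lowercase letters/digits joined by single dashes (assumed pattern).
NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def split_role(dirname: str) -> tuple[str, str]:
    """Split a directory name into (project, role) using -src/-data suffixes."""
    for suffix, role in (("-src", "source"), ("-data", "data")):
        if dirname.endswith(suffix) and len(dirname) > len(suffix):
            return dirname[: -len(suffix)], role
    # A bare name may be a source, generated, or pure data directory.
    return dirname, "unspecified"


def is_valid_name(dirname: str) -> bool:
    """Check a name against the assumed dash-separated pattern."""
    return NAME_RE.match(dirname) is not None
```

A bare name deliberately maps to "unspecified": as the arrangement above shows, the name alone cannot tell a source directory from a generated one, so the file-type markers remain the authority.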

Example

For example, at the time of writing this, my projects directory looked like this:

elbow:~/p$ ls -l
total 136
drwxr-xr-x   10 blais    users        4096 Dec 29 02:50 adventures
drwxr-xr-x    4 blais    users        4096 Dec  9 14:07 backup-index
drwxr-xr-x   33 blais    users        4096 Jan  1 19:08 camera
drwxr-xr-x    9 blais    users        4096 Dec 29 02:51 camera-tools
drwxr-xr-x    7 blais    users        4096 Dec 26 21:17 company
drwxr-xr-x    6 blais    users        4096 Jan  2 17:02 conf
drwxr-xr-x    7 blais    users        4096 Dec 29 02:50 contract-tools
drwxr-xr-x    8 blais    users        4096 Jan  1 17:24 curator
drwxr-xr-x    6 blais    users        4096 Jan  1 15:26 drumming-notes
drwxr-xr-x   10 blais    users        4096 Dec 24 17:43 extra
drwxr-xr-x    8 blais    users        4096 Dec 24 17:43 hiertemp
drwxr-xr-x    3 blais    users        4096 Jan  3 14:14 impdoc
drwxr-xr-x   11 blais    users        4096 Dec 29 02:56 jukebox
drwxr-xr-x    4 blais    users        4096 Dec 26 21:22 jukebox-data
drwxr-xr-x    5 blais    users        4096 Jan  1 15:29 languages
drwxr-xr-x    9 blais    users        4096 Sep 29 22:07 latindance
drwxr-xr-x    5 blais    users        4096 Dec 29 02:51 mailsoup
drwxr-xr-x    5 blais    users        4096 Dec 26 23:08 memcard
drwxr-xr-x    3 blais    users        4096 Dec 24 17:43 optparse-completion
drwxr-xr-x    6 blais    users        4096 Dec 29 02:50 pydeps
drwxr-xr-x    5 blais    users        4096 Dec 24 17:43 reviews
drwxr-xr-x    4 blais    users        4096 Jan  1 17:24 rhythm-latin
drwxr-xr-x    6 blais    users        4096 Dec 29 02:51 salsa
drwxr-xr-x    4 blais    users        4096 Dec 26 23:27 santeriadb-data
drwxr-xr-x    7 blais    users        4096 Jan  1 18:26 santeriadb-src
drwxr-xr-x    3 blais    users        4096 Oct 28 00:18 solus-config
drwxr-xr-x    4 blais    users        4096 Jan  1 18:23 techdoc
drwxr-xr-x    9 blais    users        4096 Jan  1 15:40 tengis
drwxr-xr-x    4 blais    users        4096 Jan  2 17:21 web-diro
drwxr-xr-x    5 blais    users        4096 Jan  3 14:23 web-furius
drwxr-xr-x    4 blais    users        4096 Dec 29 02:51 x2vnc
drwxr-xr-x   11 blais    users        4096 Dec 31 19:02 xxdiff

Central Repository

We have a remote computer to which all work changes are saved at the end of every work period (typically every day or half-day). The source code projects are managed using cvs. The data directories are duplicated using rsync.

The location of the repositories is specified in the following configuration options (in a .projectsrc configuration file, read by a custom script):

[options]
source_root = "user@cvs.server.com:/path/to/cvsroot"
data_root = "rsync.server.com:/path/to/dataroot"

There are a few more options supported by my script, but they are not relevant to this document.
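For illustration, here is one way such a file could be read with Python's standard configparser module. This is a sketch only: the real script's parser is not shown in this document, and the function name read_roots is hypothetical. Note that configparser keeps the double quotes around values, so the sketch strips them.

```python
import configparser

SAMPLE = '''\
[options]
source_root = "user@cvs.server.com:/path/to/cvsroot"
data_root = "rsync.server.com:/path/to/dataroot"
'''


def read_roots(text: str) -> dict[str, str]:
    """Extract the two repository locations from .projectsrc-style text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    options = parser["options"]
    # Strip the surrounding double quotes that configparser leaves in place.
    return {key: options[key].strip('"') for key in ("source_root", "data_root")}
```

Calling read_roots(SAMPLE) yields a dictionary with the two unquoted locations from the example above.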

Management Script

I wrote a script to manage the update, change, commit process over this structure. It finds the projects and their types and can apply commits and updates globally, and more.

The following example command output lists the checked-out projects: project name, type (source: source files under CVS; data: data files managed with rsync; or other: just an unmanaged directory, e.g. HTML generated by a project), whether it is backed up to the central repository, and the full path of the directory:

elbow:~$ projects lsco
adventures             [ source | backup ]: /home/blais/p/adventures
backup-index           [ other  |        ]: /home/blais/p/backup-index
books                  [ other  |        ]: /home/blais/p/books
camera                 [ other  |        ]: /home/blais/p/camera
camera-tools           [ source | backup ]: /home/blais/p/camera-tools
company                [ data   | backup ]: /home/blais/p/company
conf                   [ source | backup ]: /home/blais/p/conf
contract-tools         [ source | backup ]: /home/blais/p/contract-tools
curator                [ source | backup ]: /home/blais/p/curator
drumming-notes         [ source | backup ]: /home/blais/p/drumming-notes
extra                  [ source | backup ]: /home/blais/p/extra
hiertemp               [ source | backup ]: /home/blais/p/hiertemp
impdoc                 [ source | backup ]: /home/blais/p/impdoc
jukebox                [ source | backup ]: /home/blais/p/jukebox
jukebox-data           [ data   | backup ]: /home/blais/p/jukebox-data
languages              [ source | backup ]: /home/blais/p/languages
latindance             [ source | backup ]: /home/blais/p/latindance
mailsoup               [ source | backup ]: /home/blais/p/mailsoup
memcard                [ source | backup ]: /home/blais/p/memcard
optparse-completion    [ source | backup ]: /home/blais/p/optparse-completion
projects               [ source | backup ]: /home/blais/p/projects
pydeps                 [ source | backup ]: /home/blais/p/pydeps
reviews                [ source | backup ]: /home/blais/p/reviews
rhythm-latin           [ source | backup ]: /home/blais/p/rhythm-latin
salsa                  [ source | backup ]: /home/blais/p/salsa
santeriadb-data        [ data   | backup ]: /home/blais/p/santeriadb-data
santeriadb-src         [ source | backup ]: /home/blais/p/santeriadb-src
techdoc                [ source | backup ]: /home/blais/p/techdoc
tengis                 [ source | backup ]: /home/blais/p/tengis
web-diro               [ source | backup ]: /home/blais/p/web-diro
web-furius             [ source | backup ]: /home/blais/p/web-furius
x2vnc-modifs           [ source | backup ]: /home/blais/p/x2vnc-modifs
xxdiff                 [ source | backup ]: /home/blais/p/xxdiff

Setting Permission on the CVS Server

Permissions are not set automatically for CVS. Since we add files regularly, it becomes annoying to have to set permissions by hand on the server. In fact, this often only becomes a visible problem at the moment of publishing the files on a web server. We would like the process to be automated. To that effect, we wrote a script that sets the permissions from within the cvsroot on the server. The script is called cvsroot-perms and is invoked from this configuration in CVSROOT/loginfo:

^CVSROOT (echo ""; id; echo %s; date; cat) >> $CVSROOT/CVSROOT/commitlog

DEFAULT (echo "================================================="; date; cat) \
        >> $CVSROOT/CVSROOT/commitlog 2>&1 ; \
        ( /u/blais/p/conf-local/plat/i686-pc-linux/bin/python \
          /u/blais/p/conf/common/bin/cvsroot-perms --logbox \
          $CVSROOT/%{} 2>&1 ) | tee -a $CVSROOT/CVSROOT/commitlog
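The cvsroot-perms script itself is not reproduced here. As a rough idea of what a permission-fixing pass might look like, the following sketch walks a directory tree and adds read access for group and others (plus directory traversal). The function name open_up_permissions and the specific permission bits are assumptions for illustration; the real script may well apply a different policy.

```python
import os
import stat


def open_up_permissions(root: str) -> None:
    """Add group/other read access below root (the modes are an assumption)."""
    dir_bits = stat.S_IRGRP | stat.S_IXGRP | stat.S_IROTH | stat.S_IXOTH
    file_bits = stat.S_IRGRP | stat.S_IROTH
    for dirpath, _dirnames, filenames in os.walk(root):
        # Keep the existing mode and only add the missing bits.
        os.chmod(dirpath, stat.S_IMODE(os.stat(dirpath).st_mode) | dir_bits)
        for name in filenames:
            path = os.path.join(dirpath, name)
            os.chmod(path, stat.S_IMODE(os.stat(path).st_mode) | file_bits)
```

Because bits are only ever added, running such a pass after every commit is harmless for files that already have the right permissions.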

Publishing on a Web Server

One important consideration for this system's design was to be able to share specific projects over a web server as easily as possible, with as little maintenance as possible.

We simply create a directory to contain all of the web server's contents and use that as a projects root. The web server is configured so that requests for simple documents in the root (i.e. without slashes) are redirected to /home/<doc>, so that a request for /index.html will be redirected to /home/index.html.

Also, requests to /home/... are aliased to one of the projects (typically one of the web-<something> projects). This allows me to have meaningful names for the various web projects without having to have many "home" projects.

All links on the websites and projects then refer to /<project>/<path>/<to>/<file>. It's very simple. Project names are short and do not change often, i.e. they are not a long public "title" that represents the project, but rather a short identifier/directory name that I use internally to identify that project (and in the URL). I can decide to check out (or copy) any subset of projects on any web server that I manage.

Also, the script I wrote to manage the projects in my personal work space has an option to specify the root of the projects directory, which makes it trivial to update websites.

For example, here is my configuration for my company's website:

#
# Apache config for Furius website layout.
#

# projects directory. everything lies here, it's simpler to manage.
DocumentRoot /home/httpd/public

# serve "ROOT/web-furius" as "DOMAIN/home".
Alias /home /home/httpd/public/web-furius

# make sure that ROOT/web-furius cannot be accessed as "DOMAIN/web-furius".
Redirect /web-furius http://www.furius.ca/home

# make "DOMAIN/error" appear as available from root.
Alias /error /home/httpd/public/web-furius/error

# setup specific error URLs.
ErrorDocument 404 /error/error404.html

# redirect requests for domain ("/") to "DOMAIN/home".
RedirectMatch ^/$ http://www.furius.ca/home

# redirect requests for "DOMAIN/file.{css,html,png}" to "DOMAIN/home" files.
# .txt is important for robots.txt redirection.
RedirectMatch ^/([^/]*)\.(css|html|png|txt)$ http://www.furius.ca/home/$1.$2
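To make the effect of the redirect rules concrete, here is a small Python model of them. It is only an approximation for illustration and not how Apache processes directives internally; the function name redirect_target is hypothetical, and the Alias and ErrorDocument lines are left out.

```python
import re
from typing import Optional

SITE = "http://www.furius.ca"  # taken from the configuration above

# (pattern, replacement) pairs mirroring the two RedirectMatch directives.
REDIRECT_MATCHES = [
    (re.compile(r"^/$"), SITE + "/home"),
    (re.compile(r"^/([^/]*)\.(css|html|png|txt)$"), SITE + r"/home/\1.\2"),
]


def redirect_target(path: str) -> Optional[str]:
    """Return the redirect URL for a request path, or None if not redirected."""
    # "Redirect /web-furius ..." covers the path itself and everything below
    # it; the remainder of the path is appended to the target.
    if path == "/web-furius" or path.startswith("/web-furius/"):
        return SITE + "/home" + path[len("/web-furius"):]
    for pattern, replacement in REDIRECT_MATCHES:
        match = pattern.match(path)
        if match:
            return match.expand(replacement)
    return None
```

In this model a request for /index.html maps to http://www.furius.ca/home/index.html and /robots.txt maps to http://www.furius.ca/home/robots.txt, while a project path such as /xxdiff/doc/index.html is not redirected and is served straight from the projects directory.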

Per-Project Configuration

With this setup, my configuration files are simply one of the projects that I check out in my projects directory root. An advantage of a regular organization of projects is that I can easily distribute the initialization of the environment (shell variables) and shell declarations (functions, aliases, etc.) into each of the projects. My configuration files automatically initialize using each project's .../etc/env and .../etc/bashrc if the project is checked out to the projects directory root. This significantly reduces pollution of my shell's environment (something that quickly gets out of control).
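The discovery step can be sketched as follows. The actual initialization happens in shell configuration files, so this Python version only illustrates which files would be picked up; the function name init_files and the sorted traversal order are assumptions.

```python
from pathlib import Path


def init_files(projects_root: Path) -> list[tuple[Path, Path]]:
    """Return (project_dir, init_file) pairs for existing etc/env and etc/bashrc."""
    found = []
    # Sorted order is an assumption; it just makes the result predictable.
    for project_dir in sorted(p for p in projects_root.iterdir() if p.is_dir()):
        for name in ("env", "bashrc"):
            candidate = project_dir / "etc" / name
            if candidate.is_file():
                found.append((project_dir, candidate))
    return found
```

A project that is not checked out simply contributes nothing, which is what keeps the shell environment limited to the projects actually present at a given site.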

Internal Project Organization

While this section is out of the scope of the UNIX configuration issue, I document it here because with the longer fingers that the Python language provides me, many of the projects that I fiddle with daily are simply outgrowths of the scripts that used to be in my conf/share/bin configuration scripts directory.

For simplicity's sake, I want to have a certain consistency in the directory organization of my projects. In the following we're assuming that the web page for the project is also the project itself (although for some projects the web page is a separate CVS module/project by itself). These are checked out under my $PROJECTS directory:

.../
Root of the project.
.../README
General description file for the project, sometimes used as the source for generating index.html file where it makes sense (see .../.docutils file below).
.../CHANGES
Change log, contains change history of a project, latest at the top.
.../TODO
List of tasks to be done, and ideas for future development. The important or urgent section of this file ends at the .. end marker.
.../VERSION

Version number of package, if there is one. It is useful to have a single location for the version number because it makes releasing software often much easier. Also, to have a well-defined location of the version file allows us to write a release script that works with all projects.

(Note: I only use version numbers/tags for software that is publicly released, e.g. via sourceforge or freshmeat, or that has a downloadable "package" cut for it, otherwise having to choose/update version numbers just gets in the way.)

.../PKGINFO

A list of one-line resources for the package, where appropriate. All software projects, or at least all software projects that are published, should have this file, so that we can identify which external dependencies are being used by the project.

This is a simple file in RFC822 format with the following kinds of fields:

Official-Homepage:
Mirror-Homepage:
Announcement:
Mailing-List:
Source-Code-Repository:
Bugs-Reporting:
Download-Releases:
Download-Snapshots:
ChangeLog:
License:
Author:
Author-Email:
Short-Description:

Each of the fields can be repeated where relevant.

.../etc/docutilsrc
I use the docutils text format almost everywhere now; it is very useful for writing simple HTML documents with a text editor. Since it is pervasive in my system, I wrote a script (similar to docutils-html) that uses my own settings and that looks for a style.css file up the path of the output file. The .docutils file in a project's root indicates the needed .txt to .html conversions. (I check the generated HTML files into the repository, but sometimes I might need to regenerate all the docutils output for some reason.)
.../COPYING
GNU GPL license file, if the project is to be distributed under that license.
.../index.html (.../index.txt)
The Home Page, for web publication, generated in HTML with docutils. Sometimes it is generated automatically from the README file, sometimes by a different index.txt file.
.../Makefile
A makefile, if there are oft-used and convenient operations to be run for that project (e.g. building a web-site).
.../doc
User and Design Documentation for the project. Sometimes the web page from the root links to documents in this space.
.../bin
Scripts (and executables) to run.
.../lib
Libraries (source and compiled) to run.
.../lib/python/<libname>/...
When I have Python libraries they go under a python subdir.
.../lib/emacs/...
When I have emacs-lisp files they go under an emacs subdir.
.../etc (.../etc/env, .../etc/bashrc)
Configuration files and sample configuration for the project. The main configuration files (in my conf project) look at all project directories that are checked out and, if those files exist, run them automatically (for each project). Thus my environment and bash setup vary slightly depending on which of the projects I have set up. This also allows me to put project-related environment settings close to the project files, rather than in a large single .bashrc file.
.../src
Source code for code to be compiled.
.../adm
Development scripts to be run for distribution and maintenance.
.../share
Data that pertains to the project. This can contain any data. This is where DTD and XSL files should go as well.
.../test
Whatever test code and data.
.../tools
Related tools, scripts that are of interest to users but that should not be installed by default.
.../misc
Miscellaneous other stuff.
.../old
Old and obsolete files.

Typical Initialization

Typically, the .../etc/env file will contain something akin to this:

USERPATH=$USERPATH:$PROJDIR/bin
PYTHONPATH=$PYTHONPATH:$PROJDIR/lib/python

to make the project's scripts available on the user's PATH and its Python modules available to Python. Other environment variables get set here as well. This is just an example.

Sharing Libraries

Our code projects contain multiple libraries, e.g. Python or Perl modules. We need to strike a balance between

  1. having a single location for each library (the Single Point of Truth principle);
  2. the ability to run and release programs without necessarily having to release or check out dependent libraries. This means that a project is self-contained, including its libraries, unless we choose to accept a particular dependency. This also makes it much easier to package a project because all the required files are contained in it;
  3. in the presence of multiple copies of a library, having a clear, simple rule to determine which copy is being used when running a program (there is nothing more annoying than testing a bug fix when you're editing the wrong library file);
  4. if a dependency is deemed "optional", those libraries should be loadable in the presence AND absence of the master copy.

The gist of this is that while we really want to minimize dependencies between our projects, we would also like to have a single master source copy of any particular library (to avoid having to merge), but we want to allow fallback copies of the master so that a project still works in those cases where the master is not present.

Consequently, we have chosen to take the following approach for lightweight libraries/modules (this solves requirement (1) above):

  • libraries that are not a defining part of a released project will lie in the conf project (e.g. conf/common/lib/python/...), which already acts as the "base" set of files that the user's configuration relies upon. Their master copy is the one in conf;
  • libraries that are a defining part of a released project will have their master copy in that project (e.g. optcomplete/lib/python/optcomplete.py).

We will thus separate the library path and directories between a master part, and a fallback part. For the programs that require this, the configuration will use the variables <libpath> and <libpath>_FALLBACK to set the final <libpath> for the program and we adopt the convention that fallback libraries that would go under .../lib/<program> in the master copy will go under .../lib/<program>-fallback in the fallback location.

For example, for Python we would have:

PYTHONPATH=$PYTHONPATH:$PROJDIR/lib/python
PYTHONPATH_FALLBACK=$PYTHONPATH_FALLBACK:$PROJDIR/lib/python-fallback

...and later on, at the end of the environment code, the final value of the PYTHONPATH is set, to ensure proper ordering (i.e. to make sure that the master copy is caught first in the path):

PYTHONPATH=${PYTHONPATH}:${PYTHONPATH_FALLBACK}
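The ordering rule can be illustrated with a short Python sketch. It models the search with a plain first-match lookup for a single-file module rather than Python's real import machinery; the function names final_search_path and locate_module are hypothetical.

```python
import os
from typing import Optional


def final_search_path(master: list[str], fallback: list[str]) -> list[str]:
    """Master directories first, then fallbacks, dropping empty entries."""
    return [directory for directory in master + fallback if directory]


def locate_module(name: str, search_path: list[str]) -> Optional[str]:
    """Return the first '<name>.py' found along search_path, or None."""
    for directory in search_path:
        candidate = os.path.join(directory, name + ".py")
        if os.path.isfile(candidate):
            return candidate
    return None
```

When both a master copy and a fallback copy of a module exist, the master copy is found first; when the project holding the master copy is not checked out, the lookup falls through to the copy under the -fallback directory, so the program still runs.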