Now includes April 20, 2000 updates, in brown
Here are 15 images taken from a complete revolution in one of the sites (I forget which).
Here is a panorama I built out of them. Check out the interactive panorama viewer.
(You may need to download the plug-in.)
I would like to apply image-based
rendering techniques to this data (soon to be ours in the original film form,
from which we can obtain digital copies) to produce a three-dimensional
environment.
We don't have the physical film, in fact. What we have is a D2
digital tape of the film (just the Timbuktu and Dubrovnik reels), with frame
numbers. What we do when we want a specific frame is tell the film lab (which is
in San Francisco), and they do the conversion for us.
We then receive the frames on DLT tape in Cineon format, which is a subset of DPX
format. DPX is a format maintained by SMPTE for the motion picture industry, and its
header contains much useful information regarding color metrics, frame rates,
shutter angles, and so forth. Mostly what we are interested in is the 10 bits per color
channel of image information.
These files are probably going to have to be converted into something sane before I use
them (no way I need 10 bits of color, and it's logarithmic at that). I have located
the specifications and have begun writing code to do the conversion.
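As a rough sketch of what such a conversion might look like (this is not the code mentioned above): the Python below assumes the common Cineon layout of three 10-bit samples packed into each 32-bit big-endian word with two unused low bits, and it uses the customary Kodak defaults (reference white 685, reference black 95, 0.002 density per code value, negative gamma 0.6). All of these are assumptions that should be checked against the file header, and the function names are made up for illustration.

```python
import struct

# Assumed defaults; the real values should come from the file header.
REF_WHITE = 685          # 10-bit code treated as full white
REF_BLACK = 95           # 10-bit code treated as black
DENSITY_PER_CODE = 0.002
NEG_GAMMA = 0.6


def log_to_8bit(code):
    """Map one 10-bit log code value to an 8-bit linear-light value."""
    gain = DENSITY_PER_CODE / NEG_GAMMA
    black = 10.0 ** ((REF_BLACK - REF_WHITE) * gain)
    lin = 10.0 ** ((code - REF_WHITE) * gain)
    scaled = (lin - black) / (1.0 - black)
    # Codes below reference black or above reference white are clipped.
    return int(round(min(max(scaled, 0.0), 1.0) * 255))


# 1024-entry lookup table, so each pixel costs one index operation.
LUT = [log_to_8bit(c) for c in range(1024)]


def unpack_word(word):
    """Split a 32-bit word into three 10-bit samples (assumed R, G, B order)."""
    return (word >> 22) & 0x3FF, (word >> 12) & 0x3FF, (word >> 2) & 0x3FF


def convert_scanline(raw):
    """Convert packed big-endian pixel data to a list of 8-bit (r, g, b) tuples."""
    count = len(raw) // 4
    words = struct.unpack(">%dI" % count, raw[:count * 4])
    return [tuple(LUT[c] for c in unpack_word(w)) for w in words]
```

The output here is linear light with no display gamma applied; whether that is the right target depends on what the later stages expect.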
The film lab did a good job on the film-to-tape transfer but this Unix tape they gave us
is like a bad joke. Each of these files is tarred individually, with an absolute pathname,
and there are many duplicate filenames on different files. Most of the ones I've been
able to read claim to be from the right camera, but there are far more than the 72 there should be...
and others are corrupted. Here is a full-size frame with
lots of JPEG compression.
I wrote a lot of irritating tar scripts trying to make this work. The files are 50 meg each, too. This must
be what it felt like to program with punched cards.
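For what it's worth, the two specific annoyances (absolute pathnames and duplicate filenames) can be handled in a few lines with Python's standard tarfile module. This is only a sketch of one possible approach, not the tar scripts mentioned above: `extract_flat` and the numeric-suffix naming scheme are hypothetical, and reading straight from a tape device would likely need tarfile's stream mode ("r|") rather than the seekable mode used here.

```python
import os
import shutil
import tarfile


def extract_flat(fileobj, out_dir, seen):
    """Extract the regular files of one tar archive into out_dir.

    Absolute member paths are reduced to their base name, and a repeated
    name gets a numeric suffix instead of overwriting the earlier file.
    `seen` maps base name -> number of copies written so far, and is
    shared across calls so duplicates in different archives are caught.
    """
    written = []
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue
            base = os.path.basename(member.name)
            copies = seen.get(base, 0)
            seen[base] = copies + 1
            stem, ext = os.path.splitext(base)
            name = base if copies == 0 else "%s_%d%s" % (stem, copies, ext)
            source = archive.extractfile(member)
            with open(os.path.join(out_dir, name), "wb") as out:
                shutil.copyfileobj(source, out)
            written.append(name)
    return written
```

Keeping every duplicate under a distinct name makes it possible to compare them afterwards and decide which copy, if any, is uncorrupted.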
In the meantime, I have been using the data from DV digital video tapes. Unfortunately
this introduces interlacing. Please read my discussion
on interlacing issues.
Now I'm using the images from D2 tape, which is higher quality and which I think doesn't interlace.
I have found a sequence I really want to use, of San Francisco -- in the foreground is a regular
geometric tiling pattern, with a cool building in the midground with waterfalls on it. I think that
this footage would lend itself well to depth extraction and my fake transforms. The waterfalls would be tagged as
moving objects and therefore I would render them by cycling the pixel data, on top of the image objects
that result from the stationary scene.
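The pixel-cycling idea can be sketched in a few lines of Python. This is an illustration only: images are plain lists of rows, `moving_mask` is assumed to be a same-sized grid of booleans marking the waterfall pixels, and `loop_frames` is assumed to be a short list of source frames to cycle through. None of these names come from the actual application.

```python
def composite_frame(static_image, loop_frames, moving_mask, t):
    """Return the image to display at time step t.

    Pixels flagged in moving_mask are taken from the looping source
    frames; every other pixel comes from the stationary scene.
    """
    cycle = loop_frames[t % len(loop_frames)]
    height, width = len(static_image), len(static_image[0])
    return [
        [cycle[y][x] if moving_mask[y][x] else static_image[y][x]
         for x in range(width)]
        for y in range(height)
    ]
```

In a real renderer the stationary part would come from the image objects rather than a flat image, but the selection by mask would be the same.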
By the way, the other people using this data are Anselmo Lastra and Voicu Popescu.
They are the ones working on extracting depth from the data. When it comes time for me
to create image-based objects, I will either use their results or an existing
depth-from-stereo library.
My code attempts to find the horizontal offset between these images by trying different values, taking the difference between the offset images, and minimizing over those results. Here is the difference with no offset:
And here is the difference from the best match found (also cropped):
It's an improvement, but not a close enough match to use as a mask for stationary/moving discrimination. I need to
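The offset search described above can be sketched as follows. This is a minimal Python illustration and not the original code: it assumes grayscale images stored as lists of rows, uses the mean absolute difference over the overlapping columns as the score, and the function names are hypothetical.

```python
def mean_abs_difference(a, b, dx):
    """Mean |a[y][x] - b[y][x + dx]| over the columns where both exist."""
    width = len(a[0])
    x0, x1 = max(0, -dx), min(width, width - dx)
    total = 0
    for row_a, row_b in zip(a, b):
        for x in range(x0, x1):
            total += abs(row_a[x] - row_b[x + dx])
    return total / float((x1 - x0) * len(a))


def best_horizontal_offset(a, b, max_shift):
    """Try every shift in [-max_shift, max_shift] and keep the best score.

    max_shift must be smaller than the image width so that some
    columns always overlap.
    """
    candidates = range(-max_shift, max_shift + 1)
    return min(candidates, key=lambda dx: mean_abs_difference(a, b, dx))
```

Dividing by the size of the overlap keeps large shifts, which compare fewer pixels, from winning simply because they have less to sum.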
What I hope to contribute on top of existing techniques is:
Conversion of an image file into another image file
is exactly the kind of thing this is good for, since the work can be split
up evenly.
I have explored the OpenMP parallelization library, written some code using
it, and successfully achieved increased performance by utilizing multiple
processors on evans.
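The even split is easy to show, though the sketch below is a Python analogue and not the OpenMP code mentioned above. It divides the rows into near-equal bands and hands each band to a worker. The function names are hypothetical, and in CPython a thread pool mostly illustrates the decomposition; a real speedup for pure-Python pixel code would need a process pool and a picklable `per_pixel` function.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(n_rows, n_parts):
    """Return (start, stop) row ranges whose sizes differ by at most one."""
    base, extra = divmod(n_rows, n_parts)
    bands, start = [], 0
    for i in range(n_parts):
        stop = start + base + (1 if i < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands


def convert_band(image, per_pixel, band):
    """Apply per_pixel to every pixel in one band of rows."""
    start, stop = band
    return [[per_pixel(p) for p in row] for row in image[start:stop]]


def convert_parallel(image, per_pixel, workers=4, executor_cls=ThreadPoolExecutor):
    """Convert an image band by band; results come back in row order."""
    bands = split_rows(len(image), workers)
    with executor_cls(max_workers=workers) as pool:
        parts = pool.map(convert_band,
                         [image] * len(bands),
                         [per_pixel] * len(bands),
                         bands)
        return [row for part in parts for row in part]
```

Because no band depends on any other, the only coordination needed is stitching the results back together in order.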
(See above.)
It is my belief that since we know the extent of the transformation between
successive frames (a rotation of one arcminute, i.e. 1/60 of a degree), we can detect
depth-pixels that have moved by performing the rotation on the previous depth
image, and looking for differences. This depends on having depth for the image.
I think it can also be done without the depth information, because the rotation
is so small that it can be approximated by a translation of the projected pixels,
and because the angle subtended by a feature is more important than its precise
depth, since a small change in left-right or up-down has more of an effect in
screen space than does a small change in back-front.
(See my discussion of interlacing issues for
more about this.)
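To get a feel for how good the translation approximation is, the on-screen shift caused by a small pan can be computed directly. The Python sketch below assumes an ideal pinhole camera rotating about its center of projection (in which case the shift does not depend on depth at all); the field of view and image width are inputs that would have to be measured for the real camera, and the function names are made up for illustration.

```python
import math


def focal_length_px(width_px, hfov_deg):
    """Focal length in pixels for a given image width and horizontal field of view."""
    return (width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)


def exact_shift_px(x_px, focal_px, rot_deg):
    """Horizontal motion of a point seen x_px from the image center
    after the camera pans by rot_deg (pinhole model)."""
    angle = math.atan2(x_px, focal_px) + math.radians(rot_deg)
    return focal_px * math.tan(angle) - x_px


def approx_shift_px(focal_px, rot_deg):
    """Small-angle approximation: the same shift for every pixel."""
    return focal_px * math.radians(rot_deg)
```

Comparing `exact_shift_px` at the center and at the edge of the frame against `approx_shift_px` shows how much error a single uniform translation introduces; the error grows toward the edges, where the exact shift is larger than at the center.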
Again, these are mathematical processes that should achieve good speedup from
parallelization, as the individual pixels are largely independent.
Most of the features in these images appear to be nearly planar, and could actually
be well-represented by large textured polygons. (The sandy ground on the market floor, for example,
is unlikely to be looked at closely enough to require modeling of the footprints.)
This would certainly outperform the image-based objects
that we display from laser data (such as the reading room), which are essentially enormous polygon soups.
Perhaps more aggressive surface simplification is needed.
If Andrei State et al. get the DPLEX working so that different processors can pump geometry through different pipes, I can take advantage of this, since I am writing the application from the ground up rather than modifying existing code, which has problems such as using GLUT. I would find this personally satisfying, as only toy applications currently run correctly in this manner.
We have some Timbuktu images showing on a seamlessly combined multi-projector setup, about 90 horizontal degrees in all. If I can get animated (if not 3D immersive) images showing on that, it would be pretty cool.
Additional references:
Did you remember to read the page with all the angles and diagrams and stuff?
To sum up, I still have to