The inputs to our system consist of synchronized multi-view video streams that are recorded in our
acquisition room (Fig. 1).
We record the person with a convergent arrangement of cameras (Fig. 2) around the center of the scene.
In each camera view, the silhouette of the person in the foreground is
computed using a background segmentation scheme based on per-pixel color statistics.
In the current setup, we use 8 cameras for recording.
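
To illustrate the kind of per-pixel color statistics such a segmentation scheme can rely on, the following is a minimal sketch and not our exact implementation. It assumes that a set of frames of the empty scene is available for each camera, models every pixel by the mean and standard deviation of each color channel, and marks a pixel as foreground if any channel deviates by more than `k` standard deviations. The function names, the threshold `k`, and the noise floor `min_std` are hypothetical choices made for this example.

```python
import numpy as np


def fit_background_model(background_frames):
    """Estimate per-pixel, per-channel mean and standard deviation.

    background_frames: array of shape (N, H, W, 3) showing the empty scene.
    """
    frames = np.asarray(background_frames, dtype=np.float64)
    mean = frames.mean(axis=0)  # shape (H, W, 3)
    std = frames.std(axis=0)    # shape (H, W, 3)
    return mean, std


def segment_foreground(frame, mean, std, k=3.0, min_std=2.0):
    """Return a boolean (H, W) silhouette mask for one frame.

    A pixel is foreground if at least one color channel differs from the
    background mean by more than k standard deviations. min_std is an
    assumed noise floor that keeps very stable pixels from being
    oversensitive.
    """
    frame = np.asarray(frame, dtype=np.float64)
    sigma = np.maximum(std, min_std)
    deviation = np.abs(frame - mean)
    return (deviation > k * sigma).any(axis=-1)
```

In a multi-view setup, one such model would be fitted per camera and applied independently to every frame of that camera's stream. Practical schemes often add shadow handling and morphological cleanup, which this sketch omits.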
Figure 1: Screenshots of the acquisition room.
Figure 2: Schematic view of the convergent camera arrangement around the center of the scene.
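
As a rough illustration of what a convergent arrangement means geometrically, the sketch below places cameras on a ring and points each one at a common target in the center of the scene. The even angular spacing, the ring radius, the camera height, and the target point are assumptions made for this example only; the actual positions in the acquisition room are not specified here.

```python
import math


def convergent_camera_ring(num_cameras=8, radius=3.0, height=1.5,
                           target=(0.0, 0.0, 1.0)):
    """Return a list of (position, view_direction) tuples.

    Cameras are spaced evenly on a horizontal circle around the target
    (an assumption), and each unit view direction points at the target.
    """
    cameras = []
    for i in range(num_cameras):
        angle = 2.0 * math.pi * i / num_cameras
        position = (
            target[0] + radius * math.cos(angle),
            target[1] + radius * math.sin(angle),
            height,
        )
        # Vector from the camera to the target, normalized to unit length.
        offset = tuple(t - p for t, p in zip(target, position))
        norm = math.sqrt(sum(c * c for c in offset))
        view_direction = tuple(c / norm for c in offset)
        cameras.append((position, view_direction))
    return cameras
```

Because all optical axes intersect at the target, a person standing near the center of the scene stays visible in every view.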