FHiTL - Digital Microfilm

This project is a series of prototypes to demonstrate the capabilities of a digital microfilm library that can simplify the index extraction process and provide access to the microfilm library from home.

Tasks to be done
  1. Prepare a directory of GIF images scanned from various sample microfilms. The preliminary images are from the 1910 Utah Census.
  2. Develop a multiresolution image download protocol that is accessible from Java applets.
      • A full-resolution microfilm frame is too large to download all at once over an ordinary phone modem, and in most cases the entire full-resolution frame is not wanted. The user is either skimming the frame (which does not require high resolution) or reading a small piece of it (which is small enough to deliver at high resolution).
      • The mechanism must open an image via a network connection and then allow successive requests for various pieces at various resolutions.
    1. Define the Java interface to the service
    2. Implement a local file version of the interface that works directly off of a full resolution image loaded into memory
    3. Implement a local file version that uses multiple hand-segmented images to simulate the multiresolution aspect
    4. Design a full multiresolution image format
    5. Implement a local multiresolution access facility
    6. Implement a network-based multiresolution access facility.
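
A minimal sketch of what steps 1 and 2 above might look like. The names `MultiResImage`, `InMemoryImage`, and `getRegion`, and the convention that each `level` halves the resolution, are assumptions made for illustration, not a design taken from this project.

```java
import java.awt.image.BufferedImage;

// Hypothetical service interface: a caller opens an image once, then asks
// for successive pieces at whatever resolution it currently needs.
interface MultiResImage {
    int fullWidth();
    int fullHeight();

    // x, y, w, h are in full-resolution pixels. The result is reduced by a
    // factor of 2^level in each direction (level 0 = full resolution).
    BufferedImage getRegion(int x, int y, int w, int h, int level);
}

// Local version that works directly off a full-resolution image in memory.
class InMemoryImage implements MultiResImage {
    private final BufferedImage full;

    InMemoryImage(BufferedImage full) {
        this.full = full;
    }

    public int fullWidth() { return full.getWidth(); }

    public int fullHeight() { return full.getHeight(); }

    public BufferedImage getRegion(int x, int y, int w, int h, int level) {
        if (level < 0 || level > 30 || x < 0 || y < 0 || w <= 0 || h <= 0
                || x + w > full.getWidth() || y + h > full.getHeight()) {
            throw new IllegalArgumentException("region or level out of range");
        }
        int step = 1 << level;
        int outW = (w + step - 1) / step;  // round up so edge pixels are kept
        int outH = (h + step - 1) / step;
        BufferedImage out = new BufferedImage(outW, outH, BufferedImage.TYPE_INT_RGB);
        for (int oy = 0; oy < outH; oy++) {
            for (int ox = 0; ox < outW; ox++) {
                // Nearest-neighbour subsampling: keep the top-left pixel of
                // each step-by-step cell of the requested region.
                out.setRGB(ox, oy, full.getRGB(x + ox * step, y + oy * step));
            }
        }
        return out;
    }
}
```

Under these assumptions, skimming would request the whole frame at a coarse level, for example `getRegion(0, 0, fullWidth(), fullHeight(), 3)`, while reading would request a small rectangle at level 0.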
  3. Develop a web-based indexing and retrieval architecture for searching and delivering microfilm frames.
    1. Develop a multiresolution image browser applet
      • Use multiresolution protocol to access image frames and then scroll and zoom around them in an efficient way.
    2. Develop the entry indexing editor as an applet
      • This is an extension of the multiresolution image browser
      • The user can rubber-band a rectangle around each surname, given name, date, and location
      • Whenever a rectangle is created, a dialog will pop up for the user to select an entry type and then fill in the index text for that type
      • For each image xxx.GIF, the rectangle information is written out to a corresponding xxx.idx file in a special XML format.
      • The applet must support an overview that shows all of the areas of the images where index rectangles have been defined
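
The XML format of the xxx.idx file is not specified here, so the following is only a sketch of how the editor might write one out. The element and attribute names (`index`, `entry`, `type`, `x`, `y`, `w`, `h`) and the class names are hypothetical.

```java
import java.awt.Rectangle;
import java.util.List;

// One indexed rectangle: the entry type chosen in the dialog, the index
// text typed by the user, and the rubber-banded box in image pixels.
class IndexEntry {
    final String type;
    final String text;
    final Rectangle box;

    IndexEntry(String type, String text, Rectangle box) {
        this.type = type;
        this.text = text;
        this.box = box;
    }
}

class IndexFile {
    // Maps an image file name such as "xxx.GIF" to "xxx.idx".
    static String idxNameFor(String imageName) {
        int dot = imageName.lastIndexOf('.');
        String base = dot < 0 ? imageName : imageName.substring(0, dot);
        return base + ".idx";
    }

    // Escapes the characters that are special in XML text and attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Serializes the entries in the assumed (hypothetical) layout.
    static String toXml(String imageName, List<IndexEntry> entries) {
        StringBuilder sb = new StringBuilder();
        sb.append("<index image=\"").append(escape(imageName)).append("\">\n");
        for (IndexEntry e : entries) {
            sb.append("  <entry type=\"").append(escape(e.type))
              .append("\" x=\"").append(e.box.x)
              .append("\" y=\"").append(e.box.y)
              .append("\" w=\"").append(e.box.width)
              .append("\" h=\"").append(e.box.height)
              .append("\">").append(escape(e.text)).append("</entry>\n");
        }
        sb.append("</index>\n");
        return sb.toString();
    }
}
```

Keeping the rectangle as attributes and the index text as element content is one possible choice. It lets an entry with no text yet be written as an empty element.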
    3. Develop the index verification editor as an applet
      • This is an extension of the entry indexing editor. Its purpose is to provide double entry of indexing data to verify its correctness.
      • This will load a multires image and its corresponding xxx.idx file.
      • The user will perform indexing operations just as in the indexing editor, except that when an index is entered, its rectangle is compared with those in the existing index. If there is significant overlap with an existing rectangle, then the indexing information is compared. If there is a difference, a correction dialog is raised so that the user can choose which entry is correct. If there is no overlapping rectangle, the new rectangle is highlighted and a second entry is required.
      • Before closing the image and marking the index as verified, the original index is checked for any rectangles that were not verified. If there are any, they are highlighted so that they can be deleted as mistakes or verified.
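
The comparison in the verification editor can be sketched as follows. What counts as "significant overlap" is not defined above. This sketch assumes it means the intersection covers at least a threshold fraction of the smaller rectangle. That measure, the class name `OverlapCheck`, and the exact-match text comparison are all assumptions.

```java
import java.awt.Rectangle;

class OverlapCheck {
    enum Result { MATCH, CONFLICT, NO_OVERLAP }

    // Fraction of the smaller rectangle's area covered by the intersection.
    // Returns 0.0 when the rectangles do not intersect or either is empty.
    static double overlapRatio(Rectangle a, Rectangle b) {
        Rectangle i = a.intersection(b);
        if (i.isEmpty()) {
            return 0.0;
        }
        double inter = (double) i.width * i.height;
        double smaller = Math.min((double) a.width * a.height,
                                  (double) b.width * b.height);
        return inter / smaller;
    }

    // Compares a newly entered rectangle and text against an existing entry.
    // MATCH: overlapping and same text, so the entry is verified.
    // CONFLICT: overlapping but different text, so a correction dialog is needed.
    // NO_OVERLAP: this existing entry does not correspond to the new one.
    static Result compare(Rectangle newBox, String newText,
                          Rectangle oldBox, String oldText, double threshold) {
        if (overlapRatio(newBox, oldBox) < threshold) {
            return Result.NO_OVERLAP;
        }
        return newText.equals(oldText) ? Result.MATCH : Result.CONFLICT;
    }
}
```

A new rectangle that returns `NO_OVERLAP` against every existing entry is the case that requires a second entry. Existing entries that never produce `MATCH` or a resolved `CONFLICT` are the unverified ones to highlight before closing.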
    4. Using the entry indexing editor, prepare segmentations for 20 images representative of the variety of kinds of records on microfilm.
    5. Develop the entry indexing web page and CGI server.
      • Web server capability
        • Receive a request from some authorized user on the WWW
          • Select an image and its xxx.idx file. Send down an indexing web page for that pair.
        • receive an index result from the indexing web page and update the corresponding xxx.idx file with the index information
      • Web page capability
        • Page for logging on to the extraction process and getting authorization to index
        • Page for requesting an indexing assignment
        • Page with the entry indexing editor applet or the index verification applet depending on the indexing assignment.
    6. Develop a master index generation program
      • Traverse a directory hierarchy
      • For every xxx.idx file found in the hierarchy
        • take each entry out of the file that has indexing information provided and add the index information and the image file reference into the master index
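
A minimal sketch of the master index generation steps above. It assumes each xxx.idx file holds `entry` elements whose text content is the index text (a hypothetical layout, since the format is not specified here) and that the image sits beside it as xxx.GIF. The class name and the choice of a map from index text to image references are also assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class MasterIndexBuilder {
    // Returns a sorted map from index text to the images that contain it.
    static Map<String, List<String>> build(Path root)
            throws IOException, ParserConfigurationException, SAXException {
        // Traverse the directory hierarchy and collect every .idx file.
        List<Path> idxFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            idxFiles = paths
                    .filter(p -> Files.isRegularFile(p)
                            && p.getFileName().toString().endsWith(".idx"))
                    .sorted()
                    .collect(Collectors.toList());
        }

        Map<String, List<String>> master = new TreeMap<>();
        for (Path idx : idxFiles) {
            // Image reference: the sibling xxx.GIF, relative to the root.
            String name = idx.getFileName().toString();
            String gif = name.substring(0, name.length() - ".idx".length()) + ".GIF";
            String imageRef = root.relativize(idx.resolveSibling(gif)).toString();

            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(idx.toFile());
            NodeList entries = doc.getElementsByTagName("entry");
            for (int i = 0; i < entries.getLength(); i++) {
                String text = entries.item(i).getTextContent().trim();
                if (text.isEmpty()) {
                    continue;  // rectangle defined but no index text provided yet
                }
                master.computeIfAbsent(text, k -> new ArrayList<>()).add(imageRef);
            }
        }
        return master;
    }
}
```

A sorted map is used here only so that prefix or range searches over the index text are straightforward. A real master index would likely also keep the entry type and rectangle.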
    7. Develop an index access web service
      • Web server capability
        • receive a search request from a web page
        • Search the master index for frame images that match the search
        • If the list is huge, generate a category outline of the matching frames. Each entry in the outline represents a refinement of the search to include fewer frames
        • If the list is manageable, load the xxx.idx files for each frame and generate a score or other descriptor for each frame. Present this sorted list as a web-page.
        • Accept requests for individual frames and download an indexed image browser applet.
      • Web page capability
        • Page for generating search requests
        • Category outline web page for selecting frames or narrowing the search
        • Frame listing page for selecting frames
        • Index browser page
      • Index browser applet
        • This is a variant of the multiresolution image browser.
        • An overview of the image is presented with all entries matching the search criteria highlighted.
        • The user can then multires browse the image looking for information.
  4. Present a cool demo of all of the above.