More details on Internet Archive’s Scribe Book Scanner Project
Friday February 10th 2006, 9:24 pm
Filed under: Digitization, Preservation

Contrary to what I was led to believe in an email from the Director of Books at Internet Archive, the software for their book digitization project is indeed available to the public (under open licenses, no less). I was surprised and excited to come across the Scribe project on SourceForge today. The documentation is nearly nonexistent, and at the moment the system looks like a mishmash of software held together with glue and tape, but it is available… Now I just wish the Internet Archive would publicize that the software is available for other libraries to use and improve.

Here are additional details that I’ve gleaned since my last post on the topic.

The “Scribe station”, as it is called, uses two Canon EOS 1Ds Mark II digital cameras, which I presume make up the majority of the station’s cost. The camera produces 16.7-megapixel photos for the low, low price of $7100 each. One camera photographs the left page of the book, and the other photographs the right page. (Other systems I’ve seen get by with a single camera, either by using moving mirrors or by moving the camera itself.)

In an interview, Brewster Kahle said that they really didn’t want to build the system on their own, but they tried commercial solutions and weren’t satisfied with the results given the cost. With their homegrown station, he estimates the cost at 10 cents per page. I’m not sure whether that figure covers only the labor of someone flipping the pages, or also the cost of the hardware and software. He also mentions in the interview that they’ve worked a little with Squid Labs in hopes of getting help with development. You can hear an excerpt of the interview where he discusses the Scribe station here (2.8 MB MP3). The entire interview is available here.

The station requires two Windows computers (the Canon software used to control the cameras presumably doesn’t support multiple cameras). The bulk of the system is written in Java, but PHP, Cygwin, ImageMagick and a host of other dependencies are also used. While I don’t expect other libraries to independently set up their own stations using the IA’s work in the near future, if it continues to evolve I can imagine hundreds or thousands of libraries digitizing books instead of the handful we have today. Suddenly the task of digitizing every public domain book wouldn’t be so daunting.


3 Comments so far

you should be able to set up mobile digitization labs easily enough too…

Comment by hugh 02.27.06 @ 10:44 am

Sounds a lot like one of the systems I’m currently working with.

Comment by Ethan 05.10.06 @ 5:21 am

[...] books into the Internet Archive’s custom-built Scribe Station is a manual process. Although automated page-turning machines exist, Internet Archive has chosen to [...]

Pingback by A Glimpse into the Internet Archive’s Scanning and Print-on-Demand Operations (Disruptive Library Technology Jester) 03.20.08 @ 9:55 am


