[Opengenalliance] Fwd: details from digitisation project

Javier Ruiz javier at openrightsgroup.org
Tue Apr 19 16:44:33 BST 2011


Hi

Just to put a bit of background on Nick's comment. FreeBMD were
considering taking on one of the TNA digitisation projects, the school
records until 1914.

http://www.archives.org.uk/si-acalg/news-and-events.html

We had some discussions before this list became active about what it
would entail, which I paste below (with amendments).

I have also started exploring possible Heritage Lottery funding.

It would be very helpful to get some feedback on Wikimedia's past
experience with TNA, and on whether others think it makes sense as an
initial OGA project, given that the GRO index seems stalled.

Javier

--

ben at links.org wrote

Very interesting. I wonder if we could estimate what it would cost for
FreeBMD to do this (presumably using free volunteer labour for the
actual scanning).

A rough crack at some figures:
https://spreadsheets.google.com/ccc?key=0Apg0-hk6r3v4dGo0SEtWNWZfN1pidWxOOEYzV1A1UUE&hl=en_GB&pli=1&authkey=CM7Z-qMF#gid=0

Executive summary: I reckon the best setup for digitising books like
this would be an overhead camera looking down at a flat table, foot
pedal operated. The operator holds open the pages and presses the pedal.
As it happens, I have a friend in the US who is looking into this.

We'd have to custom build this, but the good news is it should be mostly
cheap - I estimate a few thousand.

A 10 megapixel camera would give us 240 dpi on the largest page I found
in a quick scan (28 x 40 cm).
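
The 240 dpi figure can be checked with a minimal sketch like the one
below. It assumes the sensor's aspect ratio matches the page exactly, so
that every pixel lands on the page; a real 3:2 or 4:3 sensor would come
out a little lower on one axis. The function name and constant are
illustrative only and do not come from the thread.

```python
import math

CM_PER_INCH = 2.54

def effective_dpi(megapixels: float, page_w_cm: float, page_h_cm: float) -> float:
    """Best-case dpi when the sensor's pixels exactly cover the page.

    Assumption (not from the thread): the sensor aspect ratio matches the
    page, so resolution is sqrt(total pixels / page area in square inches).
    """
    area_sq_in = (page_w_cm / CM_PER_INCH) * (page_h_cm / CM_PER_INCH)
    return math.sqrt(megapixels * 1_000_000 / area_sq_in)

# Largest page found in the quick scan: 28 x 40 cm, with a 10 megapixel camera.
dpi = effective_dpi(10, 28, 40)
print(f"{dpi:.0f} dpi")  # prints "240 dpi"
```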

I reckon 3 seconds a page (probably less, actually). I also reckon 30
seconds a volume to enter data in some connected computer system, so
images can be tallied with volumes and pages. I estimated 50 pages per
volume.

So, this gives us a total scan time of 62 working days, and at 5 MB per
page, 2.5 TB of storage (cost at today's prices: < £200 for one copy).
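
The totals above can be reproduced with a rough sketch. The page count is
not stated in the message; 500,000 pages is inferred here from 2.5 TB at
5 MB per page (decimal units). The 8-hour working day and the single
operator on a single rig are also assumptions. All variable names are
illustrative.

```python
PAGE_SECONDS = 3        # from the thread: capture time per page
VOLUME_SECONDS = 30     # from the thread: data entry time per volume
PAGES_PER_VOLUME = 50   # from the thread
MB_PER_PAGE = 5         # from the thread
TOTAL_PAGES = 500_000   # inferred: 2.5 TB / 5 MB per page, not stated directly
HOURS_PER_DAY = 8       # assumption: length of one working day

volumes = TOTAL_PAGES / PAGES_PER_VOLUME
total_seconds = TOTAL_PAGES * PAGE_SECONDS + volumes * VOLUME_SECONDS
working_days = total_seconds / 3600 / HOURS_PER_DAY
storage_tb = TOTAL_PAGES * MB_PER_PAGE / 1_000_000  # MB -> TB, decimal

print(f"{volumes:.0f} volumes, {working_days:.1f} working days, {storage_tb:.1f} TB")
# prints "10000 volumes, 62.5 working days, 2.5 TB"
```

Under these assumptions the result is 62.5 days, which matches the quoted
62 working days once rounded down.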

Sounds totally feasible to me. Shall we bid?

---

javier at openrights wrote:

At the WDYTYA conference Ancestry had set up their scanning tables,
which basically consisted of a digital SLR camera on a vertical guide,
good diffuse lights and a laptop.

The standard commercial option was sold at another stall for around
£10k apiece, although they had a portable one for £380 to plug
into your own laptop (http://www.solar-imaging.com).

The hardware side should be absolutely no problem with that sort of
budget. There are lots of people building their own scanners for little
money here: http://www.diybookscanner.org/

The other element needed is workflow software to automate the process
but also allow for basic corrections and checks. There are some
projects out there, but the most polished seems to be this one:
http://sourceforge.net/projects/bookscanwizard/ It would need to be
tested for stability, but the developers would probably be delighted to
be involved.

Possibly post-processing with http://scantailor.sourceforge.net/ to trim
the images to fit.

Then there should probably be some quality control, where humans working
elsewhere check that the scans are readable and approve them. There are
things such as Alfresco Image Management:
http://www.alfresco.com/products/solutions/ecm/im/

Quality control should probably take place asap, to correct processes
rather than duplicating work at the end. This could be done by volunteers
online very quickly, as it is a typical task suited to crowdsourcing.
Galaxy Zoo may want to get involved as well.

Transcription is an area where there are lots of new developments. I
will post separately.

I am not sure they will want hosting or a "digital pipeline to Sri
Lanka" ;-) When we met the TNA folk they said contracts included all
sorts of things.

I think that for them to accept the bid we would need to have some
demonstrable pledge from enough volunteers, at least one (paid?)
volunteer coordinator / project manager in the budget, and show the
administrative capacity.

One added complication is that this is not all based at TNA but in 9
centres around the country, so the volunteers would need to be sourced
locally. Maybe the FHSs?

I would not underestimate the need for training of volunteers and
extra support, transport costs, accommodation and storage for the kit.
The whole thing could add up.

It seems a good testing project. Could we get more tender
documentation and book one of those sessions? Even if we don't go ahead
we can learn a lot in the early stages.

The LIA programme is based on commercial companies covering all the
costs, then paying 14% of base revenue to the TNA. In this case they
would not get that money but the documents would be open access, so it
will be a very interesting moral dilemma.

best, Javier


