Data Rescue
Digitising Tools
Assistance for Data Digitisation
ACRE and its numerous allied projects have built up many years of experience in digitising data. The following reference sources have proven useful in their work and will help those getting involved for the first time. Though they emphasise the needs of those making submissions to the International Surface Pressure Databank, the resources also have value for anyone involved in weather data rescue.
Data Rescue Forum
This web discussion board is devoted to issues affecting those who are rescuing historical weather data for submission to reanalysis data repositories. It covers topics relevant to preparing and formatting historical weather data to a quality level that makes it useful to the climate research community.
WMO Nomenclature
A reference list of WMO-recognised weather stations, including metadata such as station number, name, height, latitude and longitude.
ISPD submission guidelines
Guidelines for preparing digitised data, and the metadata that identifies and qualifies each data item, for submission to the International Surface Pressure Databank. Additional guideline information is also available.
Digitised Data To-date
An extensive list of current data holdings in the International Surface Pressure Databank, including the station and period covered.
Zooniverse
The world’s largest online platform for collaborative volunteer research, Zooniverse brings scientists and citizens together over the web. Scientists upload their data and choose the tasks they want volunteers to do.
Weather Wizards
A strip chart digitiser created by the International Environmental Data Rescue Organization (IEDRO). Many old weather observations were made by automatic instruments that recorded their readings as a continuous line on a strip chart. Weather Wizards is software that recovers data from these charts up to 30 times faster than traditional manual methods. Users view an on-screen image of a chart and sweep the cursor over the ink-traced line, registering the values it maps at whatever time interval is needed.
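The idea of reading a traced line at "whatever time interval is needed" can be illustrated with a small sketch. This is not the Weather Wizards algorithm; it only assumes that the traced line has already been converted to (time, value) pairs, and the function name `resample_trace` and the example units (hours, hPa) are hypothetical.

```python
from bisect import bisect_right


def resample_trace(points, start, stop, step):
    """Linearly interpolate a traced line at regular time steps.

    points: (time, value) pairs taken from the traced line, in any
            order, with distinct times.
    Returns (time, value) pairs at start, start + step, ... up to stop,
    skipping any time that falls outside the traced line.
    """
    pts = sorted(points)
    times = [t for t, _ in pts]
    n_steps = int((stop - start) / step + 1e-9)
    out = []
    for i in range(n_steps + 1):
        t = start + i * step
        if t < times[0] or t > times[-1]:
            continue  # no extrapolation beyond the traced line
        j = bisect_right(times, t)  # index of first traced time > t
        if j == len(times):
            out.append((t, pts[-1][1]))  # t is exactly the last traced time
            continue
        (t0, v0), (t1, v1) = pts[j - 1], pts[j]
        frac = (t - t0) / (t1 - t0)
        out.append((t, v0 + frac * (v1 - v0)))
    return out


# Illustrative trace: pressure (hPa) read at irregular times (hours).
trace = [(0.0, 1012.0), (2.0, 1010.0), (5.0, 1013.0)]
hourly = resample_trace(trace, start=0.0, stop=5.0, step=1.0)
```

Linear interpolation is the simplest choice here; a denser trace makes the result less sensitive to that choice.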
Excel PDF data extractor
Even some of the best OCR software is still poor at correctly extracting printed columns of data. However, ByteScout’s PDF Extractor SD freeware does a creditable job, presumably because it has been designed to work with columnar, numeric data.
Assistance for Document Imaging
In many cases, the precursor step to digitising weather data is the imaging (scanning or photographing) of original data documents. Typically the documents are recording sheets and booklets completed many years ago by weather observers.
Imaging the documents is done to lessen the wear and tear on these valuable records, to create a record of provenance and also to capture their contents for future use. In some cases only parts of the documents are digitised. For instance, only pressure and attached thermometer readings may be captured by those making submissions to the ISPD, while the remaining data are made available to other repositories, primarily the International Surface Temperature Initiative [ISTI] and the Global Precipitation Climatology Centre [GPCC]. If images of the original documents are accessible over the web, researchers can in future refer back to them and extract other data of interest such as cloud cover, wind speed, sunshine, etc.
The following are some of the software tools, mostly freeware, used by ACRE’s allied projects in their imaging workflow.
IrfanView
Proven freeware (30 years in development), this image management solution has extensive capabilities to manage large libraries of images and enhance them in multiple ways, including recolour, resize, straighten, crop, watermark, rename, pack metadata, change canvas size, etc. Most features can be used in batch mode, allowing thousands of images to have sophisticated operations carried out on them automatically.
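As a rough illustration of what a batch rename involves, the sketch below builds a rename plan for a folder of camera images. It does not use or reproduce IrfanView's batch feature; the function name `plan_batch_rename`, the prefix and the numbering scheme are all hypothetical choices made for the example.

```python
import os


def plan_batch_rename(filenames, prefix, start=1, width=4):
    """Map each original filename to a new, sequentially numbered name.

    Files are numbered in sorted order of their original names, and the
    extension is kept but lower-cased. Nothing is renamed on disk; the
    returned dict is only a plan that can be reviewed first.
    """
    plan = {}
    for offset, name in enumerate(sorted(filenames)):
        ext = os.path.splitext(name)[1].lower()
        plan[name] = f"{prefix}_{start + offset:0{width}d}{ext}"
    return plan


# Illustrative input: two photographs of logbook pages.
plan = plan_batch_rename(["IMG_0002.JPG", "IMG_0001.JPG"], "station_logbook")
```

Keeping the plan separate from the actual renaming makes it easy to check the mapping, and to keep it as part of the audit trail, before any file is touched.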
Rasterstitch (small fee involved)
An image stitching tool that joins multiple images to recreate a large-format document. Traditional stitching tools blend two images together, which works well for photographic scenes. However, by working at the pixel level, Rasterstitch is geared towards images of documents, achieving flawless re-compositions of large-format textual, written and drawn material.
Canon EOS Utility
This software, designed for Canon EOS cameras, has a remote shooting capability allowing the camera to be mounted in a copy stand while being fully controlled via a computer screen and mouse. It contains visual aids to ensure that all photographed material is captured level and in a specified position within the image screen.
EXIFtoolGUI
Software that enables the bulk creation of a wide variety of metadata for insertion into multiple images. Useful for recording provenance and an audit trail in camera and scanned images.
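To make the idea of a provenance and audit trail concrete, the sketch below builds a small provenance record for one image. It is an assumption-laden illustration, not how EXIFtoolGUI works: instead of embedding metadata in the image, it produces a separate "sidecar" record, and the field names and the function names `provenance_record` and `to_sidecar_json` are hypothetical.

```python
import hashlib
import json


def provenance_record(filename, image_bytes, source_document, operator, captured_on):
    """Describe where an image came from and fingerprint its contents.

    The SHA-256 checksum lets anyone verify later that the image file
    has not changed since the record was made.
    """
    return {
        "file": filename,
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        "size_bytes": len(image_bytes),
        "source_document": source_document,  # e.g. an archive shelf mark
        "operator": operator,
        "captured_on": captured_on,          # ISO date string, e.g. "2020-01-31"
    }


def to_sidecar_json(record):
    """Serialise a record as stable, human-readable JSON text."""
    return json.dumps(record, indent=2, sort_keys=True)


# Illustrative use with placeholder bytes standing in for a real image file.
record = provenance_record(
    "station_logbook_0001.jpg", b"abc",
    source_document="Example logbook, page 1",
    operator="A. Volunteer",
    captured_on="2020-01-31",
)
sidecar_text = to_sidecar_json(record)
```

In practice the bytes would be read from the image file, and the same fields could equally be written into the image's own metadata with a tool such as the one described above.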
RIOT
An industry-standard freeware tool for decreasing the file size of images with minimal loss of image quality. Useful for transferring images or displaying them over the web.
ByteScout PDF MULTITOOL
A useful multifunction freeware program that splits multi-page PDF documents into single PDF files, combines individual PDF files into a single multi-page document, rotates PDF images, converts PDF images to JPG/PNG, etc. Importantly, of all OCR software, it is probably the best at extracting PDF data tables to a CSV file or an Excel spreadsheet.
Scan Tailor
A freeware program useful for post-processing scanned pages. It performs operations such as page splitting, deskewing, adding/removing borders, and others. It will take raw scans and return pages ready to be printed or assembled into a PDF or DJVU file. It neither performs optical character recognition nor assembles multi-page documents.
“Draw on the wealth of experience built up by ACRE’s allied project members …”
“Imaging documents is done to lessen wear and tear and make other data available to researchers…”
Include link to a workflow graphic; see “Data acquis and present graphic.ppt”.
Include a link to a case for metadata tagging video.