Atmospheric Circulation Reconstructions over the Earth
Data Rescue Digitising Tools

Assistance for Data Digitisation

ACRE and its numerous allied projects have built up many years of experience in digitising data. The following reference sources have proven useful in their work and will help those getting involved for the first time. Though they emphasise the needs of those making submissions to the International Surface Pressure Databank (ISPD), the resources also have value for anyone involved in weather data rescue.

Data Rescue Forum
This web discussion board is devoted to issues affecting those who are rescuing historical weather data for submission to reanalysis data repositories. It covers topics relevant to preparing and formatting historical weather data to a quality level that makes it useful to the climate research community.

WMO Nomenclature
A reference list of WMO-recognised weather stations, which includes metadata such as station number, name, height, latitude and longitude.

ISPD submission guidelines
The guidelines for preparing digitised data for submission to the International Surface Pressure Databank, and the metadata that identifies and qualifies each data item. Additional guideline information is also available.

Digitised Data to Date
An extensive list of current data holdings in the International Surface Pressure Databank, including station and period covered.

Zooniverse
The world’s largest online platform for collaborative volunteer research, Zooniverse brings scientists and citizens together over the web. Scientists upload their data and choose the tasks they want volunteers to do.

Weather Wizard
A strip chart digitiser created by the International Environmental Data Rescue Organization (IEDRO). Many old weather observations were made by automatic instruments that recorded their readings as a continuous line on a strip chart. Weather Wizard is software that recovers data from these charts up to 30 times faster than traditional manual methods. Users view a computer image of a chart and then sweep the cursor over the ink-traced line on the chart, registering the values it maps at whatever time interval is needed.

Excel PDF data extractor
Even some of the best OCR software is still poor at correctly extracting printed columns of data. However, ByteScout’s PDF Extractor SDK freeware does a creditable job, presumably because it has been designed to work with columnar, numeric data.

Assistance for Document Imaging

In many cases, the precursor step to digitising weather data is the imaging (scanning or photographing) of original data documents. Typically the documents are recording sheets and booklets completed many years ago by weather observers. Imaging the documents is done to lessen the wear and tear on these valuable records, to create a record of provenance and also to capture their contents for future use. In some cases only parts of the documents are digitised.
For instance, only pressure and attached thermometer readings may be captured by those making submissions to the ISPD, while the remaining data are made available to other repositories, primarily the International Surface Temperature Initiative [ISTI] and the Global Precipitation Climatology Centre [GPCC]. If images of the original document are accessible over the web, researchers can in future refer back to these documents and extract other data of interest such as cloud cover, wind speed, sunshine, etc.

Following are some of the freeware tools used by ACRE’s allied projects in their imaging workflow.

IrfanView
Proven freeware (30 years in development), this image management solution has extensive capabilities to manage large libraries of images and enhance them in multiple ways, including recolour, resize, straighten, crop, watermark, rename, pack metadata, change canvas size, etc. Most features can be used in batch mode, allowing thousands of images to have sophisticated operations carried out on them automatically.

Rasterstitch (small fee involved)
An image stitching tool that stitches together multiple images to recreate a large format document. Traditional stitching tools blend two images together, which works well for photographic scenes. However, by working at the pixel level Rasterstitch is geared for images of documents, achieving flawless re-compositions of large format textual, written and drawn material.

Canon EOS Utility
This software, designed for Canon EOS cameras, has a remote shooting capability allowing the camera to be mounted in a copy stand while being fully controlled via a computer screen and mouse. It contains visual aids to ensure that all photographed material is captured level and in a specified position within the image screen.

EXIFtoolGUI
Software that enables the bulk creation of a wide variety of metadata for insertion into multiple images. Useful for recording a provenance and audit trail in camera and scanned images.

RIOT
An industry standard freeware tool for decreasing the file size of images with minimal loss of image quality. Useful for transferring images or displaying them over the web.

ByteScout PDF Multitool
A useful multifunction freeware program that splits multi-page PDF documents into single PDF files, creates single multi-page documents out of individual PDF files, rotates PDF images and converts PDF images to JPG/PNG, etc. Importantly, of all OCR software, it is probably the best at extracting PDF data tables to a CSV file or an Excel spreadsheet.

Scan Tailor
A freeware program useful for post-processing scanned pages. It performs operations such as page splitting, deskewing, adding/removing borders, and others. It will take raw scans and return pages ready to be printed or assembled into a PDF or DJVU file. It does not include optical character recognition, nor does it assemble multi-page documents.

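The provenance and audit trail mentioned under EXIFtoolGUI can also be kept alongside the images rather than inside them. The following is a minimal Python sketch, not part of any tool listed here, that records a checksum and a few descriptive fields for each image in a folder. The function names, the field names (source, operator, recorded_utc) and the JSON layout are illustrative assumptions, not an ACRE or ISPD convention.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

# Image extensions treated as document images (an assumption; adjust as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(image_dir: Path, source_note: str, operator: str) -> list:
    """Build one provenance record per image file found directly in image_dir.

    The record fields are hypothetical and chosen only for illustration.
    """
    records = []
    for path in sorted(image_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        records.append({
            "file": path.name,
            "sha256": file_sha256(path),       # detects later changes to the image
            "bytes": path.stat().st_size,
            "source": source_note,             # e.g. which register was imaged
            "operator": operator,              # who did the imaging
            "recorded_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        })
    return records


def write_manifest(records: list, out_path: Path) -> None:
    """Write the records to a JSON file next to the images."""
    out_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
```

Calling build_manifest on a folder of scans and passing the result to write_manifest produces one record per image. Because each record carries a checksum, a later comparison can show whether an image has been altered since it was catalogued.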
“Draw on the wealth of experience built up by ACRE’s allied project members …”
“Imaging documents is done to lessen wear and tear and make other data available to researchers…”
Include link to a workflow graphic, see “Data acquis and present graphic.ppt”. Include a link to a case for metadata tagging video.