Atmospheric Circulation Reconstructions over the Earth
Data Rescue Digitising Tools

Assistance for Data Digitisation

ACRE and its numerous allied projects have built up many years of experience in digitising data. The following reference sources have proven useful in their work and will help those getting involved for the first time. Though they emphasise the needs of those making submissions to the International Surface Pressure Databank (ISPD), the resources are also of value to anyone involved in weather data rescue.

Data Rescue Forum
This web discussion board is devoted to issues affecting those who are rescuing historical weather data for submission to reanalysis data repositories. It covers topics relevant to preparing and formatting historical weather data to a quality level that makes it useful to the climate research community.

WMO Nomenclature
A reference list of WMO-recognised weather stations, with metadata such as station number, name, height, longitude and latitude.

ISPD submission guidelines
The guidelines for preparing digitised data for submission to the International Surface Pressure Databank, including the metadata that identifies and qualifies each data item. Additional guideline information is also available.

Digitised Data To-date
An extensive list of current data holdings in the International Surface Pressure Databank, including station and period covered.

Zooniverse
The world’s largest online platform for collaborative volunteer research, Zooniverse brings scientists and citizens together over the web. Scientists upload their data and choose the tasks they want volunteers to do.

Weather Wizards
A strip chart digitiser created by the International Environmental Data Rescue Organization (IEDRO). Many old weather observations were made by automatic instruments that recorded their readings as a continuous line on a strip chart. Weather Wizards is software that recovers data from these charts up to 30 times faster than traditional manual methods.
Users view a computer image of a chart and then sweep the cursor over the ink-traced line, registering the values it maps at whatever time interval is needed.

Excel PDF data extractor
Even some of the best OCR software is still poor at correctly extracting printed columns of data. However, ByteScout’s PDF Extractor SDK freeware does a creditable job, presumably because it has been designed to work with columnar, numeric data.

Assistance for Document Imaging

In many cases, the precursor step to digitising weather data is the imaging (scanning or photographing) of original data documents. Typically the documents are recording sheets and booklets completed many years ago by weather observers. The documents are imaged to lessen the wear and tear on these valuable records, to create a record of provenance and to capture their contents for future use. In some cases only parts of the documents are digitised. For instance, only pressure and attached-thermometer readings may be captured by those making submissions to the ISPD, while the remaining data are made available to other repositories, primarily the International Surface Temperature Initiative (ISTI) and the Global Precipitation Climatology Centre (GPCC). If images of the original documents are accessible over the web, researchers can refer back to them in future and extract other data of interest such as cloud cover, wind speed and sunshine.

The following are some of the freeware tools used by ACRE’s allied projects in their imaging workflow.

IrfanView
Proven freeware (30 years in development), this image management solution has extensive capabilities to manage large libraries of images and enhance them in multiple ways, including recolour, resize, straighten, crop, watermark, rename, pack metadata and change canvas size. Most features can be used in batch mode, allowing sophisticated operations to be carried out automatically on thousands of images.

Rasterstitch (small fee involved)
An image stitching tool that joins multiple images to recreate a large-format document. Traditional stitching tools blend two images together, which works well for photographic scenes. By working at the pixel level, however, Rasterstitch is geared to images of documents, achieving flawless recompositions of large-format textual, written and drawn material.

Canon EOS Utility
This software, designed for Canon EOS cameras, has a remote shooting capability that allows the camera to be mounted in a copy stand while being fully controlled via a computer screen and mouse. It contains visual aids to ensure that all photographed material is captured level and in a specified position within the image frame.

ExifToolGUI
Software that enables the bulk creation of a wide variety of metadata for insertion into multiple images. Useful for recording a provenance and audit trail in camera and scanned images.

RIOT
An industry standard freeware tool for decreasing the size of images with minimal loss of resolution. Useful for transferring images or displaying them over the web.

ByteScout PDF Multitool
A useful multifunction freeware program that splits multi-page PDF documents into single PDF files, combines individual PDF files into single multi-page documents, rotates PDF images and converts PDF images to JPG, PNG and other formats. Importantly, of all OCR software, it is probably the best at extracting PDF data tables to a CSV file or an Excel spreadsheet.

Scan Tailor
A freeware program useful for post-processing scanned pages. It performs operations such as page splitting, deskewing and adding or removing borders. It takes raw scans and returns pages ready to be printed or assembled into a PDF or DjVu file. It does not perform optical character recognition or assemble multi-page documents.
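
Two ideas recur in the tools above: running the same operation over a whole folder of images in batch mode, and keeping a provenance and audit trail for each image. The following is a minimal Python sketch of both, using only the standard library. It is not how any of the listed tools work internally. The function names, the `STN001` prefix, the list of file suffixes and the `manifest.csv` layout are all assumptions made for illustration.

```python
import csv
import hashlib
from pathlib import Path

# Assumption: these suffixes cover the image formats in use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_batch(folder, prefix):
    """Pair each image with a sequential, prefixed name and its checksum."""
    folder = Path(folder)
    images = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    return [
        {
            "original_name": path.name,
            "new_name": f"{prefix}_{index:04d}{path.suffix.lower()}",
            "sha256": sha256_of(path),
        }
        for index, path in enumerate(images, start=1)
    ]


def apply_batch(folder, plan, manifest_name="manifest.csv"):
    """Rename the files as planned and write the manifest beside them."""
    folder = Path(folder)
    # Refuse to overwrite anything: stop before the first rename
    # if any target name is already taken.
    for row in plan:
        target = folder / row["new_name"]
        if target.exists() and row["new_name"] != row["original_name"]:
            raise FileExistsError(f"{target} already exists")
    for row in plan:
        (folder / row["original_name"]).rename(folder / row["new_name"])
    manifest_path = folder / manifest_name
    with open(manifest_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["original_name", "new_name", "sha256"]
        )
        writer.writeheader()
        writer.writerows(plan)
    return manifest_path
```

For a hypothetical folder named `scans`, calling `apply_batch("scans", plan_batch("scans", "STN001"))` would rename its images to `STN001_0001.jpg`, `STN001_0002.jpg` and so on, and write the manifest alongside them. Because the checksums are computed before renaming, the manifest ties each new name to the exact bytes of the original file, so later changes to an image can be detected.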
“Draw on the wealth of experience built up by ACRE’s allied project members …”
“Imaging documents is done to lessen wear and tear and make other data available to researchers…”