Vince Smith. Collections for the 21st Century, Florida, 5-6 May 2014. No specimen left behind: Collections digitisation at the NHM, London
Some history… “the rate of progress by the UK taxonomic institutions in digitising and making collections information available is disappointingly low… there is a significant risk of damage to the international reputation of major institutions such as The Natural History Museum”. House of Lords Science and Technology Committee, Report on Taxonomy and Systematics, 2009
Digitisation rates at the NHM (circa 2009)
The prevailing attitude to collections digitisation. Biodiversity Informatics 2010, 7: 120–129. 2010 GBIF Task Group, Global Strategy and Action Plan for the Digitisation of Natural History Collections: “Digitizing all specimens is not an achievable aim at present”
More technology, more automation, more speed: whole-drawer scanning, herbarium sheet scanning, microscope slide scanning
European collections rising to the challenge: large-scale data capture & digitisation in France, the Netherlands & Finland
NHM London Science Strategy 2013-17: A New Voyage of Discovery. Three focal areas: 1. Scientific discovery; 2. Scientific infrastructure; 3. Scientific engagement. Five challenges: 1. The Digital NHM; 2. Origins, evolution & futures; 3. Biodiversity discovery; 4. Natural resources & hazards; 5. Science, society & skills. Resources & funding. Measuring success.
data.nhm.ac.uk/globe/ Digitisation target: 20M specimens available by 2017
A long way to go, practically, technically & culturally… NHM collections comprise c.80m objects. Physical register: c.5m; digital data: 2.8m; images: 350k
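As a rough illustration of the size of the gap, the counts on this slide can be turned into coverage figures. This is a minimal sketch: it treats digital records and specimens as directly comparable, and the 3.5-year horizon used in the example is a hypothetical assumption (roughly mid-2014 to the end of 2017), not an NHM planning figure.

```python
# Figures quoted on the slides (approximate, circa 2014).
TOTAL_OBJECTS = 80_000_000      # c.80m objects in the collections
DIGITAL_RECORDS = 2_800_000     # records already held as digital data
IMAGES = 350_000                # specimen images
TARGET_2017 = 20_000_000        # digitisation target: 20M specimens by 2017


def coverage(count: int, total: int = TOTAL_OBJECTS) -> float:
    """Return count as a percentage of the whole collection."""
    return 100 * count / total


def annual_rate_needed(current: int, target: int, years: float) -> float:
    """Records per year needed to close the gap in the given number of years."""
    return (target - current) / years


if __name__ == "__main__":
    print(f"Digital data: {coverage(DIGITAL_RECORDS):.1f}% of holdings")
    print(f"Images: {coverage(IMAGES):.2f}% of holdings")
    print(f"Gap to target: {TARGET_2017 - DIGITAL_RECORDS:,} records")
    # Hypothetical horizon: assumes about 3.5 years remain before the 2017 target.
    print(f"Needed: {annual_rate_needed(DIGITAL_RECORDS, TARGET_2017, 3.5):,.0f} per year")
```

On these numbers the digital data cover 3.5% of holdings and the gap to the 2017 target is 17.2 million records.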
NHM Digital Collections Programme: a 2, 5 and 10 year plan... To collate, organise and make available one of the world’s most important natural history collections as a digital resource, delivering: an online specimen / lot-level database to manage all holdings; core metadata and / or images for key parts of the collection; flexible informatics tools. £750,000 for the first 2 years
Outline. Why: internal objectives & benefits; research opportunity (the iCollections example). What: how much data to digitise; linking digitisation effort to project benefits. How: digi-street pilots and quick wins (herbarium, drawer & slide scanning); crowdsourcing pilots & options. Where: NHM Data Portal; external portals (e.g. GBIF, Europeana). Links: crowdfunding; H2020 projects (COST, SYNTHESYS, LOD, VRE, Dig. Inf.); other museums, herbaria & partners (e.g. CETAF & publishers). When.
1. Why: Objectives
1. Why: Research opportunity & the iCollections pilot. Using the NHM collections to track the long-term seasonal response of butterflies to climate change. Digitisation of the British and Irish Lepidoptera collection: species poor, specimen rich; ~500,000 specimens in 5,000 drawers; re-curation, imaging, label data, georeferencing; ~25% complete (started Jan. 2013); about 50% of specimens ‘usable’; many specimens in most years (late 19th century to 1970); provides a longer time perspective than most observational records (BMS post-1976).
1. Why: Research opportunity & the iCollections pilot. [Figure: relationship between the 10th percentile collection date of Anthocharis cardamines (Orange tip) and mean Mar.–May temperature (N.B. temperature axis reversed)] 1900-2000: strong correlation between initial collection dates & temperature. A critical marker of phenological response prior to recent rapid climate change. Longer time perspective than most observational records (BMS post-1976). Museum data available for rare or hard-to-record species. An example of unique biological and ecological data from collections. Brooks, Self, Toloni & Sparks, 2014, Int. J. Biometeorol. DOI 10.1007/s00484-013-0780-6
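The analysis on this slide relates an early-season collection date per year to spring temperature. The sketch below is a minimal illustration of that idea, not the method of Brooks et al. (2014): it assumes each digitised specimen is reduced to a (year, day-of-year) pair, uses a nearest-rank 10th percentile per year, and computes a plain Pearson correlation against a hypothetical mapping of year to mean March-May temperature. The minimum-specimens filter is also an assumption.

```python
import math
from collections import defaultdict


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def early_dates_by_year(records, pct=10):
    """records: iterable of (year, day_of_year) pairs for one species' specimens."""
    by_year = defaultdict(list)
    for year, doy in records:
        by_year[year].append(doy)
    return {year: nearest_rank_percentile(days, pct) for year, days in by_year.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def phenology_correlation(records, spring_temp, pct=10, min_specimens=5):
    """Correlate the per-year early collection date with mean spring temperature.

    spring_temp maps year -> mean Mar-May temperature (hypothetical input).
    Years with fewer than min_specimens specimens or no temperature are skipped.
    """
    records = list(records)
    counts = defaultdict(int)
    for year, _ in records:
        counts[year] += 1
    early = early_dates_by_year(records, pct)
    years = sorted(y for y in early if y in spring_temp and counts[y] >= min_specimens)
    return pearson_r([spring_temp[y] for y in years], [early[y] for y in years])
```

A negative coefficient means warmer springs go with earlier collection dates; the slide's figure shows the same relationship with the temperature axis reversed.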
2. What: Linking data capture effort to research benefits
3. How: Digi-street pilots (herbarium sheets). [Figure: process]
3. How: Digi-street pilots (herbarium sheets). 33k specimens per day over 3 shifts (6am-10pm); Netherlands collection complete in 1.5 years. €1.29 per specimen image (if outsourced), with transcription at a similar cost. Video of herbarium sheet digitisation (not available in the SlideShare version of this presentation).
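The throughput and cost figures above support some quick planning arithmetic. This sketch uses only the two quoted numbers; treating transcription as costing exactly the same as imaging is a simplification of "at similar cost", and the function names are hypothetical.

```python
# Figures from the pilot slide; everything else here is illustrative.
SPECIMENS_PER_DAY = 33_000      # across 3 shifts, 6am-10pm
IMAGE_COST_EUR = 1.29           # per specimen image, if outsourced


def outsourced_cost(n_specimens: int, include_transcription: bool = True) -> float:
    """Estimated outsourced cost in euros.

    Assumes transcription costs the same as imaging per specimen.
    """
    per_specimen = IMAGE_COST_EUR * (2 if include_transcription else 1)
    return n_specimens * per_specimen


def days_to_complete(n_specimens: int, per_day: int = SPECIMENS_PER_DAY) -> int:
    """Whole working days needed at the quoted throughput (rounded up)."""
    return -(-n_specimens // per_day)
```

For example, one day's output of 33,000 specimens comes to about €42,570 for imaging alone, and roughly twice that once transcription is included.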
3. How: Digi-street pilots (drawer scanning & segmentation). SatScan whole-drawer scanning: 30 million specimens in 130k drawers; fast, high-resolution multi-specimen drawer images (5 min each); no specimen handling; limited drawer / unit-tray metadata, plus identifiers. Specimen segmentation problem: the digital and physical collections get out of sync, so specimen segmentation needs to be automated.
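To show what "specimen segmentation" means in practice, here is a generic baseline: threshold a greyscale drawer image, then find connected regions and report one bounding box per candidate specimen. This is not the NHM / Stellenbosch segmentation software, whose method is not described here. It assumes specimens are darker than the background, uses pure Python for self-containment, and the threshold and minimum-size values are hypothetical. Real drawer images would also need colour handling, background correction and a way to split touching specimens; libraries such as scikit-image (`skimage.measure.label`, `skimage.measure.regionprops`) are a common alternative.

```python
from collections import deque


def segment_specimens(image, threshold=128, min_pixels=4):
    """Label connected dark regions in a greyscale drawer image.

    image: list of rows of ints (0-255); specimens are assumed darker than
    the background. Returns inclusive bounding boxes (top, left, bottom, right),
    one per region containing at least min_pixels pixels.
    """
    rows, cols = len(image), len(image[0]) if image else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] >= threshold:
                continue
            # Breadth-first flood fill over 4-connected dark pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right, size = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                size += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] < threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Tiny regions are treated as dust or label noise and dropped.
            if size >= min_pixels:
                boxes.append((top, left, bottom, right))
    return boxes
```

Each bounding box could then be cropped from the drawer image and linked to a specimen identifier, which is the step that keeps the digital and physical collections in sync.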
3. How: Digi-street pilots (drawer scanning & segmentation). [Figure: starting image]
3. How: Digi-street pilots (slide scanning). 1. Slides cleaned & barcoded; 2. Loaded into hopper (50-100 slides); 3. High-resolution scan; 4. Images stored & databased.
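The four slide-scanning steps map naturally onto a small batch pipeline. The sketch below is hypothetical throughout: the class and function names, the image path pattern and the in-memory "database" are illustrative stand-ins, and `scan` is a placeholder for whatever the chosen scanner actually does. Only the step order and the 50-100 hopper capacity come from the slide.

```python
from dataclasses import dataclass, field

HOPPER_CAPACITY = 100  # slides per load; the slide quotes 50-100


@dataclass
class Slide:
    barcode: str
    cleaned: bool = False


@dataclass
class ScanRecord:
    barcode: str
    image_path: str


@dataclass
class SlideDatabase:
    records: dict = field(default_factory=dict)

    def store(self, record: ScanRecord) -> None:
        self.records[record.barcode] = record


def load_batches(slides, capacity=HOPPER_CAPACITY):
    """Step 2: split the queue of prepared slides into hopper-sized loads."""
    for start in range(0, len(slides), capacity):
        yield slides[start:start + capacity]


def scan(slide: Slide) -> ScanRecord:
    """Step 3 placeholder: a real scanner would write a high-resolution image."""
    return ScanRecord(slide.barcode, f"images/{slide.barcode}.tif")


def run_pipeline(slides, db: SlideDatabase, capacity=HOPPER_CAPACITY) -> int:
    """Run steps 1-4 and return the number of hopper loads used."""
    # Step 1: only cleaned slides that carry a barcode go forward.
    ready = [s for s in slides if s.cleaned and s.barcode]
    loads = 0
    for batch in load_batches(ready, capacity):
        loads += 1
        for slide in batch:
            db.store(scan(slide))  # Step 4: store the image reference, keyed by barcode
    return loads
```

Keying records by barcode means a rescan of the same slide simply overwrites its earlier record rather than creating a duplicate.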
3. How: Crowdsourcing pilot. NHM bird registers: no advertising; hard to transcribe; a challenging starting project. 1 user with 32,629 transcriptions! 92 users with 100+ transcriptions; 363 users with 1 transcription. [Figure: ranked users vs. log number of records transcribed]
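Summary figures like the ones on this slide (top contributor, users with 100+ transcriptions, single-transcription users, and a ranked log-scale curve) can be derived from a simple log of who transcribed each record. This is a minimal sketch assuming such a log is available as a list of user IDs, one entry per transcription; it is not the pilot's actual reporting code.

```python
import math
from collections import Counter


def contributor_summary(transcriptions):
    """Summarise a crowdsourcing log.

    transcriptions: iterable of user IDs, one entry per transcribed record.
    """
    per_user = Counter(transcriptions)
    ranked = per_user.most_common()  # [(user, count), ...], highest count first
    total = sum(per_user.values())
    return {
        "ranked": ranked,
        "top_user": ranked[0] if ranked else None,
        "users_100_plus": sum(1 for n in per_user.values() if n >= 100),
        "single_record_users": sum(1 for n in per_user.values() if n == 1),
        "top_share": ranked[0][1] / total if ranked else 0.0,
        # (rank, log10 count) pairs: the shape of a ranked-users plot.
        "log_curve": [(rank, math.log10(n)) for rank, (_, n) in enumerate(ranked, start=1)],
    }
```

The `top_share` value makes the long-tail pattern explicit: a single very active volunteer can account for a large fraction of all transcriptions.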
3. How: Crowdsourcing options: Zooniverse projects, Smithsonian Digital Volunteers, Wikisource transcription (WiR). Next steps: survey and review of natural history transcription projects, cf. paying transcribers.
4. Where: NHM Data Portal. A focus for deposition and discovery of NHM research & collections data: stable, citable identifiers on datasets & specimen / lot records; transparent data quality (un-reviewed, reviewed, reviewed & updated); download (DwC-A), web services & Linked Open Data; built using CKAN, with enhanced mapping functionality. [Screenshots: search (datasets matching criteria); individual dataset; results (browse & search criteria; mapping, table & statistical views)]
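A Darwin Core Archive (DwC-A), one of the download formats listed above, is a zip file holding a delimited data file plus a `meta.xml` descriptor that maps columns to Darwin Core terms. The sketch below writes a minimal occurrence-only archive using the standard library. It is not the NHM portal's exporter: the chosen field subset and function names are assumptions, it omits the optional EML metadata file, and it assumes field values contain no tabs, newlines or double quotes.

```python
import csv
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"
# A small, illustrative subset of Darwin Core occurrence terms.
FIELDS = ["occurrenceID", "catalogNumber", "scientificName", "eventDate", "country"]


def meta_xml(fields):
    """Build the meta.xml descriptor mapping occurrence.txt columns to DwC terms."""
    lines = [f'    <field index="{i}" term="{DWC}{name}"/>' for i, name in enumerate(fields)]
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">\n'
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"\n'
        '        fieldsEnclosedBy="" ignoreHeaderLines="1"\n'
        f'        rowType="{DWC}Occurrence">\n'
        '    <files><location>occurrence.txt</location></files>\n'
        '    <id index="0"/>\n'
        + "\n".join(lines) + "\n"
        "  </core>\n"
        "</archive>\n"
    )


def write_dwca(records, path):
    """Write records (dicts keyed by FIELDS) as a minimal Darwin Core Archive.

    path may be a filename or a writable binary file object.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("occurrence.txt", buf.getvalue())
        zf.writestr("meta.xml", meta_xml(FIELDS))
```

The `<id index="0"/>` element marks `occurrenceID` as the record identifier, which is where stable, citable specimen identifiers would go.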
4. Where: External portals (Flickr, GBIF, Europeana), e.g. NHM Coleoptera. NHM almost getting data to GBIF! Submitting to the Europeana portal (via Open-Up); niche collections on Flickr; robust API services; a gateway to image analysis projects (e.g. species recognition & trait extraction tools).
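One practical benefit of the "robust API services" on portals such as GBIF is that published specimen records can be queried programmatically, for example by image-analysis projects. The sketch below only builds request URLs for GBIF's public occurrence search endpoint and performs no network calls. Assumptions: `NHMUK` as the NHM institution code and `StillImage` as a media filter are illustrative values, and the 300-record page size reflects GBIF's documented per-request limit, which should be checked against the current API documentation.

```python
from urllib.parse import urlencode

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def gbif_search_url(limit=20, offset=0, **filters):
    """Build a GBIF occurrence-search URL; filters become query parameters."""
    params = {**filters, "limit": limit, "offset": offset}
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


def page_urls(total, page_size=300, **filters):
    """URLs covering `total` records in pages of page_size (assumed GBIF maximum: 300)."""
    return [gbif_search_url(limit=page_size, offset=o, **filters)
            for o in range(0, total, page_size)]
```

For example, `gbif_search_url(institutionCode="NHMUK", mediaType="StillImage")` yields a URL for the first 20 imaged records matching that (assumed) institution code.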
5. Links. Crowdfunding: personalizes donation; scales well; requires lots of data; most crowdsourcing platforms unsuitable; potential for a data visualization to support our needs. H2020 projects: EU Research & Innovation funding programme, €80 billion from 2014-2020; strong record (EDIT, ViBRANT, SYNTHESYS1/2/3); 5 proposals in development for 2014/15; better alignment with the Digital Collections Programme. Partners: major museums & herbaria (Kew, Smithsonian, & Euro.6); umbrella organisations & projects (GBIF, CETAF, iDigBio); universities (e.g. on image analysis); data publishers (engagement on data & systems).
6. When: key dates over the next 2 years. Herbarium scanning: pilot TBC (starting late 2014). Drawer scanning: segmentation software (Aug. 2014); pilots (ongoing). Slide scanner: testing 6 systems (complete); procurement / purchase (July 2014); pilot projects & system integration (from Sept. 2014). Crowdsourcing pilots: draft review paper (Aug. 2014); additional Notes from Nature project (early 2015). NHM Data Portal: internal release (June 2014); public release (Jan. 2015). Funding: H2020 projects (submitted Sept. 2014 & Jan. 2015).
Acknowledgements. Digital Collections Programme planning: Ian Owens, Ben Atkinson, Dave Thomas, Andy Purvis, Emilie Smith & Vince Smith. iCollections project team: Gordon Paterson, Geoff Martin, Martin Honey, Blanca Huertas, Darrell Siebert, Vladimir Blagoderov, Steve Cafferty, Adrian Hine, Chris Sleep, Mike Sadka, Elisa Cane, Lyndsey Douglas, Joanna Durant, Gerardo Mazzetta, Flavia Toloni, Peter Wing, Malcolm Penn & Liz Duffle. Research: Steve Brooks, Angela Self, Flavia Toloni & Tim Sparks. Drawer scanning, NHM SatScan development: Vladimir Blagoderov, Laurence Livermore & Vince Smith. Software: Pieter Holtzhausen & Stefan van der Walt (Stellenbosch University). Slide scanner testing: Vladimir Blagoderov & Alex Ball. Crowdsourcing pilots (NHM team): Tim Conyers, Lawrence Brooks & Adrian Hine. Review paper: Laurence Livermore & Vince Smith. NHM Data Portal project team: Vince Smith, Darrell Siebert, Dave Thomas & Adrian Hine. Development: Ben Scott & Alice Heaton. Apologies to anyone I have missed!