Second and final day of Code4Lib BC’s lightning talks. Here are my notes.
Mark Jordan – DOCR/SMD
OCR clients – phones or tablets (or just a plain script) that do the work
page server – controls the work
Components
page server
PHP web app
PHP queue manager script
SQLite database
OCR clients
clients use Tesseract OCR engine (available for Android and iOS)
first client is a simple Python script
how it works
client: “I’m ready, give me an image”
server: “here you go”
client: “OK, here’s your text”
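To make the exchange concrete, here is a minimal sketch of that request/response loop in Python. It is not Mark's code: the `PageServer` class, `fake_ocr` and `run_client` are hypothetical names, the server is an in-memory stand-in for the PHP app and its SQLite queue, and the OCR step is a placeholder where a real client would call Tesseract.

```python
from collections import deque


class PageServer:
    """In-memory stand-in for the page server's queue (hypothetical)."""

    def __init__(self, images):
        self.queue = deque(images)   # pages waiting for OCR
        self.results = {}            # image name -> recognized text

    def next_image(self):
        # "here you go": hand out the next page, or None when the queue is empty
        return self.queue.popleft() if self.queue else None

    def submit_text(self, image, text):
        # "OK, here's your text": store the client's OCR result
        self.results[image] = text


def fake_ocr(image):
    # Placeholder for a real OCR engine call such as Tesseract
    return f"text of {image}"


def run_client(server, ocr=fake_ocr):
    # "I'm ready, give me an image", repeated until no work is left
    while (image := server.next_image()) is not None:
        server.submit_text(image, ocr(image))


server = PageServer(["page1.jpg", "page2.jpg"])
run_client(server)
print(server.results)
```

In the real system each of these method calls would presumably be an HTTP request to the REST API, and many clients would run the loop at once.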
What I learned so far
REST
Slim microframework
SQLite
What I want to learn
Android app development
potential for generalizing OCR to other tasks
Peter Tyrrell – Parsing PDFs
wanted to do search-term highlighting in PDFs
using NYT Document Viewer (under DocumentCloud)
pdf2djvu (PDF to DjVu)
DjVuLibre (DjVu to TXT, to XML, to TIF)
ImageMagick (TIF to JPG)
and then a whole bunch of other things to store, process, pass into viewer
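The conversion chain above could be sketched roughly like this in Python. This is my own illustration, not Peter's code: the `pipeline_commands` helper is hypothetical, the file naming is made up, and I am assuming the DjVuLibre steps map to its `djvutxt`, `djvutoxml` and `ddjvu` command-line tools. It only builds the command lines, so nothing needs to be installed to read it, and it treats the document as a single page for simplicity.

```python
from pathlib import Path


def pipeline_commands(pdf_path):
    """Return the command lines for a PDF -> DjVu -> TXT/XML/TIF -> JPG chain."""
    pdf = Path(pdf_path)
    djvu = pdf.with_suffix(".djvu")
    txt = pdf.with_suffix(".txt")
    xml = pdf.with_suffix(".xml")
    tif = pdf.with_suffix(".tif")
    jpg = pdf.with_suffix(".jpg")
    return [
        ["pdf2djvu", "-o", str(djvu), str(pdf)],         # PDF to DjVu
        ["djvutxt", str(djvu), str(txt)],                # DjVu to plain text
        ["djvutoxml", str(djvu), str(xml)],              # DjVu to XML
        ["ddjvu", "-format=tiff", str(djvu), str(tif)],  # DjVu to TIF
        ["convert", str(tif), str(jpg)],                 # ImageMagick: TIF to JPG
    ]


for command in pipeline_commands("paper.pdf"):
    print(" ".join(command))
```

Each list could be handed to `subprocess.run(command, check=True)` to actually run the step.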
John Durno – Uploading to Internet Archive via API
most ambitious digitization project: first 50 years of the British Colonist for the 150th anniversary
~100k images in PDF
Acrobat would highlight search terms for you if configured properly
site is being revised as part of the next stage: digitizing the next 100 years
Internet Archive: started digitizing newspapers from microfilm
collection setup within the IA
content can be dumped into IA through its API
metadata can be uploaded along with it
API is based on the Amazon S3 API, so a lot of tools already work with it
boto library with ia-wrapper
Python script to upload
new site is a wrapper around the IA for search purposes
reading happens on IA’s site
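For a sense of what an upload with metadata looks like, here is a rough Python sketch of the request shape. It is not John's script (he used boto with ia-wrapper); it only builds the pieces of a raw S3-style PUT. The endpoint, the `LOW` authorization scheme and the `x-archive-*` header names are as I remember them from IA's S3-like API documentation, so check them before relying on this. The `build_ia_upload` helper, the item identifier and the keys are all made up.

```python
def build_ia_upload(identifier, filename, metadata, access_key, secret_key):
    """Build (method, url, headers) for an S3-style upload to an IA item."""
    headers = {
        # IA uses its own key pair rather than AWS request signing
        "authorization": f"LOW {access_key}:{secret_key}",
        # ask IA to create the item (the "bucket") if it does not exist yet
        "x-archive-auto-make-bucket": "1",
    }
    # item metadata travels as x-archive-meta-<field> headers
    for field, value in metadata.items():
        headers[f"x-archive-meta-{field}"] = value
    url = f"https://s3.us.archive.org/{identifier}/{filename}"
    return "PUT", url, headers


method, url, headers = build_ia_upload(
    "example-newspaper-item",   # hypothetical item identifier
    "page-0001.pdf",
    {"title": "Example issue", "mediatype": "texts"},
    "ACCESS_KEY",
    "SECRET_KEY",
)
print(method, url)
```

The file contents would go in the body of the PUT; the sketch stops short of sending anything.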
Colleen Bell – ERM & LibGuides
use identifier for subject database lists to export JSON
use PHP script to process
add as a remote script in LibGuides
can also do it for individual resources by ID
script pulls resources based on a comma-separated list of IDs
don’t need LibGuides, can import into any page
can do this as long as you can get your data
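The "pull by comma-separated IDs" idea might look something like the sketch below. Colleen's script is PHP; this is a Python illustration of the same idea, and the `render_resources` function and the `id`, `name` and `url` field names are my assumptions, not the actual ERM export format.

```python
import json
from html import escape


def render_resources(json_text, id_param):
    """Turn exported JSON plus an 'ids=1,3' style parameter into an HTML list."""
    # split the comma-separated ID list, ignoring blanks and stray spaces
    wanted = [part.strip() for part in id_param.split(",") if part.strip()]
    # index the exported records by ID (as strings, to match the parameter)
    by_id = {str(record["id"]): record for record in json.loads(json_text)}
    items = []
    for resource_id in wanted:
        record = by_id.get(resource_id)
        if record is None:
            continue  # unknown IDs are skipped rather than raising an error
        link = escape(record["url"], quote=True)
        label = escape(record["name"])
        items.append(f'<li><a href="{link}">{label}</a></li>')
    return "<ul>" + "".join(items) + "</ul>"


sample = json.dumps([
    {"id": 1, "name": "Database A", "url": "http://example.org/a"},
    {"id": 2, "name": "Database B", "url": "http://example.org/b"},
    {"id": 3, "name": "Database C & D", "url": "http://example.org/c"},
])
print(render_resources(sample, "3, 1"))
```

The resulting HTML fragment is what would get embedded in a guide or any other page.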
James MacGregor – Article Metrics with OJS/OMP
PKP: scholarly publishing initiative at SFU
Open Journal Systems: WordPress for journals
Open Monograph Press: OJS for presses
all software is open source
new: general overhaul of the statistics framework, now compatible with the addition of PLOS metrics
stats framework overhaul: stats now gathered centrally, features added
PKP ALM: a Ruby on Rails web app that aggregates article performance data, plus a plugin
shows HTML views, PDF downloads, Facebook/Mendeley shares, PubMed, and more
Jonathan Schatz – The Story of BC Libraries’ IT Environments
did field work, going around to assess the IT environments of BC public libraries within the Sitka group
had about 1 day per library
covered 3 federated libraries
connectivity and network were the main priorities
phones, internet, network, workstations/servers, printing, technical support, training
had to map wifi with laptop and umbrella
they do a lot with very few resources
sometimes get creative, e.g. creating internet routers
Paul Joseph – UBC Digital Library Framework
Status Quo
3 repositories: CONTENTdm, DSpace (IR), AtoM (for Rare books & special collections)
in addition: separate collections, e.g. Drupal, ElasticSearch
access and presentation in silos, defined by the applications
Tried to improve with in-page view, facets, in-context results, tiling, rapid zoom
Introduce framework
scalable, flexible
plug in metadata and full text (when possible)
use ElasticSearch
series of interactions to provide access to metadata and objects
also service provider: OAI-PMH, Open Data API, Open Apps
rely on external services to leverage functionality e.g. RefWorks, Disqus (social commenting)
Calvin Mah / Todd Holbrook – SFU Library Hours Database
as part of the API
might be something they can host, but lack of enthusiasm
maybe could work with coop to host the tool
or host yourself at your institution
taken straight from UBC, but re-coded
kept the data model and the end-user look
handles the usual hours, with exceptions added by date range
feeds to API system, available as JSON data
Drupal widgets to drop almost anywhere
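The "usual hours plus date-range exceptions" model is simple enough to sketch in Python. This is my guess at the general shape, not SFU's actual data model or API: the `REGULAR` and `EXCEPTIONS` structures, the function names, the hours and the dates are all invented for illustration.

```python
import json
from datetime import date

# usual hours by weekday (Monday is 0); values are made up
REGULAR = {
    0: "08:00-22:00", 1: "08:00-22:00", 2: "08:00-22:00",
    3: "08:00-22:00", 4: "08:00-22:00",
    5: "10:00-18:00", 6: "10:00-18:00",
}

# exceptions override the usual hours for an inclusive date range
EXCEPTIONS = [
    {"start": date(2013, 12, 25), "end": date(2014, 1, 1), "hours": "closed"},
]


def hours_for(day, regular=REGULAR, exceptions=EXCEPTIONS):
    """Return the hours string for a date, letting exceptions win."""
    for exception in exceptions:
        if exception["start"] <= day <= exception["end"]:
            return exception["hours"]
    return regular[day.weekday()]


def hours_json(day):
    """Serialize one day's hours the way a JSON feed might."""
    return json.dumps({"date": day.isoformat(), "hours": hours_for(day)})


print(hours_json(date(2013, 11, 26)))
print(hours_json(date(2013, 12, 26)))
```

A widget would then only need to fetch that JSON and display it.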
That’s it for today. Breakout time! When the time comes, have a safe trip home.