2013-11-29

Second and final day of Code4Lib BC’s lightning talks. Here are my notes.

Mark Jordan – DOCR/SMD

Slides Source code

OCR clients – phones and tablets that do the work, or just a plain script

page server – controls the work

Components

page server

PHP web app

PHP queue manager script

SQLite database

OCR clients

clients use Tesseract OCR engine (available for Android and iOS)

first client is a simple Python script

how it works

client: “I’m ready, give me an image”

server: here you go

client: OK here’s your text
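
A minimal sketch of that exchange from the client's side, in Python since the first client is a Python script. The talk did not show the real endpoints or payloads, so the three callables here (fetch_image, recognize, submit_text) are hypothetical stand-ins for "ask the page server for an image", "run Tesseract on it", and "send the text back".

```python
from typing import Callable, Optional


def run_ocr_client(
    fetch_image: Callable[[], Optional[bytes]],
    recognize: Callable[[bytes], str],
    submit_text: Callable[[str], None],
) -> int:
    """Process images until the server has none left; return how many were done."""
    processed = 0
    while True:
        image = fetch_image()      # client: "I'm ready, give me an image"
        if image is None:          # assumption: the server signals an empty queue
            break
        text = recognize(image)    # the OCR work happens on the client
        submit_text(text)          # client: "OK, here's your text"
        processed += 1
    return processed


# Hypothetical stand-ins for the page server and the OCR engine.
queue = [b"page-1", b"page-2"]
results = []
count = run_ocr_client(
    fetch_image=lambda: queue.pop(0) if queue else None,
    recognize=lambda image: image.decode().upper(),
    submit_text=results.append,
)
```

Passing the three steps in as functions keeps the loop the same whether the client is a phone app or a plain script.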

What I learned so far

REST

Slim microframework

SQLite

What I want to learn

Android app development

potential for generalizing OCR to other tasks

Peter Tyrrell – Parsing PDFs

wanted to do search term highlight in PDFs

using the NYT Document Viewer (part of DocumentCloud)

pdf2djvu

DjVuLibre (DjVu to TXT, to XML, to TIF)

ImageMagick (TIF to JPG)

and then a whole bunch of other things to store, process, pass into viewer
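
A rough sketch of how the conversion chain above might be scripted. It only builds the command lines and does not run them. The file layout and the function name conversion_commands are my own assumptions, not something shown in the talk, and a real multi-page document would need per-page handling that is left out here.

```python
from pathlib import Path
from typing import List


def conversion_commands(pdf: Path, workdir: Path) -> List[List[str]]:
    """Return the commands for PDF -> DjVu -> TXT/XML/TIF -> JPG, in order."""
    stem = pdf.stem
    djvu = workdir / f"{stem}.djvu"
    txt = workdir / f"{stem}.txt"
    xml = workdir / f"{stem}.xml"
    tif = workdir / f"{stem}.tif"
    jpg = workdir / f"{stem}.jpg"
    return [
        ["pdf2djvu", "-o", str(djvu), str(pdf)],          # PDF to DjVu
        ["djvutxt", str(djvu), str(txt)],                 # DjVuLibre: text layer
        ["djvutoxml", str(djvu), str(xml)],               # DjVuLibre: XML with word positions
        ["ddjvu", "-format=tiff", str(djvu), str(tif)],   # DjVuLibre: page image
        ["convert", str(tif), str(jpg)],                  # ImageMagick: TIF to JPG
    ]


commands = conversion_commands(Path("scans/report.pdf"), Path("out"))
```

Each list can be handed to subprocess.run(cmd, check=True) once the tools are installed.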

John Durno – Uploading to Internet Archive via API


most ambitious digitization project: first 50 years of the British Colonist for the 150th anniversary

~100k images in PDF

Acrobat would highlight for you if configured properly

revised site as part of the next stage to digitize next 100 years

Internet Archive: started digitizing newspapers from microfilm

collection setup within the IA

can dump content into the IA through its API

can upload metadata with it

based on the Amazon S3 API, so a lot of tools already work with it

boto library with ia-wrapper

Python script to upload

new site is a wrapper around the IA for search purposes

reading happens on IA’s site
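
To illustrate the "upload metadata with it" point: the IA's S3-like API takes item metadata as request headers named x-archive-meta-<field>. Below is a small sketch of building those headers in Python. The function name and the sample values are my own, and the actual upload call (boto, ia-wrapper, or anything else) is deliberately left out because the talk did not show it.

```python
from typing import Dict


def ia_metadata_headers(metadata: Dict[str, str], make_item: bool = True) -> Dict[str, str]:
    """Turn a metadata dict into headers for the IA's S3-like upload API."""
    headers = {f"x-archive-meta-{key.lower()}": value for key, value in metadata.items()}
    if make_item:
        # Ask the IA to create the item if it does not exist yet.
        headers["x-archive-auto-make-bucket"] = "1"
    return headers


# Sample values are made up for illustration.
headers = ia_metadata_headers({
    "title": "Sample newspaper issue",
    "mediatype": "texts",
})
```

These headers would go on the PUT request that uploads the file, alongside the usual S3-style authorization.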

Colleen Bell – ERM & LibGuides

use identifier for subject database lists to export JSON

use PHP script to process

add as remote script in LibGuides

can also do it for individual resources by ID

script pulls resources based on comma-separated IDs

don’t need LibGuides, can import into any page

can do this as long as you can get your data
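
The script in the talk was PHP. Here is a sketch of the same idea in Python, assuming the ERM export is a JSON array of records with id, name, and url fields (those field names are my guess, not from the talk). It takes the comma-separated IDs and returns an HTML list that could be dropped into any page.

```python
import html
import json


def render_resources(export_json: str, ids_param: str) -> str:
    """Render the requested resources as an HTML list, in the order requested."""
    records = {str(r["id"]): r for r in json.loads(export_json)}
    wanted = [part.strip() for part in ids_param.split(",") if part.strip()]
    items = []
    for rid in wanted:
        record = records.get(rid)
        if record is None:  # skip IDs that are not in the export
            continue
        items.append('<li><a href="{}">{}</a></li>'.format(
            html.escape(record["url"], quote=True),
            html.escape(record["name"]),
        ))
    return "<ul>" + "".join(items) + "</ul>"


# Made-up export data for illustration.
export = json.dumps([
    {"id": 12, "name": "Example Database", "url": "https://example.org/db"},
    {"id": 7, "name": "Arts & Letters Index", "url": "https://example.org/ali"},
])
snippet = render_resources(export, "7, 12,99")
```

Unknown IDs are skipped quietly, so a stale ID in a guide does not break the page.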

James MacGregor – Article Metrics with OJS/OMP

PKP: scholarly publishing initiative at SFU

Open Journal Systems: WordPress for journals

Open Monograph Press: OJS for presses

all software is open source

new: general overhaul of the statistics framework, now compatible with the addition of PLOS metrics

had a stats framework overhaul: gathered centrally, added features

PKP ALM: Ruby on Rails web app to aggregate article performance data, plus a plugin

shows HTML views, PDF downloads, Facebook/Mendeley shares, PubMed, and more

Jonathan Schatz – The Story of BC Libraries’ IT Environments

did field work, going around to assess the IT environments of BC public libraries within the Sitka group

had about 1 day per library

covered 3 federated libraries

connectivity and network were the main priorities

phones, internet, network, workstations/servers, printing, technical support, training

had to map wifi with laptop and umbrella

they do a lot with very few resources

sometimes get creative, e.g. creating internet routers

Paul Joseph – UBC Digital Library Framework

Status Quo

3 repositories: CONTENTdm, DSpace (IR), AtoM (for Rare Books & Special Collections)

in addition: separate collections, e.g. Drupal, ElasticSearch

access and presentation in silos, defined by the applications

Tried to improve with in-page view, facets, in-context results, tiling, rapid zoom

Introduce framework

scalable, flexible

plug in metadata and full text (when possible)

use ElasticSearch

series of interactions to provide access to metadata and objects

also service provider: OAI-PMH, Open Data API, Open Apps

rely on external services to leverage functionality e.g. RefWorks, Disqus (social commenting)

Calvin Mah / Todd Holbrook – SFU Library Hours Database

as part of the API

might be something they can host, but lack of enthusiasm

maybe could work with coop to host the tool

or host yourself at your institution

taken straight from UBC, but re-coded

kept data model and end user look

can do usual hours, but can add exceptions using date range

feeds to API system, available as JSON data

Drupal widgets to drop almost anywhere
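
A small sketch of the "usual hours plus date-range exceptions" model in Python. The talk did not show the real data model or JSON shape, so the class, function, and field names below are all my own assumptions.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple

Hours = Optional[Tuple[str, str]]  # (open, close), or None when closed


@dataclass
class HoursException:
    start: date   # first day the exception applies
    end: date     # last day the exception applies (inclusive)
    hours: Hours


def hours_for(day: date, usual: Dict[int, Hours], exceptions: List[HoursException]) -> Hours:
    """Exceptions win over the usual weekly schedule (0 = Monday)."""
    for exc in exceptions:
        if exc.start <= day <= exc.end:
            return exc.hours
    return usual.get(day.weekday())


def hours_json(day: date, usual: Dict[int, Hours], exceptions: List[HoursException]) -> str:
    """Serialize one day's hours the way an API might return them."""
    hours = hours_for(day, usual, exceptions)
    return json.dumps({
        "date": day.isoformat(),
        "open": hours[0] if hours else None,
        "close": hours[1] if hours else None,
    })


# Made-up schedule: weekdays 08:00-22:00, Saturday 10:00-18:00, Sunday closed.
usual = {i: ("08:00", "22:00") for i in range(5)}
usual[5] = ("10:00", "18:00")
usual[6] = None
exceptions = [HoursException(date(2013, 12, 24), date(2013, 12, 26), None)]
```

With this shape, a widget only needs the JSON for a given date and never has to know whether the hours came from the usual schedule or an exception.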

That’s it for today. Breakout time! When the time comes, have a safe trip home.


