Hangingtogether.org

The OCLC Evolving Scholarly Record Workshop, Chicago Edition

2015-04-06

On March 23, 2015, we held the third in the Evolving Scholarly Record Workshop series at Northwestern University. The workshops build on the framework in the OCLC Research report, The Evolving Scholarly Record.

Jim Michalko, Vice President OCLC Research Library Partnership, introduced the third of four workshops to address library roles and new communities of practice in the stewardship of the evolving scholarly record.

Cliff Lynch, Director of CNI, started out by talking about memory institutions as a system (more than individual collections) to capture both the scholarly record and the endlessly ramifying cultural record. It’s impossible to capture them completely, but hopefully we are sampling the best.

It is our role to safeguard the evidentiary record upon which the scholarly record and future scholarship depend. But the scholarly record is taking on new definitions. It includes the relationship between the data and the science acted upon it. Its contents are both refereed and un-refereed. It includes videos, blogs, websites, social media… And even the traditional should be made accessible in new ways. There is an information density problem and prioritization must be done.

We need to be careful when thinking about the scholarly record and look at new ways in which scholarly information flows.

There is a lot of stuff that doesn’t make it into IRs because all eyes are on capturing things that are already published somewhere. The eyes are on the wrong ball…

[presentations are available on the event page]

Brian Lavoie, Research Scientist in OCLC Research, provided a framework for a common understanding and shared terminology for the day’s conversations.

He defined the scholarly record as being the portions of scholarly outputs that have been systematically gathered, organized, curated, identified and made persistently accessible.

OCLC developed the Evolving Scholarly Record Framework to help support discussions, to define key categories of materials and stakeholder roles, to be high-level so it can be cross disciplinary and practical, to serve as a common reference point across domains, and to support strategic planning. The major component is still outcomes, but in addition there are materials from the process (e.g., algorithms, data, preprints, blogs, grant reviews) and materials from the aftermath (e.g., blogs, reviews, commentaries, revision, corrections, repurposing for new audiences).

The stakeholder ecosystem combines old roles (fix, create, collect, and use) in new combinations and among a variety of organizations. To succeed, selection of the scholarly record must be supported by a stable configuration of stakeholder roles.

We’ve been doing this, but passively and often at the end of a researcher’s career. We need to do so much more, proactively and by getting involved early in the process.

Herbert Van de Sompel, Scientist at Los Alamos National Laboratory, gave his Perspective on Archiving the Evolving Scholarly Record. A scholarly communication system has to support the research process (which is more visible than ever before) and fulfill these functions:

Registration: allows claims of precedence for scholarly findings (e.g., manuscript submission), which is now less discrete and more continuous

Certification: establishes the validity of the claim (e.g., peer review), which is becoming less formal

Awareness: allows actors to remain aware of new claims (alerts, stacks browsing, web discovery), which is trending toward instantaneous

Archiving: allows preservation of the record (by libraries and other stakeholders), which is evolving from medium- to content-driven.

Herbert characterized the future in the following ways: The scholarly record is undergoing massive extension with objects that are heterogeneous, dynamic, compound, inter-related and distributed across the web – and often hosted on common web platforms that are not dedicated to scholarship.

Our goal is to achieve the ability to persistently, precisely, and seamlessly revisit the Scholarly Web of the Past and of the Now at some point in the Future. We need to capture compound objects, with context, and in a state of flux at the request of the owner and at the time of relevance.

Herbert’s distinction between recording and archiving is critical. Recording platforms make no commitment to long-term access or preservation. They may be a significant part of the scholarly process, but they are not a dependable part of the scholarly record.

We need to start creating workflows that support researcher-motivated movement of objects from private infrastructure to recording infrastructure and support curator-motivated movement of objects and context from recording infrastructure to archiving infrastructure.

Sarah Pritchard, Dean of Libraries, Northwestern University, put things in the campus politics and technology context.

The evolving scholarly record requires that we work with a variety of stakeholders on campus: faculty and students (as creators), academic departments (as managers of course content and grey literature), senior administrators (general counsel, CFO, HR), trustees (governance policy), office of research (as proxy for funder’s requirements), information technology units, and disciplinary communities.

There are many research information systems on campus, beyond the institutional repository: course management systems, faculty research networking systems, grant and sponsored research management systems, student and faculty personnel systems, campus servers and intranets, and, because the campus boundaries are pervious, disciplinary repositories, cloud and social platforms. And also office hard drives.

Policies and compliance issues go far beyond the content licensing libraries are familiar with: copyright (at the institutional and individual levels), privacy of records (student work, clinical data, business records), IT security controls and web content policies, state electronic records retention laws, open access (institutionally or funder mandated), and rights of external system owners (hosted content).

Sarah finished with some provocative thoughts:

The library sees itself as a “selector”, but many may see this as overstepping

The library looks out for the institution, which can be at odds with the faculty sense of individual professional identity

There is a high cost to change the technical infrastructure and workflow mechanisms and to reshape governance and policy

There is a lack of a sense of urgency

She recommended that we start with low hanging fruit, engage centers of expertise, find pilot project opportunities, and accept that there won’t be a wholesale move into this environment.

Sarah Pritchard’s presentation really affected me: sort of a rallying cry to go out and make things happen!

The campus context provided a perfect launching point for the Breakout Discussions. From ten pages of notes, I’ve distilled the following action-oriented outcomes:

Within the library

If your library has receded from your university goals and strategies, move the library back into the prime business of your institution with a roster of candidate service offerings to re-position yourselves in the campus community.

Earn reputation through service provision and through access as opposed to reputation through ownership.

Selection

Ask yourself, what are we selecting? How do we define the object? What commitments will we make? And how does it fit into the broader system?

Consider some minimum requirements in terms of number of hits or other indications of interest for blogs/websites to be archived. Those indexed by organizations like MLA or that are cited in scholarly articles seem worthy.

Declare collections of record so that others can depend on them, but beware of the commitment if you have to create new storage and access systems for a particular type of material.

Communicate when you have taken on a commitment to web archiving particular resources, possibly via the MARC preservation commitment field.

A lot of stuff doesn’t get archived because we focus on materials that are already well-tended elsewhere. Look for the at-risk materials.

Accept adequate content sampling.

Focus on training librarians. Get them to use the dissertation as the first opportunity to establish a relationship, establish an ORCID, and mint a DOI. Do some of these things that publishers do to provide a gateway to infrastructure that is not campus-centric but system-centric.

Decide where the library will focus; it can’t be expert in all things. Assess where the vulnerabilities are and set priorities.

Provide a solution where none exists to capture the things that have fallen through the cracks.

Technical solutions

Linked data could be the glue for connecting IDs with institutions: identifiers for individuals and for organizations, and possibly identifiers for departments, funding agencies, projects…

Follow a standard to create metadata to provide consistency in the way it’s formed, in the content, and in the identifiers being used.

Use technology that is ready now to

help with link rot (the URL is bad) and reference rot (the content has changed), so researchers can reference a resource as it was when they used the data or cited it. Memento makes it easy to archive a web page at a point in time.

provide identifiers

ORCID and ISNI are ready for researcher identification.

DOIs, Perma.cc, and Memento are ready for use.

harvest web resources. Archive-It is ready for web harvesting and the Internet Archive’s Wayback Machine is ready for accessing archived web pages.

transport big data packets. Globus is a solution for researchers and institutions.

create open source repositories. Consider using DSpace, EPrints, Fedora or Drupal to make your own.
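To make the reference rot item above a little more concrete: on the access side, the Memento protocol (RFC 7089) lets a client ask a "TimeGate" for the version of a page closest to a given date by sending an Accept-Datetime header. Here is a minimal sketch of how such a request could be put together. The function name and the default TimeGate prefix are assumptions for illustration only, not something presented at the workshop; substitute whichever TimeGate your archive or aggregator documents.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumption: an aggregator-style TimeGate prefix, used here only as a
# placeholder. Replace it with the TimeGate your archive documents.
DEFAULT_TIMEGATE = "http://timetravel.mementoweb.org/timegate/"


def build_timegate_request(uri, when, timegate=DEFAULT_TIMEGATE):
    """Return (url, headers) for a Memento datetime-negotiation request.

    `when` must be a timezone-aware datetime. It is converted to UTC
    because the Accept-Datetime header value is expressed in GMT.
    """
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    when_utc = when.astimezone(timezone.utc)
    headers = {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}
    return timegate + uri, headers


if __name__ == "__main__":
    # Ask for the page as it was around the day it was cited.
    cited_on = datetime(2015, 3, 23, 12, 0, tzinfo=timezone.utc)
    url, headers = build_timegate_request("http://example.org/page", cited_on)
    print(url)
    print(headers)
```

The sketch only builds the request and does not send it. A researcher's tool would pass the URL and headers to an HTTP client and follow the redirect to the archived copy, if one exists.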

Explore ways in which people track conversation around the creation of an output, like the Future of the Book platform or Twitter conversations. Open Annotation is a solution that allows people to discuss where they prefer.

Before building a data repository, ask for whom are we doing this and why? If no one is asking for it, turn your attention elsewhere.

Create a hub for scholars who don’t know what they need, where the main activity may be referring researchers to other services.

To get quick support, promote and provide assistance with the DMPTool, minting DOIs, and archiving that information.

Get your message into two simple sentences.

Evolve the model and the people to move from support to collaboration.

With researchers

Do the work to understand researchers’ perspectives. Meet them where they live. A good way to engage researchers is to ask them what’s important in their field. Then ask who is looking after it. Include grad students and untenured and newly-tenured faculty as they may be most receptive.

Data services may vary dramatically among disciplines. Social Sciences want help with SPSS and R. Others want GIS. For STEM and Humanities there are completely different needs.

Before supporting an open access journal, ask the relevant community: do you need a journal, who is the audience, and what is the best way to communicate with them?

Stop hindering researchers with roadblocks relating to using cameras or scanners, copying, or putting up web pages.

Help users make good choices in use of existing disciplinary data repositories and provide a local option for disciplines lacking good choices.

Help faculty avoid having to profile themselves in multiple venues. Offer bibliography and resume services and portability as they move from institution to institution.

Explain the benefits of deposit in the record to students and faculty in terms of their portfolio and resume, and for collaboration.

To educate reluctant researchers, use assistants in the workflow, e.g., grant management assistants, or use graduate student ambassadors to discount rumors and half-truths. Try quick lunch and learn workshops. Market through established channels and access points.

Talk to researchers about the levels of granularity available to appropriately manage access to their content.

Coordinate with those writing proposals and make sure they know that if they expect library staff to do some of the work, the library needs to be involved in the discussion. Get involved early in the research proposal process. Stress that maintenance has to be built in. When committing to archiving, include an MOU covering service levels and end-of-life.

A formalized request process may help with communication.

With other parts of your institution

Get at least one other partner on campus on board early, maybe an academic faculty or department that is moving in the same direction you need to go (or administration, grants manager, IT people, educators, other librarians, funders).

Begin with a strategy, a call for partnership and implementation, then have conversations with faculty departments to get an environmental scan. Identify what is needed (e.g., GIS, text-mining, data analysis), and distill into areas you can support internally or send along to campus partners.

Don’t duplicate services. Cede control to another area on the campus. Communicate what is going on in different divisions and establish relationships. Provide guidance to get researchers to those places.

Work with associate deans and others at that level to find out about grant opportunities.

Develop partnerships with research centers and computing services, deciding where in the lifecycle things are to be archived and by whom.

Other parts of the university may decide to license data from vendors like Elsevier. If the library has a relationship with that vendor, offer to do the negotiation.

Spin your message to a stakeholder’s context (e.g., archiving the scholarly record is a part of business continuity planning and risk management for the University’s CFO).

Coordinate with other campus pockets of activity involved in assigning DOIs, data management, and SEO activities for the non-traditional objects to optimize institutional outcomes. Integrating these objects into the infrastructure makes them able to circulate with the rest of the record.

Alliances on campus should be about integrating library services into the campus infrastructure. Unless you’ve done that on campus, you’re not doing your best to connect to the larger scholarly record.

With external entities

We should work with scholarly societies to learn about what we need to collect in a particular discipline (data sets, lab books, etc.) — and how to work with those researchers to get those things.

Identify the things that can be done elsewhere and those that need to be done locally. Storing e-science data sets may not be a local thing, whereas support for collaboration may be.

Make funder program officers aware of how libraries can help with grant proposals, so they can refer researchers’ questions back to the library.

Rely on external services like JSTOR, arXiv, SSRN, and ICPSR, which are dependable delivery and access systems with sustainable business models.

Use centers of excellence. Consider offering your expertise, for instance, with a video repository and rely on another institution for data deposit.

Work with publishers to provide the related metadata that might, for instance, be associated with a dataset uploaded to PLOS ONE.

To help with the impact of researcher output, work with others, such as Symplectic, because they have the metadata we need.

To establish protocols for transferring between layers, make sure conversations include W3C and IETF.

Identify pockets of interoperability and find how to connect rather than waiting for interoperability to happen.

We are at the beginning of this; it will get better.

Thanks to all of our participants, but particularly to our hosts at Northwestern University, our speakers, and our note-takers. We’re looking forward to culminating the series at the workshop in San Francisco in June, where we’ll focus on how we can collaboratively move things forward to do our best to ensure stewardship of the scholarly record.

About Ricky Erway

Ricky Erway, Senior Program Officer at OCLC Research, works with staff from the OCLC Research Library Partnership on projects ranging from managing born digital archives to research data curation.
