2017-01-24

Discovery for Scholarly Research: Evolving Needs and Services — An NFAIS Workshop

By Donald T. Hawkins, Freelance Conference Blogger and Editor <dthawkins@verizon.net>

Note: A summary of this article appeared in the December 2016/January 2017 issue of Against The Grain v 28 # 6 on Page 74.

Researchers are now accessing content through a variety of channels, and discovery services have become more important than ever. NFAIS, the National Federation of Advanced Information Services (http://www.nfais.org), held a one-day in-person and virtual workshop on this subject in Alexandria, VA on June 29, 2016. The workshop began with a review by Simon Inger, Principal, Simon Inger Consulting, of the recent report entitled “How Readers Discover Content in Scholarly Publications” that he co-authored with Tracy Gardner. (See my article on the 2016 NFAIS Annual Meeting in the April 2016 issue of ATG[i] and the accompanying online version for a full summary of the report.) Some of its major conclusions are:

Web analytics only show the “last hop”, not the origin of discovery, and they often do not capture either geographical origin or users’ demographics.

Abstracting and indexing services (A&Is) are still first in importance overall, even though a 4-year trend shows some decline.

Academic researchers rate library discovery as highly as A&Is.

Publishers say they get more referrals from Google than from Google Scholar. (The reason is that analytics do not track where the navigation starts.)

Lower income countries tend to rate A&Is and Google Scholar as less important than publisher websites for searching.

Inger concluded that there is no single right answer in discovery; many factors including brand, ease of use, information literacy training, and availability of resources influence selection of a discovery service.

Discovery Tool Services

Mike Showalter, Executive Director, End-User Services at OCLC, said that discovery services, librarians, and publishers share similar goals: they are looking for validation that they have created and purchased the right materials. Are users finding information that meets their research needs? For 3 decades, OCLC has played a leading part in innovating for our industry; its FirstSearch product is still being used 17 years after its introduction, and about 125,000 people per day land on a WorldCat page. This year marks the 45th anniversary of WorldCat, and it is being improved by adding features desired by librarians as well as by improving relevancy and the user experience. Efforts are underway to improve understanding of user behavior by an increased focus on the scholarly content stream and the use of linked data to add content.

As shown here, the discovery landscape has become more complex. With a combination of aggregations, journal databases, books, archival material, open access repositories, and A&I content, it encompasses more than just articles delivered to users.



One of OCLC’s business challenges is that A&I products are not well represented in its central index. Open access (OA) continues to increase in importance; in 2016 and 2017, OCLC will conduct a full review of OA content opportunities. Showalter said that data discovery varies; large datasets tend to be easily found, but smaller ones such as those connected to a single article are more difficult.

OCLC has recently produced a compilation of articles on the library in the life of the user[1]; some of its conclusions are:

Discovery applications are just one tool to use.

Users’ expectations are driven by what they use in other parts of their lives.

The technology train keeps rolling; where will it be in 10 years?

In considering discovery, we tend to focus on advanced users, but we must recognize that undergraduates account for a significant amount of the use of discovery services. When those students become graduates, their expectations will be very different from what we may think today. The 3 trends currently shaping discovery are:

Things, not strings. Using known entities creates context and pathways to answers.

Personalization and curation. Knowing the context of the user increases search satisfaction.

Multiple starting points and devices are normal and expected.

Dan Driscoll, Vice President, Database Partnerships at EBSCO, said that EBSCO’s relevancy ranking involves more than simple keyword matching, and some metadata fields count more heavily than others in scoring. The goal is to determine what an article is about, not just find the keywords. Unstructured and imprecise keyword searching has been replaced with precise concept searching; user concepts are matched with the appropriate equivalent vocabulary terms.
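The two ideas in that description, weighting some metadata fields more heavily than others and mapping user wording to vocabulary terms, can be illustrated with a small sketch. This is not EBSCO's algorithm; the field names, weights, and concept map below are hypothetical values chosen only to show the general technique.

```python
from collections import Counter

# Hypothetical field weights: a match in a subject heading counts more
# than a match in the title, which counts more than one in the abstract.
FIELD_WEIGHTS = {"subject": 3.0, "title": 2.0, "abstract": 1.0}

# Hypothetical mapping from user wording to controlled-vocabulary terms.
CONCEPT_MAP = {
    "heart attack": "myocardial infarction",
    "high blood pressure": "hypertension",
}


def normalize_query(query):
    """Replace known user phrases with vocabulary terms, then split into terms."""
    text = query.lower()
    for phrase, term in CONCEPT_MAP.items():
        text = text.replace(phrase, term)
    return text.split()


def score(record, query):
    """Sum the weighted term frequencies across the record's metadata fields."""
    terms = normalize_query(query)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(record.get(field, "").lower().split())
        total += weight * sum(counts[t] for t in terms)
    return total


def rank(records, query):
    """Return the records ordered from highest to lowest score."""
    return sorted(records, key=lambda r: score(r, query), reverse=True)
```

With this sketch, a search for "heart attack" favors a record indexed under the subject "myocardial infarction" over one that merely uses the words "heart attack" in its title, which is the kind of "aboutness" matching the talk described.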

EBSCO has made a large investment in ethnographic research and has observed that what users do can be quite different from what they say. Life on the web sets expectations; the user experience is personal. “Did you mean” suggestions are a significant advance on spell checking, and EBSCO’s suggestions were significantly upgraded in 2015. Further updates based on data mining will be added during this year. EBSCO has also developed a “Research Starters” product based on data from PhDs at Salem Press and Encyclopedia Britannica. Alternative metrics from Plum Analytics (http://plumanalytics.com/) are better than citations and will be added to result lists in EBSCO’s EDS discovery service.

Christine Stohn, Senior Product Manager at ProQuest/ExLibris, noted that discovery is more than searching; it is a gateway that is used in context to guide users to other resources. It is important to give users several alternative entry points to their searches because they are impatient, mobile, and social. They want simple, fast results, will not read long explanations, and do not like cluttered pages. They are accustomed to personalized experiences, which are difficult to accommodate in discovery services, and they consult and exchange information with others. They are used to and appreciate recommendations.

Here are some conclusions from user and usability studies.

Discovery is about finding specific topics and going beyond known items and topics.

Users often consult with peers and start a search with some knowledge of a topic.

Many users start with Google because they are used to it and find it simple.

Students’ reading lists are often the first entry point for finding material, but they must go beyond the lists.

Discovery systems are a gateway and can be database and resource recommenders. Users frequently search beyond library collections, so the system can guide them to those resources. The systems are part of research, teaching, and learning workflows and should be embedded in reading lists, search applications, mobile apps, etc. OA journals should be tagged so that users can limit a search to OA articles only.

A&I Databases in Discovery

Joelle Masciulli, Head of Research Discovery at Thomson Reuters, described the role of Thomson’s Web of Science (WOS) product in discovery. She began by listing some of the top trends affecting research and researchers:

There is an increased focus on collaboration, especially across disciplines and geographic areas.

The demand for open science and data will continue to grow.

Career and reputation management is important everywhere. Researchers need to be sure they are representing themselves well.

All science is computational, so data must be linked at multiple levels.

Problem-oriented contextual research with an emphasis on solving practical rather than theoretical problems is growing, which has resulted in a decline in the distinction between science and technology.

The WOS today contains over 62 million high quality records with over 1 billion cited references going as far back as 1898, all of which are searchable. It supports 7 language interfaces and multiple character sets, and is a unique collection of metadata about the research ecosystem that can be accessed as a citation network to reveal connections between scholarly works or to generate analytics. Overall usage has grown significantly in recent years; in 2015, the WOS was the top DOI referrer to CrossRef.[2]

Much of today’s emphasis is still on search, not discovery, so a new “WOS Everywhere” concept provides quick powerful access to the global research ecosystem using the world’s leading citation databases. Data is taken from 12,500 of the highest impact journals in the WOS core database, a new “Emerging Sources Citation Index”, and regional citation indexes from emerging economies. WOS Everywhere will provide metadata to help universities build their institutional repositories and will partner with third parties to integrate the metadata into more customer workflows.  A partnership with Google Scholar is also being developed, and a new database of authors using every possible field for identification has just been released.

The next step is to further harness the power of the citation network by viewing the connections among researchers in new ways: through ideas, institutions, funders, etc. so that the way researchers engage with the literature and each other, explore connections and new disciplines, and keep current will be transformed. Discovery must come to the user, which will bring a more social experience into the WOS so people can understand how they can share and increase their visibility.

Jessica Kowalski, Director of Market Development at Elsevier, said that there has been a decline in usage of A&I products, primarily because new forms of usage are emerging. In the past, discussions of A&I services have tended to focus on a few key players, but today, the research landscape has dramatically expanded, as shown here.



In 2012, the primary decision criteria for selecting an A&I service were the breadth of its database, ease of use, and citation quality; today, the criteria are content coverage, author profile capability, and presence of citation analysis tools. Fewer searches are being conducted because products have been designed to be easier to use, so there are fewer clicks to count. To survive, A&I tools must continue to expand their role in the research workflow. Formerly, they connected the initial search to content; now they must also include information from other sources, such as funding, alternative sources, etc. Disambiguation of resources by author or affiliation, integration with local sources, and analysis of citation data and metrics are all important features for an A&I service to have.

Content is still king. Elsevier incorporates data and metrics into a range of tools to help scientists make new discoveries and aid research strategy. Its Scopus service includes content from more than 5,000 publishers and over 100 countries; some of its content extends back to 1823. A&I data are being used for profiling (entity identification and disambiguation), APIs and custom data, and text mining. The relationships between articles and author or affiliation profiles using citation data are the foundation of Scopus and similar tools; the greatest influence is from the author profile. Scopus is used by more than 3,500 organizations and 150 funding and assessment bodies. Some of the most common uses of Scopus APIs are:

Showing publications or cited-by counts on a website,

Federated search,

Populating repositories with document metadata, and

Populating publication histories and profiles.

The most frequently used piece of metadata is funding: if you are cited, are you also being funded? Researchers with the highest visibility receive funding.

The current emphasis is on more than citations; we are now entering a phase of “publish, be cited and mentioned, or perish”. Article level metrics provide new ways to measure research impact; all records in Scopus have them. Questions can be answered and research discoveries can be made with data science and text mining.

Social Media and Open Access Impact on Discovery

This session featured two products with different pathways to discovery that can complement the traditional services. David Sommer, Co-Founder and Product Director of Kudos (https://www.growkudos.com/), began with a familiar list of today’s information problems, most of which stem from the appearance of over 1 million new publications every year, which in turn results in too much information, many ways to communicate, and many metrics to seek out and analyze.  In such an environment, how can researchers understand which communications efforts will help their work to stand out?

Kudos, an award-winning toolkit, provides tools to help researchers, publishers, and institutions increase the impact of their published work, and is used by over 65 publishers and 90,000 researchers. It works by explaining, sharing, and measuring.

Explain: create plain language explanations of publications. Authors create plain language summaries describing what their article is about and why it is important. Such summaries are useful because:

People in a field want to skim and decide what to read and what to skip.

Those in related fields use the summaries to find research relevant to their own work.

Those outside the field (e.g., funders) want to understand the research they are supporting.

Increasingly, lay people are searching using plain language terms and uncovering useful data.

Explanations enrich the article, and articles with short titles are more likely to be cited.

Share: create trackable links for sharing. Kudos integrates with Facebook, Twitter, and LinkedIn, so a single post can appear in multiple channels. Because the links are trackable, they can be counted and the effectiveness of the link can be measured.

Measure: All authors receive a dashboard that lists their articles and shows the metrics and data used to measure the impact of their work.
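The share-and-measure idea can be sketched in a few lines: each channel gets its own link, so clicks can be attributed to the channel where the link was posted. This is only an illustration of the general technique, not Kudos's implementation; the class, the method names, and the example.org base URL are all hypothetical.

```python
import secrets
from collections import Counter

# Placeholder domain for illustration; not a real Kudos endpoint.
BASE_URL = "https://example.org/r/"


class LinkTracker:
    """Issue one trackable link per (publication, channel) and count clicks."""

    def __init__(self):
        self._links = {}          # token -> (publication id, channel)
        self._clicks = Counter()  # token -> number of clicks

    def create_link(self, publication_id, channel):
        """Create a unique link for sharing a publication on one channel."""
        token = secrets.token_urlsafe(6)
        self._links[token] = (publication_id, channel)
        return BASE_URL + token

    def record_click(self, url):
        """Register a click on a previously issued link."""
        token = url.rsplit("/", 1)[-1]
        if token not in self._links:
            raise KeyError("unknown link")
        self._clicks[token] += 1

    def clicks_by_channel(self, publication_id):
        """Total clicks per channel for one publication (a dashboard row)."""
        totals = Counter()
        for token, (pub, channel) in self._links.items():
            if pub == publication_id:
                totals[channel] += self._clicks[token]
        return dict(totals)
```

A per-channel breakdown of this kind is what makes comparisons such as "links shared on one network are clicked more often than those on another" possible.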

A recent study of over 4,800 researchers showed that Kudos does work: sharing increased downloads by 23%. The study also revealed that Facebook is used more commonly for sharing work than one might expect, but links shared on LinkedIn are more likely to be clicked. A pilot project integrating Kudos into workflows of several platforms (for example, ScholarOne and Europe PubMedCentral) is underway.

Sommer closed with an appeal for more publishers to get in touch and work with Kudos.

Dominic Mitchell, Quality Control Manager of the Directory of Open Access Journals (DOAJ, https://doaj.org/), traced the history of DOAJ and its impact on the discovery of OA content. From its launch in 2003, DOAJ has become a unique reference source; it now indexes 9,075 journals from 130 countries that have published over 2.18 million articles. The top 3 countries contributing content are Brazil, the UK, and the US. In 2015, there were over 1.5 million referrals to the DOAJ; the top referrers are Serials Solutions and EBSCO.

DOAJ was created to provide a comprehensive service listing quality-controlled peer-reviewed OA journals. It is especially valuable to small independently published journals; with its hallmark of quality, DOAJ provides them with a high level of discoverability. DOAJ’s metadata is free to use and reuse, and it is open to spiders and crawlers, especially Googlebot. It provides a suite of APIs (see https://doaj.org/api/v1/docs) for the development of analysis applications.

Discovery is as important as availability, and greater discoverability will lead to a greater use of OA. Publishers and editors know that DOAJ can be trusted and can be used to show faculty, researchers, and librarians that OA journals can be trustworthy outlets for research. DOAJ works with publishers and traditional discovery services, which promote the discovery of OA content by integrating DOAJ’s metadata into their products. Google accounts for 35% of the traffic referred to DOAJ (a huge amount); DOAJ offers much more information about journals than Google does, and it also has a strong presence on large social media platforms. Publishers and editors want their journals to be listed in DOAJ because of increased visibility of content, certification of OA journals, and its prestige, all of which result in increased traffic to their websites.

In 2015, DOAJ was named as one of the 2 most vital sources for the development of open content. It is a charity that is supported entirely by donations from publishers and libraries all over the world, so it is vulnerable in terms of funding. Mitchell therefore encouraged publishers and authors to consider supporting DOAJ.

Emerging Discovery Tools

Dan Valen, Product Specialist at Figshare (https://figshare.com/), noted that over 1,500 data repositories now exist, and that sharing of data leads to increased citation rates. Figshare is a general all-purpose data repository in which one can easily manage research outputs and make them available in a citable, shareable, and discoverable manner. It provides data management for institutions, cloud services for publishers, and simplification of the research workflow.

Figshare supports the FAIR data principles (data must be Findable, Accessible, Interoperable, and Reusable). For researchers, this means that hidden content within articles can be exposed, and additional paths to discovery are opened up.  It provides a default set of licenses for end users. Metrics are available on all content. Figshare is free for end users and sells its services to publishers and institutions. End users can upload up to 20 gigabytes of data; publishers can upload up to 1 terabyte.

Sara Rouhi, Director of Business Development, Altmetric, LLP, said that alternative metrics (altmetrics) unlock opportunities for discovery.  Here are some useful definitions:

Altmetrics: any trace or indicator of online behavior: sharing, downloading, saving, commentary, coverage in news media, citations, engagement on scholarly platforms, web analytics, etc.

Altmetric.com: a data science company dedicated to tracking and analyzing the online activity around scholarly research outputs.

Research output: any digital object produced in the research life-cycle.

Online activity: any form of engagement with scholarly research.

Altmetrics are useful because they accrue in real time and are dynamic, in contrast to the long lag times with journal citations. They are also useful in research areas with little or no publishing focus and to early career researchers with a small research history. And they fulfil the calls of grant funders for evidence of broader research impact beyond citations.  Altmetrics are used by a wide range of professionals involved in the research process.

Here are some important points to consider regarding altmetrics:

Altmetrics rarely accrue for most research output. Most altmetrics do not track web analytics.

Altmetrics speak to attention, not quality (sometimes bad articles get a lot of attention!). Reputation management is very important: attention can be positive, negative, or neutral.

A post-peer review site should be checked carefully because its data can be very qualitative; only an assessment of the actual mention uncovers new audiences, collaborators, and opportunities.

Blog coverage is particularly interesting.

Altmetric data are used to listen to and amplify what researchers in the field are saying. They allow a researcher to be collegial.

A User Journey: University Perspective[3]

William Mischo, Head, Grainger Engineering Library Information Center, University of Illinois at Urbana-Champaign (UIUC), began with these 2 useful quotations:

Academic libraries should “step back to reconfirm (or reconsider) their vision for discovery, to ensure that their visions connect with information-seeking practices and preferences, and to determine whether they have a viable strategy in place … to achieve their vision.” (Roger Schonfeld, Ithaka S+R, 2014)

“‘Full Library Discovery’ refers to discovery approaches that move beyond the retrieval of collection materials to also include local information services and content such as library websites, pathfinder information, dataset repositories, subject specialist links, and course management system content.” (Lorcan Dempsey, OCLC, 2013)

Over the last 30 years, discovery has progressed from “supercatalogs” including A&I services to federated search systems to web-scale discovery systems (WSDS). Now we have hybrid systems (also called “bento systems”) which are a combination of WSDS and federated searching and present results with content grouped by type or material. Libraries are embracing WSDS because they rival Google and may bring back some users. WSDS extend the OPAC and integrate local content. Despite previous negative experiences with federated search, the one-stop shopping approach is attractive, and it is the “Next Big Thing”. Delivery is the paramount concern for libraries; users want to get to the full text as quickly and easily as possible, and the gateway function of libraries is becoming more important.[4]

There are many studies of user behavior, but more evidence-based data is needed. Many OPAC transaction logs have provided ambiguous results. The Illinois Transaction Log Analysis (TLA) and user surveys studied user behavior and found the following[5]:

Many queries have over 5 search terms.

Users make very little use of explicit Boolean operators; instead they tend to cut and paste titles, authors, citations, and DOIs into search boxes to formulate their searches. They depend heavily on the article literature.

Effective and efficient full text delivery is crucial.

Over half of the searches are for known items.

Users frequently have a material type in mind when they search.

The use of search assistance is high.

Gateway tabs to limit searches to material type are used in about 24% of the searches.
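Several of these findings are simple measurements over a query log, and a minimal sketch shows how such figures could be computed. It is not the Illinois TLA code; the function names, the DOI pattern, and the rule that only uppercase AND/OR/NOT count as explicit Boolean operators are assumptions made for illustration.

```python
import re

# Assumed pattern for a pasted DOI: "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")

# Assumption: only uppercase operators count as explicit Boolean use.
BOOLEAN_PATTERN = re.compile(r"\b(AND|OR|NOT)\b")


def describe_query(query):
    """Extract a few features from a single logged query string."""
    return {
        "term_count": len(query.split()),
        "has_doi": bool(DOI_PATTERN.search(query)),
        "has_boolean": bool(BOOLEAN_PATTERN.search(query)),
    }


def summarize(queries):
    """Return the share of queries that are long, contain a DOI, or use Booleans."""
    rows = [describe_query(q) for q in queries]
    n = len(rows)
    if n == 0:
        return {"long_queries": 0.0, "doi_queries": 0.0, "boolean_queries": 0.0}
    return {
        "long_queries": sum(r["term_count"] > 5 for r in rows) / n,
        "doi_queries": sum(r["has_doi"] for r in rows) / n,
        "boolean_queries": sum(r["has_boolean"] for r in rows) / n,
    }
```

A real analysis would also need session boundaries and click-through data to identify known-item searches reliably; the features above only capture what is visible in the query text itself.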

The UIUC library’s gateway portal is powered by its in-house developed Easy Search federated search system (see http://library.illinois.edu) which features contextual and dynamic search assistance and is incorporated into the bento system. Nearly 60% of the searches start from the Easy Search Everything tab; only 4% use the Advanced Search tab. Users like the bento display of results because it lends itself to full library discovery; includes library websites, pathfinder information, dataset repositories, and several other types of information; and a single click brings them to the full text or publisher web page. However, bento displays require significant programming, API processing, and maintenance. There is still a need for a display of catalog item availability and direct links to e-books.
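The core of a bento display, presenting results in separate boxes by material type, can be sketched as a grouping step. This is an illustration only, not the Easy Search code; the function name, the record fields, and the box names are hypothetical.

```python
from collections import defaultdict


def bento(results, box_order, per_box=3):
    """Group ranked results into boxes by material type.

    `results` is assumed to arrive already ranked; each item is a dict with
    at least a "type" key. Only the first `per_box` items of each type are kept.
    """
    boxes = defaultdict(list)
    for item in results:
        box = boxes[item["type"]]
        if len(box) < per_box:
            box.append(item)
    # Emit boxes in a fixed display order. Types not listed in box_order
    # are dropped in this sketch, and empty boxes are omitted.
    return {name: boxes[name] for name in box_order if name in boxes}
```

In practice each box is usually filled by a separate search target (catalog, article index, website search), which is where the programming, API processing, and maintenance burden mentioned above comes from.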

Remaining questions for discovery systems:

Are bento displays better?

Should the focus be on known-item searching?

What is the library’s role in discovery?

Challenges and Opportunities

The final session was a general discussion and summary which produced this list of the major conclusions of the workshop:

Discovery has solved many problems for publishers by exposing a lot of their content.

Even if no money changes hands, relationships are still important and worth cultivating.

Everything on West and Lexis is not discoverable on a discovery system. There is lots of content like that.

If you are the first one to buy something, you can spend a lot of time creating records for the systems.

There is much content in which users are interested that is not articles, such as photos, maps, videos, news, etc. Most discovery issues seem to be oriented towards articles.

Personalization is at a crossroads because of privacy and questions of who the user is.

How engaging a publisher website is depends heavily on the business model and whether they can get the user to pay something.

Donald T. Hawkins is an information industry freelance writer based in Pennsylvania. In addition to blogging and writing about conferences for Against the Grain, he blogs the Computers in Libraries and Internet Librarian conferences for Information Today, Inc. (ITI) and maintains the Conference Calendar on the ITI website (http://www.infotoday.com/calendar.asp). He is the Editor of Personal Archiving: Preserving Our Digital Heritage (Information Today, 2013) and Co-Editor of Public Knowledge: Access and Benefits (Information Today, 2016). He holds a Ph.D. degree from the University of California, Berkeley and has worked in the online information industry for over 40 years.

[1] “The Library in the Life of the User”, Connaway, Lynn Silipigni, OCLC Research Report, 2015. (Available at http://www.oclc.org/content/dam/research/publications/2015/oclcresearch-library-in-life-of-user.pdf)
[2] http://blog.crossref.org/2016/05/where-do-doiclicks-come-from.html
[3] Also see a summary of another talk on UIUC’s services described under “Researching Researchers: Evidence-Based Strategy for Improved Discovery and Access” in my report on the Electronic Research & Libraries (ER&L) Conference, http://www.against-the-grain.com/2016/06/v28-3-dons-conference-notes/.

[4] For a discussion of some challenges to discovery, see “Spotlight on the Digital; Recent Trends and Research in Scholarly Discovery Behavior”, Chowcat, Ian, Jisc Report, September 2015. Available at https://digitisation.jiscinvolve.org/wp/files/2015/10/spotlight_literature_review_sept2015.pdf

[5] Detailed reports on many of UIUC’s analyses are available at http://www.library.illinois.edu/committee/ddst/discoveryresearch.html.

[i] http://www.against-the-grain.com/2016/04/v28-2-dons-conference-notes
