2015-02-28



Until very recently, search engine optimization was – unsurprisingly – predicated on leveraging the world of web documents.

We’ve always called what people do when they plug some words into Google a “search”. And it’s been appropriate, because there’s always been a ready answer to the obvious question, “search for what?”: web pages, and other web documents.

And so we called the thing returned by Google in response a “search engine results page”, a list of linked web documents where, if the searcher was lucky, they might find the information for which they were looking.

Ten blue links.



The optimization strategy for this search environment was self-evident: ensure your web documents rank highly in those search results, so that your web documents are the ones suggested to searchers.

Over the course of time that environment changed. Verticals like blocks of news or video results started to appear in the search results, and some individual search results started to appear with other information attached to them – “rich snippets” – but at the end of the day the bread and butter of the search results, and so the focus of SEO, remained references to web documents.

But recently this has changed. What we get in return for something typed into Google now contains references not only to documents, but to data as well.



That is, the “search results” now contain a mix of traditional document references as well as direct answers, with the proportion of the latter gradually but relentlessly growing.

The diminished presence of those document references changes SEO, but it is unlikely that those document references – links – will ever wholly disappear from the SERPs.

However, as what I want to explore today is optimizing specifically for the data part of what’s returned in response to a query (or, as we’ll see, some other trigger), I want you to join me in a thought experiment I’m going to use to frame my talk.

That is, what if a search engine returned zero blue links? How would one optimize for search in a world where there were no document references, no links to your web pages, but only data?

Rather than dwell on this premise I want to get to the meat of the matter it’s intended to highlight – being successful in search in ways other than driving search-to-website traffic.

But I will note that I recently wrote a piece that explored this premise in detail, which I invite you to read at your leisure, as an ever-so-exciting prequel to this presentation.

In this piece I even went so far as to say:

For much of the information that we care about, websites, in fact, might just turn out to be an artifact of the early days of the web, a necessary bridge as we figured out how to better create, store and share information over the internet.

That’s nuts, right? Who would be so crazy as to suggest such a thing?

Well, in the course of putting this deck together I stumbled across, and reacquainted myself with, a post by Mike Arnesen written more than a year earlier that contained a pretty similar sentiment.

I believe that the document-based web was simply a limitation of our thinking at the time; it was hard to imagine a web of data at first, much less architect a way to allow data to be interchangeable and meaningful over such an astronomical scale.

I commend this excellent piece to you as well, not only because it is shorter and less meandering than my own, but because Mike does a great job of articulating the difference between documents and data – something to which I’ll be returning time and time again today.

But I will, I pledge, as much as I possibly can, keep this about SEO. That is, optimizing your digital presence for the best performance and best visibility in search engines.

The response from many search marketers to the evolution of search engines, and in particular to the appearance of direct answers and data verticals in the search results, has been to abandon search marketing.

Rely less on search engines for traffic. Build direct relationships with your customers. Shift your focus to social. These may be sage pieces of marketing advice, but they have nothing to do with search marketing.

So I pledge that as I explore with you data-driven approaches to search engine optimization that I will endeavor to keep the focus on how you appear in search engines.

Because the fact that the SERP-to-website-traffic basis of SEO is waning doesn’t mean that search engines cease to be a part of the internet ecosystem.

There are more than a trillion searches conducted each year in Google – that’s over 38,000 queries a second – and your company or product or organization is going to be a part of the information interchange that occurs on Google.

Just because your website may no longer be at the heart of your presence in search results doesn’t mean you can ignore search, because you will have a presence in search whether you want to or not.

Which leads me to my first piece of optimization advice: if data about you is going to appear in the search results – and it is – make sure it’s accurate by providing it yourself.

Declare data even if it doesn’t provide you with a rich or structured snippet (though rich snippets are still highly desirable even in the absence of referral traffic, as they compactly convey information themselves).

Doing so will ensure information that’s displayed about your phone number, your address, your products’ prices, your products’ availability, your company’s social media profiles and your official website is correct – and those are just some of the types of information that Google officially supports publishers providing about themselves and their wares.

A common complaint is that by providing your data to the search engines you’re “giving away” your data, and that doing so hurts you because you’re providing the means by which the search engine can answer a query without referring a visit to your website.

Correct.

And hoarding your data helps you how?

I don’t know. But I do know how withholding your data hurts you. It hurts you because Google is going to return information about you when asked, and it’s going to have to get that information from somewhere else.

But most of all it hurts you because you’ve excluded the possibility of having your data linked to other data, which is how data becomes useful. And unless the data you’re withholding is especially precious and unique, keeping it captive in a web document doesn’t mean you’re going to force the search engines to refer traffic to you.

The most straightforward way of providing the search engines with your data is to annotate your web documents – web pages – with information about the data found in your documents, transforming strings into usable information.

“Acme Widget Supreme is less than 5.00.” If you provide that only as text, a data consumer won’t know what you’re talking about. A search engine will be able to index the keywords, and so might be able to return a reference – a URL – to the document that contains that string for a query like “supreme widget 5”, but that’s about as far as it can go.

A search engine won’t surface the Acme Widget Supreme in the search results for the query “widgets under 5 lbs.” because there’s no way for the search engine to know what the “5.00” refers to. Five pounds? Five kilograms? Five dollars?

If the string read “Acme Widget Supreme is less than 5.00 pounds” there’s still ambiguity because a “pound” can be a unit of currency or of weight, and even if you say “less than five pounds sterling” you’re still forcing Google to extract a piece of data from a bunch of text, and still forcing it, ultimately, to guess.

Providing Google with structured data removes that ambiguity and improves the chance that the Acme Widget Supreme will make some sort of appearance in Google’s universe.
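As a sketch – the values are hypothetical, and I’ve used JSON-LD here for readability, though the same statements can be made inline with microdata or RDFa – the widget’s weight and price can be declared without ambiguity:

```html
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Product",
  "name": "Acme Widget Supreme",
  "weight": {
    "@type": "QuantitativeValue",
    "value": "4.5",
    "unitCode": "LBR"
  },
  "offers": {
    "@type": "Offer",
    "price": "4.99",
    "priceCurrency": "GBP"
  }
}
</script>
```

“LBR” is the UN/CEFACT code for pounds of weight and “GBP” is the ISO code for pounds sterling, so a data consumer never has to guess which “pound” – or which “5.00” – you mean.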

Why? Bear with me as I introduce a principle that may seem esoteric, but I think is really important to understand if you want to be successful in semantic search engine optimization.

“Use URIs as names for things.” This is the first principle articulated by Tim Berners-Lee in his touchstone piece “Linked Data – Design Issues”.

(Berners-Lee’s use of the word “things” here has raised the ire of many in the semantic web community, as has Google’s use of “things” in their Knowledge Graph value proposition, “from strings to things.” I acknowledge and understand the objections to the use of “things” in this context, but have nonetheless used it as shorthand in this presentation. To explore this more, I recommend this excellent piece on the topic by Bernard Vatant, and the discussion of his post on Google+.)

He goes on to say:

Use HTTP URIs so that people can look up those names.

When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)

Include links to other URIs. so that they can discover more things.

When you’re using something like schema.org or the Open Graph protocol, it’s the provision of information and other links at the lookup location that makes structured data so powerful, and so much more sophisticated than what might be called “first generation metadata.”

While structured data may not appear to be far removed from the expression of any other arbitrary name/value pairs, the standards employed mean that structured data declarations are not arbitrary, but positively dripping with meaning.

For example, you might surround the title of an article with a bit of code that says, “this here thing is a schema dot org Article, and its title is ‘Semantic SEO Tips’”.
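In microdata – one of several syntaxes that could make the same statements – that might look like:

```html
<div itemscope itemtype="http://schema.org/Article">
  <h1 itemprop="name">Semantic SEO Tips</h1>
</div>
```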

What a search engine understands about this – because you’ve used a URI that’s a name for something – is:

This is a schema.org/Article, understood to be “An article, such as a news article or piece of investigative report. Newspapers and magazines have articles of many different types and this is intended to cover them all.”, which is a more specific type of a schema.org/CreativeWork, which is understood to be “The most generic kind of creative work, including books, movies, photographs, software programs, etc.”, which is a more specific type of schema.org/Thing, which is understood to be “The most generic type of item.” The Article being described has a property schema.org/name, which is understood to be “The name of the item.”, which has an expected value of schema.org/Text and is understood to be “Data type: text”, and has for its value the text “Semantic SEO Tips”.

That’s why Google recommends using sameAs for so many schema.org types. By providing a Freebase or Wikidata or Wikipedia link you’re using URIs to identify things, and that helps Google connect searchers that are looking for those things with your data.
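As a sketch – the organization and all of the profile URLs below are placeholders – that might look like this in JSON-LD:

```html
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Organization",
  "name": "Acme Widgets",
  "url": "http://www.example.com/",
  "sameAs": [
    "http://en.wikipedia.org/wiki/Example_article",
    "http://www.wikidata.org/entity/Q0000000",
    "http://www.freebase.com/m/0000000"
  ]
}
</script>
```

Each of those sameAs values is a URI that names the same thing, which is exactly what lets a data consumer connect your declaration to everything else it knows about that entity.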

I’ll talk more about Google Now later, but you can see how unambiguously identifying your entities with sameAs can help connect Google users with your data, even if they’re not searching for things.

Use global identifiers for your products as well.

Remember: when you use something like a GTIN – the product identifier associated with a UPC barcode – you’re not merely identifying the thing you’re talking about, but – because a data consumer can find information about that thing at that identifier – you’re also telling that data consumer about all sorts of properties of that thing. And because those properties are shared by other things, your thing is immediately plunged into participation in the world of linked data.
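For instance – a hypothetical product with a made-up GTIN – the global identifier simply rides along with the schema.org properties you’re already declaring:

```html
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Product",
  "name": "Acme Widget Supreme",
  "gtin13": "0001234567890",
  "brand": {
    "@type": "Brand",
    "name": "Acme"
  }
}
</script>
```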

I bring up GTIN specifically here in part to highlight the fact that there are other vocabularies besides schema.org, and that GS1, the organization that issues GTINs and UPCs, has just released a draft GS1 vocabulary for products, which allows many more attributes of products to be described than schema.org does.

But because it is a formal vocabulary that uses URIs as names for things and employs standards like RDF, it plays nice with schema.org, as does, for example, BIBFRAME, the vocabulary for libraries.

Another advantage of using standards to provide data to the search engines is that the same data can be made available to other data consumers as well – either directly in the same form, or with readily made syntactic modifications.

Once you’ve made the decision to make data about yourself available, think of who else besides the search engines might find your data useful.

This can include social networks, content aggregators, business partners, and internal data consumers like your site search and analytics.

And if the mapping of properties and their values is well executed, the resulting consistency will help build trust in your data – especially with the search engines – increasing the likelihood that they’ll surface your data and associated web documents when it’s appropriate for them to do so.

In terms of analytics, my co-presenter has come up with a method of using schema.org data to populate Google Analytics reports. It will be much imitated and modified, but in all of its manifestations it will always be for me “The Arnesen Method”.

The Arnesen Method is a major conceptual breakthrough, because it necessarily conceives of content as a type of data, rather than as a bunch of text. It provides a method of identifying entities and their properties within Google Analytics.

This in turn allows the meaning of web pages to be correlated with other metrics like exit rate, or pageviews, or conversion rates. It’s light years beyond the blunt instrument of string-based content groupings (string-based because they rely on URL and path strings) in Google Analytics.
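To give a flavor of the approach – this is my own bare-bones sketch, not Mike’s implementation, which uses Google Tag Manager and is considerably more robust – you might read the schema.org type declared on a page and record it in a Google Analytics custom dimension you’ve already configured:

```html
<script>
  // Assumes the standard analytics.js snippet is already on the page and that
  // "dimension1" has been set up in the GA admin as a hit-scoped custom dimension.
  var entity = document.querySelector('[itemtype]');
  var entityType = entity ? entity.getAttribute('itemtype') : '(no entity declared)';
  ga('set', 'dimension1', entityType); // e.g. "http://schema.org/Article"
  ga('send', 'pageview');
</script>
```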

The Arnesen Method is still about search because it’s a way of judging how well topics are performing in the not-provided world, and actually has many advantages over those defunct keyword reports.

It’s also a way of understanding, when you think beyond SERP-to-website traffic, what data people are coming to your website for and how well your website satisfies that data demand, and – yes, wait for it – of using this intelligence to (among other things) help you figure out how you can get that information to your users with the least possible friction, including the provision of direct answers in response to search queries.

Which is a good time to talk about content with reference to another SearchFest speaker, Facebook’s Jonathon Colman.

Asked what’s the biggest mistake organizations make when trying to craft “great experiences”, Jonathon replied, “It’s a failure to be useful, to provide actual value and solve problems.”

Your content should satisfy these conditions – be useful, provide value, help solve problems – regardless of the environment in which it’s served.

Throw out content that requires a website visit and doesn’t, in itself, satisfy any of these needs. Yes, this may entail throwing out your content strategy.

If you have content that’s valuable to your users, your fans, your customers, then you’ll want to do what you can to connect them with that content.

Content that itself has independent value – a creative work that forms a cohesive whole that’s more than the sum of the data contained therein, like a white paper or news article or MP3 – can still be leveraged, of course, although that independent value doesn’t protect that sort of content from the changes wrought by an increasingly distributed data environment. In other words, this content will be decreasingly successful if it continues to be confined to websites.

Says futurist Amy Webb:

There is no linear distribution model because everybody has a different device, and they’re all going to come to content in different ways. If the goal is to maximize attention, then it’s about thinking of a more distributed model and along with that a more distributed way of making a profit back.

For distribution of creative works think about how revenue streams can be attached to the content, rather than requiring a website visit.

Papers Must Rethink Digital Distribution | NetNewsCheck.com

http://www.netnewscheck.com/article/39017/papers-must-rethink-digital-distribution

If you’re not in the business of actually selling content, does your current conversion strategy rely on a user making a way-stop at a particular piece of content on your website?

Then it’s time to take a fresh look at that strategy, and the rigidity of the funnel it implies. You must figure out how you can introduce conversions at an earlier, later or altogether different stage in a customer’s journey.

In other words, if your organic search conversion strategy hinges on a user navigating from the SERPs to a specific page, that strategy is increasingly at risk.

Aaron Wall pointed out recently, using the example of an AVG download link returned directly in the SERPs for the query “download AVG”, that “Where there are multiple versions of the software available, Google is arbitrarily selecting the download page, even though a software publisher might have a parallel SAAS option or other complex funnels based on a person’s location or status as a student or such.”

He’s quite right. But a more effective solution than complaining – as Google will continue to link directly to your download locations anyway – is to provide the search engine with the best link you can, and to make sure your funnel is honed to handle lateral downloads (though in this case it’s not even a link to the executable, but to a download page). When you provide free downloads and I ask for a free download on Google, set me up with that download as expeditiously as possible. Don’t make me jump through hoops to satisfy the demands of your website conversion funnel.
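One way of providing that best link – a sketch only, using schema.org’s SoftwareApplication type; I can’t promise Google uses the downloadUrl property to pick its download link, but it’s the most explicit declaration available – looks like this:

```html
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "SoftwareApplication",
  "name": "Example Antivirus Free",
  "operatingSystem": "Windows",
  "applicationCategory": "SecurityApplication",
  "downloadUrl": "http://www.example.com/download/free-edition",
  "offers": {
    "@type": "Offer",
    "price": "0",
    "priceCurrency": "USD"
  }
}
</script>
```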

And I think AVG is actually a pretty good example of thinking beyond the web page conversion funnel. If you run AVG Free you’ll know that occasionally you’re forced to perform a core update to the program, with a negative-option opt-in free trial for their premium product.

Because of Google Now (and analogous technologies being developed by other search engines and social networks) a user’s search environment now also includes their inbox, so treat your email as part of SEO as well.

For some time Gmail has supported something called “Actions in the Inbox” which, using a combination of schema.org Actions and JSON-LD, allows action labels to appear next to email messages in Gmail, or to connect Gmail messages with other Google products.

I can barely string two JSON declarations together without breaking something, but in fairly short order I was able to create an email with a “view” button and – keeping us grounded in the search landscape – one that pushed a reservation to Google Now.
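The “view” button, for example, is produced by a JSON-LD block embedded in the HTML body of the email – the URLs here are placeholders, and note that Gmail only renders these buttons for registered senders (messages you send to yourself excepted):

```html
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "EmailMessage",
  "description": "View your reservation",
  "potentialAction": {
    "@type": "ViewAction",
    "target": "http://www.example.com/reservations/12345",
    "name": "View reservation"
  }
}
</script>
```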

I know Gmail doesn’t have a commanding share of the email client market, but I’m still shocked that Actions in the Inbox aren’t more widely used.

But not as shocked as I am about how few apps avail themselves of app indexing and, by extension, deep linking.

In a nutshell app indexing allows content within your app to be accessed directly from a search result.
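At its simplest this works by pointing each web page at its in-app equivalent with a deep link – the package name and path below are placeholders:

```html
<link rel="alternate"
      href="android-app://com.example.android/http/www.example.com/widgets/supreme" />
```

When Google has indexed the app and a searcher has it installed, the result for that page can open the corresponding screen in the app directly.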

App indexing is available to anyone with an app. Google Now integration, by contrast, is currently available only to a closed list of vendors, whose apps can generate notifications in Google Now if the user has authorized them to do so.

The long-standing “app-or-website?” question becomes increasingly less pressing as app content is made available through search (and now in social media too as, for example, Twitter App Cards also support deep linking).

Driving traffic to your app rather than your website, even a mobile-friendly website, is likely to support a higher conversion rate, because your app is designed to be the best performer in a mobile environment (otherwise, why do you have an app?).

This highlights the principle that, in the new search environment, you need to optimize for conversions where they’re likely to occur, rather than only where you want them to occur.

Google’s event markup, which can incorporate third-party ticket-purchasing links, shows this principle of optimizing for conversions where they take place. By marking up ticketing links you can generate an expanded answer card with “a direct link to your preferred ticketing site.” Would you rather sell a ticket, or keep Google from “stealing” your event traffic?
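Here’s roughly what that looks like – a hypothetical event, with the offer’s url pointing at the third-party ticketing page rather than at your own site:

```html
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "MusicEvent",
  "name": "The Example Trio Live",
  "startDate": "2015-04-18T20:00",
  "location": {
    "@type": "Place",
    "name": "Example Hall",
    "address": "123 Main St, Portland, OR"
  },
  "offers": {
    "@type": "Offer",
    "url": "http://tickets.example.com/example-trio-2015-04-18",
    "price": "35.00",
    "priceCurrency": "USD"
  }
}
</script>
```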

The same thing goes with Bing’s “Order online” button. What’s more important to you, having someone visit your restaurant website (hopefully with a Flash intro and every menu in PDF format), or having them actually place an order?

Both the Google third party ticketing link for an event and the Bing “order online” link highlight changes that require us to rethink and rework SEO success metrics.

What if you’re wildly successful with generating the event answer card in the SERPs? By any “traditional” SEO measure – rankings, referred traffic from search, revenue generated from search – you’d surely be fired.

There’s neither an easy nor a single solution to the problem of measuring search visibility that isn’t tied to referred search traffic. Even much-disparaged rank tracking has ongoing issues in correctly detecting and reporting on a brand’s presence in non-traditional search results.

I have no solution, but I think it’s important to point out that SEO success metrics have to evolve along with SEO strategy, or buy-in for SEO is going to drop off steeply. Definitely an area we should all be exploring more.

Many of the newest opportunities for data optimization in search rely either on JSON-LD, schema.org Actions or the two together, and both are relatively recent additions to the semantic search marketer’s toolbox.

JSON-LD is a method of providing structured data to a data consumer without the requirement of marking up web page text that’s visible to humans. Sanctioned uses by Google include the event markup just discussed, as well as ways of telling Google about your company’s official logo, social media accounts, contact information and onsite search engine.
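The onsite search engine case – the sitelinks search box – is a nice illustration of JSON-LD and a schema.org Action working together; your own domain and search URL pattern go where the placeholders are:

```html
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "WebSite",
  "url": "http://www.example.com/",
  "potentialAction": {
    "@type": "SearchAction",
    "target": "http://www.example.com/search?q={search_term_string}",
    "query-input": "required name=search_term_string"
  }
}
</script>
```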

schema.org has always provided a way of declaring information about entities, but only with Actions can activities that involve these entities now be expressed – the vocabulary now, literally, has verbs.

JSON-LD and schema.org Actions together are extremely powerful, and their union supports the development of flexible linked data APIs. This is the combination that fuels Actions in the Inbox for Gmail and app indexing for Bing. And these protocols support innovations like the one displayed in the slide – a music activity exporter designed by Alexandre Passant.

Is this technology rocket science?

No, but it’s technical enough that, unless you’re a technical wiz yourself, you’re going to need to turn to your developers for aid and support.

In any case, it’s important that you formally introduce your developers to whatever markup protocols they’re going to be working with, which may include but are not limited to microdata, RDFa, JSON-LD and schema.org Actions, and possibly more exotic animals like Turtle and SPARQL.

The steepness of the learning curve varies – of that list, microdata and JSON-LD have been deliberately designed to be easy on developers – but from a functional perspective it’s important to approach, and have your developers approach, each type of semantic markup not as just another flavor of metadata, but as linked property/value (or name/value) pairs.

Unless you’ve got some god-like power in your organization that I don’t possess, ensuring your developers have the information they need to be effective in support of your search marketing efforts is going to mean pushing forward initiatives where that knowledge is required and using the opportunity to foment professional development.

It can be hard to resist the urge to expedite development by doing the groundwork and providing code for your developers, essentially using them as copy/paste monkeys. But failure to ensure that your developers are actually educated in the technologies important to your SEO efforts increases the likelihood of errors, decreases the likelihood that best practices will take hold, stifles innovation (since your developers won’t be capable of identifying opportunities) and ensures your time will continue to be taxed by the requirements of providing code and performing QA.

I hope you’ve started to get the sense through my presentation that the sort of things I’ve been talking about are quite a lot different than the mainstays of “traditional” SEO: link acquisition, content strategies designed to drive traffic to websites, duplicate content handling, penalties and their remedies.

Well, they’re obviously quite a lot different because all those traditional SEO approaches have to do with websites, and I’ve specifically been exploring visibility in search other than document references – that is, other than ten blue links.

But while website traffic will always play a role in the web ecosystem, the website model is waning, and it is waning disproportionately quickly in search, because the search engines have multiple reasons to answer queries directly rather than sending users to websites whenever they can.

Yes, they can expose the user to more search ads by doing so, but for Google that’s very much secondary to user satisfaction – especially in the environment that’s driven the search engines to evolve toward direct answers, and that’s driven them to employ semantic technologies and create things like the Knowledge Graph and schema.org. I speak, of course, of the mobile web.

“We have all had to take a deep look at what search really means in a world that has gone mobile,” he says. “Our heads explode when we think about this.”

Amit Singhal of Google, quoted by Steven Levy

On a mobile device it’s simply a better experience not to force a user to visit a site for specific information.

Whether or not this is “good” for website owners is irrelevant, because for Google satisfying the demands of searchers will always trump the desires of publishers.

The drive away from the search-result-to-website paradigm is the result of two technological realities that cannot be reversed: first, the shift by which internet-provided resources are now mostly consumed on mobile devices and, second, the advances in natural language processing, cognitive computing and semantic web technologies which make alternatives to web document references – ten blue links – possible for the search engines in the first place.

SEO Skeptic
