What may be obvious to a human reader can be really hard for a computer to understand. You and I can go to a retailer’s website, look at a product, and immediately see what it is called, what it looks like, what it costs and how it is described. The content could be laid out in a completely weird way, with pictures in random locations, and we would still understand what is what.
This is much harder for computers. In simplistic terms, computers are much more literal in how they process a web page. For the most part, they look at the HTML code of a page rather than the visual representation we see in a browser. Of course there are systems that try to comprehend the graphical representation like we do, but making sense of a page that way is non-trivial for a computer. Moreover, a system may be programmed to understand one retailer’s format really well, or maybe a handful of retailers, but the approach breaks down when you try to generalize to all retailers. Right now, we do not have algorithms that let a computer system retrieve and digest web content at mass scale in a visual way like humans do.
So then how do you tell a search engine like Google or Bing what’s actually on your site? Traditionally, keywords, meta tags, inbound links and the like were the main signals. Some meta tags still matter, and Google still uses a few of them, but many were easy to game, particularly the keywords tag. The net result was bad search results for users, so those tags are not really used anymore.
To solve this, the industry is starting to adopt hidden markup in web pages that tells a computer what type of content is on the page (products, customers, orders, events, information, etc.) and what the various properties of that content are (name, price, description, time, etc.). This is what’s called structured data. Once a search engine is fed data like this, it’s very easy for it to display much richer search results. Here’s an example for our store, Level X Motorsports:
Each of the four pieces of content is fed to Google very specifically in the schema.org format, a combined effort by Google, Bing and Yahoo to standardize on one format. This search result is so much better than Google guessing where the price is on the page, or whether the description is the actual product description or perhaps a small blurb about the store itself. Prior to schema.org, or on pages without it, Google basically reverts to the lowest common denominator, which is to show very little. That’s better than trying to guess.
To see how it works, check out the “itemprop” attributes in the product page’s HTML code below and how they correspond to the search result above:
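If you want something to experiment with, here is a minimal sketch in Python of how a program might read this kind of markup. The HTML in it is a made-up product page (the product name, image path and price are hypothetical, not taken from our store), and the small parser is only an illustration of the idea, not how any search engine actually does it. It uses nothing beyond Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Hypothetical product markup for illustration only; not a real store page.
PRODUCT_HTML = """
<div itemscope itemtype="http://schema.org/Product">
  <h1 itemprop="name">Example Brake Pad Set</h1>
  <img itemprop="image" src="/images/brake-pads.jpg" alt="Brake pads">
  <p itemprop="description">Front brake pads for track use.</p>
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <meta itemprop="priceCurrency" content="USD">
    <span itemprop="price">89.99</span>
  </div>
</div>
"""


class ItempropParser(HTMLParser):
    """Collect itemprop values from content=, src= or the element's text."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._open_prop = None  # itemprop whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        # Skip elements without itemprop, and nested items such as "offers",
        # whose value is another item rather than a plain string.
        if prop is None or "itemscope" in attrs:
            return
        if "content" in attrs:
            self.props[prop] = attrs["content"]
        elif "src" in attrs:
            self.props[prop] = attrs["src"]
        else:
            self._open_prop = prop
            self.props[prop] = ""

    def handle_data(self, data):
        if self._open_prop is not None:
            self.props[self._open_prop] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._open_prop = None


def extract_itemprops(html):
    """Return a flat dict of itemprop name -> value for the given HTML."""
    parser = ItempropParser()
    parser.feed(html)
    parser.close()
    return {name: value.strip() for name, value in parser.props.items()}


props = extract_itemprops(PRODUCT_HTML)
for name, value in props.items():
    print(f"{name}: {value}")
```

The point of the sketch is that the program never has to guess: the name, image, description and price are each labeled explicitly, so it can pick them out no matter where they sit in the layout.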
Pretty straightforward, right? Well, yeah, for the most part. There is also a second popular format called Open Graph, which grew out of Facebook. It is found on many pages, because Facebook created it to feed content into its like and share buttons. The two compete in some sense, but they can happily coexist on a page. Personally, I prefer schema.org, because it’s what the search engines want.
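To make the coexistence point concrete, here is a rough sketch, again in Python and again with an invented page. The Open Graph tags live in the page's head as `meta` elements with `og:` properties, while the schema.org markup sits on the visible elements in the body. The class name `MarkupInventory` and all of the values are my own placeholders.

```python
from html.parser import HTMLParser

# Hypothetical page carrying both formats; all values are placeholders.
PAGE_HTML = """
<html>
<head>
  <meta property="og:title" content="Example Brake Pad Set">
  <meta property="og:description" content="Front brake pads for track use.">
  <meta property="og:image" content="https://example.com/images/brake-pads.jpg">
</head>
<body>
  <div itemscope itemtype="http://schema.org/Product">
    <h1 itemprop="name">Example Brake Pad Set</h1>
    <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
      <meta itemprop="price" content="89.99">
    </div>
  </div>
</body>
</html>
"""


class MarkupInventory(HTMLParser):
    """List the Open Graph tags and schema.org itemprops found on a page."""

    def __init__(self):
        super().__init__()
        self.open_graph = {}  # og:* property -> content
        self.itemprops = []   # itemprop names in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            self.open_graph[prop] = attrs.get("content", "")
        if "itemprop" in attrs:
            self.itemprops.append(attrs["itemprop"])


inventory = MarkupInventory()
inventory.feed(PAGE_HTML)
inventory.close()

print("Open Graph:", inventory.open_graph)
print("schema.org itemprops:", inventory.itemprops)
```

Neither format gets in the other's way, since each uses its own attributes. A consumer that only cares about one of them can simply ignore the other.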
Whichever format you prefer or can support more easily, it is now imperative for any eCommerce site to have this kind of markup on all of its product pages. If you want to check what’s going on on your site, see this tool I just put together: