What is obvious to a human reader can be surprisingly hard for a computer to understand. You and I can go to a retailer’s website and look at a product, and it’s immediately clear what the product is called, what it looks like, what it costs, and where to read a description. The content could be laid out in an odd way, and the pictures could sit in unexpected places, but we would still understand what is what.
This is much harder for computers. In simple terms, computers process a web page far more literally than we do. For the most part, they look at the HTML code of a page rather than the visual rendering we see in a browser. There are systems that try to interpret the graphical representation the way we do, but “making sense” of a page that way is non-trivial for a machine. A system can be programmed to understand one retailer’s format really well, maybe a handful of retailers, but the approach falls apart when you try to generalize to all retailers. Right now, we simply don’t have algorithms that let a computer retrieve and digest web content at scale, visually, the way humans do.
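To make that concrete, here is a sketch of how two hypothetical retailers might mark up the same product. The retailer layouts, product, and class names are invented for illustration; the point is that code written to read the first layout tells you nothing about the second.

```html
<!-- Hypothetical retailer A: name, price, and description in labeled elements -->
<div class="product-card">
  <h1 class="product-title">Acme Coffee Grinder</h1>
  <span class="price">$49.99</span>
  <p class="description">Burr grinder with 15 settings.</p>
</div>

<!-- Hypothetical retailer B: the same information in completely different markup -->
<table>
  <tr><td><b>Acme Coffee Grinder</b></td></tr>
  <tr><td>Only 49.99! Burr grinder with 15 settings.</td></tr>
</table>
```

A parser keyed to retailer A’s class names breaks entirely on retailer B, which is why per-retailer extraction doesn’t generalize.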
So how do you tell a search engine like Google or Bing what’s actually on your site? Traditionally, keywords, meta tags, inbound links, and the like did that job. Google still uses some meta tags, and they remain useful, but many of them, the keywords tag in particular, were easy to game and so are largely ignored today. The net result of all that gaming was worse search results for users.
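As a rough illustration (not any particular search engine’s current guidance), the meta tags in question sit in a page’s head, and the keywords tag is the one that invited stuffing. The product and values here are made up:

```html
<head>
  <title>Acme Coffee Grinder</title>
  <!-- The description tag is still read by some engines to build result snippets -->
  <meta name="description" content="Burr coffee grinder with 15 settings.">
  <!-- The keywords tag could be stuffed with unrelated terms, which is why it is largely ignored now -->
  <meta name="keywords" content="coffee grinder, cheap, best, free, iphone, shoes">
</head>
```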