Browsing Posts in SEO

SEO blogs are all ablaze after Google announced yesterday (Sept 26th, 2013), to mark their 15th anniversary, their first major algorithm change since Caffeine in 2010.

The algorithm introduces a concept known as "Google's Knowledge Graph". However, this in itself is not new; it's something Google have been "quietly" working on since May 2012:

"In May 2012, the Web search engine Google has introduced the so-called Knowledge Graph, a graph that understands real-world entities and their relationships to one another. It currently contains more than 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects. Soon after its announcement, people started to ask for a programmatic method to access the data in the Knowledge Graph, however, as of today, Google does not provide one." - Open Knowledge Graph

It marks a huge paradigm shift towards "things, not strings!"

Thomas Steiner, a Google employee working at Universitat Politècnica de Catalunya – Department LSI, Barcelona, Spain (tsteiner{at}lsi.upc.edu), and Stefan Mirea, a Google intern working at Computer Science, Jacobs University Bremen, Germany (s.mirea{at}jacobs-university.de), began an initiative entitled "SEKI@home, or Crowdsourcing an Open Knowledge Graph" (openknowledgegraph.org) to help tune the database: "We suggest crowdsourcing for the described task of extracting facts from SERPs".

It strikes me as somewhat similar to the Open Graph Protocol initiative, something Facebook have been actively invested in for a while. Search indexes are becoming more object-oriented rather than relying simply on string-matching algorithms. Earlier this year Facebook introduced search to their platform, a feature that leverages a lot of their Open Graph platform and experience.

For a while the Google Knowledge Graph was accessible from the openknowledgegraph.org website via SPARQL (the SPARQL Protocol and RDF Query Language, a query language and protocol for RDF data). However, since Google's official introduction this has been shut down, and the data can now be accessed via the Freebase API instead.

For a while Google has been slowly introducing "Rich Snippets", search results based on Microdata and RDFa.

I believe this latest algorithm change will bring into the Google mix more emphasis on RDF/Microdata structured front-end markup. However, at the end of the day, contextually relevant and GOOD content will ultimately trump SEO. Google's introductions of newer algorithms are always an attempt to stem the tide of cacophonous content.

Apache mod_rewrite and RewriteRules are incredibly important as websites get increasingly competitive for placement in the SERPs. In most instances, if you're using a framework, there is a strong probability that URL rewriting is already included in the underlying platform and it'll just be a case of invoking it. This post simply gives you an introduction to the concept and some resources for further reading.

Essentially it boils down to three things:

  1. Write some server-side script to transform your "cruft(ed)" URLs into something more meaningful, e.g. transform catalog.php?catID=2_4 into something like catalog/kitchen/kitchen-cabinets/. One way you could do this is to have an SEO "slug" for each category in your database. In this example cat id: 2 has slug: "kitchen" and cat id: 4 has slug: "kitchen-cabinets". One thing I will stress is that it's important to give some serious consideration to your "cruft-free" URL pattern structure. For example, in MVC architecture it's generally structured as controller/action/var1/var2/var3 etc... The reason for designing a solid URL pattern is that it will greatly simplify your RewriteRules!
  2. Now! Hands up, I'll admit I'm not the greatest when it comes to Regular Expressions and pattern matching. For me it's a combination of (limited) knowledge, Google and trial & error! The next step is to add some RewriteRules to your Apache config, usually in an .htaccess file but, in rare instances, they can also be placed in a .conf file.

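    # Enable the rewrite engine and resolve relative targets from the site root.
    RewriteEngine on
    RewriteBase /

    # Permanently (301) redirect hosts without a leading "www." to the www host.
    RewriteCond %{HTTP_HOST} !^www\.
    RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

    # Skip static assets, then map catalog/<slugs>/ onto catalog.php.
    # A RewriteCond only applies to the RewriteRule that immediately follows it.
    RewriteCond %{REQUEST_URI} !\.(?:css|png|jpe?g|gif)$ [NC]
    RewriteRule ^catalog/(.*)/$ catalog.php?catID=$1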

    This is where it gets a little involved and you'll spend a few hours wading through Apache Docs and Google trying to perfect your .htaccess RewriteRules!

    1. First, turn RewriteEngine on. This assumes that mod_rewrite is available; for most setups it is, but you could wrap your directives in an <IfModule mod_rewrite.c> block to be on the safe side.
    2. The RewriteBase directive, although not always necessary, sets the base URL path for relative substitutions in per-directory (.htaccess) rules. It works somewhat like the <base> tag in HTML.
    3. The RewriteCond directive, again not always required, specifies a condition that must hold for the RewriteRule that immediately follows it to be applied.
    4. Lastly, the RewriteRule itself. This is supplied in the format: RewriteRule [pattern] [target] [flags]. The [pattern] component may take some tuning if, like me, you're not too great with RegEx. You'll find tons of pointers via Google, but this page in the Apache docs goes over mod_rewrite in a lot more detail, especially its Regular Expressions.

      Both RewriteRule and RewriteCond also support the use of "flags" to modify their behavior. For example [NC] = nocase. Some of the other more common flags are [QSA] = qsappend (Query String Append), [L] = last, and [R] = redirect with a valid HTTP status code. For a complete list of flags, go here.

  3. Lastly, a point of note. Without a RewriteCond, your rules will be applied to every request whose URL matches the pattern, including requests for CSS/JS/images etc... This may have very undesirable results if those have been referenced with relative paths in your HTML markup, because the browser resolves relative paths against the rewritten URL (e.g. catalog/kitchen/) rather than the real script location. Two simple fixes: make all those URLs absolute or, if like me you don't like this idea, simply add a <base> HTML tag to your pages.
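
Putting the three points above together, a minimal sketch of an .htaccess file might look something like the following. It is only an illustration under a few assumptions: that the site lives at the document root, that catalog.php is willing to receive the slug path in its catID parameter and resolve it itself, and that static assets exist as real files on disk (so the -f and -d tests can be used in place of a list of file extensions).

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /

    # Skip requests that map to a real file or directory (CSS, JS, images),
    # so static assets are never handed to catalog.php.
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d

    # catalog/kitchen/kitchen-cabinets/ -> catalog.php?catID=kitchen/kitchen-cabinets
    # QSA keeps any existing query string; L makes this the last rule applied in this pass.
    RewriteRule ^catalog/(.+)/$ catalog.php?catID=$1 [QSA,L]
</IfModule>
```

With rules like these, a request for catalog/kitchen/kitchen-cabinets/?page=2 should reach catalog.php with both catID and page available, and the script is still responsible for turning the slugs back into category ids.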


This is an interesting topic of research. As I dive into building more and more rich Web2.0 applications, inevitably relying a lot on AJAX, I've been presented with some considerations.

Can Web2.0 actually harm SEO? Does it really provide a better user experience? I doubt many people would contest that "Web2.0" is the latest 'cool' factor on the web, but after coming across the article "Why Web2.0 Developers are Search Morons" I took mild offense and began to take some time to understand the arguments SEO gurus and UI folks are presenting to developers.

Often these days it feels that my job responsibilities have changed. I am no longer a developer, but a mediator!

Web designers clash with UI folks over more and more outrageous presentations, UI folks continually pontificate about what makes for a better user experience, and SEO gurus contest the effectiveness of those wonderful web designs, and thus begins a circle of discussion. It really is a balancing act to take all of this input and produce something that everyone is happy with, most importantly the user!

A few years ago developers were the sole oracle for building websites but, personally, I think this came with its own set of problems. We're technically inclined, we like gadgets, and we like it when we can do something asynchronously. Sometimes our applications were built more as an exercise in technical expertise and personal challenge than for usability. Then along came Web2.0 and a lot of things changed.

As a developer, though, Web2.0 is cool! An ever-increasing plethora of JavaScript frameworks and, more importantly, browsers that support them has given us even MORE gadgets to play with. It is only in the last few years that JavaScript has been truly supported cross-browser, so a lot of today's client-side scripting would not even have been considered before then. Today, though, W3C standards are driving the internet in a common direction and helping accelerate more and more elaborate user experiences driven by standardized technologies.

Back to the original topic, though: Web2.0, and more specifically AJAX and SEO. Whilst search engine technology is working towards being able to index JavaScript content, it's still in its infancy. Developers need to be sympathetic to indexing technology that looks at more 'static' content. Much as Flash has to deal with SEO issues, so should we, by providing alternate SEO-friendly content where needed and by providing useful feedback and meaningful ways for users to back out of issues when AJAX.onreadystate != 4 (i.e. when a request does not complete).

These are just a few articles I've read debating various related topics and are well worth the read.