Managing Robots META Tags with Tridion or any CMS*

*"Any CMS" refers to any system that lets you configure your own authorable fields and render them as you prefer, in this case as <META> tags with the name attribute set to "ROBOTS."

I often see a requirement that has likely made its way from an original Request for Proposal (RFP) into a CMS Functional Design and reads something like: "authors should have the ability to set page-level meta tags including the Robots meta tag."

This meta tag "tells" search engines how to index content on a given Web page and its links.*

Valid CONTENT values for <META NAME="ROBOTS"> include:
  • INDEX
  • NOINDEX
  • FOLLOW
  • NOFOLLOW
And if absent, the default is INDEX,FOLLOW. See the Web Robots Pages for the details.

*Bots aren't forced to actually listen to these instructions, though I'd say most search engines try to play nice. 

Everything's an Attribute Value (Too Simple and Error Prone)

The simplistic way to implement this might be CMS check boxes for each of the above so you'd have (where "[ ]" means checkbox):
  • [ ] INDEX
  • [ ] NOINDEX
  • [ ] FOLLOW
  • [ ] NOFOLLOW
But this lets authors select contradictory options such as INDEX and NOINDEX together. Luckily I haven't seen this in practice, but it's a good example of why to avoid assumptions. We already know a few things:
  • The Robots tags are at the page level
  • Only some combinations of options are valid
  • We have a good default (no tag at all or "INDEX,FOLLOW" as described above)
  • The values need no translation (though you might translate the internal authoring field labels if needed)
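
To make the contradiction problem concrete, here's a minimal Python sketch of the check you'd need somewhere (template or validation step) if you went with four independent check boxes. The function name and the list of pairs are my own illustration, not part of SDL Tridion or any CMS API.

```python
# Illustrative only: these names are hypothetical, not a CMS API.
# Each pair holds two values that contradict each other.
CONTRADICTORY_PAIRS = [("INDEX", "NOINDEX"), ("FOLLOW", "NOFOLLOW")]


def find_contradictions(selected):
    """Return every contradictory pair where both values were selected."""
    chosen = {value.strip().upper() for value in selected}
    return [pair for pair in CONTRADICTORY_PAIRS if chosen.issuperset(pair)]


if __name__ == "__main__":
    # An author ticked both INDEX and NOINDEX.
    print(find_contradictions(["INDEX", "NOINDEX", "FOLLOW"]))
```

Needing a check like this at all is a hint that the content model allows more than it should.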

Two Boolean Options (Okay) 

Since we know some of these are mutually exclusive (either/or) choices, you can reduce them to:
  • [ ] INDEX
  • [ ] FOLLOW
Minor note: templating code, or however you render these values, would need to translate an unselected INDEX into NOINDEX (and an unselected FOLLOW into NOFOLLOW).
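
As a rough sketch of that translation, assuming the two check boxes reach your rendering code as plain Booleans, the logic could look like the following Python. The function name `robots_meta` is hypothetical; a real Tridion template would read the values from page metadata rather than take them as arguments.

```python
def robots_meta(index: bool = True, follow: bool = True) -> str:
    """Render the Robots meta tag from two Boolean authoring fields.

    An unselected check box (False) becomes the NO- variant.
    """
    index_value = "INDEX" if index else "NOINDEX"
    follow_value = "FOLLOW" if follow else "NOFOLLOW"
    return f'<META NAME="ROBOTS" CONTENT="{index_value},{follow_value}">'


if __name__ == "__main__":
    # Author left INDEX unchecked but kept FOLLOW checked.
    print(robots_meta(index=False, follow=True))
```

Since INDEX,FOLLOW is the assumed default, you could also choose to output no tag at all when both boxes are checked.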

Don't Do This

Especially with SDL Tridion, I'd prefer the above over the following, which makes schema updates and searches for items tagged with such features harder:

Index? (Don't do this)
  • ( ) Yes
  • ( ) No

Follow? (Don't do this)
  • ( ) Yes
  • ( ) No

Practical Outputs

With Index and Follow as two Boolean options, authors have 4 possible outcomes:

  • INDEX,FOLLOW (default)
  • INDEX,NOFOLLOW
  • NOINDEX,FOLLOW
  • NOINDEX,NOFOLLOW

Focus on Behavior

Since the Web Robots Pages point out that INDEX,FOLLOW is the assumed default, a more business-friendly CMS setup could be:

"How should search engines treat this page? Index:"
  • (x) Everything (INDEX,FOLLOW) [Selected by Default]
  • ( ) Just this page (INDEX,NOFOLLOW)
  • ( ) Just links (NOINDEX,FOLLOW)
If you manage the options as configurable values (Categories and Keywords if using SDL Tridion), each selection could carry its own configurable output, which simplifies template logic, though I'd prefer not to let technical preferences dictate the content model.
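
Here's a small Python sketch of what "configurable output per selection" could mean for the template: a simple lookup with the default as a fallback. The keys below are made-up stand-ins for whatever your managed options are called; nothing here is an actual Tridion API.

```python
# Hypothetical option keys mapped to their configured CONTENT output.
ROBOTS_OUTPUT = {
    "everything": "INDEX,FOLLOW",
    "just_this_page": "INDEX,NOFOLLOW",
    "just_links": "NOINDEX,FOLLOW",
}

# The assumed default when nothing (or something unknown) is selected.
DEFAULT_OUTPUT = "INDEX,FOLLOW"


def robots_content(selection):
    """Look up the CONTENT value for an author's selection."""
    return ROBOTS_OUTPUT.get(selection, DEFAULT_OUTPUT)


if __name__ == "__main__":
    print(robots_content("just_links"))
```

With this shape the template does no translation of its own, so adding or renaming an option is a configuration change rather than a code change.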

And if you prefer the two-option setup instead, consider using SDL Tridion Experience Manager Page Types, which let you set multiple default options so authors automatically get this with new pages:
  • [x] INDEX
  • [x] FOLLOW
Luckily, the Web-savvy customers I've seen know how this set of CONTENT attribute values translates into these choices. The important thing here is that we can simplify development and authoring steps by taking a practical look at the options. From an implementation perspective, this would simply be a Practical Practice (see my rant on Best Practices) if a client ever asks, "how do I give authors the ability to set page-level meta tags such as the Robots meta tag?"

For more content modeling practice or to learn more about search engine instructions, look at the X-Robots-Tag and how Google handles it.
