Character Encodings with Tridion

This is probably too big a subject to tackle in a single blog post, but I wanted to suggest the following if you ever run into a "strange character" or "my language looks wrong" problem with Tridion or any other content management system.
The correct Pavlovian response to character issues is to diligently search for any mismatched character encoding settings.
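
To make the symptom concrete, here is a small Python sketch. It is not Tridion-specific, and the sample word and the choice of Windows-1252 as the "wrong" encoding are assumptions for illustration. It shows how text written as UTF-8 turns into strange characters when something downstream reads it with a different encoding.

```python
# Text is written out as UTF-8 bytes somewhere in the chain...
text = "café"
utf8_bytes = text.encode("utf-8")

# ...and read back with a mismatched encoding (Windows-1252 here, as an example).
# The two bytes that make up "é" in UTF-8 are shown as two separate characters.
garbled = utf8_bytes.decode("cp1252")

# Reading the same bytes with the encoding they were written in is lossless.
restored = utf8_bytes.decode("utf-8")

print(garbled)   # cafÃ©
print(restored)  # café
```

If you see pairs like "Ã©" where a single accented character should be, a UTF-8 versus single-byte mismatch somewhere between authoring and presentation is a likely suspect.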

Troubleshooting Encoding Issues

Start with the source content (database, Word, text file) and work your way through the Content Management Explorer (CME), including schema filters and templates, then through the deployer, databases, presentation servers, and all the settings in between. Both Java and ASP.NET have global and page-level settings to confirm as well.

As a general guide to finding all these settings, search online for "{technology} + character encoding" to learn where to check in your technology stack. Then check database settings with your DBA, web server settings with your development or design team, and Tridion settings with your implementation partner, developers, or support. Better yet, follow your content to determine where the issue happens and narrow down this long chain from authoring to presentation!

In Tridion

Specifically within or related to Tridion, you can check the publication settings, template behavior, and configuration to confirm encoding settings.

Publication

An easy spot to check in the Tridion CME is the Publication Target settings for the default code page, which can be set to "Unicode (UTF-8)" to match other UTF-8 settings.

Templates

Templates wouldn't have an encoding setting themselves, but may either add the (in)appropriate setting as text in a file header (HTML or XML) or possibly change characters based on your templating logic.

Hint: if you're manipulating characters in Tridion template building blocks or component templates, it's possible you're working around a problem with an encoding setting. Consider simplifying your templates by updating your character encoding, possibly on your presentation server (thanks to Jeremy for the tip).

Config Settings

Look for settings that set an encoding value in the XML config files (for Tridion or any other system that handles strings). If you have a log search and analysis tool like Splunk, use it to find them. Otherwise, search the file system for "encoding" in *.xml and *.config files to confirm the settings.
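
The file system search described above could be scripted roughly as follows. This is a minimal sketch, not a Tridion tool: the function name `find_encoding_settings` and the regular expression are my own assumptions, and the pattern is a rough heuristic that matches any attribute whose name ends in "encoding" (so it would also catch names such as `requestEncoding`).

```python
import re
from pathlib import Path

# Heuristic: any attribute ending in "encoding" followed by a quoted value.
ENCODING_PATTERN = re.compile(r"encoding\s*=\s*[\"']([^\"']+)[\"']", re.IGNORECASE)


def find_encoding_settings(root, suffixes=(".xml", ".config")):
    """Yield (path, line_number, value) for each encoding="..." found under root."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # Latin-1 maps every byte to a character, so no file fails to decode.
        # Files saved as UTF-16 will not match and need separate handling.
        text = path.read_text(encoding="latin-1")
        for line_number, line in enumerate(text.splitlines(), start=1):
            for match in ENCODING_PATTERN.finditer(line):
                yield path, line_number, match.group(1)
```

Looping over `find_encoding_settings("some/config/folder")` and printing each result gives a quick inventory of declared encodings, which makes the odd one out easy to spot.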

When in doubt, stick with UTF-8 as a default "good practice" choice to avoid issues later! If you don't believe me, consider all the examples from Nuno Linhares (hint: it's always UTF-8). Also, Joel Spolsky of "Joel on Software" fame lays out a great primer in his article on Character Encodings.

One final tip: the encoding a file declares and the encoding it was actually saved in are two separate things, and they can disagree.
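
One way to test that final tip is to compare the encoding an XML-style file declares with its raw bytes. The sketch below is only an illustration: `check_declared_encoding` is a hypothetical helper name, and it only looks for an `encoding="..."` declaration near the start of the file. Note the limits of the check: a failed decode proves a mismatch, but a successful decode does not prove the declaration is right, because some encodings accept any byte sequence.

```python
import re

# Matches encoding="..." as found in an XML declaration.
DECLARATION = re.compile(rb"encoding\s*=\s*[\"']([A-Za-z0-9._-]+)[\"']")


def check_declared_encoding(data: bytes):
    """Return (declared, decodes_ok) for the raw bytes of a file.

    Both values are None when no declaration is found in the first 200 bytes.
    """
    match = DECLARATION.search(data[:200])
    if match is None:
        return None, None
    declared = match.group(1).decode("ascii")
    try:
        data.decode(declared)
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: the bytes are not valid in the declared encoding.
        # LookupError: Python does not recognize the declared encoding name.
        return declared, False
    return declared, True


xml = '<?xml version="1.0" encoding="UTF-8"?><p>café</p>'
print(check_declared_encoding(xml.encode("utf-8")))   # ('UTF-8', True)
print(check_declared_encoding(xml.encode("cp1252")))  # ('UTF-8', False)
```

The second call mimics a file that says UTF-8 in its header but was saved by an editor using Windows-1252.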

Quick Checklist Summary
(this can equally apply to general templating troubleshooting)
  • Follow the content
    • Double check what's in the component
    • Preview your item with its template
    • Preview the page
    • Publish the item and check on the site
    • (compare to other content, templates, or CME environments)
  • Check settings:
    • Publication Target (default code page)
    • Templates (what encoding do they set and what characters do they replace, if any)
    • Configuration settings
  • Get help
    • Devs and Community
    • DBA
    • Partner or Support

Happy troubleshooting, and yes, it is hard to find all the spots where character encoding can go wrong in distributed, multi-tiered systems. This is especially true if you've mainly worked in one part of Web or system development. It'll come down to one or two seemingly simple settings, but text is everywhere.

This is far from covering all the details. Do you have a character encoding horror story or a favorite hidden setting to share?

2 comments:

  1. It seems like later versions of .NET might not use the page setting. So double check your specific setup and good luck finding those pesky settings!

  2. Under "Templating" you say that the templates themselves won't have an encoding setting. In fact, you can change the encoding at run time from a template. I've done this myself in scenarios where I didn't control the publication targets, so couldn't update the default setting. In principle it's possible for a template to dynamically switch encodings in mid-flight, but I've never seen a use-case for this IRL.
    The main gotcha I'd point out is that even on the front-end, the page can be composed of input from various sources. I once had a bug where we were injecting text into a UTF8 page from javascript, and the javascript files were published in a different encoding. Also watch out for bytes that come from web-services, such as search results that you are going to present in your page. If the encoding you receive the data in isn't the same as the rest of the page, you've got work to do.
