Content indexing and cultures
Sense/Net ECMS is enterprise software that supports Localization: users who speak different languages can access the site in their preferred language. Creating and searching for content in a multilingual environment can be confusing, however, so this article explains how Sense/Net ECMS handles localized content and characters in different languages.
You may create content (e.g. memos, articles, documents) containing characters in any language. We save all text data into the database in Unicode format, which preserves the special characters of every language.
We store datetime field values in the database as UTC. The UI layer is responsible for converting local values to UTC when content is saved, and for displaying values to the user in local time. The built-in controls and user interfaces already handle this: for example, when you open the edit page of a task and fill in the deadline field using the datetime picker, you set the value in your local time. The control then converts it to UTC, and that is the value saved into the database. This means that when somebody in a different time zone opens the same task, she may see a different time value based on her time zone, but it will still be the same point in time.
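The following C# sketch illustrates the round trip described above. It uses DateTimeOffset values with fixed UTC offsets, an assumption made so the example is deterministic; real controls would rely on the user's time zone settings. The helpers ToUtcForSaving and ToLocalForDisplay are hypothetical and not part of the Sense/Net API.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: the UTC value a UI control would save after the user
// picks a local time in a datetime picker.
static DateTime ToUtcForSaving(DateTimeOffset localValue) => localValue.UtcDateTime;

// Hypothetical helper: the value a UI would display to a user whose offset from UTC is `offset`.
static DateTimeOffset ToLocalForDisplay(DateTime utcValue, TimeSpan offset) =>
    new DateTimeOffset(DateTime.SpecifyKind(utcValue, DateTimeKind.Utc)).ToOffset(offset);

// A user in UTC+01:00 sets a deadline of 15:00 local time.
var picked = new DateTimeOffset(2024, 3, 1, 15, 0, 0, TimeSpan.FromHours(1));
DateTime saved = ToUtcForSaving(picked); // this UTC value goes into the database

// A colleague in UTC-05:00 opens the same task.
DateTimeOffset seenByColleague = ToLocalForDisplay(saved, TimeSpan.FromHours(-5));

Console.WriteLine(saved.ToString("yyyy-MM-dd HH:mm", CultureInfo.InvariantCulture));               // 2024-03-01 14:00
Console.WriteLine(seenByColleague.ToString("yyyy-MM-dd HH:mm zzz", CultureInfo.InvariantCulture)); // 2024-03-01 09:00 -05:00
Console.WriteLine(picked == seenByColleague); // True: different local values, same point in time
```

DateTimeOffset equality compares the underlying UTC instants, which is exactly the "same point in time" guarantee the paragraph describes.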
Sense/Net ECMS builds heavily on indexing. Not only does Field Indexing let you search freely for content in the Content Repository, but the whole content tree structure is indexed as well. This allows us to resolve parent/child relationships based on the paths of content items. When a content item is indexed, all text data is converted to lower case, including the path. This makes finding content easier and more reliable: as far as the index is concerned, there is no difference between 'lemon cake' and 'Lemon Cake'. In some languages (like Turkish) the way this conversion is done matters; see the explanation below.
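A minimal sketch of index-time normalization, assuming a simple "lowercase with the invariant culture" rule. NormalizeForIndex and IsUnder are hypothetical helpers, not Sense/Net's actual indexing code, and the paths are only illustrative. The point is that once paths are lowercased, a path-based parent/child lookup no longer depends on how the path was typed.

```csharp
using System;

// Hypothetical index-time normalizer: every text value, including the path,
// is lowercased with the invariant culture before it is stored in the index.
static string NormalizeForIndex(string value) => value.ToLowerInvariant();

// Hypothetical path check on normalized paths: an item lies under `parentPath`
// (child, grandchild, ...) if its path starts with the parent path plus '/'.
static bool IsUnder(string path, string parentPath) =>
    NormalizeForIndex(path).StartsWith(NormalizeForIndex(parentPath) + "/", StringComparison.Ordinal);

Console.WriteLine(NormalizeForIndex("Lemon Cake") == NormalizeForIndex("lemon cake")); // True
Console.WriteLine(IsUnder("/Root/Sites/Default_Site/Recipes/Lemon Cake", "/root/sites/default_site/recipes")); // True
Console.WriteLine(IsUnder("/Root/Sites/Default_Site/RecipesArchive/Old Cake", "/Root/Sites/Default_Site/Recipes")); // False
```

Appending the '/' separator before the prefix test keeps a sibling such as RecipesArchive from being mistaken for a child of Recipes.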
When you want to look for content in Sense/Net ECMS, you use Content Query. We also use it under the hood for system operations such as loading the children of a content item. Whenever we execute a content query, we convert the search field values to lower case as well, so queries that differ only in letter case are equivalent and return the same result set. This is true for any field type, e.g. long text fields; the following two queries return the same result:
+Description:"Hello W*"
+Description:"hello w*"
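The query side can be sketched in the same way. This simplified in-memory example is not the real Lucene-based query engine; the list standing in for the index and the Search helper are hypothetical. Both the indexed values and the search term are lowercased with the invariant culture, so the two Description queries above return the same items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical index: Description values lowercased at index time.
var indexedDescriptions = new List<string> { "Hello World", "hello Wiki", "Goodbye World" }
    .Select(d => d.ToLowerInvariant())
    .ToList();

// Hypothetical query-time step: lowercase the search value with the invariant
// culture, then treat a trailing '*' as a prefix wildcard.
static List<string> Search(IEnumerable<string> index, string term)
{
    string normalized = term.ToLowerInvariant();
    bool isPrefix = normalized.EndsWith('*');
    string core = isPrefix ? normalized.TrimEnd('*') : normalized;
    return index
        .Where(v => isPrefix ? v.StartsWith(core, StringComparison.Ordinal) : v == core)
        .ToList();
}

var upper = Search(indexedDescriptions, "Hello W*");
var lower = Search(indexedDescriptions, "hello w*");
Console.WriteLine(string.Join(", ", upper)); // hello world, hello wiki
Console.WriteLine(upper.SequenceEqual(lower)); // True
```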
These lower case conversions are always done using the invariant culture. This matters for some languages, for example Turkish, where the lower case form of the upper case 'I' character is the dotless 'ı' rather than the 'i' used in most other languages. Developers may check the following article for details:
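The following sketch shows the difference in .NET. The invariant result is the same on every machine, while the Turkish result depends on the culture data available (for example, .NET running in globalization-invariant mode may not provide it), which is why that part is wrapped in a try/catch.

```csharp
using System;
using System.Globalization;

string word = "TITLE";

// Invariant culture: the result is the same on every machine.
string invariant = word.ToLowerInvariant();
Console.WriteLine(invariant); // title

// Culture-aware lowercasing with Turkish rules. Culture data may be missing
// (e.g. globalization-invariant mode), so this part is guarded.
try
{
    var turkish = CultureInfo.GetCultureInfo("tr-TR");
    string turkishLower = word.ToLower(turkish);
    // With Turkish culture data available, 'I' becomes the dotless 'ı' (U+0131),
    // so this result differs from the invariant one.
    Console.WriteLine($"{turkishLower} (equals invariant: {turkishLower == invariant})");
}
catch (CultureNotFoundException)
{
    Console.WriteLine("Turkish culture data is not available on this machine.");
}
```

If index values were lowercased with the current culture instead, a server running with a Turkish culture would store "tıtle" while other servers would store "title", and the same query would match different content.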
Please note that when searching for a date or time value, you need to provide the UTC value in the content query. When building a UI with client-side technologies, you need to take care of converting datetime values to and from the local time zone yourself. For datetime queries, please check the content query syntax article.
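A minimal sketch of preparing a query value from a user's local time. The ToUtcQueryValue helper is hypothetical, and the yyyy-MM-dd HH:mm:ss format is an assumption; the content query syntax article describes the literal format the query parser actually expects.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: convert a local date-time (with its UTC offset) into the
// UTC value to use in a content query. The output format is an assumption.
static string ToUtcQueryValue(DateTimeOffset local) =>
    local.UtcDateTime.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture);

// "Modified after 1 March 2024, 08:00" for a user in UTC+02:00 ...
var localLimit = new DateTimeOffset(2024, 3, 1, 8, 0, 0, TimeSpan.FromHours(2));
Console.WriteLine(ToUtcQueryValue(localLimit)); // 2024-03-01 06:00:00

// ... and 00:30 local time in UTC+02:00 is still the previous day in UTC.
var justAfterMidnight = new DateTimeOffset(2024, 3, 1, 0, 30, 0, TimeSpan.FromHours(2));
Console.WriteLine(ToUtcQueryValue(justAfterMidnight)); // 2024-02-29 22:30:00
```

The second case shows why skipping the conversion is risky: a date-only search built from local time can silently miss or include a whole day of content.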
The general advice for culture-related code is this: when ordering and comparing strings, always use culture-independent string methods, unless you are absolutely sure that the operation needs to be culture-aware. For example:
// Culture-independent, case-insensitive comparison: the result is < 0, 0 or > 0.
int result = string.Compare(source, target, StringComparison.InvariantCultureIgnoreCase);
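To illustrate ordering as well as comparison, the sketch below sorts a few hypothetical content names with StringComparer.InvariantCultureIgnoreCase, the comparer counterpart of the StringComparison value above, so the order does not depend on the current culture of the server or the user.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var names = new List<string> { "lemon cake", "Apple Pie", "banana bread", "Lemon Cake" };

// Culture-independent, case-insensitive ordering; OrderBy is stable, so names
// that compare equal keep their original relative order.
var sorted = names.OrderBy(n => n, StringComparer.InvariantCultureIgnoreCase).ToList();
Console.WriteLine(string.Join(" | ", sorted));

// Culture-independent, case-insensitive equality check.
bool same = string.Compare("Lemon Cake", "lemon cake", StringComparison.InvariantCultureIgnoreCase) == 0;
Console.WriteLine(same); // True
```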
When you need to store converted (e.g. lower case) values, you may use the following approach instead of the culture-dependent ToLower method:
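One culture-independent option in .NET is String.ToLowerInvariant, which is equivalent to calling ToLower(CultureInfo.InvariantCulture). The sketch below stores tags under an invariant lower case key; the tag store and the NormalizeKey and AddTag helpers are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tag store, keyed by a normalized (invariant lower case) form.
var tags = new Dictionary<string, string>();

// ToLowerInvariant produces the same stored key on every server, whatever its
// current culture; plain ToLower() would use the current culture instead.
static string NormalizeKey(string tag) => tag.ToLowerInvariant();

// Keep the first spelling seen for each normalized key.
void AddTag(string tag) => tags.TryAdd(NormalizeKey(tag), tag);

AddTag("Invoice");
AddTag("INVOICE");
AddTag("Contract");

Console.WriteLine(tags.Count); // 2
Console.WriteLine(tags[NormalizeKey("invoice")]); // Invoice
Console.WriteLine(tags.ContainsKey(NormalizeKey("CONTRACT"))); // True
```

If you only need case-insensitive lookups and do not have to persist the converted value, a Dictionary created with StringComparer.InvariantCultureIgnoreCase avoids the conversion altogether.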
For details, please visit the following article: