Posted on by

Getting Correct Search Results with Non-English Characters

Participants Database is set up to be compatible with non-English installs using the normal localization setup provided by WordPress: download the language pack, choose your language in the settings, and you’re set. Participants Database will follow along, as long as it has a translation file that matches the chosen language. This is discussed in more detail here: Localizing Participants Database.

Searching with Non-English Characters

When it comes to searching Participants Database records, non-English characters can be a problem. This is due to the database “collation” which is a character map that the database uses to understand how to deal with all the characters that can be stored. By default, Participants Database sets up it’s database to use “utf8_unicode_ci” as it’s collation. This is the generic UTF-8 collation used by mySQL, and it works really well for English. (If your site was set up to use a different collation before Participants Database was installed, it will probably use the collation for that language, because it gets the collation to use from the global WP settings.)

If the plugin search functions can’t tell the difference between ö and o then it is using the generic UTF-8 collation. To fix this, you’ll need to change the collation. Why do this as the default? The reason is simple: in English, these characters are not treated differently, and are often used interchangeably, so it makes sense that an English-based database would treat them as equivalent.

Changing the Database Collation

In order to set up the database so that it knows now to treat non-English characters, you’ll need to change the collation. Each field (known as a “column” in database-talk) will have it’s own collation. It is the field’s collation that you need to change.

To do this, you need to use a database tool such as phpMyAdmin. This is typically provided by your web host and will be found in the web hosting control panel.

Open phpMyAdmin, and find the current database for your site. This is the database you used when configuring your WP install. In that database, you’ll find a table named ‘wp_participants_database’ (the ‘wp_’ part may be different on your install). Click on this table to select it, then click on the “structure” tab so see the configuration of all the fields.

This is the structure panel for the main data table in Participants Database.

On the right, you’ll see the “change” link: click that then you can set the collation for that field. Typically, the ‘utf8_bin’ collation will work for most cases. There are also specific collations for many languages, and if your language is one of them, it may be best to select that.

You only need to change the fields that will contain the special characters.

After you’ve changed the collation, it will look something like this:

Now that this is set up, your searches will strictly match characters such as ö or å, and if you use the specific language collation, it will also alphabetize (sort) based on the correct position of these characters.

Problems with CSV Imports and Special Characters

It sometimes happens that when a CSV is imported to Participants Database, the special characters are replaced with incorrect characters. This is due to the CSV not being encoded as UTF-8. For many spreadsheet applications, and even text editors, saving a file as UTF-8 encoding requires and extra step to set up. If you don’t know how to do that, you may have to get your questions answered in the appropriate forum, but it is very important your CSV use UTF-8 encoding in order to be correctly imported to the database.

Leave a Reply

Your email address will not be published. Required fields are marked *

Would you like to be notified of followup comments via e-mail? You can also subscribe without commenting.

Leave a Reply

Your email address will not be published. Required fields are marked *

Would you like to be notified of followup comments via e-mail? You can also subscribe without commenting.