skip to content

DerekAllard.com : CodeIgniter, ExpressionEngine, and the World of Web Design

Changing MySQL default character sets to UTF-8

July 16th, 2007

With BambooInvoice 0.8 about to be released, and the big new feature being internationalization, this has brought to light a few issues with character sets, particularly as they apply to the database.  It seems that MySQL, in its default configuration, and especially with popular installers such as XAMPP, MAMP and WinLamp, usually sets the default character set as “latin-1”.  For English, this is a complete non-issue, and things work as they should, however if you try to set other languages, characters such as ö, ä, and ï (as well as dozens of others) tend to load to the screen as garbage.  This is often compounded by the fact that the webpage itself might be in a character set such as UTF-8, which is very “international friendly”, and so things appear to work on the “front end” but as soon as data goes into a database and back out… its borked.

The solution of course is to simply use UTF-8 as the character set for your database.  The problem though, is that many of us have legacy data in MySQL, and as mentioned above, the data started its life as latin-1.  This is particularly true of BambooInvoice, as the installer I set up did not specificy any character set, so nearly everyone’s databases (my own included) is in latin-1.

I’ve found 2 good ways of converting an existing database from latin-1 to UTF-8.  Here’s how you can convert your data:

Method One: use SQL to change every table and field

ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;

Method Two: Export then Import

The problem with method 1, is that not everyone is able to run “ALTER” commands, or has access to an interface that would allow it.  In these cases, you’ll need to export your database,

  • Log into phpmyadmin.
  • Back up the database via a phpmyadmin export - SQL.  Select the option that will DROP a table if it already exists.
    PHPmyAdmin with Drop tables selected
  • Duplicate that file, and rename “database_new”
  • Save this file, and use phpmyadmin to import it.
  • You’ll now have 2 SQL files, a backup, and a “_new” file for our use. Open up your “new” file in a text editor.
  • At the end of every “CREATE TABLE” command, you’ll see CHARSET=latin1 (or something similar); Do a search and replace and change “CHARSET=latin1” to “CHARSET=utf8”.
  • Save this file, and use phpmyadmin to import it.

What you’ll have done is changed the collation of every table from what it currently is set to ("latin1_swedish_ci") to “utf8_general_ci”, but we’ll have done it by literally removing then replacing each table.  If you character set isn’t “latin1_swedish_ci”, don’t worry about it, just replace whatever you have.

For a great resource check out Turning MySQL data in latin1 to utf8 utf-8

This entry was made on July 16th, 2007 @ 13:11 and filed into , .

Comments

James wrote on July 16th, 2007 @ 15:08

I tend to find that most mySql gui’s offer the ability to easily change the character encoding on both fields and tables.  CocoaMySql and SqlYog both offer this and it makes life a lot easier than doing it via the commandline or a export/import.

Matthew Pennell wrote on July 17th, 2007 @ 2:29

I’ve had this problem recently while working on a (CI) app that handles international company details. One other thing that you need to watch is processing data (my app processes an XML feed), where you need to set the content-type in the header at every point in the process to make sure that a latin-1 or iso-8859-1 doesn’t creep in there. :)

passingby wrote on August 01st, 2007 @ 3:12

nice article.  you tell how to set the charset but how do you display the current character set on a mysql db to see what is currently being used?

describe [tablename] does not reveal the info

thanks

Post a Comment

Sorry, comments are automatically closed after 45 days, or sooner if one entry gets targetted by spammers. Why not contact me directly?