DerekAllard.com

Changing MySQL default character sets to UTF-8

With BambooInvoice 0.8 about to be released, and the big new feature being internationalization, this has brought to light a few issues with character sets, particularly as they apply to the database.  It seems that MySQL, in its default configuration, and especially with popular installers such as XAMPP, MAMP and WinLamp, usually sets the default character set as “latin-1”.  For English, this is a complete non-issue, and things work as they should, however if you try to set other languages, characters such as ö, ä, and ï (as well as dozens of others) tend to load to the screen as garbage.  This is often compounded by the fact that the webpage itself might be in a character set such as UTF-8, which is very “international friendly”, and so things appear to work on the “front end” but as soon as data goes into a database and back out… its borked.

The solution of course is to simply use UTF-8 as the character set for your database.  The problem though, is that many of us have legacy data in MySQL, and as mentioned above, the data started its life as latin-1.  This is particularly true of BambooInvoice, as the installer I set up did not specificy any character set, so nearly everyone’s databases (my own included) is in latin-1.

I’ve found 2 good ways of converting an existing database from latin-1 to UTF-8.  Here’s how you can convert your data:

Method One: use SQL to change every table and field

ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name

Method Two: Export then Import

The problem with method 1, is that not everyone is able to run “ALTER” commands, or has access to an interface that would allow it.  In these cases, you’ll need to export your database,

What you’ll have done is changed the collation of every table from what it currently is set to (“latin1_swedish_ci”) to “utf8_general_ci”, but we’ll have done it by literally removing then replacing each table.  If you character set isn’t “latin1_swedish_ci”, don’t worry about it, just replace whatever you have.

For a great resource check out Turning MySQL data in latin1 to utf8 utf-8

Comments

James wrote on

I tend to find that most mySql gui’s offer the ability to easily change the character encoding on both fields and tables.  CocoaMySql and SqlYog both offer this and it makes life a lot easier than doing it via the commandline or a export/import.

Matthew Pennell wrote on

I’ve had this problem recently while working on a (CI) app that handles international company details. One other thing that you need to watch is processing data (my app processes an XML feed), where you need to set the content-type in the header at every point in the process to make sure that a latin-1 or iso-8859-1 doesn’t creep in there. :)

passingby wrote on

nice article.  you tell how to set the charset but how do you display the current character set on a mysql db to see what is currently being used?

describe [tablename] does not reveal the info

thanks