I watch a lot of Chinese films in my spare time and often download Chinese subtitles to facilitate viewing. The problem is that most text-based subtitle files provided on the internet are created in Windows and encoded in either GB18030 (Simplified Chinese) or BIG5 (Traditional Chinese) character sets. These files won’t display properly on Ubuntu and must be converted to UTF-8 encoding if they are to be useful.
I found that there are (at least) three ways of converting files in these character sets on Ubuntu: the iconv, recode and enconv utilities. The iconv utility preserves the original text file and creates a new output file, whereas the recode utility overwrites the original text file with the new encoding. The enconv utility tops them both by detecting the encoding of the source file automatically and then converting it to your locale's native encoding (UTF-8 on Ubuntu).
Below I will demonstrate how each of these utilities can be used:
The iconv method
The iconv utility should be installed by default on Ubuntu.
In a terminal, browse to the directory where the target files are located and use either of the commands below, the former for converting from GB18030 to UTF-8 and the latter from BIG5 to UTF-8.
iconv -f gb18030 -t utf8 src_filename -o output_filename
iconv -f big5 -t utf8 src_filename -o output_filename
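If a film comes with several subtitle files, the same iconv call can be wrapped in a loop. The following is only a minimal sketch: the function name to_utf8 and the utf8 output sub-directory are my own assumptions, not part of iconv. It writes each converted copy under the original file name into a utf8 folder next to the source and leaves the originals untouched.

```shell
# to_utf8 is a made-up helper name for this sketch.
# Usage: to_utf8 ENCODING FILE...
to_utf8() {
    enc=$1
    shift
    for src in "$@"; do
        # Converted copies go into a "utf8" sub-directory (an assumed layout).
        outdir="$(dirname "$src")/utf8"
        mkdir -p "$outdir"
        out="$outdir/$(basename "$src")"
        # Redirect stdout rather than relying on -o, which not every iconv has.
        if iconv -f "$enc" -t UTF-8 "$src" > "$out"; then
            echo "converted: $src -> $out"
        else
            echo "failed: $src" >&2
            rm -f "$out"
        fi
    done
}

# Example: convert every .srt file in the current directory from GB18030
# to_utf8 GB18030 *.srt
```

For Traditional Chinese files, pass BIG5 as the first argument instead.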
The recode method
You may need to install the recode package on your system before proceeding:
sudo apt-get install recode
In a terminal, browse to the directory where the target files are located and use either of the commands below, the former for converting from GB18030 to UTF-8 and the latter from BIG5 to UTF-8.
recode GB18030..UTF-8 src_filename
recode BIG5..UTF-8 src_filename
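Because recode rewrites the file in place, a wrong guess at the source encoding can leave you with a garbled file and no original to go back to. One cautious approach is to copy the file first. This is a rough sketch: the helper name recode_keep_backup and the .bak suffix are my own choices, and the target is spelled out with recode's BEFORE..AFTER request form so that the result does not depend on the locale.

```shell
# recode_keep_backup is a made-up helper name for this sketch.
# Usage: recode_keep_backup ENCODING FILE...
recode_keep_backup() {
    enc=$1
    shift
    for src in "$@"; do
        # Keep a copy first, because recode overwrites the original file.
        cp -p "$src" "$src.bak" || return 1
        if recode "${enc}..UTF-8" "$src"; then
            echo "recoded: $src (backup in $src.bak)"
        else
            # Put the original back if the conversion failed.
            echo "failed: $src, restoring backup" >&2
            mv -f "$src.bak" "$src"
        fi
    done
}

# Example: recode_keep_backup BIG5 movie.srt
```

Once the subtitles display correctly, the .bak files can be deleted.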
The enconv method
You will probably need to install the enca package on your system before proceeding:
sudo apt-get install enca
In a terminal, browse to the directory where the target files are located and simply enter the following command:
enconv src_filename
The utility should automatically detect the encoding of the source file and convert the file to an Ubuntu compatible UTF-8 version!
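Whichever method you use, it can be reassuring to check that the result really is UTF-8 before loading it in a player. A simple sketch, assuming only that iconv is available: ask iconv to read the file as UTF-8 and see whether it complains. The function name is_utf8 is hypothetical.

```shell
# is_utf8 is a made-up helper name for this sketch.
# Succeeds only if the file decodes cleanly as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example:
# if is_utf8 movie.srt; then
#     echo "movie.srt is valid UTF-8"
# else
#     echo "movie.srt still needs converting"
# fi
```

Note that this only catches files that were never converted. A file converted from the wrong source encoding can still be valid UTF-8 while showing the wrong characters, so a quick look at the text remains worthwhile.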
Summary
The recode and iconv utilities are pretty powerful and handy if you know the original encoding of your source file. However, enconv seems to be the pick of the bunch, as it is a cinch to use and will automatically detect the encoding of most text files for you.
