14
Aug 10

Nautilus script for converting text file encoding to UTF-8 in Ubuntu

Here is a Nautilus script that detects the original character encoding of selected text files and automatically converts them to your local native encoding (which should be UTF-8 in Ubuntu).

To run the script, you need to place it in the ~/.gnome2/nautilus-scripts directory and give it execute permissions.
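
The installation step can be sketched as a small shell function. This is only a minimal sketch: the function name install_nautilus_script is hypothetical (it is not part of Nautilus), and it assumes the script below was saved to a file called "Enconv Tool".

```shell
#!/bin/sh
# Hypothetical helper: copy a script into a Nautilus scripts directory
# and make it executable.
# $1 = path to the script
# $2 = target directory (optional; defaults to ~/.gnome2/nautilus-scripts)
install_nautilus_script() {
    src=$1
    dest_dir=${2:-"$HOME/.gnome2/nautilus-scripts"}
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/" || return 1
    chmod +x "$dest_dir/$(basename "$src")"
}

# Example (assumes the script file is named "Enconv Tool"):
#   install_nautilus_script "Enconv Tool"
```

Once installed, the script should appear under the Scripts entry of the right-click menu in Nautilus.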

You also need to install the following packages:
enca
libnotify-bin

#!/bin/sh
#
# Filename: Enconv Tool
# Date: 2008/02/02 15:10:34
# Licence: GNU GPL
# Dependency: enca, libnotify-bin
# Author: Jonathan Lumb 

# Do the conversion
enconv "$@"

# Display success / error message
if [ "$?" -ne 0 ]
then
notify-send -i /usr/share/icons/Humanity/actions/48/gtk-cancel.svg "Error" "Files were not converted"
exit 1
else
notify-send -i /usr/share/icons/Humanity/actions/48/document-export.svg "Success" "Files were successfully converted"
fi
exit 0


14
Aug 10

Convert BIG5 or GB18030 (Chinese Character Encoding) to UTF-8 in Ubuntu

I watch a lot of Chinese films in my spare time and often download Chinese subtitles to facilitate viewing. The problem is that most text-based subtitle files provided on the internet are created in Windows and encoded in either GB18030 (Simplified Chinese) or BIG5 (Traditional Chinese) character sets. These files won’t display properly on Ubuntu and must be converted to UTF-8 encoding if they are to be useful.

I found that there are (at least) three ways of converting files using these character sets on Ubuntu: the iconv, recode and enconv utilities. The iconv utility preserves the original text file and creates a new output file. The recode utility overwrites the original text file with the new encoding. The enconv utility tops them both by detecting the encoding of the source file automatically and then converting it to your local native encoding (UTF-8 on Ubuntu).

Below I will demonstrate how each of these utilities can be used:

The iconv method
The iconv utility should be installed by default on Ubuntu.

In a terminal, browse to the directory where the target files are located and use either of the commands below, the former for converting from GB18030 to UTF-8 and the latter from BIG5 to UTF-8.

iconv -f gb18030 -t utf8 src_filename -o output_filename
iconv -f big5 -t utf8 src_filename -o output_filename
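
For a whole folder of subtitle files, the same iconv call can be wrapped in a small function. What follows is a minimal sketch, assuming a POSIX shell: the function name to_utf8 and the .utf8 output suffix are my own illustrative choices, not part of iconv. It writes the result through output redirection and leaves the original file untouched.

```shell
#!/bin/sh
# Hypothetical helper: convert one file from a known encoding to UTF-8.
# $1 = source encoding (e.g. GB18030 or BIG5)
# $2 = file to convert
# The original is preserved; output goes to "<file>.utf8" (assumed naming).
to_utf8() {
    from_enc=$1
    src=$2
    if iconv -f "$from_enc" -t UTF-8 "$src" > "$src.utf8"; then
        echo "converted: $src -> $src.utf8"
    else
        # Remove the partial output so a failed run leaves nothing behind.
        rm -f "$src.utf8"
        echo "failed: $src" >&2
        return 1
    fi
}

# Example: convert every .srt file in the current directory from GB18030.
#   for f in *.srt; do to_utf8 GB18030 "$f"; done
```

If iconv hits a byte sequence that is invalid for the stated source encoding it exits with an error, which is usually a sign that the file was actually in the other encoding.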

The recode method

You may need to install the recode package on your system before proceeding:

sudo apt-get install recode

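In a terminal, browse to the directory where the target files are located and use either of the commands below, the former for converting from GB18030 to UTF-8 and the latter from BIG5 to UTF-8. Note that recode converts to your locale's default encoding and overwrites the file in place, so keep a copy if you need the original.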

recode GB18030 src_filename
recode BIG5 src_filename

The enconv method

You will probably need to install the enca package on your system before proceeding:

sudo apt-get install enca

In a terminal, browse to the directory where the target files are located and simply enter the following command:

enconv src_filename

The utility should automatically detect the encoding of the source file and convert the file to an Ubuntu compatible UTF-8 version!

Summary

The recode and iconv utilities are pretty powerful and handy if you know the original encoding of your source file. However, enconv seems to be the pick of the bunch as it is a cinch to use and will automatically detect the encoding of most text files for you.


08
Aug 10

CSS for displaying code snippets on a WordPress blog

I recently updated the theme I use on this blog (Cleanr) to the latest version via the WordPress Dashboard and discovered that code snippets were no longer displaying correctly – overflowing into the sidebar.

After comparing the old and new style.css files for this theme, I noticed that the following line was missing from the new one:

pre {display: block; overflow: auto; background: #f3f3f3; padding: 5px; margin: 20px 0; font-family: monospace; border: 1px solid #dadada;}

I enclose all of my code snippets in HTML “pre” tags. The above code (written by myself) wraps them in a nice little box on the page and uses a horizontal scrollbar to deal with the overflow. It seems I had made changes to the original Cleanr theme but completely forgot to reapply them to the new stylesheet after performing the update.

Hopefully everything should look a little nicer now :-)


14
Jul 10

Regex for matching URLs blocked by GFW

The following regular expression can be used in browser plugins such as FoxyProxy and Proxy Switchy to match URLs of websites that are inaccessible in China. URLs that are matched can then be automatically redirected through a proxy.

https?://([^.]+\.)*(twitter|wordpress|blogspot|flickr|blogger|feedburner|youtube|dailymotion|bit)\.(com|net|ly)/.*
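
To check the pattern against a few URLs before loading it into a proxy plugin, it can be tried out with grep. This is a rough sketch, assuming grep's extended regex mode (-E) interprets the expression the same way the plugin's regex engine does; is_blocked is a hypothetical helper name.

```shell
#!/bin/sh
# The pattern from the post, unchanged.
GFW_PATTERN='https?://([^.]+\.)*(twitter|wordpress|blogspot|flickr|blogger|feedburner|youtube|dailymotion|bit)\.(com|net|ly)/.*'

# Hypothetical helper: succeeds (exit status 0) if the URL matches the pattern.
is_blocked() {
    printf '%s\n' "$1" | grep -Eq "$GFW_PATTERN"
}

# Example:
#   is_blocked "http://www.youtube.com/watch" && echo "use proxy"
```

Note that the pattern requires a slash after the domain, so a bare http://twitter.com with no trailing slash would not match.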