Posts Tagged: Linux


14 Aug 10

Convert BIG5 or GB18030 (Chinese Character Encoding) to UTF-8 in Ubuntu

I watch a lot of Chinese films in my spare time and often download Chinese subtitles to facilitate viewing. The problem is that most text-based subtitle files provided on the internet are created in Windows and encoded in either GB18030 (Simplified Chinese) or BIG5 (Traditional Chinese) character sets. These files won’t display properly on Ubuntu and must be converted to UTF-8 encoding if they are to be useful.

I found that there are (at least) three ways of converting files in these character sets on Ubuntu: the iconv, recode and enconv utilities. iconv preserves the original text file and writes a new output file. recode overwrites the original text file with the new encoding. enconv tops them all off by detecting the encoding of the source file automatically and then converting it to your local native encoding (UTF-8 on Ubuntu).

Below I will demonstrate how each of these utilities can be used:

The iconv method
The iconv utility should be installed by default on Ubuntu.

In a terminal, browse to the directory where the target files are located and use either of the commands below. The first converts from GB18030 to UTF-8 and the second from BIG5 to UTF-8.

iconv -f gb18030 -t utf8 src_filename -o output_filename
iconv -f big5 -t utf8 src_filename -o output_filename
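If you have a whole folder of subtitles to convert, you can wrap the commands above in a small shell function and run it in a loop. This is a minimal sketch that assumes GNU iconv, which supports -o. The function name to_utf8 and the default ".utf8" output suffix are my own hypothetical choices and are not part of iconv.

```shell
#!/bin/sh
# to_utf8 ENCODING SRC [OUT]
# Hypothetical helper (not part of iconv): converts SRC from ENCODING,
# e.g. GB18030 or BIG5, to UTF-8 and writes the result to OUT.
# SRC is left untouched; OUT defaults to SRC.utf8.
to_utf8() {
    tu_enc=$1
    tu_src=$2
    tu_out=${3:-$tu_src.utf8}
    if iconv -f "$tu_enc" -t UTF-8 "$tu_src" -o "$tu_out"; then
        return 0
    fi
    rm -f "$tu_out"    # don't leave a partial or empty output file behind
    return 1
}
```

For example, to convert every .srt file in the current directory from GB18030:

for f in *.srt; do to_utf8 GB18030 "$f" "${f%.srt}.utf8.srt"; done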

The recode method

You may need to install the recode package on your system before proceeding:

sudo apt-get install recode

In a terminal, browse to the directory where the target files are located and use either of the commands below. The first converts from GB18030 to UTF-8 and the second from BIG5 to UTF-8. The part after the two dots names the target encoding.

recode GB18030..UTF-8 src_filename
recode BIG5..UTF-8 src_filename
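Because recode overwrites the original file, a wrong guess about the source encoding can leave you with a garbled subtitle and no way back. The sketch below gets the same in-place result with a safety net. It keeps a FILE.bak copy and only replaces the original once the conversion has succeeded. Note that it uses iconv for the conversion itself, not recode, and that convert_in_place is a hypothetical helper name.

```shell
#!/bin/sh
# convert_in_place ENCODING FILE
# Hypothetical helper: a recode-like in-place conversion with a safety net.
#   1. copies FILE to FILE.bak
#   2. converts FILE from ENCODING to UTF-8 into a temporary file (via iconv)
#   3. replaces FILE with the converted copy only if step 2 succeeded
convert_in_place() {
    cip_enc=$1
    cip_file=$2
    cip_tmp="$cip_file.tmp.$$"
    cp -p "$cip_file" "$cip_file.bak" || return 1
    if iconv -f "$cip_enc" -t UTF-8 "$cip_file" -o "$cip_tmp"; then
        mv "$cip_tmp" "$cip_file"
    else
        rm -f "$cip_tmp"    # conversion failed: FILE is left unchanged
        return 1
    fi
}
```

Usage is the same shape as recode, e.g. convert_in_place BIG5 src_filename.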

The enconv method

You will probably need to install the enca package (which provides enconv) on your system before proceeding:

sudo apt-get install enca

In a terminal, browse to the directory where the target files are located and simply enter the following command:

enconv src_filename

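The utility should automatically detect the encoding of the source file and convert the file in place to an Ubuntu-compatible UTF-8 version!

If enconv complains that it cannot determine your language (enca normally takes it from your locale), specify the language and the target encoding explicitly:

enconv -L zh_CN -x UTF-8 src_filename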
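Automatic detection can occasionally guess wrong, particularly on short files, so it's worth checking the result. A quick (if partial) sanity check is to ask iconv to decode the file as UTF-8, which fails on any invalid byte sequence. Passing only shows that the file is now valid UTF-8. It doesn't prove the original was read as the right Chinese encoding. The helper name is_utf8 is my own.

```shell
#!/bin/sh
# is_utf8 FILE
# Hypothetical helper: succeeds if FILE is valid UTF-8.
# iconv refuses to "convert" UTF-8 to UTF-8 when it hits an invalid byte
# sequence, so its exit status doubles as a validity check.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

For example, to list any subtitles in the current directory that still aren't UTF-8:

for f in *.srt; do is_utf8 "$f" || echo "not UTF-8: $f"; done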

Summary

The recode and iconv utilities are pretty powerful and handy if you know the original encoding of your source file. However, enconv seems to be the pick of the bunch, as it is a cinch to use and will automatically detect the encoding of most text files for you.


23 Jun 10

Type Chinese Pinyin Accents in Ubuntu

Chinese learners or speakers may sometimes want to write out the romanisation of certain Chinese characters complete with accents indicating the different tones. For example:

你好 [nǐhǎo]

This is possible using the IBus input framework that comes with Ubuntu. Support for pinyin romanisation with tone marks is provided by the ibus-m17n package, which must be installed first:

sudo apt-get install ibus-m17n

Once it is installed, restart IBus and add the new pinyin input method from the IBus preferences.

You should now be able to activate the input method. Simply type the pinyin for a character preceded by the tone number (ranging from 1 to 4).


15 Jan 10

Fix the “534 Protection level negotiation failed.” error in lftp

If you are using lftp to connect to a secure FTP server over SSL/TLS (FTPS, not to be confused with SFTP, which runs over SSH), you may sometimes get the following error during file transfers:

534 Protection level negotiation failed.

In this case, add the following line to the bottom of your /etc/lftp.conf file. It tells lftp to request an encrypted data channel for file transfers as well, which some servers insist on:

set ftp:ssl-protect-data yes

Now try connecting again and hopefully everything will work. It sorted out my problems when I was trying to connect to the Leeds University secure FTP servers using lftp on Ubuntu 9.10 (Karmic).
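Editing /etc/lftp.conf needs root. lftp also reads a per-user ~/.lftprc, so if you'd rather not touch the system-wide file, the same line can go there instead. Here is a small sketch that appends the setting only if it isn't already present. add_once is a hypothetical helper name.

```shell
#!/bin/sh
# add_once FILE LINE
# Hypothetical helper: appends LINE to FILE unless FILE already contains
# exactly that line. Creates FILE if it doesn't exist.
add_once() {
    ao_file=$1
    ao_line=$2
    touch "$ao_file" || return 1
    if grep -qxF -- "$ao_line" "$ao_file"; then
        return 0                      # already there, nothing to do
    fi
    # If the file doesn't end in a newline, add one so LINE starts fresh.
    if [ -s "$ao_file" ] && [ -n "$(tail -c 1 "$ao_file")" ]; then
        printf '\n' >> "$ao_file"
    fi
    printf '%s\n' "$ao_line" >> "$ao_file"
}
```

For example: add_once "$HOME/.lftprc" 'set ftp:ssl-protect-data yes'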


9 Jul 09

Using tsocks and proxychains to get all your Linux software over the Great Firewall

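The situation
Because a riot recently broke out somewhere in China, many websites (twitter.com, facebook.com, etc.) are once again inaccessible from inside the country, leaving us netizens feeling helpless.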

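Getting over the wall with SSH
I've previously explained how to use SSH to set up a SOCKS proxy server so that you can access the sites mentioned above in Firefox as normal. However, not every Linux program supports proxy servers. What can you do if your favourite Linux tool needs to reach a "blocked" site but has no built-in proxy support?
Don't give up on the program, of course... After all, we're using Linux, not the Windows that used to leave us sighing and losing heart, and there is always a way to solve the problem.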

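An example
I discovered Twitter not long ago. At first I rarely used it and couldn't understand why people were so hooked on this Web 2.0 service. But as I followed more and more people, and more and more people followed me, I got hooked too without noticing, and now I'm on it every day. Anyone who uses Twitter regularly tends to use a Twitter client, which is the only way to keep up with friends' status updates and the hottest news online. As an Ubuntu user, I naturally chose Gwibber, a GNOME-based client, for my Twitter. It's feature-rich and easy to use, but one thing has always bothered me: Gwibber doesn't yet respect GNOME's proxy settings. Usually that isn't a big problem, but whenever China's internet blocking gets more severe, it leaves me temporarily unable to use the program.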

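The solution... tsocks
After a few Google searches, I was happy to find that Linux has a tool that can force any program to go online through a SOCKS proxy: tsocks. tsocks is a transparent SOCKS proxying tool (it works by preloading a library that redirects a program's outgoing TCP connections, so it won't help with statically linked programs). As long as your computer has an SSH tunnel to a server outside China, you can get any program over the wall.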

Installing and configuring tsocks
The instructions below are for Ubuntu users, but the installation process should be much the same on other Linux distributions.

In a terminal:

sudo apt-get install tsocks

Edit the configuration file:

sudo nano /etc/tsocks.conf

Replace its contents with the following lines, then save and exit:

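# local: your local network, i.e. the network that does NOT go through the SOCKS proxy
local = 192.168.1.0/255.255.255.0
# IP address of the SOCKS server
server = 127.0.0.1
# SOCKS protocol version
server_type = 5
# port the SOCKS server listens on
server_port = 9999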

You may need to adjust the above to match your own SSH tunnel settings.
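The configuration above assumes an SSH dynamic (SOCKS) forward listening on 127.0.0.1:9999. As a sketch, you can open such a tunnel with OpenSSH's -D option. open_tunnel is a hypothetical helper name, and the user@host you pass in must be your own SSH account on a server outside China.

```shell
#!/bin/sh
# open_tunnel USER@HOST [PORT]
# Hypothetical helper: starts an OpenSSH dynamic (SOCKS) forward on
# 127.0.0.1:PORT (default 9999, matching server_port in /etc/tsocks.conf).
#   -D  dynamic port forwarding: ssh acts as a SOCKS proxy on that port
#   -N  don't run a remote command, just forward
#   -f  go into the background once authentication has finished
open_tunnel() {
    ot_target=$1
    ot_port=${2:-9999}
    ssh -f -N -D "127.0.0.1:$ot_port" "$ot_target"
}
```

For example, open_tunnel me@my-server.example.com (a placeholder host), followed by tsocks gwibber &.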

Running programs
Running a program with tsocks is simple. In a terminal:

tsocks your_program &

This is how I run Gwibber these days:

tsocks gwibber &

Happy wall-climbing!

EDIT:

Today I found another tool that seems to have more features than tsocks, is easier to configure and is less error-prone: proxychains. Here's how to set it up:

sudo apt-get install proxychains

Edit the configuration file (/etc/proxychains.conf) so that it looks like this:

# proxychains.conf  VER 2.0
#
#        HTTP, SOCKS4, SOCKS5 tunneling proxifier.
#

# The option below identifies how the ProxyList is treated.
# only one option should be uncommented at time,
# otherwise the last appearing option will be accepted
#
# Dynamic - Each connection will be done via chained proxies
# all proxies chained in the order as they appear in the list
# at least one proxy must be online to play in chain
# (dead proxies are skipped)
# otherwise EINTR is returned to the app
#
# Strict - Each connection will be done via chained proxies
# all proxies chained in the order as they appear in the list
# all proxies must be online to play in chain
# otherwise EINTR is returned to the app
#
# Random - Each connection will be done via random proxy
# (or proxy chain, see  chain_len) from the list
# this option is good for scans

dynamic_chain
#strict_chain
#random_chain

# Make sense only if random_chain
chain_len = 2

# Quiet mode (no output)
#quiet_mode

# Write stats about good proxies to proxychains.stats
#write_stats

#Some timeouts in milliseconds
#
tcp_read_time_out 15000
tcp_connect_time_out 10000

[ProxyList]
# ProxyList format
#       type  host  port [user pass]
#       (values separated by 'tab' or 'blank')
#
#
#        Examples:
#
#            	socks5	192.168.67.78	1080	lamer  secret
#		http	192.168.89.3	8080	justu	hidden
#	 	socks4	192.168.1.49	1080
#	        http	192.168.39.93	8080
#
#
#       proxy types: http, socks4, socks5
#        ( auth types supported: "basic"-http  "user/pass"-socks )
#
#http 	10.0.0.5 3128
socks5 127.0.0.1 9999
socks4 127.0.0.1 9050

Notes:

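  1. Choose dynamic_chain, not random_chain.
  2. You can list several proxies. With dynamic_chain, proxychains uses them in the order listed and skips any that can't be reached, so it automatically falls through to the next one (if several are online, each connection is chained through all of them in turn).
  3. Adjust the proxy entries to suit your own machine's setup.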
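The comments in the config file warn that only one chain type should be uncommented (otherwise the last one wins). A quick check like the sketch below can catch a stray random_chain or strict_chain line. proxychains_mode_ok is a hypothetical helper and is not part of proxychains.

```shell
#!/bin/sh
# proxychains_mode_ok CONF
# Hypothetical helper: succeeds only if exactly one chain type is
# uncommented in CONF and it is dynamic_chain.
proxychains_mode_ok() {
    pm_modes=$(grep -E '^[[:space:]]*(dynamic|strict|random)_chain[[:space:]]*$' "$1" |
        sed 's/[[:space:]]//g')
    if [ "$pm_modes" = "dynamic_chain" ]; then
        return 0
    fi
    echo "expected only dynamic_chain, found: ${pm_modes:-none}" >&2
    return 1
}
```

For example: proxychains_mode_ok /etc/proxychains.conf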

Running proxychains
Running proxychains works exactly like running tsocks. In a terminal:

proxychains your_program &

For example:

proxychains chromium-browser &

On balance, I'd recommend proxychains!