NAME

Data::HanConvert - Data for converting between Traditional and Simplified Chinese.

DESCRIPTION

This distribution contains no code, only data to be used by other programs. The data is split into four modules, each of which must be required separately.

use Data::HanConvert::cn2tw;
use Data::HanConvert::cn2tw_characters;
use Data::HanConvert::tw2cn;
use Data::HanConvert::tw2cn_characters;

Once a module is required, the corresponding hashref is available:

$Data::HanConvert::cn2tw
$Data::HanConvert::cn2tw_characters
$Data::HanConvert::tw2cn
$Data::HanConvert::tw2cn_characters

The hashrefs with the "_characters" suffix contain only character-to-character mappings, while the others contain only phrase-to-phrase mappings. The mappings are split into separate files because the phrase mappings are significantly larger and may not be needed in every scenario.
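As an illustration of how a character-level mapping might be applied, here is a minimal sketch. It assumes the hashref keys are single characters stored as decoded Perl character strings, which should be checked against the module source. The names `convert_characters` and `$sample_map` are hypothetical and not part of this distribution; the tiny stand-in map keeps the example runnable without the distribution installed.

```perl
use strict;
use warnings;
use utf8;

# Convert a string one character at a time using a character-level mapping.
# $map is a hashref shaped like $Data::HanConvert::cn2tw_characters
# (assumption: single-character keys, decoded character strings).
sub convert_characters {
    my ($text, $map) = @_;
    # Characters without an entry in the map are copied through unchanged.
    return join '', map { exists $map->{$_} ? $map->{$_} : $_ } split //, $text;
}

# Stand-in data for illustration only, not taken from the real data set.
my $sample_map = { "汉" => "漢", "语" => "語" };
my $converted  = convert_characters("汉语abc", $sample_map);
```

With the distribution installed, the same function could presumably be called with `$Data::HanConvert::cn2tw_characters` after `use Data::HanConvert::cn2tw_characters;`.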

Note that this data set is intended for conversion. The phrase data does not necessarily contain only valid dictionary phrases; it may also contain arbitrary long n-grams that exist solely for disambiguation. Users are encouraged to review the data before using it for other purposes.
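To show why longer entries help with disambiguation, the sketch below applies a phrase map with greedy longest-match and falls back to a character map. This is one common strategy, not necessarily how programs using this data work. The function name `convert_text` and the sample maps are hypothetical, and the sample entries are stand-ins rather than rows from the real data.

```perl
use strict;
use warnings;
use utf8;

# Greedy longest-match conversion: at each position try the longest phrase
# first, then fall back to the character map, then copy the character as-is.
sub convert_text {
    my ($text, $phrases, $characters) = @_;

    # The longest phrase key bounds how far ahead we need to look.
    my $max_len = 1;
    for my $key (keys %$phrases) {
        $max_len = length($key) if length($key) > $max_len;
    }

    my $out = '';
    my $pos = 0;
    my $end = length($text);
    POSITION: while ($pos < $end) {
        my $remaining = $end - $pos;
        my $limit     = $remaining < $max_len ? $remaining : $max_len;
        for (my $len = $limit; $len >= 1; $len--) {
            my $chunk = substr($text, $pos, $len);
            if (exists $phrases->{$chunk}) {
                $out .= $phrases->{$chunk};
                $pos += $len;
                next POSITION;
            }
        }
        my $char = substr($text, $pos, 1);
        $out .= exists $characters->{$char} ? $characters->{$char} : $char;
        $pos++;
    }
    return $out;
}

# Stand-in data: the character map alone would turn every "发" into "發",
# so a phrase entry is needed to get "髮" in "头发" (hair).
my $sample_phrases    = { "头发" => "頭髮" };
my $sample_characters = { "头" => "頭", "发" => "發" };
my $result = convert_text("头发发展", $sample_phrases, $sample_characters);
```

The phrase entry wins at the start of the string, and the remaining characters fall back to the character map.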

AUTHORS

Audrey Tang <[email protected]>

The original data collection work, from Encode::HanConvert

Yo-An Lin <[email protected]>

The PHP builder

Kang-min Liu <[email protected]>

The Gugod

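ADDING PHRASES

Edit src/hanconvert.txt. Each line represents one mapping and must have two columns: the first is Traditional Chinese, the second is Simplified Chinese. Columns are separated by at least one space (SPC, 0x20) or tab (TAB, 0x09).

Lines in this table should not have extra separator characters at the beginning or end. Programs that process the file should nevertheless ignore whitespace, so that editors can insert blank lines for visual separation.

A line that begins with the # character is also ignored and does not count as part of the table. Editors can use this to add comments to the file.

Note: the Traditional-Simplified correspondence is not a perfect one-to-one mapping. hanconvert.txt may contain multiple mappings for a single term. Authors of processing programs should be aware of this and choose a handling strategy that suits their use case.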
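The file format rules above can be sketched as a small reader. This is an illustration under stated assumptions, not the builder this project uses: `parse_hanconvert` is a hypothetical name, the sample rows are stand-ins rather than lines from the real file, and keeping every candidate in an array is just one way to handle multiple mappings.

```perl
use strict;
use warnings;
use utf8;

# Parse mapping-table text into a hash of arrays:
#   Traditional term => [ Simplified candidate, ... ]
sub parse_hanconvert {
    my ($content) = @_;
    my %table;
    for my $line (split /\n/, $content) {
        $line =~ s/^[ \t]+|[ \t\r]+$//g;   # tolerate stray separators at either end
        next if $line eq '';               # blank lines are only visual separation
        next if $line =~ /^#/;             # lines starting with # are comments
        my ($traditional, $simplified) = split /[ \t]+/, $line, 2;
        next unless defined $simplified;   # skip rows that lack a second column
        # Keep every candidate, since one term may map to several targets.
        push @{ $table{$traditional} }, $simplified;
    }
    return \%table;
}

# Stand-in rows separated by spaces or a tab, with a comment and a blank line.
my $sample = join "\n",
    "# sample rows for illustration",
    "頭髮 头发",
    "",
    "著\t着",
    "著    著",
    "";
my $table = parse_hanconvert($sample);
```

Here `$table->{"著"}` holds two candidates, leaving the choice between them to the caller.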

EDIT ACCESS

If you need edit access, please send your GitHub account name to @gugod.

LICENCE

This work is CC0.

To the extent possible under law, Kang-min Liu has waived all copyright and related or neighboring rights to Data::HanConvert. This work is published from: Taiwan.