As per the documentation of the tokenize submodule (https://pythainlp.org/docs/5.0/api/tokenize.html), clause_tokenize should work as follows:
    from pythainlp.tokenize import clause_tokenize

    clause_tokenize(["ฉัน","นอน","และ","คุณ","เล่น","มือถือ","ส่วน","น้อง","เขียน","โปรแกรม"])
    # [['ฉัน', 'นอน'],
    #  ['และ', 'คุณ', 'เล่น', 'มือถือ'],
    #  ['ส่วน', 'น้อง', 'เขียน', 'โปรแกรม']]
However, a run on a fresh conda environment results in the following:
    from pythainlp.tokenize import clause_tokenize

    clause_tokenize(["ฉัน","นอน","และ","คุณ","เล่น","มือถือ","ส่วน","น้อง","เขียน","โปรแกรม"])
    # [['ฉัน', 'นอน', 'และ', 'คุณ', 'เล่น', 'มือถือ', 'ส่วน', 'น้อง', 'เขียน', 'โปรแกรม']]
The model is downloaded immediately before running the code above, so there should not be any problems with the model cache.
The returned list should consist of 3 clauses: [['ฉัน', 'นอน'], ['และ', 'คุณ', 'เล่น', 'มือถือ'], ['ส่วน', 'น้อง', 'เขียน', 'โปรแกรม']]
The returned list consists of only 1 clause: [['ฉัน', 'นอน', 'และ', 'คุณ', 'เล่น', 'มือถือ', 'ส่วน', 'น้อง', 'เขียน', 'โปรแกรม']]
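The mismatch between the documented and the observed output can be checked mechanically. The following is a minimal sketch and not part of PyThaiNLP: `check_clause_split` is a hypothetical helper that accepts any tokenizer callable, so the same check can be pointed at `pythainlp.tokenize.clause_tokenize` in an environment where it is installed.

```python
from typing import Callable, List

# Input words and documented result, copied from this report.
WORDS = ["ฉัน", "นอน", "และ", "คุณ", "เล่น", "มือถือ", "ส่วน", "น้อง", "เขียน", "โปรแกรม"]
EXPECTED = [
    ["ฉัน", "นอน"],
    ["และ", "คุณ", "เล่น", "มือถือ"],
    ["ส่วน", "น้อง", "เขียน", "โปรแกรม"],
]


def check_clause_split(
    tokenizer: Callable[[List[str]], List[List[str]]],
    words: List[str] = WORDS,
    expected: List[List[str]] = EXPECTED,
) -> str:
    """Hypothetical helper: run `tokenizer` on `words` and compare with `expected`."""
    actual = tokenizer(words)
    if actual == expected:
        return f"OK: {len(actual)} clauses"
    return f"MISMATCH: expected {len(expected)} clauses, got {len(actual)}"
```

With PyThaiNLP installed, the check would be run as `check_clause_split(clause_tokenize)`. For the single-clause output reported above, it returns `MISMATCH: expected 3 clauses, got 1`.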
Steps to reproduce:

1. Create a fresh conda environment:

    conda create -n temp_env python=3.10

2. Activate it and install pythainlp and python-crfsuite:

    conda activate temp_env
    python -m pip install pythainlp python-crfsuite

3. Run the example in the Python interpreter:

    python
    >>> from pythainlp.tokenize import clause_tokenize
    >>> clause_tokenize(["ฉัน","นอน","และ","คุณ","เล่น","มือถือ","ส่วน","น้อง","เขียน","โปรแกรม"])
    [['ฉัน', 'นอน', 'และ', 'คุณ', 'เล่น', 'มือถือ', 'ส่วน', 'น้อง', 'เขียน', 'โปรแกรม']]
PyThaiNLP version: 5.0.4
Python version: 3.10.15
Operating system and version: Windows 11 Pro 23H2
python-crfsuite version: 0.9.11
I also tried running the same code on the WSL2 distro for Ubuntu 24.04.1 LTS and got the same result.
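Since the behaviour was reproduced on two systems, it can help to capture the environment details in the same way on each. The snippet below is a sketch using only the Python standard library; `installed_versions` and `environment_report` are hypothetical helper names, and the package list defaults to the two packages named in this report.

```python
import platform
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_report(packages=("pythainlp", "python-crfsuite")):
    """Build a short plain-text summary suitable for pasting into a bug report."""
    lines = [
        f"Python version: {platform.python_version()}",
        f"Operating system: {platform.platform()}",
    ]
    for name, version in installed_versions(packages).items():
        lines.append(f"{name} version: {version or 'not installed'}")
    return "\n".join(lines)
```

Calling `print(environment_report())` on Windows and again inside WSL2 gives two summaries that can be compared line by line.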
Hello @panyutsriwirote, thank you for your interest in our work!
If this is a bug report, please provide screenshots, error messages, and minimum viable code to reproduce your issue; otherwise we cannot help you.
I checked. It isn't a bug.
Thank you for reporting.