Unicode encoding bug that needs to be fixed and a temporary solution #626

Open
cokacoda opened this issue Sep 13, 2024 · 0 comments

Problem

In short, when I try to have a conversation in Chinese, a Unicode encoding error occurs: UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-6: character maps to <undefined> (see the end for a detailed log).

Versions

OS: Windows 11
Shell: Powershell 7.4.5
ShellGPT: v1.4.4

Some thoughts

According to the log, the request reached the OpenAI server correctly and the API call itself succeeded; in fact, the response was printed in the terminal before the error appeared. The error comes from the cache write that follows: the traceback shows encoding = 'locale' and f = <_io.TextIOWrapper name='C:\Users\atp\AppData\Local\Temp\cache\2c4c3249b2b4c... mode='w' encoding='cp1252'>. CP1252 is a character encoding for Western European languages. It covers Latin letters and cannot represent Chinese or most other non-Latin scripts.
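
To make the failure concrete, here is a minimal sketch that reproduces the same error outside ShellGPT and shows that passing an explicit encoding avoids it. The file name cache_entry is made up for illustration, and this is not the project's actual cache code.

```python
import tempfile
from pathlib import Path

text = "斐波那契数列"

# cp1252 has no mapping for CJK characters, so encoding fails
# with the same "character maps to <undefined>" reason as in the log.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError as exc:
    cp1252_ok = False
    print(f"cp1252 failed: {exc.reason}")

# An explicit UTF-8 encoding does not depend on the system locale.
with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cache_entry"  # hypothetical file name
    cache_file.write_text(text, encoding="utf-8")
    round_trip = cache_file.read_text(encoding="utf-8")

print(round_trip == text)
```

If the write in cache.py passed encoding="utf-8" in the same way, the result would presumably no longer depend on the Windows system locale, although I have not tested that change in ShellGPT itself.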

The interesting part is that my system's default language is Chinese, and my terminal supports Chinese input and display. Logically, the locale detection should resolve to UTF-8, and I have not found any reports online of other people having problems using Chinese with sgpt.

Failed attempt

Although PowerShell uses UTF-8 by default, I tried chcp 65001 anyway. Even after restarting the terminal, the problem was not solved.
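
A likely explanation, as far as I understand it: chcp only changes the code page of the console session, while the encoding='locale' in the traceback is resolved from the Windows system locale (the ANSI code page) unless Python's UTF-8 mode is enabled. The sketch below prints both values so they can be compared; the function name describe_encodings is mine, not part of ShellGPT.

```python
import locale
import sys


def describe_encodings() -> dict:
    """Collect the encodings Python uses for files and for the terminal."""
    return {
        # Default for open() and Path.write_text() when no encoding is given.
        "file_default": locale.getpreferredencoding(False),
        # Encoding of the terminal stream, which is what the console displays.
        "stdout": sys.stdout.encoding,
        # True when UTF-8 mode is on (PYTHONUTF8=1 or the -X utf8 option).
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(f"{name}: {value}")
```

On an affected machine I would expect file_default to report cp1252 even when the terminal shows Chinese correctly. Setting the environment variable PYTHONUTF8=1 is a standard Python option that makes the file default UTF-8 and might work as a less invasive workaround than changing the system locale, but I have not verified it with sgpt.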

Successful attempt

On a whim, I opened the Windows settings, clicked Language and Region under Time and Language, and then opened Manage Language Settings. In the pop-up window, I found this text: “Language for non-Unicode programs. This setting (System Locale) controls the language used when displaying text in programs that do not support Unicode.” The current setting was “French (France)”. I clicked Change system locale and checked Beta: Unicode UTF-8 for worldwide language support. Windows asked me to restart the system, and after the restart the problem was solved.

Potential problem for international users

My system uses French as the language for non-Unicode programs because I live in France and Windows sets it by default. I believe most people set the locale and time zone of their computer according to where they currently live, not according to the language they write in. Since the software takes its encoding from that locale setting, any text outside the locale's encoding leads to the problem above. I have not tested whether languages outside the locale's encoding cause the same error on other operating systems and shells, but this remains a potential problem until it is officially resolved.

Suggestions

  1. Modify the code that detects the locale configuration so that text is encoded and decoded correctly for the language actually used.
  2. When an encoding error occurs, the function could first retry with a Unicode encoding or other possible encodings.
  3. (Highly recommended) Add an option to customize the encoding and decoding scheme, for example sgpt --encoding utf8.
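
Suggestions 2 and 3 could be combined roughly as follows. This is only a sketch under my own assumptions: write_text_safely is a hypothetical helper, and the encoding argument stands in for whatever a future --encoding option would pass in.

```python
import tempfile
from pathlib import Path
from typing import Optional


def write_text_safely(path: Path, text: str, encoding: Optional[str] = None) -> str:
    """Write text, falling back to UTF-8 if the chosen encoding cannot represent it.

    Returns the name of the encoding that was actually used.
    """
    chosen = encoding or "utf-8"  # default to UTF-8, not the system locale
    try:
        data = text.encode(chosen)
    except UnicodeEncodeError:
        # The requested encoding cannot represent the text; retry with UTF-8.
        chosen = "utf-8"
        data = text.encode(chosen)
    path.write_bytes(data)
    return chosen


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "entry"  # hypothetical file name
    question = "什么是斐波那契数列?"
    used_default = write_text_safely(target, question)
    used_fallback = write_text_safely(target, question, encoding="cp1252")
    restored = target.read_text(encoding="utf-8")
```

One caveat with the fallback: whoever reads the file later has to know which encoding was used, so always writing UTF-8 is probably the simpler and safer choice.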

BTW

I think ShellGPT is an excellent project and a practical tool, but:

  1. I find it very inconvenient that the first message of a dialogue is used as the title of the history file, especially when the message is long. The command needed to continue the dialogue becomes very long, and the title can easily produce an illegal file name that cannot be created. One potential solution is to treat all conversations without an explicitly specified chat name as the same chat until sgpt --new is entered, after which a new conversation starts.
  2. Please provide a convenient way to delete specific conversation records and to delete all of them.
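
The default-session idea from point 1 could look something like this. Everything here is hypothetical: resolve_chat_id, the current_session pointer file, and the new flag (standing in for the proposed sgpt --new) do not exist in ShellGPT.

```python
import tempfile
import uuid
from pathlib import Path
from typing import Optional


def resolve_chat_id(storage: Path, chat: Optional[str] = None, new: bool = False) -> str:
    """Pick the chat to use: an explicit name, or a shared default session."""
    if chat:
        return chat
    pointer = storage / "current_session"  # hypothetical pointer file
    if new or not pointer.exists():
        # A short generated ID is always a legal file name,
        # no matter how long the first message is or what it contains.
        pointer.write_text(uuid.uuid4().hex[:12], encoding="utf-8")
    return pointer.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp)
    first = resolve_chat_id(storage)
    second = resolve_chat_id(storage)            # same session is reused
    third = resolve_chat_id(storage, new=True)   # what the proposed --new would trigger
    named = resolve_chat_id(storage, chat="work")
```

Because the ID is generated and not derived from the prompt, neither the length of the first message nor its characters can produce an invalid file name.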

Full log

❯ sgpt "什么是斐波那契数列?"
斐波那契数列(Fibonacci
sequence)是一个从0和1开始的数列,后续每一个数都是前两个数之和。其定义如下:

 • F(0) = 0
 • F(1) = 1
 • F(n) = F(n-1) + F(n-2) (对于 n ≥ 2)

前几个斐波那契数是:0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...

这个数列在数学和计算机科学中有广泛的应用,例如递归算法的示例和动态规划问题。
╭──────────────────────── Traceback (most recent call last) ─────────────────────────╮
│ C:\Users\atp\anaconda3\Lib\site-packages\sgpt\app.py:229 in main                 │
│                                                                                    │
│   226 │   │   │   functions=function_schemas,                                      │
│   227 │   │   )                                                                    │
│   228 │   else:                                                                    │
│ ❱ 229 │   │   full_completion = DefaultHandler(role_class, md).handle(             │
│   230 │   │   │   prompt=prompt,                                                   │
│   231 │   │   │   model=model,                                                     │
│   232 │   │   │   temperature=temperature,                                         │
│                                                                                    │
│ ╭───────────────────────────────── locals ──────────────────────────────────╮      │
│ │               cache = True                                                │      │
│ │                chat = None                                                │      │
│ │                code = False                                               │      │
│ │         create_role = None                                                │      │
│ │      describe_shell = False                                               │      │
│ │              editor = False                                               │      │
│ │    function_schemas = None                                                │      │
│ │           functions = True                                                │      │
│ │   install_functions = None                                                │      │
│ │ install_integration = None                                                │      │
│ │         interaction = True                                                │      │
│ │          list_chats = None                                                │      │
│ │          list_roles = None                                                │      │
│ │                  md = True                                                │      │
│ │               model = 'gpt-4o'                                            │      │
│ │              prompt = '什么是斐波那契数列?'                              │      │
│ │                repl = None                                                │      │
│ │                role = None                                                │      │
│ │          role_class = <sgpt.role.SystemRole object at 0x000001C9F1F93E00> │      │
│ │               shell = False                                               │      │
│ │           show_chat = None                                                │      │
│ │           show_role = None                                                │      │
│ │        stdin_passed = False                                               │      │
│ │         temperature = 0.0                                                 │      │
│ │               top_p = 1.0                                                 │      │
│ │             version = None                                                │      │
│ ╰───────────────────────────────────────────────────────────────────────────╯      │
│                                                                                    │
│ C:\Users\atp\anaconda3\Lib\site-packages\sgpt\handlers\handler.py:165 in handle  │
│                                                                                    │
│   162 │   │   │   caching=caching,                                                 │
│   163 │   │   │   **kwargs,                                                        │
│   164 │   │   )                                                                    │
│ ❱ 165 │   │   return self.printer(generator, not disable_stream)                   │
│   166                                                                              │
│                                                                                    │
│ ╭──────────────────────────────────── locals ────────────────────────────────────╮ │
│ │        caching = True                                                          │ │
│ │ disable_stream = False                                                         │ │
│ │      functions = None                                                          │ │
│ │      generator = <generator object Cache.__call__.<locals>.wrapper at          │ │
│ │                  0x000001C9F20E4940>                                           │ │
│ │         kwargs = {}                                                            │ │
│ │       messages = [                                                             │ │
│ │                  │   {                                                         │ │
│ │                  │   │   'role': 'system',                                     │ │
│ │                  │   │   'content': 'You are ShellGPT\nYou are programming and │ │
│ │                  system administration assistant.\nYou ar'+279                 │ │
│ │                  │   },                                                        │ │
│ │                  │   {'role': 'user', 'content': '什么是斐波那契数列?'}       │ │
│ │                  ]                                                             │ │
│ │          model = 'gpt-4o'                                                      │ │
│ │         prompt = '什么是斐波那契数列?'                                        │ │
│ │           self = <sgpt.handlers.default_handler.DefaultHandler object at       │ │
│ │                  0x000001C9F0758710>                                           │ │
│ │    temperature = 0.0                                                           │ │
│ │          top_p = 1.0                                                           │ │
│ ╰────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                    │
│ C:\Users\atp\anaconda3\Lib\site-packages\sgpt\printer.py:23 in __call__          │
│                                                                                    │
│   20 │                                                                             │
│   21 │   def __call__(self, chunks: Generator[str, None, None], live: bool = True) │
│   22 │   │   if live:                                                              │
│ ❱ 23 │   │   │   return self.live_print(chunks)                                    │
│   24 │   │   with self.console.status("[bold green]Loading..."):                   │
│   25 │   │   │   full_completion = "".join(chunks)                                 │
│   26 │   │   self.static_print(full_completion)                                    │
│                                                                                    │
│ ╭──────────────────────────────────── locals ────────────────────────────────────╮ │
│ │ chunks = <generator object Cache.__call__.<locals>.wrapper at                  │ │
│ │          0x000001C9F20E4940>                                                   │ │
│ │   live = True                                                                  │ │
│ │   self = <sgpt.printer.MarkdownPrinter object at 0x000001C9F1ED8320>           │ │
│ ╰────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                    │
│ C:\Users\atp\anaconda3\Lib\site-packages\sgpt\printer.py:38 in live_print        │
│                                                                                    │
│   35 │   def live_print(self, chunks: Generator[str, None, None]) -> str:          │
│   36 │   │   full_completion = ""                                                  │
│   37 │   │   with Live(console=self.console) as live:                              │
│ ❱ 38 │   │   │   for chunk in chunks:                                              │
│   39 │   │   │   │   full_completion += chunk                                      │
│   40 │   │   │   │   markdown = Markdown(markup=full_completion, code_theme=self.t │
│   41 │   │   │   │   live.update(markdown, refresh=True)                           │
│                                                                                    │
│ ╭──────────────────────────────────── locals ────────────────────────────────────╮ │
│ │           chunk = ''                                                           │ │
│ │          chunks = <generator object Cache.__call__.<locals>.wrapper at         │ │
│ │                   0x000001C9F20E4940>                                          │ │
│ │ full_completion = '斐波那契数列(Fibonacci                                     │ │
│ │                   sequence)是一个从0和1开始的数列,后续每一个数都是前两个数 … │ │
│ │                   F(0) = 0\n- F(1) '+127                                       │ │
│ │            live = <rich.live.Live object at 0x000001C9F0D24C20>                │ │
│ │        markdown = <rich.markdown.Markdown object at 0x000001C9F2133020>        │ │
│ │            self = <sgpt.printer.MarkdownPrinter object at 0x000001C9F1ED8320>  │ │
│ ╰────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                    │
│ C:\Users\atp\anaconda3\Lib\site-packages\sgpt\cache.py:41 in wrapper             │
│                                                                                    │
│   38 │   │   │   │   result += i                                                   │
│   39 │   │   │   │   yield i                                                       │
│   40 │   │   │   if "@FunctionCall" not in result:                                 │
│ ❱ 41 │   │   │   │   file.write_text(result)                                       │
│   42 │   │   │   self._delete_oldest_files(self.length)  # type: ignore            │
│   43 │   │                                                                         │
│   44 │   │   return wrapper                                                        │
│                                                                                    │
│ ╭──────────────────────────────────── locals ────────────────────────────────────╮ │
│ │   args = (                                                                     │ │
│ │          │   <sgpt.handlers.default_handler.DefaultHandler object at           │ │
│ │          0x000001C9F0758710>,                                                  │ │
│ │          )                                                                     │ │
│ │   file = WindowsPath('C:/Users/atp/AppData/Local/Temp/cache/2c4c3249b2b4c2d… │ │
│ │   func = <function Handler.get_completion at 0x000001C9F2137240>               │ │
│ │      i = ''                                                                    │ │
│ │    key = '2c4c3249b2b4c2da14f9405df3454948'                                    │ │
│ │ kwargs = {                                                                     │ │
│ │          │   'model': 'gpt-4o',                                                │ │
│ │          │   'temperature': 0.0,                                               │ │
│ │          │   'top_p': 1.0,                                                     │ │
│ │          │   'messages': [                                                     │ │
│ │          │   │   {                                                             │ │
│ │          │   │   │   'role': 'system',                                         │ │
│ │          │   │   │   'content': 'You are ShellGPT\nYou are programming and     │ │
│ │          system administration assistant.\nYou ar'+279                         │ │
│ │          │   │   },                                                            │ │
│ │          │   │   {'role': 'user', 'content': '什么是斐波那契数列?'}           │ │
│ │          │   ],                                                                │ │
│ │          │   'functions': None                                                 │ │
│ │          }                                                                     │ │
│ │ result = '斐波那契数列(Fibonacci                                              │ │
│ │          sequence)是一个从0和1开始的数列,后续每一个数都是前两个数之和。其定… │ │
│ │          F(0) = 0\n- F(1) '+127                                                │ │
│ │   self = <sgpt.cache.Cache object at 0x000001C9F2104170>                       │ │
│ ╰────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                    │
│ C:\Users\atp\anaconda3\Lib\pathlib.py:1048 in write_text                         │
│                                                                                    │
│   1045 │   │   │   │   │   │   │   data.__class__.__name__)                        │
│   1046 │   │   encoding = io.text_encoding(encoding)                               │
│   1047 │   │   with self.open(mode='w', encoding=encoding, errors=errors, newline= │
│ ❱ 1048 │   │   │   return f.write(data)                                            │
│   1049 │                                                                           │
│   1050 │   def iterdir(self):                                                      │
│   1051 │   │   """Yield path objects of the directory contents.                    │
│                                                                                    │
│ ╭──────────────────────────────────── locals ────────────────────────────────────╮ │
│ │     data = '斐波那契数列(Fibonacci                                            │ │
│ │            sequence)是一个从0和1开始的数列,后续每一个数都是前两个数之和。其… │ │
│ │            F(0) = 0\n- F(1) '+127                                              │ │
│ │ encoding = 'locale'                                                            │ │
│ │   errors = None                                                                │ │
│ │        f = <_io.TextIOWrapper                                                  │ │
│ │            name='C:\\Users\\atp\\AppData\\Local\\Temp\\cache\\2c4c3249b2b4c… │ │
│ │            mode='w' encoding='cp1252'>                                         │ │
│ │  newline = None                                                                │ │
│ │     self = WindowsPath('C:/Users/atp/AppData/Local/Temp/cache/2c4c3249b2b4c… │ │
│ ╰────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                    │
│ C:\Users\atp\anaconda3\Lib\encodings\cp1252.py:19 in encode                      │
│                                                                                    │
│    16                                                                              │
│    17 class IncrementalEncoder(codecs.IncrementalEncoder):                         │
│    18 │   def encode(self, input, final=False):                                    │
│ ❱  19 │   │   return codecs.charmap_encode(input,self.errors,encoding_table)[0]    │
│    20                                                                              │
│    21 class IncrementalDecoder(codecs.IncrementalDecoder):                         │
│    22 │   def decode(self, input, final=False):                                    │
│                                                                                    │
│ ╭──────────────────────────────────── locals ────────────────────────────────────╮ │
│ │ final = False                                                                  │ │
│ │ input = '斐波那契数列(Fibonacci                                               │ │
│ │         sequence)是一个从0和1开始的数列,后续每一个数都是前两个数之和。其定 … │ │
│ │         F(0) = 0\r\n- F('+135                                                  │ │
│ │  self = <encodings.cp1252.IncrementalEncoder object at 0x000001C9F21311C0>     │ │
│ ╰────────────────────────────────────────────────────────────────────────────────╯ │
╰────────────────────────────────────────────────────────────────────────────────────╯
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-6: character
maps to <undefined>