when LANG="" file with "ö" in name cannot be opened #11214

Open
ChristianS99 opened this issue Aug 30, 2024 · 6 comments

Comments

@ChristianS99

ChristianS99 commented Aug 30, 2024

Overview

When LANG is set to "", the file "Passwörter.kdbx" cannot be opened.

Steps to Reproduce

$ keepassxc-cli ls Passwörter.kdbx
Passwort zum Entsperren von Passwörter.kdbx eingeben:   [German prompt: "Enter password to unlock Passwörter.kdbx:"]
<...>
$ LANG= keepassxc-cli ls Passwörter.kdbx
Failed to open database file Passw?rter.kdbx: not found

Expected Behavior

The file can be opened.

Context

Not 100% sure, but in my experience LANG should only affect the output language, not the encoding used to interpret the given arguments, e.g.:

$ cat föo
dasd
$ LANG= cat föo
dasd
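The cat comparison works because cat never decodes its arguments: it hands the raw argv bytes straight to open(). A minimal Python reproduction of the two cat commands above, assuming a POSIX system with cat in /bin or /usr/bin (when env contains no PATH, subprocess falls back to the default search path); the temporary directory exists only for the demo:

```python
import os
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Create "föo" with its name stored as UTF-8 bytes, as the terminal sends it
    path = os.path.join(os.fsencode(tmpdir), "föo".encode("utf-8"))
    with open(path, "wb") as f:
        f.write(b"dasd\n")

    # Run cat with an empty LANG; cat passes the argument bytes to open()
    # unchanged, so the locale setting has no effect on finding the file
    result = subprocess.run([b"cat", path], env={"LANG": ""}, capture_output=True)

print(result.returncode, result.stdout)  # 0 b'dasd\n'
```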
$ LANG= keepassxc-cli --debug-info
KeePassXC - Version 2.7.9
Revision: 8f6dd13

Qt 5.15.14
Debugging mode is disabled.

Operating system: Gentoo Linux
CPU architecture: x86_64
Kernel: linux 6.10.3-gentoo-dist

Enabled extensions:
- Browser Integration
- Passkeys
- SSH Agent
- KeeShare
- YubiKey
- Secret Service Integration

Cryptographic libraries:
- Botan 3.2.0

The GUI is affected in the same way.

@droidmonkey
Member

This is definitely a Qt issue. There isn't much we can do about it. The obvious easy fix is to not use non-ASCII characters in your file name, or to make sure your LANG is set.

@phoerious
Member

phoerious commented Aug 30, 2024

Turns out, this is a Qt issue, but the problem is in fact a bit more complex.

"Passw?rter.kdbx" means a Latin1 string is being interpreted as UTF-8 here. However, simply running

LANG= keepassxc-cli ls Passwörter.kdbx

isn't a good problem demonstration. keepassxc-cli will use a default locale, but the parameters are in whatever your terminal's own input encoding is, so this could be anything.
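To illustrate the "Latin1 interpreted as UTF-8" diagnosis: in Latin-1, "ö" is the single byte 0xF6, which is never valid on its own in UTF-8, so a UTF-8 consumer cannot turn it back into "ö" and has to substitute a placeholder. A small Python illustration of the byte-level mismatch (not KeePassXC or Qt code):

```python
name = "Passwörter.kdbx"

latin1 = name.encode("iso-8859-1")  # "ö" becomes the single byte 0xF6
utf8 = name.encode("utf-8")         # "ö" becomes the two bytes 0xC3 0xB6
print(latin1)
print(utf8)

# Decoding the Latin-1 bytes as UTF-8 cannot recover "ö": the lone 0xF6
# byte is invalid UTF-8 and turns into U+FFFD, the replacement character
garbled = latin1.decode("utf-8", errors="replace")
print(garbled)
```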

The following Python snippet is a more stable test case:

import subprocess
subprocess.Popen([b'keepassxc-cli', b'ls', 'Passwörter.kdbx'.encode('iso-8859-1')],
                  env={'LANG': ''}).wait()

This is where we parse the command line arguments: https://github.com/keepassxreboot/keepassxc/blob/develop/src/cli/keepassxc-cli.cpp#L194

I believe that

    for (int i = 0; i < argc; ++i) {
        arguments << QString(argv[i]);
    }

is indeed wrong. This should at least be QString::fromLocal8Bit(argv[i]), but according to the docs, this is equivalent to QString::fromUtf8() on Linux, which is obviously wrong when your system locale isn't Unicode-based.

I think defaulting to UTF-8 is a sane assumption for most Linux systems, but if your input encoding is something else, this will obviously fail. To fix this, we'd need to parse LANG or LC_ALL ourselves, but even with those variables set, we could only guess what the actual input encoding of the command line parameters is.
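If one did try to guess the argument encoding from the environment, POSIX fixes the lookup order for the character-set category: a non-empty LC_ALL wins, then LC_CTYPE, then LANG, and if none is set the default locale applies. A rough Python sketch of that guess; guess_codeset is a hypothetical helper, and treating "nothing set" as the "C" locale is an assumption. In C, the usual route is setlocale(LC_ALL, "") followed by nl_langinfo(CODESET), which consults the installed locale instead of parsing strings:

```python
def guess_codeset(env):
    """Hypothetical helper: guess the LC_CTYPE codeset from env variables.

    Returns the codeset named in the locale string, "ASCII" for the
    C/POSIX locale, or None when the locale name has no explicit codeset.
    """
    # POSIX precedence for LC_CTYPE; empty values are treated as unset
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            break
    else:
        value = "C"  # assumption: nothing set means the C locale

    if value in ("C", "POSIX"):
        return "ASCII"
    # Locale names look like language[_territory][.codeset][@modifier]
    value = value.split("@", 1)[0]
    if "." in value:
        return value.split(".", 1)[1]
    return None  # e.g. "de_DE": depends on how the locale was installed
```

Even a correct answer here is only a guess: the bytes on the command line come from the terminal, which may not match what the environment claims.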

@ChristianS99
Author

> Turns out, this is a Qt issue, but the problem is in fact a bit more complex.
>
> "Passw?rter.kdbx" means a Latin1 string is being interpreted as UTF-8 here. However, simply running

Yeah, agreed. Looks like the output is Latin-1 where the terminal expects UTF-8.

> LANG= keepassxc-cli ls Passwörter.kdbx
>
> isn't a good problem demonstration. keepassxc-cli will use a default locale, but the parameters are in whatever your terminal's own input encoding is, so this could be anything.

The terminal's input encoding is UTF-8 even with LANG="", since setting the variable in front of a command only changes it for that command, not for the terminal.

> This should at least be QString::fromLocal8Bit(argv[i]), but according to the docs, this is equivalent to QString::fromUtf8() on Linux, which is obviously wrong when your system locale isn't Unicode-based.

Mind that this is from the Qt 6 docs; Qt 5 is different, and there fromLocal8Bit and fromUtf8 actually do different things.

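#include <QString>
#include <QTextStream>
#include <cstdio>
#include <iostream>

int main(int argc, char *argv[])
{
    if (argc > 1) {
        // In Qt 5, QTextStream encodes its output with the locale's codec
        QTextStream out(stdout);

        // Line 1: the raw argument bytes, passed through unchanged
        std::cout << argv[1] << std::endl;

        // Line 2: the argument bytes in hex
        for (int p = 0; argv[1][p] != 0; ++p) {
            printf("%x ", (unsigned char)argv[1][p]);
        }
        printf("\n");

        // Line 3: implicit conversion via the QString(const char *) constructor
        QString s1 = QString(argv[1]);
        out << s1 << Qt::endl;

        // Line 4: decode with the locale's 8-bit codec
        QString s2 = QString::fromLocal8Bit(argv[1]);
        out << s2 << Qt::endl;

        // Line 5: decode explicitly as UTF-8
        QString s3 = QString::fromUtf8(argv[1]);
        out << s3 << Qt::endl;
    }
    return 0;
}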

A small test program to try a few things. Running it as LANG= ./qttest aäböc
gives this output:

aäböc
61 c3 a4 62 c3 b6 63 
a?b?c
a??b??c
a?b?c

line 1: the terminal is consistent with the encoding of the input given and the output expected.
line 2: the encoding actually is UTF-8.
line 3: QString obviously converts the byte sequence somehow (the result matches fromUtf8 on line 5).
lines 4 and 5: fromLocal8Bit and fromUtf8 are different (on Qt 5).
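The five output lines can be emulated in Python under two assumptions suggested by the output itself: that fromLocal8Bit in the empty-LANG case maps each byte to one character, and that the output step writes every non-ASCII character as "?". The fromUtf8 path (and, per the Qt 5 docs, the QString(const char *) constructor, which uses fromUtf8) decodes the two-byte sequences correctly. This is a model of the behaviour, not Qt code:

```python
raw = "aäböc".encode("utf-8")  # the argv bytes sent by the UTF-8 terminal

# Line 2: the raw bytes in hex
hex_dump = " ".join(f"{b:x}" for b in raw)
print(hex_dump)  # 61 c3 a4 62 c3 b6 63

# Assumed model of the conversions
as_utf8 = raw.decode("utf-8")         # QString(argv[1]) / fromUtf8: 5 characters
as_local8bit = raw.decode("latin-1")  # fromLocal8Bit: one character per byte, 7 characters

def to_terminal(s):
    """Assumed output step: characters outside ASCII are written as "?"."""
    return s.encode("ascii", errors="replace").decode("ascii")

print(to_terminal(as_utf8))       # a?b?c    (lines 3 and 5)
print(to_terminal(as_local8bit))  # a??b??c  (line 4)
```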

@phoerious
Member

phoerious commented Aug 30, 2024

> Yeah, agreed. Looks like the output is Latin-1 where the terminal expects UTF-8.

This is not just the terminal output, but first and foremost the file name. File names are always UTF-8 on Linux, so using a Latin1 string is wrong in any case.
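Typical Linux file systems compare names as plain byte strings, so a file created under its UTF-8 name cannot be found under its Latin-1 spelling, which matches the "not found" error in the report. A minimal Python check using a throwaway directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    base = os.fsencode(tmpdir)
    # Create the file the way a UTF-8 system stores the name
    utf8_path = os.path.join(base, "Passwörter.kdbx".encode("utf-8"))
    open(utf8_path, "wb").close()

    # Look it up again with both byte spellings of the same name
    latin1_path = os.path.join(base, "Passwörter.kdbx".encode("iso-8859-1"))
    found_utf8 = os.path.exists(utf8_path)
    found_latin1 = os.path.exists(latin1_path)

print(found_utf8, found_latin1)  # True False
```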

When I qDebug() my QLocale, it always says "Latin1", even when it's actually UTF-8. I also couldn't find any difference in behaviour between QString(argv[i]) and QString::fromLocal8Bit(argv[i]). However, looking at the Qt source code for QCommandLineParser::process(const QCoreApplication &), I figure that QString::fromLocal8Bit(argv[i]) is indeed the correct way.

@AugustoMagalhaes

Christian, try running the executable like this:

LANG=de_DE.UTF-8 executable args

Tell me if it works. I had the same problem in Qt and fixed it by using LANG=C in another situation. Good luck, mate!
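Prefixing LANG=... to a command, as suggested above, changes the variable only in that command's environment, not in the calling shell. The same thing from Python, where run_with_lang is a hypothetical helper and a tiny Python child process stands in for keepassxc-cli:

```python
import os
import subprocess
import sys

def run_with_lang(cmd, lang):
    """Hypothetical helper: run cmd with LANG overridden for the child only."""
    env = dict(os.environ, LANG=lang)  # a copy; os.environ itself is untouched
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

# Stand-in child process: print the LANG value it received
child = [sys.executable, "-c", "import os; print(os.environ['LANG'])"]
before = os.environ.get("LANG")
result = run_with_lang(child, "de_DE.UTF-8")

print(result.stdout.strip())             # de_DE.UTF-8
print(os.environ.get("LANG") == before)  # True
```

With the real tool this would be run_with_lang(["keepassxc-cli", "ls", "Passwörter.kdbx"], "de_DE.UTF-8").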

@ChristianS99
Author

> LANG=de_DE.UTF-8 executable args

This works, and it is my default. I just stumbled over the problem by accident and thought I could report it.
