Feature: Ignore files not under version control #3300

Open
mdeweerd opened this issue Jan 21, 2024 · 7 comments

Comments

@mdeweerd
Contributor

mdeweerd commented Jan 21, 2024

Context:
I created a script that adds lines to the ignored-lines file; I run it when codespell reports only false positives.

I want to exclude files that are not under git version control as these do not need to be added as exceptions.

It seems appropriate and useful that 'codespell' has an option to do so.

For example, the following shell command lists only the files in the current directory that are under version control:
git ls-files $(ls -p | grep -v -E '/$')

I could apply something like that to the script by using the codespell output two times or running it a second time on the filtered files. Done.
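As a rough illustration of the same filtering done from Python, here is a minimal sketch. The helper names `git_tracked` and `filter_tracked` are hypothetical and not part of codespell; the sketch assumes `git` is on the PATH and that the candidate paths are given relative to the directory where `git ls-files` runs.

```python
import os
import subprocess
from typing import Iterable, List, Set


def git_tracked(cwd: str = ".") -> Set[str]:
    """Return the paths git tracks under cwd, relative to cwd.

    Hypothetical helper; requires git on PATH and cwd inside a work tree.
    """
    raw = subprocess.run(
        ["git", "ls-files", "-z"],
        cwd=cwd,
        check=True,
        stdout=subprocess.PIPE,
    ).stdout
    # -z output is NUL-separated; the trailing NUL yields an empty chunk.
    return {os.fsdecode(name) for name in raw.split(b"\0") if name}


def filter_tracked(candidates: Iterable[str], tracked: Set[str]) -> List[str]:
    """Keep only the candidates found in the tracked set, preserving order."""
    normalized = {os.path.normpath(path) for path in tracked}
    return [c for c in candidates if os.path.normpath(c) in normalized]
```

A script could then pass `filter_tracked(paths, git_tracked())` to codespell instead of the raw list of paths.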

However, excluding files that are not under version control seems like a good filter for codespell.

EDIT: updating the ignored lines file could even be a feature in codespell.

@mdeweerd mdeweerd changed the title Feature: Ignore/Do not files not under version control Feature: Ignore files not under version control Jan 21, 2024
@DimitriPapadopoulos
Collaborator

Currently codespell is a simple piece of software focused on processing the arguments it is given, without filters, except for hidden files. I am not certain adding filters is worth the extra complexity. You could instead modify the arguments passed to codespell. Anyway, I'll let maintainers decide.

@mdeweerd
Contributor Author

The use case is running codespell from the CLI in a checkout that also contains local files used for development. These files are codespell'ed as well, and when I ran the script that ignores misspelled lines, lines from files not under version control also got added.
I updated my script to avoid that, but a CLI codespell run still shows these "false positives".

@yarikoptic
Contributor

yarikoptic commented Jan 22, 2024

this might also be a very nice alternative to having to explicitly list all of those other folders to ignore (.git/, .mypy_cache/, etc.). And it would also help to avoid the side effect of fixing some files not under git while committing only the ones under git. So overall I would like to have such a mode, and to make it configurable via the config.

edits:

@DimitriPapadopoulos
Collaborator

OK, you convinced me. Which version control though? Not everyone uses git.

@mdeweerd
Contributor Author

I vote for git: I moved almost all my subversion projects to git, rcs/cvs are long gone, and git is what I encounter the most.

@mdeweerd
Contributor Author

Note that git can do this:

git ls-files -- ':*php'
git ls-files -- . ':!*sql' ':!*bin'

So the file glob patterns could all be provided to git ls-files and then codespell just needs to iterate over the result.
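A minimal sketch of how include and exclude globs might be turned into a `git ls-files` command line. The function name `build_ls_files_cmd` and its parameters are hypothetical; the sketch uses the long-form `:(exclude)` pathspec magic, which is equivalent to the `:!` short form shown above, and adds `-z` so the output is NUL-separated.

```python
from typing import List, Sequence


def build_ls_files_cmd(
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
) -> List[str]:
    """Build an argv list for `git ls-files` from glob patterns.

    Hypothetical helper: `include` and `exclude` hold git pathspec globs.
    """
    cmd = ["git", "ls-files", "-z", "--"]
    # With no include patterns, start from the current directory.
    cmd += list(include) if include else ["."]
    # Long-form magic for exclusion; same meaning as the ':!' prefix.
    cmd += [f":(exclude){pattern}" for pattern in exclude]
    return cmd
```

The resulting list can be passed to `subprocess.run` without a shell, so no quoting of the patterns is needed.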

@yarikoptic
Contributor

well, it could be a setting ignore_vcs = git which pukes NotImplementedError for anything but git ATM, keeping it open for others to add handling for whatever they care about. Minor note: for the test, make sure you have some files with obscure filenames [*], use git ls-files -z, and split on \000 rather than on newline. I think that would also save you from having to deal with quotes:

use without -z

❯ mkdir /tmp/test; cd /tmp/test; git init; echo 1 > "new
line"; touch empty; touch '"quoted"'; git add *; git commit -m "added files"; git ls-files
Initialized empty Git repository in /tmp/test/.git/
[master (root-commit) 8bd959f] added files
 3 files changed, 1 insertion(+)
 create mode 100644 "\"quoted\""
 create mode 100644 empty
 create mode 100644 "new\nline"
"\"quoted\""
empty
"new\nline"

with -z but then translating \000 into a | for visualization

❯ rm -rf /tmp/test; mkdir /tmp/test; cd /tmp/test; git init; echo 1 > "new
line"; touch empty; touch '"quoted"'; git add *; git commit -m "added files"; git ls-files -z | tr '\000' '|'
Initialized empty Git repository in /tmp/test/.git/
[master (root-commit) 28da778] added files
 3 files changed, 1 insertion(+)
 create mode 100644 "\"quoted\""
 create mode 100644 empty
 create mode 100644 "new\nline"
"quoted"|empty|new
line|%
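The difference between the two runs above can be handled in Python by splitting the raw bytes on NUL. This is only a sketch; `parse_ls_files_z` is a hypothetical name, and the sample bytes mimic the three filenames from the demo rather than coming from a real git call.

```python
import os
from typing import List


def parse_ls_files_z(raw: bytes) -> List[str]:
    """Split the output of `git ls-files -z` into filenames.

    Names are taken verbatim: no unquoting and no whitespace stripping.
    """
    # Each name is terminated by NUL, so the last chunk is empty; skip it.
    return [os.fsdecode(name) for name in raw.split(b"\0") if name]


# Bytes shaped like the -z output of the demo repository above.
sample = b'"quoted"\0empty\0new\nline\0'
names = parse_ls_files_z(sample)
assert names == ['"quoted"', "empty", "new\nline"]
```

Splitting on newlines instead would cut the third filename in two, and the output without `-z` would additionally need its C-style quoting undone.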

[*] In DataLad we programmatically compose the "most obscure" (although still without a newline IIRC) filename for the filesystem at hand during testing: https://github.com/datalad/datalad/blob/maint/datalad/tests/utils_pytest.py#L1453 . So on my laptop it is:

❯ python -c 'from datalad.tests.utils_pytest import OBSCURE_FILENAME; print(repr(OBSCURE_FILENAME))'
' |;&%b5{}\'"<>ΔЙקم๗あ .datc '

NB if you use the Debian package, there would be no unicode part.

So note the leading and trailing spaces in it, which are there to ensure we do not strip them anywhere...
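The `ignore_vcs = git` setting suggested in this comment could be dispatched roughly as follows. This is a sketch under assumptions: the setting name comes from the comment, but `vcs_list_command` and the `_LISTERS` table are hypothetical and do not exist in codespell.

```python
from typing import Callable, Dict, List


def _git_list_command() -> List[str]:
    """Command that lists tracked files, NUL-separated."""
    return ["git", "ls-files", "-z"]


# Hypothetical registry: other version control systems could be added here.
_LISTERS: Dict[str, Callable[[], List[str]]] = {
    "git": _git_list_command,
}


def vcs_list_command(ignore_vcs: str) -> List[str]:
    """Return the file-listing command for the configured VCS."""
    try:
        return _LISTERS[ignore_vcs]()
    except KeyError:
        raise NotImplementedError(
            f"ignore_vcs={ignore_vcs!r} is not supported; only 'git' is"
        ) from None
```

Keeping the lookup in a table leaves room for contributors to register handlers for the systems they care about.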

3 participants