File name mismatch when using the Extract option #26

Open
Alessi0X opened this issue Jun 26, 2024 · 0 comments
Alessi0X commented Jun 26, 2024

Describe the bug
When the extract option (-e) is used, the file name written by the crawler does not match the file name read by the extractor. The software expects to read a file called links.txt, but the crawl step writes a file named <date>_links.txt.

To Reproduce
To reproduce the problem, run one of the examples from the homepage (with minor modifications):
python3 torcrawl.py -v -u http://www.github.com/ -c -d 2 -p 0 -e -w
The output is:

## Your IP: A.B.C.D.
## URL: http://www.github.com/
## Folder created: www.github.com
## Crawler started from http://www.github.com/ with 2 depth crawl, and 0 second(s) delay.
## Step 1 completed with: 40 result(s)
## Step 2 completed with: 857 result(s)
## File created on /Users/user/TorCrawl.py/www.github.com/links.txt
Error: [Errno 2] No such file or directory: 'www.github.com/links.txt'
## Can't open: www.github.com/links.txt
Traceback (most recent call last):
  File "/Users/user/TorCrawl.py/torcrawl.py", line 210, in <module>
    main()
  File "/Users/user/TorCrawl.py/torcrawl.py", line 199, in main
    extractor(
  File "/Users/user/TorCrawl.py/modules/extractor.py", line 206, in extractor
    cinex(input_file, out_path, selection_yara)
  File "/Users/user/TorCrawl.py/modules/extractor.py", line 72, in cinex
    for line in file:
TypeError: 'type' object is not iterable

Browsing the newly created www.github.com folder confirms the mismatch: it contains a file called 20240626_links.txt rather than links.txt.
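The mismatch can be reproduced in isolation with a minimal sketch. The helper names dated_links_name and demo_mismatch are hypothetical and are not part of TorCrawl.py; the YYYYMMDD prefix is an assumption inferred from the 20240626_links.txt file name above.

```python
import datetime
import os
import tempfile
from typing import Tuple


def dated_links_name(today: datetime.date) -> str:
    """Hypothetical helper: build a name of the form <date>_links.txt.

    The YYYYMMDD format is assumed from the file name seen in the report.
    """
    return today.strftime("%Y%m%d") + "_links.txt"


def demo_mismatch() -> Tuple[bool, bool]:
    """Write the dated file, then check for both the dated and the plain name."""
    with tempfile.TemporaryDirectory() as out_path:
        # What the crawl step appears to write.
        written = os.path.join(out_path, dated_links_name(datetime.date(2024, 6, 26)))
        with open(written, "w", encoding="utf-8") as handle:
            handle.write("http://www.github.com/\n")

        # What the extract step tries to read.
        expected = os.path.join(out_path, "links.txt")
        return os.path.exists(written), os.path.exists(expected)


print(demo_mismatch())  # (True, False): the dated file exists, links.txt does not
```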

Expected behavior
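The extractor should read the file that the crawl step actually wrote (<date>_links.txt), so neither the "No such file or directory" error nor the TypeError should appear.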
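Independently of the file name, the log above shows execution continuing after the failed open until it hits the TypeError. A minimal sketch of stopping at the first failure instead is given below. The function read_links is hypothetical and is not the project's cinex; it only illustrates the idea.

```python
from typing import List


def read_links(input_file: str) -> List[str]:
    """Hypothetical reader: return the non-empty lines of input_file.

    If the file cannot be opened, exit with a single clear message
    instead of carrying on with an unusable file object.
    """
    try:
        with open(input_file, "r", encoding="utf-8") as handle:
            return [line.strip() for line in handle if line.strip()]
    except OSError as err:
        # SystemExit with a string prints the message and exits with status 1.
        raise SystemExit(f"## Can't open: {input_file} ({err})")
```

With this shape, a wrong path produces one error message and no traceback.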

Desktop (please complete the following information):

  • OS: macOS 14.5
  • Python Version: 3.12.4

Fix
The fix is straightforward. In torcrawl.py, the block

        if args.extract:
            input_file = out_path + "/links.txt"
            extractor(
                website, args.crawl, output_file, input_file, out_path, selection_yara
            )

should be replaced with

        if args.extract:
            input_file = out_path + "/" + now + "_links.txt"
            extractor(
                website, args.crawl, output_file, input_file, out_path, selection_yara
            )
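
A more defensive variant, sketched here only as an option, would look the file up instead of rebuilding its name. The function find_links_file is hypothetical and does not exist in TorCrawl.py. It assumes the date prefix has the YYYYMMDD form seen in 20240626_links.txt, so that sorting the names as strings also sorts them by date.

```python
import glob
import os
from typing import Optional


def find_links_file(out_path: str) -> Optional[str]:
    """Hypothetical lookup: pick the links file to feed to the extractor.

    Prefers the newest <date>_links.txt, falls back to a plain links.txt,
    and returns None if neither exists.
    """
    # "*_links.txt" matches dated names but not the plain "links.txt".
    dated = sorted(glob.glob(os.path.join(out_path, "*_links.txt")))
    if dated:
        # Assumed YYYYMMDD prefixes sort chronologically as strings.
        return dated[-1]

    plain = os.path.join(out_path, "links.txt")
    return plain if os.path.exists(plain) else None
```

The one-line change above is the smaller fix. A lookup like this would only matter if extraction had to run against a folder crawled on an earlier day.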

MikeMeliz self-assigned this Sep 4, 2024
MikeMeliz added the bug label Sep 4, 2024
yeheshuah added 3 commits to yeheshuah/TorCrawl.py that referenced this issue Oct 13, 2024