Describe the bug
When using the extract option (-e), there is a file name mismatch: the software expects to read from a file called links.txt, but it writes a file named <date>_links.txt.
To Reproduce
To reproduce the problem, run one of the examples from the homepage (with minor modifications):
python3 torcrawl.py -v -u http://www.github.com/ -c -d 2 -p 0 -e -w
The output is:
## Your IP: A.B.C.D.
## URL: http://www.github.com/
## Folder created: www.github.com
## Crawler started from http://www.github.com/ with 2 depth crawl, and 0 second(s) delay.
## Step 1 completed with: 40 result(s)
## Step 2 completed with: 857 result(s)
## File created on /Users/user/TorCrawl.py/www.github.com/links.txt
Error: [Errno 2] No such file or directory: 'www.github.com/links.txt'
## Can't open: www.github.com/links.txt
Traceback (most recent call last):
  File "/Users/user/TorCrawl.py/torcrawl.py", line 210, in <module>
    main()
  File "/Users/user/TorCrawl.py/torcrawl.py", line 199, in main
    extractor(
  File "/Users/user/TorCrawl.py/modules/extractor.py", line 206, in extractor
    cinex(input_file, out_path, selection_yara)
  File "/Users/user/TorCrawl.py/modules/extractor.py", line 72, in cinex
    for line in file:
TypeError: 'type' object is not iterable
In fact, browsing the newly created www.github.com folder shows a file called 20240626_links.txt rather than simply links.txt.
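The mismatch can be illustrated with a minimal sketch. This is not TorCrawl's actual code: links_filename, write_links, and read_links are hypothetical stand-ins for the crawler and extractor steps, and the date format is assumed from the 20240626_links.txt name observed above. The idea is that the writer returns the path it really used, and the reader consumes that same path instead of rebuilding the name on its own.

```python
import datetime
import os
import tempfile


def links_filename(today=None):
    """Hypothetical helper: build the date-prefixed name, e.g. 20240626_links.txt."""
    today = today or datetime.date.today()
    return f"{today:%Y%m%d}_links.txt"


def write_links(out_dir, links, today=None):
    """Stand-in for the crawler step: write the links and return the path used."""
    path = os.path.join(out_dir, links_filename(today))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(links) + "\n")
    return path


def read_links(path):
    """Stand-in for the extractor step: read back exactly the path it is given."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        written = write_links(
            out_dir, ["http://www.github.com/"], datetime.date(2024, 6, 26)
        )
        print(os.path.basename(written))  # 20240626_links.txt
        # The plain name the reader was looking for never exists:
        print(os.path.exists(os.path.join(out_dir, "links.txt")))  # False
        print(read_links(written))
```

Passing the written path along (or deriving both names from one helper) keeps the two steps from drifting apart if the naming scheme changes again.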
Expected behavior
The extractor should read the file that was actually written, and that TypeError should not appear.
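As a sketch of why a missing file can end in "TypeError: 'type' object is not iterable" rather than a clean error, consider the pattern below. It rests on an assumption that is not confirmed by the traceback alone: that the variable is bound to a placeholder class before open() is attempted, and that the failure is only printed. read_lines_fragile and read_lines_strict are hypothetical names, not functions from TorCrawl.

```python
import io


def read_lines_fragile(path):
    """Assumed pattern: placeholder class, error printed, loop runs anyway."""
    handle = io.TextIOWrapper  # a class, not an open file
    try:
        handle = open(path, encoding="utf-8")
    except OSError as err:
        print(f"Error: {err}\n## Can't open: {path}")
    # If open() failed, handle is still the class, and iterating over it
    # raises TypeError: 'type' object is not iterable.
    return [line.rstrip("\n") for line in handle]


def read_lines_strict(path):
    """Alternative: let the FileNotFoundError propagate to the caller."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


if __name__ == "__main__":
    missing = "www.github.com/links.txt"
    try:
        read_lines_fragile(missing)
    except TypeError as err:
        print(f"fragile: {err}")
    try:
        read_lines_strict(missing)
    except FileNotFoundError as err:
        print(f"strict: {err}")
```

With the strict variant the user sees only the missing-file error, which points directly at the file name mismatch.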
Desktop (please complete the following information):
OS: macOS 14.5
Python Version: 3.12.4
Fix
The fix is quite straightforward. In torcrawl.py, the line [...] should be replaced with [...].