
Installation Guide #8

Open
dlitmano opened this issue Nov 1, 2017 · 26 comments

@dlitmano

dlitmano commented Nov 1, 2017

Can someone explain to me how to use this crawler? I am very interested in it for my research project.

@BossMAN559

BossMAN559 commented Nov 1, 2017

I have been working on trying to figure out this project for a few days now.
There isn't really any information past some basic commands.
I have a lot of it working now, but still have a few things not working.
Here are the instructions I can give to get you to where I am.

install Ubuntu 17.10 server
to minimize config file changes I used the server name grunt and the login freshonions
install Tor and verify it connects and is working
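A sketch of that last step, assuming the Ubuntu tor package and its default SOCKS port 9050:

sudo apt-get install tor -y
sudo systemctl start tor
# the check page reports "Congratulations" only when the request really goes out over Tor
curl --socks5-hostname 127.0.0.1:9050 https://check.torproject.org/ | grep -i congratulations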

install MySQL
sudo apt-get install mysql-server

install PIP
sudo apt-get install python-pip -y

clone a copy of the freshonions
git clone https://github.com/dirtyfilthy/freshonions-torscraper.git

rename the folder
mv freshonions-torscraper torscraper
cd torscraper

install the requirements
sudo pip install -r requirements.txt
sudo apt-get install python-flask -y
pip install timeout-decorator
(I don't know why those last two get missed by the installer.)

setup MySQL
mysql -u root -prootpassword
CREATE DATABASE databasename;
exit

mysql -u root -prootpassword databasename < schema.sql
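To confirm the schema loaded, one quick check (same placeholder credentials and database name as above):

mysql -u root -prootpassword databasename -e 'SHOW TABLES;'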

nano etc/database
set database password and database name
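For reference, a hypothetical sketch of what etc/database might contain after this step; the exact variable names are assumptions on my part, so match them to the keys already in the file (DB_HOST=localhost is confirmed further down this thread):

# hypothetical example -- keep whatever keys your copy of etc/database already uses
export DB_HOST=localhost
export DB_USER=root
export DB_PASS=rootpassword
export DB_NAME=databasename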

install privoxy
sudo apt-get install privoxy
sudo nano /etc/privoxy/config
uncomment the line with forward-socks5 / 127.0.0.1:9050, as shown below
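Once uncommented, that line in /etc/privoxy/config looks like this (depending on your privoxy version the directive may be forward-socks5 or forward-socks5t; the trailing dot is required):

forward-socks5t / 127.0.0.1:9050 .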

restart the service
sudo systemctl restart privoxy

edit the proxy settings
nano etc/proxy
This is where I am getting a bit stuck.
export TOR_PROXY_PORT=8118 # is this the privoxy or the Tor port?
export TOR_PROXY_HOST=127.0.0.1
export http_proxy=http://127.0.0.1:8118 # is this the privoxy or the Tor port?
export https_proxy=https://127.0.0.1:3129
export SOCKS_PROXY=127.0.0.1:8118 # is this the privoxy or the Tor port?
HIDDEN_SERVICE_PROXY_HOST=127.0.0.1
HIDDEN_SERVICE_PROXY_PORT=8118 # is this the privoxy or the Tor port?

generate flask secret
mkdir /home/freshonions/torscraper/etc/private
touch /home/freshonions/torscraper/etc/private/flask.secret
./scripts/create_flask_secret.sh

./scripts/harvest.sh will start to pull information
./scripts/web.sh will start a web server at port 5000
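To sanity-check the web server from another terminal, assuming the Flask default of listening on port 5000:

curl -I http://localhost:5000/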

@BossMAN559

Okay, so I found out that it was elasticsearch not being installed/set up that was causing all the errors. After turning that off, it is now crawling the addresses.

@dlitmano

hey =)
Thank you for your guide, but I am having trouble with privoxy. I have no idea how to connect my crawler with privoxy and Tor together. My crawler can crawl normal pages but not .onion addresses.

@BossMAN559

BossMAN559 commented Nov 13, 2017

I seem to have gotten everything to work except the stats page.
In the etc/proxy file I set this:

export TOR_PROXY_PORT=8118
export TOR_PROXY_HOST=localhost
export http_proxy=http://localhost:8118
export https_proxy=https://localhost:8118
export SOCKS_PROXY=localhost:9050
HIDDEN_SERVICE_PROXY_HOST=localhost
HIDDEN_SERVICE_PROXY_PORT=9090

In the /etc/privoxy/config file I have these turned on:
listen-address 127.0.0.1:8118
listen-address [::1]:8118
forward-socks5t / 127.0.0.1:9050 .
forward-socks4 / 127.0.0.1:9050 .
forward-socks4a / 127.0.0.1:9050 .
forward 192.168.*.*/ .
forward localhost/ .
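With those settings you can verify the whole Privoxy-to-Tor chain from the shell; a quick check, assuming the listen-address above:

curl -x http://127.0.0.1:8118 https://check.torproject.org/ | grep -i congratulations
# prints the "Congratulations" line only if the request really went out over Tor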

@BossMAN559

Oh, if you install the newest build of Tor, 0.3.2.4-alpha, you can also index the new version 3 addresses.

@dlitmano

Running ./scripts/harvest.sh to start pulling information, I get:
failed to connect to onion.cab

@BossMAN559

Oh, I had that error as well. I removed the onion.cab line from the harvest script.

@dlitmano

Are you using it without elasticsearch?

@BossMAN559

I started with elasticsearch turned off, but I went back in and got it working.

install their key
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

install apt-transport
sudo apt-get install apt-transport-https

add their distribution information
echo "deb https://artifacts.elastic.co/packages/5.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-5.x.list

install
sudo apt-get update && sudo apt-get install elasticsearch

start the service
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable elasticsearch.service
sudo systemctl start elasticsearch.service
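Once started, a quick way to confirm elasticsearch actually came up (9200 is its default REST port):

sudo systemctl status elasticsearch.service
curl -X GET 'http://localhost:9200'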

@dlitmano

Thank you for the help. I will try it tomorrow and write you back ;)

@dlitmano

File "/usr/local/lib/python2.7/dist-packages/elasticsearch_dsl/document.py", line 410, in save
**doc_meta
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 73, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/init.py", line 300, in index
_make_path(index, doc_type, id), params=params, body=body)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 312, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 123, in perform_request
raise ConnectionError('N/A', str(e), e)
ConnectionError: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7f8540f8e190>: Failed to establish a new connection: [Errno -2] Name or service not known) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7f8540f8e190>: Failed to establish a new connection: [Errno -2] Name or service not known)

@dlitmano

Tried it with elasticsearch and got the new exception above.

@dlitmano

dlitmano commented Nov 14, 2017

sudo /etc/init.d/elasticsearch start
did not help

@dlitmano

And why do I need elasticsearch?

@BossMAN559

As it turns out, my install of elasticsearch won't start. It looks like it is a monster program that will take up all the resources on my system. When I first got it working it responded to my test command

curl -X GET 'http://localhost:9200'

but now it won't even start.
I just turned it back off. I would have erased the VM and started again, but my index has over 5000 onion pages listed and I want to keep that.

@zaranmd

zaranmd commented Nov 22, 2017

@dlitmano @BossMAN559 I finally got this project running, so I can explain elasticsearch. Elasticsearch is the module that provides search on the web interface. To install it, download elasticsearch from its site for your system: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/zip-targz.html
Then start elasticsearch with:
cd .../elasticsearch/bin
./elasticsearch
Once it is running you can reach it at localhost:9200 in the browser. Note that elasticsearch must keep running in the background (a terminal) while you search on the freshonions web interface, and you also need to enable it in the torscraper/etc/elasticsearch file.
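For reference, "enable it in torscraper/etc/elasticsearch" means editing that file; a hypothetical sketch, since the exact variable names are assumptions on my part (check your copy of the file for the real keys):

# hypothetical example -- use the keys already present in etc/elasticsearch
export ELASTICSEARCH_ENABLED=true
export ELASTICSEARCH_HOST=localhost
export ELASTICSEARCH_PORT=9200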

@zaranmd

zaranmd commented Nov 25, 2017

@BossMAN559 I fixed the stats page problem; it seems to be a MySQL config issue.
Open the MySQL config file as root, which is at /etc/mysql/mysql.cnf,
then add these two lines to it:
[mysqld]
sql-mode=""
After saving it, restart MySQL:
/etc/init.d/mysql restart
I hope this helps you...
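To confirm the change took effect after the restart, a quick check with the mysql client:

mysql -u root -p -e 'SELECT @@sql_mode;'
# should print an empty value once the new config is active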

@PatrickWilli

Hello guys,
thank you for posting your experiences while trying to get this running. It helped me a lot.
I have 1400 onions in the database, of which only 120 are alive. Is this normal? What percentage of your sites are up? I fear I messed something up. In the terminal I see no error messages, only lots of INFO and DEBUG output, which is probably normal.

@L3houx

L3houx commented Feb 23, 2018

I updated the readme so that you can follow it alone and everything should work. Here is the link to the updated README: https://github.com/GoSecure/freshonions-torscraper/blob/update-readme/README.md

Also, @PatrickWilli, it's not "normal"; if you follow my installation guide step by step, you should get a better ratio.

@thewuffwuff

@mrL3x Hi, I'm a student who is interested in this project but lacks computer knowledge. Would you guide me through the process of installing this program on my PC? Would it be able to run on Ubuntu installed in VMware? Thanks, I'd appreciate it so much if you could respond.

@L3houx

L3houx commented Feb 26, 2018

@thewuffwuff Hi, I can help you if you ask me questions in the issues section. You can install Ubuntu in VMware, but you will need resources to run this project on a virtual machine: at least 8 GB of RAM (minimum) and a good processor. Installing Ubuntu on VMware is really easy; a lot of people have already done it, so just take a look on YouTube to get started with VMware. After that I can help you, but with the new GoSecure readme file you should be fine without help. It describes step by step what you need to do.

The new readme file : https://github.com/GoSecure/freshonions-torscraper/blob/update-readme/README.md

@thewuffwuff

@mrL3x Thanks a lot for the reply. I had already installed Ubuntu in VMware before this. I tried to work through the readme first and faced some issues, mostly due to my lack of understanding. I will be contacting you again soon. ;)

@mrgterence

Hi, I would like to ask what is meant by
"nano etc/database"
"set database password and database name"
Can you explain it to me in detail?

I tried to run the commands below:
./scripts/harvest.sh
./scripts/web.sh
but they end with the error 'pony.orm.dbapiprovider.OperationalError: (2003, "Can't connect to MySQL server on 'groan' ([Errno -2] Name or service not known)")'
I tried to google it but did not find any solution.

Thanks in advance

@L3houx

L3houx commented Mar 26, 2018

Hi @mrgterence, you need to configure the database file so that the crawler can connect to your database. You will host a DB on your server to store the data, and you have credentials (user, password) to connect to it. The host will probably be 127.0.0.1 if you host the database locally. The user is the one you use to connect to your database, and the same goes for the password. For your second problem, I think there can be two reasons: first, you're not in the virtual environment; second, you didn't start the MySQL/MariaDB service. You should be able to configure the whole project with the updated readme file; if you can't, take a look at the basic concepts of database hosting to get a better understanding of this project.

If you have other questions you can ask me.

@icepaule

Hi L3houx,
this issue was solved on my end by adding "localhost" to etc/database:
DB_HOST=localhost

You might want to give it a try. :-)

Cheers
Marcus

@ghost

ghost commented Jan 6, 2019

@dlitmano you are the man!
