
Installation Guide #8

Open
dlitmano opened this issue Nov 1, 2017 · 26 comments

@dlitmano

dlitmano commented Nov 1, 2017

Can someone explain to me how to use this crawler? I am very interested in it for my research project.

@BossMAN559

BossMAN559 commented Nov 1, 2017

I have been working on trying to figure out this project for a few days now.
There isn't really any information past some basic commands.
I have a lot of it working now, but still have a few things not working.
Here are the instructions I can give to get you to where I am.

install Ubuntu 17.10 server
to minimize config file changes I used the server name grunt and the login freshonions
install Tor and verify it connects and is working
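A sketch of that last step, assuming the Ubuntu tor package and its default SOCKS port 9050:

sudo apt-get install tor -y
sudo systemctl start tor
# the check page reports "Congratulations" only when the request really goes out over Tor
curl --socks5-hostname 127.0.0.1:9050 https://check.torproject.org/ | grep -i congratulations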

install MySQL
sudo apt-get install mysql-server

install PIP
sudo apt-get install python-pip -y

clone a copy of the freshonions
git clone https://github.com/dirtyfilthy/freshonions-torscraper.git

rename the folder
mv freshonions-torscraper torscraper
cd torscraper

install the requirements
sudo pip install -r requirements.txt
sudo apt-get install python-flask -y
pip install timeout-decorator
(I don't know why those last two get missed by the installer.)

setup MySQL
mysql -u root -prootpassword
CREATE DATABASE databasename;
exit

mysql -u root -prootpassword databasename < schema.sql
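To confirm the schema loaded, one quick check (same placeholder credentials and database name as above):

mysql -u root -prootpassword databasename -e 'SHOW TABLES;'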

nano etc/database
set database password and database name
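For reference, a hypothetical sketch of what etc/database might contain after this step; the exact variable names are assumptions on my part, so match them to the keys already in the file (DB_HOST=localhost is confirmed further down this thread):

# hypothetical example -- keep whatever keys your copy of etc/database already uses
export DB_HOST=localhost
export DB_USER=root
export DB_PASS=rootpassword
export DB_NAME=databasename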

install privoxy
sudo apt-get install privoxy
sudo nano /etc/privoxy/config
uncomment the line with forward-socks5 / 127.0.0.1:9050, as shown below
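Once uncommented, that line in /etc/privoxy/config looks like this (depending on your privoxy version the directive may be forward-socks5 or forward-socks5t; the trailing dot is required):

forward-socks5t / 127.0.0.1:9050 .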

restart the service
sudo systemctl restart privoxy

edit the proxy settings
nano etc/proxy
This is where I am getting a bit stuck.
export TOR_PROXY_PORT=8118 # is this the privoxy or the Tor port?
export TOR_PROXY_HOST=127.0.0.1
export http_proxy=http://127.0.0.1:8118 # is this the privoxy or the Tor port?
export https_proxy=https://127.0.0.1:3129
export SOCKS_PROXY=127.0.0.1:8118 # is this the privoxy or the Tor port?
HIDDEN_SERVICE_PROXY_HOST=127.0.0.1
HIDDEN_SERVICE_PROXY_PORT=8118 # is this the privoxy or the Tor port?

generate flask secret
mkdir /home/freshonions/torscraper/etc/private
touch /home/freshonions/torscraper/etc/private/flask.secret
./scripts/create_flask_secret.sh

./scripts/harvest.sh will start to pull information
./scripts/web.sh will start a web server at port 5000
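To sanity-check the web server from another terminal, assuming the Flask default of listening on port 5000:

curl -I http://localhost:5000/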

@BossMAN559

Okay, so I found out that it was elasticsearch not being installed/set up that was causing all the errors. After turning that off, it is now crawling the addresses.

@dlitmano

hey =)
Thank you for your guide, but I am having trouble with privoxy. I have no idea how to connect my crawler with privoxy and Tor together. My crawler can crawl normal pages but not .onion addresses.

@BossMAN559

BossMAN559 commented Nov 13, 2017

I seem to have gotten everything to work except the stats page.
In the etc/proxy file I set this:

export TOR_PROXY_PORT=8118
export TOR_PROXY_HOST=localhost
export http_proxy=http://localhost:8118
export https_proxy=https://localhost:8118
export SOCKS_PROXY=localhost:9050
HIDDEN_SERVICE_PROXY_HOST=localhost
HIDDEN_SERVICE_PROXY_PORT=9090

In the /etc/privoxy/config file I have these turned on:
listen-address 127.0.0.1:8118
listen-address [::1]:8118
forward-socks5t / 127.0.0.1:9050 .
forward-socks4 / 127.0.0.1:9050 .
forward-socks4a / 127.0.0.1:9050 .
forward 192.168.*.*/ .
forward localhost/ .
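With those settings you can verify the whole Privoxy-to-Tor chain from the shell; a quick check, assuming the listen-address above:

curl -x http://127.0.0.1:8118 https://check.torproject.org/ | grep -i congratulations
# prints the "Congratulations" line only if the request really went out over Tor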

@BossMAN559

Oh, if you install the newest build of Tor, 0.3.2.4-alpha, you can also index the new version 3 addresses.

@dlitmano

Running ./scripts/harvest.sh to start pulling information, I get:
failed to connect to onion.cab

@BossMAN559

Oh, I had that error as well. I removed the onion.cab line from the harvest script.

@dlitmano

Are you using it without elasticsearch?

@BossMAN559

I started with elasticsearch turned off, but I went back in and got it working.

install their key
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

install apt-transport
sudo apt-get install apt-transport-https

add their distribution information
echo "deb https://artifacts.elastic.co/packages/5.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-5.x.list

install
sudo apt-get update && sudo apt-get install elasticsearch

start the service
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable elasticsearch.service
sudo systemctl start elasticsearch.service
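Once started, a quick way to confirm elasticsearch actually came up (9200 is its default REST port):

sudo systemctl status elasticsearch.service
curl -X GET 'http://localhost:9200'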

@dlitmano

Thank you for the help. I will try it tomorrow and write you back ;)

@dlitmano

File "/usr/local/lib/python2.7/dist-packages/elasticsearch_dsl/document.py", line 410, in save
**doc_meta
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 73, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/init.py", line 300, in index
_make_path(index, doc_type, id), params=params, body=body)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 312, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 123, in perform_request
raise ConnectionError('N/A', str(e), e)
ConnectionError: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7f8540f8e190>: Failed to establish a new connection: [Errno -2] Name or service not known) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7f8540f8e190>: Failed to establish a new connection: [Errno -2] Name or service not known)

@dlitmano

Tried it with elasticsearch and got the new exception above.

@dlitmano

dlitmano commented Nov 14, 2017

sudo /etc/init.d/elasticsearch start
did not help

@dlitmano

And why do I need elasticsearch?

@BossMAN559

As it turns out, my install of elasticsearch won't start. It looks like it is a monster program that will take up all the resources on my system. When I first got it working it responded to my test command

curl -X GET 'http://localhost:9200'

but now it won't even start.
I just turned it back off. I would have erased the VM and started again, but my index has over 5000 onion pages listed and I want to keep that.

@zaranmd

zaranmd commented Nov 22, 2017

@dlitmano @BossMAN559 I finally got this project running, so I can explain elasticsearch. Elasticsearch is the module that provides search on the web interface. To install it, download elasticsearch from its site for your system: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/zip-targz.html
Then start elasticsearch with:
cd .../elasticsearch/bin
./elasticsearch
Once it is running you can reach it at localhost:9200 in the browser. Note that elasticsearch must keep running in the background (a terminal) while you search on the freshonions web interface, and you also need to enable it in the torscraper/etc/elasticsearch file.
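For reference, "enable it in torscraper/etc/elasticsearch" means editing that file; a hypothetical sketch, since the exact variable names are assumptions on my part (check your copy of the file for the real keys):

# hypothetical example -- use the keys already present in etc/elasticsearch
export ELASTICSEARCH_ENABLED=true
export ELASTICSEARCH_HOST=localhost
export ELASTICSEARCH_PORT=9200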

@zaranmd

zaranmd commented Nov 25, 2017

@BossMAN559 I fixed the stats page problem; it seems to be a MySQL config issue.
Open the MySQL config file as root, which is at /etc/mysql/mysql.cnf,
then add these two lines to it:
[mysqld]
sql-mode=""
After saving it, restart MySQL:
/etc/init.d/mysql restart
I hope this helps you...
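To confirm the change took effect after the restart, a quick check with the mysql client:

mysql -u root -p -e 'SELECT @@sql_mode;'
# should print an empty value once the new config is active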

@PatrickWilli

Hello guys,
thank you for posting your experiences while trying to get this running. It helped me a lot.
I have 1400 onions in the database, of which only 120 are alive. Is this normal? What percentage of your sites are up? I fear I messed something up. In the terminal I see no error messages, only lots of INFO and DEBUG output, which is probably normal.

@L3houx

L3houx commented Feb 23, 2018

I updated the readme so that you can follow it alone and everything should work. Here is the link to the updated README: https://github.com/GoSecure/freshonions-torscraper/blob/update-readme/README.md

Also, @PatrickWilli, it's not "normal"; if you follow my installation guide step by step, you should get a better ratio.

@thewuffwuff

@mrL3x Hi, I'm a student who is interested in this project but lacks computer knowledge. Would you guide me through the process of installing this program on my PC? Would it be able to run on Ubuntu installed in VMware? Thanks, I'd appreciate it so much if you could respond.

@L3houx

L3houx commented Feb 26, 2018

@thewuffwuff Hi, I can help you if you ask me questions in the issues section. You can install Ubuntu in VMware, but you will need resources to run this project on a virtual machine: at least 8 GB of RAM (minimum) and a good processor. Installing Ubuntu on VMware is really easy; a lot of people have already done it, so just take a look on YouTube to get started with VMware. After that I can help you, but with the new GoSecure readme file you should be fine without help. It describes step by step what you need to do.

The new readme file : https://github.com/GoSecure/freshonions-torscraper/blob/update-readme/README.md

@thewuffwuff

@mrL3x Thanks a lot for the reply. I had already installed Ubuntu in VMware before this. I tried to work through the readme first and faced some issues, mostly due to my lack of understanding. I will be contacting you again soon. ;)

@mrgterence

Hi, I would like to ask what is meant by
"nano etc/database"
"set database password and database name"
Can you explain it to me in detail?

I tried to run the commands below:
./scripts/harvest.sh
./scripts/web.sh
but they end with the error 'pony.orm.dbapiprovider.OperationalError: (2003, "Can't connect to MySQL server on 'groan' ([Errno -2] Name or service not known)")'
I tried to google it but did not find any solution.

Thanks in advance

@L3houx

L3houx commented Mar 26, 2018

Hi @mrgterence, you need to configure the database file so that the crawler can connect to your database. You will host a DB on your server to store the data, and you have credentials (user, password) to connect to it. The host will probably be 127.0.0.1 if you host the database locally. The user is the one you use to connect to your database, and the same goes for the password. For your second problem, I think there can be two reasons: first, you're not in the virtual environment; second, you didn't start the MySQL/MariaDB service. You should be able to configure the whole project with the updated readme file; if you can't, take a look at the basic concepts of database hosting to get a better understanding of this project.

If you have other questions you can ask me.

@icepaule

Hi L3houx,
this issue was solved on my end by adding "localhost" to etc/database:
DB_HOST=localhost

You might want to give it a try. :-)

Cheers
Marcus

@ghost

ghost commented Jan 6, 2019

@dlitmano you are the man!
