media scraper does not work #24

Open
james16000 opened this issue Jan 21, 2022 · 0 comments

Hi,

My environment:
Windows 7
Python 3.7.6
pip 21.3.1

The scraper does not work for Reddit, Twitter, Instagram, or TikTok.
After downloading the repository, I ran it from cmd to try it on Instagram, TikTok, Twitter, and Reddit.
This is the problem I see on Instagram.

The console output and screenshots of each problem follow.

C:\Users\pc\Desktop\media-scraper>python -m mediascraper.instagram instagram
Starting PhantomJS web driver...
.\webdriver/phantomjsdriver_2.1.1_win32/phantomjs.exe
C:\Users\pc\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
Crawling...
Traceback (most recent call last):
  File "C:\Users\pc\AppData\Local\Programs\Python\Python37-32\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python37-32\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\pc\Desktop\media-scraper\mediascraper\instagram.py", line 16, in <module>
    tasks = scraper.scrape(username)
  File "C:\Users\pc\Desktop\media-scraper\mediascrapers.py", line 262, in scrape
    tasks += (task[0], username, task[1])
IndexError: list index out of range

[Screenshot 1]

C:\Users\pc\Desktop\media-scraper>python m-scraper.py rq instagram instagram
Namespace(credential_file=None, early_stop=False, keywords=['instagram'], save_path=None)
Instagramer Task: instagram
Traceback (most recent call last):
  File "m-scraper.py", line 36, in <module>
    scraper.run(sys.argv[3:])
  File "C:\Users\pc\Desktop\media-scraper\m_scraper\rq\downloader.py", line 82, in run
    self.crawl(keyword, args.early_stop)
  File "C:\Users\pc\Desktop\media-scraper\m_scraper\rq\instagramer.py", line 46, in crawl
    tasks, end_cursor, has_next, length, user_id, rhx_gis, csrf_token = get_first_page(username)
  File "C:\Users\pc\Desktop\media-scraper\m_scraper\rq\utils\instagram.py", line 52, in get_first_page
    rhx_gis = shared_data['rhx_gis']
KeyError: 'rhx_gis'

[Screenshot 2]
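
For what it's worth, here is a minimal sketch of how I understand the KeyError above. It assumes shared_data is an ordinary dict parsed from the page and that it simply no longer contains an 'rhx_gis' entry; the sample dict and the helper name read_rhx_gis are made up for illustration and are not part of media-scraper.

```python
def read_rhx_gis(shared_data):
    """Return shared_data['rhx_gis'] if present, else None (hypothetical helper)."""
    # dict.get returns None for a missing key instead of raising KeyError
    return shared_data.get('rhx_gis')


# Stand-in for the parsed page data; the real contents are not shown in the log.
shared_data = {'config': {}}

try:
    shared_data['rhx_gis']  # same kind of lookup as instagram.py line 52
except KeyError as err:
    print('KeyError:', err)

print(read_rhx_gis(shared_data))
```

This only avoids the crash at that line. Whether the rest of the crawl can work without the value is something I cannot tell from the traceback.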

TikTok

C:\Users\pc\Desktop\media-scraper>python m-scraper.py rq tiktok tiktok
Namespace(credential_file=None, early_stop=False, keywords=['tiktok'], save_path=None)
{'statusCode': 10000, 'verifyConfig': {'code': 10000, 'type': 'verify', 'subtype': 'slide', 'fp': 'verify_dd9489ac31d2e6b50a4f9ed75b5240f2', 'region': 'va', 'detail': 'vEyCkJEKBnSe-zq257GFQJrLW03-aOs8awmNop3PD5IGQA4kjoDDIU6NDQKG7BnEsMWT8C-WHIUjsfHZ9OMgl9009Qcdo2LIOBhGJyNK118AOCRmw8StlADDjuzkZrFHFDTHnSgp2x651wwrNM6-FYFCOlP0izZx6n*pCjcMIM1sjOh0zwAye*FM5lPnHiVJ1eER3KmM*q6VpyCU*uNyTeYkaDpcFOMdgP3br0HlsWO--*jeaUPVnSjP8RejdrEQgq7oLsXM4rjjf14GhyWBa0H8kj*LODz42UoKrM32r4Fm6VjEAoEeRrjmHVUkwbwAptLOsfmREJTSdtNToMx6t4NqBXWm0mJ24vXdY9Txp83rH49pTmZE1wbEupTi18B1Tw..'}}
Traceback (most recent call last):
  File "m-scraper.py", line 36, in <module>
    scraper.run(sys.argv[3:])
  File "C:\Users\pc\Desktop\media-scraper\m_scraper\rq\downloader.py", line 82, in run
    self.crawl(keyword, args.early_stop)
  File "C:\Users\pc\Desktop\media-scraper\m_scraper\rq\tiktoker.py", line 38, in crawl
    raise Exception('body not found')
Exception: body not found
[Screenshot 3]
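
The response printed before the traceback looks like a verification (slide) challenge rather than the data the scraper expects, which may be why 'body not found' is raised. Below is a rough sketch of how such a response could be recognized. The function name is_verification_challenge is hypothetical, and the check is based only on the fields visible in my log, not on any documented TikTok format.

```python
def is_verification_challenge(response):
    """Return True if the response has the 'verifyConfig' shape seen in the log."""
    verify = response.get('verifyConfig')
    # Assumption: a dict with type == 'verify' marks a challenge response
    return isinstance(verify, dict) and verify.get('type') == 'verify'


# Trimmed copy of the response from the log ('fp' and 'detail' left out).
response = {
    'statusCode': 10000,
    'verifyConfig': {'code': 10000, 'type': 'verify', 'subtype': 'slide', 'region': 'va'},
}

if is_verification_challenge(response):
    print('verification challenge, subtype:', response['verifyConfig'].get('subtype'))
```

A check like this would at least allow a clearer error message than 'body not found'.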

Twitter

C:\Users\pc\Desktop\media-scraper>python -m mediascraper.twitter twitter
Starting PhantomJS web driver...
.\webdriver/phantomjsdriver_2.1.1_win32/phantomjs.exe
C:\Users\pc\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
Crawling...
Traceback (most recent call last):
  File "C:\Users\pc\AppData\Local\Programs\Python\Python37-32\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python37-32\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\pc\Desktop\media-scraper\mediascraper\twitter.py", line 18, in <module>
    tasks = scraper.scrape(username)
  File "C:\Users\pc\Desktop\media-scraper\mediascrapers.py", line 392, in scrape
    done = self.scrollToBottom()
  File "C:\Users\pc\Desktop\media-scraper\mediascrapers.py", line 87, in scrollToBottom
    last_height, new_height = self._driver.execute_script("return document.body.scrollHeight"), 0
  File "C:\Users\pc\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 636, in execute_script
    'args': converted_args})['value']
  File "C:\Users\pc\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: {"errorMessage":"Refused to evaluate a string as JavaScript because 'unsafe-eval' is not an allowed source of script in the following Content Security Policy directive: \"script-src 'self' 'unsafe-inline' https://*.twimg.com https://recaptcha.net/recaptcha/ https://www.google.com/recaptcha/ https://www.gstatic.com/recaptcha/ https://www.google-analytics.com https://twitter.com https://app.link https://accounts.google.com/gsi/client https://appleid.cdn-apple.com/appleauth/static/jsapi/appleid/1/en_US/appleid.auth.js 'nonce-MDk3YWFmNWEtOGRkZS00NGQzLWE2MjMtZjUzNzhjMGZhZGJl'\".\n","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Content-Length":"112","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:55938","User-Agent":"selenium/3.141.0 (python windows)"},"httpVersion":"1.1","method":"POST","post":"{\"script\": \"return document.body.scrollHeight\", \"args\": [], \"sessionId\": \"7c44f2b0-7a92-11ec-a2dc-d38583c010ce\"}","url":"/execute","urlParsed":{"anchor":"","query":"","file":"execute","directory":"/","path":"/execute","relative":"/execute","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/execute","queryKey":{},"chunks":["execute"]},"urlOriginal":"/session/7c44f2b0-7a92-11ec-a2dc-d38583c010ce/execute"}}
Screenshot: available via screen
[Screenshot 4]

I also tried setting the web driver's permissions to 777 for convenience:
C:\Users\pc\Desktop\media-scraper>chmod 777 webdriver/phantomjsdriver_2.1.1_win32/phantomjs.exe
'chmod' is not recognized as an internal or external command,
operable program or batch file.

[Screenshot 5]
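
chmod is a Unix command, so cmd on Windows does not recognize it. As an alternative, here is a small sketch that checks the driver file from Python. The helper name driver_status is made up for illustration, and the path is the one from my log, relative to the repository folder. As far as I understand, on Windows the executable check mostly confirms that the file exists, so treat the result as a rough hint only.

```python
import os


def driver_status(path):
    """Report whether the file at path exists and may be executed (hypothetical helper)."""
    return {
        'exists': os.path.isfile(path),
        # os.access asks the OS whether the current user may execute the file
        'executable': os.access(path, os.X_OK),
    }


# Path taken from the log above, relative to the repository folder.
driver_path = os.path.join('webdriver', 'phantomjsdriver_2.1.1_win32', 'phantomjs.exe')
print(driver_status(driver_path))
```

Since the log shows "Starting PhantomJS web driver..." followed by "Crawling...", the driver does seem to start, so file permissions are probably not the cause of the errors above.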

I wish this tool worked, because it would be the best tool on the internet.
Greetings to all.

@elvisyjlin
