Repo / Website for the WebDependencies project (IMC 2020) and follow-ons

synergylabs/Web-Dependencies

Development Document

The web application can be accessed at https://webdependency.andrew.cmu.edu

The repo consists of the following components

1. Measurement Modules

The scripts in this folder are basic setup scripts for the measurement module. The measurement module measures the third-party DNS, CDN and CA dependencies of the top 10K websites for a given country. The code of this measurement module is available at https://github.com/AqsaKashaf/Webdep.git

The general measurement workflow is the following:

  1. Fetch a list of popular websites from Google BigQuery, given a country c as input
    1. Please follow https://cloud.google.com/bigquery/docs/reference/libraries to set up Google BigQuery client authentication
    2. We are using a dedicated Gmail account for this project: [email protected]. Please contact Yuvraj, Vyas or Aqsa for its credentials.
    3. Once the credentials are retrieved and stored locally, export the path to the credential file as an environment variable: export GOOGLE_APPLICATION_CREDENTIALS="<path_to_credentials_json>"
  2. Measurement: retrieve service dependencies for all websites in the list returned from step 1
  3. Classify: Based on the measurement results, classify whether each service dependency is Private, Third-party, or unknown
  4. Measure redundancy: For DNS and CDN, measure whether a website uses multiple providers; for CA, measure OCSP stapling.
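
The redundancy step for DNS and CDN can be illustrated with a minimal sketch. It is not the code from the Webdep repo: the function names, the (domain, provider) input shape, and the sample providers are all hypothetical, and it only assumes that a website counts as redundant when it is observed with more than one distinct provider.

```python
from collections import defaultdict


def providers_per_site(records):
    """Group the distinct providers seen for each website.

    `records` is an iterable of (domain, provider) pairs. This input shape
    is an assumption made for illustration only.
    """
    providers = defaultdict(set)
    for domain, provider in records:
        providers[domain].add(provider)
    return dict(providers)


def is_redundant(providers, domain):
    """Return True if `domain` was observed with more than one provider."""
    return len(providers.get(domain, ())) > 1


# Hypothetical measurements: example.com uses two DNS providers.
sample = [
    ("example.com", "ProviderA"),
    ("example.com", "ProviderB"),
    ("example.org", "ProviderA"),
]
grouped = providers_per_site(sample)
print(is_redundant(grouped, "example.com"))  # True
print(is_redundant(grouped, "example.org"))  # False
```

OCSP stapling for CAs is a separate check and is not covered by this sketch.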

The setup script install-dep.sh installs dependencies required by the measurement module repo. The setup-cron.sh script schedules the measurements and then uploads the results to a global storage, which in our case is Box (see details below).

Request access to the following if you don't have permissions:

To run the measurement:

  1. Clone https://github.com/AqsaKashaf/Webdep.git
  2. Install dependencies: install-dep.sh and pipenv install
  3. Start Python virtual environment: pipenv shell
  4. Run scripts: python Webdep\<service folder>\get_<service>_details_all.py <country_code>
     This script fetches the Google CrUX list for "<country_code>" from the month before last and measures the dependencies of the listed websites.
     The output file name is: <country_code>-<service>-<YYYYMM>
     Each line is formatted as follows: <rank>,<domain_name>,<provider>,<provider_type>
    1. country_code is the two character code for the country, e.g. the country code for the United States is us
    2. service is DNS, CDN or CA.
  5. If you want to measure only for a single website, then run python Webdep\<service folder>\get_<service>_details_unit.py
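
The output format described above can be read back with a few lines of Python. This is a minimal sketch, not part of the Webdep repo: the names DependencyRecord, parse_line, month_before_last and output_file_name are hypothetical. It assumes that rank is an integer, that no field contains a comma, and that "second last month" means two calendar months before the current one.

```python
import datetime
from typing import NamedTuple


class DependencyRecord(NamedTuple):
    rank: int
    domain_name: str
    provider: str
    provider_type: str  # e.g. Private, Third-party, or unknown


def parse_line(line):
    """Parse one '<rank>,<domain_name>,<provider>,<provider_type>' line."""
    rank, domain_name, provider, provider_type = line.rstrip("\n").split(",")
    return DependencyRecord(int(rank), domain_name, provider, provider_type)


def month_before_last(today=None):
    """Return YYYYMM for the month two months before `today`."""
    if today is None:
        today = datetime.date.today()
    # Count months on a single axis so that year boundaries are handled.
    months = today.year * 12 + (today.month - 1) - 2
    return f"{months // 12:04d}{months % 12 + 1:02d}"


def output_file_name(country_code, service, yyyymm):
    """Build the '<country_code>-<service>-<YYYYMM>' file name."""
    return f"{country_code}-{service}-{yyyymm}"


record = parse_line("1,example.com,ExampleDNS,Third-party\n")
print(record.domain_name, record.provider_type)  # example.com Third-party
print(month_before_last(datetime.date(2023, 1, 15)))  # 202211
```

The exact casing of <service> in real file names is not specified here, so the sketch passes it through unchanged.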

2. Web Application

The web application is a React frontend application. The web application may load files in this repo or call the API Server to fetch files from CMU Box. The application uses Material UI with some Material Dashboard Components.

  • The source files for the home page are in src/layouts/home/.
  • The source files for region analysis and country analysis are in src/layouts/dashboard.
  • The source files for the side nav bar are in src/examples/Sidenav.
  • The source files for the top nav bar are in src/examples/Navbars/DashboardNavbar.

To run the application:

  1. Install dependencies: npm install
  2. Run application: npm start
  3. The application should be available at http://localhost:8080/
  4. To change the test port, see package.json
  5. To build: npm run build. Build settings can also be changed in package.json

3. Box File Server

The Box file server is a backend service written in Python Flask. It is used to fetch files from CMU Box. Currently the route /country/<country>/service/<service>/month/<month> is in use. It first looks for the requested file in the files directory and fetches it from Box if the file doesn't exist locally. The file server is deployed at http://webdependency.andrew.cmu.edu:5000/
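
The local-first lookup described above can be sketched in plain Python, without Flask or the Box SDK. This is an illustration rather than the server's actual code: get_measurement_file and the fetch_from_box callable are hypothetical, the local file name is assumed to follow <country>-<service>-<month>, and writing the downloaded file back into the files directory is an assumption the description does not confirm.

```python
from pathlib import Path


def get_measurement_file(country, service, month, files_dir, fetch_from_box):
    """Return file contents, preferring the local directory over Box.

    `fetch_from_box` is a hypothetical callable that takes a file name and
    returns the file contents as bytes.
    """
    name = f"{country}-{service}-{month}"  # assumed naming scheme
    local_path = Path(files_dir) / name
    if local_path.exists():
        # Serve the local copy when it is already present.
        return local_path.read_bytes()
    data = fetch_from_box(name)
    # Assumption: keep a local copy so that later requests skip Box.
    local_path.parent.mkdir(parents=True, exist_ok=True)
    local_path.write_bytes(data)
    return data
```

In a Flask view for /country/<country>/service/<service>/month/<month>, the three route parameters would be passed straight into a function like this one.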

Request access to the following if you don't have permissions:

Before running the server, you will need to fetch the client secret for the application:

  1. Go to https://cmu.app.box.com/developers/console/app/1897359/configuration
  2. Click "Fetch Client Secret" in "OAuth 2.0 Credentials" section
  3. Copy the client secret
  4. Create a new file in your home directory ~/.secrets/credentials.json
    1. Format:
      {
        "box_client": "<client_secret>"
      }
      
    2. If you create the file at another location, change the path for the credentials_file in the init_client function in box_client.py
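
Reading the secret back from that file might look like the sketch below. It is not the init_client function from box_client.py: load_client_secret and DEFAULT_CREDENTIALS_FILE are hypothetical names, and the only things taken from the steps above are the file location and the box_client key.

```python
import json
from pathlib import Path

# Location used in the steps above; pass another path if yours differs.
DEFAULT_CREDENTIALS_FILE = Path.home() / ".secrets" / "credentials.json"


def load_client_secret(credentials_file=DEFAULT_CREDENTIALS_FILE):
    """Read the Box client secret from a JSON credentials file."""
    with open(credentials_file, encoding="utf-8") as f:
        credentials = json.load(f)
    if "box_client" not in credentials:
        raise KeyError(f'"box_client" key missing in {credentials_file}')
    return credentials["box_client"]
```

Since this file holds a secret, it is worth keeping it out of the repository and readable only by your own user.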

To run the server:

  1. Go to the Box file server directory: cd box_file_server
  2. Install dependencies in the virtual environment: pipenv install
  3. Activate the Python virtual environment: pipenv shell
  4. Run the server: flask run
  5. The server should be available at http://localhost:5000
  6. To make it public use: flask run --host=0.0.0.0
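
Once the server is running, a file can be requested through the route mentioned earlier. The sketch below uses only the Python standard library. build_file_url and fetch_file are hypothetical helpers, and the example values for country, service and month are assumptions, since the exact formats the server expects are not specified here.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_file_url(base_url, country, service, month):
    """Build the /country/<country>/service/<service>/month/<month> URL."""
    parts = [quote(str(p), safe="") for p in (country, service, month)]
    path = "/country/{}/service/{}/month/{}".format(*parts)
    return base_url.rstrip("/") + path


def fetch_file(base_url, country, service, month, timeout=10):
    """Request a file from a running file server and return the raw bytes."""
    url = build_file_url(base_url, country, service, month)
    with urlopen(url, timeout=timeout) as response:
        return response.read()


print(build_file_url("http://localhost:5000", "us", "dns", "202301"))
# http://localhost:5000/country/us/service/dns/month/202301
```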
