We will build the environment required for this document.
Anaconda is a Python distribution that makes it easy to create and manage environments for machine learning and data analysis. It also makes it easy to install packages such as RDKit, which is explained later.
- Why use Anaconda?
Python ships with a fairly large standard library, but the libraries for chemoinformatics have to be installed separately. This is no great burden once you are used to it, but it can be a hurdle for beginners. Anaconda exists to reduce that effort.
- There are two major versions of Python: 2.x and 3.x.
Support for Python 2.x ended in January 2020, so new learners have no reason to use it.
Now let's install Anaconda. Visit the official site and download the Python 3 installer for your environment. On Linux or Mac you can choose between a GUI installer and a command-line (CUI) installer; download the 64-bit command-line installer for Python 3.7.
$ bash ~/Downloads/Anaconda3-4.1.0-Linux-x86_64.sh # Please change the installer name accordingly
The installer starts with a welcome message. Press Enter to continue.
Welcome to Anaconda3 2018.12
In order to continue the installation process, please review the license
agreement.
Please, press ENTER to continue
>>>
Keep pressing Enter to page through the license, then type yes at the yes/no prompt.
Do you accept the license terms? [yes|no]
[no] >>>
You are then asked where to install Anaconda. The default location is usually fine, so press Enter.
Anaconda3 will now be installed into this location:
/Users/kzfm/anaconda3
- Press ENTER to confirm the location
- Press CTRL-C to abort the installation
- Or specify a different location below
At the end of the installation you are also asked whether to install VSCode. Answer no.
Thank you for installing Anaconda3!
===========================================================================
Anaconda is partnered with Microsoft! Microsoft VSCode is a streamlined
code editor with support for development operations like debugging, task
running and version control.
To install Visual Studio Code, you will need:
- Internet connectivity
Visual Studio Code License: https://code.visualstudio.com/license
Do you wish to proceed with the installation of Microsoft VSCode? [yes|no]
>>> Please answer 'yes' or 'no':
>>>
Once the Anaconda installation is complete, you will be able to use the 'conda' command from a command prompt or terminal.
Anaconda installs Python 3.7, but the latest RDKit distributed at the time of writing requires Python 3.6. We therefore create a virtual environment with conda and install the required Python version in it. The name after -n, here "py4chemoinformatics", can be anything you like. Once the virtual environment has been created and activated, install the packages used in this and later chapters.
$ conda create -n py4chemoinformatics python=3.6
$ source activate py4chemoinformatics # Mac/Linux
$ activate py4chemoinformatics # Windows
# install packages
$ conda install -c conda-forge rdkit
$ conda install -c conda-forge seaborn
$ conda install -c conda-forge ggplot
$ conda install -c conda-forge git
RDKit is one of the most commonly used toolkits in the field of chemoinformatics. It is open source software (OSS) and can be used free of charge. For more information, please refer to the Introduction.
seaborn is a package for visualizing statistical data.
ggplot is a graph drawing package whose strength is that plots are built with a consistent grammar. It was originally developed for the statistical analysis language R and was ported to Python by the company yhat.
Git is a version control system. This book does not explain Git, but if you have never used it, take a look at the "Git Primer That Even Monkeys Can Understand".
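After the installation, it can be useful to confirm that the packages are actually visible to the Python interpreter of the active environment. The following is a minimal sketch, not part of the original text: the function name `check_environment` is hypothetical, and it assumes that the import names match the package names used in the conda commands above (rdkit, seaborn, ggplot). It only checks that each package can be found; it does not import or exercise it.

```python
import importlib.util
import sys

# Package names taken from the conda install commands above.
# Assumption: each package is importable under the same name.
REQUIRED_PACKAGES = ["rdkit", "seaborn", "ggplot"]


def check_environment(packages=REQUIRED_PACKAGES):
    """Return a dict mapping each package name to True if Python can locate it."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in packages
    }


if __name__ == "__main__":
    # Print the interpreter version, then one line per package.
    print("Python version:", ".".join(str(n) for n in sys.version_info[:3]))
    for name, found in check_environment().items():
        print("{}: {}".format(name, "found" if found else "MISSING"))
```

Run the script with the virtual environment activated. If a package is reported as MISSING, the most likely causes are that the environment is not active or that the corresponding `conda install` step was skipped.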
As explained in the Introduction, the following command downloads all of the data for this book, including the PDF. Run it if you need the files.
$ git clone https://github.com/Mishima-syk/py4chemoinformatics.git
- Why create a virtual environment?
Some systems rely on Python internally to provide various features, so changing the system's Python version for the sake of one package can break things. Virtual environments avoid this. Even when packages require different library versions, you can set up a separate Python environment for each and experiment freely. When an environment is no longer needed, you can delete it without affecting the original installation. Because several independent development environments can coexist on one system, you are spared the library dependency conflicts and Python version differences that often come up during development.
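One way to see this isolation in practice is to ask Python which interpreter is running. The sketch below is an illustration under stated assumptions: the function name `describe_active_environment` is hypothetical, and it relies on conda setting the `CONDA_DEFAULT_ENV` environment variable when an environment is activated. If that variable is not set, the value is reported as None.

```python
import os
import sys


def describe_active_environment():
    """Collect basic facts about the Python interpreter that is running."""
    return {
        # Full path of the interpreter; differs for each virtual environment.
        "executable": sys.executable,
        # Root directory of the environment this interpreter belongs to.
        "prefix": sys.prefix,
        # Major and minor Python version, for example "3.6".
        "version": "{}.{}".format(sys.version_info[0], sys.version_info[1]),
        # Name of the active conda environment, or None if the variable is unset.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_active_environment().items():
        print("{}: {}".format(key, value))
```

Running this once inside the virtual environment and once outside it should show different interpreter paths, which is the point of keeping environments separate.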
This document uses a single virtual environment, but in practice it is common to create and work with many of them. The conda subcommands I use most often are listed below.
$ conda install <package-name> # Install a package
$ conda create -n <environment-name> python=<version> # Create a virtual environment
$ conda info -e # List the available virtual environments
$ conda remove -n <environment-name> --all # Delete a virtual environment
$ source activate <environment-name> # Enter a virtual environment (Mac/Linux)
$ activate <environment-name> # Enter a virtual environment (Windows)
$ source deactivate # Leave the virtual environment
$ conda list # List the libraries installed in the current virtual environment
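When the number of environments grows, it can be handy to read the environment list from a script. The following sketch assumes that `conda env list --json` prints a JSON object with an "envs" key holding the environment paths. The function names `parse_env_list` and `list_conda_envs` are hypothetical helpers introduced for illustration. The code avoids features newer than Python 3.6 so that it runs in the environment created above.

```python
import json
import shutil
import subprocess


def parse_env_list(json_text):
    """Extract environment paths from the JSON printed by `conda env list --json`."""
    data = json.loads(json_text)
    return list(data.get("envs", []))


def list_conda_envs():
    """Return the paths of all conda environments, or [] if conda is unavailable."""
    conda = shutil.which("conda")
    if conda is None:
        # conda is not on PATH, so there is nothing to query.
        return []
    result = subprocess.run(
        [conda, "env", "list", "--json"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    if result.returncode != 0:
        return []
    return parse_env_list(result.stdout)


if __name__ == "__main__":
    for path in list_conda_envs():
        print(path)
```

Separating the parsing from the subprocess call keeps the parsing step easy to check on its own, without needing conda to be installed.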