Environment Setup
In this chapter, you will learn how to set up a working environment for Python machine learning on your local computer.
Libraries and Packages
To understand machine learning, you need to have basic knowledge of Python programming. In addition, there are a number of libraries and packages generally used in performing various machine learning tasks as listed below −
- numpy − provides N-dimensional array objects
- pandas − a data analysis library that includes dataframes
- matplotlib − a 2D plotting library for creating graphs and plots
- scikit-learn − provides the algorithms used for data analysis and data mining tasks
- seaborn − a data visualization library based on matplotlib
Installation
You can install the software for machine learning using either of the two methods discussed here −
Method 1
Download and install Python separately from python.org on various operating systems as explained below −
To install Python after downloading, double click the .exe (for Windows) or .pkg (for Mac) file and follow the instructions on the screen.
For Linux OS, check if Python is already installed by using the following command at the prompt −
>$ python --version
If Python 2.7 or later is not installed, install Python with the distribution’s package manager. Note that the command and the package name vary from one distribution to another.
On Debian derivatives such as Ubuntu, you can use apt −
>$ sudo apt-get install python3
Now, open the command prompt and run the following command to verify that Python is installed correctly −
>$ python3 --version
>Python 3.6.2
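The same check can also be made from inside Python, which is convenient at the top of a script. The following is a minimal sketch; the helper name `python_is_at_least` and the 3.6 threshold are illustrative assumptions, not requirements stated by any of the libraries.

```python
import sys


def python_is_at_least(major, minor):
    """Return True if the running interpreter is at least version major.minor."""
    # sys.version_info compares like a tuple, so (3, 6, 2, ...) >= (3, 6) holds.
    return sys.version_info >= (major, minor)


if __name__ == "__main__":
    print("Running Python", sys.version.split()[0])
    if not python_is_at_least(3, 6):  # threshold chosen for illustration only
        print("This interpreter is older than the version shown in this chapter.")
```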
Similarly, you can download and install the necessary libraries such as numpy and matplotlib individually with an installer like pip. For this purpose, use the commands shown here −
>$ pip install numpy
>$ pip install matplotlib
>$ pip install pandas
>$ pip install seaborn
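When several Python installations exist on one machine, a bare `pip` may belong to a different interpreter than the one you actually run. One common way to avoid this is to invoke pip as a module of the current interpreter. The sketch below builds that command; the function name `pip_install_command` is hypothetical, and the install step is left commented out so that nothing is downloaded unless you opt in.

```python
import subprocess
import sys

PACKAGES = ["numpy", "matplotlib", "pandas", "seaborn"]


def pip_install_command(packages):
    """Build a pip command bound to the interpreter running this script."""
    # "python -m pip" targets the environment of this specific interpreter.
    return [sys.executable, "-m", "pip", "install", *packages]


if __name__ == "__main__":
    command = pip_install_command(PACKAGES)
    print("Would run:", " ".join(command))
    # Uncomment the next line to actually perform the installation.
    # subprocess.run(command, check=True)
```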
Method 2
Alternatively, to install Python together with other scientific computing and machine learning packages, you can install the Anaconda distribution. It is a Python distribution for Linux, Windows, and OS X that bundles machine learning packages such as numpy, scikit-learn, and matplotlib. It also includes Jupyter Notebook, an interactive Python environment. You can install Python 2.7 or any 3.x version as required.
To download the free Anaconda Python distribution from Continuum Analytics (since renamed Anaconda, Inc.), do the following −
Visit the official site and open its download page. Note that the installation may take 15-20 minutes because the installer contains Python, associated packages, a code editor, and some other files. Depending on your operating system, choose the installer as explained here −
For Windows − Select the Anaconda for Windows section and look in the column with Python 2.7 or 3.x. You can find that there are two versions of the installer, one for 32-bit Windows, and one for 64-bit Windows. Choose the relevant one.
For Mac OS − Scroll to the Anaconda for OS X section. Look in the column with Python 2.7 or 3.x. Note that here there is only one version of the installer: the 64-bit version.
For Linux OS − Select the Anaconda for Linux section. Look in the column with Python 2.7 or 3.x.
Make sure that Anaconda’s Python distribution installs into a single directory of its own, so that it does not affect other Python installations, if any, on your system.
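One way to confirm which installation is active is to ask the interpreter where it lives. A minimal sketch follows; the function name `describe_interpreter` is an illustrative assumption, and the "conda" substring test is only a rough heuristic applied to the reported path and version string.

```python
import sys


def describe_interpreter():
    """Collect the values that identify the running Python installation."""
    return {
        "executable": sys.executable,        # full path of the python binary
        "prefix": sys.prefix,                # root directory of this installation
        "version": sys.version.split()[0],   # e.g. "3.6.3"
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
    # Heuristic only: Anaconda installs often show "conda" in the path or banner.
    looks_like_conda = "conda" in sys.prefix.lower() or "conda" in sys.version.lower()
    print("Looks like an Anaconda install:", looks_like_conda)
```

If the printed prefix points outside the Anaconda directory, the shell is picking up a different Python installation.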
To work with graphs and plots, we will need these Python library packages – matplotlib and seaborn.
If you are using Anaconda Python, your system already has numpy, matplotlib, pandas, seaborn, etc. installed. Start the Anaconda Navigator to access either Jupyter Notebook or the Spyder IDE.
After opening either of them, type the following commands −
>import numpy
>import matplotlib
Now, check whether the installation was successful. For this, go to the command line and type in the following command −
>$ python
>Python 3.6.3 |Anaconda custom (32-bit)| (default, Oct 13 2017, 14:21:34)
>[GCC 7.2.0] on linux
Next, you can import the required libraries and print their versions as shown −
>>>> import numpy
>>>> print(numpy.__version__)
>1.14.2
>>>> import matplotlib
>>>> print(matplotlib.__version__)
>2.1.2
>>>> import pandas
>>>> print(pandas.__version__)
>0.22.0
>>>> import seaborn
>>>> print(seaborn.__version__)
>0.8.1
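An interactive session like this one stops with an ImportError at the first missing library. As an alternative, the sketch below reports every package in one pass using the standard library's `importlib.metadata` module, which assumes Python 3.8 or later. The function name `installed_versions` is hypothetical, and the package list simply mirrors the libraries named in this chapter.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used by pip; note that scikit-learn is imported as "sklearn".
PACKAGES = ["numpy", "pandas", "matplotlib", "scikit-learn", "seaborn"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, found in installed_versions(PACKAGES).items():
        print(f"{name:<14}{found or 'not installed'}")
```

The version numbers printed on your system will differ from those shown in the session, depending on when the packages were installed.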