In this tutorial, we will complete a small end-to-end machine learning project using scikit-learn (scikit-learn.org), a comprehensive yet simple library and one of the most useful machine learning libraries for Python.
On a small dataset, we will go through the typical pipeline of a real machine learning project: start with statistical summaries and visualizations of the data, build several different machine learning models, use cross-validation to estimate their accuracy, select the best algorithm, and then make predictions on a validation set and evaluate them.
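To make the pipeline concrete, here is a minimal sketch of those steps. It rests on several assumptions that are not part of the session description: the Iris dataset bundled with scikit-learn stands in for the small dataset, the three candidate models are arbitrary picks, and the 80/20 split and 10 folds are illustrative values. The plotting step is left out to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: Iris is used as a stand-in for the tutorial's small dataset.
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# 1. Statistical summary of the features (count, mean, std, quartiles, ...).
summary = X.describe()
print(summary)

# 2. Hold out a validation set that is not touched during model selection.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

# 3. Several candidate models (an illustrative choice, not a fixed list).
models = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=1),
}

# 4. Estimate each model's accuracy with 10-fold cross-validation.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
    cv_scores[name] = scores.mean()
    print(f"{name}: mean CV accuracy = {cv_scores[name]:.3f}")

# 5. Select the model with the best mean CV accuracy, refit it on the
#    full training set, and evaluate it on the held-out validation set.
best_name = max(cv_scores, key=cv_scores.get)
best_model = models[best_name].fit(X_train, y_train)
val_accuracy = accuracy_score(y_val, best_model.predict(X_val))
print(f"Best model: {best_name}, validation accuracy = {val_accuracy:.3f}")
```

Keeping the validation set out of the cross-validation loop means the final accuracy is measured on data that played no part in choosing the model.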
At the end of the session, we may look at some of the other useful functions built into scikit-learn.
The following tools will be used in this code clinic:
Python3 – www.python.org
Python SciPy libraries: scipy, numpy, matplotlib, pandas, sklearn (the import name of scikit-learn)
You can stick with your favourite Python IDE; I will be working in Spyder – www.spyder-ide.org, which I highly recommend as an IDE for R users who are starting with Python and moving over from RStudio.
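Before the session, it may help to confirm that the tools are installed. The following is a small sketch, assuming the libraries are importable under the names listed above; it prints the Python version and the version of each library, and raises an ImportError if one is missing.

```python
import importlib
import sys

# Import names of the libraries listed above.
libraries = ["scipy", "numpy", "matplotlib", "pandas", "sklearn"]

print("Python:", sys.version.split()[0])

# Import each library and record its reported version string.
versions = {}
for name in libraries:
    module = importlib.import_module(name)
    versions[name] = module.__version__
    print(f"{name}: {versions[name]}")
```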