a). Load data into scikit-learn; run many machine learning algorithms for both unsupervised and supervised data.
b). Assess model accuracy and performance.
c). Decide which model is best for each scenario.
1. Introduction to scikit-learn
b). Installing scikit-learn
c). Data Manipulation: from Pandas to scikit-learn
d). Creating Synthetic Data
2. Supervised Methods
a). Naive Bayes: Bernoulli – Multinomial
b). Detecting Spam in real SMS Kaggle data
c). Linear Support Vector Machines (SVM): SVC and LinearSVC
d). Linear Support Vector Machines (SVM): NuSVC
e). Logistic Regression
f). Predicting if income >50k using real US Census Data
g). Isotonic Regression
h). Linear Regression – Lasso – Ridge
i). Decision Trees
j). Introduction to Ensemble methods
k). Averaging Ensemble methods
l). Digital Classification via Random Forests
m). Boosting Ensemble methods
n). Grid Search Cross Validation
o). Predicting real house prices in the US using ExtraTreesRegressor
3. Unsupervised Methods
a). Density Estimation
b). Principal Components
c). K-means
f). Clustering and PCA on real countries data from Kaggle
g). Outlier detection
h). Novelty detection
This course will explain how to use scikit-learn to do advanced machine learning. If you are aiming to work as a professional data scientist, you need to master scikit-learn!
You are expected to have some familiarity with statistics and Python programming. You do not need to be an expert, but you should understand what a Gaussian distribution is, be able to write loops and functions in Python, and know the basics of maximum likelihood estimation. The course focuses entirely on the Python implementation, and the math behind it is omitted as much as possible.
We’ll start by explaining the machine learning problem, its methodology and its terminology, along with the differences between AI, machine learning (ML), statistics, and data mining. Scikit-learn, being a Python library, benefits from Python’s simplicity and power. We’ll explain how to install scikit-learn and its dependencies, and then show how to use Pandas data in scikit-learn while also benefiting from SciPy and NumPy. Finally, we’ll show how to create synthetic datasets with scikit-learn, specifically tailored for regression, classification and clustering.
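As a rough illustration of the synthetic-data and Pandas steps described above, here is a minimal sketch. The sample sizes, the column names (`f0` to `f3`, `target`) and the `random_state` values are arbitrary choices made for this example, not taken from the course material.

```python
import pandas as pd
from sklearn.datasets import make_blobs, make_classification, make_regression

# Regression: features plus a continuous target with some Gaussian noise.
X_reg, y_reg = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)

# Classification: features plus a binary (0/1) target.
X_clf, y_clf = make_classification(
    n_samples=100, n_features=4, n_informative=2, n_redundant=0, random_state=0
)

# Clustering: points drawn around three centers.
X_blobs, blob_labels = make_blobs(n_samples=100, centers=3, n_features=2, random_state=0)

# From Pandas to scikit-learn: keep the data in a DataFrame, then split it
# into a feature table X and a target column y that estimators accept directly.
df = pd.DataFrame(X_clf, columns=["f0", "f1", "f2", "f3"])  # hypothetical column names
df["target"] = y_clf
X = df.drop(columns="target")
y = df["target"]
```

scikit-learn estimators accept `X` and `y` in this form, so a dataset loaded with Pandas can usually be passed to `fit` once the target column has been separated from the features.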
Machine learning can be divided into two big groups: supervised and unsupervised learning. In supervised learning we have a target variable (which can be continuous or categorical) and we want to use certain features to predict it. Scikit-learn provides estimators for both classification and regression problems.
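The two supervised cases can be sketched as follows. This is only an illustration on synthetic data: the choice of `LogisticRegression` and `LinearRegression`, the 75/25 split and the dataset sizes are assumptions made for the example, not the course's own setup.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Classification: the target is categorical (here, two classes).
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # fraction of correct predictions on held-out data

# Regression: the target is continuous.
Xr, yr = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, test_size=0.25, random_state=0)
reg = LinearRegression().fit(Xr_train, yr_train)
r2 = reg.score(Xr_test, yr_test)  # coefficient of determination (R^2) on held-out data

print(f"classification accuracy: {accuracy:.2f}")
print(f"regression R^2: {r2:.2f}")
```

Both estimators follow the same `fit` / `predict` / `score` pattern, which is what makes it practical to try several models on the same data and compare their held-out performance.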
In unsupervised learning, on the other hand, we have a set of features but no outcome or target variable, and we attempt to learn from the data itself: whether it contains outliers, whether the observations fall into groups, whether some of the features can be removed, and so on. For example, we will see k-means, one of the simplest algorithms for clustering observations into groups.
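A minimal k-means sketch on synthetic data might look like this. The number of clusters, the `n_init` value and the dataset are assumptions chosen for illustration; with real data the number of groups is usually not known in advance.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Features only: the generated labels are discarded, since unsupervised
# learning has no target variable.
X, _ = make_blobs(n_samples=150, centers=3, n_features=2, random_state=0)

# Ask k-means for three groups; n_init restarts guard against a poor initialization.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # cluster index (0, 1 or 2) for each observation

print(kmeans.cluster_centers_)  # one row of coordinates per cluster center
```

Note that the cluster indices are arbitrary: they identify groups but carry no meaning of their own, unlike the class labels in a supervised problem.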
I try to keep this course as up to date as possible, especially since scikit-learn is constantly being updated. For example, neural networks were added in a recent release. I have tried to keep the examples as simple as possible, with small numbers of observations (samples) and features (variables).