blog-post-intro-to-machine-learning

http://blog.algorithmia.com/introduction-machine-learning-developers/

Unsupervised Machine Learning

Training data is unlabeled beforehand

Used to find the underlying structure of data based on statistical properties such as frequency

Used for exploratory analysis to find previously unrecognized patterns

Used when there is a lack of labeled training documents

Example: hierarchical clustering

Example: k-means clustering

Example: maximum entropy

K-means clustering

Used for relationship discovery and understanding the underlying structure of data

Useful for unlabeled data as a first round of analysis

The target number of clusters, k, must be specified manually

k-means makes no assumptions about the data: it starts from random seeds and follows an iterative process that eventually converges.

k-means is an unsupervised clustering algorithm that uses a distance metric with the goal of minimizing the Euclidean distance from each data point to a centroid, remeasuring and reassigning each data point to a centroid on each iteration.

The algorithm partitions n observations into k clusters, with each observation belonging to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
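A minimal NumPy sketch of the loop described above (random initial centroids, assign each point to its nearest centroid by Euclidean distance, recompute the centroids, repeat until convergence); the toy data, the seed, and the choice of k = 2 are illustrative assumptions only.

    import numpy as np

    def kmeans(X, k, n_iters=100, seed=0):
        rng = np.random.default_rng(seed)
        # Random seeds: pick k data points as the initial centroids.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iters):
            # Assign each point to the nearest centroid (Euclidean distance).
            distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = distances.argmin(axis=1)
            # Recompute each centroid as the mean of its assigned points
            # (a robust version would also handle empty clusters).
            new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new_centroids, centroids):  # converged
                break
            centroids = new_centroids
        return labels, centroids

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(size=(50, 2)), rng.normal(size=(50, 2)) + 5])
    labels, centroids = kmeans(X, k=2)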

Supervised Learning

The two main types of supervised machine learning are regression and classification.

For instance, a regression model is used for the prediction of continuous data, such as predicting housing prices based on historical data points and trends.

A classification model is used for the prediction of categorical data, for example assigning discrete class labels in an image classification model that labels an image as a person or a landscape.
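A hedged illustration of the regression/classification distinction, assuming scikit-learn is available; the toy numbers, feature meanings, and labels are made up for the example.

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    # Regression: predict a continuous value (e.g. a price) from a feature.
    X_reg = np.array([[50], [80], [120], [200]])             # e.g. square metres
    y_reg = np.array([150_000, 240_000, 330_000, 540_000])   # e.g. prices
    price_model = LinearRegression().fit(X_reg, y_reg)
    print(price_model.predict([[100]]))                      # continuous output

    # Classification: predict a discrete class label from features.
    X_clf = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.2], [0.8, 0.1]])
    y_clf = np.array(["landscape", "landscape", "person", "person"])
    label_model = LogisticRegression().fit(X_clf, y_clf)
    print(label_model.predict([[0.85, 0.15]]))               # discrete label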

Types of supervised algorithms

There are many types of supervised algorithms available. One of the most popular is the Naive Bayes model, which is often a good starting point for developers since the underlying probabilistic model is fairly easy to understand and easy to execute.

Decision trees are also a predictive model. They come in two types: regression trees (which take continuous values) and classification trees (which take finite values), and they use a divide-and-conquer strategy that recursively separates the data to generate the tree.
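A brief sketch of the two tree variants, assuming scikit-learn; the tiny datasets are placeholders.

    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    # Classification tree: predicts one of a finite set of labels.
    clf = DecisionTreeClassifier(max_depth=3)
    clf.fit([[0, 0], [1, 1], [1, 0], [0, 1]], [0, 1, 1, 0])
    print(clf.predict([[1, 1]]))

    # Regression tree: predicts a continuous value.
    reg = DecisionTreeRegressor(max_depth=3)
    reg.fit([[1], [2], [3], [4]], [1.5, 3.1, 4.4, 6.2])
    print(reg.predict([[2.5]]))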

Neural networks are models inspired by how biological neural networks solve problems, and they can be either supervised or unsupervised. Supervised neural networks have a known output and are built in layers of interconnected weighted nodes, with an output layer that gives us a known output such as an image label.
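A minimal supervised neural-network sketch, using scikit-learn's MLPClassifier as an assumed stand-in for a layered network trained against known outputs; the XOR-style toy data and the hidden-layer size are illustrative.

    from sklearn.neural_network import MLPClassifier

    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 1, 1, 0]                          # known outputs (labels)
    net = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs",
                        max_iter=2000, random_state=0)
    net.fit(X, y)                             # adjust the weighted nodes
    print(net.predict([[1, 0]]))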

Naive Bayes classification

Naive Bayes classification is an algorithm that attempts to make predictions based on previously labeled data using a probabilistic model.

Features are assumed to be independent of each other, meaning that one feature doesn't impact the value of another feature, and a set of labels is considered and assigned in advance.

Naive Bayes: feature detection is decided in advance
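A minimal Naive Bayes sketch, assuming scikit-learn; the word-count features and the category labels are made up to illustrate how labels are assigned in advance and predictions come from the probabilistic model.

    from sklearn.naive_bayes import MultinomialNB

    # Each row is a document's word counts; labels are assigned in advance.
    X_train = [[3, 0, 1], [2, 0, 0], [0, 2, 3], [0, 3, 1]]
    y_train = ["sports", "sports", "politics", "politics"]

    model = MultinomialNB().fit(X_train, y_train)
    print(model.predict([[1, 0, 0]]))         # most likely label
    print(model.predict_proba([[1, 0, 0]]))   # class probabilities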

Conventional Validation

Training set: fit the model based on known data

Validation set: used for parameter tuning – choose model complexity

Hyperparameters: tuning can be done by setting different values and choosing which tests better, or via statistical methods

Hyperparameters: number of clusters in k-means – in our k-means example we used the elbow method

Hyperparameters: number of leaves in a decision tree

Test set: assess the model after it has been run on the training set – run a confusion matrix to find errors and compare models
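A hedged sketch of this conventional split, assuming scikit-learn is available; the Iris data, the split proportions, and the decision-tree model are illustrative choices.

    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import confusion_matrix
    from sklearn.datasets import load_iris

    X, y = load_iris(return_X_y=True)

    # First carve off a test set, then split the rest into train and validation.
    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

    # Validation set: choose model complexity (here, tree depth) by validation score.
    best_depth = max(range(1, 8),
                     key=lambda d: DecisionTreeClassifier(max_depth=d, random_state=0)
                                       .fit(X_train, y_train).score(X_val, y_val))

    # Test set: assess the chosen model and inspect a confusion matrix.
    final = DecisionTreeClassifier(max_depth=best_depth, random_state=0).fit(X_train, y_train)
    print(confusion_matrix(y_test, final.predict(X_test)))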

Cross validation

Cross-validation methods help to understand how a model will generalize to unseen data and are used for smaller datasets.

k-fold cross-validation

leave-one-out cross-validation

leave-p-out cross-validation

repeated random sub-sampling validation

K-fold cross validation

K-fold cross-validation follows these steps:

K-fold cross-validation: the training data is split into k subsets (folds) – one fold serves as the test set while the remaining folds are used for training, repeated so each fold is used as the test set once.

K-fold cross-validation: the error rate is averaged over the rounds to estimate model performance.
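A minimal k-fold cross-validation sketch, assuming scikit-learn; the Iris data, the Gaussian Naive Bayes model, and k = 5 folds are illustrative choices.

    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.naive_bayes import GaussianNB
    from sklearn.datasets import load_iris

    X, y = load_iris(return_X_y=True)
    kf = KFold(n_splits=5, shuffle=True, random_state=0)

    scores = []
    for train_idx, test_idx in kf.split(X):
        # One fold is held out as the test set; the rest are used for training.
        model = GaussianNB().fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))

    # Average the per-fold accuracy to estimate how the model generalizes.
    print(np.mean(scores))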

