# 12/02/17: Machine Learning at Coursera: Week 10 Large Scale Machine Learning

• e.g. Census data, Website traffic data
• Can we train on 1,000 examples instead of 100,000,000? Plot a learning curve to decide
• If high variance, add more examples
• If high bias, add extra features

Gradient Descent with Large Datasets

• G.D. = batch gradient descent
• Stochastic Gradient Descent
• cost function = cost of theta w.r.t. a specific example (x^(i), y^(i)); measures how well the hypothesis does on that example
• May need to loop over the entire dataset 1-10 times

• Batch gradient descent: Use all m examples in each iteration
• Stochastic gradient descent: Use 1 example in each iteration
• Mini-batch gradient descent: Use b examples in each iteration
• typical range for b = 2-100 (10 maybe)
• Mini-batch Gradient Descent allows a vectorized implementation
• Can partially parallelize the computation
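The three variants above differ only in how many examples feed each parameter update. A minimal sketch in Python of the mini-batch case (illustrative only; the course uses MATLAB/Octave, and the linear-regression setting and function name here are assumptions):

```python
import random

def minibatch_gd(X, y, alpha=0.1, b=2, epochs=200, seed=0):
    """Mini-batch gradient descent for linear regression (illustrative sketch).
    X: list of feature vectors (leading 1.0 for the intercept), y: targets."""
    rng = random.Random(seed)
    n = len(X[0])
    theta = [0.0] * n
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)                      # shuffle the examples each pass
        for start in range(0, len(idx), b):   # take b examples at a time
            batch = idx[start:start + b]
            grad = [0.0] * n
            for i in batch:
                err = sum(t * xj for t, xj in zip(theta, X[i])) - y[i]
                for j in range(n):
                    grad[j] += err * X[i][j]
            for j in range(n):                # one update per mini-batch
                theta[j] -= alpha * grad[j] / len(batch)
    return theta
```

With b = 1 this reduces to stochastic gradient descent, and with b = m to batch gradient descent.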

Stochastic G.D. convergence

• every 1000 iterations we can plot the cost averaged over the last 1000 examples
• Learning rate: a smaller learning rate means smaller oscillations (plot)
• averaging over more examples, e.g. 5000, may give a smoother curve
• If curve is increasing, should use smaller learning rate
• Learning rate decay: alpha = const1 / (iterationNumber + const2)
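The decay schedule above can be sketched as a tiny helper; const1 and const2 are hypothetical hand-tuned values:

```python
def decayed_alpha(iteration, const1=5.0, const2=50.0):
    """Slowly decreasing learning rate so stochastic G.D. can settle near the
    minimum. const1 and const2 are illustrative, hand-tuned constants."""
    return const1 / (iteration + const2)
```

In practice many implementations keep alpha constant instead, accepting that theta will keep wandering around the minimum, because tuning const1 and const2 is extra work.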

Online Learning

• continuous stream of data
• e.g. 1. shipping service, from origin and destination, optimize the price we offer
• x = feature vector (price, origin, destination)
y = whether they chose to use our service or not
• e.g. 2. product search
• input: “Android phone 1080p camera”
• we want to offer 10 phones per query
• learning predicted click through rate (CTR)
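An online learner updates theta from each example as it streams in and then discards it. A sketch assuming logistic regression on the click / no-click outcome (function names are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def online_update(theta, x, y, alpha=0.1):
    """One stochastic gradient step on a single (x, y) as it arrives.
    y is 1 if the user clicked / bought, 0 otherwise (hypothetical CTR setup)."""
    err = sigmoid(sum(t * xi for t, xi in zip(theta, x))) - y
    return [t - alpha * err * xi for t, xi in zip(theta, x)]
```

Because each example is used once and thrown away, the model also adapts automatically if user preferences drift over time.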

Map Reduce and Data Parallelism

• Use local CPU to look at local data
• Massive data parallelism
• Free text, unstructured data
• sentiment analysis
• NoSQL
• MongoDB
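The map-reduce idea: each machine computes the gradient sum over its own shard of the data, and a master adds the partial sums. A single-process Python sketch of that structure (no real cluster here; the linear-regression setting is an assumed example):

```python
def partial_gradient(chunk, theta):
    """'Map' step: one machine sums the gradient over its shard of the data
    (linear regression, squared error)."""
    n = len(theta)
    grad = [0.0] * n
    for x, y in chunk:
        err = sum(t * xi for t, xi in zip(theta, x)) - y
        for j in range(n):
            grad[j] += err * x[j]
    return grad

def mapreduce_gradient(data, theta, machines=4):
    """'Reduce' step: the master sums the partial gradients from each machine."""
    chunks = [data[i::machines] for i in range(machines)]
    partials = [partial_gradient(c, theta) for c in chunks]  # parallel in practice
    return [sum(g[j] for g in partials) / len(data) for j in range(len(theta))]
```

Any algorithm whose main cost is a sum over the training set can be parallelized this way.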

# 10/02/17: Machine Learning at Coursera: Week 9

Anomaly Detection: Density Estimation

Anomaly Detection

Gaussian distribution

• e.g. aircraft engine features: heat generated, vibration intensity
• e.g. servers: memory usage, cpu load, cpu load / network traffic
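For examples like these, we fit a Gaussian to each feature using only normal examples, and flag a new example when the product of the per-feature densities falls below a threshold epsilon. A minimal sketch (in practice epsilon is chosen on a labelled cross-validation set):

```python
import math

def fit_gaussian(values):
    """Estimate mu and sigma^2 for one feature from normal examples."""
    m = len(values)
    mu = sum(values) / m
    var = sum((v - mu) ** 2 for v in values) / m
    return mu, var

def p(x, mu, var):
    """Univariate Gaussian density."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def is_anomaly(x, params, epsilon=1e-3):
    """Flag example x (one value per feature) when the product of the
    per-feature densities falls below the hand-tuned threshold epsilon."""
    prob = 1.0
    for xi, (mu, var) in zip(x, params):
        prob *= p(xi, mu, var)
    return prob < epsilon
```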

Building an Anomaly Detection system
Developing and Evaluating
vs. Supervised Learning
What Features to Use

Multivariate Gaussian Distribution

Recommender Systems Content Based Recommendations

• r(i, j): 1 if user j has rated movie i
• y(i, j): rating given by user j to movie i
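With this notation, the recommender cost sums squared errors over the rated entries only (those with r(i, j) = 1). A sketch of the collaborative-filtering cost with regularization included (lam and the plain-list representation are illustrative choices):

```python
def cf_cost(X, Theta, Y, R, lam=0.0):
    """Collaborative-filtering squared-error cost over rated entries only.
    X[i]    : feature vector of movie i,
    Theta[j]: parameter vector of user j,
    Y[i][j] : rating, R[i][j]: 1 iff user j rated movie i."""
    cost = 0.0
    for i in range(len(X)):
        for j in range(len(Theta)):
            if R[i][j] == 1:
                pred = sum(x * t for x, t in zip(X[i], Theta[j]))
                cost += (pred - Y[i][j]) ** 2
    cost /= 2.0
    cost += (lam / 2.0) * (sum(x * x for row in X for x in row)
                           + sum(t * t for row in Theta for t in row))
    return cost
```

Gradient descent on this cost learns X and Theta simultaneously.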

Collaborative Filtering

• Symmetry breaking

Low-rank Matrix Factorization

# 24/01/17: Machine Learning at Coursera: Week 8 Unsupervised Learning & K-means

• Clustering Algorithms, K-means Algorithm
• Centroids
• K-means for non-separated clusters
• Random initialisation
• Elbow method
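The K-means loop itself is short: assign every point to its nearest centroid, then move each centroid to the mean of its cluster. A sketch on 2-D points (the optional init parameter is added here for deterministic testing; the course initialises randomly from the data):

```python
import random

def kmeans(points, k, iters=20, init=None, seed=0):
    """Plain K-means on 2-D points (tuples). Assign each point to the
    nearest centroid, then move each centroid to its cluster's mean."""
    rng = random.Random(seed)
    centroids = list(init) if init else rng.sample(points, k)  # random init
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)   # nearest centroid wins
        for c in range(k):
            if clusters[c]:                       # keep old centroid if empty
                centroids[c] = tuple(sum(v) / len(clusters[c])
                                     for v in zip(*clusters[c]))
    return centroids
```

Because a bad random initialisation can land in a poor local optimum, the course recommends running K-means many times and keeping the lowest-cost result.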

Dimensionality Reduction

• 2D -> 1D
• Data Compression to speed up training, as well as visualization of complex datasets
• Indexes (e.g. GDP, Human Development Index)
• Principal Component Analysis (PCA), projection
• Data Preprocessing: scaling, normalization
• [U, S, V] = svd(Sigma), where Sigma is the covariance matrix
• columns of U = the principal components (eigenvectors of Sigma)
• Reconstruction from compressed representation
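For the 2D -> 1D case the whole pipeline can be written out by hand instead of calling svd: build the covariance matrix Sigma, take its top eigenvector u, project z = u'x, and reconstruct x_approx = u z. A sketch assuming the data is already mean-normalised:

```python
import math

def pca_1d(X):
    """PCA for 2-D points reduced to 1-D, with the 2x2 eigenproblem solved in
    closed form instead of calling svd. Assumes mean-normalised data.
    Returns the principal direction u, projections z, reconstructions x_approx."""
    m = len(X)
    a = sum(x[0] * x[0] for x in X) / m        # Sigma = X'X / m, entry (0,0)
    b = sum(x[0] * x[1] for x in X) / m        # off-diagonal entry
    c = sum(x[1] * x[1] for x in X) / m        # entry (1,1)
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)  # top eigenvalue
    u = (b, lam - a) if b else ((1.0, 0.0) if a >= c else (0.0, 1.0))
    norm = math.hypot(*u)
    u = (u[0] / norm, u[1] / norm)             # unit principal component
    z = [x[0] * u[0] + x[1] * u[1] for x in X]        # projection z = u'x
    x_approx = [(zi * u[0], zi * u[1]) for zi in z]   # reconstruction
    return u, z, x_approx
```

For higher dimensions the same steps apply, with svd doing the eigendecomposition.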

# 24/01/17: Machine Learning at Coursera: Week 7 This week is about Support Vector Machine (SVM).

First we will learn about Large Margin Classification, referring to the decision boundary that maximises the minimum distance to any of the training samples.

We will study Kernels, and the adaptation to non-linear classifiers.

Choosing landmarks will also be covered.

The C parameter will be studied.

Similarity functions and Gaussian Kernels are also key topics of this session.
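The Gaussian kernel measures the similarity of an example x to a landmark l as f = exp(-||x - l||^2 / (2 sigma^2)). A direct sketch:

```python
import math

def gaussian_kernel(x, landmark, sigma=1.0):
    """Similarity feature f between example x and landmark l:
    1 when x sits on the landmark, decaying toward 0 as x moves away."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-sq_dist / (2 * sigma ** 2))
```

Smaller sigma makes the decay sharper, so each landmark influences a narrower region.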

We will get good advice about using SVM vs Logistic Regression vs Neural Networks.

# 30/12/16: Machine Learning at Coursera: Week 6

Advice for Applying Machine Learning. What to try next? More samples? Smaller feature sets? More complex features? Decreasing lambda?

We will learn to use Test Sets and Cross Validation Set.

This lesson presents the powerful diagnostic of Bias vs. Variance.

Machine Learning System Design. Building a Spam Classifier: with this example we will learn to prioritize what to work on and make the best use of our time.

Plotting learning curves will help us to grade our work and pivot our working path.

We will also learn a method for Error Analysis: developing intuition by examining the examples the model gets wrong. A single numerical evaluation metric will be an important tool for us.

For Skewed Classes we will use Precision and Recall; the F1 Score measures the trade-off between them.
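All three metrics come straight from the counts of true positives, false positives and false negatives. A sketch treating 1 as the rare positive class:

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for a binary classifier with skewed classes."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted 1s, how many real
    recall = tp / (tp + fn) if tp + fn else 0.0      # of real 1s, how many caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

F1 is high only when precision and recall are both high, which is why it beats plain accuracy on skewed classes.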

# 29/12/16: Machine Learning at Coursera: Week 5 The fifth week of this course is about Neural Networks

The initial topic is the detailed study about Cost Function.

Back Propagation and Forward Propagation are also explained.

The second part of this session is about Back-propagation in Practice

The lesson covers Unrolling Parameters (into vectors). Using reshape in MATLAB.

Gradient Checking is explained; it is recommended to turn it off before training, since the numerical check is very slow.
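Gradient checking approximates each partial derivative with a two-sided difference and compares it against the back-propagation gradient. A sketch (eps = 1e-4 is the value suggested in the course):

```python
def numerical_gradient(J, theta, eps=1e-4):
    """Two-sided difference approximation of dJ/dtheta_j for each j.
    Used only to verify back-propagation; far too slow to leave on in training."""
    grad = []
    for j in range(len(theta)):
        plus = list(theta); plus[j] += eps
        minus = list(theta); minus[j] -= eps
        grad.append((J(plus) - J(minus)) / (2 * eps))
    return grad
```

If these numbers agree with back-propagation to several decimal places, the implementation is very likely correct.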

Random Initialization is the method used for Symmetry Breaking.

Last part of session is about putting all these together.

# 17/10/16: Machine Learning at Coursera: Week 4 This week was about Neural Networks

We started by reviewing Neurons and the Brain.

The model representation was about Input Layer, Hidden Layers, Output Layer.

Units (neurons) can be found in each layer.

Multi-class Classification is an application of NNs
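Forward propagation just applies each layer's weights and the sigmoid in turn, adding a bias unit at every layer. A sketch (the list-of-matrices representation is an assumption, not the course's MATLAB code):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, weights):
    """Forward propagation through fully connected layers.
    weights: one matrix per layer; each row starts with the bias weight."""
    a = x
    for W in weights:
        a = [1.0] + a                                   # prepend the bias unit
        a = [sigmoid(sum(w * ai for w, ai in zip(row, a))) for row in W]
    return a
```

For example, a single unit with weights (-30, 20, 20), as shown in the course, computes the logical AND of two binary inputs.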

# 14/10/16: Machine Learning at Coursera: Week 3 In the third week we started with Logistic Regression, used for Classification problems

In the context of the model representation we studied the Cost Function and Gradient Descent

Around Multi-class Classification we reviewed One-vs-all

Solving the Problem of Overfitting, we used Regularization
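Regularization adds a penalty on the parameters (conventionally skipping the intercept theta_0) to the logistic-regression cost. A sketch:

```python
import math

def logistic_cost(theta, X, y, lam=1.0):
    """Regularized logistic-regression cost J(theta).
    theta[0] (the intercept) is conventionally not regularized."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        h = 1.0 / (1.0 + math.exp(-sum(t * xj for t, xj in zip(theta, xi))))
        total += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    total /= m
    total += (lam / (2 * m)) * sum(t * t for t in theta[1:])  # penalty term
    return total
```

Larger lam shrinks the parameters and reduces overfitting; too large a lam causes underfitting instead.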

# 13/10/16: Machine Learning at Coursera: Week 2 This week we started exploring MATLAB.

Multivariate Linear Regression

We studied multiple features for linear regression. It is necessary to use Feature Scaling for better convergence. Adjusting the Learning Rate is another key action.

Polynomial Regression is a more complex model type.

Parameters can be computed analytically using the Normal Equation.
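The Normal Equation theta = (X'X)^-1 X'y needs no learning rate and no iteration. A sketch written out for two parameters so the 2x2 inverse can be done by hand (an illustrative special case; with more features you would solve the linear system instead):

```python
def normal_equation_2(X, y):
    """Normal equation for two parameters (intercept + one feature),
    using the closed-form inverse of the 2x2 matrix X'X."""
    a = sum(x[0] * x[0] for x in X); b = sum(x[0] * x[1] for x in X)
    c = b;                           d = sum(x[1] * x[1] for x in X)
    p = sum(x[0] * yi for x, yi in zip(X, y))   # X'y, first component
    q = sum(x[1] * yi for x, yi in zip(X, y))   # X'y, second component
    det = a * d - b * c                          # X'X must be invertible
    return [(d * p - b * q) / det, (a * q - c * p) / det]
```

The trade-off against gradient descent: the normal equation is exact and tuning-free, but inverting X'X becomes expensive when the number of features is large.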

MATLAB

Beyond Basic Operations, Plotting Data is a good resource to understand models, learning curves, algorithm behaviour.

Vectorization is also a necessary technique to get good algorithm efficiency.

# 13/10/16: Machine Learning at Coursera: Week 1 The first week presented Supervised and Unsupervised Learning. Regression and Classification were studied as applications of Supervised Learning. Cluster detection was mentioned as an Unsupervised Learning application.

Linear Regression

The model function and the cost function were explained.
It was nice to see how Parameter Learning was achieved with Gradient Descent.

Linear Algebra Review

The review covered Vectors and Matrices, and operations such as Scalar Multiplication, Matrix-Matrix Multiplication, and the Inverse and Transpose of a matrix.