
PRMLT

Introduction

This Matlab package implements machine learning algorithms described in the great textbook: Pattern Recognition and Machine Learning by C. Bishop (PRML).

It is written purely in the Matlab language, is self-contained, and has no external dependencies.

Note: this package requires Matlab R2016b or later, since it uses implicit expansion (a.k.a. broadcasting), a syntax introduced in that release. It also requires the Statistics Toolbox (for a few random number generators) and the Image Processing Toolbox (for reading image data).
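For readers more familiar with NumPy, implicit expansion behaves like NumPy broadcasting; a minimal Python sketch of the idea (an analogy, not part of the package):

```python
import numpy as np

# A 3x1 column and a 1x4 row expand elementwise to a 3x4 result --
# the same behavior Matlab R2016b+ calls implicit expansion.
col = np.array([[1.0], [2.0], [3.0]])        # shape (3, 1)
row = np.array([[10.0, 20.0, 30.0, 40.0]])   # shape (1, 4)
grid = col + row                             # shape (3, 4)
print(grid.shape)  # → (3, 4)
```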

Design Goal

  • Succinct: The code is extremely compact; minimizing code length is a major goal. As a result, the core of each algorithm is easy to spot.
  • Efficient: Many tricks for speeding up Matlab code are applied (e.g. vectorization and matrix factorization). Functions in this package are often orders of magnitude faster than the Matlab built-ins (e.g. kmeans).
  • Robust: Many tricks for numerical stability are applied, such as computing probabilities in the logarithm domain and using square-root matrix updates to enforce matrix symmetry and positive definiteness.
  • Readable: The code is heavily commented, the corresponding formulas in PRML are annotated, and symbols are kept in sync with the book.
  • Practical: The package is not only readable, but also meant to be easily used and modified to facilitate ML research. Many functions in this package are already widely used (see Matlab File Exchange).
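As an illustration of the log-domain trick mentioned above, here is a generic log-sum-exp in Python (a sketch of the standard technique, not the package's own implementation):

```python
import numpy as np

def log_sum_exp(x):
    """Compute log(sum(exp(x))) without overflow by shifting by the maximum."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

# Naively exponentiating large negative log-probabilities underflows to zero;
# the shifted form stays finite and accurate.
logp = np.array([-1000.0, -1001.0, -1002.0])
print(log_sum_exp(logp))  # finite, equals -1000 + log(1 + e^-1 + e^-2)
```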

Installation

  1. Download the package to a local folder (e.g. ~/PRMLT/) by running:
git clone https://github.com/PRML/PRMLT.git
  2. Run Matlab, navigate to the folder (~/PRMLT/), then run the init.m script.

  3. Run some demos in the ~/PRMLT/demo folder. Enjoy!

Feedback

If you find any bug or have any suggestion, please file an issue. I am grateful for any feedback and will do my best to improve this package.

License

Released under the MIT license.

Contact

sth4nth at gmail dot com

PRMLT's People

Contributors

cheerconi, sth4nth, weilinear


PRMLT's Issues

Label output from logitBin.m seems inverted

Hi Mo,

First, thanks so much for your efforts here.

I want to ask whether there is a problem with logitBinPred.m: the output seems inverted (i.e. 0 should be 1, and 1 should be 0).

I need to delve further into the script itself, but please run this test script as an illustrative example.

I create a vector of labels, and features that are linearly related to the labels.

The logistic regression should be able to easily classify these features; I've confirmed with MATLAB built-in functions and other tools like libsvm.

If I invert the output from logitBinPred.m, the performance dramatically improves to expected levels.

Am I missing something? I'll explore your script more.

Thanks.
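The reported symptom is easy to reproduce with a generic sanity check (a Python illustration, not the package's code): on easily separable data, a correct logistic-regression predictor scores near 100% accuracy, while one whose outputs are inverted scores near 0%.

```python
import numpy as np

# Hypothetical setup mirroring the report: linearly separable 1-D data.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 100), rng.normal(2, 0.5, 100)])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Plain logistic regression fit by gradient descent (bias + slope).
X = np.column_stack([np.ones_like(x), x])
w = np.zeros(2)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / len(y)

pred = (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(int)
acc = (pred == y).mean()
# A correct predictor scores near 1.0 here; an inverted one would score 1 - acc.
print(acc)
```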

chapter10: EP

Add expectation propagation for MRF and factor graph (maybe also GMM).

Shouldn't there be biases in the example from Chapter 5?

I am currently unable to make it work for a regression problem. Your example (a classification problem) runs well for me, but I wonder whether it should have bias terms for better performance, as the book recommends.
Thank you in advance, and best regards!

chapter08: BP

Add belief propagation for MRF and factor graph (maybe also Bayesian Network).

Add subspace method for learning LDS

The EM method for learning an LDS in PRML hardly works without a good initialization. The subspace method can be used to obtain one. This method is not in PRML but in BRML.

Exercise 03

Answer B-1 (Project Topic):

Timothée and I have decided to form a team. For our project, we would like to implement a facial emotion recognition social application based on facial pictures. This app would work as follows on the user's side:

  • The user takes a picture of her face expressing a specific emotion
  • Our software automatically detects the expressed emotion
  • The user is asked to confirm the detected emotion
  • The user chooses which of her friends to share the emotion with
  • The user sends the picture of herself
  • Her friends receive the labelled picture and can reply with another picture.

We found the following page, which lists many facial-recognition-oriented databases:
http://www.face-rec.org/databases/

Amongst them, one stood out for us: the Extended Cohn-Kanade Dataset.
(Database direct link).

There is at least one associated paper.
See page 4 and after for the methodology used.

The software would probably be divided into two main parts:

  • Detection of meaningful "emotion-points" on the picture and computation of their relative distances (normalized to remove size effects)
    • Method: probably SVM
    • Test method: run tests on the previously cited database (Cohn-Kanade)
  • Emotion classification based on the relative distances of the different "emotion-points"
    • Method: k-nearest neighbors?
    • Test method: on the previously cited database (Cohn-Kanade)

All operations will probably be performed on a grayscale version of the picture, maybe compressed for performance reasons.
Notice that both parts of the software can be independently tested and developed, which is a great advantage.
Moreover, in case we lack the time to finish the whole project, we can at least finish one or two independent parts, which are still useful on their own.
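The classification stage above (k-nearest neighbors on "emotion-point" distances) can be illustrated generically in Python; the feature vectors and labels below are invented for illustration and not taken from any dataset.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    """Classify a query vector by majority vote among its k nearest training points."""
    d = np.linalg.norm(train_X - query, axis=1)
    nearest = train_y[np.argsort(d)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

# Toy "emotion-point distance" features: two clusters standing in for two emotions.
train_X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
train_y = np.array([0, 0, 1, 1])  # 0 = "neutral", 1 = "smile" (hypothetical labels)
print(knn_predict(train_X, train_y, np.array([0.85, 0.85])))  # → 1
```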

The face database used for classification will most likely be stored on a remote server, which will also handle at least the intensive computation and maybe the communication (sending and receiving pictures) too. In the long term, this application could enable smartphone companies to monitor their customers' moods based on images from their cameras (with their agreement, of course). This way, they could adjust the behaviour of their software depending on how the client is feeling. The application could also be useful for autistic people who have a hard time reading emotions from faces; it could help them interact better with the people around them.

Answer B-2:

The first step is to load the packages ggplot2, dplyr, tidyr, maps, ggmap and scales. Then, we filter only the longitude and latitude of the crimes that occurred in 2015, capture the map from Google and generate the scatterplot as follows:

remove(list=ls())
if (Sys.getenv("JAVA_HOME")!="")
  Sys.setenv(JAVA_HOME="")
library(rJava)
library(ggplot2)
library(dplyr)
library(tidyr)
library(maps)
library(ggmap)
library(scales)
library(MASS)
library(rgl)
ch <- read.table("https://dl.dropboxusercontent.com/s/c5c6un2m1fv0a37/ch.txt?dl=0", sep = "\t", header = T)
## Selecting only crimes of year 2015:
ch$Year[ch$Year != 2015] <- NA
ch <- na.omit(ch)
## Removing empty locations and splitting Location into Latitude and Longitude:
ch$Location[ch$Location == ''] <- NA
ch <- na.omit(ch)
ch <- ch %>% extract(Location, c('Latitude', 'Longitude'), '\\(([^,]+), ([^)]+)\\)')
ch$Longitude <- round(as.numeric(ch$Longitude), 2)
ch$Latitude <- round(as.numeric(ch$Latitude), 2)
lon <- ch$Longitude
lat <- ch$Latitude
mymap = get_map(location = c(mean(lon), mean(lat)), source = "google", zoom = 11)

Next, we extract the frequency of crimes that happened at each geographical coordinate of Chicago, create the heatmap plot and indicate the crimes in red in a 2D picture saved in PNG format in the MTH6312 folder on the desktop:

## Get crime locations:
locationCrimes <- as.data.frame(table(ch$Longitude, ch$Latitude))
names(locationCrimes) <- c('long', 'lat', 'Frequency')
locationCrimes$long <- as.numeric(as.character(locationCrimes$long))
locationCrimes$lat <- as.numeric(as.character(locationCrimes$lat))
locationCrimes <- subset(locationCrimes, Frequency > 0)

## 2D Plotting the location heatmap
png(filename = "C:/Users/User/Desktop/MTH6312/Chicagomap.png", width = 800, height = 600, units = "px")
ggmap(mymap) + geom_tile(data = locationCrimes, aes(x = long, y = lat, alpha = Frequency), fill = "red") + theme(axis.title.y = element_blank(), axis.title.x = element_blank())
dev.off()

The created map is:

Answer B-3:

Here, the univariate densities of 2015 crimes over latitude and longitude are given by the following code:

plot(density(lat),main = "Univariate Density of Lat.")
plot(density(lon),main = "Univariate Density of Lon.")

Answer B-4:

Finally, the 3D plot is generated using the following code:

## 3D Plotting the location heatmap:
z = kde2d(lon, lat, n = 50)
persp(z, xlab = "Longitude", ylab = "Latitude", zlab = "Density", phi = 45, shade = 0.35)

Lastly, to overlay the 3D density directly on the Google map using the rgl package:

mymap2 = as.matrix(mymap)
nx = dim(mymap2)[2]
ny = dim(mymap2)[1]
xmin = min(lon) 
xmax = max(lon) 
ymin = min(lat)
ymax = max(lat)
xc = seq(xmin, xmax, len = ny)
yc = seq(ymin, ymax, len = nx)
colours = matrix(mymap2, ny, nx)
m = matrix(0, ny, nx)
surface3d(xc, yc, m, col = colours)
#density
z = kde2d(lon, lat, n = 50)
z$z = z$z/100
surface3d(z$x,z$y,z$z, col = colours)

Chapter 7: Using similar data to the demo, but it does not work

Hello, thank you for all your efforts.
Recently I ran into a problem with the rvmBinFp function. I used my own data with size 21000 and labels with 11000. But something happened and the output model.index is 3, while in your demo it's 3. Therefore, the binPlot function doesn't work.
Could you please take a look for me? I uploaded data.mat to Google Drive. Thanks.

%% RVM for classification
load('data.mat');
[model, llh] = rvmBinFp(X, y);
plot(llh);
y = rvmBinPred(model,X)+1;
figure;
binPlot(model,X,y);
---------------------
Index exceeds matrix dimensions.

Error in binPlot (line 25)
y = w(1)*x1+w(2)*x2+w(3);

Chapter 02: logGauss error

There are some errors in logGauss.m

X = [1 2 4]; mu = 1; sigma = 2;
logGauss(X, mu, sigma)

The result is

   -1.2655   -1.5155   -3.5155

while using the built-in function normpdf provides a different result

normpdf(X, mu, sigma)

ans =

    0.1760    0.1995    0.1210
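For what it's worth, the reported logGauss values are exactly what one gets if the scale argument is interpreted as a variance rather than a standard deviation; a Python check of this interpretation (an assumption about the cause, not the package's code):

```python
import numpy as np

def log_gauss_var(x, mu, var):
    """Log-density of N(mu, var), reading the scale parameter as a variance."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

x = np.array([1.0, 2.0, 4.0])
# Reproduces the reported logGauss output when sigma=2 is treated as the variance:
print(np.round(log_gauss_var(x, 1.0, 2.0), 4))  # → [-1.2655 -1.5155 -3.5155]
```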

Chapter 01, relatEntropy: Cannot operate Px and Py together

I have been testing this library with two dummy matrices:

a = [[10, 73, 55, 83, 84, 2, 9, 8, 88, 48];
[21, 76, 4, 31, 100, 51, 63, 44, 91, 97];
[80, 19, 82, 53, 40, 45, 10, 79, 94, 11];
[69, 57, 60, 63, 90, 86, 4, 31, 41, 55];
[47, 7, 41, 1, 54, 51, 33, 74, 29, 31];
[90, 69, 65, 78, 9, 12, 65, 11, 77, 16];
[41, 67, 37, 22, 42, 80, 59, 18, 95, 43];
[66, 14, 100, 21, 2, 94, 68, 93, 28, 33];
[96, 90, 71, 80, 91, 86, 80, 79, 38, 56];
[31, 17, 20, 27, 38, 16, 39, 78, 28, 19]];

b = [[22, 34, 28, 56, 62, 72, 51, 32, 14, 6];
[57, 30, 53, 75, 19, 33, 23, 46, 2, 52];
[91, 51, 34, 28, 22, 4, 84, 54, 37, 12];
[70, 85, 28, 39, 81, 82, 12, 79, 2, 65];
[89, 95, 38, 76, 11, 5, 100, 42, 48, 17];
[75, 85, 99, 50, 68, 30, 76, 91, 14, 7];
[45, 55, 76, 67, 20, 31, 52, 13, 85, 56];
[12, 53, 45, 91, 38, 54, 48, 88, 96, 15];
[94, 53, 85, 34, 54, 13, 28, 66, 20, 37];
[60, 63, 51, 30, 26, 80, 97, 44, 74, 21]];

All functions work fine with them, except for relatEntropy. In lines 21 and 22 the vectors end up with different lengths (due to different numbers of nonzero values), so they cannot be subtracted in line 24. MATLAB throws: Matrix dimensions must agree. Error in relatEntropy (line 24) z = -dot(Px,log2(Py)-log2(Px));
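A generic way to avoid the mismatched-length failure described above is to use a single support mask for both distributions. A Python sketch of relative entropy under this convention (an illustration of the fix, not the package's relatEntropy):

```python
import numpy as np

def relative_entropy(p, q):
    """KL(p || q) in bits, restricted to the support of p; assumes q > 0 wherever p > 0."""
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0                  # one shared mask keeps both vectors aligned
    return float(np.dot(p[mask], np.log2(p[mask] / q[mask])))

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.25, 0.25, 0.5])
print(relative_entropy(p, q))  # → 1.0
```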

Tweak NN

  1. Add a bias term
  2. Correct the unit of the gradient

LDS EM is numerically unstable

The EM algorithm for fitting an LDS described in PRML (ch. 13) is (very) numerically unstable: the covariance matrices often become singular during iterations.

There is no easy way to fix this. Two options are:
(1) Leave it as it is in the book, so that people can learn from it but not use it in practice.
(2) Write a stable algorithm, which is quite different from the one in the book.

The current status is (1).

Errors in code for chapter 03

>> demo
Error using regress (line 62)
Y must be a vector and must have the same number of rows as X.

Error in demo (line 12)
model = regress(X, t);

Compatibility with GNU Octave

Most of the code, including the demos, runs on Octave, but not everything.
E.g. randg, as used in chapter03/linRnd.m, doesn't work (even with the statistics forge package).
