Spring 2017
Updated notes will be available here as ppt and pdf files after the lecture. Older lecture notes are provided before the class for students who want to consult them before the lecture. Pointers to relevant material will also be made available.
I assume you look at least at the Reading and the starred (*) references.
The dates next to the lecture notes are tentative; some of the material as well as the order of the lectures may change during the semester.

Lecture #0: Course Introduction and
Motivation, pdf
Reading: Mitchell, Chapter 1

Lecture #1: Introduction to Machine
Learning, pdf
Also see: Weather/Whether Example
Reading: Mitchell, Chapter 2

Tutorial: Building a Classifier with Learning Based Java,
pdf,
pdf2
Walkthrough on using LBJava with examples.

Lecture #2: Decision Trees, pdf
Additional notes: Experimental
Evaluation
Reading: Mitchell, Chapter 3
References

J. Quinlan, "Induction of Decision Trees". Machine Learning, 1:81-106,
1986.

(*)
R. Rivest, "Learning Decision Lists". Machine Learning, 2(3):229-246,
1987.
(link)

J. Quinlan and R. Rivest, "Inferring Decision Trees Using the Minimum
Description Length Principle". Information and Computation,
80:227-248, 1989.

T. Dietterich, "Approximate Statistical Tests for Comparing Supervised
Classification Learning Algorithms", Neural Computation 10(7), 1998.
 Learning Rules + ILP
(used to be Lecture #3, will not be covered in Fall 2016)
Reading: Mitchell, Chapter 10
References

(*)
W. Cohen, "Fast Effective Rule Induction". ICML, 1995.
(citeseer)

W. Cohen and Y. Singer, "A Simple, Fast, and Effective Rule Learner".
AAAI, 1999.
(link)

Bratko, I. and Muggleton, S. "Applications of Inductive Logic
Programming". Commun. ACM 38, 11 (Nov. 1995), 65-70.
(acm)
Lecture #4: On-Line Learning: Winnow, Perceptron:
P1.pptx, P2.pptx, P1.pdf, P2.pdf, notes(1) notes(2) notes(3)
References

(*)
D. Roth, "On-Line Learning of Linear Functions (course notes)".
2000.
(.pdf)

(*)
J. Kivinen and M. Warmuth, "The Perceptron Algorithm vs. Winnow:
Linear vs. Logarithmic Mistake Bounds when few Input Variables are
Relevant". 1995.
(link)

A. Blum, "On-Line Algorithms in Machine Learning". 1996.
(link)

(*)
A. Blum, "Learning Boolean Functions in an Infinite Attribute
Space". Machine Learning, 9(4):373-386, 1992.
(.ps)

R. Khardon, D. Roth, and R. Servedio, "Efficiency versus Convergence
of Boolean Kernels for On-Line Learning Algorithms". NIPS, 2001.
(link)

(*)
Y. Freund and R. Schapire, "Large Margin Classification Using the
Perceptron Algorithm". COLT, 1998.
(link)

N. Littlestone, "Learning Quickly When Irrelevant Attributes Abound".
Machine Learning 2(4):285-318, 1988.
(link)

Adam J. Grove, Nick Littlestone, Dale Schuurmans, "General Convergence
Results for Linear Discriminant Updates". Machine Learning 43(3):
173-210 (2001)
(link)

Shai Ben-David and Hans Ulrich Simon,
"Efficient Learning of Linear Perceptrons", NIPS 2000
(link)

Large Margin Winnow Methods for Text Categorization, Tong Zhang
(.ps)

Tong Zhang and Frank J. Oles. Text categorization based on regularized
linear classification methods. Information
Retrieval, 4:5-31, 2001.

R. Khardon and G. Wachman,
Noise Tolerant Variants of the Perceptron
Algorithm, Journal of Machine Learning
Research, Vol 8, pp 227-248, 2007

K. Crammer, O. Dekel, J. Keshet, S.
Shalev-Shwartz, and Y. Singer. Online
Passive-Aggressive Algorithms. (link)

John Duchi, Elad Hazan, and Yoram Singer.
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. JMLR 12 (July 2011), 2121-2159.
(pdf)

Lecture #5: Computational Learning
Theory, pdf
Reading: Mitchell, Chapter 7
References

Kearns and Vazirani,
Introduction to Computational Learning Theory

(*)
L. Valiant, "A Theory of the Learnable". CACM, pg 1134-1142, 1984 (link)

L. Pitt and L. Valiant, "Computational Limitations on Learning From
Examples". JACM, 35(4):965-984, 1988.
(.pdf)

A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth, "Learnability
and the Vapnik-Chervonenkis Dimension". JACM, 36(4):929-965, 1989.
(.pdf)

V. Vapnik and A. Chervonenkis, "On the Uniform Convergence of Relative
Frequencies of Events to Their Probabilities". Theory of Probability
and Its Applications, 16(2):264-280, 1971.
(link)

(*)
David Haussler: Quantifying Inductive Bias: AI Learning Algorithms
and Valiant's Learning Framework. Artif. Intell. 36(2): 177-221 (1988)
(link)

David Haussler: Learning Conjunctive Concepts in Structural Domains.
Machine Learning 4: 7-40 (1989)
(link)

Lecture #6: Neural Networks, NNP1.pptx, NNP1.pdf, NNP2.pptx, NNP2.pdf
References

Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. "Learning representations by back-propagating errors." Cognitive modeling 5 (1988): 3.
(link)

Barron, Andrew R. Approximation and estimation bounds for artificial neural networks. Machine Learning, 14: 115-133, 1994.
(link)

Livni, Roi, Shai Shalev-Shwartz, and Ohad Shamir. "On the computational efficiency of training neural networks." In Advances in Neural Information Processing Systems, pp. 855-863. 2014. (link)
Presentation: "On the computational complexity of deep learning", by Shai Shalev-Shwartz in 2015 (link)

Blum, Avrim L., and Ronald L. Rivest. "Training a 3-node neural network is NP-complete." In Machine learning: From theory to applications, pp. 9-28. Springer Berlin Heidelberg, 1993. (link)

Lecture #6: Boosting, pdf,
Formal View
References

Robert E. Schapire, "The Strength of Weak Learnability".
Machine Learning 5(2):197-227, 1990

Yoav Freund and Robert E. Schapire, "A decision-theoretic
generalization of on-line learning and an application to
boosting". Journal of Computer and System Sciences,
55(1):119-139, 1997. (.ps)

Erin L. Allwein, Robert E. Schapire and Yoram Singer, "Reducing
multiclass to binary: A unifying approach for margin
classifiers". Journal of Machine Learning Research, 1:113-141,
2000. (.pdf)

Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee,
"Boosting the margin: a new explanation for the effectiveness of
voting methods". The Annals of Statistics, 26(5):1651-1686,
1998. (.ps)

Lecture #7: Multiclass Classification,
pdf
References
Sariel Har-Peled, Dan Roth and Dav Zimak,
"Constraint classification for multiclass classification and ranking".
NIPS 2003. (.pdf)
 Midterm Review,
pdf
 Midterm Exam (during class on Tue, Oct 25th)

Lecture #8: Support Vector Machines,
pdf
Additional Notes on Optimization and
SVMs
Additional Notes on Logistic Regression and
SVMs
References

C.J. Lin, Optimization, Support Vector Machines, and Machine
Learning. Talk in DIS, University of Rome and IASI, CNR,
Italy. September 12, 2005.
(slides)

C. Burges, "A Tutorial on Support Vector Machines for Pattern
Recognition". Data Mining and Knowledge Discovery, 2(2):121-167,
1998.
(citeseer)

Lecture #9: Bayesian
Learning,
pdf
Additional Notes: naive Bayes (1) pdf ,
naive Bayes (2) pdf
Reading: Mitchell, Chapter 6

Lecture #10: The EM Algorithm,
pdf

Lecture #11: Learning Probability Distributions,
pdf

Lecture #12: Clustering,
pdf
 Final Review,
pdf
 Final Exam (during class on Tue, Dec 6th)
Dan Roth