10 more lessons learned from building Machine Learning systems
A Tour of Machine Learning Algorithms
There are only a few main learning styles, or learning models, that an algorithm can have, and we’ll go through them here with a few examples of algorithms and the problem types they suit. This taxonomy, or way of organizing machine learning algorithms, is useful because it forces you to think about the roles of the input data and the model preparation process, so you can select the approach most appropriate for your problem and get the best result.
- Supervised Learning: Input data is called training data and has a known label or result such as spam/not-spam or a stock price at a time. A model is prepared through a training process where it is required to make predictions and is corrected when those predictions are wrong. The training process continues until the model achieves a desired level of accuracy on the training data. Example problems are classification and regression. Example algorithms are Logistic Regression and the Back Propagation Neural Network.
- Unsupervised Learning: Input data is not labelled and does not have a known result. A model is prepared by deducing structures present in the input data. Example problems are association rule learning and clustering. Example algorithms are the Apriori algorithm and k-means.
- Semi-Supervised Learning: Input data is a mixture of labelled and unlabelled examples. There is a desired prediction problem but the model must learn the structures to organize the data as well as make predictions. Example problems are classification and regression. Example algorithms are extensions to other flexible methods that make assumptions about how to model the unlabelled data.
- Reinforcement Learning: Input data is provided as stimulus to a model from an environment to which the model must respond and react. Feedback is provided not from a teaching process as in supervised learning, but as punishments and rewards in the environment. Example problems are systems and robot control. Example algorithms are Q-learning and temporal difference learning.
When crunching data to model business decisions, you are most typically using supervised and unsupervised learning methods. A hot topic at the moment is semi-supervised learning in areas such as image classification, where there are large datasets with very few labelled examples. Reinforcement learning is more likely to turn up in robotic control and other control systems development.
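The first two learning styles above can be illustrated with a toy example: a 1-nearest-neighbour classifier (supervised — it uses the labels) next to a 1-D k-means clustering (unsupervised — it ignores them). This is a minimal, dependency-free sketch; the data and the `spam`/`ham` labels are invented for illustration.

```python
# Toy supervised learning: 1-nearest-neighbour classification.
# Labelled training data: (feature, label) pairs.
train = [(1.0, "spam"), (1.2, "spam"), (4.8, "ham"), (5.1, "ham")]

def predict(x):
    # Predict the label of the closest training example.
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

# Toy unsupervised learning: 1-D k-means with k=2 (labels never used).
def kmeans_1d(points, k=2, iters=20):
    centroids = points[:k]  # naive initialisation: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(k), key=lambda i: abs(centroids[i] - p))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

print(predict(1.1))                     # → "spam" (nearest neighbours are spam)
print(kmeans_1d([1.0, 1.2, 4.8, 5.1]))  # two cluster centres emerge from the data
```

The key contrast: `predict` is corrected by known labels, while `kmeans_1d` deduces structure from the inputs alone, exactly as the two bullet points describe.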
What is machine learning?
Implementing a Distributed Deep Learning Network over Spark – Data Science Central
Graphics in reverse: Probabilistic programming does in 50 lines of code what used to take thousands.
In a probabilistic programming language, the heavy lifting is done by the inference algorithm — the algorithm that continuously readjusts probabilities on the basis of new pieces of training data. In that respect, Kulkarni and his colleagues had the advantage of decades of machine-learning research. Built into Picture are several different inference algorithms that have fared well on computer-vision tasks. Time permitting, it can try all of them out on any given problem, to see which works best.
Moreover, Kulkarni says, Picture is designed so that its inference algorithms can themselves benefit from machine learning, modifying themselves as they go to emphasize strategies that seem to lead to good results. “Using learning to improve inference will be task-specific, but probabilistic programming may alleviate re-writing code across different problems,” he says. “The code can be generic if the learning machinery is powerful enough to learn different strategies for different tasks.”
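The core idea of "continuously readjusting probabilities on the basis of new pieces of data" can be shown in miniature with a grid-based Bayesian update. This is a hand-rolled illustration of that inference loop, not Picture's actual algorithms: we estimate a coin's bias by re-weighting a discrete grid of hypotheses after each flip.

```python
# Minimal sketch of Bayesian inference: readjust a discrete grid of
# probabilities over a coin's bias as each new observation arrives.
hypotheses = [i / 10 for i in range(11)]         # candidate biases 0.0 .. 1.0
prior = [1 / len(hypotheses)] * len(hypotheses)  # start uniform

def update(prior, flip):
    # Bayes' rule: posterior ∝ likelihood × prior, then renormalise.
    likelihood = [h if flip == "H" else 1 - h for h in hypotheses]
    unnorm = [l * p for l, p in zip(likelihood, prior)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

posterior = prior
for flip in "HHTHHHHTHH":        # observed data, one flip at a time
    posterior = update(posterior, flip)

# The most probable bias after 8 heads in 10 flips:
best = max(zip(hypotheses, posterior), key=lambda hp: hp[1])[0]
print(best)   # → 0.8
```

Real probabilistic programming languages automate exactly this kind of update over far richer models, which is why the choice of inference algorithm does the heavy lifting.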
Peter Norvig: Machine Learning for Programming
Q: Can we learn complex nontraditional programs from examples?
A: Not yet, maybe someday.
Q: Can we learn to optimize programs?
A: Yes, for short parts.
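"Yes, for short parts" can be made concrete with exhaustive search in the spirit of superoptimization: enumerate tiny straight-line programs and keep the shortest one equivalent to a target. The three-op mini-language below is invented for this sketch; real systems search machine instructions instead.

```python
# Sketch of optimizing a short program by brute-force search over a
# made-up mini-language of accumulator operations.
from itertools import product

OPS = {
    "inc":  lambda x: x + 1,   # add 1
    "add2": lambda x: x + 2,   # add 2
    "dbl":  lambda x: x * 2,   # double
}

def run(program, x):
    for op in program:
        x = OPS[op](x)
    return x

def equivalent(p, q, tests=range(-5, 6)):
    # Treat programs as equivalent if they agree on all test inputs.
    return all(run(p, x) == run(q, x) for x in tests)

def optimize(target, max_len=3):
    # Search programs in order of increasing length; the first match
    # found is therefore the shortest equivalent program.
    for length in range(max_len + 1):
        for candidate in product(OPS, repeat=length):
            if equivalent(list(candidate), target):
                return list(candidate)
    return target

# ["inc", "inc", "dbl"] computes (x + 2) * 2; the search finds a
# two-op equivalent, ["add2", "dbl"].
print(optimize(["inc", "inc", "dbl"]))
```

The catch, and the reason for Norvig's hedge, is that the search space grows exponentially with program length, so this only works for short parts.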
"A Deep Dive into Recurrent Neural Nets" - great follow-up article on "Deep Learning in a Nutshell" via @nkbuduma http://t.co/JL2jG1NVB7
— Sebastian Raschka (@rasbt) January 19, 2015
17 Great Machine Learning Libraries
Python
- Scikit-learn: comprehensive and easy to use; I wrote a whole article on why I like this library.
- PyBrain: Neural networks are one thing that is missing from scikit-learn, but this module makes up for it.
- nltk: really useful if you’re doing anything NLP or text mining related.
- Theano: efficient computation of mathematical expressions using GPU. Excellent for deep learning.
- Pylearn2: machine learning toolbox built on top of Theano - in very early stages of development.
- MDP (Modular toolkit for Data Processing): a framework that is useful when setting up workflows.
Java
- Spark: Apache’s new upstart, supposedly up to a hundred times faster than Hadoop, now includes MLlib, which contains a good selection of machine learning algorithms, including classification, clustering and recommendation generation. Currently undergoing rapid development. Development can be in Python as well as JVM languages.
- Mahout: Apache’s machine learning framework built on top of Hadoop, this looks promising, but comes with all the baggage and overhead of Hadoop.
- Weka: this is a Java based library with a graphical user interface that allows you to run experiments on small datasets. This is great if you restrict yourself to playing around to get a feel for what is possible with machine learning. However, I would avoid using this in production code at all costs: the API is very poorly designed, the algorithms are not optimised for production use and the documentation is often lacking.
- Mallet: another Java based library with an emphasis on document classification. I’m not so familiar with this one, but if you have to use Java this is bound to be better than Weka.
- JSAT: stands for “Java Statistical Analysis Tool” - created by Edward Raff and born out of his frustration with Weka (I know the feeling). Looks pretty cool.
http://www.reddit.com/r/MachineLearning/
AMA Andrew Ng and Adam Coates:
Linear/logistic regression and k-means clustering are probably the dominant paradigms in ML, and likely will always be. There's just too much bang for the buck.
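Part of that bang for the buck is that simple linear regression has a closed-form solution: for one feature, the slope is cov(x, y)/var(x) and the intercept is mean(y) − slope · mean(x). A dependency-free sketch, with toy data chosen to lie exactly on a line:

```python
# Simple linear regression (one feature) via the closed-form
# least-squares solution.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return my - slope * mx, slope   # (intercept, slope)

# Noise-free points on the line y = 3x + 1, so the fit recovers it.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 4.0, 7.0, 10.0]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)   # → 1.0 3.0
```

No iterative training, no hyperparameters: one pass over the data yields the optimal line, which is a big part of why these simple methods remain the default first tool.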