Machine Learning (Theory)
Machine learning and learning theory research
Mass Customized Medicine in the Future?
This post is about a technology which could develop in the future. Right now, a new drug might be tested by finding patients with some diagnosis and giving or not giving them a drug according to a secret randomization. The outcome is observed, and if the ...
~ published: 08/24 at 19:00 ~ permalink
Radford Neal starts a blog
here on statistics, ML, CS, and other things he knows well....
~ published: 08/18 at 19:32 ~ permalink
Electoralmarkets.com
Lance reminded me about electoralmarkets today, which is cool enough that I want to point it out explicitly here. Most people still use polls to predict who wins, while electoralmarkets uses people betting real money. They might use polling information,...
~ published: 08/04 at 18:51 ~ permalink
Compositional Machine Learning Algorithm Design
There were two papers at ICML presenting learning algorithms for a contextual bandit-style setting, where the loss for all labels is not known, but the loss for one label is known. (The first might require an exploration scavenging viewpoint to understan...
~ published: 07/26 at 09:44 ~ permalink
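The contextual bandit-style setting in the entry above, where only the loss of one label is revealed, can be illustrated with a small simulation. This is a minimal sketch, assuming labels were logged uniformly at random with a known probability; that assumption lets an inverse-propensity (importance-weighted) average recover a policy's expected loss from partial feedback. The loss table, label count and policy are hypothetical, and this is not the method of either ICML paper.

```python
import random

NUM_LABELS = 4      # hypothetical number of labels
NUM_CONTEXTS = 100  # hypothetical number of distinct contexts


def loss(context: int, label: int) -> float:
    # Hypothetical loss table: label (context % NUM_LABELS) is correct, others cost 1.
    return 0.0 if label == context % NUM_LABELS else 1.0


def policy(context: int) -> int:
    # Hypothetical policy under evaluation: always predicts label 0.
    return 0


def ips_estimate(num_rounds: int, seed: int = 0) -> float:
    """Estimate the policy's average loss seeing only the logged label's loss."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_rounds):
        context = rng.randrange(NUM_CONTEXTS)
        logged = rng.randrange(NUM_LABELS)  # exploration: label chosen uniformly
        p_logged = 1.0 / NUM_LABELS         # known probability of that choice
        observed = loss(context, logged)    # the only loss revealed this round
        if policy(context) == logged:
            total += observed / p_logged    # reweight to undo the sampling
    return total / num_rounds


def full_information_loss() -> float:
    # Average loss computed with every label's loss visible, for comparison only.
    return sum(loss(c, policy(c)) for c in range(NUM_CONTEXTS)) / NUM_CONTEXTS


print(ips_estimate(100_000), full_information_loss())
```

The estimate is unbiased because a label matching the policy is logged with probability 1/4 and then counted with weight 4; its variance grows as the logging probability shrinks.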
Interesting papers at COLT (and a bit of UAI & workshops)
Here are a few papers from COLT 2008 that I found interesting. Maria-Florina Balcan, Steve Hanneke, and Jenn Wortman, The True Sample Complexity of Active Learning. This paper shows that in an asymptotic setting, active learning is always better than supe...
~ published: 07/15 at 04:22 ~ permalink
Interesting papers, ICML 2008
Here are some papers from ICML 2008 that I found interesting. Risi Kondor and Karsten Borgwardt, The Skew Spectrum of Graphs. This paper is about a new family of functions on graphs which is invariant under node label permutation. They show that these q...
~ published: 07/10 at 02:10 ~ permalink
To Dual or Not
Yoram and Shai’s online learning tutorial at ICML brings up a question for me, “Why use the dual?” The basic setting is learning a weight vector $w_i$ so that the function $f(x) = \sum_i w_i x_i$ optimizes some convex loss function. The functional v...
~ published: 07/06 at 23:26 ~ permalink
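For reference against a dual formulation, the primal version of the setting in the entry above can be sketched as online gradient descent applied directly to the weights of $f(x) = \sum_i w_i x_i$. This is a minimal sketch, assuming squared loss and a fixed learning rate; neither choice comes from the tutorial, and the data is hypothetical.

```python
from typing import List, Sequence, Tuple


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    # f(x) = sum_i w_i x_i
    return sum(wi * xi for wi, xi in zip(w, x))


def online_gradient_descent(
    examples: Sequence[Tuple[Sequence[float], float]],
    num_features: int,
    learning_rate: float = 0.1,
    passes: int = 1,
) -> List[float]:
    """Primal updates on w for squared loss (f(x) - y)^2 / 2, one example at a time."""
    w = [0.0] * num_features
    for _ in range(passes):
        for x, y in examples:
            residual = predict(w, x) - y  # derivative of the loss with respect to f(x)
            for i, xi in enumerate(x):
                w[i] -= learning_rate * residual * xi  # gradient step on w_i
    return w


# Hypothetical data generated by y = 2*x_0 - x_1.
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
weights = online_gradient_descent(data, num_features=2, passes=200)
print(weights)  # close to [2.0, -1.0]
```

Every update touches only the $w_i$ for which $x_i$ is nonzero, so the cost per example scales with the number of active features.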
More Presentation Preparation
We’ve discussed presentation preparation before, but I have one more thing to add: transitioning. For a research presentation, it is substantially helpful for the audience if transitions are clear. A common outline for a research presentation in m...
~ published: 07/04 at 08:01 ~ permalink
Proprietary Data in Academic Research?
Should results of experiments on proprietary datasets be in the academic research literature? The arguments I can imagine in the “against” column are: Experiments are not repeatable. Repeatability in experiments is essential to science becaus...
~ published: 07/02 at 10:36 ~ permalink
ICML has a comment system
Mark Reid has stepped up and created a comment system for ICML papers which Greger Linden has tightly integrated. My understanding is that Mark spent quite a bit of time on the details, and there are some cool features like working LaTeX math mode. Thi...
~ published: 06/30 at 05:54 ~ permalink
Reviewing Horror Stories
Essentially everyone who writes research papers suffers rejections. They always sting immediately, but upon further reflection many of these rejections come to seem reasonable. Maybe the equations had too many typos or maybe the topic just isn’t a...
~ published: 06/27 at 13:18 ~ permalink
The Minimum Sample Complexity of Importance Weighting
This post is about a trick that I learned from Dale Schuurmans which has been repeatedly useful for me over time. The basic trick has to do with importance weighting for Monte Carlo integration. Consider the problem of finding: $N = E_{x \sim D} f(x)$ given samples...
~ published: 06/09 at 16:47 ~ permalink
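Basic importance weighting for the quantity $N = E_{x \sim D} f(x)$ in the entry above can be sketched as follows: draw samples from some other distribution $Q$ and reweight each by $D(x)/Q(x)$. This is a minimal sketch of standard importance sampling on a hypothetical discrete $D$ and $f$; it is not a reconstruction of the specific trick the post goes on to describe.

```python
import random
from typing import Callable, Dict


def importance_weighted_estimate(
    f: Callable[[int], float],
    d: Dict[int, float],
    q: Dict[int, float],
    num_samples: int,
    seed: int = 0,
) -> float:
    """Estimate N = E_{x~D} f(x) using samples drawn from Q instead of D.

    Unbiased as long as Q(x) > 0 wherever D(x) f(x) != 0.
    """
    rng = random.Random(seed)
    support = [x for x, p in q.items() if p > 0]
    samples = rng.choices(support, weights=[q[x] for x in support], k=num_samples)
    # Reweight each sample by D(x) / Q(x) to correct for sampling from Q.
    return sum(f(x) * d[x] / q[x] for x in samples) / num_samples


def f(x: int) -> float:
    return float(x)


D = {x: 0.1 for x in range(10)}           # hypothetical D: uniform on {0, ..., 9}; exact N = 4.5
Q_uniform = dict(D)                        # sampling from D itself
Q_tilted = {x: x / 45 for x in range(10)}  # Q proportional to D(x) f(x)

print(importance_weighted_estimate(f, D, Q_uniform, 10_000))
print(importance_weighted_estimate(f, D, Q_tilted, 10))
```

For nonnegative $f$, choosing $Q(x) \propto D(x) f(x)$ makes every weighted sample equal $N$, so the variance is zero; normalizing that $Q$ already requires knowing $N$, so it serves as a reference point rather than a practical sampler.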
Inappropriate Mathematics for Machine Learning
Reviewers and students are sometimes greatly concerned by the distinction between: an open set and a closed set; a supremum and a maximum; an event which happens with probability 1 and an event that always happens. I don’t appreciate this distinction i...
~ published: 05/25 at 08:33 ~ permalink
Three levels of addressing the Netflix Prize
In October 2006, the online movie renter, Netflix, announced the Netflix Prize contest. They published a comprehensive dataset including more than 100 million movie ratings, which were performed by about 480,000 real customers on 17,770 movies. Compet...
~ published: 05/23 at 10:03 ~ permalink
Concerns about the Large Scale Learning Challenge
The large scale learning challenge for ICML interests me a great deal, although I have concerns about the way it is structured. From the instructions page, several issues come up. Large Definition: my personal definition of dataset size is as follows. Small: A dataset ...
~ published: 04/30 at 20:45 ~ permalink
Watchword: Supervised Learning
I recently discovered that supervised learning is a controversial term. The two definitions are: Known Loss: Supervised learning corresponds to the situation where you have unlabeled examples plus knowledge of the loss of each possible predicted choice. T...
~ published: 04/27 at 19:40 ~ permalink
Eliminating the Birthday Paradox for Universal Features
I want to expand on this post which describes one of the core tricks for making Vowpal Wabbit fast and easy to use when learning from text. The central trick is converting a word (or any other parseable quantity) into a number via a hash function. Kisho...
~ published: 04/26 at 11:45 ~ permalink
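The core trick mentioned in the entry above, turning a word into a number with a hash function, can be sketched as follows. This is a minimal sketch, assuming `zlib.crc32` as a stand-in hash and a hypothetical table of $2^{18}$ slots; it only shows how hashed features and collisions arise, not how the post proposes to eliminate the birthday-paradox collisions.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable

NUM_BITS = 18  # hypothetical table size: 2**18 feature slots


def feature_index(word: str, num_bits: int = NUM_BITS) -> int:
    """Map a word to a feature index with a hash function instead of a dictionary."""
    # zlib.crc32 is a stand-in hash; it is not necessarily the hash Vowpal Wabbit uses.
    return zlib.crc32(word.encode("utf-8")) & ((1 << num_bits) - 1)


def featurize(text: str, num_bits: int = NUM_BITS) -> Dict[int, float]:
    """Sparse bag-of-words vector keyed by hashed index; colliding words share a slot."""
    features: Dict[int, float] = defaultdict(float)
    for word in text.split():
        features[feature_index(word, num_bits)] += 1.0
    return dict(features)


def count_collisions(words: Iterable[str], num_bits: int) -> int:
    """Distinct words minus occupied slots: how many words had to share a slot."""
    distinct = set(words)
    slots = {feature_index(w, num_bits) for w in distinct}
    return len(distinct) - len(slots)


print(featurize("the cat sat on the mat"))
# With 2000 distinct words and only 2**8 slots, collisions are unavoidable.
print(count_collisions((f"word{i}" for i in range(2000)), num_bits=8))
```

No vocabulary table is stored, which is what makes the approach fast and memory-bounded; the price is that distinct words occasionally share a weight, and by the birthday paradox this starts happening once the number of distinct words approaches the square root of the table size.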
Taking the next step
At the last ICML, Tom Dietterich asked me to look into systems for commenting on papers. I’ve been slow getting to this, but it’s relevant now. The essential observation is that we now have many tools for online collaboration, but they are not...
~ published: 04/22 at 21:16 ~ permalink
The Science 2.0 article
I found the article about science using modern tools interesting, especially the part about ‘blogophobia’, which in my experience is often a substantial issue: many potential guest posters aren’t quite ready, because of the fear of a per...
~ published: 04/21 at 21:26 ~ permalink
Blog compromised
Iain noticed that hunch.net had zero-width divs hiding spammy URLs. Some investigation reveals that the WordPress version being used (2.0.3) had security flaws. I’ve upgraded to the latest, rotated passwords, and removed the spammy URLs. I don...
~ published: 04/12 at 10:40 ~ permalink
It Doesn’t Stop
I’ve enjoyed the Terminator movies and show. Neglecting the wacky aspects (time travel and associated paradoxes), there is an enduring topic of discussion: how do people deal with intelligent machines (and vice versa)? In Terminator-land, the prima...
~ published: 04/12 at 05:08 ~ permalink
Interactive Machine Learning
A new direction of research seems to be arising in machine learning: Interactive Machine Learning. This isn’t a familiar term, although it does include some familiar subjects. What is Interactive Machine Learning? The fundamental requirement is (a)...
~ published: 03/23 at 19:15 ~ permalink
COLT Open Problems
COLT has a call for open problems due March 21. I encourage anyone with a specifiable open problem to write it down and send it in. Just the effort of specifying an open problem precisely and concisely has been very helpful for my own solutions, and the...
~ published: 03/15 at 16:16 ~ permalink
Spock Challenge Winners
The Spock Challenge for named entity recognition was won by Benno Stein, Sven Eissen, Tino Rub, Hagen Tonnies, Christof Braeutigam, and Martin Potthast....
~ published: 03/07 at 20:19 ~ permalink
The Stats Handicap
Graduating students in Statistics appear to be at a substantial handicap compared to graduating students in Machine Learning, despite being in substantially overlapping subjects. The problem seems to be cultural. Statistics comes from a mathematics backgr...
~ published: 02/27 at 12:35 ~ permalink
The Meaning of Confidence
In many machine learning papers experiments are done and little confidence bars are reported for the results. This often seems quite clear, until you actually try to figure out what it means. There are several different kinds of ‘confidence’...
~ published: 02/17 at 09:36 ~ permalink
Complexity Illness
One of the enduring stereotypes of academia is that people spend a great deal of intelligence, time, and effort finding complexity rather than simplicity. This is at least anecdotally true in my experience. Math++: Several people have found that adding use...
~ published: 02/10 at 17:34 ~ permalink
Sufficient Computation
Do we have computer hardware sufficient for AI? This question is difficult to answer, but here’s a try: One way to achieve AI is by simulating a human brain. A human brain has about $10^{15}$ synapses which operate at about $10^2$ per second implying about...
~ published: 01/28 at 17:49 ~ permalink
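The excerpt above breaks off before its conclusion, but the product of the two quoted estimates is simple arithmetic. This is a minimal sketch that only multiplies the stated figures ($10^{15}$ synapses, $10^2$ operations per second each); it makes no claim about what hardware would be needed to match the result.

```python
SYNAPSES = 10**15        # synapse count quoted in the post
RATE_PER_SECOND = 10**2  # operations per synapse per second, as quoted

# Total synaptic operations per second implied by the two estimates.
events_per_second = SYNAPSES * RATE_PER_SECOND
print(f"{events_per_second:.0e}")  # 1e+17
```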
Turing’s Club for Machine Learning
Many people in Machine Learning don’t fully understand the impact of computation, as demonstrated by a lack of big-O analysis of new learning algorithms. This is important—some current active research programs are fundamentally flawed w.r.t. ...
~ published: 01/25 at 18:55 ~ permalink
Why Workshop?
I second the call for workshops at ICML/COLT/UAI. Several times before, details of why and how to run a workshop have been mentioned. There is a simple reason to prefer workshops here: attendance. The Helsinki colocation has placed workshops directly be...
~ published: 01/23 at 06:57 ~ permalink
Datasets
David Pennock notes the impressive set of datasets at datawrangling....
~ published: 01/18 at 15:47 ~ permalink
2008 Summer Machine Learning Conference Schedule
Conference  Paper due date        Conference date  Location
AAAI        January 22/23/25/30   July 13-17       Chicago, Illinois
ICML        Feb 8                 July 5-9         Helsinki, Finland
COLT        Feb 20                July 9-12        Helsinki, Finland
KDD         Feb 23/29             August 24-27     Las Vegas, Nevada
UAI         Feb 27/Feb 29         July 9-12        Helsinki, Finland
Helsi...
~ published: 01/07 at 19:11 ~ permalink
Research Political Issues
I’ve avoided discussing politics here, although not for lack of interest. The problem with discussing politics is that it’s customary for people to say much based upon little information. Nevertheless, politics can have a substantial impact ...
~ published: 01/06 at 16:02 ~ permalink
Vowpal Wabbit Code Release
We are releasing the Vowpal Wabbit (Fast Online Learning) code as open source under a BSD (revised) license. This is a project at Yahoo! Research to build a useful large scale learning algorithm which Lihong Li, Alex Strehl, and I have been working on. To...
~ published: 12/21 at 09:10 ~ permalink
Cool and Interesting things at NIPS, take three
Following up on Hal Daume’s post and John’s post on cool and interesting things seen at NIPS, I’ll post my own little list of neat papers here as well. Of course it’s going to be biased towards what I think is interesting. Also, I...
~ published: 12/20 at 19:54 ~ permalink
Cool and interesting things seen at NIPS
I learned a number of things at NIPS. The financial people were there in greater force than previously. Two Sigma sponsored NIPS while DRW Trading had a booth. The adversarial machine learning workshop had a number of talks about interesting applications w...
~ published: 12/19 at 17:01 ~ permalink
New Machine Learning mailing list
IMLS (which is the nonprofit running ICML) has set up a new mailing list for Machine Learning News. The list address is ML-news@googlegroups.com, and signup requires a Google account (which you can create). Only members can send messages....
~ published: 12/17 at 10:44 ~ permalink
Workshop Summary—Principles of Learning Problem Design
This is a summary of the workshop on Learning Problem Design which Alina and I ran at NIPS this year. The first question many people have is “What is learning problem design?” This workshop is about admitting that solving learning problems doe...
~ published: 12/12 at 16:52 ~ permalink
Learning Track of International Planning Competition
The International Planning Competition (IPC) is a biennial event organized in the context of the International Conference on Automated Planning and Scheduling (ICAPS). This year, for the first time, there will be a learning track of the competition. For more...
~ published: 12/10 at 20:47 ~ permalink
The Netflix Crack
A couple of security researchers claim to have cracked the Netflix dataset. The claims of success appear somewhat overstated to me, but the method of attack is valid and could plausibly be substantially improved so as to reveal the movie preferences of a sm...
~ published: 11/29 at 15:41 ~ permalink
Computational Consequences of Classification
In the regression vs classification debate, I’m adding a new “pro” to classification. It seems there are computational shortcuts available for classification which simply aren’t available for regression. This arises in several si...
~ published: 11/28 at 19:44 ~ permalink
MLSS 2008
… is in Kioloa, Australia from March 3 to March 14. It’s a great chance to learn something about Machine Learning and I’ve enjoyed several previous Machine Learning Summer Schools.The website has many more details, but registration is o...
~ published: 11/16 at 06:55 ~ permalink
BellKor wins Netflix
… but only the little prize. The BellKor team focused on integrating predictions from many different methods. The base methods consist of: Nearest Neighbor Methods; Matrix Factorization Methods (asymmetric and symmetric); Linear Regression on various f...
~ published: 11/14 at 13:53 ~ permalink
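One simple way to integrate predictions from several base methods, as in the entry above, is to fit blending weights by least squares on held-out ratings. This is a minimal sketch with two hypothetical predictors and made-up ratings; it is a generic linear blend with no intercept, not necessarily how BellKor combined their methods.

```python
from typing import Sequence, Tuple


def blend_weights(
    preds_a: Sequence[float], preds_b: Sequence[float], targets: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares (wa, wb) minimizing sum (wa*a + wb*b - y)^2, with no intercept."""
    saa = sum(a * a for a in preds_a)
    sbb = sum(b * b for b in preds_b)
    sab = sum(a * b for a, b in zip(preds_a, preds_b))
    say = sum(a * y for a, y in zip(preds_a, targets))
    sby = sum(b * y for b, y in zip(preds_b, targets))
    det = saa * sbb - sab * sab  # zero only if the two predictors are proportional
    # Solve the 2x2 normal equations [[saa, sab], [sab, sbb]] @ [wa, wb] = [say, sby].
    wa = (say * sbb - sab * sby) / det
    wb = (saa * sby - sab * say) / det
    return wa, wb


# Hypothetical held-out ratings and predictions from two base methods.
neighbor_preds = [3.0, 4.0, 2.0, 5.0]
factor_preds = [4.0, 3.0, 2.0, 4.0]
ratings = [3.4, 3.6, 2.0, 4.6]  # constructed as 0.6 * neighbor + 0.4 * factor

wa, wb = blend_weights(neighbor_preds, factor_preds, ratings)
blended = [wa * a + wb * b for a, b in zip(neighbor_preds, factor_preds)]
print(round(wa, 6), round(wb, 6))
```

Fitting the weights on ratings held out from base-model training keeps the blend from simply rewarding whichever method overfits most; with more than two methods the same normal equations are usually solved with a linear algebra library.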
CMU wins DARPA Urban Challenge
The results have been posted, with CMU first, Stanford second, and Virginia Tech third. Considering that this was an open event (at least for people in the US), this was a very strong showing for research at universities (instead of defense contractors, fo...
~ published: 11/05 at 04:52 ~ permalink
The Machine Learning Award goes to …
Perhaps the biggest CS prize for research is the Turing Award, which has a $0.25M cash prize associated with it. It appears none of the prizes so far have been for anything like machine learning (the closest are perhaps database awards). In CS theory, the...
~ published: 11/02 at 10:33 ~ permalink
Contextual Bandits
One of the fundamental underpinnings of the internet is advertising-based content. This has become much more effective due to targeted advertising where ads are specifically matched to interests. Everyone is familiar with this, because everyone uses sea...
~ published: 10/24 at 20:49 ~ permalink
