Machine Learning (Theory)

Machine learning and learning theory research

Mass Customized Medicine in the Future?

This post is about a technology which could develop in the future. Right now, a new drug might be tested by finding patients with some diagnosis and giving or not giving them a drug according to a secret randomization. The outcome is observed, and if the ...

~ published: 08/24 at 19:00 ~ permalink

Radford Neal starts a blog

here on statistics, ML, CS, and other things he knows well....

~ published: 08/18 at 19:32 ~ permalink

Electoralmarkets.com

Lance reminded me about electoralmarkets today, which is cool enough that I want to point it out explicitly here. Most people still use polls to predict who wins, while electoralmarkets uses people betting real money. They might use polling information,...

~ published: 08/04 at 18:51 ~ permalink

Compositional Machine Learning Algorithm Design

There were two papers at ICML presenting learning algorithms for a contextual bandit-style setting, where the loss for all labels is not known, but the loss for one label is known. (The first might require an exploration scavenging viewpoint to understan...

~ published: 07/26 at 09:44 ~ permalink

Interesting papers at COLT (and a bit of UAI & workshops)

Here are a few papers from COLT 2008 that I found interesting. Maria-Florina Balcan, Steve Hanneke, and Jenn Wortman, The True Sample Complexity of Active Learning. This paper shows that in an asymptotic setting, active learning is always better than supe...

~ published: 07/15 at 04:22 ~ permalink

Interesting papers, ICML 2008

Here are some papers from ICML 2008 that I found interesting. Risi Kondor and Karsten Borgwardt, The Skew Spectrum of Graphs. This paper is about a new family of functions on graphs which is invariant under node label permutation. They show that these q...

~ published: 07/10 at 02:10 ~ permalink

To Dual or Not

Yoram and Shai’s online learning tutorial at ICML brings up a question for me, “Why use the dual?” The basic setting is learning a weight vector $w_i$ so that the function $f(x) = \sum_i w_i x_i$ optimizes some convex loss function. The functional v...

~ published: 07/06 at 23:26 ~ permalink

More Presentation Preparation

We’ve discussed presentation preparation before, but I have one more thing to add: transitioning. For a research presentation, it is substantially helpful for the audience if transitions are clear. A common outline for a research presentation in m...

~ published: 07/04 at 08:01 ~ permalink

Proprietary Data in Academic Research?

Should results of experiments on proprietary datasets be in the academic research literature? The arguments I can imagine in the “against” column are: Experiments are not repeatable. Repeatability in experiments is essential to science becaus...

~ published: 07/02 at 10:36 ~ permalink

ICML has a comment system

Mark Reid has stepped up and created a comment system for ICML papers which Greger Linden has tightly integrated. My understanding is that Mark spent quite a bit of time on the details, and there are some cool features like working LaTeX math mode. Thi...

~ published: 06/30 at 05:54 ~ permalink

Reviewing Horror Stories

Essentially everyone who writes research papers suffers rejections. They always sting immediately, but upon further reflection many of these rejections come to seem reasonable. Maybe the equations had too many typos or maybe the topic just isn’t a...

~ published: 06/27 at 13:18 ~ permalink

The Minimum Sample Complexity of Importance Weighting

This post is about a trick that I learned from Dale Schuurmans which has been repeatedly useful for me over time. The basic trick has to do with importance weighting for Monte Carlo integration. Consider the problem of finding: $N = E_{x \sim D} f(x)$ given samples...

~ published: 06/09 at 16:47 ~ permalink
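
For the importance weighting entry above, a minimal sketch of the textbook importance-weighted Monte Carlo estimator may help. The excerpt is cut off before it says where the samples come from, so this assumes they are drawn from a proposal distribution Q with a known density. The function names, the Gaussian target and proposal, and the test function f(x) = x^2 are all illustrative assumptions, and this is not necessarily the trick the post goes on to describe.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def importance_weighted_mean(f, target_pdf, proposal_pdf, proposal_sample, n, rng):
    """Estimate E_{x ~ D} f(x) using n samples drawn from a proposal Q.

    Each sample is reweighted by D(x) / Q(x), so the average is an
    unbiased estimate as long as Q(x) > 0 wherever D(x) f(x) != 0.
    """
    total = 0.0
    for _ in range(n):
        x = proposal_sample(rng)
        total += f(x) * target_pdf(x) / proposal_pdf(x)
    return total / n


# Hypothetical setup: target D = N(0, 1), proposal Q = N(0, 2), f(x) = x^2.
# The exact value of E_{x ~ D} x^2 is 1.
rng = random.Random(0)
estimate = importance_weighted_mean(
    f=lambda x: x * x,
    target_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    proposal_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    proposal_sample=lambda r: r.gauss(0.0, 2.0),
    n=20000,
    rng=rng,
)
print(estimate)
```

The proposal here is deliberately wider than the target, which keeps the weights D(x) / Q(x) bounded; a proposal narrower than the target can make the estimate's variance very large.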

Inappropriate Mathematics for Machine Learning

Reviewers and students are sometimes greatly concerned by the distinction between: An open set and a closed set. A Supremum and a Maximum. An event which happens with probability 1 and an event that always happens. I don’t appreciate this distinction i...

~ published: 05/25 at 08:33 ~ permalink

Three levels of addressing the Netflix Prize

In October 2006, the online movie renter, Netflix, announced the Netflix Prize contest. They published a comprehensive dataset including more than 100 million movie ratings, which were performed by about 480,000 real customers on 17,770 movies. Compet...

~ published: 05/23 at 10:03 ~ permalink

Concerns about the Large Scale Learning Challenge

The large scale learning challenge for ICML interests me a great deal, although I have concerns about the way it is structured. From the instructions page, several issues come up: Large Definition My personal definition of dataset size is: small A dataset ...

~ published: 04/30 at 20:45 ~ permalink

Watchword: Supervised Learning

I recently discovered that supervised learning is a controversial term. The two definitions are: Known Loss Supervised learning corresponds to the situation where you have unlabeled examples plus knowledge of the loss of each possible predicted choice. T...

~ published: 04/27 at 19:40 ~ permalink

Eliminating the Birthday Paradox for Universal Features

I want to expand on this post which describes one of the core tricks for making Vowpal Wabbit fast and easy to use when learning from text. The central trick is converting a word (or any other parseable quantity) into a number via a hash function. Kisho...

~ published: 04/26 at 11:45 ~ permalink
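
For the feature hashing entry above, converting a word into a number via a hash function can be sketched as follows. This is a minimal illustration under stated assumptions, not Vowpal Wabbit's implementation: `zlib.crc32` stands in for whatever hash function the real system uses, and the function name `hash_features` and the 18-bit table size are hypothetical.

```python
import zlib


def hash_features(tokens, num_bits=18):
    """Map tokens to a sparse {index: count} dict via a hash function.

    No dictionary of known words is kept: each token's index is just its
    hash, masked down to a table of 2**num_bits slots. Distinct tokens
    can collide in the same slot, which is the cost of the trick.
    """
    mask = (1 << num_bits) - 1
    features = {}
    for token in tokens:
        # crc32 is an illustrative stand-in hash, chosen because it is
        # in the standard library and deterministic across runs.
        index = zlib.crc32(token.encode("utf-8")) & mask
        features[index] = features.get(index, 0.0) + 1.0
    return features


example = hash_features("the quick brown fox jumps over the lazy dog".split())
print(example)
```

A linear learner can then keep one weight per slot in a fixed-size array and index it directly with these hashed values, so memory use is fixed in advance regardless of vocabulary size.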

Taking the next step

At the last ICML, Tom Dietterich asked me to look into systems for commenting on papers. I’ve been slow getting to this, but it’s relevant now. The essential observation is that we now have many tools for online collaboration, but they are not...

~ published: 04/22 at 21:16 ~ permalink

The Science 2.0 article

I found the article about science using modern tools interesting, especially the part about ‘blogophobia’, which in my experience is often a substantial issue: many potential guest posters aren’t quite ready, because of the fear of a per...

~ published: 04/21 at 21:26 ~ permalink

Blog compromised

Iain noticed that hunch.net had zero-width divs hiding spammy URLs. Some investigation reveals that the WordPress version being used (2.0.3) had security flaws. I’ve upgraded to the latest, rotated passwords, and removed the spammy URLs. I don...

~ published: 04/12 at 10:40 ~ permalink

It Doesn’t Stop

I’ve enjoyed the Terminator movies and show. Neglecting the whacky aspects (time travel and associated paradoxes), there is an enduring topic of discussion: how do people deal with intelligent machines (and vice versa)? In Terminator-land, the prima...

~ published: 04/12 at 05:08 ~ permalink

Interactive Machine Learning

A new direction of research seems to be arising in machine learning: Interactive Machine Learning. This isn’t a familiar term, although it does include some familiar subjects. What is Interactive Machine Learning? The fundamental requirement is (a)...

~ published: 03/23 at 19:15 ~ permalink

COLT Open Problems

COLT has a call for open problems due March 21. I encourage anyone with a specifiable open problem to write it down and send it in. Just the effort of specifying an open problem precisely and concisely has been very helpful for my own solutions, and the...

~ published: 03/15 at 16:16 ~ permalink

Spock Challenge Winners

The spock challenge for named entity recognition was won by Berno Stein, Sven Eissen, Tino Rub, Hagen Tonnies, Christof Braeutigam, and Martin Potthast....

~ published: 03/07 at 20:19 ~ permalink

The Stats Handicap

Graduating students in Statistics appear to be at a substantial handicap compared to graduating students in Machine Learning, despite being in substantially overlapping subjects. The problem seems to be cultural. Statistics comes from a mathematics backgr...

~ published: 02/27 at 12:35 ~ permalink

The Meaning of Confidence

In many machine learning papers experiments are done and little confidence bars are reported for the results. This often seems quite clear, until you actually try to figure out what it means. There are several different kinds of ‘confidence’...

~ published: 02/17 at 09:36 ~ permalink

Complexity Illness

One of the enduring stereotypes of academia is that people spend a great deal of intelligence, time, and effort finding complexity rather than simplicity. This is at least anecdotally true in my experience. Math++ Several people have found that adding use...

~ published: 02/10 at 17:34 ~ permalink

Sufficient Computation

Do we have computer hardware sufficient for AI? This question is difficult to answer, but here’s a try: One way to achieve AI is by simulating a human brain. A human brain has about $10^{15}$ synapses which operate at about $10^{2}$ per second implying about...

~ published: 01/28 at 17:49 ~ permalink

Turing’s Club for Machine Learning

Many people in Machine Learning don’t fully understand the impact of computation, as demonstrated by a lack of big-O analysis of new learning algorithms. This is important—some current active research programs are fundamentally flawed w.r.t. ...

~ published: 01/25 at 18:55 ~ permalink

Why Workshop?

I second the call for workshops at ICML/COLT/UAI. Several times before, details of why and how to run a workshop have been mentioned. There is a simple reason to prefer workshops here: attendance. The Helsinki colocation has placed workshops directly be...

~ published: 01/23 at 06:57 ~ permalink

Datasets

David Pennock notes the impressive set of datasets at datawrangling....

~ published: 01/18 at 15:47 ~ permalink

2008 Summer Machine Learning Conference Schedule

Conference  Paper due date       Conference Date  Location
AAAI        January 22/23/25/30  July 13-17       Chicago, Illinois
ICML        Feb 8                July 5-9         Helsinki, Finland
COLT        Feb 20               July 9-12        Helsinki, Finland
KDD         Feb 23/29            August 24-27     Las Vegas, Nevada
UAI         Feb 27/Feb 29        July 9-12        Helsinki, Finland

Helsi...

~ published: 01/07 at 19:11 ~ permalink

Research Political Issues

I’ve avoided discussing politics here, although not for lack of interest. The problem with discussing politics is that it’s customary for people to say much based upon little information. Nevertheless, politics can have a substantial impact ...

~ published: 01/06 at 16:02 ~ permalink

Vowpal Wabbit Code Release

We are releasing the Vowpal Wabbit (Fast Online Learning) code as open source under a BSD (revised) license. This is a project at Yahoo! Research to build a useful large scale learning algorithm which Lihong Li, Alex Strehl, and I have been working on. To...

~ published: 12/21 at 09:10 ~ permalink

Cool and Interesting things at NIPS, take three

Following up on Hal Daume’s post and John’s post on cool and interesting things seen at NIPS I’ll post my own little list of neat papers here as well. Of course it’s going to be biased towards what I think is interesting. Also, I...

~ published: 12/20 at 19:54 ~ permalink

Cool and interesting things seen at NIPS

I learned a number of things at NIPS. The financial people were there in greater force than previously. Two Sigma sponsored NIPS while DRW Trading had a booth. The adversarial machine learning workshop had a number of talks about interesting applications w...

~ published: 12/19 at 17:01 ~ permalink

New Machine Learning mailing list

IMLS (which is the nonprofit running ICML) has set up a new mailing list for Machine Learning News. The list address is ML-news@googlegroups.com, and signup requires a Google account (which you can create). Only members can send messages....

~ published: 12/17 at 10:44 ~ permalink

Workshop Summary—Principles of Learning Problem Design

This is a summary of the workshop on Learning Problem Design which Alina and I ran at NIPS this year. The first question many people have is “What is learning problem design?” This workshop is about admitting that solving learning problems doe...

~ published: 12/12 at 16:52 ~ permalink

Learning Track of International Planning Competition

The International Planning Competition (IPC) is a biennial event organized in the context of the International Conference on Automated Planning and Scheduling (ICAPS). This year, for the first time, there will be a learning track of the competition. For more...

~ published: 12/10 at 20:47 ~ permalink

The Netflix Crack

A couple of security researchers claim to have cracked the Netflix dataset. The claims of success appear somewhat overstated to me, but the method of attack is valid and could plausibly be substantially improved so as to reveal the movie preferences of a sm...

~ published: 11/29 at 15:41 ~ permalink

Computational Consequences of Classification

In the regression vs classification debate, I’m adding a new “pro” to classification. It seems there are computational shortcuts available for classification which simply aren’t available for regression. This arises in several si...

~ published: 11/28 at 19:44 ~ permalink

MLSS 2008

… is in Kioloa, Australia from March 3 to March 14. It’s a great chance to learn something about Machine Learning and I’ve enjoyed several previous Machine Learning Summer Schools. The website has many more details, but registration is o...

~ published: 11/16 at 06:55 ~ permalink

BellKor wins Netflix

… but only the little prize. The BellKor team focused on integrating predictions from many different methods. The base methods consist of: Nearest Neighbor Methods; Matrix Factorization Methods (asymmetric and symmetric); Linear Regression on various f...

~ published: 11/14 at 13:53 ~ permalink

CMU wins DARPA Urban Challenge

The results have been posted, with CMU first, Stanford second, and Virginia Tech third. Considering that this was an open event (at least for people in the US), this was a very strong showing for research at universities (instead of defense contractors, fo...

~ published: 11/05 at 04:52 ~ permalink

The Machine Learning Award goes to …

Perhaps the biggest CS prize for research is the Turing Award, which has a $0.25M cash prize associated with it. It appears none of the prizes so far have been for anything like machine learning (the closest are perhaps database awards). In CS theory, the...

~ published: 11/02 at 10:33 ~ permalink

Contextual Bandits

One of the fundamental underpinnings of the internet is advertising-based content. This has become much more effective due to targeted advertising where ads are specifically matched to interests. Everyone is familiar with this, because everyone uses sea...

~ published: 10/24 at 20:49 ~ permalink