<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-3240496205319586614</id><updated>2011-12-06T14:50:20.356Z</updated><category term='Artificial Intelligence'/><category term='Large Scale ML'/><category term='Microsoft'/><category term='Rants'/><category term='Visualization'/><category term='Scientific Computing'/><category term='Sharing'/><category term='Rangespan'/><category term='Language'/><category term='Meetup'/><category term='Statistics'/><category term='Machine Learning'/><category term='F#'/><category term='iHMM'/><category term='Conference Reports'/><category term='Cognitive Science'/><category term='Cambridge'/><category term='Books'/><title type='text'>Informaniac</title><subtitle type='html'>... writings from someone who believes in the unreasonable effectiveness of data ...</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://www.informaniac.net/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>48</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-1530000457419127755</id><published>2011-12-06T14:47:00.000Z</published><updated>2011-12-06T14:47:28.008Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><category scheme='http://www.blogger.com/atom/ns#' term='Rangespan'/><category scheme='http://www.blogger.com/atom/ns#' term='Microsoft'/><category scheme='http://www.blogger.com/atom/ns#' term='Meetup'/><title type='text'>London Machine Learning Meetup ... and joining a startup.</title><content type='html'>&lt;div style="text-align: justify;"&gt;Three months ago I decided to end my employment at Microsoft. I had a great time, met loads of interesting people and will definitely miss being in the Cambridge research lab (as well as the occasional trip to Seattle).&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Next up I am joining a startup company called &lt;a href="http://www.rangespan.com/"&gt;Rangespan&lt;/a&gt;. I met the founders of Rangespan at &lt;a href="http://siliconmilkroundabout.com/"&gt;Silicon Milkroundabout&lt;/a&gt; in April this year. When I got talking to them it was immediately obvious there was a hell of a lot of machine learning to be done there.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Rangespan is essentially an e-commerce platform which is organizing a marketplace for suppliers to sell products to big retailers. On one hand Rangespan is trying to build the biggest catalog in the world. We're doing some by fusing many different information sources (supplier feeds, manufacturer feeds, crowdsourced data, crawled data). The three big learning problems in this space are:&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;ol&gt;&lt;li style="text-align: justify;"&gt;&lt;b&gt;Matching&lt;/b&gt;: out of millions of product descriptions (semi-structured &amp;amp; textual), how do you recognize which descriptions talk about the same product? &lt;i&gt;Record linkage is the keyword here, but making it work at large scale and under very small precision and recall margins is a challenge.&lt;/i&gt;&lt;/li&gt;&lt;li style="text-align: justify;"&gt;&lt;b&gt;Reconcilliation&lt;/b&gt;: when you have 100 different descriptions of a product, and you want to construct the best possible description of the product, how do you do that? &lt;i&gt;I think the speech recognition and machine translation community has a few neat tricks up their sleeve to be applied here: in the abstract if we think of each of the 100 descriptions as the output of a noisy channel with the unknown ground truth description as the input, the challenge is to invent a prior and model for the noisy channel. Machine learning baby!&lt;/i&gt;&lt;/li&gt;&lt;li style="text-align: justify;"&gt;&lt;b&gt;Information extraction&lt;/b&gt;: the data we are gathering is only semi-structured and most often marketing text. The question now becomes: how do we figure out form a USB stick description that the product has a "4GB" capacity. How do we figure out that "4GB" is the same as "four gigabytes" etc etc. &lt;i&gt;There's loads of great work on information extraction which we can build on; the good news is that our domain is restricted (physical goods) the bad news is that the domain is quite large (1000's of categories, 1000's of attributes).&lt;/i&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;&lt;div style="text-align: justify;"&gt;Besides the catalog, Rangespan is also building an order pipeline component. One crucial component of that pipeline is the system that assigns retailer's orders to suppliers. The idea here is that for each product we have a pretty good view of the supply curve (what suppliers sells how many items at what price). We also have a pretty good view of the demand curve: by aggregating many different retailers we can predict what the demand for a product will be. This allows us to make very quantitative decisions about how to choose the price for that product to optimize metrics like user experience etc. &lt;i&gt;In this area a mix of statistics (time series analysis), decision theory, econometrics and optimization are crucial ingredients ... together with a ton of data ofcourse.&lt;/i&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div style="text-align: justify;"&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div style="text-align: justify;"&gt;If this all sounds like music to your ears, then let us know. We're &lt;a href="http://www.rangespan.com/jobs/"&gt;hiring&lt;/a&gt;!&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div style="text-align: justify;"&gt;Joining a startup also meant I am now working in London. One of the coolest things I discovered about being in London is the great number of meetups people are organizing. From MongoDB office hours (thank you for all the help 10gen!) to the &lt;a href="http://www.meetup.com/hadoop-users-group-uk/"&gt;Hadoop&lt;/a&gt; and &lt;a href="http://www.meetup.com/big-data-london/"&gt;Big Data&lt;/a&gt; meetups. Although big data is crucial at Rangespan, we think clever algorithms (as in machine learning) are as crucial to building&amp;nbsp;successful&amp;nbsp;insights from data. As such we are kicking off the &lt;a href="http://www.meetup.com/London-Machine-Learning-Interest-Group/"&gt;Machine Learning Meetup&lt;/a&gt; as of January 2012. We'll put some fun talks together with the London machine learning community and probably do some study groups on the Stanford online learning courses.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div style="text-align: justify;"&gt;If you live closeby and haven't yet signed up, you can find all meetup details &lt;a href="http://www.meetup.com/London-Machine-Learning-Interest-Group/"&gt;here&lt;/a&gt;.&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-1530000457419127755?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/1530000457419127755/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=1530000457419127755' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/1530000457419127755'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/1530000457419127755'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2011/12/london-machine-learning-meetup-and.html' title='London Machine Learning Meetup ... and joining a startup.'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-4229815843579020243</id><published>2011-08-29T09:13:00.001+01:00</published><updated>2011-12-06T14:47:45.657Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Artificial Intelligence'/><title type='text'>Who Killed Prolog</title><content type='html'>&lt;div style="text-align: justify;"&gt;Via &lt;a href="http://news.ycombinator.com/"&gt;HackerNews&lt;/a&gt; comes the article &lt;a href="http://vanemden.wordpress.com/2010/08/21/who-killed-prolog/"&gt;"Who Killed Prolog"&lt;/a&gt;. The article wonders why logic programming languages have fallen behind object-oriented, imperative and functional programming languages (my favourite functional programming language F# has just entered the &lt;a href="http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html"&gt;TIOBE programming languages top-20&lt;/a&gt;).&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;The hypothesis of the article is that when a Japanse research project called the fifth generation computer system was ended, Prolog, it's main programming language went down with it.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;This reminded me of my undergraduate class in declarative languages in 2003. It started with our professor saying that there is a big cover up in the computer industry: "NASA, Microsoft, Google, HP and many other companies &lt;i&gt;are&lt;/i&gt; using Prolog in their core products but they don't want to admit it." Instant classic!&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-4229815843579020243?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/4229815843579020243/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=4229815843579020243' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/4229815843579020243'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/4229815843579020243'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2011/08/who-killed-prolog.html' title='Who Killed Prolog'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-4443657001164378594</id><published>2011-05-12T13:05:00.000+01:00</published><updated>2011-05-13T21:36:44.746+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><category scheme='http://www.blogger.com/atom/ns#' term='Large Scale ML'/><category scheme='http://www.blogger.com/atom/ns#' term='Microsoft'/><title type='text'>PQL–A Probabilistic Query Language</title><content type='html'>&lt;p align="justify"&gt;At MSR and Bing, when we do machine learning on smaller datasets (say anything below 100GB) we often use relational databases and SQL. Throw in a little bit of Excel and R and you’ve got yourself a very powerful platform for exploratory data analysis.&lt;/p&gt; &lt;p align="justify"&gt;After the exploratory phase, we often build statistical models (adPredictor, TrueSkill, Matchbox, …) to discover more complex structures in the data. &lt;a href="http://research.microsoft.com/en-us/um/cambridge/projects/infernet/"&gt;Infer.Net&lt;/a&gt; helps us prototype these graphical models, but unfortunately it forces you to work in a mode where you first create a binary that performs inference, suck out all data to your machine, run inference locally and then write all inference results back to the DB. My local machine is way slower than the machines which run our DB or our local compute cluster so ideally I’d like to have a platform which computes “close” to the data.&lt;/p&gt; &lt;p align="justify"&gt;The Probabilistic Query Language (or PQL) is a language/tool which I designed two years ago, during an internship with &lt;a href="http://research.microsoft.com/en-us/people/rherb/"&gt;Ralf Herbrich&lt;/a&gt; and &lt;a href="http://research.microsoft.com/en-us/people/thoreg/"&gt;Thore Graepel&lt;/a&gt;, where we had the following goals in mind:&lt;/p&gt; &lt;ul&gt; &lt;li&gt; &lt;div align="justify"&gt;Allow for rapid prototyping of graphical models in a relational environment &lt;/div&gt; &lt;li&gt; &lt;div align="justify"&gt;The focus should be on specifying models, not algorithms &lt;/div&gt; &lt;li&gt; &lt;div align="justify"&gt;It should enable large scale execution and bring the computation to the data, rather than the data to the computation&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p align="justify"&gt;Using SQL Server, DryadLinq (Map-Reduce for .NET) and Infer.Net I built a prototype of PQL and tested it on some frequently used models at Microsoft. In this post I want to introduce the PQL language and give a few examples of graphical models in PQL.&lt;/p&gt; &lt;div align="center"&gt; &lt;hr width="20%"&gt; &lt;/div&gt; &lt;p align="center"&gt;Let’s start with a very simple example where we have a DB with a table containing people’s info and a table with records describing doctor visits for those people. Assume the following relational schema&lt;/p&gt; &lt;p align="justify"&gt;&lt;a href="http://lh3.ggpht.com/_yBbodrC25kU/TcvM9KaP53I/AAAAAAABRDQ/2nAeK7Zv5Kc/s1600-h/image%5B2%5D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_yBbodrC25kU/TcvM-R6qIDI/AAAAAAABRDU/K61W0wwfWCE/image_thumb.png?imgmax=800" width="244" height="146"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p align="justify"&gt;We assume that people have an unknown weight and when they go to the doctor, she measures this weight. Depending on the time of day (after a heavy lunch), this estimate could be off a bit. A statistical model to capture these assumption is to introduce a random variable for the weight for each person in the &lt;em&gt;People&lt;/em&gt; table, put a prior on this variable and connect it with the observations in the &lt;em&gt;DrVisits&lt;/em&gt; table. So how do we write such a model in PQL?&lt;/p&gt; &lt;p align="justify"&gt;PQL is very much like SQL but with two extra keywords: &lt;strong&gt;AUGMENT&lt;/strong&gt; and &lt;strong&gt;FACTOR&lt;/strong&gt;. AUGMENT allows us to add random variables to the DB schema. In the example above we would write&lt;/p&gt; &lt;p align="center"&gt;&lt;font size="4"&gt;People = &lt;font color="#0000ff"&gt;AUGMENT&lt;/font&gt; DB.People &lt;font color="#0000ff"&gt;ADD&lt;/font&gt; weight &lt;font color="#0000ff"&gt;FLOAT&lt;/font&gt;&lt;/font&gt;  &lt;p align="justify"&gt;This essentially defines a “plate” in graphical model speak: for each row in the &lt;em&gt;People&lt;/em&gt; table, a random variable over the real numbers called &lt;em&gt;weight&lt;/em&gt; is defined.  &lt;p align="justify"&gt;The FACTOR keyword in PQL allows to introduce factors between random variables as well as any other variables in the DB schema. FACTOR follows the relational SQL syntax to specify exactly how to connect variables. To specify a normal prior on the &lt;em&gt;weight&lt;/em&gt; variable we could write  &lt;p align="center"&gt;&lt;font size="4"&gt;&lt;font color="#0000ff"&gt;FACTOR&lt;/font&gt; Normal(p.weight | 75.0,25.0) &lt;font color="#0000ff"&gt;FROM&lt;/font&gt; People p&lt;/font&gt;  &lt;p align="justify"&gt;This introduces a normal factor for each row in the &lt;em&gt;People&lt;/em&gt; table (the FROM People p part). The final component of our program connects the random variable with observations. In this case, we use the familiar SQL JOIN syntax to specify how to connect rows from the &lt;em&gt;People&lt;/em&gt; table to the rows in the &lt;em&gt;DrVisits&lt;/em&gt; table. In PQL we write  &lt;p align="center"&gt;&lt;font size="4"&gt;&lt;font color="#0000ff"&gt;FACTOR&lt;/font&gt; Normal(v.weight | p.weight, 1.0)&lt;br&gt;&lt;/font&gt;&lt;font size="4"&gt;&lt;font color="#0000ff"&gt;FROM&lt;/font&gt; People p&lt;br&gt;&lt;font color="#0000ff"&gt;JOIN&lt;/font&gt; DrVisit v &lt;font color="#0000ff"&gt;ON&lt;/font&gt; p.id = v.personid&lt;/font&gt;&lt;/p&gt; &lt;p align="justify"&gt;Except for the first line this is exactly SQL; instead of doing a query, the FACTOR statement describes the “probabilistic augmentation” of the DB schema”.&lt;/p&gt; &lt;p align="justify"&gt;For the example above, this is it, the PQL program contains five lines of code and can be sent to the DB. It will run inference by performing EP or variational Bayesian inference. The inference itself can be run either within the database (this was implemented by Tina Palla who was an intern with us) or on the DryadLinq cluster.&lt;/p&gt; &lt;hr width="20%"&gt;   &lt;p align="justify"&gt;Another example of PQL is the program to describe the TrueSkill ranking system. In this example we assume two-player games stored using a table of players (called &lt;em&gt;Players&lt;/em&gt;) and a table of game outcomes (called &lt;em&gt;PlayerGames&lt;/em&gt;). Each game played generates two rows in the &lt;em&gt;PlayerGames&lt;/em&gt; table: one for the winner and the loser (with a &lt;em&gt;score&lt;/em&gt;) column specifying who is the winner and who is the loser. The PQL program for TrueSkill is written below&lt;/p&gt; &lt;div align="justify"&gt; &lt;table border="0" cellspacing="0" cellpadding="2" width="455" align="center"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top" width="453"&gt; &lt;div&gt;Players = &lt;font color="#0000ff"&gt;AUGMENT&lt;/font&gt; DB.Players &lt;font color="#0000ff"&gt;ADD&lt;/font&gt; skill &lt;font color="#0000ff"&gt;FLOAT&lt;/font&gt;;&lt;br&gt;PlayerGames = &lt;font color="#0000ff"&gt;AUGMENT&lt;/font&gt; DB.PlayerGames &lt;font color="#0000ff"&gt;ADD&lt;/font&gt; performance &lt;font color="#0000ff"&gt;FLOAT&lt;/font&gt;; &lt;/div&gt; &lt;p&gt;&lt;font color="#0000ff"&gt;FACTOR&lt;/font&gt; Normal(p.skill | 25.0, 20.0) &lt;font color="#0000ff"&gt;FROM&lt;/font&gt; Players p;  &lt;p&gt;&lt;font color="#0000ff"&gt;FACTOR&lt;/font&gt; Normal(pg.performance | p.skill, 0.1)&lt;br&gt;&amp;nbsp;&amp;nbsp; &lt;font color="#0000ff"&gt;FROM&lt;/font&gt; PlayerGames pg&lt;br&gt;&amp;nbsp;&amp;nbsp; &lt;font color="#0000ff"&gt;JOIN&lt;/font&gt; Players p &lt;font color="#0000ff"&gt;ON&lt;/font&gt; pg.player_id = p.player_id; &lt;/p&gt; &lt;p&gt;&lt;font color="#0000ff"&gt;FACTOR&lt;/font&gt; IsGreater(pgb.performance, pga.performance)&lt;br&gt;&amp;nbsp;&amp;nbsp; &lt;font color="#0000ff"&gt;FROM&lt;/font&gt; PlayerGames pga &lt;br&gt;&lt;font color="#0000ff"&gt;&amp;nbsp;&amp;nbsp; JOIN&lt;/font&gt; PlayerGames pgb ON pga.game_id = pgb.game_id&lt;br&gt;&amp;nbsp;&amp;nbsp; &lt;font color="#0000ff"&gt;WHERE&lt;/font&gt; pga.player_id &amp;lt; pgb.player_id &lt;font color="#0000ff"&gt;AND&lt;/font&gt; pga.score = 0; &lt;/p&gt; &lt;p&gt;&lt;font color="#0000ff"&gt;FACTOR&lt;/font&gt; IsGreater(pga.performance, pgb.performance)&lt;br&gt;&amp;nbsp;&amp;nbsp; &lt;font color="#0000ff"&gt;FROM&lt;/font&gt; PlayerGames pga&lt;br&gt;&lt;font color="#0000ff"&gt;&amp;nbsp;&amp;nbsp; JOIN&lt;/font&gt; PlayerGames pgb ON pga.game_id = pgb.game_id &lt;br&gt;&lt;font color="#0000ff"&gt;&amp;nbsp;&amp;nbsp; WHERE&lt;/font&gt; pga.player_id &amp;lt; pgb.player_id &lt;font color="#0000ff"&gt;AND&lt;/font&gt; pga.score = 2; &lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt; &lt;p align="justify"&gt;&amp;nbsp;&lt;/p&gt; &lt;p align="justify"&gt;There are a lot of features in PQL I haven’t covered in this blog post (like using random variables in a WHERE clause to create mixture models) but I wanted to give you a flavour of what we’ve been working on so far. &lt;/p&gt; &lt;p align="justify"&gt;While working on PQL I learned a lot about the state of the art in probabilistic databases and statistical relational learning. I think compared to this academic work, PQL does not add many theoretical contributions; our goal is to design a tool which takes statistical relational learning out of the laboratory into the hands of data mining practicioners.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-4443657001164378594?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/4443657001164378594/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=4443657001164378594' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/4443657001164378594'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/4443657001164378594'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2011/05/pqla-probabilistic-query-language.html' title='PQL–A Probabilistic Query Language'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_yBbodrC25kU/TcvM-R6qIDI/AAAAAAABRDU/K61W0wwfWCE/s72-c/image_thumb.png?imgmax=800' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-6644819809778398954</id><published>2011-05-01T20:23:00.001+01:00</published><updated>2011-05-01T20:23:32.325+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='F#'/><category scheme='http://www.blogger.com/atom/ns#' term='Scientific Computing'/><title type='text'>Math.Net Numerics</title><content type='html'>&lt;p&gt;I’ve been a fan of doing numerical computation on the .NET platform for a very long time. This interest landed me an internship at Microsoft Research with Don Syme’s team in 2007 where we investigated F# suitability for scientific computing. After the internship, I joined the open source community helping out with writing a kick-ass numerical library for the .NET platform.&lt;/p&gt; &lt;p&gt;Today, I am quite proud &lt;a href="http://mathnet.squarespace.com/blog/2011/5/1/final-beta-start-of-a-competition.html"&gt;to announce&lt;/a&gt; that we are releasing the final beta of our open source project: Math.Net Numerics. Moreover, with this announcement, we are also kicking off &lt;a href="http://gemm.codeplex.com/"&gt;a competition&lt;/a&gt; to find the fastest implementation of matrix multiplication in purely managed code. The winner of this competition will receive 1500$ and we will integrate his code into our open source codebase. I’m excited to see some creative coding in the next few weeks!&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-6644819809778398954?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/6644819809778398954/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=6644819809778398954' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/6644819809778398954'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/6644819809778398954'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2011/05/mathnet-numerics.html' title='Math.Net Numerics'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-4305951782519019175</id><published>2011-04-26T09:55:00.001+01:00</published><updated>2011-04-26T09:55:51.727+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Sharing'/><title type='text'>BBC 4 podcasts</title><content type='html'>&lt;p&gt;I just discovered some really good podcasts from BBC 4 which I think some people here might enjoy.&lt;/p&gt; &lt;ul&gt; &lt;li&gt;&lt;a href="http://www.bbc.co.uk/podcasts/series/maths"&gt;A Brief History of Mathematics&lt;/a&gt;: a discussion about some great mathematicians …&lt;/li&gt; &lt;li&gt;&lt;a href="http://www.bbc.co.uk/podcasts/series/iot"&gt;In Our Time&lt;/a&gt;: very general, one-hour discussions with a few experts. I really enjoyed the episode on “Random &amp;amp; Pseudorandom” from January 13.&lt;/li&gt;&lt;/ul&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-4305951782519019175?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/4305951782519019175/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=4305951782519019175' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/4305951782519019175'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/4305951782519019175'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2011/04/bbc-4-podcasts.html' title='BBC 4 podcasts'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-7824900715364769969</id><published>2011-04-20T08:17:00.001+01:00</published><updated>2011-04-20T08:21:27.617+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Visualization'/><title type='text'>ggplot2 and Subway</title><content type='html'>&lt;p&gt;The following article caught my eye a few weeks ago: &lt;a href="http://adage.com/article/news/subway-set-overtake-mcd-s-omnipresence/139145/"&gt;Subway Set to Overtake McD's in Omnipresence&lt;/a&gt;. As I am trying to learn a little bit of ggplot2 (and loving it so far!) I thought it would be fun to try and create some visuals to go with this claim.&lt;/p&gt; &lt;p&gt;I used one of Microsoft’s restaurant datasets and do a simple substring match on “subway”, returning the latitude and longitude. Using the following lines of ggplot and R code&lt;/p&gt;&lt;pre&gt;states &amp;lt;- data.frame(map("state", plot=FALSE)[c("x","y")])&lt;br /&gt;colnames(states) &amp;lt;- c("Lon","Lat")&lt;br /&gt;ggplot(states, aes(x=Lon, y=Lat)) + geom_path() &lt;br&gt;                + geom_point(alpha=0.6,size=0.3,data=subway)&lt;/pre&gt;&lt;br /&gt;&lt;p&gt;we get a cool picture showing all of the metropolitan areas of the United States.&lt;a href="http://lh5.ggpht.com/_yBbodrC25kU/Ta6Ifqr0ZLI/AAAAAAABRCg/98rIF-kMMns/s1600-h/map%5B7%5D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto; padding-top: 0px" title="map" border="0" alt="map" src="http://lh3.ggpht.com/_yBbodrC25kU/Ta6IgT13kMI/AAAAAAABRCo/d5pZMY7Twg4/map_thumb%5B3%5D.png?imgmax=800" width="477" height="315"&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;p&gt;If you click on the image to zoom in you will be able to discern major highways as well. Subway is literally &lt;em&gt;everywhere&lt;/em&gt;.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-7824900715364769969?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/7824900715364769969/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=7824900715364769969' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/7824900715364769969'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/7824900715364769969'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2011/04/ggplot2-and-subway.html' title='ggplot2 and Subway'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_yBbodrC25kU/Ta6IgT13kMI/AAAAAAABRCo/d5pZMY7Twg4/s72-c/map_thumb%5B3%5D.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-966854390436296783</id><published>2010-10-12T00:30:00.001+01:00</published><updated>2010-10-12T00:30:35.027+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Conference Reports'/><title type='text'>ECML–PKDD 2010 Highlights</title><content type='html'>&lt;p&gt;&amp;nbsp;&lt;/p&gt; &lt;p&gt;&lt;a href="http://lh6.ggpht.com/_yBbodrC25kU/TLOeEKumeyI/AAAAAAABQw8/4b1SNXJXhw8/s1600-h/image2.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; margin: 3px 10px 5px 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" align="left" src="http://lh3.ggpht.com/_yBbodrC25kU/TLOeFXGqgeI/AAAAAAABQxE/-jF-Q6NaCtY/image_thumb.png?imgmax=800" width="244" height="184"&gt;&lt;/a&gt;The city of Barcelona just hosted ECML/PKDD this year and I had the opportunity to go down and check out the latest and greatest mostly from the European machine learning community. My personal highlights&lt;/p&gt; &lt;ul&gt; &lt;li&gt;The conference for me started with a good tutorial by Francis Bach and Guillaume Obozinsky. They gave an overview of sparsity: in particular various different methods using l1 regularization to induce sparsity. &lt;/li&gt;&lt;/ul&gt; &lt;p&gt;The first invited speaker was Hod Lipson from Cornell (and as far as I know one of the few people in our field who has given a &lt;a href="http://www.youtube.com/watch?v=lMkHYE9-R0A"&gt;TED talk&lt;/a&gt;). The main portion of Hod’s talk was about his work on symbolic regression. The idea is the following: consider the following dataset&lt;/p&gt; &lt;ul&gt;&lt;a href="http://lh6.ggpht.com/_yBbodrC25kU/TLOeFyVn-VI/AAAAAAABQxM/V08rlTEzPk4/s1600-h/image7.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; margin: ; padding-left: 0px; padding-right: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_yBbodrC25kU/TLOeGv0En-I/AAAAAAABQxU/a01PB2nyg2s/image_thumb3.png?imgmax=800" width="244" height="148"&gt;&lt;/a&gt;&lt;/ul&gt; &lt;p align="justify"&gt;We can apply our favourite regression method, say a spline, to these points and perform accurate interpolation, perhaps even some extrapolation if we chose the right model. The regression function would not give us much insight into why data could be looking like this? In symbolic regression, the idea is that we try to come up with a symbolic formula which interpolates the data. In the picture above, the formula that generated the data was EXP(-x)*SIN(PI()*x)+RANDBETWEEN(-0.001,0.001) (in Excel). Hod and his graduate students have built a very cool (and free!) app called &lt;a href="http://ccsl.mae.cornell.edu/eureqa_smooth_step"&gt;Eureqa&lt;/a&gt; which uses a genetic programming methodology to find a good symbolic expression for a specific dataset. Hod showed us how his software can recover the Hamiltonian and Lagrangian from the measurements of a double pendulum. Absolutely amazing!&lt;/p&gt; &lt;p align="justify"&gt;Another noteworthy invited speaker was &lt;a href="http://www.idsia.ch/~juergen/"&gt;Jurgen Schmidhuber&lt;/a&gt;. He tried to convince us that we need to extend the reinforcement learning paradigm. The idea would be that instead of an agent trying to optimize the amount of long term reward he gets from a teacher, he would also try to collect “internal reward”. The internal reward is defined as follows: as the agent learns, he is building a better model of the world. Another way to look at this learning is that the agent just knows how to “compress” better. The reduction in representation he gets from learning from a particular impression is what Jurgen calls the “internal reward”. In other words, it is the difference between the number of bits needed to represent your internal model before and after an impression.&lt;/p&gt; &lt;p align="justify"&gt;E.g. you listen to a new catchy son: Jurgen says that you think it’s catchy because you’ve never heard anything like this before, it is surprising and hence helps you learn a great deal. This in turn means you’ve just upgraded the “compression algorithm” in your brain and the amount of improvement is now reflected in you experiencing “internal reward”. Listening to a song you’ve heard a million times before doesn’t help compressing at all; hence, no internal reward.&lt;/p&gt; &lt;p align="justify"&gt;I really like the idea about this internal reward a lot, and as far as I understand it would be very easy to test. Unfortunately, I did not see any convincing experiments so allow me to be sceptical …&lt;/p&gt; &lt;p align="justify"&gt;The main conference was cool and I’ve met some interesting people working on things like probabilistic logic, a topic I desperately need to learn more about. &lt;a href="http://research.microsoft.com/en-us/people/gjergjik/"&gt;Gjergji&lt;/a&gt; gave a talk about &lt;a href="http://research.microsoft.com/apps/pubs/default.aspx?id=132648"&gt;our work on crowdsourcing&lt;/a&gt; (more details in a separate post). Some things I marked for looking into are:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;Sebastian Riedel, Limin Yao and Andrew McCallum – “Modeling Relations and Their Mentions Without Labeled Text”: this paper is about how to improve information extraction methods which bootstrap from existing knowledge bases using constraint driven learning techniques. &lt;li&gt;Wanner Meert, Nima Taghipour &amp;amp; Hendrik Blockeel – “First Order Bayes Ball”: a paper on how to use the Bayes ball algorithm to figure out which nodes not to ground before running lifted inference. &lt;li&gt;Daniel Hernández-Lobato, José Miguel Hernández Lobato, Thibault Helleputte, Pierre Dupont – “Expectation Propagation for Bayesian Multi-task Feature Selection”: a paper on how to run EP for spike and slab models. &lt;li&gt;Edith Law, Burr Settles, and Tom Mitchell – “Learning to Tag from Open Vocabulary Labels”: a nice paper on how to deal with tags: they use topic models to do dimensionality reduction on free text tags and then use that in a maximum entropy predictor to tag new music.&lt;/li&gt;&lt;/ul&gt; &lt;p align="justify"&gt;I enjoyed most of the industry day as well: I found it quite amusing that the Microsoft folks essentially gave all &lt;a href="http://www.bing.com/"&gt;Bing&lt;/a&gt; secrets away in one afternoon: &lt;a href="http://rakesh.agrawal-family.com/"&gt;Rakesh Agarwal&lt;/a&gt; mentioned the secret sauce behind Bing’s ranker (neural net) whereas &lt;a href="http://research.microsoft.com/en-us/people/thoreg/"&gt;Thore Graepel&lt;/a&gt; explained the magic behind the advertisements selection mechanism (probit regression). Videos of these talks should be on videolectures.net soon.&lt;/p&gt; &lt;p align="justify"&gt;One last rant: the proceedings are published by Springer who ask me to pay for the proceedings?!?! I’m still trying to figure out what value they’ve added to our camera-ready we sent them a few months ago …&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-966854390436296783?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/966854390436296783/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=966854390436296783' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/966854390436296783'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/966854390436296783'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2010/10/ecmlpkdd-2010-highlights.html' title='ECML–PKDD 2010 Highlights'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_yBbodrC25kU/TLOeFXGqgeI/AAAAAAABQxE/-jF-Q6NaCtY/s72-c/image_thumb.png?imgmax=800' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-5857193741372432255</id><published>2010-09-14T10:22:00.001+01:00</published><updated>2010-09-14T10:22:08.186+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Rants'/><category scheme='http://www.blogger.com/atom/ns#' term='Microsoft'/><category scheme='http://www.blogger.com/atom/ns#' term='Cambridge'/><title type='text'>From Undirected Grad to Informaniac</title><content type='html'>&lt;p&gt;What drives a man to change the name of his blog? A trademark dispute? A mid-life crisis? None of those actually!&lt;/p&gt; &lt;p&gt;&lt;a href="http://fuse.microsoft.com/"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="fuse-logo" border="0" alt="fuse-logo" align="left" src="http://lh5.ggpht.com/_yBbodrC25kU/TI8-ve3AN7I/AAAAAAABQwo/tk01VnukmAw/fuse-logo%5B10%5D.png?imgmax=800" width="240" height="82"&gt;&lt;/a&gt;After three amazing years, I am exchanging the amazing &lt;a href="http://cbl.eng.cam.ac.uk/Public/MLG/"&gt;machine learning group&lt;/a&gt; for an exciting new adventure with Microsoft &lt;a href="http://fuse.microsoft.com/"&gt;FUSE Labs&lt;/a&gt;. As I am writing this, my laptop is crunching away at the last few experiments to be included in my thesis while a printed version of my thesis text is awaiting final editing.&lt;/p&gt; &lt;p&gt;It has been an interesting few months as I was debating moving into a post doc or evaluating all the amazing opportunities that were available in industry: in the end I decided to join FUSE Labs as an applied researcher. FUSE Labs is an incubation lab at Microsoft that sits in between Microsoft Research and the product groups. My new job allows me to collaborate with people (in or outside of Microsoft) on research while giving me the freedom to also spend time building working prototypes and demo’s. As far as I am concerned: the perfect mix of science and engineering.&lt;/p&gt; &lt;p&gt;Although I will soon be a graduate student no more, I don’t want this blog to die but rather take it to the next level. The foreseeable future will involve more experiments on machine learning, data mining and information processing and I should be able to report much of this to everyone who cares to listen!&lt;/p&gt; &lt;p&gt;welcome to &lt;a href="http://www.informaniac.net"&gt;www.informaniac.net&lt;/a&gt; and stay tuned …&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-5857193741372432255?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/5857193741372432255/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=5857193741372432255' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/5857193741372432255'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/5857193741372432255'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2010/09/from-undirected-grad-to-informaniac.html' title='From Undirected Grad to Informaniac'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/_yBbodrC25kU/TI8-ve3AN7I/AAAAAAABQwo/tk01VnukmAw/s72-c/fuse-logo%5B10%5D.png?imgmax=800' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-149891435412385914</id><published>2010-06-22T07:10:00.001+01:00</published><updated>2010-06-22T07:10:33.328+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Rants'/><title type='text'>Rebooting the CS Publication Process</title><content type='html'>&lt;p&gt;Dan Wallach makes suggestions for revolutionizing the CS publication process in &lt;a href="http://www.cs.rice.edu/~dwallach/pub/reboot-2010-06-14.pdf"&gt;this 13 page report&lt;/a&gt;. I really like this stuff and can only hope things start moving sooner rather than later in this area.&lt;/p&gt; &lt;p&gt;On the time of reviewing, a recent paper I really enjoyed is &lt;a href="http://research.microsoft.com/pubs/122784/ReviewerCalibration.pdf"&gt;Novel Tools To Streamline the Conference Review Process: Experiences from SIGKDD'09&lt;/a&gt; by a Bristol/MSR collaboration. For me the most interesting part of the paper was a probabilistic graphical model which collaboratively learns how to calibrate reviewers. Calibration here means that we’d like to understand how different reviewers have different thresholds for giving a paper 1/2/3/4/5 star reviews. The generative model proposed in the paper can easily be implemented in &lt;a href="http://research.microsoft.com/en-us/um/cambridge/projects/infernet/"&gt;Infer.NET&lt;/a&gt; and although it is perhaps to early to make decisions based on the method, it could certainly help PC’s in making their decisions.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-149891435412385914?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/149891435412385914/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=149891435412385914' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/149891435412385914'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/149891435412385914'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2010/06/rebooting-cs-publication-process.html' title='Rebooting the CS Publication Process'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-7752107473589971755</id><published>2010-04-24T10:48:00.001+01:00</published><updated>2010-04-24T10:48:55.831+01:00</updated><title type='text'>Microsoft Research PhD Scholarships</title><content type='html'>&lt;p&gt;A new round of &lt;a href="http://research.microsoft.com/en-us/collaboration/global/apply-europe.aspx"&gt;Microsoft Research PhD scholarships&lt;/a&gt; is being organized. I’ve enjoyed being on the scholarship for the past two years: it’s been a great opportunity to meet new researchers at Microsoft and other students at the PhD event organized by MSR Cambridge.&lt;/p&gt; &lt;p&gt;For all you upcoming machine learning rock-stars out there: talk to your advisors, they will have to apply for you but you can probably help them a bit!&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-7752107473589971755?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/7752107473589971755/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=7752107473589971755' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/7752107473589971755'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/7752107473589971755'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2010/04/microsoft-research-phd-scholarships.html' title='Microsoft Research PhD Scholarships'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-3587983941391750762</id><published>2010-04-08T23:16:00.001+01:00</published><updated>2010-10-03T22:20:01.381+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><title type='text'>Popularizing Machine Learning</title><content type='html'>&lt;p&gt;Three weeks ago I gave a presentation on probabilistic modelling to an audience of data mining practitioners. These people know about machine learning, they use it every day, but their main concern is to analyze data: &lt;em&gt;real data&lt;/em&gt;!&lt;/p&gt; &lt;p&gt;The experience taught me a valuable lesson which I hadn’t come across by interacting with the academic community: Probabilistic models are hard. Even for people that are very close to the machine learning community (data mining), probabilistic models are a very different (new?) way of thinking.&lt;/p&gt; &lt;p&gt;The whole idea of building a generative story for your data and then using Bayes rule to “invert” the model given some dataset has become second nature to me. Nonetheless, I (we?) shouldn’t forget that it took statistics a long time to think about modelling in this sense. Hence I now understand realize that for outsiders the framework of probabilistic modelling is a highly non-trivial concept to grasp.&lt;/p&gt; &lt;p&gt;In this context I am obliged to share a blog post by &lt;a href="http://www.moserware.com"&gt;Jeff Moser&lt;/a&gt;. I have never seen such a great explanation of a non-trivial probabilistic model that is deployed in very large scale on XBox Live: “&lt;a href="http://www.moserware.com/2010/03/computing-your-skill.html"&gt;Computing Your Skill&lt;/a&gt;”, a description of the TrueSkill ranking model. Very very well done Jeff!&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-3587983941391750762?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/3587983941391750762/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=3587983941391750762' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/3587983941391750762'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/3587983941391750762'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2010/04/popularizing-machine-learning.html' title='Popularizing Machine Learning'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-6883977845692652273</id><published>2009-11-02T13:24:00.001Z</published><updated>2009-11-02T13:24:28.089Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><category scheme='http://www.blogger.com/atom/ns#' term='Cambridge'/><title type='text'>Machine Learning Summer School Video’s Online</title><content type='html'>&lt;p&gt;For the people who missed the &lt;a href="http://mlg.eng.cam.ac.uk/mlss09/"&gt;Machine Learning Summer School&lt;/a&gt; here in Cambridge this year, the video’s are now available at &lt;a href="http://videolectures.net/mlss09uk_cambridge/"&gt;videolectures.net&lt;/a&gt;! My personal favorites: David MacKay’s &lt;a href="http://videolectures.net/mlss09uk_mackay_it/"&gt;Information Theory&lt;/a&gt;, David Blei’s &lt;a href="http://videolectures.net/mlss09uk_blei_tm/"&gt;Topic Models&lt;/a&gt;, Iain Murray’s &lt;a href="http://videolectures.net/mlss09uk_murray_mcmc/"&gt;Markov Chain Monte Carlo&lt;/a&gt; and Tom Minka’s &lt;a href="http://videolectures.net/mlss09uk_minka_ai/"&gt;Approximate Inference&lt;/a&gt;. Enjoy …&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-6883977845692652273?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/6883977845692652273/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=6883977845692652273' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/6883977845692652273'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/6883977845692652273'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2009/11/machine-learning-summer-school-videos.html' title='Machine Learning Summer School Video’s Online'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-4437656473378746184</id><published>2009-10-19T09:23:00.000+01:00</published><updated>2009-10-19T09:23:00.563+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Rants'/><title type='text'>Statistics Fail</title><content type='html'>&lt;p&gt;From (probably) the dumbest man on earth [&lt;a href="http://www.youtube.com/watch?v=IEuEkyzTKFY"&gt;link&lt;/a&gt;].&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-4437656473378746184?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/4437656473378746184/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=4437656473378746184' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/4437656473378746184'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/4437656473378746184'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2009/10/statistics-fail.html' title='Statistics Fail'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-2720347312271932003</id><published>2009-10-14T22:09:00.001+01:00</published><updated>2009-10-14T22:09:14.998+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Books'/><title type='text'>The Elements of Statistical Learning</title><content type='html'>&lt;p&gt;&lt;a href="http://www-stat.stanford.edu/~tibs/ElemStatLearn//"&gt;The Elements of Statistical Learning&lt;/a&gt; is an absolute classic for anyone wanting to do statistics/machine learning/data mining. I read that the second edition was out and debating whether I should spend the money on this new edition. Via &lt;a href="http://www.johndcook.com/blog/2009/10/14/elements-of-statistical-learning/"&gt;John Cook&lt;/a&gt; I learned that the book is out on pdf (&lt;a href="http://www-stat.stanford.edu/~tibs/ElemStatLearn//"&gt;from their website&lt;/a&gt;). DOUBLE WIN: a) I’ve already paid once and get the upgrade for free, b) I know have a way to electronically search the book.&lt;/p&gt; &lt;p&gt;I also found out today that Koller and Friedman have just released their much anticipated book &lt;a href="http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&amp;amp;tid=11886"&gt;Probabilistic Graphical Models&lt;/a&gt; from MIT press. At a lengthy 1208 pages, this should provide enough reading for a few nights!&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-2720347312271932003?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/2720347312271932003/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=2720347312271932003' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/2720347312271932003'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/2720347312271932003'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2009/10/elements-of-statistical-learning.html' title='The Elements of Statistical Learning'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-8371617448640245720</id><published>2009-10-11T17:26:00.002+01:00</published><updated>2009-10-11T18:26:16.829+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Rants'/><title type='text'>Is Bayesian Statistics Easier?</title><content type='html'>&lt;p&gt;Many people have debated the merits of Bayesian statistics versus Frequentist statistics: very recently there was &lt;a href="http://ba.stat.cmu.edu/vol03is03.php"&gt;this discussion&lt;/a&gt; in Bayesian Analysis by Gelman. It’s a fascinating discussion and I’m just learning to ins and outs of different arguments myself.&lt;/p&gt; &lt;p&gt;First, I want to point people to &lt;a href="http://www.stat.columbia.edu/%7Egelman/stuff_for_blog/ohagan.pdf"&gt;this (short) but very insightful article&lt;/a&gt; by Tony O’Hagan. There are many ways of looking at the difference between Bayesian and Frequentist methods and this article makes one particular (a bit philosophical) viewpoint very clear.&lt;/p&gt; &lt;p&gt;I’m certainly not knowledgeable enough to take a stand in this debate but one thing I do have a strong opinion about: Bayesian methods are much easier to understand! In most statistics 101 classes we are taught a bit of point estimation (e.g. estimating a mean, a variance) which I think is understandable, but then people get into hypothesis testing and suddenly have to reason about following statements: (quoting Wikipedia here): “&lt;em&gt;Assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?”&lt;/em&gt;.&lt;em&gt; &lt;/em&gt;I certainly find this a really complicated notion to reason about. The Bayesian alternative would say something like: “&lt;em&gt;What is the probability of generating the observed value for this particular model”.&lt;/em&gt; It is the same kind of statements as “What is the probability of generating a five from a fair dice”. I honestly think statistics would be a lot more understandable (and hence I think enjoyable) if we were teaching Bayesian statistics before getting into more advanced frequentist methods. I really wonder whether there are schools out there where the undergrad stats curriculum is based on the Bayesian approach?&lt;/p&gt;&lt;p&gt;&lt;span style="font-style: italic;"&gt;EDIT - I fixed the link to O'Hagan's article.&lt;/span&gt;&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-8371617448640245720?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/8371617448640245720/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=8371617448640245720' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/8371617448640245720'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/8371617448640245720'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2009/10/is-bayesian-statistics-easier.html' title='Is Bayesian Statistics Easier?'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-3035551374667382781</id><published>2009-10-03T21:20:00.001+01:00</published><updated>2009-10-03T21:21:08.665+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><category scheme='http://www.blogger.com/atom/ns#' term='iHMM'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><title type='text'>Dirichlet Distributions and Entropy</title><content type='html'>&lt;p&gt;In between all the Netflix &lt;a href="http://news.bbc.co.uk/1/hi/technology/8268287.stm"&gt;excitement&lt;/a&gt; I managed to read a paper that was mentioned by David Blei during his machine learning summer school talk: &lt;a href="http://www.princeton.edu/~wbialek/our_papers/nemenman+al_02.pdf"&gt;Entropy and Inference, Revisited&lt;/a&gt; by Ilya Nemenman, Fariel Shafee and William Bialek from NIPS 2002. This paper discusses some issues that arise when learning Dirichlet distributions. Since Dirichlet distributions are so common, I think these results should be more widely known.&lt;/p&gt; &lt;p&gt;&lt;em&gt;[Given that I’ve been reading a lot of natural language processing papers where Dirichlet distributions are quite common, I’m surprised I haven’t run into this work before.]&lt;/em&gt;&lt;/p&gt; &lt;p&gt;First a little bit of introduction on Dirichlet distributions. The Dirichlet distribution is a “distribution over distribution”; in other words: a draw from a Dirichlet distribution is a vector of positive real numbers that sum up to one. The Dirichlet distribution is parameterized by a vector of positive real numbers which captures the mean and variance of the Dirichlet distribution. It is often very natural to work with a slightly constrained Dirichlet distribution called the symmetric Dirichlet distribution: in this case the vector of parameters to the Dirichlet are all the same number. This implies that the mean of the Dirichlet is a uniform distribution and the variance is captured by the magnitude for the vector of parameters. Let us denote with beta the parameter of the symmetric Dirichlet. Then, when \beta is small, samples from the Dirichlet will have high variance while with beta large, samples from the Dirichlet will have small variance. The plot below illustrates this idea for a Dirichlet with 1000 dimensions: the top plot has very small beta and hence a draw from this Dirichlet has only a few nonzero entries (hence high variance) while the Dirichlet with beta = 1, all entries of the sample have roughly the same magnitude (about 0.001).&lt;/p&gt; &lt;p&gt;&lt;a href="http://lh6.ggpht.com/_yBbodrC25kU/Ssex6KnS4cI/AAAAAAAASkg/vGNmpwqGTPo/s1600-h/image%5B4%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_yBbodrC25kU/Ssex7HLhQtI/AAAAAAAASko/GRKe0zWXRBU/image_thumb%5B2%5D.png?imgmax=800" width="458" height="354"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;Another way to approach the effect of beta is to look at the entropy of a sample from the Dirichlet distribution, denoted by S in the images below. The entropy of a Dirichlet draw is high when beta is large. More in particular, it is upper bounded by ln(D) where D is the dimensionality of the Dirichlet when beta approaches infinity and the Dirichlet distribution will approach a singular distribution at completely uniform discrete distribution. When beta approaches 0, a draw from a Dirichlet distribution approaches a delta peak on a random entry which is a distribution with entropy 1. The key problem the authors want to address is that when learning an unknown distribution, if we use a Dirichlet prior, beta pretty much fixes the allowed shapes while we might not have a good reason a priori to believe that what we want to learn is going to look like either one of these distributions.&lt;/p&gt; &lt;p&gt;The way the authors try to give insight into this problem is by computing the entropy of a random draw of a Dirichlet distribution. In equations, if we denote with S the entropy, it will be a random variable with distribution&lt;/p&gt; &lt;p&gt;&lt;a href="http://lh6.ggpht.com/_yBbodrC25kU/Ssex7juUViI/AAAAAAAASkw/kzOqy_VTRwc/s1600-h/image%5B9%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_yBbodrC25kU/Ssex8auDEQI/AAAAAAAASk4/rssJWkKkM2E/image_thumb%5B5%5D.png?imgmax=800" width="461" height="72"&gt;&lt;/a&gt; &lt;/p&gt; &lt;p align="center"&gt;&lt;/p&gt; &lt;p&gt;&lt;/p&gt; &lt;p&gt;Computing the full distribution is hard but the authors give a method to compute its mean and variance. The following picture shows the mean and variance of the entropy for draws of a Dirichlet distributions. A bit of notation: K is the dimensionality of the Dirichlet distribution, Xi is the mean entropy (as a function of beta) and sigma is the variance of the entropy as a function of beta.&lt;/p&gt; &lt;p&gt;&lt;a href="http://lh3.ggpht.com/_yBbodrC25kU/Ssex86qYdvI/AAAAAAAASlA/yCaFsrxxkpk/s1600-h/image%5B13%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_yBbodrC25kU/Ssex9-UWRVI/AAAAAAAASlI/HlRWOi1LThQ/image_thumb%5B7%5D.png?imgmax=800" width="465" height="365"&gt;&lt;/a&gt; &lt;/p&gt; &lt;p&gt;As you can see from this plot, the entropy of Dirichlet draws is extremely peaked for even moderately large K. The authors give a detailed analysis of what this implies but the main take-away message is this: &lt;em&gt;as you change beta, the entropy of the implied Dirichlet draws varies smoothly, however, because the variance of the entropy is very peaked, the a priori choice of beta almost completely fixes the entropy.&lt;/em&gt;&lt;/p&gt; &lt;p&gt;This is problematic as it means that unless our distribution is sampled almost completely, the estimate of the entropy is dominated by the choice of our prior beta. So how can we fix this? The authors suggest a scheme which I don’t completely understand but boils down to a mixture of Dirichlet distributions by specifying a prior distribution on beta as well.&lt;/p&gt; &lt;p&gt;This mixture idea ties in with something we did in our &lt;a href="http://mlg.eng.cam.ac.uk/jurgen/pubs/emnlp09.pdf"&gt;EMNLP 09 paper&lt;/a&gt;: when we were training our part-of-speech tagger we had to choose a prior for the distribution which specifies what words are generated for a particular part-of-speech tag. We know that we have part-of-speech tag classes that generate very few words (e.g. determiners, attributes, …) and a few classes that generate a lot of words (e.g. nouns, adjectives, …). At first, we chose a simple Dirichlet distribution (with fixed beta) as our prior and although the results were reasonable, we did run into the effect explained above: if we set beta to be large, we got very few states in the iHMM where each state outputs a lot of words. This is good to capture nouns and verbs but not good for other classes. Conversely, when we chose a small beta we got a lot of states in the iHMM each generating only a few words. Our next idea was to put a Gamma prior on beta; this helped a bit, but still assumed that there is only one value of beta (or one type of entropy distribution) which we want to learn. This is again unreasonable in the context of natural language. Finally, we chose to put a Dirichlet process prior on beta (with a Gamma base measure). This essentially allows different states in the iHMM to have different beta’s (but we only expect to see a few discrete beta’s).&lt;/p&gt; &lt;p&gt;“Entropy and Inference, Revisited” is one of those papers with a lot of intuitive explanations; hopefully it helps you make the right choice for priors in your next paper as well.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-3035551374667382781?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/3035551374667382781/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=3035551374667382781' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/3035551374667382781'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/3035551374667382781'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2009/10/dirichlet-distributions-and-entropy.html' title='Dirichlet Distributions and Entropy'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/_yBbodrC25kU/Ssex7HLhQtI/AAAAAAAASko/GRKe0zWXRBU/s72-c/image_thumb%5B2%5D.png?imgmax=800' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-5510349194588326046</id><published>2009-09-30T11:36:00.003+01:00</published><updated>2009-09-30T11:39:41.366+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Cambridge'/><title type='text'>Stats Clinic</title><content type='html'>We are starting a new machine learning related initiative in Cambridge: the walk-in &lt;a href="http://www.statslab.cam.ac.uk/clinic/"&gt;Stats Clinic&lt;/a&gt;. The goal of our consulting service is to help applied researchers with their questions in statistics or machine learning. Starting October 21 from 16:30-18:00, we will be in meeting room 5 at the  &lt;a href="http://www.cam.ac.uk/map/v4/drawmap.cgi?mp=main;xx=1400;yy=560;mt=c;mx=1090;my=513;sx=4;tl=Statistical%20Laboratory;gf=png"&gt;Centre     for Mathematical Sciences&lt;/a&gt; every other week.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-5510349194588326046?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/5510349194588326046/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=5510349194588326046' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/5510349194588326046'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/5510349194588326046'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2009/09/stats-clinic.html' title='Stats Clinic'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-7843975113372793162</id><published>2009-09-20T17:32:00.001+01:00</published><updated>2009-09-20T17:32:51.518+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><title type='text'>May All Your Wishes Come True</title><content type='html'>&lt;p align="justify"&gt;I am trying to catch up with a stack of 109 papers which are on my “To Read” list. Today I read a very enjoyable paper called &lt;a href="http://pages.cs.wisc.edu/%7Eandrzeje/publications/naacl-2009.pdf"&gt;May all your wishes come true: A study of wishes and how to recognize them&lt;/a&gt; by some of my former UW-Madison friends and colleagues Andrew B. Goldberg, Nathanael Fillmore, David Andrzejewski, Zhiting Xu, Bryan Gibson and Xiaojin Zhu.&lt;/p&gt; &lt;p align="justify"&gt;The story is as follows: in 2007, the Times Square Alliance asked people to send their wished to a &lt;a href="http://www.timessquarenyc.org/nye/nye_interactive.html"&gt;Virtual Wishing Wall&lt;/a&gt; website with the promise to print these wishes on confetti and drop it from the sky at midnight December 31, 2007. The UW-Madison people got a hold of this corpus of wishes and did some statistical analysis.&lt;/p&gt; &lt;p align="justify"&gt;A few highlights:&lt;/p&gt; &lt;ul&gt; &lt;li&gt; &lt;div align="justify"&gt;wishes follow Zipf’s law&lt;/div&gt;&lt;/li&gt; &lt;li&gt; &lt;div align="justify"&gt;topic wise, US people wish more for religion&lt;/div&gt;&lt;/li&gt; &lt;li&gt; &lt;div align="justify"&gt;topic wise, non-US people wish more for love, peace and travel&lt;/div&gt;&lt;/li&gt; &lt;li&gt; &lt;div align="justify"&gt;scope wise, US people wish to their country and family more frequently&lt;/div&gt;&lt;/li&gt; &lt;li&gt; &lt;div align="justify"&gt;scope wise, non-US people wish to the world more frequently&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p align="justify"&gt;After the statistical analysis, the authors wanted to come up with a classifier that could detect wishes. Their approach consists of extracting wish templates in an unsupervised fashion. The idea is quite clever (and I believe more widely applicable): people often wished for “world peace” and “I wish for world peace”, or “health and happiness” and “I wish for health and happiness”. In other words, there is quite a bit of redundancy in how the wish is expressed. By matching similar wishes, the authors extract the “redundant” part and assume it is a template.&lt;/p&gt; &lt;p align="justify"&gt;Matching wishes can be done as follows: you create a bipartite graph and loop over all wishes, if a wish contains another wish (“I wish for world peace” contains “world peace”), you create a content node on the left with the content part (e.g. “world peace”), a template part on the right with the redundancy (e.g. “I wish for”) and you connect these with a directed edge from the content to template node. When you’re done you loop over all template nodes and see whether any of them contain content words. If so, you introduce an edge from the template node to the content node. Finally, you score template nodes by their indegree – outdegree. I will refer you to table 4 to see that the results by this relatively simple algorithm are quite good.&lt;/p&gt; &lt;p align="justify"&gt;The reason I believe this is more widely applicable is that we can imagine using this technique for extracting templates from web queries. People often query for facts: e.g. &lt;a href="http://www.google.com/trends?q=kids+of+michael+jackson&amp;amp;ctab=180866528&amp;amp;geo=all&amp;amp;date=all"&gt;kids of Michael Jackson&lt;/a&gt;. Using the algorithm from this paper, we might be able to extract the most common templates which in turn would give us a good indication of what kind of answers search engines should be trying to answer. Just an idea …&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-7843975113372793162?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/7843975113372793162/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=7843975113372793162' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/7843975113372793162'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/7843975113372793162'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2009/09/may-all-your-wishes-come-true.html' title='May All Your Wishes Come True'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-340544422237647672</id><published>2009-09-16T22:33:00.001+01:00</published><updated>2009-09-16T22:33:07.735+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Artificial Intelligence'/><title type='text'>AI Bugs</title><content type='html'>&lt;p&gt;I thought &lt;a href="http://aigamedev.com/open/articles/bugs-caught-on-tape/"&gt;this&lt;/a&gt; was a very funny set of AI bloopers …&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-340544422237647672?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/340544422237647672/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=340544422237647672' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/340544422237647672'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/340544422237647672'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2009/09/ai-bugs.html' title='AI Bugs'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-6630992476825751134</id><published>2009-08-19T23:06:00.000+01:00</published><updated>2009-08-19T23:06:00.358+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Language'/><category scheme='http://www.blogger.com/atom/ns#' term='Conference Reports'/><title type='text'>ACL 09 &amp; EMNLP 09</title><content type='html'>&lt;p align="justify"&gt;ACL-IJCNLP 2009 and EMNLP 2009 have just finished here in Singapore. As an outsider to the field I had a hard time following many talks but nonetheless enjoyed the conference. The highlight for me was the talk by &lt;a href="http://www.cslu.ogi.edu/~sproatr/"&gt;Richard Sproat&lt;/a&gt; who wondered whether there exists a statistical test to check if a series of symbol sequences is actually a language? If this test would exist, we could use it to decide whether the set of symbols known as the &lt;a href="http://en.wikipedia.org/wiki/Indus_script"&gt;Indus Valey Script&lt;/a&gt; is actually a language. Very fascinating stuff: I immediately bought “Lost Languages” by Andrew Robinson to learn more about the history of deciphering dead languages.&lt;/p&gt; &lt;p align="justify"&gt;The paper had some very cool papers; the first one I really liked was &lt;a href="http://chasen.org/~daiti-m/paper/acl2009segment.pdf"&gt;Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling&lt;/a&gt; by &lt;a href="http://chasen.org/~daiti-m/"&gt;Daichi Mochihashi&lt;/a&gt; et al. They build on the work of Yee Whye Teh and Sharon Goldwater who showed that Kneser-Ney language modelling is really an approximate version of a hierarchical Pitman-Yor based language model (HPYLM). The HPYLM starts from a unigram model over a fixed dictionary and hence doesn’t accommodate for out of vocabulary words. Daichi et al extended the HPYLM so that the base distribution is now an character infinity-gram that is itself an HPYLM (over characters). They call this model the nested HPYLM or NPYLM. There is no need for a vocabulary of words in the NPYLM, rather, the HPYLM base distribution is a distribution over arbitrary long strings. In addition the model will perform automatic word segmentation. The results are really promising: from their paper, consider the following unsegmented English text&lt;/p&gt; &lt;center&gt; &lt;table border="0" cellspacing="0" cellpadding="2" width="400"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top" width="400"&gt; &lt;p&gt;&lt;font size="2"&gt;lastly,shepicturedtoherselfhowthissamelittlesisterofhersw&lt;br&gt;ould,intheafter-time,beherselfagrownwoman;andhowshe&lt;br&gt;wouldkeep,throughallherriperyears,thesimpleandlovingh&lt;br&gt;eartofherchildhood:andhowshewouldgatheraboutherothe&lt;br&gt;rlittlechildren,andmaketheireyesbrightandeagerwithmany&lt;br&gt;astrangetale,perhapsevenwiththedreamofwonderlandoo&lt;br&gt;ngago:andhowshewouldfeelwithalltheirsimplesorrows,an&lt;br&gt;dndapleasureinalltheirsimplejoys,rememberingherownc&lt;br&gt;hild-life,andthehappysummerdays. […]&lt;/font&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/center&gt; &lt;p&gt;When the NPYLM is trained on this data, the following is found&lt;/p&gt; &lt;center&gt; &lt;table border="0" cellspacing="0" cellpadding="2" width="400"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top" width="400"&gt; &lt;p&gt;&lt;font size="2"&gt;last ly , she pictured to herself how this same little sis-&lt;br&gt;ter of her s would , inthe after - time , be herself agrown woman ; and how she would keep , through allher ripery ears , the simple and loving heart of her child hood : and how she would gather about her other little children ,and make theireyes bright and eager with many a strange tale , perhaps even with the dream of wonderland of longago : and how she would feel with all their simple sorrow s , and a pleasure in all their simple joys , remember ing her own child - life , and thehappy summerday s .&lt;/font&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/center&gt; &lt;div align="left"&gt;&amp;nbsp;&lt;/div&gt; &lt;div align="left"&gt;&lt;a href="http://homepages.inf.ed.ac.uk/pblunsom/pubs/blunsom-acl09-short.pdf"&gt;A note on the implementation of Hierarchical Dirichlet Processes&lt;/a&gt; by &lt;a href="http://homepages.inf.ed.ac.uk/pblunsom"&gt;Phil Blunsom&lt;/a&gt; et al. In this paper the authors discuss how previous approximate inference schemes to the HDP collapsed Gibbs sampler can turn out to be quite bad. In this paper they propose a more efficient and exact algorithm for a collapsed Gibbs sampler for the HDP.&lt;/div&gt; &lt;div align="left"&gt;&amp;nbsp;&lt;/div&gt; &lt;div align="left"&gt;A few other papers I really enjoyed:&lt;/div&gt; &lt;ul&gt; &lt;li&gt; &lt;div align="left"&gt;&lt;a href="http://www.isi.edu/natural-language/mt/acl09-tag.pdf"&gt;Minimized Models for Unsupervised Part-of-Speech Tagging&lt;/a&gt; by Sujith Ravi et al.&lt;/div&gt; &lt;li&gt; &lt;div align="left"&gt;&lt;a href="http://www.cs.umass.edu/~wallach/publications/mimno09polylingual.pdf"&gt;Polylingual Topic Models&lt;/a&gt; by David Mimno et al.&lt;/div&gt; &lt;li&gt; &lt;div align="left"&gt;&lt;a href="http://www.aclweb.org/anthology-new/D/D09/D09-1011.pdf"&gt;Graphical Models over Multiple Strings&lt;/a&gt; by Markus Dreyer and Jason Eisner&lt;/div&gt; &lt;li&gt; &lt;div align="left"&gt;&lt;a href="http://www.aclweb.org/anthology/P/P09/P09-2012.pdf"&gt;Bayesian Learning of a Tree Substitution Grammar&lt;/a&gt; by Matt Post and Daniel Gildea&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-6630992476825751134?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/6630992476825751134/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=6630992476825751134' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/6630992476825751134'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/6630992476825751134'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2009/08/acl-09-emnlp-09.html' title='ACL 09 &amp;amp; EMNLP 09'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-2839126485482256752</id><published>2009-08-12T04:14:00.000+01:00</published><updated>2009-08-12T23:09:49.878+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><title type='text'>To R or not to R</title><content type='html'>At the moment 90% of the research in our lab is done using Matlab. Some of us have tried Python but were dissapointed by how hard it was to get decent (read: native) numerical routines on par with Matlab. I've been developing on the .NET platform with a little bit of Java development (for Amazon's Elastic Map-Reduce) on the side. We've had a few discussions in our lab recently about switching to R for our machine learning research. A decision hasn't been made and we're wondering what the larger machine learning community thinks about this issue. Here's a list of pros and cons that I've come up with&lt;br /&gt;&lt;br /&gt;Pros:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;great support for statistical functions&lt;/li&gt;&lt;li&gt;great support from the statistical community&lt;/li&gt;&lt;li&gt;good (if not better) plotting that Matlab&lt;/li&gt;&lt;li&gt;reasonable fast (check out &lt;a href="http://mlg.eng.cam.ac.uk/dave/"&gt;Dave&lt;/a&gt;'s &lt;a href="http://mlg.eng.cam.ac.uk/dave/rmbenchmark.php"&gt;benchmarking&lt;/a&gt;)&lt;/li&gt;&lt;li&gt;free (very important!) if we want to run a job on 100 machines (e.g. in the cloud), I believe currently you need a matlab licence for each one&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Cons:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;pre-historic interface, doesn't compare to modern IDE's&lt;/li&gt;&lt;li&gt;debugging doesn't seem up to Matlab standards&lt;/li&gt;&lt;li&gt;not much support in machine learning community (?)&lt;/li&gt;&lt;li&gt;learning curve&lt;/li&gt;&lt;/ul&gt;The jury is still out on this one ... I really wonder how many machine learners out there already use R?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-2839126485482256752?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/2839126485482256752/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=2839126485482256752' title='22 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/2839126485482256752'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/2839126485482256752'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2009/08/to-r-or-not-to-r.html' title='To R or not to R'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>22</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-8070353770632636302</id><published>2009-08-07T16:28:00.001+01:00</published><updated>2009-08-07T16:29:50.592+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><title type='text'>AIStats is coming to Europe</title><content type='html'>&lt;p&gt;That’s right, the conference that only happened every two years in the US is now going to be switching between the US and Europe on an annual basis. The conference website is up and running &lt;a href="http://www.aistats.org"&gt;here&lt;/a&gt;.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-8070353770632636302?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/8070353770632636302/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=8070353770632636302' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/8070353770632636302'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/8070353770632636302'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2009/08/aistats-is-comming-to-europe.html' title='AIStats is coming to Europe'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-2494123202586528103</id><published>2009-06-11T09:02:00.002+01:00</published><updated>2009-06-11T09:03:44.494+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><category scheme='http://www.blogger.com/atom/ns#' term='iHMM'/><category scheme='http://www.blogger.com/atom/ns#' term='Large Scale ML'/><title type='text'>Parallel Random Number Generators</title><content type='html'>&lt;p align="justify"&gt;We recently got a paper accepted in EMNLP where we elaborate on a point that was touched upon in Mark Johnson’s &lt;a href="http://acl.ldc.upenn.edu/D/D07/D07-1031.pdf"&gt;“Why doesn’t EM find good HMM POS-taggers?”&lt;/a&gt;: if we are going to do completely unsupervised POS-tagging, how can non-parametric Bayesian methods help us? There were two parts to our paper: extending the iHMM to be able to handle a medium sized text corpus and an evaluation of the results to see how the iHMM states correspond to POS tags. I’ll blog about our findings later but one thing that came up during this research was the following: because of the size (1,000,000 datapoints) of the corpus (Wall Street Journal of Penn Treebank) we used, we wanted to parallelize the forward-filtering backward-sampling (FFBS) part of the inference. Because all the sentences are independent given the transition matrix and output parameters, we can easily run the FFBS in parallel.&lt;/p&gt; &lt;p align="justify"&gt;One thing I started thinking about is the following: imagine I run on a machine with 4 cores and I start a thread for each core to process a chunk of the data. Since the FFBS is a sampling algorithm, each of these 4 threads needs to access a random number generator (RNG). I know of two ways to go about this:&lt;/p&gt; &lt;ol&gt; &lt;li&gt; &lt;div align="justify"&gt;We use one RNG and access it from the 4 threads. Since the RNG’s I use are stateful, we need to lock them when we query them from a number so it’s internal state doesn’t get messed up. Locking an RNG can incur a large overhead and easily becomes the bottleneck of the parallelization.&lt;/div&gt; &lt;/li&gt;&lt;li&gt; &lt;div align="justify"&gt;Use one RNG for each thread. How do we initialize them though? I use the .NET &lt;a href="http://msdn.microsoft.com/en-us/library/system.random.aspx"&gt;System.Random&lt;/a&gt; RNG which uses a time dependent random seed. I don’t think this is a good idea because of the resolution of this time dependent seed is to coarse, and my 4 threads start almost at the same time, they might all end up using the same seed. An alternative is to initialize a global RNG to produce the seeds for the thread local RNG’s.&lt;/div&gt;&lt;/li&gt;&lt;/ol&gt; &lt;p align="justify"&gt;The second alternative in (2) is what we used but it does worry me a little bit: how can I know that the streams of random numbers in different threads are going to be ‘good enough’ (whatever that means). As a non-expert on RNG’s I’d like to know whether I’m going to get into trouble with these kinds of approaches. Matlab has a way to generate &lt;a href="http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/math/brnuahp.html"&gt;3 statistically independent streams&lt;/a&gt; using the same RNG with the same seed. This seems like a step in the right direction, but it doesn’t solve my quad core problem, let alone when we run our algorithm on many more cores in Amazon Elastic Map Reduce …&lt;/p&gt; &lt;p align="justify"&gt;Any creative solutions out there?&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-2494123202586528103?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/2494123202586528103/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=2494123202586528103' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/2494123202586528103'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/2494123202586528103'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2009/06/parallel-random-number-generators.html' title='Parallel Random Number Generators'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-9032920941722230723</id><published>2009-05-06T16:09:00.003+01:00</published><updated>2009-05-06T16:12:02.094+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><title type='text'>Machine Learning Summer School Application Process Opens</title><content type='html'>Just to let all of you know that the application process for the summer school has opened. More info &lt;a href="http://mlg.eng.cam.ac.uk/mlss09/application.htm"&gt;here&lt;/a&gt;. The deadline for applications is June 1. The list of &lt;a href="http://mlg.eng.cam.ac.uk/mlss09/index.html"&gt;confirmed speakers&lt;/a&gt; is pretty a-ma-zing if you ask me.&lt;br /&gt;&lt;br /&gt;Hopefully I get to meet some of you in September!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-9032920941722230723?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/9032920941722230723/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=9032920941722230723' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/9032920941722230723'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/9032920941722230723'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2009/05/machine-learning-summer-school.html' title='Machine Learning Summer School Application Process Opens'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-1054822592883199893</id><published>2009-03-27T07:54:00.001Z</published><updated>2009-03-27T07:54:48.739Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Large Scale ML'/><title type='text'>The Unreasonable Effectiveness of Data</title><content type='html'>&lt;p&gt;Alon Halevy, Peter Norvig, and Fernando Pereira (from the Google) just wrote an intriguing article called &lt;a href="http://www.computer.org/portal/cms_docs_intelligent/intelligent/homepage/2009/x2exp.pdf"&gt;the unreasonable effectiveness of data&lt;/a&gt; in IEEE Intelligent Systems. They argue a few points: 1) use more data to increase the performance of our learning algorithms, don’t make your models to complex; 2) the semantic web is not the right approach, it’s too expensive to “label” the web with its semantics, we need to learn it. However for an non-parametric Bayes person this is what got me very excited:&lt;/p&gt; &lt;p&gt;&lt;em&gt;So, follow the data. Choose a representation that can use unsupervised learning on unlabeled data, which is so much more plentiful than labeled data. Represent all the data with a nonparametric model rather than trying to summarize it with a parametric model, because with very large data sources, the data holds a lot of detail.&lt;/em&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-1054822592883199893?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/1054822592883199893/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=1054822592883199893' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/1054822592883199893'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/1054822592883199893'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2009/03/unreasonable-effectiveness-of-data.html' title='The Unreasonable Effectiveness of Data'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-257449943022397476</id><published>2008-12-12T16:32:00.001Z</published><updated>2008-12-12T16:34:09.533Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><title type='text'>Machine Learning Summer School in Cambridge!</title><content type='html'>&lt;p align="justify"&gt;The machine learning summer school 2009 will be organized in Cambridge (UK) from August 29 to September 10, 2009. On &lt;a href="http://mlg.eng.cam.ac.uk/mlss09/"&gt;this website&lt;/a&gt; we will soon be posting the list of speakers, the application procedure and the other practical information.&lt;/p&gt; &lt;p align="justify"&gt;See you in Cambridge!&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-257449943022397476?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/257449943022397476/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=257449943022397476' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/257449943022397476'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/257449943022397476'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2008/12/machine-learning-summer-school-in.html' title='Machine Learning Summer School in Cambridge!'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-6214488328481510290</id><published>2008-10-04T22:19:00.001+01:00</published><updated>2008-10-04T22:19:39.401+01:00</updated><title type='text'>Inference in Infinite Capacity Models</title><content type='html'>&lt;p&gt;Through the work we've done on nonparametric Bayesian (NPBayes) time series models, I've been thinking a lot about how we can improve inference in infinite capacity models. One observation is that we don't want to throw away all the knowledge and experience we've built up in doing deterministic inference for finite models.&lt;/p&gt; &lt;p&gt;Expectation propagation (EP), belief propagation (BP) and variational Bayes (VB) are three deterministic inference techniques which have proven to be useful in many different scenarios. People have tried to apply these to NPBayes with mixed success:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;&lt;a href="http://www.cs.toronto.edu/~beal/npbayes/Minka.pdf"&gt;Minka &amp;amp; Ghahramani&lt;/a&gt; applied EP to do inference in DP mixtures; as far as I understand their work, their are a few annoyances: a) the approximation can depend heavily on the order in which one processes the data points; b) the posterior distribution ends up having a cluster centre for each data point and these centres don't necessarily (and it their experiments never) fall on top of each other. This is weird as the whole point of the DP is to make sure that there is only a small number of cluster centres. I can imagine there are other ways to apply EP to DP mixtures but so far this endeavour hasn't drastically changed the business of inference for NPBayes.  &lt;li&gt;There have been a few approaches (&lt;a href="http://www.cs.berkeley.edu/~jordan/papers/blei-jordan-ba.pdf"&gt;Blei&lt;/a&gt;, &lt;a href="http://kenichi.kurihara.googlepages.com/Kurihara-IJCAI07.pdf"&gt;Kurihara&lt;/a&gt;, ...) to apply VB to DP mixtures and related models. The inference is quite elegant and quite a few applications have shows that VB inference is competitive with Gibbs sampling. VB has advantages such as returning a lower bound on the marginal likelihood which are hard to get using sampling methods. As far as I am concerned, the disadvantage of VB is that the math can become a bit hairy. &lt;li&gt;I am not aware of any BP inference for NPBayes. I'd love to hear if anyone has tried this?&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;What I've been thinking about lately is something completely different. Our work on the &lt;a href="http://mlg.eng.cam.ac.uk/jurgen/pubs/icml2008ihmm.pdf"&gt;beam sampler&lt;/a&gt; has shown that if one uses a slice sampler in the appropriate way, one can adaptively slice up an infinite model into a finite model and use deterministic inference methods on a finite graphical model. The beam sampler is essentially a combination of this slice sampling idea with belief propagation (more in particular, the forward-backward algorithm). The scope of this technique is broader than the iHMM though: we applied this technique to the iFHMM and one can also imagine this scheme to work on infinite state Bayesian networks (ISBN). In the ISBN context we would introduce a slice for every infinite dimensional distribution and switch between sampling the slice variables and deterministic inference in the finite Bayesian network. The disadvantage of the slice sampler is that one ends up with samples and for many applications it is not clear how to combine them.&lt;/p&gt; &lt;p&gt;After a suggestion by &lt;a href="http://www.cs.toronto.edu/~radford/"&gt;Radford Neal&lt;/a&gt; I also looked into a different scheme to adaptively truncate the iHMM. The algorithm is an adaptation of the embedded HMM sampling scheme for nonlinear dynamical systems. Imagine we draw the trellis for the infinite dimensional state space of an iHMM:&lt;a href="http://lh3.ggpht.com/jurgen.vangael/SOfd4DEzNbI/AAAAAAAANUM/rV4t1vaHmY0/image18.png"&gt;&lt;img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="297" alt="image" src="http://lh3.ggpht.com/jurgen.vangael/SOfd478kkpI/AAAAAAAANUU/AAqk-BFSoxA/image_thumb14.png" width="389" border="0"&gt;&lt;/a&gt; &lt;/p&gt; &lt;p&gt;The horizontal ... denote that there are T time steps in the iHMM, the vertical ... denote that there are a possible infinite number of state the iHMM can be in. The grayed out nodes denote a initial state sequence assignment (we are going to sample here). Note that I haven't draw all the possible transition arrows, in general there is a nonzero transition from every node at time t-1 to time t.&lt;/p&gt; &lt;p&gt;The embedded HMM sampler works as follows: for each time step, we create a &lt;em&gt;pool&lt;/em&gt; of states which we want to include in the deterministic inference which we will run later. In the image below, I denote the states in the pool by shading them with light gray. The size of this pool will be a parameter which we can choose later (in the picture below it is 2).&lt;/p&gt; &lt;p&gt;&lt;a href="http://lh4.ggpht.com/jurgen.vangael/SOfd56fRojI/AAAAAAAANUc/HJc4d58qkVY/image17.png"&gt;&lt;img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="299" alt="image" src="http://lh5.ggpht.com/jurgen.vangael/SOfd6jSTd0I/AAAAAAAANUk/e-oQ1FWOXcY/image_thumb13.png" width="392" border="0"&gt;&lt;/a&gt; &lt;/p&gt; &lt;p&gt;We can choose how to draw the pools but one must make sure that the states from the previous sample (see picture above) are certainly included in the pool; it is also desirable to not spend too much computation on selecting the pool of states.&lt;/p&gt; &lt;p&gt;One iteration of a sampler consists of sampling the pool states and then running the forward-filtering backward-sampling algorithm in a finite graphical model that only uses pool states. If you go through the math you will find that you'll have to slightly modify the dynamic program to make this work. I will post a short technical report with some mathematical details on my web page soon.&lt;/p&gt; &lt;p&gt;The embedded HMM sampler shares many of the properties of the slice sampler but differs in one major respect: with the slice sampler one has little control over how much computation one will spend on one iteration. The reason is that between iterations the size of the finite model we consider grows or shrinks according to the sampled slice variables. In the embedded HMM however, we explicitly control the size of the finite model we want to do inference over. The flip side is that if we choose the pools in the embedded HMM sampler to be small we will likely increase the mixing time of the sampler.&lt;/p&gt; &lt;p&gt;Nonetheless the basic idea is the same as for the beam sampler: you choose a finite subset of the NPBayes graphical model and then run any deterministic inference algorithm that will get you a sample form the finite subset.&lt;/p&gt; &lt;p&gt;The message I want to get across here is that if we want to do inference in infinite capacity models while at the same time leveraging a powerful deterministic inference algorithm for finite models we can do so by combining a method that truncates the infinite model (say using slice sampling or the embedded HMM sampler) and then applying the deterministic inference scheme on a finite graphical model. I am eager to play around with Infer.NET and see if it will be possible to do use this idea to automatically generate inference algorithms for infinite capacity models.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-6214488328481510290?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/6214488328481510290/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=6214488328481510290' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/6214488328481510290'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/6214488328481510290'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2008/10/inference-in-infinite-capacity-models.html' title='Inference in Infinite Capacity Models'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/jurgen.vangael/SOfd478kkpI/AAAAAAAANUU/AAqk-BFSoxA/s72-c/image_thumb14.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-8069429482338110576</id><published>2008-09-26T15:04:00.001+01:00</published><updated>2008-09-26T15:04:09.389+01:00</updated><title type='text'>NIPS 2008 Accepted Papers Out!</title><content type='html'>&lt;p&gt;Hot from the presses, the full list of &lt;a href="http://nips.cc/Conferences/2008/Program/accepted-papers.php"&gt;NIPS 2008 accepted papers&lt;/a&gt; are out. I quickly browsed through the papers and can say I am looking forward to these:&lt;/p&gt; &lt;p&gt;Cognitive Science&lt;/p&gt; &lt;ul&gt; &lt;li&gt;&lt;em&gt;Analyzing human feature learning as nonparametric Bayesian inference, &lt;/em&gt;J. Austerweil, T. Griffiths&lt;/li&gt; &lt;li&gt;&lt;em&gt;Depression: an RL formulation and a behavioural test, &lt;/em&gt;Q. Huys, j. vogelstein, P. Dayan&lt;/li&gt; &lt;li&gt;&lt;em&gt;Modeling human function learning with Gaussian processes, &lt;/em&gt;T. Griffiths, C. Lucas, J. Williams, M. Kalish&lt;/li&gt; &lt;li&gt;&lt;em&gt;Modeling the effects of memory on human online sentence processing with particle filters, &lt;/em&gt;R. Levy, F. Reali, T. Griffiths &lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Neuroscience&lt;/p&gt; &lt;ul&gt; &lt;li&gt;&lt;em&gt;Characterizing neural dependencies with Poisson copula models, &lt;/em&gt;P. Berkes, F. Wood, J. Pillow &lt;/li&gt; &lt;li&gt;&lt;em&gt;Dependent Dirichlet Process Spike Sorting&lt;/em&gt;, J. Gasthaus, F. Wood, D. Gorur, Y. Teh&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Machine Learning&lt;/p&gt; &lt;ul&gt; &lt;li&gt;&lt;em&gt;Gates&lt;/em&gt;, T. Minka, J. Winn: the next big thing in graphical models?&lt;/li&gt; &lt;li&gt;&lt;em&gt;Non-stationary dynamic Bayesian networks&lt;/em&gt;, J. Robinson, A. Hartemink&lt;/li&gt; &lt;li&gt;&lt;em&gt;Nonparametric Bayesian Learning of Switching Linear Dynamical Systems, &lt;/em&gt;E. Fox, E. Sudderth, M. Jordan, A. Willsky: some of which we've seen at the NPBayes 2008 workshop&lt;/li&gt; &lt;li&gt;&lt;em&gt;The Mondrian Process&lt;/em&gt;, D. Roy, Y. Teh: some of which we've seen at the NPBayes 2008 workshop&lt;/li&gt;&lt;/ul&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-8069429482338110576?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/8069429482338110576/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=8069429482338110576' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/8069429482338110576'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/8069429482338110576'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2008/09/nips-2008-accepted-papers-out.html' title='NIPS 2008 Accepted Papers Out!'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-2914497612699827550</id><published>2008-09-21T15:12:00.001+01:00</published><updated>2008-09-21T15:12:05.621+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Rants'/><title type='text'>Graduate School in Cambridge</title><content type='html'>&lt;p&gt;Someone mails me to ask what it is like to do a PhD in machine learning in Cambridge. With my first year of graduate school behind me I have a thing or two to say about the program here. So to all aspiring Cambridge graduate students: listen up!&lt;/p&gt; &lt;p align="justify"&gt;Overall I think Cambridge is an amazing place to be. It's like Disneyland for academics. There is so much stuff going on that every day I am disappointed about which talks I had to miss because there was another one going on at the same time. There are so many machine learning people around that I haven't even met them all. The ones I meet frequently with are&lt;/p&gt; &lt;ul&gt; &lt;li&gt; &lt;div align="justify"&gt;&lt;a href="http://www.inference.phy.cam.ac.uk/mackay/"&gt;David MacKay's group&lt;/a&gt; &lt;/div&gt; &lt;li&gt; &lt;div align="justify"&gt;&lt;a href="http://research.microsoft.com/mlp/"&gt;Microsoft Research Cambridge&lt;/a&gt; &lt;/div&gt; &lt;li&gt; &lt;div align="justify"&gt;&lt;a href="http://www.cl.cam.ac.uk/"&gt;the Computer Lab&lt;/a&gt; &lt;/div&gt; &lt;li&gt; &lt;div align="justify"&gt;&lt;a href="http://www.statslab.cam.ac.uk/"&gt;the Statistics Laboratory&lt;/a&gt; &lt;/div&gt; &lt;li&gt; &lt;div align="justify"&gt;&lt;a href="http://www.eng.cam.ac.uk/research/div-f/divhomeF.shtml"&gt;the Engineering Department&lt;/a&gt; &lt;/div&gt; &lt;li&gt; &lt;div align="justify"&gt;&lt;a href="http://www.ebi.ac.uk/"&gt;the European Bioinformatics Institute&lt;/a&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p align="justify"&gt;Needless to say there is plenty of opportunity to talk to people about machine learning. Each group has different accents and if you're interested in applying you might want to evaluate what kind of machine learning you want to do first. If pure machine learning or neuroscience is your thing, you'd probably fit in well with David's or our group. If you are more into applications the other engineering or computer lab groups would be a better fit. If learning theory is your thing, you'll also find people in the computer lab that do this kind of stuff. If you're more into theory the statistics laboratory would probably suit you well. Finally, if you don't mind a 45 minute train ride to London you will find yourself having access to the Gatsby unit, the UCL machine learning guys and probably many others I haven't met (Imperial College, ... ?)&lt;/p&gt; &lt;p align="justify"&gt;Life in Cambridge is good too; the colleges will provide you with a place to stay, friendly people to have dinner with, social events and tons of sports to choose from. Cambridge is also very friendly towards foreigners and I've noticed that in grad school probably more than 70% of the people are internationals.&lt;/p&gt; &lt;p align="justify"&gt;As far as I am concerned, there are two big turn-offs in Cambridge. The first is the application process itself. The whole thing is completely outdated and you will find yourself being accepted by a professor in February but only getting official letters about it in June or so. The whole process is slow and when you get all your US offers mid April you will find yourself completely in the dark with respect to your Cambridge status. I also found funding much more fragmented than in the US; a lot of people who get funding offers a few weeks before the academic year starts (we're talking September here). The other thing is graduate level classes: Cambridge does not have graduate level classes at the level that a big American university will organize them. Given my interest in machine learning I've been to a couple of statistics classes in the &lt;a href="http://www.maths.cam.ac.uk/postgrad/casm/"&gt;part III maths&lt;/a&gt; program which I found to be comparable to the lower level graduate classes in the US. So my advice for anyone who hasn't taken any graduate level machine learning classes is to first go to the US for a one or two years Master program or check out the &lt;a href="http://web4.cs.ucl.ac.uk/research/csml/msccsml/"&gt;programs at UCL&lt;/a&gt; or &lt;a href="http://www.inf.ed.ac.uk/postgraduate/ai.html"&gt;Edinburgh&lt;/a&gt; and then apply to Cambridge.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-2914497612699827550?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/2914497612699827550/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=2914497612699827550' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/2914497612699827550'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/2914497612699827550'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2008/09/graduate-school-in-cambridge.html' title='Graduate School in Cambridge'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-6250898441531501803</id><published>2008-07-29T22:37:00.000+01:00</published><updated>2008-07-29T22:38:13.225+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='iHMM'/><title type='text'>The Infinite Hidden Markov Model</title><content type='html'>&lt;p&gt;I am at ICML two weeks ago I presented some of our work on the infinite hidden Markov model (also known as iHMM or HDP-HMM). Together with a result from Emily Fox, I believe we have come full circle and it is time for a little summary.&lt;/p&gt; &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Hidden_Markov_model"&gt;Hidden Markov models&lt;/a&gt; (HMM) have been the workhorse of the discrete time, discrete state time series community for over 40 years. We have seen applications in speech, robotics, natural language processing, vision, ... Sometimes, domain knowledge can be used to choose the dimensionality of the latent state space, but very frequently there is no reason to prefer one dimension over the other. Moreover, maybe we explicitly want to model the uncertainty in the dimensionality of the latent state space! Hence, variational Bayes, BIC or other model selection techniques do not apply and one option is to turn to infinite capacity models.&lt;/p&gt; &lt;p&gt;In 2002, Matt Beal, Zoubin Ghahramani and Carl Rasmussen &lt;a href="http://books.nips.cc/papers/files/nips14/AA01.pdf"&gt;introduced&lt;/a&gt; a first version of an infinite capacity model which they called the iHMM. Their proposal is a procedure based on a two level hierarchy of urns which generates a potentially infinite transition matrix. Although it is an extremely elegant proposal, it turned out to be pretty hard to come up with a correct &lt;em&gt;and&lt;/em&gt; efficient sampling scheme to learn this model. The model has three parameters: one to control the number of states, one to control the sparsity of the transition matrix and one to control the self transition probability.&lt;/p&gt; &lt;p&gt;Then, in 2006, Yee Whye Teh, Mike Jordan, Matt Beal and Dave Blei introduced the &lt;a href="http://www.gatsby.ucl.ac.uk/~ywteh/research/npbayes/hdp2004.pdf"&gt;Hierarchical Dirichlet Process&lt;/a&gt; (HDP). The HDP is a very natural prior for building hierarchies of clusters: in other words, cluster datapoints in different groups so that clusters are shared between groups. Although the HDP is technically a bit complex, it is theoretically well founded and in the 2006 paper a relatively efficient Gibbs sampling scheme was introduced. In the same 2006 paper, an infinite capacity hidden Markov model was built on top of the HDP. This model is very close to the original iHMM but only had two parameters: the self transition control was left out. This is unfortunate as for many applications of HMM's (changepoint detection) we know a priori that the Markov model is going to stay in the same state. Finally, this "version 2.0" of the iHMM still only used a Gibbs sampling scheme for inference. This is unfortunate as we have &lt;a href="http://www.jstor.org/pss/3085787"&gt;come to understand&lt;/a&gt; that Gibbs sampling can perform poorly in time series models.&lt;/p&gt; &lt;p&gt;Enter ICML 2008: the first iHMM related result was a paper by Emily Fox, Erik Sudderth, Michael Jordan, and Alan Willsky called "&lt;a href="An HDP-HMM for Systems with State Persistence"&gt;An HDP-HMM for Systems with State Persistence&lt;/a&gt;". In this paper, the "version 2.0" iHMM model was extended to include the self transition parameter again. As you can read in their paper, this makes a very big difference: combined with some other neat ideas they were able to present some impressive performance on speaker diarization. Next, was our (Jurgen Van Gael, Yunus Saatci, Yee Whye Teh, and Zoubin Ghahramani) paper called "&lt;a href="http://mlg.eng.cam.ac.uk/jurgen/pubs/icml2008ihmm.pdf"&gt;Beam Sampling for the Infinite Hidden Markov Model&lt;/a&gt;". The main idea in our work is to use a slice sampling scheme to adaptively truncate the infinite model to a finite model. This then allows us to run a dynamic program and resample the whole state sequence at once. Although we've only presented the beam sampler in the context of the iHMM, the technique is applicable to most nonparametric constructions I know of. One thing that I hope will happen is that someone will try to use the slice sampling technique to cut up nonparametric models into finite models, and use our best inference machinery (variational, BP, EP, ...) to perform inference in the finite model. This would allow us to leverage all the knowledge we've built up for finite graphical models and use them in the nonparametric setting.&lt;/p&gt; &lt;p&gt;Stepping back and taking a bird's eye view on things: I think that after the four iHMM papers, I think we now have a strong platform to start using the iHMM as a building block in more complicated models. At the NPBayes workshop we've already seen some &lt;a href="http://www.gatsby.ucl.ac.uk/~ywteh/npbayes2008/abstracts/Fox.pdf"&gt;cool results&lt;/a&gt; by Emily who uses the iHMM to switch between different auto-regressive processes and linear dynamic systems. Another trivial extension which we've implemented is to create an IO-iHMM: condition the iHMM on a certain input sequence. &lt;a href="http://mlg.eng.cam.ac.uk/finale/"&gt;Finale&lt;/a&gt; and I are currently experimenting with this setup for learning infinite POMDP's; more about that in a future post. I've been promised that more interesting extensions are in the making ... let's see what NIPS brings!&lt;/p&gt; &lt;p&gt;As I have to write up a first year report (Cambridge's idea of a thesis proposal), I have been thinking very hard about a short, self contained and easy explanation of the iHMM. A description of the model so everyone can use and implement it without having to work through the whole HDP formalism. I will definitely post the outcome on my webpage early September. At least one cool spin-off has come out of this report: based on a suggestion by &lt;a href="http://www.cs.toronto.edu/~radford/"&gt;Radford Neil&lt;/a&gt;, we've come up with a different dynamic programming based sampling scheme based on the &lt;a href="http://www.cs.toronto.edu/~radford/ftp/emb-hmm-nips.pdf"&gt;embedded HMM&lt;/a&gt;. As it is more incremental work, I will probably only describe it in my first year report and this blog so stay tuned ...&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-6250898441531501803?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/6250898441531501803/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=6250898441531501803' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/6250898441531501803'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/6250898441531501803'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2008/07/infinite-hidden-markov-model.html' title='The Infinite Hidden Markov Model'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-5940438125771481318</id><published>2008-07-26T18:48:00.001+01:00</published><updated>2008-07-26T18:48:27.989+01:00</updated><title type='text'>The Last Lecture</title><content type='html'>&lt;p&gt;I had never heard of Randy Pausch before until reading &lt;a href="http://googleresearch.blogspot.com/2008/07/remembering-randy-pausch.html"&gt;this post&lt;/a&gt;. His "&lt;a href="http://www.youtube.com/watch?v=ji5_MqicxSo"&gt;Last Lecture&lt;/a&gt;" is amazing ... this is a great man with an inspiring message!&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-5940438125771481318?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/5940438125771481318/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=5940438125771481318' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/5940438125771481318'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/5940438125771481318'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2008/07/last-lecture.html' title='The Last Lecture'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-6050926915459799467</id><published>2008-07-10T10:23:00.001+01:00</published><updated>2008-07-10T10:23:03.106+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Conference Reports'/><title type='text'>NPBayes workshop at ICML</title><content type='html'>&lt;p&gt;Yesterday, between ICML and UAI there was the &lt;a href="http://npbayes.wikidot.com/"&gt;nonparametric Bayes (NPBayes) workshop&lt;/a&gt; organized by &lt;a href="http://www.gatsby.ucl.ac.uk/~ywteh"&gt;Yee Whye Teh&lt;/a&gt;, &lt;a href="http://www.eecs.berkeley.edu/~thibaux"&gt;Romain Thibaux&lt;/a&gt;, &lt;a href="http://www.soe.ucsc.edu/~thanos/"&gt;Athanasios Kottas&lt;/a&gt;, &lt;a href="http://learning.eng.cam.ac.uk/zoubin"&gt;Zoubin Ghahramani&lt;/a&gt; &amp;amp; &lt;a href="http://www.eecs.berkeley.edu/~jordan"&gt;Michael Jordan&lt;/a&gt;. The &lt;a href="http://npbayes.wikidot.com/schedule"&gt;program was packed&lt;/a&gt; with a lot of very interesting talks which for your convenience were recorded and will be put online at videolectures.net soon.&lt;/p&gt; &lt;p&gt;Just after lunch, there was a panel discussion on software for NPBayes.&amp;nbsp; I thought much of the discussion also applied to graphical models in general. The main contenders for general software are (with main pros and cons):&lt;/p&gt; &lt;ul&gt; &lt;li&gt;the &lt;a href="http://www.cs.utah.edu/~hal/HBC/"&gt;Hierarchical Bayes Compiler&lt;/a&gt; by fellow &lt;a href="http://nlpers.blogspot.com/"&gt;blogger&lt;/a&gt; Hal Daume III. Pros: software is freely available, continuing development, quite a large feature set (including NPBayes components) and a group of active users to show that it actually works. Cons: the language itself is more limited than some of its contenders.&lt;/li&gt; &lt;li&gt;&lt;a href="http://www.mit.edu/~ndg/papers/churchUAI08_rev2.pdf"&gt;Church&lt;/a&gt; by Noah Goodman, Vikash Mansighka, Dan Roy, Keith Bonawitz &amp;amp; Josh Tenenbaum. Pros: very flexible language. Cons: no software available yet.&lt;/li&gt; &lt;li&gt;A proposal by Max Welling: Max and one of his students are working on a library for fast inference and are planning to add a graphical UI to design graphical models visually.&lt;/li&gt; &lt;li&gt;&lt;a href="http://research.microsoft.com/mlp/ml/Infer/Infer.htm"&gt;Infer.NET&lt;/a&gt; by Microsoft Research Cambridge. Pros: integrates with many programming languages through .NET, a great variety of inference algorithms. Cons: no free software available yet (a release is scheduled for later this year).&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;The workshop ended with a panel discussion on the future and prospects of NPBayes. Here are some of the questions and answers I can remember of the top of my head&lt;/p&gt; &lt;ul&gt; &lt;li&gt;David Sontag: we've seen many Markov chain algorithms and some mean field for NPBayes. What are the prospects of using different inference algorithms? More specifically, the marginal polytope has taught us that belief propagation, Kikuchi approximations, etc ... give us approximations that have a different flavour than mean field methods. Answers:&lt;/li&gt; &lt;ul&gt; &lt;li&gt;I think everyone agreed that there is a lot of room to explore these algorithms.&lt;/li&gt;&lt;/ul&gt; &lt;li&gt;Myself: how should we position infinite capacity models compared to finite capacity models: is the main motivation to be able to model uncertainty in the model capacity or are there other reasons to strongly favour NPBayes? Answers:&lt;/li&gt; &lt;ul&gt; &lt;li&gt;Eric Xing: NPBayes is also interesting because you are solving one problem while for model selection you have to solve many problems at once.&lt;/li&gt; &lt;li&gt;David Blei: NPBayes also makes it easier to define prior distributions over combinatorial objects.&lt;/li&gt;&lt;/ul&gt; &lt;li&gt;Vikash Mansighka: asked about the role of consistency of infinite capacity models: aka. how much should we care about exchangeability and such. A long discussion followed from which I just recall that the statisticians in the panel all agreed that this is crucial and one should really spend time proving these properties.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Finally, the panel was asked how they see the future of NPBayes, some answers:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;Zoubin Ghahramani: believes we will see applications of NPBayes on very large problems,&lt;/li&gt; &lt;li&gt;Yee Whye Teh: believes we will see more interesting use of optimization in NPBayes,&lt;/li&gt; &lt;li&gt;Lawrence Carin: thinks we will have priors adapted to our custom applications, not just use the few building blocks (DP, IBP, ...) we have now,&lt;/li&gt; &lt;li&gt;David Blei: believes we will see nonparametric distributions over more complicated combinatorial distributions.&lt;/li&gt;&lt;/ul&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-6050926915459799467?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/6050926915459799467/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=6050926915459799467' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/6050926915459799467'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/6050926915459799467'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2008/07/npbayes-workshop-at-icml.html' title='NPBayes workshop at ICML'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-3095286647186095417</id><published>2008-07-02T14:10:00.001+01:00</published><updated>2008-07-02T14:10:47.967+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><title type='text'>Testing Markov Chain Monte Carlo Methods</title><content type='html'>&lt;p&gt;The projects I was involved in this year all used Markov Chain Monte Carlo (MCMC) methods for inference. MCMC methods are nice because they are so widely applicable. However, I often find them much harder to debug than variational methods. When coding up a new MCMC algorithm I use the following "procedure" to debug my code:&lt;/p&gt; &lt;ol&gt; &lt;li&gt;Fix parameters in the model and generate a lot of data. Initialise the sampler with all hidden variables set to the values used to generate the data. Run the sampler and check that it doesn't move too much. It is crucial to generate a lot of data so there is little uncertainty in the parameters and there is little reason for the sampler to move around. This test is really to check that if the inference algorithm is working, it should recognize that it has found a solution.  &lt;li&gt;Fix parameters in the model and generate a lot of data. Initialise the sampler with random hidden variables. Run the sampler and check that it discovers the parameters you used to generate the data. Again, this would work if there is enough data so there is little uncertainty in the posterior distribution of the parameters. This test is to check the search abilities of the inference algorithm.  &lt;li&gt;Finally, I apply the model and algorithm to real data. At this point, I will probably dig up &lt;a href="http://books.google.com/books?id=TNYhnkXQSjAC"&gt;Bayesian Data Analysis&lt;/a&gt; and use more principled ways to assess convergence.&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;In variational methods one can check that a lower bound increases between iterations; if it doesn't, something is wrong. Are there similar kind of checks for sampling algorithms? Do people have some good suggestions for debugging MCMC methods?&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-3095286647186095417?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/3095286647186095417/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=3095286647186095417' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/3095286647186095417'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/3095286647186095417'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2008/07/testing-markov-chain-monte-carlo.html' title='Testing Markov Chain Monte Carlo Methods'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-8545963081244314433</id><published>2008-06-30T20:44:00.001+01:00</published><updated>2008-06-30T20:44:48.858+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><title type='text'>Machine Learning for Bug Discovery</title><content type='html'>&lt;p&gt;&lt;em&gt;One of the reasons I have this blog is to talk about some of my own research. I intend to give some more background on things that haven't made it into a paper, put some more experiments or references online, answer questions people have about my work, ... I hope that by putting these discussions online, I can catch the attention of Google's indexer and this information is available for anyone who might ever work on similar things.&lt;/em&gt;&lt;/p&gt; &lt;p&gt;Today we (&lt;a href="http://mlg.eng.cam.ac.uk/finale/"&gt;Finale Doshi&lt;/a&gt; and I) want to talk about a small side project of ours. Early this year we ran into a pretty cool project at the University of Wisconsin - Madison called the &lt;a href="http://www.cs.wisc.edu/cbi/"&gt;Cooperative Bug Isolation&lt;/a&gt; (CBI) project. In a nutshell, this project aims to find software bugs by first instrumenting the source code with little probes. These probes count when certain events - such as the execution of a particular line of code - occur.&amp;nbsp; The CBI software collects these probe counts, and remembers whether the program exited successfully or whether the program crashed. For example: imagine we wrote the following program:&lt;/p&gt; &lt;table cellspacing="0" cellpadding="2" width="400" border="1"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top" width="400"&gt;function foo()&lt;br&gt;a = read_line()&lt;br&gt;if a &amp;lt; 10 then&lt;br&gt;&amp;nbsp; print "smaller than 10"&lt;br&gt;&amp;nbsp; a = 10 / a - a&lt;br&gt;else&lt;br&gt;&amp;nbsp; print "at least 10"&lt;br&gt;return a&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; &lt;p&gt;Although this program is completely trivial, it has a potential division by zero. The CBI software will take a piece of source code as above and add some probes: say the bold italic ones below:&lt;/p&gt; &lt;table cellspacing="0" cellpadding="2" width="400" border="1"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top" width="400"&gt;function foo()&lt;br&gt;a = read_line()&lt;br&gt;&lt;strong&gt;&lt;em&gt;log( a == 0, probe_1)&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;if a &amp;lt; 10 then&lt;br&gt;&amp;nbsp; print "smaller than 10"&lt;br&gt;&amp;nbsp; a = 10 / a - a&lt;br&gt;&amp;nbsp; &lt;strong&gt;&lt;em&gt;log( a == 0, probe_2)&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;else&lt;br&gt;&amp;nbsp; print "at least 10"&lt;br&gt;&lt;strong&gt;&lt;em&gt;log( a == 0, probe_3)&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;return a&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; &lt;p&gt;Note that there is a whole science to what probes are useful and where to put them, but the main point is that each probe is a counter, which gets augmented every time its first argument is true. As we mentioned above, when a program executes,various probes get augmented and at the end of the execution, the counts are stored in a file in addition to a flag that says whether the program exited gracefully or whether it crashed. Let's suppose that &lt;em&gt;foo()&lt;/em&gt; is called several times in a larger program. After running that program a couple of times, we might get data that looks like this:&lt;/p&gt; &lt;table cellspacing="0" cellpadding="2" width="400" border="1"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top" width="80"&gt;Run&lt;/td&gt; &lt;td valign="top" width="80"&gt;Probe 1&lt;/td&gt; &lt;td valign="top" width="80"&gt;Probe 2&lt;/td&gt; &lt;td valign="top" width="80"&gt;Probe 3&lt;/td&gt; &lt;td valign="top" width="80"&gt;Exit Flag&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td valign="top" width="80"&gt;1&lt;/td&gt; &lt;td valign="top" width="80"&gt;0&lt;/td&gt; &lt;td valign="top" width="80"&gt;1&lt;/td&gt; &lt;td valign="top" width="80"&gt;2&lt;/td&gt; &lt;td valign="top" width="80"&gt;Good&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td valign="top" width="80"&gt;2&lt;/td&gt; &lt;td valign="top" width="80"&gt;1&lt;/td&gt; &lt;td valign="top" width="80"&gt;2&lt;/td&gt; &lt;td valign="top" width="80"&gt;2&lt;/td&gt; &lt;td valign="top" width="80"&gt;Fail&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td valign="top" width="80"&gt;3&lt;/td&gt; &lt;td valign="top" width="80"&gt;0&lt;/td&gt; &lt;td valign="top" width="80"&gt;5&lt;/td&gt; &lt;td valign="top" width="80"&gt;1&lt;/td&gt; &lt;td valign="top" width="80"&gt;Good&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; &lt;p&gt;The CBI project has collected thousands of traces for different open source projects. The goal now is to mine these huge datasets to discover bugs in the source code. Although there are many ways of going about it, one way is to make the following assumption: if we run a program, we are executing a sequence of complex but fairly atomic "usage patterns". For example, imagine we are executing a web browser: when we press the homepage button a series of functions gets executed (and the corresponding counters are augmented). However, when we press the back button a different (but maybe overlapping) sequence of functions will be executed. We call these sequences "usage patterns".  &lt;p&gt;In &lt;a href="http://pages.cs.wisc.edu/~jerryzhu/pub/rlda.pdf"&gt;this paper&lt;/a&gt;, some Programming Language&amp;nbsp; and Machine Learning people in Madison (&lt;a href="http://pages.cs.wisc.edu/~andrzeje/"&gt;David Andrzejewski&lt;/a&gt;, &lt;a href="http://pages.cs.wisc.edu/~mulhern/"&gt;Anne Mulhern&lt;/a&gt;, &lt;a href="http://pages.cs.wisc.edu/~liblit/"&gt;Ben Liblit&lt;/a&gt;, and &lt;a href="http://pages.cs.wisc.edu/~jerryzhu/"&gt;Jerry Zhu&lt;/a&gt;) developed a statistical model (called DeltaLDA) based on Latent Dirichlet Allocation to learn two sets of usage patterns at the same time: one set of "good" usage patterns which both successful and failed runs use, and an extra set of "bad" usage patterns which only occur in failed runs. After learning the DeltaLDA model, one can go back and look at the bad usage patterns and see which counters are expressed. Since the counters in bad usage patters correlate with programs crashing, these could then be indicative of actual bugs in the source code. It turns out that it is a reasonable idea; read the DeltaLDA paper for details on how it performs in practice. &lt;p&gt;When we decided to work with this dataset, one obvious extension of the DeltaLDA model was to try and make the whole thing non-parametric. In the original DeltaLDA paper, one has to specify the number of usage patterns beforehand. We think this is unrealistic and allowing the number of usage patterns to adapt according to the data seems to be the right way to go. So we developed a model called the DeltaHDP which is exactly this: a non-parametric extension of DeltaLDA using the Hierarchical Dirichlet Process framework. On some of the smaller datasets which we tried, we found that the DeltaHDP behaved exactly as one would expect: it finds similar usage patterns as DeltaLDA but you don't have to specify the number of usage patterns beforehand. One other nice thing about the DeltaHDP is that the Gibbs sampler, which is fairly straightforward to implement, performs well.  &lt;p&gt;Another unrealistic artifact of both the LDA/HDP type models is that the corresponding generative model does not correspond to the usage pattern story above. LDA/HDP define each run of a program to be a mixture of usage patterns while we really want each run to be an integer combination of usage patterns. The reason is that we believe that each time a usage pattern is executed, it augments the counter more or less deterministically - counters do not randomly fail to increment. In the end, we came up with a model based on the Indian Buffet Process. Check out our &lt;a href="http://mlg.eng.cam.ac.uk/jurgen/pubs/crism2008bug.pdf"&gt;poster&lt;/a&gt; for more information on the model. It turned out the inference was a little nightmare and we never found a good sampling mechanism which could scale to very large datasets. However, the results on smaller datasets suggested that the model was OK and we got similar results as the DeltaHDP. In addition, the IBP returns more information: it can tell you exactly how many times specific usage patterns were executed whereas the DeltaHDP only returns the ratio that each usage pattern was executed.  &lt;p&gt;In the end it was a fun project. We enjoyed working on this data, had a lot of fun coming up with the models and we learned a lot about non-parametric methods. Unfortunately, both Finale and I have different priorities and we think the cooperative bug isolation project deserves 100% of one's attention to make progress on using statistical models for bug isolation. However, we think &lt;a href="http://www.cs.wisc.edu/cbi/"&gt;CBI&lt;/a&gt; is a great project, and we will be more than happy to give some more background on our models and answer questions about our experiences.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-8545963081244314433?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/8545963081244314433/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=8545963081244314433' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/8545963081244314433'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/8545963081244314433'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2008/06/machine-learning-for-bug-discovery.html' title='Machine Learning for Bug Discovery'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-804240476972349041</id><published>2008-06-08T09:38:00.001+01:00</published><updated>2008-06-08T09:38:27.315+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Artificial Intelligence'/><title type='text'>The Singularity</title><content type='html'>&lt;p&gt;Although I consider myself a machine learning researcher, I have to say I got all excited about this field because of its relation with Artificial Intelligence. (I consider machine learning to be more about solving "small" practical problems like information retrieval, machine translation, biological data analysis ... whilst AI is the grand search for making machines actually understand the world, I consider AI more about giving machines a deeper semantic understanding of the world.&lt;/p&gt; &lt;p&gt;This post is about a group of enthusiasts called singularitarians whom on one hand I admire but on the other hand approach with skepticism. &lt;a href="http://en.wikipedia.org/wiki/Singularitarianism"&gt;Singularitarians&lt;/a&gt; are people who believe that (soon?) there will be technological breakthrough's enabling the creation of smarter-than-human intelligence. The point in time at which this will occur is called the singularity. What I think is great about this movement is that there are &lt;a href="http://en.wikipedia.org/wiki/Ben_Goertzel"&gt;singularitarians&lt;/a&gt; who try design architectures that make &lt;a href="http://en.wikipedia.org/wiki/Strong_AI"&gt;strong AI&lt;/a&gt; possible. This is absolutely fantastic and it is sad that there are so little opportunities in the academic world for this kind of blue sky research. The reason I am skeptical at points is that a lot of singularity talk ignores the software side of things: there is much ado about peripheral issues and little research that gives us insight into how a conscious machine should work. E.g., Ray Kurzweil (a brilliant thinker by the way!) wrote &lt;a href="http://singularity.com/"&gt;a book&lt;/a&gt; about how available computing power will soon match the brain's computational power but hardly gives any insight on what to do with so much hardware.&lt;/p&gt; &lt;p&gt;IEEE Spectrum has a &lt;a href="http://www.spectrum.ieee.org/singularity"&gt;special issue&lt;/a&gt; on the singularity with articles by both singularitarians and skeptics. A highly enjoyable read!&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-804240476972349041?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/804240476972349041/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=804240476972349041' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/804240476972349041'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/804240476972349041'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2008/06/singularity.html' title='The Singularity'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-7064183371201408247</id><published>2008-05-28T21:32:00.001+01:00</published><updated>2008-05-28T21:32:24.456+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cognitive Science'/><title type='text'>Gold's Theorem</title><content type='html'>&lt;p&gt;After seeing &lt;a href="http://videolectures.net/icml07_tenenbaum_bmhi/"&gt;this&lt;/a&gt; &lt;em&gt;amazing&lt;/em&gt; talk by Josh Tenenbaum on videolectures.net, I started reading up on some very cool stuff at the intersection of machine learning and cognitive science. This brought me to read on Gold's theorem and the &lt;a href="http://en.wikipedia.org/wiki/Poverty_of_the_stimulus"&gt;poverty of the stimulus&lt;/a&gt;. Very roughly, Gold's theorem says that any learner (be it a child or a computer) cannot "learn" a language by only acquiring sentences from the language she has to learn. Some people use this theorem to make the following argument: a toddler will only hear sentences from the language she is learning, she never gets to hear "wrong" (as in not in the language) sentences. Hence, since by Gold's theorem this toddler cannot learn the language, it must be innate: language abilities must be wired into our brains in some way. &lt;a href="http://www.cog.jhu.edu/courses/680/papers/Johnson.GoldsTheorem.pdf"&gt;Gold's Theorem and Cognitive Science&lt;/a&gt;, by Kent Johnson is a very enjoyable read for more background on Gold's theorem and how it applies to the question of language acquisition.&lt;/p&gt; &lt;p&gt;Johnson's paper mentions something that I had never thought about: according to &lt;a href="http://www.google.co.uk/search?hl=en&amp;amp;lr=&amp;amp;q=%22Morgan%22+%22Learnability+Considerations+*+*+Nature%22"&gt;Morgan&lt;/a&gt;, a child acquires language after hearing about 4 million sentences. Now think &lt;a href="http://trec.nist.gov/data/reuters/reuters.html"&gt;about&lt;/a&gt; &lt;a href="http://www.inf.ed.ac.uk/resources/corpora/"&gt;how&lt;/a&gt; &lt;a href="http://www.ldc.upenn.edu/Catalog/index.jsp"&gt;many&lt;/a&gt; sentences we have access to to train our NLP algorithms on. This is orders of magnitude more than a person ever gets to hear and yet I would say we are far from building a computer system that can manipulate language as accurate as humans. From a Bayesian perspective, this could translate into assuming children having a really good prior which they start from when learning language. If the Bayesian way is the right way to look at this question, I really wonder how humans acquire this prior: how much is wired up in our brains, how much is it influenced by our sensory system, ...?&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-7064183371201408247?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/7064183371201408247/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=7064183371201408247' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/7064183371201408247'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/7064183371201408247'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2008/05/gold-theorem.html' title='Gold&amp;#39;s Theorem'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-2533668980590912924</id><published>2008-05-06T14:38:00.002+01:00</published><updated>2008-05-06T14:41:36.981+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><category scheme='http://www.blogger.com/atom/ns#' term='iHMM'/><title type='text'>ICML &amp; UAI 2008 Accepted Papers</title><content type='html'>The accepted papers for &lt;a href="http://icml2008.cs.helsinki.fi/accepted_papers.shtml"&gt;ICML 2008&lt;/a&gt; and &lt;a href="http://uai2008.cs.helsinki.fi/programme.shtml"&gt;UAI 2008&lt;/a&gt; are online!&lt;br /&gt;&lt;br /&gt;As one of my own papers got accepted, I am preparing a post with some more background information and maybe some extra plots and pictures.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-2533668980590912924?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/2533668980590912924/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=2533668980590912924' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/2533668980590912924'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/2533668980590912924'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2008/05/icml-uai-2008-accepted-papers.html' title='ICML &amp; UAI 2008 Accepted Papers'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-2116698604412491639</id><published>2008-03-06T18:23:00.001Z</published><updated>2008-03-06T18:25:24.764Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><title type='text'>Machine Learning lecture series on videolecture.net</title><content type='html'>Our machine learning group organizes a series called the &lt;span style="font-weight: bold; font-style: italic;"&gt;Advanced Tutorial Lecture Series on Machine Learning&lt;/span&gt; where we get to see some amazing researchers talk about recent developments in Machine Learning. &lt;a href="http://mlg.eng.cam.ac.uk/karsten"&gt;Karsten&lt;/a&gt;, a postdoc in our group has been so kind to film all lectures and slowly but surely these videos are finding their way onto &lt;a href="http://videolectures.net/mlcued08_cambridge/"&gt;videolectures.net&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Currently on videolectures.net are videos on:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Applications of Group Theory to Machine Learning by &lt;a href="http://www.gatsby.ucl.ac.uk/%7Erisi/"&gt;Risi Kondor&lt;/a&gt;; this math heavy lecture was cool in that Risi explained how tools from abstract algebra and harmonic analysis can be used in Machine Learning too,&lt;/li&gt;&lt;li&gt;Spectral Clustering by &lt;a href="http://www.gatsby.ucl.ac.uk/%7Eazran"&gt;Arik Azran&lt;/a&gt;; our local spectral methods expert shared his insight into how spectral methods work.&lt;/li&gt;&lt;/ul&gt;Coming up soon is a lecture by &lt;a href="http://www.inference.phy.cam.ac.uk/mackay/"&gt;David MacKay&lt;/a&gt; on error correcting codes and how they can be decoded using local message passing algorithms. David is a very good speaker and his lecture is certainly in my top 5 of excellent machine learning talks.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-2116698604412491639?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/2116698604412491639/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=2116698604412491639' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/2116698604412491639'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/2116698604412491639'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2008/03/machine-learning-lecture-series-on.html' title='Machine Learning lecture series on videolecture.net'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-5387068115574715460</id><published>2008-03-03T14:53:00.005Z</published><updated>2008-03-03T15:21:16.355Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><title type='text'>If pigs can whistle, then horses can fly ... or ... De Finetti's Theorem</title><content type='html'>We talked about the concept of exchangeability &lt;a href="http://undirectedgrad.blogspot.com/2007/12/what-is-exchangeability.html"&gt;a few weeks ago&lt;/a&gt;. I presented three urn models, two of which defined exchangeable probability distributions. The main idea I wanted to convey using the first and second urn models was that exchangeability is not just independence: urn model 2 does not &lt;span style="font-style: italic;"&gt;look&lt;/span&gt; exchangeable at first sight but some simple algebra convinced us that it is.&lt;br /&gt;&lt;br /&gt;Today I want to continue this discussion with a theorem that is strongly related to exchangeability called &lt;span style="font-style: italic;"&gt;De Finetti's theorem&lt;/span&gt;. The theorem goes as follows: say we have an infinite sequence of binary random variables (say the color of the balls in our urn model) and this sequence is exchangeably, then there exists a probability distribution for &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Ctheta" align="middle" border="0" /&gt; such that&lt;br /&gt;&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?p%28x_1,%5Ccdots,x_n%29%20=%20%5Cint_%7B%5CTheta%7D%20%5Cprod_%7Bi=1%7D%5En%20p%28x_i%7C%5Ctheta%29%20p%28%5Ctheta%29%20d%5Ctheta" align="middle" border="0" /&gt;.&lt;br /&gt;In other words, if our sequence of random variables is exchangeable, than it really is a sample from a mixture model with mixture distribution &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?p%28%5Ctheta%29" align="middle" border="0" /&gt;.&lt;br /&gt;&lt;br /&gt;This kind of theorem reminds me of a quote I once heard in a complexity theory talk: if pigs can whistle, then horses can fly. What do I mean by this? Assuming that a set of random variables is exchangeably doesn't sounds like a very big assumption: e.g. if you plan to cluster something like the MNIST digit recognition dataset, there is nothing a priori that distinguishes one image from another; aka. you could assume they are exchangeable. However, once you make the exchangeability assumption, De Finetti's theoremy suddenly says that really your datapoints are part of a hierarchicaly Bayesian model. In other words: by making the relatively innocent exchangeability assumption, De Finetti's theorem imposes the graphical model structure upon you. A weak assumption implies a rather strong consequence.&lt;br /&gt;&lt;br /&gt;There is a catch however: the theorem does not specify what kind of random variable &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Ctheta" align="middle" border="0" /&gt; is, nor what its distribution function is. As a matter of fact, &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Ctheta" align="middle" border="0" /&gt; could be an infinitely large random variable (which would lead us into the realm of nonparametric Bayesian models such as the Dirichlet Process, but that's for another time). For simple models such as urn model 2 we can compute the distribution though: &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Ctheta" align="middle" border="0" /&gt; turns out to be a Beta distributed random variable.&lt;br /&gt;&lt;br /&gt;All in all, De Finetti's theorem is an interesting theorem but it is not clear to me of how much practical value it is. There are many extensions and modifications of the theorem; for more information check&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.uv.es/%7Ebernardo/Exchangeability.pdf"&gt;The Concept of Exchangeability - Jose Bernardo&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.stat.berkeley.edu/%7Ealdous/Papers/me22.pdf"&gt;Exchangeability and Related Topics - David Aldous&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-5387068115574715460?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/5387068115574715460/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=5387068115574715460' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/5387068115574715460'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/5387068115574715460'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2008/03/if-pigs-can-whistle-then-horses-can-fly.html' title='If pigs can whistle, then horses can fly ... or ... De Finetti&apos;s Theorem'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-697307738505208039</id><published>2008-01-27T11:41:00.000Z</published><updated>2008-01-27T11:46:23.833Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><title type='text'>Machine Learning Summer School 2008</title><content type='html'>Perfect: the website for the &lt;a href="http://mlss08.futurs.inria.fr/"&gt;next machine learning summer school&lt;/a&gt; is up and running. It will be organized on Porquerolles Island, right of the coast of France. The confirmed speaker list is still a work in progress but looks promising already:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Shai Ben-David (University of Waterloo) &lt;i&gt;"Theoretical Foundations of Clustering"&lt;/i&gt;&lt;/li&gt;&lt;li&gt;Stephane Canu (INSA de Rouen) &lt;i&gt;"Introduction to Kernel Methods"&lt;/i&gt;&lt;/li&gt;&lt;li&gt;Manuel Davy (INRIA) &lt;i&gt;"Parametric and Non-parametric Bayesian Learning"&lt;/i&gt;&lt;/li&gt;&lt;li&gt;Pierre Del Moral (Universite de Bordeaux) &lt;i&gt;"On the Foundations and the Applications of i-MCMC Methods"&lt;/i&gt;&lt;/li&gt;&lt;li&gt;Isabelle Guyon (ClopiNet)&lt;/li&gt;&lt;li&gt;Yann LeCun (New York university) &lt;i&gt;"Supervised and Unsupervised Learning with Energy-Based Models"&lt;/i&gt;&lt;/li&gt;&lt;li&gt;Rich Sutton (University of Alberta) &lt;i&gt;"Reinforcement Learning and Knowledge Representation"&lt;/i&gt;&lt;/li&gt;&lt;li&gt;Patrick Wolfe (Harvard University) &lt;i&gt;"Overcomplete Representations with Incomplete Data: Learning Bayesian Models for Sparsity"&lt;/i&gt;&lt;/li&gt;&lt;/ul&gt;I'll definitely try to apply and hope for the best!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-697307738505208039?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/697307738505208039/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=697307738505208039' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/697307738505208039'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/697307738505208039'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2008/01/machine-learning-summer-school-2008.html' title='Machine Learning Summer School 2008'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-6587371957950147262</id><published>2008-01-19T22:35:00.000Z</published><updated>2008-01-19T07:34:57.534Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Large Scale ML'/><title type='text'>The Elastic Compute Cloud (Amazon EC2)</title><content type='html'>&lt;p&gt;Today I ran into this interesting offering from Amazon called the elastic compute cloud. The idea is that you create an image that consists of your application, libraries and data and load it up to Amazon. They will run the application for you and charge you for the computing resources that you've consumed. I was a little suprised to see how cheap it actually is: 0.10$ for every hour that you run a process that consumes not more than 1.7 GB of memory, 160 GB storage; an extra 0.10$ per gigabyte you transfer into their service and an extra 0.18$ per gigabyte you transfer out of their service.&lt;/p&gt; &lt;p&gt;I think this is going to be a very useful platform for machine learning research. EC2 means that large scale&lt;a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873&amp;amp;categoryID=100" mce_href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873&amp;amp;categoryID=100"&gt; map-reduce&lt;/a&gt; isn't just for Google anymore. Although I haven't tried it myself, creating the EC2 images should be straightforward as it is based on &lt;a href="http://www.cl.cam.ac.uk/research/srg/netos/xen/" mce_href="http://www.cl.cam.ac.uk/research/srg/netos/xen/"&gt;Xen virtualization&lt;/a&gt;; the virtualization platform out of our very own Cambridge computer lab.&lt;br /&gt;I wonder why anyone would want to spend time and money on maintaining their own cluster when there is this cheap alternative available? As soon as I get the chance, I will run an experiment on Amazon EC2 and report on the experience.&lt;/p&gt;&lt;p&gt;PS &lt;a href="http://www.datawrangling.com/"&gt;Data Wrangling&lt;/a&gt; has some &lt;a href="http://www.datawrangling.com/tags/amazon-ec2/"&gt;good posts&lt;/a&gt; on how to use Amazon EC2.&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-6587371957950147262?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/6587371957950147262/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=6587371957950147262' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/6587371957950147262'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/6587371957950147262'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2007/01/elastic-compute-cloud-amazon-ec2.html' title='The Elastic Compute Cloud (Amazon EC2)'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-5953219550989103971</id><published>2008-01-13T10:19:00.000Z</published><updated>2008-01-13T15:28:35.957Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><title type='text'>Silicon Minds</title><content type='html'>The guys at Microsoft Research &lt;a href="http://blogs.technet.com/apg/archive/2007/12/23/silicon-minds.aspx"&gt;announced &lt;/a&gt;a very exciting competition: &lt;a href="http://www.dreambuildplay.com/"&gt;the silicon minds challenge&lt;/a&gt;. The goal of the competition is to foster novel ideas in the area of game AI.&lt;br /&gt;&lt;br /&gt;Many years ago I wrote a computer game called De Profundis where I was in charge (among other things) of the game AI. Moving on to become an AI researcher it is interesting to reminisce and draw some connections.&lt;br /&gt;&lt;br /&gt;On one hand, the game AI field is the perfect arena to try out new ideas for AI researchers. For AI researchers working on agents, planning and human interaction (speech, NLP) I could imagine it would be extremely valueable to interact with MMORPG's (Massive Multiplayer Online Role Playing Games). I don't know whether anyone in the research community has ever done this before, but having an unlimited source of humans to interact with seems like quite the experimental setup. This also applies to virtual worlds like second life ofcourse. AI Research has contributed to the game AI field ofcourse, so let me highlight two recent projects:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;a href="http://www.cs.ualberta.ca/%7Egames/"&gt;The university of Alberta games group&lt;/a&gt;: these guys do some amazing work on several kinds of games. As far as I understand it, most of their efforst are focussed on games where the mathematics are in some sense understood: chess, poker, ... What I mean by the mathematics are understood is that with infinite computational capabilities, we would be able to solve these games. The U of A group also do some work on AI for real time strategy games (e.g. Age of Empires). A mathematical analysis of these games is much harder (if possible at all). The AI necessary for these games is much closer to what I would think of as strong AI.&lt;/li&gt;&lt;li&gt;&lt;a href="http://research.microsoft.com/mlp/apg"&gt;The Applied Games Group at Microsoft research&lt;/a&gt;: the organizers of the silicon minds challenge have developed a few innovations for game AI themselves. Their machine learning approach to inferring gamer skills (know as &lt;a href="http://research.microsoft.com/mlp/apg/trueskill.aspx"&gt;TrueSkill&lt;/a&gt;) is used by the XBox Live service. They have also enhanced &lt;a href="http://research.microsoft.com/mlp/forza/"&gt;Forza Motorsport&lt;/a&gt; with a reinforcement learning agent that learns to drive from observing human drivers.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;Unfortunately, the game AI field has very special requirements that prohibit the use of many innovations from the research community. First and foremost, game AI are supposed to make games more fun. More sophisticated agents do not necessarily mean more fun: one can spend a large amount of time making opponents (in first person shooters or racing games) smarter, but if that means the player always looses, he or she might not enjoy the game that much. Also, games are big business, and game engineers want to understand the behavior of their agents. It is unacceptable to release an agent out in the open which in the middle of a battle starts to act weird. Hence, game engineers often limit the intelligence of agents to (pre-historic ?!?) methods such as rule based systems and (heuristic) search because they can understand the behavior and debug it more easily. (It would be unfair to forget to give credit to the people that have applied reinforcement learning and neural networks to games; afaik mostly in the areas of racing games.) To get a rough idea about what is hot-or-not in the game AI field, take a look at &lt;a href="http://www.aiwisdom.com/"&gt;AI Wisdom&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;One could say, who cares what technique to use: rules and search work incredibly well! Very true. In my humble opinion, the AI/machine learning community has sometimes over-focussed on new algorithms and models and too little on building intelligent solutions. Although in fields like robotics, biology and vision, our machine learning tools have had a huge impact, I think there are many fields where the AI community does not have a good understanding on how to integrate all our tools to make a large working system. Hence, silicon minds looks like a promising challenge and I am very excited to see what people come up with.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-5953219550989103971?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/5953219550989103971/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=5953219550989103971' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/5953219550989103971'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/5953219550989103971'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2008/01/silicon-minds.html' title='Silicon Minds'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-7659829193307113248</id><published>2008-01-07T19:23:00.000Z</published><updated>2008-01-07T19:34:49.929Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><title type='text'>UCI Website Revamped</title><content type='html'>First of all, happy new 2008 to all readers!&lt;br /&gt;&lt;br /&gt;UCI hosts a famous collection of datasets at the &lt;a href="http://archive.ics.uci.edu/ml/"&gt;UCI Machine Learning Repository&lt;/a&gt;. Recently, they have completely updated their webpage and are starting to offer new datasets. This is a great service to the machine learning community, but I would like to see us take this one step further: we should match this repository of datasets with a repository of algorithms. This would not only allow us to compare algorithms, but also get a lot of intuition about the nature of the datasets: a well-understood algorithm that does great on - say - digit classification but performs really poorly on the Wisconsin breast cancer dataset teaches us something about the nature of the data. A &lt;a href="http://www.jmlr.org/papers/volume8/sonnenburg07a/sonnenburg07a.pdf"&gt;recent JMLR paper*&lt;/a&gt; calling for more open source machine learning software, mentions a project at the university of Toronto called &lt;a href="http://www.cs.toronto.edu/%7Edelve/"&gt;Delve&lt;/a&gt; that meant to do exactly this. Unfortunately, the project seems to be dead as of 2003 or so.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;* &lt;a href="http://jmlr.csail.mit.edu/papers/volume8/sonnenburg07a/sonnenburg07a.pdf"&gt;The Need for Open Source Software in Machine Learning&lt;/a&gt; - Sören Sonnenburg, Mikio L. Braun, Cheng Soon Ong, Samy Bengio, Leon Bottou, Geoffrey Holmes, Yann LeCun, Klaus-Robert Müller, Fernando Pereira, Carl Edward Rasmussen, Gunnar Rätsch, Bernhard Schölkopf, Alexander Smola, Pascal Vincent, Jason Weston, Robert Williamson; Journal of Machine Learning Research; 8(Oct):2443--2466, 2007.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-7659829193307113248?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/7659829193307113248/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=7659829193307113248' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/7659829193307113248'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/7659829193307113248'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2008/01/uci-website-revamped.html' title='UCI Website Revamped'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-2297851791510066359</id><published>2007-12-22T22:49:00.000Z</published><updated>2007-12-22T14:40:29.667Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Statistics'/><title type='text'>What is ... Exchangeability?</title><content type='html'>Talking about exchangeability, a friend once commented that exchangeability is "too simple too understand". On one hand, it is true that the statement of exchangeability (see below) sounds somewhat trivial, I found that I had absolutely no intuition as to why it is important for machine learning. So after some reading, I present my take on the concept of exchangeability.&lt;br /&gt;&lt;br /&gt;&lt;h4&gt;What is Exchangeability?&lt;/h4&gt;&lt;span style="font-weight: bold;"&gt;Scenario 1&lt;/span&gt;. Imagine we have an urn with r red balls and b blue balls. We draw 3 balls from the urn as follows: we pick a random ball, write down its color and put it back in the urn before drawing a new ball. We introduce 3 random variables: A, B, C which denote the color of the first, second and third ball. It is not hard to see that p(A=r, B=b, C=b) = p(A=b, B=r, C=b); in other words, we can exchange the values of the random variables without changing the joint probability. Intuitively, the reason we can exchange the observations is that our random variables are IID (independent and identically distributed).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Scenario 2&lt;/span&gt;. We again pick 3 balls from an urn with r red and b blue balls. We still pick a random ball and note its color, but we put two balls of that color back in the urn. It may not be obvious that the sequence A=r, B=b, C = b has the same probability as the sequence A=b, B=b, C=r since the individual probabilities of picking the red ball first or last are completely different: r/[r+b] when it is the first ball versus r/[r+b+2] when it is the last ball (since two blue balls were added in the mean time). Writing down the equations makes it clear that the two sequence &lt;span style="font-style: italic;"&gt;are &lt;/span&gt;equi-probable&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?p%28A=r,%20B=b,%20C=b%29%20=%20%5Cfrac%7Br%7D%7Br+b%7D%5Cfrac%7Bb%7D%7Br+b+1%7D%5Cfrac%7Bb+1%7D%7Br+b+2%7D%20=%20%5Cfrac%7Bb%7D%7Br+b%7D%5Cfrac%7Bb+1%7D%7Br+b+1%7D%5Cfrac%7Br%7D%7Br+b+2%7D%20=%20p%28A=b,%20B=b,%20C=r%29" align="middle" border="0" /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;It is trivial to generalize this expression to longer sequences. Again, it doesn't matter in what order we pick the balls, the only thing that matter is how many red and how many blue balls we pick. This is reflected in the formula in the sense that denominator of the probability of a sequence only depends on how long the sequence is. The nominator part only needs to know how many balls of each color there are. In our example: it only needs to know that there is a first and second blue ball (contributing b * (b+1) to the nominator) and a first red ball (contributing a).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Scenario 3&lt;/span&gt;. Both examples above were exchangeable since reordering the values of the random variables didn't change the probability. Let us consider a similar setup where exchangeability does not apply anymore. We again use the urn scheme with r red balls and b blue balls. However, now when we pick a red ball we note its color and simply put it back, but when we pick a blue ball we note its color and put two back. It is easy to see that we cannot exchange the value of the random variables anymore since&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;/div&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?p%28A=r,%20B=b,%20C=b%29%20=%20%5Cfrac%7Br%7D%7Br+b%7D%5Cfrac%7Bb%7D%7Br+b%7D%5Cfrac%7Bb+1%7D%7Br+b+1%7D" align="middle" border="0" /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;while&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;/div&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?p%28A=b,%20B=b,%20C=r%29%20=%20%5Cfrac%7Bb%7D%7Br+b%7D%5Cfrac%7Bb+1%7D%7Br+b+1%7D%5Cfrac%7Br%7D%7Br+b+2%7D" align="middle" border="0" /&gt;.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;I think the following definition of exchangeability now becomes much more intuitive; we say a set of n random variables Y is exchangeable under a distribution p iff for any permutation pi of the integers 1..n&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?p%28Y_1=y_1,%20%5Ccdots,%20Y_n=y_n%29%20=%20p%28Y_1=y_%7B%5Cpi%281%29%7D,%20%5Ccdots,%20Y_n=y_%7B%5Cpi%28n%29%7D%29" align="middle" border="0" /&gt;.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;h4 style="text-align: left;"&gt;Properties of Exchangeability&lt;/h4&gt;&lt;div style="text-align: left;"&gt;Let us now briefly discuss some consequences of exchangeability as it will allow us to see why it is such an important concept. First, we compute the marginal probability of the second draw p(B = r) under the different scenarios. Under scenario 1 this is trivial, just before the second draw the content of our urn is exactly as it was when we started: hence p(B=r) = r/(r+b). Under scenario 2, after some simple algebra we find that p(B=r) = p(A=b, B=r) + p(A=r, B=r) = r/(r+b). Now here is the exciting part: we shouldn't have done all the algebra; if we are convinced that the random variables are exchangeable under the distribution of scenario 2, we could have acted as if we were computing the marginal probability for the first draw. Formally, since p(A=b,B=r) = p(A=r, B=b) and substituting this in the expression for p(B=r), we could have marginalized out the B=b part. This property - changing the order around - is incredibly useful when computing probabilities.&lt;br /&gt;&lt;br /&gt;More abstractly, here is one way to think of exchangeable sequences. In scenario 2, if a friend just drew a ball from the urn, didn't show it to us and put one extra ball back in the urn, this is not going to make a difference as to the probability of our next draw. However, in scenario 3 above, whether someone drew a ball before us is very important: it drastically changes the probabilities for our next draw. I think this is a very important distinction that sets exchangeable and non-exchangeable distributions appart.&lt;br /&gt;&lt;br /&gt;Although exchangeability and IID variables look very similar they are not exactly the same. From scenario one above, it is easy to see that IID random variables are exchangeable. The converse is not true: in scenario 2, p(A=b, B=r) is not equal to p(A=b) p(B=r) and thus the random variables are not independently distributed.&lt;br /&gt;&lt;br /&gt;&lt;h4 style="text-align: left;"&gt;Exchangeability and Machine Learning&lt;/h4&gt;Exchangeable distributions are very common in machine learning. The most famous modelling assumption for text processing is just exchangeability: the bag of words model. This modelling assumption states that the probability of a text document depends only on word counts and not on word order. This is exactly the same model as scenario 1 above except that instead of red and blue balls, we now have words from a fixed vocabulary. Is this a realistic assumption one may ask? It certainly is not! We don't expect natural language to be exchangeable: the probability of using the word "States" should certainly be dependent on the word in front of it (a.k.a. higher if that word is "United"). But who cares, the bag of words assumption works incredibly well ...&lt;br /&gt;&lt;br /&gt;There are many other exchangeable distributions in common use for machine learning: the Dirichlet Process and its Chinese Restaurant Process equivalent are exchangeable distributions, the Indian Buffet Process is an exchangeable distribution on binary matrices. Non-exchangeable distribution are also common: many Markov models (e.g. Hidden Markov Models) aren't exchangeable.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I hope this little overview of exchangeability was useful. I left one improtant concept out of our discussion so far: De Finetti's theorem. This is a very important theorem that applies to exchangeable sequences and I will discuss the theorem in a future post.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-2297851791510066359?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/2297851791510066359/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=2297851791510066359' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/2297851791510066359'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/2297851791510066359'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2007/12/what-is-exchangeability.html' title='What is ... Exchangeability?'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-1627278598984369313</id><published>2007-12-17T21:22:00.000Z</published><updated>2007-12-22T12:14:32.638Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><title type='text'>Machine Learning News</title><content type='html'>From &lt;a href="http://hunch.net/?p=306"&gt;Machine Learning (Theory)&lt;/a&gt;; a new machine learning mailing list has been created &lt;a href="http://groups.google.com/group/ML-news"&gt;here.&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-1627278598984369313?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/1627278598984369313/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=1627278598984369313' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/1627278598984369313'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/1627278598984369313'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2007/12/machine-learning-news.html' title='Machine Learning News'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-2744454098115029188</id><published>2007-12-13T21:22:00.000Z</published><updated>2007-12-22T12:14:02.897Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Machine Learning'/><title type='text'>Wanted: PostDocs</title><content type='html'>... in our very own Cambridge group. More info &lt;a href="http://learning.eng.cam.ac.uk/zoubin/postdocs08.html"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-2744454098115029188?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/2744454098115029188/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=2744454098115029188' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/2744454098115029188'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/2744454098115029188'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2007/12/wanted-postdocs.html' title='Wanted: PostDocs'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-6610900428034900760</id><published>2007-12-07T21:17:00.000Z</published><updated>2008-02-07T11:57:44.779Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='F#'/><title type='text'>F# and the Task Parallel Library</title><content type='html'>&lt;div class="entry"&gt;     &lt;p&gt;In a &lt;a href="http://cs.hubfs.net/blogs/f_team/archive/2007/10/18/3774.aspx"&gt;previous post&lt;/a&gt; I discussed how F# asynchronous workflows can be used for parallelizing computations. More specifically, I chose to work with the standard matrix multiplication algorithm as it illustrates the point of how easy it is with F# to take some existing (embarrassingly parallelizable) code and actually parallelizing it. Although F# asynchronous workflows can be used for parallelization, they are really designed for helping with asynchronous I/O for reactive programs.&lt;/p&gt; &lt;p&gt;Enter the &lt;a href="http://www.microsoft.com/downloads/details.aspx?FamilyID=e848dc1d-5be3-4941-8705-024bc7f180ba&amp;amp;displaylang=en"&gt;Microsoft Parallel Extenions&lt;/a&gt; for .NET 3.5. This is a very exciting new technology that is specifically tailored to please developers who are keen to learn how to use the quad-core machines they are sitting at. In other words, the parallel extensions are a new concurrency library which, to my best knowledge, is well suited for parallizing scientific applications. Although I haven't played around with features like PLinq, I did look a bit into the task parallel library (TPL) component of the parallel extensions. The TPL library makes our life easier by providing sophisticated algorithms for dynamic work distribution. In the following example I used the Parallel.For class to parallelize the matrix multiplication algorithm from my previous blog post.&lt;/p&gt; &lt;pre class="fsharp"&gt;&lt;br /&gt;&lt;span style="color: rgb(0, 102, 204); font-weight: bold;"&gt;let&lt;/span&gt; PFXMultiply &lt;span style="color: rgb(102, 204, 102);"&gt;(&lt;/span&gt;A: &lt;span style=""&gt;float&lt;/span&gt;&lt;span style="color: rgb(102, 204, 102);"&gt;[&lt;/span&gt;,&lt;span style="color: rgb(102, 204, 102);"&gt;]&lt;/span&gt;&lt;span style="color: rgb(102, 204, 102);"&gt;)&lt;/span&gt; &lt;span style="color: rgb(102, 204, 102);"&gt;(&lt;/span&gt;B: &lt;span style=""&gt;float&lt;/span&gt;&lt;span style="color: rgb(102, 204, 102);"&gt;[&lt;/span&gt;,&lt;span style="color: rgb(102, 204, 102);"&gt;]&lt;/span&gt;&lt;span style="color: rgb(102, 204, 102);"&gt;)&lt;/span&gt; =&lt;br /&gt;&lt;span style="color: rgb(0, 102, 204); font-weight: bold;"&gt;let&lt;/span&gt; n = Array2.&lt;span style="color: rgb(0, 102, 0);"&gt;length1&lt;/span&gt; A&lt;br /&gt;&lt;span style="color: rgb(0, 102, 204); font-weight: bold;"&gt;let&lt;/span&gt; C = Array2.&lt;span style="color: rgb(0, 102, 0);"&gt;create&lt;/span&gt; n n &lt;span style="color: rgb(204, 102, 204);"&gt;0.0&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(0, 102, 204); font-weight: bold;"&gt;let&lt;/span&gt; RowTask i =&lt;br /&gt;&lt;span style="color: rgb(0, 102, 204); font-weight: bold;"&gt;  for&lt;/span&gt; j=&lt;span style="color: rgb(204, 102, 204);"&gt;0&lt;/span&gt; &lt;span style="color: rgb(0, 102, 204); font-weight: bold;"&gt;to&lt;/span&gt; n&lt;span style="color: rgb(204, 102, 204);"&gt;-1&lt;/span&gt; &lt;span style="color: rgb(0, 102, 204); font-weight: bold;"&gt;do&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(0, 102, 204); font-weight: bold;"&gt;  for&lt;/span&gt; k=&lt;span style="color: rgb(204, 102, 204);"&gt;0&lt;/span&gt; &lt;span style="color: rgb(0, 102, 204); font-weight: bold;"&gt;to&lt;/span&gt; n&lt;span style="color: rgb(204, 102, 204);"&gt;-1&lt;/span&gt; &lt;span style="color: rgb(0, 102, 204); font-weight: bold;"&gt;do&lt;/span&gt;&lt;br /&gt;   C.&lt;span style="color: rgb(102, 204, 102);"&gt;[&lt;/span&gt;i,j&lt;span style="color: rgb(102, 204, 102);"&gt;]&lt;/span&gt; &lt;- C.&lt;span style="color: rgb(102, 204, 102);"&gt;[&lt;/span&gt;i,j&lt;span style="color: rgb(102, 204, 102);"&gt;]&lt;/span&gt; + A.&lt;span style="color: rgb(102, 204, 102);"&gt;[&lt;/span&gt;i,k&lt;span style="color: rgb(102, 204, 102);"&gt;]&lt;/span&gt; * B.&lt;span style="color: rgb(102, 204, 102);"&gt;[&lt;/span&gt;k,j&lt;span style="color: rgb(102, 204, 102);"&gt;]&lt;/span&gt;&lt;br /&gt;&lt;span style="color: rgb(102, 204, 102);"&gt;  (&lt;/span&gt;&lt;span style="color: rgb(102, 204, 102);"&gt;)&lt;/span&gt;&lt;br /&gt;Parallel.&lt;span style="color: rgb(0, 102, 204); font-weight: bold;"&gt;For&lt;/span&gt;&lt;span style="color: rgb(102, 204, 102);"&gt;(&lt;/span&gt;&lt;span style="color: rgb(204, 102, 204);"&gt;0&lt;/span&gt;, &lt;span style="color: rgb(102, 204, 102);"&gt;(&lt;/span&gt;n&lt;span style="color: rgb(204, 102, 204);"&gt;-1&lt;/span&gt;&lt;span style="color: rgb(102, 204, 102);"&gt;)&lt;/span&gt;, &lt;span style="color: rgb(0, 102, 204); font-weight: bold;"&gt;new&lt;/span&gt; Action&lt;int&gt;&lt;span style="color: rgb(102, 204, 102);"&gt;(&lt;/span&gt;RowTask&lt;span style="color: rgb(102, 204, 102);"&gt;)&lt;/span&gt;&lt;span style="color: rgb(102, 204, 102);"&gt;)&lt;/span&gt;&lt;br /&gt;C&lt;br /&gt;&lt;/int&gt;&lt;/pre&gt; &lt;p&gt;As you can see, there is not much to to it. You just create the method which you want to execute in parallel (RowTask in our case) and pass it to the Action class which creates a delegate for you and passes it along to the TPL.&lt;/p&gt; &lt;p&gt;Here are the results when running the parallel matrix multiply for a 1000 by 1000 matrix on my quad core 2.4GHz machine: non-parallel version: 19.2 seconds, async version: 5.6 seconds, TPL version: 5.2 seconds. Feel free to download &lt;a href="http://mlg.eng.cam.ac.uk/jurgen/undirectedgrad/1207-tpl.fs" title="Parallel Matrix Multiplication in F#"&gt;this file&lt;/a&gt; to play with it yourself.&lt;/p&gt; &lt;p&gt;Stay tuned for more neat (machine learning) applications of the TPL.&lt;/p&gt;    &lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-6610900428034900760?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/6610900428034900760/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=6610900428034900760' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/6610900428034900760'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/6610900428034900760'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2007/12/f-and-task-parallel-library.html' title='F# and the Task Parallel Library'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3240496205319586614.post-8435849383292857848</id><published>2007-12-06T19:58:00.000Z</published><updated>2007-12-20T21:16:42.030Z</updated><title type='text'>Hello, World!</title><content type='html'>Welcome to my blog; I am planning to post discussions on &lt;a href="http://en.wikipedia.org/wiki/Machine_learning"&gt;machine learning&lt;/a&gt;, functional programming (&lt;a href="http://research.microsoft.com/fsharp/fsharp.aspx"&gt;F#&lt;/a&gt;) and how it can be used in machine learning, fields related to machine learning such as numerical optimization, statistics, ...&lt;p&gt;&lt;/p&gt;&lt;p&gt;Stay tuned.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3240496205319586614-8435849383292857848?l=www.informaniac.net' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.informaniac.net/feeds/8435849383292857848/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3240496205319586614&amp;postID=8435849383292857848' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/8435849383292857848'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3240496205319586614/posts/default/8435849383292857848'/><link rel='alternate' type='text/html' href='http://www.informaniac.net/2007/12/hello-world.html' title='Hello, World!'/><author><name>Jurgen Van Gael</name><uri>http://www.blogger.com/profile/16153896873199339437</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
