MathJax

Tuesday, 6 December 2011

London Machine Learning Meetup ... and joining a startup.

Three months ago I decided to end my employment at Microsoft. I had a great time, met loads of interesting people and will definitely miss being in the Cambridge research lab (as well as the occasional trip to Seattle).

Next up I am joining a startup company called Rangespan. I met the founders of Rangespan at Silicon Milkroundabout in April this year. When I got talking to them it was immediately obvious there was a hell of a lot of machine learning to be done there.

Rangespan is essentially an e-commerce platform which is organizing a marketplace for suppliers to sell products to big retailers. On one hand Rangespan is trying to build the biggest catalog in the world. We're doing some by fusing many different information sources (supplier feeds, manufacturer feeds, crowdsourced data, crawled data). The three big learning problems in this space are:

  1. Matching: out of millions of product descriptions (semi-structured & textual), how do you recognize which descriptions talk about the same product? Record linkage is the keyword here, but making it work at large scale and under very small precision and recall margins is a challenge.
  2. Reconcilliation: when you have 100 different descriptions of a product, and you want to construct the best possible description of the product, how do you do that? I think the speech recognition and machine translation community has a few neat tricks up their sleeve to be applied here: in the abstract if we think of each of the 100 descriptions as the output of a noisy channel with the unknown ground truth description as the input, the challenge is to invent a prior and model for the noisy channel. Machine learning baby!
  3. Information extraction: the data we are gathering is only semi-structured and most often marketing text. The question now becomes: how do we figure out form a USB stick description that the product has a "4GB" capacity. How do we figure out that "4GB" is the same as "four gigabytes" etc etc. There's loads of great work on information extraction which we can build on; the good news is that our domain is restricted (physical goods) the bad news is that the domain is quite large (1000's of categories, 1000's of attributes).
Besides the catalog, Rangespan is also building an order pipeline component. One crucial component of that pipeline is the system that assigns retailer's orders to suppliers. The idea here is that for each product we have a pretty good view of the supply curve (what suppliers sells how many items at what price). We also have a pretty good view of the demand curve: by aggregating many different retailers we can predict what the demand for a product will be. This allows us to make very quantitative decisions about how to choose the price for that product to optimize metrics like user experience etc. In this area a mix of statistics (time series analysis), decision theory, econometrics and optimization are crucial ingredients ... together with a ton of data ofcourse.

If this all sounds like music to your ears, then let us know. We're hiring!

Joining a startup also meant I am now working in London. One of the coolest things I discovered about being in London is the great number of meetups people are organizing. From MongoDB office hours (thank you for all the help 10gen!) to the Hadoop and Big Data meetups. Although big data is crucial at Rangespan, we think clever algorithms (as in machine learning) are as crucial to building successful insights from data. As such we are kicking off the Machine Learning Meetup as of January 2012. We'll put some fun talks together with the London machine learning community and probably do some study groups on the Stanford online learning courses.

If you live closeby and haven't yet signed up, you can find all meetup details here.

Monday, 29 August 2011

Who Killed Prolog

Via HackerNews comes the article "Who Killed Prolog". The article wonders why logic programming languages have fallen behind object-oriented, imperative and functional programming languages (my favourite functional programming language F# has just entered the TIOBE programming languages top-20).

The hypothesis of the article is that when a Japanse research project called the fifth generation computer system was ended, Prolog, it's main programming language went down with it.

This reminded me of my undergraduate class in declarative languages in 2003. It started with our professor saying that there is a big cover up in the computer industry: "NASA, Microsoft, Google, HP and many other companies are using Prolog in their core products but they don't want to admit it." Instant classic!

Thursday, 12 May 2011

PQL–A Probabilistic Query Language

At MSR and Bing, when we do machine learning on smaller datasets (say anything below 100GB) we often use relational databases and SQL. Throw in a little bit of Excel and R and you’ve got yourself a very powerful platform for exploratory data analysis.

After the exploratory phase, we often build statistical models (adPredictor, TrueSkill, Matchbox, …) to discover more complex structures in the data. Infer.Net helps us prototype these graphical models, but unfortunately it forces you to work in a mode where you first create a binary that performs inference, suck out all data to your machine, run inference locally and then write all inference results back to the DB. My local machine is way slower than the machines which run our DB or our local compute cluster so ideally I’d like to have a platform which computes “close” to the data.

The Probabilistic Query Language (or PQL) is a language/tool which I designed two years ago, during an internship with Ralf Herbrich and Thore Graepel, where we had the following goals in mind:

  • Allow for rapid prototyping of graphical models in a relational environment
  • The focus should be on specifying models, not algorithms
  • It should enable large scale execution and bring the computation to the data, rather than the data to the computation

Using SQL Server, DryadLinq (Map-Reduce for .NET) and Infer.Net I built a prototype of PQL and tested it on some frequently used models at Microsoft. In this post I want to introduce the PQL language and give a few examples of graphical models in PQL.


Let’s start with a very simple example where we have a DB with a table containing people’s info and a table with records describing doctor visits for those people. Assume the following relational schema

image

We assume that people have an unknown weight and when they go to the doctor, she measures this weight. Depending on the time of day (after a heavy lunch), this estimate could be off a bit. A statistical model to capture these assumption is to introduce a random variable for the weight for each person in the People table, put a prior on this variable and connect it with the observations in the DrVisits table. So how do we write such a model in PQL?

PQL is very much like SQL but with two extra keywords: AUGMENT and FACTOR. AUGMENT allows us to add random variables to the DB schema. In the example above we would write

People = AUGMENT DB.People ADD weight FLOAT

This essentially defines a “plate” in graphical model speak: for each row in the People table, a random variable over the real numbers called weight is defined.

The FACTOR keyword in PQL allows to introduce factors between random variables as well as any other variables in the DB schema. FACTOR follows the relational SQL syntax to specify exactly how to connect variables. To specify a normal prior on the weight variable we could write

FACTOR Normal(p.weight | 75.0,25.0) FROM People p

This introduces a normal factor for each row in the People table (the FROM People p part). The final component of our program connects the random variable with observations. In this case, we use the familiar SQL JOIN syntax to specify how to connect rows from the People table to the rows in the DrVisits table. In PQL we write

FACTOR Normal(v.weight | p.weight, 1.0)
FROM People p
JOIN DrVisit v ON p.id = v.personid

Except for the first line this is exactly SQL; instead of doing a query, the FACTOR statement describes the “probabilistic augmentation” of the DB schema”.

For the example above, this is it, the PQL program contains five lines of code and can be sent to the DB. It will run inference by performing EP or variational Bayesian inference. The inference itself can be run either within the database (this was implemented by Tina Palla who was an intern with us) or on the DryadLinq cluster.


Another example of PQL is the program to describe the TrueSkill ranking system. In this example we assume two-player games stored using a table of players (called Players) and a table of game outcomes (called PlayerGames). Each game played generates two rows in the PlayerGames table: one for the winner and the loser (with a score) column specifying who is the winner and who is the loser. The PQL program for TrueSkill is written below

Players = AUGMENT DB.Players ADD skill FLOAT;
PlayerGames = AUGMENT DB.PlayerGames ADD performance FLOAT;

FACTOR Normal(p.skill | 25.0, 20.0) FROM Players p;

FACTOR Normal(pg.performance | p.skill, 0.1)
   FROM PlayerGames pg
   JOIN Players p ON pg.player_id = p.player_id;

FACTOR IsGreater(pgb.performance, pga.performance)
   FROM PlayerGames pga
   JOIN PlayerGames pgb ON pga.game_id = pgb.game_id
   WHERE pga.player_id < pgb.player_id AND pga.score = 0;

FACTOR IsGreater(pga.performance, pgb.performance)
   FROM PlayerGames pga
   JOIN PlayerGames pgb ON pga.game_id = pgb.game_id
   WHERE pga.player_id < pgb.player_id AND pga.score = 2;

 

There are a lot of features in PQL I haven’t covered in this blog post (like using random variables in a WHERE clause to create mixture models) but I wanted to give you a flavour of what we’ve been working on so far.

While working on PQL I learned a lot about the state of the art in probabilistic databases and statistical relational learning. I think compared to this academic work, PQL does not add many theoretical contributions; our goal is to design a tool which takes statistical relational learning out of the laboratory into the hands of data mining practicioners.

Sunday, 1 May 2011

Math.Net Numerics

I’ve been a fan of doing numerical computation on the .NET platform for a very long time. This interest landed me an internship at Microsoft Research with Don Syme’s team in 2007 where we investigated F# suitability for scientific computing. After the internship, I joined the open source community helping out with writing a kick-ass numerical library for the .NET platform.

Today, I am quite proud to announce that we are releasing the final beta of our open source project: Math.Net Numerics. Moreover, with this announcement, we are also kicking off a competition to find the fastest implementation of matrix multiplication in purely managed code. The winner of this competition will receive 1500$ and we will integrate his code into our open source codebase. I’m excited to see some creative coding in the next few weeks!

Tuesday, 26 April 2011

BBC 4 podcasts

I just discovered some really good podcasts from BBC 4 which I think some people here might enjoy.

  • A Brief History of Mathematics: a discussion about some great mathematicians …
  • In Our Time: very general, one-hour discussions with a few experts. I really enjoyed the episode on “Random & Pseudorandom” from January 13.

Wednesday, 20 April 2011

ggplot2 and Subway

The following article caught my eye a few weeks ago: Subway Set to Overtake McD's in Omnipresence. As I am trying to learn a little bit of ggplot2 (and loving it so far!) I thought it would be fun to try and create some visuals to go with this claim.

I used one of Microsoft’s restaurant datasets and do a simple substring match on “subway”, returning the latitude and longitude. Using the following lines of ggplot and R code

states <- data.frame(map("state", plot=FALSE)[c("x","y")])
colnames(states) <- c("Lon","Lat")
ggplot(states, aes(x=Lon, y=Lat)) + geom_path()
+ geom_point(alpha=0.6,size=0.3,data=subway)

we get a cool picture showing all of the metropolitan areas of the United States.map


If you click on the image to zoom in you will be able to discern major highways as well. Subway is literally everywhere.

Tuesday, 12 October 2010

ECML–PKDD 2010 Highlights

 

imageThe city of Barcelona just hosted ECML/PKDD this year and I had the opportunity to go down and check out the latest and greatest mostly from the European machine learning community. My personal highlights

  • The conference for me started with a good tutorial by Francis Bach and Guillaume Obozinsky. They gave an overview of sparsity: in particular various different methods using l1 regularization to induce sparsity.

The first invited speaker was Hod Lipson from Cornell (and as far as I know one of the few people in our field who has given a TED talk). The main portion of Hod’s talk was about his work on symbolic regression. The idea is the following: consider the following dataset

    image

We can apply our favourite regression method, say a spline, to these points and perform accurate interpolation, perhaps even some extrapolation if we chose the right model. The regression function would not give us much insight into why data could be looking like this? In symbolic regression, the idea is that we try to come up with a symbolic formula which interpolates the data. In the picture above, the formula that generated the data was EXP(-x)*SIN(PI()*x)+RANDBETWEEN(-0.001,0.001) (in Excel). Hod and his graduate students have built a very cool (and free!) app called Eureqa which uses a genetic programming methodology to find a good symbolic expression for a specific dataset. Hod showed us how his software can recover the Hamiltonian and Lagrangian from the measurements of a double pendulum. Absolutely amazing!

Another noteworthy invited speaker was Jurgen Schmidhuber. He tried to convince us that we need to extend the reinforcement learning paradigm. The idea would be that instead of an agent trying to optimize the amount of long term reward he gets from a teacher, he would also try to collect “internal reward”. The internal reward is defined as follows: as the agent learns, he is building a better model of the world. Another way to look at this learning is that the agent just knows how to “compress” better. The reduction in representation he gets from learning from a particular impression is what Jurgen calls the “internal reward”. In other words, it is the difference between the number of bits needed to represent your internal model before and after an impression.

E.g. you listen to a new catchy son: Jurgen says that you think it’s catchy because you’ve never heard anything like this before, it is surprising and hence helps you learn a great deal. This in turn means you’ve just upgraded the “compression algorithm” in your brain and the amount of improvement is now reflected in you experiencing “internal reward”. Listening to a song you’ve heard a million times before doesn’t help compressing at all; hence, no internal reward.

I really like the idea about this internal reward a lot, and as far as I understand it would be very easy to test. Unfortunately, I did not see any convincing experiments so allow me to be sceptical …

The main conference was cool and I’ve met some interesting people working on things like probabilistic logic, a topic I desperately need to learn more about. Gjergji gave a talk about our work on crowdsourcing (more details in a separate post). Some things I marked for looking into are:

  • Sebastian Riedel, Limin Yao and Andrew McCallum – “Modeling Relations and Their Mentions Without Labeled Text”: this paper is about how to improve information extraction methods which bootstrap from existing knowledge bases using constraint driven learning techniques.
  • Wanner Meert, Nima Taghipour & Hendrik Blockeel – “First Order Bayes Ball”: a paper on how to use the Bayes ball algorithm to figure out which nodes not to ground before running lifted inference.
  • Daniel Hernández-Lobato, José Miguel Hernández Lobato, Thibault Helleputte, Pierre Dupont – “Expectation Propagation for Bayesian Multi-task Feature Selection”: a paper on how to run EP for spike and slab models.
  • Edith Law, Burr Settles, and Tom Mitchell – “Learning to Tag from Open Vocabulary Labels”: a nice paper on how to deal with tags: they use topic models to do dimensionality reduction on free text tags and then use that in a maximum entropy predictor to tag new music.

I enjoyed most of the industry day as well: I found it quite amusing that the Microsoft folks essentially gave all Bing secrets away in one afternoon: Rakesh Agarwal mentioned the secret sauce behind Bing’s ranker (neural net) whereas Thore Graepel explained the magic behind the advertisements selection mechanism (probit regression). Videos of these talks should be on videolectures.net soon.

One last rant: the proceedings are published by Springer who ask me to pay for the proceedings?!?! I’m still trying to figure out what value they’ve added to our camera-ready we sent them a few months ago …

Tuesday, 14 September 2010

From Undirected Grad to Informaniac

What drives a man to change the name of his blog? A trademark dispute? A mid-life crisis? None of those actually!

fuse-logoAfter three amazing years, I am exchanging the amazing machine learning group for an exciting new adventure with Microsoft FUSE Labs. As I am writing this, my laptop is crunching away at the last few experiments to be included in my thesis while a printed version of my thesis text is awaiting final editing.

It has been an interesting few months as I was debating moving into a post doc or evaluating all the amazing opportunities that were available in industry: in the end I decided to join FUSE Labs as an applied researcher. FUSE Labs is an incubation lab at Microsoft that sits in between Microsoft Research and the product groups. My new job allows me to collaborate with people (in or outside of Microsoft) on research while giving me the freedom to also spend time building working prototypes and demo’s. As far as I am concerned: the perfect mix of science and engineering.

Although I will soon be a graduate student no more, I don’t want this blog to die but rather take it to the next level. The foreseeable future will involve more experiments on machine learning, data mining and information processing and I should be able to report much of this to everyone who cares to listen!

welcome to www.informaniac.net and stay tuned …

Tuesday, 22 June 2010

Rebooting the CS Publication Process

Dan Wallach makes suggestions for revolutionizing the CS publication process in this 13 page report. I really like this stuff and can only hope things start moving sooner rather than later in this area.

On the time of reviewing, a recent paper I really enjoyed is Novel Tools To Streamline the Conference Review Process: Experiences from SIGKDD'09 by a Bristol/MSR collaboration. For me the most interesting part of the paper was a probabilistic graphical model which collaboratively learns how to calibrate reviewers. Calibration here means that we’d like to understand how different reviewers have different thresholds for giving a paper 1/2/3/4/5 star reviews. The generative model proposed in the paper can easily be implemented in Infer.NET and although it is perhaps to early to make decisions based on the method, it could certainly help PC’s in making their decisions.

Saturday, 24 April 2010

Microsoft Research PhD Scholarships

A new round of Microsoft Research PhD scholarships is being organized. I’ve enjoyed being on the scholarship for the past two years: it’s been a great opportunity to meet new researchers at Microsoft and other students at the PhD event organized by MSR Cambridge.

For all you upcoming machine learning rock-stars out there: talk to your advisors, they will have to apply for you but you can probably help them a bit!