Microsoft Cloud Services
Partner Solution Case Study
Company Creates Crowdsourcing Platform in the Cloud to Solve Complex Problems


Overview
Country or Region: United States
Industry: Professional services
Partner Profile
Kaggle is a startup company with an eponymous platform that provides a link between organizations with complex problems and researchers and scientists from around the world. Founded in 2010, it is based in San Francisco.
Business Situation
The founder of Kaggle wanted to create an easy-to-use site that would connect the world’s top researchers, analysts, and scientists with organizations seeking to solve complex problems.
Solution
Kaggle built its site on Windows Azure for a scalable, easily modified site that can accommodate rapid growth.
Benefits
  • Platform supports constant innovation
  • Scales easily to handle heavy user and data traffic
  • Helps startup company control costs
“With Windows Azure, we don’t have to give much thought to the infrastructure that’s supporting our company. It’s a very powerful, pragmatic platform for a startup firm.”
Jeremy Howard, President and Chief Scientist, Kaggle
Kaggle is a company and platform launched to provide a link between organizations needing specialized analytic and scientific skills and a global pool of researchers and scientists who can provide those skills. The Kaggle offering caught on so fast that it quickly outgrew its original online platform, which could not provide the scalability or flexibility that Kaggle needed to grow and support market demand. So the company turned to Windows Azure and Microsoft development tools to relaunch the site. With Windows Azure, Kaggle has a highly scalable platform capable of supporting quick spikes in new users and data traffic, along with a development environment that supports continuous innovation.

Situation

A frustrating and sometimes costly challenge faced by many large organizations is finding the right people with the skills to research, test, and validate products and ideas that may have a major impact on the company’s bottom line. Without access to the right skillsets, projects can be delayed or wither altogether. On the other side of the equation, skilled researchers and analysts are often looking for projects where they can put their solutions, algorithms, and other intellectual efforts to work in real-world scenarios.

Finding a way to bring those two parties together was the brainchild of Anthony Goldbloom, an entrepreneur and analyst who had puzzled over the issue during stints at various banking and financial institutions, including the Australian Treasury, the Reserve Bank of Australia, and ANZ—the third largest bank in Australia. Goldbloom had also served briefly as a writer at The Economist magazine where, during interviews with different organizations, the problem came into sharp focus.

“Most large companies have huge amounts of corporate data, and their top executives are well aware of the need to make sense of it,” says Goldbloom. “CEOs and CIOs may have predictive modeling high on their priority lists, but it’s hard to find the talent or solution that is going to fit their specific needs. It’s especially problematic when particular solutions might cost millions of dollars, yet the companies don’t know if that particular solution is the right fit for what they’re trying to accomplish.”

The scope of the problem and lack of a good solution led Goldbloom to found Kaggle, a company and online platform that links organizations facing tough problems with some of the best minds in the world. Kaggle uses a crowdsourcing model to solve complex problems, with competitions and reward money acting as incentives to attract analysts and scientists from around the world to tackle complex scientific and industrial challenges.

Kaggle was originally built using Amazon Web Services, the PHP scripting language, and the MySQL open source database. Before long, however, the platform became problematic for the Kaggle team.

“Amazon Web Services is relatively complex to use,” says Jeremy Howard, President and Chief Scientist for Kaggle. “You cannot just turn it on and it runs. There is quite a bit of setup and configuration required. Also, maintaining code with PHP was too difficult in terms of using it as the language over a long period. It does not lend itself to concise, easily maintainable code.”

Equally important was a need to scale quickly and easily on demand—a need that Kaggle felt Amazon could not meet. The issue came to a head at the end of 2010, when Kaggle signed a large client—the Heritage Provider Network, which offered a $3 million award for the best algorithm that could predict and prevent unnecessary patient hospitalizations in the United States.

“After signing this client, we felt that the existing platform supporting Kaggle could not scale and support the levels of activity we would experience in much larger competitions like this one,” says Howard.

Solution

Kaggle decided to switch to a Microsoft environment, using Windows Azure as its cloud platform along with Microsoft development tools. The site was rewritten using Microsoft Visual Studio development tools, including Microsoft Visual C#. Kaggle received assistance in its efforts through the Microsoft BizSpark program.

“We decided to use Visual C# as our primary tool. It had the expressiveness that we were looking for in a programming language that could help us build on the excitement generated by the Kaggle competitions. At the same time, it provided the speed to do more sophisticated programming,” says Howard. “And, because Windows Azure is specifically designed for ASP.NET apps, it was easy for us to get the solution up and running with the tight integration of the Visual Studio tools.”

The Kaggle site also uses Windows Azure Compute, which lets the company run its application code in the cloud. Each Windows Azure Compute instance runs as a virtual machine that is isolated from other customers, and is supported by the network load balancing and failover capabilities of Windows Azure. The site also uses Windows Azure web roles, Windows Azure worker roles, and blob storage. A cloud-based database is provided by Microsoft SQL Azure.
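
The division of labor between web roles and worker roles can be illustrated with a minimal sketch. This is not Kaggle's code, and it uses Python with an in-process queue rather than C# and Windows Azure services so that it runs anywhere. The queue hand-off, the function names, and the scoring rule (root mean squared error) are all assumptions made for illustration; the case study does not describe how submissions are passed between roles or how they are scored.

```python
import queue
import threading


def score_submission(predictions, truth):
    """Hypothetical scoring rule: root mean squared error (lower is better)."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, truth)]
    return (sum(squared_errors) / len(truth)) ** 0.5


def web_role(work_queue, user, predictions):
    """Front end: accept an upload and hand it off without doing heavy work."""
    work_queue.put((user, predictions))


def worker_role(work_queue, truth, leaderboard):
    """Back end: pull submissions off the queue and score them."""
    while True:
        item = work_queue.get()
        if item is None:  # sentinel value tells the worker to stop
            break
        user, predictions = item
        leaderboard[user] = score_submission(predictions, truth)


def run_demo():
    """Run one worker against two illustrative submissions."""
    work_queue = queue.Queue()
    leaderboard = {}
    truth = [1.0, 2.0, 3.0]  # made-up answer key

    worker = threading.Thread(
        target=worker_role, args=(work_queue, truth, leaderboard)
    )
    worker.start()

    web_role(work_queue, "alice", [1.0, 2.0, 3.0])
    web_role(work_queue, "bob", [2.0, 3.0, 4.0])
    work_queue.put(None)
    worker.join()
    return leaderboard


if __name__ == "__main__":
    print(run_demo())
```

The point of the split is that the front end stays responsive while scoring happens elsewhere, and the two sides can be given more instances independently.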

Organizations with predictive modeling problems fill out a simple wizard on the Kaggle website, which automatically creates a competition for participating data scientists. Since the deployment of the Windows Azure-based site, Kaggle has grown significantly, with more than 32,000 users participating in the site by early 2012. Organizations such as Allstate, NASA, and Ford have posted data and their related problems on Kaggle.

Benefits

By turning to Windows Azure and the Microsoft development tools, the Kaggle team was able to quickly rewrite the code. The site is easier to modify and upgrade as needed to keep pace with the growing popularity of the Kaggle service. It is highly scalable to accommodate large clients and uploading of large data sets. It is also a cost-effective solution for Kaggle as the company moves out of its startup phase.

Platform Supports Constant Innovation

Howard says the power of the Windows development tools and their integration with Windows Azure simplified the task of rewriting the code for the site while also supporting innovation.

“It took us about one month to completely rewrite the code and move the site to Windows Azure,” he says. “That included moving the database to Microsoft SQL Azure, which was relatively simple using the Microsoft SQL Server Integration Services. It was very cool to have much of this process automated because it streamlined the entire process.”

Howard adds that the Microsoft tools support the fast implementation of new ideas. “Being a startup, we’re always changing things,” he says. “Windows Azure and the Microsoft development environment support the kind of continuous innovation that’s important for our growth. And with Visual C#, we can make all of our changes in one place—it’s very expressive, concise, and fast.”

Scales Easily to Support User, Data Traffic

Windows Azure is highly scalable, in terms of being able to handle large spikes in the number of users as well as the data traffic that occurs with Kaggle competition activity. “It’s good to know that if we sign a big client or competition and get thousands of new users signing on, it’s a simple matter of adding more computing capability on Windows Azure. If we get a burst of new participants on the site and were not able to scale as easily as we can on Windows Azure, people would have to wait a long time to get on the site,” says Howard.

He notes that the platform easily handles large data sets that are downloaded by competition participants.

“It can be anywhere from a few kilobytes to hundreds of gigabytes of information at once,” Howard says. “Windows Azure has provided a solid performance experience for our users.”
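
The idea of "adding more computing capability" when a large competition launches comes down to simple capacity arithmetic, sketched below in Python. The function name, the per-instance capacity, and the minimum instance count are hypothetical values chosen for illustration; they are not figures from Kaggle or from Windows Azure.

```python
import math


def instances_needed(concurrent_users, users_per_instance, minimum=2):
    """Estimate how many compute instances a traffic level calls for.

    users_per_instance is an assumed capacity figure that would normally
    come from load testing; minimum keeps a floor of instances running.
    """
    if users_per_instance <= 0:
        raise ValueError("users_per_instance must be positive")
    return max(minimum, math.ceil(concurrent_users / users_per_instance))


if __name__ == "__main__":
    # Quiet period versus a burst of new participants (made-up numbers).
    print(instances_needed(100, 400))
    print(instances_needed(5000, 400))
```

On a pay-as-you-go platform, the instance count can follow this estimate up during a burst and back down afterward, which is what ties scalability to cost control.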

Helps Control Costs for Startup

With Windows Azure, Kaggle was able to adopt a pay-as-you-go platform model for its business.

“This is incredibly important for any young company in a rapid growth stage,” Howard says. “We don’t have to worry about and plan for potentially expensive and complex IT overhead. Windows Azure provides the platform backend so we can focus on innovation.”

He adds that Windows Azure provides a very “frictionless way” to host the kind of company that Kaggle is trying to build.

“With Windows Azure, we don’t have to give much thought to the infrastructure that’s supporting our company,” he says. “It’s a very powerful, pragmatic platform for a startup firm.”

Microsoft Cloud Services

Microsoft offers a complete set of cloud-based solutions to meet business needs, including solutions for advertising, communications (email, meetings), collaboration (document storage, sharing, workflow), business applications (customer resource management, business productivity), data storage and management, and infrastructure services. In addition, customers can take advantage of an entire ecosystem of solution providers and Microsoft partners. For more information, please visit