2013-07-14

Heritage 2

Building on the efforts of HHP, we are very excited to announce that HPN is launching a $3 million private "masters" competition, which Kaggle will also host. The competition will be open to the top eligible finishers from the first Heritage Health Prize.

The challenge will be the same as the first prize — to predict hospitalization of individuals — with one very substantial difference: there will be little, if any, data anonymization. For privacy reasons, the public competition used data that had been very heavily anonymized.

This new competition will be the first time that the impact of data anonymization on health outcome predictions will be directly measurable, and it will likely provide strong evidence for a more nuanced approach to data privacy legislation. Kaggle has always sat in the narrow slice of the Venn diagram between the public and private data domains, and we're striving to make this area wider, both by making more data accessible to the community and by helping to clarify the cost/benefit (or insight/security) tradeoff of releasing datasets rather than making everything closed by default. As noted data scientist and Jetpac founder Pete Warden has pointed out, "there’s so much good that can be accomplished using open datasets, it would be a tragedy if we let this slip through our fingers."

This will also be the first time that there has been an invitation-only Kaggle competition with such a large purse. It will be very exciting to see how the top data scientists, who have already gone head to head in the open phase, respond to this high-stakes rematch. Competition details are still being finalized, but stay tuned for more updates in the coming months.

Introducing Kaggle Connect

As you may have noticed from changes to our website, Kaggle Connect, our new matching service, is becoming a key part of what Kaggle does. The idea is to give Kaggle data scientists the opportunity to monetize their Kaggle profiles and give companies the chance to work with elite data scientists. We're rolling this out slowly, so for the time being we've only opened it up to a small number of data scientists. As demand increases, we'll start inviting more data scientists to Kaggle Connect.

If you work for a company that might like to use Kaggle Connect, you can learn more on the solutions page.

New Competition from Amazon

When an employee at any company starts work, they need to obtain the computer access necessary to fulfill their role. This usually requires a supervisor to manually grant the necessary access. As employees move throughout a company, this access discovery/recovery cycle wastes time and money.

Amazon has provided a large-but-simple data set and asked Kagglers to solve this permission problem. The competition is off to a rip-roaring start (200 teams in 2 days!), but is this data deceptively simple? Can a machine learning model do well enough to take this mundane job off a boss' hands? If we get the access codes to the submission scoring server in two months, we'll let you know ...

Creative Commons License on Kaggle Wiki

When we put up the Kaggle Wiki, we didn't include a license, so all contributions still belong to the contributors (not much of a public wiki!). We apologize for this oversight. We're moving the Kaggle Wiki to a Creative Commons license similar to the one used by Wikipedia. Now when you create the article on entering your first submission, add common forum questions to the member FAQ, or add a totally new page on a topic you think should be covered, you can be confident that your work won't be lost to the world of orphaned works.

We would love to make the Wiki a great data science resource, one that helps define the discipline while helping data scientists learn and improve.
