Founded in 2007, CrunchBase is a website offering vast amounts of data about startup activity. Want to know who founded a startup, who invested in it, or who its competitors are? CrunchBase has the answers. And in a somewhat frothy startup market, CrunchBase is an increasingly heavily trafficked web property. The site holds over 650,000 profiles of individuals and companies. As such, CrunchBase has a significant opportunity to monetize that data, and is accordingly concerned about people who seek to use it for their own commercial aims.

I spent time talking with Kurt Freytag, head of product at CrunchBase, to look at the engineering work that goes into the site. As the site grew in size and traffic, Freytag noticed oddly shaped traffic patterns and random spikes that were putting significant strain on CrunchBase's infrastructure. Of course, the company could have simply thrown more horsepower at the site, but Freytag was keen to identify the real root causes of the issues. He quickly concluded that bot traffic was hitting the site hard and crawling through its data. While this is a major concern for performance, it also introduces real commercial risk…

Ben Kepes

Ben Kepes is a technology evangelist, an investor, a commentator and a business adviser. Ben covers the convergence of technology, mobile, ubiquity and agility, all enabled by the Cloud. His areas of interest extend to enterprise software, software integration, financial/accounting software, platforms and infrastructure as well as articulating technology simply for everyday users.