Nearly eight years ago, in a great Wired piece, Josh McHugh asked, "Should Web Giants Let Startups Use the Information They Have About You?" The article examines the pros and cons of allowing small companies to scrape data.
In the nearly eight years since that piece was published, scraping has remained a controversial practice. To be sure, there's significant demand for tools that pull data from websites and return it in usable formats. Startups such as Grepsr, Krakio, PromptCloud, and import.io[1] allow non-technical users to grab data en masse from websites and create customized application programming interfaces (APIs). Put differently, these tools go well beyond old-school copying and pasting.
For those with mad Python chops, libraries such as Beautiful Soup and Scrapy can typically go well beyond what WYSIWYG scrapers can do.
Across the aisle, many companies view scraping their data as a tremendous threat. They’re not wrong. For instance, the practice represents one way to get yourself banned from Facebook. Zuck understandably doesn’t want people gobbling up reams of Facebook data, without question one of his company’s most valuable assets.
The Larger Trend
I'm not going to argue the merits and demerits of scraping here. I do, however, want to call attention to the larger trend at play. The data wars are not confined to popular sites such as Facebook and Google. The battle for data is becoming increasingly bloody. What's more, it's manifesting itself in decidedly unsexy areas such as HR software. (See my post earlier this month on the Zenefits-ADP scuffle.) Broadly speaking, a company facing this battle can:
- Build a custom or proprietary API. This is no longer the sole purview of tech behemoths.
- Build a data moat, something that Netflix, Amazon, and Facebook have effectively done.
- Close or limit access to its API. Many companies have done this, including Twitter and LinkedIn. And yes, developers who violate the terms of an API can get slapped for doing so.
Simon Says: The data wars have arrived.
To be sure, each strategy has its pros and cons. For instance, the third option (closing or limiting API access) might "protect" data, but it's going to earn the ire of developers and users. Wooing developers and partners by opening "platforms" and APIs is standard practice early on. As companies grow, however, they start to restrict access to their APIs.
In The Age of the Platform, there are no simple answers.
What say you?
This post comes from IBM for MSPs. The opinions expressed here are my own.
[1] Read my interview with import.io CEO David White here.