10 years of web scraping: a perspective on selling web data
Selling web data: how we started and where we're headed. Why the old models are not working well and why a web data marketplace is what we need

Reflections on a Decade in the Web Scraping Industry
- Started web scraping journey ten years ago.
- Co-founded Databoutique.com with Andrea Squatrito.
- Realized the potential of capturing data available on the web.
- Initially used a simple scraping infrastructure without proxies or rotating IPs.
- Used C++ programs with cURL for scraping.
- Evolved tech stack and target websites over time.
- Selling web data remains difficult.
- Explored selling alternative data to the finance industry and web data to the retail industry.
- Learned the process of being a data vendor.
- Financial markets are forward-looking.
- Investors seek data to predict future trends.
- Example: using credit card transactions to predict how the consumer goods industry will perform during the Christmas season.
Challenges and Limitations of Web Data Extraction and Monetization
- Broken extractors must be fixed quickly and accurately to keep the extracted data accurate.
- Models built on extracted data must be backtested to determine whether the data correlates with stock prices.
- Monetizing web data is challenging: customers want exclusivity, but exclusivity does not benefit the vendor in the long run.
- Creating scalable and profitable businesses solely from alternative data extraction is difficult due to high costs.
- Companies with proprietary data sources have been more successful in monetizing data.
- Selling raw web data may not be enough as companies often require insights and expertise in processing and analyzing the data.
- Selling custom-made services based on web data can be time-consuming and may not generate as much revenue as selling mass-produced products.
- By analogy, artisans in the fashion industry, such as professional tailors, may not be able to compete with larger retail brands in terms of revenue.
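
The backtesting point above can be illustrated with a minimal sketch. It assumes a web-derived signal (for example, a weekly count of products listed on a site) and a price series sampled at the same dates, and checks whether changes in the signal correlate with price returns one period later. The function names, the one-period lag, and the sample numbers are all hypothetical and chosen for illustration only; a real backtest would also need out-of-sample validation and far more data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def pct_changes(series):
    """Period-over-period relative changes (assumes no zero values)."""
    return [(b - a) / a for a, b in zip(series, series[1:])]


def lagged_correlation(signal, prices, lag=1):
    """Correlate signal changes at period t with price returns at t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    sig = pct_changes(signal)
    ret = pct_changes(prices)
    if lag > 0:
        # Drop the last `lag` signal changes and the first `lag` returns
        # so that each signal change is paired with a later return.
        sig, ret = sig[:-lag], ret[lag:]
    return pearson(sig, ret)


if __name__ == "__main__":
    # Illustrative, made-up numbers: not real market or scraped data.
    weekly_listings = [1200, 1260, 1235, 1310, 1290, 1350]
    weekly_close = [40.0, 40.4, 41.1, 40.9, 41.8, 41.5]
    print(lagged_correlation(weekly_listings, weekly_close, lag=1))
```

A high correlation on historical data is only a starting point: it says nothing about causation, and a signal that breaks whenever an extractor breaks will distort the result, which is why fast extractor fixes and backtesting go together.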
The Potential of Web Scraping and the Role of Databoutique.com
- Tailor-made suits are expensive and out of reach for many people; in the same way, the cost of web scraping has limited its adoption.
- Internal web scraping projects often fail because of the challenges involved and the high price of the solutions on the market.
- There is a large audience of potential customers that remains untapped in the web scraping market.
- Web scraping requires professional expertise and specialized teams.
- Companies should focus on their core business and outsource data collection.
- Databoutique.com is a marketplace for web-scraped data that decouples data production from the value proposition.
- Users can browse the catalog, buy datasets, and request refreshes if needed.
- Companies can integrate their offerings with other websites and become sellers on the marketplace.
- Startups can validate ideas by buying data instead of investing in learning to scrape.
- Web scraping has become industrialized and productized, allowing for lower prices and serving more customers.
- Sellers benefit from selling more datasets and investing in tools, creating more reliable data feeds.
- Databoutique aims to bring more data buyers onto the platform and create a virtuous circle that benefits both sellers and buyers.
Help us grow Databoutique and the web scraping industry!
- We have received positive feedback since our launch a few months ago.
- Readers are asked to share the article with friends or colleagues who need web data, and to talk about Databoutique.
- Sharing will help us and other professionals in the web scraping industry grow and attract new players.