
Better Collective · May 2024 — May 2026

Metrify — Next-Generation Affiliate Data Scraping

Led product delivery of a next-generation scraping system to collect, process, and store affiliate account data at scale — replacing a slower, less reliable legacy setup.

Product Delivery · Data Platform · AWS · Scalability
Scraping speed: 8× faster
Data quality: Significantly improved
Architecture: Modular & scalable
01

Problem

  • The business relied on a legacy scraping setup to collect affiliate account data, but it was slow, unreliable, and struggling to keep pace with growing data needs and partner complexity.
  • Reporting, decision-making, and onboarding new affiliate sources all depended on data that was hard to trust and expensive to maintain operationally.
02

Research & Discovery

  • Mapped pain points across stakeholders: slow scrape cycles blocking timely reporting, data quality issues eroding confidence, and high manual effort when partners or requirements changed.
  • Evaluated what the legacy architecture could not fix without a fundamental rethink — reliability, throughput, and adaptability all required a new approach.
  • Aligned on success criteria: faster performance, better data accuracy, scalable storage, and a modular design that could onboard new affiliate sources quickly.
03

Solution

  • Led the product delivery of Metrify — a next-generation scraping system designed to collect, process, and store affiliate account data at scale.
  • Built on cURL-based requests and REST APIs for reliable data extraction, with RabbitMQ workers handling asynchronous, distributed processing.
  • Used AWS S3 for scalable, durable data storage and designed a modular architecture so new affiliate sources could be onboarded quickly as the business evolved.
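
To illustrate what "modular" can mean in practice, here is a minimal sketch of the pattern described above: each affiliate source plugs in as a small parser, a worker pulls jobs from a queue, and results go to an object store. This is not Metrify's actual code. The names `register_source`, `ScrapeJob`, `InMemoryStore`, `run_worker`, and the `example_partner` payload shape are all hypothetical, and the standard-library `queue.Queue` and an in-memory dict stand in for RabbitMQ and AWS S3 so the example runs without external services.

```python
import json
import queue
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical registry: source name -> parser for that source's raw response.
PARSERS: Dict[str, Callable[[str], List[dict]]] = {}


def register_source(name: str):
    """Decorator that adds a parser to the registry under the given name."""
    def decorator(fn: Callable[[str], List[dict]]):
        PARSERS[name] = fn
        return fn
    return decorator


@register_source("example_partner")
def parse_example_partner(raw: str) -> List[dict]:
    # Assumed payload shape, for illustration only.
    payload = json.loads(raw)
    return [{"account": r["id"], "revenue": float(r["rev"])} for r in payload["rows"]]


@dataclass
class ScrapeJob:
    source: str
    raw_response: str  # in a real system this would come from an HTTP request


class InMemoryStore:
    """Stand-in for an object store such as S3: maps keys to stored bodies."""

    def __init__(self) -> None:
        self.objects: Dict[str, str] = {}

    def put(self, key: str, body: str) -> None:
        self.objects[key] = body


def run_worker(jobs: "queue.Queue[ScrapeJob]", store: InMemoryStore) -> int:
    """Drain the queue, parse each job with its source's parser, store the result.

    Returns the number of jobs processed.
    """
    processed = 0
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return processed
        records = PARSERS[job.source](job.raw_response)
        store.put(f"{job.source}/{processed}.json", json.dumps(records))
        processed += 1
        jobs.task_done()
```

In this sketch, onboarding a new source means writing one decorated parser function; the worker and the storage layer stay unchanged.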
04

Impact & Metrics

  • Delivered 8× faster scraping performance compared to the previous solution.
  • Significantly improved the accuracy and quality of affiliate account datasets.
  • Created a highly scalable, easily adaptable platform that responds quickly to new business requirements and partner changes.
  • Reduced operational overhead through automation and resilient processing, strengthening the data foundation for faster reporting and better decision-making as the business scales.
05

Trade-offs

Every project involves choices. Here I explain what I didn't do, what I chose instead, and why that was the better call for the business.

  • Instead of patching the legacy scraping system, I chose a full rebuild, because incremental fixes could not deliver the 8× speedup or the data quality the business needed.
  • Instead of shipping the fastest possible MVP, I prioritized modular architecture and data accuracy first — so onboarding new affiliate partners would not require rework later.
  • Instead of a simple monolithic setup, I invested in RabbitMQ and AWS S3 — accepting more complexity upfront in exchange for async processing, durability, and long-term scale.
06

What I Learned

  • Data infrastructure is a product decision — the architecture you ship directly determines how fast the business can report, decide, and adapt.
  • Modularity is a competitive advantage when partners and requirements change frequently; designing for onboarding speed pays off quickly.
  • Measurable impact (8× performance, improved quality, lower ops overhead) is what turns a technical migration into a business story stakeholders remember.