Better Collective · May 2024 — May 2026
Metrify — Next-Generation Affiliate Data Scraping
Led product delivery of a next-generation scraping system to collect, process, and store affiliate account data at scale — replacing a slower, less reliable legacy setup.
Product Delivery
Data Platform
AWS
Scalability
- Scraping speed: 8× faster
- Data quality: significantly improved
- Architecture: modular and scalable
01
Problem
- The business relied on a legacy scraping setup to collect affiliate account data, but it was slow, unreliable, and struggling to keep pace with growing data needs and partner complexity.
- Reporting, decision-making, and onboarding new affiliate sources all depended on data that was hard to trust and expensive to maintain operationally.
02
Research & Discovery
- Mapped pain points across stakeholders: slow scrape cycles blocking timely reporting, data quality issues eroding confidence, and high manual effort when partners or requirements changed.
- Identified what could not be fixed within the legacy architecture: reliability, throughput, and adaptability all required a fundamental rethink.
- Aligned on success criteria: faster performance, better data accuracy, scalable storage, and a modular design that could onboard new affiliate sources quickly.
03
Solution
- Led the product delivery of Metrify — a next-generation scraping system designed to collect, process, and store affiliate account data at scale.
- Built on cURL-based requests and REST APIs for reliable data extraction, with RabbitMQ workers handling asynchronous, distributed processing.
- Used AWS S3 for scalable, durable data storage and designed a modular architecture so new affiliate sources could be onboarded quickly as the business evolved.
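As a rough illustration of the modular idea above (not the production code), the sketch below shows how each affiliate source can plug in as a single parser behind a shared worker loop. Everything named here is hypothetical: `register_source`, `ScrapeJob`, `run_worker`, and the `example_partner` payload shape are assumptions made for the example. An in-memory `queue.Queue` stands in for RabbitMQ and a plain dict stands in for an S3 bucket, so the snippet runs with the standard library alone.

```python
import json
import queue
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical registry: one parser function per affiliate source.
PARSERS: Dict[str, Callable[[str], List[dict]]] = {}


def register_source(name: str):
    """Decorator that adds a parser to the registry under the given source name."""
    def decorator(fn: Callable[[str], List[dict]]):
        PARSERS[name] = fn
        return fn
    return decorator


@register_source("example_partner")
def parse_example_partner(raw: str) -> List[dict]:
    # Assumption: this made-up source returns JSON with an "accounts" array.
    accounts = json.loads(raw)["accounts"]
    return [{"account_id": a["id"], "revenue": float(a["revenue"])} for a in accounts]


@dataclass
class ScrapeJob:
    source: str   # key into PARSERS
    payload: str  # raw response body, assumed to be fetched already


def run_worker(jobs: "queue.Queue[ScrapeJob]", store: Dict[str, str]) -> int:
    """Drain the queue, parse each job, and write the result to the store.

    `jobs` stands in for a message queue and `store` for object storage.
    Returns the number of jobs processed.
    """
    processed = 0
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return processed
        records = PARSERS[job.source](job.payload)
        key = f"{job.source}/{processed:06d}.json"
        store[key] = json.dumps(records)
        processed += 1
```

In this sketch, onboarding a new source means adding one decorated parser function; the worker loop and the storage step stay unchanged.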
04
Impact & Metrics
- Delivered 8× faster scraping performance compared to the previous solution.
- Significantly improved the accuracy and overall quality of affiliate account datasets.
- Created a highly scalable, easily adaptable platform that responds quickly to new business requirements and partner changes.
- Reduced operational overhead through automation and resilient processing — strengthening the data foundation for faster reporting, better decision-making, and a setup that scales with the business.
05
Trade-offs
Every project involves choices. Here I explain what I didn't do, what I chose instead, and why that was the better call for the business.
- Instead of patching the legacy scraping system, I chose a full rebuild, because incremental fixes could not deliver an 8× performance gain or the data quality the business needed.
- Instead of shipping the fastest possible MVP, I prioritized modular architecture and data accuracy first — so onboarding new affiliate partners would not require rework later.
- Instead of a simple monolithic setup, I invested in RabbitMQ and AWS S3 — accepting more complexity upfront in exchange for async processing, durability, and long-term scale.
06
What I Learned
- Data infrastructure is a product decision — the architecture you ship directly determines how fast the business can report, decide, and adapt.
- Modularity is a competitive advantage when partners and requirements change frequently; designing for onboarding speed pays off quickly.
- Measurable impact (8× faster scraping, improved data quality, lower ops overhead) is what turns a technical migration into a business story stakeholders remember.