xuly.io
Engineering

Scraping affiliate portals without breaking (or getting banned)

Residential proxies, session rotation, selector drift, and the self-healing loop we built.

Engineering · 2026-03-29 · 11 min read

Roughly half our brand integrations use portal scraping rather than public APIs. Here's how we keep 100+ scrapers alive with a two-person infrastructure team.

Why scrape at all

Most iGaming affiliate programs don't expose APIs. The stats live in a portal that requires login. For our customers, who legitimately own those accounts, scraping is the only option. We only scrape portals using credentials the user provided — never bulk-scraping public data.

The three things that break scrapers

  • IP blocks (the portal detects too many requests from one data-center IP)
  • Selector drift (the portal changes its HTML)
  • Credential rejections (session invalidated, 2FA re-prompt, etc.)

Fixing IP blocks: residential proxies + session stickiness

We route every scrape through Bright Data's residential pool with session stickiness — same IP for an entire login session, fresh IP on the next sync. Brands don't see us as a bot because we look like a residential user who visits their portal every 2 hours, from a consistent location.
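Bright Data pins a residential exit IP to every request that carries the same session ID in the proxy username, so session stickiness reduces to minting one ID per sync. A minimal sketch of that, with placeholder host and zone credentials (the `brd-customer-…-session-…` username convention is Bright Data's; the constants here are illustrative, not our real config):

```python
import uuid

# Placeholder credentials -- substitute your own zone and password.
PROXY_HOST = "brd.superproxy.io:22225"
PROXY_USER = "brd-customer-XXXX-zone-residential"
PROXY_PASS = "changeme"

def proxy_for_sync() -> dict:
    """Build a proxy config with a fresh sticky-session ID.

    Every request that reuses the same session ID in the username
    exits through the same residential IP, so one sync
    (login -> scrape -> logout) stays on one IP. The next sync
    calls this again, gets a new ID, and rotates to a fresh IP.
    """
    session_id = uuid.uuid4().hex[:12]  # new ID => new exit IP
    return {
        "server": f"http://{PROXY_HOST}",
        "username": f"{PROXY_USER}-session-{session_id}",
        "password": PROXY_PASS,
    }
```

The returned dict matches the shape Playwright and most HTTP clients accept for proxy auth, so the same helper feeds every scraper in the fleet.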

Fixing selector drift: declarative manifests + visual diffs

Instead of imperative Playwright code per brand, we use declarative manifests that describe the portal in ~40 lines: where's the login form, what's the success indicator, which CSS selectors map to which columns. When a selector stops matching, the scraper captures a screenshot + HTML dump. Our internal dashboard diff-views the HTML against last-good-run and flags which selector changed — a 5-minute fix instead of a half-hour investigation.
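To make that concrete, here is a toy version of a manifest and the drift check, with illustrative field names and a deliberately crude presence test standing in for a real CSS engine (a sketch of the idea, not our actual schema):

```python
import re

# Hypothetical manifest for one brand portal -- field names are illustrative.
MANIFEST = {
    "login_form": "form#login",
    "success_indicator": "div.dashboard-header",
    "columns": {
        "clicks": "td.col-clicks",
        "signups": "td.col-signups",
        "revenue": "td.col-revenue",
    },
}

def _selector_present(html: str, selector: str) -> bool:
    """Crude presence check: does every tag/class/id token of a simple
    selector like 'td.col-clicks' or 'form#login' appear in the HTML?
    (Real matching would use a CSS engine; this is enough for a sketch.)"""
    return all(token in html
               for token in re.split(r"[.#]", selector) if token)

def drifted_selectors(manifest: dict, html: str) -> list[str]:
    """Return the manifest keys whose selectors no longer match the page,
    so the dashboard can flag exactly which column broke."""
    flat = {"login_form": manifest["login_form"],
            "success_indicator": manifest["success_indicator"],
            **manifest["columns"]}
    return [name for name, sel in flat.items()
            if not _selector_present(html, sel)]
```

The payoff is that a failing run reports "the `revenue` selector stopped matching" instead of a bare Playwright timeout, which is what turns a half-hour investigation into a five-minute fix.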

Self-healing (WIP)

We're training a smaller model on (broken manifest, diff, fixed manifest) triplets. Early results: it generates correct selector fixes ~70% of the time. Not good enough to auto-deploy yet, but good enough to propose fixes that a human approves in one click.
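One way to represent those triplets is a flat record per incident, serialized as JSONL for training. The field names below are illustrative, not our exact schema:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class RepairExample:
    """One (broken manifest, diff, fixed manifest) training triplet."""
    broken_manifest: str   # manifest as it was when the portal changed
    html_diff: str         # unified diff: last-good HTML vs. failing HTML
    fixed_manifest: str    # manifest after a human repaired the selector

def to_jsonl(examples: list[RepairExample]) -> str:
    """Serialize triplets as JSONL, one training example per line."""
    return "\n".join(json.dumps(asdict(e)) for e in examples)
```

At inference time the model sees only the first two fields and is asked to produce the third; the human-approved fix then becomes the next training example, closing the loop.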

Credential rejections

When a brand requires 2FA and we can't proceed, we surface a 'Needs reauthentication' state in the dashboard. The user completes the flow in a popup, we capture the fresh cookie, and resume. No automated 2FA bypass.
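The state transition itself is simple; the work is in classifying a sync's outcome. A minimal sketch, assuming each manifest declares its own rejection indicators (the 2FA marker string here is a hypothetical example):

```python
from enum import Enum, auto

class ScraperState(Enum):
    HEALTHY = auto()
    NEEDS_REAUTH = auto()  # surfaced in the dashboard; user completes 2FA

def on_sync_result(status_code: int, body: str) -> ScraperState:
    """Classify one sync outcome. The 2FA marker below is illustrative;
    in practice each portal's manifest lists its own indicators."""
    if status_code in (401, 403) or "two-factor" in body.lower():
        return ScraperState.NEEDS_REAUTH
    return ScraperState.HEALTHY
```

Once the user finishes the popup flow we store the fresh cookie and the scraper drops back to `HEALTHY` on its next scheduled sync; nothing in the pipeline ever attempts the 2FA challenge itself.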

The bottom line

Across 100+ scrapers, we see ~2 selector-drift incidents per week. With the diff tooling, each is a 5–15 minute fix. That's sustainable for a two-person team, and it gets better as the fleet grows, because we build shared selector patterns for common portal frameworks.