

Why most native ad spy data is wrong

A methodology critique of how Anstrex, Adplexity, and other "spy tools" actually collect native ad data — and the seven layers of rotation, gating, and bot-mitigation that make 90%+ of the corpus incomplete or stale.

By Eyal Rosenthal · May 6, 2026 · 16 min read · AI-assisted research

I have spent the last two years staring at native-ad spy tools and the last six months building one. Both experiences have led me to roughly the same uncomfortable conclusion: the data you see when you log in to Anstrex, Adplexity, AdHeart, NativeAdBuzz, or any of the other dozen tools that promise to "show you every running ad in 92 countries" is, on a good day, somewhere between thirty and fifty percent of the truth. On a bad day — and there are many bad days — it is closer to ten. This is not a vendor-bashing piece. The vendors are competent. The problem is structural: the way native ad serving works in 2026, no single scraper, no matter how well-funded, can see the whole picture. Below is what I have learned, with sources, about why that is.

The mental model spy-tool users have is wrong

When a media buyer logs into a spy tool and sees "ad X has been running for 47 days with creative variant A on Outbrain in the United States," they imagine that the tool has been continuously watching ad X serve to a stable audience. That is not what is happening. What is happening is the tool's scraper has, at some cadence, hit some publisher pages from some IPs in some user-agent configurations and has observed creative A. It has not observed creatives B through L on the same campaign, because those did not serve to that scraper at that moment. It has not observed the same campaign in Canada because the scraper was not in Canada. It has not seen the mobile-only variant because the scraper was on desktop. Every "running ad" record in a spy tool is a sample, not a census. Most users do not treat it that way.

The IAB's OpenRTB 2.6 specification — the protocol that governs how programmatic ad requests work, including most native — is explicit about this: a single ad slot can be filled by any one of dozens of campaigns based on bid, frequency capping, dayparting, geo, device, OS, browser, language, and dozens of other targeting parameters. A scraper hitting the same publisher page twice in a row will, by design, often see two completely different ads. Multiply this across the seven layers below and the sampling problem compounds.

Layer 1: Ad rotation inside a single campaign

A single Outbrain or Taboola campaign typically has between three and forty creative variants. The platforms rotate them — sometimes evenly, sometimes weighted toward the highest-CTR variant, sometimes via a multi-armed-bandit allocation. Outbrain's Help Center describes "Conversion Bid Strategy" and "Smart Decisioning" as the system that "automatically rotates creatives to find the best performer." Taboola's developer docs describe the same pattern under "creative-level optimization."

A scraper that hits a page once will see exactly one creative. To see all forty variants of one campaign, the scraper would have to hit that exact slot forty-plus times, with the right cookie state, the right entropy in user-agent, and the right timing — and even then, the platform's optimizer may have already deprioritized the lower-CTR variants down to one-percent serving rates, which means the scraper will need to hit it hundreds of times to surface them at all. The math does not work at scale.
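
To make the sampling problem concrete, here is a minimal sketch of that math, assuming each scraper request is an independent draw and using hypothetical serve shares:

```python
import math

def hits_needed(serve_share: float, confidence: float = 0.95) -> int:
    """Requests needed to observe a creative at least once with the given
    confidence, assuming independent draws and a hypothetical serve share."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - serve_share))

# A variant still getting 25% of serves is cheap to surface...
print(hits_needed(0.25))   # ~11 requests
# ...but a deprioritized variant at a 1% serve rate is not,
# and that is per slot, per country, per device class.
print(hits_needed(0.01))   # ~299 requests
```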

Layer 2: Country gating

Every major native network supports geo-targeting at the country, region, city, and DMA level. Taboola's Advertiser Help Center lists country-level targeting as a default in every campaign builder. RevContent's targeting documentation confirms the same. A campaign serving the US and Canada is a different campaign — different bid, different creative, often different landing page — than the same advertiser's UK campaign.

Spy tools advertise "92 countries of coverage" or similar. What this typically means is that they have proxies (residential or datacenter) in those 92 countries and they rotate scrapers through them. The honest version of the disclosure would be: "we sample each of those 92 countries on some cadence, and the cadence depends on traffic costs and proxy availability." The dishonest version, the one users actually internalize, is "we see every ad running in every country in real time." The gap between those two statements is where most of the bad inference happens.
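
A rough, back-of-envelope sketch of what "sampling on some cadence" actually implies; every number below is an assumption chosen for illustration, not a figure from any vendor:

```python
# All inputs are illustrative assumptions, not measured vendor figures.
daily_request_budget = 2_000_000   # requests/day the proxy pool can sustain
countries            = 92
device_profiles      = 3           # desktop, Android, iOS
tracked_publishers   = 5_000       # publisher pages worth sampling per country

cells = countries * device_profiles * tracked_publishers
requests_per_cell_per_day = daily_request_budget / cells
hours_between_visits = 24 / requests_per_cell_per_day

print(f"{cells:,} (country, device, publisher) cells to cover")
print(f"{requests_per_cell_per_day:.2f} visits per cell per day")
print(f"roughly one visit every {hours_between_visits:.0f} hours per cell")
```

Even with a generous two million requests a day, each (country, device, publisher) cell gets revisited about once every 17 hours under these assumptions, and that is before ad rotation within the cell is accounted for.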

Cloudflare's 2024 Radar bot report notes that automated traffic accounts for roughly 30% of all internet requests it sees and that bot-mitigation systems are increasingly geo-aware: a request from a residential IP in Brazil is treated differently than a request from a datacenter IP in Brazil, which is treated differently than a request from a residential IP in the US. Spy-tool scrapers that cannot cleanly mimic local residential profiles get served either no ad, a default house ad, or a deliberately-misleading "fingerprinting" ad designed to identify them.

Layer 3: Device and OS gating

The split between iOS, Android, desktop Chrome, desktop Safari, desktop Firefox, and the long tail of older devices is enormous in native. Mobile traffic dominates Outbrain and Taboola supply (roughly 70% mobile, per Taboola's 2024 10-K filing), but the highest-CPV inventory often skews desktop. Different creative formats serve to different device classes: certain "short-form video" creatives only serve to mobile, for example, while "1200x628 hero image" creatives are desktop-default. RevContent, MGID, and Outbrain all have explicit device-class bidding controls.

A scraper that is desktop-only will never see the mobile-only creatives, no matter how many pages it hits. A scraper that is iOS-Safari only will never see the Android-Chrome creatives. The ones that try to "rotate" device profiles often do so badly: sending a mobile user-agent on a request that still carries a desktop Chrome TLS fingerprint and a desktop viewport, then claiming it as "mobile coverage." Bot-mitigation services catch this mismatch trivially. Akamai's State of the Internet bot reports document the mismatched-fingerprint detection patterns that have been mainstream since 2021.
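
As an illustration of how cheap that mismatch is to catch, here is a toy version of the kind of consistency check bot-mitigation systems run; the fields and thresholds are simplified assumptions, not any vendor's actual logic:

```python
def profile_is_consistent(user_agent: str, viewport_width: int, has_touch: bool) -> bool:
    """Toy mismatch check: does the claimed user-agent agree with the observed
    viewport and touch capability? Real systems also compare the TLS
    fingerprint, HTTP/2 settings, and JavaScript-derived signals."""
    claims_mobile = any(t in user_agent for t in ("Android", "iPhone", "Mobile"))
    looks_mobile = viewport_width <= 500 and has_touch
    return claims_mobile == looks_mobile

# A "mobile" capture run from an unmodified desktop browser profile:
print(profile_is_consistent(
    user_agent="Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile/15E148",
    viewport_width=1920,   # desktop viewport leaks through
    has_touch=False,       # no touch support
))  # False -> flagged as a mismatched profile
```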

Layer 4: Time-of-day rotation

Dayparting is a standard feature of most native networks. Outbrain explicitly supports hour-and-day-of-week scheduling. Taboola supports it. MGID supports it. Advertisers running offers in regulated verticals (finance, gambling, supplements) often serve aggressively during off-hours, when compliance review staffing is low, and turn off during business hours. A scraper that runs exclusively at 3 PM UTC will systematically miss the campaigns that only serve between midnight and 6 AM in the target country.
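
A small sketch of the resulting blind spot: a crawler pinned to one UTC hour observes each target country at exactly one local hour and never sees the rest of the day. The timezones below are examples:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

crawl_utc_hour = 15  # a scraper that only ever runs at 15:00 UTC
targets = {"US-East": "America/New_York", "Brazil": "America/Sao_Paulo",
           "Germany": "Europe/Berlin", "Japan": "Asia/Tokyo"}

for name, tz in targets.items():
    local = datetime(2026, 5, 6, crawl_utc_hour, tzinfo=timezone.utc).astimezone(ZoneInfo(tz))
    overnight = 0 <= local.hour < 6
    print(f"{name}: crawl lands at {local.hour:02d}:00 local, "
          f"{'inside' if overnight else 'outside'} the 00:00-06:00 window")
```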

Spy tools rarely disclose their scraping schedule. The honest answer, when I have asked, is some version of "we run continuous crawls but the proxy pool is limited so the actual coverage per country per hour is uneven." This is fine. It just means the data is, again, a sample.

Layer 5: A/B test variants and split URLs

This is the layer that makes me laugh. A typical search-arbitrage advertiser running on Outbrain to a Tonic search feed will have, simultaneously, four to twelve A/B tests running across pre-landers, ad copy, and search-feed configurations. From the network's perspective, these are usually sub-IDs on a single "campaign," not separate campaigns. From the spy tool's perspective, they look like ten different ads — but the spy tool has no idea which sub-ID won, lost, was paused, or was scaled. It just shows you the creative.

This means that when a media buyer copies "the winning ad" from a spy tool, they are usually copying one of N variants without knowing which one was the winner. The winner is often not the most frequently captured creative, because the most frequently captured one is sometimes a control being held at a flat allocation while the test variants get optimized.

The IAB's Ad Operations 2.0 best practices do not require networks to surface A/B-test winner-and-loser metadata to third parties. No network does so voluntarily, because doing so would let competitors copy their best-performing structures.

Layer 6: Bot mitigation and ad poisoning

This is the dirtiest secret in the spy-tool industry. Several major native networks actively detect known scraper fingerprints and serve them either (a) deliberately-stale creatives, (b) creatives that no real user is being served, or (c) creatives belonging to a "honey-pot" advertiser who has paid to identify scrapers in the wild. I am not going to name the specific networks because some of this is contractual gray area, but the Cloudflare bot mitigation documentation describes the general pattern under "decoy" responses, and several papers from the USENIX Security symposium have documented bot-trap creatives in display and native advertising specifically.

The implication: spy tools that do not invest seven figures a year in residential-proxy infrastructure and TLS-fingerprint randomization are, in a non-trivial percentage of their captures, scraping ads that no human ever saw. There is no clean public number for what that percentage is. A scraping engineer I know who runs a competing tool put the rough range at "five to twenty percent of records on the worst-defended scrapers." I cannot independently verify that figure.

Layer 7: Hashed creative deduplication

Even when a scraper successfully captures every creative, it then has to decide which captures are "the same ad." This is a non-trivial problem. A 1200x628 hero image with a one-pixel watermark difference is, byte-for-byte, a different file than the original. Most spy tools use perceptual hashing (pHash, dHash, or learned image embeddings) to bucket "near-duplicate" creatives together. The thresholds they pick directly determine how many "unique" ads they report. Adobe's research on image-similarity hashing and the open-source ImageHash library both document how a small change in threshold flips two creatives from "same" to "different."
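
A minimal example of that decision using the open-source ImageHash library mentioned above; the synthetic images stand in for captured creative files, and the thresholds are illustrative rather than any tool's actual setting:

```python
# pip install pillow imagehash
from PIL import Image, ImageDraw
import imagehash

def make_creative(cta_box: tuple[int, int, int, int]) -> Image.Image:
    """Synthetic stand-in for a 1200x628 hero image; real inputs would be
    the captured creative files themselves."""
    img = Image.new("RGB", (1200, 628), color=(182, 62, 40))
    draw = ImageDraw.Draw(img)
    draw.rectangle((80, 80, 1120, 420), fill=(240, 230, 200))  # hero area
    draw.rectangle(cta_box, fill=(25, 25, 25))                 # CTA badge
    return img

# Two near-duplicate variants: same layout, slightly shifted CTA placement.
a = imagehash.phash(make_creative((80, 450, 600, 560)))
b = imagehash.phash(make_creative((120, 460, 680, 570)))

distance = a - b  # Hamming distance between the 64-bit perceptual hashes
for threshold in (0, 4, 8, 12):
    verdict = "same ad" if distance <= threshold else "different ads"
    print(f"threshold {threshold:>2}: {verdict} (distance={distance})")
```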

This is why two spy tools, looking at the same network on the same day, will report "running ad" counts that differ by 30% or more. They are not lying. They have made different deduplication choices. Without knowing which choices, you cannot meaningfully compare the numbers.

So what is actually true?

Here is my best summary of what the data is and is not. A spy tool's "running" record is approximately equivalent to: this creative was served to one of our scrapers from a proxy of class X in country Y in device class Z at time T, and we have not yet confirmed it has stopped running. That is a useful signal. It is not the truth.

A spy tool's "first seen" date is approximately equivalent to: the first time our scraper happened to capture this creative. Not the first time it served. The actual first-served date can be days or weeks earlier.

A spy tool's "last seen" date is approximately equivalent to: the most recent time our scraper happened to recapture this creative. If the creative is still running but only serves to mobile in Brazil at 4 AM and the scraper never ran that exact configuration, the "last seen" date falls further and further into the past, and the tool eventually declares the ad "dead" while it is still cheerfully running.

A spy tool's "this ad is on Outbrain" is approximately true at the network level but is approximately useless at the publisher level — the same campaign serves to thousands of publishers and the spy tool only saw it on one.
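
Put another way, here is roughly what a single record asserts, written out as a schema; the field names are mine, not any vendor's:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class SpyCapture:
    """What one 'running ad' record actually asserts: a single observation by
    a single scraper configuration, nothing more. Field names are illustrative."""
    creative_hash: str       # perceptual-hash bucket, per the tool's own threshold
    network: str             # e.g. "outbrain"; true at the network level only
    publisher_seen_on: str   # one publisher out of the thousands the campaign serves
    country: str             # where the proxy was, not everywhere the ad runs
    device_class: str        # the scraper's profile, not the campaign's targeting
    proxy_class: str         # "residential" or "datacenter"
    first_capture: datetime  # first time the tool saw it, not first time it served
    last_capture: datetime   # most recent recapture; staleness is not proof of death
```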

What media buyers should actually do with spy data

Treat it as direction, not evidence. If you see a creative running on Outbrain that has been "captured" 40 times across 6 countries over 90 days, you can be reasonably confident it is a real, active, scaled campaign, because it has too much coverage to be a fluke. If you see a creative captured 2 times in 1 country over 7 days, you are looking at a sample so small that the confidence interval includes "this ran for an hour and got paused."

Cross-reference with at least two tools whenever a decision matters. The overlap between Anstrex and Adplexity captures, in my experience benchmarking the two, is somewhere in the range of 40-60% on the same network in the same week. The non-overlap is not "one tool is right and one is wrong." The non-overlap is "neither tool sees the whole picture and the union is closer to truth than either alone."
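
If you are willing to assume the two tools sample roughly independently (they do not, exactly, but it is a useful bound), the overlap also lets you estimate how much of the corpus either tool sees, via the classic Lincoln-Petersen capture-recapture estimate. The numbers below are illustrative:

```python
def lincoln_petersen(n_tool_a: int, n_tool_b: int, overlap: int) -> float:
    """Capture-recapture estimate of the total population, here the number of
    distinct creatives actually running. Assumes roughly independent sampling,
    which is optimistic, so treat the result as a bound, not a measurement."""
    return n_tool_a * n_tool_b / overlap

# Illustrative numbers: tool A shows 10,000 creatives on a network this week,
# tool B shows 9,000, and 4,500 appear in both (inside the 40-60% overlap range).
estimate = lincoln_petersen(10_000, 9_000, 4_500)
print(f"estimated total running creatives: {estimate:,.0f}")  # 20,000
print(f"coverage of tool A alone: {10_000 / estimate:.0%}")   # 50%
```

On those assumed numbers, either tool alone sees about half of what is actually running, which is exactly why the union is closer to truth than either corpus on its own.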

Demand transparency from the tool you pay for. Ask: How many proxies do you operate? In how many countries? On what residential-vs-datacenter mix? At what TLS-fingerprint rotation? At what device-class distribution? At what schedule? What is your dedupe threshold? If they cannot answer, you are paying for vibes.

Build your own where it matters. The cost of a small in-house scraper has fallen dramatically. A media buyer with a $10K/month spend who relies on a $300/month spy tool is rationally underinvested in their own data. A serious operation will have a small team or contractor running custom captures on the publishers and networks that matter to them, with full transparency on the methodology. The economics are obvious once you do the math.

What we are doing differently at mediabuyer.site

For the avoidance of doubt: we are not selling a spy tool yet, and when we do, we will publish our methodology transparently in this section. The reason this site exists is partly to write the kind of meta-coverage of the spy industry that does not currently exist outside private Slack groups. If you operate a spy tool and want to push back on anything in this piece, the email is at the bottom.

A short history of how the industry got here

It is worth a few paragraphs to explain how native-ad spy data ended up structurally broken, because the answer is not "the vendors are lazy." It is "the underlying serving infrastructure changed faster than the scraping infrastructure could keep up." A short version of that history:

In 2014-2016, the era of Anstrex's and Adplexity's first versions, native ad serving was substantially simpler. Outbrain and Taboola served creatives via straightforward HTTP requests with predictable URL parameters. A scraper that hit a publisher page, parsed the widget HTML, and stored the creative reference could capture the network's catalog with high recall. The sampling problems described above were present but small. Spy tools of that era really did approximate a census.

From 2017 onward, the networks moved aggressively to dynamic serving: server-rendered widgets with per-request creative selection driven by the bidder, optimizer, and frequency-cap state. The IAB's OpenRTB 2.4 and 2.5 updates (both released in 2016) formalized the request-response model that made every impression effectively unique. Scrapers that had been doing static-catalog capture had to switch to per-request capture. Coverage dropped. Sampling problems got worse.

From 2019-2021, the bot-mitigation industry matured. Cloudflare, Akamai, and the platform-specific defenses got dramatically better at fingerprinting non-human requests. Imperva's State of Bot Mitigation 2021 report documented a 25% year-over-year increase in successful detection of automated traffic. Spy-tool scrapers that had previously gotten away with simple proxy rotation found themselves served decoy creatives or stub pages. The vendors who didn't invest in residential-proxy networks and TLS fingerprint randomization saw their data quality collapse without telling their customers.

From 2022 onward, the AI-creative explosion increased the rate at which new creatives appear. Where a 2018 native ad campaign might have 5 creative variants, a 2024 campaign routinely has 50. The deduplication problem (Layer 7) compounded as the volume of near-duplicate creatives exploded. Spy tools that quietly relaxed their dedupe thresholds to keep "unique creative" counts looking impressive ended up with corpus inflation. Spy tools that kept thresholds tight ended up looking sparse.

This is the era we are in. None of the major spy-tool vendors publish how their methodology has evolved since 2022, because keeping that methodology current has been genuinely hard. The honest answer most of them would give, if pushed, is some version of "we are doing our best and the industry has gotten harder."

What an honest spy-tool product page would say

I keep coming back to this thought experiment. If a spy-tool vendor wrote a product page that was actually accurate, it would say something like:

"We capture native-ad creatives across [N networks] using residential proxies in [M countries] on a rotating schedule. Our typical recapture rate per (creative, country, device) tuple is approximately every [X hours] for tier-1 inventory and [Y hours] for tier-2. Our coverage of mobile-app inventory is near zero. Our coverage of in-feed social (Facebook, Instagram, TikTok) is near zero. We deduplicate creatives at perceptual-hash distance [Z]. Our 'first seen' date reflects first capture, not first serve. Our 'running' status reflects most-recent capture, not confirmed-active state. The estimated overlap with our nearest competitor's corpus on the same network in the same week is approximately [W%]; the union of the two corpora is closer to truth than either alone. We recommend treating our data as direction, not census."

No vendor writes this. Some come closer than others. Anstrex's methodology page is, in fairness, more transparent than most about coverage limits, though it still doesn't fully describe the sampling structure. Adplexity's pages are heavier on capability marketing.

The industry's incentive structure rewards vague claims of comprehensiveness. The buyer's incentive is to demand specificity. The gap between those incentives is where I think transparent meta-coverage like this site can add real value over the next few years.

A brief technical aside on TLS fingerprinting

For the technically inclined, here is a short primer on why scraper detection has gotten so much harder lately. Modern bot-mitigation services don't just look at User-Agent headers; they fingerprint the full TLS handshake, the cipher-suite ordering, the HTTP/2 settings frame, the JavaScript-execution profile of the browser (canvas, WebGL, audio context, font enumeration), and the timing of mouse and keyboard events. The JA3 TLS fingerprinting technique and its successors in the JA4 family are now industry-standard inputs to bot-mitigation decisions.
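
For reference, JA3 itself is nothing exotic: it is an MD5 over five comma-separated fields pulled from the ClientHello. A sketch, assuming the handshake fields have already been parsed out of a capture:

```python
import hashlib

def ja3_fingerprint(tls_version: int, ciphers: list[int], extensions: list[int],
                    curves: list[int], point_formats: list[int]) -> str:
    """Build the JA3 string ('TLSVersion,Ciphers,Extensions,EllipticCurves,
    EllipticCurvePointFormats', each list dash-joined in the order the client
    sent it) and return its MD5. Parsing the ClientHello is out of scope here."""
    ja3_string = ",".join([
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ])
    return hashlib.md5(ja3_string.encode()).hexdigest()

# Placeholder values, not a real browser's handshake.
print(ja3_fingerprint(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0]))
```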

A scraper running headless Chrome with default Puppeteer settings presents a fingerprint (the HeadlessChrome user-agent, navigator.webdriver, missing plugins, and related automation artifacts) that is distinguishable from human Chrome traffic in seconds. A scraper that has invested in puppeteer-extra-plugin-stealth and similar countermeasures gets harder to detect but is still typically detectable by serious mitigation infrastructure. The arms race continues, and the cost of running an undetected scraper has climbed an order of magnitude over five years.

This is most of why "build your own scraper" is harder than it sounds. It is also why the spy-tool vendors that actually invest in their infrastructure are not cheap and the ones that are cheap are typically running on borrowed time.

Editor's note: AI-assisted research; written and reviewed by Eyal Rosenthal. Sources cited above. Send corrections to corrections@mediabuyer.site.