OddsRelay

What makes an arbitrage data feed actually usable

Most arb feeds look fine in a demo and fall apart in production. Here are the four things that actually decide whether a feed is usable, plus the exchange-lay side that back/lay arbs live or die on.

James · 6 min read

A usable arbitrage data feed is judged on four things: coverage breadth, freshness, completeness, and consistent event and market normalisation across books. For back/lay arbs, add a fifth: the exchange lay side, priced and liquid enough to be real. A feed can demo well and still fail on any one of these, and each failure shows up as a different bug in your scanner. This is how buyers tell a good feed from a frustrating one before they commit an integration to it.

If you want the ground-level definitions first, the companion article "Arbitrage data explained" covers what an arb actually is. This piece assumes you know that and asks the harder question: what makes the data behind it worth building on.

How much coverage do you actually need?

Coverage breadth decides how many arbs your scanner can ever see, because an arb needs at least two books quoting the same selection. A feed covering ten books surfaces a fraction of the opportunities a feed covering sixty does. Breadth is the ceiling on your signal count, and no amount of clever scanning raises it.
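A rough way to see why breadth acts as a ceiling is to count how many distinct two-book pairings exist on a single selection. The sketch below is only that count: it is an upper bound on the pairings a scanner could compare, not a count of arbs, and `book_pairs` is a hypothetical helper name.

```python
from math import comb


def book_pairs(book_count: int) -> int:
    """Number of distinct two-book pairings available on one selection."""
    return comb(book_count, 2)


print(book_pairs(10))  # 45
print(book_pairs(60))  # 1770
```

Pairings grow roughly with the square of the book count, which is why a sixfold increase in books gives far more than six times the comparisons.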

Breadth is not only book count, though. It is which books, and whether the hard ones are in. OddsRelay covers 60+ UK books with bet365 included as standard, with coverage built to extend into the domestic South African and Nigerian books the large aggregators tend to skip. Those emerging-market books are where the wider margins still sit, so a feed that reaches them gives your scanner arbs the crowded UK-only feeds cannot.

Why does freshness matter more than it looks?

Freshness is how recently each price was seen, and it is the difference between a real arb and a phantom one. A stale price still looks like a valid quote, so your scanner flags an opportunity that closed minutes ago. The bet fails at placement, the qualifying loss lands wrong, and you learn nothing until it has already cost you. A stale price is worse than a missing one, because a missing price never lies to you.

This is why every row needs a timestamp you can trust, not a feed-level promise. A book that updated three seconds ago and one that updated three minutes ago cannot be treated the same, and only per-selection freshness lets your scanner tell them apart and drop the ones that have gone cold.
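A minimal sketch of per-selection freshness filtering, assuming each row carries its own timestamp. The field name `seen_at`, the book names and the 30-second threshold are assumptions for illustration, not part of any documented schema.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(seconds=30)  # assumed threshold; tune to the markets you trade


def fresh_rows(rows, now, max_age=MAX_AGE):
    """Keep only rows whose own timestamp is within max_age of now."""
    return [row for row in rows if now - row["seen_at"] <= max_age]


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
rows = [
    # seen three seconds ago: still usable
    {"bookmaker": "book_a", "odds": 2.10, "seen_at": now - timedelta(seconds=3)},
    # seen three minutes ago: looks like a valid quote but has gone cold
    {"bookmaker": "book_b", "odds": 2.20, "seen_at": now - timedelta(minutes=3)},
]

kept = fresh_rows(rows, now)
print([row["bookmaker"] for row in kept])  # ['book_a']
```

The filter runs per row, so one slow book cannot make a whole market look fresher or staler than it is.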

What happens when the data has gaps?

Completeness means selections are never silently dropped, and its absence is the quietest failure of the four. When a selection vanishes from the feed without a marker, your scanner does not see an error. It sees one fewer book on that market, so it either misses an arb that existed or, worse, computes a signal from a half-populated market and gets it wrong.

The dangerous part is that gaps are invisible in a demo. A market that is whole today can lose a selection next week, and nothing in the payload tells you unless the feed is built to keep the market whole or flag when it cannot. Ask a prospective supplier what happens to a selection when its data goes missing. A good answer distinguishes absent from stale from dropped.
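One way to make the absent, stale and dropped distinction concrete is to compare two consecutive snapshots of the books quoting a selection. This is a sketch under assumed inputs: each snapshot is a hypothetical mapping of book name to the age of its price in seconds, and the 30-second cutoff is an assumption.

```python
MAX_AGE_SECONDS = 30  # assumed staleness cutoff


def book_state(book, previous, current, max_age=MAX_AGE_SECONDS):
    """Classify one book's quote on a selection across two snapshots."""
    if book not in current:
        # Missing now: either it vanished since last time, or it never quoted.
        return "dropped" if book in previous else "absent"
    return "stale" if current[book] > max_age else "live"


previous = {"book_a": 2, "book_b": 4, "book_c": 3}
current = {"book_a": 3, "book_b": 200}

states = {
    book: book_state(book, previous, current)
    for book in ("book_a", "book_b", "book_c", "book_d")
}
print(states)
# {'book_a': 'live', 'book_b': 'stale', 'book_c': 'dropped', 'book_d': 'absent'}
```

A scanner can then refuse to rate a market that contains a dropped book instead of quietly computing on what is left.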

Why is normalisation the criterion buyers forget?

Consistent normalisation means the same event and the same market carry the same identity across every book, and getting it wrong produces false arbs from thin air. If one book labels a fixture "Man Utd vs Spurs" and another calls it "Manchester United vs Tottenham", a naive matcher either misses the pairing or, if it matches too loosely, pairs two different events and reports an arb that does not exist. Both failures erode trust in the scanner fast.

This is unglamorous, continuous work: mapping every book's naming onto one canonical event and market scheme, and holding it as books rename things. It is also the reason a matched feed is worth more than raw prices. The normalisation is done before you receive the data, so your scanner compares like with like. If you are weighing what to build yourself, the article "Building an arb scanner" walks through how much of the cost lives here.
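A minimal sketch of canonical event matching, assuming an explicit alias table rather than fuzzy matching. The table contents, the canonical names and the "vs" label format are illustrative assumptions; a real mapping covers every book's naming and needs ongoing upkeep.

```python
# Hypothetical alias table: lower-cased book label -> canonical team name.
ALIASES = {
    "man utd": "Manchester United",
    "manchester united": "Manchester United",
    "spurs": "Tottenham Hotspur",
    "tottenham": "Tottenham Hotspur",
}


def canonical_team(name):
    """Map a book's team label to its canonical name, failing loudly if unmapped."""
    key = name.strip().lower()
    if key not in ALIASES:
        raise KeyError(f"unmapped team name: {name!r}")
    return ALIASES[key]


def canonical_event(label):
    """Turn a 'Home vs Away' label into a (home, away) tuple of canonical names."""
    home, away = label.split(" vs ")
    return (canonical_team(home), canonical_team(away))


a = canonical_event("Man Utd vs Spurs")
b = canonical_event("Manchester United vs Tottenham")
print(a == b)  # True
```

Raising on an unmapped name is deliberate: a missed pairing costs one opportunity, while a loose match that pairs two different events produces a false arb.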

What does a usable row look like?

For a back/lay arb, one usable row carries the back price, the paired exchange lay price, and enough liquidity for the lay to be placeable. The criteria above are the point; the row is just where they show up together. Here is the shape (illustrative, not live data):

One matched row · illustrative shape
{
  "event": "Arsenal vs Chelsea",
  "market": "match_odds",
  "selection": "Arsenal",
  "back": { "bookmaker": "bet365", "odds": 2.10 },
  "lay":  { "exchange": "betfair", "odds": 2.14, "liquidity": 1840 },
  "rating": 98.1,
  "qualifying_loss": -0.12
  // ... region, feed_type and freshness fields elided
}

The back block is one of your 60+ books; the lay block is one of three exchanges (Betfair, Smarkets, Matchbook). The rating scores the pair and qualifying_loss states the cost of the position, both computed before delivery. A raw price feed gives you the back block alone and leaves the rest to you.
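For orientation, here is one common way a rating and a qualifying loss are derived from a back/lay pair in matched betting. It is a sketch of the standard equalised-lay arithmetic, not a statement of how OddsRelay computes its fields; the function names, the 10-unit stake and the zero commission are assumptions.

```python
def lay_stake(back_stake, back_odds, lay_odds, commission=0.0):
    """Lay stake that gives the same result whichever side wins."""
    return back_stake * back_odds / (lay_odds - commission)


def qualifying_result(back_stake, back_odds, lay_odds, commission=0.0):
    """Result of the equalised position; a negative value is a qualifying loss."""
    stake = lay_stake(back_stake, back_odds, lay_odds, commission)
    return stake * (1 - commission) - back_stake


def rating(back_odds, lay_odds, commission=0.0):
    """Percentage of the back stake retained by the equalised position."""
    return 100 * back_odds * (1 - commission) / (lay_odds - commission)


# Prices from the illustrative row, with an assumed stake of 10 and no commission.
print(round(rating(2.10, 2.14), 1))                 # 98.1
print(round(qualifying_result(10, 2.10, 2.14), 2))  # -0.19
```

With zero commission the rating lines up with the 98.1 in the illustrative row. The row does not state a stake or commission, so the sketch makes no attempt to reproduce its qualifying_loss figure.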

Is the lay side actually liquid?

For any back/lay arb the exchange lay price is only real if there is money behind it, so liquidity belongs in the row next to the price. A lay quote of 2.14 with nothing available to match is a number, not an opportunity: your scanner rates the arb, you try to place the lay, and it does not fill. The position you thought you had was never there.
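A minimal sketch of a placeability check, assuming the liquidity figure is the stake available to match at the quoted lay price, in the same currency as the back stake. The function name and the 50-unit stake are hypothetical.

```python
def lay_is_placeable(back_stake, back_odds, lay_odds, liquidity, commission=0.0):
    """True if the equalising lay stake fits within the available liquidity."""
    required = back_stake * back_odds / (lay_odds - commission)
    return required <= liquidity


# Prices and liquidity from the illustrative row, with an assumed back stake of 50.
print(lay_is_placeable(50, 2.10, 2.14, liquidity=1840))  # True
# Same prices with nothing available to match: a number, not an opportunity.
print(lay_is_placeable(50, 2.10, 2.14, liquidity=0))     # False
```

Running this check before rating a pair keeps unfillable lays out of the signal list altogether.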

How fast does the feed need to be?

Fast enough for the arbs you are trading, and honest about it. We poll pre-match prices on a cycle of roughly a few seconds, which suits pre-match arbitrage well. We do not claim sub-second in-play streaming, because that is not what we ship, and a vendor promising it without proof is asking you to trade on trust.

A more useful signal than any promised latency number is whether the supplier shows you reliability instead of asserting it. Freshness, uptime and latency are published on the coverage dashboard, so the claim is checkable. A dashboard you can watch beats a figure in a sales deck, because it holds the vendor to the same standard every day, not just on the day you asked.

The short version

A usable arb feed scores on all four criteria at once: broad and deep coverage, per-row freshness, no silent gaps, and consistent normalisation, with a liquid exchange lay side for back/lay arbs. A feed that nails three and fails one is still frustrating, because the failing one becomes the bug you chase in production. OddsRelay is built to hold all four, with bet365 included and prices matched against three exchanges, and it powers a leading UK matched-betting platform today. The quickest way to judge it against your own criteria is a free trial, or watch what is live right now on the coverage dashboard.

Written by

James

Founder, OddsRelay

James is the founder of OddsRelay — the odds-data feed behind matched betting, arbitrage and odds-comparison products: 60+ UK bookmakers with bet365 included, matched against exchange lay prices and delivered as one clean, documented API. He writes here about how that data layer actually behaves — coverage, matching, freshness and the trade-offs — from the side that builds and runs it. The same feed powers a leading UK matched-betting platform today.

Part of the Arbitrage & value betting cluster

Arbitrage betting data, explained: how live arb works

18+ · Data product for licensed operators. Please gamble responsibly.