In-Sample vs. Out-of-Sample Testing
In-sample (IS) data is the historical period you use to develop and optimise a strategy. Out-of-sample (OOS) data is held back — untouched — until the strategy is finalised, then tested exactly once. The OOS result is your only unbiased estimate of how the strategy performs on data it has never seen. Every time you use OOS results to make a decision, the holdout is contaminated and must be replaced.
Definition
In-sample (IS) — data used during strategy development. Any decision informed by IS results is implicitly fitting to that data.
Out-of-sample (OOS) — data held back and never referenced during development. Valid OOS results are used exactly once: after the strategy is fully specified. The moment OOS results influence any decision, the period becomes in-sample by definition.
Why does the in-sample/out-of-sample split matter?
Every parameter you optimise, every filter you add, every threshold you adjust based on IS results is a form of fitting to that specific historical path. The IS Sharpe ratio is optimistic by construction — you searched over options and kept the best. It tells you how good the strategy looks on data it was designed for, not how good it is.
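A small simulation makes "optimistic by construction" concrete. This is a minimal sketch under stated assumptions. Every candidate configuration's daily returns are pure Gaussian noise, so every true Sharpe ratio is zero. There are 200 candidates, and the split is 3 years IS / 1 year OOS on 252-day years. The Sharpe ratio uses a zero risk-free rate, and the function names are hypothetical.

```python
import random
import statistics


def annualised_sharpe(returns, periods_per_year=252):
    """Annualised Sharpe ratio of per-period returns, assuming a zero risk-free rate."""
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)
    return mean / stdev * periods_per_year ** 0.5


def best_of_n_noise(n_configs=200, is_days=756, oos_days=252, seed=7):
    """Pick the best-IS configuration among pure-noise 'strategies' (true Sharpe = 0).

    Returns the winner's IS Sharpe and the same configuration's OOS Sharpe.
    """
    rng = random.Random(seed)
    best_is, best_oos = float("-inf"), None
    for _ in range(n_configs):
        # Hypothetical daily returns: mean 0, 1% volatility, no edge at all.
        returns = [rng.gauss(0.0, 0.01) for _ in range(is_days + oos_days)]
        is_sr = annualised_sharpe(returns[:is_days])
        if is_sr > best_is:  # "search over options and keep the best"
            best_is = is_sr
            best_oos = annualised_sharpe(returns[is_days:])
    return best_is, best_oos


best_is, best_oos = best_of_n_noise()
print(f"best IS Sharpe: {best_is:.2f}, same configuration OOS: {best_oos:.2f}")
```

Under these assumptions, the winning configuration's IS Sharpe typically lands above 1 even though none of the candidates has an edge. Its OOS Sharpe is just another draw centred on zero.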
OOS performance is the closest thing you have to an unbiased estimate of future performance. A strategy that maintains a strong Sharpe ratio on the OOS period provides genuine evidence of an edge. A strategy that collapses on OOS data is most likely overfit: it memorised the IS period rather than learning from it.
How to structure the IS/OOS split correctly
- Put the OOS period at the end of your data. The OOS period must be the most recent data, not a random slice from the middle. Using a random sample as OOS would mean your IS data contains future bars — a form of look-ahead bias.
- Choose an IS/OOS ratio, usually 70/30 or 80/20. 70/30 is the most common split. Use 80/20 when your strategy has many parameters and needs more IS data to optimise reliably. Use 60/40 only when you have a very long dataset and the strategy is simple.
- Lock the OOS period before development begins. Decide on the split date before you look at any data. Write it down. The OOS period is off-limits until you have finished.
- Test the OOS period exactly once. Run the fully specified strategy on the OOS period. Record the result. Do not iterate. If the result is disappointing, start over — do not adjust parameters to improve the OOS result.
- Treat any OOS-informed decision as contamination. If you adjust anything after seeing OOS results, that period is now IS. Reserve a new holdout from a different time range, or switch to walk-forward analysis.
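The split-date and test-once rules above can be enforced in code rather than by discipline alone. This is a minimal sketch with some assumptions. Bars are assumed to be `(date, value)` tuples. A "strategy" is any hypothetical callable that takes a list of bars and returns a result. `chronological_split` and `Holdout` are illustrative names, not a library API.

```python
from datetime import date


def chronological_split(bars, split_date):
    """Split (date, value) bars at a pre-committed date.

    Everything before split_date is in-sample; everything on or after it is
    out-of-sample, so the OOS block is always the most recent data.
    """
    ordered = sorted(bars, key=lambda bar: bar[0])
    in_sample = [bar for bar in ordered if bar[0] < split_date]
    out_of_sample = [bar for bar in ordered if bar[0] >= split_date]
    return in_sample, out_of_sample


class Holdout:
    """Wraps the OOS bars so they can be evaluated exactly once."""

    def __init__(self, bars):
        self._bars = list(bars)
        self._used = False

    def evaluate_once(self, strategy):
        # A second look would turn the period into in-sample data, so refuse it.
        if self._used:
            raise RuntimeError("OOS period already used; reserve a new holdout")
        self._used = True
        return strategy(self._bars)


# Usage: five years of monthly bars (Jan 2020 to Dec 2024), split date fixed up front.
SPLIT_DATE = date(2023, 1, 1)
bars = [(date(2020 + i // 12, i % 12 + 1, 1), float(i)) for i in range(60)]
is_bars, oos_bars = chronological_split(bars, SPLIT_DATE)
holdout = Holdout(oos_bars)
print(len(is_bars), len(oos_bars))  # 36 IS bars, 24 OOS bars
print(holdout.evaluate_once(len))   # a stand-in "strategy" that just counts bars
```

A holdout that refuses a second evaluation turns "test exactly once" from a promise into a hard error. Starting over after a disappointing result means building a new `Holdout` from a different time range.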
Common IS/OOS split ratios compared
| Split | IS data available | OOS window | Best for |
|---|---|---|---|
| 60 / 40 | 3 years of a 5-year dataset | 2 years | Long datasets, simple strategies |
| 70 / 30 | 3.5 years of a 5-year dataset | 1.5 years | Standard starting point |
| 80 / 20 | 4 years of a 5-year dataset | 1 year | Many parameters, few trades |
| Walk-forward (WFA) | Rolling IS windows | Many OOS windows | Most rigorous; see walk-forward analysis |
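On bar-level data, the ratios in the table translate into a cut index. This is a minimal sketch that assumes daily bars and roughly 252 trading sessions a year, so the 5-year dataset is 1,260 bars. `split_index` is a hypothetical helper.

```python
def split_index(n_bars, is_fraction):
    """Index of the first OOS bar for a chronological IS/OOS split."""
    if not 0.0 < is_fraction < 1.0:
        raise ValueError("is_fraction must be strictly between 0 and 1")
    # round() rather than int() so float error (e.g. 881.9999...) cannot drop a bar.
    return round(n_bars * is_fraction)


TRADING_DAYS_PER_YEAR = 252  # assumption: daily bars, ~252 sessions per year
n_bars = 5 * TRADING_DAYS_PER_YEAR  # the 5-year dataset used in the table

for label, frac in [("60 / 40", 0.6), ("70 / 30", 0.7), ("80 / 20", 0.8)]:
    cut = split_index(n_bars, frac)
    is_years = cut / TRADING_DAYS_PER_YEAR
    oos_years = (n_bars - cut) / TRADING_DAYS_PER_YEAR
    print(f"{label}: bars [0, {cut}) are IS ({is_years:.1f} y), "
          f"[{cut}, {n_bars}) are OOS ({oos_years:.1f} y)")
```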
What does a good IS/OOS result look like?
There is no fixed threshold, but these rough benchmarks are a common rule of thumb:
| OOS Sharpe / IS Sharpe | Interpretation |
|---|---|
| > 0.7 | Strong generalisation — parameters transfer well to unseen data |
| 0.4 – 0.7 | Moderate — some degradation, acceptable with corroborating evidence |
| 0.1 – 0.4 | Likely overfit — significant IS/OOS performance gap |
| Below 0.1 (near zero or negative) | Very likely overfit: the strategy fails on unseen data |
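The bands in the table can be encoded directly. This is a minimal sketch, and the function names are hypothetical. The table leaves the exact boundaries open, so treating 0.7 and 0.4 as moderate and 0.1 as likely overfit is an assumption. The ratio is only computed when the IS Sharpe is positive, because dividing by a zero or negative IS Sharpe gives a meaningless number.

```python
def oos_is_ratio(is_sharpe, oos_sharpe):
    """OOS Sharpe divided by IS Sharpe; only defined for a positive IS Sharpe."""
    if is_sharpe <= 0:
        raise ValueError("IS Sharpe must be positive for the ratio to be meaningful")
    return oos_sharpe / is_sharpe


def interpret_ratio(ratio):
    """Map an OOS/IS Sharpe ratio onto the rule-of-thumb bands in the table."""
    if ratio > 0.7:
        return "strong generalisation"
    if ratio >= 0.4:
        return "moderate degradation"
    if ratio >= 0.1:
        return "likely overfit"
    return "near zero or negative: very likely overfit"


# Example: IS Sharpe 1.5 and OOS Sharpe 0.9 give a ratio of about 0.6.
ratio = oos_is_ratio(1.5, 0.9)
print(f"{ratio:.2f} -> {interpret_ratio(ratio)}")
```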
Even a ratio above 0.7 is not a guarantee of live performance. Live trading introduces slippage, partial fills, and latency that no simulation fully captures. Treat a strong OOS result as evidence of a real edge, not a promise of profit.
When a single split is not enough
A single OOS window gives you one data point. It could be an unusually good or bad period by chance. If the OOS window happened to coincide with a market crash, a low result does not necessarily mean the strategy is overfit — it might just be a bad market period. Conversely, a strong OOS result in a strong bull market may be flattering the strategy.
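How wide is that luck? Lo (2002) gives an approximate standard error for a per-period Sharpe estimate, √((1 + SR²/2) / T) for T observations. It assumes independent, normally distributed returns. The sketch below annualises that figure for daily data; the function name is hypothetical.

```python
import math


def sharpe_standard_error(annual_sharpe, n_obs, periods_per_year=252):
    """Approximate standard error of an annualised Sharpe ratio estimate.

    Assumes i.i.d. normal returns (Lo, 2002): the per-period standard error is
    sqrt((1 + SR**2 / 2) / T), which is then scaled by sqrt(periods_per_year).
    """
    per_period_sr = annual_sharpe / math.sqrt(periods_per_year)
    se_per_period = math.sqrt((1 + 0.5 * per_period_sr ** 2) / n_obs)
    return se_per_period * math.sqrt(periods_per_year)


# A 1-year OOS window of daily bars, for a strategy whose true Sharpe is 1.0.
print(f"{sharpe_standard_error(1.0, 252):.2f}")
```

The result is roughly 1.0. A single 1-year window could therefore plausibly show an annualised Sharpe anywhere from about 0 to 2 (one standard error either side) for the same underlying strategy. Real returns are often fat-tailed and autocorrelated, which can widen this further.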
The solution is walk-forward analysis, which rolls the IS/OOS split across the full dataset and typically produces 6–12 non-overlapping OOS windows. Multiple OOS results average out the luck of any single window and give a far more reliable robustness assessment.
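Generating walk-forward windows is simple index arithmetic. This is a minimal sketch that assumes a rolling, fixed-length IS window. An optional anchored variant keeps IS always starting at bar 0. The window lengths in the example (2-year IS, 6-month OOS on 1,260 daily bars) are illustrative, and `walk_forward_windows` is a hypothetical helper.

```python
def walk_forward_windows(n_bars, is_len, oos_len, anchored=False):
    """Return (is_start, is_end, oos_start, oos_end) half-open index ranges.

    Each OOS window starts where its IS window ends, and consecutive OOS
    windows tile the data without overlapping. With anchored=True the IS
    window always starts at bar 0 and grows; otherwise it rolls forward.
    """
    windows = []
    oos_start = is_len
    while oos_start + oos_len <= n_bars:
        is_start = 0 if anchored else oos_start - is_len
        windows.append((is_start, oos_start, oos_start, oos_start + oos_len))
        oos_start += oos_len
    return windows


# 5 years of daily bars, 2-year IS windows, 6-month OOS windows.
for window in walk_forward_windows(1260, 504, 126):
    print(window)
```

This layout yields six OOS windows. For each window, re-optimise on the IS range and run the frozen parameters once on the following OOS range. Then judge the strategy on all OOS results together.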
For a statistically rigorous approach to multiple trials, the Deflated Sharpe Ratio adjusts for the number of strategy configurations you tested before selecting the winner — even across a single IS period.
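Here is a minimal sketch of the Deflated Sharpe Ratio as defined by Bailey and López de Prado (2014). First it estimates the Sharpe ratio that the best of N zero-edge trials would reach by luck. Then it computes the probability that the selected strategy's Sharpe genuinely exceeds that benchmark, adjusting for sample length, skewness and kurtosis. Function and parameter names are hypothetical. The trial-variance figure in the example is made up for illustration; in practice it is the variance of the Sharpe ratios across all the configurations you actually tried.

```python
import math
from statistics import NormalDist

EULER_MASCHERONI = 0.5772156649015329


def expected_max_sharpe(trial_sharpe_variance, n_trials):
    """Sharpe ratio the best of n_trials zero-edge configurations would reach by luck."""
    if n_trials < 2:
        raise ValueError("need at least two trials")
    z = NormalDist().inv_cdf
    return math.sqrt(trial_sharpe_variance) * (
        (1 - EULER_MASCHERONI) * z(1 - 1 / n_trials)
        + EULER_MASCHERONI * z(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe_ratio(sr, n_obs, n_trials, trial_sharpe_variance,
                          skew=0.0, kurtosis=3.0):
    """Estimated probability that the true Sharpe exceeds the luck benchmark.

    sr and trial_sharpe_variance are per-period (not annualised); kurtosis is
    the raw kurtosis of returns (3.0 for a normal distribution).
    """
    sr0 = expected_max_sharpe(trial_sharpe_variance, n_trials)
    denom = math.sqrt(1 - skew * sr + (kurtosis - 1) / 4 * sr ** 2)
    return NormalDist().cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)


# Illustrative inputs: annualised Sharpe 1.5 on 3 years of daily data,
# with the winner picked from 10 vs 100 configurations.
daily_sr = 1.5 / math.sqrt(252)
for trials in (10, 100):
    print(trials, round(deflated_sharpe_ratio(daily_sr, 756, trials, 0.0005), 3))
```

More trials raise the luck benchmark. The same observed Sharpe therefore earns a lower DSR when it was the best of 100 configurations than when it was the best of 10.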
Frequently Asked Questions
- What is in-sample and out-of-sample testing in backtesting?
- In-sample (IS) data is the historical period you use to develop and optimise your strategy — choosing indicators, parameters, and filters. Out-of-sample (OOS) data is a separate period held back and never touched during development. Once the strategy is finalised, you run it on the OOS period exactly once. The OOS result is your only unbiased estimate of how the strategy will perform on data it has not seen.
- What is the correct in-sample to out-of-sample split ratio?
- A 70/30 IS/OOS split is the most common starting point. For strategies with many parameters or few trades, an 80/20 split gives more development data. The OOS period should always be the most recent data, never a random sample from the middle of the series, which would introduce look-ahead bias.
- Can you use the out-of-sample period more than once?
- No. The moment you use OOS results to make any decision about the strategy — adjusting a parameter, adding a filter, changing position sizing — it becomes in-sample data. You have 'peeked' at it and it is no longer a clean holdout. Reserve a second OOS period if you need to iterate, or use walk-forward analysis to produce many independent OOS windows.
- What is the difference between out-of-sample testing and walk-forward analysis?
- A single OOS test gives you one holdout result, which could be lucky or unlucky. Walk-forward analysis produces many independent OOS windows by rolling the optimisation window forward across the dataset. Multiple OOS windows give a much more reliable picture of generalisation than a single split.
- What if out-of-sample performance is much worse than in-sample?
- A large IS/OOS performance gap is the primary signal of overfitting. The strategy most likely learned the noise of the IS period rather than a genuine market pattern. Start over with fewer parameters, a simpler hypothesis, or a longer IS period, and reserve a fresh OOS period, because the original one has now been seen. Do not try to 'fix' the strategy by re-optimising against the OOS results; that turns the OOS period into in-sample data.
- How does out-of-sample testing relate to live trading performance?
- OOS performance is a more realistic estimate of live performance than IS performance, but it is still a simulation. Live trading introduces real slippage, partial fills, latency, and psychological pressure that no backtest fully captures. As a rough rule of thumb, expect live performance to come in 20–40% below OOS performance; the actual gap depends heavily on the strategy's turnover and execution costs.