MatFinProtocol Leaderboard – Financial AI Backtests

Reference	Asset	Initial balance	Final balance	% annualised profit	% max drawdown	Sharpe ratio	Backtest period	Total trades	Win rate %	Model / notes
Uygun et al. (2025). DOI: 10.1007/s00521-025-11586-8	FX & Crypto	100	170	35.70%	27.49%	0.76	104 weeks (Ending 2023)	Not given	54.95%	MTGNN (Multivariate Graph Neural Network with temporal convolutions)
Wangchailert et al. (2025). DOI:10.37936/ecti-cit.2025191.256994	8 pairs EUR/USD	Not given	+$10,999.00	$916 per year	Not given	Not given	2010–2022 (13 years)	28,525	51.3%	Candle pattern based trading
Papatsimpas et al. (2025). DOI: 10.1007/s10898-025-01505-5	EUR/USD	Not given	Not given	+2198.00 pips (Profit)	-95.90 pips	N/A	July-Oct 2022 (64 days)	36	86.11%	Hybrid EMD + Stacked LSTM + PSO aggregation optimization
López-Herrera et al. (2025). DOI: 10.1007/s44163-025-00424-4	USD vs EUR, CNY, JPY, AUD, CHF, MXN, ZAR, and TRY	Not given	Not given	31.1% (TRY/USD Max)	Near-Zero	3.38 (Median Bootstrap)	2018–2023	35 (Avg per period)	71.1%	Logistic Regression with Mean Absolute Directional Loss (MADL) optimization
Iswara et al. (2025). DOI: 10.1109/ISCT66099.2025.11297372	XAU/USD	Not given	(Net Profit: +$1.53)	Not given	$0.41	Not given	4 to 5 hours	26	57.69%	LSTM + Inductive Conformal Prediction (ICP) uncertainty filter
Zhang & Khushi (2020). DOI:10.48550/arXiv.2008.09471	EUR/USD	100,000	187,900	+13.4	‑19.7	1.68	2010‑01‑01 → 2019‑12‑31	2,305	61.0	SS Ratio‑optimised GA
Arabha et al. (2024). arXiv:2411.01456	EUR/USD	Not given	Not given	42.22% (DS2 OOS, 7 months)	Not given	0.47 (DS2 OOS)	DS1: Jan–Jul 2017; DS2: Jan–Jul 2023	Not given	Not given	PPO + Auxiliary Task (PPO+AXT); LSTM Actor-Critic; DRL with OHLC + auto-encoder features; qualitative cost mention only; no slippage stated; short OOS windows (7 months each)
Pillai et al. (2026). arXiv:2601.19504	S&P 500 (100 stocks)	$100,000	$235,492.83	53.46% CAGR	15.60%	1.68	Jan 2023 – Jan 2025 (2 years OOS)	Not given	61.5%	Hybrid XGBoost + FinBERT sentiment + EMA/MACD/RSI regime filter; Backtrader simulation; 70/30 train/test split; no transaction costs stated; long-only daily
Nyo et al. (2026). arXiv:2603.15848.	S&P 500 (equities)	$100,000	Not given	Not given (189.10% total return on validation 2018–2024)	23.50% (validation)	2.04 (validation); 2.07 (test)	Dev: 2000–2017; Validation: 2018–2024; held-out test set	Not given	38.50% (validation)	Enhanced momentum + FinBERT sentiment; EMA50/200, ATR14 trailing stop, top-10 cross-sectional momentum; strict 3-way train/val/test split; no transaction costs stated; student project paper – treat with caution
Saly-Kaufmann et al. (2026). arXiv:2603.01820	Multi-asset futures & FX (bonds, commodities, energy, equity indices, FX – 60+ instruments)	Not given	Not given	26.32% CAGR (VLSTM, best model)	22.90% (VLSTM)	2.40 (VLSTM, 2010–2025)	2010–2025 (15 years, rolling OOS)	Not given (turnover: ~967 ann.)	58.8% hit rate (VLSTM)	Large-scale benchmark of 16 DL architectures; Sharpe-ratio optimisation objective; rolling OOS; gross returns with breakeven cost analysis per asset; statistically significant vs. passive (HAC t=8.81); Oxford-Man Institute
Azevedo et al. (2024). SSRN:4702406	US equities (stock anomaly long-short)	Not given	Not given	Not given	Not given	0.84 net (LSTM, 1 hidden layer)	Not given (post-2000 implied)	Not given	Not given	LSTM anomaly-based long-short; net Sharpe 0.84 after all frictions; 57% cumulative performance reduction from full cost modelling; transaction costs, post-publication decay, post-decimalization all modelled
Buchanan & Benhamou (2026). arXiv:2603.14453	Top 30 S&P 500 stocks	Not given	Not given	Not given	Not given	1.10 (OOS 2019–2025; baseline 0.85)	In-sample: 2005–2018; OOS: 2019–2025	Not given	Not given	E-TRENDS LSTM with Sharpe-ratio training loss; 70/15/15 train/val/test split; 2–5 bps round-trip transaction costs modelled; OOS Sharpe +0.25 vs. baseline; no total return or MDD stated
Cohen, Aiche & Eichel (2025). DOI: 10.3390/e27060550	NASDAQ-100 stocks	Not given	Not given	24.99% avg annual return (best: technical-quarterly framework)	Not given	1.2967 (technical-quarterly, best model)	Jan 2020 – Jan 2025 (5 years, rolling-window OOS)	Not given	Not given	ChatGPT-4o semantic intelligence + ML (technical/fundamental/entropy frameworks); top-10 equal-weighted portfolio; cumulative return 573.37% (best); rolling-window OOS retraining; no transaction costs or MDD stated; peer-reviewed MDPI Entropy
Ghatak et al. (2025). arXiv:2509.16707	814 US equities (walk-forward production system)	Not given	Not given	26.38% cumulative (Jun 2021 – Jun 2025)	3.04%	2.54	Jun 2021 – Jun 2025 (4-year production walk-forward)	8,859	56.6%	Deep learning (feed-forward + recurrent), six-quarter rolling calibration; production system (not pure academic backtest); trading commissions/slippage not stated; very low MDD 3.04% notable; Zanista AI industry paper
Holzer et al. (2024). arXiv:2501.10709	Stock task: DJIA 30 stocks; Crypto task: BTC LOB data	Not given	Not given	63.37% cumulative (stock task, PPO best, Jan 2021 – Dec 2023)	Not given	1.55 (stock task, PPO); 0.28 (crypto task, ensemble)	Stock: Jan 2021 – Dec 2023; Crypto: Apr 7–19 2021 (very short LOB window)	Not given	Not given (win/loss ratio 1.62 crypto task)	PPO, SAC, DDPG ensemble; rolling 30-day training / 5-day test windows; Sortino (stock) 2.44; costs mentioned but no numeric value; crypto window 13 days only; ACM ICAIF competition paper
Hajdini et al. (2025). DOI: 10.7717/peerj-cs.3630	US equities: AAPL, MSFT, AMZN, BAC, NVDA	$1,000,000 per security	Not given	53.87% avg annualised return (LLM-MAS-DRL framework)	12.54% avg	1.702 avg (LLM-MAS-DRL)	Jul 2024 – Jun 2025 (OOS, 12 months)	25 avg per security	71.30% avg	Three-layer LLM multi-agent + DRL framework (market analysis, risk management, execution); 0.1% transaction cost per trade modelled; Sortino, Calmar, CVaR also reported; comparison vs. Buy&Hold and PPO/A3C baselines; peer-reviewed PeerJ Computer Science
Fan et al. (2025). DOI: 10.1145/3766918.3766922	Ethereum (ETH) – multi-factor quantitative model	Not given	Not given	97% annualised return (main result, threshold ±1.0)	22%	2.5 (threshold ±1.0); 2.2 at ±0.5; 2.3 at ±1.5	Q4 2024 (Oct–Jan 2025) bull test; Q1 2025 (Jan–Apr 2025) bear test; sensitivity range Oct 2024–Apr 2025	Not given (1.7 trades/week at ±1.0)	59%	ML-driven multi-factor model (RSI, MACD + on-chain gas/active address metrics); Information Ratio 1.2; avg holding period 4.5 days; IC mean 0.12; no transaction costs stated; note: 6-month backtest window is short; bull and bear sub-period tests reported separately
Huang et al. (2025). arXiv:2502.17493	US stocks (daily rebalancing, public stock data)	Not given	Not given	61.73% p.a. (2019–2024 test period)	Not given	1.18 (2019–2024)	Test 1: 2019–2024 (1,340 days); Test 2: 2005–2010 (1,360 days)	Not given (daily rebalancing)	Not given	Return-weighted loss function for deep learning stock selection; 37.61% p.a. on 2005–2010 OOS test (Sharpe 0.97); two disjoint test windows including bear-market 2005–2010; no transaction costs or MDD stated; arXiv preprint 2025
Nguyen (2026). arXiv:2602.11708	Crypto perpetual swaps (150+ pairs, Binance Futures; top 20 by market cap)	Not given	Not given	40.5% annualised (70/30 long-short)	12.70%	2.41	Jan 2022 – Dec 2024 (OOS, 36 months; in-sample Jan–Dec 2021)	~142 trades/month portfolio-wide	54.2% (bull regime); overall not stated	AdaptiveTrend: 6-hour momentum + dynamic trailing stop + Sharpe-based asset selection; 4 bps taker fee + slippage + funding modelled; Sharpe retained >2.0 at 8 bps; strict OOS separation; bootstrap significance tests
Zarattini, Pagani & Barbon (2025). SSRN:5209907	Crypto (Bitcoin + top-20 liquid altcoin rotation)	Not given	Not given	+10.8% annualised alpha vs. BTC	Not given	>1.5	2015 onwards (exact end date not stated)	Not given	Not given	Ensemble Donchian channel trend models (multi-lookback) + volatility position sizing; rotational portfolio; net-of-fees returns; transaction cost impact assessed; limited metric disclosure – backtest end date not stated
Jay & Berlanga (2024). SSRN:4987237	Cryptocurrency (BTC / crypto market)	Not given	Not given	Not given (annualised return reported per model)	Reported (model-specific)	Reported per model (DQN best profit; LSTM best consistency; RF best drawdown control)	Not explicitly stated	Not given	Not given	Benchmarks DQN, LSTM, RF agents on crypto TA indicators; full suite: Total Return, Ann. Return, Volatility, Sharpe, Sortino, MDD, Calmar; no transaction costs stated; no single extractable top-line; exact period not published
Sattarov & Choi (2024). DOI: 10.1038/s41598-024-51408-w	Bitcoin (BTC-USD)	Not given	Not given	29.93% annualised	Not given	2.74	Training: Oct 2014 – Oct 2018; Test: last 30 days (~720 hrs) within Oct 2018 – Mar 2019	Not given	Not given	M-DQN (Multi-level Deep Q-Network) + Twitter sentiment; 3-module architecture; 1.5% round-trip transaction fee explicitly modelled; caution: test window only 30 days; Nature Scientific Reports (peer-reviewed)
Nguyen et al. (2025). DOI: 10.1080/23322039.2025.2594873	Bitcoin (BTC-USD)	$1,000,000	$125,903,751.60	Not given (total return: +12,490.38% over Jan 2022 – May 2025)	Not given	Not given	Train: Jan 2012 – Dec 2021; Test (OOS): Jan 2022 – May 2025	1,240	Not given	DQN meta-strategy selector (chooses among RSI, SMA Crossover, Bollinger Bands, Momentum-20d, VWAP Reversion); ⚠ No Sharpe or MDD stated; no transaction costs stated; 12,490% return is unvalidated by risk metrics – credibility caution; peer-reviewed journal
Ni, Zhang & Fu (2025). arXiv:2412.18202	Cryptocurrency (BTC, ETH and major crypto assets)	Not given	Not given	Not given (outperforms buy-and-hold benchmark)	Not given	Not given	Not stated (dataset ends 2024)	Not given	Not given	Denoising autoencoder + CNN + GAN pipeline for crypto trading signal generation; outperforms buy-and-hold; Sharpe/return/drawdown numerics not stated in abstract – insufficient for main leaderboard row; listed for transparency
Al-Waked & Al-Zoubi (2025). DOI: 10.14569/IJACSA.2025.0161181	Gold (XAU/USD)	Not given	Not given (cumulative: 80.21%)	27.10% CAGR (PPO + Kalman filtering)	0.48%	12.10 (PPO + Kalman) ⚠	Jan 2017 – Jan 2025 (8 years, hourly data; N=47,304)	Not given	Not given	PPO + Kalman filter for noise-resilient DRL trading; Kalman reduces microstructure noise; raw PPO baseline: Sharpe 0.45, CAGR 3.46%; ⚠ Sharpe 12.10 is unusually high – likely driven by very low realised volatility in filtered series and no transaction costs stated; no train/test OOS split stated; IJACSA peer-reviewed
Fatouros et al. (2024). arXiv:2412.19245	US common stocks (news-based long-short portfolio)	Not given	Not given	Not given	Not given	3.05 (OPT long-short); 2.11 (BERT); 2.07 (FinBERT)	Aug 2021 – Jul 2023 (24 months OOS)	Not given	Not given	Sentiment trading with LLMs (OPT, BERT, FinBERT, Loughran-McDonald); daily long-short strategy on next-day stock returns using news sentiment; Loughran-McDonald baseline Sharpe 1.23; no transaction costs or MDD stated; Sharpe only metric extractable; arXiv preprint 2024
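
Where a row reports both balances, the return columns can be cross-checked against them. The sketch below is only an illustration and is not taken from any cited paper: the helper names `total_return` and `cagr` are hypothetical, and the period length in years is an assumption read off the Backtest period column. It recomputes the 53.46% CAGR in the Pillai et al. (2026) row and the +12,490.38% total return in the Nguyen et al. (2025) row from their stated initial and final balances.

```python
def total_return(initial: float, final: float) -> float:
    """Cumulative return over the whole backtest, as a fraction (1.0 == 100%)."""
    if initial <= 0:
        raise ValueError("initial balance must be positive")
    return final / initial - 1.0


def cagr(initial: float, final: float, years: float) -> float:
    """Compound annual growth rate, as a fraction (0.10 == 10%)."""
    if initial <= 0 or years <= 0:
        raise ValueError("initial balance and years must be positive")
    return (final / initial) ** (1.0 / years) - 1.0


# Pillai et al. (2026): $100,000 -> $235,492.83, Jan 2023 to Jan 2025.
# Assumption: the window is treated as exactly 2.0 years.
pillai_cagr = cagr(100_000, 235_492.83, 2.0)

# Nguyen et al. (2025): $1,000,000 -> $125,903,751.60 over the OOS test window.
nguyen_total = total_return(1_000_000, 125_903_751.60)

print(f"Pillai et al. CAGR:         {pillai_cagr:.2%}")
print(f"Nguyen et al. total return: {nguyen_total:.2%}")
```

Both figures agree with the table to two decimal places. Rows that report a return without balances, or that omit the period length, cannot be checked this way.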