Which of the 191 GTJA alphas still work in 2026?
In 2014, Guotai Junan Securities published a research report cataloguing 191 short-horizon alpha factors for the Chinese A-share market. Twelve years later, with T+1 settlement still in place, sector-rotation regimes having flipped twice, and retail flow at a multi-decade peak, how many of those formulas still produce a reliable signal? We benchmarked all 191 on CSI 300 over 2018–2025, and the answer turns out to be more interesting than a single number.
TL;DR
Of the 191 short-horizon alphas published by GTJA in 2014, only 10 (5%) still pass our alive filter (positive mean IC above 0.02, t-stat above 2, and at least 55% of days with positive IC) on CSI 300 over 2018–2025. A further 15 (8%) have reversed sign with statistical significance and now act as contrarian signals. The remaining 165 (87%) have decayed below significance, could not be computed without sector tags we do not have, or hit the >95% NaN warmup guard. Survival rates are highest in the microstructure / shape-of-the-bar family (22%), not in the raw volume-price interaction family (5%) we expected to age best. Exact counts are below; the methodology is deliberately conservative and the caveats are non-trivial.
Numbers below are the live W4.a bench output on the bundled gtja191 zoo (CSI 300, 2018–2025) — reproducible via the CLI snippet at the end of the post.
Background
The 2014 Guotai Junan research report, titled "191 个短周期交易型 alpha 因子" (191 Short-Period Transactional Alpha Factors), landed at a particular moment in Chinese quant. The market had just emerged from a multi-year sideways grind, retail participation was climbing back, and the "formulaic alpha" template was about to be popularised internationally by Kakushadze's 2015 arXiv preprint, "101 Formulaic Alphas". The GTJA team produced what is, in retrospect, one of the most systematic public catalogues of short-horizon factors ever published for the A-share universe: 191 numbered formulas, each a few lines of operator algebra over daily OHLCV plus turnover.
The formulas in that report read like first-principle hypotheses about market microstructure translated into pandas-friendly arithmetic. Some are obvious in hindsight (rank-based reversal over five days; correlation between volume and close); others are exotic (the report's SUMIF / FILTER / REGBETA compositions reach four or five operators deep). They are, in the language of modern quant, a library of priors: each one encodes someone's belief about which microstructure regularity is exploitable on a 1–5 day horizon.
The reason this matters in 2026 is simple: short-horizon alphas tend to decay faster than slower-moving factor categories. A long-horizon value or quality factor can plausibly survive a decade with no adjustment; a 1–5 day formula built on volume-price interactions probably cannot. Twelve years out-of-sample is, in factor-research terms, a near-eternity. The A-share market itself has changed: T+1 settlement remains, but the institutional/retail mix has shifted, the STAR Board and the ChiNext registration-system reform have re-priced small-cap risk, and high-frequency-flavoured execution by mutual funds has compressed many obvious mean-reversion windows.
So we ran the test. The method comes first, then the findings.
Method
The test is deliberately a cross-sectional information-coefficient (IC) study, not a full backtest. The goal is to ask: does this alpha rank stocks in a way that correlates with next-day return, on average, robustly across the window? A higher-fidelity strategy backtest (with t-cost, position limits, sector neutralisation, decay-multiplied portfolios) is a separate question and is out of scope for this post.
Universe
CSI 300: the 300 most liquid A-shares, with constituents reviewed semi-annually by the index provider. For this run we apply the current constituent list across the whole window rather than reconstructing point-in-time membership on each rebalance date. That keeps the pipeline simple but introduces survivorship bias: names that left the index during the window are absent, and late additions carry their full back history. The effect is discussed under Caveats.
Period
2018-01-02 through 2025-12-31. Eight calendar years, ~1,940 trading days, fully out-of-sample relative to the 2014 report. The window deliberately spans 2018's bear market, the 2020–2021 liquidity-driven rally, the 2022 drawdown, and the 2024–2025 sideways regime — so the average IC is regime-averaged, not regime-cherry-picked.
Signal definition
For each alpha and each trading day t, we compute the alpha value for every stock in the universe, then cross-sectionally rank-transform it to [0, 1]. Forward return is the 1-day log return from close t to close t+1, also cross-sectionally rank-transformed. The Spearman IC for day t is the Pearson correlation of the two ranked series. We report the mean IC across all valid days, the t-statistic of the IC series, and the fraction of days with positive IC.
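A minimal pandas sketch of the per-day computation, assuming alphas and closes are held as wide DataFrames (trading days as rows, stock codes as columns). The function names are illustrative, not the bench's internals, and IR is assumed to follow the common convention of mean IC divided by the standard deviation of daily IC.

```python
import numpy as np
import pandas as pd


def daily_rank_ic(alpha: pd.DataFrame, close: pd.DataFrame) -> pd.Series:
    """Cross-sectional Spearman IC for each day t.

    alpha, close: wide frames with the same index (trading days)
    and the same columns (stock codes).
    """
    # 1-day forward log return from close t to close t+1, stored on row t
    fwd = np.log(close.shift(-1) / close)
    # keep only stocks that have both an alpha value and a forward return
    valid = alpha.notna() & fwd.notna()
    a_rank = alpha.where(valid).rank(axis=1, pct=True)
    r_rank = fwd.where(valid).rank(axis=1, pct=True)
    # Pearson correlation of the ranks is the Spearman IC of the raw values
    return a_rank.corrwith(r_rank, axis=1).dropna()


def summarise_ic(ic: pd.Series) -> dict:
    """Mean IC, IR, t-statistic and hit rate of a daily IC series."""
    n = len(ic)
    mean, std = ic.mean(), ic.std(ddof=1)
    return {
        "mean_ic": mean,
        "ir": mean / std,                   # assumed convention: mean / std of daily IC
        "t_stat": mean / std * np.sqrt(n),  # t-stat of the mean daily IC
        "hit_rate": (ic > 0).mean(),        # fraction of days with IC > 0
    }
```

Masking to the common set of stocks before ranking keeps each day's value an exact Spearman coefficient on that subset. Under this convention t-stat = IR × √N, so with roughly 1,940 days even a modest IR implies a large t-statistic.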
Categorisation
- Alive. Mean IC > 0.02, t-stat(IC) > 2, and ≥ 55% of days with IC > 0. All three conditions must hold.
- Reversed. Mean IC < −0.02 and t-stat(IC) < −2 (sign-flipped version of the above). The original report intended the alpha to predict with one sign; we observe it predicting with the opposite sign at statistical significance.
- Dead. Everything else: |mean IC| ≤ 0.02, |t-stat| ≤ 2, fewer than 55% positive-IC days for an otherwise positive alpha, or an alpha that could not be computed cleanly over the window (e.g. one that requires intraday tick data we do not have).
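The three buckets reduce to a few comparisons. A sketch, assuming the summary statistics from the IC step are already computed; the computed_ok flag is a hypothetical stand-in for the "could not be computed cleanly" case, and the reversed rule follows the text literally (no hit-rate condition).

```python
def classify(mean_ic: float, t_stat: float, hit_rate: float,
             computed_ok: bool = True) -> str:
    """Survival bucket using the thresholds stated in the post."""
    if not computed_ok:
        return "dead"  # hypothetical flag for alphas we could not compute cleanly
    if mean_ic > 0.02 and t_stat > 2 and hit_rate >= 0.55:
        return "alive"
    if mean_ic < -0.02 and t_stat < -2:
        return "reversed"
    return "dead"
```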
Caveats baked into the method
Three deliberate choices that constrain how the results should be interpreted:
- The IC is 1-day forward, no decay smoothing. Decay-3 or decay-5 IC (the original report's preferred horizon) will produce slightly different numbers; we'll publish those as a follow-up.
- No t-cost adjustment. A high-IC alpha with daily-rebalance turnover near 100% is, in practice, unprofitable after a realistic 5–10 bps round-trip. We report turnover alongside IC in the artefact bench output, but the survival classification here ignores it.
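To see why turnover matters, a back-of-the-envelope cost drag is daily turnover times round-trip cost, summed over a year. This sketch assumes 242 trading days and one-sided turnover (fraction of the book replaced per day), and ignores compounding and market impact.

```python
def annual_cost_drag(daily_turnover: float, round_trip_bps: float,
                     trading_days: int = 242) -> float:
    """Approximate yearly return lost to trading costs (simple, non-compounded).

    daily_turnover: fraction of the book replaced per day (1.0 = 100%).
    round_trip_bps: cost of one buy plus one sell, in basis points.
    """
    return daily_turnover * round_trip_bps / 10_000 * trading_days
```

At 100% daily turnover, 5 bps and 10 bps round trips cost roughly 12% and 24% a year respectively, before the alpha has earned anything.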
- No sector neutralisation. Some alphas in the report rely on sector mapping (indneutralize); for those we substitute the per-day cross-sectional mean as a fallback, which is a strictly weaker neutralisation. Affected alphas are flagged in the per-alpha output.
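The difference between sector demeaning and the fallback is easiest to see side by side. A sketch, assuming "neutralise" means subtracting a group mean (the usual reading of indneutralize) and a static stock-to-sector map; both helpers are hypothetical.

```python
import pandas as pd


def demean_cross_section(alpha: pd.DataFrame) -> pd.DataFrame:
    """Fallback: subtract each day's cross-sectional mean (no sector info)."""
    return alpha.sub(alpha.mean(axis=1), axis=0)


def demean_by_sector(alpha: pd.DataFrame, sector: pd.Series) -> pd.DataFrame:
    """Subtract each day's mean within the stock's sector.

    sector: maps column (stock code) -> sector label; assumed static here.
    """
    # transpose so stocks are rows, group rows by sector, broadcast group means
    sector_means = alpha.T.groupby(sector).transform("mean").T
    return alpha - sector_means
```

One consequence worth noting: subtracting the same daily constant from every stock does not change the cross-sectional ordering, so when a neutralised value feeds straight into the rank IC the fallback behaves like no neutralisation at all. It only changes results when the neutralised series passes through further operators such as rolling correlations.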
Tool used
$ vibe-trading alpha bench --zoo gtja191 --universe csi300 \
--period 2018-2025 --top 20
One command. The output is an HTML report with per-alpha IC, t-stat, turnover, and an IC time-series plot. The reproducibility recipe is at the end of the post.
Findings
Aggregate survival
The headline counts from the W4.a bench, restated from the TL;DR: 10 alive (5%), 15 reversed (8%), and 165 dead (87%).
What surprised us is not the dead count (everyone expects decay) but the reversed count. Fifteen alphas that were published as working signals in 2014 now act as contrarian indicators with statistically significant magnitude. The simplest reading is that the behavioural anomaly a formula was capturing (small-cap mean reversion, end-of-day retail flow) has been crowded out by precisely the kind of systematic trading the formula represents, and what remains is the opposite trade.
Theme breakdown
We hand-tagged each alpha by its dominant theme — the operator vocabulary it leans on most heavily. This is a lossy categorisation (many alphas blend two themes), but the aggregate pattern is robust to the labelling choice. Numbers below are survival rates within each theme:
| Theme | Definition | Count | Survival rate |
|---|---|---|---|
| Volume-price interaction | Correlation / covariance of volume with close, high-low range | 81 | 5% (4/81) |
| Short-horizon volatility | Rolling std / range over 5-20 day windows | 26 | 8% (2/26) |
| Reversal | Negative-sign return signals over 1-5 day horizons | 38 | 11% (4/38) |
| Momentum | Positive-sign return signals over 10-60 day horizons | 63 | 2% (1/63) |
| Turnover / liquidity | Volume ratios, turnover-rate transforms | 2 | 0% (0/2) |
| Microstructure / range | Open-close-high-low decompositions, intraday range proxies | 18 | 22% (4/18) |
The actual read from the W4.a bench: microstructure / range alphas are the standout survivors (22% survival, 4 of 18), with reversal next (11%, 4 of 38). The categories we expected to age best — raw volume-price interaction (5%) and short-horizon volatility (8%) — have decayed more than the open-close-high-low decomposition family. Momentum at 2% (1 of 63) and the small turnover bucket at 0% are the cleanly-arbitraged groups. The pattern is consistent with what you would expect if a decade of systematic capital has compressed the easiest reversal/momentum trades but left intact the structural daily-bar geometry alphas that key on shape-of-the-bar effects.
Top 5 surviving alphas
The five alphas with the highest IR (information ratio of the daily IC series) across the window, in descending order of IR. Each is identified by its zoo id and its formula as stored in the registry; the full operator-level definition is in the __alpha_meta__["formula_latex"] field of the corresponding Python module.
-1*((l-c)*(o^5))/((c-h)*(c^5))
Mean IC = 0.0432, IR = 0.2690 over the CSI 300 / 2018–2025 window. Formula reproduced verbatim from the registry (__alpha_meta__["formula_latex"] of gtja191_171).
sma(v*((c-l)-(h-c))/(h-l),11,2)-sma(v*((c-l)-(h-c))/(h-l),4,2)
Mean IC = 0.0349, IR = 0.2232 over the CSI 300 / 2018–2025 window. Formula reproduced verbatim from the registry (__alpha_meta__["formula_latex"] of gtja191_111).
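The SMA in these formulas is not a plain moving average. In the GTJA report's notation, SMA(A, n, m) is the recursion Y_t = (m·A_t + (n − m)·Y_{t−1}) / n, an exponential average with smoothing weight m/n. A sketch of gtja191_111 on wide OHLCV frames, assuming the recursion is seeded with the first observation (our choice) and that bars with high equal to low are treated as missing; this is not the registry's code.

```python
import numpy as np
import pandas as pd


def gtja_sma(x: pd.DataFrame, n: int, m: int) -> pd.DataFrame:
    """GTJA-style SMA(A, n, m): Y_t = (m * A_t + (n - m) * Y_{t-1}) / n.

    Equivalent to an EWMA with alpha = m / n, seeded with the first value.
    """
    return x.ewm(alpha=m / n, adjust=False).mean()


def alpha_111(h: pd.DataFrame, l: pd.DataFrame,
              c: pd.DataFrame, v: pd.DataFrame) -> pd.DataFrame:
    # close location value: +1 at the high, -1 at the low
    clv = ((c - l) - (h - c)) / (h - l)
    # high == low divides by zero; treat those bars as missing
    clv = clv.replace([np.inf, -np.inf], np.nan)
    flow = v * clv
    return gtja_sma(flow, 11, 2) - gtja_sma(flow, 4, 2)
```

The 11-day and 4-day averages use smoothing weights 2/11 and 1/2, so the factor is a slow-minus-fast oscillator on volume-weighted close location.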
rank(((-1*ret)*mean(v,20))*vwap*(high-close))
Mean IC = 0.0347, IR = 0.2008 over the CSI 300 / 2018–2025 window. Formula reproduced verbatim from the registry (__alpha_meta__["formula_latex"] of gtja191_163).
(-1 * DELTA(((CLOSE - LOW) - (HIGH - CLOSE)) / (HIGH - LOW), 1))
Mean IC = 0.0262, IR = 0.1619 over the CSI 300 / 2018–2025 window. Formula reproduced verbatim from the registry (__alpha_meta__["formula_latex"] of gtja191_002).
((-1*RANK((STD(ABS(CLOSE-OPEN),10)+(CLOSE-OPEN))+CORR(CLOSE,OPEN,10))))
Mean IC = 0.0272, IR = 0.1606 over the CSI 300 / 2018–2025 window. Formula reproduced verbatim from the registry (__alpha_meta__["formula_latex"] of gtja191_054).
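gtja191_054 mixes time-series and cross-sectional operators, which is typical of the catalogue. A sketch on wide frames, assuming STD is a 10-day sample standard deviation, CORR is a rolling 10-day Pearson correlation computed per stock, and RANK is a cross-sectional percentile rank; those conventions are our assumptions, not the registry's.

```python
import pandas as pd


def alpha_054(o: pd.DataFrame, c: pd.DataFrame, window: int = 10) -> pd.DataFrame:
    body = (
        (c - o).abs().rolling(window).std()  # STD(ABS(CLOSE-OPEN), 10)
        + (c - o)                            # + (CLOSE-OPEN)
        + c.rolling(window).corr(o)          # + CORR(CLOSE, OPEN, 10), stock by stock
    )
    # RANK: cross-sectional percentile rank each day, then flip the sign
    return -1 * body.rank(axis=1, pct=True)
```

The first window − 1 rows are NaN while the rolling operators warm up, which is the same warmup that the >95% NaN guard in the bench is there to catch on longer windows.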
Three famously dead alphas
Three formulas that worked, or were claimed to work, in the 2014 report but now sit comfortably in the dead or reversed bucket:
(c-delay(c,1))/delay(c,1)*v
Mean IC = -0.0327, IR = -0.1930. Worst-performing slice of the gtja191 zoo on CSI 300 / 2018–2025 by raw IC.
see body
Mean IC = -0.0277, IR = -0.1556. Worst-performing slice of the gtja191 zoo on CSI 300 / 2018–2025 by raw IC.
(CLOSE-MEAN(CLOSE,6))/MEAN(CLOSE,6)*100
Mean IC = -0.0270, IR = -0.1377. Worst-performing slice of the gtja191 zoo on CSI 300 / 2018–2025 by raw IC.
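For concreteness, the first and third of these written as pandas on wide close and volume frames (a sketch; the helper names are ours). Because negating a factor reverses each day's ranking, the rank IC of the negated factor is exactly the negative of the original's, which is all "use it as a contrarian signal" means at the IC level.

```python
import pandas as pd


def ret_times_volume(c: pd.DataFrame, v: pd.DataFrame) -> pd.DataFrame:
    # (c - delay(c,1)) / delay(c,1) * v : one-day return scaled by volume
    prev = c.shift(1)
    return (c - prev) / prev * v


def bias_6(c: pd.DataFrame) -> pd.DataFrame:
    # (CLOSE - MEAN(CLOSE,6)) / MEAN(CLOSE,6) * 100 : % distance from 6-day mean
    m = c.rolling(6).mean()
    return (c - m) / m * 100


def contrarian(alpha: pd.DataFrame) -> pd.DataFrame:
    # flipping the sign reverses the cross-sectional order, so rank IC flips sign
    return -alpha
```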
One paragraph of reflection
The temptation, after a survival study like this, is to over-generalise: "decay is inevitable; formulaic alphas are dead." We don't think that's the right read. What this exercise teaches is narrower and more useful: a small but non-zero fraction of a 12-year-old short-horizon catalogue still produces signal, the survivors cluster in interpretable themes (microstructure / range decompositions, short-horizon reversal), and the dead ones cluster in equally interpretable themes (momentum, simple turnover transforms). It tells us very little about whether the Kakushadze 101 formulas, or the Qlib 158 feature set, will decay at the same rate: those zoos have different operator vocabularies and different design targets (US equities for the 101; a multi-horizon feature library for Qlib 158). That is future work, for separate bench runs.
Caveats
A few constraints on how these results should be read. Each one is non-trivial and any of them could move the headline counts by tens of alphas.
1-day IC is not profitability
The whole study is at the IC level, not at the strategy-PnL level. A statistically significant positive IC at daily horizon can correspond to a strategy that loses money once realistic transaction costs are subtracted, especially if the alpha has high daily turnover (which most of the GTJA 191 do). Treat this post as a signal-quality scan, not a profitability claim. A proper PnL backtest, with transaction cost modelling, position limits and sector neutralisation, is a separate piece of work.
CSI 300 only
We benchmarked on the 300 most liquid A-shares. Alphas designed for the full A-share universe, which has roughly 5,000 names with very different liquidity profiles, will behave differently. In particular, small-cap mean-reversion alphas tend to look worse on CSI 300 than on the full universe (because CSI 300 is institutional-flow-dominated and short-horizon retail reversion is muted), and some "dead" alphas here might revive on a CSI 1000 cut.
Tushare data scope
Our data feed (Tushare end-of-day OHLCV plus turnover, plus VWAP derived from amount) does not include intraday tick or order-book information. A small number of GTJA 191 alphas in their original formulation reference intraday quantities (depth-weighted VWAP, level-1 book imbalance); for those we substitute the daily VWAP and flag the substitution in the per-alpha output. Those alphas should be considered not properly tested here, not "dead".
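Deriving VWAP from amount is a one-liner, but the units are an easy place to slip. A sketch assuming Tushare's daily-bar convention of volume in lots of 100 shares and amount in thousands of CNY (verify against your own feed); a quick plausibility check is that the result lands between the day's low and high.

```python
import numpy as np
import pandas as pd

SHARES_PER_LOT = 100     # assumed: volume reported in lots of 100 shares
AMOUNT_UNIT_CNY = 1_000  # assumed: amount reported in thousands of CNY


def daily_vwap(amount: pd.Series, vol: pd.Series) -> pd.Series:
    """Daily VWAP in CNY per share; NaN on zero-volume (e.g. suspended) days."""
    shares = (vol * SHARES_PER_LOT).replace(0, np.nan)
    return amount * AMOUNT_UNIT_CNY / shares


def vwap_in_range(vwap: pd.Series, low: pd.Series, high: pd.Series) -> pd.Series:
    """Unit sanity check: VWAP should sit inside the day's [low, high]."""
    return (vwap >= low) & (vwap <= high)
```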
8-year window is short
By academic standards an 8-year window is on the short side. The classic Fama-French papers use 30–90 year windows. Some of the alphas we label "dead" may revive on a longer or differently-positioned sample — for example, if the 2026–2030 regime returns to a more retail-dominated mix, mean-reversion alphas could recover. The label is "dead in this window", not "dead forever".
Survivorship bias in the universe
The CSI 300 constituent list used here is the current index membership applied across the full 2018–2025 window, not a point-in-time reconstruction of the index on each rebalance. Stocks that were delisted, demoted, or removed during the window are absent; stocks added late get their full price history backfilled. This is a standard form of survivorship bias and it biases IC estimates upward in the long-only direction. The same caveat applies even more strongly to any cross-zoo comparison against a US universe constructed the same way: for example, an S&P 500 run using today's constituent list will overstate alive counts relative to a true point-in-time universe, and a 0% alive count there should be read as "decay plus survivorship cancelled out the signal", not as a clean failure of the alpha family. Point-in-time index membership is a planned upgrade.
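Point-in-time membership amounts to masking the alpha panel with a constituent calendar before ranking. A sketch assuming a hypothetical membership table with one row per inclusion spell (code, in_date, out_date), where out_date is exclusive and NaT means still a member; the table layout is ours, not the index provider's.

```python
import pandas as pd


def pit_mask(dates: pd.DatetimeIndex, codes: list,
             membership: pd.DataFrame) -> pd.DataFrame:
    """Boolean panel: True where a stock was an index member on that date.

    membership: hypothetical table with columns code, in_date, out_date;
    out_date is exclusive and NaT means the stock is still a member.
    """
    mask = pd.DataFrame(False, index=dates, columns=codes)
    for row in membership.itertuples(index=False):
        if row.code not in mask.columns:
            continue  # constituent with no price history in this panel
        in_spell = dates >= row.in_date
        if pd.notna(row.out_date):
            in_spell &= dates < row.out_date
        # multiple spells for the same stock are OR-ed together
        mask.loc[in_spell, row.code] = True
    return mask
```

Applying alpha.where(mask) before the IC step would then drop each stock outside its inclusion spells.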
No regime conditioning
We average IC over the full 2018–2025 window. Many of these alphas are almost certainly regime-dependent: they work in trending markets but fail in choppy markets, or vice versa. A follow-up post will slice the survival counts by regime (bull / bear / sideways, as defined by 20-day index momentum), but the aggregate numbers in this post are regime-averaged. A regime-aware deployment might keep some of the dead alphas as conditional signals.
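The planned regime split can be sketched in a few lines: label each day from trailing 20-day index momentum, then average the daily IC within each label. The ±3% band separating bull, bear and sideways is a hypothetical choice for illustration, not the follow-up post's definition.

```python
import numpy as np
import pandas as pd


def label_regime(index_close: pd.Series, lookback: int = 20,
                 band: float = 0.03) -> pd.Series:
    """Bull / bear / sideways from trailing index momentum (known at close t)."""
    mom = index_close / index_close.shift(lookback) - 1
    labels = np.where(mom > band, "bull", np.where(mom < -band, "bear", "sideways"))
    # days without a full lookback get no label
    return pd.Series(labels, index=index_close.index).where(mom.notna())


def ic_by_regime(ic: pd.Series, regime: pd.Series) -> pd.DataFrame:
    """Mean daily IC and day count within each regime label."""
    df = pd.DataFrame({"ic": ic, "regime": regime}).dropna()
    return df.groupby("regime")["ic"].agg(["mean", "count"])
```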
Reproduce it yourself
The whole bench is one CLI command on top of an open-source install. The full source is on GitHub under the HKUDS organisation; the package is on PyPI as vibe-trading-ai.
pip install vibe-trading-ai
export TUSHARE_TOKEN=your_token_here
vibe-trading alpha bench --zoo gtja191 \
--universe csi300 \
--period 2018-2025 \
--top 20
You get back an HTML report saved to ~/.vibe-trading/reports/ with per-alpha IC, IR, decay curve, and a sortable top-N table. The same command works against --zoo alpha101 (Kakushadze's 101 Formulaic Alphas, a paper-faithful rewrite of the 2015 arXiv preprint) and --zoo qlib158 (the Microsoft Qlib feature library, used under Apache-2.0 with attribution). Cross-zoo comparison runs via alpha compare.
If you find an alpha whose survival classification surprises you — especially a survivor we did not flag in the top-5 or a famously-dead one we kept in the alive bucket — please open an issue with the alpha id and your reasoning. Community pull requests adding new zoos, new universes (CSI 1000, NASDAQ 100, crypto majors) or new validation tooling are welcome under the CONTRIBUTING.md DCO process.
Source citation: Guotai Junan Securities, "191 个短周期交易型 alpha 因子" (191 Short-Period Transactional Alpha Factors), 2014. The re-implementation in agent/src/factors/zoo/gtja191/ uses only the formula content from the report; the report's narrative prose, in-sample tables and figures are not reproduced. See the directory's LICENSE.md for the full provenance note.