Praxis: An AI trading evaluation workspace
OVERVIEW
Praxis is a personal project built on top of Freqtrade, an open-source quant-trading framework used for strategy backtesting and paper trading. Praxis adds a front-end interface layer that consolidates and speeds up the strategy-evaluation process.
In practice, AI can generate many strategy variants at high speed, sharply lowering the cost of generation. The downstream understanding and quality judgment, however, still have to be done by hand, one strategy at a time: reading complex backtest data, comparing how different strategies behave in the market, and assessing whether a strategy has the potential to move to the next stage.
As the number of strategies grows, the cognitive load and time cost of evaluation grow with it.
Praxis consolidates these scattered data outputs into one unified workspace, so strategies can be understood, compared, and validated quickly, improving decision efficiency before a strategy goes into a dry run (paper trading).
- Role: Product Designer
- Time: 2 weeks
- Team: Solo project
- Impact:
  - Evaluation cycle: ~3 days to ~1.5 days
  - Exploration density: ~2×
  - A single source of truth
DISCOVER
AI sent strategy generation soaring, but the evaluation that follows became the bottleneck.
This project comes from a pain point I felt firsthand while developing quant trading strategies day to day.
With AI in the loop, writing code and generating strategies became extremely efficient: large numbers of variants can be produced in very little time.
Yet the review workflow after generation did not change at all. Every strategy variant still takes time to read, understand, and evaluate for stability, one at a time.
As the pile of strategies waiting for review grows, reading and filtering them becomes a serious bottleneck.
Core issue
The bottleneck in the process sits in reading and validating strategies, not in the earlier code generation.
The existing workflow leans heavily on several fragmented, disconnected sources:
- backtest logs in the terminal
- standalone JSON strategy-parameter config files
- strategy source code scattered across different versions
In the past, all of this scattered information had to be assembled by hand just to make a single complete strategy decision.
DEFINE
Defining the core problem of the evaluation layer
Goal
Build a system dedicated to supporting strategy understanding and validation, so the large volume of AI-generated strategies can be compared and filtered within a short cycle.
The generative potential of AI is only unlocked if strategies can be read and stress-tested for robustness just as quickly as they are produced, which keeps the whole strategy-development cycle agile.
Constraints and boundaries
- The system has to run seamlessly inside the local Freqtrade environment.
- Because LLMs cannot reliably hold long context, the mapping between strategies and their backtest results has to be persisted at the interface layer.
- Information transparency and traceability are the foundation of the design.
- With only a two-week timeline, the feature scope had to be tightly narrowed to an MVP (minimum viable product).
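The persistence constraint above can be illustrated with a small sketch. This is not Praxis's actual implementation: the `RunRecord` fields, the JSON index file, and the content-hash scheme are all assumptions made for illustration. The idea is to key each backtest result to a hash of the exact strategy source that produced it, so the link is stored on disk rather than remembered in an LLM conversation.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class RunRecord:
    """Hypothetical record linking one strategy version to one backtest result."""
    strategy_name: str
    source_sha256: str  # hash of the exact source that was backtested
    result_path: str    # where that run's backtest output lives


def source_hash(source_code: str) -> str:
    """Fingerprint a strategy by the content of its source code."""
    return hashlib.sha256(source_code.encode("utf-8")).hexdigest()


def load_index(index_file: Path) -> List[RunRecord]:
    """Read all stored records; an absent index file means no runs yet."""
    if not index_file.exists():
        return []
    return [RunRecord(**row) for row in json.loads(index_file.read_text())]


def record_run(index_file: Path, name: str, source_code: str,
               result_path: str) -> RunRecord:
    """Append one strategy-to-result link to the on-disk index."""
    records = load_index(index_file)
    record = RunRecord(name, source_hash(source_code), result_path)
    records.append(record)
    index_file.write_text(json.dumps([asdict(r) for r in records], indent=2))
    return record


def runs_for_source(index_file: Path, source_code: str) -> List[RunRecord]:
    """Find every backtest produced by exactly this version of the source."""
    wanted = source_hash(source_code)
    return [r for r in load_index(index_file) if r.source_sha256 == wanted]
```

Hashing the source rather than trusting a file name means two variants that share a class name but differ by one parameter are still kept apart, which supports the traceability goal in the same list.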


DEVELOP
When filtering strategies, which pieces of information actually determine the quality of the decision?
Color system and accessibility
The interface design starts from a design-token color system with three levels: a global canvas color, trading-semantic colors, and a version-specific color for each strategy. Every level is contrast-checked in both dark and light mode, so readability stays consistent when switching between them.
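The contrast checks described above could be automated along these lines. This sketch assumes the check follows the WCAG 2.x contrast-ratio definition with the 4.5:1 threshold for normal text; the source does not say which standard was used. The token names and hex values are invented placeholders, not the real Praxis palette.

```python
from typing import Dict, List


def _linear(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#RRGGBB' color, from 0 (black) to 1 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio between two colors, from 1:1 up to 21:1."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def failing_tokens(theme: Dict[str, str], minimum: float = 4.5) -> List[str]:
    """Names of tokens whose contrast against the theme's canvas is too low."""
    canvas = theme["canvas"]
    return [
        name for name, color in theme.items()
        if name != "canvas" and contrast_ratio(color, canvas) < minimum
    ]


# Hypothetical token sets, one per mode (placeholder values only).
DARK = {"canvas": "#0B0E11", "profit": "#2EBD85", "loss": "#F6465D"}
LIGHT = {"canvas": "#FFFFFF", "profit": "#0B7A4B", "loss": "#C62839"}

for mode, theme in (("dark", DARK), ("light", LIGHT)):
    print(mode, "failing tokens:", failing_tokens(theme))
```

Running the same check over both token sets is what keeps the two modes consistent: a color that reads well on the dark canvas is not assumed to pass on the light one.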

1. A centralized evaluation workspace
All the scattered strategy outputs are pulled into one workspace. The system automatically parses and consolidates the fragmented data, so the user can read and compare across versions seamlessly within a single window.
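As a rough illustration of that consolidation step, the sketch below merges three per-version inputs (source code, parameters, backtest metrics) into one record per strategy version and reports what is still missing. The `StrategyVersion` type, the version ids, and the metric names are hypothetical; how Praxis actually parses Freqtrade's output is not described in this case study.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StrategyVersion:
    """Hypothetical unified record for one strategy version."""
    version_id: str
    source_code: Optional[str] = None
    params: Optional[dict] = None
    metrics: Optional[dict] = None

    @property
    def missing(self) -> List[str]:
        """Which of the three sources has not been found for this version."""
        fields = ("source_code", "params", "metrics")
        return [name for name in fields if getattr(self, name) is None]


def consolidate(
    sources: Dict[str, str],
    params: Dict[str, dict],
    metrics: Dict[str, dict],
) -> Dict[str, StrategyVersion]:
    """Join the three inputs on version id, keeping partial records visible."""
    merged = {}
    for vid in sorted(set(sources) | set(params) | set(metrics)):
        merged[vid] = StrategyVersion(
            version_id=vid,
            source_code=sources.get(vid),
            params=params.get(vid),
            metrics=metrics.get(vid),
        )
    return merged


# Example with made-up data: v2 has code but was never backtested.
workspace = consolidate(
    sources={"v1": "class V1: ...", "v2": "class V2: ..."},
    params={"v1": {"rsi_period": 14}},
    metrics={"v1": {"profit_pct": 3.2}},
)
for version in workspace.values():
    print(version.version_id, "missing:", version.missing)
```

Keeping incomplete versions in the result, instead of dropping them, makes gaps visible in the workspace rather than leaving them to be discovered by hand.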

2. Three core views focused on evaluation
Following the review funnel of quant strategies, the system is split into three core interfaces:
- Comparison View: for comparing how different strategy versions behave, side by side.
- Strategy Dashboard: for an overall view of every strategy currently running.
- Robustness Analysis: for digging deep into a single strategy's resilience to risk and overfitting.
Trade-off: a fixed three-view structure was chosen on purpose over a highly customizable layout, to keep the barrier to understanding and getting started as low as possible.
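The side-by-side idea behind the Comparison View can be sketched in a few lines. This is an illustrative text rendering only, not the actual interface; the version names and metric keys are placeholders. Each metric becomes a row and each strategy version a fixed-width column, so values line up vertically when shown in a monospace font.

```python
from typing import Dict, List

LABEL_WIDTH = 20   # width of the metric-name column
VALUE_WIDTH = 12   # width of each strategy-version column


def comparison_table(versions: Dict[str, Dict[str, float]],
                     metrics: List[str]) -> str:
    """Render one row per metric and one right-aligned column per version."""
    header = f"{'metric':<{LABEL_WIDTH}}" + "".join(
        f"{name:>{VALUE_WIDTH}}" for name in versions
    )
    rows = [header]
    for metric in metrics:
        cells = "".join(
            # A metric that a version lacks is shown as 'nan', not hidden.
            f"{versions[name].get(metric, float('nan')):>{VALUE_WIDTH}.2f}"
            for name in versions
        )
        rows.append(f"{metric:<{LABEL_WIDTH}}{cells}")
    return "\n".join(rows)


# Made-up numbers for two versions of one strategy.
print(comparison_table(
    {
        "v1": {"profit_pct": 3.25, "max_drawdown_pct": 12.0},
        "v2": {"profit_pct": -1.5},
    },
    ["profit_pct", "max_drawdown_pct"],
))
```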
3. A professional interface that fits the quant-trading context
The interface style takes its cues from professional trading terminals like Bloomberg and TradingView:
- a high information-density layout
- a strict visual hierarchy
- monospace typography throughout for numbers, so the data stays aligned and readable
Trade-off: drawing on a visual language users already know from the industry lowers the learning cost for newcomers and helps build trust in the tool's data.
4. Deliberately excluding the FreqAI module
FreqAI is the module in the Freqtrade framework used to run machine-learning predictions.
But machine-learning output is black-box by nature and very hard to trace or explain clearly. That runs directly against this evaluation system's core emphasis on traceability and reasonable interpretability, so it was firmly cut from the MVP.

5. Narrowing the scope of custom strategies
Some highly customized strategies embed extremely complex nested logic and parameters, well beyond the scope of a two-week MVP. So the system supports standardized Freqtrade strategy structures first, focusing on validating how smoothly the core evaluation flow runs.
Trade-off: within a limited timeline, deliver a usable, end-to-end working system first, rather than chasing full coverage of every edge case.
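One way such a scope boundary could be enforced is a static check that runs before a strategy enters the workspace. The sketch below assumes that a "standardized structure" means a class deriving directly from Freqtrade's `IStrategy` and defining `populate_indicators`, `populate_entry_trend`, and `populate_exit_trend` itself. That definition, and the `unsupported_reasons` helper, are assumptions for illustration; the case study does not specify the actual rule.

```python
import ast
from typing import List

# Methods a current-interface Freqtrade strategy normally defines.
REQUIRED_METHODS = {
    "populate_indicators",
    "populate_entry_trend",
    "populate_exit_trend",
}


def _base_name(node: ast.expr) -> str:
    """Name of a base class written as `IStrategy` or `module.IStrategy`."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return node.attr
    return ""


def unsupported_reasons(source_code: str) -> List[str]:
    """Return why a strategy file is out of scope; empty list means supported.

    The source is only parsed, never imported or executed.
    """
    try:
        tree = ast.parse(source_code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    strategies = [
        node for node in tree.body
        if isinstance(node, ast.ClassDef)
        and any(_base_name(base) == "IStrategy" for base in node.bases)
    ]
    if not strategies:
        return ["no class deriving from IStrategy"]

    reasons = []
    for cls in strategies:
        defined = {n.name for n in cls.body if isinstance(n, ast.FunctionDef)}
        for name in sorted(REQUIRED_METHODS - defined):
            reasons.append(f"{cls.name}: missing {name}")
    return reasons
```

A strategy that inherits these methods from a custom base class would be flagged by this check. That is a deliberate simplification in the same spirit as the trade-off above: it is stricter than necessary, but it is cheap and predictable.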

DELIVER
Freeing effort from data wrangling and turning it toward real strategy decisions.
Praxis ships as a local toolkit that slots cleanly into the existing Freqtrade development flow, automating the work of organizing strategy outputs and backtest results.
That lets a developer's focus shift from the low-value work of manually assembling data to the genuinely high-value work of deeply optimizing and judging strategies.

Quantified results
- 50% faster decisions: The combined evaluation and QA time for a single strategy dropped sharply from about 3 days to 1.5.
- 2× strategy exploration density: Within the same time and opportunity cost, the number of strategies that can be validated and filtered in parallel roughly doubled.
- 1 single source of truth: A single, consistent strategy-review dashboard, ending the old mess of cross-referencing across files and logs.
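The first two figures above are consistent with each other, which a short calculation makes explicit. It assumes evaluation time per strategy is the only limiting factor, a simplification that is not stated in the measurements themselves.

```python
def time_reduction(before_days: float, after_days: float) -> float:
    """Fraction of per-strategy evaluation time saved."""
    return 1 - after_days / before_days


def throughput_gain(before_days: float, after_days: float) -> float:
    """How many times more strategies fit into the same time budget."""
    return before_days / after_days


print(time_reduction(3, 1.5))   # 0.5, i.e. 50% faster
print(throughput_gain(3, 1.5))  # 2.0, i.e. about twice the exploration density
```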

REFLECTION
Once AI compresses the cost of code generation toward zero, the real human work and core advantage shift quickly to understanding, quality-checking, judgment, and decision-making.
For this evaluation layer in an AI-assisted trading workflow, two directions are worth digging into next:
- Higher-resolution time-series data handling: improving how the underlying time-series data is parsed, to support finer-grained, tick-by-tick analysis of micro-level behavior.
- Low-code tooling for building strategies: turning the strong factors found in evaluation into tooling, with a more intuitive visual node-based editor for assembling strategies and tuning hyperparameters.
The product opportunity ahead is making the back-end work of robustness filtering as intuitive, smooth, and near-zero-cost as the front-end AI generation already is.