
LLMs As Research Assistants, Not Traders

The right way to use a frontier model in trading work is as a fast, well-read junior analyst — not as a principal making calls. The framing changes which prompts you write, which outputs you trust, and where the model's value actually lives.

Arthur, Founder, Tradoki
Published Feb 04, 2026 · 9 min read

Every working knowledge profession in 2026 has had to figure out the same question. What is the right way to use a frontier language model when the work involves judgment, money, and consequences? The accountants worked it out. The lawyers worked it out, mostly. The doctors are still arguing. Trading is somewhere in the middle of that distribution, and the answer the field is converging on is the one that the more thoughtful firms in adjacent industries reached first: treat the model as a fast, well-read, occasionally wrong junior — not as a principal. The mental shift from "AI as oracle" to "AI as assistant" is the difference between using LLMs productively and being used by them.

The frame that fixes everything

The single most useful sentence I have read about working with LLMs in any professional context is from a hedge-fund engineer who put it like this: imagine the model is a smart graduate hire who has read everything ever written, has perfect recall, has zero domain experience, and will hand you a confident document in five seconds when asked. What do you give a person like that to do?

The answer is: bounded research tasks where the cost of being wrong is small, where verification is cheap, and where you remain the person making decisions. The model is exactly that person, the one you would not let trade your book without supervision. Once you internalise this framing, almost every prompt-design question answers itself.

The opposite framing — model as oracle, model as principal, model as autonomous decision-maker — is what produces both the over-promised AI signal services and the rage-quit traders who tried to "ask Claude what to buy" and concluded the whole thing was theatre. Both are downstream of the wrong frame.

Tasks suitable for LLM delegation (research/process): many
Tasks suitable for LLM decision-making (live trading): zero
Effort worth spending on workflow design vs model selection (rough ratio): 10:1

What a junior analyst actually does

To use the framing properly it helps to be specific about what a junior analyst actually does on a real desk. Six things, roughly.

They summarise. A senior analyst does not read the full earnings transcript on first pass. The junior reads it and produces a one-page summary with the three things that moved consensus.

They structure unstructured input. Hand them a stack of notes from a sector tour and they produce a tagged spreadsheet you can query.

They draft. First draft of the memo, first draft of the backtest code, first draft of the position-sizing model. The senior edits.

They surface. They flag patterns, anomalies, things worth a closer look. They do not decide whether the things matter.

They check facts. Cross-referencing, reconciling sources, finding the original of a quote. Tedious, useful, low-judgment work.

They learn the team's playbook. Within months the junior knows how the team thinks well enough to draft in the team's voice and pre-empt the team's questions.

A frontier model can do all six of these now, with caveats. Notice what is missing. The junior does not pick what to invest in. The junior does not size. The junior does not make the call. That is preserved for the principals. For exactly the same reasons, that is what should be preserved for you when working with a model.

The seven workflows that genuinely work

Concretely: the workflows below are ones I run myself or have seen working traders run productively. None of them are about prediction.

Workflow one: earnings and central-bank summarisation. Paste the transcript or release. Ask for a structured summary against a fixed template — guidance change, capex commentary, regional softness, reaction to the prior quarter's commentary. Verify against the source for anything that informs a position. Cost: minutes. Value: hours of grunt reading saved.
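The fixed template is what makes this workflow compound: if every summary answers the same questions, outputs are comparable quarter over quarter. A minimal sketch of what "fixed template" means in practice; the field names and the `build_prompt` helper are my own illustration, not a Tradoki tool:

```python
# Illustrative fixed template for earnings-call summarisation.
# The fields are invented for this example -- the point is that
# the schema never changes between quarters, and every claim must
# carry a verbatim quote so it can be checked against the source.

SUMMARY_FIELDS = [
    "guidance_change",         # raised / lowered / reaffirmed, with quote
    "capex_commentary",        # any change in spending plans
    "regional_softness",       # geographies called out as weak
    "prior_quarter_followup",  # response to last quarter's concerns
]

def build_prompt(transcript: str) -> str:
    """Assemble a summarisation prompt with a fixed output schema."""
    field_list = "\n".join(f"- {f}" for f in SUMMARY_FIELDS)
    return (
        "Summarise the transcript below against this exact template. "
        "For each field, quote the supporting passage verbatim so the "
        "claim can be verified against the source.\n\n"
        f"Fields:\n{field_list}\n\n"
        f"Transcript:\n{transcript}"
    )

prompt = build_prompt("(paste transcript here)")
```

The verbatim-quote requirement is doing the real work: it turns the verification step from "re-read the transcript" into "search for the quoted passage."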

Workflow two: code drafting for backtests. Describe the strategy in plain English. Ask for the backtest code in your language of choice. Run it. Check the output against a hand-calculated expected value for a known case. Iterate. The model's value here is not insight; it is typing speed.
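The "check against a hand-calculated case" step deserves a concrete picture. Below is a toy momentum backtest (my own example, chosen to be small enough to verify by hand, not a real strategy): with prices [100, 110, 105, 115], the rule trades exactly one day, so the expected return is 105/110 - 1, which you can confirm on paper before trusting any model-drafted version of the same logic.

```python
def backtest_momentum(prices):
    """Toy rule: go long for one period whenever yesterday's close
    was above the close before it; otherwise stay flat.
    Returns the total compounded return of the rule."""
    total = 1.0
    for i in range(2, len(prices)):
        if prices[i - 1] > prices[i - 2]:      # yesterday was an up day
            total *= prices[i] / prices[i - 1]  # hold through today
    return total - 1.0

# Hand-checkable case: only day index 2 is traded (110 > 100),
# so the result must equal 105/110 - 1.
result = backtest_momentum([100, 110, 105, 115])
```

The discipline generalises: never accept model-drafted backtest code until it reproduces a case you computed yourself.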

Workflow three: journal structuring. Paste an unstructured trade note. Ask the model to extract setup name, instrument, R, rationale, and emotional state into a fixed schema. Append to the database. The model is excellent at this, you save twenty minutes a day, and your dataset becomes queryable in ways it never would have been if you relied on yourself to fill the fields.

Workflow four: bias auditing. Paste your last fifty journal entries. Ask the model to identify recurring patterns in your behaviour — instruments you over-trade after losses, setups you skip on Mondays, language you use when tilted. The model has no skin in your ego. The output is uncomfortable and useful.

Workflow five: edge-case generation for strategies. Describe your strategy. Ask the model what conditions would break it. The list will contain seven obvious things and three you would not have thought of. The three are why the workflow is worth running.

Workflow six: documentation drafting. Strategy playbooks, risk-framework documents, post-mortem templates. Boring writing the model is excellent at producing.

Workflow seven: education and mental-model construction. Stuck on a concept? The model will explain it five different ways patiently. This is, for me, the most underrated use of LLMs in trading. Learning is bandwidth-limited by patience-of-explainer, and the model has infinite patience.

Writing Pine Script with Claude and GPT and What AI can and cannot do in trading in 2026 cover the code-drafting and capability-mapping pieces of this in more depth.

Where to draw the bright line

There is one rule I hold without exceptions. The model does not make decisions about my money. It can summarise the inputs to a decision. It can structure the artefacts around a decision. It can draft the document in which the decision is communicated. It does not make the decision.

This sounds obvious until you watch how easily the line erodes in practice. You ask the model for a strategy hypothesis. It returns one. You ask it which is most promising. It tells you. You ask it whether it would size up. It answers. You are now letting it size your book by indirection. The slope is gentle and the path is well-greased. The discipline is to notice when the question you are about to ask the model is a decision question, and to not ask it of the model.

A useful reflex: before asking the model anything, classify the request as "research" or "decision." If it is research, proceed. If it is decision, do the work yourself. Some questions are ambiguous, and the discipline is to default ambiguous to the decision side.
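The reflex can even be made mechanical. The sketch below is illustrative only (the marker lists are my own and deliberately incomplete); the structural point is the default: anything that matches a decision marker, or matches nothing at all, lands on the decision side.

```python
# Illustrative triage of a prompt into "research" vs "decision".
# Marker lists are invented examples, not an exhaustive taxonomy;
# the important property is that ambiguity defaults to "decision".

DECISION_MARKERS = ("should i", "would you", "which is best",
                    "size", "buy", "sell", "enter", "exit")

RESEARCH_VERBS = ("summarise", "summarize", "extract", "draft",
                  "explain", "list", "compare", "structure")

def triage(question: str) -> str:
    """Classify a prompt; ambiguous questions default to 'decision'."""
    q = question.lower()
    if any(m in q for m in DECISION_MARKERS):
        return "decision"
    if any(v in q for v in RESEARCH_VERBS):
        return "research"
    return "decision"  # ambiguous -> you answer it yourself
```

A real version of this would live in your own head, not in code; writing it down once just makes the default explicit.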

"Every time I have lost money to AI in trading, the loss traces back to a moment where I asked the model a question I should have answered myself. The model answered confidently. I trusted the confidence. Neither of us knew anything about the actual market in that moment."

Tradoki internal AI workflow review, 2026

Verification, the part everyone hates

Verification is the unglamorous part of the workflow that separates a research assistant from a corruption vector.

The rule: every claim that informs a decision is traceable to a source you can open. The model's summary is a starting point. The original transcript, paper, or filing is what you cite in the artefact. If you cannot find the source, the claim does not appear in the artefact. This is annoying. It is the price of using a confident-sounding system that occasionally fabricates.

In practice this means a workflow with three layers. The model produces a draft with citations. You spot-check citations on the highest-stakes claims. You promote the verified version into your research artefact. Time cost: maybe twenty percent of total workflow time. Reliability cost of skipping it: eventually catastrophic.
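The three layers can be sketched as a data-shape, which makes the promotion rule unambiguous. The `Claim` type and `promote` helper below are my own illustration of the workflow described above, not an existing tool:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    """One factual claim in a model-drafted research note."""
    text: str             # the claim as the model stated it
    source: str           # the citation the model attached
    verified: bool = False  # set True only after you opened the source

def promote(draft: list) -> list:
    """Layer three: only claims whose citation you actually opened
    and confirmed make it into the research artefact. Unverified
    claims are dropped, not flagged -- they simply do not appear."""
    return [c for c in draft if c.verified]
```

The drop-don't-flag choice is deliberate: a flagged-but-present claim still anchors your thinking, which is exactly the corruption vector the workflow exists to close.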

There is a temptation to skip verification on small tasks. Resist it on tasks that feed downstream decisions. A wrong fact in a transcript summary that informs a position is worse than a wrong fact in a meme. Build the verification reflex when stakes are low so it is automatic when stakes are high.

What changes if model capability jumps

A reasonable question is whether this framing survives if frontier models continue to improve at the rate they have been. My best guess: yes, mostly, with the exceptions narrowing.

The summarisation workflows will get better and faster. The code drafting will get to the point where the model produces working backtest infrastructure on first attempt for most well-defined strategies. The bias-auditing workflow will get more sophisticated as context windows allow models to ingest years of journal data.

What will not change is the bright line on decisions. Not because the model will not be capable of producing decision-quality outputs at some point, but because the structural reasons for the line — your money, your risk, your responsibility, no path to liability for the model — do not change with capability. Even if the model becomes a brilliant trader, the relationship is wrong if you are delegating your book to it without supervision.

The future of retail trading 2026 to 2030 gets into the longer arc. The short version: more model capability changes which research workflows are possible, not whether the analyst-not-principal framing remains correct.

A short list of practical habits

Five small things I do that have made the workflow more useful over time.

Habit one. Keep a personal prompt library. The five-line preamble you have refined over months that gets the model into the right mode is worth more than swapping models.

Habit two. Reuse the model on a few tasks deeply rather than spreading thinly across many. Compounding familiarity with how the model fails on your specific workflows beats novelty.

Habit three. Write your own conclusions before reading the model's. Confirmation bias from the model's draft is real and pernicious. If you write first, you can compare honestly.

Habit four. Periodically delete the prompt library and rebuild from scratch. Old prompts encode obsolete model behaviours.

Habit five. Keep a "where the model was wrong" file. Not for shame — for calibration. Looking back at the file once a month tunes your trust dials in ways that nothing else does.

The trading journal and post-mortem template is the artefact where the model is most useful in process work. Why we built Tradoki, not a signals room explains why we are aggressive about not crossing the analyst-to-principal line in the products we build.

FAQ

What does it mean to use an LLM as a research assistant?
It means treating the model as a junior on your team. You give it well-defined tasks, you check its outputs, you keep authority over decisions, and you reuse the same model for the parts of work where it is reliably good. You do not delegate judgment.
Why not let the LLM make trading calls?
Because it produces confident wrong answers at a non-zero rate, has no real-time market context, no risk model that resembles yours, and no consequences for being wrong. None of those are fixable in the next twenty-four months. Delegating decisions to it is malpractice on yourself.
What kinds of research tasks work well?
Earnings transcript summarisation, central bank statement comparison, code drafting for backtests, structured note-taking, hypothesis generation, edge-case enumeration, and bias auditing of your own journal entries. All of these are bounded tasks where verification is cheap and hallucination cost is low.
How do I avoid the model's hallucinations corrupting my research?
Two rules. One: every fact the model cites in a research artefact must be traceable to a source you can open. Two: you do the synthesis. The model summarises, surfaces, structures. You decide what it means. The boundary is hard.
Does the model choice matter?
Less than people think for most research tasks. Frontier models are within a small margin of each other on the workflows that matter. Workflow design — prompts, file context, verification loops — matters far more than which model you pick.