The PM's Guide to LLMs: What You Actually Need to Know
You don't need to understand backpropagation to use AI tools effectively. But PMs who understand a few key things about how large language models (LLMs) work make better decisions about when to trust the output, when to be sceptical, and how to design around the failure modes.
This guide explains LLM fundamentals at the level of abstraction that's actually useful for a product manager — no maths, no architecture diagrams, just the mental models that matter for your work.
What an LLM Actually Is
At the most practical level, an LLM is a very sophisticated autocomplete. It predicts the most likely next token (roughly a word or word fragment) given everything that came before it. Do this millions of times in sequence and you get a paragraph, a PRD, or a code snippet.
That framing is reductive but useful. It explains a lot of LLM behaviour that otherwise seems mysterious:
- Why LLMs generate confidently wrong answers: The model is optimising for the most likely next token, not for truth. "The sky is green because..." produces a confident continuation because the training data contains many confident statements — regardless of whether the content is accurate.
- Why context matters so much: The more relevant context the model has, the better its prediction of what comes next. A prompt with specific customer evidence gives the model better signals than a vague prompt with no evidence.
- Why "hallucination" is a feature not a bug: The model was trained to be fluent and coherent. Saying "I don't know" is a less likely continuation than generating a plausible-sounding answer. Hallucination is coherent completion of a gap the model can't fill from its training data.
The Context Window
One of the most important practical constraints to understand is the context window — the amount of text a model can "see" at once when generating a response.
Modern models have large context windows (100,000+ tokens for many models), but context doesn't decay uniformly. Research shows that models attend better to content at the beginning and end of the context than to content buried in the middle. If you're pasting 50 pages of research notes, don't bury your key question at the end.
For PMs using AI to synthesise research:
- Put your instructions and key question first
- Paste the research
- Repeat the key question at the end
This "sandwich" pattern improves output quality for long-context tasks.
Why Models Don't Know "Recent" Things
LLMs are trained on data up to a cutoff date. After that, they have no knowledge of what happened — no new product launches, no updated pricing, no recent research findings. When you ask an LLM about a competitor's current features, you're getting information from whenever its training data ended.
For PM work this matters in a few places:
- Competitive analysis: verify claims about competitor products from primary sources
- Industry benchmarks: training data numbers may be from 2-3 years ago
- Product-specific knowledge: the model knows nothing about your product unless you tell it
The solution is either to use a model with web access (Perplexity, ChatGPT with browsing, Claude with tools) or to explicitly provide current information in your prompt.
Temperature and Determinism
Most AI tools let you influence how "creative" vs "deterministic" the model's output is. This is controlled by a parameter called temperature.
- Low temperature (more deterministic): The model more reliably picks the highest-probability token. Output is consistent and predictable but potentially repetitive.
- High temperature (more creative): The model samples more broadly from possible continuations. Output is more varied but less predictable.
For product documentation, you generally want lower temperature — you want the model to be precise and consistent, not creative. Most PM-focused AI tools set this appropriately; if you're using a raw API, keep temperature at 0.1–0.3 for requirements and spec work.
Few-Shot Learning
One of the most powerful prompting techniques you should know about is few-shot learning — giving the model examples of what you want before asking it to generate new instances.
Instead of: "Write user stories for our export feature."
Try: "Here are two user stories that match our format. Write three more in the same style for our export feature.
Example 1: [story]
Example 2: [story]"
The model learns the format, vocabulary, and level of specificity from your examples without needing a lengthy format specification. This is particularly powerful for teams that have an established house style for product documents.
Retrieval-Augmented Generation (RAG)
Many enterprise AI tools now use a pattern called RAG — the model retrieves relevant chunks from a knowledge base and uses them as context when generating a response. This is what tools like PMRead use when they surface relevant customer quotes during PRD generation.
Understanding RAG helps you understand why some AI product tools are more accurate than general-purpose models:
- The model isn't guessing what your customers said — it's been given the actual quotes and asked to synthesise from them
- The output is grounded in your specific evidence base, not generic training data
- When the tool cites a specific customer quote, that citation is real — the model retrieved that text and used it
If you're evaluating AI tools for product work, "does this tool use RAG against your actual customer data?" is a meaningful differentiating question. The answer correlates with how accurate and specific the output will be.
Failure Modes You'll Encounter
Confident fabrication
The model generates a specific statistic, quote, or technical detail that sounds real but doesn't exist. Common in:
- Competitive analysis (invented product features)
- Market sizing (invented market research figures)
- Technical requirements (invented API constraints)
Detection: If a specific claim is suspiciously convenient or precise, verify it from a primary source.
Sycophancy
Models are trained with human feedback and tend to agree with what the user seems to believe. If you write "I think the main user problem is X, right?" the model is more likely to agree than to challenge you.
Detection: Ask the model to steelman the opposing view or to identify weaknesses in your argument before asking it to validate it.
Middle-of-distribution thinking
Models are trained on the full distribution of content on the internet, which means their outputs tend toward what's common and expected. For novel problems, product categories, or user segments that are underrepresented in training data, the model defaults to familiar patterns.
Detection: When working on genuinely novel product challenges, treat AI output as a baseline to depart from, not a best-practice recommendation to follow.
Context drift in long conversations
In long chat sessions, models can lose track of constraints and context specified early in the conversation. A constraint you established in message 3 may not be honoured in message 20.
Detection: Re-specify key constraints when generating important output later in a long session.
What This Means for Building AI-Adjacent Products
If you're a PM working on products that incorporate AI features, this foundational understanding has direct product implications:
Design for failure gracefully. LLMs will sometimes produce wrong, unhelpful, or off-topic output. Your product needs a clear UX for what happens when this occurs — not an assumption that the model is always right.
Build traceability in. When AI makes a claim or generates content that users will act on, they need a way to verify it. Citation, source links, and confidence indicators are UX patterns, not just nice-to-haves.
Manage expectation calibration. Users who trust AI too much get burned by hallucinations; users who trust it too little won't use it. Product UX should help users develop accurate intuitions about when to trust the output.
Think about latency. Inference takes time, especially for long context. Product flows that depend on AI generation need to account for 2–10 second generation times, which are long in UX terms and require thoughtful loading states.
You don't need to be an ML engineer to make good product decisions about AI. But understanding what an LLM is, what it can't do, and how to prompt it well gives you a significant edge — both as an AI user and as a PM building AI-adjacent products.
Want PRDs grounded in your actual customer data?
PMRead ingests your interviews, Slack threads, and feedback files — and generates PRDs backed by real evidence, not guesses.
Try PMRead free →