seren_ — research statement
Where research becomes a decision.
A formal model for evidence-based research workspaces.
Seren Research · 2025
Abstract
Research reasoning is mostly invisible. Hypotheses evolve without record. Evidence accumulates without structure. Decisions get made in meetings and vanish. When a supervisor, collaborator, or regulator asks why a researcher chose a direction six months ago, the answer has to be reconstructed from memory — imperfect, incomplete, and unprovable.
We propose a formal workspace model that makes the structure of research reasoning explicit, checkable, and durable. The model comprises three components: a hypothesis tree that maps the structure of thinking and colours it by evidence strength; an evidence foundation that turns every paper into a checkable proposition keyed to a specific hypothesis; and a decision log that preserves every significant choice with a frozen snapshot of the evidence state at that moment.
We define this model mathematically — as sets, relations, derived functions, and invariants — and describe its implementation in Seren, a research workspace built specifically for biomedical and life science research. The model supports continuous literature monitoring, shareable decision snapshots, pivot-aware evidence re-evaluation, and a Research Terminal that answers questions about any hypothesis using current literature searched fresh at the moment of the question.
The result is a single source of truth for what a researcher believes, what the literature says about it, and why they decided as they did — at any point in time.
1. Introduction
Research thinking is structured thinking. Hypotheses have sub-questions. Sub-questions rest on assumptions. Assumptions require mechanisms. Mechanisms need experimental validation. At every level, claims are either supported, contested, or unresolved — and the balance of that support should determine what a researcher decides to do next.
In practice, this structure is rarely made explicit. Most researchers hold it partly in their heads, partly in Notion, partly in Zotero, and partly in email threads and lab meeting notes. When the structure is invisible, three things go wrong predictably.
First, decisions lose their reasoning. A researcher makes a consequential choice — which vector to pursue, which compound series to fund, which approach to commit to — and six months later cannot precisely reconstruct why. The reasoning was real at the time. It is gone now.
Second, evidence is not organised against the question. A library of 200 papers is not an evidence foundation. An evidence foundation maps each paper's specific claim against a specific hypothesis and classifies it: does this paper support the hypothesis, contradict it, qualify it, or say nothing about it? Without that mapping, the researcher cannot know whether their hypothesis is well-supported or merely well-surrounded by papers about the same general topic.
Third, gaps are invisible until someone else finds them. A paper published three months ago contradicts the central claim. A trial just reported results that challenge the approach. A retraction notice was issued for a key citation. The researcher did not know. The reviewer did.
Seren addresses all three by making the structure of reasoning explicit in a formal workspace model. This document defines that model precisely, so that the design is unambiguous and extensible.
2. The Workspace Model
2.1 Primitive Sets
Let:
- U — the set of users.
- W — the set of workspaces (projects). Each workspace w ∈ W has a unique owner u ∈ U and a confirmed research question with a structured frame: comparison, outcome, model system, and scope.
- H — the set of hypothesis nodes. Each node h ∈ H has a label (a proposition or question), belongs to exactly one workspace, and has an optional parent node in the same workspace. Nodes may be of several types: root hypothesis, child hypothesis, assumption, mechanism, experimental plan, research proposal, literature map, or decision node. Each type has a distinct evidence and interaction model.
- D — the set of documents (papers, preprints, PDFs). Each d ∈ D has metadata (title, authors, journal, year, identifiers), belongs to a workspace, and carries an ingested abstract and, where available, full text retrieved via open access.
- C — the set of claims extracted from documents. Each claim c ∈ C is associated with a document and represents the specific proposition that document makes relevant to a hypothesis node.
- L — the set of evidence links. A link ℓ ∈ L associates a document (and optionally a specific claim) to a hypothesis node with a support status. The same document can be linked to multiple nodes with different support statuses, because a paper's relevance depends on which hypothesis it is being evaluated against.
- A — the set of assumptions. Each assumption a ∈ A is associated with a hypothesis node, has a text statement, a strength classification (high, medium, low), and an optional link to a supporting document.
- M — the set of mechanisms. Each mechanism m ∈ M is associated with a hypothesis node and has a text statement with an optional supporting document link.
- Δ — the set of decisions. Each decision δ ∈ Δ is associated with a hypothesis node and records the decision text, reasoning, confidence level, key uncertainty accepted, and an evidence snapshot taken at the moment of logging.
- Π — the set of pivot events. Each pivot π ∈ Π records a significant change to a hypothesis label: old label, new label, reason, and timestamp. Pivots are a distinct entry type in the decision log, preserving the intellectual history of direction changes.
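The primitive sets above can be sketched as record types. A minimal sketch in Python; the class and field names are illustrative, not Seren's actual schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Support(str, Enum):
    """Support status sigma of an evidence link (Section 2.2)."""
    SUPPORTS = "supports"
    CONTRADICTS = "contradicts"
    QUALIFIES = "qualifies"
    UNRELATED = "unrelated"

@dataclass
class HypothesisNode:
    """An element of H: belongs to one workspace, has an optional parent."""
    id: str
    workspace_id: str
    label: str
    parent_id: Optional[str] = None   # None plays the role of the root marker
    archived: bool = False

@dataclass
class EvidenceLink:
    """An element of L: one node, one document, one support status."""
    node_id: str
    document_id: str
    status: Support
    claim_id: Optional[str] = None    # optional specific claim c in C
    pre_pivot: bool = False           # flagged for re-evaluation after a pivot

@dataclass
class Decision:
    """An element of Delta: the tuple of Section 2.6."""
    node_id: str
    decision_text: str
    reasoning: str
    confidence: str
    key_uncertainty: str
    evidence_snapshot: tuple          # frozen at logging time; never mutated
    timestamp: float
```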
2.2 Relations and Functions
- parent : H → H ∪ {⊥} — the parent of a hypothesis node; parent(h) = ⊥ marks a root. The parent relation is restricted so that (H, parent) forms a rooted forest per workspace — one or more trees, no cycles.
- workspace : H → W — each hypothesis node belongs to exactly one workspace.
- σ : L → {supports, contradicts, qualifies, unrelated} — the support status of an evidence link: how the linked document bears on the hypothesis it is attached to. This classification is hypothesis-specific — the same document may support one hypothesis and qualify another.
- node : L → H, doc : L → D — each link ℓ connects one hypothesis node to one document. The induced node–document relation is many-to-many: a document can be linked to multiple nodes, and a node has many links.
- strength : H → {green, amber, red, grey} — the evidence-strength function maps each hypothesis node to one of four labels derived from its link counts (see Section 2.4).
2.3 Tree Invariant
For each workspace w ∈ W, the subgraph of hypothesis nodes in w under the parent relation forms a rooted forest: every node has at most one parent; roots have parent = ⊥; and there are no cycles.
Formally: restricted to {h ∈ H : workspace(h) = w}, the parent relation defines a directed acyclic graph whose weakly connected components are trees, each rooted at a node with parent(h) = ⊥.
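A minimal check of this invariant for a single workspace, assuming the parent relation is given as a mapping from node id to parent id (with None standing in for ⊥):

```python
from typing import Dict, Optional

def is_rooted_forest(parent: Dict[str, Optional[str]]) -> bool:
    """Check the tree invariant of Section 2.3 for one workspace:
    every node has at most one parent (guaranteed by the mapping),
    every parent is a node of the same workspace, and following
    parent pointers from any node never cycles."""
    for start in parent:
        seen = set()
        node = start
        while node is not None:
            if node in seen:
                return False          # cycle detected
            seen.add(node)
            p = parent.get(node)
            if p is not None and p not in parent:
                return False          # parent outside the workspace
            node = p
    return True
```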
2.4 Evidence Strength
For each hypothesis node h, define link counts by support type:
n_supports(h) = |{ℓ ∈ L : node(ℓ) = h ∧ σ(ℓ) = supports}|
n_contradicts(h) = |{ℓ ∈ L : node(ℓ) = h ∧ σ(ℓ) = contradicts}|
n_qualifies(h) = |{ℓ ∈ L : node(ℓ) = h ∧ σ(ℓ) = qualifies}|
The evidence strength of h is derived from these counts by a rule set (where both the red and amber conditions hold, red takes precedence):
- Green (established): n_supports ≥ 3 and n_contradicts = 0
- Amber (contested): n_contradicts ≥ 1, or n_supports ∈ {1, 2} with no contradictions
- Red (contradicted): n_contradicts > n_supports and n_contradicts ≥ 2
- Grey (insufficient): n_supports = 0 and n_contradicts = 0
This function colours every node in the hypothesis tree and drives the uncertainty partition.
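The rule set above can be written directly as a function. A sketch, with the red rule checked before the amber rule since its condition is the more specific (note that the qualifying count does not enter the colour rules as stated):

```python
def evidence_strength(n_supports: int, n_contradicts: int) -> str:
    """Derive the four-colour evidence strength of Section 2.4
    from a node's link counts."""
    if n_supports == 0 and n_contradicts == 0:
        return "grey"                 # insufficient: no evidence at all
    if n_contradicts > n_supports and n_contradicts >= 2:
        return "red"                  # contradicted
    if n_supports >= 3 and n_contradicts == 0:
        return "green"                # established
    return "amber"                    # contested: mixed or thin evidence
```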
2.5 The Uncertainty Partition
For a given workspace, the uncertainty map partitions the set of active (non-archived) hypothesis nodes into three subsets:
- Known (K_w) — nodes whose evidence strength is green. The literature supports the hypothesis with multiple concordant papers and no significant contradiction.
- Contested (T_w) — nodes whose evidence strength is amber or red. The evidence is mixed, the hypothesis is contradicted, or the picture is unclear. These are the nodes that require a decision or additional evidence before the researcher can move forward with confidence.
- Unknown (U_w) — nodes with grey evidence strength. No evidence has been found or added. These may represent genuinely novel territory or hypotheses not yet investigated.
Formally: H_w = K_w ∪ T_w ∪ U_w where H_w = {h ∈ H : workspace(h) = w ∧ archived(h) = false}, and the three subsets are mutually exclusive and exhaustive.
The uncertainty map is recomputed whenever evidence is added or removed, a hypothesis is pivoted or archived, or a decision is logged.
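The partition follows mechanically from the colours. A sketch, assuming evidence strength has already been computed for each active node:

```python
def uncertainty_partition(strengths: dict) -> tuple:
    """Partition active nodes into Known / Contested / Unknown
    (Section 2.5). `strengths` maps node id to one of
    'green', 'amber', 'red', 'grey'."""
    known = {h for h, s in strengths.items() if s == "green"}
    contested = {h for h, s in strengths.items() if s in ("amber", "red")}
    unknown = {h for h, s in strengths.items() if s == "grey"}
    # The three subsets are mutually exclusive and exhaustive.
    assert known | contested | unknown == set(strengths)
    return known, contested, unknown
```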
2.6 Decisions and Snapshots
A decision δ ∈ Δ is a tuple:
(nodeId, decisionText, reasoning, confidenceLevel, keyUncertaintyAccepted, evidenceSnapshot, timestamp)
The evidence snapshot is a frozen view of the evidence state at decision time: which links existed, their support statuses, the full list of supporting and contradicting papers, and the core claims extracted from each. The snapshot does not change when new evidence is added later. It preserves exactly what the researcher knew when they decided.
This satisfies the core invariant of the decision log: at time t, researcher u decided X on hypothesis h, accepting uncertainty Y, and the evidence state was Z. That record is permanent, timestamped, and exportable.
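The freezing behaviour is the essential property of the snapshot. A sketch in which the snapshot is a deep copy of the live link state, so later edits cannot reach it (field names follow the tuple above; the link representation is illustrative):

```python
import copy
import time

def log_decision(node_id, decision_text, reasoning, confidence,
                 key_uncertainty, current_links):
    """Record a decision with a frozen evidence snapshot (Section 2.6).
    The snapshot is a deep copy: later edits to the live link list
    do not alter what was recorded at decision time."""
    return {
        "nodeId": node_id,
        "decisionText": decision_text,
        "reasoning": reasoning,
        "confidenceLevel": confidence,
        "keyUncertaintyAccepted": key_uncertainty,
        "evidenceSnapshot": copy.deepcopy(current_links),
        "timestamp": time.time(),
    }
```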
A pivot event π ∈ Π is a tuple:
(nodeId, oldLabel, newLabel, reason, timestamp)
Pivots preserve history: when a hypothesis label changes significantly, existing evidence links remain attached and are flagged as pre-pivot. The evidence strength function and uncertainty partition are recomputed with the new label. The researcher can re-evaluate each pre-pivot card against the revised hypothesis.
3. The Research Terminal
The formal model above defines the structure of a workspace. The Research Terminal defines how a researcher interrogates that structure against the current literature.
The terminal is not a general AI assistant. It is a question-and-answer interface that receives every question with the full workspace context — all hypotheses, all evidence links, all assumptions, all mechanisms, the experimental plan, the research proposal, and the full decision log — and searches PubMed and Semantic Scholar fresh before generating any response.
This means two things. First, every answer is grounded in current published literature, not in the model's training data, which may be months or years out of date. Second, every answer is specific to the researcher's exact intellectual situation — not to the general topic, but to this hypothesis, this evidence state, this assumption, this gap in the uncertainty map.
The terminal operates with three invariants that distinguish it from general AI tools:
- Invariant 1 — Search before answer. PubMed and Semantic Scholar are queried on every question. The terminal does not answer from memory alone.
- Invariant 2 — Cite every empirical claim. No factual assertion about what research shows is made without a citation to a paper retrieved from the search. The terminal does not fabricate authors or findings.
- Invariant 3 — Name what the researcher missed. Every substantive response contains at least one finding the researcher did not ask about but needs to know — a finding that changes the picture of their hypothesis or their experimental design. This is the standard that makes the terminal irreplaceable for a researcher who already knows their field well.
When the terminal's external search returns nothing — a situation that occurs for very specific or novel research questions — it synthesises from the workspace papers the researcher has already added, is explicit about what it is drawing from, and identifies the absence of published literature as itself a finding: this question may be genuinely unanswered.
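The search-before-answer invariant and the empty-result fallback can be sketched as pure orchestration, with the search clients injected as callables (the return shape here is illustrative, not Seren's API):

```python
def answer_question(question, workspace_papers, search_fns):
    """Sketch of Invariant 1 and the empty-result fallback (Section 3).
    `search_fns` are injected search callables, e.g. wrappers around
    PubMed and Semantic Scholar clients; each returns a list of papers
    relevant to the question."""
    retrieved = []
    for search in search_fns:         # Invariant 1: search on every question
        retrieved.extend(search(question))
    if retrieved:
        return {"grounding": "fresh-search", "papers": retrieved}
    # Fallback: synthesise from papers already in the workspace and flag
    # the absence of external results as itself a finding.
    return {
        "grounding": "workspace-only",
        "papers": list(workspace_papers),
        "note": "no external results: this question may be genuinely unanswered",
    }
```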
4. The Writing Review
A researcher's writing is the primary artifact of their work. Every claim in a literature review, every methodological choice in a methods section, every conclusion in a discussion — these are checkable propositions that either rest on solid evidence or do not.
The writing review feature takes a section of the researcher's actual writing and evaluates every empirical claim against the evidence in their workspace and the current literature. Claims that are unsupported, overstated, or contradicted by papers in the researcher's own foundation are highlighted. The specific evidence that makes each claim weak is shown alongside it.
This is not a grammar checker. It is not a writing assistant. It is a pre-reviewer — the first rigorous read of the researcher's writing that happens before any human sees it.
The formal basis: each sentence in the writing that makes an empirical claim is mapped against the evidence foundation. For each mapped claim, the system computes whether the claim is supported by the researcher's linked evidence (σ = supports), contradicted (σ = contradicts), qualified in a way that limits the claim (σ = qualifies), or not yet in the foundation (unlinked). Claims in the fourth category trigger a literature search to determine whether the claim can be grounded or should be qualified.
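The four-way classification can be sketched per claim. The input shape (the list of support statuses of links mapped to the claim) and the ordering choice, contradiction outranking qualification, are assumptions of this sketch:

```python
def review_claim(link_statuses):
    """Classify one empirical claim against the evidence foundation
    (Section 4). An empty list means the claim is not yet in the
    foundation and should trigger a literature search."""
    if not link_statuses:
        return "unlinked"             # category four: search the literature
    if "contradicts" in link_statuses:
        return "contradicted"         # highlighted, with the opposing evidence
    if "qualifies" in link_statuses:
        return "qualified"            # the claim must be narrowed
    return "supported"
```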
5. Monitoring
Literature monitoring runs continuously over active workspace nodes. For each hypothesis node, the monitoring pipeline:
- Constructs a search query from the node's hypothesis label, its existing evidence links, and the key terms from its assumptions and mechanisms
- Runs the query against PubMed and Semantic Scholar new publication feeds
- Evaluates each new paper for relevance and impact on the hypothesis
- Checks all existing papers in the foundation against Retraction Watch
- Creates alerts for: new papers that support or contradict active hypotheses; retraction or correction notices for papers in the foundation; preprints that have been updated or published; clinical trial status changes for trials associated with the hypothesis
Alerts are classified by severity — Critical (retraction of a foundation paper, direct contradiction of a logged decision), High (new paper that significantly affects the evidence picture), Medium (qualifying paper, new evidence that refines but does not reverse the picture), Low (tangentially relevant new work).
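One plausible sketch of the severity mapping; the alert field names (`kind`, `significant`) are illustrative, not Seren's schema:

```python
def alert_severity(alert):
    """Map a monitoring alert to the four severity levels of Section 5."""
    if alert["kind"] in ("retraction", "decision-contradiction"):
        return "critical"             # foundation paper retracted, or a
                                      # logged decision directly contradicted
    if alert["kind"] == "new-paper" and alert.get("significant"):
        return "high"                 # significantly affects the evidence picture
    if alert["kind"] in ("qualifying-paper", "refining-evidence"):
        return "medium"               # refines but does not reverse the picture
    return "low"                      # tangentially relevant new work
```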
Monitoring does not add papers to the foundation automatically. It creates alerts. The researcher reviews each alert and decides whether to add the paper, which preserves the researcher's agency over their evidence foundation.
6. Properties of the Model
6.1 Completeness
The model is complete in the sense that every piece of intellectual work a researcher does during a project can be captured within it: hypothesis formation, evidence evaluation, assumption identification, mechanism mapping, experimental planning, proposal writing, writing review, and decision logging. No aspect of the research reasoning process requires going outside the workspace.
6.2 Durability
Because evidence links carry support statuses and decisions carry evidence snapshots, the workspace is a durable record. Deleting a paper does not erase the decision that was made when the paper was in the foundation — the snapshot preserves what existed at decision time. Archiving a hypothesis preserves its full evidence history. Pivoting a hypothesis preserves the pre-pivot evidence cards with a flag for re-evaluation.
6.3 Shareability
A decision snapshot is a human-readable export of a decision plus its evidence snapshot and current monitoring status. It can be shared with a supervisor, collaborator, or regulator who has no Seren account. The reader sees exactly what the researcher saw at the time of the decision — which papers existed, what they said, what was unknown, and what the researcher chose to accept as uncertain in order to move forward.
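A minimal sketch of rendering such an export as plain text (field names follow the decision tuple in Section 2.6; the per-link fields are illustrative):

```python
def render_decision_snapshot(decision):
    """Render a decision record as shareable plain text (Section 6.3),
    readable by someone with no Seren account."""
    lines = [
        f"Decision: {decision['decisionText']}",
        f"Reasoning: {decision['reasoning']}",
        f"Confidence: {decision['confidenceLevel']}",
        f"Uncertainty accepted: {decision['keyUncertaintyAccepted']}",
        "Evidence at decision time:",
    ]
    for link in decision["evidenceSnapshot"]:
        lines.append(f"  - [{link['status']}] {link['title']}")
    return "\n".join(lines)
```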
6.4 Honest Uncertainty
The uncertainty partition is not aspirational — it is computed from actual link counts. A hypothesis with no evidence is grey, not green. A hypothesis with three supporting papers and one contradicting paper is amber, not green. The system does not allow a researcher to claim certainty they do not have. The map shows what the evidence actually supports, not what the researcher wishes it supported.
7. What the Model Does Not Do
The model does not make research decisions. It creates the conditions under which research decisions can be made well — by making the evidence visible, the gaps explicit, and the reasoning recordable. The decision is always the researcher's.
The model does not replace literature search tools. PubMed, Semantic Scholar, and Research Rabbit help researchers find papers. Seren helps them know what to do with the papers they find. These are sequential steps in the same process, not competing approaches to the same problem.
The model does not evaluate the quality of a researcher's hypotheses in absolute terms. It evaluates them against the available evidence. A hypothesis with strong evidence support in the current literature may still be wrong. A hypothesis with weak evidence may be correct but simply understudied. The model represents the current state of the evidence, not the truth.
8. Conclusion
We have specified a formal workspace model — sets (U, W, H, D, C, L, A, M, Δ, Π), relations (parent, workspace, support status, node–document links), derived quantities (evidence strength, uncertainty partition), and decision snapshots — that makes research reasoning explicit, checkable, and durable.
The model is implemented in Seren, a research workspace built specifically for biomedical and life science research. It supports continuous literature monitoring, shareable decision snapshots, pivot-aware evidence re-evaluation, hypothesis-aware literature interrogation via the Research Terminal, and pre-submission writing review.
The core claim of the model is simple: a researcher who uses it produces a record that proves they thought carefully, evaluated the evidence honestly, knew what they were betting on when they committed to a direction, and will know when the literature moves in a way that changes the picture.
That record is what distinguishes a researcher who can defend their work from one who cannot.
Notation Summary
| Symbol | Meaning |
| --- | --- |
| U, W, H, D, C, L, A, M, Δ, Π | Users, workspaces, nodes, documents, claims, links, assumptions, mechanisms, decisions, pivots |
| parent : H → H ∪ {⊥} | Tree structure — parent node or root |
| workspace : H → W | Node-to-workspace membership |
| σ : L → {supports, contradicts, qualifies, unrelated} | Support status of evidence link |
| n_supports, n_contradicts, n_qualifies | Link counts by support type |
| strength : H → {green, amber, red, grey} | Evidence strength derived from counts |
| K_w, T_w, U_w | Known, Contested, Unknown partition of workspace w |
| δ = (nodeId, text, reasoning, confidence, uncertainty, snapshot, t) | Decision record |
| π = (nodeId, oldLabel, newLabel, reason, t) | Pivot record |
seren_ — the thinking environment for biomedical and life science research