Why Domain Expertise Still Matters in an AI World

EXPERTS STILL MATTER


Generative AI is now materially useful in software and capability delivery. Across software engineering and adjacent knowledge work, it can accelerate drafting, summarisation, code generation, test creation, documentation, and some forms of requirements transformation. Empirical studies show meaningful gains in several contexts: multi-firm field experiments with 4,867 developers found a 26.08% increase in completed tasks with AI assistance; a BIS field experiment reported more than 50% higher code output, especially for junior staff; and McKinsey’s controlled work with developers found routine tasks such as documentation and new-code drafting could be completed much faster, sometimes close to twice as fast. Yet those same studies also show strong heterogeneity: gains fall on complex tasks, some junior developers can be slower, and experienced open-source developers working in familiar repositories were 19% slower with early-2025 AI tools in one randomised trial. The central implication is not that AI is weak, but that its value is conditional on task structure, context quality, and human oversight (Cui et al., 2026; Gambacorta et al., 2024; McKinsey & Company, 2023; METR, 2025).

That distinction is decisive for requirements engineering (RE) and business analysis. These disciplines are not merely about producing artefacts quickly; they are about defining the right problem, surfacing tacit needs, resolving ambiguity, testing fitness for purpose, and governing value through the life cycle. Recent reviews of large language model (LLM) use in RE show rapid expansion into elicitation, validation, test generation, and artefact conversion. They also show that the evidence base is still dominated by controlled settings rather than complex industrial environments. A 2025 systematic review of 74 primary RE studies found that LLMs were being used heavily for elicitation and validation, usually with GPT-based models and zero-shot or few-shot prompts (prompts that include no worked examples or only a handful), yet with limited integration into real workflows and limited industrial evaluation. A 2025 systematic mapping study on domain knowledge in RE, spanning 75 papers, concluded that domain knowledge is fundamental to elicitation, analysis, and validation. It also found recurring challenges in knowledge transfer, formalisation, acquisition, and long-term maintenance, especially where knowledge is tacit and embedded in work practices (Zadenoori et al., 2025; Araújo et al., 2025).

The standards position is even clearer. The International Institute of Business Analysis (IIBA) defines business analysis as enabling change by defining needs and recommending solutions that deliver value to stakeholders in context, while its BABOK framework explicitly treats governance, requirements life-cycle management, validation, and solution evaluation as core tasks rather than optional add-ons. ISO/IEC/IEEE 29148 similarly frames requirements engineering as a life-cycle discipline with distinct business, stakeholder, system, and software requirement information items. These standards do not describe requirements as neutral text objects that can be generated and accepted at face value. They describe requirements as traceable, contextual, validated representations of stakeholder needs and enterprise intent (IIBA, n.d.-a; IIBA, n.d.-b; IIBA, n.d.-c; IIBA, n.d.-d; IEEE, 2018; ISO, 2018).

The strongest analytical conclusion, therefore, is that domain expertise remains indispensable because it performs functions that present-day generative AI does not reliably perform on its own. Domain experts know which exceptions matter, which language is overloaded, which constraints are implicit, which stakeholders really hold veto power, which risks are tolerable, and which “requirements” are actually symptoms of deeper business problems. They also understand operational context: incentives, sequencing, edge cases, legacy dependencies, regulatory obligations, and the difference between a stakeholder preference and a viable requirement. In short, AI can accelerate the conversion of known context into draft artefacts, but it does not remove the need to establish, test, and govern that context (Aranda et al., 2016; Ferrari, Spagnolo and Gnesi, 2016; Araújo et al., 2025).

Human-in-the-loop design is therefore not a temporary compensating control. It is the appropriate operating model for high-stakes, context-dependent analysis work. The hybrid-intelligence literature argues that the best outcomes arise when humans and AI combine complementary strengths rather than when one is naively substituted for the other. In requirements-heavy environments, the human contribution is not limited to “final proofreading”; it includes framing the problem, curating relevant context, structuring prompts, interrogating assumptions, validating outputs with stakeholders, deciding trade-offs, and evaluating realised value after deployment. This is also the direction of governance guidance. NIST’s Generative AI Profile explicitly treats confabulation, information integrity, and human–AI configuration as core risk areas. It recommends documenting the extent of human domain knowledge used, reviewing citations and sources, testing in real-world settings, monitoring human overrides, and incorporating structured feedback into go/no-go decisions. The OECD AI Principles and ISO/IEC 42001 place similar emphasis on accountability, transparency, robustness, and organisational governance (Dellermann et al., 2021; NIST, 2024; OECD, 2024; ISO, 2023).

For organisations, the strategic implication is straightforward. AI should be used aggressively for speed, scale, first-draft generation, evidence collation, scenario expansion, and traceability documentation. It should not be treated as an autonomous substitute for domain-led analysis, stakeholder interpretation, or business intent definition. A sound policy is neither “AI everywhere” nor “AI nowhere”; it is “AI inside a governed, domain-led workflow.” That means explicit role design, approved data and context packs, traceability, validation checkpoints across the life cycle, and training that elevates analysts from document producers to context designers, evaluators, and stewards of value realisation (McKinsey & Company, 2025; Gartner, 2024; Gartner, 2025; NIST, 2024).

Introduction
The question is no longer whether generative AI is useful in software and capability delivery. It plainly is. The better question is what kind of usefulness matters most when organisations are trying to deliver change that works in the real world. In delivery settings, AI can draft user stories, acceptance criteria, process summaries, test cases and technical explanations at high speed. But requirements engineering and business analysis are not exhausted by the production of those artefacts. They are concerned with need, context, value, risk, prioritisation, validation, and realised outcomes. The thesis of this paper is that domain expertise, human-in-the-loop oversight, and a clear understanding of business intent remain essential because they are the mechanisms through which “fast artefacts” become “correct, useful, and deployable change” (IIBA, n.d.-a; IEEE, 2018; McKinsey & Company, 2025).

A cross-jurisdictional reading of current standards and policy frameworks reinforces this conclusion. OECD’s AI Principles, updated in 2024, emphasise transparency, explainability, robustness, safety and accountability, and the OECD notes that its definition of the AI lifecycle is now used in frameworks across the European Union, Council of Europe, United States and United Nations settings. ISO/IEC 42001 adds a management-system view of AI governance, while NIST’s AI RMF and Generative AI Profile provide an operational model for managing AI risks across design, development, deployment, operation and decommissioning. In other words, the direction of travel internationally is not toward removing human judgement from consequential work; it is toward making human accountability and organisational governance more explicit as AI becomes more capable (OECD, 2024; ISO, 2023; NIST, 2024).

The recent development path of the field illustrates why. AI has moved from experimental aid to mainstream delivery infrastructure in less than a decade, but the most consequential milestones have not been the appearance of drafting tools alone; they have been the parallel emergence of standards, productivity evidence, and risk frameworks. The timeline below shows that progress in capability has been matched by a growing recognition that context, validation and governance cannot be left implicit.

| Period | Development | Why it matters |
|---|---|---|
| 2015-2018 | Controlled requirements-engineering studies find that analyst domain knowledge affects elicitation effectiveness, while ambiguity and tacit knowledge remain central in requirements interviews | Early empirical evidence that requirements quality depends on more than formal technique; expertise changes what is elicited and understood |
| 2018 | ISO/IEC/IEEE 29148 second edition consolidates requirements engineering as a life-cycle discipline with distinct business, stakeholder, system and software information items | Formalises the idea that business intent and stakeholder need must be translated, traced and validated, not merely documented |
| 2019 | OECD AI Principles are adopted, and hybrid intelligence is articulated as combining human and AI strengths for superior results | Establishes both governance and complementarity as the correct frame for AI adoption |
| 2023 | ChatGPT is empirically tested on requirements information retrieval; NIST releases AI RMF 1.0; business field studies begin to show heterogeneous productivity effects | AI becomes relevant to RE, but evidence already shows promise mixed with precision and control problems |
| 2024 | NIST publishes the Generative AI Profile; BIS publishes a coding-productivity field experiment; Thoughtworks reports a requirements-analysis pilot | Governance shifts from abstract principle to lifecycle controls; industry pilots show “context is key” |
| 2025-2026 | Systematic reviews of LLMs in RE proliferate; McKinsey and Gartner frame AI-native SDLCs; field experiments and practitioner studies show both gains and slowdowns depending on context | The mature lesson is complementarity: AI can accelerate delivery, but domain-led oversight still determines whether that delivery is correct and valuable |

The timeline is compiled from the cited standards, empirical studies and industry analyses (Aranda et al., 2016; Ferrari, Spagnolo and Gnesi, 2016; ISO, 2018; OECD, 2024; Dellermann et al., 2021; Zhang et al., 2023; NIST, 2024; Gambacorta et al., 2024; Böckeler et al., 2024; Zadenoori et al., 2025; Cui et al., 2026; Quattrocchi et al., 2026).

Literature and Standards
The recent literature on AI in requirements engineering is substantial enough to support cautious generalisation, but not yet strong enough to justify claims of autonomous replacement. A 2025 systematic literature review of 74 primary studies found that LLM applications in RE were concentrated around elicitation and validation, with growing attention to test generation and adjacent software-engineering tasks. The same review found that most studies relied on GPT-based models and zero-shot or few-shot prompting, were typically evaluated in controlled environments, and had limited integration into industry settings and complex workflows. This matters because a tool that performs well on isolated artefact transformations may still fail when confronted with conflicting stakeholder goals, incomplete domain memory, or changing operational constraints (Zadenoori et al., 2025).

The companion literature on domain knowledge sharpens the point. Araújo et al.’s 2025 mapping study synthesised 75 papers and concluded that domain knowledge is central to elicitation, analysis and validation, but that organisations repeatedly struggle to acquire, formalise and maintain it. The authors specifically identify tacit knowledge, expert availability, structured representation and long-term maintenance as persistent bottlenecks. This is an important corrective to technology-centred narratives. If the key input to high-quality requirements work is knowledge that is tacit, local, and only partially documented, then a general-purpose LLM cannot be assumed to possess it simply because it can produce plausible prose in the vocabulary of the domain (Araújo et al., 2025).

More focused studies reinforce the same conclusion. Zhang et al. found that ChatGPT showed high recall but low precision on requirements information retrieval under zero-shot conditions: it could retrieve broadly relevant material, but had limited ability to retrieve specific requirements information. That pattern is highly revealing. For analysis work, breadth without precision is not enough. High recall can be useful for brainstorming, issue discovery and first-pass summarisation, but low precision pushes the burden back onto domain analysts to verify relevance, filter noise, and resolve subtle but material differences in meaning (Zhang et al., 2023).
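To make the recall/precision distinction concrete, the Python sketch below scores a hypothetical retrieval run. The requirement identifiers and counts are invented for illustration and are not drawn from Zhang et al.; the point is only the arithmetic of the two measures.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for retrieved items scored against a relevant set."""
    true_positives = len(retrieved & relevant)
    # Precision: share of what was retrieved that is actually relevant.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of the relevant items that were retrieved at all.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical gold set: five requirement statements an analyst judges relevant.
relevant = {f"REQ-{i}" for i in range(1, 6)}
# Hypothetical model output: all five relevant items plus fifteen loosely related ones.
retrieved = relevant | {f"NOISE-{i}" for i in range(1, 16)}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.25 recall=1.00
```

Recall is perfect, yet three of every four retrieved items are noise that a domain analyst must read and reject. That filtering effort is the practical cost of low precision.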

The emerging user-story literature is similarly mixed. Quattrocchi et al. report that LLM-generated user stories can be comparable to human-authored stories in coverage and stylistic quality, and that LLMs can help at scale with quality assessment when given explicit evaluation criteria. However, the same work also finds lower diversity and creativity in generated stories, and shows that LLM-generated stories meet acceptance-quality criteria less frequently than human-authored ones. That is precisely the pattern a serious business-analysis function should expect: AI is often strongest where form and convention dominate, but weaker where uniqueness, independence, subtle trade-offs and boundary cases matter (Quattrocchi et al., 2026).

The standards literature tilts even more strongly toward a domain-led interpretation. IIBA defines business analysis as enabling change by defining needs and recommending solutions that deliver value to stakeholders. Public BABOK materials also show that the discipline encompasses business analysis governance, requirements life-cycle management, validation, design-option analysis and solution evaluation. This is analytically important because it means business analysis is not only about constructing requirement statements; it is also about deciding how requirement decisions are made, how requirements are traced and maintained, how designs are validated against business needs, and how actual value is assessed after implementation (IIBA, n.d.-a; IIBA, n.d.-b; IIBA, n.d.-c; IIBA, n.d.-d; IIBA, n.d.-e).

ISO/IEC/IEEE 29148 supports the same thesis from the requirements-engineering side. The standard describes a unified treatment of the processes and products involved in engineering requirements throughout the life cycle of systems and software; it also distinguishes business, stakeholder, system and software requirement information items. The existence of that layered structure matters because it formalises translation across levels of meaning. Business intent must be transformed into stakeholder requirements; stakeholder needs must be interpreted into system and software requirements; and the result must remain traceable through later lifecycle activities. An LLM can help draft across those levels, but it does not eliminate the need to determine whether the translation itself is valid (ISO, 2018; IEEE, 2018).

For AI governance specifically, the converging standards are again notable. NIST’s Generative AI Profile states that its purpose is to help organisations govern, map, measure and manage risks that are novel to or exacerbated by generative AI across the lifecycle. It explicitly treats human–AI configuration, confabulation, information integrity and accountability-related controls as first-class concerns. ISO/IEC 42001 provides a management-system structure for responsible AI; OECD’s principles add cross-jurisdictional norms around accountability, transparency, robustness, safety and human capacity. Taken together, these standards imply that the mature question is not “Can AI generate requirements artefacts?” but “Under what governance conditions can AI-generated artefacts be trusted, validated and used responsibly?” (NIST, 2024; ISO, 2023; OECD, 2024).

Why Domain Expertise Still Matters
The deepest reason domain expertise still matters is that business intent and operational reality are not fully observable in text. In requirements work, much of what matters is implicit: the exception that only occurs at quarter-end, the workaround the operations team never documented, the internal politics around approval thresholds, the language that means one thing to customer service and another to legal, or the risk threshold at which a workflow becomes unacceptable even if “most cases” succeed. Requirements literature has long recognised that ambiguity and tacit knowledge are central problems in elicitation, and the more recent mapping work confirms that tacit and hybrid knowledge remain hard to capture and reuse. Domain expertise is therefore not a decorative layer added after AI drafting. It is the interpretive substrate that makes drafting meaningful in the first place (Ferrari, Spagnolo and Gnesi, 2016; Araújo et al., 2025).

A related issue is that business intent sits upstream of artefact form. Standards such as ISO/IEC/IEEE 29148 distinguish business requirements from stakeholder, system and software requirements, while IIBA treats strategy analysis, governance and solution evaluation as integral to the discipline. This means that “the requirement” is not a freestanding sentence to be polished; it is a negotiated representation of enterprise purpose, stakeholder need, feasibility, risk, and value. AI can help convert an agreed intent into a cleaner draft. It cannot, by itself, determine whether the intent is coherent, whether the requirement reflects the right unit of value, or whether an apparent stakeholder request is actually a poor proxy for the underlying problem (IIBA, n.d.-a; IIBA, n.d.-b; ISO, 2018).

Hybrid-intelligence research gives the right conceptual frame. Dellermann and colleagues argue that current real-world tasks cannot yet be solved by machines alone and instead require socio-technological ensembles in which humans and AI collectively achieve superior results and improve through learning from one another. Later hybrid-intelligence work similarly describes jointly solved tasks, superior outcomes relative to isolated effort, and continuous learning as defining features. In requirements and business-analysis work, that means the machine contributes speed, pattern recall, variation generation and scale, while humans contribute context curation, judgement, ethics, organisational interpretation and decision rights (Dellermann et al., 2021; Mirbabaie et al., 2021).

The empirical productivity literature strongly supports this complementarity view. Brynjolfsson, Li and Raymond found that a generative-AI assistant increased customer-support productivity by roughly 14–15% on average, with the largest gains for novice and lower-skilled workers and little effect for experienced or highly skilled workers. BIS reported that coding output increased by more than 50% in its field experiment, but again the statistically significant gains were concentrated among junior staff. Cui et al.’s 2026 multi-firm software-development experiment found a 26.08% increase in completed tasks, with greater gains for less experienced developers. Across these studies, AI appears especially strong where it can disseminate codified best practices or accelerate well-structured work. That is valuable, but it is not the same as proving that domain expertise is obsolete. On the contrary, it suggests that AI often compresses the distance between novice and intermediate performance while leaving expert judgement, constraint handling and problem framing highly valuable (Brynjolfsson, Li and Raymond, 2025; Gambacorta et al., 2024; Cui et al., 2026).

The counter-evidence is just as important. METR’s 2025 randomised trial with experienced open-source developers found that AI use made participants 19% slower on tasks in repositories they already knew well, even though those developers believed AI had sped them up. McKinsey’s research similarly found that time savings shrank sharply on tasks developers judged highly complex, and that developers with less than a year of experience could take 7–10% longer in some conditions. These results matter for business analysis because they show that the net value of AI depends not simply on having the tool but on the relationship between the task, the user, and the context. Where the human already holds deep, situational knowledge, AI can introduce friction through review effort, incorrect assumptions, and context reconstruction overhead (METR, 2025; McKinsey & Company, 2023).

Thoughtworks’ 2024 requirements-analysis pilot offers one of the most useful industrial illustrations. In a relatively complex domain, the team found AI was not very helpful until they invested in reusable domain and architecture context. Once that context was well defined, the BA reported better preparation for developer conversations, the QA estimated roughly 10% fewer bugs and rework due to better edge-case coverage, and the team estimated around a 20% reduction in analysis time across a small sample of three epics. Yet the same case study also stresses that users needed time to learn how to work with non-deterministic outputs, that impact was difficult to measure cleanly, and that experienced BA and QA practitioners remained the principal users and reviewers. The headline lesson from this case is not “AI replaces analysis”; it is “context orchestration and expert review are prerequisites for value” (Böckeler et al., 2024).

The governance case is equally strong. NIST’s Generative AI Profile recommends documenting the extent to which human domain knowledge is used to improve system performance. It also recommends reviewing and verifying sources and citations, evaluating real-world system performance rather than extrapolating from narrow assessments, monitoring human overrides, involving end-users and practitioners in prototyping and testing, and feeding structured feedback into design, deployment-approval and decommissioning decisions. That guidance recognises something crucial: in practice, the risk of AI lies not only in model output quality but in how humans place, trust, override, and operationalise those outputs inside socio-technical systems. Requirements and business-analysis work lives precisely at that boundary (NIST, 2024).

The Air Canada chatbot case shows what happens when that boundary is poorly governed. Commentary on Moffatt v. Air Canada notes that the airline was held liable for negligent misrepresentation after a customer relied on inaccurate chatbot information presented on its website. The importance of this case for software-delivery teams is not merely legal. It demonstrates that organisations remain accountable for AI-mediated outputs delivered through their systems, and that the common instinct to treat the model as a “separate entity” is untenable. In practical terms, if an AI-generated requirement, acceptance criterion, policy explanation or customer-facing workflow is wrong, the responsibility sits with the deploying organisation and the humans who approved the deployment, not with the model (McCarthy Tétrault, 2024; American Bar Association, 2024; Brand, 2025).

The comparison below summarises the strategic point. In requirements-heavy delivery, the problem is not choosing between “human” and “AI.” It is choosing the correct division of labour.

| Delivery mode | Strengths | Main failure mode | Best fit |
|---|---|---|---|
| AI-only drafting | Very fast first drafts; good for summarisation, transformation and format compliance | Misframes the problem, fabricates specifics, misses tacit rules, amplifies noise | Low-risk internal brainstorming and disposable drafts |
| Generic human-in-the-loop (HITL) | Better than AI-only because a human reviews output after generation | Review becomes superficial if the human lacks domain context or only proofreads form | Medium-risk drafting support where context is moderately stable |
| Domain-led HITL | Combines speed with business framing, exception handling, validation, traceability and accountability | Requires process discipline, role clarity and context-curation effort | High-value requirements engineering, business analysis, regulated or operationally complex delivery |

This comparison is synthesised from hybrid-intelligence theory, software productivity studies, RE literature and governance standards (Dellermann et al., 2021; McKinsey & Company, 2023; METR, 2025; Böckeler et al., 2024; NIST, 2024).

Recommendations for Organisations
The most defensible recommendations are organisation-level and process-level rather than tool-specific. The strategic objective should be to institutionalise domain-led hybrid intelligence: AI accelerates drafting and evidence processing, while analysts, SMEs and delivery leads retain responsibility for framing, validation and value realisation. This is consistent with BABOK’s emphasis on governance, validation, life-cycle management and solution evaluation, and with NIST/ISO governance models for trustworthy AI (IIBA, n.d.-b; IIBA, n.d.-c; IIBA, n.d.-d; IIBA, n.d.-e; NIST, 2024; ISO, 2023).

The first recommendation is policy clarity. Organisations should define which artefacts may be AI-assisted, which require mandatory human sign-off, and which are prohibited from unsupervised generation. High-risk artefacts should include business problem statements, policy or regulatory interpretations, production acceptance criteria, public-facing explanations, workflow rules that affect customers or staff, and any requirement that creates legal, financial, safety or reputational exposure. The policy should also require provenance-aware usage: only approved internal sources, approved prompt patterns, and retention of prompt/output histories for auditable work products. This follows directly from NIST’s emphasis on legal requirements, information integrity, real-world evaluation and structured feedback, and from ISO/IEC 42001’s management-system approach (NIST, 2024; ISO, 2023).
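One way to make such a policy enforceable is to encode it as a small, reviewable rule set that tooling can check before an artefact is accepted. The Python sketch below assumes a three-tier scheme that mirrors the paragraph above; the artefact kinds, tier assignments, source names and field names are hypothetical placeholders, not a prescribed taxonomy.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class Tier(Enum):
    AI_ASSISTED = "AI assistance allowed"
    SIGN_OFF = "AI assistance allowed, named human sign-off mandatory"
    NO_UNSUPERVISED = "no unsupervised generation; human-led authoring and sign-off"


# Hypothetical policy table: artefact kinds and tier assignments are illustrative.
POLICY = {
    "meeting_summary": Tier.AI_ASSISTED,
    "draft_user_story": Tier.SIGN_OFF,
    "production_acceptance_criteria": Tier.SIGN_OFF,
    "business_problem_statement": Tier.NO_UNSUPERVISED,
    "regulatory_interpretation": Tier.NO_UNSUPERVISED,
}
# Hypothetical register of approved internal sources.
APPROVED_SOURCES = {"policy-handbook-v3", "process-map-2024"}


@dataclass
class Artefact:
    kind: str
    ai_generated: bool
    human_led: bool = False                # a human authored or directed the content
    signed_off_by: Optional[str] = None    # named approver, if any
    prompt_log_id: Optional[str] = None    # pointer to the retained prompt/output history
    sources: Set[str] = field(default_factory=set)


def policy_violations(a: Artefact) -> List[str]:
    """List the policy rules an artefact breaks; an empty list means it may proceed."""
    tier = POLICY.get(a.kind, Tier.SIGN_OFF)  # unknown kinds fail safe to the sign-off tier
    issues: List[str] = []
    if not a.ai_generated:
        return issues
    if tier is Tier.NO_UNSUPERVISED and not a.human_led:
        issues.append("AI output used without human-led authoring")
    if tier is not Tier.AI_ASSISTED and a.signed_off_by is None:
        issues.append("missing named human sign-off")
    if a.prompt_log_id is None:
        issues.append("prompt/output history not retained")
    unapproved = a.sources - APPROVED_SOURCES
    if unapproved:
        issues.append(f"unapproved sources: {sorted(unapproved)}")
    return issues


draft = Artefact("regulatory_interpretation", ai_generated=True, sources={"vendor-blog"})
for issue in policy_violations(draft):
    print("-", issue)
```

Unknown artefact kinds default to the sign-off tier in this sketch, so gaps in the policy table fail safe rather than open.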

The second recommendation is a context-pack discipline. Before AI is used for requirements drafting, the team should prepare a lightweight but structured context pack containing business intent, success metrics, stakeholder map, domain lexicon, process map, constraints, interfaces, policy rules, known edge cases, non-functional requirements, and source documents. Thoughtworks’ case study shows that AI only became materially useful after such contextualisation, and McKinsey’s research similarly notes that off-the-shelf tools do not know the specific needs of the project and organisation unless humans provide context. The context pack is therefore the key bridge between general model competence and local business relevance (Böckeler et al., 2024; McKinsey & Company, 2023).
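The context pack can be treated as a structured record rather than a loose folder of documents. The Python sketch below is a hypothetical shape for it, with one field per item listed above. It assumes the team wants two things from the pack: a readiness check before drafting starts, and a plain-text rendering that can be supplied to a model as prompt context. No particular LLM tool or API is assumed.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class ContextPack:
    """Hypothetical context-pack structure: one field per item listed above."""
    business_intent: str = ""
    success_metrics: List[str] = field(default_factory=list)
    stakeholder_map: Dict[str, str] = field(default_factory=dict)  # stakeholder -> role or interest
    domain_lexicon: Dict[str, str] = field(default_factory=dict)   # term -> agreed meaning
    process_map: str = ""                                          # reference to the process model
    constraints: List[str] = field(default_factory=list)
    interfaces: List[str] = field(default_factory=list)
    policy_rules: List[str] = field(default_factory=list)
    known_edge_cases: List[str] = field(default_factory=list)
    non_functional_requirements: List[str] = field(default_factory=list)
    source_documents: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Names of sections that are still empty; drafting should wait until this is []."""
        return [fld.name for fld in fields(self) if not getattr(self, fld.name)]

    def to_prompt_context(self) -> str:
        """Render every section as one labelled plain-text line for use as a prompt preamble."""
        lines = []
        for fld in fields(self):
            value = getattr(self, fld.name)
            if isinstance(value, dict):
                value = "; ".join(f"{key}: {text}" for key, text in value.items())
            elif isinstance(value, list):
                value = "; ".join(value)
            lines.append(f"{fld.name.replace('_', ' ').upper()}: {value}")
        return "\n".join(lines)


pack = ContextPack(
    business_intent="Reduce refund handling time without increasing fraud losses",
    domain_lexicon={"refund": "finance-approved reversal, not a goodwill credit"},
)
print(pack.missing())
```

Holding the pack as structured data rather than free text also makes it easier to version, so a later review can see which context a given draft was generated from.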

The third recommendation is role design. Organisations should explicitly separate responsibility for AI usage into at least four functions: a domain analyst or business analyst who owns business framing and stakeholder interpretation; a domain SME who validates tacit assumptions and edge cases; a solution/architecture lead who assesses feasibility and systemic impacts; and a QA or test lead who validates acceptance criteria, negative scenarios and coverage. A fifth role, a platform or AI steward, should manage approved models, prompt libraries, retrieval sources, logging and policy compliance. Gartner’s forecasts that software roles will shift toward orchestration, context steering and AI-native engineering are useful here, but they should be applied as role enrichment rather than role erasure (Gartner, 2024; Gartner, 2025).

The fourth recommendation is lifecycle checkpoints rather than a single “review step.” Organisations should embed validation at five points. The first is an intent gate before drafting, where the business objective and success measures are agreed. The second is a requirement gate, where AI-generated or AI-assisted artefacts are checked for ambiguity, traceability and business alignment. The third is a design-risk gate, where architecture, security, privacy, compliance and operational constraints are tested. The fourth is a release gate, where stakeholder validation and test evidence are reviewed. The fifth is a value-realisation gate after deployment, where realised outcomes, defects, overrides, clarification loops and unmet assumptions are analysed. This is consistent with IIBA’s view of life-cycle management and solution evaluation, and with NIST’s recommendation to verify that feedback is incorporated into go/no-go decisions and ongoing monitoring (IIBA, n.d.-b; IIBA, n.d.-e; NIST, 2024).
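The five gates can be expressed as an ordered sequence in which a later gate cannot be passed while an earlier one is still open. The Python sketch below assumes each gate is satisfied by a set of named evidence items; the item names paraphrase the checks described above and are illustrative, not drawn from IIBA or NIST.

```python
from enum import IntEnum
from typing import Dict, Optional, Set, Tuple


class Gate(IntEnum):
    INTENT = 1
    REQUIREMENT = 2
    DESIGN_RISK = 3
    RELEASE = 4
    VALUE_REALISATION = 5


# Hypothetical evidence items per gate, paraphrasing the checks described above.
GATE_EVIDENCE: Dict[Gate, Set[str]] = {
    Gate.INTENT: {"objective_agreed", "success_measures_agreed"},
    Gate.REQUIREMENT: {"ambiguity_checked", "traceability_checked", "business_alignment_checked"},
    Gate.DESIGN_RISK: {"architecture_review", "security_review", "privacy_review",
                       "compliance_review", "operational_review"},
    Gate.RELEASE: {"stakeholder_validation", "test_evidence_reviewed"},
    Gate.VALUE_REALISATION: {"outcomes_reviewed", "defects_reviewed", "overrides_reviewed",
                             "clarification_loops_reviewed", "assumptions_reviewed"},
}


def next_open_gate(evidence: Set[str]) -> Tuple[Optional[Gate], Set[str]]:
    """Walk the gates in order; return the first gate with missing evidence and what is missing.

    Returns (None, set()) once all five gates are satisfied.
    """
    for gate in Gate:  # IntEnum members iterate in definition order
        missing = GATE_EVIDENCE[gate] - evidence
        if missing:
            return gate, missing
    return None, set()


gate, missing = next_open_gate({"objective_agreed", "success_measures_agreed", "ambiguity_checked"})
print(gate.name, sorted(missing))  # REQUIREMENT ['business_alignment_checked', 'traceability_checked']
```

Evaluating gates strictly in order is a deliberate choice in this sketch: evidence collected for a release review does not count while the intent or requirement gate is still open.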

The fifth recommendation is measurement discipline. Organisations should resist the temptation to optimise only for time-to-draft. Better metrics are time-to-validated requirement, stakeholder clarification rate, escaped defect rate linked to requirement errors, requirement volatility after sprint commitment, percentage of AI-assisted content accepted without major rewrite, percentage of stakeholder-facing statements with verifiable provenance, and realised value against predefined business outcomes. This is strategically important because many disappointing AI programmes fail not because the models are useless but because organisations measure surface throughput rather than validated impact. McKinsey’s broader work on the product development life cycle (PDLC) and practitioner evidence on expectation management both warn against mistaking local efficiency gains for end-to-end value (McKinsey & Company, 2025; Jensen et al., 2025).
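Several of these metrics can be computed from ordinary per-requirement records. The Python sketch below assumes a hypothetical record format (the field names and sample data are invented) and computes four of the measures named above: time to validated requirement, clarification rate, the share of AI-assisted drafts accepted without major rewrite, and escaped defects traced to requirements.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Dict, List, Optional


@dataclass
class RequirementRecord:
    """Hypothetical per-requirement record; field names are illustrative."""
    drafted: date
    validated: Optional[date]   # None until stakeholders have validated the requirement
    ai_assisted: bool
    major_rewrite: bool         # reviewers substantially rewrote the draft
    clarifications: int         # stakeholder clarification requests raised after drafting
    escaped_defects: int        # production defects traced back to this requirement


def requirement_metrics(records: List[RequirementRecord]) -> Dict[str, float]:
    """Compute validation-oriented metrics; assumes a non-empty list of records."""
    validated = [r for r in records if r.validated is not None]
    ai = [r for r in records if r.ai_assisted]
    nan = float("nan")
    return {
        # Days from draft to stakeholder validation, ignoring not-yet-validated items.
        "median_days_to_validated": median((r.validated - r.drafted).days for r in validated) if validated else nan,
        "clarifications_per_requirement": sum(r.clarifications for r in records) / len(records),
        "ai_share_accepted_without_major_rewrite": sum(not r.major_rewrite for r in ai) / len(ai) if ai else nan,
        "escaped_defects_per_requirement": sum(r.escaped_defects for r in records) / len(records),
    }


sample = [
    RequirementRecord(date(2025, 3, 3), date(2025, 3, 7), True, False, 1, 0),
    RequirementRecord(date(2025, 3, 3), date(2025, 3, 12), True, True, 4, 1),
    RequirementRecord(date(2025, 3, 4), None, False, False, 2, 0),
]
for name, value in requirement_metrics(sample).items():
    print(f"{name}: {value:.2f}")
```

The timing measure here runs from draft to validation rather than stopping at the first draft, which is the distinction between surface throughput and validated progress that the recommendation draws.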

The sixth recommendation is a training model that elevates analysts rather than narrowing them into prompt operators. Training should cover prompt and context design, but it should also strengthen interviewing, facilitation, decision analysis, process modelling, risk analysis, traceability design, testability review, and post-deployment evaluation. Gartner argues that 80% of the engineering workforce will need to upskill through 2027 and that human expertise and creativity will remain essential for complex software. For organisations, that means the right skills agenda is not “teach everyone to use the chatbot.” It is “teach analysts to govern, challenge and exploit AI while deepening domain judgement.” That is especially important if an organisation wants durable differentiation rather than commodity drafting speed (Gartner, 2024; Gartner, 2025; IIBA, 2025).

| Capability area | Recommendation | Strategic rationale |
|---|---|---|
| Policy | Define approved use cases, prohibited unsupervised uses, provenance rules and human sign-off responsibilities | Prevents AI from being treated as an autonomous authority |
| Process | Require context packs and lifecycle validation gates | Converts generic model capability into business-relevant outputs |
| Roles | Formalise BA, SME, architect, QA and AI steward accountabilities | Avoids responsibility gaps and “everyone reviewed it” failure modes |
| Tooling | Prefer retrieval-augmented generation (RAG) over free-form prompting for internal work; log prompts, sources and outputs | Improves traceability, auditability and consistency |
| Quality control | Review requirement ambiguity, edge cases, non-functional impacts and source validity explicitly | Addresses high-recall/low-precision failure patterns in RE tasks |
| Metrics | Measure validated value, not just drafting speed | Aligns AI usage with business outcomes rather than artefact volume |
| Training | Build dual capability in AI literacy and domain-led analysis | Creates durable advantage and supports AI-native but human-accountable delivery |

This table is synthesised from IIBA, ISO/NIST governance guidance, empirical software-engineering evidence and practitioner case studies (IIBA, n.d.-a; NIST, 2024; ISO, 2023; McKinsey & Company, 2023; Böckeler et al., 2024; Gartner, 2025).

Research Gaps, Limitations and Conclusion
Several research gaps remain. First, the RE literature still has too little longitudinal evidence from live, heterogeneous organisational settings. Systematic reviews note that most studies occur in controlled environments with limited industrial embedding. Second, productivity evidence is strongest for coding and adjacent tasks, but less mature for end-to-end business analysis, stakeholder validation and solution evaluation. Third, the field still lacks robust methods for measuring value at the level that matters most for requirements work: not faster drafting, but fewer wrong turns, lower clarification costs, fewer post-release surprises, and higher realised business value. Fourth, there is still limited comparative evidence across domains with different levels of tacitness, regulation and operational complexity. These gaps matter because they make naive universal claims about AI-led analysis scientifically premature (Zadenoori et al., 2025; Araújo et al., 2025; Cui et al., 2026).

There are also practical limitations to the present report. Some of the industry evidence comes from pilots and field studies with bounded samples, and some task-level estimates, such as those in the Thoughtworks requirements-analysis case, are explicitly exploratory. Public IIBA materials were used to represent BABOK concepts because the full guide itself is not openly available. Recommendations for organisations are therefore framed at the governance and operating-model level rather than tailored to internal process documents, templates or tool configurations, which were not provided. None of these limitations overturns the core argument, but they do mean that implementation should be piloted, measured and adapted rather than copied mechanically (Böckeler et al., 2024; IIBA, n.d.-a; Jensen et al., 2025).

The overall conclusion is strong. Domain expertise still matters in an AI world because meaning still matters more than syntax. Generative AI is increasingly effective at producing plausible and often useful artefacts, but requirements engineering and business analysis depend on something deeper: contextual interpretation, tacit operational knowledge, stakeholder alignment, trade-off judgement, lifecycle governance and post-deployment evaluation. The best evidence does not show that these tasks disappear as AI improves. It shows that they become more important, because faster generation raises the cost of getting the framing wrong. In that sense, domain expertise is not the residue of a pre-AI era; it is the control system for responsible and successful AI-enabled delivery. Organisations should therefore pursue AI assertively, but only within a domain-led, human-accountable model that governs meaning, priority and fitness for operational use (IIBA, n.d.-a; ISO, 2018; NIST, 2024; McKinsey & Company, 2025).

REFERENCES
American Bar Association (2024) ‘BC Tribunal Confirms Companies Remain Liable for Information Provided by AI Chatbot’. Business Law Today, 29 February.
Aranda, A.M., Dieste, O. and Juristo, N. (2016) ‘Effect of Domain Knowledge on Elicitation Effectiveness: An Internally Replicated Controlled Experiment’. IEEE Transactions on Software Engineering, 42(5).
Araújo, M., Araújo, J., Oliveira, R., Romao, L. and Kalinowski, M. (2025) ‘Domain Knowledge in Requirements Engineering: A Systematic Mapping Study’. arXiv, 2506.20754.
Böckeler, B., KP, A., Mukhopadhyay, S., Sivadass, R. and B., J. (2024) ‘Using AI for requirements analysis: A case study’. Thoughtworks Insights, 17 September.
Brand, J.L.M. (2025) ‘Air Canada’s chatbot illustrates persistent agency and responsibility gap problems for AI’. AI & Society, 40, pp. 3361–3363.
Brynjolfsson, E., Li, D. and Raymond, L.R. (2025) ‘Generative AI at Work’. Quarterly Journal of Economics, 140(2).
Cui, K.Z., Demirer, M., Jaffe, S., Musolff, L., Peng, S. and Salz, T. (2026) ‘The Effects of Generative AI on High-Skilled Work: Evidence from Three Field Experiments with Software Developers’. Management Science, advance online publication.
Dellermann, D., Ebel, P., Söllner, M. and Leimeister, J.M. (2019) ‘Hybrid Intelligence’. Business & Information Systems Engineering, 61, pp. 637–643.
Dellermann, D., Calma, A., Lipusch, N., Weber, T., Weigel, S. and Ebel, P. (2021) ‘The future of human-AI collaboration: a taxonomy of design knowledge for hybrid intelligence systems’. arXiv, 2105.03354.
Ferrari, A., Spagnolo, G.O. and Gnesi, S. (2016) ‘Ambiguity and tacit knowledge in requirements elicitation interviews’. Requirements Engineering. doi:10.1007/s00766-016-0249-3.
Gambacorta, L., Qiu, H., Shan, S. and Rees, D. (2024) ‘Generative AI and labour productivity: a field experiment on coding’. BIS Working Papers, No. 1208.
Gartner (2024) ‘Gartner Says Generative AI will Require 80% of Engineering Workforce to Upskill Through 2027’. Press release, 3 October.
Gartner (2025) ‘Gartner Identifies the Top Strategic Trends in Software Engineering for 2025 and Beyond’. Press release, 1 July.
IEEE (2018) ‘IEEE/ISO/IEC 29148-2018: Requirements Engineering’. IEEE Standards Association.
IIBA (n.d.-a) ‘What is Business Analysis?’. International Institute of Business Analysis. Accessed 12 June 2026.
IIBA (n.d.-b) ‘Requirements Life Cycle Management’. BABOK Guide public KnowledgeHub page. Accessed 12 June 2026.
IIBA (n.d.-c) ‘Validate Requirements’. BABOK Guide public KnowledgeHub page. Accessed 12 June 2026.
IIBA (n.d.-d) ‘Plan Business Analysis Governance’. BABOK Guide public KnowledgeHub page. Accessed 12 June 2026.
IIBA (n.d.-e) ‘Solution Evaluation’. BABOK Guide public KnowledgeHub page. Accessed 12 June 2026.
IIBA (2025) ‘Global Research & Reports’. International Institute of Business Analysis. Accessed 12 June 2026.
ISO (2018) ‘ISO/IEC/IEEE 29148:2018: Systems and software engineering — Life cycle processes — Requirements engineering’. International Organization for Standardization.
ISO (2023) ‘ISO/IEC 42001:2023 Information technology — Artificial intelligence — Management system’. International Organization for Standardization.
Jensen, V.V., Alami, A., Bruun, A.R. and Persson, J.S. (2025) ‘Managing expectations towards AI tools for software development: a multiple-case study’. Information Systems and e-Business Management. doi:10.1007/s10257-025-00704-7.
McCarthy Tétrault (2024) ‘Moffatt v. Air Canada: A Misrepresentation by an AI Chatbot’. Blog post, 19 February.
McKinsey & Company (2023) ‘Unleashing developer productivity with generative AI’. 27 June.
McKinsey & Company (2025) ‘How an AI-enabled software product development life cycle will fuel innovation’. 10 February.
METR (2025) ‘Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity’. 10 July.
Mirbabaie, M. et al. (2021) ‘Hybrid intelligence in hospitals: towards a research agenda’. Electronic Markets.
NIST (2023) ‘Artificial Intelligence Risk Management Framework (AI RMF 1.0)’. National Institute of Standards and Technology.
NIST (2024) ‘Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile’. NIST AI 600-1. National Institute of Standards and Technology.
OECD (2024) ‘OECD AI Principles overview’. Organisation for Economic Co-operation and Development. Updated May 2024.
Quattrocchi, G., Pasquale, L., Spoletini, P. and Baresi, L. (2026) ‘Can LLMs Generate User Stories and Assess Their Quality?’. IEEE Transactions on Software Engineering, 52(5), pp. 1773–1790.
Zadenoori, M.A., Dąbrowski, J., Alhoshan, W., Zhao, L. and Ferrari, A. (2025) ‘Large Language Models for Requirements Engineering: A Systematic Literature Review’. arXiv, 2509.11446.
Zhang, J., Chen, Y., Niu, N., Wang, Y. and Liu, C. (2023) ‘Empirical Evaluation of ChatGPT on Requirements Information Retrieval Under Zero-Shot Setting’. arXiv, 2304.12562.
