Definition. AI in HR software refers to capabilities driven by machine learning models (predictive ML), large language models (generative AI) or autonomous reasoning systems (agentic AI), embedded within HR applications such as HRIS, talent acquisition, learning, performance or workforce analytics platforms. The category covers everything from a recommendation engine flagging at-risk employees, to a chatbot answering policy questions, to an agent autonomously screening candidates.

TL;DR

  • Start with the problem, not the AI.
  • Apply standard triage: hygiene, market standard, differentiating.
  • Test with data, not demos.
  • Govern AI on your side; vendors cannot give you governance.
  • Contract every AI commitment before signature, not after.

What you’ll learn

  • Start with the problem. If you’re starting from ‘we need AI’, you’re already evaluating wrong.
  • Triaging AI capabilities. Most AI is market standard, not differentiating. Classify before vendors influence you.
  • Value-driven decision making. Without a value case, AI is unaffordable at scale.
  • Data tests. The single most useful evaluation instrument for AI. Design to neutralise vendor pushback.
  • AI governance. A buyer-side responsibility. Vendors can’t give you governance.
  • Capture commitments contractually. If it isn’t in the contract, it doesn’t exist.
  • Platform lock-in. AI deepens lock-in. Contract for the exit while you have leverage.
  • After go-live. Evaluation doesn’t stop at signature. Measure post-launch.

Start with the problem, not the AI

The most common mistake I see buyers make in 2026 is the same mistake they were making in 2018, just with louder marketing: letting vendor capabilities define the solution. AI amplifies the problem because the technology is novel, the demos are compelling and the temptation to anchor a procurement around ‘we need AI’ is strong.

In my book I describe the three root causes of HR technology failure: the technology solves the wrong problem, the wrong vendor is chosen and the implementation is botched. An UNLEASH study found that 42% of HR tech implementations had failed or underperformed two years after installation [1], and PwC reports that 36% of buyers are likely to switch vendors at contract renewal [2]. AI doesn’t change those patterns. If anything, it accelerates them. A vendor with an impressive AI demo and a thin product can take a buyer further off course, faster, than a vendor with no AI at all.

The right starting point is design thinking: empathising with users, defining the real problem, ideating solutions and only then asking whether AI has a useful role to play. Phase A of the selection process (‘Know What You Want’) is the first defence against AI-led procurement.

Once you’ve decided you need AI, that decision sits inside a problem you can measure success against. You no longer evaluate ‘AI capability’ in the abstract. You evaluate whether each vendor’s AI solves your problem better than the alternatives, including doing nothing.

What “AI in HR software” actually means

The term is used loosely. For evaluation purposes, it helps to distinguish three categories:

Predictive machine learning. Models trained on historical data to score, classify, or predict. Examples: attrition risk scoring, candidate match scoring, anomaly detection in payroll. Mature technology, well-understood evaluation methods.

Generative AI. Large language models producing text, summaries, or structured outputs. Examples: chatbot interfaces, automated job description generation, performance review summaries, conversational policy lookup. Rapidly evolving; evaluation methods less standardised.

Agentic AI. Systems that combine generative models with tools and the ability to take actions autonomously. Examples: an agent that screens candidates, schedules interviews and drafts rejection emails. The newest category; vendor offerings range from real to vapourware.

Autonomous action is qualitatively different from generative assistance. Agentic systems introduce challenges that don’t arise with summarisation or recommendation: orchestration complexity, auditability of chained decisions, prompt drift over time, tool-permission and security boundaries and failure containment when the agent acts on incorrect reasoning. In HR specifically, autonomous action creates exposure to employment law risk, discrimination claims, opaque accountability and reputational damage. I’d treat agentic AI claims with substantially more scrutiny than other AI categories, particularly any agent that makes or executes decisions affecting employment, pay or progression.
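The containment point above can be made concrete with a simple policy gate: any autonomous action that affects employment, pay or progression is queued for human review instead of executing. A minimal Python sketch; the action names and risk tiers are illustrative assumptions, not any vendor’s API:

```python
from dataclasses import dataclass

# Which actions count as high stakes is a governance decision made before
# go-live; these tiers and action names are illustrative assumptions.
HIGH_STAKES = {"reject_candidate", "change_pay", "close_requisition"}
LOW_STAKES = {"schedule_interview", "draft_email"}

@dataclass
class AgentAction:
    name: str
    payload: dict

def dispatch(action: AgentAction, human_approved: bool = False) -> str:
    """Gate agent actions: high-stakes actions need explicit human approval."""
    if action.name in HIGH_STAKES:
        return "executed" if human_approved else "queued_for_review"
    if action.name in LOW_STAKES:
        return "executed"
    return "blocked"  # unknown action: fail closed, not open

dispatch(AgentAction("schedule_interview", {}))  # "executed"
dispatch(AgentAction("reject_candidate", {}))    # "queued_for_review"
```

The design choice that matters is the last line of the gate: an agent attempting an action outside its permitted set should fail closed, which is exactly the failure-containment property the paragraph above describes.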

Each category has different risk profiles, different evaluation techniques and different regulatory implications. Most ‘AI’ labels in HR vendor marketing today cover predictive ML or generative AI; agentic AI claims warrant the most scrutiny.

You’ll also encounter two architectural patterns: embedded AI (the vendor’s own model, trained on their data, integrated natively into the product) and bolt-on AI (a third-party model, typically from OpenAI, Anthropic or Google, accessed via API). Neither is better in the abstract, but the questions you ask vary by pattern. A bolt-on solution means you’re also evaluating the vendor’s API supplier, indirectly.

The vendor sales paradox in the age of AI

In the preface of my book I quote an industry veteran: ‘If you understand nothing else when selecting software, understand that software vendors are incented to say "yes". Very few will flat-out lie to you; however, if there is any possibility of a "yes", you will get a "yes".’ That observation has aged well. Vendor sales teams are skilled at reading buyers, not at domain expertise; Salesforce reportedly discovered that its best salespeople came from the car sales world. I’ve sat through hundreds of vendor demos and witnessed many very polished presentations. In the AI era, this dynamic intensifies.

‘AI-washing’ is the practice of presenting non-AI capabilities as AI, or of presenting basic AI as proprietary advanced AI.

The asymmetry is greater than with traditional features. With a workflow engine, you can usually tell from a demo whether the capability exists. With AI, the technology obscures rather than reveals: a confident answer from a chatbot doesn’t tell you whether the underlying model is good, accurate or even consistent.

The defences are the same as for any other capability, but applied more rigorously: insist on shipped product, demonstrated against your data, with commitments written into the contract.

Anything that isn’t in the shipped product or written into the contract should not be scored.

Triaging AI capabilities: hygiene, market standard or differentiating?

Buyers routinely over-score AI as a category. They treat it as inherently differentiating when, in most cases, it isn’t. In my book I argue strongly for triaging requirements into three groups, each treated very differently during evaluation. The same discipline applies to AI capabilities.

‘Hygiene’ requirements are pass/fail. For AI specifically, hygiene includes: data residency and sovereignty, bias governance and audit, model risk classification, GDPR compliance for automated decision-making (Article 22), security of training and inference data and EU AI Act conformity for high-risk HR systems. Fail any of these and the vendor is eliminated regardless of other factors.

‘Market standard’ AI capabilities are what leading vendors routinely offer: intelligent search, summarisation, basic conversational interfaces, predictive scoring on common HR signals. Many first-generation AI features are commoditising at feature level, but execution quality, integration depth and operational usefulness still vary materially between vendors. Two vendors may both claim ‘AI summarisation’ while one is transformative and the other is barely usable. Confirm market-standard features exist, then assess execution quality without treating them as differentiators.

‘Differentiating’ AI capabilities are where scoring matters. These are the AI features that, for your specific problem and context, deliver materially different value across vendors. Examples vary by use case but might include: domain-specific model accuracy on your data, agentic workflow that genuinely removes a process step, integration of AI with your data lake or a unique training approach.

The discipline is in correctly classifying. Most vendor AI marketing positions market-standard features as differentiating. Do the classification yourself, before vendors get to influence you.

Evaluating AI across the six capability areas

In my book I describe six capability areas every HR tech evaluation should cover. AI sits inside this structure, not alongside it.

Functional. What problem does the AI solve, and does it solve it in your context? Functional evaluation for AI starts with use case fit. A model that achieves 92% accuracy in the vendor’s benchmark may collapse to 60% on your data. Ask: what is the AI’s job, and how will we know it’s done it?

User experience. How does AI surface in the flow of work? Is it explainable to the user? What is the human override path? An AI feature that doesn’t show users why it made a recommendation won’t be trusted, and untrusted features don’t get used.

Technical. What model is it: proprietary, fine-tuned or third-party? Where does inference happen? What customer data is used for training versus inference? What integration patterns are supported? For bolt-on AI, the technical evaluation extends to the API supplier’s terms.

Service delivery. How often is the model updated? Are you notified before changes? What happens when the AI gets it wrong, and what is the support escalation path? Model regressions are a real risk: the AI that worked for you in month one may behave differently in month nine.

Commercial. AI introduces pricing models that didn’t exist five years ago: token-based, query-based, agent-based, capacity-based. Understand the unit economics. A pilot priced at ‘free for the first 10,000 queries’ becomes a different conversation at scale.

Implementation. What does ‘go-live with AI’ actually mean? Many AI features require customer data to be useful. Plan for the data preparation, governance setup and bias monitoring required before the AI delivers value. Implementation readiness work should begin during Phase C, not after contract signature.

Value-driven decision making

Evaluation only matters if you can value what you’re evaluating. The right vendor is the one that drives the best return, not the one with the highest score on a generic capability matrix and not the one with the most impressive AI. Without a defensible value case, scoring becomes subjective and high-cost AI features get over-rewarded simply because they’re visible.

A value driver tree is the structuring instrument I recommend, mapping strategic objectives, benefits, value drivers, metrics and solution capabilities. Built properly in Phase A, it tells you which AI capabilities matter to your value case, what targets they need to hit and which vendors deliver them.

AI consumption pricing sharpens this discipline considerably. Token-based, query-based or agent-based pricing introduces variable costs that scale with adoption: precisely the scenario your value case has to anticipate. A vendor priced at ‘free for the first 10,000 queries’ looks rather different at 100,000. If your value driver tree doesn’t connect AI usage to business benefit, you can’t reason about whether consumption costs are justified. The flip side is that consumption metrics also make benefit attribution easier: if you can count queries, you can count value per query. Both halves of the ROI calculation become more measurable.
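The unit economics above can be worked through explicitly. A sketch in integer cents so the arithmetic is exact; the free tier, price and value per query are illustrative assumptions, not any vendor’s rate card:

```python
# Unit economics of consumption pricing: cost and value per query,
# both halves of the ROI calculation. All figures are illustrative.
FREE_QUERIES = 10_000
PRICE_CENTS_PER_QUERY = 4   # assumed price once the free tier is exhausted
VALUE_CENTS_PER_QUERY = 10  # assumed benefit, taken from the value driver tree

def monthly_cost(queries: int) -> int:
    """Consumption cost in cents for a month's query volume."""
    return max(0, queries - FREE_QUERIES) * PRICE_CENTS_PER_QUERY

def monthly_net_value(queries: int) -> int:
    """Benefit minus cost, in cents."""
    return queries * VALUE_CENTS_PER_QUERY - monthly_cost(queries)

monthly_cost(10_000)        # 0 -- the pilot looks free
monthly_cost(100_000)       # 360_000 cents, i.e. $3,600 a month at scale
monthly_net_value(100_000)  # 640_000 cents -- positive only because value per query is known
```

Without the assumed value-per-query figure from the value driver tree, the last line is uncomputable, which is the point: consumption cost is knowable from the rate card alone, but ROI is not.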

Without a value case, AI is unaffordable at scale. With one, AI becomes a calculable bet.

The decisive technique: data tests

AI brings a problem that traditional software doesn’t: even its creators often can’t fully explain how it works. Anthropic’s own research on large language models (Tracing Thoughts in Language Models) acknowledges that the internal mechanisms remain stubbornly opaque even to the people who built them. For HR, where you might one day need to explain to a rejected candidate or a passed-over employee why the AI said no, that opacity matters. ‘The AI said no. We’re not entirely sure why, but we trust it’ isn’t a conversation that goes well.

Data tests are the most useful way around the black box. You give each vendor the same known dataset, ask them to process it with their AI and compare the outputs against criteria defined in advance. Better than vendor demos. More revealing than RFP responses. More practical than POCs in early evaluation.

A good data test is designed deliberately: scope, dataset, evaluation criteria and scoring rules are all fixed before any vendor sees the data. Used well, data tests cut weeks out of evaluation and remove the influence of vendor demo polish.
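Scoring a data test then reduces to comparing each vendor’s outputs against the expected results and a threshold agreed before the test runs. A minimal sketch with an illustrative dataset and pass threshold:

```python
# Scoring a data test: every vendor processes the same known dataset,
# and outputs are judged against criteria fixed in advance. The dataset
# and the 80% threshold are illustrative assumptions.
EXPECTED = {"cand_01": "shortlist", "cand_02": "reject", "cand_03": "shortlist"}

def score_vendor(outputs: dict, threshold: float = 0.8) -> tuple:
    """Accuracy against the expected outputs, and whether it clears the bar."""
    hits = sum(outputs.get(case) == label for case, label in EXPECTED.items())
    accuracy = hits / len(EXPECTED)
    return accuracy, accuracy >= threshold

vendor_a = {"cand_01": "shortlist", "cand_02": "reject", "cand_03": "reject"}
score_vendor(vendor_a)  # accuracy 2/3, passed False -- below the agreed bar
```

Because the expected outputs and threshold exist before the test, no amount of demo polish can move the result afterwards.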

Designing around vendor pushback

Vendors push back on data tests, and not always unreasonably. They’re protecting IP exposure, sales cycle timing and demo control. The trick is to design the test in a way that neutralises their concerns: narrow the scope (two scenarios, four hours of their time), use their own sandbox or pre-prod environment with synthetic or anonymised data, set the test as an RFP entry condition rather than a late ask and pre-clear data handling via NDA. Phase the depth as well: scripted demos for all vendors, data tests for the shortlist, POCs for the preferred vendor only. Each stage costs vendors more, so only the serious survive. Position the test as standard enterprise AI procurement practice, not a bespoke favour.

If a vendor refuses every form of data validation, that’s your test result. Treat it as a hygiene failure, not an inconvenience. Customer references who ran their own validation are a workable substitute when direct testing fails: less rigorous, but better than the demo alone.

Supporting evidence for validation, audit and testing

Buyer-side AI data tests are rarely public, because enterprise procurement is confidential by nature. The case for systematic validation, audit and testing of AI in HR is nonetheless well supported by adjacent evidence.

None of that evidence is a clean precedent for buyer-side data testing during selection; it supports the underlying principle rather than the specific practice. The absence of named enterprise case studies is itself a finding: most buyers who run rigorous tests don’t publicise them.

POCs and pilots for immature AI features

For genuinely novel AI capabilities, agentic workflows in particular, data tests may not be enough. The next step is a proof of concept (POC) or pilot.

A POC is a cut-down version of the solution, with limited configuration and test data, running outside production. It lets buyers experience the AI hands-on with their own people and processes, without committing to deployment.

A pilot is a cut-down version of the production system, with real users, live data and some integrations. Pilots are typically run with one vendor only and follow vendor selection.

Both are time- and resource-intensive, and both carry a specific trap: POCs that drift into production without proper due diligence on hygiene requirements. I’ve seen this happen more than once. If you run a POC, run it deliberately, with success criteria, a clear end date and a decision rule that returns you to the formal selection process at the end.

AI governance: a buyer-side responsibility

Responsible AI isn’t something the vendor delivers. It’s something the buyer governs. Organisations using AI in HR should develop governance, policies and guardrails specific to HR applications, ideally before vendor selection.

At minimum, your AI governance should cover the same ground as your AI hygiene requirements: model risk classification, bias monitoring and audit, the use of employee data for training and inference, and human oversight of automated decision-making.

This list assumes an organisation with the capability to design and operate these controls. In practice, most HR functions, procurement teams and legal departments are still building their AI maturity. Acknowledging this honestly is part of buyer-side governance. Many organisations will need external support, whether through internal AI committees, external counsel or specialist advisors, to establish proportionate controls. The governance challenge is organisational as much as technical.

Governance work belongs in Phase A and Phase E of the SelectionWise method. Define it before procurement, and have it operational before the AI goes live. Vendors will help with conformance documentation but they can’t give you governance. That’s yours.

Employee buy-in, adoption and change management

AI adoption in HR is as much a change management challenge as a technology one. Many AI failures in HR won’t be technical failures. They’ll be buy-in failures, adoption failures, cultural failures or industrial relations failures. The risk concentrates in the use cases with the highest stakes for individual employees: performance management, internal mobility, workforce planning and recruitment scoring.

Stakeholder management here is broader than the buying team. Ask: can the AI explain its outputs in language an affected employee would understand and accept? What happens when an employee challenges an AI recommendation? Where there are unions or works councils, have they been engaged on the proposed use cases? In some jurisdictions, that engagement is a legal precondition, not merely good practice.

Perceived fairness matters as much as measured fairness. An AI tool that’s technically unbiased but feels opaque to employees will erode trust and harm adoption. Build employee transparency into the selection criteria, not as a compliance afterthought.

Regulatory snapshot

AI regulation affecting HR is one of the fastest-moving regulatory areas in technology. The position described here was correct at the time of writing, but the picture changes faster than this page can be kept current. Treat it as a snapshot, not a current legal position, and confirm any specific obligation with specialist counsel before acting on it.

Capture AI commitments contractually

I’ve written elsewhere about how much of what gets demonstrated and promised during a sales process is contractually invalid: presales information is normally deemed inadmissible, and vendors typically resist incorporating RFP responses as binding. For AI capabilities specifically, this is a particularly expensive gap.

Capture AI-specific commitments in the contract itself: the performance the AI must deliver, notification before model changes, the terms under which your data is used for training and inference, and the consumption pricing that applies beyond any pilot tier.

Don’t accept ‘we’ll send you a notice’ as a substitute for contractual commitments. AI moves quickly. Contracts last five years.

AI and platform lock-in

Modern HR platforms are no longer single applications. They’ve become data layer, workflow layer, AI layer and orchestration layer combined. AI accelerates the depth of lock-in because adoption embeds the platform into daily operational behaviour in ways that previous SaaS lock-in didn’t.

Switching costs increase as the following accumulate inside a platform: configured prompts and prompt libraries, custom automations, AI workflow chains, embedded copilots, agent permissions, training feedback loops and the muscle memory of users who’ve learned the AI’s quirks. Replacing the platform replaces all of it.

In contract negotiation, push for portability commitments specific to AI: export of configured prompts, automations and AI workflow definitions in a usable form, and clarity about what happens to training feedback and agent configurations at exit.

Buyers who treat AI as a product feature rather than an embedded layer will be surprised by switching costs in five years. The discipline now is to contract for the exit while you still have leverage.

How this fits the SelectionWise method

AI evaluation runs across the full SelectionWise lifecycle. The toolkit provides the templates, checklists and AI accelerators to operationalise it at each phase.

A quick note on AI on both sides of the table. AI isn’t just what you’re buying; it can also be a tool that helps you buy well. It can generate value driver trees, draft RFP documents from your requirements, analyse vendor responses and summarise reference calls. AI evaluation and AI-assisted evaluation are two sides of the same selection.

After go-live: operational measurement

AI evaluation doesn’t end at contract signature. The framework needs operational measurement to confirm that the AI is delivering the value case and behaving as expected in production. Set these up before go-live, not after.

Build the measurement plan during Phase B. The metrics you need post go-live are the metrics you should be using to evaluate vendors during selection.
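One way to operationalise this: carry the selection-time metrics and targets forward as a production check that flags regressions month by month. A sketch with illustrative metric names, targets and tolerance:

```python
# Post go-live measurement: keep scoring the metrics used during
# selection, and flag regressions against the value-case targets.
# Metric names, targets and the tolerance are illustrative assumptions.
TARGETS = {"summary_accuracy": 0.85, "deflection_rate": 0.40}
TOLERANCE = 0.05  # how far below target counts as a regression

def regressions(observed: dict) -> list:
    """Metrics that have fallen below target minus tolerance."""
    return [metric for metric, target in TARGETS.items()
            if observed.get(metric, 0.0) < target - TOLERANCE]

month_1 = {"summary_accuracy": 0.87, "deflection_rate": 0.42}
month_9 = {"summary_accuracy": 0.71, "deflection_rate": 0.41}
regressions(month_1)  # [] -- on track
regressions(month_9)  # ["summary_accuracy"] -- a model regression to escalate
```

The month-one versus month-nine comparison is exactly the service-delivery risk described earlier: the model that worked at go-live can behave differently after a vendor update, and only a standing check will catch it.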