The response to my previous piece on AI fluency in pharma, posted yesterday, was both encouraging and telling. Encouraging, because it’s clear there is real appetite to engage more deeply with this topic. Telling, because many of the follow-up conversations (both internally and externally) pointed to the same underlying reality: we are still early in establishing a shared language.
And as AI moves from experimentation into daily workflows -- across insights, commercial, and medical -- that shared language becomes more important, not less.
At ThinkGen, we’ve been leaning into this. Internally, we are continuing to build fluency across our teams. Externally, we see an opportunity to contribute to the broader dialogue: not by overcomplicating things, but by making them more accessible, more practical, and more grounded in how this technology is actually being used in our industry.
In that spirit, this is a continuation of the first essay -- a second pass at some of the terms that are increasingly showing up in conversations, decks, and vendor proposals. These are not academic definitions. They are working definitions, shaped by how these concepts show up in real-world pharma use cases. Thank you in particular to my colleagues Brian Hull, Himavanth Chandra, MBA, and John Capano, PhD for helping me shortlist the concepts and terms most in need of definition.
Most of us have now heard of large language models, or LLMs. These are broad, general-purpose models trained on massive datasets, capable of generating text, summarizing information, and supporting a wide range of use cases.
What’s becoming more relevant in pharma, however, is the rise of specialized language models (SLMs). I attended a talk at the PharmaUSA conference where this distinction was called out as one of growing importance in our industry.
An SLM is trained -- or fine-tuned -- on a narrower, domain-specific dataset. In our world, that might mean clinical literature, regulatory documents, or brand-specific materials. The trade-off is straightforward: LLMs offer breadth; SLMs offer depth. And in a highly regulated environment like pharma, depth and precision often matter more.
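To make "fine-tuned on a narrower dataset" concrete, here is a minimal sketch using the open-source Hugging Face transformers and datasets libraries. The small base model and the corpus file name are stand-ins I have assumed for illustration, not a recommendation or anyone's actual setup:

```python
# A minimal fine-tuning sketch: a small general model adapted to a
# narrow domain corpus. "regulatory_corpus.txt" is a hypothetical file
# of domain text; distilgpt2 stands in for any base model.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

base = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2-family models have no pad token
model = AutoModelForCausalLM.from_pretrained(base)

# The narrow, domain-specific dataset is what puts the "S" in SLM.
data = load_dataset("text", data_files={"train": "regulatory_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = data["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slm-regulatory", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # trades the LLM's breadth for depth on this corpus
```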
Two other terms are often used interchangeably, but shouldn’t be: agents and agentic systems.
An agent is a system that can take action toward a goal -- for example, retrieving data, generating a report, or triggering a workflow.
An agentic system, on the other hand, refers to a broader capability: systems that can plan, iterate, and execute multi-step tasks with some degree of autonomy. In other words, not just doing a task, but figuring out how to do it.
In pharma, this distinction matters. A single agent might generate an insight. An agentic system might ingest data, analyze trends, draft a report, and recommend actions -- all within a defined framework.
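A toy sketch may help fix the distinction. Everything below is illustrative Python with made-up functions and data -- not any particular product -- but it shows the difference between taking one action and planning, executing, and retrying a sequence of them:

```python
# Toy stand-ins for real capabilities; every name here is hypothetical.

def retrieve_data(question: str) -> dict:
    # Stand-in for pulling from a CRM, claims feed, or literature source.
    return {"question": question, "records": 42}

def generate_report(data: dict) -> str:
    return f"Report: {data['records']} records on '{data['question']}'"

def insight_agent(question: str) -> str:
    """An agent: one goal, one action, one output."""
    return generate_report(retrieve_data(question))

def execute(step: str, attempt: int) -> bool:
    # Toy executor: pretend the analysis step fails on its first try.
    return not (step == "analyze trends" and attempt == 0)

def agentic_workflow(goal: str) -> list[str]:
    """An agentic system: builds its own plan, executes each step,
    and iterates when a step falls short -- within a defined framework."""
    plan = ["ingest data", "analyze trends", "draft report", "recommend actions"]
    log = []
    for step in plan:
        attempt = 0
        while not execute(step, attempt):   # iterate until the step succeeds
            attempt += 1
            log.append(f"{step}: retrying for '{goal}'")
        log.append(f"{step}: done for '{goal}'")
    return log

print(insight_agent("HCP engagement drivers"))
for line in agentic_workflow("quarterly brand insight summary"):
    print(line)
```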
As AI-generated outputs become more commonplace, data transparency becomes critical.
At its core, this means understanding where the data comes from, how it was processed, and what assumptions are embedded within it. In the world of I&A, this is certainly not new -- but AI introduces an added layer of opacity.
Without transparency, it becomes difficult to validate outputs, explain recommendations, or build trust with stakeholders. And in a regulated industry, that lack of clarity is more than an inconvenience -- it’s a risk.
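One lightweight way to operationalize transparency is to make every AI-assisted output carry its own provenance record. The sketch below is an assumption of mine rather than any standard; the field names and sample values are hypothetical:

```python
# A minimal sketch of output provenance: every AI-assisted result
# carries a record of its sources, processing steps, and assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenancedOutput:
    content: str
    sources: list[str]                                 # where the data came from
    steps: list[str] = field(default_factory=list)     # how it was processed
    assumptions: list[str] = field(default_factory=list)
    generated_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

result = ProvenancedOutput(
    content="Share of voice rose 4 points in Q2.",
    sources=["claims_extract_2024Q2.csv", "crm_notes_export.json"],
    steps=["de-duplicated records", "LLM summarization"],
    assumptions=["claims lag of ~6 weeks"],
)
print(result)  # the output and its lineage travel together
```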
One of the most important concepts to hold onto is human in the loop.
This simply means that a human remains actively involved in reviewing, validating, and guiding AI outputs. Not as an afterthought, but as an integral part of the process.
In practice, this is what separates responsible use of AI from over-automation. Especially in areas like medical affairs, market research, and patient engagement, human judgment remains foundational.
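In workflow terms, human in the loop is simply a gate that AI output cannot pass without a person. A minimal sketch, with hypothetical function names standing in for a real review step:

```python
# A minimal human-in-the-loop sketch: nothing AI-generated moves
# forward until a reviewer approves it. All names are illustrative.

def ai_draft(topic: str) -> str:
    return f"[AI draft] Key findings on {topic}..."

def human_review(draft: str) -> tuple[bool, str]:
    # Stand-in for a real review (e.g., medical affairs sign-off).
    # Here we simulate an edit-and-approve decision.
    revised = draft.replace("[AI draft]", "[Reviewed]")
    return True, revised

draft = ai_draft("patient adherence barriers")
approved, final = human_review(draft)
if approved:
    print("Published:", final)
else:
    print("Returned to queue for rework.")
```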
A frequent concern we hear -- and rightly so -- is around data privacy.
A secure, siloed system refers to a digital environment where a client’s data is isolated, protected, and not used to train broader models or merged across use cases. In other words, your data stays your data.
For pharma companies handling sensitive commercial and patient information, this is not just a technical feature. It is a prerequisite for adoption.
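What "siloed" means in practice often comes down to a handful of explicit settings. The keys below are hypothetical, not any vendor's schema, but they capture the commitments worth asking a vendor about:

```python
# A minimal sketch of tenant-isolation settings as they might appear
# in a deployment config. Every key here is an illustrative assumption.
client_silo = {
    "tenant_id": "client-a",
    "data_store": "client-a-encrypted-bucket",  # isolated, dedicated storage
    "allow_model_training_on_data": False,      # your data stays your data
    "cross_tenant_sharing": False,              # no merging across use cases
    "retention_days": 90,
}
assert client_silo["allow_model_training_on_data"] is False
```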
AI can generate outputs quickly. That doesn’t mean those outputs are correct.
Data validation is the process of ensuring that inputs and outputs are accurate, consistent, and fit for purpose. In traditional analytics, this has always been a core discipline. With AI, it becomes even more crucial, because errors can be subtle, and the confidence of AI can be compelling -- but misleading.
The speed of AI does not eliminate the need for rigor. If anything, it increases it.
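Validation can be as simple as checking an AI-reported number against the raw data before anything ships. A minimal sketch, with an illustrative tolerance and made-up values:

```python
# A minimal sketch of validating an AI-reported average against the
# source data before it ships. Threshold and values are illustrative.

def validate_output(summary_value: float, source_values: list[float],
                    tolerance: float = 0.01) -> bool:
    """Check an AI-reported mean against the raw data."""
    if not source_values:
        return False
    expected = sum(source_values) / len(source_values)
    return abs(summary_value - expected) <= tolerance * max(abs(expected), 1.0)

raw_scores = [7.2, 6.8, 7.5, 7.1]
ai_reported_mean = 7.15                               # what the model claimed
print(validate_output(ai_reported_mean, raw_scores))  # True: matches the data
print(validate_output(9.0, raw_scores))               # False: confident but wrong
```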
As AI becomes increasingly embedded in workflows, organizations throughout our ecosystem require clear frameworks for how it is used.
AI governance encompasses the policies, standards, and oversight mechanisms that guide responsible use. This includes everything from model selection and validation to data privacy, auditability, and accountability.
In pharma, governance is not a barrier to innovation. It is what makes innovation sustainable.
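Governance is mostly policy and people, but parts of it are mechanical. One such part is auditability; here is a minimal sketch of logging every model-assisted call, with illustrative names throughout:

```python
# A minimal sketch of one governance mechanism -- an audit trail --
# implemented as a decorator. Real governance spans policy and people;
# this shows only the auditability piece. All names are hypothetical.
import functools, json, time

AUDIT_LOG = []

def audited(model_id: str):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            out = fn(*args, **kwargs)
            AUDIT_LOG.append({
                "ts": time.time(), "model": model_id,
                "function": fn.__name__, "inputs": repr((args, kwargs)),
            })
            return out
        return inner
    return wrap

@audited(model_id="slm-regulatory-v1")
def summarize(text: str) -> str:
    return text[:40] + "..."

summarize("Adverse event narratives from the Q2 safety review ...")
print(json.dumps(AUDIT_LOG, indent=2))  # who ran what, with which model, when
```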
Behind many AI applications is a less visible, but equally important concept: data orchestration.
This refers to how data is gathered, integrated, and made available across systems. It’s what links CRM data, claims data, marketing research, and external sources into something usable.
Without orchestration, AI operates in fragments. With it, AI can start to generate more holistic, actionable insights.
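At small scale, you can see orchestration in a single join. The sketch below uses pandas with made-up columns and values; real orchestration layers scheduling, lineage, and error handling on top of this:

```python
# A minimal sketch of orchestration as joining fragments into one
# usable table. Columns and values are hypothetical.
import pandas as pd

crm = pd.DataFrame({"hcp_id": [1, 2], "calls_last_q": [3, 5]})
claims = pd.DataFrame({"hcp_id": [1, 2], "scripts_last_q": [12, 4]})
research = pd.DataFrame({"hcp_id": [1, 2], "att_score": [0.7, 0.4]})

# Alone, each frame is a fragment; joined on a shared key, they become
# one holistic view an AI system can reason over.
unified = crm.merge(claims, on="hcp_id").merge(research, on="hcp_id")
print(unified)
```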
Finally, there is the concept of a data lake: a centralized repository that stores large volumes of structured and unstructured data. Many of the technical slides presented at PharmaUSA featured data lakes.
Think of this as the raw material layer. AI systems draw from this foundation, but the quality, organization, and accessibility of that data will ultimately shape the quality of the outputs.
A sophisticated model cannot compensate for poorly structured or incomplete data.
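As a mental model: a data lake is one root under which structured and unstructured material lives side by side. A toy sketch with hypothetical paths and contents:

```python
# A minimal sketch of a data-lake-style layout: structured and
# unstructured files under one root. All paths are illustrative.
from pathlib import Path

lake = Path("data_lake")
(lake / "structured").mkdir(parents=True, exist_ok=True)
(lake / "unstructured").mkdir(parents=True, exist_ok=True)

(lake / "structured" / "claims_2024Q2.csv").write_text("hcp_id,scripts\n1,12\n")
(lake / "unstructured" / "field_note_0001.txt").write_text(
    "HCP raised access concerns during the call.")

# The organization and accessibility of this layer shape what any
# model drawing on it can produce.
for f in sorted(lake.rglob("*")):
    print(f)
```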
If there is a common thread across all of these concepts, it is this: understanding them changes how we use AI and what we can expect AI to do.
Not in a theoretical sense, but in very practical ways: how we evaluate vendors, how we design workflows, how we interpret outputs, and how we manage risk.
At ThinkGen, we see this as an ongoing journey. Building fluency is not a one-time exercise. It’s a capability that will continue to evolve alongside the technology itself.
And as with the previous essay, I’d be very interested to hear from others.
What terms are you encountering more frequently? What concepts are still unclear? And how are you building fluency within your teams?
More to come.