AI is rapidly reshaping marketing research. Synthetic personas, digital twins, and LLM-generated “respondents” are being positioned as the next leap forward: faster insights, lower cost, reduced participant burden, and research at scale.
Some of the early evidence is promising. Some frameworks are genuinely useful. And for certain mainstream use cases, synthetic respondents can absolutely help teams move faster. But there’s a critical blind spot that deserves far more attention:
What happens when these tools are used to research communities that are already under-resourced, hard to reach, and historically harmed by extractive research – especially in a moment when fear is rising and trust is fraying? Because in those contexts, the limitations aren’t footnotes. They’re the headline.
The core problem: synthetic personas optimize for “plausible,” not “true”
Most synthetic persona approaches assume something like this:
If you give an LLM the right demographic, psychographic, and behavioral data, it will generate credible responses. But credibility is not the same as truth.
Joy Buolamwini puts it plainly: “Whoever codes the system embeds her views. Limited views create limited systems. Let’s code with a more expansive gaze.” When synthetic respondents are trained on incomplete or majority-skewed data, they don’t just generate “responses.” They reproduce the worldview embedded in the system.
In equity-focused research – public health, DEI, work with immigrant communities, communities with histories of institutional harm – people’s behavior is not simply a function of demographics. It’s a function of:
- lived experience
- risk
- distrust (earned, not imagined)
- cultural context
- survival strategy
- power dynamics
- what cannot safely be said out loud
A synthetic persona can sound coherent and still miss the actual signal.
Worse: it can sound confident while being wrong.
The lexicon problem is deeper than vocabulary
Language in underrepresented communities doesn’t behave like language in corporate research decks. Real communication in these spaces includes:
- coded language and cultural references that don’t show up in training data
- skepticism rooted in real harm (not just “low engagement”)
- nonlinear storytelling that carries truth but doesn’t map cleanly to survey logic
- informal decision-making shaped by constraints: time poverty, money, access, trust, safety
- selective disclosure (e.g. what people choose not to say because of risk)
If the model can’t “hear” those patterns, it will produce outputs that look clean but are fundamentally miscalibrated and wrong.
The data problem: absence of evidence isn’t evidence of absence
AI research tools depend on data. That sounds obvious, but here’s what’s easy to miss: the most important truths about marginalized and hard-to-engage populations are systematically under-documented, and the available lexicons of their real thoughts, information, and writing are relatively small.
This includes:
- informal economies and survival strategies
- mistrust shaped by historical and ongoing harm
- access barriers: transportation, childcare, digital divides
- fear of surveillance, deportation, retaliation, or consequences
- discrimination in healthcare, finance, housing, and education
So when an AI model “fills gaps,” it doesn’t fill them neutrally. It fills them using patterns from majority populations, or worse, stereotypes disguised as insight.
And because these tools scale through reuse, the risk compounds. As Buolamwini warns: “If the code being reused has bias, this bias will be reproduced by whoever uses the code for a new project or research endeavor.” In equity-focused research, that reproduction isn’t abstract; it can become policy, funding, messaging, and program design.
The last 18 months changed the research math: fear is up
Even without AI, many communities are more cautious right now. That means “participant burden” isn’t only about time. It’s about safety.
When fear rises:
- people avoid institutions
- people reduce disclosure
- people change behavior to minimize risk
- participation becomes selective
- responses become protective, inconsistent, or strategically vague
If your method assumes people will answer freely and consistently, you’re already off track.
This is why “socially desirable answers” is not a minor caveat
One of the most common failure modes of synthetic personas is that they generate “respectable” responses. They produce the answer that “sounds appropriate.”
But many communities do not operate on “what sounds appropriate.” They operate on:
- what keeps you safe
- what avoids attention
- what prevents judgment
- what preserves dignity
- what doesn’t trigger institutional consequences
So when synthetic respondents drift toward socially desirable outputs, the model isn’t just inaccurate. It is systematically biased toward the perspective of people who can afford to be candid.
The incentive trap: efficiency becomes extraction
The ethical risk AI marketing research often ignores is that synthetic personas can become a way to avoid the hard, necessary work of earning trust. When companies adopt these tools, the pressure becomes:
- skip relationship-building
- treat validation as optional
- accept “good enough” accuracy
- make decisions about communities without their voice
That isn’t a technical problem. It’s an extraction-model problem, and better UX won’t fix it. And for communities that have been researched-on rather than researched-with, it repeats a familiar harm.
What most AI research frameworks miss on equity work
Many AI-driven research roadmaps are solid for mainstream market research. But for under-resourced and underrepresented communities, they often skip critical steps:
1) No community leadership in design
Who sets the questions? Who defines what “truth” looks like? Who interprets outputs? If it’s not grounded in the community closest to the work, bias will go unchecked.
2) No cultural humility metrics
Frameworks measure correlation and consistency, but not whether the model understands context, nuance, or lived experience.
3) No acknowledgment of historical harm
These communities are often over-researched and under-compensated. Synthetic respondents can feel like another layer of extraction.
4) No guidance on when not to use AI
Sometimes the slower path – real conversations with real people, fairly compensated – is the only ethical choice.
A quick note for my market research colleagues
I know the responses many of you may have:
“Just prompt better.”
“Add a few demographic variables.”
“Use better synthetic training data.”
“Run a quick validation study.”
But that’s not how trust works. And it’s not how fear works. And it’s definitely not how communities who have been harmed decide whether to show up, speak freely, or opt out entirely. If anything, the need for a niche, community-trust-based approach isn’t becoming obsolete. It’s becoming non-negotiable.
Because when shortcuts backfire, the people with credibility in these communities are the ones left rebuilding what got broken, explaining why the data was wrong, why the model missed the signal, and why participation collapsed.
That’s not a threat. It’s just math.
You can’t extract efficiency from trust and expect the relationship to hold.
A more honest path forward (solutions)
If you’re going to use synthetic personas in equity-focused research, the bar must be higher than “it sounds plausible.” At a minimum, your approach must:
· Treat synthetic personas as hypothesis generators — not substitutes
Use them to pressure-test assumptions and map what you need to learn from humans.
· Require community-grounded calibration
If you don’t have real, recent qualitative input from the community, synthetic output is guesswork.
· Measure failure modes, not just averages
“88% accurate” can still mean the 12% error is concentrated in the exact groups you claim to serve (see the short sketch after this list).
· Build cultural humility checks into the workflow
Ask: does this reflect lived experience or what the model thinks is socially acceptable?
· Pay people and protect them
If fear shapes participation, then privacy, consent, and compensation aren’t operational details – they are the research.
· Be explicit about when not to use AI
If the stakes are high (health, safety, legal status, discrimination), the cost of being wrong is too high for synthetic shortcuts.
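To make the “failure modes, not averages” point concrete, here is a minimal sketch with made-up numbers – the subgroup labels, record counts, and match rates are all hypothetical, not drawn from any real validation study. It shows how an 88% overall agreement rate between synthetic and real responses can coexist with the model being wrong for most of the smaller group.

```python
# Minimal illustrative sketch: how an "88% accurate" headline can hide error
# concentrated in one subgroup. All numbers below are made up for illustration.
from collections import defaultdict

# (subgroup, synthetic_matches_real) for 100 hypothetical paired responses
records = (
    [("majority", True)] * 82 + [("majority", False)] * 2 +       # 84 majority records
    [("underserved", True)] * 6 + [("underserved", False)] * 10   # 16 underserved records
)

overall = sum(match for _, match in records) / len(records)
print(f"overall agreement: {overall:.0%}")  # -> 88%

by_group = defaultdict(list)
for group, match in records:
    by_group[group].append(match)

for group, matches in by_group.items():
    share = sum(matches) / len(matches)
    print(f"{group}: {share:.0%} agreement on {len(matches)} records")
# -> majority: 98% agreement on 84 records
# -> underserved: 38% agreement on 16 records
```

The design point is simply to disaggregate before you report: any validation of synthetic respondents should break agreement out by the specific communities the research claims to serve, not just the population average.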
Bottom line
AI will transform marketing research. But transformation isn’t always progress. For under-resourced and hard-to-engage communities, the gap between synthetic personas and real people isn’t just technical; it’s cultural, historical, and ethical. Until these systems can reliably capture nuance, context, and the lived experience of marginalized communities, the safest and most respectful approach is simple:
Let AI accelerate the work around humans, not replace the humans the work is supposed to serve.
Sources for quotes used
• Joy Buolamwini, “InCoding — In The Beginning Was The Coded Gaze” (MIT Media Lab / Medium): https://medium.com/mit-media-lab/incoding-in-the-beginning-4e2a5c51a45d