Murtaza Nasir, Assistant Professor
Department of Finance, Real Estate and Decision Sciences
April 2, 2025
Using LLMs, we quantify how users adapt communication across key dimensions:
Convergence / Divergence | Interpersonal Control / Discourse Management | Interpretability / Emotional Expression |
---|---|---|
Mirroring vs. contrasting language & tone patterns. Measures how closely users match or deliberately differ from others' communication style. (Builds rapport? Signals expertise/discord?) | Managing topics, assertiveness, turn-taking & message flow. Examines how users guide conversations and establish roles. (Influences interaction dynamics & quality?) | Clarity, relevance & sharing/matching emotions. Focuses on comprehensibility and emotional connection. (Essential for understanding & empathy?) |
Can computational analysis reveal distinct communication 'styles' or trajectories associated with positive vs. negative recovery outcomes?
What is the prevalence and specific nature of stigmatizing language within these OUD recovery communities? (Beyond simple presence/absence)
How does the expression or reception of stigma interact with different communication accommodation strategies? (e.g., Does convergence amplify or mitigate stigma's impact?)
"Its my birthday, I'm one month clean, I'm lonely and all I want is a fat shot... feeling awful lonely and isolated... got no one..."
"I feel your pain man, it's a shitty feeling... The struggle's real, but it's worth it... Best of luck and hope you make the choice to stay clean! :)"
"Picked up something special today... [Link] I'm very proud of myself... going to NA for about 5 months now and finally picked this sucker up."
"Congrats on the milestone!"
Example Trajectories: "Rocky but Resolute"
Dataset Overview: Dyads | |
---|---|
Total Extracted: | 79,297 |
Complete (Post & Reply): | 79,241 |

Dyads Per User | |
---|---|
Count: | 160 |
Mean: | ~496 |
Median: | 293 |
Std Dev: | ~648 |
Range: | 12 to 4,686 |
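
The per-user figures above are ordinary summary statistics over each user's dyad count. A minimal sketch of how such a table could be produced, assuming dyads are stored as `(user_id, post_text, reply_text)` tuples; that layout, the function name, and the toy records are hypothetical, and the source does not say whether the reported Std Dev is the sample or population value (the sample value is used here).

```python
from collections import Counter
from statistics import mean, median, stdev


def dyads_per_user_summary(dyads):
    """Summarize how many dyads each user contributes.

    `dyads` is an iterable of (user_id, post_text, reply_text) tuples;
    this layout is an assumption made for illustration.
    """
    counts = Counter(user_id for user_id, _post, _reply in dyads)
    per_user = list(counts.values())
    return {
        "count": len(per_user),
        "mean": mean(per_user),
        "median": median(per_user),
        # Sample standard deviation; it needs at least two users.
        "std_dev": stdev(per_user) if len(per_user) > 1 else 0.0,
        "range": (min(per_user), max(per_user)),
    }


# Toy data, not drawn from the study.
toy_dyads = [
    ("user_a", "post 1", "reply 1"),
    ("user_a", "post 2", "reply 2"),
    ("user_a", "post 3", "reply 3"),
    ("user_b", "post 4", "reply 4"),
]
summary = dyads_per_user_summary(toy_dyads)
print(summary)
```

A median (293) well below the mean (~496) is what a few very active users would produce, which is why the range is reported alongside both.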
Quantifying convergence, divergence, and control patterns in user interactions based on Communication Accommodation Theory.
Identifying primary emotions and their intensity levels within recovery discussions to understand emotional expression patterns.
Detecting stigmatizing language (type, severity), references to relapse/clean time, and other substance use discussions.
Relevant snippet from the Python function:

    def analyze_cat(post, reply, subreddit):
        # ... (truncation code omitted for brevity) ...
        question = f"""You are an expert analyzing online communication patterns in Reddit posts from r/{subreddit}. Your task is to evaluate how a reply accommodates or differs from the original post using Communication Accommodation Theory (CAT).

    WHAT YOU'RE ANALYZING:
    A Reddit post and its direct reply. We want to understand how the replier adjusts their communication style in response to the original poster.

    SCORING SYSTEM:
    - Low: Minimal or no evidence of the characteristic
    - Moderate: Some clear evidence, but not consistently strong
    - High: Strong and consistent evidence throughout
    - For message length only: Short (1-2 sentences), Medium (3-5 sentences), Long (6+ sentences)

    KEY DIMENSIONS TO EVALUATE:

    1. CONVERGENCE (How much does the reply match the original post?)
    • Language Similarity:
      - HIGH: Uses very similar vocabulary, phrases, or jargon
      - MODERATE: Some matching language
      - LOW: Very different vocabulary choices
    • Tone and Style Matching:
      - HIGH: Matches the original post's formality/casualness, emotion, and style
      - MODERATE: Partially matches the tone
      - LOW: Completely different tone

    2. DIVERGENCE (How much does the reply differ from the original post?)
    • Language Dissimilarity:
      - HIGH: Deliberately uses different vocabulary and expressions
      - MODERATE: Some distinct language choices
      - LOW: Very few distinct language choices
    • Tone and Style Contrast:
      - HIGH: Deliberately different emotional tone or style
      - MODERATE: Somewhat different tone
      - LOW: Little contrast in tone

    3. INTERPERSONAL CONTROL
    • Topic Management:
      - HIGH: Introduces new topics or strongly redirects discussion
      - MODERATE: Some new elements while maintaining original topic
      - LOW: Strictly follows original topic
    • Assertiveness:
      - HIGH: Strong opinions, disagreement, or persuasion attempts
      - MODERATE: Some opinion expression
      - LOW: Minimal opinion expression

    4. DISCOURSE MANAGEMENT
    • Message Length: Short/Medium/Long
    • Coherence and Cohesion:
      - HIGH: Clear logical flow and strong connections
      - MODERATE: Generally understandable flow
      - LOW: Disjointed or unclear connections
    • Turn-taking:
      - HIGH: Strong engagement with original post's points
      - MODERATE: Some engagement
      - LOW: Minimal engagement

    5. INTERPRETABILITY
    • Clarity:
      - HIGH: Very clear and well-explained
      - MODERATE: Generally clear
      - LOW: Unclear or confusing
    • Relevance:
      - HIGH: Directly addresses post content
      - MODERATE: Somewhat related
      - LOW: Barely related or off-topic

    6. EMOTIONAL EXPRESSION
    • Emotional Content:
      - HIGH: Strong emotional language
      - MODERATE: Some emotional content
      - LOW: Minimal emotion
    • Emotional Reciprocity:
      - HIGH: Strongly mirrors original post's emotions
      - MODERATE: Some emotional matching
      - LOW: Different emotional tone

    Here are the Reddit posts to analyze:

    Original Post: {truncated_post}

    Reply: {truncated_reply}

    Return JSON with this exact structure:
    {{
        "convergence_language_similarity": "low/moderate/high",
        "convergence_tone_style_matching": "low/moderate/high",
        "divergence_language_dissimilarity": "low/moderate/high",
        "divergence_tone_style_contrast": "low/moderate/high",
        "interpersonal_control_topic_management": "low/moderate/high",
        "interpersonal_control_assertiveness": "low/moderate/high",
        "discourse_management_message_length": "short/medium/long",
        "discourse_management_coherence": "low/moderate/high",
        "discourse_management_turn_taking": "low/moderate/high",
        "interpretability_clarity": "low/moderate/high",
        "interpretability_relevance": "low/moderate/high",
        "emotional_expression_emotional_content": "low/moderate/high",
        "emotional_expression_emotional_reciprocity": "low/moderate/high"
    }}"""
        return question

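
Because the prompt requests a fixed JSON structure, each model reply can be checked against the thirteen expected fields before it enters the analysis. The following is a sketch of one way to do that, not the authors' code: the function name `parse_cat_response` and the case-insensitive matching are assumptions, while the field names and allowed labels are taken from the prompt.

```python
import json

LEVELS = {"low", "moderate", "high"}
LENGTHS = {"short", "medium", "long"}

# Field names and allowed labels copied from the prompt's JSON template.
EXPECTED_FIELDS = {
    "convergence_language_similarity": LEVELS,
    "convergence_tone_style_matching": LEVELS,
    "divergence_language_dissimilarity": LEVELS,
    "divergence_tone_style_contrast": LEVELS,
    "interpersonal_control_topic_management": LEVELS,
    "interpersonal_control_assertiveness": LEVELS,
    "discourse_management_message_length": LENGTHS,
    "discourse_management_coherence": LEVELS,
    "discourse_management_turn_taking": LEVELS,
    "interpretability_clarity": LEVELS,
    "interpretability_relevance": LEVELS,
    "emotional_expression_emotional_content": LEVELS,
    "emotional_expression_emotional_reciprocity": LEVELS,
}


def parse_cat_response(raw):
    """Parse a model reply and separate valid ratings from problem fields.

    Returns (ratings, problems): `ratings` maps field name to a normalized
    label, `problems` lists fields that are missing or carry an unknown label.
    Raises ValueError if `raw` is not a JSON object (json.JSONDecodeError
    is a subclass of ValueError).
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    ratings, problems = {}, []
    for field, allowed in EXPECTED_FIELDS.items():
        value = str(data.get(field, "")).strip().lower()
        if value in allowed:
            ratings[field] = value
        else:
            problems.append(field)
    return ratings, problems


# Hypothetical reply: one label in different case, one outside the scale.
example = {name: "moderate" for name in EXPECTED_FIELDS}
example["discourse_management_message_length"] = "Short"
example["interpretability_clarity"] = "very high"
ratings, problems = parse_cat_response(json.dumps(example))
print(problems)
```

Returning the problem fields rather than raising on the first one makes it possible to count how often each dimension fails to parse across the full set of dyads.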
Identify & categorize potentially harmful stigmatizing language within the OUD recovery discussions.
Stigma is pervasive and harmful in recovery contexts (Krawczyk et al., 2022), but manually analyzing its nuances at scale is infeasible.
Leveraged LLMs guided by specific prompts to systematically analyze posts and replies for stigmatizing language and personal attacks.
Instructed LLMs to analyze post & reply text based on detailed definitions, performing these key tasks:
Does the text contain stigma? (Yes/No)
Identify specific stigmatizing words/phrases (e.g., "junkie", "clean vs dirty", "choose to use").
Rate the overall stigma level (Minor / Moderate / Severe).
Identify direct personal attacks (Yes/No).
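
The four tasks above map onto a small record per analyzed text. A sketch under stated assumptions: the class and field names (`StigmaResult`, `contains_stigma`, `stigma_terms`, `stigma_level`, `personal_attack`) are hypothetical and are not the schema used in the actual prompts; only the Yes/No flags and the Minor / Moderate / Severe scale come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional

STIGMA_LEVELS = ("minor", "moderate", "severe")


@dataclass
class StigmaResult:
    """One text's stigma annotation (hypothetical field names)."""
    contains_stigma: bool
    stigma_terms: List[str] = field(default_factory=list)
    stigma_level: Optional[str] = None
    personal_attack: bool = False

    def problems(self):
        """Return a list of internal inconsistencies, empty if none."""
        issues = []
        if self.contains_stigma and self.stigma_level not in STIGMA_LEVELS:
            issues.append("stigma flagged but level missing or unrecognized")
        if not self.contains_stigma and (self.stigma_terms or self.stigma_level):
            issues.append("terms or level given although no stigma flagged")
        return issues


flagged = StigmaResult(True, ["junkie"], "moderate")
clean = StigmaResult(False)
inconsistent = StigmaResult(False, ["junkie"])
print(flagged.problems(), clean.problems(), inconsistent.problems())
```

A consistency check of this kind can catch replies where the model answers "No" to the first task but still lists stigmatizing phrases.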
Users frequently anchor discussions in their recovery milestones:
"...doing the spiritual/ finding self... crap for awhile now. 7 years, that's when I started methadone and stuck with it... It's hard to face yourself... Every fault, every weakness... I'm not burying shit no more! Lol
On a better note... i'm on day 4!!! No methadone. Monster almost gone! And to commemorate i'm going to design a back piece..." - Example Post Snippet 1
"Hi there, So I am about 75 days clean from everything. I was shooting heroin, smoking weed everyday, drinking too much... I quit everything 75 days ago. Besides coffee, no mood altering substances...
LSD and other psychedelics have been an important part of my life... I want to be able to continue to use psychedelics... but I am afraid that using any type of substance will jeopardize my entire recovery... Does anyone have experience with this...?" - Example Post Snippet 2
Addressed noisy & sparse 'days' data via an automated pipeline:
This process yields coherent daily timelines and flags the critical 10-day window preceding estimated recovery starts (Zero Day) – key for analyzing relapse risk factors.
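
The pipeline's individual steps are not reproduced here, so the following is only a simplified sketch of the Zero Day idea. It assumes each clean-time mention has already been reduced to a `(post_date, days_clean)` pair, estimates the start of the streak by subtracting the stated days from the post date, takes the median across mentions as a stand-in for noise handling, and treats the 10-day window as excluding Zero Day itself. The function names and all of these choices are assumptions.

```python
from datetime import date, timedelta
from statistics import median_low

WINDOW_DAYS = 10  # window length stated in the text


def estimate_zero_day(mentions):
    """Estimate the start of one clean streak.

    `mentions` is a list of (post_date, days_clean) pairs that belong to
    the same streak. Each pair yields one estimate; the low median is
    used so the result is always one of the observed estimates.
    """
    estimates = [(d - timedelta(days=n)).toordinal() for d, n in mentions]
    return date.fromordinal(median_low(estimates))


def in_pre_zero_window(day, zero_day, window_days=WINDOW_DAYS):
    """True if `day` lies in the `window_days` days before `zero_day`.

    Zero Day itself is excluded (an assumption).
    """
    delta = (zero_day - day).days
    return 1 <= delta <= window_days


# Toy mentions; the third is off by one day to mimic noisy self-reports.
mentions = [
    (date(2020, 3, 5), 4),
    (date(2020, 3, 11), 10),
    (date(2020, 3, 16), 14),
]
zero_day = estimate_zero_day(mentions)
print(zero_day)
```

Posts and replies dated inside the flagged window can then be compared against the rest of the user's timeline.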
Processing reveals the wide spectrum of user experiences:
Exploring CAT dimension distributions in the 10-day window *before* an estimated clean streak begins:
Suggests distinct communication shifts potentially preceding successful recovery initiation (further analysis ongoing).
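
A distribution of this kind can be tabulated as the share of low, moderate, and high labels for one CAT dimension among the dyads that fall inside the window. A minimal sketch, assuming each dyad's ratings are held in a dict keyed by the field names from the CAT prompt; the function name and the toy records are hypothetical and do not reflect the study's results.

```python
from collections import Counter

LEVEL_ORDER = ("low", "moderate", "high")


def level_distribution(records, dimension):
    """Share of low/moderate/high labels for one CAT dimension.

    `records` is a list of dicts of CAT ratings; records that lack the
    dimension or carry an unknown label are ignored.
    """
    counts = Counter(
        r[dimension] for r in records if r.get(dimension) in LEVEL_ORDER
    )
    total = sum(counts.values())
    if total == 0:
        return {level: 0.0 for level in LEVEL_ORDER}
    return {level: counts[level] / total for level in LEVEL_ORDER}


# Toy ratings for dyads inside the pre-Zero-Day window.
dim = "convergence_tone_style_matching"
window_records = [{dim: "high"}, {dim: "high"}, {dim: "moderate"}, {dim: "low"}]
window_shares = level_distribution(window_records, dim)
print(window_shares)
```

Running the same function on dyads outside the window gives a baseline against which any shift can be judged.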
Highlights common patterns. Further analysis will explore links to communication dynamics and recovery outcomes.
Open for Questions & Discussion