The Language of Recovery: Modeling Communication Dynamics and OUD Recovery with LLMs

Murtaza Nasir, Assistant Professor
Department of Finance, Real Estate and Decision Sciences

April 2, 2025

The Context: OUD Crisis & Online Support

The Opioid Use Disorder (OUD) crisis remains a critical public health challenge (e.g., CDC, 2024).
Many face barriers like lack of trust or access in traditional healthcare settings (Olsen & Sharfstein, 2014; Krawczyk et al., 2022).
Online Communities (e.g., Reddit's r/OpiatesRecovery) provide vital peer support and relatable experiences (Chancellor et al., 2019).

The Challenge & Our Approach

Understanding communication dynamics & their impact in these large, nuanced online forums is difficult.
Large Language Models (LLMs) enable fine-grained analysis of communication style, emotion, and stigma at scale (Goel et al., 2024).
Our Goal: Leverage LLMs to model these dynamics and link them to OUD recovery trajectories.

The Digital Lifeline: OHCs & OUD Support

Crucial online spaces for peer-to-peer connection and shared lived experience in recovery.
Offer anonymity & accessibility, reducing stigma often felt in traditional settings (e.g., Krawczyk et al., 2022; Naslund et al., 2016).
Provide vital social support, fostering understanding and a sense of belonging crucial for OUD recovery (e.g., Chancellor et al., 2019).

- Briefly introduce OHCs as important support systems.
- Highlight the key benefits for individuals with OUD: anonymity, less judgment, easy access.
- Emphasize the social support aspect which is core to recovery.
- Citations:
  - Krawczyk, N., Cerdá, M., Perrone, J., & Dhavan, P. (2022). Stigma and substance use disorders: a systematic review of reviews. *The Lancet Psychiatry*, 9(10), 815-838. (Or a similar one on stigma in care)
  - Naslund, J. A., Aschbrenner, K. A., Marsch, L. A., & Bartels, S. J. (2016). The future of mental health care: peer-to-peer support and social media. *Epidemiology and Psychiatric Sciences*, 25(2), 113-122. (General OHC benefits)
  - Chancellor, S., Nitzburg, G., Hu, A., Zampieri, F., & De Choudhury, M. (2019). Discovering the communication needs of adolescents on Reddit: Analysis of an online mental health community. *Journal of Medical Internet Research*, 21(4), e11888. (Specific example of OHC analysis)

Theoretical Lens: Communication Accommodation Theory (CAT)

We inherently adapt our communication style (language, tone, pace) to others (Giles & Ogay, 2007).
Motivations: Manage social distance, gain approval, increase understanding, build rapport.
Provides a framework to analyze how support is exchanged (or hindered) in OUD recovery forums.

Analyzing Communication Styles: Key CAT Dimensions

Using LLMs, we quantify how users adapt communication across key dimensions:

Dimension	Focus	What it captures	Open question
Convergence / Divergence	Mirroring vs. contrasting language & tone patterns	Measures how closely users match or deliberately differ from others' communication style	Builds rapport? Signals expertise/discord?
Interpersonal Control / Discourse Management	Managing topics, assertiveness, turn-taking & message flow	Examines how users guide conversations and establish roles	Influences interaction dynamics & quality?
Interpretability / Emotional Expression	Clarity, relevance & sharing/matching emotions	Focuses on comprehensibility and emotional connection	Essential for understanding & empathy?

Guiding Research Question

Central Focus: How does the way people communicate in online recovery forums influence their journey with OUD?
LLM-Powered Lens: We use Large Language Models to precisely measure:
- Communication Accommodation (CAT) strategies
- Emotional expression & tone
- Presence and nature of Stigma
- Relapse-related narratives
The Link: Do these measured communication features correlate with, or potentially predict, self-reported recovery outcomes (clean days, relapse events) on Reddit?

Further Explorations

Discovering Patterns

Can computational analysis reveal distinct communication 'styles' or trajectories associated with positive vs. negative recovery outcomes?

Quantifying Stigma

What is the prevalence and specific nature of stigmatizing language within these OUD recovery communities? (Beyond simple presence/absence)

Interaction Effects

How does the expression or reception of stigma interact with different communication accommodation strategies? (e.g., Does convergence amplify or mitigate stigma's impact?)

- Broaden the scope slightly beyond the primary influence question.
- RQ1: Highlight the exploratory nature – looking for *unknown* patterns or user clusters based on communication.
- RQ2: Focus on the *details* of stigma – not just if it's there, but what *kind* and how common. Reference the work by Krawczyk et al. (2022) verbally if needed, as it covers stigma systematically.
- RQ3: Introduce the idea of interactions – communication isn't simple; how do factors like stigma and accommodation influence each other? This adds complexity and nuance.
- These questions often follow from the primary one and can lead to richer interpretations or future work.
- Citation Note: Krawczyk, N., Cerdá, M., Perrone, J., & Dhavan, P. (2022). Stigma and substance use disorders: a systematic review of reviews. *The Lancet Psychiatry*, 9(10), 815-838. (Mention this if discussing the importance of studying stigma).

Data Source: Tapping into Reddit's Recovery Ecosystem

Primary Source: Reddit, targeting key OUD subreddits (esp. r/OpiatesRecovery).

Subreddit Distribution: OpiatesRecovery dominates, followed by Others, AskReddit, suboxone, etc.

Rich Data: Authentic Lived Experiences

These forums offer a rich repository of authentic, user-generated text capturing lived experiences and support exchange (Saha et al., 2019).

Example 1: Support Seeking

Comment

"Its my birthday, I'm one month clean, I'm lonely and all I want is a fat shot... feeling awful lonely and isolated... got no one..."

Response

"I feel your pain man, it's a shitty feeling... The struggle's real, but it's worth it... Best of luck and hope you make the choice to stay clean! :)"

Example 2: Milestone

Comment

"Picked up something special today... [Link] I'm very proud of myself... going to NA for about 5 months now and finally picked this sucker up."

Response

"Congrats on the milestone!"

Visualizing Recovery Journeys: Longitudinal Data

Reddit data offers potential for longitudinal analysis by following users' interactions and self-reported status over time (e.g., Chancellor et al., 2019).

Example Trajectories: "Rocky but Resolute"

User timeline: bagzplz - rocky but resolute

User timeline: DobusPR - rocky but resolute

Data Scope & Unit of Analysis

Identified N=160 unique OUD users participating in recovery discussions. (e.g., -negative_creep-, 1Darkgirl, 40box, 4ChanTheHackerMan, 4benny2lava0, ...)
Collected interactions spanning from July 2012 to January 2023.
Unit of Analysis: Post-reply dyads nested within individual user timelines.

Dataset Overview: Dyads
Total Extracted:	79,297
Complete (Post & Reply):	79,241

Dyads Per User
Count:	160
Mean:	~496
Median:	293
Std Dev:	~648
Range:	12 to 4,686
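
The per-user figures above could be reproduced with a few lines of standard-library Python. This is a minimal sketch on toy data, not the project's actual code: the function name, the user IDs, and the choice of the sample standard deviation are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, median, stdev


def dyads_per_user_summary(dyad_authors):
    """Summarize how many dyads each user contributes.

    dyad_authors: one user ID per dyad (hypothetical input format).
    """
    counts = sorted(Counter(dyad_authors).values())
    return {
        "count": len(counts),
        "mean": mean(counts),
        "median": median(counts),
        "std_dev": stdev(counts),  # sample std dev; the talk does not say which was used
        "range": (counts[0], counts[-1]),
    }


# Toy data: one heavy poster pulls the mean well above the median.
toy_authors = ["u1"] * 2 + ["u2"] * 3 + ["u3"] * 4 + ["u4"] * 31
summary = dyads_per_user_summary(toy_authors)
```

A mean well above the median, as in the toy data and in the table above, is the usual sign of a right-skewed distribution with a few very active users.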

- "Quantifying the scope..."
- Point 1 (Fragment 0): N=160 users.
- Point 2 (Fragment 1): Timeframe July 2012 - Jan 2023.
- Point 3 (Fragment 2): Unit of analysis: post-reply dyads.
- Stats Table (Fragment 3): Present the key dyad counts and per-user stats. Note the high mean but lower median, indicating skew.
- Histogram (Fragment 4): "This distribution of interactions per user confirms the skewness seen in the stats - most users have hundreds of interactions, but a few 'power users' have thousands." (Point to the long tail).
- Transition: "This large, longitudinal, but variable dataset provides the foundation..."

Methodology: Harnessing LLMs for Text Analysis

Utilized locally hosted Large Language Models (LLMs) for deep, nuanced analysis of post-reply interactions. (Leveraging models like Llama, Qwen)
Engineered detailed prompts ensuring structured JSON output. (Crucial for consistency & scale)
This structured, prompt-driven approach provides methodological control over the feature extraction process (cf. Wei et al., 2022 on prompt engineering).

- Point 1: We used LLMs locally to analyze the conversations deeply. Mention the specific models if you like.
- Point 2: Emphasize the JSON output. This wasn't just asking the LLM questions; we forced it into a machine-readable format. Key for reliable data.
- Point 3: This highlights the rigor – we controlled the LLM's output via careful prompt design. Reference the general idea of prompt engineering benefits if comfortable (Wei et al., 2022 - Chain-of-Thought prompting; shows thoughtful prompting matters).
- Citation Note: Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., ... & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. *Advances in Neural Information Processing Systems*, 35, 24824-24837. (Use this to support the idea that careful prompting improves LLM performance/control, even if you didn't use CoT specifically for *all* tasks).

Methodology: Key Features Extracted via LLMs

🗣️

Communication Accommodation

Quantifying convergence, divergence, and control patterns in user interactions based on Communication Accommodation Theory.

🎭

Emotional Landscape

Identifying primary emotions and their intensity levels within recovery discussions to understand emotional expression patterns.

🩹

Stigma & Recovery Markers

Detecting stigmatizing language (type, severity), references to relapse/clean time, and other substance use discussions.

Methodology: The Data Analysis Pipeline

Data Analysis Pipeline Flowchart showing Reddit Data -> Python Processing -> LLM -> JSON -> Database

- This slide visually outlines our automated data processing workflow.
- Explain the key stages shown in the diagram:
  1. Raw text data ingestion from Reddit.
  2. Orchestration via Python scripts (using libraries like pandas, sqlalchemy, openai client).
  3. Interaction with local LLM endpoints for analysis based on our structured prompts.
  4. Generation of structured JSON results capturing the desired features.
  5. Storage of these structured features in a PostgreSQL database, ready for modeling.
- This automated pipeline was essential for handling the scale and complexity of the Reddit data reliably.
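
The five stages can be sketched end to end in a few lines. This is an illustrative skeleton only, under loud assumptions: `sqlite3` stands in for PostgreSQL so the example runs without a server, `fake_llm` stands in for a local LLM endpoint, and the table name, function names, and placeholder prompt are hypothetical.

```python
import json
import sqlite3


def build_prompt(post, reply):
    # Placeholder prompt; the real prompts are far more detailed.
    return f"Original Post:\n{post}\n\nReply:\n{reply}\n\nReturn JSON."


def run_pipeline(dyads, call_llm, conn):
    """Send each dyad to the LLM, parse its JSON answer, and store it.

    dyads: iterable of (dyad_id, post, reply) tuples.
    call_llm: any callable mapping a prompt string to a JSON string.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cat_features "
        "(dyad_id TEXT PRIMARY KEY, features TEXT)"
    )
    stored = 0
    for dyad_id, post, reply in dyads:
        raw = call_llm(build_prompt(post, reply))
        try:
            features = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed output; a real pipeline might retry instead
        conn.execute(
            "INSERT OR REPLACE INTO cat_features VALUES (?, ?)",
            (dyad_id, json.dumps(features)),
        )
        stored += 1
    conn.commit()
    return stored


def fake_llm(prompt):
    # Stand-in for a local LLM endpoint so the sketch runs offline.
    return '{"interpretability_clarity": "high"}'


conn = sqlite3.connect(":memory:")
n_stored = run_pipeline([("d1", "post text", "reply text")], fake_llm, conn)
```

Passing the model call in as a plain callable keeps the orchestration testable without a running model, which is one reasonable way to structure such a pipeline.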

LLM Feature Engineering - Example: CAT Analysis

Result: Rich, quantifiable features characterizing interaction dynamics.

Prompt Structure & Definitions:

 # Relevant snippet from the Python function:
 def analyze_cat(post, reply, subreddit):
     # ... (truncation code omitted for brevity) ...
     question = f"""You are an expert analyzing online communication patterns in Reddit posts from
     r/{subreddit}. Your task is to evaluate how a reply accommodates or differs from the original post
     using Communication Accommodation Theory (CAT).

     WHAT YOU'RE ANALYZING:
     A Reddit post and its direct reply. We want to understand how the replier adjusts their
     communication style in response to the original poster.

     SCORING SYSTEM:
     - Low: Minimal or no evidence of the characteristic
     - Moderate: Some clear evidence, but not consistently strong
     - High: Strong and consistent evidence throughout
     - For message length only: Short (1-2 sentences), Medium (3-5 sentences), Long (6+ sentences)

     KEY DIMENSIONS TO EVALUATE:

     1. CONVERGENCE (How much does the reply match the original post?)
         • Language Similarity:
             - HIGH: Uses very similar vocabulary, phrases, or jargon
             - MODERATE: Some matching language
             - LOW: Very different vocabulary choices

         • Tone and Style Matching:
             - HIGH: Matches the original post's formality/casualness, emotion, and style
             - MODERATE: Partially matches the tone
             - LOW: Completely different tone

     2. DIVERGENCE (How much does the reply differ from the original post?)
         • Language Dissimilarity:
             - HIGH: Deliberately uses different vocabulary and expressions
             - MODERATE: Some distinct language choices
             - LOW: Very few distinct language choices

         • Tone and Style Contrast:
             - HIGH: Deliberately different emotional tone or style
             - MODERATE: Somewhat different tone
             - LOW: Little contrast in tone

     3. INTERPERSONAL CONTROL
         • Topic Management:
             - HIGH: Introduces new topics or strongly redirects discussion
             - MODERATE: Some new elements while maintaining original topic
             - LOW: Strictly follows original topic

         • Assertiveness:
             - HIGH: Strong opinions, disagreement, or persuasion attempts
             - MODERATE: Some opinion expression
             - LOW: Minimal opinion expression

     4. DISCOURSE MANAGEMENT
         • Message Length: Short/Medium/Long

         • Coherence and Cohesion:
             - HIGH: Clear logical flow and strong connections
             - MODERATE: Generally understandable flow
             - LOW: Disjointed or unclear connections

         • Turn-taking:
             - HIGH: Strong engagement with original post's points
             - MODERATE: Some engagement
             - LOW: Minimal engagement

     5. INTERPRETABILITY
         • Clarity:
             - HIGH: Very clear and well-explained
             - MODERATE: Generally clear
             - LOW: Unclear or confusing

         • Relevance:
             - HIGH: Directly addresses post content
             - MODERATE: Somewhat related
             - LOW: Barely related or off-topic

     6. EMOTIONAL EXPRESSION
         • Emotional Content:
             - HIGH: Strong emotional language
             - MODERATE: Some emotional content
             - LOW: Minimal emotion

         • Emotional Reciprocity:
             - HIGH: Strongly mirrors original post's emotions
             - MODERATE: Some emotional matching
             - LOW: Different emotional tone

     Here are the Reddit posts to analyze:

     Original Post:
     {truncated_post}

     Reply:
     {truncated_reply}

     Return JSON with this exact structure:
     {{
         "convergence_language_similarity": "low/moderate/high",
         "convergence_tone_style_matching": "low/moderate/high",
         "divergence_language_dissimilarity": "low/moderate/high",
         "divergence_tone_style_contrast": "low/moderate/high",
         "interpersonal_control_topic_management": "low/moderate/high",
         "interpersonal_control_assertiveness": "low/moderate/high",
         "discourse_management_message_length": "short/medium/long",
         "discourse_management_coherence": "low/moderate/high",
         "discourse_management_turn_taking": "low/moderate/high",
         "interpretability_clarity": "low/moderate/high",
         "interpretability_relevance": "low/moderate/high",
         "emotional_expression_emotional_content": "low/moderate/high",
         "emotional_expression_emotional_reciprocity": "low/moderate/high"
     }}"""
     return question

- Here we show an example of our feature engineering using LLMs, focusing on Communication Accommodation Theory (CAT).
- We prompted the LLM to analyze how replies adapt to original posts, based on established CAT principles (cite Giles & Ogay).
- The LLM assessed key dimensions like Convergence, Divergence, Control, Discourse Management, Interpretability, and Emotion.
- Crucially, the LLM outputted these evaluations in a structured JSON format, allowing us to quantify these complex interaction styles.
- This resulted in a rich set of features describing the communication dynamics.
- The box shows the core structure and definitions used in the prompt to guide the LLM's analysis – the detail ensures consistency.
- Reference: Giles, H., & Ogay, T. (2007). Communication accommodation theory. In Explaining communication: Contemporary theories and exemplars (pp. 293-310). Routledge.
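
Because the prompt fixes the JSON keys and the allowed labels, each response can be checked and turned into numbers mechanically. The sketch below shows one way to do that. The field names come from the prompt above; the function name and the 0/1/2 coding are assumptions for illustration, not necessarily the encoding used in the study.

```python
import json

# Assumed ordinal coding; the talk does not state how labels were scored.
ORDINAL_SCORES = {"low": 0, "moderate": 1, "high": 2}
LENGTH_SCORES = {"short": 0, "medium": 1, "long": 2}

CAT_FIELDS = [
    "convergence_language_similarity",
    "convergence_tone_style_matching",
    "divergence_language_dissimilarity",
    "divergence_tone_style_contrast",
    "interpersonal_control_topic_management",
    "interpersonal_control_assertiveness",
    "discourse_management_message_length",
    "discourse_management_coherence",
    "discourse_management_turn_taking",
    "interpretability_clarity",
    "interpretability_relevance",
    "emotional_expression_emotional_content",
    "emotional_expression_emotional_reciprocity",
]


def encode_cat_response(raw_json):
    """Validate one LLM response and map its labels to integers."""
    data = json.loads(raw_json)
    encoded = {}
    for name in CAT_FIELDS:
        if name not in data:
            raise ValueError(f"missing field: {name}")
        label = str(data[name]).strip().lower()
        scale = LENGTH_SCORES if name.endswith("message_length") else ORDINAL_SCORES
        if label not in scale:
            raise ValueError(f"unexpected label for {name}: {data[name]!r}")
        encoded[name] = scale[label]
    return encoded


# Toy response: every rating "moderate", message length "long".
example = json.dumps(
    {f: "long" if f.endswith("message_length") else "moderate" for f in CAT_FIELDS}
)
scores = encode_cat_response(example)
```

Rejecting a response with a missing key or an off-scale label, rather than silently coercing it, makes extraction failures visible before they reach any downstream model.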

Methodology: LLM-Powered Stigma Analysis

🎯

Goal

Identify & categorize potentially harmful stigmatizing language within the OUD recovery discussions.

🚫

Challenge

Stigma is pervasive and harmful in recovery contexts (Krawczyk et al., 2022), but manually analyzing its nuances at scale is infeasible.

🤖

Solution

Leveraged LLMs guided by specific prompts to systematically analyze posts and replies for stigmatizing language and personal attacks.

LLM Task: Deconstructing Stigma

Instructed LLMs to analyze post & reply text based on detailed definitions, performing these key tasks:

✅

Classify Presence

Does the text contain stigma? (Yes/No)

📝

Extract Terms

Identify specific stigmatizing words/phrases (e.g., "junkie", "clean vs dirty", "choose to use").

📊

Assess Severity

Rate the overall stigma level (Minor / Moderate / Severe).

⚔️

Detect Attacks

Identify direct personal attacks (Yes/No).
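
The four tasks map naturally onto one small record per analyzed text. The following is a hypothetical schema, not the project's actual one: the class name, field names, and the consistency rule (severity and terms only when stigma is flagged) are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SEVERITIES = ("minor", "moderate", "severe")


@dataclass
class StigmaResult:
    has_stigma: bool                                 # Classify Presence
    terms: List[str] = field(default_factory=list)   # Extract Terms
    severity: Optional[str] = None                   # Assess Severity
    personal_attack: bool = False                    # Detect Attacks

    def __post_init__(self):
        # Assumed consistency rule between the four outputs.
        if self.has_stigma and self.severity not in SEVERITIES:
            raise ValueError("stigma flagged but severity is missing or invalid")
        if not self.has_stigma and (self.severity is not None or self.terms):
            raise ValueError("severity or terms supplied although no stigma was flagged")


flagged = StigmaResult(has_stigma=True, terms=["junkie"], severity="moderate")
clean = StigmaResult(has_stigma=False)
```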

Methodology: From Raw Reports to Numeric Days

Users frequently anchor discussions in their recovery milestones:

"...doing the spiritual/ finding self... crap for awhile now. 7 years, that's when I started methadone and stuck with it... It's hard to face yourself... Every fault, every weakness... I'm not burying shit no more! Lol
On a better note... i'm on day 4!!! No methadone. Monster almost gone! And to commemorate i'm going to design a back piece..." - Example Post Snippet 1

"Hi there, So I am about 75 days clean from everything. I was shooting heroin, smoking weed everyday, drinking too much... I quit everything 75 days ago. Besides coffee, no mood altering substances...
LSD and other psychedelics have been an important part of my life... I want to be able to continue to use psychedelics... but I am afraid that using any type of substance will jeopardize my entire recovery... Does anyone have experience with this...?" - Example Post Snippet 2

Methodology: Reconstructing Recovery Timelines

Addressed noisy & sparse 'days' data via an automated pipeline:

Pipeline for reconstructing user recovery timelines from raw Reddit data — Fig: Workflow from Raw LLM Outputs to Processed Timelines & Critical Window Identification.

This process yields coherent daily timelines and flags the critical 10-day window preceding estimated recovery starts (Zero Day) – key for analyzing relapse risk factors.

- Briefly restate the challenge: noisy, sparse data from LLM day extraction.
- Introduce the figure: "We developed an automated pipeline, illustrated here, to address these challenges."
- Briefly walk through the main stages shown *in the figure* (e.g., "It involves standardizing time formats, detecting and removing temporal outliers, extrapolating to fill gaps, and finally identifying distinct recovery streaks and their estimated start dates."). You don't need separate points if the figure shows it clearly.
- **Crucially, connect to the "pre-recovery window":** Use the final fragmented paragraph. Explain that a key output of this pipeline is not just the clean timeline, but the identification of the 10-day window *before* a user likely started a clean streak.
- Emphasize *why* this window is important: "This pre-recovery window is hypothesized to be particularly informative for understanding the communication patterns associated with successful recovery initiation or relapse." This links the complex processing directly to your research goals.
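
The core idea of back-extrapolation can be illustrated simply: a report of "N days clean" posted on a given date implies a streak start N days earlier. The sketch below is a simplified illustration of that idea for a single streak, not the pipeline in the figure. The median-based outlier rule, the tolerance, and all function names are assumptions.

```python
from datetime import date, timedelta
from statistics import median_low


def estimate_zero_day(reports, tolerance_days=3):
    """Estimate the start (Zero Day) of ONE clean streak.

    reports: list of (post_date, days_clean) pairs extracted from posts.
    Reports whose implied start date lies more than tolerance_days from
    the median implied start are treated as outliers and dropped.
    """
    implied = [post_date - timedelta(days=days) for post_date, days in reports]
    # median_low always returns an observed value, so at least one report is kept.
    center = date.fromordinal(median_low(d.toordinal() for d in implied))
    kept = [d for d in implied if abs((d - center).days) <= tolerance_days]
    return date.fromordinal(median_low(d.toordinal() for d in kept))


def pre_recovery_window(zero_day, length=10):
    """Return (first_day, last_day) of the window just before Zero Day."""
    return zero_day - timedelta(days=length), zero_day - timedelta(days=1)


# Toy reports: two agree on a 1 March 2020 start, the third is an outlier.
toy_reports = [
    (date(2020, 3, 5), 4),
    (date(2020, 3, 11), 10),
    (date(2020, 3, 20), 75),
]
zero_day = estimate_zero_day(toy_reports)
window = pre_recovery_window(zero_day)
```

A full version would first split a user's reports into separate streaks (for example, whenever the implied start date jumps forward), then apply this estimate to each streak.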

Visualizing the Transformation: Raw vs. Processed

Example User Timeline: Raw vs Processed — Example: Raw reports (scatter) vs. Final Extrapolated Timeline (blue line) after outlier removal & gap filling. Red 'x' marks relapse days.

- Title clearly indicates what's being shown.
- Use ONE of your best timeline plots that clearly shows the difference between scattered raw points and the smooth(er) final line. Choose one with some obvious outliers removed or gaps filled if possible.
- Verbally Explain:
  - "This plot illustrates the result of our processing for a single user."
  - "The scattered points (like sky blue circles for raw clean days, light coral for raw relapse) represent the initial noisy data extracted by the LLM."
  - "The solid blue line shows the final, continuous 'days clean' timeline after applying outlier detection and backward extrapolation."
  - "The red 'x' markers indicate days identified as relapse days after processing."
  - "This transformation provides a much cleaner basis for analyzing trends and linking communication patterns to recovery progression."

Diverse Recovery Journeys (Processed Timelines)

Processing reveals the wide spectrum of user experiences:

Diverse Journeys: Rocky but Resolute

Diverse Journeys: Increasing Stability

Diverse Journeys: Continued Challenges

Preliminary Insights: Communication Near Clean Start Window

Exploring CAT dimension distributions in the 10-day window *before* an estimated clean streak begins:

Distribution of Convergence Dimensions — Fig 1: Convergence & Divergence patterns.

Distribution of Discourse Management Dimensions — Fig 2: Discourse Management indicators.

Distribution of Emotional Expression Dimensions — Fig 3: Emotional Expression characteristics.

These distributions suggest distinct communication shifts that may precede successful recovery initiation (further analysis ongoing).

Preliminary Insights: Stigma Prevalence

Bar chart showing distributions of stigma presence (Yes/No), stigma severity (Minor/Moderate/Severe/None), and personal attacks (Yes/No) for both posts and replies, as detected by the LLM analysis. — Figure: Distribution of Stigma Characteristics in Posts & Replies.

Highlights common patterns. Further analysis will explore links to communication dynamics and recovery outcomes.

- Transition: "Having explained how we measured stigma using LLMs, let's look at some initial descriptive findings."
- Explain the Chart: Briefly walk through what the bars represent (e.g., "This chart shows the percentage of posts and replies flagged by the LLM for containing stigma, the breakdown by severity, and the prevalence of personal attacks.")
- Point out 1-2 key observations if they are obvious (e.g., "We can see that while severe stigma is less common, minor or moderate stigma appears frequently..." or "Replies seem to contain slightly more/less stigma than original posts..."). Be careful not to over-interpret yet.
- Conclude: Reiterate that this is descriptive and sets the stage for deeper analysis connecting these stigma metrics to communication accommodation and actual recovery outcomes.

Planned Analysis 1: Predicting Relapse Risk

🎯
Goal: Predict user relapse risk over time using rich communication context.
🧠
Method 1: Deep Text Understanding
Generate nuanced text embeddings (e.g., BERT) for posts/replies.
(Devlin et al., 2019)
📈
Method 2: Modeling Temporal Dynamics
Use LSTMs or Transformers on text embeddings + LLM features.
(Captures how communication evolves towards relapse)
✨
Why: Go beyond static features; capture time dependencies & subtle linguistic shifts predictive of relapse.

- Explain the goal: Can we foresee relapse based on how communication changes?
- Method 1: We convert the text itself into meaningful numerical representations (embeddings) using models like BERT. This captures *what* is being said deeply.
- Method 2: We use models designed for sequences (LSTMs/Transformers) to look at the *order* of communication (embeddings + our LLM features like CAT, emotion) leading up to a potential relapse.
- Why this matters: Recovery isn't static. How someone communicates *changes* over time, and these changes might hold clues about their risk. Standard methods might miss these temporal signals.
- Reference: Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. *NAACL-HLT*. (Foundation for modern embeddings).
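
Sequence models such as LSTMs consume fixed-length windows of ordered observations, so a user's timeline first has to be cut into such windows. The sketch below shows only that preparation step, with no model attached. The function name, the window length, and the labeling convention (each window inherits the relapse flag of its last step) are assumptions for illustration.

```python
def build_sequences(timeline, window=5):
    """Cut one user's timeline into overlapping fixed-length sequences.

    timeline: chronologically ordered list of (feature_vector, relapsed)
    pairs, where feature_vector would hold embeddings plus LLM features.
    Returns (list_of_feature_vectors, label) pairs; the label is the
    relapse flag attached to the last step of each window.
    """
    sequences = []
    for end in range(window, len(timeline) + 1):
        chunk = timeline[end - window:end]
        features = [vec for vec, _ in chunk]
        label = chunk[-1][1]
        sequences.append((features, label))
    return sequences


# Toy timeline: 7 steps with a 1-dimensional feature, relapse at the last step.
toy_timeline = [([float(t)], t == 6) for t in range(7)]
toy_sequences = build_sequences(toy_timeline, window=5)
```

Windows must be built per user and in time order; mixing users or shuffling steps would destroy the temporal signal the planned models are meant to capture.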

Planned Analysis 2: Estimating Causal Impact

🎯
Goal: Estimate the causal effect of specific communication strategies (CAT, stigma) on recovery outcomes (days clean, time to relapse).
⚙️
Methods: Advanced Causal Inference
Explore Double Machine Learning (DML), Causal Forests, TMLE.
(Chernozhukov et al., 2018; Athey et al., 2019)
🤖
Leveraging: Combine causal methods with Deep Learning to handle high-dimensional LLM features & control for confounders (e.g., user history).
✨
Why: Move beyond correlation – understand if and how much specific communication styles directly influence recovery trajectories.

- Explain the shift: Now we're asking *why* things happen, not just *what* happens. Does high convergence *cause* longer clean time, controlling for other factors?
- Mention the advanced methods (DML, Forests, TMLE) – these are state-of-the-art for estimating causal effects with complex, high-dimensional data like ours. Briefly mention they help isolate the effect of the 'treatment' (communication style) from confounders.
- Highlight the synergy: Combining these causal frameworks with deep learning allows us to use all the rich features we extracted without simplifying too much.
- Emphasize the 'Why': This is about establishing potential causal links, which is much stronger evidence than simple correlation for informing interventions or platform design.
- References:
  - Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. *The Econometrics Journal*. (Foundation for DML).
  - Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized random forests. *The Annals of Statistics*. (Foundation for Causal Forests). (Or Athey & Imbens review if preferred).
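
The partialling-out idea behind DML can be shown on simulated data: predict both the treatment and the outcome from the confounder on held-out folds, then regress outcome residuals on treatment residuals. The sketch below is a toy version under strong assumptions. It uses one confounder and simple linear fits where a real analysis would use flexible ML learners, and every variable and function name is hypothetical. Nothing here is a result from the study.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b


def cross_fit_effect(x, d, y, n_folds=2):
    """Cross-fitted partialling-out estimate of the effect of d on y given x."""
    n = len(x)
    res_d, res_y = [0.0] * n, [0.0] * n
    for fold in range(n_folds):
        test = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        # Nuisance models are fitted on the other fold(s) only.
        a_d, b_d = fit_line([x[i] for i in train], [d[i] for i in train])
        a_y, b_y = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            res_d[i] = d[i] - (a_d + b_d * x[i])
            res_y[i] = y[i] - (a_y + b_y * x[i])
    return sum(rd * ry for rd, ry in zip(res_d, res_y)) / sum(
        rd * rd for rd in res_d
    )


# Simulated data with a known effect of 2.0 and a confounder x.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(2000)]                # confounder (e.g., user history)
d = [0.8 * xi + rng.gauss(0, 1) for xi in x]              # "treatment" (e.g., convergence)
y = [2.0 * di + 1.5 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]

theta_hat = cross_fit_effect(x, d, y)   # should land close to the true 2.0
naive_slope = fit_line(d, y)[1]         # ignores x, so it overstates the effect
```

In this simulation the naive slope absorbs part of the confounder's influence, while the residual-on-residual estimate recovers a value near the true effect. With real data the same comparison is only as credible as the assumption that all relevant confounders are observed.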

Planned Analysis 3: Discovering Patterns & Themes

🎯
Goal: Uncover latent structures, user archetypes, and discussion themes within the communication data.
🗺️
Method 1: Visualize the Landscape
Apply dimensionality reduction (UMAP, t-SNE) to embeddings/features.
(McInnes et al., 2018) - Look for clusters (e.g., emotional styles, support needs).
🔍
Method 2: Identify Key Topics
Use Topic Modeling (LDA, Neural Models) on text, potentially stratified by outcome.
(Blei et al., 2003) - Find themes associated with relapse vs. recovery.
✨
Why: Gain qualitative insights, generate new hypotheses, identify distinct user subgroups or communication patterns not captured by supervised models.

- Goal: Find hidden patterns we didn't explicitly look for. Are there natural groupings of users or conversations?
- Method 1 (Visualization): Explain UMAP/t-SNE as ways to 'map' the high-dimensional data into 2D, allowing us to visually spot clusters or gradients related to communication styles or user types.
- Method 2 (Topics): Explain topic modeling as a way to automatically discover the main themes being discussed (e.g., 'withdrawal symptoms', 'meeting struggles', 'celebrating milestones') and see if these themes differ between users who sustain recovery versus those who relapse.
- Why: This is exploratory. It helps us understand the data more deeply, might reveal unexpected patterns, and can inform future research questions or the interpretation of the predictive/causal models.
- References:
  - McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. *arXiv preprint arXiv:1802.03426*. (Foundation for UMAP).
  - Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. *Journal of Machine Learning Research*. (Foundation for LDA).

Thank You

Open for Questions & Discussion

Acknowledgements

Collaborators

Anton Ivanov, PhD
Jasmina Tacheva, PhD

Data source

Online community participants.

Connect & Collaborate

Dr. Murtaza Nasir murtaza.nasir@wichita.edu +1 (316) 978-5112 murtaza.cc