Analyzing spoken language and 3-D facial expressions to measure depression severity

Multi-modal data. For each clinical interview, the researchers use: (a) video of 3D facial scans, (b) audio recording, visualized as a log-mel spectrogram, and (c) text transcription of the patient’s speech. The model predicts the severity of depressive symptoms using all three modalities. Credit: Haque et al.

Researchers at Stanford have recently explored the use of machine learning to measure the severity of depressive symptoms by analyzing people's spoken language and 3-D facial expressions. Their multi-modal method, outlined in a paper pre-published on arXiv, achieved promising results, detecting major depressive disorder with 83.3 percent sensitivity and 82.6 percent specificity.

Currently, over 300 million people worldwide suffer from depressive disorders to varying degrees. In extreme cases, depression can lead to suicide; approximately 800,000 people die by suicide every year.

Mental health disorders are currently diagnosed upon careful examination by a wide range of health care providers, including primary care physicians, clinical psychologists and psychiatrists. Nonetheless, detecting mental illnesses is often far more challenging than diagnosing physical illnesses.

Several factors, including treatment cost and availability, might prevent affected individuals from seeking help. Researchers currently estimate that 60 percent of those affected do not receive treatment.

Developing methods that can automatically detect depressive symptoms could improve the accuracy and availability of diagnostic tools, leading to faster and more efficient interventions. A team of researchers at Stanford has recently investigated the use of machine learning to measure the severity of such symptoms.

"In this work, we present a machine learning method for measuring the severity of depressive symptoms," the researchers wrote in their paper. "Our multi-modal method uses 3-D facial expressions and spoken language, commonly available from modern cell phones."

Learning a multi-modal sentence embedding. Overall, the model is a causal CNN. The inputs to the model are audio, 3D facial scans, and text. The multi-modal sentence embedding is fed to a depression classifier and a PHQ regression model (not shown above). Credit: Haque et al.

Depressed individuals often present a series of verbal and non-verbal symptoms, including monotone pitch, reduced articulation rate, lower speaking volume, fewer gestures, and more downward gazes. One of the most common tests for assessing the severity of depressive symptoms is the Patient Health Questionnaire (PHQ).

The method devised by the researchers analyzes audio tracks of patients' voices, 3-D video of their faces, and text transcriptions of their clinical interviews. Based on this data, the model produces either a PHQ score or a classification label indicating major depressive disorder.
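As a rough illustration of what "causal" means for a sequence model over combined audio, visual, and text features, the following minimal sketch fuses per-timestep feature vectors and applies a single causal 1-D convolution. It is not the researchers' architecture: the function names (`fuse_modalities`, `causal_conv1d`), the time-aligned concatenation, the toy feature sizes, and the weights are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def fuse_modalities(audio: Sequence[Vector],
                    visual: Sequence[Vector],
                    text: Sequence[Vector]) -> List[List[float]]:
    """Concatenate the three per-timestep feature vectors (assumed time-aligned)."""
    if not (len(audio) == len(visual) == len(text)):
        raise ValueError("all modalities must cover the same number of timesteps")
    return [list(a) + list(v) + list(t) for a, v, t in zip(audio, visual, text)]


def causal_conv1d(sequence: Sequence[Vector],
                  kernel: Sequence[Vector]) -> List[float]:
    """Single-output-channel causal convolution.

    kernel[j] holds one weight per input feature and is applied to the input
    at time t - j, so output[t] never depends on timesteps after t.
    """
    outputs = []
    for t in range(len(sequence)):
        total = 0.0
        for j, weights in enumerate(kernel):
            if t - j < 0:
                break  # before the start of the sequence: treated as zero padding
            total += sum(w * x for w, x in zip(weights, sequence[t - j]))
        outputs.append(total)
    return outputs


# Toy data (hypothetical): 4 timesteps, one feature per modality.
audio = [[0.1], [0.2], [0.3], [0.4]]
visual = [[1.0], [0.0], [1.0], [0.0]]
text = [[0.5], [0.5], [0.0], [0.0]]

fused = fuse_modalities(audio, visual, text)
kernel = [[1.0, 1.0, 1.0],   # weights for the current timestep
          [0.5, 0.5, 0.5]]   # weights for the previous timestep
features = causal_conv1d(fused, kernel)

# Changing the last timestep leaves every earlier output untouched.
altered = fused[:-1] + [[9.0, 9.0, 9.0]]
assert causal_conv1d(altered, kernel)[:-1] == features[:-1]
```

In a full model, many such filters would be stacked and their outputs pooled into a fixed-size embedding before classification or regression; this sketch shows only the causality property.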

In an initial evaluation, the model achieved an average error of 3.67 points (15.3 percent relative) on the PHQ scale and detected major depressive disorder with 83.3 percent sensitivity and 82.6 percent specificity. The researchers chose to collect the data used in their study via human-to-computer interviews, rather than human-to-human ones.
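For readers unfamiliar with these metrics, here is a minimal sketch of how sensitivity, specificity, and a relative error can be computed. The labels below are made-up toy data, not the study's results, and the 24-point scale maximum is an assumption: it corresponds to the eight-item PHQ-8, with each item scored 0 to 3, and is consistent with the reported 15.3 percent figure.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 = depressed."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


def relative_error(average_error: float, scale_max: float) -> float:
    """Express an absolute error as a fraction of a scale running from 0 to scale_max."""
    return average_error / scale_max


# Toy labels for illustration only (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")

ASSUMED_PHQ_MAX = 24  # assumption: PHQ-8 total score ranges from 0 to 24
print(f"relative error={relative_error(3.67, ASSUMED_PHQ_MAX):.1%}")
```

Sensitivity is the share of truly depressed participants the model flags, while specificity is the share of non-depressed participants it correctly clears. Under the assumed 24-point scale, 3.67 points works out to about 15.3 percent.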

"Compared to a human interviewer, research has shown that patients report lower fear of disclosure and display more [...] when conversing with an avatar," the researchers wrote. "Additionally, people experience psychological benefits from disclosing emotional experiences to chatbots."

In the future, this new machine learning method could be deployed in smartphones worldwide, aiding the mission of making mental health care cheaper and more accessible. According to the researchers, their model is designed to augment and complement existing clinical methods, rather than issuing formal diagnoses.

"We presented a multi-modal machine learning [...] which combines techniques from [...], computer vision, and natural language processing," the researchers wrote. "We hope this work will inspire others to build AI-based tools for understanding [...] beyond depression."


More information: Measuring depression symptom severity from spoken language and 3D facial expressions. arXiv:1811.08592 [cs.CV]. arxiv.org/abs/1811.08592

© 2018 Science X Network

Citation: Analyzing spoken language and 3-D facial expressions to measure depression severity (2018, December 4) retrieved 12 December 2018 from https://techxplore.com/news/2018-11-spoken-language-d-facial-depression.html