Turning Video and Audio into Emotional Metadata: What Is Emotional Capture?

Human emotions have long been the most elusive data for computers to interpret. Yet a technology that detects emotions in real time from video and audio and outputs them as structured metadata, known as Emotional Capture, is now rapidly entering practical deployment. Roughly 30 years have passed since Rosalind Picard of the MIT Media Lab established the field with her 1997 book *Affective Computing*. The Emotion AI market was estimated at $3.4–4.7 billion in 2025 and is projected to reach $9.5–15.6 billion by 2030 (a CAGR of 15–27%).

Hume AI has developed the Empathic Voice Interface (EVI), which maps a 53-dimensional emotional space and enables real-time emotional dialogue with response latency under 300ms. In January 2026, Google recruited Hume AI CEO Alan Cowen and his engineering team to bolster Gemini's voice capabilities, a symbolic moment signaling that tech giants now recognize the strategic value of Emotion AI. Smart Eye/Affectiva (an MIT Media Lab spinout) holds facial data from over 10 million people across 87 countries, and its driver emotion monitoring system is slated to be standard equipment in BMW, Honda, and Volvo 2026 models. Realeyes, working with Mars, has predicted advertising sales lift with 75% accuracy through emotion measurement, delivering tens of millions of dollars in annual advertising efficiency gains for more than five years.

In Japan, the Ministry of Internal Affairs and Communications will fund development of "next-generation AI that reads emotions" for approximately five years starting in fiscal year 2026, with NICT and Osaka University jointly building a five-senses brain activity database. NEC has deployed emotion-analysis signage that reads shoppers' facial expressions in real time, and NTT has published the EMPAC (Empathic Video Stimulus) Dataset.

On the other hand, the EU AI Act, whose prohibitions took effect in February 2025, explicitly bans emotion-inference AI in workplaces and educational institutions, with violations subject to fines of up to €35 million or 7% of global annual turnover. Converting emotion into metadata has demonstrated value across advertising, healthcare, automotive, and entertainment, while also raising significant challenges around privacy and bias. This article examines the concept and history of Emotional Capture, its technical approaches, major services and products, application domains, scientific controversies, ethics and regulation, and future outlook.

What is Emotional Capture — The Technology of Turning Emotions into Data

Emotional Capture is a collective term for technologies that detect human emotional states in real time from video, audio, biosignals, and other sources, and output them as structured metadata. Just as motion capture converts physical movement into data, Emotional Capture converts emotional movement into data.

The foundation underlying this is Affective Computing. Professor Rosalind Picard of the MIT Media Lab published a paper of the same name in 1995 and established the field with the book *Affective Computing*, published by MIT Press in 1997. Picard's argument is clear:

"If we want computers to be genuinely intelligent and to interact naturally with us, we must give them the ability to recognize, understand, and even have and express emotions."

Neuroscience research has repeatedly demonstrated that emotions play an essential role in decision-making, perception, and learning. An AI that does not understand emotion cannot truly understand humans.

The output of Emotional Capture differs fundamentally from conventional sentiment analysis, which assigns only "positive / negative / neutral." Hume AI maps a 53-dimensional linguistic emotion space, a 48-dimensional facial expression space, and a 48-dimensional vocal prosody space, generating continuous, multi-dimensional emotional metadata that is not limited to six basic emotions such as "joy" and "anger."
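
A small sketch of why the dimensionality matters: collapsing a multi-dimensional score vector to a polarity keeps only the net balance, while a top-k view keeps which expressions co-occur. The dimension names, scores, positive/negative grouping, and the 0.1 neutral margin below are illustrative assumptions, not actual Hume AI output or methodology.

```python
from typing import Dict, List, Set, Tuple


def top_emotions(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring emotion dimensions, highest first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def to_polarity(scores: Dict[str, float], positive: Set[str], negative: Set[str],
                margin: float = 0.1) -> str:
    """Collapse a score vector into a coarse positive/negative/neutral label."""
    pos = sum(v for name, v in scores.items() if name in positive)
    neg = sum(v for name, v in scores.items() if name in negative)
    if abs(pos - neg) < margin:
        return "neutral"
    return "positive" if pos > neg else "negative"


# Hypothetical frame; names and values are illustrative, not real model output.
frame = {"Amusement": 0.62, "Interest": 0.55, "Awkwardness": 0.41,
         "Sadness": 0.05, "Anger": 0.02}
print(top_emotions(frame))  # [('Amusement', 0.62), ('Interest', 0.55), ('Awkwardness', 0.41)]
print(to_polarity(frame, {"Amusement", "Interest"},
                  {"Awkwardness", "Sadness", "Anger"}))  # positive
```

The polarity label reports only "positive," while the top-k view preserves the fact that awkwardness co-occurs with amusement.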

History of Research — From Ekman's Six Basic Emotions to Semantic Space Theory

The scientific foundation of emotional capture is built on three major theoretical currents.

Paul Ekman's Basic Emotion Theory (1960s–). In 1968, psychologist Paul Ekman tested the universality of facial expressions among an isolated tribe in Papua New Guinea, arguing that six basic emotions (anger, surprise, disgust, happiness, fear, and sadness) are universal across cultures. The FACS (Facial Action Coding System) he developed decomposes facial muscle movements into 28 Action Units (AUs) and infers emotions from their combinations. Most facial-expression-based emotion AI today is heavily influenced by FACS.

Lisa Feldman Barrett's Constructionist Theory of Emotion (2006–). Lisa Feldman Barrett, a psychologist at Northeastern University, directly challenged Ekman's universality. According to Barrett's Theory of Constructed Emotion (TCE), emotions are not reflexive responses to the world but are constructed on the fly by the brain through prediction. The same bodily sensation may be interpreted as "anger" by one person and "a stomachache" by another. Emotional granularity varies greatly between individuals, and on this view universal categories are an illusion; Barrett went so far as to state that "the classical view has lost, based on overwhelming evidence."

Alan Cowen's Semantic Space Theory (2017–). Alan Cowen, founder of Hume AI, proposed a third position that aligns with neither the six-category view nor constructionism. Semantic Space Theory (SST) is a data-driven approach that maps the entire emotional space. Through large-scale experiments using vast audio, facial, and linguistic stimuli paired with diverse emotion labels, it demonstrated that emotions are distributed not in discrete categories or simple dimensions, but in a continuous, high-dimensional semantic space. This is the theoretical basis for Hume AI's 53-dimensional emotion model.

The three-way theoretical debate among universalism (Ekman), constructionism (Barrett), and semantic spatialism (Cowen) remains unresolved. In industrial applications, however, SST-based high-dimensional approaches are increasingly treated as the de facto standard, on the argument that they yield richer emotional metadata than category-based models.

Technical Approach — 4 Modalities for Capturing Emotion

Emotional capture extracts and integrates emotional information from multiple modalities (input channels).

Facial Expression Recognition (Visual Modality)

Facial movements captured by a camera are decomposed into 28 FACS Action Units for real-time analysis. Subtle muscular movements such as brow raises (AU1+AU2), nose wrinkles (AU9), lip-corner pulls (AU12, the core of a smile), and jaw drops (AU26) are detected, and emotions are inferred from their combinations.
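
To make "inferring emotions from AU combinations" concrete, here is a minimal rule-based sketch. It scores each emotion by the fraction of its prototype AUs that are active, using commonly cited EMFACS-style prototypes (for example, happiness as AU6+AU12). The prototype table, the fraction-matched score, and the 0.75 threshold are simplifying assumptions; production systems generally rely on learned classifiers rather than a lookup like this.

```python
from typing import Dict, FrozenSet, Set

# Commonly cited prototype AU combinations (EMFACS-style), used here for illustration.
PROTOTYPES: Dict[str, FrozenSet[int]] = {
    "happiness": frozenset({6, 12}),
    "sadness": frozenset({1, 4, 15}),
    "surprise": frozenset({1, 2, 5, 26}),
    "fear": frozenset({1, 2, 4, 5, 7, 20, 26}),
    "anger": frozenset({4, 5, 7, 23}),
    "disgust": frozenset({9, 15, 16}),
}


def score_emotions(active_aus: Set[int]) -> Dict[str, float]:
    """Score each emotion by the fraction of its prototype AUs that are active."""
    return {emotion: len(aus & active_aus) / len(aus) for emotion, aus in PROTOTYPES.items()}


def infer_emotion(active_aus: Set[int], threshold: float = 0.75) -> str:
    """Return the best-matching emotion, or 'neutral' if no prototype matches well enough."""
    scores = score_emotions(active_aus)
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "neutral"


print(infer_emotion({6, 12, 25}))    # happiness (cheek raiser + lip-corner puller)
print(infer_emotion({1, 2, 5, 26}))  # surprise (brow raise + upper-lid raise + jaw drop)
```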

Micro-expression detection is at the forefront of emotional capture. Micro-expressions, facial movements lasting only a fraction of a second, serve as clues to "genuine emotions" that surface even when a person consciously tries to suppress them. AI can detect micro-expressions that are too brief for the human eye to register.
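
One ingredient of micro-expression detection is purely temporal: an expression counts as "micro" only if it is brief. The sketch below finds runs of an Action Unit intensity signal that exceed a threshold for less than half a second, a commonly used upper bound for micro-expressions. The threshold, the 0.5 s cutoff, and the single-AU input are assumptions; real detectors typically add learned spotting models on top of such a duration criterion.

```python
from typing import List, Tuple


def find_brief_events(intensity: List[float], fps: float, threshold: float = 0.5,
                      max_duration_s: float = 0.5) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for above-threshold runs shorter than max_duration_s."""
    events: List[Tuple[float, float]] = []
    start = None
    # A trailing sentinel below any threshold closes a run that reaches the last frame.
    for i, value in enumerate(list(intensity) + [float("-inf")]):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            if (i - start) / fps < max_duration_s:
                events.append((start / fps, i / fps))
            start = None
    return events


# 30 fps AU intensity: a 6-frame (0.2 s) spike, then a 30-frame (1.0 s) expression.
series = [0.1] * 10 + [0.8] * 6 + [0.1] * 10 + [0.7] * 30 + [0.1] * 5
print(find_brief_events(series, fps=30))
```

Only the 0.2 s spike (roughly 0.33 s to 0.53 s) is reported; the one-second expression is treated as an ordinary macro-expression.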

Affectiva (now Smart Eye) holds what it describes as the world's largest emotion dataset, built from over 10 million faces in 87 countries, and detects 28 Action Units in real time.

Speech Emotion Recognition (Audio Modality)

Emotions are inferred from speech prosody: pitch, rhythm, intensity, and duration. Pitch variation is the most prominent feature of emotional prosody: high pitch suggests excitement, joy, and surprise, while low pitch suggests sadness and calmness. Changes in speech rate, the insertion of pauses, and variations in vocal loudness are also important signals.
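
A minimal sketch of two of these prosodic features, computed with only the standard library: RMS energy as a loudness proxy, and a pitch estimate taken from the autocorrelation peak. The 75–400 Hz search range and the synthetic 200 Hz test tone are assumptions for illustration; toolkits such as openSMILE use more robust pitch trackers and many more features.

```python
import math
from typing import List


def rms_energy(frame: List[float]) -> float:
    """Root-mean-square amplitude of a frame, a simple loudness proxy."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_pitch(frame: List[float], sample_rate: int,
                   fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Estimate the fundamental frequency in Hz from the autocorrelation peak
    within the lag range corresponding to [fmin, fmax]."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized sum: shorter lags have more terms, which discourages octave errors.
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag


# Synthetic 50 ms frame: a 200 Hz tone sampled at 16 kHz.
sr = 16_000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
print(estimate_pitch(tone, sr), round(rms_energy(tone), 3))  # 200.0 0.354
```

Computing these two values frame by frame yields the pitch and loudness contours that prosody-based emotion models consume.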

Cogito's system analyzes over 200 acoustic and vocal signals in real time to provide emotional guidance to call center agents. Hume AI's EVI analyzes prosody in a 48-dimensional speech emotion space.

Multimodal Fusion

This approach integrates facial expressions, speech patterns, text data, and even physiological signals into a unified model. Studies report 15–20% higher accuracy than single-modality systems, and more than 40% of academic studies published since 2022 have adopted trimodal configurations or Transformer-based cross-modal fusion architectures.
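
The simplest way to see why combining channels helps is decision-level (late) fusion: each modality produces its own class probabilities, and a weighted average produces the fused estimate. This is a deliberately simpler technique than the Transformer-based cross-modal fusion described above, and the labels, probabilities, and weights are hypothetical.

```python
from typing import Dict, List

LABELS = ["anger", "happiness", "neutral", "sadness"]


def late_fusion(modality_probs: Dict[str, List[float]],
                weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted average of per-modality class probabilities (decision-level fusion)."""
    total_weight = sum(weights[m] for m in modality_probs)
    fused = [0.0] * len(LABELS)
    for modality, probs in modality_probs.items():
        for i, p in enumerate(probs):
            fused[i] += weights[modality] * p / total_weight
    return dict(zip(LABELS, fused))


probs = {
    "face": [0.10, 0.60, 0.25, 0.05],   # polite smile
    "voice": [0.05, 0.20, 0.15, 0.60],  # flat, low-pitched voice
    "text": [0.05, 0.15, 0.20, 0.60],   # "I guess it's fine..."
}
fused = late_fusion(probs, weights={"face": 0.4, "voice": 0.3, "text": 0.3})
print(max(fused, key=fused.get))  # sadness
```

The face alone would say "happiness"; the voice and text channels outvote it, which is the kind of correction that makes multimodal systems more accurate than any single channel.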

The 2025 paper "MemoCMT," published in a Nature Portfolio journal, proposed cross-modal Transformer-based feature fusion; "EA-FUSION" integrated EEG and facial expression data; and "HyFusER" realized hybrid fusion via dual cross-modal attention.
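
The core operation behind cross-modal attention can be sketched in a few lines: queries from one modality (here, text tokens) attend over keys and values from another (audio frames) via scaled dot-product attention. This is a bare illustration under simplifying assumptions: it omits the learned query/key/value projections, multiple heads, and everything else that models such as MemoCMT or HyFusER add, and the toy feature vectors are made up.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention where queries come from one modality
    and keys/values come from another."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted mix of the other modality's value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


# Toy 2-D features: one text-token query, three audio-frame keys/values.
text_tokens = [[1.0, 0.0]]
audio_keys = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
audio_values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused = cross_attention(text_tokens, audio_keys, audio_values)
print(fused)  # dominated by the first audio frame, whose key aligns with the query
```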

Wearable Physiological Signals (Physiological Modality)

Emotions can also be inferred from physiological signals such as electrodermal activity (EDA), heart rate variability (HRV), blood volume pulse (BVP), skin temperature, and electroencephalography (EEG). Research is underway on emotion classification with LSTM-GRU ensemble architectures that use accelerometer and gyroscope data from smartwatches, as well as signals from EEG headbands.
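
As an example of the kind of physiological feature such models consume, the sketch below computes RMSSD, a standard time-domain heart rate variability measure, together with mean heart rate from a list of RR (inter-beat) intervals. The interval values are hypothetical; treating lower short-term HRV as one indicator of stress or arousal is a common heuristic, not a direct emotion readout.

```python
import math
from typing import List


def rmssd(rr_ms: List[float]) -> float:
    """Root mean square of successive differences between RR intervals (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def mean_heart_rate_bpm(rr_ms: List[float]) -> float:
    """Average heart rate implied by the RR intervals."""
    return 60_000 / (sum(rr_ms) / len(rr_ms))


rr = [800, 810, 790, 805, 795]  # hypothetical inter-beat intervals in ms
print(round(rmssd(rr), 2), mean_heart_rate_bpm(rr))  # 14.36 75.0
```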

Key Services and Products — The Companies Leading the Market

Hume AI — Mapping a 53-Dimensional Emotional Space

Founded in 2021 by Alan Cowen, who holds a PhD in psychology, Hume AI is one of the most prominent companies in the emotional AI space. The company raised $50 million in its Series B (led by EQT Ventures, with participation from Union Square Ventures, Comcast Ventures, and LG Technology Ventures), bringing total funding to an estimated $74–80 million.

Its flagship product, the Empathic Voice Interface (EVI), is a conversational voice AI with emotional intelligence. EVI 3 (May 2025) supports over 100,000 custom voices, with sub-300ms response times and a practical latency of about 1.2 seconds, which Hume reports as outperforming GPT-4o and the Gemini Live API. EVI4-mini (January 2026) supports 11 languages, including Japanese.

The Expression Measurement API takes audio, video, or text as input and outputs emotional metadata across 53 dimensions (language), 48 dimensions (facial expression), and 48 dimensions (vocal prosody). At $0.08 per minute for audio/video and $0.00024 per word for text, pricing puts commercial use within practical reach.
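
Using the per-unit prices quoted above, a back-of-the-envelope cost estimate is simple arithmetic. The workload (1,000 hours of recordings) and the 150 words-per-minute transcript rate are hypothetical, and actual billing rules may differ from this flat per-unit model.

```python
# Per-unit prices as quoted in this article; check current pricing before relying on them.
AUDIO_VIDEO_USD_PER_MINUTE = 0.08
TEXT_USD_PER_WORD = 0.00024


def estimate_cost_usd(media_minutes: float, text_words: int = 0) -> float:
    """Rough cost of sending media minutes and text words through the API."""
    return media_minutes * AUDIO_VIDEO_USD_PER_MINUTE + text_words * TEXT_USD_PER_WORD


# Hypothetical workload: 1,000 hours of recorded calls plus their transcripts,
# assuming roughly 150 spoken words per minute.
minutes = 1_000 * 60
words = minutes * 150
print(f"media ${estimate_cost_usd(minutes):,.2f}, text ${estimate_cost_usd(0, words):,.2f}")
# media $4,800.00, text $2,160.00
```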

In January 2026, Google DeepMind recruited Hume AI CEO Alan Cowen and his engineering team to enhance Gemini's voice capabilities. Under a licensing agreement, Hume AI continues to operate as an independent company, with Andrew Ettinger appointed as CEO. The move is a landmark event symbolizing Google's recognition of the strategic value of emotional AI.

Hume AI also works closely with Anthropic: Claude models account for 36% of EVI configurations, with more than one million conversations and nearly two million minutes of interaction.

Smart Eye / Affectiva — The Emotional AI Standard for the Automotive Industry

Affectiva, which spun out of MIT Media Lab in 2009 and was co-founded by Professor Rosalind Picard, was acquired by Sweden's Smart Eye for $73.5 million in 2021.

With what it describes as the world's largest emotion dataset, covering facial data from over 10 million people in 87 countries, the company has secured 84 production contracts and partnered with 12 of the world's top 20 OEMs. Driver emotion monitoring is slated to be standard in 2026 model-year vehicles from BMW, Honda, and Volvo. The system detects fatigue, stress, and distraction to trigger alerts, and automatically adjusts the in-cabin environment (temperature, music, lighting) based on emotional state.

Realeyes — Predicting Ad Effectiveness Through Emotion

Realeyes, a leader in video-based emotion analysis for advertising, processes more than 8 million video views per month. Its collaboration with Mars is particularly noteworthy. Over two years, the partners built a database spanning 22,000 people, 149 ads, 35 brands, and 6 markets, demonstrating that emotion measurement technology can predict advertising sales lift with 75% accuracy. Based on these results, Mars has allocated 70% of media spend toward high-performing ads across all Tier 1 brands, sustaining tens of millions of dollars in annual sales lift for more than five years.

Coca-Cola, Unilever, and Hershey's are also among its clients.

Entropik Technologies — Multimodal Consumer Insights

Founded in Bangalore, India, in 2016, Entropik raised $25 million in a Series B led by Bessemer Venture Partners and SIG Venture Capital. The company offers "Affect Lab," a multimodal platform integrating EEG mapping, facial coding, and eye tracking, used by more than 150 global brands.

Other Notable Companies

Cogito offers emotional AI for call centers, analyzing more than 200 acoustic and vocal signals in real time and reporting customer satisfaction improvements of up to 20%. Uniphore acquired Spain's Emotion Research Lab to integrate voice emotion analysis into contact centers. Vocalis Health, an Israeli company formed from the merger of voice emotion analysis firm Beyond Verbal and Healthymize, is developing vocal biomarkers to detect heart disease, sleep disorders, and neurological conditions. MorphCast provides a browser-native, serverless emotional AI SDK. Emerging player Dubformer specializes in AI emotion-transfer dubbing and raised $3.6 million in seed funding led by Almaz Capital in early 2025.

The Democratization of Emotion Recognition Through Open Source

The technology of emotional capture is being rapidly democratized not only by commercial services but also by a growing ecosystem of open-source libraries. An environment is emerging in which researchers and startups can build their own emotion recognition systems from freely available components.

OSS for Facial Expression Recognition

DeepFace (22,469 GitHub stars, MIT License) is the most widely used Python library for face recognition and expression analysis. It can be installed with a single pip install deepface command, wraps multiple face recognition models including VGG-Face, FaceNet, and ArcFace, and classifies emotions into 7 categories: anger, disgust, fear, happiness, sadness, surprise, and neutral. It also supports real-time video analysis.
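
Per-frame classifiers tend to flicker between labels on live video, so real-time pipelines usually smooth their outputs. The sketch below applies an exponential moving average to per-frame emotion score dictionaries. It assumes scores shaped like the "emotion" field that DeepFace.analyze(img_path=..., actions=["emotion"]) returns for each detected face (labels such as "happy" and "surprise"); the frames are made up, the library call appears only in a comment so the sketch runs on its own, and the smoothing factor alpha is an assumption to tune.

```python
from typing import Dict, Iterable, Iterator


def smooth_scores(frames: Iterable[Dict[str, float]],
                  alpha: float = 0.3) -> Iterator[Dict[str, float]]:
    """Exponential moving average over per-frame emotion scores.
    Higher alpha reacts faster; lower alpha suppresses single-frame flicker."""
    state = None
    for scores in frames:
        if state is None:
            state = dict(scores)
        else:
            state = {k: alpha * scores[k] + (1 - alpha) * state[k] for k in state}
        yield state


# Made-up per-frame scores (0-100) in the shape of a DeepFace "emotion" field;
# in a live pipeline each dict might come from
# DeepFace.analyze(img_path=frame, actions=["emotion"])[0]["emotion"].
frames = [
    {"happy": 80.0, "surprise": 10.0, "neutral": 10.0},
    {"happy": 5.0, "surprise": 90.0, "neutral": 5.0},  # one-frame flicker
    {"happy": 80.0, "surprise": 10.0, "neutral": 10.0},
]
raw = [max(f, key=f.get) for f in frames]
smoothed = [max(s, key=s.get) for s in smooth_scores(frames)]
print(raw)       # ['happy', 'surprise', 'happy']
print(smoothed)  # ['happy', 'happy', 'happy']
```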

OpenFace 2.0 (7,610 stars, CMU MultiComp Lab) is the academic standard for real-time detection of 18 FACS-based Action Units. It integrates facial landmark detection, head pose estimation, and gaze estimation, and is one of the most cited tools in emotion recognition research. In 2025, OpenFace 3.0, a Python-based successor, was also released, integrating RetinaFace for face detection and STAR for landmark detection, enabling multi-task analysis of AUs, emotions, and gaze.
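
OpenFace's FeatureExtraction tool writes per-frame results to CSV, with AU intensity (_r) and presence (_c) columns. The sketch below parses such output and flags frames where both AU06 and AU12 are present, a combination usually associated with a genuine (Duchenne) smile. The column names are assumed to follow OpenFace 2.x output, the sample rows are invented, and the 0.8 confidence cutoff is an arbitrary choice.

```python
import csv
import io
from typing import List


def duchenne_smile_frames(csv_text: str, min_confidence: float = 0.8) -> List[int]:
    """Frames where AU06 (cheek raiser) and AU12 (lip-corner puller) are both present,
    skipping frames with failed or low-confidence tracking."""
    # skipinitialspace strips the space that follows each comma in the header and rows.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    frames = []
    for row in reader:
        if int(float(row["success"])) != 1 or float(row["confidence"]) < min_confidence:
            continue
        if float(row["AU06_c"]) >= 1 and float(row["AU12_c"]) >= 1:
            frames.append(int(row["frame"]))
    return frames


# Trimmed, made-up excerpt: a real FeatureExtraction CSV has many more columns.
sample = """frame, face_id, timestamp, confidence, success, AU06_r, AU12_r, AU06_c, AU12_c
1, 0, 0.000, 0.98, 1, 0.10, 0.20, 0, 0
2, 0, 0.033, 0.97, 1, 1.80, 2.40, 1, 1
3, 0, 0.067, 0.40, 0, 2.00, 2.50, 1, 1
4, 0, 0.100, 0.96, 1, 0.05, 2.10, 0, 1
"""
print(duchenne_smile_frames(sample))  # [2]
```

Frame 3 is skipped because tracking failed, and frame 4 shows a lip-corner pull without the cheek raise.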

EmotiEffLib (formerly HSEmotion, Apache-2.0 License) is a lightweight library that won first place in the ABAW (Affective Behavior Analysis in-the-Wild) competition. It supports both PyTorch and ONNX backends and performs real-time emotion and engagement recognition from photos and videos.

Py-Feat (MIT License, published in Affective Science) is a comprehensive toolbox for the detection, preprocessing, analysis, and visualization of facial expression data. It detects 7 emotions and Action Units from images and videos, and includes built-in statistical analysis tools such as t-tests and regression analysis.

Google's MediaPipe (34,482 stars, Apache-2.0) is not a tool dedicated to emotion recognition, but it outputs 468 3D facial landmarks and 52 blend shape scores in real time and is widely used as a foundation for building emotion classifiers. It also operates on mobile and edge devices.

OSS for Speech Emotion Recognition

SpeechBrain (11,410 stars, Apache-2.0) is a comprehensive PyTorch-based speech toolkit. It provides emotion recognition models fine-tuned on the IEMOCAP dataset using wav2vec2 and enables seamless integration with Hugging Face. In addition to speech recognition, speaker recognition, and speech enhancement, it includes recipes for emotion recognition.

emotion2vec (1,089 stars, ACL 2024) is the first general-purpose speech emotion representation model based on self-supervised pre-training. It provides emotion2vec+ models (seed/base/large) for 9-class emotion classification and has achieved state-of-the-art accuracy in multiple languages, including Chinese, French, German, and Italian, significantly outperforming other open-source models on Hugging Face.

Alibaba's SenseVoice (7,907 stars) is a speech foundation model that integrates speech recognition, language identification, emotion recognition, and audio event detection. It supports Chinese, Cantonese, English, Japanese, and Korean, and has demonstrated performance surpassing the best existing models without fine-tuning on target data.

openSMILE (794 stars, developed by TU Munich/audEERING) is an industry-standard tool for audio feature extraction in emotion recognition. It extracts MFCCs, prosodic features, and spectral features, and provides standard feature sets such as eGeMAPS and ComParE. It runs on Linux, Windows, macOS, Android, iOS, and Raspberry Pi.

OpenAI's Whisper (97,053 stars, MIT License) is a general-purpose speech recognition model, but fine-tuned derivatives have been applied to emotion recognition: a Whisper-large-v3 model fine-tuned on the RAVDESS/SAVEE/TESS datasets reportedly achieved approximately 92% accuracy across 7 emotions.

OSS for Multimodal Emotion Recognition

Emotion-LLaMA (550 stars, BSD-3 License) is a pioneering model for LLM-based multimodal emotion recognition and reasoning. It processes HuBERT (audio), VideoMAE (video), EVA/MAE (visual), and text through a LLaMA-based unified model, performing not only emotion recognition but also reasoning (why a given emotion arises).

EmoBox (314 stars, INTERSPEECH 2024) is a multilingual, multi-corpus speech emotion recognition benchmark toolkit covering 32 datasets and 14 languages. It benchmarks 10 pre-trained speech models and provides the most comprehensive SER (Speech Emotion Recognition) benchmark available.
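
SER benchmarks usually report more than one accuracy figure, because emotion corpora are imbalanced. A minimal sketch of two metrics commonly used in this field: weighted accuracy (WA, overall accuracy) and unweighted accuracy (UA, mean per-class recall). The toy labels are hypothetical; they show how a model that always predicts the majority class can look strong on WA while UA exposes it.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def wa_ua(y_true: List[str], y_pred: List[str]) -> Tuple[float, float]:
    """Weighted accuracy (overall accuracy) and unweighted accuracy
    (mean of per-class recall, so every class counts equally)."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    wa = sum(correct.values()) / len(y_true)
    ua = sum(correct[c] / total[c] for c in total) / len(total)
    return wa, ua


# Imbalanced toy set: a model that always answers "neutral".
y_true = ["neutral"] * 8 + ["anger"] * 2
y_pred = ["neutral"] * 10
print(wa_ua(y_true, y_pred))  # (0.8, 0.5)
```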

Key Datasets

Behind the emotion recognition OSS ecosystem lies an abundance of publicly available datasets. Research is supported by diverse datasets spanning modalities and scales: the image-based FER2013 (35,887 images, 7 emotions), the large-scale AffectNet (approx. 1 million images, 8 emotions + valence/arousal), the audio+video RAVDESS (7,356 files, 8 emotions), the audio+video+text IEMOCAP (approx. 12 hours, up to 9 emotions), the TV drama *Friends*-derived MELD (13,000+ utterances, 7 emotions + 3 sentiment polarities), and the Reddit comment-based GoEmotions (58,000 entries, 27 emotions plus neutral).
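
Because these datasets use different label sets, combining them usually requires mapping fine-grained labels onto a shared taxonomy. The sketch below collapses GoEmotions' 27 emotion labels onto Ekman's six categories. The grouping is assumed to match the Ekman mapping distributed with the GoEmotions dataset; verify it against the dataset repository before using it for training.

```python
from typing import Dict, Iterable, List, Set

# Ekman grouping of the 27 GoEmotions emotion labels ("neutral" is handled separately).
# Assumed to match the Ekman mapping shipped with the GoEmotions dataset.
EKMAN_GROUPS: Dict[str, List[str]] = {
    "anger": ["anger", "annoyance", "disapproval"],
    "disgust": ["disgust"],
    "fear": ["fear", "nervousness"],
    "joy": ["joy", "amusement", "approval", "excitement", "gratitude", "love",
            "optimism", "relief", "pride", "admiration", "desire", "caring"],
    "sadness": ["sadness", "disappointment", "embarrassment", "grief", "remorse"],
    "surprise": ["surprise", "realization", "confusion", "curiosity"],
}
FINE_TO_EKMAN = {fine: coarse for coarse, fines in EKMAN_GROUPS.items() for fine in fines}


def to_ekman(labels: Iterable[str]) -> Set[str]:
    """Map a multi-label GoEmotions annotation onto Ekman categories."""
    mapped = set()
    for label in labels:
        # A KeyError here flags a label outside the GoEmotions taxonomy.
        mapped.add("neutral" if label == "neutral" else FINE_TO_EKMAN[label])
    return mapped


print(sorted(to_ekman(["annoyance", "disappointment"])))  # ['anger', 'sadness']
print(sorted(to_ekman(["curiosity", "neutral"])))         # ['neutral', 'surprise']
```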

With the maturation of these OSS tools and datasets, emotional capture is no longer the exclusive domain of large corporations. An era has arrived in which individual developers and startups can combine DeepFace (images), SpeechBrain (audio), and Emotion-LLaMA (multimodal) to build their own emotion metadata generation pipelines.

Applied Domains — Industries Transformed by Emotional Metadata

Customer Service

Call centers represent the largest commercial market for emotion AI. Cogito's system analyzes agent calls in real time, and when customer frustration is detected, it displays guidance such as "slow down your explanation" or "soften your tone." Cogito reports that this improves customer satisfaction by up to 20%. Uniphore detects caller emotions through voice sentiment analysis, enabling intervention before escalation occurs.
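
A guidance prompt of this kind needs a trigger rule that does not fire on every momentary spike. The sketch below raises a prompt only after several consecutive high frustration scores and re-arms only once the score falls below a lower threshold (hysteresis). The frustration scores, thresholds, and window count are hypothetical; Cogito's actual logic is not described in this article.

```python
from typing import Iterable, List


def guidance_events(frustration: Iterable[float], on: float = 0.7, off: float = 0.5,
                    min_windows: int = 3) -> List[int]:
    """Indices of analysis windows at which a coaching prompt would be shown.

    A prompt fires after `min_windows` consecutive scores >= `on`; the trigger
    re-arms only after a score drops below `off` (hysteresis)."""
    events: List[int] = []
    streak, armed = 0, True
    for i, score in enumerate(frustration):
        if score < off:
            streak, armed = 0, True
        elif score >= on:
            streak += 1
            if armed and streak >= min_windows:
                events.append(i)
                armed = False
        else:
            streak = 0  # dead band: break the streak, leave the armed state unchanged
    return events


scores = [0.2, 0.75, 0.8, 0.9, 0.85, 0.6, 0.8, 0.3, 0.8, 0.8, 0.8]
print(guidance_events(scores))  # [3, 10]
```

The second high stretch (indices 6 to 7) produces no prompt because the score never dropped below the re-arm threshold before it.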

Healthcare & Mental Health

Healthcare applications of emotion AI represent the domain with the greatest social impact. Woebot detects anxiety, sadness, and stress from text and voice, delivering CBT (cognitive behavioral therapy)-based talk therapy. Ellie, a virtual interviewer developed at the University of Southern California's Institute for Creative Technologies, evaluates mental state through facial expressions, vocal tone, and speech patterns. In hospitals, the technology is used for emotional monitoring of patients with speech impairments, elderly individuals, and children.

Vocalis Health's voice biomarker technology holds the potential to non-invasively diagnose heart failure, sleep apnea, and neurological disorders from subtle changes in the voice.

Automotive (Driver Monitoring)

Smart Eye/Affectiva has secured 84 production contracts with 12 of the world's top 20 OEMs. BMW, Honda, and Volvo will include emotion monitoring as a standard feature in their 2026 models. The system detects driver fatigue, stress, and distraction in real time, issuing alerts and suggesting rest breaks. The in-cabin environment will also adjust automatically to emotional state: when stress levels are high, the system switches to relaxing music, lowers the temperature, and shifts lighting to warmer tones.
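
Fatigue detection in driver monitoring commonly builds on PERCLOS, the fraction of time the eyes are at least 80% closed over a window. A minimal sketch, assuming a per-frame eye-openness ratio (1.0 fully open, 0.0 closed) is already available from the face tracker; the 0.15 alert limit and the short 100-frame windows are hypothetical values, not figures from Smart Eye or any OEM.

```python
from typing import List


def perclos(eye_openness: List[float], closed_threshold: float = 0.2) -> float:
    """Fraction of frames with the eyes at least 80% closed (openness <= 0.2),
    the common 'P80' variant of PERCLOS."""
    if not eye_openness:
        raise ValueError("empty window")
    closed = sum(1 for x in eye_openness if x <= closed_threshold)
    return closed / len(eye_openness)


def drowsiness_alert(eye_openness: List[float], limit: float = 0.15) -> bool:
    """Hypothetical alert rule: flag the window if PERCLOS reaches `limit`."""
    return perclos(eye_openness) >= limit


alert_window = [1.0] * 80 + [0.1] * 20  # long, slow eye closures
alert_free = [1.0] * 95 + [0.05] * 5    # ordinary blinking
print(perclos(alert_window), drowsiness_alert(alert_window))  # 0.2 True
print(perclos(alert_free), drowsiness_alert(alert_free))      # 0.05 False
```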

Advertising & Marketing

The collaboration between Mars and Realeyes is the clearest demonstration of the commercial value of emotional metadata. By predicting ad sales lift with 75% accuracy through emotional measurement and allocating 70% of media spend to high-performing ads, the partnership has sustained tens of millions of dollars in annual sales lift for over five years.

Gaming & Entertainment

Adaptive gaming, which dynamically adjusts difficulty, story progression, and background music based on player emotions, is an active area of research. For streaming platforms, content recommendation based on viewer emotional state is the next frontier. An estimated 80% of Netflix viewing is attributed to its recommendation system, and emotional metadata has the potential to improve recommendation accuracy further.

Content Creation

Dubformer specializes in emotion transfer for AI dubbing, faithfully carrying the emotional expression of the source language into the dubbed language. Technology is also being developed to generate frame-level emotional metadata from video, enabling scene-based recommendations.
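
Frame-level metadata becomes useful for scene-based recommendation once it is collapsed into time segments. The sketch below run-length encodes per-frame dominant-emotion labels into start/end records and drops segments shorter than a minimum duration as noise. The labels, frame rate, and 1-second minimum are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Union

Segment = Dict[str, Union[float, str]]


def segment_by_emotion(frame_labels: List[str], fps: float,
                       min_len_s: float = 1.0) -> List[Segment]:
    """Run-length encode per-frame dominant-emotion labels into timed segments,
    dropping segments shorter than min_len_s as likely noise."""
    segments: List[Segment] = []
    index = 0
    for label, run in groupby(frame_labels):
        n = sum(1 for _ in run)
        start, end = index / fps, (index + n) / fps
        if end - start >= min_len_s:
            segments.append({"start": start, "end": end, "emotion": label})
        index += n
    return segments


# Hypothetical labels at 2 fps to keep the example short.
labels = ["neutral"] * 4 + ["joy"] + ["neutral"] * 2 + ["tension"] * 6
for seg in segment_by_emotion(labels, fps=2):
    print(seg)
```

The single half-second "joy" frame is discarded, leaving two neutral segments and a three-second "tension" segment that a recommender could index by timecode.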

Ethics & Regulation — The Impact of the EU AI Act and Responsible Innovation

EU AI Act (Prohibitions in Force Since February 2025)

The EU AI Act imposes some of the strictest rules anywhere on emotion-recognition AI. Article 5(1)(f) explicitly prohibits placing on the market, putting into service, or using AI systems that infer emotions in workplaces and educational institutions. Violations carry fines of up to €35 million or 7% of global annual turnover, whichever is higher.

Specifically prohibited uses include employee emotion tracking via webcams and voice recognition in call centers, estimation of student interest and attention in educational settings, and emotion recognition in hiring processes. However, medical and safety purposes (such as driver fatigue detection and pilot attention monitoring) are permitted as exceptions.
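
The "whichever is higher" rule means the fine ceiling scales with company size. A one-line sketch of the upper bound as stated above, applied to hypothetical turnover figures; it illustrates the arithmetic only and ignores any special provisions that may apply to particular categories of companies.

```python
def max_fine_eur(worldwide_annual_turnover_eur: float) -> float:
    """Upper bound for prohibited-practice violations as described above:
    EUR 35 million or 7% of worldwide annual turnover, whichever is higher."""
    return max(35_000_000.0, 0.07 * worldwide_annual_turnover_eur)


for turnover in (200e6, 2e9):  # hypothetical company turnovers
    print(f"turnover EUR {turnover:,.0f} -> cap EUR {max_fine_eur(turnover):,.0f}")
# turnover EUR 200,000,000 -> cap EUR 35,000,000
# turnover EUR 2,000,000,000 -> cap EUR 140,000,000
```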

Bias and Fairness

The bias problem in emotion AI is severe. Multiple studies have reported higher misrecognition rates for people with darker skin tones, for men, and for individuals from cultural backgrounds underrepresented in training data. Insufficient diversity in training data, false assumptions about the universality of facial expressions, and differences in emotional expression due to neurological conditions or disabilities all risk producing discriminatory outcomes. Research presented at ACM FAccT 2025 found that people with disabilities and people with minority gender identities tend to view emotion AI data collection negatively.
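
Disparities like these are typically surfaced by disaggregated evaluation: computing error rates separately for each demographic group rather than reporting one overall figure. A minimal sketch with hypothetical group names and toy predictions; real audits also need enough samples per group, and confidence intervals, before drawing conclusions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def error_rates_by_group(records: List[Tuple[str, str, str]]) -> Dict[str, float]:
    """records are (group, true_label, predicted_label); returns the error rate per group."""
    errors: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for group, truth, pred in records:
        counts[group] += 1
        errors[group] += int(truth != pred)
    return {g: errors[g] / counts[g] for g in counts}


def max_disparity(rates: Dict[str, float]) -> float:
    """Gap between the worst- and best-served groups."""
    return max(rates.values()) - min(rates.values())


# Toy evaluation set with hypothetical group names.
records = ([("group_a", "happy", "happy")] * 9 + [("group_a", "happy", "angry")]
           + [("group_b", "happy", "happy")] * 7 + [("group_b", "happy", "angry")] * 3)
rates = error_rates_by_group(records)
print({g: round(r, 2) for g, r in rates.items()}, round(max_disparity(rates), 2))
# {'group_a': 0.1, 'group_b': 0.3} 0.2
```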

Hume AI's Ethical Framework

Hume AI established The Hume Initiative, defining six ethical principles: Beneficence, Emotional Primacy, Scientific Legitimacy, Inclusivity, Transparency, and Consent. In particular, the principle that "AI must never be permitted to treat human emotions as a means to an end" draws a clear line around the commercial use of emotion AI. Its guidance to treat outputs as "measurements of complex expressive behavior" rather than "direct inference of emotions" draws a distinction that is significant both scientifically and ethically.

Trends in Japan — Building a Five-Senses × Brain Data Foundation

In Japan, the construction of an emotional AI infrastructure is underway, led by the government.

The Ministry of Internal Affairs and Communications will support the development of "next-generation AI that reads emotions" for approximately five years starting in fiscal year 2026. Budget has been allocated to joint research between NICT (National Institute of Information and Communications Technology) and Osaka University, with plans to build a database of brain activity data covering the five senses, including smell, touch, and taste. The ministry has designated "brain information communications" as a priority area for the 2030s, promoting foundational emotional AI technology as a national policy.

NEC has deployed the emotion-analyzing signage system "Target Advertising Signage," which instantly identifies the age, gender, and facial expressions of store visitors and displays the most relevant product videos in real time. The company aims to capture the top share in a global market worth 360 billion yen.

NTT's Communication Science Laboratories have modeled psychological state changes from facial expressions and voice, and have released the EMPAC Dataset (Empathic Video Stimulus Dataset). This dataset, comprising emotion-inducing videos and rating data across six categories (anger, disgust, fear, joy, sadness, and surprise), is provided free of charge to the research community.

PKSHA Technology's "PKSHA Speech Insight" is an AI-powered speech recognition and analysis platform for contact centers, enabling early detection of complaints through real-time emotional analysis of phone calls.

Market Size and Future Outlook

Market Forecast

Multiple research firms consistently project high growth for the emotional AI market. The market size in 2025 is estimated at $3.4–4.7 billion, expanding to $9.5–15.6 billion by 2030 and projected to reach $38.5 billion by 2035. A CAGR of 15–27% would place emotional AI among the fastest-growing segments of the AI sector.

Narrowing the scope to the multimodal affective computing market, it is expected to double from $7 billion in 2025 to $14.4 billion in 2030. North America holds the largest market share in 2025, while Asia-Pacific is the fastest-growing region.
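
The growth rates can be checked directly from the endpoint estimates, since CAGR = (end / start)^(1 / years) - 1. A short sketch applying it to the 2025 to 2030 figures above, plus the Mordor Intelligence pair ($4.52B to $9.47B) from the references; the pairings are chosen for illustration.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# 2025 -> 2030 estimates in billions of USD, as quoted in this article.
estimates = {
    "lowest estimates": (3.4, 9.5),
    "highest estimates": (4.7, 15.6),
    "Mordor Intelligence": (4.52, 9.47),
    "multimodal affective computing": (7.0, 14.4),
}
for name, (start, end) in estimates.items():
    print(f"{name}: {cagr(start, end, 5):.1%}")
# lowest estimates: 22.8%
# highest estimates: 27.1%
# Mordor Intelligence: 15.9%
# multimodal affective computing: 15.5%
```

The quoted 15–27% band is consistent with these pairings, and the "doubling" of the multimodal segment corresponds to roughly 15% a year.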

Outlook

2026–2027: Driver monitoring systems in vehicles will become standard equipment among major OEMs. When the EU AI Act becomes broadly applicable in August 2026 (its ban on emotion recognition in workplaces and education has already applied since February 2025), legal deployment in the healthcare and safety sectors is expected to accelerate. Google is expected to significantly enhance Gemini's emotional dialogue capabilities using Hume AI's technology, making emotional AI a standard feature of foundation models.

2028–2030: Multimodal fusion accuracy will surpass 90%, and real-time emotional metadata will become standard in content delivery. Emotion-based personalization will become widespread across streaming platforms, advertising, and gaming. Non-invasive health diagnostics using vocal biomarkers will begin receiving FDA approval.

2030 and beyond: Emotional metadata will become standard accompanying data for video and audio content, treated on par with subtitles and timecodes. AI that understands human emotions will hold an overwhelming competitive advantage over AI that does not. Within McKinsey's estimated economic impact of multimodal AI (trillions of dollars annually), emotional AI will become a key component.

VentureBeat's article on the $50 million investment in Hume AI put the question directly in its headline:

"Is AI's next big leap understanding emotion?"

Google DeepMind's recruitment of the Hume AI team, Mars Inc.'s five years of empirical data, BMW/Honda/Volvo's decisions to include driver emotion monitoring in their 2026 models, and the five-year support program from Japan's Ministry of Internal Affairs and Communications increasingly point to yes.

Impact on the Industry

First, emotional capture has the potential to fundamentally transform AI-human interaction. An AI that cannot understand emotions remains a "tool," but an AI that can understand emotions can become a "companion." As demonstrated by Hume AI's EVI, voice interaction with emotional intelligence produces a qualitatively different user experience compared to conventional chatbots.

Second, the impact on the advertising and marketing industry has already been proven. The collaboration between Mars and Realeyes demonstrated that emotional metadata can predict advertising sales lift with 75% accuracy. This represents a new dimension of effectiveness measurement that complements conventional digital marketing metrics relying on A/B testing and click-through rates.

Third, in the automotive industry, driver monitoring is becoming a standard feature for both safety and comfort. Smart Eye/Affectiva's 84 production contracts and partnerships with 12 of the world's top 20 OEMs signify that this technology has moved beyond the experimental stage and entered mass production.

Fourth, in healthcare, non-invasive diagnosis through voice biomarkers has the potential to revolutionize early detection and care, including for mental health. Vocalis Health's work on detecting cardiac and neurological conditions from subtle changes in the voice suggests a future where health screening can be performed with a single smartphone.

Fifth, in Japan, the Ministry of Internal Affairs and Communications' five-year support program and the construction of a five-senses × brain data infrastructure will determine Japan's international competitiveness in emotional AI. The commercial deployments by NEC, NTT, and PKSHA Technology will accelerate implementation in the Japanese market.

Sixth, the EU AI Act's regulations serve not to hinder innovation, but to direct it. The prohibition of emotion AI in workplaces and education creates pressure to develop it not as a surveillance tool, but as a value-creation tool for healthcare, safety, and entertainment. Hume AI's ethical framework serves as a model case for this direction.

References: Rosalind Picard, *Affective Computing* (MIT Press, 1997); Paul Ekman, Facial Action Coding System (FACS); Lisa Feldman Barrett, "Theory of Constructed Emotion" (PMC, 2017); Alan Cowen, Semantic Space Theory (SST); Hume AI Series B $50M (EQT Ventures, Union Square Ventures, Comcast Ventures, LG Technology Ventures); Hume AI EVI 3 (May 2025); Hume AI EVI4-mini (January 2026, 11-language support); Google DeepMind recruits Hume AI CEO Alan Cowen (TechCrunch, PYMNTS, January 2026); Hume AI + Anthropic Claude partnership; Smart Eye acquires Affectiva for $73.5M (TechCrunch, May 2021); Affectiva 84 production contracts / partnerships with 12 of the world's Top 20 OEMs; Realeyes + Mars advertising sales lift prediction 75% accuracy; Entropik Series B $25M (Bessemer Venture Partners, SIG); Cogito 20% improvement in customer satisfaction; Uniphore acquires Emotion Research Lab (January 2021); Vocalis Health (Beyond Verbal + Healthymize) $9M (aMoon); Apple acquires Emotient (Fortune, January 2016); Amazon Halo discontinued (GeekWire, April 2023); MorphCast browser-native emotion AI; Dubformer $3.6M seed (Almaz Capital, 2025); Nature "MemoCMT Cross-Modal Transformer" (2025); Wiley "Advancements in Emotion Classification"; Nature "EmoWear Dataset"; PMC "Comprehensive Review of Multimodal Emotion Recognition"; EU AI Act Article 5(1)(f) prohibition of emotion inference in workplaces and education (applicable from February 2025); Illinois Biometric Information Privacy Act (BIPA); ACM FAccT 2025 "Distinguishing Emotion AI"; Hume Initiative 6 ethical principles; Ministry of Internal Affairs and Communications "Next-Generation AI that Reads Emotions" five-year development support (Nikkei Shimbun, 2025); NICT + Osaka University five-senses brain activity database; NEC emotion analysis signage; NTT EMPAC Dataset; PKSHA Speech Insight; Research and Markets Emotion AI Market $4.71B (2025); Fortune Business Insights $3.4B (2025); Mordor Intelligence $4.52B/$9.47B (2025/2030); EIN Presswire $15.57B (2030); Roots Analysis $38.50B (2035); VentureBeat "Is AI's Next Big Leap Understanding Emotion?"; Contrary Research Hume AI; GM Insights Emotion AI Market 2025–2034; GitHub: DeepFace (serengil/deepface, 22.4K stars, MIT); GitHub: OpenFace 2.0 (TadasBaltrusaitis/OpenFace, 7.6K stars); GitHub: OpenFace 3.0 (CMU-MultiComp-Lab/OpenFace-3.0); GitHub: EmotiEffLib (sb-ai-lab/EmotiEffLib, Apache-2.0, ABAW 1st place); GitHub: Py-Feat (cosanlab/py-feat, MIT, published in Affective Science); GitHub: MediaPipe (google-ai-edge/mediapipe, 34.5K stars, Apache-2.0); GitHub: SpeechBrain (speechbrain/speechbrain, 11.4K stars, Apache-2.0); GitHub: emotion2vec (ddlBoJack/emotion2vec, ACL 2024); GitHub: SenseVoice (FunAudioLLM/SenseVoice, 7.9K stars, Alibaba); GitHub: openSMILE (audeering/opensmile, TU Munich/audEERING); GitHub: librosa (librosa/librosa, 8.3K stars, ISC); GitHub: Whisper (openai/whisper, 97K stars, MIT); GitHub: FunASR (modelscope/FunASR, 15.5K stars, MIT); GitHub: Emotion-LLaMA (ZebangCheng/Emotion-LLaMA, BSD-3, multimodal); GitHub: EmoBox (emo-box/EmoBox, INTERSPEECH 2024, 32 datasets/14 languages); GitHub: conv-emotion (declare-lab/conv-emotion, MIT, conversational emotion recognition); Hugging Face: SamLowe/roberta-base-go_emotions (28 emotions, GoEmotions); Hugging Face: speechbrain/emotion-recognition-wav2vec2-IEMOCAP; Datasets: FER2013, AffectNet, RAVDESS, IEMOCAP, MELD, GoEmotions