How Your Thoughts Are Built & How You Can Shape Them | Dr. Jennifer Groh

Summary

Dr. Jennifer Groh, professor of psychology and neuroscience at Duke University, explains how the brain encodes and integrates sensory information — particularly sight and sound — to create a coherent perception of the world. She details the neural mechanisms behind sound localization, multisensory binding, and the surprising discovery that eye movements directly modulate how the ears process sound. The conversation culminates in a compelling theory of what thoughts actually are at the neural level: simulations run across the brain’s sensory-motor infrastructure.


Key Takeaways

  • Thoughts are sensory simulations: When you think of a concept (e.g., a cat), your brain runs a mini-simulation across visual, auditory, olfactory, and other sensory cortices simultaneously.
  • Eye movements physically move your eardrums: Every saccadic eye movement causes a precisely timed, coordinated movement in both eardrums — the right and left eardrums move in opposite directions — suggesting the brain integrates vision and hearing at the earliest possible stage.
  • Sound localization is an extraordinary computational feat: The brain detects interaural time differences as small as half a millisecond — shorter than a single action potential — to determine where a sound originates.
  • Your brain actively turns down your hearing when you speak: A top-down mechanism reduces auditory transduction just before speech so that your own voice, produced only a short distance from your ears, is not overwhelming.
  • Sensory integration must be continuously learned: Because a baby’s head is roughly half the width of an adult’s, the timing cues for sound localization change throughout development and must be constantly recalibrated.
  • Hearing loss is linked to dementia: Reduced sensory input may cause the brain to downregulate circuits, contributing to memory and attention decline.
  • Protect your hearing: If someone nearby can detect that any sound is coming from your headphones, the volume is likely causing permanent hearing damage.
  • Rhythm is universal across all human cultures: Even cultures without melody or harmony have rhythm, supporting theories that rhythmic coordination evolved to enable group cooperation and competition.
  • The ventriloquist effect reveals how the brain assigns sound sources: The brain continuously calculates the “most likely candidate” for a sound’s origin and can override raw auditory input with visual information.
  • Low-frequency sound travels farther and bends around objects better than high-frequency sound, which is why warning signals (drums, bass horns) favor lower frequencies.

Detailed Notes

The Nature of Thought: Sensory Simulation Theory

  • The leading theory discussed is that thinking is the brain running sensory-motor simulations.
  • When you think of a concept like “cat,” the brain simultaneously activates:
    • Visual cortex (what the cat looks like)
    • Auditory cortex (what a cat sounds like)
    • Olfactory and other sensory regions (e.g., smell of kitty litter)
  • This explains why cognitive load from one modality impairs another: telling a passenger “be quiet” while merging in traffic reflects the need to redirect shared sensory-motor resources toward the immediate task.
  • This framework offers perhaps the most mechanistically grounded definition of thought in neuroscience: thoughts are internally generated sensory simulations, not abstract symbolic processes.

The Superior Colliculus: Where Senses First Converge

  • The superior colliculus is a midbrain structure responsive to both visual and auditory stimuli.
  • Key discovery: auditory neurons in the superior colliculus shift their receptive fields based on where the eyes are pointing.
  • This was the founding observation of Dr. Groh’s career and set the framework for understanding dynamic, eye-position-dependent auditory maps.
  • The brain must continuously compute: “Where is this sound relative to where my eyes are looking?” — a calculation that involves reference frame transformation.
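The reference-frame computation above can be written down in its simplest one-dimensional form. This is an illustrative sketch, not the lab's actual model; the function name and the convention that azimuths are signed angles in degrees are my assumptions:

```python
def sound_relative_to_gaze(sound_azimuth_deg: float,
                           eye_azimuth_deg: float) -> float:
    """Convert a head-centered sound direction into eye-centered
    coordinates by subtracting the current eye position: a minimal,
    one-dimensional sketch of the reference frame transformation.

    Positive azimuths are rightward; 0 is straight ahead of the nose.
    """
    return sound_azimuth_deg - eye_azimuth_deg

# A sound 20 degrees right of the nose, while the eyes also look
# 20 degrees right, lands at 0 in eye-centered coordinates:
foveal = sound_relative_to_gaze(20.0, 20.0)
```

Even this toy version shows why the auditory map must be dynamic: the same head-centered sound direction maps to a different eye-centered location every time gaze shifts.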

Sound Localization: The Mechanics

  • The brain determines sound location using:
    1. Interaural time difference (ITD): Sound reaches one ear before the other; the maximum possible delay is ~0.5 milliseconds (less than the duration of a single action potential).
    2. Interaural level difference (ILD): The head creates an acoustic shadow, making sound slightly quieter in the far ear.
    3. Spectral filtering by the pinna: The outer ear’s folds create a unique frequency “fingerprint” depending on the direction of the sound — and this fingerprint is unique to each individual.
  • People with cauliflower ears (e.g., wrestlers) likely have altered spectral filtering and may initially struggle with sound localization but can adapt over time.
  • Distance perception for sound relies on:
    • Loudness relative to known source intensity (e.g., thunder)
    • Room acoustics and echo delays: The brain computes the difference in arrival time between direct-path sound and copies bouncing off nearby surfaces to estimate distance.
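The ITD cue above can be sketched numerically with Woodworth's classic spherical-head approximation. The head radius and speed of sound below are typical textbook constants, not values taken from the conversation:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 C
HEAD_RADIUS = 0.0875    # m, a typical adult value

def itd_seconds(azimuth_deg: float) -> float:
    """Woodworth's spherical-head approximation of the interaural
    time difference for a distant source (0 deg = straight ahead,
    90 deg = directly to one side)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

# A source directly to the side yields the maximum delay,
# on the order of half a millisecond:
max_itd_ms = itd_seconds(90.0) * 1000.0
```

With these constants the maximum works out to roughly 0.66 ms, squarely in the sub-millisecond range the conversation describes as shorter than a single action potential.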

Eye Movements and the Eardrum: A Groundbreaking Finding

  • Dr. Groh’s lab discovered that every saccadic eye movement causes the eardrums to move, even in total silence.
  • The movement is:
    • Precisely time-locked to the onset of the eye movement
    • Differential: If eyes move left, the right eardrum bulges inward while the left bulges outward (and vice versa) — moving in a coordinated wave-like pattern
    • Graded: The magnitude encodes how far the eyes moved; some vertical movement information is also encoded
  • This was detected non-invasively using a microphone placed in the ear canal to detect otoacoustic emissions.
  • The mechanism is thought to involve top-down signals from the brain activating the middle ear muscles and outer hair cells (which can contract like muscles), which then tug on the ossicles and eardrum.
  • This may represent the earliest stage of visual-auditory integration — happening at the level of the ear itself, not just in the brain.

Multisensory Binding and the Ventriloquist Effect

  • The brain continuously evaluates which sensory sources are likely to belong together and merges them accordingly.
  • Temporal synchrony is critical: even tiny offsets between lip movements and sound cause perception to break down (as seen in poorly synced videos).
  • The ventriloquist effect demonstrates the brain’s willingness to override auditory localization with visual information — the brain attributes sound to the puppet because its mouth movements correlate with the audio.
  • When watching video or movies, the brain remaps sound sources based on what is seen on screen, even though the physical source of the sound (speakers, earbuds) is constant.
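One standard way to model this "most likely candidate" computation is maximum-likelihood cue combination, in which each modality's location estimate is weighted by its reliability (inverse variance). The function name and the numbers below are illustrative, not taken from the episode:

```python
def fused_location(aud_deg: float, aud_sigma: float,
                   vis_deg: float, vis_sigma: float) -> float:
    """Reliability-weighted average of auditory and visual location
    estimates: each cue is weighted by its inverse variance, so the
    sharper cue dominates the fused percept."""
    w_aud = 1.0 / aud_sigma ** 2
    w_vis = 1.0 / vis_sigma ** 2
    return (w_aud * aud_deg + w_vis * vis_deg) / (w_aud + w_vis)

# A blurry auditory estimate (sigma = 10 deg) combined with a sharp
# visual one (sigma = 1 deg): the percept lands almost exactly on
# the visual source, i.e., the puppet's mouth.
estimate = fused_location(aud_deg=15.0, aud_sigma=10.0,
                          vis_deg=0.0, vis_sigma=1.0)
```

Because vision localizes far more precisely than hearing in the horizontal plane, the fused estimate is captured almost entirely by the visual cue, which is exactly the ventriloquist effect.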

Why Your Voice Sounds Strange in Recordings

Three reasons:

  1. Recordings don’t capture the full frequency spectrum of your voice.
  2. The brain suppresses auditory input just before and during speech — a precisely timed volume-reduction mechanism to prevent self-generated sound from being overwhelming (given how close the mouth is to the ears).
  3. Bone conduction: Much of how you hear your own voice live is transmitted through bone (skull vibrations), not air — recordings capture only the air-conducted component.

Hearing Health and Hearing Loss

  • 80% of people will experience hearing loss if they live long enough.
  • Young people are accumulating hearing damage earlier than previous generations due to constant earbud use at high volumes.
  • Safe listening guideline: If anyone near you can detect that sound is coming from your headphones — not even the specific content, just the presence of sound — the volume is likely causing permanent hearing damage.
  • Noise-canceling headphones are preferable to turning up volume to overcome ambient noise.
  • Hearing loss is correlated with dementia: less sensory input may cause the brain to downregulate circuits, leading to memory and attention decline.
  • Bluetooth radiation: According to neurosurgeons familiar with the topic, Bluetooth headphone radiation is considerably lower than everyday environmental EMF exposure and is not considered a significant concern.

Music, Rhythm, and Evolution

  • Rhythm is the only musical element universal across all human cultures — melody and harmony vary, but rhythm does not.
  • A prominent theory: rhythm and music evolved to enable coordinated group action (stomping, shouting together) to compete with predators or rival groups — being collectively louder than any individual.
  • Music may also have evolved via sexual selection: musically skilled individuals may have had more offspring (similar to peacock plumage).
  • Music organizes language for memory: knowing the first two or three words of a verse can cue recall of the rest of the line.