Image by Tomislav Jakupec from Pixabay

“Sound Transmission Isn’t Necessary”: Visual Microphone Decodes Vibrations to Hear Without Listening

Beijing Institute of Technology scientists have developed a visual microphone that interprets light changes on material surfaces caused by auditory vibrations to decode music, speech, and other sounds that it cannot directly hear.

Scientists have previously created similar systems that use costly lasers or high-speed cameras to interpret sound vibrations on glass windows and other surfaces. The newly invented system, by contrast, uses a low-cost, highly efficient single-pixel camera and can decode sounds by measuring vibrations on everyday surfaces such as leaves and sheets of paper, potentially even through glass and other translucent materials.

The team behind the novel listening device said the minimal data stream from a single-pixel camera also makes it more practical to build listening devices that operate continuously, without ever directly hearing the sounds they monitor.

“Our method simplifies and reduces the cost of using light to capture sound while also enabling applications in scenarios where traditional microphones are ineffective, such as conversing through a glass window,” explained team leader Xu-Ri Yao from Beijing Institute of Technology in China. “As long as there is a way for light to pass through, sound transmission isn’t necessary.”

Visual Microphone Detects Light Instead of Sound

Unlike traditional digital cameras that capture images with millions of pixels, a single-pixel camera records far less visual information. Rather than capturing an entire image at once, the research team explained, a spatial light modulator modulates the scene's light with time-varying structured patterns, and a single-pixel detector measures the amount of modulated light for each pattern. While not suited to high-definition imagery, this limited stream of measurements allows a computer to reconstruct visual information about the object.
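The pattern-by-pattern measurement scheme described above can be sketched in a few lines. This is a hypothetical toy example, not the team's actual system: a tiny four-"pixel" scene is measured through orthogonal Hadamard patterns, one total-intensity reading per pattern, and then recovered by the inverse transform.

```python
# Toy single-pixel imaging sketch (hypothetical; not the published setup):
# the scene is never imaged directly. Instead, structured patterns are
# projected onto it and a single detector records one total-intensity
# value per pattern. Orthogonal (Hadamard) patterns let a computer
# reconstruct the scene from those readings.

def hadamard(n):
    """Recursively build an n x n Hadamard matrix (n a power of 2)."""
    if n == 1:
        return [[1]]
    h = hadamard(n // 2)
    top = [row + row for row in h]
    bottom = [row + [-v for v in row] for row in h]
    return top + bottom

def measure(scene, patterns):
    """One single-pixel reading per pattern: total modulated light."""
    return [sum(p * s for p, s in zip(pat, scene)) for pat in patterns]

def reconstruct(readings, patterns):
    """Invert the orthogonal pattern set: x = H^T y / n."""
    n = len(patterns)
    return [sum(patterns[i][j] * readings[i] for i in range(n)) / n
            for j in range(n)]

scene = [3.0, 1.0, 4.0, 1.0]        # toy 4-"pixel" scene
patterns = hadamard(4)
readings = measure(scene, patterns)  # four single-pixel measurements
recovered = reconstruct(readings, patterns)
```

Because the patterns are orthogonal, the reconstruction exactly recovers the toy scene; a real system works with noisy, compressed measurements and far larger pattern sets.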

To build their visual microphone, Yao’s team utilized a device known as a high-speed spatial light modulator. Unlike a sound-based microphone that converts vibrations to electrical signals that can be amplified, recorded, or otherwise manipulated, a microphone using this type of light modulator receives encoded information from vibrations detected by changes in light.

“The sound-induced motion causes subtle changes in light intensity that were captured by the single-pixel detector and decoded into audible sound,” they explained.

To interpret the motions captured by the visual microphone, the software relied on Fourier-based localization methods to measure the extremely minute variations in an object's surface caused by sound.
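The core idea of pulling a sound out of a light signal can be illustrated with a simplified Fourier analysis. In this hypothetical sketch (the published method applies Fourier-based localization to structured-pattern data, which is more involved), a vibrating surface imposes a tiny sinusoidal ripple on a detector's intensity trace, and a discrete Fourier transform reveals the vibration frequency hiding under the large steady background.

```python
import cmath
import math

# Hypothetical illustration: a surface vibrating at 440 Hz adds a tiny
# modulation (0.1% of the background) to a single-pixel intensity trace.
# Removing the DC background and taking a DFT localizes the tone.

fs = 8000        # detector sampling rate in Hz (assumed for this sketch)
n = 800          # 0.1 s of samples
tone = 440.0     # vibration frequency to recover, in Hz

# Simulated intensity trace: large steady light level + tiny vibration.
trace = [1.0 + 1e-3 * math.sin(2 * math.pi * tone * t / fs)
         for t in range(n)]

# Subtract the mean (the DC background), then scan the DFT bins.
mean = sum(trace) / n
ac = [v - mean for v in trace]
spectrum = [abs(sum(ac[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
recovered_hz = peak_bin * fs / n
```

Even though the vibration changes the detected light by only a tenth of a percent, the frequency-domain peak stands out clearly, which is why such faint surface motions can still be decoded into audio.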

Tests Prove Device Can Decode Speech and Music

After completing the design of their visual microphone, the team tested its performance. In the first series of tests, the team placed their invention 0.5 meters away from a piece of paper. In the second series, the paper was replaced by a plant leaf.

While recording the device’s digital output, the researchers used both Chinese and English words, as well as a musical segment from Beethoven’s “Für Elise.” After recording the subtle vibrations caused by the words and music on the paper and leaf test objects, the team performed their Fourier analysis.

According to the team’s statement, the system successfully reconstructed “clear and intelligible audio” from both samples. When comparing the results, the team found that the paper card produced better results than the leaf.

The researchers successfully reconstructed audio signals by imaging vibration from a paper card (a-c). They applied a signal processing filter to enhance the high-frequency component of the signal (d-f). Image Credit: Xu-Ri Yao, Beijing Institute of Technology

A more detailed analysis of the data captured by the visual microphone, and of the sound the interpretation software produced from it, revealed that low-frequency sounds below 1 kHz were accurately recovered. Conversely, high-frequency sounds above 1 kHz "showed slight distortion," which the team improved by applying a signal processing filter.
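The article does not describe the team's actual filter design, but the general idea of boosting a weakened high band can be sketched as a simple high-frequency emphasis filter: subtract a smoothed (low-pass) copy of the signal to isolate the high-frequency component, then add it back with extra gain. Everything here, including the `alpha` and `gain` values, is a hypothetical illustration.

```python
# Hypothetical high-frequency emphasis filter (not the paper's filter):
# y = x + gain * (x - lowpass(x)). The low-pass tracker removes the
# slowly varying part, so the boosted term contains only high frequencies.

def emphasize_highs(signal, alpha=0.7, gain=2.0):
    """Boost high frequencies while passing DC through unchanged."""
    out, low = [], 0.0
    for x in signal:
        low = alpha * low + (1 - alpha) * x   # one-pole low-pass tracker
        out.append(x + gain * (x - low))      # add amplified high band
    return out
```

A constant (low-frequency) input passes through at unity gain once the tracker settles, while a rapidly alternating (high-frequency) input comes out roughly three times larger with these example settings.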

“Combining single-pixel imaging with Fourier-based localization methods allowed us to achieve high-quality sound detection using simpler equipment and at a lower cost,” said Yao. “Our system enables sound detection using everyday items like paper cards and leaves, under natural lighting conditions, and doesn’t require the vibrating surface to reflect light in a certain way.”

Improving Detection Distance and Decoding Accuracy

Because the visual vibration information was captured with a single-pixel camera, the device reportedly achieved a data rate of 4 MB/s. The team stated this rate was "sufficiently low" to minimize data storage demands and potentially allow for long-term recording. They also noted that this versatility is valuable in remote locations, as smaller amounts of data can be uploaded to the web in real time, "enabling long-duration or even continuous sound recording."

“The new technology could potentially change the way we record and monitor sound, bringing new opportunities to many fields, such as environmental monitoring, security, and industrial diagnostics,” said Yao. “For example, it could make it possible to talk to someone stuck in a closed-off space like a room or a vehicle.”

The team is already exploring ways to improve the device’s overall performance. According to their statement, the top goals are to improve the distance at which the system can accurately detect minute sound vibrations and increase the system’s conversion accuracy. Although the only working visual microphone using a single-pixel camera is currently restricted to the research team’s lab, they aim to explore several real-world applications beyond remote sensing.

“We aim to expand the system into other vibration measurement applications, including human pulse and heart rate detection, leveraging its multifunctional information sensing capabilities,” Yao said.

The paper "A visual microphone based on computational imaging" was published in Optics Express.

 Christopher Plain is a Science Fiction and Fantasy novelist and Head Science Writer at The Debrief. Follow and connect with him on X, learn about his books at plainfiction.com, or email him directly at christopher@thedebrief.org.