Inside Conduit’s Underground Push to Teach Machines to Read Human Thought

San Francisco – In a city where artificial intelligence startups compete for attention, one small company has taken a different route. In a windowless basement lab beneath the city’s streets, Conduit has spent the last six months quietly building what it claims is the largest brain-to-language dataset ever created.

The ambition behind this effort is sweeping, but the target is specific: to teach machines to turn human thoughts into written language. This is not about emotions or vague feelings, but about the focused intent that exists just before someone speaks or types.

If successful, this work could change how humans interact with computers and how we understand language itself.

A Basement Lab Built for Scale

The lab itself is intentionally plain. Compact recording booths fill the space, designed for efficiency rather than comfort. There are no windows, no fancy demo rooms, and no attempt at the sleek look typical of many Silicon Valley startups.

Instead, the focus is on productivity.

Over the past six months, thousands of volunteers have passed through these booths, each session lasting about two hours. During their time, participants either spoke aloud or typed freely while wearing Conduit’s custom-made neural recording headsets. By the company’s estimate, the total collected now stands at around 10,000 hours of non-invasive brain data.

For researchers in language neuroscience and brain-computer interfaces, that number is impressive. Previous datasets in this field have typically been measured in dozens or hundreds of hours, not thousands.

Conduit’s bet is straightforward: scale changes everything.

From Clinical Tasks to Natural Conversation

Early versions of the project resembled traditional neuroscience experiments. Participants received specific prompts, instructions, and controlled tasks designed to produce clear, measurable signals.

The results were technically usable but expressively limited.

Engineers observed that participants sounded flat, disengaged, and overly careful. The language generated in these sessions lacked the rhythm, spontaneity, and variety of everyday speech. More importantly, the neural signals associated with that language were less expressive than the team had hoped.

So Conduit changed its approach.

Instead of treating volunteers as subjects in an experiment, the company redesigned sessions around conversation. Participants were encouraged to speak naturally, often engaging in open-ended discussions with a large language model. There were fewer rules, fewer restrictions, and more space for digression.

The difference, according to the engineers, was immediate:

  • Language became more varied and expressive.
  • Speech patterns resembled real-world communication.
  • Neural signals matched more closely with both audio and text.

By letting people talk as they usually do, the dataset became richer, not just larger.

Hardware Built From the Ground Up

Capturing this amount of data required more than standard equipment.

Conduit’s team quickly realized that existing commercial headsets were not built for the kind of dense, multi-modal recording they needed. Most devices could track only a limited number of signals at once, forcing researchers to choose between depth and breadth.

So, Conduit built its own hardware.

The resulting headsets are heavy, noticeable, and unapologetically industrial. Weighing about four pounds each, the rigs combine several sensing technologies:

  • Electroencephalography (EEG) to measure electrical brain activity.
  • Functional near-infrared spectroscopy (fNIRS) to track blood flow changes linked to neural activation.
  • Additional sensors to capture timing, motion, and environmental data.

These training headsets were never meant to be user-friendly. Comfort was sacrificed for signal density and accuracy.
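
The article does not describe how Conduit stores what these rigs capture, but one plausible way to represent a single synchronized slice of multi-modal data is sketched below. The field names, channel counts, and units are illustrative assumptions, not the company’s actual format.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class SensorFrame:
        """One synchronized slice of multi-modal headset data.

        Field names and channel counts are illustrative guesses,
        not Conduit's actual format.
        """
        t: float            # shared wall-clock timestamp, in seconds
        eeg: np.ndarray     # e.g. 64 electrode voltages, in microvolts
        fnirs: np.ndarray   # e.g. 32 optode hemoglobin-change readings
        motion: np.ndarray  # accelerometer/gyroscope values from the rig
        ambient: dict       # timing, noise, and other environmental data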

Lighter, wearable versions are planned for the future, but only after the models show which signals truly matter.

Timing Is Everything

Gathering brain data is only half the challenge. Making sense of it requires precision.

Conduit’s systems direct data from every sensor into a unified storage pipeline, carefully synchronized down to fractions of a second. That alignment allows models to examine brain activity just before language emerges – the mental moments when intent is forming but words have yet to come.

Those moments are the key target.

Instead of decoding speech after it happens, Conduit’s models are trained to recognize patterns that come before language. The goal is not transcription but anticipation – understanding meaning before it takes physical form.
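
Conduit has not published how this works, but the core idea – slicing out the neural activity that precedes each speech onset from time-aligned streams – can be sketched minimally as follows. The array layout and the half-second look-back are assumptions for illustration.

    import numpy as np

    def pre_speech_windows(eeg, eeg_times, speech_onsets, lookback=0.5):
        """Extract the neural window preceding each speech onset.

        eeg:           (channels, samples) array on a shared clock
        eeg_times:     (samples,) timestamps in seconds, same clock
        speech_onsets: onset times taken from the aligned audio track
        lookback:      seconds before onset to keep (0.5 is an assumption)
        """
        windows = []
        for onset in speech_onsets:
            # Keep samples falling in [onset - lookback, onset).
            mask = (eeg_times >= onset - lookback) & (eeg_times < onset)
            if mask.any():
                windows.append(eeg[:, mask])
        return windows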

Fighting Noise, Literally

One of the project’s earliest and most persistent challenges was electrical noise.

Urban buildings are filled with interference: power lines, electronic devices, and even lighting systems can distort delicate neural signals. In the basement lab, those distortions threatened to overwhelm the data.

The team first tried conventional solutions:

  • Shielding cables
  • Adjusting sensor placement
  • Using software filters (see the sketch after this list)
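
The article does not say which filters were applied. A common approach for power-line interference in EEG – and a reasonable guess here – is a narrow notch filter at the mains frequency, sketched below with SciPy; the 60 Hz target, sampling rate, and quality factor are assumptions, not Conduit’s settings.

    from scipy.signal import filtfilt, iirnotch

    def remove_mains_hum(signal, fs=1000.0, mains_hz=60.0, q=30.0):
        """Suppress power-line interference with a narrow notch filter.

        fs, mains_hz, and q are illustrative, not Conduit's settings.
        """
        b, a = iirnotch(mains_hz, q, fs)  # notch centered on the hum
        return filtfilt(b, a, signal)     # zero-phase filtering, no lag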

When that wasn’t enough, they took a more drastic measure.

At one point, the lab ran entirely on battery power. The building’s main electricity was shut off to eliminate interference at the source.

This workaround improved signal quality but created new issues. Battery packs were heavy, expensive, and had to be swapped constantly. Data occasionally dropped mid-session. Managing these logistics became as challenging as the engineering itself.

Ironically, scale once again became the solution.

As the dataset grew into the thousands of hours, models began to generalize. Variations between individuals, sessions, and even recording conditions mattered less. Perfect noise suppression became less critical when patterns repeated across a massive dataset.

Turning a Research Project Into an Assembly Line

With growth came operational demands.

To keep pace, Conduit rebuilt its backend systems. Corrupted or incomplete sessions were automatically flagged. Supervisors could monitor multiple booths at once. Scheduling software made sure headsets were rarely idle.
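
Conduit’s pipeline code is not public, but an automatic flagging pass of the kind described might look like this minimal sketch; the checks and thresholds are invented for illustration.

    import numpy as np

    def flag_session(eeg, expected_duration_s, fs=1000.0,
                     max_nan_frac=0.01, min_duration_frac=0.9):
        """Return reasons a recorded session should be flagged for review.

        All thresholds are illustrative, not Conduit's actual values.
        """
        reasons = []
        duration = eeg.shape[-1] / fs
        if duration < min_duration_frac * expected_duration_s:
            reasons.append("incomplete: session shorter than expected")
        nan_frac = float(np.isnan(eeg).mean())
        if nan_frac > max_nan_frac:
            reasons.append(f"corrupted: {nan_frac:.1%} of samples dropped")
        return reasons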

At peak operation, the lab ran close to 20 hours a day.

These changes transformed what could have been a fragile research effort into something akin to an industrial data pipeline. Over time, the company estimates it reduced the cost of each usable hour of data by around 40 percent.

In this context, efficiency was not just about saving money. It allowed the team to maintain consistency, catch issues early, and scale without losing quality.

What Comes Next

With data collection mostly complete, Conduit is now shifting its focus. The work is moving inward – away from headsets and scheduling systems, and toward model training and refinement.

Details about performance remain closely guarded. The company has not publicly shared accuracy metrics or demonstrated how well its systems can reconstruct language from brain signals alone.

This silence appears intentional.

In a field prone to hype, Conduit seems focused on letting results speak for themselves. Whether its models can reliably decode meaning – and do so across individuals – remains an open question.

What is clear, however, is that the company has already crossed a threshold few others have reached: scale. By treating brain-language data not as a limited scientific resource but as something that can be collected, refined, and systematized, Conduit is exploring a new model for neuro-AI research.

For now, that experiment continues underground, quietly shaping a future where language may not need words at all.

Source: indianexpress.com