
Scaling audio-visual learning without labels

July 31, 2023

Researchers from MIT, the MIT-IBM Watson AI Lab, IBM Research, and elsewhere have developed a new technique for analyzing unlabeled audio and visual data that could improve the performance of machine-learning models used in applications like speech recognition and object detection. The work, for the first time, combines two architectures of self-supervised learning, contrastive learning and masked data modeling, in an effort to scale machine-learning tasks like event classification in single- and multimodal data without the need for annotation.
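To make the combination concrete, here is a minimal sketch in PyTorch of how a contrastive objective and a masked-reconstruction objective can be merged into one self-supervised training loss. This is an illustration, not the researchers' actual implementation; the function names, loss weights (lambda_c, lambda_r), and tensor shapes are all assumptions. The contrastive term pulls embeddings of paired audio and video clips together, while the reconstruction term scores how well the model fills in masked patches.

    import torch
    import torch.nn.functional as F

    def contrastive_loss(audio_emb, video_emb, temperature=0.07):
        # InfoNCE-style loss over a batch: the matching audio/video pair is
        # the positive; every other pairing in the batch is a negative.
        audio_emb = F.normalize(audio_emb, dim=-1)
        video_emb = F.normalize(video_emb, dim=-1)
        logits = audio_emb @ video_emb.t() / temperature      # (B, B) similarities
        targets = torch.arange(audio_emb.size(0), device=audio_emb.device)
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))

    def masked_reconstruction_loss(inputs, reconstruction, mask):
        # Mean-squared error computed only on the masked patches, as in
        # masked-autoencoder training. inputs/reconstruction: (B, N, D);
        # mask: (B, N) with 1 marking a masked patch.
        per_patch = ((reconstruction - inputs) ** 2).mean(dim=-1)  # (B, N)
        return (per_patch * mask).sum() / mask.sum().clamp(min=1)

    def joint_loss(audio_emb, video_emb, inputs, reconstruction, mask,
                   lambda_c=0.01, lambda_r=1.0):
        # Weighted sum of the two self-supervised objectives; the weights
        # here are placeholders, not tuned values from the paper.
        return (lambda_c * contrastive_loss(audio_emb, video_emb)
                + lambda_r * masked_reconstruction_loss(inputs, reconstruction, mask))

    # Toy usage with random tensors standing in for encoder/decoder outputs.
    B, N, D = 8, 196, 768
    loss = joint_loss(torch.randn(B, D), torch.randn(B, D),
                      torch.randn(B, N, D), torch.randn(B, N, D),
                      (torch.rand(B, N) < 0.75).float())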

“A larger portion of human knowledge is learned in a self-supervised way, because we don't always get supervision signals, and we want to enable the machine-learning model to have the same ability,” says Yuan Gong, an MIT postdoc in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

“So, another way to put it is that self-supervised learning often forms the foundation of an initial model, because it can learn on vast amounts of unlabeled data. And then you can use classical, supervised learning or reinforcement learning to fine-tune the model to something particular if you want to,” says Jim Glass, an MIT senior research scientist and member of the MIT-IBM Watson AI Lab.
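As a rough illustration of that recipe, the sketch below (again an assumption, not the authors' code) attaches a small supervised classification head to a pretrained encoder and optionally freezes the encoder's weights, so a modest amount of labeled data only has to fit the head:

    import torch.nn as nn

    def build_finetune_model(encoder: nn.Module, emb_dim: int,
                             num_classes: int, freeze_encoder: bool = True) -> nn.Module:
        # Reuse a self-supervised encoder and train only a linear
        # classification head on top of its embeddings.
        if freeze_encoder:
            for p in encoder.parameters():
                p.requires_grad = False
        return nn.Sequential(encoder, nn.Linear(emb_dim, num_classes))

    # Example: a placeholder encoder producing 768-dim embeddings, fine-tuned
    # for a hypothetical 527-class audio event classification task.
    encoder = nn.Linear(128, 768)   # stands in for a pretrained network
    model = build_finetune_model(encoder, emb_dim=768, num_classes=527)

Freezing the encoder is only one choice; in practice the encoder can also be trained at a reduced learning rate, trading compute for accuracy on the downstream task.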

