Mark Mazumder (@markmaz) Twitter Tweets • TwiDoom

coqui

3 years ago

Coqui + Harvard University + Google paper accepted to @INTERSPEECH2021 🐸🚀 🔥 Few-Shot Keyword Spotting in Any Language 🔥 Check it out on arXiv 😎 arxiv.org/abs/2104.01454 Spear-headed by the brilliant 💫 Mark Mazumder 💫 from the Harvard University Edge Computing lab 1/

80 likes · 1 reply · 24 retweets

Edge Impulse

@edgeimpulse

3 years ago

This paper introduces a few-shot transfer learning method for keyword spotting in any language: bit.ly/32z3wIr #tinyML

8 likes · 0 replies · 5 retweets

The Gradient

@gradientpub

3 years ago

Over the last year, @commons_ml set out to expand open source speech recognition resources. Find out about The People’s Speech, a massive English-language dataset, and the 50-language, 6000-hour Multilingual Spoken Words Corpus 🗣️ 👇👇👇 thegradientpub.substack.com/p/new-datasets…

22 likes · 0 replies · 12 retweets

MLCommons

@mlcommons

3 years ago

MLCommons releases Multilingual Spoken Words Corpus, permissively licensed keyword spotting dataset in 50 languages. The rich audio #speech dataset helps advance development of apps such as voice interfaces for a broad global audience. bit.ly/3yxqYF5 #AI #ML #NeurIPS2021

206 likes · 6 replies · 73 retweets

David Kanter

@thekanter

3 years ago

@commons_ml A huge thanks to all our partners and community supporters, you can read more about this project at mlcommons.org/en/news/spoken… It takes a team :)

5 likes · 1 reply · 2 retweets

Edge Impulse

@edgeimpulse

3 years ago

The Multilingual Spoken Words Corpus is a large and growing audio dataset of spoken words in 50 different languages. It contains more than 340K words and 23M one-second audio samples, adding up to over 6K hours of speech. mlcommons.org/en/news/spoken…

1 like · 1 reply · 2 retweets

Harvard SEAS

@hseas

3 years ago

A new project aims to build a dataset with 1,000 words in 1,000 different languages to bring voice technology to hundreds of millions of speakers around the world buff.ly/3pZjJlz

22 likes · 1 reply · 12 retweets

Mark Mazumder

@markmaz

3 years ago

Our new dataset, The Multilingual Spoken Words Corpus (NeurIPS 2021 Datasets & Benchmarks Track) is now available! Paper: openreview.net/forum?id=c20ji… Data: mlcommons.org/words Colab Tutorial: colab.research.google.com/github/harvard… Video: youtube.com/watch?v=eGPCwn…

4 likes · 0 replies · 1 retweet

Andrew Ng

@andrewyng

3 years ago

I’m happy to see MLCommons release both a multilingual speech dataset and a large, diverse 30,000-hour English dataset. We need more open datasets like these to help researchers build speech systems that can serve many individuals around the world. 🗺️

646 likes · 12 replies · 122 retweets

Davis Blalock

@davisblalock

2 years ago

"DataPerf: Benchmarks for Data-Centric AI Development" What if instead of holding the data constant and benchmarking different models, we held the model constant and benchmarked different data pipelines? [1/7]

124 likes · 3 replies · 26 retweets

MLCommons

@mlcommons

a year ago

The future of #ML is data-centric! That’s why we built #DataPerf, the leaderboard for data. It is the 1st platform and community for data-centric competitions. Together we will break through data limitations and unlock better ML for the world. mlcommons.org/en/news/datape…

26 likes · 0 replies · 18 retweets