Language Technologies Institute | @CarnegieMellon (@LTIatCMU)'s Twitter Profile
Language Technologies Institute | @CarnegieMellon

@LTIatCMU

The Language Technologies Institute in Carnegie Mellon University's @SCSatCMU

ID:4901650162

Link: http://lti.cs.cmu.edu | Joined: 12-02-2016 15:12:09

1.4K Tweets

9.2K Followers

233 Following

Atharva Kulkarni (@Atharvak311)

Multitask learning (MTL) is known to enhance model performance on average, yet its effect on group fairness is under-explored. In our recent #TMLR2024 paper with Lucio Dery, Amrith Setlur, Aditi Raghunathan, Ameet Talwalkar & Graham Neubig, we address this gap!

openreview.net/forum?id=sPlhA…
(1/10)
Language Technologies Institute | @CarnegieMellon (@LTIatCMU)

The effectiveness of video LMMs can be enhanced through DPO training with a language-model reward, which leverages detailed video captions as proxies for video content, enabling cost-effective preference optimization for video LMM alignment. twitter.com/RuohongZhang/s…
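The novelty above lies in how preference pairs are built (an LM judges responses against detailed video captions); the objective they feed into is standard DPO. A minimal sketch of that generic loss for one preference pair — the log-probability values here are made up for illustration, not from the paper:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Generic DPO objective for one preference pair: increase the policy's
    log-probability margin (relative to a frozen reference model) on the
    chosen response over the rejected one. All args are sequence log-probs."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# When the policy already prefers the caption-endorsed response, loss is lower:
assert dpo_loss(-5.0, -9.0, -6.0, -6.0) < dpo_loss(-9.0, -5.0, -6.0, -6.0)
```

With a margin of zero the loss is exactly log 2, so anything below that indicates the policy leans toward the preferred response.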

Suhas Kotha (@kothasuhas)

Defending against jailbreaks involves 1) specifying or learning a definition of unsafe outputs and 2) enforcing that definition. Taeyoun, Aditi, and I find that existing methods can't even enforce a simple definition: do not output “purple”! Are defenses actually robust? ⬇️

arxiv.org/abs/2403.14725

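As a toy illustration (my own sketch, not the paper's code): even the "purple" definition is only trivially enforceable when the banned string appears literally, which is exactly the gap jailbreaks exploit.

```python
import re

def violates_purple_rule(output: str) -> bool:
    """The toy 'unsafe output' definition from the setup above:
    any occurrence of the word 'purple', case-insensitive."""
    return re.search(r"\bpurple\b", output, flags=re.IGNORECASE) is not None

def filtered_generate(generate, prompt: str, refusal: str = "[refused]") -> str:
    """Wrap an arbitrary generator with a post-hoc filter enforcing the rule.
    The thread's point: real defenses (fine-tuning, etc.) fail to enforce even
    this rule, and literal filters are evaded by encodings of the concept."""
    out = generate(prompt)
    return refusal if violates_purple_rule(out) else out

assert violates_purple_rule("The color is Purple.")
assert not violates_purple_rule("p-u-r-p-l-e")  # trivially evades the filter
```

The second assertion is the crux: the *definition* covers the concept, but the *enforcement* only covers one surface form.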
Xuhui Zhou (@nlpxuhui)

Let’s talk about social simulations! Did you know that the term can refer to very different settings? Our new work suggests you might want to double-check before being “amazed” by those simulations.
📜: arxiv.org/abs/2403.05020
🌐: agscr.sotopia.world 1/

Zhiqing Sun (@EdwardSun0909)

🌟Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision 🌟

arxiv.org/abs/2403.09472

How can we keep improving AI systems when their capabilities surpass those of human supervisors? (1/n)

Zora Zhiruo Wang (@ZhiruoW)

Tools can empower LMs to solve many tasks. But what are tools anyway?
github.com/zorazrw/awesom…

Our survey studies tools for LLM agents, with:
–A formal definition of tools
–Methods/scenarios to use & make tools
–Issues in testbeds and eval metrics
–Empirical analysis of the cost-gain trade-off

Vidhi Jain (@viddivj)

What if we could show a robot how to do a task?

We present Vid2Robot, a robot policy trained to decode human intent from visual cues and translate it into actions in its environment. 🤖

Website: vid2robot.github.io
Arxiv: arxiv.org/abs/2403.12943

🧵(1/n)

Vidhi Jain (@viddivj)

Excited to share our work on FlexCap! It can provide visual captions at varying levels of granularity for any region in an image.

Website: flex-cap.github.io

See 🧵by Debidatta Dwibedi for more details!

Kwanghee Choi (@juice500ml)

It's likely that you've been reading linear probing results wrong! 🤔 In our new paper, we analyze probing via information theory to provide two pro tips: (1) use loss, not accuracy, and (2) stop worrying about the probe size. arxiv.org/abs/2312.10019 (1/n)
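A minimal, hypothetical illustration of tip (1) — the numbers are invented, not from the paper. Accuracy saturates once every example is on the right side of 0.5, while cross-entropy loss still distinguishes how much usable information each probe extracts:

```python
import numpy as np

def cross_entropy(p_correct):
    # mean negative log-probability the probe assigns to the true class
    return float(-np.mean(np.log(p_correct)))

def accuracy(p_correct):
    # a prediction counts as correct when the true class gets > 0.5 probability
    return float(np.mean(p_correct > 0.5))

# Two hypothetical probes: both classify the same 4 examples correctly,
# but probe B is far more confident, i.e., has extracted more information.
probe_a = np.array([0.55, 0.60, 0.52, 0.58])
probe_b = np.array([0.95, 0.99, 0.90, 0.97])

assert accuracy(probe_a) == accuracy(probe_b) == 1.0   # accuracy can't tell them apart
assert cross_entropy(probe_b) < cross_entropy(probe_a)  # loss can
```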

Ruiyi Wang (@RuiyiWang153)

Happy PI🥧 day!

Can language agents🤖 learn social skills through imitation and interaction?

We are excited to introduce SOTOPIA-π🥧 pi.sotopia.world, an interactive learning method for training language agents to navigate real-world social scenarios while role-playing!

michael (@_michaelginn)

Automated systems for interlinear glossed text generation can aid in language documentation projects

But data for any endangered language is limited…

In our new paper, we compile the largest multilingual IGT corpus and pretrain foundation models

arxiv.org/abs/2403.06399

Siddhant Arora (@Sid_Arora_18)

New #ICASSP2024 📜: we build streaming semi-autoregressive ASR that performs greedy NAR decoding within a block but keeps the AR property across blocks by encoding the labels of previous blocks, achieving strong performance at very low latency!

📜: arxiv.org/pdf/2309.10926…
Jacob Springer (@jacspringer)

Autoregressive language models (LLaMA, Mistral, etc.) are fundamentally limited for text embeddings since they don’t encode information bidirectionally. We provide an easy fix: just repeat your input! We are the #1 fully open-source model on MTEB!
arxiv.org/abs/2402.15449
1/6

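Why repeating the input helps can be seen with a toy causal "encoder" (a stand-in I made up for illustration, not the paper's model): pooling only over the second copy means every pooled state has causally attended to the entire first copy, so late tokens influence the embedding much more than under single-pass pooling.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = {w: rng.normal(size=8) for w in "the cat sat on mat dog".split()}

def causal_encode(tokens):
    """Toy causal-LM stand-in: position i's hidden state is the mean of all
    token embeddings up to and including i (position i only 'sees' tokens <= i,
    like a decoder-only LM)."""
    embs = np.stack([VOCAB[t] for t in tokens])
    return np.cumsum(embs, axis=0) / np.arange(1, len(tokens) + 1)[:, None]

def echo_embed(tokens):
    """Repetition trick: feed the input twice, mean-pool only the second
    copy's states, which have seen the whole first copy."""
    h = causal_encode(tokens + tokens)
    return h[len(tokens):].mean(axis=0)

def plain_embed(tokens):
    # baseline: pool over a single pass; early positions miss later tokens
    return causal_encode(tokens).mean(axis=0)

a, b = "the cat sat".split(), "the cat mat".split()  # differ only in last token
assert (np.linalg.norm(echo_embed(a) - echo_embed(b))
        > np.linalg.norm(plain_embed(a) - plain_embed(b)))
```

Even in this toy, changing the final token moves the echo embedding noticeably more than the plain one, since the last token reappears in the context of every pooled position.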
Xuhui Zhou (@nlpxuhui)

Excited to share that Sotopia (openreview.net/forum?id=mM7Vu…) has been accepted to ICLR 2024 as a spotlight 🌠!
Sotopia is a unique platform for facilitating socially aware, human-centered AI systems.
We've been hard at work and have follow-ups coming soon. Stay tuned!

Language Technologies Institute | @CarnegieMellon (@LTIatCMU)

Our Master of Computational Data Science program has been named the fifth-best data science master's program by FORTUNE! Congrats to all the faculty, staff, students, and alums who work hard to make the program great. fortune.com/education/info…

Teaching NLP Workshop @ ACL2024 (@teaching_nlp)

📢 Excited to announce the 6th TeachingNLP workshop, to be held at ACL 2024! This full-day workshop will be on August 15 in Bangkok 🇹🇭, with an interactive hybrid option available.
sites.google.com/view/teachingn…

Syeda Nahida Akter (@SNAT02792153)

When solving a difficult problem, we often draw a diagram to help us visualize. What if VLMs could do the same?

Introducing Self-Imagine – a method that enhances the reasoning abilities of VLMs on text-only tasks through visualization.

Paper: arxiv.org/abs/2401.08025
🧵↓

Zora Zhiruo Wang (@ZhiruoW)

Do you find LM-written programs too complex to understand?
Do bugs often pop up in these solutions?

Check out *TroVE*, a training-free method to create accurate, concise, and verifiable solutions by inducing tools
🔗: arxiv.org/pdf/2401.12869…

Pengfei Liu (@stefan_fee)

Scalability is an eternal topic in LLMOps. Existing work focuses on training superhuman systems by constructing scalable oversight. We focus on the dual problem, scalable evaluation, and argue that by introducing Agent Debate into meta-evaluation, we can achieve this goal quite well.
Jing Yu Koh (@kohjingyu)

Computer interfaces are inherently visual. To build general autonomous agents, we will need strong vision language models.

To assess the performance of multimodal agents, we introduce VisualWebArena (VWA): a benchmark for evaluating multimodal web agents on realistic visually…
