Language Technologies Institute | @CarnegieMellon (@LTIatCMU)'s Twitter Profile
Language Technologies Institute | @CarnegieMellon

@LTIatCMU

The Language Technologies Institute in Carnegie Mellon University's @SCSatCMU

ID:4901650162

Link: http://lti.cs.cmu.edu | Joined: 12-02-2016 15:12:09

1.4K Tweets

9.2K Followers

233 Following

Atharva Kulkarni (@Atharvak311)

Multitask learning (MTL) is known to enhance model performance on average, yet its effect on group fairness is under-explored. In our recent #TMLR2024 paper with Lucio Dery, Amrith Setlur, Aditi Raghunathan, Ameet Talwalkar & Graham Neubig, we address this gap!

openreview.net/forum?id=sPlhA…
(1/10)
Language Technologies Institute | @CarnegieMellon (@LTIatCMU)

The effectiveness of video LMMs can be enhanced through DPO training with a language-model reward, which leverages detailed video captions as proxies for video content, enabling cost-effective preference optimization for video LMM alignment. twitter.com/RuohongZhang/s…
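The novelty above lies in how preference pairs are built (an LM judges responses against detailed video captions); the objective they feed into is standard DPO. A minimal sketch of that generic loss for one preference pair — the log-probability values here are made up for illustration, not from the paper:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Generic DPO objective for one preference pair: increase the policy's
    log-probability margin (relative to a frozen reference model) on the
    chosen response over the rejected one. All args are sequence log-probs."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# When the policy already prefers the caption-endorsed response, loss is lower:
assert dpo_loss(-5.0, -9.0, -6.0, -6.0) < dpo_loss(-9.0, -5.0, -6.0, -6.0)
```

With a margin of zero the loss is exactly log 2, so anything below that indicates the policy leans toward the preferred response.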

Suhas Kotha (@kothasuhas)

Defending against jailbreaks involves 1) specifying or learning a definition of unsafe outputs and 2) enforcing that definition. Taeyoun, Aditi, and I find that existing methods can't even enforce a simple definition: do not output “purple”! Are defenses actually robust? ⬇️

arxiv.org/abs/2403.14725

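As a toy illustration (my own sketch, not the paper's code): even the "purple" definition is only trivially enforceable when the banned string appears literally, which is exactly the gap jailbreaks exploit.

```python
import re

def violates_purple_rule(output: str) -> bool:
    """The toy 'unsafe output' definition from the setup above:
    any occurrence of the word 'purple', case-insensitive."""
    return re.search(r"\bpurple\b", output, flags=re.IGNORECASE) is not None

def filtered_generate(generate, prompt: str, refusal: str = "[refused]") -> str:
    """Wrap an arbitrary generator with a post-hoc filter enforcing the rule.
    The thread's point: real defenses (fine-tuning, etc.) fail to enforce even
    this rule, and literal filters are evaded by encodings of the concept."""
    out = generate(prompt)
    return refusal if violates_purple_rule(out) else out

assert violates_purple_rule("The color is Purple.")
assert not violates_purple_rule("p-u-r-p-l-e")  # trivially evades the filter
```

The second assertion is the crux: the *definition* covers the concept, but the *enforcement* only covers one surface form.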
Xuhui Zhou (@nlpxuhui)

Let’s talk about social simulations! Did you know that the term can refer to very different settings? Our new work suggests you might want to double-check before being “amazed” by those simulations.
📜: arxiv.org/abs/2403.05020
🌐: agscr.sotopia.world 1/

Zhiqing Sun (@EdwardSun0909)

🌟Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision 🌟

arxiv.org/abs/2403.09472

How can we keep improving AI systems when their capabilities surpass those of human supervisors? (1/n)

Zora Zhiruo Wang (@ZhiruoW)

Tools can empower LMs to solve many tasks. But what are tools anyway?
github.com/zorazrw/awesom…

Our survey studies tools for LLM agents, with:
–A formal definition of tools
–Methods/scenarios to use & make tools
–Issues in testbeds and eval metrics
–Empirical analysis of the cost-gain trade-off

Vidhi Jain (@viddivj)

What if we could show a robot how to do a task?

We present Vid2Robot, a robot policy trained to decode human intent from visual cues and translate it into actions in its environment. 🤖

Website: vid2robot.github.io
Arxiv: arxiv.org/abs/2403.12943

🧵(1/n)

Vidhi Jain (@viddivj)

Excited to share our work on FlexCap! It can provide visual captions at varying levels of granularity for any region in an image.

Website: flex-cap.github.io

See 🧵by Debidatta Dwibedi for more details!

Kwanghee Choi (@juice500ml)

It's likely that you've been reading linear probing results wrong! 🤔 In our new paper, we analyze probing via information theory to provide two pro tips: (1) use loss, not accuracy, and (2) stop worrying about the probe size. arxiv.org/abs/2312.10019 (1/n)
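A minimal, hypothetical illustration of tip (1) — the numbers are invented, not from the paper. Accuracy saturates once every example is on the right side of 0.5, while cross-entropy loss still distinguishes how much usable information each probe extracts:

```python
import numpy as np

def cross_entropy(p_correct):
    # mean negative log-probability the probe assigns to the true class
    return float(-np.mean(np.log(p_correct)))

def accuracy(p_correct):
    # a prediction counts as correct when the true class gets > 0.5 probability
    return float(np.mean(p_correct > 0.5))

# Two hypothetical probes: both classify the same 4 examples correctly,
# but probe B is far more confident, i.e., has extracted more information.
probe_a = np.array([0.55, 0.60, 0.52, 0.58])
probe_b = np.array([0.95, 0.99, 0.90, 0.97])

assert accuracy(probe_a) == accuracy(probe_b) == 1.0   # accuracy can't tell them apart
assert cross_entropy(probe_b) < cross_entropy(probe_a)  # loss can
```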

Ruiyi Wang (@RuiyiWang153)

Happy PI🥧 day!

Can language agents🤖 learn social skills through imitation and interaction?

We are excited to introduce SOTOPIA-π🥧 pi.sotopia.world, an interactive learning method for training language agents to navigate real-world social scenarios while role-playing!

michael (@_michaelginn)

Automated systems for interlinear glossed text generation can aid in language documentation projects

But data for any endangered language is limited…

In our new paper, we compile the largest multilingual IGT corpus and pretrain foundation models

arxiv.org/abs/2403.06399

Siddhant Arora (@Sid_Arora_18)

New #ICASSP2024 📜: we build streaming semi-autoregressive ASR that performs greedy NAR decoding within a block but keeps the AR property across blocks by encoding the labels of previous blocks, achieving strong performance at very low latency!

📜: arxiv.org/pdf/2309.10926…
Jacob Springer (@jacspringer)

Autoregressive language models (LLaMA, Mistral, etc.) are fundamentally limited for text embeddings since they don’t encode information bidirectionally. We provide an easy fix: just repeat your input! We are the #1 fully open-source model on MTEB!
arxiv.org/abs/2402.15449
1/6

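Why repeating the input helps can be seen with a toy causal "encoder" (a stand-in I made up for illustration, not the paper's model): pooling only over the second copy means every pooled state has causally attended to the entire first copy, so late tokens influence the embedding much more than under single-pass pooling.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = {w: rng.normal(size=8) for w in "the cat sat on mat dog".split()}

def causal_encode(tokens):
    """Toy causal-LM stand-in: position i's hidden state is the mean of all
    token embeddings up to and including i (position i only 'sees' tokens <= i,
    like a decoder-only LM)."""
    embs = np.stack([VOCAB[t] for t in tokens])
    return np.cumsum(embs, axis=0) / np.arange(1, len(tokens) + 1)[:, None]

def echo_embed(tokens):
    """Repetition trick: feed the input twice, mean-pool only the second
    copy's states, which have seen the whole first copy."""
    h = causal_encode(tokens + tokens)
    return h[len(tokens):].mean(axis=0)

def plain_embed(tokens):
    # baseline: pool over a single pass; early positions miss later tokens
    return causal_encode(tokens).mean(axis=0)

a, b = "the cat sat".split(), "the cat mat".split()  # differ only in last token
assert (np.linalg.norm(echo_embed(a) - echo_embed(b))
        > np.linalg.norm(plain_embed(a) - plain_embed(b)))
```

Even in this toy, changing the final token moves the echo embedding noticeably more than the plain one, since the last token reappears in the context of every pooled position.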
Xuhui Zhou (@nlpxuhui)

Excited to share that Sotopia (openreview.net/forum?id=mM7Vu…) has been accepted to ICLR 2024 as a spotlight 🌠!
Sotopia is a unique platform for facilitating socially aware, human-centered AI systems.
We've been hard at work and have follow-ups coming soon. Stay tuned!

Language Technologies Institute | @CarnegieMellon (@LTIatCMU)

Our Master of Computational Data Science program has been named the fifth-best data science master's program by FORTUNE! Congrats to all the faculty, staff, students, and alums who work hard to make the program great. fortune.com/education/info…

Teaching NLP Workshop @ ACL2024 (@teaching_nlp)

📢 Excited to announce the 6th TeachingNLP workshop, to be held at ACL 2024! This full-day workshop will be on August 15 in Bangkok 🇹🇭, with an interactive hybrid option available.
sites.google.com/view/teachingn…

Syeda Nahida Akter (@SNAT02792153)

When solving a difficult problem, we often draw a diagram to help us visualize. What if VLMs could do the same?

Introducing Self-Imagine – a method that enhances the reasoning abilities of VLMs on text-only tasks through visualization.

Paper: arxiv.org/abs/2401.08025
🧵↓

Zora Zhiruo Wang (@ZhiruoW)

Do you find LM-written programs too complex to understand?
Do bugs often pop up in these solutions?

Check out *TroVE*, a training-free method to create accurate, concise, and verifiable solutions by inducing tools
🔗: arxiv.org/pdf/2401.12869…

Pengfei Liu (@stefan_fee)

Scalability is an eternal topic in LLMOps. Existing work focuses on training superhuman systems by constructing scalable oversight. We focus on the dual problem, scalable evaluation, and argue that by introducing Agent Debate into meta-evaluation, we can achieve this goal quite well.
Jing Yu Koh (@kohjingyu)

Computer interfaces are inherently visual. To build general autonomous agents, we will need strong vision language models.

To assess the performance of multimodal agents, we introduce VisualWebArena (VWA): a benchmark for evaluating multimodal web agents on realistic visually…
