Xavier Giró🎗 (@docxavi) 's Twitter Profile
Xavier Giró🎗

@docxavi

Applied scientist at @amazonscience Barcelona, Catalonia. Made at @la_upc & @columbia. Promoting @dlbcnai. Opinions my own.

ID: 633009378

Link: https://imatge.upc.edu/web/people/xavier-giro
Joined: 11-07-2012 14:14:04

6.6K Tweets

2.2K Followers

1.1K Following

Joelle Pineau (@jpineau1) 's Twitter Profile Photo

We dropped another awesome open model: SAM 2. This one comes with the data and an easy-to-use demo. It extends the original Segment Anything Model, to work on video. Enjoy!

Robin Rombach (@robrombach) 's Twitter Profile Photo

🔥 I am so damn excited to announce the launch of Black Forest Labs. We set ourselves on a mission to advance state-of-the-art, high-quality generative deep learning models for images and video, and make them available to the broadest audience possible. Today, we release FLUX.1

Sander Dieleman (@sedielem) 's Twitter Profile Photo

I gave a 1-hour talk about generative modelling at the EEML 2024 summer school last month. It's mostly an intuitive look at how and why diffusion models actually work -- not unlike the content of my recent blog posts. All summer school talks will be freely available online!🙏

Sander Dieleman (@sedielem) 's Twitter Profile Photo

The interpretation of diffusion as autoregression in the frequency domain seems to be stirring up a lot of thought! (I may or may not have a new blog post in the works 🧐)

Jia-Bin Huang (@jbhuang0604) 's Twitter Profile Photo

How I Understand Transformers Transformer architectures power most (if not all) of the incredible generative AI applications. But how does it work? In this short (17m) video, you and I will go through the basic ideas behind transformers. While making this video, I had

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

Meet our AI-powered robot that’s ready to play table tennis. 🤖🏓 It’s the first agent to achieve amateur human level performance in this sport. Here’s how it works. 🧵

Jason Baldridge (@jasonbaldridge) 's Twitter Profile Photo

Excited to share our paper about Imagen 3, Google's latest and most capable text-to-image model! arxiv.org/abs/2408.07009 It's been a big team effort. You can try it out on ImageFX, our experimental AI tech surface, now: aitestkitchen.withgoogle.com/tools/image-fx

ELLISBarcelona (@ellisbarcelona) 's Twitter Profile Photo

🗣️ This September, don't miss the first seminar from Prof. Nicu Sebe, director of #ELLIS' program on Multimodal Learning: "Cross-modal understanding and generation of multimodal content". Register here 👉 bit.ly/3A2h0Ap

European Conference on Computer Vision #ECCV2024 (@eccvconf) 's Twitter Profile Photo

The #ECCV2024 Preliminary Program is now available. The poster/oral session times can be found at the link below. Note, this schedule is subject to changes. docs.google.com/spreadsheets/d…

Chunting Zhou (@violet_zct) 's Twitter Profile Photo

Introducing *Transfusion* - a unified approach for training models that can generate both text and images. arxiv.org/pdf/2408.11039 Transfusion combines language modeling (next token prediction) with diffusion to train a single transformer over mixed-modality sequences. This

Richard Socher (@richardsocher) 's Twitter Profile Photo

We are entering the Age of AI. History does not repeat itself but it rhymes and this era combines aspects of the renaissance, enlightenment and the industrial revolution. We have collectively never had so much access to knowledge. AI is making it more digestible with amazing

Sander Dieleman (@sedielem) 's Twitter Profile Photo

Think you understand classifier-free diffusion guidance? Think again! These two papers beg to differ😁 arxiv.org/abs/2406.02507 arxiv.org/abs/2408.09000 Both full of really great insights that question prevailing assumptions. cc Jaakko Lehtinen Arwen Bradley Preetum Nakkiran

AK (@_akhaliq) 's Twitter Profile Photo

Meta presents Sapiens Foundation for Human Vision Models discuss: huggingface.co/papers/2408.12… We present Sapiens, a family of models for four fundamental human-centric vision tasks - 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Our

AK (@_akhaliq) 's Twitter Profile Photo

Scalable Autoregressive Image Generation with Mamba discuss: huggingface.co/papers/2408.12… model: huggingface.co/hp-l33/aim We introduce AiM, an autoregressive (AR) image generative model based on Mamba architecture. AiM employs Mamba, a novel state-space model characterized by its

CVC_UAB (@cvc_uab) 's Twitter Profile Photo

📢 The Annual Catalan Meeting on Computer Vision (ACMCV) is arriving, taking place on September 17th. This meeting seeks to connect the Computer Vision community of Catalonia, allowing the attendees to strengthen links. ✏️ Registration is already open: acmcv.cat

AK (@_akhaliq) 's Twitter Profile Photo

Google presents Diffusion Models Are Real-Time Game Engines discuss: huggingface.co/papers/2408.14… We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality.

Xavier Giró🎗 (@docxavi) 's Twitter Profile Photo

Happy to see customer obsession by Prime Video España in practice: a new list of movies dubbed into Catalan. It was a real inconvenience having to check Desdelsofà.cat or Goita què fan ara every time I wanted to enjoy a movie or TV show in Catalan.