Diego Garcia-Olano

resume | github | linkedin | twitter | publications | old site

I'm a Research Scientist at Meta GenAI Trust (formerly Responsible AI) interested in explainable NLP/ML for social good (healthcare, public policy, general knowledge).
My research has been published in ACL, CoNLL, ECML, EMNLP, ICLR, ICML, IJCAI and NeurIPS, amongst other venues.

My research interests broadly involve explaining Machine Learning model decisions on natural language and multi-modal data, and learning representations of entities for downstream tasks. My PhD dissertation, defended in July 2022 with my advisor Joydeep Ghosh and committee members Alex Dimakis, Harris Vikalo, Atlas Wang & Byron Wallace, is titled "In-process Diagnostic methods for Entity Representation Learning on Sequential data at Scale" and focuses on methods that allow neural networks to be more transparent, explainable, and diagnosable during learning and inference, as opposed to in a post-hoc analysis. My research has dealt with clustering influence-based embeddings for improved error analysis (NeurIPS 23), open source tooling for explaining LLMs (EMNLP 23), intermediate entity-based sparse interpretable representation learning (EMNLP 22), efficient entity-based knowledge injection for VQA (WWW 22), interpretable biomedical text representations (ACL 21), dense entity retrieval using dual encoders (CoNLL 19), and prototypical learning of time series data (ICML 19 (short), IJCAI 19 (long)), amongst other things.

key words: influence functions, label quality, data valuation, memorization in LLMs, interpretable entity representations, knowledge injection for multimodal VQA, dense retrieval, in-network prototype learning, dual encoders, feature importance methods, counterfactual explanations

recent news

I'm a co-organizer for the Unlearning and Model Editing (U&ME’24) workshop at ECCV’24, Milano, Italy, Oct 2024. Call For Papers Deadline July 10th, 2024!

I was part of the great team that worked on Meta's open-source Llama 3 LLM, launched in April 2024, and was a core contributor to the Llama 3.1 release in July 2024.

Our paper "Error Discovery by Clustering Influence Embeddings" was accepted at the 2023 NeurIPS main conference. An earlier version of this paper was accepted as an Oral presentation at the ICLR 2023 Trustworthy ML workshop.

Our paper "Using Captum to Explain Generative Language Models" was accepted at the EMNLP 2023 NLP-OSS workshop!

I was an invited speaker and panelist at Health Day @ The Web Conference 2023, where I presented a talk on Explainable Machine Learning with applications in Biomedicine and Healthcare and discussed topics including generative AI and LLMs (large language models).

Our paper "Intermediate Entity-based Sparse Interpretable Representation Learning" was accepted into the BlackboxNLP workshop at EMNLP 2022. Here is the poster I presented and our github code.

As of Summer 2022, I'm a PhD graduate in Electrical & Computer Engineering (ECE) at UT Austin with my advisor Dr. Joydeep Ghosh. My PhD defense slides are here.

March 2022, I was accepted as a Fulbright Specialist to work with the Center of Human Rights at UNICEN in Azul, Argentina on predicting gender bias in judicial proceedings using NLP. In addition to the researchers at UNICEN, I will be working with Dr. Maria De-Arteaga. As part of the project, we set up a workshop on AI and Public Policy/Law in Spanish. Our paper "Detección de sesgos en razón del género en decisiones judiciales utilizando PLN" was accepted for publication based on work during that time.

Feb 2022, "Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection" has been accepted at ACM WWW 2022 in the Workshop on Multimodal Understanding for the Web and Social Media. Here are the slides for an invited talk on our work.

Dec 2021, preprint on "Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection" is available. Paper || Code

Summer 2021, my PhD Proposal on "In-process Diagnostic methods for Entity Representation Learning on Sequential Data" has been accepted for candidacy.

Our paper "Biomedical Interpretable Entity Representations" has been accepted at ACL-IJCNLP 2021. Paper || Code || Slides

For Summer 2020, I have been selected to be an IBM Research PhD fellow for Social Good and will work on a Drug Repurposing for Cancer project using NLP.

I recently gave a high level talk to Cognitive Scale on Explainable AI for NLP (slides link).

In Summer 2019, I interned at Google Cloud AI in Seattle with Besim Avci and Frederick Liu working on explaining seq2seq models using feature attribution methods on Transformer and LSTM based architectures with attention for machine translation.

In Spring 2019, I was a TA for the Responsible AI Graduate Seminar (lots of papers and presentations on Explainability, Fairness, etc.!)

In Summer 2018, I interned at Google Research in Mountain View with Jason Baldridge and Daniel Gillick working on entity linking.

In Fall 2017, I started at UT and was a TA for an Advanced Data Mining Masters course.

In Summer 2016, I was selected as an Eric & Wendy Schmidt Data Science for Social Good 2016 fellow at the University of Chicago and worked with SEDESOL of Mexico on improving their distribution of social services.

I also do a lot of Data Visualizations. Here is a lecture I have given on modern data visualization, and particularly the d3 javascript library, for masters level CS/ML students.

I obtained a Masters of Data Science at the UPC in Barcelona in July 2015 (masters thesis on automated construction of political networks).  
I have a bachelors in Computer Science, Political Science and Hispanic Studies with a Business minor from UT Austin.

A selection of publications and projects, academic, professional and personal.

The Llama 3 Herd of Models
-- Paper || Project/Code || Llama 3 405B online
  
Error Discovery By Clustering Influence Embeddings (NeurIPS 23)
-- Paper || Code || Slides
  
Using Captum to Explain Generative Language Models (EMNLP 23)
-- Paper || Code
  
Intermediate Entity-based Sparse Interpretable Representation Learning (EMNLP 22)
-- Paper || Code || Poster
  
Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection (WWW 22)
-- Paper || Code
  
Biomedical Interpretable Entity Representations (ACL 21)
-- Paper || Code || Slides
  
Learning Dense Representations for Entity Retrieval (CoNLL 19)
-- Paper || Code
  
Explaining Deep Classification of Time-Series Data with Learned Prototypes (ICML 19 Timeseries Workshop 4pgs)
--  Paper || Code

IJCAI 19 Knowledge and Health Discovery Long paper
-- Paper || Code
  
Applying Machine Learning Methods to Enhance the Distribution of Social Services in Mexico (ARXIV 2017)
 
Automated construction and analysis of political networks via open government & media sources (ECML 16)      
       
Link Detection in Political Networks (NLP Class Project 2018)
Predicting a Politician's Party Affiliation from a Photo using Deep Learning Methods ( Deep Learning Class project 2017)   
    
    
Predicting when a Yearbook Photo was Taken using Convolutional Neural Networks
Pitchfork: Are music festival lineups getting worse?      
Glasstire 15th Year Anniversary Texas Art Events  
Assessment of Similarity in Central and State Climate Change Programs of Mexico (Simultec Special Session on Applications of Modeling and Simulation to Climatic Change and Environmental Sciences. 2015)
Glasstire 15th Year Contributors  
Glasstire 15th Year Texas Artists  
Personal Music Visualizations and Interactive Lists    
Turning Album of the Year Lists into a Music Discovery Tool  
Looking at US Presidential Election County Changes from 2012 to 2016      
Blue Islands Project
Identify Blue Counties in America that are surrounded by Red ones, and Predict if a county is a blue island based on just socio-economic and public health data.
     
Google Results By Country      
Every Foreign and Best Picture Film Ever Nominated for the Oscars      
My Favorite Painter's Colors    

List of Spotify's Clarify Data Stories series articles written by Rob Mitchum for which I contributed data mining and data visualization, 2016.


Groove Is In The Heart
Songs of Summer Jobs
Immigration Songs: How Music Crosses American Borders
There Are Three Types of Gun Songs
The Persistent Glass Ceiling of Music
From a Benzo to Student Loans: Debt Anxiety in Today’s Pop Music
Hot Time, Summer in the City

Visualizations related to my masters thesis project in Barcelona about Texas Politics. Click here for Thesis Presentation Slides


WhoYouElect.com      
Politician Networks      
Extended Politician Networks      
Topics and Table of Contents      
Media Coverage Maps      
More Media Coverage Results