# Thom Lake

AI | Deep Learning | Machine Learning

Austin, Texas · [thom.l.lake@gmail.com](mailto:thom.l.lake@gmail.com) · [thomlake.github.io](https://thomlake.github.io)

I work on language-model systems for open-ended problems. My work brings together agentic systems, simulation, principled evaluation, and human–computer interaction.

*Research interests: agent runtimes, evaluation, interactive environments, recommender systems, and post-training for reasoning, coherence, and memory management*

## Selected Publications

- From Distributional to Overton Pluralism: Investigating Large Language Model Alignment, NAACL 2025 [arXiv](https://arxiv.org/abs/2406.17692)
- ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models, NeurIPS 2025 [arXiv](https://arxiv.org/abs/2505.13444)
- Flexible Job Classification with Zero-Shot Learning, RecSys in HR'22 [arXiv](https://arxiv.org/abs/2209.12678)
- Large-scale Collaborative Filtering with Product Embeddings, 2018 [arXiv](https://arxiv.org/abs/1901.04321)

## Experience

### Indeed — Austin, Texas

#### AI Technical Fellow (2025–present)

I lead science for employer-facing AI systems, spanning LLM ranking and decision support systems, conversational assistants, and asynchronous agents. As a senior IC working across dozens of teams, I help shape system architecture, context management strategy, and evaluation methodology for high-stakes AI systems. Evaluation combines traditional metrics, model-based assessments (LLM-as-a-Judge), and interactive environments.

#### Principal Data Scientist (2023–2025)

I led early LLM work for hiring workflows, with emphasis on fine-tuning, prompt design, and evaluation. Key applications included candidate assessment, document summarization, and personalized employer outreach, where model improvements increased application starts by 20%. I also worked across ML and data science teams to establish evaluation practices for generative AI systems beyond standard closed-form benchmarks.

#### Staff Data Scientist (2021–2022)

I helped teams shift from bespoke feature engineering to fine-tuning pre-trained language models for ranking, retrieval, parsing, and taxonomy. I led a taxonomy modernization effort that reduced new-market rollout from years to months. In parallel, I built neural recommender systems and introduced fairness evaluation into production workflows.

#### Senior Data Scientist (2019–2020)

I joined as Indeed's first deep learning specialist and helped modernize core hiring systems by adopting pre-trained language models and representation learning techniques. I also introduced GPU-backed experimentation workflows that made these methods practical for applied ML teams.

### Amazon — Austin, Texas

#### Machine Learning Scientist (2016–2019)

I worked on homepage and mobile personalization systems that rank content from multiple recommendation strategies. I developed an attention-based collaborative filtering method that became the top-performing candidate source by engagement, worked on contextual bandit methods for mobile ranking, and helped build shared representation learning infrastructure for millions of products.

### Atlas Wearables — Austin, Texas

#### Lead Data Scientist (2014–2016)

I built on-device machine learning for wearable fitness devices, including exercise recognition, repetition counting, and form analysis. My work included one-shot learning, clustering, and inference pipelines optimized for resource-constrained embedded hardware.

### Zoetis — Kalamazoo, Michigan

#### Data Scientist (2013–2014)

I built data pipelines to normalize messy, semi-structured data from external sources using statistical NLP, heuristics, and human-in-the-loop feedback. I also worked on genotype search and probabilistic inference.

### Western Michigan University — Kalamazoo, Michigan

#### Research Assistant (2010–2013)

I developed neural network models for agricultural disease risk prediction, with emphasis on spatiotemporal generalization, loss design, and evaluation procedures suited to spatiotemporal data.

### Missouri University of Science and Technology — Rolla, Missouri

#### NSF Undergraduate Research (2010)

I worked on wireless sensor networks in simulation and distributed low-resource settings, including unsupervised outlier detection and dynamic tree-based routing.

## Research & Education

### University of Texas at Austin

Doctoral Researcher, TAUR Lab (2023–2025)

### Western Michigan University

MS Computer Science (2012–2015)  
BS Computer Science, Mathematics (2009–2012)