-
Scaling Exponents Across Parameterizations and Optimizers
[arxiv]
[pdf]
K Everett, L Xiao, M Wortsman, AA Alemi, R Novak, PJ Liu, I Gur, J Sohl-Dickstein, LP Kaelbling, J Lee, J Pennington
2024-07
ICML 2024
Understanding parameterizations and how to scale them.
-
Training LLMs over Neurally Compressed Text
[arxiv]
[pdf]
B Lester, J Lee, AA Alemi, J Pennington, A Roberts, J Sohl-Dickstein, N Constant
2024-04
Trying to train transformers on top of transformers with arithmetic compression.
-
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
[arxiv]
[pdf]
PAGI
2023-12
TMLR
Squeezing more performance out of models by fine-tuning on filtered generated responses.
-
Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"
[arxiv]
[pdf]
PAGI
2023-11
It's easy to get models to perform arithmetic incorrectly if you just ask nicely.
-
Small-scale proxies for large-scale Transformer training instabilities
[arxiv]
[pdf]
M Wortsman & PAGI
2023-09
ICLR 2024
Studying the problems of large-scale models in the small.
-
Speed Limits for Deep Learning
[arxiv]
[pdf]
I Seroussi, AA Alemi, M Helias, Z Ringel
2023-07
Working out thermodynamic speed limits for learning.
-
Variational Prediction
[arxiv]
[pdf]
AA Alemi, B Poole
2023-05
AABI 2023
Targeting the predictive distribution directly with a variational method.
-
Weighted Ensemble Self-Supervised Learning
[arxiv]
[pdf]
[openreview]
Y Ruan, S Singh, WR Morningstar, AA Alemi, S Ioffe, I Fischer, JV Dillon
2022-11
ICLR 2023
Ensembling the heads of SSL methods gives nice gains.
-
Trajectory ensembling for fine tuning - performance gains without modifying training
[pdf]
[openreview]
[video]
L Anderson-Conway, V Birodkar, S Singh, H Mobahi, AA Alemi
2022-09
HITY Workshop NeurIPS 2022
Ensembling within a trajectory gives some simple gains.
-
Bayesian Imitation Learning for End-to-End Mobile Manipulation
[arxiv]
[pdf]
Y Du, D Ho, AA Alemi, E Jang, M Khansari
2022-02
ICML 2022
Using VIB to help robots open doors.
-
A Closer Look at the Adversarial Robustness of Information Bottleneck Models
[arxiv]
[pdf]
[openreview]
I Korshunova, D Stutz, AA Alemi, O Wiles, S Gowal
2021-06
ICML 2021 AML Workshop Poster
Looking more carefully, IB models aren't fully robust to adversarial examples.
-
Does Knowledge Distillation Really Work?
[arxiv]
[pdf]
S Stanton, P Izmailov, P Kirichenko, AA Alemi, AG Wilson
2021-06
NeurIPS 2021
Knowledge distillation doesn't seem to work as well as people assume it does.
-
VIB is Half Bayes
[arxiv]
[pdf]
[poster-talk]
[talk]
AA Alemi, WR Morningstar, B Poole, I Fischer, JV Dillon
2020-11
AABI 2021 Oral
VIB can be rederived as a half-Bayesian, half-maximum-likelihood method.
-
PACᵐ-Bayes: Narrowing the Empirical Risk Gap in the Misspecified Bayesian Regime
[arxiv]
[pdf]
[video]
WR Morningstar, AA Alemi, JV Dillon
2020-10
AISTATS 2022
Multisample bound that does better than Bayes at prediction for misspecified models.
-
Density of States Estimation for Out-of-Distribution Detection
[arxiv]
[pdf]
WR Morningstar, C Ham, AG Gallagher, B Lakshminarayanan, AA Alemi, JV Dillon
2020-06
AISTATS 2021 Oral
A simple density-of-states-inspired approach to out-of-distribution detection.
-
The OpenKIM Processing Pipeline: A Cloud-Based Automatic Materials Property Computation Engine
[arxiv]
[pdf]
[openkim.org]
DS Karls, M Bierbaum, AA Alemi, RS Elliot, JP Sethna, EB Tadmor
2020-05
Journal of Chemical Physics
Cloud-based computation engine for the OpenKIM interatomic potential database.
-
Neural Tangents: Fast and Easy Infinite Neural Networks in Python
[arxiv]
[pdf]
[code]
R Novak, L Xiao, J Hron, J Lee, AA Alemi, J Sohl-Dickstein, SS Schoenholz
2019-12
ICLR
An easy-to-use Python package for training infinitely wide neural networks.
-
Variational Predictive Information Bottleneck
[arxiv]
[pdf]
AA Alemi
2019-10
AABI
Most modern inference procedures can be rederived as a simple variational bound on a predictive information bottleneck objective.
-
Information in Infinite Ensembles of Infinitely-Wide Networks
[arxiv]
[pdf]
R Shwartz-Ziv, AA Alemi
2019-10
AABI 2019 - PMLR
While they seem complex, infinite ensembles of infinitely-wide networks are simple enough to enable tractable calculations of many information theoretic quantities.
-
CEB Improves Model Robustness
[arxiv]
[pdf]
I Fischer, AA Alemi
2019-10
Entropy
A class-conditional version of VIB shows good robustness.
-
On Predictive Information in RNNs
[arxiv]
[pdf]
Z Dong, D Oktay, B Poole, AA Alemi
2019-10
Modern RNNs do not optimally capture predictive information in sequences.
-
Thermodynamic Computing
[arxiv]
[pdf]
T Conte, E DeBenedictis, N Ganesh, T Hylton, JP Strachan, RS Williams, AA Alemi, L Altenberg, G Crooks, J Crutchfield, L del Rio, J Deutsch, M DeWeese, K Douglas, M Esposito, M Frank, R Fry, P Harsha, M Hill, C Kello, J Krichmar, S Kumar, SC Liu, S Lloyd, M Marsili, I Nemenman, A Nugent, N Packard, D Randall, P Sadowski, N Santhanam, R Shaw, A Stieg, E Stopnitzky, C Teuscher, C Watkins, D Wolpert, J Yang, Y Yufik
2019-11
CCC
A position paper on the future of thermodynamic computing.
-
On Variational Bounds of Mutual Information
[arxiv]
[pdf]
B Poole, S Ozair, A van den Oord, AA Alemi, G Tucker
2019-05
ICML
Overview of recent advances in variationally bounding mutual information.
-
Dueling Decoders: Regularizing Variational Autoencoder Latent Spaces
[arxiv]
[pdf]
B Seybold, E Fertig, AA Alemi, I Fischer
2019-05
Sometimes a worse decoder gives better representations.
-
Variational Autoencoders with Tensorflow Probability Layers
[post]
I Fischer, AA Alemi, JV Dillon, TFP Team
2019-03
Tensorflow Blog
TFP makes VAEs easy.
-
On the Use of ArXiv as a Dataset
[arxiv]
[pdf]
[code]
CB Clement, M Bierbaum, KP O'Keeffe, AA Alemi
2019-05
ICLR workshop RLGM
More people should use the ArXiv as a dataset.
-
β-VAEs can retain label information even at high compression
[arxiv]
[pdf]
E Fertig, A Arbabi, AA Alemi
2018-12
NeurIPS BDL Workshop
Some rich decoder VAEs can magically focus on salient information.
-
Canonical Sectors and Evolution of Firms in the US Stock Markets
[arxiv]
[pdf]
LX Hayden, R Chachra, AA Alemi, PH Ginsparg, JP Sethna
2018-10
Quantitative Finance
Matrix factorization gives automatic and continuous sector assignments to stocks.
-
WAIC, but Why? Generative Ensembles for Robust Anomaly Detection
[arxiv]
[pdf]
H Choi, E Jang, AA Alemi
2018-10
Even though it shouldn't work, robust likelihoods can detect OOD data in practice.
-
TherML: Thermodynamics of Machine Learning
[arxiv]
[pdf]
[video]
AA Alemi, I Fischer
2018-07
ICML 2018 TFADGM Workshop
Modern variational latent variable modelling looks a lot like Thermodynamics.
-
Uncertainty in the Variational Information Bottleneck
[arxiv]
[pdf]
[slides]
AA Alemi, I Fischer, JV Dillon
2018-07
UAI UDL Workshop
VIB builds robust classifiers which are aware of what they don't know.
-
Watch your step: Learning node embeddings via graph attention
[arxiv]
[pdf]
S Abu-El-Haija, B Perozzi, R Al-Rfou, AA Alemi
2018-12
NeurIPS
Building better graph representations.
-
GILBO: one metric to measure them all
[arxiv]
[pdf]
AA Alemi, I Fischer
2018-12
NeurIPS
A variational lower bound on the mutual information in GANs highlights some of their problems.
-
Fixing a Broken ELBO
[arxiv]
[pdf]
[slides]
AA Alemi, B Poole, I Fischer, JV Dillon, RA Saurous, K Murphy
2018-05
ICML
Adopting a representational view of VAEs can help explain away some of their problems.
-
Tensorflow distributions
[arxiv]
[pdf]
[code]
JV Dillon, I Langmore, D Tran, E Brevdo, S Vasudevan, D Moore, B Patton, AA Alemi, M Hoffman, RA Saurous
2017-11
Paper accompanying the TensorFlow Distributions library.
-
Light microscopy at maximal precision
[arxiv]
[pdf]
M Bierbaum, BD Leahy, AA Alemi, I Cohen, JP Sethna
2017-02
Phys Rev X
More precise localization of colloids.
-
Jeffrey's prior sampling of deep sigmoidal networks
[arxiv]
[pdf]
LX Hayden, AA Alemi, PH Ginsparg, JP Sethna
2017-05
The Jeffreys prior doesn't really work for neural networks.
-
Motion prediction under multimodality with conditional stochastic networks
[arxiv]
[pdf]
K Fragkiadaki, J Huang, AA Alemi, S Vijayanarasimhan, S Ricco, R Sukthankar
2017-05
Pedestrian motion is stochastic, which creates challenges for prediction.
-
Inception-v4, Inception-ResNet and the impact of residual connections on learning
[arxiv]
[pdf]
C Szegedy, S Ioffe, V Vanhoucke, AA Alemi
2017-02
AAAI
Residual connections improve the inception family of classifiers.
-
Deep Variational Information Bottleneck
[arxiv]
[pdf]
AA Alemi, I Fischer, JV Dillon, K Murphy
2017-03
ICLR
A modern formulation of the Information Bottleneck which is friendly towards neural networks.
-
Improved generator objectives for GANs
[arxiv]
[pdf]
B Poole, AA Alemi, J Sohl-Dickstein, A Angelova
2016-12
NeurIPS Adversarial Workshop
You can target separate divergences for the generator and discriminator of a GAN.
-
Tree-Structured Variational Autoencoder
[pdf]
R Shin, AA Alemi, G Irving, O Vinyals
2016-11
Attempting to learn tree-structured representations.
-
Improving Inception and image classification in TensorFlow
[post]
AA Alemi
2016-06
Google Research Blog
Blog post accompanying the open-source release of Inception-ResNet-v2.
-
DeepMath: deep sequence models for premise selection
[arxiv]
[pdf]
G Irving, C Szegedy, AA Alemi, N Eén, F Chollet, J Urban
2016-06
NeurIPS
Using neural networks to improve automatic theorem proving.
-
SPARTA: Fast global planning of collision-avoiding robot trajectories
[pdf]
CJM Mathy, F Gonda, D Schmidt, N Derbinsky, AA Alemi, J Bento, FM Delle Fave, JS Yedidia
2015-12
Using ADMM to do fast trajectory planning.
-
You can run, you can hide: The epidemiology and statistical mechanics of zombies
[arxiv]
[pdf]
AA Alemi, M Bierbaum, CR Myers, JP Sethna
2015-11
Phys Rev E
A fun pedagogical introduction to epidemiology and statistical mechanics.
-
Zombies Reading Segmented Graphene Articles On The Arxiv
[pdf]
AA Alemi
2015-08
Thesis
A collection of four of my graduate student projects.
-
Clustering via Content-Augmented Stochastic Blockmodels
[arxiv]
[pdf]
JM Cashore, X Zhao, AA Alemi, Y Liu, PI Frazier
2015-05
Better clustering through content conditioning.
-
Text segmentation based on semantic word embeddings
[arxiv]
[pdf]
AA Alemi, P Ginsparg
2015-03
Using word2vec vectors to do automatic text segmentation.
-
Mechanical properties of growing melanocytic nevi and the progression to melanoma
[arxiv]
[pdf]
A Taloni, AA Alemi, E Ciusani, JP Sethna, S Zapperi, CAM La Porta
2014-04
PloS One
Elastic models of skin cancer.
-
Ensuring reliability, reproducibility and transferability in atomistic simulations: The knowledgebase of interatomic models
[pdf]
[openkim.org]
E Tadmor, R Elliott, D Karls, A Ludvik, J Sethna, M Bierbaum, AA Alemi, T Wennblom
2014-10
-
Knowledgebase of Interatomic Models application programming interface as a standard for molecular simulations
[pdf]
[openkim.org]
R Elliott, E Tadmor, D Karls, A Ludvik, J Sethna, M Bierbaum, AA Alemi, T Wennblom
2014-10
Building a website to collect interatomic potentials and score them.
-
Imaging atomic rearrangements in two-dimensional silica glass: watching silica's dance
[pdf]
PY Huang, S Kurasch, JS Alden, A Shekhawat, AA Alemi, PL McEuen, JP Sethna, U Kaiser, DA Muller
2013-10
Science
Applying elastic theory to the atomic scale.
-
Growth and form of melanoma cell colonies
[arxiv]
[pdf]
MM Baraldi, AA Alemi, JP Sethna, S Caracciolo, CAM La Porta, S Zapperi
2013-08
JSM
Simple models of skin cancer growth.
-
Near-field radiative heat transfer between macroscopic planar surfaces
[arxiv]
[pdf]
RS Ottens, V Quetschke, S Wise, AA Alemi, R Lundock, G Mueller, DH Reitze, DB Tanner, BF Whiting
2011-03
Phys Rev Lett
Exploration of photon tunnelling as a mechanism for cooling next-generation LIGO detectors.
-
Laplace-Runge-Lenz Vector
[pdf]
AA Alemi
2009-06
Undergraduate project on the history of the Laplace-Runge-Lenz vector.
-
NEMS Coupling
[pdf]
AA Alemi
2008-09
Undergraduate research project on synchronization in nano-cantilevers.
-
Why Venus has no moon
[pdf]
AA Alemi, DJ Stevenson
2006-09
AAS Oral
Undergraduate research investigating whether two collisions in opposite directions could explain Venus's lack of a moon and slow rotation.