
Training LLMs over Neurally Compressed Text
[arxiv]
[pdf]
B Lester, J Lee, AA Alemi, J Pennington, A Roberts, J Sohl-Dickstein, N Constant
202404
Trying to train transformers on top of transformers with arithmetic compression.

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
[arxiv]
[pdf]
PAGI
202312
Squeezing more performance out of models by finetuning on filtered generated responses.

Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"
[arxiv]
[pdf]
PAGI
202311
It's easy to get models to perform arithmetic incorrectly, if you just ask nicely.

Small-scale proxies for large-scale Transformer training instabilities
[arxiv]
[pdf]
M Wortsman & PAGI
202309
Studying problems of large scale models in the small.

Speed Limits for Deep Learning
[arxiv]
[pdf]
I Seroussi, AA Alemi, M Helias, Z Ringel
202307
Working out thermodynamic speed limits for learning.

Variational Prediction
[arxiv]
[pdf]
AA Alemi, B Poole
202305
AABI2023
Targeting the predictive distribution directly with a variational method.

Weighted Ensemble Self-Supervised Learning
[arxiv]
[pdf]
[openreview]
Y Ruan, S Singh, WR Morningstar, AA Alemi, S Ioffe, I Fischer, JV Dillon
202211
ICLR 2023
Ensembling the heads of SSL methods gives nice gains.

Trajectory ensembling for fine tuning - performance gains without modifying training
[pdf]
[openreview]
[video]
L Anderson-Conway, V Birodkar, S Singh, H Mobahi, AA Alemi
202209
HITY Workshop NeurIPS 2022
Ensembling within a trajectory gives some simple gains.

Bayesian Imitation Learning for End-to-End Mobile Manipulation
[arxiv]
[pdf]
Y Du, D Ho, AA Alemi, E Jang, M Khansari
202202
ICML 2022
Using VIB to help robots open doors.

A Closer Look at the Adversarial Robustness of Information Bottleneck Models
[arxiv]
[pdf]
[openreview]
I Korshunova, D Stutz, AA Alemi, O Wiles, S Gowal
202106
ICML 2021 AML Workshop Poster
Looking more carefully, IB models aren't fully robust to adversarial examples.

Does Knowledge Distillation Really Work?
[arxiv]
[pdf]
S Stanton, P Izmailov, P Kirichenko, AA Alemi, AG Wilson
202106
NeurIPS2021
Knowledge distillation doesn't seem to work as well as people assume it does.

VIB is Half Bayes
[arxiv]
[pdf]
[postertalk]
[talk]
AA Alemi, WR Morningstar, B Poole, I Fischer, JV Dillon
202011
AABI 2021 Oral
VIB can be rederived as a half-Bayesian, half-maximum likelihood method.

PACᵐ-Bayes: Narrowing the Empirical Risk Gap in the Misspecified Bayesian Regime
[arxiv]
[pdf]
[video]
WR Morningstar, AA Alemi, JV Dillon
202010
AISTATS2022
Multisample bound that does better than Bayes at prediction for misspecified models.

Density of States Estimation for Out-of-Distribution Detection
[arxiv]
[pdf]
WR Morningstar, C Ham, AG Gallagher, B Lakshminarayanan, AA Alemi, JV Dillon
202006
AISTATS 2021 Oral
Simple density-of-states inspired out-of-distribution detection.

The OpenKIM Processing Pipeline: A Cloud-Based Automatic Materials Property Computation Engine
[arxiv]
[pdf]
[openkim.org]
DS Karls, M Bierbaum, AA Alemi, RS Elliott, JP Sethna, EB Tadmor
202005
Journal of Chemical Physics
Database for Interatomic Potentials.

Neural Tangents: Fast and Easy Infinite Neural Networks in Python
[arxiv]
[pdf]
[code]
R Novak, L Xiao, J Hron, J Lee, AA Alemi, J Sohl-Dickstein, SS Schoenholz
201912
ICLR
Simple-to-use Python package for training infinitely wide neural networks.

Variational Predictive Information Bottleneck
[arxiv]
[pdf]
AA Alemi
201910
AABI
Most modern inference procedures can be rederived as a simple variational bound on a predictive information bottleneck objective.

Information in Infinite Ensembles of Infinitely-Wide Networks
[arxiv]
[pdf]
R Shwartz-Ziv, AA Alemi
201910
AABI 2019, PMLR
While they seem complex, infinite ensembles of infinitely-wide networks are simple enough to enable tractable calculations of many information theoretic quantities.

CEB Improves Model Robustness
[arxiv]
[pdf]
I Fischer, AA Alemi
201910
Entropy
A class-conditional version of VIB shows good robustness.

On Predictive Information in RNNs
[arxiv]
[pdf]
Z Dong, D Oktay, B Poole, AA Alemi
201910
Modern RNNs do not optimally capture predictive information in sequences.

Thermodynamic Computing
[arxiv]
[pdf]
T Conte, E DeBenedictis, N Ganesh, T Hylton, JP Strachan, RS Williams, AA Alemi, L Altenberg, G Crooks, J Crutchfield, L del Rio, J Deutsch, M DeWeese, K Douglas, M Esposito, M Frank, R Fry, P Harsha, M Hill, C Kello, J Krichmar, S Kumar, SC Liu, S Lloyd, M Marsili, I Nemenman, A Nugent, N Packard, D Randall, P Sadowski, N Santhanam, R Shaw, A Stieg, E Stopnitzky, C Teuscher, C Watkins, D Wolpert, J Yang, Y Yufik
201911
CCC
A position paper on the future of thermodynamic computing.

On Variational Bounds of Mutual Information
[arxiv]
[pdf]
B Poole, S Ozair, A van den Oord, AA Alemi, G Tucker
201905
ICML
Overview of recent advances in variationally bounding mutual information.

Dueling Decoders: Regularizing Variational Autoencoder Latent Spaces
[arxiv]
[pdf]
B Seybold, E Fertig, AA Alemi, I Fischer
201905
Sometimes a worse decoder gives better representations.

Variational Autoencoders with Tensorflow Probability Layers
[post]
I Fischer, AA Alemi, JV Dillon, TFP Team
201903
Tensorflow Blog
TFP makes VAEs easy.

On the Use of ArXiv as a Dataset
[arxiv]
[pdf]
[code]
CB Clement, M Bierbaum, KP O'Keeffe, AA Alemi
201905
ICLR workshop RLGM
More people should use the ArXiv as a dataset.

β-VAEs can retain label information even at high compression
[arxiv]
[pdf]
E Fertig, A Arbabi, AA Alemi
201812
NeurIPS BDL Workshop
Some rich-decoder VAEs can magically focus on salient information.

Canonical Sectors and Evolution of Firms in the US Stock Markets
[arxiv]
[pdf]
LX Hayden, R Chachra, AA Alemi, PH Ginsparg, JP Sethna
201810
Quantitative Finance
Matrix factorization gives automatic and continuous sector assignments to stocks.

WAIC, but Why? Generative Ensembles for Robust Anomaly Detection
[arxiv]
[pdf]
H Choi, E Jang, AA Alemi
201810
Even though it shouldn't work, robust likelihoods can detect OOD data in practice.

TherML: Thermodynamics of Machine Learning
[arxiv]
[pdf]
[video]
AA Alemi, I Fischer
201807
ICML2018 TFADGM Workshop
Modern variational latent variable modelling looks a lot like Thermodynamics.

Uncertainty in the Variational Information Bottleneck
[arxiv]
[pdf]
[slides]
AA Alemi, I Fischer, JV Dillon
201807
UAI UDL Workshop
VIB builds robust classifiers which are aware of what they don't know.

Watch your step: Learning node embeddings via graph attention
[arxiv]
[pdf]
S Abu-El-Haija, B Perozzi, R Al-Rfou, AA Alemi
201812
NeurIPS
Building better graph representations.

GILBO: one metric to measure them all
[arxiv]
[pdf]
AA Alemi, I Fischer
201812
NeurIPS
A variational lower bound on the mutual information in GANs highlights some of their problems.

Fixing a Broken ELBO
[arxiv]
[pdf]
[slides]
AA Alemi, B Poole, I Fischer, JV Dillon, RA Saurous, K Murphy
201805
ICML
Adopting a representational view of VAEs can help explain away some of their problems.

Tensorflow distributions
[arxiv]
[pdf]
[code]
JV Dillon, I Langmore, D Tran, E Brevdo, S Vasudevan, D Moore, B Patton, AA Alemi, M Hoffman, RA Saurous
201711
Paper accompanying library.

Light microscopy at maximal precision
[arxiv]
[pdf]
M Bierbaum, BD Leahy, AA Alemi, I Cohen, JP Sethna
201702
Phys Rev X
Better featuring of colloids.

Jeffreys prior sampling of deep sigmoidal networks
[arxiv]
[pdf]
LX Hayden, AA Alemi, PH Ginsparg, JP Sethna
201705
Jeffreys prior doesn't really work for neural networks.

Motion prediction under multimodality with conditional stochastic networks
[arxiv]
[pdf]
K Fragkiadaki, J Huang, AA Alemi, S Vijayanarasimhan, S Ricco, R Sukthankar
201705
Pedestrian motion is stochastic which creates certain challenges.

Inception-v4, inception-resnet and the impact of residual connections on learning
[arxiv]
[pdf]
C Szegedy, S Ioffe, V Vanhoucke, AA Alemi
201702
AAAI
Residual connections improve the inception family of classifiers.

Deep Variational Information Bottleneck
[arxiv]
[pdf]
AA Alemi, I Fischer, JV Dillon, K Murphy
201703
ICLR
A modern formulation of the Information Bottleneck which is friendly towards neural networks.

Improved generator objectives for gans
[arxiv]
[pdf]
B Poole, AA Alemi, J SohlDickstein, A Angelova
201612
NeurIPS Adversarial Workshop
You can target separate divergences for the generator and discriminator of a GAN.

Tree-Structured Variational Autoencoder
[pdf]
R Shin, AA Alemi, G Irving, O Vinyals
201611
Attempting to learn tree-structured representations.

Improving inception and image classification in tensorflow
[post]
AA Alemi
201606
Google Research Blog
Blog post accompanying the open source release of Inception-ResNet-v2.

DeepMath - deep sequence models for premise selection
[arxiv]
[pdf]
G Irving, C Szegedy, AA Alemi, N Eén, F Chollet, J Urban
201606
NeurIPS
Using neural networks to improve automatic theorem proving.

SPARTA: Fast global planning of collision-avoiding robot trajectories
[pdf]
CJM Mathy, F Gonda, D Schmidt, N Derbinsky, AA Alemi, J Bento, FM Delle Fave, JS Yedidia
201512
Using ADMM to do fast trajectory planning.

You can run, you can hide: The epidemiology and statistical mechanics of zombies
[arxiv]
[pdf]
AA Alemi, M Bierbaum, CR Myers, JP Sethna
201511
Phys Rev E
A fun pedagogical introduction to epidemiology and statistical mechanics.

Zombies Reading Segmented Graphene Articles On The Arxiv
[pdf]
AA Alemi
201508
Thesis
A collection of four of my graduate student projects.

Clustering via Content-Augmented Stochastic Blockmodels
[arxiv]
[pdf]
JM Cashore, X Zhao, AA Alemi, Y Liu, PI Frazier
201505
Better clustering through content conditioning.

Text segmentation based on semantic word embeddings
[arxiv]
[pdf]
AA Alemi, P Ginsparg
201503
Using word2vec vectors to do automatic text segmentation.

Mechanical properties of growing melanocytic nevi and the progression to melanoma
[arxiv]
[pdf]
A Taloni, AA Alemi, E Ciusani, JP Sethna, S Zapperi, CAM La Porta
201404
PloS One
Elastic models of skin cancer.

Ensuring reliability, reproducibility and transferability in atomistic simulations: The knowledgebase of interatomic models (https://openkim.org)
[pdf]
E Tadmor, R Elliott, D Karls, A Ludvik, J Sethna, M Bierbaum, AA Alemi, T Wennblom
201410

Knowledgebase of Interatomic Models application programming interface as a standard for molecular simulations
[pdf]
[openkim.org]
R Elliott, E Tadmor, D Karls, A Ludvik, J Sethna, M Bierbaum, AA Alemi, T Wennblom
201410
Building a website to collect interatomic potentials and score them.

Imaging atomic rearrangements in two-dimensional silica glass: watching silica's dance
[pdf]
PY Huang, S Kurasch, JS Alden, A Shekhawat, AA Alemi, PL McEuen, JP Sethna, U Kaiser, DA Muller
201310
Science
Applying elastic theory to the atomic scale.

Growth and form of melanoma cell colonies
[arxiv]
[pdf]
MM Baraldi, AA Alemi, JP Sethna, S Caracciolo, CAM La Porta, S Zapperi
201308
JSM
Simple models of skin cancer growth.

Near-field radiative heat transfer between macroscopic planar surfaces
[arxiv]
[pdf]
RS Ottens, V Quetschke, S Wise, AA Alemi, R Lundock, G Mueller, DH Reitze, DB Tanner, BF Whiting
201103
Phys Rev Lett
Exploration of quantum tunnelling as a mechanism for cooling the next generation LIGO detectors.

Laplace-Runge-Lenz Vector
[pdf]
AA Alemi
200906
Undergraduate project on the history of the Runge Vector.

NEMS Coupling
[pdf]
AA Alemi
200809
Undergraduate research project on synchronization in nano cantilevers.

Why Venus has no moon
[pdf]
AA Alemi, DJ Stevenson
200609
AAS Oral
Undergraduate research investigating whether two collisions in opposite directions could explain Venus' lack of a moon and slow rotation.