Microsoft Research Blog
Overview
The Deep Program Understanding project aims to teach machines to understand complex algorithms, combining methods from the programming languages, software engineering, and machine learning communities.
Learning to Understand Programs
Building “smart” software engineering tools requires learning to analyse and understand existing code and related artefacts, such as documentation and online resources (e.g. StackOverflow). One of our primary concerns is integrating standard static analysis methods with machine learning to create learning-based program analyses that can be used within software engineering tools. Such tools can then find bugs, automatically retrieve or produce relevant documentation, or verify programs.
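As a concrete (and much simplified) illustration of the graph representations used in work such as “Learning to Represent Programs with Graphs”, one can turn a program’s syntax tree into a labelled graph. The sketch below does this for Python using the standard `ast` module, emitting only `Child` edges; the actual models add many further edge types (e.g. data-flow and next-token edges), so this is an illustrative starting point, not the project’s pipeline:

```python
import ast

def program_to_graph(source: str):
    """Turn a Python snippet into (nodes, edges), where each node is an
    AST node type and each edge carries a label. Only 'Child' edges are
    emitted here; richer program graphs add data-flow edges too."""
    tree = ast.parse(source)
    nodes, index = [], {}
    for node in ast.walk(tree):
        index[id(node)] = len(nodes)
        nodes.append(type(node).__name__)
    edges = []
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            edges.append((index[id(node)], "Child", index[id(child)]))
    return nodes, edges

nodes, edges = program_to_graph("x = 1\ny = x + 2")
# the root node is the Module; every other node has exactly one parent
```

A learned analysis would then run a graph neural network over `nodes` and `edges` to produce per-node representations.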
Highlighted Publications
- Marc Brockschmidt, Miltos Allamanis, Alexander L. Gaunt, and Oleksandr Polozov. Generative Code Modeling with Graphs. In ICLR’19: International Conference on Learning Representations.
- Pengcheng Yin, Graham Neubig, Miltos Allamanis, Marc Brockschmidt, and Alexander L. Gaunt. Learning to Represent Edits. In ICLR’19: International Conference on Learning Representations.
- Miltos Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to Represent Programs with Graphs. In ICLR’18: International Conference on Learning Representations.
- Miltos Allamanis, Earl T. Barr, Prem Devanbu, and Charles Sutton. A Survey of Machine Learning for Big Code and Naturalness. In ACM Computing Surveys, 2018.
Learning to Generate Programs
A core problem of machine learning is to learn algorithms that explain observed behaviour. This can take several forms: program synthesis from examples, in which an interpretable program matching given input/output pairs must be produced, or programming by demonstration, in which a system learns to mimic sequences of actions.
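Program synthesis from examples can be sketched as a search over a small domain-specific language (DSL): enumerate candidate programs shortest-first and return the first one consistent with every input/output pair. The toy DSL and brute-force search below are our own illustration and far simpler than DeepCoder, which learns to guide exactly this kind of search:

```python
from itertools import product

# A toy DSL of unary list transformations (illustrative only).
DSL = {
    "reverse":    lambda xs: xs[::-1],
    "sort":       lambda xs: sorted(xs),
    "double":     lambda xs: [2 * x for x in xs],
    "drop_first": lambda xs: xs[1:],
}

def synthesize(examples, max_len=2):
    """Return the first sequence of DSL ops consistent with all
    input/output examples, searching programs shortest-first."""
    for length in range(1, max_len + 1):
        for prog in product(DSL, repeat=length):
            def run(xs):
                for op in prog:
                    xs = DSL[op](xs)
                return xs
            if all(run(inp) == out for inp, out in examples):
                return prog
    return None

prog = synthesize([([3, 1, 2], [2, 4, 6]), ([5, 4], [8, 10])])
# finds the two-step program ('sort', 'double')
```

The search space grows exponentially with program length, which is why learned guidance over which operators to try first (as in DeepCoder) matters in practice.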
Highlighted Publications
- Chenglong Wang, Kedar Tatwawadi, Marc Brockschmidt, Po-Sen Huang, Yi Mao, Oleksandr Polozov, and Rishabh Singh. Robust Text-to-SQL Generation with Execution-Guided Decoding. Tech Report, 2018.
- Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. DeepCoder: Learning to Write Programs. In ICLR’17: International Conference on Learning Representations.
- Alexander L. Gaunt, Marc Brockschmidt, Rishabh Singh, Nate Kushman, Pushmeet Kohli, Jonathan Taylor, and Daniel Tarlow. TerpreT: A Probabilistic Programming Language for Program Induction. Tech Report, 2016.
- Miltos Allamanis, Daniel Tarlow, Andrew D. Gordon, and Yi Wei. Bimodal Modelling of Source Code and Natural Language. In ICML’15: International Conference on Machine Learning.
Advancing the Machine Learning Frontier
Structured data such as programs present a challenge for machine learning methods. The combination of domain constraints, known semantics, and complex structure requires new machine learning methods and techniques. Our focus in this area is the analysis and generation of graphs, for which we have developed novel neural network architectures and generative procedures.
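The propagation idea behind gated graph neural networks can be sketched in a few lines: in each step, every node aggregates its neighbours’ states along directed edges and then updates its own state. The sketch below uses scalar node states and a `tanh` squashing update as a stand-in for the per-edge-type weights and GRU cell used in the real architecture:

```python
import math

def propagate(h, edges, w=0.5, steps=3):
    """Simplified message passing in the spirit of gated graph neural
    networks. h is a list of scalar node states; edges are (src, dst)
    pairs. Each step, a node adds a scaled sum of its in-neighbours'
    states and squashes with tanh (a stand-in for the GRU update)."""
    for _ in range(steps):
        msg = [0.0] * len(h)
        for src, dst in edges:
            msg[dst] += w * h[src]
        h = [math.tanh(x + m) for x, m in zip(h, msg)]
    return h

# A 4-node chain: information at node 0 reaches node 3 after 3 steps.
states = propagate([1.0, 0.0, 0.0, 0.0], [(0, 1), (1, 2), (2, 3)])
```

With one propagation step per round, the number of steps bounds how far information can travel, which is why deep or partitioned propagation schemes matter for large graphs.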
Highlighted Publications
- Qi Liu, Miltiadis Allamanis, Marc Brockschmidt, and Alexander L. Gaunt. Constrained Graph Variational Autoencoders for Molecule Design. In NeurIPS’18: Conference on Neural Information Processing Systems.
- Renjie Liao, Marc Brockschmidt, Daniel Tarlow, Alexander Gaunt, Raquel Urtasun, and Richard S. Zemel. Graph Partition Neural Networks for Semi-Supervised Classification. In ICLR’18: International Conference on Learning Representations (Workshop Track).
- Yujia Li, Richard Zemel, Marc Brockschmidt, and Daniel Tarlow. Gated Graph Sequence Neural Networks. In ICLR’16: International Conference on Learning Representations.
Publications
Below is a selection of our group’s relevant publications.
Learning to Understand Programs
- Marc Brockschmidt, Miltos Allamanis, Alexander L. Gaunt, and Oleksandr Polozov. Generative Code Modeling with Graphs. In ICLR’19: International Conference on Learning Representations.
- Pengcheng Yin, Graham Neubig, Miltos Allamanis, Marc Brockschmidt, and Alexander L. Gaunt. Learning to Represent Edits. In ICLR’19: International Conference on Learning Representations.
- Patrick Fernandes, Miltos Allamanis, and Marc Brockschmidt. Structured Neural Summarization. In ICLR’19: International Conference on Learning Representations.
- Miltos Allamanis. The Adverse Effects of Code Duplication in Machine Learning Models of Code. Tech Report, 2019.
- Santanu Dash, Miltos Allamanis, and Earl Barr. RefiNym: Using Names to Refine Types. In FSE’18: Foundations of Software Engineering.
- Vincent Hellendoorn, Christian Bird, Earl Barr, and Miltos Allamanis. Deep Learning Type Inference. In FSE’18: Foundations of Software Engineering.
- Miltos Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to Represent Programs with Graphs. In ICLR’18: International Conference on Learning Representations.
- Miltos Allamanis, Earl T. Barr, Christian Bird, Premkumar Devanbu, Mark Marron, and Charles Sutton. Mining Semantic Loop Idioms from Big Code. In IEEE Transactions on Software Engineering, 2018.
- Miltos Allamanis, Earl T. Barr, Prem Devanbu, and Charles Sutton. A Survey of Machine Learning for Big Code and Naturalness. In ACM Computing Surveys, 2018.
- Miltos Allamanis and Marc Brockschmidt. SmartPaste: Learning to Adapt Source Code. Tech Report, 2017.
- Marc Brockschmidt, Yuxin Chen, Pushmeet Kohli, Siddharth Krishna, and Daniel Tarlow. Learning Shape Analysis. In SAS’17: Static Analysis Symposium.
- Miltos Allamanis, Hao Peng, and Charles Sutton. A Convolutional Attention Network for Extreme Summarization of Source Code. In ICML’16: International Conference on Machine Learning.
- Miltos Allamanis, Earl T. Barr, Christian Bird, and Charles Sutton. Suggesting Accurate Method and Class Names. In FSE’15: Foundations of Software Engineering.
- Marc Brockschmidt, Yuxin Chen, Byron Cook, Pushmeet Kohli, and Daniel Tarlow. Learning to Decipher the Heap for Program Verification. In Workshop on Constructive Machine Learning at ICML 2015. Best Paper Award.
- Chris Maddison and Daniel Tarlow. Structured Generative Models of Natural Source Code. In ICML’14: International Conference on Machine Learning.
- Miltos Allamanis, Earl T. Barr, Christian Bird, and Charles Sutton. Learning Natural Coding Conventions. In FSE’14: Foundations of Software Engineering.
Learning to Generate Programs
- Chenglong Wang, Kedar Tatwawadi, Marc Brockschmidt, Po-Sen Huang, Yi Mao, Oleksandr Polozov, and Rishabh Singh. Robust Text-to-SQL Generation with Execution-Guided Decoding. Tech Report, 2018.
- Chenglong Wang, Marc Brockschmidt, and Rishabh Singh. Pointing Out SQL Queries From Text. Tech Report, 2017.
- Pratiksha Thaker, Daniel Tarlow, Marc Brockschmidt, and Alexander L. Gaunt. Semantics-aware Program Sampling. In DISCML @ NeurIPS’17.
- Alexander L. Gaunt, Marc Brockschmidt, Nate Kushman, and Daniel Tarlow. Differentiable Programs with Neural Libraries. In ICML’17: International Conference on Machine Learning.
- Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. DeepCoder: Learning to Write Programs. In ICLR’17: International Conference on Learning Representations.
- John K. Feser, Marc Brockschmidt, Alexander L. Gaunt, and Daniel Tarlow. Neural Functional Programming. In ICLR’17: International Conference on Learning Representations (Workshop Track).
- Chengtao Li, Daniel Tarlow, Alexander L. Gaunt, Marc Brockschmidt, and Nate Kushman. Neural Program Lattices. In ICLR’17: International Conference on Learning Representations.
- Alexander L. Gaunt, Marc Brockschmidt, Rishabh Singh, Nate Kushman, Pushmeet Kohli, Jonathan Taylor, and Daniel Tarlow. TerpreT: A Probabilistic Programming Language for Program Induction. Tech Report, 2016.
- Mukund Raghothaman, Yi Wei, and Youssef Hamadi. SWIM: Synthesizing What I Mean. In ICSE’16: International Conference on Software Engineering.
- Miltos Allamanis, Daniel Tarlow, Andrew D. Gordon, and Yi Wei. Bimodal Modelling of Source Code and Natural Language. In ICML’15: International Conference on Machine Learning.
Advancing the Machine Learning Frontier
- Qi Liu, Miltiadis Allamanis, Marc Brockschmidt, and Alexander L. Gaunt. Constrained Graph Variational Autoencoders for Molecule Design. In NeurIPS’18: Conference on Neural Information Processing Systems.
- Renjie Liao, Marc Brockschmidt, Daniel Tarlow, Alexander Gaunt, Raquel Urtasun, and Richard S. Zemel. Graph Partition Neural Networks for Semi-Supervised Classification. In ICLR’18: International Conference on Learning Representations (Workshop Track).
- Yujia Li, Richard Zemel, Marc Brockschmidt, and Daniel Tarlow. Gated Graph Sequence Neural Networks. In ICLR’16: International Conference on Learning Representations.
Relevant Software
We have open-sourced many of our implementations.
Libraries
- dpu-utils: useful Python utilities for projects on deep program understanding.
- gated-graph-neural-networks: A set of efficient TensorFlow implementations of graph neural networks that can handle large and sparse graphs.
Project-Specific Utilities
- constrained-graph-variational-autoencoders: code for constrained graph VAEs.
- DeepCoder-Utils: Code used in the experiments of the DeepCoder paper (ICLR’17).
- graph-partition-neural-network-samples: Sample code for Graph Partition Neural Networks.
- dpu-learning-to-represent-edits: C# data extraction for “Learning to Represent Edits”.
- graph-based-code-modelling: Code for the group’s ICLR’18 and ICLR’19 papers on graph-based code modelling.