Microsoft Research Blog
Overview
The Deep Program Understanding project aims to teach machines to understand complex algorithms, combining methods from the programming languages, software engineering, and machine learning communities.
Learning to Understand Programs
Building “smart” software engineering tools requires learning to analyse and understand existing code and related artefacts, such as documentation and online resources (e.g. StackOverflow). One of our primary concerns is integrating standard static analysis methods with machine learning to create learning-based program analyses that can be used within software engineering tools. Such tools can then find bugs, automatically retrieve or produce relevant documentation, or verify programs.
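As a concrete (and much simplified) illustration of the graph representations used in work such as “Learning to Represent Programs with Graphs”, one can turn a program’s syntax tree into a labelled graph. The sketch below does this for Python using the standard `ast` module, emitting only `Child` edges; the actual models add many further edge types (e.g. data-flow and next-token edges), so this is an illustrative starting point, not the project’s pipeline:

```python
import ast

def program_to_graph(source: str):
    """Turn a Python snippet into (nodes, edges), where each node is an
    AST node type and each edge carries a label. Only 'Child' edges are
    emitted here; richer program graphs add data-flow edges too."""
    tree = ast.parse(source)
    nodes, index = [], {}
    for node in ast.walk(tree):
        index[id(node)] = len(nodes)
        nodes.append(type(node).__name__)
    edges = []
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            edges.append((index[id(node)], "Child", index[id(child)]))
    return nodes, edges

nodes, edges = program_to_graph("x = 1\ny = x + 2")
# the root node is the Module; every other node has exactly one parent
```

A learned analysis would then run a graph neural network over `nodes` and `edges` to produce per-node representations.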
Highlighted Publications
- Marc Brockschmidt, Miltos Allamanis, Alexander L. Gaunt, and Oleksandr Polozov. Generative Code Modeling with Graphs. In ICLR’19: International Conference on Learning Representations.
- Pengcheng Yin, Graham Neubig, Miltos Allamanis, Marc Brockschmidt, and Alexander L. Gaunt. Learning to Represent Edits. In ICLR’19: International Conference on Learning Representations.
- Miltos Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to Represent Programs with Graphs. In ICLR’18: International Conference on Learning Representations.
- Miltos Allamanis, Earl T. Barr, Prem Devanbu, and Charles Sutton. A Survey of Machine Learning for Big Code and Naturalness. In ACM Computing Surveys, 2018.
Learning to Generate Programs
A core problem of machine learning is to learn algorithms that explain observed behaviour. This can take several forms: program synthesis from examples, in which an interpretable program matching given input/output pairs must be produced, or programming by demonstration, in which a system learns to mimic sequences of actions.
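Program synthesis from examples can be sketched as a search over a small domain-specific language (DSL): enumerate candidate programs shortest-first and return the first one consistent with every input/output pair. The toy DSL and brute-force search below are our own illustration and far simpler than DeepCoder, which learns to guide exactly this kind of search:

```python
from itertools import product

# A toy DSL of unary list transformations (illustrative only).
DSL = {
    "reverse":    lambda xs: xs[::-1],
    "sort":       lambda xs: sorted(xs),
    "double":     lambda xs: [2 * x for x in xs],
    "drop_first": lambda xs: xs[1:],
}

def synthesize(examples, max_len=2):
    """Return the first sequence of DSL ops consistent with all
    input/output examples, searching programs shortest-first."""
    for length in range(1, max_len + 1):
        for prog in product(DSL, repeat=length):
            def run(xs):
                for op in prog:
                    xs = DSL[op](xs)
                return xs
            if all(run(inp) == out for inp, out in examples):
                return prog
    return None

prog = synthesize([([3, 1, 2], [2, 4, 6]), ([5, 4], [8, 10])])
# finds the two-step program ('sort', 'double')
```

The search space grows exponentially with program length, which is why learned guidance over which operators to try first (as in DeepCoder) matters in practice.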
Highlighted Publications
- Chenglong Wang, Kedar Tatwawadi, Marc Brockschmidt, Po-Sen Huang, Yi Mao, Oleksandr Polozov, and Rishabh Singh. Robust Text-to-SQL Generation with Execution-Guided Decoding. Tech Report, 2018.
- Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. DeepCoder: Learning to Write Programs. In ICLR’17: International Conference on Learning Representations.
- Alexander L. Gaunt, Marc Brockschmidt, Rishabh Singh, Nate Kushman, Pushmeet Kohli, Jonathan Taylor, and Daniel Tarlow. TerpreT: A Probabilistic Programming Language for Program Induction. Tech Report, 2016.
- Miltos Allamanis, Daniel Tarlow, Andrew D. Gordon, and Yi Wei. Bimodal Modelling of Source Code and Natural Language. In ICML’15: International Conference on Machine Learning.
Advancing the Machine Learning Frontier
Structured data such as programs present a challenge for machine learning methods. The combination of domain constraints, known semantics, and complex structure requires new machine learning methods and techniques. Our focus in this area is the analysis and generation of graphs, for which we have developed novel neural network architectures and generative procedures.
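The propagation idea behind gated graph neural networks can be sketched in a few lines: in each step, every node aggregates its neighbours’ states along directed edges and then updates its own state. The sketch below uses scalar node states and a `tanh` squashing update as a stand-in for the per-edge-type weights and GRU cell used in the real architecture:

```python
import math

def propagate(h, edges, w=0.5, steps=3):
    """Simplified message passing in the spirit of gated graph neural
    networks. h is a list of scalar node states; edges are (src, dst)
    pairs. Each step, a node adds a scaled sum of its in-neighbours'
    states and squashes with tanh (a stand-in for the GRU update)."""
    for _ in range(steps):
        msg = [0.0] * len(h)
        for src, dst in edges:
            msg[dst] += w * h[src]
        h = [math.tanh(x + m) for x, m in zip(h, msg)]
    return h

# A 4-node chain: information at node 0 reaches node 3 after 3 steps.
states = propagate([1.0, 0.0, 0.0, 0.0], [(0, 1), (1, 2), (2, 3)])
```

With one propagation step per round, the number of steps bounds how far information can travel, which is why deep or partitioned propagation schemes matter for large graphs.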
Highlighted Publications
- Qi Liu, Miltiadis Allamanis, Marc Brockschmidt, and Alexander L. Gaunt. Constrained Graph Variational Autoencoders for Molecule Design. In NeurIPS’18: Conference on Neural Information Processing Systems.
- Renjie Liao, Marc Brockschmidt, Daniel Tarlow, Alexander Gaunt, Raquel Urtasun, and Richard S. Zemel. Graph Partition Neural Networks for Semi-Supervised Classification. In ICLR’18: International Conference on Learning Representations (Workshop Track).
- Yujia Li, Richard Zemel, Marc Brockschmidt, and Daniel Tarlow. Gated Graph Sequence Neural Networks. In ICLR’16: International Conference on Learning Representations.
Publications
Below is a selection of our group’s relevant publications.
Learning to Understand Programs
- Marc Brockschmidt, Miltos Allamanis, Alexander L. Gaunt, and Oleksandr Polozov. Generative Code Modeling with Graphs. In ICLR’19: International Conference on Learning Representations.
- Pengcheng Yin, Graham Neubig, Miltos Allamanis, Marc Brockschmidt, and Alexander L. Gaunt. Learning to Represent Edits. In ICLR’19: International Conference on Learning Representations.
- Patrick Fernandes, Miltos Allamanis, and Marc Brockschmidt. Structured Neural Summarization. In ICLR’19: International Conference on Learning Representations.
- Miltos Allamanis. The Adverse Effects of Code Duplication in Machine Learning Models of Code. Tech Report, 2019.
- Santanu Dash, Miltos Allamanis, and Earl Barr. RefiNym: Using Names to Refine Types. In FSE’18: Foundations of Software Engineering.
- Vincent Hellendoorn, Christian Bird, Earl Barr, and Miltos Allamanis. Deep Learning Type Inference. In FSE’18: Foundations of Software Engineering.
- Miltos Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to Represent Programs with Graphs. In ICLR’18: International Conference on Learning Representations.
- Miltos Allamanis, Earl T. Barr, Christian Bird, Premkumar Devanbu, Mark Marron, and Charles Sutton. Mining Semantic Loop Idioms from Big Code. In IEEE Transactions on Software Engineering, 2018.
- Miltos Allamanis, Earl T. Barr, Prem Devanbu, and Charles Sutton. A Survey of Machine Learning for Big Code and Naturalness. In ACM Computing Surveys, 2018.
- Miltos Allamanis and Marc Brockschmidt. SmartPaste: Learning to Adapt Source Code. Tech Report, 2017.
- Marc Brockschmidt, Yuxin Chen, Pushmeet Kohli, Siddharth Krishna, and Daniel Tarlow. Learning Shape Analysis. In SAS’17: Static Analysis Symposium.
- Miltos Allamanis, Hao Peng, and Charles Sutton. A Convolutional Attention Network for Extreme Summarization of Source Code. In ICML’16: International Conference on Machine Learning.
- Miltos Allamanis, Earl T. Barr, Christian Bird, and Charles Sutton. Suggesting Accurate Method and Class Names. In FSE’15: Foundations of Software Engineering.
- Marc Brockschmidt, Yuxin Chen, Byron Cook, Pushmeet Kohli, and Daniel Tarlow. Learning to Decipher the Heap for Program Verification. In Workshop on Constructive Machine Learning at ICML 2015. Best Paper Award.
- Chris Maddison and Daniel Tarlow. Structured Generative Models of Natural Source Code. In ICML’14: International Conference on Machine Learning.
- Miltos Allamanis, Earl T. Barr, Christian Bird, and Charles Sutton. Learning Natural Coding Conventions. In FSE’14: Foundations of Software Engineering.
Learning to Generate Programs
- Chenglong Wang, Kedar Tatwawadi, Marc Brockschmidt, Po-Sen Huang, Yi Mao, Oleksandr Polozov, and Rishabh Singh. Robust Text-to-SQL Generation with Execution-Guided Decoding. Tech Report, 2018.
- Chenglong Wang, Marc Brockschmidt, and Rishabh Singh. Pointing Out SQL Queries From Text. Tech Report, 2017.
- Pratiksha Thaker, Daniel Tarlow, Marc Brockschmidt, and Alexander L. Gaunt. Semantics-aware Program Sampling. In DISCML @ NeurIPS’17.
- Alexander L. Gaunt, Marc Brockschmidt, Nate Kushman, and Daniel Tarlow. Differentiable Programs with Neural Libraries. In ICML’17: International Conference on Machine Learning.
- Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. DeepCoder: Learning to Write Programs. In ICLR’17: International Conference on Learning Representations.
- John K. Feser, Marc Brockschmidt, Alexander L. Gaunt, and Daniel Tarlow. Neural Functional Programming. In ICLR’17: International Conference on Learning Representations (Workshop Track).
- Chengtao Li, Daniel Tarlow, Alexander L. Gaunt, Marc Brockschmidt, and Nate Kushman. Neural Program Lattices. In ICLR’17: International Conference on Learning Representations.
- Alexander L. Gaunt, Marc Brockschmidt, Rishabh Singh, Nate Kushman, Pushmeet Kohli, Jonathan Taylor, and Daniel Tarlow. TerpreT: A Probabilistic Programming Language for Program Induction. Tech Report, 2016.
- Mukund Raghothaman, Yi Wei, and Youssef Hamadi. SWIM: Synthesizing What I Mean. In ICSE’16: International Conference on Software Engineering.
- Miltos Allamanis, Daniel Tarlow, Andrew D. Gordon, and Yi Wei. Bimodal Modelling of Source Code and Natural Language. In ICML’15: International Conference on Machine Learning.
Advancing the Machine Learning Frontier
- Qi Liu, Miltiadis Allamanis, Marc Brockschmidt, and Alexander L. Gaunt. Constrained Graph Variational Autoencoders for Molecule Design. In NeurIPS’18: Conference on Neural Information Processing Systems.
- Renjie Liao, Marc Brockschmidt, Daniel Tarlow, Alexander Gaunt, Raquel Urtasun, and Richard S. Zemel. Graph Partition Neural Networks for Semi-Supervised Classification. In ICLR’18: International Conference on Learning Representations (Workshop Track).
- Yujia Li, Richard Zemel, Marc Brockschmidt, and Daniel Tarlow. Gated Graph Sequence Neural Networks. In ICLR’16: International Conference on Learning Representations.
Relevant Software
We have open-sourced many of our implementations.
Libraries
- dpu-utils: useful Python utilities for projects on deep program understanding.
- gated-graph-neural-networks: A set of efficient TensorFlow implementations of graph neural networks that can handle large and sparse graphs.
Project-Specific Utilities
- constrained-graph-variational-autoencoders: code for constrained graph VAEs.
- DeepCoder-Utils: Code used in the experiments of the DeepCoder paper (ICLR’17).
- graph-partition-neural-network-samples: Sample code for Graph Partition Neural Networks.
- dpu-learning-to-represent-edits: C# data extraction for “Learning to Represent Edits”.
- graph-based-code-modelling: Code for the group’s ICLR’18 and ICLR’19 papers on graph-based code modelling.