- Name: José Wesley de Souza Magalhães
- Job: Research
- Birth: 02/1996
- Residence: Edinburgh, Scotland, UK
- Hometown: Ferros, Minas Gerais, Brazil
I'm a Ph.D. Candidate in Computer Science at The University of Edinburgh (UoE). Currently I'm working as a Research Postgraduate in the Institute for Computing and Systems Architecture (ICSA) under the supervision of Professor Michael O'Boyle. My research interests include Compilers and Programming Languages, focusing on Neural-Guided Program Synthesis and Benchmark Validation.
I am currently researching methods to automatically lift legacy code to new, modern, and optimized languages. The goal is to translate programs, or parts of them, to domain-specific languages. Lifting makes it possible to leverage domain knowledge and to execute code on new hardware platforms, which leads to performance gains.
My Education
-
Ph.D. in Computer Science
The University of Edinburgh 2022-NOW
Advisor: Michael O'Boyle
Location: Edinburgh, Scotland, UK
-
Master's Degree in Computer Science
Federal University of Minas Gerais (UFMG) 2019-2021
Final Article: Automatic Inspection of Program State in an Uncooperative Environment
Advisor: Fernando Magno Quintão Pereira
Location: Belo Horizonte, Minas Gerais, Brazil
-
Bachelor's Degree in Computer Science
Federal University of Viçosa (UFV) 2015-2018
Final Project: A Flow-Sensitive Approach for Steensgaard's Pointer Analysis
Advisor: Daniel Mendes Barbosa
Location: Florestal, Minas Gerais, Brazil
My Projects
-
Lifting legacy code to optimized domain-specific languages
Existing software is usually written in general-purpose languages that take little advantage of the application's domain. Moreover, rewriting code for a new target is a difficult, tedious, and error-prone task. Program Lifting is a technique to automatically translate legacy code to modern domain-specific languages (DSLs). Such DSLs embed domain knowledge, which is crucial for optimizing programs and leads to large performance improvements. In fact, DSLs have been outperforming general-purpose languages in a variety of domains.
In addition, lifting helps keep up with the rapid evolution of Computer Architecture by automatically porting existing code bases to new hardware platforms. Emerging hardware is often programmed with domain-specific languages. Therefore, lifting can bridge the gap between legacy code and modern hardware devices by mapping programs, or parts of them, to new accelerator-oriented DSLs.
-
Whiro: Automatic inspection of program state
This project aims to automatically insert verification code into programs. This verification code reports the internal state of a program at a given program point. By the state of the program, we mean the values of the program variables at that point. The technique tracks the values of static, stack-allocated, and heap-allocated variables; it can traverse the memory graph in unsafe languages such as C and C++, and it works on programs transformed by different compiler optimizations. Such an approach can be used for debugging, synthesis of benchmarks, and data visualization.
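As a rough illustration of what "reporting the program state at a program point" means, here is a toy Python sketch. It is not Whiro's implementation (Whiro instruments compiled C/C++ programs); the names `snapshot` and `report_state` are hypothetical, and dictionaries stand in for heap-allocated structs. The sketch follows references from each variable and marks objects it has already reached, so cyclic structures do not cause infinite traversal.

```python
def snapshot(value, seen=None):
    """Describe `value` as text, following references into nested objects.

    Objects already reached during this traversal are printed as
    "<visited>", which keeps cyclic structures from looping forever.
    """
    if seen is None:
        seen = set()
    # Scalars are printed directly.
    if isinstance(value, (int, float, str, bool, type(None))):
        return repr(value)
    if id(value) in seen:
        return "<visited>"
    seen.add(id(value))
    if isinstance(value, dict):  # stands in for a struct with named fields
        fields = ", ".join(f"{k}: {snapshot(v, seen)}" for k, v in value.items())
        return "{" + fields + "}"
    if isinstance(value, (list, tuple)):  # stands in for an array
        return "[" + ", ".join(snapshot(v, seen) for v in value) + "]"
    return f"<{type(value).__name__}>"


def report_state(point, variables):
    """Return one report line per variable visible at program point `point`."""
    return [f"{point}: {name} = {snapshot(val)}"
            for name, val in sorted(variables.items())]
```

For example, calling `report_state("f:exit", {"n": 3, "head": node})` at the end of a function would produce one line per variable, with `head` expanded field by field.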
-
Jotai: Generating synthetic inputs to turn Angha programs into executables
The goal of this project is to create a technique to synthesize inputs for programs so that such programs can be executed. We implement a Clang plugin that analyzes the signatures of functions and gathers information about their parameters. Then, our framework instantiates constraint-guided inputs based on the types of the parameters of a function and links them into the program as a library. Using this framework, we will be able to transform the programs created in Project Angha into executable benchmarks, which will enable us to perform many different experiments that involve, for instance, running time and resource usage.
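The idea of instantiating inputs from parameter types can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Jotai's code: the real framework works through a Clang plugin on C functions, while here a signature is a hypothetical list of `(name, c_type)` pairs, the type table `C_TYPE_RANGES` assumes 8-bit `char` and 32-bit `int`, and the only "constraint" is that each value fits the range of its type.

```python
import random

# Assumed value ranges for a few C scalar types (8-bit char, 32-bit int).
C_TYPE_RANGES = {
    "char": (-128, 127),
    "unsigned char": (0, 255),
    "int": (-2**31, 2**31 - 1),
    "unsigned int": (0, 2**32 - 1),
}


def generate_inputs(signature, array_len=4, seed=0):
    """Return a dict with one generated value per parameter.

    `signature` is a list of (name, c_type) pairs. A trailing '*' on the
    type is treated as a pointer to an array of `array_len` elements.
    """
    rng = random.Random(seed)  # seeded so the same inputs can be regenerated
    inputs = {}
    for name, c_type in signature:
        is_pointer = c_type.endswith("*")
        base = c_type.rstrip("*").strip()
        low, high = C_TYPE_RANGES[base]
        if is_pointer:
            inputs[name] = [rng.randint(low, high) for _ in range(array_len)]
        else:
            inputs[name] = rng.randint(low, high)
    return inputs
```

For a function such as `int sum(int *buf, unsigned char n)`, the call `generate_inputs([("buf", "int *"), ("n", "unsigned char")])` yields an integer array for `buf` and a value between 0 and 255 for `n`.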
-
Angha: A suite with over one million compilable programs mined from open-source repositories
The goal of this project is to design, implement, and validate a general methodology to produce benchmarks for compilers. This approach goes over open-source repositories containing C programs, downloads these programs, splits them into individual functions, and uses Psyche-c, a type reconstructor, to produce compilable bytecodes out of each individual function. To demonstrate that this methodology is useful, we used the benchmarks to improve a machine learning technique and trained a predictive compiler for code size reduction. We have been able to generate binaries over 10% smaller than those produced by clang -Oz.
-
Miscellaneous
-
Back in 2017, I wrote a book about the Julia language with some college colleagues. It is in Portuguese, but it may be interesting for beginners in the language.
-
I participated in the Artifact Evaluation Committee for CGO 2021.
-
I have taught some classes about LLVM metadata in the LLVM Introductory Course promoted by the Compilers Laboratory on YouTube. Check it out!
My Papers
-
2023
C2TACO: Lifting Tensor Code to TACO
José Wesley de Souza Magalhães, Jackson Woodruff, Elizabeth Polgreen, Michael F. P. O'Boyle
Conference: International Conference on Generative Programming: Concepts & Experiences (GPCE) 2023
-
2022
Automatic Inspection of Program State in an Uncooperative Environment
José Wesley de Souza Magalhães, Chunhua Liao, and Fernando Magno Quintão Pereira
Journal: Software: Practice and Experience. 2022; 1-32
-
ExeBench: An ML-Scale Dataset of Executable C Functions
Jordi Armengol-Estapé, Jackson Woodruff, Alexander Brauckmann, José Wesley de Souza Magalhães, Michael F. P. O'Boyle
Conference: International Symposium on Machine Programming (MAPS) 2022
-
2021
AnghaBench: A Suite with One Million Compilable C Benchmarks for Code-Size Reduction
Anderson Faustino da Silva, Bruno Conde Kind, José Wesley de Souza Magalhães, Jerônimo Nunes Rocha, Breno Campos Ferreira Guimarães, and Fernando Magno Quintão Pereira
Conference: International Symposium on Code Generation and Optimization (CGO) 2021
-
2019
Synthesis of Benchmarks for the C Programming Language by Mining Software Repositories
Breno Campos Ferreira Guimarães, José Wesley Magalhães, Fernando Quintão Pereira, and Anderson Faustino da Silva
Conference: XXIII Brazilian Symposium on Programming Languages (SBLP)