Research Assistant - Pacific Northwest Labs - Richland, WA - 2022
Accelerating Graphics Workloads using Custom RISC-V Hardware Extensions.
Used the extensible architecture of the Vortex framework to accelerate graph analytics. Addressed the ISA design, compiler, and hardware implementation challenges of moving a high-level graph analytics computational primitive down to custom hardware. Also ported the Vortex framework to the Xilinx ecosystem to leverage its SoC and HBM-based FPGA devices.
Research Assistant - Oak Ridge Labs - Oak Ridge, TN - 2019
Hybrid RTL-HLS Single-source Hardware Development using OpenARC/CASH Frameworks.
Extended the CASH single-source framework to integrate with OpenCL, enabling hybrid hardware development. This work was the first proposal to demonstrate mixing high-level synthesis with optimized RTL-based modules for performance optimization. This feature was later implemented by Xilinx in 2020 as part of its Vitis HLS toolchain.
Research Assistant - Microsoft Research - Redmond, WA - 2017
Latency-Aware Compiler Optimization for a Machine Learning FPGA Accelerator.
Implemented runtime profiling of TensorFlow and CNTK model computation graphs. Derived performance models for machine learning computation primitives. Integrated profiling data to assist task graph partitioning and offload scheduling on FPGAs.
Research Assistant - Intel Labs - Hillsboro, OR - 2016
Performance Analysis of High-Level Synthesis (HLS) Tools Targeting FPGAs.
Implemented and evaluated an HLS-based AES encryption accelerator against a Verilog HDL implementation. Implemented and evaluated an HLS-based SpMV accelerator against a Verilog HDL implementation. Implemented a graph analytics framework for GPU and FPGA acceleration.
Research Assistant - Intel Labs - Hillsboro, OR - 2015
Compiler Optimization for a 3D Stacked DRAM Data-Analytics Accelerator.
Implemented the programming interface and compiler backend for map/reduce parallel algorithms. Contributed to the implementation of the high-level simulator using Simics. Ported graph analytics benchmarks to the custom accelerator.
Software Developer - Microsoft - Redmond, WA - 2014
Windows Graphics Team
Collaborated on the design and implementation of the graphics emulation software for the Windows Advanced Rasterization Platform (WARP). Implemented industry-compliant GPU emulators for the Direct3D 9, 10, 11, and 12 graphics APIs. The latest work was influential in modern graphics API architectures such as Vulkan. Worked on just-in-time compilation with support for SSE and AVX extensions. Acquired deep knowledge of the GPU rasterization pipeline and OS driver architecture.
Software Developer - Microsoft - Redmond, WA - 2008
Windows Mobile Graphics Team
Collaborated on the design and implementation of the Direct3D graphics rasterizer on Windows CE. Implemented a state-of-the-art OpenGL ES software renderer with a fixed-function pipeline. Acquired experience in JIT compilation for ARM architectures. Acquired experience in kernel driver development for embedded systems.
C/C++, C#, Java, Python, Verilog, Chisel, Bluespec
Compilers, Tools & Architectures
LLVM, Clang, x86, ARM, RISC-V, SSE, AVX, NEON
Simics, gem5, SESC, GPGPU-Sim, Manifold, HMCSim, DRAMSim
TensorFlow, OpenMP, MPI, CUDA, OpenCL, DirectCompute