Research

Ongoing Research

Operator Selection in Differentiable Fuzzy Logic

KRistal Group, Nanjing University · Advisor: Prof. YiZheng Zhao · Oct 2025 - Present

  • Project: Studying fuzzy operator selection in differentiable logic for neuro-symbolic learning
  • Approach: Combining empirical analysis on diverse ontology corpora with theoretical study of training dynamics
  • Status: First-author manuscript completed; under submission

AI4Math: Mathematical Research Collaboration Platform

Microsoft Research Asia (MSRA) · Advisor: Ziyu Zhou · Mar 2026 - Present

  • Project: AI-assisted platform for mathematical research collaboration, combining automated theorem proving, literature discovery, and proof assistant integration
  • Contributions: Core contributor to platform architecture and knowledge infrastructure; developing LLM-based tools for mathematical reasoning and formal verification workflows
  • Status: In active development; preparing a demonstration for the mathematical community

Reasoning-Based WebAgents with Fine-Grained Reinforcement Learning

Ludwig Maximilian University of Munich · Advisor: Dr. Yao Zhang · Nov 2025 - Present

  • Problem: Web agents struggle with multi-step reasoning tasks requiring perception, planning, and grounded decision-making
  • Approach: Designing fine-grained RL frameworks that decompose complex web tasks into learnable sub-goals with structured reward signals
  • Methods: Novel reward modeling mechanisms for training stability; image-grounded reasoning modules for multimodal perception
  • Deliverables: Prototype agent system integrating vision-language models with RL-based task decomposition

Reasoning-Enhanced Reward Models for Preference Alignment

Independent Research · Advisor: Dr. Zhen Han · Jul 2025 - Present

  • Problem: Standard reward models for RLHF struggle to capture reasoning quality beyond surface-level coherence
  • Approach: Integrating reasoning-specific signals into reward modeling via a pipeline combining rejection sampling, SFT, and RL for scalable preference data
  • Status: Experiments ongoing; manuscript in preparation

Past Research Experience

Interactive Theorem Proving with LLMs and Lean4

ScaleML Lab, UIUC · Advisor: Prof. Tong Zhang · Apr - Jun 2025

  • Problem: LLMs can propose plausible proof steps but lack formal verification, limiting their reliability for mathematical reasoning
  • Approach: Built a prototype integrating Lean4 with LLMs for interactive theorem proving on miniF2F; bidirectional pipeline (LLM ↔ Lean4) with proof-state serialization and closed-loop refinement
  • Outcome: Working prototype and an analysis of common failure modes (context violations, invalid step proposals) informing interface design

Benchmarking System for HarmonyOS Intelligent Agents

Huawei 2012 Labs · Supervisor: JianFeng Gui · Jul - Sep 2025

  • Problem: Need for systematic evaluation of reasoning and adaptability in mobile OS agents across diverse tasks
  • Contributions: Co-developed benchmarking infrastructure for the IntelliOS-agent pipeline; integrated HDC debugging tools with LLM-based reasoning modules and ported Python dependencies to HarmonyOS
  • Outcome: Deployed in Huawei's internal IntelliOS project for agent evaluation

Quantum Memory Architectures for Machine Learning

QUEST Lab, NC State University · Advisor: Prof. Yuan Liu · Jul - Nov 2024

  • Problem: Quantum computing hardware for ML workloads lacks optimized memory architectures tailored to quantum-classical hybrid execution
  • Approach: Explored quantum memory designs specifically for quantum machine learning algorithms
  • Contributions: Proposed optimized computational architecture for ML workloads on quantum systems; co-authored a manuscript later continued by collaborators

Adversarial Backdoors in Machine Learning Models

COSEC Research Group, Nanjing University · Advisors: Prof. Yuan Zhang, Prof. Sheng Zhong · Jul 2023 - Dec 2024

  • Problem: Understanding and defending against backdoor attacks in neural network training pipelines
  • Contributions: Proposed novel exploit mechanism for backdoor injection; designed attack experiments on malicious training scenarios
  • Impact: Work contributed to group's broader research on ML robustness and trustworthiness

Talks & Guest Lectures

Talks

  • Reinforcement Learning with GRPO: From PPO to Group-Relative Policy Optimization · NJU AIA, 2026
  • Building a Neural Network from Scratch with NumPy · NJU AIA, 2025
  • Building a Neural Network from Scratch with NumPy · NJU AIA, 2023

Guest Lectures

  • Lean4 for Interactive Theorem Proving · Discrete Mathematics, NJU · Jan 2026
  • Cybersecurity / Offensive-Defensive Techniques · NJU · Dec 2025