Research
Ongoing Research
Operator Selection in Differentiable Fuzzy Logic
KRistal Group, Nanjing University · Advisor: Prof. YiZheng Zhao · Oct 2025 - Present
- Project: Studying fuzzy operator selection in differentiable logic for neuro-symbolic learning
- Approach: Combining empirical analysis on diverse ontology corpora with theoretical study of training dynamics
- Status: First-author manuscript completed; under submission
AI4Math: Mathematical Research Collaboration Platform
Microsoft Research Asia (MSRA) · Advisor: Ziyu Zhou · Mar 2026 - Present
- Project: AI-assisted platform for mathematical research collaboration, combining automated theorem proving, literature discovery, and proof assistant integration
- Contributions: Core contributor to platform architecture and knowledge infrastructure; developing LLM-based tools for mathematical reasoning and formal verification workflows
- Status: In active development; preparing a demonstration for the mathematical community
Reasoning-Based WebAgents with Fine-Grained Reinforcement Learning
Ludwig Maximilian University of Munich · Advisor: Dr. Yao Zhang · Nov 2025 - Present
- Problem: Web agents struggle with multi-step reasoning tasks requiring perception, planning, and grounded decision-making
- Approach: Designing fine-grained RL frameworks that decompose complex web tasks into learnable sub-goals with structured reward signals
- Methods: Novel reward modeling mechanisms for training stability; image-grounded reasoning modules for multimodal perception
- Deliverables: Prototype agent system integrating vision-language models with RL-based task decomposition
Reasoning-Enhanced Reward Models for Preference Alignment
Independent Research · Advisor: Dr. Zhen Han · Jul 2025 - Present
- Problem: Standard reward models for RLHF struggle to capture reasoning quality beyond surface-level coherence
- Approach: Integrating reasoning-specific signals into reward modeling via a pipeline that combines rejection sampling, SFT, and RL to generate preference data at scale
- Status: Experiments ongoing; manuscript in preparation
Past Research Experience
Interactive Theorem Proving with LLMs and Lean4
ScaleML Lab, UIUC · Advisor: Prof. Tong Zhang · Apr - Jun 2025
- Problem: LLMs can propose plausible proof steps but lack formal verification, limiting their reliability for mathematical reasoning
- Approach: Built a prototype integrating Lean4 with LLMs for interactive theorem proving on MiniF2F; bidirectional pipeline (LLM ↔ Lean4) with proof-state serialization and closed-loop refinement
- Outcome: Working prototype and an analysis of common failure modes (context violations, invalid step proposals) that informed interface design
Benchmarking System for HarmonyOS Intelligent Agents
Huawei 2012 Labs · Supervisor: JianFeng Gui · Jul - Sep 2025
- Problem: Mobile OS agents lacked a systematic evaluation of reasoning and adaptability across diverse tasks
- Contributions: Co-developed benchmarking infrastructure for the IntelliOS-agent pipeline; integrated HDC debugging tools with LLM-based reasoning modules and ported Python dependencies to HarmonyOS
- Outcome: Deployed in Huawei's internal IntelliOS project for agent evaluation
Quantum Memory Architectures for Machine Learning
QUEST Lab, NC State University · Advisor: Prof. Yuan Liu · Jul - Nov 2024
- Problem: Quantum computing hardware for ML workloads lacks optimized memory architectures tailored to quantum-classical hybrid execution
- Approach: Explored quantum memory designs specifically for quantum machine learning algorithms
- Contributions: Proposed an optimized computational architecture for ML workloads on quantum systems; co-authored a manuscript later continued by collaborators
Adversarial Backdoors in Machine Learning Models
COSEC Research Group, Nanjing University · Advisors: Prof. Yuan Zhang, Prof. Sheng Zhong · Jul 2023 - Dec 2024
- Problem: Understanding and defending against backdoor attacks in neural network training pipelines
- Contributions: Proposed a novel exploit mechanism for backdoor injection; designed attack experiments for malicious training scenarios
- Impact: Work contributed to the group's broader research on ML robustness and trustworthiness
Talks & Guest Lectures
Talks
- Reinforcement Learning with GRPO: From PPO to Group-Relative Policy Optimization · NJU AIA, 2026
- Building a Neural Network from Scratch with NumPy · NJU AIA, 2025
- Building a Neural Network from Scratch with NumPy · NJU AIA, 2023
Guest Lectures
- Lean4 for Interactive Theorem Proving · Discrete Mathematics, NJU · Jan 2026
- Cybersecurity / Offensive-Defensive Techniques · NJU · Dec 2025