Research

Research Focus

My research aims to enhance reasoning reliability and interpretability in AI systems by integrating symbolic methods with neural learning. I explore how formal reasoning, structured knowledge, and interactive verification can address fundamental limitations in language models and autonomous agents.

Core Interests: Neuro-Symbolic AI · LLM Reasoning · Reinforcement Learning for Reasoning · Formal Verification · Interactive Agents

Ongoing Research

Reasoning-Based WebAgents with Fine-Grained Reinforcement Learning

Ludwig Maximilian University of Munich · Advisor: Dr. Yao Zhang · Nov 2025 - Present

  • Problem: Web agents struggle with multi-step reasoning tasks requiring perception, planning, and grounded decision-making
  • Approach: Designing fine-grained RL frameworks that decompose complex web tasks into learnable sub-goals with structured reward signals
  • Methods: Novel reward modeling mechanisms for training stability; image-grounded reasoning modules for multimodal perception
  • Deliverables: Prototype agent system integrating vision-language models with RL-based task decomposition
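The reward-decomposition idea above can be sketched in a few lines: a dense reward built from weighted sub-goal completion plus a sparse terminal bonus. This is a minimal illustration under my own assumptions (the `SubGoal` structure, weights, and step penalty are hypothetical), not the actual framework.

```python
from dataclasses import dataclass

@dataclass
class SubGoal:
    name: str        # e.g. "navigate", "fill_form", "submit"
    completed: bool  # did the agent achieve this sub-goal?
    weight: float    # contribution to the dense reward

def fine_grained_reward(subgoals, final_success, step_penalty=0.01, num_steps=0):
    """Dense reward from completed sub-goals, plus a sparse terminal bonus,
    minus a small per-step cost to encourage short trajectories."""
    dense = sum(g.weight for g in subgoals if g.completed)
    terminal = 1.0 if final_success else 0.0
    return dense + terminal - step_penalty * num_steps
```

Compared with a single end-of-episode reward, the partial credit from sub-goals gives the policy a learning signal even when the full web task fails.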

Neuro-Symbolic Framework for Adaptive Logical Reasoning

KRistal Group, Nanjing University · Advisor: Prof. YiZheng Zhao · Oct 2025 - Present

  • Problem: LLMs perform poorly on tasks requiring strict logical consistency and compositional generalization
  • Approach: Building a framework that combines logical semantics with differentiable neural modules to enable end-to-end learning while preserving logical structure
  • Methods: Meta-learning-based reasoning strategies; fuzzy logic engine balancing faithfulness with computational efficiency
  • Deliverables: Modular reasoning architecture supporting both neural adaptation and symbolic constraints
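As a flavor of what a differentiable fuzzy logic engine computes, here is a minimal sketch of soft logical connectives based on the product t-norm; the specific operator choices (probabilistic sum, Reichenbach implication) are illustrative assumptions, not the framework's actual design.

```python
def fuzzy_and(a, b):
    """Product t-norm: smooth conjunction over truth values in [0, 1]."""
    return a * b

def fuzzy_or(a, b):
    """Probabilistic sum (the product t-norm's dual t-conorm)."""
    return a + b - a * b

def fuzzy_not(a):
    """Standard fuzzy negation."""
    return 1.0 - a

def fuzzy_implies(a, b):
    """Reichenbach implication: reduces to classical implication at {0, 1}."""
    return 1.0 - a + a * b
```

Because each operator is a polynomial in its inputs, gradients flow through logical formulas, which is what allows neural modules to be trained end-to-end against symbolic constraints.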

Reasoning-Enhanced Reward Models for Preference Alignment

Independent Research · Advisor: Dr. Zhen Han · Jul 2025 - Present

  • Problem: Standard reward models for RLHF struggle to capture reasoning quality beyond surface-level coherence
  • Approach: Integrating reasoning-specific signals into reward modeling to improve alignment on complex reasoning tasks
  • Methods: Pipeline combining rejection sampling, supervised fine-tuning (SFT), and reinforcement learning for scalable preference data generation
  • Status: Conducting experiments on reasoning-guided reward modeling; manuscript in preparation
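The rejection-sampling stage of a pipeline like this can be sketched as: draw several candidate responses per prompt, score them with a reward model, and keep the best/worst pair as preference data. The function names (`sampler`, `reward_fn`) are hypothetical stand-ins for the generation model and reward model.

```python
def build_preference_pairs(prompt, sampler, reward_fn, n=8):
    """Rejection sampling for preference data: sample n candidates for a
    prompt, rank them by reward, and emit a (chosen, rejected) pair."""
    candidates = [sampler(prompt) for _ in range(n)]
    ranked = sorted(candidates, key=reward_fn, reverse=True)
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}
```

The resulting pairs can then feed SFT (on the chosen responses) or preference-based RL, which is what makes the data generation scalable: it needs only a reward model, not human annotation per pair.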

Past Research Experience

Interactive Theorem Proving with LLMs and Lean4

ScaleML Lab, UIUC · Advisor: Prof. Tong Zhang · Apr - Jun 2025

  • Problem: LLMs can propose plausible proof steps but lack formal verification, limiting their reliability for mathematical reasoning
  • Approach: Built a prototype integrating Lean4 formal proof assistant with LLMs for interactive theorem proving on MiniF2F benchmark
  • Methods: Designed bidirectional communication pipeline (LLM ↔ Lean4) with proof-state serialization; implemented closed-loop refinement where Lean4 verifies LLM-proposed tactics
  • Artifacts: Working prototype system; analysis of common failure modes (context violations, invalid step proposals) informing interface design
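The closed-loop refinement above can be sketched as a simple propose-verify loop; `propose_tactic` and `run_lean` are hypothetical callables standing in for the LLM and a Lean4 REPL wrapper, not the prototype's actual interfaces.

```python
def prove_interactively(goal, propose_tactic, run_lean, max_iters=16):
    """Closed-loop theorem proving: the LLM proposes a tactic for the
    current proof state, Lean4 verifies it, and failures are fed back
    into the next proposal. Returns the tactic list on success, else None."""
    state, history = goal, []
    for _ in range(max_iters):
        tactic = propose_tactic(state, history)
        ok, new_state = run_lean(state, tactic)
        if ok:
            history.append(tactic)
            if new_state is None:   # no goals remain: proof complete
                return history
            state = new_state       # advance to the next proof state
        else:
            # record the failure so the model can avoid repeating it
            history.append(f"-- failed: {tactic}")
    return None
```

The key design point is that Lean4, not the LLM, is the source of truth: only verified tactics advance the proof state, which rules out the unverifiable-but-plausible steps that motivated the project.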

Benchmarking System for HarmonyOS Intelligent Agents

Huawei 2012 Labs · Supervisor: JianFeng Gui · Jul - Sep 2025

  • Problem: Mobile OS agents lack systematic evaluation of reasoning and adaptability across diverse tasks
  • Contributions: Co-developed benchmarking infrastructure from scratch; contributed to IntelliOS-agent pipeline
  • Methods: Integrated HDC debugging tools with LLM-based reasoning modules; ported Python dependency libraries to HarmonyOS environment
  • Deployment: System deployed in Huawei's internal IntelliOS project for agent evaluation

Quantum Memory Architectures for Machine Learning

QUEST Lab, NC State University · Advisor: Prof. Yuan Liu · Jul - Nov 2024

  • Problem: Quantum computing hardware for ML workloads lacks optimized memory architectures tailored to quantum-classical hybrid execution
  • Approach: Explored quantum memory designs specifically for quantum machine learning algorithms
  • Contributions: Proposed optimized computational architecture for ML workloads on quantum systems
  • Status: Co-first author manuscript submitted to ISCA 2025

Adversarial Backdoors in Machine Learning Models

COSEC Research Group, Nanjing University · Advisors: Prof. Yuan Zhang, Prof. Sheng Zhong · Jul 2023 - Dec 2024

  • Problem: Understanding and defending against backdoor attacks in neural network training pipelines
  • Contributions: Proposed novel exploit mechanism for backdoor injection; designed attack experiments on malicious training scenarios
  • Impact: Work contributed to group's broader research on ML robustness and trustworthiness