Building reliable spatial intelligence: perceiving, generating, verifying, and maintaining XR, embodied AI, and 3D-aware software.
phd background @ cuhk
I build and break reliable spatial intelligence: AI systems that perceive, reason, decide, and act in space, with guarantees of safety, robustness, explainability, and verifiability. My work spans XR (VR / AR / MR), embodied AI, multi-modal LLM agents, and the software engineering that has to keep all of it from clipping through the floor.
PhD background at CUHK, advised by Prof. Michael R. Lyu (ACM / AAAS / IEEE Life Fellow). Previously co-founder & CTO of PortalX (YC China · MiraclePlus) and founding-team member at Common Defense (Quantstamp).
Natural-language XR scripts.
Turns plain-English XR test instructions into executable controller-and-hand-tracking scripts. Closes the gap between human testers and the devices they wield.
Constraint-expressive IR for LLMs.
An intermediate representation that lets LLMs reason about physical and semantic constraints when synthesizing 3D scenes. The resulting layouts are physically consistent, not just plausible.
Demystifying multi-user XR defects.
First systematic study of defects unique to multi-user XR (desync, ghost avatars, shared-state corruption) across 100+ real-world apps.
Stereoscopic visual inconsistencies in VR.
Taxonomized the stereoscopic visual inconsistencies that cause cybersickness, then built StereoID, a static analyzer that surfaced them across 171 VR apps.
Context-sensitive GUI grounding for XR testing.
A context-sensitive grounding approach that lets automated testers locate XR GUI elements the way human players do, by reading the surrounding 3D context rather than raw pixels.
Multi-modal software-onboarding agents.
Co-founded a YC China–backed startup building universal self-serve AI assistants for any software. Multi-modal LLM agents do intelligent Q&A, automatic onboarding, and task automation across platforms.
▸ first-author · # co-first-author · * corresponding
An intermediate representation that lets LLMs reason about physical and semantic constraints when synthesizing 3D scenes, producing layouts that are physically consistent and human-plausible.
RealityCraft turns plain-English XR test descriptions into executable interaction scripts, closing the gap between human testers and the controllers they use.
CARE mines user reviews of XR apps and uses an LLM pipeline to surface cybersickness symptoms developers can act on, turning subjective discomfort into reproducible signal.
Stereoscopic visual inconsistencies (SVIs) cause cybersickness in VR. We taxonomize them and build StereoID, a static analyzer that finds SVIs across 171 VR apps.
The first systematic look at bugs in WebXR applications, a foundational study that helped seed the Reliable Spatial Intelligence research direction.
Mentored 10+ junior PhD, master's, and undergraduate students across CUHK, HIT, SYSU, and SUSTech. Many are now at CMU, UW, EPFL, CUHK, HKU, HKUST, ByteDance, Tencent, and Alibaba.