Chapter 10: Experimental Design and Validation Plan
Summary
This chapter presents the experimental design for validating the TacGlove [#26]/TacTeleOp and TacPlay [#27] hypotheses: three pilot processes (capping, labeling, packaging), the required hardware (gloves, robot hands, smart glasses), quantitative evaluation metrics (success rate, convergence time, generalization, cost), and the key ablation experiments (3-axis vs binary sensing, co-training conditions, tactile residual decomposition).
10.1 Three Pilot Processes
Process 1: Container Capping (Difficulty: High)
- Action: Hold container, rotate cap clockwise
- Tactile requirement: Precise control of normal force (grip) and shear force (rotation torque)
- 3-axis vs binary: Binary sensing detects contact only and cannot distinguish insufficient from excessive torque
- Success criteria: Cap fully closed within specified torque range
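As a rough illustration of why binary sensing cannot separate insufficient from excessive torque, the sketch below collapses 3-axis readings to a contact flag and, separately, estimates cap torque from shear. The `TactileReading` type, the 0.2 N contact threshold, the cap radius, and the torque window are hypothetical placeholders, and the torque estimate assumes each fingertip's shear acts tangentially at the cap rim.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TactileReading:
    """One 3-axis fingertip sample in newtons (hypothetical axis convention)."""
    fx: float  # shear along x
    fy: float  # shear along y
    fz: float  # normal (grip) force


def to_binary(reading: TactileReading, contact_threshold_n: float = 0.2) -> bool:
    """Collapse a 3-axis reading to contact / no contact (assumed threshold)."""
    return reading.fz > contact_threshold_n


def estimate_cap_torque(readings, cap_radius_m: float) -> float:
    """Approximate torque (N*m) as total shear magnitude times cap radius.

    Simplifying assumption: all shear is tangential to the cap rim.
    """
    total_shear = sum(math.hypot(r.fx, r.fy) for r in readings)
    return total_shear * cap_radius_m


def classify_torque(torque_nm: float, lower_nm: float, upper_nm: float) -> str:
    """Compare an estimated torque against a specified torque window."""
    if torque_nm < lower_nm:
        return "insufficient"
    if torque_nm > upper_nm:
        return "excessive"
    return "ok"


# Two grasps with identical grip force but very different shear.
light_twist = [TactileReading(2.0, 0.0, 5.0)] * 2
hard_twist = [TactileReading(10.0, 0.0, 5.0)] * 2

for name, grasp in [("light", light_twist), ("hard", hard_twist)]:
    contact_flags = [to_binary(r) for r in grasp]
    torque = estimate_cap_torque(grasp, cap_radius_m=0.02)
    print(name, contact_flags, classify_torque(torque, 0.1, 0.3))
```

Both grasps yield the same binary contact pattern, while the shear-based estimate puts them on opposite sides of the torque window. This is the distinction the 3-axis vs binary ablation is meant to quantify.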
Process 2: Label Application (Difficulty: Medium)
- Action: Apply label to container surface, press uniformly to remove bubbles
- Tactile requirement: Maintaining a uniform pressure distribution
- 3-axis vs binary: 3-axis sensing is advantageous for detecting force imbalance
- Success criteria: Label position accuracy ±2mm, no bubbles
Process 3: Assembly Packaging (Difficulty: High)
- Action: Place and secure multiple parts in box in specified order
- Tactile requirement: Multi-directional force + precision positioning (snap-fit)
- 3-axis vs binary: Force feedback essential for precision positioning during snap-fit
- Success criteria: All parts correctly positioned and secured
10.2 Required Hardware
| Equipment | Quantity | Purpose | Notes |
|---|---|---|---|
| TacGlove | 3+ pairs | Shift changes + spares | Fabricated by Prof. Park Y.-L.'s group |
| 3-axis magnetic tactile sensors | 8/glove × 3 = 24+ | Tactile data collection | BMM350 + magnetic elastomer |
| Allegro or LEAP Hand | 1–2 units | Robot experiments | 16-DoF dexterous |
| Smart glasses (Aria or equiv.) | 2 units | Egocentric RGB + head pose | Time synchronization required |
| Cosmetics process setup | 1 set | Lab reproduction | Actual containers, labels, packaging |
| GPU (A100 × 4+) | 1 set | Co-training + RL | Isaac Sim compatible |
| MuJoCo + TACTO | License | Sim environment | TacPlay sim-first |
10.3 Evaluation Metrics
Primary Metrics
| Metric | Definition | Target |
|---|---|---|
| Success rate | Fraction of successful trials out of 50 per process | Capping >85%, Label >90%, Packaging >80% |
| Convergence time | Training time needed to reach the target success rate | TacPlay: <4 hr/object |
| Novel object generalization | Success rate on unseen containers/labels | >70% (no more than 15%p below seen objects) |
| Cost efficiency | Human labor hours per unit success rate | 3× or better vs teleoperation |
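With only 50 trials per process, a point estimate alone can be misleading near the target thresholds. The sketch below reports the success rate together with a Wilson score interval and expresses the generalization gap in percentage points. The function names and the 95% confidence level are illustrative choices, not part of the plan itself.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials ** 2))
    ) / denom
    return center - half_width, center + half_width


def generalization_gap_pp(seen_rate: float, novel_rate: float) -> float:
    """Drop from seen to novel objects, in percentage points (%p)."""
    return (seen_rate - novel_rate) * 100.0


# Hypothetical outcome: 43 successes in 50 capping trials.
low, high = wilson_interval(43, 50)
print(f"success rate 0.86, 95% interval [{low:.3f}, {high:.3f}]")
print(f"gap: {generalization_gap_pp(0.86, 0.72):.1f} %p")
```

For this made-up 43/50 outcome the interval spans roughly 74% to 93%, so a point estimate just above the 85% capping target would not by itself show the target is met. Reporting the interval alongside the rate makes that uncertainty visible.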
Ablation Metrics
| Ablation | Conditions | Purpose |
|---|---|---|
| 3-axis vs binary | 8 sensors 3-axis vs 8 sensors binary | H3: marginal gain from tactile resolution |
| Co-training conditions | A only, B only, A+B (vision), A+B (vision+tactile) | H2: value of co-training and of tactile input |
| Scaling curve | Data B at 10, 50, 200, 800 hr | Tactile scaling law |
| Tactile residual transfer | $\Delta_{\text{capping}}$ transferred to labeling vs $\Delta_{\text{label}}$ trained from scratch | H3: cross-task generalization |
| Residual decomposition | Kinematic vs task vs object components | Understanding residual structure |
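Each ablation compares success counts between two conditions over a small number of trials. One way to test whether a difference is more than noise is a one-sided Fisher exact test, sketched below using only the standard library. The choice of test, the function names, and the example counts are assumptions for illustration; the plan does not prescribe a specific test.

```python
from math import comb


def fisher_one_sided(success_base: int, n_base: int,
                     success_new: int, n_new: int) -> float:
    """P-value for 'new condition is better' under the hypergeometric null.

    Sums the probability of seeing success_new or more successes in the new
    condition, given the pooled number of successes across both conditions.
    """
    pooled_success = success_base + success_new
    total = n_base + n_new
    denom = comb(total, pooled_success)
    p_value = 0.0
    for k in range(success_new, min(n_new, pooled_success) + 1):
        # k successes land in the new condition, the rest in the baseline.
        p_value += comb(n_new, k) * comb(n_base, pooled_success - k) / denom
    return p_value


def gain_pp(success_base: int, n_base: int,
            success_new: int, n_new: int) -> float:
    """Improvement of the new condition in percentage points (%p)."""
    return (success_new / n_new - success_base / n_base) * 100.0


# Hypothetical counts: A+B (vision) 38/50 vs A+B (vision+tactile) 45/50.
print(f"gain: {gain_pp(38, 50, 45, 50):+.1f} %p")
print(f"one-sided p: {fisher_one_sided(38, 50, 45, 50):.4f}")
```

The same two functions apply to the 3-axis vs binary comparison and to each pair of co-training conditions, as long as each condition is run for a known number of trials.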
10.4 Experimental Protocol
Phase A: TacGlove/TacTeleOp Validation (Months 1–6)
M1–M2: 3-finger glove + tactile sensor prototype fabrication and smart glasses synchronization.
M3: Lab pilot collection of 50 hours. Initial co-training experiments on capping.
- Go/No-Go check: Does co-training (A+B) outperform A only? Does adding tactile input improve success by more than +5%p?
M4: 5-finger extension (if feasible). Execute 3-axis vs binary ablation.
M5: 200+ hour collection. Scaling curve analysis (10, 50, 200 hr). Complete all co-training ablations.
M6: TacGlove/TacTeleOp paper writing and submission.
Phase B: TacPlay Validation (Months 3–8)
M3–M4: Build tactile-target RL environment in MuJoCo + TACTO. Reward function design and convergence verification.
- Go/No-Go check: Sim convergence on 2+ of 3 tasks?
M5–M6: Real-world glove-mounted play experiments, with the capping task taking priority.
M7: Tactile residual cross-task transfer experiment (capping → labeling).
M8: TacPlay paper writing or workshop paper preparation.
Go/No-Go Decision Framework
| Condition | Go | Pivot | Stop |
|---|---|---|---|
| Co-training (A+B > A) | Statistically significant | Trend only | A+B worse than A |
| Tactile addition | +10%p or more | +5–10%p | <+5%p |
| Sim RL convergence | 2+ of 3 tasks | 1 task only | 0 tasks |
| Glove stability | 8+ hr continuous | 4–8 hr | <4 hr |
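The decision table above can be encoded directly so that each checkpoint is evaluated the same way every time. The sketch below is one possible encoding. The 0.05 significance level, the treatment of a zero or negative co-training gain as "Stop", and the rule that the overall outcome is the most conservative individual outcome are assumptions not stated in the table.

```python
from enum import Enum


class Decision(Enum):
    STOP = 0
    PIVOT = 1
    GO = 2


def cotraining_decision(gain_pp: float, p_value: float,
                        alpha: float = 0.05) -> Decision:
    """A+B vs A only: significant gain, trend only, or no gain / worse."""
    if gain_pp <= 0:
        return Decision.STOP  # assumption: no gain is treated like adverse
    return Decision.GO if p_value < alpha else Decision.PIVOT


def tactile_decision(gain_pp: float) -> Decision:
    """Tactile addition: +10%p or more, +5 to 10%p, below +5%p."""
    if gain_pp >= 10:
        return Decision.GO
    return Decision.PIVOT if gain_pp >= 5 else Decision.STOP


def sim_decision(tasks_converged: int) -> Decision:
    """Sim RL convergence out of 3 tasks: 2+, exactly 1, or 0."""
    if tasks_converged >= 2:
        return Decision.GO
    return Decision.PIVOT if tasks_converged == 1 else Decision.STOP


def glove_decision(continuous_hours: float) -> Decision:
    """Glove stability: 8+ hr, 4 to 8 hr, below 4 hr of continuous use."""
    if continuous_hours >= 8:
        return Decision.GO
    return Decision.PIVOT if continuous_hours >= 4 else Decision.STOP


def overall(decisions) -> Decision:
    """Assumed aggregation rule: take the most conservative outcome."""
    return min(decisions, key=lambda d: d.value)


checks = [
    cotraining_decision(gain_pp=6.0, p_value=0.03),
    tactile_decision(gain_pp=7.5),
    sim_decision(tasks_converged=2),
    glove_decision(continuous_hours=9.0),
]
print([d.name for d in checks], "->", overall(checks).name)
```

With these illustrative inputs, three conditions return Go and the tactile gain returns Pivot, so the assumed aggregation rule reports Pivot overall.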
10.5 Expected Results and Interpretation
Best Case
- TacGlove/TacTeleOp: 800 hr of tactile co-training yields 90%+ capping success. The tactile scaling law is confirmed to be log-linear. 3-axis outperforms binary by +10%p.
- TacPlay: Tactile-target RL converges in both sim and real. Cross-task residual transfer succeeds. 85%+ success with 0 hr of teleoperation data.
Realistic Case
- TacGlove/TacTeleOp: Co-training effect confirmed at 200 hr, 3-axis vs binary significant only for capping. Scaling law partially verified.
- TacPlay: Sim convergence confirmed, partial real success. Cross-task transfer limited. Workshop paper level.
Worst Case
- TacGlove/TacTeleOp: Tactile co-training negligible vs vision-only. No 3-axis vs binary difference. → Pivot to hardware contribution (stretchable) + dataset contribution.
- TacPlay: RL convergence failure. → "Challenges and Limits of Tactile-Target RL" negative result paper.
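Whether the scaling law counts as "confirmed log-linear" or only "partially verified" could be judged by fitting success rate against the logarithm of data hours and inspecting the fit quality. The sketch below does an ordinary least-squares fit of `success = a + b * log10(hours)` with the standard library. The model form, the function name, and the data points are assumptions; the points are synthetic, generated from known coefficients purely to exercise the code, and are not experimental results.

```python
import math


def fit_log_linear(hours, success_rates):
    """Least-squares fit of success = a + b * log10(hours); returns (a, b, r2)."""
    if len(hours) != len(success_rates) or len(hours) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log10(h) for h in hours]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(success_rates) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, success_rates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(xs, success_rates))
    ss_tot = sum((y - mean_y) ** 2 for y in success_rates)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return intercept, slope, r_squared


# Synthetic placeholder data at the planned collection sizes (not results).
hours = [10, 50, 200, 800]
synthetic = [0.3 + 0.2 * math.log10(h) for h in hours]

a, b, r2 = fit_log_linear(hours, synthetic)
print(f"a={a:.3f}, b={b:.3f} per decade of data, R^2={r2:.3f}")
```

On real measurements, a slope `b` clearly above zero with a high R^2 across all four data sizes would support the best-case reading, while a good fit over only the smaller sizes would correspond to the realistic case.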
10.6 Connection to Our Direction
This experimental design systematically validates all hypotheses from Chapter 7 (TacGlove), Chapter 8 (TacTeleOp), and Chapter 9 (TacPlay). The 3-axis vs binary ablation directly responds to the "binary 85%" result of Ye et al. (2026), and the tactile scaling curve extends EgoScale's [2] vision scaling law to the tactile domain. The next chapter (Chapter 11) discusses the long-term outlook beyond these experiments.
References
- Ye, Q., et al. (2026). Visual-Tactile Learning for Dexterous Manipulation. Science Robotics.
- Zheng, R., et al. (2026). EgoScale: Egocentric Video Pretraining. arXiv.
- Yin, J., et al. (2025). OSMO: A Large-Scale Tactile Glove. arXiv. [#18]
- DexH2R (2024). Task-Oriented Residual RL. arXiv.
- Si, Z., et al. (2025). ExoStart. arXiv. [#9]
- Kareer, S., et al. (2024). EgoMimic. arXiv.