Component Library

RoIS defines 17 basic HRI components. Every component except System Information shares the RoIS_Common interface: start, stop, suspend, resume, and component_status. About 70% of components are identical across the robot and avatar paradigms: the perception and speech components run the same ML models whether the input is a robot camera or a webcam. Only actuation, world model, and stream source differ.
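
As a rough illustration of the shared interface, the five operations can be pictured as a small base class. This is only a sketch: the class names `RoISCommon` and `Status`, the three status values, and the transition rules are assumptions made for this example, not definitions taken from the spec.

```python
from enum import Enum


class Status(Enum):
    # Illustrative states only; the spec defines its own status values.
    STOPPED = "stopped"
    RUNNING = "running"
    SUSPENDED = "suspended"


class RoISCommon:
    """Hypothetical base class exposing the five shared operations."""

    def __init__(self) -> None:
        self._status = Status.STOPPED

    def start(self) -> None:
        self._status = Status.RUNNING

    def stop(self) -> None:
        self._status = Status.STOPPED

    def suspend(self) -> None:
        # In this sketch, only a running component can be suspended.
        if self._status is Status.RUNNING:
            self._status = Status.SUSPENDED

    def resume(self) -> None:
        # In this sketch, only a suspended component can be resumed.
        if self._status is Status.SUSPENDED:
            self._status = Status.RUNNING

    def component_status(self) -> Status:
        return self._status


class PersonDetection(RoISCommon):
    """A component inherits the lifecycle and adds its own messages."""


detector = PersonDetection()
detector.start()
detector.suspend()
print(detector.component_status())
```

Because every component goes through the same lifecycle calls, a caller can start or suspend a component without knowing which backend sits behind it.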

| Component | Robot backend | Avatar backend | Shared? |
| --- | --- | --- | --- |
| Person Detection | YOLO on camera | YOLO on webcam | yes |
| Person Localization | depth + tracker | world position | diff coord system |
| Person Identification | InsightFace | InsightFace | yes |
| Face Detection | MediaPipe | MediaPipe | yes |
| Face Localization | MediaPipe face mesh | MediaPipe face mesh | yes |
| Sound Detection | mic VAD | mic VAD | yes |
| Sound Localization | mic-array DOA | mic-array DOA / virtual | diff |
| Speech Recognition | Whisper | Whisper | yes |
| Gesture Recognition | MediaPipe Holistic | MediaPipe Holistic | yes |
| Speech Synthesis | TTS to speaker | TTS to lip-sync | diff output |
| Reaction | LED / gesture | animation / expression | paradigm-specific |
| Navigation | Nav2 (physical) | NavMesh (virtual) | paradigm-specific |
| Follow | Nav2 + tracker | virtual follow | paradigm-specific |
| Move | cmd_vel to motors | transform to avatar | paradigm-specific |
| Audio Streaming | mic to WebRTC | TTS output to WebRTC | diff source |
| Video Streaming | camera to WebRTC | rendered frames to WebRTC | diff source |
| System Information | battery, CPU, joints | FPS, memory, avatar state | diff state |
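
One way to read the "yes" rows is that the detection logic is written once and only the frame source is swapped. The sketch below assumes that reading. Everything in it is hypothetical: `count_bright_pixels` is a trivial placeholder standing in for a real model such as YOLO, and `robot_camera` and `webcam` yield hard-coded frames rather than real camera input.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[List[int]]            # stand-in for an image
Detector = Callable[[Frame], int]  # returns a detection count


def count_bright_pixels(frame: Frame) -> int:
    """Placeholder for a real detector; not an actual ML model."""
    return sum(1 for row in frame for px in row if px > 0)


class PersonDetection:
    """Shared logic: applies one detector to whatever frames it is given."""

    def __init__(self, frames: Iterable[Frame], detector: Detector) -> None:
        self._frames = frames
        self._detector = detector

    def detect_all(self) -> List[int]:
        return [self._detector(frame) for frame in self._frames]


def robot_camera() -> Iterator[Frame]:
    # Hypothetical robot-side source with two hard-coded frames.
    yield [[0, 1], [1, 0]]
    yield [[0, 0], [0, 0]]


def webcam() -> Iterator[Frame]:
    # Hypothetical avatar-side source with one hard-coded frame.
    yield [[1, 1], [1, 0]]


# Same component class and same detector; only the frame source changes.
robot_counts = PersonDetection(robot_camera(), count_bright_pixels).detect_all()
avatar_counts = PersonDetection(webcam(), count_bright_pixels).detect_all()
print(robot_counts, avatar_counts)
```

The paradigm-specific rows would not fit this pattern as neatly, since their output side (motors versus avatar transforms) differs, not just their input.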

A component's logic is the same across adapters; only the binding differs. The spec also supports user-defined components beyond the 17 basic ones, which reuse RoIS_Common and the profile mechanism. An HRI Component Profile can include another profile via sub_component, so an extended component can reuse a base component's messages and add new ones.
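
The profile inclusion described above can be sketched as a small recursive data structure. This is an illustration under assumptions, not the spec's profile format: the `ComponentProfile` class, its fields other than `sub_component`, the message names, and the "Group Detection" component are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComponentProfile:
    """Hypothetical in-memory view of an HRI Component Profile."""

    name: str
    messages: List[str] = field(default_factory=list)
    sub_component: List["ComponentProfile"] = field(default_factory=list)

    def all_messages(self) -> List[str]:
        # Collect messages from included profiles first, then add our own.
        inherited = [m for sub in self.sub_component for m in sub.all_messages()]
        # dict.fromkeys keeps first-seen order while dropping duplicates.
        return list(dict.fromkeys(inherited + self.messages))


# Base profile with an illustrative message name.
base = ComponentProfile("Person Detection", ["person_detected"])

# Hypothetical user-defined component that includes the base profile.
extended = ComponentProfile(
    "Group Detection",
    messages=["group_detected"],
    sub_component=[base],
)

print(extended.all_messages())
```

Here the extended profile exposes both the base component's message and its own, without redefining the base message.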