Creation and annotation of a local multimodal corpus (video, audio, text) for public safety. Fundamental research on multimodal models and their explainability. Development of video and audio detection algorithms for public safety. Integration and optimization of multimodal processing pipelines. Testing and validation of the prototype in a demonstration environment.