Though roughly 80% of the world's data is in video format, generative AI has primarily focused on text and images, due to the complexity of video analysis, which requires processing visual, textual, and audio data simultaneously. Not only is video analysis complex because of its multimodal nature; the need to recognize objects, emotions, and context, and to effectively search, engage, and communicate with video data, presents further challenges.
Enter Twelve Labs, a startup building multimodal foundation models for video understanding. The overarching problem that Twelve Labs solves is video-language alignment. Twelve Labs focuses on developing machine learning systems that produce powerful video embeddings aligned with human language. This means their models can interpret and describe video content using text. This technology gives customers the ability to search for specific moments in a vast video archive, either by providing text descriptions or by interacting with Twelve Labs' models using text prompts. It also enables the generation of various types of content, such as summaries, chapterizations, and highlights. Ultimately, Twelve Labs is revolutionizing the way we search for and comprehend videos, addressing current limitations in AI. Their technology has versatile applications, including ad insertion, content moderation, media analysis, and highlight reel creation, making them a significant player in the field of video data interaction.
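Conceptually, this kind of text-to-video moment search can be framed as nearest-neighbor retrieval in a shared embedding space: the query text and each video clip are embedded into the same vector space, and the clip closest to the query wins. The toy vectors and helper functions below are illustrative assumptions, not Twelve Labs' actual models or API:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_moment(query_emb: np.ndarray, clip_embs: dict) -> str:
    """Return the clip whose embedding is most similar to the text query embedding."""
    return max(clip_embs, key=lambda name: cosine_similarity(query_emb, clip_embs[name]))

# Hand-made toy embeddings standing in for model outputs (illustrative only).
clips = {
    "goal_celebration": np.array([0.9, 0.1, 0.0]),
    "halftime_interview": np.array([0.1, 0.9, 0.2]),
    "crowd_shot": np.array([0.2, 0.3, 0.9]),
}
query = np.array([0.85, 0.15, 0.05])  # stand-in for the embedding of "player scores a goal"

print(retrieve_moment(query, clips))  # → goal_celebration
```

In a real system the embeddings would come from the video-language model itself and the search would run over millions of indexed clips, but the alignment idea is the same: text and video land in one space where similarity is meaningful.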
Twelve Labs initially caught our attention when a team of four young AI engineers won the 2021 ICCV VALUE Challenge, outperforming AI teams from tech giants such as Tencent, Baidu, and Kakao. We have been extremely impressed by the rapid progress of the model and the company's growth since the challenge. In a short period, Twelve Labs has become a leader in the field, featured in the NVIDIA GTC 2023 Keynote, and attracting talent like Minjoon Seo, a professor at the Korea Advanced Institute of Science & Technology (KAIST), who now serves as Chief Scientist. The experience that Minjoon brings as a distinguished NLP research scientist, coupled with CTO Aiden Lee, an expert in computer vision AI, further validates Twelve Labs' capacity to build powerful large multimodal models for video understanding.
Twelve Labs is not only providing a cutting-edge video understanding solution but also a developer platform set to launch APIs that tackle video moment retrieval, classification, and video-to-text for downstream tasks. Essentially, Twelve Labs is bringing a new video interface that makes video as easy to work with as text, giving enterprises and developers programmatic access to all the semantic information that resides in their video data. This developer-friendly approach has already attracted 20,000 developers to the platform during the beta phase. Further, they recently announced that their Pegasus-1 model already outperforms existing models on video summarization benchmarks, demonstrating a significant improvement in video understanding.
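Of the three API tasks mentioned, classification can also be sketched in embedding terms: embed each candidate text label, then assign a clip to whichever label embeds closest to it (zero-shot classification). The label set, vectors, and function below are hypothetical illustrations, not the platform's real API:

```python
import numpy as np

def classify_clip(clip_emb: np.ndarray, label_embs: dict) -> str:
    """Assign the clip to the text label whose embedding is closest (cosine similarity)."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(label_embs, key=lambda label: cos(clip_emb, label_embs[label]))

# Hand-made toy embeddings standing in for model outputs (illustrative only).
labels = {
    "sports": np.array([1.0, 0.0, 0.1]),
    "news":   np.array([0.0, 1.0, 0.1]),
    "music":  np.array([0.1, 0.1, 1.0]),
}
clip = np.array([0.9, 0.2, 0.1])  # stand-in for a video clip's embedding

print(classify_clip(clip, labels))  # → sports
```

The appeal of this formulation is that the label set is just text: developers can define new categories on the fly without retraining a model, which is what makes a single aligned embedding space so useful as a programmatic video interface.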