Putting clips back into context with Multimodal AI
Recently I came across this clip on X:
https://x.com/krystalball/status/1903150899687477599?s=46
Ignoring the politics being discussed in the clip, this is a common occurrence in our modern media diet. We are constantly bombarded with emotionally charged, polarizing clips that are not representative of the larger conversation or the point the speaker was actually making. Rage-bait clips taken out of context often serve a political, corporate, or ideological agenda, and they make it difficult to make sense of the world and form our own balanced opinions. It sometimes feels gross to think about how often everyday people like me are being manipulated.
While humans may not have time to watch an entire three-hour podcast interview with a politician before reviewing a clip like this, Multimodal AI does. Rather than just relying on something like Community Notes on X or having to read the comments, I would love for a platform to provide “AI Context” using Multimodal AI models. Nuance matters, and everyday people deserve it and need it to make informed, consequential decisions, like who to vote for.
Imagine being able to see AI-generated data and analysis for any given podcast interview or panel discussion. Imagine clips like this being automatically flagged so that viewers get the much-needed context of the entire conversation and can factor it into their own opinions. It could create a more balanced approach to what is being discussed and maybe alleviate some of the polarization we have seen lately. AI could serve as a great moderator and hopefully provide some objectivity for users. We just need to build it.
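To make the idea concrete, here is a minimal sketch of what an “AI Context” feature could look like under the hood, assuming access to a multimodal model that accepts long video, in this case Google’s Gemini API via the google-generativeai Python SDK. The model name, prompt wording, and file paths are illustrative assumptions, not a finished design:

```python
# Minimal "AI Context" sketch: upload the full-length source video and ask a
# multimodal model whether a viral clip fairly represents the conversation.
# Assumes: pip install google-generativeai, a valid API key, and that the
# platform already has the full video and the clip's transcript on hand.
import time

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumption: you supply your own key


def generate_clip_context(full_video_path: str, clip_transcript: str) -> str:
    """Return a neutral context note comparing a short clip to its source."""
    # Upload the full interview/podcast via the Files API.
    video_file = genai.upload_file(path=full_video_path)

    # Wait for server-side processing to finish before prompting.
    while video_file.state.name == "PROCESSING":
        time.sleep(5)
        video_file = genai.get_file(video_file.name)
    if video_file.state.name == "FAILED":
        raise RuntimeError("Video processing failed")

    model = genai.GenerativeModel("gemini-1.5-pro")  # assumed model choice
    prompt = (
        "Here is the transcript of a short clip circulating online:\n\n"
        f"{clip_transcript}\n\n"
        "Using the full video above, write a neutral 'AI Context' note: "
        "summarize the surrounding conversation, say whether the clip "
        "fairly represents the speaker's larger point, and note anything "
        "important the clip leaves out."
    )
    response = model.generate_content([video_file, prompt])
    return response.text


if __name__ == "__main__":
    print(generate_clip_context("full_interview.mp4", "<clip transcript here>"))
```

In a real platform this would run automatically when a clip starts trending, with the generated note attached to the post the way Community Notes are today; the sketch above only shows the core model call, not the flagging or moderation pipeline around it.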