2022: A Critical Year for Online Communities and Discourse
Last week, I tweeted this:
I don't know what it is - but this is a topic, as an industry, that we should be making public service announcements about. It sounds crazy, but maybe even a billboard or airport ad campaign with the simple question: What's your AI disinformation strategy?
I feel this topic should be top of mind for any business leader who runs any kind of website that accepts keyboard input from users on the internet (which is basically most websites), especially heading into 2022.
AI language models are here. They are capable of generating comprehensible, coherent, very human-like snippets of text. These snippets could be automated through some kind of bot and used for a long list of nefarious purposes.
We've already learned the societal importance of accurate information for things like democratic elections, and as time has gone on, we've become highly dependent on online discourse and the online communities which connect us.
At a simple level - if you have a comments section, signup form, or any other kind of application form, I would wager that every one of those input fields is at risk of receiving large-scale, possibly false, information from AI language models right now. This means your database, user list, or training data may already be absorbing potentially false or misleading AI-generated content at scale at this exact moment.
If you run an online community, this is a risk to you as well. I'm not sure what the impacts will be on human online communities becoming overwhelmed with comments, dialogue, and posts from AI based language models. Will this lead to a more toxic community? Could it lead to core human members of your online forum leaving because the "culture" of your online community has changed (without anyone realizing it has become overrun by bots)? Do you have a community policy on disclosing bot-generated content, or will your existing pseudonymous policies backed by user voting suffice?
The key thing that has changed is there are more language models available than ever. Between AI21, Cohere, OpenAI, and EleutherAI, these large AI models are becoming ubiquitous and reaching a wider audience than ever before. They can also already be "improved" to target specific websites like yours through a technical process called "fine tuning." I'm not blaming any particular company; this is a larger tech industry problem which will need to be resolved over time.
It's really hard to distinguish AI-generated content from human-generated content. A lot of this AI content is also unique and has no exact search matches on Google, which means it would likely pass existing plagiarism checks and basic duplication-detection techniques.
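To make that concrete, here is a minimal sketch (my own illustration, not a recommendation) of the kind of exact-match duplication check many sites rely on: fingerprint each submission by hashing its normalized text and reject repeats. Copy-pasted spam gets caught, but because a language model produces fresh wording every time, each generated comment has a brand-new fingerprint and sails through.

```python
import hashlib

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial edits don't defeat the check.
    return " ".join(text.lower().split())

def content_fingerprint(text: str) -> str:
    # Exact-match fingerprint: two texts collide only when their
    # normalized forms are byte-for-byte identical.
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()

seen: set[str] = set()

def is_exact_duplicate(text: str) -> bool:
    # Returns True if we've seen this (normalized) text before.
    fp = content_fingerprint(text)
    if fp in seen:
        return True
    seen.add(fp)
    return False
```

A copy-paste bot posting "Great product!" a hundred times trips this check on the second post; a model generating a hundred distinct, plausible comments never trips it once. That gap is exactly the problem.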
In a future post, I will provide some solutions and propose how the industry may combat this problem - but for now my only goal is to simply raise this alarm. If you run an online business, I believe you should be monitoring incoming user input and putting this problem on your radar. Keep an eye out for AI generated content and be ready to change your policies, infrastructure, and code to adapt to a growing segment of it.