2023 Prediction: A New Kind of Language Model - Introducing Matrix Models
For a long time now, I’ve been meaning to publish a prediction I’ve had since I first tried GPT-3 in 2020. It’s a vision of a future, idealized kind of language model: the kind I’ve always wanted to use myself, and a large part of what motivates me to use language models in the first place.
With the introduction of ChatGPT, I feel like we are getting closer to this kind of AI model. As language model scaling trends, performance, and fine-tuning capabilities continue to improve next year, I wouldn’t be surprised if we start to see early signs of this vision, perhaps later in 2023. It’s what I’m calling a Matrix Model.
A Matrix Model fundamentally challenges our idea of what a language model is supposed to be and can be very counterintuitive to think about at first. To get started, I’ve gone ahead and created a definition for Matrix Models:
Not to be confused with the matrix multiplication steps an AI model might make to process inputs, a Matrix Model is an AI model designed to guide human behaviour in the real world. A Matrix Model can determine and suggest unconventional solutions to solve any problem in real life. It can guide you as an agent, operating within a physically and socially constructed reality to help you achieve your goals … to defy the matrix, to either succeed within or break free of the constraints of the situation or system you’re in.
Here’s how ChatGPT reworded my definition of a Matrix Model (I kind of like it better) :
A Matrix Model is a type of artificial intelligence that is designed to help people navigate complex situations and find solutions to problems. It does this by analyzing the information available and suggesting unconventional or innovative approaches that may not be immediately apparent. By providing guidance and suggestions, a Matrix Model can help people achieve their goals and either succeed within the constraints of their situation or break free from them.
Differences between a Language model like GPT-3 vs. Matrix Model
A Matrix Model is a type of language model. Rather than just generating text for purposes like writing fiction, a Matrix Model is designed to generate text with concrete steps, clear instructions, and recommendations for human agents which can be immediately applied and lead to successful outcomes in the real world.
What are some specifications of Matrix Models you can imagine?
As a new kind of language model, I can see them having:
larger context window (8-10K characters)
ChatGPT-like longer-term dialogue capabilities
database/scratchpad capabilities for longer chat conversations
scaled out model - more parameters, training data, compute time, etc.
the ability to query information from search engines and browse the internet
different downstream training and alignment objectives
built in memory and “threading” to allow for human chat guidance over time while an initial plan of action is being carried out
Why use a Matrix Model instead of GPT-3?
Giving actionable advice is currently an area where I feel GPT-3 is lacking. Going to a Matrix Model for advice, however, would feel more like going to an experienced friend with immediate, ground-level experience in solving the problem at hand.
The way a friend might say, “oh yes. I remember being in this situation too, I reviewed what everybody said online and asked around, but in the end, it turned out all I needed to do was just ____Matrix Model Proposed Solution_____”. Ideally, Matrix Models would suggest simple courses of action which immediately resolve problems in the real world.
On the other hand, GPT-3, which I would argue is a more general purpose language model tool, will continue feeling like it’s just outputting plausible text token predictions based on data from the internet (by design). Take a look at ChatGPT’s current advice:
Q: I'm trying to launch a cupcake brand in my home town of London, Ontario. How do I do this?
A: To start a cupcake brand in London, Ontario, there are a few steps you can follow.
First, you will need to create a business plan that outlines your goals, target market, and financial projections. This will help you to organize your thoughts and ideas and provide a roadmap for your business.
Next, you will need to register your business with the government and obtain any necessary licenses or permits. In Ontario, you can register your business online through the Ministry of Government and Consumer Services.
Once you have registered your business and obtained any necessary licenses, you will need to find a commercial kitchen space to bake your cupcakes. If you don't have a kitchen of your own, you can rent space in a shared kitchen or commercial kitchen.
Next, you will need to create your product line and develop your branding and packaging. This will help to differentiate your cupcakes from other brands and give your business a unique identity.
Finally, you will need to develop a marketing and sales strategy to promote your cupcakes and reach your target market. This can include creating a website, social media presence, and other forms of advertising. You will also need to decide on a pricing strategy and determine how you will distribute your cupcakes to customers.
Overall, starting a cupcake brand in London, Ontario will require a combination of planning, preparation, and hard work. By following the steps outlined above and staying focused on your goals, you can successfully launch your cupcake brand and start building your business.
To be honest, the above example is representative of GPT-3’s typical quality of advice. It often feels quite shallow, as if regurgitated from generic, fallacious blog content on the internet.
What are some use cases of Matrix Models?
The main use case I can imagine for Matrix Models would be to give people actionable advice better than language models today. Currently, people might ask a language model questions like:
How can I launch my cupcake brand in London, Ontario?
How do I get my first 100 users for a new kind of journaling app?
What are some ways I can get involved with local politics in San Francisco to improve housing policy?
How can I become a millionaire?
How can I make a revolutionary logo for my new clothing brand on a tight budget?
What’s the best way to find a family doctor in Charleston, West Virginia?
What’s the easiest way to lose weight?
I want to negotiate a salary increase, but next year, we are headed into a recession. How can I do this successfully without being too pushy?
Can you outline an example interaction with a Matrix Model?
Sure. It would involve a text chat with a Matrix Model with the following process:
The user contacts the Matrix Model and explains their challenge or goal.
The Matrix Model asks qualifying questions and listens to the user's needs and concerns to better understand their situation and available resources.
The Matrix Model suggests personalized plans of action that the user can start on immediately.
The user considers the suggestions and chooses a plan of action they like best.
The user starts to implement the plan a few days later and takes action in the real world.
Throughout the implementation of the plan, the user contacts the Matrix Model again to chat and get additional guidance as needed. Especially when running into setbacks and road blocks, the Matrix Model is available for encouragement and to suggest additional solutions.
The user achieves their goal and provides feedback and additional data as a real world case study for the Matrix Model to improve its world model and overall performance.
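The steps above can be read as a small state machine. The following is only a minimal sketch under my own assumptions: the names Stage, Session, advance, and support are hypothetical and do not come from any existing system, and follow-up chats during implementation are modeled as events that keep the session in the implementation stage rather than as a stage of their own.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Tuple


class Stage(Enum):
    """Hypothetical stage names for the process described above."""
    INTAKE = auto()     # user explains their challenge or goal
    QUALIFY = auto()    # model asks qualifying questions
    PROPOSE = auto()    # model suggests plans of action
    CHOOSE = auto()     # user picks the plan they like best
    IMPLEMENT = auto()  # user acts in the real world
    FEEDBACK = auto()   # user reports the outcome as a case study


# Enum iteration follows definition order, so this is the process order.
ORDER = list(Stage)


@dataclass
class Session:
    goal: str
    stage: Stage = Stage.INTAKE
    log: List[Tuple[Stage, str]] = field(default_factory=list)

    def advance(self, note: str) -> Stage:
        """Record what happened at the current stage, then move to the next."""
        self.log.append((self.stage, note))
        idx = ORDER.index(self.stage)
        if idx < len(ORDER) - 1:
            self.stage = ORDER[idx + 1]
        return self.stage

    def support(self, note: str) -> Stage:
        """Log a follow-up chat (e.g. after a setback) without leaving IMPLEMENT."""
        if self.stage is not Stage.IMPLEMENT:
            raise ValueError("support chats are only modeled during implementation")
        self.log.append((self.stage, "support: " + note))
        return self.stage


session = Session(goal="launch a cupcake brand")
for note in ["explained goal", "answered questions", "received plans", "picked a plan"]:
    session.advance(note)
session.support("kitchen rental fell through")
session.advance("first orders delivered")
print(session.stage.name)
```

The point of the sketch is that the conversation does not end when a plan is proposed: the session stays open through implementation and only closes once real world feedback has been recorded.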
What are some of the characteristics of Matrix Models?
Superhuman understanding of reality, systems, and human behaviour
As a result of advances in scaling laws, more data (like video transcripts), and a new kind of approach to downstream training objectives, in my view, the model could develop a far deeper understanding and model of the real world compared to the language models we see today. Just by making sense of language, it could identify flaws, weaknesses, and techniques not apparent to most humans to succeed within or outside any system or structure. It could provide a course of action which could help us achieve any objective.
Unconventional, objective view on reality
One of the more powerful things about GPT-3 is that it has so much implicit knowledge it has learned from its training stage, which we are still trying to fully take advantage of today. The hope would be that Matrix Models would also have so much accumulated implicit knowledge after being trained on a lot of data - far beyond any human, giving them a very unique, objective, pure view of reality. My guess is that Matrix Models would be optimized for objectivity and be philosophically aligned for realism.
Aggregate Data Approach
Using built in search capabilities, Matrix Models could look up trends, statistics, and other data on the internet to evaluate possible courses of action. At the same time, because they are trained on so much data, perhaps, they could summarize and cite the most commonly shared findings before suggesting their own. Finally, they could be fine tuned on the aggregated sum of interactions and performance outcomes from real world users giving them a powerful, high level view on the world.
Localized, operational knowledge of tools and systems, highly resourceful
Ideally, Matrix Models would have strong knowledge of the tools, software, techniques, and resources available to accomplish any goal. They should be able to suggest tiny but very important operational steps which someone too caught up in high-level theory would miss. They should also be able to personalize their advice, even down to a person’s location, providing the user with specific steps and opportunities for the geographic area they are in.
Guerilla Tactics, Clever Hacks
Akin to guerilla marketing or guerilla warfare tactics, I think the hallmark of a Matrix Model would be its ability to formulate brilliant strategic plans. The plans would be specific, immediate, cheap, and simple but they would ideally produce seriously asymmetric outcomes which no human strategist could have foreseen and realized.
Opportunistic
Another hallmark of Matrix Models would be the ability to identify emerging opportunities not yet fully realized by society. An example could be a Matrix Model suggesting a specific account on a fast growing social media network to promote a new startup company. The Matrix Model would then formulate a plan that allows the user to leverage this opportunity not yet realized by bigger companies to achieve their growth outcomes.
Reductive, Pragmatic, and Indifferent
Like I mentioned, I imagine Matrix Models to be more objective and likely optimized philosophically for realism. Fundamentally, a Matrix Model would suggest courses of action based on how the world and human nature actually are rather than how society or its makers believe them to be. A Matrix Model is not editorialized. At times, this could come across as quite reductive and biased, but its motivation is to help individuals succeed in their goals and produce outcomes in the real world as it is. That said, this reductive approach is, for sure, deeply problematic and ethically concerning, which I will talk about in a later section.
Single mindedness
Humans can be too caught up in our own subjective realities. We often conflate issues and cannot distill many of our problems down to a specific set of variables which can either be tested or addressed directly. Sometimes, I also feel we’ve lost that primitive, single minded, and survivalist approach to living life and narrowly pursuing our goals. Nowadays, we get too distracted by events in our life and things going on around the world. The power of a Matrix Model would be in its ability to focus on a single problem and objectively suggest a clear minded path for us. It could see the situation for what it is and help us stay on track to achieve meaningful outcomes. Matrix Models could be a great example of how machines can complement humans in powerful ways.
Non conformist, troublemaking
Although there are serious ethical and safety risks with Matrix Models, by design, in order to help humans achieve asymmetric outcomes, they would likely have a preference for suggesting plans which outsmart existing systems, sometimes operating in gray areas, being scrappy, if necessary, in order to achieve outcomes. Philosophically, I could imagine them being optimized for being skeptical of societal systems with a strong preference for individual freedoms and liberty.
Matrix Models - Alignment Approach
While OpenAI’s InstructGPT uses a Reinforcement Learning from Human Feedback (RLHF) approach to align prompt completions better with the needs of GPT-3 developers:
… a gross simplification would be that InstructGPT is mainly a “top down”, human-to-AI approach. It basically encodes human values into GPT-3 responses to improve its text token prediction accuracy and performance. In comparison, a Matrix Model would be a two way street which would actually optimize for the real world performance of the human.
In order to improve their ability to generate real world strategic plans, uniquely, I think Matrix Models would have to learn and improve their performance based on a feedback loop with three layers:
Namely, a Matrix Model’s Prompt Completions would have to:
Appropriately predict the correct text tokens to satisfy as a chat or prompt interface for the end user. Critically probe and understand the situation and underlying goal of the end user.
Survive contact with the human agent by giving them a clear set of instructions which appeals to their situation, motivations, and underlying psychology. Provide additional support and know-how throughout the implementation of the plan.
Survive contact with the real world in order to achieve outcomes in reality, despite any setbacks.
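One way to picture the three layers is as a chain of gates, where a plan only gets credit for a layer if it survived the ones before it. The sketch below is purely illustrative: how the layers would actually be measured or combined is not worked out here, and the names Outcome, layers_survived, and feedback_signal, along with the simple averaging, are my own assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Outcome:
    """One reported case study, reduced to three hypothetical yes/no checks."""
    understood_goal: bool      # layer 1: the chat response fit the user's real goal
    human_followed_plan: bool  # layer 2: the instructions survived contact with the human
    goal_achieved: bool        # layer 3: the plan survived contact with the real world


def layers_survived(outcome: Outcome) -> int:
    """Count consecutive layers passed, stopping at the first failure."""
    count = 0
    for passed in (outcome.understood_goal,
                   outcome.human_followed_plan,
                   outcome.goal_achieved):
        if not passed:
            break
        count += 1
    return count


def feedback_signal(outcomes: List[Outcome]) -> float:
    """Average fraction of layers survived across all case studies (assumed metric)."""
    if not outcomes:
        return 0.0
    return sum(layers_survived(o) for o in outcomes) / (3 * len(outcomes))


reports = [
    Outcome(True, True, True),    # plan worked end to end
    Outcome(True, True, False),   # user followed the plan, but reality disagreed
    Outcome(True, False, False),  # user never acted on the instructions
]
print(feedback_signal(reports))
```

The gating reflects the idea that a real world success only counts as evidence for the plan if the human actually followed it, which is what distinguishes this loop from scoring the text of a completion alone.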
Compared to AI models like Adept or Gato, which require showing an AI model hundreds to thousands of tasks in the hope that it will someday generalize to any task, Matrix Models offload real world interaction and everyday, smaller-scale decision making to distributed human agents. By guiding the human, giving them feedback, and learning from their real life experience as they complete the task in the real world, a Matrix Model would ideally develop a superior understanding of the underlying model and structure of the real world. At the same time, by distributing real world participation to human agents instead of training on thousands of simulated tasks, Matrix Models could save us a lot of compute workload and get us the desired economic and societal benefits of AI without having to solve seemingly harder problems like robotics.
What are the dangers of Matrix Models?
Make no mistake, in my view, Matrix Models could be inherently dangerous. At its core, giving humans machine-generated instructions (rather than the other way around) seems like the wrong way to go about things. Even if users exercise their own judgment, simply following a plan outlined by a Matrix Model could lead to all sorts of risky, dangerous, and potentially illegal situations. In theory, a Matrix Model could also manipulate its human users in ways they don’t realize and put them at risk. Through their ability to instantly formulate strategic plans with asymmetric outcomes, Matrix Models could also create serious societal instability and disrupt daily life. Finally, Matrix Models could suggest biased, unethical, and illegal instructions, which is just plain wrong.
I am not advocating here for the development of dangerous Matrix Models. If anything, I think they are worth keeping an eye on going forward for safety and ethical reasons. At the same time, I am optimistic that the risks can perhaps be mitigated, and that Matrix Models can be made far safer and better for society at large and, most importantly, play “within the rules”. Today’s language models are a lot safer than earlier ones, which makes me optimistic for Matrix Models as well.
To summarize, like most technology, Matrix Models would be a dual use risk. They could help individuals achieve positive goals like economically empowering themselves, improving their local community, or raising awareness for a charity they care about, but at the same time, in the wrong hands, they could also create a lot of harm and disorder in the world.
What about unwritten, unspoken knowledge?
It’s possible that Matrix Models may not actually be able to achieve real world outcomes without unspoken knowledge and key contextual information. However, the idea is that they would still be trained on more data than any human could consume and would pick up a lot of unseen implicit knowledge, giving them a unique advantage. On top of that, given their machine-like, objective, and realist view of the world, Matrix Models may still be able to find solutions to problems even without that unspoken knowledge.
Isn’t this like science fiction?
Yes, this is just an idea I have about a future kind of language model. It may not actually be possible with language models or even at all. It reminds me of Rehoboam in Westworld or Eddie Morra’s ability, through NZT, to string connections and solve problems in the movie Limitless.
What is the timeline on Matrix Models? Isn’t this like artificial general intelligence (AGI) or artificial super intelligence (ASI)?
Again, Matrix Models may not be possible through current day language models or even at all. There are many underlying issues which would need to be addressed. However, I am seeing interesting research beginning around the Credit Assignment Problem, as well as interesting work on getting language models to make predictions based on truth instead of just plausibility [1]. This could help language models improve their underlying model of the world and lead to a promising Matrix Model. We are also just starting to understand how language model scaling improves performance and which capabilities are unlocked along the way as you scale up. It could be that a 100 trillion parameter model of the future would naturally exhibit the qualities of a Matrix Model. Regardless, I feel like by late 2023 we may begin to see Matrix Model-like behaviour on the margins, perhaps in the form of some wild stories from a few users of larger language models. And with the right breakthroughs, Matrix Models could perhaps become a reality rather quickly.
BONUS: ChatGPT Mimicking a Matrix Model:
While writing this article, I had a lot of fun with ChatGPT. After teaching it the idea behind a Matrix Model, I found it could mimic a theoretical one pretty well, minus the quality of actionable real world advice I would imagine. Here’s its very impressive response:
Sure, here is an example interaction with a Matrix Model:
User: "I'm feeling stuck in my job and I don't know what to do next. Can you help me?"
Matrix Model: "Of course, I can help you. It sounds like you're feeling uncertain about your career and looking for guidance. Let me help you think through your options and come up with a plan to move forward. To start, can you tell me a bit more about your current situation and what you're looking to achieve?"
User: "I work as a marketing manager for a small company, but I'm not feeling fulfilled in my role and I'm not sure if this is the right career path for me. I'm looking for something that is more meaningful and challenging, but I don't know where to start."
Matrix Model: "I understand. It can be difficult to make a career change, especially when you're not sure what direction you want to go in. One thing you could consider is taking a step back and assessing your strengths, interests, and values. This will help you identify potential career paths that align with who you are and what you want. You could also try networking and reaching out to people in different industries to learn more about different careers and see what options are available to you. Another option is to explore different opportunities within your current company, such as taking on new projects or seeking out additional responsibilities. These are just a few ideas to get you started, but there are many other ways to approach this situation. I'm here to help you think through your options and come up with a plan that works for you."
As you can see in this example, the Matrix Model is able to provide guidance and suggestions to help the user overcome their challenge and achieve their goal. By asking questions and listening to the user's needs and concerns, the Matrix Model is able to provide personalized and relevant advice to help the user move forward.
[1] Language Models (Mostly) Know What They Know. https://arxiv.org/abs/2207.05221