How AI will actually change the world

We ditch the media hysteria to focus on how machine learning tools work and how we can make best use of them
25 July 2023
Presented by James Tytko.
Production by James Tytko.


Technology is bringing computers and the brain closer together


It's been quite the ride in the realm of artificial intelligence over the past year or so. As impressive as advancements in machine learning have been, however, few experts are worried about bots taking our jobs and threatening our safety as a species. The truth is, tools like ChatGPT are not the way AI is going to prove most helpful in the short term. We scratch below the surface to explain how, with more carefully trained programmes, the real potential of chatbots can be, and is being, unlocked...

In this episode


00:45 - Big tech goes all in on AI

From Microsoft, to Meta, to Google. Everyone wants a piece of the pie.

Big tech goes all in on AI
Josh Cowls, University of Oxford

Big tech companies all want in on the action, whether it’s Microsoft’s adoption of OpenAI’s GPT-4, Google’s LaMDA or Meta’s new Llama 2, which was announced this week.

Speaking with James Tytko about how the industry is embracing AI is Josh Cowls, from the Oxford Internet Institute…

Josh - I think looking at the business model of these companies is really interesting in terms of how they actually try to market and sell AI. So Microsoft has been pretty upfront in some of its early iterations that to use their AI products you'll have to pay. So they've recently announced that it will cost almost double what it currently costs to use their Office 365 suite if you want to have the AI powered tools on top as well. So that will take it from about $35 a month to $65 in the US. That's one way, and the most obvious way perhaps, of actually making money off the systems. Meta has come along with an alternative approach, at least in the first instance, and decided to essentially give this away for free. That reveals some interesting differences in how I think these companies maybe see AI as a whole. So what companies like Meta have done, particularly in social media as well, is really gone for market share. So the idea is to get as many individuals and businesses using these technologies and then figure out how to monetise that after the fact, and whichever of those approaches wins out will I think help to determine the future of these technologies.

James - The risk to these companies that are gobbling up market share and working out the profit later is the limitations of these big large language models, namely that, because of the vast amounts of data they're trained on, things can slip into their output which become unhelpful. And especially as we start to view these tools not just as impressive toys that make us think, 'wow, what lifelike output' but we try and apply them to professional settings, there's not going to be so much take up in those environments unless we can trust a higher proportion of the output than we currently can.

Josh - I think we need to get really clearly focused on the domain in which we're deploying these AI systems. If you think about it as a bit of a parallel when social media first came along, we could connect with long lost friends and family and shoot the breeze, essentially. Few people could have foreseen what those networks would be used for just 5 or 10 years later, having a role in things like the Arab Spring, say, and of course later playing a darker role in elections in the US. And so that range of possibilities, that scope, is incredibly broad, even broader perhaps for AI than it is for social media. So one of the interesting challenges for policymakers and for people developing this technology is how much do you try to show what AI can do in a really specific narrow domain, cracking really difficult problems with health or with astrophysics or with whatever else, and how much you say, let's just give or sell this product package to people and see what they want to do with it. And I think both of those approaches obviously come with clear benefits, but also clear risks.

James - Are we certain that tools like ChatGPT are going to influence the way in which the average person works going into the future? I ask because I read a study this week from MIT finding that for workers using ChatGPT for professional writing tasks, productivity gains were most marked in the least skilled among them. And it just got me thinking that there's a limit to how helpful these super large language models, in particular, are going to be, and whether businesses will really think, 'I want to spend money on this moving forward.'

Josh - I think in all of our day-to-day lives, we have a mix of cognitive tasks and other tasks which require a lot of either thinking or working on a particular problem. And then a lot of things which we feel, well maybe if there was a sophisticated chatbot that could take care of that email or that conversation for us. So I don't think that this is going to render everyone except a really highly skilled small set of people unemployed or unemployable, but clearly in a really wide range of capacities, it's going to have an influence. If Microsoft is licensing this technology to put in its office products, those have billions of users, so clearly many working lives are going to be touched in some way. How much that has an impact on productivity, I think is a really difficult and interesting question to consider, particularly when we think about what we are really optimising for. Are we optimising for productivity for productivity's sake? Are we trying to free ourselves up for either a life of leisure or a life of interesting cognitive tasks which machines can't do particularly well yet? So I suppose on the most optimistic end, it might allow us to think about what really makes us distinctively human and ideally usher these technologies towards solving the things which are perhaps less interesting and less distinctively human for us, whilst we work out what it is we really want to do.

James - The reason I bring that up is not to be a party pooper, because I think everyone working in AI seems to agree that this technology really is amazing and the advances have been so impressive. But is there perhaps a sense that we need to rethink how we deploy the technology if we want to make best use of it, for example, by training language models on more specific sets of data so they can become more bespoke tools, as it were?

Josh - I do agree. I think once we have an idea of exactly the problems we're trying to solve, or help to solve, with large language models, that will allow us, right at the top of the AI development stream, to figure out how we actually gather the data and train models on data sets, all the way down to how they get applied in different contexts. If we start at the top of that stream with a clearer idea of what exactly we want to do, this set of technologies can really help optimise for particular outcomes. The nice thing about that is that it also allows a larger set of people in society to actually have a say, because when we're agreed on the objective - helping to tackle cancer, for example, looking at things like protein folding - you can include ordinary people in that conversation and then design systems that are aimed to reflect the intentions of those people. Now it may of course not work, it may be counterproductive, it could even be dangerous even with those clearer guardrails, but at least you have a clearer register of the purposes and the interests that are being served by the development of a particular AI model before it gets deployed. And then you can assess it against that benchmark, in stark contrast to what we've seen with ChatGPT, which is very general purpose. But once you plug away at it as a domain expert in the law or whatever else, you can start to see its flaws relatively quickly.


09:13 - FinGPT: Ensuring equitable LLM access

Smaller languages deserve open source LLMs to build applications with...

FinGPT: Ensuring equitable LLM access
Sampo Pyysalo

English is the international language, particularly for science and international communications. So it naturally represents the largest pool of online training data for technology companies to use when training their AI systems.

But, in Finland, the eco-friendly supercomputer ‘LUMI’ is helping a team of researchers to safeguard the Finnish language in the AI industry.

Their model is trained on a narrower dataset drawn from the Finnish national library, as well as the web crawls that power other LLMs, which means it’s also less susceptible to some of the hallucination problems affecting mainstream chatbots. On a recent trip to the country, James Tytko met up with the architect of the new project, the University of Turku’s Sampo Pyysalo…

Sampo - ChatGPT, the latest iteration, actually has remarkable capabilities to operate in languages other than English. You can talk to it also in Finnish. The thing that most motivates our work in building open language models for the Finnish language in particular isn't so much that the systems that the big multinational companies are building wouldn't have any capability for Finnish, but rather that those models are closed, they're not available for research, they're not available as a foundation for building independent applications. And of course they don't really have a focus on smaller languages such as Finnish. So by creating our own open models for smaller languages such as Finnish, we can ensure that the data represents the part of the Finnish language and culture that we find most relevant. And that will hopefully serve as the best basis for models that not only speak the Finnish language, but also to some degree share the perspective of Finland and its population.
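To give a flavour of what 'open' means in practice, here is a minimal sketch of loading a freely published Finnish language model and generating text with it locally, using the Hugging Face transformers library. The model identifier below is an assumption for illustration; any openly released Finnish checkpoint would be used the same way.

```python
# A minimal sketch: pull an openly released Finnish model and run it locally,
# with no API or US-hosted server involved. The model identifier is an
# assumption for illustration only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TurkuNLP/gpt3-finnish-small"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Ask for a short Finnish continuation, generated entirely on local hardware.
prompt = "Suomen kielen tulevaisuus tekoälyn aikakaudella on"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```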

James - That's a very interesting point because, to my shame, especially when I come to a country like this, I only speak one language and you come here and people seem to speak four at a minimum.

Sampo - A lot of central European, say Germanic or romance languages, even the smaller languages, are part of bigger language families where they have close neighbours. So there is some expectation that the models will be able to draw on those languages or texts in those languages in order to learn some of the smaller languages. So Finnish is certainly not alone, but it is, to some degree in a unique position within Europe in that it is in a very small language family and it happens to be quite distant from other members of that family. So we think it's quite important that we dedicate resources to having specifically Finnish texts in order to train these types of models.

James - The reason for doing what you're doing, it's got two prongs, really. It's to prevent a kind of cultural apocalypse where we move towards a single language world and destroy so much in the process, but also to serve as a foundation for artificial intelligence in Finnish institutions.

Sampo - Yes, absolutely. So we wish to maintain, to preserve, a degree of independence, not only for our language, but also for our academic and industry work, where we don't become reliant on assistants that are only running on servers in the US and only available via API. So it still very much remains to be seen to what degree these systems will actually form the foundation for a new type of industry or replace work in current ones, but the more they do, the more important I think it is for us to have national and European infrastructure that can compete, at least to a degree, with what the big multinationals are doing.

James - And all this is possible thanks to LUMI and the computational grunt and backing it gives you. Is it the case that some other countries, which perhaps don't have the reputation for computing that Finland does, might see their languages left behind in this?

Sampo - We certainly hope not. We're currently a member of a Horizon Europe funded project that seeks to develop similar models that would cover at least all official EU languages. So it is our goal to extend what we have now been able to do for Finnish to all European languages and hopefully also beyond.


14:48 - Training an LLM to become a medical assistant

Google are launching the second iteration of their Med PaLM project...

Training an LLM to become a medical assistant
Karan Singhal, Google

What can these chatbots really do for us? If the chatbots we’re free to use on the internet are no more than glorified tech demos, how can we actually deploy them in properly useful ways? Karan Singhal is a staff research engineer at Google. As he explains to Chris Smith, he’s working on Med PaLM 2, an AI he hopes will serve as an assistant for medical professionals in diagnostics and therapeutics…

Karan - We've been really looking at the space of medical AI and all the advances over the last few years and noticed a few things. The first was that there were large advances in accuracy on narrow tasks, things like the ability for models to make diagnoses in radiology. But the second thing was really noticing that we saw limited uptake of these technologies. And I think part of it was of lack of flexibility and interactivity. And so if you have a model that makes predictions of whether or not some chest x-ray is normal or abnormal, it might be less usable than a system that you can truly interact with, engage in dialogue with, give feedback to, and get an explanation from instead of just the classification model. And so when we started this work, it was really thinking about that problem and bridging that gap from all the advances in AI to the things that are actually useful in real world clinical practice.

Chris - Arguably, if I ask an AI to tell me how many presidents of America there have been, if it gets it wrong and makes a few up, it's much less of a consequence than if I ask, 'does this chest x-ray contain a lung cancer?'

Karan - I think for me personally, that is almost the entire motivation for working in this setting to start. Foundation models are tricky to apply because it's so safety critical. And so I really came into this thinking about the problem of building more steerable or safe AI systems. There's a lot of nuance to performing well in this setting across all the different axes that we care about: preventing harm and producing equitable outcomes and making sure you're aligned to scientific consensus.

Chris - Is that down to what you train it on? Because these AIs are a product of the information they ingest and see the connections between. Or is it also more nuanced than that in how you actually instruct it to work? Or is it both?

Karan - Definitely it's both. If you take this base model that's been pre-trained on web-scale data, the PaLM model for example - not specifically adapted for any of the medical settings or things like that - if you take that model and then apply it towards tasks like long form consumer medical question answering like we evaluated in the Med PaLM paper, it does not perform super well on axes like 'alignment with scientific consensus' because training data on the internet often has the potential for harm. And so if all we do is train on that data and not instruct these models on how to produce safer outputs, then you know we won't be in a good place. But when we take this extra step of providing explicit human feedback in various ways, that's a way we can guide these models. And so for the Med PaLM paper, what we did is we worked with a panel of physicians to craft expert demonstrations of good model behaviour across all these axes that we care about and then used that to instruct the Med PaLM model about how to behave using a technique called instruction prompt tuning that we introduced.
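As a rough sketch of the idea behind instruction prompt tuning: the base model is kept frozen, and only a small block of trainable 'soft prompt' vectors, prepended to the input embeddings, is tuned on expert-written demonstrations. The code below is a simplified illustration using a small stand-in model and placeholder data, not Google's implementation.

```python
# Simplified instruction-prompt-tuning sketch: freeze the base model and train
# only a small soft prompt on expert demonstrations. Stand-in model and
# placeholder data; not Google's implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("gpt2")    # stand-in for a large model
tok = AutoTokenizer.from_pretrained("gpt2")
for p in base.parameters():
    p.requires_grad = False                            # the base model stays frozen

n_prompt, dim = 20, base.config.n_embd
soft_prompt = torch.nn.Parameter(torch.randn(n_prompt, dim) * 0.02)  # only trained weights
opt = torch.optim.Adam([soft_prompt], lr=1e-3)

def loss_on(example: str) -> torch.Tensor:
    ids = tok(example, return_tensors="pt").input_ids
    tok_emb = base.transformer.wte(ids)                             # token embeddings
    inputs = torch.cat([soft_prompt.unsqueeze(0), tok_emb], dim=1)  # prepend soft prompt
    # Ignore the soft-prompt positions when computing the language-modelling loss.
    labels = torch.cat([torch.full((1, n_prompt), -100), ids], dim=1)
    return base(inputs_embeds=inputs, labels=labels).loss

# "Expert demonstrations": question + clinician-written answer pairs (placeholders).
demos = ["Q: Is chest pain always cardiac? A: No. Clinicians consider cardiac, "
         "respiratory, gastrointestinal and musculoskeletal causes..."]

for step in range(100):
    opt.zero_grad()
    loss = loss_on(demos[step % len(demos)])
    loss.backward()
    opt.step()
```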

Chris - When you did this, how good was it?

Karan - There were two big tasks that we put the model through. One was multiple choice question answering on medical exams and medical research questions. What we noticed is that these models were performing state of the art across all the data sets that we studied in this work. The second thing was really thinking about consumer medical question answering. So asking these models to produce a long form open generation response to a consumer medical question. Baseline models didn't really perform well on this task: physicians rated only 61.9% of their answers as aligned with scientific consensus. So then what we did was we applied that human feedback aspect to the training of the model and then what we saw was that 92.6% of Med PaLM answers were aligned with scientific consensus and this was compared to 92.9% for clinicians. So it was now more in the same ballpark compared to that big difference earlier with the baseline model.

Chris - In other words, if I pick a physician off the shelf and I ask them to answer the question that your platform is answering, it's going to give an answer rated by a third party of about the same as the physician's answer, give or take?

Karan - It depends how you do that measurement. After the Med PaLM work, we've expanded on that measurement where we actually ask people to do pairwise comparisons between the model output and the physician output. What we have observed is, at least with the physician populations that we're using, with the specific ways we're collecting the data - I wanna caveat with all that - Med PaLM 2 responses were preferred across eight of nine of the axes that we were studying in that medical question answering task. Another caveat here is really thinking that this evaluation is not grounded in a real world clinical setting and it's not done with the largest panel of physicians; not fully done with the most representative sample of questions that we might ask. And so there's still a lot of work to be able to take this early promising technology and bring it to the settings in which it can have the most impact.

Chris - When you ask people 'how does it work?', they'll say, 'well, it's not an explainable technology.' They don't mean it can't be explained in principle, just that they can't explain how it works, because nobody really knows. How do you instruct it to 'think' a certain way like that?

Karan - There are a couple of different notions of explainability that can be useful here. One is asking a model to produce an explanation of its own behaviour before it produces a final answer. One version of this is called chain-of-thought prompting. This is something that we explored in this Med PaLM work as well. And what you do is, if you're asking the model to provide a diagnosis given a clinical vignette, you're asking the model to work step by step towards an answer to the question. That could be viewed as a form of explainability, but at the end of the day these models are still relatively a black box, but there's also work going on around mechanistic interpretability of models to better understand the nuts and bolts of how models work. And that's also work that we're excited about.
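Chain-of-thought prompting itself is simple to picture: the prompt contains a worked example whose answer is reasoned out step by step, and the model is asked to follow the same pattern for a new case. Below is a hedged sketch; the clinical vignette and exemplar are invented for illustration and are not taken from Med PaLM.

```python
# A minimal chain-of-thought prompt: one worked example with step-by-step
# reasoning, followed by a new vignette for the model to reason through.
# The medical content is invented for illustration.
cot_prompt = """Question: A 68-year-old smoker presents with a persistent cough,
weight loss, and a 3 cm spiculated lung nodule on chest X-ray. What is the most
likely diagnosis?
Let's work through it step by step:
1. The patient has major risk factors: age and a long smoking history.
2. Constitutional symptoms (weight loss) plus a persistent cough suggest malignancy.
3. A spiculated nodule on imaging is a classic radiological sign of lung cancer.
Answer: Primary lung carcinoma is the most likely diagnosis.

Question: {new_vignette}
Let's work through it step by step:"""

print(cot_prompt.format(new_vignette="A 24-year-old presents with sudden pleuritic "
                        "chest pain and breathlessness after a long-haul flight..."))
```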

Chris - Say you succeed and you get something which the FDA approve. Were this to go into clinical practice, where would you see this sitting in the consulting chain that goes from patient through to some kind of medical outcome?

Karan - I think the first things we'd like to see are use cases around reducing the burden of clinical documentation on doctors. There's a lot of work recently taking transcripts of medical interactions, producing summaries of notes that can be useful to be sent to patients and useful for payers and things like that. I think right now many doctors report that they spend two hours a day, after dinner with their loved ones, writing clinical documentation to avoid liability or other issues. And I think that is a real cost and it's something where we can bring that time back. I think in the medium or longer term, there are higher stakes but also potentially impactful use cases that are worth exploring. And so things like clinical decision support, thinking about in the case of a radiologist for example, whether or not this model can double check or produce a more accurate report. I think there's a lot of use cases there that we're not quite ready for, but I think will be quite impactful over the next five, ten years.


22:15 - Creating new biology using AI

Protein design tools use the language of amino acids to train LLMs

Creating new biology using AI
Ali Madani, Profluent

Now, we come to a possibility that is dazzling in its own right. How’s this for a radical idea: training language models on the vocabulary of genetics to understand the structures of the proteins that make our cellular clockwork tick. Let me explain.

Proteins are polymers assembled from chemical building blocks called amino acids. Different amino acids with different chemical characteristics produce proteins with different shapes and functions, whether it’s enzymes to digest your dinner through to muscle fibres to enable you to lift weights or run a marathon.

But when it comes to designing proteins from scratch, for instance to make new drugs like an antibody, or proteins that can be used to make packing cases or even pesticides, working out which amino acids to include and in what order to get the structure and function we want has been an impossibly big problem.

But now step forward Ali Madani, the CEO of Profluent, who are bringing AI to bear on the problem. He outlined the vision to James Tytko.

Ali - The space of available proteins that we could sample is exponentially and mind bogglingly large. An average protein is made of things called amino acids that are strung together. These are building blocks, like Legos, that form a sequence. An average length protein will have on the order of 300 to 400 of these Lego building blocks. And for each one of these components, there are 20 different design options. And just to put that into perspective, if you were to take the total number of grains of sand on earth and the total number of humans that have ever lived on earth throughout human history and multiply that by the total number of hypothetical atoms that exist within the universe, that still pales in comparison to the combinatorial space that exists within possible proteins altogether. What we essentially have done for the variety of problems that we have in front of us, whether therapeutics or diagnostics or for industrial applications, is we've relied on finding needles in the haystack of nature, basically finding machines that have already evolved in nature to repurpose them, copy and paste them effectively for problems that we have in human health or otherwise. The promise of machine learning here is that we can actually take control as humans; be able to design from the bottom up novel proteins and not have to essentially rely on searching within this massive haystack, and be able to really build the solutions for the most pressing problems we face on the planet, whether it's human health problems or issues of sustainability and the environment.
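Ali's comparison is easy to sanity-check with a few lines of arithmetic. The sketch below uses rough, commonly quoted orders of magnitude for the comparison quantities (they are assumptions, not measurements); the point is only the scale of the gap.

```python
import math

# Possible sequences for a 300-residue protein, 20 choices per position.
sequence_space = 20 ** 300                 # roughly 10^390

# Rough, commonly quoted orders of magnitude (assumptions, not measurements).
grains_of_sand = 7.5e18                    # grains of sand on Earth
humans_ever = 1.2e11                       # humans who have ever lived
atoms_in_universe = 1e82                   # atoms in the observable universe

comparison = grains_of_sand * humans_ever * atoms_in_universe   # roughly 10^112

print(f"protein sequence space ~ 10^{math.log10(sequence_space):.0f}")
print(f"sand x humans x atoms  ~ 10^{math.log10(comparison):.0f}")
# The sequence space dwarfs the comparison figure by nearly 280 orders of magnitude.
```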

James - What is it that AI can offer to help achieve this? What does the technology you hope to develop have in common with the AI systems people are more familiar with: the chatbots? Is it that the language of proteins, of biology, really resembles our human languages in such a way that the technology we're more familiar with can be useful?

Ali - That's a wonderful question. What's really amazing here specifically is there is almost a unification with respect to a lot of these techniques that we've been developing from a sequence modelling perspective that can be applied to many different domains, whether it's applied to natural languages like English, to programming languages like Python and C++, or the language of biology, proteins and DNA as well. Some of the fundamental premises that have enabled this are, from a modelling perspective, advances in modelling architectures and attention mechanisms, and also the availability of data. And I'd really stress the latter portion: having a rich information source that we can use these flexible machine learning models to learn from and uncover patterns that exist within the data and really learn underlying principles, whether it's in natural language, where that correlates to grammatical structure and semantics, or within biology or proteins, where that correlates to biophysical principles such as structural elements or binding sites or other types of principles from a physics perspective. And what's really powerful here is that they can ingest large amounts of data to, in a data-driven way, uncover those principles.
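To make the 'protein as language' idea concrete, here is a minimal sketch in which each of the 20 standard amino acids is a token and a toy model predicts the next residue, exactly as a language model predicts the next word. The architecture and sequence below are placeholders, not Profluent's system.

```python
# A toy "protein language model": amino acids are tokens, and the model is
# trained (in principle) to predict the next residue, just like a text LM.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"                  # the 20 standard residues
vocab = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode(seq: str) -> torch.Tensor:
    """Turn a protein sequence into a tensor of amino-acid token ids."""
    return torch.tensor([[vocab[aa] for aa in seq]])  # shape (1, length)

class TinyProteinLM(nn.Module):
    """A toy next-residue model: the same autoregressive setup as a text LM."""
    def __init__(self, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(len(AMINO_ACIDS), d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, len(AMINO_ACIDS))

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        n = ids.size(1)
        # Causal mask: each position may only attend to earlier residues.
        mask = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        x = self.embed(ids)
        return self.head(self.encoder(x, mask=mask))  # logits over the 20 residues

model = TinyProteinLM()
ids = encode("MKTAYIAKQR")                            # a short, made-up sequence
print(model(ids).shape)                               # torch.Size([1, 10, 20])
```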

James - Throughout the programme, we've been hearing how the source of many prominent AI models' power is also their greatest weakness. That while the huge amounts of data they're trained on allow them to come out with, for example, human-like utterances, it also means that dangerous biases and misinformation you find all over the internet slip through their net. So when you are creating a protein designing tool, how do you make sure you're filtering the training data so nothing dodgy makes it into your output?

Ali - I think curation and alignment is a central problem that many of us are facing across different data domains and it applies to protein design as well. We've had millions of years of evolution and so many different pockets of protein space that have evolved over time for varying different functions, some of which may be completely unrelated to a problem that you have in mind, for example. And there could be lots of noise in that data as well. In essence, thinking about it from the perspective of what we aggregate - all of the available biological data that the world's researchers have collected on proteins that exist within the world - there's tons of noise within that as well. Being able to curate this effectively, to align the dataset for a given functional prediction or generation task one has in mind, that's a challenging problem and something that we think very deeply about at Profluent and within the academic community as well.

James - As exciting as this all is, it sounds very, very complicated. What are the major bottlenecks as you see it?

Ali - Going along the same lines of alignment of datasets, specifically for given functional prediction tasks: we may also have not just sequence information, but also structural information, and then information that's gathered from wet lab experiments specifically. How to incorporate and basically unify these different modes of data is going to be one challenge that comes to mind. Another challenge really comes down again to the wet lab. So similar to what we've seen in natural language processing, where we're using human feedback in particular, how to have a tight coupling between the modelling effort that we do on the publicly available data sources and the work that we do specifically in the wet lab for a given problem that we're trying to solve, and utilising that wet lab data in an effective manner. That's going to be another challenge and limitation of the techniques.

James - So what are the first proteins we might expect that will be worked on using this method?

Ali - One is within the area of antibodies. Antibodies are proteins that are very effective binders. So for example, with respect to Covid, we have antibodies that bind and neutralise the virus whenever it gets introduced into our body. Being able to effectively generate antibodies and design them for multiple properties that come to mind, not just binding affinity, but also looking into how the immune response will be within our human bodies. Having multiple of these parameters that come to mind, being able to generate sequences that could work well in one or two rounds of design processes. I think that's going to be one big area that's going to be, and already is being, revolutionised.
