The current risks of implementing generative AI

OpenAI has shown that its generative offspring, ChatGPT, can pass exams with flying colours – enough not only to see it through high school but also to graduate from college or university. But how will it fare in a paid job? Most people who have played around with generative AI are amazed by its potential. But it is not without its faults. For the technology to truly make its mark commercially, there are some risks that need to be considered:

  • Bias

    AI models reflect the data used to train them. That means they are prone to exhibiting the same biases that exist in society. Bias may, in fact, be a polite way to put it. More bluntly, AI can be capable of racism, misogyny, or any of the bigotry that humans can exhibit. To prevent our flaws from carrying over to our robot spawn, we will need to carefully screen AI training data and models, not only for overt bias like hate speech but also for subtler systemic biases (a simple sketch of what such screening could look like follows this list).

  • Hallucinations

    Generative AI, and large language models (LLMs) in particular, can occasionally generate seemingly convincing responses that are not only false but also unfounded in any of their underlying training data. This tendency to occasionally spout a load of fabricated BS could be seen as an entertaining quirk (maybe even one that makes LLMs more human-like). But when it comes to developing production systems that can be trusted, it is something that needs to be addressed (a crude grounding check along these lines is sketched after this list).

  • Privacy / plagiarism

    Large language models are trained, as their name suggests, on very large volumes of language data (textual content). The easiest way to get access to that kind of data is by scraping the web[1] (60% of GPT-3’s training data came from Common Crawl – a non-profit organisation that scrapes publicly available web pages, stores the data on AWS, and makes it available for R&D purposes). While this data is publicly available, it is not necessarily free of copyright. The potential for LLMs to plagiarise or infringe copyright is therefore a real one[2].

    One way to address that challenge may be to ensure the accurate use of citations and/or greater transparency in the way models are constructed (a basic check for verbatim overlap with known sources is sketched after this list). On the transparency front, there may be some cause for concern. OpenAI has been less open about the training data and methodology used for GPT-4 than it was for GPT-3[3]. Uncertainty also hangs over the training data used by Google’s LLMs[4]. Commercial implementations of LLMs will need to take these risks into account.

  • Legal liability

    AI can serve as a very powerful resource. But it is by no means infallible. The issue of legal liability (what happens when things go wrong) will need to be worked out as commercial implementations of AI become more common. As we place greater responsibility in the hands of autonomous computer programs, the size of the potential liability grows.

    What happens if an AI loan approval engine is found to be racist? Or an AI algorithmic trading engine finds a way to access and act upon insider information? Within the financial services industry, these are potential risks that need to be thought through from both a technical and a legal perspective.

    In other fields, the risks can be a matter of life or death. What happens if a self-driving car swerves into a crowd of innocent bystanders? Or an AI doctor provides advice that is life-threateningly bad? Or, in the Terminator scenario, what happens when an AI military supercomputer decides to wage war and destroy humanity? (Well, at least the liability will be mitigated by the limited number of surviving plaintiffs!) Whatever the use case or scenario, it is vital that legal liability is well understood and assigned across all parties involved – from the creator of the foundation model(s) and the party that fine-tuned and implemented those models, through to the model user and any third parties impacted by its use.
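To make the bias point a little more concrete, below is a minimal, hypothetical Python sketch of what screening training data could involve: a crude pattern-based filter for overt hate speech, plus a simple count of demographic terms to hint at representational skew. The term lists, function names and corpus here are illustrative assumptions only; a real pipeline would rely on curated lexicons, trained toxicity classifiers and proper statistical bias audits.

```python
import re
from collections import Counter

# Illustrative, hypothetical term lists only. A real screening pipeline would
# use curated lexicons and trained toxicity/bias classifiers, not a handful
# of hand-written regular expressions.
OVERT_BIAS_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"\b<known slur>\b",             # placeholder for actual hate-speech terms
    r"\bgo back to\b.*\bcountry\b",
)]

DEMOGRAPHIC_TERMS = ("women", "men", "immigrants", "elderly", "disabled")


def screen_document(text: str) -> list[str]:
    """Return the overt-bias patterns that a document matches, if any."""
    return [p.pattern for p in OVERT_BIAS_PATTERNS if p.search(text)]


def representation_counts(corpus: list[str]) -> Counter:
    """Crudely audit how often demographic terms appear across the corpus.

    Large skews can hint at under- or over-representation, one source of the
    subtler, systemic bias a trained model may pick up.
    """
    counts = Counter()
    for doc in corpus:
        lowered = doc.lower()
        for term in DEMOGRAPHIC_TERMS:
            counts[term] += lowered.count(term)
    return counts


corpus = [
    "Loan applicants are assessed on income and credit history.",
    "Immigrants should just go back to their own country.",  # would be flagged
]
clean_corpus = [doc for doc in corpus if not screen_document(doc)]
print(f"kept {len(clean_corpus)} of {len(corpus)} documents")
print(representation_counts(corpus))
```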
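For the hallucination risk, one common mitigation pattern is to ground responses in retrieved source material and flag any claims the sources do not support. The sketch below is a deliberately crude, purely lexical version of that idea, with hypothetical example data; production systems would typically pair retrieval-augmented generation with entailment or fact-checking models.

```python
import re

# Small illustrative stop-word list; real systems would use a proper one.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are",
              "was", "were", "that", "it", "its", "by"}


def content_words(text: str) -> set[str]:
    """Lower-case content words (letters/digits), with stop words removed."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower())
            if w not in STOP_WORDS}


def ungrounded_sentences(answer: str, sources: list[str],
                         threshold: float = 0.5) -> list[str]:
    """Flag answer sentences whose content words barely appear in the sources.

    A deliberately crude lexical check: each sentence must share at least
    `threshold` of its content words with the retrieved source passages,
    otherwise it is treated as potentially hallucinated.
    """
    source_vocab: set[str] = set()
    for passage in sources:
        source_vocab |= content_words(passage)

    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(sentence)
        if not words:
            continue
        if len(words & source_vocab) / len(words) < threshold:
            flagged.append(sentence)
    return flagged


# Hypothetical retrieved passage and model answer, for illustration only.
sources = ["ACME Bank was founded in 1985 and is headquartered in London."]
answer = ("ACME Bank was founded in 1985. "
          "Its CEO won a Nobel Prize in economics in 2003.")
print(ungrounded_sentences(answer, sources))
# -> flags the second, unsupported sentence
```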
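On the plagiarism/copyright point, a basic safeguard is to compare generated text against known source documents and flag long verbatim overlaps, which can then trigger a citation, a rewrite or a block. The sketch below uses simple word n-grams and made-up example text; at scale this would normally be done with text normalisation and fingerprinting or near-duplicate detection.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """All n-word sequences in the text (lower-cased, whitespace-tokenised)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlaps(generated: str, sources: dict[str, str],
                      n: int = 8) -> dict[str, int]:
    """Count n-word sequences the generated text shares verbatim with each source.

    A non-zero count against a copyrighted source is a prompt to add a
    citation, rewrite the passage or block the output.
    """
    generated_ngrams = ngrams(generated, n)
    return {name: len(generated_ngrams & ngrams(text, n))
            for name, text in sources.items()}


# Made-up source document and model output, for illustration only.
sources = {
    "news_article_123": ("the central bank raised interest rates by half a point "
                         "in a move that surprised markets around the world"),
}
generated = ("Analysts noted that the central bank raised interest rates by half "
             "a point in a move that surprised markets, fuelling volatility.")
print(verbatim_overlaps(generated, sources))
# -> a non-zero count signals verbatim reuse of news_article_123
```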

[1] https://arxiv.org/pdf/2005.14165.pdf

[2] https://www.npr.org/2023/08/16/1194202562/new-york-times-considers-legal-action-against-openai-as-copyright-tensions-swirl#

[3] https://gizmodo.com/chatbot-gpt4-open-ai-ai-bing-microsoft-1850229989

[4] https://skiff.com/blog/was-bard-trained-on-gmail-data