Note about generative AI and LLMs

Since the publication of the textbook, large language models (LLMs) and LLM-based applications such as ChatGPT have become extremely popular. These models are enormous deep neural networks trained on very large corpora of text data. Analogous models, trained on image data, can generate images and videos from user instructions, and some are further fine-tuned using reinforcement learning from human feedback. These models extend several machine learning concepts (deep learning, reinforcement learning, text mining) that are explained at a high level in the textbook. However, they fall into a different class of machine learning methods called "generative AI," in which new data (such as text and images) are created; this distinguishes them from supervised and unsupervised learning. All these approaches have important uses and continue to provide valuable business analytics solutions, but generative AI is limited to situations where huge amounts of relevant, accessible data are available for training.
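The core idea of generative modeling, learning patterns from a corpus and then sampling *new* sequences, can be illustrated at toy scale. The sketch below is a character-level bigram model, not anything resembling a real LLM (which uses deep transformer networks trained on vast corpora); it only shows the train-then-generate principle. The corpus string and function names here are illustrative inventions.

```python
import random
from collections import defaultdict

def train_bigram(text):
    """Count, for each character, the characters that follow it in the corpus."""
    followers = defaultdict(list)
    for a, b in zip(text, text[1:]):
        followers[a].append(b)
    return followers

def generate(model, start, length, seed=0):
    """Sample a new sequence by repeatedly drawing a plausible next character."""
    rng = random.Random(seed)  # seeded for reproducibility
    out = [start]
    for _ in range(length - 1):
        options = model.get(out[-1])
        if not options:           # dead end: fall back to the start character
            options = [start]
        out.append(rng.choice(options))
    return "".join(out)

corpus = "machine learning models learn patterns from data"
model = train_bigram(corpus)
print(generate(model, "m", 20, seed=42))
```

Even this tiny model produces text that is statistically similar to, but not copied from, its training data, which is the same property (at vastly greater scale and fidelity) that makes LLM output hard to distinguish from human writing.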

Generative AI has also been a source of controversy and ethical concern. One issue is that training data often include copyrighted material not owned by the model creator, a practice that, as of 2023, is being tested in the courts. Another is the ability of generative AI to produce text passages that can pass for human writing (a particular worry for schools). Some generative AI models, especially those that create images, video, and sound, perform so well that they can produce fictitious yet realistic videos of individuals, causing reputational damage when circulated.