"In ancient times cats were worshipped as gods, they have not forgotten this" -Terry Pratchett
If the above quote seems a bit off, it is, so let me explain. I am writing a blog post in the wake of the release of ChatGPT, and I thought I would start it with a quote about chat.
Instead I got this, along with lots of images of adorable kittens. It shows why language is so hard for machines. Do you, as a machine, respond to the 40% of people who correctly typed "chat" and really want a quote about chatting, talking and chin-wagging, or do you assume they misspelt "cat" and want cute kittens?
So I present to you why ChatGPT is a great idea, but also why it might be fundamentally flawed: not everyone misspells "cat", but in trying to create an AI that caters to all people all the time, we end up catering for the average person by answering the average prompt with the most average response. There would therefore seem to be a gap in the market for building your own chatbots.
This is a blog post on what resources are out there, whether I could figure out how to build one in an evening, and where that rabbit hole on the internet leads...
The very simple answer is yes and much more quickly than you'd think...
I started out asking myself: can I make my own ChatGPT? In fact, you're spoiled for choice.
My favourites were GPT-J and OpenChatKit, but I need more time to test them. See below for where you can find the different models; I have not had time to test anything to any great degree, so I just present this as a list and description of different transformer projects. Having got lost in the blog-verse, I have also included directions to any papers or research as they came up.
After the models themselves, I have included directions to where you can get data and benchmarks, in case you want to see how they were trained.
Everyone's work is their own, and the links below are there to direct people to interesting projects they may wish to look into. A variety of licence types are in play, commercial and research-only, open source and otherwise, so please check that it is acceptable to use a model in the way you intend, especially for commercial use.
List of resources:
So here is a list of various chatbot projects that are not ChatGPT, with links to where you can find them. I think you'll agree the title of the article is reasonable: ChatGPT is not singular. It is arguably the most popular, but it is early days in the field, and several of these AIs boast similar capabilities.
OpenChatKit
Used to create specialised and general purpose chat bots.
Code Repo: GitHub - togethercomputer/OpenChatKit
You can chat to it here: OpenChatKit - The first open-source ChatGPT
Code to run it looks like this.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Fetches Google's Flan-UL2 (a roughly 20B-parameter instruction-tuned model) from the Hugging Face Hub.
# load_in_8bit=True needs the bitsandbytes package installed; device_map="auto" needs accelerate.
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", load_in_8bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

# Tokenise the prompt, generate a continuation, and decode it back into text.
inputs = tokenizer("A step by step recipe to make bolognese pasta:", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
Output: ['In a large skillet, brown the ground beef and onion over medium heat. Add the garlic']
The above code, as you may have noticed, uses a model name (effectively a web address) to fetch the tokenizer and model from the Hugging Face Hub. If you read the licence and grab the GitHub repo, you can download the model directly, so you do not need to re-download it each time.
Licence: OpenChatKit is licensed under the Apache License 2.0, which allows you to freely use, modify, and distribute the software. You can also inspect the weights of the model using the Hugging Face Transformers library or the Jupyter notebooks provided in the GitHub repository.
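As an aside, here is a minimal sketch of saving the model to a directory you control so it is not fetched from the Hub on every run, and of peeking at its weights. This is my own illustration rather than anything from the OpenChatKit repo, the local path is just a placeholder, and you will need enough memory and disk for a roughly 20-billion-parameter model.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

# Save both to local disk once ("./flan-ul2-local" is a placeholder directory)...
model.save_pretrained("./flan-ul2-local")
tokenizer.save_pretrained("./flan-ul2-local")

# ...then later runs can load straight from that directory instead of the Hub.
model = AutoModelForSeq2SeqLM.from_pretrained("./flan-ul2-local", device_map="auto")

# Inspecting the weights: total parameter count and the shapes of a few tensors.
print(model.num_parameters())
for name, param in list(model.named_parameters())[:5]:
    print(name, tuple(param.shape))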
Lit-LLaMA
An independent implementation of LLaMA that builds on nanoGPT, from Lightning AI.
NanoGPT
Code and repo:GitHub - karpathy/nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs.
I also found a second one here... both are by the same guy, Andrej Karpathy, formerly head of AI at Tesla.
It comes with the transformer code in the GitHub repo and some boilerplate code to get it running in PyTorch. And if you thought nothing would ever compete with ChatGPT, consider that this model compares well with GPT-2 and is free, which, compared to GPT-2's estimated training cost of around $50k in 2020, shows you how quickly the market moves.
Licence: standard MIT licence.
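To give a flavour of that boilerplate, here is a rough sketch of sampling from the pretrained GPT-2 weights using nanoGPT's own model class. The names come from the repo's model.py and sample.py as I found them, so treat this as an approximation and check the repo before copying it.

import torch
import tiktoken
from model import GPT  # nanoGPT's model.py; run this from inside the cloned repo

model = GPT.from_pretrained("gpt2")  # pulls the GPT-2 weights into nanoGPT's architecture
model.eval()

enc = tiktoken.get_encoding("gpt2")
idx = torch.tensor([enc.encode("Hello, my name is")], dtype=torch.long)

# generate() appends up to max_new_tokens sampled tokens to the prompt
out = model.generate(idx, max_new_tokens=20, temperature=0.8, top_k=200)
print(enc.decode(out[0].tolist()))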
Cerebras-GPT:
A family of seven GPT-3-style models ranging from 111 million to 13 billion parameters. But there is a catch: the models were designed around and trained on Cerebras's own chipsets rather than conventional GPUs. I think this is an interesting family of models for businesses to look into.
Website here: Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models - Cerebras
Github here: GitHub - Cerebras/modelzoo
Dolly (Databricks)
An LLM based on GPT-J and fine-tuned on the Stanford Alpaca dataset. It is not licensed for commercial use even though it is built on open-source components like GPT-J; the GitHub repo clearly states this. It has been built by a commercial entity (Databricks), and it can be assumed it is their in-house large language model. It has a size of 6 billion parameters.
GPT-J
Dolly is built off the back of GPT-J, and a lot of these models do the same. GPT-J uses Ben Wang's Mesh Transformer JAX. I like GPT-J; it feels very open source while also having all the nice-to-haves, like an interface set up for you to chat with an instance online and try it out.
The same people, EleutherAI, also produce the Pile (more on that below).
Furthermore, it has a logo that's very jazzy. Look at the website and see what I mean...
GPT-J website: GPT-J-6B: 6B JAX-Based Transformer – Aran Komatsuzaki (wordpress.com)
An interface to use it: EleutherAI - text generation testing UI
Notebook to run it in: Google Colab
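If you would rather run GPT-J yourself than use the hosted interface, a minimal sketch with the Hugging Face transformers library looks something like this; it assumes you have the accelerate package installed and enough memory for the 6B half-precision weights (roughly 12-16 GB).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
# Half precision keeps the memory footprint manageable; device_map="auto" needs accelerate.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Open-source language models are useful because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))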
Pythia
A family of 16 language models from 70M-12B parameters from EleutherAI.
The creators of Pythia state that currently, there is no collection of models that is accessible to the general public, follows a well-established training process, and maintains uniformity between scales.
Git hub here: GitHub - EleutherAI/pythia
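The smallest member of the family is tiny enough to try on a laptop. A quick sketch, assuming the checkpoint naming on the Hugging Face Hub has not changed:

from transformers import AutoModelForCausalLM, AutoTokenizer

# pythia-70m is the smallest checkpoint in the family
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))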
Other brands are available:
These ones are also out there, but I failed to get around to reading about them in one evening.
Open Assistant
A chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically. The demo uses a fine-tuned 30-billion-parameter LLaMA.
GeoV
A 9-billion-parameter pre-trained LLM using rotary position embeddings with relative distances.
Baize
Open-source chat model trained with LoRA using 100k dialogs generated by letting ChatGPT chat with itself.
Vicuna
An open-source chatbot that reportedly achieves almost the same performance as Google Bard and ChatGPT.
Koala
A chatbot trained by fine-tuning Meta's LLaMa on dialogue data gathered from the web.
GPT4All
An assistant-style LLM based on LLaMA, trained on roughly 800k prompt-response pairs.
Dalai
The fastest way to run LLaMa and Alpaca locally; includes a user interface.
Alpaca.cpp
Intended to let you build a fast, ChatGPT-like app on your own computer.
How do they build new models?
Over the course of this research, I came across various terms for where the data is pulled from when training a new large language model. Like all AI, a transformer takes an input, produces an output, and is trained by feeding back the error between the actual output and the intended output. To massively simplify, a transformer takes in, say, 600 words and then tries to predict the words that come next.
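To make that concrete, here is a toy illustration of "predict what comes next" using the small GPT-2 model through the transformers library; this is my own example rather than anything from the projects above.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("In ancient times cats were worshipped as", return_tensors="pt")
# Greedy decoding: repeatedly pick the model's most likely next token.
outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))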
All of this relies on large amounts of data being used in training, which means you need data. You should also use data that has been opened up and licensed for your use case, so you are not abusing copyright. Here are two options below.
Stanford Alpaca
The Stanford Alpaca dataset is built around a large amount of instruction-and-response data.
Website here: Stanford CRFM
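The dataset is roughly 52,000 instruction-following examples, each with an instruction, an optional input, and an output. A quick way to have a look, assuming the tatsu-lab mirror on the Hugging Face Hub is still where it lives:

from datasets import load_dataset

# Each Alpaca record is a dictionary with "instruction", "input" and "output" fields.
alpaca = load_dataset("tatsu-lab/alpaca", split="train")
print(alpaca[0]["instruction"])
print(alpaca[0]["output"])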
The Pile:
Both a big dataset and a benchmark, with a leaderboard showing scores on the data. I wonder if anyone will try to knock the GPTs off that leaderboard.
Website here: The Pile (eleuther.ai)
Benchmarks
Of course, if you built a large language model you would want to show it off and tell people how good it is. Well, there is a whole slew of different benchmarks designed to test and measure your new AI against. There were too many of these to get through in one evening...
If you get to the point of having a finished model, maybe you could look them up! There is a rough sketch of how they are typically run after the list below.
- Lambada (lambada_openai)
- Wikitext (wikitext)
- PiQA (piqa)
- SciQ (sciq)
- WSC (wsc)
- Winogrande (winogrande)
- ARC-challenge (arc_challenge)
- ARC-easy (arc_easy)
- LogiQA (logiqa)
- BLiMP (blimp_*)
- MMLU (hendrycksTest*)
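The names in brackets are task names from EleutherAI's lm-evaluation-harness, which is what most of the projects above use to report these numbers. A hedged sketch of its Python entry point; the exact model and argument names vary between harness versions, so check the repo before relying on this:

from lm_eval import evaluator

# Evaluate a small Hugging Face model on a couple of the tasks listed above.
# "hf-causal" and the model_args string follow the older 0.3-era interface.
results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=EleutherAI/pythia-70m",
    tasks=["lambada_openai", "piqa"],
)
print(results["results"])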
Uses For Transformers that are not ChatGPT
Everyone talks as if it is obvious that the use for AI and transformers is to talk to us. Below are AIs built for non-talking tasks.
LogAI
LogAI is a tool for telemetry and checking device logs. It ingests and analyses log data from computers and spits out a variety of graphs and visualisations based on the results. The idea is that the AI has multiple layers and does all the cleaning and analysing before passing to its visualisation layer, in order to make sense of what is happening.
It follows the OpenTelemetry model and seems to sell itself as an open-source tool.
Github repo here: https://github.com/salesforce/logai?ref=blog.salesforceairesearch.com
I could not find a direct website but a better description is available here: LogAI: A Library for Log Analytics and Intelligence (salesforceairesearch.com)
BLOOM & mT0
A family of models capable of following human instructions in dozens of languages zero-shot, intended as a translation tool for changing from one language to another. The GitHub repo includes a lot of information on how to implement it as well; the Hugging Face page is also good, and it all feels really well documented.
Github: GitHub - bigscience-workshop/xmtf: Crosslingual Generalization through Multitask Finetuning
The Hugging Face page is very specific about hardware and all the tools, which is just really useful. See the website here: bigscience/bloomz-mt · Hugging Face
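The full bloomz-mt checkpoint is enormous, but the family has much smaller siblings that behave the same way. A sketch using bloomz-560m, with the zero-shot translation prompt style shown on the model card:

from transformers import AutoModelForCausalLM, AutoTokenizer

# bloomz-560m is one of the smallest checkpoints in the BLOOMZ family.
checkpoint = "bigscience/bloomz-560m"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("Translate to English: Je t'aime.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))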
Blogs
I came across a variety of blogs and other interesting people.
Yi Tay:
Produced Flan-UL2 for Google (the model used in the code snippet above). I like this blog post: On Emergent Abilities, Scaling Architectures and Large Language Models — Yi Tay. I would sum it up as the argument that, with scale, emergent capabilities must appear. His blog is a bit limited...
Blog here: Yi Tay.
Andrej Karpathy:
Seems like an interesting guy: head of AI at Tesla until 2022 (I assume that means he knows his stuff), where he built the Tesla computer vision project. His blog is interesting and has a lot of different posts.
Blog here: Andrej Karpathy blog
YouTube here: Andrej Karpathy - YouTube
Cerebras (the same company mentioned above)
They have a blog; it looks good and has a fair amount of content too. It feels like it was written for people who don't have a PhD, but maybe they also want to sell you something.
Blog here: Cerebras Blog Landing Page - Cerebras
Stanford CRFM Blog
A good-looking blog with many posts I found interesting and more I still want to read.
Blog here: Stanford CRFM
BLOOM: a blog post on the technology behind training BLOOM.
Blog Here: The Technology Behind BLOOM Training (huggingface.co)
AI communities
Of course, having read lots of blogs, I came across different communities interested in AI. Here is a small selection. Hint: Hugging Face is the big one for transformers.
Reddit
You know there has to be a Reddit community for AI, and it does not disappoint: an almost constant stream of new AI goodies and links to fresh research papers to read. Enjoy!
Website: Machine Learning News (reddit.com)
Hugging Face
Let me be honest: the first place you will look for transformers is Hugging Face. Hugging Face is a website that lists a lot of different models and gives each a description. Honestly, there is more information there than here, and if you're just interested in transformers and code snippets to run them, the website is below. Thanks for the click.
The problem I find with Hugging Face is that the site does seem to be built around providing you with an API to access LLMs, whereas I am more interested in finding ways to get chatbots onto servers which I own and control, so I can test them for my own use cases.
Website: 🤗 Transformers (huggingface.co)
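If, like me, you want the weights sitting on a server you own, the hub client can pull a full snapshot of a model repository to local disk, after which from_pretrained can point at that path. The target directory is just a placeholder, and depending on your huggingface_hub version the argument may be cache_dir rather than local_dir:

from huggingface_hub import snapshot_download

# Downloads every file in the repo (weights, tokenizer, config) to a local folder.
local_path = snapshot_download(
    repo_id="EleutherAI/pythia-70m",
    local_dir="/srv/models/pythia-70m",  # placeholder path on your own server
)
print(local_path)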
Further Reading
Along with blogs, I kept coming across various white papers and research journals that I thought I had better squirrel away for later. Below are some examples.
If you are wondering why I keep writing LLaMA, and are unsure why that means an AI rather than a domesticated mammal, see the white paper on it here:
[2302.13971] LLaMA: Open and Efficient Foundation Language Models (arxiv.org)
Where we get the idea that language models could be a route to AGI.
[2205.05131] UL2: Unifying Language Learning Paradigms (arxiv.org)
Chain of thought prompting elicits reasoning in large language models.
[2201.11903] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (arxiv.org)
Training language models to follow instructions with human feedback.
[2203.02155] Training language models to follow instructions with human feedback (arxiv.org)
Extra computation steps to improve LLMs.
[2210.11399] Transcending Scaling Laws with 0.1% Extra Compute (arxiv.org)
Model scalability with Pathways.
[2204.02311] PaLM: Scaling Language Modeling with Pathways (arxiv.org)
Paper for using the Pile dataset
[2101.00027] The Pile: An 800GB Dataset of Diverse Text for Language Modeling (arxiv.org)
Specific Citations:
They asked nicely! I realise these are intended to be added to code distributions, but I have included them here out of respect for the authors.
The Pile
@article{pile,
  title   = {The {P}ile: An 800GB Dataset of Diverse Text for Language Modeling},
  author  = {Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and Presser, Shawn and Leahy, Connor},
  journal = {arXiv preprint arXiv:2101.00027},
  year    = {2020}
}
Bloom
@misc{muennighoff2022crosslingual,
  title         = {Crosslingual Generalization through Multitask Finetuning},
  author        = {Niklas Muennighoff and Thomas Wang and Lintang Sutawika and Adam Roberts and Stella Biderman and Teven Le Scao and M Saiful Bari and Sheng Shen and Zheng-Xin Yong and Hailey Schoelkopf and Xiangru Tang and Dragomir Radev and Alham Fikri Aji and Khalid Almubarak and Samuel Albanie and Zaid Alyafeai and Albert Webson and Edward Raff and Colin Raffel},
  year          = {2022},
  eprint        = {2211.01786},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}