AIs That Are Not ChatGPT

Published on 10 April 2023 at 21:41

"In ancient times cats were worshipped as gods, they have not forgotten this" -Terry Pratchett

 

If the above quote seems off, that's because it is. Let me explain: I am writing this blog post in the wake of the release of ChatGPT, and I thought I would start it with a quote about chat.

Instead I got this, along with lots of images of adorable kittens. This shows why language is so hard for machines. Do you, as a machine, respond to the 40% of people who correctly typed "chat" and really want a quote about chatting, talking, and chin-wagging, or do you assume they just misspelt "cat" and want cute kittens?

Therefore I present to you why ChatGPT is a great idea but might be fundamentally flawed: not everyone who types "chat" has misspelt "cat", but in trying to create an AI that caters to all people all the time, we end up catering to the average person by answering the average prompt with the most average response. There would therefore seem to be a gap in the market for building your own chatbots.

 

This is a blog post on where the resources might be out there to start figuring out whether I could build one in an evening, and where that rabbit hole on the internet leads...

 

The very simple answer is yes, and much more quickly than you'd think...

 

I started out asking myself, can I make a ChatGPT? In fact, you're spoilt for choice.

My favourites were GPT-J and OpenChatKit, but I need more time to test them. See below for where you can find the different models; I have not had time to test anything to any great degree, so I present this as a list and description of different transformer projects. Having got lost in the blog-verse, I have also included directions to any papers or research that came up.

After going through the various models I found, I also have pointers to where you can get data and benchmarks, if you want to see how the models were trained.

All of this work belongs to its respective authors; the links below are there to direct people to interesting projects they may wish to look into. A variety of licence types are involved, both commercial and research-only, open source and otherwise, so please check that it is acceptable to use a model in the way you intend, especially if you are a commercial user.

 

List of resources:

 

So here is a list of various chatbot projects that are not ChatGPT, with links to where you can find them. I think you'll agree the title of the article is reasonable: ChatGPT is not singular. It is arguably the most popular, but it is early days in the field, and several of these AIs boast similar capabilities.

 

 

OpenChatKit

Used to create specialised and general-purpose chatbots.

Code Repo: GitHub - togethercomputer/OpenChatKit 

You can chat to it here: OpenChatKit - The first open-source ChatGPT

Code to run a model with the Transformers library looks like this (this example loads Google's Flan-UL2):

 

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Fetch the Flan-UL2 weights and tokenizer from the Hugging Face Hub.
# load_in_8bit=True quantises the model so it fits in less GPU memory (needs bitsandbytes);
# device_map="auto" spreads the layers across whatever GPUs/CPU you have (needs accelerate).
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", load_in_8bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

# Turn the prompt into token IDs, generate a continuation, and decode it back to text.
inputs = tokenizer("A step by step recipe to make bolognese pasta:", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

 

Output: ['In a large skillet, brown the ground beef and onion over medium heat. Add the garlic']

 

The above code, as you may have noticed, uses a model name to fetch the tokenizer and model over the web. If you read the licence and grab the GitHub repository, you can download the model directly so you do not need to re-download it each time.
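If you want to avoid hitting the network, a minimal sketch of keeping a local copy (not taken from the OpenChatKit repo; the directory path is just an example) looks like this:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

local_dir = "./models/flan-ul2"  # example local path, pick anywhere you like

# First run: download from the Hugging Face Hub and keep a copy on disk.
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")
model.save_pretrained(local_dir)
tokenizer.save_pretrained(local_dir)

# Later runs: load straight from the local copy, no re-download needed.
model = AutoModelForSeq2SeqLM.from_pretrained(local_dir, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(local_dir)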

Licence: OpenChatKit is licensed under the Apache License 2.0, which allows you to freely use, modify, and distribute the software. You can also inspect the weights of the model using the Hugging Face Transformers library or the Jupyter notebooks provided in the GitHub repository.

 

Lit-LLaMA

Independent implementation of LLaMA that builds on nanoGPT, from Lightning AI.

Code and Git Repo: GitHub - Lightning-AI/lit-llama: Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed. 

 

NanoGPT

Code and repo: GitHub - karpathy/nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs.

but I can also find a mirror of it here... the original is by former Tesla AI director Andrej Karpathy.

GitHub - gmh5225/GPT-nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs. 

The repository comes with the transformer implementation and some boilerplate code to get it running in PyTorch. And if you thought nobody else could compete with ChatGPT, this model compares well with GPT-2 and is free, which, compared with the estimated $50k it cost to train GPT-2 back in 2020, shows you how quickly the field moves.
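To get a feel for the class of model nanoGPT reproduces, here is a minimal sketch (not taken from the nanoGPT repo) that samples from the original GPT-2 weights via the Hugging Face pipeline:

from transformers import pipeline

# Load the 124M-parameter GPT-2 checkpoint and generate a short continuation.
generator = pipeline("text-generation", model="gpt2")
result = generator("The future of open-source language models is", max_new_tokens=40)
print(result[0]["generated_text"])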

Licence: standard MIT licence.

 

Cerebras-GPT:

 

A family of seven GPT-3-style models ranging from 111 million to 13 billion parameters. But there is a catch: the models were trained on Cerebras's own wafer-scale hardware rather than conventional GPUs. I think this is an interesting family of models for business use.

Website here: Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models - Cerebras

Github here: GitHub - Cerebras/modelzoo

 

Dolly (Databricks)

An LLM based on GPT-J and fine-tuned on the Stanford Alpaca dataset. It is not licensed for commercial use, even though it is built on open-source components like GPT-J; the GitHub repo clearly states this. It has been built by a commercial entity (Databricks), and it can be assumed it is their in-house large language model. It has a size of 6 billion parameters.

GitHub Repo: https://github.com/databrickslabs/dolly

 

GPT-J

Dolly is built on the back of GPT-J, and a lot of these models do the same. GPT-J uses Mesh Transformer JAX, designed by Ben Wang. I like GPT-J; it feels very open source while also having the nice-to-haves, like an interface set up for you to chat with an instance online and try it out.

The same people (EleutherAI) also produce the Pile, covered below.

Furthermore, it has a logo that's very Jazzy. Look on the website and see what I mean...

GPT-J website: GPT-J-6B: 6B JAX-Based Transformer – Aran Komatsuzaki (wordpress.com) 

A interface to use it: EleutherAI - text generation testing UI

Notebook to run it in: Google Colab
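If you would rather run GPT-J yourself than use the online interface, here is a minimal sketch, assuming the "EleutherAI/gpt-j-6b" checkpoint hosted on the Hugging Face Hub (the model needs a GPU with plenty of memory):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
# float16 halves the memory footprint; device_map="auto" places layers on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b", torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("In ancient times cats were worshipped as gods,", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))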

 

Pythia

A family of 16 language models from 70M-12B parameters from EleutherAI.

The creators of Pythia state that, until now, there has been no collection of models accessible to the general public that follows a well-established training process and maintains uniformity between scales.

GitHub here: GitHub - EleutherAI/pythia

 

Other brands are available: 

 

So these ones are out there, but I failed to get around to reading about them in one evening.

 

Open Assistant

A chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically. The demo uses a fine-tuned 30-billion-parameter LLaMA.

GeoV

A 9-billion-parameter pre-trained LLM using rotary position embeddings with relative distances.

Baize

Open-source chat model trained with LoRA using 100k dialogs generated by letting ChatGPT chat with itself.

Vicuna

An open-source chatbot achieving almost the same performance as Google Bard and ChatGPT.

Koala

A chatbot trained by fine-tuning Meta's LLaMa on dialogue data gathered from the web.

GPT4All

An assistant-style LLM based on LLaMA and fine-tuned on roughly 800k prompt-response pairs.

Dalai

The fastest way to run LLaMa and Alpaca locally; includes a user interface.

Alpaca.cpp

Intended to let you build a fast, ChatGPT-like app on your own computer.

 

How do they build new models?

 

Over the course of this piece of research, I came across various terms for where data is pulled from when training a new large language model. Like all AI, a transformer takes an input, produces an output, and is then trained by feeding back the error between the actual output and the intended output. To massively simplify, a transformer reads in a window of, say, 600 tokens and tries to predict the token that comes next, repeating that step to generate longer text.
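To make that feedback loop concrete, here is a toy sketch (assuming a small causal model like GPT-2 from the Transformers library, not any specific project above) of the next-token training signal:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

batch = tokenizer("The cat sat on the mat.", return_tensors="pt")

# Passing labels=input_ids makes the library shift them internally, so the loss
# measures how well each position predicted the token that follows it.
outputs = model(**batch, labels=batch["input_ids"])
print(outputs.loss)  # during training you would call outputs.loss.backward() and step an optimiser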

This relies on large amounts of data for training, which means you need data. You should also use data that has been opened up and licensed for your use case, so you are not abusing copyright. Here are two options below.

 

Stanford Alpaca

The Stanford Alpaca dataset is built around a large amount of instruction-and-answer data: roughly 52,000 instruction-following examples.

Website here: Stanford CRFM
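For a sense of what the data looks like, here is a sketch of a single Alpaca-style record and how it might be folded into a training prompt (the record text and the exact prompt template are illustrative, not copied from the dataset):

record = {
    "instruction": "Give three tips for staying healthy.",
    "input": "",
    "output": "1. Eat a balanced diet ...",
}

# One common way to turn instruction data into a fine-tuning prompt.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{record['instruction']}\n\n### Response:\n"
)
print(prompt + record["output"])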

 

The Pile: 

Both a big dataset and a benchmark, with a leaderboard showing the best scores on the data. I wonder if anyone will try to knock the GPTs off its leaderboard.

Website here: The Pile (eleuther.ai)

 

Benchmarks

 

Of course, if you built a large language model you would want to show it off and tell people how good it is. Well, there is a whole slew of different benchmarks designed to test and measure your new AI against. There were too many of these to get through in one evening...

If you get to the point of having a model done, maybe you could look them up! A minimal evaluation sketch follows the list below.

 

  • Lambada (lambada_openai)
  • Wikitext (wikitext)
  • PiQA (piqa)
  • SciQ (sciq)
  • WSC (wsc)
  • Winogrande (winogrande)
  • ARC-challenge (arc_challenge)
  • ARC-easy (arc_easy)
  • LogiQA (logiqa)
  • BLiMP (blimp_*)
  • MMLU (hendrycksTest*)
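The task names in brackets come from EleutherAI's lm-evaluation-harness. Here is a minimal sketch, assuming the harness is installed as the lm_eval package (the exact API differs between versions):

from lm_eval import evaluator

# Score a small Hugging Face model on a few of the tasks listed above.
results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=gpt2",
    tasks=["lambada_openai", "piqa", "arc_easy"],
)
print(results["results"])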

 

Uses For Transformers that are not ChatGPT

 

Everyone talks as if it is obvious that the use for AI and transformers is to talk to us. Below are AIs built for non-talking tasks.

 

LogAI

LogAI is a tool for telemetry and checking device logs. It ingests and analyses log data from computers and spits out a variety of graphs and visualisations based on the results. The idea is that the AI has multiple layers and does all the cleaning and analysing before passing the results to its visualisation layer, in order to make sense of what is happening.

It follows the OpenTelemetry model and positions itself as an open-source tool.

GitHub repo here: https://github.com/salesforce/logai

I could not find a direct website but a better description is available here: LogAI: A Library for Log Analytics and Intelligence (salesforceairesearch.com)

 

BLOOM & mT0

A family of models capable of following human instructions in dozens of languages zero-shot, intended among other things as translation tools to change text from one language to another. The GitHub repo includes a lot of information on how to implement them; the Hugging Face page is also good, and it all feels really well documented.

Github: GitHub - bigscience-workshop/xmtf: Crosslingual Generalization through Multitask Finetuning

The Hugging Face page is very specific about hardware and all the tools; just really useful. See the website here: bigscience/bloomz-mt · Hugging Face
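As a quick taste of the zero-shot instruction following described above, here is a minimal sketch assuming the small bigscience/bloomz-560m checkpoint on the Hugging Face Hub (the full BLOOMZ models are far larger):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-560m")

# A plain natural-language instruction; no fine-tuning or translation-specific setup.
prompt = "Translate to French: I would like to build my own chatbot."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))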

 

Blogs

 

I came across a variety of blogs and other interesting people. 

 

Yi Tay:

 

Produced Flan-UL2 for Google (the model used in the code snippet above). I like this blog post, On Emergent Abilities, Scaling Architectures and Large Language Models — Yi Tay, and I would sum it up as the argument that, with scale, emergent capabilities must appear. His blog is a bit limited...

Blog here: Yi Tay.

 

Andrej Karpathy:

 

Seems like an interesting guy: formerly head of AI at Tesla (I assume that means he knows his stuff), where he led the computer vision work until 2022. His blog is interesting and has a lot of different posts.

Blog here: Andrej Karpathy blog

YouTube here: Andrej Karpathy - YouTube

 

Cerebras (the same company mentioned above)

 

They have a blog; it looks good and has a fair amount of content too. It feels like it was written for people who don't have a PhD, but remember they may also want to sell you something.

Blog here: Cerebras Blog Landing Page - Cerebras

 

Stanford CRFM Blog

 

A good-looking blog with many posts I am interested in and many I have already found interesting.

Blog here: Stanford CRFM

 

BLOOM: a blog post on the development and training of BLOOM.

Blog Here: The Technology Behind BLOOM Training (huggingface.co) 

 

AI communities

 

Of course, having read lots of blogs, I came across different communities interested in AI. Here is a small selection. Hint: Hugging Face is the big one for transformers.

 

Reddit

 

You know there has to be a Reddit community for AI, and it does not disappoint: an almost constant stream of new AI goodies and links to fresh research papers to read. Enjoy!

Website: Machine Learning News (reddit.com)

 

Hugging Face

 

Let me be honest, the first place you will look for transformers is Hugging Face. Hugging Face is a website that lists a lot of different models and gives each a description. Honestly, there is more information there than here, and if you're just interested in transformers and code snippets to run them, the website is below; thanks for the click.

The problem I find with Hugging Face is that its business model does seem to be built around providing you with an API to access LLMs, whereas I am more interested in finding ways to get chatbots onto servers that I own and control so I can test them for my own use.

Website: 🤗 Transformers (huggingface.co)

 

Further Reading

 

Along with blogs, I kept coming across various white papers and research journals that I thought I had better squirrel away for research; below are some examples.

 

If you are wondering why I keep writing the word LLaMA, and are unsure why it means an AI and not a domesticated mammal, here is the white paper on that.

[2302.13971] LLaMA: Open and Efficient Foundation Language Models (arxiv.org)

From where we get the idea that language models could be a route to AGI.

[2205.05131] UL2: Unifying Language Learning Paradigms (arxiv.org) 

Chain of thought prompting elicits reasoning in large language models.

[2201.11903] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (arxiv.org) 

Training language models to follow instructions with human feedback.

[2203.02155] Training language models to follow instructions with human feedback (arxiv.org)

Extra computation steps to improve LLMs.

[2210.11399] Transcending Scaling Laws with 0.1% Extra Compute (arxiv.org) 

Model scalability with Pathways (the PaLM paper).

[2204.02311] PaLM: Scaling Language Modeling with Pathways (arxiv.org) 

The paper introducing the Pile dataset.

[2101.00027] The Pile: An 800GB Dataset of Diverse Text for Language Modeling (arxiv.org) 

 

Specific Citations:

 

They asked nicely! I realise these citations are intended to be added to code distributions, but I have included them here out of respect for the authors.

 

The Pile

@article{pile, title={The {P}ile: An 800GB Dataset of Diverse Text for Language Modeling}, author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and Presser, Shawn and Leahy, Connor}, journal={arXiv preprint arXiv:2101.00027}, year={2020} }

 

Bloom

@misc{muennighoff2022crosslingual, title={Crosslingual Generalization through Multitask Finetuning}, author={Niklas Muennighoff and Thomas Wang and Lintang Sutawika and Adam Roberts and Stella Biderman and Teven Le Scao and M Saiful Bari and Sheng Shen and Zheng-Xin Yong and Hailey Schoelkopf and Xiangru Tang and Dragomir Radev and Alham Fikri Aji and Khalid Almubarak and Samuel Albanie and Zaid Alyafeai and Albert Webson and Edward Raff and Colin Raffel}, year={2022}, eprint={2211.01786}, archivePrefix={arXiv}, primaryClass={cs.CL} }
