"I wish the first word I ever said was quote. So when I died the last word I would say was unquote" - Steven Wright
I wonder, as Steven Wright suggested: if someone were born saying "quote" and ended with "unquote", and this ever actually happened, who would they be quoting?
If a whole life is just quoting others, what would that mean?
And I think this raises a philosophical question: if we build AI that copies human language, will we forever be asking whether our thoughts are our own, or just a rehash and reapplication of things previously said?
Unfortunately, I might seriously pose that question about the AI I discuss here... It learns language in a similar manner.
Foreword
I have been building an AI with the challenge that it should learn English the way a native speaker does. I patterned it after the Turing test and put forward ideas on how I would run the test in a previous post here: Hello World - Measures Of Language Skill | A Logic Called Joe (webador.co.uk).
I proposed this test as an idea of an AI that might learn English by hearing it, and I report the below as my experiments in getting this working. If you remain sceptical then I commend you to be so.
If it comes across as arrogant, or incomplete for being vague on details, I would point out that language-based chatbots already exist and I have just been experimenting with improvements to my own. I present it as a treatise on AGI because I have attempted to set up my Bayesian Turing test (discussed below) in such a way that, if I got lucky, it would be able to identify improvements.
So these networks contain many changes that I consider non-standard.
Introduction
The title for this series comes from the traditional first computer programme: by custom, the first lesson in any language is to print the words "hello world" to the screen. Originally I chose the title to signal that my measure was to be that the AI should learn its language in a way as close as possible to the way we speak. I proposed it would learn on the scale of individual letters and would be scored on the basis of the full words it got right.
For full details of the maths of the measure I used to score different networks, please see the above link. In short, it's a logarithmic scale representing the number of words got correct: a score of 6 means, roughly, that if you ran a random number generator alongside the book it would take about 1,000,000 attempts to generate a similar result. The book it reads along to is 1,500 pages long and contains either 500 million or 50 million characters (I can't remember which). My current best score is around 7,000. To put that in perspective, a perfect score, having spoken every word in the book, would be rated around 30 million, meaning 7,000 represents about 0.023% of the total information in the book learned successfully.
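For a concrete feel of the scale, here is a minimal sketch (my illustration only, assuming the score is simply the negative log of the chance that a random generator does as well):

from math import log10

# Sketch: a score of s means a random generator has roughly a 1-in-10^s
# chance of producing a similar result (so 6 <=> one in 1,000,000).
def score_from_probability(p_random_matches):
    return -log10(p_random_matches)

print(score_from_probability(1e-6))   # -> 6.0
print(7_000 / 30_000_000)             # best score as a fraction of perfect: ~0.00023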
You might quibble that this is a bit low, but it is in the realm of a newborn baby learning and speaking its first words while reading its first book. So maybe the less we talk about human parallels, the less stupid we all feel.
The test also requires the AI not to be run repeatedly on the same book, and in all these tests the score given represents a single pass. To take the comparison to its logical conclusion, then, the baby is very much thrown out with the bathwater. If I am honest, this is one reason we should be worried about AGI: the tests we might apply to confirm its existence might by necessity be so strict that any machine that passes them will surpass us.
It is something to think that the first flying machines did not start at the power of a bumblebee and slowly grow to the level of a hawk; when we invented flight, in a single bound humans built something that could fly faster than any hawk. What is the possibility that the first thinking machines will, in their first thoughts, surpass us?
This AI learned something approaching language, at a level comparable to how we learn language from a single book. It is purely reinforcement learning, given no phonemes or structure for the language: it listened to the book and was scored where it got things right. I for one think that is impressive...
I admit this is not proof of anything, not proof that a coming age of AGI is on its way. I share the below because I set myself a task to learn a thing that I did not think existed at the time I began the experiment. I submit it only as a talking point: you could say I got an AI to speak; you might disagree, and if you do I would warmly welcome the debate.
All of this runs on a neural network framework I have been building and testing myself.
Philosophy First, Technicalities Later
When I started this project I had a view that consciousness must be about synapses, and that the deficiency in neural networks was the lack of some other form of memory tracked at the level of neurones. In the course of this project I ran many tests; the one I will talk about below had a very specific architecture, and a bug caused it to have the memory I was testing turned off.
So like all good things in life it started as a mistake...
I don't want to go into technical details; if you scan down to the bottom I explain which version I am talking about, but not how it works. But whatever the top performance of the best version of this AI turns out to be, it seems to depend on multiple factors.
Computational Power
I set up these tests to focus on establishing how much computational power would be needed to house a human-type brain. One idea I had was that a neural network might show strange behaviours when growing too large.
The gestation of the final model ended up being a process of finding an improvement, then waiting while it scaled up.
It started quite poorly... I had written a programme that would store the test results as they came in to a save location, and a linear regression programme that would extrapolate them to estimate the size of the final neural network, one big enough to do all 30,000,000 points of the book (a stupid idea, because it implies the AI is smart enough to know English perfectly before learning it).
This of course includes a mistake on purpose. It ignores the singularities where learning changes in the error function (see below), and if learning is non-linear it will tend to underestimate. A lot of the initial attempts were not able to quantify how big I should make the AI, so you should take this as a best guess based on questionable data.
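As an illustration of what that regression was doing (a sketch with made-up numbers, assuming a plain least-squares fit; the real programme read the stored test results from the save location):

import numpy as np

# Hypothetical (size, score) pairs standing in for the saved test results.
sizes  = np.array([1e4, 5e4, 1e5, 5e5])
scores = np.array([40.0, 310.0, 700.0, 4_200.0])

# Fit score ~ size, then solve for the size that would score the whole book.
slope, intercept = np.polyfit(sizes, scores, 1)
target_score = 30_000_000
estimated_neurones = (target_score - intercept) / slope
print(f"estimated neurones: {estimated_neurones:,.0f}")

# The deliberate mistake: a straight line ignores the singularities in the
# error function, so if learning is non-linear this will underestimate.

(For what it's worth, the headline total below is just the estimated depth multiplied by the estimated width: 244,317 × 224,981 ≈ 55 billion.)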
The initial estimate of size for a sentient AI is below.
Estimated depth: 244,317
Estimated width: 224,981
Total neurones for an AI as smart as a whole 1,500-page book: 54,966,536,076
54 billion neurones seems to be about half the human body's estimated number of neurones. That seems a legitimate estimate: googling how many neurones are in the human body gives about 100 billion. I could imagine it takes almost half of you to be, well, "you", and the other half is running background processes.
The above might be criticised: why should the information in a whole book be deemed equivalent to sentience, or an estimate of anything? That is a fair point, but I don't know how else one might quantify "smart" other than writing a whole book. It is admittedly very fuzzy logic in a hard-to-estimate area.
This does highlight a problem that I think gets missed: how do you compare machine intelligence to human intelligence?
This was my older version, which included the type of memory I had bypassed to move on to other tests but recently reinstated. The version I speak about below is a different implementation (how weird is it to talk about having multiple models of experimental AI consciousness and trying to figure out which one is right?).
This is where I started: see how much it sucked.

I tried and found some improvements in the next version, and there have been several versions to test theories and exclude bad ideas.

I then found more explosive improvements. As you would expect, when it eclipsed the previous version I tried to create more versions to figure out what made it better at learning English. Despite the one below being "better", I really love the one above because of the implementation of memory it has.

Singularities in the strictest sense
The reason I have been using these types of visualisation is that they show singularities like the one below. By this I do not mean the idea of a hard take-off to machine intelligence (though also, yes) but periods in the AI's growth that changed the scoring potential of the network.
What I have learned is that AI growth is not continuous. A recurring problem is that some promising starts have ended, after much processing, with a singularity where scoring growth dropped off a cliff.
I did also learn from this, and while I haven't confirmed it in every case, I have a strong suspicion that growth along any axis is a Gaussian shape, and after a certain size in width, depth must be added. As you can see below, adding more inputs no longer increases network performance, but there remains growth in adding depth.
So anything I say here should be viewed with scepticism, because a singularity in the future could change the growth pattern. I theorise it shouldn't stop, but I have assumed that an AI performing well at a smaller scale continues that growth at larger scales.
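For what it's worth, here is roughly how one could flag those cliffs in a score history (an illustrative sketch only, not my actual test harness):

import numpy as np

def find_cliffs(score_history, drop_ratio=0.2, window=5):
    """Indices where score growth collapses below drop_ratio of its recent average."""
    growth = np.diff(np.asarray(score_history, dtype=float))
    cliffs = []
    for i in range(1, len(growth)):
        recent = growth[max(0, i - window):i].mean()
        if recent > 0 and growth[i] < drop_ratio * recent:
            cliffs.append(i)
    return cliffs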

This growth in capability also has to be seen as a trade-off with computation. Below is a 3D graph of the big-O cost of processing as it varies between inputs and depth. As you can see, the better pay-off is to have more depth rather than more inputs.
But you will note in many graphs that inputs do seem to have a big effect on the AI's performance in many cases.
Note: yes, I am assuming a fully connected neural network.
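To make the trade-off concrete, here is a sketch of the cost asymmetry (my illustration, assuming a fully connected network of constant width, where a forward pass costs roughly inputs * width + (depth - 1) * width^2 multiply-adds):

def forward_cost(inputs, depth, width):
    # First layer: inputs * width; each further layer: width * width.
    return inputs * width + (depth - 1) * width ** 2

print(forward_cost(100, 10, 1000))  # baseline: 9,100,000
print(forward_cost(101, 10, 1000))  # one extra input costs +1,000
print(forward_cost(100, 11, 1000))  # one extra layer costs +1,000,000

So adding depth is far more expensive computationally, which is the trade-off against the performance gains it brings.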

So What Did It Say?
If you're still reading this and have written it off as a bit crazy, I will elaborate a bit. Some of it is, well, noise. I had to insert some noise, for the reason that English at the beginning is fairly hard and the noise lets you detect whether smaller neural networks are managing anything.
I have also seen it use names like Michael correctly, as well as a limited vocabulary. It should be noted that to be counted as correct it has to perfectly pronounce each letter in sequence and line up with the next word in the book. The book does not repeat, so any time it is right it is highly likely that some form of "intelligence", whether linguistic, temporal, or an understanding of literature, is present.
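As a toy illustration of that matching rule (a made-up sentence, not the actual book or scorer):

def said_correctly(ai_letters, book_text, position):
    # Every emitted letter must line up exactly with the book from here on.
    return book_text.startswith(ai_letters, position)

book = "and he called out across the hill"
print(said_correctly("called", book, 7))  # True: every letter lines up
print(said_correctly("callde", book, 7))  # False: one transposed letter fails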
The calculations are based on entropy, and the results are therefore presented as representative of the "information" the AI has absorbed, in a quantifiable manner.
The examples below are selected to be illustrative; I have a fuller list elsewhere.
**************************************************
in
likelihood of event
1.0542973687900566e-06
likelihood of AI saying this randomly
7.22238540800808e-08
Times this happened
1
likelihood this could happen
7.614541932050618e-14
likelihood this could happen in this book
7.222385408008078e-08
**************************************************
Or missing characters and concatenations.
**************************************************
hecalled
likelihood of event
1.0542973687900566e-06
likelihood of AI saying this randomly
5.2162850981808034e-15
Times this happened
1
likelihood this could happen
5.499515653870803e-21
likelihood this could happen in this book
5.2162850981808034e-15
**************************************************
Though the stuff it started getting right is much more impressive. Here is an example. It liked the words I, No, Fed, Feed, and similar simple stuff. You could assert some pseudoscientific reason for this: that to understand human behaviour you have to first detect when a person is speaking (therefore learn I), then detect agency (therefore No), and generate some knowledge of rationales like fed or feeding. But how do you explain "high"? (I think it is just that "high" was common in the book.) I think it would therefore be advisable to avoid attaching any belief to why a certain word developed earlier, though I did briefly try in other places, and I don't think it is entirely related to just this book: at a guess, it does seem to start by learning I, No, and certain key words.
Seems something interesting to think on.
**************************************************
high
likelihood of event
0.00019504501322616048
likelihood of AI saying this randomly
7.22238540800808e-08
Times this happened
46
likelihood this could happen
1.4086902574293644e-11
likelihood this could happen in this book
5.3736117519364454e-05
**************************************************
feed
likelihood of event
3.584611053886193e-05
likelihood of AI saying this randomly
7.22238540800808e-08
Times this happened
5
likelihood this could happen
2.58894425689721e-12
likelihood this could happen in this book
1.669815506331468e-05
**************************************************
Though you can see that sometimes it only says a word once, and I am a bit wary of this, it gets suitably down-scored for doing so. The panel details the likelihood of the word appearing in the book, the baseline if it were a random number generator, and then the likelihood that these two events happen concurrently. Finally, it weights this by the number of times the word happened in the book; therefore you find single-letter values like "I" or "a" can get 1,000 hits and still get a negative value, as a random number generator actually would be highly likely to get this value.
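For the curious, the panel numbers appear to combine as follows. This is a reconstruction, not the scoring code itself: the figures are consistent with a 61-symbol alphabet, since a four-letter word's random baseline of 7.22e-08 is exactly (1/61)**4, and the "likelihood this could happen" figure is the product of the two likelihoods above it. (I haven't spelled out the exact occurrence weighting behind the "in this book" figure here.)

ALPHABET_SIZE = 61  # assumption inferred from the panels above

def random_baseline(word):
    # Chance a random generator emits this word letter-for-letter.
    return (1 / ALPHABET_SIZE) ** len(word)

def joint_likelihood(p_event, word):
    # Likelihood the book event and the lucky random match happen concurrently.
    return p_event * random_baseline(word)

print(random_baseline("high"))                           # -> ~7.22e-08
print(joint_likelihood(0.00019504501322616048, "high"))  # -> ~1.41e-11

# In entropy terms, matching "high" by luck is worth about
# -log2(7.22e-08), i.e. roughly 23.7 bits of information.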
**************************************************
came
likelihood of event
0.0007706913765855314
likelihood of AI saying this randomly
7.22238540800808e-08
Times this happened
1
likelihood this could happen
5.566230152329002e-11
likelihood this could happen in this book
0.038593610890086054
**************************************************
each
likelihood of event
1.897735263822102e-05
likelihood of AI saying this randomly
7.22238540800808e-08
Times this happened
1
likelihood this could happen
1.3706175477691113e-12
likelihood this could happen in this book
2.3400528721946178e-05
**************************************************
high;
(this one appears to be even getting its grammar right!)
likelihood of event
5.271486843950284e-06
likelihood of AI saying this randomly
1.183997607870177e-09
Times this happened
1
likelihood this could happen
6.241427813156245e-15
likelihood this could happen in this book
2.9599940196754426e-08
**************************************************
door,
likelihood of event
2.1085947375801135e-05
likelihood of AI saying this randomly
1.183997607870177e-09
Times this happened
1
likelihood this could happen
2.496571125262498e-14
likelihood this could happen in this book
4.735990431480708e-07
**************************************************
among
likelihood of event
0.0004997369528064869
likelihood of AI saying this randomly
1.183997607870177e-09
Times this happened
1
likelihood this could happen
5.91687356687212e-13
likelihood this could happen in this book
0.00026601584654583986
**************************************************
laid
likelihood of event
0.00015392741584334827
likelihood of AI saying this randomly
7.22238540800808e-08
Times this happened
1
likelihood this could happen
1.1117231220793903e-11
likelihood this could happen in this book
0.001539523673571002
**************************************************
make
likelihood of event
0.0005629947949338902
likelihood of AI saying this randomly
7.22238540800808e-08
Times this happened
1
likelihood this could happen
4.06616539171503e-11
likelihood this could happen in this book
0.020595065334059515
**************************************************
feeding
likelihood of event
4.2171894751602265e-06
likelihood of AI saying this randomly
3.18193390989029e-13
Times this happened
1
likelihood this could happen
1.341881819544476e-18
likelihood this could happen in this book
5.091094255824463e-12
**************************************************
hill
likelihood of event
2.6357434219751418e-05
likelihood of AI saying this randomly
7.22238540800808e-08
Times this happened
1
likelihood this could happen
1.9036354830126546e-12
likelihood this could happen in this book
4.51399088000505e-05
**************************************************
Conclusion
This is variant 19 out of 25, with the 26th to 28th on their way at the time of writing. I found an error in it, and another version should test whether I have a fix, in the hope that it will then be possible to expand on the other axis.
Versions 25 and 26 look at whether I can combine this model of memory within the neural network with one I had found earlier, which performed well but not as well as this.
I hope you will forgive me for not sharing the details. I am hoping the next batch works better.
I don't propose this as "hey, I built an AGI"; what I propose is that it would seem possible. I think this version might be flawed, but I have to wait to see where, and if, its learning plateaus, or whether an alternative version overtakes it. Give me a year and a research grant and I'll tell you then (which will not happen). But I investigated whether it was possible to build something like a Google language transformer at the per-letter level, with no shortcuts, just an AI learning English and words in a strict environment. Would I get anywhere?
The surprising answer was Yes!
I think it might suffer from overwriting past learning, so I am not sure really whether it is "smart" like us or smart like a parrot. The metaphor of first words in a first book should be stressed, though. That being said, I haven't looked into getting it to say full sentences, or into the equivalent transition from the "information" evaluation in my testing to practical use cases.
If I did, it might be possible to assess the sizes of networks needed for an intelligent task.
Maybe it can teach itself just words but not string whole thoughts together. Though I think what it has learned is graphs of words, which are connections between letters. Why would words be harder? Especially when the only way to predict the next word implies some understanding of every word that preceded it?
You can see why getting it to learn words from letters seems a good place to start for developing true AI.
I think for practical usage it could be used to make a self-learning chatbot. It is essentially a text transformer built from a different approach. As for the training environment of learning from one book: if you were to compare it to how we raise children, it would be the abusive equivalent of the AI learning to talk in a week of comparable human growth, then being got rid of because it only partially recognised names, places, and individual words, and did not develop the telepathic ability to guess the plot and the author's intent, and therefore failed to guess whole sentences ahead of time. I would think it a given that it could be trained to recognise a paragraph of a customer's request and reply politely (but I haven't tested this).
Though I submit the humble opinion that to be sentient and conscious, an AI would need to read and understand the customer's request, then tell them "No", and then they could bugger off... I submit true sentience probably includes the additional ability to see the other as a free-willed agent like yourself, and to have some sense of control over the direction of one's thoughts so as to assess these decisions and decide. So maybe the true test of AI is when it can not only talk to us but also tell us where to go... (just a thought).
Hey, well, it did learn the word "No" on its own... Though I doubt that is because ego is in the AI; it is more likely ego is in the words themselves, which lends a curious thought, having noted the AI learned certain words seemingly before others. Does that mean anything for how language is structured? It is just a thought, but how does an AI that, one assumes, cannot have an "I" or "me" even understand those words? Kind of makes you think...
My ideas on what a big AGI might look like have changed each week. I suspect they may continue to change, but you can see it is at least interesting. I hope you have found it an interesting read.
I am not a Ph.D. student; this is just a blog. But for my part, I swear it seems at least in principle that AGI is possible, and probably closer than you think. I started with an estimate that the machine required would be bigger than any computer imaginable. I am now wondering if, when we meet true AI, I could fit a sentient AI on my laptop. But these are early days; ask me in a month or two... (maybe).
I don't know how long it would take to train that sentient AI. I have a web scraper already built, so if I strike it lucky I can go and get books from the web and just force-feed it data... what could go wrong?
