Visualising a reinforcement learning agent

Published on 6 November 2021 at 23:05

I'd rather work on the positive reinforcement, the things I did well. Hale Irwin

 

A year ago I built a neural network from scratch, trying to learn the specifics of its behaviour and maths. The problem I chose for this project was stock trading. I started out building a single neural network, and the project slowly grew into a neuroevolution algorithm and then into a more complete system. One of the main aims was to learn the mathematics without using a framework, so that I understood it more fully. I therefore built the AI framework in Python myself, along with the genetic algorithm and all the attendant functions and classes.

At present the code, when run, will download the data and then run experiments, making changes to a population of AI to make them more efficient stock traders. All that is left is to go and get a cup of tea while the AI evolve and adapt.

I call it my MTEM, for multi-task evolutionary model, as it uses a network of networks and an evolutionary algorithm to optimise their connections. It has a problem, though: an AI produced by evolution is the blackest of black boxes, more like a Russian doll of black boxes. For a long time I was resigned to the view that "best" was just the AI that made the most money in the tournaments, and I gave up trying to explain why.

Recently I had the idea of leaning into the metaphor of the neural network's outputs being like brain waves, and trying to show them against the realities of the stock market that the AI found itself in.

Even after building it, though, I had some concerns that I simply didn't understand what it was doing. Eventually I started building graphs in the hope that they would help me start analysing the AI. The one I used most overlays the AI output (in green) on the price of any trade conducted tomorrow (in blue). The AI therefore had two challenges: it must predict whether tomorrow will be a good day or a bad day, and decide whether it wants to trade today.
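As a rough illustration of the data behind such an overlay, here is a minimal sketch that pairs each day's AI output with the following day's price. The function name and the sample numbers are made up for illustration, and it assumes one output and one price per day. It is not the code from the MTEM project.

```python
def pair_output_with_next_price(outputs, prices):
    """Pair the AI output on day t with the price on day t + 1."""
    if len(outputs) != len(prices):
        raise ValueError("outputs and prices must cover the same days")
    # The last output has no 'tomorrow' price, so it is dropped.
    return list(zip(outputs[:-1], prices[1:]))


ai_outputs = [0.2, -0.1, 0.4, 0.0]       # made-up network outputs, one per day
close_prices = [10.0, 10.5, 10.2, 11.0]  # made-up closing prices, one per day

pairs = pair_output_with_next_price(ai_outputs, close_prices)
print(pairs)
```

Each pair can then be drawn as two lines on a shared time axis, one for the output and one for the price.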

The AI was rewarded if it sold a stock at a profit and penalised if it sold a stock at a loss. Below is a sample of some of the plots.
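A minimal sketch of that reward rule, assuming the reward is simply the difference between the sell price and the buy price. The exact reward used in the project is not stated here, so treat the function below as a hypothetical stand-in.

```python
def sale_reward(buy_price: float, sell_price: float) -> float:
    """Reward for closing a position: positive for a profit, negative for a loss.

    Assumption: the reward equals the raw price difference.
    """
    return sell_price - buy_price


print(sale_reward(100.0, 105.0))  # sold at a profit, positive reward
print(sale_reward(100.0, 98.0))   # sold at a loss, negative reward
```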

You can see with the AAL stock below that the general trend of the price is upwards, but with random price collapses (note that the scale means they appear as straight lines down). As you can see, the AI output jumps up and down at a frequency that increases with these price collapses.

 

An issue I had was that some neural networks would sometimes fail to do anything, so I had to start looking into the cause. Something I found interesting was that the stock below looks fairly obviously like a good deal: it is going straight up! Happy days! This is not a great way to train an artificial intelligence, though, which relies on the feedback of positive or negative reinforcement after making a decision. The problem with getting a lucky stock is that you never have to make a decision and so never learn! It is odd, but in the graph below the AI only bought the stock right at the end, when there was a marked drop in price.

I still have a task to look into transfer learning or other methods to train an AI on more volatile data so that it understands trending data. Though it may be a function of the AI that it is best suited to volatility, where velocity and volume calculations are probably the best

 

The training problem can be improved upon with better modifications, though. The top graph is an earlier network and the second is the network after training on the data. Note that the two show the same data but don't line up perfectly, as the starting date and starting position of each scenario are random. You can see that after training the AI buys the stock at an earlier point and keeps it.

Though as I can't quite see the green line going above 0, I am not 100% certain that the AI traded at all.

This is where there may be some issues. While the AI is clearly "better", an old-school neoliberal economics position is that past data is of little value, as the market should already have incorporated all past data into the current price. In practice this is likely faulty, as most forms of trading are profitable precisely because there are repeating market patterns, and the efficient market hypothesis is possibly not that efficient. That being said, it is worth bearing in mind, as an AI is pattern matching but also "remembering" those patterns. It appears something missed that if AI are efficient then market action

Over time the AI tends to become very zero-centred. The graph below is actually an AI that trades regularly, but its decision making is heavily centred around zero, so very small fluctuations make it change.

In fact, while the behaviour of the previous AI looks a lot like the investment strategy of putting the money into the market and leaving it there, only pulling out if there is concern about a market crash, the hex graph below tells a different story. It shows profit on the X axis and the number of days held on the Y axis. While you can see that at one point it did hold a position for 140 days, the hex graph shows it rarely did this, instead usually holding for only a single day and often making small profit margins.
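The idea behind a plot like this is to count how many trades fall into each (profit, days held) cell. The sketch below uses square bins rather than hexagons to keep it short, and the trade data, bin sizes and function name are all made up for illustration. A plotting library such as matplotlib offers a hexbin plot that does the hexagonal version.

```python
from collections import Counter


def bin_trades(trades, profit_bin=1.0, days_bin=5):
    """Count trades per (profit bucket, days-held bucket) cell.

    Square bins are used here as a simpler stand-in for hexagonal bins.
    """
    counts = Counter()
    for profit, days_held in trades:
        cell = (int(profit // profit_bin), days_held // days_bin)
        counts[cell] += 1
    return counts


# Made-up (profit, days held) pairs: mostly short holds with small margins.
trades = [(0.4, 1), (0.6, 1), (0.2, 1), (-0.3, 2), (12.0, 140)]
counts = bin_trades(trades)
print(counts.most_common(1))  # the densest cell
```

The densest cell shows where the AI spends most of its time, which is how a single 140-day hold can be seen as the exception rather than the rule.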

This shows a potential issue with AI: they don't necessarily do what you want, they do what they are rewarded for. Trading regularly for small positive returns is as valid to the AI as waiting for a single big return. The AI has to pay a fee for any transaction, just as in real life. I am currently working on some improvements for this, and these graphing methods were a first step. In practice, though, I have found that the genetic algorithm tends to weed this behaviour out, and an AI trading at a high frequency may just be learning anyway and may eventually grow out of it.
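To make the trade-off concrete, here is a small worked example comparing ten small trades with one long hold that earns the same gross profit. The prices and the fee level are invented, and it assumes a flat fee charged on each buy and each sell, which may not match the fee model in the project.

```python
def net_profit(trades, fee_per_transaction):
    """Sum profit over (buy_price, sell_price) pairs.

    Assumption: a flat fee is charged once on the buy and once on the sell.
    """
    total = 0.0
    for buy_price, sell_price in trades:
        total += (sell_price - buy_price) - 2 * fee_per_transaction
    return total


many_small = [(100.0, 101.0)] * 10  # ten one-day trades, each gaining 1.0
one_big = [(100.0, 110.0)]          # one long hold gaining 10.0

for fee in (0.0, 0.25):
    print(fee, net_profit(many_small, fee), net_profit(one_big, fee))
```

With no fee the two strategies score the same, so the AI has no reason to prefer either. With a fee, the frequent trader loses a larger share of its profit, which is one way the reward can be shaped to discourage it.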

Interestingly, I found that nearly all the AI like trading at the end of a month or a week, with the most profitable days being Thursday and Friday and the first or last days of the month. See the graph below.

Day of the month i.e. 1 if 01/01/2021

The below shows a slight wiggle in the profitability of trades based on day of the week, with the spread being highest on Friday (top value). The Y value is the day of the week: Monday=1, Tuesday=2 and so on.
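Grouping trades by day of the week can be sketched with the standard library alone. The trade dates and profits below are invented for illustration, and the function name is hypothetical. The weekday numbering follows the same Monday=1 convention as the graph.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def mean_profit_by_weekday(trades):
    """Average profit per ISO weekday (Monday=1 ... Sunday=7)."""
    by_day = defaultdict(list)
    for trade_date, profit in trades:
        by_day[trade_date.isoweekday()].append(profit)
    return {day: mean(profits) for day, profits in sorted(by_day.items())}


# Made-up trades: (date the position was closed, profit)
trades = [
    (date(2021, 11, 1), -0.5),  # Monday
    (date(2021, 11, 4), 1.0),   # Thursday
    (date(2021, 11, 5), 1.5),   # Friday
    (date(2021, 11, 12), 2.5),  # Friday
]

print(mean_profit_by_weekday(trades))
```

Swapping `isoweekday()` for `trade_date.day` would give the same grouping by day of the month.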

Surprisingly it looks like stock prices may have even been seasonal!

Conclusion

I intend to use this as the basis of future testing and measurement of the AI's behaviour. I built everything as classes and an event-manager-style system that, when wired in, listens for the AI events and records them to make the graphs at the end, so it can generate new graphs in the future. I plan to try comparing the AI output waves with the inputs to see what might be affecting them.
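A minimal sketch of what an event-manager-style recorder could look like. The class names, the "trade" event and its fields are all hypothetical and are not taken from the actual project code.

```python
from collections import defaultdict


class EventManager:
    """Tiny publish/subscribe hub: listeners register per event name."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, event_name, callback):
        self._listeners[event_name].append(callback)

    def emit(self, event_name, **payload):
        # .get avoids creating empty entries for events nobody listens to.
        for callback in self._listeners.get(event_name, []):
            callback(**payload)


class TradeRecorder:
    """Listener that stores every trade event for graphing later."""

    def __init__(self):
        self.records = []

    def on_trade(self, day, action, price):
        self.records.append({"day": day, "action": action, "price": price})


events = EventManager()
recorder = TradeRecorder()
events.subscribe("trade", recorder.on_trade)

# The AI (not shown) would emit these as it makes decisions.
events.emit("trade", day=1, action="buy", price=100.0)
events.emit("trade", day=2, action="sell", price=101.5)
print(len(recorder.records))
```

The appeal of this design is that a new graph only needs a new listener, so the AI code itself does not have to change.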

This seems to be the only approach for AI working on time series, as time is a key dimension that affects all their decision making.

I had expected to use the same hex plot methodology to see whether it was close prices, volume, the highest price and so on that drove the trades. The truth is that while there was correlation and multiple groupings, there wasn't anything that would make me think the AI used any of them as its trading signal. It did get me set up for future testing, though, and automated the workflow. While I really should do something using the more mainstream published artificial intelligence libraries, I have found that I learn more, and faster, by building mine from scratch. So this is where the story ends. I did not get any closer to understanding what was going on in my AI's head, beyond that they seem to like Fridays and hate Mondays. And maybe that was the biggest indicator of artificial intelligence of them all...

There are methods for machine learning explainability, such as Shapley values, but the experiments I have planned and the problems of dealing with time series data have left me increasingly thinking about what tools I can use. The AI's inputs here differ from supervised learning, as it receives data feeds and makes decisions. Trying to figure out what an AI is doing and why in real time seems, err, difficult. It would seem foundational to improving it, though, as the amount of evolution from the genetic algorithm and the neural network's ability to learn mean there are a huge number of moving pieces.

The truth is I have seen more than one way of trading develop within my AI, sometimes looking like investing and sometimes like a signal trader, and this shows how adaptive and useful neural networks are.
