wboler05 / pso_neural_net



C++ 96.39% MATLAB 0.44% QMake 1.83% Python 1.34%


pso_neural_net's Issues

Crossover and Mutations

Currently, a rudimentary point system shakes up the weakest particles in the swarm. It scores each particle by how often it finds a personal best, local best, or global best, which serves as a performance metric for convergence. This point system may be worth exploring further, but the consequence for poorly performing particles should be changed.

In genetic algorithms, crossover occurs when a child is randomly assigned specific features from each of its two parents. Currently, the code assigns the full global, local, or personal best to the particle's neural network edges, but we may want to consider crossover instead of full assignment. This would mean that the weakest performing particle would gain a probabilistic assignment of individual edges from the personal, local, or global best.

For mutation, all of the particle's edges are currently randomized. Mutation should instead give each edge some probability of being mutated, rather than mutating the entire set. We should change the mutation of the weakest particle to a mutation of a randomly selected particle, where only a random subset of its edges is mutated.
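The crossover and mutation changes described above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes each particle's edges are flattened into a `std::vector<double>`, and the names `Edges`, `crossover`, `mutate`, and `pickRandomParticle`, as well as the rate parameters and the mutation range, are hypothetical.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical flattened edge list for one particle.
using Edges = std::vector<double>;

// Uniform crossover: each edge of `weak` is replaced by the matching edge of
// `best` with probability `rate`, instead of copying `best` wholesale.
void crossover(Edges& weak, const Edges& best, double rate, std::mt19937& rng) {
    std::bernoulli_distribution take(rate);
    for (std::size_t i = 0; i < weak.size() && i < best.size(); ++i) {
        if (take(rng)) {
            weak[i] = best[i];
        }
    }
}

// Partial mutation: each edge is re-randomized in [lo, hi) with probability
// `rate`, so only a random subset of edges changes.
void mutate(Edges& edges, double rate, double lo, double hi, std::mt19937& rng) {
    std::bernoulli_distribution hit(rate);
    std::uniform_real_distribution<double> value(lo, hi);
    for (double& e : edges) {
        if (hit(rng)) {
            e = value(rng);
        }
    }
}

// The mutation target is a randomly selected particle, not the weakest one.
std::size_t pickRandomParticle(std::size_t count, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, count - 1);
    return pick(rng);
}
```

With `rate` set low (for example a few percent), most of a mutated particle's learned edges survive, which is the behavior the issue asks for.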

These modifications should be rather simple to implement.

Backpropagation Revisit

Need to implement backpropagation for comparison checks.

Remove Topology from PSO and Implement Validation

It appears that the ANN has a hard time training with the topology included in the PSO. There are a few ways to address this. First, we could change what it means for a node to be activated, for example by reducing its probability of flying, but there is no clean way to implement this within PSO. Second, we could remove topology from training completely, reducing the number of state dimensions the PSO has to search. This could work, but it would require us to set the topology by hand.

There is a third option: train an ANN with a fixed topology and then drop out nodes one by one. This would not be difficult to implement and should be the next focus. A further refinement would be to pick the node with the weakest edge connections and drop it.

For now, we need to implement a switch to disable training of Topology.
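One way to pick "the node with the weakest edge connections" is to score each hidden node by the total magnitude of its incoming and outgoing weights and drop the lowest scorer. The sketch below assumes weights are stored as `Matrix[from][to]` between adjacent layers; the scoring rule and the names `weakestNode` and `dropNode` are assumptions for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // Matrix[from][to]

// Score each node of a hidden layer by the summed |weight| of its incoming
// edges (column `node` of inW) and outgoing edges (row `node` of outW).
std::size_t weakestNode(const Matrix& inW, const Matrix& outW) {
    std::size_t best = 0;
    double bestScore = std::numeric_limits<double>::infinity();
    for (std::size_t node = 0; node < outW.size(); ++node) {
        double score = 0.0;
        for (const auto& row : inW) score += std::fabs(row[node]);
        for (double w : outW[node]) score += std::fabs(w);
        if (score < bestScore) {
            bestScore = score;
            best = node;
        }
    }
    return best;
}

// Drop the node by erasing its column in inW and its row in outW.
void dropNode(Matrix& inW, Matrix& outW, std::size_t node) {
    for (auto& row : inW) row.erase(row.begin() + static_cast<std::ptrdiff_t>(node));
    outW.erase(outW.begin() + static_cast<std::ptrdiff_t>(node));
}
```

Training would then resume on the smaller network, repeating the drop while accuracy holds.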

Save Neural Network Objects / Load As Activation Functions

Add the capability to save trained neural nets and load them as activation functions for nodes.

Validation

Implement 10-fold cross-validation by creating a validation set from the testing set. Then train the ANN ten times, each time with a mutually exclusive validation set, and report the final score.
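The fold construction can be sketched as: shuffle the sample indices once, deal them round-robin into k mutually exclusive validation folds, and train on everything else for each fold. `Fold`, `makeFolds`, and `meanScore` are hypothetical names, and averaging the per-fold scores is an assumption about what "final score" means.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

struct Fold {
    std::vector<std::size_t> train;
    std::vector<std::size_t> validation;
};

// Shuffle indices 0..n-1, then assign each to exactly one validation fold;
// every fold trains on all indices not in its validation set.
std::vector<Fold> makeFolds(std::size_t n, std::size_t k, unsigned seed) {
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), 0);
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);

    std::vector<Fold> folds(k);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t owner = i % k;  // round-robin keeps fold sizes within 1
        for (std::size_t f = 0; f < k; ++f) {
            if (f == owner) folds[f].validation.push_back(idx[i]);
            else folds[f].train.push_back(idx[i]);
        }
    }
    return folds;
}

// Mean of the k per-fold validation scores.
double meanScore(const std::vector<double>& scores) {
    if (scores.empty()) return 0.0;
    return std::accumulate(scores.begin(), scores.end(), 0.0) /
           static_cast<double>(scores.size());
}
```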

Recurrent Network

Recurrent neural networks are best suited to predicting outcomes over time. The difference between a feedforward network and a recurrent network is that, in a recurrent network, the outputs of later nodes feed back into the inputs of earlier nodes. This introduces a directed cycle into the network, which gives it a form of memory.

Currently, the neural network is implemented as a feedforward network. The simplest way to implement a recurrent network will be to add weights on edges from nodes to themselves. These recurrent nodes should only be used in the hidden layers.

The biggest problem with this is that a recurrent network would require an alternative training method. Since the network is cleared before applying an input, the memory aspect might be irrelevant.
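A hidden node with a self-recurrent weight can be sketched as below. `RecurrentNode` is hypothetical, and the `tanh` activation is an assumption chosen only to keep the output in (-1, 1). The `reset()` call shows why clearing the network before each input would erase the memory effect.

```cpp
#include <cmath>

// Hypothetical hidden node with a trainable self-recurrent weight.
struct RecurrentNode {
    double selfWeight = 0.0;
    double previousOutput = 0.0;  // the node's "memory"

    // Weighted input from the previous layer plus its own last output.
    double activate(double weightedInput) {
        const double out = std::tanh(weightedInput + selfWeight * previousOutput);
        previousOutput = out;
        return out;
    }

    // Clearing the state before every input removes the memory effect.
    void reset() { previousOutput = 0.0; }
};
```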

Optimize Cache Settings

The available cache settings are "total segments in cache" and "size of cache". The cache size is limited by what the system has available, so two checks will need to be implemented: are we exceeding available memory at initialization, and are we exceeding available memory during runtime? The first check should be simple to implement, because we already restrict the cache size at initialization. The second check requires more thought, since we have two options: immediately halt the process and return an error, or attempt to resize the cache according to some rule.

Either way, this raises the question of how to handle the total number of segments in the cache. There needs to be an appropriate balance between reducing file accesses and reducing the number of objects to transfer when a cache miss occurs. A simple test at initialization could automatically determine the total number of segments from runtime measurements on the current system, but performance may change if available memory is restricted at runtime. This means that the number of segments is either a) kept static or b) re-evaluated when memory runs short and we do not halt the process.

Evaluating the number of segments that optimizes cache performance could be done in two ways: a) sample several segment counts, time random data-object accesses for each, and run linear regression on the runtimes, or b) use PSO to optimize runtime while randomly accessing data objects. The cost of either operation would need to be evaluated, with the expectation that it may run more than once in a program.

Particle Map

One may be interested in viewing the performance of the PSO based on the convergence and movements of each particle. It would be beneficial to make a 2D graph that is able to view particles and their values. This graph would be able to select a single edge and allow the user to navigate through each edge. In order to achieve this without interrupting the PSO training performance, the graph will need to query for the specific particle and edge positions at a low rate and only when the graph is visible. The graph should be able to pop-up at the user's request as a new dialog window, accessed from the menu.

The purpose of this graph will be to give a visual representation of each particle edge convergence as the PSO trains the neural net. The user will be able to select each edge and monitor if there is any movement or new bests being found. This will give the user an idea of whether training is still occurring or if the training should be halted.

Confusion Matrix Correction

Prior work on the confusion matrix, for both the GUI and ANN performance, assumed that all output nodes were classification nodes. Now that the outputs mix regression and classification, this will need to be modified so that only the first output node (node 0) is handled in the confusion matrix.

Edge Restrictions

In order to implement a mixture of regression and classification, the edges of the input and output nodes should be open-ended (-inf, inf). For backpropagation to work on an RNN, it is advisable to bound the edges to keep feedback from exploding. To accommodate this, we should restrict all hidden and recurrent edges to [-1, 1]. This should be fairly straightforward to implement and test.
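The restriction can be a per-edge clamp applied after each particle update. `EdgeKind` and `clampEdge` are hypothetical names, and applying the clamp after every move is an assumption about where it would fit in the PSO loop. `std::clamp` requires C++17.

```cpp
#include <algorithm>

enum class EdgeKind { Input, Hidden, Recurrent, Output };

// Input/output edges stay open-ended; hidden and recurrent edges are bounded to [-1, 1].
double clampEdge(double w, EdgeKind kind) {
    switch (kind) {
        case EdgeKind::Hidden:
        case EdgeKind::Recurrent:
            return std::clamp(w, -1.0, 1.0);
        default:
            return w;
    }
}
```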

Ini File

We need to be able to store parameters and the most recent input file location in an INI file. This file should be loaded from a default location and checked each time the program starts. If it does not exist, the program should create it with default settings. The INI file should be updated when the program closes.
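The load-or-create behavior can be sketched with the standard library alone, assuming a flat `key=value` file without sections. `Settings`, `saveSettings`, and `loadOrCreate` are hypothetical names. Since the project already uses Qt, `QSettings` constructed with `QSettings::IniFormat` is another option.

```cpp
#include <fstream>
#include <map>
#include <string>

using Settings = std::map<std::string, std::string>;

// Write every setting as key=value; returns false if the write failed.
bool saveSettings(const std::string& path, const Settings& s) {
    std::ofstream out(path);
    for (const auto& [key, value] : s) out << key << '=' << value << '\n';
    return static_cast<bool>(out);
}

// Load key=value pairs from `path`. If the file is missing, create it with
// `defaults`. Keys absent from the file keep their default values.
Settings loadOrCreate(const std::string& path, const Settings& defaults) {
    std::ifstream in(path);
    if (!in) {
        saveSettings(path, defaults);
        return defaults;
    }
    Settings s = defaults;
    std::string line;
    while (std::getline(in, line)) {
        // Skip blank lines, comments, and section headers.
        if (line.empty() || line[0] == ';' || line[0] == '#' || line[0] == '[') continue;
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;
        s[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return s;
}
```

`loadOrCreate` would run at startup and `saveSettings` on close.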

Adding Activation Function to PSO

In this issue, we evaluate using the PSO to optimize the logistic sigmoid function, fine-tuning the slope of the activation function at each node. Dr. Eberhart's Computational Intelligence: Concepts to Implementations describes this as an important aspect of solution convergence; there, he implemented a PSO ANN that optimizes the constant in a logistic sigmoid. This allows some neurons to respond rapidly to inputs and others to respond slowly. To model this behavior, a 3D graph was plotted in MATLAB showing the output of the logistic function for inputs x from -1 to 1 and constants k from 0 to 5.

For the logistic function 1 / (1 + e^(-kx)), constants k close to zero flatten the curve, so the output barely responds to the input and stays near 0.5. Constants k near 5 produce a steep curve that approaches a step function.

The approach to implement this in the PSO would be to apply an extra element in the 3D matrix State. Currently, the State looks as such:

[ T H1 H2 ... Hn R1 R2 ... Rn-1 ]

Here, T, Hi and Ri are 2D matrices holding the topology, hidden layer weights, and recurrent edge weights. Modifying the State as shown below adds an extra element holding the logistic function constant for each hidden node:

[ T H1 H2 ... Hn R1 R2 ... Rn-1 A1 A2 ... An-1 ]

To modify the activation function, the polynomial approximation will be removed and the standard exponential function will be used to calculate the activation explicitly. This will result in slower execution, but the improvement in solution convergence may be a worthwhile tradeoff. The interface for the function will be modified as follows:

real activation(const real & input, const real & k)

where k is the constant defined for that particular node.
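A direct implementation of that interface, using `std::exp` as described, could look like this. The `real` alias is assumed to be `double` here; the project's actual typedef may differ.

```cpp
#include <cmath>

using real = double;  // assumption: the project's `real` is a floating-point typedef

// Logistic sigmoid with a per-node slope constant k: 1 / (1 + e^(-k * input)).
real activation(const real& input, const real& k) {
    return 1.0 / (1.0 + std::exp(-k * input));
}
```

With k = 0 every input maps to 0.5, and with k = 5 an input of -1 already maps to roughly 0.0067, matching the flat-versus-steep behavior described above.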

PSO Inversion

Currently, the PSO operates on several samples and calculates an error after each sample is evaluated. It may be better to actually implement the PSO by taking a single sample at a time and flying the particles for several iterations before moving on to a new sample. This may prove to be more beneficial, although it may also result in heavy overtraining. Currently, overtraining is not an issue, but that may point to why our technique is resulting in such low percentages.

Historical Inputs

Rather than training the ANN only on current-day inputs, it may be better to train it on a moving window that also includes prior inputs (e.g., the previous day). This would require a slight modification to the code so that each OutageDataObject points to the previous day's index, plus a method for setting the number of input nodes based on the history window size.
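A sketch of building the windowed input by following the previous-day index is shown below. The fields of this `OutageDataObject` stand-in (`features`, `prevIndex`) are hypothetical, as are `inputCount` and `windowedInput`; zero-padding missing history is a design choice that keeps the input size fixed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record layout: `features` holds one day's inputs and
// `prevIndex` points at the previous day's record (-1 if none).
struct OutageDataObject {
    std::vector<double> features;
    long prevIndex = -1;
};

// Input-node count grows linearly with the history window.
std::size_t inputCount(std::size_t featuresPerDay, std::size_t windowSize) {
    return featuresPerDay * (windowSize + 1);
}

// Concatenate today's features with `windowSize` previous days by following
// prevIndex; missing history is zero-padded.
std::vector<double> windowedInput(const std::vector<OutageDataObject>& data,
                                  std::size_t index, std::size_t windowSize) {
    const std::size_t perDay = data[index].features.size();
    std::vector<double> out;
    out.reserve(inputCount(perDay, windowSize));
    long cur = static_cast<long>(index);
    for (std::size_t d = 0; d <= windowSize; ++d) {
        if (cur >= 0) {
            const auto& rec = data[static_cast<std::size_t>(cur)];
            out.insert(out.end(), rec.features.begin(), rec.features.end());
            cur = rec.prevIndex;
        } else {
            out.insert(out.end(), perDay, 0.0);
        }
    }
    return out;
}
```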

Modify Training

Currently, the ANN grabs random training inputs for each particle, which may differ between particles or repeat. We need to make training more consistent.

First, we need to implement a shuffled list of all training inputs and then pull samples from that list in order.

Next, we need to make sure each particle in each epoch gets the same sample sets for a good comparison. So, if I=5023 is a sample from the training data in Epoch 1, then all particles from 1 to 50 must train with that same I=5023.

This should improve convergence.
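The two steps can be combined in a small schedule object: one shuffled list, consumed in order, with each epoch's batch drawn once and shared by every particle. `TrainingSchedule` and `nextBatch` are hypothetical names, and reshuffling after a full pass is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Shuffled list of every training index, consumed in order.
class TrainingSchedule {
public:
    TrainingSchedule(std::size_t sampleCount, unsigned seed)
        : order_(sampleCount), rng_(seed) {
        std::iota(order_.begin(), order_.end(), 0);
        std::shuffle(order_.begin(), order_.end(), rng_);
    }

    // Next `batchSize` indices; reshuffles after a full pass over the data.
    std::vector<std::size_t> nextBatch(std::size_t batchSize) {
        std::vector<std::size_t> batch;
        if (order_.empty()) return batch;
        for (std::size_t i = 0; i < batchSize; ++i) {
            if (pos_ == order_.size()) {
                std::shuffle(order_.begin(), order_.end(), rng_);
                pos_ = 0;
            }
            batch.push_back(order_[pos_++]);
        }
        return batch;
    }

private:
    std::vector<std::size_t> order_;
    std::mt19937 rng_;
    std::size_t pos_ = 0;
};
```

In the epoch loop, `nextBatch` would be called once and the same batch evaluated for particles 1 through 50, so their costs are directly comparable.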

Iss5 - Activation Function and Output

We need to analyze the neural net's activation function and verify that it works correctly. Currently, the activation function is based on the Gaussian function with a sigma of 0.35, and for negative inputs the function returns the negative of its output. The values should be restricted to between -1 and 1, but further testing is required.

Also, the output currently uses two nodes: a 1 in output[0] means no PE, and a 1 in output[1] means PE. The code needs to be modified for a single output, where output[0] = 0 means no PE and output[0] = 1 means PE.

Single List for all Nodes

When I first wrote this project, I wrote it as a hobby/test. The first thing that I wrote after the PSO implementation was the neural net nodes. I had made the choice to train the edges, but I wanted to keep the neurons straightforward for the purposes that I was exploring. This resulted in me creating 3 lists for the neural network: input nodes, inner nodes, output nodes. This allowed me to easily define the inner nodes separately from the input and output nodes, as well as to treat the input and output's linear activation separately from the inner nodes. The side effect of this is that we've lost the ability to make a completely simple neural net that does not contain hidden layers.

In this issue, the code will be refactored to hold a single list/vector of neurons, with the inputs at location [0] and the outputs at [list.size() - 1].

MSE Plot

A useful tool for machine learning is the MSE plot. It shows how quickly a solution is converging and visually represents the raw performance of the algorithm. We should add another plot that displays the MSE over epochs.
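The value plotted at each epoch could be computed as below: the mean of squared errors over every sample and output node. `meanSquaredError` is a hypothetical helper; averaging across all output nodes (rather than per node) is an assumption.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Mean squared error over all samples and output nodes for one epoch;
// appending one value per epoch gives the series for the MSE plot.
double meanSquaredError(const std::vector<std::vector<double>>& predicted,
                        const std::vector<std::vector<double>>& expected) {
    if (predicted.size() != expected.size())
        throw std::invalid_argument("sample count mismatch");
    double sum = 0.0;
    std::size_t count = 0;
    for (std::size_t s = 0; s < predicted.size(); ++s) {
        if (predicted[s].size() != expected[s].size())
            throw std::invalid_argument("output size mismatch");
        for (std::size_t o = 0; o < predicted[s].size(); ++o) {
            const double diff = predicted[s][o] - expected[s][o];
            sum += diff * diff;
            ++count;
        }
    }
    return count == 0 ? 0.0 : sum / static_cast<double>(count);
}
```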

Iss7 - Training Interface

Currently, training is implemented by loading labelled data into inputs and outputs and then storing them in the NeuralPso object. The NeuralPso object selects which labels to test by random selection, keeping positive and negative selections balanced, and returns the cost of the test to the inherited Pso object. NeuralPso also serves a dual purpose: it implements the virtual fly() and getCost() functions required to train the PSO, while also containing the trained NeuralNet.

To maintain readability, the training aspect of NeuralPso should be split out and provided through an interface class, NeuralPsoTrainer. Through this class, the correct training operations can be written for the NeuralNet, leaving NeuralPso to handle the interactions between NeuralNet and Pso. The getCost() function can also be re-implemented there, keeping all abstraction of the data itself at the training interface.

This is important due to the current hectic nature of NeuralPso. If we were to change from PE data to stock market data, many functions inside of NeuralPso will need to be re-written, while others will need to avoid being touched. To maintain the appropriate encapsulation, the separation of the training aspect to a training interface object will improve the ability to make quick modifications to the program in simple steps.
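A minimal shape for the interface is sketched below. `NeuralPsoTrainer` and `getCost()` come from the issue; the `Network` alias (a callable standing in for NeuralNet) and the `AndGateTrainer` subclass are hypothetical. In this design, NeuralPso's own getCost() would forward to whichever trainer it holds, so switching from PE data to stock data means writing a new subclass rather than editing NeuralPso.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for a NeuralNet: maps an input vector to one output.
using Network = std::function<double(const std::vector<double>&)>;

// Data-specific training logic lives behind this interface.
class NeuralPsoTrainer {
public:
    virtual ~NeuralPsoTrainer() = default;
    virtual double getCost(const Network& net) = 0;
};

// Example trainer for the AND gate: mean squared error over its four rows.
class AndGateTrainer : public NeuralPsoTrainer {
public:
    double getCost(const Network& net) override {
        const double rows[4][3] = {{0, 0, 0}, {0, 1, 0}, {1, 0, 0}, {1, 1, 1}};
        double sum = 0.0;
        for (const auto& r : rows) {
            const double err = net({r[0], r[1]}) - r[2];
            sum += err * err;
        }
        return sum / 4.0;
    }
};
```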

Iss3 - Pulmonary Embolism

A fellow research student needs help with his pulmonary embolism neural network. I will test his data on my neural network to verify if the data is feasible, and then give guidance with respect to what I've done.

Layer by Layer Training

Once we've implemented training all particles with exactly the same training inputs in each epoch, we should investigate training one layer per epoch. Basically, we change only the last edge layer in the first epoch, then the next-to-last edge layer in the next epoch, and so on until we reach the input layer, then rotate back to the end.
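The rotation reduces to one index computation per epoch. Edge layers are assumed to be numbered 0 (input side) to `edgeLayers - 1` (output side), and epochs from 0; `layerForEpoch` is a hypothetical name. During that epoch, particles would only move along the dimensions belonging to the returned layer.

```cpp
#include <cstddef>

// Epoch 0 trains the last edge layer, epoch 1 the next-to-last, and so on,
// wrapping back to the last layer after the input-side layer.
std::size_t layerForEpoch(std::size_t epoch, std::size_t edgeLayers) {
    if (edgeLayers == 0) return 0;
    return edgeLayers - 1 - (epoch % edgeLayers);
}
```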

Outage Data Trainer

Currently, the project is set up with an AND gate trainer. A wrapper class will need to be developed for the scrubbed input data and its outputs. For now, we have decided on the inputs to be Year, Month, Day, City, County, Reported Event, Storm Type, Daily Precipitation, Daily Low Temp, and Daily High Temp. The outputs will be whether or not there was an outage and the number of affected customers.

Data Cache

Currently, all the labelled data is loaded directly into memory, which would be impossible for extremely large datasets. To alleviate this, a data cache should be designed that loads chunks of data from a file. There should be one class that holds the sections of data and another class that represents a single section. Data will first be accessed through the cache in its corresponding section. If the index is not within a cached section's range, a section will load a new chunk of data from the file.
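The two classes described above can be sketched as a `Segment` (one loaded chunk) and a `DataCache` that holds a fixed number of segments. The least-recently-used eviction policy, the `Loader` callback standing in for the file read, and the `double` record type are all assumptions for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// One section: a contiguous chunk of records loaded from file.
struct Segment {
    std::size_t first = 0;        // index of the first record in the chunk
    std::vector<double> records;  // hypothetical record type
};

// Holds up to `capacity` segments of `segmentSize` records each.
class DataCache {
public:
    using Loader = std::function<std::vector<double>(std::size_t first, std::size_t count)>;

    DataCache(std::size_t segmentSize, std::size_t capacity, Loader loader)
        : segmentSize_(segmentSize), capacity_(capacity == 0 ? 1 : capacity),
          loader_(std::move(loader)) {}

    double get(std::size_t index) {
        const std::size_t first = (index / segmentSize_) * segmentSize_;
        auto it = lookup_.find(first);
        if (it == lookup_.end()) {
            ++misses_;
            if (lookup_.size() == capacity_) {  // evict least recently used
                lookup_.erase(segments_.back().first);
                segments_.pop_back();
            }
            segments_.push_front({first, loader_(first, segmentSize_)});
            it = lookup_.emplace(first, segments_.begin()).first;
        } else {
            // Move the hit segment to the front (most recently used).
            segments_.splice(segments_.begin(), segments_, it->second);
        }
        return it->second->records.at(index - first);
    }

    std::size_t misses() const { return misses_; }

private:
    std::size_t segmentSize_;
    std::size_t capacity_;
    Loader loader_;
    std::list<Segment> segments_;  // front = most recently used
    std::unordered_map<std::size_t, std::list<Segment>::iterator> lookup_;
    std::size_t misses_ = 0;
};
```

Segment size and capacity correspond to the "total segments in cache" and "size of cache" settings discussed in the cache-settings issue.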

Iss2 - Implement OpenCL

Training currently runs serially and takes hours to compute. We need to exploit parallelization to get some speedup. Flying the particles can be parallelized, and we need to figure out what else can be implemented in parallel.

Iss1 - Implement working neural net

The first step of this project is to implement a working neural net with an input layer, inner network nodes, and an output layer. The input, inner, and output values should all be of type double. Weighted edges will connect every node in one layer to every node in the next. For the time being, each layer will be implemented as a list of nodes, and each edge layer as a two-dimensional list of weights. All nodes should only produce values between 0 and 1. The implementation is working if a set of inputs propagates to the outputs with all-uniform weights, resulting in equal output values between 0 and 1.
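The acceptance test described above can be sketched with layers as `std::vector<double>` and edge layers as `[from][to]` weight matrices. Using the logistic sigmoid to keep node values in (0, 1) is an assumption; `propagate` and `uniformLayer` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Layer = std::vector<double>;                   // node values
using EdgeLayer = std::vector<std::vector<double>>;  // weights [from][to]

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Propagate inputs through each edge layer; every non-input node applies a
// logistic sigmoid, so its value stays in (0, 1).
Layer propagate(const Layer& input, const std::vector<EdgeLayer>& edges) {
    Layer current = input;
    for (const EdgeLayer& w : edges) {
        Layer next(w.empty() ? 0 : w[0].size(), 0.0);
        for (std::size_t i = 0; i < current.size(); ++i)
            for (std::size_t j = 0; j < next.size(); ++j)
                next[j] += current[i] * w[i][j];
        for (double& v : next) v = sigmoid(v);
        current = next;
    }
    return current;
}

// Fully connected edge layer with every weight set to `value`.
EdgeLayer uniformLayer(std::size_t from, std::size_t to, double value) {
    return EdgeLayer(from, std::vector<double>(to, value));
}
```

With uniform weights every node in a layer receives the same weighted sum, so all outputs come out equal, which is the check the issue proposes.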

Testing Neural Net

Need to be able to load a previously trained ANN and test it against various data sets. The implementation is there for loading an ANN, but the GUI needs to be created for selecting which test to run and everything needs to be evaluated for proper execution.

PSO and BP Combined

Last night, I read [1], which discusses combining PSO and RNNs. The authors found a tradeoff between PSO and backpropagation: PSO is good at finding a global optimum but not so good at fine-tuning the error, while backpropagation is good at fine-tuning the error but can get trapped in a local optimum. Their novel idea was to train the RNN first with PSO for a fixed maximum number of generations (e.g., 50 generations) and then finalize the training with backpropagation.

I believe it would be beneficial to implement back propagation in the training of the neural net in combination with PSO, as a final approach for fine-tuning the accuracy of our best global solution.
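The two-phase idea can be illustrated on a single weight: a global-best PSO runs for a fixed number of generations, and its best position seeds gradient descent. This is a toy sketch, not the method of [1] or the project's code. Plain gradient descent on one weight stands in for backpropagation, and the inertia and acceleration coefficients (0.7, 1.5, 1.5) are common textbook choices assumed here.

```cpp
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Phase 1: global-best PSO over one weight for a fixed number of generations.
double psoPhase(const std::function<double(double)>& cost, std::size_t particles,
                std::size_t generations, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> init(-10.0, 10.0), r01(0.0, 1.0);
    std::vector<double> x(particles), v(particles, 0.0), pbest(particles), pcost(particles);
    double gbest = 0.0, gcost = 0.0;
    for (std::size_t i = 0; i < particles; ++i) {
        x[i] = pbest[i] = init(rng);
        pcost[i] = cost(x[i]);
        if (i == 0 || pcost[i] < gcost) { gcost = pcost[i]; gbest = x[i]; }
    }
    const double w = 0.7, c1 = 1.5, c2 = 1.5;
    for (std::size_t g = 0; g < generations; ++g) {
        for (std::size_t i = 0; i < particles; ++i) {
            v[i] = w * v[i] + c1 * r01(rng) * (pbest[i] - x[i]) + c2 * r01(rng) * (gbest - x[i]);
            x[i] += v[i];
            const double c = cost(x[i]);
            if (c < pcost[i]) { pcost[i] = c; pbest[i] = x[i]; }
            if (c < gcost) { gcost = c; gbest = x[i]; }
        }
    }
    return gbest;
}

// Phase 2: gradient descent starting from the PSO result (fine-tuning).
double gradientPhase(double w, const std::function<double(double)>& grad,
                     double rate, std::size_t steps) {
    for (std::size_t s = 0; s < steps; ++s) w -= rate * grad(w);
    return w;
}
```

For example, with cost (w - 3)^2 and gradient 2(w - 3), `psoPhase(cost, 20, 50, seed)` gets close to 3 and `gradientPhase(result, grad, 0.1, 200)` refines it to near machine precision.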

[1] P. Xiao, G. K. Venayagamoorthy and K. A. Corzine, "Combined Training of Recurrent Neural Networks with Particle Swarm Optimization and Backpropagation Algorithms for Impedance Identification," 2007 IEEE Swarm Intelligence Symposium, Honolulu, HI, 2007, pp. 9-15.
doi: 10.1109/SIS.2007.368020
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4223149&isnumber=4223144

AND Gate GUI

In order to tweak the neural net for more accurate data with a known output, the program should be modified from PE to an AND Gate.

Remove boost references for ECE 570 project

We have no use for multi-threading the logger due to the nature of Qt. We may want to use threading in the future, but I suggest implementing the use of standard threads. The boost library is used for nothing else. This branch will remove the unnecessary boost code that causes installations to be complicated.

Bias Node

It was found that the neural net should have a bias node set to the constant value 1. This helps with activation functions where an input of 0 results in an output of 0 regardless of training. The bias weights can be trained as extra edges and used as needed to provide an "ON" state for any node in cases where a 0 would otherwise produce an incorrect "OFF".

Enable/Disable Inputs and Outputs

We need a GUI that allows us to select which inputs and outputs to disable; mostly, we need to disable specific inputs. The GUI should have a check box for every input and every output, and the NeuralNet will update its input and output topology based on these settings.

Iss6 - PSO on Topology

It is difficult to decide on the appropriate conditions for the neural net. With the current weights and floors established for the MSE, accuracy, precision, sensitivity, specificity, and f-score, the resulting neural net for any structure can vary widely. Along with the weights and floors, changing the structure of the neural net itself also produces widely varying results.

This produces a particular circumstance where it is necessary to implement some sort of optimization on the NeuralPso itself. In this case, there will need to be a decision on what the appropriate fitness will be for the said Meta PSO. The conditions will most likely change for the specific purpose of the Neural Net, so the Meta PSO cost will also need to be generalized.

As an example, I am attempting to use the PE data to train the neural net. The PE data is limited, with about 2000 data points. The purpose of the data is to detect which patients may have a PE in order to send them to radiology. At present, all patients suspected of having a PE are sent to radiology for an X-ray, which exposes patients unnecessarily. The neural net should be capable of discriminating which patients should be X-rayed and which shouldn't. The obvious optimization is to reduce the number of patients sent to radiology, but the obvious risk is missing patients who do have a PE. So, for this particular case, we should classify as many patients as negative as possible, as long as we still pick up every PE-positive patient.

The objective of the neural net changes with the problem. For PE, we need to catch every PE patient while reducing the number of non-PE patients flagged. If we were trying to predict the stock market, we would instead need to increase the overall accuracy of both negative and positive results. The Meta PSO would need to take this into account.
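One way a generalized Meta PSO cost could encode these two objectives is sketched below (lower is better). The encoding is an assumption. For PE, any missed positive makes the cost exceed 1, so a network that catches every PE patient always beats one that misses any; among those, higher specificity wins. For the stock-market case, the cost is simply 1 - accuracy. `Confusion`, `Objective`, and `metaCost` are hypothetical names.

```cpp
#include <cstddef>

struct Confusion {
    std::size_t tp = 0, fn = 0, tn = 0, fp = 0;
};

double ratio(std::size_t num, std::size_t den) {
    return den == 0 ? 0.0 : static_cast<double>(num) / static_cast<double>(den);
}
double sensitivity(const Confusion& c) { return ratio(c.tp, c.tp + c.fn); }
double specificity(const Confusion& c) { return ratio(c.tn, c.tn + c.fp); }
double accuracy(const Confusion& c) { return ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn); }

enum class Objective { CatchAllPositives, OverallAccuracy };

// Lower is better, since the PSO minimizes cost.
double metaCost(const Confusion& c, Objective obj) {
    switch (obj) {
        case Objective::CatchAllPositives: {
            const double sens = sensitivity(c);
            if (sens < 1.0) return 1.0 + (1.0 - sens);  // missed positives dominate
            return 1.0 - specificity(c);                 // then reduce false alarms
        }
        case Objective::OverallAccuracy:
            return 1.0 - accuracy(c);
    }
    return 2.0;
}
```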

Separate Networks Regression and Classifier

After several failed attempts to train the ANN with different configurations, the highest accuracy that can be achieved on any combination of test inputs is 64%. One of the problems may be due to the training of two different nodes for classification and regression. It may be necessary to separate the two concepts and to train a neural network for each output. This would mean that there would be two neural networks in parallel to process the two events. With the ability to shut off certain outputs for training, it may be beneficial to try to train for one node at a time, as opposed to both nodes.

Automated Testing

We should now be able to write an automated test. For this particular case, the test should increment the number of nodes up to a maximum, and then increment the number of layers. For each node count/topology, the test should evaluate the ANN as Recurrent and as Feedforward, with both Dynamic and Static topologies. Each test should be run N times as a Monte Carlo simulation, and the best ANN should be kept with respect to accuracy, MSE, and non-zero true positive and true negative results. This test is expected to last several hours or possibly even days, after which it will produce a list and statistics of the best selected global bests for all training cases.
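The test grid and the best-of-N selection can be sketched as below. `TestConfig`, `RunResult`, `enumerateTests`, and `bestRun` are hypothetical. The ordering (require non-zero true positives and true negatives, then maximize accuracy, then break ties by lower MSE) is one assumed reading of the criteria above.

```cpp
#include <cstddef>
#include <initializer_list>
#include <vector>

struct TestConfig {
    std::size_t layers;
    std::size_t nodesPerLayer;
    bool recurrent;
    bool dynamicTopology;
};

// Increment nodes up to maxNodes, then add a layer; each topology is crossed
// with {feedforward, recurrent} x {static, dynamic}.
std::vector<TestConfig> enumerateTests(std::size_t maxLayers, std::size_t maxNodes) {
    std::vector<TestConfig> out;
    for (std::size_t l = 1; l <= maxLayers; ++l)
        for (std::size_t n = 1; n <= maxNodes; ++n)
            for (bool rec : {false, true})
                for (bool dyn : {false, true})
                    out.push_back({l, n, rec, dyn});
    return out;
}

struct RunResult {
    double accuracy;
    double mse;
    std::size_t truePositives;
    std::size_t trueNegatives;
};

// Index of the best Monte Carlo run, or runs.size() if none qualifies.
std::size_t bestRun(const std::vector<RunResult>& runs) {
    std::size_t best = runs.size();
    for (std::size_t i = 0; i < runs.size(); ++i) {
        const RunResult& r = runs[i];
        if (r.truePositives == 0 || r.trueNegatives == 0) continue;
        if (best == runs.size() || r.accuracy > runs[best].accuracy ||
            (r.accuracy == runs[best].accuracy && r.mse < runs[best].mse))
            best = i;
    }
    return best;
}
```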

Iss4 - Qt Implementation

The project was originally developed in Code::Blocks. In order to implement a GUI, the project should be moved to Qt. This will require removing much of the debugging output. The dual-threaded command input and PSO execution will need to be changed or removed. The GUI will show the neural net's weights as it trains, and will provide a confusion matrix and simple inputs for changing the PSO fitness function and neural net parameters.

Vector Overflow when Run Pressed

There's a bug where the program crashes when "Run" is pressed in normal mode. When debugging, the problem does not appear. It seems to involve QwtPlot, but it is unknown which plot the problem exists in; most likely, it is either the Fitness Plot or the ANN visualization.

Change to Classification Output

We should implement our outputs as full classification, based on the number of affected customers, instead of trying to predict the regression from a single node. This should make it easier for the ANN to train for a specific output. There will be 5 nodes, as follows:

A = 0
0 < A <= 10
10 < A <= 100
100 < A <= 1000
1000 < A

where each of the nodes represents the number of customers expected to be affected (severity).
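The bucketing above maps directly to a class index, and a one-hot vector gives the five-node training target. `severityClass` and `severityTarget` are hypothetical names, and 0/1 target values are an assumption (the network may use a different range).

```cpp
#include <cstddef>
#include <vector>

// Map the affected-customer count A to one of the five severity nodes.
std::size_t severityClass(std::size_t affected) {
    if (affected == 0) return 0;     // A = 0
    if (affected <= 10) return 1;    // 0 < A <= 10
    if (affected <= 100) return 2;   // 10 < A <= 100
    if (affected <= 1000) return 3;  // 100 < A <= 1000
    return 4;                        // 1000 < A
}

// One-hot target for the five output nodes.
std::vector<double> severityTarget(std::size_t affected) {
    std::vector<double> target(5, 0.0);
    target[severityClass(affected)] = 1.0;
    return target;
}
```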

We may need to balance the testing data, but we'll do so later if it becomes an issue. The last two levels may be difficult to train for.

Bias Node Revisit

In a previous issue, a bias node was introduced as an input with value 1, connected to the first hidden layer. Although the idea was right, the implementation is incorrect. Rather than a single bias node at the input, there should be a bias node for every hidden layer as well as for the output layer. Currently, when training for an outage event, the ANN treats 0 affected individuals as the most general case and attempts to force every node in the last layer to output -1 (where -1 maps to 0 and +1 maps to the maximum number of affected individuals). This drives the weights of every node connected to the regression node to -1. A bias node should remedy this, allowing each node to have sparser activations for particular generalizations while still allowing a default value of -1 to be set.

This issue will look to remedy this.
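A per-layer bias can be modeled as one extra incoming weight per node, multiplied by a constant 1, in every hidden layer and in the output layer. The weight layout (bias stored as the last entry) and the `tanh` activation, chosen because the outputs here range over (-1, 1), are assumptions. `forwardLayer` and `forward` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// weights[j] holds node j's incoming weights: one per previous-layer node,
// followed by a final entry for that layer's bias node (constant 1).
using LayerWeights = std::vector<std::vector<double>>;

std::vector<double> forwardLayer(const std::vector<double>& in, const LayerWeights& weights) {
    std::vector<double> out;
    out.reserve(weights.size());
    for (const auto& w : weights) {
        double sum = w.back() * 1.0;  // bias node contribution
        for (std::size_t i = 0; i < in.size(); ++i) sum += w[i] * in[i];
        out.push_back(std::tanh(sum));  // output in (-1, 1)
    }
    return out;
}

// Every hidden layer and the output layer get their own bias weights.
std::vector<double> forward(std::vector<double> x, const std::vector<LayerWeights>& layers) {
    for (const auto& layer : layers) x = forwardLayer(x, layer);
    return x;
}
```

With this layout, the output node can sit near -1 through its bias weight alone, without forcing every connected weight toward -1.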

Move PE To different repo

The PE project has a few bugs that should be fixed. It is still an important project, but its changes should be separated from the current repo. Open a new branch, bring in the bug fixes for the PE project, and move that project to its own repo.

Validate Topology

One of the problems with the Topology being in the ANN is that an entire layer can shut itself off. There needs to be a function implemented that returns true if there exists a path from at least one input to at least one output. This would require a BFS that would have to search each input node for each output node, with a worst case complexity of O(Inputs * Outputs * (Edges + Nodes)). Dynamic programming using memoization could reduce this complexity, but a simple brute force BFS should be implemented first, and then optimized from there.
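A brute-force version of the check is sketched below, assuming the active topology is stored as an adjacency list where `adjacency[u]` lists the nodes that `u` has an active edge to. `hasInputOutputPath` is a hypothetical name. Because a single BFS from one input visits every output reachable from it, this version runs one BFS per input.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// True if at least one input node can reach at least one output node.
bool hasInputOutputPath(const std::vector<std::vector<std::size_t>>& adjacency,
                        const std::vector<std::size_t>& inputs,
                        const std::vector<std::size_t>& outputs) {
    std::vector<bool> isOutput(adjacency.size(), false);
    for (std::size_t o : outputs) isOutput[o] = true;

    for (std::size_t start : inputs) {  // brute force: one BFS per input
        std::vector<bool> seen(adjacency.size(), false);
        std::queue<std::size_t> q;
        q.push(start);
        seen[start] = true;
        while (!q.empty()) {
            const std::size_t u = q.front();
            q.pop();
            if (isOutput[u]) return true;
            for (std::size_t v : adjacency[u]) {
                if (!seen[v]) {
                    seen[v] = true;
                    q.push(v);
                }
            }
        }
    }
    return false;
}
```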

Reconfigure validation and testing

I need a new branch to set up validation and testing configurations. Currently, the program is hard-wired to handle testing sets, but the introduction of validation sets needs to be addressed.

Debugging Neural Net

Many problems arise when the neural net is modified in any way and not all bugs and mistakes are caught in the process. There is no real way to test that the neural net is behaving properly, other than breaking each of its functions down into known test cases and stepping through them in a debugger.

One test case that may be the best way to debug the neural net under any particular condition is to feed it the expected outputs rather than the inputs. If we feed the neural net its outputs, there should be a one-to-one correlation between its input and its output. This gives us the best base-case scenario, so that different parameters, such as hidden layer size, feedforward vs. recurrent, and others, can be validated. If the neural net cannot attain an extremely high accuracy when predicting its output from its expected output, then we'll know there's a problem with the ANN.

This test case should be implemented as a function. A button on the GUI should call this function, which will construct a neural net based on the current parameters, but re-assign the inputs of the neural net with its outputs, and run a typical PSO training. If the accuracy of the ANN is not substantially high, then the ANN is not behaving correctly and should be evaluated.

Classification Matrix

Currently, the confusion matrix used for calculating accuracy is only sufficient for binary classification. Since we now have 5 classification outputs, we should implement a classification matrix that shows the predicted and expected values for each classification node. The sum of the counts along the diagonal of the matrix, divided by the total number of samples, then gives the true accuracy of the ANN.
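A minimal classification matrix is sketched below, with rows for the expected class and columns for the predicted class. Taking the predicted class as the most active output node is an assumption. `ClassificationMatrix` and `argmaxClass` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

class ClassificationMatrix {
public:
    explicit ClassificationMatrix(std::size_t classes)
        : counts_(classes, std::vector<std::size_t>(classes, 0)) {}

    // Row = expected class, column = predicted class.
    void add(std::size_t expected, std::size_t predicted) {
        ++counts_.at(expected).at(predicted);
        ++total_;
    }

    std::size_t count(std::size_t expected, std::size_t predicted) const {
        return counts_.at(expected).at(predicted);
    }

    // Accuracy: correct predictions (the diagonal) divided by all samples.
    double accuracy() const {
        if (total_ == 0) return 0.0;
        std::size_t correct = 0;
        for (std::size_t i = 0; i < counts_.size(); ++i) correct += counts_[i][i];
        return static_cast<double>(correct) / static_cast<double>(total_);
    }

private:
    std::vector<std::vector<std::size_t>> counts_;
    std::size_t total_ = 0;
};

// Predicted class = index of the most active output node.
std::size_t argmaxClass(const std::vector<double>& outputs) {
    return static_cast<std::size_t>(
        std::distance(outputs.begin(), std::max_element(outputs.begin(), outputs.end())));
}
```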

Automated Test Procedure

Currently, we are running various tests by hand, which, as expected, is becoming tedious. To further our progress, a more complex automated test procedure should be implemented that can generate tests automatically based on the accuracy of the overall best ANN. This procedure should smartly choose which datasets to use, based on independent tests of separate LOAs. It should also implement smart ANN selection: determine the best ANN for a given set of inputs with topological training enabled, and then disable topological training and use the resulting topology. Once an ANN is chosen, the input values should be tweaked to determine which data provides the best accuracy, and then further training and validation should proceed on the new selection.

For further consideration, the procedure may need to isolate nodes with minimal degree (or some other metric) to determine which nodes to drop when training continues.

A possible implementation would require a configuration file containing the locations of all files, separated by LOA, and possibly a schedule.
