Testing and validation are important stages in modelling. The model may be a piece of software, a simulation, a prototype, and so on. The System Development Life Cycle (SDLC) includes testing as its last stage before implementation. We can use “black box” or “white box” testing to ensure that the software we have just created follows the requirements. For more information about testing, see a software engineering textbook. This post focuses on validation.
The difference between Epoch and Iteration
In soft computing, modelling an application requires training: an iterative process that adjusts a model's parameters to achieve some goal. In neural networks, the term epoch is used alongside iteration. An interesting explanation can be found on this site by an MIT Ph.D. student: one epoch is one forward pass and one backward pass over all the training examples, whereas one iteration is one forward and backward pass over a single batch of examples. When the whole training set is processed as a single batch, an epoch and an iteration coincide.
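To make the distinction concrete, here is a minimal sketch assuming plain mini-batch gradient descent on a one-parameter linear model y = w * x. The model, the learning rate and the function names `iterations_per_epoch` and `train` are illustrative assumptions, not taken from any particular library. Each inner-loop pass performs a forward and a backward pass on one batch and counts as one iteration; each outer-loop pass over all samples counts as one epoch.

```python
import math


def iterations_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of parameter updates (iterations) needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


def train(xs, ys, batch_size, epochs, lr=0.01):
    """Fit y = w * x by mini-batch gradient descent (illustrative model).

    Returns the learned weight and the total number of iterations performed.
    """
    w = 0.0
    iterations = 0
    for _ in range(epochs):  # one epoch = one pass over all samples
        for start in range(0, len(xs), batch_size):
            bx = xs[start:start + batch_size]
            by = ys[start:start + batch_size]
            # Forward pass: predictions for this batch only.
            preds = [w * x for x in bx]
            # Backward pass: gradient of the batch mean squared error w.r.t. w.
            grad = sum(2 * (p - y) * x for p, y, x in zip(preds, by, bx)) / len(bx)
            w -= lr * grad  # one iteration = one parameter update
            iterations += 1
    return w, iterations
```

With 10 samples and a batch size of 4, one epoch takes 3 iterations, so 5 epochs perform 15 iterations.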
Testing and Training
For training, the data must be prepared. Both the amount of data and its cleanliness must be considered, so apply a data-cleansing method before training. One neural network textbook, Neural Network Design by Hagan, suggests normalizing the training data using the Euclidean norm.
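A minimal sketch of both preparation steps follows. It assumes that Euclidean normalization means scaling each sample vector to unit Euclidean (L2) length, and that cleansing simply drops samples with missing values; both are illustrative assumptions, and the helper names `clean` and `euclidean_normalize` are hypothetical.

```python
import math


def clean(rows):
    """Drop samples that contain missing values (None or NaN)."""
    return [r for r in rows if all(v is not None and not math.isnan(v) for v in r)]


def euclidean_normalize(rows):
    """Scale each sample vector to unit Euclidean (L2) length.

    All-zero vectors have no direction, so they are returned unchanged.
    """
    out = []
    for r in rows:
        norm = math.sqrt(sum(v * v for v in r))
        out.append([v / norm for v in r] if norm > 0 else list(r))
    return out
```

For example, `euclidean_normalize(clean([[3.0, 4.0], [None, 1.0]]))` keeps only the first sample and scales it to `[0.6, 0.8]`.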
Instead of using all the data for training, some applications split the data into two blocks: one for training and one for testing. For example, in the IDRISI software, when a model uses a multi-layer perceptron (MLP) neural network, the samples are split into a training set and a testing set.
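A random training/testing split can be sketched as follows. The 30% test fraction and the function name `split_train_test` are arbitrary choices for illustration and are not IDRISI's settings; the fixed seed only makes the shuffle reproducible.

```python
import random


def split_train_test(samples, test_fraction=0.3, seed=0):
    """Shuffle the samples and split them into (training, testing) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Shuffling before splitting avoids a test set that contains only the last records of a file, which may be ordered by date or by region.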
To check that the training and testing sets are well designed, MATLAB provides the crossvalind function for cross-validation. Cross-validation rotates the roles of the data, so that data used for training in one round is used for testing in another, and vice versa. Today, 10-fold cross-validation is a common standard: the data are divided into 10 blocks (folds), and in each of 10 rounds nine blocks are used for training and the remaining one for testing, so every block serves as the test set exactly once (see the explanation).
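The fold-assignment idea behind 10-fold cross-validation can be sketched in plain Python. This is not MATLAB's `crossvalind`; the hypothetical helpers below only illustrate the same idea of giving each sample a fold number and then letting each fold serve once as the test set (folds are numbered from 0 here).

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Assign each sample a fold number 0..k-1; fold sizes differ by at most one."""
    folds = [i % k for i in range(n_samples)]
    random.Random(seed).shuffle(folds)
    return folds


def kfold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k rounds."""
    folds = kfold_indices(n_samples, k, seed)
    for test_fold in range(k):
        train = [i for i, f in enumerate(folds) if f != test_fold]
        test = [i for i, f in enumerate(folds) if f == test_fold]
        yield train, test
```

Averaging a model's accuracy over the k rounds gives an estimate that does not depend on one particular choice of test block.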
Receiver Operating Characteristic (ROC)
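ROC analysis compares a model's predictions with the actual outcomes. If the system predicts that a signature's owner is John and the owner actually is John, the outcome is called a “true positive”; if the system predicts John but the owner is someone else, it is a “false positive”. The remaining two outcomes are the true negative (predicted not John, actually not John) and the false negative (predicted not John, actually John). An ROC curve plots the true positive rate against the false positive rate as the decision threshold varies. The Area Under the ROC Curve (AUC; see the AUC post) is a well-known measure that IDRISI uses to assess the accuracy of a model.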
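The four outcomes and the AUC can be computed with a short sketch, assuming boolean labels (True meaning “the owner is John”) and a numeric score per sample where a higher score means “more likely John”. The helper names are hypothetical, and the trapezoidal rule over every distinct score threshold is one common way to compute the AUC, not necessarily the one IDRISI uses.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for boolean labels (True = 'owner is John')."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tp, fp, tn, fn


def roc_auc(actual, scores):
    """Area under the ROC curve via the trapezoidal rule over all score thresholds."""
    pos = sum(1 for a in actual if a)
    neg = len(actual) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        predicted = [s >= t for s in scores]
        tp, fp, _, _ = confusion_counts(actual, predicted)
        points.append((fp / neg, tp / pos))  # (false positive rate, true positive rate)
    # The lowest threshold predicts everything positive, so the last point is (1, 1).
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For actual labels `[True, False, True, False]` and scores `[0.9, 0.8, 0.4, 0.2]`, `roc_auc` returns 0.75: three of the four positive/negative pairs are ranked correctly.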
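In Land Change Modeler, a model makes one of two predictions for each location: change or persistence. For example, if we predict that a location will have changed by 2015 but it actually has not, the result is called a false alarm. Instead of true and false positives and negatives, the terms Hits, Misses, and False Alarms are used. The explanation from the IDRISI site, where each pattern lists the earlier land cover, the predicted land cover, and the actual later land cover: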
- A | B | B = Hits – Model predicted change and it changed
- A | A | B = Misses – Model predicted persistence and it changed
- A | B | A = False Alarms – Model predicted change and it persisted
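The three patterns can be tallied cell by cell with a sketch like the one below. It assumes each map is a flat sequence of land-cover labels, compared in the order earlier | predicted | actual used in the patterns above. Combinations outside those three patterns (such as A | A | A, or A | B | C where the location changed to a different class than predicted) are left uncounted here, and the function names are hypothetical.

```python
from collections import Counter


def lcm_category(earlier, predicted, actual):
    """Classify one cell using the earlier | predicted | actual patterns."""
    if predicted != earlier and actual == predicted:
        return "Hit"          # A | B | B: predicted change and it changed
    if predicted == earlier and actual != earlier:
        return "Miss"         # A | A | B: predicted persistence and it changed
    if predicted != earlier and actual == earlier:
        return "False Alarm"  # A | B | A: predicted change and it persisted
    return None               # outside the three patterns (e.g. A | A | A, A | B | C)


def tally(earlier_map, predicted_map, actual_map):
    """Count Hits, Misses, and False Alarms over all cells of three aligned maps."""
    counts = Counter(
        lcm_category(e, p, a) for e, p, a in zip(earlier_map, predicted_map, actual_map)
    )
    counts.pop(None, None)  # drop cells that match none of the three patterns
    return counts
```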
Other software and vertical applications may use different terms in their validation calculations.