Source Code & Executables

 


Creating a Project and Compiling the Source Code Here are two PowerPoint files showing the steps to create a project (Visual Studio) or a solution (Visual Studio.Net) and compile the course source code files. These PowerPoint files are large (more than 900 KB) and may download slowly.

Visual C++ 6.0 screen shots (~ 1.6 MB)

Visual Studio.Net screen shots (~ 1 MB)


Float_error I've written a C++ program that uses repeated multiplication and division to explore the computational issues that arise when real numbers are represented as floating-point numbers in computers. Here is the header with the declarations of the Float_store class, and here is the cpp control file. Here is a Windows executable of the program. Feel free to download the files and experiment with them. All I ask is that you give me proper source citation for the work contained in the files. © 2004 John Aleshunas
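Here is a minimal sketch of the kind of experiment described above. It is not the Float_store class. The function names and the choice of `float` are assumptions made for illustration. When the factor is a power of two, the round trip is exact, because only the exponent changes. The round trip fails once an intermediate product overflows to infinity.

```cpp
#include <cmath>

// Multiply x by `factor` n times, then divide by the same factor n times.
// In exact arithmetic the result equals x; in float it may not.
float multiply_then_divide(float x, float factor, int n) {
    for (int i = 0; i < n; ++i) x *= factor;
    for (int i = 0; i < n; ++i) x /= factor;
    return x;
}

// Relative error of the round trip, measured in double precision.
double round_trip_error(float x, float factor, int n) {
    const double start = x;
    const double back = multiply_then_divide(x, factor, n);
    return std::fabs(back - start) / std::fabs(start);
}
```

You can use `round_trip_error(0.1f, 3.0f, 20)` to measure any drift for a factor that is not a power of two. The exact amount depends on the platform's rounding.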


Fibonacci This is a C++ program that uses the Fibonacci sequence to explore the computational issues of representing integers in computers. Here is a zip file containing the application source code. Here is a Windows executable of the program. Feel free to download the files and experiment with them. All I ask is that you give me proper source citation for the work contained in the files. © 2004 John Aleshunas
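The source does not say which integer type the program uses. This minimal sketch assumes 64-bit unsigned integers. The generator stops just before the next sum would wrap around, so the last term it produces is F(93), the largest Fibonacci number that fits in `std::uint64_t`. The sketch checks `a > max - b` before adding so that it never relies on wrap-around. With a signed type, the overflow itself would be undefined behavior. The function name is hypothetical.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Generate F(0), F(1), ... in 64-bit unsigned arithmetic, stopping before
// the first term that would exceed the type's maximum value.
std::vector<std::uint64_t> fibonacci_until_overflow() {
    std::vector<std::uint64_t> terms{0, 1};
    const std::uint64_t max = std::numeric_limits<std::uint64_t>::max();
    for (;;) {
        const std::uint64_t a = terms[terms.size() - 2];
        const std::uint64_t b = terms.back();
        if (a > max - b) break;  // a + b would not fit
        terms.push_back(a + b);
    }
    return terms;
}
```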

Here is another variation of the Fibonacci generator. This version generates very large Fibonacci numbers correctly. Here is a zip file containing the application source code, and here is the Windows executable. You can compare this source code with the version above to see one way of handling very large integers in computer computations. © 2007 John Aleshunas
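The method in the zip file is not described here, so the following shows only one common approach, which may not be the author's. Each number is stored as a vector of decimal digits, least significant first, and two numbers are added column by column with a carry, as you would on paper. The names `BigNum`, `big_add`, `big_to_string`, and `big_fibonacci` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A non-negative integer stored as decimal digits, least significant first.
using BigNum = std::vector<int>;

// Schoolbook addition with carry.
BigNum big_add(const BigNum& a, const BigNum& b) {
    BigNum sum;
    int carry = 0;
    for (std::size_t i = 0; i < std::max(a.size(), b.size()) || carry != 0; ++i) {
        int digit = carry;
        if (i < a.size()) digit += a[i];
        if (i < b.size()) digit += b[i];
        sum.push_back(digit % 10);
        carry = digit / 10;
    }
    return sum;
}

// Render with the most significant digit first.
std::string big_to_string(const BigNum& n) {
    std::string s;
    for (auto it = n.rbegin(); it != n.rend(); ++it) s.push_back(static_cast<char>('0' + *it));
    return s.empty() ? "0" : s;
}

// F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2), with no fixed upper limit.
std::string big_fibonacci(unsigned n) {
    BigNum a{0}, b{1};
    for (unsigned i = 0; i < n; ++i) {
        BigNum next = big_add(a, b);
        a = std::move(b);
        b = std::move(next);
    }
    return big_to_string(a);
}
```

For example, `big_fibonacci(100)` returns "354224848179261915075", which is far beyond the range of any built-in integer type.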


Float_error2 Here is another C++ program that uses repeated squaring and square roots to explore the computational issues of representing real numbers as floating-point numbers in computers. Here is the header with the declarations of the Float_store class, and here is the cpp control file. The Float_store class uses a vector<float> to store the successive results as a stack. Each result is the square of the value stored beneath it, so the stack makes the root easy to find: dividing a result by the former value gives its root. Here is a Windows executable of the program. Feel free to download the files and experiment with them. All I ask is that you give me proper source citation for the work contained in the files. © 2004 John Aleshunas
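Here is a minimal sketch of the squaring-and-rooting experiment. Like the paragraph describes, it uses a `std::vector<float>` as a stack. The function names are hypothetical, and this is not the Float_store class itself. Every earlier result stays on the stack, so at each level the value that `std::sqrt` should return is still available for comparison.

```cpp
#include <cmath>
#include <vector>

// Square a starting value repeatedly, pushing every result onto a
// std::vector<float> used as a stack (the newest value is at the back).
std::vector<float> square_chain(float start, int steps) {
    std::vector<float> stack{start};
    for (int i = 0; i < steps; ++i) stack.push_back(stack.back() * stack.back());
    return stack;
}

// Pop the stack one level at a time. The value beneath each popped result is
// the value that was squared to produce it, so std::sqrt can be checked
// against it. Returns the largest relative difference found.
// Assumes a nonzero starting value.
double max_root_discrepancy(std::vector<float> stack) {
    double worst = 0.0;
    while (stack.size() > 1) {
        const float top = stack.back();
        stack.pop_back();
        const float former = stack.back();
        const float root = std::sqrt(top);
        const double diff = std::fabs(static_cast<double>(root) - former) /
                            std::fabs(static_cast<double>(former));
        if (diff > worst) worst = diff;
    }
    return worst;
}
```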


Multivariate This C++ application reads a data set from disk and computes descriptive statistics for the data. Here is the header with the declarations of the Flower and Flower_set classes, and here is the cpp control file. The Flower_set class is a good example of a dynamically sized storage class built on vector<Flower>. Feel free to download the files and experiment with them. All I ask is that you give me proper source citation for the work contained in the files.

Multivariate partitions the input dataset into a subset for each label and also retains a copy of the total set. It computes the mean, maximum, minimum, variance, and standard deviation for each data subset and writes the resulting descriptive statistics to a text file on disk for your analysis.
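The statistics listed above can be sketched for a single attribute as follows. The source does not say whether Multivariate divides the variance by n or by n - 1. This sketch assumes the sample variance (n - 1). The names `Stats`, `describe`, and `group_by_label` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct Stats {
    double mean, min, max, variance, std_dev;
};

// Descriptive statistics for one attribute. The variance uses n - 1
// (sample variance); that divisor is an assumption of this sketch.
Stats describe(const std::vector<double>& x) {
    if (x.size() < 2) throw std::invalid_argument("need at least two values");
    double sum = 0.0;
    for (double v : x) sum += v;
    const double mean = sum / static_cast<double>(x.size());
    double squares = 0.0;
    for (double v : x) squares += (v - mean) * (v - mean);
    const double variance = squares / static_cast<double>(x.size() - 1);
    const auto range = std::minmax_element(x.begin(), x.end());
    return {mean, *range.first, *range.second, variance, std::sqrt(variance)};
}

// Split one attribute's values into subsets keyed by class label.
std::map<std::string, std::vector<double>> group_by_label(
    const std::vector<std::string>& labels, const std::vector<double>& values) {
    std::map<std::string, std::vector<double>> groups;
    for (std::size_t i = 0; i < labels.size() && i < values.size(); ++i)
        groups[labels[i]].push_back(values[i]);
    return groups;
}
```

To mirror the per-subset and total-set statistics described above, call `describe` on each group returned by `group_by_label`, and once more on the whole attribute column.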

The data set used with this application is the Iris dataset introduced by Fisher in 1936. The three iris species in the set provide good classification examples. The iris data set can be downloaded here. A description of the dataset is here. The description discusses the classification nuances of the three species but doesn't explain the reason for this behavior; finding that explanation is left as an exercise for you.


Monty Hall Puzzle This is a C++ implementation of the three-door problem from "Let's Make a Deal," starring Monty Hall. The program lets you experiment with the problem and keeps statistics of your success rate. It presents a menu so that you can choose to play individual sessions or run a batch of sessions automatically. You can also reset the application statistics and restart your session. Here is the header with the declarations of the application classes, and here is the cpp control file. Here is a Windows executable of the program. Feel free to download the files and experiment with them. All I ask is that you give me proper source citation for the work contained in the files. © 2005 John Aleshunas
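A short simulation shows why switching pays off. This sketch is independent of the program's classes, and its function names are hypothetical. The host always opens a door that is neither the player's pick nor the prize. Switching therefore wins exactly when the first pick was wrong, which happens about two times in three.

```cpp
#include <random>

// One round of the three-door game. Doors are numbered 0, 1, 2.
// Returns true if the player ends up with the prize.
bool play_round(std::mt19937& rng, bool switch_door) {
    std::uniform_int_distribution<int> door(0, 2);
    const int prize = door(rng);
    const int pick = door(rng);
    // The host opens a door that is neither the pick nor the prize. When two
    // doors qualify, this sketch opens the lower-numbered one; that choice
    // does not change the win rates.
    int opened = 0;
    while (opened == pick || opened == prize) ++opened;
    // The door numbers sum to 3, so the remaining closed door is 3 - pick - opened.
    const int final_pick = switch_door ? 3 - pick - opened : pick;
    return final_pick == prize;
}

// Fraction of rounds won over an automatic batch of games.
double win_rate(bool switch_door, int rounds, unsigned seed) {
    std::mt19937 rng(seed);
    int wins = 0;
    for (int i = 0; i < rounds; ++i)
        if (play_round(rng, switch_door)) ++wins;
    return static_cast<double>(wins) / rounds;
}
```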


K-Means (single attribute version) This is a C++ implementation of the K-Means algorithm. Here is the header with the declarations of the application classes, and here is the cpp control file. Here is a Windows executable of the program. This program uses two supporting files: a control file and an input data file. Here are sample control files (for Iris and Wine data). The application looks for a file named Control.dat, so you will need to rename a sample control file to Control.dat and edit it to identify which input and output data files you are using. Here are some data files you can use to experiment with this application: an iris dataset with no noise, the full iris dataset, a noisy iris dataset, a very noisy iris dataset, and finally the full wine dataset. The iris datasets use the petal length data and the wine dataset uses the Flavanoids data for this experiment. Feel free to download the files and experiment with them. All I ask is that you give me proper source citation for the work contained in the files. © 2004 John Aleshunas

If you prefer a GUI version of this application rather than a command line version, use the Canary Project version.
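For a single attribute such as petal length, K-Means alternates two steps. First, each value is assigned to the nearest centroid. Then each centroid moves to the mean of its assigned values. Here is a minimal sketch under stated assumptions. The caller supplies the initial centroids, because how the program seeds them is not described here. The name `kmeans_1d` is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One-dimensional K-Means. `centroids` holds the (caller-chosen) starting
// centroids and must not be empty. On return, assignment[i] is the cluster
// index of values[i]; the final centroids are returned.
std::vector<double> kmeans_1d(const std::vector<double>& values,
                              std::vector<double> centroids,
                              std::vector<std::size_t>& assignment,
                              int max_iterations = 100) {
    assignment.assign(values.size(), 0);
    for (int iter = 0; iter < max_iterations; ++iter) {
        bool changed = false;
        // Assignment step: each value joins the nearest centroid.
        for (std::size_t i = 0; i < values.size(); ++i) {
            std::size_t best = 0;
            for (std::size_t k = 1; k < centroids.size(); ++k)
                if (std::fabs(values[i] - centroids[k]) < std::fabs(values[i] - centroids[best]))
                    best = k;
            if (iter == 0 || best != assignment[i]) changed = true;
            assignment[i] = best;
        }
        if (!changed) break;  // converged: no value switched clusters
        // Update step: each centroid moves to the mean of its members.
        std::vector<double> sum(centroids.size(), 0.0);
        std::vector<std::size_t> count(centroids.size(), 0);
        for (std::size_t i = 0; i < values.size(); ++i) {
            sum[assignment[i]] += values[i];
            ++count[assignment[i]];
        }
        for (std::size_t k = 0; k < centroids.size(); ++k)
            if (count[k] > 0) centroids[k] = sum[k] / static_cast<double>(count[k]);
            // empty clusters keep their previous centroid
    }
    return centroids;
}
```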


K-Means (multi-attribute version) This is a C++ implementation of the K-Means algorithm. Here is a zip file with the application source code. Here is a Windows executable of the program. This program uses two supporting files: a control file and an input data file. Invoke the application from the command line with k-means-multi Control_File_name and press ENTER. Here are sample control files (for Iris and Wine data). You will need to rename a sample control file (the application looks for a file named Control.dat) and edit it to identify which input and output data files you are using. Here are some data files you can use to experiment with this application: the full iris dataset and the full wine dataset. This version of the K-Means algorithm clusters the data based on the behavior of all of the attributes in each instance tuple. Using all of the attributes can dramatically change the clustering behavior compared with the single attribute version of the K-Means algorithm. Feel free to download the files and experiment with them. All I ask is that you give me proper source citation for the work contained in the files. © 2004 John Aleshunas [updated 2014]

If you prefer a GUI version of this application rather than a command line version, use the Canary Project version.
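With several attributes, the two K-Means steps stay the same. The only change is that distance is measured across every attribute of a tuple. This sketch assumes squared Euclidean distance on unscaled attribute values. The multi-attribute program may measure or normalize distances differently, and all names here are hypothetical. To get the full algorithm, alternate `nearest_centroid` for every tuple with `update_centroids` until no assignment changes.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;  // one instance tuple: all attribute values

// Squared Euclidean distance across every attribute of two tuples.
double squared_distance(const Point& a, const Point& b) {
    double d = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) d += (a[j] - b[j]) * (a[j] - b[j]);
    return d;
}

// Assignment step: index of the centroid nearest to p.
std::size_t nearest_centroid(const Point& p, const std::vector<Point>& centroids) {
    std::size_t best = 0;
    double best_d = std::numeric_limits<double>::max();
    for (std::size_t k = 0; k < centroids.size(); ++k) {
        const double d = squared_distance(p, centroids[k]);
        if (d < best_d) { best_d = d; best = k; }
    }
    return best;
}

// Update step: each centroid becomes the attribute-wise mean of its members.
// Clusters that end up empty keep their previous centroid.
std::vector<Point> update_centroids(const std::vector<Point>& data,
                                    const std::vector<std::size_t>& assignment,
                                    const std::vector<Point>& old_centroids) {
    const std::size_t dims = data.empty() ? 0 : data[0].size();
    std::vector<Point> sum(old_centroids.size(), Point(dims, 0.0));
    std::vector<std::size_t> count(old_centroids.size(), 0);
    for (std::size_t i = 0; i < data.size(); ++i) {
        for (std::size_t j = 0; j < dims; ++j) sum[assignment[i]][j] += data[i][j];
        ++count[assignment[i]];
    }
    std::vector<Point> next = old_centroids;
    for (std::size_t k = 0; k < next.size(); ++k)
        if (count[k] > 0)
            for (std::size_t j = 0; j < dims; ++j)
                next[k][j] = sum[k][j] / static_cast<double>(count[k]);
    return next;
}
```

Attributes with larger numeric ranges dominate an unscaled Euclidean distance. This is one reason the multi-attribute results can differ so much from the single attribute version.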

 

EM Clustering This is a C++ implementation of the Expectation Maximization clustering algorithm. The EM algorithm is a clustering methodology that determines the probability that a given data instance belongs to each of the clusters of the trained model. Here is a zip file containing the C++ source code, a Windows executable and some example datasets to experiment with.

© 2009 John Aleshunas
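To illustrate what "the probability that a given data instance belongs to each of the clusters" means, here is one EM iteration for a one-dimensional Gaussian mixture. This is a sketch under stated assumptions: one attribute, Gaussian clusters, and a small variance floor. It is not the code in the zip file. The names `Component`, `gaussian_pdf`, and `em_step` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Component {
    double weight, mean, variance;
};

// Normal density with the given mean and variance.
double gaussian_pdf(double x, double mean, double variance) {
    const double pi = 3.14159265358979323846;
    return std::exp(-(x - mean) * (x - mean) / (2.0 * variance)) / std::sqrt(2.0 * pi * variance);
}

// One EM iteration. On return, r[i][k] is the probability that x[i] belongs
// to component k under the parameters passed in (E-step), and the returned
// components are re-estimated from those probabilities (M-step).
std::vector<Component> em_step(const std::vector<double>& x, std::vector<Component> comps,
                               std::vector<std::vector<double>>& r) {
    const std::size_t n = x.size(), K = comps.size();
    r.assign(n, std::vector<double>(K, 0.0));
    // E-step: weighted likelihoods, normalized per instance.
    for (std::size_t i = 0; i < n; ++i) {
        double total = 0.0;
        for (std::size_t k = 0; k < K; ++k) {
            r[i][k] = comps[k].weight * gaussian_pdf(x[i], comps[k].mean, comps[k].variance);
            total += r[i][k];
        }
        for (std::size_t k = 0; k < K; ++k) r[i][k] /= total;  // assumes total > 0
    }
    // M-step: each component's weight, mean and variance from soft counts.
    for (std::size_t k = 0; k < K; ++k) {
        double nk = 0.0, sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) { nk += r[i][k]; sum += r[i][k] * x[i]; }
        const double mean = sum / nk;  // assumes the component keeps some probability mass
        double var = 0.0;
        for (std::size_t i = 0; i < n; ++i) var += r[i][k] * (x[i] - mean) * (x[i] - mean);
        // A small floor keeps a component from collapsing onto a single point.
        comps[k] = {nk / static_cast<double>(n), mean, std::max(var / nk, 1e-6)};
    }
    return comps;
}
```

Repeat `em_step` until the component parameters stop changing to obtain the trained model. Row `r[i]` then holds the cluster membership probabilities for instance i. The caller chooses the starting components, and a poor start can lead to a poor local optimum.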


K-Nearest Neighbors This is a C++ implementation of the K-Nearest Neighbors algorithm. Here is the header with the declarations of the application classes, and here is the cpp control file. Here is a Windows executable of the program. This program uses several supporting files: a control file (iris or wine), input classification files (iris or wine), and input test files (iris or wine). You will need to edit the control file to identify which input file set (iris or wine) you are using. Feel free to download the files and experiment with them. All I ask is that you give me proper source citation for the work contained in the files. © 2004 John Aleshunas

If you prefer a GUI version of this application rather than a command line version, use the Canary Project version.
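The classification rule can be sketched in a few lines. Measure the distance from a test instance to every training instance, keep the k closest, and take a majority vote of their labels. This sketch assumes Euclidean distance and breaks voting ties alphabetically. Both are choices made for illustration. `Example` and `knn_classify` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Example {
    std::vector<double> features;
    std::string label;
};

// Majority vote among the k training examples nearest to `query`
// (squared Euclidean distance). Vote ties go to the label that sorts first.
std::string knn_classify(const std::vector<Example>& training,
                         const std::vector<double>& query, std::size_t k) {
    std::vector<std::pair<double, const Example*>> dist;
    for (const Example& e : training) {
        double d = 0.0;
        for (std::size_t j = 0; j < query.size(); ++j)
            d += (e.features[j] - query[j]) * (e.features[j] - query[j]);
        dist.emplace_back(d, &e);
    }
    k = std::min(k, dist.size());
    const auto middle = dist.begin() + static_cast<std::ptrdiff_t>(k);
    std::partial_sort(dist.begin(), middle, dist.end(),
                      [](const std::pair<double, const Example*>& a,
                         const std::pair<double, const Example*>& b) { return a.first < b.first; });
    std::map<std::string, int> votes;
    for (std::size_t i = 0; i < k; ++i) ++votes[dist[i].second->label];
    std::string best;
    int best_votes = 0;
    for (const auto& v : votes)
        if (v.second > best_votes) { best = v.first; best_votes = v.second; }
    return best;
}
```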


Self-Organizing Map (SOM) This is an implementation of the self-organizing map algorithm developed by Teuvo Kohonen at Helsinki University of Technology. Here are copies of the application documentation (text file and MS Word doc).
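In outline, SOM training repeatedly finds the map unit whose weight vector best matches an input, called the best-matching unit. It then pulls that unit and its grid neighbors toward the input. This sketch assumes a rectangular grid and a Gaussian neighborhood. It is not the vsom source, and the names `Som`, `best_matching_unit`, and `train_step` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// A rectangular map of xdim * ydim units, each holding a weight vector.
struct Som {
    std::size_t xdim, ydim;
    std::vector<std::vector<double>> units;  // unit index = y * xdim + x
};

// Best-matching unit: the unit whose weights are closest to the input.
std::size_t best_matching_unit(const Som& som, const std::vector<double>& input) {
    std::size_t best = 0;
    double best_d = std::numeric_limits<double>::max();
    for (std::size_t u = 0; u < som.units.size(); ++u) {
        double d = 0.0;
        for (std::size_t j = 0; j < input.size(); ++j)
            d += (som.units[u][j] - input[j]) * (som.units[u][j] - input[j]);
        if (d < best_d) { best_d = d; best = u; }
    }
    return best;
}

// One training step: move every unit toward the input, scaled by the
// learning rate alpha and a Gaussian of its grid distance from the BMU.
void train_step(Som& som, const std::vector<double>& input, double alpha, double radius) {
    const std::size_t bmu = best_matching_unit(som, input);
    const double bx = static_cast<double>(bmu % som.xdim);
    const double by = static_cast<double>(bmu / som.xdim);
    for (std::size_t u = 0; u < som.units.size(); ++u) {
        const double dx = static_cast<double>(u % som.xdim) - bx;
        const double dy = static_cast<double>(u / som.xdim) - by;
        const double h = std::exp(-(dx * dx + dy * dy) / (2.0 * radius * radius));
        for (std::size_t j = 0; j < input.size(); ++j)
            som.units[u][j] += alpha * h * (input[j] - som.units[u][j]);
    }
}
```

During training, `alpha` and `radius` are normally reduced step by step. The map first orders itself globally and then fine-tunes locally.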

The SOM application consists of several sub-programs made up of many C code files. The executables for the four main sub-programs are here: randinit, vsom, vcal, and visual.

It is easiest to invoke the executables using DOS batch files. This technique provides better control of execution, assists with repetitive testing and eliminates typographical errors at the command prompt. The batch files are text and can be edited to suit your needs. Here are some batch files to use with the executable programs (randinit, randinit_iris, vsom1, vsom1_iris, vsom2, vsom2_iris, vcal, vcal_iris). If you want to use the Visual program, you will need to create your own batch file.

The files needed to compile the four main sub-programs are here in self-extracting zip files: randinit, vsom, vcal, and visual.

Here are some data sets to test the application with (grid data, iris data, more iris data).

Here is the entire SOM package in an executable zip file.

I created another application, SOM_mapper, that makes it easier to extract the layers in a SOM (labels, individual attributes, etc.) and display them in a single map image for analysis. Here is an executable zip file containing SOM_mapper and some example files. © 2008 John Aleshunas

SOM_mapper has two operating requirements:

1) The application gets its parameters from a control.dat file. The control.dat file has three arguments: the layer to extract, the input label file, and the output map file. Set the layer argument to 0 (zero) to extract the map labels (SOM_mapper inserts five underscores (_____) or the word BLANK for empty labels). The attribute layers are numbered starting at 1 and going up to the number of attributes.

2) The map labels MUST have a non-numeral as the first character. Any character other than the digits 0-9 is acceptable.
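Both requirements can be checked before running SOM_mapper. The source does not show the exact layout of control.dat. This sketch assumes the three arguments appear in the listed order, separated by whitespace. `MapperControl`, `read_control`, and `valid_label` are hypothetical names, not part of SOM_mapper.

```cpp
#include <cctype>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying read_control
#include <string>

// Hypothetical holder for the three control.dat arguments, assumed to appear
// in the order: layer, input label file, output map file.
struct MapperControl {
    int layer = 0;           // 0 = map labels; 1..n = attribute layers
    std::string label_file;  // input label file
    std::string map_file;    // output map file
};

// Reads whitespace-separated arguments; rejects a negative layer number.
bool read_control(std::istream& in, MapperControl& c) {
    return static_cast<bool>(in >> c.layer >> c.label_file >> c.map_file) && c.layer >= 0;
}

// Requirement 2: a label must not start with a digit 0-9.
bool valid_label(const std::string& label) {
    return !label.empty() && !std::isdigit(static_cast<unsigned char>(label[0]));
}
```

For example, reading a stream that contains `0 labels.txt map.txt` (hypothetical file names) yields layer 0, which requests the map labels.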


C 4.5 is a decision tree algorithm created by J. Ross Quinlan. It is based on the ID3 algorithm. The definitive reference for C 4.5 is the book C4.5: Programs for Machine Learning by J. Ross Quinlan, Morgan Kaufmann, ISBN: 1-55860-238-0, 1993 [available at the Emerson Library]. Instructions for using the application are in chapter 9 of this text, and here is a PDF of that content.
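By default, C 4.5 ranks candidate splits by the gain ratio. The gain ratio is the information gain of a split divided by its split information, which penalizes attributes with many distinct values. Here is a minimal sketch for discrete attributes only. C 4.5 also handles continuous attributes and missing values, which this sketch omits. `entropy` and `gain_ratio` are hypothetical names, not functions from the C 4.5 source.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Entropy, in bits, of a list of class labels.
double entropy(const std::vector<std::string>& labels) {
    std::map<std::string, std::size_t> counts;
    for (const auto& l : labels) ++counts[l];
    double h = 0.0;
    for (const auto& c : counts) {
        const double p = static_cast<double>(c.second) / labels.size();
        h -= p * std::log2(p);
    }
    return h;
}

// Gain ratio of splitting the cases by a discrete attribute.
// values[i] is the attribute value of case i and labels[i] its class.
double gain_ratio(const std::vector<std::string>& values,
                  const std::vector<std::string>& labels) {
    std::map<std::string, std::vector<std::string>> partitions;
    for (std::size_t i = 0; i < values.size(); ++i) partitions[values[i]].push_back(labels[i]);
    const double n = static_cast<double>(labels.size());
    double remainder = 0.0, split_info = 0.0;
    for (const auto& part : partitions) {
        const double w = part.second.size() / n;  // fraction of cases in this branch
        remainder += w * entropy(part.second);
        split_info -= w * std::log2(w);
    }
    const double gain = entropy(labels) - remainder;
    return split_info > 0.0 ? gain / split_info : 0.0;
}
```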

The actual application consists of four executable files: c45, c45rules, consult, and consultr. C45 reads your input data and creates a decision tree (written to a file and displayed at the command prompt). C45rules converts the tree data into a set of production rules (written to a file and displayed at the command prompt). Consult runs an interactive session with a person, using the tree data to classify the user's responses to questions. Consultr also runs an interactive session, using the rules data to classify the user's responses to questions.

The C 4.5 executables use command-line arguments and are easily invoked using batch files (which prevents spelling errors). The application needs two data files to operate: a .names file and a .data file. The .names file identifies the classification names, the attribute names, and the attribute data types; the .data file contains the training cases.

Here are the C 4.5 executables: c45.exe, c45rules.exe, consult.exe, and consultr.exe

Here are sample data files: iris.names, iris.data, golf.names, golf.data, wine.names, and wine.data

Here are batch files that can be used with the data files above:

c45_iris.bat, c45rules_iris.bat, consult_iris.bat, and consultr_iris.bat

c45_golf.bat, c45rules_golf.bat, consult_golf.bat, and consultr_golf.bat

c45_wine.bat, c45rules_wine.bat, consult_wine.bat, and consultr_wine.bat

If you want to try C 4.5, save all of the files into a folder, open a Command (DOS) window, change your working directory to the folder containing the C 4.5 files, and invoke the batch files from the command prompt. The required order of execution is as follows. Run c45.exe (c45_FILENAME.bat) first, because it must precede both c45rules.exe (c45rules_FILENAME.bat) and consult.exe (consult_FILENAME.bat). Then run c45rules.exe (c45rules_FILENAME.bat), which must follow c45.exe and must run before consultr.exe (consultr_FILENAME.bat).

 

giniTree is a decision tree implementation in C++ that uses the Gini index to determine the optimal splitting attributes. Here is a zip file containing the C++ source code, a Windows executable and some example datasets to experiment with. Here is an executable zip file of the same content if you don't have an unzip utility [some firewall systems may block the download of files with an .exe extension].

© 2009 John Aleshunas
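For a set with class proportions p_c, the Gini index is 1 minus the sum of the p_c squared, so a pure set scores 0. Here is a minimal sketch of using it to choose a threshold on one numeric attribute. It assumes binary splits of the form value <= threshold. It is not taken from giniTree, whose exact splitting rules are not described here. The function names are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Gini index of a set of class labels: 1 - sum over classes of p_c^2.
double gini(const std::vector<std::string>& labels) {
    if (labels.empty()) return 0.0;
    std::map<std::string, std::size_t> counts;
    for (const auto& l : labels) ++counts[l];
    double sum_sq = 0.0;
    for (const auto& c : counts) {
        const double p = static_cast<double>(c.second) / labels.size();
        sum_sq += p * p;
    }
    return 1.0 - sum_sq;
}

// Size-weighted Gini index of the two subsets produced by splitting at
// `threshold` (value <= threshold goes left). Lower means purer subsets.
double split_gini(const std::vector<double>& values,
                  const std::vector<std::string>& labels, double threshold) {
    std::vector<std::string> left, right;
    for (std::size_t i = 0; i < values.size(); ++i)
        (values[i] <= threshold ? left : right).push_back(labels[i]);
    const double n = static_cast<double>(values.size());
    return left.size() / n * gini(left) + right.size() / n * gini(right);
}

// Try each observed value as a threshold and keep the one with the lowest
// weighted Gini index.
double best_threshold(const std::vector<double>& values,
                      const std::vector<std::string>& labels) {
    double best_t = values.empty() ? 0.0 : values[0];
    double best_g = 2.0;  // larger than any possible Gini value
    for (double t : values) {
        const double g = split_gini(values, labels, t);
        if (g < best_g) { best_g = g; best_t = t; }
    }
    return best_t;
}
```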


A Simple Neural Network Using the Backpropagation Algorithm This implementation of the backpropagation algorithm uses a sigmoid activation function in the network nodes. Here is the header with the declarations of the application classes, and here is the cpp control file. Here is a Windows executable of the program. Here is brief user documentation for this application (in MS Word format). This program uses three supporting files: a control file, an input data file, and a test data file. Here are sample control files (for Construct and Iris data; construct is a generated dataset used to test classification during code development). You will need to rename a sample control file (the application looks for a file named Control.dat) and edit it to identify which input dataset and test dataset you are using. Here are some data files you can use to experiment with this application: the full construct dataset and the full iris dataset. Finally, here are two datasets to test the classification accuracy of your network: a construct test dataset and an iris test dataset. Feel free to download the files and experiment with them. All I ask is that you give me proper source citation for the work contained in the files.

© 2005 John Aleshunas
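Here is a minimal sketch of the backpropagation update for a network with one hidden layer of sigmoid units and one sigmoid output. It trains one pattern at a time on squared error. The layer sizes, learning rate, weight initialization range, and the names `Net`, `forward`, and `train` are assumptions for illustration. They do not describe the structure of the course program. The sigmoid's derivative is y(1 - y), which makes the output and hidden-layer error terms simple to compute.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Fully connected network: inputs -> sigmoid hidden layer -> one sigmoid output.
struct Net {
    std::size_t n_in, n_hidden;
    std::vector<std::vector<double>> w_hidden;  // [hidden unit][input weights..., bias]
    std::vector<double> w_out;                  // [hidden weights..., bias]
    std::vector<double> hidden;                 // activations from the last forward pass

    Net(std::size_t inputs, std::size_t hidden_units, unsigned seed)
        : n_in(inputs), n_hidden(hidden_units),
          w_hidden(hidden_units, std::vector<double>(inputs + 1)),
          w_out(hidden_units + 1), hidden(hidden_units) {
        std::mt19937 rng(seed);
        std::uniform_real_distribution<double> init(-0.5, 0.5);
        for (auto& row : w_hidden)
            for (double& w : row) w = init(rng);
        for (double& w : w_out) w = init(rng);
    }

    double forward(const std::vector<double>& x) {
        double out = w_out[n_hidden];  // output bias
        for (std::size_t h = 0; h < n_hidden; ++h) {
            double z = w_hidden[h][n_in];  // hidden bias
            for (std::size_t i = 0; i < n_in; ++i) z += w_hidden[h][i] * x[i];
            hidden[h] = sigmoid(z);
            out += w_out[h] * hidden[h];
        }
        return sigmoid(out);
    }

    // One backpropagation update; returns the squared error before the update.
    double train(const std::vector<double>& x, double target, double rate) {
        const double y = forward(x);
        const double delta_out = (target - y) * y * (1.0 - y);
        for (std::size_t h = 0; h < n_hidden; ++h) {
            // Hidden error term uses the output weight before it is updated.
            const double delta_h = delta_out * w_out[h] * hidden[h] * (1.0 - hidden[h]);
            for (std::size_t i = 0; i < n_in; ++i) w_hidden[h][i] += rate * delta_h * x[i];
            w_hidden[h][n_in] += rate * delta_h;
            w_out[h] += rate * delta_out * hidden[h];
        }
        w_out[n_hidden] += rate * delta_out;
        return 0.5 * (target - y) * (target - y);
    }
};
```

For example, a few thousand epochs of training on the four patterns of logical OR should drive the summed error down and separate the outputs for (0, 0) and (1, 1).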


A Simple Genetic Algorithm implementation

 

A Simple Genetic Programming implementation This implementation uses CGP, a modified form of lilgp [ref: http://garage.cse.msu.edu/software/lil-gp/] developed by Cezary Janikow [reference]. Here is a zip file of the artificial ant problem. The zip file contains the ant GP executable, the necessary control files, all of the user-modifiable source code files, and example batch files to run the application.

 

 

 

 
