MATH 3210 Data Mining
Foundations


Course Description:
This course explores the core concepts of data mining including the research methodology and process, data sources, messy data and data cleansing. It also examines algorithms in each of the main data mining groupings of classification, categorization, and association rules. The course emphasizes the use of data mining concepts in real-world applications with database components. Students will present their findings and recommendations in written and oral project reports.


Course Syllabus

Weekly Road Map

Student Presentation Topics


Learning Outcome Assessment Rubrics


Course Materials:


Course Documentation

Report Format: This is a link to the format we'll use for our assignment reports. It is saved as a Microsoft Word document.

Using the Visual Studio Step Debugger: This is a link to a Microsoft Word document that explains the basics of using the step debugger built into MS Visual Studio.

Classroom Visual Aids


Source Code & Executables


Course Support Materials


Data Sets

Supplemental Excel data sets


Homework Assignments:

Week 1  Assignment #1 Due 9 September

Week 2  Assignment:  Student Presentation Topics

Week 3   ERD Examples Assignment #2 Due Monday 30 September   

Week 4  Assignment #3 in-class exercise

Week 5  

Week 6   Assignment #4 Due Monday 28 October   

Week 7   Decision Tree Example

Week 8 Assignment: Term Project Proposals Due   

Week 9  Assignment #5 Due Monday 11 November  

Week 10   

Week 11  Assignment #6 Due Monday 25 November  

Week 12

Week 13  Assignment #7 Due Monday 9 December  

Week 14   

Week 15

Week 16 Final Research Project Presentation [location: ]  16 December @ 1:00 PM  


Helpful Links:

The Mathematics & Computer Science Lib-Guide is a repository created to support research work by faculty and students in the Mathematics & Computer Science Department. You should find links supporting many of your research needs on this site including DOI article search, e-book resources, reference websites, and writing and citation style guides.

Here is a search widget you can use [it is set to enter a DOI, but you can search by other criteria once you get to the search page].

DOI Search

KDnuggets is a good source for information on Data Mining, Knowledge Discovery, Genomic Mining, and Web Mining.

The UCI Machine Learning Repository is a repository of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.

data.gov: Data.gov is another source of free data sets. You can get data sets in several formats [CSV, XML, etc.]. Since many of the files are in XML format, you can search online for an XML parser to read the XML files and convert them to another text format.
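As an alternative to a downloaded parser, a short script can do the conversion. The sketch below uses only the Python standard library and assumes a simple, flat layout in which each record is one element whose children are the fields. The function name `xml_to_csv`, the tag name `row`, and the sample data are all hypothetical; real data.gov files vary in structure and may need a different record tag or extra handling for nested elements and attributes.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_text, record_tag):
    """Flatten every <record_tag> element into one CSV row.

    Assumption: each record's fields are direct child elements holding text.
    """
    root = ET.fromstring(xml_text)
    records = [
        {child.tag: (child.text or "").strip() for child in rec}
        for rec in root.iter(record_tag)
    ]

    # Build the header from field names in order of first appearance.
    columns = []
    for rec in records:
        for name in rec:
            if name not in columns:
                columns.append(name)

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)  # missing fields are written as empty cells
    return out.getvalue()


# Hypothetical sample data, for illustration only.
SAMPLE = """<rows>
  <row><city>St. Louis</city><year>2012</year><count>41</count></row>
  <row><city>Kansas City</city><year>2012</year></row>
</rows>"""

if __name__ == "__main__":
    print(xml_to_csv(SAMPLE, "row"))
```

The resulting CSV text can be saved to a file and opened in Excel, R, or any of the tools listed further down this page.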

Correlates of War: The Correlates of War Project was founded in 1963 by J. David Singer, a political scientist at the University of Michigan. The original and continuing goal of the project has been the systematic accumulation of scientific knowledge about war. Joined by historian Melvin Small, the project began its work by assembling a more accurate data set on the incidence and extent of inter-state and extra-systemic war in the post-Napoleonic period. To do this scientifically, Singer and Small found they needed to operationally resolve a number of difficult issues, such as what is a "state" and what precisely is a "war." Building upon the work of other pioneers such as Pitirim Sorokin, Lewis Fry Richardson, and Quincy Wright, Singer and Small published The Wages of War in 1972, a work that established a standard definition of war that has guided the research of hundreds of scholars since its publication.

Bitpipe Data Mining Reports: Browse these reports to find the latest Data Mining white papers, product literature, webcasts, and case studies. Please be patient; this page loads slowly.

Founded in 1999, Bayesware Limited is a privately owned company developing Knowledge Discovery and Data Mining software based on Bayesian methods. Bayesware produces and supports programs, delivers custom-built solutions, provides training programs, and offers consultancy services to corporate customers and public institutions.

They have a product called Bayesware Discoverer which is a Knowledge Discovery program based on Bayesian networks. You can get a free academic version of Bayesware Discoverer or a free 30-day trial copy of Bayesware Discoverer Professional from the Bayesware site.

This is a link to the Genetic Algorithm Archives at the Navy Center for Applied Research in Artificial Intelligence (NCARAI).

The Navy Center for Applied Research in Artificial Intelligence (NCARAI) has been involved in both basic and applied research in artificial intelligence since its inception in 1982. NCARAI, part of the Information Technology Division within the Naval Research Laboratory, is engaged in research and development efforts designed to address the application of artificial intelligence technology and techniques to critical Navy and national problems.

The research program of the Center is directed toward understanding the design and operation of computer systems capable of improved performance based on experience; efficient and effective interaction with other systems and with humans; sensor-based control of autonomous activity; and the integration of varieties of reasoning as necessary to support complex decision-making. The emphasis at NCARAI is the linkage of theory and application in demonstration projects that use a full spectrum of artificial intelligence techniques.

The NCARAI includes the Immersive Simulations section, the Intelligent Multimodal/Multimedia Systems section, the Intelligent Systems section, and the Interface Design and Evaluation section.

Math Software Vendor Links

Here are some links to mathematical and statistical software vendors:

SPSS              http://www.spss.com/

SAS                http://www.sas.com/

S-Plus             http://www.insightful.com/

Mathematica    http://www.wolfram.com/

MATLAB            http://www.mathworks.com/

Maple              http://www.maplesoft.com/

These sites will give you some idea of what resources are available and what they can do.

 

Here are some links to open source mathematical and statistical software that you may find useful:

Notepad++      http://notepad-plus-plus.org/

Orange      http://orange.biolab.si/

PSPP              http://www.gnu.org/software/pspp/

R              http://www.r-project.org/

Octave       http://www.gnu.org/software/octave/

Sage       http://www.sagemath.org/

Enthought Canopy (Python development environment)      https://www.enthought.com/

OpenOffice       http://www.openoffice.org/

LibreOffice         https://www.libreoffice.org/

 

Webster University Writing Center: The Webster University Writing Center offers free and friendly writing advice to all students, staff, and faculty at Webster University. Their trained coaches will help with every stage of the writing process, from brainstorming ideas to documenting sources. They work with all levels of writers and projects, including reports; résumés and cover letters; admission essays and personal statements; summaries, critical analyses, and literature reviews; research and term papers; theses and dissertations; and more.

Purdue University's On-line Writing Lab (OWL)

 

Strange Facts