MATH 3220
Data Mining Methods



Course Description:
This course explores the core concepts of data mining including the research methodology and process, data sources, messy data and data cleansing. It also examines algorithms in each of the main data mining groupings of classification, categorization, and association rules. The course emphasizes the use of data mining concepts in real-world applications with database components. Students will present their findings and recommendations in written and oral project reports.


Course Syllabus

Weekly Roadmap


Learning Outcome Assessment Rubrics


Course Materials


Course Documentation

Report Format: This is a link to the format we'll use for our assignment reports. It is saved as a Microsoft Word document.

Using the Visual Studio Step Debugger: This is a link to a Microsoft Word document that explains the basics of using the step debugger built into MS Visual Studio.

Classroom Visual Aids


Source Code & Executables


Student Presentation Topics


Course Support Materials


Data Sets

Supplemental Excel data sets


Homework Assignments:

Week 1 : Assignment #1 due

Week 2 : Assignment #2 due

Week 3 :  

Week 4 :    

Week 5 : Assignment #3 due

Week 6 :   

Week 7 : Project Proposals due

Week 8 : Assignment #4 due

Week 9 :  

Week 10 : Assignment #5 due  

Week 11 : 

Week 12 : Assignment #6 due

Week 13 :  

Week 14 :  

Week 15 : Research Report due

Week 16 : Research Report Presentations [xx:xx PM on xx December]  


Helpful Links:

The Mathematics & Computer Science Lib-Guide is a repository created to support research work by faculty and students in the Mathematics & Computer Science Department. You should find links supporting many of your research needs on this site including DOI article search, e-book resources, reference websites, and writing and citation style guides.

Here is a search widget you can use [it is set to enter a DOI, but you can search by other criteria once you get to the search page]:

DOI Search

KDnuggets is a good source for information on Data Mining, Knowledge Discovery, Genomic Mining, and Web Mining.

The UCI Machine Learning Repository is a repository of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.

Data.gov is another source of free datasets. You can get data sets in several formats [CSV, XML, etc.]. Since many of the files are in XML format, you can search for an XML parser to read the XML files and convert them to another text format.
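If you would rather write the conversion yourself, here is a minimal Python sketch using only the standard library. It assumes a flat XML file in which each record is a repeated element whose children are the fields. The sample data, the `row` tag name, and the `xml_to_csv` function are made up for illustration; real Data.gov files use their own element names and may be nested more deeply.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; a real download will have different tags.
SAMPLE_XML = """<response>
  <row><state>MO</state><year>2010</year><value>12.5</value></row>
  <row><state>IL</state><year>2010</year><value>9.1</value></row>
</response>"""


def xml_to_csv(xml_text, record_tag="row"):
    """Convert flat XML records to CSV text, one line per record element."""
    root = ET.fromstring(xml_text)
    # Each record becomes a dict of {child tag: child text}.
    records = [
        {child.tag: (child.text or "").strip() for child in rec}
        for rec in root.iter(record_tag)
    ]
    # Collect column names in first-seen order across all records.
    columns = []
    for rec in records:
        for name in rec:
            if name not in columns:
                columns.append(name)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # fields missing from a record are left blank
    return out.getvalue()


if __name__ == "__main__":
    print(xml_to_csv(SAMPLE_XML))
```

To use it on a downloaded file, read the file into a string and pass the name of the repeating record element as `record_tag`. Files that nest records inside elements of the same name, or that store values in attributes, will need extra handling.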

Correlates of War: The Correlates of War Project was founded in 1963 by J. David Singer, a political scientist at the University of Michigan. The original and continuing goal of the project has been the systematic accumulation of scientific knowledge about war. Joined by historian Melvin Small, the project began its work by assembling a more accurate data set on the incidence and extent of inter-state and extra-systemic war in the post-Napoleonic period. To do this scientifically, Singer and Small found they needed to operationally resolve a number of difficult issues, such as what is a “state” and what precisely is a “war.” Building upon the work of other pioneers such as Pitirim Sorokin, Lewis Fry Richardson, and Quincy Wright, Singer and Small published The Wages of War in 1972, a work that established a standard definition of war that has guided the research of hundreds of scholars since its publication.

Bitpipe Data Mining: Reports. Browse these reports to find the latest Data Mining white papers, product literature, webcasts, and case studies. Please be patient; this page loads slowly.

Founded in 1999, Bayesware Limited is a privately owned company developing Knowledge Discovery and Data Mining software based on Bayesian methods. Bayesware produces and supports programs, delivers custom-built solutions, provides training programs, and offers consultancy services to corporate customers and public institutions. They have a product called Bayesware Discoverer, which is a Knowledge Discovery program based on Bayesian networks. You can get a free academic version of Bayesware Discoverer or a free 30-day trial copy of Bayesware Discoverer Professional from the Bayesware site.

This is a link to the Genetic Algorithm Archives at the Navy Center for Applied Research in Artificial Intelligence (NCARAI). NCARAI has been involved in both basic and applied research in artificial intelligence since its inception in 1982. NCARAI, part of the Information Technology Division within the Naval Research Laboratory, is engaged in research and development efforts designed to address the application of artificial intelligence technology and techniques to critical Navy and national problems. The research program of the Center is directed toward understanding the design and operation of computer systems capable of improved performance based on experience; efficient and effective interaction with other systems and with humans; sensor-based control of autonomous activity; and the integration of varieties of reasoning as necessary to support complex decision-making. The emphasis at NCARAI is the linkage of theory and application in demonstration projects that use a full spectrum of artificial intelligence techniques. NCARAI includes the Immersive Simulations section, the Intelligent Multimodal/Multimedia Systems section, the Intelligent Systems section, and the Interface Design and Evaluation section.

Math Software Vendor Links

Here are some links to mathematical and statistical software vendors:

SPSS              http://www.spss.com/

SAS                http://www.sas.com/

S-Plus             http://www.insightful.com/

Mathematica    http://www.wolfram.com/

MATLAB            http://www.mathworks.com/

Maple              http://www.maplesoft.com/

These will give you some idea what resources are available and what they can do.

Open-Source Software Resources

Here are some links to open-source mathematical, statistical, and productivity software:

Notepad++      http://notepad-plus-plus.org/

Orange      http://orange.biolab.si/

PSPP              http://www.gnu.org/software/pspp/

R                      http://www.r-project.org/

RStudio           http://rstudio.org/

OpenOffice     http://www.openoffice.org/

LibreOffice     http://www.libreoffice.org/

BibWordExtender     http://bibword.codeplex.com/wikipage?title=BibWord%20Extender

Helpful Writing Links

The Webster University Writing Center offers free and friendly writing advice to all students, staff, and faculty at Webster University. Their trained coaches will help with every stage of the writing process, from brainstorming ideas to documenting sources. They work with all levels of writers and projects, including reports; résumés and cover letters; admission essays and personal statements; summaries, critical analyses, and literature reviews; research and term papers; theses and dissertations; and more.

Purdue University's Online Writing Lab (OWL)

 

 

Strange Facts