AdvancedMiner Documentation

Table of Contents

AdvancedMiner System Documentation

1. AdvancedMiner Installation and Administration

System requirements

AdvancedMiner installation

On the MS Windows operating system
On the Linux operating system
Additional information

Running the server

On the MS Windows operating system
On the Linux operating system
License key
Troubleshooting

Running the client

Connecting to the server

Running the script with AdvancedMiner on the command line

Uninstallation

I. Tutorials

2. Tutorials Overview

3. Quick Start Tutorial

Introduction

Graphical user interface overview

Working with Projects

Connecting to a Metadata Repository and Databases

Connecting to a Metadata Repository
Connecting to a database

Importing data

Loading an example dataset
Importing CSV files
Copying tables between different databases

4. Data Exploration Tutorial

Introduction

Data exploration

Exploring data structure
Exploring data content

Preparing a dataset for modeling (data processing example)

5. Model Building Guide

6. Model Testing Guide

7. Model Applying Guide

8. Manual Scoring Card Building Guide

9. Scoring Card Building With Model Guide

Logistic regression model building
Converting the model to a scoring card

II. AdvancedMiner Concepts

10. Introduction to AdvancedMiner

Introduction
AdvancedMiner Client
AdvancedMiner Server

11. AdvancedMiner Client Graphical User Interface

IDE

Window management
Menus and actions

Components

Dictionary
Lift for Tree
Log Viewer
Projects
Files
Versioning
Documents
Editor
Processes
Favorites
Output
Navigator
Palette
Properties
Tasks
Services
Search results
Metadata Object Editors
Script Editor

12. Metadata Repository

Introduction to Metadata Repository (MR)

Metadata Repository concepts
Connecting to a Metadata Repository
AdvancedMiner icons for data objects
References
Data Object manipulation
Executing Tasks
Saving and Reloading
Testing

MR Object List

PhysicalData

LogicalData

CalculateStatisticsTask

CalculateTestResultTask

ComputeModelStatisticsTask

TransformationBuildTask

TransformationApplyTask

ScriptWrapper

Trigger

ScoringCodeBuildTask

ScoringCodeApplyTask

MiningFunctionSettings

TransformationSettings

III. Using AdvancedMiner

13. Workflow

Introduction to Workflow

Using Workflow Module

Description of nodes

Data
Data Exploration
Diagrams
Technical Transformations
Analytical Transformations
Modeling
SNA
Results
Gython
Other

14. Gython – the AdvancedMiner Scripting Language

Python quick reference

Syntax
Variables
Operators
Flow control
Working with objects
Defining and calling functions

Gython methods for different types of variables

String methods
List methods
Dictionary methods

Python Library functions

Built-in functions
String functions
Mathematical functions
Random functions
Date/time objects

Managing Gython objects

Constructing and accessing objects
Saving objects
Loading objects
Renaming objects
Executing tasks
Deleting objects
Checking object existence
Task termination
Saving script environment
Loading script environment
Setting alias to the metadata repository
Sending messages to the log
Registry Repository
Project path

Context Scripts

How do context scripts work?
Where can I find context scripts?
Writing context scripts

Requesting user input using InputDialog

15. AdvancedMiner in Practice

Model building

General rules
Approximation model building
Classification model building
Clustering model building
Survival model building

Model testing

Approximation Test Task
Classification Test Task
Survival Test Task
Time Series Test Task
Classification Test Result Task

Applying Models in AdvancedMiner

Basic concepts
Advanced concepts
Minimal set-up
Applying for different mining functions
Examples

Shorthand methods of building, testing and applying models

Approximator
Classifier
Clusterer
Applier

Experiments

Experiments project
Running experiments
Comparing models
Dictionary

Social Network Analysis

Building networks
Filtering networks
Analysing networks
Visualising networks

16. Data Access and Data Processing

Database Access

Aliases
Database explorer
Using SQL statements

Importing and exporting data and other database operations

Importing MS Excel spreadsheets
Importing CSV files
Exporting data to MS Excel spreadsheets
Exporting data to CSV files
Getting column list for a database table
Deleting a database table
Checking for database table existence

Creating tables in Gython

Creating a table with manually specified data
Creating a table with data copied from a list
Creating a table with data obtained from an SQL query
Using lists to define column names and formats
Importing data from external sources

The Trans procedure

Basic transformations
The where keyword
The keep in and drop in keywords
The keep out and drop out keywords
The format keyword
Indexes
Flow control
Appending tables
The rename keyword
Joining tables
Notes

Data transformation functions

Ranking data (the rank procedure)
Expansion of data (the interpolate procedure)
Sampling data (the sample command)
Splitting tables (the tableSplit procedure)
Transposing tables (the transpose procedure)
Comparing two tables (the tablesCompare procedure)

Predefined transformations for data mining models

Introduction
Transformation Types
Usage
Examples
Important notes

References

17. Integration with common office suites

Built-in support for Office Suites

Setting up an MS-Office connection
Setting up an OpenOffice connection

Creating custom reports

Creating and working with a spreadsheet document
Creating and using a text document

18. Optimization Library

The Optimization Problem
Objective Function
Constraints
Optimization methods
Solving the optimization problem
Usage
References

19. Statistical Procedures and Tests

Statistical functions

Chi-square statistic
Pearson's correlation coefficient
Multidimensional frequency analysis procedure

Statistical tests

Statistical test usage
Empirical distribution function
The Anderson-Darling test
The Chi-square test
The F-test
Kolmogorov-Smirnov test
Kuiper test
Levene's test
The Mann-Whitney test
Pearson's test
Test of proportions
Sign test
Spearman's test
Student's t-test
References

20. Probability distributions

Distributions Library

Characteristics and samples of the distributions
List of available continuous distributions
List of available discrete distributions
Distribution Tables

Special Functions Library

Sample Statistics of Empirical Data

Random Number Generators

References

21. Monte Carlo Markov Chains Library

Introduction

The MarkovChain class

Description
MarkovChain object methods
MarkovChain static methods

Algorithms

The Metropolis algorithm
Metropolis-Hastings algorithm
Bayesian inference

Transition functions

Transition functions from distribution
Random walk transition function

Distributions

Sampling Distribution
Likelihood function
Helper distributions

Convergence Diagnostics and Output Analysis tool

Output Analysis
Diagnostics

References

22. Scoring Code in AdvancedMiner

Introduction

Scoring code for models

Requirements
Creating Java scoring code based on a model step by step
Architecture of Java scoring code
Executing scoring code for a model
Differences in Scoring Code output for various models

Executing scoring code outside the AdvancedMiner system

Reading the Input Signature
Example of using scoring code in an external application

23. Data Visualization

Introduction

Preparing data for plotting

Data objects
Declaring column types
Automatically obtaining the data type
Data specification patterns
Series grouping
Inconsistent data

Creating plots and charts

Chart objects
Chart object methods
Chart types
Grouping charts
Additional topics

Manipulating plots

Manipulating 2D plots
Manipulating 3D plots

24. Freq - a visual data exploration tool

Introducing Freq

Launching Freq
Overview of the Freq component

Working with attributes

Calculating attributes
Attribute view
Attribute display modes
Histogram types
Editing levels and grouping values

Analyzing data with Freq

Virtual attributes
Filtering data
Working with targets
Correlation matrix

Exporting to Excel spreadsheets

Attribute statistics in Freq

Basic attribute statistics
Attribute correlation statistics
Target related statistics

Integration with other components

Opening physical data
Viewing data
Binding between components

25. Report Engine

Introduction
Usage

26. Operating Server

Introduction
Requirements and Architecture
Configuration
Quick Start guide

27. Model Reports

Introduction
Efficiency Report
Statistical Test Report
Stability Report

IV. Modules

28. Automatic Variable Selection

Introduction

Method description

Method assumptions
Full Model
Forward Selection
Backward Elimination
Stepwise Selection
Best Subset Selection

Usage

Data requirements
Model building and testing
Model application

Example of automatic variable selection

References

29. Bivariate Probit

Introduction

Method description

Full observability likelihood function
Partial observability likelihood function
Maximum likelihood estimator
Model significance
Testing for zero correlation
Confidence limits

Usage

Data requirements
Model building
Model application

Example

References

30. Classification Trees

Introduction

Method description

The structure of Classification Trees
Tree building algorithm
Tree pruning
Null values

Usage

Data requirements
Model building and testing
Model application
Model statistics

Method description

The structure of Smart Trees
Model building algorithm
Null values

Usage

Data requirements
Model building and testing

Model statistics

32. Discriminant Analysis

Introduction

Method description

The discriminant analysis model
Model assumptions

Usage

Data requirements
Model building and testing
Model application

Example

References

33. Matching (Data Quality)

Introduction

Method description

Blocking indexes
Attribute similarity evaluations
Record classification

Usage

Features
Data requirements
Model building and testing
Model Application

Examples

References

34. Feed Forward Neural Networks

Introduction

Method description

Usage

Data requirements
Model building and testing
Model application

Examples

Data preparation
Model building examples
Model application examples
Model testing examples

References

35. K-Means Clustering

Introduction

Method description

Usage

Data requirements
Model building
Model statistics
Model application

Example of K-Means Clustering

References

36. Kohonen Networks

Introduction

Method description

Usage

Data requirements
Model building
Computation of model statistics

SOM Explorer

The SOM Model
Visualization
Saving a modified model

Examples

References

37. Linear Regression

Introduction

Method description

Standard linear regression
Weighted Linear Regression (WLS)
Iteratively Re-Weighted Least Squares (IRLS) Regression

Usage

Data requirements
Model building and testing
Model application

Examples

Standard linear regression example
IRLS regression example

References

38. Logistic Regression

Introduction

Method description

The logit function
Odds and odds ratio
Likelihood function
Measures of goodness of fit of the model
Multicollinearity in Logistic Regression
Confidence intervals

Usage

Data requirements
Model building and testing
Model application

Example of logistic regression

References

39. Survival Analysis

Introduction

Method description: survival models

Censored observations
Nonparametric models
The Cox model

Usage

Data requirements
Model building and testing
Model application

Example of Survival Analysis


Non-parametric survival model example

References

40. Scoring Card

Introduction

Method description

Definitions and notation
Algorithm details

Usage

Data requirements
Model building
Model testing
Model application

Examples

Creating a scoring card using the provided context script

References

41. Time Series

Introduction

Method description

Usage

Data requirements
Model building
Model testing
Model application

Examples

Model building
Model testing
Model application

References

42. Social Network Analysis Module

Introduction

Method description

Social Network
Classification of networks
Basic concepts used in network analysis
Description of the algorithms used in the Social Network Analysis

Usage

Network building
Network analysis
Network filtering
Network visualization

Examples

References

A. Examples

Scoring code
Automatic Variable Selection
Bivariate Probit
Classification Trees
Discriminant Analysis
Feed Forward Neural Networks
Kohonen Networks
Linear Regression
IRLS Regression
Logistic Regression
Survival Analysis - nonparametric model
Survival Analysis - example of the Cox semiparametric model
PCA transformation
Calculate Statistics Example

B.

Language Codes
Country Codes

V. GDBase

Preface

43. GDBase Command Reference

CREATE/REPLACE TABLE

CREATE TABLE ... TRANSFORM

The __vars__ dictionary
Skipping rows
Processing in groups
The __save__ function
Referring to previous rows
Using SQL inside TRANSFORM

CREATE TRIGGER

The RAISE function

DISTINCT
KEEP and DROP
FROM
WHERE
GROUP BY
ORDER BY
LIMIT
UNION, UNION ALL, APPEND, EXCEPT, INTERSECT
JOIN
SAMPLE
Order of execution in SELECT statements

Unary operators
Binary operators
Column names
SELECT statement in expressions
CAST statement
Additional information

GDBase core functions

Simple functions
Aggregate Functions
Window functions

Quoting

GDBase keywords
Special characters

Nulls handling

Comments in SQL code

44. Importing and exporting data

Importing and exporting between GDBase databases

Importing from a local GDBase database
Importing from a remote GDBase database
Exporting to the local GDBase database
Exporting to a remote GDBase database

Importing and exporting using ODBC drivers

IMPORT ... USING ODBC
EXPORT ... USING ODBC

Data types in imported tables

45. GDBase Administration

General information

GDBase setup
Database log-in
Default user

Administrator accounts

Creating an administrator account
Changing the administrator password

Users

Adding new users
User properties
Changing user properties
Displaying user properties
Table ownership
User Privileges
Changing user password
Deleting users

Access Control

Determining access privileges
Default Table Properties

Controlling queries

SHOW PROCESS
KILL PROCESS
PAUSE PROCESS
RESUME PROCESS

Commands available from user interface level

Additional information

Resetting user privileges

C. GDBase Keywords

Index