AdvancedMiner Documentation


Table of Contents

AdvancedMiner System Documentation
1. AdvancedMiner Installation and Administration
System requirements
AdvancedMiner installation
In the MS Windows operating system
On the Linux operating system
Additional information
Running the server
In the MS Windows operating system
On the Linux operating system
License key
Troubleshooting
Running the client
Connecting to the server
Running the script with AdvancedMiner on the command line
Uninstallation
I. Tutorials
2. Tutorials Overview
3. Quick Start Tutorial
Introduction
Graphical user interface overview
Working with Projects
Connecting to a Metadata Repository and Databases
Connecting to a Metadata Repository
Connecting to a database
Importing data
Loading an example dataset
Importing CSV files
Copying tables between different databases
4. Data Exploration Tutorial
Introduction
Data exploration
Exploring data structure
Exploring data content
Preparing a dataset for modeling (data processing example)
5. Model Building Guide
6. Model Testing Guide
7. Model Applying Guide
8. Manual Scoring Card Building Guide
9. Scoring Card Building With Model Guide
Logistic regression model building
Converting the model to a scoring card
II. AdvancedMiner Concepts
10. Introduction to AdvancedMiner
Introduction
AdvancedMiner Client
AdvancedMiner Server
11. AdvancedMiner Client Graphical User Interface
IDE
Window management
Menus and actions
Components
Dictionary
Lift for Tree
Log Viewer
Projects
Files
Versioning
Documents
Editor
Processes
Favorites
Output
Navigator
Palette
Properties
Tasks
Services
Search results
Metadata Object Editors
Script Editor
12. Metadata Repository
Introduction to Metadata Repository (MR)
Metadata Repository concepts
Connecting to a Metadata Repository
AdvancedMiner icons for data objects
References
Data Object manipulation
Executing Tasks
Saving and Reloading
Testing
MR Object List
PhysicalData
LogicalData
CalculateStatisticsTask
CalculateTestResultTask
ComputeModelStatisticsTask
MiningBuildTask
MiningApplyTask
TestTasks
MatchingTask
TransformationBuildTask
TransformationApplyTask
ScriptWrapper
Trigger
ScoringCodeBuildTask
ScoringCodeApplyTask
MiningFunctionSettings
TransformationSettings
III. Using AdvancedMiner
13. Workflow
Introduction to Workflow
Using Workflow Module
Description of nodes
Data
Data Exploration
Diagrams
Technical Transformations
Analytical Transformations
Modeling
SNA
Results
Gython
Other
14. Gython – the AdvancedMiner Scripting Language
Python quick reference
Syntax
Variables
Operators
Flow control
Working with objects
Defining and calling functions
Gython methods for different types of variables
String methods
List methods
Dictionary methods
Python Library functions
Built-in functions
String functions
Mathematical functions
Random functions
Date/time objects
Managing Gython objects
Constructing and accessing objects
Saving objects
Loading objects
Renaming objects
Executing tasks
Deleting objects
Checking object existence
Task termination
Saving script environment
Loading script environment
Setting alias to the metadata repository
Sending messages to the log
Registry Repository
Project path
Context Scripts
How do context scripts work?
Where can I find context scripts?
Writing context scripts
Requesting user input using InputDialog
15. AdvancedMiner in Practice
Model building
General rules
Approximation model building
Classification model building
Clustering model building
Survival model building
Model testing
Approximation Test Task
Classification Test Task
Survival Test Task
Time Series Test Task
Classification Test Result Task
Applying Models in AdvancedMiner
Basic concepts
Advanced concepts
Minimal set-up
Applying for different mining functions
Examples
Shorthand methods of building, testing and applying models
Approximator
Classifier
Clusterer
Applier
Experiments
Experiments project
Running experiments
Comparing models
Dictionary
Social Network Analysis
Building networks
Filtering networks
Analysing networks
Visualising networks
16. Data Access and Data Processing
Database Access
Aliases
Database explorer
Using SQL statements
Importing and exporting data and other database operations
Importing MS Excel spreadsheets
Importing CSV files
Exporting data to MS Excel spreadsheets
Exporting data to CSV files
Getting column list for a database table
Deleting a database table
Checking for the existence of a database table
Creating tables in Gython
Creating a table with manually specified data
Creating a table with data copied from a list
Creating a table with data obtained from an SQL query
Using lists to define column names and formats
Importing data from external sources
The Trans procedure
Basic transformations
The where keyword
The keep in and drop in keywords
The keep out and drop out keywords
The format keyword
Indexes
Flow control
Appending tables
The rename keyword
Joining tables
Notes
Data transformation functions
Ranking data (the rank procedure)
Expansion of data (the interpolate procedure)
Sampling data (the sample command)
Splitting tables (the tableSplit procedure)
Transposing tables (the transpose procedure)
Comparing two tables (the tablesCompare procedure)
Predefined transformations for data mining models
Introduction
Transformation Types
Usage
Examples
Important notes
References
17. Integration with common office suites
Built-in support for Office Suites
Setting up an MS-Office connection
Setting up an OpenOffice connection
Creating custom reports
Creating and working with a spreadsheet document
Creating and using a text document
18. Optimization Library
The Optimization Problem
Objective Function
Constraints
Optimization methods
Solving the optimization problem
Usage
References
19. Statistical Procedures and Tests
Statistical functions
Chi-square statistic
Pearson's correlation coefficient
Multidimensional frequency analysis procedure
Statistical tests
Statistical test usage
Empirical distribution function
The Anderson-Darling test
The Chi-square test
The F-test
Kolmogorov-Smirnov test
Kuiper test
Levene's test
The Mann-Whitney test
Pearson's test
Test of proportions
Sign test
Spearman's test
Student's t-test
References
20. Probability distributions
Distributions Library
Characteristics and samples of the distributions
List of available continuous distributions
List of available discrete distributions
Distribution Tables
Special Functions Library
Sample Statistics of Empirical Data
Random Number Generators
References
21. Monte Carlo Markov Chains Library
Introduction
The MarkovChain class
Description
MarkovChain object methods
MarkovChain static methods
Algorithms
The Metropolis algorithm
Metropolis-Hastings algorithm
Bayesian inference
Transition functions
Transition functions from distribution
Random walk transition function
Distributions
Sampling Distribution
Likelihood function
Helper distributions
Convergence Diagnostics and Output Analysis tool
Output Analysis
Diagnostics
References
22. Scoring Code in AdvancedMiner
Introduction
Scoring code for models
Requirements
Creating Java scoring code based on a model step by step
Architecture of Java scoring code
Executing scoring code for a model
Differences in Scoring Code output for various models
Executing scoring code outside the AdvancedMiner system
Reading the Input Signature
Example of using scoring code in an external application
23. Data Visualization
Introduction
Preparing data for plotting
Data objects
Declaring column types
Automatically obtaining the data type
Data specification patterns
Series grouping
Inconsistent data
Creating plots and charts
Chart objects
Chart object methods
Chart types
Grouping charts
Additional topics
Manipulating plots
Manipulating 2D plots
Manipulating 3D plots
24. Freq - a visual data exploration tool
Introducing Freq
Launching Freq
Overview of the Freq component
Working with attributes
Calculating attributes
Attribute view
Attribute display modes
Histogram types
Editing levels and grouping values
Analyzing data with Freq
Virtual attributes
Filtering data
Working with targets
Correlation matrix
Exporting to Excel spreadsheets
Attribute statistics in Freq
Basic attribute statistics
Attribute correlation statistics
Target related statistics
Integration with other components
Opening physical data
Viewing data
Binding between components
25. Report Engine
Introduction
Usage
26. Operating Server
Introduction
Requirements and Architecture
Configuration
Quick Start guide
27. Model Reports
Introduction
Efficiency Report
Statistical Test Report
Stability Report
IV. Modules
28. Automatic Variable Selection
Introduction
Method description
Method assumptions
Full Model
Forward Selection
Backward Elimination
Stepwise Selection
Best Subset Selection
Usage
Data requirements
Model building and testing
Model application
Example of automatic variable selection
References
29. Bivariate Probit
Introduction
Method description
Full observability likelihood function
Partial observability likelihood function
Maximum likelihood estimator
Model significance
Testing for zero correlation
Confidence limits
Usage
Data requirements
Model building
Model application
Example
References
30. Classification Trees
Introduction
Method description
The structure of Classification Trees
Tree building algorithm
Tree pruning
Null values
Usage
Data requirements
Model building and testing
Model application
Model statistics
Example
References
31. Smart Trees
Introduction
Method description
The structure of Smart Trees
Model building algorithm
Null values
Usage
Data requirements
Model building and testing
Model statistics
32. Discriminant Analysis
Introduction
Method description
The discriminant analysis model
Model assumptions
Usage
Data requirements
Model building and testing
Model application
Example
References
33. Matching (Data Quality)
Introduction
Method description
Blocking indexes
Attribute similarity evaluations
Record classification
Usage
Features
Data requirements
Model building and testing
Model Application
Examples
References
34. Feed Forward Neural Networks
Introduction
Method description
Usage
Data requirements
Model building and testing
Model application
Examples
Data preparation
Model building examples
Model application examples
Model testing examples
References
35. K-Means Clustering
Introduction
Method description
Usage
Data requirements
Model building
Model statistics
Model application
Example of K-Means Clustering
References
36. Kohonen Networks
Introduction
Method description
Usage
Data requirements
Model building
Computation of model statistics
SOM Explorer
The SOM Model
Visualization
Saving a modified model
Examples
References
37. Linear Regression
Introduction
Method description
Standard linear regression
Weighted Linear Regression (WLS)
Iteratively Re-Weighted Least Squares (IRLS) Regression
Usage
Data requirements
Model building and testing
Model application
Examples
Standard linear regression example
IRLS regression example
References
38. Logistic Regression
Introduction
Method description
The logit function
Odds and odds ratio
Likelihood function
Measures of goodness of fit of the model
Multicollinearity in Logistic Regression
Confidence intervals
Usage
Data requirements
Model building and testing
Model application
Example of logistic regression
References
39. Survival Analysis
Introduction
Method description: survival models
Censored observations
Nonparametric models
The Cox model
Usage
Data requirements
Model building and testing
Model application
Example of Survival Analysis
Non-parametric survival model example
References
40. Scoring Card
Introduction
Method description
Definitions and notation
Algorithm details
Usage
Data requirements
Model building
Model testing
Model application
Examples
Creating a scoring card using the provided context script
References
41. Time Series
Introduction
Method description
Usage
Data requirements
Model building
Model testing
Model application
Examples
Model building
Model testing
Model application
References
42. Social Network Analysis Module
Introduction
Method description
Social Network
Classification of networks
Basic concepts used in network analysis
Description of the algorithms used in the Social Network Analysis
Usage
Network building
Network analysis
Network filtering
Network visualization
Examples
References
A. Examples
Scoring code
Automatic Variable Selection
Bivariate Probit
Classification Trees
Discriminant Analysis
Feed Forward Neural Networks
Kohonen Networks
Linear Regression
IRLS Regression
Logistic Regression
Survival Analysis - nonparametric model
Survival Analysis - example of the Cox semiparametric model
PCA transformation
Calculate Statistics Example
B.
Language Codes
Country Codes
V. GDBase
Preface
43. GDBase Command Reference
Introduction
ALTER TABLE
CHECK TABLE
COMMENT
CREATE INDEX
CREATE/REPLACE TABLE
CREATE TABLE ... TRANSFORM
The __vars__ dictionary
Skipping rows
Processing in groups
The __save__ function
Referring to previous rows
Using sql inside TRANSFORM
CREATE TRIGGER
The RAISE function
CREATE VIEW
DELETE
DROP INDEX
DROP TABLE
DROP TRIGGER
DROP VIEW
GET
INSERT
MERGE
ON CONFLICT
REPLACE INTO
SELECT
DISTINCT
KEEP and DROP
FROM
WHERE
GROUP BY
ORDER BY
LIMIT
UNION, UNION ALL, APPEND, EXCEPT, INTERSECT
JOIN
SAMPLE
Order of execution in SELECT statements
TRANSACTION
UPDATE
Data types
Expressions
Unary operators
Binary operators
Column names
SELECT statement in expressions
CAST statement
Additional information
GDBase core functions
Simple functions
Aggregate Functions
Window functions
Quoting
GDBase keywords
Special characters
Nulls handling
Comments in SQL code
44. Importing and exporting data
Importing and exporting between GDBase databases
Importing from a local GDBase database
Importing from a remote GDBase database
Exporting to the local GDBase database
Exporting to a remote GDBase database
Importing and exporting using ODBC drivers
IMPORT ... USING ODBC
EXPORT ... USING ODBC
Data types in imported tables
45. GDBase Administration
General information
GDBase setup
Database log-in
Default user
Administrator accounts
Creating an administrator account
Changing the administrator password
Users
Adding new users
User properties
Changing user properties
Displaying user properties
Table ownership
User Privileges
Changing user password
Deleting users
Access Control
Determining access privileges
Default Table Properties
Controlling queries
SHOW PROCESS
KILL PROCESS
PAUSE PROCESS
RESUME PROCESS
Commands available at the user interface level
Additional information
Resetting user privileges
C. GDBase Keywords
Index