WO2002011017A2 - Multivariate responses using classification and regression trees systems and methods - Google Patents
Multivariate responses using classification and regression trees systems and methods
- Publication number
- WO2002011017A2 WO2002011017A2 PCT/US2001/021753 US0121753W WO0211017A2 WO 2002011017 A2 WO2002011017 A2 WO 2002011017A2 US 0121753 W US0121753 W US 0121753W WO 0211017 A2 WO0211017 A2 WO 0211017A2
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- function
- split
- node
- server
- child
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/02—Banking, e.g. interest calculation or account maintenance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
Definitions
- This invention relates generally to prediction of responses using mathematical algorithms for quality measurements and more specifically to the use of Classification and Regression Tree (CART) analysis for prediction of responses.
- CART Classification and Regression Tree
- an amount collected on a charged-off loan is a function of many demographic variables, as well as historic and current information on the debtor. To predict the amount paid by an individual borrower, a statistical model needs to be built from an analysis of trends between the account information and the amount paid by "similar" borrowers, that is, borrowers with similar profiles. CART tools allow an analyst to sift through (i.e., data mine) the many complex combinations of these explanatory variables to isolate the key drivers of the amount paid.
- the present invention is, in one aspect, a method of allowing inclusion of more than one variable in a Classification and Regression Tree (CART) analysis.
- the method includes predicting y using p explanatory variables, where y is a multivariate response vector.
- a statistical distribution function is then described at "parent" and "child" nodes using a multivariate normal distribution, which is a function of y.
- a split function is then defined in which the "child" node distributions are individualized relative to the parent node.
- Figure 1 is a single node split diagram
- Figure 2 is a univariate Classification and Regression Tree (CART) model for recovery amount
- Figure 3 is a univariate CART model for recovery timing
- Figure 4 is a multivariate CART model for recovery amount and timing using negative entropy and Hotelling
- Figure 5 is a multivariate CART model for recovery amount and timing using Kullback-Leibler divergence.
- Figure 1 illustrates a single split 10, in which a heterogeneous parent node P 12 is examined to identify a split that segregates it into more homogeneous child nodes, such as node L 14 and node R 16, as defined by an appropriate measure of diversity.
- the p(y) notation is used, with subscripts where appropriate, to describe the probability density function at the parent and child nodes in the sequel.
- Node Impurity is negative entropy
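To make the negative-entropy impurity concrete, the sketch below fits a bivariate normal distribution to the responses in a node and evaluates its negative differential entropy. It is a minimal illustration rather than the patented procedure: the function names, the divide-by-n covariance estimate, and the six (recovery fraction, months) pairs are all hypothetical.

```python
import math

def mle_mean_cov_2d(points):
    """Maximum-likelihood (divide-by-n) mean and covariance of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), ((sxx, sxy), (sxy, syy))

def gaussian_entropy_2d(cov):
    """Differential entropy, in nats, of a bivariate normal with covariance cov."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * math.log(((2.0 * math.pi * math.e) ** 2) * det)

def negative_entropy_2d(points):
    """Negative entropy of the bivariate normal fitted to a node's responses."""
    _, cov = mle_mean_cov_2d(points)
    return -gaussian_entropy_2d(cov)

# Hypothetical (recovery fraction, recovery months) pairs, not data from the patent.
left = [(0.1, 30.0), (0.2, 24.0), (0.3, 33.0)]
right = [(0.7, 12.0), (0.8, 9.0), (0.9, 15.0)]
parent = left + right

for name, node in (("parent", parent), ("left", left), ("right", right)):
    print(name, round(negative_entropy_2d(node), 4))
```

In this toy data each child node is more tightly clustered than the parent, so its fitted distribution has lower entropy and therefore a larger negative entropy.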
- CART analysis and methodology can be applied, for example, for valuation of non-performing commercial loans.
- a valuation of n non-performing commercial loans involves ascribing (underwriting) the loans with values for a recovery amount, expressed as a percentage of unpaid principal balance, and a value for recovery timing, expressed in months after an appropriate baseline date (e.g., date of acquisition). Recovery amount and timing information is sufficient to calculate the present value of future cash flows, a key part of portfolio valuation.
- Underwriters of defaulted loans use their individual and collective experience to ascribe these values.
- Statistical models can be used to associate underwriters' values with key loan attributes that shed light on the valuation process.
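As an illustration of how a recovery amount and a recovery timing combine into a present value, the sketch below discounts a single expected recovery back to the baseline date. The text does not specify a discounting convention, so the single lump-sum cash flow, the annual compounding, the discount rate, and the function and parameter names are all assumptions made for this example.

```python
def present_value(unpaid_principal, recovery_pct, recovery_months, annual_rate):
    """Present value of one expected recovery.

    unpaid_principal: unpaid principal balance of the loan
    recovery_pct:     recovery amount as a percentage of unpaid principal
    recovery_months:  months after the baseline date when the recovery arrives
    annual_rate:      assumed annual discount rate (e.g. 0.10 for 10%)
    """
    cash_flow = unpaid_principal * recovery_pct / 100.0
    return cash_flow / (1.0 + annual_rate) ** (recovery_months / 12.0)

# Hypothetical loan: 100,000 unpaid, 40% recovered after 12 months, 10% rate.
print(round(present_value(100_000.0, 40.0, 12.0, 0.10), 2))
```

The same recovery percentage is worth less the later it arrives, which is why amount and timing need to be predicted together.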
- Figure 2 illustrates a univariate CART model 20 for a percentage recovery amount for non-performing commercial loans.
- Statistics at each node are included in the rectangle representing the node.
- Node 22 shows a number, n, which represents the number of loans in the analysis.
- n is equal to 151.
- the 151 loans are examined for a split, and as noted in nodes 24 and 26, 132 of the loans have a legal status of being in collections or the subject of a lawsuit, shown in node 24, while nineteen of the loans are classified as current, as shown in node 26.
- Nodes 28 and 30 signify where another split has been identified among the 132 loans of node 24, based on a secured score, which is a scoring model prediction of whether or not the borrower's account is collateralized (secured by real estate).
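The grouping of legal-status levels at a node can be illustrated with a generic univariate regression-tree split search. The sketch below tries every two-way partition of a categorical variable's levels and keeps the one that most reduces the within-node sum of squared errors. This is a standard CART-style criterion used here for illustration, not necessarily the one behind Figure 2, and the six loan records are hypothetical.

```python
from itertools import combinations

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_categorical_split(rows, feature, target):
    """Return (gain, left_levels) for the best binary partition of a categorical feature."""
    levels = sorted({r[feature] for r in rows})
    parent_sse = sse([r[target] for r in rows])
    best = None
    for k in range(1, len(levels)):
        for left_levels in combinations(levels, k):
            if levels[0] not in left_levels:
                continue  # skip mirror images of partitions already tried
            left = [r[target] for r in rows if r[feature] in left_levels]
            right = [r[target] for r in rows if r[feature] not in left_levels]
            gain = parent_sse - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, set(left_levels))
    return best

# Hypothetical loans: legal status and recovery amount (% of unpaid principal).
loans = [
    {"legal_status": "collections", "recovery_pct": 20.0},
    {"legal_status": "collections", "recovery_pct": 25.0},
    {"legal_status": "lawsuit", "recovery_pct": 22.0},
    {"legal_status": "lawsuit", "recovery_pct": 18.0},
    {"legal_status": "current", "recovery_pct": 80.0},
    {"legal_status": "current", "recovery_pct": 90.0},
]

gain, left_levels = best_categorical_split(loans, "legal_status", "recovery_pct")
print(sorted(left_levels), round(gain, 2))
```

With these made-up values the search groups collections and lawsuit loans together and separates current loans, the same kind of grouping described for nodes 24 and 26.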
- Figure 3 illustrates a univariate CART model 40 for a recovery timing amount in months for non-performing commercial loans.
- Statistics at each node are included in the rectangle representing the node.
- Node 42 shows a number, n, which represents the number of loans in the analysis.
- n is equal to 151.
- the 151 loans are examined for a split, and as noted in nodes 44 and 46, thirty of the loans have a legal status of being in collections or current, shown in node 44, while 121 of the loans are classified as being the subject of a lawsuit, as shown in node 46.
- Nodes 48 and 50 signify where another split has been identified between the thirty loans of node 44, node 48 signifying that payers have paid in the last twelve months in nine of the thirty loans in node 44 and node 50 signifying that no payments have been made in the last twelve months for twenty-one of the thirty loans.
- Nodes 52 and 54 signify where another split has been identified between the 121 loans of node 46 relating to a secured score which is a scoring model prediction of whether or not the borrower's account is collateralized (secured by real estate).
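- Covariance structure split function: $\phi(s, P) = \log(\cdots)$ (the argument of the logarithm is not recoverable from the extracted text).
- Mean structure split function: $\phi(s, P) \propto (\mu_L - \mu_R)' \, \Sigma^{-1} (\mu_L - \mu_R)$ (the leading coefficient is not recoverable from the extracted text).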
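The mean structure split function above appears to be built around a quadratic form in the difference between the left and right child mean vectors, weighted by an inverse covariance matrix. A minimal sketch of that quadratic form for a two-dimensional response (recovery amount and timing) follows. The function names, the choice to pass the covariance in as an argument, and the omission of any leading scale factor are assumptions, not details taken from the patent.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested tuples."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return ((m[1][1] / det, -m[0][1] / det),
            (-m[1][0] / det, m[0][0] / det))

def mean_separation(mu_left, mu_right, cov):
    """Quadratic form (mu_L - mu_R)' cov^{-1} (mu_L - mu_R) for 2-D means."""
    d = (mu_left[0] - mu_right[0], mu_left[1] - mu_right[1])
    inv = inverse_2x2(cov)
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))

# Hypothetical child means (recovery fraction, months) and a shared covariance.
mu_l = (0.2, 29.0)
mu_r = (0.8, 12.0)
shared_cov = ((0.01, 0.0), (0.0, 16.0))
print(round(mean_separation(mu_l, mu_r, shared_cov), 4))
```

A candidate split whose child means are far apart relative to the covariance scores higher under this quantity, which is the intuition behind a Hotelling-type criterion.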
- Figure 4 illustrates a single CART model 60, resulting from an implementation version of either of the covariance structure split function equation or the mean structure split function equation above.
- the explanatory variables used in the analysis are: account status, secured score, and legal status which are described above.
- a split is identified in node 62 regarding the legal status of the 151 loans.
- Node 64 signifies that nineteen of the 151 loans have a legal status of current, while node 66 signifies that 132 of the 151 loans are in collections or are the subject of a lawsuit.
- Splits are identified in both nodes 64 and 66.
- the split in node 64, the nineteen loans that are current, is indicated in node 68 which shows that ten of the nineteen loans have had no payment activity over the last twelve months and node 70 shows that nine of the nineteen loans from node 64 have had payment activity.
- Node 66 is split into two nodes 72 and 74, where node 72 signifies that 121 of the 132 loans of node 66 are the subject of a lawsuit, while node 74 signifies that eleven of the loans are in collections.
- the 121 loans of node 72 are further separated into nodes 76 and 78, showing that of the 121 loans that are subjects of lawsuits, 54 are secured by assets such as real estate, shown in node 76, while 67 of the loans are unsecured, shown by node 78.
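The Figure 4 tree described above can be written down as a small nested data structure and checked for internal consistency. The node numbers and loan counts below are taken from the description. The dictionary layout and the helper names are assumptions made for this sketch.

```python
# Node numbers and loan counts as described for Figure 4.
FIGURE_4_TREE = {
    "node": 62, "n": 151, "children": [
        {"node": 64, "n": 19, "children": [          # legal status: current
            {"node": 68, "n": 10, "children": []},   # no payment activity
            {"node": 70, "n": 9, "children": []},    # payment activity
        ]},
        {"node": 66, "n": 132, "children": [         # collections or lawsuit
            {"node": 72, "n": 121, "children": [     # lawsuit
                {"node": 76, "n": 54, "children": []},  # secured
                {"node": 78, "n": 67, "children": []},  # unsecured
            ]},
            {"node": 74, "n": 11, "children": []},   # collections
        ]},
    ],
}

def counts_are_consistent(node):
    """True if every split's child counts add up to the parent's count."""
    children = node["children"]
    if not children:
        return True
    if sum(child["n"] for child in children) != node["n"]:
        return False
    return all(counts_are_consistent(child) for child in children)

def leaves(node):
    """Terminal nodes of the tree, left to right."""
    if not node["children"]:
        return [node]
    found = []
    for child in node["children"]:
        found.extend(leaves(child))
    return found

print(counts_are_consistent(FIGURE_4_TREE))
print([(leaf["node"], leaf["n"]) for leaf in leaves(FIGURE_4_TREE)])
```

The check confirms that the five terminal nodes partition all 151 loans.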
- the split function of the present invention is ⁇ (s, P)
- R log signifies the expected value, taken over the joint
- the present invention uses Kullback-Leibler divergence as a node split criterion. This criterion has an interpretation related to the node impurity function described earlier.
- Kullback-Leibler divergence is a general measure of discrepancy between probability distributions that is usually a function of mean and covariance structure.
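For multivariate normal distributions the divergence has a well-known closed form that depends only on the two mean vectors and covariance matrices. The sketch below evaluates it for the two-dimensional case. It shows the standard formula for the divergence between two Gaussians, not the patent's exact split function, and the function names are assumptions.

```python
import math

def det_2x2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def kl_gaussian_2d(mu0, cov0, mu1, cov1):
    """KL divergence KL(N0 || N1), in nats, between two bivariate normals.

    Closed form: 0.5 * [tr(cov1^-1 cov0) + (mu1-mu0)' cov1^-1 (mu1-mu0)
                        - 2 + ln(det cov1 / det cov0)]
    """
    det0, det1 = det_2x2(cov0), det_2x2(cov1)
    if det0 <= 0 or det1 <= 0:
        raise ValueError("covariances must be positive definite")
    inv1 = ((cov1[1][1] / det1, -cov1[0][1] / det1),
            (-cov1[1][0] / det1, cov1[0][0] / det1))
    trace = sum(inv1[i][j] * cov0[j][i] for i in range(2) for j in range(2))
    d = (mu1[0] - mu0[0], mu1[1] - mu0[1])
    quad = sum(d[i] * inv1[i][j] * d[j] for i in range(2) for j in range(2))
    return 0.5 * (trace + quad - 2.0 + math.log(det1 / det0))

identity = ((1.0, 0.0), (0.0, 1.0))
# A shift in the mean alone, then a change in the covariance alone.
print(kl_gaussian_2d((0.0, 0.0), identity, (1.0, 1.0), identity))
print(kl_gaussian_2d((0.0, 0.0), identity, (0.0, 0.0), ((2.0, 0.0), (0.0, 2.0))))
```

Both a difference in means and a difference in covariance contribute to the divergence, which is consistent with the statement that it is a function of mean and covariance structure.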
- Figure 5 displays a single CART model 90, resulting from an implementation version using maximum likelihood estimations of the split function, ⁇ (s, P) defined above.
- the explanatory variables used in the analysis are: account status, secured score, and legal status which are described above and again the 151 commercial loans example is used.
- a split is identified in node 92 regarding account activity of the 151 loans over the past twelve months, resulting in a split into nodes 94 and 96.
- at node 96, where no payments have been received for 142 of the original 151 loans, another split is identified regarding the secured status of the 142 loans.
- Node 98 shows that 61 of the 142 loans of node 96 are secured, perhaps by real estate, while node 100 shows that 81 of the 142 loans of node 96 are unsecured.
- a split identified in node 100 results in nodes 102 and 104, where node 102 represents that ten of the 81 loans of node 100 are in collections, while node 104 represents that 71 of the 81 loans of node 100 have a legal status of being current or in lawsuit.
- Multivariate CART response methodology is useful for determination of recovery timings and amounts and is more efficient than known univariate response models in that one model is used to data mine through multiple covariates to predict future loan performance.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU22979/02A AU785207B2 (en) | 2000-07-19 | 2001-07-11 | Multivariate responses using classification and regression trees systems and methods |
EP01984452A EP1316046A1 (en) | 2000-07-19 | 2001-07-11 | Multivariate responses using classification and regression trees systems and methods |
CA002385141A CA2385141A1 (en) | 2000-07-19 | 2001-07-11 | Multivariate responses using classification and regression trees systems and methods |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/619,278 US7003490B1 (en) | 2000-07-19 | 2000-07-19 | Multivariate responses using classification and regression trees systems and methods |
US09/619,278 | 2000-07-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2002011017A2 true WO2002011017A2 (en) | 2002-02-07 |
Family
ID=24481218
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2001/021753 WO2002011017A2 (en) | 2000-07-19 | 2001-07-11 | Multivariate responses using classification and regression trees systems and methods |
Country Status (5)
Country | Link |
---|---|
US (1) | US7003490B1 (en) |
EP (1) | EP1316046A1 (en) |
AU (1) | AU785207B2 (en) |
CA (1) | CA2385141A1 (en) |
WO (1) | WO2002011017A2 (en) |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020174088A1 (en) * | 2001-05-07 | 2002-11-21 | Tongwei Liu | Segmenting information records with missing values using multiple partition trees |
US6654727B2 (en) * | 2001-11-29 | 2003-11-25 | Lynn Tilton | Method of securitizing a portfolio of at least 30% distressed commercial loans |
US7251639B2 (en) * | 2002-06-27 | 2007-07-31 | Microsoft Corporation | System and method for feature selection in decision trees |
JP2004054769A (en) * | 2002-07-23 | 2004-02-19 | Ns Solutions Corp | Loan assets management system, loan assets management method, and recording medium and program therefor |
US7558755B2 (en) * | 2005-07-13 | 2009-07-07 | Mott Antony R | Methods and systems for valuing investments, budgets and decisions |
US20070055619A1 (en) * | 2005-08-26 | 2007-03-08 | Sas Institute Inc. | Systems and methods for analyzing disparate treatment in financial transactions |
WO2008137544A1 (en) | 2007-05-02 | 2008-11-13 | Mks Instruments, Inc. | Automated model building and model updating |
KR100902006B1 (en) | 2007-05-29 | 2009-06-11 | 주식회사 신한은행 | System and Method for Dealing Non Performing Loan and Program Recording Medium |
US20090055140A1 (en) * | 2007-08-22 | 2009-02-26 | Mks Instruments, Inc. | Multivariate multiple matrix analysis of analytical and sensory data |
JP2011508320A (en) * | 2007-12-21 | 2011-03-10 | エム ケー エス インストルメンツ インコーポレーテッド | Hierarchical organization of data using partial least squares analysis (PLS-tree) |
US9892461B2 (en) * | 2008-06-09 | 2018-02-13 | Ge Corporate Financial Services, Inc. | Methods and systems for assessing underwriting and distribution risks associated with subordinate debt |
US8494798B2 (en) * | 2008-09-02 | 2013-07-23 | Mks Instruments, Inc. | Automated model building and batch model building for a manufacturing process, process monitoring, and fault detection |
US9069345B2 (en) * | 2009-01-23 | 2015-06-30 | Mks Instruments, Inc. | Controlling a manufacturing process with a multivariate model |
US8331699B2 (en) * | 2009-03-16 | 2012-12-11 | Siemens Medical Solutions Usa, Inc. | Hierarchical classifier for data classification |
US8234230B2 (en) * | 2009-06-30 | 2012-07-31 | Global Eprocure | Data classification tool using dynamic allocation of attribute weights |
US8855804B2 (en) | 2010-11-16 | 2014-10-07 | Mks Instruments, Inc. | Controlling a discrete-type manufacturing process with a multivariate model |
US9429939B2 (en) | 2012-04-06 | 2016-08-30 | Mks Instruments, Inc. | Multivariate monitoring of a batch manufacturing process |
US9541471B2 (en) | 2012-04-06 | 2017-01-10 | Mks Instruments, Inc. | Multivariate prediction of a batch manufacturing process |
US20130346033A1 (en) * | 2012-06-21 | 2013-12-26 | Jianqiang Wang | Tree-based regression |
US20140372174A1 (en) | 2013-06-12 | 2014-12-18 | MEE - Multidimensional Economic Evaluators LLC | Multivariate regression analysis |
US9460402B2 (en) | 2013-12-27 | 2016-10-04 | International Business Machines Corporation | Condensing hierarchical data |
US9582566B2 (en) | 2013-12-27 | 2017-02-28 | International Business Machines Corporation | Condensing hierarchical data |
GB201610984D0 (en) | 2016-06-23 | 2016-08-10 | Microsoft Technology Licensing Llc | Suppression of input images |
CN107272611A (en) * | 2017-05-27 | 2017-10-20 | 四川用联信息技术有限公司 | A kind of algorithm for weighing manufacture procedure quality ability |
Family Cites Families (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6016255A (en) | 1990-11-19 | 2000-01-18 | Dallas Semiconductor Corp. | Portable data carrier mounting system |
AU1427492A (en) * | 1991-02-06 | 1992-09-07 | Risk Data Corporation | System for funding future workers' compensation losses |
US5737440A (en) | 1994-07-27 | 1998-04-07 | Kunkler; Todd M. | Method of detecting a mark on a graphic icon |
US5740271A (en) | 1994-07-27 | 1998-04-14 | On-Track Management System | Expenditure monitoring system |
JPH0877010A (en) * | 1994-09-07 | 1996-03-22 | Hitachi Ltd | Method and device for data analysis |
US5864839A (en) * | 1995-03-29 | 1999-01-26 | Tm Patents, L.P. | Parallel system and method for generating classification/regression tree |
US5710887A (en) | 1995-08-29 | 1998-01-20 | Broadvision | Computer system and method for electronic commerce |
US5671279A (en) | 1995-11-13 | 1997-09-23 | Netscape Communications Corporation | Electronic commerce using a secure courier system |
AU2191197A (en) | 1996-02-26 | 1997-09-10 | E Guide, Inc. | Cordless phone back link for interactive television system |
JPH09282900A (en) | 1996-04-11 | 1997-10-31 | Oki Electric Ind Co Ltd | Memory module |
US5850446A (en) | 1996-06-17 | 1998-12-15 | Verifone, Inc. | System, method and article of manufacture for virtual point of sale processing utilizing an extensible, flexible architecture |
US5889863A (en) | 1996-06-17 | 1999-03-30 | Verifone, Inc. | System, method and article of manufacture for remote virtual point of sale processing utilizing a multichannel, extensible, flexible architecture |
US5987132A (en) | 1996-06-17 | 1999-11-16 | Verifone, Inc. | System, method and article of manufacture for conditionally accepting a payment method utilizing an extensible, flexible architecture |
US6002767A (en) | 1996-06-17 | 1999-12-14 | Verifone, Inc. | System, method and article of manufacture for a modular gateway server architecture |
US6026379A (en) | 1996-06-17 | 2000-02-15 | Verifone, Inc. | System, method and article of manufacture for managing transactions in a high availability system |
US5812668A (en) | 1996-06-17 | 1998-09-22 | Verifone, Inc. | System, method and article of manufacture for verifying the operation of a remote transaction clearance system utilizing a multichannel, extensible, flexible architecture |
US5943424A (en) | 1996-06-17 | 1999-08-24 | Hewlett-Packard Company | System, method and article of manufacture for processing a plurality of transactions from a single initiation point on a multichannel, extensible, flexible architecture |
US5983208A (en) | 1996-06-17 | 1999-11-09 | Verifone, Inc. | System, method and article of manufacture for handling transaction results in a gateway payment architecture utilizing a multichannel, extensible, flexible architecture |
US5931917A (en) | 1996-09-26 | 1999-08-03 | Verifone, Inc. | System, method and article of manufacture for a gateway system architecture with system administration information accessible from a browser |
US5978840A (en) | 1996-09-26 | 1999-11-02 | Verifone, Inc. | System, method and article of manufacture for a payment gateway system architecture for processing encrypted payment transactions utilizing a multichannel, extensible, flexible architecture |
US5996076A (en) | 1997-02-19 | 1999-11-30 | Verifone, Inc. | System, method and article of manufacture for secure digital certification of electronic commerce |
US6249775B1 (en) * | 1997-07-11 | 2001-06-19 | The Chase Manhattan Bank | Method for mortgage and closed end loan portfolio management |
US6026364A (en) * | 1997-07-28 | 2000-02-15 | Whitworth; Brian L. | System and method for replacing a liability with insurance and for analyzing data and generating documents pertaining to a premium financing mechanism paying for such insurance |
-
2000
- 2000-07-19 US US09/619,278 patent/US7003490B1/en not_active Expired - Lifetime
-
2001
- 2001-07-11 EP EP01984452A patent/EP1316046A1/en not_active Withdrawn
- 2001-07-11 AU AU22979/02A patent/AU785207B2/en not_active Ceased
- 2001-07-11 CA CA002385141A patent/CA2385141A1/en not_active Abandoned
- 2001-07-11 WO PCT/US2001/021753 patent/WO2002011017A2/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
US7003490B1 (en) | 2006-02-21 |
AU785207B2 (en) | 2006-11-02 |
AU2297902A (en) | 2002-02-13 |
CA2385141A1 (en) | 2002-02-07 |
EP1316046A1 (en) | 2003-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7003490B1 (en) | Multivariate responses using classification and regression trees systems and methods | |
Chen et al. | Mining the customer credit using hybrid support vector machine technique | |
Chen et al. | A hybrid approach for portfolio selection with higher-order moments: Empirical evidence from Shanghai Stock Exchange | |
Edirisinghe et al. | Portfolio selection under DEA-based relative financial strength indicators: case of US industries | |
Nikolic et al. | The application of brute force logistic regression to corporate credit scoring models: Evidence from Serbian financial statements | |
Nazemi et al. | Improving corporate bond recovery rate prediction using multi-factor support vector regressions | |
Maher et al. | Predicting bond ratings using neural networks: a comparison with logistic regression | |
Kim | Predicting bond ratings using publicly available information | |
Solares et al. | Handling uncertainty through confidence intervals in portfolio optimization | |
Baştürk et al. | Forecast density combinations of dynamic models and data driven portfolio strategies | |
Halkos et al. | Effective energy commodity risk management: Econometric modeling of price volatility | |
Ravi et al. | Foreign exchange rate prediction using computational intelligence methods | |
Korolkiewicz et al. | A hidden Markov model of credit quality | |
Amendola et al. | Variable selection in default risk models | |
Zafeiriou et al. | Short-term trend prediction of foreign exchange rates with a neural-network based ensemble of financial technical indicators | |
Shahbazi | Using decision tree classification algorithm to design and construct the credit rating model for banking customers | |
US7617172B2 (en) | Using percentile data in business analysis of time series data | |
Wu et al. | The weighted average information criterion for multivariate regression model selection | |
Avellaneda et al. | Hierarchical PCA and modeling asset correlations | |
Mohanty et al. | A support vector regression framework for indian bond price prediction | |
Daniels et al. | Derivation of monotone decision models from non-monotone data | |
Raymaekers et al. | Weight-of-evidence 2.0 with shrinkage and spline-binning | |
US20110295767A1 (en) | Inverse solution for structured finance | |
Bernardi et al. | High-dimensional sparse financial networks through a regularised regression model | |
Serur et al. | Hierarchical PCA and modeling asset correlations |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A2. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2. Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
WWE | Wipo information: entry into national phase | Ref document number: 2385141. Country of ref document: CA |
WWE | Wipo information: entry into national phase | Ref document number: 22979/02. Country of ref document: AU |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
WWE | Wipo information: entry into national phase | Ref document number: 2001984452. Country of ref document: EP |
REG | Reference to national code | Ref country code: DE. Ref legal event code: 8642 |
WWP | Wipo information: published in national office | Ref document number: 2001984452. Country of ref document: EP |
NENP | Non-entry into the national phase | Ref country code: JP |