CA2701178A1

CA2701178A1 - System and method for converting a natural language query into a logical query

Info

Publication number: CA2701178A1
Application number: CA2701178A
Authority: CA
Inventors: Sreenivasa Rao PRAGADA; Viswanath Dasari; Abhijit A. PATIL
Original assignee: Execue, Inc.; Sreenivasa Rao PRAGADA; Viswanath Dasari; Abhijit A. PATIL
Current assignee: EXECUE Inc
Priority date: 2006-10-31
Filing date: 2007-10-31
Publication date: 2008-06-12
Also published as: EP2082334A2; EP2082334A4; WO2008070362A3; WO2008070362A2; US8924197B2; US20080104071A1

Abstract

Disclosed are systems, methods, and computer readable media for converting a natural language query into a logical query. The method embodiment comprises receiving a natural language query and converting the natural language query using an extensible engine to generate a logical query, the extensible engine being linked to the toolkit and knowledge base. In one embodiment, a natural language query can be processed in a domain independent method to generate a logical query.

Description

PCT/US2007/083152
SYSTEM AND METHOD FOR CONVERTING A NATURAL LANGUAGE QUERY INTO A LOGICAL QUERY

PRIORITY CLAIM

[0001] The present application claims the benefit of U.S. Provisional Application No. 60/863,799, filed October 31, 2006, the content of which is incorporated herein by reference in its entirety.

BACKGROUND OF THE INVENTION
1. Field of the Invention

[0002] The present invention relates generally to semantic searches and specifically to converting natural language queries into logical queries.

2. Introduction

[0003] Many approaches have been used to try to solve the general problem of using natural language to search structured databases or unstructured text, with varying levels of success. A potential approach or solution can be broadly divided into two parts: (1) processing the natural language question into a logical query, and (2) mapping the converted query to databases. This application discusses the first part.

[0004] Keyword matching and grammar-based natural language processing are some common approaches to addressing the first part of processing the natural language question into a logical query. Each of these two techniques has significant limitations individually. Keyword-based querying is a simple method of matching keywords in the user query to the database entities. Keyword matching may be effective in handling simple questions like "number of customers," but tends to be highly error-prone in handling complex questions when understanding proper associations of the different parts of the user query is necessary.

[0005] A keyword-based natural language query consists of a simple list of words entered by the user, much like what many people enter as search strings in modern search engines. For example, if a user is searching for the five-day weather forecast in Bermuda, the user may say or enter the text "Bermuda weather". From the user's point of view, these keyword-based searches may be convenient and do not require strict syntax while entering the query. The query context is in the user's mind, and thus it would be very difficult, if not impossible, for a natural language processor to understand the meaning and intent of the query. For example, if a user enters the keywords "cold fusion", the system would not know if the user meant ColdFusion the software, the energy generation technique used by nuclear physicists, or two unrelated keywords "cold" and "fusion".

[0006] Three fundamental problems with a keyword-based approach are (1) the same word could have multiple, different meanings based on the context or domain the user is interested in, (2) the keyword-based approach could result in a huge list of alternative answers, leaving the burden of selecting the right answer to the user, and (3) the approach becomes ineffective as the targeted volume of search space words increases.

[0007] A grammar-based or language processing approach to dissecting a user query using parts-of-speech, grammars, etc. is also common. However, the success of grammar-based solutions is limited based on dependency on a properly framed question, language ambiguity, and, most importantly, the lack of a grammar or a minimized grammar appropriate to business-speak, which is how business users tend to ask questions (or for a particular domain).

[0008] A grammar-based approach typically defines a strict syntax for the natural language processor. The rules are defined for convenience of implementation. Users are seldom aware of these rules or the rationale behind them. When a user types a query that exactly matches the foreordained syntax, the language processor understands the query and possibly some of the relationships among the keywords. These processors do a better job in accurately recognizing the meaning of the query when compared with keyword-based language processors.

[0009] However, grammar-based processors also have many limitations. First, grammar rules are not known to the end user. For example, users may not be aware that a concept must be followed by a unit of time for the grammar rule to work, as in "Sales in January". For some users, an input such as "January Sales" may be more convenient. Second, the grammar rules can become complex as the number of combinations of rules increases. Third, grammar rules focus more on syntax and order than on semantic meanings and relationships. Fourth, grammar-based processors are hard to extend and are unable to find new relationships that the system does not already know about. Fifth, the grammar-based approach is more suitable for implementing a new programming language on a specific hardware platform and is not an effective solution for natural language processing.

[0010] A programmatic and rules-based approach to parsing natural user query portions is another common approach in addressing some of the challenges of these techniques. For example, developers attempt to envision various forms of natural phrases and try to address them programmatically, writing code for each or more common structures. While this approach may prove reasonably effective with limited phrases, it can become unwieldy very quickly when parsing natural language queries.

[0011] Folksonomy is another information retrieval methodology consisting of user generated, open-ended labels that categorize content such as web pages, online photographs, and web links. A folksonomy is most notably contrasted from a taxonomy in that the authors of the labeling system are often the main users (and sometimes originators or experts) of the content to which the labels are applied. The labels are commonly known as tags and the labeling process is called tagging. The process of folksonomic tagging is intended to make a body of information increasingly easier to search, discover, and navigate over time. A well-developed folksonomy is ideally accessible as a shared vocabulary that is both originated by and familiar to its primary users. Two widely cited examples of websites using folksonomic tagging are Flickr and Del.icio.us. Folksonomy, while collaboratively generated, suffers from the same challenges as keyword-based search with the lack of relationship information.

[0012] Lastly, an ontology attempts to represent a real-world view of business models, grammars, sentence constructs, or phrases. However, building semantic frameworks quickly becomes time consuming and cost prohibitive as the scope of the application or the domain increases.

[0013] Accordingly, what is needed in the art is a way to process a natural language query that can overcome the limitations of a single, rigid approach.

SUMMARY OF THE INVENTION

[0014] Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth herein.

[0015] Disclosed herein are systems, methods, and computer readable media for converting a natural language query into a logical query. An exemplary method embodiment of the invention includes receiving a natural language query, processing the natural language query using an extensible engine to generate a logical query, the extensible engine being linked to the toolkit and knowledge base.

[0016] The principles of the invention may be utilized to provide a flexible, robust method of converting natural language queries to a logical query without forcing the user to learn an artificial syntax other than that of natural language or without using any structured input/query forms.

BRIEF DESCRIPTION OF THE DRAWINGS

[0017] In order to describe the manner in which the above-recited and other advantages and features of the invention can be obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

[0018] FIG. 1 illustrates a basic system or computing device embodiment of the invention;

[0019] FIG. 2A illustrates a method embodiment of the invention;

[0020] FIG. 2B illustrates basic components of the invention;

[0021] FIG. 3 illustrates a high level flow diagram of the extensible engine;

[0022] FIG. 4 illustrates a flow diagram of token standardization;

[0023] FIG. 5A illustrates an aspect of a snowflake formation; and

[0024] FIG. 5B illustrates another aspect of a snowflake formation.

DETAILED DESCRIPTION OF THE INVENTION

[0025] Various embodiments of the invention are discussed in detail below. While specific implementations are discussed, it should be understood that this is done for illustration purposes only. A person skilled in the relevant art will recognize that other components and configurations may be used without parting from the spirit and scope of the invention.

[0026] With reference to FIG. 1, an exemplary system for implementing the invention includes a general-purpose computing device 100, including a processing unit (CPU) 120 and a system bus 110 that couples various system components including the system memory such as read only memory (ROM) 140 and random access memory (RAM) 150 to the processing unit 120. Other system memory 130 may be available for use as well. It can be appreciated that the invention may operate on a computing device with more than one CPU 120 or on a group or cluster of computing devices networked together to provide greater processing capability. The system bus 110 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. A basic input/output system (BIOS), containing the basic routine that helps to transfer information between elements within the computing device 100, such as during start-up, is typically stored in ROM 140. The computing device 100 further includes storage means such as a hard disk drive 160, a magnetic disk drive, an optical disk drive, tape drive or the like. The storage device 160 is connected to the system bus 110 by a drive interface. The drives and the associated computer readable media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the computing device 100. The basic components are known to those of skill in the art and appropriate variations are contemplated depending on the type of device, such as whether the device is a small, handheld computing device, a desktop computer, or a computer server.

[0027] Although the exemplary environment described herein employs the hard disk, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that are accessible by a computer, such as magnetic cassettes, flash memory cards, digital versatile disks, cartridges, random access memories (RAMs), read only memory (ROM), a cable or wireless signal containing a bit stream and the like, may also be used in the exemplary operating environment.

[0028] To enable user interaction with the computing device 100, an input device 190 represents any number of input mechanisms, such as a microphone for speech, a touch sensitive screen for gesture or graphical input, keyboard, mouse, motion input, speech and so forth. The input may be used by the presenter to indicate the beginning of a speech search query. The device output 170 can also be one or more of a number of output means. In some instances, multimodal systems enable a user to provide multiple types of input to communicate with the computing device 100. The communications interface 180 generally governs and manages the user input and system output. There is no restriction on the invention operating on any particular hardware arrangement and therefore the basic features here may easily be substituted for improved hardware or firmware arrangements as they are developed.

[0029] For clarity of explanation, the illustrative embodiment of the present invention is presented as comprising individual functional blocks (including functional blocks labeled as a "processor"). The functions these blocks represent may be provided through the use of either shared or dedicated hardware, including, but not limited to, hardware capable of executing software. For example, the functions of one or more processors presented in FIG. 1 may be provided by a single shared processor or multiple processors. (Use of the term "processor" should not be construed to refer exclusively to hardware capable of executing software.) Illustrative embodiments may comprise microprocessor and/or digital signal processor (DSP) hardware, read-only memory (ROM) for storing software performing the operations discussed below, and random access memory (RAM) for storing results. Very large scale integration (VLSI) hardware embodiments, as well as custom VLSI circuitry in combination with a general purpose DSP circuit, may also be provided.

[0030] FIG. 2A illustrates a method embodiment of the invention. First, the method includes receiving a natural language query (202). As the natural language query is received, or afterwards, each word in the query may be examined and corrected, if necessary. A dictionary of expected or acceptable words may serve as a correction mechanism. Correcting the natural language query in this way allows for a degree of flexibility and leniency in the conversion from natural language to a structured logical query. Flexibility may be a desirable attribute when the natural language is typed text which may contain typos or when the natural language is speech with a thick accent, for example. Known processes, such as automatic speech recognition technology, may be used to convert an audible natural language utterance into text.
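As an illustration only, the dictionary-based correction described above might be sketched as follows. The vocabulary, the function name `correct_query`, and the similarity cutoff are assumptions made for this example and are not taken from the specification; the matching itself relies on Python's standard `difflib` module rather than any technique named in the disclosure.

```python
import difflib

# Hypothetical dictionary of expected words; a real system would load its own.
VOCABULARY = ["sales", "customers", "revenue", "january", "balance",
              "in", "by", "number", "of"]

def correct_query(query, vocabulary=VOCABULARY, cutoff=0.75):
    """Replace each unknown word with its closest dictionary entry, if any."""
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # get_close_matches returns the best matches scoring at least `cutoff`.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        # Words with no sufficiently close entry are left unchanged.
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)
```

Under these assumptions, a mistyped input such as "salse in januray" would be corrected to "sales in january", while a word with no close dictionary entry passes through untouched.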

[0031] Second, the method preprocesses the natural language query using the extensible engine toolkit linked to a knowledge base (204). As a part of the preprocessing, certain predictable word patterns may be identified and parsed. For example, if the word pattern "from NUMBER1 to NUMBER2" is recognized, that phrase is a range of numbers, whereas the same general word pattern "from PLACE1 to PLACE2" may mean something entirely different. Recognition of commonly expected or commonly used word patterns may simplify the preprocessing. The knowledge base may include semantic frameworks explaining business or domain models; or universal word patterns, mathematical or regular expressions or conversion tables. The knowledge base is a collection of knowledge about and relationships between concepts. The toolkit may include one or more of a parts of speech tagger, spell checker, domain instance recognizer, word compactor, synonym handlers, domain specific taggers, a snowflake processor, ambiguity handlers, series recognizers or word-pattern recognizers. Each toolkit item may be included in part or in whole and can be applied multiple times. More on these various tools in the toolkit will be provided below.
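The "from NUMBER1 to NUMBER2" example above can be sketched with a regular expression, one of the knowledge base resources the paragraph mentions. The pattern, the single-token assumption for each operand, and the name `classify_from_to` are illustrative assumptions, not part of the disclosure.

```python
import re

# Assumed pattern: "from X to Y", where X and Y are single tokens.
FROM_TO = re.compile(r"\bfrom\s+(\S+)\s+to\s+(\S+)", re.IGNORECASE)
NUMBER = re.compile(r"^\d+(\.\d+)?$")

def classify_from_to(query):
    """Return ('number_range', lo, hi), ('other', a, b), or None if no match."""
    match = FROM_TO.search(query)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Only when both operands are numeric is the phrase treated as a range.
    if NUMBER.match(first) and NUMBER.match(second):
        return ("number_range", float(first), float(second))
    # Same surface pattern, different meaning (for example, two places).
    return ("other", first, second)
```

The same surface pattern is thus routed to different interpretations depending on what fills its slots, which is the distinction the paragraph draws between numbers and places.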

[0032] For example, the ambiguity handlers can recognize and resolve ambiguity when different tools in the toolkit yield dissimilar outcomes.

[0033] Third, the method processes the natural language query using an extensible engine to generate a logical query, the extensible engine being linked to the toolkit (206). Processing the natural language query may include parsing the natural language query repeatedly until every word is recognized as a domain, general, or an attribute concept, i.e., no additional information can be inferred about the natural language query within the scope of the toolkit and knowledge base. Repeatedly querying every word until it is recognized is a process similar to a barcode scanner. In barcode scanners, one scan of a damaged barcode may not provide enough information. Multiple scans at varying angles may be made in an attempt to fill in any insufficiency in the previous scans or to verify the results from the first scan. In a similar way, the extensible engine can apply various tools in the toolkit to get different perspectives, or domain views, of the natural language query. These multiple "angles" may be used to better understand the natural language query and the associations and relationships between the individual words in the natural language query. Thus, the extensible engine truly processes the natural language query by scanning the natural language query multiple times using the toolkit and the knowledge base.

[0034] The extensible engine may be implemented as a collection of algorithms and/or data structures. The extensible engine thus may be easily added to or easily modified without significant investments in software and without significant downtime, if any at all. The engine is not dependent on any one tool in the toolkit. It uses a collection of tools and can continue to operate when one or more tools is removed. Each algorithm or data structure may be used to process the natural language query in order, at random, one time, multiple times, or in any combination desirable, etc. until the relationships and associations of the words in the natural language query are sufficiently understood, i.e., no additional information can be inferred about the natural language query within the scope of the toolkit and knowledge base. The extensible engine may also include a main processing algorithm that directs the application of the toolkit and stores all intermediate results inside a plurality of data structures. The extensible engine may store the intermediate results immediately. The extensible engine may be designed to store the intermediate results in a structure like a multi-dimensional matrix. The extensible engine may recognize and resolve ambiguity when different tools in the toolkit yield dissimilar outcomes and assess potential concepts and relationships between the concepts based on combining one or more outcomes from the operation of the toolkit. In one aspect, the engine repeatedly applies tools, captures outcomes, and assesses the concepts and relationships based on the combination of outcomes until no additional information can be inferred within the scope of the toolkit and the knowledge base.
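The repeated application of tools until nothing new can be inferred can be pictured as a loop that runs to a fixed point. The sketch below is one possible reading, not the implementation in the Appendix: the tool signature, the tag strings, and both example tools are hypothetical.

```python
def run_engine(tokens, tools):
    """Apply every tool repeatedly until a full pass adds no new tag."""
    tags = {token: set() for token in tokens}  # intermediate results per token
    changed = True
    while changed:
        changed = False
        for tool in tools:
            # Each tool sees the tokens and everything inferred so far.
            for token, tag in tool(tokens, tags):
                if tag not in tags[token]:
                    tags[token].add(tag)
                    changed = True
    return tags

# Two hypothetical tools used only to exercise the loop.
def concept_tool(tokens, tags):
    known = {"sales": "concept:Sales", "january": "instance:Month"}
    return [(t, known[t]) for t in tokens if t in known]

def timeframe_tool(tokens, tags):
    # Depends on another tool's outcome: fires only after a Month is tagged.
    return [(t, "timeframe") for t in tokens if "instance:Month" in tags[t]]
```

Because `timeframe_tool` needs an outcome from `concept_tool`, a single pass in the order `[timeframe_tool, concept_tool]` is not enough; the second pass picks up the timeframe tag. Dropping either tool from the list leaves the loop working with whatever tools remain.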

[0035] Attached hereto is Appendix A which illustrates an example of the various functions which may be used by the extensible engine. The content in the Appendix is incorporated herein by reference and individual portions or all of the Appendix may be brought into the body of this specification by amendment.

[0036] The generated logical query may be platform independent so as to be adaptable to work with any logical query language or construct, for example a SQL database or an MS Access database. The logical query is independent of platform, data source or database technologies. The logical query can be converted into platform and data source queries by anyone skilled in the field of querying data.
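To make the idea of a platform-independent logical query concrete, the sketch below holds a query as concepts and filters and renders it to SQL only at the end. The `LogicalQuery` structure, the concept names, the single-table schema, and the mapping are all hypothetical; the specification does not define the logical query format at this point.

```python
from dataclasses import dataclass, field

@dataclass
class LogicalQuery:
    """Hypothetical platform-independent form: concepts, not tables or columns."""
    measure: str
    filters: list = field(default_factory=list)  # (concept, operator, value)

# Hypothetical mapping from concepts to one particular single-table SQL schema.
SQL_MAPPING = {
    "Customer": ("customers", "id"),
    "LoanBalance": ("customers", "loan_balance"),
}

def to_sql(query, mapping=SQL_MAPPING):
    """Render the logical query as a parameterized SQL string plus parameters."""
    table, column = mapping[query.measure]
    clauses, params = [], []
    for concept, op, value in query.filters:
        if op not in {"=", "<", ">", "<=", ">="}:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"{mapping[concept][1]} {op} ?")
        params.append(value)
    sql = f"SELECT {column} FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

Targeting a different data source would mean supplying a different mapping or renderer while the `LogicalQuery` object stays the same, which is the separation the paragraph describes.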

[0037] FIG. 2B illustrates some of the basic components of the present invention. In connection with the further discussion herein, FIG. 2B illustrates a query 208, which may be a textual natural language query or an audible natural language query, which is received into the system. Preprocessor 210 preprocesses the query in a manner which communicates with toolkit 212, which is coupled to knowledge base 214. Following the preprocessing of the query, data is communicated to the processor 216, which performs the steps disclosed herein via a collection of algorithms or data structures to process the query multiple times in order to infer as much information as possible. The extensible engine 215 preferably encompasses the preprocessor 210 and the processor 216 and either generates a logical query 218 or communicates data to a logical query module which generates the logical query. The logical query is then used to search database 220, which produces the ultimate response. That output may also be communicated to a post-processing module 222, which optionally may perform some processing on the output of the database search, which may or may not be based on the toolkit and knowledge base, and which ultimately generates a response 224 to the user query. It is further noted that the various components of the preprocessor 210, the processor 216, the logical query module 218 and the post-processing module 222 may all be in communication with data from the toolkit or the knowledge base to aid in carrying out the particular functions of each individual module.

[0038] FIG. 3 illustrates a high level flow diagram of the extensible engine. First, the natural language query 302 is received for query preprocessing 304. The preprocessing may be composed of the following steps: token standardization (illustrated in more detail in FIG. 4), multi-word compaction, and derived concept recognition. Multi-word compaction may be useful because some words are not meaningful individually. Two or more words may represent the actual intended concept. Sometimes such individual words from a multi-word concept may mean something different than the intended meaning of that concept, e.g. New York or Cash Back. In the phrases "New York" or "Fico Score", the two words together name a single concept (a state, in the case of "New York"), but individually each of them has a separate meaning. Thus it is important to convert such multi-word concepts into a single word or single token, so that they can be looked up easily in the ontology.
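Multi-word compaction as described above might look like the following minimal sketch. The list of multi-word concepts, the underscore-joined token format, and the greedy longest-match strategy are assumptions chosen for illustration.

```python
# Hypothetical multi-word concepts, stored as lowercase token tuples.
MULTI_WORD = {("new", "york"), ("cash", "back"), ("fico", "score")}
MAX_LEN = max(len(phrase) for phrase in MULTI_WORD)

def compact(tokens):
    """Greedily merge the longest known multi-word concept at each position."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest window first, down to two tokens.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in MULTI_WORD:
                out.append("_".join(window))  # single token for ontology lookup
                i += n
                break
        else:
            # No multi-word concept starts here; keep the token as is.
            out.append(tokens[i])
            i += 1
    return out
```

With these assumptions, the tokens of "customers in New York with fico score above 700" become a list in which `new_york` and `fico_score` are single tokens, ready for a one-step ontology lookup.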

[0039] Second, the natural language query is tagged in a process called concept attribute reduction 306. Third, domain concept association is performed on the natural language query 308, which results in a normalized query 310. It is important to know what information can be stored in the ontology and how it is stored. Once that is decided, an algorithm which can use this information efficiently is applied. Most of the time, a concept does not come alone, meaning there are other related concepts derived from the core concept or domain of question. For example, in a credit card or financial domain, CashAdvance is a frequently used concept from which may flow the derived concept of Number of Cash Advances. Another example is the "concept" Purchase, which may have a derived concept of Purchase Count and so on. The ontology design should be able to take care of such derived concepts when available and should not fail when they don't exist. Derived concepts are further illustrated in FIG. 5A.

[0040] FIG. 4 illustrates an illustrative flow diagram of token standardization. Token standardization is the process of preparing the natural language query for processing. Often users type in wrong spellings, may put spaces in the wrong places, may not put spaces where required, or may use different characters to separate tokens in multi-token words (e.g. fico.score, fico-score, ficoscore, etc). Before processing, it may be desirable that all variations be replaced by a standard form. Multiple techniques may be used to standardize the tokens in the natural language query; the example given is illustrative. First, the natural language query may be checked for acronyms and abbreviations 402. In this step, for example, FBI, F.B.I., and fbi could all be replaced with a uniform representation, something like Federal Bureau of Investigation. Commonly used acronyms and abbreviations may be replaced with their full forms. Next, the natural language query may be checked for common separators 404. As an example, this step could identify that a space, a semicolon, and a comma are all separators and treat them accordingly. Next, the natural language query may be evaluated for synonym replacement 406. The synonym replacement step could check for such phrases as "Overdue", "30 days late", "a month late", "past due", etc. which share the same semantic meaning and may be replaced with a uniform word or phrase of equivalent meaning.

[0041] Similarly, words like "infant" and "baby" or "watch" and "clock" may be replaced with a standard word having the same meaning. The next step is to perform synonym replacements 408. Lastly in this section, the natural language query is searched for domain specific equivalents 410 which may be replaced to further standardize the natural language query. For example, a business manager may use the word "sales", but the domain specific equivalent could be understood to mean "monthly gross earnings." In this case, the word "sales" could be replaced with the domain specific equivalent that best fits the domain at hand.
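The token standardization steps of FIG. 4 can be sketched as a small pipeline. The lookup tables, the function names, and the choice of "overdue" as the uniform phrase are hypothetical; the comments tie each stage to the step numbers used above, with step 408 folded into the synonym stage because the text describes it in the same terms as step 406.

```python
import re

# All tables below are hypothetical examples, not taken from the specification.
ACRONYMS = {"fbi": "federal bureau of investigation",
            "f.b.i.": "federal bureau of investigation"}
SYNONYMS = {"30 days late": "overdue", "a month late": "overdue",
            "past due": "overdue"}
DOMAIN_EQUIVALENTS = {"sales": "monthly gross earnings"}

def replace_phrases(text, table):
    """Replace whole-phrase occurrences of each key with its standard form."""
    # Longest phrases first so a shorter key cannot pre-empt a longer one.
    for phrase in sorted(table, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        text = re.sub(pattern, table[phrase], text)
    return text

def standardize(query):
    text = query.lower()
    text = replace_phrases(text, ACRONYMS)            # step 402
    text = re.sub(r"[;,\s]+", " ", text).strip()      # step 404: unify separators
    text = replace_phrases(text, SYNONYMS)            # steps 406/408
    text = replace_phrases(text, DOMAIN_EQUIVALENTS)  # step 410
    return text
```

Each stage is an independent table-driven pass, so a stage can be extended (for example, with variants such as fico.score and fico-score) by adding table entries rather than changing code.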

[0042] FIG. 5A illustrates an aspect of the disclosure with derived concepts in a snowflake formation. Snowflake is a knowledge representation for entities which may be described using multiple words or entities. Normally, in order to recognize multi-word entities, users must either enter all the individual words in order or the exact sequence of the multi-word entity. Alternately, users can define a unique one-word label corresponding to the multi-word entity and use that instead in free-form querying. Both these approaches can be limiting in free-form querying or recognizing a natural language sentence.

[0043] Snowflake knowledge representation offers a more flexible approach similar to how human beings recognize complex entities by looking at a collection of words irrespective of order. To achieve this, Snowflake first captures the individual words and then recognizes that some of these words can be more prominent than others, forming a context(s) for the entity.

[0044] A cluster or context can be defined either by experts choosing prominent words, or automatically by selecting common words across multiple snowflakes, or based on frequently queried words, potentially combined with user feedback. For example, among multi-word entities like Bureau Highest Credit Limit Balance, Bureau Highest Balance Credit Limit, Bureau Highest Credit Limit, etc., BUREAU is a candidate for context, or an expert can choose another word such as "Credit Limit".
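Order-independent recognition and the automatic selection of context words can be sketched with plain sets. The entity names, the word sets, and the sharing threshold below are hypothetical, and the simple count-based rule is only one way to read "selecting common words across multiple snowflakes".

```python
from collections import Counter

# Hypothetical snowflake entries: entity name -> set of constituent words.
SNOWFLAKES = {
    "BureauHighestCreditLimitBalance": {"bureau", "highest", "credit", "limit", "balance"},
    "BureauHighestCreditLimit": {"bureau", "highest", "credit", "limit"},
    "CashCreditLimit": {"cash", "credit", "limit"},
}

def match_entity(words):
    """Return the entity whose word set equals the input words, ignoring order."""
    target = {w.lower() for w in words}
    for name, parts in SNOWFLAKES.items():
        if parts == target:
            return name
    return None

def context_candidates(min_share=2):
    """Words shared by at least `min_share` snowflakes are candidate contexts."""
    counts = Counter(word for parts in SNOWFLAKES.values() for word in parts)
    return {word for word, n in counts.items() if n >= min_share}
```

Here "limit credit highest bureau" resolves to the same entity as "Bureau Highest Credit Limit". The count-based rule proposes several candidate context words at once; picking one of them (such as Bureau) would still be left to an expert or to usage feedback, as the paragraph notes.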

[0045] FIG. 5B illustrates another, more loose or liberal concept of the snowflake formation. Here, the concept or business field of bureau 516 may have several associated concepts like balance 511, limit 524 and utilization 518. Similarly, a separate concept of cash 520 may have associated concepts of limit 524 and APR 522. In this approach, wherever there is an overlap, which in this example is the concept of a "limit" 520, a cluster or context may be established. A cluster name such as bureau 510 may identify one cluster, and another cluster name, cash 522, may identify another cluster or context. Thus cluster words associated with the concept of "limit" 520 may be words in the Bureau context or the cash context. The system may be flexible and consider both possibilities. Anything that may be common and cause an overlap across multi-word business terms may become a cluster. Thus a cluster may represent the bureau, the limit or cash, and the system might process each scenario.

[0046] Lastly, words can have synonyms, abbreviations, hyponyms, SMS equivalents, etc., all of which are called Parallel Words in this disclosure. To recognize multi-word entities with parallel words as well, the Snowflake knowledge representation integrates with a parallel words dictionary, whose entries are sometimes applicable only within a context(s). Similarly, formulae or conversion tables can also be incorporated with or without context.

[0047] As the name suggests, this design resembles a snowflake. A central, or main, concept Bureau 502 appears in the snowflake with derived concepts 504 surrounding it. These derived concepts can be derived from more than one concept also, as shown by the second central, or main, concept Credit Line 506, with its associated derived concepts 508. A derived concept would generally contain one or more concepts and/or Statistics and/or a domain adjective. A domain adjective is a domain specific word which gives more information about other domain concepts. For example, BALANCE or AMOUNT when attached to concepts like Purchase, Sales, or Bureau may form new concepts or represent other names for the same concepts. For example, derived concepts 504 include BureauBal (Bureau Balance) and Bureau HiCL (Bureau high credit line).

[0048] A toolkit can include one or more independent processors, some examples of which include parts-of-speech (POS) tagger, domain-specific POS tagger, instance compactors, series compactors, range identifiers, string matching algorithms, operator-operand handlers, simple and comparative-statistics processors, instances handlers, timeframes processors, concept handlers, ID variables processors, equivalent concepts processors, and assumption handlers. A knowledge base can include regular expressions, connected graphs, and entity-relationship models. One of skill in the art would be capable of determining other objects which may be included in a toolkit or knowledge base.

[0049] A toolkit can preprocess user input to standardize and consolidate domain, general, or attribute concepts. For example, a user input "Revenues from NY, NJ, and CT" can be reduced to two concepts, i.e., a concept called Revenue and a concept called State with attributes or assets NY, NJ, and CT associated to State. A few preprocessing activities include token standardization, processing domain specific or independent synonyms, acronyms and abbreviations, processing common separators, processing synonyms both context dependent and independent, checking for misspelled words using a string matching algorithm suite, compacting the multi token words, finding out derived concepts with the help of the snowflake framework, tagging the user query with a POS tagger, tagging with a domain specific tagger, recognizing instances and replacing them with the parent concept, recognizing ontological concepts, processing timeframe concepts and replacing them with timeframe placeholders, and replacing common phrases like BETWEEN X AND Y, FROM B TO C, etc.
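By way of illustration only, the reduction of "Revenues from NY, NJ, and CT" to a Revenue concept and a State concept might be sketched as follows. The lookup tables `SYNONYMS`, `INSTANCE_PARENT` and `SEPARATORS` and the function `consolidate` are hypothetical stand-ins for knowledge-base content and are not taken from the specification.

```python
import re

# Hypothetical lookup tables standing in for knowledge-base content.
SYNONYMS = {"revenues": "Revenue", "revenue": "Revenue"}
INSTANCE_PARENT = {"NY": "State", "NJ": "State", "CT": "State"}
SEPARATORS = {"from", "and"}  # joining words dropped in this sketch


def consolidate(query: str) -> dict:
    """Reduce a query to concepts, grouping instances under their parent concept."""
    concepts = {}
    for token in re.findall(r"[A-Za-z]+", query):
        if token.lower() in SEPARATORS:
            continue
        if token.upper() in INSTANCE_PARENT:
            # Replace the instance with its parent concept; keep it as an attribute.
            parent = INSTANCE_PARENT[token.upper()]
            concepts.setdefault(parent, []).append(token.upper())
        elif token.lower() in SYNONYMS:
            concepts.setdefault(SYNONYMS[token.lower()], [])
    return concepts


print(consolidate("Revenues from NY, NJ, and CT"))
```

Tokens that match no table are silently dropped here; a fuller treatment would leave them for later passes.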

[0050] A knowledge base is preferably a repository of resources such as semantic frameworks explaining business or domain models and universal word patterns. The individual components, like regular expressions, entity relationship models, or other knowledge representations, are used in recognizing patterns, relationships between concepts, identifying attributes, instances, rules, etc. within the context of the relevant frameworks. The knowledge base can include independent as well as overlapping frameworks. For example, a framework representing a mortgage business model and a framework representing a time frame can be independent frameworks. Another example of an independent framework could be the construct of the word "Between" as in "between X and Y" with its two attributes X and Y. However, a mortgage business framework and a credit card business framework may be overlapping as in "Customer has Accounts has Loan Balances". A user input "customers with loan balance > 1000" will be relevant to both these overlapping frameworks. Thus, an overlapping framework represents two different domains (such as the mortgage business domain and the credit card business domain) in which a query may span or "overlap" the different frameworks but that overlapping may be considered a single framework. One concept such as a customer account could be common across the two domains. A knowledge base could have many domain frameworks. Some of the domain frameworks may overlap in terms of their list of concepts.
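A minimal sketch of independent and overlapping frameworks, assuming each framework is represented simply as a set of concept names. The `FRAMEWORKS` table, its contents (including "property" and "credit line") and both function names are illustrative assumptions.

```python
# Hypothetical frameworks: each domain name maps to its set of concepts.
FRAMEWORKS = {
    "mortgage": {"customer", "account", "loan balance", "property"},
    "credit_card": {"customer", "account", "loan balance", "credit line"},
    "timeframe": {"month", "year", "quarter"},
}


def relevant_frameworks(query_concepts: set) -> list:
    """Return the frameworks that contain every concept found in the query."""
    return sorted(name for name, concepts in FRAMEWORKS.items()
                  if query_concepts <= concepts)


def shared_concepts(a: str, b: str) -> set:
    """Concepts common to two frameworks; a non-empty result means they overlap."""
    return FRAMEWORKS[a] & FRAMEWORKS[b]


# "customers with loan balance > 1000" touches both overlapping frameworks.
print(relevant_frameworks({"customer", "loan balance"}))
print(shared_concepts("mortgage", "timeframe"))
```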

[0051] The extensible engine processes a natural language query to generate a database or platform independent logical query. Preferably, the extensible engine has an associated main processing algorithm that engages portions of the toolkit (in no specific order) and stores all intermediate results in a plurality of data structures. An aspect of the extensible engine is the ability to tag the user query by scanning it multiple times with the help of the toolkit and the knowledge base. Each scan results in a potential new tag for the user query tokens. The system then determines the potential meaning of the user query based on the collection of tags from multiple scans and the corresponding confidence levels or weights associated to each tag, much like a bar-code scanner determining the code with a high degree of confidence through multiple scans. However, there are differences from the bar-code scanner in that the scanner may need multiple passes and only one pass may provide information in that it successfully reads the bar-code. In the multiple scans of the extensible engine, different information may be gained in each pass. Each scan has a different set of goals. The tools and knowledge base do not need to be applied in any specific order. Instead this system applies all the tools and knowledge base repeatedly to understand the most likely meaning of the user query. A second aspect of the extensible engine is extensibility, the ability to add new tools and knowledge base without cumbersome or complex programming, similar to a human learning a new skill or acquiring a new tool. The extensible engine may be implemented as a collection of algorithms and data structures. The extensible engine has a main processing algorithm that can engage these tools in no specific order and can methodically store all the intermediate results inside the data structures similar to the multi-dimensional matrix. The query may be parsed repeatedly until no additional information can be inferred within the scope of the natural language query.
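The combination of tags from multiple scans can be illustrated with the sketch below. The specification does not state how confidence levels are combined; summing them per tag is an assumption made here for illustration, and the token "Georgia" with its tags and weights is a made-up example.

```python
from collections import defaultdict


def most_likely_tags(scans):
    """Pick, per token, the tag with the highest summed confidence over all scans.

    `scans` is a list of passes; each pass maps token -> (tag, confidence).
    Summing confidences is an assumption of this sketch.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for scan in scans:
        for token, (tag, confidence) in scan.items():
            totals[token][tag] += confidence
    return {token: max(tags, key=tags.get) for token, tags in totals.items()}


scans = [
    {"Georgia": ("Country", 0.4), "sales": ("Concept", 0.9)},
    {"Georgia": ("State", 0.5)},
    {"Georgia": ("State", 0.3)},
]
print(most_likely_tags(scans))
```

Here two weaker scans agreeing on State outweigh a single scan suggesting Country, which mirrors the bar-code analogy of gaining confidence through repeated reads.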

[0052] The knowledge base can be enhanced, or extended, any time to provide a better context for a specific domain or domains. Similarly, more tools can be added to the toolkit to perform new pre-processing and tagging tasks. These features are possible because each individual tool in the toolkit is independent of other tool modules. While each individual tool is used independently, the results of the individual tools may differ depending on whether another tool or tools were engaged or not. For instance, if a domain specific tagger is used, then a user query like "Revolving Sales" could be appropriately understood as "Sales of Revolving Customers", i.e. "Revolving" would be recognized as an adjective of "Sales"; otherwise, it could be construed as two stand-alone concepts, Revolving and Sales, which may not be what the user intended. In this way, more relevant tools in the toolkit can improve the accuracy of approximate answers. An individual tool from the toolkit may be called upon multiple times depending on the tokens and whether there are still tokens that are not recognized. For example, in the query "Sales for Year 2003", "2003" could be recognized by a POS tagger as a cardinal number, then as a potential instance of Year, and then a Time Frame Compactor can put these two pieces of information together to recognize this as "the year 2003", a time-frame attribute. The strength of the framework lies in the fact that it does not require a particular order in which the tools must be applied or a particular number of times a tool is used. It is driven by a two part goal or limiting criteria: (1) each token is recognized as an attribute or a concept, and (2) the toolkit cannot add any new information.
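The "Sales for Year 2003" example can be sketched as a loop that keeps applying tools until none of them adds information. The three tool functions, their tag strings and `run_engine` are hypothetical; this sketch uses only the second limiting criterion (no new information) as its stop condition and reports anything left untagged as UNKNOWN.

```python
def pos_tagger(tokens, tags):
    """Tag bare numbers that have no tag yet as cardinal numbers."""
    return {i: "CARDINAL" for i, t in enumerate(tokens)
            if t.isdigit() and i not in tags}


def concept_tagger(tokens, tags):
    """Tag words found in a (hypothetical) concept list."""
    known = {"sales": "CONCEPT:Sales", "year": "CONCEPT:Year", "for": "STOPWORD"}
    return {i: known[t.lower()] for i, t in enumerate(tokens)
            if t.lower() in known and i not in tags}


def timeframe_compactor(tokens, tags):
    """Upgrade a cardinal number that follows the Year concept to a time-frame attribute."""
    return {i: f"TIMEFRAME:year={tokens[i]}" for i in range(1, len(tokens))
            if tags.get(i) == "CARDINAL" and tags.get(i - 1) == "CONCEPT:Year"}


def run_engine(query, tools):
    """Apply every tool repeatedly until a full round adds no new tag."""
    tokens, tags = query.split(), {}
    while True:
        added = False
        for tool in tools:
            new = tool(tokens, tags)
            if new:
                tags.update(new)
                added = True
        if not added:
            break
    return [tags.get(i, "UNKNOWN") for i in range(len(tokens))]


# The compactor is listed first on purpose: tool order does not change the outcome.
print(run_engine("Sales for Year 2003", [timeframe_compactor, pos_tagger, concept_tagger]))
```

With the compactor listed first, it finds nothing on the first round and succeeds on the second, once the other two tools have supplied the tags it depends on.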

[0053] If a 2-dimensional matrix were to be used to visualize the process of converting a user's natural language query to a logical query, then the natural language query can be represented on one dimension and the application of the different tools on a second dimension. If N is the number of tokens in the user query and M is the total number of applications of one or more of the tools, also referred to as a "pass" in the scan, the matrix will be of size N x M. If the tokens are in columns and the passes are in rows, then the cell value corresponding to a pass and a token represents the latest tag and the confidence level. The 2D matrix shows the progress of the user query after each application of the tools and the identification of associations, and allows retracing the process. This matrix will also be helpful in recognizing any conflicts; for example, if a token is recognized as an Organization in one pass and recognized as a Country in another pass, they would be clearly evident in the matrix. In one aspect, the multi-dimensional matrix structure stores a plurality of labels describing query tokens or outcomes of each application of a tool to the natural language query.
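A minimal sketch of such a pass-by-token matrix, assuming each pass is supplied as a mapping from token index to a (tag, confidence) pair. The function names and the sample query are illustrative; any change of tag between passes is reported as a conflict here, so telling a refinement apart from a genuine conflict is left out of the sketch.

```python
def build_matrix(tokens, passes):
    """Rows are passes, columns are tokens; each cell holds the latest (tag, confidence)."""
    matrix, latest = [], [None] * len(tokens)
    for result in passes:            # result: {token_index: (tag, confidence)}
        for col, cell in result.items():
            latest[col] = cell
        matrix.append(list(latest))  # snapshot of every token after this pass
    return matrix


def find_conflicts(tokens, matrix):
    """Report tokens that carried different tags in different passes."""
    conflicts = {}
    for col, token in enumerate(tokens):
        tags = {row[col][0] for row in matrix if row[col] is not None}
        if len(tags) > 1:
            conflicts[token] = sorted(tags)
    return conflicts


tokens = ["sales", "in", "Jordan"]
passes = [{0: ("Concept", 0.9)}, {2: ("Organization", 0.5)}, {2: ("Country", 0.7)}]
matrix = build_matrix(tokens, passes)
print(find_conflicts(tokens, matrix))
```

Because each row is a snapshot, reading down a column retraces how the tag for one token evolved across passes.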

[0054] The extensible engine is associated with the following stages: (1) query pre-processing; (2) domain, general or attribute concept reduction; and (3) recognizing associations. First, query pre-processing includes three substeps: (A) token standardization, (B) multi-word compaction, and (C) derived concept recognition. Token standardization is the process of getting the natural language query ready for processing. Users often type in wrong spellings, put spaces in wrong places, do not put spaces where required, or use different characters to separate tokens in multi-token words (e.g. fico score, fico-score, ficoScore, etc). Before processing, it is important that variations be replaced by a standard form. Multiple techniques can be used to achieve this: string matching, common separators checking, acronyms and abbreviation replacement, synonyms replacement, and domain-specific equivalent replacement.
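One possible sketch of token standardization for the fico score variants, combining separator handling with fuzzy string matching from the Python standard library. The vocabulary `STANDARD_FORMS`, the 0.8 similarity cutoff and the function name `standardize` are assumptions chosen for illustration.

```python
import difflib
import re

STANDARD_FORMS = ["fico score", "cash back", "new york"]  # hypothetical vocabulary


def standardize(term: str) -> str:
    """Map spelling and separator variants of a term to one standard form."""
    # Split camelCase, then treat hyphens and underscores as spaces.
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", term)
    spaced = re.sub(r"[-_]+", " ", spaced)
    normalized = " ".join(spaced.lower().split())
    # Fuzzy string matching catches misspellings and missing spaces.
    match = difflib.get_close_matches(normalized, STANDARD_FORMS, n=1, cutoff=0.8)
    return match[0] if match else normalized


for variant in ["fico score", "fico-score", "ficoScore", "ficoscore"]:
    print(variant, "->", standardize(variant))
```

Terms with no sufficiently close standard form are returned in their normalized spelling so that later passes can still work with them.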

[0055] Multi-word compaction combines words that are not meaningful individually and require a combination of multiple words to understand the intended concept. Sometimes such individual words from a multi-word concept may have different meanings than what the user intended; e.g. New York, Cash Back. In New York, two words together make a state but individually both of them have separate meanings. The phrase Fico Score is the same way. Such multi-word concepts should be converted into a single word, so that they can be looked up easily in the Ontology. It is necessary to do this step before Domain Tagging, as the domain tagger will be using these compacted words.
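Multi-word compaction might be sketched as a greedy longest-match merge over the token list. The concept table `MULTI_WORD_CONCEPTS` and the use of an underscore to join the merged words are assumptions of this sketch.

```python
MULTI_WORD_CONCEPTS = {("new", "york"), ("cash", "back"), ("fico", "score")}  # hypothetical
MAX_LEN = max(len(concept) for concept in MULTI_WORD_CONCEPTS)


def compact(tokens):
    """Greedily merge the longest known multi-word concept at each position."""
    out, i = [], 0
    while i < len(tokens):
        for size in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + size])
            if window in MULTI_WORD_CONCEPTS:
                out.append("_".join(tokens[i:i + size]))
                i += size
                break
        else:
            # No multi-word concept starts here; keep the single token.
            out.append(tokens[i])
            i += 1
    return out


print(compact("Sales in New York".split()))
```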

[0056] Derived concept recognition may be exemplified by the following algorithm designed to detect the derived concepts (if any) embedded in the original user query. First, tag the user query using Tagging passes. Second, identify the main concept and domain adjectives, and categorize the subordinating conjunctions as pre or post. Third, associate the STAT with either the Domain Adjective or the Concept next to it. Fourth, ignore the stop words, domain independent and domain specific. Fifth, rearrange the concepts, domain adjectives and Stats according to the following order to get the derived concept:

main concept, stat or domain adjective, associated concept.
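The rearrangement step above can be sketched as follows, assuming the query has already been tagged. The tag names CONCEPT, ADJ, STAT and STOP are hypothetical, the first CONCEPT token is taken as the main concept, and the pre/post categorization of subordinating conjunctions is omitted; all three are simplifying assumptions rather than the specified method.

```python
def derive_concept(tagged):
    """Rearrange tagged tokens into: main concept, stat or domain adjective, associated concept.

    `tagged` is a list of (word, tag) pairs with tags CONCEPT, ADJ, STAT or STOP.
    Assumption of this sketch: the first CONCEPT token is the main concept.
    """
    kept = [(w, t) for w, t in tagged if t != "STOP"]   # ignore stop words
    concepts = [w for w, t in kept if t == "CONCEPT"]
    modifiers = [w for w, t in kept if t in ("STAT", "ADJ")]
    if not concepts or not modifiers:
        return None                                     # nothing to derive
    main, associated = concepts[0], concepts[1:]
    return " ".join([main, *modifiers, *associated])


print(derive_concept([("Balance", "ADJ"), ("of", "STOP"), ("Bureau", "CONCEPT")]))
```

Applied to "Balance of Bureau", the sketch yields "Bureau Balance", in line with the Bureau Balance derived concept mentioned in the description.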

[0057] At the end of the pre-processing step, the toolkit would have transformed the words in the original query to their normalized values. The normalized values would come from language dictionaries and the knowledge base. Personalized words are replaced by their standard equivalents and multiple tokens are juxtaposed to recognize one combination word. All the typographical errors are handled and corrected to create a new list of normalized words.

[0058] Second, domain, general, or attribute concept reduction, also known as tagging, is a process of assigning each word of the query some meaningful tag that describes the purpose of the word in the query. For example, JANUARY is a month. AND is a coordinating conjunction. Tagging includes two steps: Part of Speech (POS) tagging and Custom Domain Word tagging. POS tagging is the process of marking up the words in a text as corresponding to a particular part of speech based on definition as well as context. It reads text in some language and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc. Custom domain word tagging is used to identify domain specific meanings of words. A word may be treated as an indicator and be applied with a default condition if it appears with another specific set of words.
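As a rough illustration of the two tagging steps, the sketch below attaches a part-of-speech tag and then a domain tag to each word. Plain dictionary lookup is used as a stand-in for a real, context-aware POS tagger; both lexicons and the function `tag` are hypothetical.

```python
# Toy lexicons; a real POS tagger would also use context, not just lookup.
POS_LEXICON = {
    "and": "coordinating conjunction",
    "for": "preposition",
    "sales": "noun",
    "january": "noun",
}
DOMAIN_LEXICON = {"january": "MONTH", "sales": "CONCEPT"}  # hypothetical domain tags


def tag(query):
    """Attach to each word a list of all applicable tags: POS first, then domain."""
    tagged = []
    for word in query.split():
        key = word.lower()
        tags = [lexicon[key] for lexicon in (POS_LEXICON, DOMAIN_LEXICON) if key in lexicon]
        tagged.append((word, tags or ["UNKNOWN"]))
    return tagged


print(tag("Sales for January"))
```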

[0059] Third, the recognizing associations step receives as input each word in the original query, tagged with a list of all applicable tags. The tags could, for example, describe if the word is an instance, concept or a timeframe unit. Recognizing associations includes two substeps: establishing attribute-to-concept associations and capturing concept-to-concept relationships. Establishing attribute-to-concept associations refers to ontologies which contain relationships between different domain concepts and their defining attribute concepts. For example, an ontology could contain information that sales information is stored in monthly or yearly intervals, customers are acquired in some specific months, FICO Score is between range of 300 and 600, or any other pertinent information. Establishing attribute-to-concept associations is an attempt to find the instances of such information in the user query and associate it to the respective domain concepts.
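A minimal sketch of attribute-to-concept association, assuming the ontology stores for each concept either a set of allowed interval words or a numeric range. The `ONTOLOGY` layout and the function name are hypothetical; the Sales intervals and the FICO Score range of 300 to 600 are taken from the paragraph above.

```python
# Hypothetical ontology fragment: each concept lists rules for recognizing its attributes.
ONTOLOGY = {
    "Sales": {"interval": {"monthly", "yearly"}},
    "FICO Score": {"range": (300, 600)},  # range as stated in the description
}


def associate_attributes(concepts, tokens):
    """Attach each token to the query concepts whose attribute rules it satisfies."""
    associations = {concept: [] for concept in concepts}
    for token in tokens:
        for concept in concepts:
            rules = ONTOLOGY.get(concept, {})
            if token.lower() in rules.get("interval", ()):
                associations[concept].append(token)
            elif token.isdigit() and "range" in rules:
                low, high = rules["range"]
                if low <= int(token) <= high:
                    associations[concept].append(token)
    return associations


print(associate_attributes(["Sales", "FICO Score"], ["monthly", "550", "2003"]))
```

The token "2003" stays unassociated because it falls outside the stored range, leaving it for a time-frame tool to claim.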

[0060] Once the user query is reduced to a collection of domain concepts, understanding the relations between these domain concepts is necessary to understand the complete meaning of that query. The domain concept association step achieves this with the help of domain frameworks or ontologies. The Domain Ontology maintains information and associations between different domain concepts such as "customers have multiple credit card accounts", "each account is related to a specific credit card product", "customers belong to a state", "each account has monthly information stored like sales, late-fees, interest, credit limit", or "customers are associated with different credit card products". This step relates the domain concepts with the help of information from the ontology and joining words. Domain concepts may be associated with more than one relationship and in such cases, the user query may specify the association to be used, or a default relation may be selected by the domain ontology. After these two steps, the final goal is achieved; the user query is reduced to a graph specifying different domain concepts associated with each other in a meaningful way.
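The reduction of a query to a graph of related concepts might be sketched as below. The `RELATIONS` table paraphrases the ontology statements quoted above; choosing the shortest chain of relations as the default association is an assumption of this sketch, as are all function names.

```python
from collections import deque

# Hypothetical domain ontology: relations between pairs of concepts.
RELATIONS = {
    ("Customer", "Account"): "has",
    ("Account", "Product"): "is related to",
    ("Customer", "State"): "belongs to",
    ("Account", "Sales"): "stores monthly",
}


def neighbors(node):
    """Yield every concept directly related to `node`, in either direction."""
    for a, b in RELATIONS:
        if a == node:
            yield b
        elif b == node:
            yield a


def path(start, goal):
    """Breadth-first search for the shortest chain of related concepts."""
    queue, seen = deque([[start]]), {start}
    while queue:
        route = queue.popleft()
        if route[-1] == goal:
            return route
        for nxt in neighbors(route[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(route + [nxt])
    return None


def build_query_graph(concepts):
    """Collect the ontology edges that connect the first query concept to the others."""
    edges = set()
    for goal in concepts[1:]:
        route = path(concepts[0], goal) or []
        for a, b in zip(route, route[1:]):
            key = (a, b) if (a, b) in RELATIONS else (b, a)
            edges.add((key[0], RELATIONS[key], key[1]))
    return edges


print(build_query_graph(["Customer", "Sales"]))
```

For a query mentioning only Customer and Sales, the resulting graph passes through Account, the intermediate concept supplied by the ontology rather than by the user.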

[0061] Embodiments within the scope of the present invention may also include computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code structures in the form of computer-executable instructions or data structures. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or combination thereof) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of the computer-readable media.

[0062] Computer-executable instructions include, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Computer-executable instructions also include program modules that are executed by computers in stand-alone or network environments. Generally, program modules include routines, programs, objects, components, and data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of the program code structures for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.

[0063] Those of skill in the art will appreciate that other embodiments of the invention may be practiced in network computing environments with many types of computer system configurations, including personal computers, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. Embodiments may also be practiced in distributed computing environments where tasks are performed by local and remote processing devices that are linked (either by hardwired links, wireless links, or by a combination thereof) through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

[0064] Although the above description may contain specific details, they should not be construed as limiting the claims in any way. Other configurations of the described embodiments of the invention are part of the scope of this invention. For example, the invention may be applied not only to search queries on a web page, but also to natural language queries with portable GPS devices or automated telephone-based customer service. Accordingly, the appended claims and their legal equivalents should only define the invention, rather than any specific examples given.

Claims

We claim:

1. A method of converting a natural language query into a logical query, the method comprising:

receiving a natural language query;

converting the natural language query using an extensible engine to generate a logical query, the extensible engine being linked to the toolkit and knowledge base.

2. The method of claim 1, wherein the generated logical query is platform or data source independent.

3. The method of claim 1, wherein converting the natural language query includes parsing the natural language query repeatedly until every word is recognized as a domain, general, or an attribute concept, or until no additional information can be inferred about the natural language query within the scope of the toolkit and knowledge base.

4. The method of claim 1, wherein the knowledge base may include semantic frameworks explaining business or domain models, universal word patterns, mathematical or regular expressions, or conversion tables.

5. The method of claim 1, wherein the toolkit may include one or more of a speech tagger, spell checker, domain instance recognizer, word compactor, synonyms handlers, domain specific tagger, snowflake processor, ambiguity handlers, series recognizers, or word-pattern recognizers, each toolkit item being included in part or in whole.

6. The method of claim 1, wherein the extensible engine is not dependent on any one tool in the toolkit but instead uses a collection of tools, and wherein the extensible engine continues to operate when one or more tools are removed.

7. The method of claim 1, wherein the extensible engine converts the natural language query by a plurality of scans of the natural language query using the toolkit and the knowledge base, each scan having a different set of goals.

8. The method of claim 1, wherein the extensible engine is implemented as a collection of algorithms and/or data structures.

9. The method of claim 1, wherein the extensible engine has a main processing algorithm that engages portions of the toolkit and stores all intermediate results inside a plurality of data-structures.

10. The method of claim 9, wherein the main processing algorithm engages portions of the toolkit in no specific order.

11. The method of claim 9, wherein the plurality of data-structures is a multi-dimensional matrix-like data structure.

12. The method of claim 11, wherein the multi-dimensional matrix-like structure stores a plurality of labels describing query tokens or outcomes of each application of a tool to the natural language query.

13. The method of claim 9, wherein the extensible engine can recognize and resolve ambiguity when different tools in the toolkit yield dissimilar outcomes.

14. The method of claim 9, wherein the extensible engine assesses potential concepts and relationships between the concepts based on combining one or more outcomes.

15. The method of claim 9, wherein the extensible engine repeatedly applies tools, captures outcomes, and assesses the concepts and relationships based on the combination of outcomes until no additional information can be inferred within the scope of the toolkit and the knowledge base.

16. A system for converting a natural language query into a logical query, the system comprising:

a module configured to receive a natural language query;

a module configured to convert the natural language query using an extensible engine to generate a logical query, the extensible engine being linked to the toolkit and knowledge base.

17. The system of claim 16, wherein the generated logical query is platform or data source independent.

18. The system of claim 16, wherein converting the natural language query includes parsing the natural language query repeatedly until every word is recognized as a domain, general, or an attribute concept, or until no additional information can be inferred about the natural language query within the scope of the toolkit and knowledge base.

19. The system of claim 16, wherein the knowledge base may include semantic frameworks explaining business or domain models, universal word patterns, mathematical or regular expressions, or conversion tables.

20. The system of claim 16, wherein the toolkit may include one or more of speech tagger, spell checker, domain instance recognizer, word compactor, synonyms handlers, domain specific tagger, snowflake processor, ambiguity handlers, series recognizers, or word-pattern recognizers, each toolkit item being included in part or in whole.

21. The system of claim 16, wherein the extensible engine is not dependent on any one tool in the toolkit but instead uses a collection of tools, and wherein the extensible engine can continue to operate when one or more tools are removed.

22. The system of claim 16, wherein the extensible engine processes the natural language query by a plurality of scans of the natural language query using the toolkit and the knowledge base, each scan having a different set of goals.

23. The system of claim 16, wherein the extensible engine is implemented as a collection of algorithms and/or data structures.

24. The system of claim 16, wherein the extensible engine has a main processing algorithm that engages portions of the toolkit and stores all intermediate results inside a plurality of data-structures.

25. The system of claim 24, wherein the main processing algorithm engages portions of the toolkit in no specific order.

26. The system of claim 24, wherein the extensible engine stores intermediate results methodically.

27. The system of claim 24, wherein the plurality of data-structures is a multi-dimensional matrix-like data structure.

28. The system of claim 27, wherein the multi-dimensional matrix-like structure stores a plurality of labels describing query tokens or outcomes of each application of a tool to the natural language query.

29. The system of claim 24, wherein the extensible engine can recognize and resolve ambiguity when different tools in the toolkit yield dissimilar outcomes.

30. The system of claim 24, wherein the extensible engine assesses potential concepts and relationships between the concepts based on combining one or more outcomes.

31. The system of claim 24, wherein the extensible engine repeatedly applies tools, captures outcomes, and assesses the concepts and relationships based on the combination of outcomes until no additional information can be inferred within the scope of the toolkit and the knowledge base.

32. A computer-readable medium storing instructions for converting a natural language query into a logical query, the instructions comprising:

receiving a natural language query;

converting the natural language query using an extensible engine to generate a logical query, the extensible engine being linked to the toolkit and knowledge base.

33. The computer-readable medium of claim 32, wherein the generated logical query is platform or data source independent.

34. The computer-readable medium of claim 32, wherein converting the natural language query includes parsing the natural language query repeatedly until every word is recognized as a domain, general, or an attribute concept, or until no additional information can be inferred about the natural language query within the scope of the toolkit and knowledge base.

35. The computer-readable medium of claim 32, wherein the knowledge base may include semantic frameworks explaining business or domain models, universal word patterns, mathematical or regular expressions, or conversion tables.

36. The computer-readable medium of claim 32, wherein the toolkit may include one or more of a speech tagger, spell checker, domain instance recognizer, word compactor, synonyms handlers, domain specific tagger, snowflake processor, ambiguity handlers, series recognizers, or word-pattern recognizers, each toolkit item being included in part or in whole.

37. The computer-readable medium of claim 32, wherein the extensible engine is not dependent on any one tool in the toolkit but instead uses a collection of tools, and wherein the extensible engine can continue to operate when one or more tools are removed.

38. The computer-readable medium of claim 32, wherein the extensible engine processes the natural language query by a plurality of scans of the natural language query using the toolkit and the knowledge base, each scan having a different set of goals.

39. The computer-readable medium of claim 32, wherein the extensible engine is implemented as a collection of algorithms and/or data structures.

40. The computer-readable medium of claim 32, wherein the extensible engine has a main processing algorithm that engages portions of the toolkit and stores all intermediate results inside a plurality of data-structures.

41. The method of claim 40, wherein the main processing algorithm engages portions of the toolkit in no specific order.

42. The method of claim 40, wherein the extensible engine stores intermediate results methodically.

43. The method of claim 40, wherein the plurality of data-structures is a multi-dimensional matrix-like data structure.

44. The method of claim 43, wherein the multi-dimensional matrix-like structure stores a plurality of labels describing query tokens or outcomes of each application of a tool to the natural language query.

45. The method of claim 40, wherein the extensible engine can recognize and resolve ambiguity when different tools in the toolkit yield dissimilar outcomes.

46. The method of claim 40, wherein the extensible engine assesses potential concepts and relationships between the concepts based on combining one or more outcomes.

47. The method of claim 40, wherein the extensible engine repeatedly applies tools, captures outcomes, and assesses the concepts and relationships based on the combination of outcomes until no additional information can be inferred within the scope of the toolkit and the knowledge base.