codesimian
Class MostCommonSubstrings

java.lang.Object
  extended by codesimian.CS<CSGeneric>
      extended by codesimian.DefaultCS
          extended by codesimian.MostCommonSubstrings
All Implemented Interfaces:
CodeSimian

public class MostCommonSubstrings
extends DefaultCS

This class keeps statistics on the most common substrings in random samples of text, taken from within CodeSimian at random times, from the internet, and possibly from other sources, and uses those statistics to improve natural-language parsing.

Example: mostCommonSubstrings(list(' ' 'a' 'b' 'common substrings'...) list(.17 .05 .03 .001...) optional_changeSpeedFraction optional_maxListSize 'optionally add all short substrings of this text to P0 and P1')

MostCommonSubstrings has 2 lists (in P0 and P1). P0 is a list of strings, ordered so that more common strings have lower indexes. P1 is a list of their relative frequencies, which sums to 1 (except for roundoff error). For any normal English text, the P0 list will start with the space character, then vowels, then other letters, then combinations of 1 letter and 1 space, then combinations of 2 letters, then combinations of 3 letters. A common phrase can rank ahead of a rare word despite being longer.
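The two parallel lists described above can be illustrated with a minimal sketch. It is not the CodeSimian implementation: the class SubstringRankingSketch, its rank method, and the maxLen cutoff are hypothetical names chosen for this example, and counting every substring up to a fixed length is an assumption about how "short substrings" are collected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the codesimian package.
class SubstringRankingSketch {

    /** Parallel lists: strings.get(i) has relative frequency freqs.get(i). */
    static final class Ranked {
        final List<String> strings = new ArrayList<>();
        final List<Double> freqs = new ArrayList<>();
    }

    /** Counts every substring of length 1..maxLen and ranks them, most common first. */
    static Ranked rank(String text, int maxLen) {
        Map<String, Integer> counts = new HashMap<>();
        long total = 0;
        for (int start = 0; start < text.length(); start++) {
            for (int len = 1; len <= maxLen && start + len <= text.length(); len++) {
                counts.merge(text.substring(start, start + len), 1, Integer::sum);
                total++;
            }
        }

        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Descending count; ties are broken alphabetically so the order is deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        Ranked ranked = new Ranked();
        for (Map.Entry<String, Integer> e : entries) {
            ranked.strings.add(e.getKey());
            // Dividing by the total number of counted substrings makes the list sum to 1.
            ranked.freqs.add(e.getValue() / (double) total);
        }
        return ranked;
    }
}
```

Calling SubstringRankingSketch.rank("aab", 2) counts five substrings in total, so "a" (seen twice) comes first with relative frequency 0.4, followed by "aa", "ab", and "b" at 0.2 each.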

P2 is a fraction between 0 and 1 that sets how fast the statistics change with new text input. It should be very close to 0 for best statistics (example: .0001); use a smaller P2 if you use more input text. If P2 is 0, the input text (P4) is ignored. If P2 is 1, the input text (P4) overwrites all existing statistics of this MostCommonSubstrings. If P2 is very close to 0, the statistics are learned slowly and accurately. Default is .001, if there is no P2.
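One update rule that matches the behavior described for P2 at both extremes is a weighted blend of the old frequencies with the frequencies measured in the new text. The sketch below assumes that rule; the source does not state the actual formula, and the class ChangeSpeedSketch and its blend method are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; the real update rule is not documented here.
class ChangeSpeedSketch {

    /**
     * Blends old relative frequencies with those measured in a new sample.
     * changeSpeed plays the role of P2: 0 keeps the old statistics unchanged,
     * 1 replaces them with the sample's statistics.
     */
    static Map<String, Double> blend(Map<String, Double> old,
                                     Map<String, Double> sample,
                                     double changeSpeed) {
        if (changeSpeed < 0 || changeSpeed > 1) {
            throw new IllegalArgumentException("changeSpeed must be in [0, 1]");
        }
        Map<String, Double> result = new HashMap<>();
        // Strings already tracked move toward their frequency in the sample (0 if absent).
        for (Map.Entry<String, Double> e : old.entrySet()) {
            double sampled = sample.getOrDefault(e.getKey(), 0.0);
            result.put(e.getKey(), (1 - changeSpeed) * e.getValue() + changeSpeed * sampled);
        }
        // Strings seen only in the sample start from an old frequency of 0.
        for (Map.Entry<String, Double> e : sample.entrySet()) {
            result.putIfAbsent(e.getKey(), changeSpeed * e.getValue());
        }
        return result;
    }
}
```

If both inputs sum to 1, the blended frequencies also sum to 1, because the result is (1 - changeSpeed) times the old total plus changeSpeed times the sample total. A small changeSpeed means each new sample nudges the statistics only slightly, which is why a smaller value suits larger amounts of input text.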

P3 is the maximum value of P(0).countP(), which always equals P(1).countP(). This is the maximum number of unique strings that this MostCommonSubstrings keeps statistics on. Default is 1000, if there is no P3.

P4 is the input text, and it is read every time this MostCommonSubstrings is EXECUTED. If there is no P4, the default is to input no more text.


Field Summary
static double defaultChangeSpeedFraction
          P2
static double defaultMaxListSize
          P3
 
Fields inherited from class codesimian.CS
DESCRIPTION, END, EXECPROXY, HEAP, JAVACODE, MYFUEL, NAME, NEWINSTANCE, NULL, PARENT, PARSEPRIORITY, PREV, TESTER, THIS
 
Constructor Summary
MostCommonSubstrings()
           
 
Method Summary
 java.lang.String description()
          a short description of this CS, shorter than the javadoc, but long enough to tell what the params are for.
 double DForProxy()
          Execute this CS and cast to double.
 int maxP()
          Maximum quantity of Params
 int minP()
          For DForProxy().
Minimum number of parameters in param[] needed to call DForProxy().
Defines which indexes of param[] DForProxy() can use.
Functions with a different number of parameters must override this.
OVERRIDE THIS FUNCTION IF EXEC USES A DIFFERENT NUMBER OF PARAMETERS.
Default is 1.
 
Methods inherited from class codesimian.DefaultCS
B, C, countP, decrementMyFuel, deleteP, F, fuel, getExec, getObject, heap, I, indexP, indexPName, insertB, insertC, insertD, insertF, insertI, insertJ, insertL, insertL, insertL1, insertP, insertS, insertZ, J, javaCode, keyword, LForProxy, LForProxy, myFuel, name, newInstance, objectToCS, objectToCSArray, objectToCSArray, P, prevD, prevL, PType, S, setB, setC, setCountP, setD, setD, setExec, setF, setFuel, setI, setJ, setL, setL, setL, setL1, setMyFuel, setName, setObject, setP, setPrevExec, setPType, setS, setZ, start, toString, V, Z
 
Methods inherited from class codesimian.CS
addB, addC, addD, addF, addI, addJ, addL, addP, addP, addP, addP, addP, addS, addZ, BForProxy, CForProxy, clone, cost, D, deleteP, FForProxy, GETB, GETC, GETD, GETF, GETI, GETJ, GETL, GETS, GETZ, IForProxy, isIllusion, JForProxy, L, L, L, L, L, maxD, minD, overwrites, parent, parsePriority, PB, PC, PD, PF, PI, PJ, PL, prevB, prevC, prevF, prevI, prevJ, prevS, prevZ, proxyOf, PS, PZ, reflect, reflect, reflect6, setB, SETB, setC, SETC, setCost, SETD, setDescription, setF, SETF, setHeap, setI, SETI, setJ, SETJ, SETL, setL, setL, setParent, setParsePriority, setProxyOf, setS, SETS, setTester, setZ, SETZ, SForProxy, tester, toJavaCode, VForProxy, voidReflect, ZForProxy
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

defaultChangeSpeedFraction

public static final double defaultChangeSpeedFraction
P2

See Also:
Constant Field Values

defaultMaxListSize

public static final double defaultMaxListSize
P3

See Also:
Constant Field Values
Constructor Detail

MostCommonSubstrings

public MostCommonSubstrings()
Method Detail

DForProxy

public double DForProxy()
Description copied from class: CS
Execute this CS and cast to double.

D() and DForProxy() are the 2 most important functions in CS. They execute this CS. All other execute functions, by default, return DForProxy cast to their own type.

For example, J() calls the proxy which calls JForProxy() which calls DForProxy(). D() calls the proxy which calls DForProxy().

By default, all other primitive EXECUTE functions defer to D.
Functions that EXECUTE this CS: L(Class) L(int,Class,int) Z() B() C() S() I() J() F() D() V()
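The call chain described above can be sketched with stand-in classes. These are not the real codesimian.CS classes: NodeSketch, ExecProxySketch, and ConstantSketch are hypothetical names, and the way the proxy hands control back to the node is an assumption made for illustration, since the proxy mechanism is not shown on this page.

```java
// Hypothetical stand-ins; NOT the real codesimian classes.
class ExecProxySketch {
    // The proxy receives the call and hands it back to the node's ForProxy method.
    double D(NodeSketch target) { return target.DForProxy(); }
    long J(NodeSketch target) { return target.JForProxy(); }
}

abstract class NodeSketch {
    private final ExecProxySketch proxy = new ExecProxySketch();

    /** The one execute method a subclass has to implement. */
    abstract double DForProxy();

    /** By default, the long-valued execute defers to DForProxy() and casts. */
    long JForProxy() { return (long) DForProxy(); }

    /** Public execute methods always go through the proxy first. */
    final double D() { return proxy.D(this); }
    final long J() { return proxy.J(this); }
}

class ConstantSketch extends NodeSketch {
    private final double value;

    ConstantSketch(double value) { this.value = value; }

    @Override
    double DForProxy() { return value; }
}
```

With this arrangement, new ConstantSketch(2.75).D() returns 2.75, while J() on the same object goes through the proxy to JForProxy(), which casts the result of DForProxy() and returns 2. A subclass that only overrides DForProxy() therefore gets every other execute function for free.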

Specified by:
DForProxy in class DefaultCS

minP

public int minP()
Description copied from class: DefaultCS
For DForProxy().
Minimum number of parameters in param[] needed to call DForProxy().
Defines which indexes of param[] DForProxy() can use.
Functions with a different number of parameters must override this.
OVERRIDE THIS FUNCTION IF EXEC USES A DIFFERENT NUMBER OF PARAMETERS.
Default is 1.

Overrides:
minP in class DefaultCS

maxP

public int maxP()
Description copied from class: CS
Maximum quantity of Params

Overrides:
maxP in class CS

description

public java.lang.String description()
Description copied from class: CS
a short description of this CS, shorter than the javadoc, but long enough to tell what the params are for. Example use: in automatically generated webpages for CodeSimian. Example: "returns sum of all params" for Add.

Overrides:
description in class DefaultCS