codesimian
Class SelfPredictingPrimeHashingNumberList

java.lang.Object
  extended by codesimian.CS<CSGeneric>
      extended by codesimian.DefaultCS
          extended by codesimian.SelfPredictingPrimeHashingNumberList
All Implemented Interfaces:
CodeSimian

public class SelfPredictingPrimeHashingNumberList
extends DefaultCS

THIS CLASS IS NOT FINISHED. 9/07 I still intend to finish building this.

The purpose of this algorithm is to play realtime audio from any subset of a compressed file, and for each possible subset to predict the entire audio a little differently. It will not matter which individual floats from the compressed file you load into memory, any subset of single floats from any set of indexs (in array bounds) can approximate the entire original audio data. Use this to dynamicly choose which parts of the compressed file to load into memory at a time, chosen by which were most needed to predict the values (-1 to 1) of audio that were requested. Many times per second, load many new floats from the compressed file, and remove many floats from memory, changing which subset of floats is loaded into memory but keeping the same size.

Example: 500,000 floats in memory and 5,000,000 floats in a file waiting to be loaded and 50,000,000 floats in the original audio data. I dont know exactly what compression ratio this will get.

Use that different set of floats to better predict the new values requested (of the original audio data). Predict which indexs (containing predictions of predictions... of audio data) will be requested by which indexs were requested recently.

ALL ARRAY SIZES MUST BE PRIME. THE ARRAYS data[] AND prime[] MUST BE THE SAME SIZE. IF YOU MUST HAVE A SPECIFIC ARRAY SIZE, CHOOSE THE SMALLEST PRIME AT LEAST THAT SIZE AND FILL THE END OF THE ARRAY WITH ZEROS.

------------------ALGORITHM:------------------

Start with a float[] array (size must be prime) containing numbers between -1.0 and 1.0, representing a sequence of audio samples.

A different list (same size as the float[] array) contains UNIQUE PRIME numbers, each number's value at least 1000 times bigger than the float[] array's size (for good hashing).

Each number in the float[] array is paired with one of the prime numbers.

Each (value at an) index in the float[] array is approximately a prediction derived from the values at many of the other indexs.

Each (value at an) index is paired with a distribution of its prime number hashing over index quantity and divided by index quantity to give a set of numbers all between 0 and 1.

Values in the float[] array should slowly change so that any set of other (values at) indexs predicts them better.

The result of this change is that you can approximate any (value at an) index by using any set of other (values at) indexs, and the more (i'm going to stop writing 'values at' here...) indexs you have, the better you can predict others.

This algorithm is a form of LOSSY COMPRESSION (not limited to but designed for: AUDIO).

By reordering the prime numbers (so each pairs with a different index than before), you can find a float[] array that predicts itself better (and still is similar to the original audio data).

By adjusting some floats in the array more or less than the others eacy cycle, the float[] array would learn differently therefore will become a slightly different array.

When the float[] array has been through enough cycles, you can choose any arbitrary subset of floats and use them to predict any arbitrary subset of floats from the uncompressed original audio data. Save this subset of floats as the compressed file.

You can even split the compression into multiple files, and any subset of the files combines to a slightly different version of the audio, the same way subsets of file(s) can be combined.

But it does not blur times. Times are at exact array indexs.

It is easy to predict the distribution of 1 prime number hashed over array length divided by array length. For the next biggest weight, add the prime to the current index and wrap around array length. For the next smallest, subtract and wrap. Of course you can multiply and modulus it to ***decrease the Big-O from linear to to constant***. It would work better to use the square-root of [the index divided by array length] instead of index divided by array length because influence of indexs on other indexs would be more localized (but still blurred), and much less influence from/to the majority.

Save the whole (or part of) changed audio data to a file. Load any part of the audio data by predicting it based on which parts of the changed data are currently in memory (less than all of it). Using the prediction described above, predict, then predict based on those predictions, and form a small prediction-tree... to more accurately predict the root of the tree (all indexs are roots, but you only play 1 at a time). Using this method, you can get more accurate numbers when predicting from a small part of the compressed data.

After the compressing starts, the prime numbers are never reordered, and should be saved in any file that contains a float[] array derived from those numbers.

The advantage of this algorithm for lossy audio compression is that you can decompress any part of the audio from any subset of the compressed file. You could take single floats from random places in the file and use them to predict any of the original audio data. All floats are part of all original audio data, relative to an exponent (2? 3? 6.7?) and prime number hashing.

To do the next cycle, for each index, use all (or the most-influencing subset calculated by prime numbers), to calculate the value of that index. Compare it to its current value. Modify it to be a little closer to the predicted value. After you do that for all indexs, the array is ready to predict itself and modify itself again. Each time it becomes more distorted and more compressed and each moment of sound is spread more through the whole array.

For each index x, the sum of all weight(x,y) for all indexs y, should equal 1 (and none may be negative). The final value at each index is a weighted sum of predictions of other indexs.

Combine this algorithm (above) with the neural network for natural-language and compiling I'm building 8/06??? 6/07 There is 1 best natural language algorithm, and it can be applied to words or letters.

I know it will later be better to use doubles and longs, but for now I have chosen floats for the audio data and ints for the primes. This is necessary to compress smaller than MP3s, which use 16-bit numbers.

floats that are always between -1 and 1 waste memory. 9 bits of each float are exponent. It would be better to use an int divided by 2^31.

THIS IS A COMPLEX ALGORITHM. THIS CLASS IS NOT FINISHED, BUT THE JAVADOC PROBABLY CONTAINS EVERYTHING YOU NEED TO BUILD IT (BUT NOT ORGANIZED WELL).

--------------------FILE:---------------------
For better compression in the file, maybe the float[] array should instead be saved as 24 bit unsigned integers that when interpreted as double (64 bits) are between 0.0 and 1.0 in memory. Its probably more efficient for its range to be 0.0 to .999999999999999 instead of 1.0 because the bits would align better (without a crack near the middle), therefore it should range 0 (inclusive) to 1 (exclusive). Or maybe use less bits per number because normal audio is only 16 bits per number. If you use anything other than the same list of prime numbers every time, you need to save the primes in the file with the floats. Use ints (32 bits) for primes, or for more efficiency, use a few less bits. Create a header which tells the sizes of: the header, the float section, and the int section. If there are more floats than ints, wrap. If there are more ints than floats, the extra ints are wasted and an error thrown. Or pair one float and one int and repeat? Many possible ways to store a file... ... A way that is usually smallest and always simplest: Dont reorder the primes. Specify a minimum prime and quantity of primes, and the set of primes is known from that information. You can compress without reording the primes. Thats just an optimization.


Field Summary
 
Fields inherited from class codesimian.CS
DESCRIPTION, END, EXECPROXY, HEAP, JAVACODE, MYFUEL, NAME, NEWINSTANCE, NULL, PARENT, PARSEPRIORITY, PREV, TESTER, THIS
 
Constructor Summary
SelfPredictingPrimeHashingNumberList()
           
 
Method Summary
 double cost()
          cost() should be changed to return a float, and should be renamed to costToExecute()

cost of EXECUTING this CS, not including any CSs it executes recursively.
 int defaultArraySize()
          Must be prime.
 java.lang.String description()
          Deprecated. public byte isIllusion(int index){ return 1; } public int countP(){ checkArrays(); return data.length; }
 double DForProxy()
          modify data[] 1 cycle farther, to predict itself better.
 java.lang.String keyword()
          For the CodeSimian language as a String.
CodeSimian language keyword, like "+" "*" "max" ">" etc.

Override this function if you want to specify a keyword other than how I derive them from the class name, like + for Add.

Some CSs might never be intended to be used in the language by their keyword.
The best example (4/05) is Num, because it is used in the language like "3.4" instead of "num()".
 int minP()
          For DForProxy().
Minimum number of parameters in param[] needed to call DForProxy().
Defines which indexs of param[] DForProxy() can use.
Functions with a different number of parameters must override this.
OVERRIDE THIS FUNCTION IF EXEC USES A DIFFERENT NUMBER OF PARAMETERS.
Default is 1.
 int minPrimeIndex()
          2 is the first prime-number.
 CS P(int index)
          WARNING: if add CSs then delete them, they are still in the param[] array and can be returned in this function, despite them being out of valid range: index at least countP().
 double PD(int index)
          Returns a param as a double.
 float PF(int index)
           
 float predict(int index, int useThisManyNumbersToPredictIt)
          looks at useThisManyNumbersToPredictIt numbers in data[] and predicts the value at data[index] without looking at data[index] except by accident.
 float[] predictAll(int useThisManyNumbersToPredictEachIndex)
           
static void randomizeIndexs(int[] anyArray)
           
 boolean setD(int index, double value)
          sets a param to a double value.
 boolean setF(int index, float value)
           
 boolean setP(int index, CS value)
          Every CS is a list of other CSs, between size minP() and maxP() inclusive.
 float weight(int nonlinearIndexsAhead)
          weights may or may not be calculated by some function of (prime[x]*nonlinearIndexsAhead)%data.length.
 float weight(int predictThisIndex, int observeMe)
          returns the weight of how much the index observeMe should be used to predict the value at predictThisIndex.
 
Methods inherited from class codesimian.DefaultCS
B, C, countP, decrementMyFuel, deleteP, F, fuel, getExec, getObject, heap, I, indexP, indexPName, insertB, insertC, insertD, insertF, insertI, insertJ, insertL, insertL, insertL1, insertP, insertS, insertZ, J, javaCode, LForProxy, LForProxy, myFuel, name, newInstance, objectToCS, objectToCSArray, objectToCSArray, prevD, prevL, PType, S, setB, setC, setCountP, setD, setExec, setFuel, setI, setJ, setL, setL, setL, setL1, setMyFuel, setName, setObject, setPrevExec, setPType, setS, setZ, start, toString, V, Z
 
Methods inherited from class codesimian.CS
addB, addC, addD, addF, addI, addJ, addL, addP, addP, addP, addP, addP, addS, addZ, BForProxy, CForProxy, clone, D, deleteP, FForProxy, GETB, GETC, GETD, GETF, GETI, GETJ, GETL, GETS, GETZ, IForProxy, isIllusion, JForProxy, L, L, L, L, L, maxD, maxP, minD, overwrites, parent, parsePriority, PB, PC, PI, PJ, PL, prevB, prevC, prevF, prevI, prevJ, prevS, prevZ, proxyOf, PS, PZ, reflect, reflect, reflect6, setB, SETB, setC, SETC, setCost, SETD, setDescription, setF, SETF, setHeap, setI, SETI, setJ, SETJ, SETL, setL, setL, setParent, setParsePriority, setProxyOf, setS, SETS, setTester, setZ, SETZ, SForProxy, tester, toJavaCode, VForProxy, voidReflect, ZForProxy
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SelfPredictingPrimeHashingNumberList

public SelfPredictingPrimeHashingNumberList()
Method Detail

keyword

public java.lang.String keyword()
Description copied from class: DefaultCS
For the CodeSimian language as a String.
CodeSimian language keyword, like "+" "*" "max" ">" etc.

Override this function if you want to specify a keyword other than how I derive them from the class name, like + for Add.

Some CSs might never be intended to be used in the language by their keyword.
The best example (4/05) is Num, because it is used in the language like "3.4" instead of "num()".
Default: Returns class name, minus package name (and its dots), and change the first letter to lowercase.

For example, CS.MaxParams does not override keyword(), which returns "maxP".

Overrides:
keyword in class DefaultCS
See Also:
CS.parent(), CS.newInstance(), CS.name()

DForProxy

public double DForProxy()
modify data[] 1 cycle farther, to predict itself better. countP() should equal data.length. Use the values in prime[] and data.length (equal to prime.length) to compute the new values of data[].

Instead of being a list of numbers, should this class contain 2 lists, 1 of numbers and 1 of primes? Its better to include the primes because they can be reordered to lossy-compress the data in the first list better.

Specified by:
DForProxy in class DefaultCS

cost

public double cost()
Description copied from class: CS
cost() should be changed to return a float, and should be renamed to costToExecute()

cost of EXECUTING this CS, not including any CSs it executes recursively. The INTERNAL cost.

One unit of myFuel() costs cost() units of fuel(). Beware of CSs that counterfeit fuel.

Returns 1 by default.

Overrides:
cost in class CS

minP

public int minP()
Description copied from class: DefaultCS
For DForProxy().
Minimum number of parameters in param[] needed to call DForProxy().
Defines which indexs of param[] DForProxy() can use.
Functions with a different number of parameters must override this.
OVERRIDE THIS FUNCTION IF EXEC USES A DIFFERENT NUMBER OF PARAMETERS.
Default is 1.

Overrides:
minP in class DefaultCS

P

public CS P(int index)
Description copied from class: DefaultCS
WARNING: if add CSs then delete them, they are still in the param[] array and can be returned in this function, despite them being out of valid range: index at least countP().

Overrides:
P in class DefaultCS
Parameters:
index - range 0 (or neg?) to countP()-1 inclusive
See Also:
CS.heap()

PD

public double PD(int index)
Description copied from class: CS
Returns a param as a double. Same as P(paramIndex).D()

Why not just use P(paramIndex).D() instead? Its not just for convenience. Some CSs store their params as doubles or other primitives (or any type of Object, usually String or CS), and if you predict correctly how params are stored, PD(paramIndex) is much faster than P(paramIndex).D(), but BEWARE: if you predict incorrectly its a little slower. It calculates the same double either way.

For all paramIndex: P(paramIndex).D() == PD(paramIndex)

Overrides:
PD in class CS

PF

public float PF(int index)
Overrides:
PF in class CS
See Also:
CS.PD(int)

setP

public boolean setP(int index,
                    CS value)
Description copied from class: CS
Every CS is a list of other CSs, between size minP() and maxP() inclusive. setP overwrites one of those CSs or adds a new one at the end, depending on the index. If index is between 0 and countP()-1, overwrites. If it equals countP(), adds at end.

Overrides:
setP in class DefaultCS

setD

public boolean setD(int index,
                    double value)
Description copied from class: CS
sets a param to a double value. Same as P(paramIndex).setD(value)

Overrides:
setD in class DefaultCS

setF

public boolean setF(int index,
                    float value)
Overrides:
setF in class DefaultCS
See Also:
CS.setD(int,double)

description

public java.lang.String description()
Deprecated. public byte isIllusion(int index){ return 1; } public int countP(){ checkArrays(); return data.length; }

everything is an illusion. for all P(x), x is the index into a float[] array instead of a CS, and a CS from the constant-pool is returned.

Overrides:
description in class DefaultCS

predictAll

public float[] predictAll(int useThisManyNumbersToPredictEachIndex)

predict

public float predict(int index,
                     int useThisManyNumbersToPredictIt)
looks at useThisManyNumbersToPredictIt numbers in data[] and predicts the value at data[index] without looking at data[index] except by accident. useThisManyNumbersToPredictIt is probably data.length.


defaultArraySize

public int defaultArraySize()
Must be prime. for#proofThat49999IsPrime( 0#i 51000 if(==I(%(i 49999) 0) ask("found divisible" i)) )


minPrimeIndex

public int minPrimeIndex()
2 is the first prime-number. This is the 200000th prime. Many primes are used in this class. Do not use the first minPrimeIndex quantity of primes. If this is 5, it will return the 5th prime: 9. The returned index is used before the primes are created. A prime number has exactly 2 factors, 1 and itself, therefore 1 is not a prime number.


weight

public float weight(int predictThisIndex,
                    int observeMe)
returns the weight of how much the index observeMe should be used to predict the value at predictThisIndex. For each index x, the sum of all weight(x,y) for all indexs y, should equal 1 (and none may be negative).

The distribution of weights should be much heavier around (prime[x]*z)%data.length when z is smaller. Usually z should only be calculated for values less than approximately data.length/20 or a higher ratio of the array is smaller. where z ranges from 0 to


weight

public float weight(int nonlinearIndexsAhead)
weights may or may not be calculated by some function of (prime[x]*nonlinearIndexsAhead)%data.length. nonlinearIndexsAhead ranges 0 to data.length-1, but its usually only necessary to calculate values close to 0, like data.length/20.


randomizeIndexs

public static void randomizeIndexs(int[] anyArray)