Plans for CodeSimian
Things that should be added to CodeSimian:
CodeSimian is organized around the public functions of 1 file: CS.java,
but there are too many functions without clear relationships to each other.
2/3 of those functions can be organized into a 6-dimensional grid.
The dimensions are:

RETURN - any primitive or object (including arrays)
INSTANCE - any CS object, or maybe any type of object. For example, to convert a CS to a StringTokenizer, execute thatCS.L(StringTokenizer.class)
1 OF 4 ACTIONS: GET, SET, INSERT, OR DELETE - the 4 basic list operations
LOCATION - usually int, sometimes String or Class, possibly double (for interpolating between indexes), but any primitive/object/array is valid
QUANTITY - usually int, possibly double (for interpolating between indexes)
VALUE - any primitive/object/array

These 6 dimensions can be used in any arbitrary combination to specify (or create?) new Java functions. I created approximately 100 functions inside CS.java, but this 6-dimensional grid can show the need for specific other functions, and organize them at runtime.

Example: new S("abcdefgh").L(3,double[].class,4)

RETURN TYPE = double[].class
RETURN VALUE = new double[]{ (double)'d', (double)'e', (double)'f', (double)'g' }
INSTANCE TYPE = codesimian.S.class
INSTANCE VALUE = new S("abcdefgh")
1 OF 4 ACTIONS TYPE = byte (only the lowest 2 bits)
1 OF 4 ACTIONS VALUE = CS.GET, which equals (byte)0
LOCATION TYPE = int.class
LOCATION VALUE = 3
QUANTITY TYPE = int.class
QUANTITY VALUE = 4
VALUE TYPE = java.lang.Class.class
VALUE VALUE = double[].class

Example: aCompiler.insertL1(1,"sound(*(microphone 3))")

RETURN = true
INSTANCE = aCompiler
1 OF 4 ACTIONS = INSERT
LOCATION = 1
QUANTITY = 1
VALUE = "sound(*(microphone 3))"

Create some simple Java interfaces to be used as labels to organize these 6 dimensions. 4 interfaces GET.java, SET.java, INSERT.java, DELETE.java would include the relevant functions in CS.java, and CS would implement all 4 of these. Maybe create a ONE.java class to label functions that only operate on 1 index or 1 thing, and maybe a MANY.java class for functions whose quantity varies. Or maybe combine them to get 8 interfaces including GETONE.java GETMANY.java SETONE.java etc.
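A minimal sketch of one cell of this grid may help make the first example concrete. It is not the real CS.java: the class name GridSketch and the method get are hypothetical, a plain String stands in for the S instance, and only CS.GET == (byte)0 comes from the notes (the other three action values are assumptions).

```java
import java.util.Arrays;

// Hypothetical stand-in for one GET cell of the 6-dimensional grid.
final class GridSketch {
    // Only GET = 0 is stated in the notes; SET, INSERT, DELETE values are assumed.
    static final byte GET = 0, SET = 1, INSERT = 2, DELETE = 3;

    // ACTION = GET, LOCATION = int, QUANTITY = int, VALUE = the requested return type.
    // Only String -> double[] and String -> String are implemented in this sketch.
    static Object get(String instance, int location, Class<?> returnType, int quantity) {
        String slice = instance.substring(location, location + quantity);
        if (returnType == double[].class) {
            double[] out = new double[quantity];
            for (int i = 0; i < quantity; i++) {
                out[i] = (double) slice.charAt(i); // each char widened to its code
            }
            return out;
        }
        if (returnType == String.class) {
            return slice;
        }
        throw new IllegalArgumentException("no conversion to " + returnType.getName());
    }

    public static void main(String[] args) {
        double[] d = (double[]) get("abcdefgh", 3, double[].class, 4);
        System.out.println(Arrays.toString(d)); // prints [100.0, 101.0, 102.0, 103.0]
    }
}
```

The point of the sketch is that RETURN is chosen by the VALUE argument (a Class), so one function signature can cover many cells of the grid.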
It does not fit perfectly, but there should be a way to specify any function in this 6-dimensional grid by a String. Use part of Java's class file format, described here:

Element Type / Encoding
boolean / Z
byte / B
char / C
class or interface / Lclassname;
double / D
float / F
int / I
long / J
short / S

For example: new S("abcdefgh").L(3,double[].class,4) would use the function described by something like this string:

"Ljava.lang.Object;Lcodesimian.CS;GET;I;I;Ljava.lang.Class;"

and it might be used dynamically as codesimian code this way:

Ljava.lang.Object;Lcodesimian.CS;GET;I;I;Ljava.lang.Class;( list(100 101 102 103) "abcdefgh" GET 3 4 javaType("double[]") )
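As a rough sketch of how such a String could be built, assuming each of the 6 parts is simply terminated by ; (the class GridKeySketch and its methods are hypothetical names, not part of CodeSimian). Class names keep their dots as in the string above; real class files write them with / instead, and use [ for arrays, which this sketch does not handle.

```java
// Hypothetical helper that builds a grid key from the encodings in the table.
final class GridKeySketch {
    // Encodes one type with the letters from the table above.
    static String encode(Class<?> type) {
        if (type.isArray() || type == void.class) {
            throw new IllegalArgumentException("not covered by this sketch: " + type);
        }
        if (type == boolean.class) return "Z";
        if (type == byte.class) return "B";
        if (type == char.class) return "C";
        if (type == double.class) return "D";
        if (type == float.class) return "F";
        if (type == int.class) return "I";
        if (type == long.class) return "J";
        if (type == short.class) return "S";
        return "L" + type.getName() + ";"; // class or interface
    }

    // Joins the parts, adding ';' after any part that does not already end with it.
    static String key(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
            if (!p.endsWith(";")) sb.append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Lcodesimian.CS;" is written by hand because CS.java is not available here.
        System.out.println(key(encode(Object.class), "Lcodesimian.CS;", "GET",
                encode(int.class), encode(int.class), encode(Class.class)));
        // prints Ljava.lang.Object;Lcodesimian.CS;GET;I;I;Ljava.lang.Class;
    }
}
```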
compileFirstThatWorks#I_replace_tokensToObjects( 0 0 list#theObjects(objects...) createDefaultObject#first(0 0) findObjectByName#second(0 0 theObjects) createJavaMethodWrapper#third(0 0) createNewVar#fourth(0 0) )
To become more consistent, codesimian's syntax needs to be able to define itself as objects...

When ; precedes (, ), or #, then it means the symbol for that thing. ;;;( is the symbol for the symbol for the symbol for (.

Example:

parseObjects(0 list( +(0 0) ;( 2 ;;;;# 4 ;) ))

becomes

parseObjects(+(2 ;;;# 4) list( +(0 0) ;( 2 ;;;;# 4 ;) ))

Notice that ;;;# has 1 fewer ; than before it was used by parseObjects.

TokenizeCode should recognize ;;;;# as 1 token.
TokensToObjects should convert it to 1 object.
ParseObjects should execute it to determine how many ; are in it.

OR??? ; could be a token by itself: ; ; ; ; # Then all symbols of the same type could be the same object. It would not be known until parseObjects that it's the symbol for # instead of a real #. But tokensToObjects is incompatible with that (currently), because it uses # to give objects names before giving those objects to parseObjects.

???????????? what should I do? ??????????

I could use # as an object and put a string in after it and send that to parseObjects, like this...

parseObjects( 0 list( +(0 0) ; # "thePlus" ; ( 2 3 4 ; ) ) )

which becomes

parseObjects( +#thePlus(2 3 4) list( +(0 0) ; # "thePlus" ; ( 2 3 4 ; ) ) )

... so this is what needs to be done:
* delay parsing of # and the name after it until parseObjects (instead of during tokensToObjects).
* tokenizeCode needs to return each ; as a separate token
* parseObjects needs to check for these separately... ; # and names after #

... but if tokensToObjects doesn't give names to objects, then how can it have DUPLICATE OBJECTS in the list it returns? Duplicate objects are necessary for graph shapes of code.

... If tokensToObjects keeps checking for # and assigning names to objects, it will have to make sure it's # instead of ;# or ;;;;;#, but it can ignore ; in combination with ( or ), because those can be done in parseObjects. TokensToObjects only needs to have that exception for NAMES because it uses names to identify equal objects.
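The first option (;;;;# as 1 token, with parseObjects removing one ; each time) could be sketched roughly like this. SymbolLevelSketch and its methods are hypothetical names for illustration and do not exist in CodeSimian; the sketch only covers counting and stripping ; and says nothing about naming or duplicate objects.

```java
// Hypothetical helpers for tokens made of ';' characters followed by ( ) or #.
final class SymbolLevelSketch {
    // Number of leading ';' characters in a token such as ";;;;#".
    static int level(String token) {
        int n = 0;
        while (n < token.length() && token.charAt(n) == ';') n++;
        return n;
    }

    // True when the token is one or more ';' followed by exactly one of ( ) #.
    static boolean isSymbolToken(String token) {
        int n = level(token);
        return n > 0 && token.length() == n + 1 && "()#".indexOf(token.charAt(n)) >= 0;
    }

    // What one pass of parseObjects would leave behind: the token minus one ';'.
    static String unquoteOnce(String token) {
        if (!isSymbolToken(token)) {
            throw new IllegalArgumentException("not a symbol token: " + token);
        }
        return token.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(unquoteOnce(";;;;#")); // prints ;;;#
        System.out.println(unquoteOnce(";("));    // prints (
    }
}
```

A token with no leading ; (a real #) is rejected by isSymbolToken, which is the check tokensToObjects would need before treating # as a name marker.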
Do I need a 4th compiler step somewhere in the middle?

MAJOR PROBLEM: ;;( is the symbol for the symbol for (, but what is the symbol for ; ? That would need another symbol, and another symbol for that, on to infinity... Do I ever need a symbol for ;? I do if I'm compiling code that includes ;. By making ;;( be 1 token, ; is not a symbol at all, but it causes other problems...

Should I replace ; with \ ? \ is already the escape character for strings. I could use it to escape codesimian's 3 arbitrary syntax characters: # ( and ) with \# \( and \). Of course \" \t \n \\ etc are still included for strings. If it can be done, this is the best solution because it combines 2 escape characters into 1: \ and ;.

Can I assume \ is always followed by exactly 1 character to be escaped?
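One way to try out that assumption is a small tokenizer sketch in which \ always consumes exactly the 1 character after it. EscapeTokenizerSketch is a hypothetical name, not the real tokenizeCode, and the sketch ignores quoted strings entirely; escapes are kept in the token text so a later step can still tell \# from a real #.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tokenizer: # ( ) are syntax unless preceded by a backslash.
final class EscapeTokenizerSketch {
    static List<String> tokenize(String code) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < code.length(); i++) {
            char c = code.charAt(i);
            if (c == '\\') {
                // Assumption under test: exactly 1 escaped character follows.
                if (i + 1 >= code.length()) {
                    throw new IllegalArgumentException("dangling backslash");
                }
                word.append(c).append(code.charAt(++i));
            } else if (c == '(' || c == ')' || c == '#') {
                flush(word, tokens);
                tokens.add(String.valueOf(c)); // unescaped syntax character
            } else if (Character.isWhitespace(c)) {
                flush(word, tokens);
            } else {
                word.append(c);
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // Emits the pending word, if any, as one token.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        // The Java literal below is the codesimian text: +#thePlus(2 \# 4 \\)
        System.out.println(tokenize("+#thePlus(2 \\# 4 \\\\)"));
        // prints [+, #, thePlus, (, 2, \#, 4, \\, )]
    }
}
```

Under this assumption the infinite regress goes away: \\ is the symbol for \ itself, so no further escape character is needed.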