NaturalLanguage

codesimian
Class NaturalLanguage

java.lang.Object
  codesimian.NaturalLanguage

public class NaturalLanguage
extends java.lang.Object

Converts strings of natural language, like this sentence, to arrays of numbers so computers can understand them better.

Nested Class Summary
`static class`	`NaturalLanguage.OnlyTheMostCommonSymbolsOnKeyboard`
`static class`	`NaturalLanguage.RemoveEverythingExceptLetters`
`static interface`	`NaturalLanguage.StringTransform`

Field Summary
`protected java.lang.String`	`delimiters`
`protected static java.util.Vector<java.lang.String>`	`exampleText`
`protected java.util.HashMap<java.lang.String,java.lang.String[]>`	`similarStrings` Strings that can often be used interchangeably, like "3" with "e" in "b33r" or "beer". Also bigger strings, like "y" with "ies" in "penny" or "pennies".
`protected java.lang.String[]`	`word` most common words have lowest indexes
`protected int[]`	`wordsFound`
`protected java.util.HashMap<java.lang.String,java.lang.Integer>`	`wordToIndex` key is a word.

Constructor Summary
`NaturalLanguage(int wordsToUse)` You must addExampleText() some sentences for me to use in my calculations.
`NaturalLanguage(java.lang.String sentences, int wordsToUse)` Same as NaturalLanguage(int) except calls addExampleText(sentences)

Method Summary
`static void`	`addExampleText(java.lang.String t)`
`protected void`	`addSimilarStrings(java.lang.String[] similarToEachOther)` Adds all to similarStrings, each as its own key, and the array is the value.
`void`	`calculateNewMostCommonWords()` fills word[] (and wordToIndex) with the most common formatted words from exampleText.
`static double`	`chanceIsCorrectlySpelledWord(java.lang.String possibleWord)` the first call causes this class to read all of codesimian's inner text files
`static double`	`chanceTextIsNatLang(java.lang.String possibleNaturalLanguage)` Returns a value between 0 (certainly not natural language) and 1 (certainly natural language)
`static java.lang.String`	`endWithPunctuation(java.lang.String s, char punctuation)` replaces whatever punctuation, if any, the String ends with, or adds punctuation at the end.
`java.lang.String`	`formatWord(java.lang.String word)` Formats a word in a standard way.
`java.lang.String`	`getDelimiters()`
`int`	`getIndex(java.lang.String theWord)` Returns the index of a word: wordToIndex.get(word).intValue().
`java.lang.String`	`getWord(int index)` returns word[index].
`boolean`	`setDelimiters(java.lang.String delimiters)`
`int[]`	`tokenize(java.lang.String sentences)` Returns a sequence of ints representing the most common words (from the other main text) in 'sentences'.
`int`	`wordCount()` Returns how many unique words are recognized, word.length.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

wordToIndex

protected java.util.HashMap<java.lang.String,java.lang.Integer> wordToIndex

Key is a word. Value is an Integer index into word[].
word[ wordToIndex.get("is").intValue() ] equals "is".

word

protected java.lang.String[] word

most common words have lowest indexes

wordsFound

protected int[] wordsFound

delimiters

protected java.lang.String delimiters

similarStrings

protected java.util.HashMap<java.lang.String,java.lang.String[]> similarStrings

Strings that can often be used interchangeably, like "3" with "e" in "b33r" or "beer". Also bigger strings, like "y" with "ies" in "penny" or "pennies". Anywhere one string is found, an AI might think about what the text would mean if parts were replaced by a similar string. To cut down on the number of words, all similar words may be thought of as the same word.

Key is some string that is a key or value in similarStrings.
Value is a String[] array containing the key you searched for and all strings similar to it.

exampleText

protected static java.util.Vector<java.lang.String> exampleText

Constructor Detail

NaturalLanguage

public NaturalLanguage(int wordsToUse)

You must addExampleText() some sentences for me to use in my calculations.

Parameters: wordsToUse - number of words that are recognized. All other words are ignored and never returned in a sequence.

NaturalLanguage

public NaturalLanguage(java.lang.String sentences,
                       int wordsToUse)

Same as NaturalLanguage(int) except calls addExampleText(sentences)

Method Detail

getIndex

public int getIndex(java.lang.String theWord)

Returns the index of a word: wordToIndex.get(word).intValue(). If the word is not already formatted, call getIndex(formatWord(word)) instead. The most common word has index 0, the second most common has index 1, and so on. Returns -1 if the word does not exist and therefore has no index.
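The relationship between word[], wordToIndex, and the -1 result can be pictured with a small stand-in class. This is a minimal sketch, not the codesimian source: the class name WordIndexSketch and its constructor are hypothetical, and it assumes the words passed in are already formatted and sorted most common first.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not the codesimian implementation.
// Shows word[] and wordToIndex mirroring each other, with -1 for unknown words.
class WordIndexSketch {
    private final String[] word;
    private final Map<String, Integer> wordToIndex = new HashMap<>();

    // rankedWords is assumed to be formatted and sorted most common first.
    WordIndexSketch(String[] rankedWords) {
        word = rankedWords.clone();
        for (int i = 0; i < word.length; i++) {
            wordToIndex.put(word[i], i);
        }
    }

    // Returns the index of a formatted word, or -1 if it is not recognized.
    int getIndex(String formattedWord) {
        Integer index = wordToIndex.get(formattedWord);
        return index == null ? -1 : index;
    }

    // Inverse of getIndex for valid indexes.
    String getWord(int index) {
        return word[index];
    }

    // Valid indexes range from 0 to wordCount() - 1.
    int wordCount() {
        return word.length;
    }
}
```

With this layout, getWord(getIndex(w)) returns w for every recognized word, and callers need to check for -1 before passing the result to getWord.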

getWord

public java.lang.String getWord(int index)

Returns word[index]. The most common word has index 0, the second most common has index 1, and so on.

wordCount

public int wordCount()

Returns how many unique words are recognized, word.length. Indexes range from 0 to wordCount()-1.

setDelimiters

public boolean setDelimiters(java.lang.String delimiters)

getDelimiters

public java.lang.String getDelimiters()

chanceIsCorrectlySpelledWord

public static double chanceIsCorrectlySpelledWord(java.lang.String possibleWord)

the first call causes this class to read all of codesimian's inner text files

addSimilarStrings

protected void addSimilarStrings(java.lang.String[] similarToEachOther)

Adds all to similarStrings, each as its own key, and the array is the value. If a String has already been added as a similar string, its value is updated to this new array.
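The registration described above can be sketched as follows. This is an illustration under assumptions, not the actual method body: the class name SimilarStringsSketch is hypothetical, and it assumes every key in a group shares one array instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not the codesimian implementation.
class SimilarStringsSketch {
    final Map<String, String[]> similarStrings = new HashMap<>();

    // Every member of the group becomes a key, and all keys map to the same array.
    // A string registered earlier is overwritten with the new group.
    void addSimilarStrings(String[] similarToEachOther) {
        for (String s : similarToEachOther) {
            similarStrings.put(s, similarToEachOther);
        }
    }
}
```

One consequence of this design: if "e" is first grouped with "3" and later with "E", the entry for "e" points at the newer group while the entry for "3" still points at the older one.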

formatWord

public java.lang.String formatWord(java.lang.String word)

Formats a word in a standard way. Makes the word lower-case and changes most plurals to singular. Assumes English. Removes anything that's not a letter or digit. Returns null if the word looks like it's not a word.
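A rough sketch of this kind of normalization is shown below. The plural rules and the "not a word" test are assumptions made for illustration, since the documentation does not state them, and FormatWordSketch is a hypothetical name rather than part of codesimian.

```java
import java.util.Locale;

// Hypothetical stand-in, not the codesimian implementation.
class FormatWordSketch {
    static String formatWord(String word) {
        // Lower-case, then keep only letters and digits.
        StringBuilder kept = new StringBuilder();
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                kept.append(c);
            }
        }
        // Assumption: nothing left after stripping means "not a word".
        if (kept.length() == 0) {
            return null;
        }
        String s = kept.toString();
        // Assumed plural rules: "ies" becomes "y", a single trailing "s" is dropped.
        if (s.length() > 3 && s.endsWith("ies")) {
            return s.substring(0, s.length() - 3) + "y";
        }
        if (s.length() > 2 && s.endsWith("s") && !s.endsWith("ss")) {
            return s.substring(0, s.length() - 1);
        }
        return s;
    }
}
```

Under these assumed rules, "Pennies!" becomes "penny" and "Cats" becomes "cat", while "glass" and "is" are left unchanged. Simple suffix rules like these mishandle some words (for example "bus"), which is consistent with the description saying "most plurals" rather than all.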

addExampleText

public static void addExampleText(java.lang.String t)

calculateNewMostCommonWords

public void calculateNewMostCommonWords()

fills word[] (and wordToIndex) with the most common formatted words from exampleText. Call this once after a lot of text has been entered with addExampleText(), and before tokenize().
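One way to picture the ranking step is a frequency count followed by a sort. The sketch below is an assumption about the general approach, not the codesimian source: MostCommonWordsSketch and mostCommonWords are hypothetical names, lower-casing stands in for formatWord(), and ties are broken alphabetically only to make the result deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-in, not the codesimian implementation.
class MostCommonWordsSketch {
    static String[] mostCommonWords(List<String> exampleText, String delimiters, int wordsToUse) {
        // Count how often each word occurs across all example text.
        Map<String, Integer> counts = new HashMap<>();
        for (String text : exampleText) {
            StringTokenizer tokens = new StringTokenizer(text, delimiters);
            while (tokens.hasMoreTokens()) {
                // Lower-casing stands in for the documented formatWord().
                String w = tokens.nextToken().toLowerCase(Locale.ROOT);
                counts.merge(w, 1, Integer::sum);
            }
        }
        // Highest count first; ties broken alphabetically (an assumption).
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        // Keep at most wordsToUse words, so index 0 is the most common word.
        int n = Math.min(wordsToUse, ranked.size());
        return ranked.subList(0, n).toArray(new String[0]);
    }
}
```

Ranking once after all text has been added avoids recomputing the counts on every call, which matches the advice to call the method once before tokenizing.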

tokenize

public int[] tokenize(java.lang.String sentences)

Returns a sequence of ints representing the most common words (as ranked from exampleText) found in 'sentences'. Tokenizes and formats the words before checking if they equal known words.
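The mapping from text to ints might look like the sketch below. It is an illustration under assumptions, not the actual method: TokenizeSketch is a hypothetical name, the delimiters and word map are passed in explicitly instead of being read from fields, lower-casing stands in for formatWord(), and unrecognized words are skipped, as the constructor documentation suggests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-in, not the codesimian implementation.
class TokenizeSketch {
    static int[] tokenize(String sentences, String delimiters, Map<String, Integer> wordToIndex) {
        List<Integer> found = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(sentences, delimiters);
        while (tokens.hasMoreTokens()) {
            // Lower-casing stands in for the documented formatWord().
            String w = tokens.nextToken().toLowerCase(Locale.ROOT);
            Integer index = wordToIndex.get(w);
            if (index != null) {
                found.add(index); // unrecognized words are skipped
            }
        }
        // Copy the boxed indexes into a primitive array.
        int[] result = new int[found.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = found.get(i);
        }
        return result;
    }
}
```

Because unknown words are dropped, the returned array can be shorter than the number of words in the input.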

endWithPunctuation

public static java.lang.String endWithPunctuation(java.lang.String s,
                                                  char punctuation)

replaces whatever punctuation, if any, the String ends with, or adds punctuation at the end.
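A possible reading of this behavior is sketched below. The set of characters treated as punctuation, and the choice to strip a whole trailing run of them, are assumptions, since the documentation specifies neither. EndWithPunctuationSketch is a hypothetical name.

```java
// Hypothetical stand-in, not the codesimian implementation.
class EndWithPunctuationSketch {
    // Assumed punctuation set; the documented method does not list one.
    private static final String PUNCTUATION = ".,;:!?";

    static String endWithPunctuation(String s, char punctuation) {
        // Walk back over any trailing punctuation characters.
        int end = s.length();
        while (end > 0 && PUNCTUATION.indexOf(s.charAt(end - 1)) >= 0) {
            end--;
        }
        // Append the requested mark, whether or not anything was stripped.
        return s.substring(0, end) + punctuation;
    }
}
```

Under these assumptions, both "Hello" and "Hello?!" become "Hello." when called with '.', so the result always ends in exactly one punctuation character.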

chanceTextIsNatLang

public static double chanceTextIsNatLang(java.lang.String possibleNaturalLanguage)

Returns a value between 0 (certainly not natural language) and 1 (certainly natural language).
