java.lang.Object
  codesimian.NaturalLanguage
public class NaturalLanguage
Converts natural-language strings, such as this sentence, into arrays of numbers that computers can process more easily.
Nested Class Summary | |
---|---|
static class | NaturalLanguage.OnlyTheMostCommonSymbolsOnKeyboard |
static class | NaturalLanguage.RemoveEverythingExceptLetters |
static interface | NaturalLanguage.StringTransform |
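This page does not show which method NaturalLanguage.StringTransform declares. As a minimal sketch, assuming a single hypothetical method transform(String) that maps one string to another, a letters-only filter in the spirit of NaturalLanguage.RemoveEverythingExceptLetters could look like this. StringTransform here, LettersOnly and StringTransformDemo are illustrative names, not part of codesimian:

```java
// Hypothetical stand-in for NaturalLanguage.StringTransform: the real method
// name and signature are not documented on this page.
interface StringTransform {
    String transform(String s);
}

// Illustrative filter in the spirit of NaturalLanguage.RemoveEverythingExceptLetters:
// keeps only the characters for which Character.isLetter() returns true.
class LettersOnly implements StringTransform {
    @Override
    public String transform(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetter(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }
}

class StringTransformDemo {
    public static void main(String[] args) {
        StringTransform t = new LettersOnly();
        System.out.println(t.transform("b33r, please!")); // prints brplease
    }
}
```

Keeping each transform behind a one-method interface lets different filters be swapped in without changing the code that applies them.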
Field Summary | |
---|---|
protected java.lang.String | delimiters |
protected static java.util.Vector<java.lang.String> | exampleText |
protected java.util.HashMap<java.lang.String,java.lang.String[]> | similarStrings - Strings that can often be used interchangeably, like "3" and "e" in "b33r" and "beer", and also longer strings like "y" and "ies" in "penny" and "pennies". |
protected java.lang.String[] | word - The most common words have the lowest indexes. |
protected int[] | wordsFound |
protected java.util.HashMap<java.lang.String,java.lang.Integer> | wordToIndex - The key is a word; the value is its index in word[]. |
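A minimal sketch of how the similarStrings map can be filled, following the addSimilarStrings() description (each string in the array becomes its own key, and the whole array is the value). SimilarStringsSketch, its alternativesFor() helper and SimilarStringsDemo are hypothetical names used only for illustration:

```java
import java.util.Arrays;
import java.util.HashMap;

// Hypothetical sketch; only the field type and the addSimilarStrings() behavior come from this page.
class SimilarStringsSketch {
    // Same type as the documented field: each string maps to its group of interchangeable strings.
    protected final HashMap<String, String[]> similarStrings = new HashMap<>();

    // Each string in the group becomes its own key, and the whole array is stored as the value.
    protected void addSimilarStrings(String[] similarToEachOther) {
        for (String s : similarToEachOther) {
            similarStrings.put(s, similarToEachOther);
        }
    }

    // Hypothetical helper: the strings interchangeable with s, or just s itself if none are registered.
    String[] alternativesFor(String s) {
        String[] group = similarStrings.get(s);
        return group != null ? group : new String[] {s};
    }
}

class SimilarStringsDemo {
    public static void main(String[] args) {
        SimilarStringsSketch n = new SimilarStringsSketch();
        n.addSimilarStrings(new String[] {"3", "e"});
        n.addSimilarStrings(new String[] {"y", "ies"});
        System.out.println(Arrays.toString(n.alternativesFor("ies"))); // prints [y, ies]
        System.out.println(Arrays.toString(n.alternativesFor("q")));   // prints [q]
    }
}
```

Because every member of a group points at the same array, a lookup on any member returns the whole group. In this sketch, a string that appears in two groups keeps only the mapping from the later call.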
Constructor Summary | |
---|---|
NaturalLanguage(int wordsToUse) | Example sentences must be supplied with addExampleText() before this class can do its calculations. |
NaturalLanguage(java.lang.String sentences, int wordsToUse) | Same as NaturalLanguage(int), except that it also calls addExampleText(sentences). |
Method Summary | |
---|---|
static void | addExampleText(java.lang.String t) |
protected void | addSimilarStrings(java.lang.String[] similarToEachOther) - Adds each string in the array to similarStrings as its own key, with the whole array as the value. |
void | calculateNewMostCommonWords() - Fills word[] (and wordToIndex) with the most common formatted words from exampleText. |
static double | chanceIsCorrectlySpelledWord(java.lang.String possibleWord) - The first call causes this class to read all of codesimian's internal text files. |
static double | chanceTextIsNatLang(java.lang.String possibleNaturalLanguage) - Returns a value between 0 (certainly not natural language) and 1 (certainly natural language). |
static java.lang.String | endWithPunctuation(java.lang.String s, char punctuation) - Replaces whatever punctuation the String ends with, or appends the punctuation if the String ends with none. |
java.lang.String | formatWord(java.lang.String word) - Formats a word in a standard way. |
java.lang.String | getDelimiters() |
int | getIndex(java.lang.String theWord) - Returns the index of a word: wordToIndex.get(word).intValue(). |
java.lang.String | getWord(int index) - Returns word[index]. |
boolean | setDelimiters(java.lang.String delimiters) |
int[] | tokenize(java.lang.String sentences) - Returns a sequence of ints, one for each recognized word in 'sentences', where each int is that word's index among the most common words from the example text. |
int | wordCount() - Returns how many unique words are recognized (word.length). |
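The relationship between word[], wordToIndex, calculateNewMostCommonWords() and tokenize() can be illustrated with a small sketch. It assumes words are split on whitespace and that formatting means lower-casing and dropping non-letters; the real class uses its delimiters field and formatWord(), whose exact behavior this page does not document. WordIndexSketch and WordIndexDemo are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the word-index bookkeeping described above; not codesimian's implementation.
class WordIndexSketch {
    protected String[] word = new String[0];                          // most common words have the lowest indexes
    protected HashMap<String, Integer> wordToIndex = new HashMap<>(); // inverse of word[]
    private final List<String> exampleText = new ArrayList<>();      // per instance here, unlike the real static field
    private final int wordsToUse;

    WordIndexSketch(int wordsToUse) { this.wordsToUse = wordsToUse; }

    void addExampleText(String t) { exampleText.add(t); }

    // Assumption: "standard format" = lower case with non-letters removed.
    String formatWord(String w) {
        return w.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}]", "");
    }

    // Assumption: words are separated by whitespace; empty results are skipped.
    private List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            String f = formatWord(raw);
            if (!f.isEmpty()) {
                out.add(f);
            }
        }
        return out;
    }

    void calculateNewMostCommonWords() {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : exampleText) {
            for (String w : words(t)) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        List<String> sorted = new ArrayList<>(counts.keySet());
        // Highest count first; ties broken alphabetically so the order is deterministic.
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        word = sorted.subList(0, Math.min(wordsToUse, sorted.size())).toArray(new String[0]);
        wordToIndex.clear();
        for (int i = 0; i < word.length; i++) {
            wordToIndex.put(word[i], i);
        }
    }

    // Only recognized words produce an index; all other words are skipped.
    int[] tokenize(String sentences) {
        List<Integer> ids = new ArrayList<>();
        for (String w : words(sentences)) {
            Integer idx = wordToIndex.get(w);
            if (idx != null) {
                ids.add(idx);
            }
        }
        return ids.stream().mapToInt(Integer::intValue).toArray();
    }

    int wordCount() { return word.length; }
}

class WordIndexDemo {
    public static void main(String[] args) {
        WordIndexSketch nl = new WordIndexSketch(3);
        nl.addExampleText("the cat sat on the mat. The cat ran.");
        nl.calculateNewMostCommonWords();
        System.out.println(Arrays.toString(nl.word));                               // prints [the, cat, mat]
        System.out.println(Arrays.toString(nl.tokenize("The dog chased the cat"))); // prints [0, 0, 1]
    }
}
```

Words outside the recognized vocabulary, such as "dog" and "chased" here, are dropped by tokenize() rather than mapped to a placeholder index, in line with the constructor note that all other words are never returned in a sequence.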
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected java.util.HashMap<java.lang.String,java.lang.Integer> wordToIndex
protected java.lang.String[] word
protected int[] wordsFound
protected java.lang.String delimiters
protected java.util.HashMap<java.lang.String,java.lang.String[]> similarStrings
protected static java.util.Vector<java.lang.String> exampleText
Constructor Detail |
---|
public NaturalLanguage(int wordsToUse)
Parameters:
wordsToUse - the number of words that are recognized. All other words are ignored and never returned in a sequence.
public NaturalLanguage(java.lang.String sentences, int wordsToUse)
Method Detail |
---|
public int getIndex(java.lang.String theWord)
public java.lang.String getWord(int index)
public int wordCount()
public boolean setDelimiters(java.lang.String delimiters)
public java.lang.String getDelimiters()
public static double chanceIsCorrectlySpelledWord(java.lang.String possibleWord)
protected void addSimilarStrings(java.lang.String[] similarToEachOther)
public java.lang.String formatWord(java.lang.String word)
public static void addExampleText(java.lang.String t)
public void calculateNewMostCommonWords()
public int[] tokenize(java.lang.String sentences)
public static java.lang.String endWithPunctuation(java.lang.String s, char punctuation)
public static double chanceTextIsNatLang(java.lang.String possibleNaturalLanguage)
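One plausible reading of endWithPunctuation() is sketched below. Which characters count as punctuation is an assumption here (. , ! ? ; :), and a run of trailing punctuation such as "?!" is replaced as a whole. PunctuationSketch is a hypothetical name:

```java
// Hypothetical sketch of endWithPunctuation(); the real punctuation set is not documented.
class PunctuationSketch {
    // Assumption: these are the characters treated as trailing punctuation.
    private static final String PUNCTUATION = ".,!?;:";

    static String endWithPunctuation(String s, char punctuation) {
        int end = s.length();
        // Walk back over any run of trailing punctuation, such as "?!" or "...".
        while (end > 0 && PUNCTUATION.indexOf(s.charAt(end - 1)) >= 0) {
            end--;
        }
        // Replace what was stripped (or nothing) with the requested character.
        return s.substring(0, end) + punctuation;
    }

    public static void main(String[] args) {
        System.out.println(endWithPunctuation("Is it done?!", '.')); // prints Is it done.
        System.out.println(endWithPunctuation("Hello", '!'));        // prints Hello!
    }
}
```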