codesimian
Class Internet

java.lang.Object
  extended by codesimian.Internet

public class Internet
extends java.lang.Object

Static functions for downloading .html and text files from the internet, reading them, finding URL links, and searching for information.

Tries to obey a website's robots.txt (for example http://website/robots.txt) if the site has one.


Field Summary
static int defaultSearchLiquidBytesDownloadInternetAmount
           
static int defaultSearchMaxCacheMillisOld
           
static int defaultSearchMaxDurationMillis
           
static int defaultSearchTargetReturnQuantity
           
 
Method Summary
static java.lang.String download(java.lang.String url, Liquid liquidBytesDownloadInternet, double maxCacheMillisOld)
          Returns the download from cache if the cached copy is new enough.
static int maxMillisToWaitWhileNoBytesDownloadedBeforeEndingDownload()
          Returns 20000 (milliseconds), which is 20 seconds.
static int minBytesPerSecondToKeepDownloading()
          Returns 30000 (bytes per second).
static java.lang.String[] possibleRobotsDotTxtLocations(java.lang.String url)
          Given any URL, guesses where a robots.txt file may be found.
static CS[] search(java.lang.String[] startSearchingFromUrls, CS measureOfText)
          Takes no Liquid argument; tries to get Liquid from class FreeLiquid instead.
static CS[] search(java.lang.String[] startSearchingFromUrls, CS measureOfText, Liquid liquidBytesDownloadInternet)
           
static CS[] search(java.lang.String[] startSearchingFromUrls, CS measureOfText, Liquid liquidBytesDownloadInternet, int targetReturnQuantity, int maxSearchDurationMillis, int maxCacheMillisOld)
          Searches the internet for text that scores the highest.
static void throwIfNotValidURL(java.lang.String url)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

defaultSearchTargetReturnQuantity

public static final int defaultSearchTargetReturnQuantity
See Also:
Constant Field Values

defaultSearchMaxDurationMillis

public static final int defaultSearchMaxDurationMillis
See Also:
Constant Field Values

defaultSearchMaxCacheMillisOld

public static final int defaultSearchMaxCacheMillisOld
See Also:
Constant Field Values

defaultSearchLiquidBytesDownloadInternetAmount

public static final int defaultSearchLiquidBytesDownloadInternetAmount
See Also:
Constant Field Values
Method Detail

search

public static CS[] search(java.lang.String[] startSearchingFromUrls,
                          CS measureOfText,
                          Liquid liquidBytesDownloadInternet,
                          int targetReturnQuantity,
                          int maxSearchDurationMillis,
                          int maxCacheMillisOld)
Searches the internet for text that scores the highest. Scoring is done by a CS parameter. The search has limits on bytes downloaded, cache age, and search time.
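
The scoring idea above can be sketched in plain Java. This is only an illustration under stated assumptions: the class TopScoringSketch, its method topScoring, and the use of ToDoubleFunction in place of a CS goal function are all hypothetical and are not part of codesimian.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical stand-in: ranks candidate texts by a scoring function,
// the way search(...) is described as ranking text with measureOfText.
final class TopScoringSketch {

    // Returns at most targetReturnQuantity texts, highest score first.
    static List<String> topScoring(List<String> texts,
                                   ToDoubleFunction<String> measure,
                                   int targetReturnQuantity) {
        List<String> sorted = new ArrayList<>(texts);
        Comparator<String> byScore = Comparator.comparingDouble(measure);
        sorted.sort(byScore.reversed());
        // The result may be smaller than requested, but never bigger.
        int count = Math.min(Math.max(targetReturnQuantity, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, count));
    }

    public static void main(String[] args) {
        // Assumed toy measure: longer text scores higher.
        List<String> best = topScoring(Arrays.asList("a", "abc", "ab"), String::length, 2);
        System.out.println(best);
    }
}
```

The real search also has to respect the byte, cache-age, and time limits; this sketch covers only the ranking step.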



Returns an array of CSs, each with at least 4 params:

P0 is a string URL (or possibly a string file path name), the location of this search result.

P1 is a subset of the string page contents, the subset interesting to the search.

P2 is the start index of that subset within the page contents. The last index is P2 + subset size - 1.

P3 is the time (milliseconds since 1970) the page was downloaded. The page may be an old copy from cache, but it is at least as new as now - maxCacheMillisOld.
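
The relationship between P1 and P2 can be shown with a small sketch. The class SearchResultSketch and its field names are hypothetical stand-ins for a CS result; only the index arithmetic comes from the description above.

```java
// Hypothetical plain-Java mirror of one search result (P0..P3).
final class SearchResultSketch {
    final String location;          // P0: URL of the result
    final String excerpt;           // P1: interesting subset of the page
    final int startIndex;           // P2: where the subset starts in the page
    final long downloadedAtMillis;  // P3: download time, millis since 1970

    SearchResultSketch(String location, String excerpt,
                       int startIndex, long downloadedAtMillis) {
        this.location = location;
        this.excerpt = excerpt;
        this.startIndex = startIndex;
        this.downloadedAtMillis = downloadedAtMillis;
    }

    // Last index of the subset: P2 + subset size - 1.
    int lastIndex() {
        return startIndex + excerpt.length() - 1;
    }

    // True if the excerpt really sits at startIndex in the given page text.
    boolean matchesPage(String pageContents) {
        return startIndex >= 0
                && lastIndex() < pageContents.length()
                && pageContents.startsWith(excerpt, startIndex);
    }

    public static void main(String[] args) {
        SearchResultSketch r = new SearchResultSketch(
                "http://example.com/", "quick", 4, System.currentTimeMillis());
        System.out.println(r.lastIndex() + " " + r.matchesPage("the quick brown fox"));
    }
}
```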



Parameters:
startSearchingFromUrls - the URLs where the search starts.

measureOfText - the goal function. To score text, put the text in its P0 and execute it. It returns a higher number for better text.

liquidBytesDownloadInternet - its amount() is the maximum number of bytes that can be downloaded from the internet in this search. amount() decreases as bytes are downloaded. If not all bytes are used, the caller should take the remaining Liquid.

targetReturnQuantity - how big an array you want returned. The array may be smaller but not bigger.

maxSearchDurationMillis - how long, in milliseconds, the search may last before being forcefully ended.

maxCacheMillisOld - how many milliseconds old a cached page may be and still be returned.
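
The byte limit behaves like a budget that shrinks as data arrives. The sketch below assumes nothing about the real Liquid class beyond amount(); the class ByteBudgetSketch and its spend method are hypothetical.

```java
// Hypothetical stand-in for a Liquid used as a download byte budget.
final class ByteBudgetSketch {
    private long remaining;

    ByteBudgetSketch(long initialBytes) {
        this.remaining = Math.max(0, initialBytes);
    }

    // Bytes that may still be downloaded.
    long amount() {
        return remaining;
    }

    // Grants up to `requested` bytes and lowers the budget by the grant.
    long spend(long requested) {
        long granted = Math.max(0, Math.min(requested, remaining));
        remaining -= granted;
        return granted;
    }

    public static void main(String[] args) {
        ByteBudgetSketch budget = new ByteBudgetSketch(100);
        budget.spend(60);
        // Whatever is left after the search belongs to the caller again.
        System.out.println("left over: " + budget.amount());
    }
}
```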

search

public static CS[] search(java.lang.String[] startSearchingFromUrls,
                          CS measureOfText,
                          Liquid liquidBytesDownloadInternet)

search

public static CS[] search(java.lang.String[] startSearchingFromUrls,
                          CS measureOfText)
Takes no Liquid argument; tries to get Liquid from class FreeLiquid instead.


download

public static java.lang.String download(java.lang.String url,
                                        Liquid liquidBytesDownloadInternet,
                                        double maxCacheMillisOld)
Returns the download from cache if the cached copy is new enough. Will not download more than liquidBytesDownloadInternet.amount() bytes.
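
A minimal sketch of the cache-age check, assuming an in-memory map keyed by URL. The class PageCacheSketch and its methods are hypothetical; how codesimian actually stores its cache is not stated here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory page cache with a freshness check.
final class PageCacheSketch {

    private static final class Entry {
        final String text;
        final long downloadedAtMillis;

        Entry(String text, long downloadedAtMillis) {
            this.text = text;
            this.downloadedAtMillis = downloadedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    void put(String url, String text, long downloadedAtMillis) {
        entries.put(url, new Entry(text, downloadedAtMillis));
    }

    // Returns the cached text if its age is at most maxCacheMillisOld,
    // otherwise null (meaning: download again).
    String getIfFresh(String url, long nowMillis, double maxCacheMillisOld) {
        Entry e = entries.get(url);
        if (e == null) {
            return null;
        }
        long ageMillis = nowMillis - e.downloadedAtMillis;
        return ageMillis <= maxCacheMillisOld ? e.text : null;
    }

    public static void main(String[] args) {
        PageCacheSketch cache = new PageCacheSketch();
        long now = System.currentTimeMillis();
        cache.put("http://example.com/", "hello", now - 1000);
        System.out.println(cache.getIfFresh("http://example.com/", now, 5000.0));
    }
}
```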


minBytesPerSecondToKeepDownloading

public static int minBytesPerSecondToKeepDownloading()
Returns 30000 (bytes per second). Got a dial-up modem? Too bad.


maxMillisToWaitWhileNoBytesDownloadedBeforeEndingDownload

public static int maxMillisToWaitWhileNoBytesDownloadedBeforeEndingDownload()
Returns 20000 (milliseconds), which is 20 seconds.
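
One way the two limits (30000 bytes per second, 20000 milliseconds without bytes) could combine into a keep-going decision is sketched below. The class DownloadPacingSketch, its method, and the one-second grace period before the rate is judged are assumptions; the source does not say how the real download loop applies these values.

```java
// Hypothetical decision helper built from the two documented constants.
final class DownloadPacingSketch {
    static final int MIN_BYTES_PER_SECOND = 30000; // minBytesPerSecondToKeepDownloading()
    static final int MAX_STALL_MILLIS = 20000;     // maxMillisToWaitWhileNoBytesDownloadedBeforeEndingDownload()

    static boolean shouldKeepDownloading(long bytesSoFar,
                                         long elapsedMillis,
                                         long millisSinceLastByte) {
        // Give up if nothing has arrived for longer than the stall limit.
        if (millisSinceLastByte > MAX_STALL_MILLIS) {
            return false;
        }
        // Assumption: do not judge the rate during the first second.
        if (elapsedMillis < 1000) {
            return true;
        }
        double bytesPerSecond = bytesSoFar * 1000.0 / elapsedMillis;
        return bytesPerSecond >= MIN_BYTES_PER_SECOND;
    }

    public static void main(String[] args) {
        System.out.println(shouldKeepDownloading(60000, 2000, 0));
    }
}
```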


possibleRobotsDotTxtLocations

public static java.lang.String[] possibleRobotsDotTxtLocations(java.lang.String url)
Given any URL, guesses where a robots.txt file may be found. It is usually at the root level of the website.
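
A sketch of how such guesses could be produced with java.net.URI. The class RobotsLocationsSketch, the method candidates, and the choice to list the root first and then each parent directory are assumptions, not the documented behavior of possibleRobotsDotTxtLocations.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical guesser: site root first, then each directory on the path.
final class RobotsLocationsSketch {

    static String[] candidates(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException on bad syntax
        if (uri.getScheme() == null || uri.getRawAuthority() == null) {
            throw new IllegalArgumentException("Need scheme and host: " + url);
        }
        String base = uri.getScheme() + "://" + uri.getRawAuthority();
        String path = uri.getRawPath() == null ? "" : uri.getRawPath();

        List<String> out = new ArrayList<>();
        out.add(base + "/robots.txt"); // the usual location

        // Add one guess per directory, shallowest first.
        int slash = path.indexOf('/', 1);
        while (slash >= 0) {
            out.add(base + path.substring(0, slash) + "/robots.txt");
            slash = path.indexOf('/', slash + 1);
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String guess : candidates("http://example.com/a/b/page.html")) {
            System.out.println(guess);
        }
    }
}
```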


throwIfNotValidURL

public static void throwIfNotValidURL(java.lang.String url)
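
No description is given for this method. As an illustration only, a validity check of this shape could look like the sketch below. The class UrlValidationSketch, the restriction to http and https, and the use of IllegalArgumentException are all assumptions; the real method may accept other schemes or throw a different exception.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical validator: returns normally for a usable web URL, throws otherwise.
final class UrlValidationSketch {

    static void throwIfNotValidUrl(String url) {
        if (url == null) {
            throw new IllegalArgumentException("URL is null");
        }
        final URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URL: " + url, e);
        }
        String scheme = uri.getScheme();
        boolean isWeb = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
        // Assumption: only http/https URLs with a host are downloadable.
        if (!isWeb || uri.getHost() == null) {
            throw new IllegalArgumentException("Not a valid URL: " + url);
        }
    }

    public static void main(String[] args) {
        throwIfNotValidUrl("http://example.com/index.html");
        System.out.println("accepted");
    }
}
```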