mckay.utilities.webservices
Class NetworkSearch

java.lang.Object
  extended by mckay.utilities.webservices.NetworkSearch
Direct Known Subclasses:
GoogleWebSearch, YahooWebSearch

public abstract class NetworkSearch
extends java.lang.Object

An abstract class for submitting various types of search queries to arbitrary on-line services using arbitrary types of web service formats. Fields may be set in order to set parameters for all subsequent searches performed by objects of this class.


Field Summary
static int GOOGLE_SOAP_CODE
          The code used to identify searches using the GoogleWebSearch class that extends this class.
protected  boolean include_similar_but_non_matching
          Whether results returned by search queries performed by this object may include hits that do not contain one or more of the specified search string(s) but do contain terms very similar to them (e.g. alternative spellings).
static java.lang.String[] included_countries
          The countries that may be specified in searches.
static java.lang.String[] included_file_types
          The file types that may be specified in searches.
static java.lang.String[] included_languages
          The languages that may be specified in searches.
protected  java.lang.String limit_to_country
          Name of a country that sites must be in in order to be included in search results.
protected  java.lang.String limit_to_file_type
          A file extension a document must have in order to be returned as a hit in search results.
protected  java.lang.String limit_to_language
          Name of a language that hits must be in in order to be included in search results.
protected  boolean literal_search
          Whether all search queries performed by this object should be literal searches (e.g. for the query "heavy metal", hits must have the two words adjacent if the search is literal).
protected  boolean or_based_search
          Whether search queries performed by this object need only contain one of the specified query words in order to result in a hit.
protected  java.lang.String region_to_search_from
          Name of a country where the search will be performed (i.e. where the search service is located).
protected  java.lang.String specific_site
          Network site that will be exclusively searched in all search queries performed by this object.
protected  java.lang.String[] strings_to_exclude
          Strings to exclude in all search queries performed by this object (i.e. filter strings).
protected  boolean suppress_adult_content
          Whether to suppress hits that are classified as containing adult content by the search service in question.
protected  boolean suppress_similar_hits
          Whether to suppress similar hits when reporting results.
static int YAHOO_REST_CODE
          The code used to identify searches using the YahooWebSearch class that extends this class.
 
Constructor Summary
NetworkSearch()
          Creates a new instance of NetworkSearch with fields set to defaults.
 
Method Summary
protected abstract  void formatErrorMessage(java.lang.Exception exception, java.lang.String query, int max_results)
          Takes in an exception and then throws a new Exception that identifies the problem that occurred in a way that is standardized across web services.
protected abstract  java.lang.String formatSearchString(java.lang.String[] search_strings)
          Returns a query formatted based on the settings of this superclass' fields and the formatting conventions of the particular search service used by the implementing subclass.
 java.lang.String getHTMLFormattedSearchResults(java.lang.String[][] search_results, int start_rank, java.lang.String total_hits, java.lang.String query_used, java.lang.String service_name)
          Takes the results produced by a NetworkSearch subclass search and formats them into an HTML page, which is returned.
abstract  long getNumberHits(java.lang.String[] search_strings, java.lang.String[] query_used)
          Returns the number of hits for a query containing the given search strings, where the search is a boolean AND of the strings in the entries of the search_strings parameter.
 long getNumberHits(java.lang.String[] search_strings, java.lang.String[] query_used, int allowed_attempts)
          Returns the number of hits for a query containing the given search strings, where the search is a boolean AND of the strings in the entries of the search_strings parameter.
 long getNumberHits(java.lang.String search_string, java.lang.String[] query_used)
          Returns the number of hits for a query containing the given search string.
abstract  java.lang.String getSeachServiceName()
          Returns the name of the web services used by the implementing class.
abstract  java.lang.String[][] getSearchResults(java.lang.String[] search_strings, int start_index, int max_results, java.lang.String[] number_hits, java.lang.String[] query_used)
          Returns the top results for a query containing the given search strings, where the search is a boolean AND of the strings in the entries of the search_strings parameter.
 java.lang.String[][] getSearchResults(java.lang.String search_string, int start_index, int max_results, java.lang.String[] number_hits, java.lang.String[] query_used)
          Returns the top results for a query containing the given search string.
abstract  java.lang.String getSearchServiceLimitations()
          Returns the specific limitations of this web service in the context of all of the search parameters available to the NetworkSearch class and its subclasses.
protected abstract  java.lang.Object prepareSearcher(java.lang.Object searcher)
          Returns an object used to perform searches and/or configures an existing search Object, based on the particular web services system in question.
 void setCountryResultsMustBeIn(java.lang.String country)
          Set a specific country that sites must be in in order to be included in search results.
 void setFileTypeResultsMustBelongTo(java.lang.String file_type)
          Sets the file extension a document must have in order to be returned as a hit in search results.
 void setIncludeSimilarButNonMatchingStrings(boolean include_similar_but_non_matching)
          Sets whether results returned by search queries performed by this object may include hits that do not contain one or more of the specified search string(s) but do contain terms very similar to them (e.g. alternative spellings).
 void setLanguageResultsMustBeIn(java.lang.String language)
          Set a specific language that hits must be in in order to be included in search results.
 void setLiteralSearch(boolean literal_search)
          Sets whether all search queries performed by this object should be literal searches (e.g. for the query "heavy metal", hits must have the two words adjacent if the search is literal).
 void setOrBasedSearch(boolean or_based_search)
          Sets whether search queries performed by this object need only contain one of the specified query words in order to result in a hit.
 void setRegionToSearchFrom(java.lang.String country)
          Set the name of a country where the search will be performed (i.e. where the search service is located).
 void setSearchStringsToExclude(java.lang.String[] strings_to_exclude)
          Sets strings to exclude in all search queries performed by this object.
 void setSpecificSiteToSearch(java.lang.String specific_site)
          Set a specific network site that should be exclusively searched in all search queries performed by this object.
 void setSuppressAdultContent(boolean suppress_adult_content)
          Sets whether to suppress hits that are classified as containing adult content by the search service in question.
 void setSuppressSimilarHits(boolean suppress_similar_hits)
          Sets whether to suppress similar hits when reporting results.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

literal_search

protected boolean literal_search
Whether all search queries performed by this object should be literal searches (e.g. for the query "heavy metal", hits must have the two words adjacent if the search is literal). This is also sometimes known as an exact search or a phrase search. The default set by the constructor is true.


or_based_search

protected boolean or_based_search
Whether search queries performed by this object need only contain one of the specified query words in order to result in a hit. If this is true, then only one of the query strings must be present. If this is false, then all of them must be present (although not necessarily in the specified order, unless the literal_search field is true). The default set by the constructor is false.


include_similar_but_non_matching

protected boolean include_similar_but_non_matching
Whether results returned by search queries performed by this object may include hits that do not contain one or more of the specified search string(s) but do contain terms very similar to them (e.g. alternative spellings). The default set by the constructor is false.


strings_to_exclude

protected java.lang.String[] strings_to_exclude
Strings to exclude in all search queries performed by this object (i.e. filter strings). Search hits may not contain these filter strings. These excluded strings are treated as literal (i.e. must appear in the same order). A value of null (the default set by the constructor) means that no strings are excluded. No entries may be null.


specific_site

protected java.lang.String specific_site
Network site that will be exclusively searched in all search queries performed by this object. A value of null means that the entire available network should be searched. This is the default set by the constructor.


limit_to_language

protected java.lang.String limit_to_language
Name of a language that hits must be in in order to be included in search results. The language must be one of the terms in the included_languages field. A value of "No Limitations" means that any language is permissible. A value of null is not permitted. The default value set by the constructor is "No Limitations".


limit_to_country

protected java.lang.String limit_to_country
Name of a country that sites must be in in order to be included in search results. The country must be one of the terms in the included_countries field. A value of "No Limitations" means that sites in any country are permissible. A value of null is not permitted. The default value set by the constructor is "No Limitations".


region_to_search_from

protected java.lang.String region_to_search_from
Name of a country where the search will be performed (i.e. where the search service is located). Results are not limited to this country. The country must be one of the terms in the included_countries field. An entry of "No Limitations" causes the default service location to be used. A value of null is not permitted. The default value set by the constructor is "No Limitations".


limit_to_file_type

protected java.lang.String limit_to_file_type
A file extension a document must have in order to be returned as a hit in search results. The file type must be one of the terms in the included_file_types field. An entry of "No Limitations" means that file type will not be used to filter results. A value of null is not permitted. The default value set by the constructor is "No Limitations".


suppress_similar_hits

protected boolean suppress_similar_hits
Whether to suppress similar hits when reporting results. Similar in this context means either:

The default value set by the constructor is false.


suppress_adult_content

protected boolean suppress_adult_content
Whether to suppress hits that are classified as containing adult content by the search service in question. The default value set by the constructor is true.


included_languages

public static final java.lang.String[] included_languages
The languages that may be specified in searches.


included_countries

public static final java.lang.String[] included_countries
The countries that may be specified in searches.


included_file_types

public static final java.lang.String[] included_file_types
The file types that may be specified in searches.


GOOGLE_SOAP_CODE

public static final int GOOGLE_SOAP_CODE
The code used to identify searches using the GoogleWebSearch class that extends this class.

See Also:
Constant Field Values

YAHOO_REST_CODE

public static final int YAHOO_REST_CODE
The code used to identify searches using the YahooWebSearch class that extends this class.

See Also:
Constant Field Values
Constructor Detail

NetworkSearch

public NetworkSearch()
Creates a new instance of NetworkSearch with fields set to defaults.

Method Detail

setLiteralSearch

public void setLiteralSearch(boolean literal_search)
Sets whether all search queries performed by this object should be literal searches (e.g. for the query "heavy metal", hits must have the two words adjacent if the search is literal). The default set by the constructor is true. Literal searches are also sometimes referred to as exact or phrase searches.

Parameters:
literal_search - Whether or not literal searches are to be performed.

setOrBasedSearch

public void setOrBasedSearch(boolean or_based_search)
Sets whether search queries performed by this object need only contain one of the specified query words in order to result in a hit. If this is true, then only one of the query strings must be present. If this is false, then all of them must be present (although not necessarily in the specified order, unless the literal_search field is true). The default set by the constructor is false.

Parameters:
or_based_search - Value to set field to.

setIncludeSimilarButNonMatchingStrings

public void setIncludeSimilarButNonMatchingStrings(boolean include_similar_but_non_matching)
Sets whether results returned by search queries performed by this object may include hits that do not contain one or more of the specified search string(s) but do contain terms very similar to them (e.g. alternative spellings). The default set by the constructor is false.

Parameters:
include_similar_but_non_matching - Value to set field to.

setSearchStringsToExclude

public void setSearchStringsToExclude(java.lang.String[] strings_to_exclude)
                               throws java.lang.Exception
Sets strings to exclude in all search queries performed by this object (i.e. filter strings). Search hits may not contain these filter strings. Note that these will each be treated literally regardless of whether the search strings are treated literally.

A value of null (the default set by the constructor) means that no strings are excluded. No entries may be null.

Parameters:
strings_to_exclude - Unformatted strings that will be used to filter searches.
Throws:
java.lang.Exception - Throws an informative exception if strings_to_exclude is invalid.

setSpecificSiteToSearch

public void setSpecificSiteToSearch(java.lang.String specific_site)
Set a specific network site that should be exclusively searched in all search queries performed by this object. A value of null means that the entire available network should be searched. This is the default set by the constructor.

Parameters:
specific_site - The network site to search.

setLanguageResultsMustBeIn

public void setLanguageResultsMustBeIn(java.lang.String language)
                                throws java.lang.Exception
Set a specific language that hits must be in in order to be included in search results. The language must be one of the terms in the included_languages field. A value of "No Limitations" means that any language is permissible. A value of null is not permitted. The default value set by the constructor is "No Limitations".

Parameters:
language - The language that results must be in, or "No Limitations".
Throws:
java.lang.Exception - Throws an informative exception if the language is invalid.
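
The validation rule described above (null rejected, value must appear in a fixed list, "No Limitations" as the neutral default) can be illustrated with a minimal sketch. The class LanguageLimitSketch, its ALLOWED array and its method names are hypothetical stand-ins; the actual contents of included_languages and the actual setter implementation are not shown in this documentation.

```java
import java.util.Arrays;

// Hypothetical stand-in for the kind of check a setter such as
// setLanguageResultsMustBeIn might perform. Not the library's code.
class LanguageLimitSketch {
    // Assumed sample values; the real included_languages array is
    // defined by NetworkSearch and is not reproduced here.
    static final String[] ALLOWED = {"No Limitations", "English", "French"};

    // Mirrors the documented default of "No Limitations".
    private String limitToLanguage = "No Limitations";

    void setLanguage(String language) throws Exception {
        if (language == null) {
            throw new Exception("Language may not be null.");
        }
        if (!Arrays.asList(ALLOWED).contains(language)) {
            throw new Exception("Unsupported language: " + language);
        }
        limitToLanguage = language;
    }

    String getLanguage() {
        return limitToLanguage;
    }
}
```

The same pattern would presumably apply to the country, region and file type setters, each checked against its own list.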

setCountryResultsMustBeIn

public void setCountryResultsMustBeIn(java.lang.String country)
                               throws java.lang.Exception
Set a specific country that sites must be in in order to be included in search results. The country must be one of the terms in the included_countries field. A value of "No Limitations" means that sites in any country are permissible. A value of null is not permitted. The default value set by the constructor is "No Limitations".

Parameters:
country - The country that results must be in, or "No Limitations".
Throws:
java.lang.Exception - Throws an informative exception if the country is invalid.

setRegionToSearchFrom

public void setRegionToSearchFrom(java.lang.String country)
                           throws java.lang.Exception
Set the name of a country where the search will be performed (i.e. where the search service is located). Results are not limited to this country. The country must be one of the terms in the included_countries field. An entry of "No Limitations" causes the default service location to be used. A value of null is not permitted. The default value set by the constructor is "No Limitations".

Parameters:
country - The country where the search will be performed, or "No Limitations".
Throws:
java.lang.Exception - Throws an informative exception if the country is invalid.

setFileTypeResultsMustBelongTo

public void setFileTypeResultsMustBelongTo(java.lang.String file_type)
                                    throws java.lang.Exception
Sets the file extension a document must have in order to be returned as a hit in search results. The file type must be one of the terms in the included_file_types field. An entry of "No Limitations" means that file type will not be used to filter results. A value of null is not permitted. The default value set by the constructor is "No Limitations".

Parameters:
file_type - The file type that results must have, or "No Limitations".
Throws:
java.lang.Exception - Throws an informative exception if the file_type is invalid.

setSuppressSimilarHits

public void setSuppressSimilarHits(boolean suppress_similar_hits)
Sets whether to suppress similar hits when reporting results. Similar in this context means either:

The default value set by the constructor is false.

Parameters:
suppress_similar_hits - Whether to suppress similar hits.

setSuppressAdultContent

public void setSuppressAdultContent(boolean suppress_adult_content)
Sets whether to suppress hits that are classified as containing adult content by the search service in question. The default value set by the constructor is true.

Parameters:
suppress_adult_content - Whether to suppress adult content.

getSearchResults

public java.lang.String[][] getSearchResults(java.lang.String search_string,
                                             int start_index,
                                             int max_results,
                                             java.lang.String[] number_hits,
                                             java.lang.String[] query_used)
                                      throws java.lang.Exception
Returns the top results for a query containing the given search string. The search is subject to the NetworkSearch superclass' field settings.

Parameters:
search_string - The string to base the query on. The query is subject to the conditions of the NetworkSearch superclass' field settings. The argument passed to this parameter should not contain any special formatting.
start_index - The index of the first hit to be returned. A value of 1 refers to the highest ranked hit (there is no index 0). If this index exceeds the available number of hits then no hits are returned.
max_results - The maximum number of results to return. This imposes an upper maximum on the size of the returned array.
number_hits - A dummy array of size 1 that is filled with the number of hits for the specified query by this method. Ignored if null.
query_used - A dummy array of size 1 that is filled with the actual search query constructed and used by this method. Useful for debugging. Ignored if null.
Returns:
A matrix containing the search results. The first dimension of the matrix corresponds to each hit returned by the search, in the same order as they are returned/ranked. The second dimension specifies different types of information about each corresponding hit, as follows:
  • Entry 0: The document title.
  • Entry 1: The document URL.
  • Entry 2: The document summary.
Throws:
java.lang.Exception - Throws an informative exception if a problem occurs.
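
The calling convention above (1-based start_index, size-1 arrays used as output parameters, and a results matrix with title, URL and summary columns) can be sketched as follows. OutputArraySketch and its in-memory HITS table are hypothetical and stand in for a real search service; the query formatting shown is an assumption.

```java
// Hypothetical illustration of the documented calling convention.
// It searches nothing; it only pages through a fixed table.
class OutputArraySketch {
    // Each row: {title, URL, summary}, matching Entries 0 to 2.
    static final String[][] HITS = {
        {"Title A", "http://example.com/a", "Summary A"},
        {"Title B", "http://example.com/b", "Summary B"},
        {"Title C", "http://example.com/c", "Summary C"}
    };

    static String[][] search(String searchString, int startIndex, int maxResults,
                             String[] numberHits, String[] queryUsed) {
        // Output parameters are filled only when the caller supplied them.
        if (numberHits != null) {
            numberHits[0] = String.valueOf(HITS.length);
        }
        if (queryUsed != null) {
            queryUsed[0] = "\"" + searchString + "\""; // assumed phrase syntax
        }
        int first = startIndex - 1; // index 1 is the highest ranked hit
        if (first < 0 || first >= HITS.length) {
            return new String[0][]; // start index beyond the available hits
        }
        int count = Math.max(0, Math.min(maxResults, HITS.length - first));
        String[][] page = new String[count][];
        for (int i = 0; i < count; i++) {
            page[i] = HITS[first + i];
        }
        return page;
    }
}
```

A caller would allocate `new String[1]` for number_hits and query_used, pass them in, and read element 0 of each after the call returns.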

getNumberHits

public long getNumberHits(java.lang.String search_string,
                          java.lang.String[] query_used)
                   throws java.lang.Exception
Returns the number of hits for a query containing the given search string. The search is subject to the NetworkSearch superclass' field settings.

Parameters:
search_string - The string to base the query on. The query is subject to the conditions of the NetworkSearch superclass' field settings. The argument passed to this parameter should not contain any special formatting.
query_used - A dummy array of size 1 that is filled with the actual search query constructed and used by this method. Useful for debugging. Ignored if null.
Returns:
The number of hits for the specified search string and corresponding field settings of the NetworkSearch superclass.
Throws:
java.lang.Exception - Throws an informative exception if a problem occurs.
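
One plausible relationship between this single-string overload and the abstract array-based overload is simple delegation. The sketch below assumes that design; the documentation does not confirm it, and HitCounterSketch and FixedHitCounter are hypothetical names.

```java
// Assumed design: the concrete single-string overload wraps its argument
// in an array and calls the abstract overload supplied by the subclass.
abstract class HitCounterSketch {
    abstract long getNumberHits(String[] searchStrings, String[] queryUsed)
            throws Exception;

    long getNumberHits(String searchString, String[] queryUsed) throws Exception {
        return getNumberHits(new String[] {searchString}, queryUsed);
    }
}

// Toy subclass: reports the query length instead of contacting a service.
class FixedHitCounter extends HitCounterSketch {
    @Override
    long getNumberHits(String[] searchStrings, String[] queryUsed) {
        String query = String.join(" ", searchStrings);
        if (queryUsed != null) {
            queryUsed[0] = query; // report the query actually used
        }
        return query.length();
    }
}
```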

getHTMLFormattedSearchResults

public java.lang.String getHTMLFormattedSearchResults(java.lang.String[][] search_results,
                                                      int start_rank,
                                                      java.lang.String total_hits,
                                                      java.lang.String query_used,
                                                      java.lang.String service_name)
Takes the results produced by a NetworkSearch subclass search and formats them into an HTML page, which is returned. This returned page shows the total number of hits, details (name, URL and description) on a selected number of these results, the query used and some important qualifications and limitations of the specific web service used.

Parameters:
search_results - A matrix containing the search results. The first dimension of the matrix corresponds to each hit returned by the search, in the same order as they are returned/ranked. The second dimension specifies different types of information about each corresponding hit, as follows:
  • Entry 0: The document title.
  • Entry 1: The document URL.
  • Entry 2: The document summary.
start_rank - The numerical ranking corresponding to the first hit.
total_hits - The approximate total number of hits found by the search service.
query_used - The query used by the search service. Does not include aspects of the query that were passed directly as parameters to the search service's web services.
service_name - The name of the web service used to perform the search.
Returns:
The HTML-formatted search results.
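
As a rough illustration of turning the results matrix into a page, the sketch below renders the total hits, the query and a ranked list. The markup, the escaping helper and the class name HtmlResultsSketch are assumptions; the layout actually produced by getHTMLFormattedSearchResults is not documented here, and the service limitations section is omitted.

```java
// Hypothetical formatter; the real method's markup may differ.
class HtmlResultsSketch {
    // Minimal escaping so that titles and summaries cannot break the markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String format(String[][] results, int startRank, String totalHits,
                         String queryUsed, String serviceName) {
        StringBuilder html = new StringBuilder();
        html.append("<html><body>\n");
        html.append("<h1>").append(escape(serviceName)).append("</h1>\n");
        html.append("<p>Query: ").append(escape(queryUsed)).append("</p>\n");
        html.append("<p>Total hits: ").append(escape(totalHits)).append("</p>\n");
        // The list numbering starts at the rank of the first hit shown.
        html.append("<ol start=\"").append(startRank).append("\">\n");
        for (String[] hit : results) {
            html.append("<li><a href=\"").append(escape(hit[1])).append("\">")
                .append(escape(hit[0])).append("</a><br>")
                .append(escape(hit[2])).append("</li>\n");
        }
        html.append("</ol>\n</body></html>");
        return html.toString();
    }
}
```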

getSeachServiceName

public abstract java.lang.String getSeachServiceName()
Returns the name of the web services used by the implementing class.

Returns:
The name of the web service used by the implementing class.

getSearchServiceLimitations

public abstract java.lang.String getSearchServiceLimitations()
Returns the specific limitations of this web service in the context of all of the search parameters available to the NetworkSearch class and its subclasses.

Returns:
Limitations of the specific web service. This is formatted as an unnumbered HTML list.

getSearchResults

public abstract java.lang.String[][] getSearchResults(java.lang.String[] search_strings,
                                                      int start_index,
                                                      int max_results,
                                                      java.lang.String[] number_hits,
                                                      java.lang.String[] query_used)
                                               throws java.lang.Exception
Returns the top results for a query containing the given search strings, where the search is a boolean AND of the strings in the entries of the search_strings parameter. The search is subject to the NetworkSearch superclass' field settings.

Parameters:
search_strings - The strings to base the query on. The query consists of a boolean AND or OR of all entries of this array in addition to the conditions of the NetworkSearch superclass' field settings. The arguments passed to this parameter should not contain any special formatting.
start_index - The index of the first hit to be returned. A value of 1 refers to the highest ranked hit (there is no index 0). If this index exceeds the available number of hits then no hits are returned.
max_results - The maximum number of results to return. This imposes an upper maximum on the size of the returned array.
number_hits - A dummy array of size 1 that is filled with the number of hits for the specified query by this method. Ignored if null.
query_used - A dummy array of size 1 that is filled with the actual search query constructed and used by this method. Useful for debugging. Ignored if null.
Returns:
A matrix containing the search results. The first dimension of the matrix corresponds to each hit returned by the search, in the same order as they are returned/ranked. The second dimension specifies different types of information about each corresponding hit, as follows:
  • Entry 0: The document title.
  • Entry 1: The document URL.
  • Entry 2: The document summary.
Throws:
java.lang.Exception - Throws an informative exception if a problem occurs.

getNumberHits

public abstract long getNumberHits(java.lang.String[] search_strings,
                                   java.lang.String[] query_used)
                            throws java.lang.Exception
Returns the number of hits for a query containing the given search strings, where the search is a boolean AND of the strings in the entries of the search_strings parameter. The search is subject to the NetworkSearch superclass' field settings.

Parameters:
search_strings - The strings to base the query on. The query consists of a boolean AND or OR of all entries of this array in addition to the conditions of the NetworkSearch superclass' field settings. The arguments passed to this parameter should not contain any special formatting.
query_used - A dummy array of size 1 that is filled with the actual search query constructed and used by this method. Useful for debugging. Ignored if null.
Returns:
The number of hits for the specified search strings and corresponding field settings of the NetworkSearch superclass.
Throws:
java.lang.Exception - Throws an informative exception if a problem occurs.

getNumberHits

public long getNumberHits(java.lang.String[] search_strings,
                          java.lang.String[] query_used,
                          int allowed_attempts)
                   throws java.lang.Exception
Returns the number of hits for a query containing the given search strings, where the search is a boolean AND of the strings in the entries of the search_strings parameter. The search is subject to the NetworkSearch superclass' field settings. This query is submitted up to allowed_attempts times if it is unsuccessful (i.e. if the web service request throws an Exception). If the number of unsuccessful attempts reaches allowed_attempts then the user is presented with a dialog box providing the opportunity to either continue trying or end processing. Processing is paused until the user makes a selection. If the user chooses to cancel, then the original Exception generated by the request is thrown. If the user chooses to continue then another allowed_attempts attempts are made.

Parameters:
search_strings - The strings to base the query on. The query consists of a boolean AND or OR of all entries of this array in addition to the conditions of the NetworkSearch superclass' field settings. The arguments passed to this parameter should not contain any special formatting.
query_used - A dummy array of size 1 that is filled with the actual search query constructed and used by this method. Useful for debugging. Ignored if null.
allowed_attempts - The number of unsuccessful attempts allowed before the user is notified.
Returns:
The number of hits for the specified search strings and corresponding field settings of the NetworkSearch superclass.
Throws:
java.lang.Exception - Throws an informative exception if a problem occurs.
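
The retry behaviour described above can be sketched as a loop of batches. In this sketch the dialog box is replaced by a hypothetical ContinuePrompt callback so the example runs without a user interface, and the exception rethrown on cancel is the most recent failure, which is an assumption about what "original Exception" means. RetrySketch and withRetries are illustrative names only.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper mirroring the documented behaviour.
class RetrySketch {
    // Stand-in for the dialog box: return true to try another batch.
    interface ContinuePrompt {
        boolean keepTrying(Exception lastFailure);
    }

    static long withRetries(Callable<Long> request, int allowedAttempts,
                            ContinuePrompt prompt) throws Exception {
        if (allowedAttempts < 1) {
            throw new IllegalArgumentException("allowedAttempts must be at least 1");
        }
        while (true) {
            Exception lastFailure = null;
            // One batch of up to allowedAttempts tries.
            for (int attempt = 0; attempt < allowedAttempts; attempt++) {
                try {
                    return request.call();
                } catch (Exception e) {
                    lastFailure = e;
                }
            }
            // Every try in the batch failed: ask whether to continue.
            if (!prompt.keepTrying(lastFailure)) {
                throw lastFailure; // assumed: rethrow the request's own exception
            }
        }
    }
}
```

Passing the prompt as a parameter keeps the retry logic testable; a graphical caller could supply an implementation that shows a confirmation dialog and blocks until the user answers.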

formatSearchString

protected abstract java.lang.String formatSearchString(java.lang.String[] search_strings)
                                                throws java.lang.Exception
Returns a query formatted based on the settings of this superclass' fields and the formatting conventions of the particular search service used by the implementing subclass.

This method should be called internally by each search method implemented by the subclass before searches are actually performed.

When possible, configurations are implemented by the prepareSearcher method instead in order to restrict query length, which is limited by some web services.

Parameters:
search_strings - Search phrases to include in the search. The search will combine these terms using a boolean AND or OR. The arguments passed to this parameter should not contain any special formatting.
Returns:
A formatted copy of the query. Note that some configurations are performed by the prepareSearcher method instead, and will not be reflected here.
Throws:
java.lang.Exception - Throws an informative exception if search_strings is null or one of its entries is null.
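
To make the role of this method concrete, here is a minimal sketch of how a subclass might combine the literal_search, or_based_search and strings_to_exclude settings into a query. The syntax used (double quotes for phrases, the keyword OR, a leading minus for exclusions) is an assumption borrowed from common web search conventions; each real service defines its own, and QueryFormatSketch is a hypothetical name.

```java
// Hypothetical query builder; real subclasses follow their service's syntax.
class QueryFormatSketch {
    static String format(String[] searchStrings, boolean literal, boolean orBased,
                         String[] exclude) throws Exception {
        if (searchStrings == null) {
            throw new Exception("search_strings may not be null.");
        }
        StringBuilder query = new StringBuilder();
        for (int i = 0; i < searchStrings.length; i++) {
            if (searchStrings[i] == null) {
                throw new Exception("Entry " + i + " of search_strings is null.");
            }
            if (i > 0) {
                // Assumed: a space means AND, the keyword OR means OR.
                query.append(orBased ? " OR " : " ");
            }
            // Literal (phrase) searches are wrapped in double quotes.
            query.append(literal ? "\"" + searchStrings[i] + "\"" : searchStrings[i]);
        }
        if (exclude != null) {
            // Excluded strings are always treated literally.
            for (String filter : exclude) {
                query.append(" -\"").append(filter).append("\"");
            }
        }
        return query.toString();
    }
}
```

Settings that a service accepts as separate request parameters (language or country, for example) would be left out of this string and handled by prepareSearcher instead.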

prepareSearcher

protected abstract java.lang.Object prepareSearcher(java.lang.Object searcher)
                                             throws java.lang.Exception
Returns an object used to perform searches and/or configures an existing search Object, based on the particular web services system in question. The configuration is performed based on the search settings stored in the fields of the superclass NetworkSearch object. The modified searcher object is returned.

This method should be called internally by each search method implemented by the subclass before searches are actually performed.

Some search settings will sometimes be incorporated directly into the query string by the formatSearchString method instead of here, based on the nature of the web services system in question. However, when possible, configurations are implemented here in order to restrict query length, which is limited by some web services.

Parameters:
searcher - The Object to perform the search with, as defined by the particular web services system in question. May be null if it is created by this method.
Returns:
The configured searcher object.
Throws:
java.lang.Exception - Throws an informative exception if a problem occurs.

formatErrorMessage

protected abstract void formatErrorMessage(java.lang.Exception exception,
                                           java.lang.String query,
                                           int max_results)
                                    throws java.lang.Exception
Takes in an exception and then throws a new Exception that identifies the problem that occurred in a way that is standardized across web services.

This method can alternatively be called with null passed to the exception parameter in order to check the acceptability of the configurations stored in the fields of the NetworkSearch object that will be used to perform searches.

Parameters:
exception - An exception to be formatted. May be null if this method is being used to check field and max_results compatibility.
query - The query that generated the exception. May be "" if not applicable.
max_results - The maximum number of hits that searches are configured to return.
Throws:
java.lang.Exception - An exception indicating the problem that occurred in a descriptive way that is standardized across web services that implement this method. The message stored in this new exception must identify the search service that generated the Exception and it must be suitable for display in an error dialog box.
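
The two uses of this method (translating a service exception, and checking the configuration when exception is null) might look like the sketch below in a subclass. The service name, the MAX_RESULTS_LIMIT value and the message wording are all assumptions made for illustration; ErrorFormatSketch is a hypothetical name.

```java
// Hypothetical implementation of the documented contract.
class ErrorFormatSketch {
    static final String SERVICE_NAME = "Example Search Service"; // assumed name
    static final int MAX_RESULTS_LIMIT = 10; // assumed per-request limit

    static void formatErrorMessage(Exception exception, String query,
                                   int maxResults) throws Exception {
        // Configuration check: performed whether or not an exception was passed.
        if (maxResults > MAX_RESULTS_LIMIT) {
            throw new Exception(SERVICE_NAME + ": at most " + MAX_RESULTS_LIMIT
                    + " results may be requested, but " + maxResults + " were.");
        }
        if (exception == null) {
            return; // called only to validate settings, and they are acceptable
        }
        String detail = exception.getMessage() != null
                ? exception.getMessage()
                : exception.getClass().getSimpleName();
        String about = (query == null || query.isEmpty())
                ? "" : " for query [" + query + "]";
        // The new message names the service and keeps the cause attached.
        throw new Exception(SERVICE_NAME + " error" + about + ": " + detail, exception);
    }
}
```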