Scalable Architectural Design
Inout Spider is scalable spider software. You can start with a single server and scale out to multiple servers as your database grows. Scalability comes from an efficient distributed file system (Hadoop) and a distributed database (Hypertable), so you can start small and expand to as many servers as you need.
Hypertable & Hadoop at the core
Hadoop provides a high-performance distributed file system and is maintained as an Apache project. Hypertable is a scalable, non-relational database that runs on top of Hadoop and can handle terabytes of data without degrading performance. Hypertable is modeled closely on Google's Bigtable, the database behind many Google services. These two open source technologies give us great flexibility and reach.
XML Search Output
Inout Spider returns search results as XML instead of HTML, which gives you great flexibility over how the results are used. You can parse the XML results with your own PHP/ASP.NET/JSP program. Inout Search Engine also works seamlessly with Inout Spider and exposes all of its features once you configure the Spider install path in the Inout Search Engine admin area. If you want to write your own program instead of using Inout Search Engine, we provide free documentation that describes the Spider XML output.
Inout Spider web search finds the best-matched results available in your database for the given keywords. Inout Spider currently handles web, image and news searches.
Inout Spider is one of the few spider packages available on the internet that provide image search. While crawling the HTML pages of the added domains, the crawler also collects their images and tags each image with keywords and other related parameters. The spider saves a thumb shot of each image in the database. The dimensions of the thumb shots can be configured by the administrator.
News search works like web search, but the results are taken only from the domains that have been added as news domains. The data for these news domains is updated frequently, so news search displays the most recent content.
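As a rough illustration of consuming the XML output, here is a minimal Python sketch. The element names (`results`, `result`, `title`, `url`, `description`) and the sample document are assumptions made for this example only; the actual structure is defined in the Spider XML documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample response. The real element and attribute names
# come from the Spider XML documentation, not from this sketch.
SAMPLE_XML = """<results query="soccer" type="web">
  <result>
    <title>Example Soccer News</title>
    <url>http://example.com/soccer</url>
    <description>Latest scores and fixtures.</description>
  </result>
  <result>
    <title>Example League Table</title>
    <url>http://example.com/table</url>
    <description>Current standings.</description>
  </result>
</results>"""


def parse_results(xml_text):
    """Turn an XML result document into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "url": item.findtext("url", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in root.iter("result")
    ]


if __name__ == "__main__":
    for row in parse_results(SAMPLE_XML):
        print(row["title"], "->", row["url"])
```

The same approach carries over to PHP, ASP.NET or JSP: load the XML, iterate over the result elements, and render them in your own template.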
Unlimited Custom Result Channel
Apart from web, image and news results, you can create any number of search channels, such as Script Search, Soccer Search or Wikipedia Search, with the help of domain sets and categories.
Inout Spider allows you to define domain sets in your spider admin area. Each page the Spider bots crawl is checked against the domain sets, and if a match is found, the page is tagged with the corresponding domain set. You can later filter or retrieve your web/image/news results by domain set. This lets you build a search channel dedicated to a specific service or group of websites.
Similar to domain sets, Inout Spider allows you to define categories from the admin area. You define the keywords related to a category, and when the Spider bot finds a page that matches one of those keywords, it tags the page with the corresponding category. You can later filter your web/image/news results by category to create your desired channel.
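The domain set check can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the `DOMAIN_SETS` table, the `domain_sets_for` helper and the rule of ignoring a leading `www.` are hypothetical and do not describe how Inout Spider stores or matches domain sets internally.

```python
from urllib.parse import urlparse

# Hypothetical domain sets, as an administrator might define them.
DOMAIN_SETS = {
    "wikipedia": {"en.wikipedia.org", "de.wikipedia.org"},
    "soccer": {"fifa.com", "uefa.com"},
}


def domain_sets_for(url):
    """Return the names of all domain sets whose domains include the URL's host."""
    host = (urlparse(url).hostname or "").lower()
    # Assumption for this sketch: "www.fifa.com" and "fifa.com" are the same domain.
    if host.startswith("www."):
        host = host[4:]
    return sorted(name for name, domains in DOMAIN_SETS.items() if host in domains)


if __name__ == "__main__":
    print(domain_sets_for("https://www.fifa.com/news"))
    print(domain_sets_for("http://example.com/"))
```

Filtering results by domain set then amounts to keeping only the pages that carry the requested tag.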
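Keyword-based category tagging can be sketched in a similar way. The category names, keyword lists and whole-word matching rule below are assumptions chosen for illustration; the matching Inout Spider actually performs may differ.

```python
import re

# Hypothetical categories and their keywords, as defined in an admin area.
CATEGORY_KEYWORDS = {
    "soccer": {"goal", "league", "striker"},
    "scripts": {"php", "script", "plugin"},
}


def categories_for(page_text):
    """Return the categories that have at least one keyword in the page text."""
    # Assumption: keywords match whole words, case-insensitively.
    words = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    return sorted(
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if words & keywords
    )


if __name__ == "__main__":
    print(categories_for("The striker scored a late goal"))
    print(categories_for("A PHP script for league tables"))
```

A page can belong to several categories at once, which is why the helper returns a list rather than a single name.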
Intelligent Result Identifier System
For each user query, the Result Identifier System finds the best-matched results from millions of pages. It is fast enough to return the required results within milliseconds, and it does so without compromising the quality or accuracy of the output.
Inout Spider allows you to configure the family filter setting from your admin area to screen results. The family filter is easy to manage and lets you retrieve results according to the filter condition you choose.
Inout Spider determines a page rank for each page. Crawling also takes into account factors such as incoming/outgoing links, page depth and domain priority.
API keys protect your spider data from unauthorized access. They also let you authorize or sell search data access to external parties. You can view statistics per API key, which helps you track the search history and performance of each key.
For each search query, the Query Analyser module of Inout Spider (see the tutorial) checks the database for a cached result. If it finds one, it returns the results to the requester immediately. If no recent pre-calculated result is available, the result generator identifies and generates the results from the crawled pages, stores them in the cache database, and sends them back to the requester. Result caching gives you instant search results for popular keywords.
Although the result cache speeds up searches, results cannot be served from the cache indefinitely, because they would eventually become outdated. For this reason a separate cache period can be set for each search type; the defaults are 1 day for news, 7 days for web and 45 days for images. Once the period expires, the results are regenerated, so clients always receive reasonably fresh results.
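The cache lookup and expiry described above can be sketched as follows. This is a minimal in-memory illustration, not Inout Spider's implementation: the `ResultCache` class, the `generate` callback and the normalization of queries to lowercase are assumptions, while the default periods (1, 7 and 45 days) are the ones stated in the text.

```python
import time

DAY = 24 * 60 * 60  # seconds

# Default cache periods from the text: news 1 day, web 7 days, images 45 days.
CACHE_PERIODS = {"news": 1 * DAY, "web": 7 * DAY, "image": 45 * DAY}


class ResultCache:
    """Serve cached results until the period for the search type expires."""

    def __init__(self, generate, clock=time.time):
        self._generate = generate  # hypothetical result generator callback
        self._clock = clock        # injectable clock, which makes testing easy
        self._entries = {}         # (search_type, query) -> (stored_at, results)

    def search(self, search_type, query):
        # Assumption: queries are compared case-insensitively, ignoring outer spaces.
        key = (search_type, query.strip().lower())
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < CACHE_PERIODS[search_type]:
            return entry[1]  # fresh cache hit
        results = self._generate(search_type, query)  # miss or expired entry
        self._entries[key] = (now, results)
        return results


if __name__ == "__main__":
    cache = ResultCache(lambda kind, q: [f"{kind} result for {q}"])
    print(cache.search("web", "hadoop"))  # generated
    print(cache.search("web", "Hadoop"))  # served from the cache
```

Expired entries are simply overwritten on the next request here; a production cache would also purge old entries in the background to reclaim space.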
Global/Individual Domain Depth Control
Inout Spider lets you control the crawling process by specifying a page depth limit for domains. You can set the limit globally for all domains or individually for selected domains. This ensures that spider resources are used the way you intend.
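A global limit with per-domain overrides can be sketched in a few lines of Python. The limit values, the `DOMAIN_DEPTH_LIMITS` table and both helper functions are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical settings: one global limit plus overrides for selected domains.
GLOBAL_DEPTH_LIMIT = 5
DOMAIN_DEPTH_LIMITS = {"example.com": 2, "news.example.org": 10}


def depth_limit_for(domain):
    """Use the domain's own limit if one is set, otherwise the global limit."""
    return DOMAIN_DEPTH_LIMITS.get(domain, GLOBAL_DEPTH_LIMIT)


def should_crawl(domain, depth):
    """Return True if a page at the given depth is within the domain's limit."""
    return depth <= depth_limit_for(domain)


if __name__ == "__main__":
    print(should_crawl("example.com", 3))  # over the per-domain limit of 2
    print(should_crawl("other.net", 3))    # within the global limit of 5
```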
Seamless Integration with Inout Search Engine
Inout Spider is designed to be compatible with all third-party search engines; however, Inout Search Engine is designed to integrate and work with Inout Spider seamlessly. This pairing is recommended if script customization is part of your requirements.
Inout Spider allows you to run your script in different crawling modes. You can specify whether the crawler may find and manage new domains on its own or whether the administrator must approve domains before they are crawled.
Global/Individual Control for Automated Crawling
With Inout Spider you can configure controlled or uncontrolled domain identification, and fetching for all domains or only for selected domains. This gives you finer control over the spider and helps you achieve the result quality you want.
Inout Spider identifies the meta tags and related keywords of each page while crawling. For each user query, these top keywords can be retrieved along with each result, which helps you find keyword tags and related keywords.
Inout Spider keeps track of every search query it receives, along with all parameters related to the query. It also tracks the API key associated with each query. This gives you full statistical details about the search queries.
This gives you an overview of the entire spider status, including the total domain count, total page count and crawled page count.
Multi-language and multi-domain setup
Allows users to submit their domain names for crawling.
New Domain identification during crawl process
Thumb shot size and width specification
View the list of identified keywords from different keyword clusters.
View the keyword logs along with the constraints
Limit page search by depth
Restrict domains post-crawling and avoid them in results
Database Search for domains.
Parse XML results using your own PHP/ASP.NET/JSP program.
Automatically remove the old cache results to maintain freshness.
Unlimited Domains & APIs
Index an unlimited number of domains and build a large, valuable database ready for business. Create as many APIs as needed for search engines.