Organic SEO Blog


Wednesday, October 20, 2004

Reprinted from BBC News, London

Here is a recent piece on the corporate search developments at Google. Similar programs are said to be coming out worldwide in the coming days.

Google rolls out corporate search

Google has begun the global roll-out of a technology which enables firms to "google" their own networks easily. The search firm, recently floated on the Nasdaq stock market, says companies will soon discover search as a "critical business function".
This could kickstart the corporate search market, now worth just $500m.
But the new "Google in a box" solution carries its own hazards, making all documents on an intranet visible to all, unless security steps are taken.
The Google look and feel
Google's technology comes in a shiny bright-yellow box, which slots into a company's server racks.
The search engine inside is similar to Google's own web search technology, tweaked for the enterprise market, and runs on an Intel computer running a version of the open-source Linux operating system.
However, the product is not new: customers in the United States have been able to buy an in-house Google for the past two years.
Google says that after some fine-tuning the technology has "matured". Even large companies should be able to index and search their sprawling intranets within a few hours of switching on the Google search appliance, which can recognise about 250 different file formats.
Users of these intranets - websites that are visible only within the company - will get their search results in a format that is very similar to Google's traditional internet search, including small excerpts from the documents found and links to cached versions of the documents listed.
The appliance can also be deployed as a search engine for a company's public website.
And those who want to avoid the Google 'look and feel' can customise the software.
The hunt for information
The launch comes days after Google launched a free software tool that allows users to index all the information on the hard drive of their personal computer - from e-mail to all kinds of documents.
But the desktop search does not reach corporate networks.
"A lot of the world's information is not public but behind firewalls in private networks," said Dave Girouard, the general manager of Google's enterprise business solutions at the European launch of the search appliance in London.
He happily quotes surveys that suggest that in some knowledge-driven companies workers may spend a quarter of their time looking for information, while poor search facilities on corporate websites are a sure turn-off for potential customers.
And without naming names he takes swipes at the "poor quality of search results" and the "long expensive installation" of rival products.
Investment bank Morgan Stanley, for example, saw the number of internal searches grow eleven-fold once Google's search technology was added to the company's network.
And even though the market for enterprise search is small right now, Mr Girouard expects it to take off once companies realise how search can make their workers work faster.
The Google appliances are aimed at medium-sized and large corporations. The most basic version, which can index about 150,000 documents, costs £19,000 ($34,000) for a two-year licence.
Larger versions, dubbed "Mama Bear and Papa Bear" solutions by Mr Girouard, come in clusters of five or eight search boxes and can handle up to 15 million documents and 1,000 queries a minute. In the US market Google's licence fees for such heavy-duty search applications start at about $660,000 (£370,000).
Total information awareness
But there are security risks too. Once a whole intranet is indexed and cached in a Google server, it presents a neat takeaway package of a company's secrets.
Server room security suddenly becomes even more important - although Google's David Bercovich hastens to point out that somebody stealing the yellow Google box would find it very difficult to extract any information from the machine.
The other problem is information awareness. Every document can easily be found.
Firms that do not have proper security systems in place that clearly define who is allowed to look at which documents might suddenly find their workers googling for terms like "salary". A split second later those workers could be looking at a spreadsheet detailing the salaries of the management team, previously hidden away in a little-known corner of the intranet.
Google says its search engine links in with various software applications from other vendors that regulate access to sensitive documents. Workers will only get search results for the documents that they are actually allowed to see.
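Google's actual access-control integration is not described here, but the idea of trimming search results against user permissions can be sketched in a few lines. Everything below (the `Document` class, the `search` function, the group names) is a hypothetical illustration, not Google's implementation:

```python
# Minimal sketch of permission-trimmed search: each indexed document
# carries the set of groups allowed to see it, and hits are filtered
# against the requesting user's groups before results are returned.
from dataclasses import dataclass

@dataclass
class Document:
    title: str
    allowed_groups: set

def search(index, query, user_groups):
    """Return only matching documents the user is entitled to see."""
    hits = [d for d in index if query.lower() in d.title.lower()]
    return [d for d in hits if d.allowed_groups & user_groups]

index = [
    Document("Q3 salary spreadsheet", {"hr", "management"}),
    Document("Holiday schedule", {"staff", "hr", "management"}),
]

# An ordinary staff member never sees the salary document in results.
print([d.title for d in search(index, "salary", {"staff"})])  # → []
```

The key design point is that the filter runs at query time, so a document that exists in the index is simply invisible to users outside its access list.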
Firms or institutions that have failed to put such basic security in place might find themselves in trouble.
Another drawback: the system can only return documents reachable via an http or https address on the network. This takes many customer relationship management systems out of the equation.
But Google's Dave Girouard promises that his developers are "working hard" to make it possible to search all the information available in a company.

Donald Bice
Senior Project Engineer

Thursday, October 14, 2004

Hello All,

I thought a great introductory post would be to discuss some of the leading site-design obstacles that prevent web sites from being found.

Move your web site out of the invisible 'dark web'!

The results on the major search engines are produced by machines: data-collection systems programmed to respond to millions of keyword queries every day. The server farms behind those organized lists of hypertext links must not only visit sites and collect information, they must then store and 'bank' the URLs they collect. As we review some common roadblocks that spiders must overcome on their content-acquisition journey, keep in mind that these hypertext spiders are programmed to index billions of web pages (URLs) written in all types of code and program variations.

Site Design Problems: Many common design elements used by Web developers can prevent the search engine spiders from indexing the pages of a site, leading to pages missing from the search engine results and to poor rankings. Here are some of the design elements that keep web sites invisible, floating aimlessly in the Dark Web rather than being found consistently on the keyword searches that matter.

Technical Barriers That Prevent Indexing: Many types of technical barriers still prevent search engine spiders from fully absorbing the content of billions of web sites. Marketing managers, IT directors, webmasters, and Internet publishers must keep a few conditions in mind whenever publishing on the Internet, so that the spiders can properly record their pages and 'open up' their domains to the algorithmic crawlers, and thus to the public.

One of the first hurdles is the robots exclusion file. Some spiders struggle out of the gate, landing on a website for the first time and failing to record it because of a poorly composed robots.txt file. The major search engine spiders will ignore even the most popular sites if the contents of that file are incorrect.
Our technicians at Peak Positions offer proprietary solutions that help streamline spider indexing through the construction and placement of a comprehensive robots exclusion file, which acts as the table of contents the major search engine spiders are seeking. Feel free to contact our technical team if you have questions, concerns, or are uncertain of the required protocols for robots exclusion files.
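To see how a crawler interprets a robots exclusion file, Python's standard-library `urllib.robotparser` can be used as a quick sanity check. The rules and URLs below are illustrative examples (example.com is a placeholder), but the parsing behavior is the real standard-library one:

```python
# Check how a crawler would interpret a robots.txt file. A single
# misplaced "Disallow: /" line is enough to block an entire site.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Public page: allowed. Anything under /private/: blocked for all bots.
print(parser.can_fetch("Googlebot", "http://example.com/products.html"))          # True
print(parser.can_fetch("Googlebot", "http://example.com/private/salaries.html"))  # False
```

Running a check like this against your own robots.txt before deployment catches the "spider ignored the whole site" failure mode described above.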

W3C HTML Code Compliance and Validation: Invalid HTML code is one of the leading causes of search engine positioning problems. Valid, compliant code allows search engine spiders to move comfortably through a site's URLs and prevents 'spider traps' and denial of service.
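The kind of structural error the W3C validator reports can be illustrated with a toy checker. This is only a sketch using Python's standard-library `html.parser` (real validation should use the W3C validator itself); it catches just one class of problem, mismatched open/close tags:

```python
# Toy nesting checker: track open tags on a stack and flag close tags
# that don't match, the sort of invalid HTML that can trap a spider.
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # never closed

class TagBalanceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack, self.errors = [], []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")

    def unclosed(self):
        return list(self.stack)

checker = TagBalanceChecker()
# The <div> is closed after </p>, so the nesting is invalid.
checker.feed("<html><body><p>Hello<div>world</p></div></body></html>")
print(checker.errors)
```

A clean, properly nested document produces an empty error list and an empty stack.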

Broken Links: Broken links, server downtime, and server maintenance also prevent sites from being found on the search engines, particularly when a lead spider attempting to crawl the site is interrupted or keeps landing on dead links.
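A basic internal-link audit can be sketched in a few lines. This offline example (the page markup and URL set are made up) collects every `href` on a page and flags internal links that point at pages the site doesn't have; a production crawler would instead issue HTTP requests and treat 404/5xx responses and timeouts as broken:

```python
# Collect hrefs from a page and flag internal links that point at
# pages missing from the site's known URL set.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

def broken_internal_links(html, known_pages):
    collector = LinkCollector()
    collector.feed(html)
    return [link for link in collector.links
            if not link.startswith("http") and link not in known_pages]

page = '<a href="/about.html">About</a> <a href="/old-page.html">Old</a>'
print(broken_internal_links(page, {"/about.html", "/index.html"}))
# → ['/old-page.html']
```

Running a check like this across the whole site before a spider does keeps the crawl from dead-ending.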

Content Management Systems: Content management systems that refresh or deliver updated content on a regular basis often confuse the search engines, hurting URL trust factors and preventing websites from reaching the qualified, in-market users searching for their services on Google, Yahoo, MSN, and AOL. Constantly changing page content is a 'red flag' in itself, and the code such systems generate is frequently invalid, leaving 'spider traps' all over the pages of the site. If your company uses a content management system and your site is not being found on the search engines, contact our skilled technical team; we can design an affordable optimization solution that delivers your content in a clean and valid format to users worldwide. Again, search engine spiders are seeking relevant content; let us help the world find yours by shining light on your web site and pulling it out of the Dark Web.

Frames Website Design: Frames-based design is often a major web site optimization problem. While search engine spiders can crawl pages from a frames-based design, they cannot accurately parse and index the page content, so frames sites usually achieve few, if any, consistent keyword rankings in the major search engines.

JavaScript & Cascading Style Sheets (CSS): Incorrect use of JavaScript and CSS usually results in volumes of redundant code and nesting and table problems that weigh down the spiders, slowing their crawl and forcing them to perform Olympic feats just to get through the invalid, non-compliant HTML. JavaScript errors often inflate pages well beyond a reasonable size, and the opportunity for code errors multiplies with the size of the site as the spiders crawl its inside pages. Many search engine trade associations and SEO consultants recommend keeping individual web pages under 100k, although larger pages can still achieve top-five rankings if specific code definitions are established and the spiders receive clear instructions.

We highly recommend code validation and W3C-compliant coding guidelines, which give the search engine spiders a single reference point and eliminate redundant attribute definitions within the page code. Valid code that complies with the established World Wide Web Consortium standards allows the spiders to quickly and easily locate relevant page content. That is why fully optimized, W3C-compliant web sites that let the spiders find relevant content quickly consistently enjoy page-one, top-five keyword rankings over the long term.
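The page-weight point about redundant attribute definitions is easy to quantify. This illustrative snippet (the style rule and row counts are arbitrary examples) compares repeating the same inline style on 200 table cells against one stylesheet rule plus a class attribute:

```python
# Compare page weight: inline style repeated per cell vs. one CSS
# rule referenced by a short class attribute.
STYLE = 'style="font-family:Verdana;font-size:11px;color:#333333"'

inline_rows = "".join(f"<td {STYLE}>item {i}</td>" for i in range(200))
classed_rows = "".join(f'<td class="c">item {i}</td>' for i in range(200))
stylesheet = "<style>td.c{font-family:Verdana;font-size:11px;color:#333}</style>"

print(len(inline_rows))                     # bytes with inline styles everywhere
print(len(stylesheet) + len(classed_rows))  # bytes with one rule + class refs
```

On this example the classed version is less than half the size of the inline version, and the gap widens with every additional row, which is exactly the redundancy that pushes pages past the 100k guideline.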

Dynamic Pages: Dynamically generated, database-driven web sites often face unique obstacles with the search engine spiders. Do your URLs contain query strings (e.g. URLs ending like this: ?a=1&b=2&c=3)? Peak Positions is considered one of only a handful of natural search engine optimization companies able to provide comprehensive search engine marketing solutions for large, database-driven dynamic websites. Contact our dynamic website SEO consulting specialists today.
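One common remedy for spider-unfriendly query strings is rewriting them into static-looking paths. The sketch below shows only the URL mapping (the `/products?cat=...` example is hypothetical); on a real site the rewrite is normally done by the web server, for instance with Apache mod_rewrite, rather than in application code:

```python
# Map a dynamic query-string URL to a clean, static-looking path.
from urllib.parse import urlparse, parse_qs

def clean_url(dynamic_url):
    """e.g. /products?cat=shoes&id=42  ->  /products/shoes/42"""
    parts = urlparse(dynamic_url)
    params = parse_qs(parts.query)           # preserves parameter order
    segments = [values[0] for values in params.values()]
    return parts.path.rstrip("/") + "/" + "/".join(segments)

print(clean_url("http://example.com/products?cat=shoes&id=42"))
# → /products/shoes/42
```

Spiders then see one stable URL per page instead of an open-ended family of parameter combinations.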

Non-Compliant Site Submissions: If your company has ever used non-compliant or automated software submission programs, the URLs and websites involved might be ignored by the major search engines.

Spam: In addition to making pages easy for spiders to record, it is important to avoid techniques, employed by overzealous search marketers, that the search engines consider spam. Cloaking is one such technique: the server is programmed to recognize the search engine spiders' IP addresses and feed them highly optimized pages built solely to enhance rank, while human visitors who click the result are shown a different page that would not ordinarily rank as well. This bait-and-switch is a deceptive practice, defined as spam by the search engines, and may result in a site being tagged, removed, or blacklisted from their databases. Don't risk your corporate website's ability to be found. Promotional search engine software programs likewise expose corporate websites to punishment by the search engine editors, because they create gibberish and interfere with the sophisticated hypertextual retrieval systems programmed to produce the most relevant results in milliseconds.
Many new, so-called search engine optimization companies have misled, and continue to mislead, clients into believing that no website will ever be blacklisted, pulled, tagged, or removed from a search engine's database. One of the nation's largest insurance companies was blacklisted in early 2004 for cloaking, as was a spyware company. We urge you to speak with several companies and always ask: does the optimization program being considered actually work to highlight and present the relevant content contained within the company website? If it focuses on anything other than relevant content, buyer beware.

As a test, the Peak Positions technicians recently optimized a small webpage about a Connect 4 computer game. Over time it will rank on the search engines when using a broad search term like "Connect 4".

Donald Bice
Senior Project Engineer
Peak Positions, LLC.