Building a Search Engine for Code
April 06, 06 by kenrich
I recently ran into a problem with searching our source code. Since we moved our hosting to a third party, we no longer have shell access that would allow us to search all our source code for occurrences of a given piece of text. This makes it hard to track down common mistakes such as misspellings on the site.
To work around this, I have come up with a specialized search engine for the source code on the site. I am developing a crawler that will index every file on the website and store the index in our database. I will then build a search page that lets me search for keywords in the site's source code. This is all written in Classic ASP (Active Server Pages).
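To make the crawler-plus-index idea concrete, here is a minimal sketch in Python rather than the Classic ASP the real tool is written in. It is not the actual implementation: the `postings` table, the function names, and the use of an in-memory SQLite database in place of the site's database are all assumptions made for illustration.

```python
import os
import re
import sqlite3


def create_index(conn):
    """Create a hypothetical index table: one row per (word, file, line)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings "
        "(word TEXT, path TEXT, line_no INTEGER)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_word ON postings (word)")


def index_tree(conn, root):
    """Walk every file under root and record where each word occurs."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for line_no, line in enumerate(handle, start=1):
                    # Lowercase so searches are case-insensitive; a set
                    # avoids duplicate rows for a word repeated on a line.
                    words = set(re.findall(r"[A-Za-z0-9_]+", line.lower()))
                    for word in words:
                        conn.execute(
                            "INSERT INTO postings VALUES (?, ?, ?)",
                            (word, path, line_no),
                        )
    conn.commit()


def search(conn, keyword):
    """Return (path, line number) pairs for every occurrence of keyword."""
    rows = conn.execute(
        "SELECT path, line_no FROM postings WHERE word = ? "
        "ORDER BY path, line_no",
        (keyword.lower(),),
    )
    return rows.fetchall()
```

Calling `create_index(conn)` and `index_tree(conn, root)` once, then `search(conn, "recieve")`, would list every file and line containing that misspelling. Note that this sketch only matches whole words made of letters, digits, and underscores, so punctuation and operators are not searchable.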
Another solution would be to maintain a local backup of the entire site and search it with your favorite search tool. But although I do keep a local backup, it’s in only one location, and I certainly don’t want to keep backups in two. The bandwidth usage in that scenario could easily get out of control.
With the search engine, I can build a search index of all of the content on the site and search it quickly and efficiently. The search engine keeps track of when each script was last modified, so when a script changes, the engine knows it needs to rebuild the index. Since source code also involves a lot of symbols, I will need to look into integrating symbol search as well, but as it stands it should be a great tool for locating problem code on the site.
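The modification tracking described above could look something like the following sketch, again in Python rather than Classic ASP. The function names and the plain dictionary standing in for a "last indexed" column in the database are hypothetical, and the sketch assumes that comparing file modification times is how changes are detected.

```python
import os


def files_needing_reindex(root, indexed_mtimes):
    """Return paths under root that are new or changed since last indexed.

    indexed_mtimes maps each path to the modification time that was
    recorded when the file was last indexed.
    """
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # A missing entry (new file) or a different timestamp
            # (edited file) both mean the index is out of date.
            if indexed_mtimes.get(path) != os.path.getmtime(path):
                stale.append(path)
    return sorted(stale)


def mark_indexed(paths, indexed_mtimes):
    """Record the current modification time of each freshly indexed file."""
    for path in paths:
        indexed_mtimes[path] = os.path.getmtime(path)
```

With this approach only the stale files need to be re-read on each crawl, which keeps the rebuild cheap. The sketch does not handle deleted files; their index entries would need to be removed separately.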