Chapter 3
Design Enterprise Search Engine

An enterprise search engine searches content from multiple sources (databases, files, and the intranet) within the organization. The components of a search engine are as follows:

• Processing: Data loaded from different sources arrives in different formats. This component converts the incoming documents to plain text and normalizes the text to improve precision and recall. Normalization includes stemming, which reduces words to their stem (for example, “texting” becomes “text”); lemmatization, which reduces a word to its base dictionary form (for example, “studies” becomes “study”); part-of-speech tagging; and so on. An analyzer is used to analyze the data and return meaningful terms or words.
• Indexing: The processed text is stored in an index, which is used for quick lookup and is maintained by the indexer. The dictionary contains an index of all unique words as well as information about their ranking.
• Query processing: A user executes a query from the web application. A query parser and analyzer break the query into terms and operators.
• Matching: The processed query is compared with the indexes stored in the dictionary.

Now, we look at an enterprise search engine in detail, based on Apache Lucene and its technology stacks: Elastic Stack and Solr Stack. Apache Lucene achieves fast search responses because it searches an index instead of scanning the whole text.

• Elastic Stack: The Elastic technology stack provides multiple components for building solutions on top of it: Beats, Logstash, Elasticsearch, and Kibana, as shown in Figure 3-29. They are as follows:
• Beats collects the data, parses it, and pushes it to Elasticsearch for log analysis. Log analysis can be achieved using Logstash and Kibana.
• Logstash can connect to a variety of sources, such as web APIs, social services, IoT sensors, databases, and data streams like Kafka or Redis. It collects the data and pipelines it to Elasticsearch.
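
The four search engine components described earlier (processing, indexing, query processing, and matching) can be illustrated with a toy inverted index. This is only a minimal sketch in plain Python, not how Lucene, Elasticsearch, or Solr are implemented. The names analyze, InvertedIndex, SUFFIXES, and STOP_WORDS are hypothetical, the suffix stripping is a crude stand-in for a real stemmer, and the query side supports only an implicit AND between terms, ranked by summed term frequency.

```python
import re
from collections import defaultdict

# Hypothetical toy lists; a real analyzer would use a proper stemmer or lemmatizer.
SUFFIXES = ("ing", "es", "s")
STOP_WORDS = {"a", "an", "the", "is", "to", "of", "and"}


def analyze(text):
    """Processing: lowercase, tokenize, drop stop words, strip a few suffixes."""
    terms = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token in STOP_WORDS:
            continue
        for suffix in SUFFIXES:
            # Keep at least three characters so short words are left alone.
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                token = token[: -len(suffix)]
                break
        terms.append(token)
    return terms


class InvertedIndex:
    """Indexing: maps each unique term to the documents that contain it."""

    def __init__(self):
        # term -> {doc_id: term frequency in that document}
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        for term in analyze(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query):
        """Query processing and matching: implicit AND over the analyzed terms."""
        terms = analyze(query)  # the query goes through the same analyzer
        if not terms:
            return []
        # .get avoids creating empty entries for unknown terms.
        matches = set(self.postings.get(terms[0], {}))
        for term in terms[1:]:
            matches &= set(self.postings.get(term, {}))
        # Rank by summed term frequency, breaking ties by document id.
        scores = {d: sum(self.postings[t][d] for t in terms) for d in matches}
        return sorted(scores, key=lambda d: (-scores[d], d))


index = InvertedIndex()
index.add(1, "Lucene searches indexes instead of searching the whole text")
index.add(2, "Logstash collects logs and pipelines them to Elasticsearch")
index.add(3, "Texting and text search")

results = index.search("searching text")
print(results)
```

Because the query passes through the same analyzer as the documents, “searching” in the query matches “searches” in a document. The lookup touches only the postings for the query terms, which is why searching an index is faster than scanning every document.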
Building Digital Experience Platforms