Our Client, one of the global players in the online food ordering domain, wanted to change the way users search for restaurants. Currently, the website offers restaurant- and zip-code-based searches to filter results. To stay competitive and improve user engagement, the search had to be extended with product-based queries. From the end user's perspective, the burden of filtering restaurants first by cuisine and then by items is overwhelming. Wouldn't it be easier for users to simply type in what they are craving and have the search return the relevant list of restaurants?
That is when kreuzwerker was mandated to create a proof of concept and demonstrate the benefits of using Elasticsearch to improve the search feature's usability.
The company manages restaurants and products in a relational database that hosts more than 12 million products from restaurants worldwide. With no unique naming convention for products, roughly 3 million of them were redundant entries with spelling variations. For example, "Pizza Margharita" and "Pizza Margarita" are semantically the same product, but are spelled differently and carry two product IDs belonging to two different restaurants.
The challenge was to group products that are semantically close, so that the search returns relevant results even when the input text contains misspellings. Since the company operates across multiple countries, the search is not confined to one language. The database holds global restaurant data, which means the strings we look for might contain language-specific characters and translations. And it doesn't end with product searches: the user should also be able to search by restaurant name if the searched term matches a restaurant.
On the ETL side, extracting the relevant product data from the relational database, transforming it, and indexing it into Elasticsearch had to be done by defining an appropriate Logstash configuration and document mappings for efficient search performance. As this could become the next big feature in our Client's product pipeline, and with scalability in mind, we decided to introduce modern software tools and workflows to deliver the feature our Client had envisioned.
The architecture we developed for the ETL process uses Logstash, an open-source, server-side data processing pipeline that ingests data from various sources, transforms it, and sends it to the defined Elasticsearch index. Apart from collecting data from different systems, Logstash does something even more important: it normalizes different schemas into a single format that can be fed to the Elasticsearch index. The relevant data from the database was analyzed, queries were defined within the Logstash configuration, and the AWS Redshift JDBC driver was used to connect to the hosted database. Daily database changes were reimported by defining relational queries that pull only the latest data, and the operation was automated to keep the Elasticsearch index in sync with the primary database.
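A Logstash pipeline along these lines could look as follows. This is a minimal sketch, not the Client's actual configuration: the connection details, table and column names (`products`, `product_id`, `updated_at`), and the daily schedule are illustrative assumptions.

```conf
input {
  jdbc {
    # Redshift is reached through its JDBC driver; paths and
    # credentials below are placeholders
    jdbc_driver_library     => "/opt/logstash/drivers/RedshiftJDBC42.jar"
    jdbc_driver_class       => "com.amazon.redshift.jdbc42.Driver"
    jdbc_connection_string  => "jdbc:redshift://example-cluster:5439/food"
    jdbc_user               => "etl_user"
    jdbc_password           => "${JDBC_PASSWORD}"
    # run once a day and fetch only rows changed since the last run
    schedule                => "0 2 * * *"
    statement               => "SELECT product_id, name, restaurant_id, zip_code, updated_at
                                FROM products
                                WHERE updated_at > :sql_last_value"
    use_column_value        => true
    tracking_column         => "updated_at"
    tracking_column_type    => "timestamp"
  }
}

output {
  elasticsearch {
    hosts         => ["https://search-endpoint.example.com:443"]
    index         => "products"
    # reuse the relational primary key as the document id
    document_id   => "%{product_id}"
    # update existing documents, insert new ones (upsert semantics)
    action        => "update"
    doc_as_upsert => true
  }
}
```

Tracking `:sql_last_value` against a timestamp column is what makes the daily reimport incremental rather than a full reload.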
In Elasticsearch (ES), every document was identified by its product ID and had all relevant details indexed for the search. Since the product data was not confined to one language, multi-language and phonetic analyzers were used to improve search relevancy. In addition, an ngram tokenizer was configured in the document mapping to facilitate wildcard searches. In case of changes or additions to the product details, modifications can easily be applied to the ES index with an upsert operation. This ensures that an update happens only if the document, identified by its product ID, already exists; otherwise the newly found document is inserted.
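A mapping combining an ngram analyzer with a phonetic sub-field could be sketched like this. The field names and analyzer parameters are illustrative assumptions, and the phonetic filter requires the `analysis-phonetic` plugin to be installed on the cluster:

```json
PUT /products
{
  "settings": {
    "analysis": {
      "filter": {
        "phonetic_filter": { "type": "phonetic", "encoder": "double_metaphone" }
      },
      "tokenizer": {
        "trigram_tokenizer": { "type": "ngram", "min_gram": 3, "max_gram": 3 }
      },
      "analyzer": {
        "product_ngram":    { "tokenizer": "trigram_tokenizer", "filter": ["lowercase"] },
        "product_phonetic": { "tokenizer": "standard", "filter": ["lowercase", "phonetic_filter"] }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "product_ngram",
        "fields": { "phonetic": { "type": "text", "analyzer": "product_phonetic" } }
      },
      "restaurant_name": { "type": "text" },
      "restaurant_id":   { "type": "keyword" },
      "zip_code":        { "type": "keyword" }
    }
  }
}
```

An individual upsert then takes the form of an `_update` request keyed on the product ID:

```json
POST /products/_update/12345
{
  "doc": { "name": "Pizza Margherita" },
  "doc_as_upsert": true
}
```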
After structuring the query DSL to extract restaurants that offer the searched product within a given zip code, a frontend UI was developed in React JS to visualize the new search feature. The implementation can be extended so that users can create custom queries based on multiple product attributes and metrics, such as distance to a restaurant, delivery time, rating, etc.
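A query of this shape could be sketched as follows, assuming hypothetical `name`, `name.phonetic`, `restaurant_name`, `zip_code`, and `restaurant_id` fields. It matches the search term against product and restaurant names with fuzziness for misspellings, filters by zip code, and collapses the hits so each restaurant appears only once:

```json
GET /products/_search
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "margarita",
          "fields": ["name", "name.phonetic", "restaurant_name"],
          "fuzziness": "AUTO"
        }
      },
      "filter": { "term": { "zip_code": "10115" } }
    }
  },
  "collapse": { "field": "restaurant_id" }
}
```

Placing the zip code in a `filter` clause keeps it out of relevance scoring and makes it cacheable, so only the text match contributes to ranking.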
Finally, Terraform was chosen to provision the entire stack on AWS (Amazon Web Services), and the pilot project was conducted.
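Provisioning the search cluster itself boils down to a single Terraform resource. The domain name, instance type, and sizes below are illustrative, not the values used in the pilot:

```hcl
# Minimal AWS Elasticsearch Service domain (hypothetical sizing)
resource "aws_elasticsearch_domain" "product_search" {
  domain_name           = "product-search-poc"
  elasticsearch_version = "6.8"

  cluster_config {
    instance_type  = "t3.medium.elasticsearch"
    instance_count = 2
  }

  ebs_options {
    ebs_enabled = true
    volume_size = 20
  }
}
```

Keeping the domain in Terraform alongside the rest of the stack means the proof of concept can be torn down and recreated on demand, and scaled up for production by changing a few attributes.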
Along with the proposed architecture, a new global search for products and restaurants was introduced with improved features, which positively affects user engagement. The ETL process using Logstash mirrors the live database, so the Elasticsearch data lags behind the newest database changes only slightly. Also, the ES document structure representing products is completely denormalized for fast querying. Even though the project was envisioned as a proof of concept, the entire architecture was designed with scalability in mind and is already production-ready.
Being part of the next big iteration of the product makes us proud, and we are pleased to have demonstrated the advantages of using the AWS Elasticsearch Service as the primary search engine.