The task
The online shop of a large Internet retailer is visited by several million people every day. These visitors generate search queries, navigation traces and purchase transactions. Until now, our customer had stored most of this information in relational databases. However, these are not designed to handle such a large volume of unstructured raw e-commerce data, and their licensing costs are very high. Our customer's goal was therefore to design a data storage solution better suited to this application.
The added value
By switching to the new Hadoop platform, very large amounts of raw data can now be stored in a fault-tolerant way and retrieved quickly. In addition, the solution has substantially reduced the operating and hardware costs of data storage.
The solution
From the outset, the focus was on a commercial Hadoop distribution, as this kind of system is well suited to companies with very large data volumes. Choosing the right edition of the framework ensured vendor support for problems in day-to-day operation, and a suitable query tool made it easy for users to switch to the new system.
Using test data and test queries, WidasConcepts compared the candidate distributions and tools with each other and checked them for ease of use, performance and stability. In addition, we evaluated the Parquet file format as a storage format for the data. Finally, in cooperation with the business departments, we collected all requirements for the new platform and weighed them against the project objectives.
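A comparison of this kind can be pictured as a small harness that runs the same test queries against each candidate and records latency (performance) and failures (stability). The following Python sketch is purely illustrative and is not the tooling used in the project: `candidate_a`, `candidate_b` and the test queries are hypothetical stand-ins for real query clients and workloads.

```python
import statistics
import time
from typing import Callable, Dict, List, Optional


def benchmark(run_query: Callable[[str], object],
              queries: List[str],
              repeats: int = 3) -> Dict[str, Optional[float]]:
    """Run every query `repeats` times; count failures and time the successes."""
    latencies: List[float] = []
    failures = 0
    for query in queries:
        for _ in range(repeats):
            start = time.perf_counter()
            try:
                run_query(query)
            except Exception:
                # A failed run counts against stability and is not timed.
                failures += 1
                continue
            latencies.append(time.perf_counter() - start)
    return {
        "runs": len(queries) * repeats,
        "failures": failures,
        "median_seconds": statistics.median(latencies) if latencies else None,
    }


# Hypothetical stand-ins: in practice each function would submit the
# query to one of the platforms under evaluation.
def candidate_a(query: str) -> int:
    return len(query)


def candidate_b(query: str) -> int:
    if "JOIN" in query:
        raise RuntimeError("simulated failure on join queries")
    return len(query)


# Hypothetical test queries, not taken from the project.
TEST_QUERIES = [
    "SELECT COUNT(*) FROM searches",
    "SELECT * FROM orders JOIN clicks USING (session_id)",
]

results = {
    name: benchmark(runner, TEST_QUERIES)
    for name, runner in {"candidate_a": candidate_a,
                         "candidate_b": candidate_b}.items()
}

for name, result in results.items():
    print(name, result)
```

Keeping the query list and the repeat count identical for every candidate is what makes the resulting figures comparable. Ease of use, the third criterion, cannot be captured by a script like this and has to be judged with the users.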
The implemented technologies
Cloudera CDH, MapR Hadoop