Choosing the right partition keys in the customer dimension is the new outside-in approach for customer segmentation analytics.
In big data today, when it comes to designing dimensions in Hive, there are two aspects that practitioners (architects and engineers) need to pay attention to:
- Selecting the optimal partition keys
- Choosing the correct insert mechanism (INSERT INTO vs. INSERT OVERWRITE). [We will cover this in another post]
Selecting the Optimal Partition Keys on a Customer Dimension becomes Segmentation Analytics’ BFF
Sustained performance when segmenting customer data (conformed attributes, outrigger attributes, interactive attributes) has always been the expectation. As more and more customer data can be stored at low cost, the ability to segment it faster for event measurement and prediction is the new normal. The underlying foundation for that performance is the selection of the right partition keys on the customer dimension.
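To make the performance argument concrete, here is a minimal sketch in plain Python (not Hive itself) of why the partition key matters. Hive stores each partition value in its own directory, so a query that filters on the partition key only has to read the matching partitions. The sample rows, the choice of state as the partition key, and the function names are all hypothetical, picked purely for illustration.

```python
from collections import defaultdict

# Hypothetical customer dimension rows: (customer_id, state, segment)
CUSTOMERS = [
    (1, "CA", "loyalty"),
    (2, "CA", "new"),
    (3, "NY", "loyalty"),
    (4, "TX", "new"),
    (5, "NY", "new"),
]


def partition_by(rows, key_index):
    """Group rows by one column, mimicking one storage directory per key value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key_index]].append(row)
    return dict(partitions)


def segment_pruned(partitions, key_value, predicate):
    """Read only the partition for key_value; return (matches, rows_scanned)."""
    scanned = partitions.get(key_value, [])
    return [row for row in scanned if predicate(row)], len(scanned)


def segment_full_scan(rows, predicate):
    """Read every row of an unpartitioned table; return (matches, rows_scanned)."""
    return [row for row in rows if predicate(row)], len(rows)


# Assumed partition key: state (column index 1)
by_state = partition_by(CUSTOMERS, 1)

# Segment: loyalty customers in NY
matches, scanned = segment_pruned(by_state, "NY", lambda r: r[2] == "loyalty")
full_matches, full_scanned = segment_full_scan(
    CUSTOMERS, lambda r: r[1] == "NY" and r[2] == "loyalty"
)
```

Both calls return the same segment, but the pruned version scans 2 rows while the full scan reads all 5. The trade-off in this sketch is that the benefit only appears when the segmentation filter actually uses the partition key, which is why the choice of key should follow the segmentation questions you expect to ask.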
Customer Segmentation Use Case: Measuring the success of a product promotion delivered to a customer through direct or indirect marketing
Notable dimensions to model for this type of segmentation:
- A conformed customer dimension
- Promotion dimension
- Channel dimension
- Discount dimension
- Store dimension
- The cannot-do-without date dimension
Measuring the success of a promotion is a crosswalk, not just a straightforward fact-joining measurement event. Some of the outrigger joins involved:
- Joining customer-by-location to promotion-by-location. See figure below: