Data was transferred through SFTP and ingested into IIP. Data models and rules based on Hive and Spark SQL were designed and set up for every rule and country (27 rules x 2 countries for the initial release in 6 weeks, and 46 rules x 17 countries for the second release in 12-14 weeks).
The configuration-table feature of IIP was leveraged to let the finance super-user community change parameters and incorporate country-specific filter-in/filter-out criteria, which may change over time.
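The configuration-table idea can be sketched as follows: filter criteria live as data rather than code, so super users can change them without an IT release. This is a minimal, hypothetical illustration in plain Python; the table layout, field names, and values are assumptions, not the actual IIP schema.

```python
# Hypothetical sketch: country-specific filter-in/filter-out criteria kept
# as a configuration table (data, not code). Field names are illustrative.
CONFIG_TABLE = [
    {"country": "DE", "field": "account_type", "mode": "exclude", "values": {"INTERCO"}},
    {"country": "FR", "field": "ledger", "mode": "include", "values": {"GL01", "GL02"}},
]

def apply_country_filters(rows, country):
    """Apply the configured filter-in/out criteria for one country."""
    rules = [r for r in CONFIG_TABLE if r["country"] == country]
    kept = []
    for row in rows:
        keep = True
        for r in rules:
            val = row.get(r["field"])
            if r["mode"] == "include" and val not in r["values"]:
                keep = False
            if r["mode"] == "exclude" and val in r["values"]:
                keep = False
        if keep:
            kept.append(row)
    return kept

rows = [
    {"country": "DE", "account_type": "INTERCO", "ledger": "GL01"},
    {"country": "DE", "account_type": "TRADE", "ledger": "GL01"},
]
print(apply_country_filters(rows, "DE"))  # only the TRADE row survives
```

A super user editing `CONFIG_TABLE` (in IIP, a managed table) changes behavior immediately, with no code deployment.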
End-to-end data flow within IIP is completely automated through schedulers
A simple, intuitive Hawk-Eye Cockpit was built for business users, with color-coded outputs and a click-drill-download option that helps them focus and act during the time-critical month-end
An enhanced UI was delivered in the second release, adding download-to-spreadsheet, graphical views of short-term and long-term trends, and search and filter on different parameters.
The purpose of addressing data quality issues throughout the month, rather than only at month-end, is already being served: 75% of end-users in the markets use the tool actively every day, and 100% of key users touch it at least once a week. Super users have the flexibility to make country-specific configuration changes without a lengthy IT change-management process.
Infosys Information Platform, which powers HawkEye, is being actively considered by client stakeholders for other key needs, such as real-time analytics to identify sales-order gaps just before stock-pick-and-deliver operations.
Machine learning models were developed in Apache Spark to predict high-propensity customers by sales territory and prioritize sales force action. Customer profiles and monthly transaction records were ingested into Hadoop and processed via the Hive metastore and Spark SQL. An offline model was built on sample data to benchmark and validate results. Tableau was used for visualizations.
The Spark machine learning library was used to construct a logistic regression model. Prediction took around 7 seconds on 2+ million records, demonstrating the usefulness of Spark's in-memory computing paradigm for near-real-time analysis.
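The modeling step can be illustrated outside Spark. The project used Spark MLlib's logistic regression; the sketch below shows the same technique in plain Python with batch gradient descent, on toy features invented for illustration (the client's actual features and data are not reproduced here).

```python
import math

def sigmoid(z):
    # Guard against math.exp overflow for very negative z.
    if z < -60:
        return 0.0
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.05, epochs=5000):
    """Plain batch gradient descent for logistic regression; w[0] is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def propensity(w, x):
    """Predicted probability that the customer buys."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Toy features: (recent purchases, months since last order) -> bought again?
X = [(5, 1), (4, 2), (1, 9), (0, 12), (6, 1), (1, 10)]
y = [1, 1, 0, 0, 1, 0]
w = train_logistic(X, y)
print(propensity(w, (5, 1)))   # high propensity: recent, frequent buyer
print(propensity(w, (0, 11)))  # low propensity: dormant customer
```

In the actual pipeline, the equivalent of `train_logistic` runs distributed across the cluster (Spark MLlib's `LogisticRegression`), which is what makes scoring millions of records in seconds feasible.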
Customers with a high propensity to buy specific products and services were identified to support cross-sell and up-sell.
A solution based on an industry-leading visualization tool coupled with an open-source technology stack was used for real-time predictive analytics, thus saving cost.
A 5-node AWS cluster was provisioned and IIP was installed and configured for out-of-stock analysis as a PoC project. Sample data from retail stores covering 5 weeks at item and register level (5.7 million rows) was ingested into HDFS using IIP's data ingestion tools.
The IIP toolset enabled creation of on-demand schemas (models) to access data, as well as data wrangling. Historical sale probability at item, store, and register level was calculated, and prediction was run using an R-based binomial and geometric distribution model to flag likely out-of-stock situations.
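The statistical idea behind such a model can be sketched briefly. If an item historically sells in a given register interval with probability p, then a run of n intervals with zero sales has probability (1 - p)^n under the "in stock" hypothesis (a binomial count of zero successes, equivalently geometric waiting-time reasoning); a very small value suggests the item is out of stock. This plain-Python illustration uses assumed thresholds and figures, not the PoC's actual R model or parameters.

```python
def p_zero_sales(p_sale_per_interval, n_intervals):
    """P(zero sales in n consecutive intervals) if the item were in stock."""
    return (1.0 - p_sale_per_interval) ** n_intervals

def likely_out_of_stock(p_sale_per_interval, n_intervals, alpha=0.05):
    """Flag when zero sales would be this unlikely for an in-stock item.
    alpha is an illustrative significance threshold, not the PoC's value."""
    return p_zero_sales(p_sale_per_interval, n_intervals) < alpha

# An item that historically sells in ~30% of hourly intervals,
# but has been silent for 12 hours:
print(p_zero_sales(0.30, 12))          # ~0.0138
print(likely_out_of_stock(0.30, 12))   # True -> probable out-of-stock
```

Computing the historical p per item/store/register is the heavy, data-parallel part, which is why it was done on the cluster; the flagging rule itself is cheap.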
A visual dashboard was developed in Tableau to show out-of-stock probability by store, hour, day, and item. To demonstrate scalability, PoS data was simulated for a retail store chain over 5 years (350 million rows).
This data was ingested into Oracle and visualized in Tableau.
From no capability to business insights in just 3 weeks: IIP enabled the client to validate business hypotheses quickly, along with an environment to run multiple experiments in parallel on the data already in the platform. The end-to-end PoC (from AWS cluster provisioning and IIP installation to delivering final insights) was completed in 3 weeks.
For out-of-stock analysis, IIP demonstrated the ability to drill down into stores with potential out-of-stock situations to the granular level of store, item, and register, and to filter by various store and item attributes when analyzing the issue.
PoS data from fuel and convenience stores, along with master data feeds extracted from source systems, was loaded into Hadoop/HDFS using IIP's intuitive data ingestion UI. Overall, 12.5 million rows were ingested in less than 4 minutes.
The IIP Data Explorer GUI toolset was leveraged to model on-demand schemas (backed by Spark SQL queries) that extract data from HDFS and perform arithmetic calculations; the calculated values were pushed to Hive tables. Materialized views at PoS category, subcategory, and item level were populated in 30 seconds on average for 1.5 million records.
A visual dashboard was developed in Tableau to present insights such as profit at item level, profit contribution margin (percentage), and sales variance (percentage). Data retrieval through Spark averaged 0.32 seconds.
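The dashboard metrics named above are simple derived columns. As a minimal sketch, here they are computed in plain Python (the PoC computed them with Spark SQL over HDFS); row fields and sample figures are invented for illustration.

```python
# Hypothetical PoS rows; "plan_sales" stands in for whatever baseline the
# sales-variance percentage is measured against (an assumption here).
pos_rows = [
    {"item": "coffee", "sales": 1200.0, "cost": 800.0, "plan_sales": 1000.0},
    {"item": "snacks", "sales": 900.0,  "cost": 810.0, "plan_sales": 1000.0},
]

def item_metrics(row):
    """Profit, contribution margin %, and sales variance % for one item."""
    profit = row["sales"] - row["cost"]
    margin_pct = 100.0 * profit / row["sales"]
    variance_pct = 100.0 * (row["sales"] - row["plan_sales"]) / row["plan_sales"]
    return {"item": row["item"], "profit": profit,
            "margin_pct": round(margin_pct, 1),
            "sales_variance_pct": round(variance_pct, 1)}

for r in pos_rows:
    print(item_metrics(r))
# coffee: profit 400.0, margin 33.3%, variance +20.0%
```

In the PoC, the same arithmetic ran as Spark SQL expressions inside the on-demand schemas, with Tableau reading the resulting Hive tables.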
A Big Data lake and an advanced analytics environment were implemented based on IIP, enabling the client to derive business insights quickly, within weeks.
IIP demonstrated that it can augment the EDW by processing huge data sets and combining or joining data from disparate systems as required, fulfilling the need for an intermediate data lake for detailed-level reporting. This eliminates the need for the separate data mart that would otherwise have been required, at a substantially reduced cost.