Trade transaction data was pulled from the enterprise Oracle database into IIP using Sqoop and loaded into HDFS. The necessary processing logic and business rules were applied to the data to eliminate improper trades. Deduplication, validation, enrichment, and derivation operations were also applied to complete the data processing and analysis.
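In the PoC these steps ran on IIP/Spark; as a minimal plain-Python sketch of the same deduplication, validation, and enrichment logic (the field names `trade_id`, `notional`, and `currency` are hypothetical, not from the PoC):

```python
# Minimal sketch of the trade-processing steps described above.
# Field names (trade_id, notional, currency) are illustrative only.

def process_trades(trades):
    """Deduplicate, validate, and enrich a list of trade records."""
    seen = set()
    processed = []
    for t in trades:
        # Deduplication: keep only the first occurrence of each trade_id
        if t["trade_id"] in seen:
            continue
        seen.add(t["trade_id"])
        # Validation: eliminate improper trades (e.g. non-positive notional)
        if t.get("notional", 0) <= 0 or not t.get("currency"):
            continue
        # Enrichment/derivation: add a derived classification field
        t = dict(t, notional_band="large" if t["notional"] >= 1_000_000 else "small")
        processed.append(t)
    return processed

trades = [
    {"trade_id": 1, "notional": 2_000_000, "currency": "USD"},
    {"trade_id": 1, "notional": 2_000_000, "currency": "USD"},  # duplicate
    {"trade_id": 2, "notional": -5, "currency": "EUR"},         # improper trade
    {"trade_id": 3, "notional": 500, "currency": "GBP"},
]
print(process_trades(trades))
```

In the actual pipeline each step would be a distributed Spark transformation rather than a Python loop, but the rule ordering (dedupe, then validate, then derive) is the same.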
The regulatory report was generated and its data was pushed back to Oracle. Various analytics reports can be displayed in Tableau.
The PoC was run with a data volume of 596 million trade transactions on a 100-node cluster on the AWS stack. A message insertion rate into IIP of 130,000 records/second (18.22 MB/second) was achieved. Report execution, including end-to-end processing, completed in 35 seconds for 30,000 trade records and the corresponding 120,000 trade line items.
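As a sanity check, the average record size and the total ingestion time follow directly from the figures above:

```python
# Derived figures from the PoC throughput numbers quoted above.
records_per_sec = 130_000
mb_per_sec = 18.22
total_records = 596_000_000

# Average record size implied by the two throughput figures (~147 bytes)
avg_record_bytes = mb_per_sec * 1024 * 1024 / records_per_sec

# Time to ingest the full 596M-record data set (~76 minutes)
ingest_minutes = total_records / records_per_sec / 60

print(round(avg_record_bytes), round(ingest_minutes))
```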
The bank currently needs 10 minutes on average to report a trade, and in 15% of instances the 15-minute window has been overshot, which attracts a penalty. With IIP, end-to-end processing and reporting takes just 35 seconds, avoiding non-compliance and the related penalties. The IIP solution is elastic and scalable, supporting future increases in trade volume with minimal cost increase to the bank.
For the PoC, 100 GB of data was generated, comprising 12 months of data for 530 million credit card accounts. A 10-node cluster on the AWS stack, with 16 vCPUs and 30 GB of memory per node, was used to process the data.
It took 11 minutes to process the data end-to-end on IIP, which is based on Hadoop and Apache Spark.
The processed output data was depicted graphically in various Tableau charts, with the necessary integration of the in-memory schema RDDs in IIP/Apache Spark with Tableau.
The data was ingested into Oracle and the visualization was done using Tableau.
Analytics such as the standard deviation of outstanding balances, trend analysis of various balances, and credit utilization ratio were performed on a huge volume of retail data much faster and more cost-effectively, based on the in-memory processing paradigm of IIP.
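The analytics named above are straightforward aggregate computations; a small illustrative sketch (the account records and field names are hypothetical, and the real job would run distributed on Spark):

```python
from statistics import pstdev  # population standard deviation

# Illustrative credit card account records; field names are hypothetical.
accounts = [
    {"balance": 1200.0, "credit_limit": 5000.0},
    {"balance": 300.0,  "credit_limit": 1000.0},
    {"balance": 4500.0, "credit_limit": 6000.0},
]

# Standard deviation of outstanding balances
balances = [a["balance"] for a in accounts]
std_outstanding = pstdev(balances)

# Credit utilization ratio = outstanding balance / credit limit
utilization = [a["balance"] / a["credit_limit"] for a in accounts]

print(round(std_outstanding, 2), [round(u, 2) for u in utilization])
```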
The ability to store 100 TB of historical data in Hadoop and move 20 TB of data into Spark for query and analysis gives the right mix of technology in this case.
The ability of Tableau to read data directly from Spark enables users to visualize data quickly without having another semantic or data storage layer.
The data set included master data and mapping information along with monthly sales, inventory, and costs. Lookups and aggregation of the data were done during data load.
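A minimal sketch of a load-time lookup and aggregation, assuming a product-to-category mapping in the master data (all record and key names here are illustrative):

```python
# Sketch of lookup + aggregation applied during data load.
# The product-to-category mapping and sales rows are illustrative.

product_master = {"P1": "Beverages", "P2": "Snacks"}  # mapping information

sales = [
    {"product": "P1", "month": "2015-01", "amount": 100.0},
    {"product": "P2", "month": "2015-01", "amount": 40.0},
    {"product": "P1", "month": "2015-01", "amount": 60.0},
]

totals = {}
for row in sales:
    # Lookup: resolve the product code to its category via master data
    category = product_master.get(row["product"], "Unknown")
    key = (category, row["month"])
    # Aggregation: sum sales per category and month at load time
    totals[key] = totals.get(key, 0.0) + row["amount"]

print(totals)
```

Performing the lookup and aggregation once during load, as described above, avoids repeating the join for every subsequent report query.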
The ingestion, report processing, and generation were done on a 2-core, 16 GB RAM, two-node on-premise cluster. A data volume of 19 million records was ingested into IIP in 6 minutes.
Reports for sales performance, variance between actuals and budgets, sales distribution, and cost and sales comparison by category/period/geography/region were generated after the data load. Graphical features, geo maps, and drill-down functionality were available in the reports. Report generation took less than 20 seconds for 19 million records.
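The actuals-versus-budget variance named above is a simple per-key difference; as an illustrative sketch (the regions and figures are hypothetical):

```python
# Sketch of the actuals-vs-budget variance report; all figures illustrative.
actuals = {"North": 120.0, "South": 90.0}
budgets = {"North": 100.0, "South": 100.0}

# Positive variance = over budget, negative = under budget
variance = {region: actuals[region] - budgets[region] for region in budgets}
print(variance)
```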
Reports were visualized in Tableau with real-time analytics on the data.
Large month-end data feeds were processed in near real time, with data loading and reporting handled by IIP. Data load performance improved 600-fold versus the current system, and reports were generated 60 times faster than the current system, with millions of records in the backend.
The overall solution offered has substantially improved the client's price-performance ratio.