Kamanja: A New Open Source Real-Time System for Scoring Data Mining Models
July 27, 2015 @ 6:30 pm
Greg Makowski, Director of Data Science, LigaDATA
*** Bring ID (e.g. Driver’s License) for eBay Security ***
6:30 Doors Open, Food & Networking
*** Please arrive by 7 PM due to Security ***
This talk will start with a number of complex real-time data use cases, such as a) complex event processing, b) supporting the modeling work of a data mining department and c) developing enterprise applications on Apache big data systems. While Hadoop and big data have been around for a while, banks and healthcare companies tend not to be early IT adopters. What are some of the security issues or other roadblocks in Apache big data systems for industries with such high requirements?
Data mining models can be trained in dozens of packages, but what can simplify the deployment of models regardless of where they were trained or with what algorithm? Predictive Model Markup Language (PMML) is a type of XML with specific support for 15 families of data mining algorithms. Data mining software such as R, KNIME, Knowledge Studio and SAS Enterprise Miner are PMML producers. The new open-source product, Kamanja, is the first open-source, real-time PMML consumer (scoring system). One advantage of PMML systems is that they can reduce the time to deploy production models from 1-2 months to 1-2 days, a pain point that may be less obvious if your data mining exposure is limited to competitions or MOOCs. Kamanja is free on GitHub and supports Kafka, MQ, Spark, HBase and Cassandra, among other things. As a new open-source product, Kamanja initially supports rules, trees and regression.
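To make the producer/consumer split concrete, here is a minimal sketch in Python of what a PMML consumer does for the simplest case, a linear regression. It is not Kamanja's API and not a full PMML implementation. The model, its field names (age, balance, risk) and its coefficients are made up for illustration, and the helper functions load_linear_model and score are hypothetical. It assumes a PMML 4.2 document containing a single RegressionModel with numeric predictors only.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy model, as a PMML producer might export it:
#   risk = 0.5 + 0.02 * age + 0.001 * balance
PMML_DOC = """
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="balance" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="balance"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="age" coefficient="0.02"/>
      <NumericPredictor name="balance" coefficient="0.001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
""".strip()

# PMML elements live in a versioned XML namespace.
NS = {"p": "http://www.dmg.org/PMML-4_2"}


def load_linear_model(pmml_text):
    """Read the intercept and per-field coefficients from the PMML text."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("p:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(model, record):
    """Score one input record (a dict of field name to numeric value)."""
    intercept, coefficients = model
    return intercept + sum(
        coef * record[name] for name, coef in coefficients.items()
    )


model = load_linear_model(PMML_DOC)
print(score(model, {"age": 40, "balance": 1000}))
```

The point of the sketch is that the scoring side only needs the PMML file, not the tool or algorithm that trained the model. A production consumer would also have to handle data types, missing values, transformations and the other model families.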
I will cover an architecture of a sample application using multiple streams of open source data, such as social network campaigns and tracking sentiment for the bank client and its competitors. Other real-time architectures cover credit card fraud detection. A brief demo will be given of the social network analysis application, with text mining.
An overview of products in the space will include popular Apache big data systems, real-time systems and PMML systems.
LigaDATA will be sponsoring pizza and salads from The Garret restaurant in Campbell.
Greg Makowski is the Director of Data Science at LigaDATA, a Series A funded startup that offers big data and data mining consulting and supports the open source big data systems discussed above. Greg has deployed about 90 data mining models since 1992. In a previous position, he brought in R, SAS Enterprise Miner and Zementis (a PMML consumer) to deploy production data mining models for a department. Greg has prototyped and deployed four new enterprise applications with embedded data mining, and has worked in a variety of verticals including financial services, fraud detection, web behavior, retail supply chain and targeted marketing.
Event page provided by ACM