A Smarter Process for Sensing the Information Space, Scott Spangler, IBM Almaden

Posted September 3rd, 2010 by Paul O'Rorke and filed in Announcement, DM SIG Meeting

LOCATION: LinkedIn, 2025 Stierlin Ct, Mountain View, CA 94043

Date: Monday November 22, 2010; 6:30 – 9:00 pm (6:30 – 7:00 networking & snacks; 7:00 – 7:10 announcements; 7:10+ presentation, Q&A)

Cost: Free and open to all who wish to attend, but membership is only $20/year. Anyone may join our mailing list at no charge, and receive announcements of upcoming events.

Speaker: W. Scott Spangler, IBM Almaden Research Center

Title: “A Smarter Process for Sensing the Information Space”

Abstract:

With the growth of the internet, the size of the information space is increasing exponentially. But more information is not always better. Furthermore, as the complexity of business relationships increases, there is a natural tendency towards less structured interaction. This highlights the growing relevance of unstructured information in documenting the interactions of organizations and individuals. Analyzing and making sense of this unstructured information space requires more than text mining algorithms; it requires a strategic approach.

While every information analysis situation is somewhat unique, we propose a unified approach that addresses a wide variety of information space analytics problems. Our method for making sense out of unstructured data is described by six steps that are analogous to the algebraic order of operations, PEMDAS. These basic text mining operations can be combined in many interesting ways to handle a diverse set of problems, and just as in algebra, it is critical that these operations be performed in the correct order to guarantee a meaningful result. In this talk, I describe how PEMDAS has been implemented within smart organizations to enable decisions that produced substantial and quantifiable business value.

Bio:
W. Scott Spangler, IBM Research Division, Almaden Research Center, 650 Harry Road, San Jose, CA 95120 (electronic mail: spangles@almaden.ibm.com).

Scott Spangler is a Senior Technical Staff Member and Master Inventor at the IBM Almaden Research Center. He has been doing knowledge base and data mining research for the past 20 years. Since coming to IBM in 1996, Scott has developed software components for data visualization and text mining, which are available through the eClassifier, Business Insights Workbench, COBRA, and SIMPLE service offerings. Scott holds a bachelor's degree in math from MIT and a master's degree in computer science from the University of Texas. He holds 22 patents and has authored 24 conference and journal publications, as well as a book titled Mining the Talk: Unlocking the Business Value in Unstructured Information.

Learning when Concepts Abound – Omid Madani, SRI AI Center

Posted September 3rd, 2010 by Paul O'Rorke and filed in Announcement, DM SIG Meeting

LOCATION: LinkedIn, 2025 Stierlin Ct, Mountain View, CA 94043

Date: Monday September 27, 2010; 6:30 – 9:00 pm (6:30 – 7:00 networking & snacks; 7:00 – 7:10 announcements; 7:10+ presentation, Q&A)

Cost: Free and open to all who wish to attend, but membership is only $20/year. Anyone may join our mailing list at no charge, and receive announcements of upcoming events.

Title: Learning when Concepts Abound

Abstract:

Categorization is fundamental to intelligence. Without categories
(concepts or classes), every experience would be new, and we couldn’t
make sense of our world. We humans also require numerous concepts for
our increasingly sophisticated intelligence. From a practical perspective,
in some of today’s applications, such as text categorization, image
tagging, and word prediction, the number of classes can easily exceed
tens of thousands. A number of applications can benefit from
scalable learning under a huge number of classes.

In this talk, I will briefly go over supervised learning, in
particular multiclass learning. I will then present the approach of
learning a sparse feature-to-class mapping, or index learning. The
crucial property in efficient index learning is constraining each
feature to connect to (predict) a relatively small number of classes.
Online updating and classification take time that is almost linear in
the number of features of a given instance. I will touch on a number
of update techniques and related approaches. While our primary driver
has been scalability and simplicity, we have observed that
classification accuracies remain competitive or better when compared
to a number of other approaches, while obtaining speedups of orders of
magnitude. I will discuss applications to several tasks.
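
To make the idea of a sparse feature-to-class index concrete, here is a minimal sketch in Python. It is not the speaker's algorithm: the class name SparseIndex, the parameters max_classes and learning_rate, and the decay-and-prune update rule are all assumptions chosen for illustration. It shows only the two properties named in the abstract: each feature connects to a small, capped number of classes, and both prediction and online updating touch only the features present in the instance.

```python
from collections import defaultdict


class SparseIndex:
    """Toy feature-to-class index (hypothetical, for illustration only).

    Each feature keeps weighted edges to at most `max_classes` classes.
    """

    def __init__(self, max_classes=3, learning_rate=0.5):
        self.max_classes = max_classes
        self.learning_rate = learning_rate
        self.index = defaultdict(dict)  # feature -> {class label: weight}

    def scores(self, features):
        """Sum edge weights over the active features of one instance."""
        totals = defaultdict(float)
        for feature, value in features.items():
            # .get avoids creating index entries for unseen features
            for label, weight in self.index.get(feature, {}).items():
                totals[label] += value * weight
        return totals

    def predict(self, features):
        """Return the highest-scoring class, or None if nothing matches."""
        totals = self.scores(features)
        if not totals:
            return None
        return max(totals, key=totals.get)

    def update(self, features, true_label):
        """Mistake-driven online update (an assumed rule, not from the talk)."""
        if self.predict(features) == true_label:
            return
        rate = self.learning_rate
        for feature, value in features.items():
            edges = self.index[feature]
            # Decay existing edges, then strengthen the edge to the true class.
            for label in edges:
                edges[label] *= 1.0 - rate
            edges[true_label] = edges.get(true_label, 0.0) + rate * value
            # Enforce sparsity: drop the weakest edge if over the cap.
            if len(edges) > self.max_classes:
                del edges[min(edges, key=edges.get)]


if __name__ == "__main__":
    # Made-up training data: bag-of-words features and a class label.
    data = [
        ({"goal": 1.0, "match": 1.0}, "sports"),
        ({"vote": 1.0, "election": 1.0}, "politics"),
        ({"stock": 1.0, "market": 1.0}, "finance"),
    ]
    model = SparseIndex(max_classes=3)
    for _ in range(2):
        for features, label in data:
            model.update(features, label)
    print(model.predict({"goal": 1.0, "match": 1.0}))  # prints "sports"
```

Because every feature holds at most max_classes edges, scoring an instance in this sketch costs roughly the number of active features times that cap, regardless of how many classes exist overall.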

Bio:

Omid Madani is a senior computer scientist at the Artificial
Intelligence Center of SRI International. He is interested in all
aspects of intelligence and mind, as well as algorithm design and
analysis. His current research revolves around the themes of
large-scale learning and data mining, including learning in the
presence of myriad concepts, online learning, and unsupervised
learning, in particular exploring and engineering systems that learn
their own many concepts (computational development). In the 2009
European PASCAL Challenge on Large-Scale Hierarchical Text
Classification, with just over 12k classes, his team’s approach
obtained top rankings from among 18 participants. He has
successfully applied learning techniques to a number of information
retrieval applications.

Omid obtained a PhD in computer science from the University of
Washington in 2000 (thesis topic: Computational Complexity of Markov
Decision Processes). After a brief period in industry, he returned
to academia as a postdoc at the University of Alberta, and then went
back to industry as a senior research scientist at Overture and
then Yahoo! Research, before joining SRI. He was awarded the Alberta
Ingenuity Associateship while in Alberta. He is a lifetime member of
the Association for the Advancement of Artificial Intelligence (AAAI) and
a member of the Association for Computing Machinery (ACM) and the
Cognitive Science Society.

web: http://www.omadani.net

DMSIG – Charting SearchLand: Search Quality for Beginners August 23, 2010

Posted May 8, 2010 by Patricia Hoffman, PhD

LOCATION: LinkedIn, 2025 Stierlin Ct, Mountain View, CA 94043  

Date: Monday August 23, 2010; 6:30 – 9:00 pm (6:30 – 7:00 networking & snacks; 7:00 – 7:10 announcements; 7:10+ presentation, Q&A)

Cost: Free and open to all who wish to attend, but membership is only $20/year. Anyone may join our mailing list at no charge, and receive announcements of upcoming events.

Speaker: Valeria de Paiva, PhD, Cuil, Inc.

Title: “Charting SearchLand: Search Quality for Beginners”

DM SIG Google Prediction API: Machine Learning as a Service on the Cloud on July 26

Posted July 26th, 2010 by Paul O'Rorke and filed in DM SIG Meeting

LOCATION: LinkedIn, 2025 Stierlin Ct, Mountain View, CA 94043

DATE: Monday July 26, 2010; 6:30 – 9:00 pm (6:30 – 7:00 networking & snacks; 7:00 – 7:10 announcements; 7:10+ presentation, Q&A)

COST: Free and open to all who wish to attend, but membership is only $20/year. Anyone may join our mailing list at no charge, and receive announcements of upcoming events.

SPEAKER: Max Lin, Google Research

ACM Data Mining Camp, November 13, 2010

Posted June 29th, 2010 by TriciaHoffman and filed in ACM Meeting, Announcement, Conference, DM SIG Meeting

WHAT is an UNCONFERENCE or CAMP?

An unconference is an event where users suggest topics, get together and discuss them in detail. This camp is focused on Data Mining, Analytics, Cloud Computing, Machine Learning, and the various applications of these technologies. There is an option to join the SF Bay ACM for $20 per year. Our last Data Mining Camp had 380 participants.

DONORS:

DMSIG – Advances in Ensemble Learning from the $1,000,000 Netflix Prize Contest 6/28/10

Posted June 20th, 2010 by GregMakowski and filed in DM SIG Meeting

LOCATION: LinkedIn, 2025 Stierlin Ct, Mountain View, CA 94043 (Note: a minor change in the address, from 2027 to 2025)

DATE: Monday June 28, 2010; 6:30 – 9:00 pm (6:30 – 7:00 networking & snacks; 7:00 – 7:10 announcements; 7:10+ presentation, Q&A)

COST: Free and open to all who wish to attend, but membership is only $20/year. Anyone may join our mailing list at no charge, and receive announcements of upcoming events.
