Data Science Camp 2016

Silicon Valley Data Science Camp 2016


Saturday, October 29, 2016 – 8:15am to 6:30pm




2200 Mission College Blvd

Santa Clara, CA

Event Details

Data Science Camp is SF Bay ACM’s annual event combining sessions, a keynote, and an optional tutorial (extra fee). Now in its eighth year, it’s an excellent opportunity to learn about Data Science and connect with others, and we keep it near-free ($10 charge, includes lunch & coffee).  You can also sign up as part of a group of 2-6 people for $8 per person.  The morning class is $60 and includes the afternoon camp.


“Deep Learning for the 99 percent”
Rajat Monga, Engineering Director of the Google TensorFlow group.  
A big goal with open sourcing TensorFlow has been to bring deep learning to every organization. Deep Learning has had a big impact at Google with improvements across a range of products including Photos, Voice Recognition and Search. The TensorFlow community is taking these same models from research to engineers across the world. This talk will cover some real wins in the community, along with practical lessons on how deep learning can be applied to the common problems in every organization.


Deep Learning with TensorFlow
Junling Hu, Founder, 

This 2-hour tutorial will give you an introduction to deep learning and TensorFlow. We will introduce the fundamental concepts of deep learning, including Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). Then we will introduce TensorFlow and its basics. You will get hands-on experience with TensorFlow, building a convolutional neural network in the class. (see below for more details)



Additional session topics are invited and may include (but are not limited to):
  • Models: Deep Learning, xgboost, clustering, training, deployment, feature engineering
  • Domains/verticals: Big Data, e-commerce, fraud detection, search, NLP/ontologies, trading/finance, Bitcoin, IT security, healthcare, environmental
  • Tools and technologies: Spark/MLlib, R, Python, PMML, Hadoop, GPU
  • Related areas: Visualization, Data Engineering, Career Opportunities, Hiring Roundtable, etc.
Session proposals are welcome from both individuals and companies. Please consider volunteering to speak or recommending people. You may submit your own session ideas on our Session Submission Form.
Here are some of the current submissions for 2016:
Speaker Name Session Title:  Description
Simon Frid Estimators – Managing and Versioning Machine Learning Models in Python:

Open-source libraries like scikit-learn, StatsModels and TensorFlow have made it very easy for developers and data scientists to implement cutting-edge algorithms in sandbox environments. However, most companies have production environments where there is a constant flow of new data, breaking changes with new product releases, and a need to use the best model for business purposes. Over the years, developers have built out sophisticated Continuous Integration systems to help version and manage their demanding production environments, but comparable systems for data science are still emerging. In this session, we will explore some open-source tooling and strategies that can help manage your real-world machine learning needs.

Boris Galitsky An NLP Tool for Efficient Content Compilation  (presented remotely from Moscow – Greg facilitates live Q&A):

We build a tool to assist in content creation by mining the web for information relevant to a given topic. This tool imitates the process of essay writing by humans: searching for topics on the web, selecting content fragments from the documents found, and then compiling these fragments to obtain a coherent text. The writing process starts with automatically building a table of contents from the list of key entities for the given topic, extracted from web resources such as Wikipedia.  For background, see the PDF paper, A Tool for Efficient Content Compilation

Greg Makowski Lifecycle of Model Management: 

This session discusses the lifecycle, from selecting and describing the best model, to putting it in production, to recognizing that it needs replacement and deciding how much effort to put into the replacement. You don’t have to trade off accuracy, generalization and description; you can have all three. For model description, sensitivity analysis and LIME – Local Interpretable Model-Agnostic Explanations (from KDD 2016) will be discussed.

Vanja Paunić Scalable computing in R on Spark (using Microsoft libraries, covered in KDD 2016)

My colleague Mario Inchiosa and I (Vanja Paunic) will present on scalable computing in R. A very rough outline of what we have planned is:

– R Server vs. base R comparison

– Demo 1: a) Data wrangling with SparkR, b) Modeling with ScaleR, c) Deployment operationalization using AzureML

– Demo 2: Parameter optimization in grouped time series forecasting with R Server on HDInsight cluster

Robert Benson IIoT Analytics (aka Digital Twin):

Robert Benson of Mitek Analytics will provide an overview of the Industrial Internet of Things (IIoT) field, IIoT Analytics, and digital twin. Rich Dost of General Electric Digital will provide an overview of developing digital twin analytics and an overview of the General Electric Predix platform for IIoT. Digital twin is a relatively new term that has become roughly synonymous with IIoT analytics. We will also provide an opportunity for people interested in IIoT and IIoT analytics to meet and learn from each other.

Athanasios Ladopoulos Personalizing Education using AI & Deep Learning:

Building an intelligent learning platform that learns how students learn and then teaches them back in the way they learn better and faster.

Dr. Iman Saleh Running Models and their Applications in Production (an overview of the Trusted Analytics Platform, or TAP):

In this session, Dr. Iman Saleh of Intel will lead a discussion about how models are created and deployed for production and made available to the applications that consume them. As a part of the discussion, Iman will give an overview of the Trusted Analytics Platform (TAP), an open source platform for building and deploying analytics solutions. The platform hosts machine learning scripts, models and the applications that use them. This session will include both presentation and discussion.

Ling Yao Open Source Model Development, Training & Productionizing:

In this session, Ling Yao of Intel will give an overview of the open source frameworks and libraries that Intel uses to develop and train models and to put them into production. Included will be a discussion about how Jupyter Notebooks and other code environments can be used to develop and test models using data processing frameworks such as Spark and GearPump, the HDFS file system, databases such as Postgres, Cassandra and Redis, and algorithms such as ARIMA for time series analysis, while exposing the resulting model to production applications. This session will include a 15 minute presentation followed by a 35 minute discussion.

Michal Wroczynski Classification vs. Information Extraction in AI – The different approaches in the quest for language understanding

Waleed Abdulla Traffic Sign Recognition with TensorFlow:

A tutorial for those who know the basics of machine learning but want to learn how to apply it to a practical problem. The chosen problem here is to recognize traffic signs in images taken from a moving car.

Sujee Maniyam Effective Graphs with R:  

R is a fantastic language for data analytics.  It also has pretty amazing graphing capabilities.  This session will introduce the graphing capabilities of R.  Come prepared with RStudio installed, so you can practice along.

Robert M Horton Introduction to ROC curves:

“Receiver Operating Characteristic” or “ROC” curves are an important and widely used tool for characterizing the performance of binary classifiers. This introductory/intermediate level presentation will help you build a better intuitive understanding of the ROC curve as a way to visualize the tradeoffs between sensitivity and specificity. Using examples and simulations, we will show how these curves are constructed, and develop a better understanding of what they are telling us. We’ll examine a variety of “funny looking” ROC curves to see what they reveal about the relationships between model predictions and observed outcomes, and contemplate AUC (the area under the curve) as a way to summarize an ROC curve in a single number (and explore some cases where AUC turns out to be a poor metric for model selection.)
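
As a rough illustration of how such a curve is constructed (a minimal sketch, not the presenter's material), the Python below sweeps a decision threshold from the highest score to the lowest and records the false positive rate (1 − specificity) against the true positive rate (sensitivity), then summarizes the curve with a trapezoidal AUC. The function names `roc_points` and `auc`, and the toy labels and scores, are assumptions made for this example.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr) for binary labels (1 = positive, 0 = negative).

    The threshold is swept from the highest score downward; tied scores are
    consumed together so they produce a single point.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    for fpr, tpr in roc_points(labels, scores):
        print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
    print(f"AUC={auc(roc_points(labels, scores)):.4f}")
```

In this toy data, 8 of the 9 positive/negative pairs are ranked correctly, which matches the trapezoidal AUC of 8/9. A model whose scores are all tied yields a single diagonal segment and an AUC of 0.5.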

Jessie invites one or more to speak about How is “real Machine Learning” done at giant tech companies?  (we won’t have this session without a speaker volunteer)

Machine learning is a hot buzzword these days, but there is not much information about how serious machine learning is done, aside from the giant companies like Google, Apple, Facebook, and Amazon, and information from these companies is often sketchy as well. I am happy to help moderate, but the key to making this session a success is for 3-4 people to step up and share how they do large-scale machine learning for real applications and systems.

Invite panelists A Data Scientist hiring manager’s view – how to find, interview, hire and retain Data Scientists?  (Will seek panelists during session proposals)

Greg will ask for a show of hands from people who have hired, or are looking to hire, data scientists, data mining or big data staff.  Greg can be a moderator, asking initial questions about the challenges hiring managers have and eliciting audience questions for the panel, and the panel can also ask the audience.  Possible questions may include: Q) What is the distribution of how you find candidates – HR, external recruiter, networking, LinkedIn, meetups?   Q) How would you advise people who want to move into data science?  Q) How much do hiring managers focus on finding a “unicorn”, fitting the most overlapping requirement circles, vs. building a team with different skills?  Q) For the audience – if you are an experienced DS, what would you want to ask or tell hiring managers?

Bill Paserman Data Science and Quantitative Finance:

What tools does data science provide the quant?    Can a Neural Network help calculate parameters for established valuation methods?  Funds based on Sentiment analysis have had little general success, but have made money in special situations (e.g.  Brexit).  What other examples are there?  In this session participants will discuss the success of Data Science-based quant approaches they have used or heard about.

Bob Kirby Knowledge Representation Goals and Requirements:

Knowledge Representation choices drive the general purpose computing needed for Artificial Intelligence.  Classes of use cases derive requirements for Knowledge Representation.  A normative evaluation of Knowledge Representation considers goals and those requirements regardless of the inspiration for its client algorithms, such as studying the brain or noticing what smart people do.  Knowledge Representation is evaluated for associative (sub-symbolic) approaches and logic-based approaches.  Issues for crowdsourcing, natural language, and uncertainty get added attention.

Here are more details for our 2016 Session submissions.
See an example of our 2015 topics for more examples.


(subject to change)
8:15 Tutorial registration
8:50 – 10:45 Tutorial: Deep Learning with TensorFlow ($60), by Junling Hu. See below for class description
10:30 Unconference Registration ($10)
11:00 Unconference Kickoff, welcome and Gold+ sponsor presentations
Keynote: Rajat Monga, Engineering Director of the Google TensorFlow group.  “Deep Learning for the 99 percent”
12:25 Session proposals, voting, and assignment of sessions to time slots and to rooms sized to match the voting
(Inviting proposals now) – see some of the existing proposals above.
1:15 Lunch, posting of session matrix
2:00 Session time slot 1 (~ 4-6 concurrent sessions)
3:00 Session time slot 2 (sessions are 50 minutes, with a 10 minute break between sessions)
4:00 Session time slot 3
5:00 Session time slot 4
6:00 Wrap up, invite session attendees to share highlights so you can hear about sessions you missed.
Optional morning tutorial ($60 includes Unconference):

Deep Learning with TensorFlow

This 2-hour tutorial will give you an introduction to deep learning and TensorFlow. We will introduce the fundamental concepts of deep learning, including Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). Then we will introduce TensorFlow and its basics. You will get hands-on experience with TensorFlow, building a convolutional neural network in the class.

The outline of the tutorial

  1. Introduction to deep learning (15 min)
  2. Brief intro to Recurrent Neural Network (15 min)
  3. Convolutional Neural Network (30 min)
  4. TensorFlow 101 (15 min)
  5. Hands-on exercise (45 min)

You will have access to an AWS machine where TensorFlow is pre-installed by the instructor. You will access the machine through your browser, where you can run TensorFlow (in an IPython notebook). The data and sample lab code are pre-installed.

Who this course is for:  This course is for anyone who is interested in an introduction to deep learning and TensorFlow.

Prerequisites for this course:  We will use Python in this tutorial. Programming experience with Python is highly recommended. You can still follow the tutorial and our sample code even if you don’t have Python experience.

What to bring:  Your Laptop, charged for 2 hours

Instructor bio:  Junling Hu, a leading expert in artificial intelligence and data science, founder of, has built natural language applications based on deep learning. She is the author of an upcoming book on artificial intelligence and deep learning. Prior to her current effort, she was Director of Data Mining at Samsung, building large-scale data mining solutions for business. She has also led data science teams at PayPal and eBay, creating enterprise solutions based on recommender systems, text mining and predictive modeling.

Venue Sponsor


Reimagine Your Business with Big Data Advanced Analytics

Now is the time to invest in the right technology foundation for your data center—one that will take you from data overload to data insights. Intel is enabling the cost-effective adoption of big-data-and-analytics-driven business models that fuel innovation.

With Intel® architecture-based advanced analytics solutions, you can efficiently and effectively capture, process, analyze, and store vast amounts of data of all types. Built in partnership with industry leaders in big data and analytics software, our highly available, performance-optimized, open-standards-based solutions will support your most ambitious analytics-driven initiatives. 

Platinum Sponsor

UCSC Extension in Silicon Valley

We offer an accredited, convenient, and attractively priced alternative to degree programs, serving the advanced professional education needs of Silicon Valley and beyond. Each year, more than 10,000 adults who live and work in the greater South Bay area study here to earn University of California certified credentials that are widely recognized in a range of industries. We are the region’s leading educator of professionals in more than 40 areas of expertise that are in high demand among Silicon Valley employers.


KDD provides the premier forum for advancement and adoption of the “science” of knowledge discovery and data mining. KDD encourages:

  • Research in KDD (through annual research conferences, newsletter and other related activities)
  • Adoption of “standards” in the market in terms of terminology, evaluation, methodology
  • Interdisciplinary education among KDD researchers, practitioners, and users
  • KDD activities include the annual Conference on Knowledge Discovery and Data Mining and the SIGKDD Explorations Newsletter
  • The KDD 2017 conference will be in Halifax, Canada on August 14-17, 2017

DOBE China

Creativity and innovation represent China’s future productivity. The development of cultural, creative and technological innovation enterprises will dominate the development of the Chinese economy. “Smart Companies”, which are light, flexible and ambitious, are the companies DOBE aims to serve over the long term. The core competencies of a “smart company” can be summarized as light assets, small size, high intelligence, and creative and innovative thinking.

The creation of the “Smart Circle” is intended to connect smart companies together for their further development. DOBE believes that both “fresh” bushes and “giant” trees belong in the forest of the economy. A small but refined operating model, advanced technology, and creative thinking are crucial factors for smart companies to stand out.