Details

Added by on April 21, 2017


Microsoft Azure provides a rich canvas for data science research and execution.

Two of the key components of the data science toolkit are Azure Notebooks and the Azure Machine Learning service. In this presentation, Nick will introduce the core concepts of data science and show how a data set can be imported, analysed and used as the basis for a predictive model using these two Azure components.
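As a rough illustration of that import, analyse and predict workflow, here is a minimal Python sketch that uses only the standard library. The CSV columns (`attendees`, `subscriptions`), the toy values and the choice of a simple least-squares line are assumptions made for illustration. They are not the data set or the model used in the talk.

```python
import csv
import io
from statistics import mean

# Hypothetical toy data set (assumption); in a notebook this would usually be read from a file.
RAW_CSV = """attendees,subscriptions
20,2
35,4
50,5
65,7
80,8
"""

def load_rows(text):
    """Import: parse CSV text into a list of (x, y) float pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(r["attendees"]), float(r["subscriptions"])) for r in reader]

def fit_line(rows):
    """Model: ordinary least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

rows = load_rows(RAW_CSV)
# Analyse: a simple summary statistic before modelling
print("mean subscriptions:", mean(y for _, y in rows))
# Predict: fit the line, then apply it to an unseen input
slope, intercept = fit_line(rows)
print("predicted subscriptions for 100 attendees:", slope * 100 + intercept)
```

In Azure Machine Learning the fitting step would normally be done with one of the service's built-in algorithms rather than by hand; the sketch only shows the shape of the workflow.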

Azure Notebooks are an implementation of the Jupyter notebook technology (formerly IPython notebooks), which allows code and Markdown to be written in a browser-based document and executed interactively. Jupyter notebooks attach to a back-end kernel for code execution, so they can support much greater computational loads and richer functionality than client-side JavaScript technology.

By supporting the mixing of documentation and live code, Azure Notebooks are an ideal technology for conducting and sharing data exploration exercises, and they avoid a number of problems that arise when the analysis and its documentation are maintained separately. Azure Notebooks currently support Python 2, Python 3, R and F#.

Azure Machine Learning is part of the Cortana Intelligence suite, and allows for the interactive creation of experiments supporting a wide range of machine learning algorithms. The machine learning experiments can be evaluated for effectiveness, and easily turned into web service endpoints for machine learning prediction based on live data.
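A minimal sketch of how a client might call such an endpoint, assuming the request-response JSON shape used by classic Azure Machine Learning Studio web services: an `Inputs` object keyed by input name, each holding `ColumnNames` and `Values`, plus a `GlobalParameters` object, sent with the API key as a `Bearer` token. The URL, key, input name `input1` and column name are hypothetical placeholders; check the exact format against the API help page generated for your own service. The function only builds the request, so it can be inspected without network access.

```python
import json
import urllib.request

# Hypothetical placeholders (assumptions): copy the real values from the service's API help page.
SERVICE_URL = ("https://example.invalid/workspaces/WORKSPACE_ID/services/SERVICE_ID/"
               "execute?api-version=2.0&details=true")
API_KEY = "YOUR-API-KEY"

def build_scoring_request(column_names, rows, url=SERVICE_URL, api_key=API_KEY):
    """Build (but do not send) a POST request in the assumed request-response JSON shape."""
    body = {
        "Inputs": {
            # "input1" is assumed to be the name of the web service input
            "input1": {"ColumnNames": column_names, "Values": rows},
        },
        "GlobalParameters": {},
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")

req = build_scoring_request(["attendees"], [["100"]])
# Against a live service, urllib.request.urlopen(req) would send it and return the scored result.
```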

This presentation assumes no prior experience with machine learning, and will introduce the topics from a traditional software developer's perspective.

About the speaker(s)

Mr Data Science
Nick is a solution architect at SSW with a 20-year career in software engineering, primarily focussed on large-scale software projects in the financial industry. He has written a number of books and dozens of articles on .NET, and has received Microsoft MVP awards for the .NET platform, C# and C++. He has a keen interest in data science, and sees the possibility of great productivity gains from a deeper melding of the traditional software development world with emerging data science disciplines.


1 Comment

  • DavidLean 1 month ago

    Hi Nick,
    Nice talk.
    I note the top 5-7 most popular talks are generating a significant percentage of your data points.
    This could easily have the effect of masking any trends you might find in the rest of the data you examine, e.g. subscriptions on quiet vs active weeks.

    In these circumstances it is often useful to filter them into their own data set, then do your analysis on three different sets of data: 1. the total data set, 2. the extremely popular talks, and 3. the data without the extremely popular talks. This is similar to using two or more sets of training data.

    Another alternative is to add an attribute to each talk. Call it "session popularity" and create some discrete levels (very popular, popular, etc.). Then run it through an automatic cluster detection algorithm to see if it is a significant factor in some trends / groups but not in others.

    I appreciate that you acknowledged my comment on the night and said the data was there if I wanted to do it. Unfortunately I'm a little busy at present. Given that you and Adam are looking to make business decisions around this in the short term, it may be faster for you to do the analysis.
    Cheers
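The filtering and clustering ideas in the comment above can be sketched in Python roughly as follows. Everything here is an assumption for illustration: the talk IDs and (views, subscriptions) figures are invented toy values, the popularity cut-offs are arbitrary, and a minimal k-means (Lloyd's algorithm) stands in for whichever automatic cluster detection algorithm would actually be used.

```python
import math
from collections import Counter

# Hypothetical toy records (assumption): talk id -> (views, subscriptions gained)
TALKS = {
    "a": (900, 30), "b": (750, 28), "c": (60, 5),
    "d": (45, 4), "e": (40, 12), "f": (30, 11), "g": (25, 3),
}

def split_by_popularity(talks, top_n):
    """Return (all talks, the top_n most-viewed talks, all other talks)."""
    ranked = sorted(talks, key=lambda t: talks[t][0], reverse=True)
    popular = {t: talks[t] for t in ranked[:top_n]}
    rest = {t: talks[t] for t in ranked[top_n:]}
    return dict(talks), popular, rest

def popularity_level(views):
    """Discretise views into a 'session popularity' label (cut-offs are assumptions)."""
    if views >= 500:
        return "very popular"
    if views >= 50:
        return "popular"
    return "normal"

def kmeans(points, k, iterations=25):
    """Minimal k-means (Lloyd's algorithm); returns a cluster index for each point."""
    centres = list(points[:k])  # naive initialisation: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centre
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        # Move each centre to the mean of its members (keep it if the cluster is empty)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels

everything, popular, rest = split_by_popularity(TALKS, top_n=2)

# Attach the discrete popularity attribute to every talk
levels = {t: popularity_level(v) for t, (v, _) in everything.items()}

# Cluster all talks on (views, subscriptions) and tally popularity levels per cluster
ids = list(everything)
labels = kmeans([everything[t] for t in ids], k=2)
for c in sorted(set(labels)):
    print(c, Counter(levels[t] for t, lab in zip(ids, labels) if lab == c))

# Cluster only the less popular talks on subscriptions, where the popular talks no longer dominate
rest_ids = list(rest)
rest_labels = kmeans([(rest[t][1],) for t in rest_ids], k=2)
print(dict(zip(rest_ids, rest_labels)))
```

With these toy values the clusters over all talks simply separate the very popular talks from the rest, while clustering the remaining talks on their own separates higher-subscription talks from lower-subscription ones, which is the kind of masking effect the comment describes.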