Tutorials, Invited Talks

Usually, ICDATA hosts several tutorials and invited talks focused on data science topics, such as Conformal Prediction, Big Data Analytics, etc.

As soon as tutorials/talks are negotiated with speakers and confirmed, they will be published here.

All workshops, tutorials etc. at CSCE are free for all CSCE attendees.

The following talks/tutorials were held in 2019:

Tutorial 1

Speaker: Andrew Johnston
Mandiant (Consultant)
Topic/Title: Tutorial on "What's Yours is Mine: How Modern Attackers are Stealing Your Data"
Date & Time: July 30, 06:00 - 08:00pm
Location: Galleria B
Description: This talk explores the state of modern cyberattack techniques against well-secured assets. Using examples drawn from real-world compromises, it examines modern attack techniques and tactics, with an emphasis on how attackers evade technical defenses and law enforcement. We will also explore how advanced attack groups, such as nation states, evade "next-generation" defenses that use machine learning and anomaly detection. Although attackers are becoming more sophisticated, recommendations for securing large organizations and personal systems will be presented and discussed.
Short Bio: Andrew Johnston is a proactive consultant with Mandiant, a division of FireEye. He uses real-world attacker tools and techniques to identify weaknesses in enterprise security before attackers can find them. Prior to joining Mandiant, Andrew worked with the FBI in the Cyber and Counterterrorism divisions. Andrew holds a Bachelor's degree in Computer Science and Applied Mathematics from Fordham University and is pursuing a Master's degree in Cybersecurity at Fordham University.
Slides: Will be published here as soon as they are provided by the speaker.

Tutorial 2

Speaker: Ulf Johansson
Department of Computer Science and Informatics, Jönköping University, Sweden, ulf.johansson@ju.se
Topic/Title: Tutorial on "Predicting with confidence – Conformal Prediction and Venn Predictors"
Date & Time: July 29, 03:40 - 05:40pm
Location: Galleria B
Description: How good is your prediction? In risk-sensitive applications, it is crucial to be able to assess the quality of a prediction, but traditional classification and regression models don't provide their users with any information regarding the trustworthiness of a prediction.
Conformal predictors, on the other hand, are predictive models that associate each of their predictions with a precise measure of confidence. Given a user-defined significance level ε, a conformal predictor outputs, for each test instance, a prediction region (for classification a label set, and for regression a real-valued interval) that, under relatively weak assumptions, contains the true target value with probability 1-ε. In other words, given a significance level ε, the error rate of a conformal predictor will be exactly ε in the long run. Since all conformal predictors have this remarkable property, called validity, the main goal becomes minimizing the size of the prediction regions, thus maximizing their informativeness.
The conformal prediction framework allows any traditional classification or regression model to be transformed into a confidence predictor with very little extra work, both in terms of implementation and computational complexity.
For classification, the definition of validity in conformal prediction is often perceived as somewhat counter-intuitive, since the guarantee only applies a priori, i.e., once we have seen a specific prediction, the probability for that prediction to be wrong is no longer ε. With this in mind, we recommend Venn predictors as a very strong alternative to conformal prediction for classification.
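To make the conformal prediction framework described above more concrete, here is a minimal sketch, in plain Python, of one common variant: split (inductive) conformal regression, using the absolute residual |y - ŷ| as the nonconformity measure. The function name `conformal_interval` and the toy model ŷ = 2x are hypothetical illustrations, not taken from the tutorial's own Python or KNIME material. Note that this non-smoothed variant is conservative: its long-run error rate is at most ε rather than exactly ε.

```python
import math
from typing import Callable, Sequence, Tuple


def conformal_interval(
    predict: Callable[[float], float],
    cal_x: Sequence[float],
    cal_y: Sequence[float],
    epsilon: float,
) -> Callable[[float], Tuple[float, float]]:
    """Split (inductive) conformal regression with |y - y_hat| as nonconformity.

    `predict` is any already-trained point predictor; it must have been fitted
    on data disjoint from the calibration set (cal_x, cal_y).
    """
    if not 0 < epsilon < 1:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    # Nonconformity score of each calibration example, sorted ascending.
    scores = sorted(abs(y - predict(x)) for x, y in zip(cal_x, cal_y))
    n = len(scores)
    # Rank of the calibration score that bounds the interval half-width.
    k = math.ceil((n + 1) * (1 - epsilon))
    # Too few calibration examples for this epsilon: only the whole real
    # line is guaranteed to be valid.
    half_width = scores[k - 1] if k <= n else math.inf

    def interval(x: float) -> Tuple[float, float]:
        # Prediction region: the point prediction widened by half_width.
        y_hat = predict(x)
        return (y_hat - half_width, y_hat + half_width)

    return interval


# Usage with a hypothetical toy "model" y_hat = 2x.
interval = conformal_interval(lambda x: 2 * x, [0, 1, 2, 3], [0.5, 1.0, 4.25, 6.0], epsilon=0.5)
print(interval(1))  # (1.5, 2.5)
```

The underlying model is trained once and only its residuals on held-out data are inspected, which is why wrapping an existing regressor adds so little implementation and computational cost.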
Venn predictors are multi-probabilistic predictors with proven validity properties. The standard impossibility result for probabilistic prediction is circumvented in two ways: (i) multiple probabilities are output for each label, one of which is the valid one, and (ii) the statistical tests for validity are restricted to calibration.
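The multi-probabilistic output described above can be illustrated with a minimal sketch of an inductive Venn predictor in plain Python. It assumes the taxonomy (the rule that groups examples into categories) is the label predicted by some underlying classifier, a common choice but an assumption here; categories are passed in precomputed, and the name `venn_predict` is hypothetical. For each hypothetical label of the test object, the label frequencies in its category form one probability distribution; the spread across these distributions gives a lower and upper probability per label.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def venn_predict(
    cal_categories: Sequence[Hashable],
    cal_labels: Sequence[Hashable],
    test_category: Hashable,
    labels: Sequence[Hashable],
) -> Dict[Hashable, Tuple[float, float]]:
    """Inductive Venn predictor over precomputed taxonomy categories.

    Returns, for every label, the (lower, upper) bounds of the probabilities
    assigned to it across the multiprobability prediction.
    """
    # Labels of calibration examples that fall into the test object's category.
    same = Counter(
        lab for cat, lab in zip(cal_categories, cal_labels) if cat == test_category
    )
    size = sum(same.values()) + 1  # +1 for the test object itself
    # One empirical distribution per hypothetical label of the test object.
    distributions: List[Dict[Hashable, float]] = []
    for hypothetical in labels:
        counts = same.copy()
        counts[hypothetical] += 1
        distributions.append({lab: counts[lab] / size for lab in labels})
    # The spread across hypotheses gives one probability interval per label.
    return {
        lab: (min(d[lab] for d in distributions), max(d[lab] for d in distributions))
        for lab in labels
    }


# Categories = labels an underlying classifier predicted (assumed taxonomy).
print(venn_predict([1, 1, 1, 0], [1, 1, 0, 0], 1, [0, 1]))
# {0: (0.25, 0.5), 1: (0.5, 0.75)}
```

The width of each interval shrinks as the category grows, since adding the single test object moves the frequencies by at most 1/(m+1) for a category with m calibration examples.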
Hence, conformal prediction and Venn predictors are important tools that every data scientist should carry in their toolboxes, since they represent a straightforward way of associating the predictions of any predictive machine learning algorithm with confidence measures.
This tutorial aims to provide an introduction and an example-oriented presentation of the conformal prediction and Venn prediction frameworks, directed at machine learning researchers and professionals. The goal of the tutorial is to provide attendees with the knowledge necessary for implementing confidence predictors, and to highlight current research on the subject. The tutorial will contain examples of using confidence predictors in Python and KNIME.

The intended audience is machine learning researchers and professionals at intermediate to expert level. The participants are expected to have a good understanding of machine learning and data mining.
Short Bio: Prof. Ulf Johansson holds an M.Sc. in Computer Engineering and Computer Science from Chalmers University of Technology, and a PhD in Computer Science from the Institute of Technology, Linköping University, Sweden. Since 2016, he has been a full professor in computer science at the School of Engineering, Jönköping University.
Ulf Johansson’s main area of expertise is machine learning algorithms for data analytics. Most of his research is applied, and often co-produced with industry. Application areas include drug discovery, health science, marketing, high-frequency trading, game AI, sports analytics, sales forecasting and gambling. Prof. Johansson has published extensively in the fields of artificial intelligence, machine learning, soft computing and data mining. He is also a regular program committee member of the leading conferences in computational intelligence and machine learning. During the last few years, he has published several papers on conformal prediction and Venn predictors, some in top-tier venues such as the Machine Learning journal and the ICDM conference.

Tutorial 3

Speaker: Yuri Demchenko
University of Amsterdam, The Netherlands
Topic/Title: Tutorial on "Developing Data Science and Analytics related competences and professional skills: Practical recommendations to Data Science job seekers, educators and team managers"
Date & Time: July 30, 03:40 - 05:40pm
Location: Galleria B
Description: Data Science is an emerging field of science that requires a multidisciplinary approach and is strongly linked to Big Data and data-driven technologies, which have had a transformational effect on all research and industry domains. There is a critical gap in the current supply of Data Scientists and other Data Science and Analytics (DSA) enabled professions in research, industry and government. Thousands of Data Scientist and related vacancies remain unfilled for months. Companies are looking for specialists who will help them become data driven and benefit from new technologies ranging from Big Data, supercomputing and IoT to machine learning and cognitive technologies. However, getting a Data Science position is not easy and requires extensive knowledge and experience in the many areas that make up modern Data Science.

The education and training of Data Scientists currently lacks a commonly accepted, harmonized instructional model that reflects by design the whole lifecycle of data handling in modern, data driven research and the digital economy.

To address this problem, the tutorial will start with a definition of the Data Scientist based on the extended NIST SP1500-1 definition: “A Data Scientist is a practitioner who has sufficient knowledge in the overlapping regimes of expertise in business needs, domain knowledge, analytical skills, and programming and systems engineering expertise to manage the end-to-end scientific method process through each stage in the big data lifecycle, till the delivery of expected scientific and business value to science or industry.”

The competences that Data Scientists need to work successfully in different environments in industry and research, and throughout their whole career path, include:
• Data Analytics including statistical methods, Machine Learning and Business Analytics
• Data Science Engineering: software and infrastructure
• Data Management and Governance
• Research Methods and Project Management
• Subject Domain competences and knowledge

This tutorial introduces the EDISON Data Science Framework (EDSF), which provides a foundation for defining the Data Science profession. The EDSF includes the following core components: Data Science Competence Framework (CF-DS), Data Science Body of Knowledge (DS-BoK), Data Science Model Curriculum (MC-DS), and Data Science Professional profiles (DSP profiles). The MC-DS is built on the CF-DS and DS-BoK: Learning Outcomes are defined based on CF-DS competences, and Learning Units are mapped to Knowledge Units in the DS-BoK. In turn, Learning Units are defined based on the ACM Computing Classification System (CCS2012) and reflect the course names typically used by universities in their current programmes.
The EDSF also defines the Data Science professional skills and 21st Century skills that are generally required by modern data driven companies.

For educators, the tutorial provides examples of how the proposed EDSF can be used to design effective Data Science curricula, assess individual competences and build Data Science teams.

For job seekers, the tutorial will advise on how to interpret a vacancy description, understand what the company actually needs, and successfully manage the job application and interview.
Short Bio: Yuri Demchenko is a Senior Researcher in the System and Network Engineering group at the University of Amsterdam. He graduated from the National Technical University of Ukraine "Kiev Polytechnic Institute", where he also received his PhD (Cand. of Science) degree. His main research areas include Data Science and Data Management, Big Data, Infrastructure and Technologies for Data Analytics, DevOps and cloud-based software development, general security architectures, and distributed access control infrastructure for cloud-based services and data-centric applications. He is currently involved in the European projects GEANT4, MATES and FAIRsFAIR, where he develops different elements of cloud-based infrastructures for scientific research and works on issues related to Data Science and digital skills development. Yuri coordinated the EU-funded EDISON project (2015-2017), which developed the EDISON Data Science Framework (EDSF), a conceptual foundation and practical basis for building the Data Science profession. His recent research also extends into data economics and open data market models.
He actively contributes to standardisation activities at RDA, OGF, IETF, NIST and CEN on defining the Big Data Architecture Framework, Data Science competences, and data properties as economic goods.

Invited Talk 1

Speaker: Peter Geczy
National Institute of Advanced Industrial Science and Technology (AIST), Japan
Topic/Title: Data Science: An Interdisciplinary Perspective
Date & Time: July 29, 11:00am - 12:20pm
Location: Galleria B
Description: We are in the midst of a digital data explosion that is notably influencing nearly every part of contemporary society. Vast quantities of data are generated, transmitted and harvested daily across the globe. Data-aware commercial and governmental organizations have been collecting large amounts of data for various purposes. The primary drivers of data collection are value and knowledge extraction. Commercial value extraction from data has been a main target for businesses, while actionable knowledge extraction has been the focus of a broader range of organizations. Both tasks are challenging and present numerous difficulties as well as opportunities. Data Science is an emerging field that attempts to address such challenges in an interdisciplinary manner. We will shed light on the pertinent interdisciplinary aspects of data science, spanning the interests of a spectrum of organizations.
Short Bio: Dr. Peter Geczy holds a senior position at the National Institute of Advanced Industrial Science and Technology (AIST). His recent research interests are in information technology intelligence. This multidisciplinary research encompasses development and exploration of future and cutting-edge information technologies. It also examines their impacts on societies, organizations and individuals. Such interdisciplinary scientific interests have led him across domains of technology management and innovation, data science, service science, knowledge management, business intelligence, computational intelligence, and social intelligence. Dr. Geczy has received several awards in recognition of his accomplishments. He has been serving on various professional boards and committees, and has been a distinguished speaker in academia and industry. He is a senior member of IEEE and has been an active member of INFORMS and INNS.
Slides: Will not be published here.

Invited Talk 2

Speaker: Diego Galar
Luleå University of Technology, Sweden
Topic/Title: Digital twins development and deployment in bottom up approach
Date & Time: July 31, 03:40 - 05:40pm
Location: Galleria B
Description: The technology and operation of assets are complex, but the adoption of IoT and its use with OT (operational technology) platforms enables the use of ‘digital twins’ to manage, monitor and maintain assets. The digital twin connects complex assets and their OT systems to an IT environment by capturing data to monitor performance, deterioration and failure, location and safety compliance, as well as remote monitoring systems for scheduling and asset utilisation.
Through data fusion, digital twins become virtual and digital representations of physical entities or systems. However, the clone created with IT and OT convergence to forecast failures, demand, customer behaviour, or degradation of assets is not complete since it lacks engineering knowledge. This happens because the digital engineering models developed during the engineering phase of projects do not typically play a role in the operational phase.
Therefore, digital transformation demands that engineering technology (ET) be included in the IT/OT convergence process as the importance of integrating product design increases. For that purpose, digital twins must be complemented by other information to assess the overall condition of the whole fleet/system, including information from design and manufacturing, as this obviously contains the physical knowledge of assets.
Short Bio: Dr. Diego Galar is Professor of Condition Monitoring in the Division of Operation and Maintenance Engineering at LTU, Luleå University of Technology, where he coordinates several H2020 projects related to different aspects of cyber-physical systems, Industry 4.0, IoT and industrial Big Data. He was also involved in the SKF UTC centre in Luleå, focused on SMART bearings. He is also actively involved in national projects with Swedish industry and in projects funded by Swedish national agencies such as Vinnova. He is also a principal researcher at Tecnalia (Spain), heading the Maintenance and Reliability research group within the Division of Industry and Transport. He has authored more than four hundred journal and conference papers, books and technical reports in the field of maintenance. He also serves on editorial boards and scientific committees, chairs international journals and conferences, and actively participates in national and international committees for standardization and R&D on reliability and maintenance. In the international arena, he has been a visiting professor at the Polytechnic of Braganza (Portugal), the University of Valencia, NIU (USA) and the Universidad Pontificia Católica de Chile. Currently, he is a visiting professor at the University of Sunderland (UK), the University of Maryland (USA), and Chongqing University in China.
Slides: Slides D. Galar