PyData Eindhoven 2022

08:00
08:00
50min
Registration
Auditorium
08:00
50min
Registration
Ernst-Curie
08:00
50min
Registration
Planck
08:50
08:50
10min
Opening notes
Auditorium
09:00
09:00
30min
AI Ethics in the Wild - Welcome to the Jungle
Marc van Meel

All organizations will need to become data-driven organizations, or they will go the way of the dinosaur. However, AI scales risk to organizational brand and profit. Trustworthy and Ethical AI are no longer luxuries, but business necessities. Let's explore together, why bias is not exclusive to AI, why technology has never been neutral and why Data Science has little to do with Science!

Keynote
Auditorium
09:35
09:35
15min
Small break
Auditorium
09:35
15min
Small break
Ernst-Curie
09:35
15min
Small break
Planck
09:50
09:50
30min
A Tour of the Many DataFrame Frameworks
Harizo Rajaona

Processing tabular data has been one of the most common operations for data scientists and engineers for a while now. A few years ago, pandas was the single tool of reference for it, but is that still true today?
In this talk, we will review and compare the existing dataframe frameworks to see how they solve the challenges of performance, scalability and user experience.

Auditorium
09:50
30min
Predictive Maintenance at ASML
Anjan Prasad Gantapara, Hamideh Rostami

In the chip industry, time is money. Customers of ASML’s lithography systems expect high uptimes. But expected and unexpected maintenance is part of that equation, sometimes requiring production to be halted temporarily.

In this presentation, we show you how we are building and deploying Machine Learning models to predict maintenance actions within the next three months. Our work helps to boost productivity, maximize system utilization and reduce unexpected workload for ASML’s customer support.

Planck
09:50
30min
Thompson sampling for personalising a car brand's advertisements
Nico van Engelenhoven, Julien Hamerlinck

With targeted ads becoming more prevalent in the digital landscape, we share how we used Thompson sampling and a Hierarchical Bayesian Algorithm that makes its own decisions and serves the right ad to the right audience.
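
As a rough illustration of the technique named in this abstract, Thompson sampling can be sketched with a plain Beta-Bernoulli model. This is not the speakers' implementation and leaves out the hierarchical part entirely; the function names, click rates and round count are assumptions made up for the example.

```python
import random


def thompson_pick(successes, failures, rng):
    """Sample a click rate per ad from its Beta posterior and pick the highest."""
    samples = [
        rng.betavariate(s + 1, f + 1)  # Beta(1, 1) prior plus observed counts
        for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=samples.__getitem__)


def simulate(true_rates, rounds=3000, seed=0):
    """Serve ads for a number of rounds against hypothetical true click rates."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        ad = thompson_pick(successes, failures, rng)
        if rng.random() < true_rates[ad]:  # simulated click
            successes[ad] += 1
        else:
            failures[ad] += 1
    return successes, failures


if __name__ == "__main__":
    # Hypothetical click rates for three ads; the third is clearly the best.
    clicks, misses = simulate([0.05, 0.10, 0.30])
    print("times served per ad:", [c + m for c, m in zip(clicks, misses)])
```

Because each ad is chosen in proportion to how likely it is to be the best given the data so far, the sketch keeps exploring weak ads early on and gradually concentrates traffic on the strongest one.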

Ernst-Curie
10:25
10:25
30min
Coffee break
Auditorium
10:25
30min
Coffee break
Ernst-Curie
10:25
30min
Coffee break
Planck
10:55
10:55
30min
FuzzyTM: a Python package for fuzzy topic models
Emil Rijcken

We present FuzzyTM, a Python library for training fuzzy topic models and creating topic embeddings for downstream tasks. Its modular design allows researchers to modify each software element and for future methods to be added. Meanwhile, the user-friendly pipelines with default values allow practitioners to train a topic model with minimal effort.

Planck
10:55
30min
Is it a predictive model? Is it causal inference? Well... It is running a greenhouse.
Ruben Mak

In this talk I hope to convince you that models are not either predictive or causal, but both perspectives should be combined to solve real world problems. I will use a concrete example of how we automate irrigation in greenhouses at Source.

Auditorium
10:55
30min
Using Deep Learning to Reduce Flight Delays at Schiphol Airport
Santiago Ruiz, Tosca van Meer

Inefficiencies in the flight preparation process (turnaround) account for around 30% of the total delays at Royal Schiphol Group (the Amsterdam airport). This process has been a black box and, for this reason, it was quite hard to improve. To open the turnaround black box, Schiphol has developed technology based on computer vision using deep learning that detects many different turnaround-related tasks from images that are streamed in real time from cameras located at the aircraft ramps. In this session, we will explain how this project started, the technologies that we have applied, and the business impact generated by enabling the airport to reduce delays.

Ernst-Curie
11:30
11:30
5min
Small break
Auditorium
11:30
5min
Small break
Ernst-Curie
11:30
5min
Small break
Planck
11:35
11:35
30min
Data Storytelling through Visualization
Marysia Winkels

Data is everywhere. It is through analysis and visualization that we are able to turn data into information that can be used to drive better decision making. Out-of-the-box tools will allow you to create a chart, but if you want people to take action, your numbers need to tell a compelling story. Learn how elements of storytelling can be applied to data visualization.

Auditorium
11:35
30min
Lowering the barrier for ML monitoring
Wesley Boelrijk

Building and fine-tuning models is exciting, but how do you know your model keeps performing in the way you carefully designed it? Bringing your model to production without adding any monitoring is like flying on autopilot, but blindfolded.

Adding a mature monitoring setup to your model deployments can be a daunting task that is often pushed to the bottom of the to-do list, or put off entirely. How can we, Data Scientists and ML Engineers, introduce monitoring earlier in the MLOps process and make it part of our deployments right from the start? This talk offers a practical setup to implement ML monitoring in your project using Prometheus and other open-source tools.

Ernst-Curie
11:35
30min
Turning your Data/AI algorithms into full web apps in no time with Taipy
Vincent Gosselin

In the Python open-source ecosystem, many packages are available that cater to:
- the building of great algorithms
- the visualization of data
- back-end functions

Despite this, over 85% of Data Science Pilots remain pilots and do not make it to the production stage.
With Taipy, Data Scientists/Python Developers will be able to build great pilots as well as stunning production-ready applications for end-users.

Taipy provides two independent modules: Taipy GUI and Taipy Core.

In this talk, we will demonstrate how:
- Taipy GUI goes way beyond the capabilities of the standard graphical stack: Streamlit, Dash, etc.
- Taipy Core is simpler yet more powerful than the standard Python back-end stack: Airflow, MLFlow, etc.

Planck
12:10
12:10
75min
Lunch break
Auditorium
12:10
75min
Lunch break
Ernst-Curie
12:10
75min
Lunch break
Planck
13:25
13:25
30min
AI for Good - Then and Now
Marijn Markus

Using data in new and unexpected ways to solve real problems for real people - from farmers in Africa to refugees and the war in Ukraine

Keynote
Auditorium
14:00
14:00
5min
Small break
Auditorium
14:00
5min
Small break
Ernst-Curie
14:00
5min
Small break
Planck
14:05
14:05
30min
Bulk Labelling Techniques
Vincent Warmerdam

Let's say you've got some unlabelled data and you want to train a classifier. You need annotations before you can model, but because you're time-bound you must stay pragmatic. You only have an afternoon to spend. What would you do?

Auditorium
14:05
30min
How to create a Devcontainer for your Python project 🐳
Jeroen Overschie

Devcontainers are an open-source specification that allows you to connect your IDE to a running Docker container and develop right inside it. This has numerous advantages. Because the dev environment is now formally defined, it is reproducible. This means others can easily reproduce your dev environment, too! This makes it much easier for others to join in on your project, and stay updated with changes to the environment.

In this talk, you will learn: why you might want to use a Devcontainer for your project (or when not 😉), what exactly a Devcontainer is, and how you can build one for your Python project 🐍.

Ernst-Curie
14:05
30min
Significant Roadblocks to Usefulness for Jupyter Notebooks and a Recipe to Fix them
Greg Michaelson

The most popular data science development tools have largely been developed by academics as scratch pads for interactive data exploration. Jupyter notebooks, for instance, were developed 20 years ago at Berkeley (they were called IPython notebooks at the time). Because of their flexibility and interactivity, these tools have become widespread amongst coding data scientists. More recently, GUI-based tools have begun to be popular. They reduce the technical load on the user, but typically lack much-needed flexibility and interoperability.

Both avenues of innovation are wildly inadequate for modern data science development. GUI-based tools are typically too expensive, too restrictive, and too closed. The development of automated machine learning tools only made this problem worse, with dozens of software startups urging business analysts to start building machine learning solutions, often with questionable results and even more questionable customer retention metrics. On the other hand, notebook-based solutions are typically too error-prone, too loose, and too isolated to be sufficient. The result is intractable challenges around collaboration, communication, and deployment. The most recent entrants into the notebook space have only marginally improved the experience without fixing the underlying flaws.

This talk discusses the fundamental flaws in the way these tools have been developed and how they currently function. Advancement in this space will require reworking the architecture and functionality of these tools at some of the most basic levels. These fixes include things like multiprocessing capabilities; real-time collaboration tools; safe, consistent code execution; easy API deployment; and portable communication tools. Future innovation in the data science development experience will have to tackle these problems and more in order to be successful.

Planck
14:40
14:40
30min
Coffee break
Auditorium
14:40
30min
Coffee break
Ernst-Curie
14:40
30min
Coffee break
Planck
15:10
15:10
30min
How to not pull your hair out while providing data to the business: unit testing for your data pipelines
Lars Hanegraaf

It is nice when other people use your datasets, but updating the logic behind them can break dashboards and ML models down the line. In this talk I will explain how to prevent these stressful situations by applying unit testing to your data or preprocessing pipelines in Python.
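
To make the idea concrete, a unit test for a single preprocessing step might look like the minimal sketch below. The step `add_order_totals` and its column names are hypothetical and are not taken from the talk; the test functions follow the `test_` naming convention that pytest discovers, but they use only plain assertions.

```python
def add_order_totals(rows):
    """Hypothetical preprocessing step: add a 'total' field to every order row."""
    return [{**row, "total": row["quantity"] * row["unit_price"]} for row in rows]


def test_add_order_totals_computes_total():
    rows = [{"order_id": 1, "quantity": 2, "unit_price": 3.5}]
    result = add_order_totals(rows)
    # Existing columns are kept and the new column holds quantity * unit_price.
    assert result == [
        {"order_id": 1, "quantity": 2, "unit_price": 3.5, "total": 7.0}
    ]


def test_add_order_totals_does_not_mutate_input():
    rows = [{"order_id": 1, "quantity": 2, "unit_price": 3.5}]
    add_order_totals(rows)
    # Downstream consumers of the raw rows should not see a surprise column.
    assert rows == [{"order_id": 1, "quantity": 2, "unit_price": 3.5}]
```

Pinning down both the output schema and the values in small tests like these is one way to notice a breaking change before a dashboard or model does.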

Ernst-Curie
15:10
30min
Practical code archaeology
Judith van Stegeren

Code archaeology is figuring out what a thing is for, who built it, and how you can get it to run again.
Dealing with legacy code artefacts (while under time pressure) is something we data people encounter a lot in daily life. I will talk about my experiences from both a research and software engineering standpoint. After quickly going over some common sense approaches, I will dive deeper into real-world archaeology and digital forensics, and find out what we can learn from these fields to make dealing with old artefacts a bit easier. Expect a mix of code and non-code hacks, with ample pop culture archaeology memes.

Planck
15:10
30min
Predicting Cognitive Impairment in Patients With a Primary Brain Tumor: A Machine Learning Perspective
Sander Boelders

Cognitive impairment is common amongst patients with primary brain tumors (PBT). The exact mechanism by which primary brain tumors affect different cognitive functions, however, is not well understood. Cognitive impairment in PBT patients is likely the result of local effects of the tumor, global effects of the tumor, and patient characteristics. Finding predictors, or the potentially complex interactions between them, may improve our understanding of how different variables influence cognitive function. Moreover, this may facilitate personalized prediction of cognitive function and aid personalized treatment decisions. Several big challenges arise when aiming to make personalized predictions of cognitive functioning in PBT patients; many of these problems likely generalize to other applied machine learning tasks.

Auditorium
15:45
15:45
5min
Small break
Auditorium
15:45
5min
Small break
Ernst-Curie
15:45
5min
Small break
Planck
15:50
15:50
30min
Becoming a Pokémon Master with DVC: reproducible machine learning experiments
Rob de Wit

In machine learning projects we need to experiment in order to find and maintain the best-performing model. While we can do initial prototyping in a Notebook, eventually we need to move towards more structured experiment tracking to facilitate reproducibility of our experiments.

The open-source DVC library aims to tackle this problem through a Git-based approach to versioning data and artifacts. In this talk we will explore how DVC works, how we can apply it to conduct ML experiments, and how we can use it to become a great Pokémon trainer.

Auditorium
15:50
30min
Causal inference and scenario generation within Just Eat Takeaway.com
Max Knobbout

When A/B testing is not possible but we are still interested in drawing causal conclusions from our data, we need to resort to quasi-experimental approaches. This is the landscape that Just Eat Takeaway.com is navigating, where we often have experimental data about a specific city, and are interested in knowing what the effects would be on another city. When we drop the requirement of causality and are merely interested in generating likely scenarios, we can use the power of predictive modelling to our advantage. From predicting likely future scenarios, to generating synthetic order data on a minute-to-minute basis, all is possible using the right statistical tools. Even in the absence of pure experimental data, we are still able to model likely futures. This talk is relevant for data scientists who are interested in the intersection of statistics and predictive modelling, and some basic knowledge about these topics will be assumed. The first half of the presentation (minutes 0-15) will talk about quasi-experimental models, the second half (minutes 15-30) will talk about scenario and data generation.

Ernst-Curie
15:50
30min
Why does everyone need to develop a machine learning package?
Andrei Alekseev

This talk is about machine learning package development. I will speak about the pains and benefits it brings for developers and share why open sourcing makes the package even better. The talk is not focused on the package itself but rather on common problems, so it will be interesting for a wide range of data scientists and Python developers.

Planck
16:25
16:25
5min
Small break
Auditorium
16:25
5min
Small break
Ernst-Curie
16:25
5min
Small break
Planck
16:30
16:30
30min
Come connect to the active Brainport community
Yannic Suurmeijer

PyData provides a forum for an international community of users and developers to share ideas and learn from each other. So let’s connect: come to this interactive session to meet people from other cultures, with new questions and fresh perspectives.
Sharing knowledge and experience with others is not only rewarding but also actually improves your professional skills. CodeMasters offers skills of the future to those who need them most. During a 10-week program, participants are supported in learning how to code to grow towards a career in the Netherlands.

Planck
16:30
30min
DuckDB: Bringing analytical SQL directly to your Python shell.
Pedro Holanda

In this talk, we will present DuckDB. DuckDB is a novel data management system that executes analytical SQL queries without requiring a server. DuckDB has a unique, in-depth integration with the existing PyData ecosystem. This integration allows DuckDB to query and output data from and to other Python libraries without copying it. This makes DuckDB an essential tool for the data scientist. In a live demo, we will showcase how DuckDB performs and integrates with the most used Python data-wrangling tool, Pandas.

Auditorium
16:30
30min
Everything in its Right Place: Optimising Ranking in Online Grocery
Bas Vlaming

An ever increasing number of people are discovering mobile grocery shopping as an alternative to brick-and-mortar supermarkets. This talk will cover how we can use machine learning to make these customers' grocery shopping as smooth and frictionless as possible. We do this by applying ML models that rank products in agreement with the customer’s intent: e.g., by detecting personal shopping habits, and by striking a balance between query relevance and margin.

Ernst-Curie
17:05
17:05
15min
Closing notes
Auditorium
17:20
17:20
60min
Drinks & Bitterballs
Auditorium
17:20
60min
Drinks & Bitterballs
Ernst-Curie
17:20
60min
Drinks & Bitterballs
Planck
17:25
17:25
60min
Drinks & Bitterballs
Ernst-Curie