WEBVTT - generated by Mediathek

1
00:00:07.000 --> 00:00:10.000
Thank you, Samuel, for this meeting. It's a
great pleasure having you here with us.

2
00:00:11.000 --> 00:00:12.000
Thank you, Sir.

3
00:00:13.000 --> 00:00:17.000
And today, we want to talk about the most important
best practices in the field

4
00:00:18.000 --> 00:00:22.000
of data science and being successful in project
management.

5
00:00:23.000 --> 00:00:25.000
You come from Spain and have been awarded a
bachelor’s degree.

6
00:00:26.000 --> 00:00:28.000
Can you tell us a little bit more about your
background please?

7
00:00:29.000 --> 00:00:31.000
Of course, and first of all, thank you for
having me here.

8
00:00:32.000 --> 00:00:35.000
It's very nice to be working again with your
university.

9
00:00:36.000 --> 00:00:40.000
And of course, I am from Spain, I'm from the
Canary Islands, from Tenerife

10
00:00:41.000 --> 00:00:43.000
and I went to Germany at the age of 18 to study.

11
00:00:44.000 --> 00:00:47.000
I did a bachelor's in Physics in Munich

12
00:00:48.000 --> 00:00:52.000
and I did a master's in Integrative Neuroscience
at the University of Magdeburg.

13
00:00:53.000 --> 00:00:57.000
And it was only later, when I was already working,
that I started getting into data science

14
00:00:58.000 --> 00:01:02.000
and I started teaching myself the concepts
that I'm actually using today in my day-to-day
life.

15
00:01:03.000 --> 00:01:08.000
That's perfect. Thank you. Meanwhile, you work
in both fields, data science and project management.

16
00:01:09.000 --> 00:01:16.000
Well, I'm working as a freelancer. So, my clients
change all the time.

17
00:01:17.000 --> 00:01:21.000
I look out and find them and then I typically
develop machine learning models

18
00:01:22.000 --> 00:01:30.000
and perform all the steps of a typical data
science project for all sorts of clients like
Deutsche Bahn, Lufthansa.

19
00:01:31.000 --> 00:01:36.000
I had a client from Canada who was in the health
sector.

20
00:01:37.000 --> 00:01:40.000
So, all possible kinds of clients and all possible
kinds of data.

21
00:01:41.000 --> 00:01:44.000
It's a lot of skills you need to be successful
in business life.

22
00:01:45.000 --> 00:01:50.000
And as I understand it, selling your work
to management is crucial.

23
00:01:51.000 --> 00:01:55.000
Management typically uses a different language
or set of terms

24
00:01:56.000 --> 00:02:01.000
and the communication with the management is
important to keep the management satisfied.

25
00:02:02.000 --> 00:02:06.000
Can you give us a little more insight
into it? What is your experience in doing
this?

26
00:02:07.000 --> 00:02:13.000
It's very interesting because a very important
part of the work is being a good translator.

27
00:02:15.000 --> 00:02:18.000
Again, as a data scientist, you are typically
working with clients

28
00:02:19.000 --> 00:02:24.000
who may not fully understand the technical
aspects of what we're doing.

29
00:02:25.000 --> 00:02:28.000
And it's very, very important that as a data
scientist,

30
00:02:29.000 --> 00:02:33.000
you're in a position to understand the
business needs.

31
00:02:34.000 --> 00:02:37.000
Understanding what the company wants to achieve
and where it wants to go

32
00:02:38.000 --> 00:02:43.000
and what the requirements and all the constraints
are from a business perspective.

33
00:02:44.000 --> 00:02:48.000
So not from “I should choose this model or
I should do this data imputation or I should
apply that transformation.”

34
00:02:49.000 --> 00:02:53.000
No, no, no. From this business perspective,
understanding what is needed

35
00:02:54.000 --> 00:02:57.000
and then being able to translate that back
into a technical solution.

36
00:02:58.000 --> 00:03:01.000
And the other way around. Once you have the
solution, being able to explain:

37
00:03:02.000 --> 00:03:07.000
“This is what we can do. This is what we cannot
do, and these are the constraints.

38
00:03:08.000 --> 00:03:09.000
This is how it's going to work. It's going
to help you.”

39
00:03:10.000 --> 00:03:14.000
It's a super important part of my work, especially
because again, I'm a freelancer.

40
00:03:15.000 --> 00:03:17.000
So, it's always important for data scientists,
but when you're external,

41
00:03:18.000 --> 00:03:21.000
it's even more important that you're able to
explain what you're doing

42
00:03:22.000 --> 00:03:24.000
and to sell what you're doing to some extent.

43
00:03:26.000 --> 00:03:29.000
And I think this comes with experience.

44
00:03:30.000 --> 00:03:33.000
I think this comes from watching more experienced
people,

45
00:03:34.000 --> 00:03:37.000
more experienced colleagues, how they do it.
And by trial and error.

46
00:03:38.000 --> 00:03:44.000
And what I always like and try to follow for
myself is being very, very honest.

47
00:03:45.000 --> 00:03:51.000
Some people like to market and sell everything
as a big success.

48
00:03:52.000 --> 00:03:54.000
In my experience, if you're very honest with
the managers

49
00:03:55.000 --> 00:03:57.000
and with the people who are paying for the
job,

50
00:03:58.000 --> 00:04:01.000
and explain to them why something may not
work or is not a good idea,

51
00:04:02.000 --> 00:04:04.000
that builds trust.

52
00:04:05.000 --> 00:04:10.000
That builds trust. They understand that what
you're telling them is useful, is valuable.

53
00:04:11.000 --> 00:04:13.000
And that's a trust that you can later
use

54
00:04:14.000 --> 00:04:18.000
for moving the project in the direction that
you know is best.

55
00:04:19.000 --> 00:04:23.000
In the process of following and developing
a data science project end to end,

56
00:04:24.000 --> 00:04:27.000
maybe 1% is the core skill of building machine
learning models

57
00:04:28.000 --> 00:04:31.000
and everything else is communication, software
engineering, data engineering,

58
00:04:32.000 --> 00:04:36.000
data cleaning, data analysis, and lots of soft
skills.

59
00:04:37.000 --> 00:04:39.000
Let’s call them the soft skills, but you
get my point.

60
00:04:41.000 --> 00:04:46.000
Which are very fulfilling. But I think new
data scientists should be aware

61
00:04:47.000 --> 00:04:51.000
that they're all needed in order to make a
machine learning model actually successful.

62
00:04:52.000 --> 00:04:54.000
Can you help our students or give them some
ideas

63
00:04:55.000 --> 00:04:59.000
on how to learn this very important soft skills
part?

64
00:05:00.000 --> 00:05:05.000
To communicate with the stakeholders, to be
honest. Is there a way for students to practise?

65
00:05:06.000 --> 00:05:08.000
How would you approach this, or how would you
prepare yourself

66
00:05:09.000 --> 00:05:11.000
if you were in the shoes of our students?

67
00:05:12.000 --> 00:05:16.000
I think it all starts with knowing who you
are,

68
00:05:17.000 --> 00:05:19.000
knowing where your weaknesses are and what
your strengths are.

69
00:05:20.000 --> 00:05:22.000
Some people are born natural communicators,

70
00:05:23.000 --> 00:05:26.000
and they will have no problem communicating
complex topics.

71
00:05:27.000 --> 00:05:34.000
I think for students who may feel a bit uncertain
about this part of the job,

72
00:05:35.000 --> 00:05:38.000
I wouldn't hesitate to ask for help.

73
00:05:39.000 --> 00:05:42.000
To do a course in communication, to do a course
in negotiation,

74
00:05:43.000 --> 00:05:46.000
to look for peers, to look for mentoring.

75
00:05:47.000 --> 00:05:51.000
Mentoring with a person who's aligned with
you is extremely powerful.

76
00:05:52.000 --> 00:05:56.000
And also remember that at the end of the day,
we learn by doing.

77
00:06:00.000 --> 00:06:06.000
So also the case study approach in the classroom
would be one of the starting points, I guess.

78
00:06:07.000 --> 00:06:09.000
Yeah, thank you so much, Samuel.

79
00:06:10.000 --> 00:06:19.000
And, if I may, let’s move forward to some
data science related questions.

80
00:06:20.000 --> 00:06:24.000
If you have a look at problem solving, more
or less complex problems,

81
00:06:25.000 --> 00:06:27.000
and dealing with larger data sets and complex
algorithms.

82
00:06:28.000 --> 00:06:34.000
What are your findings in doing this? Are there
specific challenges?

83
00:06:35.000 --> 00:06:37.000
Are there some things you want to recommend
to our students?

84
00:06:38.000 --> 00:06:41.000
Yes, and I have a very clear recommendation,
actually.

85
00:06:42.000 --> 00:06:46.000
I think data science, especially at the beginning
can be a bit overwhelming.

86
00:06:47.000 --> 00:06:51.000
There are so many models, there are so many
techniques, there are so many things that you
can do,

87
00:06:52.000 --> 00:06:55.000
and there's a certain tendency, especially
among newcomers,

88
00:06:56.000 --> 00:07:02.000
to try out the biggest, most powerful models
first. And this is a mistake.

89
00:07:03.000 --> 00:07:06.000
My recommendation is, if the problem is hard,
with any problem,

90
00:07:07.000 --> 00:07:10.000
but especially if the problem is hard, start
easy.

91
00:07:11.000 --> 00:07:14.000
Start with something as easy as possible.

92
00:07:15.000 --> 00:07:17.000
Maybe break down, if the problem is too big,

93
00:07:18.000 --> 00:07:21.000
maybe break it down into parts and just solve
one easy part first.

94
00:07:22.000 --> 00:07:24.000
And once you have that solved, go to the next.
Don't try to solve everything at once.

95
00:07:25.000 --> 00:07:30.000
And don't throw a very complex neural network
on top of the problem because that's…

96
00:07:31.000 --> 00:07:34.000
If it doesn't work, you're not going to know
why it does not work.

97
00:07:35.000 --> 00:07:40.000
And in complex problems the key is to always
have some reproducibility and traceability

98
00:07:41.000 --> 00:07:43.000
of what I did that worked in order to get
there.

99
00:07:44.000 --> 00:07:47.000
So, my suggestion for new students is,

100
00:07:48.000 --> 00:07:53.000
don't fall into the trap of trying the newest,
shiniest, fanciest models.

101
00:07:54.000 --> 00:07:57.000
Go simple and trust your instincts.

102
00:07:58.000 --> 00:08:04.000
I think for me it's always very, very useful
to dive deep into the data.

103
00:08:05.000 --> 00:08:06.000
So not just...

104
00:08:07.000 --> 00:08:11.000
If I have a feature, I have some attribute
that I want to know more about,

105
00:08:12.000 --> 00:08:14.000
not just compute the mean and the standard
deviation and go on,

106
00:08:15.000 --> 00:08:19.000
but plot the distribution, look at the extreme
values,

107
00:08:20.000 --> 00:08:22.000
find the outliers, ask yourself why am I seeing
this.

108
00:08:24.000 --> 00:08:31.000
Ask questions. Ask a lot of questions and go
deep into your data in order to master the
complexity.

109
00:08:32.000 --> 00:08:37.000
I just finished a project with Deutsche Bahn
that was extremely, extremely, extremely complex.

110
00:08:38.000 --> 00:08:40.000
Extremely complex for a variety of reasons,

111
00:08:41.000 --> 00:08:44.000
and what worked at the end was taking the big
problem and again chunking it down,

112
00:08:45.000 --> 00:08:50.000
solving a small part first and then slowly
going up to the entire network.

113
00:08:51.000 --> 00:08:58.000
Thank you so much, Samuel. It was really a
pleasure having you here and learning from
your experience.

114
00:08:59.000 --> 00:09:03.000
Thank you so much for your time and hopefully
we will meet again. Thank you, Samuel.

115
00:09:04.000 --> 00:09:10.000
Thank you very much as well and I'm looking
forward to our next opportunity to collaborate.
Thank you.

