Portfolio
If you are looking for an experienced machine learning practitioner, a software engineer with a proven track record of delivering top-quality research and code, or an IT infrastructure architect, then you’re in the right place. Get in touch to discuss your next project!
I am Jan. I completed my Ph.D. at the Ubiquitous Knowledge Processing (UKP) Lab, Technical University of Darmstadt, Germany. My research focuses on data annotation, especially on improving annotation efficiency for NLP. I am also an experienced software engineer, especially in backend development, as well as a system architect and administrator. In my free time, I love learning Mandarin.
Data Annotation Consulting
My Ph.D. work focused on increasing data annotation efficiency, that is, improving quality, reducing cost and time, and making annotation projects more enjoyable and practical for annotators.
Thanks to my extensive experience in the annotation space, I can help you conduct annotation campaigns, from consulting on data selection and crawling, through annotator training and task setup in annotation platforms, to analyzing, adjudicating, and evaluating annotation results. I have published several research papers in top venues investigating related problems.
Data Annotation Tooling
I am part of the INCEpTION project, where we develop a batteries-included and highly customizable annotation platform, mainly for text annotation. I can provide business support as well as custom features to suit your annotation needs.
Machine-learning Supported Annotation
Many annotation tasks, e.g. creating corpora with linked entities or predicate-argument structures, can be augmented by machine learning to increase quality or reduce annotation time or cost. This support can take the form of pre-annotations that annotators then only need to correct, or of recommenders that show inline suggestions.
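To make the idea of pre-annotation concrete, here is a minimal sketch. It assumes a plain dictionary lookup (a gazetteer) in place of a trained model, and the function name `pre_annotate`, the suggestion format, and the example entries are all hypothetical. It is not the API of INCEpTION or any other annotation platform; it only illustrates how suggested spans could be produced for annotators to accept or correct.

```python
import re


def pre_annotate(text, gazetteer):
    """Suggest entity spans via exact, case-sensitive gazetteer lookup.

    Hypothetical helper: a real setup would typically use a trained model.
    Returns suggestions sorted by start offset.
    """
    suggestions = []
    for surface, label in gazetteer.items():
        # Word boundaries avoid matching inside longer tokens.
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text):
            suggestions.append(
                {
                    "start": match.start(),
                    "end": match.end(),
                    "text": match.group(),
                    "label": label,
                }
            )
    return sorted(suggestions, key=lambda s: s["start"])


# Illustrative data only.
gazetteer = {"Darmstadt": "LOC", "UKP Lab": "ORG"}
text = "The UKP Lab is located in Darmstadt."

for suggestion in pre_annotate(text, gazetteer):
    print(suggestion)
```

Each suggestion carries character offsets, so an annotator only needs to confirm, relabel, or delete it instead of marking the span from scratch.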
Annotation Error Detection
Data quality is an integral part of ML model performance. As annotators inevitably make mistakes, quality control is needed to find annotation errors, gauge the error rate, and then take countermeasures such as additional annotator training, adjusting annotation guidelines, or sending bad batches back for re-annotation. Finding errors manually often turns out to be expensive and time-consuming, which is why annotation error detection (AED) is worth trying. For my Ph.D., I investigated existing methods for AED and re-implemented them in an open-source Python package. I also applied AED to production data during an internship and showed that it can significantly reduce time and cost.
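As a rough illustration of what an AED method can look like, the sketch below flags annotations whose label deviates from the strict majority label given to the same surface form elsewhere in the corpus. This is a deliberately simple consistency heuristic chosen for illustration, not the method or interface of the Python package mentioned above; the function name `flag_inconsistent_labels` and the example data are hypothetical.

```python
from collections import Counter, defaultdict


def flag_inconsistent_labels(instances, min_count=2):
    """Return indices of instances whose label disagrees with the
    strict majority label of the same (lowercased) surface form.

    `instances` is a list of (text, label) pairs. Surface forms seen
    fewer than `min_count` times, or without a strict majority, are
    never flagged.
    """
    labels_by_text = defaultdict(Counter)
    for text, label in instances:
        labels_by_text[text.lower()][label] += 1

    flagged = []
    for index, (text, label) in enumerate(instances):
        counts = labels_by_text[text.lower()]
        total = sum(counts.values())
        majority_label, majority_count = counts.most_common(1)[0]
        has_strict_majority = majority_count > total / 2
        if total >= min_count and has_strict_majority and label != majority_label:
            flagged.append(index)
    return flagged


# Illustrative data only.
annotations = [
    ("Darmstadt", "LOC"),
    ("Darmstadt", "LOC"),
    ("Darmstadt", "ORG"),
    ("Proxmox", "ORG"),
    ("darmstadt", "LOC"),
]

print(flag_inconsistent_labels(annotations))
```

Flagged instances are candidates for human review rather than confirmed errors, since a deviating label can also reflect genuine ambiguity.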
Machine Learning
While completing my Master’s and Ph.D. degrees, I made extensive use of statistical and deep machine learning. I especially enjoy bringing machine learning into production, from data collection and annotation through model training, evaluation, and iterative improvement to final deployment. I can help you investigate whether machine learning can benefit your business case and analyze the results soundly. Beyond that, I am particularly interested in the data science and statistics side of machine learning.
AI Infrastructure
As part of my Ph.D., I lead the system administration team for our group of over 40 people, including 25 Ph.D. students. Our team consists of one system administrator and five Ph.D. students who manage our IT. We host many services ourselves, for instance a virtual infrastructure based on Proxmox, Nextcloud, Matrix chat, monitoring with Grafana and Prometheus, and automation with Puppet and Ansible.
We also co-locate and host our own Slurm GPU compute cluster. It is used every day by researchers at UKP to conduct cutting-edge deep learning research.
Not only do I lead the system administration team, but I also deployed or helped deploy and currently maintain most of these services. I would be happy to help you with your own IT and research infrastructure.
Software Engineering
I thoroughly enjoy programming at work and as a hobby, especially custom tooling/scripting and application backends. My focus is on delivering maintainable, well-tested, and documented projects. I also have experience in building full-stack prototypes. During my career, I have developed professionally in (among others) Java, Python, C/C++, Rust, and Go, and have dabbled in many more languages. I am always eager to try out new languages, frameworks, and tools and have a track record of picking these up quickly. You can check my GitHub or my personal blog for a list of some projects. If that sounds interesting, please reach out; I would be delighted to develop some software for you.