Master's thesis: Artificial curiosity using deep neural networks for audio analysis
16.10.2024, Theses, Bachelor's and Master's Theses
The main goal of this thesis is to develop a "curious agent" that can autonomously identify new, previously unseen classes in a large pool of audio files. The agent will rely on out-of-vocabulary detection and clustering techniques to identify new classes. Additionally, it will leverage textual descriptions of potential candidates to pre-assign labels, which will then be corrected using human annotations (this last step will be simulated using existing labels).
Concretely, we will begin with the [AudioSet](https://research.google.com/audioset/) dataset and ontology, and investigate the following steps:
1. What is the best way to cluster individual instances of sounds?
2. After training on a subset of AudioSet, how can we identify instances that belong to entirely different classes? The simplest way, of course, is to set a threshold on the softmax probabilities; this will serve as the baseline. In this step, the labels of all those instances will remain hidden from the algorithm
3. We will leverage pretrained LLMs and handcrafted features to obtain linguistic descriptions of how the instances identified in step 2 sound. Using the clusters from step 1, we will derive a description for each cluster as a whole, which will then be used to automatically assign labels to all instances in that cluster
4. We will design a criterion using active learning principles for "revealing" a small set of labels to the algorithm. This will simulate a human-in-the-loop step where we obtain these labels from human annotators
5. The process will be repeated until the entire dataset has been correctly annotated
6. The ultimate measure of success for the approach is "annotation cost", i.e., how many rounds of human labeling are needed to correctly annotate the whole dataset and hierarchy
7. Optionally, we will aim to integrate all steps in a single, deep learning architecture that can be trained end-to-end (for instance, we can attempt to do the clustering on-the-fly)
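Step 2 names a softmax threshold as the baseline, and step 4 calls for a criterion to decide which labels to reveal. A minimal sketch of both, assuming the classifier trained on the known AudioSet subset outputs one row of logits per audio clip. The function names, the default threshold of 0.5, and the choice of least-confidence sampling as the active learning criterion are hypothetical illustrations, not the criterion the thesis will design. NumPy is used to keep the sketch self-contained; the same computation applies to PyTorch tensors.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def flag_unseen(logits: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Baseline OOV detector (maximum softmax probability): True where the
    classifier's top probability falls below `threshold`, i.e. it is not
    confident that the clip belongs to any known class.
    The default threshold is a hypothetical value."""
    confidence = softmax(logits).max(axis=1)
    return confidence < threshold


def select_for_annotation(logits: np.ndarray, budget: int) -> np.ndarray:
    """Hypothetical active learning criterion (least-confidence sampling):
    indices of the `budget` clips with the lowest top probability, i.e. the
    labels a simulated oracle would reveal first."""
    confidence = softmax(logits).max(axis=1)
    return np.argsort(confidence)[:budget]


# Usage with made-up logits for three clips and three known classes.
logits = np.array([
    [4.0, 0.0, 0.0],  # confident: known class 0
    [0.1, 0.0, 0.2],  # near-uniform: candidate for a new class
    [2.0, 0.0, 0.0],  # moderately confident
])
print(flag_unseen(logits).tolist())            # [False, True, False]
print(select_for_annotation(logits, 1).tolist())  # [1]
```

Least-confidence sampling reuses the same score as the baseline detector, which keeps the sketch short; other criteria (e.g., margin- or entropy-based, or cluster-level selection) would slot into `select_for_annotation` in the same way.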
A rough 6-month plan on how the master's thesis will proceed is as follows:
+ Month 1: Onboarding, literature review, data curation, experiment setup
+ Month 2: Clustering and out-of-vocabulary detection. This is the most crucial step in the process, and we might need to extend this work beyond one month
+ Month 3: Linguistic descriptions using handcrafted features and LLMs. While handcrafted features already exist and template approaches can be used, we will investigate how pretrained LLMs can be used to streamline this step
+ Month 4: Active learning. As discussed, we will simulate the presence of oracle annotators that can reveal some of the labels to the agent. The goal is to reduce the number of ground-truth annotations needed, as this is a costly step in the whole process
+ Month 5: End-to-end learning. We will attempt to incorporate several of these steps into a single architecture. Moreover, we will package the components developed in the previous months into a toolkit that can be installed and reused by the community
+ Month 6: Thesis writing and presentation preparation. We might conduct some additional analyses at this point, but the main emphasis will be on writing
Ways of working:
+ We will have regular one-on-one meetings, either in person (at Max-Weber-Platz, where our office is located) or via Zoom. These will take place on a weekly or bi-weekly basis, depending on the project status and the student's needs
+ The student will provide short weekly reports by email, which will serve as a constant reference point for tracking progress
+ We will use GitHub to communicate regarding code and experiments. Specifically, new code will be introduced using PRs and experiment results will be discussed in more detail in issues (and their outcome will be part of the weekly report)
Application process:
+ Please send an email to andreas.triantafyllopoulos@tum.de with Curiosity@CHI as the subject. This is your first test ;)
+ Please provide your CV and transcripts, as well as a very short cover letter describing your background (this can be in the email body or as a separate file)
+ Please write one sentence about how you understand "artificial curiosity" and one more sentence about how you would approach it
What you need to know:
+ Deep learning. You should have some familiarity with it
+ Coding with Python and PyTorch. This is an absolute must, as the thesis requires intensive coding with both
+ Git(Hub). We will collaborate using a private repository on GitHub. You do not need to know the specifics of GitHub as a platform, but Git knowledge is necessary
+ (Optional) Digital signal processing theory is good to have, as it will save us some time discussing audio background. By theory, I mean the basics of the Fourier transform, sampling, etc.
What you do not need to know:
+ Background on audio. As I expect most students to be unfamiliar with audio, I will give you a one-on-one introduction when we start
+ What active learning is. You will learn about this during your literature review
What you can expect to gain:
+ Even more experience with PyTorch and Git
+ The chance to work with a world-leading group on audio. We hope to connect you with other group members by extending invitations to our regular seminars and presentations
+ The chance to work on a project that goes beyond "simply" training a DNN. Rather, you will gain experience that is very relevant for real-world AI projects, where there are huge amounts of unlabeled data and the resources for annotation are never enough
+ Intensive supervision and a chance to publish your findings in a top-tier conference.
Deadline: 31.10.2024
Starting date: 01.12.2024 (can be a bit sooner or later, I'm flexible)
I will use the week of 04.11.-08.11. to go through the applications and reach out to each applicant. We will then have a kickoff meeting on either 11.11. or 12.11. I might reach out to rejected applicants about alternative topics, or forward your CV to interested colleagues.
Contact: andreas.triantafyllopoulos@tum.de
P.S. I am looking for exceptional candidates on a rolling basis. "Exceptional" means that you browse through my Google Scholar profile, identify a recent paper (or two!) that you find interesting, and write me a personalized message describing why you want to work on the topic and how you propose to improve on what I did in the paper. If you want to do that, just drop me an email with Thesis@CHI as the subject. Otherwise, you can wait for future calls.