A Map of Music Technology
This page outlines the contents presented to university students in a presentation given on October 16th, 2025.
What is Music Technology?
By definition, it refers to all inventions and discoveries within the field of music.
Things we can associate with music technology:
Musical instruments: flutes, organs, synths...
Recording equipment: microphones, speakers, mixing desks, DAWs...
Music distribution: CDs, LPs, MP3, streaming...
Music understanding: recommender systems, song detection, transcription...
Music composition: sheet music, composition software, e-readers...
Things that we cannot associate with music technology:
Human voice
Human ear
Our sense of music and groove
Whisky
This, however, does not narrow things down much. For our purposes, we typically mean modern technology.
Defining the music technologist
By extension, defining the role of a music technologist is very difficult. Nobody is an expert in everything.
Types of music technologists (specialists):
Instrument makers, luthiers...
Sound engineers, mix engineers, mastering engineers...
People who innovate in the way that we make music
People who innovate in the way that we experience music
Active themes and topics in Music Technology
Over the years, topics in music technology have gone in and out of focus, based on feasibility and marketability. While the topics may change and evolve, the themes of Music Technology are constant:
Audio I/O: transferring audio between media (air, analog, digital)
Sound synthesis: The process of creating sound.
Sound manipulation: The process of changing sound in some way.
Machine listening: Automating tasks that would otherwise require a human to listen.
Generative audio: Automating the music / SFX creation process.
Each of these themes contains many topics.
Audio I/O
I/O stands for input/output. It is fundamental to music technology, as it deals with the transfer of audio from one domain to another.
The main three audio domains are the following:
Ambient air around us
Active electric signal (Analog)
Digital string of audio samples (WAV, MP3...)
We can transfer audio between these domains via key inventions:
Air -> Microphone -> Weak electric signal -> Pre-Amplifier -> Boosted electrical signal -> ADC -> Digital samples
Digital Samples -> DAC -> Weak electric signal -> Amplifier -> Boosted electrical signal -> Speakers -> Air
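To make the digital end of this chain concrete, here is a minimal Python sketch (using numpy) of what an ADC conceptually does: sample a signal at a fixed rate and quantize each sample to 16-bit integers. It is an illustration of the idea, not of any specific converter hardware, and the tone used as input is a stand-in for the boosted electrical signal.

```python
import numpy as np

SAMPLE_RATE = 44_100   # samples per second (CD quality)
BIT_DEPTH = 16         # bits per sample

def fake_adc(duration_s=1.0, freq_hz=440.0):
    """Toy 'ADC': sample an ideal 440 Hz tone and quantize it to 16 bits."""
    t = np.arange(int(duration_s * SAMPLE_RATE)) / SAMPLE_RATE
    analog = np.sin(2 * np.pi * freq_hz * t)       # stand-in for the incoming electrical signal
    max_int = 2 ** (BIT_DEPTH - 1) - 1             # 32767 for 16-bit audio
    digital = np.round(analog * max_int).astype(np.int16)
    return digital

samples = fake_adc()
print(samples[:8], f"-> {len(samples)} samples for one second of audio")
```

A DAC conceptually runs the same mapping in reverse: integers back to voltages, smoothed into a continuous signal.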
Storage
We can store audio for future use in two key ways: Analog, and Digital.
An analog storage device holds the audio data in a medium that can be directly converted to an electric signal. A digital storage device holds the audio data in a medium that needs to be decoded (from 1s and 0s) back to an electric signal.
The analog vs digital debate is largely subjective on the listening side, but digital representations of sound are far more advantageous in practice, and analog is now generally considered outdated. As a result, the activity in Audio I/O is happening in the digital domain.
Active topics in Audio I/O:
Hardware improvements to lower the noise floor (Mics, converters, and even speakers)
Audio compression in the digital domain (very active)
Reducing bleed in multi-track recordings
Topic Deep Dive: Audio Compression
Note that compression here means data compression, which is different from the audio effect of the same name (dynamic range compression).
The main goal of compression is to save space.
Why saving space is important for streaming services:
Can store more audio per disk
Can send audio faster
Open Compression algorithms of interest:
MP3, AAC...: Lossy (no sample-perfect reconstruction), but they mostly shave off information that we don't hear anyway.
FLAC... : Lossless - reduce file size without sacrificing quality.
WAV: Uncompressed
Many compression algorithms are proprietary, and normal people like you and me do not have access to them.
Compressing can also mean moving to a more efficient representation for the content in question. For example, stereo audio typically carries twice the information of mono audio. However, most stereo recordings in music have overlapping sounds between the channels, which can be leveraged (an active research topic). Pushing that to 3D audio, where we can have up to 128 channels, we cannot justify an audio file being 128x larger. Innovations by Dolby Labs directly tackle these problems, with solutions like Dolby Atmos and AC-4.
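As a concrete illustration of exploiting stereo redundancy, here is a minimal numpy sketch of mid/side encoding, a classic trick used in joint-stereo coding: when the two channels are similar, the side signal is close to zero and is very cheap to encode. This is a simplified illustration of the principle, not how any particular codec is implemented.

```python
import numpy as np

def mid_side_encode(left, right):
    """Rotate L/R into mid (sum) and side (difference) channels."""
    mid = 0.5 * (left + right)    # what the channels share
    side = 0.5 * (left - right)   # what differs between them
    return mid, side

def mid_side_decode(mid, side):
    """Recover the original channels exactly (the rotation is lossless)."""
    return mid + side, mid - side

# Two highly correlated channels, as in most stereo music.
t = np.linspace(0, 1, 44_100)
left = np.sin(2 * np.pi * 220 * t)
right = 0.9 * left + 0.01 * np.random.randn(len(t))   # almost the same signal

mid, side = mid_side_encode(left, right)
print("side energy / mid energy:", np.sum(side**2) / np.sum(mid**2))  # tiny -> cheap to encode
```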
The cutting-edge compression algorithms now leverage Machine Learning to encode and decode audio into a 'latent space'. These algorithms use more computing power and need storage space themselves, and they are much more lossy, although the loss is not necessarily perceived as a drop in quality. They are of very high interest because they can compress by a factor of 100 (and growing) and can represent a second of audio very efficiently, opening the doors for AI generative music. See Meta's EnCodec.
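For a sense of scale, here is a back-of-the-envelope comparison in Python between raw 16-bit / 44.1 kHz PCM and a hypothetical neural codec running at 6 kbps (a bitrate in the range such codecs typically target); the codec bitrate is an assumption for illustration, not a measurement of any specific model.

```python
SAMPLE_RATE = 44_100          # samples per second
BIT_DEPTH = 16                # bits per sample
CHANNELS = 1                  # mono, to keep the comparison simple

pcm_bits_per_second = SAMPLE_RATE * BIT_DEPTH * CHANNELS    # 705,600 bits per second
neural_codec_bits_per_second = 6_000                        # hypothetical 6 kbps latent stream

print(f"raw PCM:      {pcm_bits_per_second / 1000:.1f} kbps")
print(f"neural codec: {neural_codec_bits_per_second / 1000:.1f} kbps")
print(f"compression factor: ~{pcm_bits_per_second / neural_codec_bits_per_second:.0f}x")
```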
Audio Synthesis
Audio synthesis refers to the creation of sound. This is the oldest theme in music technology, dating back to the days when we carved flutes out of bone.
Sound synthesis is deeply intertwined with instrument design. In the ambient air around us, we create vibrations by moving strings, membranes, air, and anything else we can. While these are all valid forms of synthesis, we will focus on analog and digital synthesis.
Analog Sound Synthesis
When we think of analog synthesis, we typically mean listening directly to the electrical voltage produced by an electrical circuit. Over the years, we have created many different circuits for synthesis purposes. Initially, these circuits were housed in separate cases that could be patched together via patch cables. The famous example of this is the Moog Modular. A lot of the ideas and language introduced during that time are still in use today.
Here are some examples of the electric circuits (modules) that we use:
Voltage Controlled Oscillator (VCO): Creates the base waveform (Sine, Triangle, Sawtooth, Square) at a given frequency
Voltage Controlled Filter (VCF): Attenuates frequencies beyond a cutoff frequency (classically, everything above it).
Voltage Controlled Amplifier (VCA): Amplifies / Attenuates the signal based on incoming voltage.
ADSR: Creates a four-stage envelope on trigger: Attack -> Decay, then holds at the Sustain level, then Release (see the sketch after this list).
These circuits are just the basic types, and many more have been developed. We can argue that only the VCO actually synthesises sound while the others are modifiers, but as they are all part of a synthesis ecosystem, I group them together here.
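To make the ADSR idea concrete, here is a minimal digital sketch (numpy) that builds the four-stage envelope as an array of gain values; in a modular system, the same shape would be a control voltage fed to a VCA. The segment lengths and the linear ramps are arbitrary choices for the illustration.

```python
import numpy as np

SAMPLE_RATE = 44_100

def adsr(attack_s, decay_s, sustain_level, sustain_s, release_s):
    """Return a gain envelope: rise to 1, fall to the sustain level, hold, then fade out."""
    a = np.linspace(0.0, 1.0, int(attack_s * SAMPLE_RATE))
    d = np.linspace(1.0, sustain_level, int(decay_s * SAMPLE_RATE))
    s = np.full(int(sustain_s * SAMPLE_RATE), sustain_level)
    r = np.linspace(sustain_level, 0.0, int(release_s * SAMPLE_RATE))
    return np.concatenate([a, d, s, r])

env = adsr(attack_s=0.01, decay_s=0.2, sustain_level=0.6, sustain_s=0.5, release_s=0.3)
# Applying the envelope is just multiplication, which is exactly what a VCA does with a control voltage:
t = np.arange(len(env)) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 440 * t) * env
```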
Digital Sound Synthesis
This is where synthesis gets interesting. There are so many ways to synthesise sounds digitally. Breaking free of the constraints of the imperfect analog world, we can leverage mathematical perfection and compute power to create some truly powerful algorithms.
Subtractive Synthesis:
This method follows the same principles as analog synthesis. We start with a harmonically rich waveform (sawtooth, square, triangle, noise) and apply different filters to shave off the frequencies we don't want.
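Here is a minimal digital sketch of the subtractive idea, using numpy and scipy: start from a bright sawtooth and low-pass filter it. The cutoff value and filter order are arbitrary illustration choices.

```python
import numpy as np
from scipy.signal import butter, lfilter

SAMPLE_RATE = 44_100

def sawtooth(freq_hz, duration_s):
    """Naive (aliasing-prone) sawtooth: harmonically rich source material."""
    t = np.arange(int(duration_s * SAMPLE_RATE)) / SAMPLE_RATE
    return 2.0 * (t * freq_hz % 1.0) - 1.0

def lowpass(signal, cutoff_hz, order=4):
    """Shave off everything above the cutoff: the 'subtractive' step."""
    b, a = butter(order, cutoff_hz, btype="low", fs=SAMPLE_RATE)
    return lfilter(b, a, signal)

raw = sawtooth(110.0, 2.0)          # bright, buzzy
dark = lowpass(raw, cutoff_hz=800)  # same note, duller timbre
```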
Additive Synthesis:
Instead of starting with a rich signal and shaving off frequencies, we add together exactly the frequencies we want to create the desired sound. If we apply a different shaper to each partial, we can get really fluid sounds.
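And a matching additive sketch in numpy: build the tone by summing individual partials, each with its own amplitude (a real instrument model would also give each partial its own envelope). The 1/n amplitude recipe is just one illustrative choice.

```python
import numpy as np

SAMPLE_RATE = 44_100

def additive_tone(f0_hz, duration_s, n_partials=10):
    """Sum sine partials at multiples of f0, with 1/n amplitudes (a sawtooth-ish recipe)."""
    t = np.arange(int(duration_s * SAMPLE_RATE)) / SAMPLE_RATE
    tone = np.zeros_like(t)
    for n in range(1, n_partials + 1):
        tone += (1.0 / n) * np.sin(2 * np.pi * n * f0_hz * t)
    return tone / np.max(np.abs(tone))   # normalise to +/- 1

note = additive_tone(220.0, 2.0)
```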
Wavetable Synthesis:
We store a repeatable single-cycle waveform (or a table of them with a predetermined evolution path) and play it on loop, really fast, to create the sound we want.
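A minimal wavetable sketch in numpy: store one cycle of a waveform in a small table, then read through it repeatedly at whatever speed gives the desired pitch. Real engines also interpolate between table positions and morph between multiple tables; this sketch uses a plain sine table and nearest-neighbour lookup for simplicity.

```python
import numpy as np

SAMPLE_RATE = 44_100
TABLE_SIZE = 2048

# One cycle of a waveform, precomputed once (here: a simple sine table).
table = np.sin(2 * np.pi * np.arange(TABLE_SIZE) / TABLE_SIZE)

def wavetable_tone(freq_hz, duration_s):
    """Loop through the table fast enough to produce the requested pitch."""
    n_samples = int(duration_s * SAMPLE_RATE)
    step = freq_hz * TABLE_SIZE / SAMPLE_RATE      # table positions to advance per output sample
    phase = (np.arange(n_samples) * step) % TABLE_SIZE
    return table[phase.astype(int)]                # nearest-neighbour lookup; real engines interpolate

note = wavetable_tone(440.0, 1.0)
```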
Frequency Modulation Synthesis:
Strange things start to happen when we modify the frequency of an oscillator at rates comparable to its own oscillation. Even stranger things happen when you modify the modulator's rate as well... This is a really powerful engine at the heart of a really influential synthesiser, the Yamaha DX7.
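A minimal two-operator FM sketch in numpy: one sine (the modulator) wobbles the phase of another (the carrier). The ratio and index values below are arbitrary; small changes to them produce wildly different timbres, which is part of what made the DX7 so deep.

```python
import numpy as np

SAMPLE_RATE = 44_100

def fm_tone(carrier_hz, ratio, index, duration_s):
    """Simple two-operator FM: modulator frequency = carrier * ratio, modulation depth = index."""
    t = np.arange(int(duration_s * SAMPLE_RATE)) / SAMPLE_RATE
    modulator = np.sin(2 * np.pi * carrier_hz * ratio * t)
    return np.sin(2 * np.pi * carrier_hz * t + index * modulator)

bell_like = fm_tone(carrier_hz=220.0, ratio=3.5, index=4.0, duration_s=2.0)
```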
Physical Modelling Synthesis
We simulate the movement of physical objects using the laws of physics, which lets us calculate the exact motion of, say, a vibrating string and use it to synthesise sound. This synthesis engine answers questions like: what would a 20 m long trumpet sound like if blown by a giant? It can also model electrical components to do virtual-analog modelling, reintroducing the imperfections of analog into the digital realm.
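A famously simple entry point to physical modelling is the Karplus-Strong plucked string: a burst of noise circulating in a short delay line, with a little averaging (damping) on each pass. Here is a minimal numpy sketch of that idea; the damping value is an arbitrary illustration choice.

```python
import numpy as np

SAMPLE_RATE = 44_100

def plucked_string(freq_hz, duration_s, damping=0.996):
    """Karplus-Strong: noise burst + feedback delay line ~ a vibrating string."""
    delay = int(SAMPLE_RATE / freq_hz)          # delay length sets the pitch
    buf = np.random.uniform(-1, 1, delay)       # the 'pluck': a burst of noise
    out = np.empty(int(duration_s * SAMPLE_RATE))
    for i in range(len(out)):
        out[i] = buf[i % delay]
        # Average with the next sample and damp: energy slowly leaks away, like a real string.
        buf[i % delay] = damping * 0.5 * (buf[i % delay] + buf[(i + 1) % delay])
    return out

note = plucked_string(196.0, 2.0)   # roughly a plucked G3
```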
Granular Synthesis:
Take a pre-loaded audio sample and play little pieces (grains) of it. Maybe 100 random pieces at the same time? Maybe they start at random times? Maybe we move through the audio sample slowly?
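A minimal granular sketch in numpy: cut short grains out of a source at random positions and layer them at random times in the output. The grain length, grain count, and Hann windowing are arbitrary illustration choices, and the source here is a synthesised tone standing in for a pre-loaded sample.

```python
import numpy as np

SAMPLE_RATE = 44_100
rng = np.random.default_rng(0)

# Stand-in for a pre-loaded audio sample (2 s of a tone with some noise).
t = np.arange(2 * SAMPLE_RATE) / SAMPLE_RATE
source = np.sin(2 * np.pi * 330 * t) + 0.1 * rng.standard_normal(len(t))

def granular_cloud(source, n_grains=100, grain_s=0.08, out_s=4.0):
    """Scatter windowed grains taken from random positions in the source across the output."""
    grain_len = int(grain_s * SAMPLE_RATE)
    window = np.hanning(grain_len)                      # fade each grain in and out
    out = np.zeros(int(out_s * SAMPLE_RATE))
    for _ in range(n_grains):
        src_pos = rng.integers(0, len(source) - grain_len)
        dst_pos = rng.integers(0, len(out) - grain_len)
        out[dst_pos:dst_pos + grain_len] += window * source[src_pos:src_pos + grain_len]
    return out / np.max(np.abs(out))

cloud = granular_cloud(source)
```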
Concatenative Synthesis:
Similar to granular synthesis, but the grains are chosen from a corpus, based on similarity to another target audio recording.
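A minimal numpy sketch of the concatenative idea: chop both a corpus and a target recording into grains, describe each grain with a single crude feature (RMS loudness here), and rebuild the target out of the closest-matching corpus grains. Real systems use much richer descriptors and smarter matching; the signals below are synthetic stand-ins.

```python
import numpy as np

SAMPLE_RATE = 44_100
GRAIN = 2048    # grain length in samples
rng = np.random.default_rng(1)

# Stand-ins for a corpus recording and a target recording to imitate.
corpus = rng.standard_normal(SAMPLE_RATE * 5) * np.linspace(0, 1, SAMPLE_RATE * 5)
t = np.arange(SAMPLE_RATE * 2) / SAMPLE_RATE
target = np.sin(2 * np.pi * 2 * t)            # slow loudness contour to imitate

def grains(signal):
    """Split a signal into non-overlapping grains."""
    n = len(signal) // GRAIN
    return signal[:n * GRAIN].reshape(n, GRAIN)

def rms(g):
    """One crude per-grain descriptor: loudness."""
    return np.sqrt(np.mean(g ** 2, axis=1))

corpus_grains, target_grains = grains(corpus), grains(target)
corpus_rms, target_rms = rms(corpus_grains), rms(target_grains)

# For every target grain, pick the corpus grain with the closest loudness.
choices = np.argmin(np.abs(corpus_rms[None, :] - target_rms[:, None]), axis=1)
output = corpus_grains[choices].reshape(-1)    # the corpus 'performs' the target's contour
```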
Neural Synthesis:
The newest form of synthesis, leveraging trained Neural Networks to create sounds. Algorithms include DDSP, GANSynth, RAVE, MusicGen, and now the private algorithms owned by Suno, Udio, Stable Audio...
Sound Manipulation
Audio effects
This theme covers all things related to audio effects. There are many, and each has its own character. Some are utilities, routinely used in mixing and mastering, while others are more creative.
Routine effects:
Compressor
Limiter
EQ
Saturation
…
Creative effects:
Phaser
Delay
Reverb
Filters
Many more...
The topics in Sound Manipulation are similar to those in Audio Synthesis. We are always looking for creative algorithms to modify sounds in new ways. The themes of sound manipulation and sound synthesis blend together, as many algorithms that can be used to generate sounds can also modify sounds. With the current research lines in ML, these themes are practically studied under the same lens.
You can even use audio effects as a musical instrument, for example by playing with the sound of explosions!
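As one small, concrete example of an effect algorithm, here is a minimal feedback delay (echo) in numpy. The delay time, feedback amount, and wet/dry mix are arbitrary illustration values.

```python
import numpy as np

SAMPLE_RATE = 44_100

def feedback_delay(dry, delay_s=0.3, feedback=0.5, mix=0.5):
    """Classic echo: each repeat is the previous one, delayed and attenuated."""
    d = int(delay_s * SAMPLE_RATE)
    wet = np.copy(dry)
    for i in range(d, len(dry)):
        wet[i] += feedback * wet[i - d]       # feed the delayed output back in
    return (1 - mix) * dry + mix * wet

# A short percussive blip, so the echoes are obvious.
dry = np.zeros(SAMPLE_RATE * 2)
dry[:200] = np.hanning(200)
echoed = feedback_delay(dry)
```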
Stem Separation
This topic looks at splitting a full mix back into separate stems. This is still an expensive (compute-heavy) process that is done offline (not in real time).
The state of the art is Demucs, which can split a mix into 5 different categories:
Piano
Vocals
Drums
Bass
Other
Current research lines are looking to make this process faster, more efficient, and expanding the set of available stems.
Machine Listening
The theme of machine listening extends beyond music. While a lot of work has targeted musical applications, plenty has been done for other purposes as well.
Machine listening in Music
Recommender Systems
An example of machine listening in action is in recommender systems. Recommender systems use a mix of 'Collaborative Filtering' and 'Content-Based filtering'.
Collaborative filtering recommends items by matching your habits to the habits of other people: people who listened to Ibrahim Maalouf also listened to Ziad Rahbani.
Content-based filtering recommends items based on calculated similarity metrics between the items you interact with and the wider database.
Calculating similarity is a huge topic and is done in many different ways depending on the desired outcome. In music streaming, we don't want to recommend the most similar tracks, but rather tracks in a similar category. Identifying those categories then becomes the interesting part. Generally, you'll want to extract features such as tempo, rhythm, feel, emotion, spectral content, loudness, dynamic range, etc. These feature extractors may seem trivial, but work is still being done on all of them to make them better and more relevant.
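As a toy illustration of content-based filtering, here is a numpy sketch that represents each track as a small hand-made feature vector and ranks a catalogue by cosine similarity to a seed track. The feature values and track names are made up; a real system would extract far richer features and would not simply return the nearest neighbours.

```python
import numpy as np

# Hypothetical feature vectors: [tempo (normalised), loudness, spectral brightness, danceability]
tracks = {
    "seed_track":  np.array([0.60, 0.70, 0.40, 0.80]),
    "candidate_a": np.array([0.62, 0.68, 0.45, 0.75]),
    "candidate_b": np.array([0.20, 0.30, 0.90, 0.10]),
    "candidate_c": np.array([0.55, 0.75, 0.35, 0.85]),
}

def cosine_similarity(a, b):
    """1.0 means identical direction in feature space, 0.0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

seed = tracks["seed_track"]
ranked = sorted(
    ((name, cosine_similarity(seed, vec)) for name, vec in tracks.items() if name != "seed_track"),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)   # candidates ordered from most to least similar to the seed
```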
Audio retrieval
What if you have a huge database of sounds and you want to search through it? A website like Freesound hosts user-uploaded sounds and allows other users to search for them. Search used to be about matching search terms to words in the title. Now it is about filtering tags, finding sounds that sound similar to matches that don't share a name, and so on. Maybe you are a music studio with thousands of badly named audio files on your hard drive that need sorting through... This is all the same topic.
Streaming services receive up to 100,000 song uploads per day. Deezer claims 30% are fully AI-generated. They didn't sit there and listen to them one by one.
Music Education
There are more and more apps today that promise to listen to beginners play their instruments, analyse the performance for wrong notes, and give feedback for the user to improve. This is a very hard problem to solve well, which is why more advanced users may notice that the algorithms fall short. It is improving quickly, though. With technology like denoisers, better microphones, more compute power, and machine learning, we can run better algorithms on clearer audio and provide feedback on aspects we couldn't previously, such as intonation, character, and even emotion.
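A tiny taste of the underlying problem: before an app can say "wrong note", it has to estimate which pitch was actually played. Here is a minimal autocorrelation-based pitch estimator in numpy, far cruder than what production apps use, and exactly the kind of component that struggles on noisy phone recordings.

```python
import numpy as np

SAMPLE_RATE = 44_100

def estimate_pitch(frame, fmin=70.0, fmax=1000.0):
    """Crude f0 estimate: find the autocorrelation peak within a plausible period range."""
    frame = frame - np.mean(frame)
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]   # lags 0..N-1
    lo = int(SAMPLE_RATE / fmax)          # shortest plausible period in samples
    hi = int(SAMPLE_RATE / fmin)          # longest plausible period in samples
    best_lag = lo + int(np.argmax(corr[lo:hi]))
    return SAMPLE_RATE / best_lag

# A student 'plays' an A4 (440 Hz), slightly flat at 432 Hz:
t = np.arange(2048) / SAMPLE_RATE
recording = np.sin(2 * np.pi * 432 * t)
print(round(estimate_pitch(recording), 1), "Hz, expected 440 Hz -> flag as out of tune")
```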
Computer-aided Musicology
The field of computer-aided musicology is growing.
There are three main forms of music data out there:
Audio recordings
Sheet music
Computer representations (Like MIDI, MusicXML, etc.)
By leveraging the computer representations, we can research music at a very large scale: things like common chord progressions, voicings, melodic motifs, common rhythmic patterns... And this is not only applicable to Western music. Imagine if we could transform the many solo oud or violin performances out there, even stems sitting around on Lebanese mixing engineers' computers, into music data to analyse. We could analyse different playing patterns by instrument, identify common maqams and their exact pitch intonations, perhaps even compare them by region. We could learn about percussion patterns and groove across a wide range of instruments and cultures. An example is a study where the authors asked people from all around the world to tap grooves based on a click, and created a map of groove.
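As a small taste of what analysing music as data looks like in practice, here is a toy Python sketch that takes a symbolic melody (a list of MIDI note numbers, made up for the example) and counts which pitch classes and melodic intervals it uses, the kind of primitive that large-scale studies of progressions, motifs, or maqam usage are built on.

```python
from collections import Counter

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# A made-up melody as MIDI note numbers (60 = middle C).
melody = [62, 64, 65, 67, 69, 67, 65, 64, 62, 67, 70, 69, 67, 65]

# How often each pitch class appears (a crude fingerprint of the mode or scale).
pc_counts = Counter(PITCH_CLASSES[note % 12] for note in melody)

# Which melodic intervals (in semitones) the melody moves by.
intervals = Counter(b - a for a, b in zip(melody, melody[1:]))

print("pitch classes:", pc_counts.most_common())
print("intervals:    ", intervals.most_common())
```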
Generative Music
For a very long time, composers have toyed with the idea of offloading the creative process to other systems to create music.
Today it comes in many forms:
Generative systems
Symbolic generation
Audio generation
Generative systems
This is the earliest form of generative music. One of the notable early examples is the Musical Dice Game attributed to Mozart. In this 'game', players roll dice to determine the order of pre-written bars of music and play the result. These games were popular at the time, and other composers made them too.
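The mechanics of such a dice game fit in a few lines of Python. The sketch below uses placeholder bar labels rather than Mozart's actual tables; the historical game mapped each dice roll at each position to a specific pre-composed bar.

```python
import random

# Placeholder 'bars' of pre-written music, keyed by the possible totals of two dice.
bars = {roll: f"bar_{roll}" for roll in range(2, 13)}

def roll_a_minuet(n_bars=16):
    """Roll two dice per position and string the chosen bars together."""
    return [bars[random.randint(1, 6) + random.randint(1, 6)] for _ in range(n_bars)]

print(roll_a_minuet())   # a 'new' 16-bar piece every time
```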
Composer and producer Brian Eno experimented heavily with generative systems, creating loops for tape machines to play at different rates, generating music that could play for a long time without repeating. This is present in his album Music for Airports. Today, Hainbach does this and shows his process on YouTube.
In the world of synthesisers, the sequencer module appeared to create sequences of voltages that could be programmed to play specific melodies. By adding randomisation to many elements of their patches, synthesists are able to create huge music machines while barely moving a finger to play them.
Symbolic Generation
The symbolic part of the term refers to the music symbols that we generate. This contrasts with audio generation in that we generate instructions for another system to play. This means we output MIDI, sheet music, or any other symbols that can be interpreted in a musical way. In 2019, Google created a doodle of a small symbolic music generator, which they called the Bach Doodle. The algorithm takes an input melody and harmonizes it in the style of a Bach chorale.
Another example of symbolic generation could be a system that automatically creates jazz lead sheets based on an input of some kind.
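To make the idea tangible, here is a toy Python sketch well below the sophistication of the Bach Doodle: it learns which MIDI note tends to follow which from a tiny made-up melody, then samples a new note sequence from those statistics. The output is symbols (note numbers), left for some other system to render as sound.

```python
import random
from collections import defaultdict

# Tiny made-up training melody (MIDI note numbers).
melody = [60, 62, 64, 65, 67, 65, 64, 62, 60, 64, 67, 72, 67, 64, 60]

# First-order Markov model: which notes follow which.
transitions = defaultdict(list)
for current, nxt in zip(melody, melody[1:]):
    transitions[current].append(nxt)

def generate(start=60, length=16):
    """Walk the transition table to produce a new symbolic melody."""
    note, out = start, [start]
    for _ in range(length - 1):
        note = random.choice(transitions[note]) if transitions[note] else start
        out.append(note)
    return out

print(generate())   # e.g. [60, 62, 64, 65, ...] -- notes, not audio
```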
Audio Generation
This is the area with the most buzz right now. This is where labels are fighting music technology companies over the rights of musicians. These are the algorithms that allow anybody to create a full song in minutes from a text prompt. This is problematic for many reasons, and the wider scope of audio generators should be monitored carefully, as it is becoming more and more accessible to create harmful content, with intent to deceive, and publish it on the internet.
While full song generators are getting all the hype these days, there are other options and ways to create audio. For instance, some of the algorithms mentioned in the Synthesis sections are right at home here.
DDSP: A way of making common audio operators differentiable and trainable, allowing for careful timbre transfer of any input into a trained sound model.
GANSynth: A method rooted in image generation that creates audio samples on the fly, allowing for interpolation between sounds, producing timbres that could not otherwise exist.
RAVE: An algorithm that blurs the line between synthesis, compression, and timbre transfer. Trained on a corpus of sounds, it uses Neural Networks to compress the input into an embedding which it uses to synthesise new sounds.
The hot topic among insiders right now is figuring out controllability and playability for these models. Nobody thinks of music as pure text.