# Algorithm

Probability of a melody complying to a listener’s auditory expectations.

Given a generally accepted probability P about a certain event (A) happening (prior probability, P (A)), how is the probability changed (posterior probability) by additional information (B) about the event?

## Bayes Theorem

### P (A given B) = P (B given A) x P (A) / P (B)

1. Applied to melody

In a melodic line, consecutive notes normally follow a pattern that is pleasing to the ear. Simple melodies follow this pattern more rigorously than more complex or sophisticated creations.

Consider the melodic line, "Che farò senza Euridice" by Gluck: In its simplest description, the melody may be represented as a sequence of notes:

{E4, F4, G4, G4, G4, C5, C5, B4, B4, C5, G4, G4, A4, A4, G4, F4, F4, E4, C4, E4, G4, C5, C5, B4, C4, E4, G4, C5, C5, B4} (1)

The duration of the note and the rhythmic aspect of a melody are not parts of this analysis and will be introduced later. The number following the note indicates the octave number as in a piano scale. C1 is the central C, whose frequency is 261.626 Hz.

Two consecutive notes define an interval and we address the following question: what is the probability that individual and succession of notes contribute to a melodic pattern in a manner that is biased towards auditory expectations?

2. Auditory expectations
1. Intervals and their integer ratios

In the western musical style, and since Pythagoras, intervals which are generally the most consonant to the human ear are intervals represented by small integer ratios.1

Consider two consecutive notes of frequencies fE4 and fG4 as the first two notes (1). The ratio ƒE4 / ƒG4 can be converted to its nearby integer number ratio with small denominator:

ƒE4 / ƒG4 ≈ ΝE4 / ΝG4 (2)

The sum of the numerator and denominator in Eq. (2),

S (E4, G4) = ΝE4 + NG4 (3)

is a positive integer which may be used as a "dissonance metric" or an indication of a lack of compliance (or otherwise) to the auditory expectation of a listener, of the E1-G1 interval, measured in semitones of a piano chromatic scale. Let’s call IN(1) the 1st interval, IN(i) being the i-th interval in the sequence. The greater the sum, the smaller the listener's expectation is supposed to be.

From now on, we refer to this sum also as "Fractional sum".

Back to Bayes’ theorem:

P (A given B) = P (B given A) x P (A) / P (B)

that we apply to the following question:

Given a sequence of notes in a melody, what is the probability that the interval between two consecutive notes contributes to the overall probability that the sequence is biased towards auditory expectation?

P (A given B) -> Probability that the sequence is biased based on the evidence (B) of two successive notes.

P (A) -> Prior, generally accepted probability of evidence A, to be verified by new evidence B.

P (B) -> Degree of confidence in new evidence B:

P (B) = P (B given A) x P (A) + P (B given not A) x P(not A) (4)

Or, more explicitly: (5)

P (bias) = prior probability of the melody being biased. Its value for the 1st interval is set at 0.5 in the present algorithm. On subsequent intervals, it is the value found for the previous interval.

P (no bias) = 1 - P (bias) (6)

P (IN(i), given bias) is the probability that the interval under observation occurs with a biased composer. As mentioned earlier in this paragraph, according to our model this probability is inversely proportional to S in Eq. (3). The probability model we chose is the sloped distribution shown in the top display of the two figures below. The horizontal axis extends from 0 to 40. For S > 40, the probability density is set at zero (actually, 0.001 for computational reasons). The area under the plot is equal to 1, as it should be for a probability density distribution. The blue circles in the lower figure represent the value of S in Eq.(3) of the melody under consideration (Euridice). Figure 1 Figure 2

The histogram above shows that, over 1000 hypothetical songs randomly picked according to the chosen probability density, the number of songs having a certain value within a given interval of the fractional sum of Eq.(3). About 23% have S = 0 - 5 and 8% have S = 35 - 40.

The equation of the sloped line (probability of sum given bias) in Figure 1 is: (7)

ls is the range under consideration, 0 ≤ Sls. The numerator is the expression for a linear slope in S, the independent variable. The denominator is the area of the "sloped distribution" region. This guarantees that as it should be for a density distribution function. C is a constant proportional to the probability density at S=0, α is proportional to the slope of the line.

For the case under consideration, we have taken

ls = 40, α = -0.0025, c = 0.1 is the constant probability density across the note interval ls.

2. Proximity

There is an additional constraint, influenced by cultural preferences, that has to do with the length of an interval between consecutive notes. Scholars in the field have found a statistical pattern by which the probability of an interval depends inversely on the length of the interval. In other words, the human ear likes successive notes to be not to distance apart in terms of their pitch. Proceeding like in the fractional sum case, the models for the probability density, PProx,Bias (l), PProx,noBias (l), within the length of the interval of +- one octave (12 semitones) are shown in the following figure, Figure 3

The variance of the distribution (σ2) is 8.0 interval length. Data gathered and processed on over 6,000 songs (Essen Folksongs collection, see Ref. iii), resulted in a variance of the "proximity profile" (analogous to our interval length) of 7.2.

3. Combined fractional sum (FS) and proximity (PR) probabilities

We assume the initial prior probability (0.5, or 50%) to be same for the fractional sum and proximity cases and we adjust it turn by turn based on the last result.

The Bayes probability for FS is Eq. (5): (8)

The Bayes probability for PR is: (9)

IN(i) is the i-th interval between two consecutive notes, Si its fractional sum, li its length in semitones.

We apply Bayes formula sequentially, musical interval by musical interval. At each step the Prior Probability P (A) is replaced by the newly found P(A given B).

P (A given B) = (P (B given A ) * P (A)) / (P (B given A) * P (A) + P (B given not A) * P (not A))

4. Repetition

Excessive repetition of a melodic line should, we think, be included as a factor affecting expectation and surprise. As a metric for repetition, we have used the autocorrelation, a well know statistical quantity of note succession. Given a sequence { a1, a2, …, an}, the autocorrelation rk measures if the separation of k notes is a repetitive pattern iv: (10)

Where is the mean of the notes.

The rk's can take any value between 0 and 1: 0 implies total lack of autocorrelation (extreme surprise), 1 total correlation (total bias). The ai's are the semitone progressive numbers in a piano scale. This simplistic interpretation may lend itself to gross misinformation about the intentions of the musical piece: repetition of a musical is not always a concession to auditory expectation. For instance Philip Glass' music is purposefully full of repetitions.

After several studies of musical repetition and correlation, we have adopted the strategy.

In our experience, the autocorrelation values r fall into three regions:

• 0 ≤ r ≤ 0.3, region typical of totally uncorrelated (or random) sequences.
• 0.3 ≤ r ≤ 0.8, region typical of most classical and popular melodies.
• 0.8 ≤ r ≤ 1.0, region typical of where rhythmic aspect dominates.
1. Procedure

The repetition pattern is analyzed ahead of the Bayes process of the fractional sum and proximity probabilities. The algorithm computes the autocorrelation: if it is less or equal to 0.8, it is displayed and no further action is taken. The Bayes' algorithm takes over.

If the autocorrelation is > 0.8, we found that the surprise algorithm is swamped by the repeated pattern, and we use a little artifice to damp it, best illustrated by the following example. Take the pattern P having a strong autocorrelation and lag k = 3:

P = 1 2 3 1 2 3 2 3 4 2 3 4
k = 3

is replaced by the possible k=3 patterns, but without repetition. This process smooths the subsequent Bayes algorithm without appreciably affecting the final surprise.

5. Probability and surprise

Proceeding as above, consecutive notes by consecutive notes, one obtains the progression of the bias probability until the final value, which summarizes the whole musical sequence.

A more easily definable way to measure "surprise" (the opposite of bias) is given by the formula (Ref. i)

Surprise = - Log [2, P (bias)]

depicted by the following histogram for the Euridice melody: Figure 4. Autocorrelation = 0.55, k(lag) = 1

6. Surprise calibration

In order to calibrate the surprise we compare the output of our program in the following midi files:

• a) Euridice, as in Figure 4.
• b) Thirty randomly picked semi notes in a chromatic scale from a range of 2 octaves. Figure 5. Autocorrelation = 0.30, k = 5

• c) Arpeggios, six notes repeated 5 times. Figure 6. Autocorrelation = 0.90, k = 5

• d) Finally, a simple repetition of the same C4 note. Figure 7

7. Calibration

The previous section gives us an indication on how the relative predictability of a melody compared to others.

Surprise Range Summarizing Comment Additional Comment
1 - 5 Strong compliancea No great excitement. Maybe in next line
5 - 10 Compliance Pleasing for a while
10 - 20 Mild surprise Pleasant. Maybe Goldilocks area
20 - 40 Unexpected development May take a while to appreciate
40 + Not a melody in a classical sense Perhaps in atonic music

a) To auditory expectation.

8. Rhythm

In our attempt to analyze the rhythmic patter of a piece, we adopt an algorythm similar to the one used to characterize the repetitive aspect, namely, the autocorrelation method of Section 4. The procedure is the same as the one illustrated in Eq.10, but where the terms ai, now represent the duration of each note.

Consider the simple pattern 2-1-1, repeated twice: The rhythm is represented by the duraton of its notes, the unit duraton being a quaver (eighth note):

{2,1,1,2,1,1}

Given the above sequence , the autocorrelaton coefficients are, again: (12)

Where is the mean of the durations. The rk's can take any value between 0 and 1. 0 implies total lack of rythmic pattern (extreme surprise). 1 implies total correlation when k = 0.

The autocorrelation function as a function of the note separation k is: The above graph correctly interprets the main repetition pattern to be every 3 notes (2-1-1,2-1-1).

Consider now a less obvious rhythmic pattern, as in the "Euridice" example of Section 1: The rhythm is represented by the duration of its notes, the unit duration being a quaver (eighth note):

{1, 1, 3, 1, 3, 1, 1, 1, 4, 1, 1, 1, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 5}(11) 9. Chords and Melody Extraction

The formalism described above for melody and rhythm evaluation assumes that they can be represented as a sequel of non-overlapping single notes. This condition severely limits the choice of midi files that can be downloaded and utilized.

In order to overcome this limitation, a multi-instrument, multi-chord midi files is manipulated before it enters the melody and rhythm evaluation. In the case where the file contains multiple instruments, the user is presented with the list of instruments and is required to chose one that drives a melody. Once the instrument is chosen, say, a piano, it’s likely that file will contain chords (poliphonic melody). If this is the case, an algorithm extract the melody in a file that contains only non-overlapping single notes (monophonic melody).

The extraction of of monophonic melodies from a midi files is a well studied field in "melody information"v. We have chosen to use an algorithm called "Skyline"vi. According to this publication, it consists of the following steps:

• 1) Each note in the poliphonic MIDI file are ordered in increasing order according to the notes onset times.
• 2) When notes overlap, the one with the highest pitch (frequency) is chosen and the others are eliminated.
• 3) When a new note occurs with a difference onset time, duration of the existing note will be shortened.

The following figures illustrated the process. Horizontal axis: pitch onset and offset times and durations. Vertical axis: pitch frequency (arbitrary units). Before melody extraction: After melody extraction (Music Mathematics): Final note — In order to make this web-based system robust, the analysis described above (Bayes, rhythm, melody extraction) is limited to 100 notes per track. If a users downloads a longer track, Music Mathematics considers only the first 100 notes. These are normally sufficient to assess a melodic line. In the more complex pieces where they are not, it is left to the users to select the proper 100 notes interval. There are several free web-based program that can do that.

1. Loy, Gareth. Musimathics: The Mathematical Foundations of Music. Cambridge: MIT, 2006.
2. Lopresto, Michael. "Experimenting with consonance and dissonance." Physics Education 44, no. 145 (2009)
10.1088/0031-9120/44/2/005.
3. Temperly, David, Music and Probability, MIT Press, 2010.
4. Solanas, Manolov, Sierra "Lag-one autocorrelation in short series: Estimation and hypotheses testing" Psicologica 31. 357-381.
5. C. Isikhan and G. Ozcan, "A Survey of Melody Extraction Techniques for Music Information Retrieval", Proceedings of the fourth Conference on Interdisciplinary Musicology (CIM08), Thessaloniki, Greece, 3-6 July 2008, http//web.auth.gr/cim08/.
6. G. Ozcam, C. Isikhan, A. Alpkocak, "Melody Extraction on MIDI Music Files", Proceedings of the Seventh IEEE International Symposium on Multimedia (ISM'05), Irvine, CA, USA, 14 Dec., 2005.