Journal of Signal and Information Processing, 2010, 1, 1-17
doi:10.4236/jsip.2010.11001 Published Online November 2010 (http://www.SciRP.org/journal/jsip)

Untangling Phase and Time in Monophonic Sounds

Henning Thielemann

Institut für Informatik, Martin-Luther-Universität Halle-Wittenberg, Halle, Germany.
Email: henning.thielemann@informatik.uni-halle.de

Received September 26th, 2010; revised November 11th, 2010; accepted November 15th, 2010.

ABSTRACT

We are looking for a mathematical model of monophonic sounds with independent time and phase dimensions. With such a model we can resynthesise a sound with arbitrarily modulated frequency and progress of the timbre. We propose such a model and show that it exactly fulfils some natural properties, like a kind of time-invariance, robustness against non-harmonic frequencies, envelope preservation, and inclusion of plain resampling as a special case. The resulting algorithm is efficient and allows processing data in a streaming manner with phase and shape modulation at sample rate, which we demonstrate with an implementation in the functional language Haskell. It admits a wide range of applications, namely pitch shifting and time scaling, creative FM synthesis effects, compression of monophonic sounds, generating loops for sampled sounds, synthesising sounds in the manner of wavetable synthesis, and making ultrasound audible.

Keywords: Pitch Shifting, Time Stretching, Wave Table Synthesis

1. Introduction

An example of our problem is illustrated in Figure 1. Given is a signal of a monophonic sound of a known constant pitch. We want to alter its pitch and the progression of its waveshape independently, possibly time-dependent, possibly rapidly. The sound must not contain noise portions as speech does. We also do not try to preserve formants; that is, as in resampling, we accept that the spectrum of harmonics is stretched by the same factor as the base frequency. E.g. a square waveform shall remain square, and so on. For some natural instruments this is appropriate (e.g. guitar, piano), whereas for other natural sounds it is inappropriate (e.g. speech).

With this paper we contribute the following:

1) In Subsection 2.1 we specify our problem. In Subsection 2.2 we propose a mathematical model for monophonic sounds given as real functions. This model untangles phase and time and allows us to describe frequency modulation and waveshape control. In Subsection 2.3 we show how we utilise this model for phase and time modification, and we formulate natural properties of this process.

2) Section 3 is dedicated to theoretical details. To this end we introduce some notations and definitions in Subsection 3.1 and Subsection 3.2. We investigate the properties (Subsection 3.3), and we prove that our model

Figure 1. A typical use case of our method: From the above signal of a single tone we want to compute the signal below. That is, we want to alter the pitch while maintaining the progression of its waveshape and without knowing how the signal was generated.

satisfies these properties exactly. That is, our method is altogether theoretically sound. (I could not resist that pun!)

3) The problems of handling discrete signals are treated in Section 4, including notes on the implementation in the purely functional programming language Haskell.

4) We suggest a range of applications of our method in Section 5.
5) In Section 6 you find a survey of related work, and in Section 7 we compare some results of our method with those produced by the similar wavetable synthesis.

6) We finish our paper in Section 8 with a list of issues that we still need to work on.

2. Continuous Signals: Overview

2.1. Problem

If we want to transpose a monophonic sound, we could just play it faster for a higher pitch or slower for a lower pitch. This is how resampling works. But this way the sound also becomes shorter or longer. For some instruments like guitars this is natural, but for other sounds, like that of a brass instrument, it is not necessarily so.

The problem we face is that with ongoing time both the waveform and the phase within the waveform change. Thus we can hardly say what the waveshape at a precise time point is. If we could untangle phase and shape, this would open a wide range of applications. We could independently control the progress of the phase (i.e. frequency) and the progress of the waveshape.

2.2. Model

The wish for untangled phase and shape leads us straightforwardly to the model we want to propose here. If phase and shape shall be independent variables of a signal, then our signal is actually a two-dimensional function, mapping from phase and shape to the (particle) displacement. Since the phase $\varphi$ is a cyclic quantity, the domain of the signal function is actually a cylinder. For simplicity we will identify the time point t in a signal with the shape parameter. That is, in our model the time points to the instantaneous shape.

However, we never get signals in terms of a function on a cylinder. So, how is this model related to real-world one-dimensional audio signals? According to Figure 2 the easy direction is to get from the cylinder to the plain audio signal: We move along the cylinder while increasing both the phase and shape parameter proportionally to the time in the audio signal. This yields a helical path. The phase to time ratio is the frequency, the shape to time ratio is the speed of shape progression. The higher the ratio of frequency to shape progression, the denser the helix. For a constant ratio the frequency is proportional to the speed with which we go along the helix. We can change phase and shape non-proportionally to the time, yielding non-helical paths.

When going from the one-dimensional signal to the two-dimensional signal, there is a lot of freedom of interpretation. We will use this freedom to make the theory as simple as possible. E.g. we will assume that the one-dimensional input signal is an observation of the cylindrical function at a helical path. Since we have no data for the function values beside the helix, we have to guess them; in other words, we will interpolate.

This is actually a nice model that allows us to perform many operations in an intuitive way, and thus it might be of interest beyond pitch shifting and time scaling.

2.3. Interpolation Principle

An application of our model will firstly cover the cylinder with data that is interpolated from a one-dimensional signal x by an operator F, and secondly it will choose some data along a curve around that cylinder by an operator S. The operator that we will work with here has the structure

$$(Fx)(t,\varphi) = \sum_{k\in\mathbb{Z}} x(\varphi+k)\cdot\kappa(t-\varphi-k)$$

where $\kappa$ is an interpolation kernel such as a hat function or a sinus cardinalis (sinc). Intuitively spoken, it lays the signal on a helix on the cylinder.
Then on each line parallel to the time axis there are equidistant discrete data points. Now, F interpolates them along the time direction using the interpolation kernel $\kappa$. You may check that $(Fx)(t,\varphi)$ has period 1 with respect to $\varphi$. This is our way to represent the radian coordinate of the cylinder within this section.

The observation operator S shall sample along a helix with time progression v and angular speed $\alpha$:

$$(Sy)(t) = y(v\cdot t,\ \alpha\cdot t).$$

Interpolation and observation together yield

$$(Mx)(t) = (S(Fx))(t) = \sum_{k\in\mathbb{Z}} x(\alpha t + k)\cdot\kappa((v-\alpha)\cdot t - k).$$

This operator turns out to have some useful properties:

1) Time-invariance

In audio signals often the absolute time is not important, but the time differences are. Where you start an audio recording should not have substantial effects on an operation you apply to it. This is equivalent to the statement that a delay of the signal shall be mapped to a delayed result signal. In particular it would be nice to have the property that a delay of the input by $v\cdot t$ yields a delay by t of the output. However, this will not work. To see this, consider pure time-stretching ($\alpha = 1$) applied to grains, and we become aware that this property implies plain resampling, which clearly changes the pitch. What we have at least is a restricted time-invariance: There is a discrete set of pairs of delays of input and output signal that are mapped to each other, namely wherever the helices in Figure 2 cross, that is, wherever $(v-\alpha)\cdot t \in \mathbb{Z}$. However, the construction F of our model is time-invariant in the sense:

$$x_1(t) = x_0(t-\tau) \;\Rightarrow\; (Fx_1)(t,\varphi) = (Fx_0)(t-\tau,\ \varphi-\tau) \qquad (1)$$

2) Linearity

Since both F and S are linear, our phase and time modification process is linear as well. This means that physical units and overall magnitudes of signal values are irrelevant (homogeneity) and mixing before interpolation is equivalent to mixing after interpolation (additivity).

$$M(\lambda\cdot x) = \lambda\cdot Mx \quad\text{(Homogeneity)} \qquad (2)$$

$$M(x+z) = Mx + Mz \quad\text{(Additivity)} \qquad (3)$$

3) Resampling as special case

We think that pitch shifting and time scaling by factor 1 should leave the input signal unchanged. We also think that resampling is the most natural answer to pitch shifting and time scaling by the same factor $v = \alpha$. For interpolating kernels, that is,

$$\kappa(0) = 1, \qquad \forall j\in\mathbb{Z}\setminus\{0\}:\ \kappa(j) = 0,$$

this actually holds:

$$(Mx)(t) = x(v\cdot t).$$

4) Mapping of sine waves

Our phase and time manipulation method maps sine waves to sine waves if the kernel is the sinus cardinalis normalised to integral zeros:

$$\mathrm{sinc}_1(t) = \begin{cases} 1 & t = 0 \\ \dfrac{\sin(\pi t)}{\pi t} & \text{otherwise.} \end{cases}$$

Choosing this kernel means Whittaker interpolation. Now we consider a complex wave of frequency a as input for the phase and time modification:

$$x(t) = \exp(2\pi i\,a t), \qquad a = b+n,\ n\in\mathbb{Z},\ b\in\left(-\tfrac12,\tfrac12\right) \qquad (4)$$

$$(Mx)(t) = \exp(2\pi i\,(b\cdot v + n\cdot\alpha)\,t) \qquad (5)$$

Note that for $\mathrm{frac}(a) = \frac12$ the Whittaker interpolation will diverge. If $b = 0$, that is, the input frequency a is integral, then the time progression has no influence on the frequency mapping, i.e. the input frequency a is mapped to $a\cdot\alpha$. We should try to fit the input signal as well as possible to base frequency 1 by stretching or shrinking, since then all harmonics have integral frequency.

Figure 2. The cylinder we map the input signal onto (black and dashed helix) and where we sample the output signal from (grey).

The fact that sine waves are mapped to sine waves implies that the effect of M on a more complex tone can be described entirely in the frequency domain.
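For readers who think in code, the operators F, S and M translate directly to Haskell. The following is a minimal sketch of our own (names and the truncation of the sum are not taken from the author's library), using the hat kernel, for which only finitely many summands are non-zero:

```haskell
-- Minimal sketch of F, S and M from Subsection 2.3 (names are ours).
type Signal   = Double -> Double
type Cylinder = Double -> Double -> Double   -- time, then phase

-- hat function: the linear-interpolation kernel with support (-1,1)
hat :: Double -> Double
hat t = max 0 (1 - abs t)

-- (F x)(t,phi) = sum_k x(phi+k) * kappa(t-phi-k);
-- for a kernel supported in (-1,1) only two summands can be non-zero,
-- so a small index window around t-phi suffices.
interpolateF :: (Double -> Double) -> Signal -> Cylinder
interpolateF kappa x t phi =
  sum [ x (phi + fromIntegral k) * kappa (t - phi - fromIntegral k)
      | k <- [floor (t - phi) - 1 .. ceiling (t - phi) + 1 :: Integer] ]

-- (S y)(t) = y(v*t, alpha*t): observe along a helix
observeS :: Double -> Double -> Cylinder -> Signal
observeS v alpha y t = y (v*t) (alpha*t)

-- M = S . F with the hat kernel
modifyM :: Double -> Double -> Signal -> Signal
modifyM v alpha = observeS v alpha . interpolateF hat
```

For instance, `modifyM 1 2` keeps the duration ($v = 1$) while mapping every integral harmonic n to $2n$ ($\alpha = 2$), in accordance with property 4 and Equation (5).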
An example of a pure pitch shift is depicted in Figure 3. The peaks correspond to the harmonics of the sound. We see that the peaks are only shifted. That is, the shape and width of each peak is maintained, meaning that the envelope of each harmonic is the same after pitch shifting.

5) Preservation of envelope

Consider a static wave x, i.e. $\forall t: x(t) = x(t+1)$, that is amplified according to an envelope f. If interpolation with $\kappa$ is able to reconstruct f and all of its translates from their respective integral values, then on the cylinder wave and envelope become separated,

$$(F(f\cdot x))(t,\varphi) = f(t)\cdot x(\varphi),$$

and the overall phase and time manipulation algorithm modifies frequency and time separately:

$$(M(f\cdot x))(t) = f(v\cdot t)\cdot x(\alpha\cdot t).$$

Examples for $\kappa$ and f are:

1) $\kappa$ being the sinus cardinalis as defined in item 4 and f being a signal band-limited to $\left(-\frac12,\frac12\right)$,

2) $\kappa = \chi_{(-1,0]}$ and f being constant,

3) $\kappa(t) = \max(0,\,1-|t|)$ and f being a linear function,

4) $\kappa$ being an interpolation kernel that preserves polynomial functions up to degree n and f being such a polynomial function.

Figure 3. The first graph presents the lower part of the absolute spectrum of a piano sound. Its pitch is shifted 2 octaves down (factor 4) in the second graph.

3. Continuous Signals: Theory

In this section we want to give proofs of the statements found in Section 2, and we want to check what we could have done alternatively, given the properties that we found to be useful. You can safely skip the entire section if you are only interested in practical results and applications.

3.1. Notation

In order to give precise, concise, even intuitive proofs, we want to introduce some notations. In the signal processing literature we often find a term like $x(t)$ being called a signal, although from the context you derive that actually x is the signal and thus $x(t)$ denotes a displacement value of that signal at time t. We like to be more strict in our paper. We like to talk about signals as objects without always going down to the level of single signal values. Our notation should reflect this and should clearly differentiate between signals and signal values. This way we can e.g. express a statement like "delay and convolution commute" by

$$(x\to t) * y = (x*y)\to t$$

(cf. (22)), which would be more difficult in a pointwise and correct (!) notation.

This notation is inspired by functional programming, where functions that process functions are called higher-order functions. It allows us to translate the theory described here almost literally to functional programs and theorem prover modules. Actually, some of the theorems stated in this paper have been verified using PVS [1]. For a more detailed discussion of the notation, see [2].

In our notation function application always has higher precedence than infix operators. Thus $Qx\to t$ means $(Qx)\to t$ and not $Q(x\to t)$. Function application is left associative, that is, $Qx\,t$ means $(Qx)(t)$ and not $Q(x(t))$. This is also the convention in functional analysis. We use anonymous functions, also known as lambda expressions. The expression $x\mapsto Y$ denotes a function f where $\forall x: f(x) = Y$, and Y is an expression that usually contains x.
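This correspondence to higher-order functions can be made concrete; the following fragment is our own illustration, not code from the paper's library:

```haskell
-- Signals as functions and operators on signals as higher-order functions
-- (our illustration of the notation, not the paper's library code).
type Signal v = Double -> v

-- translation: (x -> tau)(t) = x(t - tau), cf. (8); the lambda on the
-- right-hand side corresponds literally to the paper's  t |-> x(t - tau)
translate :: Double -> Signal v -> Signal v
translate tau x = \t -> x (t - tau)
```

A statement like "delay and convolution commute" then becomes an equation between such whole-signal expressions rather than between sample values.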
Arithmetic infix operators like "$+$" and "$\cdot$" shall have higher precedence than the mapping arrow, and logical infix operators like "$=$" and "$\wedge$" shall have lower precedence. That is, $t\mapsto f(t-\tau) + g(t) = (f\to\tau) + g$ means $\big(t\mapsto (f(t-\tau) + g(t))\big) = \big((f\to\tau) + g\big)$.

1) Definition (Function set). With $A\to B$ we like to denote the set of all functions mapping from set A to set B. This operation is treated right associative, that is, $A\to B\to C$ means $A\to(B\to C)$, not $(A\to B)\to C$. This convention matches the convention of left associative function application.

3.2. Basic Functions

For the description of the cylinder we first need the notion of a cyclic quantity.

2) Definition (Cyclic quantity). Intuitively spoken, cyclic (or periodic) quantities are values in the range $[0,1)$ that wrap around at the boundaries. More precisely, a cyclic quantity $\varphi$ is a set of real numbers that all have the same fractional part. Put differently, a periodic quantity is an equivalence class with respect to the relation that two numbers are considered equivalent when their difference is integral. In terms of a quotient space this can concisely be written as $\varphi\in\mathbb{R}/\mathbb{Z}$.

3) Definition (Periodisation). Periodisation c means mapping a real value to a cyclic quantity, i.e. choosing the equivalence class belonging to a representative:

$$c\in\mathbb{R}\to\mathbb{R}/\mathbb{Z}, \qquad \forall p\in\mathbb{R}:\ c(p) = p+\mathbb{Z} = \{q : q-p\in\mathbb{Z}\}.$$

It holds $c(0) = \mathbb{Z}$. We define the inverse of c as picking the representative from the range $[0,1)$:

$$c^{-1}\in\mathbb{R}/\mathbb{Z}\to\mathbb{R}, \qquad \forall\varphi\in\mathbb{R}/\mathbb{Z}:\ c^{-1}(\varphi)\in\varphi\cap[0,1).$$

In a computer program we do not encode the elements of $\mathbb{R}/\mathbb{Z}$ by sets of numbers; instead we store a representative between 0 and 1, including 0 and excluding 1. Then c is just the function that computes the fractional part, i.e. $c(t) = t - \lfloor t\rfloor$.

A function y on the cylinder is thus from $(\mathbb{R}\times\mathbb{R}/\mathbb{Z})\to V$, where V denotes a vector space. E.g. for $V=\mathbb{R}$ we have a mono signal, for $V=\mathbb{R}\times\mathbb{R}$ we obtain a stereo signal, and so on.

The conversion S from the cylinder to an audio signal is entirely determined by a given phase control curve g and shape control curve h. It consists of picking the values from the cylinder along the path that corresponds to these control curves:

$$S_{h,g}\in\big((\mathbb{R}\times\mathbb{R}/\mathbb{Z})\to V\big)\to(\mathbb{R}\to V) \qquad (6)$$

$$S_{h,g}(y)(t) = y(h(t),\,g(t)) \qquad (7)$$

For the conversion F from a prototype audio signal to a cylindrical model we have a lot of freedom. In Subsection 2.3 we have seen which properties the particular F used in our implementation has. We will now go on to check which choices for F we have, given that these properties hold. For now we will just record that

$$F\in(\mathbb{R}\to V)\to\big((\mathbb{R}\times\mathbb{R}/\mathbb{Z})\to V\big).$$

3.3. Properties

3.3.1. Time-Invariance

4) Definition (Translation, Rotation). Shifting a signal x forward or backward in time, or rotating a waveform with respect to its phase, shall be expressed by an intuitive arrow notation that is inspired by [3,4] and was already successfully applied in [2]:

$$(x\to\tau)(t) = x(t-\tau) \qquad (8)$$

$$(x\leftarrow\tau)(t) = x(t+\tau) \qquad (9)$$

For a cylindrical function we have two directions, one for translation and one for rotation. We define analogously

$$(y\to(\tau,\alpha))(t,\varphi) = y(t-\tau,\ \varphi-\alpha) \qquad (10)$$

$$(y\leftarrow(\tau,\alpha))(t,\varphi) = y(t+\tau,\ \varphi+\alpha) \qquad (11)$$

The first notion of time-invariance that comes to mind can be easily expressed using the arrow notation by $\forall t: F(x\to t) = Fx\to(t,\,c(0))$. However, this will not yield any useful conversion. Shifting the time always includes shifting the phase, and our notion of time-invariance must respect that.
We have already given an according definition in (1), which we can now write using the arrow notation.

5) Definition (Time-invariant cylinder interpolation). We call an interpolation operator F time-invariant whenever it satisfies

$$\forall x\,\forall t:\ F(x\to t) = Fx\to(t,\,c(t)) \qquad (12)$$

Using this definition we do not only force F to map translations to translations, but we also fix the factor of the translation distance to 1. That is, when shifting an input signal x, the according model Fx is shifted along the unit helix that turns once per time difference 1.

Enforcing the time-invariance property restricts our choice of F considerably:

$$(Fx)(t,\varphi) = \big(Fx\leftarrow(t,\,c(t))\big)(0,\ \varphi - c(t)) = \big(F(x\leftarrow t)\big)(0,\ \varphi - c(t)).$$

We see that actually only a ring slice of $F(x\leftarrow t)$ at time point zero is required, and we can substitute

$$(Ix)(\varphi) = (Fx)(0,\varphi).$$

I is an operator from $(\mathbb{R}\to V)\to(\mathbb{R}/\mathbb{Z}\to V)$ that turns a straight signal into a waveform. Now we know that time-invariant interpolations can only be of the form

$$(Fx)(t,\varphi) = I(x\leftarrow t)(\varphi - c(t)) \qquad (13)$$

or, more concisely,

$$\varphi\mapsto(Fx)(t,\varphi) = I(x\leftarrow t)\to c(t) \qquad (14)$$

The last line can be read as: In order to obtain a ring slice of the cylindrical model at time t, we have to move the signal such that time point t becomes point 0, then apply I to get a waveform on a ring, then rotate back that ring correspondingly.

We may check that any F defined this way is indeed time-invariant in the sense of (12), using $c(\tau) + c(t-\tau) = c(t)$:

$$\big(F(x\to\tau)\big)(t,\varphi) = I\big((x\to\tau)\leftarrow t\big)(\varphi - c(t)) = I\big(x\leftarrow(t-\tau)\big)(\varphi - c(t)) = I\big(x\leftarrow(t-\tau)\big)\big((\varphi - c(\tau)) - c(t-\tau)\big) = (Fx)(t-\tau,\ \varphi - c(\tau)) = \big(Fx\to(\tau,\,c(\tau))\big)(t,\varphi).$$

3.3.2. Linearity

We like our phase and time modification process to be linear (as in (2) and (3)). Since the sampling S from the cylinder is linear, the interpolation F to the cylinder must be linear as well:

$$F(\lambda\cdot x) = \lambda\cdot Fx \quad\text{(Homogeneity)}, \qquad F(x+z) = Fx + Fz \quad\text{(Additivity)}.$$

These properties of F are equivalent to

$$I(\lambda\cdot x) = \lambda\cdot Ix, \qquad I(x+z) = Ix + Iz.$$

3.3.3. Static Wave Preservation

Another natural property is that an input signal consisting of a wave of constant shape is mapped to the cylinder such that each ring contains that waveform. A static waveform can be written concisely as $w\circ c$. It denotes the function composition of w and c, that is, w is applied to the result of c, for example $(w\circ c)(2.3) = w(c(0.3))$. Thus w and $w\circ c$ both represent periodic functions, but w has domain $\mathbb{R}/\mathbb{Z}$ and thus is periodic by its type, whereas $w\circ c$ is an ordinary real function that happens to satisfy the periodicity property $(w\circ c)\to 1 = w\circ c$. We can write our requirement as

$$\forall t\,\forall\varphi:\ \big(F(w\circ c)\big)(t,\varphi) = w(\varphi).$$

As an example we have the constant interpolation

$$(Ix)(\varphi) = x(c^{-1}(\varphi)), \qquad (Fx)(t,\varphi) = x\big(t + c^{-1}(\varphi - c(t))\big).$$

We illustrate the constant interpolation in Figure 4, but with a sine wave that does not have frequency 1 and thus appears to the interpolation as a continuously changing waveform; nonetheless one can see how it preserves static waves. We can consider an input signal of the form $w\circ c$ as a wave with constant envelope, and we will generalise this to other envelopes in Subsection 3.3.6.

Figure 4. Constant interpolation (below) of a sine wave (above) that is out of sync. The interpolation picture represents the surface $(t,\varphi)\mapsto y(t,\varphi)$, where a black dot represents $-1$ and a white dot represents 1. The sine wave can be found in the interpolation image at the right border of each of the skew stripes. Along the vertical line from bottom to top you find the first period of the input signal, where "first" is measured from time point 0.
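The constant interpolation is simple enough to state in a few lines of Haskell; this is our own sketch, with the phase represented by its value in $[0,1)$, so that $c^{-1}$ is the identity and c computes the fractional part:

```haskell
-- Constant interpolation on the cylinder (our sketch), cf. Subsection 3.3.3.
fracPart :: Double -> Double
fracPart t = t - fromIntegral (floor t :: Integer)

-- (F x)(t,phi) = x(t + c^-1(phi - c(t)))
constantF :: (Double -> Double) -> Double -> Double -> Double
constantF x t phi = x (t + fracPart (phi - fracPart t))

-- Static wave preservation: for x = w . fracPart the result is w(phi),
-- independent of t.  For example:
--   constantF (\t -> sin (2*pi * fracPart t)) 5.25 0.1  ==  sin (2*pi*0.1)
```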
3.3.4. Mapping of Pure Sine Waves

We like to derive how frequencies are mapped when converting from an audio signal to the cylindrical model and observing the signal along a different but uniform helix. To this end we need an interpolation that maps sine waves to sine waves. Actually, the Whittaker interpolation has this property:

$$\mathrm{sinc}_1(t) = \lim_{\tau\to t}\frac{\sin(\pi\tau)}{\pi\tau} = \begin{cases} 1 & t = 0 \\ \dfrac{\sin(\pi t)}{\pi t} & \text{otherwise} \end{cases}$$

$$(Fx)(t,\varphi) = \sum_{\tau\in\varphi} x(\tau)\cdot\mathrm{sinc}_1(t-\tau) \qquad (15)$$

Since $\varphi\in\mathbb{R}/\mathbb{Z}$, when $\tau\in\varphi$ then $\tau$ assumes all values that differ from $c^{-1}(\varphi)$ by an integer. The infinite sum $\sum_{\tau\in\varphi} f(\tau)$ shall be understood as $\lim_{n\to\infty}\sum_{\tau\in\varphi\cap[-n,n]} f(\tau)$.

The proof that this F is time-invariant according to Definition 5 is deferred to Lemma 6, where we perform the proof for any interpolation kernel, not just $\mathrm{sinc}_1$. We will now demonstrate that $\mathrm{sinc}_1$-interpolation preserves sine waves and show how frequencies are mapped.

Mapping a complex sine wave to the cylinder. Since exponential laws are much easier to cope with than addition theorems for sine and cosine, we use a complex wave defined by

$$\mathrm{cis}(t) = \exp(2\pi i t).$$

For the following derivation we need the Whittaker-Shannon interpolation formula [5] in the form

$$\forall b\in\left(-\tfrac12,\tfrac12\right):\ \sum_{k\in\mathbb{Z}} \mathrm{cis}(b k)\cdot\mathrm{sinc}_1(t-k) = \mathrm{cis}(b t) \qquad (16)$$

We choose a complex wave of frequency a as input for the conversion to the cylinder. The fractional frequency part b and the integral frequency n are chosen as in (4):

$$x(t) = \mathrm{cis}(a t) \quad\text{with}\quad a = b+n,\ n\in\mathbb{Z},\ b\in\left(-\tfrac12,\tfrac12\right).$$

This choice implies the following interpolation result; we abbreviate $\tau_0 = c^{-1}(\varphi)$:

$$\begin{aligned}
(Fx)(t,\varphi) &= \sum_{\tau\in\varphi} \mathrm{cis}(a\tau)\cdot\mathrm{sinc}_1(t-\tau) \\
&= \mathrm{cis}(a\tau_0)\cdot\sum_{k\in\mathbb{Z}} \mathrm{cis}(a k)\cdot\mathrm{sinc}_1\big((t-\tau_0)-k\big) \\
&= \mathrm{cis}(a\tau_0)\cdot\sum_{k\in\mathbb{Z}} \mathrm{cis}(b k)\cdot\mathrm{sinc}_1\big((t-\tau_0)-k\big) && \text{because } a-b\in\mathbb{Z} \\
&= \mathrm{cis}(a\tau_0)\cdot\mathrm{cis}\big(b\,(t-\tau_0)\big) && \text{by (16)} \\
&= \mathrm{cis}\big(b t + n\,c^{-1}(\varphi)\big)
\end{aligned} \qquad (17)$$

The result can be viewed in Figure 5. We obtain that for every t the function on a ring slice, $\varphi\mapsto(Fx)(t,\varphi)$, is a sine wave with the integral frequency n that is closest to a. That is, the closer a is to an integer, the more harmonics of a non-sine wave are mapped to corresponding harmonics in a ring slice of Fx.

Figure 5. The sine wave as in Figure 4 is interpolated by Whittaker interpolation. Along the diagonal lines you find the original sine wave.

Mapping a complex wave from the cylinder to an audio signal. For time progression speed v and frequency $\alpha$ we get

$$z(t) = (Fx)\big(v t,\ c(\alpha t)\big) = \mathrm{cis}\big(b v t + n\,c^{-1}(c(\alpha t))\big) = \mathrm{cis}(b v t + n\alpha t) = \mathrm{cis}\big((b v + n\alpha)\,t\big),$$

where the third equality holds because $c^{-1}(c(\alpha t)) - \alpha t\in\mathbb{Z}$ and cis has period 1. This proves (5).

3.3.5. Interpolation Using Kernels

Actually, for the two-dimensional interpolation F we can use any interpolation kernel $\kappa$, not only $\mathrm{sinc}_1$ as in (15):

$$(Fx)(t,\varphi) = \sum_{\tau\in\varphi} x(\tau)\cdot\kappa(t-\tau) \qquad (18)$$

The constant interpolation corresponds to $\kappa = \chi_{(-1,0]}$. Linear interpolation is achieved using a hat function.

6) Lemma (Time invariance of kernel interpolation). The operator F defined with an interpolation kernel as in (18) is time-invariant according to Definition 5.

Proof. For $d\in\mathbb{R}$, substituting $\sigma = \tau - d$:

$$\big(F(x\to d)\big)(t,\varphi) = \sum_{\tau\in\varphi} (x\to d)(\tau)\cdot\kappa(t-\tau) = \sum_{\tau\in\varphi} x(\tau-d)\cdot\kappa(t-\tau) = \sum_{\sigma\in\varphi - c(d)} x(\sigma)\cdot\kappa\big((t-d)-\sigma\big) = (Fx)(t-d,\ \varphi - c(d)) = \big(Fx\to(d,\,c(d))\big)(t,\varphi).$$

Conversely, we like to note that kernel interpolation is not the most general form when we only require time-invariance, linearity and static wave preservation.
The following considerations are simplified by rewriting general kernel interpolation in a more functional style using a discretisation operator and a mixed discrete/continuous convolution.

7) Definition (Quantisation). With quantisation we mean the operation that picks the signal values at integral time points from a continuous signal:

$$Q\in(\mathbb{R}\to V)\to(\mathbb{Z}\to V), \qquad \forall n\in\mathbb{Z}:\ (Qx)(n) = x(n) \qquad (19)$$

Here is how quantisation operates on pointwise multiplied signals and on periodic signals:

$$Q(x\cdot z) = Qx\cdot Qz \qquad (20)$$

$$\forall n\in\mathbb{Z}:\ \big(Q(w\circ c)\big)(n) = w(c(0)) \qquad (21)$$

8) Definition (Mixed Convolution). For $u\in\mathbb{Z}\to V$ and $x\in\mathbb{R}\to\mathbb{R}$ the mixed discrete/continuous convolution is defined by

$$(u * x)(t) = \sum_{k\in\mathbb{Z}} u(k)\cdot x(t-k).$$

We can express mixed convolution also by purely discrete convolutions:

$$Q(u * x\leftarrow t) = u * Q(x\leftarrow t).$$

It holds

$$u * (x\to t) = (u * x)\to t \qquad (22)$$

because translation can be written as convolution with a translated Dirac impulse, and convolution is associative in this case (and generally when infinity does not cause problems). Thus we will omit the parentheses. We like to note that this example demonstrates the usefulness of the functional notation, since without it even a simple statement like (22) is hard to formulate in a correct and unambiguous way.

These notions allow us to rewrite kernel interpolation (18):

$$\forall\tau\in\varphi:\ (Fx)(t,\varphi) = \sum_{k\in\mathbb{Z}} x(\tau+k)\cdot\kappa(t-\tau-k), \qquad \forall\tau\in\varphi:\ \big(t\mapsto(Fx)(t,\varphi)\big) = Q(x\leftarrow\tau) * \kappa\to\tau \qquad (23)$$

The last line can be read as follows: The signal on the cylinder along a line parallel to the time axis can be obtained by taking discrete points of x and interpolating them using the kernel $\kappa$.

3.3.6. Envelope Preservation

We can now generalise the preservation of static waves from Subsection 3.3.3 to envelopes different from a constant function.

9) Lemma. Given an envelope f from $\mathbb{R}\to\mathbb{R}$ and an interpolation kernel $\kappa$ that preserves any translated version of f, i.e.

$$\forall t:\ Q(f\leftarrow t) * \kappa = f\leftarrow t, \qquad (24)$$

then and only then a wave of constant shape w enveloped by f is converted to constant waveshapes on the cylinder rings, enveloped by f in time direction:

$$\big(F(f\cdot(w\circ c))\big)(t,\varphi) = f(t)\cdot w(\varphi) \qquad (25)$$

Proof. For $\tau\in\varphi$, using (20) and the fact that $\big(Q((w\circ c)\leftarrow\tau)\big)(n) = w(c(\tau)) = w(\varphi)$ is constant:

$$t\mapsto\big(F(f\cdot(w\circ c))\big)(t,\varphi) = Q\big((f\cdot(w\circ c))\leftarrow\tau\big) * \kappa\to\tau = \big(Q(f\leftarrow\tau)\cdot Q((w\circ c)\leftarrow\tau)\big) * \kappa\to\tau = w(\varphi)\cdot\big((Q(f\leftarrow\tau) * \kappa)\to\tau\big) \overset{(24)}{=} w(\varphi)\cdot\big((f\leftarrow\tau)\to\tau\big) = w(\varphi)\cdot f.$$

Now the implication (24) $\Rightarrow$ (25) should be obvious, whereas the converse (25) $\Rightarrow$ (24) can be verified by setting $\forall\varphi: w(\varphi) = 1$. This special case means that the envelope f used as input signal is preserved in the sense

$$(Ff)(t,\varphi) = f(t).$$

10) Corollary. When we convert back to a one-dimensional audio signal under the condition (24), then the time control only affects the envelope and the phase control only affects the pitch:

$$S_{h,g}\big(F(f\cdot(w\circ c))\big) = (f\circ h)\cdot(w\circ g).$$
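Definitions 7 and 8 translate almost verbatim to Haskell. The following sketch is our own; the summation range is truncated, which is exact whenever the continuous factor vanishes outside $(-\text{radius}, \text{radius})$:

```haskell
-- Quantisation Q and mixed discrete/continuous convolution (Defs. 7 and 8);
-- our own sketch with a truncated summation range.
type DiscreteSignal = Integer -> Double
type Signal         = Double  -> Double

quantise :: Signal -> DiscreteSignal
quantise x n = x (fromIntegral n)

-- (u * x)(t) = sum_k u(k) * x(t-k); the index window around t covers all
-- non-zero terms when x vanishes outside (-radius, radius).
mixedConvolve :: Integer -> DiscreteSignal -> Signal -> Signal
mixedConvolve radius u x t =
  sum [ u k * x (t - fromIntegral k)
      | k <- [floor t - radius .. floor t + radius] ]
```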
3.3.7. Special Cases

As stated in item 3 of Subsection 2.3, we like to have resampling as a special case of our phase and time manipulation algorithm. It turns out that this property is equivalent to putting the input signal x on the diagonal lines as in Figure 4 and Figure 5. We will derive what this imposes on the choice of the kernel $\kappa$ when F is defined via a kernel as in (23).

11) Lemma. For F defined by

$$\forall\tau\in\varphi:\ \big(t\mapsto(Fx)(t,\varphi)\big) = Q(x\leftarrow\tau) * \kappa\to\tau$$

it holds

$$\forall x\,\forall t\in\mathbb{R}:\ x(t) = (Fx)(t,\,c(t)) \qquad (26)$$

if and only if $Q\kappa = \delta$, that is, $\kappa$ is a so-called interpolating kernel. Here $\delta$ is the discrete Dirac impulse, that is,

$$\forall k\in\mathbb{Z}:\ \delta(k) = \begin{cases} 1 & k = 0 \\ 0 & \text{otherwise.} \end{cases}$$

Proof. "$\Rightarrow$": Choosing the representative $\tau = 0 \in c(k)$ for integral time points $k$, condition (26) reads

$$\forall x\,\forall k\in\mathbb{Z}:\ x(k) = (Qx * \kappa)(k),$$

and quantising the mixed convolution turns this into the purely discrete convolution

$$\forall x:\ Qx = Qx * Q\kappa,$$

which forces $Q\kappa = \delta$.

"$\Leftarrow$": Conversely, every interpolating kernel $\kappa$ asserts (26). Choosing $\tau = t\in c(t)$:

$$(Fx)(t,\,c(t)) = \big(Q(x\leftarrow t) * \kappa\to t\big)(t) = \big(Q(x\leftarrow t) * \kappa\big)(0) = \sum_{k\in\mathbb{Z}} x(t+k)\cdot\kappa(-k) = x(t)\cdot\kappa(0) = x(t).$$

Now, when our conversion from the cylinder to the one-dimensional signal walks only along the unit helix, we get general time warping as a special case of our method:

$$S_{h,\,c\circ h}(Fx)(t) = (Fx)\big(h(t),\,c(h(t))\big) = x(h(t)).$$

For $h = \mathrm{id}$ we get the identity mapping, for $h(t) = v\cdot t$ we get resampling by speed factor v.

4. Discrete Signals

For the application of our method to sampled signals we could interpolate a discrete signal u containing a wave with period T, thus getting a continuous signal x with $x\left(\frac{n}{T}\right) = u(n)$, and proceed with the technique for continuous signals from Section 2. However, when working out the interpolation, this yields a skew grid with two alternating cell heights and a doubled number of parallelogram cells, which seems unnatural to us. Additionally it would require three distinct interpolations, e.g. two distinct interpolations in the unit helix direction and one interpolation in time direction. Instead we want to propose a periodic scheme where we need two interpolations with the same parameters in unit helix ("step") direction and one interpolation in the skew "leap" direction. This interpolation scheme is also time-invariant in the sense of item 1 in Subsection 2.3 and Definition 5 when we restrict the translation distances to multiples of the sampling period.

The proposed scheme is shown in Figure 6. We have a skew coordinate system with steps s and leaps l. We see that this scheme can cope with non-integral wave periods, that is, T can be a fraction (in Figure 6 we have $T = \frac{11}{3}$). Whenever the wave period is integral, the leap direction coincides with the time direction. The grid nicely matches the periodic nature of the phase. The cyclic phase yields ambiguities, e.g. a leap could also go to where $l'$ is placed, since this denotes the same signal value. We will later see that this ambiguity is only temporary and vanishes at the end (29). Thus we use the unique representative $c^{-1}(\varphi)$ of $\varphi$.

To get $(l,s)$ from $(t,\,c^{-1}(\varphi))$ we have to convert between the coordinate systems, i.e. we have to solve the simultaneous linear equations

$$\begin{pmatrix} t \\ c^{-1}(\varphi) \end{pmatrix} = \frac{1}{T}\cdot\begin{pmatrix} 1 & \mathrm{round}\,T \\ 1 & \mathrm{round}\,T - T \end{pmatrix}\cdot\begin{pmatrix} s \\ l \end{pmatrix},$$

where round is any rounding function we like; e.g. in Figure 6 it is $\mathrm{round}\,T = 4$. Its solution is

$$l = t - c^{-1}(\varphi), \qquad s = t\cdot T - l\cdot\mathrm{round}\,T \qquad (27)$$

Using the interpolated input x we may interpolate y linearly, with $\mathrm{lerp}(\xi,\eta,\lambda) = \xi + \lambda\cdot(\eta-\xi)$ and $r = \lfloor l\rfloor\cdot\mathrm{round}\,T + s$:

$$y(t,\varphi) = \mathrm{lerp}\left(x\left(\frac{r}{T}\right),\ x\left(\frac{r+\mathrm{round}\,T}{T}\right),\ \mathrm{frac}\,l\right) \qquad (28)$$

or, in more detail, with two linear interpolations in step direction and one in leap direction:

$$n = \lfloor l\rfloor\cdot\mathrm{round}\,T + \lfloor s\rfloor, \quad a = \mathrm{lerp}\big(u(n),\,u(n+1),\,\mathrm{frac}\,s\big), \quad b = \mathrm{lerp}\big(u(n+\mathrm{round}\,T),\,u(n+\mathrm{round}\,T+1),\,\mathrm{frac}\,s\big), \quad y(t,\varphi) = \mathrm{lerp}(a,\,b,\,\mathrm{frac}\,l).$$

Actually, we do not even need to compute s, since by expansion of s the formula for r can be simplified, and it holds $\mathrm{frac}\,s = \mathrm{frac}\,r$. From l we actually only need $\mathrm{frac}\,l$. This proves that every representative of $\varphi$ could be used in (27):

$$r = t\cdot T - \mathrm{frac}\,l\cdot\mathrm{round}\,T \qquad (29)$$

With (29) the detailed computation becomes

$$n = \lfloor r\rfloor, \quad a = \mathrm{lerp}\big(u(n),\,u(n+1),\,\mathrm{frac}\,r\big), \quad b = \mathrm{lerp}\big(u(n+\mathrm{round}\,T),\,u(n+\mathrm{round}\,T+1),\,\mathrm{frac}\,r\big).$$

Figure 6. Mapping of the sampled values to the cylinder in our method. The variables s and l are coordinates in the skew coordinate system.
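The whole discrete lookup fits into a few lines of Haskell; this is our own sketch of (27)-(29) for linear interpolation in both directions, with our own function and parameter names:

```haskell
-- Cylinder lookup of Section 4 (our sketch): linear interpolation in both
-- step and leap direction, following (27)-(29).
lerp :: Double -> Double -> Double -> Double
lerp xi eta lam = xi + lam * (eta - xi)

frac :: Double -> Double
frac v = v - fromIntegral (floor v :: Integer)

cylinderValue
  :: (Integer -> Double)   -- discrete input signal u
  -> Double                -- wave period T, possibly fractional
  -> Double -> Double      -- time t and phase representative phi in [0,1)
  -> Double
cylinderValue u period t phi =
  let roundT = round period :: Integer
      l      = t - phi                                     -- leap, cf. (27)
      r      = t * period - frac l * fromIntegral roundT   -- cf. (29)
      n      = floor r :: Integer
      a      = lerp (u n) (u (n+1)) (frac r)               -- step direction
      b      = lerp (u (n+roundT)) (u (n+roundT+1)) (frac r)
  in  lerp a b (frac l)                                    -- leap direction
```

Applied to an ascending sequence of times t and a phase that advances by the desired output frequency per sample, this reproduces the pitch-shifted tone.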
4.1. General Interpolations

Other interpolations than the linear one use the same computations to get $\mathrm{frac}\,l$ and r, but they access more values in the environment of n, i.e. $u(n + j + k\cdot\mathrm{round}\,T)$ for some j and k. E.g. for linear interpolation in the step direction and cubic interpolation in the leap direction it is $j\in\{0,1\}$ and $k\in\{-1,0,1,2\}$.

4.2. Coping with Boundaries

So far we have considered only signals that are infinite in both time directions. When switching to signals with a finite time domain, we become aware that our method consumes more data than it produces at the boundaries. This is, however, true for all interpolation methods.

We start by considering linear interpolation: In order to have a value for every phase at a given time, a complete vertical bar must be covered by interpolation cells. That happens the first time at time point 1. The same consideration holds for the end of the signal. That is, our method always reduces the signal by two waves. Analogously, for k-node interpolation in leap direction we lose k waves by pitch shifting.

If we used extrapolation at the boundaries, then for the same time but different phases we would sometimes have to interpolate and sometimes extrapolate. In order to avoid this, we just alter any $t\in[0,1)$ to $t = 1$ and limit t accordingly at the end of the signal.

4.3. Efficiency

The algorithm for interpolating a value on the cylinder is actually very efficient. The computation of the interpolation parameters and signal value indices in (29) needs constant time, and the interpolation effort is proportional to the number of nodes in step direction times the number of nodes in leap direction. Thus, for a given interpolation type, generating an audio signal from the cylinder model needs time proportional to the signal length and only constant memory in addition to the signal storage.

4.4. Implementation

A reference implementation of the developed algorithm is written in the purely functional programming language Haskell [6]. The tree of modules is located at http://code.haskell.org/synthesizer/core/src/. In [7] we have already shown how this language fulfils the needs of signal processing. The absence of side effects makes functional programming well suited for parallelisation. Recent progress on parallelisation in Haskell [8] and the now wide availability of multi-core machines in the consumer market justify this choice.

We can generate the cylindrical wave function with the function Synthesizer.Basic.Wave.sampledTone, given the interpolation in leap direction, the interpolation in step direction, the wave period of the input signal, and the input signal. The result of this function can then be used as input for an oscillator that supports parametrised waveforms, like Synthesizer.Plain.Oscillator.shapeMod. By the way, this implementation again shows how functional programming with higher-order functions supports modularisation: The shape-modulating oscillator can be used for any other kind of parametrised waveform, e.g. waveforms given by analytical functions. This is how we actually rendered the tones with morphing shape in the figures of this paper. In an imperative language you would certainly implement the waveform as a callback function. However, due to aggressive inlining, the compiled program does not actually need to call back the waveform function; instead the whole oscillator process is expanded into a single loop.
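To make the modularisation point concrete, here is our own minimal shape-modulating oscillator in the spirit of Synthesizer.Plain.Oscillator.shapeMod; the signature is illustrative and not the library's actual API:

```haskell
-- A minimal shape-modulating oscillator (our sketch, not the library API):
-- the waveform is any function of a shape parameter and a phase in [0,1),
-- so a sampled cylinder and an analytical waveform are interchangeable.
type Wave = Double -> Double -> Double   -- shape parameter, then phase

oscillatorShapeMod :: Wave -> Double -> [Double] -> [Double]
oscillatorShapeMod wave freq shapes =
  zipWith wave shapes phases
  where
    phases = iterate (\p -> frc (p + freq)) 0
    frc v  = v - fromIntegral (floor v :: Integer)

-- Usage sketch: a sine whose distortion follows the shape control.
-- oscillatorShapeMod (\s p -> sin (2*pi*p + s * sin (4*pi*p))) 0.01 [0,0.001..]
```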
4.5. Streaming

Due to its lazy nature, Haskell allows a simple implementation of streaming, that is, data is processed as it comes in, and thus processing consumes only a constant amount of memory. If we apply our pitch shifting and time stretching algorithm to an ascending sequence of time values, streaming is possible. This works because it is warranted that $\frac{r}{T}$ is not too far away from t. Since $\mathrm{frac}\,l\in[0,1)$ it holds

$$t - \frac{r}{T}\in\left[0,\ \frac{\mathrm{round}\,T}{T}\right) \qquad (30)$$

Thus we can safely move our focus to $t\cdot T - \mathrm{round}\,T$ in the discrete input signal u, which is equivalent to a combined translation and turning of the wave function on the cylinder.

What makes the implementation complicated is the handling of boundaries. At the beginning we limit the time parameter as described in Subsection 4.2. However, at the end we have to make sure that there is enough data for interpolation. It is not so simple to limit t to the length of the input signal minus the size of the data needed for interpolation, since determining the length of the input signal means reading it until the end. Instead, when moving the focus, we only move as far as enough data is available for interpolation. The function is implemented by Synthesizer.Plain.Oscillator.shapeFreqModFromSampledTone.

5. Applications

5.1. Combined Pitch Shifting and Time Scaling

With a frequency control curve f and a shape control g we get combined pitch shifting and time scaling out of our model using the conversion $S_{g,\int f}$ (see (7)): the shape control enters directly, while the phase control is the integral of the frequency control.
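As a per-sample sketch (our own names, building on a cylinder function like the one above), the integration becomes a running sum of the frequency control:

```haskell
-- Combined pitch shifting and time scaling (our sketch): integrate the
-- per-sample frequency control into a phase control and sample the
-- cylinder along (shape, phase), cf. (7).
pitchTimeScale
  :: (Double -> Double -> Double)  -- cylinder function y(t, phi)
  -> [Double]                      -- frequency control, per sample
  -> [Double]                      -- shape (time) control, per sample
  -> [Double]
pitchTimeScale y freqs shapes =
  zipWith y shapes phases
  where
    phases = scanl (\p f -> frc (p + f)) 0 freqs
    frc v  = v - fromIntegral (floor v :: Integer)
```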
5.2. Wavetable Synthesis

Our algorithm might be used as an alternative to wavetable synthesis in sampling synthesisers [9]. For wavetable synthesis a monophonic sound is reduced to a set of waveforms that is stored in the synthesiser. On replay the synthesiser plays those waveforms successively in small loops, maybe fading from one waveform to the next one. If we do not reduce the set of waveforms, but just chop the input signal into wave periods and then apply wavetable synthesis with fading between waveforms, we have something very similar to our method. In Figure 7 we compare wavetable synthesis and our algorithm using the introductory example of Figure 1. In this example both the wavetable synthesis and our method perform equally well.

If not stated otherwise, in this and all other figures we use linear interpolation. This minimises artifacts from boundary handling, and the results are good enough.

Figure 7. Pitch shifting performed on the signal of Figure 1 using linear interpolation in both directions. Above is the result of wavetable synthesis, below is the result of our method.

5.3. Compression

Wavetable synthesis can be viewed as a compression scheme: Sounds are saved in the compressed form of a few waves in the wavetable synthesiser and are decompressed in real time when playing the sound. Analogously, we can employ our method for compression of monophonic sounds. For compression we simply shrink the time scale, and for decompression we stretch it by the reciprocal factor. An example is given in Figure 8.

Figure 8. We show how a piano sound is altered by compression and decompression. The top-most graph is the original sound. The graphs below are the results of compression and decompression with cubic interpolation by the associated factors in the left column. Because the interpolation needs a margin at the beginning, we have copied the first two periods when compressing and decompressing.

The shrinking factor, and thus the compression factor, is limited by non-harmonic frequencies. These are always present in order to generate envelopes or phasing effects. Consider the frequency a that is decomposed into $b+n$ as in (4), no pitch shift, i.e. $\alpha = 1$, and the shrinking factor v. According to (5), the frequency $b+n$ is mapped to $b\cdot v + n$. In order to be able to decompose $b\cdot v + n$ into $b\cdot v$ and n again on decompression, it must be $b\cdot v\in\left(-\frac12,\frac12\right)$. This implies that if b is the maximum absolute deviation from an integral frequency that you want to be able to reconstruct, then it must be $|b|\cdot v < \frac12$.

The mapping of frequencies can best be visualised using the frequency spectrum as in Figure 9. Note how the peaks become wider by the compression factor while their shape is maintained. The resolution is divided by the compression factor, and this is why the compressed data actually consumes less space. The shape of a peak expresses the envelope of the according harmonic, and widening it means a time-shrunken envelope. If we compress too much, then peaks will overlap and we get aliasing effects on decompression.

Aliasing can be suppressed by smoothing across the same phase of all waves. That is, for the monophonic sound x with period T and a smoothing filter window w, we should compress $x * (\uparrow_{\mathrm{round}\,T} w)$ instead of x. We use the up arrow for the upsampling operator, where

$$(\uparrow_c w)(k) = \begin{cases} w(k/c) & k \equiv 0 \pmod{c} \\ 0 & \text{otherwise.} \end{cases}$$

Actually, we could use the frequency spectrum not only for visualising the compression (or pitch shifting), but we could also use the frequency spectrum itself for compression. The advantages would be simpler anti-aliasing (we would just throw away values outside bands around the harmonics), and we could also strip high harmonics once they fall below a given threshold. The advantages of computing in the time domain are that it consumes only linear time with respect to the signal length, not linear-logarithmic time like the Fourier transform, that it can be applied in a streaming way, and that it allows adapting the compression factor to local characteristics of a sound. For instance, you may use a shrinking factor close to 1 for fast varying portions of the signal and a larger shrinking factor on slowly modulated portions.

Figure 9. The first graph presents the lower part of the absolute spectrum of a piano sound. This is then compressed by a factor 4 in the second graph.
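The upsampling operator used for the anti-aliasing above has a direct Haskell counterpart; a small sketch of our own:

```haskell
-- The upsampling operator from Subsection 5.3 (our sketch):
-- (upsample c w)(k) = w(k/c) if c divides k, and 0 otherwise.
upsample :: Integer -> (Integer -> Double) -> (Integer -> Double)
upsample c w k
  | k `mod` c == 0 = w (k `div` c)
  | otherwise      = 0
```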
5.4. Loop Sampled Sounds

Another way to save memory in sampling synthesisers is to loop sounds. This is especially important in order to get infinite sounds, like string sounds, out of a finite storage. Looping means repeating portions of a sampled sound. The problem is to find positions of matching sound characteristics: A loop that causes a jump or an abrupt change of the waveform is a nasty audible artifact. Especially in samples of natural sounds there might be no such matching positions at all. Then the question is whether the sample can be modified in a way that preserves the sound but provides fine loop boundaries. Several solutions using fading or time reversal have been proposed. Our method offers a new way: We may move the time forth and back while keeping the pitch constant. In Figure 10 we show two reasonable time control curves.

Both control curves start with exactly reproducing the sampled sound and then smoothly enter a cycle. Actually, we copy the first part verbatim instead of running time stretching with factor 1, since our method cannot reproduce the beginning of the sound due to interpolation margins. The cycle of the first control curve consists of a sine, which warrants smooth changes of the time line. However, with this control, interferences are prolonged at the loop boundaries, which is clearly audible. It turns out that the second control curve, namely the zig-zag curve, sounds better. It preserves any chorus effect, and the change of the time direction is not as bad as expected.

A nice property of this approach is that the loop duration is doubled with respect to the actually looped data. In contrast to that, a loop body generated by simple cross-fading of parts of the sound, say with a von Hann window, would halve the loop body size and sound more hectic.

Since the time control affects only the waveform, it is warranted that at the cycle boundaries of the time control the waveforms of the time-manipulated sound match, too. In order to assert that the phases match as well, you have to choose a time control cycle length that is an integral multiple of the wave period.

Figure 10. Two possible time control curves for generating a loopable portion of a sampled sound.

5.5. Making Inaudible Harmonics Audible

Remember that our model does not preserve formants. Another application where this is appropriate is processing sounds where formants are not audible anyway, namely ultrasound signals. Our method can be used to make monophonic ultrasound signals audible by decreasing the pitch while maintaining the length. In Figure 11 we show an echolocation call of a bat. It is a chirp from about 35 kHz to 25 kHz, sampled at 441 kHz. The chirp nature does not match the requirements of our algorithm, so it is not easy to choose a base frequency. We have chosen 25 kHz and divide the frequency by a factor of 5 while maintaining the length. Unfortunately the waves have no special form that we can preserve. So this example might serve as a demonstration of the robustness of our algorithm with respect to non-harmonic frequencies and the preservation of the envelope. In the same way our method might be used to increase the pitch of infrasound.

Figure 11. Echolocation call of Nyctalus noctula. The time values are seconds.

5.6. FM Synthesis

Since we can choose the phase parameter per sample, we can not only do regular pitch shifting, but we can also apply FM synthesis effects [10]. An FM effect alone could also be achieved with synchronised time warping; however, with our method we can perform pitch shifting, time scaling and FM synthesis in one go. See Figure 12 for an example.

Figure 12. Above is a sine wave that is distorted by $v\mapsto\mathrm{sgn}(v)\cdot|v|^p$ for p running from $\frac12$ to 4. Below we applied our pitch shifting algorithm in order to increase the pitch and change the waveshape by modulating the phase with a sine wave of the target frequency.
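As an illustration of per-sample phase control, the following sketch modulates the phase with a sine of the target frequency; the modulation depth parameter and all names are our own invention for the example:

```haskell
-- FM effect by per-sample phase modulation (Subsection 5.6), sketched on
-- top of a cylinder function y(t, phi); 'depth' is an invented modulation
-- index for the example.
fmPitchShift
  :: (Double -> Double -> Double)  -- cylinder function y(t, phi)
  -> Double                        -- target frequency, cycles per sample
  -> Double                        -- modulation depth
  -> [Double]
fmPitchShift y freq depth =
  [ y t (frc (freq*t + depth * sin (2*pi*freq*t)))
  | i <- [0 :: Integer ..], let t = fromIntegral i ]
  where frc v = v - fromIntegral (floor v :: Integer)
```

Take a prefix of the resulting (lazy, infinite) list to render a finite signal.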
5.7. Tone Generation by Time Stretching

The inability to reproduce noise can be used for creative effects. By time stretching we can get a tone out of every sound. This is exemplified in Figure 13. If we stretch time by a factor n for a specific period T (source and target period shall be equal), then in the spectrum the peak for each harmonic of frequency $\frac{1}{T}$ is narrowed by a factor n.

Figure 13. A tone generated from pink noise by time stretching. The source and the target period are equal. The time is stretched by factor 4.

6. Related Work

The idea of separating parameters (here phase and shape) that are in principle indistinguishable is not new. For example, it is used in [11] for the separation of sine waves of considerably different frequencies. This way a numerically problematic ordinary differential equation is turned into a well-behaved partial differential equation.

Also the specific tasks of pitch shifting and time scaling are addressed by a broad range of algorithms [12]. Some of them are intended for application to complex music signals and are relatively simple, like "Overlap and Add" (OLA), "Synchronous Overlap and Add" (SOLA) [13,14], or the three-phase overlap algorithm using cosine windows presented in [15]. They take segments of an audio signal as they are, rearrange them and reduce the artifacts of the new composition. Other methods are based on a model of the sound. E.g. "pitch-synchronous overlap-add" (PSOLA) is roughly based on the excitation + filter model for speech [16-18], sinusoidal models interpret sounds as a mixture of sine waves that are modulated in amplitude and frequency [19], and even more sophisticated models treat sounds as a mix of sine waves, transients and a residual [20]. There are also methods specific to monophonic signals, like wavetable synthesis [9], and advanced methods that can cope with frequency modulated input signals [21].

In the following two sections we like to compare our method with the two methods that are most similar to the one introduced here, namely with wavetable synthesis and PSOLA.

6.1. Comparison with Wavetable Synthesis

When we chop our input signal into wave periods and use the waves as a wavetable, then wavetable synthesis becomes rather similar to our method [9]. Wavetable synthesis also preserves waveforms rather than formants, and it allows frequency and shape modulation at sample rate. However, due to the treatment of waveforms as discrete objects, wavetable synthesis cannot cope well with non-harmonic frequencies (Figure 16). Thus, in wavetable synthesisers, phasing is usually implemented using multiple wavetable oscillators. A minor deficiency is that fractional periods of the input signal are not supported: The wavetables always have to have an integral length. We consider this deficiency not so important, since when we do not match the wave period exactly, this appears to the wavetable synthesis algorithm as a shifting waveform. But that algorithm must handle varying waveshapes anyway.

The wavetables in a wavetable synthesiser are usually created by more sophisticated preprocessing than just chopping a signal into pieces of equal length. However, for comparison purposes we will just use this simple procedure.

Chopping and subsequent wavetable synthesis can also be interpreted as placing the sample values on a cylinder and interpolating between them. This yields the pattern shown in Figure 14. The variable s denotes the "step" direction, which coincides with the direction of the phase in this scheme. The variable l denotes the "leap" direction, which coincides with the time direction. In order to fit the requirement of a wave period of 1 we shrink the discrete input signal. Say the discrete input signal is u, the wave period is T, which must be integral, and the real input signal is x, which we define at the discrete fractional points by $x\left(\frac{n}{T}\right) = u(n)$ and at the other ones by interpolation.

Figure 14. Mapping of the sampled values to the cylinder in the wavetable-oscillator method. The grey numbers are the time points in the input signal.

In Figure 14 it is $T = 4$, and for example $y(1.7,\,c(0.6))$ is located in the rectangle spanned by the time points 6, 7, 10, 11. For simplicity let us use linear interpolation as in (28). We would interpolate

$$y(1.7,\,c(0.6)) = \mathrm{lerp}\big(\mathrm{lerp}(u(6),\,u(7),\,0.4),\ \mathrm{lerp}(u(10),\,u(11),\,0.4),\ 0.7\big).$$

In general, for $y(t,\varphi)$ we get, with $\tau = c^{-1}(\varphi)$,

$$y(t,\varphi) = \mathrm{lerp}\big(x(\lfloor t\rfloor + \tau),\ x(\lfloor t\rfloor + 1 + \tau),\ \mathrm{frac}\,t\big),$$

or, in more detail,

$$s = T\cdot c^{-1}(\varphi), \quad n = T\cdot\lfloor t\rfloor + \lfloor s\rfloor, \quad a = \mathrm{lerp}\big(u(n),\,u(n+1),\,\mathrm{frac}\,s\big), \quad b = \mathrm{lerp}\big(u(n+T),\,u(n+T+1),\,\mathrm{frac}\,s\big), \quad y(t,\varphi) = \mathrm{lerp}(a,\,b,\,\mathrm{frac}\,t).$$
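In code (our own sketch, for an integral wave period and linear interpolation), this lookup reproduces the worked example:

```haskell
-- Wavetable-oscillator lookup on the cylinder (Subsection 6.1), our sketch.
wavetableValue
  :: (Integer -> Double)  -- discrete input signal u
  -> Integer              -- integral wave period T
  -> Double -> Double     -- time t and phase representative phi in [0,1)
  -> Double
wavetableValue u tt t phi =
  let lerp xi eta lam = xi + lam * (eta - xi)
      frc v = v - fromIntegral (floor v :: Integer)
      s = fromIntegral tt * phi
      n = tt * floor t + floor s
      a = lerp (u n) (u (n+1)) (frc s)
      b = lerp (u (n+tt)) (u (n+tt+1)) (frc s)
  in  lerp a b (frc t)

-- wavetableValue u 4 1.7 0.6
--   == lerp (lerp (u 6) (u 7) 0.4) (lerp (u 10) (u 11) 0.4) 0.7
```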
The handling of waveform boundaries points us to a problem of this method: Also at the waveform boundaries we interpolate between adjacent values of the input signal u. That is, we do not wrap around. This way, waveforms can become discontinuous by interpolation. We could as well wrap around the indices at waveform boundaries. This would complicate the computation and raises the question which values should naturally be considered neighbours. We remember that we also have the ambiguity of phase values in our method; but there, the ambiguity vanishes in a subsequent step.

6.1.1. Boundaries

If we have an input signal of n wave periods, then we have only $n-1$ sections where we can interpolate linearly. Letting alone that this approach cannot reconstruct a given signal, it loses one wave at the end for linear interpolation. If there is no integral number of waves, then we may lose up to (but excluding) two waves. For interpolation between k nodes in time direction we lose $k-1$ waves. Of course we could extrapolate, but this is generally problematic. That is, the wavetable oscillator cuts away between one and two waves, whereas our method always reduces the signal by two waves. Thus the wavetable oscillator is slightly more economic.

6.2. Comparison with PSOLA

Especially for speech processing we would have to preserve formants rather than waveshapes. The standard method for this application is "(Time Domain) Pitch-Synchronous Overlap/Add" (TD-PSOLA) [16,17]. PSOLA decomposes a signal into wave atoms that are rearranged and mixed while maintaining their time scale. The modulation of the timbre and the pitch can only be done at wave rate. As for wavetable synthesis, it is also true for PSOLA that, due to the discrete handling of waveforms, non-harmonic frequencies are not handled well.

Incidentally, time shrinking at constant pitch with our method is similar to PSOLA of a monophonic sound. For time shrinking with factor v and interpolation with kernel $\kappa$ our algorithm computes:

$$z(t) = (Fx)(v\cdot t,\ c(t)) = \sum_{k\in\mathbb{Z}} x(t+k)\cdot\kappa\big(v t - (t+k)\big) = \sum_{k\in\mathbb{Z}} x(t+k)\cdot\kappa\big((v-1)\,t - k\big),$$

that is, with the dilation operator $(d\downarrow\kappa)(t) = \kappa(d\cdot t)$,

$$z = \sum_{k\in\mathbb{Z}} (x\leftarrow k)\cdot\big((v-1)\downarrow(\kappa\to k)\big).$$

We see that the interpolation kernel $\kappa$ acts like the segment window in PSOLA, but it is applied to different phases of the waves. For $v = 1$, only the non-translated x is passed to the output.

Intuitively we can say that PSOLA is source oriented or push-driven, since it dissects the input signal into segments independently of what kind of output is requested. Then it computes where to put these segments in the output.
In these terms, our method is target oriented or pull-driven, as it investigates for every output value where it can get the data for its construction from.

Actually, it would be easy to add another parameter to PSOLA for time stretching the atoms. This way one could interpolate between shape preservation and formant preservation.

7. Results and Comparisons

Finally, we like to show some more results of our method and compare them with wavetable synthesis.

In Figure 15 we show that signals with band-limited amplitude modulation can be perfectly reconstructed, except at the boundaries. Although we do not employ Whittaker interpolation but simple linear interpolation, the result is convincing.

Figure 15. Pitch shifting performed on a periodically amplitude modulated tone using linear interpolation. The figures show from top to bottom: the input signal, the signal recomputed with a different pitch (that is, the ideal result of a pitch shifter), the result of wavetable oscillating, the result of our method.

In Figure 16 we apply our method to a sine with a frequency that is clearly distinct from 1. To a monophonic pitch shifter this looks like a rapidly changing waveform. As derived for Whittaker interpolation in (17), our method can at least reconstruct the sine shape; however, the frequency of the pitch-shifted signal differs from the intended one. Again, the linear interpolation used does not seem to be substantially worse.

Figure 16. Pitch shifting performed on a sine tone with a frequency that deviates from the required frequency 1. The graphs are arranged analogously to Figure 15.

We also like to show how phase modulation at sample rate can be used for FM synthesis combined with pitch shifting. In Figure 17 we use a sine wave with changing distortion as input, whereas in Figure 18 the sine wave is not distorted but detuned to frequency 1.2, which must be treated as a changing waveform with respect to frequency 1.

Figure 17. Above is a sine wave that is distorted by $v\mapsto\mathrm{sgn}(v)\cdot|v|^p$ for p running from $\frac12$ to 4. Below we applied our pitch shifting algorithm in order to increase the pitch and change the waveshape by modulating the phase with a sine wave of the target frequency. The graphs are arranged analogously to Figure 15.

Figure 18. Here we demonstrate FM synthesis where the carrier sine wave is detuned. The graphs are arranged analogously to Figure 15.

As a kind of counterexample we demonstrate in Figure 19 how the boundary handling forces our method to limit the time parameter to values above 1, and thus it cannot reproduce the beginning of the sound properly. For completeness we also present the same sound transposed by PSOLA in Figure 20.

Figure 19. Pitch shifting performed on a percussive tone. The graphs are arranged analogously to Figure 15.

Figure 20. Pitch shifting with the tone from Figure 19 that preserves formants, performed by PSOLA.

Please note that the examples have a small number of periods (7 to 10) compared to signals of real instruments (say, 200 to 2000 per second). On the one hand, graphs of real-world sounds would not fit on the pages of this journal at a reasonable resolution. On the other hand, only for those small numbers of periods do we get a visible difference between the methods we compare here.
However, if you are going to implement a single-tone pitch shifter from scratch, you might prefer our method, because it handles the corner cases better and its complexity is comparable to that of the wavetable oscillator. Also for theoretical considerations we recommend our method, since it exposes the nice properties presented in Section 2.

7.1. Conclusions

We shall note that, despite the differences between our method and existing ones, many of the properties discussed in Subsection 2.3 hold approximately for the existing methods as well. Thus the worth of our work is certainly to contribute a model where these properties apply exactly. This should serve as a good foundation for further development of a sound theory of pitch shifting and time scaling. It also pays off when it comes to corner cases, like FM synthesis as extreme pitch shifting.

8. Outlook

In our paper we have omitted how to avoid aliasing effects in pitch shifting caused by too high harmonics in the waveforms. In some way we have to band-limit the waveforms. Again, we should do this without actually constructing the two-dimensional cylindrical function. When we use an interpolation that does not extend the frequency band imposed by the discrete input signal, then it should be fine to lowpass filter the input signal before converting to the cylinder. The cut-off frequency must be dynamically adapted to the frequency modulation used on conversion from the cylinder to the audio signal.

We could also handle input of varying pitch. We would then need a function of time describing the frequency modulation, which is used to place the signal nodes on the cylinder. This would be an irregular pattern and would render the whole theory of Section 3 useless. We would have to choose a generalised 2D interpolation scheme.

9. Acknowledgements

I like to thank Alexander Hinneburg for fruitful discussions and creative suggestions. I also like to acknowledge Sylvain Marchand and Martin Raspaud for their comments on my idea and their encouragement. Finally I am grateful to Stuart Parsons, who kindly permitted the usage of his bat recordings in this paper.

REFERENCES

[1] S. Owre, N. Shankar, J. M. Rushby and D. W. J. Stringer-Calvert, "The Prototype Verification System," PVS System Guide, 2001.

[2] H. Thielemann, "Optimally Matched Wavelets," Ph.D. Thesis, Universität Bremen, March 2006.

[3] G. Strang, "Eigenvalues of $(\downarrow 2)H$ and Convergence of the Cascade Algorithm," IEEE Transactions on Signal Processing, Vol. 44, 1996, pp. 233-238.

[4] I. Daubechies and W. Sweldens, "Factoring Wavelet Transforms into Lifting Steps," Journal of Fourier Analysis and Applications, Vol. 4, No. 3, 1998, pp. 245-267.

[5] R. W. Hamming, "Digital Filters," Signal Processing Series, Prentice Hall, Upper Saddle River, January 1989.

[6] S. P. Jones, "Haskell 98 Language and Libraries, the Revised Report," 1998. http://www.haskell.org/definition/

[7] H. Thielemann, "Audio Processing Using Haskell," DAFx: Conference on Digital Audio Effects, G. Evangelista and I. Testa, Eds., Federico II University of Naples, Italy, October 2004, pp. 201-206.

[8] S. P. Jones, R. Leshchinskiy, G. Keller and M. M. T. Chakravarty, "Harnessing the Multicores: Nested Data Parallelism in Haskell," IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS'08), 2008.
[9] D. C. Massie, "Wavetable Sampling Synthesis," in Applications of Digital Signal Processing to Audio and Acoustics, M. Kahrs and K. Brandenburg, Eds., Kluwer Academic Press, 1998, pp. 311-341.

[10] J. M. Chowning, "The Synthesis of Complex Audio Spectra by Means of Frequency Modulation," Journal of the Audio Engineering Society, Vol. 21, No. 7, 1973, pp. 526-534.

[11] B. Lang, "Einbettungsverfahren für Netzwerkgleichungen," Ph.D. Thesis, Universität Bremen, Germany, November 2002.

[12] U. Zölzer, Ed., "DAFx: Digital Audio Effects," John Wiley and Sons Ltd., Hoboken, February 2002.

[13] S. Roucos and A. M. Wilgus, "High Quality Time-Scale Modification for Speech," Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 1985, pp. 493-496.

[14] J. Makhoul and A. El-Jaroudi, "Time-Scale Modification in Medium to Low Rate Speech Coding," Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 1986, pp. 1705-1708.

[15] S. Disch and U. Zölzer, "Modulation and Delay Line Based Digital Audio Effects," Proceedings DAFx-99: Workshop on Digital Audio Effects, Trondheim, December 1999, pp. 5-8.

[16] C. Hamon, E. Moulines and F. Charpentier, "A Diphone Synthesis System Based on Time-Domain Prosodic Modifications of Speech," Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 1989, pp. 238-241.

[17] E. Moulines and F. Charpentier, "Pitch Synchronous Waveform Processing Techniques for Text-to-Speech Synthesis Using Diphones," Speech Communication, Vol. 9, No. 5-6, 1990, pp. 453-467.

[18] S. Lemmetty, "Review of Speech Synthesis Technology," M.S. Thesis, Helsinki University of Technology, March 1999.

[19] M. Raspaud and S. Marchand, "Enhanced Resampling for Sinusoidal Modeling Parameters," WASPAA'07, 2007.

[20] F. X. Nsabimana and U. Zölzer, "Audio Signal Decomposition for Pitch and Time Scaling," ISCCSP 2008, March 2008.

[21] A. Haghparast, H. Penttinen and V. Välimäki, "Real-Time Pitch-Shifting of Musical Signals by a Time-Varying Factor Using Normalized Filtered Correlation Time-Scale Modification (NFC-TSM)," International Conference on Digital Audio Effects, September 2007, pp. 7-13.