Journal of Computer and Communications
Vol.05 No.09(2017), Article ID:77423,12 pages

Study of an Interactive System of Sight-Singing and Ear-Training

Qian Lin*, Shan-Ji Chen, Wei Jiang, Guo-Qing Jia

College of Physics and Electronic Information Engineer, Qinghai University for Nationalities, Xining, China

Copyright © 2017 by authors and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).

Received: November 11, 2016; Accepted: July 3, 2017; Published: July 6, 2017


Abstract

This paper introduces an interactive system for sight-singing and ear-training that combines single chip microprocessor (SCM) technology with music learning. The system scans the keyboard, reads the key value, services interrupts, and reads and displays files. To make the system more general and practical, an SD card is used to store the materials, and the ZNFAT32 file system is ported to organize the materials as a file directory. This interactive system can recognize and perform music automatically for users and can play a significant role in education and entertainment.


Keywords

Sight-Singing and Ear-Training, Single Chip Microprocessor (SCM), Materials, SD Card, Serial Peripheral Interface (SPI)

1. Introduction

As a foundational course, sight-singing and ear-training is an important technical training for music majors. It combines basic music theory with the practice of sight-singing and ear-training. Informally, sight-singing is the skill of singing a score accurately [1]. Ear-training is the training of the auditory sense [2], which includes the ability to distinguish chords and rhythms and to write down a sound or tune [3]. Traditional teaching uses the piano as the main instrument for sight-singing and percussion as the main instrument for ear-training. However, with the spread of computers and multimedia, music processing systems have become increasingly popular [4]. At present, research mainly focuses on the extraction, recognition and composition of music [5]. Several professional software packages have been developed for music learners and amateurs. These developments have had a profound effect on the teaching of sight-singing and ear-training [6]. For professional teaching and examination in particular, an electronic music platform is convenient and necessary. For example, sight-singing and ear-training are major parts of the annual CCTV National Youth Singer competition. However, the operation is usually performed manually on a computer.

To meet the need for real-time, convenient and inexpensive operation [7], single chip microprocessors (SCMs), which can integrate with various chips, have been used in some devices. An SCM is well suited to controlling appliances or devices and can work independently. Owing to its small size, it is more portable than a computer. To the best of our knowledge, the functions of sight-singing and ear-training have not yet been combined with an SCM. Therefore, a music processing system based on an SCM is introduced in this paper. It is an interactive platform that can display the musical staff for sight-singing and play the MIDI score for ear-training [8]. Learners can study and train with it even without a computer. Section 2 gives a brief introduction to the system functions. Section 3 presents the hardware design, including the SD card, memory expansion and video communication circuits, together with the software design of the Serial Peripheral Interface (SPI) mode for the SD card. The results and discussion of system testing and integration are presented in Section 4.

2. System Function

The system framework is shown in Figure 1. The STC single chip microprocessor (SCM) is the core of the whole system; it is responsible for reading the SD card, extending the memory and communicating with the display and speaker. According to the received data, it either plays the music or displays the image [9].

Figure 1. System frame.

The materials are modeled on the sight-singing and ear-training course and include image files and MIDI files of staffs and chords, which are stored on the SD card and organized with the ZNFAT32 file system [10]. Furthermore, to read and display quickly, external RAM is added by connecting an HM62256 within the 64 KB external address space. First, the user chooses the relevant material by pressing a key. The SCM then receives the key value and searches the file directory to read the corresponding file from the SD card. After a few seconds, the image is shown on the display or the MIDI file is played [11]. For example, ear-training is started by pressing the R1 key. The system reads the file from the SD card and sends the data to the audio source, and after a few seconds the user hears the music from the speaker. Through repeated training, the user learns to distinguish chords and rhythms and to write down a sound or tune accurately. Similarly, sight-singing training is started by pressing the S1 key. The system reads the file from the SD card and sends the data to the display, and after a few seconds the user sees the image on the screen. Through repeated training, the user learns to distinguish chords and rhythms and to sing the score accurately.

3. Design Method of the System

3.1. Hardware System Design

The hardware circuit of the system is shown in Figure 2. It is based on the SCM, and the peripheral circuits include the SD card, memory extension, material selection and display. As shown in Figure 2, ports P0 and P2 of the SCM connect to the memory extension circuit, port P1 connects to the SD card, and port P3 connects to the VGA controller and the keyboard.

3.1.1. Choice of STC Chip

The STC is a new generation of 8051 SCM with the advantages of single-clock-cycle execution, high speed, low power consumption and strong anti-interference capability [12]. Its instruction set is fully compatible with the traditional 8051, and it runs 8 to 12 times faster. It provides two PWM channels, eight high-speed A/D conversion ports and 1280 bytes of RAM, and the program can be downloaded directly through the serial port (P3.0/P3.1) in a few seconds. The driving ability of each I/O port is up to 20 mA. Together with its full-duplex asynchronous serial port, these features make it especially suitable for environments with strong interference.

Figure 2. System hardware circuit.

3.1.2. SD Card Circuit

Because it is small, portable, cheap and light, offers large capacity and a simple interface, and consumes little power, the SD card is chosen to store the materials database and to make the system more general and practical.

An SD card supports two communication modes: SD mode and SPI mode. SD mode is used for high-speed reading and writing, whereas SPI mode is used for low-speed applications and has good compatibility with hardware interfaces [13]. Because the protocol is widely supported and the connection is simple, the SD card is often operated in SPI mode, which uses a host/slave structure and transfers data byte by byte [14]. SPI communication uses four pins: CS, MOSI, MISO and SCK. CS is the chip select and is active low. MOSI carries data from the host output to the slave input, and MISO carries data from the slave output to the host input. SCK is the synchronous clock. The signals transmitted in SPI mode therefore comprise the serial clock, the input data and the output data. Accordingly, four lines connect the SCM to the SD card in SPI mode [15]: pin 1 (DAT3) is the chip select (CS), pin 2 (CMD) is the host data output (MOSI), pin 7 (DAT0) is the host data input (MISO) and pin 5 is the clock (SCK). Apart from power and ground, the remaining pins can be left unconnected.
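The byte-wise, full-duplex exchange over MOSI, MISO and SCK described above can be illustrated with a small host-side simulation. This is a minimal sketch, not the firmware of the system: it assumes SPI mode 0 with the most significant bit first, and `LoopbackSlave` is a hypothetical stand-in for the card that simply answers each byte with the previous byte it received (0xFF, the idle level, before the first one).

```python
class LoopbackSlave:
    """Hypothetical SPI slave for illustration only: it replies to each
    byte with the previous byte it received (0xFF before the first)."""

    def __init__(self):
        self._out = 0xFF   # byte currently being shifted out on MISO
        self._in = 0       # bits collected from MOSI so far
        self._count = 0    # bits received in the current byte

    def miso(self):
        # Present the most significant remaining bit on MISO.
        return (self._out >> 7) & 1

    def clock_rising(self, mosi_bit):
        # Sample MOSI on the rising edge of SCK, MSB first.
        self._in = ((self._in << 1) | mosi_bit) & 0xFF
        # Move on to the next output bit, padding with 1 (idle high).
        self._out = ((self._out << 1) & 0xFF) | 1
        self._count += 1
        if self._count == 8:
            # A full byte has arrived: queue it as the next reply.
            self._out, self._in, self._count = self._in, 0, 0


def spi_exchange(slave, tx_byte):
    """Send one byte on MOSI while receiving one byte on MISO."""
    rx_byte = 0
    for bit in range(7, -1, -1):
        mosi = (tx_byte >> bit) & 1      # host drives MOSI
        miso = slave.miso()              # slave drives MISO
        slave.clock_rising(mosi)         # one SCK pulse
        rx_byte = (rx_byte << 1) | miso  # host samples MISO
    return rx_byte
```

Each call shifts eight bits in both directions at once, which is why reading a byte from the card in SPI mode is normally done by sending a dummy 0xFF byte.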

In addition, the logic level of the SCM is 5 V, whereas the SD card uses 3.3 V logic. They cannot be connected directly; otherwise the SD card may be damaged [16]. Therefore, a conversion circuit is needed to match the voltages, in which the ASM117 converts the 5 V input into the 3.3 V output for the SD card. The circuit that connects the SD card is shown in Figure 3.

3.1.3. Memory Expansion Circuit

The average size of the materials is about 20 KB, whereas the STC SCM has only 1280 bytes of RAM. Therefore, external RAM must be added to expand the memory. The HM62256 chip, which has a capacity of 32 KB, is a suitable choice. To read and write the RAM, port P0 supplies the low eight address bits as part of a 16-bit address, which allows the 512-byte blocks of the SD card to be read and written. A 74HC373 latch is connected to the HM62256 so that port P0 can be reused within the 64 KB address space. With the memory expansion circuit connected as shown in Figure 4, the speed of reading and writing is improved significantly.
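The sizing argument above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text (512-byte blocks, 1280 bytes of on-chip RAM, a 32 KB HM62256 and a typical 20 KB material file); the function names are illustrative.

```python
BLOCK_SIZE = 512            # bytes per SD block (from the text)
ON_CHIP_RAM = 1280          # bytes of on-chip RAM (from the text)
EXTERNAL_RAM = 32 * 1024    # HM62256 capacity in bytes (from the text)
TYPICAL_FILE = 20 * 1024    # typical material size in bytes (from the text)


def blocks_needed(file_size):
    """Number of whole 512-byte blocks needed to hold file_size bytes."""
    return (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE


def fits(file_size, ram_size):
    """True if every block of the file can be buffered in ram_size bytes."""
    return blocks_needed(file_size) * BLOCK_SIZE <= ram_size


print(blocks_needed(TYPICAL_FILE))       # 40 blocks for a 20 KB file
print(ON_CHIP_RAM // BLOCK_SIZE)         # only 2 blocks fit on chip
print(fits(TYPICAL_FILE, ON_CHIP_RAM))   # False
print(fits(TYPICAL_FILE, EXTERNAL_RAM))  # True
```

A typical file therefore spans 40 blocks, of which at most two could be held on chip, while the external RAM can buffer the whole file.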

Figure 3. SD card circuit.

Figure 4. Memory expansion circuit.

3.1.4. Video Communication

To display the image, the FX-VXC256 is used as the VGA controller in this system. Once the SCM is connected to the corresponding interface and sends the appropriate instructions, the controller converts the data from the serial port into the image shown on the display. The outline of the FX-VXC256 is shown in Figure 5, where interface J2 connects to the SCM and J4 connects to the display.

3.2. Software System Design

The software flow chart of the system is shown in Figure 6. It starts with the initialization of the serial port, the SD card and the file system. The SCM then scans the keyboard continuously. If a key is pressed, the SCM responds quickly by reading the key value and executing the interrupt service routine. When the SCM returns from the interrupt, the system opens and reads the corresponding file. If the file is an image file, the system starts the VGA controller to show it on the display. If the file is a MIDI file, the system sends the data to the audio source to play the music [17].

Figure 5. Outline of the FX-VXC256.

Figure 6. The software flow chart of system.
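The dispatch step of this flow, from key value to file to output device, can be sketched as follows. The key names S1 and R1 and the file name stave.bmp come from the text; the MIDI file name and the function names are illustrative assumptions, and the real firmware would call the file system and the output drivers instead of returning a label.

```python
# Key names come from the text; the file names are illustrative assumptions.
KEY_TO_FILE = {
    "S1": "stave.bmp",    # sight-singing: image file
    "R1": "chord.midi",   # ear-training: MIDI file (hypothetical name)
}


def output_for(filename):
    """Choose the output device from the file suffix."""
    suffix = filename.rsplit(".", 1)[-1].lower()
    if suffix == "bmp":
        return "display"   # image data goes to the VGA controller
    if suffix in ("mid", "midi"):
        return "audio"     # MIDI data goes to the audio source
    raise ValueError("unsupported material type: " + filename)


def handle_key(key):
    """Return (filename, output device) for a key, or None if unmapped."""
    filename = KEY_TO_FILE.get(key)
    if filename is None:
        return None        # ignore keys with no material assigned
    return filename, output_for(filename)
```

For example, `handle_key("S1")` selects the display path and `handle_key("R1")` selects the audio path, mirroring the two branches of the flow chart.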

The software design comprises initialization, input, data processing and output. Here, the core program for the SD card is described in detail. The flow chart for reading data from the SD card is shown in Figure 7. When 512 bytes of data have been deposited in the RAM, the SCM writes the block to the SD card. For reading, the CMD17 command is sent, and the system starts to receive data after identifying the key value. When the output returns to the high level, the read operation is finished.
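As an illustration of the CMD17 read described above, the following sketch builds the 6-byte SPI command frame and extracts the 512-byte block from a raw response. It follows the general SPI-mode conventions of SD cards (command byte 0x40 plus the index, a 32-bit argument, CRC-7 with a stop bit, and a 0xFE data start token) rather than the authors' code. The function names are assumptions, and whether the argument is a byte address or a block address depends on the card type, which the paper does not state.

```python
BLOCK_SIZE = 512     # bytes per SD block, as in the text
START_TOKEN = 0xFE   # data start token for a single-block read in SPI mode


def crc7(data):
    """CRC-7 (polynomial x^7 + x^3 + 1) over the given bytes."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):
            crc <<= 1
            # XOR the incoming data bit with the bit shifted out of the register.
            if ((byte >> bit) & 1) ^ ((crc >> 7) & 1):
                crc ^= 0x09
            crc &= 0x7F
    return crc


def command_frame(index, argument):
    """Build a frame: 0x40 | index, 32-bit argument, then CRC-7 and stop bit."""
    body = bytes([0x40 | index]) + argument.to_bytes(4, "big")
    return body + bytes([(crc7(body) << 1) | 1])


def extract_block(response):
    """Return the 512 data bytes from a raw CMD17 response byte string."""
    i = 0
    while i < len(response) and response[i] == 0xFF:   # skip idle bytes
        i += 1
    if i == len(response) or response[i] != 0x00:      # R1 must report no error
        raise IOError("card did not accept CMD17")
    i += 1
    while i < len(response) and response[i] == 0xFF:   # wait for the data token
        i += 1
    if i == len(response) or response[i] != START_TOKEN:
        raise IOError("no data start token")
    data = response[i + 1:i + 1 + BLOCK_SIZE]
    if len(data) != BLOCK_SIZE:
        raise IOError("short block")
    return bytes(data)   # the two CRC bytes that follow are ignored here
```

The idle bytes skipped by `extract_block` are 0xFF because the data line rests at the high level, which matches the remark that a high output marks the end of the read.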

4. Results and Discussions

Based on the design above, the interactive platform for sight-singing and ear-training has been constructed. Learners can choose materials of different difficulty according to their own needs and interests. A material library is established in the form of a directory, and the MIDI files and image files are stored in the directory structure shown in Figure 8. Files with the suffix midi are the ear-training materials, and files with the suffix bmp are the sight-singing materials, shown as stave.bmp and numbered.bmp. There are two general modes, sight-singing and ear-training. In the sight-singing mode, there are numbered musical notation files and staff files of different difficulty. In the ear-training mode, there are MIDI files of different difficulty [18]. The materials are organized into low, medium and high difficulty. The system identifies the file and executes the related program to invoke the corresponding file.

Figure 7. The flow chart for SD card.

Figure 8. The directory structure of materials.

In addition, the SD card stores the materials, which are organized with the ZNFAT32 file system. The materials can be managed easily on the SD card [19], because files can be operated on with the functions of the ZNFAT32 file system, such as read, write, open, close and search.

The directory structure of the sight-singing and ear-training system is shown in Figure 8 [20]. Figure 9 shows the staff files stored on the SD card. After the S1 key is pressed, the system starts to read the image file through the serial port, and the data of stave 2.bmp can be seen as shown in Figure 10. When the data is sent to the VGA controller, the image is shown on the display [21]. As Figure 11 shows, it takes several seconds for the image to appear on the display.
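The delay of several seconds is consistent with a simple estimate of the serial transfer time. The sketch below is only a rough calculation under stated assumptions: the paper does not give the serial rate, so the two baud rates are hypothetical examples, and each byte is assumed to cost 10 bit times (one start bit, eight data bits, one stop bit).

```python
def transfer_seconds(num_bytes, baud, bits_per_byte=10):
    """Time to send num_bytes over an asynchronous serial link."""
    return num_bytes * bits_per_byte / baud


IMAGE_BYTES = 20 * 1024   # typical material size quoted in Section 3.1.3

# Hypothetical baud rates, chosen only to show the order of magnitude.
for baud in (9600, 115200):
    print(baud, round(transfer_seconds(IMAGE_BYTES, baud), 2))
```

Under these assumptions a 20 KB image needs roughly 21 s at 9600 baud and under 2 s at 115200 baud, so the serial link alone can account for a wait of this order.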

5. Conclusions

This system achieves the functions of sight-singing and ear-training. It scans the keyboard, reads the key value, handles interrupts, and reads and displays files conveniently. It can be used as an interactive platform for music learners. It is an attempt to use the SCM in place of a computer for sight-singing and ear-training. In the future, it may become an electronic product that is popular with music learners.

However, a high-quality image contains a large amount of data, and our system does not have enough memory to store the whole image. In testing, some images were shown point by point, and some even with defects. This indicates that the speed is not yet satisfactory and that the performance of the system can be improved. The display method needs to be improved so that images are displayed completely and quickly. Moreover, collecting music materials is a formidable task; so far the material library is incomplete and still needs to be supplemented. Meanwhile, to ensure the accuracy of the music, the quality of the speaker and display needs to be improved. Finally, on the basis of these efforts, other functions such as exam and competition simulation, grading and timing can be developed to make the system more comprehensive. All of these are left for our future research.

Figure 9. The stave files in SD card.

Figure 10. The file data.

Figure 11. System demonstration.


Acknowledgements

This work was supported by the Applied Basic Research Plan of Qinghai (2017-ZJ-753), the Chun Hui Project of Education Ministry (Z2016071) and (Z2015033), the Natural Science Foundation of Qinghai (2016-ZJ-922) and the Open Fund of the Wireless Sensor Network and Communication Key Laboratory of the Shanghai Institute of Micro-System and Information Technology, Chinese Academy of Sciences (2016002).

Cite this paper

Lin, Q., Chen, S.-J., Jiang, W. and Jia, G.-Q. (2017) Study of an Interactive System of Sight-Singing and Ear-Training. Journal of Computer and Communications, 5, 67-78.


References

1. Nakayama, M. (2010) Fundamental Research of Construction for the Singing Training Support System for Shigin of Japanese Traditional Singing. IEICE Technical Report, 110, 19-24.

2. Goto, M. (2003) Smart Music KIOSK: Music Listening Station with Chorus-Search Function. Proceedings of UIST, 31-40.

3. Zhu, X., Shi, Y.Y., Kim, H.G. and Eom, K.W. (2006) An Integrated Music Recommendation System. IEEE Transactions on Consumer Electronics, 52, 917-925.

4. Wang, B.-R. and Chen, C.-Y. (2014) Development of an Image Processing Based Sheet Music Recognition System for iOS Devices. IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW), Taipei, 26-28 May 2014, 223-224.

5. Lim, S.-C., Lee, J.-S., Jang, S.-J., Lee, S.-P. and Kim, M.Y. (2012) Music-Genre Classification System Based on Spectro-Temporal Features and Feature Selection. IEEE Transactions on Consumer Electronics, 58, 1262-1268.

6. Yoo, J.M., Kim, G.H. and Lee, G.S. (2008) Mask Matching for Low Resolution Musical Note Recognition. IEEE International Symposium on Signal Processing and Information Technology, 2008, 345-353.

7. Tzanetakis, G. and Cook, P. (2002) Musical Genre Classification of Audio Signals. IEEE Transactions on Speech and Audio Processing, 10, 293-302.

8. Fujihara, H., Goto, M., Ogata, J. and Komatani, K. (2005) INTER:D: A Drum Sound Equalizer for Controlling Volume and Timbre of Drums. Proceedings of EWIMT, 2005, 205-212.

9. Akbari, M. and Cheng, H. (2015) Real-Time Piano Music Transcription Based on Computer Vision. IEEE Transactions on Multimedia, 17, 2113-2121.

10. Cui, J.L., He, H. and Wang, Y.D. (2010) An Adaptive Staff Line Removal in Music Score Image. IEEE 10th International Conference on Signal Processing (ICSP), Beijing, 24-28 October 2010, 66-74.

11. Shao, B., Ogihara, M., Wang, D. and Li, T. (2009) Music Recommendation Based on Acoustic Features and User Access Patterns. IEEE Transactions on Audio, Speech, and Language Processing, 17, 1602-1611.

12. Goto, M. and Hirata, K. (2004) Invited Review: Recent Studies on Music Information Processing. Acoustical Science and Technology, 25, 419-425.

13. Zhou, P., Wang, T., Wang, X.A. and Wang, Y.H. (2014) Hardware Implementation of a Low Power SD Card Controller. Proceedings of IEEE International Conference on Signal Processing, Hangzhou, 19-23 October 2014, 158-161.

14. Mei, S.D. and Toshiba, S.D. (2000) Memory Card Specifications. SD Group, Beirut, 1-83.

15. Pan, Y.C., Liu, W.C. and Li, X. (2010) Development and Research of Music Player Application Based on Android. Proceedings of International Conference on Communications and Intelligence Information Security, Nanning, 13-14 October 2014, 23-25.

16. Parra, R., Ramirez, J. and Lahaye, M.A. (2014) Design and Implementation of a Music Composition Application Using Speech Recognition. Proceedings of 2014 XL Latin American Computing Conference, Montevideo, 15-19 September 2014, 1-12.

17. Goto, M. (2006) A Chorus-Section Detection Method for Musical Audio Signals and Its Application to a Music Listening Station. IEEE Transactions on Audio, Speech, and Language Processing, 14, 1783-1794.

18. Yoshii, K., Goto, M. and Okuno, H.G. (2007) Drum Sound Recognition for Polyphonic Audio Signals by Adaptation and Matching of Spectrogram Templates with Harmonic Structure Suppression. IEEE Transactions on Audio, Speech, and Language Processing, 15, 333-345.

19. Microsoft Company (2000) Hardware White Paper Microsoft Extensible Firmware Initiative FAT32 File System Specification. Microsoft, Redmond, 7-25.

20. Jian, D. (2015) FAT File System Principle and Implement. Computer and Digital Project, 12, 35-42.

21. Cui, J.L., He, H. and Wang, Y.D. (2010) An Adaptive Staff Line Removal in Music Score Images. Proceedings of IEEE 10th International Conference on Signal Processing, Beijing, 24-28 October 2010, 1105-1112.