Application Number: 12272650
Application Date: 17.11.2008
Publication Number: 20100123785
Publication Date: 20.05.2010
Publication Kind: A1
IPC: H04N 5/228; H04R 3/00; G06F 3/048; G06K 9/00
Applicants: APPLE INC.
Inventors: Chen Shaohai; Tamchina Philip George; Lee Jae Han; Seguin Chad G.; Lee Michael
Agents: APPLE INC./BSTZ; BLAKELY SOKOLOFF TAYLOR & ZAFMAN LLP
Priority Data:
Title: Graphic Control for Directional Audio Input

Abstract: A device to provide an audio output includes a microphone array, a signal processor, and a graphic user interface (GUI). The signal processor is coupled to the microphone array to perform audio beamforming with input from the microphone array. The GUI is coupled to the signal processor to display a plurality of audio sources, to receive a selection of at least one of the plurality of
audio sources from a user, and to provide the selection to the signal processor for aiming the audio beamforming toward the selected audio source. The selection may be made by touching the display. The device may further include a camera, and the GUI may display an image received from the camera as the plurality of audio sources. The camera may provide a moving video image, and the signal processor may provide a synchronized audio signal aimed at the selected audio source.

Graphic Control for Directional Audio Input

1. Field

Embodiments of the invention relate to the field of audio beamforming and, more specifically, to the aiming of audio beamforming.

2. Background

Under typical, imperfect conditions, a single microphone embedded in a mobile device does a poor job of capturing sound because background sounds are captured along with the sound of interest. An array of microphones can do a better job of isolating a sound source and rejecting ambient noise and reverberation. Beamforming is a way of combining sounds from two or more microphones that allows preferential capture of sounds coming from certain
directions. In a delay-and-sum beamformer, sounds from each microphone are delayed relative to sounds from the other microphones, and the delayed signals are added. The amount of delay determines the beam angle: the angle in which the array preferentially “listens.” When a sound arrives from this angle, the signals from the multiple microphones add constructively. The resulting sum is stronger, and the sound is received relatively well. When a sound arrives from another angle, the delayed signals from the various microphones add destructively, with positive and negative parts
of the sound waves canceling out to some degree, and the sum is not as loud as an equivalent sound arriving from the beam angle. For example, if a sound reaches the microphone on the right before it reaches the microphone on the left, the sound source is to the right of the microphone array. During sound capture, the microphone array processor can aim a capturing beam in the direction of the sound source. Beamforming allows a microphone array to simulate a highly directional microphone pointing toward the sound source. Compared with a single microphone, the directivity of the microphone array reduces the amount of captured ambient noise and reverberated sound. This may provide a clearer representation of a speaker's voice. A beamforming microphone array may be made up of distributed omnidirectional microphones linked to a processor that combines the several inputs into a single coherent output. Arrays may be formed using a number of closely spaced microphones. Given a fixed physical relationship in space between the individual microphone transducer elements of the array, simultaneous digital signal processor (DSP) processing of
the signals from each of the individual microphones in the array can create one or more “virtual” microphones. Different algorithms permit the creation of virtual microphones with extremely complex virtual polar patterns, and even make it possible to steer the individual lobes of the virtual microphone patterns so as to home in on, or to reject, particular sources of sound. Beamforming techniques, however, rely on knowledge of the location of the sound source. It is therefore necessary to aim the beamforming at the intended sound source to benefit from the use of a microphone array.

SUMMARY

A device to provide an audio output includes a microphone array, a signal processor, and a graphic user interface (GUI). The signal processor is coupled to the microphone array to perform audio beamforming with input from the microphone array. The GUI is coupled to the signal processor to display a plurality of audio sources, to receive a selection of at least one of the plurality of audio sources from a user, and to provide the selection to the signal processor for aiming the audio beamforming toward the selected audio source. The selection may be made by touching the display. The device may further include a camera, and the GUI may display an image received from the camera as the plurality of audio sources. The camera may provide a moving video image, and the signal processor may provide a synchronized audio signal aimed at the selected audio source.

Other features and advantages of the present invention will be apparent from the accompanying drawings and from the detailed description that follows.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention may best be understood by referring to the following description and accompanying drawings that are used to
illustrate embodiments of the invention by way of example and not limitation. In the drawings, in which like reference numerals indicate similar elements:

FIG. 1 is a block diagram of a device in a typical environment for use.
FIG. 2 is a block diagram of an implementation of the signal processor shown in FIG. 1.
FIGS. 3 through 9 are alternate displays on the graphic user interface shown in FIG. 1.
FIGS. 10 and 11 are conceptual polar diagrams of microphone pickups that might result from the source selections shown in FIGS. 8 and 9.
FIG. 12 is an alternate display on the graphic user interface shown in FIG. 1.
FIG. 13 is a conceptual polar diagram of microphone pickup that might result from the source selections shown in FIG. 12.

DETAILED DESCRIPTION

In the following description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the understanding of this description.

FIG. 1 shows a device 10 that provides an
audio output. The device may be a mobile device such as a cellular telephone, a camera with an audio recorder, or a video recorder. The device 10 includes a microphone array 12, 14. Microphones in the array may be omnidirectional, or they may have a directional pickup pattern. Each of the
microphones may be an electret condenser microphone (ECM), a micro-electro-mechanical systems (MEMS) microphone, or a microphone of another technology, particularly a technology that provides small microphones. A signal processor 24 is coupled to the microphone array to produce the audio output using audio beamforming with input from the microphone array.

FIG. 2 shows an embodiment of the signal processor 24 that includes a central processing unit (CPU) 26 coupled to a memory 28. The memory includes instructions which, when executed by the CPU 26, provide the audio beamforming function
of the signal processor 24. It will be appreciated that the CPU 26 may perform additional functions that may or may not be related to the audio beamforming.

FIG. 1 further shows a graphic user interface (GUI) 20 coupled to the signal processor 24. The GUI 20 displays an image of a plurality of audio sources, such as the exemplary group of speakers 30, 32, 34 shown in the figure. The GUI 20 further receives a selection 18 of at least one of the plurality of audio sources from a user. The GUI 20 provides the selection to the signal processor 24 for aiming the audio beamforming toward the selected audio source 30, as suggested by the dashed line.

The signal processor 24 may identify a spatial arrangement of sounds received by the microphone array 12, 14 and provide the spatial arrangement to the GUI 20. The GUI may display a graphic representation of the spatial arrangement of
audio sources as the image of the plurality of audio sources. The spatial arrangement identified by the signal processor 24 may be in the form of a plurality of beamforming angles that are directed to the plurality of audio sources. The spatial arrangement may identify only one dimension. Therefore,
the graphic representation of the spatial arrangement of audio sources may be a somewhat abstract representation. FIG. 3 shows the GUI 20 displaying a representation of each audio source in a linear arrangement that suggests its position across the range of beamforming angles. Graphic indicator 40 represents speaker 30 shown in FIG. 1. Likewise, indicator 42 represents speaker 32, and indicator 44 represents speaker 34. The graphic representation of the spatial arrangement of audio sources may indicate the average volume of each audio source by means such as size, intensity, color, or the like. For example, in FIG. 3 the leftmost graphic indicator 40 is large to suggest a loud audio source, while the middle indicator 42 is small to indicate a quiet audio source. The rightmost indicator 44 is of medium size to indicate a sound volume between those indicated by the other two indicators 40, 42.

As shown in FIG. 1, the device 10 may include a camera 16 coupled to the GUI 20. The GUI may display an image received from the camera 16 as the image of the plurality of audio sources for selection 18 by the user. The selection may be made by touching the image on the GUI or by using a pointing device such as a trackball or joystick. The signal processor 24 may identify a spatial arrangement of sounds received by the microphone array 12, 14 and provide the spatial arrangement to the GUI 20. As shown in FIG. 4, the GUI 20 may enhance the image 50, 52, 54 received from the camera 16 based on the spatial arrangement to suggest the audio sources within the image. The enhancements may further suggest the relative volume of the audio sources by means such as size, intensity, color, or the like. Alternatively, as shown in FIG. 5, the GUI 20 may display the graphic representation 40, 42, 44 of the spatial arrangement of audio sources as an overlay on the image received from the camera 16, and present the combined image as the image of the plurality of audio sources for selection by the user.

As shown in FIG. 1, the device 10 may include an image processor 22 coupled to the camera
16 and the GUI 20. The image processor 22 may identify faces in the image received from the camera 16. The memory 28 shown in FIG. 2 may further include instructions which, when executed by the CPU 26, provide the face recognition function of the image processor 22. As shown in FIG. 6, the GUI 20 may display the identified faces 60, 62, 64 in the image as selectable audio sources. The identified faces 60, 62, 64 may be indicated by a variety of means, such as an outline, presenting the identified faces lighter than the remaining image, or presenting the identified faces in color with the remaining image in black and white.

The image processor 22 may receive from the signal processor 24 the spatial arrangement it identified for the sounds received by the microphone array 12, 14. As shown in FIG. 7, the image processor 22 may limit the face identification to faces that correspond to
audio sources identified by the signal processor 24. In the example shown, the image of the middle speaker 62′ may not be identified as a selectable audio source if the volume of sound received from that direction is below a sound level threshold for identifying audio sources. The GUI 20 may provide
a way of selecting an audio source other than one identified by the signal processor 24.

As shown in FIGS. 8 and 9, the GUI 20 may receive a size associated with the selection 80, 90 of the audio source. The signal processor 24 may adjust a front lobe size according to the size associated with the selection of the audio source. FIG. 8 shows a selection 80 of one person as the audio source at which the beamforming is aimed, which would cause the front lobe to be adjusted to provide a highly directional audio input, as shown in the polar pattern of microphone pickup of FIG. 10. FIG. 9 shows a selection 90 of two adjacent people as the audio source at which the beamforming is aimed, which would cause the front lobe to be adjusted to provide a less directional audio input suitable for receiving a conversation between the two people, as shown in the polar pattern of microphone pickup of FIG. 11. (It should be noted that FIGS. 10 and 11 are conceptual illustrations of microphone pickup patterns and may not represent patterns that could be obtained with any particular microphone array.) It will be appreciated that the selection on the GUI 20 may provide a width and a
height of the audio source at which the beamforming is to be aimed, but the beamforming may be responsive to only one dimension of the selection, such as the width.

As shown in FIG. 12, the GUI may permit selections 100, 102 of two or more of the plurality of audio sources from the user. The selection of more than one audio source may cause the signal processor to search for voice activity only among the selected audio sources. In another embodiment, the signal processor may provide for simultaneously receiving audio from audio sources in more than one direction
by providing a virtual microphone with more than one prominent lobe, as shown in the polar pattern of microphone pickup of FIG. 13, or by providing more than one signal processing path to provide more than one virtual microphone. (It should be noted that FIG. 13 is a conceptual illustration of a microphone pickup pattern and may not represent a pattern that could be obtained with any particular microphone array.)

The device may be a camera that provides a moving video image, with the signal processor providing, as the audio output, a synchronized audio signal aimed at the selected audio source. In other embodiments, the camera, if present, may be used only to provide images to the image processor to assist in aiming the audio beamforming, with the device providing only an audio output aimed at the selected audio source.

While certain exemplary embodiments have been descri

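The delay-and-sum beamformer described in the Background can be made concrete with a small sketch. It assumes a uniform linear array (microphone m at x = m * spacing), a far-field source, a fixed speed of sound of 343 m/s, and delays rounded to whole samples. The patent specifies none of these, and the names `steering_delays`, `delay_and_sum`, `simulate_tone`, and `rms` are illustrative only. As in the Background's example, a sound from the right reaches the right-hand (higher-numbered) microphones first.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C (assumed constant)


def steering_delays(num_mics, spacing, angle_deg, fs):
    """Integer sample delays that line up a plane wave arriving from angle_deg.

    Microphone m sits at x = m * spacing metres, numbered left to right.
    Angle 0 is straight ahead (broadside); positive angles are to the right.
    """
    s = math.sin(math.radians(angle_deg))
    # A wave from the right reaches mic m earlier by m * spacing * s / c
    # seconds, so that mic must be delayed by the same amount.
    lead = [m * spacing * s / SPEED_OF_SOUND for m in range(num_mics)]
    base = min(lead)  # shift so that every delay is non-negative
    return [round((t - base) * fs) for t in lead]


def delay_and_sum(channels, delays):
    """Delay each channel by its sample count, add them, and average."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for i in range(d, n):
            out[i] += ch[i - d]
    return [v / len(channels) for v in out]


def simulate_tone(num_mics, spacing, angle_deg, fs, freq, n):
    """Signals each mic would pick up from a far-field tone at angle_deg."""
    s = math.sin(math.radians(angle_deg))
    return [
        [math.sin(2 * math.pi * freq * (i / fs + m * spacing * s / SPEED_OF_SOUND))
         for i in range(n)]
        for m in range(num_mics)
    ]


def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


if __name__ == "__main__":
    fs, mics, spacing = 48000, 4, 0.05
    signals = simulate_tone(mics, spacing, 30.0, fs, 1000.0, 4800)  # source at +30 deg
    for steer in (30.0, 0.0, -30.0):
        delays = steering_delays(mics, spacing, steer, fs)
        y = delay_and_sum(signals, delays)
        # Skip the start-up samples where not every channel contributes yet.
        print(f"beam at {steer:+.0f} deg: rms {rms(y[max(delays):]):.3f}")
```

With the tone placed at +30 degrees, the beam steered to +30 degrees should report the largest RMS level and the beam steered to -30 degrees a clearly smaller one, which is the constructive versus destructive addition the Background describes. Rounding delays to whole samples keeps the sketch short at some cost in accuracy at higher frequencies.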

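For the abstract display of FIG. 3, one possible mapping from the signal processor's beamforming angles to on-screen indicators is sketched below. It assumes each source is reported as an angle from -90 degrees (far left) to +90 degrees (far right) together with an average level in dBFS, and it applies the sound level threshold idea described for FIG. 7 to drop quiet sources. The threshold, display width, radius range, and the names `Indicator` and `layout_indicators` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    x: float          # horizontal position on the display, in pixels
    radius: float     # indicator size, growing with the source's average level
    angle_deg: float  # beamforming angle reported for this source


def layout_indicators(sources, width_px, threshold_db=-40.0,
                      min_radius=8.0, max_radius=40.0, loudest_db=0.0):
    """Place one indicator per source in a left-to-right linear arrangement.

    sources: iterable of (angle_deg, level_db) pairs (hypothetical format).
    Sources quieter than threshold_db are omitted entirely.
    """
    indicators = []
    span_db = loudest_db - threshold_db
    for angle, level in sorted(sources):
        if level < threshold_db:
            continue  # too quiet to be offered as a selectable source
        # Map -90..+90 degrees linearly onto 0..width_px.
        x = (angle + 90.0) / 180.0 * width_px
        # Scale size linearly between the threshold and the loudest level.
        frac = min(max((level - threshold_db) / span_db, 0.0), 1.0)
        radius = min_radius + frac * (max_radius - min_radius)
        indicators.append(Indicator(x, radius, angle))
    return indicators


if __name__ == "__main__":
    # Loud source on the left, quiet one in the middle, medium one on the right.
    detected = [(-40.0, -6.0), (0.0, -32.0), (35.0, -20.0)]
    for ind in layout_indicators(detected, width_px=320):
        print(f"angle {ind.angle_deg:+.0f} deg -> x={ind.x:.0f}px, r={ind.radius:.1f}")
```

The example input mirrors FIG. 3: the loud source on the left gets the largest indicator, the quiet one in the middle the smallest, and the right-hand source a medium one. Size is only one of the cues the description mentions; intensity or color could be derived from the same `frac` value.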

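The FIG. 7 behavior, in which the image processor keeps only faces that line up with audio sources found by the signal processor, could look roughly like the following. The sketch assumes a pinhole camera whose optical axis matches the array's broadside direction, a 60-degree horizontal field of view, and a 5-degree matching tolerance. Face detection itself is not shown, and `column_to_angle` and `faces_with_sound` are hypothetical names.

```python
import math


def column_to_angle(x_px, image_width_px, hfov_deg):
    """Pinhole-camera angle (degrees, positive to the right) of column x_px."""
    focal_px = (image_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    return math.degrees(math.atan((x_px - image_width_px / 2.0) / focal_px))


def faces_with_sound(face_boxes, source_angles_deg, image_width_px,
                     hfov_deg=60.0, tolerance_deg=5.0):
    """Keep only face boxes whose direction matches a detected audio source.

    face_boxes: (left, top, width, height) tuples in pixels, as a face
    detector might report them (assumed format).
    source_angles_deg: beam angles at which the signal processor found
    sound above its level threshold.
    """
    kept = []
    for box in face_boxes:
        left, _top, width, _height = box
        # Use the horizontal centre of the box as the face's direction.
        angle = column_to_angle(left + width / 2.0, image_width_px, hfov_deg)
        if any(abs(angle - s) <= tolerance_deg for s in source_angles_deg):
            kept.append(box)
    return kept


if __name__ == "__main__":
    # Three detected faces; sound was found only to the left and right.
    faces = [(260, 100, 80, 80), (600, 120, 80, 80), (940, 100, 80, 80)]
    print(faces_with_sound(faces, [-17.0, 16.0], image_width_px=1280))
```

In the example, the middle face is dropped because no source was reported near 0 degrees, analogous to speaker 62′ in FIG. 7.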

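FIGS. 8 and 9 tie the size of the user's selection to the size of the front lobe. A minimal sketch of that mapping, assuming a pinhole camera whose optical axis is aligned with the array's broadside direction (60-degree horizontal field of view by default) and using only the selection's width, converts the box edges into an aiming angle and a requested lobe width. The 10-degree floor on the lobe width and the names `pixel_to_angle` and `selection_to_beam` are hypothetical; how the signal processor would realize a given lobe width is left open here, as it is in the description.

```python
import math


def pixel_to_angle(x_px, image_width_px, hfov_deg):
    """Horizontal angle (degrees, positive to the right) seen at column x_px."""
    focal_px = (image_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    return math.degrees(math.atan((x_px - image_width_px / 2.0) / focal_px))


def selection_to_beam(left_px, right_px, image_width_px, hfov_deg=60.0,
                      min_width_deg=10.0):
    """Turn a selection's left/right edges into (aim_deg, lobe_width_deg).

    Only the width of the selection is used; its height is ignored.
    """
    a_left = pixel_to_angle(left_px, image_width_px, hfov_deg)
    a_right = pixel_to_angle(right_px, image_width_px, hfov_deg)
    aim = (a_left + a_right) / 2.0  # midpoint of the two edge angles
    # Assumed floor: do not request a lobe narrower than min_width_deg.
    width = max(a_right - a_left, min_width_deg)
    return aim, width


if __name__ == "__main__":
    print(selection_to_beam(600, 680, 1280))  # one person, as in FIG. 8
    print(selection_to_beam(480, 900, 1280))  # two people, as in FIG. 9
```

A narrow box around one person yields a narrow lobe request (here clamped to the floor), and a wider box around two people yields a wider one aimed between them.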


