
Assignment 1 for Human Computer Interaction

Speech Recognition App

Aeroraven

A. Run

Execute the command

python main.py

All supported parameters are listed below.

python main.py [-h] [--msaa MSAA] [--fps FPS]
# --help       Display help
# --msaa MSAA  Antialiasing (Defaults to 16)
# --fps FPS    Limit of FPS (Defaults to 60)
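
For example, to run with 8x MSAA and a 30 FPS cap:

python main.py --msaa 8 --fps 30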

NOTE: The deep learning models are not uploaded because they are too large.

B. Running Prerequisites

System Requirements

  • Graphics Card
  • Microphone
  • Python >= 3.7

Python Packages

Package Version Usage
PyOpenGL (*) 3.1.6 UI
PyOpenGL-accelerate (*) 3.1.6 UI Acceleration
PyQt5 5.15.4 MSAA & Context
SpeechRecognition (!) 3.8.1 ASR
NumPy 1.19.3 Math Operations

Use conda or pip to install these packages.
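
For example, with pip (versions taken from the table above; conda works equally well):

pip install PyOpenGL==3.1.6 PyOpenGL-accelerate==3.1.6 PyQt5==5.15.4 SpeechRecognition==3.8.1 numpy==1.19.3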


The following sections constitute the submitted document.

1. Graphic Interface Improvements

1.1 Basic Structure of Modified GUI

1.1.1 Overall Design Goals / Ideas

The aim of this assignment is to design a voice recognition app. Several considerations guided the design.

  • Portrait Layout. This is the default layout of smartphones, which makes users feel the application is lighter, more compact and more familiar.
  • Smooth Transitions & Avoidance of Sharp Elements. This provides aesthetic appeal: smooth transitions make the UI more dynamic, and avoiding sharp elements makes the visual presentation easier on the eyes.
    • Examples used in this app: rounded rectangles, cubic-Bézier transitions, switching animations
  • Placebo Effect & Valid Feedback. The placebo effect originally refers to a beneficial effect produced by a placebo drug or treatment that cannot be attributed to the properties of the placebo itself and must therefore be due to the patient's belief in the treatment. In practice, placebo-style feedback can reduce users' tension while they use the application.
    • Example used in this app: voice power indicator
  • Affinity. It is better if users feel they are interacting with a companion instead of a cold machine.
    • Examples used in this app: anthropomorphic expressions, robot icons
  • Performance. If the recognition model has higher accuracy, users will have a better experience.
    • Examples used in this app: denoising, switching the recognition model
  • Guidance. Users should first learn how to use the software.
    • Examples used in this app: help page, "Wake Up" page (intended to help users get familiar with the app)
1.1.2 Integration with OpenGL and Qt5

The submitted code is based on the provided starter code, but it has been heavily modified.

The UI framework adopted in this assignment is an integration of PyQt5 and PyOpenGL. PyOpenGL serves as the main UI library, and PyQt5 provides the OpenGL context and listens to events.

To achieve this, the class ZwGraphicsContainer inherits QGLWidget and overrides several virtual functions, including paintGL, initializeGL and resizeGL. Two event handlers (mouseMoveEvent and mousePressEvent) are also overridden so the widget can listen to mouse events, and a QTimer is added to cap the frame rate.

# graphics_utils.py
class ZwGraphicsContainer(QGLWidget):
    def __init__(self, parent, format_s, frame_rate=60, app=None, s_width=None, s_height=None):
        super().__init__(format_s, parent)
        self.wparent = parent
        self.gl_assist = ZwOpenGLHelper()
        self.attrib = ZwInteractivePageAttrib()
        self.application = app
        self.updater = QTimer(self)
        self.updater.timeout.connect(self.frame_advance)
        self.interval = 1000 / frame_rate
        self.setMouseTracking(True)
        self.s_width = s_width
        self.s_height = s_height

    def mouseMoveEvent(self, event):
        try:
            x = event.x()
            y = event.y()
            dy = self.attrib.page_width_scale * self.s_height / self.s_width * (y / self.s_height)
            dy = dy * 2 - self.attrib.page_width_scale * self.s_height / self.s_width
            dx = self.attrib.page_width_scale * (x / self.s_width)
            dx = dx * 2 - self.attrib.page_width_scale
            self.application.qt_mouse_move_event(dx, dy)
            event.accept()
        except AttributeError:
            pass

    def mousePressEvent(self, event):
        try:
            x = event.x()
            y = event.y()
            dy = self.attrib.page_width_scale * self.s_height / self.s_width * (y / self.s_height)
            dy = - dy * 2 + self.attrib.page_width_scale * self.s_height / self.s_width
            dx = self.attrib.page_width_scale * (x / self.s_width)
            dx = dx * 2 - self.attrib.page_width_scale
            self.application.qt_mouse_down_event(dx, dy)
            ZwLogger().log(ZwLogger.ROLE_GUIQT, ZwLogger.LV_INFO,
                           "Mouse press event. ax=" + str(dx) + ", ay=" + str(dy))
            event.accept()
        except AttributeError:
            pass

    def frame_advance(self):
        self.application.frame_update()
        self.application.play()
        self.update()

    def paintGL(self) -> None:
        self.makeCurrent()
        self.application.gl_clear_func()
        ogl.glLoadIdentity()
        ogl.glTranslated(0, 0, -10)
        ogl.glClear(ogl.GL_COLOR_BUFFER_BIT | ogl.GL_DEPTH_BUFFER_BIT)
        self.application.render()

    def initializeGL(self) -> None:
        self.updater.start(int(self.interval))
        self.makeCurrent()
        ZwLogger().log(ZwLogger.ROLE_GUI, ZwLogger.LV_INFO, "Initializing OpenGL in Qt5 Framework")
        ogl.glShadeModel(ogl.GL_SMOOTH)
        ogl.glEnable(ogl.GL_DEPTH_TEST)
        # ogl.glEnable(ogl.GL_CULL_FACE)
        ogl.glEnable(ogl.GL_POINT_SMOOTH)
        ogl.glEnable(ogl.GL_LINE_SMOOTH)
        ogl.glEnable(ogl.GL_MULTISAMPLE)
        ogl.glEnable(ogl.GL_POLYGON_SMOOTH)
        ogl.glHint(ogl.GL_POLYGON_SMOOTH_HINT, ogl.GL_NICEST)
        ogl.glHint(ogl.GL_LINE_SMOOTH_HINT, ogl.GL_NICEST)
        ogl.glHint(ogl.GL_POINT_SMOOTH_HINT, ogl.GL_NICEST)
        ZwLogger().log(ZwLogger.ROLE_GUI, ZwLogger.LV_INFO, "Initializing OpenGL Utility Tools in Qt5 Framework")
        self.glut_initialization()
        ZwLogger().log(ZwLogger.ROLE_GUI, ZwLogger.LV_INFO, "Graphic initialization completed.")

    def glut_initialization(self):
        glut.glutInit()
        glut.glutMouseFunc(self.glut_mouse_event)
        glut.glutMotionFunc(self.glut_motion_event)

    def glut_mouse_event(self, button, state, x, y):
        print(button, state, x, y)

    def glut_motion_event(self, x, y):
        print("Motion", x, y)

    def resizeGL(self, w: int, h: int) -> None:
        self.makeCurrent()
        side = min(w, h)
        if side < 0:
            return
        ogl.glViewport(0, 0, w, h)
        ogl.glMatrixMode(ogl.GL_PROJECTION)
        ogl.glLoadIdentity()
        ogl.glOrtho(-self.attrib.page_width_scale, self.attrib.page_width_scale,
                    -self.attrib.page_width_scale * h / w, self.attrib.page_width_scale * h / w,
                    self.attrib.z_near, self.attrib.z_far)
        ogl.glMatrixMode(ogl.GL_MODELVIEW)

Besides, in the Qt5 framework a main widget must be defined to serve as the main window. Thus, the class InteractionUI, which inherits QWidget, serves as the main window.

class InteractionUI(QWidget):
    arg_msaa = 16
    max_fps = 60

    def __init__(self, width=590 * 1.5, height=950):
        super().__init__()
        ZwLogger().log(ZwLogger.ROLE_GUIQT, ZwLogger.LV_INFO, "Initializing Window Widget")
        # MSAA Anti-aliasing
        self.msaa = QGLFormat()
        self.msaa.setSamples(InteractionUI.arg_msaa)

        # Init ASR
        SpeechRecognitionUtil().env_noise_record('./noise.wav')

        # QT Configuration
        self.custom_width = int(width)
        self.custom_height = int(height)
        self.ogl_widget = ZwGraphicsContainer(self,
                                              format_s=self.msaa,
                                              frame_rate=InteractionUI.max_fps,
                                              app=ZwUIHCILab01(),
                                              s_width=int(width),
                                              s_height=int(height))
        self.setFixedSize(width, height)
        self.init_ui()

    def init_ui(self):
        ZwLogger().log(ZwLogger.ROLE_GUIQT, ZwLogger.LV_INFO, "Initializing UI")
        self.resize(self.custom_width, self.custom_height)
        self.move(400, 200)
        self.setWindowTitle("Hello")

        # OGL Widget
        self.ogl_widget.move(0, 0)
        self.ogl_widget.resize(self.custom_width, self.custom_height)
        self.show()
1.1.3 Encapsulate Shapes

To render content on the screen, drawing calls must be issued explicitly in the code. Although OpenGL offers more freedom than Qt5 components, every shape has to be defined by ourselves. To preserve the maintainability of the code, the drawing calls are encapsulated in an abstract class that defines the behaviour shared by all shape-rendering procedures.

The class ZwUIAbstractShape encapsulates the OpenGL rendering calls and provides the methods needed to create other shapes by inheriting from it.

# zwui_shapes.py
class ZwUIAbstractShape:
    def __init__(self, **kwargs):
        # Primitive Attributes
        self.fill_type = gl.GL_POLYGON
        self.text_mode = False
        self.shape_list = []
        self.color_list = []
        # Text Attributes
        self.caption = ""
        self.location_x = 0
        self.location_y = 0
        self.scale = 5
        self.line_width = 2
        self.font_color = ZwUIHelper.get_opengl_rgba_vector(0, 0, 0)
        self.space_modifier = -10000
        self.letter_modifier = 1900

    def render(self,**kwargs):
        offset_z = 0
        if 'z_index' in kwargs:
            offset_z = kwargs['z_index']

        if not self.text_mode:
            gl.glBegin(self.fill_type)
            for i in range(len(self.shape_list)):
                gl.glColor4f(self.color_list[i][0], self.color_list[i][1], self.color_list[i][2], self.color_list[i][3])
                gl.glVertex3f(self.shape_list[i][0], self.shape_list[i][1], self.shape_list[i][2]+offset_z)
            gl.glEnd()
        else:
            gl.glPushMatrix()
            gl.glTranslatef(self.location_x, self.location_y, offset_z)
            gl.glScalef(self.scale, self.scale, 0)
            gl.glLineWidth(self.line_width)
            for i in range(len(self.caption)):
                gl.glColor4f(self.font_color[0], self.font_color[1], self.font_color[2], self.font_color[3])
                glut.glutStrokeCharacter(ctypes.c_void_p(0), ctypes.c_int(ord(self.caption[i])))
                if self.caption[i] == ' ':
                    gl.glTranslatef(self.space_modifier * self.scale, 0, 0)
                else:
                    gl.glTranslatef(self.letter_modifier * self.scale, 0, 0)
            gl.glLineWidth(1)
            gl.glPopMatrix()

    def reset(self):
        self.shape_list = []
        self.color_list = []

    def insert(self, shape, color):
        if not isinstance(shape, list) or not isinstance(color, list):
            raise Exception("Arguments must be lists")
        if len(shape) != 3 or len(color) != 4:
            raise Exception("Vector dimensions mismatch")
        self.shape_list.append(shape)
        self.color_list.append(color)

    def initialize(self, **kwargs):
        # Custom shapes should re-implement this method
        pass

Rounded rectangles are highly popular in modern design practice, so the rounded rectangle is implemented in this assignment.

# zwui_shapes.py
class ZwUIRoundedRectangle(ZwUIAbstractShape):
    def __init__(self, left, bottom, width, height, radius, color):
        super(ZwUIRoundedRectangle, self).__init__()
        self.left = left
        self.bottom = bottom
        self.width = width
        self.height = height
        self.color = color
        self.radius = radius
        self.initialize()

    def initialize(self, **kwargs):
        step = 90
        # Left bottom end
        self.insert(ZwUIHelper.get_opengl_pos3d_vector(self.left + self.radius, self.bottom), self.color)
        # Right bottom start
        self.insert(ZwUIHelper.get_opengl_pos3d_vector(self.left + self.width - self.radius, self.bottom), self.color)
        # Right bottom arc
        for i in range(1, step + 1):
            tx = self.left + self.width - self.radius
            ty = self.bottom + self.radius
            tx += self.radius * math.cos(i / step * PI / 2 - PI / 2)
            ty += self.radius * math.sin(i / step * PI / 2 - PI / 2)
            self.insert(ZwUIHelper.get_opengl_pos3d_vector(tx, ty), self.color)
        # Right Top start:
        self.insert(ZwUIHelper.get_opengl_pos3d_vector(self.left + self.width, self.bottom + self.height - self.radius),
                    self.color)
        # Right top arc
        for i in range(1, step + 1):
            tx = self.left + self.width - self.radius
            ty = self.bottom + self.height - self.radius
            tx += self.radius * math.cos(i / step * PI / 2)
            ty += self.radius * math.sin(i / step * PI / 2)
            self.insert(ZwUIHelper.get_opengl_pos3d_vector(tx, ty), self.color)
        # Left top start:
        self.insert(ZwUIHelper.get_opengl_pos3d_vector(self.left + self.radius, self.bottom + self.height), self.color)
        # Left top arc:
        for i in range(1, step + 1):
            tx = self.left + self.radius
            ty = self.bottom + self.height - self.radius
            tx += self.radius * math.cos(i / step * PI / 2 + PI / 2)
            ty += self.radius * math.sin(i / step * PI / 2 + PI / 2)
            self.insert(ZwUIHelper.get_opengl_pos3d_vector(tx, ty), self.color)
        # Left bottom start
        self.insert(ZwUIHelper.get_opengl_pos3d_vector(self.left, self.bottom + self.radius), self.color)
        # Left bottom arc
        for i in range(1, step + 1):
            tx = self.left + self.radius
            ty = self.bottom + self.radius
            tx += self.radius * math.cos(i / step * PI / 2 + PI)
            ty += self.radius * math.sin(i / step * PI / 2 + PI)
            self.insert(ZwUIHelper.get_opengl_pos3d_vector(tx, ty), self.color)
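
As a usage illustration (hypothetical code, not taken from the repository), a rounded rectangle can be constructed and drawn inside an active OpenGL context as follows; the coordinates and color are arbitrary.

panel = ZwUIRoundedRectangle(left=-0.8, bottom=-0.5, width=1.6, height=1.0, radius=0.1,
                             color=ZwUIHelper.get_opengl_rgba_vector(0.9, 0.9, 0.9))
panel.render(z_index=0)  # replays the vertices recorded by initialize() via glBegin/glVertex/glEnd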

Implemented shapes are listed below:

Class Name Function File:Line
ZwUIText Plain text zwui_shapes.py : 72
ZwUIRectangle Rectangle zwui_shapes.py : 84
ZwUITriangle Triangle zwui_shapes.py : 101
ZwUICircle Circle zwui_shapes.py : 116
ZwUIArc Arc zwui_shapes.py : 137
ZwUIRoundedRectangle RoundedRectangle zwui_shapes.py : 159
ZwUICompoundShape Shape container zwui_shapes.py : 213
ZwUIFunctionalCurve Curve y=f(x) zwui_shapes.py : 226
1.1.4 Encapsulate Components and Controls

A component (or control) is a set of shapes integrated to function as a unit; buttons and labels are typical components. To display transitions and animations, a keyframe mechanism is added to the components to improve the visual expression. As with shapes, a virtual class ZwUIAbstractControl is designed for all components to reduce code redundancy.

The scene is a special component that groups other controls together.

class ZwUIAbstractControl:
    def __init__(self, **kwargs):
        self.ani_clips = {"default": ZwUIClips()}
        self.current_clips = "default"
        self.z_index = 0
        self.paused = False
        self.play_ended = False
        self.handlers = {}
        self.name = ""
        self.hidden = False
        self.recorded_dict = kwargs

    def reload(self, **kwargs):
        for key in kwargs.keys():
            self.recorded_dict[key] = kwargs[key]
        self.ani_clips = {"default": ZwUIClips()}
        self.initialize(**self.recorded_dict)

    def add_handler(self, event: str, handler: callable):
        self.handlers[event] = handler

    def switch_to_animation_list(self, name):
        self.current_clips = name
        if name not in self.ani_clips:
            self.ani_clips[name] = ZwUIClips()

    def switch_to_keyframe(self, idx):
        self.ani_clips[self.current_clips].selected_clips = idx

    def play_animation(self, **kwargs):
        if self.paused:
            return
        self.ani_clips[self.current_clips].selected_clips += 1
        if self.ani_clips[self.current_clips].selected_clips >= len(self.ani_clips[self.current_clips].clips):
            self.ani_clips[self.current_clips].selected_clips = len(self.ani_clips[self.current_clips].clips) - 1
            if not self.play_ended:
                if ZwUIConstant.EV_PLAY_ENDED in self.handlers:
                    self.handlers[ZwUIConstant.EV_PLAY_ENDED](
                        initiator=ZwUIConstant.ROLE,
                        control_name=self.name,
                        event=ZwUIConstant.EV_PLAY_ENDED
                    )
            self.play_ended = True
        else:
            self.play_ended = False

    def append_keyframe(self, keyframe):
        self.ani_clips[self.current_clips].add_frame(keyframe)

    def pop_keyframe(self):
        self.ani_clips[self.current_clips].pop_frame()

    def render(self):
        if not self.hidden:
            self.ani_clips[self.current_clips].render(z_index=self.z_index)

    def on_mouse_down(self, x, y):
        pass

    def initialize(self, **kwargs):
        ZwLogger().log(ZwLogger.ROLE_GUILA, ZwLogger.LV_WARN, "Abstract initialization function is called")
        keyframe = ZwUIKeyFrame()
        self.append_keyframe(keyframe)

    def frame_update(self, **kwargs):
        pass
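
As a usage illustration (hypothetical code, not taken from the repository), a handler can be registered on any control; play_animation, which the owning scene calls once per frame, invokes the handler with keyword arguments when the last keyframe of the current clip is reached.

def on_play_ended(**kwargs):
    print("Clip finished on control:", kwargs.get("control_name"))

control = ZwUIAbstractControl()  # any concrete control is registered the same way
control.name = "demo"
control.add_handler(ZwUIConstant.EV_PLAY_ENDED, on_play_ended)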

Implemented components are listed below:

Class Function File : Line
ZwUIScene Scene zwui_controls.py : 75
ZwUICustomStaticControl Custom shape container zwui_controls.py : 119
ZwUIButtonControl Button (rounded-rectangle border; transitions: fade_in, fade_out) zwui_controls.py : 131
ZwUILabelControl Label (transitions: rising, descending, typing) zwui_controls.py : 226
ZwUINestedLabelControl Label with rounded-rectangle background (transition: typing) zwui_controls.py : 281
ZwUIStripedWaveControl Voice power indicator zwui_controls.py : 359
ZwUISwitchMaskControl Mask used for scene switching zwui_controls.py : 413
ZwUIMicrophoneIconControl Microphone icon zwui_controls.py : 456
ZwUITimer Timer zwui_controls.py : 487
ZwUIMomoBotIconControl Robot icon zwui_controls.py : 514
ZwUIAniMomoBotIconControl Robot icon with transition support (transition: fade_in) zwui_controls.py : 632

1.2 Transition and Bezier Curve

A Bézier curve is a parametric curve used in computer graphics and related fields. A set of discrete "control points" defines a smooth, continuous curve by means of a formula.

A transition driven by a Bézier curve gives the interface elasticity and a dynamic feel.

image-20220429151128246

A cubic Bézier curve is a Bézier curve with four control points. For simplicity, point $P_0$ is the origin $(0,0)$ and $P_3$ is $(1,1)$. The curve is described parametrically by $$ B(t) = (1-t)^3 P_0 + 3t(1-t)^2 P_1 + 3t^2(1-t) P_2 + t^3 P_3, \quad t \in [0,1], $$ which, with $P_0=(0,0)$ and $P_3=(1,1)$, reduces to the expression implemented below.

# zwui_animation.py
def standard_cubic_bezier(x1, y1, x2, y2):
    return lambda t: 3 * np.array([x1, y1]) * t * (1 - t) * (1 - t) + 3 * np.array([x2, y2]) * t * t * (
        1 - t) + np.array([1, 1]) * t * t * t

Because the curve is parametric, transforming it into the form $y = f(x)$ requires solving the cubic equation $x(t) = x$ for $t$ and then evaluating $y(t)$; Cardano's formula is used for this.

# zwui_animation.py
def standard_cubic_bezier_time(x1, y1, x2, y2):
    def cubic_root(x):
        if x >= 0:
            return x ** (1. / 3.)
        else:
            return -(-x) ** (1. / 3.)

    def solver(x):
        # Find the parameter t whose x-coordinate equals the given x (Cardano's formula)
        a = (1 + 3 * x1 - 3 * x2)
        a2 = a * a
        b = (3 * x2 - 6 * x1)
        b2 = b * b
        c = 3 * x1
        p = (3. * a * c - b2) / (9. * a2)
        q = (-9. * a * b * c - 27. * a2 * x + 2. * b * b2) / (54. * a * a2)
        w = math.sqrt(q * q + p * p * p)
        v = -0.5 + math.sqrt(3) * 0.5j
        v2 = v * v
        f = b / 3. / a
        s = cubic_root(-q + w)
        t = cubic_root(-q - w)
        ex = 1000
        ret = [s + t - f,
               v * s + v2 * t * 1j - f,
               v2 * s + v * t * 1j - f]
        for i in ret:
            if (isinstance(i, float) or isinstance(i, int)) and (0.0 <= i <= 1.0):
                return round(i * ex) / ex
            if isinstance(i, complex) and math.fabs(i.imag) < 1e-5 and (0.0 <= i.real <= 1.0):
                return round(i.real * ex) / ex
        raise Exception("No solution", " X=" + str(x), "SimCoef=" + str([1, p, q]),
                        "Coef=" + str([a, b, c, -x]), ret)

    def solver_x(x):
        t = solver(x)
        eq = ZwUITransition.standard_cubic_bezier(x1, y1, x2, y2)
        return eq(t)

    return lambda x: solver_x(x)
The following transition schemes are used in this app:

# app.py
# Transition Schemes
t_bezier = ZwUITransition.standard_cubic_bezier_time(.76, .36, .24, .56)
t_bezier_beta = ZwUITransition.standard_cubic_bezier_time(.78, .17, .22, .85)
t_bezier_alpha = ZwUITransition.standard_cubic_bezier_time(.88, .28, .26, .91)
t_linear = lambda x: (x, x)

1.3 Recognition & Multithreading & Instruction Mapping

To render the interface and listen to the recorder in parallel, multithreading is used. When the application starts, a new thread is created to perform the recognition work.

# app.py
class ZwUIHCILab01EventThread(threading.Thread):
    def __init__(self, parent: object = None, event_callback_handler: callable = None):
        super(ZwUIHCILab01EventThread, self).__init__()
        self.callback = event_callback_handler
        self.parent = parent

    def run(self):
        ZwLogger().log(ZwLogger.ROLE_APP, ZwLogger.LV_INFO, "Speech recognition thread starts")
        responder = Responder()
        responder.waiting_and_respond(self.callback, True, self.parent)

The class Responder is the junction between the ASR kernel and the UI front end. It calls the functions provided by the ASR kernel and sends the result to the UI module via the callback function passed in as an argument.

class Responder:
    def __init__(self):
        self.text_analyst = TextUtil()
        self.application_map = { ... }

    def waiting_and_respond(self,
                            callback_function: callable = None,
                            is_member_function: bool = False,
                            parent_node: object = None):
        async def callback_wrapper(**kwargs):
            ZwLogger().log(ZwLogger.ROLE_ALRSP, ZwLogger.LV_INFO, "Initiating the callback procedure"
                                                                  " to the event handler")
            await asyncio.sleep(1)
            if callback_function is not None:
                callback_function(**kwargs)

        sr = SpeechRecognitionUtil()
        tx = TextUtil()
        while True:
            rec = sr.interpret_from_microphone(None)
            action = None
            for key in self.application_map.keys():
                if tx.similarity(key, rec) > 0.8:
                    action = self.application_map[key]
                    break
            asyncio.run(callback_wrapper(initiator="responder", action=action, result=rec))

The core function mentioned in the code above is interpret_from_microphone. It involves the following steps:

  • First, record an audio clip and save it (via the record_audio function).
    • A wave stream is opened and the basic parameters and thresholds are set.
    • An FFT is performed to estimate the power of each audio chunk. When the power is larger than the threshold, recording starts.
    • When the power stays below the threshold for about a second, recording is terminated.
    • Finally, the stream is closed and denoising is applied to the audio file.
  • Second, load the saved audio clip and transform it into a spectrogram (for DeepSpeech). After the preprocessing, the data is fed into the model for inference.
    # algorithm_utils.py
    def record_audio(self, wave_out_path, callback_function=None):
        async def callback_wrapper(**kwargs):
            await asyncio.sleep(1)
            if callback_function is not None:
                callback_function(**kwargs)

        chunk = 1024
        audio_format = pyaudio.paInt16
        channels = 1
        rate = 16000
        p = pyaudio.PyAudio()
        stream = p.open(format=audio_format,
                        channels=channels,
                        rate=rate,
                        input=True,
                        frames_per_buffer=chunk)
        wf = wave.open(wave_out_path, 'wb')
        wf.setnchannels(channels)
        wf.setsampwidth(p.get_sample_size(audio_format))
        wf.setframerate(rate)
        occupied = False
        stopflag = 0
        stopflag2 = 0
        threshold = 7000
        ZwLogger().log(ZwLogger.ROLE_ALRC, ZwLogger.LV_INFO, "Start to listening the Recorder")
        while True:
            data = stream.read(chunk)
            rt_data = np.frombuffer(data, np.dtype('<i2'))
            fft_temp_data = fftpack.fft(rt_data, rt_data.size, overwrite_x=True)
            fft_data = np.abs(fft_temp_data)[0:fft_temp_data.size // 2 + 1]
            SpeechRecognitionUtil.wave_ratio = sum(fft_data) // len(fft_data)
            if sum(fft_data) // len(fft_data) > threshold:
                stopflag += 1
                if not occupied:
                    ZwLogger().log(ZwLogger.ROLE_ALRC, ZwLogger.LV_INFO, "Start recording. Power gained via FFT:" +
                                   str(sum(fft_data) // len(fft_data)))
                    occupied = True
                    stopflag = 0
                    stopflag2 = 0
            else:
                stopflag2 += 1
                if not occupied:
                    continue
            ons = rate / chunk * 2
            if stopflag2 + stopflag > ons:
                if stopflag2 > ons * 0.8:
                    ZwLogger().log(ZwLogger.ROLE_ALRC, ZwLogger.LV_INFO, "Record completed")
                    break
                else:
                    stopflag2 = 0
                    stopflag = 0

            wf.writeframes(data)
            # ZwLogger().log(ZwLogger.ROLE_ALRC, ZwLogger.LV_INFO, data)
        stream.stop_stream()
        stream.close()
        p.terminate()
        wf.close()
        rate, data = wavfile.read(wave_out_path)
        _, noise_data = wavfile.read("./noise.wav")
        reduced_noise = nr.reduce_noise(y=data, y_noise=noise_data, sr=rate)
        wavfile.write(wave_out_path, rate, reduced_noise)

The recognition itself is performed by interpret_from_file, which dispatches to the selected engine:

# algorithm_utils.py
def interpret_from_file(self, path):
    if SpeechRecognitionUtil.engine == "ds":
        fin = wave.open(path, 'rb')
        audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)
        return SpeechRecognitionUtil.model.stt(audio)
    else:
        file = sr.AudioFile(path)
        with file as f:
            audio = self.recognizer.record(f)
        return self.recognizer.recognize_sphinx(audio)
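
The interpret_from_microphone routine called by the Responder is not reproduced in this document. Assuming it simply chains the two functions shown above (record_audio and interpret_from_file), a rough sketch might look like the following; the temporary file name is hypothetical.

# algorithm_utils.py (sketch, not the repository's exact code)
def interpret_from_microphone(self, callback_function=None):
    tmp_path = "./record.wav"                       # hypothetical temporary file name
    self.record_audio(tmp_path, callback_function)  # record, trim by power threshold, denoise
    return self.interpret_from_file(tmp_path)       # feed the clip into DeepSpeech or Sphinx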

After receiving the result from interpret_from_microphone, the Responder compares it with the keys of the instruction mapping dictionary self.application_map (Appendix 3.3). If the similarity exceeds 80%, the matched action is returned to the UI module.
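
The TextUtil.similarity routine itself is not reproduced in this document. One common way to obtain such a 0-to-1 score is a normalized matching ratio, e.g. via difflib; the sketch below is illustrative only and is not necessarily the repository's implementation.

import difflib

def similarity(a: str, b: str) -> float:
    # Ratio of matching characters between the two strings, in [0, 1].
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

similarity("open notepad", "open the note pad")  # roughly 0.83, above the 0.8 threshold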

1.4 Voice Indicator

This is a placebo-effect design, intended to show users that their voice is being picked up by the software, which reduces their tension.

image-20220429143705262

The indicator displays the real-time voice power computed with the FFT.

class ZwUIStripedWaveControl(ZwUIAbstractControl):
    def __init__(self, x, center_y, strips, strip_width, strip_gap, color):
        super(ZwUIStripedWaveControl, self).__init__()
        self.x = x
        self.y = center_y
        self.st = strips
        self.wid = strip_width
        self.gap = strip_gap
        self.color = color
        self.initialize()
        self.floating = 0
        self.floating_lim = 45
        self.hv = 0.05
        self.emp_list = [0.05 for _ in range(100)]

    def initialize(self, **kwargs):
        kf = ZwUIKeyFrame()
        for i in range(self.st):
            t = ZwUIRectangle(self.x + (self.wid + self.gap) * i, -0.1 + self.y, self.wid, 0.2, self.color)
            kf.add_shape(t)
        self.append_keyframe(kf)

    def set_height(self, height=0.1):
        self.hv = height

    def frame_update(self, **kwargs):
        self.floating = self.floating + 1

        kf = ZwUIKeyFrame()
        for i in range(self.st - 1):
            t = ZwUIRectangle(self.x + (self.wid + self.gap) * i +
                              (self.floating / self.floating_lim) * (self.wid + self.gap),
                              -self.emp_list[i + 1] + self.y, self.wid, self.emp_list[i + 1] * 2, self.color)
            kf.add_shape(t)
        dr = self.floating / self.floating_lim
        dl = self.gap / (self.wid + self.gap)
        if dr >= dl:
            t = ZwUIRectangle(self.x, self.y - self.emp_list[0],
                              self.wid * (dr - dl) / (1 - dl), self.emp_list[0] * 2, self.color)
            kf.add_shape(t)
        else:
            t = ZwUIRectangle(self.x + (self.wid + self.gap) * (self.st - 1) +
                              (self.floating / self.floating_lim) * (self.wid + self.gap),
                              -self.emp_list[self.st] + self.y, self.wid * (1 - dr / dl),
                              self.emp_list[self.st] * 2, self.color)
            kf.add_shape(t)
        if self.floating == self.floating_lim:
            self.emp_list.pop()
            self.emp_list.insert(0, self.hv)
            self.floating = 0
        self.pop_keyframe()
        self.append_keyframe(kf)

The displayed power is updated on every frame update:

# app.py 
def on_frame_update(self, **kwargs):
    hw = (min(0.05 * max(math.log(SpeechRecognitionUtil.wave_ratio + 1) - 0.2, 0), 0.5) + 0.05)
    self.get_scene("main_scene") \
    .get_control("wave") \
    .set_height(hw)

    self.get_scene("welcome_scene") \
    .get_control("start_guide_mcb") \
    .set_height(hw)

1.5 Functions Implemented

In this assignment, the following functions are implemented.

1.5.1 Open Notepad

The first function opens the notepad. Users can say open notepad to make the application launch Notepad (notepad.exe).

To improve generalizability, several alternative phrases are accepted, including notepad, write notes, write something and take notes.

The launch procedure is based on the win32api library (part of pywin32). The code below shows the launch procedure.

# algorithm_utils.py
class ShellUtil:
    @staticmethod
    def execute(path: str):
        ZwLogger().log(ZwLogger.ROLE_ALWIN, ZwLogger.LV_INFO, "Executing "+str(path))
        win32api.ShellExecute(0, 'open', path, '', '', 1)

When the instruction is recognized, an action pair such as ("shell", "Notepad") is passed to the handler. The UI module then updates the elements to be rendered and executes the given application.

# app.py
elif action[0] == "shell":
    self.get_scene("response_scene").get_control("tip_2").reload(
        overridden_text="I've found the application!")
    self.get_scene("response_scene").get_control("rec_2a").reload(
        overridden_text=action[1])
    self.get_scene("response_scene").get_control("rec_2b").reload(
        overridden_text="Execute " + self.action_list["applications"][action[1]])
    ShellUtil.execute(self.action_list["applications"][action[1]])

image-20220429150607981

1.5.2 Play Music

When you say play music, the application launches the default music player and plays the file music.mp3.

Equivalent instructions include music, play songs and listen to music.

The procedure resembles the one described in the Open Notepad section.

image-20220429151006502

1.5.3 Open the Calculator

When you say open calculator, the application will launch the default calculator.

The procedure resembles the one described in the Open Notepad section.

image-20220429150436072

1.5.4 Open Movie Websites

By saying watch a movie, you can command the application to open a movie website for you. This is done with the built-in webbrowser package.

# app.py
if action[0] == "web_browse" and action[1] == "movie":
    self.get_scene("response_scene").get_control("tip_2").reload(
        overridden_text="I'll open a movie website!")

    key_chosen = random.choice(list(self.action_list["movie"].keys()))
    webbrowser.open(self.action_list["movie"][key_chosen], new=0, autoraise=True)
    self.get_scene("response_scene").get_control("rec_2a").reload(
        overridden_text=key_chosen)
    self.get_scene("response_scene").get_control("rec_2b").reload(
        overridden_text=self.action_list["movie"][key_chosen])

image-20220429150128084

Alternative websites (one of the following sites is chosen at random):

Name Website
IMDB https://www.imdb.com/
iQIYI https://www.iqiyi.com/lib/
Tencent Video https://v.qq.com/
MTime http://www.mtime.com/
Bilibili https://www.bilibili.com/movie/
Netflix https://www.netflix.com/
1.5.5 Throw a Dice

If you say roll a dice, throw the dice or play dice, the application will generate a random integer between 1 and 6.

elif action[0] == "event":
    if action[1] == "dice":
        self.get_scene("response_scene").get_control("tip_2").reload(
            overridden_text="Is that you want?")
        self.get_scene("response_scene").get_control("rec_2a").reload(
            overridden_text="Dice Rolling")
        self.get_scene("response_scene").get_control("rec_2b").reload(
            overridden_text="Result is " + str(random.randint(1, 6)))

image-20220429150707246

1.6 Miscellaneous Improvements

1.6.1 Multisample Anti-Aliasing & Primitive Smoothing

Anti-aliasing is the process of removing aliasing artifacts from rendered objects. It reduces the jagged edges that occur along the boundaries of graphical or curved objects, which often improves the visual experience by removing "sharp" elements from the user interface.

In this assignment, multisample anti-aliasing (MSAA) is adopted as the anti-aliasing technique. It is configured via the PyQt5 framework:

# main.py
def __init__(self, width=590 * 1.5, height=950):
    super().__init__()
    ZwLogger().log(ZwLogger.ROLE_GUIQT, ZwLogger.LV_INFO, "Initializing Window Widget")
    # MSAA Anti-aliasing
    self.msaa = QGLFormat()
    self.msaa.setSamples(InteractionUI.arg_msaa)
    self.ogl_widget = ZwGraphicsContainer(self,
                                          format_s=self.msaa, ...)
    
# graphics_utils.py
class ZwGraphicsContainer(QGLWidget):
    def __init__(self, parent, format_s, frame_rate=60, app=None, s_width=None, s_height=None):
        super().__init__(format_s, parent)

Other smoothing options are also configured.

# graphics_utils.py
ogl.glEnable(ogl.GL_POINT_SMOOTH)
ogl.glEnable(ogl.GL_LINE_SMOOTH)
ogl.glEnable(ogl.GL_MULTISAMPLE)
ogl.glEnable(ogl.GL_POLYGON_SMOOTH)
ogl.glHint(ogl.GL_POLYGON_SMOOTH_HINT, ogl.GL_NICEST)
ogl.glHint(ogl.GL_LINE_SMOOTH_HINT, ogl.GL_NICEST)
ogl.glHint(ogl.GL_POINT_SMOOTH_HINT, ogl.GL_NICEST)

Here's the effect of MSAA

No Anti-aliasing (1x) 8x MSAA
image-20220429143426005 image-20220429143531101

1.7 Screenshots & Visual Effects

1.7.1 Start Page

The start page shows welcome information and guides users to start using the app by saying "Hello". This helps them learn how the app works.

image-20220429144051126

image-20220429144247566

1.7.2 Help Page

This page shows help information.

image-20220429144334624

1.7.3 Main Page

This is the main page of the app. Three random tips are shown here.

image-20220429144746452

1.7.4 Success Page

If a command is successfully recognized by the App, this page will show.

image-20220429144926551

1.7.5 Failure Page

If a command cannot be recognized by the App, this page will show.

image-20220429145150958

2. Recognition Performance Improvements

This part covers model selection and noise reduction, both of which contribute to improving recognition accuracy.

2.1 Model Selection

In this assignment, two models are surveyed. The first is the baseline model Sphinx, which is mentioned in the assignment instructions. The second is DeepSpeech, which is based on a neural network.

To determine whether the deep learning model outperforms the traditional one, an experiment was carried out to investigate the Word Error Rate (WER) of the two models in two different situations.

The first situation is the general case: recordings of random sentences are given to both models. In the second situation, recordings of short instructions (like open the notepad) are given to both models. To ensure the experiment generalizes, four different text-to-speech (TTS) engines are adopted: Google Translate, Baidu Translate, Bing Translate and pyttsx3.
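
For reference, WER is the word-level edit distance between the hypothesis and the reference, divided by the number of reference words. The sketch below illustrates the metric; it is not the exact evaluation script used to produce the tables in the appendix.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Word-level Levenshtein distance (substitutions, insertions, deletions)
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / len(ref)

word_error_rate("open the notepad", "open the note pad")  # 2/3 (one substitution + one insertion over 3 reference words)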

The result shows DeepSpeech outperforms Sphinx in both situations.

image-20220428233341918

Thus, DeepSpeech is adopted as the default recognition engine; users can still switch to Sphinx via command-line arguments.

def interpret_from_file(self, path):
    # DeepSpeech
    if SpeechRecognitionUtil.engine == "ds":
        fin = wave.open(path, 'rb')
        audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)
        return SpeechRecognitionUtil.model.stt(audio)
    # Sphinx
    else:
        file = sr.AudioFile(path)
        with file as f:
            audio = self.recognizer.record(f)
            return self.recognizer.recognize_sphinx(audio)

2.2 Noise Reduction

When the application starts, a short audio clip is recorded; this clip captures the environment noise. Every time a new utterance is recorded, the noise clip is used for noise reduction with a spectral gating algorithm (the nr.reduce_noise call in record_audio).
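
The env_noise_record call made in the InteractionUI constructor is not reproduced above. A plausible shape for such a routine (record a couple of seconds of ambient sound and save it as ./noise.wav) is sketched below; it assumes PyAudio with the same 16 kHz mono format as record_audio and is not the repository's exact code.

# algorithm_utils.py (sketch)
def env_noise_record(self, wave_out_path, seconds=2):
    chunk, rate, channels = 1024, 16000, 1           # same format as record_audio above
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=channels, rate=rate,
                    input=True, frames_per_buffer=chunk)
    wf = wave.open(wave_out_path, 'wb')
    wf.setnchannels(channels)
    wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wf.setframerate(rate)
    for _ in range(int(rate / chunk * seconds)):     # capture ~2 seconds of ambient noise
        wf.writeframes(stream.read(chunk))
    stream.stop_stream()
    stream.close()
    p.terminate()
    wf.close()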

3. Appendix

3.1 Accuracy of ASR Models (Random Sentences)

WER is the abbreviation of Word Error Rate; the lower the WER, the better the model.

Ground Truth TTS Engine Sphinx DeepSpeech Sphinx WER DeepSpeech WER
parsley sage rosemary and time pyttsx3 parsley sage rosemary and time parsley say rose marianne time 0.00% 23.33%
should old acquaintance be forgot pyttsx3 should auld acquaintance be forgot soot old the quaintance pefore got 6.06% 36.36%
The quick brown fox jumps over a lazy dog pyttsx3 the quake brown fox down solvay lazy dog the quick bron fox jumps over ar lazy dog 36.59% 7.32%
We have provided a pretrained model pyttsx3 we have provided a crackling model we have provided a redoring model 20.00% 17.14%
it aims at developing Practical Algorithms for General Image Restoration. pyttsx3 it aims at developing practical algorithms for generally made to estimation hey mstap developing practical algarithms for a general image resteration 13.70% 16.44%
I want a bottle of water pyttsx3 i want a bottle of water i wan tay bodle of water 0.00% 20.83%
Vulkan is constantly evolving to bring new capabilities and improvements to the API. pyttsx3 full kenny's constantly evolving to bring your capabilities and improvements to the a. p. i fookcal is constantly volving to bring new capabilities and improvements to the api 18.07% 8.43%
Ambient light is light that is diffused equally throughout an environment. pyttsx3 ambient light is like that is the fewest equally throughout an environment anbiant light his like that instifused equally through out ind environment 15.07% 17.81%
In computer graphics, ray tracing is a rendering technique for generating an image by tracing the path of light as pixels in an image plane and simulating the effects of its encounters with virtual objects. pyttsx3 a computer graphics retracing is a man during technique for generating an image by tracing the path of light as pixel seen any makes plain and simulating the effect some facing converse with virtual objects in conputer graphics ratwacing is a rendering tachnique for a generating and image by twicing the path of light aspicks olds in the limit playing an simbiulating the effects of its in counters with virtual objects 16.50% 16.50%
Logistic regression is basically a supervised classification algorithm. pyttsx3 logistically question is basically a supervised classification algorithm lotistic regression the ins basically a supervised classification algarithem 12.68% 12.68%
A Lie group is a smooth manifold obeying the group properties and that satisfies the additional condition that the group operations are differentiable. pyttsx3 they like what they say smallest manifold the pain the cruel properties and that satisfies that additional condition that the growth operations are different answerable a ligrol bis as moovh manifold abeying the groel properties and their satisfies the additional condition that the creup operations are differentiable 30.26% 11.84%
The game features an open-world environment and an action-based battle system using elemental magic and character-switching. pyttsx3 the game features an open world environment and an action based that old system using elemental magic and character sweetie the game features and open whereold environment and an action baced bat o system using elemental magic and character switching 12.10% 10.48%
Pudding is a type of food that can be either a dessert or a savoury dish that is part of the main meal. pyttsx3 pudding is a type of flood that can be either a dessert or a savory days that is part of the main meal putting his eight hype food that can be either aiddesern or a sabory adish that is part of the main meal 5.83% 19.42%
Greenland is an island nation located between the Arctic and Atlantic oceans pyttsx3 we monday said island nation located between the arctic and atlantic oceans green lyndisan island nation located between the arctic andad lantic oceans 15.79% 9.21%
Generative Adversarial Networks are a powerful class of neural networks that are used for unsupervised learning pyttsx3 the animated adversarial networks are a powerful class of neural networks that are used for unsupervised learning generated adversary ol networks are a powerful class of nero networks that are used for unsupervised to learning 8.11% 10.81%
Attention is a powerful mechanism developed to enhance the performance of the Encoder-Decoder architecture on neural network-based machine translation tasks pyttsx3 attention is a powerful mechanism developed to enhanced the performance of the hinkle dirty claudette architecture on the road network based machine translation tasks attention they ay powerful mechanism developed to inhance the performance of the impoder decolder architecture arm neuro network based machine twenslation tasks 15.29% 12.10%
A radical Infected organization, Reunion rallies around the cause of Infected nationalism, rejecting racial and national identities. pyttsx3 a medical infected organization the union bally's about the cause of the infected nationalism rejecting racial and national identities a radical infected organization reunion rallies around the coust of infected nationalism rhich acting racial end national identities 14.29% 9.77%
The Panda bear is a large mammal native to central China. These bears are also known as giant pandas, or simply pandas. pyttsx3 the p. and that there is a large mammal native to central china these days are also milne s. giant pandas are simply pandas the pande perious tay large memo native to central china these pairs are also known nas giant pandes or simply pandes 21.01% 17.65%
The Trump administration has accused TikTok of transferring user data to servers in China and has banned the app. pyttsx3 the twelve administration has accused take till got transforming user david was ever seen china and has banned the gap the twele pud ministration has a cused tictoke of twensfurin user data to server s in china and has band the ap 25.66% 17.70%
Back face culling determines whether a polygon of a graphical object is visible. pyttsx3 that face colleen determines whether a colleague on ave graphical object is visible back face colling determincs weather a polygon of a graphical objectives visible 22.50% 10.00%
Geometric camera calibration, also referred to as camera resectioning, estimates the parameters of a lens and image sensor of an image or video camera. pyttsx3 geometric camera calibration also referred to as camera reception the estimates that the managers of aliens and he makes sense or an image or video camera teo metric camera calibration also referred to ask hamme rebi sectiony astimates the prrameters of alence and iage sensor of animite or video camera 18.54% 16.56%
Food security is the measure of the availability of food and individuals ability to access it. pyttsx3 full of securities that measure of the availability of food and individuals ability to access ford security is the measure of the availability of food ale indhividuals ability to access it 12.63% 6.32%
Pineapple is a large tropical fruit with a spiky, tough skin and sweet insides. google_translate find out or so large tropical fruit with the spiky tasking in sweet insights pinapple is a large tropical fruit with a spikin tuff skin an sweden sides 33.33% 14.10%
Rosemary is one of those wonderful herbs that makes a beautiful ornamental plant as well as a versatile culinary seasoning. google_translate rosemary is one of those wonderful words that makes the unocal ornamental claim as well as and pursued alcala neary seasoning rosmarry is one of those wonderful erbs that makes op beautiful ornomental plan as well as a vers ital colinary seasoning 22.95% 9.84%
Lily of the valley has slender rootstock and two large oblong leaves with prominent veins. google_translate lily of the valley has slender root stock into large oblong agrees with common in chains lil of the valley has slender rootes stock and two large oblong leaves with prominent veins 22.22% 5.56%
Sodium chloride is an essential nutrient and is used in healthcare to help prevent patients from becoming dehydrated google_translate sodium chloride is an essential nutrient didn't use to tell her to help her that he since from becoming dehydrated sodium cloride is in essential nutrient ind is used in health care to help prevent patience from becoming dehydrated 23.28% 5.17%
Rasterization can be generally defined as a process of converting or mapping fragments into the projection plane google_translate roster as a stinking be generally teeth are hindus are processor chamber engler mapping fragments into the rejection plane reste ization cin be generally defined as a process of converting or mapping fragments into the projection 34.82% 4.46%
In information theory, the entropy of a random variable is the average level of information, surprise, or uncertainty inherent to the variable's possible outcomes. google_translate in information here eat the entropy upper random variable the average landlord increasing surprise or uncertainty in hearing did variables possible outcomes information theory the entripe of a random variable is the aperage level of information surprise or uncertainty inherent to the variables possible outcomes 23.16% 5.59%
Red foxes have long snouts and red fur across the face, back, sides, and tail. google_translate ready boxes have long's now regret for cross nice back sites in taro boxes have long snouts and red fur across the face back sides and tail 38.46% 11.54%
A shader is a computer program that calculates the appropriate levels of light, darkness, and color during the rendering of a scene, a process known as shading baidu_translate washington aren't you forgot my only the all early there are these are monitoring and seen on guard are changing shager insicomputer program that calcinly the appropriate levels of life darkness and colore during the rentering of the see at process non or shiting 69.81% 20.75%
Structure from motion is a photogrammetric range imaging technique for estimating three-dimensional structures from two-dimensional image sequences that may be coupled with local motion signals. baidu_translate start your commercial need a doctor not real and lamenting me for me and treatment shirts shoes aren't you do not sure he makes the greek it got me all the akbar burglar komercni ya struct ure from toshin if of the tolk remetric range inegingtectne for at meting treat imentional structures from sudamantional imex sequenie that may be happled with local motion ficnals 71.50% 25.39%
A B-spline or basis spline is a spline function that has minimal support with respect to a given degree, smoothness, and domain partition baidu_translate at least aren't working on who's on car should not listening in worker and the stunt men and our main parting shot pind or bathis pinis is blin calsian that has minimal support wat especti icas in degree moveness and o main qartition 68.61% 32.12%
Level of detail refers to the complexity of a model representation. It can be decreased as the model moves away from the viewer or according to other metrics such as object importance, viewpoint-relative speed or position. baidu_translate let her on let her know grace to eat 'em are and he shot the king and alm her week on you weren't work lincoln got an auction got stuck on the importance you want ardently or cutting shot elevtle of detail referse to the comprexity he model or presotation egh can be decreaced to se model move herway from the viewer or ccording to other metric suc as object importance point relative feed or posention 69.37% 18.02%
A deputy-involved shooting at a Target store in Kissimmee, Florida, Wednesday evening left at least one person dead and three others injured, according to reports. baidu_translate and what we got she on the ring any law bring you blame not workers and get on yet there's inching poured into the loot theputy involved shooting attit harket stor if theny florida when the evening left at least one person dead entreer gers indurn thaccording to repours 67.48% 26.99%
The Belt and Road Initiative, formerly known as One Belt One Road or OBOR for short, is a global infrastructure development strategy adopted by the Chinese government in 2013 to invest in nearly 70 countries and international organizations. bing_translate the dow inroad in a mid dial in earnest in learning our one bell one good r. o. b. or curse words in the global interest or you did not answer and c. got it i it's tiny peppermint indeed that they're hoops and that in your eat and eat and the engine asked organizations the belt and rod in it the belt and rod inationtive por mali nel as one belt one road or obio or fo shore is a global infrastructure development strategy adopted by the chinese government intwo thousand thirteen to invest i nearly seventy countries and international organizations 67.08% 29.17%
Some results are removed in response to a notice of local law requirement. For more information, please see here. bing_translate some results are not in response to an arrogant and local law requirements are laurin mace and meet each year some results are removed in response to a notice of local law requirements for more information please see here 40.71% 2.65%
The 996 working hour system is a work schedule practiced by some companies in the Peoples Republic of China. bing_translate the nine hundred ninety six working artist and igor it all is that some companies in it he calls republican synar the nisi working hour system is a work skhedule practiced by some companies in the peoples republic of china 60.19% 5.56%

3.2 Accuracy of ASR Models (Instructions Used in App)

Ground Truth TTS Engine Sphinx DeepSpeech Sphinx WER DeepSpeech WER
Open the notepad baidu_translate and a from the not dad 81.25% 37.50%
Open the notepad bing_translate logan a note pa open the note pad 43.75% 6.25%
Open the notepad google_translate opening up pat pen the note padh 56.25% 18.75%
Hello baidu_translate hello elo 0% 60%
Hello bing_translate and leno hello 100% 0%
Hello google_translate hello hello 0% 0%
Play music baidu_translate protein called p mefil 100% 60%
Play music bing_translate claim he's dead play music 100% 0%
Play music google_translate Play music Play music 0% 0%
Watch a movie baidu_translate or to me in mincy 76.92% 84.62%
Watch a movie bing_translate why sanity watch a movie 76.92% 0%
Watch a movie google_translate watch a movie watch a movie 0% 0%

3.3 Mapping Dictionary

self.application_map = {
    "open notepad": ["shell", "Notepad"],
    "notepad": ["shell", "Notepad"],
    "write notes": ["shell", "Notepad"],
    "write something": ["shell", "Notepad"],
    "take notes": ["shell", "Notepad"],
    "play music": ["shell", "Music Player"],
    "music": ["shell", "Music Player"],
    "play songs": ["shell", "Music Player"],
    "listen to music": ["shell", "Music Player"],
    "open calculator": ["shell", "Calculator"],
    "watch a movie": ["web_browse", "movie"],
    "movie": ["web_browse", "movie"],
    "watch movies": ["web_browse", "movie"],
    "a movie": ["web_browse", "movie"],
    "roll a dice": ["event", "dice"],
    "play dice": ["event", "dice"],
    "roll the dice": ["event", "dice"],
    "throw the dice": ["event", "dice"],
    "roll dice": ["event", "dice"]
}

3.4 Icon Drawing

class ZwUIAniMomoBotIconControl(ZwUIAbstractControl):
    def __init__(self, x, y, radius=2, fore=ZwUIHelper.get_opengl_rgba_vector(1, 1, 1),
                 back=ZwUIHelper.get_opengl_rgba_vector(0, 0, 0), **kwargs):
        super(ZwUIAniMomoBotIconControl, self).__init__()
        self.x = x
        self.y = y
        self.r = radius
        self.fore = fore
        self.back = back
        self.paused = True
        self.initialize(**kwargs)

    def initialize(self, **kwargs):
        # Primary Params
        x = self.x
        y = self.y
        r = self.r
        r2 = self.r * 0.718
        # Face Params
        fr = 0.5
        fr2 = 0.3
        dh = r / 1.618
        fs = r / 3
        delta = 0.01
        # Antenna Params
        anw = r / 5
        anh = r / 3
        anhd = 0.1
        anhr = r / 5
        # Eye Params
        eyecl = r / 2.8
        eyelh = r / 3.8
        eyenr = r / 8
        eyesp = r / 9

        eyecl2 = r / 3
        eyelh2 = r / 3.5
        eyenr2 = r / 5

        eyecl2i = r / 2
        eyelh2i = r / 4.5
        eyenr2i = r / 9

        # Ear Params
        earw = r / 5
        earh = r / 2
        earb = r / 3
        ear_adjust = 0.1

        assert 'animation_duration' in kwargs, "Animation duration must be configured"
        tra = lambda x: x
        if 'animation_transition' in kwargs:
            tra = lambda x: kwargs['animation_transition'](x)[1]
        for i in range(-1, int(kwargs['animation_duration'])):
            fgc = self.fore
            bgc = ZwUITransition.color_trans(tra, self.fore, self.back, (i + 1) / int(kwargs['animation_duration']))
            face1 = ZwUICircle(x + r, y + dh, r, bgc, 0, PI + 1.0 / 360 * PI * 2)
            face2 = ZwUIRoundedRectangle(x, y, 2 * r, dh, fr, bgc)
            face3 = ZwUIRectangle(x, y + fr, 2 * r, dh - fr + delta, bgc)
            face_in1 = ZwUICircle(x + r, y + dh, r2, fgc, 0, PI + 1.0 / 360 * PI * 2)
            face_in2 = ZwUIRoundedRectangle(x + (r - r2), y + dh - fs + fr2, r2 * 2, fs, fr2, fgc)
            antenna1 = ZwUIRectangle(x + r - anw / 2, y + dh + r - anhd, anw, anh, bgc)
            antenna2 = ZwUICircle(x + r, y + dh + r - anhd + anh, anhr, bgc)
            eye_normal1 = ZwUICircle(x + r - eyecl, y + dh - fs + fr2 + eyelh, eyenr, bgc)
            eye_normal2 = ZwUICircle(x + r + eyecl, y + dh - fs + fr2 + eyelh, eyenr, bgc)
            eye_exciting1 = ZwUITriangle(
                ZwUIHelper.get_opengl_pos3d_vector(x + r - eyecl2 + eyenr2, y + dh - fs + fr2 + eyelh2),
                ZwUIHelper.get_opengl_pos3d_vector(x + r - eyecl2 - eyenr2, y + dh - fs + fr2 + eyelh2 + eyenr2),
                ZwUIHelper.get_opengl_pos3d_vector(x + r - eyecl2 - eyenr2, y + dh - fs + fr2 + eyelh2 - eyenr2),
                bgc
            )
            eye_exciting2 = ZwUITriangle(
                ZwUIHelper.get_opengl_pos3d_vector(x + r + eyecl2 - eyenr2, y + dh - fs + fr2 + eyelh2),
                ZwUIHelper.get_opengl_pos3d_vector(x + r + eyecl2 + eyenr2, y + dh - fs + fr2 + eyelh2 + eyenr2),
                ZwUIHelper.get_opengl_pos3d_vector(x + r + eyecl2 + eyenr2, y + dh - fs + fr2 + eyelh2 - eyenr2),
                bgc
            )
            eye_exciting_in1 = ZwUITriangle(
                ZwUIHelper.get_opengl_pos3d_vector(x + r - eyecl2i + eyenr2i, y + dh - fs + fr2 + eyelh2),
                ZwUIHelper.get_opengl_pos3d_vector(x + r - eyecl2i - eyenr2i, y + dh - fs + fr2 + eyelh2 + eyenr2i),
                ZwUIHelper.get_opengl_pos3d_vector(x + r - eyecl2i - eyenr2i, y + dh - fs + fr2 + eyelh2 - eyenr2i),
                fgc
            )
            eye_exciting_in2 = ZwUITriangle(
                ZwUIHelper.get_opengl_pos3d_vector(x + r + eyecl2i - eyenr2i, y + dh - fs + fr2 + eyelh2),
                ZwUIHelper.get_opengl_pos3d_vector(x + r + eyecl2i + eyenr2i, y + dh - fs + fr2 + eyelh2 + eyenr2i),
                ZwUIHelper.get_opengl_pos3d_vector(x + r + eyecl2i + eyenr2i, y + dh - fs + fr2 + eyelh2 - eyenr2i),
                fgc
            )
            ear1 = ZwUIRectangle(x - earw + ear_adjust, y + earb, earw, earh, bgc)
            ear2 = ZwUIRectangle(x + 2 * r - ear_adjust, y + earb, earw, earh, bgc)

            keyframe1 = ZwUIKeyFrame()
            keyframe1.add_shape(eye_exciting_in1)
            keyframe1.add_shape(eye_exciting_in2)
            keyframe1.add_shape(eye_exciting1)
            keyframe1.add_shape(eye_exciting2)
            keyframe1.add_shape(face_in1)
            keyframe1.add_shape(face_in2)
            keyframe1.add_shape(face1)
            keyframe1.add_shape(face2)
            keyframe1.add_shape(face3)
            keyframe1.add_shape(antenna1)
            keyframe1.add_shape(antenna2)
            keyframe1.add_shape(ear1)
            keyframe1.add_shape(ear2)

            keyframe2 = ZwUIKeyFrame()
            keyframe2.add_shape(eye_normal1)
            keyframe2.add_shape(eye_normal2)
            keyframe2.add_shape(face_in1)
            keyframe2.add_shape(face_in2)
            keyframe2.add_shape(face1)
            keyframe2.add_shape(face2)
            keyframe2.add_shape(face3)
            keyframe2.add_shape(antenna1)
            keyframe2.add_shape(antenna2)
            keyframe2.add_shape(ear1)
            keyframe2.add_shape(ear2)

            self.append_keyframe(keyframe1)
            self.ani_clips["normal"] = ZwUIClips()
            self.ani_clips["normal"].add_frame(keyframe2)
