Research Interests

Deep Learning: At LG Electronics, I am developing an AI coding assistant based on large language models (LLMs). I have trained LLMs in distributed settings and deployed them to hundreds of users. I am currently conducting research on fast and accurate LLM inference.

Algorithm Engineering: My primary research efforts have been devoted to developing fast algorithms. I developed fast algorithms for graph isomorphism, graph isomorphism query processing, and multiple pattern Cartesian tree matching during my Ph.D. studies.

Work Experience

LG Electronics – Artificial Intelligence Lab (Senior Researcher)

  • Jan. 2024 – Present: Development of AI Coding Assistant using Large Language Model
    • Conducting research on domain-adaptive continual pretraining of code LLMs.
    • Maintaining a custom benchmark dataset for offline evaluation.
    • Analyzing user data and feedback for online evaluation.
    • Constructing an instruction dataset and conducting instruction tuning.
  • Aug. 2022 – Dec. 2023: Development of AI Coding Assistant using Large Language Model
    • Conducted distributed training of LLMs based on the decoder-only transformer architecture.
    • Filtered and deduplicated terabytes of source code data.
    • Developed an LLM inference server optimized for both latency and throughput.
  • Apr. 2022 – Dec. 2022: Development of Coding Education Program Utilizing AI
    • Constructed training data for generating Python code from natural language instructions.
    • Trained an encoder-decoder transformer from scratch.
    • Developed a web client that accepts a prompt, displays the AI-generated code, and executes the Python code.
    • Created an inference server that runs on multiple GPUs, loads multiple copies of the model, and offers dynamic batching for increased throughput.

Seoul National University – Institute of Computer Technology (Post-Doctoral Assistant)

  • Jan. 2022 – Mar. 2022: Algorithm Development for Graph Isomorphism Query Processing
    • Developed a fast graph isomorphism query processing algorithm that runs orders of magnitude faster than state-of-the-art algorithms.

NAVER – AI Dev2 (Internship)

  • Oct. 2021: Analyzing Conversion Tracking Data
    • Conducted exploratory data analysis on glad for advertisement data to find meaningful trends.
    • Handled a hundred gigabytes of raw conversion tracking data.
    • Solved the optimization problem of maximizing the conversion rate using linear programming.

Tech/Skills

Competitive Programming

Solved.ac profile

Programming Languages

Libraries

  • PyTorch, TensorFlow, Triton (OpenAI), Seaborn, Pandas, PySpark, Hugging Face Transformers, DeepSpeed, NVIDIA Triton, NVIDIA FasterTransformer, FastAPI, gtest

Others

  • AWS (SageMaker, EC2, Lustre, S3)

CV

GeonmoGu_CV

Geonmo Gu's Projects

apps

APPS: Automated Programming Progress Standard (NeurIPS 2021)

casrel

A Novel Cascade Binary Tagging Framework for Relational Triple Extraction. Accepted by ACL 2020.

ceci-release

Source Code for "CECI: Compact Embedding Cluster Index for Scalable Subgraph Matching"

cliques

Refined pivot selection for maximal clique enumeration in graphs, Theoretical Computer Science 2016

codegen

CodeGen is an open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.

daf

Efficient Subgraph Matching: Harmonizing Dynamic Programming, Adaptive Matching Order, and Failing Set Together

dfscode

Generates the minimum DFS code of a given graph.

dmce

Distributed Maximal Clique Computation, IEEE BigData 2014

gboost

A fork of Sebastian Nowozin's and Koji Tsuda's gboost code

gmgu.github.io

Jekyll theme for building a personal site, blog, project documentation, or portfolio.

graph_edit_distance

This project aims at exact graph edit distance (GED) computation and GED verification (verify whether the GED between two graphs is smaller than a given threshold), where all edit operators are assumed to have unit costs.

idar

Fast Supergraph Search Using DAG Integration

lasagne

A fork of the LASAGNE project (http://amici.dsi.unifi.it/lasagne/) looking for improvement.
