Masafumi Oyamada

小山田 昌史

Chief Scientist

NEC Corporation

Google Scholar

GitHub

X / Twitter

Research Interests

I am a computer scientist based in Tokyo, Japan. I am passionate about maximizing human potential through the power of machines (機械) and knowledge (知識). My current research interests focus on three key areas: developing agentic AI systems that can autonomously interact with their environment 🤖, advancing computer automation to streamline complex workflows ⚡, and exploring methods for LLM self-improvement to enhance AI capabilities 📈.

I am Chief Scientist at NEC Corporation, where I lead research initiatives in large language models, data preprocessing automation, and AI systems. Our recent work includes novel approaches to LLM self-improvement, efficient training of low-resource language models, and intelligent systems for data preprocessing and table understanding. We are particularly focused on making AI systems more efficient, more autonomous, and capable of handling complex real-world tasks.

Contact

  • E-Mail
  • I welcome opportunities for research collaboration and internships in the areas of AI systems, automation, and language models. If you're interested in working together on cutting-edge AI research, please reach out.

Recent News

November 9, 2024

Paper Accepted to IEEE Big Data 🤖

Our poster paper "Towards Automated Workflow Construction for AI Agents: A Preliminary Study" has been accepted to IEEE Big Data 2024.

Publication
November 7, 2024

Paper Accepted to EMNLP 🪼

Our paper "Jellyfish: Instruction-Tuning Local Large Language Models for Data Preprocessing" has been accepted to EMNLP 2024.

Publication
November 5, 2024

Lecture at Kobe University 🎓

Gave a lecture at Kobe University on real-world large language model development and recent research trends, such as test-time compute.

Talk
October 28, 2024

Talk at ACM MM 🎯

Presented at ACM MM'24 on NEC's large language model development.

Talk
September 11, 2024

Invited Talk at WebDB Summer Workshop 🗣️

Gave an invited talk on self-improving LLMs, RAG, and action models in the organized session on LLMs and data management at the WebDB Summer Workshop.

Talk
July 6, 2024

Tutorial Accepted to CIKM 📊

Our tutorial "On the Use of Large Language Models for Table Tasks" has been accepted to CIKM 2024.

Publication
July 3, 2024

Talk at IPSJ Seminar 🎤

Presented on NEC's large language model development at an IPSJ seminar.

Talk
January 5, 2024

Paper Accepted to *SEM@NAACL 🎯

Our paper "Relevance, Diversity, and Exclusivity: Designing Keyword-augmentation Strategy for Zero-shot Classifiers" has been accepted to *SEM@NAACL 2024.

Publication
August 15, 2023

Paper Accepted to VLDB 🔍

Our paper "DeepJoin: Joinable Table Discovery with Pre-trained Language Models" has been accepted to VLDB 2023.

Publication
July 20, 2023

Paper Accepted to IEEE Big Data 📚

Our paper "Towards Large Language Model Organization: A Case Study on Abstractive Summarization" has been accepted to IEEE Big Data 2023.

Publication
June 10, 2023

Paper Accepted to EMNLP Findings 🔍

Our paper "Context Quality Matters in Training Fusion-in-Decoder for Extractive Open-Domain Question Answering" has been accepted to EMNLP Findings 2023.

Publication
March 15, 2023

Paper Accepted to PAKDD 🤝

Our paper "QA-Matcher: Unsupervised Entity Matching Using a Question Answering Model" has been accepted to PAKDD 2023.

Publication
April 20, 2022

Paper Accepted to SIGIR 📊

Our paper "Table Enrichment System for Machine Learning" has been accepted to SIGIR 2022.

Publication

Publications

Can Large Language Models Invent Algorithms to Improve Themselves?

Yoichi Ishibashi, Taro Yano, Masafumi Oyamada

CoRR (2024)

Jellyfish: Instruction-Tuning Local Large Language Models for Data Preprocessing.

Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada

EMNLP (2024)

On the Use of Large Language Models for Table Tasks.

Yuyang Dong, Masafumi Oyamada, Chuan Xiao, Haochen Zhang

CIKM (2024)

Large Language Models as Data Preprocessors.

Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada

VLDB Workshops (2024)

LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization.

Masafumi Enomoto, Kunihiro Takeoka, Kosuke Akimoto, Kiril Gashteovski, Masafumi Oyamada

CoRR (2024)

DeepJoin: Joinable Table Discovery with Pre-trained Language Models.

Yuyang Dong, Chuan Xiao, Takuma Nozawa, Masafumi Enomoto, Masafumi Oyamada

Proc. VLDB Endow. (2023)

Context Quality Matters in Training Fusion-in-Decoder for Extractive Open-Domain Question Answering.

Kosuke Akimoto, Kunihiro Takeoka, Masafumi Oyamada

EMNLP Findings (2023)

QA-Matcher: Unsupervised Entity Matching Using a Question Answering Model.

Shogo Hayashi, Yuyang Dong, Masafumi Oyamada

PAKDD (2023)

Large Language Models as Data Preprocessors.

Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada

CoRR (2023)

Jellyfish: A Large Language Model for Data Preprocessing.

Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada

CoRR (2023)

Table Enrichment System for Machine Learning.

Yuyang Dong, Masafumi Oyamada

SIGIR (2022)

Table Enrichment System for Machine Learning.

Yuyang Dong, Masafumi Oyamada

CoRR (2022)

DeepJoin: Joinable Table Discovery with Pre-trained Language Models.

Yuyang Dong, Chuan Xiao, Takuma Nozawa, Masafumi Enomoto, Masafumi Oyamada

CoRR (2022)

Continuous Top-k Spatial-Keyword Search on Dynamic Objects.

Yuyang Dong, Chuan Xiao, Hanxiong Chen, Jeffrey Xu Yu, Kunihiro Takeoka, Masafumi Oyamada, Hiroyuki Kitagawa

VLDB J. (2021)

Low-resource Taxonomy Enrichment with Pretrained Language Models.

Kunihiro Takeoka, Kosuke Akimoto, Masafumi Oyamada

EMNLP (2021)

Efficient Joinable Table Discovery in Data Lakes: A High-Dimensional Similarity-Based Approach.

Yuyang Dong, Kunihiro Takeoka, Chuan Xiao, Masafumi Oyamada

ICDE (2021)

Quality Control for Hierarchical Classification with Incomplete Annotations.

Masafumi Enomoto, Kunihiro Takeoka, Yuyang Dong, Masafumi Oyamada, Takeshi Okadome

PAKDD (2021)

Learning with Unsure Responses.

Kunihiro Takeoka, Yuyang Dong, Masafumi Oyamada

AAAI (2020)

Efficient Joinable Table Discovery in Data Lakes: A High-Dimensional Similarity-Based Approach.

Yuyang Dong, Kunihiro Takeoka, Chuan Xiao, Masafumi Oyamada

CoRR (2020)

Meimei: An Efficient Probabilistic Approach for Semantically Annotating Tables.

Kunihiro Takeoka, Masafumi Oyamada, Shinji Nakadai, Takeshi Okadome

AAAI (2019)

Compressed Vector Set: A Fast and Space-Efficient Data Mining Framework.

Masafumi Oyamada, Jianquan Liu, Shinji Ito, Kazuyo Narita, Takuya Araki, Hiroyuki Kitagawa

J. Inf. Process. (2018)

Link Prediction for Isolated Nodes in Heterogeneous Network by Topic-Based Co-clustering.

Katsufumi Tomobe, Masafumi Oyamada, Shinji Nakadai

PAKDD (2017)

MOARLE: Matrix Operation Accelerator Based on Run-Length Encoding.

Masafumi Oyamada, Jianquan Liu, Kazuyo Narita, Takuya Araki

APWeb (2014)

Continuous query processing with concurrency control: reading updatable resources consistently.

Masafumi Oyamada, Hideyuki Kawashima, Hiroyuki Kitagawa

SAC (2013)

Efficient Invocation of Transaction Sequences Triggered by Data Streams.

Masafumi Oyamada, Hideyuki Kawashima, Hiroyuki Kitagawa

3PGCIC (2011)