Research Interests
I am a computer scientist based in Tokyo, Japan, passionate about maximizing human potential through the power of machines (機械) and knowledge (知識). My current research interests center on three areas: developing agentic AI systems that autonomously interact with their environment 🤖, advancing computer automation to streamline complex workflows ⚡, and exploring methods for LLM self-improvement to enhance AI capabilities 📈.
I am currently Chief Scientist at NEC Corporation, where I lead research initiatives in large language models, data preprocessing automation, and AI systems. Our recent work includes developing novel approaches to LLM self-improvement, optimizing low-resource language model training, and building intelligent systems for data preprocessing and table understanding. We are particularly focused on making AI systems more efficient, autonomous, and capable of handling complex real-world tasks.
Contact
I welcome opportunities for research collaboration and internships in the areas of AI systems, automation, and language models. If you're interested in working together on cutting-edge AI research, please reach out.
Recent News
Paper Accepted to IEEE Big Data 🤖
Our poster paper "Towards Automated Workflow Construction for AI Agents: A Preliminary Study" has been accepted to IEEE Big Data 2024.
Paper Accepted to EMNLP 🪼
Our paper "Jellyfish: Instruction-Tuning Local Large Language Models for Data Preprocessing" has been accepted to EMNLP 2024.
Lecture at Kobe University 🎓
Gave a lecture at Kobe University on real-world large language model development and recent research trends, such as test-time compute.
Talk at ACM MM 🎯
Presented at ACM MM'24 on NEC's large language model development.
Invited Talk at WebDB Summer Workshop 🗣️
Gave an invited talk on self-improving LLMs, RAG, and action models in the organized session on LLMs and Data Management at the WebDB Summer Workshop.
Tutorial Accepted to CIKM 📊
Our tutorial "On the Use of Large Language Models for Table Tasks" has been accepted to CIKM 2024.
Talk at IPSJ Seminar 🎤
Presented at IPSJ Seminar on NEC's large language model development.
Paper Accepted to *SEM@NAACL 🎯
Our paper "Relevance, Diversity, and Exclusivity: Designing Keyword-augmentation Strategy for Zero-shot Classifiers" has been accepted to *SEM@NAACL 2024.
Paper Accepted to VLDB 🔍
Our paper "DeepJoin: Joinable Table Discovery with Pre-trained Language Models" has been accepted to VLDB 2023.
Paper Accepted to IEEE Big Data 📚
Our paper "Towards Large Language Model Organization: A Case Study on Abstractive Summarization" has been accepted to IEEE Big Data 2023.
Paper Accepted to EMNLP Findings 🔍
Our paper "Context Quality Matters in Training Fusion-in-Decoder for Extractive Open-Domain Question Answering" has been accepted to EMNLP Findings 2023.
Paper Accepted to PAKDD 🤝
Our paper "QA-Matcher: Unsupervised Entity Matching Using a Question Answering Model" has been accepted to PAKDD 2023.
Paper Accepted to SIGIR 📊
Our paper "Table Enrichment System for Machine Learning" has been accepted to SIGIR 2022.
Publications
Can Large Language Models Invent Algorithms to Improve Themselves?
Yoichi Ishibashi, Taro Yano, Masafumi Oyamada
Optimizing Low-Resource Language Model Training: Comprehensive Analysis of Multi-Epoch, Multi-Lingual, and Two-Stage Approaches.
Kosuke Akimoto, Masafumi Oyamada
Jellyfish: Instruction-Tuning Local Large Language Models for Data Preprocessing.
Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada
On the Use of Large Language Models for Table Tasks.
Yuyang Dong, Masafumi Oyamada, Chuan Xiao, Haochen Zhang
Relevance, Diversity, and Exclusivity: Designing Keyword-augmentation Strategy for Zero-shot Classifiers.
Taro Yano, Kunihiro Takeoka, Masafumi Oyamada
Large Language Models as Data Preprocessors.
Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada
Context Quality Matters in Training Fusion-in-Decoder for Extractive Open-Domain Question Answering.
Kosuke Akimoto, Kunihiro Takeoka, Masafumi Oyamada
LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization.
Masafumi Enomoto, Kunihiro Takeoka, Kosuke Akimoto, Kiril Gashteovski, Masafumi Oyamada
DeepJoin: Joinable Table Discovery with Pre-trained Language Models.
Yuyang Dong, Chuan Xiao, Takuma Nozawa, Masafumi Enomoto, Masafumi Oyamada
Towards Large Language Model Organization: A Case Study on Abstractive Summarization.
Krisztián Boros, Masafumi Oyamada
Cross-Domain User Similarity without Overlapping Attributes via Optimal Transport Theory.
Genki Kusano, Masafumi Oyamada
QA-Matcher: Unsupervised Entity Matching Using a Question Answering Model.
Shogo Hayashi, Yuyang Dong, Masafumi Oyamada
Jellyfish: A Large Language Model for Data Preprocessing.
Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada
Continuous Top-k Spatial-Keyword Search on Dynamic Objects.
Yuyang Dong, Chuan Xiao, Hanxiong Chen, Jeffrey Xu Yu, Kunihiro Takeoka, Masafumi Oyamada, Hiroyuki Kitagawa
Low-resource Taxonomy Enrichment with Pretrained Language Models.
Kunihiro Takeoka, Kosuke Akimoto, Masafumi Oyamada
Efficient Joinable Table Discovery in Data Lakes: A High-Dimensional Similarity-Based Approach.
Yuyang Dong, Kunihiro Takeoka, Chuan Xiao, Masafumi Oyamada
User Identity Linkage for Different Behavioral Patterns across Domains.
Genki Kusano, Masafumi Oyamada
Quality Control for Hierarchical Classification with Incomplete Annotations.
Masafumi Enomoto, Kunihiro Takeoka, Yuyang Dong, Masafumi Oyamada, Takeshi Okadome
Meimei: An Efficient Probabilistic Approach for Semantically Annotating Tables.
Kunihiro Takeoka, Masafumi Oyamada, Shinji Nakadai, Takeshi Okadome
Extracting Feature Engineering Knowledge from Data Science Notebooks.
Masafumi Oyamada
Compressed Vector Set: A Fast and Space-Efficient Data Mining Framework.
Masafumi Oyamada, Jianquan Liu, Shinji Ito, Kazuyo Narita, Takuya Araki, Hiroyuki Kitagawa
Accelerating Feature Engineering with Adaptive Partial Aggregation Tree.
Masafumi Oyamada
Relational Mixture of Experts: Explainable Demographics Prediction with Behavioral Data.
Masafumi Oyamada, Shinji Nakadai
Link Prediction for Isolated Nodes in Heterogeneous Network by Topic-Based Co-clustering.
Katsufumi Tomobe, Masafumi Oyamada, Shinji Nakadai
MOARLE: Matrix Operation Accelerator Based on Run-Length Encoding.
Masafumi Oyamada, Jianquan Liu, Kazuyo Narita, Takuya Araki
Continuous Query Processing with Concurrency Control: Reading Updatable Resources Consistently.
Masafumi Oyamada, Hideyuki Kawashima, Hiroyuki Kitagawa
Efficient Invocation of Transaction Sequences Triggered by Data Streams.
Masafumi Oyamada, Hideyuki Kawashima, Hiroyuki Kitagawa