Publications

(2023). QA-Matcher: Unsupervised Entity Matching Using a Question Answering Model. PAKDD.

(2023). DeepJoin: Joinable Table Discovery with Pre-trained Language Models. VLDB.

(2023). Cross-Domain User Similarity without Overlapping Attributes via Optimal Transport Theory. SIGIR eCom.

(2022). Table Enrichment System for Machine Learning (Demo). SIGIR.

(2021). Low-resource Taxonomy Enrichment with Pretrained Language Models. EMNLP.

(2021). Entity Matching with String Transformation and Similarity-Based Features. SFDI.

(2021). User Identity Linkage for Different Behavioral Patterns across Domains. ICWSM.

(2021). Efficient Joinable Table Discovery in Data Lake: A High-Dimensional Similarity-Based Approach. ICDE.

(2021). Quality Control for Hierarchical Classification with Incomplete Annotations. PAKDD.

(2020). Learning from Unsure Responses. AAAI 2020.

(2020). Continuous Top-k Spatial-Keyword Search on Dynamic Objects. VLDB Journal.

(2019). Extracting Feature Engineering Knowledge from Data Science Notebooks. IEEE Big Data 2019.

(2019). Meimei: An Efficient Probabilistic Approach for Semantically Annotating Tables. AAAI 2019.

(2018). Compressed Vector Set: A Fast and Space-Efficient Data Mining Framework. Journal of Information Processing (JIP).

(2017). Link Prediction for Isolated Nodes in Heterogeneous Network by Topic-Based Co-clustering. PAKDD 2017.

(2014). MOARLE: Matrix Operation Accelerator Based on Run-Length Encoding. APWeb 2014.

(2013). Data Stream Processing with Concurrency Control. ACM SIGAPP Applied Computing Review.

(2013). Continuous Query Processing with Concurrency Control: Reading Updatable Resources Consistently. ACM SAC 2013.

(2011). Efficient Invocation of Transaction Sequences Triggered by Data Streams. SMDMS 2011.
