26 Papers by School of Data Science Faculty and Students Accepted at the Top International Conference NeurIPS 2023
A total of 26 papers by faculty and students of the School of Data Science (SDS), The Chinese University of Hong Kong, Shenzhen, have been accepted by NeurIPS 2023 (Conference on Neural Information Processing Systems, abbreviated NeurIPS or NIPS), a top international conference in machine learning and computational neuroscience. The papers come from 18 professors, 1 postdoctoral researcher, 10 PhD students, and 1 master's student of the school. Beyond graduate students, the school's undergraduates also actively participate in research: the author lists include 2 SDS undergraduate students. The acceptance rate of NeurIPS 2023 is 26.1%.
2 undergraduate students:
卢艺文、施展
1 master's student:
晏志远
10 PhD students:
董婧、李子牛、路舜麟、乔冠仁、孙子恒、王远程、魏少魁、杨超、张明达、朱明丽
1 postdoctoral researcher:
李文浩
18 professors:
丁宏强、樊继聪、李彤欣、李海洲、李爽、李文烨、李肖、刘桂良、罗智泉、马晨昊、茅剑锋、孙若愚、王趵翔、王本友、吴保元、武执政、查宏远、张瑞茂
About NeurIPS
The Conference on Neural Information Processing Systems (NeurIPS/NIPS) is a top international conference in machine learning and computational neuroscience. In the China Computer Federation (CCF) ranking of international academic conferences, NeurIPS is rated as a Class A conference in artificial intelligence. Topics discussed at the conference span many subfields, including deep learning, computer vision, large-scale machine learning, learning theory, optimization, and sparsity. The conference is held every December and is organized by the NeurIPS Foundation. This year marks its 37th edition, to be held December 10-16 at the New Orleans Convention Center in the United States.
Sources: NeurIPS official website, Baidu Baike
For more information about the students, see: https://mp.weixin.qq.com/s/fmn4Lxc7bl1EAM17Xf1Zcg
Details of the 26 Papers
1. Federated Spectral Clustering via Secure Similarity Reconstruction
Authors:
Dong Qiao, Chris Ding, Jicong Fan
Abstract:
Federated learning has a significant advantage in protecting data and information privacy. Many scholars have proposed secure learning methods within the framework of federated learning, but the study of secure federated unsupervised learning, especially clustering, is limited. In this work, we propose a secure kernelized factorization method for federated spectral clustering on distributed data. The method is non-trivial because the kernel or similarity matrix for spectral clustering is computed from data pairs, which violates the principle of privacy protection. Our method implicitly constructs an approximation of the kernel matrix on distributed data such that we can perform spectral clustering under the constraint of privacy protection. We provide a convergence guarantee of the optimization algorithm, a reconstruction error bound of the Gaussian kernel matrix, and a sufficient condition for correct clustering of our method. We also present guarantees of differential privacy. Numerical results on synthetic and real datasets demonstrate that the proposed method is efficient and accurate in comparison to the baselines.
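The privacy tension the paper addresses comes from the fact that the kernel matrix underlying spectral clustering is built from pairs of raw samples. As a generic illustration (a plain centralized computation of the Gaussian kernel, not the paper's federated algorithm), the sketch below makes that pairwise dependence explicit; the data and `sigma` are arbitrary toy choices:

```python
import math

def gaussian_kernel_matrix(X, sigma=1.0):
    """Compute K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    Every entry depends on a *pair* of raw samples, which is why a naive
    computation on distributed private data would leak information.
    """
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-sq_dist / (2 * sigma ** 2))
    return K

# Two nearby points and one far-away point.
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
K = gaussian_kernel_matrix(X)
```

Spectral clustering then operates on the eigenvectors of (a normalization of) such a matrix; the paper's contribution is approximating it without ever forming the raw data pairs centrally.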
Link:
https://nips.cc/virtual/2023/poster/71656
2. Lovász Principle for Unsupervised Graph Representation Learning
Authors:
Ziheng Sun (SDS PhD student), Chris Ding, Jicong Fan
Abstract:
This paper focuses on graph-level representation learning that aims to represent graphs as vectors that can be directly utilized in downstream tasks such as graph classification. We propose a novel graph-level representation learning principle called the Lovász principle, which is motivated by the Lovász number in graph theory. The Lovász number is a real number that is an upper bound for the graph Shannon capacity and is strongly connected with various global characteristics of a graph. Specifically, we show that the handle vector for computing the Lovász number is potentially a suitable choice for graph representation, as it captures a graph's global properties, though a direct application of the handle vector is difficult and problematic. We propose to use neural networks to address the problems and hence provide the Lovász principle. Moreover, we propose an enhanced Lovász principle that is able to exploit the subgraph Lovász numbers directly and efficiently. The experiments demonstrate that our Lovász principles achieve competitive performance compared to the baselines in unsupervised and semi-supervised graph-level representation learning tasks.
Link:
https://nips.cc/virtual/2023/poster/73041
3. Graph Convolutional Kernel Machine versus Graph Convolutional Networks
Authors:
Zhihao Wu, Zhao Zhang, Jicong Fan
Abstract:
Graph convolutional networks (GCN) with one or two hidden layers have been widely used in handling graph data that are prevalent in various disciplines. Many studies showed that the gain of making GCNs deeper is tiny or even negative. This implies that the complexity of graph data is often limited and shallow models are often sufficient to extract expressive features for various tasks such as node classification. Therefore, in this work, we present a framework called graph convolutional kernel machine (GCKM) for graph-based machine learning. GCKMs are built upon kernel functions integrated with graph convolution. An example is the graph convolutional kernel support vector machine (GCKSVM) for node classification, for which we analyze the generalization error bound and discuss the impact of the graph structure. Compared to GCNs, GCKMs require much less effort in architecture design, hyperparameter tuning, and optimization. More importantly, GCKMs are guaranteed to obtain globally optimal solutions and have strong generalization ability and high interpretability. GCKMs are composable, can be extended to large-scale data, and are applicable to various tasks (e.g., node or graph classification, clustering, feature extraction, dimensionality reduction). The numerical results on benchmark datasets show that, besides the aforementioned advantages, GCKMs have at least competitive accuracy compared to GCNs.
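As background for how graph convolution and kernel machines combine: a single GCN-style propagation step smooths node features over the normalized adjacency matrix, and the propagated features can then be fed to any kernel machine such as an SVM. The sketch below shows only that propagation step, using the standard normalization A_hat = D^(-1/2) (A + I) D^(-1/2); the paper's exact kernel construction may differ.

```python
import math

def normalized_adjacency(A):
    """Symmetrically normalized adjacency with self-loops:
    A_hat = D^(-1/2) (A + I) D^(-1/2)."""
    n = len(A)
    A_tilde = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def propagate(A_hat, X):
    """One graph-convolution step: average node features over neighbors."""
    n, d = len(X), len(X[0])
    return [[sum(A_hat[i][k] * X[k][j] for k in range(n)) for j in range(d)]
            for i in range(n)]

# Toy graph: nodes 0 and 1 are connected; node 2 is isolated.
A = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]
X = [[1.0], [0.0], [5.0]]
H = propagate(normalized_adjacency(A), X)
# Connected nodes mix their features; the isolated node keeps its own.
```

Feeding `H` (rather than raw `X`) into a kernel such as the RBF kernel yields a graph-convolutional kernel in the spirit the abstract describes.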
Link:
https://nips.cc/virtual/2023/poster/71620
4. Boosting Spectral Clustering on Incomplete Data via Kernel Correction and Affinity Learning
Authors:
Fangchen Yu, Runze Zhao, Zhan Shi (SDS undergraduate), Yiwen Lu (SDS undergraduate), Jicong Fan, Yicheng Zeng, Jianfeng Mao, Wenye Li
Abstract:
Spectral clustering has gained popularity for clustering non-convex data due to its simplicity and effectiveness. It is essential to construct a similarity graph using a high-quality affinity measure that models the local neighborhood relations among data samples. However, incomplete data can lead to inaccurate affinity measures, resulting in degraded clustering performance. To address these issues, we propose an imputation-free framework with two novel approaches to improve spectral clustering on incomplete data. Firstly, we introduce a new kernel correction method that enhances the quality of the kernel matrix estimated on incomplete data with a theoretical guarantee, benefiting classical spectral clustering on pre-defined kernels. Secondly, we develop a series of new affinity learning methods that equip the self-expressive framework with the Lp-norm to construct an intrinsic affinity matrix with adaptive extensions. Our methods outperform existing data imputation and distance calibration techniques on benchmark datasets, offering a promising solution to spectral clustering on incomplete data in various real-world applications.
Link:
https://nips.cc/virtual/2023/poster/70019
5. Anytime-Constrained Reinforcement Learning with Policy Prior
Authors:
Jianyi Yang, Pengfei Li, Tongxin Li, Adam Wierman, Shaolei Ren
Abstract:
This paper studies the problem of Anytime-Constrained Markov Decision Process (A-CMDP). Existing works on Constrained Markov Decision Processes (CMDPs) aim to optimize the expected reward while constraining the expected cost over random dynamics, but the cost in a specific episode can still be unsatisfactorily high. In contrast, the goal of A-CMDP is to optimize the expected reward while guaranteeing a bounded cost in each round of any episode against a policy prior. We propose a new algorithm, called Anytime-Constrained Reinforcement Learning (ACRL), which provably guarantees the anytime cost constraints. The regret analysis shows the policy asymptotically matches the optimal reward achievable under anytime constraints. Experiments on the application of carbon-intelligent computing verify the reward performance and cost constraint guarantee of ACRL.
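A toy calculation with made-up numbers (not from the paper) makes the gap between the two constraint types concrete: a policy can satisfy an expected-cost budget on average across episodes while the cumulative cost inside one particular episode still blows past the budget, which is exactly the event an anytime constraint forbids.

```python
def expected_cost(episodes):
    """Average total cost across episodes (what standard CMDPs constrain)."""
    return sum(sum(ep) for ep in episodes) / len(episodes)

def anytime_violations(episodes, budget):
    """Count steps at which the cumulative cost within an episode has
    already exceeded the budget -- the events an anytime constraint rules out."""
    violations = 0
    for ep in episodes:
        cum = 0.0
        for cost in ep:
            cum += cost
            if cum > budget:
                violations += 1
    return violations

# Made-up per-step costs for two episodes, with a cumulative budget of 8.
episodes = [[1.0, 1.0, 1.0], [4.0, 4.0, 4.0]]
budget = 8.0
# Expected cost is 7.5 <= 8, yet episode 2 exceeds the budget at its last
# step (cumulative cost 12 > 8): fine for a CMDP, forbidden for an A-CMDP.
```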
6. Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions
Authors:
Tongxin Li, Yiheng Lin, Shaolei Ren, Adam Wierman
Abstract:
We study the tradeoff between consistency and robustness in the context of a single-trajectory time-varying Markov Decision Process (MDP) with untrusted machine-learned advice. Our work departs from the typical approach of treating advice as coming from black-box sources by instead considering a setting where additional information about how the advice is generated is available. We prove a first-of-its-kind consistency and robustness tradeoff given Q-value advice under a general MDP model that includes both continuous and discrete state/action spaces. Our results highlight that utilizing Q-value advice enables dynamic pursuit of the better of machine-learned advice and a robust baseline, thus resulting in near-optimal performance guarantees, which provably improve on what can be obtained solely with black-box advice.
Link:
https://arxiv.org/abs/2307.10524
7. Disentangling Voice and Content with Self-Supervision for Speaker Recognition
Authors:
Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li
Abstract:
For speaker recognition, it is difficult to extract an accurate speaker representation from speech because of its mixture of speaker traits and content. This paper proposes a disentanglement framework that simultaneously models speaker traits and content variability in speech. It is realized with the use of three Gaussian inference layers, each consisting of a learnable transition model that extracts distinct speech components. Notably, a strengthened transition model is specifically designed to model complex speech dynamics. We also propose a self-supervision method to dynamically disentangle content without the use of labels other than speaker identities. The efficacy of the proposed framework is validated via experiments conducted on the VoxCeleb and SITW datasets, with 9.56% and 8.24% average reductions in EER and minDCF, respectively. Since neither additional model training nor data is specifically needed, it is easily applicable in practical use.
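For readers unfamiliar with the reported metrics: the equal error rate (EER) is the operating point where the false-rejection rate on genuine trials equals the false-acceptance rate on impostor trials. Below is a minimal threshold-sweep estimator of the standard textbook definition (a generic illustration, not the paper's evaluation code):

```python
def eer(genuine, impostor):
    """Equal error rate: sweep thresholds over all observed scores and return
    the operating point where FRR and FAR are closest."""
    best = None
    for t in sorted(set(genuine + impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)    # genuine rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]

# Toy scores: a perfectly separating system has an EER of 0.
genuine = [0.9, 0.8, 0.7, 0.6]
impostor = [0.5, 0.4, 0.3, 0.2]
```

A lower EER means the verification system makes fewer errors at its balanced operating point; minDCF additionally weights the two error types by application-dependent costs.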
8. Discovering Intrinsic Spatial-Temporal Logic Rules to Explain Human Actions.
Authors:
Chengzhi Cao, Chao Yang (SDS PhD student), Ruimao Zhang, Shuang Li
Abstract:
We propose a logic-informed knowledge-driven modeling framework for human movements by analyzing their trajectories. Our approach is inspired by the fact that human actions are usually driven by their intentions or desires, and are influenced by environmental factors such as the spatial relationships with surrounding objects. In this paper, we introduce a set of spatial-temporal logic rules as knowledge to explain human actions. These rules will be automatically discovered from observational data. To learn the model parameters and the rule content, we design an expectation-maximization (EM) algorithm, which treats the rule content as latent variables. The EM algorithm alternates between the E-step and M-step: in the E-step, the posterior distribution over the latent rule content is evaluated; in the M-step, the rule generator and model parameters are jointly optimized by maximizing the current expected log-likelihood. Our model may have a wide range of applications in areas such as sports analytics, robotics, and autonomous cars, where understanding human movements is essential. We demonstrate the model's superior interpretability and prediction performance on pedestrian and NBA basketball player datasets, both achieving promising results.
Link:
https://arxiv.org/pdf/2306.12244
9. ReSync: Riemannian Subgradient-based Robust Rotation Synchronization
Authors:
Huikang Liu, Xiao Li, Anthony Man-Cho So
Abstract:
This work presents ReSync, a Riemannian subgradient-based algorithm for solving the robust rotation synchronization problem, which arises in various engineering applications. ReSync solves a least-unsquared minimization formulation over the rotation group, which is nonsmooth and nonconvex, and aims at recovering the underlying rotations directly. We provide strong theoretical guarantees for ReSync under the random corruption setting. Specifically, we first show that the initialization procedure of ReSync yields a proper initial point that lies in a local region around the ground-truth rotations. We next establish the weak sharpness property of the aforementioned formulation and then utilize this property to derive the local linear convergence of ReSync to the ground-truth rotations. By combining these guarantees, we conclude that ReSync converges linearly to the ground-truth rotations under appropriate conditions. Experiment results demonstrate the effectiveness of ReSync.
Link:
https://arxiv.org/abs/2305.15136
10. An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient.
Authors:
Yudong Luo, Guiliang Liu, Pascal Poupart, Yangchen Pan
Abstract:
Restricting the variance of a policy’s return is a popular choice in risk-averse Reinforcement Learning (RL) due to its clear mathematical definition and easy interpretability. Traditional methods directly restrict the total return variance. Recent methods restrict the per-step reward variance as a proxy. We thoroughly examine the limitations of these variance-based methods, such as sensitivity to numerical scale and hindering of policy learning, and propose to use an alternative risk measure, Gini deviation, as a substitute. We study various properties of this new risk measure and derive a policy gradient algorithm to minimize it. Empirical evaluation in domains where risk-aversion can be clearly defined, shows that our algorithm can mitigate the limitations of variance-based risk measures and achieves high return with low risk in terms of variance and Gini deviation when others fail to learn a reasonable policy.
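One common definition of the Gini deviation is half the expected absolute difference between two independent copies of the return. Unlike variance, which grows quadratically with the numerical scale of returns, it grows only linearly, one of the sensitivity issues the abstract alludes to. A plain-Python estimator over all sample pairs, with toy numbers (the paper's exact estimator may differ):

```python
from itertools import combinations

def variance(xs):
    """Population variance of a sample."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def gini_deviation(xs):
    """Gini deviation: half the mean absolute difference between two
    independent copies, estimated over all distinct sample pairs."""
    pairs = list(combinations(xs, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs) / 2

# Toy returns: mostly low, one rare very high return.
returns = [1.0, 1.0, 1.0, 101.0]
scaled = [10 * x for x in returns]
# Rescaling returns by 10x multiplies variance by 100 but Gini deviation by 10.
```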
Link:
https://arxiv.org/pdf/2307.08873.pdf
11. Multi-Modal Inverse Constrained Reinforcement Learning from a Mixture of Demonstrations
Authors:
Guanren Qiao (SDS PhD student), Guiliang Liu, Pascal Poupart, Zhiqiang Xu
Abstract:
Inverse Constraint Reinforcement Learning (ICRL) aims to recover the underlying constraints respected by expert agents in a data-driven manner. Existing ICRL algorithms typically assume that the demonstration data is generated by a single type of expert. However, in practice, demonstrations often comprise a mixture of trajectories collected from various expert agents respecting different constraints, making it challenging to explain expert behaviors with a unified constraint function. To tackle this issue, we propose a Multi-Modal Inverse Constrained Reinforcement Learning (MMICRL) algorithm for simultaneously estimating multiple constraints corresponding to different types of experts. MMICRL constructs a flow-based density estimator that enables unsupervised expert identification from demonstrations, so as to infer the agent-specific constraints. Following these constraints, MMICRL imitates expert policies with a novel multi-modal constrained policy optimization objective that minimizes the agent-conditioned policy entropy and maximizes the unconditioned one. To enhance robustness, we incorporate this objective into the contrastive learning framework. This approach enables imitation policies to capture the diversity of behaviors among expert agents. Extensive experiments in both discrete and continuous environments show that MMICRL outperforms other baselines in terms of constraint recovery and control performance.
12. PAC-Bayesian Spectrally-Normalized Bounds for Adversarially Robust Generalization
Authors:
Jiancong Xiao, Ruoyu Sun, Zhi-Quan Luo
Abstract:
Deep neural networks (DNNs) are vulnerable to adversarial attacks. It is found empirically that adversarially robust generalization is crucial in establishing defense algorithms against adversarial attacks. Therefore, it is interesting to study the theoretical guarantee of robust generalization. This paper focuses on PAC-Bayes analysis (Neyshabur et al., 2017b). The main challenge lies in extending the key ingredient, which is a weight perturbation bound in standard settings, to the robust settings. Existing attempts heavily rely on additional strong assumptions, leading to loose bounds. In this paper, we address this issue and provide a spectrally-normalized robust generalization bound for DNNs. Our bound is at least as tight as the standard generalization bound, differing only by a factor of the perturbation strength $\epsilon$. In comparison to existing robust generalization bounds, our bound offers two significant advantages: 1) it does not depend on additional assumptions, and 2) it is considerably tighter. We present a framework that enables us to derive more general results. Specifically, we extend the main result to 1) adversarial robustness against general non-$\ell_p$ attacks, and 2) other neural network architectures, such as ResNet.
13. Imitation Learning from Imperfection: Theoretical Justifications and Algorithms
Authors:
Ziniu Li (SDS PhD student), Tian Xu, Zeyu Qin, Yang Yu, Zhi-Quan Luo
Abstract:
Imitation learning (IL) algorithms excel in acquiring high-quality policies from expert data for sequential decision-making tasks. But, their effectiveness is hampered when faced with limited expert data. To tackle this challenge, a novel framework called (offline) IL with supplementary data has emerged, which enhances learning by incorporating an additional yet imperfect dataset obtained inexpensively from sub-optimal policies. Nonetheless, learning becomes challenging due to the potential inclusion of out-of-expert-distribution samples. In this work, we pioneer the mathematical formalization of this framework, uncovering its limitations. Our theoretical analysis reveals that a naive approach—applying the behavioral cloning (BC) algorithm concept to the combined set of expert and supplementary data—may fall short of vanilla BC, which solely relies on expert data. This deficiency arises due to the distribution shift between the two data sources. To address this issue, we propose a new importance-sampling-based technique for selecting data within the expert distribution. We prove that the proposed method theoretically eliminates the gap of the naive approach, highlighting its efficacy when handling imperfect data. Empirical studies demonstrate that our method outperforms previous state-of-the-art methods in tasks including robotics locomotion control, Atari video games, and image classification. Overall, our work underscores the potential of improving IL by leveraging diverse data sources through effective data selection.
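The data-selection idea rests on importance sampling: samples drawn from the supplementary (sub-optimal) distribution q can be reweighted by w = p(x)/q(x) to behave like samples from the expert distribution p. Below is a generic self-normalized importance-sampling sketch on a two-point toy distribution (illustrative only, not the paper's algorithm):

```python
def self_normalized_is(samples, p, q, f):
    """Estimate E_p[f(X)] from samples drawn under q, reweighting each sample
    by w = p(x)/q(x) and normalizing the weights."""
    weights = [p(x) / q(x) for x in samples]
    total = sum(weights)
    return sum(w * f(x) for w, x in zip(weights, samples)) / total

# Discrete toy: expert distribution p and supplementary distribution q over {0, 1}.
p = {0: 0.9, 1: 0.1}.get
q = {0: 0.5, 1: 0.5}.get

# Pretend the supplementary dataset is an even mix of both values.
samples = [0, 1, 0, 1]
est = self_normalized_is(samples, p, q, lambda x: x)
# est recovers E_p[X] = 0.1 despite the samples coming from q.
```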
Link:
https://openreview.net/forum?id=vO04AzsB49
14. Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs
Authors:
Jinyang Li, Binyuan Hui, Ge Qu, Binhua Li, Jiaxi Yang, Bowen Li, Bailin Wang, Bowen Qin, Ruiying Geng, Nan Huo, Xuanhe Zhou, Chenhao Ma, Guoliang Li, Kevin Chang, Fei Huang, Reynold Cheng, Yongbin Li
Abstract:
Text-to-SQL parsing, which aims at converting natural language instructions into executable SQL, has gained increasing attention in recent years. In particular, Codex and ChatGPT have shown impressive results in this task. However, most of the prevalent benchmarks, e.g., Spider and WikiSQL, focus on database schemas with few rows of database content, leaving a gap between academic study and real-world applications. To mitigate this gap, we present Bird, a big benchmark for large-scale database grounded text-to-SQL tasks, containing 12,751 pairs of text-to-SQL data and 95 databases with a total size of 33.4 GB, spanning 37 professional domains. Our emphasis on database values highlights the new challenges of dirty database contents, external knowledge between NL questions and database contents, and SQL efficiency, particularly in the context of massive databases. To solve these problems, text-to-SQL models must feature database value comprehension in addition to semantic parsing. The experimental results demonstrate the significance of database values in generating accurate text-to-SQL for big databases. Furthermore, even the most effective text-to-SQL models, e.g., ChatGPT, achieve only 40.08% in execution accuracy, which is still far from the human result of 92.96%, proving that challenges still stand. Besides, we also provide an efficiency analysis to offer insights into generating text-to-efficient-SQL that is beneficial to industries. We believe that BIRD will contribute to advancing real-world applications of text-to-SQL research.
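Execution accuracy, the metric quoted above, counts a predicted SQL query as correct when executing it returns the same result set as the gold query on the actual database. A minimal single-query version of this check using Python's built-in sqlite3, with a made-up toy table (a simplified sketch; BIRD's official evaluation scripts are more involved):

```python
import sqlite3

def execution_match(db, gold_sql, pred_sql):
    """Single-query execution accuracy: the prediction is correct when its
    result set equals the gold query's result set on the same database."""
    cur = db.cursor()
    gold = set(cur.execute(gold_sql).fetchall())
    pred = set(cur.execute(pred_sql).fetchall())
    return gold == pred

# Toy database (not from BIRD).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE staff (name TEXT, salary INTEGER)")
db.executemany("INSERT INTO staff VALUES (?, ?)",
               [("ann", 50), ("bob", 70), ("eve", 70)])

# Different SQL text can still be execution-correct...
ok = execution_match(db,
                     "SELECT name FROM staff WHERE salary = 70",
                     "SELECT name FROM staff WHERE salary >= 70")
# ...while a superficially similar query can be wrong.
bad = execution_match(db,
                      "SELECT name FROM staff WHERE salary = 70",
                      "SELECT name FROM staff")
```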
Link:
https://arxiv.org/abs/2305.03111
15. Balanced Training for Sparse GANs
Authors:
Yite Wang, Jing Wu, Naira Hovakimyan, Ruoyu Sun
Abstract:
Over the past few years, there has been growing interest in developing larger and deeper neural networks, including deep generative models like generative adversarial networks (GANs). However, GANs typically come with high computational complexity, leading researchers to explore methods for reducing the training and inference costs. One such approach gaining popularity in supervised learning is dynamic sparse training (DST), which maintains good performance while enjoying excellent training efficiency. Despite its potential benefits, applying DST to GANs presents challenges due to the adversarial nature of the training process. In this paper, we propose a novel metric called the balance ratio (BR) to study the balance between the sparse generator and discriminator. We also introduce a new method called balanced dynamic sparse training (ADAPT), which seeks to control the BR during GAN training to achieve a good trade-off between performance and computational cost. Our proposed method shows promising results on multiple datasets, demonstrating its effectiveness.
Link:
https://neurips.cc/virtual/2023/poster/70078
16. Information Design in Multi-Agent Reinforcement Learning
Authors:
Yue Lin, Wenhao Li (SDS postdoc), Hongyuan Zha, Baoxiang Wang
Abstract:
Reinforcement learning (RL) is inspired by how humans and animals interact with the environment. The setting is somewhat idealized because, in actual tasks, other agents in the environment have their own goals and behave adaptively to the ego agent. To thrive in those environments, the agent needs to influence other agents so their actions become more helpful and less harmful. Research in computational economics distills two ways to influence others directly: by providing tangible goods (mechanism design) and by providing information (information design). This work investigates information design problems for a group of RL agents. The main challenges are two-fold. One is that the information provided will immediately affect the transition of the agent trajectories, which introduces additional non-stationarity. The other is that the information can be ignored, so the sender must provide information that the receiver is willing to respect. We formulate the Markov signaling game, and develop the notions of the signaling gradient and the extended obedience constraints that address these challenges. Our algorithm is efficient on various mixed-motive tasks and provides further insights into computational economics.
Link:
https://github.com/YueLin301/InformationDesignMARL
17. Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
Authors:
Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li
Abstract:
In this work, we study low-rank MDPs with adversarially changed losses in the full-information feedback setting. In particular, the unknown transition probability function admits a low-rank matrix decomposition \citep{REPUCB22}, and the loss functions may change adversarially but are revealed to the learner at the end of each episode. We propose a policy optimization-based algorithm POLO, and we prove that it attains the $\widetilde{O}\left(\frac{K^{\frac{3}{4}} A^{\frac{1}{2}} d\ln^{\frac{1}{4}}M}{1-\gamma}+\frac{\sqrt{K}}{(1-\gamma)^2}\right)$ regret guarantee, where $d$ is the rank of the transition kernel (and hence the dimension of the unknown representations), $A$ is the cardinality of the action space, $M$ is the cardinality of the model class, and $\gamma$ is the discount factor. Notably, our algorithm is oracle-efficient and has a regret guarantee with no dependence on the size of the potentially arbitrarily large state space. To the best of our knowledge, we present the first algorithm that interleaves representation learning, exploration, and exploitation to achieve a sublinear regret guarantee for RL with nonlinear function approximation and adversarial losses.
18. Two Heads are Better Than One: A Simple Exploration Framework for Efficient Multi-Agent Reinforcement Learning
Authors:
Jiahui Li, Kun Kuang, Baoxiang Wang, Xingchen Li, Long Chen, Fei Wu, Jun Xiao
Abstract:
Exploration strategy plays an important role in reinforcement learning, especially in sparse-reward tasks. In cooperative multi-agent reinforcement learning (MARL), designing a suitable exploration strategy is much more challenging due to the large state space and the complex interactions among agents. Currently, mainstream exploration methods in MARL either contribute to exploring unfamiliar states, which are large and sparse, or measure the interactions among agents at high computational cost. We found an interesting phenomenon: different kinds of exploration play different roles in different MARL scenarios, and choosing a suitable one is often more effective than designing an exquisite algorithm. In this paper, we propose an exploration method that incorporates curiosity-based and influence-based exploration (COIN), which is simple but effective in various situations. First, COIN measures the influence of each agent on the other agents based on mutual information theory and designs it as intrinsic rewards that are applied to each individual value function. Moreover, COIN computes curiosity-based intrinsic rewards via prediction errors, which are added to the extrinsic reward. To integrate the two kinds of intrinsic rewards, COIN utilizes a novel framework in which they complement each other, leading to sufficient and effective exploration on cooperative MARL tasks. We perform extensive experiments on three challenging benchmarks: StarCraft II, MACO, and Google Football. The results across different scenarios show the superiority of COIN.
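COIN's influence-based reward builds on mutual information between agents' variables. The paper estimates it from learned policies, but the quantity itself can be computed exactly for a small joint distribution table, which may help build intuition (a generic illustration, not the paper's code):

```python
import math

def mutual_information(joint):
    """I(X;Y) in nats from a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]                # marginal of X
    py = [sum(col) for col in zip(*joint)]          # marginal of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi

# Two agents' binary actions: independent vs perfectly coupled.
independent = [[0.25, 0.25],
               [0.25, 0.25]]
coupled = [[0.5, 0.0],
           [0.0, 0.5]]
# Independent actions carry zero mutual information; perfectly coupled
# actions carry log(2) nats -- one agent fully "influences" the other.
```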
19. Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training by Diminishing Bias
作者:
Zhongwei Wan, Che Liu, Mi Zhang, Jie Fu, Benyou Wang, Sibo Cheng, Lei Ma, César Quilodrán-Casas, Rossella Arcucci
简介:
数据稀缺是医学视觉-语言预训练(VLP)有效性的一个关键障碍。一个可能的解决方案是结合来自不同语言社区的数据集。然而,主要的挑战来自于整合多样的语法和语义、特定于语言的医学术语以及特定于文化的隐式知识的复杂性。因此,需要考虑的一个关键方面是由不同语言造成的社区偏差。本文提出了一个名为统一跨语言医学视觉-语言预训练(Med-UniC)的新框架,旨在整合来自英语和西班牙语这两种最普遍语言的多模态医学数据。具体来说,我们提出了CTR(跨语言文本对齐正则化),用于显式统一来自不同语言社区的医学报告的跨语言语义表示。CTR通过潜在语言解缠进行优化,使我们的优化目标不依赖负样本,从而显著减轻了在相似医学报告中判定正负样本对所带来的偏差,并确保跨语言表示不偏向任何特定的语言社区。Med-UniC在5个医学图像任务和10个数据集(涵盖30多种疾病)上取得了卓越的性能,为统一不同语言社区的多模态医学数据提供了一个多功能框架。实验结果突显了跨语言VLP中社区偏差的存在;减少这种偏差不仅提升了视觉-语言任务的性能,也提升了单模态视觉任务的性能。
Abstracts:
The scarcity of data presents a critical obstacle to the efficacy of medical vision-language pre-training (VLP). A potential solution lies in the combination of datasets from various language communities. Nevertheless, the main challenge stems from the complexity of integrating diverse syntax and semantics, language-specific medical terminology, and culture-specific implicit knowledge. Therefore, one crucial aspect to consider is the presence of community bias caused by different languages. This paper presents a novel framework named Unifying Cross-Lingual Medical Vision-Language Pre-Training (Med-UniC), designed to integrate multi-modal medical data from the two most prevalent languages, English and Spanish. Specifically, we propose Cross-lingual Text Alignment Regularization (CTR) to explicitly unify cross-lingual semantic representations of medical reports originating from diverse language communities. CTR is optimized through latent language disentanglement, so that our optimization objective does not depend on negative samples, thereby significantly mitigating the bias arising from determining positive-negative sample pairs within analogous medical reports. Furthermore, it ensures that the cross-lingual representation is not biased toward any specific language community. Med-UniC reaches superior performance across 5 medical image tasks and 10 datasets encompassing over 30 diseases, offering a versatile framework for unifying multi-modal medical data within diverse linguistic communities. The experimental outcomes highlight the presence of community bias in cross-lingual VLP. Reducing this bias enhances the performance not only in vision-language tasks but also in uni-modal visual tasks.
链接:
https://arxiv.org/abs/2305.19894
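CTR中"不依赖负样本"的对齐思想可以用一个极简草图说明:只拉近成对英/西报告的归一化嵌入,无需在高度相似的医学报告中判定正负样本对。以下 numpy 示例中的嵌入维度、扰动规模均为假设,论文实际包含的潜在语言解缠等设计此处未体现:

```python
import numpy as np

def l2norm(x, eps=1e-8):
    # 将嵌入归一化到单位球面,使距离只反映方向差异
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def ctr_alignment_loss(en_emb, es_emb):
    # 无负样本的对齐正则:仅最小化成对英/西报告嵌入的距离
    en, es = l2norm(en_emb), l2norm(es_emb)
    return float(np.mean(np.sum((en - es) ** 2, axis=-1)))

rng = np.random.default_rng(0)
en = rng.normal(size=(4, 8))                   # 4条英文报告的嵌入(假设)
es = en + rng.normal(scale=0.05, size=(4, 8))  # 与之配对的西文报告嵌入(假设)
es_unpaired = rng.normal(size=(4, 8))          # 未配对的随机嵌入

print(ctr_alignment_loss(en, es))           # 配对:损失很小
print(ctr_alignment_loss(en, es_unpaired))  # 未配对:损失明显更大
```

由于目标中没有对比项,不同报告之间无需互为负样本,这正是正文所述"显著减轻判定正负样本对的偏差"的来源。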
20. All In One: A Chinese Multi-Modal Dataset for Multi-Affection Detection in Conversations
作者:
Yazhou Zhang, Yang Yu, Qing Guo, Benyou Wang, Dongming Zhao, Sagar Uprety, Dawei Song, Jing Qin, Qiuchi Li
简介:
人类的交流具有多模态和多情感的特性。不同情感和情绪之间的相互关系使得利用多模态线索共同检测多种人类情感面临挑战。最近在这个领域的进展采用了多任务学习范式,以实现任务之间的相互关系,但是公开资源的稀缺性限制了这方面工作的潜力。为了填补这一空白,我们构建了第一个中文多模态多情感对话(CMMA)数据集,其中包含了3,000个多方对话和来自各种电视剧风格的21,795个多模态话语。CMMA包含了各种各样的情感标签,包括情绪、情感、讽刺和幽默,以及某些任务对之间的新颖相互关系数值。此外,它还提供了对话中的话题和发言者信息,促进了对话背景的更好建模。在这个数据集上,我们经验性地分析了不同数据模态和对话背景对不同情感分析任务的影响,并展示了任务间关联的实际益处。
Abstracts:
Human communication has a multi-modal and multi-affection nature. The inter-relatedness of different emotions and sentiments poses a challenge to jointly detecting multiple human affections with multi-modal clues. Recent advances in this field employed multi-task learning paradigms to render the inter-relatedness across tasks, but the scarcity of publicly available resources limits the potential of such works. To fill this gap, we build the first Chinese Multi-modal Multi-Affection conversation (CMMA) dataset, which contains 3,000 multi-party conversations and 21,795 multi-modal utterances collected from various styles of TV series. CMMA contains a wide variety of affection labels, including sentiment, emotion, sarcasm and humor, as well as novel inter-correlation values between certain pairs of tasks. Moreover, it provides the topic and speaker information in conversations, which promotes better modeling of conversational context. On this dataset, we empirically analyze the influence of different data modalities and conversational contexts on different affection analysis tasks, and exhibit the practical benefit of inter-task correlations.
链接:
https://neurips.cc/virtual/2023/poster/73481
21. DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection
作者:
Zhiyuan Yan (SDS硕士生), Yong Zhang, Xinhang Yuan, Siwei Lyu, Baoyuan Wu
简介:
Deepfake检测领域的一个关键但常被忽视的挑战是:缺乏标准化、统一、全面的基准。这会导致不公平的性能比较和潜在的误导性结果。具体来说,数据处理流程缺乏统一性,导致输入各检测模型的数据不一致;此外,不同方法的实验设置存在明显差异,评估策略和指标也普遍缺乏标准化。为了填补这一空白,我们提出了该领域第一个用于Deepfake检测的综合基准DeepfakeBench,它提供了三个关键贡献:1)统一的数据管理系统,以确保所有检测器的输入一致;2)集成了最新SOTA方法实现的统一训练框架;3)标准化的评估指标和协议,以提高透明度和可重复性。DeepfakeBench具有可扩展、模块化的代码库,包含15种最先进的检测方法、9个deepfake数据集、一系列deepfake检测评估协议和分析工具,以及全面的评估结果。此外,我们基于这些评估从不同角度(例如数据增强、骨干网络)进行了广泛分析,并提供了新的见解。我们希望我们的努力能够促进未来的研究,并推动这一日益重要领域的创新。
Abstracts:
A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark. This issue leads to unfair performance comparisons and potentially misleading results. Specifically, there is a lack of uniformity in data processing pipelines, resulting in inconsistent data inputs for detection models. Additionally, there are noticeable differences in experimental settings, and evaluation strategies and metrics lack standardization. To fill this gap, we present the first comprehensive benchmark for deepfake detection, called DeepfakeBench, which offers three key contributions: 1) a unified data management system to ensure consistent input across all detectors, 2) an integrated framework for the implementation of state-of-the-art methods, and 3) standardized evaluation metrics and protocols to promote transparency and reproducibility. Featuring an extensible, modular codebase, DeepfakeBench contains 15 state-of-the-art detection methods, 9 deepfake datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations. Moreover, we provide new insights based on extensive analysis of these evaluations from various perspectives (e.g., data augmentations, backbones). We hope that our efforts could facilitate future research and foster innovation in this increasingly critical domain. All codes, evaluations, and analyses of our benchmark are publicly available.
链接:
https://arxiv.org/abs/2307.01426
22. Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples
作者:
Shaokui Wei (SDS博士生), Mingda Zhang (SDS博士生), Hongyuan Zha, Baoyuan Wu
简介:
后门攻击是对机器学习模型的重大安全威胁:攻击者可以将带有触发器的中毒样本注入训练集,从而得到一个后门模型,该模型会将带有特定触发器的样本预测为特定的目标类别,而在良性样本上表现正常。在这篇论文中,我们研究了使用少量干净数据净化后门模型的任务。通过建立后门风险和对抗风险之间的联系,我们推导出了一个新颖的后门风险上界,它主要刻画了后门模型与净化模型之间共享对抗样本(SAEs)的风险。这个上界进一步引出了一个新颖的双层优化问题,用于利用对抗训练技术减轻后门的影响。为了求解该问题,我们提出了共享对抗反学习(SAU)。具体而言,SAU首先生成SAEs,然后对生成的SAEs进行反学习,使它们要么被净化模型正确分类,要么被两个模型以不同方式分类,从而在净化模型中减轻后门的影响。在各种基准数据集和网络架构上的实验表明,我们提出的方法在后门防御方面达到了最先进的性能。
Abstracts:
Backdoor attacks are serious security threats to machine learning models, where an adversary can inject poisoned samples into the training set, yielding a backdoored model that predicts poisoned samples with particular triggers as particular target classes while behaving normally on benign samples. In this paper, we explore the task of purifying a backdoored model using a small clean dataset. By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk, which mainly captures the risk on the shared adversarial examples (SAEs) between the backdoored model and the purified model. This upper bound further suggests a novel bi-level optimization problem for mitigating the backdoor using adversarial training techniques. To solve it, we propose Shared Adversarial Unlearning (SAU). Specifically, SAU first generates SAEs, and then unlearns the generated SAEs so that they are either correctly classified by the purified model or classified differently by the two models, such that the backdoor effect in the backdoored model is mitigated in the purified model. Experiments on various benchmark datasets and network architectures show that our proposed method achieves state-of-the-art performance for backdoor defense.
链接:
https://arxiv.org/pdf/2307.10562
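"生成共享对抗样本,再让净化模型反学习它"这一两步流程,可以在一个玩具线性分类器上直观演示。以下草图中的模型参数、FGSM式扰动与一步更新均为示意性假设,并非论文中实际的对抗样本生成与双层优化方法:

```python
import numpy as np

def predict(w, x):
    # 线性打分,符号即类别(+1 / -1)
    return float(np.dot(w, x))

def fgsm_perturb(w, x, y, eps):
    # 用FGSM式的一步扰动代替论文中的对抗样本生成(仅作示意)
    return x - eps * y * np.sign(w)

w_backdoored = np.array([1.0, -0.5, 0.3])   # 假设的后门模型参数
w_purified = w_backdoored.copy()            # 净化从后门模型出发
x, y = np.array([0.2, 0.1, 0.4]), 1.0       # 一个干净样本及其标签

# 1) 生成共享对抗样本(SAE):同时欺骗后门模型与净化模型的扰动样本
x_adv = fgsm_perturb(w_backdoored, x, y, eps=0.5)
assert predict(w_backdoored, x_adv) * y < 0
assert predict(w_purified, x_adv) * y < 0

# 2) "反学习"该SAE:只更新净化模型,使其重新正确分类 x_adv
lr = 2.0
w_purified = w_purified + lr * y * x_adv
print(predict(w_purified, x_adv) * y > 0)   # True:净化模型不再被该SAE欺骗
```

直觉上,能同时欺骗两个模型的扰动往往携带后门相关的方向;让净化模型在这些样本上"改判",即是在削弱后门效应。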
23. Neural Polarizer: A Lightweight and Effective Backdoor Defense via Purifying Poisoned Features
作者:
Mingli Zhu (SDS博士生), Shaokui Wei (SDS博士生), Hongyuan Zha, Baoyuan Wu
简介:
最近的研究证明了深度神经网络对后门攻击的敏感性。给定一个后门模型,尽管触发器信息和良性信息共存,其对带有触发器的中毒样本的预测仍将由触发器信息主导。光学偏振器可以让特定偏振方向的光波通过,同时滤除其他偏振方向的光波。受这一机制的启发,我们提出了一种新颖的后门防御方法:在后门模型中插入一个可学习的"神经偏振器"作为中间层,通过过滤触发器信息、同时保留良性信息来净化中毒样本。神经偏振器被实例化为一个轻量级的线性变换层,通过在少量干净数据上求解一个精心设计的双层优化问题来学习。与其他通常需要调整后门模型全部参数的基于微调的防御方法相比,所提方法只需额外学习一层,因此效率更高,所需的干净数据也更少。大量实验证明了我们的方法在各种神经网络架构和数据集上消除后门的有效性和效率,尤其是在干净数据非常有限的情况下。
Abstracts:
Recent studies have demonstrated the susceptibility of deep neural networks to backdoor attacks. Given a backdoored model, its prediction on a poisoned sample with a trigger will be dominated by the trigger information, even though trigger information and benign information coexist. Inspired by the mechanism of the optical polarizer, which passes light waves with particular polarizations while filtering out light waves with other polarizations, we propose a novel backdoor defense method that inserts a learnable neural polarizer into the backdoored model as an intermediate layer, purifying the poisoned sample by filtering trigger information while maintaining benign information. The neural polarizer is instantiated as one lightweight linear transformation layer, which is learned by solving a well-designed bi-level optimization problem based on a limited clean dataset. Compared to other fine-tuning-based defense methods, which often adjust all parameters of the backdoored model, the proposed method only needs to learn one additional layer, so it is more efficient and requires less clean data. Extensive experiments demonstrate the effectiveness and efficiency of our method in removing backdoors across various neural network architectures and datasets, especially in the case of very limited clean data.
链接:
https://arxiv.org/pdf/2306.16697.pdf
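"在冻结的后门网络中插入一层线性变换来过滤触发器特征"这一结构思想,可以用下面的 numpy 草图演示。其中的随机权重 W1、W2 以及"触发器方向" t 都是为演示而假设的;论文中的偏振器通过双层优化在少量干净数据上学习,而非像这里直接构造投影:

```python
import numpy as np

rng = np.random.default_rng(0)

# 冻结的"后门"网络:特征提取 W1 与分类头 W2 均保持不变(假设的随机权重)
W1 = rng.normal(size=(8, 8))
W2 = rng.normal(size=(3, 8))

def forward(x, P):
    h = np.maximum(W1 @ x, 0)   # 冻结的特征层(ReLU)
    h = P @ h                   # 神经偏振器:插入的一层线性变换
    return W2 @ h               # 冻结的分类头

x_clean = rng.normal(size=8)
P_identity = np.eye(8)          # 初始为恒等变换:尚未做任何过滤
logits_before = forward(x_clean, P_identity)

# 仅演示结构效果:把一个假设的"触发器方向" t 的分量投影掉
t = rng.normal(size=8)
t /= np.linalg.norm(t)
P_trained = np.eye(8) - np.outer(t, t)       # 滤除 t 方向分量的线性层

h = np.maximum(W1 @ x_clean, 0)
assert abs(np.dot(t, P_trained @ h)) < 1e-9  # 触发器方向已被过滤
print(forward(x_clean, P_trained).shape)     # (3,)
```

由于只训练这一层 P,可学习参数量远小于微调整个网络,这正是正文所述"更高效、所需干净数据更少"的结构原因。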
24. AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models
作者:
Yuancheng Wang (SDS博士生), Zeqian Ju, Xu Tan, Lei He, Zhizheng Wu, Jiang Bian, Sheng Zhao
简介:
音频编辑适用于多种目的,例如添加背景音效、替换乐器伴奏或修复损坏的音频。最近,一些基于扩散模型的方法通过使用以输出音频的文本描述为条件的扩散和去噪过程实现了零样本音频编辑。然而,这些方法仍然存在一些问题:1)它们没有经过编辑任务的训练,无法保证良好的编辑效果;2)它们可能会错误地修改不需要编辑的音频片段;3)它们需要输出音频的完整描述,而这在实际场景中并不总是可用或必需的。在这项工作中,我们提出了AUDIT,一种基于潜在扩散模型的指令引导音频编辑模型。具体来说,AUDIT具有三个主要设计特点:1)我们为不同的音频编辑任务构建三元组训练数据(指令、输入音频、输出音频),并以指令和输入(待编辑)音频为条件训练扩散模型生成输出(编辑后)音频;2)它通过比较输入和输出音频的差异,自动学习只修改需要编辑的片段;3)它只需要编辑指令,而不需要完整的目标音频描述作为文本输入。AUDIT在多个音频编辑任务(例如添加、删除、替换、修复、超分辨率)的客观和主观指标上均取得了最先进的结果。
Abstracts:
Audio editing is applicable for various purposes, such as adding background sound effects, replacing a musical instrument, and repairing damaged audio. Recently, some diffusion-based methods have achieved zero-shot audio editing by using a diffusion and denoising process conditioned on the text description of the output audio. However, these methods still have some problems: 1) they have not been trained on editing tasks and cannot ensure good editing effects; 2) they can erroneously modify audio segments that do not require editing; 3) they need a complete description of the output audio, which is not always available or necessary in practical scenarios. In this work, we propose AUDIT, an instruction-guided audio editing model based on latent diffusion models. Specifically, AUDIT has three main design features: 1) we construct triplet training data (instruction, input audio, output audio) for different audio editing tasks and train a diffusion model using the instruction and the input (to-be-edited) audio as conditions to generate the output (edited) audio; 2) it can automatically learn to modify only the segments that need to be edited by comparing the difference between the input and output audio; 3) it only needs edit instructions instead of full target audio descriptions as text input. AUDIT achieves state-of-the-art results in both objective and subjective metrics for several audio editing tasks (e.g., adding, dropping, replacement, inpainting, super-resolution).
链接:
https://arxiv.org/abs/2304.00830
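AUDIT设计特点中的第1点,即"(指令, 输入音频, 输出音频)"三元组的构造,可以用一个极简示例说明。以下草图以"添加"类任务为例:其中的指令模板、采样率、混合增益和随机信号都是为演示而假设的,扩散模型本身并未出现:

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 16000  # 假设采样率 16 kHz

def make_add_triplet(src, effect, effect_name, gain=0.3):
    # 为"添加"类编辑任务构造一条(指令, 输入音频, 输出音频)三元组:
    # 模型以指令 + 输入音频为条件,学习生成输出音频
    instruction = f"Add {effect_name} in the background"
    out = src + gain * effect        # 把音效按假设的增益混入源音频
    return instruction, src, out

speech = rng.normal(scale=0.1, size=sr)   # 1秒的占位"语音"信号
rain = rng.normal(scale=0.05, size=sr)    # 占位"雨声"音效
instr, x_in, x_out = make_add_triplet(speech, rain, "rain")
print(instr)   # Add rain in the background
```

由于输入与输出音频只在被编辑的成分上存在差异,在这类成对数据上训练的模型可以学会只修改指令要求的部分。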
25. Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset
作者:
Jing Lin, Ailing Zeng, Shunlin Lu (SDS 博士生), Yuanhao Cai, Ruimao Zhang, Haoqian Wang, Lei Zhang
简介:
在本文中,我们提出了Motion-X,这是一个大规模、具有丰富表现力的3D全身人体运动数据集。现有的运动数据集大多只包含身体姿态,缺少面部表情、手势和细粒度的姿态描述;而且,它们主要是在有限的实验室场景中采集、由人工标注文本描述,这大大限制了其可扩展性。为了克服这些限制,我们开发了一个全身运动与文本标注流程,它可以自动标注来自单视角或多视角视频的运动,并为每个视频提供全面的语义标签、为每一帧提供细粒度的全身姿态描述。该流程精度高、成本低,并且可扩展,适用于进一步的研究。基于此,我们构建了Motion-X,它包含1370万个精确的3D全身姿态标注(即SMPL-X),涵盖来自大量场景的96K个运动序列。此外,Motion-X还提供1370万条帧级全身姿态描述和96K条序列级语义标签。全面的实验验证了标注流程的准确性,以及Motion-X在增强富有表现力、多样且自然的运动生成以及3D全身人体网格恢复方面的显著优势。
Abstracts:
In this paper, we present Motion-X, a large-scale 3D expressive whole-body motion dataset. Existing motion datasets predominantly contain body-only poses, lacking facial expressions, hand gestures, and fine-grained pose descriptions. Moreover, they are primarily collected from limited laboratory scenes with textual descriptions manually labeled, which greatly limits their scalability. To overcome these limitations, we develop a whole-body motion and text annotation pipeline, which can automatically annotate motion from either single- or multi-view videos and provide comprehensive semantic labels for each video and fine-grained whole-body pose descriptions for each frame. This pipeline is of high precision, cost-effective, and scalable for further research. Based on it, we construct Motion-X, which comprises 13.7M precise 3D whole-body pose annotations (i.e., SMPL-X) covering 96K motion sequences from massive scenes. Besides, Motion-X provides 13.7M frame-level whole-body pose descriptions and 96K sequence-level semantic labels. Comprehensive experiments demonstrate the accuracy of the annotation pipeline and the significant benefit of Motion-X in enhancing expressive, diverse, and natural motion generation, as well as 3D whole-body human mesh recovery.
链接:
https://arxiv.org/pdf/2307.00818.pdf
26. A Batch-to-Online Transformation under Random-Order Model
作者:
Jing Dong (SDS博士生), Yuichi Yoshida
简介:
我们介绍了一个转换框架,可以将离线近似算法转化为在随机顺序模型下具有低近似遗憾(approximate regret)的在线算法。我们首先给出一个通用的归约定理,将具有低平均灵敏度的离线近似算法转换为具有低近似遗憾的在线算法。然后,我们证明可以使用核心集(coreset)构造方法将离线近似算法转换为低灵敏度版本。为了展示我们方法的多功能性,我们将其应用于多种问题,包括在线聚类、在线矩阵近似和在线回归,并在每个问题上成功实现了多对数级的近似遗憾。此外,我们表明,在所有三种情况下,我们的算法也具有较低的不一致性,这在某些在线应用中可能是需要的。
Abstracts:
We introduce a transformation framework that can be utilized to develop online algorithms with low approximate regret in the random-order model from offline approximation algorithms. We first give a general reduction theorem that transforms an offline approximation algorithm with low average sensitivity to an online algorithm with low approximate regret. We then demonstrate that offline approximation algorithms can be transformed into a low-sensitivity version using a coreset construction method. To showcase the versatility of our approach, we apply it to various problems, including online clustering, online matrix approximation, and online regression, and successfully achieve polylogarithmic approximate regret for each problem. Moreover, we show that in all three cases, our algorithm also enjoys low inconsistency, which may be desired in some online applications.
链接:
https://openreview.net/forum?id=B6HSIgvyJ3
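"批到在线"转换的基本形态可以在一个玩具实例上演示:第 t 步到来之前,直接采用在前 t-1 个样本上运行离线算法得到的解。以下草图以 1-means(离线最优中心即均值)充当离线近似算法;论文中依赖的平均灵敏度分析与核心集构造此处均未体现,数据与规模亦为假设:

```python
import numpy as np

def offline_one_means(points):
    # 离线"近似"算法的玩具实例:1-means 的最优中心就是均值
    return np.mean(points, axis=0)

def batch_to_online(stream):
    # 通用的批到在线包装:在第 t 个样本到来之前,
    # 采用在前 t-1 个样本上运行离线算法得到的解并据此承担损失
    losses = []
    for t in range(len(stream)):
        center = offline_one_means(stream[:t]) if t > 0 else np.zeros_like(stream[0])
        losses.append(float(np.sum((stream[t] - center) ** 2)))
    return losses

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, size=(200, 2))
rng.shuffle(data)                          # 随机顺序模型:样本以随机次序到达

online_loss = sum(batch_to_online(data))
best_fixed = sum(float(np.sum((x - data.mean(axis=0)) ** 2)) for x in data)
print(online_loss >= best_fixed)           # True:两者之差即(近似)遗憾
```

在随机顺序模型下,前缀解对单个新样本的变动不敏感(低平均灵敏度),因此这种逐步"重跑离线算法"的策略能将遗憾控制在较低水平。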