15-17 Nov 2022 Montpellier (France)

Keynotes overview

Date: Tuesday, November 15, 2022 14:00

Title: Optimal Transport and its Applications for Single Cell Genomics.

Abstract: Optimal transport (OT) has gained a lot of interest in machine learning. It is a natural tool to compare probability distributions in a geometrically faithful way. It has recently proven useful in a variety of applications to single cell genomics, in particular to infer cell developmental trajectories and to perform multi-omics integration. Vanilla OT is, however, plagued by several issues that are routinely encountered in these genomics applications. These include in particular: (i) the curse of dimensionality, since it might require a number of samples which grows exponentially with the dimension, (ii) sensitivity to outliers, since it prevents mass creation and destruction during the transport, and (iii) the impossibility of transporting between two disjoint spaces. In this talk, I will review several recent proposals to address these issues and showcase how they work hand-in-hand to provide a comprehensive machine learning pipeline for genomics applications. The three key ingredients are: (i) entropic regularization, which defines computationally efficient loss functions in high dimensions, (ii) unbalanced OT, which relaxes the mass conservation to make OT robust to missing data and outliers, and (iii) the Gromov-Wasserstein formulation, introduced by Sturm and Memoli, which is a non-convex quadratic optimization problem defining transport between disjoint spaces (for instance genes and peaks spaces for single cell data integration). More information and references can be found on the website of our book "Computational Optimal Transport": https://optimaltransport.github.io/
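
To make the first ingredient concrete, here is a minimal sketch of entropic regularization solved with Sinkhorn iterations. It is not taken from the talk: the function name `sinkhorn`, the toy histograms, and the choice `eps=1.0` are illustrative assumptions, and the code uses plain Python lists rather than an optimized library.

```python
import math

def sinkhorn(a, b, cost, eps=1.0, n_iters=500):
    """Return an entropic-regularized transport plan between histograms a and b."""
    n, m = len(a), len(b)
    # Gibbs kernel K[i][j] = exp(-cost[i][j] / eps).
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy data (hypothetical): two histograms on the points 0, 1, 2,
# compared with a squared-distance ground cost.
points = [0.0, 1.0, 2.0]
a = [0.6, 0.3, 0.1]
b = [0.1, 0.3, 0.6]
cost = [[(x - y) ** 2 for y in points] for x in points]

plan = sinkhorn(a, b, cost)
transport_cost = sum(plan[i][j] * cost[i][j] for i in range(3) for j in range(3))
print(f"transport cost under the regularized plan: {transport_cost:.4f}")
```

Smaller values of `eps` bring the plan closer to the unregularized OT solution but slow the iterations down. Unbalanced OT and Gromov-Wasserstein modify the constraints and the objective respectively, and are not covered by this sketch.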

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Bertrand Thirion

Date: Tuesday, November 15, 2022 15:30

Title: Enhancing our understanding of brain function with machine learning on large-scale data repositories.

Abstract: To map the neural substrate of mental function, cognitive neuroimaging relies on controlled psychological manipulations that engage brain systems associated with specific cognitive processes. In order to build comprehensive atlases of cognitive function in the brain, it must assemble maps for many different cognitive processes, which often evoke overlapping patterns of activation. Such data aggregation faces contrasting goals: on the one hand finding correspondences across vastly different cognitive experiments, while on the other hand precisely describing the function of any given brain region. In this talk I will present two analysis frameworks that tackle these difficulties and thereby enable the generation of brain atlases for cognitive function. The first one uses deep-learning techniques to extract representations—task-optimized networks—that form a set of basis cognitive dimensions relevant to the psychological manipulations. This approach does not assume any prior knowledge of the commonalities shared by the studies in the corpus; those are inferred during model training. The second one leverages ontologies of cognitive concepts and multi-label brain decoding to map the neural substrate of these concepts. Crucially, it can accurately decode the cognitive concepts recruited in new tasks. These results demonstrate that aggregating independent task-fMRI studies can provide a more precise global atlas of selective associations between brain and cognition.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Stephen Becker

Date: Tuesday, November 15, 2022 16:30

Title: Introduction to compressed sensing and matrix completion.

Abstract: We give a brief introduction to compressed sensing, which is a theory developed in 2004 that shows how, in an appropriate setting, one can take far fewer measurements of a signal than the classical limit. It does this by exploiting compressibility in a particular way. An extension of compressed sensing is matrix completion, which fills in missing entries in a matrix by exploiting another type of compressibility. Applications of both theories were initially mostly for signal processing, but the applications are increasingly turning toward data processing and general machine learning. If time permits, we will review some of our group's and collaborators' work on using extensions of these tools on imaging problems such as cloud removal from satellite images, super-resolution fluorescence microscopy, fMRI of brain data, MRI, ultrasound, photo-acoustic tomography, and MEG. A common theme among these applications is the use of numerical optimization.
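
As a rough illustration of how numerical optimization enters compressed sensing, the sketch below runs the iterative soft-thresholding algorithm (ISTA) on an l1-regularized least-squares problem with fewer measurements than unknowns. This is one common textbook formulation, not necessarily the one used in the talk. The problem sizes, the random measurement matrix, the sparse signal `x_true`, and the value of `lam` are all illustrative assumptions.

```python
import math
import random

def soft_threshold(value, threshold):
    """Proximal operator of the l1 norm: shrink value toward zero."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)

def lasso_objective(A, y, x, lam):
    """Evaluate 0.5 * ||A x - y||^2 + lam * ||x||_1."""
    residual = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - y_i
                for row, y_i in zip(A, y)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(x_j) for x_j in x)

def ista(A, y, lam=0.01, n_iters=500):
    m, n = len(A), len(A[0])
    # Conservative step size: the squared Frobenius norm of A upper-bounds
    # the Lipschitz constant of the gradient of the least-squares term.
    step = 1.0 / sum(a_ij * a_ij for row in A for a_ij in row)
    x = [0.0] * n
    for _ in range(n_iters):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step on the data term, then shrinkage for the l1 term.
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x

# Hypothetical setup: 20 random measurements of a 40-dimensional signal
# that has only 3 nonzero entries.
rng = random.Random(0)
m, n = 20, 40
A = [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(n)] for _ in range(m)]
x_true = [0.0] * n
x_true[3], x_true[17], x_true[31] = 1.0, -2.0, 1.5
y = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(m)]

x_hat = ista(A, y)
print("objective at zero:    ", lasso_objective(A, y, [0.0] * n, 0.01))
print("objective at estimate:", lasso_objective(A, y, x_hat, 0.01))
```

The step size is deliberately conservative so that each iteration is guaranteed not to increase the objective. Practical solvers use tighter step sizes or accelerated variants to converge faster.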

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Flora Jay

Date: Wednesday, November 16, 2022 9:00

Title: Digging Historical Diversity Patterns out of Large-Scale Genomic Data using Exchangeable and Generative Neural Networks.

Abstract: In the era of next generation sequencing, population genetic datasets keep increasing in size and it is now common to observe millions of genomic markers sequenced for hundreds or thousands of individuals. Extracting relevant information from these genomic datasets can be complex due to their size and the complexity of the evolutionary process to be inferred, and is sometimes impossible due to privacy rules that govern several human genome databases. Deep learning approaches have thus been introduced at different levels of population genetics for parameter inference, data visualization or generation. I will present some advances we made in the design and application of (i) permutation invariant networks for evolutionary inference and (ii) generative neural networks (GAN, RBM, VAE) that create surrogates of real genomes that could become valuable assets in future genetic studies by limiting privacy issues associated with genome donors.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Julien Chiquet

Date: Wednesday, November 16, 2022 10:30

Title: The Poisson-Lognormal Model as a Versatile Framework for the Joint Analysis of Species Abundances

Abstract: Joint Species Distribution Models (JSDM) provide a general multivariate framework to study the joint abundances of all species from a community. JSDM account for both structuring factors (environmental characteristics or gradients, such as habitat type or nutrient availability) and potential interactions between the species (competition, mutualism, parasitism, etc.), which is instrumental in disentangling meaningful ecological interactions from mere statistical associations. Modeling the dependency between the species is challenging because of the count-valued nature of abundance data, and most JSDM rely on a Gaussian latent layer to encode the dependencies between species in a covariance matrix. The multivariate Poisson-lognormal (PLN) model is one such model, which can be viewed as a multivariate mixed Poisson regression model. Inferring such models raises both statistical and computational issues, many of which were solved in recent contributions using variational techniques and convex optimization tools. The PLN model turns out to be a versatile framework, within which a variety of analyses can be performed, including multivariate sample comparison, clustering of sites or samples, dimension reduction (ordination) for visualization purposes, or inferring interaction networks. This paper presents the general PLN framework and illustrates its use on a series of typical experimental datasets. All the models and methods are implemented in the R package PLNmodels, available from cran.r-project.org.
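
The generative side of the PLN model can be sketched in a few lines: a Gaussian latent layer carries the dependency between species, and counts are Poisson conditionally on it. The simulation below covers two species only and performs no inference. It does not use the PLNmodels package, and the function names and parameter values (`mu`, `sd`, `rho`) are illustrative assumptions.

```python
import math
import random

def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1

def sample_pln(n_sites, mu, sd, rho, seed=0):
    """Draw counts for two species from a bivariate Poisson-lognormal model."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sites):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Correlated Gaussian latent layer (2x2 Cholesky factor written out).
        z1 = mu[0] + sd[0] * e1
        z2 = mu[1] + sd[1] * (rho * e1 + math.sqrt(1.0 - rho ** 2) * e2)
        # Conditionally on the latent layer, the counts are independent Poisson.
        counts.append((sample_poisson(math.exp(z1), rng),
                       sample_poisson(math.exp(z2), rng)))
    return counts

counts = sample_pln(n_sites=2000, mu=(1.0, 1.0), sd=(0.5, 0.5), rho=0.8)
first = [c[0] for c in counts]
mean = sum(first) / len(first)
variance = sum((c - mean) ** 2 for c in first) / (len(first) - 1)
print(f"species 1: mean={mean:.2f}, variance={variance:.2f}")
```

The latent layer makes the counts overdispersed relative to a plain Poisson model (variance larger than the mean), and a positive `rho` induces positive correlation between the two species' abundances.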

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Michael Blum

Date: Wednesday, November 16, 2022 11:30

Title: Machine Learning applications in personalized medicine.

Abstract: Personalized medicine offers the opportunity to deliver better treatments to patients accounting for personalized information. Genomic technologies are at the forefront of personalized medicine and allow for better diagnosis and treatment prescriptions. I will review some applications of machine learning in personalized medicine and genomics with a focus on the development of biomarkers in oncology, which guide prescriptions of emerging drugs.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Emmanuel Faure

Date: Wednesday, November 16, 2022 14:00

Title: MorphoDeep: a toolbox to curate fluorescence microscopy images

Abstract: Fluorescence microscopy is an essential tool in the life sciences for investigating the spatiotemporal dynamics of cells, tissues, and developing organisms. But it carries some inherent flaws: for example, the photon budget strongly constrains the balance between image quality and acquisition duration. Deep learning methods have considerably improved image analysis and postprocessing over the past few years. We will present a deep learning based toolbox to assist microscopists and bioimage analysts with image acquisition, postprocessing and generation. MorphoDeep contains a broad range of methods such as artificial addition of acquisition channels, segmentation curation, image isotropization, artificial numerical zoom as well as image or body inference. Thanks to its signal standardization module, MorphoDeep aims to cover various microscopy techniques as well as various species. These methods will all be available for two-dimensional and three-dimensional microscopy images. Finally, MorphoDeep is an open framework whose methods can be fully pipelined and are integrated into the bioimage analysis community tools.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Nathalie Vialaneix

Date: Wednesday, November 16, 2022 15:30

Title: Multi-omics data integration methods: kernel and other machine learning approaches.

Abstract: The substantial development of high-throughput biotechnologies has rendered large-scale multi-omics datasets increasingly available. New challenges have emerged to process and integrate this large volume of information, often obtained from widely heterogeneous sources. In this presentation, I will make a brief review of popular data integration methods and then focus on kernel methods and why they are usually well suited to this task.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Yun S. Song

Date: Thursday, November 17, 2022 9:00

Title: Improving Variant Effect Predictions using Language Models and Cross-Protein Transfer Learning.

Abstract: Predicting the effects of mutations is a major challenge in genomics with important applications in disease diagnosis, protein design, and understanding gene regulation. In this talk, I will describe my lab's work on improving variant effect predictions, for both coding and non-coding regions, by leveraging recent advances in unsupervised learning, especially self-supervised learning in natural language processing. For coding variants, I will also present an approach to transfer models between unrelated proteins and demonstrate how it is able to achieve state-of-the-art performance on clinical disease variant prediction.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Charles-Henri Lecellier

Date: Thursday, November 17, 2022 10:30

Title: Machine learning to probe novel regulatory elements in the human genome.

Abstract: Finding how regulatory DNA sequence operates to control genome expression (i.e., characterizing a DNA cis-regulatory code) is key to clinically interpreting genetic variations and fostering genomic medicine. Bioinformatics and machine learning approaches are instrumental in this task. While numerous approaches exist, most of them focus on single nucleotides and motifs (typically Transcription Factor (TF) binding sites) and do not take into account the fact that the nucleotide distribution along the genome is not uniform, and forms large and relatively homogeneous regions with low complexity, i.e., regions with biased composition containing simple sequence repeats (called Low Complexity Regions, LCRs). After providing examples supporting the central roles of LCRs in genomic regulation, I will present statistical and computational methods we develop in our interdisciplinary team to characterize and study the implication of different types of LCRs in two major biological processes: RNA transcription and TF binding. These questions are typical machine learning problems, where the goal is to predict RNA levels (regression) or TF binding (classification) based solely on genomic sequence features (predictive variables). An important aspect of our work is that we aim at developing fully interpretable models, including in the case of deep learning, able to generate novel biological knowledge and to be experimentally validated.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Diego Marcos

Date: Thursday, November 17, 2022 11:30

Title: Evaluating plant trait descriptions crawled from the Web.

Abstract: Although computer vision methods for plant species identification are improving at a fast pace, we cannot be sure if they are doing so in a way that is compatible with current botanical knowledge, resulting in unreliable predictions for data-poor species. In order to make sure that computer vision methods follow a process that more closely resembles what botanists do, we first need to enable them to identify the relevant morphological traits on the image. This requires the construction of a large and comprehensive dataset of morphological traits to train such a system. We propose to crawl textual descriptions from the Web in order to obtain plant species traits in a scalable manner.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Sophie Donnet

Date: Thursday, November 17, 2022 14:00

Title: Modeling collections of networks by stochastic block models. Application in ecology and sociology.

Abstract: Networks, which represent a set of interactions within a system, are widely used in molecular biology, ecology, sociology, and other fields. Stochastic block models, which are based on a classification of nodes according to their role in the network, make it possible to learn the macroscopic structure of networks and thus to obtain a summarized image of them. In recent years, it has become necessary to study not one but several networks together. After introducing the stochastic block model, I will present recent work on modeling collections of networks and illustrate my point with examples from ecology.
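
For readers unfamiliar with the stochastic block model, the sketch below samples a single undirected network from it: each node belongs to a block, and the probability of an edge depends only on the blocks of its two endpoints. It illustrates the basic model only, not the extension to collections of networks presented in the talk. The function name `sample_sbm`, the block sizes, and the connection probabilities are illustrative assumptions.

```python
import random

def sample_sbm(block_sizes, connect_prob, seed=0):
    """Sample node labels and a symmetric 0/1 adjacency matrix from an SBM."""
    rng = random.Random(seed)
    # Node labels: block k contributes block_sizes[k] consecutive nodes.
    labels = [k for k, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The edge probability depends only on the blocks of the endpoints.
            if rng.random() < connect_prob[labels[i]][labels[j]]:
                adj[i][j] = adj[j][i] = 1
    return labels, adj

# Hypothetical example: two communities, dense within and sparse between.
labels, adj = sample_sbm([10, 15], [[0.8, 0.05], [0.05, 0.6]])
n_edges = sum(sum(row) for row in adj) // 2
print(f"{len(labels)} nodes, {n_edges} edges")
```

Inference goes in the opposite direction: given only the adjacency matrix, the block labels and the connection probabilities are estimated, which yields the summarized view of the network mentioned above.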

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Tim Landgraf

Date: Thursday, November 17, 2022 15:30

Title: BeesBook: Machine learning for the analysis of honeybee social behavior

Abstract: Honeybees are a popular model for collective behavior and decision making. Many thousands of individuals live in a crowded hive, each tasked with different duties, from brood care to the collection of food. Bees employ a variety of communication systems, from simple semiochemicals (e.g. the alarm pheromone) to the abstract dance “language” in which polar coordinates of nearby food sources are indicated. We have developed a pipeline of algorithms to detect, identify and track each individual in small colonies (up to 4000 animals), and to detect communication behaviors to eventually reflect the entire colony life in a time series of multi-modal interaction tensors. In my talk, I will first give a deeper look into the methods we developed and the peculiarities of the dataset, and will then present a novel matrix factorization method to extract rich descriptors for each animal that predict her current task and even her remaining life span. We jointly learn the average developmental path and structured variations of individuals in the social network over their entire lives. Our method yields inherently interpretable embeddings that are biologically relevant and consistent over time, allowing us to compare individuals' functional roles regardless of when or in which colony they lived. While current technological achievements have produced impressive datasets of unprecedented size, only little biological insight has been generated so far. Our methodological advances fill that gap as we provide a novel quantitative framework for understanding behavioral heterogeneity in complex social systems and may finally justify the considerable effort required to mark hundreds of bees, record terabytes of video and run analyses that may take weeks to complete.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Daniele Silvestro

Date: Thursday, November 17, 2022 16:30

Title: Using artificial intelligence to predict future biodiversity trends and guide conservation efforts

Abstract: Over a million species face extinction, urging the need for conservation policies that maximize the protection of biodiversity to sustain its manifold contributions to people. Here we present a suite of new methods aimed at helping to guide conservation efforts using artificial intelligence. Specifically, we develop machine learning methods to predict the effect of current and forecasted extinction risks on biodiversity and compare future trends with historical extinction trajectories. We present a deep learning approach to evaluate the extinction risk across thousands of species, complementing the Red List compiled by the International Union for Conservation of Nature (IUCN). Finally, we will introduce a novel framework for spatial conservation prioritization based on reinforcement learning that consistently outperforms available state-of-the-art software using simulated and empirical data. This model, CAPTAIN (Conservation Area Prioritization Through Artificial INtelligence), quantifies the trade-off between the costs and benefits of area and biodiversity protection, allowing the exploration of multiple biodiversity metrics. Under a limited budget, the model protects significantly more species from extinction than areas selected randomly or naively and meets conservation targets more reliably than alternative software. Artificial intelligence holds great promise for improving the conservation and sustainable use of biological and ecosystem values in a rapidly changing and resource-limited world.
