Popular Press Articles

[1] Andrew Sundstrom. “A Computational Model for Intelligent Manufacturing”. Industry Today. 21 Jan 2021.
Abstract: Because of human limits and blind spots, and because of static process control, manufacturing errors often go undetected or unreported, and are propagated downstream. To make factories more efficient, resilient, and secure, we need to begin by reconsidering how errors are detected and remedied on the assembly line, using a more dynamic approach. Artificial Intelligence Process Control (AIPC) finds solutions to defects in near real time, during the manufacturing process.

Refereed Articles

[9] Damas Limoge, Andrew Sundstrom, Vadim Pinskiy, Matthew Putman. “Defending industrial production using AI process control”. Proceedings of the IEEE/NDIA/INCOSE System Security Symposium 2020, Crystal City, Virginia, USA (15 Sep 2020). [doi: 10.1109/SSS47320.2020.9197727] [article: pdf]
Abstract: Cyberattacks have grown more nuanced and sophisticated in recent years, in part to meet the growing complexity of the systems they are designed to compromise or destroy. The new breed of cyberattack is decidedly systemic, affecting more than a single node or a single point of failure, the better to hide and time-integrate its malicious programming. Current modes of intrusion detection and correction in an industrial setting are based on a statistical process control scheme developed in the mid-twentieth century, which, while still effective for diagnosing pronounced, single-node malicious behavior, is ill-suited to the properties of modern, sophisticated cyberattacks. We propose a novel approach, based on deep reinforcement learning, that treats malicious behavior as a process variation and corrects for it by actively tuning the operating parameters of the system. In this way, it can be layered atop, and functionally complement, standard statistical process control. We describe our approach in the additive manufacturing setting of 3D printing and explain how it can scale to large systems composed of many nodes in a complex topology. We argue that an overlay of AI process control facilitates whole-system protection against modern cyberattacks.
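The core control idea in this abstract — treating a slow, malicious parameter drift as a process variation and actively retuning an operating parameter to cancel it — can be illustrated with a toy feedback loop. The paper uses deep reinforcement learning; the simple proportional correction below is only a stand-in for the learned policy, and all names and constants are invented for illustration:

```python
import numpy as np

def run_process(n_steps=200, setpoint=1.0, gain=0.5, drift=0.01, seed=0):
    """Toy illustration: an attacker injects a slow drift into a process
    parameter; an active controller retunes the parameter each step.

    Returns (uncorrected, corrected) output traces. A proportional
    correction stands in for the paper's learned policy."""
    rng = np.random.default_rng(seed)
    param = setpoint
    uncorrected, corrected = [], []
    inj = 0.0
    for _ in range(n_steps):
        inj += drift                           # attacker's slow drift
        noise = rng.normal(0.0, 0.01)
        measured = param + inj + noise         # observed process output
        uncorrected.append(setpoint + inj + noise)  # no-controller baseline
        param -= gain * (measured - setpoint)  # actively retune the parameter
        corrected.append(measured)
    return np.array(uncorrected), np.array(corrected)
```

In this sketch the uncorrected output drifts steadily away from the setpoint, while the retuned process settles near it (with a small steady-state offset of roughly drift/gain) — the qualitative behavior the abstract describes, not a model of the authors' system.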

[8] Andrew Sundstrom, Eun-Sol Kim, Damas Limoge, Vadim Pinskiy, Matthew Putman. “A computational model for decision-making and assembly optimization in manufacturing”. Proceedings of the American Control Conference 2020, Denver, Colorado, USA (31 Jul 2020). [doi: 10.23919/ACC45564.2020.9147715] [article: pdf]
Abstract: Full-scale automated manufacturing is reserved for selected industries and high-quantity production of single parts. The majority of consumer manufacturing and industrial component manufacturing remains a manual or, at best, semi-automated process with a large human element. Though advances have been made in computer-aided quality control for defective-part classification and sorting, these techniques do not address the inefficiency and cost of discarding faulty products at the end of the manufacturing cycle. We present a deep learning model for detecting and correcting errors in a sample manufacturing process early in a multi-node assembly chain. Instead of simply classifying individual items into quality groups, our model tracks the manufacturing process in real time and, if an error is detected, makes changes to subsequent assembly steps to recover from the error and save the part. This model and system can be applied to any manufacturing cycle with human assembly feedback control, and they allow product manufacturing to be dynamically altered throughout the process.

[7] Andrew Sundstrom, Damas Limoge, Vadim Pinskiy, Matthew Putman. “Securing industrial production from sophisticated cyberattacks”. Proceedings of the 6th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP: 663-670 (26 Mar 2020), Valletta, Malta. [doi: 10.5220/0009148206630670] [article: pdf]
Abstract: Sophisticated industrial cyberattacks focus on machine level operating systems to introduce process variations that are undetected by conventional process control, but over time, are detrimental to the system. We propose a novel approach to industrial security, by treating suspect malicious activity as a process variation and correcting for it by actively tuning the operating parameters of the system. As threats to industrial systems increase in number and sophistication, conventional security methods need to be overlaid with advances in process control to reinforce the system as a whole.

[6] Andrew Sundstrom, Dafna Bar-Sagi, Bud Mishra. "Simulating heterogeneous tumor cell populations". PLoS ONE, 11(12): e0168984 (28 Dec 2016). [doi: 10.1371/journal.pone.0168984] [pmid: 28030620] [article: pdf] [code]
Abstract: Certain tumor phenomena, like metabolic heterogeneity and local stable regions of chronic hypoxia, signify a tumor's resistance to therapy. Although recent research has shed light on the intracellular mechanisms of cancer metabolic reprogramming, little is known about how tumors become metabolically heterogeneous or chronically hypoxic, namely the initial conditions and spatiotemporal dynamics that drive these cell population conditions. To study these aspects, we developed a minimal, spatially-resolved simulation framework for modeling tissue-scale mixed populations of cells based on diffusible particles the cells consume and release, the concentrations of which determine their behavior in arbitrarily complex ways, and on stochastic reproduction. We simulate cell populations that self-sort to facilitate metabolic symbiosis, that grow according to tumor-stroma signaling patterns, and that give rise to stable local regions of chronic hypoxia near blood vessels. We raise two novel questions in the context of these results: (1) How will two metabolically symbiotic cell subpopulations self-sort in the presence of glucose, oxygen, and lactate gradients? We observe a robust pattern of alternating striations. (2) What is the proper time scale to observe stable local regions of chronic hypoxia? We observe the stability is a function of the balance of three factors related to O₂—diffusion rate, local vessel release rate, and viable and hypoxic tumor cell consumption rate. We anticipate our simulation framework will help researchers design better experiments and generate novel hypotheses to better understand dynamic, emergent whole-tumor behavior.
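The simulation framework described above — cell populations whose behavior is driven by diffusible particles that the cells consume and that vessels release — can be sketched minimally as a 2D grid of concentrations updated by diffusion, vessel release, and cell uptake. This is an illustrative toy under invented parameters, not the published code (which is linked from the entry above):

```python
import numpy as np

def diffuse_consume(conc, consumers, D=0.1, release=0.5, uptake=0.2, steps=100):
    """Minimal diffusion/consumption loop for one diffusible particle (e.g. O2).

    conc      -- 2D array of particle concentration
    consumers -- boolean mask of grid sites occupied by consuming cells
    Row 0 of the grid plays the role of a blood vessel that releases particle.
    """
    c = conc.astype(float).copy()
    for _ in range(steps):
        # discrete Laplacian with reflecting (no-flux) boundaries
        padded = np.pad(c, 1, mode="edge")
        lap = (padded[:-2, 1:-1] + padded[2:, 1:-1]
               + padded[1:-1, :-2] + padded[1:-1, 2:] - 4.0 * c)
        c += D * lap                                   # diffusion step
        c[consumers] = np.maximum(c[consumers] - uptake, 0.0)  # cell uptake
        c[0, :] += release                             # vessel release
    return c
```

Running this with a block of consuming cells between the vessel row and the far edge produces the qualitative gradient the abstract discusses: high concentration near the vessel, depletion beyond the consumers — the starting point for stable local hypoxia in the full framework.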

[5] Andrew Sundstrom, Elda Grabocka, Dafna Bar-Sagi, Bud Mishra. "Histological image processing features induce a quantitative characterization of chronic tumor hypoxia". PLoS ONE, 11(4): e0153623 (19 Apr 2016). [doi: 10.1371/journal.pone.0153623] [pmid: 27093539] [article: pdf] [code] [data]
Abstract: Hypoxia in tumors signifies resistance to therapy. Despite a wealth of tumor histology data, including anti-pimonidazole staining, no current methods use these data to induce a quantitative characterization of chronic tumor hypoxia in time and space. We use image-processing algorithms to develop a set of candidate image features that can formulate just such a quantitative description of xenografted colorectal chronic tumor hypoxia. Two features in particular give low-variance measures of chronic hypoxia near a vessel: intensity sampling that extends radially away from approximated blood vessel centroids, and multithresholding to segment tumor tissue into normal, hypoxic, and necrotic regions. From these features we derive a spatiotemporal logical expression whose truth value depends on its predicate clauses that are grounded in this histological evidence. As an alternative to the spatiotemporal logical formulation, we also propose a way to formulate a linear regression function that uses all of the image features to learn what chronic hypoxia looks like, and then gives a quantitative similarity score once it is trained on a set of histology images.
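The first feature named in this abstract — intensity sampling extending radially away from an approximated vessel centroid — amounts to averaging pixel intensity over concentric annuli around the centroid. The helper below is a generic sketch of that idea, not the paper's implementation (whose code is linked from the entry above); the function name and parameters are invented:

```python
import numpy as np

def radial_profile(image, centroid, n_bins=10):
    """Mean intensity in concentric annuli around an approximate vessel
    centroid -- a generic sketch of radial intensity sampling.

    image    -- 2D grayscale array
    centroid -- (row, col) of the approximated vessel center
    Returns an array of n_bins mean intensities, nearest annulus first."""
    h, w = image.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - centroid[0], xx - centroid[1])   # distance map
    edges = np.linspace(0.0, r.max(), n_bins + 1)
    which = np.digitize(r.ravel(), edges[1:-1])        # annulus index per pixel
    sums = np.bincount(which, weights=image.ravel(), minlength=n_bins)
    counts = np.bincount(which, minlength=n_bins)
    return sums / np.maximum(counts, 1)                # mean per annulus
```

Applied to an anti-pimonidazole-stained section, a profile like this would trace how staining intensity changes with distance from the vessel, which is what makes it a low-variance measure of perivascular chronic hypoxia.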

[4] Justin Jee, Andrew Sundstrom, Steven E. Massey, Bud Mishra. "What can information-asymmetric games tell us about the context of Crick's 'frozen accident'?" Journal of the Royal Society Interface, 10(88):20130614 (6 Nov 2013). Published online before print 28 Aug 2013. [doi: 10.1098/rsif.2013.0614] [pmid: 23985735] [article: pdf] [supplementary material #1: pdf] [supplementary material #2: pdf] [supplementary material #3: pdf] [supplementary material #4: pdf] [supplementary material #5: pdf]
Abstract: This paper describes a novel application of information-asymmetric (signaling) games to molecular biology in which utility is determined by the message complexity (rate) in addition to the error in information transfer (distortion). We show using a computational model how it is possible for the agents in one such game to evolve a signaling convention (separating equilibrium) that is suboptimal in terms of information transfer, but is nonetheless stable. In the context of an RNA world merging with a nascent amino acid one, such a game’s equilibrium is alluded to by the genetic code, which is nearly optimal in terms of information transfer, but is also near-universal and nearly immutable. Such a framework suggests that cellularity may have emerged to encourage coordination between RNA species and sheds light on other aspects of RNA world biochemistry yet to be fully understood.

[3] Andrew Sundstrom, Silvio Cirrone, Salvatore Paxia, Carlin Hsueh, Rachel Kjolby, James K. Gimzewski, Jason Reed, Bud Mishra. "Image analysis and length estimation of biomolecules using AFM". IEEE Transactions on Information Technology in Biomedicine, 16(6):1200-1207 (Nov 2012). Published online before print 29 Jun 2012. [doi: 10.1109/TITB.2012.2206819] [pmid: 22759526] [article: pdf] [supplementary material: pdf]
Abstract: There are many examples of problems in pattern analysis for which it is often possible to obtain systematic characterizations, if in addition a small number of useful features or parameters of the image are known a priori or can be estimated reasonably well. Often the relevant features of a particular pattern analysis problem are easy to enumerate, as when statistical structures of the patterns are well understood from the knowledge of the domain. We study a problem from molecular image analysis, where such a domain-dependent understanding may be lacking to some degree and the features must be inferred via machine-learning techniques. In this paper, we propose a rigorous, fully automated technique for this problem. We are motivated by an application of atomic force microscopy (AFM) image processing needed to solve a central problem in molecular biology, aimed at obtaining the complete transcription profile of a single cell, a snapshot that shows which genes are being expressed and to what degree. Reed et al. (“Single molecule transcription profiling with AFM”, Nanotechnology, 18:4, 2007) showed that the transcription profiling problem reduces to making high-precision measurements of biomolecule backbone lengths, correct to within 20-25 bp (6-7.5 nm). Here we present an image processing and length estimation pipeline using AFM that comes close to achieving these measurement tolerances. In particular, we develop a biased length estimator on trained coefficients of a simple linear regression model, biweighted by a Beaton-Tukey function, whose feature universe is constrained by James-Stein shrinkage to avoid overfitting. In terms of extensibility and addressing the model selection problem, this formulation subsumes the models we studied.
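The biased estimator this abstract names rests on James-Stein shrinkage of fitted regression coefficients. In its simplest positive-part form — a generic sketch with an assumed known noise variance, not the paper's exact estimator, which also involves Beaton-Tukey biweighting — the shrinkage pulls the coefficient vector toward zero by a data-dependent factor:

```python
import numpy as np

def james_stein_shrink(beta, sigma2=1.0):
    """Positive-part James-Stein shrinkage of a coefficient vector toward zero.

    beta   -- fitted regression coefficients (length p >= 3)
    sigma2 -- assumed noise variance of the coefficient estimates
    A generic sketch of the shrinkage idea, not the paper's estimator."""
    beta = np.asarray(beta, dtype=float)
    p = beta.size
    # shrinkage factor: 1 - (p - 2) * sigma^2 / ||beta||^2, floored at 0
    factor = 1.0 - (p - 2) * sigma2 / np.dot(beta, beta)
    return max(factor, 0.0) * beta
```

For example, with beta = [3, 4, 0] and sigma2 = 1, the factor is 1 - 1/25 = 0.96, so every coefficient is scaled by 0.96. Shrinking all coefficients jointly like this trades a small bias for lower total estimation variance, which is how it constrains the feature universe against overfitting.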

[2] Jason Reed, Carlin Hsueh, Miu-Ling Lam, Rachel Kjolby, Andrew Sundstrom, Bud Mishra, and James K. Gimzewski. "Identifying individual DNA species in a complex mixture by precisely measuring the spacing between nicking restriction enzymes with atomic force microscope". Journal of the Royal Society Interface, 9(74):2341-2350 (7 Sep 2012). Published online before print 28 March 2012. [doi: 10.1098/rsif.2012.0024] [pmid: 22456455] [article: pdf] [supplementary material: doc]
Abstract: We discuss a novel atomic force microscope-based method for identifying individual short DNA molecules (<5000 bp) within a complex mixture by measuring the intra-molecular spacing of a few sequence-specific topographical labels in each molecule. Using this method, we accurately determined the relative abundance of individual DNA species in a 15-species mixture, with fewer than 100 copies per species sampled. To assess the scalability of our approach, we conducted a computer simulation, with realistic parameters, of the hypothetical problem of detecting abundance changes in individual gene transcripts between two single-cell human messenger RNA samples, each containing roughly 9,000 species. We found that this approach can distinguish transcript species abundance changes accurately in most cases, including transcript isoforms which would be challenging to quantitate with traditional methods. Given its sensitivity and procedural simplicity, our approach could be used to identify transcript-derived complementary DNAs, where it would have substantial technical and practical advantages versus established techniques in situations where sample material is scarce.

[1] Hao Wu, Kevin J. Kim, Kshama Mehta, Salvatore Paxia, Andrew Sundstrom, Thomas Anantharaman, Ali I. Kuraishy, Tri Doan, Jayati Ghosh, April D. Pyle, Amander Clark, William Lowry, Guoping Fan, Tim Baxter, Bud Mishra, Yi Sun, Michael A. Teitell. "Copy number variant analysis of human embryonic stem cells". Stem Cells, 26(6):1484-1489 (Jun 2008). Published online before print 27 Mar 2008. [doi: 10.1634/stemcells.2007-0993] [pmid: 18369100] [article: pdf] [supplementary material #1: pdf] [supplementary material #2: pdf]
Abstract: Differences between individual DNA sequences provide the basis for human genetic variability. Forms of genetic variation include single-nucleotide polymorphisms, insertions / duplications, deletions, and inversions / translocations. The genome of human embryonic stem cells (hESCs) has been characterized mainly by karyotyping and comparative genomic hybridization (CGH), techniques whose relatively low resolution at 2–10 megabases (Mb) cannot accurately determine most copy number variability, which is estimated to involve 10%-20% of the genome. In this brief technical study, we examined HSF1 and HSF6 hESCs using array-comparative genomic hybridization (aCGH) to determine copy number variants (CNVs) as a higher-resolution method for characterizing hESCs. Our approach used five samples for each hESC line and showed four consistent CNVs for HSF1 and five consistent CNVs for HSF6. These consistent CNVs included amplifications and deletions that ranged in size from 20 kilobases to 1.48 megabases, involved seven different chromosomes, were both shared and unique between hESCs, and were maintained during neuronal stem/progenitor cell differentiation or drug selection. An additional thirty HSF1 and forty HSF6 candidate CNVs, less consistently scored but still highly significant, were also identified. Overall, aCGH provides a promising approach for uniquely identifying hESCs and their derivatives and highlights a potential genomic source for distinct differentiation and functional potentials that lower-resolution karyotype and CGH techniques could miss.


Doctoral Dissertation

"Toward a computational solution to the inverse problem of how hypoxia arises in metabolically heterogeneous cancer cell populations". Accepted 23 Sep 2013 (readers: Profs. Bud Mishra, Dafna Bar-Sagi, and Leslie Greengard; auditors: Profs. Ravi Iyengar and Ernest Davis). [ProQuest] [pdf]
Abstract: As a tumor grows, it rapidly outstrips its blood supply, leaving portions of tumor that undergo hypoxia. Hypoxia is strongly correlated with poor prognosis as it renders tumors less responsive to chemotherapy and radiotherapy. During hypoxia, hypoxia-inducible factors (HIFs) upregulate production of glycolysis enzymes and VEGF, thereby promoting metabolic heterogeneity and angiogenesis, and proving to be directly instrumental in tumor progression. Prolonged hypoxia leads to necrosis, which in turn activates inflammatory responses that produce cytokines that stimulate tumor growth. Hypoxic tumor cells interact with macrophages and fibroblasts, both involved with inflammatory processes tied to tumor progression. It is therefore of clinical and theoretical significance to understand: Under what conditions does hypoxia arise in a heterogeneous cell population? Our aim is to transform this biological origins problem into a computational inverse problem, and then attack it using approaches from computer science. First, we develop a minimal, stochastic, spatiotemporal simulation of large heterogeneous cell populations interacting in three dimensions. The simulation can manifest stable localized regions of hypoxia. Second, we employ and develop a variety of algorithms to analyze histological images of hypoxia in xenografted colorectal tumors, and extract features that can be used to construct a modal-logical characterization of hypoxia. We also consider characterizing hypoxia by a linear regression functional learning mechanism that yields a similarity score. Third, we employ a Bayesian statistical model checking algorithm that can be used to determine, over some bounded number of simulation executions, whether hypoxia is likely to emerge under some fixed set of simulation parameters, and some fixed modal-logical or functional description of hypoxia.
Driving the model checking process is one of three adaptive Monte Carlo sampling algorithms we developed to explore the high dimensional space of simulation initial conditions and operational parameters. Taken together, these three system components formulate a novel approach to the inverse problem above, and constitute a design for a tool that can be placed into the hands of experimentalists, for testing hypotheses based upon known parameter values or ones the tool might discover. In principle, this design can be generalized to other biological phenomena involving large heterogeneous populations of interacting cells.


Master’s Thesis

"Measuring biomolecules: an image processing and length estimation pipeline using atomic force microscopy to measure DNA and RNA with high precision". Accepted 22 Sep 2008 (readers: Profs. Bud Mishra and Davi Geiger). [Stockholm] [pdf]
Abstract: Background. An important problem in molecular biology is to determine the complete transcription profile of a single cell, a snapshot that shows which genes are being expressed and to what degree. Seen in series as a movie, these snapshots would give direct, specific observation of the cell's regulation behavior. Taking a snapshot amounts to correctly classifying the cell's ~300,000 mRNA molecules into ~30,000 species, and keeping accurate count of each species. The cell's transcription profile may be affected by low abundances (1-5 copies) of certain mRNAs; thus, a sufficiently sensitive technique must be employed. A natural choice is to use atomic force microscopy (AFM) to perform single-molecule analysis. Reed et al. (“Single molecule transcription profiling with AFM”, Nanotechnology, 18:4, 2007) developed such an analysis that classifies each mRNA by first multiply cleaving its corresponding synthesized cDNA with a restriction enzyme, then constructing its classification label from ratios of the lengths of its resulting fragments. Thus, they showed that the transcription profiling problem reduces to making high-precision measurements of cDNA backbone lengths, correct to within 20-25 bp (6-7.5 nm). Contribution. We developed an image processing and length estimation pipeline using AFM that can achieve these measurement tolerances. In particular, we developed a biased length estimator using James-Stein shrinkage on trained coefficients of a simple linear regression model, a formulation that subsumes the models we studied. Methods. First, AFM images were processed to extract molecular objects, skeletonize them, select proper backbone objects from the skeletons, then compute initial lengths of the backbones. Second, a linear regression model was trained on a subset of molecules of known length, namely their computed image feature quantities. Third, the model's coefficients underwent James-Stein shrinkage to create a biased estimator.
Fourth, the trained and tuned model was applied to the image feature quantities computed for each test molecule, giving its final, corrected backbone length. Results. Training data: one monodisperse set of cDNA molecules of theoretical length 75 nm. Test data: two monodisperse sets of cDNA molecules of unknown length. Corrected distributions of molecular backbone lengths were within 6-7.5 nm of the theoretical lengths of the unknowns, once revealed. Conclusions. The results suggest our pipeline can be employed in the framework specified by Reed et al. to render single-molecule transcription profiles. The results reveal a high degree of systematic error in AFM measurements that suggests image processing alone is insufficient to achieve a much higher measurement accuracy.


Research Contributions

Google Scholar

PubMed

Scopus

ORCID

The Lens