
Michael R. Berthold: Selected Publications

Below is a selection of the most relevant publications, describing past and recent research activities (a full list can be found on our publication page):

  • General Data Analysis Text Books:

    • NEW! A new look at data analysis through a practitioner's eyes, but still with a solid theoretical basis. Comprehensive coverage of everything one needs to know to analyze real-world data successfully and intelligently:

    • Also still available in a second, completely revised and extended edition (now in its third printing):

  • General Review and Other Articles of Interest:
    • A recent article discussing trends in data analysis, arguing that the more complex data analysis scenarios call for discovery support systems rather than pattern-finding algorithms:
    • The Nature Methods paper by a number of colleagues discussing the need for integrative informatics platforms for biological image analysis:
      • Eliceiri, Kevin W., Berthold, Michael R., Goldberg, Ilya G., Ibáñez, Luis, Manjunath, B. S., Martone, Maryann E., Murphy, Robert F., Peng, Hanchuan, Plant, Anne L., Roysam, Badrinath, Stuurman, Nico, Swedlow, Jason R., Tomancak, Pavel, Carpenter, Anne E., Biological imaging software tools, Nature Methods, vol. 9, pp. 697-710, 2012; DOI: 10.1038/nmeth.2084. [BibTeX]


  • Bisociative Knowledge Discovery:
    Finding patterns that cross domains and trigger new insights remains one of the open problems in data analysis. Within a recently concluded EU project (BISON), a number of network and other analysis methods were developed to address this challenge. The motivation, a formalization of the problem domain, and a number of very promising approaches can be found in the resulting open-access LNCS/LNAI state-of-the-art volume:
  • Active Learning:
    An approach to learning from large, unlabeled data sets with access to an (expensive) oracle that labels individual instances. The PBAC algorithm fuses exploration and exploitation; it thereby avoids random initialization while still fine-tuning class boundaries later on.

  • Parallel Universes:
    Learning in different descriptor spaces (parallel universes) in concert, resulting in a model that spreads out over universes. The first paper describes the setup; the second introduces Neighborgrams, an intuitive way to visually explore clusters in several descriptor spaces:

    • Bernd Wiswedel, Frank Höppner and Michael R. Berthold: Learning in Parallel Universes, Data Mining and Knowledge Discovery, vol. 21, pp. 130-152, 2010. [PDF] [BibTeX]
    • Michael R. Berthold, Bernd Wiswedel and David Patterson: Interactive Exploration of Fuzzy Clusters Using Neighborgrams, Fuzzy Sets and Systems, vol. 149, no. 1, pp. 21-37, Elsevier, 2005 [PDF] [BibTeX]

  • Discriminative Fragment Mining:
    The paper introducing MoFa, our frequent fragment mining algorithm:

    • Christian Borgelt, Michael R. Berthold, Mining Molecular Fragments: Finding Relevant Substructures of Molecules, IEEE Data Mining, pp. 51-58, IEEE Press, 2002, [MoFa page] [PDF] [BibTeX]

  • Fuzzy Rules:

    • Learning fuzzy rules from numerical, nominal and granulated data:

      • Michael R. Berthold, Mixed Fuzzy Rule Formation, International Journal of Approximate Reasoning (IJAR), vol. 32, pp. 67-84, Elsevier, 2003 [PDF] [BibTeX]

      See also the follow-up paper, in which we investigate the influence of various norms and inference methods:

      • Thomas R. Gabriel, Michael R. Berthold, Influence of fuzzy norms and other heuristics on "Mixed Fuzzy Rule Formation", International Journal of Approximate Reasoning (IJAR), vol. 35, pp. 195-202, Elsevier, 2004 [PDF] [BibTeX]


    • Fuzzy rules in parallel coordinates, an easy way to display them in medium-dimensional spaces:

      • Michael R. Berthold, Lawrence O. Hall, Visualizing Fuzzy Points in Parallel Coordinates, IEEE Transactions on Fuzzy Systems, vol. 11, no. 3, pp. 369-374, IEEE Press, 2003 [PDF] [BibTeX]


    • Analyzing fuzzy models can give interesting insights into the relevance of the available features, shown here for a medical application:

      • Rosaria Silipo, Michael R. Berthold, Input Features Impact on Fuzzy Decision Processes, IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, pp. 821-834, 2000 [PDF] [BibTeX]

  • Probabilistic Neural Networks (PNN):

    • A fast and robust method to build PNNs from scratch:

      • Michael R. Berthold, Jay Diamond, Constructive Training of Probabilistic Neural Networks, Neurocomputing, vol. 19, pp. 167-183, Elsevier Publisher, 1998 [PDF] [BibTeX]


    • The original NIPS paper, introducing the DDA:

      • Michael R. Berthold, Jay Diamond, Boosting the Performance of RBF Networks with Dynamic Decay Adjustment, Advances in Neural Information Processing Systems, Gerry Tesauro, David S. Touretzky, Todd K. Leen (eds), pp. 512-528, MIT Press, 1995 [PDF] [BibTeX]