Parallel Data Mining


SWS: 2, Credits: 4

Participants: Min: 1 Max: 12 Expected: 6

Course type : Seminar

Language: English


How to work together without talking.

Spending (massively) parallel and distributed resources on dealing with more data is a common trend in data analysis. Using distributed resources to find better solutions, however, is a brand-new research area. Not only does it require search strategies that can be distributed, but it also relies on distributing algorithms that inherently need to communicate in order to guarantee global properties. Reducing, or ideally eliminating, that potentially enormous communication overhead is one of the main challenges of this area of research.
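To make the idea concrete, here is a toy sketch of communication-free parallel search. It is not taken from the course materials: the greedy feature-selection objective, the `bucket()` hash, and the bucketing scheme are all illustrative assumptions. Each worker runs the same greedy search but, instead of exchanging messages to stay diverse, only considers candidates that a shared deterministic hash assigns to its own bucket; the workers therefore explore disjoint parts of the search space with zero communication until a final best-of step.

```python
import hashlib

def score(subset, target):
    # Hypothetical objective: reward overlap with a hidden "good"
    # feature set, with a small penalty per selected feature.
    return len(subset & target) - 0.1 * len(subset)

def bucket(subset, k):
    # Deterministic hash assigning each candidate to one of k workers.
    # All workers use the same function, so no coordination is needed.
    h = hashlib.md5(",".join(sorted(subset)).encode()).hexdigest()
    return int(h, 16) % k

def greedy_widened(features, target, k, worker_id, steps=10):
    state = frozenset()
    for _ in range(steps):
        candidates = [state | {f} for f in features if f not in state]
        # Communication-free diversity: this worker only looks at
        # candidates that hash into its own bucket.
        mine = [c for c in candidates if bucket(c, k) == worker_id]
        if not mine:
            break
        best = max(mine, key=lambda c: score(c, target))
        if score(best, target) <= score(state, target):
            break  # no strict improvement in this bucket: stop
        state = best
    return state

features = {f"f{i}" for i in range(12)}
target = {"f1", "f3", "f5", "f7"}   # hidden optimum (toy example)
k = 4                                # number of independent workers
results = [greedy_widened(features, target, k, w) for w in range(k)]
# The only synchronization point: pick the best result at the end.
best = max(results, key=lambda s: score(s, target))
```

The workers never exchange intermediate states; diversity is enforced purely by the shared hash function, which is the kind of trick the seminar's papers on hashing and diverse structured prediction explore in far more refined forms.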

The seminar will focus on discussing papers from related disciplines such as distributed search algorithms and self-tuning heuristics. Participants will be expected to study a recent paper and present the described methodology in a common framework, in addition to writing a short summary report.


The slides of the introduction can be found here and here.

List of papers

The paper that details the widening framework can be found here.

Due to copyright constraints, we cannot host the papers here. They can easily be found via Google (Scholar).

2005 - Computing nearest neighbors in real time
2009 - Distributed Similarity Search in High Dimensions Using Locality Sensitive Hashing
2009 - G-Hash: Towards Fast Kernel-based Similarity Search in Large Graph Databases
2012 - Multiple Choice Learning: Learning to Produce Multiple Structured Outputs
2012 - Efficient Decomposed Learning for Structured Prediction
2013 - Computing the M Most Probable Modes of a Graphical Model
2013 - Topology Preserving Hashing for Similarity Search
2014 - Efficiently Enforcing Diversity in Multi-Output Structured Prediction
2014 - Submodular meets Structured: Finding Diverse Subsets in Exponentially-Large Structured Item Sets