Tutorial - River - a library for data stream mining


Dr Jacob Montiel

  • Incremental learning - All tools in River can be updated one sample at a time.

  • Adaptive learning - Adaptive methods are specifically designed to be robust against concept drift in dynamic environments.

  • General-purpose - River caters for different machine learning problems, including regression, classification, unsupervised learning, and ad-hoc tasks.

  • Efficient - Because data streams are unbounded, streaming techniques are designed to use memory and processing time efficiently.

  • Easy to use - River is intended for users of any experience level. As a machine learning package, it caters for practitioners as well as researchers.

  • Expandable - River is a constantly evolving resource with new and updated tools providing additional, or improved, capabilities.
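
To make the incremental-learning idea concrete, here is a minimal plain-Python sketch of a model that is updated one sample at a time and evaluated in test-then-train order. The method names `learn_one` and `predict_one` follow River's naming convention, but the class `MajorityClassSketch`, the helper `progressive_accuracy`, and the toy stream are hypothetical stand-ins written for illustration; they are not part of River.

```python
from collections import Counter


class MajorityClassSketch:
    """Hypothetical stand-in for a streaming classifier (not a River class)."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        # Predict the most frequent label seen so far; None before any training.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn_one(self, x, y):
        # Update the model with a single labelled sample.
        self.counts[y] += 1
        return self


def progressive_accuracy(stream, model):
    """Test-then-train: predict on each sample before learning from it."""
    correct = total = 0
    for x, y in stream:
        y_pred = model.predict_one(x)
        correct += int(y_pred == y)
        total += 1
        model.learn_one(x, y)
    return correct / total if total else 0.0


# Made-up toy stream of (features, label) pairs, for illustration only.
stream = [
    ({"temp": 21.0}, "dry"),
    ({"temp": 18.5}, "dry"),
    ({"temp": 12.0}, "rain"),
    ({"temp": 19.0}, "dry"),
]
accuracy = progressive_accuracy(stream, MajorityClassSketch())
```

Because each sample is scored before the model learns from it, no separate hold-out set is needed, and the model never has to store the stream in memory.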

  • Topics

    • From batch to stream learning.

    • Evaluating model accuracy.

    • Processing training samples one at a time.

    • Python programming.

    • Stream processing

      • Basic concepts.
      • Data pre-processing.
    • Sample problem - NOAA weather data ('NEWWeather' dataset)

      • Decision Trees.
      • Pipelines (chaining sequences of operations).
      • Visualising operations.
      • Concept drift.
  • Additional Resources