Databases and machine learning are both major parts of data science. Databases can perform online analytical processing (OLAP).
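As a rough illustration of what an OLAP-style query looks like, the sketch below rolls up a small fact table along one dimension at a time. It uses Python's built-in `sqlite3` module only for convenience; the `sales` table, its columns, and the helper names `build_sales_db` and `rollup_by` are hypothetical and not taken from any system mentioned in these notes.

```python
import sqlite3

# Hypothetical dimensions of the toy fact table.
DIMENSIONS = {"region", "product", "quarter"}


def build_sales_db():
    """Create an in-memory database holding a tiny, made-up fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (region TEXT, product TEXT, quarter TEXT, amount REAL)"
    )
    rows = [
        ("north", "widget", "Q1", 100.0),
        ("north", "widget", "Q2", 150.0),
        ("north", "gadget", "Q1", 80.0),
        ("south", "widget", "Q1", 120.0),
        ("south", "gadget", "Q2", 60.0),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    return conn


def rollup_by(conn, dimension):
    """Aggregate the sales amount along one dimension (an OLAP roll-up)."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    # The dimension name is checked against a whitelist before interpolation.
    query = (
        f"SELECT {dimension}, SUM(amount) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return dict(conn.execute(query).fetchall())


conn = build_sales_db()
totals_by_region = rollup_by(conn, "region")
totals_by_quarter = rollup_by(conn, "quarter")
```

The same table can be sliced along any of its dimensions without moving the data out of the database, which is the property the questions below are concerned with.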
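Does data need to be persisted to a database? Is big data necessary for machine learning? Can data storage and data processing be handled together? How can distributed databases and distributed machine learning be combined?
End-to-end management of the data science workflow: a PCS-based, machine-learning-oriented database that supports decision making.
- Data as an asset.
- Data-driven decisions and services.
- Data-intensive applications.
- Real-time computing.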
Pravega - Streaming as a new software-defined storage primitive
- http://www.pravega.io/
- https://github.com/pravega/pravega
- https://moosefs.com/
- https://github.com/chrislusf/seaweedfs
- https://nutanixbible.com/
- https://awesomeopensource.com/projects/distributed-storage
- http://www.xtreemfs.org/
- https://analytics-zoo.github.io/master/
- https://bigdl-project.github.io/0.10.0/
- http://rasbt.github.io/mlxtend/
Samsara is an open-source real-time analytics platform.
Getting Started with Apache MADlib using Jupyter Notebooks. The MADlib project maintains a library of Jupyter notebooks to help you get started quickly with MADlib. It covers many algorithms commonly used by data scientists.
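MADlib runs machine learning inside the database, next to the data. The sketch below illustrates only that general idea and does not use MADlib or its API: it fits a simple least-squares line by letting SQL compute the sufficient statistics, using Python's built-in `sqlite3` module. The `points` table and the helper `fit_line_in_sql` are hypothetical.

```python
import sqlite3


def fit_line_in_sql(conn):
    """Fit y = slope * x + intercept using aggregates computed by the database.

    Only five numbers leave the database, no matter how many rows it holds.
    """
    n, sx, sy, sxy, sxx = conn.execute(
        "SELECT COUNT(*), SUM(x), SUM(y), SUM(x * y), SUM(x * x) FROM points"
    ).fetchone()
    if n < 2:
        raise ValueError("need at least two points")
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Made-up data lying exactly on the line y = 2x + 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [(float(x), 2.0 * x + 1.0) for x in range(5)],
)
slope, intercept = fit_line_in_sql(conn)
```

A real in-database library would expose this kind of computation as SQL functions for many more algorithms; the point here is only that the heavy pass over the rows happens where the data is stored.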
Apache PredictionIO is an open-source machine learning framework for developers, data scientists, and end users. It supports event collection, deployment of algorithms, evaluation, and querying of predictive results via REST APIs. It is built on scalable open-source services such as Hadoop, HBase (and other databases), Elasticsearch, and Spark, and implements what is called a Lambda Architecture.
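For readers unfamiliar with the term, a Lambda Architecture keeps an append-only master dataset, periodically recomputes a batch view from it, maintains a speed view for events that arrived since the last batch run, and merges both at query time. The toy class below is a minimal in-memory sketch of that pattern; it is not PredictionIO code, and the class and method names are hypothetical.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda Architecture that counts events per item."""

    def __init__(self):
        self.master_log = []          # append-only record of every event
        self.batch_view = Counter()   # recomputed from the full log
        self.speed_view = Counter()   # incremental counts since the last batch

    def ingest(self, item_id):
        """Record an event in the master log and update the speed layer."""
        self.master_log.append(item_id)
        self.speed_view[item_id] += 1

    def run_batch(self):
        """Batch layer: rebuild the view from scratch, then reset the speed layer."""
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, item_id):
        """Serving layer: merge the batch view with the speed view."""
        return self.batch_view[item_id] + self.speed_view[item_id]


counter = LambdaCounter()
for item in ["a", "a", "b"]:
    counter.ingest(item)
counter.run_batch()      # "a" and "b" are now covered by the batch view
counter.ingest("a")      # this event is visible only through the speed view
total_a = counter.query("a")
```

In a production system the three layers would be separate services (for example a batch engine, a stream processor, and a serving store), but the merge-at-query-time logic is the same.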