sparklyr - R Interface to Apache Spark
R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.
Last updated 24 days ago
apache-spark, distributed, dplyr, ide, livy, machine-learning, remote-clusters, spark, sparklyr
946 stars 15.84 score 38 dependencies 21 dependents
sparklyr.flint - Sparklyr Extension for 'Flint'
This sparklyr extension makes the functionality of the 'Flint' time series library (<https://github.com/twosigma/flint>) easily accessible from R.
Last updated 3 years ago
apache-spark, data-analysis, data-mining, data-science, distributed, distributed-computing, flint, remote-clusters, spark, sparklyr, statistical-analysis, statistics, stats, summarization, summary-statistics, time-series, time-series-analysis, twosigma-flint
9 stars 6.50 score 39 dependencies
sparkwarc - Load WARC Files into Apache Spark
Load WARC (Web ARChive) files into Apache Spark using 'sparklyr'. This makes it possible to read files from the Common Crawl project <http://commoncrawl.org/>.
Last updated 3 years ago
13 stars 3.89 score 40 dependencies