sparklyr - R Interface to Apache Spark
R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.
Last updated 2 months ago
apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr
15.26 score 957 stars 21 packages 3.7k scripts 46k downloads
sparklyr.flint - Sparklyr Extension for 'Flint'
This sparklyr extension makes the functionality of the 'Flint' time series library (<https://github.com/twosigma/flint>) easily accessible from R.
Last updated 3 years ago
apache-spark data-analysis data-mining data-science distributed distributed-computing flint remote-clusters spark sparklyr statistical-analysis statistics stats summarization summary-statistics time-series time-series-analysis twosigma-flint
6.46 score 9 stars 54 scripts 801 downloads
sparkwarc - Load WARC Files into Apache Spark
Load WARC (Web ARChive) files into Apache Spark using 'sparklyr'. This makes it possible to read files from the Common Crawl project <http://commoncrawl.org/>.
Last updated 3 years ago
3.89 score 13 stars 12 scripts 162 downloads