RowCore: A Processing-Near-Memory Architecture for Big Data Machine Learning
2016
Metrics Details
- Usage: 1,230
- Downloads: 1,057
- Abstract Views: 173
Article Description
The technology-push of die stacking and application-pull of Big Data machine learning (BDML) have created a unique opportunity for processing-near-memory (PNM). This paper makes four contributions: (1) While previous PNM work explores general MapReduce workloads, we identify three workload characteristics: (a) irregular-and-compute-light (i.e., perform only a few operations per input word which include data-dependent branches and indirect memory accesses); (b) compact (i.e., the computation has a small intermediate live data and uses only a small amount of contiguous input data); and (c) memory-row-dense (i.e., process the input data without skipping over many bytes). We show that BDMLs have or can be transformed to have these characteristics which, except for irregularity, are necessary for bandwidth- and energy-efficient PNM, irrespective of the architecture. (2) Based on these characteristics, we propose RowCore, a row-oriented PNM architecture, which (pre)fetches and operates on entire memory rows to exploit BDMLs’ row-density. Instead of this row-centric access and compute-schedule, traditional architectures opportunistically improve row locality while fetching and operating on cache blocks. (3) RowCore employs well-known MIMD execution to handle BDMLs’ irregularity, and sequential prefetch of input data to hide memory latency. In RowCore, however, one corelet prefetches a row for all the corelets which may stray far from each other due to their MIMD execution. Consequently, a leading corelet may prematurely evict the prefetched data before a lagging corelet has consumed the data. RowCore employs novel cross-corelet flow-control to prevent such eviction. (4) RowCore further exploits its flow-controlled prefetch for frequency scaling based on novel coarse-grain compute-memory rate-matching which decreases (increases) the processor clock speed when the prefetch buffers are empty (full).
Using simulations, we show that RowCore improves performance and energy, by 135% and 20% over a GPGPU with prefetch, and by 35% and 34% over a multicore with prefetch, when all three architectures use the same resources (i.e., number of cores, and on-processor-die memory) and identical die stacking (i.e., GPGPUs/multicores/RowCore and DRAM).
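To make contributions (3) and (4) more concrete, the following is a minimal software sketch of the two ideas: a shared row-prefetch buffer that refuses to evict a row until every corelet has consumed it, and a coarse-grain clock adjustment driven by buffer occupancy. It is an illustration only, not the paper's hardware design. The class `RowPrefetchBuffer`, the function `next_clock_mhz`, and all parameters (buffer capacity, clock step, clock limits) are hypothetical names and values chosen for this example.

```python
from collections import deque


class RowPrefetchBuffer:
    """Toy model of a row buffer shared by several corelets.

    Assumption for this sketch: a row is evicted only after every corelet
    has consumed it, so a leading corelet stalls rather than evicting data
    that a lagging corelet still needs.
    """

    def __init__(self, num_corelets, capacity_rows):
        self.capacity = capacity_rows
        self.rows = deque()                  # buffered row ids, oldest first
        self.next_row = 0                    # next sequential row to prefetch
        self.position = [0] * num_corelets   # next row each corelet will consume

    def _retire_consumed_rows(self):
        # Drop the oldest rows that all corelets have already consumed.
        slowest = min(self.position)
        while self.rows and self.rows[0] < slowest:
            self.rows.popleft()

    def try_prefetch(self):
        """Prefetch the next sequential row if a slot is free."""
        self._retire_consumed_rows()
        if len(self.rows) >= self.capacity:
            return False  # flow control: prefetching would evict live data
        self.rows.append(self.next_row)
        self.next_row += 1
        return True

    def try_consume(self, corelet):
        """Return the row id consumed, or None if the corelet must stall."""
        row = self.position[corelet]
        if row in self.rows:
            self.position[corelet] += 1
            return row
        return None

    def occupancy(self):
        """Fraction of buffer slots holding rows not yet consumed by all."""
        self._retire_consumed_rows()
        return len(self.rows) / self.capacity


def next_clock_mhz(clock_mhz, occupancy, step_mhz=100, lo=500, hi=2000):
    """Coarse-grain rate matching: slow down when the buffer is empty
    (compute is outrunning memory), speed up when it is full."""
    if occupancy == 0.0:
        return max(lo, clock_mhz - step_mhz)
    if occupancy == 1.0:
        return min(hi, clock_mhz + step_mhz)
    return clock_mhz


if __name__ == "__main__":
    buf = RowPrefetchBuffer(num_corelets=2, capacity_rows=2)
    buf.try_prefetch()
    buf.try_prefetch()
    buf.try_consume(0)
    buf.try_consume(0)                 # corelet 0 runs ahead of corelet 1
    print(buf.try_prefetch())          # refused until corelet 1 catches up
    print(next_clock_mhz(1000, buf.occupancy()))
```

In this model the leading corelet's progress never frees a buffer slot by itself; only the slowest corelet's position does, which is the property the abstract attributes to cross-corelet flow control.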