15.2 A 28nm 64Kb Inference-Training Two-Way Transpose Multibit 6T SRAM Compute-in-Memory Macro for AI Edge Chips

Su, Jian-Wei; Si, Xin; Chou, Yen-Chi; Chang, Ting-Wei; Huang, Wei-Hsing; Tu, Yung-Ning; Liu, Ruhui; Lu, Pei-Jung; Liu, Ta-Wei; Wang, Jing-Hong; others

Su, J.-W., Si, X., Chou, Y.-C., Chang, T.-W., Huang, W.-H., Tu, Y.-N., et al. (2020). 15.2 A 28nm 64Kb inference-training two-way transpose multibit 6T SRAM compute-in-memory macro for AI edge chips. Paper presented at the 2020 IEEE International Solid-State Circuits Conference (ISSCC).

Resource type: Proceedings Article
BibTeX citation key: Su2020

Categories: Artificial Intelligence, Computer Science, Data Sciences, Engineering, General, Innovation, Military Science
Subcategories: Big data, Cloud computing, Command and control, Edge AI, Internet of things, JADC2
Creators: Chang, Chou, Huang, Liu, Liu, Lu, others, Si, Su, Tu, Wang
Collection: 2020 IEEE International Solid-State Circuits Conference-(ISSCC)

Abstract

Many AI edge devices require local intelligence to achieve fast computing time (t_AC), high energy efficiency (EF), and privacy. The transfer-learning approach is a popular solution for AI edge chips, wherein data used to re-train the AI in the cloud is used to fine-tune (re-train) a few of the neural layers in edge devices. This enables the dynamic incorporation of data from in-situ environments or private information. Computing-in-memory (CIM) is a promising approach to improve EF for AI edge chips. Existing CIM schemes support inference [1]-[5] with forward (FWD) propagation; however, they do not support training, which requires both FWD and backward (BWD) propagation, because the weight-access flow differs between FWD and BWD propagation. As Fig. 15.2.1 shows, efforts to increase the precision of the input (IN), weight (W), and/or output (OUT) tend to degrade t_AC and EF for training operations irrespective of scheme: digital FWD and BWD (DF-DB) or CIM-FWD-digital-BWD (CiMF-DB). This work develops a two-way transpose (TWT) SRAM-CIM macro supporting multibit MAC operations for FWD and BWD propagation with fast t_AC and high EF within a compact area. The proposed scheme features (1) a TWT multiply cell (TWT-MC) with a high resistance to process variation; and (2) a small-offset gain-enhancement sense amplifier (SOGE-SA) to tolerate a small read margin. A 28nm 64Kb TWT SRAM-CIM macro was fabricated using a foundry-provided compact 6T-SRAM cell, supporting both inference and training operations in an SRAM-CIM device for the first time. This macro also demonstrates the fastest t_AC (3.8-21ns) and highest EF (7-61.1TOPS/W) for MAC operations using 2-8b inputs, 4-8b weights, and 12-20b outputs.
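
The difference in weight-access flow mentioned in the abstract can be illustrated with a toy example. This is only a sketch of the underlying arithmetic, not the macro's circuit: the function names `forward_mac` and `backward_mac`, the 2x3 integer weight array, and the input values are all hypothetical. FWD propagation reads the stored weight array row by row, while BWD propagation needs the same array read column by column (the transpose).

```python
def forward_mac(W, x):
    """FWD propagation: y[i] = sum_j W[i][j] * x[j] (row-wise weight access)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def backward_mac(W, d):
    """BWD propagation: g[j] = sum_i W[i][j] * d[i] (column-wise access to the same array)."""
    n_rows, n_cols = len(W), len(W[0])
    return [sum(W[i][j] * d[i] for i in range(n_rows)) for j in range(n_cols)]


# Hypothetical toy values, chosen only for illustration.
W = [[1, -2, 3],
     [0, 4, -1]]
x = [2, 1, 3]   # inputs for the forward pass
d = [1, -1]     # error terms for the backward pass

y = forward_mac(W, x)    # uses W as stored
g = backward_mac(W, d)   # uses W transposed, without copying it
print(y, g)
```

Both passes perform multiply-accumulate over one stored array; only the direction of access changes, which is the property a transpose-capable memory has to provide.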
