Multi-Ported Memories for FPGAs (and ASICs too)
As FPGAs increase in size, designers use them to build larger
systems-on-chip that require frequent data sharing, communication, queueing,
and synchronization among distributed functional units. These features boil
down to FIFOs and register files, which we can implement using multi-ported
memories.
This page indexes research papers about multi-ported memories, from IEEE,
IET, ACM, and Hindawi publications, focusing on architectural techniques for
multi-ported memory other than word/bit-line replication in VLSI SRAMs.
Also not covered here (or only briefly mentioned) are related publications
on post-silicon testing and repair of ASIC multi-ported memories, and
implementations of Content-Addressable Memories (CAMs), caches, and VLSI SRAM
memories.
Prior to 2010, the term "multi-ported memory" referred to SRAMs with extra
word/bit lines, simple multi-pumping schemes, or memories with arbitration
logic to support multiple pending requests from several functional units. After
2010, work on multi-ported memories focuses on implementations supporting
multiple, concurrent, non-blocking accesses, with different area, power, and
speed tradeoffs.
Please direct any questions, corrections, or suggested additions to
Charles Eric LaForest, PhD at eric@fpgacpu.ca.
2019
- Efficient Write Scheme for Algorithm-Based Multi-Ported Memory
Bo-Ya Chen, Bo-En Cher, Bo-Cheng Lai
International Symposium on VLSI Design, Automation and Test (VLSI-DAT), April 2019
"REMAP+ applies the banking structure of memory design and implements the remap table with SRAM cells instead of costly registers. In the remap table, REMAP+ only keeps the most significant bit of write addresses to more efficiently utilize the space in the table."
2018
- Lossy Multiport Memory
Bowen P.Y. Kwan, Gary C.T. Chow, Tim Todman, Wayne Luk, Wenguang Xu
International Conference on Field-Programmable Technology (FPT), Naha, Okinawa, Japan, December 2018
"The proposed architecture contains parallel memory banks connected by lossy switch networks to multiple input ports and local ring buffers. For 4 parallel read/write ports, our design reduces BRAM usage by 68% while having the operating frequency increased by 50% as compared to state-of-the-art memory designs. The drop rate of the design is 2% under full port utilization, and is reducible without altering the architecture at runtime."
- Optimized Counter-Based Multi-Ported Memory Architectures for Next-Generation FPGAs
S. Navid Shahrouzi, Darshika G. Perera
31st IEEE International System-on-Chip Conference (SOCC), September 2018
"In this paper, we introduce four unique, novel, and optimized counter-based multi-ported memory architectures. With our memories, circular paths are eliminated; thus significantly reducing the design/routing complexity, while enhancing the operating frequency and area-efficiency."
- Designing a shared access memory and its application in data transmission and protection systems
Petro Humenniy, Orest Volynskyy, Ivan Albanskiy, Artur Voronych
14th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET), April 2018, Lviv-Slavske, Ukraine
"The method of parallel access of a large number of subscribers to a shared memory based on Galois field codes is developed. This method provides parallel data recording into multiport memory of the subscribers' mailboxes, and also allows for simultaneous parallel reading of any addressed page array. [...] Shared access memory is proposed to be used for data protection on the basis of the residue number system."
- Efficient TCAM Design Based on Multipumping-Enabled Multiported SRAM on FPGA
Inayat Ullah, Zahid Ullah, Jeong-A Lee
IEEE Access, Volume 6, April 2018
"This paper presents a multipumping-enabled multiported SRAM-based TCAM design on FPGA, to achieve an efficient utilization of SRAM memory."
- A 4-READ 2-WRITE Multi-Port Register File Design Using Pulsed-Latches
T.S. Manivannan, Meena Srinivasan
Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), March 2018, Coimbatore, India
"Pulsed-latches provide high performance with low power consumption by taking the advantages of both flip-flops and latches [...] In this paper, an area efficient and low power consumption design approach is proposed to perform the multi-read and multi-write operations in the pulsed-latches based multiport register files."
2017
- Work-in-progress: efficient pulsed-latch implementation for multiport register files
Wael M. Elsharkasy, Hasan Erdem Yantir, Amin Khajeh, Ahmed M. Eltawil, Fadi J. Kurdahi
International Conference on Compilers, Architectures and Synthesis For Embedded Systems (CASES), October 2017, Seoul, South Korea
"In this paper, register file design using pulsed latches is presented. Having some advantages in performance, area and power, pulsed latches represent an attractive implementation of register files. In addition, a proposed multiport register file architecture is introduced using single physical read/write ports to virtualize additional ports for read and write. The initial results show huge savings in area and power in comparison to the traditional architectures."
- An Efficient Hierarchical Banking Structure for Algorithmic Multiported Memory on FPGA
Bo-Cheng Charles Lai, Kun-Hua Huang
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume: 25, Issue 10, October 2017
"This paper proposes HB-NTX with a resource efficient hierarchical banking structure for nontable-based multi-ported memory design on FPGA."
(Hierarchical XOR-based multiported memories)
- An efficient FPGA-based memory architecture for compute-intensive applications on embedded devices
S. Navid Shahrouzi, Darshika G. Perera
IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM), August 2017, Victoria, BC, Canada
"Our proposed memory architecture bridges the gap between a major tradeoff associated with the existing multi-ported memory designs, i.e., increasing the number of ports, reduces the total available memory on chip for the BRAMs to store essential data for real-time processing. With our unique memory design, an ample number of ports can be provided for simultaneous R/W operations, without significantly reducing the total available on-chip memory to store essential data. Our proposed Port Priority Module coupled with the proposed Port Assignment Algorithm, manage and assign the ports to the memory banks to avoid any contention issues. We also introduced a Data Distribution Scheme, which gives guidelines to the application designers to efficiently manage and assign the data to the corresponding ports."
- An efficient embedded multi-ported memory architecture for next-generation FPGAs
S. Navid Shahrouzi, Darshika G. Perera
IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP), July 2017, Seattle, WA, USA
"Apart from the BRAMs, our proposed multi-ported memory design only consists of the Decision Making Modules and a counter, thus simplifying the design process. [...] We also evaluate our architecture with the most recently proposed multi-ported memory designs, implemented using LVT and XOR techniques, from the existing literature."
- Implementation of BIST technology for fault detection and repair of the multiported memory using FPGA
S. R. Patil, D. B. Musle
International conference of Electronics, Communication and Aerospace Technology (ICECA), April 2017, Coimbatore, India
"Built-in self test is used to detect and repair the faults in multiported memory. The microcode based Asynchronous p-MBIST is implemented using FPGA and compared with similar feature Synchronous P-MBIST."
- Modular Switched Multiported SRAM-Based Memories
Ameer M. S. Abdelhadi, Guy G. F. Lemieux
ACM Transactions on Reconfigurable Technology and Systems (TRETS) - Special Issue on Reconfigurable Components with Source Code, Volume 9, Issue 3, September 2016, Article No. 22
"In this article, a novel, modular, and BRAM-based switched multiported RAM architecture is proposed. In addition to unidirectional ports with fixed read/write, this switched architecture allows a group of write ports to switch with another group of read ports dynamically, hence altering the number of active ports. The proposed switched-ports architecture is less flexible than a true-multiported RAM where each port is switched individually. Nevertheless, switched memories can dramatically reduce BRAM consumption compared to true ports for systems with alternating port requirements. [...] Previous live-value-table (LVT) and XOR approaches are merged and optimized into a generalized and modular structure that we call an invalidation-based live-value-table (I-LVT). [...] Two specific I-LVT implementations are proposed and evaluated: binary and thermometer coding. The I-LVT approach is especially suitable for deep memories because the table is implemented only in SRAM cells."
- Efficient Designs of Multiported Memory on FPGA
Bo-Cheng Charles Lai, Jiun-Liang Lin
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume: 25, Issue 1, January 2017
"This paper first introduces a brand new perspective and a more efficient way of using a conventional two reads one write (2R1W) memory as a 2R1W/4R memory. By exploiting the 2R1W/4R as the building block, this paper introduces a hierarchical design of 4R1W memory that requires 25% fewer BRAMs than the previous approach of duplicating the 2R1W module."
2016
- A Multi-ported Memory Compiler Utilizing True Dual-Port BRAMs
Ameer M. S. Abdelhadi, Guy G. F. Lemieux
IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), August 2016, Washington, DC, USA
"In this paper, we instead reduce area by optimizing the design of the "data banks" portion. The optimization is embedded into a memory compiler that solves a set cover problem. When the set cover problem is solved optimally, the data banks use minimum area. Our technique applies to multi-ported RAMs that have a structural pattern we describe as "switched ports". Switched ports are a generalization of true ports, where a certain number of write ports can be dynamically switched into a possibly different number of read ports using one common read/write control signal."
- A shared memory, parameterized and configurable in FPGA, for use in multiprocessor systems
Elio Strollo, Alessandro Trifiletti
23rd International Conference Mixed Design of Integrated Circuits and Systems (MIXDES), June 2016, Lodz, Poland
Describes multiported shared memories as semaphores to coordinate shared exclusive access to multiple other memories by multiple processors.
2015
- Register Port Prediction for a Banked Register File
Hiroaki Kawashima, Takahiro Sasaki, Yuki Fukazawa, Toshio Kondo
Third International Symposium on Computing and Networking (CANDAR), December 2015, Sapporo, Japan
"In order to reduce performance degradation caused by bank conflict, this paper proposes [a] register write-back port prediction mechanism."
- OpenCL computing on FPGA using multiported shared memory
Tahsin Turker Mutlugun, Sheng-De Wang
25th International Conference on Field Programmable Logic and Applications (FPL), October 2015, London, UK
"In this work, conflict-free multiported memories are used to minimize local memory access latency. Experiments show that multiported memories can successfully increase computation speed and reduce the required parallel workload for maximum throughput to practical amounts."
- Design and implementation of area efficient multi-ported memories with write conflict resolution
Akshata Anil Muddebihal, Carla Purdy
IEEE 58th International Midwest Symposium on Circuits and Systems (MWSCAS), August 2015, Fort Collins, CO, USA
"[...] we describe modifications to LVT enabling us to construct MPMs which are superior with respect to area of both block RAMS (BRAMs) and logic elements (LEs). As an example, we build a 32-bit wide, 256-bit deep 4W/8R memory containing 43% fewer LEs and 75% fewer BRAMS, compared to an MPM based on the original LVT techniques. We also study the effect of write port to read port ratios on performance and we propose a hardware mechanism for write conflict detection and resolution."
- BRAM efficient multi-ported memory on FPGA
Jiun-Liang Lin, Bo-Cheng Charles Lai
VLSI Design, Automation and Test (VLSI-DAT), April 2015
"This paper proposes a BRAM efficient scheme on increasing read ports and write ports. When compared with previous works, the proposed multi-ported memory can reduce up to 53% requirement on BRAMs with only minor frequency degradation."
- A Scalable Unsegmented Multiport Memory for FPGA-Based Systems
Kevin R. Townsend, Osama G. Attia, Phillip H. Jones, and Joseph Zambreno
International Journal of Reconfigurable Computing, Volume 2015, Article ID 826283, 12 pages
"In this paper, we present two approaches for designing multiport memory cores that are suitable for reconfigurable accelerators with substantial on-chip memory or complex communication. Our design approaches tackle these challenges by banking RAM blocks and utilizing interconnect networks which allows scaling without sacrificing logic resources."
2014
- Composing Multi-Ported Memories on FPGAs
Charles Eric LaForest, Zimo Li, Tristan O'Rourke, Ming G. Liu, J. Gregory Steffan
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 7, Issue 3, Article 16, September 2014. (ACM Page)
Updates the LVT and XOR papers, adds Xilinx implementation results, as well as more exploration of bi-directional LVT-based multi-ported memories.
- Application specific multi-port memory customization in FPGAs
Gorker Alp Malazgirt, Hasan Erdem Yantir, Arda Yurdakul, Smail Niar
24th International Conference on Field Programmable Logic and Applications (FPL), October 2014, Munich, Germany
"In this paper, we present a fully automated methodology that tailors our multi-port memory from a given application. We present our performance improvements and area tradeoffs on state-of-the-art string matching algorithms."
- Application specific multi-port memory customization in FPGAs
Gorker Alp Malazgirt, Hasan Erdem Yantir, Arda Yurdakul, Smail Niar
IEEE International Conference on Field Programmable Logic and Applications (FPL), September 2014
Presents a fully automated methodology to tailor multi-port memories for a given application.
- An Efficient Heterogeneous Register File Implementation for FPGAs
Hasan Erdem Yantir, Arda Yurdakul
IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2014
Explores heterogeneous, multi-pumped multi-ported memory implementations.
- Area Efficient Multi-Ported Memories with Write Conflict Resolution
Akshata Muddebihal, MS Thesis, University of Cincinnati, April 2014
Combines pure-multipumped memory banks with LVT to save logic and SRAM area, explores varying read/write port ratios, and adds conflict detection and resolution methods (a phantom bank, and port priority).
- Modular Multi-ported SRAM-based Memories
Ameer M.S. Abdelhadi and Guy G.F. Lemieux
ACM International Symposium on Field-Programmable Gate Arrays (FPGA), February 2014, Monterey, CA. (ACM Page)
Generalizes LVT and XOR designs into an invalidation-based Live Value Table (I-LVT), using either XOR cancellation or a novel thermometer one-hot coding, yielding smaller and faster designs. Also formally characterizes the resource usage of various multi-ported memory designs.
- A Systematic Approach for Register File Design in FPGAs
Hasan Erdem Yantir's M.Sc. Thesis
Explores the design space of heterogeneous register files on FPGAs. See the two papers above too.
2013
2012
- Impact of Cache Architecture and Interface on Performance and Area of FPGA-Based Processor/Parallel-Accelerator Systems
Jongsok Choi, Kevin Nam, Andrew Canis, Jason Anderson, Stephen Brown, and Tomasz Czajkowski
IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), pp. 17-24, Toronto, Canada, April 2012.
Introduces multi-pumped and LVT-based multi-ported memories with bi-directional read/write ports. They use these to implement processor caches.
- Multi-Ported Memories for FPGAs via XOR
Charles Eric LaForest, Ming G. Liu, Emma Rae Rapati, and J. Gregory Steffan
ACM International Symposium on Field-Programmable Gate Arrays (FPGA), February 2012, Monterey, CA. (ACM Page)
Uses XOR cancellation to reconstruct the most-recently written data amongst multiple memory banks. Another form of true multi-ported memory.
- Algorithmic Memory Brings an Order of Magnitude Performance Increase to Next Generation SoC Memories
Sundar Iyer, Da Chuang (Memoir Systems)
DesignCon, January 2012, San Francisco, CA
Whitepaper motivating the applications enabled by ASIC "algorithmic memory", which is their name for multiported memory. Memories with more ports are constructed hierarchically from simpler ones. (Also online here)
See 2010 US patent below.
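
To illustrate the XOR cancellation idea from "Multi-Ported Memories for FPGAs via XOR" above, here is a minimal behavioural sketch in Python. It is not taken from the paper: the class and method names are hypothetical, there is one bank per write port, and the per-read-port bank replication and pipelining that hardware would need are left out.

```python
from functools import reduce
from operator import xor


class XORMemory:
    """Behavioural model: the live value is the XOR of all banks at an address."""

    def __init__(self, depth, write_ports):
        # One bank per write port (assumption: no per-read-port replication).
        self.banks = [[0] * depth for _ in range(write_ports)]

    def write(self, port, addr, data):
        # XOR the new data with the contents of every *other* bank, so that
        # XOR-ing all banks together later cancels the old contributions.
        others = reduce(
            xor,
            (bank[addr] for i, bank in enumerate(self.banks) if i != port),
            0,
        )
        self.banks[port][addr] = data ^ others

    def read(self, addr):
        # XOR across all banks recovers the most recently written value.
        return reduce(xor, (bank[addr] for bank in self.banks), 0)


mem = XORMemory(depth=8, write_ports=2)
mem.write(0, 3, 0xAB)
mem.write(1, 3, 0x5C)   # a later write through the other port
print(hex(mem.read(3)))
```

No table tracks which bank is newest: the stored values themselves encode that, at the cost of reading the other banks on every write.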
2011
- CRAM: Coded Registers for Amplified Multiporting
Vignyan Reddy Kothinti Naresh, David J. Palframan, Mikko H. Lipasti
IEEE/ACM International Symposium on Microarchitecture (MICRO-44), 2011
Not an FPGA implementation, but shows an alternative form of XOR coding for multi-ported register files in superscalar processors.
2010
- A Multiported Register File with Register Renaming for Configurable Softcore VLIW Processors
Fakhar Anjam, Stephan Wong, and Faisal Nadeem
A VLIW Softcore Processor with Dynamically Adjustable Issue-slots
Fakhar Anjam, Muhammad Nadeem, and Stephan Wong
IEEE International Conference on Field-Programmable Technology (FPT), December 2010, Beijing.
Two papers showing how to use register-renaming to avoid needing an LVT, saving power and area, and a modified LVT multi-ported memory configurable as a single 4w8r or dual 2w4r memories.
- High speed memory systems and methods for designing hierarchical memory systems
Sundar Iyer, Shang-Tse Chuang (Memoir Systems)
US Patent US9442846B2, filed August 2010, now assigned to Cisco Technology Inc.
Describes LVT (with conflict detection) and XOR-based multiported memories for ASICs, and the hierarchical construction of memories with more ports.
(This work independently discovered multiported memories, naming them instead "algorithmic memory". See 2012 DesignCon whitepaper above.)
- Efficient Multi-Ported Memories for FPGAs
Charles Eric LaForest and J. Gregory Steffan
ACM International Symposium on Field-Programmable Gate Arrays (FPGA), February 2010, Monterey, CA. (ACM Page)
Introduces the Live Value Table (LVT), which coordinates multiple banks of memory into a true multi-ported memory.
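
As a rough illustration of the Live Value Table idea in the entry above, the Python sketch below models a memory with several write ports as one bank per write port, plus a table recording which bank last wrote each address. It is a behavioural sketch under stated assumptions, not the paper's implementation: the names `LVTMemory`, `write`, and `read` are hypothetical, and the per-read-port bank replication used in hardware is omitted.

```python
class LVTMemory:
    """Behavioural model of banks coordinated by a Live Value Table (LVT)."""

    def __init__(self, depth, write_ports):
        # One bank per write port (assumption: hardware would also
        # replicate each bank once per read port).
        self.banks = [[0] * depth for _ in range(write_ports)]
        # For each address, the index of the bank holding the newest value.
        self.lvt = [0] * depth

    def write(self, port, addr, data):
        # Each write port only ever writes its own bank, then marks it live.
        self.banks[port][addr] = data
        self.lvt[addr] = port

    def read(self, addr):
        # The LVT steers the read to the bank with the most recent write.
        return self.banks[self.lvt[addr]][addr]


mem = LVTMemory(depth=8, write_ports=2)
mem.write(0, 3, 111)
mem.write(1, 3, 222)    # a later write to the same address via port 1
mem.write(0, 5, 333)
print(mem.read(3), mem.read(5))
```

The table itself needs as many ports as the whole memory, but it is only as wide as a bank index, which is why it can be built from logic rather than block RAM.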
2009
- Advanced Synthesis Cookbook
Altera's July 2009 (now updated to 2011) Advanced Synthesis Cookbook describes a 1-bit flag to get a quad-port memory out of dual-port memories. See page 60.
2008
- An Intelligent Multi-Port Memory
Zuo Wang, Qi Zuo, Jiaxing Li
International Symposium on Intelligent Information Technology Application Workshops, Shanghai, December 2008
"With the help of port-priority and r/w-priority, our multi-port memory can resolve both write-write conflict and read-write conflict. When write-write conflict occurs, the port with the highest priority can execute its write operation; when read-write conflict occurs, the multi-port memory intelligently chooses read-then-write or write-then-read according to r/w-priority."
- Area Optimisation for Field-Programmable Gate Arrays in SystemC Hardware Compilation
Johan Ditmar, Steve McKeever, and Alex Wilson
Hindawi International Journal of Reconfigurable Computing, Volume 2008, Article ID 674340, 14 pages
While this article does not address the implementation of multiported memories, it presents an interesting optimization pass to determine, given multiple concurrent access patterns from functional units, what combination of read, write, and read-write ports are necessary, and thus how much of the accesses can be placed in FPGA BRAMs rather than register logic. Nowadays, the port requirements could drive the architecture selection of a true multiported memory implementation.
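
The port-priority and r/w-priority scheme quoted in "An Intelligent Multi-Port Memory" above can be sketched behaviourally in Python. This is only an illustration under assumptions: the function name `resolve_cycle`, its parameters, and the rule that a lower port number means higher priority are hypothetical and not taken from the paper.

```python
def resolve_cycle(mem, writes, reads, write_before_read=False):
    """Apply one cycle of accesses to `mem` (a list) and return read results.

    writes: list of (port, addr, data); lower port number wins a
            write-write conflict (assumption of this sketch).
    reads:  list of addresses.
    write_before_read: models the r/w-priority choice for read-write conflicts.
    """
    winners = {}
    for port, addr, data in sorted(writes, key=lambda w: w[0]):
        # Keep only the first (highest-priority) write per address.
        winners.setdefault(addr, data)

    if write_before_read:
        for addr, data in winners.items():
            mem[addr] = data
        return [mem[addr] for addr in reads]

    results = [mem[addr] for addr in reads]
    for addr, data in winners.items():
        mem[addr] = data
    return results


mem = [0] * 4
old = resolve_cycle(mem, [(1, 2, 20), (0, 2, 10)], reads=[2])
print(old, mem[2])
```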
2007
- Multi-port Memory Design Methodology Based on Block Read and Write
Weixing Ji, Feng Shi, Baojun Qiao, Hong Song
IEEE International Conference on Control and Automation, November 2007, Guangzhou, China
"Multi-port memory design methodology based on block read/write is proposed in this paper. This new multi-port memory is constructed using 1-port memory banks and features parallel read/write access with low port access rejection probability. In comparison with conventional implementation of multi-port memory based on 1-port memory banks, the number of necessary 1-port memory banks is greatly reduced. Moreover, the complexity of switching network and arbitration circuits are also simplified."
- A Configurable Multi-Ported Register File Architecture for Soft Processor Cores
Mazen A. R. Saghir and Rawan Naous
International Workshop on Applied Reconfigurable Computing (ARC 2007), pp. 14-25, Springer-Verlag LNCS 4419, March 27-29, 2007.
Early research on banked and replicated multi-ported memories.
2004
- Distributed crossbar architecture for area-efficient combined data/instruction caches with multiple ports
K. Johguchi, Z. Zhu, T. Hirakawa, T. Koide, T. Hironaka, H.J. Mattausch
IET Electronics Letters, Volume 40, Issue 3, February 2004
"A proposal to improve the low access bandwidth of conventional one-port caches by utilising a multi-bank structure with distributed crossbar to increase port number at small additional area cost is presented. This enables combination of data and instruction caches into a single multi-port cache as well as different wordlength for each port. Through dynamically scheduling the storage space used for data and instructions, 25% smaller storage capacity is sufficient for a given maximum cache-miss probability."
- Distributed against centralised crossbar function for realising bank-based multiport memories
S. Fukae, T. Inoue, H.J. Mattausch, T. Koide, T. Hironaka
IET Electronics Letters, Volume 40, Issue 2, January 2004
"The two extreme functionally equivalent possibilities of distributed or centralised crossbar function for realising a bank-based multiport memory are compared on the basis of required global signal and transistor numbers. With respect to both measures, the distributed crossbar function is found to be the better choice for practically relevant multiport-memory configurations."
2003
- A novel hierarchical multi-port cache
Z. Zhu, K. Johguchi, H.J. Mattausch, T. Koide, T. Hirakawa, T. Hironaka
ESSCIRC 2004 - 29th European Solid-State Circuits Conference (IEEE Cat. No.03EX705), September 2003, Estoril, Portugal
"A novel hierarchical multi-port cache is described in this paper, which implements the hierarchical multi-port memory architecture (HMA) based on 1-port banks."
- Design issues for prototype implementation of a pipelined superscalar processor in programmable logic
Naraig Manjikian
IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM), Volume 1, pp. 155-158, August 2003.
Multipumps memories by performing reads and writes on consecutive rising and falling clock edges within a processor cycle.
- Implementing Multi-Port Memories in ProASICPLUS Devices
Actel's July 2003 Application Note AC176 describes a multi-pumped memory block to get dual or quad ports.
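
The multi-pumping used in the two entries above can be illustrated with a small behavioural model in Python: a single-port memory runs several internal cycles per external cycle, so it appears to have several ports. The names `MultiPumpedMemory`, `pump_factor`, and `cycle` are hypothetical, and servicing reads before writes is an assumption of this sketch rather than something taken from either source.

```python
class MultiPumpedMemory:
    """Behavioural model of a single-port RAM clocked faster than its clients."""

    def __init__(self, depth, pump_factor):
        self.mem = [0] * depth
        # Number of internal memory cycles per external (system) cycle.
        self.pump_factor = pump_factor

    def cycle(self, reads, writes):
        """Service one external cycle's worth of accesses.

        reads:  list of addresses
        writes: list of (address, data) pairs
        Each access consumes one internal cycle. Reads are serviced before
        writes here; that ordering is an assumption of this sketch.
        """
        if len(reads) + len(writes) > self.pump_factor:
            raise ValueError("more accesses than internal cycles available")
        results = [self.mem[addr] for addr in reads]
        for addr, data in writes:
            self.mem[addr] = data
        return results


mem = MultiPumpedMemory(depth=4, pump_factor=4)
print(mem.cycle(reads=[0], writes=[(0, 7)]))   # read sees the old value
print(mem.cycle(reads=[0, 0], writes=[]))      # next cycle sees the new value
```

The external clock can run no faster than the memory's maximum frequency divided by the pump factor, which is the main cost of this approach.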
2002
2001
- Compact central arbiters for memories with multiple read/write ports
N. Omori, H.J. Mattausch
IET Electronics Letters, Volume 37, Issue 13, June 2001
"Fast and compact central arbiter circuits for detection and regulation of access conflicts in memories with multiple ports are proposed. A layout study in 0.5 μm, 2 metal CMOS technology verifies that area-overhead and access time penalty are small up to 32 ports."
1999
- Fast quadratic increase of multiport-storage-cell area with port number
Y. Tatsumi, H.J. Mattausch
IET Electronics Letters, Volume 35, Issue 25, December 1999
"It is shown that the space required for wordline/bitline routing leads to a quadratic multiport-storage-cell area increase with port number N, dominating from as little as N=2, N=6 for small ROM and large SRAM cell types, respectively. Larger N results in enormous area increases (e.g. by a factor of 80 for a 32-port SRAM), making conventional multiport memories unacceptable for most practical applications."
- Area-efficient multiport memories for the Tb/s bandwidth era
H.J. Mattausch, Y. Tatsumi, K. Kishi, T. Gyoten, K. Yamada
Proceedings of the 25th European Solid-State Circuits Conference, September 1999, Duisburg, Germany
Proposes hierarchical arrangements of 1-port SRAM cells to compose multi-ported SRAM memories.
1998
- The Myriad Uses of Block RAM
Jan Gray
www.fpgacpu.org, October 1998
Large list of the many potential applications of the then new FPGA Block RAMs. Predicts many forms of multi-ported memory (multi-pumped, banked, replicated, LVT).