Load testing repositories
20 January, 2009
One of the issues that has worried me about moving from repositories of e-prints to repositories of data is the increased challenge of scale. Scale could be vastly different for data repositories in several dimensions, including:
- rate of deposit
- numbers of objects
- size of objects
- rate of access
- rate of change to existing objects...
Stuart reports:
- "As expected, the more items that were in the repository, the longer an average deposit took to complete.
- On average deposits into an empty repository took about one and a half seconds
- On average deposits into a repository with three hundred thousand items took about seven seconds
- If this linear looking relationship between number of deposits and speed of deposit were to continue at the same rate, an average deposit into a repository containing one million items would take about 19 to 20 seconds.
- Extrapolate this to work out throughput per day, and that is about 10MB deposited every 20 seconds, 30MB per minute, or 43GB of data per day.
- The ROAD project proposal suggested we wanted to deposit about 2GB of data per day, which is therefore easily possible.
- If we extrapolate this further, then DSpace could theoretically hold 4 to 5 million items, and still accept 2GB of data per day deposited via SWORD."
Like all such, it's an artificial test, but it does give encouragement that DSpace could scale to handle a data repository for some tasks. I don't know if other issues would be show-stoppers or not, for something like a lab repository, but most of the scale issues seem OK.
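The arithmetic behind the one-million-item and per-day figures in Stuart's report is easy to check. The following is only a minimal sketch: it assumes the deposit time really is linear between the two quoted averages, that each deposit is 10MB (as implied by "10MB deposited every 20 seconds"), and that deposits run back to back. The function and constant names are mine, not anything from the test harness.

```python
# Figures quoted in Stuart's report; everything else here is an assumption.
EMPTY_REPO_SECONDS = 1.5      # average deposit time into an empty repository
LOADED_REPO_SECONDS = 7.0     # average deposit time at 300,000 items
LOADED_REPO_ITEMS = 300_000
DEPOSIT_SIZE_MB = 10          # implied by "10MB deposited every 20 seconds"


def deposit_seconds(items):
    """Linearly extrapolate average deposit time from the two quoted points."""
    slope = (LOADED_REPO_SECONDS - EMPTY_REPO_SECONDS) / LOADED_REPO_ITEMS
    return EMPTY_REPO_SECONDS + slope * items


def daily_throughput_gb(seconds_per_deposit, deposit_mb=DEPOSIT_SIZE_MB):
    """Data deposited per day, assuming sequential back-to-back deposits."""
    deposits_per_day = 24 * 60 * 60 / seconds_per_deposit
    return deposits_per_day * deposit_mb / 1000  # decimal gigabytes


if __name__ == "__main__":
    print(f"{deposit_seconds(1_000_000):.1f} s per deposit at one million items")
    print(f"{daily_throughput_gb(20):.1f} GB per day at 20 s per deposit")
```

Under those assumptions this gives roughly 19.8 seconds per deposit at one million items and 43.2GB per day at 20 seconds per deposit, which agrees with the quoted "about 19 to 20 seconds" and "43GB of data per day".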

Comments
There are some performance and scalability tests o...
http://fedora.fiz-karlsruhe.de/docs/