top button
Flag Notify
    Connect to us
      Site Registration

Site Registration

hypothetical question about data storage using MySql

0 votes
340 views

We run an VERY io intensive file application service. Currently, our problem is that our disk spindles are being completely killed due to insufficient SEEK time on the hard drives (NOT physical read/write speeds).

We have an directory structure where the files are stored based on the MD5 checksum of the file name, i.e. /0/00/000/**********fce5cf3497f87de1d060 The majority of these files, are between 256K and 800K with the ODD exception (say less than 15%) being more than 1M but no more than 5M in size. The content of the files are pure text (MIME Encoded).

We believe that storing these files into an InnoDB table, may actually give us better performance:
- There is one large file that is being read/written, instead of BILLIONS of small files
- We can split the structure so that each directory (4096 in total) sit's on their own database
- We can move the databases as load increases, which means that we can potentially run 2 physical database servers, each with 2048 databases each)
- It's easy to move / migrate the data due to mysql and replication - same can be said for redundancy of the data

We are more than likely looking at BLOB columns of course, and we need to read/write from the DB in excess of 100mbit/s

Would the experts consider something like this as being feasible? Is it worth it to go down this avenue, or are we just going to run into different problems? If we are facing different problems, what can we possibly expect to go wrong here?

posted Jul 25, 2013 by Seema Siddique

Share this question
Facebook Share Button Twitter Share Button LinkedIn Share Button

2 Answers

+1 vote

Sorry but mysql is not the address of it , use riak instead of mysql With riak which is key and value based , all keys are on memory and just only one seek enough to handle it. Consider to use riak

answer Jul 25, 2013 by anonymous
+1 vote

I'm afraid that this is not what databases are for, and the first thing you'll likely run into is amount of concurrent connections.

This is typically something you should really tackle from a systems perspective. Seek times are dramatically improved on SSD or similar storage - think FusionIO cards, but there's also a couple of vendors (Violin comes to mind) who provide full-blown SSD SANs.

If you prefer staying with spinning disks, you could still improve the seeks by focusing on the inner cylinders and potentially by using variable sector formatting. Again, there's SANs that do this for you.

Another minor trick is to turn off access timestamp updates when you mount the filesystem (noatime).

Also benchmark different filesystems, there's major differences between them. I've heard XFS being recommended, but I've never needed to benchmark for seek times myself. We're using IBM's commercial GPFS here, which is good with enormous amounts of huge files (media farm here), not sure how it'd fare with smaller files.

answer Jul 26, 2013 by Amit Parthsarthi
Similar Questions
+1 vote

On two different occasions and on two different servers I have found server debug information written out to the mysqld.log file.

Now I know that by issuing the kill -1 mysql-pid will write that information to the mysqld.log file, and by using mysqladmin debug will also write debug information to the mysqld.log file.

I am wondering what other methods would write the debug info to the mysqld.log file. On the two servers in question my fellow DBA's nor myself did nothing that would have written debug info.

Are there any internal settings or processes that would write debug information to the mysqld.log file? Could a client tool do something like this? We have reviewed the app privileges and the they have no ability to write debug info.

...