Every file on a computer uses a certain amount of resources when it is stored or sent over the internet. Keeping your kilobytes (kB) and megabytes (MB) in mind can prevent problems and produce a smoother online experience. This GreenNet guide is here to help you tell the whales from the minnows.
Computer resources have physical limits, even if in principle they can be scaled up indefinitely. So it pays to think about file sizes in a tidy, minimalist way and thereby make the most of the resources we already have. Although most people nowadays seem to have internet connections which cope easily with audio, video and high-resolution images, it is worth remembering that many people do not. If care is not taken, it is possible to produce a large media file that actually conveys no more information to people than a file a tenth or a hundredth of the size.
Software packages that consume excessive memory and disk space for their function are sometimes called "bloatware", and one could apply a similar aesthetic to media files. For instance, making transcripts available on a web site might help people to find the information they are looking for more quickly than having audio or video interviews alone. Similarly, you might want to consider whether it's easier for people, including those with visual impairments, to read the date and time of an event from a text email, or to have to open a large PDF or image file of a poster. (By the way, the Microsoft term "document" for files never really caught on. The two words are synonymous in this context.)
So how big is too big? Obviously, it depends on the context. If you are signing off on a report that is intended to go to the printers, then emailing a 10MB PDF attachment to a few people asking for final comments is completely reasonable. What would be unreasonable is then to email the finished 10MB file to your list of 2000 supporters. Instead, you could create a lower-resolution or even text-only version of the PDF, put that on your website, and email a link to the file, perhaps with a little indicator of the file size (like "[1.2 MB PDF]") next to the download link.
Why worry about file size when it only takes someone on high-speed broadband 15 seconds to download a 10MB file?
Although the download might take 15 seconds for some people (eg GreenNet ADSL2+ broadband offering speeds "up to" 12Mbps), 10% of household internet connections in the UK were still dial-up as of 2009, and the proportion is higher in many other countries. A 10MB download on dial-up might take nearly an hour. On older broadband connections or in rural areas, the download speed might be 512kbps, so the transfer still takes several minutes. Even on the fastest broadband, uploading is often limited to 256kbps, so if you expect a 10MB file to be retransmitted, that is likely to be slower than expected.
A large file on its own may be no problem, but when multiplied by the size of the audience it can cause bandwidth problems that affect internet service providers and other users. Transmitting large files also consumes more energy, and it may result in having to upgrade hardware (up to 80% of energy over the lifetime of computer equipment is "embodied", that is, used in its manufacture). GreenNet doesn't limit bandwidth, but usage is subject to a "fair use" policy.
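To put rough numbers on the audience effect, here is a minimal Python sketch using the figures from the example above: a 10 MB report, a 1.2 MB lighter version and a list of 2000 supporters. The function name is made up for illustration, and the sketch assumes, as a worst case, that every supporter downloads the lighter version.

```python
KB = 1000          # decimal units, as used throughout this guide
MB = 1000 * KB
GB = 1000 * MB


def total_transfer_bytes(file_size_bytes: int, recipients: int) -> int:
    """Total data moved when every recipient receives one copy of the file."""
    return file_size_bytes * recipients


# The 10 MB print-ready PDF emailed to the whole list.
full_report = total_transfer_bytes(10 * MB, 2000)

# The 1.2 MB lower-resolution PDF, assuming everyone follows the link.
light_version = total_transfer_bytes(1200 * KB, 2000)

print(f"10 MB attachment to 2000 people: {full_report / GB:.1f} GB")
print(f"1.2 MB download by 2000 people:  {light_version / GB:.1f} GB")
```

In practice only some recipients will follow a link, so the real saving from linking rather than attaching is likely to be larger than this comparison suggests.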
Then there's the backup. If someone intends to keep the document or image, or archives all email, it might be replicated on backup media many times over. People may also be reluctant to keep files that consume more storage than they are worth, and so delete them.
It's still 15 seconds, even if it's a background download. Some of us get impatient waiting for the computer for more than half a second.
What does each of the units of computer storage actually mean?
In short, the "kilo-", "mega-", "giga-" and "tera-" prefixes are similar to their use in any other unit of measurement, like metres or watts:
- 1 B = 1 byte;
- 1 kB = 1000 bytes;
- 1 MB = 1000 kB;
- 1 GB = 1000 MB or 1 000 000 000 bytes;
- 1 TB = 1000 GB or 1 000 000 000 000 bytes.
(To confuse matters, "1 KB" or "1K" is used by many computer people to mean 1024 bytes, which is a convenient number in binary, and memory or disk space is often allocated by operating systems in units of 1024. To avoid confusion with the standard scientific usage of "mega-" and so on, the terms "kibibyte" (KiB), "mebibyte" (MiB), "gibibyte" (GiB) and "tebibyte" (TiB) are now recommended for these non-decimal technical units. You might still feel short-changed if you bought a 4GB flash drive and it's only 3.725GiB. For simplicity this article will stick to round 1000s and kilobytes [kB].)
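The flash drive figure can be checked with a few lines of Python. This is only an illustrative sketch; the function name is invented for the example.

```python
def decimal_and_binary(size_bytes: int) -> tuple[float, float]:
    """Return a size as decimal gigabytes (GB) and binary gibibytes (GiB)."""
    gb = size_bytes / 1000 ** 3   # 1 GB  = 1 000 000 000 bytes
    gib = size_bytes / 1024 ** 3  # 1 GiB = 1 073 741 824 bytes
    return gb, gib


gb, gib = decimal_and_binary(4_000_000_000)
print(f"{gb:.3f} GB = {gib:.3f} GiB")  # prints "4.000 GB = 3.725 GiB"
```

The gap between the two conventions widens with each prefix: about 2% at the kilo level, 5% at mega, 7% at giga and 10% at tera.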
How do I see file sizes?
File or attachment size is usually easily accessible, if not already prominent. In Windows, right-clicking on any file, folder, or drive and choosing "Properties..." will show the size. In an Explorer window, you can select "Details" from the "View" menu; or in a file open or save dialogue box there is a "View" button from which you can also choose "Details". If you then click on the word "Size" at the top of the column, you can group together the largest files in a folder. In Mac OS X, you can press Command+i to show details of an individual file, or Command+Option+i to show details of all selected items in an Inspector window. The Mac equivalent of Details view is "List" view, and Command+J gives you the option to "calculate all sizes" of folders as well as files.
Most email programs such as Windows Mail or Thunderbird always show the size of attachments next to the file name. In Thunderbird (and many other programs) you can click on the columns button at the top right of a list to add a column showing the size of each item. FTP programs, used to transfer files to websites, almost all show the size of files by default, although usually in bytes, so you need to split these large numbers by eye into groups of three digits to see which are measured in B or kB and which in MB.
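If you are comfortable running a script, the same information can be gathered with Python's standard library. The following is a rough sketch rather than a polished tool: the function names are made up for this example, it looks only at files directly inside one folder, and it uses the decimal units adopted in this guide.

```python
from pathlib import Path


def human_size(size_bytes: int) -> str:
    """Format a byte count using decimal units (1 kB = 1000 bytes)."""
    size = float(size_bytes)
    for unit in ("B", "kB", "MB", "GB"):
        if size < 1000:
            return f"{size:.0f} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} TB"


def largest_files(folder: str, count: int = 10) -> list[tuple[str, int]]:
    """Return (name, size in bytes) for the largest files directly inside folder."""
    files = [(p.name, p.stat().st_size) for p in Path(folder).iterdir() if p.is_file()]
    return sorted(files, key=lambda item: item[1], reverse=True)[:count]


if __name__ == "__main__":
    # List the ten largest files in the current folder, biggest first.
    for name, size in largest_files("."):
        # The "," format groups digits in threes, as you would do by eye.
        print(f"{size:>15,} bytes  {human_size(size):>10}  {name}")
```

Sorting by size in this way is the scripted equivalent of clicking the "Size" column heading in a Details or List view.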
Table of approximate file sizes
|bytes||in units||typical meaning|
|1||1 byte or 8 bits||A single keystroke or (non-accented) character; a number from 0 to 255|
|70||70 B||A line of text|
|1,000||1 kB||Half a page of unformatted text; a very short email; an icon or small button image|
|8,000||8 kB||Typical size of an organisation's logo as you might want it on a web page (about 200 x 200 pixels PNG or GIF)|
|30,000||30 kB||A 5-page word-processor document; a typical HTML web page; traditionally, the maximum recommended size for an image on a web page (maybe 640 x 480 pixels JPEG)|
|100,000||100 kB||The maximum recommended total of all the elements on a single web page, including images and HTML (some authorities say 30 or 40 kB instead)|
|500,000||500 kB||A 5-page word-processor document including a badly-chosen letterhead or logo image; a reasonable size for a PDF document someone might choose to download; two 1280x960 JPEG photos from a smartphone, too large for inline use in a web page|
|1,000,000||1 MB||1 minute of near-CD quality audio as MP3 or OGG; a 2048x1536 (3 megapixel) JPEG photo from a smartphone or digital camera, even if blurry because of low light; the complete comedies and tragedies of Shakespeare when compressed using bzip2|
|6,000,000||6 MB||A three-minute MP3 audio track at a very high bitrate (256 kbps)|
|10,000,000||10 MB||Maximum size of an email that you can expect all recipients to receive|
|25,000,000||25 MB||Maximum size of an email attachment received by GreenNet or Gmail (as of 2010); approximate size of the 29-volume 1911 edition of Encyclopaedia Britannica|
|100,000,000||100 MB||Uncompressed TIFF of a single A4 sheet at 600dpi (note that this may be 100,000 times the size of the equivalent plain text); the kind of mailbox size or .PST file size at which corruption becomes more likely|
|700,000,000||700 MB||Maximum amount of data on one CD-ROM; a two-hour TV programme downloaded from BBC iPlayer|
|4,000,000,000||4 GB||Amount of data on a DVD-ROM or typical new USB flash drive ("memory stick"); maximum amount of RAM (working memory) a 32-bit processor can use directly|
|100,000,000,000||100 GB||Typical hard drive size on a computer as of 2009 (doubles about every 2 years)|
|2,000,000,000,000||2 TB||Large external backup hard drive as of 2010|
As you may gather, image quality or resolution is one of the main factors in determining how cumbersome a file is. A 300 dpi (dots or pixels per inch) image added to a word-processor or PDF file takes up about four times as much space as a 150 dpi image (because the resolution applies both horizontally and vertically). Now, if you need to share an image with someone online, either on a website or by email, and you're not expecting them to print it out, demand perfect reproduction, or zoom in to examine minute detail, then it's only going to be shown on the screen. So it's worth knowing a bit about screen resolutions. A typical flat-panel screen is 1280 pixels wide. However, some are smaller or lower resolution. Allowing for navigation bars and margins at the side of the screen, and for the fact that a visitor's web browser might not occupy the full screen, there's probably little point in uploading an image that is wider than 800 pixels. With anything larger, the viewer may see only the top left-hand corner of the image and have to scroll to see the rest.
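The "four times" rule, and the 100 MB figure for an A4 scan in the table above, can be reproduced with a short calculation. This sketch assumes 24-bit colour (3 bytes per pixel) and no compression; the function name and the choice of an A4 page are for illustration only.

```python
MM_PER_INCH = 25.4


def uncompressed_bytes(width_mm: float, height_mm: float, dpi: int,
                       bytes_per_pixel: int = 3) -> int:
    """Approximate uncompressed size of a page scanned at the given resolution.

    Assumes 3 bytes per pixel (24-bit colour) unless told otherwise.
    """
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bytes_per_pixel


A4 = (210, 297)  # A4 paper size in millimetres

size_150 = uncompressed_bytes(*A4, dpi=150)
size_300 = uncompressed_bytes(*A4, dpi=300)
size_600 = uncompressed_bytes(*A4, dpi=600)

print(f"150 dpi: {size_150 / 1e6:.1f} MB")
print(f"300 dpi: {size_300 / 1e6:.1f} MB ({size_300 / size_150:.1f} times the 150 dpi size)")
print(f"600 dpi: {size_600 / 1e6:.1f} MB")
```

Compressed formats such as JPEG will be far smaller than these figures, but the same proportions apply: doubling the resolution roughly quadruples the amount of data.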
Scans or digital photos may be 20 times that size and yet appear no sharper to the recipient. So if you have such an image, you will need to resize it or scale it down before you upload or publish it. A common mistake when creating a web page is to try to resize the image on the page by changing the image element's width and height properties, which changes only how large the image is displayed, not the size of the file. Some content-management systems, such as Drupal, may include an image module that automatically creates a scaled copy of the image at the size you specify, but if you're editing pages in a web authoring program like Dreamweaver or KompoZer, the chances are you're forcing every website visitor to download far too much data and then making their computer work quite hard to do the downscaling. So it's best to try to keep photographic images, even banners, to no more than 800 pixels across and perhaps no more than 50 kB. Any image-editing software, such as the open source GIMP, allows you to easily produce a smaller file. Simply open the large file, choose an "image size" or "scale image" function, select the width you want, remembering that 800px is often full-width, and save in an appropriate file format.
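The arithmetic behind scaling down is simple enough to sketch in Python. The function below is hypothetical and only works out the target dimensions; the actual resampling is done by your image editor. The 2048x1536 photo is taken from the table above as an example.

```python
def scaled_size(width: int, height: int, max_width: int = 800) -> tuple[int, int]:
    """Return a (width, height) no wider than max_width, keeping the aspect ratio.

    Images that are already narrow enough are returned unchanged.
    """
    if width <= max_width:
        return width, height
    new_height = round(height * max_width / width)
    return max_width, new_height


original = (2048, 1536)          # a typical digital camera photo
target = scaled_size(*original)  # (800, 600)

# How many times fewer pixels the scaled image contains.
pixel_ratio = (original[0] * original[1]) / (target[0] * target[1])

print(f"Scale {original[0]}x{original[1]} down to {target[0]}x{target[1]}")
print(f"The scaled image has about {pixel_ratio:.1f} times fewer pixels")
```

Keeping the aspect ratio fixed avoids stretching the picture, which is why image editors usually link the width and height fields by default.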
The other thing to be aware of with images is that the different kinds of compression and file format have different advantages. As mentioned above, JPEG files (also called .jpg files because Windows was once limited to 3-character extensions) are most commonly used for photography, and JPEG is the format used by almost all digital cameras. They store a full range of colours but do lose a certain amount of fine detail; there is a balance between the file size and the acceptable amount of distortion. A highly compressed JPEG may show blocky or rippled fringes around sharp edges, but most people won't notice them. Mostly you will want a mid-range JPEG quality around 50 (out of 100). The other main formats used on the web are PNG and the older GIF. These are "lossless" formats, which makes them unsuitable for photographs or full-colour scans of artwork because the resulting files are very large. However, for images such as line drawings or logos that have been created on a computer in the first place, choosing PNG allows areas of flat colour to be compressed very efficiently and maintains the sharp edges of a design that JPEG would lose. PNG also tends to be used for smaller images, as for larger images the size reduction from using JPEG is much more important. The following images illustrate the reason JPEG is not used for small files with only a handful of colours:
[Image: close-up of GreenNet logo as PNG (aliased slightly, but with "hard edges")]
[Image: close-up of GreenNet logo with JPEG quality of 20]
So, in other words, for internet usage:
- use PNG (or GIF) for buttons, line-art, diagrams, most logos with sharp edges, and maybe completely black-and-white things like scanned text:
  - scale down if wider than 800px;
  - convert to indexed colour, and choose an adaptive palette if offered, with the smallest number of colours that appear (64 is often plenty);
  - use maximum compression;
  - for large, intricate diagrams and line-art, you might consider the new SVG (scalable vector graphics) format, which is supported by Firefox 2 and Internet Explorer 9 and later;
- use JPEG for all other images (photographic or partly-photographic images, full-colour scans, images with gradients):
  - crop to what is relevant, and/or suitably scale down so not wider than the number of pixels you expect it to take up on the screen;
  - select any option to optimise, and use a modest quality parameter (<60);
- do not use BMP, TIFF or other formats.
Other things that you might like to know
When you attach a file to an email, it will usually be converted to text using "base 64" encoding, which can only represent 6 bits per character. This means a 1 MB file will produce an email of about 1.37 MB (including the overhead of line breaks, the ratio works out at 26:19, that is, 26 bytes of email for every 19 bytes of attachment).
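The 26:19 ratio can be demonstrated with Python's standard base64 module. This sketch assumes the usual email layout of 76 encoded characters per line followed by a two-byte line ending, and it ignores message headers; the 57,000-byte dummy attachment is chosen only because it divides evenly into lines.

```python
import base64

# 57,000 zero bytes standing in for a real attachment.
attachment = bytes(57_000)

# encodebytes() emits 76 characters per line (57 input bytes), each ending
# in "\n"; email uses two-byte CRLF line endings, so swap those in.
encoded = base64.encodebytes(attachment).replace(b"\n", b"\r\n")

ratio = len(encoded) / len(attachment)
print(f"{len(attachment):,} bytes become {len(encoded):,} bytes")
print(f"ratio {ratio:.3f}, compared with 26/19 = {26 / 19:.3f}")
```

Each 57 bytes of attachment become 78 bytes of email, and 78:57 reduces to 26:19.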
Data transfer speeds can be measured in bits (usually for the rating of the connection itself) or bytes (more commonly for actual download or upload speeds, and shown with a capital "B"). The conversion factor is usually 8 bits to 1 byte (excluding now-rare parity or stop bits). So an old dial-up modem might upload and download at 32kbps, but that is only 4 kBps or 4000 bytes per second. A broadband/DSL connection rated at 8 megabits per second (Mbps) actually only means an absolute maximum of 1MB/s, and a 100MB software package (like OpenOffice) will take at least 100 seconds to download, very possibly longer.
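As a rough illustration of the bits-to-bytes conversion, the following Python sketch estimates best-case transfer times for the connection speeds mentioned in this guide. The function name is made up, and the results are lower bounds: they assume the link runs at its full rated speed with no protocol overhead.

```python
def transfer_seconds(file_size_bytes: int, link_bits_per_second: int) -> float:
    """Best-case transfer time: 8 bits per byte, no overhead or congestion."""
    return file_size_bytes * 8 / link_bits_per_second


MB = 1_000_000

# (description, file size in bytes, link speed in bits per second)
examples = [
    ("100 MB over 8 Mbps broadband", 100 * MB, 8_000_000),
    ("10 MB over 512 kbps broadband", 10 * MB, 512_000),
    ("10 MB over 256 kbps upload", 10 * MB, 256_000),
    ("10 MB over 32 kbps dial-up", 10 * MB, 32_000),
]

for label, size, speed in examples:
    seconds = transfer_seconds(size, speed)
    print(f"{label}: {seconds:,.0f} s ({seconds / 60:.1f} min)")
```

Real transfers are slower than these figures because of protocol overhead, shared lines and, for email attachments, the base 64 expansion described above.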
To summarise, thinking early on about making a file of a size that is easy to transmit and convenient for the recipient can save lots of people lots of time and storage later on.