Introduction
Recently one of the readers of DotNetBips.com posted a question on the discussion forums about compressing and decompressing files via code. There is no obvious answer to this question because C# and VB.NET lack this functionality. However, J# does have a way to zip and unzip files programmatically. In this article you will develop a reusable class library that can create, extract and alter ZIP files. Once developed, you can use it in your Windows or web applications.
Background
Some applications need to compress files (documents, XML files or any other type of file) on the fly and package them into a ZIP file. For example, a typical requirement is that users should be able to select files and download the selected files as a single ZIP file on the client machine. There is no out-of-the-box solution for this requirement in C# and VB.NET. Developers often turn to the following alternatives:
- Use a third party component
- Use some open source component
- Implement ZIP algorithm manually
The first option adds licensing cost, and developers are often reluctant to use third-party black box components in their applications. The second option is certainly attractive as you get the complete source code of the component. However, licensing, bugs (if any), upgrades and support are still a concern. Finally, the last option is difficult and requires considerable effort from the developer.
Luckily, J# (which is a part of overall .NET infrastructure) provides a handy way to compress and decompress files via code. The advantages of using J# compression features are:
- J# is a part of overall .NET infrastructure
- As J# is provided by Microsoft, future upgrades and support are assured
- No need to use any third party component
Considering this it makes sense to use J# features to compress and decompress files programmatically and that is what I am going to illustrate.
Creating a class library
We will create a C# class library that internally consumes J# classes for compressing and decompressing files. This way, once the library is developed, any C# or VB.NET developer can consume it. To begin, create a new class library project and add a class called ZipFileHelper to it. To use the J# compression classes you must reference the vjslib.dll assembly. The following figure shows the Add Reference dialog of Visual Studio with this assembly selected.
Once the reference is added to vjslib.dll, you also need to import the following namespaces:
- java.util;
- java.util.zip;
- java.io;
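As a rough sketch, the top of the class file might look like the following. It compiles only when vjslib.dll is referenced. The namespace name ZipLibrary is a placeholder that does not come from the article, and the two System namespaces are assumed because the later listings use List<T> and Path.

```csharp
using System.Collections.Generic; // List<T>
using System.IO;                  // Path
using java.io;                    // InputStream, OutputStream, FileInputStream, FileOutputStream
using java.util;                  // Enumeration
using java.util.zip;              // ZipFile, ZipEntry, ZipOutputStream

namespace ZipLibrary // hypothetical namespace name
{
    public class ZipFileHelper
    {
        // The helper methods and public methods developed below go here.
    }
}
```

Some type names, File for example, exist in both System.IO and java.io. That is presumably why the later listings write System.IO.File in full.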
The java.util namespace contains some utility classes. The java.util.zip namespace is the core namespace that contains classes related to ZIP file creation. Finally, the java.io namespace provides some classes related to file IO. The classes that we will use from the above namespaces are :
- ZipFile
- ZipEntry
- InputStream
- OutputStream
- FileInputStream
- FileOutputStream
- ZipOutputStream
- Enumeration
The ZipFile class is a programmatic representation of a ZIP file. A ZipFile contains zero or more ZipEntry objects and actual content of the zipped files. Each entry is nothing but metadata about a zipped file.
The InputStream and OutputStream classes are the abstract base classes for readable and writable byte streams, whereas FileInputStream and FileOutputStream are their file based implementations.
The ZipOutputStream class represents a writable stream pointing to a ZIP file. This stream can be used to write ZipEntry objects and content to the ZIP file.
Finally, the Enumeration interface is the J# way to iterate over the items of a collection.
Creating ZIP files
Before we actually write code to create or extract ZIP files let's create some helper methods that we need later. We need to create the following helper methods:
- GetZippedItems()
- CopyEntries() (two overloads)
- CopyStream()
- AddEntries()
- RemoveEntries()
Obtaining a list of items inside a ZIP file
The GetZippedItems() method returns a generic List of ZipEntry objects from a ZipFile. The GetZippedItems() method is shown below:
private static List<ZipEntry> GetZippedItems(ZipFile file)
{
    List<ZipEntry> entries = new List<ZipEntry>();
    Enumeration e = file.entries();
    // Walk the enumeration until it runs out of entries.
    while (e.hasMoreElements())
    {
        ZipEntry entry = (ZipEntry)e.nextElement();
        entries.Add(entry);
    }
    return entries;
}
The GetZippedItems() method accepts a ZipFile object and returns a generic List of ZipEntry objects. The method creates a generic collection of ZipEntry type. It then calls entries() method of ZipFile class to return an Enumeration of ZipEntry objects. The code then iterates through the enumeration and populates the List. Finally, the populated List is returned to the caller.
Copying streams
While adding or removing files from an existing ZIP file we need to copy the contents of the constituent files from source to destination streams. Hence, we need a helper method called CopyStream() to do that job. The CopyStream() method is shown below:
private static void CopyStream(InputStream source,
    OutputStream destination)
{
    sbyte[] buffer = new sbyte[8000];
    int data;
    // read() returns the number of bytes read, or -1 at the end of the stream.
    // Exceptions are left to propagate: swallowing them inside the loop
    // would retry a failing read forever.
    while ((data = source.read(buffer, 0, buffer.Length)) > 0)
    {
        destination.write(buffer, 0, data);
    }
}
The CopyStream() method accepts source and destination streams in the form of InputStream and OutputStream objects respectively. It then reads the source stream using the read() method. The read() method reads data in chunks of up to 8000 sbytes (signed bytes), and each chunk is written to the destination stream using the write() method of the OutputStream class. The loop ends when read() reports that no more data is available.
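The chunked copy pattern can be tried without vjslib. The sketch below is a stand-in that uses System.IO.Stream and byte instead of the J# stream classes and sbyte, so it is not the helper itself. The class name CopyStreamSketch is made up for illustration.

```csharp
using System;
using System.IO;

static class CopyStreamSketch
{
    // Copies everything from source to destination in fixed-size chunks.
    public static void CopyStream(Stream source, Stream destination)
    {
        byte[] buffer = new byte[8000];
        int read;
        // Stream.Read returns 0 at the end of the stream
        // (the java.io read() method returns -1 instead).
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
        }
    }

    static void Main()
    {
        byte[] payload = new byte[20000]; // larger than one chunk
        new Random(1).NextBytes(payload);
        using (MemoryStream input = new MemoryStream(payload))
        using (MemoryStream output = new MemoryStream())
        {
            CopyStream(input, output);
            Console.WriteLine(output.Length == payload.Length); // prints True
        }
    }
}
```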
Copying ZipEntry objects
The J# compression classes do not allow you to add or remove files from an existing ZIP file. The only way to add or remove files from an existing ZIP file is to create a new ZIP file with required items and then replace original ZIP file with this newly created ZIP file. Hence, we need a helper method that copies ZipEntry objects from one ZIP file into the other. CopyEntries() is such a method. The CopyEntries() method has two overloads as shown below:
private static void CopyEntries(ZipFile source,
    ZipOutputStream destination)
{
    List<ZipEntry> entries = GetZippedItems(source);
    foreach (ZipEntry entry in entries)
    {
        destination.putNextEntry(entry);
        InputStream s = source.getInputStream(entry);
        CopyStream(s, destination);
        destination.closeEntry();
        s.close();
    }
}
private static void CopyEntries(ZipFile source,
    ZipOutputStream destination, string[] entryNames)
{
    List<ZipEntry> entries = GetZippedItems(source);
    for (int i = 0; i < entryNames.Length; i++)
    {
        foreach (ZipEntry entry in entries)
        {
            // Copy only the entries whose names were requested.
            if (entry.getName() == entryNames[i])
            {
                destination.putNextEntry(entry);
                InputStream s = source.getInputStream(entry);
                CopyStream(s, destination);
                destination.closeEntry();
                s.close();
            }
        }
    }
}
The first overload of CopyEntries() method accepts two parameters. The first parameter is the source ZipFile from which entries are to be copied. The second parameter is the target ZipOutputStream to which the entries are to be written.
The second overload of CopyEntries() method is intended to copy only certain entries and accepts three parameters. The significance of the first two parameters is the same as before. The third parameter is an array of entry names that are to be copied to the destination ZipOutputStream.
Both the overloads of CopyEntries() method essentially retrieve a List of ZipEntries using GetZippedItems() helper method. The entries are then transferred to the ZipOutputStream. The putNextEntry() method of ZipOutputStream class accepts a ZipEntry to be added to the ZIP file and writes it to the ZIP file. The getInputStream() method of ZipFile class accepts a ZipEntry and returns an InputStream pointing to that entry. This stream is used by CopyStream() helper method for reading the data from that entry. Remember that ZipEntry simply provides metadata about an entry whereas the stream obtained from getInputStream() method provides the actual content of the file. Finally, closeEntry() method of ZipOutputStream class is called to finish writing the entry.
Adding entries to an existing ZIP file
The AddEntries() method adds ZipEntry objects to a ZIP file. The AddEntries() method is shown below:
private static void AddEntries(ZipFile file, string[] newFiles)
{
    string fileName = file.getName();
    string tempFileName = Path.GetTempFileName();
    ZipOutputStream destination = new ZipOutputStream
        (new FileOutputStream(tempFileName));
    try
    {
        CopyEntries(file, destination);
        if (newFiles != null)
        {
            foreach (string f in newFiles)
            {
                ZipEntry z = new ZipEntry(f.Remove
                    (0, Path.GetPathRoot(f).Length));
                z.setMethod(ZipEntry.DEFLATED);
                destination.putNextEntry(z);
                try
                {
                    FileInputStream s = new FileInputStream(f);
                    try
                    {
                        CopyStream(s, destination);
                    }
                    finally
                    {
                        s.close();
                    }
                }
                finally
                {
                    destination.closeEntry();
                }
            }
        }
    }
    finally
    {
        destination.close();
    }
    file.close();
    System.IO.File.Copy(tempFileName, fileName, true);
    System.IO.File.Delete(tempFileName);
}
The code retrieves the full path of the ZipFile by calling its getName() method. It also obtains a temporary file name using the GetTempFileName() method of the System.IO.Path class. You might be wondering why we need a temporary file here. AddEntries() is a helper method that will be called while creating a new ZIP file as well as while adding files to an existing ZIP file. The J# compression classes do not allow you to modify ZIP files directly. Hence, we create a new ZIP file with the required items and then overwrite the old ZIP file with it. For this temporary ZIP file we need a temporary file name, and hence we use the GetTempFileName() method. We then create a new ZipOutputStream object, this time pointing to the temporary ZIP file. Then the CopyEntries() helper method is called. It copies entries from the specified ZIP file (first parameter) to a ZipOutputStream (second parameter). If you are creating a new ZIP file then CopyEntries() will not copy any entries. However, if you are adding files to an existing ZIP file then it will copy all the entries from the existing ZIP file to the new temporary ZIP file. Once all entries are written, the temporary file is copied over the original ZIP file and then deleted.
Next, a foreach loop adds all the files to be zipped to the ZIP file. Each zipped file is represented by a class called ZipEntry. The constructor of the ZipEntry class accepts the name of the entry, which here is the file path with its root (such as the drive) removed. The setMethod() method sets the compression method to DEFLATED. The other possibility is STORED, which packages the file in uncompressed format. The newly created ZipEntry is added to the ZipOutputStream using its putNextEntry() method. A ZipEntry merely represents the metadata of an entry. You still need to add the actual contents of the file to the ZIP file. This is done by the CopyStream() helper method.
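The entry name passed to the ZipEntry constructor comes from stripping the path root off the full file name. The small sketch below isolates that expression so you can see what it produces. The class name, method name and sample path are made up for illustration, and the result shown in the comment assumes Windows path rules.

```csharp
using System;
using System.IO;

static class EntryNameSketch
{
    // Mirrors the expression used in AddEntries(): drop the root
    // (for example "C:\" on Windows) from a full path.
    public static string ToEntryName(string fullPath)
    {
        return fullPath.Remove(0, Path.GetPathRoot(fullPath).Length);
    }

    static void Main()
    {
        // Hypothetical path; on Windows the result is docs\report.txt
        Console.WriteLine(ToEntryName(@"C:\docs\report.txt"));
    }
}
```

This is also the name you would later pass when removing that entry from the ZIP file.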
Removing entries from an existing ZIP file
In contrast to the AddEntries() method, the RemoveEntries() method removes ZipEntry objects from a given ZIP file. The RemoveEntries() method is shown below:
private static void RemoveEntries(ZipFile file, string[] items)
{
    string fileName = file.getName();
    string tempFileName = Path.GetTempFileName();
    ZipOutputStream destination = new ZipOutputStream
        (new FileOutputStream(tempFileName));
    try
    {
        List<ZipEntry> allItems = GetZippedItems(file);
        List<string> filteredItems = new List<string>();
        foreach (ZipEntry entry in allItems)
        {
            // Keep the entry unless its name appears in the removal list.
            bool keep = true;
            foreach (string s in items)
            {
                if (s == entry.getName())
                {
                    keep = false;
                    break;
                }
            }
            if (keep)
            {
                filteredItems.Add(entry.getName());
            }
        }
        CopyEntries(file, destination, filteredItems.ToArray());
    }
    finally
    {
        destination.close();
    }
    file.close();
    System.IO.File.Copy(tempFileName, fileName, true);
    System.IO.File.Delete(tempFileName);
}
The RemoveEntries() method accepts the ZipFile from which entries are to be removed and an array of entry names to be removed. The code of the RemoveEntries() method is very similar to the AddEntries() method except that it doesn't copy the specified entries. Notice the nested foreach loops that build the filteredItems list. The code essentially compares the list of all the entries with the list of entries to be removed. The difference between these two lists is the list of entries to be copied. The CopyEntries() method is then called by passing the list of entries to be copied. Recall that the second overload of CopyEntries() is designed for copying only the specified entries.
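The "difference between two lists" can be checked on plain strings, without any ZIP classes. The following sketch keeps every name that is not in the removal list. The class name, method name and sample entry names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

static class RemoveFilterSketch
{
    // Returns the names in allNames that do not appear in namesToRemove.
    public static List<string> NamesToKeep(IEnumerable<string> allNames,
        string[] namesToRemove)
    {
        List<string> keep = new List<string>();
        foreach (string name in allNames)
        {
            if (Array.IndexOf(namesToRemove, name) < 0)
            {
                keep.Add(name);
            }
        }
        return keep;
    }

    static void Main()
    {
        string[] all = { "a.txt", "b.txt", "docs/c.txt" };
        string[] remove = { "a.txt", "docs/c.txt" };
        Console.WriteLine(string.Join(", ", NamesToKeep(all, remove).ToArray()));
        // prints: b.txt
    }
}
```

Note that an entry is kept only when it matches none of the names to be removed. Testing for "differs from at least one name" instead would keep every entry as soon as two or more names are passed.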
Creating a new ZIP file
In order to create a new ZIP file we write a static method named CreateZipFile() inside the ZipFileHelper class. The CreateZipFile() method accepts two parameters viz. path and name of the ZIP file to be created and array of file names that are to be zipped. The CreateZipFile() method is shown below:
public static void CreateZipFile(string filename,
    string[] items)
{
    FileOutputStream fout = new FileOutputStream(filename);
    ZipOutputStream zout = new ZipOutputStream(fout);
    zout.close();
    ZipFile zipfile = new ZipFile(filename);
    AddEntries(zipfile, items);
}
The code creates an instance of the FileOutputStream class, which represents a stream capable of writing to a file. The constructor of the FileOutputStream class accepts the path of the file to which we wish to write. This FileOutputStream instance is then supplied to an instance of the ZipOutputStream class, which represents a writable stream to a ZIP file. The ZipOutputStream is then closed, causing a new empty ZIP file to be created. An object of the ZipFile class is then created. The ZipFile class represents a ZIP file in your code and is used to manipulate the contents of the ZIP file. Finally, the AddEntries() helper method is called by passing the ZipFile object and the names of the files to be zipped.
Adding files to an existing ZIP file
There might be situations wherein you may wish to add files to an existing ZIP file. The AddToZipFile() method does exactly that. The AddToZipFile() method is shown below:
public static void AddToZipFile(string filename,
    string[] items)
{
    ZipFile file = new ZipFile(filename);
    AddEntries(file, items);
}
The AddToZipFile() method accepts the path of the ZIP file and array of new files to be added. It then creates an instance of ZipFile class and calls AddEntries() method we created earlier.
Removing files from an existing ZIP file
The RemoveFromZipFile() method removes specified entries from a ZIP file. The method is shown below:
public static void RemoveFromZipFile(string filename,
    string[] items)
{
    ZipFile file = new ZipFile(filename);
    RemoveEntries(file, items);
}
The RemoveFromZipFile() method accepts name of the ZIP file from which items are to be removed and an array of entry names that are to be removed. It then calls RemoveEntries() method by passing ZipFile and entries to be removed.
Extracting a ZIP file
So far you have learnt to compress files into a ZIP file and to modify existing ZIP files by adding or removing items. Now it's time to learn how to extract ZIP files. The ExtractZipFile() method is intended for this job and is shown below:
public static void ExtractZipFile(string zipfilename,
    string destination)
{
    ZipFile zipfile = new ZipFile(zipfilename);
    try
    {
        List<ZipEntry> entries = GetZippedItems(zipfile);
        foreach (ZipEntry entry in entries)
        {
            if (!entry.isDirectory())
            {
                InputStream s = zipfile.getInputStream(entry);
                try
                {
                    string fname = System.IO.Path.GetFileName(entry.getName());
                    string dir = System.IO.Path.GetDirectoryName(entry.getName());
                    // Recreate the entry's folder structure under the destination folder.
                    string newpath = destination + @"\" + dir;
                    System.IO.Directory.CreateDirectory(newpath);
                    FileOutputStream dest = new FileOutputStream
                        (System.IO.Path.Combine(newpath, fname));
                    try
                    {
                        CopyStream(s, dest);
                    }
                    finally
                    {
                        dest.close();
                    }
                }
                finally
                {
                    s.close();
                }
            }
        }
    }
    finally
    {
        // Release the ZIP file once all entries are extracted.
        zipfile.close();
    }
}
The ExtractZipFile() method accepts the path of the ZIP file to be extracted and the destination folder where the files will be extracted. It then creates a ZipFile object and obtains the entries within the ZIP file using the GetZippedItems() helper method. The foreach loop iterates through all the entries. With each iteration an entry is extracted to the specified folder. The getInputStream() method of the ZipFile class returns an InputStream for that entry. This stream acts as the source stream. The getName() method of the ZipEntry class returns the full name of the entry. Note that an entry name doesn't contain drive information, because the root was stripped when the entry was added. Based on this entry name the destination path and file name are calculated. Unzipping a file must recreate the directory structure that was present when it was zipped. This is done by calling the CreateDirectory() method of the Directory class. A FileOutputStream is then created to write the extracted file to disk. The CopyStream() method transfers data from the source InputStream to the destination FileOutputStream.
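The way the target path is derived from an entry name can be tried in isolation. The sketch below splits an entry name into folder and file parts and rebuilds the path under a destination folder. It uses Path.Combine throughout rather than string concatenation, and the class name, method name and sample values are made up for illustration.

```csharp
using System;
using System.IO;

static class ExtractPathSketch
{
    // Computes where an entry would be written under the destination folder.
    public static string TargetPath(string destination, string entryName)
    {
        string fileName = Path.GetFileName(entryName);
        // GetDirectoryName returns an empty string when the entry has no folder part.
        string dir = Path.GetDirectoryName(entryName) ?? string.Empty;
        return Path.Combine(Path.Combine(destination, dir), fileName);
    }

    static void Main()
    {
        // Hypothetical destination folder and entry name.
        Console.WriteLine(TargetPath("output", "docs/report.txt"));
    }
}
```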
That's it! This completes our class library.
Using the class library
Using the class library is relatively easy. You simply call the methods of the ZipFileHelper class as per your requirement. For example, to create a new ZIP file you call the CreateZipFile() method and to extract a ZIP file you call the ExtractZipFile() method. The accompanying source code contains a Windows based client application that consumes the ZipFileHelper class we just created. Though we will not discuss the client code in any detail, here is what the client looks like:
You can simply run the client and test if your class library works as expected.
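If you prefer to try the library from a console application instead, the calls might look like the sketch below. It assumes a reference to the class library built in this article (and therefore to vjslib.dll). All file and folder paths are made up, and the class name ZipFileHelperDemo is hypothetical.

```csharp
using System;

class ZipFileHelperDemo
{
    static void Main()
    {
        // Hypothetical paths; replace them with files that exist on your machine.
        string zipPath = @"C:\temp\demo.zip";

        // Create a ZIP file containing two files.
        ZipFileHelper.CreateZipFile(zipPath,
            new string[] { @"C:\docs\a.txt", @"C:\docs\b.txt" });

        // Add one more file to the existing ZIP file.
        ZipFileHelper.AddToZipFile(zipPath,
            new string[] { @"C:\docs\c.txt" });

        // Entries are stored without the path root,
        // so C:\docs\a.txt is removed by the name docs\a.txt.
        ZipFileHelper.RemoveFromZipFile(zipPath,
            new string[] { @"docs\a.txt" });

        // Extract whatever remains into a folder.
        ZipFileHelper.ExtractZipFile(zipPath, @"C:\temp\extracted");

        Console.WriteLine("Done");
    }
}
```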
Summary
C# and VB.NET do not provide any ready-made solution for compressing and decompressing files programmatically. However, using J# classes you can accomplish this task. The java.util.zip namespace from vjslib.dll provides classes such as ZipFile, ZipEntry and ZipOutputStream that allow you to work with ZIP files. Our C# class ZipFileHelper encapsulates the J# classes so that your client application need not have any J# specific class references. This way, developers who do not know J# can also use our class library for compressing and decompressing files. Moreover, you can use the class library in Windows as well as web applications.