Bulk vs Individual Compression

I'd like to share something very brief and very obvious: compression works better with large amounts of data. That is, if you have to compress 100 sentences, you'd better compress them in bulk rather than one sentence at a time. Let me illustrate that:

public static void main(String[] args) throws Exception {
    // Build 100 sentences, each made of 100 random 10-letter words
    List<String> sentences = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
        StringBuilder sentence = new StringBuilder();
        for (int j = 0; j < 100; j++) {
            sentence.append(RandomStringUtils.randomAlphabetic(10)).append(" ");
        }
        sentences.add(sentence.toString());
    }
    // Bulk: compress all sentences joined into a single string
    byte[] compressed = compress(StringUtils.join(sentences, ". "));
    System.out.println(compressed.length);
    // Individual: compress each sentence separately and sum the sizes
    System.out.println(sentences.stream().collect(Collectors.summingInt(sentence -> compress(sentence).length)));
}
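
For readers who want to try the comparison without any third-party libraries, here is a minimal, self-contained sketch of the same experiment. It makes several assumptions that are not part of the original code: the class name BulkVsIndividual and its helper methods are hypothetical, the JDK's GZIPOutputStream stands in for the compress method, and the random words use lowercase letters only, generated with a seeded java.util.Random instead of RandomStringUtils. The exact byte counts will therefore differ from the ones produced by the code above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Hypothetical class name; a JDK-only variant of the experiment above.
class BulkVsIndividual {

    // Assumed stand-in for the article's compress method: gzip from java.util.zip.
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the gzip trailer has been written.
        return out.toByteArray();
    }

    // Builds `count` sentences, each with `words` random 10-letter lowercase words.
    static List<String> randomSentences(int count, int words, long seed) {
        Random random = new Random(seed);
        List<String> sentences = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            StringBuilder sentence = new StringBuilder();
            for (int j = 0; j < words; j++) {
                for (int k = 0; k < 10; k++) {
                    sentence.append((char) ('a' + random.nextInt(26)));
                }
                sentence.append(' ');
            }
            sentences.add(sentence.toString());
        }
        return sentences;
    }

    // Size in bytes when all sentences are joined and compressed once.
    static int bulkSize(List<String> sentences) {
        return gzip(String.join(". ", sentences)).length;
    }

    // Total size in bytes when every sentence is compressed on its own.
    static int individualSize(List<String> sentences) {
        return sentences.stream().mapToInt(s -> gzip(s).length).sum();
    }

    public static void main(String[] args) {
        List<String> sentences = randomSentences(100, 100, 42L);
        System.out.println("bulk:       " + bulkSize(sentences));
        System.out.println("individual: " + individualSize(sentences));
    }
}
```

Part of the gap comes from fixed per-stream overhead: every gzip stream carries its own header and trailer, so compressing 100 sentences separately pays that cost 100 times, while the bulk variant pays it once.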


The compress method uses commons-compress, which makes it easy to generate results for multiple compression algorithms: