Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix for storm issuse #245 hashCode does not work for byte[] #429

Closed
wants to merge 1 commit into from

Conversation

xiaokang
Copy link
Contributor

@xiaokang xiaokang commented Jan 7, 2013

use java.util.Arrays.deepHashCode() instead of List.hashCode() for tuple

@xiaokang
Copy link
Contributor Author

xiaokang commented Jan 7, 2013

for #245.

@nathanmarz
Copy link
Owner

I don't think this is a good idea as toArray will cause the list to be copied.

@nathanmarz nathanmarz closed this Mar 26, 2013
@xiaokang
Copy link
Contributor Author

hi nathan. is it ok to implement java.util.Arrays.deepHashCode() in storm code to avoid toArray() copying list.

@nathanmarz
Copy link
Owner

Maybe... you'd have to show benchmarks that demonstrate that there's no performance degradation.

@xiaokang
Copy link
Contributor Author

xiaokang commented Nov 3, 2013

@nathanmarz , I compared hashCode performance of three cases : list.hashCode(), java.util.Arrays.deepHashCode() and deepHashCode() without toArray() copying. The result shows that for array length less than 1M there is little diffenence at ms time level. But deepHashCode's time grows linearly with array length.

Since performance degradation for array less than 1M is almost negligible, it's worth to fix the byte[] hashCode problem. And for tuple bigger than 1M performance bottleneck should be network not cpu.

CODE:
  public static void main(String[] args) {

    for (int n : new int[] {1, 100, 1000, 10000, 100000, 1000000, 10000000}) {
      byte[] ba = new byte[n];
      ArrayList list = new ArrayList();
      list.add(ba);
      long t1 = System.currentTimeMillis();

      list.hashCode();
      long t2 = System.currentTimeMillis();

      deepHashCode(list);
      long t3 = System.currentTimeMillis();

      Arrays.deepHashCode(list.toArray());
      long t4 = System.currentTimeMillis();

      System.out.println(n + " times " + (t2 - t1) + " " + (t3 - t2) + " " + (t4 - t3));
    }
  }

OUTPUT:
1 times 0 0 0
100 times 0 0 0
1000 times 0 0 0
10000 times 0 0 0
100000 times 0 1 0
1000000 times 0 1 2
10000000 times 0 20 19

@xiaokang
Copy link
Contributor Author

xiaokang commented Nov 3, 2013

The following is the deepHashCode(List a) without toArray() copying

  public static int deepHashCode(List a) {
    if (a == null)
      return 0;
    int result = 1;

    for (Object element : a) {
      int elementHash = 0;
      if (element instanceof byte[])
        elementHash = Arrays.hashCode((byte[]) element);
      else if (element instanceof short[])
        elementHash = Arrays.hashCode((short[]) element);
      else if (element instanceof int[])
        elementHash = Arrays.hashCode((int[]) element);
      else if (element instanceof long[])
        elementHash = Arrays.hashCode((long[]) element);
      else if (element instanceof char[])
        elementHash = Arrays.hashCode((char[]) element);
      else if (element instanceof float[])
        elementHash = Arrays.hashCode((float[]) element);
      else if (element instanceof double[])
        elementHash = Arrays.hashCode((double[]) element);
      else if (element instanceof boolean[])
        elementHash = Arrays.hashCode((boolean[]) element);
      else if (element instanceof List)
        elementHash = deepHashCode((List) element);
      else if (element != null)
        elementHash = element.hashCode();

      result = 31 * result + elementHash;
    }

    return result;
  }

@nathanmarz
Copy link
Owner

I think this would be better implemented using Clojure protocols to avoid the if else else else else... conditionals. It would also be more extensible and more performant.

@xiaokang
Copy link
Contributor Author

xiaokang commented Nov 5, 2013

OK, I will try to implement it using Clojure.

@xiaokang
Copy link
Contributor Author

@nathanmarz I have implemented it using Clojure. I try to use Clojure protocols as you suggested. I am not so skilled at it and find that protocols is used to define interface with multi different method. So I have not find a way to implement this using protocols and implemented it in another way. Any more suggestion?

The new code does not show up since this PR is closed so I made another PR #746.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants