Quantcast
Channel: Hortonworks » All Replies
Viewing all articles
Browse latest Browse all 3435

Sort cuts Rowcounts

$
0
0

I uploaded a large dataset (the CMS provider data) and ran a simple query which returned 500K rows. I then added a simple sort by clause for one of the columns and the new resultset was 22K rows. I don’t understand how sorting the data would filter out a substantial number of records; but all I want to do is sort the 500K rows and not filter them.

I also tried another experiment where I copied a subset of data into a new table and then tried to join that table on the original table to speed up the query. Surprisingly, the query went from about 30 seconds in the original large table to about twenty minutes on the joined tables.

Clearly, Hive and Hadoop are a much different beast from the standard Oracle, SQL, and other relational systems; but, both of these seem counter intuitive.


Viewing all articles
Browse latest Browse all 3435

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>