computer components & hardware - ILM and Full Text Search

[email protected]

January 30th 07 10:01 PM

ILM and Full Text Search

Hello,

I'm looking into various ILM products such as those from Kazeon, EMC,
NeoPath, etc. One question that comes up is how these products behave
when a client does a full-text search against a volume that contains
data that's been migrated away.

From what I understand, a file access causes many of these products to
bring the file back from a secondary tier. I know that some ILM APIs
allow for redirection, which would seemingly avoid this issue.
However, others do not have redirection. Wouldn't this mean that a
full-text search causes the entire set of data to be brought back onto
the primary tier? Doesn't this cause capacity issues?

What am I missing? Your help is greatly appreciated.

Thanks,
Ron

Nik Simpson

January 30th 07 10:47 PM

Ron wrote:
> Hello,
>
> I'm looking into various ILM products such as those from Kazeon, EMC,
> NeoPath, etc. One question that comes up is how these products behave
> when a client does a full-text search against a volume that contains
> data that's been migrated away.
>
> From what I understand, a file access causes many of these products to
> bring the file back from a secondary tier. I know that some ILM APIs
> allow for redirection, which would seemingly avoid this issue.
> However, others do not have redirection. Wouldn't this mean that a
> full-text search causes the entire set of data to be brought back onto
> the primary tier? Doesn't this cause capacity issues?
>
> What am I missing? Your help is greatly appreciated.

Typically, a content search is performed against a content index, not
against the original file, so the search doesn't touch the file at all.
The file is read during the indexing process; if that occurs before
migration, then the file will not be hit after migration.
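
To illustrate the point about searching an index rather than the files, here is a minimal sketch of a toy inverted index. The class name `ContentIndex`, the `reads` counter, and the in-memory `volume` dict are all made up for illustration and do not represent how any of the products mentioned in this thread work. File content is read once, at indexing time, and searches afterwards never open a file.

```python
import re
from collections import defaultdict


class ContentIndex:
    """Toy inverted index: maps each word to the set of paths containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.reads = 0  # counts how many times file content was actually read

    def index_file(self, path, read_content):
        # The only place file content is read; ideally this runs before migration.
        text = read_content(path)
        self.reads += 1
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(path)

    def search(self, word):
        # Answered from the index alone; no file is opened here.
        return sorted(self.postings.get(word.lower(), set()))


# Hypothetical in-memory "volume" standing in for files on the primary tier.
volume = {
    "/vol/a.txt": "Quarterly storage report",
    "/vol/b.txt": "Backup tape inventory and storage plan",
}

index = ContentIndex()
for path in volume:
    index.index_file(path, volume.__getitem__)

reads_after_indexing = index.reads
hits = index.search("storage")
```

After the two indexing reads, `index.reads` stays unchanged no matter how many searches run, which is the property that keeps migrated files from being recalled by a search.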

PS. If you're looking at this space you should also take a look at Scentric
(FTR I work for Scentric, well at least for another ten days :-)

--
Nik Simpson

[email protected]

February 1st 07 06:28 PM

On Jan 30, 2:47 pm, Nik Simpson wrote:
> Ron wrote:
>> Hello,
>>
>> I'm looking into various ILM products such as those from Kazeon, EMC,
>> NeoPath, etc. One question that comes up is how these products behave
>> when a client does a full-text search against a volume that contains
>> data that's been migrated away.
>>
>> From what I understand, a file access causes many of these products to
>> bring the file back from a secondary tier. I know that some ILM APIs
>> allow for redirection, which would seemingly avoid this issue.
>> However, others do not have redirection. Wouldn't this mean that a
>> full-text search causes the entire set of data to be brought back onto
>> the primary tier? Doesn't this cause capacity issues?
>>
>> What am I missing? Your help is greatly appreciated.
>
> Typically, a content search is performed against a content index, not
> against the original file, so the search doesn't touch the file at all.
> The file is read during the indexing process; if that occurs before
> migration, then the file will not be hit after migration.
>
> PS. If you're looking at this space you should also take a look at Scentric
> (FTR I work for Scentric, well at least for another ten days :-)
>
> --
> Nik Simpson

What happens when someone opens Windows file explorer and performs a
search through its search tool? Won't it try to read all the files
off of the NAS and, to the OP's point, won't it cause all the files to be
moved from tier II to tier I again?

Dvy

Nik Simpson

February 1st 07 10:24 PM

Dvy wrote:
> What happens when someone opens Windows file explorer and performs a
> search through its search tool? Won't it try to read all the files
> off of the NAS and, to the OP's point, won't it cause all the files to be
> moved from tier II to tier I again?

On XP, I think the answer would be yes, unless the ILM solution is smart
and recognizes the type of access as being something it should not
migrate for. In Vista (or if you are using something like Google Desktop
Search, which maintains an index), this should not be such a big problem.
--
Nik Simpson

bcwalrus

February 1st 07 10:37 PM

On Jan 30, 2:01 pm, Ron wrote:
> Hello,
>
> I'm looking into various ILM products such as those from Kazeon, EMC,
> NeoPath, etc. One question that comes up is how these products behave
> when a client does a full-text search against a volume that contains
> data that's been migrated away.
>
> From what I understand, a file access causes many of these products to
> bring the file back from a secondary tier. I know that some ILM APIs
> allow for redirection, which would seemingly avoid this issue.
> However, others do not have redirection. Wouldn't this mean that a
> full-text search causes the entire set of data to be brought back onto
> the primary tier? Doesn't this cause capacity issues?
>
> What am I missing? Your help is greatly appreciated.
>
> Thanks,
> Ron

Not for the NeoPath FileDirector. They redirect access traffic to the
migration destination. If you access it frequently enough, then
depending on how you set up the placement policy, data may be migrated
back to the primary tier. Or you can set up your policy not to do
that. In other words, data access and data placement policy are
independent.
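
The separation between access and placement described here can be sketched roughly as follows. This is a toy model only: `TieredStore`, the `promote_after` knob, and the tier names are hypothetical and say nothing about how FileDirector is actually implemented. Reads are always served from wherever the file lives, and a separate policy step decides whether the file moves back.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TieredStore:
    """Toy model: reads are redirected; placement is decided by a separate policy."""

    promote_after: Optional[int] = None  # hypothetical knob; None means never promote
    location: dict = field(default_factory=dict)      # path -> "tier1" or "tier2"
    access_count: dict = field(default_factory=dict)  # path -> reads seen so far

    def read(self, path):
        # Access path: serve the file from wherever it currently lives.
        served_from = self.location[path]
        self.access_count[path] = self.access_count.get(path, 0) + 1
        self._apply_placement_policy(path)
        return served_from

    def _apply_placement_policy(self, path):
        # Placement path: move back to tier1 only if the policy says so.
        if self.promote_after is None:
            return
        on_secondary = self.location[path] == "tier2"
        if on_secondary and self.access_count[path] >= self.promote_after:
            self.location[path] = "tier1"


# Policy 1: never migrate back, however often the file is read.
never = TieredStore(promote_after=None, location={"/vol/x": "tier2"})
served_never = [never.read("/vol/x") for _ in range(5)]

# Policy 2: migrate back once the file has been read three times.
hot = TieredStore(promote_after=3, location={"/vol/x": "tier2"})
served_hot = [hot.read("/vol/x") for _ in range(4)]
```

With the first policy a full scan of the volume leaves every file on the secondary tier; with the second, only files that cross the access threshold come back.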

(I happen to be the NFS guy at NeoPath.)

Cheers,
bc

Faeandar

February 1st 07 11:01 PM

On 30 Jan 2007 14:01:52 -0800, Ron wrote:
> Hello,
>
> I'm looking into various ILM products such as those from Kazeon, EMC,
> NeoPath, etc. One question that comes up is how these products behave
> when a client does a full-text search against a volume that contains
> data that's been migrated away.
>
> From what I understand, a file access causes many of these products to
> bring the file back from a secondary tier. I know that some ILM APIs
> allow for redirection, which would seemingly avoid this issue.
> However, others do not have redirection. Wouldn't this mean that a
> full-text search causes the entire set of data to be brought back onto
> the primary tier? Doesn't this cause capacity issues?
>
> What am I missing? Your help is greatly appreciated.
>
> Thanks,
> Ron

So, since we have two people from companies in this space I'd like to
pose the competitive question:

What are your thoughts on Index Engines?

Thanks.

~F

Nik Simpson

February 2nd 07 01:54 AM

Faeandar wrote:
> So, since we have two people from companies in this space I'd like to
> pose the competitive question:
>
> What are your thoughts on Index Engines?

First, right now I would not see Index Engines as a direct competitor;
they are purely a search application and don't offer much in the way of
classification or policy-based data management, which is needed for ILM.

Second, for enterprise-wide search the problem is that when I'm looking
for document X, I'd rather find it on disk than buried on a backup tape.
If I can't find it online, then I'd go to backup tape. So other than as an
application for helping me keep better track of what I've backed up, I
don't see much of a future for it.

Interesting technology that I suspect will get embedded in things like
VTLs and D2D disk backup appliances. I don't see it as a standalone
technology. Good acquisition candidate for somebody in that space.

--
Nik Simpson

[email protected]

February 2nd 07 02:00 AM

On Feb 1, 5:54 pm, Nik Simpson wrote:
> Faeandar wrote:
>> So, since we have two people from companies in this space I'd like to
>> pose the competitive question:
>>
>> What are your thoughts on Index Engines?
>
> First, right now I would not see Index Engines as a direct competitor;
> they are purely a search application and don't offer much in the way of
> classification or policy-based data management, which is needed for ILM.
>
> Second, for enterprise-wide search the problem is that when I'm looking
> for document X, I'd rather find it on disk than buried on a backup tape.
> If I can't find it online, then I'd go to backup tape. So other than as an
> application for helping me keep better track of what I've backed up, I
> don't see much of a future for it.
>
> Interesting technology that I suspect will get embedded in things like
> VTLs and D2D disk backup appliances. I don't see it as a standalone
> technology. Good acquisition candidate for somebody in that space.
>
> --
> Nik Simpson

Where does the Google Search Appliance fit into this?

Dvy

Faeandar

February 2nd 07 02:32 AM

On Thu, 01 Feb 2007 20:54:20 -0500, Nik Simpson wrote:
> Faeandar wrote:
>> So, since we have two people from companies in this space I'd like to
>> pose the competitive question:
>>
>> What are your thoughts on Index Engines?
>
> First, right now I would not see Index Engines as a direct competitor;
> they are purely a search application and don't offer much in the way of
> classification or policy-based data management, which is needed for ILM.
>
> Second, for enterprise-wide search the problem is that when I'm looking
> for document X, I'd rather find it on disk than buried on a backup tape.
> If I can't find it online, then I'd go to backup tape. So other than as an
> application for helping me keep better track of what I've backed up, I
> don't see much of a future for it.
>
> Interesting technology that I suspect will get embedded in things like
> VTLs and D2D disk backup appliances. I don't see it as a standalone
> technology. Good acquisition candidate for somebody in that space.

They can get metadata directly from NDMP dumps. If someone figures
out how to flag the dump to only pass the metadata then they will be
able to get an entire storage array's metadata in a matter of hours
instead of the days that file crawlers will take.
Even without the flag they still get data far faster than any file
crawler.

I may have been asking far too open-ended a question. My needs are
fairly simple: tell me what, where, how big, how frequently accessed,
what type of file, etc. I've no need for a deep dive of content.

I'm looking for typical SRM stats, but on a fair scale.
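
For a sense of what a metadata-only pass looks like, here is a minimal sketch of a file crawler that gathers "what, where, how big, what type, last accessed" using `stat()` calls and never opens a file. The function name `summarize_tree` and the per-extension grouping are illustrative assumptions, not how any SRM product works, and this is exactly the slow crawl-over-the-file-system approach that an NDMP-based feed would try to avoid.

```python
import os
from collections import defaultdict


def summarize_tree(root):
    """Walk root and aggregate file count, bytes, and newest atime per extension."""
    summary = defaultdict(lambda: {"files": 0, "bytes": 0, "last_access": 0.0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Metadata only: the file content is never read.
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or cannot be stat'ed; skip it
            ext = os.path.splitext(name)[1].lower() or "(none)"
            entry = summary[ext]
            entry["files"] += 1
            entry["bytes"] += st.st_size
            entry["last_access"] = max(entry["last_access"], st.st_atime)
    return dict(summary)
```

Calling `summarize_tree("/some/share")` returns a dict keyed by extension. Access times are only as good as the file system's atime settings, so "how frequently accessed" would need something beyond a single crawl.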

Hopefully this provides more to go on.

Thanks.

~F

Nik Simpson

February 3rd 07 01:06 PM

Faeandar wrote:
> They can get metadata directly from NDMP dumps. If someone figures
> out how to flag the dump to only pass the metadata then they will be
> able to get an entire storage array's metadata in a matter of hours
> instead of the days that file crawlers will take.
> Even without the flag they still get data far faster than any file
> crawler.

Yes, they could do that, but then so could every other competitor; NDMP
is available to anybody, not just Index Engines. EMC does something
similar, though probably proprietary, with its classification product,
which gets a "dump" of metadata from Celerra file servers rather than
walking the file system over the network.

> I may have been asking far too open-ended a question. My needs are
> fairly simple: tell me what, where, how big, how frequently accessed,
> what type of file, etc. I've no need for a deep dive of content.

Index Engines wouldn't be a solution then, since to the best of my
knowledge it's all about content indexing & search. However, both
Scentric and Kazeon can do what you want without having to generate a
content index.

> I'm looking for typical SRM stats, but on a fair scale.

So you don't actually want to take any actions like migrating little-used
stuff to tier 2? Anyway, both Scentric and Kazeon offer extensive
SRM reporting, though if reporting is all you want, you might want to
take a look at Monosphere, which has a pure file SRM solution. How big is
a "fair scale" to you: 10s, 100s, 1000s of TB?

If you do want to take actions, then a policy engine is something you
want to look at. I can't speak for Kazeon's policy engine, but Scentric
lets you build policies with classification rules. For example, "find all
OFFICE files larger than 50 MB and not accessed in 30 days", which can be
combined with one or more actions that work on the results of the
filter. Actions include move, copy, delete, script, archive with
retention, etc.
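
The filter-plus-action idea can be sketched in a few lines. This is only an illustration of the rule quoted above, not Scentric's policy engine: the extension list standing in for "OFFICE files", the function names, and the callable-as-action design are all assumptions, and the rule relies on the file system keeping access times.

```python
import os
import time

# Assumed definition of "OFFICE files"; the post does not list extensions.
OFFICE_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def matches_rule(path, st, now, min_bytes=50 * 1024 * 1024, max_idle_days=30):
    """Filter: Office file, larger than min_bytes, not accessed in max_idle_days."""
    ext = os.path.splitext(path)[1].lower()
    idle_seconds = now - st.st_atime
    return (
        ext in OFFICE_EXTENSIONS
        and st.st_size > min_bytes
        and idle_seconds > max_idle_days * 86400
    )


def run_policy(root, action, now=None, **rule_options):
    """Collect every file under root that matches the rule, then apply action to each."""
    now = time.time() if now is None else now
    matched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip files that disappear or cannot be stat'ed
            if matches_rule(path, st, now, **rule_options):
                matched.append(path)
    matched.sort()
    for path in matched:
        action(path)  # e.g. a move, copy, or archive callable supplied by the caller
    return matched
```

A dry run is just `run_policy("/some/share", print)`; matches are collected before any action runs so that an action that moves files cannot disturb the walk.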

You can schedule these policies on a calendar or event trigger (e.g.
once a week, or when the file system has less than 20% free); you can also
trigger them from external scripts.
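
An event trigger of the "less than 20% free" kind could be checked from an external script along these lines. This is a sketch using Python's standard `shutil.disk_usage`; the function names and the injectable `usage_fn` parameter are made up for illustration and are not part of any product's interface.

```python
import shutil


def free_fraction(path):
    """Return the fraction of the file system holding path that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def should_trigger(path, min_free=0.20, usage_fn=free_fraction):
    """True when free space on path's file system has dropped below min_free."""
    return usage_fn(path) < min_free
```

A cron job or similar scheduler could call `should_trigger("/some/share")` and start the policy run only when it returns True.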

--
Nik Simpson

All times are GMT +1.